How to Evaluate Security Vulnerability Reports: A Case Study with curl and Mythos

Overview

Security vulnerability reports are a double-edged sword: they can uncover critical flaws, but they can also generate noise that wastes developer time. In April 2026, Anthropic's automated analysis tool Mythos reported five vulnerabilities in curl, the widely used data transfer tool and library. Daniel Stenberg, curl's founder and lead developer, manually reviewed each one and found that only a single issue was a genuine security vulnerability: three were false positives, and another was merely a coding error without security implications. This case study provides a step-by-step guide to evaluating automated vulnerability reports, using Mythos's findings on curl as a concrete example.

Prerequisites

Before diving into the evaluation process, ensure you have:

- A checkout of the curl source tree at the version the tool analyzed
- A C toolchain that can build curl with debug symbols and AddressSanitizer (ASan) support
- A debugger such as GDB or LLDB
- The vulnerability report itself, with file names, line numbers, and descriptions

Step-by-Step Guide

1. Understanding the Mythos Report

In April 2026, Anthropic released a report from its Mythos static analysis tool, claiming five distinct vulnerabilities in curl. The tool flagged code patterns that could lead to memory corruption, information leaks, or denial of service. Do not take any automated report at face value. Treat each finding as a hypothesis that requires verification.

For this tutorial, we will simulate the evaluation of those five findings. Assume the report provides file names, line numbers, and a brief description of each potential issue. (The exact details are omitted here, but the methodology applies universally.)
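
For concreteness, one finding can be modeled as a small record. The sketch below is illustrative only; the struct and field names are assumptions, not Mythos's actual report format.

```c
#include <stdio.h>

/* Illustrative only: a plausible shape for one automated finding.
 * The field names are assumptions, not Mythos's real output format. */
struct finding {
    const char *file;        /* flagged source file                  */
    int         line;        /* flagged line number                  */
    const char *category;    /* e.g. "buffer-overflow", "null-deref" */
    const char *description; /* the tool's one-line summary          */
};

int main(void)
{
    struct finding f = {
        "lib/http.c", 1234, "buffer-overflow",
        "possible out-of-bounds write while copying a response header"
    };
    printf("%s:%d [%s] %s\n", f.file, f.line, f.category, f.description);
    return 0;
}
```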

2. Manual Review: Categorizing Findings

Stenberg categorized the five reports into three groups:

- False positives (three findings): flagged patterns that cannot be triggered in practice
- "Just a bug" (one finding): a real coding error with no security impact
- A genuine vulnerability (one finding): an exploitable flaw requiring a patch

To replicate this process, work through the verification sub-steps below (3a–3c) for each finding:

3a. Reproduce the Reported Issue

Check out the specific version of curl that was analyzed (likely the latest stable release at the time). Compile it with debugging symbols and any special flags needed to trigger the condition Mythos described. Then write a minimal test case (e.g., a crafted HTTP request) to see if the tool's warning leads to abnormal behavior.

For example, if Mythos reported a buffer overflow in handling HTTP headers, craft an HTTP response with an extremely long header and observe curl's behavior with tools like AddressSanitizer (ASan).
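
To make the pattern concrete, here is a minimal, self-contained sketch of such a bug, invented for illustration; this is not curl's actual code. Built with `-g -fsanitize=address`, ASan aborts at the copy with a stack-buffer-overflow report.

```c
#include <stdio.h>
#include <string.h>

#define HEADER_MAX 256

/* Hypothetical flaw of the kind an automated report might describe:
 * the header is copied into a fixed buffer with no length check. */
static void store_header(const char *line)
{
    char buf[HEADER_MAX];
    strcpy(buf, line);              /* BUG: unbounded copy */
    printf("stored: %.40s...\n", buf);
}

int main(void)
{
    /* Simulate a crafted response header far larger than HEADER_MAX. */
    char long_header[4096];
    memset(long_header, 'A', sizeof(long_header) - 1);
    long_header[sizeof(long_header) - 1] = '\0';
    store_header(long_header);      /* ASan reports the overflow here */
    return 0;
}
```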

3b. Trace the Data Flow

For each finding, manually trace the data flow from input to the flagged function. Use a debugger (GDB/LLDB) or static analysis visualization. Ask: Can an attacker control the size or content that reaches this point? Are there any prior checks that prevent exploitation?
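
Here is a minimal sketch of what such a trace examines, with invented function names: `wire_len` is the taint source arriving off the network, the `memcpy` is the flagged sink, and the clamp in between is the prior check that decides exploitability.

```c
#include <stdio.h>
#include <string.h>

#define BODY_MAX 1024

/* Flagged sink: dangerous only if len can exceed dst's capacity. */
static void copy_body(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Taint source: wire_len comes straight off the network. The clamp
 * below is the "prior check" that makes the sink unexploitable. */
static void handle_response(const char *wire, size_t wire_len)
{
    char body[BODY_MAX];
    size_t len = wire_len > BODY_MAX ? BODY_MAX : wire_len;
    copy_body(body, wire, len);
    printf("copied %zu bytes\n", len);
}

int main(void)
{
    char attacker_data[4096] = { 0 };  /* stand-in for network input */
    handle_response(attacker_data, sizeof(attacker_data));
    return 0;
}
```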

3c. Identify Invariant Protections

Many false positives arise because the tool cannot see cross-function invariants. For example, a function may assume a pointer is non-null because of a check in its caller. Document these protections to confirm the finding is not exploitable.
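
A hedged sketch of such an invariant, with made-up names: analyzed in isolation, `parse_field()` appears to dereference a possibly-NULL pointer, but its only caller guarantees otherwise.

```c
#include <ctype.h>
#include <stdio.h>

/* Invariant: p is never NULL here. The guarantee lives in
 * handle_line(), the sole caller, which a per-function analysis
 * cannot see, so the dereference below may be flagged anyway. */
static int parse_field(const char *p)
{
    return isdigit((unsigned char)*p);
}

static int handle_line(const char *line)
{
    if (line == NULL)    /* the cross-function check the tool misses */
        return -1;
    return parse_field(line);
}

int main(void)
{
    printf("%d %d\n", handle_line("42"), handle_line(NULL));
    return 0;
}
```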

4. Handling False Positives (Three Cases)

Stenberg found three of Mythos's claims to be false positives. These typically fall into patterns such as:

- A bounds or null check in a caller that the tool cannot see from the flagged function
- Input that earlier parsing or protocol constraints already restrict before it reaches the flagged code
- Code paths that no input curl actually accepts can reach

To confirm a false positive, add comments in the code explaining why the reported pattern is benign. You may also file a bug report with the tool's maintainers to improve its accuracy.

5. The “Just a Bug” Finding

One of Mythos's findings turned out to be a genuine coding error but not a security vulnerability: for example, a missing null check that could cause a crash, but only if a specific environment variable was set, which is something an attacker cannot control remotely. Stenberg referred to this as "just a bug."
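
The actual details of this finding are not given here, so the sketch below invents one such scenario (the variable name is hypothetical): a debug-only path that crashes when a user-set environment variable is malformed. Only the local user controls the environment, so it is a reliability bug rather than a vulnerability.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Hypothetical "just a bug": DEBUG_LOG_SPEC is an invented name. */
    const char *spec = getenv("DEBUG_LOG_SPEC");
    if (spec == NULL)
        return 0;                   /* variable unset: nothing to do */

    /* BUG: if the value contains no ':', strchr() returns NULL and
     * the dereference below crashes. Triggering it requires the local
     * user to set a malformed variable, so there is no
     * attacker-controlled path. */
    const char *level = strchr(spec, ':');
    printf("log level: %s\n", level + 1);
    return 0;
}
```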

To differentiate a bug from a vulnerability, assess the exploitability:

- Can an attacker trigger the faulty code path remotely, or does it require local control (such as setting an environment variable)?
- Does the attacker control the data that reaches the flaw?
- Is the impact worse than crashing the attacker's own session, such as memory disclosure or code execution?

In such cases, fix the code but do not assign a CVE unless it meets the criteria for a security issue.

6. Confirming the Real Vulnerability

The fifth finding was a valid vulnerability. To confirm it, you would:

- Write a proof-of-concept input that reliably triggers the flaw
- Run it under AddressSanitizer or a debugger to confirm the memory error and identify its root cause
- Assess the impact: what can an attacker read, write, or execute?
- Develop and test a patch, then follow the project's security disclosure process

In Stenberg's case, this single vulnerability likely required a patch and perhaps a CVE. The rest were dismissed after thorough review.

Common Mistakes

When evaluating automated vulnerability reports, avoid these pitfalls:

- Accepting every finding at face value and patching code that is not actually broken
- Dismissing the whole report as noise because most findings are false positives
- Conflating a crash bug with a security vulnerability when deciding on severity or a CVE
- Failing to document why a finding is benign, forcing the next reviewer to repeat the analysis

Summary

Evaluating automated vulnerability reports requires skepticism, technical diligence, and a systematic approach. In curl's case, Mythos identified five issues, but only one turned out to be a genuine security vulnerability. Three were false positives, and one was a non-security bug. By manually reproducing each finding, tracing data flows, and assessing exploitability, developers can separate noise from actionable threats. This process not only improves security but also helps refine automated tools. Always prioritize manual review, especially for critical infrastructure like curl.
