Why Continuous Fuzzing Still Misses Critical Bugs
GStreamer was fuzzed for 7 years and still had 29 critical bugs. Poppler had 60% coverage and a 1-click RCE in a dependency. Here's why continuous fuzzing fails without human oversight — and how to fix it.
TL;DR
- Even projects fuzzed for years by OSS-Fuzz can harbor critical vulnerabilities — GStreamer had 29 new bugs after 7 years of fuzzing
- Low code coverage, uninstrumented dependencies, and focus on decoders over encoders create blind spots that fuzzers never explore
- A five-step workflow (preparation, coverage, context, value, triaging) can push fuzzing beyond basic edge coverage to find deeper bugs
- Some bug classes — like those requiring gigabyte inputs or hours to trigger — remain nearly impossible for fuzzers to catch
The Big Picture
OSS-Fuzz has been running for years. It's fuzzing over 1,300 open source projects. Thousands of bugs found. Case closed, right?
Not even close.
A GitHub Security Lab researcher spent the last year auditing popular projects enrolled in OSS-Fuzz — some for seven years straight — and found critical vulnerabilities that continuous fuzzing completely missed. We're talking about GStreamer (29 new bugs), Poppler (1-click RCE), and Exiv2 (encoding vulnerabilities that sat dormant for years).
The problem isn't that fuzzing doesn't work. It's that most teams treat it like a checkbox. Enroll in OSS-Fuzz, watch the dashboard turn green, move on. But fuzzing without human oversight is like running tests without reading the results. You're generating noise, not security.
This isn't a theoretical problem. These are libraries installed by default on millions of Ubuntu systems. GStreamer runs every time you open a video file. Poppler renders your PDFs. And they've been "protected" by continuous fuzzing for years while critical bugs survived in plain sight.
How Bugs Hide in Plain Sight
GStreamer: The Coverage Problem
GStreamer is the default multimedia framework for GNOME. It's been in OSS-Fuzz since 2017. Seven years of continuous fuzzing. And in December 2024, a researcher found 29 new vulnerabilities.
The stats tell the story. GStreamer has two active fuzzers and 19% code coverage. OpenSSL? 139 fuzzers. The bzip2 compression library? 93% coverage. GStreamer's numbers aren't just bad — they're a warning sign that nobody was watching.
This is the first failure mode: OSS-Fuzz requires human supervision. Someone needs to monitor coverage, write new fuzzers for uncovered code, and actually check if the build is working. Many projects fail at the build stage and aren't being fuzzed at all. But the maintainers don't know because they stopped checking after enrollment.
Developers aren't security experts. For them, "enrolled in OSS-Fuzz" sounds like "protected by Google." It's not. It's a tool that needs maintenance, just like everything else in your stack.
Poppler: The Dependency Problem
Poppler is Ubuntu's default PDF parser. It has 16 fuzzers and 60% code coverage — solid numbers, well above average. And yet, a 1-click RCE was found in Evince (Ubuntu's PDF viewer) that OSS-Fuzz never caught.
The bug wasn't in Poppler. It was in DjVuLibre, a dependency that handles the DjVu document format. DjVuLibre isn't included in Poppler's OSS-Fuzz build at all. But it ships by default with Evince and Papers, installed on millions of systems.
This is the second failure mode: external dependencies. Poppler relies on freetype, cairo, libpng, and others. Based on the low coverage reported for these libraries, they're not instrumented by libFuzzer. The fuzzer gets no feedback from them, so entire execution paths are never tested.
Your software is only as secure as the weakest dependency in your graph. If you're fuzzing the main library but ignoring what it calls, you're testing a fiction.
Exiv2: The Encoder Problem
Exiv2 is a C++ library for reading and writing image metadata. It's used by GIMP, LibreOffice, and others. It's been in OSS-Fuzz since 2021. And new vulnerabilities are still being reported by external researchers.
The reason? Researchers focus on decoders. Decoding is the obvious attack surface — you're parsing untrusted input. But encoding gets ignored. And encoding bugs can be just as critical when libraries are used in background workflows: thumbnail generation, file conversions, cloud processing pipelines.
This is the third failure mode: incomplete attack surface coverage. If you're only fuzzing the parts that feel dangerous, you're leaving the rest of the codebase unprotected.
The Five-Step Fuzzing Workflow
Fuzzing isn't magic. It's a process. And like any process, it needs structure. Here's the workflow that's been producing results in production environments.
Step 1: Code Preparation
Before you fuzz, you need to optimize the target code. Remove checksums that block mutation. Reduce randomness that makes coverage unstable. Drop unnecessary delays that slow execution. Handle signals properly so crashes are caught cleanly.
This isn't about changing functionality. It's about making the code fuzz-friendly so the fuzzer can explore more paths in less time.
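As a concrete example of fuzz-friendly preparation, OSS-Fuzz builds define the macro `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`, which projects use to disable checks that would otherwise reject every mutated input. Here's a minimal sketch — the parser and checksum are hypothetical, but the macro and the pattern are the standard OSS-Fuzz convention:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical packet format: 4-byte checksum, then payload. */
static uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) sum = sum * 31 + data[i];
    return sum;
}

int parse_packet(const uint8_t *data, size_t len) {
    if (len < 4) return -1;
    uint32_t expected;
    memcpy(&expected, data, 4);
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    /* Normal builds reject corrupted packets. Fuzzing builds skip the
     * check so mutated inputs reach the interesting parsing code
     * instead of dying here on every iteration. */
    if (checksum(data + 4, len - 4) != expected) return -1;
#endif
    /* ... real parsing happens here ... */
    return 0;
}
```

Without the bypass, nearly every mutation fails the checksum and the fuzzer never explores the parser itself.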
Step 2: Improving Code Coverage
This is the iterative loop: run fuzzers, check coverage, improve coverage, repeat. You're looking at LCOV reports for uncovered code areas and deciding whether to write new harnesses or create new input cases.
The target? Over 90% code coverage. Not 60%. Not 70%. Over 90%. That means fuzzing not just decoders but encoders. Not just file readers but file writers. Not just the happy path but the error-handling paths.
You'll need advanced techniques like fault injection (simulating malloc failures, interrupted reads, missing files) and snapshot fuzzing (restoring program state before each test case). These aren't optional for high coverage — they're required.
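Fault injection can be as simple as a malloc wrapper that fails on demand. A minimal sketch (the helper names are illustrative, not from any real framework): the harness reads a counter from the fuzzer's input, so the fuzzer itself decides which allocation fails and can steer execution into every out-of-memory error path.

```c
#include <stdlib.h>
#include <stddef.h>

static int fail_after  = -1;  /* -1 = never fail */
static int alloc_count = 0;

/* Called by the harness before each test case, e.g. fault_set(data[0]). */
void fault_set(int n) { fail_after = n; alloc_count = 0; }

/* The target code uses this instead of malloc (via macro or wrapper). */
void *fault_malloc(size_t size) {
    if (fail_after >= 0 && alloc_count++ == fail_after)
        return NULL;  /* simulate allocation failure */
    return malloc(size);
}
```

Interrupted reads and missing files follow the same pattern: wrap the syscall, fail on the Nth call, let coverage feedback do the rest.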
Step 3: Improving Context-Sensitive Coverage
Most fuzzers track edge coverage: transitions between basic blocks. If execution goes from block A to block B, the fuzzer records it. Simple. Efficient. But incomplete.
Edge coverage doesn't track execution order. In a plugin pipeline where each plugin modifies global state, different execution orders can produce very different bugs. But the edge coverage looks identical, so the fuzzer thinks it's already explored those paths.
Context-sensitive coverage tracks not just which edges were executed, but what code was executed before. AFL++ implements this with context-sensitive branch coverage (hashing call stack IDs with edge identifiers) and N-gram branch coverage (combining the current location with the previous N locations).
You won't hit 90% context-sensitive coverage. Anything above 60% is excellent. But that 60% will find bugs that edge coverage misses.
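The mechanics are easier to see in code. A simplified sketch of AFL-style coverage bookkeeping — in reality AFL++ emits this as compiler-inserted instrumentation, not library calls, but the map update is the same idea:

```c
#include <stdint.h>

#define MAP_SIZE 65536
uint8_t edge_map[MAP_SIZE];

static uint32_t prev_loc = 0;  /* previous block ID, shifted */
static uint32_t ctx      = 0;  /* running hash of the call stack */

/* Plain edge coverage: only the (prev, cur) pair matters, so the same
 * edge reached in different orders or from different callers looks
 * identical to the fuzzer. */
void cover_edge(uint32_t cur_loc) {
    edge_map[(cur_loc ^ prev_loc) % MAP_SIZE]++;
    prev_loc = cur_loc >> 1;
}

/* Context-sensitive variant: XOR in the call-stack hash, so the same
 * edge reached through different callers lands in a different slot
 * and counts as new coverage. */
void cover_edge_ctx(uint32_t cur_loc) {
    edge_map[(cur_loc ^ prev_loc ^ ctx) % MAP_SIZE]++;
    prev_loc = cur_loc >> 1;
}

void on_call(uint32_t callee_id)   { ctx ^= callee_id; }
void on_return(uint32_t callee_id) { ctx ^= callee_id; } /* XOR undoes itself */
```

In AFL++ this corresponds to building the target with `AFL_LLVM_INSTRUMENT=CTX` (or `NGRAM-N` for N-gram coverage) instead of the default instrumentation.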
Step 4: Improving Value Coverage
Here's a divide-by-zero bug in a web server:
```c
uint32_t size = r.content_length / (FRAME_SIZE * 2 - r.padding);
```
If `r.padding == FRAME_SIZE * 2`, the denominator is zero. Crash. The fuzzer executed this function 1,910 times and never found it.
Why? Because 100% code coverage doesn't mean you've tested all possible values. The fuzzer hit the line, but it never tried the specific value that triggers the bug.
Value coverage means guiding the fuzzer by variable value ranges, not just control-flow paths. You instrument strategic variables — those controlled by input and involved in critical operations — so different values map to different execution paths.
AFL++ CmpLog and Clang's SanitizerCoverage `trace-div` option help, but they don't give variable-level granularity. For that, you need custom instrumentation, such as an LLVM FunctionPass.
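Here's a sketch of what that instrumentation could look like for the divide-by-zero above. Everything here is illustrative — `FRAME_SIZE`'s value, the bucketing scheme, and `observe_value` itself stand in for what an LLVM pass would insert automatically. The key move: bucket the strategic variable's value so that distinct ranges count as distinct coverage, guiding the fuzzer toward `denom == 0`.

```c
#include <stdint.h>

#define FRAME_SIZE 512  /* hypothetical; any compile-time constant works */

uint8_t value_map[65536];

/* Instrumentation point: log2-style bucketing, so 0, 1, 2-3, 4-7, ...
 * each occupy their own coverage slot. New bucket = new coverage. */
static void observe_value(uint32_t site_id, uint32_t value) {
    uint32_t bucket = 0;
    while (value >>= 1) bucket++;
    value_map[(site_id * 37 + bucket) % 65536]++;
}

uint32_t frame_count(uint32_t content_length, uint32_t padding) {
    uint32_t denom = FRAME_SIZE * 2 - padding;
    observe_value(1, denom);  /* denom == 0 gets its own bucket */
    if (denom == 0) return 0; /* the guard the resulting crash would
                               * prompt you to add */
    return content_length / denom;
}
```

Once `denom == 0` maps to an unvisited slot, a coverage-guided fuzzer treats the input that reaches it as interesting and keeps it, instead of executing the line 1,910 times with boring values.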
Step 5: Triaging
Once you've got crashes, you need to triage them. Deduplicate. Prioritize. Verify exploitability. This is where human judgment matters most. Not every crash is a security issue, and not every security issue is exploitable in practice.
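Deduplication is usually done by bucketing crashes on the top frames of the stack trace, so crashes with the same proximate cause collapse into one bug. A minimal sketch — FNV-1a over the top N frame addresses is a common heuristic, not any specific tool's algorithm:

```c
#include <stdint.h>
#include <stddef.h>

/* Hash the top `top_n` frames of a crash stack. Crashes that differ
 * only deeper in the stack land in the same bucket. */
uint64_t crash_bucket(const uint64_t *frames, size_t n_frames,
                      size_t top_n) {
    uint64_t h = 1469598103934665603ULL;  /* FNV-1a offset basis */
    size_t limit = n_frames < top_n ? n_frames : top_n;
    for (size_t i = 0; i < limit; i++) {
        h ^= frames[i];
        h *= 1099511628211ULL;            /* FNV-1a prime */
    }
    return h;
}
```

Tune `top_n` to taste: too small and distinct bugs merge (everything crashing inside `memcpy` becomes one bucket), too large and one bug fans out into dozens of "unique" crashes.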
The Last Mile: Bugs Fuzzers Can't Catch
Even with all of this, some bugs survive. Two categories are especially hard.
Big Input Cases
Vulnerabilities that require megabyte or gigabyte inputs are nearly impossible to fuzz. Most fuzzers cap input size at 1 MB because larger inputs slow execution. And the input space grows exponentially with length: 256ⁿ possible inputs of n bytes. Even coverage-guided fuzzers struggle as n grows.
CVE-2022-40303 in libxml2 is an integer overflow that requires an input larger than 2 GB. No fuzzer is finding that.
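The underlying arithmetic is easy to show without a 2 GB file. A simplified sketch — not libxml2's actual code — of why these bugs only exist past the 2³¹-byte mark:

```c
#include <stdint.h>

/* Storing a length in a signed 32-bit field: fine for every input a
 * size-capped fuzzer will ever generate, broken the moment a real
 * input crosses 2^31 - 1 bytes. The out-of-range conversion wraps
 * negative on typical two's-complement platforms. */
int32_t store_length(int64_t len) {
    return (int32_t)len;
}
```

A fuzzer capped at 1 MB inputs can execute this path millions of times and never produce a `len` large enough to flip the sign, so every downstream `if (stored_len < 0)`-shaped bug stays invisible.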
Bugs That Need Time
Fuzzers execute hundreds or thousands of test cases per second. Per-execution timeouts are 1-10 milliseconds. But some bugs need seconds or hours to trigger.
A recent Poppler vulnerability is a reference-count overflow. If you increment a 32-bit counter 2³² times, it wraps to zero and triggers a use-after-free. The proof of concept took 12 hours to run. No fuzzer is waiting 12 hours per test case.
For these bugs, you need static analysis, concolic testing, or manual code review. Fuzzing won't save you.
The Bottom Line
Use continuous fuzzing if you're building security-critical software. But don't treat it as fire-and-forget. Monitor coverage. Instrument dependencies. Fuzz encoders, not just decoders. Push beyond edge coverage into context-sensitive and value coverage.
Skip it if you think enrollment in OSS-Fuzz is enough. It's not. Without human oversight, you're generating a false sense of security while critical bugs sit undetected for years.
The real risk here isn't that fuzzing doesn't work — it's that teams think it's working when it isn't. GStreamer, Poppler, and Exiv2 were all "protected" by OSS-Fuzz. They all had critical vulnerabilities that survived for years. The difference between finding those bugs and missing them isn't the fuzzer. It's whether anyone was actually watching.
Source: GitHub Blog