Ekene Ejike

39 CVEs in WebGoat. Only 36 Were Reachable.

The Problem

Our CI pipeline flagged 39 CVEs in a Spring Boot application.

Security policy was simple:

Block the release.

Engineering had a different reality:

"We can't triage 39 vulnerabilities today."

So I tried a different approach.

Instead of asking "How many vulnerabilities exist?", I wrote a reachability engine to answer a different question:

How many of those vulnerabilities can the application actually execute?

To test this, I ran the engine against OWASP WebGoat:

  • 203 Maven dependencies
  • 158,000 methods in the call graph

The first scan took 33 minutes.

After fixing a hidden graph traversal bug, the analysis finished in 29 seconds.

All code and experiments in this article are reproducible. NetShield Analyzer on GitHub.

Here is what the engine found:

Result      | Count | Detail
Reachable   | 36    | xstream deserialization entry point called
Unknown     | 1     | commons-lang3 package used
Unreachable | 2     | jose4j never invoked

Thirty-six of 39 CVEs had a static execution path from application code to the vulnerable method. Two were definitively unreachable. One required manual review.

The Question SCA Tools Don't Answer

Every AppSec team runs Software Composition Analysis. Snyk, Dependabot, OWASP Dependency-Check — they all answer the same question:

Does a vulnerable version exist in your dependency tree?

That is a valid question. But it is not the question that determines risk:

Can my application actually trigger the vulnerable code?

WebGoat imports xstream 1.4.5, which has 36 known CVEs. Those CVEs are reachable because WebGoat's lesson code directly calls the affected deserialization entry point:

// From WebGoat's VulnerableComponentsLesson
XStream xstream = new XStream();
Object obj = xstream.fromXML(payload);

If that call didn't exist, none of those 36 CVEs would have a static execution path from the application.
Meanwhile, WebGoat also has jose4j on the classpath. jose4j has 2 CVEs. But WebGoat never calls any jose4j methods. Those CVEs are unreachable — present in the dependency tree, but impossible to trigger.

A traditional SCA tool treats both equally: 39 CVEs, all blocking.

Defining Reachability

Before going further, a precise definition:

A vulnerability is marked REACHABLE if a static execution path exists from an application entry point to a method known to be affected by the CVE.

This does not guarantee exploitability. A reachable CVE means the vulnerable code can execute under some input conditions. Whether a full exploit chain exists depends on runtime factors (gadget chains, input validation, security managers) that static analysis cannot fully resolve.
What reachability does establish: if the call graph is complete and no static path exists, the vulnerable code cannot execute, regardless of input. Where the graph may be incomplete (reflection, dynamic class loading), the engine reports UNKNOWN rather than UNREACHABLE.

The Reachability Problem

Your application calls methods. Those methods call other methods. This creates a call graph, a directed graph of execution paths from your entry points through your code and into your dependencies.

WebGoatApplication.main()
      ↓
Spring Dispatcher
      ↓
VulnerableComponentsLesson.execute()
      ↓
XStream.fromXML(payload)          ← 36 CVEs reachable through this path


If no path exists from your application entry points to the vulnerable method, the vulnerability is unreachable:

WebGoatApplication.main()
      ↓
... (no path to jose4j) ...

JsonWebEncryption.decrypt()       ← jose4j CVE — never called


This is static call graph analysis. It is well studied in research, beginning to appear in commercial tools, and exactly what I implemented in NetShield Analyzer.
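To make "a static execution path exists" concrete, here is a minimal sketch that searches a toy call graph and returns the path it finds. The `CallGraph` type, the `FindPath` function, and the method IDs are illustrative assumptions for this article, not NetShield's actual data model.

```go
package main

import "fmt"

// CallGraph maps a caller method ID to the methods it invokes.
// Hypothetical type for illustration; method IDs must be non-empty.
type CallGraph map[string][]string

// FindPath returns one call path from entry to target using BFS,
// or nil if the target is not statically reachable from entry.
func FindPath(cg CallGraph, entry, target string) []string {
	parent := map[string]string{entry: ""} // "" marks the start of the path
	queue := []string{entry}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == target {
			// Walk parent links back to the entry point, prepending as we go.
			var path []string
			for n := cur; n != ""; n = parent[n] {
				path = append([]string{n}, path...)
			}
			return path
		}
		for _, next := range cg[cur] {
			if _, seen := parent[next]; !seen {
				parent[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	cg := CallGraph{
		"WebGoatApplication.main":            {"SpringDispatcher.dispatch"},
		"SpringDispatcher.dispatch":          {"VulnerableComponentsLesson.execute"},
		"VulnerableComponentsLesson.execute": {"XStream.fromXML"},
	}
	// A path exists to the xstream entry point, none to jose4j.
	fmt.Println(FindPath(cg, "WebGoatApplication.main", "XStream.fromXML"))
	fmt.Println(FindPath(cg, "WebGoatApplication.main", "JsonWebEncryption.decrypt"))
}
```

Returning the path rather than a boolean is a deliberate choice in this sketch: a reviewer can read the chain of calls and judge whether the finding is plausible.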

Architecture

The analysis pipeline has four stages: dependency resolution → bytecode parsing → call graph construction → CVE reachability analysis.

┌─────────────────┐    ┌──────────────────┐    ┌───────────────────────┐
│  Maven Project  │───▶│  Dependency Tree │───▶│  Parallel JAR         │
│    pom.xml      │    │  (203 deps)      │    │  bytecode parsing     │
└─────────────────┘    └──────────────────┘    │  (10 workers)         │
                                               └───────────┬───────────┘
                                                           │
┌─────────────────┐    ┌──────────────────┐    ┌───────────▼───────────┐
│  Risk Triage    │◀───│  OSV CVE Lookup  │◀───│  Call Graph           │
│  + Trust Score  │    │  (10 concurrent) │    │  158K nodes           │
│                 │    │                  │    │  3M edges             │
│  REACHABLE      │    └──────────────────┘    └───────────────────────┘
│  UNREACHABLE    │
│  UNKNOWN        │
└─────────────────┘


For WebGoat (203 dependencies), the engine builds a call graph with 158,410 nodes and 3,059,079 edges, then identifies 76,710 reachable methods from application entry points.

Engineering Challenges

Building a naive call graph is easy. Building one that works on real applications like WebGoat — with 203 dependencies, Spring Boot, Thymeleaf, Hibernate, and OAuth2 — is hard.

Entry Point Discovery

Frameworks hide execution paths. A Spring Boot application doesn't have a simple main → doWork flow.
The engine automatically detects:

main()                              ← Standard Java
Spring @Controller methods          ← Web MVC
Servlet doGet() / doPost()          ← Jakarta Servlet
JAX-RS @Path Resource methods       ← REST APIs
Kafka onMessage() / consume()       ← Event-driven

WebGoat has dozens of @Controller classes. Missing any of them would mean missing execution paths to vulnerable code.
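One way such detection could work is a pair of rule tables checked against each parsed method. The sketch below is an assumption about the general shape, not the engine's code: the `Method` struct, the table contents, and `IsEntryPoint` are hypothetical names. Matching on method name alone (for example `doGet`) over-approximates, which is the safe direction for a reachability tool.

```go
package main

import "fmt"

// Method is a hypothetical, simplified view of a method parsed from bytecode.
type Method struct {
	Class       string
	Name        string
	Annotations []string // annotations on the method or its declaring class
}

// Illustrative rule tables covering the entry point kinds listed above.
var entryAnnotations = map[string]bool{
	"Controller":     true, // Spring Web MVC
	"RestController": true,
	"Path":           true, // JAX-RS
}

var entryNames = map[string]bool{
	"main":      true, // standard Java
	"doGet":     true, // Jakarta Servlet
	"doPost":    true,
	"onMessage": true, // event-driven consumers
	"consume":   true,
}

// IsEntryPoint reports whether a method should seed the reachability traversal.
func IsEntryPoint(m Method) bool {
	if entryNames[m.Name] {
		return true
	}
	for _, a := range m.Annotations {
		if entryAnnotations[a] {
			return true
		}
	}
	return false
}

func main() {
	methods := []Method{
		{Class: "WebGoatApplication", Name: "main"},
		{Class: "VulnerableComponentsLesson", Name: "execute", Annotations: []string{"RestController"}},
		{Class: "StringHelper", Name: "trim"},
	}
	for _, m := range methods {
		fmt.Printf("%s.%s entry=%v\n", m.Class, m.Name, IsEntryPoint(m))
	}
}
```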

Virtual Dispatch Resolution

Java is polymorphic. When code calls a method through an interface, the actual implementation depends on the runtime type. A naive call graph sees only the interface method — a dead end.
The engine performs class hierarchy traversal. For every invokevirtual or invokeinterface instruction, it walks the known subclass tree and adds edges to all concrete implementations:

// BFS through class hierarchy to find all concrete implementations
func (b *Builder) addVirtualEdges(cg *models.CallGraph, callerID,
    targetClass, methodName, descriptor string) {

    visited := make(map[string]bool)
    queue := []string{targetClass}

    for len(queue) > 0 {
        currentClass := queue[0]
        queue = queue[1:]
        if visited[currentClass] { continue }
        visited[currentClass] = true

        if classData, ok := b.classRegistry[currentClass]; ok {
            for _, m := range classData.Methods {
                if m.Name == methodName && m.Descriptor == descriptor {
                    if currentClass != targetClass {
                        virtualID := models.GetMethodID(
                            currentClass, methodName, descriptor)
                        cg.AddEdge(callerID, virtualID,
                            models.CallTypeVirtual)
                    }
                    break
                }
            }
        }

        if subclasses, exists := b.classHierarchy[currentClass]; exists {
            queue = append(queue, subclasses...)
        }
    }
}

Without this, any application using interfaces — which is every real Java application — would have incomplete call graphs.

Reflection and Lambdas

Reflection (Class.forName(), Method.invoke()) and invokedynamic (lambdas, method references) are patterns that defeat naive static analysis. The engine tags these cases with IsReflective, IsLambda, and IsDynamic flags. When reachability depends on a reflective path, the result is UNKNOWN rather than a false negative.
A tool that never says "I don't know" is a tool you cannot trust.
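The three-valued verdict can be sketched as a small classification function. This is a simplified illustration under my own assumptions: the `Evidence` struct and its fields are hypothetical and stand in for whatever the engine records internally (the article only names the IsReflective, IsLambda, and IsDynamic flags).

```go
package main

import "fmt"

type Verdict string

const (
	Reachable   Verdict = "REACHABLE"
	Unreachable Verdict = "UNREACHABLE"
	Unknown     Verdict = "UNKNOWN"
)

// Evidence is a hypothetical summary of what the analysis found for one CVE.
type Evidence struct {
	DirectPath     bool // a path of ordinary call edges reaches the vulnerable method
	ReflectivePath bool // the only path found crosses a reflective or dynamic edge
	PackageUsed    bool // the package is referenced, but the vulnerable method is unconfirmed
}

// Classify never returns Unreachable while any uncertainty remains.
func Classify(e Evidence) Verdict {
	switch {
	case e.DirectPath:
		return Reachable
	case e.ReflectivePath, e.PackageUsed:
		return Unknown
	default:
		return Unreachable
	}
}

func main() {
	fmt.Println(Classify(Evidence{DirectPath: true}))  // e.g. the xstream findings
	fmt.Println(Classify(Evidence{PackageUsed: true})) // e.g. the commons-lang3 finding
	fmt.Println(Classify(Evidence{}))                  // e.g. the jose4j findings
}
```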

The Accuracy Problem (and How I Fixed It)

The first version of the engine had a critical blind spot. When I ran it against WebGoat, all 36 xstream CVEs came back as UNREACHABLE. The verdict was "SAFE TO SHIP" — dangerously wrong for an application that directly calls xstream.fromXML().

Two bugs:

Bug 1: Package Name Matching

Maven coordinates don't always map cleanly to Java package paths. The engine was constructing package names like com/thoughtworks/xstream/xstream — but the actual classes live in com/thoughtworks/xstream/XStream.class.
The fix: generate multiple candidate package prefixes from Maven coordinates:

func buildPackageCandidates(groupID, artifactID string) []string {
    // Pattern 1: groupId/artifactId
    //   org/yaml/snakeyaml
    // Pattern 2: groupId only (when artifactId duplicates last segment)
    //   com/thoughtworks/xstream
    // Pattern 3: Strip common prefixes (jackson-, commons-, spring-)
    //   com/fasterxml/jackson/databind
    // Pattern 4: Hyphen-split last segment
    //   org/apache/commons/lang3
}
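
The function above is shown as an outline only. A possible implementation of the four patterns might look like the sketch below. The rules are my reading of the comments, not the engine's actual logic; in particular, trying the parent of the groupId in Pattern 3 is an assumption added so that a groupId such as com.fasterxml.jackson.core still yields com/fasterxml/jackson/databind.

```go
package main

import (
	"fmt"
	"strings"
)

// artifactId prefixes that usually do not appear in the Java package path.
var commonPrefixes = []string{"jackson-", "commons-", "spring-"}

// BuildPackageCandidates returns de-duplicated package path prefixes to try.
func BuildPackageCandidates(groupID, artifactID string) []string {
	base := strings.ReplaceAll(groupID, ".", "/")
	seen := make(map[string]bool)
	var out []string
	add := func(p string) {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}

	// Pattern 1: groupId/artifactId (skipped when the artifactId has hyphens).
	if !strings.Contains(artifactID, "-") {
		add(base + "/" + artifactID)
	}
	// Pattern 2: groupId only, when artifactId duplicates its last segment.
	if base == artifactID || strings.HasSuffix(base, "/"+artifactID) {
		add(base)
	}
	// Pattern 3: strip a common prefix; also try the groupId's parent (assumption).
	for _, prefix := range commonPrefixes {
		if !strings.HasPrefix(artifactID, prefix) {
			continue
		}
		rest := strings.ReplaceAll(strings.TrimPrefix(artifactID, prefix), "-", "/")
		add(base + "/" + rest)
		if i := strings.LastIndex(base, "/"); i >= 0 {
			add(base[:i] + "/" + rest)
		}
	}
	// Pattern 4: last hyphen-separated segment of the artifactId.
	if i := strings.LastIndex(artifactID, "-"); i >= 0 {
		add(base + "/" + artifactID[i+1:])
	}
	return out
}

func main() {
	fmt.Println(BuildPackageCandidates("com.thoughtworks.xstream", "xstream"))
	fmt.Println(BuildPackageCandidates("org.apache.commons", "commons-lang3"))
	fmt.Println(BuildPackageCandidates("com.fasterxml.jackson.core", "jackson-databind"))
}
```

Generating several candidates and matching any of them trades a little precision for recall, which suits a tool where a missed package means a missed vulnerability.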
Bug 2: CVE Method Extraction

The OSV vulnerability database doesn't provide structured method-level data. The engine needs to infer which methods are vulnerable from CVE descriptions. The original heuristic only checked for "jndi" and "deserialize" — far too narrow.
The fix: a multi-layered heuristic that extracts affected methods from CVE descriptions and known vulnerability patterns:

// Keyword-based: CVE mentions "arbitrary code execution" → map to fromXML, readObject
// Package-based: Package is "xstream" → map to fromXML, unmarshal
// DatabaseSpecific: OSV provides affected_classes → use directly
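
The layering described in those comments could be sketched as follows. The rule tables are illustrative placeholders (the "jndi" and "deserializ" mappings in particular are my assumptions), and `InferVulnerableMethods` is a hypothetical name rather than a function from the NetShield codebase.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Illustrative rule tables; the real mappings may differ.
var keywordMethods = map[string][]string{
	"arbitrary code execution": {"fromXML", "readObject"},
	"deserializ":               {"readObject"}, // matches deserialize, deserialization
	"jndi":                     {"lookup"},
}

var packageMethods = map[string][]string{
	"xstream": {"fromXML", "unmarshal"},
}

// InferVulnerableMethods layers three sources, most specific first:
// structured data from the advisory, then description keywords, then package rules.
func InferVulnerableMethods(description, pkg string, structured []string) []string {
	if len(structured) > 0 {
		return structured // structured data from the database is used directly
	}
	set := make(map[string]bool)
	desc := strings.ToLower(description)
	for keyword, methods := range keywordMethods {
		if strings.Contains(desc, keyword) {
			for _, m := range methods {
				set[m] = true
			}
		}
	}
	for _, m := range packageMethods[strings.ToLower(pkg)] {
		set[m] = true
	}
	out := make([]string, 0, len(set))
	for m := range set {
		out = append(out, m)
	}
	sort.Strings(out) // deterministic output regardless of map iteration order
	return out
}

func main() {
	fmt.Println(InferVulnerableMethods(
		"Allows arbitrary code execution via crafted XML", "xstream", nil))
}
```

An empty result is meaningful here: it signals that no method could be inferred, which is the situation where an UNKNOWN verdict is more honest than UNREACHABLE.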

After both fixes, the same WebGoat scan correctly identified all 36 xstream CVEs as REACHABLE. The verdict flipped from "SAFE TO SHIP" to "DO NOT SHIP" — the correct answer for an application calling xstream.fromXML() on an ancient version with 36 known deserialization vulnerabilities.

The Performance Problem (33 Minutes → 29 Seconds)

The first working scan of WebGoat took 33 minutes. For a CI/CD security gate, that's unacceptable.

Bottleneck 1: Sequential JAR Parsing

203 dependency JARs were parsed one at a time. Each JAR is independent — there's no reason they can't be parsed in parallel.
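Fix: parse JARs concurrently, with a semaphore capping in-flight parses at 10 and a buffered channel collecting results:

const maxWorkers = 10

var wg sync.WaitGroup
// Counting semaphore: at most maxWorkers JARs are parsed at once.
sem := make(chan struct{}, maxWorkers)
// Buffered to len(dependencies) so no goroutine blocks on send.
resultsCh := make(chan jarResult, len(dependencies))

for _, dep := range dependencies {
    wg.Add(1)
    go func(jarPath string) {
        defer wg.Done()
        sem <- struct{}{}        // acquire a slot
        defer func() { <-sem }() // release it when parsing finishes

        analyzer := bytecode.NewJARAnalyzer(jarPath)
        classes, err := analyzer.AnalyzeJAR()
        resultsCh <- jarResult{classes: classes, err: err, jarPath: jarPath}
    }(dep.JARPath)
}

wg.Wait()        // every JAR has been parsed
close(resultsCh) // lets a range loop over resultsCh terminate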

Bottleneck 2: O(V×E) DFS

This was the biggest hidden bottleneck. The DFS reachability traversal did this:

// BEFORE: O(V×E) — scans ALL edges for every visited node
for _, edge := range cg.Edges {
    if edge.From == methodID {
        dfsReachability(cg, edge.To, reachable, visited)
    }
}

With roughly 76,000 visited nodes and 3 million edges, that is about 76,000 × 3,000,000 ≈ 228 billion edge comparisons: the DFS rescanned the entire edge list for every visited node.
Fix: add an adjacency list to the call graph and use iterative stack-based DFS:

// AFTER: O(V+E) — follow only outgoing edges via adjacency list
func (cg *CallGraph) AddEdge(from, to string, callType CallType) {
    cg.Edges = append(cg.Edges, &CallEdge{From: from, To: to, Type: callType})
    cg.AdjList[from] = append(cg.AdjList[from], to)  // ← one extra line
}

// DFS now uses the adjacency list
for _, neighbor := range cg.AdjList[current] {
    if !visited[neighbor] {
        stack = append(stack, neighbor)
    }
}

From 228 billion iterations to 3 million. Same results. Same accuracy. Orders of magnitude fewer operations.
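
For readers who want to run the traversal end to end, here is a self-contained sketch of an iterative, stack-based DFS over an adjacency list. It is a simplified stand-in, not the engine's code: the `CallGraph` here keeps only the adjacency list, and `NewCallGraph` and `Reachable` are names chosen for this example.

```go
package main

import "fmt"

// CallGraph is a simplified stand-in that stores only outgoing edges.
type CallGraph struct {
	AdjList map[string][]string
}

func NewCallGraph() *CallGraph {
	return &CallGraph{AdjList: make(map[string][]string)}
}

func (cg *CallGraph) AddEdge(from, to string) {
	cg.AdjList[from] = append(cg.AdjList[from], to)
}

// Reachable returns every method reachable from the given entry points.
// Each node is expanded once and each edge followed once: O(V+E).
func (cg *CallGraph) Reachable(entryPoints []string) map[string]bool {
	visited := make(map[string]bool)
	stack := append([]string(nil), entryPoints...) // copy so the caller's slice is untouched
	for len(stack) > 0 {
		current := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if visited[current] {
			continue // already expanded via another path
		}
		visited[current] = true
		for _, neighbor := range cg.AdjList[current] {
			if !visited[neighbor] {
				stack = append(stack, neighbor)
			}
		}
	}
	return visited
}

func main() {
	cg := NewCallGraph()
	cg.AddEdge("main", "dispatch")
	cg.AddEdge("dispatch", "execute")
	cg.AddEdge("execute", "fromXML")
	cg.AddEdge("execute", "dispatch") // a cycle does not cause an infinite loop
	cg.AddEdge("orphan", "decrypt")   // never reached from main

	reachable := cg.Reachable([]string{"main"})
	fmt.Println(len(reachable), reachable["fromXML"], reachable["decrypt"])
}
```

The explicit stack also avoids deep recursion, which matters when call chains in a 158K-node graph get long.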

Bottleneck 3: Linear Triage Lookups

The triage step iterated all 76K reachable methods for every vulnerability check.
Fix: pre-index reachable methods by class name at construction time. Each lookup drops from a scan of all ~76K reachable methods to one map lookup plus a scan of only the methods in the queried class.
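
A minimal sketch of such an index, assuming method IDs of the form class.method(descriptor) with slash-separated JVM class names (so the first dot separates class from method). The ID format and the names `ReachableIndex`, `BuildIndex`, and `HasMethod` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ReachableIndex groups reachable method IDs by their declaring class.
type ReachableIndex map[string][]string

// BuildIndex is run once, after the reachability traversal.
func BuildIndex(reachable []string) ReachableIndex {
	idx := make(ReachableIndex)
	for _, id := range reachable {
		class := id
		if i := strings.Index(id, "."); i >= 0 {
			class = id[:i]
		}
		idx[class] = append(idx[class], id)
	}
	return idx
}

// HasMethod scans only the methods of one class instead of every reachable method.
func (idx ReachableIndex) HasMethod(class, method string) bool {
	prefix := class + "." + method + "("
	for _, id := range idx[class] {
		if strings.HasPrefix(id, prefix) {
			return true
		}
	}
	return false
}

func main() {
	idx := BuildIndex([]string{
		"com/thoughtworks/xstream/XStream.fromXML(Ljava/lang/String;)Ljava/lang/Object;",
		"com/thoughtworks/xstream/XStream.toXML(Ljava/lang/Object;)Ljava/lang/String;",
		"org/owasp/webgoat/Lesson.execute()V",
	})
	fmt.Println(idx.HasMethod("com/thoughtworks/xstream/XStream", "fromXML"))
	fmt.Println(idx.HasMethod("org/jose4j/jwe/JsonWebEncryption", "decrypt"))
}
```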

Benchmark

Phase                | Before  | After
Total runtime        | 33 min  | 29 sec
Call graph traversal | ~15 min | < 1 sec
JAR parsing          | ~12 min | ~2 min
Reachability triage  | ~6 min  | ~27 sec

None of these optimizations trade accuracy for speed. The adjacency list produces identical results to scanning all edges. Parallel parsing produces identical results to sequential parsing. The pre-index is the same data in a faster structure.

WebGoat: Full Results

Here is the scan output:
[Image: WebGoat scan output]

Cross-Referencing with Snyk

To validate coverage, I ran the same project through Snyk, one of the most widely used SCA platforms:

$ snyk test --file=pom.xml
Tested 163 dependencies for known issues, found 48 issues, 48 vulnerable paths.

Snyk found 48 CVEs and 48 vulnerable paths. NetShield found 39. The 9 additional CVEs came from packages that the OSV database does not yet cover:

Package                   | Snyk | NetShield (OSV) | Gap
xstream@1.4.5             | 36   | 36              | 0
commons-lang3@3.14.0      | 1    | 1               | 0
jose4j@0.9.3              | 2    | 2               | 0
logback-core@1.5.18       | 2    | 0               | +2
tomcat-embed-core@10.1.46 | 4    | 0               | +4
jackson-core@2.19.2       | 1    | 0               | +1
jruby-stdlib@10.0.0.1     | 2    | 0               | +2
Total                     | 48   | 39              | +9

This is not a reachability gap — it is a database coverage gap. NetShield queries osv.dev, an open vulnerability database. Snyk uses a proprietary database with a dedicated research team that catalogs vulnerabilities faster, especially newly published ones (several of the missing CVEs were marked "new" in Snyk's output).
NetShield correctly classified every CVE it received from OSV. It simply never received the 9 additional CVEs because OSV doesn't have them yet. A future version could integrate additional sources to close this gap.

What This Comparison Shows

Metric                 | Snyk (SCA only)       | NetShield (SCA + Reachability)
CVE database           | Proprietary (broader) | OSV.dev (open, narrower)
CVEs reported          | 48                    | 39
Reachable identified   | Not assessed          | 36
Unreachable identified | Not assessed          | 2
Unknown                | Not assessed          | 1
Manual triage required | 48                    | 1

The tools answer different questions.

Snyk answers: what vulnerabilities exist?
NetShield answers: which ones can your code trigger?

They are complementary, not competitive.

Accuracy Considerations

Every static analysis tool must address false positives and false negatives.
False positives (marked reachable but not exploitable) may occur when:

  • Reflection resolves differently at runtime than predicted statically
  • Dependency shading changes package names post-build
  • A method is statically reachable but guarded by runtime checks (e.g., input validation, security managers)

False negatives (marked unreachable but actually exploitable) may occur when:

  • Dynamic class loading introduces new call paths at runtime
  • Runtime proxies (CGLIB, Byte Buddy) generate methods not visible in bytecode
  • Native JNI code creates paths invisible to JVM bytecode analysis

The engine mitigates false negatives by reporting UNKNOWN when analysis is incomplete, rather than assuming safety. The commons-lang3 finding in WebGoat is an example: the package is referenced, but the engine could not confirm or deny the specific vulnerable method, so it flagged it for human review.

CVE database coverage is another source of gaps. As the Snyk cross-reference showed, the OSV database does not contain every known advisory. NetShield can only assess reachability for CVEs it knows about. Integrating additional vulnerability sources would reduce this blind spot.

Methodology

For reproducibility, here is how the experiment was conducted:

Project analyzed:    OWASP WebGoat (v2023.8)
Language:            Java 17
Build system:        Maven
Dependencies:        203 (transitive)

Analysis environment:
  CPU:               12-core
  RAM:               16 GB
  OS:                Linux (Kali)

NetShield configuration:
  Entry points:      main + Spring MVC controllers (auto-detected)
  Call graph:        Static bytecode analysis
  Reflection:        Tagged as UNKNOWN
  CVE source:        OSV.dev database
  Analysis date:     March 2026

CI/CD Integration

The engine is designed for CI/CD pipelines. Exit codes:

0 = No reachable vulnerabilities   → Pass the build
1 = Reachable vulnerability found  → Fail the build
2 = Analysis failure               → Fail the build
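
The mapping from scan outcome to exit code can be sketched in a few lines. The function and constant names are hypothetical. How UNKNOWN findings affect the exit code is not stated in the table above, so this sketch leaves them out rather than guess.

```go
package main

import "fmt"

const (
	ExitPass      = 0 // no reachable vulnerabilities
	ExitReachable = 1 // at least one reachable vulnerability
	ExitFailure   = 2 // the analysis itself failed
)

// ExitCode maps a scan outcome to the exit codes listed above.
// An analysis failure takes priority: a broken scan must never look like a pass.
func ExitCode(reachableCount int, analysisFailed bool) int {
	switch {
	case analysisFailed:
		return ExitFailure
	case reachableCount > 0:
		return ExitReachable
	default:
		return ExitPass
	}
}

func main() {
	// In a real CLI the result would be handed to os.Exit.
	fmt.Println("WebGoat scan:", ExitCode(36, false))
	fmt.Println("clean scan:", ExitCode(0, false))
	fmt.Println("failed scan:", ExitCode(0, true))
}
```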

Output control flags for clean CI logs:

# Minimal CI output: no progress, no CVE details
netshield-analyzer --packages com.yourcompany --quiet --summary-only

# JSON for programmatic consumption
netshield-analyzer --packages com.yourcompany --format json --quiet

What the Engine Cannot Do

No static analysis tool is perfect. The engine reports UNKNOWN when it encounters:

  • Dynamic class loading via external configuration
  • Heavy reflection where targets are runtime-computed strings
  • Native JNI code
  • Runtime code generation (CGLIB proxies)

This happened with the commons-lang3 finding in WebGoat. The package is used, but the engine couldn't confirm whether the specific vulnerable method is called. That gets flagged for human review — not silently ignored, not falsely marked safe.
Security engineers respect tools that admit limits.

Related Work

Several commercial and open-source tools are beginning to incorporate reachability analysis:

  • Snyk Reachability — proprietary call graph analysis integrated into the Snyk platform
  • Endor Labs — function-level reachability with support for multiple languages
  • Chainguard — supply chain security with reachability-aware scanning

NetShield focuses on transparent call graph analysis with CI-native output, open-source implementation, and explicit methodology. The engine's approach is intentionally auditable — every edge in the call graph can be inspected, and the DFS traversal is deterministic.

Conclusion

Traditional SCA answers a dependency question:

Does a vulnerable version exist?

Reachability analysis answers the execution question that actually determines risk:

Can the application run the vulnerable code?

In the WebGoat experiment:

  • 39 CVEs existed in the dependency tree
  • 36 CVEs were reachable from application entry points
  • 2 CVEs were definitively unreachable
  • 1 CVE required manual review

The difference between those numbers is the difference between alert fatigue and actionable security.


NetShield Analyzer source code is available on GitHub.
