Your AI agent just confidently told you which methods call `AbClient.getOption()`. It listed six call sites. The actual number is nineteen.
The other thirteen are there — just not visible from source code. Some constants are defined in separate modules and passed across class boundaries. Some calls go through Kotlin inline functions that got expanded by the compiler. Some are hidden behind synthetic bridge methods generated for lambda captures.
The agent read the source. The source lied.
## The Wrong Layer
Most code intelligence tools — GitNexus, code-review-graph, and the rest — are built on Tree-sitter. Tree-sitter is excellent at what it does: it parses syntax, fast, incrementally, with error tolerance. It's why your editor highlights code correctly while you're still typing.
But syntax is the wrong layer for understanding what code does.
Tree-sitter sees one file at a time, with no type resolution and no cross-file dataflow. Feed it a Spring Boot monolith and ask "what calls this method?" — it will search for matching identifiers. That's grep with an AST. It works until it doesn't, and in any real JVM project, it stops working constantly:
- **Spring annotation inheritance.** A `@RequestMapping("/api/orders")` on an abstract controller doesn't appear on the concrete subclass. Tree-sitter reads the subclass, finds no annotation, and either misses the endpoint or guesses wrong.
- **Kotlin inline functions.** The compiler erases them. A call to `inline fun <T> measure(block: () -> T)` disappears from bytecode; the body gets inlined at every call site. Tree-sitter shows you a call to `measure()`. The actual execution path has no `measure()` in it.
- **Cross-file constants.** `abClient.getOption(AbTestIds.CHECKOUT_V2)` — where does `CHECKOUT_V2` resolve to? Tree-sitter sees an identifier. The integer value it carries is in another file, possibly in another module. The chain breaks.
- **Synthetic methods.** Kotlin generates bridge methods for private field access, companion object delegation, and lambda captures. None of these exist in source. All of them can be part of a real call chain.
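The synthetic-method problem is easy to see with plain reflection. Here is a minimal Java sketch (class names are illustrative, not from any real project): implementing a generic interface with a concrete type argument makes the compiler emit a bridge method that appears nowhere in source, yet is a real node in the call graph.

```java
import java.lang.reflect.Method;

public class BridgeDemo {
    interface Supplier<T> { T get(); }

    // Implementing Supplier<String> makes the compiler emit a synthetic
    // bridge `Object get()` that delegates to `String get()`; the bridge
    // exists only in bytecode, never in source.
    static class Hello implements Supplier<String> {
        @Override public String get() { return "hello"; }
    }

    public static void main(String[] args) {
        for (Method m : Hello.class.getDeclaredMethods()) {
            if (m.isBridge()) {
                // prints: get -> Object
                System.out.println(m.getName() + " -> " + m.getReturnType().getSimpleName());
            }
        }
    }
}
```

A source-level parser never sees this `get()` returning `Object`; a bytecode-level graph gets it for free.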
For an LLM, these gaps are not minor imprecision. They produce wrong blast radius estimates, missed cleanup targets, and incorrect dependency maps. An agent acting on a broken call graph makes broken decisions.
## The Right Layer
When the Kotlin compiler is done, the result is bytecode. At that point:
- Every type has been resolved. `var x = foo()` becomes `INVOKEVIRTUAL com/example/Foo.bar ()Ljava/lang/String;` — no ambiguity.
- Every inline function has been expanded. The call graph reflects what the JVM will actually execute.
- Every synthetic method exists as a real node. Lambda captures, bridge methods, companion delegations — all visible.
- Every annotation is queryable data, including inherited ones. Walking the class hierarchy to find `@RequestMapping` is a graph traversal, not a grep.
- Constant values are resolved across class boundaries. `AbTestIds.CHECKOUT_V2 = 1042` — the integer is right there in the constant pool.
This is what Graphite builds on. It takes compiled bytecode — your JAR, your Spring Boot fat JAR, your WAR file — and constructs a program graph. Nodes are program elements: methods, fields, constants, call sites. Edges are relationships: dataflow, calls, type hierarchy, annotations.
Then it exposes that graph through Cypher — the same query language used by Neo4j — so you can ask structured questions and get structured answers.
## What This Looks Like in Practice
Finding all AB test IDs passed to a specific SDK method:
```cypher
MATCH (c:IntConstant)-[:DATAFLOW*]->(cs:CallSiteNode)
WHERE cs.callee_class = 'com.example.ab.AbClient'
  AND cs.callee_name = 'getOption'
RETURN c.value, cs.caller_class, cs.caller_name
```
You get back 19 constants, not 6. Including the ones passed through local variables, the ones defined in a constants object in a different module, and the ones flowing through conditional branches.
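The `DATAFLOW*` traversal is essentially a backward slice: start at the call-site argument and walk dataflow edges in reverse until you reach constants. A toy Java sketch of the idea (the node names and graph here are made up for illustration; Graphite builds the real graph from bytecode):

```java
import java.util.*;

public class BackwardSlice {
    // Toy dataflow graph: each edge points from a value's definition to a use.
    // Note the constant that reaches the call only through a local variable.
    static Map<String, List<String>> defToUse = Map.of(
        "const:1042", List.of("local:x"),
        "local:x",    List.of("call:getOption#arg0"),
        "const:7",    List.of("call:getOption#arg0"),
        "const:99",   List.of("call:unrelated#arg0")
    );

    // Walk dataflow edges backwards from a sink to collect every
    // constant that can reach it, direct or via intermediate values.
    static Set<String> constantsReaching(String sink) {
        Map<String, List<String>> useToDef = new HashMap<>();
        defToUse.forEach((def, uses) ->
            uses.forEach(u -> useToDef.computeIfAbsent(u, k -> new ArrayList<>()).add(def)));
        Set<String> seen = new HashSet<>();
        Set<String> constants = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(sink));
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!seen.add(n)) continue;
            if (n.startsWith("const:")) constants.add(n);
            work.addAll(useToDef.getOrDefault(n, List.of()));
        }
        return constants;
    }

    public static void main(String[] args) {
        // prints: [const:1042, const:7]
        System.out.println(constantsReaching("call:getOption#arg0"));
    }
}
```

A text search for `getOption` would find the literal `7`, but never the `1042` hiding behind `local:x`. That is the gap the dataflow edges close.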
Mapping every REST endpoint in a Spring Boot application — including those defined on abstract controllers:
```sh
graphite query /data/app-graph \
  "MATCH (n:MethodNode)-[:HAS_ANNOTATION]->(a:AnnotationNode)
   WHERE a.type =~ '.*Mapping'
   RETURN n.declaring_class, a.type, a.value
   ORDER BY a.value"
```
The type hierarchy traversal is built into the graph. Inherited annotations show up automatically.
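What that traversal does for inherited annotations can be mimicked with plain reflection. A minimal sketch, using a hypothetical `@Mapping` annotation as a stand-in for Spring's `@RequestMapping` (all names here are illustrative):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationWalk {
    // Hypothetical stand-in for Spring's @RequestMapping.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Mapping { String value(); }

    static abstract class BaseController {
        @Mapping("/api/orders") abstract String list();
    }

    // The concrete override carries no annotation in source.
    static class OrderController extends BaseController {
        @Override String list() { return "orders"; }
    }

    // Walk up the class hierarchy until a declaration of the method
    // carrying the annotation is found.
    static String findMapping(Class<?> c, String method) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            try {
                Mapping m = k.getDeclaredMethod(method).getAnnotation(Mapping.class);
                if (m != null) return m.value();
            } catch (NoSuchMethodException ignored) { }
        }
        return null;
    }

    public static void main(String[] args) {
        // prints: /api/orders
        System.out.println(findMapping(OrderController.class, "list"));
    }
}
```

A per-file parser looking only at `OrderController` finds nothing; the hierarchy walk recovers the mapping from the abstract parent.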
## The Token Argument
Beyond correctness, there's efficiency. When an LLM agent tries to answer "what calls this method?" by reading source, it needs to scan every file that might contain a call site. For a monolith with 500 service classes, that's easily 2 million tokens — to answer a question that Graphite resolves in a single query returning a few hundred bytes.
| Task | Raw source | Graphite | Reduction |
|---|---|---|---|
| Find all AB test IDs | ~500 files, 2M tokens | `callSites` + `backwardSlice` → 23 results | 99.99% |
| Map REST endpoints | ~200 controllers, 800K tokens | `memberAnnotations` scan → structured list | 99.9% |
| Resolve type hierarchy | ~100 files per type chain | `supertypes` / `subtypes` → direct answer | 99% |
The agent doesn't need to read source to answer structural questions. It queries the graph. The graph answers in milliseconds.
## Getting Started
```sh
# Install
brew tap johnsonlee/tap
brew install graphite graphite-explore

# Build a graph from your JAR
graphite build app.jar -o /data/app-graph --include com.example

# Query
graphite query /data/app-graph "MATCH (n:CallSiteNode) RETURN n LIMIT 10"

# Visualize
graphite-explore /data/app-graph --port 8080
```
Or use the Kotlin API directly:
```kotlin
val graph = JavaProjectLoader(LoaderConfig(
    includePackages = listOf("com.example")
)).load(Path.of("app.jar"))

val results = Graphite.from(graph).query {
    findArgumentConstants {
        method {
            declaringClass = "com.example.ab.AbClient"
            name = "getOption"
        }
        argumentIndex = 0
    }
}
```
Graphite is open source under Apache 2.0: github.com/johnsonlee/graphite
The problem with source-based code intelligence isn't that the tools are bad. It's that source code is a representation designed for humans to write and read — not for programs to reason about. Bytecode is a representation designed for execution. If you want to understand what code does, start where the compiler finished.