<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Johnson</title>
    <description>The latest articles on DEV Community by Johnson (@johnsonlee).</description>
    <link>https://dev.to/johnsonlee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872684%2Fb58e0504-c401-4126-8fa2-60228011303d.jpeg</url>
      <title>DEV Community: Johnson</title>
      <link>https://dev.to/johnsonlee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/johnsonlee"/>
    <language>en</language>
    <item>
      <title>Why AI Agents Get JVM Codebases Wrong — and How Bytecode Changes That</title>
      <dc:creator>Johnson</dc:creator>
      <pubDate>Sat, 11 Apr 2026 11:48:18 +0000</pubDate>
      <link>https://dev.to/johnsonlee/why-ai-agents-get-jvm-codebases-wrong-and-how-bytecode-changes-that-1hc6</link>
      <guid>https://dev.to/johnsonlee/why-ai-agents-get-jvm-codebases-wrong-and-how-bytecode-changes-that-1hc6</guid>
      <description>&lt;p&gt;Your AI agent just confidently told you which methods call &lt;code&gt;AbClient.getOption()&lt;/code&gt;. It listed six call sites. The actual number is nineteen.&lt;/p&gt;

&lt;p&gt;The other thirteen exist, but you can't see them by reading source. Some pass constants that are defined in separate modules and carried across class boundaries. Some go through Kotlin inline functions that the compiler expanded at the call site. Some sit behind synthetic methods that the compiler generated for lambda captures.&lt;/p&gt;

&lt;p&gt;The agent read the source. The source lied.&lt;/p&gt;

&lt;h2&gt;The Wrong Layer&lt;/h2&gt;

&lt;p&gt;Most code intelligence tools — GitNexus, code-review-graph, and the rest — are built on &lt;a href="https://tree-sitter.github.io/" rel="noopener noreferrer"&gt;Tree-sitter&lt;/a&gt;. Tree-sitter is excellent at what it does: it parses syntax, fast, incrementally, with error tolerance. It's why your editor highlights code correctly while you're still typing.&lt;/p&gt;

&lt;p&gt;But syntax is the wrong layer for understanding what code &lt;em&gt;does&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Tree-sitter sees one file at a time, with no type resolution and no cross-file dataflow. Feed it a Spring Boot monolith and ask "what calls this method?" — it will search for matching identifiers. That's grep with an AST. It works until it doesn't, and in any real JVM project, it stops working constantly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spring annotation inheritance.&lt;/strong&gt; A &lt;code&gt;@RequestMapping("/api/orders")&lt;/code&gt; on an abstract controller is not repeated in the source of the concrete subclass. A tool that reads only the subclass finds no annotation, and either misses the endpoint or guesses wrong.&lt;/p&gt;
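
&lt;p&gt;Here is a minimal sketch of that gap in plain Java, with no Spring involved. The &lt;code&gt;Route&lt;/code&gt; annotation and both controller classes are made up for illustration. The point is only that a single-class view finds nothing, while a walk up the superclass chain finds the mapping:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class InheritedRouteDemo {
    // Hypothetical stand-in for a mapping annotation; not marked @Inherited.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Route { String value(); }

    @Route("/api/orders")
    abstract static class BaseOrderController {}

    // The concrete controller declares no annotation of its own.
    static class OrderController extends BaseOrderController {}

    // Single-class view: only what is declared on the concrete class.
    static String declaredRoute(Object controller) {
        Route route = controller.getClass().getDeclaredAnnotation(Route.class);
        return route == null ? null : route.value();
    }

    // Hierarchy view: walk the superclass chain until a route is found.
    static String resolvedRoute(Object controller) {
        var type = controller.getClass();
        while (type != null) {
            Route route = type.getDeclaredAnnotation(Route.class);
            if (route != null) return route.value();
            type = type.getSuperclass();
        }
        return null;
    }

    public static void main(String[] args) {
        Object controller = new OrderController();
        System.out.println(declaredRoute(controller)); // null
        System.out.println(resolvedRoute(controller)); // /api/orders
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second lookup needs resolved types and a class hierarchy, which is exactly what a syntax tree of one file does not carry.&lt;/p&gt;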

&lt;p&gt;&lt;strong&gt;Kotlin inline functions.&lt;/strong&gt; The compiler expands them at every call site. A call to &lt;code&gt;inline fun &amp;lt;T&amp;gt; measure(block: () -&amp;gt; T)&lt;/code&gt; leaves no invocation in the caller's bytecode, because the body is copied into each call site. Tree-sitter shows you a call to &lt;code&gt;measure()&lt;/code&gt;. The actual execution path has no &lt;code&gt;measure()&lt;/code&gt; in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-file constants.&lt;/strong&gt; &lt;code&gt;abClient.getOption(AbTestIds.CHECKOUT_V2)&lt;/code&gt; — where does &lt;code&gt;CHECKOUT_V2&lt;/code&gt; resolve to? Tree-sitter sees an identifier. The integer value it carries is in another file, possibly in another module. The chain breaks.&lt;/p&gt;
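
&lt;p&gt;At the bytecode level the picture is different, assuming the ID is a compile-time constant such as a Java &lt;code&gt;static final int&lt;/code&gt; or a Kotlin &lt;code&gt;const val&lt;/code&gt;. A rough Java sketch, with illustrative class names: the use site carries the literal value, so reading the constant never even initializes the class that declares it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class ConstantInliningDemo {
    // Flipped by the static initializer of AbTestIds, if it ever runs.
    static boolean idsClassInitialized = false;

    static class AbTestIds {
        static final int CHECKOUT_V2 = 1042; // compile-time constant
        static { idsClassInitialized = true; }
    }

    static int readConstant() {
        // The compiler copies 1042 into this method; no field access remains.
        return AbTestIds.CHECKOUT_V2;
    }

    public static void main(String[] args) {
        System.out.println(readConstant());      // 1042
        System.out.println(idsClassInitialized); // false
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The identifier is a source-level convenience. What reaches the call site is the integer.&lt;/p&gt;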

&lt;p&gt;&lt;strong&gt;Synthetic methods.&lt;/strong&gt; Kotlin generates synthetic methods, such as accessors and bridges, for private member access, companion object delegation, and lambda captures. None of these exist in source. All of them can be part of a real call chain.&lt;/p&gt;
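
&lt;p&gt;Java has the same phenomenon, and it is easy to observe. A minimal sketch using javac rather than the Kotlin compiler, with made-up class names: narrowing a return type in an override makes the compiler emit a bridge method that appears nowhere in the source.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.lang.reflect.Method;

public class BridgeMethodDemo {
    static class Base {
        Object value() { return "base"; }
    }

    // Narrowing the return type makes the compiler add a bridge method.
    static class Derived extends Base {
        @Override
        String value() { return "derived"; }
    }

    // Counts methods that exist in the class file but not in the source.
    static int syntheticMethodCount(Object target) {
        int count = 0;
        for (Method m : target.getClass().getDeclaredMethods()) {
            if (m.isSynthetic()) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Derived derived = new Derived();
        // The source declares one method; the class file holds two.
        System.out.println(derived.getClass().getDeclaredMethods().length); // 2
        System.out.println(syntheticMethodCount(derived));                  // 1
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A caller that holds a &lt;code&gt;Base&lt;/code&gt; reference reaches &lt;code&gt;Derived.value()&lt;/code&gt; through that extra method, so a call graph built from source alone is missing a node.&lt;/p&gt;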

&lt;p&gt;For an LLM, these gaps are not minor imprecision. They produce wrong blast radius estimates, missed cleanup targets, and incorrect dependency maps. An agent acting on a broken call graph makes broken decisions.&lt;/p&gt;

&lt;h2&gt;The Right Layer&lt;/h2&gt;

&lt;p&gt;When the Kotlin compiler is done, the result is bytecode. At that point:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every type has been resolved. &lt;code&gt;var x = foo.bar()&lt;/code&gt; becomes &lt;code&gt;INVOKEVIRTUAL com/example/Foo.bar ()Ljava/lang/String;&lt;/code&gt;, with no ambiguity about the receiver class or the return type.&lt;/li&gt;
&lt;li&gt;Every inline function has been expanded. The call graph reflects what the JVM will actually execute.&lt;/li&gt;
&lt;li&gt;Every synthetic method exists as a real node. Lambda captures, bridge methods, companion delegations — all visible.&lt;/li&gt;
&lt;li&gt;Every annotation is queryable data, including inherited ones. Walking the class hierarchy to find &lt;code&gt;@RequestMapping&lt;/code&gt; is a graph traversal, not a grep.&lt;/li&gt;
&lt;li&gt;Constant values are resolved across class boundaries. &lt;code&gt;AbTestIds.CHECKOUT_V2 = 1042&lt;/code&gt; — the integer is right there in the constant pool.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what Graphite builds on. It takes compiled bytecode — your JAR, your Spring Boot fat JAR, your WAR file — and constructs a program graph. Nodes are program elements: methods, fields, constants, call sites. Edges are relationships: dataflow, calls, type hierarchy, annotations.&lt;/p&gt;

&lt;p&gt;Then it exposes that graph through Cypher — the same query language used by Neo4j — so you can ask structured questions and get structured answers.&lt;/p&gt;

&lt;h2&gt;What This Looks Like in Practice&lt;/h2&gt;

&lt;p&gt;Finding all AB test IDs passed to a specific SDK method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;c:&lt;/span&gt;&lt;span class="n"&gt;IntConstant&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:DATAFLOW&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;cs:&lt;/span&gt;&lt;span class="n"&gt;CallSiteNode&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;cs.callee_class&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'com.example.ab.AbClient'&lt;/span&gt;
  &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;cs.callee_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'getOption'&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;c.value&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cs.caller_class&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cs.caller_name&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back 19 constants, not 6, including the ones passed through local variables, the ones defined in a constants object in a different module, and the ones flowing through conditional branches.&lt;/p&gt;

&lt;p&gt;Mapping every REST endpoint in a Spring Boot application, including those defined on abstract controllers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;graphite query /data/app-graph &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"MATCH (n:MethodNode)-[:HAS_ANNOTATION]-&amp;gt;(a:AnnotationNode)
   WHERE a.type =~ '.*Mapping'
   RETURN n.declaring_class, a.type, a.value
   ORDER BY a.value"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type hierarchy traversal is built into the graph. Inherited annotations show up automatically.&lt;/p&gt;

&lt;h2&gt;The Token Argument&lt;/h2&gt;

&lt;p&gt;Beyond correctness, there's efficiency. When an LLM agent tries to answer "what calls this method?" by reading source, it needs to scan every file that might contain a call site. For a monolith with 500 service classes, that's easily 2 million tokens — to answer a question that Graphite resolves in a single query returning a few hundred bytes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Raw source&lt;/th&gt;
&lt;th&gt;Graphite&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Find all AB test IDs&lt;/td&gt;
&lt;td&gt;~500 files, 2M tokens&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;callSites&lt;/code&gt; + &lt;code&gt;backwardSlice&lt;/code&gt; → 23 results&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.99%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Map REST endpoints&lt;/td&gt;
&lt;td&gt;~200 controllers, 800K tokens&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;memberAnnotations&lt;/code&gt; scan → structured list&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99.9%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resolve type hierarchy&lt;/td&gt;
&lt;td&gt;~100 files per type chain&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;supertypes&lt;/code&gt; / &lt;code&gt;subtypes&lt;/code&gt; → direct answer&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;99%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
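
&lt;p&gt;The percentages follow from simple arithmetic. A rough sketch: the source sizes come from the table, while the answer sizes (200 and 800 tokens) are assumptions chosen for illustration, not measurements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;public class TokenReduction {
    // Percentage of tokens saved when a graph answer replaces a source scan.
    static double reductionPercent(long sourceTokens, long answerTokens) {
        return 100.0 * (sourceTokens - answerTokens) / sourceTokens;
    }

    public static void main(String[] args) {
        // Source sizes are from the table; answer sizes are assumed.
        System.out.println(reductionPercent(2_000_000, 200)); // 99.99
        System.out.println(reductionPercent(800_000, 800));   // 99.9
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;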

&lt;p&gt;The agent doesn't need to read source to answer structural questions. It queries the graph. The graph answers in milliseconds.&lt;/p&gt;

&lt;h2&gt;Getting Started&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew tap johnsonlee/tap
brew &lt;span class="nb"&gt;install &lt;/span&gt;graphite graphite-explore

&lt;span class="c"&gt;# Build a graph from your JAR&lt;/span&gt;
graphite build app.jar &lt;span class="nt"&gt;-o&lt;/span&gt; /data/app-graph &lt;span class="nt"&gt;--include&lt;/span&gt; com.example

&lt;span class="c"&gt;# Query&lt;/span&gt;
graphite query /data/app-graph &lt;span class="s2"&gt;"MATCH (n:CallSiteNode) RETURN n LIMIT 10"&lt;/span&gt;

&lt;span class="c"&gt;# Visualize&lt;/span&gt;
graphite-explore /data/app-graph &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the Kotlin API directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;graph&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JavaProjectLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LoaderConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;includePackages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.example"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;of&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"app.jar"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;results&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Graphite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;findArgumentConstants&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;method&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;declaringClass&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"com.example.ab.AbClient"&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"getOption"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;argumentIndex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graphite is open source under Apache 2.0: &lt;a href="https://github.com/johnsonlee/graphite" rel="noopener noreferrer"&gt;github.com/johnsonlee/graphite&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;The problem with source-based code intelligence isn't that the tools are bad. It's that source code is a representation designed for humans to write and read — not for programs to reason about. Bytecode is a representation designed for execution. If you want to understand what code &lt;em&gt;does&lt;/em&gt;, start where the compiler finished.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>java</category>
      <category>kotlin</category>
    </item>
  </channel>
</rss>
