TL;DR
I've spent the last year building JVM CodeLens — a desktop app that analyzes heap dumps, GC logs, thread dumps, and JFR recordings, correlates every finding to the exact file:method:line in your source code, and uses an LLM to explain why it's broken and how to fix it. It runs entirely on your machine. Your source code and diagnostic data never leave your laptop.
This post is not a product pitch. It's a breakdown of the engineering problems you hit when you try to build "AI for JVM diagnostics" properly — problems that `curl heap.hprof | llm` will never solve — and how I solved them with Eclipse MAT, JavaParser, GCToolkit, QuestDB, and a carefully scoped LLM integration.
If you've ever been on-call for a Java service and thought "jstack is giving me a 50,000-line wall of text, and I have no idea which thread matters" — this is for you.
The problem: the "last mile" between diagnostic data and source code
Every senior Java engineer has done this dance:
- Production is on fire. Heap usage climbs, GC pauses spike, latency explodes.
- You SSH in, run `jmap -dump:live,format=b,file=heap.hprof <pid>`, and copy the dump out.
- Open Eclipse MAT. Wait 6 minutes for the dominator tree.
- Scroll. You see `java.util.concurrent.ConcurrentHashMap$Node` is eating 3.2 GB.
- Now what?
Eclipse MAT will tell you what is leaking. It will not tell you where in your code it's being allocated. Not in a way that maps to ServiceController.java:84 without you manually walking the object graph.
Same story with every other tool:
| Tool | What it tells you | What it doesn't |
|---|---|---|
| `jstack` | 200 threads, 50k stack frames | Which thread is the problem |
| `jstat -gcutil` | GC pause stats | Which allocation pattern caused them |
| GC logs | Every pause event | Whether this is normal for your app |
| JFR | Hot methods, allocation samples | What they mean in your domain |
| `jmap -histo` | Top classes by instance count | Where those instances come from in code |
The last mile — from "your heap is full of String[]" to "line 84 of CacheLoader.java is appending to a list that's never cleared" — is where every engineer burns their Friday evening.
That's the gap JVM CodeLens closes.
"Why not just throw the heap dump at Claude?"
This is the first question everyone asks. It's a reasonable question. Let me explain why it doesn't work, because the answer reveals the whole design philosophy of the product.
Reason 1: Heap dumps are binary blobs, and big
A production heap dump is often 2–8 GB of binary HPROF data. Context windows can't hold that. You'd also be shipping your entire object graph — including user PII, session tokens, cached JWTs — to an LLM provider. That's a compliance nightmare.
Reason 2: LLMs hallucinate on unstructured data
If you hand an LLM 200k lines of thread dump text and ask "what's the deadlock?", it will invent one. Confidently. With fake line numbers. I've watched it happen.
Reason 3: The hard parts of JVM diagnostics are deterministic, not AI-shaped
Finding a deadlock in a thread dump is a graph traversal problem. You build a lock-waits-for graph and detect cycles. This is a solved CS problem. You don't need GPT-4 to do it — you need 40 lines of Java. Same for dominator trees, GC pause percentiles, object retention graphs. These are computations, not inferences.
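To make that concrete, here's a toy sketch of that graph traversal — not the product's actual code, and it assumes the simplest case where each blocked thread waits on exactly one lock holder:

```java
import java.util.*;

// Toy deadlock detector: each blocked thread has at most one outgoing
// "waits for" edge, so cycle detection is a simple pointer walk per thread.
public class DeadlockSketch {

    /** waitsFor maps a thread id to the id of the thread holding the lock it wants. */
    static Optional<List<Long>> findCycle(Map<Long, Long> waitsFor) {
        for (Long start : waitsFor.keySet()) {
            List<Long> path = new ArrayList<>();
            Set<Long> seen = new HashSet<>();
            Long cur = start;
            while (cur != null && seen.add(cur)) {
                path.add(cur);
                cur = waitsFor.get(cur);
            }
            if (cur != null && path.contains(cur)) {
                // We revisited a thread already on the path: trim the lead-in,
                // keep only the cycle itself.
                return Optional.of(path.subList(path.indexOf(cur), path.size()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // T1 waits on T2, T2 waits on T3, T3 waits on T1 -> deadlock.
        Map<Long, Long> g = Map.of(1L, 2L, 2L, 3L, 3L, 1L);
        System.out.println(findCycle(g).isPresent()); // true
    }
}
```

Real thread dumps complicate this (multiple monitors, `ReentrantLock` owners, virtual threads), but the core stays a deterministic graph computation.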
The right split
So here's the mental model JVM CodeLens is built around:
Java does the parsing and computation. The LLM does the reasoning.
- Java: parse HPROF with Eclipse MAT → compute dominator tree → extract top retainers → map to source via AST → produce a structured JSON finding.
- LLM: receives the structured finding (a few KB, not GB) → generates the explanation, the hypothesis, the suggested fix.
The LLM never sees your heap dump. It sees something like:
```json
{
  "leak_suspect": {
    "retained_mb": 3247,
    "class": "java.util.concurrent.ConcurrentHashMap$Node",
    "gc_root_path": "SessionCache.activeSessions → HashMap$Node[]",
    "allocation_site": {
      "file": "SessionCache.java",
      "method": "onLogin",
      "line": 84,
      "source_snippet": "activeSessions.put(userId, new Session(...))"
    },
    "retention_trend_24h": "+12% per hour, no plateau"
  }
}
```
That's maybe 2 KB of structured data. It fits in a system prompt. The LLM can reason about it. And critically — the LLM's answer cites your file and line number because the prompt already contains them. It can't hallucinate the location.
The source correlation engine (the hard part)
This is the technical core, and it's where most "AI for JVM" attempts hand-wave past the actual problem. How do you get from a runtime class name like com.acme.SessionCache to the specific file, method, and line in the user's repo?
Here's the pipeline:
Step 1: Index the source with JavaParser + JDT
When a user links their Git repo, JVM CodeLens walks every .java file and builds an AST using JavaParser (for syntax) plus Eclipse JDT (for type resolution — critical for generics and inheritance).
For each class, we extract:
```java
public record ClassMetadata(
    String fullyQualifiedName,        // com.acme.SessionCache
    String filePath,                  // src/main/java/com/acme/SessionCache.java
    int declarationLine,              // 12
    List<MethodMetadata> methods,
    List<FieldMetadata> fields,
    List<AllocationSite> allocations  // every `new X()` with its line number
) {}
```
The AllocationSite list is the magic. Every new ConcurrentHashMap<>() in the codebase gets indexed with its exact line number. Later, when we see ConcurrentHashMap$Node dominating the heap, we can point to every place it could have come from.
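For illustration only — the real indexer works on the JavaParser/JDT AST, but a regex stand-in is enough to show the shape of the data an allocation-site pass produces (the class and record names here are hypothetical, not the product's API):

```java
import java.util.*;
import java.util.regex.*;

// Toy allocation-site scan: record every `new X(...)` / `new X<>()` in a
// source string together with its 1-based line number.
public class AllocationScan {

    record AllocationSite(String type, int line) {}

    private static final Pattern NEW_EXPR =
            Pattern.compile("\\bnew\\s+([A-Z][A-Za-z0-9_]*)\\s*[<(]");

    static List<AllocationSite> scan(String source) {
        List<AllocationSite> sites = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = NEW_EXPR.matcher(lines[i]);
            while (m.find()) {
                sites.add(new AllocationSite(m.group(1), i + 1));
            }
        }
        return sites;
    }

    public static void main(String[] args) {
        String src = """
                class SessionCache {
                    Map<String, Session> activeSessions = new ConcurrentHashMap<>();
                    void onLogin(String id) { activeSessions.put(id, new Session(id)); }
                }
                """;
        System.out.println(scan(src));
    }
}
```

The AST version additionally resolves generics and inheritance, which is exactly why regex alone isn't enough in production.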
Step 2: Parse the HPROF with Eclipse MAT Core
Eclipse MAT's parser is the industry-standard HPROF engine, and it's usable as a library: we pull in MAT Core directly. After parsing, we walk the dominator tree to find retained heap — not just shallow size.
For the top N retainers, we emit a structured HeapHistogramEntry:
```java
public record HeapHistogramEntry(
    String className,
    long instanceCount,
    long shallowBytes,
    long retainedBytes,
    @Nullable SourceLocation sourceLocation  // from the indexer
) {}
```
Step 3: Map runtime → source
Given className = "com.acme.SessionCache", we look up ClassMetadata in our index and attach the source location. If the class is ambiguous (multiple matches across modules), we rank by package proximity to the GC root path.
The result: every row in the heap histogram has a clickable source link that opens the file at the allocation line. Not a guess. Not an AI hallucination. A direct lookup.
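A minimal sketch of that lookup, using hypothetical types (the post's actual ranking is richer): strip the inner-class suffix, hit the index, and break ties by preferring main over test sources, one simple form of the disambiguation described above.

```java
import java.util.*;

// Sketch of the runtime-class -> source-file lookup. Candidate and the
// ranking rule here are illustrative, not the product's API.
public class SourceMapper {

    record Candidate(String fqn, String filePath) {}

    static Optional<Candidate> resolve(String runtimeClass,
                                       Map<String, List<Candidate>> index) {
        // com.acme.SessionCache$Holder -> com.acme.SessionCache
        String outer = runtimeClass.split("\\$", 2)[0];
        return index.getOrDefault(outer, List.of()).stream()
                // Prefer src/main over src/test when modules collide.
                .max(Comparator.comparingInt(c -> c.filePath().contains("src/test") ? 0 : 1));
    }

    public static void main(String[] args) {
        var index = Map.of("com.acme.SessionCache", List.of(
                new Candidate("com.acme.SessionCache", "cache/src/test/java/com/acme/SessionCache.java"),
                new Candidate("com.acme.SessionCache", "cache/src/main/java/com/acme/SessionCache.java")));
        System.out.println(resolve("com.acme.SessionCache$Holder", index).get().filePath());
        // cache/src/main/java/com/acme/SessionCache.java
    }
}
```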
Step 4: Same idea, different artifacts
- Thread dumps — regex-parse stack frames, resolve `com.acme.Foo.bar(Foo.java:127)` against the index, render each frame as a clickable link.
- GC logs — parse with Microsoft GCToolkit (works across G1, ZGC, Shenandoah, CMS, Parallel), correlate pause spikes with deployment timestamps from `git log`.
- JFR — use `jdk.jfr.consumer` to extract hot methods and allocation events, resolve each `StackFrame` against the index.
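The thread-dump case reduces to one regex per frame line; a sketch with hypothetical names, not the product's parser:

```java
import java.util.*;
import java.util.regex.*;

// Parse one "at com.acme.Foo.bar(Foo.java:127)" stack-frame line into the
// pieces needed to resolve it against the source index.
public class FrameParser {

    record Frame(String className, String method, String file, int line) {}

    private static final Pattern FRAME =
            Pattern.compile("at\\s+([\\w.$]+)\\.([\\w$<>]+)\\((\\w+\\.java):(\\d+)\\)");

    static Optional<Frame> parse(String rawLine) {
        Matcher m = FRAME.matcher(rawLine);
        if (!m.find()) return Optional.empty();
        return Optional.of(new Frame(
                m.group(1), m.group(2), m.group(3), Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        System.out.println(parse("\tat com.acme.SessionCache.onLogin(SessionCache.java:84)"));
    }
}
```

Frames with no line info (`(Native Method)`, lambdas, hidden frames) fall through to `Optional.empty()` and are rendered unlinked.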
Every finding becomes a structured object with a SourceLocation attached. Then, and only then, do we hand it to the LLM for reasoning.
The storage tier: why QuestDB (embedded)
Early on I tried SQLite for time-series metrics. Bad idea. When you're polling 5 JVMs every 5 seconds and storing 20+ metrics per target, you're writing well over a thousand rows a minute, nonstop, and queries for "show me the last 24 hours of heap usage with 1-minute downsampling" become miserable.
Switched to QuestDB embedded. Same JAR, no external process. It's a columnar TSDB with SQL, and the SAMPLE BY operator is purpose-built for downsampling:
```sql
SELECT timestamp, avg(heap_used_mb) AS heap
FROM jvm_metrics
WHERE target_id = 'app-prod-1' AND timestamp > dateadd('h', -24, now())
SAMPLE BY 1m;
```
One wrinkle: QuestDB loads a native library via `FunctionFactoryScanner`, which choked on Spring Boot's `jar:nested:` packaging. Fix: `requiresUnpack("**/questdb-*.jar")` in the `bootJar` task. Cost me two days.
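For reference, that fix looks roughly like this in Groovy DSL (a sketch following the pattern Spring Boot's Gradle plugin documents for jars that must be unpacked to disk):

```groovy
// build.gradle — unpack QuestDB at runtime so its native library can be
// loaded from the filesystem rather than from inside the nested boot jar.
bootJar {
    requiresUnpack '**/questdb-*.jar'
}
```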
Retention and forecasting
Metrics are kept 90 days by default. An hourly summary table (downsampled via SAMPLE BY 1h) supports forecasting:
- Linear regression for monotonic trends (heap growth)
- Holt-Winters triple exponential smoothing for weekly/daily seasonality
- Combined, they power Time-to-Failure (TTF) forecasting: "At current growth, heap hits `-Xmx` in 14 days 3 hours"
These are pure Java — no Python, no sidecar. The whole forecasting pipeline is a few hundred lines.
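A sketch of the linear-regression half (Holt-Winters omitted, and the names are illustrative): fit the heap samples with ordinary least squares and solve for when the trend line crosses the `-Xmx` ceiling.

```java
// Ordinary least squares over (hour, heapMb) samples, then solve
// heap(t) = limit for t and report hours remaining past the last sample.
public class TtfForecast {

    /** Hours until limitMb is reached, or -1 if heap isn't growing. */
    static double hoursToLimit(double[] hours, double[] heapMb, double limitMb) {
        int n = hours.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += hours[i]; sy += heapMb[i];
            sxx += hours[i] * hours[i]; sxy += hours[i] * heapMb[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx); // MB per hour
        double intercept = (sy - slope * sx) / n;
        if (slope <= 0) return -1;
        return (limitMb - intercept) / slope - hours[n - 1];
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3};
        double[] heap = {1000, 1100, 1200, 1300};            // +100 MB/hour
        System.out.println(hoursToLimit(t, heap, 8192));     // ~69 hours to an 8 GB ceiling
    }
}
```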
The AI layer: four providers, one abstraction
LLM integration is designed to be bring-your-own-key. Four providers:
- Anthropic Claude (recommended — best at structured reasoning)
- OpenAI GPT-4
- Google Gemini
- Ollama (local, for air-gapped environments)
They're behind a single LlmProvider interface. The prompts differ per artifact (GC/heap/thread/JFR) because the production-troubleshooting playbook for each is different. For example, a GC prompt includes:
Severity calibration: ignore pauses < 50ms on G1. Flag p99 > 500ms as degraded, > 2s as critical. Correlate throughput drop with allocation rate spike.
These prompts took months to tune, informed by real production incidents.
One under-appreciated detail: streaming. JVM CodeLens uses SSE to stream the LLM response into the UI token-by-token. When your production is down, a 15-second "thinking…" spinner feels awful. Streaming makes the tool feel alive.
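The abstraction plus streaming can be sketched as a small interface. Only the `LlmProvider` name comes from the post; the method shape and the `echo()` stub are my guesses:

```java
import java.util.function.Consumer;

// Sketch of a bring-your-own-key provider abstraction: each backend turns a
// structured finding into prose, pushing tokens to a callback so the UI can
// stream them (e.g. over SSE) instead of blocking on the full response.
public interface LlmProvider {

    /** Stream an explanation of findingJson; onToken fires once per chunk. */
    void explain(String artifactType, String findingJson, Consumer<String> onToken);

    /** Trivial offline stand-in, handy for tests and air-gapped demos. */
    static LlmProvider echo() {
        return (artifact, finding, onToken) -> {
            onToken.accept("[" + artifact + "] ");
            onToken.accept(finding);
        };
    }
}
```

The per-artifact prompts (GC vs heap vs thread vs JFR) then live behind this interface, so swapping Claude for Ollama changes transport, not prompt logic.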
The privacy model
This is non-negotiable and a core product principle:
- Your heap dumps, thread dumps, GC logs, JFR files, source code — never leave your machine. All parsing and correlation happens locally.
- The LLM receives only the structured JSON summary (kilobytes), using your API key on your account. We don't proxy the request. We don't see it.
- For Ollama users, nothing leaves your machine at all. It's fully offline.
- Only license validation and update checks talk to our servers.
This is why it's a desktop app (Electron + embedded Spring Boot) instead of SaaS. The privacy model breaks the moment you introduce a server.
A real example: diagnosing a leak
Here's what the end-to-end flow looks like.
Setup: a Spring Boot service, 8 GB heap, steadily climbing over 3 days until OOM.
Step 1: User attaches JVM CodeLens to the running JVM (via local Attach API or JMX). Metrics start flowing into QuestDB.
Step 2: Anomaly detector notices heap-used-post-GC is trending up with sigma > 3 over the 7-day baseline. It auto-fires a memory leak analysis.
Step 3: Triggers a heap dump via the Attach API. Parses it. Top retainer: ConcurrentHashMap$Node at 6.1 GB retained. GC root path: RequestContextHolder.sessionCache.
Step 4: Indexer looks up SessionCache.java. Finds allocation site at line 84:
```java
public void onLogin(String userId, UserProfile profile) {
    activeSessions.put(userId, new Session(profile)); // line 84
    // ... no eviction policy. ever.
}
```
Step 5: Builds structured finding, sends to Claude with a scoped prompt. Claude responds:
> `SessionCache.activeSessions` is an unbounded `ConcurrentHashMap`. Line 84 inserts on every login, but no eviction exists. Over your 3-day window at the observed login rate (~2.3k/hour), retained size grew from 400 MB to 6.1 GB — matches the trend. Fix: replace with `Caffeine.newBuilder().expireAfterAccess(30, MINUTES).maximumSize(100_000).build()`, or add explicit eviction on logout.
Step 6: Git correlation service cross-references: line 84 was touched by commit a4f2b19 4 days ago — author removed the eviction cron job in a "simplification" PR.
That whole chain — detect → capture → parse → correlate → reason → blame — runs in about 90 seconds on a developer laptop.
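The Step 2 anomaly gate reduces to a z-score check against the baseline window; a toy version (the real detector presumably handles seasonality and warm-up, which this ignores):

```java
// Flag the latest post-GC heap reading when it sits more than 3 standard
// deviations above the baseline mean.
public class AnomalyGate {

    static boolean isAnomalous(double[] baseline, double latest) {
        double mean = 0;
        for (double v : baseline) mean += v;
        mean /= baseline.length;
        double var = 0;
        for (double v : baseline) var += (v - mean) * (v - mean);
        double sigma = Math.sqrt(var / baseline.length);
        return sigma > 0 && (latest - mean) / sigma > 3;
    }

    public static void main(String[] args) {
        double[] baseline = {400, 410, 395, 405, 398, 402, 401}; // MB, post-GC
        System.out.println(isAnomalous(baseline, 600)); // true
        System.out.println(isAnomalous(baseline, 405)); // false
    }
}
```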
Tech stack recap
- Desktop shell: Electron 30
- Backend: embedded Spring Boot 3.4 on Java 24 (virtual threads for the metric poller)
- Frontend: React 19 + TypeScript + Vite + D3 + Recharts
- IPC: localhost HTTP + WebSocket (backend is a child process of the Electron main)
- Time-series: QuestDB embedded
- Config: SQLite
- Heap parsing: Eclipse MAT Core
- GC parsing: Microsoft GCToolkit
- Code parsing: JavaParser + Eclipse JDT
- JFR: `jdk.jfr.consumer`
- JRE: Adoptium 21, bundled — no system Java required
The whole .dmg/.exe/.deb ships with its own JRE. Users don't need Java installed. They don't need to configure anything. Attach the app, pick a JVM, go.
What's hard about this that isn't obvious
Three things will surprise you if you try to build something similar:
1. **Packaging a Spring Boot JAR inside Electron is a minefield.** Nested classloaders, native-library unpacking, JVM flag passthrough, JRE path resolution across Mac/Windows/Linux. I wrote a separate post on this — it's the kind of thing that takes a week and that no one documents.
2. **Correlating a runtime class name to a source file is ambiguous more often than you think.** Multi-module Gradle builds with the same class name in `main` and `test`. Inner classes. Lombok-generated methods. Kotlin interop. Every one of these breaks the naive lookup.
3. **GC logs from ZGC and Shenandoah look nothing like G1's.** You can't regex your way out of it. GCToolkit saves months of work.
Try it
JVM CodeLens is free for personal use (Community tier) — full local JVM monitoring, all parsers, source correlation, and AI with your own API key. Paid tiers add team features, fleet view, cloud license, and enterprise scaffolds (K8s discovery, SSO, IntelliJ plugin).
- Download: https://jvmcodelens.com
- GitHub: https://github.com/prguptadev/jvm_code_lens (star it if you'd like to follow along)
- Docs: https://jvmcodelens.com/#blog
If you're a Java engineer who's tired of jstack walls-of-text, I'd love your feedback. Especially if you hit a scenario where the source correlation fails — those are the bugs I care about most.
And if you've built something similar, or have opinions on the right way to split responsibility between deterministic parsing and LLM reasoning in diagnostic tooling — drop a comment. This design space is still wide open and I'd love to swap notes.
Found this useful? I'm writing a series on building developer tools that live at the intersection of JVM internals and LLMs. Follow me to catch the next one — probably on how the predictive Time-to-Failure engine works without needing ML models.