Harrison Guo

Posted on Jun 8 • Originally published at harrisonsec.com

Agent Retrieval Above the Crossover: A First-Principles Read of CodeGraph

#ai #architecture #sqlite #programming

The prior post in this series, Agent Retrieval Is a Cost Curve Problem, argued that a viable LLM-symbol-graph would need to satisfy six specific conditions — and that no existing tool had hit all six. The post went live on 2026-05-25; seven days earlier, CodeGraph had hit GitHub trending with exactly those six properties satisfied.

That's the easy version of the update: framework predicted it, someone shipped it, here's the existence proof. The companion piece (I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.) handles the empirical half — 40 verified-connected runs, a decision matrix, the install-or-not call. Short version of that post: the tool-call savings reproduce on an independent repo (−55%), the cost savings from the vendor benchmark don't (+7% at Hono's size). Fewer steps, not fewer dollars, until your repo is big enough.

This post is the harder version of the update.

The interesting question isn't whether CodeGraph works. It is why its specific architectural choices are right, and where the abstraction inevitably leaks. Answering it gives you the lens for evaluating the next CodeGraph-class tool that ships (and there will be many) without redoing the benchmark each time.

To answer it concretely rather than abstractly, I read CodeGraph against its own artifact: the SQLite database it writes to .codegraph/codegraph.db. Every structural claim below is checked against the index it actually built for Hono (CodeGraph v0.9.7: 362 files, 4,128 nodes, 8,225 edges, a 7.4 MB database). The schema turns out to be the clearest statement of the architecture, one the tool's README never makes.

tl;dr — CodeGraph's architecture is right for three reasons that aren't obvious from the feature list, and all three are visible in its SQLite schema. (1) The AST extraction boundary: tree-sitter takes what syntax tells you (4,128 nodes across 13 kinds, 8,225 edges across 7 kinds) and leaves the rest to the LLM. The boundary is literal — references syntax can't resolve go into an unresolved_refs table instead of becoming fake edges. (2) SQLite + FTS5, not a vector DB: the index is plain relational tables plus a full-text table over symbol names. Zero embedding columns. The queries are exact lookups that B-tree indexes answer in log time; vector search would be solving a harder problem the workload never asks. This is the prior post's cost curve, recursed onto the index tool itself. (3) The abstraction leaks where syntax diverges from runtime semantics — macros, metaprogramming, codegen, JIT binding. CodeGraph tags its few guessed edges with a heuristic provenance flag (7 of 8,225 on Hono), which is honest; but what tree-sitter can't see at all gets no edge and no flag. Knowing that boundary is what separates a tool you trust from one you cargo-cult.

Why this is a first-principles question, not a tool review

Most coverage of CodeGraph reads like "19k stars in a week, here's the install script." That's news; it isn't analysis. The same coverage will get written for every CodeGraph-class tool that ships in the next 18 months, because the pattern — tree-sitter + local index + MCP server + an instruction snippet that routes the agent to it — is now demonstrated and the ingredients are well known.

The durable question isn't "is CodeGraph good?" It's "what makes this class of tool architecturally correct, and how do I evaluate the next one?" That's what a first-principles read produces. The benchmark in the companion post is one data point; this post is the lens for reading all future data points in the same space.

If you're deciding on CodeGraph specifically, read the companion. If you're thinking about LLM retrieval as a discipline — or about to bet on, or build, a similar tool — read this.

Recap: the six conditions, in 30 seconds

The prior post argued any viable LLM-symbol-graph needed:

No-compile parsing — cold start in seconds, not minutes
Language portability — one binary for many languages, not one server per stack
LLM-shaped API — flat, recordy output the model can digest, not nested LSP hierarchies
Broad enough coverage — code-as-structure plus a text-search fallback for everything else
Live update without reindex — file-watcher-driven, no manual rebuild
Zero-config install — single binary, configures the agent automatically

CodeGraph hits all six (the field-by-field mapping is near the end of this post). Taking the mapping as established, the interesting move is to ask: of the design choices CodeGraph made to hit those six, which were forced and which could have gone the other way? The forced ones are good engineering. The ones that weren't forced — where CodeGraph picked something specific over a live alternative — are where the architecture is making a claim, and where the first-principles content lives.

Three of those choices repay a deep read. The other three (file-watcher update, single-binary distribution, instruction-snippet routing) are well understood in their own fields (OS notifications, package distribution, prompt engineering) and amount to "do the obvious thing well." The three that go beyond the obvious are the ones this post takes apart, each against the actual index.

Section 1 — The AST extraction boundary: an information-theoretic case

CodeGraph parses source with tree-sitter and extracts a specific subset of the syntax into its graph. You don't have to take the README's word for what that subset is — it's enumerable straight out of the nodes and edges tables. On Hono, the 4,128 nodes break down like this:

Node kind	Count	Node kind	Count
import	1,033	method	240
route	873	interface	187
function	569	property	169
file	362	class	50
type_alias	358	enum_member	24
constant	247	variable / enum	16

And the 8,225 edges, which are the actually interesting part:

Edge kind	Count	What it encodes
contains	2,874	structural nesting (file → class → method)
calls	2,230	the call graph
references	1,955	symbol used here, defined there
imports	1,033	module dependency edges
instantiates	124	`new X()` sites
extends	7	class/interface inheritance
implements	2	interface implementation

Now look at what is not there. No resolved-type nodes (type_alias and interface are declarations, not the result of type inference). No generic-instantiation edges. No data-flow edges. No "this dynamic dispatch resolves to that concrete method" edges. CodeGraph extracts calls, references, extends, implements (relationships that are locally apparent in the syntax) and stops. The first-order reading of this is "because tree-sitter doesn't resolve types." True, but circular. The deeper reading is why this division of labor is correct for an LLM consumer.
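The per-kind breakdowns in the two tables are the kind of thing a GROUP BY over nodes and edges produces. Here is a minimal sketch against a toy in-memory database. The column names (id, kind, name, source, target) are assumptions for illustration, not CodeGraph's confirmed schema, and the five rows are invented stand-ins for the real index.

```python
import sqlite3

# Toy stand-in for .codegraph/codegraph.db. Column names are assumptions;
# inspect the real schema before running queries like this against it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT);
""")
conn.executemany(
    "INSERT INTO nodes (id, kind, name) VALUES (?, ?, ?)",
    [
        (1, "file", "hono-base.ts"),
        (2, "class", "Hono"),
        (3, "method", "fetch"),
        (4, "method", "dispatch"),
        (5, "function", "match"),
    ],
)
conn.executemany(
    "INSERT INTO edges (source, target, kind) VALUES (?, ?, ?)",
    [
        (1, 2, "contains"),
        (2, 3, "contains"),
        (2, 4, "contains"),
        (3, 4, "calls"),
        (4, 5, "calls"),
    ],
)


def kind_histogram(conn: sqlite3.Connection, table: str) -> dict:
    """Count rows per kind, most frequent first. `table` is 'nodes' or 'edges'."""
    if table not in ("nodes", "edges"):
        raise ValueError(f"unexpected table: {table}")
    rows = conn.execute(
        f"SELECT kind, COUNT(*) FROM {table} "
        "GROUP BY kind ORDER BY COUNT(*) DESC, kind"
    )
    return dict(rows.fetchall())


node_kinds = kind_histogram(conn, "nodes")
edge_kinds = kind_histogram(conn, "edges")
print(node_kinds)
print(edge_kinds)
```

The same histogram is also how you check for absences: an edge kind the extractor never emits simply never appears as a key.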

The information-theoretic case

A type-checker (or full LSP) does work the LLM cannot easily redo: resolving obj.method() to the actual method given the static type of obj, propagating types through generics, walking an inheritance chain to the method actually invoked. That requires the full compilation context — every transitive import, every type definition, every generic instantiation. The cost is high (a build environment, slow cold start, breaks when the build breaks) and the benefit is narrow: precise semantic resolution that's genuinely hard to reconstruct from local context.

A syntactic extractor does different work. It makes the structure of the source queryable, but only the structure that's locally apparent: "function dispatch defined at hono-base.ts:406, calls match here, imported from router." No types, no generics, no runtime binding — but no compilation either.

The information-theoretic question is: given an LLM that's good at semantic reasoning but bad at structural enumeration, what's the right split between what the index provides and what the LLM provides?

CodeGraph's answer: hand the LLM the structural skeleton — what calls what, what's defined where, what imports what — because enumerating that across thousands of files is exactly the part the LLM is bad at and would burn dozens of tool calls trying to do by hand. Leave the semantic resolution — what does this call actually invoke at runtime under dynamic dispatch? — to the LLM, because the LLM is reasonable at that once the relevant code is in its context, and baking a type resolver into the index would multiply the build cost for a recovery the LLM mostly doesn't need.

The clean way to see this boundary is the contains + calls + references edges (7,059 of the 8,225) versus the things that aren't edges at all. When the companion benchmark's Q1 asked how a GET /users/:id request reaches its handler, what CodeGraph gave Claude Code was the call chain — fetch → dispatch → match — as graph edges. What it did not give, and didn't try to, was which concrete match implementation runs given Hono's SmartRouter picking RegExpRouter at runtime. The graph located the players; the LLM read the three files and resolved the dispatch. That's the split working as designed: enumeration from the index, resolution from the model.

The boundary is a literal table

Here's the detail that turns this from an argument into an observation. When tree-sitter sees a reference it cannot statically resolve to a definition, CodeGraph does not invent an edge. It writes a row to a separate unresolved_refs table — name, location, the node it came from, no target. The schema has a first-class place for "I saw a use here, I could not prove what it binds to."

On Hono, unresolved_refs has zero rows — and, as it turns out, so did every other repo I indexed to check it (Section 3 has that result, and it's not the one I expected). The empty table isn't the interesting part; the table existing is the architecture stating its own boundary. A tool that faked those edges — guessed a target to make the graph look complete — would be lying to the LLM in exactly the way that produces confident wrong answers. CodeGraph's choice to record the unresolved reference as unresolved is the same discipline a good cache has when it marks an entry stale instead of serving it: the honest move is to represent "don't know," not to paper over it.
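The discipline described here, record the reference as unresolved rather than guess a target, can be sketched in a few lines. This is an illustration under stated assumptions, not CodeGraph's resolver: the column names, the file name example.ts, the symbol pluginHook, and the rule "resolve only when exactly one definition matches the name" are all hypothetical.

```python
import sqlite3

# Toy schema; unresolved_refs mirrors the description above
# (name, location, originating node, no target). Column names are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT);
    CREATE TABLE unresolved_refs (
        from_node INTEGER, name TEXT, file TEXT, line INTEGER
    );
""")
conn.executemany(
    "INSERT INTO nodes (id, name) VALUES (?, ?)",
    [(1, "dispatch"), (2, "match")],
)


def record_reference(conn, from_node: int, name: str, file: str, line: int) -> str:
    """Write an edge only if `name` resolves to exactly one definition.

    Anything else (no match, or an ambiguous match) is stored as an
    unresolved reference instead of a guessed edge.
    """
    matches = conn.execute(
        "SELECT id FROM nodes WHERE name = ?", (name,)
    ).fetchall()
    if len(matches) == 1:
        conn.execute(
            "INSERT INTO edges (source, target, kind) VALUES (?, ?, 'references')",
            (from_node, matches[0][0]),
        )
        return "edge"
    conn.execute(
        "INSERT INTO unresolved_refs (from_node, name, file, line) "
        "VALUES (?, ?, ?, ?)",
        (from_node, name, file, line),
    )
    return "unresolved"


first = record_reference(conn, 1, "match", "example.ts", 10)
second = record_reference(conn, 1, "pluginHook", "example.ts", 12)
edge_count = conn.execute("SELECT COUNT(*) FROM edges").fetchone()[0]
unresolved = conn.execute(
    "SELECT name, file, line FROM unresolved_refs"
).fetchall()
print(first, second, edge_count, unresolved)
```

The property that matters is that the second call never touches edges: the graph stays smaller but every edge in it was actually resolved.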

Why this matters beyond CodeGraph

This boundary — syntactic graph for the index, semantic reasoning for the LLM — is the line the next generation of LLM-coding tools will either hold or violate. The violations are predictable:

Too far toward semantics in the index: a tool that tries to be a full LSP-plus for the LLM. High build cost, slow cold start, fragile on broken builds, marginal benefit because the LLM can do that resolution from local context anyway.
Too far toward raw text in the index: a tool that's just "grep with nicer indexing" — fast and broad, but it doesn't hand the LLM the structural skeleton it actually needs. That's the position grep+loop already occupies; an index there adds little.

CodeGraph sits in the middle, and that position is right for current LLM capability. As models get better at semantic resolution the line will move one way; as tool-loop iteration gets cheaper it will move the other. But the principle — that there's an information-theoretic boundary worth picking, and that picking it requires modeling the LLM's real strengths and weaknesses — is the durable take. The right way to evaluate any new LLM-retrieval tool starts here: what does it choose to extract, what does it leave for the LLM, and is that split calibrated for what an LLM is actually good at?

Section 2 — SQLite + FTS5 vs vector DB: the cost curve, recursed

CodeGraph stores its symbol graph in a local SQLite database. Not Chroma. Not Pinecone. Not Weaviate. Not Qdrant. The full table list from Hono's index:

nodes              edges              files
unresolved_refs    nodes_fts          schema_versions
project_metadata   (+ FTS5 shadow tables: nodes_fts_data/idx/docsize/config)

nodes and edges are plain relational tables. nodes_fts is an FTS5 virtual table. Searching the whole schema for an embedding column, a vector type, or a float array (anything ANN-shaped) returns nothing. The only BLOB columns are FTS5's own internal segment storage (nodes_fts_data), not vectors. There are no embeddings in CodeGraph. That's not an omission; it's the architecture, and it's the same call the prior post made, applied one level down the stack.

The cost-curve frame, recursed

The prior post argued that vector RAG over a codebase pays a build cost (chunk + embed every file), a maintain cost (re-embed on change, reconcile cross-chunk references), and a low per-query cost (ANN search + rerank), and that for most repos this loses to grep+loop's profile: zero build, zero maintain, and per-query round-trips.
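The three-term frame reduces to simple arithmetic, sketched below. The class, the function, and every number are hypothetical placeholders in arbitrary cost units, chosen only to make the crossover visible. None of them are measurements from the prior post or from CodeGraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostCurve:
    build: float      # one-time cost to create the index
    maintain: float   # cost per file change to keep it fresh
    per_query: float  # cost of answering one query

    def total(self, queries: int, changes: int) -> float:
        return self.build + self.maintain * changes + self.per_query * queries


def crossover_queries(
    indexed: CostCurve, loop: CostCurve, changes: int
) -> Optional[float]:
    """Query count at which `indexed` becomes cheaper than `loop`.

    Returns None when the indexed approach never saves anything per query.
    """
    saving_per_query = loop.per_query - indexed.per_query
    if saving_per_query <= 0:
        return None
    fixed_gap = (indexed.build + indexed.maintain * changes) - (
        loop.build + loop.maintain * changes
    )
    return max(fixed_gap / saving_per_query, 0.0)


# Placeholder numbers, arbitrary units (assumptions, not data).
vector_rag = CostCurve(build=500.0, maintain=2.0, per_query=1.0)
grep_loop = CostCurve(build=0.0, maintain=0.0, per_query=6.0)

breakeven = crossover_queries(vector_rag, grep_loop, changes=100)
print(breakeven)
print(vector_rag.total(140, 100), grep_loop.total(140, 100))
```

Below the break-even query count the zero-build approach wins; above it the index does. Which side a given repo lands on depends entirely on the real values of the three terms.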

Apply that exact frame to CodeGraph's own storage. If CodeGraph used a vector DB for its symbols, it would pay: embed every symbol's signature and body on index; re-embed on every file save (the file-watcher would have to fire embedding calls); ANN search per query. That's the same curve the prior post argued against — and CodeGraph's workload doesn't justify it, because the queries it serves are exact lookups, not similarity searches. The schema proves the queries are exact by the indexes it builds for them:

"Find symbol getUserById" → idx_nodes_name, and idx_nodes_lower_name for case-insensitive matches. A B-tree probe, microseconds. FTS5 (nodes_fts over name, qualified_name, docstring, signature) handles the fuzzier "name contains" variants. No similarity math.
"Who calls Context.set?" → idx_edges_target_kind (a reverse-edge index on (target, kind)). Reverse adjacency lookup, deterministic.
"What does dispatch call?" → idx_edges_source_kind (the forward-edge index). Forward adjacency, deterministic.
"Trace fetch → db_query" → repeated forward-edge hops over those same indexed edges. Graph traversal on stored adjacency, no vectors anywhere in the loop.

Those forward and reverse edge indexes are the whole ballgame. Callers and callees — the queries a code-intelligence tool exists to answer — are a single indexed adjacency lookup in each direction. Vector search cannot do this better; it can only do it fuzzier and more expensively, because "who calls this function" has an exact answer that an approximate-nearest-neighbor index would blur.
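To make the four query shapes concrete, here is a sketch on a toy graph. The index names are taken from the text above; the table columns, the assumption that symbol names are unique, and the helper functions callers, callees, and trace are illustrative, not CodeGraph's API.

```python
import sqlite3
from collections import deque

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (source INTEGER, target INTEGER, kind TEXT);
    CREATE INDEX idx_nodes_name ON nodes (name);
    CREATE INDEX idx_edges_source_kind ON edges (source, kind);
    CREATE INDEX idx_edges_target_kind ON edges (target, kind);
""")
names = ["fetch", "dispatch", "match", "db_query", "log"]
conn.executemany(
    "INSERT INTO nodes (id, name) VALUES (?, ?)", list(enumerate(names, start=1))
)
conn.executemany(
    "INSERT INTO edges (source, target, kind) "
    "SELECT s.id, t.id, 'calls' FROM nodes s, nodes t "
    "WHERE s.name = ? AND t.name = ?",
    [("fetch", "dispatch"), ("dispatch", "match"),
     ("match", "db_query"), ("fetch", "log")],
)


def callees(conn, name):
    """Forward adjacency: what does `name` call?"""
    rows = conn.execute(
        "SELECT t.name FROM nodes s "
        "JOIN edges e ON e.source = s.id AND e.kind = 'calls' "
        "JOIN nodes t ON t.id = e.target "
        "WHERE s.name = ? ORDER BY t.name",
        (name,),
    )
    return [r[0] for r in rows]


def callers(conn, name):
    """Reverse adjacency: who calls `name`?"""
    rows = conn.execute(
        "SELECT s.name FROM nodes t "
        "JOIN edges e ON e.target = t.id AND e.kind = 'calls' "
        "JOIN nodes s ON s.id = e.source "
        "WHERE t.name = ? ORDER BY s.name",
        (name,),
    )
    return [r[0] for r in rows]


def trace(conn, start, goal):
    """Breadth-first search built from repeated forward-edge hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in callees(conn, path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Ask SQLite how it would answer the reverse lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT source FROM edges WHERE target = ? AND kind = 'calls'",
    (2,),
).fetchall()
plan_text = " ".join(row[3] for row in plan)

print(callers(conn, "dispatch"), callees(conn, "fetch"))
print(trace(conn, "fetch", "db_query"))
print(plan_text)
```

Every step is an equality probe on an index. Nothing in the loop ranks, scores, or approximates, which is the point of the paragraph above.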

The only queries where vector search genuinely helps are semantic ones with no symbol to anchor on — "show me the code that does authentication." CodeGraph doesn't serve those. The LLM does, by issuing a sequence of exact structural queries and reasoning across the results. The division is the same one from Section 1: the index answers the exact-lookup questions deterministically; the LLM answers the fuzzy-intent questions by orchestrating exact lookups. Neither needs an embedding.

The recursion as a design principle

What's elegant — and worth surfacing for its own sake — is that CodeGraph's storage choice is consistent with the retrieval philosophy from the prior post, one level up. Both arguments are the same sentence: exact-lookup workloads should use exact-lookup tools; approximation overhead is paid only where approximation pays back.

If CodeGraph had reached for Chroma over FTS5, it would have violated its own retrieval philosophy — paying embedding and ANN cost to answer questions that have exact answers. That it didn't, that the designer recognized the symbol-graph workload is exact-lookup-shaped and picked the cheapest exact-lookup storage available, is what makes the architecture coherent across layers rather than just locally clever.

The next tool in this class will face the same fork, and most will reach for a vector DB by default, because "AI tooling = vector store" is the reflex. CodeGraph's choice is the corrective: ask what your workload needs, not what the category's fashion suggests. That's the cost-curve frame functioning as a meta-design tool — every time you add a layer to an LLM stack, ask which side of the curve the new layer's workload sits on, and pick storage and algorithm from the answer, not the trend.

Section 3 — Where CodeGraph's abstraction leaks

Every index lies a little. The question is where it lies and whether you can tell when it does.

CodeGraph's graph is built from syntactic extraction, so anywhere the runtime semantics diverge from the syntactic structure, the graph is incomplete in a way that's hard to detect from the index alone. The leak isn't a bug; it's the abstraction working as designed, at a layer that structurally cannot see certain phenomena. There's a tell for it in the schema, and there's a part the schema can't tell you about — and the difference between those two is the whole point.

The honest part: the provenance column

CodeGraph stamps every edge with a provenance value. On Hono, 8,218 of the 8,225 edges have empty provenance — meaning direct from the syntax tree — and exactly 7 carry the value heuristic. Those seven are edges CodeGraph's framework adapters inferred from a recognized pattern rather than read off the AST: route registrations, framework binding conventions, the handful of cases where a tool that "supports Hono / Flask / Spring" pattern-matches a known idiom and synthesizes an edge the raw syntax doesn't spell out.

That heuristic tag is the architecture being honest. It is, in the vocabulary of the memory post in this series, an arrow: every edge points back to how it was derived, and the seven guessed edges are flagged as guesses. A consumer that cared could treat heuristic edges with less trust than syntactic ones. That's good cache hygiene — the index records the confidence of its own entries instead of presenting all of them as equally certain.
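A consumer that wants to act on that distinction could label each edge by its provenance before handing results to the model. The sketch below assumes a provenance column that is empty or NULL for syntactic edges and holds the string heuristic for inferred ones, as described above. The other column names, the sample rows, and the labels syntactic and guessed are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edges (source TEXT, target TEXT, kind TEXT, provenance TEXT)"
)
conn.executemany(
    "INSERT INTO edges (source, target, kind, provenance) VALUES (?, ?, ?, ?)",
    [
        ("fetch", "dispatch", "calls", None),             # read off the syntax tree
        ("dispatch", "match", "calls", ""),               # empty also means syntactic
        ("adminPanel", "listUsers", "calls", None),
        ("route:/users", "listUsers", "calls", "heuristic"),  # inferred by an adapter
    ],
)


def provenance_counts(conn):
    """How many edges carry each provenance value ('' covers NULL and empty)."""
    rows = conn.execute(
        "SELECT COALESCE(provenance, ''), COUNT(*) FROM edges GROUP BY 1"
    )
    return dict(rows.fetchall())


def callers_with_trust(conn, target):
    """Callers of `target`, each labeled by how the edge was derived."""
    rows = conn.execute(
        "SELECT source, "
        "CASE WHEN COALESCE(provenance, '') = 'heuristic' "
        "     THEN 'guessed' ELSE 'syntactic' END "
        "FROM edges WHERE target = ? AND kind = 'calls' ORDER BY source",
        (target,),
    )
    return rows.fetchall()


counts = provenance_counts(conn)
labeled = callers_with_trust(conn, "listUsers")
print(counts)
print(labeled)
```

Note what this can and cannot do: it downgrades edges that exist and are flagged. It has nothing to say about edges that were never written.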

The part the schema can't tell you about

Here's the catch, and it's the one that matters: the provenance column only flags edges that exist. The dangerous leak isn't a guessed edge that's marked as guessed. It's the edge that should exist and isn't there at all — because the relationship lives in a layer tree-sitter cannot see, so there's nothing to extract, nothing to tag, and nothing to warn you. The four big zones where this happens:

Macro-heavy code. In Rust, vec![1, 2, 3] expands at compile time into a call sequence the AST never contains; the graph shows a vec! invocation, not the Vec::new() + push() that actually runs. For procedural macros (#[derive(...)], attribute macros), the generated implementation is what executes and CodeGraph can't see into it without running the compiler — which would forfeit the no-compile property that Section 1 showed is the whole point. Same shape in C/C++ preprocessor-heavy code, Lisp/Clojure macros, Elixir compile-time metaprogramming.

Metaprogramming. Python decorators routinely rewrite functions: @dataclass synthesizes __init__/__repr__/__eq__; @app.route("/users") registers a handler with a router. Tree-sitter sees the decorator and the function as adjacent syntax, not the synthesis or the registration. CodeGraph's framework adapters catch the common cases — and that's literally what the 7 heuristic edges on Hono are — but arbitrary user-defined decorators that mutate behavior are invisible. Ruby method_missing, Python __getattr__, Java reflection: same story. The graph confidently returns "no callers" for a method invoked entirely through reflection, and the LLM, trusting structured output, may hand you a confidently wrong blast radius.

Generated code. Protobuf, GraphQL codegen, OpenAPI clients, ORM model generation (Prisma, SQLAlchemy declarative), JSX/Svelte compilation — the code the runtime executes isn't the code in source control. It lives in build/, dist/, .cache/, places .gitignore excludes. CodeGraph indexes what's checked in; the generated layer is outside the boundary. "Who implements UserService?" returns the hand-written interface, not the generated stub that implements it on the wire. Any source-only index has this; it's worth naming because it interacts badly with the user's instinct that an "AST graph" must be complete. It's complete over the source it indexed — and the generated layer was never in that source.

JIT and runtime-registered bindings. DI containers (Spring, Guice, Dagger, ASP.NET service collection), FastAPI Depends, plugin systems with runtime registration, and — the one the companion benchmark hit directly — middleware chains composed at app startup. Hono's app.use(...) builds the middleware array at runtime; tree-sitter sees the use call sites and the handler as unconnected syntax. When the benchmark's Q2 asked Claude Code to trace the middleware call stack, what codegraph_trace could return was the syntactic call chain through compose() — accurate as far as it goes, and genuinely fewer steps than baseline grep — but the actual runtime ordering of middlewares is assembled by app.use calls scattered across the app, which the graph doesn't compose. The trace looked authoritative and was structurally real; it just wasn't the runtime composition, and only someone who knew the leak zone would know to check.
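The runtime-registration leak is easy to reproduce in miniature. The sketch below uses Python's standard ast module as a stand-in for tree-sitter (a swap made for self-containment, not what CodeGraph runs), and the tiny route/dispatch program is invented for the demonstration. A purely syntactic extractor finds no caller for the handler, yet the handler runs.

```python
import ast

SOURCE = '''
ROUTES = {}

def route(path):
    def register(fn):
        ROUTES[path] = fn      # runtime registration: no call edge in the syntax
        return fn
    return register

@route("/users")
def list_users():
    return ["ada", "grace"]

def dispatch(path):
    handler = ROUTES[path]     # the target depends on runtime state
    return handler()

def main():
    return dispatch("/users")
'''


def syntactic_call_edges(source: str) -> set:
    """(caller, callee) pairs for calls to plain names inside function bodies."""
    edges = set()
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef):
            continue
        for stmt in fn.body:  # body only, so decorators are not attributed to fn
            for node in ast.walk(stmt):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((fn.name, node.func.id))
    return edges


edges = syntactic_call_edges(SOURCE)
callers_of_list_users = {caller for caller, callee in edges if callee == "list_users"}

# Now actually run the program and see what executes.
namespace = {}
exec(SOURCE, namespace)
runtime_result = namespace["main"]()

print(sorted(edges))
print(callers_of_list_users)
print(runtime_result)
```

The extractor's output is accurate as far as it goes: main calls dispatch, and dispatch calls something named handler. What it cannot say is that handler is list_users, and the empty caller set gives no hint that anything is missing.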

The empirical check, and the null result that sharpens it

I expected unresolved_refs to be where this shows up: index a macro-heavy repo, watch the table fill. So I indexed three to test it: Hono (TypeScript), click (Python, decorator-heavy), and ron (a Rust crate leaning on derive macros and serde). unresolved_refs was zero on all three; heuristic edges were 7, 0, and 0.

The null result is the finding. A #[derive(Serialize)] impl never appears as an unresolved reference, because nothing in the source ever wrote a reference to it to leave dangling; the impl only exists after macro expansion. codegraph callers serialize on ron returns its seven real syntactic callers and silently omits whatever the derive generates, with no flag and no empty-table warning, because from the index's point of view nothing is missing.

And that is the trap. An empty unresolved_refs table reads like a clean bill of health, but on derive-heavy or reflection-heavy code it means the opposite of "everything resolved": it means the thing that didn't resolve never left a trace to flag. The table catches references it can't resolve; it cannot catch code that was never written down to reference. That's the leak that costs you: not the guess that gets flagged, but the absence that looks exactly like completeness. It's the same failure shape as the memory post's "could" stored as "did": the dangerous error is always the one that wears the face of a correct answer.

Why mapping the leaks matters

A tool you trust everywhere is a tool you stop checking. The four zones above are where the LLM, trusting the graph, gives you confidently wrong answers — and those are the failures that cost real engineering time, because the answer looks right and you have no reason to second-guess it.

The practical rule is small. Inside one of these zones (heavy macros, reflection/DI, codegen-heavy projects, runtime-composed bindings) CodeGraph is still a fine starting point, but the LLM's answer has to be cross-checked against the runtime, not against the graph. Outside them, in most application code in most languages, which is most of what most people query, the graph is enough. The provenance column tells you which present edges were guessed; nothing tells you which absent edges were never seen. That asymmetry is the actual trust boundary, and it's the thing to internalize before you wire any syntactic index into an agent's decision loop. Joel Spolsky named this pattern for compilers and frameworks more than twenty years ago: every abstraction leaks, and you pay for the leak precisely when you've forgotten the abstraction is there. CodeGraph is the latest data point in a very old series.

Mapping CodeGraph to the six conditions

Field-by-field, how CodeGraph hits each condition from Agent Retrieval Is a Cost Curve Problem. Compressed; the prior post defines the conditions, the companion post applies them empirically.

1. No-compile parsing. Tree-sitter parses source into an AST with no build invocation, no dependency resolution, no language environment. On Hono, 362 files indexed to 4,128 nodes and 8,225 edges in 1.7 seconds; the published 7-repo benchmark reports first-index on the order of minutes for VS Code-scale (~30k files), all subsequent updates incremental. LSP needs tsc / cargo check / mvn; CodeGraph reads raw text. Met.

2. Language portability. ~19 languages via tree-sitter, plus framework adapters for route-aware extraction (Hono's 873 route nodes come from one of them). One binary, no per-language server. Met.

3. LLM-shaped API. Here the scaffold version of this post — and a lot of the casual coverage — gets a fact wrong worth correcting precisely. The CLI exposes a dozen commands (query, callers, callees, impact, affected, context, …). But the MCP server exposes exactly five tools to the agent: codegraph_search (locations only), codegraph_context (described in its own schema as the PRIMARY tool, call FIRST for any how-does-X-work question), codegraph_node (one symbol plus its callers/callees trail), codegraph_explore (several related symbols in one capped call), and codegraph_trace (the call path between two symbols). The narrowing is the design: the human CLI gets impact and affected as separate verbs; the agent gets a context-first surface of five flat tools, each returning {symbol, file, line, snippet, related[]}-shaped records, with the instruction snippet steering it to codegraph_context before anything else. Ten tools would be worse for an LLM than five; CodeGraph picked five. Met, deliberately.

4. Coverage breadth. Symbol graph for structure; FTS5 over name, qualified_name, docstring, signature for text-fallback; Claude Code's native Grep stays enabled for everything outside the index. Partially met — the correct partial.

5. Live update without reindex. OS file-watcher with a short debounce; a save re-parses the touched file and re-resolves dependents' import edges. Met.

6. Zero-config install. Single binary, one-line install, auto-detects the agent, writes the MCP config and the instruction snippet, then codegraph init -i builds the index. About ten minutes from curiosity to a working setup on repos under ~1,000 files. Met.

Six for six. The architecture the prior post argued was theoretically right but practically missing exists, in production, with a working installer — and, read against its own schema, the choices hold up under inspection rather than just on the landing page.

What this says about LLM retrieval as a discipline

Three things, in increasing order of generality.

1. The right LLM-index design is not a copy of human-IDE design. Sourcegraph and LSP were built for a human reading one precise answer; an LLM reads many cheap rounds and reasons across them. The architectures should differ, and CodeGraph's choices — tree-sitter not LSP, five flat MCP tools not a nested LSP API, FTS5 not vectors — are evidence of someone designing for the actual consumer instead of porting an existing design. The framework predicts the design space, and the interesting variation between the tools that will fill it is not in the six conditions (those are now the table stakes) but in the ranking layer — how each one orders the symbols a query surfaces. That's where the next tool will try to win, and where the next benchmark should aim.

2. The cost-curve frame is recursive. It applies to every layer of an LLM stack, including the tools that wrap the LLM. CodeGraph's FTS5-not-Chroma choice is the same shape as the original grep-not-RAG choice. Use it as a meta-design tool: at every layer, ask which side of the curve the workload sits on, and let that pick the storage and the algorithm.

3. The abstraction leaks are the trust boundary — and trust, in the end, has to terminate at the source. This is the thread that runs through the whole series. CodeGraph's graph is a derived view of the source: a cache. Its heuristic provenance tags and its unresolved_refs table are the parts where it keeps an arrow back to that source and is honest about what it did and didn't see. But a syntactic graph is still a lossy projection of a running program, and the leak zones are exactly where the projection drops information that only exists at runtime. The discipline that falls out of this is the same one the retrieval post and the memory post arrived at from their own directions: a derived artifact is trustworthy only where you can check it against the source that produced it. CodeGraph is fast and exact in the 80% of code where syntax determines structure, and quietly incomplete in the 20% where it doesn't — and the only way to stay out of the failure modes is to remember the graph is a cache and keep the real code, the actual runtime, as the thing that wins every conflict.

The bigger move CodeGraph represents — third-party MCP tools filling the retrieval gap the foundation model's main agent doesn't fill — is the ecosystem direction the feature-flag analysis in the prior post suggested Anthropic is hedging toward. Whether Anthropic eventually builds tree-sitter symbol-graph functionality natively or leaves it to the CodeGraph-class ecosystem is a product call. The technical case for "let MCP fill it" is strong: the design space is still settling, and locking one approach into Claude Code spends option value the ecosystem is currently pricing for free.

Closing — the mini-series arc

This is the third of a three-part Lab series on Claude Code's retrieval and memory architectures:

Agent Retrieval Is a Cost Curve Problem (2026-05-25) — why grep+loop, not RAG, for most projects
Agent Memory Is a Cache Coherence Problem (2026-05-28) — why hand-curated Markdown, not lossy vector recall, for cross-session memory
This post (2026-06-08) — what lives above the cost-curve crossover: CodeGraph as the architecturally coherent symbol-graph companion the first post argued was missing, read first-principles against its own index for what its choices say about the discipline

Read together, the three describe one stance on agent retrieval and memory: choose lossless and exact by default; expose MCP as the integration substrate; let third-party tools fill the gaps you don't want to own; and keep an arrow back to the source everywhere, because every derived view is a cache and the source is the only thing that can't drift from itself. The cost-curve frame is the math, the cache-coherence frame is the failure taxonomy, and the first-principles read of CodeGraph is what the architecture, looked at carefully, says about where LLM retrieval is going.

If you're building agent retrieval, the three frames are now in your toolkit. The companion empirical post gives you the install-or-not decision; this one gives you the lens for the next ten tools that ship in the same space.

Companion piece 1 (this is the third in a 3-post Lab series): Agent Retrieval Is a Cost Curve Problem: Why Claude Code Doesn't Use RAG
Companion piece 2: Agent Memory Is a Cache Coherence Problem
Empirical pair on the Operator track: I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't.
Background: Consistency in Distributed Systems: Scenarios, Trade-offs, and What Actually Works
CodeGraph repo: https://github.com/colbymchenry/codegraph
