<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dhevenddra</title>
    <description>The latest articles on DEV Community by Dhevenddra (@dhev_).</description>
    <link>https://dev.to/dhev_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4004462%2F293472a9-708a-4639-bd78-84ec29d54839.png</url>
      <title>DEV Community: Dhevenddra</title>
      <link>https://dev.to/dhev_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dhev_"/>
    <language>en</language>
    <item>
      <title>Grounding AI coding agents with a confidence-tagged code knowledge graph</title>
      <dc:creator>Dhevenddra</dc:creator>
      <pubDate>Sat, 27 Jun 2026 05:28:07 +0000</pubDate>
      <link>https://dev.to/dhev_/grounding-ai-coding-agents-with-a-confidence-tagged-code-knowledge-graph-5ff5</link>
      <guid>https://dev.to/dhev_/grounding-ai-coding-agents-with-a-confidence-tagged-code-knowledge-graph-5ff5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Disclosure:&lt;/strong&gt; this is my own open-source project (&lt;code&gt;forensic-deepdive&lt;/code&gt;, Apache-2.0). I'm sharing it here because the dev.to crowd tends to have sharp opinions on agent tooling and I want the critique.&lt;/p&gt;

&lt;p&gt;Most "repo context" tooling for AI agents is retrieval: embed the files, fetch the chunks that look similar to the prompt, hope the model reasons over them. That's a fine baseline, but it answers "what text looks relevant," not "what breaks if I change this," "which files are load-bearing," or "who owns this and is the bus factor 1." Those are graph and git-history questions.&lt;/p&gt;

&lt;p&gt;Here's how I built a tool that answers them, what it gets right, and, explicitly, what it doesn't yet do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fk7avdecn7r9bbqkecvxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fk7avdecn7r9bbqkecvxp.png" alt=" " width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: retrieval isn't grounding
&lt;/h2&gt;

&lt;p&gt;The current wave of AI coding tools has largely settled on retrieval-augmented context: embed the repository, fetch the chunks most similar to the prompt, and let the model reason over them. It's useful and it's the right baseline. But it has a structural ceiling.&lt;/p&gt;

&lt;p&gt;Similarity retrieval answers "what text looks relevant?" It does not answer the questions a competent engineer answers reflexively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If I change this function, what actually breaks, transitively, across files?&lt;/li&gt;
&lt;li&gt;Which 20 files out of 2,000 are load-bearing?&lt;/li&gt;
&lt;li&gt;Who has historically owned this module, and is the bus factor 1?&lt;/li&gt;
&lt;li&gt;Does this frontend call resolve to a backend handler, and which one?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are structural and historical questions. You answer them with a graph and with git history, not with cosine similarity. &lt;code&gt;forensic-deepdive&lt;/code&gt; is an open-source (Apache-2.0) tool built around exactly that premise.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it produces
&lt;/h2&gt;

&lt;p&gt;Point it at a repository and it emits three coordinated outputs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fp70cnxl4rxnfpmclgky8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fp70cnxl4rxnfpmclgky8.png" alt=" " width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A persistent embedded knowledge graph (&lt;code&gt;&amp;lt;repo&amp;gt;/.deepdive/graph.lbug&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;An MCP server exposing 9 composite tools to any MCP-aware agent.&lt;/li&gt;
&lt;li&gt;Five durable markdown artifacts, a human-readable projection of the graph.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's built on tree-sitter (parsing across 9 languages), a PageRank-style repo-map for centrality, an embedded graph database, and &lt;code&gt;git log&lt;/code&gt; for the historical layer. Extraction runs entirely locally: no LLM calls, no network access, and no API keys. Cloud and semantic features are strictly opt-in.&lt;/p&gt;
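
&lt;p&gt;To make "PageRank-style repo-map" concrete, here is a minimal power-iteration PageRank over a file import graph. It is a sketch under stated assumptions (the file names, the 0.85 damping factor, and the fixed 50 iterations are illustrative, and edges point from the importing file to the imported one), not forensic-deepdive's actual ranking code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pagerank(imports, damping=0.85, iterations=50):
    """Power-iteration PageRank over a file import graph.

    imports maps each file to the list of files it imports. Each import
    edge passes rank from the importing file to the imported one.
    """
    nodes = set(imports)
    for targets in imports.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport term: every file gets a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Files that import nothing (dangling nodes) spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if not imports.get(node))
        for node in nodes:
            new_rank[node] += damping * dangling / n
        # Every other file splits its rank across the files it imports.
        for src, targets in imports.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


graph = {
    "app.py": ["db.py", "utils.py"],
    "api.py": ["db.py", "utils.py"],
    "db.py": ["utils.py"],
    "utils.py": [],
}
scores = pagerank(graph)
for path in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[path]:.3f}  {path}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because rank flows from importers to the files they import, widely imported utilities float to the top, which is the kind of load-bearing-file signal a centrality pass is meant to surface.&lt;/p&gt;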

&lt;h2&gt;
  
  
  The graph schema
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fufawls03w8yftijpmhiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fufawls03w8yftijpmhiy.png" alt=" " width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Node types: File, Symbol, Module, Commit, Author, Endpoint, DbTable.&lt;/p&gt;

&lt;p&gt;Edge types: &lt;code&gt;DEFINES&lt;/code&gt;, &lt;code&gt;MEMBER_OF&lt;/code&gt;, &lt;code&gt;IMPORTS&lt;/code&gt;, &lt;code&gt;CALLS&lt;/code&gt;, &lt;code&gt;EXTENDS&lt;/code&gt;, &lt;code&gt;IMPLEMENTS&lt;/code&gt;, &lt;code&gt;TOUCHED_BY_COMMIT&lt;/code&gt;, &lt;code&gt;AUTHORED_BY&lt;/code&gt;, &lt;code&gt;CO_CHANGES_WITH&lt;/code&gt;, and the cross-boundary set: &lt;code&gt;HANDLES&lt;/code&gt;, &lt;code&gt;CALLS_ENDPOINT&lt;/code&gt;, &lt;code&gt;ROUTES_TO&lt;/code&gt;, &lt;code&gt;INJECTS&lt;/code&gt;, &lt;code&gt;PERSISTS_TO&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The structural edges (&lt;code&gt;CALLS&lt;/code&gt;/&lt;code&gt;IMPORTS&lt;/code&gt;/&lt;code&gt;EXTENDS&lt;/code&gt;) come from AST analysis. The historical edges (&lt;code&gt;TOUCHED_BY_COMMIT&lt;/code&gt;/&lt;code&gt;AUTHORED_BY&lt;/code&gt;/&lt;code&gt;CO_CHANGES_WITH&lt;/code&gt;) come from git. The cross-boundary edges come from protocol-specific extractors. They live in one graph, so an agent can ask a structural question and a historical question in the same breath.&lt;/p&gt;
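
&lt;p&gt;As a rough illustration of why one graph helps, the sketch below stores edges as plain (source, type, target) triples and answers "who has been changing the files that call this symbol?" by chaining two structural hops with two historical hops. The triples, symbol names, and edge directions are assumptions for the example; the real tool keeps its graph in an embedded database and exposes it to Cypher queries.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict

# Invented triples. Edge names follow the schema above; the direction of
# each edge (for example File DEFINES Symbol) is an assumption.
EDGES = [
    ("src/api.py", "DEFINES", "handle_order"),
    ("src/jobs.py", "DEFINES", "nightly_sync"),
    ("src/db.py", "DEFINES", "save_order"),
    ("handle_order", "CALLS", "save_order"),
    ("nightly_sync", "CALLS", "save_order"),
    ("src/api.py", "TOUCHED_BY_COMMIT", "c1"),
    ("src/jobs.py", "TOUCHED_BY_COMMIT", "c2"),
    ("src/jobs.py", "TOUCHED_BY_COMMIT", "c3"),
    ("c1", "AUTHORED_BY", "alice"),
    ("c2", "AUTHORED_BY", "bob"),
    ("c3", "AUTHORED_BY", "alice"),
]

# Index edges in both directions, keyed by (node, edge type).
out_edges, in_edges = defaultdict(list), defaultdict(list)
for src, kind, dst in EDGES:
    out_edges[(src, kind)].append(dst)
    in_edges[(dst, kind)].append(src)


def authors_of_callers(symbol):
    """Two structural hops (CALLS, DEFINES), then two historical ones."""
    authors = set()
    for caller in in_edges[(symbol, "CALLS")]:
        for path in in_edges[(caller, "DEFINES")]:
            for commit in out_edges[(path, "TOUCHED_BY_COMMIT")]:
                authors.update(out_edges[(commit, "AUTHORED_BY")])
    return sorted(authors)


print(authors_of_callers("save_order"))  # ['alice', 'bob']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;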

&lt;h2&gt;
  
  
  The design decision that matters most: honesty
&lt;/h2&gt;

&lt;p&gt;The failure mode of a code-graph tool is silent confidence. If the graph asserts a &lt;code&gt;CALLS&lt;/code&gt; edge that is really just a name match between two same-named symbols in different files, an agent will trust it and "fix" code that never needed touching. High recall with hidden false positives is actively dangerous in an autonomous loop.&lt;/p&gt;

&lt;p&gt;So every edge and every emitted claim carries a confidence tag:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsux1i4s0iuvav8syf8ow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fsux1i4s0iuvav8syf8ow.png" alt=" " width="800" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;EXTRACTED&lt;/code&gt;: deterministic from the AST or &lt;code&gt;git log&lt;/code&gt;. A fact.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;INFERRED&lt;/code&gt;: a heuristic resolved cleanly (import-graph walk, receiver-type inference, single same-name candidate). High-trust but derived.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AMBIGUOUS&lt;/code&gt;: multiple candidates; the resolver couldn't disambiguate, so it surfaces every candidate rather than guessing.&lt;/li&gt;
&lt;/ul&gt;
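
&lt;p&gt;A minimal sketch of how name-based call resolution could assign those tags, assuming a lookup of which files define each name and which files each file imports. Both inputs and the &lt;code&gt;resolve_call&lt;/code&gt; helper are hypothetical, not the project's resolver. &lt;code&gt;EXTRACTED&lt;/code&gt; facts such as &lt;code&gt;DEFINES&lt;/code&gt; come straight off the AST, so only the two heuristic tiers appear here.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: str
    confidence: str


def resolve_call(caller, caller_file, name, definitions, imports):
    """Resolve a call by name and tag the resulting CALLS edges.

    definitions maps a symbol name to the files that define it;
    imports maps a file to the files it imports.
    """
    candidates = definitions.get(name, [])
    imported = [path for path in candidates if path in imports.get(caller_file, [])]
    if len(imported) == 1:
        # Import-graph walk: exactly one candidate is actually imported.
        chosen = imported[0]
    elif len(candidates) == 1:
        # Only one symbol with this name exists anywhere in the repo.
        chosen = candidates[0]
    else:
        # Several candidates: emit one AMBIGUOUS edge per candidate instead of
        # guessing (an unknown name yields no edges at all).
        return [Edge(caller, f"{path}::{name}", "CALLS", "AMBIGUOUS") for path in candidates]
    return [Edge(caller, f"{chosen}::{name}", "CALLS", "INFERRED")]


definitions = {"save": ["models/user.py", "models/order.py"]}
imports = {"api/orders.py": ["models/order.py"]}
# One INFERRED edge: the caller's file imports only one of the candidates.
print(resolve_call("place_order", "api/orders.py", "save", definitions, imports))
# Two AMBIGUOUS edges: nothing narrows the choice.
print(resolve_call("nightly", "jobs/cron.py", "save", definitions, imports))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;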

&lt;p&gt;The tags surface in every output. &lt;code&gt;HOTPATHS&lt;/code&gt; carries a per-row confidence-mix column, so you can tell a symbol that resolved cleanly (mostly &lt;code&gt;EXTRACTED&lt;/code&gt;/&lt;code&gt;INFERRED&lt;/code&gt;) from one drowning in same-name collisions (&lt;code&gt;AMBIGUOUS&lt;/code&gt;). The tool tells you, per claim, how much to trust it.&lt;/p&gt;
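
&lt;p&gt;A confidence-mix column can be derived by counting the tags on a symbol's incoming &lt;code&gt;CALLS&lt;/code&gt; edges. The sketch below assumes edges arrive as (callee, confidence) pairs and formats rounded percentages; the exact column format in &lt;code&gt;HOTPATHS&lt;/code&gt; may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def confidence_mix(edges):
    """Map each callee to a summary like 'AMBIGUOUS 75%, INFERRED 25%'."""
    per_symbol = {}
    for callee, confidence in edges:
        per_symbol.setdefault(callee, Counter())[confidence] += 1
    rows = {}
    for callee, counts in per_symbol.items():
        total = sum(counts.values())
        rows[callee] = ", ".join(
            f"{label} {round(100 * count / total)}%"
            for label, count in counts.most_common()
        )
    return rows


edges = (
    [("save", "AMBIGUOUS")] * 6
    + [("save", "INFERRED")] * 2
    + [("render", "EXTRACTED")] * 3
    + [("render", "INFERRED")]
)
for symbol, mix in confidence_mix(edges).items():
    print(f"{symbol}: {mix}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A row dominated by &lt;code&gt;AMBIGUOUS&lt;/code&gt;, like &lt;code&gt;save&lt;/code&gt; here, hints that its caller count is inflated by same-name collisions.&lt;/p&gt;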

&lt;h2&gt;
  
  
  One abstraction for five protocols: the Endpoint keystone
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3rliu2bk53nyia50nvqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3rliu2bk53nyia50nvqs.png" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cross-boundary tracing is where most tools stop, because each protocol looks different. forensic-deepdive routes all of them through a single &lt;code&gt;Endpoint&lt;/code&gt; join node. Five protocols (HTTP, MCP tools, registry dispatch, gRPC, and messaging/AMQP) share that one node and a protocol-blind join, so a frontend call resolves to its backend handler across the whole stack as a single &lt;code&gt;ROUTES_TO&lt;/code&gt; edge.&lt;/p&gt;

&lt;p&gt;The architectural consequence: adding a sixth protocol takes only a new key builder plus provider and consumer extractors. It never touches the trace, emit, or serve layers, because the surfacing layer is protocol-blind by design. &lt;code&gt;trace(symbol)&lt;/code&gt; walks &lt;code&gt;frontend call -&amp;gt; CALLS_ENDPOINT -&amp;gt; Endpoint -&amp;gt; HANDLES -&amp;gt; handler -&amp;gt; CALLS tail&lt;/code&gt; the same way no matter which protocol produced the edges.&lt;/p&gt;
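
&lt;p&gt;The shape of that separation can be sketched in a few dozen lines. Here each protocol contributes only a key builder that normalizes its address into one Endpoint key, while the join and the trace walk never look at the protocol. The key formats, symbol names, and helper functions are all hypothetical and only meant to illustrate the idea, not forensic-deepdive's extractors.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict

# Hypothetical key builders (three of the five protocols): each normalizes its
# own address into one Endpoint key string. A new protocol adds one entry.
KEY_BUILDERS = {
    "http": lambda method, path: f"http:{method.upper()} {path.rstrip('/') or '/'}",
    "grpc": lambda service, rpc: f"grpc:{service}/{rpc}",
    "amqp": lambda exchange, routing_key: f"amqp:{exchange}:{routing_key}",
}


def endpoint_key(protocol, *parts):
    return KEY_BUILDERS[protocol](*parts)


def join_endpoints(providers, consumers):
    """Protocol-blind join on the Endpoint key.

    providers: (handler, key) pairs, i.e. HANDLES edges.
    consumers: (caller, key) pairs, i.e. CALLS_ENDPOINT edges.
    Returns ROUTES_TO edges as (caller, handler) pairs.
    """
    handlers = defaultdict(list)
    for handler, key in providers:
        handlers[key].append(handler)
    return [(caller, handler) for caller, key in consumers for handler in handlers[key]]


def trace(symbol, routes, calls):
    """Depth-first walk that treats ROUTES_TO and CALLS edges alike."""
    next_hops = defaultdict(list)
    for caller, handler in routes:
        next_hops[caller].append(handler)
    for caller, callees in calls.items():
        next_hops[caller].extend(callees)
    order, seen, stack = [], set(), [symbol]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(reversed(next_hops[current]))
    return order


providers = [
    ("orders.create_order", endpoint_key("http", "post", "/api/orders/")),
    ("billing.charge", endpoint_key("grpc", "billing.Billing", "Charge")),
]
consumers = [
    ("web.submitOrder", endpoint_key("http", "POST", "/api/orders")),
    ("orders.create_order", endpoint_key("grpc", "billing.Billing", "Charge")),
]
routes = join_endpoints(providers, consumers)
calls = {"orders.create_order": ["orders.validate", "orders.save"]}
print(trace("web.submitOrder", routes, calls))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The walk crosses an HTTP hop and then a gRPC hop without either function knowing which protocol produced which edge.&lt;/p&gt;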

&lt;h2&gt;
  
  
  The historical layer agents are blind to
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqm9chtkcbgt2rz95hxd4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqm9chtkcbgt2rz95hxd4.png" alt=" " width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Git archaeology is a first-class layer, not a footnote: churn, top authors with their percentage share, bus factor, co-change clusters (files that tend to change in the same commits), and defect proximity (proximity to bug-fix commits). In hands-on testing this was consistently the highest-trust, highest-value layer: the fastest way to learn where risk is concentrated and whom to ask. A graph tells you the shape; archaeology tells you the story.&lt;/p&gt;
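
&lt;p&gt;Most of these signals reduce to counting over commit history. The sketch below works on an in-memory list of (author, files) commits, which in practice you might parse from &lt;code&gt;git log --name-only&lt;/code&gt;. The history is invented, defect proximity is omitted, and "bus factor" uses one common heuristic (the fewest authors who together made more than half of a file's commits); forensic-deepdive's exact definitions may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter
from itertools import combinations

# Invented history: (author, files touched in that commit).
COMMITS = [
    ("alice", ["billing.py", "invoice.py"]),
    ("alice", ["billing.py", "invoice.py"]),
    ("alice", ["billing.py"]),
    ("bob", ["billing.py", "invoice.py"]),
    ("carol", ["README.md"]),
]


def churn(commits):
    """Number of commits that touched each file."""
    return Counter(path for _, files in commits for path in set(files))


def author_counts(commits, path):
    return Counter(author for author, files in commits if path in files)


def top_authors(commits, path):
    """Authors of one file with their percentage share of its commits."""
    counts = author_counts(commits, path)
    total = sum(counts.values())
    return [(author, round(100 * n / total)) for author, n in counts.most_common()]


def bus_factor(commits, path):
    """Fewest authors who together made more than half of the file's commits."""
    counts = author_counts(commits, path)
    total = sum(counts.values())
    covered = 0
    for needed, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if 2 * covered &amp;gt; total:
            return needed
    return 0  # the file was never touched


def co_change(commits, top_n=3):
    """File pairs that most often change in the same commit."""
    pairs = Counter()
    for _, files in commits:
        pairs.update(combinations(sorted(set(files)), 2))
    return pairs.most_common(top_n)


print(churn(COMMITS).most_common(2))
print(top_authors(COMMITS, "billing.py"))
print(bus_factor(COMMITS, "billing.py"))
print(co_change(COMMITS))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On this toy history &lt;code&gt;billing.py&lt;/code&gt; comes out with a bus factor of 1 (alice made three of its four commits), the kind of concentrated risk the archaeology layer is meant to surface.&lt;/p&gt;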

&lt;h2&gt;
  
  
  The 9 MCP tools
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;impact&lt;/code&gt;: blast-radius BFS over CALLS edges, depth-bucketed, confidence-filterable.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;context&lt;/code&gt;: one-call kitchen sink: definition + callers + callees + parents/members + recent commits + dominant author + insights.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;archaeology&lt;/code&gt;: churn, top authors, bus factor, co-change cluster, defect proximity.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flow&lt;/code&gt;: DFS over CALLS with cycle detection.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query&lt;/code&gt;: raw Cypher, or hybrid natural-language retrieval (BM25 + structural signal + optional offline semantic search, fused with Reciprocal Rank Fusion, RRF).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;record_insight&lt;/code&gt;: persist a verified learning about a symbol.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;recall_insights&lt;/code&gt;: recall stored insights, newest-first.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;visualize&lt;/code&gt;: bounded Mermaid diagram of a neighborhood; edge dash style encodes confidence.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;trace&lt;/code&gt;: cross-stack feature slice across the Endpoint join node.&lt;/li&gt;
&lt;/ul&gt;
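
&lt;p&gt;The two traversal tools are classic graph walks. Below is a minimal sketch of both over plain dictionaries rather than the embedded graph; the data, signatures, and defaults are assumptions. &lt;code&gt;impact&lt;/code&gt; runs breadth-first over reversed &lt;code&gt;CALLS&lt;/code&gt; edges, buckets callers by depth, and skips edges whose confidence is not allowed; &lt;code&gt;flow&lt;/code&gt; runs depth-first over forward &lt;code&gt;CALLS&lt;/code&gt; edges and records a cycle whenever it meets a symbol already on the current path.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque


def impact(symbol, callers, allowed=("EXTRACTED", "INFERRED"), max_depth=3):
    """Blast radius: BFS over reversed CALLS edges, bucketed by depth.

    callers maps a symbol to (caller, confidence) pairs. AMBIGUOUS edges
    are skipped unless explicitly allowed.
    """
    buckets, seen = {}, {symbol}
    queue = deque([(symbol, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for caller, confidence in callers.get(current, []):
            if confidence in allowed and caller not in seen:
                seen.add(caller)
                buckets.setdefault(depth + 1, []).append(caller)
                queue.append((caller, depth + 1))
    return buckets


def flow(symbol, calls):
    """DFS over forward CALLS edges; returns visit order and any cycles."""
    order, cycles, on_path, done = [], [], [], set()

    def visit(node):
        if node in on_path:
            # Back edge: the chain loops to a node still on the current path.
            cycles.append(on_path[on_path.index(node):] + [node])
            return
        if node in done:
            return
        on_path.append(node)
        order.append(node)
        for callee in calls.get(node, []):
            visit(callee)
        on_path.pop()
        done.add(node)

    visit(symbol)
    return order, cycles


callers = {
    "save_order": [("create_order", "INFERRED"), ("retry_job", "AMBIGUOUS")],
    "create_order": [("post_orders", "EXTRACTED")],
}
print(impact("save_order", callers))
print(impact("save_order", callers, allowed=("EXTRACTED", "INFERRED", "AMBIGUOUS")))

calls = {"a": ["b"], "b": ["c"], "c": ["a", "d"]}
print(flow("a", calls))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Widening &lt;code&gt;allowed&lt;/code&gt; to include &lt;code&gt;AMBIGUOUS&lt;/code&gt; pulls &lt;code&gt;retry_job&lt;/code&gt; into the blast radius: the recall-versus-precision trade-off in miniature.&lt;/p&gt;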

&lt;p&gt;Each tool description is kept to roughly 200 tokens or fewer, so all nine fit comfortably inside an agent's per-turn metadata budget.&lt;/p&gt;
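
&lt;p&gt;The hybrid mode of &lt;code&gt;query&lt;/code&gt; fuses several rankings with Reciprocal Rank Fusion, which scores each result as the sum of 1 / (k + rank) over the lists it appears in. A minimal sketch, using the conventional k = 60 and made-up result lists standing in for the BM25, structural, and optional semantic rankers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25 = ["utils.parse", "api.handle", "db.save"]
structural = ["db.save", "api.handle", "cli.main"]
semantic = ["api.handle", "utils.parse"]  # only when offline semantic search is enabled

use_semantic = True
rankings = [bm25, structural] + ([semantic] if use_semantic else [])
print(rrf(rankings))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Without the semantic list, &lt;code&gt;db.save&lt;/code&gt; narrowly wins (first in one list, third in the other); adding the semantic ranker lifts &lt;code&gt;api.handle&lt;/code&gt;, which appears in all three. RRF only needs ranks, not comparable scores, which is why it suits fusing BM25 with structural signals.&lt;/p&gt;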

&lt;h2&gt;
  
  
  How an agent integrates, without you wiring anything
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4jqmplzubbv1kigcknax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F4jqmplzubbv1kigcknax.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On extract, the tool drops write-if-absent shims into the target repo: a &lt;code&gt;CLAUDE.md&lt;/code&gt;, an &lt;code&gt;AGENTS.md&lt;/code&gt;, a &lt;code&gt;.cursor/rules&lt;/code&gt; file, a &lt;code&gt;.continue/rules&lt;/code&gt; file, a Claude Code plugin manifest, and five single-intent skills (&lt;code&gt;codebase-exploring&lt;/code&gt;, &lt;code&gt;-debugging&lt;/code&gt;, &lt;code&gt;-impact-analysis&lt;/code&gt;, &lt;code&gt;-refactoring&lt;/code&gt;, &lt;code&gt;-onboarding&lt;/code&gt;). Each skill's description encodes when to use it, and when to route to a sibling instead.&lt;/p&gt;
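
&lt;p&gt;Write-if-absent semantics map directly onto exclusive file creation. In the minimal Python sketch below (the &lt;code&gt;write_if_absent&lt;/code&gt; helper, the file contents, and the &lt;code&gt;.cursor/rules/deepdive.mdc&lt;/code&gt; name are hypothetical), opening with mode &lt;code&gt;"x"&lt;/code&gt; creates the file only if it does not exist and raises &lt;code&gt;FileExistsError&lt;/code&gt; otherwise, so an existing, possibly hand-edited file is left untouched.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import tempfile
from pathlib import Path


def write_if_absent(root, relative_path, content):
    """Create a shim file unless something already exists at that path.

    Returns True if the file was written, False if it was left alone.
    """
    target = Path(root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" is exclusive creation: it fails if the file already exists.
        with open(target, "x", encoding="utf-8") as handle:
            handle.write(content)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as repo:
    created = write_if_absent(repo, "AGENTS.md", "Read AGENT_BRIEF.md first.\n")
    nested = write_if_absent(repo, ".cursor/rules/deepdive.mdc", "Use the MCP tools.\n")
    overwritten = write_if_absent(repo, "AGENTS.md", "Replacement text.\n")
    kept = (Path(repo) / "AGENTS.md").read_text(encoding="utf-8")

print(created, nested, overwritten)
print(kept)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using exclusive creation rather than an exists-then-write check also avoids a race where another process creates the file between the check and the write.&lt;/p&gt;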

&lt;p&gt;The result: a fresh agent opening the repo auto-discovers &lt;code&gt;AGENT_BRIEF.md&lt;/code&gt;, selects the right skill for the task unprompted, and has the 9 MCP tools available. The &lt;code&gt;record_insight&lt;/code&gt;/&lt;code&gt;recall_insights&lt;/code&gt; pair gives it memory that outlives the context window. (Hand-edited files are never overwritten; the shims only fill gaps.)&lt;/p&gt;
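
&lt;p&gt;Insight memory can be as simple as an append-only log keyed by symbol. The sketch below is hypothetical throughout (the &lt;code&gt;InsightStore&lt;/code&gt; class, the JSON Lines format, and the &lt;code&gt;.deepdive/insights.jsonl&lt;/code&gt; path are assumptions for illustration, not the tool's storage); it only shows how newest-first recall falls out of an append-only log.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import tempfile
import time
from pathlib import Path


class InsightStore:
    """Hypothetical append-only insight log, one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def record_insight(self, symbol, text):
        entry = {"symbol": symbol, "text": text, "recorded_at": time.time()}
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(entry) + "\n")

    def recall_insights(self, symbol, limit=5):
        """Return up to `limit` insights for a symbol, newest first."""
        if not self.path.exists():
            return []
        with open(self.path, encoding="utf-8") as handle:
            entries = [json.loads(line) for line in handle if line.strip()]
        matching = [e["text"] for e in entries if e["symbol"] == symbol]
        # Lines are appended in chronological order, so reversing gives newest first.
        return matching[::-1][:limit]


with tempfile.TemporaryDirectory() as repo:
    store = InsightStore(Path(repo) / ".deepdive" / "insights.jsonl")
    store.record_insight("save_order", "Retries are idempotent via order_id.")
    store.record_insight("save_order", "Also called from retry_job; check it on changes.")
    recalled = store.recall_insights("save_order")

print(recalled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;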

&lt;h2&gt;
  
  
  What it gets right, and what it doesn't
&lt;/h2&gt;

&lt;p&gt;I ran a deliberately adversarial review: a fresh agent cross-checked every MCP answer against the actual files. The honest scorecard:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-trust:&lt;/strong&gt; git archaeology, exact Cypher/structural queries, and the pre-generated briefs. These were accurate and verifiable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verify-before-trusting:&lt;/strong&gt; &lt;code&gt;impact()&lt;/code&gt;/&lt;code&gt;context()&lt;/code&gt;/&lt;code&gt;flow()&lt;/code&gt; are excellent lead generators but optimize recall over precision. In a language with dynamic dispatch (Dart), some &lt;code&gt;CALLS&lt;/code&gt; edges are really references rather than calls, so a blast radius is a candidate set to verify, not a final answer. v0.8 added precision passes (distinct-caller counts, &lt;code&gt;AMBIGUOUS&lt;/code&gt; tiering for same-name collisions, honest degraded-mode flags) that directly address these findings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context-dependent:&lt;/strong&gt; NL &lt;code&gt;query()&lt;/code&gt; and &lt;code&gt;trace()&lt;/code&gt; shine on large web/backend codebases but add little on a tiny offline app. &lt;code&gt;trace()&lt;/code&gt; now says so explicitly when a graph has no endpoints.&lt;/p&gt;

&lt;p&gt;The honest one-liner from that review: a fast lead-generator and an excellent git-risk lens, not an authoritative source of truth. Used as "where should I look and what's risky," it's a clear net positive. Used with verify-the-claim discipline, it pays off.&lt;/p&gt;

&lt;p&gt;I'm equally clear on scope: v0.8 is an assisted-analysis tool. The autonomous end-to-end question (does seeding an agent with this make it resolve real issues measurably faster?) is still open. A model-free localization pilot is recorded in the repo (the static seed is a weak prior); the full measurement needs a GPU and a frontier main-agent endpoint, so it's deferred to v0.9. No autonomous-execution claims are made.&lt;/p&gt;

&lt;h2&gt;
  
  
  Roadmap (v0.9)
&lt;/h2&gt;

&lt;p&gt;The headline is an interactive CLI: a persistent &lt;code&gt;deepdive&lt;/code&gt; session, a &lt;code&gt;query&lt;/code&gt; REPL that keeps the graph open, a Textual TUI graph browser, and a guided &lt;code&gt;onboard&lt;/code&gt; wizard, all layered on top of the existing command runner (which stays for CI and agents). Also planned: the end-to-end usefulness measurement and a couple of reporting-precision fixes. The full deferred ledger is in the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it / contribute
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;forensic-deepdive
forensic extract /path/to/repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also in the MCP Registry (&lt;a href="https://registry.modelcontextprotocol.io" rel="noopener noreferrer"&gt;https://registry.modelcontextprotocol.io&lt;/a&gt;, &lt;code&gt;io.github.Dhevenddra/forensic-deepdive&lt;/code&gt;) and as a Claude Code plugin (&lt;code&gt;/plugin marketplace add Dhevenddra/forensic-deepdive&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;It's Apache-2.0 and open for contributions; see &lt;code&gt;CONTRIBUTING.md&lt;/code&gt; and the documented architectural invariants. If you work on agent context, code graphs, or developer tooling, I'd value your issues, PRs, and honest critique.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Dhevenddra/forensic-deepdive" rel="noopener noreferrer"&gt;https://github.com/Dhevenddra/forensic-deepdive&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
