Inferred context is not a dependency graph


A pattern I keep running into, first as a consultant and lately in places like r/CursorAI: a developer is working in one repo, asks an agent to change something that touches a shared library living in another repo, and the agent confidently writes code against an interface that changed six months ago. The methods do not exist any more. The suggestion looks right, reviews fine to a tired eye, and breaks at integration time.

The fix everyone reaches for now is an enterprise context engine that indexes the whole organisation and feeds the agent more context. Tabnine shipped a strong one this year. I want to be fair about this up front, because the rest of the post draws a line and the line only means something if the thing on the other side of it is good: the problem these engines target is real, grounding an agent in your actual systems clearly beats letting it guess from training data, and Tabnine is a serious product built by serious people.

But there are two different questions hiding under the phrase "cross-repo context," and they want different machinery. This is the third in a loose series about that confusion. The first looked at why a modelled catalog and a parsed graph are different categories; the second at why a code-symbol graph stops where infrastructure starts. This one is about the difference between context a model infers and a graph a parser derives, and why that difference is invisible right up until the moment it is the whole cost.

A note on what this post is, and isn't

This is not a competitive teardown. Tabnine's Enterprise Context Engine went generally available in February 2026, works alongside Cursor, GitHub Copilot and Claude Code rather than replacing them, and serves whichever agent a team already uses. It connects to repositories, CI, code review, docs and ticketing, builds a continuously updated model of the organisation, and hands the relevant slice to an agent at generation time. Tabnine was named a Visionary in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, for the second year running. The customer logos are the kind you would expect: large, regulated, careful. None of that is marketing fog. It is a real answer to a real problem, and if you are running agents across a large codebase it is worth your attention.

The argument here is narrower. "Context that makes my agent generate better code" and "the graph I check before I ship a breaking change" are different jobs, and the second one has a tolerance for error that inference cannot meet by construction. That is not a flaw in any context engine. It follows directly from the design choice that makes inference powerful in the first place.

If you are a platform engineer evaluating the wave of context infrastructure arriving this year, the practical takeaway is the same one I keep landing on: most teams need both kinds of thing, and the tools should look as different as the questions do.

Two questions hiding under "cross-repo context"

Sit with the problem for a minute and it splits cleanly.

The first question is generative. When the agent writes a call into the shared library, what does that library actually contain right now? What are its conventions, its recent changes, the half-finished migration nobody wrote down? This is a context problem. The better your index of code, conventions and history, the fewer wrong calls the agent makes. An inferred context layer is the right tool here, and a good one is genuinely hard to build. It reads everything, models the relationships semantically, and hands the agent the slice that matters. When it is slightly wrong, the cost is a suggestion you correct, which you were going to review anyway. The error is cheap because it lands inside a loop that already has a human in it.

The second question is structural. Before I change this shared thing, what across the organisation actually breaks? This is not a context problem. It is a graph problem. And the tolerance for being slightly wrong is on a completely different scale. If the answer to "what depends on this Terraform module" misses three repos, you do not find out at review time. You find out in production, three hours later, with Slack on fire. The error is expensive because it lands outside the loop, after the human has already signed off.

Same surface verb, "what depends on this," and almost nothing else in common. The first wants the model's best, fastest, most fluent guess. The second wants an answer you would be willing to gate a deploy on. That last phrase is the bar I hold everything in this post to, and it is a deliberately higher bar than "context that makes my agent a bit better."

Three ways to answer "what depends on this"

Here is the part that has shifted since I first started writing about this. A year ago the honest framing was a binary: inferred context versus a parsed graph. That binary is no longer quite right, because "deterministic" has become contested vocabulary. Several good products now claim some version of structure or determinism, and they mean genuinely different things by it. There are three architectures in the market, not two, and it is worth being precise about all three.

Inferred. A model reads your artifacts and decides what relates to what. This is the RAG-and-semantics camp: Tabnine's Enterprise Context Engine is the strongest current example, and its own materials are admirably direct about the mechanism, describing the engine as combining semantic retrieval with structural reasoning and enriching context with "inferred relationships." The edges are produced by a model's judgement. That is exactly the right tool for the generative question, where a fluent, broad, slightly-fuzzy picture beats a narrow exact one.

Registered. A human writes each entity into a typed catalog and the tooling queries it. This is the developer-portal camp: Backstage, Port, Roadie. Roadie made the case earlier this year, and made it well, that a typed entity graph with declared schemas gives an agent deterministic answers rather than the fuzzy output of semantic search over docs. They are right that it is deterministic to query. The catch is in the word registered: a catalog entry records what an engineer knew at the moment they wrote it, so the graph is only as current as the last person who updated the YAML, and the reason platform teams quietly abandon these catalogs is that the maintenance burden outpaces the value within a couple of quarters. Deterministic to read, stale by construction.

Parsed. Edges extracted from the source files that already define them. A Terraform module block referencing a git URL. A Dockerfile FROM pulling an internal image. A dependencies entry in a Chart.yaml. A reusable workflow referenced by uses:. These are not inferred and they are not separately registered. They are already written down, in manifests, in formats that parse deterministically, and the relationship either exists in the source or it does not. This is the camp Riftmap sits in, and the property that matters is that it is deterministic and self-updating: there is nothing for a human to maintain and nothing for a model to guess, because the declaration is the dependency.
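To make "the declaration is the dependency" concrete, here is a minimal sketch of reading three of those declarations straight out of their files. The regular expressions, the function name and the sample repo names are my own simplifications for illustration, not Riftmap's implementation; a real parser would use an HCL parser, a Dockerfile parser and a YAML loader rather than line patterns.

```python
import re

# Hypothetical, simplified patterns, one per manifest format.
# They ignore comments, heredocs, flags like `FROM --platform=...`, and so on.
PATTERNS = {
    "terraform_module": re.compile(r'^[ \t]*source[ \t]*=[ \t]*"([^"]+)"', re.MULTILINE),
    "docker_base_image": re.compile(r'^[ \t]*FROM[ \t]+(\S+)', re.MULTILINE | re.IGNORECASE),
    "actions_workflow": re.compile(r'^[ \t]*(?:-[ \t]*)?uses:[ \t]*(\S+)', re.MULTILINE),
}


def extract_edges(kind, path, text):
    """Return (kind, target, path, line_number) for every declared reference."""
    edges = []
    for match in PATTERNS[kind].finditer(text):
        # Count newlines before the captured target to get a 1-based line number.
        line_number = text.count("\n", 0, match.start(1)) + 1
        edges.append((kind, match.group(1), path, line_number))
    return edges


terraform = 'module "vpc" {\n  source = "git::https://github.com/acme/tf-vpc.git?ref=v3.2.0"\n}\n'
for edge in extract_edges("terraform_module", "main.tf", terraform):
    print(edge)
```

Every edge carries the file and line that produced it, which is what makes the result auditable: either the pattern matched a declaration in the source or there is no edge.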

The three are not ranked. Inference wins the generative question. The registered catalog carries metadata a parser will never see, like ownership and on-call. But for the specific question "what breaks if I change this," only one of the three gives you an answer with no maintenance debt and no probability attached. That is the one you can gate a deploy on.

Why inference is the wrong guarantee for the structural question

An inferred graph is built by a model reading artifacts and deciding what relates to what. That is powerful for the messy, semantic, undocumented stuff, and I am not dismissing it: a docker pull buried in a shell script, a convention that took three RFCs to settle, an architectural decision that lives only in someone's head. Recovering that is real value and it is hard to do well.

But for "what consumes this," inference has a property you cannot design away: you do not know what it missed. A confidence score tells you how sure the model is, not whether the edge exists. For impact analysis that is precisely the wrong guarantee, because the entire point of blast radius is that the expensive failures are the edges you did not know about. A tool that is usually right about which repos depend on your module is no help in the one case where being wrong is the whole cost. "Ninety-two per cent confident these are your consumers" is unshippable: you cannot merge a breaking change to a shared module against a probability distribution over its consumer set. You need the actual set, derived from actual source, with an audit trail you can hand to the consumer teams before you ship. A smaller graph you can audit beats a broader graph you have to trust.

This is not a lonely opinion, which is the other thing that has changed in the last year. The distinction between deterministic and inferential machinery is becoming load-bearing across the field. Martin Fowler's site published a piece on harness engineering that splits an agent's supporting tools into computational ones, which are deterministic, cheap and safe to run on every change, and inferential ones, which are semantic, expensive and non-deterministic, useful precisely where you can tolerate the fuzz. Independent academic work points the same way: a January 2026 paper from Tel Aviv University introduces the Repository Intelligence Graph, a deterministic, evidence-backed map extracted from build and test artifacts that agents treat as the authoritative description of repository structure. Giving three commercial agents that deterministic graph improved mean accuracy by 12.2% and cut completion time by 53.9%, with the largest gains in exactly the multilingual, cross-toolchain repositories where inference struggles most. Even Roadie, from the registered camp, is arguing that structure beats semantic retrieval for the questions an agent needs to act on.

The field is converging on the idea that some questions want a deterministic answer and some want an inferred one, and that conflating them is the mistake. The structural question is firmly in the first bucket.

The coverage gap inference does not close

There is a second problem, separate from the confidence-score one, and it is the one I care most about because it is where the actual work lives.

The dependency edges that bite hardest in a DevOps organisation are not function calls. They are a Terraform module block referencing a git URL with a ?ref= pin. A Dockerfile FROM pulling an internal base image whose tag is set by a build-arg in a separate CI file. A Helm chart depending on another chart through OCI, or HTTPS, or a Flux source pointer. A CI template included across thirty repos. A reusable GitHub Actions workflow referenced by uses: org/repo/.github/workflows/deploy.yml@v2. These are declared, in manifests, in formats that parse deterministically. You do not need a model to read a dependencies block in a Chart.yaml. You need a parser, and then you need a resolver that canonicalises the git URL across its three URL forms, evaluates the semver constraint against the published versions, follows the umbrella chart that re-exports your chart, and chases the build-arg into the CI file that sets it.
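The canonicalisation step is the least glamorous part and the easiest to show. The sketch below reduces a git source string to a (repository, ref) pair. The three forms it handles (a `git::https://` URL, scp-style SSH, and a bare `host/org/repo` shorthand with a `//subdir` suffix) are the ones I am assuming for illustration, and the function name is hypothetical; a production resolver has to deal with many more variants.

```python
import re
from urllib.parse import parse_qs, urlsplit


def canonicalise_git_source(source):
    """Reduce a git source string to ('host/org/repo', ref_or_None).

    Illustrative only: handles the three forms described in the text above.
    """
    s = source.strip()
    if s.startswith("git::"):
        s = s[len("git::"):]  # Terraform's forced-git prefix
    # scp-style SSH, e.g. git@host:org/repo.git, becomes ssh://git@host/org/repo.git
    scp = re.match(r"^([\w.-]+@)?([\w.-]+):(?!//)(.+)$", s)
    if scp:
        s = f"ssh://{scp.group(1) or ''}{scp.group(2)}/{scp.group(3)}"
    if "://" not in s:
        s = "https://" + s  # bare host/org/repo shorthand
    parts = urlsplit(s)
    path = parts.path.split("//", 1)[0].strip("/")  # drop a //subdir suffix
    if path.endswith(".git"):
        path = path[: -len(".git")]
    ref = parse_qs(parts.query).get("ref", [None])[0]
    return f"{parts.hostname}/{path}".lower(), ref


forms = [
    "git::https://github.com/acme/tf-vpc.git?ref=v3.2.0",
    "git@github.com:acme/tf-vpc.git?ref=v3.2.0",
    "github.com/acme/tf-vpc//modules/subnets?ref=v3.2.0",
]
# All three spellings should collapse to the same repository identity.
print({canonicalise_git_source(form) for form in forms})
```

Until the spellings collapse to one identity, "who consumes acme/tf-vpc" silently returns a third of the answer, and nothing in the output tells you so.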

Most code-intelligence and context engines stop at the language boundary and never touch this layer. That is not an oversight, it is the same structural fact I worked through at length for Sourcegraph: an IaC dependency is not a symbol. It is a value inside a string that an infrastructure tool evaluates at plan or build time. A semantic model can guess at these relationships, and it will get many of them, but "many" is the failure mode, not the success. The whole point of parsing this layer is that the answer is complete and verifiable, with each edge clicking through to the exact line that created it.

This is also why the IaC layer is the part inference is least equipped to fake. A model trained on a lot of code can reason fluently about a Go interface. The relationship between a Terragrunt root and the module it pins at ~> 3.2, mediated by an intermediate wrapper module that floats on main, is not the kind of thing you reason fluently about. It is the kind of thing you parse, resolve, and check.
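Once the edges are parsed and resolved, "check" is an ordinary graph walk. A minimal sketch, assuming the parsed edges have already been inverted into a consumers-of mapping; the repo names and the `blast_radius` function are hypothetical stand-ins for the root, wrapper and module chain described above.

```python
from collections import deque


def blast_radius(consumers_of, changed):
    """Breadth-first walk over reverse dependency edges.

    consumers_of maps a repo to the repos that reference it directly.
    Returns {repo: depth}, where depth 1 means a direct consumer.
    """
    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers_of.get(current, ()):
            if consumer not in depth:  # first visit is the shortest path
                depth[consumer] = depth[current] + 1
                queue.append(consumer)
    del depth[changed]  # the changed repo is not its own consumer
    return depth


# Hypothetical edges: a wrapper module sits between the module and the live root.
consumers_of = {
    "acme/tf-vpc": ["acme/tf-network-wrapper"],
    "acme/tf-network-wrapper": ["acme/live-infra"],
}
print(blast_radius(consumers_of, "acme/tf-vpc"))
# {'acme/tf-network-wrapper': 1, 'acme/live-infra': 2}
```

The walk itself is trivial. Its answer is only as complete as the edge set it is given, which is why the parsing and resolving steps carry the weight.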

What about the tools that parse deterministically too?

The honest objection here is not "isn't this just Tabnine." Tabnine infers; the line against it is clean. The sharper objection comes from the small but growing set of tools that do parse deterministically: local context engines like vexp and open-source analysers like codeindex build dependency graphs straight from an AST with tree-sitter, with nodes for functions, classes and types and edges for calls and imports, no model in the path. The Repository Intelligence Graph paper above is in the same spirit, extracted from build systems. These are deterministic, parsed and auditable, and they are right to be.
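For readers who have not seen one, a code-symbol graph of this kind can be sketched in a few lines. I am using Python's standard `ast` module as a stand-in for tree-sitter, and the function name and edge shape are my own; this is not how vexp or codeindex are implemented, only an illustration of the category.

```python
import ast


def symbol_edges(module_name, source):
    """Return sorted (source_symbol, relation, target) edges for one Python module."""
    tree = ast.parse(source)
    edges = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges.add((module_name, "imports", alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.add((module_name, "imports", node.module))
        elif isinstance(node, ast.FunctionDef):
            # Record calls to plain names made anywhere inside this function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.add((f"{module_name}.{node.name}", "calls", inner.func.id))
    return sorted(edges)


sample = "import os\nfrom json import loads\n\ndef load(path):\n    return loads(read(path))\n"
for edge in symbol_edges("app", sample):
    print(edge)
```

No model appears anywhere in that path, and every edge traces back to a syntax node.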

They also stop at the same place Sourcegraph does. The graph is a code-symbol graph: functions, types, imports, usually within a single project on a single machine. That is genuinely useful, and for "who calls this function" it is the correct architecture. But it is not the cross-repo infrastructure-artifact layer. None of these tools resolves a Terraform module source across an organisation, evaluates Helm version constraints across three reference formats, or follows a Docker base image through a build-arg into the CI file that sets it. The parsed-graph idea is spreading, which I take as validation, and it is spreading at the language-symbol level while the cross-repo IaC artifact layer stays unserved. That gap is the entire reason Riftmap exists.

So the map has three regions, not two. Inferred context engines (Tabnine) for the generative question. Registered catalogs (Backstage, Port, Roadie) for org metadata. Parsed graphs for the structural question, splitting again into code-symbol parsers (vexp, codeindex, Sourcegraph's SCIP) and the artifact parsers that handle cross-repo IaC. Riftmap is the last of those, and as far as I can find it is currently the only one.

What this means in practice

I think these tools compose rather than compete, and I mean that more precisely than the usual "we play nicely together."

If you are running an agent across a large codebase, a context engine that grounds its generation is worth having. Feed it the cross-repo picture and your agent hallucinates less and writes against the interface that exists today rather than the one from six months ago. That is a real win and I would not argue against it for a second.

But when the actual decision in front of you is "I am about to change this thing, what is the blast radius," you want a graph that was parsed, not inferred. Edges you can click through to the exact line that created them. A version constraint already evaluated against the published versions, so you know which consumers float onto your new release and which are pinned and safe for now. No confidence score, because there is nothing to be unsure about: the module either references that source or it does not.
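"Which consumers float onto your new release" is a small, exact computation once the constraints are parsed. The sketch below supports only `~>`, `>=` and `=` with purely numeric versions, which is an assumption I am making to keep it short; the repo `acme/billing` and both function names are hypothetical.

```python
def _parse(text):
    return [int(part) for part in text.strip().lstrip("v").split(".")]


def _pad(parts, width=3):
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, constraint):
    """Check one version against one constraint such as '~> 3.2' or '>= 3.0'."""
    operator, bound_text = constraint.split(None, 1)
    v = _pad(_parse(version))
    raw = _parse(bound_text)
    lower = _pad(raw)
    if operator == ">=":
        return v >= lower
    if operator == "=":
        return v == lower
    if operator == "~>":
        if len(raw) < 2:
            raise ValueError("this sketch needs at least two components for ~>")
        # Pessimistic operator: only the rightmost stated component may increase,
        # so '~> 3.2' means >= 3.2.0 and < 4.0.0.
        upper = _pad(raw[:-2] + [raw[-2] + 1])
        return lower <= v < upper
    raise ValueError(f"unsupported operator: {operator}")


def classify_consumers(constraints, new_version):
    """Split consumers into those that pick up new_version and those that do not."""
    floating, pinned = [], []
    for repo, constraint in constraints.items():
        (floating if satisfies(new_version, constraint) else pinned).append(repo)
    return floating, pinned


constraints = {
    "acme/platform-prod": "~> 3.2",
    "acme/monitoring": ">= 3.0",
    "acme/billing": "= 3.2.1",  # hypothetical exact pin
}
print(classify_consumers(constraints, "4.0.0"))
# (['acme/monitoring'], ['acme/platform-prod', 'acme/billing'])
```

For a 4.0.0 release, the `>= 3.0` consumer is the one that needs a conversation before you ship; the other two stay where they are until someone moves them.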

Concretely, that is the difference between an agent reasoning over a semantic graph and an agent making a call like this before it plans a change:

GET /api/v1/repositories/{id}/impact

{
  "affected": [
    { "repo": "acme/platform-prod", "depth": 1, "version_constraint": "~> 3.2" },
    { "repo": "acme/monitoring",    "depth": 2, "version_constraint": ">= 3.0" }
  ],
  "total_affected": 7,
  "last_scanned_at": "2026-05-30T08:14:00Z",
  "last_commit_sha": "a1b3f9c"
}

That response carries a freshness contract, not a confidence score. The last_scanned_at and last_commit_sha fields say exactly which state of the repo the graph describes, so if the repo has been pushed to since then, the agent can tell the data may be stale and trigger a rescan before it trusts the answer. Staleness is detectable and fixable. A missed edge in an inferred graph is neither.
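On the agent side, honouring that contract is a few lines. This is a sketch under assumptions: I am treating last_commit_sha as an abbreviated SHA of the repo's HEAD at scan time, the function name is hypothetical, and the 24-hour age limit is an arbitrary policy choice, not part of the API.

```python
from datetime import datetime, timedelta, timezone


def needs_rescan(impact, head_sha, now, max_age=timedelta(hours=24)):
    """Return a reason string if the graph may be stale, else None.

    impact is the parsed JSON response; head_sha is the repo's current HEAD,
    fetched separately from the git host.
    """
    if not head_sha.startswith(impact["last_commit_sha"]):
        return "repo has commits the graph has not seen"
    scanned = datetime.fromisoformat(impact["last_scanned_at"].replace("Z", "+00:00"))
    if now - scanned > max_age:
        return "last scan is older than the allowed age"
    return None


impact = {
    "total_affected": 7,
    "last_scanned_at": "2026-05-30T08:14:00Z",
    "last_commit_sha": "a1b3f9c",
}
now = datetime(2026, 5, 30, 9, 0, tzinfo=timezone.utc)
print(needs_rescan(impact, "a1b3f9c" + "0" * 33, now))  # None: graph matches HEAD
print(needs_rescan(impact, "ffffff0" + "0" * 33, now))  # repo has moved on
```

The check is boolean, not probabilistic: the graph describes the commit the agent is looking at, or the agent asks for a rescan.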

This is the multi-context future I sketched in the Sourcegraph piece: a serious agent setup in 2027 composes several specialised context layers, each with its own grammar and freshness model and MCP server. Symbol context for code, artifact context for infrastructure, ticket context, docs context, runtime context. An inferred context engine is one of those layers and a good one. The parsed artifact graph is a different layer answering a different question. The composition is the point, not the competition.

The short version

There are two questions under "cross-repo context." The generative one, "what does this library contain," wants the model's best guess, and an inferred context engine is the right tool for it. Tabnine's is strong and the problem it solves is real.

The structural one, "what breaks if I change this," wants a different guarantee. A confidence score answers the wrong question, because blast radius is precisely about the edges you did not know existed, and the IaC edges that hurt most are declared in manifests a parser can read completely rather than guessed at semantically. For that question you want a graph that was parsed, not inferred: deterministic, self-updating, auditable to the exact line, carrying a freshness contract instead of a probability.

The market has three architectures for this now, not two. Inferred context for generation, registered catalogs for metadata, parsed graphs for impact. The parsed camp is filling in at the code-symbol level and still close to empty at the cross-repo infrastructure level. That is the line I have drawn with Riftmap: deterministic parsing first, across the IaC and DevOps ecosystems where the edges are declared and verifiable, with no model guessing anywhere in the path that answers "what breaks."

Inferred context is useful. It is just answering a different question than the one you ask right before you ship a breaking change. For that question you do not want the model's best guess. You want the graph.

This is the kind of question Riftmap is built to answer. It scans your GitHub or GitLab organisation with a read-only token, parses Terraform, Docker, Helm, Kustomize, Kubernetes, GitHub Actions, GitLab CI, Ansible, Go modules and npm, and builds the cross-repo artifact graph as a queryable surface, for engineers in the UI and for agents over the API. Around ninety seconds to first graph. If you have read this far, there is a free tier, and the agent integration guide shows how to wire the graph in as a tool call.

If you want the underlying parsing work one ecosystem at a time, the Find Every Consumer series goes through Docker base images, Terraform modules, GitHub Actions workflows, Helm charts and Go modules in turn.