
Craig Tracey

Posted on • Originally published at contextshift.io

First Principles of AI Context

Every few weeks someone publishes a benchmark showing that the latest model is smarter, faster, more capable. Context windows are getting massive. A million tokens, two million, more on the horizon. And that’s genuinely impressive.

But it raises a question nobody seems to be asking: what are we filling those windows with?

Right now, the answer is mostly everything. Dump in the docs. Stuff in the chat history. Append the tool definitions. Hope the model figures out what matters.

Bigger windows don’t solve the context problem. They just give you more room to be wrong. A million tokens of unfocused, unstructured context isn’t better than ten thousand tokens of the right context. It’s worse, because the model has to work harder to find the signal in the noise, and you’re paying for every token of that noise.

I’ve spent the last year building agent infrastructure, and I keep landing on the same conclusion: the bottleneck isn’t the model and it isn’t the window size. It’s the quality and structure of what goes into the window. Until we treat context as an engineering problem, not just a capacity problem, we’re going to keep building impressive demos that fall apart in production.

Here are the first principles I keep coming back to.


The Context Exists. The Relationships Don't.

There’s a reason AI coding tools are so far ahead of everything else. Code has explicit structure: dependencies, type systems, call graphs. The model can follow the relationships. It can reason about how things connect.

Now think about everything else we’re trying to point AI at. Your operations. Your organization. Your business processes. There’s no relationship graph. No map connecting a customer complaint to the team responsible to the system that caused it.

Without structure, the model guesses. A bigger window just means it has more room to guess in.

The structure already exists inside your systems. Before you can get real value from AI, you need to surface it in a form the model can follow.

Semantics are probability, not truth.

This is the thing that’s easy to forget when a model gives you a confident, well-formatted answer: it doesn’t know anything. It’s predicting the most likely next token. When you ask it to interpret your data, it’s giving you the most probable interpretation, not necessarily the correct one.

That distinction doesn’t matter much when you’re generating a summary or drafting an email. It matters enormously when an agent is deciding which team to page at 3am, or which customer account is affected by an outage, or whether a support ticket is related to a known incident.

You can see this play out in real time with tool calls. An agent without enough context doesn’t just pick the wrong tool. It tries one, fails, tries another, fails again, and loops. It’s not being stupid. It’s doing exactly what you’d expect from a system that’s navigating by probability without a map. It doesn’t have the connective tissue to know that this entity means that tool, so it guesses, checks the result, and guesses again. It’s brute-forcing a path through a graph it can’t see.

Probability is useful. But decisions need ground truth. And ground truth comes from structure: explicit relationships that say this is connected to that, defined by rules, not inferred by a model.

The more we rely on agents to take real action, the less we can afford to let them operate on vibes.

Facts without relationships are a dead end.

RAG was supposed to solve the context problem. Ground the model in your data. Retrieve relevant chunks. It works for question answering.

And even that takes a surprising amount of effort. Chunking strategies, embedding model selection, reranking, relevance tuning, keeping the index fresh as your data changes. RAG pipelines are deceptively expensive to build well and even harder to maintain. That’s a lot of investment for a system that tops out at retrieval.
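Strip away the machinery and what a retrieval pipeline ultimately does is score chunks by similarity and hand back text. A toy sketch, with fake three-dimensional vectors standing in for a real embedding model:

```python
# Toy sketch of retrieval's ceiling: rank chunks by cosine similarity and
# return text. The vectors are fake; a real pipeline swaps in an embedding
# model, but the output is still just ranked chunks of prose.
import math

CHUNKS = {
    "There was an incident last Tuesday.":     [0.9, 0.1, 0.0],
    "The Q3 roadmap shipped on time.":         [0.1, 0.8, 0.2],
    "Checkout latency spiked during deploys.": [0.7, 0.2, 0.3],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / mag

def retrieve(query_vec, k=2):
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, CHUNKS[c]),
                    reverse=True)
    return ranked[:k]

print(retrieve([1.0, 0.0, 0.1]))
# Facts come back; how they connect to teams, changes, or tickets does not.
```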

And when teams hit the ceiling of what vanilla RAG could do, where did they turn to improve it? You guessed it. Graphs. GraphRAG exists because people kept running into the same wall: retrieval without relationships isn’t enough.

But the moment you want an agent to do something, retrieval isn’t enough. Knowing “there was an incident last Tuesday” is a fact. Knowing that the incident affected three customers, was caused by a change made by a specific team, and is related to two open support tickets? That’s a graph. That’s the difference between an agent that can answer questions and one that can actually reason about what to do next.

We keep trying to solve a graph problem with a search engine. Vector similarity tells you what’s textually related. It can’t tell you what’s causally connected, what depends on what, or what breaks if something changes. And because similarity is probabilistic, it’ll happily surface content that looks related but isn’t, with no way to tell the difference.
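The incident example above can be made concrete. Here's a minimal sketch, with invented identifiers, of how multi-hop questions become graph traversals rather than similarity lookups:

```python
# Minimal sketch of the incident-as-graph example. Entity IDs are invented.
# Multi-hop questions become traversals, not similarity searches.
from collections import defaultdict

edges = defaultdict(list)
for src, rel, dst in [
    ("incident:42", "affected", "customer:acme"),
    ("incident:42", "affected", "customer:globex"),
    ("incident:42", "affected", "customer:initech"),
    ("incident:42", "caused_by", "change:981"),
    ("change:981", "made_by", "team:platform"),
    ("ticket:1107", "relates_to", "incident:42"),
    ("ticket:1190", "relates_to", "incident:42"),
]:
    edges[(src, rel)].append(dst)

def query(src: str, rel: str) -> list[str]:
    """Follow a named relation from an entity; returns explicit targets."""
    return edges[(src, rel)]

# "Who owns the root cause?" is two hops, not a search:
change = query("incident:42", "caused_by")[0]
print(query(change, "made_by"))          # the responsible team
print(query("incident:42", "affected"))  # the blast radius
```

No embedding model can follow `caused_by` then `made_by`; those hops only exist if the edges do.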

Context has to discover itself.

Here’s where it gets hard. You can’t manually build and maintain a map of how everything in your world connects. But look at what we’re doing today to try.

We write longer prompts. We craft system instructions. We maintain AGENTS.md and CLAUDE.md files. We build onboarding documents that try to explain our world to the model in prose. We hand-author tool descriptions and few-shot examples. We create elaborate prompt chains that try to steer the model toward the right context at the right time.

All of these are manual. All of them go stale. And all of them are fundamentally trying to solve the same problem: teaching the model what it should already be able to see.

And here’s the kicker. What are we writing all of this context in? Natural language. Prose. The very thing we just established is interpreted probabilistically, not precisely. We’re using semantics to provide context to a system that processes semantics as probability. We’re bootstrapping truth from a medium that doesn’t guarantee it.

It works at small scale. When you have five tools and one domain, you can write enough context by hand to get by. But it breaks the moment your environment grows. More tools, more systems, more relationships, more change, arriving faster than any manual process can keep up with.

The only context that stays accurate is context that builds itself, continuously, from the systems that are already running. The relationships already exist inside your tools and platforms. They’re just not structured in a way that AI can use.

The job isn’t data entry. The job is discovery.

Structure needs rules, not just data.

This one took me a while to internalize. You can ingest every piece of data from every system you touch and still have nothing useful. Data without interpretation is noise, and a model will happily interpret that noise for you. Confidently, probabilistically, and sometimes wrong.

Structure emerges from rules. A project is owned by a team. A customer is served by a product. An alert relates to an incident. These aren’t things you discover statistically. They’re things you define. And once defined, they make relationships queryable, composable, and trustworthy. Not probable. True.

Without rules, you have data. With rules, you have structure an agent can trust.
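One way to picture "rules, not just data" is a schema that constrains which relationships may exist at all. A hedged sketch, with invented entity kinds, showing how defined rules reject edges instead of letting a model infer them:

```python
# Sketch: rules as a schema that constrains which relationships may exist,
# so structure is defined rather than statistically inferred. Names invented.

# Each rule: relation -> (allowed source kind, allowed target kind).
RULES = {
    "owned_by":   ("project", "team"),
    "served_by":  ("customer", "product"),
    "relates_to": ("alert", "incident"),
}

def kind(entity: str) -> str:
    """Entity IDs are 'kind:name'; the kind is what the rules check."""
    return entity.split(":", 1)[0]

def assert_valid(src: str, rel: str, dst: str) -> None:
    """Reject edges the rules don't allow; what survives is trustworthy."""
    src_kind, dst_kind = RULES[rel]
    if kind(src) != src_kind or kind(dst) != dst_kind:
        raise ValueError(f"rule violation: {src} -{rel}-> {dst}")

assert_valid("project:atlas", "owned_by", "team:core")      # allowed
try:
    assert_valid("customer:acme", "owned_by", "team:core")  # not allowed
except ValueError as e:
    print(e)
```

Every edge that passes validation is true by definition, which is exactly the property an agent needs before it acts on one.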

Agents need context before tools.

MCP gave agents a standard way to call tools. That was a genuine breakthrough. But tools without context are blind.

Think about how an agent actually decides which tool to call. It reads the tool’s name and description and picks the one that seems most relevant. Semantics again. The entire tool selection process is probabilistic. The agent isn’t matching against a schema or following a rule. It’s making its best guess.

Give an agent access to hundreds of tools and watch what happens. It picks the wrong ones. It hallucinates capabilities. It takes action without understanding what it’s acting on. And every one of those irrelevant tool definitions is eating up your context window, crowding out the information the agent actually needs. Each failed tool call burns tokens, adds latency, and pushes useful context further out of reach.

The fix isn’t better prompting. The fix is context first, tools second. The agent needs to understand what’s relevant to the current task before it gets access to the tools that apply.

This is the order of operations that most agent architectures get backwards.
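A minimal sketch of that ordering, with invented tool names and a deliberately trivial stand-in for context resolution: resolve what the task is about first, then expose only the tools that apply.

```python
# Hedged sketch of "context first, tools second". Tool names are invented,
# and resolve_context is a trivial keyword stand-in for real context
# resolution (e.g. a graph lookup keyed on the entities a task mentions).

TOOLS = {
    "pagerduty.page": {"domain": "incident"},
    "jira.create":    {"domain": "ticket"},
    "stripe.refund":  {"domain": "billing"},
    "grafana.query":  {"domain": "incident"},
}

def resolve_context(task: str) -> str:
    # Stand-in: a real system would resolve the task's entities against
    # a relationship graph. Here, a single keyword rule for illustration.
    return "incident" if "outage" in task else "ticket"

def tools_for(task: str) -> list[str]:
    """Scope the catalog before the agent ever sees it."""
    domain = resolve_context(task)
    return [name for name, meta in TOOLS.items()
            if meta["domain"] == domain]

print(tools_for("investigate the checkout outage"))
# Only incident tools reach the window; the rest never compete for tokens.
```

The filtering logic is the trivial part. The hard part, and the argument of this piece, is that `resolve_context` needs a real structure to resolve against.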


Why this matters now

We’re about to get 10 million token context windows. The temptation will be to treat that as a solution. Just throw everything in and let the model sort it out.

That won’t work. It’ll just be expensive, slow, and probabilistically wrong in ways that are hard to debug. The context problem isn’t about capacity. It’s about knowing what matters, how things connect, and what’s relevant right now. With certainty, not just likelihood.

MCP is taking off. Agent frameworks are proliferating. Everyone is building tool integrations. But almost nobody is building the context layer underneath: the thing that decides what goes into the window and why.

That’s the gap. And it’s the gap that will determine whether AI agents become genuinely useful or remain expensive toys that work great in demos.

I started this newsletter because I think the people building in this space need a place to think through these problems together. Not hype. Not product announcements. Just the hard, specific questions that come with making AI systems work for real.


This is the problem I'm building toward solving with sixdegree.ai. More on that soon - and more on the specific patterns that actually work in production.
