Athreya aka Maneshwar

Posted on Jun 10

Stop Whispering to the Model, Start Furnishing Its Brain

#ai #programming #beginners #machinelearning

Hello, I'm Maneshwar. I'm building git-lrc, a Micro AI code reviewer that runs on every commit. It is free and source-available on GitHub. Star git-lrc to help devs discover the project. Do give it a try and share your feedback.

Part 1 of a series. This one is the map of context engineering. Future posts are the territory.

There's a moment every developer hits with LLMs.

You write the perfect prompt. Crisp instructions. A nice persona. Maybe a "think step by step" sprinkled on top like seasoning. And the model... gives you something almost right but confidently wrong.

So you tweak a word. Then another. You add "IMPORTANT:" in all caps because surely that'll do it.

You're basically performing a small ritual and hoping the machine spirits are pleased.

Here's the uncomfortable truth: the prompt was never the bottleneck. The context was.

Welcome to Context Engineering: the discipline that quietly does most of the heavy lifting in every serious LLM application, while prompt engineering gets all the blog posts. (The irony of saying that in a blog post is not lost on me.)

This series is my attempt to map the whole thing out.

Today we start with the big picture: what context engineering actually is, and the four pillars we'll spend the rest of the series digging into.

Prompt Engineering vs. Context Engineering

Let's settle this cleanly, because people use these terms like they're synonyms and then act surprised when their app behaves like a confused intern.

Prompt engineering is about the instruction. You're crafting the sentence(s) that tell the model what to do. "Summarize this in three bullet points." "Act as a senior Go reviewer." It's phrasing, framing, and tone. It's important, but it's one ingredient.

Context engineering is about everything the model can see when it answers. The instruction, sure, but also: the retrieved documents, the conversation history, the tool outputs, the user's preferences, the API schema, the error from the last attempt, the relevant rows from your database. It's the entire information environment you assemble around the question.

Here's the analogy that finally made it click for me:

Prompt engineering is asking a good question.
Context engineering is making sure the person you're asking actually has the files open in front of them.

You can ask a brilliant question to someone staring at a blank desk and get nonsense. Or you can ask a mediocre question to someone with exactly the right three documents open and get gold.

The model is only as smart as the context you hand it at inference time.

And this matters more than ever because of one brutal constraint:

The context window is RAM, not a hard drive

A model's context window (its 200K, or 1M, or whatever-K tokens) is not storage.

It's working memory.

It's the desk, not the filing cabinet.

Everything the model "knows" in the moment has to fit on that desk, and the desk has a hard edge.

Context engineering is, at its core, the art of deciding what goes on the desk, when, and in what shape.

Too little and the model guesses.

Too much and it gets distracted, slow, expensive, and weirdly forgetful in the middle (the dreaded "lost in the middle" effect).
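To make "deciding what goes on the desk" concrete, here is a minimal sketch of budgeted context assembly. Everything in it is an assumption for illustration: `estimate_tokens` counts whitespace-separated words instead of using a real tokenizer, and `assemble_context` and its priority numbers are hypothetical names, not part of any library.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def assemble_context(instruction: str, candidates: list[tuple[int, str]], budget: int) -> str:
    """Put the instruction on the desk first, then add candidates by priority while they fit.

    candidates: (priority, text) pairs; a lower number means more important.
    """
    parts = [instruction]
    used = estimate_tokens(instruction)
    for _, text in sorted(candidates, key=lambda c: c[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # does not fit on the desk, so it stays off
        parts.append(text)
        used += cost
    return "\n\n".join(parts)


context = assemble_context(
    instruction="Explain why the nightly export job failed.",
    candidates=[
        (1, "Error log: export job timed out after 300 seconds."),
        (2, "Runbook: export jobs read from the reporting replica."),
        (3, "Company history: founded in a garage, long story follows " + "blah " * 50),
    ],
    budget=40,
)
print(context)
```

The low-priority, oversized "company history" item never reaches the model, while the error log and runbook do. A real system would swap in an actual tokenizer and smarter priorities, but the shape of the decision is the same.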

So how do we manage the desk well? Four pillars. Let's meet them.

Pillar 1: External Memory

LLMs have a goldfish problem.

By default, the model knows two things: whatever got baked into its weights during training, and whatever you put in the current context window.

The instant the conversation ends, the second one usually evaporates.

External memory is how we cheat death here. Instead of cramming everything the model might ever need into its parameters (impossible) or into a single prompt (expensive and finite), we stash information outside the model, in databases, vector stores, knowledge graphs, or plain files, and pull the relevant bits back in when we need them.

Think of it as giving the model a notebook it can flip through, instead of demanding it memorize the universe.

[ Model weights ]   →  what it learned in training (frozen, generic)
[ Context window ]  →  what it can see right now (small, expensive)
[ External memory ] →  everything else (huge, cheap, retrievable on demand)
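The third tier can be as humble as a JSON file. The sketch below is a toy, assuming a hypothetical `NoteStore` class with `remember` and `recall` methods; a real setup would more likely use a database or vector store, but the idea of "facts live outside, only the relevant ones come back in" is the same.

```python
import json
import tempfile
from pathlib import Path


class NoteStore:
    """A tiny file-backed memory: facts live on disk, not in the context window."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if not self.path.exists():
            self.path.write_text("[]")

    def remember(self, topic: str, fact: str) -> None:
        notes = json.loads(self.path.read_text())
        notes.append({"topic": topic, "fact": fact})
        self.path.write_text(json.dumps(notes))

    def recall(self, topic: str) -> list[str]:
        """Return only the facts for one topic; everything else stays on disk."""
        notes = json.loads(self.path.read_text())
        return [n["fact"] for n in notes if n["topic"] == topic]


store_path = Path(tempfile.mkdtemp()) / "notes.json"
store = NoteStore(store_path)
store.remember("billing", "Invoices are generated on the 1st of each month.")
store.remember("auth", "Sessions expire after 30 minutes of inactivity.")

# A later "conversation" opens the same file and still has the facts.
later = NoteStore(store_path)
print(later.recall("billing"))
```

Note that the second `NoteStore` instance knows nothing except the file path, which is the point: the memory survives the conversation because it never lived in the window.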

This is the foundation that makes the next pillar even possible. Because once your knowledge lives outside the model, the obvious question becomes: how do I get the right slice of it back in?

Pillar 2: RAG and Dynamic Filters

Retrieval-Augmented Generation is the workhorse of modern LLM apps.

The pitch is simple: before the model answers, go fetch relevant, fresh, factual information from your external memory and slip it into the context.

Now the model answers with your data instead of its hazy half-memory of the internet from two years ago.

If you've built anything with LLMs in the last while, you've probably done RAG even if you didn't call it that.

User asks a question → you embed it → you search a vector DB → you stuff the top results into the prompt → the model answers. Beautiful.
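That pipeline fits in a few lines if you squint. The sketch below is deliberately naive and rests on stated assumptions: `embed` is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector DB, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank every chunk by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the top results into the prompt ahead of the question."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


CHUNKS = [
    "refunds are issued within five business days",
    "the api rate limit is 100 requests per minute",
    "password resets are sent by email",
]

print(build_prompt("what is the api rate limit", CHUNKS))
```

Notice that `retrieve` always returns `k` chunks, even when the second-best match has nothing to do with the question. That is the weakness the next paragraphs pick apart.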

Except naive RAG has a dirty secret: it retrieves things that are similar, not things that are useful.

Vector similarity will happily hand the model five chunks about "billing" when the user asked about a billing bug, including that one outdated doc from 2021 and a chunk that's technically relevant but belongs to a different customer.

This is where dynamic filters earn their keep.

Instead of blindly dumping the top-k matches, you filter the retrieval based on the actual situation:

Who's asking? Filter by user, tenant, permissions. (Nobody should retrieve another customer's data into context. That's not a feature, that's an incident.)
When does it apply? Filter by recency, version, environment.
What's the intent? Route a "how do I" question to docs, a "why is this broken" question to logs.
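Those three questions map naturally onto metadata filters applied before any similarity ranking. A minimal sketch, assuming each chunk carries hypothetical `tenant`, `year`, and `source` fields and that intent can be guessed from how the question starts (a crude rule chosen only for illustration):

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    tenant: str
    year: int
    source: str  # "docs" or "logs"


def route_intent(question: str) -> str:
    """Crude routing (assumption): 'how do I' goes to docs, everything else to logs."""
    return "docs" if question.lower().startswith("how do i") else "logs"


def filtered_candidates(question: str, tenant: str, min_year: int, chunks: list[Chunk]) -> list[Chunk]:
    """Narrow the pool by who, when, and intent before any similarity search runs."""
    source = route_intent(question)
    return [
        c for c in chunks
        if c.tenant == tenant      # who's asking
        and c.year >= min_year     # when does it apply
        and c.source == source     # what's the intent
    ]


STORE = [
    Chunk("Billing retries failed with error 502", "acme", 2024, "logs"),
    Chunk("Billing retries failed with error 500", "globex", 2024, "logs"),
    Chunk("Old billing pipeline overview", "acme", 2021, "docs"),
    Chunk("How to update a billing address", "acme", 2024, "docs"),
]

hits = filtered_candidates("Why is billing broken?", tenant="acme", min_year=2023, chunks=STORE)
print([c.text for c in hits])
```

All four chunks are "about billing", yet only one survives for this user and this question. Similarity ranking would then run on what is left, which keeps another tenant's data out of the window by construction rather than by luck.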

Pillar 3: Context Compaction

So you've got external memory, and you've got smart retrieval. Congratulations: you can now flood the context window faster than ever.

That's the new problem. Long contexts are slow, expensive, and, counterintuitively, often worse.

Models lose track of details buried in the middle of a giant blob.

Every redundant token is a token that pushes the important stuff toward the edges of attention.

Context compaction is the discipline of shrinking what you send without losing what matters.

It's lossy compression for meaning.

A few flavors:

Summarization: replace a 40-message conversation with a tight running summary of what's actually decided and relevant.
Filtering / pruning: drop the chunks, turns, or tool outputs that aren't pulling their weight.
Re-ranking: reorder retrieved results so the genuinely most relevant material sits where the model pays the most attention.
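Pruning and re-ranking are easy to sketch without a model in the loop (summarization usually needs one, so it is left out here). The code below assumes relevance scores already exist, and it assumes the model attends best to the start and end of the window, which is one reading of the "lost in the middle" effect rather than a guarantee. `prune` and `edges_first` are hypothetical names.

```python
def prune(scored: list[tuple[float, str]], min_score: float) -> list[tuple[float, str]]:
    """Drop chunks whose relevance score is below the threshold."""
    return [(s, t) for s, t in scored if s >= min_score]


def edges_first(scored: list[tuple[float, str]]) -> list[str]:
    """Order chunks so the strongest sit at the start and end, the weakest in the middle."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    front: list[str] = []
    back: list[str] = []
    for i, (_, text) in enumerate(ranked):
        # Alternate: best goes to the front, second best to the back, and so on.
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


scored_chunks = [(0.9, "A"), (0.2, "B"), (0.7, "C"), (0.5, "D"), (0.8, "E")]
ordered = edges_first(prune(scored_chunks, min_score=0.4))
print(ordered)  # ['A', 'C', 'D', 'E']
```

"B" is dropped for not pulling its weight, the two strongest chunks ("A" and "E") land at the edges, and the weaker ones sit in the middle where a lapse in attention costs the least.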

A concrete example: in a long agent run, you don't keep every single tool call and its full raw output forever.

You compact. "Searched the DB, found the user's table is documents_v2" is worth keeping.

The 400 lines of JSON it came from? Summarize and discard. (Hold that documents_v2 thought — there's a reason it'll come up again.)
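Here is roughly what that looks like for an agent's tool history. This is a sketch under assumptions: `ToolStep` and `compact_history` are hypothetical, and the one-line `note` is assumed to have been written when the step finished (in practice, often by a summarization call).

```python
from dataclasses import dataclass


@dataclass
class ToolStep:
    call: str        # what the agent did
    raw_output: str  # full tool output, possibly hundreds of lines
    note: str        # one-line takeaway recorded when the step finished


def compact_history(steps: list[ToolStep], keep_raw_last: int = 1) -> list[str]:
    """Keep raw output only for the most recent steps; older steps shrink to their note."""
    cutoff = len(steps) - keep_raw_last
    lines = []
    for i, step in enumerate(steps):
        if i < cutoff:
            lines.append(f"{step.call}: {step.note}")
        else:
            lines.append(f"{step.call}:\n{step.raw_output}")
    return lines


big_json = "\n".join(f'{{"row": {i}}}' for i in range(400))
steps = [
    ToolStep("search_db", big_json, "found the user's table is documents_v2"),
    ToolStep("read_schema", "id INT, title TEXT", "schema has id and title"),
]

history = compact_history(steps)
print(history[0])
```

The first step collapses from 400 lines to a single sentence, while the latest step keeps its raw output because the agent may still need the details. The quality of the whole scheme depends on the note being accurate, since everything after it builds on that one line.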

The mantra: the best token is the one you didn't have to send. Compaction is how you free up desk space for the stuff that actually moves the needle.

Pillar 4: Context Isolation

Here's a failure mode everyone discovers eventually.

You build one mega-prompt agent that does everything: it answers support questions, writes code, queries the database, drafts emails, and waters your plants. And it's mediocre at all of it.

Why? Because you've stuffed five jobs' worth of instructions, tools, and context into one window, and they interfere with each other.

The model trying to write SQL gets distracted by the email-drafting instructions.

Unrelated information bleeds across tasks.

It's the cognitive equivalent of trying to do your taxes at a party.

Context isolation is the fix: give each task its own dedicated, clean context.

Instead of one omniscient agent, you use multiple smaller, focused agents (or sub-agents, or just separate calls) each with only the instructions, tools, and information its job requires.

The research agent doesn't need your code style guide.

The coder doesn't need to know how to phrase a customer apology.

By keeping their contexts separate, each one stays sharp, and a mess in one doesn't poison the others.

Isolation buys you reliability and keeps each individual window small enough to actually be good.
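In code, isolation mostly means each agent owns its own instructions, tools, and history, and a dispatcher picks which window to build. A minimal sketch, assuming hypothetical `AgentContext` and `dispatch` names and made-up tool names; no model is actually called:

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    name: str
    instructions: str
    tools: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def render(self, task: str) -> str:
        """Build the window for this agent only; nothing from other agents leaks in."""
        self.history.append(task)
        return "\n".join([self.instructions, f"Tools: {', '.join(self.tools)}", *self.history])


AGENTS = {
    "sql": AgentContext("sql", "You write read-only SQL queries.", ["run_query"]),
    "email": AgentContext("email", "You draft polite customer emails.", ["send_draft"]),
}


def dispatch(kind: str, task: str) -> str:
    """Route a task to the one agent whose job it is and return that agent's window."""
    return AGENTS[kind].render(task)


sql_window = dispatch("sql", "Count the rows in documents_v2.")
email_window = dispatch("email", "Apologize for the delayed refund.")
print(sql_window)
```

The SQL agent's window never mentions emails, and the email agent never sees the table name. Each window stays small, and a confusing turn in one history cannot contaminate the other.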

Putting the desk back together

Step back and look at the four pillars and you'll notice they're all the same idea wearing different hats:

Pillar	The question it answers
External memory	Where does knowledge live when it's not in the window?
RAG + dynamic filters	How do I pull the right knowledge back in?
Context compaction	How do I fit it without bloat?
Context isolation	How do I keep unrelated jobs from interfering?

Every one of them is a way of managing that scarce, precious desk (the context window) so the model sees exactly what it needs and nothing it doesn't.

That's context engineering. Not begging the model in capital letters.

Just deliberate, almost boring decisions about information: what to include, what to leave out, where to store it, and when to bring it back.

The prompt is the question. The context is everything that makes the answer good.

Disclaimer: This article was written by me; AI was used to fix grammar and improve readability.

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub:

HexmosTech / git-lrc

Free, Micro AI Code Reviews That Run on Commit


Top comments (6)

Alex Shev • Jun 11

This is the right mental model. Better results usually come from giving the model a workspace it can reason inside: files, examples, constraints, prior fixes, and explicit success checks.

For coding agents, the terminal is part of that furnished room. It gives the model executable feedback instead of only prose.

Athreya aka Maneshwar • Jun 11

Righttt!

Gamya • Jun 14

Really enjoyed this breakdown! 😊 The "keep documents_v2, burn the JSON" framing is such a clean way to think about it—I hadn't considered how compaction could actually introduce poisoning by turning a hallucination into a "decision" that gets built on later.
The constraints point especially stood out: "never touch the auth module" feels like exactly the kind of thing that would silently get paraphrased away in a summary-of-a-summary situation. I'm curious whether you've seen any patterns for flagging when a constraint might have drifted from its original wording across multiple compactions, or if pinning verbatim is really the only reliable defense?

Tanay Dwivedi • Jun 14

Thanks @lovestaco for this blog. You explain these concepts well in easy-to-understand language. It also made me realize that I have been doing prompt engineering all along and need to focus more on context engineering.

Athreya aka Maneshwar • Jun 14

Thanks Tanay <3

Sloan the DEV Moderator • Jun 10

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!