LayerZero
Your Agent Isn't Dumb. Your Context Is. — A Field Guide to Context Engineering

Prompt engineering is dead. Nobody told you because the influencers still sell courses on it.

The real skill in 2026 is context engineering — the discipline of deciding what information, tools, and memory go into the model's window on every single turn. It's the difference between an agent that ships a pull request and one that hallucinates a function name and rage-quits.

And almost nobody is doing it right.

What changed

A year ago, "prompt engineering" meant crafting the perfect system message. Add a persona, stack some few-shots, wrap in XML tags, done.

That worked when the model was a stateless Q&A box.

It doesn't work when the model is an agent running 40 tool calls across 6 files to fix a bug. The system prompt is 200 tokens. The context is 80,000 tokens of tool results, file contents, user messages, and prior reasoning — and every one of those tokens is either helping or hurting.

Context engineering is the job of keeping the signal-to-noise ratio high across that entire window, turn after turn.

The four levers

Only four things go into an LLM call. Master these and you control the agent.

  1. Instructions — the system prompt. Goals, constraints, tone.
  2. Knowledge — the facts the model needs right now (RAG chunks, API docs, file contents).
  3. Tools — what actions the model can take and how their results come back.
  4. History — prior turns, including tool calls and their outputs.

Every bug in every agent is one of these four going wrong. Always.

  • Agent loops forever? History is bloated with stale tool results.
  • Agent calls a function that doesn't exist? Knowledge missing or instructions too vague.
  • Agent picks the wrong tool? Tool descriptions are ambiguous.
  • Agent contradicts itself across turns? Instructions got drowned out by history.

The fix is never "try a different prompt." The fix is deciding what to put in — and what to leave out.
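In API terms, the four levers are just the pieces of a single chat-completion request. A minimal, provider-agnostic sketch (the `build_llm_request` helper and the payload shape are illustrative, not any specific SDK's API):

```python
# The four levers, assembled into one (provider-agnostic) LLM call.
# build_llm_request and the payload shape are illustrative, not a real SDK.

def build_llm_request(instructions, knowledge, tools, history):
    # 1. Instructions: goals, constraints, tone.
    system = instructions
    # 2. Knowledge: the facts the model needs right now.
    if knowledge:
        system += "\n\nRELEVANT CONTEXT:\n" + "\n".join(knowledge)
    return {
        "system": system,
        "tools": tools,       # 3. what actions exist, how results come back
        "messages": history,  # 4. prior turns, tool calls, tool outputs
    }

request = build_llm_request(
    instructions="You are a code-review agent. Be terse.",
    knowledge=["repo uses Python 3.12", "CI runs pytest"],
    tools=[{"name": "read_file", "description": "Read a file by path."}],
    history=[{"role": "user", "content": "Fix the failing test."}],
)
```

Every debugging session starts by asking which of these four fields is carrying the wrong payload.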

Rule 1: the context window is a budget, not a bag

The number one mistake: treating the context window like storage. "I have 200k tokens, I'll just throw everything in."

That's how you burn $4 per agent turn and get worse answers.

Long context is lossy. Models attend less to the middle of a long window, hallucinate more when the signal is buried in noise, and run slower in ways that compound across tool calls. A 2026 Anthropic benchmark found agent task completion drops by roughly 28% when you pad a working context from 20k to 120k tokens — even when the relevant information is unchanged.

You're not saving the model time. You're drowning it.

Treat every token like you're paying rent on it. Because you are.
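To see where the rent goes, measure each component's share of the window. A rough sketch using the common ~4-characters-per-token heuristic (a real audit would use your provider's tokenizer):

```python
# Rough per-component token accounting for one agent turn.
# Uses the ~4 chars/token heuristic; swap in a real tokenizer for accuracy.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def context_budget_report(components: dict) -> dict:
    """Estimated token count per context component, largest first."""
    counts = {name: estimate_tokens(text) for name, text in components.items()}
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))

report = context_budget_report({
    "instructions": "You are a coding agent..." * 10,
    "tool_results": "stale grep output\n" * 2000,  # the usual bloat
    "user_message": "Please fix the login bug.",
})
# tool_results dominates -- that's the first thing to cut or compact.
```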

Rule 2: compact aggressively

When your agent's history crosses some threshold — say 50% of the model's window — summarize it.

Pattern:

def compact_history(messages, token_threshold=50_000):
    # count_tokens and summarize are app-provided helpers:
    # count_tokens -> tokenizer count for the message list,
    # summarize -> one LLM call that condenses the older turns.
    if count_tokens(messages) < token_threshold:
        return messages

    # Keep the last 3 turns verbatim -- 6 messages, since each turn is a
    # user/assistant pair. Recent context matters most.
    recent = messages[-6:]
    older = messages[:-6]

    # Summarize the older turns into a single system note
    summary = summarize(older, focus=[
        "decisions made",
        "files modified",
        "open questions",
        "tools that failed and why",
    ])

    return [
        {"role": "system", "content": f"PRIOR WORK SUMMARY:\n{summary}"},
        *recent,
    ]

You lose the verbatim trace. You keep the signal. And you reset your token budget so the agent can go another 50 turns without collapsing.

Rule 3: retrieve at the tool level, not the prompt level

Old RAG: stuff the top-5 chunks into the system prompt at startup.

New RAG: give the agent a search_docs tool and let it decide when to retrieve.

Why this matters:

Approach            Tokens at turn 1   Tokens at turn 10        Relevance
Prompt-level RAG    8,000              8,000                    Guessing
Tool-level RAG      500                500 (+1,200 on demand)   Targeted

Most agent turns don't need retrieval. Why pay the tax on every call? Let the model pull knowledge the way a developer opens a doc tab — only when they need it.

This is "just-in-time context" and it's the single biggest unlock in modern agent design.
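A just-in-time `search_docs` tool is just a schema plus a retrieval function the agent calls on demand. A minimal sketch (the `DOCS` store and schema layout are made up for illustration; adapt the shape to your provider's tool format):

```python
# Tool-level RAG: retrieval happens only when the agent asks for it.
# DOCS and the schema layout are illustrative, not a real SDK format.

DOCS = {
    "auth": "POST /v1/login accepts {email, password}, returns a JWT.",
    "rate limits": "All endpoints: 100 requests/minute per API key.",
}

SEARCH_DOCS_SCHEMA = {
    "name": "search_docs",
    "description": (
        "Searches internal API docs by keyword. "
        "Use BEFORE answering questions about endpoints or limits. "
        "Returns the top matching passages, or an empty list."
    ),
    "parameters": {"query": {"type": "string"}},
}

def search_docs(query: str, k: int = 3) -> list:
    """Naive keyword match; production would use embeddings or BM25."""
    q = query.lower()
    hits = [text for topic, text in DOCS.items()
            if topic in q or any(w in text.lower() for w in q.split())]
    return hits[:k]

# Only the turn that needs docs pays the retrieval tokens:
search_docs("what are the rate limits?")
```

Turns that never mention the docs pay nothing; the schema's description does the routing.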

Rule 4: tool descriptions are prompts

Your search_database tool's description is a system prompt for how the model reasons about querying data. If it says:

"Searches the database."

...you deserve the hallucinations you get.

Write it like this:

name: search_database
description: |
  Retrieves customer records by exact email or user_id.
  Use this BEFORE suggesting account changes — never guess a user_id.
  Returns at most 10 results. If you need more, narrow the query.
  Fails if the email format is invalid — validate first.

That description teaches the agent when to call it, what it can't do, and how to recover. Every minute you spend rewriting tool descriptions saves ten minutes of debugging agent behavior.

Rule 5: separate durable memory from working context

Working context = what's in the window right now.
Memory = persistent notes the agent writes across sessions (to a file, a vector store, a scratchpad).

If your agent needs to remember that a user prefers Python over Rust, don't shove it into every system prompt forever. Write it to a memory file. Retrieve it when relevant. Trim it when stale.

Memory is context engineering across time. Working context is context engineering within a turn. They're different problems with different solutions — and teams that treat them as one always hit a wall.
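A durable memory store can be as simple as a JSON file the agent reads and writes between sessions. A sketch under those assumptions (the file path, record shape, and 90-day staleness window are illustrative choices, not a standard):

```python
# Durable memory as a JSON scratchpad, separate from the working context.
# The path, record shape, and staleness window are illustrative choices.

import json
import time
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")

def remember(key: str, value: str) -> None:
    """Persist a fact across sessions, stamped so stale entries can be trimmed."""
    memory = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    memory[key] = {"value": value, "updated": time.time()}
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def recall(key: str, max_age_days: float = 90):
    """Retrieve a fact only if it exists and isn't stale."""
    if not MEMORY_PATH.exists():
        return None
    entry = json.loads(MEMORY_PATH.read_text()).get(key)
    if entry is None or time.time() - entry["updated"] > max_age_days * 86400:
        return None
    return entry["value"]

remember("language_preference", "Python over Rust")
# A later session pulls this into working context only when relevant:
recall("language_preference")
```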

The business angle

This matters because AI infrastructure cost is now a line on your P&L.

A well-engineered context window runs an agent task for $0.20.
A lazy one runs the same task for $2.50.
The output quality is often worse on the expensive one.

Multiply that across a product doing 100,000 agent runs a month and you've got a $230,000/month difference in gross margin. That's a hire. That's your Series A runway extension. That's whether you ship.

The teams who figure this out in 2026 aren't the ones with the biggest GPU budgets. They're the ones who treat context as a design discipline.

The non-obvious takeaway

Context engineering is what prompt engineering wanted to be when it grew up.

Prompt engineering asked: "how do I phrase this question?"
Context engineering asks: "what does the model need to see, at what moment, with what tools, to produce the right action?"

The first is a writing exercise. The second is systems design. And systems design is a moat — prompt tricks are not.

What to do this week

  1. Audit one agent you're running. Log the full context at each turn. Find the 30% that isn't earning its tokens. Cut it.
  2. Move your RAG from prompt-level to tool-level. Measure the quality delta — it usually goes up.
  3. Rewrite your top 5 tool descriptions with the "when to use / what it can't do / how to recover" structure.
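Step 1 can be mechanized with a per-turn context log. A sketch, again using the rough chars/4 estimate (replace `audit_turn` and the heuristic with your own tokenizer and logging):

```python
# Per-turn context audit: log what each message contributes to the window,
# so you can find the 30% that isn't earning its tokens.
# audit_turn is a made-up helper name; the chars/4 estimate is a heuristic.

def audit_turn(turn: int, messages: list) -> list:
    """Return (role, est_tokens, share) per message, largest first."""
    sizes = [(m["role"], max(1, len(m["content"]) // 4)) for m in messages]
    total = sum(size for _, size in sizes)
    rows = sorted(
        [(role, size, round(size / total, 3)) for role, size in sizes],
        key=lambda row: -row[1],
    )
    for role, size, share in rows:
        print(f"turn {turn}: {role:<9} ~{size:>6} tokens ({share:.1%})")
    return rows

rows = audit_turn(7, [
    {"role": "system", "content": "You are a bug-fixing agent."},
    {"role": "tool", "content": "340 lines of old test output..." * 120},
    {"role": "user", "content": "Now fix the flaky test."},
])
```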

Your agents will get cheaper, faster, and smarter — in that order.


Follow LayerZero for more decoded AI infrastructure. Next up: the memory-file pattern that makes agents actually learn from their mistakes.
