Let me tell you about a seven-word request that became 5,000 tokens.
A developer opens Cursor and types: "Add error handling to this function."
Seven words. That's the prompt. That's what the developer thinks they sent.
Here's what actually got sent to the model:
```
SYSTEM: You are an expert software engineer. Write clean,
production-ready code. Follow the existing coding style.
Use the same language and framework as the surrounding code.

CONTEXT — Current file:
[500-2,000 tokens of the file being edited]

CONTEXT — Related files:
[300-1,000 tokens from imported modules and type definitions]

CONTEXT — Project structure:
"This is a TypeScript/Next.js project using Prisma ORM."

CONTEXT — Recent edits:
[What the developer changed in the last 5 minutes]

CONTEXT — Error messages:
[Current terminal errors or linter warnings]

USER: Add error handling to this function.
```
The developer's seven words are sitting at the bottom of 3,000–5,000 tokens of injected context they never wrote and never saw. And that's why the suggestion fits perfectly — correct imports, matching style, compatible types, aware of the existing error patterns in the codebase.
The prompt didn't do that. The context did.
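The assembly step above can be sketched in a few lines. This is an illustrative sketch, not Cursor's actual internals; the section labels and the `build_context` function are assumptions made for the example.

```python
# Hypothetical sketch of how an IDE assistant might assemble its request.
# Everything here (names, labels, content) is illustrative.

SYSTEM = (
    "You are an expert software engineer. Write clean, "
    "production-ready code. Follow the existing coding style."
)

def build_context(user_prompt: str, sections: dict[str, str]) -> str:
    """Stack labeled context sections above the user's short request."""
    parts = [f"SYSTEM: {SYSTEM}"]
    for label, content in sections.items():
        parts.append(f"CONTEXT - {label}:\n{content}")
    parts.append(f"USER: {user_prompt}")
    return "\n\n".join(parts)

full_prompt = build_context(
    "Add error handling to this function.",
    {
        "Current file": "function getUser(id) { ... }",
        "Project structure": "TypeScript/Next.js project using Prisma ORM.",
        "Error messages": "TS2345: Argument of type 'string' ...",
    },
)
# The seven-word request sits on the very last line of the assembled prompt.
```

The developer only ever types the argument to `user_prompt`; everything else is injected by the tool.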
The term that's already misleading people
"Prompt engineering" took off in 2020 when GPT-3 landed and people discovered that phrasing mattered. Ask the question one way, get a useful answer. Ask it another way, get garbage. The insight was real. The name stuck.
But the name is now actively misleading a generation of developers about where the actual leverage is.
A prompt is what you type. Context is everything the model sees: the system prompt, conversation history, injected data, retrieved documents, tool results, examples, and constraints. In production AI systems, what you actually type is often less than 5% of the total context window.
If you're optimizing the 5% and ignoring the 95%, you're polishing the doorknob while the house is on fire.
The engineers building the best AI products in the world right now are not writing clever prompts. They are engineering context: deciding what information the model needs, how to retrieve it, how to structure it, and when to inject it. That is a fundamentally different and more architectural skill.
What context engineering actually looks like in production
Take Perplexity. When you ask it about a current event, here's what's actually happening:
It recognizes the question needs live information
It generates search queries and hits Bing
It chunks and embeds the retrieved pages
It re-ranks the chunks by relevance to your question
It injects the top chunks into the prompt alongside your question
The model generates an answer with inline citations
Your question might be twelve words. The total context going into the model is several thousand tokens of retrieved, ranked, and structured web content. The model isn't smarter than ChatGPT without browsing. It has better context construction.
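The loop above can be approximated in toy form. This is a sketch only: the "web" is hardcoded, and keyword overlap stands in for real search, embedding, and re-ranking.

```python
# Toy version of the answer-with-retrieval loop: rank sources against the
# question, then inject the top chunks (with citations) above the question.

FAKE_WEB = {
    "https://example.com/a": "The 2024 eclipse crossed North America in April.",
    "https://example.com/b": "Eclipses occur when the Moon blocks the Sun.",
    "https://example.com/c": "Stock markets closed higher on Friday.",
}

def score(query: str, text: str) -> int:
    """Stand-in for embedding similarity: count shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def build_answer_context(question: str, top_k: int = 2) -> str:
    # "Search" and re-rank: order every page by relevance to the question.
    ranked = sorted(
        FAKE_WEB.items(), key=lambda kv: score(question, kv[1]), reverse=True
    )
    # Inject the top chunks, labeled with their source, above the question.
    chunks = [f"[{url}] {text}" for url, text in ranked[:top_k]]
    return "Sources:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}"

ctx = build_answer_context("When did the eclipse cross North America?")
```

Note what the model would receive: not the twelve-word question alone, but a ranked, labeled bundle of sources with the question at the bottom.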
Or take enterprise knowledge bots — the most widely deployed AI use case in companies right now. The ones that actually work don't work because someone wrote a brilliant system prompt. They work because someone built a pipeline that:
Ingests and chunks 10,000 internal documents properly
Embeds them into a vector database
Retrieves the right 3–5 chunks at query time using semantic search
Injects those chunks with the right framing into the prompt
The query "how many vacation days do I get" becomes a grounded, accurate answer not because the prompt was clever but because the right chunk from the right HR document was sitting in the context window when the model generated its response.
The prompt is the last mile. The context pipeline is the highway.
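The "chunks it properly" step in that pipeline hides real decisions. Here is a minimal sketch of one of them, overlapping fixed-size windows; real systems usually split on headings, sentences, or code boundaries instead of raw character counts.

```python
# Minimal overlapping chunker. Overlap ensures a fact that straddles a
# chunk boundary still appears whole in at least one chunk.

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Employees accrue 1.5 vacation days per month. " * 20
chunks = chunk(doc)
# Consecutive chunks share their boundary region, so the vacation-days
# sentence survives intact somewhere even if a window cuts through it.
```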
Why this distinction isn't just semantic
Here's where it gets practical.
If you think in terms of prompt engineering, your mental model is: I have a model, I write instructions, I get output. The levers are wording, tone, examples, and structure. This is useful. It is also profoundly limited.
If you think in terms of context engineering, your mental model expands: I have a model with a context window. That window is real estate. What I put in that real estate and how I get it there determines everything about the output quality. The levers are retrieval, structuring, injection timing, chunking strategy, system prompt design, conversation state management, and tool result formatting.
This is why two teams using the exact same model (GPT-4, Claude, Gemini, it doesn't matter) can get dramatically different results. They're not using different models. They're filling the context window differently.
The best AI products differentiate on context engineering. The model is a commodity. What you put around the model is the product.
The five things context engineering actually involves
1. What to inject. Not everything belongs in the context window. Injecting irrelevant information degrades performance: the model attends to noise. Good context engineering means deciding what the model actually needs to know to do this specific task, nothing more.
2. How to retrieve it. For dynamic systems (RAG pipelines, browsing agents, code tools), the context isn't static. It gets assembled at runtime. Semantic search, re-ranking, and hybrid retrieval are context engineering problems, not prompt problems.
3. How to structure it. The same information formatted differently produces different outputs. A retrieved document dumped as raw text performs worse than the same document with a clear label indicating what it is and why it was retrieved.
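To make the structuring point concrete, here are two ways to inject the same retrieved chunk. The field names in the framed version are illustrative, not a standard.

```python
# Same information, two presentations. The framed version tells the model
# what the text is and why it's in the window.

chunk_text = "Full-time employees receive 20 vacation days per year."

raw = chunk_text  # dumped as-is, indistinguishable from anything else

framed = (
    "Retrieved from: HR Policy Handbook, section 4.2\n"
    "Relevance: matches the user's question about vacation days\n"
    f"Content: {chunk_text}"
)
```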
4. When to inject it. In multi-turn conversations and agentic systems, context management is ongoing. What stays in the window? What gets summarized? What gets dropped? These are architectural decisions with real consequences.
5. What to exclude. The context window is finite. Filling it with the wrong things doesn't just waste space; it dilutes the signal. Negative decisions are as important as positive ones.
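Points 4 and 5 amount to enforcing a budget. A minimal sketch, assuming a priority-ordered list of candidate sections and word counts as a crude stand-in for the model's tokenizer:

```python
# Keep the highest-priority sections that fit the budget; drop the rest.
# Priorities and the word-count "tokenizer" are simplifying assumptions.

def fit_to_budget(sections: list[tuple[int, str]], budget: int) -> list[str]:
    kept, used = [], 0
    for _, text in sorted(sections, key=lambda s: s[0]):  # lower = higher priority
        cost = len(text.split())
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept

sections = [
    (0, "USER: how many vacation days do I get"),            # always keep
    (1, "HR handbook chunk about vacation accrual"),         # directly relevant
    (5, "five hundred lines of unrelated changelog " * 50),  # noise: dropped
]
window = fit_to_budget(sections, budget=60)
```

The interesting part is the third section: it never makes it into the window at all, which is the "negative decision" the article is describing.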
None of these are prompting decisions. They're system design decisions.
The uncomfortable truth about "prompt engineers"
In 2023, "prompt engineer" was a real job title with real salaries. The idea was that crafting the right instructions to get good outputs from AI was a specialized skill worth paying for.
That was true for about eighteen months. Then two things happened.
First, models got better at following instructions. The gap between a well-crafted prompt and a mediocre one got smaller as models became more capable.
Second, the industry realized that the real leverage was never in the prompt itself. It was in what surrounded the prompt. The teams building durable AI products had stopped thinking about prompts and started thinking about pipelines.
This doesn't mean writing clear, specific instructions stopped mattering. It does mean that "I'm good at writing prompts" is table stakes now, not a competitive skill. The developers who are building things that actually work in production are thinking one level up.
What to do with this
If you're using AI primarily through a chat interface, typing requests and reading responses, prompt thinking is appropriate and useful. The five-layer framework of role, task, context, format, and guardrails will get you significantly better outputs.
But if you're building AI-powered products, the question to ask is not "how do I write better prompts?" It's "what does the model need to see, and how do I get it there?"
That reframe changes what you build. Instead of a clever system prompt, you build a retrieval pipeline. Instead of carefully worded instructions, you build a context assembly layer that pulls the right information at the right time. Instead of tweaking wording, you instrument your system to see what's actually in the context window when things go wrong.
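That last instrumentation step can start very small. A sketch, with illustrative names, of logging the composition of the window before each model call so a bad answer can be traced back to a bad window:

```python
# Summarize what each section of the context window costs before a call.
# Word counts approximate tokens; a real system would use the tokenizer.

import json

def log_context(sections: dict[str, str]) -> str:
    report = {label: len(text.split()) for label, text in sections.items()}
    report["total_words"] = sum(report.values())
    return json.dumps(report, indent=2)

summary = log_context({
    "system": "You are an expert software engineer.",
    "retrieved_docs": "chunk one ... chunk two ...",
    "user": "Add error handling to this function.",
})
```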
The prompt is still there. It still matters. But it's downstream of everything else.
The real engineering is happening upstream, in the pipeline that decides what shows up in that context window before the model ever reads a single word you wrote.
Building something where context engineering is the hard part? I'd genuinely like to hear what problems you're running into; drop them in the comments.