DEV Community

Michael Tuszynski

Originally published at mpt.solutions

Context Engineering Is the New Prompt Engineering

Everyone's writing better prompts. Few are building better context.

That's the gap. Prompt engineering treats AI like a search box — craft the perfect query, get the perfect answer. Context engineering treats AI like a new team member — give them the right docs, the right access, and a clear understanding of how work actually gets done. As Andrej Karpathy put it, the hottest new programming language is English — but the program isn't the prompt. It's the context surrounding it.

I've spent the last six months building AI-native workflows at Presidio, where I'm a Principal Solutions Architect. Not chatbots. Not demos. Production systems where Claude Code agents run real presales operations — client research, proposal generation, meeting analysis, deal tracking. The kind of work that used to live in someone's head and a dozen browser tabs.

Here's what I learned about making AI actually useful.

Prompts Are Requests. Skills Are Frameworks.

The first mistake everyone makes: stuffing domain knowledge into prompts. "You are an expert in enterprise sales. When analyzing a deal, consider these 47 factors..."

That breaks immediately. Prompts are ephemeral — they disappear when the conversation ends. Domain knowledge needs to persist across sessions, get version-controlled, and evolve as you learn what works.

The pattern that works: skills files — what Claude Code's plugin architecture calls reusable domain knowledge. Markdown documents that encode decision frameworks, not instructions. A skill isn't "analyze this deal." A skill is the 5-gate qualification framework your team actually uses, written as structured markdown with decision criteria, red flags, and exit conditions. The AI reads it and applies it. You update the framework once, every future session uses the new version.

```
.claude/skills/
├── qualification-framework.md    # Decision gates with criteria
├── pricing-strategy.md           # Margin rules, discount authority
├── sow-review-rubric.md          # Evaluation checklist
└── competitive-positioning.md    # Differentiators by competitor
```

Skills are reusable. Prompts are disposable. That distinction matters more than any prompting technique.
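The distinction is easy to operationalize. Here's a minimal sketch of a skill loader for the layout above — the `load_skills` helper is illustrative, not part of Claude Code's API:

```python
from pathlib import Path

def load_skills(skills_dir: str, names: list[str]) -> str:
    """Concatenate the requested skill files into one context string.

    The skills persist on disk and are version-controlled; the prompt
    that invokes them stays disposable.
    """
    parts = []
    for name in names:
        path = Path(skills_dir) / f"{name}.md"
        # Each skill is a standalone markdown framework the agent applies.
        parts.append(path.read_text(encoding="utf-8"))
    return "\n\n---\n\n".join(parts)

# A session loads only the frameworks it needs:
# context = load_skills(".claude/skills",
#                       ["qualification-framework", "pricing-strategy"])
```

Update `qualification-framework.md` once, and every future call picks up the new version.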

Context-as-Code: Version Control Your AI's Brain

Every client engagement in my system has a single markdown file that serves as the AI's working memory for that account. Contacts, scope decisions, meeting history, action items, competitive intel — one file, version-controlled in git.

Why markdown? Three reasons:

  1. It's grep-searchable. When an agent needs to find every mention of a specific technology across all accounts, grep -r "Kubernetes" clients/ works instantly. Try that with a vector database.
  2. It diffs cleanly. Git shows you exactly what changed in the AI's understanding of an account. Who updated it, when, and why.
  3. It's token-efficient. Structured markdown compresses well in context windows. A 200-line context file gives an agent everything it needs to operate on an account without RAG retrieval latency.
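The grep point generalizes: plain markdown files make the AI's memory scriptable with nothing but the standard library. A sketch of the `grep -r` equivalent (the `mentions` helper is illustrative, not my actual tooling):

```python
from pathlib import Path

def mentions(term: str, root: str = "clients") -> list[str]:
    """Return every account file that mentions a term."""
    return sorted(
        str(p) for p in Path(root).rglob("*.md")
        if term in p.read_text(encoding="utf-8")
    )
```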

The anti-pattern is treating AI memory as a black box — embeddings you can't inspect, vector stores you can't diff, context you can't version. If you can't git blame your AI's knowledge, you don't control it.

One God-Agent Is a Trap

I started with one agent that did everything. It was mediocre at all of it.

The fix was domain specialization. Four agents, each with a clear role: one handles discovery and qualification, one handles technical design and proposals, one handles deal operations and pricing, one handles workspace maintenance. Each agent has its own tools, its own context, and a defined handoff protocol for passing work to another agent.

This mirrors how real teams work. Your sales engineer doesn't do contract redlines. Your deal desk doesn't design architecture. Specialization isn't just about accuracy — it's about cost. A maintenance agent running on Haiku costs 95% less than an Opus agent doing the same file cleanup.
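One way to encode that specialization and handoff protocol — the role names and `handoff` helper here are a sketch, since the article doesn't prescribe an implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    model: str                        # cheapest model that handles the role well
    skills: list[str] = field(default_factory=list)

# Four specialized agents, each with its own context and tools.
AGENTS = {
    "discovery":   Agent("discovery", "haiku-or-sonnet-tier", ["qualification-framework"]),
    "design":      Agent("design", "opus-tier", ["sow-review-rubric"]),
    "deal-ops":    Agent("deal-ops", "sonnet-tier", ["pricing-strategy"]),
    "maintenance": Agent("maintenance", "haiku-tier"),
}

def handoff(work: dict, to_role: str) -> dict:
    """Pass work between agents explicitly, so every transfer is auditable."""
    agent = AGENTS[to_role]
    return {**work, "assignee": agent.name, "model": agent.model}
```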

Model routing by task complexity is the easiest money you'll save in AI:

| Task Type | Model | Why |
| --- | --- | --- |
| File organization, validation | Haiku | Structured, predictable |
| Research, summarization | Sonnet | Good reasoning, fast |
| Strategy, complex writing | Opus | Needs deep reasoning |
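In practice the routing can be as simple as a lookup keyed by task type. The model tiers follow the table; the mapping itself is a sketch:

```python
# Map task complexity to the cheapest model that handles it well.
MODEL_BY_TASK = {
    "file_organization": "haiku",   # structured, predictable
    "validation":        "haiku",
    "research":          "sonnet",  # good reasoning, fast
    "summarization":     "sonnet",
    "strategy":          "opus",    # needs deep reasoning
    "complex_writing":   "opus",
}

def pick_model(task_type: str) -> str:
    # Unknown task types default to the mid-tier model.
    return MODEL_BY_TASK.get(task_type, "sonnet")
```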

Mistakes Must Become Infrastructure

Every production AI system has failure modes you won't predict. The question is whether failures teach the system or just annoy you.

My approach: every time an agent makes a mistake that I have to correct, it becomes a numbered rule in the system's configuration file. Not a mental note. Not a prompt tweak. A permanent, version-controlled rule that every future session reads on startup.

After six months, I have 39 of these. Examples:

  • "State files in .gitignore can vanish silently during merges — default to safe fallback values"
  • "Never infer what a client said in a meeting — only quote from the transcript or flag it as an assumption"
  • "Contract reverts produce the same error on every RPC — don't retry, they're non-recoverable"

These aren't prompt engineering. They're institutional memory encoded as code. The system gets smarter every time it fails, without retraining, fine-tuning, or hoping the model "remembers."
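The error-to-rule pipeline can be sketched as an append-only, numbered rules file that every session reads on startup. The file name and `add_rule` helper are illustrative:

```python
from pathlib import Path

RULES_FILE = Path(".claude/rules.md")  # read by every session on startup

def add_rule(lesson: str, rules_file: Path = RULES_FILE) -> int:
    """Append a corrected mistake as a permanent, numbered rule."""
    lines = (
        rules_file.read_text(encoding="utf-8").splitlines()
        if rules_file.exists() else []
    )
    number = len(lines) + 1
    with rules_file.open("a", encoding="utf-8") as f:
        f.write(f"{number}. {lesson}\n")
    return number
```

Because the file is git-tracked, each rule carries its own history: which failure produced it and when.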

Don't Dump Everything Into Context

The biggest performance killer in AI systems isn't the model — it's context pollution. Loading every piece of knowledge into every session degrades output quality and burns tokens.

The pattern that works: modular context loading. My system has 14 skills, 56 commands, and context files for dozens of accounts. But any given session loads only what's relevant — the specific client context, the specific workflow skills, the specific agent role. Everything else stays on disk until needed.

Think of it like imports in code. You wouldn't import * from every module in your codebase. Don't do it with AI context either.

This also means your context files need to be current state, not changelogs. A context file that accumulates three months of historical notes becomes noise. Describe what the system is right now in 150 scannable lines. Put the changelog somewhere else.
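Following the import analogy, a session assembler might load only what the current task touches and leave everything else on disk. The paths and selection logic here are a sketch under assumed conventions:

```python
from pathlib import Path

def build_session_context(client: str, skills: list[str], base: str = ".") -> str:
    """Assemble only the context relevant to this session."""
    root = Path(base)
    pieces = [
        root / "clients" / f"{client}.md",                       # current-state account file
        *[root / ".claude" / "skills" / f"{s}.md" for s in skills],
    ]
    # Skip missing files rather than polluting context with placeholders.
    return "\n\n".join(
        p.read_text(encoding="utf-8") for p in pieces if p.exists()
    )
```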

The Stack That Actually Works

After building this across multiple enterprise engagements, here's the architecture I'd recommend for anyone building AI-native workflows:

  1. Structured context files (markdown, git-tracked) over vector databases for domain knowledge
  2. Skills (persistent frameworks) over prompts (ephemeral instructions) for domain expertise
  3. Specialized agents with handoff protocols over one general-purpose agent
  4. Cost-aware model routing — match model capability to task complexity
  5. Error-to-rule pipelines — every failure becomes a permanent system improvement
  6. Modular loading — only load context relevant to the current task

None of this requires a framework. No LangChain, no LlamaIndex, no orchestration layer. It's markdown files, a CLI, and good engineering discipline. The AI does the reasoning. You do the architecture.


The tools for building AI-native workflows exist today. The bottleneck isn't model capability — it's context architecture. Start treating your AI's knowledge like code: structured, versioned, reviewed, and intentionally loaded. That's the difference between a chatbot and a system.
