Context engineering is not prompt engineering. It is an emerging discipline focused on optimizing context windows for AI agents, translating upstream APIs and MCP servers into efficient tool interfaces, and building evaluation frameworks for agent-API interaction quality.
If you are working with agents and MCP tools, you are probably running into these problems already. Here are the five we keep seeing.
## 1. No agent evaluation framework
You need to know whether your agent called the right API with the right parameters in the right order — not just whether the output text looks reasonable.
Most evaluation today stops at the surface. Did the LLM produce something plausible? That is necessary but not sufficient. An agent that returns a confident summary while calling the wrong endpoint with the wrong payload is worse than one that errors out, because the failure is invisible.
What we need: evaluation frameworks that inspect the actual state of upstream services after an agent acts. Did the record get created? Did the right field get updated? Was the sequence of API calls correct? This is integration testing for autonomous systems, and the tooling barely exists.
Compounding the problem: different models behave differently with the same tool definitions. What works on Sonnet may break on Opus. Evaluation needs to be model-aware.
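The sequence-level checks described above can be sketched in a few lines. This is a minimal, hypothetical harness (the `ToolCall` and `evaluate_call_sequence` names are illustrative, not an existing framework); a real one would also query the upstream service to verify state, and run the same suite per model.

```python
# Sketch of sequence-level agent evaluation: check *which* tools were
# called, with what parameters, in what order -- not just the output text.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    params: dict


@dataclass
class EvalResult:
    passed: bool
    failures: list = field(default_factory=list)


def evaluate_call_sequence(calls, expected_names):
    """Verify the agent called the expected tools in the expected order."""
    actual = [c.name for c in calls]
    failures = []
    if actual != expected_names:
        failures.append(f"expected {expected_names}, got {actual}")
    return EvalResult(passed=not failures, failures=failures)


# Example: the agent was asked to create a user, then assign a role.
trace = [
    ToolCall("create_user", {"email": "a@example.com"}),
    ToolCall("assign_user_role", {"role": "viewer"}),
]
result = evaluate_call_sequence(trace, ["create_user", "assign_user_role"])
```

The same trace would be replayed against each target model, since identical tool definitions can produce different call sequences across models.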
## 2. Too many MCP tools
The MCP ecosystem is growing fast. The problem is that an agent with access to 40 tools performs worse than one with 4 well-designed tools.
Context windows are not infinite. Every tool description consumes tokens. Every similar-sounding tool creates decision overhead for the model. Context engineers need to consolidate sprawling API surfaces into minimal, precise tool interfaces.
This is the core design problem in context engineering: compression without information loss. You need to understand the upstream API semantics AND the downstream model behavior, then find the minimal translation between them. The right translation differs by model, task, and user intent.
What many teams ship:
- `list_users`
- `get_user`
- `create_user`
- `update_user`
- `delete_user`
- `list_user_roles`
- `get_user_role`
- `assign_user_role`
- ... (30 more tools)
What the agent actually needs:
- `query_resource` (list, get, filter across entities)
- `mutate_resource` (create, update, delete with validation)
- `manage_permissions` (roles, assignments)
The consolidation is not mechanical. It is a design problem.
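One way to picture the consolidation is a single tool whose schema spans entities and actions, with a handler that dispatches to the upstream API. This is a sketch under assumed names (`query_resource`, `FakeAPI`); a real implementation would carry validation and the full entity set.

```python
# Sketch: one consolidated tool definition replacing several
# per-entity tools (list_users, get_user, list_user_roles, ...).
QUERY_RESOURCE_TOOL = {
    "name": "query_resource",
    "description": "List, get, or filter records for any entity.",
    "input_schema": {
        "type": "object",
        "properties": {
            "entity": {"enum": ["user", "role"]},
            "action": {"enum": ["list", "get", "filter"]},
            "id": {"type": "string"},
            "filters": {"type": "object"},
        },
        "required": ["entity", "action"],
    },
}


class FakeAPI:
    """In-memory stand-in for the upstream service."""

    def get(self, entity, record_id):
        return {"entity": entity, "id": record_id}

    def list(self, entity, **filters):
        return [{"entity": entity, **filters}]


def handle_query_resource(args, api):
    """Dispatch one consolidated tool call to the upstream API."""
    entity, action = args["entity"], args["action"]
    if action == "get":
        return api.get(entity, args["id"])
    if action == "filter":
        return api.list(entity, **args.get("filters", {}))
    return api.list(entity)
```

The token cost of the tool catalog drops from dozens of descriptions to three, while the enum constraints preserve the information the model needs to pick the right operation.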
## 3. MCP workflows need embedded question formation
"Ask anything" interfaces sound great in demos. In practice, they fail when developers don't know what to ask.
This is the cold-start problem for MCP-powered copilots. A developer who asks the wrong question gets a confident-sounding answer to the wrong question — and might not realize it until something breaks downstream.
Context engineers are discovering that documentation layers need to embed question formation into the workflow. Instead of just answering the current question, the system should suggest the next question based on where the developer is in their workflow.
Think of it like this: traditional docs are a reference book. What agents need is a guided workflow that knows which page you need before you know to look for it.
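Concretely, embedding question formation can be as simple as returning suggested follow-ups alongside every answer, keyed to where the developer is in the workflow. The stages and questions below are illustrative placeholders, not a real product surface.

```python
# Sketch: a doc-answer payload that carries the *next* questions,
# keyed by the developer's current workflow stage.
NEXT_QUESTIONS = {
    "auth": [
        "How do I refresh an expired token?",
        "Which scopes does this endpoint require?",
    ],
    "pagination": [
        "What is the maximum page size?",
        "How do I detect the last page?",
    ],
}


def answer_with_suggestions(question, answer, stage):
    """Return the answer plus suggested next questions for this stage."""
    return {
        "question": question,
        "answer": answer,
        "suggested_next": NEXT_QUESTIONS.get(stage, []),
    }
```

A copilot serving this payload turns "ask anything" into a guided path: the developer always sees two or three questions they did not know to ask.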
## 4. Repos need minimum agent operational docs
Your repo looks well-documented. README, API docs, architecture decision records, inline comments. A human developer can onboard in a day.
An agent tries to run the test suite and fails immediately.
Why? Because "run the setup script" requires a human to infer which script, navigate to the right directory, set environment variables, and interpret error messages. An agent needs all of that explicit:
- Which script (exact path)
- What arguments
- What environment variables must be set (and valid values)
- What the expected output looks like
- What to do when output doesn't match
Context engineers are defining "minimum agent operational documentation" — the baseline structured docs a repo needs before agents can reliably operate within it. This is not replacing human docs. It is a parallel, machine-readable layer.
The interesting side effect: making docs agent-consumable usually makes them better for humans too, because it forces you to close the gaps everyone was silently working around.
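The list above can be treated as a checkable schema. Here is a minimal sketch of such a baseline check; the field names (`setup_script`, `on_failure`, etc.) are an assumed schema, not an existing standard.

```python
# Sketch: validate that a repo's machine-readable ops doc covers the
# minimum an agent needs before it can operate in the repo reliably.
REQUIRED_FIELDS = [
    "setup_script",     # exact path, not "run the setup script"
    "test_command",     # exact command with arguments
    "env_vars",         # required variables and valid values
    "expected_output",  # what success looks like
    "on_failure",       # what to do when output doesn't match
]


def missing_agent_doc_fields(doc: dict) -> list:
    """Return the baseline fields this agent ops doc is missing."""
    return [f for f in REQUIRED_FIELDS if f not in doc]


# Example: a doc that a human could work with, but an agent cannot.
partial_doc = {
    "setup_script": "./scripts/setup.sh",
    "test_command": "pytest -q",
    "env_vars": {"DATABASE_URL": "postgres://localhost/dev"},
}
```

Running the check against `partial_doc` surfaces exactly the gaps humans silently work around: no expected output, no failure playbook.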
## 5. Markdown needs standardized structures
This is the most foundational problem, and it affects everything else.
Most markdown is written for human readability: inconsistent heading hierarchies, ambiguous section boundaries, sections that depend on context from earlier in the document, formatting that prioritizes aesthetics over structure.
Agents need:
- Consistent heading structures
- Explicit section boundaries
- Self-contained sections (no dependency on prior context)
- Metadata indicating what kind of information each section contains
The gap between "human-readable markdown" and "agent-consumable markdown" is wider than most teams expect. Every MCP server serving markdown content, every copilot referencing project docs, every automated workflow reading a README is affected.
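Self-contained sections with explicit boundaries are what make markdown mechanically retrievable. A minimal sketch, assuming `##` headings mark section boundaries (a convention, not a standard):

```python
# Sketch: split markdown into self-contained sections keyed by heading,
# so an agent can retrieve one section without any prior context.
import re


def split_sections(markdown: str) -> dict:
    """Map each '## ' heading to the body text beneath it."""
    sections, current, buf = {}, None, []
    for line in markdown.splitlines():
        m = re.match(r"^##\s+(.*)", line)
        if m:
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    return sections


doc = "## Setup\nRun make.\n## Testing\nRun pytest."
```

This only works when sections are actually self-contained; a section that says "as configured above" breaks the moment it is retrieved alone, which is exactly why the conventions matter.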
## The common thread
These problems exist because infrastructure built for human-only workflows does not transfer cleanly to agent-augmented workflows. Context engineering is the discipline of building that translation layer.
At Naftiko, we are building the governed, spec-driven capability layer between enterprise APIs and AI agents. These are exactly the problems we are solving.
If you are running into any of these in your own work, I'd like to hear about it in the comments.