Context Engineering for Enterprise AI, Part 3: Multi-Agent Architecture That Survives Production

#ai #llm #dotnet #python

Originally published on PrepStack.

Most "AI agents" in production are really one giant agent with every tool and a 10,000-token prompt, and that agent loops, stalls, and ships confident nonsense. This is Part 3 of my Context Engineering series.

The reframe

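A single agent with every tool isn't "one smart system." It's a state machine with no states, no guards, and no exits. You don't fix that with a better prompt. You fix it with structure: the model decides; the graph governs. The model proposes the next step, and the graph enforces which transitions are legal, how many steps a run may take, and when it must end.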
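To make "the model decides; the graph governs" concrete, here is a minimal sketch in plain Python. It deliberately does not use LangGraph's API. The model's output is treated only as a proposal. An explicit transition table acts as the guard, and `END` is the exit. The node names mirror the workers described below. The transition table, the fallback moves, and the 12-step cap are illustrative assumptions. `propose_next` is a hypothetical stand-in for an LLM routing call.

```python
from typing import Callable

END = "END"

# Hypothetical transition table: the only moves the graph permits.
ALLOWED: dict[str, set[str]] = {
    "supervisor": {"retriever", "analyst", "writer"},
    "retriever": {"supervisor"},
    "analyst": {"supervisor"},
    "writer": {"critic"},
    "critic": {"supervisor", END},  # the only exit runs through the critic
}

# Hypothetical safe default for each node when the model proposes an illegal move.
FALLBACK: dict[str, str] = {
    "supervisor": "retriever",
    "retriever": "supervisor",
    "analyst": "supervisor",
    "writer": "critic",
    "critic": "supervisor",
}


def next_node(current: str, proposal: str) -> str:
    """The model proposes a move; the graph only accepts it if it is legal."""
    if proposal in ALLOWED[current]:
        return proposal
    return FALLBACK[current]


def run(propose_next: Callable[[str], str], start: str = "supervisor", max_steps: int = 12) -> list[str]:
    """Walk the graph until END or until the (assumed) step cap is hit."""
    path = [start]
    current = start
    for _ in range(max_steps):
        current = next_node(current, propose_next(current))
        path.append(current)
        if current == END:
            break
    return path


if __name__ == "__main__":
    # A scripted "model": the writer tries to skip the critic and is redirected.
    scripted = {"supervisor": "writer", "writer": "END", "critic": "END"}
    print(run(lambda node: scripted[node]))  # ['supervisor', 'writer', 'critic', 'END']
```

The key design choice is that an illegal proposal never crashes the run and never escapes the graph. It is simply replaced by a known-safe move.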

The architecture

ASP.NET Core owns the request boundary (orchestration, budgets, and governance); a Python LangGraph service runs the agent graph itself:

Supervisor → specialized workers: a cheap supervisor (gpt-4o-mini) routes work to a retriever, analyst, writer, and critic
Model routing: only the analyst uses the expensive model; the other workers run on gpt-4o-mini
Typed hand-offs: workers pass structured data (Pydantic models in Python, records in C#), not prose
Bounded loops + a critic gate: a hard step budget caps every loop, and the graph can't finish until a critic verifies the answer is grounded
Parallelism with failure isolation: retrieval runs concurrently with per-fetch timeouts, so one slow tool degrades the run to best-effort instead of killing it
Governance at the boundary: the C# layer owns the global wall-clock budget, the cost cap, and the prompt-injection screen
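Typed hand-offs mean a worker's output either parses into a contract or fails loudly at the boundary. It never leaks downstream as free text. The post uses Pydantic models and C# records for this. As a dependency-free sketch of the same idea, here are frozen stdlib dataclasses that validate on construction. The field names (`doc_id`, `snippet`, `score`, `claims`) and the 0 to 1 score range are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Retriever -> analyst contract (hypothetical fields)."""
    doc_id: str
    snippet: str
    score: float

    def __post_init__(self) -> None:
        # Reject malformed hand-offs at the boundary instead of passing them downstream.
        if not self.doc_id:
            raise ValueError("doc_id must be non-empty")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


@dataclass(frozen=True)
class AnalystOutput:
    """Analyst -> writer contract: each claim is paired with the doc_id that supports it."""
    claims: tuple[tuple[str, str], ...]

    def cited_ids(self) -> set[str]:
        return {doc_id for _, doc_id in self.claims}


def parse_evidence(raw: dict) -> Evidence:
    """Turn untyped data (e.g. JSON decoded from a model reply) into a typed hand-off.

    Raises KeyError for missing fields and ValueError for invalid values.
    """
    return Evidence(
        doc_id=str(raw["doc_id"]),
        snippet=str(raw["snippet"]),
        score=float(raw["score"]),
    )


if __name__ == "__main__":
    ev = parse_evidence({"doc_id": "kb-42", "snippet": "Refunds take 5 days.", "score": 0.87})
    out = AnalystOutput(claims=(("Refunds take 5 days.", ev.doc_id),))
    print(out.cited_ids())  # {'kb-42'}
```

With Pydantic v2, each contract would instead be a `BaseModel` subclass, and `Evidence.model_validate(raw)` would replace the hand-written `parse_evidence`.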
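A bounded loop with a critic gate has exactly two ways out: the critic approves, or the step budget runs out. The minimal Python sketch below shows both. A budget of 12 is an assumption, chosen to match the ">12 calls" runaway threshold in the results table. `write` is a hypothetical stand-in for the writer worker. "Grounded" is approximated here as "at least one citation, and every cited id was actually retrieved." That is one plausible check, not necessarily the author's critic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_STEPS = 12  # assumption: chosen to match the ">12 calls" runaway threshold


@dataclass
class Draft:
    text: str
    cited_ids: set[str]


@dataclass
class Outcome:
    status: str  # "verified" or "budget_exhausted"
    draft: Draft | None
    steps: int


def is_grounded(draft: Draft, retrieved_ids: set[str]) -> bool:
    """Simplified critic: at least one citation, and every citation points at retrieved evidence."""
    return bool(draft.cited_ids) and draft.cited_ids <= retrieved_ids


def run_with_critic_gate(
    write: Callable[[int], Draft],
    retrieved_ids: set[str],
    max_steps: int = MAX_STEPS,
) -> Outcome:
    last: Draft | None = None
    for step in range(1, max_steps + 1):
        last = write(step)
        # The only successful exit is through the critic.
        if is_grounded(last, retrieved_ids):
            return Outcome("verified", last, step)
    # Hard stop: surface an unverified result instead of looping forever.
    return Outcome("budget_exhausted", last, max_steps)


if __name__ == "__main__":
    # Hypothetical writer: cites an unretrieved doc first, then corrects itself.
    def writer(step: int) -> Draft:
        return Draft("answer", {"kb-99"} if step == 1 else {"kb-1"})

    result = run_with_critic_gate(writer, retrieved_ids={"kb-1", "kb-2"})
    print(result.status, result.steps)  # verified 2
```

Returning an explicit `budget_exhausted` status lets the caller decide what to show the user. That is safer than silently shipping an unverified draft.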
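Failure isolation in parallel retrieval comes down to giving each fetch its own timeout and converting any failure into "no result" rather than an exception. Here is a minimal `asyncio` sketch. The fetcher names and the 0.5 s default timeout are hypothetical, and it does not claim to be how the author's LangGraph nodes are written.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[list[str]]]


async def _guarded(name: str, fetch: Fetcher, query: str, timeout_s: float) -> tuple[str, list[str] | None]:
    """Run one fetch under its own timeout; a timeout or error becomes None instead of raising."""
    try:
        return name, await asyncio.wait_for(fetch(query), timeout=timeout_s)
    except Exception:  # covers asyncio.TimeoutError and tool errors, but not cancellation
        return name, None


async def retrieve_all(
    fetchers: dict[str, Fetcher], query: str, timeout_s: float = 0.5
) -> tuple[dict[str, list[str]], list[str]]:
    """Fan out to every source concurrently; return (successful results, degraded sources)."""
    results = await asyncio.gather(
        *(_guarded(name, fetch, query, timeout_s) for name, fetch in fetchers.items())
    )
    docs = {name: found for name, found in results if found is not None}
    degraded = [name for name, found in results if found is None]
    return docs, degraded


if __name__ == "__main__":
    async def kb(q: str) -> list[str]:
        return [f"kb:{q}"]

    async def web(q: str) -> list[str]:
        await asyncio.sleep(5)  # simulates a hung tool
        return ["too late"]

    docs, degraded = asyncio.run(retrieve_all({"kb": kb, "web": web}, "refund policy", timeout_s=0.2))
    print(docs, degraded)  # {'kb': ['kb:refund policy']} ['web']
```

Because `CancelledError` derives from `BaseException`, the `except Exception` clause still lets an outer cancellation propagate. That matters when the request-level wall-clock budget fires.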
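The post places the global wall-clock budget and cost cap in the ASP.NET Core layer, in C#. To keep one language across these sketches, here is the same logic in Python, combined with the worker-to-model routing. It assumes each call's cost can be estimated before the call is made (`est_cost_usd`). The class and method names are hypothetical, and `EXPENSIVE_MODEL` is a placeholder because the post does not name the analyst's model.

```python
import time
from dataclasses import dataclass, field

# Model routing from the post: only the analyst gets the expensive model.
# "EXPENSIVE_MODEL" is a placeholder; the post does not name that model.
MODEL_FOR_WORKER = {
    "supervisor": "gpt-4o-mini",
    "retriever": "gpt-4o-mini",
    "analyst": "EXPENSIVE_MODEL",
    "writer": "gpt-4o-mini",
    "critic": "gpt-4o-mini",
}


class BudgetExceeded(RuntimeError):
    """Raised instead of making a call that would break a request-level limit."""


@dataclass
class RequestBudget:
    wall_clock_s: float
    cost_cap_usd: float
    spent_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def remaining_s(self) -> float:
        # Can be passed downstream as the timeout for the next call.
        return self.wall_clock_s - (time.monotonic() - self.started)

    def authorize(self, worker: str, est_cost_usd: float) -> str:
        """Check both limits before a model call; return the model this worker should use."""
        if self.remaining_s() <= 0:
            raise BudgetExceeded("wall-clock budget exhausted")
        if self.spent_usd + est_cost_usd > self.cost_cap_usd:
            raise BudgetExceeded(f"cost cap of ${self.cost_cap_usd} would be exceeded")
        self.spent_usd += est_cost_usd
        return MODEL_FOR_WORKER[worker]
```

The check runs *before* each call rather than after, so a request never spends past its cap. The remaining wall-clock time is also available to size the next call's timeout.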

The results

Metric	Before (single agent)	After (supervisor graph)
p95 latency	4.2s	1.8s
Cost per query	$0.021	$0.008
Context tokens per agentic request	~12,000	~3,800
Runaway loops (share of runs with >12 calls)	6%	0%
Endpoint HTTP 500 rate	1.4%	0.2%
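The table reports absolute numbers. The relative reductions follow directly from them, and this short computation uses only the published before/after values:

```python
# Before/after values copied from the results table.
RESULTS = {
    "p95 latency (s)": (4.2, 1.8),
    "Cost per query ($)": (0.021, 0.008),
    "Context tokens per agentic request": (12_000, 3_800),
    "Runaway loops (%)": (6, 0),
    "Endpoint 500 rate (%)": (1.4, 0.2),
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after."""
    return (before - after) / before * 100


for metric, (before, after) in RESULTS.items():
    print(f"{metric}: -{relative_reduction(before, after):.0f}%")

# p95 latency (s): -57%
# Cost per query ($): -62%
# Context tokens per agentic request: -68%
# Runaway loops (%): -100%
# Endpoint 500 rate (%): -86%
```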