Originally published on PrepStack.
Most "AI agents" in production are really one giant agent with every tool attached and a 10,000-token prompt. That agent loops, stalls, and ships confident nonsense. This is Part 3 of my Context Engineering series.
The reframe
A single agent with every tool isn't "one smart system" — it's a state machine with no states, no guards, and no exits. You don't fix that with a better prompt. You fix it with structure: the model decides; the graph governs.
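To make "the model decides; the graph governs" concrete, here is a minimal sketch of a transition table with one guard and one exit. Everything in it is an assumption for illustration: the node names, the `ALLOWED` table, the `govern` helper, and the 12-step default are hypothetical, and a real graph framework would express this differently.

```python
# Hypothetical transition table: which node may follow which.
ALLOWED = {
    "supervisor": {"retriever", "analyst", "writer"},
    "retriever": {"supervisor"},
    "analyst": {"supervisor"},
    "writer": {"critic"},
    "critic": {"supervisor", "done"},
}


def govern(current: str, proposed: str, steps_used: int, max_steps: int = 12) -> str:
    """Return the node the graph actually runs next.

    The model only proposes a next node. The graph applies an exit
    (the step budget) and a guard (the edge must be in ALLOWED).
    """
    if steps_used >= max_steps:
        return "halt"  # exit: budget spent, stop regardless of the proposal
    if proposed not in ALLOWED.get(current, set()):
        return "supervisor"  # guard: illegal hop, hand control back
    return proposed


print(govern("supervisor", "analyst", steps_used=0))  # analyst
print(govern("retriever", "done", steps_used=3))      # supervisor
print(govern("writer", "critic", steps_used=12))      # halt
```

The point of the sketch is that the model never gets to skip the critic or run forever: both outcomes are ruled out by plain code, not by prompt wording.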
The architecture
ASP.NET Core owns orchestration, budgets, and governance; a Python LangGraph service runs the agent graph:
- Supervisor → specialized workers — a cheap supervisor (gpt-4o-mini) routes to a retriever, analyst, writer, and critic
- Model routing — only the analyst uses the expensive model; the rest run on mini
- Typed hand-offs — workers pass structured data (Pydantic + C# records), not prose
- Bounded loops + a critic gate — a hard step budget, and the run can't finish until a critic verifies the answer is grounded
- Parallelism with failure isolation — concurrent retrieval with per-fetch timeouts; one slow tool degrades to best-effort instead of killing the run
- The C# boundary owns the global wall-clock budget, cost cap, and prompt-injection screen
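The "parallelism with failure isolation" item can be sketched with nothing but `asyncio`. This is an illustrative toy, not the production code: `fetch`, `fetch_with_timeout`, `retrieve_all`, the source names, and the 0.2-second timeout are all assumptions, and the simulated `delay` stands in for a real tool call.

```python
import asyncio
from typing import Optional


async def fetch(source: str, delay: float) -> str:
    """Stand-in for a retrieval tool call; `delay` simulates its latency."""
    await asyncio.sleep(delay)
    return f"docs from {source}"


async def fetch_with_timeout(source: str, delay: float, timeout: float) -> Optional[str]:
    """Run one fetch under its own deadline; None means it was dropped."""
    try:
        return await asyncio.wait_for(fetch(source, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # a slow tool costs us one result, not the whole run


async def retrieve_all(sources: dict[str, float], timeout: float = 0.2) -> dict[str, str]:
    """Fetch every source concurrently and keep whatever finished in time."""
    names = list(sources)
    results = await asyncio.gather(
        *(fetch_with_timeout(name, sources[name], timeout) for name in names)
    )
    return {name: text for name, text in zip(names, results) if text is not None}


# Two fast sources and one that is far slower than the per-fetch timeout.
found = asyncio.run(retrieve_all({"wiki": 0.01, "crm": 0.01, "slow_api": 2.0}))
print(sorted(found))  # ['crm', 'wiki']
```

Each fetch carries its own deadline, so the slow source is cancelled and omitted while the other two results still reach the next worker. A global wall-clock budget would sit outside this function, at the orchestration boundary.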
The results
| Metric | Before | After |
|---|---|---|
| p95 latency | 4.2s | 1.8s |
| Cost per query | $0.021 | $0.008 |
| Context tokens / agentic request | ~12,000 | ~3,800 |
| Runaway loops (>12 calls) | 6% | 0% |
| Endpoint 500 rate | 1.4% | 0.2% |
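For readers who prefer relative numbers, the short script below derives the percentage reduction for each row. The before/after values are copied from the table; the `reduction_pct` helper is my own naming, and the token row is only as precise as the approximate figures it starts from.

```python
# (before, after) pairs copied from the results table.
metrics = {
    "p95 latency (s)": (4.2, 1.8),
    "Cost per query ($)": (0.021, 0.008),
    "Context tokens / agentic request": (12_000, 3_800),  # approximate in the source
    "Runaway loops (%)": (6.0, 0.0),
    "Endpoint 500 rate (%)": (1.4, 0.2),
}


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return round((before - after) / before * 100, 1)


reductions = {name: reduction_pct(b, a) for name, (b, a) in metrics.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct}% lower")
```

This works out to roughly 57% lower p95 latency, 62% lower cost per query, 68% fewer context tokens, and an 86% lower 500 rate, with runaway loops eliminated.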
Three habits: no worker without a budget, no hand-off without a type, no done without a verdict.
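The three habits fit in one small loop. The sketch below is a hedged illustration: it uses stdlib dataclasses where the real service uses Pydantic, and `Draft`, `Verdict`, `BudgetExceeded`, `run_until_verified`, and the toy workers are hypothetical names standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Draft:
    """Typed hand-off from writer to critic (illustrative fields)."""
    answer: str
    sources: tuple[str, ...]


@dataclass(frozen=True)
class Verdict:
    """Typed hand-off from critic back to the loop."""
    grounded: bool
    reason: str


class BudgetExceeded(RuntimeError):
    """Raised when the step budget runs out without a passing verdict."""


def run_until_verified(
    write: Callable[[int], Draft],
    critique: Callable[[Draft], Verdict],
    max_steps: int = 4,
) -> Draft:
    """Loop writer -> critic until the critic approves or the budget is spent."""
    for step in range(max_steps):          # habit 1: a hard budget
        draft = write(step)                # habit 2: a typed hand-off
        verdict = critique(draft)
        if verdict.grounded:               # habit 3: no done without a verdict
            return draft
    raise BudgetExceeded(f"no grounded answer within {max_steps} steps")


def toy_writer(step: int) -> Draft:
    # The first attempt cites nothing; later attempts cite one source.
    return Draft(answer=f"attempt {step}", sources=() if step == 0 else ("doc-1",))


def toy_critic(draft: Draft) -> Verdict:
    ok = len(draft.sources) > 0
    return Verdict(grounded=ok, reason="has sources" if ok else "no sources cited")


result = run_until_verified(toy_writer, toy_critic)
print(result.answer)  # attempt 1
```

The loop has exactly two exits: a passing verdict or an exception the caller must handle. There is no path where an unverified draft is returned as the answer.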
Read the full breakdown — with all the C# and Python code — on PrepStack:
https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-3-multi-agent-architecture