
kirandeepjassal-crypto

Posted on • Originally published at prepstack.co.in

Context Engineering for Enterprise AI, Part 3: Multi-Agent Architecture That Survives Production

Originally published on PrepStack.

Most "AI agents" in production are one giant agent with every tool and a 10,000-token prompt. It loops, stalls, and ships confident nonsense. This is Part 3 of my Context Engineering series.

The reframe

A single agent with every tool isn't "one smart system" — it's a state machine with no states, no guards, and no exits. You don't fix that with a better prompt. You fix it with structure: the model decides; the graph governs.
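To make "the model decides; the graph governs" concrete, here is a minimal sketch in plain Python rather than LangGraph. The state names, the transition table, and the 12-step budget are illustrative assumptions, not the article's actual graph. The `decide` callable stands in for the model: it proposes the next state, and the surrounding loop supplies the guard (only legal edges) and the exit (a hard step budget).

```python
from typing import Callable

# Hypothetical states and edges, chosen for illustration only.
# "done" is reachable only from "critique", so every run passes the critic.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "retrieve": {"analyze"},
    "analyze": {"retrieve", "write"},
    "write": {"critique"},
    "critique": {"analyze", "done"},
}


class BudgetExceeded(RuntimeError):
    """Raised when the run uses up its step budget before reaching 'done'."""


def run_graph(decide: Callable[[str], str], start: str = "retrieve",
              max_steps: int = 12) -> list[str]:
    """Let `decide` (the model) propose each move; the graph vets it."""
    state, path = start, [start]
    steps = 0
    while state != "done":
        if steps >= max_steps:  # exit: hard step budget
            raise BudgetExceeded(f"no verdict after {max_steps} steps")
        proposed = decide(state)
        if proposed not in ALLOWED_TRANSITIONS[state]:  # guard: legal edges only
            raise ValueError(f"illegal transition: {state} -> {proposed}")
        state = proposed
        path.append(state)
        steps += 1
    return path
```

A model that bounces between `retrieve` and `analyze` forever ends in `BudgetExceeded` instead of a runaway loop, and a model that tries to jump straight to `done` is rejected by the transition table.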

The architecture

ASP.NET Core owns orchestration, budgets, and governance; a Python LangGraph service runs the agent graph:

  1. Supervisor → specialized workers: a cheap supervisor (gpt-4o-mini) routes to a retriever, analyst, writer, and critic
  2. Model routing: only the analyst uses the expensive model; every other worker runs on gpt-4o-mini
  3. Typed hand-offs: workers pass structured data (Pydantic + C# records), not prose
  4. Bounded loops + a critic gate: every run has a hard step budget, and it can't finish until a critic verifies the answer is grounded
  5. Parallelism with failure isolation: concurrent retrieval with per-fetch timeouts; one slow tool degrades to best-effort instead of killing the run
  6. Governance at the C# boundary: the ASP.NET Core layer owns the global wall-clock budget, cost cap, and prompt-injection screen
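The failure-isolation idea in the list above (concurrent retrieval with per-fetch timeouts) can be sketched with nothing but `asyncio` from the standard library. The function name `gather_best_effort`, the `Fetcher` alias, and the shape of the return value are assumptions made for this example; the real service may structure it differently.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical alias: a fetcher is any zero-argument coroutine function.
Fetcher = Callable[[], Awaitable[str]]


async def gather_best_effort(fetchers: dict[str, Fetcher],
                             timeout_s: float) -> tuple[dict[str, str], list[str]]:
    """Run all fetchers concurrently; return (results, names_that_failed)."""

    async def guarded(name: str, fetch: Fetcher):
        try:
            # Each fetch gets its own deadline, so one slow tool cannot stall the rest.
            return name, await asyncio.wait_for(fetch(), timeout=timeout_s), None
        except Exception as exc:  # timeout or tool error: isolate, do not propagate
            return name, None, exc

    outcomes = await asyncio.gather(*(guarded(n, f) for n, f in fetchers.items()))
    results = {name: value for name, value, err in outcomes if err is None}
    failed = [name for name, _, err in outcomes if err is not None]
    return results, failed
```

Returning the failed names alongside the partial results lets a downstream worker see that the answer is best-effort, rather than treating missing context as if it never existed.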

The results

Metric                            Before    After
p95 latency                       4.2s      1.8s
Cost per query                    $0.021    $0.008
Context tokens / agentic request  ~12,000   ~3,800
Runaway loops (>12 calls)         6%        0%
Endpoint 500 rate                 1.4%      0.2%

Three habits: no worker without a budget, no hand-off without a type, no done without a verdict.
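The last two habits can be sketched together. The article names Pydantic for typed hand-offs; this example swaps in standard-library dataclasses so it runs without dependencies. `AnalystHandoff`, `Verdict`, `critic`, and `finish` are hypothetical names, and the critic here is a deterministic stand-in (it only checks that every cited source was actually retrieved), where a production critic would more likely be a model call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalystHandoff:
    """Hypothetical typed hand-off: a claim plus the sources that back it."""
    claim: str
    source_ids: tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject malformed hand-offs at construction time, not downstream.
        if not self.claim.strip():
            raise ValueError("claim must be non-empty")
        if not self.source_ids:
            raise ValueError("at least one source id is required")


@dataclass(frozen=True)
class Verdict:
    grounded: bool
    reason: str


class UngroundedAnswer(RuntimeError):
    """Raised when something tries to finish without a passing verdict."""


def critic(handoff: AnalystHandoff, retrieved_ids: set[str]) -> Verdict:
    """Stand-in critic: the claim is grounded only if every source was retrieved."""
    missing = [s for s in handoff.source_ids if s not in retrieved_ids]
    if missing:
        return Verdict(False, f"cites sources that were never retrieved: {missing}")
    return Verdict(True, "all cited sources were retrieved")


def finish(handoff: AnalystHandoff, verdict: Verdict) -> str:
    """No done without a verdict: release the claim only if the critic passed it."""
    if not verdict.grounded:
        raise UngroundedAnswer(verdict.reason)
    return handoff.claim
```

Because `finish` takes a `Verdict` as a required argument, there is no code path that returns an answer without a critic having looked at it.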

Read the full breakdown — with all the C# and Python code — on PrepStack:
https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-3-multi-agent-architecture
