Edward Burton

Stop Building God Agents

We are building AI systems wrong.

Not slightly wrong. Fundamentally, structurally, catastrophically wrong.

The pattern is always the same. A team discovers the magic of a Large Language Model. They wrap it in a Python script. They give it access to the database, the API gateway, and the customer support logs. They dump three gigabytes of documentation into the context window because "1 million tokens" sounds like infinite storage.

They call it an "Agent."

In reality, they have built a God Agent. A monolithic, omniscient, undifferentiated blob of logic that tries to be the CEO, the janitor, and the database administrator simultaneously.

And it fails.

It hallucinates. It gets confused. It costs a fortune in token usage. The latency creeps up until the user experience feels like waiting for a dial-up connection in 1999. When it breaks (and it always breaks), the engineers cannot debug it because the logic isn't in the code. It’s in a probabilistic haze of prompt engineering and context pollution.

I have spent the last year tearing these systems apart. The solution isn't a better prompt. It isn't a bigger model.

The solution is architecture.

I've written a comprehensive deep-dive with full architectural diagrams, but here is the practical guide to stopping the madness.

The Context Window Myth

We have been sold a lie.

The lie is that if you give a model enough context, it can solve any problem. Vendors push "infinite context" as the ultimate feature. 128k. 1 million. 2 million tokens. The implication is seductive. Don't worry about architecture. Don't worry about data curation. Just dump it all in. The model will figure it out.

This has led to the rise of the God Agent paradigm.

In this worldview, an "Agent" is a singular entity. It holds the entire state of the application. It has access to every tool in the library. When a user asks a question, the God Agent receives the query, looks at its massive context (which contains the entire history of the universe), and attempts to reason its way to an answer.

It feels like progress. It looks like the sci-fi dream of a singular, conscious AI.

But in production, this is a nightmare.

This approach conflates three distinct concepts:

  1. Orchestration (Planning)
  2. Capabilities (Skills)
  3. Execution (Tools)

By mashing these into one layer, we create a system with zero separation of concerns. We assume the model can discern the signal from the noise. But as we stuff more data into the context window, we degrade the model's ability to reason. We introduce "context pollution."

We are effectively asking a junior developer to memorize the entire codebase, the company handbook, and the legal archives, and then asking them to fix a CSS bug in 30 seconds.

They won't fix the bug. They'll have a panic attack.

The Cracks in the Monolith

The cracks in the God Agent architecture are already visible to anyone pushing code to production.

Context Pollution and The Needle in the Haystack
The more information you provide, the less attention the model pays to the critical bits. This is not just a feeling. It is an architectural flaw. Research shows that models struggle to retrieve information from the middle of long contexts. By failing to curate, we actively harm performance. We create systems where the "noise" of irrelevant documentation overpowers the "signal" of the user's specific intent.

Latency and Cost
Every token costs money. Every token takes time to process. A God Agent that re-reads a 50k token context for every turn of conversation is burning cash. It is computationally wasteful. We are running a supercomputer to answer "yes" or "no" because we didn't bother to filter the inputs.
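
Run the numbers with illustrative pricing: at $3 per million input tokens, re-sending a 50k-token context on every turn of a 20-turn conversation is a million input tokens, roughly $3 for a single conversation before a single output token is generated. Multiply that by a few thousand conversations a day and the invoice explains itself.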

The Debugging Black Hole
When a God Agent fails, why did it fail? Was it the prompt? The retrieval step? The tool output? Or did it just get distracted by an irrelevant piece of text from page 405 of the documentation? You cannot unit test a prompt that changes its behaviour based on the variable soup of a massive context window.

The Governance Void
A single agent with access to everything is a security nightmare. If the prompt injection works, the attacker owns the castle. There are no bulkheads. There is no "zero trust" because the architecture relies on maximum trust in a probabilistic model.

The Solution: The Agentic Mesh

The path forward is Aggressive Context Curation and the Agentic Mesh.

We must shatter the God Agent. We must replace it with a network of small, specialized, highly constrained agents that communicate via standardized protocols.

In a mesh architecture, no single agent knows everything.

  • The Router Agent knows how to classify intent.
  • The Support Agent knows the return policy.
  • The Coding Agent knows Python.
  • The SQL Agent knows the database schema.

They do not share a context window. They share messages.

This is the shift from a monolith to microservices. It is the only way to scale complexity. When the Support Agent is working, it doesn't need to know the database schema. It doesn't need the Python libraries. Its context is pristine. It is curated.
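
To make "share messages" concrete, here is a minimal sketch of the kind of envelope that crosses the boundary between two agents. The field names are my own illustration, not a standard.

from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    # The entire surface area shared between two agents: no conversation history,
    # no global state, just the task at hand and the fields the recipient needs.
    sender: str                                    # e.g. "router"
    recipient: str                                 # e.g. "support_agent"
    task: str                                      # e.g. "lookup_return_policy"
    payload: dict = field(default_factory=dict)    # only what the recipient's schema requires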

Defining Terms

This requires a rigorous definition of terms, something developers often skip in the rush to ship.

  1. The Agent: The Orchestrator. It plans. It reasons. It holds the goal.
  2. The Skill: The "How." A specific workflow or capability (e.g., "Draft Email").
  3. The Tool: The "What." The external resource (e.g., Gmail API, Google Search).

By separating these, we gain control. A Skill can be optimised. A Tool can be sandboxed. An Agent can be monitored.
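
A minimal sketch of that separation, assuming nothing beyond plain Python; the class names are my own, not any framework's API.

class Tool:
    """The 'What': a raw external resource (e.g. a wrapped search or email API)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def call(self, **kwargs):
        return self.fn(**kwargs)


class Skill:
    """The 'How': a constrained workflow that knows which tool it needs and how to use it."""
    def __init__(self, name, tool):
        self.name, self.tool = name, tool

    def run(self, params):
        # A real skill would involve a model call and validation; this sketch just invokes its tool.
        return self.tool.call(**params)


class Agent:
    """The Orchestrator: holds the goal, picks a skill, delegates. It never touches tools directly."""
    def __init__(self, skills):
        self.skills = {skill.name: skill for skill in skills}

    def handle(self, intent, params):
        skill = self.skills.get(intent)
        if skill is None:
            return "No skill registered for this intent."
        return skill.run(params)

Each lever the God Agent collapses into one now has its own seam: monitor Agent.handle, optimise Skill.run, sandbox Tool.call.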

Enter Google's ADK and A2A

I have been skeptical of big tech frameworks. They usually add bloat. They usually try to lock you into a vendor ecosystem that you will regret three years later.

But Google's Agent Development Kit (ADK) and the Agent-to-Agent (A2A) protocol are different. They are trying to solve the plumbing problem.

Google has realised that if we want agents to work, they need to talk to each other like software. Not like chatbots.

The ADK: Structured Context Management

The ADK forces you to think about context as a tiered system, not a bucket. It introduces concepts like Session, State, and Memory.

  • Session: The immediate conversation.
  • State: Variables that change (e.g., user_authenticated = true).
  • Memory: Long-term storage.

Crucially, it supports Artifacts. An artifact is a discrete piece of content, such as a code snippet or a document summary, that lives outside the conversational flow but is referenced by it. This keeps the context window clean. The agent doesn't "read" the whole file every time. It references the artifact ID.
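
As a mental model (and only that: the dictionaries below are my own illustration, not the ADK's actual classes), the tiering looks roughly like this:

context = {
    "session": ["user: where is my refund?", "agent: checking your order now"],        # the immediate conversation
    "state": {"user_authenticated": True, "open_invoice": "INV-1042"},                 # variables that change
    "memory": {"preferred_language": "en", "past_tickets": "summary stored elsewhere"},  # long-term storage
    "artifacts": {"refund_policy": "artifact://policies/refunds/v3"},                  # referenced by ID, never inlined
}

def build_prompt(context, task):
    # Only the recent turns and the state the task actually needs get serialised into the window.
    # Artifacts stay outside; a skill requests their content by ID only if it must.
    return {
        "task": task,
        "conversation": context["session"][-5:],
        "state": {"user_authenticated": context["state"]["user_authenticated"]},
        "artifact_refs": list(context["artifacts"].keys()),
    }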

The A2A Protocol: The TCP/IP of Agents

This is the game changer. The A2A protocol is a vendor-neutral standard for agents to discover and talk to each other.

It uses "Agent Cards": standardised JSON metadata that describes what an agent can do.

Think of it like this:

{
  "agent_id": "billing_specialist_v1",
  "capabilities": ["process_refund", "check_invoice_status"],
  "input_schema": {
    "invoice_id": "string",
    "customer_id": "string"
  },
  "output_schema": {
    "status": "string",
    "refund_amount": "float"
  }
}

When a Router Agent needs to process a refund, it doesn't try to hallucinate the API call. It looks up the billing_specialist, handshakes via A2A, passes the structured payload, and waits for a structured response.

This is standardisation.

It allows us to build an Agentic Mesh where agents from different teams, or even different companies, can collaborate.

This solves the "isolated islands" problem. Today, an agent built on OpenAI's stack has no native way to talk to one built on Vertex AI. With A2A, they share a protocol. They negotiate.

A Speculative Router Workflow
Here is how I visualise the mental model of a Router Agent in this mesh. This isn't production code, but a representation of the logic flow.
def handle_user_request(request):
    # Step 1: Classify Intent (The Router's only job)
    intent = classification_model.predict(request)

    # Step 2: Discovery
    # Find an agent card that matches the intent
    target_agent = registry.find_agent(capability=intent)

    if not target_agent:
        return "I don't know how to do that yet."

    # Step 3: Handshake and Context Curation
    # We do NOT send the full history. We send only what is needed.
    payload = {
        "task": intent,
        "parameters": extract_params(request, target_agent.input_schema)
    }

    # Step 4: A2A Execution
    response = a2a_protocol.send(target_agent.id, payload)

    return response

Implications for Builders

Adopting a mesh architecture changes everything about how we build.

1. Observability is Mandatory

You cannot grep the logs of a probabilistic mesh. Traditional observability (logs, metrics, traces) is insufficient.

We need Agentic Observability. We need to see the reasoning chain. Why did the Router hand off to the Billing Agent? Why did the Billing Agent reject the request? We need to trace the cost and latency per node in the mesh.

If you don't have this, you aren't building a system. You're building a casino.
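
Concretely, the minimum I want recorded for every hop in the mesh is something like the record below. A sketch; the field names are mine, not a standard schema.

from dataclasses import dataclass

@dataclass
class AgentSpan:
    trace_id: str          # ties every hop of one user request together
    agent_id: str          # which node in the mesh did the work
    parent_agent_id: str   # who handed the task off, and therefore why this node was invoked
    intent: str            # what the caller asked for
    decision: str          # the agent's own summary of why it accepted, rejected, or delegated
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float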

2. Zero Trust Security

In a God Agent model, security is a binary switch. In a mesh, we can apply Zero Trust.

The Billing Agent does not trust the Router Agent implicitly. It verifies the payload. It checks the policy. It limits the blast radius. If the Router is compromised via prompt injection, it cannot simply force the Billing Agent to drain the bank account. The interface is strict. The schema is validated.
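
Here is a sketch of what "the interface is strict" looks like at the receiving end. The schema, caller list, and limits are illustrative, not a real billing system.

MAX_AUTO_REFUND = 100.00                     # a hard limit no upstream agent can talk its way around
AUTHORISED_CALLERS = {"router_v1"}           # verified identities, not assumed ones
INVOICE_AMOUNTS = {"INV-1042": 42.50}        # stand-in for a real invoice lookup

def handle_refund(payload: dict, caller_id: str) -> dict:
    # 1. Schema validation: reject anything malformed, whoever sent it.
    for required in ("invoice_id", "customer_id"):
        if not isinstance(payload.get(required), str):
            return {"status": "rejected", "reason": f"invalid or missing field: {required}"}

    # 2. Caller verification: the Router's word is not enough.
    if caller_id not in AUTHORISED_CALLERS:
        return {"status": "rejected", "reason": "caller not authorised for refunds"}

    # 3. Blast radius: policy limits enforced here, outside any prompt.
    amount = INVOICE_AMOUNTS.get(payload["invoice_id"])
    if amount is None:
        return {"status": "rejected", "reason": "unknown invoice"}
    if amount > MAX_AUTO_REFUND:
        return {"status": "escalated", "reason": "amount exceeds automatic refund limit"}

    return {"status": "approved", "refund_amount": amount}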

3. The End of "Prompt Engineering"

Prompt engineering as a standalone discipline is dying.

It is being replaced by System Engineering. The prompt is just a function configuration. The real work is in the routing logic, the schema definition, and the context curation strategy.

4. Aggressive Context Curation

We must become ruthless editors. The goal is not to fill the context window. The goal is to empty it.

We need to compress. We need to summarise. We need to inject only what is needed for the next immediate step. If an agent is tasked with writing SQL, it needs the schema. It does not need the company mission statement.

(Sounds obvious. Yet I see it ignored in 90% of codebases.)
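
Curation can be as blunt as an allow-list per task. A sketch, with made-up document names:

CONTEXT_REGISTRY = {
    "write_sql": ["db_schema"],                                  # the schema, nothing else
    "draft_support_reply": ["return_policy", "ticket_summary"],
}

DOCUMENTS = {
    "db_schema": "orders(id, customer_id, total, created_at); customers(id, name, email)",
    "return_policy": "Items may be returned within 30 days of delivery ...",
    "ticket_summary": "Customer reports a refund that never arrived ...",
    "company_mission": "Our mission is to delight customers ...",   # never injected automatically
}

def curate_context(task: str) -> str:
    # Inject only what the next immediate step needs; everything else stays out of the window.
    keys = CONTEXT_REGISTRY.get(task, [])
    return "\n\n".join(DOCUMENTS[key] for key in keys)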

Warning: Complexity Transfer
Moving to a mesh architecture doesn't remove complexity. It moves it from the Prompt to the Architecture. This makes it harder to build initially, but easier to maintain. Choose your poison.

The Bigger Picture

The honeymoon phase of Generative AI is over. The demos were fun. The chat interfaces were magical.

But now we have to do the actual work.

We have to build systems that are reliable, secure, and cost-effective. We have to treat AI models not as mystical oracles, but as components in a software architecture. Components that have limitations. Components that need clear interfaces.

The God Agent is a dead end.
