Mike

Agents Lie to Each Other — Unless You Put a Translator in the Middle

[Cover image: orchestrator duck translating between agent cubicles]

Here's a failure mode nobody warns you about.

Your crash tracker identifies a regression. Solid analysis, reasonable conclusions, 72% confidence. You forward the findings to the telemetry analyzer: "here's what the crash tracker found, correlate it with latency data." The telemetry analyzer reads the crash tracker's reasoning, inherits its framing, and returns a conclusion that builds on that framing. You forward that to the anomaly detector. By the time you're three agents deep, you have a confident, coherent, actionable finding — built entirely on the crash tracker's original 0.72 confidence estimate, which has been laundered through two more models into something that reads like a fact.

Nobody lied. Every agent reasoned correctly from what it was given. And that's exactly the problem — it just looks like lying after three hops. The bug is in the channel.

The Telephone Game, But Every Player Has a PhD

If you've ever played the telephone game, you know the problem: each retelling introduces subtle drift. With kids at a birthday party, this is funny. With autonomous systems making production decisions, this is how you ship a "fix" for a problem that doesn't exist — and spend $0.40 in tokens having three agents confidently agree on a mistake.

Here's why it happens. LLMs reason from context. When the telemetry analyzer receives the crash tracker's full output — including the phrase "race condition during token refresh" — it starts looking for patterns that confirm that framing. Not because it's biased. Because that's what language models do: if you put "race condition" in the context, the model will find evidence of race conditions. "Likely race condition, moderate confidence" becomes "the race condition identified by the crash analysis." The hedge evaporates. The uncertainty gets stripped at each hop, and downstream agents treat increasingly confident conclusions as ground truth.

This is what I call reasoning chain contamination. No single agent made a mistake. The error is structural — it lives in how information moves between agents, not in how any individual agent reasons.

The related failure mode is context bleed. When the crash tracker's full output — including its reasoning about memory allocation patterns, its speculation about Android 13 edge cases, its reference to a loosely matching CVE — ends up in the telemetry analyzer's context, the telemetry analyzer starts reasoning about memory and Android 13 even though its job is network latency. The crash tracker's concerns become the telemetry analyzer's priors. Not because anyone passed bad data. Because you gave a reasoning engine someone else's reasoning, and it did what reasoning engines do.

The Anti-Patterns That Feel Right Until They Don't

Three patterns that seem obvious in a multi-agent system. All three will burn you.

"Just let agents message each other." This feels like microservices. Agent A sends a message to Agent B, Agent B processes it, life goes on. But microservices don't reason — they transform. A microservice that receives {status: "error"} doesn't speculate about what caused the error. An LLM agent does. Worse, LLMs are trained to be helpful — which in practice means they're biased toward agreeing with whatever context they're given. When Agent A's output becomes Agent B's input, Agent B doesn't just treat it as context — it's predisposed to confirm it. It has no way to know which parts are facts, which are inferences, and which are hedges that the model expressed confidently because that's how LLMs write.

"Give all agents access to the same context." Feels elegant — one shared state, everyone's in sync. In practice, you get a bloated context window where every agent inherits every other agent's priors. The crash tracker's memory-leak hypothesis lives alongside the analytics agent's revenue anomaly and the channel scanner's sentiment shift. Each agent reads all of it, reasons from all of it, and produces outputs contaminated by domains that aren't its job. You've built one general-purpose agent wearing a trench coat pretending to be a team.

"Just pass the full output." The most common, and the most damaging. "Here's everything the crash tracker returned, do something useful with it." The receiving agent gets a 2,000-token narrative full of "likely," "suggests," guesses, and facts — all formatted identically. It can't tell which is which. It will treat all of it as input. It will reason against all of it. And its own output will be a confident synthesis of someone else's uncertainty.

Here's what the broken flow looks like:

┌─────────────────────┐
│    Crash Tracker    │
│    confidence: 72%  │
└────────┬────────────┘
         │ full narrative output
         │ (facts + guesses + "likely")
         ▼
┌─────────────────────┐
│ Telemetry Analyzer  │
│ (inherits crash     │
│  tracker's priors)  │
└────────┬────────────┘
         │ confident synthesis
         │ (uncertainty erased)
         ▼
┌─────────────────────┐
│  Anomaly Detector   │
│                     │
│    "confirmed."     │
└─────────────────────┘

By the third agent, the 72% is gone. You have a "confirmed" finding.

The Orchestrator Does Not Forward Mail. It Rewrites It

The core principle: the orchestrator never passes raw agent output to another agent. It translates.

Agents are prompted to output structured JSON conforming to their domain schema — and they self-validate before returning. When the crash tracker finishes its analysis, the output goes to the orchestrator, which validates it again against a JSON Schema. If required fields are missing or types don't match, it rejects the output and reprompts. Defense in depth: the agent tries to get it right, the orchestrator makes sure. No LLM in the translation path. The orchestrator extracts the structured fields, drops the prose, and creates a new handoff packet — not the crash tracker's words, but a normalized representation of its findings. The raw output is stored for audit and debugging, but it never enters another agent's context.
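The validate-and-reprompt loop described above can be sketched in a few lines. This is a minimal illustration, not the article's actual implementation: the required-field table and the `call_agent` callable are assumptions, and a production version would validate against a full JSON Schema rather than a hand-rolled type check.

```python
# Illustrative sketch of the orchestrator's validate-and-reprompt loop.
# REQUIRED_FIELDS and call_agent are hypothetical names, not from the article.

REQUIRED_FIELDS = {
    "pattern_type": str,
    "affected_component": str,
    "confidence": float,
    "incident_count": int,
    "timestamp_range": dict,
}

def validate(output: dict) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors

def run_agent_with_validation(call_agent, max_retries: int = 2) -> dict:
    """Call the agent, validate its JSON, and reprompt with the errors on failure."""
    feedback = None
    for _ in range(max_retries + 1):
        output = call_agent(feedback)  # feedback is None on the first attempt
        errors = validate(output)
        if not errors:
            return output
        feedback = "Your output failed validation: " + "; ".join(errors)
    raise ValueError(f"agent output still invalid after retries: {errors}")
```

The point of the loop is that a malformed payload never silently continues downstream: either the agent fixes it under reprompting, or the orchestrator raises and a human gets involved.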

The telemetry analyzer receives this packet. It doesn't see the crash tracker's "suggests" and "possibly" — just structured parameters: a component name, a time range, a platform filter, and a signal strength. That's it. The telemetry analyzer does its job — correlate with latency data — without inheriting anyone else's priors.

This is the orchestrator-as-translator pattern. Agents speak their domain language. The orchestrator speaks the common language. No agent ever reads another agent's prose.

┌─────────────────┐  structured JSON   ┌─────────────────────┐
│  Crash Tracker  │  (typed fields +   │    Orchestrator     │
│                 │   reasoning field) │                     │
│                 │ ──────────────────→│  • Validates schema │
└─────────────────┘                    │  • Drops reasoning  │
                                       │  • Maps fields      │
                                       │  • Creates handoff  │
                                       │    packet           │
                                       └────────┬────────────┘
                                                │
                                                │ handoff packet
                                                │ (typed fields
                                                │  only)
                                                ▼
                                       ┌──────────────────────┐
                                       │  Telemetry Analyzer  │
                                       │                      │
                                       │  (sees only typed    │
                                       │   fields — no prose, │
                                       │   no hypotheses)     │
                                       └──────────────────────┘

The key insight: the orchestrator is lossy on purpose. It drops the reasoning field — the prose where "suggests," "possibly," and "loosely matches" live. That prose is the pathogen in cross-agent communication. What survives are typed fields and a signal-strength category (high, moderate, low) instead of the model's prose explanation of why it's moderate. The downstream agent receives what it needs to do its job. Nothing more.

Here's What the Schema Actually Looks Like

The crash tracker returns its analysis to the orchestrator. Here's what the orchestrator sees:

{
  "pattern_type": "crash_regression",
  "affected_component": "session_manager",
  "confidence": 0.72,
  "incident_count": 47,
  "platform": "android_13",
  "trigger": "network_transition",
  "timestamp_range": {
    "start": "2026-03-04T08:00:00Z",
    "end": "2026-03-04T12:00:00Z"
  },
  "reasoning": "The crash pattern in session_manager.dart suggests a race condition during token refresh. Stack trace signature loosely matches CVE-20XX-XXXXX but not confirmed. Possible memory pressure as contributing factor.",
  "recommended_action": "investigate token refresh lifecycle"
}

The agent outputs structured JSON with typed fields and a reasoning field containing its prose analysis. Both exist in the same payload — the structured fields are machine-readable facts, the reasoning is the agent's narrative interpretation.

Here's what the orchestrator sends to the telemetry analyzer:

{
  "schema": "cross_agent_v1",
  "pattern_type": "crash_regression",
  "affected_component": "session_manager",
  "timestamp_range": {
    "start": "2026-03-04T08:00:00Z",
    "end": "2026-03-04T12:00:00Z"
  },
  "signal_strength": "moderate",
  "request": "correlate_with_latency",
  "context": {
    "platform_filter": "android_13",
    "event_filter": "network_transition",
    "incident_count": 47
  }
}

Notice what's missing. The reasoning field — the race condition hypothesis, the CVE reference, the memory pressure speculation — is gone. Those are the crash tracker's interpretations, not facts. The telemetry analyzer gets the structured fields: a component, a time range, a platform, and a count. It does its own analysis from clean inputs.

signal_strength: "moderate" tells the downstream agent how much weight to give this signal. But it's a structured field, not a paragraph of reasoning. The telemetry analyzer can't latch onto the crash tracker's explanation of why it's moderate and start building on it.
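The float-to-category transform is deliberately boring: a threshold function, no LLM involved. A minimal sketch, with the caveat that the cutoffs below are my assumption — the article never pins down where "moderate" ends and "high" begins:

```python
def to_signal_strength(confidence: float) -> str:
    """Collapse a 0-1 confidence float into a coarse category.
    The 0.85 / 0.5 thresholds are illustrative, not from the article."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.85:
        return "high"
    if confidence >= 0.5:
        return "moderate"
    return "low"
```

The lossiness is the feature: 0.72 becomes "moderate", and the downstream agent has nothing to over-fit on.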

The schema version (cross_agent_v1) makes this contract explicit and testable. When you need to add a field, you version the schema. When a downstream agent breaks, you diff the schemas. It's API design, not vibes.

The orchestrator's role definitions make the translation rules explicit:

orchestrator:
  handoff_schemas:
    - crash_regression
    - performance_degradation
    - anomaly_alert
    - informational_summary
  schema_validation: strict
  strip_fields:
    - reasoning
    - recommended_action
  preserve_fields:
    - pattern_type
    - affected_component
    - incident_count
    - platform
    - trigger
    - timestamp_range
  transform_fields:
    confidence: signal_strength # 0-1 float → high/moderate/low

The orchestrator doesn't interpret — it extracts. Because agents already output structured JSON, the translation is deterministic: validate the schema, map the fields, drop everything else. strip_fields and preserve_fields are rules in config, not judgment calls. This is the same structural enforcement principle from the sidecar proxy pattern — security through architecture, not through hoping the system makes good choices.
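As a rough sketch, the translation step driven by that config is a pure function over the agent's JSON. Assumptions here: the packet is flat rather than nesting a `context` object as in the earlier example, and the confidence thresholds are invented for illustration.

```python
# Deterministic field mapping mirroring the strip/preserve/transform config.
# Flat packet layout and thresholds are illustrative choices.

STRIP_FIELDS = {"reasoning", "recommended_action"}
PRESERVE_FIELDS = {"pattern_type", "affected_component", "incident_count",
                   "platform", "trigger", "timestamp_range"}

def translate(agent_output: dict) -> dict:
    """Build a handoff packet: keep typed fields, drop prose, map confidence."""
    packet = {"schema": "cross_agent_v1"}
    # preserve_fields: copy only the whitelisted typed fields
    for field in PRESERVE_FIELDS & agent_output.keys():
        packet[field] = agent_output[field]
    # transform_fields: confidence (0-1 float) -> signal_strength category
    conf = agent_output.get("confidence", 0.0)
    packet["signal_strength"] = (
        "high" if conf >= 0.85 else "moderate" if conf >= 0.5 else "low")
    # strip_fields never get copied: anything not preserved is dropped
    assert STRIP_FIELDS.isdisjoint(packet)
    return packet
```

No branch in that function calls a model. That's what makes the translation auditable: same input, same packet, every time.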

When the Schema Isn't Enough

The structured summary packet solves 90% of cross-agent communication. Here's the other 10%.

The schema is too narrow. The crash tracker identifies something genuinely novel — a failure pattern that doesn't fit any existing pattern_type. The orchestrator has no schema for "I've never seen this before." Two options: route it to a human-review queue for classification before it continues downstream, or use a general-purpose informational_summary schema that flags it as unclassified. Either way, a human looks at it before it becomes another agent's input.
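That fallback routing can be sketched as a guard in front of the normal handoff path. The function and queue names below are hypothetical, chosen to match the article's schema names:

```python
# Illustrative guard: unknown pattern types go to humans, not to agents.
KNOWN_SCHEMAS = {"crash_regression", "performance_degradation", "anomaly_alert"}

def route_handoff(output: dict) -> tuple[str, dict]:
    """Return (destination, packet) for a validated agent output."""
    if output.get("pattern_type") in KNOWN_SCHEMAS:
        return "downstream", output
    # Option two from the text: wrap it as an unclassified informational summary
    summary = {
        "schema": "informational_summary",
        "unclassified": True,
        "source_pattern": output.get("pattern_type", "unknown"),
    }
    return "human_review_queue", summary
```

Either branch keeps the invariant intact: nothing unclassified ever lands in another agent's context.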

The schema is too lossy. Sometimes the downstream agent legitimately needs more context. The telemetry analyzer might need to know whether the crash confidence was 72% or 95% — the exact number, not just "moderate." The fix isn't to pass the reasoning. It's to add a structured field: "confidence_score": 0.72. Expose the data as a typed field, not as prose. The moment you pass prose reasoning between agents, you've reopened the contamination channel.

The handoff is genuinely ambiguous. Some cross-domain questions don't have a clean schema. The crash tracker found something that might be a security issue, or might be a performance regression, or might be user error. Three agents could reasonably claim it. Routing this to a Slack channel for human triage before it enters any agent's context is not a failure of the system — it's the correct design. The instinct to automate every handoff is how you get automated hallucination chains.

A schema that forces you to think about what you're actually communicating is doing its job — even when it's inconvenient.

Where This Came From

This article exists because of a comment thread.

When I published Part 1 of this series, Matthew Hou asked the question I'd been avoiding: "how do you handle the cases where an agent's one job requires context from another agent's domain?" It's the gap in the one-agent-one-job architecture that everyone notices and nobody writes about.

signalstack nailed the answer in a comment: "the orchestrator never passes raw agent output directly to another agent. It sends structured summary packets — a defined schema that strips the crash tracker's output down to just [pattern_type, affected_endpoint, timestamp_range]." They articulated the core pattern better than most architecture docs I've read. This article is the longer treatment of that insight.

The best design discussions happen in comment threads. This was one of them.

The Skeptical Translator

When you force all cross-agent communication through a schema, you force yourself to answer a question that most systems never ask: what do I actually need to communicate?

Not "what did the agent say?" Not "what might be useful?" But specifically: what structured facts does the next agent need to do its job? That constraint is uncomfortable. It means you can't just wire agents together and hope for the best. You have to design every handoff. You have to decide what crosses the boundary and what doesn't.

That's the point.

Agents are opinionated reasoners. Given context, they will reason from it — confidently, thoroughly, and without distinguishing between facts and inherited assumptions. The orchestrator's job is to be a skeptical translator, not a faithful messenger. Faithful messengers amplify errors. Skeptical translators catch them.

The orchestrator isn't a router. It's an editor — and good editors make things shorter, not longer.


This article grew out of the comment discussion on I Run a Fleet of AI Agents in Production. For the full architecture, see Part 1 (architecture, container isolation, tiered LLMs) and Part 2 (security, JIT tokens, self-healing workflows).

When the schema isn't enough and you need multi-LLM arbitration — council discussions, structured voting, adversarial debate — those patterns are open-source in mcp-rubber-duck.
