DEV Community

Koki Riho
Why AI Agent Outputs Need Adversarial Review (and How to Add It in One API Call)

The Problem: Agents Grading Their Own Homework

If you’re running LLM agents in production — whether with LangChain, CrewAI, or custom pipelines — you’ve probably built some kind of output validation. Maybe a second LLM call checks the first one’s work. Maybe you parse for structural issues.

Here’s what I kept finding: LLM-based self-review has a systematic leniency bias. When you prompt an LLM to review output from another LLM (or itself), it overwhelmingly approves. The reviewer and generator share similar blind spots — they fail in correlated ways.

This matters when your agent writes code that gets deployed, generates customer-facing content, or makes decisions affecting downstream systems.

The Approach: Adversarial Review with Dual Consensus

AgentDesk provides two interfaces for adding adversarial review:

  • MCP Server (open source, MIT) — review-only. Pass in any content, get structured quality feedback. Runs locally with your own API key.
  • Hosted REST API — generate + review + auto-fix in one call. Submit a prompt, get reviewed output back.

Both use the same core approach:

  1. Two independent reviewers evaluate the output, each prompted adversarially (their job is to find problems, not confirm quality)
  2. Dual consensus — both must agree on pass/fail
  3. Substantive quality validation — a deterministic (non-LLM) layer that requires every checklist item to include specific evidence quoted from the output. Missing citations → automatic fail.
  4. Scored 0-100 with structured feedback on every issue found
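The dual-consensus rule (steps 2 and 4) is simple to sketch. Here is a minimal illustration; the `Review` dataclass is a hypothetical shape for this example, not AgentDesk's actual internal type:

```python
from dataclasses import dataclass

@dataclass
class Review:
    # Hypothetical shape for illustration only.
    verdict: str  # "PASS" or "FAIL"
    score: int    # 0-100

def merge(a: Review, b: Review) -> Review:
    # Dual consensus: both reviewers must pass, so either one can veto.
    verdict = "PASS" if a.verdict == "PASS" and b.verdict == "PASS" else "FAIL"
    # Scores are averaged regardless of the combined verdict.
    return Review(verdict=verdict, score=(a.score + b.score) // 2)

merged = merge(Review("PASS", 85), Review("FAIL", 40))
print(merged.verdict, merged.score)  # FAIL 62
```

The point of the veto is that a single lenient reviewer can't rescue a bad output; a pass requires both adversarial reviewers to independently find nothing disqualifying.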

It’s BYOK — you supply your own LLM API key. AgentDesk handles orchestration only.

When to Use This

  • CI pipeline for generated code — gate merges on review score
  • Content QA for chatbot outputs — catch hallucinations before they reach users
  • Data extraction validation — verify structured output completeness
  • Multi-agent workflow checkpoints — validate intermediate outputs between agents
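For the CI use case, gating can be as simple as turning the review result into an exit code. A hedged sketch — the threshold and the `gate` helper are my own, assuming the review JSON has already been fetched:

```python
import json

MIN_SCORE = 70  # hypothetical threshold; tune per project

def gate(review_json: str) -> int:
    # Return a CI exit code: 0 to allow the merge, 1 to block it.
    review = json.loads(review_json)
    critical = [i for i in review.get("issues", []) if i["severity"] == "critical"]
    if review["score"] < MIN_SCORE or critical:
        print(f"gate failed: score={review['score']}, critical={len(critical)}")
        return 1
    print(f"gate passed: score={review['score']}")
    return 0

# Example: a passing review with no critical issues.
print(gate('{"score": 82, "issues": []}'))  # 0
```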

Option 1: MCP Server (Review Only)

Install in one command and use from Claude Code, Claude Desktop, or any MCP client:

```shell
claude mcp add agentdesk-mcp -- npx @ezark-publish/agentdesk-mcp
```

The review_output tool takes content you’ve already generated and reviews it:

```json
{
  "output": "Your agent generated content goes here...",
  "review_type": "code",
  "criteria": "Check for security vulnerabilities and race conditions"
}
```

Response:

```json
{
  "verdict": "FAIL",
  "score": 38,
  "issues": [
    {
      "severity": "critical",
      "category": "concurrency",
      "description": "Race condition in refill logic under concurrent access",
      "suggestion": "Use atomic CAS operation or mutex lock"
    }
  ],
  "checklist": [
    {
      "item": "Thread safety",
      "status": "fail",
      "evidence": "lines 23-31: non-atomic read-modify-write on token count"
    }
  ],
  "summary": "Critical concurrency issues found. Not safe for production use."
}
```
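Because the response is structured, it's easy to act on programmatically. A small sketch that pulls out every failed checklist item with its cited evidence (the second checklist entry here is made up for the example):

```python
def failed_items(review: dict) -> list[tuple[str, str]]:
    # Collect (item, evidence) pairs for every failed checklist entry.
    return [(c["item"], c["evidence"])
            for c in review.get("checklist", [])
            if c["status"] == "fail"]

review = {
    "checklist": [
        {"item": "Thread safety", "status": "fail",
         "evidence": "lines 23-31: non-atomic read-modify-write on token count"},
        {"item": "Input validation", "status": "pass",
         "evidence": "line 4: isinstance check on rate argument"},
    ],
}
for item, evidence in failed_items(review):
    print(f"{item}: {evidence}")
```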

For higher confidence, review_dual runs two independent reviewers and merges their verdicts. Either reviewer can veto a pass.

Option 2: REST API (Generate + Review + Auto-Fix)

For non-MCP integration, the hosted API generates output, reviews it, and auto-fixes (up to 2 iterations) in a single call:

```shell
curl -X POST https://agentdesk.usedevtools.com/api/v1/tasks \
  -H "Authorization: Bearer agd_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a token bucket rate limiter in Python",
    "api_key": "sk-ant-your-anthropic-key",
    "review": true,
    "review_type": "code",
    "dual_review": true
  }'
```
The same call from Python, polling until the task completes:

```python
import time

import requests

BASE = "https://agentdesk.usedevtools.com/api/v1/tasks"
HEADERS = {"Authorization": "Bearer agd_your_key"}

# Submit the task: generate, review, and auto-fix in one request.
resp = requests.post(
    BASE,
    headers=HEADERS,
    json={
        "prompt": "Write a token bucket rate limiter in Python",
        "api_key": "sk-ant-your-anthropic-key",
        "review": True,
        "review_type": "code",
        "dual_review": True,
    },
)
task = resp.json()

# Poll until the task finishes.
while True:
    result = requests.get(f"{BASE}/{task['id']}", headers=HEADERS).json()
    if result["status"] in ("completed", "failed"):
        break
    time.sleep(2)

print(result["review"]["verdict"])  # PASS / FAIL / CONDITIONAL_PASS
print(result["review"]["score"])    # 0-100
```

How It Works Internally

  1. Reviewer A gets the output with an adversarial system prompt — find every flaw: factual errors, logical gaps, missing requirements.
  2. Reviewer B independently evaluates from a different angle — completeness, edge cases, whether the output actually addresses the task.
  3. Substantive quality check (deterministic, not LLM-based): every checklist item must include a specific quote from the output as evidence. Missing citations → item force-failed. If >30% of items lack evidence, entire review capped at score 50.
  4. Consensus engine combines reviews. Both must pass. Scores averaged. Disagreements flagged.

Each reviewer gets a fresh API call with only the output to review and a distinct system prompt — no shared conversation history.
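The deterministic evidence check in step 3 can be sketched in a few lines. This is a simplified illustration of the rule as described, not the production implementation; in particular, the verbatim-substring test for "quoted from the output" is my own stand-in:

```python
def enforce_evidence(review: dict, output: str) -> dict:
    # Force-fail any checklist item whose evidence is empty or is not
    # found verbatim in the reviewed output (naive substring check here).
    missing = 0
    for item in review["checklist"]:
        evidence = item.get("evidence", "")
        if not evidence or evidence not in output:
            item["status"] = "fail"
            missing += 1
    # If more than 30% of items lack grounded evidence, cap the score at 50.
    if review["checklist"] and missing / len(review["checklist"]) > 0.3:
        review["score"] = min(review["score"], 50)
    return review

review = {
    "score": 90,
    "checklist": [
        {"item": "Uses locking", "status": "pass", "evidence": "with self._lock:"},
        {"item": "Handles refill", "status": "pass", "evidence": ""},  # no citation
    ],
}
checked = enforce_evidence(review, "def acquire(self):\n    with self._lock:\n        ...")
print(checked["score"])  # 50
```

Because this layer is not an LLM, a reviewer that rubber-stamps items without quoting the output gets caught mechanically, which is the whole point of requiring evidence.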

How This Compares

| Approach | Correlation with generator | Cost | Runtime? |
|---|---|---|---|
| Self-review (same model) | High | 1 LLM call | Yes |
| Chain-of-verification | Medium | 2-3 LLM calls | Yes |
| AgentDesk adversarial | Low | 2-3 LLM calls | Yes |
| Offline eval (Braintrust, DeepEval) | N/A | Varies | No |
| Human review | None | $$$ + slow | Partially |

Pricing

  • Free: 20 tasks/month (BYOK)
  • Starter: $29/mo — 500 tasks
  • Pro: $79/mo — 5,000 tasks + dual review + workflows
  • Team: $199/mo — 50,000 tasks

The MCP server is free and open source — you only pay for your own LLM API calls.

If you’re building with AI agents, I’d like to hear what’s working for you on quality control. Drop a comment or open an issue on GitHub.
