Koki Riho

Why AI Agent Outputs Need Adversarial Review (and How to Add It in One API Call)

The Problem: Agents Grading Their Own Homework

If you're running LLM agents in production, you've probably built some kind of output validation. Maybe a second LLM call checks the first one's work. Maybe you parse for structural issues.

Here's what I kept finding: LLM-based self-review has a systematic leniency bias. When you prompt an LLM to review output from another LLM (or itself), it overwhelmingly approves. The reviewer and generator share similar blind spots — they fail in correlated ways.

This matters when your agent writes code that gets deployed, generates customer-facing content, or makes decisions affecting downstream systems.

The Approach: Adversarial Review with Dual Consensus

AgentDesk is an HTTP API that sits between your agent's output and whatever consumes it:

  1. Two independent reviewers evaluate the output, each prompted adversarially (their job is to find problems, not confirm quality)
  2. Dual consensus — both must agree on pass/fail
  3. Anti-gaming validation — detects outputs that are superficially correct but substantively hollow
  4. Scored 0-100 with structured feedback on every issue found

It's BYOK — you supply your own LLM API key. AgentDesk charges only for orchestration.

Adding It to Your Pipeline

curl

curl -X POST https://agentdesk-blue.vercel.app/api/v1/tasks \
  -H "Authorization: Bearer agd_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize this quarterly report in 3 bullet points",
    "review": true,
    "review_type": "content"
  }'

Python

import requests

resp = requests.post(
    "https://agentdesk-blue.vercel.app/api/v1/tasks",
    headers={"Authorization": "Bearer agd_your_key"},
    json={
        "prompt": "Summarize this quarterly report in 3 bullet points",
        "review": True,
        "review_type": "content",
    },
)

task = resp.json()
# Poll GET /api/v1/tasks/{task['id']} for results
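The polling step in the Python example can be sketched as follows. This is a minimal sketch, not the official client: the terminal `status` values (`"completed"`, `"failed"`) and the response fields are assumptions about the API's shape, so check them against the actual responses. The fetch function is injectable so the loop can be exercised without a network.

```python
import json
import time
import urllib.request

BASE = "https://agentdesk-blue.vercel.app/api/v1"

def poll_task(task_id, api_key, fetch=None, timeout=30, interval=2):
    """Poll GET /tasks/{id} until the task reaches a terminal status.

    `fetch` takes a task id and returns the parsed task dict; it is
    injectable for testing and defaults to a real HTTP GET.
    """
    if fetch is None:
        def fetch(tid):
            req = urllib.request.Request(
                f"{BASE}/tasks/{tid}",
                headers={"Authorization": f"Bearer {api_key}"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    deadline = time.time() + timeout
    while time.time() < deadline:
        task = fetch(task_id)
        # Assumed terminal statuses; adjust to the API's actual values.
        if task.get("status") in ("completed", "failed"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```

With a real key you would call `poll_task(task["id"], "agd_your_key")` on the response from the POST above and read the review score and issues off the returned dict.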

JavaScript

const resp = await fetch('https://agentdesk-blue.vercel.app/api/v1/tasks', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer agd_your_key',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    prompt: 'Summarize this quarterly report in 3 bullet points',
    review: true,
    review_type: 'content',
  }),
});

const task = await resp.json();
// Poll GET /api/v1/tasks/${task.id} for results

How It Works Internally

  1. Reviewer A gets the output with an adversarial system prompt — find every flaw: factual errors, logical gaps, missing requirements.
  2. Reviewer B independently evaluates from a different angle — completeness, edge cases, whether the output actually addresses the task.
  3. Anti-gaming check detects outputs designed to pass review without being good — verbose empty answers, pattern-matched boilerplate.
  4. Consensus engine combines reviews. Both must pass. Scores averaged. Disagreements flagged.

The whole pipeline runs off a single API call and typically completes in 3-8 seconds.
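The consensus step described above reduces to a small amount of logic. Here is a sketch of how the combination rule could look (the `Review` shape and field names are my own illustration, not AgentDesk's internal types): both reviewers must pass, scores are averaged, and a pass/fail disagreement is flagged.

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    passed: bool
    score: int                      # 0-100
    issues: list = field(default_factory=list)

def combine(a: Review, b: Review) -> dict:
    """Dual consensus: both reviewers must pass; scores are averaged;
    pass/fail disagreements are flagged for inspection."""
    return {
        "passed": a.passed and b.passed,        # strict AND, not majority
        "score": (a.score + b.score) / 2,
        "issues": a.issues + b.issues,
        "disagreement": a.passed != b.passed,
    }
```

The strict AND is the point: a lenient reviewer can no longer single-handedly wave an output through.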

How This Compares

| Approach | Correlation with generator | Cost |
| --- | --- | --- |
| Self-review (same model) | High | 1 LLM call |
| Chain-of-verification | Medium | 2-3 LLM calls |
| AgentDesk adversarial | Low | 2-3 LLM calls |
| Human review | None | $$$ + slow |

The key difference isn't the number of LLM calls — it's that adversarial prompting with anti-gaming breaks the correlation between generator and reviewer failure modes.

Pricing

  • Free: 20 tasks/month (BYOK)
  • Starter: $29/mo — 500 tasks
  • Pro: $79/mo — 5,000 tasks + dual review + workflows
  • Team: $199/mo — 50,000 tasks

Open source: github.com/Rih0z/agentdesk


If you're building with AI agents, I'd like to hear what's working for you on quality control. Drop a comment or open an issue on GitHub.
