VentureIO

Posted on • Originally published at operatoriq.io

The new metrics: agent throughput, verification rate, recovery rate

How do you know the agents are actually working?

This is the question every operator asks the second week of running an agentic system in production. The old metrics don't fit. Tickets resolved assumes a human is closing them. Revenue per FTE doesn't know how to count an agent. NPS surveys aren't going to tell you that your support agent silently dropped half its tickets last Thursday.

You need a new vocabulary. Here it is, with formulas, with real numbers, and with the 20 lines of code you need to start tracking them today.

TL;DR

  • The three metrics that matter most for an agentic-AI-first business are throughput, verification rate, and recovery rate. Track these three before anything else.
  • Throughput = units of work produced per agent per day. The "units" are whatever the agent's job description says it ships.
  • Verification rate = the percentage of agent outputs that pass an independent check. If you don't have an independent check, the agent isn't running in production; it's running in trust.
  • Recovery rate = the percentage of failures the system caught and fixed without a human in the loop.
  • Two secondary metrics also matter: drift rate and silent-green rate.

Why the old metrics don't work

The metrics every SaaS dashboard ships with assume a human worker. Revenue per FTE was invented to measure how much output you get from a person on payroll. Tickets resolved counts the closes on a human's queue.

None of these work for an agent. An agent can claim a job complete and have produced nothing. The metric you need isn't a re-skinned human metric.

We learned this the hard way. We ran our first agentic outbound stack for two weeks reporting "100% job success rate" because every scheduled task logged exit code zero. We then checked the actual outbox and found that one of the three send branches had been silently failing for nine days. The exit code was lying.

Metric 1: Throughput

Definition. Units of work produced per agent per day.

Formula. throughput = count(outputs in period) / count(agents) / count(days in period)

Real numbers from our running system. Blog Writer ships 1.0 posts/day. Outreach Closer sends 14-32 cold emails/day. Support Agent handles 3-9 replies/day. Distributor pushes 4-6 syndication events per published asset.

Why it matters. Throughput is the first thing to fall when something is wrong. If throughput drops from 32 emails/day to 8 with no schedule change, something broke.
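
A minimal sketch of that check, applying the formula above and flagging a sharp drop against a baseline. The function names and the 50% drop threshold are assumptions for illustration, not values from our system, and the count of 660 is a made-up example.

```python
def throughput(outputs: int, agents: int, days: int) -> float:
    """Units of work per agent per day, following the formula above."""
    if agents <= 0 or days <= 0:
        raise ValueError("agents and days must be positive")
    return outputs / agents / days


def throughput_dropped(baseline: float, current: float, max_drop: float = 0.5) -> bool:
    """True when `current` fell below `baseline` by more than `max_drop` (a fraction).

    The 0.5 default is an illustrative assumption; tune it per agent.
    """
    if baseline <= 0:
        return False  # no meaningful baseline to compare against
    return (baseline - current) / baseline > max_drop


# Illustrative numbers: 660 emails from 1 agent over 30 days.
print(throughput(660, 1, 30))      # 22.0
print(throughput_dropped(32, 8))   # True: a 75% drop
print(throughput_dropped(32, 30))  # False: about a 6% drop
```

A relative threshold is used here so the same check works for an agent that ships 1 post/day and one that sends 30 emails/day.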

Metric 2: Verification rate

Definition. The share of agent outputs that pass an independent check, reported here as a fraction between 0 and 1.

Formula. verification_rate = count(outputs verified pass) / count(outputs produced)

Real numbers. Blog Writer 0.94, Outreach Closer 0.91, Support Agent 0.88, Distributor 0.97.

The most important rule: the verification process must not be the same code that produced the output. If it's the same code, you're checking the agent's homework with the agent's homework.
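
One way to keep the check independent is to pass it in as a separate function that looks at evidence the agent did not write. The sketch below assumes a hypothetical `outbox_message_id` field filled in from the mail provider's outbox, and the sample rows are made up.

```python
from typing import Callable, Iterable, Optional


def verification_rate(outputs: Iterable[dict],
                      check: Callable[[dict], bool]) -> Optional[float]:
    """Fraction of outputs that pass `check`; None when there are no outputs."""
    outputs = list(outputs)
    if not outputs:
        return None  # the rate is undefined without outputs
    passed = sum(1 for o in outputs if check(o))
    return passed / len(outputs)


def found_in_outbox(output: dict) -> bool:
    """Hypothetical independent check: trust the outbox record, not the agent's status."""
    return output.get("outbox_message_id") is not None


# Made-up sample: the agent reports "sent" for all four, but only two reached the outbox.
sample = [
    {"agent_status": "sent", "outbox_message_id": "m-1"},
    {"agent_status": "sent", "outbox_message_id": None},
    {"agent_status": "sent", "outbox_message_id": "m-2"},
    {"agent_status": "sent"},
]
print(verification_rate(sample, found_in_outbox))  # 0.5
```

Note that `found_in_outbox` never reads `agent_status`. That is the point: the agent's own claim is not evidence.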

Metric 3: Recovery rate

Definition. The share of failures the system caught and fixed without a human in the loop, reported here as a fraction between 0 and 1.

Formula. recovery_rate = count(failures auto-recovered) / count(failures)

Real numbers. 0.71 over the last 30 days. 71% of failures fixed without a human. The other 29% escalated to a human queue.

Why it matters. If your recovery rate is 0.10, you're a babysitter, not an operator. If your recovery rate is 0.70+, you have a real autonomous system.
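
A minimal sketch of the formula above, assuming each failure record carries a hypothetical `resolved_by` field set to "auto" or "human". The field name and the sample data are illustrative only.

```python
from typing import List, Optional


def recovery_rate(failures: List[dict]) -> Optional[float]:
    """Fraction of failures resolved automatically; None when nothing failed."""
    if not failures:
        return None  # no failures means there is nothing to recover from
    auto = sum(1 for f in failures if f.get("resolved_by") == "auto")
    return auto / len(failures)


# Made-up sample: 7 failures auto-recovered, 3 escalated to a human queue.
sample = [{"resolved_by": "auto"}] * 7 + [{"resolved_by": "human"}] * 3
print(recovery_rate(sample))  # 0.7
```

Returning None for a period with no failures avoids reporting a misleading 0.0 or 1.0 on a quiet week.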

Secondary metric 4: Drift rate

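Definition. The share of agent actions that fall outside the agent's envelope, meaning the scope set by its job description.

Formula. drift_rate = count(actions outside envelope) / count(total actions)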

Our drift rate: 0.012. Roughly 1 in 80 actions flagged as outside scope. A drift rate above 0.05 means the agent doesn't understand its own job description.
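
A minimal sketch of the drift formula, assuming the envelope can be written as an allow-list of action names. The action names below are hypothetical; a real envelope may also need limits on recipients, spend, or volume.

```python
from typing import Iterable, Optional, Set

DRIFT_ALERT = 0.05  # the threshold named in the text above


def drift_rate(actions: Iterable[str], envelope: Set[str]) -> Optional[float]:
    """Fraction of actions not in the allow-list; None when there are no actions."""
    actions = list(actions)
    if not actions:
        return None
    outside = sum(1 for a in actions if a not in envelope)
    return outside / len(actions)


# Hypothetical envelope and action log for an outreach agent.
envelope = {"send_email", "log_reply"}
actions = ["send_email"] * 79 + ["delete_contact"]
rate = drift_rate(actions, envelope)
print(rate)                # 0.0125
print(rate > DRIFT_ALERT)  # False
```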

Secondary metric 5: Silent-green rate

Definition. The share of runs logged as successful that produced no output.

Formula. silent_green_rate = count(success-logged with empty output) / count(success-logged total)

Target: 0.00. This is the failure mode that destroys trust. You think the agent worked. The log says it worked. Nothing happened.
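
A minimal sketch of the silent-green formula, assuming each run record has hypothetical `status` and `output` fields. The sample rows are made up.

```python
from typing import List, Optional


def silent_green_rate(runs: List[dict]) -> Optional[float]:
    """Fraction of success-logged runs with empty output; None if no run logged success."""
    successes = [r for r in runs if r.get("status") == "success"]
    if not successes:
        return None
    # Any falsy output (missing, None, "", []) counts as empty in this sketch.
    empty = sum(1 for r in successes if not r.get("output"))
    return empty / len(successes)


sample = [
    {"status": "success", "output": "draft-1"},
    {"status": "success", "output": ""},   # silent green: logged success, nothing produced
    {"status": "success", "output": "draft-2"},
    {"status": "success", "output": "draft-3"},
    {"status": "failed", "output": None},  # a visible failure, not counted here
]
print(silent_green_rate(sample))  # 0.25
```

Failed runs are left out of the denominator on purpose. A failure that shows up as a failure is not the problem this metric is looking for.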

The dashboard

| Agent | Throughput (30d avg) | Verification rate | Recovery rate | Drift rate | Silent-green |
|---|---|---|---|---|---|
| Blog Writer | 1.0 posts/day | 0.94 | 0.83 | 0.005 | 0.00 |
| Outreach Closer | 22 emails/day | 0.91 | 0.78 | 0.018 | 0.00 |
| Support Agent | 6 replies/day | 0.88 | 0.70 | 0.011 | 0.00 |
| Distributor | 5 events/asset | 0.97 | 0.81 | 0.004 | 0.00 |
| Lead Sourcer | 18 leads/day | 0.93 | 0.74 | 0.009 | 0.00 |
| Operator | 1 loop/day | 0.96 | 0.65 | 0.001 | 0.00 |

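Reviewing it takes about two minutes per agent. Any metric that moves the wrong way two days in a row is a flag: throughput, verification rate, or recovery rate falling, or drift rate or silent-green rate rising.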
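
The two-days-in-a-row rule can be sketched as a small check over a list of daily values. The function name and the `higher_is_better` switch are assumptions: they let the same check cover drift rate and silent-green rate, where a rising value is the bad direction.

```python
from typing import List


def trending_wrong_way(daily: List[float], days: int = 2,
                       higher_is_better: bool = True) -> bool:
    """True when the last `days` day-over-day changes all moved the wrong way."""
    if len(daily) < days + 1:
        return False  # not enough history to judge
    recent = daily[-(days + 1):]
    pairs = list(zip(recent, recent[1:]))
    if higher_is_better:
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)


print(trending_wrong_way([0.94, 0.92, 0.90]))  # True: fell two days running
print(trending_wrong_way([0.94, 0.95, 0.90]))  # False: only one down day
print(trending_wrong_way([0.004, 0.006, 0.009], higher_is_better=False))  # True
```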

The 20-line implementation

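import json
import time
from pathlib import Path

# Append-only JSONL log: one JSON object per agent event.
LOG = Path("runs.jsonl")

def log_event(agent: str, event_type: str, **kwargs):
    """Append one event row; extra keyword arguments become extra fields."""
    row = {
        "ts": time.time(),  # Unix timestamp in seconds
        "agent": agent,
        "event_type": event_type,
        **kwargs,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(row) + "\n")

def compute_throughput(agent: str, days: int = 30) -> float:
    """Average outputs per day for one agent over the last `days` days."""
    if not LOG.exists():
        return 0.0  # nothing has been logged yet
    cutoff = time.time() - days * 86400
    with LOG.open() as f:
        # Skip blank lines so a stray newline does not crash the parse.
        rows = [json.loads(line) for line in f if line.strip()]
    outputs = [r for r in rows
               if r["agent"] == agent
               and r["event_type"] == "output"
               and r["ts"] >= cutoff]
    return len(outputs) / days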

Wire verification rate, recovery rate, drift rate, and silent-green with the same pattern. You'll know within a week whether your agents are actually working.
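
As a rough sketch of what "the same pattern" could look like, the helper below reads the same kind of JSONL log and computes each remaining metric as a filtered count over a filtered count. The event types (`verification`, `failure`, `action`, `run`) and fields (`passed`, `auto_recovered`, `in_envelope`, `status`, `output`) are hypothetical; substitute whatever your agents actually log.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Optional


def load_rows(path: Path, agent: str, days: int = 30) -> List[dict]:
    """Read one agent's JSONL rows from the last `days` days, skipping blank lines."""
    cutoff = time.time() - days * 86400
    rows = []
    with path.open() as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            if row["agent"] == agent and row["ts"] >= cutoff:
                rows.append(row)
    return rows


def ratio(rows: List[dict],
          in_denominator: Callable[[dict], bool],
          in_numerator: Callable[[dict], bool]) -> Optional[float]:
    """Share of denominator rows that also match the numerator; None if none match."""
    denom = [r for r in rows if in_denominator(r)]
    if not denom:
        return None
    return sum(1 for r in denom if in_numerator(r)) / len(denom)


def agent_metrics(path: Path, agent: str, days: int = 30) -> dict:
    """Compute the four remaining rates from hypothetical event types and fields."""
    rows = load_rows(path, agent, days)

    def of_type(name: str) -> Callable[[dict], bool]:
        return lambda r: r["event_type"] == name

    return {
        "verification_rate": ratio(rows, of_type("verification"),
                                   lambda r: r.get("passed") is True),
        "recovery_rate": ratio(rows, of_type("failure"),
                               lambda r: r.get("auto_recovered") is True),
        "drift_rate": ratio(rows, of_type("action"),
                            lambda r: r.get("in_envelope") is False),
        "silent_green_rate": ratio(
            rows,
            lambda r: r["event_type"] == "run" and r.get("status") == "success",
            lambda r: not r.get("output")),
    }
```

Calling `agent_metrics(Path("runs.jsonl"), "Support Agent")` would return a dict with one entry per metric, with None for any metric that has no matching events in the window.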


