Aamer Mihaysi
The Latency Lie: Why Your Agent Is Slower Than You Think

Everyone measures agent latency. Almost nobody measures it correctly.

The problem is that most latency metrics capture model response time, not user experience time. And those are very different things.

What most teams measure:

  • Time from request to first token
  • Time from first token to completion
  • Total generation time

These are useful metrics. They tell you how fast the model responds. But they do not tell you how long the user waits.
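Those model-side numbers are easy to capture. A minimal sketch of timing any token stream for time-to-first-token and generation time (the `stream` iterable is a stand-in for whatever streaming API you actually call):

```python
import time

def measure_stream(stream):
    """Time a token stream: time-to-first-token, generation time, total."""
    start = time.perf_counter()
    first = None
    for _ in stream:
        if first is None:
            first = time.perf_counter()  # first token arrived
    end = time.perf_counter()
    return {
        "ttft": (first - start) if first else None,     # request -> first token
        "generation": (end - first) if first else 0.0,  # first token -> completion
        "total": end - start,                           # request -> completion
    }
```

Note that even this "total" is only model response time: the clock starts when you send the request, not when the user acted.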

What actually affects user experience:

  1. Preprocessing time. Before the model sees the prompt, you may be doing retrieval, context building, and prompt assembly. This can add seconds that never show up in model metrics.

  2. Tool execution time. When the agent calls a tool, the model-side clock stops, but the user keeps waiting. Tool calls can take anywhere from milliseconds to minutes.

  3. Retry loops. If the agent fails and retries, you add another full cycle. A single retry roughly doubles the latency, and cascading retries multiply it.

  4. Context accumulation. Longer contexts mean slower inference. A 50k-token prompt takes longer to process than a 5k-token one, even if the generated output is identical.

  5. UI rendering. Streaming tokens into a UI is not instant. Formatting, markdown parsing, and code highlighting all add latency.
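None of those phases appear in model-side metrics unless you time them yourself. One way to do that is to wrap each stage in a timing context manager; a minimal sketch, with the stage names as illustrative assumptions:

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def phase(name):
    """Record wall-clock time spent in one stage of the request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

# Hypothetical stages of a single agent request:
with phase("preprocessing"):
    pass  # retrieval, context building, prompt assembly
with phase("model_inference"):
    pass  # the only part most dashboards see
with phase("tool_execution"):
    pass  # the user is still waiting here
```

Because the accumulator adds rather than overwrites, retries and repeated tool calls fold into the same bucket automatically.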

The hidden latency budget:

A typical agent interaction might look like:

  • Preprocessing: 500ms
  • Model inference: 2s
  • Tool calls: 3s (parallelized)
  • Postprocessing: 200ms
  • UI rendering: 100ms

Total: 5.8 seconds of user-perceived wait. But your metrics probably report only the 2 seconds of model inference.
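The same budget in code, using the numbers above as assumed values, makes the gap concrete:

```python
# Illustrative latency budget for one agent interaction, in seconds.
budget_s = {
    "preprocessing": 0.5,
    "model_inference": 2.0,
    "tool_calls": 3.0,  # parallelized; sequential would be worse
    "postprocessing": 0.2,
    "ui_rendering": 0.1,
}

total = sum(budget_s.values())          # end-to-end user wait
reported = budget_s["model_inference"]  # what model-only metrics show
hidden = total - reported

print(f"user waits {total:.1f}s, metrics report {reported:.1f}s, "
      f"{hidden / total:.0%} of the wait is invisible")
```

With these numbers, roughly two thirds of the wait never reaches your dashboard.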

Why this matters:

When you optimize only model latency, you are optimizing 30-40 percent of the actual wait time. The rest is invisible to your metrics but very visible to your users.

Better latency measurement:

Measure end-to-end time from user action to visible result. Include everything. Then break it down:

  • How much is preprocessing?
  • How much is tool execution?
  • How much is context bloat?
  • How much is retry overhead?
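Once per-phase timings are collected, answering those questions is a grouping exercise. A sketch that buckets measured spans by phase (the sample durations are illustrative, not benchmarks):

```python
# Illustrative per-phase measurements for one request, in seconds.
spans = [
    ("preprocessing", 0.4),
    ("tool_execution", 2.1),
    ("tool_execution", 0.9),   # second tool call
    ("model_inference", 1.8),
    ("model_inference", 1.6),  # retry after a failed first attempt
]

total = sum(d for _, d in spans)
by_phase: dict[str, float] = {}
for name, duration in spans:
    by_phase[name] = by_phase.get(name, 0.0) + duration

# Report phases by share of the end-to-end wait, largest first.
for name, duration in sorted(by_phase.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {duration:5.1f}s  ({duration / total:.0%})")
```

In a real system the spans would come from your instrumentation (or a tracing backend) rather than a hardcoded list, but the breakdown logic is the same.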

Model latency is easy to measure. User experience latency requires more instrumentation. But it is the only metric that matters.

If your agent feels slow but your metrics look fine, you are probably measuring the wrong thing.
