Teemu Piirainen

Posted on Feb 7

Your AI Coding Agents Are Slow Because Your Tools Talk Too Much

#ai #programming #react #learning

Our AI code validator agent took 608 seconds to report results from a test suite that runs in 96 seconds. The agent wasn't stupid. The tool output was.

Every developer tool we use (test runners, linters, compilers, build systems) was designed for humans reading a terminal. When an AI agent reads that same output through a context window, things break in ways you don't expect. This is one example of that problem, and a pattern for fixing it.

The Symptom

We run a TypeScript monorepo with ~12,000 tests across four packages. After each feature, a code-validator agent runs tests and reports pass/fail with coverage. Simple job.

Agent Task	Actual Test Time	Agent Time	Overhead
Backend (3,683 tests)	24s	224s	9.3x
Frontend (7,450 tests)	96s	608s	6.3x

The agent was spending 6-9x longer understanding the results than the tests took to run.

What The Agent Actually Did

We parsed the agent transcripts (every tool call, every reasoning step). Here's the backend agent's actual sequence:

1. npm run test:coverage           → 419KB output, truncated at 235KB
2. grep "Tests" /tmp/output.log    → matched console.log JSON, not summary
3. npm run test:coverage           → re-ran entire suite. Truncated again.
4. tail -20 /tmp/output.log        → got coverage table row, not summary
5. grep -E "passed|failed"         → matched 47 lines of noise
6. npm run test:coverage           → third complete re-run
   ... repeated 6 times total ...

12 tool calls. 6 complete test re-runs. 224 seconds. To answer a yes/no question.

The frontend agent was worse: 28 tool calls, 5 test re-runs, 13 different grep/tail/head combinations trying to parse a coverage text table. It even reported a false failure — incorrectly flagging coverage as below threshold because it parsed the wrong line.

Why? Because vitest produces this:

 ✓ src/services/__tests__/userService.test.ts (12 tests) 45ms
 ✓ src/services/__tests__/authService.test.ts (8 tests) 23ms
   ... 1,386 more files ...

 Test Files  1389 passed (1389)
      Tests  3683 passed (3683)
   Duration  24.1s

----------|---------|----------|---------|---------|
File      | % Stmts | % Branch | % Funcs | % Lines |
----------|---------|----------|---------|---------|
   ... 141 rows ...

419KB of human-readable output. The answer, just five numbers, sits at the bottom. The context window truncates from the bottom, so the agent never sees it.

You wouldn't send 419KB of raw HTML to a mobile app and tell it to regex out the data. But that's exactly what we were doing with our agents.
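The truncation effect is easy to reproduce without vitest. The sketch below is a simulation, not our real test run: make_output is a hypothetical stand-in that prints a few hundred KB of per-file lines followed by a one-line summary, and LIMIT_BYTES assumes the agent's tool keeps only the first 235KB of output.

```shell
#!/usr/bin/env bash
# Simulated reporter output: thousands of per-file lines, summary at the very end.
# make_output is a hypothetical stand-in for a real test run.
make_output() {
  local i
  for i in $(seq 1 7000); do
    printf ' ✓ src/services/__tests__/file%04d.test.ts (8 tests) 23ms\n' "$i"
  done
  printf '      Tests  3683 passed (3683)\n'
}

LIMIT_BYTES=$((235 * 1024))   # assumed cap: the tool keeps only the head of the output

full=$(make_output)
seen=$(printf '%s\n' "$full" | head -c "$LIMIT_BYTES")

# Count summary lines in the full output vs. what survives truncation.
# grep -c exits non-zero when the count is 0, hence the "|| true".
full_hits=$(printf '%s\n' "$full" | grep -c 'Tests  ' || true)
seen_hits=$(printf '%s\n' "$seen" | grep -c 'Tests  ' || true)

echo "summary lines in full output: $full_hits"
echo "summary lines the agent sees: $seen_hits"
```

The summary exists exactly once in the full output and not at all in the truncated view, which is why every grep and tail the agent tried was searching text that did not contain the answer.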

The Fix

We stopped asking "how do we make the agent parse this better" and asked "can we give the agent a command that just outputs the answer?"

# Assumes PKG_DIR is set by the caller to the package under test.
RESULT_FILE=$(mktemp)
trap 'rm -f "$RESULT_FILE"' EXIT

START_TIME=$(date +%s)

# JSON reporter writes structured data to file. Everything else → /dev/null.
(cd "$PKG_DIR" && npx vitest run \
  --reporter=json \
  --outputFile="$RESULT_FILE" \
) > /dev/null 2>&1

# Wall-clock seconds for the whole run
WALL_TIME=$(( $(date +%s) - START_TIME ))

# Extract exactly what the agent needs
PASSED_TESTS=$(jq '.numPassedTests' "$RESULT_FILE")
FAILED_TESTS=$(jq '.numFailedTests' "$RESULT_FILE")
SUCCESS=$(jq '.success' "$RESULT_FILE")

echo "RESULT=$( [ "$SUCCESS" = "true" ] && echo "PASS" || echo "FAIL" )"
echo "TESTS=$PASSED_TESTS passed, $FAILED_TESTS failed"
echo "WALL_TIME=${WALL_TIME}s"

# On failure only: list failing files and test names, capped at 30 lines
if [ "$SUCCESS" != "true" ]; then
  jq -r '.testResults[] | select(.status == "failed") |
    "FILE: \(.name)\n\([.assertionResults[] |
    select(.status == "failed") | "  - " + .fullName] | join("\n"))"
  ' "$RESULT_FILE" | head -30
fi

Three decisions:

--reporter=json — vitest writes structured JSON to a file
> /dev/null 2>&1 — 419KB of terminal noise disappears
jq — extracts five numbers from structured data

The agent now sees this:

=== VALIDATION: test:backend ===
RESULT=PASS
SUITES=1389 passed, 0 failed (1389 total)
TESTS=3683 passed, 0 failed (3683 total)
WALL_TIME=40s

Five lines. One tool call. No parsing, no ambiguity, no re-runs.
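Because the report is plain KEY=VALUE lines, anything downstream of the agent can read it with a few lines of shell too. A minimal sketch, assuming the report format shown above; report_field is a hypothetical helper, not part of our validator script.

```shell
#!/usr/bin/env bash
# report_field KEY: print the value of the first "KEY=..." line read from stdin.
# Hypothetical helper for illustration only.
report_field() {
  local key=$1 line
  while IFS= read -r line; do
    case $line in
      "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
    esac
  done
  return 1   # key not present in the report
}

# Sample report, copied from the output shown above.
report='=== VALIDATION: test:backend ===
RESULT=PASS
SUITES=1389 passed, 0 failed (1389 total)
TESTS=3683 passed, 0 failed (3683 total)
WALL_TIME=40s'

result=$(printf '%s\n' "$report" | report_field RESULT)
tests=$(printf '%s\n' "$report" | report_field TESTS)
echo "result=$result"
echo "tests=$tests"
```

One key per line and one value per key means there is no regex to get wrong, which is exactly what the agent was missing before.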

The Pattern Is Everywhere

This isn't a vitest problem. It's a tool output problem. Every developer tool your agent touches has the same issue:

Linters: ESLint's default output is formatted for humans. eslint --format json gives your agent structured violations with file paths, line numbers, and severity, so there is no text to scrape.
Type checkers: tsc --noEmit prints errors to stdout as human-readable text. A 5-line wrapper that counts errors and captures file paths turns it into a structured report.
Build tools: docker build streams layers of progress output. The agent only needs: did it succeed, what's the image size, how long did it take.
Infrastructure: terraform plan produces pages of human-readable diff. terraform plan -json gives your agent a structured changeset it can reason about.

The pattern is always the same: the default output is designed for a terminal, but the tool usually has a structured mode (JSON, machine-readable flags), and where it doesn't, a thin wrapper can stand in. Switch the format, discard the noise, extract the answer.
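For a tool without a JSON mode, the wrapper can be very small. Here is a rough sketch for the type checker case, not the wrapper we run in production: summarize_tsc is a hypothetical function name, it assumes tsc is invoked with --pretty false so each error starts with file(line,col): error TSxxxx:, and it decides pass/fail from the error count rather than the exit code.

```shell
#!/usr/bin/env bash
# summarize_tsc: read raw tsc output on stdin, print a short structured report.
# Hypothetical helper; assumes the plain (non-pretty) tsc error format.
summarize_tsc() {
  local out errors
  out=$(cat)

  # Count every diagnostic line, including global errors without a file prefix.
  errors=$(printf '%s\n' "$out" | grep -cE 'error TS[0-9]+:' || true)

  if [ "$errors" -eq 0 ]; then echo "RESULT=PASS"; else echo "RESULT=FAIL"; fi
  echo "ERRORS=$errors"

  # Unique files that have at least one error, capped at 20 lines.
  printf '%s\n' "$out" |
    sed -nE 's/^([^ (]+)\([0-9]+,[0-9]+\): error TS[0-9]+:.*/FILE: \1/p' |
    sort -u | head -20
}

# Demo with canned tsc-style output, so the sketch runs without TypeScript installed.
sample="src/a.ts(1,2): error TS2322: Type 'string' is not assignable to type 'number'.
src/a.ts(3,4): error TS2304: Cannot find name 'x'.
src/b.ts(9,1): error TS2304: Cannot find name 'y'."

printf '%s\n' "$sample" | summarize_tsc
```

In a real project the input would come from something like npx tsc --noEmit --pretty false 2>&1 | summarize_tsc, so the agent sees a handful of lines regardless of how many errors the compiler prints.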

The Results

Metric	Before	After
Backend: tool calls	12	1
Backend: agent time	224s	42s
Frontend: tool calls	28	1
Frontend: agent time	608s	66s
False failures	2	0
Test re-runs per agent	5-6	0

Same tests. Same agent. Same model. Same prompts.

The Takeaway

The industry is pouring effort into prompt engineering, model selection, and agent frameworks. Meanwhile, much of the agent's context window fills up with ANSI color codes, progress bars, and output that was never meant for machine consumption. The context window is a scarce resource: treat it like memory, not a terminal screen.

When your agent is slow, don't start with the prompt. Start with what the tools are sending back. Audit every command your agent runs. If the output is more than a screenful, the agent is probably struggling with it. Most tools already support structured output: JSON flags, machine-readable formats, quiet modes. Use them. And where they don't exist, a simple wrapper script that filters noise and extracts the answer will do more for your agent's performance than any prompt rewrite.
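A quick way to start that audit is to measure how much each command prints before an agent ever sees it. A minimal sketch: audit_output is a hypothetical helper, and the 50-line SCREENFUL_LINES threshold is an arbitrary assumption standing in for "more than a screenful".

```shell
#!/usr/bin/env bash
SCREENFUL_LINES=50   # hypothetical threshold for "more than a screenful"

# audit_output CMD [ARGS...]: run a command, capture stdout and stderr,
# and report only the exit code and the size of what it printed.
audit_output() {
  local tmp status bytes lines verdict
  tmp=$(mktemp)
  "$@" >"$tmp" 2>&1
  status=$?
  bytes=$(wc -c <"$tmp" | tr -d ' ')
  lines=$(wc -l <"$tmp" | tr -d ' ')
  rm -f "$tmp"
  if [ "$lines" -gt "$SCREENFUL_LINES" ]; then verdict=noisy; else verdict=ok; fi
  echo "EXIT=$status BYTES=$bytes LINES=$lines VERDICT=$verdict"
}

audit_output seq 1 100   # prints: EXIT=0 BYTES=292 LINES=100 VERDICT=noisy
audit_output echo ok     # prints: EXIT=0 BYTES=3 LINES=1 VERDICT=ok
```

Pointing it at the commands your agent actually runs (for us that would have been npm run test:coverage) gives a short list of the tools that need a structured mode or a wrapper first.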

The fastest agent isn't the one with the best reasoning. It's the one that doesn't have to reason about the data format at all.

Based on a real optimization on a production TypeScript monorepo with ~12,000 vitest tests. The pattern — structured output, noise suppression, answer extraction — applies to any tool your agents touch.
