You Feel 20% Faster. A Study Says 19% Slower.

#ai #engineering

In a 2025 randomized trial, experienced developers were 19% slower when allowed to use AI tools, and they believed they were 20% faster. That gap between felt speed and measured speed is where unreviewed bugs live. BrassCoders, the scanner that catches what AI assistants structurally miss, is the cheap check that doesn't run on a feeling.

The study is narrow and dated, so treat it as a snapshot. The pattern it caught is the useful part.

The Speed You Feel Isn't The Speed You Get

BrassCoders is built for the gap this study measured: METR found 16 experienced open-source developers took 19% longer to finish tasks with AI tools enabled, while estimating afterward that they'd gone 20% faster.

The METR trial ran 246 tasks in large, mature repositories the developers already knew well, using Cursor Pro with Claude models. The participants forecast a 24% speedup before starting, estimated a 20% speedup afterward, and were actually 19% slower. Caveats matter: 16 developers, familiar codebases, early-2025 models that have since improved. The result may not generalize to other developers or other projects. The perception gap, though, is the durable finding.
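As a rough worked example of that perception gap, the sketch below treats each figure quoted above as a signed percent change in task completion time, with negative meaning faster. That reading is an assumption made for illustration, and the names `FORECAST`, `PERCEIVED`, `MEASURED`, and `perception_gap` are hypothetical.

```python
# Signed percent change in completion time: negative = faster, positive = slower.
# Assumption: "20% faster" and "19% slower" are both read as changes in
# time-to-complete, which simplifies how the study reports them.
FORECAST = -24   # developers' prediction before starting
PERCEIVED = -20  # developers' estimate afterward
MEASURED = 19    # observed change with AI tools allowed


def perception_gap(perceived: int, measured: int) -> int:
    """Return the distance, in percentage points, between measured and perceived change."""
    return measured - perceived


if __name__ == "__main__":
    print(f"forecast vs. measured:  {perception_gap(FORECAST, MEASURED)} points")
    print(f"perceived vs. measured: {perception_gap(PERCEIVED, MEASURED)} points")
```

Under this reading, the after-the-fact estimate and the measurement sit 39 percentage points apart, and the up-front forecast sits 43 points away.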

The Same Gap Shows Up In Security

BrassCoders addresses a second version of the same gap: developers trust the security of AI code more than testing supports. Snyk found nearly 80% of developers believe AI code is more secure than human code, while Veracode's testing found 45% of AI-generated code samples carried an OWASP Top 10 vulnerability.

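The Snyk survey measured belief; the Veracode report measured outcomes across more than 100 models on 80 tasks. Nearly 80% of developers believed AI code was the more secure option, while about 45% of the tested samples turned out to be vulnerable. The two figures measure different things, so they aren't directly comparable, but the productivity gap and the security gap have the same shape: confidence ahead of measurement.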

Why The Gap Persists

BrassCoders exists because AI output defeats the usual signal developers use to decide whether to review: it looks finished. Code that is syntactically clean and plausibly structured reads as correct, so the review step feels skippable even when the logic is wrong.

A human-written first draft often looks rough, which prompts a closer read. An AI draft arrives polished, which suppresses one. The polish is real and the correctness is separate, and the gap between them is exactly what a quick read misses. The feeling of done-ness is not evidence of done-ness.

A Safety Net That Doesn't Rely On Feel

BrassCoders runs the same deterministic scan on every commit no matter how confident anyone is, which is what makes it a safety net rather than a judgment call. Twelve scanners check the AI-generated code and emit the findings as YAML, offline, in CI.

The point isn't to argue with the productivity numbers, which will shift as models improve. It's that a check tied to a feeling gets skipped on exactly the code that feels best. A check tied to a commit doesn't. Run it on every push and the structural bugs stop depending on whether anyone felt like reviewing.

pip install brasscoders
brasscoders --offline scan /path/to/your/project
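
One way to tie the scan to a push rather than a feeling is a small wrapper that a Git hook or CI step can call. The sketch below is an illustration, not BrassCoders' documented integration: it reuses only the command line shown above, and it assumes the CLI exits non-zero when it wants to fail the run, which is not confirmed here. The names `build_scan_command` and `run_scan` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def build_scan_command(project_path: str) -> List[str]:
    """Build the same invocation shown above for a given project path."""
    return ["brasscoders", "--offline", "scan", project_path]


def _default_runner(command: Sequence[str]) -> int:
    # subprocess.run returns a CompletedProcess; returncode is the exit status.
    # Raises FileNotFoundError if the brasscoders executable is not on PATH.
    return subprocess.run(list(command), check=False).returncode


def run_scan(
    project_path: str,
    runner: Callable[[Sequence[str]], int] = _default_runner,
) -> int:
    """Run the scan and return its exit status.

    Assumption: a non-zero status means the push should be blocked.
    The runner is injectable so the wrapper can be tested without the CLI.
    """
    return runner(build_scan_command(project_path))
```

A pre-push hook or CI step could call `sys.exit(run_scan("."))` so that the push stops on whatever exit status the scan returns, with no one deciding whether the code felt finished.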

DEV Community
