Felix Ortiz

The Mistakes Didn't Change. The Speed Did.

Everyone is measuring how fast agents write code. Few are measuring what that code introduces.

This year, independent researchers tested the major AI coding agents by having them build applications from scratch. Most pull requests contained at least one vulnerability. Inside Fortune 50 companies, AI-generated code introduced 10,000+ new security findings per month. Logic and syntax bugs went down. Privilege escalation paths jumped over 300%. Yikes!

The code got better while the security got worse. Agents just produce the same old mistakes faster. One customer seeing another customer's data. Login flows that leave a back door wide open. Endpoints exposed to the entire internet.

The mistakes are harder to see

The code looks clean. It follows the right patterns, uses the right frameworks, passes initial agent-driven code review. It just quietly skips the check that asks "should this user be allowed to do this?" or "has this request been authenticated?"
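That missing check is easy to show. Here's a minimal sketch in plain Python (the `INVOICES` store and function names are hypothetical, standing in for a real database and web framework) of the one-line difference between an endpoint that fetches a record by id and one that also asks whether the requester should see it:

```python
# Hypothetical in-memory store standing in for a database table.
INVOICES = {
    1: {"owner": "alice", "total": 120},
    2: {"owner": "bob", "total": 340},
}

def get_invoice_insecure(current_user, invoice_id):
    """Looks clean, follows the pattern, passes review.
    But any logged-in user can read any invoice: a classic
    broken access control / IDOR bug."""
    return INVOICES.get(invoice_id)

def get_invoice_secure(current_user, invoice_id):
    """The check agents tend to skip: verify the requester
    actually owns the record before returning it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice["owner"] != current_user:
        return None  # a real framework would return 403/404 here
    return invoice
```

Both versions run, use the same idioms, and look identical to a pattern matcher. Only the second one contains the judgment.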

These are judgment mistakes. The security tooling most teams rely on was built to catch known-bad patterns, not missing logic. Over 80% of vulnerabilities in AI-generated code go undetected by traditional static analysis. Pattern matching catches code that is obviously wrong. It cannot catch logic that is missing.
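A toy sketch makes the blind spot concrete. This is not any real SAST tool, just a hypothetical rule list: a scanner built on known-bad patterns flags the obviously dangerous code, then sails straight past an endpoint whose only flaw is an ownership check that was never written, because absent code leaves nothing to match:

```python
import re

# A toy "SAST" rule set: known-bad patterns only.
KNOWN_BAD = [
    r"\beval\(",               # arbitrary code execution
    r"password\s*=\s*['\"]",   # hardcoded credential
]

def toy_scan(source: str) -> list[str]:
    """Return the known-bad patterns found in the source.
    Nothing here can report a check that *should* exist
    but was never written."""
    return [p for p in KNOWN_BAD if re.search(p, source)]

obviously_bad = 'eval(user_input)\npassword = "hunter2"'

missing_logic = '''
def get_invoice(current_user, invoice_id):
    return db.fetch("invoices", invoice_id)  # no ownership check
'''
```

`toy_scan(obviously_bad)` fires on both rules; `toy_scan(missing_logic)` comes back empty, and the IDOR ships.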

The window to catch them is closing

A human writes one insecure endpoint per sprint. An agent writes twenty in an afternoon. That alone changes the math on how much your security infrastructure needs to handle.

It goes further. The agentic loop is getting tighter. Agents writing code, agents reviewing code, agents merging code. Each iteration shrinks the window between generation and production, and the human verification layer gets thinner every time.

When that window was wide, pattern-matching tools and human reviewers could compensate for each other's blind spots. As it narrows, both have less time to work with, and the mistakes that slip through are the ones neither was designed to catch.

The tooling is evolving, and so is the attack surface

The next generation of security tooling is starting to reason about code rather than just match patterns. Security review as a continuous practice embedded in the development loop, not a gate at the end of it. That direction is right.

More tooling also means more surface area. Earlier this year, a wave of CVEs hit MCP infrastructure, many of them the same class of vulnerability these tools exist to catch. If you are going to trust your security pipeline, you have to secure the pipeline itself. OWASP and GitHub are already publishing frameworks and reference architectures for this.

What I'm doing about it

On my own platform, I have the pattern-matching layer in place. Static analysis on every PR, dynamic scanning nightly. That catches what it was designed to catch. The floor is built.

Now I need what goes above it: security agents that reason about logic-level gaps, tooling integrated at generation time via MCP instead of just at review time, and a hardened pipeline that gets the same isolation and least-privilege treatment as production.

The mistakes agents make are not new. The speed at which they make them is.
