Trust but verify when using AI for fixing security flaws

#aie #ai #security

AI Engineer World's Fair Coverage

AI might seem like a magic bullet for fixing security issues, but it's not that simple, warned Eugene Yan, a member of technical staff at Anthropic, during the inaugural security track at AI Engineer World's Fair. The effectiveness of AI in finding and fixing flaws is doubling every five months, he said, pointing to Mozilla's release of a 423-patch bundle in April, more patches than were released in all of 2025.
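To put "doubling every five months" in perspective, here is the arithmetic. It assumes, as an idealization the talk did not spell out, that effectiveness can be treated as a single quantity E growing exponentially, with t measured in months:

```latex
E(t) = E_0 \cdot 2^{t/5}
\qquad\Longrightarrow\qquad
\frac{E(12)}{E_0} = 2^{12/5} = 2^{2.4} \approx 5.3
```

In other words, if the trend held, a year would bring roughly a fivefold improvement.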

But while agents are good at finding and fixing flaws, many security professionals say the human element is still needed: both to check that the AI has done the work properly and to make sure that seemingly low-risk bugs can't be chained together into a serious exploit the AI might miss.

To address this, Yan proposed a six-stage process. "We found that most teams converge in approximately these six steps, and a big chunk of my thoughts will be about these," he told the crowd.

First, a threat-finding stage identifies a potential flaw and passes it to stage two, a sandbox, which tests whether proof-of-concept code can exploit the issue. The third stage is a discovery phase, in which the finding is checked against past issues that may already have been fixed.

Stage four is an independent verification step, designed to filter out further false positives. In stage five, the results are triaged so that human reviewers aren't overwhelmed. Finally, in stage six, a patch is developed and the patched code is sent back to the discovery engine.
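One way to picture these stages is as a filtering pipeline. The Python below is a minimal, hypothetical sketch, not Yan's or Anthropic's implementation: the names `Finding`, `filter_findings`, and `patch_and_recheck`, the numeric severity score, and the fixed review budget are all assumptions, and each stage is a caller-supplied function standing in for real tooling such as a scanner, a sandbox, or a verifier model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical sketch: every name here is illustrative, and each stage is a
# placeholder callable rather than a real scanner, sandbox, or verifier.


@dataclass
class Finding:
    """A candidate flaw reported by the threat-finding stage (stage 1)."""
    id: str
    severity: int  # assumed 0-10 score, used only for triage


def filter_findings(
    findings: Iterable[Finding],
    reproduces: Callable[[Finding], bool],     # stage 2: does a sandboxed PoC work?
    already_fixed: Callable[[Finding], bool],  # stage 3: matches a past, fixed issue?
    verified: Callable[[Finding], bool],       # stage 4: does an independent check agree?
    review_budget: int,                        # stage 5: how many a human can review
) -> list[Finding]:
    """Run stages 2-5 and return a short, severity-ranked queue for humans."""
    # `and` short-circuits, so a finding that fails an earlier stage
    # never reaches the later (presumably costlier) checks.
    survivors = [
        f for f in findings
        if reproduces(f) and not already_fixed(f) and verified(f)
    ]
    # Triage: most severe first, capped so reviewers are not flooded.
    survivors.sort(key=lambda f: f.severity, reverse=True)
    return survivors[:review_budget]


def patch_and_recheck(
    finding: Finding,
    make_patch: Callable[[Finding], str],              # stage 6: produce a fix
    still_vulnerable: Callable[[Finding, str], bool],  # re-run discovery on the fix
) -> bool:
    """Return True if discovery no longer reports the flaw after patching."""
    patched_code = make_patch(finding)
    return not still_vulnerable(finding, patched_code)
```

A toy run, with lambdas in place of real stages:

```python
raw = [Finding("A", 9), Finding("B", 4), Finding("C", 7), Finding("D", 2)]
queue = filter_findings(
    raw,
    reproduces=lambda f: f.id != "D",     # D's proof of concept fails
    already_fixed=lambda f: f.id == "B",  # B matches an old, fixed bug
    verified=lambda f: True,
    review_budget=5,
)
print([f.id for f in queue])  # ['A', 'C']
```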

The end result, he argued, will be much more secure code that keeps humans in the loop while making security staff's lives a lot easier. Of course, if AI systems keep improving at the current rate, that need for human oversight may not last forever.

Top comments (1)

Nazar Boyko • Jul 1

The point about small bugs chaining into a real exploit is the whole article for me. The six stages each look at one flaw on its own (find it, sandbox it, verify it, patch it), which is exactly the shape that can't notice three harmless bugs lining up into something serious. So the human isn't just a checker bolted on at the end; they're the only stage actually looking across flaws instead of down at one. It makes me wonder if the missing piece is a correlation pass that reasons over the whole set of findings at once, and whether that's the part that stays hard to automate longest, since it needs a model of how the system composes rather than whether one bug reproduces. Did Yan touch on anything like analyzing the findings together, or does it stop at verifying each issue on its own?