DEV Community

Dimitris Kyrkos

The Verification Paradox: Why 100% of AI-Assisted Devs Face Incidents

Intro

How much do we actually trust the code our AI assistants spit out?

Recently, we had the opportunity to present at WALK, the innovation center at the Aristotle University of Thessaloniki that helps startups turn ideas into sustainable ventures. We spoke with founders and engineers about the rise of Vibe Coding and the hidden risks that come with it. Following the session, we surveyed 23 startups about their AI habits and the levels of trust they place in these tools.

The results were a wake-up call for anyone merging AI pull requests.

The Daily Dependency

First, we wanted to know how deep the AI rabbit hole goes. It turns out, we are reaching a point of total dependency.

Nearly half (47.8%) of developers use these tools daily as part of their core workflow, and another 34.8% use them several times a week. Only 4.3% of respondents said they rarely or never use AI.

The 100% Incident Rate

This is where it gets interesting – and a bit scary. We asked if AI-assisted code has ever caused problems.

The "No, never" category was a flat 0.0%. Every single respondent reported that AI had caused an issue at some point, and 78.2% face problems occasionally or all the time. That is a stark contrast: over 95% of us use these tools regularly, yet nearly 80% of us run into problems because of them.

The Pressure Trap

If the code breaks so often, why do we trust it?

Most developers (52.2%) claim to be cautious and review code carefully. However, the "WTF" moment happens under pressure. Over a third (34.8%) admit they mostly trust the AI when they are under a deadline. We check the code when we have time, but we skip the rigor exactly when the stakes are highest.

The Security & Privacy Blind Spot

When a security tool flags AI code, the reaction is generally healthy: 69.6% investigate further.

But when it comes to the data we give the AI, the logic disappears.

Despite constant headlines about data breaches, 43.5% of respondents are either "not very concerned" or "not concerned at all" about sharing proprietary data with AI models.

Conclusion: The Auditor Era

This survey reveals a "Verification Paradox". AI has become a daily necessity, but its 100% incident rate proves that our value as developers has shifted. We aren't just writers anymore; we are auditors.

The greatest risk isn't the AI's lack of logic – it's our human tendency to trust it most when time is shortest.

How are you auditing your AI output? Let’s discuss in the comments.

Top comments (9)

arun rajkumar

That 34.8% "mostly trust under time pressure" stat is the whole story. We run a payments platform — FCA-authorised, SOC2 certified — and the pressure to ship fast is constant. We use AI agents daily for things like scanning config drift and validating schemas across our services. But we learned early: the verification layer isn't optional, it's the product. An AI agent flagged 23 env variable inconsistencies across our services in minutes. Incredible speed. But 4 of those were intentional — legacy integrations with partner-specific naming that the agent had zero context for. If we'd auto-fixed them, we'd have broken production. The paradox you're describing isn't really about AI trust — it's about building systems where the human verification step is a first-class citizen, not an afterthought you skip when the deadline hits.

Dimitris Kyrkos

You are completely right. In the pursuit of delivering quickly, we sacrifice security and good practice. Things move so fast that something breaking is only a matter of time. AI agents are quick, but they lack context and hallucinate a lot, which makes it hard for us to tell whether a flag is a false positive or a real issue. So auto-fixing, however quick, can be dangerous in legacy code. That is why you need a human check and a good tool working together.

arun rajkumar

The hallucination point is huge. We ran into exactly this — an AI agent flagged a "missing" env variable that wasn't actually missing, it was conditionally loaded from a secrets manager in production. Looked like a real finding, passed initial review. If we'd auto-fixed it, we'd have overwritten the secrets manager integration with a hardcoded fallback. In fintech, that's not just a bug — that's a compliance incident. The false positive/false negative asymmetry in regulated systems is why I keep coming back to "human check + good tool" as the only viable pattern right now.
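A minimal sketch of how that kind of false positive arises (the names `fetchSecret` and `DB_PASSWORD` are invented for illustration; arun's actual integration wasn't shared). The variable only materialises at startup in production, so a static scan of the environment reports it as missing:

```typescript
// Hypothetical sketch: in production, DB_PASSWORD never appears in the
// raw environment; it is pulled from a secrets manager at startup.
// A static env audit that only inspects the raw env flags it "missing".

type Env = Record<string, string | undefined>;

// Stand-in for a secrets-manager client (kept synchronous for brevity).
function fetchSecret(name: string): string {
  return `secret-value-for-${name}`;
}

function loadConfig(env: Env): Env {
  if (env.NODE_ENV === "production") {
    // Resolved only at runtime, invisible to a static scan of env files.
    return { ...env, DB_PASSWORD: fetchSecret("DB_PASSWORD") };
  }
  return env; // in dev the variable comes straight from .env
}
```

An auto-fix that "restores" `DB_PASSWORD` with a hardcoded fallback would silently shadow the secrets-manager value, which is exactly the compliance incident described above.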

Dimitris Kyrkos

That is exactly what most developers experience. Humans and tools together are still the go-to for security checks, and imo that isn't going to change in the near future.
What tools are you using for AI coding and security, if you don't mind me asking?

arun rajkumar

Good question. For AI coding, we use Claude with MCP — the structured context feed means the AI sees exactly what we want it to see, nothing more. For security specifically, we lean on Zod schemas as the runtime validation layer (dev warns, prod exits), and our env audit pipeline flags anomalies before they reach staging. The key for us in fintech is keeping AI in the "fast analyst" role — it surfaces issues, humans decide. What does Cyclopt's tooling look like on the verification side?
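The "dev warns, prod exits" pattern described above might look like this minimal sketch. A real setup would use a schema library such as Zod; a hand-rolled checker is used here to keep the example dependency-free, and the variable names (`DATABASE_URL`, `API_KEY`) are hypothetical:

```typescript
// Sketch of env validation that warns in dev but refuses to boot in prod.
// Variable names are hypothetical; swap in a real schema library as needed.

type Env = Record<string, string | undefined>;
type Check = (value: string | undefined) => string | null;

const requireVar =
  (name: string): Check =>
  (value) =>
    value && value.length > 0 ? null : `${name} is required`;

const schema: Record<string, Check> = {
  DATABASE_URL: requireVar("DATABASE_URL"),
  API_KEY: requireVar("API_KEY"),
};

// Collect every validation error for a given env snapshot.
function validateEnv(env: Env): string[] {
  const errors: string[] = [];
  for (const [key, check] of Object.entries(schema)) {
    const error = check(env[key]);
    if (error) errors.push(error);
  }
  return errors;
}

// Same findings, different severity: warn in dev, fail hard in prod.
function enforceEnv(env: Env, fatal: (msg: string) => void): void {
  const errors = validateEnv(env);
  if (errors.length === 0) return;
  const message = `Invalid environment: ${errors.join("; ")}`;
  if (env.NODE_ENV === "production") {
    fatal(message); // e.g. log and process.exit(1)
  } else {
    console.warn(message); // dev keeps running, but loudly
  }
}
```

At startup you would call something like `enforceEnv(process.env, (msg) => { console.error(msg); process.exit(1); })`.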

Dimitris Kyrkos

Nice toolchain. As I understand it, you mostly do data validation. At Cyclopt we don't do that; we validate software quality using metrics such as the maintainability index, and we perform source code analysis and SAST. Our tool is called Cyclopt Companion.
Our MCP server and toolchain are in the making and will soon be available for agentic tools like Claude Code or Copilot. If you are interested, let me know and you can try it when it's out.

arun rajkumar

That's a great distinction — data validation vs code validation are two different problem spaces. You're right that our focus is on the data layer: env configs, schemas, runtime payloads. Your approach at Cyclopt of validating the code itself (structure, patterns, quality signals) is the complementary piece. In an ideal workflow you'd want both: validate that the code AI generates follows your architectural patterns and validate that the data it processes matches your schemas. We've seen AI agents write perfectly structured code that uses the wrong env variable name — Zod catches the data mismatch, but it wouldn't have caught a bad code pattern. Sounds like that's exactly where your tool fits.

oleg kholin

Great article — but I think it describes the symptom while the root cause goes deeper.
The "pressure trap" you identified isn't a personal failure of individual developers. It's a structural inevitability created by how most companies are deploying AI right now: as substitution, not amplification. When a team of 10 becomes a team of 3 because "AI will handle the rest", they have eliminated exactly the people who should be auditing the AI output. The remaining developers now carry generation, verification, and deadlines simultaneously; skipping review under pressure isn't laziness, it's arithmetic.
There's also a deeper paradox here. Effective AI auditing actually requires more expertise than writing code from scratch. When you write it yourself, you know where you're uncertain. AI-generated code arrives with a false sense of completeness — it looks done, it compiles, it passes basic tests. Catching what's wrong in someone else's (or something else's) confident output is a harder cognitive task.
So companies that chose substitution are slowly losing the organizational capacity to evaluate the quality of their own product. That's not immediately visible — but it's structural degradation that compounds over time.
The real shift isn't just about hiring or tooling. It's about metrics (speed of code generation stops mattering; quality of verification does), accountability structures (who owns an incident caused by AI output?), and time horizons. Amplification-model companies will look slower on next quarter's report and significantly stronger in three years.
The 100% incident rate in your survey isn't a sign that AI isn't ready. It's a sign that the business model around AI isn't ready.

Dimitris Kyrkos

I agree that many companies assume AI can do everything and replace their human employees, and that leads to all the problems you describe. On the other hand, I believe people in general don't see AI as a tool like any other, but as something completely different. Like the corporations, they treat it as a "solution for every problem", ignoring that AI makes mistakes too and will continue to make them. Human review will never be replaced, no matter how good the AI gets.