DEV Community

Mashraf Aiman

I Stopped Vibe Coding and Started Using Prompt Contracts. Here's What the Data Says

You asked Claude Code to build a Supabase auth flow with row-level security.
You got a flawless, production-grade Firebase auth system.
Technically impressive. Fundamentally wrong. 2,400 lines of perfectly clean code, deleted at 2 AM.
That's the story that kicked off Philippe Eveilleau's viral Medium post — and it hit because every developer using Claude Code has a version of that story. Maybe yours was a payment integration that worked in staging and silently failed in production. Maybe it was an auth system that inverted a truthy check and gave deactivated accounts admin access for two weeks. Maybe it was just a function that passed your test case and rejected every international email address your users actually had.
The pattern has a name now. The industry has data on it. And there's a method that actually fixes it — one that Eveilleau formalized as Prompt Contracts, which has now been turned into a book, a framework, and a workflow that developers are genuinely shipping with.
Let's go deep.
The Vibe Coding Crisis Is Real and Documented
First, let's establish that this isn't just one developer's bad night.
"Vibe coding" — the term coined by Andrej Karpathy in February 2025 — was named Collins Dictionary's Word of the Year for 2025. It entered the cultural lexicon as software development where you describe what you want, accept what the AI produces, and iterate via prompts rather than comprehension.
One year in, the vibes are measurably off.
The Productivity Paradox
In July 2025, METR ran a rigorous randomized controlled trial with experienced open-source developers. Real engineers, real codebases, real tasks. The result: developers using AI coding tools were 19% slower than those working without them. The same developers predicted they'd be 24% faster going in — and even after the experiment, still believed they'd been 20% faster.
They were measurably slower, and they didn't know it.
A broader analysis of vibe coding statistics found this pattern consistent: 95% of developers report feeling more productive while simultaneously producing lower-quality output. The explanation is straightforward — AI eliminates the easy, fast parts (scaffolding, boilerplate, repetitive patterns), which feels like a massive speed gain. But it quietly increases time spent on the hard parts: debugging unfamiliar code, hunting subtle logic errors in code you didn't write, and understanding hidden assumptions you never explicitly made.
The Code Quality Numbers Are Worse
A December 2025 analysis by CodeRabbit examined 470 open-source GitHub pull requests and found that AI co-authored code contained approximately 1.7x more "major" issues than human-written code. Specifically:
Logic errors: incorrect dependencies, flawed control flow — measurably elevated
Misconfigurations: 75% more common in AI-generated code
Security vulnerabilities: 2.74x higher in AI-generated versus human-written code
A separate Veracode study in October 2025 confirmed that while LLMs had become dramatically better at generating functional code over three years, the security profile of generated code had not improved at pace. Larger models weren't meaningfully better than smaller ones at generating secure output.
The Stack Overflow Developer Survey found 66% of developers experience the "productivity tax" — that specific frustration of AI-generated code that is almost, but not quite, right. Close enough to accept; wrong enough to break things later.
The CTO Survey That Should End the Debate
Final Round AI surveyed 18 CTOs in August 2025. 16 out of 18 (89%) reported experiencing production disasters directly caused by AI-generated code. One CTO's take: "AI promised to make us all 10x developers, but instead it's making juniors into prompt engineers and seniors into code janitors."
IBM reported a 60% reduction in development time for internal tools with AI assistance — but that's the sweet spot. Internal tools with low security bars, low maintenance expectations, and high bug tolerance. The moment you're building something that needs to be correct, secure, and maintained: the math changes.
What Vibe Coding Actually Gets Wrong
Before we get to the fix, it's worth being precise about the failure mode.
The problem isn't that Claude Code is bad at writing code. It's excellent at writing code. The problem is that it writes code to match what you described, not what you actually need — and the gap between those two things is where every 2 AM debugging session lives.
Dua Asif, writing in Activated Thinker (Feb 2026), described this precisely with the email validation story. She asked Claude to write an email validation function. Claude delivered — clean, with regex patterns, handling basic cases. Tests passed. Production deploy. Life was good.
Until customer support started getting complaints. The function rejected perfectly valid international email addresses, plus-sign filter addresses, addresses that followed RFC 5322 but didn't match the regex. The function worked exactly as asked. She just asked for the wrong thing.
"The quality of your output is directly proportional to the specificity of your input. Garbage in, garbage out applies to AI just as much as it does to traditional programming."
The hidden cost of vibe coding is you get what you request, not what you need. And that gap is invisible until production reveals it.
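To make the gap concrete, here is a hypothetical TypeScript version of that failure. It is not Asif's actual code, just the shape of it: a tidy regex of the kind an assistant might plausibly produce, next to a deliberately permissive check that defers real validation to a confirmation email.

```typescript
// Illustrative only: a "clean-looking" pattern an assistant might generate.
// The character class silently excludes "+" and any non-ASCII letter.
const naiveEmailRegex = /^[a-zA-Z0-9._]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Permissive sanity check: exactly one "@", neither leading nor trailing,
// and no whitespace. Anything stricter belongs in a confirmation email.
function isPlausibleEmail(addr: string): boolean {
  const at = addr.indexOf("@");
  return (
    at > 0 &&
    at === addr.lastIndexOf("@") &&
    at < addr.length - 1 &&
    !/\s/.test(addr)
  );
}

// Addresses real users have, which the naive regex rejects:
console.log(naiveEmailRegex.test("user+filter@domain.com")); // false: "+" not in the class
console.log(naiveEmailRegex.test("müller@example.de"));      // false: non-ASCII local part
console.log(isPlausibleEmail("user+filter@domain.com"));     // true
console.log(isPlausibleEmail("müller@example.de"));          // true
```

The naive pattern rejects plus-sign addressing and non-ASCII local parts, which is exactly the class of complaint that reached customer support.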
The deeper issue: vibe coding externalizes cognition into prompts and post-hoc evaluation. Traditional development front-loads thinking — you understand the problem deeply before writing code. Vibe coding moves that understanding after the code exists, which means you're evaluating assumptions you never consciously made. Review time goes up even as coding time goes down. Comprehension debt accumulates.
Prompt Contracts: The Method
Eveilleau's Prompt Contracts framework is, at its core, a simple reframe: stop treating Claude Code like a search engine and start treating it like an API you're writing a contract for.
When you call an external API, you don't just describe the vibes of what you want. You specify the endpoint, the expected inputs, the expected outputs, the error states, the edge cases, and the acceptance criteria. You define what success looks like before you make the call.
A Prompt Contract applies that same discipline to Claude Code interactions.
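One way to internalize the API analogy is to literally type it out. The sketch below is my own illustration, assuming nothing about the framework's internals; every name in it is invented for this example.

```typescript
// Hypothetical sketch: a prompt contract as a typed value. None of these
// names come from Eveilleau's framework; they just make the idea concrete.
interface PromptContract {
  stack: string[];           // explicit environment declaration
  task: string;              // what you want built
  successCriteria: string[]; // measurable definition of "done"
  constraints: string[];     // what must not change or be introduced
  outputFormat: string;      // exact shape of the deliverable
  verify: string[];          // self-check criteria appended to the prompt
}

// Render the contract into the prompt text you would actually send.
function renderContract(c: PromptContract): string {
  const bullets = (items: string[]) => items.map((s) => `- ${s}`).join("\n");
  return [
    `STACK: ${c.stack.join(", ")}`,
    `TASK: ${c.task}`,
    `SUCCESS CRITERIA:\n${bullets(c.successCriteria)}`,
    `CONSTRAINTS:\n${bullets(c.constraints)}`,
    `OUTPUT: ${c.outputFormat}`,
    `VERIFY: Before responding, check your output against the success ` +
      `criteria. Also confirm: ${c.verify.join("; ")}. ` +
      `If any check fails, revise before responding.`,
  ].join("\n\n");
}
```

Nothing here is magic; the value of the type is that a missing field is visible before you hit enter, the way a missing argument is visible before you call an API.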
The Structure of a Prompt Contract
A well-formed Prompt Contract has five components:

1. Stack Declaration

Be explicit and complete about your technical environment. Don't say "set up authentication." Say "implement authentication using Supabase with row-level security, in a Next.js 14 app with TypeScript strict mode, using the existing user table schema in schema.sql." This is the single change that prevents the Supabase-becomes-Firebase moment. Claude Code is not psychic about your stack. It will make reasonable assumptions. Those assumptions will sometimes be wrong in spectacular ways.

2. Success Criteria

Define what "done" means in measurable terms. Not "handle edge cases" — that's a vibe. Instead:

  • Must handle email addresses with non-ASCII characters
  • Must handle plus-sign addressing (user+filter@domain.com)
  • Must comply with RFC 5322
  • Must return a specific error type for invalid inputs (not throw)
  • Must handle malformed API responses with trailing commas

As Dua Asif notes: "What does 'works correctly' mean? Does it mean the function never raises exceptions? Does it mean it handles 99% of cases? Does it mean it fails gracefully with helpful error messages?" If you don't answer this in the prompt, Claude will answer it for you — and it may not answer it the way you need.

3. Explicit Constraints

What can't it do? What must it not touch? Which patterns must it follow?

  • Do not introduce new dependencies
  • Do not modify the existing auth middleware
  • Must use the existing error handling pattern from src/lib/errors.ts
  • Performance: must complete in under 200ms for p99

The Anthropic documentation itself instructs Claude Code: avoid over-engineering, don't add features beyond what was asked, don't add error handling for scenarios that can't happen. But "can't happen" is only knowable if your constraints are explicit.

4. Output Format Specification

Specify the exact form of the output. A single function? A module? Including tests? With what kind of comments? In what file? Replacing existing code or alongside it? Vague output specs produce vague outputs.
5. Verification Criteria (The Self-Check Loop)

This is the Prompt Contracts move that changes everything: ask Claude to verify its own output before delivering it. Eveilleau's framework includes what he calls the Verification Loop — prompting Claude to test its own code in the browser before you take delivery. The Amazon book description frames it as "the AI tests its own code before you take delivery."

A simple version: append to every production-grade prompt — "Before responding, check your output against these criteria: [list your success criteria]. If any criterion is not met, revise before responding."

Claude's ability to catch its own mistakes when explicitly asked is meaningfully better than its tendency to catch them unprompted. Build that into the contract.

The Bigger Framework: Where Prompt Contracts Fit

Prompt Contracts didn't emerge in a vacuum. The industry has been converging on the same insight from multiple directions.

Spec-Driven Development

GitHub announced a spec-driven development toolkit in September 2025. Martin Fowler published an analysis of it. It appeared in Thoughtworks Technology Radar Volume 33. AWS shipped Kiro — a spec-driven IDE — in public preview.

The core idea: write a detailed specification before any code is generated. The spec includes requirements, architecture, API contracts, error handling, and edge cases.

Heeki Park's write-up on using spec-driven development with Claude Code (March 2026) captures the experience of the convert: in earlier projects, he'd write a few quick sentences and start generating immediately, then course-correct repeatedly. After adopting specs: "most of my follow-up interactions were small tweaks rather than wholesale changes to the entire project."

The pattern: upfront planning time pays compounding dividends in implementation quality.

The CLAUDE.md System

Claude Code natively supports a CLAUDE.md file — a persistent project constitution that Claude reads at session start. It functions as a standing prompt contract for the entire project: stack, conventions, folder structure, coding standards, commands, and non-negotiables.

Nick Babich's guide on Claude Code project structure describes a well-formed CLAUDE.md as including project overview, architecture, tech stack, coding conventions (TypeScript strict mode, functional components, no default exports), folder structure, commands, and important rules for performance, accessibility, and testing.

When your project has a CLAUDE.md, every prompt inherits that context automatically. You stop re-explaining your stack. You stop re-specifying your conventions. The contract becomes ambient.

The GSD (Get Shit Done) Framework

For Claude Code users who want to go further, the GSD build framework formalizes the spec-first approach with .spec.md files per feature. Instead of "Build me a login page", you create login.spec.md:

# Specification: User Login Component

## Requirements

  1. Input: Email and Password
  2. Validation: Email must be valid format; Password > 8 chars
  3. Action: Call /api/v1/login on submit
  4. State: Show loading spinner during request

## Tech Stack

  • React + Tailwind CSS
  • Lucide Icons for eye/hide password toggle

Then: claude "Implement the feature described in docs/specs/login.spec.md". The system ensures output isn't a snippet — it's a production-ready file built against a specification you reviewed before any code was generated.

Prompt Contracts in Practice: Real Examples

Here's the contrast between a vibe prompt and a contract prompt, using a real scenario.

Vibe Prompt

Add rate limiting to our API endpoints

Contract Prompt

STACK: Express.js API, Node 20, Redis (upstash), TypeScript strict. Existing rate limiting: none. Middleware pattern: see src/middleware/auth.ts for the pattern to follow.

TASK: Implement rate limiting middleware for all /api/* routes.

SUCCESS CRITERIA:

  • 100 requests/minute per IP for unauthenticated routes
  • 1000 requests/minute per authenticated user (use existing auth context)
  • Returns 429 with Retry-After header when limit exceeded
  • Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) on all responses
  • Graceful degradation: if Redis is unavailable, allow requests through (log warning)

CONSTRAINTS:

  • Do not modify existing auth middleware
  • Do not introduce dependencies beyond upstash/ratelimit (already in package.json)
  • Must work in edge runtime (Vercel)

OUTPUT: Single middleware file at src/middleware/rateLimit.ts + update to src/app.ts showing usage. Include 3 test cases.

VERIFY: Before responding, check that the Redis unavailability case is handled and that the Retry-After header is correctly calculated from the window reset time.
The second prompt takes 90 seconds to write. It produces something that actually goes to production. It also becomes documentation — when you come back to this code in three months, the spec is right there.
As Dua Asif noted: "The prompt contract became documentation."
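For a feel of the logic that contract is asking for, here is a minimal in-memory sketch of a fixed-window limiter. It is illustrative only: the real contract specifies Redis via upstash/ratelimit and Express middleware wiring, both omitted here, and every name in this sketch is mine, not from the article.

```typescript
// Minimal in-memory fixed-window limiter (no Redis, no Express).
// Demonstrates the contract's header and 429 semantics only.
type Decision = {
  allowed: boolean;
  headers: Record<string, string>;
};

class FixedWindowLimiter {
  private counts = new Map<string, { count: number; resetAt: number }>();

  constructor(private limit: number, private windowMs: number) {}

  check(key: string, now = Date.now()): Decision {
    let entry = this.counts.get(key);
    // Start a fresh window if none exists or the old one expired.
    if (!entry || now >= entry.resetAt) {
      entry = { count: 0, resetAt: now + this.windowMs };
      this.counts.set(key, entry);
    }
    entry.count++;
    const remaining = Math.max(0, this.limit - entry.count);
    const headers: Record<string, string> = {
      "X-RateLimit-Limit": String(this.limit),
      "X-RateLimit-Remaining": String(remaining),
      "X-RateLimit-Reset": String(Math.ceil(entry.resetAt / 1000)),
    };
    if (entry.count > this.limit) {
      // Retry-After is derived from the window reset time,
      // the exact detail the contract's VERIFY clause calls out.
      headers["Retry-After"] = String(Math.ceil((entry.resetAt - now) / 1000));
      return { allowed: false, headers };
    }
    return { allowed: true, headers };
  }
}
```

Note what the sketch does not cover: the graceful-degradation criterion (allow requests through when Redis is down) only exists once a remote store is involved, which is why the contract spells it out rather than trusting the model to think of it.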
When Not to Write a Contract
Prompt Contracts have a cost: they take time to write. For casual work, they're overkill.
The rule of thumb that's emerged from practitioners: write a contract when the code matters. When it's going to production, when other people will maintain it, when bugs would be costly, when it touches security or payments or auth.
Don't write a contract for:
Explaining a concept
Refactoring a small, isolated function you're going to review line-by-line
Throwaway scripts you'll delete in an hour
Exploring or prototyping to understand a new technology
Do write a contract for:
Any feature that touches user data
Any API integration
Any auth or permissions logic
Any code that will outlive the current sprint
Any code that other team members will inherit
The threshold for "code that matters" is lower than most developers initially think. That "quick script" has a way of becoming a critical pipeline six months later.
The Senior Developer Advantage
One of the most consistent findings in vibe coding research: senior developers (10+ years) report 81% productivity gains from AI coding tools, while junior developers show mixed or negative results.
The explanation: AI tools amplify existing judgment. Senior engineers can evaluate what AI produces — they recognize when an architecture decision is wrong, when a security pattern is suspect, when a function is doing too much. They use AI for the mechanical parts while retaining oversight on the judgment calls.
Junior developers lack the reference points to evaluate AI output. They accept code they can't fully evaluate, which means they can't catch what the AI got wrong.
Prompt Contracts partially bridge this gap. By forcing explicit success criteria and constraints before generation, you build in the checkpoints that a senior engineer would apply mentally. The contract externalizes the judgment that experience normally provides implicitly.
The Productivity Tax and How Contracts Reduce It
Stack Overflow's data: 66% of developers experience the productivity tax — "code that is almost, but not quite, right." That "almost" is the most expensive word in software development. It's the bug you can't find because the code looks right. It's the edge case that passes every test you wrote because you didn't know to write that test.
A research analysis of vibe coding statistics put it precisely: vibe coding doesn't eliminate engineering effort — it redistributes it. Speed gains in early development are real. They're counterbalanced by increases in review burden, defect risk, and organizational knowledge decay.
Prompt Contracts specifically attack the knowledge decay problem. When every production prompt includes a stack declaration, success criteria, and constraints, you accumulate a library of specifications that document what the code was supposed to do — not just what it does. That's documentation that writes itself.
The Bottom Line for Developers in 2026
The data has converged on something the Prompt Contracts framework formalized in practice:
AI code generation is not a replacement for engineering judgment. It's an amplifier of it.
The developers who ship reliably with Claude Code in 2026 are not the ones typing natural language and hoping for the best. They're the ones who treat Claude Code as a powerful junior engineer who executes brilliantly but needs precise briefs. They specify the stack, define success, constrain the scope, and build verification in.
Vibe coding will get you a beautiful Firebase implementation when you asked for Supabase.
A Prompt Contract will get you exactly what you needed — even the edge cases you forgot you needed.
Vibe coding is telling the kitchen to "surprise me." The risotto might be delicious. But at 2 AM, you wanted pizza.
Resources & Further Reading
Philippe Eveilleau — Prompt Contracts on Medium (original article)
Philippe Eveilleau — Prompt Contracts: The Book on Amazon
Dua Asif — I Stopped Vibe Coding and Started Prompt Contracts (Activated Thinker, Feb 2026)
Wikipedia — Vibe Coding — METR study, CodeRabbit data, industry reaction
Hashnode — The State of Vibe Coding in 2026
Pixelmojo — The AI Coding Technical Debt Crisis
Bloomberg Businessweek — Claude Code and the Great Productivity Panic of 2026 (Feb 2026)
Heeki Park — Using Spec-Driven Development with Claude Code (March 2026)
Thoughtworks — Spec-Driven Development: 2025 Emerging Practice
Anthropic — Claude Code Best Practices
Anthropic — Claude Code Prompting Best Practices
What does your current AI prompting workflow look like? Are you writing specs before you generate, or still iterating from vibes? Drop it in the comments — genuinely curious where the community has landed.
