AI-generated code ships fast. That's the feature and the risk.
I've seen AI assistants produce code that passes tests, looks clean, and still breaks in production because of assumptions no one questioned. After dealing with enough of these, I built a pre-flight checklist: five checks I run before any AI-generated code touches production.
## Why You Need a Pre-Flight
Here's what AI assistants do well: generate syntactically correct code that handles the happy path.
Here's what they reliably miss:
- Environment-specific behavior (local vs. staging vs. production)
- Concurrency and race conditions
- Failure modes they weren't explicitly told about
- Security implications of the approach they chose
- Performance at production scale
A pre-flight catches these before your users do.
## The 5 Checks
### Check 1: "What assumptions did you make?"
Before reviewing any code, I ask:
List every assumption you made while writing this implementation.
Include assumptions about:
- Input data (types, ranges, nullability)
- Environment (OS, Node version, available services)
- Concurrency (single-threaded? thread-safe?)
- Dependencies (versions, availability)
- Scale (how many records, requests/sec, payload sizes)
This consistently surfaces 3-5 assumptions per implementation. At least one is usually wrong.
Real example: An assistant wrote a file upload handler and assumed files would be under 10MB. Our users regularly upload 500MB video files. The implementation would have crashed in production.
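That fix is usually small once the assumption is visible. Here's a minimal sketch, in Python for illustration, of what "making the size assumption explicit" could look like; `MAX_UPLOAD_BYTES`, `save_upload`, and the chunked-read approach are my own illustrative choices, not code from the incident:

```python
# Sketch: the "files are small" assumption made explicit and enforced.
# Streaming in chunks also avoids buffering a 500MB upload in memory.
MAX_UPLOAD_BYTES = 600 * 1024 * 1024  # sized for real usage, not a guess

def save_upload(stream, dest_path, limit=MAX_UPLOAD_BYTES):
    """Stream an upload to disk in chunks, rejecting anything over the limit."""
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = stream.read(64 * 1024)
            if not chunk:
                break
            written += len(chunk)
            if written > limit:
                raise ValueError(f"upload exceeds {limit} bytes")
            out.write(chunk)
    return written
```

The point isn't this particular limit; it's that the assumption now lives in the code as a named constant instead of in the assistant's head.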
### Check 2: "What happens when [X] fails?"
I pick the two most critical external dependencies and ask:
In this implementation, what happens when:
1. [Redis/the database/the external API] is unavailable
2. [The request/input/file] is malformed
For each: show me the exact code path and the user-facing behavior.
If there's no handling, add it.
AI assistants almost always handle the success path beautifully and the failure path with... nothing. No try/catch, no timeout, no retry, no graceful degradation.
This single check has caught more production bugs than any other.
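For concreteness, here's a rough Python sketch of the kind of handling Check 2 pushes the assistant to add: a retry and a defined fallback when a cache lookup fails. The function names (`fetch_from_cache`, `fetch_from_db`) are hypothetical stand-ins for your real dependencies:

```python
import time

def get_user(user_id, fetch_from_cache, fetch_from_db, retries=1):
    """Try the cache first; retry once on failure, then degrade to the database."""
    for attempt in range(retries + 1):
        try:
            return fetch_from_cache(user_id)
        except Exception:  # cache down or timed out
            if attempt < retries:
                time.sleep(0.05)  # brief backoff before the retry
    # Graceful degradation: a cache outage should not take the feature down.
    return fetch_from_db(user_id)
```

Whether you retry, degrade, or fail loudly depends on the dependency; the check is about making that choice deliberately instead of inheriting "no handling at all."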
### Check 3: "Run it with adversarial input"
Generate 5 adversarial test inputs for this implementation:
1. Maximum realistic size
2. Empty/null/undefined
3. Unicode/special characters
4. Concurrent duplicate requests
5. Expired/invalid auth tokens
For each, trace the code path and identify where it breaks.
This isn't fuzzing — it's targeted stress-testing of the logic. The assistant knows the code it wrote, so it can trace through edge cases faster than you can manually test them.
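You can also turn the trace into an executable table. A minimal Python sketch, with `normalize_username` as a hypothetical function under test (concurrency and auth-token cases need their own harness, so only the input-shaped categories appear here):

```python
def normalize_username(raw):
    """Hypothetical function under test."""
    if raw is None:
        raise ValueError("username required")
    name = raw.strip().lower()
    if not name or len(name) > 64:
        raise ValueError("username must be 1-64 characters")
    return name

ADVERSARIAL_CASES = [
    ("x" * 64, "ok"),       # maximum realistic size
    ("", "error"),          # empty
    (None, "error"),        # null
    ("Ünïcode!", "ok"),     # unicode/special characters
    ("x" * 65, "error"),    # just past the limit
]

for raw, expected in ADVERSARIAL_CASES:
    try:
        normalize_username(raw)
        outcome = "ok"
    except ValueError:
        outcome = "error"
    assert outcome == expected, (raw, outcome, expected)
```

Ask the assistant to generate this table for its own code; the cases it struggles to fill in are usually the ones it never handled.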
### Check 4: "Compare to how we do it elsewhere"
Look at [existing similar file/module] in our codebase.
Compare the patterns used there with your implementation.
List any inconsistencies in:
- Error handling approach
- Logging format
- Response structure
- Naming conventions
- Test structure
This check catches style drift. AI assistants generate code in their "default" style unless you very explicitly constrain them. Even then, they miss subtle patterns like your team's specific error response format or logging conventions.
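One concrete way to reduce that drift is to have a single canonical helper the assistant can be pointed at. A Python sketch; the envelope fields (`error`, `code`, `request_id`) are illustrative, not a real team standard:

```python
def error_response(code, message, request_id=None):
    """Build the team's one canonical error envelope."""
    return {
        "error": {"code": code, "message": message},
        "request_id": request_id,
    }

# An assistant's "default style" often invents ad-hoc shapes instead, e.g.
#   return {"detail": message}   # inconsistent with the rest of the API
```

"Use `error_response` from [module]" is a far more reliable constraint than "match our error format."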
### Check 5: "Write the rollback plan"
If this change causes issues in production, what's the rollback plan?
Specifically:
1. Is this change backwards-compatible with the current database schema?
2. Can we revert the deployment without data migration?
3. Are there any feature flags we should wrap this in?
4. What monitoring/alerting should we add to catch problems early?
This forces the assistant (and you) to think about the deployment as a whole — not just the code. If the rollback plan is "revert the commit," that's probably fine. If it's "run a database migration backward and hope," you need feature flags.
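A feature flag can be as simple as a gated dispatch. A Python sketch, with a plain dict standing in for whatever flag service you actually use and `new_checkout_flow` as a hypothetical flag name:

```python
FLAGS = {"new_checkout_flow": False}  # stand-in for a real flag store

def checkout(cart, new_impl, old_impl):
    """Route to the new implementation only when its flag is on."""
    if FLAGS.get("new_checkout_flow", False):
        return new_impl(cart)
    return old_impl(cart)  # the known-good path stays one config flip away
```

With this in place, "rollback" means flipping a flag, not reverting a deploy under pressure.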
## Running the Pre-Flight
The full checklist takes about 5 minutes per feature. Here's my workflow:
- Get the implementation from the assistant
- Run Checks 1-3 in a single prompt (they're related)
- Run Check 4 separately (needs codebase context)
- Run Check 5 before creating the PR
I don't run all five for every change. Hot fix for a typo? Skip it. New payment processing endpoint? Run everything twice.
## The Quick-Reference Card
Save this as PRE-FLIGHT.md in your project:
# AI Code Pre-Flight Checklist
Before merging AI-generated code:
- [ ] List assumptions (inputs, env, scale, concurrency)
- [ ] Trace failure paths (external deps, malformed input)
- [ ] Test adversarial inputs (large, empty, unicode, concurrent, expired)
- [ ] Compare patterns with existing codebase
- [ ] Document rollback plan
Skip for: typos, docs, config changes
Run fully for: new features, security-related, data-touching
## The Cost of Skipping
I skipped the pre-flight once for a "simple" caching change. The assistant added an in-memory cache that worked perfectly in development (single instance) and caused stale data across three production instances behind a load balancer.
Five minutes of pre-flight would have caught the assumption in Check 1: "This assumes a single server instance."
The incident took four hours to diagnose and fix.
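The bug class is easy to reproduce. Here's a minimal Python sketch of why a per-process cache goes stale behind a load balancer; `make_instance` simulates two server instances as two separate closures, which is my own illustration of the failure, not the actual incident code:

```python
def make_instance():
    """Simulate one server instance with its own in-memory cache."""
    cache = {}  # per-process memory: every instance gets its own copy

    def get(key, loader):
        if key not in cache:
            cache[key] = loader(key)
        return cache[key]

    def invalidate(key):
        # Only evicts from THIS instance's cache; the others never hear about it.
        cache.pop(key, None)

    return get, invalidate
```

Instance A can invalidate its copy after a write, but instances B and C keep serving the old value until their processes restart. That's exactly the stale-data behavior we spent four hours chasing.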
## Start Today
Pick one AI-generated change that's pending review right now. Run the five checks. Count how many issues surface.
Most developers find at least two on their first run.
That's two potential production incidents caught in five minutes.