
The AI Code Review Checklist I Use Before Merging Any TypeScript PR

AI can write code quickly.

That is no longer the interesting part.

The interesting part is what happens after the code is generated.

Because in real projects, the bottleneck is rarely "how do we get code faster?"
It is usually:

  • Is this correct?
  • Does it match the architecture?
  • Does it handle edge cases?
  • Is it safe to merge?
  • Can someone else maintain it next month?

That is why I stopped treating AI output like finished code.

Now I treat it like a draft that needs a reliable review system.

This is the checklist I use before I merge any AI-generated TypeScript PR.

Not a theoretical checklist.
A practical one.

1. Can I explain the change in one sentence?

Before I look at the code, I force myself to summarize the PR in one sentence.

For example:

"This change validates uploaded image metadata before saving records to the database."

If I cannot describe the change clearly, one of two things is probably true:

  • the PR is doing too much
  • the code is hiding the real intent

AI often produces code that looks organized while actually mixing multiple concerns.

If the purpose is fuzzy, I stop there and split the change.

2. Does the code match the requested scope?

AI loves to be helpful.

Sometimes too helpful.

It will often:

  • rename unrelated variables
  • refactor nearby code
  • introduce "small improvements"
  • create utility functions nobody asked for
  • solve adjacent problems

That makes review harder.

So one of my first checks is simple:

Did the model change only what needed to change?

If a PR was supposed to fix one validation bug but now touches eight files, I get suspicious immediately.

A good AI-generated PR is usually narrower than you think.

3. Are the boundaries explicit?

For TypeScript projects, this is one of the biggest signals of quality.

I look for clear boundaries:

  • input types
  • output types
  • domain models
  • DTOs
  • API contracts
  • validation layers

AI-generated code is much easier to trust when the edges are visible.

Bad sign:

```typescript
async function saveUser(data: any) {
  // ...
}
```

Better:

```typescript
interface CreateUserInput {
  email: string;
  displayName: string;
}

interface CreateUserResult {
  id: string;
  email: string;
  displayName: string;
}

async function saveUser(data: CreateUserInput): Promise<CreateUserResult> {
  // ...
}
```

If the AI patch adds logic without tightening the contract, I usually improve the boundary before I approve the implementation.

4. Did the AI generate code, or did it generate a new abstraction?

This matters a lot.

Sometimes AI gives you useful implementation.
Sometimes it gives you a brand new architecture you did not ask for.

Watch for:

  • extra layers
  • generic helpers
  • โ€œreusableโ€ wrappers
  • configuration systems
  • class hierarchies for simple logic
  • abstraction before repetition actually exists

A useful question here is:

Would I still create this abstraction if a human teammate had not suggested it?

If the answer is no, I remove it.

AI-generated code often becomes bloated not because it is broken, but because it is too eager to generalize.
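As a sketch of the pattern: the generic version below is the kind of abstraction a model might volunteer for a single use case, while the direct version says the same thing without new machinery (all names here are hypothetical, for illustration):

```typescript
// Over-eager version: a generic "pipeline" invented for one call site.
type Step<T> = (input: T) => T;

function runPipeline<T>(input: T, steps: Step<T>[]): T {
  return steps.reduce((acc, step) => step(acc), input);
}

const normalizeEmailPipeline: Step<string>[] = [
  (e) => e.trim(),
  (e) => e.toLowerCase(),
];

// Direct version: same behavior, no new abstraction to maintain.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}
```

If a second or third genuinely different pipeline shows up later, the abstraction earns its place; until then, the direct function is easier to review and debug.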

5. Is there a test for the thing that actually changed?

I do not just ask, "Are there tests?"

I ask:

Is there a test for the exact behavior this PR claims to fix or add?

That means:

  • one test for the happy path
  • one test for the expected failure mode
  • one test for the edge case that is easiest to miss

If the PR fixes a parsing bug, I want a parsing test.
If it changes authorization logic, I want an authorization test.
If it adds fallback behavior, I want a test that proves the fallback works.

AI often writes tests that mirror the implementation too closely.

So I look for tests that verify behavior, not just structure.

6. Does the code fail safely?

A surprising amount of AI-generated code handles success better than failure.

So I explicitly review:

  • missing values
  • invalid input
  • timeouts
  • third-party failures
  • null and undefined cases
  • partial success scenarios
  • retries
  • logging
  • user-facing error messages

I ask myself:

  • What happens if this dependency is down?
  • What happens if the payload shape changes?
  • What happens if this field is missing?
  • What happens if the operation succeeds halfway?

If the answer is "the app probably throws something weird," the PR is not ready.
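One shape that answers several of these questions at once is a wrapper that races a slow dependency against a timeout and returns an explicit fallback instead of hanging or throwing something vague. This is a sketch, not a prescribed pattern; the helper name and fallback values are assumptions:

```typescript
// Resolve with `fallback` if `work` rejects or takes longer than `ms`.
// The deliberate part is that the failure path is chosen, not accidental.
async function withFallbackTimeout<T>(
  work: Promise<T>,
  ms: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } catch {
    // Third-party failure: degrade to the fallback instead of crashing.
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In a real PR you would usually also log the failure before returning the fallback, so the degradation is visible in production.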

7. Are runtime checks present at the system edges?

TypeScript is great, but it does not validate runtime data by itself.

So any AI-generated code that touches:

  • request bodies
  • query parameters
  • local storage
  • database results
  • webhooks
  • environment variables
  • third-party APIs

should make me ask:

Where is the runtime validation?

Types help inside the codebase.
Validation protects the boundary.

For example:

```typescript
import { z } from "zod";

const CreatePostSchema = z.object({
  title: z.string().min(1),
  body: z.string().min(1),
  tags: z.array(z.string()).default([]),
});

type CreatePostInput = z.infer<typeof CreatePostSchema>;

function parseCreatePostInput(input: unknown): CreatePostInput {
  return CreatePostSchema.parse(input);
}
```

If AI writes strongly typed code without validating incoming data, it creates a false sense of safety.

8. Are names better after the change, or worse?

AI can produce valid code with terrible naming.

And bad names are expensive because they survive code review surprisingly often.

So I check:

  • function names
  • variable names
  • type names
  • file names
  • booleans
  • enum values
  • error messages

I want names that reflect the domain, not the implementation trick.

Bad:

```typescript
const dataProcessorManager = createHandler();
```

Better:

```typescript
const invoiceRetryScheduler = createRetryScheduler();
```

When naming gets vague, maintainability drops fast.

9. Does the PR introduce duplicate logic?

AI often rewrites something that already exists somewhere else in the codebase.

That creates:

  • near-duplicate validators
  • inconsistent helpers
  • slightly different parsing functions
  • multiple ways to do the same thing

So I always scan for duplication before approving.

My rule is simple:

  • if the logic already exists, reuse it
  • if the existing abstraction is bad, improve it
  • do not allow AI to create parallel versions of the same idea

Duplicate code is especially dangerous when it looks clean.
It feels harmless at first and becomes expensive later.

10. Would I be comfortable debugging this at 2 AM?

This is one of my favorite review questions.

Because code can be technically correct and still be operationally terrible.

I look for:

  • meaningful logs
  • useful error messages
  • predictable branching
  • obvious control flow
  • easy-to-trace data transformations
  • no "magic" hidden in helpers

If production breaks, will this code help the next person understand what happened?

Or will it force them to reverse-engineer AI-generated cleverness under pressure?

If debugging would be painful, I simplify the code before merge.
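One concrete thing I look for is errors that carry their own context. A sketch, with hypothetical names and fields:

```typescript
// An error that tells the on-call engineer what failed and where,
// without making them re-run the request.
class PaymentSyncError extends Error {
  constructor(
    message: string,
    public readonly context: { orderId: string; provider: string; step: string }
  ) {
    super(
      `${message} (order=${context.orderId}, provider=${context.provider}, step=${context.step})`
    );
    this.name = "PaymentSyncError";
  }
}
```

At 2 AM, "PaymentSyncError: charge declined (order=ord_123, provider=stripe, step=capture)" beats "Error: something went wrong."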

11. Is the security model still intact?

Any AI-generated PR that touches:

  • auth
  • permissions
  • tokens
  • cookies
  • headers
  • uploads
  • database access
  • redirects
  • HTML rendering
  • shell commands

gets a slower review.

I specifically check:

  • authorization, not just authentication
  • input handling
  • secret leakage
  • unsafe defaults
  • overly broad permissions
  • accidental exposure of internal fields
  • client/server boundary mistakes

I do not trust "looks secure."

I want the security assumptions to be obvious in the code.
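For the "authorization, not just authentication" point, a minimal sketch of what an obvious assumption looks like in code (the types and ownership rule are hypothetical):

```typescript
interface User {
  id: string;
  role: "admin" | "member";
}

interface Doc {
  id: string;
  ownerId: string;
}

// Being logged in (authenticated) is not enough;
// the permission rule is stated explicitly, in one place.
function canDeleteDoc(user: User, doc: Doc): boolean {
  return user.role === "admin" || user.id === doc.ownerId;
}
```

When the rule is a named function like this, a reviewer can see the security assumption directly instead of inferring it from scattered if-statements.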

12. Can I roll this back easily?

This final check is underrated.

Even a good change can fail in production.

So before merging I ask:

  • is the change isolated?
  • is it behind a flag?
  • can it be reverted cleanly?
  • does it change data shape or persistence behavior?
  • does it create migration risk?
  • does it depend on coordinated deployment?

AI makes it easy to create larger patches than necessary.
Rollback thinking forces the patch back into a safer shape.
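The "behind a flag" check can be as simple as routing the new path through one environment-driven switch, so rollback is a config change rather than a revert. A sketch with a hypothetical flag name:

```typescript
// Read a feature flag from environment-style config,
// e.g. FEATURE_NEW_RESIZER=true.
function isEnabled(
  flag: string,
  env: Record<string, string | undefined>
): boolean {
  return env[`FEATURE_${flag.toUpperCase()}`] === "true";
}

// The new code path is selected in exactly one place,
// so turning it off does not require touching the PR's logic.
function chooseResizer(
  env: Record<string, string | undefined>
): "new" | "legacy" {
  return isEnabled("new_resizer", env) ? "new" : "legacy";
}
```

The flag defaults to off, which is the safer failure mode: forgetting to set it leaves production on the proven path.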

My quick merge rubric

If I need a fast decision, I use this simple rubric.

I merge when:

  • the scope is narrow
  • the intent is obvious
  • the boundaries are typed
  • runtime input is validated
  • tests prove the claimed behavior
  • failure paths are handled
  • naming is clear
  • security assumptions are visible
  • rollback is simple

I request changes when:

  • the PR does more than requested
  • the code adds unnecessary abstractions
  • tests are shallow
  • boundary validation is missing
  • names are vague
  • duplicate logic appears
  • debugging would be painful
  • the operational or security story is unclear

Final thought

AI can absolutely make teams faster.

But speed only matters if the output is reviewable, understandable, and safe to ship.

That is why I do not ask:

โ€œDid AI write this?โ€

I ask:

"Would I still approve this if I had to own it in production?"

That single question has improved my reviews more than any tool setting.

If you are using AI heavily in your TypeScript workflow, build a checklist like this one.

It does not have to be identical.

It just has to be consistent.

Because the real productivity gain is not generated code.

It is generated code that survives review without creating future pain.
