
The AI Code Review Checklist I Use Before Merging Any TypeScript PR

AI can write code quickly.

That is no longer the interesting part.

The interesting part is what happens after the code is generated.

Because in real projects, the bottleneck is rarely "how do we get code faster?"
It is usually:

  • Is this correct?
  • Does it match the architecture?
  • Does it handle edge cases?
  • Is it safe to merge?
  • Can someone else maintain it next month?

That is why I stopped treating AI output like finished code.

Now I treat it like a draft that needs a reliable review system.

This is the checklist I use before I merge any AI-generated TypeScript PR.

Not a theoretical checklist.
A practical one.

1. Can I explain the change in one sentence?

Before I look at the code, I force myself to summarize the PR in one sentence.

For example:

"This change validates uploaded image metadata before saving records to the database."

If I cannot describe the change clearly, one of two things is probably true:

  • the PR is doing too much
  • the code is hiding the real intent

AI often produces code that looks organized while actually mixing multiple concerns.

If the purpose is fuzzy, I stop there and split the change.

2. Does the code match the requested scope?

AI loves to be helpful.

Sometimes too helpful.

It will often:

  • rename unrelated variables
  • refactor nearby code
  • introduce "small improvements"
  • create utility functions nobody asked for
  • solve adjacent problems

That makes review harder.

So one of my first checks is simple:

Did the model change only what needed to change?

If a PR was supposed to fix one validation bug but now touches eight files, I get suspicious immediately.

A good AI-generated PR is usually narrower than you think.

3. Are the boundaries explicit?

For TypeScript projects, this is one of the biggest signals of quality.

I look for clear boundaries:

  • input types
  • output types
  • domain models
  • DTOs
  • API contracts
  • validation layers

AI-generated code is much easier to trust when the edges are visible.

Bad sign:

```typescript
async function saveUser(data: any) {
  // ...
}
```

Better:

```typescript
interface CreateUserInput {
  email: string;
  displayName: string;
}

interface CreateUserResult {
  id: string;
  email: string;
  displayName: string;
}

async function saveUser(data: CreateUserInput): Promise<CreateUserResult> {
  // ...
}
```

If the AI patch adds logic without tightening the contract, I usually improve the boundary before I approve the implementation.

4. Did the AI generate code, or did it generate a new abstraction?

This matters a lot.

Sometimes AI gives you useful implementation.
Sometimes it gives you a brand new architecture you did not ask for.

Watch for:

  • extra layers
  • generic helpers
  • โ€œreusableโ€ wrappers
  • configuration systems
  • class hierarchies for simple logic
  • abstraction before repetition actually exists

A useful question here is:

Would I still create this abstraction if a human teammate had not suggested it?

If the answer is no, I remove it.

AI-generated code often becomes bloated not because it is broken, but because it is too eager to generalize.
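As a sketch of the pattern: the generic version below is the kind of abstraction a model might volunteer for a single use case, while the direct version says the same thing without new machinery (all names here are hypothetical, for illustration):

```typescript
// Over-eager version: a generic "pipeline" invented for one call site.
type Step<T> = (input: T) => T;

function runPipeline<T>(input: T, steps: Step<T>[]): T {
  return steps.reduce((acc, step) => step(acc), input);
}

const normalizeEmailPipeline: Step<string>[] = [
  (e) => e.trim(),
  (e) => e.toLowerCase(),
];

// Direct version: same behavior, no new abstraction to maintain.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}
```

If a second or third genuinely different pipeline shows up later, the abstraction earns its place; until then, the direct function is easier to review and debug.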

5. Is there a test for the thing that actually changed?

I do not just ask, "Are there tests?"

I ask:

Is there a test for the exact behavior this PR claims to fix or add?

That means:

  • one test for the happy path
  • one test for the expected failure mode
  • one test for the edge case that is easiest to miss

If the PR fixes a parsing bug, I want a parsing test.
If it changes authorization logic, I want an authorization test.
If it adds fallback behavior, I want a test that proves the fallback works.

AI often writes tests that mirror the implementation too closely.

So I look for tests that verify behavior, not just structure.

6. Does the code fail safely?

A surprising amount of AI-generated code handles success better than failure.

So I explicitly review:

  • missing values
  • invalid input
  • timeouts
  • third-party failures
  • null and undefined cases
  • partial success scenarios
  • retries
  • logging
  • user-facing error messages

I ask myself:

  • What happens if this dependency is down?
  • What happens if the payload shape changes?
  • What happens if this field is missing?
  • What happens if the operation succeeds halfway?

If the answer is "the app probably throws something weird," the PR is not ready.
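One shape that answers several of these questions at once is a wrapper that races a slow dependency against a timeout and returns an explicit fallback instead of hanging or throwing something vague. This is a sketch, not a prescribed pattern; the helper name and fallback values are assumptions:

```typescript
// Resolve with `fallback` if `work` rejects or takes longer than `ms`.
// The deliberate part is that the failure path is chosen, not accidental.
async function withFallbackTimeout<T>(
  work: Promise<T>,
  ms: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } catch {
    // Third-party failure: degrade to the fallback instead of crashing.
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In a real PR you would usually also log the failure before returning the fallback, so the degradation is visible in production.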

7. Are runtime checks present at the system edges?

TypeScript is great, but it does not validate runtime data by itself.

So any AI-generated code that touches:

  • request bodies
  • query parameters
  • local storage
  • database results
  • webhooks
  • environment variables
  • third-party APIs

should make me ask:

Where is the runtime validation?

Types help inside the codebase.
Validation protects the boundary.

For example:

```typescript
import { z } from "zod";

const CreatePostSchema = z.object({
  title: z.string().min(1),
  body: z.string().min(1),
  tags: z.array(z.string()).default([]),
});

type CreatePostInput = z.infer<typeof CreatePostSchema>;

function parseCreatePostInput(input: unknown): CreatePostInput {
  return CreatePostSchema.parse(input);
}
```

If AI writes strongly typed code without validating incoming data, it creates a false sense of safety.

8. Are names better after the change, or worse?

AI can produce valid code with terrible naming.

And bad names are expensive because they survive code review surprisingly often.

So I check:

  • function names
  • variable names
  • type names
  • file names
  • booleans
  • enum values
  • error messages

I want names that reflect the domain, not the implementation trick.

Bad:

```typescript
const dataProcessorManager = createHandler();
```

Better:

```typescript
const invoiceRetryScheduler = createRetryScheduler();
```

When naming gets vague, maintainability drops fast.

9. Does the PR introduce duplicate logic?

AI often rewrites something that already exists somewhere else in the codebase.

That creates:

  • near-duplicate validators
  • inconsistent helpers
  • slightly different parsing functions
  • multiple ways to do the same thing

So I always scan for duplication before approving.

My rule is simple:

  • if the logic already exists, reuse it
  • if the existing abstraction is bad, improve it
  • do not allow AI to create parallel versions of the same idea

Duplicate code is especially dangerous when it looks clean.
It feels harmless at first and becomes expensive later.

10. Would I be comfortable debugging this at 2 AM?

This is one of my favorite review questions.

Because code can be technically correct and still be operationally terrible.

I look for:

  • meaningful logs
  • useful error messages
  • predictable branching
  • obvious control flow
  • easy-to-trace data transformations
  • no "magic" hidden in helpers

If production breaks, will this code help the next person understand what happened?

Or will it force them to reverse-engineer AI-generated cleverness under pressure?

If debugging would be painful, I simplify the code before merge.
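One concrete thing I look for is errors that carry their own context. A sketch, with hypothetical names and fields:

```typescript
// An error that tells the on-call engineer what failed and where,
// without making them re-run the request.
class PaymentSyncError extends Error {
  constructor(
    message: string,
    public readonly context: { orderId: string; provider: string; step: string }
  ) {
    super(
      `${message} (order=${context.orderId}, provider=${context.provider}, step=${context.step})`
    );
    this.name = "PaymentSyncError";
  }
}
```

At 2 AM, "PaymentSyncError: charge declined (order=ord_123, provider=stripe, step=capture)" beats "Error: something went wrong."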

11. Is the security model still intact?

Any AI-generated PR that touches:

  • auth
  • permissions
  • tokens
  • cookies
  • headers
  • uploads
  • database access
  • redirects
  • HTML rendering
  • shell commands

gets a slower review.

I specifically check:

  • authorization, not just authentication
  • input handling
  • secret leakage
  • unsafe defaults
  • overly broad permissions
  • accidental exposure of internal fields
  • client/server boundary mistakes

I do not trust "looks secure."

I want the security assumptions to be obvious in the code.
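For the "authorization, not just authentication" point, a minimal sketch of what an obvious assumption looks like in code (the types and ownership rule are hypothetical):

```typescript
interface User {
  id: string;
  role: "admin" | "member";
}

interface Doc {
  id: string;
  ownerId: string;
}

// Being logged in (authenticated) is not enough;
// the permission rule is stated explicitly, in one place.
function canDeleteDoc(user: User, doc: Doc): boolean {
  return user.role === "admin" || user.id === doc.ownerId;
}
```

When the rule is a named function like this, a reviewer can see the security assumption directly instead of inferring it from scattered if-statements.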

12. Can I roll this back easily?

This final check is underrated.

Even a good change can fail in production.

So before merging I ask:

  • is the change isolated?
  • is it behind a flag?
  • can it be reverted cleanly?
  • does it change data shape or persistence behavior?
  • does it create migration risk?
  • does it depend on coordinated deployment?

AI makes it easy to create larger patches than necessary.
Rollback thinking forces the patch back into a safer shape.
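The "behind a flag" check can be as simple as routing the new path through one environment-driven switch, so rollback is a config change rather than a revert. A sketch with a hypothetical flag name:

```typescript
// Read a feature flag from environment-style config,
// e.g. FEATURE_NEW_RESIZER=true.
function isEnabled(
  flag: string,
  env: Record<string, string | undefined>
): boolean {
  return env[`FEATURE_${flag.toUpperCase()}`] === "true";
}

// The new code path is selected in exactly one place,
// so turning it off does not require touching the PR's logic.
function chooseResizer(
  env: Record<string, string | undefined>
): "new" | "legacy" {
  return isEnabled("new_resizer", env) ? "new" : "legacy";
}
```

The flag defaults to off, which is the safer failure mode: forgetting to set it leaves production on the proven path.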

My quick merge rubric

If I need a fast decision, I use this simple rubric.

I merge when:

  • the scope is narrow
  • the intent is obvious
  • the boundaries are typed
  • runtime input is validated
  • tests prove the claimed behavior
  • failure paths are handled
  • naming is clear
  • security assumptions are visible
  • rollback is simple

I request changes when:

  • the PR does more than requested
  • the code adds unnecessary abstractions
  • tests are shallow
  • boundary validation is missing
  • names are vague
  • duplicate logic appears
  • debugging would be painful
  • the operational or security story is unclear

Final thought

AI can absolutely make teams faster.

But speed only matters if the output is reviewable, understandable, and safe to ship.

That is why I do not ask:

โ€œDid AI write this?โ€

I ask:

"Would I still approve this if I had to own it in production?"

That single question has improved my reviews more than any tool setting.

If you are using AI heavily in your TypeScript workflow, build a checklist like this one.

It does not have to be identical.

It just has to be consistent.

Because the real productivity gain is not generated code.

It is generated code that survives review without creating future pain.
