
Junkyu Jeon

Posted on • Originally published at bivecode.com

How to Code Review AI-Generated Code: What Needs Human Eyes vs. What Doesn't.

You shipped a feature with Cursor or Lovable. It works. Then a teammate asks you to walk them through the error handling. You open the file. You trail off. If that moment feels familiar, the problem isn't your code — it's that standard code review instincts weren't built for AI-generated output. AI code is syntactically correct but semantically fragile: it compiles, TypeScript is happy, the happy path works — but the assumptions underneath were never surfaced. This post gives you a focused mental model: the 5 patterns that require a human judgment call, and the AI self-review prompts that handle the mechanical checks so you don't have to.

Why AI Code Review Is Different

Traditional code review assumes a human wrote the code — someone who made decisions with intent, who understood the domain, who chose an approach and can explain it. You're reviewing for logical errors, style consistency, and the occasional "did you think about X?"

AI-generated code is different in a specific way: it is syntactically correct but semantically fragile. The code compiles. TypeScript is happy. The function returns the right shape. But the logic inside makes assumptions the AI never surfaced — assumptions about data that will always be present, about error cases that will never happen, about external dependencies that will always behave.

The AI wasn't being careless. It was pattern-matching from training data. When you asked it to "fetch the user's profile and display it," it generated the happy path — a user exists, the fetch succeeds, the data is complete. It didn't hallucinate a bug. It just didn't think past the thing you asked for.

This means AI code review is less about catching logic errors and more about surfacing hidden assumptions. Your job is not to find what's wrong — it's to find what's assumed.

5 Patterns That Require Human Eyes

These are the categories where AI-generated code consistently carries risk. For each one: what it looks like, why it's dangerous, and how to fix it.

1. Hidden State

AI-generated code often carries state in places you don't expect — module-level variables, closure-captured values, or shared mutable objects that survive across calls. The code looks like a stateless function. It isn't.

// AI-generated: looks like a pure utility function
let cache: Record<string, User> = {};

export function getUser(id: string): User | undefined {
  if (cache[id]) return cache[id];
  // ... fetch and populate cache
  return cache[id];
}

// The problem: cache persists across requests in a serverless/edge context.
// Two users can end up seeing each other's data.

Why it's dangerous: In serverless and edge runtimes (Cloudflare Workers, Vercel Edge), module-level state is not always reset between requests. The AI generated this pattern because it's sensible in a traditional server with one process per instance — but it becomes a data leak in a shared environment.

// Fixed: state lives in the request, not the module
export async function getUser(
  id: string,
  requestCache: Map<string, User>
): Promise<User | undefined> {
  if (requestCache.has(id)) return requestCache.get(id);
  const user = await fetchUserFromDb(id);
  if (user) requestCache.set(id, user);
  return user;
}

What to look for: Any let or const declared at the top level of a module that is mutated inside a function. Ask: "What happens if this function is called twice concurrently?"
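
If the concurrency question is hard to picture, here is a minimal sketch — hypothetical names, not the example above — of how a cache keyed by id still leaks: the cached value was fetched under the first caller's credentials, and the cache-hit path never checks the next caller's.

// A minimal sketch (hypothetical names): the cache is keyed by id, but the
// cached value was fetched with the first caller's credentials, and the
// cache-hit branch never consults authorization again.
type Profile = { id: string; email: string };

// Stand-in for an authorized DB or API call; assumed, not part of the example above.
declare function fetchWithAuth(id: string, authToken: string): Promise<Profile>;

let profileCache: Record<string, Profile> = {};

export async function getProfile(id: string, authToken: string): Promise<Profile> {
  if (profileCache[id]) return profileCache[id]; // cache hit skips authorization entirely
  const profile = await fetchWithAuth(id, authToken);
  profileCache[id] = profile;
  return profile;
}

// On a warm instance:
//   1) Alice requests her own profile — fetched with her token, then cached.
//   2) Bob requests Alice's id — the cache answers before his token is ever checked.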

2. Hardcoded Values

AI loves to hardcode. URLs, timeouts, limits, role names, status strings — whatever was in the prompt or in the training data for similar code. These values look fine in the generated code because they match your current reality. They become time bombs as your app evolves.

// AI-generated: values burned directly into the logic
export async function getUserPosts(userId: string) {
  const posts = await supabase
    .from("posts")
    .select("*")
    .eq("user_id", userId)
    .eq("status", "published")     // hardcoded status
    .limit(10)                      // hardcoded limit
    .order("created_at", { ascending: false });

  return {
    posts: posts.data,
    hasMore: posts.data!.length === 10, // hardcoded comparison
  };
}

Why it's dangerous: When you later change the limit to 20, the hasMore logic silently breaks — it's still comparing against the old constant. Two months from now, a teammate bumps the limit in the query and never notices the comparison down in the return statement.

// Fixed: named constants, single source of truth
const POST_STATUS = { PUBLISHED: "published", DRAFT: "draft" } as const;
const POSTS_PER_PAGE = 10;

export async function getUserPosts(userId: string) {
  const posts = await supabase
    .from("posts")
    .select("*")
    .eq("user_id", userId)
    .eq("status", POST_STATUS.PUBLISHED)
    .limit(POSTS_PER_PAGE)
    .order("created_at", { ascending: false });

  return {
    posts: posts.data ?? [],
    hasMore: (posts.data?.length ?? 0) === POSTS_PER_PAGE,
  };
}

What to look for: Any number, string literal, or URL that appears inside business logic. If the value would need to change when your requirements change, it should be a named constant or config value.
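
One way to apply this rule — a sketch with illustrative names and an assumed environment variable, not code from the examples above — is to gather the values that encode business rules into a single config module, so a requirement change touches exactly one place:

// config/posts.ts — illustrative sketch; the env variable name is an assumption.
// Every value that encodes a business rule lives here, in one place.
export const POSTS_CONFIG = {
  perPage: Number(process.env.POSTS_PER_PAGE ?? 10),
  publishedStatus: "published",
} as const;

// Callers import the constant instead of repeating the literal:
//   .limit(POSTS_CONFIG.perPage)
//   hasMore: rows.length === POSTS_CONFIG.perPage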

3. Silent Error Swallowing

This is the most insidious pattern. AI-generated code frequently wraps operations in try/catch blocks and then does nothing useful in the catch — returns null, returns an empty array, logs to console, or just moves on. The feature appears to work. In reality, it's silently dropping failures.

// AI-generated: errors vanish into the void
export async function sendWelcomeEmail(userId: string) {
  try {
    const user = await getUser(userId);
    await resend.emails.send({
      from: "welcome@yourapp.com",
      to: user!.email,
      subject: "Welcome!",
      html: renderWelcomeEmail(user!),
    });
  } catch (error) {
    console.error("Failed to send welcome email:", error);
    // silently returns undefined — caller never knows it failed
  }
}

Why it's dangerous: Your onboarding flow calls sendWelcomeEmail. The email provider is down for 10 minutes. Zero users get a welcome email. Your code shows no errors. Your monitoring shows no alerts. You find out a week later when a user says they never got the email.

// Fixed: errors propagate or are explicitly handled with intent
export async function sendWelcomeEmail(userId: string): Promise<void> {
  const user = await getUser(userId);
  if (!user) throw new Error(`User ${userId} not found`);

  await resend.emails.send({
    from: "welcome@yourapp.com",
    to: user.email,
    subject: "Welcome!",
    html: renderWelcomeEmail(user),
  });
}

// In the caller:
try {
  await sendWelcomeEmail(newUser.id);
} catch (error) {
  logger.error("Welcome email failed", { userId: newUser.id, error });
  // deliberate decision: continue onboarding, retry async
}

What to look for: Any catch block that does not re-throw, does not return a meaningful error state to the caller, or only logs to console. Ask: "If this catch fires in production, will I know?"
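
When throwing doesn't fit a call site, one alternative — a sketch, not the article's code — is to return an explicit result so the failure is part of the function's signature and the caller has to acknowledge it:

// Sketch: an explicit result type makes "did it fail?" visible in the signature.
type SendResult =
  | { ok: true }
  | { ok: false; error: Error };

export async function trySendWelcomeEmail(userId: string): Promise<SendResult> {
  try {
    await sendWelcomeEmail(userId); // the throwing version shown above
    return { ok: true };
  } catch (error) {
    return {
      ok: false,
      error: error instanceof Error ? error : new Error(String(error)),
    };
  }
}

// The caller can't ignore the failure without it being visible in the code:
//   const result = await trySendWelcomeEmail(newUser.id);
//   if (!result.ok) logger.error("Welcome email failed", { error: result.error });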

4. Assumed External Dependencies

AI builds against the happy path of every dependency. It assumes your database is reachable, your API keys are valid, your third-party service returns the documented shape. It does not model what happens when any of those assumptions are wrong.

// AI-generated: assumes Supabase always returns what we expect
export async function getDashboardData(userId: string) {
  const { data: profile } = await supabase
    .from("profiles")
    .select("*")
    .eq("id", userId)
    .single();

  const { data: subscription } = await supabase
    .from("subscriptions")
    .select("*")
    .eq("user_id", userId)
    .single();

  return {
    name: profile.full_name,        // crashes if profile is null
    plan: subscription.plan_name,   // crashes if no subscription row
    renewsAt: subscription.renews_at,
  };
}

Why it's dangerous: New users don't have a subscription row yet. Profiles can be deleted. The Supabase RLS policy might filter the row out. Any of these causes a crash at the property access, with a runtime error that gives no context about why the data was missing.

// Fixed: each dependency failure has a deliberate response
export async function getDashboardData(userId: string) {
  const [profileResult, subscriptionResult] = await Promise.all([
    supabase.from("profiles").select("*").eq("id", userId).single(),
    supabase.from("subscriptions").select("*").eq("user_id", userId).maybeSingle(),
  ]);

  if (profileResult.error || !profileResult.data) {
    throw new Error(`Profile not found for user ${userId}`);
  }

  return {
    name: profileResult.data.full_name,
    plan: subscriptionResult.data?.plan_name ?? "free",
    renewsAt: subscriptionResult.data?.renews_at ?? null,
  };
}

What to look for: Any property access on a value returned from a network call, database query, or external API without a null check. TypeScript with strict mode will catch some of these, but AI often adds non-null assertions (!) to silence the compiler rather than handling the case.
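
The difference is easy to spot once you know to look for it. Here is a sketch with a generic fetch-result shape, not code from the examples above:

// Sketch: the same property access, first silenced with "!", then actually handled.
type FetchResult<T> = { data: T | null; error: Error | null };

// AI-style: compiles cleanly, but throws at runtime whenever data is null.
function readNameUnsafe(result: FetchResult<{ full_name: string }>): string {
  return result.data!.full_name;
}

// Reviewed: the missing-data case is an explicit decision, not a silenced warning.
function readName(result: FetchResult<{ full_name: string }>): string {
  if (result.error || !result.data) return "Unknown user"; // or throw with context
  return result.data.full_name;
}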

5. Untestable Structure

AI-generated code is often structurally untestable — not because the logic is complex, but because the logic, the I/O, and the side effects are all tangled together in a single function with no seams for a test to grab onto.

// AI-generated: logic, database, and email fused together
export async function processNewOrder(orderId: string) {
  const order = await supabase
    .from("orders")
    .select("*, items(*), user(*)")
    .eq("id", orderId)
    .single();

  const total = order.data!.items.reduce(
    (sum, item) => sum + item.price * item.quantity, 0
  );
  const discountedTotal = total * (order.data!.user.is_pro ? 0.9 : 1);

  await supabase
    .from("orders")
    .update({ total: discountedTotal, status: "confirmed" })
    .eq("id", orderId);

  await resend.emails.send({ /* ... */ });
}

Why it's dangerous: You cannot test the discount logic without hitting a real database and triggering a real email. Change the discount rate and you have no way to verify the math is right without a full end-to-end run. This is how bugs hide.

// Fixed: pure business logic separated from I/O
export function calculateOrderTotal(
  items: OrderItem[],
  isPro: boolean
): number {
  const subtotal = items.reduce(
    (sum, item) => sum + item.price * item.quantity, 0
  );
  return subtotal * (isPro ? 0.9 : 1);
}

// Now trivially testable:
// expect(calculateOrderTotal([{ price: 100, quantity: 2 }], true)).toBe(180);

export async function processNewOrder(orderId: string) {
  const { data: order } = await supabase
    .from("orders")
    .select("*, items(*), user(*)")
    .eq("id", orderId)
    .single();

  if (!order) throw new Error(`Order ${orderId} not found`);

  const total = calculateOrderTotal(order.items, order.user.is_pro);

  await supabase
    .from("orders")
    .update({ total, status: "confirmed" })
    .eq("id", orderId);

  await resend.emails.send({ /* ... */ });
}

What to look for: Functions that mix data fetching, business logic, and side effects without the logic being extractable. Ask: "Can I test the core decision-making in this function without mocking a database?" If the answer is no, extract the logic.
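
Once the logic is extracted, the test really is a few lines. A sketch using Vitest — the test runner and the import path are assumptions, and any runner works the same way:

// calculateOrderTotal.test.ts — Vitest is an assumed choice; "./orders" is an illustrative path.
import { describe, expect, it } from "vitest";
import { calculateOrderTotal } from "./orders";

describe("calculateOrderTotal", () => {
  it("applies the 10% pro discount", () => {
    expect(calculateOrderTotal([{ price: 100, quantity: 2 }], true)).toBe(180);
  });

  it("charges full price for non-pro users", () => {
    expect(calculateOrderTotal([{ price: 100, quantity: 2 }], false)).toBe(200);
  });

  it("returns 0 for an empty order", () => {
    expect(calculateOrderTotal([], false)).toBe(0);
  });
});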

What You Can Delegate Back to AI

Not everything needs a human eye. Some checks are mechanical — consistent, rule-based, and well-suited for AI. Use these prompts to let AI do the legwork before you do the judgment calls.

Surface the assumptions:

List every assumption this code makes — about input values,
external services, data shapes, and environment state.
For each assumption, rate how likely it is to be violated in production.

Find failure inputs:

Give me 5 specific inputs or conditions that would cause this function
to throw an error, return wrong data, or silently do nothing.
Show what the actual output would be for each.

Scan for hardcoded values:

Find every hardcoded string, number, or URL in this code that
represents a business rule or configuration value.
For each one, explain what would break if it changed.

Check error propagation:

Trace every possible error path in this code.
For each catch block or error handler, tell me:
does the caller know the operation failed?
If not, what data or behavior is silently lost?

Assess testability:

Which parts of this code contain business logic that cannot be
tested without hitting a real database or external API?
Suggest how to extract that logic into pure functions.

Run these before your human review. The AI will surface most of the mechanical issues; your human review focuses on the patterns that require judgment.

The Handoff Checkpoint: Can You Explain This?

Before you hand AI-generated code to a teammate or push it to production, apply one final test. For each significant function or module:

  • What does this do when everything goes right? If you can't answer fluently, you don't own it yet.
  • What does this do when the database is slow? When the user is not logged in? When the external API returns a 429?
  • What would I tell a teammate who had to debug this at 2am? If your answer is "good luck," it's not ready.

If you can't explain it, you have two options. Either spend 30 minutes with the AI asking "explain what this does, step by step, and list everything it assumes" until you can explain it yourself. Or flag it explicitly when you hand it off: "this works but I haven't fully reviewed the error handling — don't merge this until someone does."

Both are honest. Saying nothing and hoping for the best is not.

Closing

AI-generated code is not dangerous because it's wrong. It's dangerous because it's confidently incomplete. The syntax is clean, the types check out, and the happy path works. The problems live in the assumptions — the things that were never said in the prompt and never surfaced in the output.

The five patterns — hidden state, hardcoded values, silent errors, assumed dependencies, and untestable structure — show up in almost every meaningful AI-generated codebase. Knowing where to look turns a daunting review into a focused one.

Build fast. Review with intention. The two are not in conflict.


Originally published at bivecode.com
