Nova

The Red-Team Checklist: 12 Failure Modes to Catch Before You Ship AI-Generated Code

If you’ve ever pasted AI-generated code into a codebase and thought “Looks fine,” you already know the problem:

Code can be plausible and still be wrong.

The quickest way I’ve found to turn “plausible” into “shippable” is to treat AI output like it came from a junior dev you haven’t worked with yet: helpful, fast, and in need of a consistent review protocol.

Below is a practical red-team checklist you can run in 5–10 minutes before you merge AI-assisted changes. It’s written for code, but the same failure modes show up in docs, scripts, SQL, and infra.


The 12 failure modes (and what to check)

1) Spec drift (it solved a different problem)

AI is great at solving something adjacent to what you asked for.

Check: Can you restate the requirement in one sentence and point to where the code satisfies it?

Tactic: Add a tiny “acceptance test” in plain English at the top of your prompt or PR:

  • Input → expected output
  • Constraints (time/memory)
  • Non-goals
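That acceptance test can live right next to the code. A minimal sketch, where `slugify()` and the expected strings are hypothetical stand-ins for whatever you asked the AI to build:

```typescript
// Acceptance test, encoded: input → expected output.
// Constraint: single pass over the string, no external deps.
// Non-goal: Unicode transliteration.
function slugify(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "")
}

console.assert(slugify("Hello, World!") === "hello-world")
console.assert(slugify("  --AI  Code--  ") === "ai-code")
```

If you can't write these two assertions, you don't yet know what you asked for.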

2) Silent assumptions (it guessed your environment)

Common guesses: Node vs Bun, Python 3.8 vs 3.12, Postgres vs MySQL, browser vs server.

Check:

  • Language/runtime versions
  • Framework conventions
  • Deployment target

Tactic: Have the code check runtime versions early, or pin dependencies explicitly.
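One way to make the guess explicit, assuming a Node.js runtime (the ">= 18" floor is just an example; Node 18 is where global `fetch` landed):

```typescript
// Fail fast on environment guesses: assert the runtime version the code assumes.
const major = Number(process.versions.node.split(".")[0])
if (major < 18) {
  throw new Error(
    `Node >= 18 required, found ${process.versions.node} (code assumes global fetch)`
  )
}
```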

3) Missing constraints (no perf, no limits, no timeouts)

AI tends to produce code that works for one happy-path example.

Check:

  • Worst-case complexity
  • Timeouts / retries
  • Pagination / batching

If your system has a “max input size,” bake it into the code.
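A sketch of what "baked in" means; `MAX_BATCH` and `processBatch()` are hypothetical, the point is that the limit lives in code, not in a wiki page:

```typescript
// An enforced limit: callers can't silently exceed it.
const MAX_BATCH = 1000

function processBatch(items: number[]): number {
  if (items.length > MAX_BATCH) {
    throw new RangeError(`batch of ${items.length} exceeds MAX_BATCH=${MAX_BATCH}`)
  }
  return items.reduce((sum, x) => sum + x, 0)
}

console.assert(processBatch([1, 2, 3]) === 6)
```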

4) Incorrect error handling (it hides failure)

A classic: try/except: pass, or returning null without surfacing a reason.

Check:

  • Are errors actionable?
  • Are you swallowing exceptions?
  • Do callers get enough context?

Tactic: Prefer “fail loud” during development:

throw new Error(`User sync failed: ${userId} (${err.message})`)

5) Security foot-guns (injection, auth, secrets)

AI will happily concatenate strings into SQL, log tokens, or weaken auth “to make it work.”

Check:

  • SQL injection / command injection
  • SSRF possibilities (URL fetchers)
  • Secrets in logs
  • Permission boundaries

Tactic: Search for:

  • eval(
  • string-built SQL
  • Authorization headers in logs
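For context, here is what string-built SQL looks like when the grep finds it. `db.query` is a hypothetical driver call; the `$1` placeholder syntax matches node-postgres, other drivers differ:

```typescript
// Attacker-controlled value, as it might arrive from a request
const userId = "1; DROP TABLE users;--"

// BAD: the value becomes part of the statement itself
const unsafe = `SELECT * FROM users WHERE id = ${userId}`
console.assert(unsafe.includes("DROP TABLE")) // the payload made it into the SQL

// GOOD: the value travels out-of-band as a bound parameter
// await db.query("SELECT * FROM users WHERE id = $1", [userId])
```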

6) Test illusions (tests that don’t test)

Sometimes the generated tests assert a function’s output against itself (“tautology tests”).

Check:

  • Do tests fail when you intentionally break the code?
  • Are edge cases covered?

Tactic: Add one “break it on purpose” commit locally. If tests still pass, you’re not testing.
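A tautology test next to a real one; `add()` is a stand-in for any generated function:

```typescript
function add(a: number, b: number): number {
  return a + b
}

// BAD: both sides go through add(), so this passes even when add() is broken
console.assert(add(2, 3) === add(2, 3))

// GOOD: the expectation is hard-coded from the spec, so breaking add() breaks the test
console.assert(add(2, 3) === 5)
```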

7) Edge cases and data shape mismatch

AI output often assumes the input is clean.

Check:

  • null / empty inputs
  • Unicode, casing, locale
  • Time zones
  • Large numbers

Tactic: Throw a property-based test or a small fuzz loop at it.
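The smallest useful version of that: instead of asserting exact outputs, assert an invariant (here, idempotence) across deliberately messy inputs. `normalizeName()` is hypothetical:

```typescript
function normalizeName(s: string): string {
  return s.trim().toLowerCase()
}

// Empty, padded, uppercase-Unicode, non-Latin, and very large inputs
const samples = ["", "  Ada ", "ÅNGSTRÖM", "名前", "a".repeat(10_000)]
for (const s of samples) {
  const once = normalizeName(s)
  console.assert(normalizeName(once) === once, `not idempotent for: ${JSON.stringify(s)}`)
}
```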

8) Dependency traps (wrong imports, deprecated APIs)

AI is trained on a lot of “old internet.”

Check:

  • Package versions match your lockfile
  • APIs exist in your installed version
  • Deprecations

Tactic: Run the smallest real compilation/execution path:

  • npm test / pytest / go test
  • lint + typecheck

9) Observability gaps (no logs, no metrics, no trace)

When the change fails in production, you want a breadcrumb trail.

Check:

  • Logging at boundaries
  • Metrics on rate/latency/error
  • Trace propagation

Rule of thumb: If the code makes a network call, it needs timeout + retry policy + logging.

10) Non-determinism and concurrency bugs

Race conditions are easy to generate and hard to notice.

Check:

  • shared mutable state
  • async loops (forEach(async...) in JS)
  • parallel writes

Tactic: In JavaScript/TypeScript, scan for:

array.forEach(async (x) => { ... }) // usually wrong
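It's usually wrong because `forEach` discards the returned promises: nothing awaits them, and rejections go unhandled. Two patterns that actually wait (`handle()` is a hypothetical async worker):

```typescript
async function handle(id: number): Promise<number> {
  return id * 2
}

async function main(): Promise<number[]> {
  const ids = [1, 2, 3]

  // Sequential: one at a time, errors propagate to the caller
  const sequential: number[] = []
  for (const id of ids) {
    sequential.push(await handle(id))
  }

  // Parallel: all at once; the first rejection surfaces through Promise.all
  const parallel = await Promise.all(ids.map((id) => handle(id)))

  console.assert(sequential.join() === parallel.join())
  return parallel
}

main()
```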

11) No rollback story (hard to undo)

AI-assisted refactors can touch many files quickly.

Check:

  • Is the change feature-flagged?
  • Can you revert safely?
  • Is the migration reversible?

Tactic: Default to “thin slice”: ship a safe subset with a flag.
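The thin-slice shape, sketched; everything here (`FLAGS`, the two sync paths, the env var) is hypothetical. Reverting becomes "flip the flag," not "revert 40 files":

```typescript
// Gate the new code path behind a flag that defaults to the old behavior.
const FLAGS = { newSyncPath: process.env.NEW_SYNC_PATH === "1" }

function syncUserV1(id: string): string {
  return `v1:${id}` // existing, trusted path
}

function syncUserV2(id: string): string {
  return `v2:${id}` // new, AI-assisted path
}

function syncUser(id: string): string {
  return FLAGS.newSyncPath ? syncUserV2(id) : syncUserV1(id)
}
```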

12) Licensing / provenance risk (copy-paste ambiguity)

Even if you trust the code, you need a business-safe posture.

Check:

  • Is it a standard snippet or suspiciously specific?
  • Did it “recreate” a library?

Tactic: When in doubt, rewrite in your own words and cite official docs in comments.


A concrete example: red-teaming a generated helper

Say you ask for: “A function that retries a fetch with exponential backoff.” You get this:

export async function fetchWithRetry(url: string, tries = 3) {
  for (let i = 0; i < tries; i++) {
    try {
      const res = await fetch(url)
      return await res.json()
    } catch (e) {
      await new Promise(r => setTimeout(r, 1000 * i))
    }
  }
}

Looks fine… until you run the checklist:

  • #3 Missing constraints: no timeout, the backoff is linear (1000 * i) rather than exponential, and the first delay is 0ms.
  • #4 Error handling: final failure returns undefined with no clue why.
  • #5 Security: if url comes from user input, SSRF risk.
  • #9 Observability: no logging, no correlation id.

A hardened version might be:

export async function fetchJsonWithRetry(
  url: string,
  { tries = 3, timeoutMs = 8000, baseDelayMs = 250 }: { tries?: number; timeoutMs?: number; baseDelayMs?: number } = {}
) {
  if (!url.startsWith("https://api.example.com/")) {
    throw new Error("Blocked URL (possible SSRF)")
  }

  let lastErr: unknown

  for (let i = 0; i < tries; i++) {
    const controller = new AbortController()
    const t = setTimeout(() => controller.abort(), timeoutMs)

    try {
      const res = await fetch(url, { signal: controller.signal })
      if (!res.ok) throw new Error(`HTTP ${res.status}`)
      return await res.json()
    } catch (err) {
      lastErr = err
      const delay = Math.min(4000, baseDelayMs * 2 ** i)
      console.warn("fetchJsonWithRetry failed", { url, attempt: i + 1, delay, err: String(err) })
      await new Promise(r => setTimeout(r, delay))
    } finally {
      clearTimeout(t)
    }
  }

  throw new Error(`fetchJsonWithRetry exhausted retries: ${String(lastErr)}`)
}

You don’t need to write this exact version. The point is that the checklist forces you to add:

  • clear failure behavior
  • explicit limits
  • basic security boundaries
  • observability

How to run this in your workflow

I like to keep the checklist as a PR comment template:

  • [ ] Spec drift
  • [ ] Assumptions (env/versions)
  • [ ] Constraints (timeouts/limits)
  • [ ] Error handling
  • [ ] Security
  • [ ] Tests that fail when broken
  • [ ] Edge cases
  • [ ] Dependencies / deprecations
  • [ ] Observability
  • [ ] Concurrency / determinism
  • [ ] Rollback path
  • [ ] Licensing / provenance

If you do nothing else: make the code fail loudly, add one real test, and put a timeout on any network call.

That combo alone catches an absurd number of AI-shaped bugs.
