DEV Community

Nova Elvaris

The Canary Test: Run AI-Generated Code in a Sandbox Before It Touches Your Repo

You wouldn't deploy to production without staging. So why do most developers let AI-generated code land directly in their working branch?

I started running every AI code suggestion through a canary test — a quick, isolated validation step — and it's saved me from shipping broken logic more times than I can count.

The Problem

AI coding assistants are fast. Too fast, sometimes. They'll generate 200 lines that look correct, pass a quick eyeball review, and then blow up at runtime because of an edge case the model didn't consider.

The temptation is to paste the output, run the tests, and fix what breaks. But by then, you've already polluted your git history and maybe introduced subtle bugs that don't trigger test failures.

The Canary Test Workflow

Here's the workflow I follow for any AI-generated change larger than 10 lines:

Step 1: Isolate

```shell
# Create a throwaway branch
git checkout -b canary/ai-$(date +%s)
```

Step 2: Apply + Validate

Paste the AI output. Then run this checklist:

- [ ] Does it compile/parse without errors?
- [ ] Do existing tests still pass?
- [ ] Does the new code have at least one test?
- [ ] Are there any new dependencies I didn't ask for?
- [ ] Does `git diff --stat` match what I expected to change?
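That checklist is easy to script. Here's a minimal POSIX-shell sketch; the `check` helper is my own, and the commands in the example wiring are placeholders for whatever your project really uses:

```shell
# canary-check.sh: the Step 2 checklist as a script. A minimal sketch,
# not a standard tool; swap in your project's real commands.
failures=0

check() {
  # Run one checklist item and report PASS/FAIL instead of stopping at
  # the first failure, so you see the whole picture at once.
  desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $desc"
  else
    echo "FAIL: $desc"
    failures=$((failures + 1))
  fi
}

# Example wiring (placeholders, adjust per project):
# check "compiles"            make build
# check "existing tests pass" make test
# check "diff looks right"    sh -c 'git diff --stat | grep -q src/'
```

Keeping each item as a separate `check` call means one failing item doesn't hide the others.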

Step 3: Diff Audit

This is the key step most people skip:

```shell
git diff main --stat
git diff main -- src/
```

If the diff touches files you didn't mention in your prompt, that's a red flag. AI assistants love to "helpfully" refactor adjacent code.
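You can turn that red flag into an automatic one. Below is a sketch (the `audit_paths` helper is mine, not a git feature) that reads the output of `git diff --name-only` and flags any path outside the prefix you named in your prompt:

```shell
# audit-paths.sh: flag changed files outside the directory you expected
# the AI to touch. Function name and prefix are my examples.
audit_paths() {
  allowed="$1"   # expected prefix, e.g. "src/feature/"
  flagged=0
  while IFS= read -r path; do
    case "$path" in
      "$allowed"*) ;;                        # inside the expected area
      *) echo "RED FLAG: $path"; flagged=1 ;;
    esac
  done
  return "$flagged"   # nonzero if anything unexpected changed
}

# Usage: pipe the changed-file list through the audit:
#   git diff main --name-only | audit_paths "src/feature/"
```

A nonzero exit makes this drop straight into CI or a pre-merge hook if you want to enforce it.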

Step 4: Merge or Discard

If the canary passes, squash-merge it into your working branch. If it doesn't, run `git checkout main && git branch -D <canary-branch>` and walk away. Zero cost.
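If you want the merge path as a single command, the whole promote step fits in one small function. This is a sketch; the function name and commit message are my own conventions, not a git feature:

```shell
# promote-canary.sh: squash-merge a passing canary branch into a target
# branch, then delete the canary so no clutter survives.
promote_canary() {
  target="$1"   # the branch you actually work on
  canary="$2"   # the throwaway branch from Step 1
  git checkout "$target" &&
  git merge --squash "$canary" &&
  git commit -m "Apply AI change (canary-validated: $canary)" &&
  git branch -D "$canary"   # history stays clean either way
}

# Usage: promote_canary main canary/ai-1700000000
```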

Why This Works

The mental shift is small but powerful: treat AI output as untrusted input. You wouldn't run a random script from the internet without reading it. AI-generated code deserves the same scrutiny.

The canary branch gives you a clean rollback point. No stashed changes, no half-applied patches, no "wait, which version was working?"

A Prompt That Helps

I also front-load this expectation in my prompts:

```
Generate the implementation for [feature].
Include at least one test.
Do NOT modify any files outside of src/feature/.
List every file you changed at the end.
```

The "list every file" instruction makes the diff audit trivial: if the AI's list doesn't match `git diff --stat`, something went wrong.
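That comparison is scriptable too. A sketch, assuming you've saved the AI's list to a file with one path per line (the `verify_file_list` helper is mine, not part of git):

```shell
# verify-list.sh: compare the AI's self-reported file list with what git
# says actually changed. Inputs are plain text, one path per line.
verify_file_list() {
  claimed="$1"   # the AI's "files I changed" list
  actual="$2"    # output of: git diff main --name-only
  sort "$claimed" > /tmp/claimed.$$
  sort "$actual"  > /tmp/actual.$$
  if cmp -s /tmp/claimed.$$ /tmp/actual.$$; then
    echo "MATCH: the AI's list agrees with git"
  else
    echo "MISMATCH between claimed and actual changes:"
    diff /tmp/claimed.$$ /tmp/actual.$$ || :   # diff's nonzero exit is expected
  fi
  rm -f /tmp/claimed.$$ /tmp/actual.$$
}

# Usage:
#   git diff main --name-only > actual.txt
#   verify_file_list ai_list.txt actual.txt
```

Sorting both lists first means the check doesn't care what order the AI reported the files in.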

The Rule

If an AI change touches more than 10 lines, it gets a canary branch. No exceptions. The 30 seconds it takes to create the branch has saved me hours of debugging.

Start small: try it on your next AI-generated PR. You'll be surprised how often the canary catches something.
