You know the feeling: you ask an assistant for a “quick fix,” it hands you a plausible patch, and then you discover (two commits later) that it broke something subtle.
The fastest way I’ve found to reduce that risk is to flip the order of work.
Instead of asking for the fix first, ask for the failing test first.
I call this failing-test-first prompting.
Why start with a failing test?
A good test forces three things into the open:
- What “correct” means (in executable form)
- Where the behavior lives (which module/function is under test)
- How the system should behave at the edges (inputs, errors, weird cases)
If the assistant can’t write a test that fails for the right reason, it probably doesn’t understand the problem well enough to write a safe fix.
This is also how many of us debug as humans: reproduce, pin down, then patch.
The core prompt pattern
Here’s the minimal version:
```text
Task: Add a failing test that reproduces this bug.
Constraints:
- Do not change production code yet
- Use the existing test framework and style
- Test must fail on main, pass after the fix
Output:
1) The test code (full file or diff)
2) The exact command to run it
3) Why it fails (one paragraph)
Context:
- runtime/framework versions
- relevant production code
- any existing related tests
```
Two lines matter more than they look:
- “Do not change production code yet”: prevents the assistant from “fixing” by accident.
- “Why it fails”: ensures it isn’t just generating a random assertion.
Example: a date parsing bug (JavaScript)
Let’s say you have this helper:
```javascript
// src/date/parse.js
export function parseISODate(input) {
  // Expect YYYY-MM-DD
  const [y, m, d] = input.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}
```
A bug report lands:
- `parseISODate("2026-03-12")` works
- `parseISODate("2026-3-12")` should throw (we require strict formatting), but it currently returns a Date
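Before prompting, it helps to confirm the reported behavior yourself. A quick repro sketch (the helper is inlined here so it runs standalone):

```javascript
// Quick repro sketch: the helper from src/date/parse.js, inlined for brevity.
function parseISODate(input) {
  // Expect YYYY-MM-DD
  const [y, m, d] = input.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

// Strict input parses, as expected:
console.log(parseISODate("2026-03-12").toISOString()); // 2026-03-12T00:00:00.000Z

// Sloppy input also parses -- this is the bug (it should throw):
console.log(parseISODate("2026-3-12").toISOString()); // 2026-03-12T00:00:00.000Z
```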
Failing-test-first prompt:
```text
Task: Add a failing unit test that proves parseISODate rejects non-zero-padded months/days.
Constraints:
- Jest
- No production code changes
- Match existing test naming style
Output: test + command + why it fails.
Context:
// src/date/parse.js
<code above>
```
A good response might produce a test like:
```javascript
// test/date/parse.test.js
import { parseISODate } from "../../src/date/parse";

describe("parseISODate", () => {
  test("rejects non-zero-padded month/day", () => {
    expect(() => parseISODate("2026-3-12")).toThrow();
    expect(() => parseISODate("2026-03-2")).toThrow();
  });
});
```
This test will fail today (because the function happily parses), and it tells you exactly what strictness you’re committing to.
Now you can ask for the fix as step two:
“Now propose the smallest patch that makes the test pass without breaking the existing behavior.”
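For this bug, the smallest patch might look something like the following. This is a sketch, not the only valid fix; the regex guard and the choice of `TypeError` are my assumptions, not the article's:

```javascript
// src/date/parse.js -- one possible minimal fix (sketch):
// validate the strict YYYY-MM-DD shape before parsing.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

function parseISODate(input) {
  if (!ISO_DATE.test(input)) {
    throw new TypeError(`Expected strict YYYY-MM-DD, got: ${input}`);
  }
  const [y, m, d] = input.split("-").map(Number);
  return new Date(Date.UTC(y, m - 1, d));
}

// Strict input still works:
console.log(parseISODate("2026-03-12").toISOString()); // 2026-03-12T00:00:00.000Z

// Non-padded input now throws:
try {
  parseISODate("2026-3-12");
} catch (e) {
  console.log(e.message); // Expected strict YYYY-MM-DD, got: 2026-3-12
}
```

Note the patch touches nothing but the validation path, so the existing happy-path behavior is preserved.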
“But I don’t have tests yet”
You can still use the pattern. Your first output becomes a minimal harness.
Ask for:
- a test runner setup (one file, one command)
- one failing test that reproduces the issue
- and nothing else
That turns “we don’t test” into “we have one test,” which is a real upgrade.
Make the test maximally informative
When a test fails, you want it to fail loudly and clearly.
I like to require:
- one test per behavior (don’t bundle five edge cases into one assertion blob)
- good failure messages (especially in Python/Go where you can add context)
- a single reason to fail (avoid tests that sometimes fail for timing/network reasons)
If the system is async or distributed, you can still use the pattern, but you’ll want to pin the nondeterminism down:
- freeze time
- mock network boundaries
- use in-memory fakes
The prompt can say:
“If mocking is required, prefer the simplest built-in mocking utilities already used in the repo.”
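One way to freeze time without any mocking library is to make the clock an injectable parameter. The `isExpired` helper and its `now` parameter below are hypothetical, just to illustrate the shape:

```javascript
// Sketch: time as an injectable dependency, so tests never need sleeps.
// `isExpired` and its `now` parameter are hypothetical examples.
function isExpired(token, now = Date.now) {
  return token.expiresAt <= now();
}

// Production code uses the real clock; a test passes a frozen one:
const frozenClock = () => 1_700_000_000_000;

console.log(isExpired({ expiresAt: 1_699_999_999_999 }, frozenClock)); // true
console.log(isExpired({ expiresAt: 1_700_000_000_001 }, frozenClock)); // false
```

Because the test controls the clock, it has exactly one reason to fail.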
The 3-step workflow (test → fix → tighten)
In practice, I run this as a loop:
- Failing test (repro)
- Minimal fix (make it pass)
- Tighten (add edge cases, add regression coverage, refactor if needed)
This keeps the assistant from jumping straight to refactors.
If you want to be explicit, request outputs like:
- “Step 1: test diff only”
- “Step 2: production diff only”
- “Step 3: optional improvements (separate diff)”
Common failure modes (and how to prompt around them)
1) The assistant writes a test that passes immediately.
Tell it to prove the failure:
- “Include the current observed behavior and why the assertion should fail.”
- “If the test passes on current code, revise it.”
2) It changes production code ‘just a little.’
Make it a hard rule:
- “Only modify files under test/ in this step.”
3) The test is flaky.
Add constraints:
- “No real network calls.”
- “Do not use sleeps.”
- “Freeze time if timestamps are involved.”
4) It invents helpers that don’t exist.
Give it the existing test utilities (or say “no helpers available”).
The takeaway
If you want safer, faster AI-assisted debugging, don’t start with the patch.
Start with a failing test that captures the bug in a way your CI can enforce forever.
Once you have that, fixes become boring—in the best possible way.