I used to paste a whole file into an AI chat and hope it magically spotted the bug. It rarely did. The moment I started feeding minimal reproducible examples (MREs) instead, AI debugging went from “kinda helpful” to “scary effective” — and I stopped losing hours to vague guesses.
TL;DR
- I debug with AI by sending a minimal reproducible example (MRE), not my entire codebase.
- I use a simple checklist to include the right context: expected vs actual, environment, and a single failing test.
- I ask AI for a hypothesis list + fastest experiments, then I run them one-by-one.
- I keep the MRE around as a regression test so the bug doesn’t sneak back.
Context (Why This Matters)
When you’re vibe coding, it’s easy to move fast and accidentally ship tiny bugs that waste a ton of time. The trap I fell into: I’d ask AI “why doesn’t this work?” and dump a huge chunk of code.
AI would respond with ten possible causes… and I’d try random fixes until something “worked.” That’s not debugging. That’s gambling.
What actually helped me: treating AI like a very smart teammate who still needs a clean reproduction. In this post, I’ll show the exact MRE format I use, how I generate one quickly, and how I turn AI’s answers into reliable experiments.
1) Start with a failing test (your MRE anchor)
My rule: if I can’t express the bug as a failing test, I’m not ready to ask AI for help.
A test forces me to define expected behavior vs actual behavior in a way that’s unambiguous. It also gives AI something concrete to reason about.
Here’s a tiny example: a function that’s supposed to group events by day. The bug is timezone-related and only shows up near midnight.
// groupByDay.test.ts
import { describe, it, expect } from "vitest";
import { groupByDay } from "./groupByDay";
describe("groupByDay", () => {
  it("groups events by local day (not UTC)", () => {
    // This timestamp is Jan 1st 23:30 in -05:00 (local), but Jan 2nd in UTC.
    const events = [
      { id: "a", createdAt: "2026-01-01T23:30:00-05:00" },
      { id: "b", createdAt: "2026-01-01T10:00:00-05:00" }
    ];
    const grouped = groupByDay(events);
    // Expected: both are Jan 1st in local time
    expect(Object.keys(grouped)).toEqual(["2026-01-01"]);
    // groupByDay sorts each day by createdAt, so "b" (10:00) comes before "a" (23:30)
    expect(grouped["2026-01-01"].map(e => e.id)).toEqual(["b", "a"]);
  });
});
What this does:
- Makes the bug reproducible.
- Encodes the intended behavior.
- Gives me a “done” condition.
Common pitfalls:
- Writing a test that doesn’t actually fail.
- Not pinning timezones/offsets (dates are a bug factory).
- Overbuilding the test setup instead of isolating the smallest case.
Next, I’ll create the smallest implementation that still fails.
2) Shrink the code as long as the bug still happens
The best MRE is boring. It’s a single file, a single function, and a single failing test.
I literally delete everything that isn’t required to reproduce the issue. If the bug disappears, I undo the last deletion and try a different cut.
Here’s the intentionally buggy implementation that a lot of us write without thinking (I’ve done this more than once):
// groupByDay.ts
export type Event = { id: string; createdAt: string };
export function groupByDay(events: Event[]): Record<string, Event[]> {
  const result: Record<string, Event[]> = {};
  for (const event of events) {
    // BUG: toISOString() converts to UTC, which can shift the day.
    const dayKey = new Date(event.createdAt).toISOString().slice(0, 10);
    result[dayKey] ??= [];
    result[dayKey].push(event);
  }
  // Sorting to make output stable for tests
  for (const key of Object.keys(result)) {
    result[key].sort((a, b) => a.createdAt.localeCompare(b.createdAt));
  }
  return result;
}
What’s happening:
- toISOString() is always UTC.
- Your “day” becomes the UTC day, not the local/offset day encoded in the string.
Common pitfalls:
- Thinking new Date(isoString) keeps the original offset semantics. It doesn’t; it’s an absolute instant.
- Using ISO strings for “calendar dates” when you really want “local dates.”
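To make the "absolute instant" point concrete, here is a small sketch that parses the timestamp from the failing test and compares the day written in the string with the day toISOString() reports. The variable names are mine, chosen only for illustration.

```typescript
// The same timestamp used in the failing test.
const rawTimestamp = "2026-01-01T23:30:00-05:00";

// Date stores a single instant (milliseconds since the epoch). The -05:00
// offset is used to compute that instant and is then discarded.
const parsedInstant = new Date(rawTimestamp);

// The day as written in the input string.
const writtenDay = rawTimestamp.slice(0, 10);

// The day toISOString() reports, which is always expressed in UTC.
const utcDay = parsedInstant.toISOString().slice(0, 10);

// The same instant written with a Z suffix parses to an identical Date value.
const sameInstantInUtc = new Date("2026-01-02T04:30:00Z");
const instantsMatch = parsedInstant.getTime() === sameInstantInUtc.getTime();

console.log({ writtenDay, utcDay, instantsMatch });
```

The two day keys differ (2026-01-01 as written, 2026-01-02 in UTC) even though both strings describe the same instant, which is exactly why the buggy version splits the events.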
Now that I have a clean MRE, I can ask AI a very specific question.
3) The prompt template I use (and why it works)
When I send an MRE to AI, I don’t ask “what’s wrong?” I ask for structured output.
This is my template. I copy/paste it and fill in the blanks:
You are helping me debug a failing test.
Goal: Make the test pass WITHOUT changing the test expectations.
Environment:
- Runtime: Node 20
- Test runner: Vitest
- Language: TypeScript
Expected behavior:
- groupByDay groups by the local day represented in the input string offset.
Actual behavior:
- The function splits events into different days.
Here is the failing test and implementation (minimal repro):
[PASTE TEST]
[PASTE IMPLEMENTATION]
Please respond with:
1) The most likely root cause in 1-2 sentences
2) Two alternative fixes with tradeoffs
3) A patch (full updated function code)
4) One extra test case I should add to prevent regressions
Why this works:
- “Don’t change the test expectations” prevents AI from “fixing” by weakening requirements.
- Asking for alternatives surfaces tradeoffs (critical for date/time bugs).
- Asking for an extra regression test makes the fix stick.
Common pitfalls:
- Not specifying runtime/timezone assumptions.
- Letting the AI rewrite your whole module instead of the smallest patch.
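If you fill in this template often, a tiny helper can assemble it for you. This is only a sketch: DebugPromptInput, buildDebugPrompt, and the field names are hypothetical, and the fixed lines simply mirror the template above.

```typescript
// Hypothetical shape for the blanks in the template; field names are illustrative.
type DebugPromptInput = {
  runtime: string;
  testRunner: string;
  language: string;
  expected: string;
  actual: string;
  testCode: string;
  implementationCode: string;
};

// Fills in the template and returns one string ready to paste into a chat.
function buildDebugPrompt(input: DebugPromptInput): string {
  return [
    "You are helping me debug a failing test.",
    "Goal: Make the test pass WITHOUT changing the test expectations.",
    "Environment:",
    `- Runtime: ${input.runtime}`,
    `- Test runner: ${input.testRunner}`,
    `- Language: ${input.language}`,
    "Expected behavior:",
    `- ${input.expected}`,
    "Actual behavior:",
    `- ${input.actual}`,
    "Here is the failing test and implementation (minimal repro):",
    input.testCode,
    input.implementationCode,
    "Please respond with:",
    "1) The most likely root cause in 1-2 sentences",
    "2) Two alternative fixes with tradeoffs",
    "3) A patch (full updated function code)",
    "4) One extra test case I should add to prevent regressions",
  ].join("\n");
}

// Example usage with placeholder code snippets.
const examplePrompt = buildDebugPrompt({
  runtime: "Node 20",
  testRunner: "Vitest",
  language: "TypeScript",
  expected: "groupByDay groups by the local day represented in the input string offset.",
  actual: "The function splits events into different days.",
  testCode: "// (paste the failing test here)",
  implementationCode: "// (paste the implementation here)",
});
console.log(examplePrompt);
```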
Next, I’ll show a clean fix you can apply immediately.
4) Apply the fix: format the date in the intended timezone
In this specific case, the requirement is: “group by the day represented by the input string’s offset.”
One pragmatic solution: avoid toISOString() and instead extract the date portion directly from the input when it’s a full ISO string with offset.
This works if your inputs are consistently formatted (and that’s usually a fair constraint in app code).
// groupByDay.ts
export type Event = { id: string; createdAt: string };
function extractOffsetDate(isoWithOffset: string): string {
  // Works for inputs like: 2026-01-01T23:30:00-05:00
  // or: 2026-01-01T23:30:00Z
  // If your input can vary, validate before slicing.
  return isoWithOffset.slice(0, 10);
}
export function groupByDay(events: Event[]): Record<string, Event[]> {
  const result: Record<string, Event[]> = {};
  for (const event of events) {
    // FIX: preserve the date as expressed in the original string
    const dayKey = extractOffsetDate(event.createdAt);
    result[dayKey] ??= [];
    result[dayKey].push(event);
  }
  for (const key of Object.keys(result)) {
    result[key].sort((a, b) => a.createdAt.localeCompare(b.createdAt));
  }
  return result;
}
What this does:
- Groups by “calendar date” embedded in the event string.
- Avoids timezone conversion entirely.
Tradeoffs:
- If createdAt isn’t guaranteed to be ISO-like, you need validation.
- If you truly need grouping by the user’s local timezone, you need a different approach (and probably a timezone-aware library).
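For the validation tradeoff, one possible shape is a stricter extractor that rejects anything that doesn't look like a full timestamp with an offset. This is a sketch under my own assumptions: extractOffsetDateStrict and ISO_WITH_OFFSET are hypothetical names, and the regex checks only the shape of the string, not whether the date is real (month 13 would pass).

```typescript
// Matches YYYY-MM-DDTHH:MM:SS, optional fractional seconds, then Z or a +/-HH:MM offset.
// Shape check only: it does not verify that the date or time values are in range.
const ISO_WITH_OFFSET =
  /^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$/;

// Returns the date portion as written, or throws if the input has another shape.
function extractOffsetDateStrict(isoWithOffset: string): string {
  const datePart = ISO_WITH_OFFSET.exec(isoWithOffset)?.[1];
  if (datePart === undefined) {
    throw new Error(
      `Expected an ISO 8601 timestamp with an offset, got: ${isoWithOffset}`,
    );
  }
  return datePart;
}

console.log(extractOffsetDateStrict("2026-01-01T23:30:00-05:00"));
```

Throwing on bad input is a design choice; returning null and letting the caller skip or log the event would work as well, depending on how much you trust your data.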
Common pitfalls:
- Mixing semantics: “date in the event’s offset” vs “date in the user’s timezone.” Pick one.
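If you pick the other semantics (the date in a specific viewer's timezone), the string slice no longer applies. Here is a minimal sketch using the built-in Intl.DateTimeFormat instead of a library; dayKeyInZone is a hypothetical name, and the zone names are just examples.

```typescript
// Returns the YYYY-MM-DD calendar date of an instant as seen in an IANA time zone.
function dayKeyInZone(isoTimestamp: string, timeZone: string): string {
  const instant = new Date(isoTimestamp);
  if (Number.isNaN(instant.getTime())) {
    throw new Error(`Unparseable timestamp: ${isoTimestamp}`);
  }
  // formatToParts avoids depending on how a locale orders year, month, and day.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const pick = (type: string): string =>
    parts.find((part) => part.type === type)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// The same instant lands on different calendar days depending on the zone.
console.log(dayKeyInZone("2026-01-01T23:30:00-05:00", "America/New_York"));
console.log(dayKeyInZone("2026-01-01T23:30:00-05:00", "Asia/Tokyo"));
```

With this version the offset in the input string only determines the instant; the day key comes entirely from the zone you pass in, so a test for it should pin the zone explicitly.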
Next, I’ll add one more test to lock the behavior down.
5) Add a regression test + ask AI for edge cases
Once the test passes, I always add one more case that would have broken my original logic. This is where AI is genuinely useful: it’s great at brainstorming edge cases.
Here’s an extra test for the Z case (UTC) and a boundary time:
// groupByDay.test.ts
import { describe, it, expect } from "vitest";
import { groupByDay } from "./groupByDay";
describe("groupByDay", () => {
  it("groups events by local day (not UTC)", () => {
    const events = [
      { id: "a", createdAt: "2026-01-01T23:30:00-05:00" },
      { id: "b", createdAt: "2026-01-01T10:00:00-05:00" }
    ];
    const grouped = groupByDay(events);
    expect(Object.keys(grouped)).toEqual(["2026-01-01"]);
    // groupByDay sorts each day by createdAt, so "b" (10:00) comes before "a" (23:30)
    expect(grouped["2026-01-01"].map(e => e.id)).toEqual(["b", "a"]);
  });
  it("handles UTC timestamps without shifting the date key", () => {
    const events = [
      { id: "c", createdAt: "2026-01-02T00:00:00Z" },
      { id: "d", createdAt: "2026-01-02T23:59:59Z" }
    ];
    const grouped = groupByDay(events);
    expect(Object.keys(grouped)).toEqual(["2026-01-02"]);
    expect(grouped["2026-01-02"].map(e => e.id)).toEqual(["c", "d"]);
  });
});
What this does:
- Prevents me from “fixing” the -05:00 case but breaking UTC.
- Codifies behavior for the most common timestamp formats.
Common pitfalls:
- Adding too many edge case tests at once and making failures hard to interpret.
- Not keeping test data realistic (use ISO strings you actually store).
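One way to choose which edge cases are worth a test: keep only the inputs where the original UTC-based logic and the string-based fix disagree. The sketch below does that filtering; the helper names and candidate timestamps are illustrative, not taken from a real dataset.

```typescript
// Day key as written in the input string (the approach used in the fix).
const writtenDayKey = (iso: string): string => iso.slice(0, 10);

// Day key after converting to UTC (the approach used in the buggy version).
const utcDayKey = (iso: string): string =>
  new Date(iso).toISOString().slice(0, 10);

// Candidate inputs; all values are illustrative.
const candidateTimestamps = [
  "2026-01-01T23:30:00-05:00", // late evening, negative offset
  "2026-01-02T00:15:00+09:00", // just after midnight, positive offset
  "2026-01-02T00:00:00Z", // midnight UTC
  "2026-03-08T12:00:00-04:00", // midday, far from any day boundary
  "2025-12-31T20:00:00-08:00", // year boundary
];

// Inputs where the two approaches disagree make the most useful regression cases.
const shiftedByUtc = candidateTimestamps.filter(
  (iso) => writtenDayKey(iso) !== utcDayKey(iso),
);

console.log(shiftedByUtc);
```

Here the negative-offset evening, the positive-offset early morning, and the year-boundary timestamps are the ones that shift; the Z and midday cases do not, so they add less as regression tests for this particular bug.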
Now let’s talk about the actual outcome I get from doing MRE-first AI debugging.
Results / Outcome
When I switched to MRE-first prompts, my AI debugging got way faster and more reliable.
Before: I’d paste a big file, get vague suggestions, and try 5–10 random edits. That could burn 30–90 minutes, especially with date/time issues.
After: I usually get to a passing test in 10–20 minutes because:
- The AI has a precise reproduction.
- I’m running controlled experiments instead of guessing.
- I keep the test as a permanent guardrail.
It also made my code reviews better. Even when AI’s first fix isn’t perfect, it gives me a clean shortlist of hypotheses to validate.
Key Takeaways
- Minimal reproducible examples are the difference between “AI guessing” and “AI debugging.”
- Always include: expected vs actual, environment, and a failing test.
- Ask AI for two alternative fixes + tradeoffs, not a single “answer.”
- Turn the final fix into a regression test so the bug doesn’t return.
- If the bug is about time, assume timezone semantics are the real problem until proven otherwise.
Closing CTA
If you’re using AI to debug, what do you usually paste in: the whole file, or a minimal repro with a failing test?
Drop your current workflow in the comments and I’ll suggest a tighter MRE prompt for it. If there’s interest, I’ll write a follow-up on turning AI’s “maybe it’s this” answers into a repeatable hypothesis checklist.