Let me describe a scene you've lived.
A bug report lands. "The completed tasks are still showing up in the active list." You look at the code, spot something that looks wrong, change it, refresh the browser — looks fixed. You close the ticket.
Three weeks later: same bug. Different user. Same ticket reopened.
What happened? You fixed the symptom. The actual cause was still there. And you have no test to tell you if it comes back.
This is one of the most common bug-fix patterns in software. It's also one of the most expensive.
This is Part 3 of the copilot-workflow series. Part 1 set up the template. Part 2 covered pre-merge review. This one covers @test-engineer — the QA engineer persona that forces you to prove bugs exist before fixing them.
The Crime Scene Analogy
Think of a bug like a crime scene.
A bad detective shows up, looks around, decides they know what happened, and closes the case. No evidence. No proof. Just intuition.
A good detective does something different. They document the evidence first. They establish what actually happened — with proof — before drawing conclusions. Then they verify the conclusion against the evidence.
@test-engineer makes you be the good detective.
The rule: write a test that proves the bug exists before you write a single line of fix code.
That test is your evidence. If it doesn't fail before your fix, you didn't actually reproduce the bug; you guessed. If it passes after your fix, you have proof the reported behavior is gone, and a test that will fail loudly if it ever comes back.
The Prove-It Pattern: Step by Step
Here's exactly what happens when I get a bug report.
Step 1: Describe the bug to @test-engineer
@test-engineer I have a bug report: completed tasks still appear
in the active task list. The "completedAt" timestamp is being set
correctly, but the list query is returning them anyway.
Write a failing test that proves this bug exists before I fix it.
Step 2: @test-engineer writes a reproduction test
// This test MUST fail with the current broken code
it('excludes completed tasks from the active task list', async () => {
  // Arrange: create one active and one completed task
  const activeTask = await createTask({ title: 'Buy groceries' });
  const completedTask = await createTask({ title: 'Walk dog' });
  await completeTask(completedTask.id);
  // Act: fetch the active list
  const activeTasks = await listActiveTasks();
  // Assert: completed task must not appear
  expect(activeTasks).toHaveLength(1);
  expect(activeTasks[0].id).toBe(activeTask.id);
  expect(activeTasks.map(t => t.id)).not.toContain(completedTask.id);
});
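The test above leans on three helpers the post never shows: createTask, completeTask, and listActiveTasks. Here is a minimal in-memory sketch of them, assuming a Task shape with id, title, createdAt, and completedAt. The real versions presumably talk to a database, so treat every name and field here as hypothetical. resetTasks is an added helper: toHaveLength(1) only holds if each test starts from an empty store.

```typescript
// Hypothetical in-memory stand-ins for the helpers used by the reproduction test.
// The real project presumably backs these with a database.
interface Task {
  id: number;
  title: string;
  createdAt: Date;
  completedAt: Date | null;
}

let tasks: Task[] = [];
let nextId = 1;

// Clears all state so each test starts from an empty store.
function resetTasks(): void {
  tasks = [];
  nextId = 1;
}

async function createTask(input: { title: string }): Promise<Task> {
  const task: Task = {
    id: nextId++,
    title: input.title,
    createdAt: new Date(),
    completedAt: null,
  };
  tasks.push(task);
  return task;
}

async function completeTask(id: number): Promise<void> {
  const task = tasks.find((t) => t.id === id);
  if (!task) throw new Error(`Task ${id} not found`);
  task.completedAt = new Date();
}

// Deliberately mirrors the reported bug: no filter on completedAt.
async function listActiveTasks(): Promise<Task[]> {
  return [...tasks];
}
```

Calling resetTasks from a beforeEach hook would keep one test's leftovers from changing another test's counts.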
Step 3: Run it. Confirm it fails.
npm test -- -t "excludes completed tasks"
FAIL src/tasks.test.ts
✕ excludes completed tasks from the active task list
Expected length: 1
Received length: 2 ← Bug confirmed. Completed task IS in the list.
The test fails. The bug is real. Now you have evidence.
Step 4: Fix the actual cause
// Before: missing filter, returns ALL tasks
async function listActiveTasks(): Promise<Task[]> {
  return db.tasks.findMany({ orderBy: { createdAt: 'desc' } });
}
// After: filter for tasks where completedAt is null
async function listActiveTasks(): Promise<Task[]> {
  return db.tasks.findMany({
    where: { completedAt: null },
    orderBy: { createdAt: 'desc' }
  });
}
Step 5: Run the test again. Confirm it passes.
npm test -- -t "excludes completed tasks"
PASS src/tasks.test.ts
✓ excludes completed tasks from the active task list (23ms)
Step 6: Run the full suite. No regressions.
npm test
All green. Ship it.
Why This Order Matters
You might be thinking: why not just write the fix first, then write a test to verify it?
Because then you're testing your fix, not the bug.
If you write the fix first, you unconsciously write a test that confirms your fix works. You're not proving the bug existed; you're proving the new code does what you just wrote it to do. Those are completely different things.
The Prove-It Pattern forces a different discipline:
The test failing tells you the bug is real. Not "I think this is a bug." Not "a user reported this." Actually real, actually reproducible, actually failing right now.
The test passing tells you the fix is correct. Not "I changed something that looks related." Actually correct — the specific behavior that was broken is now working.
The test existing tells you the bug can't silently return. The next time someone changes the query, the test will catch it immediately. As long as the test stays in the suite, the bug is guarded.
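The red-then-green order can be checked mechanically. The sketch below is not part of the template. It runs one reproduction check against a buggy and a fixed implementation, and accepts the pair only if the check fails first and passes second. All names (isRedThenGreen, excludesCompleted, listAll, listActive) are hypothetical, and the task list is a plain array rather than a database.

```typescript
interface Task {
  id: number;
  completedAt: Date | null;
}

type ListFn = (tasks: Task[]) => Task[];

// Before: returns every task, including completed ones (the bug).
const listAll: ListFn = (tasks) => [...tasks];

// After: keeps only tasks that have no completion timestamp.
const listActive: ListFn = (tasks) => tasks.filter((t) => t.completedAt === null);

// The reproduction check: true when completed tasks are correctly excluded.
function excludesCompleted(list: ListFn): boolean {
  const fixture: Task[] = [
    { id: 1, completedAt: null },
    { id: 2, completedAt: new Date() },
  ];
  const ids = list(fixture).map((t) => t.id);
  return ids.length === 1 && ids[0] === 1;
}

// A check counts as evidence only if it fails on the old code
// and passes on the new code.
function isRedThenGreen(
  check: (list: ListFn) => boolean,
  before: ListFn,
  after: ListFn,
): boolean {
  return !check(before) && check(after);
}

console.log(isRedThenGreen(excludesCompleted, listAll, listActive)); // true
// A check that never failed proves nothing about the bug:
console.log(isRedThenGreen(excludesCompleted, listActive, listActive)); // false
```

The second call is the fix-first trap in miniature: the check passes, but it was never seen failing, so it says nothing about whether the bug was reproduced.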
Using @test-engineer for New Features (TDD)
The Prove-It Pattern is for bugs. But @test-engineer also guides you through Test-Driven Development for new features.
The idea is the same, applied to behavior that doesn't exist yet: write a failing test that describes what you want the code to do, then make the code pass it.
@test-engineer I'm building a task priority system. Tasks can be low,
medium, or high priority. The active task list should return tasks
sorted by priority (high first) then by creation date.
Write tests for this feature before I implement it.
@test-engineer produces:
describe('listActiveTasks with priority sorting', () => {
  it('returns high priority tasks first', async () => {
    const low = await createTask({ title: 'Low', priority: 'low' });
    const high = await createTask({ title: 'High', priority: 'high' });
    const medium = await createTask({ title: 'Medium', priority: 'medium' });
    const tasks = await listActiveTasks();
    expect(tasks[0].priority).toBe('high');
    expect(tasks[1].priority).toBe('medium');
    expect(tasks[2].priority).toBe('low');
  });
  it('sorts by creation date within the same priority', async () => {
    const first = await createTask({ title: 'First', priority: 'high' });
    const second = await createTask({ title: 'Second', priority: 'high' });
    const tasks = await listActiveTasks();
    const highPriorityTasks = tasks.filter(t => t.priority === 'high');
    expect(highPriorityTasks[0].id).toBe(second.id); // newer first
    expect(highPriorityTasks[1].id).toBe(first.id);
  });
  it('excludes completed tasks from the sorted list', async () => {
    const completed = await createTask({ title: 'Done', priority: 'high' });
    await completeTask(completed.id);
    const active = await createTask({ title: 'Active', priority: 'low' });
    const tasks = await listActiveTasks();
    expect(tasks.map(t => t.id)).not.toContain(completed.id);
  });
});
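The tests fail: the feature doesn't exist yet. Strictly, only the first one depends on the new sort order. The other two guard behavior the earlier fix already provides (completed tasks filtered out, newest first), so they may start passing as soon as createTask accepts a priority. Now you implement listActiveTasks to make them all pass. The tests become the specification.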
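One way to satisfy those tests is a comparator that ranks priorities first and falls back to creation date. This is a sketch over an in-memory array, assuming a Task shape with priority, createdAt, and completedAt. The PRIORITY_RANK map and the sortActiveTasks name are illustrative, and a database-backed listActiveTasks would more likely push the ordering into the query.

```typescript
type Priority = 'low' | 'medium' | 'high';

interface Task {
  id: number;
  priority: Priority;
  createdAt: Date;
  completedAt: Date | null;
}

// Lower rank sorts earlier, so high priority comes first.
const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

// Hypothetical helper: filters out completed tasks, then orders the rest.
// filter() returns a new array, so the caller's array is not mutated by sort().
function sortActiveTasks(tasks: Task[]): Task[] {
  return tasks
    .filter((t) => t.completedAt === null)
    .sort((a, b) => {
      const byPriority = PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority];
      if (byPriority !== 0) return byPriority;
      // Same priority: newer tasks first, matching the second test.
      return b.createdAt.getTime() - a.createdAt.getTime();
    });
}
```

One caveat about the second test: it creates two tasks back to back. If both end up with the same createdAt value, the comparator sees a tie and the "newer first" assertion can fail intermittently. Explicit timestamps in the test, or a tie-breaker such as id, may be needed.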
The Test Pyramid: Where Each Test Lives
@test-engineer knows the right test level for each scenario:
         ╱╲
        ╱  ╲            E2E Tests (5%)
       ╱    ╲           Real browser, full user flow
      ╱──────╲
     ╱        ╲         Integration Tests (15%)
    ╱          ╲        Real database, real API
   ╱────────────╲
  ╱              ╲      Unit Tests (80%)
 ╱                ╲     Pure logic, no I/O, milliseconds each
╱──────────────────╲
Unit test territory: Pure functions, validation logic, data transformations. No database, no network. Runs in milliseconds. The Prove-It Pattern for logic bugs lives here.
Integration test territory: API endpoints, database queries, the interaction between your code and your database. The listActiveTasks example above is an integration test — it hits a real database.
E2E test territory: Critical user paths that must work end-to-end. "User can log in and create a task" is an E2E test. You don't write these for every bug — only for flows so important they justify the maintenance cost.
When you tell @test-engineer what you're testing, it automatically picks the right level.
What Makes a Good Test Name
@test-engineer enforces descriptive test names as a specification:
// Bad: tells you nothing when it fails
it('works correctly', () => { ... });
it('handles the case', () => { ... });
it('test 3', () => { ... });
// Good: reads like a requirement
it('excludes completed tasks from the active task list', () => { ... });
it('sorts high priority tasks before low priority tasks', () => { ... });
it('throws NotFoundError when task ID does not exist', () => { ... });
When a test fails, you need to understand what broke from the test name alone — without reading the implementation. Good names make CI failures self-explanatory.
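The third good name implies behavior the post doesn't show. Here is a sketch of what that test might be guarding, with NotFoundError, getTask, and storedTasks as hypothetical names over an in-memory array:

```typescript
// Hypothetical error type and lookup; neither appears in the original code.
class NotFoundError extends Error {
  readonly taskId: number;

  constructor(taskId: number) {
    super(`Task ${taskId} does not exist`);
    this.name = 'NotFoundError';
    this.taskId = taskId;
  }
}

interface Task {
  id: number;
  title: string;
}

const storedTasks: Task[] = [{ id: 1, title: 'Buy groceries' }];

// Returns the task with the given ID, or throws NotFoundError if there is none.
function getTask(id: number): Task {
  const task = storedTasks.find((t) => t.id === id);
  if (task === undefined) throw new NotFoundError(id);
  return task;
}

// In a Jest-style suite, the named test could then assert:
//   expect(() => getTask(999)).toThrow(NotFoundError);
```

The test name states the trigger (a task ID that does not exist) and the outcome (NotFoundError), so a CI failure points at the broken contract without anyone opening the test body.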
The One Rule That Changes Your Debugging Forever
"A bug fix without a reproduction test is not a bug fix. It's a guess."
Before @test-engineer, I fixed bugs by intuition — change the thing that looks wrong, verify it works, move on. Sometimes I was right. Sometimes I was fixing a symptom while the cause festered.
Now: every bug gets a reproduction test first. Every fix gets verified by that test. Every test stays in the suite permanently.
The cumulative effect: each bug I fix makes the next bug harder to introduce. The test suite gets smarter with every incident. The codebase gets more resilient over time, not less.
Get the Template
@test-engineer is part of the copilot-workflow template — one setup, every repo.
👉 github.com/panditAbhis/copilot-workflow
Next in the series: Part 4 — @security-auditor and threat modeling. How to think like an attacker before an attacker does.