DEV Community: Abhishek Pandit

A Day in the Life: Complete Copilot Workflow Session From Idea to Merged PR

Abhishek Pandit — Fri, 12 Jun 2026 14:22:24 +0000

Everything in this series has been theory. Good theory — I hope — but theory.

This article is different. We're going to walk through a complete, real session from start to merge. Real prompts, real outputs, real code. Every step of the 8-step workflow, nothing skipped.

The feature: add a due date to tasks, surface overdue tasks in the active list.

The starting point: an Express + TypeScript + Prisma backend with an existing task system. The feature doesn't exist yet.

Let's go.

The 8-Step Workflow

1. /spec chatmode → idea → spec → tasks
2. @test-engineer → write failing tests first
3. Implement → make tests pass
4. @simplifier → readability pass
5. /ship chatmode → SHIP verdict
6. CI green
7. Merge
8. Update CONTEXT.md

One session. One feature. One merge.

Step 1: /spec Chatmode — Before a Single Line of Code

I open the spec chatmode in Copilot Chat and describe the feature:

I want to add due dates to tasks. Tasks past their due date should
be flagged as overdue. The active task list should sort overdue
tasks to the top.

@idea-refiner asks five questions:

1. Should tasks without a due date appear in the overdue list? (Yes/No)
2. Is "overdue" calculated server-side or client-side?
3. Should overdue status persist (stored in DB) or be computed on-the-fly?
4. Do you need due date filtering (e.g., "due this week")?
5. Is this sprint backend-only or does the UI need updating too?

My answers: No (tasks without due dates aren't overdue), server-side, computed on-the-fly (no extra column), no filtering this sprint, backend only.

@spec-writer surfaces assumptions then writes the spec:

ASSUMPTIONS:
1. Due date is optional — tasks without one are never overdue
2. Overdue = dueAt < current timestamp (UTC), checked at query time
3. No new database column needed — status is computed in the query
4. Sorting: overdue tasks first, then by creation date within each group
5. No UI changes this sprint — API only
→ Confirm before I proceed.

I confirm. The spec lands:

# Spec: Task Due Dates and Overdue Sorting

## Objective
Allow tasks to have optional due dates. Active task list returns overdue tasks
first (dueAt < NOW()), then remaining tasks by creation date.

## Success Criteria
- [ ] POST /api/tasks accepts optional dueAt (ISO 8601 datetime)
- [ ] GET /api/tasks returns tasks sorted: overdue first, then by createdAt desc
- [ ] PATCH /api/tasks/:id accepts dueAt updates
- [ ] A task with no dueAt is never flagged as overdue
- [ ] All existing task tests continue to pass

## Boundaries
Always: validate dueAt is a valid future date on creation
Ask first: any change to the Task DB schema beyond adding dueAt column
Never: store computed "isOverdue" flag in database — compute at query time
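
The overdue rule in the spec is small enough to pin down as a pure function. Here is a minimal sketch, assuming a hypothetical helper named isOverdue (it is not part of the original codebase) that takes the current time as a parameter so the check is easy to test:

```typescript
// Hypothetical helper, not from the original codebase.
// A task is overdue only when it has a due date strictly earlier than `now`.
function isOverdue(dueAt: Date | null, now: Date): boolean {
  if (dueAt === null) return false; // no due date: never overdue
  return dueAt.getTime() < now.getTime(); // compare absolute instants (epoch ms)
}

const checkedAt = new Date("2026-06-12T12:00:00Z");
const noDueDate = isOverdue(null, checkedAt);
const dueYesterday = isOverdue(new Date("2026-06-11T12:00:00Z"), checkedAt);
const dueExactlyNow = isOverdue(new Date("2026-06-12T12:00:00Z"), checkedAt);
const dueTomorrow = isOverdue(new Date("2026-06-13T12:00:00Z"), checkedAt);
```

Passing the clock in, rather than calling new Date() inside the function, is a design choice for this sketch: it keeps the boundary case (due exactly now) deterministic in tests.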

@planner produces 3 tasks. I approve the plan.

Time spent: 12 minutes. Nothing built yet. Everything clear.

Step 2: @test-engineer — Failing Tests First

Task 1 is the schema + service layer. Before touching any implementation:

@test-engineer Write failing tests for the due date feature.
Spec is approved. Tests must fail with current code.

Requirements:
- POST /api/tasks accepts optional dueAt
- GET /api/tasks returns overdue tasks first, then by createdAt desc
- Tasks without dueAt are never overdue

The agent produces:

describe('Task due dates', () => {
  describe('POST /api/tasks', () => {
    it('accepts a valid ISO 8601 due date', async () => {
      const res = await request(app)
        .post('/api/tasks')
        .send({ title: 'File taxes', dueAt: '2026-12-31T23:59:00Z' })
        .expect(201);

      expect(res.body.dueAt).toBe('2026-12-31T23:59:00.000Z');
    });

    it('rejects a past due date on creation', async () => {
      await request(app)
        .post('/api/tasks')
        .send({ title: 'Old task', dueAt: '2020-01-01T00:00:00Z' })
        .expect(422);
    });

    it('creates task successfully without a due date', async () => {
      const res = await request(app)
        .post('/api/tasks')
        .send({ title: 'No deadline' })
        .expect(201);

      expect(res.body.dueAt).toBeNull();
    });
  });

  describe('GET /api/tasks — overdue sorting', () => {
    it('returns overdue tasks before non-overdue tasks', async () => {
      const overdueTask = await createTask({
        title: 'Overdue',
        dueAt: new Date(Date.now() - 86400000) // yesterday
      });
      const futureTask = await createTask({
        title: 'Future',
        dueAt: new Date(Date.now() + 86400000) // tomorrow
      });
      const noDeadline = await createTask({ title: 'No deadline' });

      const res = await request(app).get('/api/tasks').expect(200);
      const titles = res.body.data.map((t: Task) => t.title);

      expect(titles[0]).toBe('Overdue');
      // Future and No deadline can be in either order after overdue
      expect(titles).toContain('Future');
      expect(titles).toContain('No deadline');
    });

    it('never marks a task without dueAt as overdue', async () => {
      await createTask({ title: 'No deadline' });

      const res = await request(app).get('/api/tasks').expect(200);
      const task = res.body.data.find((t: Task) => t.title === 'No deadline');

      expect(task.isOverdue).toBe(false);
    });

    it('sorts multiple overdue tasks by creation date descending', async () => {
      const older = await createTask({ title: 'Older overdue', dueAt: new Date(Date.now() - 172800000) });
      const newer = await createTask({ title: 'Newer overdue', dueAt: new Date(Date.now() - 86400000) });

      const res = await request(app).get('/api/tasks').expect(200);
      const overdueTitles = res.body.data
        .filter((t: Task) => t.isOverdue)
        .map((t: Task) => t.title);

      expect(overdueTitles[0]).toBe('Newer overdue');
      expect(overdueTitles[1]).toBe('Older overdue');
    });
  });
});

I run the tests:

npm test -- --grep "Task due dates"

FAIL src/routes/tasks.test.ts
  ✕ accepts a valid ISO 8601 due date — 400 response (unexpected field)
  ✕ rejects a past due date on creation — 201 (no validation)
  ✕ returns overdue tasks before non-overdue tasks — column does not exist
  ...

All failing. The spec is confirmed in test form. Now I implement.

Step 3: Implement — Make the Tests Pass

Schema first. I add the optional field to the Prisma model:

model Task {
  id          String    @id @default(cuid())
  title       String
  dueAt       DateTime? // ← added
  createdAt   DateTime  @default(now())
  completedAt DateTime?
}

Then I generate and apply the migration:

npx prisma migrate dev --name add-due-at-to-tasks

Update the Zod schema:

const CreateTaskSchema = z.object({
  title: z.string().min(1).max(200),
  dueAt: z.string().datetime().optional().refine(
    (val) => !val || new Date(val) > new Date(),
    { message: 'Due date must be in the future' }
  ),
});

Update the list query for overdue sorting:

async function listActiveTasks(): Promise<Task[]> {
  const now = new Date();

  const [overdue, upcoming] = await Promise.all([
    // Overdue: completedAt is null AND dueAt is past
    db.task.findMany({
      where: { completedAt: null, dueAt: { lt: now, not: null } },
      orderBy: { createdAt: 'desc' },
    }),
    // Non-overdue: either no due date or future due date
    db.task.findMany({
      where: {
        completedAt: null,
        OR: [{ dueAt: null }, { dueAt: { gte: now } }],
      },
      orderBy: { createdAt: 'desc' },
    }),
  ]);

  return [
    ...overdue.map(t => ({ ...t, isOverdue: true })),
    ...upcoming.map(t => ({ ...t, isOverdue: false })),
  ];
}
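
One way to sanity-check the ordering without a database is an in-memory version of the same rule. This is a sketch, not the code from the session: TaskRow and sortActiveTasks are hypothetical names, and the task shape is trimmed to the fields the sort needs.

```typescript
// Minimal task shape for this sketch; the real Prisma model has more fields.
interface TaskRow {
  title: string;
  dueAt: Date | null;
  createdAt: Date;
  completedAt: Date | null;
}

type ListedTask = TaskRow & { isOverdue: boolean };

// Hypothetical in-memory equivalent of the two-query approach:
// keep active tasks, flag overdue ones, then sort overdue first
// and newest first within each group.
function sortActiveTasks(tasks: TaskRow[], now: Date): ListedTask[] {
  return tasks
    .filter((t) => t.completedAt === null)
    .map((t) => ({
      ...t,
      isOverdue: t.dueAt !== null && t.dueAt.getTime() < now.getTime(),
    }))
    .sort((a, b) => {
      if (a.isOverdue !== b.isOverdue) return a.isOverdue ? -1 : 1;
      return b.createdAt.getTime() - a.createdAt.getTime();
    });
}

const at = (iso: string): Date => new Date(iso);
const sampleNow = at("2026-06-12T12:00:00Z");
const sorted = sortActiveTasks(
  [
    { title: "No deadline", dueAt: null, createdAt: at("2026-06-10T09:00:00Z"), completedAt: null },
    { title: "Older overdue", dueAt: at("2026-06-10T12:00:00Z"), createdAt: at("2026-06-01T09:00:00Z"), completedAt: null },
    { title: "Future", dueAt: at("2026-06-13T12:00:00Z"), createdAt: at("2026-06-11T09:00:00Z"), completedAt: null },
    { title: "Newer overdue", dueAt: at("2026-06-11T12:00:00Z"), createdAt: at("2026-06-02T09:00:00Z"), completedAt: null },
    { title: "Done", dueAt: at("2026-06-01T12:00:00Z"), createdAt: at("2026-05-30T09:00:00Z"), completedAt: at("2026-06-02T09:00:00Z") },
  ],
  sampleNow
);
const sortedTitles = sorted.map((t) => t.title);
```

A function like this can serve as a test oracle for the database query: feed both the same fixtures and compare the order of titles.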

Run the tests:

npm test -- --grep "Task due dates"

PASS src/routes/tasks.test.ts
  ✓ accepts a valid ISO 8601 due date (45ms)
  ✓ rejects a past due date on creation (12ms)
  ✓ creates task successfully without a due date (11ms)
  ✓ returns overdue tasks before non-overdue tasks (67ms)
  ✓ never marks a task without dueAt as overdue (43ms)
  ✓ sorts multiple overdue tasks by creation date descending (58ms)

Full suite:

npm test

All green. No regressions.

Step 4: @simplifier — Readability Pass

@simplifier Do a readability pass on src/services/task.ts
and src/routes/tasks.ts. Tests all pass. Focus on the new
due date and overdue sorting code.

The agent flags one thing:

src/services/task.ts:34: The Promise.all call with two separate queries is correct, but the variable names overdue and upcoming could be more precise. Suggest overdueTasks and nonOverdueTasks to match the domain language in the spec.

One rename. I apply it. Tests still pass.

Step 5: /ship Chatmode — The Verdict

I switch to the ship chatmode and describe the changes:

[In /ship chatmode]
Review the due date changes. New Prisma field, updated Zod
validation, new list query with overdue sorting.
Key concern: the parallel query approach — want to confirm
it's correct and won't have race conditions.

The verdict:

## Pre-Merge Review

### Verdict: SHIP ✅

**Summary:** Due date implementation is clean, well-tested, and
correctly computes overdue status at query time as the spec requires.
No critical issues.

### Must Fix Before Merge
— None —

### Should Fix Before Merge
- [IMPORTANT] src/routes/tasks.ts:67 — The 422 validation error for
  past due dates returns the Zod error object directly. Wrap it in the
  project's standard error shape: `{ error: { code: 'VALIDATION_ERROR',
  message: '...', details: result.error.flatten() } }`

### Optional Improvements
- [SUGGESTION] Consider adding an index on (dueAt, completedAt) if
  this table grows large — the overdue query will do a full scan otherwise.

### Clean ✅
- Correct use of Promise.all — no race condition, both queries read
  independently from the same committed state
- Zod validation correctly rejects past dates at the boundary
- isOverdue computed at query time — no stale data risk
- All edge cases covered by tests

One Important issue. I fix the error shape in 2 minutes. Re-run /ship. Clean verdict. SHIP.
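
For reference, the fix can look roughly like this. It is a sketch under assumptions: toValidationError and the fallback message are hypothetical, and FlattenedError is a hand-written stand-in for the object that Zod's error.flatten() returns, so the block runs without Zod installed.

```typescript
// Stand-in for the result of Zod's `error.flatten()`:
// form-level messages plus per-field messages.
interface FlattenedError {
  formErrors: string[];
  fieldErrors: Record<string, string[] | undefined>;
}

// The standard error envelope quoted in the review.
interface ApiError {
  error: { code: string; message: string; details: FlattenedError };
}

// Picks the first available message: field errors first, then form errors.
function firstMessage(details: FlattenedError): string {
  for (const field of Object.keys(details.fieldErrors)) {
    const first = details.fieldErrors[field]?.[0];
    if (first !== undefined) return first;
  }
  return details.formErrors[0] ?? "Invalid request body";
}

// Hypothetical helper: wraps flattened validation errors in the project's error shape.
function toValidationError(details: FlattenedError): ApiError {
  return {
    error: { code: "VALIDATION_ERROR", message: firstMessage(details), details },
  };
}

const errorBody = toValidationError({
  formErrors: [],
  fieldErrors: { dueAt: ["Due date must be in the future"] },
});
```

In the route handler, the 422 branch would then send something like toValidationError(result.error.flatten()) instead of the raw Zod error.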

Step 6: CI Green

I push the branch. GitHub Actions runs:

✓ Lint & Type Check
✓ Unit & Integration Tests
✓ Build
✓ Security Audit

All green in 3 minutes.

Step 7: Merge

PR opened. CI badge green. /ship verdict documented in the PR description. Merge.

Step 8: Update CONTEXT.md

The last step most developers skip — and the one that pays compound interest.

## Recent Decisions

| Date | Decision | Reason |
|------|----------|--------|
| 2026-06-12 | Compute isOverdue at query time, not stored in DB | Avoids stale data, no background job needed |
| 2026-06-12 | Parallel queries for overdue/non-overdue | Single query with CASE sorting was complex; two clean queries are clearer |

Next session, when I (or a teammate, or an AI agent) opens this project, the context is there. No archaeology required.

The Session at a Glance

| Step | Time | What happened |
|------|------|---------------|
| /spec chatmode | 12 min | Idea → assumptions surfaced → spec → 3 tasks |
| @test-engineer | 8 min | 6 failing tests written |
| Implement | 25 min | Schema, validation, query |
| @simplifier | 3 min | One rename |
| /ship chatmode | 5 min | One Important issue found and fixed |
| CI | 3 min | All green |
| Merge + CONTEXT | 2 min | n/a |
| Total | 58 min | Feature shipped with spec, tests, review |

A working, tested, reviewed feature in under an hour. With a spec that proves we built the right thing, tests that prove it works, a review that confirms it's safe to merge, and CI that will catch any regression.

Get the Template

Everything in this walkthrough — all 17 agents, 3 chatmodes, CI pipeline, MCP config, CONTEXT.md template — is in one place.

👉 github.com/panditAbhis/copilot-workflow

Click Use this template. Five minutes to set up. The discipline is yours to keep.

Series navigation

| Part | Title |
|------|-------|
| 1 | Your Copilot Has No Memory. Here's How I Fixed That in 5 Minutes. |
| 2 | Stop Merging Blind: How I Use @code-reviewer Before Every PR |
| 3 | Never Fix a Bug Without Proof: The @test-engineer Prove-It Pattern |
| 4 | Think Like an Attacker: How I Use @security-auditor Before Every Production Deploy |
| 5 | One Command to Rule Them All: The /ship Chatmode |
| 6 | Stop Building the Wrong Thing: @spec-writer and @planner |
| 7 | A Day in the Life: Complete Session Walkthrough |

Stop Building the Wrong Thing: How I Use @spec-writer and @planner Before Writing a Single Line of Code

Abhishek Pandit — Fri, 12 Jun 2026 14:17:13 +0000

I once spent three days building a notification system.

Real-time WebSocket updates. Notification center with read/unread state. Badge counts. Persistence across sessions. It was clean, well-tested, well-reviewed code.

Then I showed it to the product team.

"Oh — we just meant a simple email when someone assigns a task. We don't need any of that in the app."

Three days. Gone. Not because the code was bad. Because I built the wrong thing.

This is the most expensive bug in software development — and it's never caught by tests, never caught by code review, never caught by a security audit. It's caught by showing someone the finished product and watching their face.

@spec-writer and @planner exist to catch this bug before you write the first function.

This is Part 6 of the copilot-workflow series.

The Architect's Blueprint Analogy

You wouldn't build a house without blueprints.

Not because architects are bureaucratic. Because building walls is expensive and tearing them down is more expensive. A blueprint catches design problems when they're still just pencil lines — before concrete gets poured.

Software is the same. The cost of changing a requirement on paper is five minutes. The cost of changing it in working code is hours. The cost of changing it in deployed production code with real users depending on it is days or weeks.

@spec-writer is your blueprint. @planner is your construction schedule. Neither one writes a line of code. Both prevent you from building the wrong thing.

The /spec Chatmode: Three Phases, One Command

The /spec chatmode orchestrates @idea-refiner, @spec-writer, and @planner in sequence. Each phase gates the next: nothing proceeds without approval.

Here's what happens when you open the spec chatmode and describe what you want to build.

Phase 1: @idea-refiner — Sharpen Before You Specify

Before you write a spec, @idea-refiner helps you figure out what you actually want.

Raw idea: "I want to add notifications."

Refined idea (after the agent asks five sharpening questions):

Who: task assignees, not managers
What: email only, no in-app UI this sprint
Success: assignee receives email within 30 seconds of assignment, 0 missed deliveries
Not doing: unsubscribe management, notification preferences, real-time push

This conversation takes 3 minutes. It just saved you 3 days.

Phase 2: @spec-writer — Surface Every Assumption

With a clear direction, @spec-writer produces the spec. But before writing a single section, it does something critical:

ASSUMPTIONS I'M MAKING:
1. Email is via an external service, not self-hosted SMTP
2. Notifications only fire on task assignment, not on updates or completion
3. Assignee email is always available (users have verified emails)
4. This is a backend-only change — no frontend work this sprint
→ Correct any of these before I continue.

Assumptions are where projects go wrong. Surfacing them explicitly — before writing the spec — gives you a chance to correct course while it costs nothing.

After confirmation, the full spec:

# Spec: Task Assignment Notifications

## Objective
Send an email to the task assignee within 30 seconds of assignment.
Success: zero missed deliveries, email arrives in inbox (not spam).

## Tech Stack
Node.js 22, Express, Prisma (PostgreSQL), Resend for email delivery.

## Commands
Build: npm run build
Test: npm test -- --coverage
Dev: npm run dev

## Project Structure
src/
  services/notification.ts   ← new file
  jobs/send-notification.ts  ← new file (BullMQ worker)
  routes/tasks.ts            ← modified (trigger on assignment)
tests/
  services/notification.test.ts
  jobs/send-notification.test.ts

## Testing Strategy
Unit: notification service (mock Resend client)
Integration: route handler triggers queue (real DB, mock queue)
E2E: not required this sprint

## Boundaries
Always: validate assignee email before enqueue, log all sends
Ask first: adding new email templates, changing delivery provider
Never: synchronous email delivery (always async via queue)

## Success Criteria
- [ ] Email arrives within 30s of task assignment in manual testing
- [ ] Zero missed deliveries in integration tests (100 consecutive)
- [ ] No email sent on task updates — only on initial assignment
- [ ] Failed delivery retried 3x with exponential backoff

## Open Questions
- None — assumptions confirmed above

This is the gate. The spec goes to the human. Nothing proceeds until it's approved.

Phase 3: @planner — Tasks You Can Actually Execute

With an approved spec, @planner produces the implementation order.

The key insight: vertical slices, not horizontal layers.

Most developers build horizontally: all database first, then all API, then all frontend. This means you have no working feature until everything is done. If something is wrong with the design, you discover it at the end.

Vertical slices mean each task delivers a working piece of functionality:

## Task 1: Notification service with Resend integration
What: Create src/services/notification.ts that sends an email via Resend.
Acceptance:
- [ ] sendTaskAssignmentEmail(task, assignee) sends email via Resend API
- [ ] Failed sends throw NotificationError with error details
- [ ] Mock Resend client in tests — no real emails in test suite
Verification: npm test -- --grep "notification service"
Files: src/services/notification.ts, tests/services/notification.test.ts
Size: S

## Task 2: Queue worker for async delivery
What: Create src/jobs/send-notification.ts BullMQ worker.
Acceptance:
- [ ] Worker processes notification jobs from queue
- [ ] Retries 3x with exponential backoff on failure
- [ ] Logs success and failure with structured fields
Verification: npm test -- --grep "notification worker"
Dependencies: Task 1
Files: src/jobs/send-notification.ts, tests/jobs/send-notification.test.ts
Size: S

## Checkpoint: After Tasks 1-2
- [ ] All tests pass
- [ ] Notification sends correctly in isolation
- [ ] Review before proceeding

## Task 3: Wire assignment route to notification queue
What: Modify task assignment endpoint to enqueue notification.
Acceptance:
- [ ] POST /api/tasks/:id/assign enqueues notification job
- [ ] Notification NOT sent on task update — only on assignment
- [ ] Assignment succeeds even if queue is unavailable (fire-and-forget)
Verification: npm test -- --grep "task assignment"
Dependencies: Tasks 1, 2
Files: src/routes/tasks.ts, tests/routes/tasks.test.ts
Size: S

## Checkpoint: Complete
- [ ] All tests pass
- [ ] Email received within 30s in manual test
- [ ] No email on task update (verified manually)
- [ ] CI green — ready for /ship review

Three tasks. Each one completable in a focused session. Each one leaves the system working. No XL tasks, no guesswork, no "figure it out as you go."
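
The "retries 3x with exponential backoff" criterion in Task 2 is worth making concrete. The plan leaves the details to the queue library, so the sketch below only shows the idea in plain TypeScript. backoffDelays, retryWithBackoff, and the 1-second base delay are assumptions for illustration, not part of the spec.

```typescript
// Hypothetical helper: the wait before each retry, doubling from a base delay.
function backoffDelays(retries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(baseMs * 2 ** attempt);
  }
  return delays;
}

// Runs `job` once, then retries up to `retries` times, waiting longer each time.
// `sleep` is injected so tests can skip real waiting.
async function retryWithBackoff<T>(
  job: () => Promise<T>,
  retries: number,
  baseMs: number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is coming.
      if (attempt < retries) await sleep(baseMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Three retries from a 1-second base: wait 1s, then 2s, then 4s.
const retryDelays = backoffDelays(3, 1000);
```

With these numbers the worst case adds 7 seconds of waiting, which still fits inside the 30-second delivery target from the spec.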

The Anti-Pattern This Prevents

Here's what happens without spec-first development:

1. You understand the requirement loosely
2. You start coding based on your interpretation
3. Three days in, you show someone
4. They say "oh, we meant something simpler" (or more complex)
5. You rewrite

The spec doesn't add time to a project. It removes rework. The 15 minutes it takes to write a spec routinely saves 3-8 hours of misguided implementation.

The task breakdown saves a different kind of time: it stops you from building in the wrong order, then discovering a dependency you should have built first.

Using /spec in Practice

For a new feature:

[In /spec chatmode]
I want to add CSV export for the task list.

The chatmode guides you through idea refinement, spec writing, and task breakdown. You approve each phase. Nothing gets implemented until the plan is complete.

For a bug with a complex root cause:

[In /spec chatmode]
I need to redesign the task search — it's timing out on tables over 10k rows.
This needs a proper plan before I touch the index structure.

For a refactor:

[In /spec chatmode]
The auth middleware is 400 lines and doing too many things.
I want to split it into separate concerns but keep behavior identical.

What @doubter Adds

After @planner produces the task list, there's one more agent worth invoking for high-stakes decisions: @doubter.

@doubter is an adversarial reviewer — it finds what's wrong with your plan. Not "is this good?" but "what could go wrong?"

@doubter Here is my implementation plan for task assignment notifications.
ARTIFACT: [paste the task list]
CONTRACT: Zero missed deliveries, email within 30s, no sync blocking of the assignment API

It might surface: "Task 3 says assignment succeeds even if queue is unavailable — but if the queue is unavailable for 30 minutes, those notifications are silently lost. Is that acceptable, or do you need a dead-letter queue?"

That's a decision you want to make in the plan, not discover in production.

The Full Workflow in One Picture

Vague idea
    │
    ▼
@idea-refiner → sharp concept + Not-Doing list
    │ human approves direction
    ▼
@spec-writer → assumptions surfaced + spec written
    │ human approves spec
    ▼
@planner → ordered tasks with acceptance criteria
    │ human approves plan
    ▼
@test-engineer → failing tests (Prove-It)
    │
    ▼
Implement → make tests pass
    │
    ▼
/ship chatmode → SHIP / DO NOT SHIP verdict
    │
    ▼
Merge

Every arrow is a gate. Every gate has a human approval. The code is the last step, not the first.

Get the Template

@spec-writer, @planner, @idea-refiner, @doubter, and the /spec chatmode are all included.

👉 github.com/panditAbhis/copilot-workflow

Next (and final) in the series: Part 7 — A complete session walkthrough. Real feature, all 8 steps, start to merge.


One Command to Rule Them All: The /ship Chatmode That Reviews, Audits, and Cleans Before Every Merge

Abhishek Pandit — Fri, 12 Jun 2026 14:04:11 +0000

Here's a problem I had.

I'd built three specialist agents in Copilot Chat: @code-reviewer, @security-auditor, and @simplifier. Each one was genuinely useful. Each one caught things the others missed.

But using all three before a merge meant:

Invoking @code-reviewer, reading the output, addressing findings
Separately invoking @security-auditor, reading that output, addressing findings
Separately invoking @simplifier, reading that output, addressing findings
Mentally combining three reports into one decision

That's not automation. That's just delegation with extra steps.

What I actually wanted: one command that does all three and tells me to ship or not.

That's what the /ship chatmode is.

This is Part 5 of the copilot-workflow series — and it's the part where the individual pieces become a real automated workflow.

What's a Chatmode?

Before we get into /ship, a quick explanation of chatmodes — because it's a feature most Copilot users don't know about.

Think of Copilot Chat as a radio. By default, it's tuned to a general-purpose frequency. Chatmodes are preset stations — each one configures Copilot with a specific role, tools, and approach before you say anything.

You switch chatmodes the same way you'd switch radio stations: one click in the chatmode selector in VS Code Copilot Chat, then start talking.

The difference: when you open the ship chatmode, Copilot already knows what you're trying to do. It runs the full three-pass review without you having to orchestrate it manually.

The Flight Checklist Analogy

Pilots don't improvise pre-flight checks.

Before every flight, they run through a standardized checklist — the same one, in the same order, every time. Not because pilots are forgetful. Because when something is important enough, you systematize it. You make the right thing the default thing.

Pilots discovered this the hard way. Before checklists became standard, perfectly competent pilots died in perfectly functional aircraft because they forgot one step under pressure.

The /ship chatmode is your pre-merge checklist. Every merge. Same order. No improvising. No forgetting the security pass when you're rushing to meet a deadline.

How /ship Works

When you activate the ship chatmode and describe your changes, three passes run in sequence:

Pass 1: Code Review

@code-reviewer evaluates the changes across five dimensions: correctness, readability, architecture, security, performance.

Every finding is labeled:

Critical — blocks the merge
Important — should fix before merging
Suggestion — optional improvement

Pass 2: Security Audit

@security-auditor starts from trust boundaries, runs STRIDE analysis, and maps exploitable vulnerabilities to OWASP Top 10.

Findings in this pass are security-only and rated Critical, High, Medium, or Low. Every Critical or High finding includes a proof of concept and a specific fix, not vague "consider validating input" advice.

Pass 3: Simplification

@simplifier scans for complexity that can be removed without changing behavior: deeply nested logic, generic names, duplicated code, unnecessary abstractions.

This is the last pass — not because it matters least, but because it only makes sense after you've verified the code is correct and secure.

The Verdict

After all three passes, one consolidated report:

## Pre-Merge Review

### Verdict: DO NOT SHIP ❌

**Summary:** The payment flow changes contain one critical SQL injection
vulnerability and a reset token that is not invalidated after use. The logic
is otherwise clean and well-structured.

### Must Fix Before Merge
- [CRITICAL] SQL injection in payment intent creation — payments.ts:47
  Fix: parameterize the amount field in the INSERT statement
- [CRITICAL] Reset token not invalidated after use — auth.ts:123
  Fix: SET reset_token = NULL in the UPDATE statement

### Should Fix Before Merge
- [IMPORTANT] Missing test for payment failure path — payments.test.ts
  The happy path is covered but no test for insufficient funds

### Optional Improvements
- [SUGGESTION] extractPaymentIntent() could be extracted to a helper — payments.ts:40-65

### Clean ✅
- Correct use of parameterized queries in all existing endpoints
- Rate limiting already applied to auth endpoints

SHIP = no Critical issues. You make the call on Important and Suggestion items — the chatmode doesn't make that judgment for you.
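
The verdict rule itself is simple enough to write down. Here is a minimal sketch, assuming a hypothetical Finding shape. The chatmode reaches this decision in prose, not through a function like this one.

```typescript
type Severity = "CRITICAL" | "IMPORTANT" | "SUGGESTION";

// Hypothetical shape for one line of the consolidated report.
interface Finding {
  severity: Severity;
  location: string;
  summary: string;
}

// Any Critical finding blocks the merge; everything else is the human's call.
function verdict(findings: Finding[]): "SHIP" | "DO NOT SHIP" {
  return findings.some((f) => f.severity === "CRITICAL") ? "DO NOT SHIP" : "SHIP";
}

// Findings taken from the example report above.
const paymentReview: Finding[] = [
  { severity: "CRITICAL", location: "payments.ts:47", summary: "SQL injection in payment intent creation" },
  { severity: "CRITICAL", location: "auth.ts:123", summary: "Reset token not invalidated after use" },
  { severity: "IMPORTANT", location: "payments.test.ts", summary: "Missing test for payment failure path" },
  { severity: "SUGGESTION", location: "payments.ts:40-65", summary: "Possible helper extraction" },
];

const beforeFixes = verdict(paymentReview);
// Once both Critical findings are fixed, only non-blocking items remain.
const afterFixes = verdict(paymentReview.filter((f) => f.severity !== "CRITICAL"));
```

Note that the second call still returns SHIP with an Important finding open, which matches the rule: Important and Suggestion items inform the decision but do not block it.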

Real Scenario: What /ship Catches That You'd Miss

Let me show you three things the ship chatmode caught in a single review on a real feature — a task sharing system.

What I thought I was shipping: A feature that lets users share tasks with teammates. Looked clean. Tests passed. I'd been working on it for two days and was confident in it.

What /ship found:

Code review — correctness: The permission check was on the wrong side of the async boundary. If two users tried to accept the same share invitation simultaneously, both could succeed — creating a race condition that allowed a task to have two owners.

Security audit — broken access control: The GET /api/tasks/:id/share-link endpoint didn't verify the requester owned the task. Any authenticated user could generate a share link for any task by guessing the ID.

Simplification: The permission checking logic was duplicated in three different places. One helper function would have made it easier to maintain — and would have meant fixing the race condition in one place instead of three.

Three different problem types. One pass caught each one. Single command.
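
The simplification finding is the easiest one to picture in code. Here is a rough sketch of pulling the duplicated checks into one place, with hypothetical types and helper names (the real task-sharing code isn't shown in this article). It covers the ownership check behind the share-link bug. The race condition is a separate problem and would still need a transactional fix.

```typescript
// Hypothetical shapes; the real models are not shown in the article.
interface AppUser {
  id: string;
}

interface SharedTask {
  id: string;
  ownerId: string;
  sharedWithIds: string[];
}

// Single source of truth for ownership.
function isOwner(user: AppUser, task: SharedTask): boolean {
  return task.ownerId === user.id;
}

// Only the owner may create a share link for a task.
function canCreateShareLink(user: AppUser, task: SharedTask): boolean {
  return isOwner(user, task);
}

// The owner and anyone the task was shared with may view it.
function canViewTask(user: AppUser, task: SharedTask): boolean {
  return isOwner(user, task) || task.sharedWithIds.indexOf(user.id) !== -1;
}

const taskOwner: AppUser = { id: "u1" };
const teammate: AppUser = { id: "u2" };
const stranger: AppUser = { id: "u3" };
const sharedTask: SharedTask = { id: "t1", ownerId: "u1", sharedWithIds: ["u2"] };
```

With helpers like these, the share-link route would reject any requester for whom canCreateShareLink returns false, and every other route would reuse the same functions instead of repeating the check inline.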

Setting Up /ship: Zero Configuration

The chatmode is already in the template. There's nothing to configure.

1. Go to github.com/panditAbhis/copilot-workflow
2. Click Use this template → create your repo
3. Open VS Code with the Copilot Chat panel
4. Click the chatmode selector (the dropdown at the top of Copilot Chat)
5. Select ship
6. Describe your changes

That's it. The chatmode is defined in .github/chatmodes/ship.chatmode.md — Copilot picks it up automatically.

The Two Other Chatmodes

The /ship chatmode isn't the only one in the template. There are two others for different phases of development.

/spec — Idea to Implementation Plan

Before writing a single line of code, /spec walks you through:

Idea refinement — sharpens vague concepts, generates variations, forces you to name your "Not Doing" list
Spec writing — produces a structured specification with success criteria, boundaries, and open questions
Task breakdown — converts the spec into ordered, verifiable tasks with acceptance criteria

Nothing gets implemented until the spec is approved. This sounds slow. It's actually the fastest path — because you're not rewriting code that was built on wrong assumptions.

[In /spec chatmode]
I want to add email notifications when tasks are assigned to team members.

The chatmode asks clarifying questions, surfaces assumptions ("should unassigned tasks also trigger notifications?"), proposes 3 approaches with trade-offs, writes the spec, and produces a numbered task list. Then it stops. You implement, guided by the tasks.

/debug — Systematic Root Cause Analysis

When something breaks, /debug runs the Prove-It Pattern:

1. Reproduce first: make the failure happen reliably before touching code
2. Localize: which layer? Which commit introduced it? (git bisect)
3. Root cause: fix the cause, not the symptom
4. Regression test: write a failing test that proves the bug existed
5. Verify: full suite passes with the fix applied

The chatmode won't let you skip straight to the fix. The reproduction step is mandatory. The regression test is mandatory. This discipline is tedious the first few times. After your first "this fixed it" that actually fixed it — permanently, with a test to prove it — you stop finding it tedious.

The Automation Gap

Here's what separates a "copilot user" from a "copilot workflow":

A copilot user asks Copilot questions. They get answers. Sometimes the answers are good. The quality depends on how well they prompt, how much context they provide, how consistently they remember to check security, how often they run reviews.

A copilot workflow is systematic. It defines the process once, then enforces it automatically. /ship runs the same three-pass review every time — not when you remember to, not when you feel like the code needs it, not just on "important" PRs.

The difference isn't the AI. It's the discipline baked into the tooling.

What's Coming Next

This template started with three agents and grew to 17. The next phase is agentic workflows — where Copilot doesn't just respond to your prompts but initiates workflows autonomously:

Copilot in GitHub Actions — automatic review triggered on every PR open, no human needed to remember
MCP server integration — agents that read your actual database schema, live error logs, and monitoring metrics when reviewing code
The /spec to production pipeline — from idea to deployed feature, with AI assistance at every step and human approval gates between them

The foundations are already in the template. Watch the repo for updates as each piece ships.

Get the Template

Everything in this series — all 17 agents, all 3 chatmodes, the CI pipeline, MCP config — is in the template.

👉 github.com/panditAbhis/copilot-workflow

One click. Use this template. Every repo you create from it has the full workflow.


Think Like an Attacker: How I Use @security-auditor Before Every Production Deploy

Abhishek Pandit — Fri, 12 Jun 2026 13:54:58 +0000

A lock on your front door doesn't make your house secure.

Not if the back door is open. Not if the window latch is broken. Not if you're handing spare keys to strangers without realizing it.

Security works the same way. Adding password hashing to your login endpoint doesn't make your app secure — not if the password reset endpoint doesn't expire tokens, or the file upload handler accepts any file type, or the database query for your admin panel is built by concatenating user input.

Most developers secure the obvious thing and leave a dozen attack surfaces unchecked. Not because they're careless — because they're only looking at one door at a time.

@security-auditor changes that. Instead of checking individual controls, it starts from a map of your entire attack surface.

This is Part 4 of the copilot-workflow series.

The Bank Vault Analogy

Imagine you're designing a bank vault.

A naive approach: put a very thick door on the vault. Done.

A security engineer's approach: draw a map of everything that needs protecting, then ask "how would a thief get to it?" They'd check the vault door — but also the air vents, the maintenance tunnels, the manager's office, the cleaning crew's access, and the bank's Wi-Fi network.

This is threat modeling. You don't start with controls. You start with a map of what you're protecting and all the ways someone could get to it.

@security-auditor runs this process on your code automatically. It's not a checklist bot. It's an attacker's mindset, systematically applied.

How @security-auditor Thinks

Before flagging a single vulnerability, the agent does two things:

1. Maps your trust boundaries

Trust boundaries are places where data crosses from one trust level to another. HTTP requests come in — untrusted. Database data goes out — trusted. A user uploads a file — untrusted. Your auth middleware processes it — trusted (if the middleware is correct).

Every trust boundary is potential attack surface.

2. Runs STRIDE

STRIDE is a structured way to think about threats at each boundary:

Letter	Threat	Question
S	Spoofing	Can someone pretend to be a legitimate user or service?
T	Tampering	Can someone modify data in transit or at rest?
R	Repudiation	Can someone deny an action they took?
I	Information disclosure	Can data leak to unauthorized parties?
D	Denial of service	Can the system be overwhelmed?
E	Elevation of privilege	Can someone gain more access than they should have?

Then it maps findings to OWASP Top 10 — the industry standard list of the most critical web security risks.
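
To make the two-step process concrete, here is a minimal sketch of how a STRIDE pass over a list of trust boundaries could be modeled. The type and function names are hypothetical and not taken from the agent itself; only the six questions come from the table above.

```typescript
// Hypothetical model of a STRIDE pass. Names are illustrative, not the agent's internals.
type StrideCategory =
  | "Spoofing"
  | "Tampering"
  | "Repudiation"
  | "Information disclosure"
  | "Denial of service"
  | "Elevation of privilege";

// Questions copied from the STRIDE table above.
const STRIDE_QUESTIONS: Record<StrideCategory, string> = {
  "Spoofing": "Can someone pretend to be a legitimate user or service?",
  "Tampering": "Can someone modify data in transit or at rest?",
  "Repudiation": "Can someone deny an action they took?",
  "Information disclosure": "Can data leak to unauthorized parties?",
  "Denial of service": "Can the system be overwhelmed?",
  "Elevation of privilege": "Can someone gain more access than they should have?",
};

interface TrustBoundary {
  name: string; // e.g. "HTTP request body -> reset-password handler"
}

interface ThreatPrompt {
  boundary: string;
  category: StrideCategory;
  question: string;
}

// Produce one question per (boundary, category) pair so no combination is skipped.
function strideChecklist(boundaries: TrustBoundary[]): ThreatPrompt[] {
  const categories = Object.keys(STRIDE_QUESTIONS) as StrideCategory[];
  const prompts: ThreatPrompt[] = [];
  for (const boundary of boundaries) {
    for (const category of categories) {
      prompts.push({
        boundary: boundary.name,
        category,
        question: STRIDE_QUESTIONS[category],
      });
    }
  }
  return prompts;
}
```

The point of the sketch is the cross product: every boundary gets all six questions, which is what keeps you from only looking at one door at a time.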

A Real Example: The Password Reset Trap

Here's an endpoint I've seen in real codebases — a password reset handler:

app.post('/api/auth/reset-password', async (req, res) => {
  const { token, newPassword } = req.body;
  const user = await db.query(
    `SELECT * FROM users WHERE reset_token = '${token}'`
  );
  if (!user) return res.status(404).json({ error: 'Invalid token' });
  await db.query(
    `UPDATE users SET password = '${newPassword}' WHERE id = ${user.id}`
  );
  res.json({ success: true });
});

It "works." Users can reset their passwords. Tests pass. Looks fine.

Let me show you what @security-auditor finds:

@security-auditor Audit this password reset endpoint for security vulnerabilities.

Security Audit Report

Critical: 4 | High: 2 | Medium: 1

[CRITICAL] SQL Injection — token lookup

Location: Line 3
Description: User-supplied token is concatenated directly into SQL
Impact: Attacker can bypass authentication entirely. Input ' OR '1'='1 retrieves any user.
Proof of concept: token = "' OR '1'='1"
Fix: db.query('SELECT * FROM users WHERE reset_token = $1', [token])

[CRITICAL] Plaintext Password Storage

Location: Line 7
Description: newPassword stored as raw string — no hashing
Impact: Database breach exposes every user's actual password
Fix: const hashed = await bcrypt.hash(newPassword, 12); db.query('UPDATE users SET password = $1 WHERE id = $2', [hashed, user.id])

[CRITICAL] SQL Injection — UPDATE statement

Location: Line 7
Description: Both newPassword and user.id concatenated into SQL
Fix: Use parameterized query as shown above

[CRITICAL] Reset token not invalidated after use

Location: Missing
Description: Token remains valid after password reset — replayable
Impact: Anyone who intercepts the token can reset the password again
Fix: UPDATE users SET password = $1, reset_token = NULL WHERE id = $2

[HIGH] No token expiry check

Location: Missing
Description: Tokens never expire — valid indefinitely
Fix: Add reset_token_expires_at column, check > NOW() in query, expire after 15 minutes

[HIGH] No rate limiting

Location: Missing
Description: Unlimited attempts — brute-force token guessing is possible
Fix: app.use('/api/auth/', rateLimit({ windowMs: 900000, max: 10 }))

[MEDIUM] Token enumeration via error message

Location: Line 6
Description: 404 reveals whether a token exists — helps attackers
Fix: Return 200 with generic message regardless of outcome

That endpoint looked functional. The actual security posture: four critical vulnerabilities, any one of which could result in account takeover or full database exposure.
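
Here is a minimal sketch of what the two queries might look like once the audit's fixes are applied. It assumes a query shape of SQL text plus a separate values array (the pattern node-postgres accepts). The function names are hypothetical, and the reset_token_expires_at column follows the audit's own suggestion rather than any schema shown in this article. Hashing the password (for example with bcrypt) happens before the update and is not shown.

```typescript
// Hypothetical query shape: SQL text with $n placeholders plus a separate values array.
interface ParamQuery {
  text: string;
  values: unknown[];
}

// 15-minute lifetime, matching the expiry suggested in the audit report.
const RESET_TOKEN_TTL_MS = 15 * 60 * 1000;

// Expiry timestamp to store alongside a freshly issued token.
function expiryFor(issuedAt: Date): Date {
  return new Date(issuedAt.getTime() + RESET_TOKEN_TTL_MS);
}

// Lookup: the token and "now" travel as values, never as part of the SQL text.
function findUserByResetToken(token: string, now: Date): ParamQuery {
  return {
    text:
      "SELECT id FROM users " +
      "WHERE reset_token = $1 AND reset_token_expires_at > $2",
    values: [token, now],
  };
}

// Update: stores an already-computed hash and clears the token so it cannot be replayed.
function setPasswordAndClearToken(userId: number, passwordHash: string): ParamQuery {
  return {
    text:
      "UPDATE users SET password = $1, reset_token = NULL, " +
      "reset_token_expires_at = NULL WHERE id = $2",
    values: [passwordHash, userId],
  };
}
```

With this shape, the proof-of-concept input from the report ends up as an ordinary string in the values array, so it is compared against the stored token instead of being parsed as SQL.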

The OWASP Top 10: Your Security Baseline

@security-auditor maps every finding to the OWASP Top 10 — the most critical web application security risks, updated every few years based on real breach data.

You don't need to memorize it. But understanding the categories (numbered here as in the 2021 edition) helps you recognize when to invoke the agent:

#	Risk	"Invoke when you're..."
A01	Broken Access Control	Building any endpoint that checks ownership
A02	Cryptographic Failures	Storing passwords, tokens, or PII
A03	Injection	Writing any database query with user input
A04	Insecure Design	Designing auth flows, payment logic
A05	Security Misconfiguration	Setting up CORS, headers, error messages
A06	Vulnerable Components	Adding a new npm dependency
A07	Authentication Failures	Building login, registration, password reset
A08	Software Integrity	Setting up CI/CD or deployment pipelines
A09	Logging Failures	Building audit trails or error handlers
A10	SSRF	Building webhooks, URL imports, link previews

The Three-Tier Rule

The most useful mental model from @security-auditor is the three-tier boundary system. Before writing any security-sensitive code, you know exactly what category it falls into:

Always do (no human approval needed):

Validate all external input at the boundary
Parameterize all database queries
Hash passwords with bcrypt (cost factor ≥12) or argon2
Set security headers (CSP, HSTS, X-Frame-Options)
Use httpOnly + secure + sameSite cookies for sessions
Run npm audit before every release

Ask first (requires explicit approval):

Adding new authentication flows
Storing new categories of sensitive data (PII, payment info)
Adding new external service integrations
Changing CORS configuration
Adding file upload handlers
Modifying rate limits

Never do (hard stops):

Commit secrets to version control
Log passwords, tokens, or full credit card numbers
Trust client-side validation as a security boundary
Use eval() or innerHTML with user-provided data
Store auth tokens in localStorage
Expose stack traces or internal error details to users
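
As one example of turning a "never do" rule into code, here is a minimal sketch of a redaction helper you could run on objects before they reach a logger. The key list and the function names are hypothetical, and it assumes plain JSON-like data (objects, arrays, primitives).

```typescript
// Hypothetical key fragments; extend the list to match the fields your app handles.
const SENSITIVE_KEYS = ["password", "token", "secret", "authorization", "cardnumber"];

function isSensitive(key: string): boolean {
  const lowered = key.toLowerCase();
  return SENSITIVE_KEYS.some((fragment) => lowered.includes(fragment));
}

// Returns a deep copy with sensitive values replaced. The input is not mutated.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = isSensitive(key) ? "[REDACTED]" : redact(source[key]);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Matching on key fragments is deliberately broad: a field called resetToken or newPassword is caught without being listed explicitly, at the cost of occasionally redacting something harmless.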

When to Invoke @security-auditor

Before shipping anything that:

Accepts user input (forms, query params, file uploads)
Handles authentication or sessions
Fetches URLs provided by users (webhooks, import-from-URL features)
Calls external APIs with stored credentials
Processes payment or PII data

After:

Adding a new npm dependency (npm audit + ask the auditor to check supply chain risk)
Changing CORS configuration
Adding any new public endpoint

Alongside @code-reviewer for:

AI-generated code (especially auth logic — Copilot generates plausible but often vulnerable patterns)
Code you inherited that nobody has security-reviewed

The LLM Security Trap

Here's a threat category most security guides don't cover: your own AI assistant.

If you're building features that use LLMs — chatbots, summarizers, AI agents — the model's output is untrusted data. Full stop. It must be treated exactly like user input.

// DANGEROUS: passing LLM output directly to the database
const sqlQuery = await llm.generate(`Write SQL to find tasks for: ${userQuery}`);
await db.query(sqlQuery);  // arbitrary SQL execution

// SAFE: parse defensively, validate, then act
let intent;
try {
  intent = TaskQuerySchema.parse(JSON.parse(await llm.replyJson(userQuery)));
} catch {
  throw new ValidationError('Could not parse request');
}
const tasks = await db.tasks.findMany({ where: buildSafeWhere(intent) });

@security-auditor checks for this specifically. Prompt injection (an attacker embedding instructions in text your LLM processes), excessive agency (your agent doing things it shouldn't have permission to do), and unbounded consumption (a crafted input that runs up your API costs) are all on its checklist.
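
The SAFE snippet above leans on TaskQuerySchema and buildSafeWhere without showing them. Here is a minimal sketch of what they might do, written without a schema library. The intent shape, the field names, and the Prisma-style filter object are all assumptions for illustration.

```typescript
// Hypothetical intent shape. The article's TaskQuerySchema is not shown, so this is a stand-in.
type TaskStatus = "active" | "completed";

interface TaskIntent {
  status: TaskStatus;
  titleContains?: string;
}

const MAX_TITLE_FILTER_LENGTH = 100;

function isTaskStatus(value: unknown): value is TaskStatus {
  return value === "active" || value === "completed";
}

// Validate untrusted (LLM-produced) JSON text into a narrow, known shape.
function parseTaskIntent(raw: string): TaskIntent {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error("Could not parse request");
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error("Could not parse request");
  }
  const fields = data as Record<string, unknown>;
  const status = fields["status"];
  const title = fields["titleContains"];
  if (!isTaskStatus(status)) throw new Error("Unsupported status");
  if (title === undefined) return { status };
  if (typeof title !== "string" || title.length > MAX_TITLE_FILTER_LENGTH) {
    throw new Error("Unsupported title filter");
  }
  return { status, titleContains: title };
}

// Build the filter from validated fields only. Unknown keys from the model are dropped.
function buildSafeWhere(intent: TaskIntent): Record<string, unknown> {
  const where: Record<string, unknown> = {
    completedAt: intent.status === "active" ? null : { not: null },
  };
  if (intent.titleContains !== undefined) {
    where["title"] = { contains: intent.titleContains };
  }
  return where;
}
```

The model can only pick from options the code already allows. Anything else it emits, including extra keys or raw SQL, is either rejected or never copied into the query.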

The Security Mindset Shift

Before using @security-auditor, I thought about security as a feature — something you add to a working system.

After: security is a constraint. You don't add it after the code works. You build it into every decision.

The difference in practice: when I write a database query, I parameterize it as I'm writing it — not after. When I build an endpoint, I add rate limiting and auth checks in the same commit — not later. When I store a token, I immediately ask what happens if that token is compromised.

The agent didn't teach me to be more careful. It taught me to look in more places.

Get the Template

@security-auditor is included in the copilot-workflow template — one setup, automatic on every session.

👉 github.com/panditAbhis/copilot-workflow

Next in the series: Part 5 — The /ship chatmode. One command that fans out code review, security audit, and simplification in sequence and gives you a single SHIP / DO NOT SHIP verdict.

Never Fix a Bug Without Proof: The @test-engineer Prove-It Pattern

Abhishek Pandit — Fri, 12 Jun 2026 13:49:46 +0000

Let me describe a scene you've lived.

A bug report lands. "The completed tasks are still showing up in the active list." You look at the code, spot something that looks wrong, change it, refresh the browser — looks fixed. You close the ticket.

Three weeks later: same bug. Different user. Same ticket reopened.

What happened? You fixed the symptom. The actual cause was still there. And you have no test to tell you if it comes back.

This is the most common bug-fix pattern in software. It's also the most expensive one.

This is Part 3 of the copilot-workflow series. Part 1 set up the template. Part 2 covered pre-merge review. This one covers @test-engineer — the QA engineer persona that forces you to prove bugs exist before fixing them.

The Crime Scene Analogy

Think of a bug like a crime scene.

A bad detective shows up, looks around, decides they know what happened, and closes the case. No evidence. No proof. Just intuition.

A good detective does something different. They document the evidence first. They establish what actually happened — with proof — before drawing conclusions. Then they verify the conclusion against the evidence.

@test-engineer makes you be the good detective.

The rule: write a test that proves the bug exists before you write a single line of fix code.

That test is your evidence. If it doesn't fail before your fix, you didn't actually reproduce the bug — you guessed. If it passes after your fix, you have proof the bug is gone and can never silently return.

The Prove-It Pattern: Step by Step

Here's exactly what happens when I get a bug report.

Step 1: Describe the bug to @test-engineer

@test-engineer I have a bug report: completed tasks still appear
in the active task list. The "completedAt" timestamp is being set
correctly, but the list query is returning them anyway.

Write a failing test that proves this bug exists before I fix it.

Step 2: @test-engineer writes a reproduction test

// This test MUST fail with the current broken code
it('excludes completed tasks from the active task list', async () => {
  // Arrange: create one active and one completed task
  const activeTask = await createTask({ title: 'Buy groceries' });
  const completedTask = await createTask({ title: 'Walk dog' });
  await completeTask(completedTask.id);

  // Act: fetch the active list
  const activeTasks = await listActiveTasks();

  // Assert: completed task must not appear
  expect(activeTasks).toHaveLength(1);
  expect(activeTasks[0].id).toBe(activeTask.id);
  expect(activeTasks.map(t => t.id)).not.toContain(completedTask.id);
});

Step 3: Run it. Confirm it fails.

npm test -- -t "excludes completed tasks"

FAIL src/tasks.test.ts
  ✕ excludes completed tasks from the active task list
    Expected length: 1
    Received length: 2   ← Bug confirmed. Completed task IS in the list.

The test fails. The bug is real. Now you have evidence.

Step 4: Fix the actual cause

// Before: missing filter — returns ALL tasks
async function listActiveTasks(): Promise<Task[]> {
  return db.tasks.findMany({ orderBy: { createdAt: 'desc' } });
}

// After: filter for tasks where completedAt is null
async function listActiveTasks(): Promise<Task[]> {
  return db.tasks.findMany({
    where: { completedAt: null },
    orderBy: { createdAt: 'desc' }
  });
}

Step 5: Run the test again. Confirm it passes.

npm test -- -t "excludes completed tasks"

PASS src/tasks.test.ts
  ✓ excludes completed tasks from the active task list (23ms)

Step 6: Run the full suite. No regressions.

npm test

All green. Ship it.

Why This Order Matters

You might be thinking: why not just write the fix first, then write a test to verify it?

Because then you're testing your fix, not the bug.

If you write the fix first, you unconsciously write a test that confirms your fix works. You're not proving the bug existed — you're proving your solution compiles. Those are completely different things.

The Prove-It Pattern forces a different discipline:

The test failing tells you the bug is real. Not "I think this is a bug." Not "a user reported this." Actually real, actually reproducible, actually failing right now.

The test passing tells you the fix is correct. Not "I changed something that looks related." Actually correct — the specific behavior that was broken is now working.

The test existing tells you it can never silently return. The next time someone changes the query, the test will catch it immediately. The bug is permanently guarded.

Using @test-engineer for New Features (TDD)

The Prove-It Pattern is for bugs. But @test-engineer also guides you through Test-Driven Development for new features.

The idea is the same, just inverted: write a failing test that describes what you want the code to do, then make the code pass it.

@test-engineer I'm building a task priority system. Tasks can be low,
medium, or high priority. The active task list should return tasks
sorted by priority (high first) then by creation date.

Write tests for this feature before I implement it.

@test-engineer produces:

describe('listActiveTasks with priority sorting', () => {
  it('returns high priority tasks first', async () => {
    const low = await createTask({ title: 'Low', priority: 'low' });
    const high = await createTask({ title: 'High', priority: 'high' });
    const medium = await createTask({ title: 'Medium', priority: 'medium' });

    const tasks = await listActiveTasks();

    expect(tasks[0].priority).toBe('high');
    expect(tasks[1].priority).toBe('medium');
    expect(tasks[2].priority).toBe('low');
  });

  it('sorts by creation date within the same priority', async () => {
    const first = await createTask({ title: 'First', priority: 'high' });
    const second = await createTask({ title: 'Second', priority: 'high' });

    const tasks = await listActiveTasks();
    const highPriorityTasks = tasks.filter(t => t.priority === 'high');

    expect(highPriorityTasks[0].id).toBe(second.id); // newer first
    expect(highPriorityTasks[1].id).toBe(first.id);
  });

  it('excludes completed tasks from the sorted list', async () => {
    const completed = await createTask({ title: 'Done', priority: 'high' });
    await completeTask(completed.id);
    const active = await createTask({ title: 'Active', priority: 'low' });

    const tasks = await listActiveTasks();

    expect(tasks.map(t => t.id)).not.toContain(completed.id);
    expect(tasks.map(t => t.id)).toContain(active.id);
  });
});

All three tests fail — the feature doesn't exist yet. Now you implement listActiveTasks to make them pass. The tests become the specification.
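
For reference, here is a minimal in-memory sketch of the ordering those tests describe. The Task shape is inferred from the tests and is an assumption, as is doing the sort in application code. A real listActiveTasks would more likely push the filter and ordering into the database query.

```typescript
// Hypothetical Task shape, inferred from the tests above.
type Priority = "low" | "medium" | "high";

interface Task {
  id: string;
  priority: Priority;
  createdAt: Date;
  completedAt: Date | null;
}

// Lower rank sorts first, so high priority comes before medium and low.
const PRIORITY_RANK: Record<Priority, number> = { high: 0, medium: 1, low: 2 };

function sortActiveTasks(tasks: Task[]): Task[] {
  return tasks
    .filter((task) => task.completedAt === null) // filter() copies, so the input is not reordered
    .sort(
      (a, b) =>
        PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
        b.createdAt.getTime() - a.createdAt.getTime() // newer first within a priority
    );
}
```

The explicit rank table is the design choice worth noting: sorting the priority strings alphabetically would put "high" first but "low" before "medium".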

The Test Pyramid: Where Each Test Lives

@test-engineer knows the right test level for each scenario:

          ╱╲
         ╱  ╲         E2E Tests (5%)
        ╱    ╲        Real browser, full user flow
       ╱──────╲
      ╱        ╲      Integration Tests (15%)
     ╱          ╲     Real database, real API
    ╱────────────╲
   ╱              ╲   Unit Tests (80%)
  ╱                ╲  Pure logic, no I/O, milliseconds each
 ╱──────────────────╲

Unit test territory: Pure functions, validation logic, data transformations. No database, no network. Runs in milliseconds. The Prove-It Pattern for logic bugs lives here.

Integration test territory: API endpoints, database queries, the interaction between your code and your database. The listActiveTasks example above is an integration test — it hits a real database.

E2E test territory: Critical user paths that must work end-to-end. "User can log in and create a task" is an E2E test. You don't write these for every bug — only for flows so important they justify the maintenance cost.

When you tell @test-engineer what you're testing, it automatically picks the right level.

What Makes a Good Test Name

@test-engineer enforces descriptive test names as a specification:

// Bad: tells you nothing when it fails
it('works correctly', () => { ... });
it('handles the case', () => { ... });
it('test 3', () => { ... });

// Good: reads like a requirement
it('excludes completed tasks from the active task list', () => { ... });
it('sorts high priority tasks before low priority tasks', () => { ... });
it('throws NotFoundError when task ID does not exist', () => { ... });

When a test fails, you need to understand what broke from the test name alone — without reading the implementation. Good names make CI failures self-explanatory.

The One Rule That Changes Your Debugging Forever

"A bug fix without a reproduction test is not a bug fix. It's a guess."

Before @test-engineer, I fixed bugs by intuition — change the thing that looks wrong, verify it works, move on. Sometimes I was right. Sometimes I was fixing a symptom while the cause festered.

Now: every bug gets a reproduction test first. Every fix gets verified by that test. Every test stays in the suite permanently.

The cumulative effect: each bug I fix makes the next bug harder to introduce. The test suite gets smarter with every incident. The codebase gets more resilient over time, not less.

Get the Template

@test-engineer is part of the copilot-workflow template — one setup, every repo.

👉 github.com/panditAbhis/copilot-workflow

Next in the series: Part 4 — @security-auditor and threat modeling. How to think like an attacker before an attacker does.

Stop Merging Blind: How I Use @code-reviewer Before Every PR

Abhishek Pandit — Fri, 12 Jun 2026 13:06:53 +0000

Think about the last time you merged a PR without a proper review.

Maybe tests passed. Maybe it looked fine at a glance. Maybe you were moving fast and told yourself you'd clean it up later.

Then two weeks later, a bug surfaces. Or a junior dev inherits the code and spends a day trying to understand it. Or a security scanner flags something in a production deploy.

"Looked fine" is not a review. It's a hope.

This is Part 2 of the copilot-workflow series. Part 1 covered setting up the template. This one covers using @code-reviewer — the staff engineer persona that reviews every change before it touches main.

The Problem With Most Code Review

Code review fails in two ways.

It gets skipped. You're the only dev on a project, or the team is moving fast, or "it's just a small change." The PR merges without anyone really looking at it.

It gets rubber-stamped. Someone glances at the diff, sees nothing obviously on fire, and clicks Approve. This catches maybe 20% of real issues — the obvious ones. The subtle ones sail through.

What you actually need is a reviewer who checks multiple dimensions systematically. Not just "does this work" but also "is this readable," "does this fit the architecture," "is there a security hole," "will this cause a performance problem at scale."

That's what @code-reviewer does.

The 5-Axis Review Framework

Think of your code like a restaurant being health-inspected.

A good inspector doesn't just taste the food. They check the kitchen temperature (correctness), whether the menu is readable (readability), whether the kitchen layout makes sense (architecture), whether hygiene standards are met (security), and whether the kitchen can handle a full service (performance).

Pass all five. Not just one.

Axis	The question	What it catches
Correctness	Does it do what it claims?	Edge cases, error paths, off-by-one errors, race conditions
Readability	Can a stranger understand it?	Confusing names, deeply nested logic, missing context
Architecture	Does it fit the system?	Circular dependencies, wrong abstraction level, code duplication
Security	Can it be exploited?	Unvalidated input, SQL injection, exposed secrets, missing auth checks
Performance	Will it survive load?	N+1 queries, unbounded loops, missing pagination, blocking operations

The Labeling System That Changes Everything

Most review feedback is undifferentiated. Everything comes in as a comment and you have no idea what's blocking the merge vs. what's a nice-to-have. So you either fix everything (slow) or fix nothing (wrong).

@code-reviewer labels every finding:

Label	Meaning	What to do
Critical	Blocks merge. Security vulnerability, data loss risk, broken functionality.	Fix before the PR goes anywhere.
Important	Should fix before merge. Wrong abstraction, missing test, poor error handling.	Fix unless you have a very good reason not to.
Suggestion	Optional improvement. Naming, style, minor optimization.	Take it or leave it. Your call.

This means you read the review once and immediately know your priority order. No guessing what the reviewer considers urgent.
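
If you wanted to post-process a review yourself, the labels map naturally onto a small data structure. This is a minimal sketch with hypothetical names. The only rule it encodes from the table is that Critical findings block the merge.

```typescript
type Label = "Critical" | "Important" | "Suggestion";

interface Finding {
  label: Label;
  location: string; // e.g. "Line 3" or "Missing"
  message: string;
}

const LABEL_ORDER: Record<Label, number> = { Critical: 0, Important: 1, Suggestion: 2 };

// Most urgent first. Works on a copy so the original review order is preserved.
function prioritize(findings: Finding[]): Finding[] {
  return findings.slice().sort((a, b) => LABEL_ORDER[a.label] - LABEL_ORDER[b.label]);
}

// Per the table above, any Critical finding blocks the merge.
function blocksMerge(findings: Finding[]): boolean {
  return findings.some((finding) => finding.label === "Critical");
}
```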

How to Use It

Basic review

@code-reviewer Review the changes in src/auth/login.ts

Give it context — better output

@code-reviewer Review src/auth/login.ts — this adds JWT refresh token rotation. 
Focus on the token storage and the edge case where both tokens expire simultaneously.

The more context you give, the more targeted the output. Tell it what changed and what you're worried about.

Review a whole feature

@code-reviewer Review the changes across src/api/users/ and src/middleware/auth.ts — 
this is a new role-based access control system. The spec said admin users should 
be able to manage other users but not delete their own account.

Paste a diff directly

For small changes, paste the diff straight into chat:

@code-reviewer Review this change:

- const user = await db.query(`SELECT * FROM users WHERE id = ${userId}`);
+ const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);

Is this fix complete or are there other injection vectors in this file?

What the Output Looks Like

Here's a real-world example. Imagine you've written a password reset endpoint:

app.post('/api/auth/reset-password', async (req, res) => {
  const { token, newPassword } = req.body;
  const user = await db.query(`SELECT * FROM users WHERE reset_token = '${token}'`);
  if (!user) return res.status(404).json({ error: 'Invalid token' });
  await db.query(`UPDATE users SET password = '${newPassword}' WHERE id = ${user.id}`);
  res.json({ success: true });
});

Ask @code-reviewer to review it. You'll get something like this:

Verdict: REQUEST CHANGES

Overview: Password reset endpoint with multiple critical security vulnerabilities and missing validation. Do not merge.

Critical Issues

Line 3 — SQL injection via string interpolation in token lookup. Use parameterized query: db.query('SELECT * FROM users WHERE reset_token = $1', [token])
Line 5 — Password stored as plaintext. Hash with bcrypt before storing: await bcrypt.hash(newPassword, 12)
Line 5 — Second SQL injection in UPDATE statement. Parameterize both values.
Missing — Reset token not invalidated after use. An attacker who intercepts the token can reset the password again.
Missing — No token expiry check. Tokens should expire after 15-30 minutes.

Important Issues

Missing — No rate limiting on this endpoint. Brute-force token guessing is possible.
Missing — No input validation on newPassword. Minimum length, complexity requirements should be enforced.
Line 4 — Error message reveals whether a token exists. Return a generic message regardless of outcome to prevent token enumeration.

Suggestions

Consider logging password reset events to an audit log (user ID, timestamp, IP).

What's Done Well

Endpoint is correctly unauthenticated — password reset flows shouldn't require a logged-in session.

That is what a real review looks like. Five critical issues. Three important ones. A genuine suggestion. All labeled. Priorities clear.

Without this review, that code merges and your users' passwords are stored in plaintext with SQL injection vulnerabilities.
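
One of the Important findings, the missing validation on newPassword, is small enough to sketch here. The limits and names below are hypothetical. Complexity rules are left out, and you should pick values that match your own policy.

```typescript
// Hypothetical policy values. Adjust to your own requirements.
const MIN_PASSWORD_LENGTH = 12;
// bcrypt ignores input beyond 72 bytes. This character cap is only a rough guard,
// since non-ASCII characters take more than one byte each.
const MAX_PASSWORD_LENGTH = 72;

type PasswordCheck =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Validate at the boundary: req.body values are unknown until proven otherwise.
function validateNewPassword(input: unknown): PasswordCheck {
  if (typeof input !== "string") {
    return { ok: false, error: "Password must be a string" };
  }
  if (input.length < MIN_PASSWORD_LENGTH) {
    return { ok: false, error: `Password must be at least ${MIN_PASSWORD_LENGTH} characters` };
  }
  if (input.length > MAX_PASSWORD_LENGTH) {
    return { ok: false, error: `Password must be at most ${MAX_PASSWORD_LENGTH} characters` };
  }
  return { ok: true, value: input };
}
```

Taking unknown as the parameter type is deliberate: the handler cannot pass req.body.newPassword along to hashing until this check has narrowed it to a string.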

When to Invoke It

Before every PR — no exceptions. This is the rule.

The cost is one @code-reviewer message. The cost of skipping it is a production incident, a security breach, or a codebase that quietly becomes harder to maintain.

Three situations where it's especially valuable:

1. Code you wrote quickly. When you're moving fast you make tradeoffs. The reviewer surfaces those tradeoffs before they become permanent.

2. Code another AI generated. Copilot autocomplete, ChatGPT, whatever. AI-generated code is confident and plausible even when wrong. It needs more scrutiny, not less.

3. Code in unfamiliar territory. If you're writing auth logic but auth isn't your specialty, @code-reviewer + @security-auditor in combination is an extremely strong safety net.

The Mental Shift

Here's what changes when you run @code-reviewer consistently.

You stop thinking "does this work?" and start thinking "is this ready?" Those are different questions. Code can work and still be wrong — wrong for maintainability, wrong for security, wrong for the next engineer who has to touch it.

A passing test suite tells you the code does what you tested. The 5-axis review tells you whether it's actually ready to ship.

Get the Template

This agent is part of the copilot-workflow template — one setup, works in every repo you create from it.

👉 github.com/panditAbhis/copilot-workflow

Next in the series: Part 3 covers @test-engineer and the Prove-It Pattern — how to write a failing test that proves a bug exists before you touch a single line of fix code.

If this was useful, follow for the rest of the series and drop a ⭐ on the repo.

Your Copilot Has No Memory. Here's How I Fixed That in 5 Minutes.

Abhishek Pandit — Fri, 12 Jun 2026 12:50:19 +0000

Imagine you hire a brilliant contractor.

Sharp, fast, knows every technology. You spend the first morning briefing them: "We do test-driven development here. Security is non-negotiable. PRs stay under 300 lines. We parameterize every database query — no exceptions."

They nod. They get it. The day goes brilliantly.

Next morning they show up with no memory of any of it.

You brief them again. They get it again. Day goes great.

This repeats. Every. Single. Day.

That is GitHub Copilot without configuration.

The Problem Nobody Talks About

Copilot is genuinely impressive. But it's stateless. Every session starts from zero. It doesn't know:

Your testing philosophy (TDD? integration-first? mocks or no mocks?)
Your security rules (parameterized queries, no secrets in code, validate at the boundary)
Your review standards (what makes a PR approvable vs. blocked)
Which specialist mindset you need right now (reviewer? tester? security auditor?)

So you either re-prompt it every session — which nobody actually does — or you accept generic output that fits nobody's codebase in particular.

There's a better way.

The Fix: Give Copilot an Employee Handbook

GitHub Copilot has a feature most people don't know exists: if you put a file at .github/copilot-instructions.md in your repo, Copilot reads it automatically — every session, forever.

No prompting. No setup ritual. It just knows your rules.

And there's more. Put specialist personas in .github/agents/ and you can summon them in Copilot Chat with a single @ mention. One message and Copilot becomes a staff engineer doing a 5-axis code review. Another message and it's a QA engineer writing tests using the Prove-It pattern. Another and it's a security auditor running STRIDE analysis against your auth flow.

I built a template that sets all of this up for you. Here's how it works.

What You're Getting

.github/
  copilot-instructions.md   ← Copilot reads this every session. Your rules, automatically.
  agents/
    code-reviewer.md        ← Staff engineer. 5-axis review before every merge.
    test-engineer.md        ← QA engineer. TDD, coverage gaps, bug reproduction tests.
    security-auditor.md     ← Security engineer. OWASP Top 10. STRIDE. Real vulnerabilities only.

Three specialists on speed dial. Zero configuration after the first setup.

5-Minute Setup

What you need first

GitHub account
VS Code
GitHub Copilot subscription (Individual, Business, or Enterprise)
GitHub Copilot extension installed in VS Code
GitHub Copilot Chat extension installed in VS Code

That's it. No CLI tools. No config files to write manually.

Step 1 — Create your repo from the template

Go to 👉 github.com/panditAbhis/copilot-workflow

Click the green "Use this template" button → "Create a new repository".

Name your repo, set visibility, click Create repository.

Your new repo now has the .github/ folder with everything pre-configured. Copilot picks it up automatically — no further action needed.

Step 2 — Verify it's working

Open your new repo in VS Code. Open Copilot Chat (Ctrl+Alt+I on Windows/Linux, Ctrl+Cmd+I on Mac).

Ask:

What are the coding standards for this project?

Copilot should describe your testing pyramid, review standards, and security rules — without you typing a single instruction. If it does, you're done with setup.

Doesn't work? Open VS Code settings and confirm github.copilot.chat.codeGeneration.useInstructionFiles is set to true. It's the default, but worth checking.

Step 3 — Meet your three specialists

Open Copilot Chat and try these:

The Code Reviewer:

@code-reviewer Review the changes in src/auth/login.ts

You get a structured report: Critical issues (blocks merge), Important issues (should fix), Suggestions (optional). Labeled so you know exactly what's required.

The Test Engineer:

@test-engineer I have a bug — users can log in with an expired token. Write a failing test that proves this bug exists before I fix it.

This is the Prove-It Pattern: write a test that fails with the current broken code, then fix the code until the test passes. The test is your proof the bug existed and is now gone.
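
A minimal sketch of the Prove-It pattern for the expired-token bug. The `Session` shape and both `isSessionValid` functions are hypothetical, made up for illustration rather than taken from the template. The buggy version ignores expiry, so the check fails against it and passes once the expiry comparison is added.

```typescript
// Hypothetical session shape, for illustration only.
interface Session {
  userId: string;
  expiresAt: Date;
}

// Buggy version: accepts any session with a user, ignoring expiry.
function isSessionValidBuggy(session: Session, _now: Date): boolean {
  return session.userId.length > 0;
}

// Fixed version: also rejects sessions whose expiry has passed.
function isSessionValid(session: Session, now: Date): boolean {
  return session.userId.length > 0 && session.expiresAt.getTime() > now.getTime();
}

// The "prove-it" check: true when the implementation rejects an expired session.
function rejectsExpiredSession(
  impl: (session: Session, now: Date) => boolean,
): boolean {
  const now = new Date("2026-06-12T12:00:00Z");
  const expired: Session = {
    userId: "u1",
    expiresAt: new Date("2026-06-12T11:00:00Z"),
  };
  return impl(expired, now) === false;
}

// Fails against the buggy code (proving the bug), passes after the fix.
const provedBug = rejectsExpiredSession(isSessionValidBuggy) === false;
const provedFix = rejectsExpiredSession(isSessionValid) === true;
```

In a real project the check would live in your test runner as a normal test case; the point is only the order: red first, then the fix, then green.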

The Security Auditor:

@security-auditor Audit my file upload handler. I want OWASP Top 10 coverage and a STRIDE threat analysis.

Findings are classified by severity (Critical/High/Medium/Low) with proof-of-concept exploitation and specific remediation code — not vague "consider validating input" advice.

What the Coding Standards Actually Cover

The copilot-instructions.md bakes in three areas of discipline. Here's the short version:

Testing

Copilot knows the test pyramid: 80% unit tests (pure logic, no database, milliseconds each), 15% integration tests (real database, localhost only), 5% E2E tests (critical user paths only).

It knows to test what code does, not how it does it internally. Tests that verify method call sequences break when you refactor — even if behavior is unchanged. State-based assertions survive refactoring.
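
As a rough illustration of a state-based assertion, here is a sketch with a hypothetical `Cart` class (not part of the template). The check looks only at the observable result, so swapping the internal storage for something else would not break it.

```typescript
// Hypothetical cart, for illustration only.
class Cart {
  private quantities: Record<string, number> = {};

  add(sku: string, quantity: number): void {
    this.quantities[sku] = (this.quantities[sku] ?? 0) + quantity;
  }

  totalQuantity(): number {
    return Object.keys(this.quantities).reduce(
      (total, sku) => total + this.quantities[sku],
      0,
    );
  }
}

// State-based check: asserts on what the cart reports,
// not on which internal methods were called or in what order.
function cartTotalIsCorrect(): boolean {
  const cart = new Cart();
  cart.add("apple", 2);
  cart.add("apple", 1);
  cart.add("pear", 4);
  return cart.totalQuantity() === 7;
}
```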

Code Quality

PRs under ~100 lines are ideal. Over 1000 lines, it'll tell you to split. Refactoring PRs and feature PRs stay separate — always. No dead code, no "I'll clean this up later" shims. Abstractions don't get written until the third time you need them.

Security

Every external input is hostile until validated. Database queries are parameterized — no concatenating user input into SQL, ever. Secrets stay in .env, never in code. LLM output (yes, even Copilot's output) is treated as untrusted input — it never goes directly into eval, SQL, or HTML.
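
Two of these rules can be sketched in a few lines. The function names and the `users` table are hypothetical, and the `$1` placeholder assumes a PostgreSQL-style client; adapt both to whatever your stack uses.

```typescript
// SQL text with a placeholder, plus the values sent separately.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// The user-supplied email never becomes part of the SQL string itself.
function findUserByEmailQuery(email: string): ParameterizedQuery {
  return { text: "SELECT id FROM users WHERE email = $1", values: [email] };
}

// Escape untrusted text (user input or LLM output) before placing it in HTML.
function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

If you already use an ORM or a templating library that escapes by default, prefer its built-in mechanisms over hand-rolled helpers like these.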

The Analogy That Clicked for Me

Think of this setup like a kitchen.

Vanilla Copilot is like a chef who shows up not knowing your restaurant's cuisine, your signature dishes, your allergen rules, or your prep standards. You explain everything every service.

This template is the kitchen bible — laminated, on the wall, permanent. The chef reads it once at the start of each shift. You never explain the basics again.

The three agents are your sous-chefs: a quality inspector who tastes every dish before it leaves the kitchen, a prep specialist who sets up the station correctly before service, and a health inspector who checks that nothing is going to make someone sick.

You're still the head chef. You make the calls. They make you faster and safer.

What's Next in This Series

This is part 1. Here's what's coming:

Part	What you'll learn
1 — this one	Setup: instructions file + three agent personas
2	Deep dive: using `@code-reviewer` before every PR
3	Deep dive: TDD with `@test-engineer` and the Prove-It pattern
4	Deep dive: threat modeling with `@security-auditor`
5	Building agentic workflows — the `/ship` command that fans out all three in parallel

Get the Template

👉 github.com/panditAbhis/copilot-workflow

Click Use this template. Five minutes. Done.

If this helped, follow for the rest of the series — and drop a ⭐ on the repo so others can find it.

Coding standards in this template are derived from Addy Osmani's agent-skills, Google's Software Engineering practices, and OWASP guidelines.

Series navigation

Part	Title
1	Your Copilot Has No Memory. Here's How I Fixed That in 5 Minutes.
2	Stop Merging Blind: How I Use @code-reviewer Before Every PR
3	Never Fix a Bug Without Proof: The @test-engineer Prove-It Pattern
4	Think Like an Attacker: How I Use @security-auditor Before Every Production Deploy
5	One Command to Rule Them All: The /ship Chatmode
6	Stop Building the Wrong Thing: @spec-writer and @planner
7	A Day in the Life: Complete Session Walkthrough

A Day in the Life: Complete Claude Code Session Walkthrough

Abhishek Pandit — Wed, 10 Jun 2026 15:15:04 +0000

Part 7 of 7 · Series: Building Your AI Developer Handbook · GitHub

The Scenario

You're building a password reset feature. User enters email → gets a reset link → clicks link → enters new password. Standard flow. Medium complexity.

Let's walk through every step using the full workflow — as if you're looking over the shoulder of someone who built this system.

"Show me your workflow and I'll show you your output quality."

Before You Even Type

Claude loads automatically in the background:

✓ ~/.claude/CLAUDE.md loaded           ← the global handbook
✓ .claude/CLAUDE.md loaded             ← project rules (TypeScript, pnpm)
✓ memory/MEMORY.md scanned             ← all lessons and preferences

You haven't typed anything yet. Claude already knows:

Feature-based folder structure
State management ladder
No mocking the database
No AI attribution in commits
No useCallback without profiler evidence

"A doctor who reviews your file before you enter the room is more useful than one who asks 'so, remind me who you are?'"

Step 1: /status — Confirm the Setup

/status

Model:   claude-sonnet-4-6
Effort:  normal
Plugins: security-guidance ✓

Thirty seconds. Sometimes the wrong model loads due to overload fallback. Sometimes a plugin fails silently. This check costs 30 seconds and prevents a surprise 30 minutes later.

"A pilot's first action after sitting in the cockpit isn't to take off. It's to check all instruments are reading correctly."

Step 2: /cost — Baseline

/cost
→ Tokens used: 2,847 | Estimated cost: $0.004

Note this number. You'll compare it later before the expensive code review step. A surprise spike means something went wrong.

Step 3: /plan — Design Before Coding

/plan

Build a password reset feature:
- User enters email on /forgot-password
- System sends a reset link (token, expires in 1 hour)
- User clicks link → /reset-password?token=xxx
- User enters new password
- Token validated, password updated, token invalidated

Claude responds with a plan — no code yet:

Proposed approach:

1. DB: Add password_reset_tokens table
   - id, user_id, token (hashed), expires_at, used_at

2. API:
   - POST /api/auth/forgot-password — generate token, send email
   - POST /api/auth/reset-password  — validate token, update password

3. UI:
   - /features/auth/ForgotPasswordForm.tsx
   - /features/auth/ResetPasswordForm.tsx
   - usePasswordReset.ts hook

4. Tests:
   - Unit: token generation/validation logic
   - Integration: full flow with real test DB
   - No E2E (not a critical path)

Questions before I start:
- Email provider?
- Token length and hashing algorithm preference?
- Should used tokens be deleted or marked used?

"A plan that asks questions is more valuable than a plan that assumes answers."

You answer the questions. You approve the plan. Only now does coding start.
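
For a sense of what item 1 of the plan could turn into, here is a minimal sketch using Node's built-in `crypto` module. The 32-byte token length and SHA-256 hashing are assumptions (the plan leaves both as open questions), and `ResetTokenRecord`, `hashToken`, and `createResetToken` are hypothetical names.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical row shape mirroring the plan's password_reset_tokens columns.
interface ResetTokenRecord {
  userId: string;
  tokenHash: string; // only the hash is stored, never the raw token
  expiresAt: Date;
  usedAt: Date | null;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// SHA-256 is an assumption; the plan asks for a hashing preference.
function hashToken(token: string): string {
  return createHash("sha256").update(token).digest("hex");
}

// Returns the raw token (for the emailed link) and the record to persist.
function createResetToken(
  userId: string,
  now: Date,
): { token: string; record: ResetTokenRecord } {
  const token = randomBytes(32).toString("hex"); // 32 bytes is an assumed length
  return {
    token,
    record: {
      userId,
      tokenHash: hashToken(token),
      expiresAt: new Date(now.getTime() + ONE_HOUR_MS),
      usedAt: null,
    },
  };
}
```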

Step 4: Code — Supervised Implementation

Claude implements. Because the preferences are loaded, it automatically:

Creates /features/auth/ForgotPasswordForm.tsx — not /components/ForgotPassword.tsx
Uses useState for form fields, TanStack Query for the API calls
Writes catch blocks that console.error and surface toasts
Does NOT add useCallback to handlers without profiler evidence

You watch. You redirect if something drifts.

"Supervision isn't distrust — it's how you catch small course corrections before they become major detours."

Step 5: pnpm test — The Blocking Gate

pnpm test

Must exit 0. Not "mostly passing." Not "the one failing test is unrelated." All tests pass.

If they fail:

FAIL features/auth/usePasswordReset.test.ts
  ✗ handles expired token correctly
    Expected: { error: 'Token expired' }
    Received: { error: undefined }

You stop. You fix it. You run tests again. Only when green do you proceed.
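
The failing test above expects an expired token to produce `'Token expired'`. A sketch of validation logic that would satisfy it, kept as a pure function so it is easy to unit test. `StoredResetToken`, `ValidationResult`, and `validateResetToken` are hypothetical names, and the other error messages are invented for illustration.

```typescript
// Hypothetical stored-token shape; field names follow the plan's columns.
interface StoredResetToken {
  expiresAt: Date;
  usedAt: Date | null;
}

type ValidationResult = { ok: true } | { ok: false; error: string };

// Pure function: the current time is passed in, which keeps tests deterministic.
function validateResetToken(
  stored: StoredResetToken | undefined,
  now: Date,
): ValidationResult {
  if (!stored) return { ok: false, error: "Token not found" };
  if (stored.usedAt !== null) return { ok: false, error: "Token already used" };
  if (stored.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, error: "Token expired" };
  }
  return { ok: true };
}
```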

"You wouldn't send a letter before proofreading it. You wouldn't ship code before testing it. The test gate is the proofread."

Step 6: /cost — Delta Check

/cost
→ Tokens used: 18,492 | Delta: +15,645 | Cost: $0.026

~15k tokens for a medium feature is normal. If you saw +80k tokens, that's a red flag — Claude may have scanned the whole codebase or looped a tool call. Investigate before spending on the code review.
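
The delta check is simple arithmetic. A sketch, with hypothetical helper names and the 80k threshold taken from the rule of thumb above:

```typescript
// Threshold is an assumption based on the "+80k is a red flag" rule of thumb.
const RED_FLAG_DELTA_TOKENS = 80000;

// Tokens spent since the baseline reading.
function tokenDelta(baselineTokens: number, currentTokens: number): number {
  return currentTokens - baselineTokens;
}

// True when the spend since baseline is large enough to investigate.
function isRedFlag(
  deltaTokens: number,
  threshold: number = RED_FLAG_DELTA_TOKENS,
): boolean {
  return deltaTokens >= threshold;
}

// Using the two readings from this session: 18,492 - 2,847 = 15,645.
const sessionDelta = tokenDelta(2847, 18492);
const shouldInvestigate = isRedFlag(sessionDelta); // false for this session
```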

Step 7: /code-review — Bug and Design Audit

/code-review

Claude reviews the diff — not the whole codebase, just what changed:

⚠ HIGH  usePasswordReset.ts:47
  Token comparison uses === (timing attack surface).
  Fix: use crypto.timingSafeEqual() instead.

ℹ LOW   ResetPasswordForm.tsx:23
  Loading state not shown during submission.
  Fix: disable submit button while isPending is true.

"A second pair of eyes catches what the first pair stopped seeing. Even if both pairs belong to the same AI."

Fix the HIGH immediately. Decide on the LOW.
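
A sketch of what the HIGH fix might look like, assuming the comparison runs in server-side Node code and that tokens are stored as hex-encoded SHA-256 hashes (both assumptions). `sha256Digest` and `tokensMatch` are hypothetical names. `crypto.timingSafeEqual` throws when the two inputs differ in length, so the sketch compares fixed-length digests and checks the length first.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// SHA-256 digests are always 32 bytes, so both sides have equal length.
function sha256Digest(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

// Compares the hash of the presented token with the stored hash
// without short-circuiting on the first differing byte.
function tokensMatch(presentedToken: string, storedTokenHashHex: string): boolean {
  const presented = sha256Digest(presentedToken);
  const stored = Buffer.from(storedTokenHashHex, "hex");
  if (stored.length !== presented.length) return false; // malformed stored hash
  return timingSafeEqual(presented, stored);
}
```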

Step 8: /simplify — Readability Cleanup

/simplify

Code review found bugs. Simplify finds clutter:

usePasswordReset.ts: resetForm() called in 3 places — extract to reset handler
ForgotPasswordForm.tsx: inline styles on 2 elements — move to className

This pass doesn't hunt for bugs. It hunts for code that works but could be cleaner.

"A code review checks if the bridge is safe. Simplify checks if the bridge is elegant. Both matter."

Step 9: /code-review --comment — Post to PR

/code-review --comment

Posts findings as inline comments directly on the GitHub PR. Reviewers see the notes in context — right on the lines that matter.

PR is ready for human review.

The Full Session, Compressed

Step	Command	Time
1	`/status`	30 sec
2	`/cost`	5 sec
3	`/plan`	10–20 min
4	Code	varies
5	`pnpm test`	2–5 min
6	`/cost`	5 sec
7	`/code-review`	5–10 min
8	`/simplify`	5–10 min
9	`/code-review --comment`	2 min

Total gate overhead: roughly 25 to 50 minutes, summing every row except coding. Defects reaching production: dramatically fewer.

"The gates don't slow you down. They stop you from having to go back."

What This Looks Like Without the Workflow

Without the gates, the same feature might take 45 minutes to implement. But:

The token comparison vulnerability reaches production ← timing attack
A test failure was "unrelated" and got ignored ← it wasn't
Folder structure drifted from feature-based ← scattered files
PR has no inline notes ← reviewer has no context
Session cost 3x more ← nobody checked the delta

"Fast and wrong is slower than right the first time."

Your Turn — Start With Three Things

You don't need to implement this entire workflow today. Start here:

Create ~/.claude/CLAUDE.md with four rules: think first, simplicity, surgical changes, goal-driven
Create one memory file — next time Claude does something you didn't want, write it down
Add the test gate — never proceed past failing tests

The rest follows naturally.

The Full Series

Part	Topic
Part 1	Overview — the full system
Part 2	The Handbook (CLAUDE.md)
Part 3	The Memory System
Part 4	Battle Scars as Rules (Feedback Files)
Part 5	Your Coding DNA (User Preferences)
Part 6	Context is King (Project + Reference Files)
Part 7	A Day in the Life (Complete Walkthrough)

All workflow files on GitHub

"Discipline is not the enemy of creativity. It's the foundation that lets creativity build something that lasts."

Thanks for following the series. If you build your own version of this workflow, share it — I'd love to see how others adapt these ideas.

Context is King: How Project Files and Templates Keep Claude on Track

Abhishek Pandit — Wed, 10 Jun 2026 15:09:19 +0000

Part 6 of 7 · Series: Building Your AI Developer Handbook · GitHub

The Context Problem

Every developer switches between projects. Each project has its own setup, ongoing decisions, deadlines, and "why did we do it this way" history.

"Imagine a consultant who works with five different clients. Every Monday they need a full briefing — 'who are you, what are we building, why did you make that decision last week?' — before they can do anything useful. Now imagine if that briefing happened every single day."

That's Claude without project context files.

Project files and reference files give Claude the backstory before it starts. Not the code — Claude can read the code. The context behind the code.

The Project Context File

---
name: project_auth_rewrite
type: project
---

Auth system rewrite in progress. Deadline: 2026-06-20.

**Why:** Legal flagged session token storage as non-compliant
with new data regulations.

**How to apply:** Scope decisions favor compliance over
ergonomics until the audit is complete.

Every project file has three parts:

The fact — what happened or what was decided
The why — motivation (deadline, constraint, stakeholder, incident)
How to apply — what should change about Claude's behavior

"A sign that says 'do not open this door' is easy to ignore. A sign that says 'do not open this door — last person who did triggered the fire suppression system' is not."

The Why is what lets Claude judge edge cases. Without it, rules are followed blindly. With it, Claude can ask: "does this new situation actually trigger the same concern?"
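
If you want to lint your own project files for the three parts, a rough sketch follows. The format rules are inferred from the example file above rather than from any published schema, and `missingProjectFileParts` is a hypothetical helper.

```typescript
// Returns the names of any parts the project file text appears to be missing.
function missingProjectFileParts(text: string): string[] {
  const lines = text.split("\n").map((line) => line.trim());
  const missing: string[] = [];

  // Front matter is the block between the first two "---" lines.
  const close = lines[0] === "---" ? lines.indexOf("---", 1) : -1;
  const frontMatter = close === -1 ? [] : lines.slice(1, close);
  const body = close === -1 ? lines : lines.slice(close + 1);

  if (frontMatter.indexOf("type: project") === -1) missing.push("type: project");

  // The fact is the first non-empty body line, before any labeled section.
  const firstBodyLine = body.find((line) => line !== "");
  if (firstBodyLine === undefined || firstBodyLine.startsWith("**")) {
    missing.push("fact");
  }

  if (!body.some((line) => line.startsWith("**Why:**"))) missing.push("Why");
  if (!body.some((line) => line.startsWith("**How to apply:**"))) {
    missing.push("How to apply");
  }

  return missing;
}
```

An empty result means all three parts (plus the type marker) were found; anything else names what to add.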

What Goes in a Project File

Good content:

Architectural decisions and the trade-offs that led to them
Ongoing work — who is doing what, by when (use absolute dates, not "next Thursday")
Why a particular library was chosen over alternatives
A constraint not obvious from the code (legal, performance, security)

Bad content — don't memorize these:

Don't put in a project file	Instead (or why not)
Code patterns	Read the code
File structure	Read the filesystem
Git history	Run `git log`
Things that change daily	Too stale too fast

"A map is useful. A map from three years ago of a city that's been rebuilt is worse than no map — it gives you false confidence."

Project files decay. The Why field tells you whether a memory is still load-bearing or just historical noise.

When Project Files Shine

Without a project file:

Session start:
"So just to re-explain the context — we're mid-sprint on auth,
there's a compliance deadline, the approach was chosen because..."

With a project file:

[Claude reads: auth rewrite, deadline 2026-06-20,
compliance is the priority, no ergonomics shortcuts]

You: "should we add a caching layer to the token check?"
Claude: "Given the compliance requirement, caching token validation
would complicate the audit trail — skip it until after the audit."

Claude makes the right call without being told why. That's context working.

The Reference File

---
name: reference_template_location
type: reference
---

Template path: ~/.claude/templates/ts-react-project.md

Usage: at new repo init, copy to .claude/CLAUDE.md.
Fill in [Project Name] and Architecture section.

Note: verify library versions at each project init — they drift.

Reference files are simple: where to find things that live outside the project.

Where bugs are tracked (Linear, Jira, GitHub Issues)
Where design assets live (Figma, Storybook)
Where dashboards are (Grafana, Datadog)
Where templates live (local filesystem paths)

"A good assistant knows where the filing cabinet is. They don't memorize every document inside it — they just know where to look."

The Template System

The reference file above points to a template: ~/.claude/templates/ts-react-project.md.

This is a complete project-level CLAUDE.md for TypeScript/React projects. New repo workflow:

mkdir -p .claude
cp ~/.claude/templates/ts-react-project.md .claude/CLAUDE.md
# Fill in: [Project Name], Architecture section

Instantly Claude knows: TypeScript project, pnpm, Zod for validation, TanStack Query for server state, Zustand for client state. All project-level rules in place before you write a single line.

"A chef's mise en place — everything in its place before cooking starts. Setup done right means execution can be clean."

The note in the reference file matters: verify library versions at each project init, because they drift. A template from six months ago might reference a library that's had a major version bump. Always check.

The Full Context Hierarchy

When Claude starts a session, it reads context in layers:

1. Global CLAUDE.md       ← who you are, universal rules
2. Project CLAUDE.md      ← this stack, this project's rules
3. Memory files (index)   ← lessons, preferences, project context
4. The code itself        ← what's actually been built

"General law → company policy → department rules → today's meeting agenda. Each layer is more specific than the last."

Writing Your First Project File

You don't write project files upfront. Write them when you notice you're re-explaining context.

Trigger: You find yourself starting a session with "so just to re-explain the context..."

Action: Stop. Write a project file:

---
name: project_[feature_name]
type: project
---

[What's being built, current status]

**Why:** [The motivation — constraint, deadline, stakeholder ask]

**How to apply:** [What Claude should do differently because of this]

Add a one-line entry for it to the MEMORY.md index. Next session, Claude already knows.

Key Takeaway

Project files carry the backstory — decisions, motivations, and constraints not in the code but shaping every line of it.

Reference files carry the map — where to find things outside the project.

Together they turn a new session from a cold start into a warm handoff.

"Context isn't just helpful. Without it, the right answer and the wrong answer can look identical."

Next: Part 7 — A Day in the Life: Complete Claude Code Session Walkthrough

Your Coding DNA: The Three Files That Shape Every Line Claude Writes

Abhishek Pandit — Wed, 10 Jun 2026 15:03:52 +0000

Part 5 of 7 · Series: Building Your AI Developer Handbook · GitHub

What User Preference Files Are

Feedback files record mistakes. User preference files record who you are as a developer.

"Two carpenters given the same wood will build different tables — same tools, same materials, different hands, different results. User preference files are what make Claude build YOUR table, not the generic one."

Three preference files cover the three decisions you make on every single feature:

How you organize code — architecture
How you manage data — state management
How you verify correctness — testing

Preference 1: Architecture — Feature-Based Folders

/features/auth/
  AuthForm.tsx
  AuthForm.test.tsx
  useAuth.ts
  auth.types.ts
/components/ui/
  Button.tsx
  Input.tsx

Two competing philosophies:

Layer-based (what most tutorials teach):

/components/
/hooks/
/types/
/tests/

Feature-based (what this workflow uses):

/features/auth/
/features/payment/
/features/dashboard/

"In a layer-based structure, adding a 'login' feature means touching four separate folders. In a feature-based structure, you touch one."

Layer-based groups files by what they are. Feature-based groups by what they do.

The difference becomes obvious when you need to delete a feature:

Layer-based: hunt through /components, /hooks, /types, /tests — find and delete each file individually, hope you didn't miss anything
Feature-based: delete the /features/auth/ folder. Done.

The shared primitives rule:
Buttons, inputs, modals — building blocks shared across all features — live in /components/ui/ only. Never copy a Button into a feature folder.

"The kitchen is shared. Your desk is yours."

No barrel exports on large modules:

// Avoid on large modules — slows TS server, hurts tree-shaking
export { AuthForm } from './AuthForm'
export { useAuth } from './useAuth'

Import directly from the file. The extra path is worth it.

Preference 2: State Management — The Ladder

1. useState    — local UI state, one component
2. Zustand     — shared client state across components  
3. TanStack Query — anything from a server

"State management is like choosing a vehicle. A bicycle for the corner shop, a car for the city, a truck for cross-country. The mistake is driving a truck to the corner shop."

Step 1: useState — Start Here, Always

const [isOpen, setIsOpen] = useState(false)

If state is local to one component and doesn't need to be shared — useState is perfect. Simple, readable, zero overhead.

"Don't add complexity until complexity is required."

Step 2: Zustand — Shared Client State Only

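// Good: client-only shared state
import { create } from 'zustand'

type UIState = { sidebarOpen: boolean; toggleSidebar: () => void }

const useUIStore = create<UIState>()((set) => ({
  sidebarOpen: false,
  toggleSidebar: () => set((s) => ({ sidebarOpen: !s.sidebarOpen })),
}))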

When state needs sharing across multiple components and it's client-only — not from a server — Zustand is right. Modal state, sidebar flags, user preferences.

The hard rule: never put server data in Zustand.

// WRONG — server data in Zustand
const useUserStore = create((set) => ({
  user: null,
  fetchUser: async () => {
    const user = await api.getUser()
    set({ user })
  }
}))

This creates a second copy of the data. They get out of sync. You add refresh logic. Then invalidation logic. Then loading states. You've just rebuilt TanStack Query, badly.

Step 3: TanStack Query — Anything From a Server

// RIGHT — server data belongs here
const { data: user } = useQuery({
  queryKey: ['user'],
  queryFn: () => api.getUser()
})

Loading states, error states, caching, background refetching, cache invalidation — all free.

"Zustand is a drawer. TanStack Query is a real-time window to the outside world. Don't put windows in drawers."

Preference 3: Testing — The Pyramid

Unit tests:        pure logic, no side effects, no internal mocks
Integration tests: real database, real data flows
E2E tests:         critical paths only (login, checkout, core flow)

"Testing is like quality control on a car assembly line. You don't test the whole car every time you tighten one bolt. You test the bolt, then the subsystem, then the full car before it ships."

Unit Tests — Pure Logic Only

// Perfect unit test — pure function, no external dependencies
test('formats currency correctly', () => {
  expect(formatCurrency(1000, 'USD')).toBe('$1,000.00')
})

Belongs here: validators, formatters, reducers, pure utility functions.
Doesn't belong here: anything touching a database, API, filesystem, or network.
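
For completeness, one possible implementation of the `formatCurrency` function under test. This is a sketch: the `en-US` locale is an assumption, since the test only shows the expected output.

```typescript
// Hypothetical implementation of the formatCurrency under test.
// The "en-US" locale is an assumption; adjust it to your product's locale.
function formatCurrency(amount: number, currency: string): string {
  return new Intl.NumberFormat("en-US", {
    style: "currency",
    currency,
  }).format(amount);
}
```

It is pure: same inputs, same output, nothing external touched, which is what makes it a good unit-test target.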

Integration Tests — Real Database, Always

"A flight simulator is great for training. But the first time you land a real plane, you discover the simulator lied about the crosswind."

Mocked database tests can pass even when the real migration fails. Integration tests must use a real test database. Yes, they're slower. That's the point: they're testing real behavior.

E2E Tests — Critical Paths Only

E2E opens a real browser and clicks through the UI. Most accurate, slowest, most brittle.

"You don't test every road in the country to verify the highway exists. You just drive the highway."

E2E for: login, checkout, the one flow that makes you money. Not for edge cases — those belong in unit and integration tests.

How These Three Work Together

When Claude builds a new feature with these files loaded, it automatically:

Creates /features/feature-name/ — not scattered layer folders
Starts with useState, reaches for Zustand only if state needs sharing
Writes unit tests for logic, integration tests for data layer
Never adds barrel exports or server data to Zustand

You don't say any of this. It's already loaded.

"The best rule is one you never have to repeat."

Key Takeaway

User preference files transform Claude from a generic code generator into a collaborator that builds code the way you would build it.

Architecture: feature-based, co-located, no barrel exports
State: useState → Zustand → TanStack Query, never server state in Zustand
Testing: unit for logic, real DB for integration, E2E for critical paths only

Three files. They shape every feature.

Next: Part 6 — Context is King: How Project Files and Templates Keep Claude on Track

Battle Scars as Rules: Inside the Feedback Files

Abhishek Pandit — Wed, 10 Jun 2026 14:52:10 +0000

Part 4 of 7 · Series: Building Your AI Developer Handbook · GitHub

What Are Feedback Files?

Every rule in a feedback file was born from either a mistake or a win.

"The safety rules on a construction site weren't written by lawyers. They were written in blood — each rule marks the spot where something went wrong."

A feedback file looks like this:

**Rule:** [what to do or not do]
**Why:** [the incident or reason behind it]
**Apply:** [when and where this kicks in]

The Why is the most important part. Without it, you follow rules blindly. With it, you can judge edge cases — "does this situation actually trigger this rule, or is it different enough?"

Here are 8 real feedback rules, what they say, and the story behind each one.

Rule 1: Never Mock the Database in Tests

Rule: Integration tests must hit a real database. No mocks.
Why: Mocked tests passed. Prod migration broke.
Apply: Never mock the DB layer in integration tests.

"A fire drill with a fake fire teaches you nothing about real smoke."

Tests were passing. The mock was set up correctly. The code looked clean. But the mock didn't simulate a specific edge case in the actual database — a constraint that only exists in production. The migration ran, the constraint triggered, and production broke while all tests were green.

The lesson: mocks are lying witnesses. They tell you what you told them to say, not what the real system does.

For unit tests (pure functions, no side effects) — mocks are fine. For integration tests that touch data — always use a real test database.

Rule 2: No useCallback/useMemo by Default

Rule: Don't add useCallback/useMemo unless:
  1. The child is wrapped in React.memo
  2. React Profiler shows a real re-render problem
Why: Premature optimization clutters code and rarely helps.
Apply: Strip default memoization in code review.

"Putting armor on a bicycle because 'it might get hit by a car' doesn't make it safer — it just makes it heavier and harder to ride."

Developers add useCallback/useMemo "just in case" — wrapping every handler by default. Result: harder-to-read code with reference tracking overhead, solving performance problems that don't exist.

React's default re-render behavior is fast. Most re-renders take under 1ms. The profiler will tell you when something is actually slow. Optimize then — not before.

Rule 3: Never Write API Tokens Into Config Files

Rule: Never write tokens, passwords, or secrets into settings.json or any config file.
Why: Config files are plaintext. Plaintext gets committed. Committed secrets get stolen.
Apply: Use .zshrc, secrets managers, or wrapper scripts instead.

"Writing your password on a sticky note is convenient until someone walks past your desk."

settings.json files are convenient — set them once, forget. But they're plaintext files that often end up in git. Bots scan GitHub constantly for exposed credentials. An exposed key can be exploited within minutes.

The safe pattern:

Config files → key names only, no values
.env → values, never committed, in .gitignore
Shell profile (.zshrc) → exported env vars that tools pick up automatically
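
On the code side, the safe pattern can be sketched as a small helper that reads a secret by name and fails fast if it is missing. `requireEnv` and the `API_TOKEN` name are hypothetical; the environment is passed in as an argument so the function is easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Reads a required secret by name and fails fast when it is missing,
// so no fallback value ever needs to be hardcoded in the codebase.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node process this would be called as: requireEnv("API_TOKEN", process.env)
```

The error message names the variable but never echoes a value, so it is safe to log.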

Rule 4: CLAUDE.md — Global vs Project

Rule: Global CLAUDE.md = universal rules only. Stack-specific rules go in project CLAUDE.md.
Why: Hardcoding React/TypeScript rules globally broke Python project sessions.
Apply: Reject stack-specific content in global CLAUDE.md.

"The company handbook says 'be on time.' It doesn't say 'use a blue pen' — that's a department rule."

One CLAUDE.md tried to cover everything: TypeScript rules, React patterns, pnpm commands, Zod schemas. Then a Python project opened. Suddenly Claude was suggesting pnpm commands for a pip project.

The fix: two levels. Global = who you are. Project = what this stack is.

Rule 5: Dependency Protocol — Check Before You Add

Rule: Before any new dependency:
  1. bundlephobia → check bundle size impact
  2. pnpm audit → check known vulnerabilities
  3. Repo activity → last commit within 1 year?
Why: Dependencies are long-term liabilities.
Apply: Include audit command whenever suggesting pnpm add.

"Adopting a pet is easy. Feeding it for 10 years is the commitment you're actually making."

Every dependency you add is a dependency you maintain — security patches, breaking changes, version conflicts. A package that solves a problem today but gets abandoned tomorrow is a liability.

Three checks, 2 minutes, every time. The cost of skipping: potentially hours debugging a supply chain issue.
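
Two of the three checks are objective enough to encode. A sketch, assuming you have already gathered the numbers by hand or from tooling; `DependencyReport` and `dependencyConcerns` are hypothetical names, and bundle size is left out because an acceptable budget is a per-project judgment.

```typescript
// Hypothetical summary of what the checks found for one package.
interface DependencyReport {
  name: string;
  knownVulnerabilities: number; // e.g. the count reported by an audit
  lastCommit: Date;
}

const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Lists reasons to hesitate before adding the dependency.
function dependencyConcerns(report: DependencyReport, now: Date): string[] {
  const concerns: string[] = [];
  if (report.knownVulnerabilities > 0) {
    concerns.push(`${report.knownVulnerabilities} known vulnerabilities`);
  }
  if (now.getTime() - report.lastCommit.getTime() > ONE_YEAR_MS) {
    concerns.push("no commits in the last year");
  }
  return concerns;
}
```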

Rule 6: Error Handling — Never Swallow Silently

Rule: Every catch block must at minimum console.error(err).
      User-visible errors go to toast or error boundary.
Why: Silent failures look like missing features — hardest bugs to diagnose.
Apply: Never write catch (e) { /* ignore */ }

"A smoke alarm with dead batteries doesn't mean there's no fire. It means you won't know about it until it's too late."

catch (e) {} — empty catch blocks are the most dangerous pattern in production code. An error occurred. Something failed. And you told the computer to pretend nothing happened.

The user sees a broken UI with no error message. You see no logs. The error is invisible. You spend three hours debugging something that would have taken three seconds with a console.error.

Minimum: log it. Better: show a toast. Best: let it bubble to an error boundary.
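
A sketch of the "log it, then surface it" minimum. `tryOrReport` is a hypothetical helper, and `notify` stands in for whatever toast function your UI uses; it is not a real library API. The logger defaults to `console.error` as the rule requires.

```typescript
// Runs an action; on failure, logs the error and surfaces a message to the user.
function tryOrReport<T>(
  action: () => T,
  notify: (message: string) => void,
  log: (err: unknown) => void = console.error,
): T | undefined {
  try {
    return action();
  } catch (err) {
    log(err); // minimum: never swallow silently
    notify(err instanceof Error ? err.message : "Something went wrong");
    return undefined;
  }
}
```

For errors that should halt rendering entirely, rethrowing so that an error boundary catches them is the better option; this helper covers the recoverable case.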

Rule 7: Skip Removed Tools Silently

Rule: If a tool has been deliberately removed, never suggest or reference it.
Why: The removal was intentional. Asking about it is friction.
Apply: Skip silently, continue with available tools.

"If a chef removes an ingredient from their kitchen, they don't want the sous-chef asking 'but what about the cilantro?' every time they cook."

Sometimes you remove an integration — for cost, privacy, or preference. The decision was deliberate. You don't want Claude asking about it or suggesting you re-add it every session.

Rule 8: Separate Work and Personal Git Identities

Rule: Always use the correct git identity (work vs personal) for the project context.
      Never add AI co-author lines to commits.
Why: Work and personal projects need separate attribution.
Apply: Confirm user.email matches context before every commit.

"You wouldn't sign a personal letter with your work signature."

Two contexts, two identities. The wrong email in a commit doesn't just look sloppy — it can create audit trail problems. The co-author rule: commit history should reflect human authorship decisions.

Building Your Own Feedback Files

---
name: feedback_rule_name
description: one-line summary
metadata:
  type: feedback
---

**Rule:** [what Claude should do or not do]

**Why:** [the story — what happened or what you observed]

**How to apply:** [when does this rule trigger]

The trigger for writing a new one: if you've corrected Claude twice for the same thing, it belongs in a feedback file.

"The first mistake is an accident. The second is a pattern. The third is a choice."

Key Takeaway

Feedback files are the institutional memory of your AI workflow. They capture hard-won knowledge that would otherwise reset every session. Each rule has a story. Each story prevents a future mistake.

Next: Part 5 — Your Coding DNA: The Three Files That Shape Every Line Claude Writes

Teaching an AI to Never Forget: How the Memory System Works

Abhishek Pandit — Wed, 10 Jun 2026 14:45:23 +0000

Part 3 of 7 · Series: Building Your AI Developer Handbook · GitHub

The Goldfish Problem

By default, every Claude session starts completely fresh. No memory of last week's conversation. No memory of the rule you explained three times. No memory of the mistake you made together and fixed together.

"Imagine if your doctor forgot everything about you every time you walked into the clinic. You'd spend 10 minutes re-explaining your history before they could help you."

That's Claude without memory files.

The memory system fixes this. It's a folder of plain markdown files that Claude reads at the start of every session — carrying forward the lessons, preferences, and decisions that would otherwise reset.

Where Memory Lives

~/.claude/projects/your-project/memory/
  MEMORY.md              ← the index (loads automatically every session)
  feedback_rules.md      ← lessons from mistakes
  user_preferences.md    ← how you like to work
  project_context.md     ← what's happening in the project right now
  reference_links.md     ← where to find things outside the project

Think of it like a filing cabinet:

MEMORY.md is the table of contents — one line per memory, always loaded
Each individual file is a folder in the cabinet — loaded when relevant

The Four Types of Memory

1. Feedback Memory — "Rules Born From Mistakes"

"Every rule in a safety manual was written in response to an accident." — Aviation saying

These files store corrections and confirmations. Every time Claude does something you didn't want — or something you want it to repeat — that lesson becomes a feedback file.

Example: You discover mocked database tests passed while a real production migration failed. You tell Claude: "never mock the database in tests." That becomes a rule that applies to every future session.

**Rule:** Integration tests must hit a real database.
**Why:** Mocked tests passed; prod migration broke.
**Apply:** Never suggest mocking the DB layer in tests.

2. User Memory — "How You Think and Work"

"A good assistant doesn't just do tasks — they understand how their manager thinks."

These files capture your preferences, style, and philosophy — folder structure, state management approach, testing philosophy. Claude reads these and adjusts its suggestions to match your way of working.

3. Project Memory — "What's Happening Right Now"

"Context switches are expensive. The more context you can offload to a file, the less you re-explain every session."

These files track ongoing work — what feature is being built, why a particular decision was made, what the deadline is. Without these, you start every session re-explaining the backstory.

4. Reference Memory — "Where to Look Things Up"

Simple pointers to external resources:

"Bug tracker is at Linear project INGEST"
"Design tokens are in Figma file XYZ"
"Oncall dashboard is at grafana.internal/api-latency"

"A good assistant knows where the filing cabinet is. They don't memorize every document — they know where to look."

The MEMORY.md Index

# Memory Index

- [No DB mocks](feedback_db_testing.md) — integration tests use real DB; mocks missed prod migration bug
- [State ladder](user_state_management.md) — useState→Zustand→TanStack Query; no server state in Zustand
- [Current feature](project_auth_sprint.md) — building OAuth login, deadline 2026-06-20

This is the only file that loads automatically on every session. Intentionally short — one line per memory.

"A good index tells you whether you need to open the drawer. You shouldn't need to read the whole drawer just to find out."

Each line has three parts:

The name (the link text)
The file it points to (the full memory file)
A one-line hook (enough to know if it's relevant right now)
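Because every index line follows the same shape (name, file, hook), it is easy to parse. This is a rough TypeScript sketch, not how Claude reads the file; `MemoryIndexEntry`, `INDEX_LINE`, and `parseMemoryIndex` are made-up names, and the pattern assumes the separator is either the long dash from the sample index or a plain hyphen.

```typescript
// One parsed line of MEMORY.md (hypothetical shape).
interface MemoryIndexEntry {
  name: string;
  file: string;
  hook: string;
}

// Matches "- [name](file.md) <dash> hook".
// \u2014 is the long dash used in the sample index; a plain hyphen is also accepted.
const INDEX_LINE = /^-\s+\[([^\]]+)\]\(([^)]+)\)\s+[\u2014-]\s+(.+)$/;

// Returns one entry per well-formed index line; headings and other lines are skipped.
function parseMemoryIndex(markdown: string): MemoryIndexEntry[] {
  const entries: MemoryIndexEntry[] = [];
  for (const line of markdown.split("\n")) {
    const match = INDEX_LINE.exec(line.trim());
    if (match) {
      entries.push({ name: match[1], file: match[2], hook: match[3].trim() });
    }
  }
  return entries;
}
```

A parser like this is mostly useful as a lint step: if a line in the index doesn't parse, it probably doesn't follow the one-line format.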

How It Works in Practice

Session 1 (no memory):

You: "How should I structure the auth feature?"
Claude: [gives generic advice based on training data]
You: "I use feature-based folders, not layer-based"
Claude: [adjusts, gives feature-based advice]

Same question, Session 2 (with memory):

[Claude reads: "feature-based folders — /features/auth/, /features/payment/"]
You: "How should I structure the auth feature?"
Claude: [immediately gives feature-based advice — no re-explaining needed]
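To make "loaded when relevant" concrete, here is a toy TypeScript sketch that picks index entries whose name or hook shares a keyword with the question. This is an illustrative stand-in only: the article doesn't describe how Claude actually decides which files to open, and `MemoryEntry`, `keywords`, and `relevantEntries` are hypothetical names.

```typescript
// One index entry (hypothetical shape).
interface MemoryEntry {
  name: string;
  file: string;
  hook: string;
}

// Lowercases the text and keeps alphanumeric words longer than three characters.
function keywords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 3);
}

// Keeps entries that share at least one keyword with the question.
// Assumption: plain keyword overlap, chosen only to keep the example small.
function relevantEntries(question: string, entries: MemoryEntry[]): MemoryEntry[] {
  const wanted = keywords(question);
  return entries.filter((entry) =>
    keywords(`${entry.name} ${entry.hook}`).some((word) => wanted.indexOf(word) !== -1),
  );
}
```

With the Session 2 question, an entry whose hook mentions feature-based folders and `/features/auth/` would be selected, while an unrelated testing rule would not.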

What NOT to Put in Memory

Memory files are for things that aren't in the code itself.

Don't memorize	Where it already exists
Code patterns	They're in the code
Git history	`git log` knows
File structure	Read the filesystem
Debugging solutions	Fix is in the commit

"A post-it note on your monitor says 'check Figma before coding UI.' It doesn't copy the entire Figma file onto the post-it."

Memory files decay. The older they are, the more likely they describe something that's changed. The Why field is what tells you whether a memory is still load-bearing.
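One way to act on decay is to flag old memories for a second look. The TypeScript sketch below assumes each memory records a `lastReviewed` date, which is a hypothetical field that the article's files don't define. Age only tells you what to re-read; the Why field still decides whether the memory stays.

```typescript
// A memory file plus the date it was last checked (hypothetical field).
interface DatedMemory {
  file: string;
  lastReviewed: string; // ISO date such as "2026-01-15", parsed as UTC
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the files whose last review is more than maxAgeDays before `today`.
function staleMemories(memories: DatedMemory[], today: Date, maxAgeDays: number): string[] {
  return memories
    .filter((m) => (today.getTime() - new Date(m.lastReviewed).getTime()) / DAY_MS > maxAgeDays)
    .map((m) => m.file);
}
```

Passing `today` in as an argument, rather than calling `new Date()` inside, keeps the function deterministic and easy to test.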

Starting Your Own Memory System

Start with one file: feedback.md. Every time you correct Claude, add a line:

- Don't use barrel exports in large modules — slows TS server
- Always check bundlephobia before adding a dependency  
- State management order: useState first, Zustand only if shared

Over a few weeks you'll have a precise record of how you like to work. When the file gets large, split it into separate files, then build the MEMORY.md index with one line per memory.

Key Takeaway

Memory files turn Claude from a stateless tool into a contextual collaborator.

The difference between "Claude that needs re-teaching every session" and "Claude that gets better the more you use it" is a folder of markdown files.

"Experience is just lessons that were written down. Wisdom is lessons that were read again."

Next: Part 4 — Battle Scars as Rules: Inside the Feedback Files

All workflow files on GitHub