Nova

The Triage Prompt: Debug Your Assistant’s Output in 5 Minutes

You know the feeling: you ask your assistant for something “simple” — a shell one-liner, a refactor, a short spec — and the output is… almost right.

Close enough to be tempting.

Wrong enough to waste the next 30 minutes in back-and-forth.

Over time I noticed a pattern: when an assistant misses, it usually misses in predictable ways. And you can surface the root cause quickly if you triage the failure like you would a bug report.

This post is a practical 5-minute routine (plus a ready-to-copy prompt) I use to get from “hmm, not quite” to “okay, now we’re cooking” — without spiraling into an endless chat.


The core idea: treat a bad answer like a failing test

When humans debug, we don’t just say “it’s broken.” We try to answer:

  • What exactly is wrong?
  • What did we expect instead?
  • What constraints were assumed but not stated?
  • What’s the smallest example that reproduces it?

Assistants respond well to the same structure.

The “triage prompt” below forces three things:

  1. A precise problem statement (expected vs actual)
  2. A minimal reproducible example (MRE) or smallest input
  3. A plan of attack before proposing changes

That combination cuts hallucinated detours dramatically.


The 5-minute triage routine

Minute 1: Write the failure in one sentence

This is the single highest-leverage step.

Bad: “This doesn’t work.”

Good: “The command deletes nested folders; I need it to delete only empty folders.”

Good: “The refactor changed behavior: null should remain null, not become "".”

If you can’t describe the failure in one sentence, the assistant can’t either.

Minute 2: Provide a tiny example input/output

Don’t paste your whole repo. Don’t paste your whole log.

Give the assistant something it can simulate in its head.

Example (date parsing bug):

  • Input: "2026-03-07 16:30"
  • Expected: 2026-03-07T16:30:00+01:00
  • Actual: Invalid Date
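A tiny example like this can often be turned directly into a runnable check. Here's a minimal Node.js sketch, assuming the eventual fix is to normalize the space into a `T` so the string is ISO-8601-like (the offset you actually get depends on the machine's local timezone, so the check only asserts that the parse succeeds):

```javascript
// The MRE from above, as a runnable check.
const input = "2026-03-07 16:30";

// Assumed fix: normalize "YYYY-MM-DD HH:mm" to "YYYY-MM-DDTHH:mm"
// so the Date constructor can parse it reliably.
const parsed = new Date(input.replace(" ", "T"));

// Guard against the "Actual: Invalid Date" failure mode.
console.assert(!Number.isNaN(parsed.getTime()), "should parse to a valid Date");
```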

Example (SQL query):

  • Tables: users(id), orders(user_id, created_at)
  • Expected: users with no orders in last 30 days
  • Actual: query returns users with orders
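The SQL example is really about semantics: "users with no orders in the last 30 days" is an anti-join, not an inner join. A sketch of the intended behavior over in-memory data (the field names mirror the tables above; the dates are illustrative) makes the expectation unambiguous:

```javascript
// "Users with no orders in the last 30 days" as an anti-join over arrays.
const now = new Date("2026-03-07T00:00:00Z");
const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;

const users = [{ id: 1 }, { id: 2 }];
const orders = [{ user_id: 1, created_at: new Date("2026-03-01T00:00:00Z") }];

// Keep users for whom NO recent order exists (anti-join),
// instead of joining users to their orders (which returns user 1).
const inactive = users.filter(u =>
  !orders.some(o => o.user_id === u.id && now - o.created_at <= THIRTY_DAYS)
);
// inactive → [{ id: 2 }]
```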

Minute 3: Lock the constraints (the silent assumptions)

Most “wrong answers” trace back to missing constraints.

Write the assumptions as bullet points:

  • OS: macOS / Ubuntu / Windows
  • Tooling: bash vs zsh, ripgrep vs grep, Python 3.12, Node 22
  • Safety: no destructive actions, dry-run first
  • Style: keep changes minimal, keep public API stable

If you care about any of these and you don’t state them, the assistant will guess.

Minute 4: Ask for a plan + checks before the solution

This is the second-highest-leverage step.

Instead of “fix it”, ask:

  • “Explain what’s wrong.”
  • “List 2–3 candidate fixes.”
  • “Pick one and tell me what you’ll verify.”

This forces the assistant to reason in steps (and gives you a chance to stop a bad direction early).

Minute 5: Add one negative test (“don’t break this”)

If you only give one example, the assistant will overfit.

Add a second example that must not change.

Example (regex extraction):

  • Case A should match: ABC-123
  • Case B should not match: ABC-1234

Example (API mapping):

  • null stays null
  • empty string stays empty string

These “guardrail examples” dramatically improve reliability.
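The regex pair above is the guardrail idea in its purest form: one case that must match, one that must not. Written as assertions (the pattern itself is an illustrative assumption — exactly three digits, anchored):

```javascript
// Guardrail pair: Case A must match, Case B must not.
const pattern = /^ABC-\d{3}$/; // anchored, so ABC-1234 can't sneak in

console.assert(pattern.test("ABC-123") === true, "Case A should match");
console.assert(pattern.test("ABC-1234") === false, "Case B should not match");
```

Without the anchors (or a word boundary), `ABC-\d{3}` would happily match the first seven characters of `ABC-1234` — exactly the overfitting this step is designed to catch.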


The copy/paste Triage Prompt

Use this when an output is close-but-wrong:

You’re helping me debug your previous answer.

1) Restate the task in your own words.
2) Identify what is wrong with the previous output (expected vs actual).
3) Ask up to 3 clarifying questions ONLY if needed.
4) Propose 2–3 candidate approaches, with pros/cons.
5) Pick the safest approach that fits the constraints.
6) Produce the corrected output.
7) Verify against the examples and the “do-not-break” cases.

Context:
- Goal: <one sentence>
- Constraints:
  - <bullets>
- Example (expected vs actual):
  - Input:
  - Expected:
  - Actual:
- Do-not-break case:
  - Input:
  - Expected:

Output format:
- Start with a short plan.
- Then the final answer.
- Then a checklist showing the examples were satisfied.

I like this prompt because it makes the assistant commit to a plan, and it makes verification explicit.


Concrete example 1: a safe file cleanup command

Say you asked for a cleanup command and got something scary.

Your triage input

Goal: Delete only empty directories under ./tmp.
Constraints:
- Must be safe: show what will be deleted first.
- Must not delete files.
- Must work on Linux.
Example:
- Input: directory tree under ./tmp
- Expected: only directories with no files inside are removed
- Actual: previous command deleted non-empty directories
Do-not-break:
- If a directory contains a hidden file like .keep, it must NOT be deleted.

A good corrected answer (what you should get)

Plan: list empty dirs → confirm → delete.

# 1) Preview: list truly empty directories
find ./tmp -type d -empty -print

# 2) Delete: only after you’ve reviewed the list
find ./tmp -type d -empty -print -delete

Verification checklist:

  • Only directories: yes (-type d)
  • Only empty: yes (-empty)
  • Hidden file .keep prevents deletion: yes (dir not empty)

Notice how the constraints force a safer, two-step flow.


Concrete example 2: “small refactor, no behavior change”

Refactors are where assistants love to get creative.

Your triage constraints might be:

  • Keep function signature unchanged
  • No new dependencies
  • Preserve behavior for null, undefined, and empty strings
  • Add a small test table in the answer

A strong response will:

  1. Explain what changed behaviorally
  2. Propose a minimal patch
  3. Include a truth table like:
input      expected
null       null
undefined  undefined
""         ""
"0"        "0"
"  "       "  "

If the assistant can’t produce that table, it probably hasn’t understood the edge cases.
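That truth table can double as the verification step itself. A minimal sketch, assuming a hypothetical `formatLabel` function as the refactor target (the name and behavior are illustrative, not from the original post):

```javascript
// Hypothetical function under refactor. The bug class from above:
// a careless rewrite coerces null to "". The fix preserves
// null/undefined instead of stringifying them.
function formatLabel(value) {
  if (value === null || value === undefined) return value; // not ""
  return String(value);
}

// The truth table as executable guardrails: each input maps to itself.
const table = [null, undefined, "", "0", "  "];
for (const input of table) {
  const actual = formatLabel(input);
  console.assert(Object.is(actual, input), `expected ${input}, got ${actual}`);
}
```

Asking the assistant to run its own patch through the table (mentally or literally) is what turns "looks right" into "verified against the edge cases."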


Why this works (and when it doesn’t)

This routine works because it replaces vague feedback (“wrong”) with the artifacts assistants can actually use:

  • a crisp goal
  • constraints
  • examples
  • guardrails

It won’t magically fix:

  • missing domain knowledge
  • tasks that require access to your private environment
  • ambiguous requirements you haven’t decided yet

But it will prevent the common time sinks: thrashing, overfitting to one example, and solutions that ignore constraints.


A small habit that compounds

The next time an answer is 80% correct, don’t restart the conversation.

Triage it.

Write the failure sentence. Add one tiny example. Add one do-not-break case.

You’ll get better answers and you’ll get better at stating requirements — which pays off whether you’re working with an assistant or a human teammate.

If you try this routine, I’d love to hear what kind of failures it fixed fastest for you.

Top comments (1)

Collapse
 
nyrok profile image
Hamza KONTE

The triage approach is the right mental model — debugging AI output is about isolating which part of the prompt is responsible for the bad output, not rewriting everything from scratch. Structured prompts with separate role/objective/constraints/output-format blocks make this much faster: you can swap one block at a time and see exactly what changed. flompt.dev / github.com/Nyrok/flompt