DEV Community

Nova

Prompt Acceptance Criteria: the fastest way to get reliable AI outputs

If you’ve ever thought “the model ignored what I said”, there’s a good chance you didn’t actually say it in a way that can be checked.

Most prompts describe intent (what you want) but skip acceptance criteria (how to tell it’s done). In software, we don’t ship “make it nicer” — we ship “passes these tests”. Prompting works the same way.

This post is a practical pattern I use constantly:

Write the acceptance criteria first. Then write the prompt.

It sounds boring. It’s also the fastest way to turn flaky outputs into repeatable results.


What “acceptance criteria” means for prompts

Acceptance criteria are verifiable requirements for the output. They answer:

  • What must be included?
  • What must be excluded?
  • What format should the answer have?
  • What quality bar should it meet?
  • How will I quickly review it?

When you include these, you’re not “overprompting”. You’re giving the model a target it can aim at.

A simple mental model:

  • Intent: “Summarize this doc.”
  • Criteria: “7 bullets, each starts with a verb, cite section numbers, no more than 140 characters per bullet.”

Intent gets you something. Criteria get you something you can ship.


The 5-part template (copy/paste)

Here’s a template that stays short but does real work:

Task:
- …

Context:
- …

Acceptance criteria:
- Must …
- Must …
- Must not …

Output format:
- …

Self-check:
- Before answering, verify each acceptance criterion is satisfied. If not, revise.

Two notes:

  1. “Must not” is underrated. It prevents the most common failure mode: the model “helpfully” adding fluff.
  2. Self-check isn’t magic, but it nudges the model to re-read the spec before it hits send.

Example 1: Turn meeting notes into an actionable update

Vague prompt

Turn these notes into a status update.

You’ll get a wall of text, inconsistent structure, and “maybe” language.

Spec-driven prompt

Task:
Turn the meeting notes below into a status update for Slack.

Context:
- Audience: engineering + product
- Tone: calm, confident, factual

Acceptance criteria:
- Exactly 6 bullet points
- Each bullet starts with a bold label: **Done**, **In progress**, **Risk**, **Decision**, **Next**, **Ask**
- Mention owners by @name when present in the notes
- Do not invent dates, owners, or scope

Output format:
- Plain text (no markdown headings)

Self-check:
- Verify each bullet is present and no info was invented.

Notes:
[PASTE NOTES HERE]

Why this works:

  • The model is forced into a stable shape.
  • Review becomes fast: you scan 6 bullets and ship.
  • The “do not invent” rule prevents the most expensive kind of hallucination: made-up commitments.
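Criteria this concrete can even be checked mechanically. A rough sketch, assuming the update arrives as plain text with one bullet per line and the labels in the listed order:

```python
LABELS = ["Done", "In progress", "Risk", "Decision", "Next", "Ask"]

def check_update(text):
    """Return a list of violated criteria for the status-update spec."""
    problems = []
    bullets = [line for line in text.strip().splitlines() if line.strip()]
    if len(bullets) != 6:
        problems.append(f"expected exactly 6 bullets, got {len(bullets)}")
    # Each bullet must open with its bold label, e.g. **Done**
    for label, line in zip(LABELS, bullets):
        if not line.startswith(f"**{label}**"):
            problems.append(f"bullet should start with **{label}**: {line!r}")
    return problems
```

The "do not invent" rule can't be checked this way, of course — but the countable half of the spec can, which is exactly what makes review fast.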

Example 2: Refactor code without breaking behavior

A classic failure is “refactor this” → the model changes semantics.

Spec-driven refactor prompt

Task:
Refactor the function below for readability.

Context:
- Language: TypeScript
- Keep external behavior identical

Acceptance criteria:
- No change to function signature
- No change to return values for any input
- Keep time complexity the same or better
- Add 3 short comments explaining non-obvious logic
- Provide a minimal test snippet with 4 cases (happy path + edge cases)

Output format:
- 1) Refactored code
- 2) Test snippet
- 3) Brief explanation (max 120 words)

Code:
[PASTE CODE]

Two things happen here:

  1. You’ve defined “readability” in concrete terms (comments + tests).
  2. You’ve created a safety rail: signature + return values + complexity.

Even if you don’t run the tests, forcing the model to write them makes it less likely to drift.
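The "no change to return values" criterion is the one most worth automating. Sketched in Python rather than TypeScript for brevity, a spot-check that runs the old and new versions over the same inputs — the two `slugify` functions are invented examples, not from any real codebase:

```python
def slugify_old(s):
    # Original: trim, lowercase, replace each space with a hyphen
    return s.strip().lower().replace(" ", "-")

def slugify_new(s):
    # Refactored candidate: looks equivalent, but collapses runs of spaces
    return "-".join(s.lower().split())

def behavior_matches(old, new, cases):
    """Return the inputs where the refactor changed the output."""
    return [c for c in cases if old(c) != new(c)]
```

Note that an input with a double space exposes the drift: the old version emits two hyphens, the new one collapses them. That's precisely the kind of semantic change a "refactor this" prompt lets slip through.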


Example 3: Research without the confident nonsense

Research prompts often fail because they ask for “the answer” instead of “the evidence”.

Spec-driven research prompt

Task:
Give me a quick comparison of 3 approaches to X.

Context:
- I’m deciding what to implement this week.

Acceptance criteria:
- Provide 3 options, each with: pros, cons, when to use, and a 1-sentence risk
- Include at least 1 primary source link per option (docs, RFC, official blog)
- If you are unsure about a claim, label it as uncertain
- Do not include secondary "SEO" roundup articles unless they add unique details

Output format:
- A table, then a 5-bullet recommendation

Topic:
X = [DEFINE X]

This doesn’t stop mistakes entirely, but it changes the model’s job from “perform certainty” to “show your work”.


A quick rubric for writing good criteria

When you’re stuck, use this checklist:

  1. Countable: Can I count it? (bullets, steps, examples, words)
  2. Constrained: What should it not do?
  3. Comparable: If I got two answers, can I pick the better one quickly?
  4. Checkable: Can I verify without additional back-and-forth?
  5. Cheap to review: Can I approve it in under a minute?

If your criteria fail #5, you’ll stop using them. Keep them tight.


The “two-pass” upgrade (optional)

For high-stakes outputs, I add one more move:

  1. Pass 1: produce the output.
  2. Pass 2: run a criteria audit and patch the gaps.

You can do it in a single prompt:

First, produce the output.
Then, list each acceptance criterion with ✅/❌ and revise the output until all are ✅.
Only return the final output.

It’s not perfect, but it’s a practical way to catch missing sections and format drift.
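The two-pass idea can also live outside the prompt, with the audit in code. A minimal sketch; `model` here is any callable from prompt text to output text (a stand-in for your LLM client, not a real API), and `checks` maps each criterion name to a predicate:

```python
def two_pass(model, prompt, checks, max_rounds=3):
    """Generate, audit against the checks, and request revisions until clean.

    `checks` maps a criterion name to a predicate over the output text.
    """
    output = model(prompt)
    for _ in range(max_rounds):
        failed = [name for name, ok in checks.items() if not ok(output)]
        if not failed:
            break  # all criteria pass
        # Feed the failures back as a revision request
        output = model(prompt
                       + "\n\nRevise: the draft failed these criteria: "
                       + ", ".join(failed)
                       + "\n\nDraft:\n" + output)
    return output
```

The loop is bounded (`max_rounds`) so a criterion the model can never satisfy doesn't spin forever — you get the best attempt back and can decide by hand.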


Closing thought

Prompting gets dramatically easier when you stop asking for “a good answer” and start asking for an answer that passes a spec.

Next time you’re about to write a prompt, try this:

  • Write 3–7 acceptance criteria.
  • Add one “must not”.
  • Define the output format.

You’ll spend 30 seconds more upfront and save minutes of cleanup — every single time.

Top comments (1)

Hamza KONTE

"Acceptance criteria for prompts" is a framing I hadn't heard before but it makes total sense — you need a definition of done that isn't just "it looks okay to me".

I've been approaching this by decomposing prompts into typed blocks (role, constraints, examples, output format) so each piece can be verified independently. Harder to have a vague constraint when you've isolated it to its own field. Built flompt.dev around this idea — repo at github.com/Nyrok/flompt.