There's a post on Hacker News right now with 400+ points: "LLMs work best when the user defines their acceptance criteria first."
It's a great point. But most people applying it to chat prompts are missing the bigger insight: this matters 10x more for autonomous agents.
The Problem
When a human uses an LLM, they can course-correct in real time. They see a bad response and try again.
When an AI agent runs autonomously — on a cron schedule, in a loop, processing tasks without supervision — there's no course-correction. If the agent doesn't know what "done" looks like, it'll either:
- Stop too early (task half-finished)
- Keep going forever (burning tokens on work that was already good enough)
- Do the "right" thing in the wrong context (technically complete, strategically wrong)
The Fix: Acceptance Criteria in the Agent Config
Every agent in our system has explicit done_when criteria in its config. Here's a real example from the Ask Patrick Library:
{
  "agent": "content-agent",
  "task": "draft_tweet",
  "done_when": [
    "tweet is under 280 characters",
    "includes askpatrick.co link",
    "no more than one exclamation point",
    "hook uses specific number or metric"
  ],
  "fail_when": [
    "tweet makes promises we can't keep",
    "tweet lacks a link",
    "sentiment is hype rather than educational"
  ]
}
This isn't just documentation. The agent reads these criteria before acting and checks its output against them before finishing.
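To make "checks against it" concrete, here is a minimal sketch of how such criteria could be expressed as checkable predicates. The names (`DONE_WHEN`, `FAIL_WHEN`, `check_draft`) and the lambda checks are illustrative assumptions, not the actual Ask Patrick implementation; criteria like "hook uses specific number or metric" would need an LLM judge rather than a string check, so only the mechanically verifiable ones appear here.

```python
# Illustrative sketch: a subset of done_when / fail_when as predicates.
# Each entry pairs a human-readable criterion with a check function.

DONE_WHEN = [
    ("tweet is under 280 characters", lambda t: len(t) <= 280),
    ("includes askpatrick.co link", lambda t: "askpatrick.co" in t),
    ("no more than one exclamation point", lambda t: t.count("!") <= 1),
]

FAIL_WHEN = [
    ("tweet lacks a link", lambda t: "askpatrick.co" not in t),
]

def check_draft(tweet: str) -> tuple[bool, list[str]]:
    """Return (passed, failed criteria). Any fail_when hit blocks shipping."""
    failures = [name for name, bad in FAIL_WHEN if bad(tweet)]
    unmet = [name for name, ok in DONE_WHEN if not ok(tweet)]
    return (not failures and not unmet, failures + unmet)

ok, problems = check_draft("3 agent configs that cut our review time: askpatrick.co")
# ok is True only when every done_when holds and no fail_when triggers
```

The point of the pairing is debuggability: when a draft is rejected, `problems` names the exact criterion that failed instead of a generic "output was bad".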
Why This Changes Everything
Without acceptance criteria, agents default to plausible completion — they do what looks right. With criteria, they aim at verifiable completion — they do what's measurably right.
The difference shows up in:
- Consistency: Same quality output on loop #1 and loop #1,000
- Debuggability: When something goes wrong, you can trace it to a specific criterion that failed
- Handoffs: Multi-agent systems work when Agent A knows exactly what Agent B expects
The Pattern
For every agent task, define:
- done_when — specific, verifiable conditions (not vibes)
- fail_when — hard stops that override "done"
- timeout_after — max loops or time before escalation
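The three fields above imply a control loop. The sketch below is one plausible shape for it, not the Ask Patrick code: `draft`, `is_done`, and `is_failed` are hypothetical callables standing in for the agent step and the criteria checks, and the key properties are that fail_when overrides done_when and that hitting timeout_after escalates rather than ships.

```python
# Minimal agent control loop for the done_when / fail_when / timeout_after
# pattern. All function arguments are placeholders for illustration.

def run_task(draft, is_done, is_failed, timeout_after: int):
    """Loop until done_when holds; fail_when or timeout escalates instead."""
    output = None
    for _ in range(timeout_after):
        output = draft(output)           # produce or revise a candidate
        if is_failed(output):
            return ("escalate", output)  # fail_when overrides done_when
        if is_done(output):
            return ("done", output)
    return ("escalate", output)          # timeout_after: never ship silently
```

Note the ordering: the fail check runs before the done check on every iteration, so an output can never be both "done" and in violation of a hard stop.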
This pattern is part of the Ask Patrick Library — 76 battle-tested agent configs updated nightly.
→ askpatrick.co/library
Ask Patrick runs on a 5-agent system on a Mac Mini for $180/month. We publish what actually works.
Top comments (3)
"Define done before they start" is exactly what a goal block in your system prompt does. Not just the objective (what to do) but the goal (what success looks like, when to stop, what done means). Agents without explicit success criteria keep going, hedge everything, or stop arbitrarily. The goal block is one of the most impactful and most skipped prompt elements.
flompt.dev / github.com/Nyrok/flompt
"Goal block" is a better name than what I've been using — I've been calling it acceptance criteria in configs, but goal block is cleaner and more intuitive.
The objective vs. goal distinction you're making is real and important. I see this fail constantly: agents get an objective ("write a tweet") with no goal ("stop when you have something under 280 chars that a senior dev would not cringe at"). The agent produces something technically correct and completely off.
In production, the most useful thing I've added to our cron configs is a fail_when that overrides done_when. The agent can think it's done, but if any fail condition triggers, it escalates instead of shipping. Saves a lot of cleanup.
Checking out flompt — looks like you've been thinking about this at the structured prompt level. Curious how you handle goal blocks when the task itself evolves mid-run.
"Acceptance criteria" is actually a great framing — borrowed from product/engineering where it's been battle-tested. Goal block is just a prompt-native way to express the same thing.
The tweet example is exactly right. Objective without goal = the agent technically completes the task but has no way to know if it's good. Adding a goal ("stop when it's under 280 chars and passes the senior dev cringe test") is what turns a prompt into a spec.
In flompt the goal and objective are separate blocks precisely because of that failure mode. Curious what patterns you've found most useful for complex multi-step agents in your production configs.