I've watched a lot of engineers buy a 500-prompt pack, use 12 of them, and quietly conclude the pack was a scam.
It usually wasn't. The pack was fine. What broke was the gap between "copy a prompt" and "understand why it works" — because the moment your task drifts even slightly from what the pack author had in mind, you have no idea which knob to turn.
This is a post about the knobs.
The actual problem with prompt packs
A prompt pack is a list of outputs without the function. It's like getting a folder of compiled binaries with no source. Works great when the input matches exactly. Useless the second you need to change one parameter.
The failure mode I see most often:
User: <pastes "Code Reviewer" prompt from a pack>
User: review this function for me
AI: <generic review, misses the actual concerns>
User: ...the pack is bad?
No. The pack assumed a context the AI doesn't have. The reviewer prompt was probably written for a TypeScript backend, you handed it Rust, and the model is now half-pretending it knows your idioms because nobody told it not to.
The fix isn't a better pack. The fix is knowing what every prompt is doing under the hood.
The five things every working prompt does
After reviewing hundreds of prompts that worked and hundreds that didn't, I reduced it to five components. Every prompt that consistently produces quality output has all five. Every prompt that misfires is missing at least one.
I'll use the acronym RCFEO because it sticks: Role, Context, Format, Examples, Output.
1. Role
Not "act as an expert." That's noise. The role component is about constraining the failure modes the model defaults to.
Default GPT/Claude wants to be agreeable, comprehensive, and gentle. Those are bad defaults for code review. Good role framing flips them:
You are a senior backend engineer who has rejected three of my PRs this
month. You are not gentle. You assume my code has bugs until proven otherwise.
You flag concerns by severity (P0/P1/P2) and refuse to file P3+ "nice to have"
feedback unless explicitly asked.
The specificity of "rejected three of my PRs" matters more than "senior backend engineer." The first sets a behavior; the second is a costume.
2. Context
This is the one that breaks pack prompts. A pack prompt has generic context placeholders. Your real context never matches.
Good context is:
- What this code/system actually is ("Rust HTTP server, Axum, Postgres, ~12k LOC")
- What it does NOT do ("no async runtime debugging — that's already locked in")
- What the reader/runner already knows ("I wrote this; you don't need to teach me Rust")
- What the constraint is ("this PR can't change the public API")
The "does NOT" line is the secret. LLMs over-help. They'll suggest refactoring your async runtime when you asked about a SQL query. Cutting their scope upfront saves 60% of the back-and-forth.
3. Format
Most prompts say something like "give me a list." That's an instruction, not a format spec. A format spec looks like this:
Return:
- One <Issue> block per concern
- Each <Issue> has: severity (P0/P1/P2), file:line, 1-sentence summary,
3-sentence rationale, suggested fix as a code diff
- Sort by severity descending, file:line ascending within severity
- Maximum 8 issues. If you find more, return only the top 8.
Why bother? Because once the format is locked, you can pipe the output into a parser, a script, a Linear ticket, a markdown doc. Free-form output is for chat. Structured output is for workflows.
4. Examples
This is the single highest-leverage component and the one most ignored.
One example outperforms five paragraphs of instructions, because the model is, fundamentally, a pattern-completion engine. You're not telling it what you want. You're showing it.
Example of a P1 issue I'd expect:
<Issue>
severity: P1
location: src/db/users.rs:142
summary: SQL query is vulnerable to enumeration timing attack
rationale: The login handler returns different latencies depending on
whether the email exists. An attacker can enumerate registered emails
by measuring response times. This is exploitable in production.
fix:
```diff
- let user = sqlx::query!("SELECT * FROM users WHERE email = $1", email)
- .fetch_optional(&pool).await?;
+ let user = match sqlx::query!(...).fetch_optional(&pool).await? {
+ Some(u) => Some(verify_password(&u.hash, &password)?),
+ None => { dummy_verify(&password)?; None }
+ };
```
</Issue>
A single example like this does what 400 words of "please be detailed and specific" cannot. It pins down severity calibration, format precision, and rationale density in one shot.
5. Output
The final component is the most often skipped: what does the model do when it's done?
Default behavior: keep going. Add caveats. Suggest follow-ups. Apologize for limitations. None of that is what you want.
When finished, output the last </Issue> block and stop.
Do not summarize. Do not suggest follow-ups. Do not ask if I want more.
If you found zero issues worth flagging at P0-P2, output exactly:
NO_ISSUES_FOUND
and stop.
This is also where you put your "escape hatch" — what should the model do when it's uncertain? My default:
If you don't have enough context to assess a concern with confidence,
flag it with severity NEEDS_INFO and list exactly what you'd need to know.
Do not guess.
This one line drops the hallucination rate on review tasks by something like 50% in my experience. Models will guess when they think guessing is the helpful path. Tell them guessing is the unhelpful path.
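On the consuming side, these sentinels are cheap to branch on. A sketch, assuming the exact NO_ISSUES_FOUND and NEEDS_INFO strings from the snippets above (the return labels are hypothetical, just to show the three-way split):

```python
def triage(output: str) -> str:
    """Decide what to do with a review run based on its sentinels."""
    if output.strip() == "NO_ISSUES_FOUND":
        return "pass"            # clean run: nothing to file
    if "severity: NEEDS_INFO" in output:
        return "gather-context"  # model asked for input; don't trust a guess
    return "file-issues"         # structured issues present; parse and file them
```

The point isn't the five lines of Python. It's that the Output component turned "did the review find anything?" into a string comparison.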
Putting all five together
Here's a prompt I actually use. It's about 280 words. Annotated by component:
[ROLE]
You are a senior backend engineer who has rejected three of my PRs this
month. You are blunt. You assume my code has bugs until proven otherwise.
[CONTEXT]
Project: Rust HTTP service, Axum framework, Postgres via sqlx, ~12k LOC.
This PR is a 60-line change to the login handler.
I know Rust well — do not explain language features.
In-scope: SQL injection, timing attacks, error handling, observability.
Out-of-scope: async runtime choice, dependency choices, formatting.
[FORMAT]
Return one <Issue> block per concern.
Fields: severity (P0/P1/P2), location (file:line), summary (1 sentence),
rationale (3 sentences max), fix (code diff).
Sort by severity descending. Max 8 issues.
[EXAMPLE]
<Issue>
severity: P1
location: src/auth.rs:142
summary: Login handler vulnerable to timing-based email enumeration.
rationale: Latency differs depending on whether the email exists.
An attacker can enumerate registered users by measuring response time.
Exploitable in production with no special access.
fix:
```diff
- let user = sqlx::query!("...").fetch_optional(&pool).await?;
+ let user = match sqlx::query!("...").fetch_optional(&pool).await? {
+     Some(u) => Some(verify_password(&u.hash, &password)?),
+     None => { dummy_verify(&password)?; None }
+ };
```
</Issue>
[OUTPUT]
When done, output the last </Issue> and stop.
Do not summarize, suggest follow-ups, or ask if I want more.
If no P0-P2 issues found, output "NO_ISSUES_FOUND" and stop.
If uncertain about a concern, use severity NEEDS_INFO and list what you need.
[CODE]
<paste the diff here>
That prompt works. Not because the role line is clever — because all five components are present and pulling in the same direction.
What this means for prompt packs
You can buy a pack. I sell a pack. Packs are useful as starter scaffolding.
But every prompt in every pack is just a particular instantiation of these five components. Once you can see the components, you stop being dependent on the pack. You can:
- Adapt any pack prompt to your stack in 90 seconds (rewrite Context, keep the rest)
- Diagnose a misbehaving prompt (which component is wrong?)
- Compose prompts from scratch faster than you can search a pack
- Recognize a bad pack on sight (no Examples? skip it)
This is the difference between owning a cookbook and being able to cook.
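Composing from scratch can even be mechanical. Here's a hypothetical RCFEO builder — the section labels are just the five components spelled out, and nothing about it is specific to any model or vendor:

```python
# The five components, in the order the annotated prompt above uses them.
COMPONENTS = ("role", "context", "format", "examples", "output")

def build_prompt(**parts: str) -> str:
    """Assemble an RCFEO prompt; refuse to build one with a missing component."""
    missing = [c for c in COMPONENTS if not parts.get(c)]
    if missing:
        # A missing component is exactly the misfire mode described above,
        # so fail loudly instead of emitting a weaker prompt.
        raise ValueError(f"prompt is missing components: {missing}")
    return "\n\n".join(
        f"[{name.upper()}]\n{parts[name].strip()}" for name in COMPONENTS
    )
```

The `ValueError` is the whole trick: forgetting Examples or Output stops being something you notice three bad completions later.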
Going further
If you want the full version of this — 50 before/after rewrites, the chain-of-thought and self-critique extensions, the printable RCFEO cheat sheet, the practice exercises, and a template for organizing your own prompt library — I packaged it as the AI Prompt Engineering Masterclass. $19, lifetime access, no subscription.
But honestly, even if you never buy it: take the five components, write them on a sticky note, and apply them to the next prompt you write. The sticky note alone will get you 80% of the way there.
The rest is reps.