If you’ve ever tried to “one-shot” a big task with an LLM (generate a feature, refactor a module, write a spec, produce tests) you’ve probably seen the failure modes:
- it misses requirements
- it invents APIs
- it contradicts itself halfway through
- it produces something that looks right but doesn’t compile
The fix isn’t “a better model”. The fix is prompt chaining: break the work into small, verifiable steps where each step produces an artifact you can validate.
This post shows a practical, developer-friendly way to build reliable chains you can run manually or automate.
## What is prompt chaining?

Prompt chaining means turning a vague goal into a sequence of prompts where:

- each step has a narrow objective
- each step outputs a structured artifact (bullets / JSON / diff / test list)
- you validate that artifact (quick sanity check, linter, unit tests, schema validation)
- the next step consumes the validated output

Think of it as a tiny pipeline: Plan → Specify → Implement → Verify → Polish.
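In code terms, each step of that pipeline is a function from accumulated context to a new, validated artifact. Here's a minimal sketch of that shape (the `ChainStep` and `runChain` names are illustrative, not a library API):

```typescript
// One chain step: takes the accumulated context, returns a named artifact.
// `validate` rejects bad output before it can poison downstream steps.
type Context = Record<string, string>;

interface ChainStep {
  name: 'plan' | 'spec' | 'implement' | 'verify' | 'polish';
  run: (ctx: Context) => string;           // normally an LLM call
  validate: (artifact: string) => boolean; // schema check, linter, tests…
}

function runChain(steps: ChainStep[], goal: string): Context {
  const ctx: Context = { goal };
  for (const step of steps) {
    const artifact = step.run(ctx);
    if (!step.validate(artifact)) {
      throw new Error(`step "${step.name}" produced an invalid artifact`);
    }
    ctx[step.name] = artifact; // the next step consumes the validated output
  }
  return ctx;
}
```

The important design choice is that validation sits between steps, so a failure stops the chain immediately instead of surfacing three steps later.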
## The 5-step chaining template (steal this)

### Step 1) Clarify + constraints

Goal: get assumptions out into the open.

Prompt:

    You are a senior engineer. Ask me up to 7 clarifying questions.

    Context:
    - Project: <…>
    - Goal: <…>

    Constraints:
    - Must not break: <…>
    - Non-goals: <…>

    After the questions, propose 2-3 possible approaches with tradeoffs.

    Output as:
    1) Questions
    2) Approaches (pros/cons)

Why it works: most “bad” output is caused by hidden constraints.
### Step 2) Produce a small spec (structured)

Goal: turn the chosen approach into something you can implement.

Prompt:

    Write a mini-spec for the chosen approach.

    Include:
    - API changes
    - Data model changes
    - Edge cases
    - Error handling
    - Observability (logs/metrics)

    Output strictly as JSON matching this schema:
    {
      "summary": "string",
      "acceptance_criteria": ["string"],
      "interfaces": [{"name": "string", "inputs": "string", "outputs": "string"}],
      "edge_cases": ["string"],
      "risks": ["string"],
      "test_plan": ["string"]
    }
    No extra keys.

Now you can validate: does this JSON cover the real requirements?
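"No extra keys" only pays off if you actually check it in code, not by eyeballing. Here's a minimal sketch of that check (the `validateSpec` helper and its error format are illustrative; the key names match the schema above):

```typescript
// Required top-level keys from the spec schema above.
const REQUIRED_KEYS = [
  'summary',
  'acceptance_criteria',
  'interfaces',
  'edge_cases',
  'risks',
  'test_plan',
] as const;

// Parse the model's raw output and report missing or extra keys.
function validateSpec(raw: string): { ok: boolean; errors: string[] } {
  const errors: string[] = [];
  let spec: Record<string, unknown>;
  try {
    spec = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ['not valid JSON'] };
  }
  for (const key of REQUIRED_KEYS) {
    if (!(key in spec)) errors.push(`missing key: ${key}`);
  }
  for (const key of Object.keys(spec)) {
    if (!REQUIRED_KEYS.includes(key as any)) errors.push(`extra key: ${key}`);
  }
  return { ok: errors.length === 0, errors };
}
```

If validation fails, feed the error list straight back to the model as a repair prompt instead of retrying blind.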
### Step 3) Implement in small diffs

Goal: avoid “here’s 400 lines” output.

Prompt:

    Implement the spec, but only output ONE commit-sized diff.

    Rules:
    - Keep changes under ~150 lines.
    - Prefer the smallest working increment.
    - Output as a unified diff.
    - Do not change unrelated formatting.

    Spec JSON:
    <PASTE STEP 2 JSON>

    Repository notes:
    - Language: <…>
    - Testing: <…>

If you want to be extra strict, ask for a diff per file or “one function at a time”.
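You can also enforce the ~150-line rule mechanically rather than trusting the model to count. A sketch of that guard (the `countChangedLines` and `assertCommitSized` names, and the 150 default, are just this post's convention):

```typescript
// Count added/removed lines in a unified diff, skipping the ---/+++ file
// headers so they aren't mistaken for changes.
function countChangedLines(diff: string): number {
  return diff
    .split('\n')
    .filter(
      (line) =>
        (line.startsWith('+') || line.startsWith('-')) &&
        !line.startsWith('+++') &&
        !line.startsWith('---'),
    ).length;
}

// Reject oversized output before it ever reaches the review step.
function assertCommitSized(diff: string, limit = 150): void {
  const changed = countChangedLines(diff);
  if (changed > limit) {
    throw new Error(
      `diff touches ${changed} lines, limit is ${limit}; ask for a smaller increment`,
    );
  }
}
```

When the guard fires, re-prompt with "split this into two smaller diffs" rather than raising the limit.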
### Step 4) Verify (tests + self-review)

Goal: make the model your reviewer.

Prompt:

    Review the diff as if you were doing a production PR review.

    Output:
    - 5-10 review comments (must reference exact lines/areas)
    - Security concerns
    - Performance concerns
    - Missing tests

    Then propose a follow-up diff with fixes (unified diff).

The key is that this step is explicitly allowed to be critical.
### Step 5) Polish for humans

Goal: developer ergonomics.

Prompt:

    Given the final code, write:
    - a short PR description
    - a changelog entry
    - 1-2 usage examples

    Keep it concise. No marketing fluff.
## A real example: chaining a "code review" workflow

Let’s say you want an LLM-assisted review checklist tailored to your repo.

- **Step 1 output (questions):** you answer questions like “TypeScript? Node version? Lint rules? Testing framework? Typical bugs?”
- **Step 2 output (JSON spec):** you get a structured checklist and can tweak it.
- **Step 3 output (diff):** you add something like `.github/pull_request_template.md` and a small `scripts/review-check.ts`.
- **Step 4 output (review):** it points out missing cases (e.g. “you forgot to enforce timezone-safe date parsing”).
- **Step 5 output (docs):** it writes a short README section.

This sounds trivial, but the workflow scales to bigger tasks: migrations, refactors, new endpoints, even “write tests for this module”.
## Automation tip: glue steps with a tiny script

You don’t need a huge framework. Even a small Node script can keep your chain consistent:

    // pseudo-code: load(), render(), llm(), and save() are stand-ins for
    // whatever file, templating, and API helpers you already have.
    const steps = [
      { name: 'clarify', prompt: load('01-clarify.txt') },
      { name: 'spec',    prompt: load('02-spec-json.txt') },
      { name: 'diff',    prompt: load('03-implement-diff.txt') },
      { name: 'review',  prompt: load('04-review.txt') },
    ]

    // Each step sees the goal plus every previous step's output.
    const context = { goal: process.argv[2] }
    for (const s of steps) {
      const out = await llm({ prompt: render(s.prompt, context) })
      save(`out/${s.name}.md`, out) // keep intermediate artifacts on disk
      context[s.name] = out
    }

The secret sauce is saving the intermediate outputs, so you can compare runs and debug exactly where things went wrong.
## Common chaining mistakes (and quick fixes)

- No validation step → require strict JSON against a schema, and actually validate it.
- Steps too big → enforce line limits and “one diff only”.
- Context drift → paste the spec JSON into every downstream step.
- Ambiguous roles → explicitly say “act as a senior engineer / reviewer / SRE”.
## TL;DR

If you want reliable LLM output:

- Ask clarifying questions
- Generate a structured mini-spec
- Implement in small diffs
- Review + fix
- Polish for humans

You’ll get less “magic” and a lot more shippable work.
If you want more copy-pasteable templates like these, I’m building a Prompt Engineering Cheatsheet at Nova Press.
Grab the free sample here: https://getnovapress.gumroad.com