π Live demo (LOOK Β· UNDERSTAND Β· BUILD): https://dev48v.infy.uk/prompt/day2-chain-of-thought.html
Day 2 of my PromptFromZero series β 50 LLM techniques in 50 days, each visualized with LOOK / UNDERSTAND / BUILD.
Today: Chain of Thought (CoT). The single highest-impact prompt change you can make. Costs nothing. Adds 7 words. Often turns wrong answers into right ones.
The setup
Same problem. Same model. Two prompts.
Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many balls does he have now?
Prompt A β "just answer"
β¦questionβ¦ Just answer with the number, nothing else.
Small / older models often answer: 8. Wrong.
Prompt B β Chain of Thought
β¦same questionβ¦ Let's think step by step.
Model writes:
Roger starts with 5 balls.
He buys 2 cans, each holding 3 balls.
2 Γ 3 = 6 new balls.
5 + 6 = 11.
Final answer: 11.
Right.
Same model. Same problem. Seven extra words on the prompt. The accuracy boost on multi-step math problems is consistently massive.
Why it works
LLMs generate one token at a time, each token conditioned on every token that came before. If you ask for the answer with no working, the model has to compress the whole computation into a single number prediction. There's nowhere to "scratch paper".
Chain of Thought forces the model to write the scratch paper out. Each step becomes additional context for the next step. By the time it gets to "Final answer:", the arithmetic is already on the page β anchored to real numbers, not vibes.
More tokens spent = more compute per problem = more reasoning capacity. CoT is literally trading latency for accuracy.
When to use it
| Use CoT | Skip CoT |
|---|---|
| Math word problems | Factual lookups ("What's the capital of France?") |
| Multi-step logical reasoning | Creative writing |
| Cause-and-effect chains | Short summaries |
| Subtle classifications | Code completion |
Heuristic: if you would write scratch-paper math yourself, the model will benefit from CoT.
Build it in 10 minutes
mkdir cot-from-zero && cd cot-from-zero
npm init -y
npm install ai @ai-sdk/google
echo "GOOGLE_GENERATIVE_AI_API_KEY=your_key_here" > .env
Get a free Gemini key at https://aistudio.google.com/apikey (no credit card).
// cot.mjs
import { generateText } from "ai";
import { google } from "@ai-sdk/google";
const model = google("gemini-2.5-flash");
const problem = "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?";
const bad = await generateText({
model,
prompt: problem + "\n\nJust answer with the number, nothing else."
});
const good = await generateText({
model,
prompt: problem + "\n\nLet's think step by step."
});
console.log("=== Without CoT ===\n" + bad.text);
console.log("\n=== With CoT ===\n" + good.text);
node --env-file=.env cot.mjs
Two runs of the same model on the same problem, side by side. The difference is visible immediately.
Levels of CoT
1. Zero-shot CoT (above)
Just add "Let's think step by step." Works on most modern models.
2. Few-shot CoT
Prepend 2-3 worked examples before the question:
Q: Sara had 4 apples and got 2 more. How many?
A: Sara had 4. She got 2 more. 4 + 2 = 6. Answer: 6.
Q: Roger has 5 tennis balls. He buys 2 cans of 3 each. How many balls?
A: [model continues in same format]
Better on harder problems β the model has explicit examples of the reasoning depth you want.
3. Structured CoT
Force a format:
"Solve this. Number your steps 1, 2, 3. Final answer on a new line starting 'Answer:'."
Easier to parse programmatically.
4. Hidden CoT
Generate the chain, then strip it before showing the user:
const reply = result.text;
const clean = reply.replace(/<thinking>[\s\S]*?<\/thinking>/g, '').trim();
User sees just the answer; the model gets the accuracy benefit.
What about reasoning models?
GPT-5, Claude 4 Sonnet, o1, o3, Gemini 2.5 β modern flagship models train with reasoning baked in. They don't need "let's think step by step." They do it automatically.
But:
- They cost 10Γ more per token
- They're slower (visible "thinking..." UI)
- They're overkill for simple tasks
Cheap model + CoT prompt β reasoning model output, at ~10% of the cost. CoT is still the highest-leverage technique you can use on small models.
What this unlocks
CoT is the foundation. Every fancier reasoning technique builds on top:
- Self-consistency β sample N CoT runs, take majority vote
- ReAct β CoT + tool calls interleaved (Day 1)
- Tree of Thoughts β branch CoT into multiple paths, evaluate
- Reflection β generate, criticize own output, regenerate
Master CoT first. Everything else is variations.
Try it now
Three tabs on one page:
https://dev48v.infy.uk/prompt/day2-chain-of-thought.html
- LOOK β animated side-by-side trace of both prompts
- UNDERSTAND β 8 click-through steps on why CoT works
- BUILD β copy the code, run it on your machine
What's next in PromptFromZero
Day 3: Self-consistency. Sample 5 CoT runs, take majority vote. Same model, even higher accuracy.
Series: 50 LLM techniques Β· 50 days Β· Vercel AI SDK throughout.
π All techniques: https://dev48v.infy.uk/promptfromzero.php
Top comments (0)