DEV Community

Devanshu Biswas


Show Your LLM 2 Examples and It Will Copy the Format Forever — Few-Shot Prompting

🌐 Live demo: https://dev48v.infy.uk/prompt/day4-few-shot.html

Day 4 of PromptFromZero. 50 LLM techniques · 50 days · each visualized with LOOK / UNDERSTAND / BUILD.

Today: few-shot prompting. The OG technique from the GPT-3 paper. No training cost, just a few extra prompt tokens per call. Pairs naturally with CoT (Day 2) and self-consistency (Day 3). The cheapest way to control LLM output without fine-tuning.

The setup

Same task. Same model. Two prompts.

Classify this product review's sentiment + extract the main complaint.

Review: "Beautiful display but the keyboard feels mushy under heavy typing."

Zero-shot output

The review you provided seems to express mixed feelings about the product.
The reviewer appreciates the visual quality of the display, but is concerned
about the typing experience due to the keyboard feeling mushy. The main
negative point is the keyboard quality.

Verbose prose. Hard to parse. Different every call.

Few-shot output (2 worked examples first)

{ "sentiment": "mixed", "complaint": "mushy keyboard" }

Parseable. Consistent format. JSON.parse() works directly.

The technique

Add 2-3 input/output pairs BEFORE the real input:

const FEW_SHOT = `Classify review sentiment + extract main complaint.

Review: "Battery dies in 2 hours. Camera is great though."
Output: { "sentiment": "negative", "complaint": "battery life" }

Review: "Love the screen and price. Wish it had USB-C."
Output: { "sentiment": "positive", "complaint": "missing USB-C" }

Review: "${userInput}"
Output:`;

The model is a next-token predictor. By the time it reaches the third Output:, it has already "learned" the format from the prior two pairs. It just continues the pattern.
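If your examples live in an array instead of a hard-coded string, the same prompt can be assembled programmatically. This is a minimal sketch: buildFewShotPrompt and EXAMPLES are hypothetical names, not part of any library. It uses JSON.stringify for the review text, so quotes inside user input get escaped, and for the outputs, so they come out as compact JSON without the spaces shown above.

```javascript
// Hypothetical helper (not from any library): assembles a few-shot prompt
// from an instruction, a list of worked examples, and the real input.
function buildFewShotPrompt(instruction, examples, input) {
  const shots = examples.map(
    (ex) => `Review: ${JSON.stringify(ex.review)}\nOutput: ${JSON.stringify(ex.output)}`
  );
  // The prompt ends on an open "Output:" so the model fills in the next slot.
  return [instruction, ...shots, `Review: ${JSON.stringify(input)}\nOutput:`].join("\n\n");
}

// The two worked examples from the prompt above, stored as data.
const EXAMPLES = [
  {
    review: "Battery dies in 2 hours. Camera is great though.",
    output: { sentiment: "negative", complaint: "battery life" },
  },
  {
    review: "Love the screen and price. Wish it had USB-C.",
    output: { sentiment: "positive", complaint: "missing USB-C" },
  },
];

const prompt = buildFewShotPrompt(
  "Classify review sentiment + extract main complaint.",
  EXAMPLES,
  "Beautiful display but the keyboard feels mushy under heavy typing."
);
console.log(prompt);
```

Keeping examples as data makes it easy to add, remove, or swap them without touching the template.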

Why it works

LLMs don't reason in advance — they predict the next token given everything before. By front-loading the prompt with examples:

  • The model infers the output schema is {sentiment, complaint}
  • The model infers JSON syntax
  • The model infers value vocabulary (positive / negative from the examples, generalizing to mixed on its own)
  • The model infers brevity is the norm

The third response continues the pattern. No fine-tuning needed.
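The model infers the schema, but nothing forces it to comply, so it is worth checking the reply before trusting it. Here is a rough sketch of such a check. parseReviewOutput and the SENTIMENTS list are hypothetical, and the allowed values are an assumption based on the outputs shown in this post.

```javascript
// Assumed vocabulary, taken from the outputs shown in this post.
const SENTIMENTS = ["positive", "negative", "mixed"];

// Hypothetical helper: parse the model's reply and check it against the
// { sentiment, complaint } schema the examples imply.
function parseReviewOutput(text) {
  const fence = "`".repeat(3); // a Markdown code fence marker
  let cleaned = text.trim();
  // Some models wrap JSON in a Markdown code fence; strip it if present.
  if (cleaned.startsWith(fence) && cleaned.endsWith(fence)) {
    cleaned = cleaned.slice(cleaned.indexOf("\n") + 1, -3).trim();
  }
  let data;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "not a JSON object" };
  }
  if (!SENTIMENTS.includes(data.sentiment)) {
    return { ok: false, error: "unexpected sentiment value" };
  }
  if (typeof data.complaint !== "string") {
    return { ok: false, error: "complaint must be a string" };
  }
  return { ok: true, value: { sentiment: data.sentiment, complaint: data.complaint } };
}

console.log(parseReviewOutput('{ "sentiment": "mixed", "complaint": "mushy keyboard" }'));
console.log(parseReviewOutput("The review seems to express mixed feelings."));
```

When the check fails, a retry or a fallback is usually safer than passing the raw text downstream.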

The 3 levers

1. Example count

Examples (N)    What to expect
1               Fragile, model may freestyle
2-3             Most tasks, 85% reliable
5               Sweet spot, 90-95%
10+             Diminishing returns + token bloat

2. Example diversity

Don't show 3 positive reviews then ask for a negative classification. The model will lean toward positive. Cover the output distribution you expect to see in production.
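One simple way to keep examples diverse is to pick them round-robin across labels instead of taking the first N from a pool. The sketch below assumes each example carries its label at output.sentiment. pickDiverseExamples and the pool are hypothetical.

```javascript
// Hypothetical helper: pick up to n examples, cycling through the labels
// so no single sentiment dominates the prompt.
function pickDiverseExamples(pool, n) {
  // Group candidate examples by their output label.
  const byLabel = new Map();
  for (const ex of pool) {
    const label = ex.output.sentiment;
    if (!byLabel.has(label)) byLabel.set(label, []);
    byLabel.get(label).push(ex);
  }
  // Take one example per label per round until n are chosen or the pool runs out.
  const groups = [...byLabel.values()];
  const picked = [];
  for (let round = 0; picked.length < n; round++) {
    let addedThisRound = false;
    for (const group of groups) {
      if (round < group.length && picked.length < n) {
        picked.push(group[round]);
        addedThisRound = true;
      }
    }
    if (!addedThisRound) break; // every group is exhausted
  }
  return picked;
}

// A made-up pool that is skewed toward positive reviews.
const pool = [
  { review: "Great phone.", output: { sentiment: "positive", complaint: "none" } },
  { review: "Love it.", output: { sentiment: "positive", complaint: "none" } },
  { review: "Best purchase this year.", output: { sentiment: "positive", complaint: "none" } },
  { review: "Screen cracked in a week.", output: { sentiment: "negative", complaint: "fragile screen" } },
  { review: "Fast, but runs hot.", output: { sentiment: "mixed", complaint: "overheating" } },
];

const picked = pickDiverseExamples(pool, 3);
console.log(picked.map((ex) => ex.output.sentiment));
```

With this pool, three picks cover all three sentiments even though positives make up most of the candidates.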

3. Example quality

Bad example in → bad outputs out. Your worked examples are the model's textbook for this task. Curate them.

Few-shot vs fine-tune — when to switch

Switch to fine-tune when              Why
You have > 100 examples               Fine-tune is cheaper per call (no example bloat in every prompt)
Format hasn't changed in months       Few-shot wins for prompts that evolve weekly
Need < 100ms latency                  Fine-tune skips re-reading examples
Same task across millions of calls    Cumulative token savings pay back the fine-tune cost

For ~80% of production tasks, few-shot wins on cost + flexibility.
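To make the "millions of calls" row concrete, here is a back-of-the-envelope break-even estimate. Every number is a made-up placeholder, not real pricing, and the sketch assumes the fine-tuned model costs the same per token at inference, which may not hold for your provider. breakEvenCalls is a hypothetical helper.

```javascript
// Hypothetical helper: how many calls until the tokens spent re-sending
// examples add up to the one-off fine-tuning cost.
// Assumes identical per-token inference pricing before and after fine-tuning.
function breakEvenCalls({ exampleTokensPerCall, pricePerMillionInputTokens, fineTuneCost }) {
  // Extra cost per call = exampleTokensPerCall / 1e6 * pricePerMillionInputTokens.
  // Rearranged to multiply before dividing, which limits rounding error.
  return Math.ceil(
    (fineTuneCost * 1_000_000) / (exampleTokensPerCall * pricePerMillionInputTokens)
  );
}

// Placeholder inputs: 200 example tokens per call, $0.50 per million input
// tokens, and a $50 one-off fine-tuning bill.
const calls = breakEvenCalls({
  exampleTokensPerCall: 200,
  pricePerMillionInputTokens: 0.5,
  fineTuneCost: 50,
});
console.log(calls); // 500000
```

With these placeholder values the examples only cost as much as the fine-tune after half a million calls. Plug in your own prices before drawing conclusions.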

Pair with Chain of Thought

For reasoning problems, make the examples include the reasoning, not just the answers:

Q: Sara has 4 apples, gets 2 more. How many?
A: Sara had 4. She got 2 more. 4 + 2 = 6. Answer: 6.

Q: Roger has 5 tennis balls, buys 2 cans of 3 each. How many balls?
A: ← model continues with full reasoning trace

The model copies not just the format but the depth of thought. Called "few-shot CoT" in the literature.
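A few-shot CoT prompt can be built the same way as the classification prompt, and if every example ends with "Answer: <number>", the final answer can be pulled out of the trace with a regex. A minimal sketch follows. buildCotPrompt and extractAnswer are hypothetical helpers, and the trace passed to extractAnswer is hand-written for illustration, not real model output.

```javascript
// Worked example with its reasoning spelled out, ending in "Answer: <n>".
const COT_EXAMPLES = [
  {
    q: "Sara has 4 apples, gets 2 more. How many?",
    a: "Sara had 4. She got 2 more. 4 + 2 = 6. Answer: 6.",
  },
];

// Hypothetical helper: Q/A pairs first, then the real question with an open "A:".
function buildCotPrompt(examples, question) {
  const shots = examples.map((ex) => `Q: ${ex.q}\nA: ${ex.a}`);
  return [...shots, `Q: ${question}\nA:`].join("\n\n");
}

// Hypothetical helper: return the last "Answer: <number>" in a reasoning
// trace, or null if the model did not follow the format.
function extractAnswer(trace) {
  const matches = [...trace.matchAll(/Answer:\s*(-?\d+(?:\.\d+)?)/g)];
  return matches.length ? Number(matches[matches.length - 1][1]) : null;
}

const cotPrompt = buildCotPrompt(
  COT_EXAMPLES,
  "Roger has 5 tennis balls, buys 2 cans of 3 each. How many balls?"
);
console.log(cotPrompt);

// Hand-written trace standing in for a model reply.
console.log(extractAnswer("Roger had 5. 2 cans of 3 is 6. 5 + 6 = 11. Answer: 11."));
```

Taking the last match guards against traces that mention an intermediate "Answer:" before the final one.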

Build it in 10 minutes

mkdir few-shot && cd few-shot
npm init -y && npm install ai @ai-sdk/google
echo "GOOGLE_GENERATIVE_AI_API_KEY=your_key" > .env

Free Gemini key at https://aistudio.google.com/apikey.

import { generateText } from "ai";
import { google } from "@ai-sdk/google";

const model = google("gemini-2.5-flash");

// The review to classify; swap in your own input.
const review = "Beautiful display but the keyboard feels mushy under heavy typing.";

// Two worked examples, then the real input with an open "Output:" slot.
const FEW_SHOT = `Classify review sentiment + extract main complaint.

Review: "Battery dies in 2 hours. Camera is great though."
Output: { "sentiment": "negative", "complaint": "battery life" }

Review: "Love the screen and price. Wish it had USB-C."
Output: { "sentiment": "positive", "complaint": "missing USB-C" }

Review: "${review}"
Output:`;

const r = await generateText({ model, prompt: FEW_SHOT });
// JSON.parse throws if the model adds anything besides the JSON object.
console.log(JSON.parse(r.text));

Save the script as few-shot.mjs, then run it (the --env-file flag needs Node.js 20.6 or later):

node --env-file=.env few-shot.mjs

Try it now

3-tier learning page:
https://dev48v.infy.uk/prompt/day4-few-shot.html

  • LOOK — side-by-side animated comparison (zero-shot vs few-shot)
  • UNDERSTAND — 8 click-through steps on each lever
  • BUILD — full Node.js script, copy + run

Tomorrow: RAG basic — embed documents + retrieve top-K before the model answers.

🌐 All techniques: https://dev48v.infy.uk/promptfromzero.php
