LLMs write answers one token at a time, strictly left to right. Token 500 can't start until token 499 exists, so a thorough answer feels slow no matter how fast your hardware is. Skeleton of Thought (SoT) attacks exactly that — the length of the sequential critical path.
The idea
Most answers are really a list of semi-independent parts: tips, sections, aspects of a comparison. SoT exploits this in three moves:
- Ask for a skeleton — just short point titles, no prose.
- Expand every point in parallel — one request each, all at once.
- Stitch them back together in order.
Because the expansions don't depend on each other, they can run concurrently.
The skeleton call
const skeleton = await llm(
`Answer "${q}" as ONLY 3-8 numbered point titles, <=6 words each.`);
const points = skeleton.split("\n").filter(Boolean);
Tiny and fast. It also doubles as a plan, which tends to make the final answer better-organised than free-form writing.
Parallel expansion — where the speed comes from
const parts = await Promise.all(points.map((p, i) =>
llm(`Q: ${q}\nOutline: ${skeleton}\nExpand point ${i+1} in 1-2 sentences.`)
));
Instead of waiting for point 1, then 2, then 3… you wait once for whichever point takes longest.
The maths
- Sequential time ≈ skeleton + sum of all points.
- SoT time ≈ skeleton + the single longest point.
With five similar-length points, that's roughly five point-times versus one — the reported speedups land around 2× and up.
When NOT to use it
Parallelism is only safe when points are independent. Chained reasoning — a maths proof, a step-by-step derivation where point 3 needs point 2 — breaks it. Gate on that and fall back to normal generation.
The trade-off
SoT isn't free: you spend more total tokens (each expansion repeats the question and outline as context) and make more requests. What you buy is latency — the user sees a complete, structured answer much sooner. It's the classic distributed-systems bargain: more total work to shorten the critical path. It pairs beautifully with streaming UIs, too.
Watch a sequential vs SoT race with real timing here:
https://dev48v.infy.uk/prompt/day22-skeleton-of-thought.html
Top comments (0)