Most reasoning prompts do the same thing to every problem. Chain-of-thought staples "let's think step by step" onto the question and hopes the model figures out the rest. It usually beats a plain answer, but it's a one-size-fits-all move: a counting puzzle, a scheduling constraint, and a proof all get the exact same nudge. Sometimes that fits the task. Often it doesn't, and you get an answer that reads beautifully and is quietly wrong.
Self-Discover, from Google DeepMind (2024), starts from a different idea: before solving, the model should work out how to reason about this specific task. Not more reasoning — reasoning about the reasoning. It picks a strategy first, then runs it.
The three stages
Self-Discover begins with a fixed toolbox of 39 generic "reasoning modules" drawn from how people actually solve problems. Things like "break the problem into smaller parts", "identify the key assumptions", "consider alternative perspectives", "critical thinking", "make a step-by-step plan". They're deliberately task-agnostic: each one describes a way to think, not a domain. The model never invents new ones. Everything happens by choosing and arranging these.
Then three meta-steps run:
SELECT: Given the task and the whole pool, the model picks the handful of modules that actually matter here. Most of the 39 are irrelevant to any given problem, so this step is mostly a filter. For a cost-estimation task it might keep "break into subproblems", "identify key assumptions", "analytical computation", and "step-by-step plan", and drop the other 35.
ADAPT: The selected modules are still phrased generically. "What are the key assumptions?" is too abstract to act on. ADAPT rewrites each one in terms of the actual task. For a bus-hire cost problem, that becomes: "a bus can't be half-hired, so round the bus count up." This grounding is where much of the accuracy gain comes from: the reasoning stops being a checklist and becomes a recipe for the problem in front of it.
IMPLEMENT: The adapted modules get composed into an ordered plan, written as a small JSON structure with one slot per step and the values left empty:
{
  "identify_quantities": "",
  "state_assumption": "",
  "subtasks": "",
  "compute_each": "",
  "combine": ""
}
That structure is the thing Self-Discover discovers. The ordering isn't decorative — it encodes dependencies. You state the rounding assumption before you compute the bus count, not after.
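The three meta-steps can be sketched as three prompt-and-call functions chained together. This is a minimal sketch, not the paper's code: `llm` is a hypothetical callable that takes a prompt string and returns text, the prompt wording is invented for illustration, and `MODULES` holds only the five modules quoted earlier rather than the full pool.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to model text.
LLM = Callable[[str], str]

# Only the five modules quoted in the text; the full pool is larger.
MODULES: List[str] = [
    "Break the problem into smaller parts",
    "Identify the key assumptions",
    "Consider alternative perspectives",
    "Critical thinking",
    "Make a step-by-step plan",
]


def select(llm: LLM, task: str, modules: List[str]) -> str:
    """SELECT: ask the model which generic modules matter for this task."""
    pool = "\n".join(f"- {m}" for m in modules)
    prompt = (
        f"Task:\n{task}\n\nReasoning modules:\n{pool}\n\n"
        "Select the modules that are useful for solving this task."
    )
    return llm(prompt)


def adapt(llm: LLM, task: str, selected: str) -> str:
    """ADAPT: rephrase the selected modules in terms of the task."""
    prompt = (
        f"Task:\n{task}\n\nSelected modules:\n{selected}\n\n"
        "Rephrase each module so it refers to this task specifically."
    )
    return llm(prompt)


def implement(llm: LLM, task: str, adapted: str) -> str:
    """IMPLEMENT: compose the adapted modules into an ordered JSON plan."""
    prompt = (
        f"Task:\n{task}\n\nAdapted modules:\n{adapted}\n\n"
        "Turn these into an ordered JSON plan with one empty slot per step."
    )
    return llm(prompt)


def discover(llm: LLM, task: str, modules: List[str] = MODULES) -> str:
    """Run the three meta-steps once and return the reasoning structure."""
    selected = select(llm, task, modules)
    adapted = adapt(llm, task, selected)
    return implement(llm, task, adapted)
```

Each stage only sees the task description and the previous stage's output, so `discover` costs three calls in this sketch and returns a structure, not an answer.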
Then you just fill it in
Only now does the model touch the actual problem, and it solves by filling every field of the structure in order. Take this one: 92 students, buses seat 40, each bus costs $150 to hire plus $2 per student for insurance — total cost?
Because the plan has a slot literally called state_assumption, the model has to write "round buses up" before it computes anything. So it does ceil(92/40) = 3 buses, 3 × 150 = 450 hire, 92 × 2 = 184 insurance, total $634.
Plain chain-of-thought, with its one fixed style, has no step that forces the rounding question. In this example it does 92 ÷ 40 = 2.3 buses, 2.3 × 150 = 345, adds the 184 insurance, and lands on $529: fluent and wrong. The discovered structure had a named place for the exact insight CoT skipped.
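Both calculations are easy to check directly. A minimal sketch in Python: the slot names come from the JSON structure above, while the constant and function names are made up for illustration.

```python
import math

STUDENTS = 92
SEATS_PER_BUS = 40
HIRE_PER_BUS = 150          # dollars per bus
INSURANCE_PER_STUDENT = 2   # dollars per student


def solve_with_structure() -> dict:
    """Fill the plan's slots in order, so the assumption precedes the math."""
    filled = {}
    filled["identify_quantities"] = {
        "students": STUDENTS,
        "seats_per_bus": SEATS_PER_BUS,
    }
    filled["state_assumption"] = "buses are whole units, so round the count up"
    filled["subtasks"] = ["bus hire", "insurance"]

    buses = math.ceil(STUDENTS / SEATS_PER_BUS)  # applies the stated assumption
    hire = buses * HIRE_PER_BUS
    insurance = STUDENTS * INSURANCE_PER_STUDENT
    filled["compute_each"] = {"buses": buses, "hire": hire, "insurance": insurance}

    filled["combine"] = hire + insurance
    return filled


def solve_without_rounding() -> float:
    """The trap: treat the bus count as a fraction."""
    buses = STUDENTS / SEATS_PER_BUS
    return buses * HIRE_PER_BUS + STUDENTS * INSURANCE_PER_STUDENT
```

`solve_with_structure()["combine"]` gives 634, while `solve_without_rounding()` gives 529 (up to floating-point rounding). The only difference between the two is the `math.ceil` call that the `state_assumption` slot prompts for.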
The part that makes it practical
The three meta-steps are the expensive bit — but they run once per task type, not once per question. They produce a structure, not an answer. Every subsequent instance of the same kind of problem is solved by the cheap fill-in step. A thousand bus-cost questions pay the discovery cost once, then one call each to solve.
That's the real difference from methods like self-consistency or tree-of-thoughts, which multiply the cost on every instance. Self-Discover front-loads a fixed, reusable investment instead.
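A back-of-the-envelope call count makes the amortization concrete. This sketch assumes one model call per meta-step (three in total), one call per solved instance, and a self-consistency setup that samples a fixed number of chains per instance. The function names and the sample count of 10 are illustrative, not figures from the paper.

```python
def self_discover_calls(n_instances: int, discovery_calls: int = 3) -> int:
    """Discovery is paid once; each instance then costs one fill-in call."""
    return discovery_calls + n_instances


def self_consistency_calls(n_instances: int, samples_per_instance: int) -> int:
    """Every instance pays for all of its sampled chains."""
    return n_instances * samples_per_instance


n = 1000
print(self_discover_calls(n))            # 3 + 1000
print(self_consistency_calls(n, 10))     # 1000 * 10
```

Under these assumptions, a thousand questions cost 1,003 calls with a reused structure versus 10,000 with ten samples each. The fixed three-call overhead becomes negligible as the instance count grows.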
When to reach for it
It's not free, so it's overkill for a single one-off question. For that, Plan-and-Solve gets you most of the benefit in one call. Self-Discover pays off when you have many instances of a varied, reasoning-heavy task where no single chain fits everything. On mixed benchmarks like BIG-Bench Hard that's exactly the situation: the paper reports gains of up to 32% over CoT, while using 10 to 40 times less inference compute than sampling-heavy baselines such as CoT with self-consistency.
One caution: the structure is reused, so its flaws are systematic. A bad SELECT or a sloppy ADAPT steers every instance toward the same wrong answer. So eyeball the discovered structure once and check it against held-out examples before you trust it. Discovery is something you review once, then rely on.
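The held-out check can be as simple as an accuracy loop. In this sketch, `solve` is a hypothetical function that wraps the fill-in call with the discovered structure and returns a numeric answer; the name, the numeric-answer assumption, and the tolerance are all illustrative.

```python
from typing import Callable, Iterable, Tuple


def heldout_accuracy(
    solve: Callable[[str], float],
    examples: Iterable[Tuple[str, float]],
    tol: float = 1e-6,
) -> float:
    """Fraction of held-out (question, expected answer) pairs solved correctly."""
    pairs = list(examples)
    if not pairs:
        raise ValueError("need at least one held-out example")
    correct = sum(
        1 for question, expected in pairs
        if abs(solve(question) - expected) <= tol
    )
    return correct / len(pairs)
```

Because a flawed structure fails the same way on every instance, even a small held-out set tends to expose it: the misses cluster on one slot rather than scattering randomly.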
I built an interactive walkthrough where you can watch the three stages compose the structure, then execute it with real arithmetic, side by side with plain CoT walking into the trap.