"I understood it when Claude explained it." This is the most dangerous sentence in learning, because it reports recognition (I followed an explanation that was happening right in front of me) and quietly gets filed as comprehension (I now possess the concept and can apply it). The two states are wildly different. The problem is that the first state feels exactly like the second one, right up until the moment you have to use the concept on a new problem and discover that you cannot.
Bransford and Schwartz's 1999 paper "Rethinking Transfer: A Simple Proposal with Multiple Implications" is the clearest diagnostic for this failure. Their test: pose a novel problem in a new surface form. If the learner solves it, the concept transferred. If the learner can only reproduce the original explanation, the concept collapsed into memorization.
Whitehead (1929) called this collapse "inert knowledge" — knowledge the student can recite but cannot apply. Bransford's test is what detects it. I run the test as Node 5 of the concentric-loop discipline in claude-code-agent-skills-framework, and I run the same test against agent output before trusting any Claude-generated artifact.
The test for concepts
After a concept has been explained and the descent through its layers has landed, wait — at minimum, a day; ideally, a week. Then pose a problem that satisfies three conditions:
- New surface form. Different domain, different vocabulary, different concrete example. Not the same problem with the numbers swapped — genuinely new clothing on the same mechanism.
- No scrollback. The original explanation is not available. No notes, no conversation history, no re-reading the blog post.
- Different framing. If the concept was introduced via one practitioner's lens (Karpathy's build-from-scratch, say), the transfer problem is framed via a different lens (Huyen's latency-tier I/O contract, say).
Solve it. If you can, the concept transferred. If you cannot — or if you can only after hints — the learning did not close.
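The three conditions above can be made mechanical. A minimal sketch — the `TransferProblem` fields and `valid_transfer` helper are illustrative names, not from any framework:

```python
# Hypothetical checklist for a candidate transfer problem.
# Fields mirror the three conditions; names are assumptions.
from dataclasses import dataclass

@dataclass
class TransferProblem:
    domain: str               # surface form the problem is dressed in
    framing: str              # practitioner lens used to pose it
    scrollback_allowed: bool  # is the original explanation accessible?

def valid_transfer(problem: TransferProblem,
                   original_domain: str,
                   original_framing: str) -> bool:
    """A problem only counts as a transfer test if all three conditions hold."""
    return (problem.domain != original_domain        # new surface form
            and problem.framing != original_framing  # different framing
            and not problem.scrollback_allowed)      # no scrollback
```

A problem that fails any one check is the old problem in disguise, and passing it proves nothing.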
Most "I learned X last week" claims fail this test. That is the discovery. The claim was made in good faith at the moment of the original explanation, and it felt true because recognition felt like comprehension. The transfer test separates the two states by requiring the concept to do work outside the environment in which it was introduced.
The three failure signals
When the transfer test fails, it fails in one of three specific shapes. Naming the shape matters because each one points to a different remediation.
Signal 1 — Can only reproduce the original analogy. The student's attempt to solve the new problem leans on the original metaphor or example and does not generalize. "Well, in the 3Blue1Brown video, they said neurons are like voters..." The analogy has become the concept. This is Gentner's analogy-leakage failure: the student is reasoning about the vehicle instead of the mechanism.
Remediation: descend again, with a different analogy. Not the same analogy rephrased — a genuinely different starting point that forces the mechanism to be re-grounded.
Signal 2 — Solves the old problem, not the transfer. The student can correctly derive backprop on the exact MLP from the lecture, but cannot adapt to a transformer head. The knowledge is real but local. It is not yet a transferable skill; it is a memorized procedure.
Remediation: the multi-instance requirement from the whiteboard test — run the same concept across three genuinely different architectures or problems, forcing generalization.
Signal 3 — Solves the transfer only after a hint. The student gets there eventually, but only after the interrogator scaffolds the first step. The concept is semi-transferred — partly held, partly dependent on external prompting.
Remediation: keep the card in active rotation. Re-test at a longer interval with no hint permitted. If the unaided solution lands, move the card to the less-frequent review pool.
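Because each signal points to exactly one remediation, the mapping can be written down as a lookup. A minimal sketch with illustrative names (`Signal`, `REMEDIATION` are assumptions, not any library's API):

```python
# Hypothetical mapping of transfer-test failure signals to remediations.
from enum import Enum

class Signal(Enum):
    REPRODUCES_ANALOGY = "can only reproduce the original analogy"
    SOLVES_OLD_PROBLEM = "solves the old problem, not the transfer"
    NEEDS_HINT = "solves the transfer only after a hint"

REMEDIATION = {
    Signal.REPRODUCES_ANALOGY: "descend again with a genuinely different analogy",
    Signal.SOLVES_OLD_PROBLEM: "run the concept across three different problems",
    Signal.NEEDS_HINT: "re-test at a longer interval with no hint permitted",
}

def remediate(signal: Signal) -> str:
    """Diagnosing the shape of the failure determines the next action."""
    return REMEDIATION[signal]
```

The point of the enum is that "failed" is never logged by itself — the shape of the failure is the actionable datum.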
The same test applied to agent output
This is where the pattern extends beyond learning. Bransford transfer also works as an evaluation for any agent pipeline that purports to generalize.
The failure mode: an agent that performs well on the exact distribution it was tested against — the exact prompt shape, the exact input schema, the exact phrasing of the instruction — and collapses when any of those shifts. Superficially this looks like an agent that "works." In practice, it is an agent that memorized its evaluation.
Apply Bransford's test: swap the system prompt, swap the input schema, swap the practitioner's framing of the instruction. Check whether the agent's correctness transfers. If it does, the agent genuinely solves the class of problem. If only the exact form works, the agent memorized the harness.
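The swap-and-retest procedure can be sketched as a small harness. Everything here is an assumption: `run_agent`, the variant lists, and the pass criterion stand in for whatever your pipeline actually exposes.

```python
# Hypothetical transfer harness: run the same task under varied surface
# forms (prompt phrasing, input schema) and check whether correctness holds.
from typing import Callable

def transfer_eval(run_agent: Callable[[str, dict], str],
                  check: Callable[[str], bool],
                  prompts: list[str],
                  schemas: list[dict]) -> dict:
    """Score every (prompt, schema) combination; pass only if all transfer."""
    results = {}
    for p in prompts:
        for i, s in enumerate(schemas):
            results[(p, i)] = check(run_agent(p, s))
    return {
        "forms_tested": len(results),
        "forms_passed": sum(results.values()),
        "transfers": all(results.values()),
    }
```

An agent that memorized its harness will pass the original form and fail the variants — `transfers` comes back false, and the failing keys tell you which surface shift broke it.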
Hamel Husain's manual-trace-labeling discipline is the Bransford test run at scale on agent output. Label 20-100 real traces across different surface forms. Extract the cases that fail. Those cases are the non-transferring ones — the ones the agent "knew" in the original framing and could not hold in the new one.
Why this sits at the close of the concentric loop
The concentric loop — analogy → code → system → math → analogy — is only complete when the return lands. The return is not a summary. It is the test: can the student now apply the enriched mental model to the original analogy in a way they could not before?
Bransford transfer is the formal version of that return. If the transfer holds, the loop closed. If the transfer fails, the descent was not deep enough — the student followed the explanation but never integrated it into a form that generalizes.
The loop is not a presentation artifact. It is an instrument with a measurable completion criterion. Without the criterion, every teaching session feels successful at the moment it ends, which is how inert knowledge accumulates.
What makes this practically catchable
The test is cheap to run. The expensive part is the discipline of running it after the concept has been "learned," rather than calling the learning done and moving on. Every new concept goes onto a spaced-review card with a Bransford test associated with it: a specific novel surface form the student has not yet seen.
The card surfaces on a schedule. The student runs the test. The outcome (pass, fail, fail-with-hint) gets logged. Over months, the log becomes a map of which concepts have genuinely transferred and which remain fragile — which is the input to every decision about what to study next.
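The log described above can be as small as a dict of outcome histories. A minimal sketch — `TransferLog` and its field names are assumptions, not an existing tool:

```python
# Hypothetical card log: record each transfer-test outcome per concept,
# then surface the concepts whose latest test was not an unaided pass.
from collections import defaultdict

VALID_OUTCOMES = {"pass", "fail", "fail-with-hint"}

class TransferLog:
    def __init__(self) -> None:
        self.outcomes: dict[str, list[str]] = defaultdict(list)

    def record(self, concept: str, outcome: str) -> None:
        if outcome not in VALID_OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome}")
        self.outcomes[concept].append(outcome)

    def fragile(self) -> list[str]:
        # A concept stays fragile until its most recent test is an
        # unaided pass; these are the study priorities.
        return [c for c, hist in self.outcomes.items() if hist[-1] != "pass"]
```

`fragile()` is the map the text describes: the input to the decision about what to study next.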
Without the card, the test does not run. Without the test, the transfer is assumed rather than verified. Assumption is the cache layer between recognition and application; the cache gets cleared by novelty, and the reader discovers the gap only when the novelty arrives.
The sentence that replaces "I understood it"
After every concept, the honest sentence is not "I understood it." It is "I understood the explanation. The transfer test has not been run yet. I will know whether the concept transferred when the test fires on a novel problem in a different surface form."
This is longer and less satisfying. It is also true. The shorter version is what produces the six-months-later surprise of realizing you do not actually know the thing you thought you learned.
Replace one "I understood it" claim with its Bransford-pending version this week. Schedule the test. The discipline compounds across every concept you acquire afterward.
Aman Bhandari. Operator of an AI-engineering research lab running Claude Opus as the coaching partner, plus a QA-automation surface shipping against a real sprint workload. Public artifacts: claude-code-agent-skills-framework and claude-code-mcp-qa-automation. github.com/aman-bhandari.