blaze

Posted on Jun 28

A Four-Type Framework for Karpathy's LLM Wiki

#ai #llm #learning #agents

Why Knowledge Alone Doesn't Create Judgment

Karpathy's LLM Wiki is brilliant. You dump raw material in, an LLM extracts concepts and links them together, and you get a personal knowledge base that actually works.

I built one. 100+ pages. It's great.

But I hit a wall that made me rethink everything.

The Wall

I asked my AI to act as a programming tutor. It could recite every concept perfectly.

Student: "I don't understand Promises."

AI: "A Promise is an object representing the eventual completion or failure of an asynchronous operation..."

Wrong answer. The right answer was: "Do you understand callbacks first? What about synchronous execution? What have you tried so far?"

The AI had knowledge. It had zero judgment.

And then I realized why: every single page in my wiki was the same type of knowledge.

One Type vs Four

LLM Wiki 1.0 stores declarative knowledge — facts, definitions, summaries. Things that answer "What is this?"

But think about what makes a human expert different from a textbook:

A great programming mentor doesn't just know what Promises are. They know why you teach callback → Promise → async/await in that exact order — and never the reverse. That's not a fact. It's a reasoning path.

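A master astrologer doesn't just know what each star represents. They know why you check the Life Palace (命宮) first, then its three-sides-four-directions palaces (三方四正), the trine and opposite palaces that frame it. They know when to prioritize the chart's overall pattern (格局), and when a palace is a consequence rather than a cause. That's not a fact either. It's a decision sequence.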
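The callback → Promise → async/await ordering can be stored as data rather than prose. Here is a minimal sketch in Python, assuming a procedural page is just a hypothetical prerequisite map. `PREREQUISITES`, `teaching_order`, and `first_gap` are illustrative names, not part of LLM Wiki. A topological sort gives the teaching order, and walking that order finds the first concept the student is missing, which is what a tutor should ask about instead of defining Promises.

```python
from graphlib import TopologicalSorter

# Hypothetical procedural page: each concept maps to the concepts taught before it.
PREREQUISITES = {
    "synchronous execution": set(),
    "callbacks": {"synchronous execution"},
    "Promises": {"callbacks"},
    "async/await": {"Promises"},
}

def teaching_order(prereqs):
    """Return every concept, with each prerequisite ordered before what depends on it."""
    return list(TopologicalSorter(prereqs).static_order())

def first_gap(target, prereqs, known):
    """Return the first concept, in teaching order, that `target` depends on
    (including `target` itself) and the student doesn't know yet; None if none."""
    needed, stack = set(), [target]
    while stack:  # collect target plus all of its transitive prerequisites
        concept = stack.pop()
        if concept not in needed:
            needed.add(concept)
            stack.extend(prereqs.get(concept, ()))
    for concept in teaching_order(prereqs):
        if concept in needed and concept not in known:
            return concept
    return None
```

For a student who only knows synchronous execution, `first_gap("Promises", PREREQUISITES, {"synchronous execution"})` returns `"callbacks"`, which is exactly the "Do you understand callbacks first?" move.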

And here's the kicker: even knowing the reasoning path isn't enough.

We annotated Anderson's (1972) Socratic tutoring dialogues — full 41-turn and 30-turn conversations, labeling every decision point. Knowing the 23 Socratic rules (the reasoning path) is one thing. Reading a complete dialogue — watching the expert set a trap, wait 15 seconds in silence, break their own rules when the student gets frustrated — is something else entirely.

Reading the "Complete Book of Psychology" ≠ knowing how to apply it, let alone teach it.

And there's still one more type.

Student says: "I have no motivation lately."

A knowledge-based response: "Here are the top 5 causes of low motivation..."
But that's useless here. The LLM isn't helping resolve the student's problem; it's just explaining a concept.
A more suitable response for this scenario: "When was the first time you noticed this? What makes you think so?"

The expert isn't answering. They're diagnosing. They know that "no motivation" is a surface symptom — the real problem could be burnout, unclear goals, a specific failure, or something else. Until you know which, any advice is a guess.
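The diagnose-first behavior can be sketched as a simple rule: while more than one candidate cause is still open, ask a question that tests one of them, and only advise once a single cause remains. This sketch assumes a hypothetical hand-written `CANDIDATES` table. The causes come from the paragraph above, but every probe question except the first is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    cause: str
    probe: str  # a question that helps confirm or rule out this cause

# Hypothetical symptom table; probe questions other than the first are illustrative.
CANDIDATES = {
    "no motivation": [
        Hypothesis("burnout", "When was the first time you noticed this?"),
        Hypothesis("unclear goals", "What are you working toward right now?"),
        Hypothesis("a specific failure", "Did something happen recently that changed how you feel?"),
    ],
}

def respond(symptom, ruled_out):
    """Return ("ask", question) while the cause is ambiguous, ("advise", cause) once it isn't."""
    remaining = [h for h in CANDIDATES.get(symptom, []) if h.cause not in ruled_out]
    if len(remaining) == 1:
        return ("advise", remaining[0].cause)
    if not remaining:
        # Unknown symptom, or every listed cause ruled out: the cause is "something else".
        return ("ask", "What do you think is behind it?")
    # Several causes still open: diagnose before answering.
    return ("ask", remaining[0].probe)
```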

That's four distinct types of knowledge:

Declarative — What is true (facts, concepts, definitions)
Procedural — How to reason (expert decision sequences, why X before Y)
Experiential — How it's actually done (complete worked examples with mistakes visible)
Interaction — How to guide (what to ask next, when to tell vs wait)
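To make the distinction usable as a checklist, each wiki page can carry an explicit type tag. This is a minimal sketch, assuming a hypothetical `WikiPage` record (LLM Wiki itself does not define one). `coverage_gaps` lists the types a wiki has no pages for.

```python
from dataclasses import dataclass
from enum import Enum

class KnowledgeType(Enum):
    DECLARATIVE = "what is true"
    PROCEDURAL = "how to reason"
    EXPERIENTIAL = "how it's actually done"
    INTERACTION = "how to guide"

@dataclass
class WikiPage:
    title: str
    kind: KnowledgeType
    body: str

def coverage_gaps(pages):
    """Return the knowledge types that no page covers, in checklist order."""
    present = {page.kind for page in pages}
    return [kind for kind in KnowledgeType if kind not in present]
```

A wiki built only from declarative pages would report the last three types as gaps.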

LLM Wiki 1.0 only stores type 1.

The Evidence Is Brutal

WashU researchers analyzed 98 real CS TA sessions — 17 hours, 8,203 utterances.

Socratic questioning (guided reasoning, diagnostic probes): 0.6%.
TAs directly giving the answer: 75%.

These TAs knew the method. They were trained. Under time pressure, they defaulted to giving answers anyway.

Knowing the rules ≠ being able to execute them.

That gap — between knowing and executing — is exactly where procedural, experiential, and interaction knowledge live. If you don't store those types, you can't train them. If you can't train them, you can't execute under pressure.

The Missing Operation

Karpathy's framework has one operation: ingest — extract facts from raw material.

That produces declarative knowledge beautifully. But you can't get reasoning paths, worked examples, or guidance strategies by looking for facts. You have to look for decisions — what did the expert choose, when, and what followed?

We added a second operation: mine.

ingest looks for facts → Declarative Knowledge
mine looks for decisions → Procedural, Experiential, Interaction Knowledge

Same raw material. Completely different extraction target.
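Here is a minimal sketch of what mine produces, assuming the raw material is a dialogue transcript of hypothetical `(speaker, text)` turns. In a real pipeline an LLM would label the decision points; here a plain window stands in for it, so each expert turn becomes one record of the situation it responded to, the choice made, and what followed. The `Decision` and `mine` names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    when: str                # what was said just before the expert acted (the situation)
    choice: str              # what the expert said or did
    followed: Optional[str]  # the next turn after the choice, if any

def mine(transcript, expert="tutor"):
    """Turn a dialogue into decision records, one per expert turn."""
    decisions = []
    for i, (speaker, text) in enumerate(transcript):
        if speaker != expert:
            continue
        when = transcript[i - 1][1] if i > 0 else ""
        followed = transcript[i + 1][1] if i + 1 < len(transcript) else None
        decisions.append(Decision(when=when, choice=text, followed=followed))
    return decisions
```

Fed the Promises exchange plus a made-up student reply ("Not really."), this yields one record linking "I don't understand Promises." to the callbacks question and to the reply that followed. Ingest, run over the same transcript, would instead return the definition of a Promise.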

What This Looks Like In Practice

Over two weeks, we mined five teaching case studies:

Procedural frameworks extracted:

Anderson's 23 Socratic Rules — complete tutoring cycle in 6 groups
One-Minute Preceptor — clinical medicine's "diagnose before you teach" framework
Socratic Debugging 7 Steps — "don't touch the keyboard, guide to cognitive dissonance"

Experiential cases annotated (decision-point level, not summaries):

41-turn scientific reasoning dialogue — trap design, "don't say you're wrong"
30-turn moral reasoning dialogue — counter-example strategy, breakthrough moment
1-hour CMU math tutoring — "Tell Your Reader" metaphor, progressive correction
WashU 98-session negative case — why Socratic method fails in practice
MathDial 3,000-dialogue taxonomy — Focus / Probe / Tell / Generic decision model

Interaction pattern (emerging):

A decision tree: when the student is stuck → narrow the problem (Focus). When they answer but reasoning is unclear → deepen understanding (Probe). When Focus + Probe cycle fails twice → give a strategy hint, not the answer (Tell).
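The decision tree above translates almost directly into code. This is a minimal sketch, assuming the student's state has already been classified into hypothetical labels (`"stuck"`, `"unclear_reasoning"`) and that the caller counts failed Focus + Probe cycles. Neither the labels nor the counter come from MathDial itself.

```python
from enum import Enum

class Move(Enum):
    FOCUS = "narrow the problem"
    PROBE = "deepen understanding"
    TELL = "give a strategy hint, not the answer"

def next_move(student_state, failed_cycles):
    """Pick the tutor's next move; student_state uses hypothetical labels."""
    if failed_cycles >= 2:
        # Focus + Probe has failed twice: hint at a strategy, still not the answer.
        return Move.TELL
    if student_state == "stuck":
        return Move.FOCUS
    if student_state == "unclear_reasoning":
        return Move.PROBE
    return None  # the tree prescribes no move for other states
```

States outside the tree return `None` rather than guessing a move, which keeps the sketch honest about what the emerging pattern actually covers.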

This Isn't Just About Teaching

The four-type distinction applies wherever expertise exists:

Medical diagnosis: Disease definitions → diagnostic reasoning sequence → grand rounds presentations → how to guide a resident
Philosophy mentoring: What Heidegger said → when to bring up Stoicism instead → full dialogue transcripts → when to stay silent
Growth coaching: Motivation theories → when to probe vs reframe → full session transcripts → "When did you first notice this?"

In every domain, experts have all four types. Knowledge bases only capture the first.

The Point

The next generation of AI won't be defined by larger knowledge bases.

It will be defined by better reasoning, better teaching, and better judgment.

Those don't come from more declarative knowledge. They come from organizing knowledge differently.

Judgment isn't a knowledge problem. It's a knowledge-type problem.

Built on @karpathy's LLM Wiki foundation. The idea of "mine" as a second operation is what's new here — ingest extracts facts, mine extracts decisions. If you're building an AI tutor, a knowledge system, or anything that needs judgment, the four-type checklist might save you months.