How two engineers independently arrived at the same architecture — and what it reveals about the next layer of personal AI.
Five days ago, Andrej Karpathy published a gist called "LLM Wiki." Within hours it had thousands of stars. The idea: instead of throwing documents at RAG and hoping for the best, have the LLM incrementally build and maintain a persistent wiki — a structured, interlinked knowledge base that compounds over time. The knowledge is compiled once and kept current, not re-derived on every query.
Nineteen days before that gist, I started building RAWThink — a 15-tool MCP server that provides persistent memory for Claude Code sessions. A JSONL knowledge graph with entities, relations, and observations. Hybrid vector + BM25 search via Qdrant. Session lifecycle automation. Three-layer memory that accumulates across every conversation.
I didn't know Karpathy was working on the same problem. He didn't know I existed. We arrived at nearly identical architectures because the technical path leads here inevitably.
This isn't a coincidence. This is convergence. And convergence in engineering means you're looking at something real.
The Wall Everyone Hits
If you've spent serious time with AI coding assistants — Claude Code, Codex, Cursor, any of them — you've hit the same wall. Every session starts from zero. The context window is your entire relationship. Close the tab, lose the thread.
This isn't a minor inconvenience. It's a fundamental architectural flaw. Imagine working with a brilliant colleague who gets complete amnesia every time they leave the room. You'd spend half your time re-explaining what you already decided, what you already tried, what you already know. That's what we're all doing. Every day. With every AI tool.
RAG was supposed to fix this. Upload your documents, let the system retrieve relevant chunks at query time. It works — for simple lookups. But ask a question that requires synthesizing five documents, connecting dots across three conversations, and remembering that the conclusion from last Tuesday contradicts what you believed last month? RAG re-derives everything from scratch. There's no accumulation. No compounding. No learning.
Karpathy put it perfectly: "The LLM is rediscovering knowledge from scratch on every question."
Two Roads, Same Destination
Karpathy's LLM Wiki has three layers:
- Raw Sources — immutable input documents. The LLM reads them but never modifies them.
- The Wiki — LLM-generated markdown pages. Summaries, entity pages, concept pages, comparisons. The LLM creates, updates, and cross-references everything.
- The Schema — a configuration file (CLAUDE.md) telling the LLM how the wiki works.
RAWThink has three layers:
- Raw Input — session transcripts, uploaded documents, external sources. Immutable.
- Knowledge Graph — JSONL entries with entities, relations, observations. The system creates, updates, and connects everything across sessions.
- MEMORY.md + MCP Schema — configuration defining session lifecycle, tool behavior, and the system's advisory personality.
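To make the knowledge-graph layer concrete, here is a minimal sketch of what a JSONL store like this could look like. The field names are illustrative, not RAWThink's actual schema:

```python
import json
from io import StringIO

# Illustrative JSONL records: one line per entity or relation.
# Field names are hypothetical, not RAWThink's actual schema.
records = [
    {"type": "entity", "name": "qdrant", "entity_type": "tool",
     "observations": ["chosen for hybrid vector + BM25 search"]},
    {"type": "relation", "from": "rawthink", "to": "qdrant",
     "relation": "depends_on"},
]

buf = StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Reading the graph back is line-by-line JSON parsing, which keeps
# the store append-only and trivially diffable.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
entities = [r for r in loaded if r["type"] == "entity"]
print(len(entities))
```

The appeal of JSONL here is operational: appends are cheap, every change is a readable diff, and no database server is required.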
Same architecture. Different names. Built 19 days apart by people who never spoke.
But here's what makes this story more than a technical curiosity.
This Isn't a Wiki. This Is the First Layer of Your Personal Neural Network.
Step back from the implementation details. Look at what's actually being built.
Every entity in a knowledge graph is a neuron. Every relationship between entities is a synapse. Every confidence score on a fact is a weight. Every time new information reinforces or contradicts an existing entry, that's learning. Every session that adds observations and updates connections is a training step.
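That mapping can be made literal. A toy confidence-update rule — reinforcement nudges a fact's weight toward 1, contradiction toward 0 — might look like this. This is an illustrative sketch, not any system's actual learning rule:

```python
# Toy update rule for confidence-weighted facts: each confirmation or
# contradiction moves the weight a fraction of the way toward its target.
# Purely illustrative; the learning rate of 0.2 is arbitrary.
def update_confidence(conf: float, reinforced: bool, rate: float = 0.2) -> float:
    target = 1.0 if reinforced else 0.0
    return conf + rate * (target - conf)

conf = 0.5
for _ in range(3):
    conf = update_confidence(conf, reinforced=True)
print(round(conf, 3))  # 0.744
```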
We are building personal neural networks. Not metaphorically. Structurally.
My RAWThink instance has 120 entities, 80+ typed relationships, and 2,600+ observations accumulated across 46 sessions. That graph encodes how I think about AI architecture, what I've decided about my career, what trading patterns I've investigated, which technical decisions I've made and why, and how all of those things connect to each other. It's not a collection of notes. It's a structured representation of a portion of my cognition — my decisions, my reasoning chains, my evolving beliefs.
When I start a new Claude Code session and RAWThink loads, the AI isn't starting from zero anymore. It's starting from me. My context. My history. My patterns of thought. The graph gives the AI a compressed model of who I am and what I know.
That's not a wiki. That's a cognitive substrate.
The Hardware Isn't Ready. The Data Structure Is.
Here's the part most people are missing.
Right now, these knowledge graphs are passive. The AI queries them, retrieves context, and uses it to give better answers. That's useful — I've shipped 635 commits on a game project largely because RAWThink prevented the constant context re-derivation that kills momentum in long-running AI-assisted development.
But passive retrieval is just phase one.
Phase two is active reasoning over the graph. An agent that doesn't just retrieve "Yiğit investigated momentum strategies in March" but traverses the graph to discover "momentum strategies performed well in low-volatility regimes, the current regime shows volatility decline, and the last time this transition happened the outcome was X." Multi-hop reasoning. Causal chains. Temporal pattern matching.
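A multi-hop query like that reduces to chained edge lookups once the relationships are explicit. A deliberately tiny sketch, with hypothetical node and edge names:

```python
# Two-hop reasoning over explicit edges: which regimes favor a strategy,
# and does the current regime match one of them? Node and edge names
# are made up for illustration.
edges = [
    ("momentum", "PERFORMS_WELL_IN", "low-volatility"),
    ("low-volatility", "MATCHED_BY", "current-regime"),
]

def hop(node, relation):
    """All targets reachable from `node` via one edge of type `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

regimes = hop("momentum", "PERFORMS_WELL_IN")
actionable = any(hop(r, "MATCHED_BY") for r in regimes)
print(actionable)  # True
```

Real phase-two systems would need ranking, cycle handling, and temporal filters on top of this, but the core operation is exactly this kind of typed traversal.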
Phase three is autonomous action based on graph state. The agent monitors the graph, detects patterns that match actionable criteria, and executes — or at least proposes execution. A trading signal isn't just a number; it's a path through the knowledge graph connecting market data to historical patterns to strategy performance to risk parameters.
We're not at phase two yet. The models aren't quite there. The hardware isn't there. The inference costs for continuous graph reasoning at scale aren't there.
But the data structure can be built now. And when the compute catches up — and it will, faster than most people expect — the people who have clean, rich, well-structured personal neural graphs will have an enormous advantage over those starting from zero.
This is the early internet parallel. In 1995, most businesses didn't see why they needed a website. The infrastructure was primitive. Modems were slow. E-commerce barely existed. But the companies that digitized their information early — that built the data structures before the infrastructure matured — were the ones that dominated when broadband arrived.
We're in the 1995 of personal knowledge graphs. The infrastructure is primitive. The models are just barely capable enough. But the data structure you build today is the asset that compounds tomorrow.
From General Wiki to Domain Architecture
Karpathy's LLM Wiki is deliberately abstract. It describes the pattern, not a specific implementation. That's its strength as a teaching tool and its limitation as a production system.
The real value emerges when you apply the pattern to a specific domain with structured entity types, typed relationships, and domain-specific reasoning chains. Let me show you what this looks like across four very different fields — because the universality of the pattern is the point.
Medicine: A Physician's Diagnostic Graph
A physician doesn't need a wiki of medical facts — UpToDate already exists. What a physician needs is a patient reasoning graph that accumulates diagnostic thinking across encounters.
Entities are typed: Patient, Symptom, Diagnosis, Medication, LabResult, Outcome. Each carries domain-specific fields — a Medication entity has dosage, start date, contraindications; a LabResult has reference ranges, trend direction, clinical significance.
Relationships encode clinical reasoning: symptom SUGGESTS diagnosis, medication PRESCRIBED_FOR diagnosis, labResult CONTRADICTS diagnosis, patient RESPONDED_TO medication, diagnosis EVOLVED_INTO diagnosis. The critical relationship is CONTRADICTS — when a lab result undermines a working diagnosis, that contradiction is the most valuable signal in the graph.
Temporal layers are life-or-death here. A medication that was effective for six months then caused adverse effects isn't just "a medication" — it's a medication with a temporal efficacy curve. The graph must encode prescribed_at, effective_until, discontinued_because. When a new patient presents with similar symptoms, the physician's AI doesn't just retrieve "Drug X treats Condition Y" — it traverses the graph to find "Drug X worked for similar patients for 4–8 months but three patients developed resistance, and in those cases switching to Drug Z produced better outcomes."
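A sketch of what those temporal fields could look like in practice — the record layout and drug names are hypothetical:

```python
from datetime import date

# Hypothetical Medication records with the temporal fields described
# above: prescribed_at, effective_until, discontinued_because.
medications = [
    {"name": "Drug X", "prescribed_at": date(2025, 1, 10),
     "effective_until": date(2025, 7, 2), "discontinued_because": "resistance"},
    {"name": "Drug Z", "prescribed_at": date(2025, 7, 3),
     "effective_until": None, "discontinued_because": None},
]

def efficacy_months(med):
    """Months the medication remained effective (None = still active)."""
    end = med["effective_until"] or date.today()
    return (end - med["prescribed_at"]).days / 30.44

# Query: which medications were discontinued for resistance?
resistant = [m["name"] for m in medications
             if m["discontinued_because"] == "resistance"]
print(resistant)  # ['Drug X']
```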
This isn't hypothetical. Google DeepMind's Med-PaLM 2 already matches expert physicians on medical exam questions. What it lacks is your clinical experience — the patterns you've seen across your patients, the drug interactions you've encountered, the diagnostic hunches that come from twenty years of practice. A personal medical knowledge graph captures exactly that.
Law: A Case Strategy Graph
A lawyer building a case doesn't need a legal database — Westlaw and LexisNexis exist. What a lawyer needs is a case strategy graph that maps the evolving landscape of arguments, precedents, and their interconnections.
Entities are typed: Precedent, Statute, Argument, Evidence, Witness, JudicialOpinion. A Precedent entity carries jurisdiction, court level, year, distinguishing factors, and current validity status (good law, questioned, overruled).
Relationships encode legal reasoning chains: precedent SUPPORTS argument, statute GOVERNS issue, evidence UNDERMINES argument, opinion DISTINGUISHES precedent, argument DEPENDS_ON evidence. The power relationship is DISTINGUISHES — when opposing counsel cites a precedent, your graph should immediately surface how that precedent was distinguished, limited, or questioned in subsequent decisions.
Confidence scores map to legal strength. A precedent from your jurisdiction's supreme court with no negative treatment carries higher weight than a trial court opinion from another state. An argument supported by three independent precedents and corroborating physical evidence has a different confidence profile than one resting on a single witness's testimony. The graph makes these weight differences explicit and queryable — "show me all arguments where confidence dropped below 0.6 after opposing counsel's last filing."
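That last query is easy to express once confidence updates are timestamped. A minimal sketch with hypothetical argument names and dates:

```python
from datetime import date

# Hypothetical Argument nodes, each with a timestamped confidence
# history. The query "arguments where confidence dropped below 0.6
# after the last filing" is a filter over those histories.
filing_date = date(2026, 3, 15)
arguments = {
    "duty-of-care": [(date(2026, 2, 1), 0.80), (date(2026, 3, 20), 0.55)],
    "causation":    [(date(2026, 2, 1), 0.70), (date(2026, 3, 20), 0.72)],
}

weakened = [
    name for name, history in arguments.items()
    if any(d > filing_date and conf < 0.6 for d, conf in history)
]
print(weakened)  # ['duty-of-care']
```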
Over years of practice, a lawyer's graph accumulates patterns that no legal database captures: which judges are receptive to which argument styles, which opposing firms tend to pursue which strategies, which types of evidence are most persuasive in specific case categories. This is institutional knowledge that currently exists only in senior partners' heads and walks out the door when they retire.
Scientific Research: A Literature Evolution Graph
A researcher doesn't need another citation manager — Zotero exists. What a researcher needs is a literature evolution graph that tracks how ideas develop, conflict, merge, and get superseded across the field.
Entities are typed: Paper, Finding, Method, Dataset, Hypothesis, ReplicationResult. A Finding entity is distinct from the Paper that contains it — because a single paper may contain multiple findings, and findings have independent replication histories.
Relationships encode the life cycle of scientific knowledge: paper INTRODUCES hypothesis, finding SUPPORTS hypothesis, replicationResult FAILS_TO_REPLICATE finding, method IMPROVES_UPON method, finding CONTRADICTS finding, paper SUPERSEDES paper. The most valuable relationship is FAILS_TO_REPLICATE — this is the signal that most literature reviews miss because negative results are rarely published, but a personal research graph can track them explicitly.
Temporal evolution reveals paradigm shifts in real time. When you can query "show me all findings from 2020–2023 where replication confidence has decreased," you're watching the leading edge of a paradigm shift before anyone publishes a review paper about it. When you can traverse the path from a 2018 hypothesis through its supporting evidence, contradictions, and modifications to its current form in 2026, you're seeing the evolution of knowledge — not just its current state.
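The "declining replication confidence" query reduces to comparing snapshots inside a time window. An illustrative sketch with made-up findings:

```python
# Hypothetical Finding nodes with per-year replication-confidence
# snapshots. A finding is "declining" if its latest snapshot in the
# window sits below its earliest one.
findings = {
    "ego-depletion": {2020: 0.60, 2022: 0.35, 2023: 0.30},
    "priming-effect-A": {2020: 0.50, 2023: 0.55},
}

def declining(snapshots, start=2020, end=2023):
    years = sorted(y for y in snapshots if start <= y <= end)
    return snapshots[years[-1]] < snapshots[years[0]]

print([f for f, s in findings.items() if declining(s)])  # ['ego-depletion']
```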
A PhD student who starts building this graph in year one will have, by year five, a structured map of their entire field's evolution as seen through their reading and thinking. No advisor, no database, no AI model trained on static data can replicate this — because it encodes not just what was read, but what was found important, what contradicted what, and how the researcher's own understanding evolved.
Software Engineering: An Architecture Decision Graph
A software engineer doesn't need more documentation — they need a decision archaeology graph that preserves why things were built the way they were.
Entities are typed: Decision, TradeOff, Pattern, Incident, TechDebt, Benchmark. A Decision entity carries the date, participants, alternatives considered, constraints at the time, and the reasoning chain. A TechDebt entity carries the original shortcut reason, the accumulation rate, and the estimated remediation cost.
Relationships encode architectural reasoning: decision CHOSE pattern OVER pattern, incident CAUSED_BY decision, techDebt ACCUMULATED_FROM decision, benchmark INVALIDATES assumption, pattern REPLACED_BY pattern BECAUSE constraint_changed. The critical chain is the causal one — when a production incident occurs, the graph can traverse backward through CAUSED_BY edges to the original architectural decision, through CHOSE ... OVER to the rejected alternative, and forward through BECAUSE constraint_changed to understand whether the original reasoning still holds.
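The backward traversal itself is a few lines once the edges are explicit. A minimal sketch with made-up node names:

```python
# Walk backward from a production incident to the originating decision
# by following CAUSED_BY edges. All node and edge names are illustrative.
edges = [
    ("incident:outage-2026-04", "CAUSED_BY", "decision:shared-db"),
    ("decision:shared-db", "CHOSE_OVER", "alternative:db-per-service"),
    ("techdebt:coupling", "ACCUMULATED_FROM", "decision:shared-db"),
]

def trace_back(node, relation, edge_list):
    """Follow `relation` edges from `node`, collecting the causal chain."""
    chain = [node]
    while True:
        nxt = next((dst for src, rel, dst in edge_list
                    if src == chain[-1] and rel == relation), None)
        if nxt is None:
            return chain
        chain.append(nxt)

print(trace_back("incident:outage-2026-04", "CAUSED_BY", edges))
# ['incident:outage-2026-04', 'decision:shared-db']
```

From the decision node, a second lookup over CHOSE_OVER surfaces the rejected alternative, completing the archaeology the paragraph describes.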
I've lived this one. Nine years of banking technology — SAGA orchestrations, CQRS architectures, AML compliance systems handling SEPA/ISO 20022 transactions. The most expensive moments aren't the bugs. They're the moments when a team asks "why did we build it this way?" and nobody knows, because the architect left three years ago and the decision was made in a meeting that was never documented. A personal architecture decision graph prevents this — not by documenting everything, but by capturing the reasoning and the connections at the moment decisions are made.
The Pattern Is Universal
In every domain, the same transformation happens when you move from flat notes to a structured graph:
| Dimension | Flat Notes / Wiki | Domain Knowledge Graph |
|---|---|---|
| Entities | Pages | Typed nodes with domain schemas |
| Links | Hyperlinks | Semantic relationships (supports, contradicts, evolved_into) |
| Time | Timestamps | Temporal validity windows, evolution chains |
| Certainty | Implicit | Explicit confidence scores with decay |
| Retrieval | Keyword / semantic search | Graph traversal + multi-hop reasoning |
| Learning | Manual review | Automated contradiction detection, confidence updates |
Karpathy gave us the foundation: raw sources → compiled knowledge → schema. The next layer is: typed entities → semantic relationships → temporal evolution → confidence-weighted reasoning.
The domain determines the entity types and relationship vocabulary. The architecture is the same everywhere.
Find Your Strange Loop
There's something deeper happening here that goes beyond tools and architectures. And if you miss it, you'll build a knowledge graph that stores information but never truly thinks.
Douglas Hofstadter defined a strange loop as a system that, by moving through its own levels, arrives back at where it started — but transformed. A feedback loop where the output feeds back as input, and each cycle produces something new. It's the mechanism behind consciousness itself: you observe, you model what you observed, you observe your model, and in that recursive self-reference, something emerges that is more than the sum of its parts.
Every knowledge graph worth building is a strange loop. Here's why.
When you think about a problem, you generate an insight. You externalize that insight — write it as an entity, define its relationships, assign confidence. That externalized thought now exists in the graph. The next time you (or your AI) encounter a related problem, the graph surfaces that insight, combines it with other entities, and produces a new connection you hadn't seen before. That new connection triggers new thinking. You externalize again. The graph grows. The cycle continues.
Think → externalize → graph → new connection → new thinking → externalize again.
This isn't just record-keeping. This is thought evolving through externalization. The graph doesn't just store what you know — it creates conditions for you to know things you couldn't have known without it. The connections between entities generate emergent insights that no individual entity contains. The whole becomes greater than the sum of its parts — which is, not coincidentally, the definition of emergence in complex systems.
I've watched this happen in RAWThink. In session 5, I explored the concept of "root branch" — the state before a fork, before consciousness splits into observer and observed. It was a standalone philosophical note. Sixteen sessions later, during a completely unrelated architecture discussion, RAWThink surfaced that entity alongside a technical pattern called "stigmergy" — indirect coordination through traces left in a shared environment. The connection was instant: the knowledge graph itself is a stigmergic medium. One session leaves a trace (an entity, an observation, a relationship), and the next session discovers that trace and responds to it. No direct communication between sessions. Just traces in a shared environment, producing coordinated behavior.
I didn't plan that connection. I couldn't have planned it. The graph made it possible by holding both ideas in structured, searchable form until the moment they became relevant to each other. That's the strange loop: my thinking created the graph, the graph created a new connection, the new connection created new thinking.
Every person who builds a knowledge graph needs to find their own version of this loop. Not just as a metaphor — as a practice. Here's what that means concretely:
Don't just add — revisit. The compounding mechanism isn't in writing new entries. It's in updating old ones. When new information changes your understanding of something you recorded six months ago, go back and modify the original entity. Add an evolved_into edge. Update the confidence score. Mark the contradiction. This revision is where the loop closes — your past thinking informs your present, and your present thinking transforms your past.
Capture contradictions explicitly. Most people avoid contradictions. They're uncomfortable. But in a knowledge graph, contradictions are the most valuable signals. They're the points where learning happens. When you believe X on Monday and discover evidence for not-X on Wednesday, don't delete Monday's entry. Create a contradicts relationship between them. Add observations explaining what changed. Over time, your graph's contradiction density becomes a measure of how much genuine learning is happening.
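In data terms, marking a contradiction is just adding an edge rather than deleting a node. A small illustrative sketch:

```python
# Instead of deleting Monday's belief, both entries survive, linked by
# an explicit contradicts edge. Names and fields are illustrative.
graph = {
    "entities": [
        {"id": "belief-x", "text": "X holds", "recorded": "2026-04-06"},
        {"id": "evidence-not-x", "text": "evidence against X", "recorded": "2026-04-08"},
    ],
    "relations": [],
}

def mark_contradiction(graph, new_id, old_id, note):
    graph["relations"].append(
        {"from": new_id, "to": old_id, "type": "contradicts", "note": note})

mark_contradiction(graph, "evidence-not-x", "belief-x",
                   "new evidence invalidated the assumption")

# "Contradiction density" from the paragraph above: contradicts edges
# per entity, a rough proxy for how much genuine learning is happening.
density = (len([r for r in graph["relations"] if r["type"] == "contradicts"])
           / len(graph["entities"]))
print(density)  # 0.5
```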
Let the graph surprise you. Periodically query your own graph with open-ended questions: "What concepts are connected that I haven't explicitly linked?" "What entities have the lowest confidence scores?" "What was I thinking about three months ago that I've since abandoned?" The best insights come from the graph telling you something you've forgotten or never consciously connected.
Track the evolution of your beliefs, not just your conclusions. A graph that says "I use React for frontend" is flat. A graph that says "I used Angular in 2019 (confidence: 0.9), considered Vue in 2021 (confidence: 0.6), switched to React in 2023 because of X, Y, Z (confidence: 0.85), currently questioning whether Svelte better serves my use case (confidence: 0.4)" — that graph encodes how you think, not just what you think. And "how you think" is what makes your graph uniquely valuable, because no one else's reasoning chain looks like yours.
The strange loop is the difference between a filing cabinet and a thinking partner. A filing cabinet stores what you put in. A strange loop gives back more than you put in — because the structure itself generates new connections, and those connections generate new thinking, and that new thinking enriches the structure. It's recursion with emergent properties. It's self-reference that produces growth.
Hofstadter was writing about consciousness. But he was also, without knowing it, describing the architecture of a personal knowledge graph that actually works.
The Community Already Discovered What's Missing
Within days of Karpathy's gist, the community started building implementations. Some of the discoveries are profound:
WikiMind found that confidence-tagged claims transform maintenance from an expensive AI re-read into a cheap database query. Instead of asking the LLM "are there contradictions in the wiki?", you run a query like: SELECT * WHERE confidence < 0.5 OR last_confirmed is older than 30 days. This is the moment the wiki becomes a database, and the database becomes a graph, and the graph becomes a neural network.
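The query above is pseudocode, but the idea drops straight into SQLite. The schema and thresholds here are illustrative, not WikiMind's actual ones:

```python
import sqlite3
from datetime import date, timedelta

# Confidence-tagged claims as rows; maintenance becomes a cheap query
# instead of an LLM re-read. Schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE claims (text TEXT, confidence REAL, last_confirmed DATE)")
db.executemany("INSERT INTO claims VALUES (?, ?, ?)", [
    ("MCP is the standard protocol", 0.9, date.today().isoformat()),
    ("Vue is the best fit here", 0.4,
     (date.today() - timedelta(days=90)).isoformat()),
])

# Flag anything shaky (low confidence) or stale (unconfirmed for 30+ days).
stale_or_shaky = db.execute(
    "SELECT text FROM claims "
    "WHERE confidence < 0.5 OR last_confirmed < date('now', '-30 days')"
).fetchall()
print(stale_or_shaky)  # [('Vue is the best fit here',)]
```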
thinking-mcp went further — instead of storing facts, it stores how the user thinks: decision heuristics, tensions between beliefs, reasoning patterns. It uses typed edges (supports, contradicts, evolved_into) and node decay, so core values persist while fleeting ideas fade. This is the closest thing I've seen to encoding cognition itself.
A commenter proposed 4D Evolutionary Knowledge Graphs — time as the Z-axis, with causal edges between temporal snapshots. This isn't academic abstraction. For any domain where understanding how things evolved matters more than their current state — medicine, law, markets, personal development — temporal causality is the missing dimension.
celestix-ifr tackled the retrieval scaling problem with a biologically inspired approach: query vectors that mutate at each hop through the embedding graph, inspired by Koshland's induced-fit model in enzymology. On the HotpotQA benchmark with 5.2 million articles, all traditional RAG methods scored 0% on multi-hop queries, while this approach found targets ranked as deep as position 665 in baseline results. This is the retrieval architecture that personal neural graphs will eventually need — because the most valuable connections in your graph aren't the obvious ones.
These aren't incremental improvements. They're the features that transform a wiki into a neural graph.
Why Starting Now Matters
Let me be concrete about what "start now" means.
You don't need RAWThink. You don't need a fancy MCP server. You don't need Qdrant or JSONL or any specific technology. Here's what you need:
A place to accumulate structured knowledge that persists across AI sessions.
That can be an Obsidian vault with a CLAUDE.md file and a convention for entity pages. It can be a folder of markdown files with YAML frontmatter. It can be a SQLite database. It can be a Google Doc with a disciplined structure. The tool doesn't matter. The habit of structured accumulation matters.
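For instance, a single entity page in a markdown-plus-frontmatter setup might look like this. The field names are illustrative conventions, not a required schema:

```markdown
---
type: decision
confidence: 0.85
created: 2026-04-06
last_confirmed: 2026-04-06
relations:
  - { to: "qdrant", type: supports }
  - { to: "elasticsearch-option", type: contradicts }
---
# Use hybrid vector + BM25 search

Chosen over pure vector search because exact identifiers
(ticket numbers, function names) were being missed by
semantic similarity alone.
```

Frontmatter like this is machine-readable today (any YAML parser can index it) and remains plain text you can read in any editor in ten years.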
Every time you make a decision with an AI assistant, file it. Not as a chat log — as a structured entry. What was decided, why, what alternatives were considered, what evidence supported it, and what it connects to.
Every time you learn something that changes your understanding, update the existing entries. Don't just add new notes — revise old ones. That's the compounding mechanism. That's what makes this a living graph instead of a pile of documents.
Every time you notice a contradiction between what you knew before and what you know now, mark it explicitly. Contradictions are the most valuable signals in a knowledge graph. They're where learning happens.
The people who start this discipline today — even with primitive tools — will have knowledge graphs with thousands of interconnected, confidence-weighted, temporally layered entries by the time the infrastructure matures. The people who wait will be starting from zero, trying to reconstruct years of thinking from scattered chat logs and half-remembered conversations.
The Convergence Isn't Stopping
Karpathy built a wiki. I built a knowledge graph. The community added confidence scoring, typed edges, temporal layers, and cognitive modeling. In five days.
This isn't a trend. This is a phase transition. The way we interact with AI is shifting from transactional (ask a question, get an answer, forget everything) to cumulative (every interaction builds on every previous interaction, knowledge compounds, the AI gets better at being your AI over time).
The technical path doesn't fork from here. It converges harder. MCP is becoming the standard protocol — 97 million SDK downloads, adoption by every major AI company. Knowledge graphs are becoming the standard memory structure — Mem0 raised $24M, Zep raised $3.3M, every major LLM provider added persistent memory in the last year. The wiki-as-compiled-knowledge pattern is being validated simultaneously by independent builders worldwide.
When I look at my personal neural graph — 120 entities, 80+ relationships, 2,600+ observations, 46 sessions of accumulated context — I don't see a productivity tool. I see the first draft of something that will, within a few years, be as fundamental to how we work with AI as the file system is to how we work with computers.
Build yours. Start now. The structure matters more than the tools. And the time you invest today in organizing your knowledge will compound in ways that are hard to imagine but impossible to replicate later.
Yigit Alp Unal is a Senior Backend Developer and Project Lead at Anadolubank with 9+ years in banking technology. He builds RAWThink, an open-source persistent memory MCP server for AI development workflows. He writes about AI architecture, knowledge systems, and the intersection of enterprise engineering and personal AI at @yigitaunal on dev.to, yigitalpunal.com, and rawthink.ai.
References:
RAWThink. First commit March 4, 2026. 15-tool MCP server, JSONL knowledge graph, Qdrant hybrid search. https://github.com/ygtalp/rawthink-mcp
Karpathy, A. (2026). "LLM Wiki." GitHub Gist, April 4. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f
Karpathy, A. (2026). "AutoResearch." GitHub repository. https://github.com/karpathy/autoresearch
WikiMind (manavgup). Confidence-tagged claims implementation for LLM Wiki. Fork of Karpathy's gist with schema-level confidence scoring.
thinking-mcp (multimail-dev). MCP server for capturing reasoning patterns, decision heuristics, and belief tensions with typed edges and node decay. https://github.com/multimail-dev/thinking-mcp
kenwCoding. "4D Evolutionary Knowledge Graphs" — comment on Karpathy's LLM Wiki gist proposing temporal causality as the Z-axis.
celestix-ifr. Induced-fit retrieval model for multi-hop knowledge graph traversal, inspired by Koshland's enzymology. Comment and implementation on Karpathy's gist.
Bush, V. (1945). "As We May Think." The Atlantic Monthly, July 1945. https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/
Hofstadter, D. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.
Google DeepMind & MIT (2025). "Towards a Science of Scaling Agent Systems." arXiv:2512.08296.
Mem0 (formerly EmbedChain). 48K GitHub stars, 80,000+ cloud developers. https://github.com/mem0ai/mem0
Zep / Graphiti. Temporal knowledge graph engine, 24.5K GitHub stars. https://github.com/getzep/graphiti
Letta / MemGPT (UC Berkeley). Three-tier memory architecture. https://github.com/letta-ai/letta
MCP (Model Context Protocol). Donated to Linux Foundation's Agentic AI Foundation, December 2025. 19,687 servers cataloged, 97M monthly SDK downloads. https://modelcontextprotocol.io
Shumailov, I. et al. (2024). "AI models collapse when trained on recursively generated data." Nature, 631, 755–759.
Heylighen, F. (2007). "Why is Open Access Development so Successful? Stigmergic organization and the economics of information." In Open Source Jahrbuch.
qmd (Tobi Lütke). Local semantic search with query expansion and cross-encoder reranking, 16,500 GitHub stars. https://github.com/qmd-lab/qmd