DEV Community: Rahul Rangarao

I got tired of re-explaining myself to AI. So I built a memory graph

Rahul Rangarao — Mon, 18 May 2026 10:53:28 +0000

Every AI conversation starts from zero.

You explain your architecture. The next day, you explain it again. You describe the decision you made three weeks ago. You clarify that constraint. You remind the model what your project actually is.

You are not using AI as a thinking partner. You are using it as a very fast search engine that needs a new tour every single session.

I got frustrated with this. So I built something.

The Problem Is Structural

The issue isn't that Claude or GPT are bad models. The issue is that context doesn't persist.

You can dump your whole notes folder into a prompt window, but that's noisy: the model drowns in irrelevant text. You can use RAG, but that's heavy infrastructure for a personal workflow. You can use memory features in chat apps, but those are black boxes with no provenance: you don't know why something was remembered, or whether it's still true.

What I actually wanted was simple:

A graph of concepts, decisions, goals, and relationships, not a pile of files
Every claim backed by a source excerpt so I know where it came from
Human review before anything becomes durable: LLMs propose, I decide
A tiny context snapshot I can paste into any chat, any model, anywhere
Zero cloud. My graph is stored in a JSON file on my machine.

Introducing `knowledge-worker`

knowledge-worker is a local-first personal knowledge graph for carrying context across AI sessions. It turns markdown notes into reviewable concepts, decisions, goals, and relationships — with source excerpts attached — and exports a compact snapshot you can paste into Claude, GPT, Ollama, or whatever you're using next week.

No account. No sync. No telemetry. Just a JSON file and a CLI.

How It Works

Once you have a markdown note, the pipeline runs five steps:

Markdown note
  → LLM extracts candidate nodes + edges
  → Validator checks shape, provenance, excerpts
  → You review: accept / reject / edit each claim
  → Merger writes to your local graph
  → Eval log records outcomes
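To make the validator step concrete, here is a minimal sketch of what "checks shape, provenance, excerpts" could mean for a single candidate. The field names (`kind`, `label`, `source`, `excerpt`) and the `validate_candidate` function are assumptions for illustration, not the schema or API that ships in the repo.

```python
from dataclasses import dataclass

# Hypothetical field names; the real candidate schema lives in the repo.
REQUIRED_FIELDS = ("kind", "label", "source", "excerpt")


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_candidate(candidate: dict, note_text: str) -> ValidationResult:
    """Check shape and provenance for one candidate claim."""
    errors = []
    # Shape check: every required field must be a non-empty string.
    for field in REQUIRED_FIELDS:
        value = candidate.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    # Provenance check: the excerpt must appear verbatim in the note.
    # Whitespace is normalized so line wrapping does not matter.
    excerpt = candidate.get("excerpt")
    if isinstance(excerpt, str) and excerpt.strip():
        if " ".join(excerpt.split()) not in " ".join(note_text.split()):
            errors.append("excerpt not found in source note")
    return ValidationResult(ok=not errors, errors=errors)


note = "We will keep plain JSON until it becomes the limiting factor."
candidate = {
    "kind": "decision",
    "label": "Use JSON first",
    "source": "architecture-note.md",
    "excerpt": "plain JSON until it becomes the limiting factor",
}
result = validate_candidate(candidate, note)
print(result.ok, result.errors)  # True []
```

The point of the literal-substring check is that a paraphrased or invented excerpt fails before it ever reaches review.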

Stage 1: Ingest a note

mykg ingest ~/notes/architecture-decision.md

The CLI calls your chosen LLM (Claude, GPT, or local Ollama) and gets back a structured list of candidate nodes and edges. Nothing is written yet.

Stage 2: Review

You see each candidate:

CANDIDATE: decision — "Use JSON first"
  excerpt: "plain JSON until it becomes the limiting factor"
  confidence: high
  [A]ccept / [R]eject / [E]dit > _

The LLM proposes. You decide. Nothing becomes durable memory without your explicit approval.
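The review step can be sketched as a small loop in which rejection is the default. This is an illustration only: the `review` function, its `ask` parameter, and the candidate fields are hypothetical names, not the CLI's actual internals.

```python
from typing import Callable, Iterable


def review(candidates: Iterable[dict], ask: Callable[[str], str] = input) -> list[dict]:
    """Return only the candidates the reviewer explicitly accepted or edited."""
    accepted = []
    for cand in candidates:
        prompt = (
            f'CANDIDATE: {cand["kind"]} - "{cand["label"]}"\n'
            f'  excerpt: "{cand["excerpt"]}"\n'
            "  [A]ccept / [R]eject / [E]dit > "
        )
        choice = ask(prompt).strip().lower()
        if choice == "a":
            accepted.append(cand)
        elif choice == "e":
            new_label = ask("  new label > ").strip()
            # Keep the original label if the reviewer enters nothing.
            accepted.append({**cand, "label": new_label or cand["label"]})
        # Anything else, including "r" or an empty answer, is a rejection:
        # the default is to remember nothing.
    return accepted
```

Passing `ask` as a parameter keeps the loop testable with scripted answers; called without it, the function prompts on the terminal through `input`.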

Stage 3: Query and export

# Search the graph
mykg query "provenance"

# Find paths between ideas
mykg path goal:trusted-ai project:knowledge-worker

# Export a compact LLM-ready context snapshot
mykg context

The context command outputs something like:

[knowledge-worker] project — A local-first knowledge graph for AI session continuity
  → serves: goal:trusted-ai
  → has_idea: idea:provenance-first

[decision] Use JSON first — plain JSON until it becomes the limiting factor
  source: architecture-note.md
  confidence: high

Paste that at the top of any new chat. Instant continuity.
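For a sense of how little machinery a snapshot like that needs, here is a rough sketch that renders a graph dictionary as plain text. The graph layout (`nodes` keyed by id, `edges` with `src`, `rel`, `dst`) and the `render_context` function are assumptions made for this example; the real file format may differ, and the output is a simplified version of the one shown above.

```python
def render_context(graph: dict) -> str:
    """Render each node with its outgoing edges as compact plain text."""
    # Group edges by their source node so each node prints its own relations.
    outgoing: dict[str, list[dict]] = {}
    for edge in graph.get("edges", []):
        outgoing.setdefault(edge["src"], []).append(edge)

    blocks = []
    for node_id, node in graph.get("nodes", {}).items():
        lines = [f'[{node["label"]}] {node["kind"]}: {node["summary"]}']
        for edge in outgoing.get(node_id, []):
            lines.append(f'  -> {edge["rel"]}: {edge["dst"]}')
        blocks.append("\n".join(lines))
    # A blank line between nodes keeps the snapshot easy to scan.
    return "\n\n".join(blocks)


demo = {
    "nodes": {
        "project:knowledge-worker": {
            "kind": "project",
            "label": "knowledge-worker",
            "summary": "A local-first knowledge graph for AI session continuity",
        },
        "goal:trusted-ai": {
            "kind": "goal",
            "label": "trusted-ai",
            "summary": "AI I can trust with my context",
        },
    },
    "edges": [
        {"src": "project:knowledge-worker", "rel": "serves", "dst": "goal:trusted-ai"},
    ],
}
print(render_context(demo))
```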

No API Key? No Problem.

If you already have a Claude or ChatGPT subscription, you don't need an API key at all.

Just ask the model to produce a *.candidates.json file from your notes (there's a schema in the repo), then run the local validator and merge:

mykg ingest notes.md --candidates-file notes.candidates.json

Your app subscription does the extraction. Validation and merge still run locally on your machine.

Visualize Your Graph

Generate a self-contained, offline HTML viewer — no D3 CDN, no external requests, no server:

mykg viz --graph examples/demo_graph.json --out /tmp/my-graph.html

[Figure: the public demo graph rendered in the offline viewer]

Try it: the repo ships a fictional demo graph you can run locally without any API key or install.

git clone https://github.com/rahulmranga/knowledge-worker
cd knowledge-worker
MYGRAPH_PATH=examples/demo_graph.json python3 mygraph/mygraph.py summary
MYGRAPH_PATH=examples/demo_graph.json python3 mygraph/mygraph.py context
python3 mygraph/mygraph.py viz --graph examples/demo_graph.json --out /tmp/demo.html

The Design Principles (Worth Saying Out Loud)

Provenance first. Every durable claim points back to a source document and a literal excerpt. The graph distinguishes grounded claims from guesses.

Review before merge. LLMs hallucinate. Letting an LLM write directly to your memory graph is how you end up with confident nonsense baked into your context. The pipeline enforces human review.

Boring persistence. The graph is a JSON file. No database to stand up, no migration to run, no cloud to authenticate. Add SQL-backed storage when you actually need it.

Local first. Your private graph lives outside the repo, addressed by absolute path. The public repo ships only code, docs, and a fictional demo graph.
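As an illustration of "boring persistence" combined with "provenance first", the sketch below loads a JSON graph, merges accepted nodes only if they carry a source and an excerpt, and writes the file back atomically. The function names and the graph layout are hypothetical, chosen for this example rather than taken from the repo.

```python
import json
import os
import tempfile
from pathlib import Path


def load_graph(path: Path) -> dict:
    """Load the graph, or start an empty one if the file does not exist yet."""
    if not path.exists():
        return {"nodes": {}, "edges": []}
    return json.loads(path.read_text(encoding="utf-8"))


def merge_nodes(graph: dict, accepted: list[dict]) -> dict:
    """Add or overwrite nodes by id; every node must carry its provenance."""
    for node in accepted:
        if not node.get("source") or not node.get("excerpt"):
            raise ValueError(f"node {node.get('id')!r} has no provenance")
        graph["nodes"][node["id"]] = node
    return graph


def save_graph(graph: dict, path: Path) -> None:
    """Write to a temp file in the same directory, then atomically replace."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(graph, fh, indent=2, ensure_ascii=False)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        os.unlink(tmp)
        raise
```

Writing to a temporary file and calling `os.replace` means an interrupted save leaves the previous graph intact, which matters when the whole memory is one file.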

Works With Everything

| Backend | Install |
| --- | --- |
| Claude (Anthropic API) | `pip install -e ".[anthropic]"` |
| GPT (OpenAI API) | `pip install -e ".[openai]"` |
| Ollama (local) | `pip install -e ".[ollama]"` |
| No API (app session) | zero install, just bring a candidates JSON |

The ollama_proxy/ package also ships an MCP wrapper (server.py) so it plugs into Claude/Cowork-style tool use.

The Thing That Surprised Me

Building this forced me to be honest about what I actually know versus what I assume about my own projects.

The review step — accepting or rejecting each LLM-extracted claim — turns out to be useful on its own, independent of the graph. It's a forcing function to read your own notes carefully and decide: is this actually a decision, or is it still a question? Is this claim grounded in something real, or did I just assert it?

The graph made my thinking legible to me. Which made it legible to the AI. Which made the AI actually useful for the second, third, and fiftieth conversation.

Try It

git clone https://github.com/rahulmranga/knowledge-worker
cd knowledge-worker
MYGRAPH_PATH=examples/demo_graph.json python3 mygraph/mygraph.py query provenance

No API key. No install. Just Python 3.10+.

If you've ever re-explained the same context to an AI three times in a week — this is the thing I built so I'd stop doing that.

GitHub: rahulmranga/knowledge-worker

MIT licensed. Local-first. stdlib-only core.

Anthropic Named Their Models After Poetic Forms. I Think They Accidentally Mapped the Human Brain.

Rahul Rangarao — Sun, 17 May 2026 22:11:46 +0000

A 3am theory about cognitive architecture, Kahneman, and why I always reach for Sonnet.

Kahneman's Two Systems

Daniel Kahneman spent decades studying how humans make decisions. His conclusion, laid out in Thinking, Fast and Slow, is that we don't have one brain — we have two systems running in parallel:

System 1 is fast, automatic, instinctive. It pattern-matches. It reacts. It's the part of you that catches a ball before you consciously decide to reach for it. It doesn't deliberate — it fires.

System 2 is slow, deliberate, effortful. It's the part of you that works through a logic puzzle, reads a contract, or plans a difficult conversation. It's expensive — metabolically and cognitively. You can't run it all day.

Most of cognition, Kahneman argues, is System 1. System 2 is reserved for when it counts.

The Third Layer Kahneman Didn't Name

Neuroscience has a different framing, one that predates Kahneman's book, though it is read today more as a metaphor than as literal anatomy.

Paul MacLean's triune brain model describes three evolutionary layers:

The reptilian brain — brainstem and cerebellum. Pure survival. Reflexes, breathing, heart rate. No reasoning. Just reaction.

The mammalian brain — the limbic system. Emotion, memory, social bonding, creativity. The part of you that feels, connects, and imagines.

The human brain — the neocortex, especially the prefrontal cortex. Abstract reasoning, planning, language, deep thought. The thing that makes us distinctly us.

The interesting part: these layers don't replace each other. They stack. They run in parallel. The neocortex doesn't turn off the limbic system — it converses with it.

Three Models. Three Layers.

Here's the theory:

Haiku is the reptilian brain.

Fast. Cheap. Reflexive. It pattern-matches and responds. It's System 1 made silicon. You use Haiku for the things that need to happen instantly — classification, tagging, quick lookups, high-volume inference at the edge. It doesn't deliberate. It fires.

Sonnet is the mammalian brain.

And this is where it gets interesting.

Sonnet is not the compromise between fast and slow. It's something qualitatively different. The limbic system isn't a weaker neocortex — it's where creativity lives. Emotion. Connection. Flow.

Sonnet is where you go when you need to think with feeling. When the problem isn't purely logical and it isn't purely reflexive. When you're writing, exploring, building, ideating. When you're at 3am and something is trying to surface.

Opus is the neocortex.

Slow. Expensive. Deliberate. It reasons through problems that require holding multiple competing ideas in tension. You bring Opus when the stakes are high and the problem is genuinely hard — architecture decisions, legal analysis, long-horizon planning. It's System 2. You can't run it on everything; it costs too much.

Why This Matters for How You Work

If this model is right, the choice of which model to use isn't just a cost-performance tradeoff. It's a cognitive mode selection.

Routing a creative problem to Opus is like asking a team of senior engineers to brainstorm your product name. You'll get a technically rigorous answer when you needed a generative one. You'll get analysis when you needed flow.

Routing a complex architecture decision to Haiku is the inverse mistake — you're asking your reflexes to do your planning.

The instinct to "always use the smartest model" misses this. Bigger isn't always better. It's about matching the cognitive mode to the task.

| Model | Brain Layer | System | Use Case |
| --- | --- | --- | --- |
| Haiku | Brainstem / Cerebellum | System 1 | Extraction, tagging, classification, high-volume inference |
| Sonnet | Limbic System | Flow state | Writing, coding, building, exploring, 3am crystallization |
| Opus | Prefrontal Cortex | System 2 | Architecture, legal, long-horizon planning, high-stakes reasoning |
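If you wanted to turn that table into a default for a script, a toy router might look like the sketch below. Everything here is an assumption for illustration: the `Mode` names, the keyword hints, and the `pick_tier` function are made up, and the returned strings are tier labels, not real API model identifiers.

```python
import re
from enum import Enum


class Mode(Enum):
    REFLEX = "reflex"          # classify, tag, extract at scale
    FLOW = "flow"              # write, code, explore
    DELIBERATE = "deliberate"  # architecture, legal, long-horizon planning


# Tier labels only; real model identifiers are deliberately left out.
TIER_FOR_MODE = {
    Mode.REFLEX: "haiku",
    Mode.FLOW: "sonnet",
    Mode.DELIBERATE: "opus",
}

# Hypothetical keyword hints; exact-word matching is crude on purpose.
HINTS = {
    Mode.REFLEX: {"classify", "tag", "extract", "label"},
    Mode.DELIBERATE: {"architecture", "legal", "contract", "tradeoff"},
}


def pick_tier(task: str, default: Mode = Mode.FLOW) -> str:
    """Pick a tier from the words in the task, falling back to the flow tier."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    for mode, hints in HINTS.items():
        if words & hints:
            return TIER_FOR_MODE[mode]
    return TIER_FOR_MODE[default]


print(pick_tier("Classify these support tickets"))        # haiku
print(pick_tier("Review the architecture for storage"))   # opus
print(pick_tier("Draft a blog post intro"))               # sonnet
```

Keyword matching is far too blunt for real use, but it makes the idea explicit: the router asks what mode the task is in first, and only then which model to call.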

A Note on the Naming

I don't know if Anthropic intended this. The official explanation is capability tiers: Haiku is fastest and cheapest, Sonnet is balanced, Opus is most capable.

But the poetic forms map. The cognitive layers map. The systems map.

Maybe that's coincidence. Maybe it's the kind of thing that happens when a company thinks carefully about what they're building.

The Practical Takeaway

Next time you open a new conversation, don't default to Opus because it "feels safer." Ask what mode you're in:

Classifying, reacting, extracting at scale → Haiku
Creating, exploring, building, flowing → Sonnet
Reasoning through hard, high-stakes, complex problems → Opus

And if you find yourself reaching for Sonnet at 3am when something is trying to crystallize —

You already know why.

Written at 3:30am. First draft always beats the edited version.

— Rahul

I'm building knowledge-worker — a local-first knowledge graph that gives AI assistants durable, provenance-backed memory across sessions. If you've ever re-explained the same context to Claude three times in a week, it might be for you.