Notes pile up and go stale. This tool updates your knowledge base automatically — inspired by Karpathy’s LLM Wiki.
Key Takeaways
Inspired by Karpathy’s LLM Wiki, ex-brain is an open-source CLI that compiles new information into existing knowledge pages, extracts timelines, and builds entity links automatically — so your notes stay current instead of just piling up.
The search layer uses seekdb’s native hybrid search (BM25 + vector similarity in one query), with built-in AI functions for embedding and reranking — no external retrieval pipeline needed.
Ships with a built-in MCP server so Claude can read, write, search, and compile your knowledge base directly.
Andrej Karpathy’s LLM Wiki dropped a simple idea: store knowledge as plain text, let an LLM understand and update it. Garry Tan’s GBrain ran with the same concept. Both projects prove that LLM + local storage is a surprisingly powerful combination for personal knowledge management.
But after using them, I kept hitting the same wall: notes pile up, nothing gets updated, and finding connections between pieces of knowledge requires me to do all the work. So I built ex-brain — a CLI tool that compiles, links, and evolves a personal knowledge base using LLMs.
What ex-brain Does
At a high level, ex-brain provides four mechanisms that standard note-taking tools don’t:
Smart compilation — New information updates existing knowledge instead of just appending to it
Automatic timeline extraction — Events are pulled from text and organized chronologically
Entity linking — Relationships between people, companies, and concepts are detected and cross-referenced automatically
Hybrid search — Keyword precision and semantic understanding in one query, powered by seekdb
The result: a knowledge base that behaves less like a filing cabinet and more like a memory that keeps itself current.
The Problem with “Just Take Notes”
Tools like Notion and Obsidian are great at storing information. They’re terrible at keeping it current. You write a note about a company’s Series A in March, their new CEO in June, and their Series B in August — and six months later, you have to read all three notes and mentally reconstruct the current state.
AI-powered alternatives like Mem or Granola add summarization, but the intelligence is a black box. You can’t control how it categorizes, what it prioritizes, or when it decides something is outdated.
The human brain doesn’t work this way. When you learn that a company raised a Series B, you don’t file it next to the Series A note — you update your mental model. The Series A becomes history. The Series B becomes current state.
ex-brain applies the same principle to a knowledge base.
Mechanism 1: Compiled Truth
Run a single command to feed new information into an existing knowledge page:
ebrain compile companies/river-ai \
"River AI closed Series A, $50M" \
--source meeting_notes \
--date 2024-05-20
The LLM analyzes the information type — is this a status change (funding stage moved from Seed to Series A), a new fact (founded in 2020), or an event (product launched)? — then applies the matching update strategy. The compiled page always reflects the current truth:
## Status
- **Funding Stage**: Series A (Source: meeting_notes, 2024-05-20)
- **Raised**: $50M
## History
- Previously Seed (until 2024-05-20)
## Facts
- Series A led by Sequoia
- Founded 2020
No manual reorganization. No stale information buried in a page you’ll never re-read.
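The routing described above can be sketched in a few lines of TypeScript. In the real tool an LLM performs the classification; here the category is passed in, and `Page`, `applyUpdate`, and the section names are illustrative, not ex-brain's actual API:

```typescript
// Sketch of the compile-time update routing. A status change demotes
// the old value to history; facts and events are simply appended.
type UpdateKind = "status_change" | "new_fact" | "event";

interface Page {
  status: Record<string, string>; // current truth, keyed by field
  history: string[];              // superseded states
  facts: string[];                // timeless facts and events
}

function applyUpdate(
  page: Page,
  kind: UpdateKind,
  field: string,
  value: string,
  date: string,
): Page {
  if (kind === "status_change") {
    const previous = page.status[field];
    if (previous !== undefined) {
      // the old value becomes history, the new one becomes current truth
      page.history.push(`Previously ${previous} (until ${date})`);
    }
    page.status[field] = `${value} (${date})`;
  } else {
    page.facts.push(value);
  }
  return page;
}
```

The key property is that the "Status" section is overwritten, never appended to — which is exactly what keeps the page current.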
Mechanism 2: Timeline Extraction
Time is the axis that makes knowledge useful. ex-brain extracts events from compiled pages and structures them chronologically:
ebrain timeline extract companies/river-ai
[
{
"date": "2024-05-20",
"summary": "Series A closed, $50M",
"detail": "Led by Sequoia"
},
{
"date": "2024-06-15",
"summary": "Sarah Chen appointed CEO"
}
]
Date parsing handles ISO, natural language ("last week", "yesterday"), and localized formats. Timeline extraction runs automatically during compilation — every compile that contains an event adds it to the timeline.
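A minimal sketch of that normalization, covering just the ISO passthrough and two relative phrases (ex-brain's actual parser handles far more; `normalizeDate` is a hypothetical helper, not its real API):

```typescript
// Normalize a date string to ISO YYYY-MM-DD, resolving a couple of
// relative phrases against a reference "now". Unrecognized inputs
// return null — in practice those would fall through to the LLM.
function normalizeDate(input: string, now: Date = new Date()): string | null {
  if (/^\d{4}-\d{2}-\d{2}$/.test(input)) return input; // already ISO

  const dayMs = 24 * 60 * 60 * 1000;
  const relative: Record<string, number> = {
    "yesterday": 1,
    "last week": 7,
  };
  const offset = relative[input.trim().toLowerCase()];
  if (offset !== undefined) {
    return new Date(now.getTime() - offset * dayMs).toISOString().slice(0, 10);
  }
  return null;
}
```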
Mechanism 3: Entity Linking
A piece of knowledge is rarely about one thing. “Ali Partovi is the founder of Neo” connects a person, an organization, and a role. ex-brain uses LLMs to detect these relationships:
ebrain put people/ali-partovi --file notes.md
# Detected:
# - Ali Partovi founder_of Neo
# - Ali Partovi invested_in [other companies]
When a new entity is detected, the system creates a stub page for it automatically:
# people/sarah-chen
## Facts
- **CEO_of** [River AI](companies/river-ai): appointed June 2024
The knowledge graph grows organically as you add information. No manual tagging, no predefined ontologies.
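Stub creation can be sketched like this — a detected relation whose subject has no page yet gets a minimal page written for it. The `Relation` shape and in-memory map are assumptions for illustration; ex-brain persists pages to seekdb:

```typescript
// Create a stub page for a newly detected entity, if none exists.
interface Relation {
  subject: string;   // e.g. "people/sarah-chen"
  predicate: string; // e.g. "CEO_of"
  object: string;    // e.g. "companies/river-ai"
  note?: string;     // e.g. "appointed June 2024"
}

function ensureStub(pages: Map<string, string>, rel: Relation): void {
  if (pages.has(rel.subject)) return; // page already exists, nothing to do
  const factLine =
    `- **${rel.predicate}** [${rel.object}](${rel.object})` +
    (rel.note ? `: ${rel.note}` : "");
  pages.set(rel.subject, `# ${rel.subject}\n## Facts\n${factLine}\n`);
}
```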
Mechanism 4: Hybrid Search with seekdb
Single-mode search breaks down fast in a knowledge base. Full-text search is precise but misses semantics — search “funding” and you won’t find “financing round.” Vector search understands meaning but can be noisy — search “Sequoia” and you might get results about trees.
ex-brain uses seekdb as its search and storage layer. seekdb is an AI-native database that unifies vector search, full-text search, and scalar filtering in a single engine. One query combines BM25 keyword matching with vector similarity — no need to stitch two retrieval systems together.
# Keyword search
ebrain search "River AI Series A"
# Semantic query
ebrain query "Which companies raised funding recently?"
Under the hood, seekdb supports multi-stage retrieval: vector and full-text indexes recall candidates independently, then results are fused via weighted combination or Reciprocal Rank Fusion (RRF), with optional LLM-based reranking for precision.
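Reciprocal Rank Fusion itself is simple enough to show inline: each retrieval mode contributes 1/(k + rank) per document, and the per-mode contributions are summed. The k = 60 below is the constant from the original RRF paper; seekdb's internal parameters may differ:

```typescript
// Fuse multiple ranked lists of document IDs via Reciprocal Rank Fusion.
// A document ranked highly by several modes accumulates a large score.
function rrfFuse(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

Note that a document appearing in both lists ("b" below) beats one that tops only a single list, which is the behavior you want when fusing keyword and vector candidates.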
ex-brain adds a scoring layer on top:
Semantic relevance (85%) — vector similarity
Freshness (10%) — recently updated content ranks higher
Type weight (5%) — people pages get a slight boost
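The scoring layer above amounts to a weighted blend. The 85/10/5 weights are from the article; the linear freshness decay window and the type table below are illustrative guesses, not ex-brain's actual formula:

```typescript
// Blend semantic similarity, freshness, and page-type weight into a
// final ranking score. Weights: 85% semantic, 10% freshness, 5% type.
interface Hit {
  similarity: number;     // 0..1, from vector search
  updatedDaysAgo: number; // days since the page was last compiled
  pageType: string;       // e.g. "person", "company"
}

function finalScore(hit: Hit): number {
  // freshness decays linearly over ~90 days, floored at 0 (assumed window)
  const freshness = Math.max(0, 1 - hit.updatedDaysAgo / 90);
  const typeWeight = hit.pageType === "person" ? 1 : 0.5; // slight people boost
  return 0.85 * hit.similarity + 0.1 * freshness + 0.05 * typeWeight;
}
```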
Why seekdb
Several properties made seekdb the right fit for this project:
Embedded mode, zero ops. seekdb runs as a single database file — no server process, no Docker container. For a local-first personal tool, this is the lightest possible deployment. It runs comfortably on 1 CPU core and 2 GB of memory.
Native hybrid search. Vector search (HNSW, IVF, and quantized variants), full-text search (BM25 with phrase and boolean matching), and scalar filtering — all in one engine with multi-stage ranking pipelines.
Built-in AI functions. AI_EMBED generates vector embeddings in SQL. AI_COMPLETE runs text generation. AI_RERANK applies reranking models. These work with OpenAI, DashScope, or custom model endpoints. Embedding, retrieval, and inference happen inside the database — no external pipeline needed.
SQL-compatible. seekdb is built on the OceanBase engine and speaks MySQL-compatible SQL. Standard CREATE TABLE, CREATE INDEX, and query syntax. Full ACID transactions with real-time write visibility.
Multi-model data. Vectors, text, scalars, JSON, and GIS data coexist in the same engine. ex-brain stores structured metadata (page properties, entity links) and unstructured content (text, embeddings) in one database.
Here’s the core integration code:
// Connect — it's just a file path
const db = await BrainDb.connect("~/.ebrain/data/ebrain.db");
// Create a vector collection
const pages = await db.getOrCreateCollection({
name: "ebrain_pages",
embeddingFunction: createBrainEmbeddingFunction(settings.embed),
});
// Hybrid search
const hits = await pages.hybridSearch({
query: { whereDocument: { $contains: "funding" } },
nResults: 10,
});
MCP Integration
ex-brain ships with a built-in MCP server. If you use Claude, connect it in one step:
{
"mcpServers": {
"ebrain": {
"command": "ebrain",
"args": ["serve"]
}
}
}
Claude can then read pages (brain_get), write pages (brain_put), search (brain_search), compile new information (brain_compile), and create links (brain_link) — directly against your local knowledge base.
Get Started
# Install
bun install -g ex-brain
# Initialize
ebrain init
# Create your first page
ebrain put companies/river-ai --type company --content "
River AI is an AI analytics platform.
Founded 2020."
# Compile new information
ebrain compile companies/river-ai \
"River AI closed Series A, Sequoia led" \
--source news \
--date 2024-05-20
# Search
ebrain search "River AI funding"
# Start MCP server
ebrain serve
What’s Next
ex-brain is early-stage. The compilation logic isn’t perfect, timeline extraction occasionally misses events, and entity detection produces false positives. But the core idea works: knowledge should update itself when new information arrives, not just accumulate.
A few directions worth exploring: conflict detection when new information contradicts existing records, confidence decay for stale data, bidirectional propagation when linked entities change, and batch compilation for high-volume ingestion.
If you’re interested in building knowledge tools — or if you just want a second brain that actually keeps up — check out ex-brain.
About seekdb
ex-brain’s storage and retrieval layer is powered by seekdb — an open-source, AI-native database that unifies vector search, full-text search, structured data, and built-in AI functions in a single engine. Whether you’re building RAG pipelines, semantic search, or AI agent applications, seekdb handles storage and retrieval without the need to stitch together multiple systems.
If you’re building an application that needs storage + semantic search + AI inference, give seekdb a try:
Website: https://www.seekdb.ai/
Install:
pip install -U pyseekdb
