I built a 100% local Graph RAG engine for my Markdown notes

#privacy #obsidian #rag #python

Kwipu turns a folder of Markdown notes (or an Obsidian vault) into a queryable knowledge graph. Ask questions in plain language, get answers that connect facts across files, all running locally on Ollama with no cloud.

My notes had become a graveyard. Hundreds of Markdown files, years of meeting notes, half-finished ideas and [[wikilinks]] in an Obsidian vault, and the only way to find anything was full-text search, which only worked if I remembered the exact word I had once written.

What I actually wanted was to ask: “what did I decide about X, and who was involved?”, and get an answer that pulls threads from five different files at once.

So I built Kwipu: a fully local Graph RAG engine that turns a folder of Markdown into a knowledge graph you can talk to. No cloud, no API keys, no data leaving the machine. It runs on Ollama.

Repo: https://github.com/benmaster82/Kwipu

Why a graph, not just vector search

Plain vector RAG retrieves chunks that sound similar to your question. That’s great for “find me the paragraph about deadlines,” but it falls apart when the answer is spread across notes connected by relationships (person, project, decision, date).

Kwipu builds a property graph out of your notes first. It extracts entity-relation triples from two sources:

Structure you already wrote: [[wikilinks]] and YAML frontmatter get parsed straight into graph edges, with no LLM guesswork.
Implicit relations: an LLM pass extracts additional triples from the prose.

Those two layers get merged into one index, so retrieval can actually follow connections instead of just matching text.
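To make the structural layer concrete, here is a minimal sketch of turning one note into triples with only the standard library. It is not Kwipu's actual code: the function names, the `links_to` relation label, and the deliberately tiny frontmatter parser (flat `key: value` lines and inline `[a, b]` lists only) are assumptions for illustration.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only Target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_frontmatter(text):
    """Return (metadata, body). Handles flat `key: value` lines and inline lists only."""
    if not text.startswith("---\n"):
        return {}, text
    end = text.find("\n---", 3)
    if end == -1:
        return {}, text
    meta = {}
    for line in text[4:end].splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = [v.strip().strip("'\"") for v in value[1:-1].split(",") if v.strip()]
        else:
            items = [value.strip("'\"")] if value else []
        meta[key.strip()] = items
    return meta, text[end + 4:].lstrip("\n")


def structural_triples(note_name, text):
    """Build (subject, relation, object) triples without any LLM call."""
    meta, body = split_frontmatter(text)
    triples = [(note_name, key, value) for key, values in meta.items() for value in values]
    for match in WIKILINK_RE.finditer(body):
        # "links_to" is a hypothetical relation label chosen for this sketch.
        triples.append((note_name, "links_to", match.group(1).strip()))
    return triples


note = (
    "---\nproject: Apollo\npeople: [Ana, Luca]\n---\n"
    "Met with [[Ana]] about [[Roadmap 2024|the roadmap]] and [[Budget#Q3]].\n"
)
for triple in structural_triples("meeting", note):
    print(triple)
```

A real vault would want a proper YAML parser for nested frontmatter, but the idea is the same: everything here is deterministic, so these edges cost nothing to extract and cannot be hallucinated.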

How it works

Your Notes (.md)
      |
      v
  Pre-processing      (extracts [[wikilinks]], YAML frontmatter)
      |
      v
  LLM extraction      (pulls extra entity-relation triples)
      |
      v
  Property Graph      (merges structural + LLM triples, persisted to disk)
      |
      v
  Hybrid retrieval    (synonym + vector + BM25 + temporal)
      |
      v
  LLM response        (answer generated from retrieved context)

The graph is built once and saved to disk. After that, queries load it instantly, and adding a single new note is incremental (roughly 20 to 60 seconds), not a full rebuild.
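One common way to get this kind of incremental behavior is to keep a manifest of content hashes next to the persisted graph and re-extract only the notes whose hash changed. The sketch below illustrates that idea; it is an assumption about the approach, not a description of how Kwipu tracks changes, and `snapshot`, `diff_vault`, `save_manifest` and the manifest file are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def snapshot(notes_dir):
    """Map each note's relative path to a SHA-256 digest of its content."""
    root = Path(notes_dir)
    digests = {}
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root)
        if ".obsidian" in rel.parts:  # skip Obsidian's own config folder
            continue
        digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_vault(notes_dir, manifest_path):
    """Return (notes to (re)index, notes removed, fresh snapshot)."""
    manifest_file = Path(manifest_path)
    old = json.loads(manifest_file.read_text(encoding="utf-8")) if manifest_file.exists() else {}
    new = snapshot(notes_dir)
    to_index = [name for name, digest in new.items() if old.get(name) != digest]
    removed = sorted(name for name in old if name not in new)
    return to_index, removed, new


def save_manifest(manifest_path, digests):
    """Persist the snapshot so the next run can diff against it."""
    Path(manifest_path).write_text(json.dumps(digests, indent=2), encoding="utf-8")
```

On the first run every note lands in `to_index`; afterwards only new or edited files do, which is what keeps a single added note from triggering a full rebuild.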

Hybrid retrieval (4 strategies, one answer)

Instead of betting on a single retriever, Kwipu combines four and lets them complement each other:

LLM synonym expansion: broadens the query (optional, turn it off with --fast)
Vector similarity: semantic matches
BM25 keyword scoring: exact-term recall
Temporal and metadata matching: “what happened last March” actually works
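How the four result lists get merged is not spelled out here, so as an illustration only, this is a minimal sketch of reciprocal rank fusion (RRF), a standard way to combine rankings from retrievers whose scores are not comparable. Treat the choice of RRF, the function name and the constant `k=60` as assumptions, not as Kwipu's method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per item, so a chunk that several
    retrievers agree on rises to the top even if none ranked it first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "a", "d"]
temporal_hits = ["b", "c"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits, temporal_hits]))
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, and dropping a retriever (as `--fast` does with synonym expansion) just means passing one list fewer.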

There’s also a strict anti-hallucination prompt that forces the model to cite sources and refuse to invent facts, because a knowledge base that makes things up is worse than no knowledge base.
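For a sense of what such a prompt can look like, here is a minimal sketch that numbers the retrieved chunks and tells the model to cite them or decline. The wording and the `build_grounded_prompt` helper are hypothetical, not the prompt Kwipu ships.

```python
def build_grounded_prompt(question, chunks):
    """Build a citation-forcing prompt from (source_file, text) pairs."""
    context = "\n\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered context below.\n"
        "Cite the supporting source after each claim, e.g. [1].\n"
        "If the context does not contain the answer, reply exactly: "
        "\"I don't know based on these notes.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "Who owns the launch checklist?",
    [("meeting.md", "Ana owns the launch checklist."), ("roadmap.md", "Launch is in Q3.")],
)
print(prompt)
```

A prompt like this reduces invented answers but cannot rule them out, so showing the cited file names next to the answer lets you verify each claim against the note itself.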

And it’s multilingual out of the box: Italian, English, French, German, Spanish and Portuguese, with the language auto-detected.

Quick start

# 1. Install deps
pip install -r requirements.txt

# 2. Pull models in Ollama
ollama pull llama3.1:8b
ollama pull nomic-embed-text

Point it at your notes by editing KNOWLEDGE_DIR in geode_graph.py (an Obsidian vault path works directly: it reads files without modifying them and ignores .obsidian/):

KNOWLEDGE_DIR = "C:/Users/YourName/Documents/MyVault"
MODEL_NAME = "llama3.1:8b"

Then run:

python geode_graph.py          # full mode, best quality
python geode_graph.py --fast   # skips synonym retriever, ~50% faster on CPU

It watches the folder for changes and updates the graph automatically.

My favorite trick: build big, query small

Graph construction is the expensive part: it needs an LLM call per chunk. Queries are cheap.

So if your hardware is limited, you can build the graph once with a heavy cloud model via Ollama, then switch to a tiny local model for everyday questions. The graph structure doesn’t change when you swap models; only response generation uses the smaller one.

# Build once with a big model (high-quality extraction)
# In geode_graph.py, set: MODEL_NAME = "gpt-oss:20b-cloud"
python geode_graph.py

# Then query daily with a small, fast local model
# In geode_graph.py, set: MODEL_NAME = "qwen2.5:3b"
python geode_graph.py --fast

Best of both worlds: a graph built by a 20B+ model, queried on a 3B.

Being honest about the tradeoffs

Graph RAG isn’t free. First-time builds take real time:

Notes	GPU (7B)	CPU (3B)
20	~7 min	~10 min
100	~35 min	~50 min
500+	~3 hrs	~4 hrs

The recommended minimum is about 16 GB of system RAM for a 7B model. The sweet spot for serious use is 7B+ on a GPU. But once the graph exists, queries are fast and lightweight (roughly 200 to 500 MB of memory).

What’s next

The next thing on the roadmap is a Telegram bot so you can query your vault from your phone, anywhere.

It’s MIT-licensed and tagged help-wanted. If local-first AI, knowledge graphs, or Obsidian tooling is your thing, I’d love contributions, issues, or just a star.

https://github.com/benmaster82/Kwipu

What would you want to ask your own notes if you could?