DEV Community: WritHer

I built a 100% local Graph RAG engine for my Markdown notes

WritHer — Sun, 31 May 2026 21:33:08 +0000

Kwipu turns a folder of Markdown notes (or an Obsidian vault) into a queryable knowledge graph. Ask questions in plain language, get answers that connect facts across files, all running locally on Ollama with no cloud.

My notes had become a graveyard. Hundreds of Markdown files, years of meeting notes, half-finished ideas and [[wikilinks]] in an Obsidian vault, and the only way to find anything was full-text search that needed me to remember the exact word I once wrote.

What I actually wanted was to ask: “what did I decide about X, and who was involved?”, and get an answer that pulls threads from five different files at once.

So I built Kwipu: a fully local Graph RAG engine that turns a folder of Markdown into a knowledge graph you can talk to. No cloud, no API keys, no data leaving the machine. It runs on Ollama.

Repo: https://github.com/benmaster82/Kwipu

Why a graph, not just vector search

Plain vector RAG retrieves chunks that sound similar to your question. That’s great for “find me the paragraph about deadlines,” but it falls apart when the answer is spread across notes connected by relationships (person, project, decision, date).

Kwipu builds a property graph out of your notes first. It extracts entity-relation triples from two sources:

Structure you already wrote: [[wikilinks]] and YAML frontmatter get parsed straight into graph edges, with no LLM guesswork.
Implicit relations: an LLM pass extracts additional triples from the prose.

Those two layers get merged into one index, so retrieval can actually follow connections instead of just matching text.
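To make the structural layer concrete, here is a minimal sketch of how wikilinks and flat frontmatter could be turned into triples. It is not Kwipu's actual parser: the function names, the `links_to` relation label, and the restriction to simple `key: value` frontmatter lines are all assumptions made for illustration.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def split_frontmatter(text):
    """Return (metadata, body). Only flat `key: value` lines are handled."""
    meta = {}
    if not text.startswith("---\n"):
        return meta, text
    end = text.find("\n---", 3)
    if end == -1:
        return meta, text
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            meta[key.strip()] = value.strip()
    body = text[end + 4:].lstrip("\n")
    return meta, body


def structural_triples(note_name, text):
    """Build (subject, relation, object) triples without any LLM call."""
    meta, body = split_frontmatter(text)
    # Each frontmatter key becomes a relation from the note to its value.
    triples = [(note_name, key, value) for key, value in meta.items()]
    # Each wikilink becomes a hypothetical "links_to" edge.
    for match in WIKILINK.finditer(body):
        triples.append((note_name, "links_to", match.group(1).strip()))
    return triples
```

Calling `structural_triples("standup", text)` on a note yields one triple per frontmatter key plus one per wikilink. Nested YAML and lists would need a real YAML parser.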

How it works

Your Notes (.md)
      |
      v
  Pre-processing      (extracts [[wikilinks]], YAML frontmatter)
      |
      v
  LLM extraction      (pulls extra entity-relation triples)
      |
      v
  Property Graph      (merges structural + LLM triples, persisted to disk)
      |
      v
  Hybrid retrieval    (synonym + vector + BM25 + temporal)
      |
      v
  LLM response        (answer generated from retrieved context)

The graph is built once and saved to disk. After that, queries load it instantly, and adding a single new note is incremental (roughly 20 to 60 seconds), not a full rebuild.
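One way to get incremental updates is to keep a manifest of content hashes and re-extract only the notes whose hash changed. The sketch below shows that idea under stated assumptions: the manifest file, the function names, and the use of SHA-256 are illustrative and not taken from the Kwipu source.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Hash file contents so that edits are detected even if timestamps are unreliable."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_notes(notes_dir, manifest_path):
    """Return (changed, removed, new_manifest) compared with the saved manifest."""
    notes_dir = Path(notes_dir)
    manifest_path = Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    for path in sorted(notes_dir.rglob("*.md")):
        rel = path.relative_to(notes_dir)
        if ".obsidian" in rel.parts:  # skip Obsidian's own config folder
            continue
        new[rel.as_posix()] = file_digest(path)
    changed = [name for name, digest in new.items() if old.get(name) != digest]
    removed = [name for name in old if name not in new]
    return changed, removed, new


def save_manifest(manifest_path, manifest):
    """Persist the manifest once the graph has been updated."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
```

Only the notes listed in `changed` would need a new extraction pass, and triples from `removed` notes could be dropped from the graph.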

Hybrid retrieval (4 strategies, one answer)

Instead of betting on a single retriever, Kwipu combines four and lets them complement each other:

LLM synonym expansion: broadens the query (optional, turn it off with --fast)
Vector similarity: semantic matches
BM25 keyword scoring: exact-term recall
Temporal and metadata matching: “what happened last March” actually works
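The post doesn't say how the four result lists are combined. A common, simple option is reciprocal rank fusion (RRF), sketched below. This is a swapped-in technique for illustration and may not be what Kwipu does; the function name and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list contributes 1 / (k + rank) per document, so items that appear
    near the top of several lists rise to the top of the fused list.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
temporal_hits = ["b"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits, temporal_hits])
```

RRF only needs ranks, not comparable scores, which is convenient when mixing BM25 scores with cosine similarities.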

There’s also a strict anti-hallucination prompt that forces the model to cite sources and refuse to invent facts, because a knowledge base that makes things up is worse than no knowledge base.

And it’s multilingual out of the box (Italian, English, French, German, Spanish, Portuguese, auto-detected).

Quick start

# 1. Install deps
pip install -r requirements.txt

# 2. Pull models in Ollama
ollama pull llama3.1:8b
ollama pull nomic-embed-text

Point it at your notes by editing KNOWLEDGE_DIR in geode_graph.py (an Obsidian vault path works directly: Kwipu reads the files without modifying them and ignores .obsidian/):

KNOWLEDGE_DIR = "C:/Users/YourName/Documents/MyVault"
MODEL_NAME = "llama3.1:8b"

Then run:

python geode_graph.py          # full mode, best quality
python geode_graph.py --fast   # skips synonym retriever, ~50% faster on CPU

It watches the folder for changes and updates the graph automatically.

My favorite trick: build big, query small

Graph construction is the expensive part: it needs an LLM call per chunk. Queries are cheap.

So if your hardware is limited, you can build the graph once with a heavy cloud model via Ollama, then switch to a tiny local model for everyday questions. The graph structure doesn’t change when you swap models; only response generation uses the smaller one.

# Build once with a big model (high-quality extraction)
# MODEL_NAME = "gpt-oss:20b-cloud"
python geode_graph.py

# Then query daily with a small, fast local model
# MODEL_NAME = "qwen2.5:3b"
python geode_graph.py --fast

Best of both worlds: a graph built by a 20B+ model, queried on a 3B.

Being honest about the tradeoffs

Graph RAG isn’t free. First-time builds take real time:

Notes	GPU (7B)	CPU (3B)
20	~7 min	~10 min
100	~35 min	~50 min
500+	~3 hrs	~4 hrs
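The table works out to roughly 21 seconds per note on GPU (7 min / 20 notes, 35 min / 100 notes) and about 30 seconds per note on CPU (10 min / 20 notes, 50 min / 100 notes). Assuming build time scales linearly and your notes are of similar length, a rough estimate for any vault size looks like this. The per-note rates are derived from the table, not measured independently, and the names are made up for the example.

```python
# Seconds per note, derived from the table above (an approximation, not a benchmark).
SECONDS_PER_NOTE = {"gpu_7b": 21.0, "cpu_3b": 30.0}


def estimate_build_minutes(note_count, hardware="gpu_7b"):
    """Rough first-build estimate, assuming time grows linearly with note count."""
    return note_count * SECONDS_PER_NOTE[hardware] / 60.0


gpu_estimate = estimate_build_minutes(250, "gpu_7b")
cpu_estimate = estimate_build_minutes(250, "cpu_3b")
```

Long notes produce more chunks, and each chunk needs its own LLM call, so treat the result as an order-of-magnitude guide.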

Recommended minimum is about 16 GB system RAM for a 7B model. The sweet spot for serious use is 7B+ on a GPU. But once the graph exists, queries are fast and lightweight (200 to 500 MB).

What’s next

The next thing on the roadmap is a Telegram bot so you can query your vault from your phone, anywhere.

It’s MIT-licensed and tagged help-wanted. If local-first AI, knowledge graphs, or Obsidian tooling is your thing, I’d love contributions, issues, or just a star.

https://github.com/benmaster82/Kwipu

What would you want to ask your own notes if you could?

Stop paying for AI transcription! 🎙️ WritHer: 100% Local Voice Assistant for Windows. Privacy-first, Whisper + Ollama powered. Open Source on GitHub!

WritHer — Fri, 01 May 2026 09:09:47 +0000

Hey everyone! I wanted to share a small tool I’ve been building called WritHer, a free and open-source alternative to Wispr Flow.

The idea is simple: it lives in your system tray and gives you two things.

Hold AltGr anywhere (any app, any text field) and just speak. It transcribes your voice with Whisper and pastes the text right where your cursor is. No clicking, no switching apps.

Hold Ctrl+R and you get a voice assistant that understands natural language. You can say things like “remind me to call Marco in one hour” or “appointment with the dentist tomorrow at 3pm” and it handles the rest. Notes, to-do lists, shopping lists, reminders with toast notifications, all stored locally in SQLite.

The part I’m most proud of: everything runs 100% offline. Speech recognition via faster-whisper, intent parsing via Ollama, no cloud, no API keys, no telemetry. Once you download the models it works with no internet at all.

There’s also a little animated floating widget with eyes that react to what it’s doing (listening, thinking, error…) which is silly but I kind of love it.

It’s Python, MIT license, Windows 10/11 only for now.

GitHub: https://github.com/benmaster82/writher

https://getwrither.com

Would love feedback, especially from anyone who uses voice input regularly. Still early days but it works well for my daily workflow!

I Made Tkinter Look Like a Modern Glassmorphic App — Here's the Dark Magic I Used

WritHer — Thu, 26 Feb 2026 22:29:53 +0000

If you've ever built a desktop app in Python, you've probably used Tkinter. And if you have, you probably think it's doomed to look like a clunky, grey Windows 95 application.

I thought so too.

But recently, I needed a lightweight, floating UI for a 100% local, offline voice assistant I was building. I absolutely refused to bundle an entire Chromium instance (Electron) just to render a small widget.

So I decided to push Tkinter to its absolute limits. The result is Writher: an open-source, privacy-first voice assistant and dictation tool powered by faster-whisper and Ollama.

In this post, I'll break down the tricks I used to make a legacy Python GUI look modern, and the architecture behind a fully local AI desktop app.

🎨 The UI Hack: Making Tkinter Beautiful

To get that modern, glassmorphic floating pill shape with glowing borders, I completely bypassed Tkinter's standard widgets.

Instead, I used PIL (Pillow) to render high-resolution graphics dynamically on a transparent Tkinter Canvas.

The Trick: Borderless & Transparent Windows

import tkinter as tk

root = tk.Tk()

# Remove the window frame entirely
root.overrideredirect(True)

# Chromakey hack (Windows only): make a specific color fully transparent
root.wm_attributes("-transparentcolor", "#000001")

This gives you a frameless, floating window — the foundation for any modern-looking widget.

Dynamic Glow Rendering

Every frame of the animation (the bot's eyes changing expressions: listening, thinking, happy) is drawn on-the-fly using ImageDraw and ImageFilter.GaussianBlur to create a glowing effect that mimics SVG filters:

from PIL import Image, ImageDraw, ImageFilter

def draw_glow(size, color, blur_radius=10):
    img = Image.new("RGBA", size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(img)
    draw.rounded_rectangle(
        [blur_radius, blur_radius,
         size[0] - blur_radius, size[1] - blur_radius],
        radius=20, fill=color
    )
    return img.filter(ImageFilter.GaussianBlur(blur_radius))

The Ghost Window

Since Writher is a dictation tool, clicking it shouldn't steal focus from your active app (like VSCode or a text editor). I used Win32 ctypes to apply the WS_EX_NOACTIVATE style:

import ctypes

GWL_EXSTYLE = -20
WS_EX_NOACTIVATE = 0x08000000
WS_EX_TOOLWINDOW = 0x00000080

def make_ghost_window(hwnd):
    style = ctypes.windll.user32.GetWindowLongW(hwnd, GWL_EXSTYLE)
    ctypes.windll.user32.SetWindowLongW(
        hwnd, GWL_EXSTYLE,
        style | WS_EX_NOACTIVATE | WS_EX_TOOLWINDOW
    )

💡 Pro tip: WS_EX_TOOLWINDOW also hides the window from the Alt+Tab menu, making it behave like a true desktop widget.

🧠 The Brain: 100% Local AI

I'm tired of sending my voice and private notes to the cloud. Writher operates entirely on your local machine.

Speech-to-Text: I used faster-whisper (CTranslate2). It runs on CPU or GPU and transcribes voice in near real time.

The LLM — I hooked the app to Ollama. Press Ctrl+R, Writher listens, transcribes, and passes the text to a local model for processing.

Function Calling — This is where it gets interesting. Instead of just chatting, I configured the LLM with tool definitions. It converts your voice commands into structured function calls — saving notes, scheduling appointments, or setting reminders in a local SQLite database:

import sqlite3

# WAL mode lets readers keep reading while a write is in progress;
# each thread should still open its own connection
conn = sqlite3.connect("writher.db")
conn.execute("PRAGMA journal_mode=WAL")

The LLM doesn't just understand you — it acts on what you say.
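To illustrate the function-calling step, here is a small sketch of a tool definition and a dispatcher that routes a model's tool call to a SQLite write. It is a sketch, not Writher's code: the `add_reminder` tool, the table layout, and the handler names are hypothetical. The schema follows the JSON-schema style accepted by Ollama's chat API `tools` field, and the example call is hand-written rather than produced by a model.

```python
import json
import sqlite3

# Hypothetical tool definition that would be passed to the model.
ADD_REMINDER_TOOL = {
    "type": "function",
    "function": {
        "name": "add_reminder",
        "description": "Store a reminder with a due time",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "due": {"type": "string", "description": "ISO 8601 timestamp"},
            },
            "required": ["text", "due"],
        },
    },
}


def add_reminder(conn, text, due):
    """Persist one reminder row."""
    conn.execute("INSERT INTO reminders (text, due) VALUES (?, ?)", (text, due))
    conn.commit()
    return "reminder saved"


HANDLERS = {"add_reminder": add_reminder}


def dispatch(conn, tool_call):
    """Run the handler named in a tool call; reject tools we did not define."""
    function = tool_call["function"]
    handler = HANDLERS.get(function["name"])
    if handler is None:
        raise ValueError(f"unknown tool: {function['name']}")
    arguments = function["arguments"]
    if isinstance(arguments, str):  # some clients deliver arguments as a JSON string
        arguments = json.loads(arguments)
    return handler(conn, **arguments)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reminders (id INTEGER PRIMARY KEY, text TEXT, due TEXT)")
call = {"function": {"name": "add_reminder",
                     "arguments": {"text": "call Marco", "due": "2026-03-01T15:00:00"}}}
status = dispatch(conn, call)
```

Keeping an explicit handler table means the model can only trigger actions you have registered, which matters when its output drives writes to a database.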

⚙️ The Muscle: Seamless OS Integration

Most dictation tools copy text to your clipboard and force you to paste manually. Writher acts like a phantom keyboard.

I initially used pyperclip, but it suffers from race conditions when other apps lock the clipboard. To make it bulletproof, I wrote a custom clipboard injector using the Win32 API (OpenClipboard, SetClipboardData).

The flow:

Save your current clipboard contents
Inject the transcribed text and simulate Ctrl+V via pynput
Restore your original clipboard — all in milliseconds

And just in case Windows blocks the clipboard? I implemented a fail-safe that automatically appends your dictation to a recovery_notes.txt file. You never lose a single word.
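The save, inject, restore flow and the recovery fallback can be sketched independently of the Win32 calls. In the version below, the clipboard and paste operations are passed in as plain functions, so the Win32 wrappers and the pynput keystroke are left out. The function name, its parameters, and the choice of `OSError` as the failure signal are assumptions for illustration.

```python
from pathlib import Path


def paste_text(text, get_clipboard, set_clipboard, send_paste,
               recovery_file="recovery_notes.txt"):
    """Paste `text` via the clipboard, then restore the previous contents.

    Returns True if the paste was sent, False if the text went to the recovery file.
    """
    try:
        previous = get_clipboard()
    except OSError:
        previous = None  # nothing to restore if the clipboard could not be read
    try:
        set_clipboard(text)
        send_paste()
        pasted = True
    except OSError:
        # Fail-safe: never lose the dictation if the clipboard is locked.
        with Path(recovery_file).open("a", encoding="utf-8") as handle:
            handle.write(text + "\n")
        pasted = False
    finally:
        if previous is not None:
            try:
                set_clipboard(previous)
            except OSError:
                pass  # restoring is best effort
    return pasted
```

In a real app, a short pause between the simulated paste and the restore is likely needed so the target application has time to read the clipboard.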

🔑 What I Learned

Building Writher taught me a few things I wasn't expecting:

You don't need Electron for everything. A transparent Tkinter canvas + Pillow can produce surprisingly polished UIs. The binary is tiny compared to any web-based alternative.
Local AI is ready. With faster-whisper + Ollama, you can build genuinely useful AI tools that never phone home. Privacy doesn't have to mean sacrificing quality.
Win32 APIs are your secret weapon on Windows. Ghost windows, clipboard control, focus management — they unlock capabilities that pure Python can't reach alone.

🚀 Try It Yourself

The whole project is open-source. You can check out the UI implementation, the AI pipeline, and run it on your own machine:

👉 Writher on GitHub: https://github.com/benmaster82/writher

I'd love to hear your thoughts. Have you ever pushed a legacy GUI framework beyond what it was meant to do? What local LLM setup are you running? Drop a comment — I read all of them.

And if you find the project useful, a ⭐ on the repo helps more than you'd think. 🙏