Giuliano Falco
Why Your AI Coding Assistant Forgets Everything (And How I Fixed It)

I've been using Claude Code for months. It's incredible — until you start a new session.

"What architecture decisions did we make yesterday?"

"I don't have information about previous conversations."

Every. Single. Time.

I'd re-explain my project structure. Re-describe my conventions. Re-contextualize bugs I already fixed. I estimated I was spending 20-30% of my Claude Code time just re-establishing context that the AI had already processed.

So I built UltraBrain.

What is UltraBrain?

UltraBrain is an open-source plugin for Claude Code that gives your AI persistent memory across sessions. It runs silently in the background through 5 lifecycle hooks:

SessionStart → UserPromptSubmit → PostToolUse → Stop → SessionEnd
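Each of these hooks is a small script that Claude Code invokes at the corresponding lifecycle point, passing event details as JSON on stdin. As a rough sketch (the `HookEvent` fields follow Claude Code's hook payload; the record shape is illustrative, not UltraBrain's actual schema), a `PostToolUse` handler might reduce each event to a compact log record for later compression:

```typescript
// Sketch of a PostToolUse hook handler. Field names like `session_id`
// and `hook_event_name` follow Claude Code's hook JSON; the output
// record shape is hypothetical.
interface HookEvent {
  session_id: string;
  hook_event_name: string;
  tool_name?: string;
}

// Turn a raw hook event into a compact, timestamped log record.
function toLogRecord(event: HookEvent): string {
  return JSON.stringify({
    ts: new Date().toISOString(),
    session: event.session_id,
    event: event.hook_event_name,
    tool: event.tool_name ?? null,
  });
}
```

In the real plugin, records like this would accumulate in the local SQLite store; Claude Code delivers the event JSON to the hook process over stdin.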

Every session, it:

  1. Captures tool usage, file changes, and Claude's reasoning
  2. Compresses raw data into semantic observations (via AI — free with Groq)
  3. Embeds observations into LanceDB for vector similarity search
  4. Injects the most relevant context when your next session starts

Claude sees a concise summary of your project history at the start of every session. No manual intervention required.
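The injection step (4) can be sketched in a few lines. The `Observation` shape and the `buildInjectedContext` helper below are illustrative assumptions, not UltraBrain's actual schema:

```typescript
// Hypothetical observation record; the real schema may differ.
interface Observation {
  text: string;   // compressed semantic summary of raw tool calls
  tags: string[]; // auto-assigned labels: "bug", "todo", "learning", ...
  score: number;  // similarity to the current session's context
}

// Format the top-k observations into the short context block
// injected at SessionStart.
function buildInjectedContext(obs: Observation[], k = 5): string {
  const top = [...obs].sort((a, b) => b.score - a.score).slice(0, k);
  const lines = top.map(o => `- [${o.tags.join(", ")}] ${o.text}`);
  return ["# Project memory (auto-injected)", ...lines].join("\n");
}
```

Keeping this output to a handful of ranked bullet points is what makes the injected context cheap enough to prepend to every session.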

The numbers

| Metric | Result |
| --- | --- |
| Vector search latency | 1.3 ms |
| Context injection | <50 ms |
| Token savings | ~80% vs raw context |
| AI processing cost | $0 (Groq free tier) |
| Setup time | 2 commands |

It's more than memory

What started as a memory layer turned into a full development command center:

  • Project Management — Bugs, todos, ideas, and learnings automatically extracted from your sessions
  • Kanban Board — Drag-and-drop tasks auto-created from AI observations
  • CLAUDE.md Manager — Browse and edit all 7 tiers of CLAUDE.md files
  • Ralph Loop — Autonomous coding iterations launched from the dashboard
  • Auto-Tagging — AI classifies every observation (bug, todo, idea, learning, etc.)
  • Mission Control — Web terminal, automation engine, analytics, session recording, knowledge graph

How it works under the hood

The core insight: you don't need to store everything. You need to store the right things in the right format.

UltraBrain uses a compression pipeline:

Raw tool calls (thousands per session)
    ↓ AI compression (Groq, free)
Semantic observations (~5-15 per session)
    ↓ Embedding (all-MiniLM-L6-v2, local ONNX)
384-dimensional vectors in LanceDB
    ↓ Similarity search (<2ms)
Top-k relevant observations
    ↓ Progressive disclosure
Injected context (~300 tokens)

The AI never sees raw data. It sees compressed, relevant, ranked observations. This saves ~80% of tokens compared to naive context loading.
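The similarity-search step reduces to cosine similarity over the embedding vectors. The real plugin delegates this to LanceDB's native Rust engine; the plain-TypeScript stand-in below just shows the idea:

```typescript
// What the LanceDB step computes, reduced to plain TypeScript:
// cosine similarity over embedding vectors, returning the top-k rows.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored rows by similarity to the query vector, keep the best k.
function topK<T extends { vector: number[] }>(rows: T[], query: number[], k: number): T[] {
  return [...rows]
    .sort((a, b) => cosine(b.vector, query) - cosine(a.vector, query))
    .slice(0, k);
}
```

A linear scan like this is fine at blog-post scale; LanceDB's indexed search is what keeps the real lookup under 2 ms as the store grows.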

Stack

  • TypeScript — Hooks, worker, everything
  • LanceDB — Native Rust vector engine, runs in-process
  • Bun SQLite — Local database, zero external dependencies
  • ONNX Runtime — all-MiniLM-L6-v2 embeddings, in-process
  • React — Dashboard UI (built to a single HTML file)
  • No Python — Zero. None. Nada.

Try it

/plugin marketplace add EconLab-AI/Ultrabrain
/plugin install ultrabrain

That's it. Two commands. Your AI never forgets again.

GitHub: https://github.com/EconLab-AI/Ultrabrain
License: MIT — free to use, modify, and distribute.
Contributions welcome — check the good first issues.
