Giuliano Falco
Why Your AI Coding Assistant Forgets Everything (And How I Fixed It)

I've been using Claude Code for months. It's incredible — until you start a new session.

"What architecture decisions did we make yesterday?"

"I don't have information about previous conversations."

Every. Single. Time.

I'd re-explain my project structure. Re-describe my conventions. Re-contextualize bugs I already fixed. I estimated I was spending 20-30% of my Claude Code time just re-establishing context that the AI had already processed.

So I built UltraBrain.

What is UltraBrain?

UltraBrain is an open-source plugin for Claude Code that gives your AI persistent memory across sessions. It runs silently in the background through 5 lifecycle hooks:

SessionStart → UserPromptSubmit → PostToolUse → Stop → SessionEnd
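Each of these hooks is a small script that Claude Code invokes at the corresponding lifecycle point, passing event details as JSON on stdin. As a rough sketch (the `HookEvent` fields follow Claude Code's hook payload; the record shape is illustrative, not UltraBrain's actual schema), a `PostToolUse` handler might reduce each event to a compact log record for later compression:

```typescript
// Sketch of a PostToolUse hook handler. Field names like `session_id`
// and `hook_event_name` follow Claude Code's hook JSON; the output
// record shape is hypothetical.
interface HookEvent {
  session_id: string;
  hook_event_name: string;
  tool_name?: string;
}

// Turn a raw hook event into a compact, timestamped log record.
function toLogRecord(event: HookEvent): string {
  return JSON.stringify({
    ts: new Date().toISOString(),
    session: event.session_id,
    event: event.hook_event_name,
    tool: event.tool_name ?? null,
  });
}
```

In the real plugin, records like this would accumulate in the local SQLite store; Claude Code delivers the event JSON to the hook process over stdin.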

Every session, it:

  1. Captures tool usage, file changes, and Claude's reasoning
  2. Compresses raw data into semantic observations (via AI — free with Groq)
  3. Embeds observations into LanceDB for vector similarity search
  4. Injects the most relevant context when your next session starts

Claude sees a concise summary of your project history at the start of every session. No manual intervention required.
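The injection step (4) can be sketched in a few lines. The `Observation` shape and the `buildInjectedContext` helper below are illustrative assumptions, not UltraBrain's actual schema:

```typescript
// Hypothetical observation record; the real schema may differ.
interface Observation {
  text: string;   // compressed semantic summary of raw tool calls
  tags: string[]; // auto-assigned labels: "bug", "todo", "learning", ...
  score: number;  // similarity to the current session's context
}

// Format the top-k observations into the short context block
// injected at SessionStart.
function buildInjectedContext(obs: Observation[], k = 5): string {
  const top = [...obs].sort((a, b) => b.score - a.score).slice(0, k);
  const lines = top.map(o => `- [${o.tags.join(", ")}] ${o.text}`);
  return ["# Project memory (auto-injected)", ...lines].join("\n");
}
```

Keeping this output to a handful of ranked bullet points is what makes the injected context cheap enough to prepend to every session.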

The numbers

| Metric | Result |
| --- | --- |
| Vector search latency | 1.3 ms |
| Context injection | <50 ms |
| Token savings | ~80% vs raw context |
| AI processing cost | $0 (Groq free tier) |
| Setup time | 2 commands |

It's more than memory

What started as a memory layer turned into a full development command center:

  • Project Management — Bugs, todos, ideas, and learnings automatically extracted from your sessions
  • Kanban Board — Drag-and-drop tasks auto-created from AI observations
  • CLAUDE.md Manager — Browse and edit all 7 tiers of CLAUDE.md files
  • Ralph Loop — Autonomous coding iterations launched from the dashboard
  • Auto-Tagging — AI classifies every observation (bug, todo, idea, learning, etc.)
  • Mission Control — Web terminal, automation engine, analytics, session recording, knowledge graph

How it works under the hood

The core insight: you don't need to store everything. You need to store the right things in the right format.

UltraBrain uses a compression pipeline:

Raw tool calls (thousands per session)
    ↓ AI compression (Groq, free)
Semantic observations (~5-15 per session)
    ↓ Embedding (all-MiniLM-L6-v2, local ONNX)
384-dimensional vectors in LanceDB
    ↓ Similarity search (<2ms)
Top-k relevant observations
    ↓ Progressive disclosure
Injected context (~300 tokens)

The AI never sees raw data. It sees compressed, relevant, ranked observations. This saves ~80% of tokens compared to naive context loading.
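The similarity-search step reduces to cosine similarity over the embedding vectors. The real plugin delegates this to LanceDB's native Rust engine; the plain-TypeScript stand-in below just shows the idea:

```typescript
// What the LanceDB step computes, reduced to plain TypeScript:
// cosine similarity over embedding vectors, returning the top-k rows.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored rows by similarity to the query vector, keep the best k.
function topK<T extends { vector: number[] }>(rows: T[], query: number[], k: number): T[] {
  return [...rows]
    .sort((a, b) => cosine(b.vector, query) - cosine(a.vector, query))
    .slice(0, k);
}
```

A linear scan like this is fine at blog-post scale; LanceDB's indexed search is what keeps the real lookup under 2 ms as the store grows.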

Stack

  • TypeScript — Hooks, worker, everything
  • LanceDB — Native Rust vector engine, runs in-process
  • Bun SQLite — Local database, zero external dependencies
  • ONNX Runtime — all-MiniLM-L6-v2 embeddings, in-process
  • React — Dashboard UI (built to a single HTML file)
  • No Python — Zero. None. Nada.

Try it

/plugin marketplace add EconLab-AI/Ultrabrain
/plugin install ultrabrain

That's it. Two commands. Your AI never forgets again.

GitHub: https://github.com/EconLab-AI/Ultrabrain
License: MIT — free to use, modify, and distribute.
Contributions welcome — check the good first issues.
