
Simon HEILLES

Building CommonTrace: A Neuroscience-Inspired Knowledge Base for AI Coding Agents

When an AI coding agent fixes a tricky deployment issue at 2 AM, that knowledge disappears the moment the session ends. The next agent — on a different project, with a different user — hits the exact same problem and starts from scratch.

I spent the last month building CommonTrace to fix this. It's a shared knowledge base where AI agents contribute solutions and find them later. Think of it as collective memory through stigmergic coordination — no direct agent-to-agent communication, just a shared medium.

The Architecture

Four services, all on Railway for ~$30/month:

  • API (FastAPI + pgvector) — trace CRUD, semantic search, voting, amendments, reputation
  • MCP Server (FastMCP 3.0) — protocol adapter with circuit breaker and dual transport
  • Skill (Claude Code plugin) — 4-hook pipeline that detects knowledge worth saving
  • Frontend (Jinja2 static site) — 9 languages, dark/light theme

The Memory Model

This is where I went down a rabbit hole. Before writing any search logic, I studied neuroscience-inspired memory systems and open-source projects like DroidClaw that tackle AI memory persistence.

The result is a multi-factor search ranking:

score = similarity * trust * depth * decay * ctx_boost * convergence * temperature * validity * somatic
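
The combination is purely multiplicative, so any weak factor drags the whole score down. A minimal sketch in Python (the factor names come from the formula above; the real implementation and value ranges aren't shown in the post):

```python
from math import prod

def rank_score(similarity, trust, depth, decay, ctx_boost,
               convergence, temperature, validity, somatic):
    """Multiplicative ranking: a single near-zero factor sinks the trace."""
    return prod([similarity, trust, depth, decay, ctx_boost,
                 convergence, temperature, validity, somatic])

# With every other factor neutral (1.0), the score is just the similarity:
rank_score(0.8, 1, 1, 1, 1, 1, 1, 1, 1)  # 0.8
```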

Each factor is grounded in a specific principle:

Somatic Intensity (from Antonio Damasio's somatic marker hypothesis): Traces that are linked to more error resolutions or receive more votes get an "importance" boost — the system's version of a gut feeling. Not all knowledge is equal, and this factor captures that.

Ebbinghaus Decay: Knowledge freshness degrades over time, but each recall strengthens the effective half-life by 15%, capped at 3x the base. A trace that keeps getting used stays fresh.
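
A minimal sketch of that decay factor, assuming the 15% strengthening compounds per recall (the post doesn't specify compounding vs. additive) and a hypothetical base half-life:

```python
def effective_half_life(base_days: float, recall_count: int) -> float:
    # Each recall multiplies the half-life by 1.15, capped at 3x the base.
    return min(base_days * 1.15 ** recall_count, 3 * base_days)

def decay(age_days: float, base_days: float, recall_count: int = 0) -> float:
    # Exponential (Ebbinghaus-style) forgetting curve over the trace's age.
    return 0.5 ** (age_days / effective_half_life(base_days, recall_count))

decay(30, 30)                       # 0.5: one base half-life has elapsed
decay(90, 30, recall_count=100)     # 0.5: heavy recall capped the half-life at 90 days
```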

Spreading Activation: Retrieving one trace activates semantically related traces, surfacing knowledge you didn't explicitly search for but is contextually relevant.
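
A one-hop sketch of the idea over a hypothetical similarity graph; the damping factor and threshold here are illustrative values, not CommonTrace's:

```python
def spread_activation(activated_id, graph, initial=1.0, damping=0.5, threshold=0.1):
    """One-hop spread: each neighbor of the retrieved trace receives
    activation proportional to its semantic similarity, damped per hop;
    neighbors below the threshold are pruned."""
    activations = {}
    for trace_id, similarity in graph.get(activated_id, []):
        a = initial * damping * similarity
        if a >= threshold:
            activations[trace_id] = a
    return activations

graph = {"t1": [("t2", 0.9), ("t3", 0.15)]}
spread_activation("t1", graph)  # {"t2": 0.45}; t3 falls below the threshold
```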

Convergence Detection: When multiple agents independently contribute similar solutions, that overlap becomes a strong confidence signal. Independent discovery beats any single vote.
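
One way to turn independent similar contributions into a confidence multiplier; the per-agent bonus and cap below are hypothetical, not taken from the post:

```python
def convergence_factor(independent_similar: int,
                       per_agent_bonus: float = 0.25,
                       cap: float = 2.0) -> float:
    """Each agent beyond the first that independently contributed a
    similar solution raises confidence, up to a cap."""
    extra_agents = max(independent_similar - 1, 0)
    return min(1.0 + per_agent_bonus * extra_agents, cap)

convergence_factor(1)   # 1.0: a single contribution gets no boost
convergence_factor(3)   # 1.5: two independent confirmations
```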

Context Fingerprinting: A Python trace is more useful to a Python project. Traces matching your language, framework, and OS get a relevance boost.
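
A sketch of that boost with made-up per-field weights (the real fields and values aren't in the post):

```python
# Hypothetical weights: language matters most, then framework, then OS.
CTX_WEIGHTS = {"language": 0.3, "framework": 0.2, "os": 0.1}

def context_boost(trace_ctx: dict, project_ctx: dict) -> float:
    """Multiplier >= 1.0: each context field the trace shares with the
    current project adds that field's weight."""
    boost = 1.0
    for field, weight in CTX_WEIGHTS.items():
        if trace_ctx.get(field) and trace_ctx[field] == project_ctx.get(field):
            boost += weight
    return boost

context_boost({"language": "python", "framework": "fastapi", "os": "linux"},
              {"language": "python", "framework": "django", "os": "linux"})
# ~1.4: language and OS match, framework doesn't
```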

The Skill Hooks

The Claude Code skill is the practical interface. It uses 4 hooks:

  1. session_start — searches for relevant traces based on the project context
  2. user_prompt_submit — reminds the agent to search before solving
  3. post_tool_use — watches for structural knowledge (error-fix patterns, config discoveries, architectural decisions)
  4. stop — offers to contribute knowledge discovered during the session

The skill has 16 structural knowledge detection patterns. When the agent resolves an error, discovers a useful configuration, or makes an architectural decision, the hook picks it up and prompts a contribution — always with user confirmation.
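
A toy version of the detection idea, with a few illustrative regexes standing in for the 16 real patterns (the actual pattern set isn't published in the post):

```python
import re

# Hypothetical subset: each entry maps a knowledge category to a regex
# that the post_tool_use hook could run against tool output.
PATTERNS = {
    "error_fix": re.compile(r"fixed|resolved|no longer fails", re.I),
    "config_discovery": re.compile(r"env var|config(uration)? (key|option)", re.I),
    "architecture": re.compile(r"decided to|chose .+ over", re.I),
}

def detect_structural_knowledge(tool_output: str) -> list[str]:
    """Return the knowledge categories whose pattern matches the output."""
    return [kind for kind, rx in PATTERNS.items() if rx.search(tool_output)]

detect_structural_knowledge("Resolved: the build no longer fails after pinning pydantic")
# ["error_fix"] — this would trigger a contribution prompt
```

Keeping detection as cheap pattern matching fits the "hooks build context, the agent decides" split described below: the hook only flags candidates, and the agent (with user confirmation) judges whether they're worth contributing.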

The Biggest Lesson

I initially built the local store (SQLite in the skill) as a parallel encyclopedia — its own temperature, decay, BM25 search, spreading activation. Three knowledge tables maintaining their own scoring independently from the API.

That was wrong.

The local store should be working memory — a context layer that helps the agent work better and make better contributions to the shared base. The API is the encyclopedia. Maintaining two knowledge stores with independent scoring is over-engineering that adds complexity without value.

The other key insight: the agent IS the LLM. The skill runs inside Claude. You don't need external LLM API calls for analysis or summarization — the agent already running can assess relevance and compose contributions. The hooks' job is building context, not making decisions. The only external model cost is ~$0.02 per million tokens, for the OpenAI embeddings behind semantic search.

Numbers

  • 46 commits over ~30 days
  • 251 files changed, 26,811 insertions
  • 37 coding sessions, 288 user messages
  • 200+ seed traces for cold start
  • Deployed on Railway: ~$30/month for the full stack
  • Open source under Apache 2.0

The repo is on GitHub under the commontrace org. Questions and contributions welcome.
