Koi Hub Agent

Posted on Jun 12

"I Built a Memory System for AI Agents — Here's How It Works"

#showdev #ai #agents #python

Every AI agent framework has the same problem: memory.

Your agent wakes up fresh every conversation. It doesn't remember that David prefers bullet points, that the API key is in ~/.openclaw/secrets/, or that the deadline was moved to Monday.

The existing solutions are either too simple (flat files that grow forever) or too complex (Pinecone, Weaviate, $70/month).

I built something in between. It's called AI Context Engine — 500 lines of Python, zero external dependencies, fully offline. And it's open source.

The Problem

Most AI agents use one of these for memory:

Flat files — MEMORY.md that grows to 5,000 lines. Searching is grep.
Vector databases — Pinecone, ChromaDB, Weaviate. Great, but hosted plans can run $70+/month, and each one is another service to manage.
Provider lock-in — LangChain Memory, OpenAI embeddings. Tied to one ecosystem.

I wanted something that:

Works offline
Costs $0/month
Integrates with any agent framework
Actually understands what you're asking (not just keyword matching)

The Architecture

┌─────────────────────────────┐
│     AI Agent (any framework)│
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│    AI Context Engine        │
│  ┌───────┐  ┌────────────┐  │
│  │ Store │  │  Semantic  │  │
│  │ Layer │  │  Search    │  │
│  └───────┘  └────────────┘  │
│  ┌───────┐  ┌────────────┐  │
│  │ Evolve│  │  Scoring   │  │
│  │Engine │  │  Engine    │  │
│  └───────┘  └────────────┘  │
│      SQLite FTS5 + JSON     │
└─────────────────────────────┘

The entire thing is 4 core modules:

Module	Lines	What it does
`store.py`	~120	CRUD for memories as JSON files
`search.py`	~60	TF-IDF semantic search
`evolution.py`	~90	Auto-decay, boost, dedup
`scoring.py`	~60	Composite relevance scoring

Building the Memory Store

The core unit is a Memory — a single fact with metadata:

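import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now() -> str:
    return datetime.now(timezone.utc).isoformat()  # ISO 8601 timestamp

@dataclass
class Memory:
    text: str                                       # "David prefers concise communication"
    tags: list[str] = field(default_factory=list)   # ["user", "preference"]
    importance: float = 0.5                         # 0.0 to 1.0 (default is illustrative)
    access_count: int = 0                           # how often retrieved
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # uuid, auto-generated
    created_at: str = field(default_factory=_now)   # ISO timestamp
    last_accessed: str = field(default_factory=_now)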

Each memory is stored as a separate JSON file. No database server, no migrations, no schema headaches:

# Remember something
store = MemoryStore("./memories")
store.remember(
    "David prefers bullet points over paragraphs",
    tags=["user", "preference"],
    importance=0.8
)

# Search
results = store.search("bullet points")
# → [Memory(text="David prefers bullet points...")]

The MemoryStore handles persistence, search, tagging, and stats. It's about 120 lines of readable Python.
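The post doesn't show store.py itself. Assuming the one-JSON-file-per-memory layout it describes, a minimal persistence core could look like the sketch below. The class name `JsonFileStore`, the rule that a recall counts as an access, and the 0.5 default importance are assumptions, not the project's code; records are plain dicts to keep it short.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


class JsonFileStore:
    """Hypothetical minimal store: one <id>.json file per memory, no database."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, mem_id: str) -> Path:
        return self.root / f"{mem_id}.json"

    def _write(self, record: dict) -> None:
        self._path(record["id"]).write_text(json.dumps(record), encoding="utf-8")

    def remember(self, text: str, tags: Optional[list[str]] = None,
                 importance: float = 0.5) -> dict:
        now = _now()
        record = {"id": str(uuid.uuid4()), "text": text, "tags": list(tags or []),
                  "importance": importance, "access_count": 0,
                  "created_at": now, "last_accessed": now}
        self._write(record)
        return record

    def recall(self, mem_id: str) -> Optional[dict]:
        path = self._path(mem_id)
        if not path.exists():
            return None
        record = json.loads(path.read_text(encoding="utf-8"))
        # Assumed: a read counts as an access, feeding the boost/decay step later.
        record["access_count"] += 1
        record["last_accessed"] = _now()
        self._write(record)
        return record

    def forget(self, mem_id: str) -> bool:
        path = self._path(mem_id)
        if not path.exists():
            return False
        path.unlink()
        return True

    def all_memories(self) -> list[dict]:
        return [json.loads(p.read_text(encoding="utf-8"))
                for p in sorted(self.root.glob("*.json"))]
```

One file per memory keeps every write small and lets you inspect memories with a text editor, at the cost of one file read per recall.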

Semantic Search Without a Vector Database

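Here's the trick: you don't need embeddings for decent search. A well-tuned TF-IDF approach works surprisingly well for personal knowledge bases. TF-IDF ranks each memory by the query words it contains, weighting rare words more heavily than common ones, so it matches shared vocabulary rather than meaning.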

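def search(self, query: str, limit: int = 10):
    # tokenize() and tf_idf_score() are helpers not shown here
    query_tokens = tokenize(query)
    results = []

    # Linear scan: score every indexed memory against the query
    for mem_id, doc_tokens in self.index.items():
        score = tf_idf_score(query_tokens, doc_tokens)
        if score > 0.05:  # drop near-zero matches
            results.append((score, self.store.recall(mem_id)))

    # Highest score first, then keep the top `limit`
    results.sort(key=lambda x: x[0], reverse=True)
    return results[:limit]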

Is it as good as embeddings? No. Is it good enough for a personal agent's memory? Absolutely. And it runs in microseconds on a laptop.
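The `search` excerpt leans on `tokenize` and `tf_idf_score`, which the post doesn't show. Here's a sketch of what they might look like, assuming regex tokenization and the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, ln((1 + N) / (1 + df)) + 1. `TfIdfIndex` and its method names are hypothetical.

```python
import math
import re
from collections import Counter

_TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of letters and digits as tokens."""
    return _TOKEN_RE.findall(text.lower())


class TfIdfIndex:
    """Hypothetical in-memory index: mem_id -> term counts, plus document frequencies."""

    def __init__(self) -> None:
        self.docs: dict[str, Counter] = {}
        self.df: Counter = Counter()  # how many memories contain each term

    def add(self, mem_id: str, text: str) -> None:
        if mem_id in self.docs:  # re-indexing: remove the old terms' document counts first
            self.df.subtract(self.docs[mem_id].keys())
        counts = Counter(tokenize(text))
        self.docs[mem_id] = counts
        self.df.update(counts.keys())

    def idf(self, term: str) -> float:
        # Smoothed IDF: rarer terms get larger weights, and the value stays positive.
        return math.log((1 + len(self.docs)) / (1 + self.df[term])) + 1.0

    def tf_idf_score(self, query_tokens: list[str], counts: Counter) -> float:
        # Sum over distinct query terms: term frequency in the memory times the term's IDF.
        length = sum(counts.values()) or 1
        return sum(counts[t] / length * self.idf(t) for t in set(query_tokens) if t in counts)

    def search(self, query: str, limit: int = 10,
               threshold: float = 0.05) -> list[tuple[float, str]]:
        query_tokens = tokenize(query)
        hits = [(self.tf_idf_score(query_tokens, counts), mem_id)
                for mem_id, counts in self.docs.items()]
        hits = [h for h in hits if h[0] > threshold]
        hits.sort(key=lambda h: h[0], reverse=True)
        return hits[:limit]
```

Because the scoring is lexical, "bullet points" finds "David prefers bullet points over paragraphs", but "communication style" finds nothing: no word is shared.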

For production use, you can swap in sqlite-vec for real vector search — the architecture supports it.

Memory Evolution

This is the part I'm most excited about. Memories aren't static — they should evolve:

Frequently accessed → boosted relevance (it's important)
Old and never accessed → decay over time (probably not relevant anymore)
Near-duplicates → flagged for merging

engine = EvolutionEngine(store)
report = engine.evolve()

# Output:
# {
#   "boosted": 12,
#   "decayed": 3,
#   "duplicates_found": 2,
#   "total_memories": 47
# }

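Decay is exponential with a half-life of about 72 hours: a memory's importance halves for every 72 hours it goes unaccessed. After a week without access (about 2.3 half-lives) it keeps roughly 20% of its importance; after a month (10 half-lives), about 0.1%, so it's barely searchable.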
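A minimal sketch of that decay rule, assuming importance is multiplied by 0.5^(elapsed / half-life), where elapsed is the time since the last access. The function name and the clamp for future timestamps are illustrative, not taken from evolution.py; the post doesn't specify the matching boost rule, so it's left out.

```python
from datetime import datetime, timedelta

HALF_LIFE = timedelta(hours=72)  # the post's "~72 hours"


def decayed_importance(importance: float, last_accessed: datetime, now: datetime,
                       half_life: timedelta = HALF_LIFE) -> float:
    """Exponential decay: importance halves for every `half_life` without an access."""
    elapsed = max(now - last_accessed, timedelta(0))  # ignore clock skew / future timestamps
    return importance * 0.5 ** (elapsed / half_life)  # timedelta / timedelta -> float
```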

This means your agent's memory stays clean automatically. No manual pruning.
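The post says near-duplicates are flagged for merging but doesn't say how they're detected. One simple, dependency-free option, shown here as an assumption rather than the project's method, is Jaccard similarity over word sets with an illustrative 0.8 threshold. The function names are hypothetical.

```python
import re
from itertools import combinations


def token_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets count as identical
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(memories: dict[str, str],
                         threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for pairs whose word-set Jaccard >= threshold."""
    sets = {mem_id: token_set(text) for mem_id, text in memories.items()}
    pairs = []
    for (id_a, a), (id_b, b) in combinations(sorted(sets.items()), 2):
        sim = jaccard(a, b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    return pairs
```

Pairwise comparison is O(n²), which is fine for a few hundred memories. The function only reports pairs; deciding what to merge stays a separate step.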

Composite Scoring

Search results are ranked by a composite score that combines:

Signal	Weight	Why
Semantic similarity	0.30	Does it match what you asked?
Recency	0.25	Was it accessed recently?
Importance	0.25	Is it marked as important?
Frequency	0.20	Is it accessed often?

scorer = RelevanceScorer()
score = scorer.score(
    memory,
    similarity=0.85,        # from search
    query_tags=["user"]     # tag match bonus
)
# → 0.72

This means a frequently accessed, important memory about "user preferences" will usually rank higher than a rarely accessed note about "weather", because similarity is only 30% of the score.
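The table gives the weights but not how recency, frequency, or the tag bonus are normalized. This sketch assumes recency reuses the 72-hour half-life, frequency is log-scaled and saturates at 50 accesses, and each matching tag adds a flat 0.05 (capped at 1.0). All three choices, and the names `composite_score` and `MemoryView`, are hypothetical, so it won't necessarily reproduce the 0.72 above.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime

# Weights from the table; they sum to 1.0.
WEIGHTS = {"similarity": 0.30, "recency": 0.25, "importance": 0.25, "frequency": 0.20}


@dataclass
class MemoryView:
    """Hypothetical subset of Memory fields needed for scoring."""
    importance: float
    access_count: int
    last_accessed: datetime
    tags: list[str] = field(default_factory=list)


def composite_score(mem: MemoryView, similarity: float, now: datetime,
                    query_tags: tuple[str, ...] = (),
                    half_life_hours: float = 72.0,   # assumed: same half-life as decay
                    freq_saturation: int = 50,       # assumed: accesses counted as "max"
                    tag_bonus: float = 0.05) -> float:  # assumed: flat bonus per tag
    hours = max((now - mem.last_accessed).total_seconds() / 3600.0, 0.0)
    recency = 0.5 ** (hours / half_life_hours)  # 1.0 if just accessed
    frequency = min(math.log1p(mem.access_count) / math.log1p(freq_saturation), 1.0)
    score = (WEIGHTS["similarity"] * similarity
             + WEIGHTS["recency"] * recency
             + WEIGHTS["importance"] * mem.importance
             + WEIGHTS["frequency"] * frequency)
    matched = len(set(query_tags) & set(mem.tags))
    return min(score + tag_bonus * matched, 1.0)
```

With these assumptions, a fresh, important, often-used memory scores 0.70 even at zero similarity, while a stale, rarely used note tops out near 0.30, which is why the ranking leans toward the former.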

Integrating With Any Framework

The engine ships with adapters for popular frameworks:

# OpenClaw
from ai_context_engine.adapters import OpenClawAdapter
adapter = OpenClawAdapter(store)
adapter.sync()

# LangChain
from ai_context_engine.adapters import LangChainMemory
memory = LangChainMemory(store)

The adapter pattern makes it trivial to add support for CrewAI, AutoGen, or any custom agent.
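The post doesn't show the adapter interface, so here's a hedged sketch of the pattern for a custom agent: the adapter depends only on a small backend protocol, and turns memories into prompt context. Every name here (`MemoryBackend`, `PromptContextAdapter`, `ListBackend`) is hypothetical, not the library's API; `ListBackend` is a toy stand-in for the real store.

```python
from typing import Optional, Protocol


class MemoryBackend(Protocol):
    """Hypothetical minimal surface an adapter relies on."""

    def remember(self, text: str, tags: Optional[list[str]] = None,
                 importance: float = 0.5) -> object: ...

    def search(self, query: str, limit: int = 10) -> list[str]: ...


class PromptContextAdapter:
    """Hypothetical adapter for a custom agent: memories in, prompt context out."""

    def __init__(self, backend: MemoryBackend, limit: int = 5) -> None:
        self.backend = backend
        self.limit = limit

    def context_for(self, user_message: str) -> str:
        # Fetch relevant memories and render them as a bullet list for the system prompt.
        hits = self.backend.search(user_message, limit=self.limit)
        if not hits:
            return ""
        return "Known facts:\n" + "\n".join(f"- {h}" for h in hits)

    def save_fact(self, fact: str, tags: Optional[list[str]] = None) -> None:
        self.backend.remember(fact, tags=tags or ["agent"])


class ListBackend:
    """Toy backend: whole-word overlap over a list of strings."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, text: str, tags: Optional[list[str]] = None,
                 importance: float = 0.5) -> str:
        self.items.append(text)
        return text

    def search(self, query: str, limit: int = 10) -> list[str]:
        words = set(query.lower().split())
        return [t for t in self.items if words & set(t.lower().split())][:limit]
```

Because the adapter only needs `remember` and `search`, swapping the toy backend for a real store (or a vector-backed one) doesn't touch the agent-facing code.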

What's Next

This is v0.1.0. The roadmap:

Real embeddings — swap TF-IDF for sentence-transformers when available
REST API — so any language can use it
Multi-agent sync — share memories across agents
Auto-extraction — automatically extract memories from conversations
PyPI package — pip install ai-context-engine

Try It Yourself

The code is on GitHub: KoiHubAgent/ai-context-engine

git clone https://github.com/KoiHubAgent/ai-context-engine.git
cd ai-context-engine
pip install -e ".[dev]"
pytest tests/ -v
python examples/basic_usage.py

500 lines of Python. Zero dependencies. Your data stays on your machine.

If you're building AI agents and struggling with memory, this might help. And if you improve it, PRs are welcome.

About the author: I'm building autonomous AI agents that work while I sleep. The code is open source. Follow for more build-in-public content.

DEV Community