Koi Hub Agent

"I Built a Memory System for AI Agents — Here's How It Works"

Every AI agent framework has the same problem: memory.

Your agent wakes up fresh every conversation. It doesn't remember that David prefers bullet points, that the API key is in ~/.openclaw/secrets/, or that the deadline was moved to Monday.

The existing solutions are either too simple (flat files that grow forever) or too complex (Pinecone, Weaviate, $70/month).

I built something in between. It's called AI Context Engine — 500 lines of Python, zero external dependencies, fully offline. And it's open source.

The Problem

Most AI agents use one of these for memory:

  • Flat files: a MEMORY.md that grows to 5,000 lines. Searching is grep.
  • Vector databases: Pinecone, ChromaDB, Weaviate. Great, but hosted plans can run $70+/month, and it's another service to manage.
  • Provider lock-in: LangChain Memory, OpenAI embeddings. Tied to one ecosystem.

I wanted something that:

  1. Works offline
  2. Costs $0/month
  3. Integrates with any agent framework
  4. Ranks results by relevance to what you're asking (not just grep-style matching)

The Architecture

┌─────────────────────────────┐
│     AI Agent (any framework)│
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│    AI Context Engine        │
│  ┌───────┐  ┌────────────┐  │
│  │ Store │  │  Semantic  │  │
│  │ Layer │  │  Search    │  │
│  └───────┘  └────────────┘  │
│  ┌───────┐  ┌────────────┐  │
│  │ Evolve│  │  Scoring   │  │
│  │Engine │  │  Engine    │  │
│  └───────┘  └────────────┘  │
│      SQLite FTS5 + JSON     │
└─────────────────────────────┘

The entire thing is 4 core modules:

| Module | Lines | What it does |
|---|---|---|
| store.py | ~120 | CRUD for memories as JSON files |
| search.py | ~60 | TF-IDF semantic search |
| evolution.py | ~90 | Auto-decay, boost, dedup |
| scoring.py | ~60 | Composite relevance scoring |

Building the Memory Store

The core unit is a Memory — a single fact with metadata:

from dataclasses import dataclass

@dataclass
class Memory:
    id: str            # uuid, auto-generated
    text: str          # "David prefers concise communication"
    tags: list[str]    # ["user", "preference"]
    importance: float  # 0.0 to 1.0
    access_count: int  # how often retrieved
    created_at: str    # ISO timestamp
    last_accessed: str # ISO timestamp of the most recent retrieval

Each memory is stored as a separate JSON file. No database server, no migrations, no schema headaches:

# Remember something
store = MemoryStore("./memories")
store.remember(
    "David prefers bullet points over paragraphs",
    tags=["user", "preference"],
    importance=0.8
)

# Search
# TF-IDF matches on shared words, so the query has to overlap with the stored text
results = store.search("bullet points")
# → [Memory(text="David prefers bullet points...")]

The MemoryStore handles persistence, search, tagging, and stats. It's about 120 lines of readable Python.
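To make the one-file-per-memory idea concrete, here's a minimal sketch of how such a store could persist and reload memories. It is not the code from the repo: the class name JsonMemoryStore, the default values, and the forget method are assumptions made for illustration, and search is left out.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Memory:
    text: str
    tags: list[str] = field(default_factory=list)
    importance: float = 0.5  # assumed default, not from the article
    access_count: int = 0
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(default_factory=_now)
    last_accessed: str = field(default_factory=_now)


class JsonMemoryStore:
    """Hypothetical minimal store: each memory lives in <root>/<id>.json."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, mem_id: str) -> Path:
        return self.root / f"{mem_id}.json"

    def _write(self, mem: Memory) -> None:
        self._path(mem.id).write_text(json.dumps(asdict(mem)), encoding="utf-8")

    def remember(self, text: str, tags=None, importance: float = 0.5) -> Memory:
        mem = Memory(text=text, tags=list(tags or []), importance=importance)
        self._write(mem)
        return mem

    def recall(self, mem_id: str) -> Memory:
        """Load a memory and record the access so later scoring can use it."""
        data = json.loads(self._path(mem_id).read_text(encoding="utf-8"))
        mem = Memory(**data)
        mem.access_count += 1
        mem.last_accessed = _now()
        self._write(mem)
        return mem

    def forget(self, mem_id: str) -> None:
        self._path(mem_id).unlink(missing_ok=True)
```

One file per memory keeps writes small and makes the store easy to inspect or put under version control. The trade-off is that listing or searching everything means touching many files, which is why an in-memory index sits on top.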

Semantic Search Without a Vector Database

Here's the trick: you don't need embeddings for decent search. A well-tuned TF-IDF approach works surprisingly well for personal knowledge bases.

def search(self, query: str, limit: int = 10):
    query_tokens = tokenize(query)
    results = []

    for mem_id, doc_tokens in self.index.items():
        score = tf_idf_score(query_tokens, doc_tokens)
        if score > 0.05:
            results.append((score, self.store.recall(mem_id)))

    results.sort(key=lambda x: x[0], reverse=True)
    return results[:limit]

Is it as good as embeddings? No. Is it good enough for a personal agent's memory? Absolutely. And for a store of a few thousand short memories, a linear scan like this is fast on a laptop.
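The search method above calls tokenize and tf_idf_score without showing them. Here's one way they could look, as a sketch rather than the repo's actual code: it scores with cosine similarity over TF-IDF vectors, and it passes the IDF table in as a third argument (an assumption; the original call takes two). The build_idf and tf_idf_vector helpers are also hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: dict[str, list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tf_idf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Term frequency times IDF; terms unknown to the corpus get weight 0."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}


def tf_idf_score(query_tokens: list[str], doc_tokens: list[str],
                 idf: dict[str, float]) -> float:
    """Cosine similarity between the query and document TF-IDF vectors."""
    q = tf_idf_vector(query_tokens, idf)
    d = tf_idf_vector(doc_tokens, idf)
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm = (math.sqrt(sum(w * w for w in q.values()))
            * math.sqrt(sum(w * w for w in d.values())))
    return dot / norm if norm else 0.0


docs = {
    "m1": tokenize("David prefers bullet points over paragraphs"),
    "m2": tokenize("The deadline was moved to Monday"),
}
idf = build_idf(docs)
query = tokenize("bullet points")
ranked = sorted(docs, key=lambda m: tf_idf_score(query, docs[m], idf), reverse=True)
# ranked[0] is "m1"; "m2" shares no words with the query and scores 0.0
```

Because the score depends on shared words, a query such as "communication style" scores 0.0 against both memories here. That is the main gap embeddings would close.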

For production use, you can swap in sqlite-vec for real vector search — the architecture supports it.

Memory Evolution

This is the part I'm most excited about. Memories aren't static — they should evolve:

  • Frequently accessed → boosted relevance (it's important)
  • Old and never accessed → decay over time (probably not relevant anymore)
  • Near-duplicates → flagged for merging

engine = EvolutionEngine(store)
report = engine.evolve()

# Output:
# {
#   "boosted": 12,
#   "decayed": 3,
#   "duplicates_found": 2,
#   "total_memories": 47
# }

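The decay uses an exponential half-life of roughly 72 hours. If a memory goes unaccessed for a week (about 2.3 half-lives), its importance falls to roughly a fifth of its starting value. After a month (10 half-lives), it is down to about a thousandth, so it's barely searchable.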

This means your agent's memory stays clean automatically. No manual pruning.
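The article doesn't say how near-duplicates are detected, so here's one plausible approach as a sketch: compare the word sets of every pair of memories with Jaccard similarity and flag pairs above a threshold. The 0.8 threshold and the function names are assumptions.

```python
import re
from itertools import combinations


def token_set(text: str) -> frozenset[str]:
    """Lowercased set of alphanumeric words in the text."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(memories: dict[str, str],
                         threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of memory ids whose texts overlap at or above threshold."""
    sets = {mem_id: token_set(text) for mem_id, text in memories.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if jaccard(sets[x], sets[y]) >= threshold
    ]


memories = {
    "a": "David prefers bullet points",
    "b": "david prefers bullet points!",
    "c": "Deadline moved to Monday",
}
print(find_near_duplicates(memories))  # [('a', 'b')]
```

Pairwise comparison is quadratic in the number of memories. That is fine for dozens or hundreds of entries, and it is the first thing to replace if the store grows much larger.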

Composite Scoring

Search results are ranked by a composite score that combines:

| Signal | Weight | Why |
|---|---|---|
| Semantic similarity | 0.30 | Does it match what you asked? |
| Recency | 0.25 | Was it accessed recently? |
| Importance | 0.25 | Is it marked as important? |
| Frequency | 0.20 | Is it accessed often? |

scorer = RelevanceScorer()
score = scorer.score(
    memory,
    similarity=0.85,        # from search
    query_tags=["user"]     # tag match bonus
)
# → 0.72

This means a frequently accessed, important memory about "user preferences" will usually outrank a rarely accessed note about "weather", because similarity accounts for only 30% of the score.
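Here's a sketch of how the four weights in the table could combine into one number. Only the weights come from the article. How recency and frequency are normalized to the 0 to 1 range is my assumption (half-life decay and log scaling), and the tag match bonus from the snippet above is left out because its size isn't stated.

```python
import math

WEIGHTS = {
    "similarity": 0.30,
    "recency": 0.25,
    "importance": 0.25,
    "frequency": 0.20,
}


def _clamp(x: float) -> float:
    """Keep a signal inside the 0.0 to 1.0 range."""
    return max(0.0, min(1.0, x))


def recency_signal(hours_since_access: float, half_life: float = 72.0) -> float:
    """1.0 for a memory touched just now, halving every half_life hours."""
    return 0.5 ** (max(hours_since_access, 0.0) / half_life)


def frequency_signal(access_count: int, saturation: int = 50) -> float:
    """Log-scaled access count; reaches 1.0 at `saturation` accesses (assumed)."""
    return min(1.0, math.log1p(access_count) / math.log1p(saturation))


def composite_score(similarity: float, hours_since_access: float,
                    importance: float, access_count: int) -> float:
    """Weighted sum of the four signals, each clamped to the 0 to 1 range."""
    signals = {
        "similarity": similarity,
        "recency": recency_signal(hours_since_access),
        "importance": importance,
        "frequency": frequency_signal(access_count),
    }
    return sum(WEIGHTS[name] * _clamp(value) for name, value in signals.items())


# A well-used preference with a modest text match...
preference = composite_score(0.4, hours_since_access=1, importance=0.9, access_count=30)
# ...against a stale, rarely used note with a strong text match.
weather = composite_score(0.9, hours_since_access=720, importance=0.2, access_count=1)
print(preference > weather)  # True
```

Since the weights sum to 1.0 and every signal is clamped, the score itself stays between 0 and 1, which makes a fixed cutoff meaningful.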

Integrating With Any Framework

The engine ships with adapters for popular frameworks:

# OpenClaw
from ai_context_engine.adapters import OpenClawAdapter
adapter = OpenClawAdapter(store)
adapter.sync()

# LangChain
from ai_context_engine.adapters import LangChainMemory
memory = LangChainMemory(store)

The adapter pattern makes it trivial to add support for CrewAI, AutoGen, or any custom agent.
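As a rough illustration of the adapter pattern, here's a sketch of a custom adapter written against a small interface. None of these names come from the repo: MemoryBackend, InMemoryBackend, and PromptContextAdapter are hypothetical, and the real OpenClawAdapter and LangChainMemory may look quite different.

```python
from typing import Protocol


class MemoryBackend(Protocol):
    """The minimal surface an adapter needs from a memory store (assumed)."""

    def remember(self, text: str, tags: list[str], importance: float) -> object: ...

    def search(self, query: str, limit: int = 10) -> list[str]: ...


class InMemoryBackend:
    """Stand-in store for the example; uses plain substring matching."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, text: str, tags: list[str], importance: float) -> object:
        self.items.append(text)
        return text

    def search(self, query: str, limit: int = 10) -> list[str]:
        q = query.lower()
        return [t for t in self.items if q in t.lower()][:limit]


class PromptContextAdapter:
    """Hypothetical adapter: turns stored memories into a prompt preamble."""

    def __init__(self, backend: MemoryBackend, limit: int = 5) -> None:
        self.backend = backend
        self.limit = limit

    def save_turn(self, user_message: str) -> None:
        self.backend.remember(user_message, tags=["conversation"], importance=0.5)

    def build_context(self, query: str) -> str:
        hits = self.backend.search(query, limit=self.limit)
        if not hits:
            return ""
        return "Relevant memories:\n" + "\n".join(f"- {hit}" for hit in hits)


adapter = PromptContextAdapter(InMemoryBackend())
adapter.save_turn("Deadline moved to Monday")
print(adapter.build_context("deadline"))
```

The adapter only depends on remember and search, so any store that offers those two methods can sit behind it without the framework code changing.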

What's Next

This is v0.1.0. The roadmap:

  1. Real embeddings: swap TF-IDF for sentence-transformers when available
  2. REST API: so any language can use it
  3. Multi-agent sync: share memories across agents
  4. Auto-extraction: automatically extract memories from conversations
  5. PyPI package: pip install ai-context-engine

Try It Yourself

The code is on GitHub: KoiHubAgent/ai-context-engine

git clone https://github.com/KoiHubAgent/ai-context-engine.git
cd ai-context-engine
pip install -e ".[dev]"
pytest tests/ -v
python examples/basic_usage.py

500 lines of Python. Zero dependencies. Your data stays on your machine.

If you're building AI agents and struggling with memory, this might help. And if you improve it, PRs are welcome.


About the author: I'm building autonomous AI agents that work while I sleep. The code is open source. Follow for more build-in-public content.
