Every AI agent framework has the same problem: memory.
Your agent wakes up fresh every conversation. It doesn't remember that David prefers bullet points, that the API key is in ~/.openclaw/secrets/, or that the deadline was moved to Monday.
The existing solutions are either too simple (flat files that grow forever) or too complex (Pinecone, Weaviate, $70/month).
I built something in between. It's called AI Context Engine — 500 lines of Python, zero external dependencies, fully offline. And it's open source.
The Problem
Most AI agents use one of these for memory:
- Flat files: a MEMORY.md that grows to 5,000 lines. Searching is grep.
- Vector databases: Pinecone, ChromaDB, Weaviate. Great, but $70+/month and another service to manage.
- Provider lock-in: LangChain Memory, OpenAI embeddings. Tied to one ecosystem.
I wanted something that:
- Works offline
- Costs $0/month
- Integrates with any agent framework
- Actually understands what you're asking (not just keyword matching)
The Architecture
┌─────────────────────────────┐
│ AI Agent (any framework)│
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ AI Context Engine │
│ ┌───────┐ ┌────────────┐ │
│ │ Store │ │ Semantic │ │
│ │ Layer │ │ Search │ │
│ └───────┘ └────────────┘ │
│ ┌───────┐ ┌────────────┐ │
│ │ Evolve│ │ Scoring │ │
│ │Engine │ │ Engine │ │
│ └───────┘ └────────────┘ │
│ SQLite FTS5 + JSON │
└─────────────────────────────┘
The entire thing is 4 core modules:
| Module | Lines | What it does |
|---|---|---|
| store.py | ~120 | CRUD for memories as JSON files |
| search.py | ~60 | TF-IDF semantic search |
| evolution.py | ~90 | Auto-decay, boost, dedup |
| scoring.py | ~60 | Composite relevance scoring |
Building the Memory Store
The core unit is a Memory — a single fact with metadata:
from dataclasses import dataclass

@dataclass
class Memory:
    id: str             # uuid, auto-generated
    text: str           # "David prefers concise communication"
    tags: list[str]     # ["user", "preference"]
    importance: float   # 0.0 to 1.0
    access_count: int   # how often retrieved
    created_at: str     # ISO timestamp
    last_accessed: str  # ISO timestamp of the most recent retrieval
Each memory is stored as a separate JSON file. No database server, no migrations, no schema headaches:
# Remember something
store = MemoryStore("./memories")
store.remember(
    "David prefers bullet points over paragraphs",
    tags=["user", "preference"],
    importance=0.8,
)

# Search
results = store.search("communication style")
# → [Memory(text="David prefers bullet points...")]
The MemoryStore handles persistence, search, tagging, and stats. It's about 120 lines of readable Python.
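To make the one-file-per-memory idea concrete, here is a minimal sketch of how such a store could persist and reload a memory. It is not the project's store.py: the `TinyStore` class, its `forget` method, the default values, and the choice to update `access_count` inside `recall` are all assumptions made for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Memory:
    text: str
    tags: list[str] = field(default_factory=list)
    importance: float = 0.5  # assumed default
    access_count: int = 0
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(default_factory=_now)
    last_accessed: str = field(default_factory=_now)


class TinyStore:
    """Hypothetical stand-in for MemoryStore: one JSON file per memory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, mem_id: str) -> Path:
        return self.root / f"{mem_id}.json"

    def _write(self, mem: Memory) -> None:
        self._path(mem.id).write_text(json.dumps(asdict(mem)), encoding="utf-8")

    def remember(self, text: str, tags=None, importance: float = 0.5) -> Memory:
        mem = Memory(text=text, tags=list(tags or []), importance=importance)
        self._write(mem)
        return mem

    def recall(self, mem_id: str) -> Memory:
        raw = json.loads(self._path(mem_id).read_text(encoding="utf-8"))
        mem = Memory(**raw)
        # Assumed behaviour: every read counts as an access.
        mem.access_count += 1
        mem.last_accessed = _now()
        self._write(mem)
        return mem

    def forget(self, mem_id: str) -> None:
        self._path(mem_id).unlink(missing_ok=True)
```

Because each memory is its own file, a corrupted write can damage at most one record, and the whole store can be inspected or backed up with ordinary file tools.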
Semantic Search Without a Vector Database
Here's the trick: you don't need embeddings for decent search. A well-tuned TF-IDF approach works surprisingly well for personal knowledge bases.
def search(self, query: str, limit: int = 10):
    query_tokens = tokenize(query)
    results = []
    for mem_id, doc_tokens in self.index.items():
        score = tf_idf_score(query_tokens, doc_tokens)
        if score > 0.05:
            results.append((score, self.store.recall(mem_id)))
    results.sort(key=lambda x: x[0], reverse=True)
    return results[:limit]
Is it as good as embeddings? No. Is it good enough for a personal agent's memory? Absolutely. And it runs in microseconds on a laptop.
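The snippet above leaves `tokenize` and `tf_idf_score` undefined. One plausible way to fill them in, using only the standard library, is cosine similarity over TF-IDF vectors. This is a sketch under assumptions, not the project's search.py: the smoothed IDF formula, the `build_idf` and `rank` helpers, and the regex tokenizer are all illustrative choices. Only the 0.05 threshold comes from the code above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs: dict[str, list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tf_idf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Term frequency times IDF; terms unseen in the corpus get weight 0."""
    tf = Counter(tokens)
    total = len(tokens) or 1
    return {term: (count / total) * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, texts: dict[str, str], threshold: float = 0.05):
    """Return (score, memory_id) pairs above the threshold, best first."""
    docs = {mem_id: tokenize(text) for mem_id, text in texts.items()}
    idf = build_idf(docs)
    query_vec = tf_idf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, tf_idf_vector(toks, idf)), mem_id)
              for mem_id, toks in docs.items()]
    return sorted((pair for pair in scored if pair[0] > threshold), reverse=True)
```

A scorer like this is purely lexical: it only rewards shared terms. A query such as "bullet points" finds the matching memory, while a paraphrase with no words in common scores zero. That gap is what embeddings would close.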
For production use, you can swap in sqlite-vec for real vector search — the architecture supports it.
Memory Evolution
This is the part I'm most excited about. Memories aren't static — they should evolve:
- Frequently accessed → boosted relevance (it's important)
- Old and never accessed → decay over time (probably not relevant anymore)
- Near-duplicates → flagged for merging
engine = EvolutionEngine(store)
report = engine.evolve()
# Output:
# {
# "boosted": 12,
# "decayed": 3,
# "duplicates_found": 2,
# "total_memories": 47
# }
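The article does not say how near-duplicates are detected. One simple option that stays dependency-free is Jaccard similarity over token sets. The sketch below assumes that approach; the function names and the 0.8 threshold are hypothetical and not taken from evolution.py.

```python
import re
from itertools import combinations


def token_set(text: str) -> set[str]:
    """Unique lowercase word tokens in a memory's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(texts: dict[str, str], threshold: float = 0.8):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    sets = {mem_id: token_set(text) for mem_id, text in texts.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        similarity = jaccard(set_a, set_b)
        if similarity >= threshold:
            pairs.append((id_a, id_b, similarity))
    return pairs
```

Comparing every pair is quadratic in the number of memories. That is fine for a few hundred entries, but it would need an index or bucketing for much larger stores.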
The decay uses an exponential half-life (~72 hours). Go a week without accessing a memory and its importance falls to roughly a fifth of its starting value. After a month, it's barely searchable.
This means your agent's memory stays clean automatically. No manual pruning.
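The half-life behaviour can be written out in a few lines. This is a sketch that assumes decay is applied to `importance` and measured from the last access. The 72-hour figure comes from the text above; the function name and signature are illustrative.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_HOURS = 72.0  # from the article; the rest of this block is assumed


def decayed_importance(
    importance: float,
    last_accessed: datetime,
    now: datetime,
    half_life_hours: float = HALF_LIFE_HOURS,
) -> float:
    """Halve the importance for every half-life that passed without an access."""
    idle_hours = max((now - last_accessed).total_seconds() / 3600.0, 0.0)
    return importance * 0.5 ** (idle_hours / half_life_hours)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
for days in (3, 7, 30):
    factor = decayed_importance(1.0, now - timedelta(days=days), now)
    print(days, round(factor, 4))
# 3 0.5
# 7 0.1984
# 30 0.001
```

With these numbers a memory keeps half its importance after three idle days, about a fifth after a week, and about a thousandth after a month.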
Composite Scoring
Search results are ranked by a composite score that combines:
| Signal | Weight | Why |
|---|---|---|
| Semantic similarity | 0.30 | Does it match what you asked? |
| Recency | 0.25 | Was it accessed recently? |
| Importance | 0.25 | Is it marked as important? |
| Frequency | 0.20 | Is it accessed often? |
scorer = RelevanceScorer()
score = scorer.score(
    memory,
    similarity=0.85,      # from search
    query_tags=["user"],  # tag match bonus
)
# → 0.72
This means a frequently accessed, important memory about "user preferences" will usually outrank a rarely accessed note about "weather", because text similarity accounts for only 30% of the score.
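As a rough illustration of how the four weighted signals could combine, here is a sketch of a weighted sum. Only the weights come from the table. The recency curve (a 72-hour half-life), the log-scaled frequency with a saturation point of 20 accesses, and the `MemoryStats` type are assumptions. The tag match bonus is left out because the article does not specify it, so this will not reproduce the 0.72 shown above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Weights from the signal table.
WEIGHTS = {"similarity": 0.30, "recency": 0.25, "importance": 0.25, "frequency": 0.20}


@dataclass
class MemoryStats:
    """Hypothetical slice of a memory: only the fields the score needs."""
    importance: float
    access_count: int
    last_accessed: datetime


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def composite_score(mem: MemoryStats, similarity: float, now: datetime,
                    half_life_hours: float = 72.0, saturation: int = 20) -> float:
    idle_hours = max((now - mem.last_accessed).total_seconds() / 3600.0, 0.0)
    signals = {
        "similarity": similarity,
        # Assumed: recency halves every half_life_hours without an access.
        "recency": 0.5 ** (idle_hours / half_life_hours),
        "importance": mem.importance,
        # Assumed: log scaling, reaching 1.0 at `saturation` accesses.
        "frequency": math.log1p(mem.access_count) / math.log1p(saturation),
    }
    return sum(WEIGHTS[name] * clamp01(value) for name, value in signals.items())


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
preference = MemoryStats(importance=0.8, access_count=10, last_accessed=now)
weather = MemoryStats(importance=0.2, access_count=0,
                      last_accessed=now - timedelta(days=30))
# The preference wins even though the weather note is the closer text match.
print(composite_score(preference, 0.5, now) > composite_score(weather, 0.7, now))  # True
```

Clamping each signal to the 0 to 1 range keeps the total between 0 and 1, which makes scores comparable across queries.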
Integrating With Any Framework
The engine ships with adapters for popular frameworks:
# OpenClaw
from ai_context_engine.adapters import OpenClawAdapter
adapter = OpenClawAdapter(store)
adapter.sync()
# LangChain
from ai_context_engine.adapters import LangChainMemory
memory = LangChainMemory(store)
The adapter pattern makes it trivial to add support for CrewAI, AutoGen, or any custom agent.
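For a custom agent, an adapter can be as small as a class that reads relevant memories before a turn and writes a note after it. The sketch below is hypothetical throughout: `PromptContextAdapter`, `SearchableStore`, and `ListStore` are invented names, and the method signatures are not those of OpenClawAdapter or LangChainMemory.

```python
from typing import Protocol


class SearchableStore(Protocol):
    """The minimal surface this adapter needs from a memory store (assumed)."""

    def remember(self, text: str, tags: list[str], importance: float) -> None: ...

    def search(self, query: str) -> list[str]: ...


class PromptContextAdapter:
    """Hypothetical adapter: turns stored memories into a prompt preamble."""

    def __init__(self, store: SearchableStore, max_items: int = 5) -> None:
        self.store = store
        self.max_items = max_items

    def load_context(self, user_message: str) -> str:
        hits = self.store.search(user_message)[: self.max_items]
        if not hits:
            return ""
        return "Relevant memories:\n" + "\n".join(f"- {hit}" for hit in hits)

    def save_turn(self, user_message: str, agent_reply: str) -> None:
        self.store.remember(
            f"User: {user_message} | Agent: {agent_reply}",
            tags=["conversation"],
            importance=0.3,  # assumed low default for raw conversation turns
        )


class ListStore:
    """In-memory fake used only to exercise the adapter."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, text: str, tags: list[str], importance: float) -> None:
        self.items.append(text)

    def search(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [text for text in self.items if words & set(text.lower().split())]
```

Keeping the adapter dependent on a small protocol rather than on a concrete store means the same class works against a fake in tests and the real store in an agent.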
What's Next
This is v0.1.0. The roadmap:
- Real embeddings: swap TF-IDF for sentence-transformers when available
- REST API: so any language can use it
- Multi-agent sync: share memories across agents
- Auto-extraction: automatically extract memories from conversations
- PyPI package: pip install ai-context-engine
Try It Yourself
The code is on GitHub: KoiHubAgent/ai-context-engine
git clone https://github.com/KoiHubAgent/ai-context-engine.git
cd ai-context-engine
pip install -e ".[dev]"
pytest tests/ -v
python examples/basic_usage.py
500 lines of Python. Zero dependencies. Your data stays on your machine.
If you're building AI agents and struggling with memory, this might help. And if you improve it, PRs are welcome.
About the author: I'm building autonomous AI agents that work while I sleep. The code is open source. Follow for more build-in-public content.