48 hours after launch. 419 clones. 90 unique developers. 8 stars. Nobody said a word.
That silence told me something important: engineers don't star things — they test them.
Here's the story of what I built, why, and what those numbers actually mean.
The Problem Nobody Talks About
Everyone is building AI agents. Most of them have a memory problem.
The standard approach: use embeddings. Store text as vectors, query them at recall time. Tools like Mem0, Zep, and LangMem all work this way.
The hidden cost:
- Every recall = an embedding API call = 150–300ms latency
- Every embedding call = money (OpenAI charges per token)
- Offline deployment? Impossible — you need the embedding API available
For cloud-based chatbots this is fine. But for local AI agents running on your own hardware — especially with Ollama — this breaks the whole offline-first promise.
If your agent needs to "remember" something, it has to call home first.
That felt wrong to me.
A Different Idea: SDR Instead of Embeddings
I started reading about Sparse Distributed Representations (SDR) — the pattern encoding mechanism used in Hierarchical Temporal Memory (HTM) theory, originally inspired by how the neocortex works.
The core idea: represent any concept as a sparse binary vector (256K bits in Aura's case) where only ~2% of bits are active. Similarity between patterns is computed with the Tanimoto coefficient — pure bit math, no neural network needed.
No embedding model. No API call. No GPU.
Just math.
Recall latency: 0.35ms. That's not a typo.
What I Built
Aura — a cognitive memory system for AI agents written in Rust.
Key properties:
- Sub-millisecond recall — 0.35ms average, 0.29ms after warm cache
- Zero LLM calls for memory operations — the recall itself needs no model
- 2.7MB binary — the entire memory engine fits in a small file
- Fully offline — works with Ollama, any local model, no internet required
- Persistent across sessions — brain reloads from disk, all context intact
- 217 tests, ChaCha20-Poly1305 encryption, patent pending (US 63/969,703)
Four memory levels with different retention weights:
Working Memory → 0.80 retention (temporary context)
Decision Memory → 0.90 retention (choices made)
Domain Memory → 0.95 retention (learned knowledge)
Identity Memory → 0.99 retention (core facts)
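The post doesn't spell out how these retention weights are applied internally. One plausible reading — my assumption, not Aura's documented algorithm — is simple exponential decay per consolidation interval, which would make working memory fade fast while identity memory barely moves:

```python
# Retention weights from the post; the decay model below is my assumption.
RETENTION = {
    "working": 0.80,
    "decision": 0.90,
    "domain": 0.95,
    "identity": 0.99,
}

def strength(level: str, intervals_elapsed: int) -> float:
    """Hypothetical memory strength after n consolidation intervals,
    modeled as weight ** n."""
    return RETENTION[level] ** intervals_elapsed

print(round(strength("working", 20), 3))   # ~0.012 — mostly faded
print(round(strength("identity", 20), 3))  # ~0.818 — largely intact
```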
Integration with Ollama: 3 Lines
```python
from aura_memory import Aura
import ollama

brain = Aura("./agent_brain")
context = brain.recall(user_input, token_budget=1500)

# inject context into your Ollama system prompt
response = ollama.chat(
    model="gemma3n:e4b",
    messages=[
        {"role": "system", "content": f"Context:\n{context}\n\nYou are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
)

# store the interaction
brain.store(user_input, response["message"]["content"])
```
That's it. Your Ollama agent now has persistent memory across sessions — no embedding API, no cloud, no ongoing cost.
Live Demo Output
I ran a 4-phase test with gemma3n:e4b locally. Here's the actual terminal output:
Phase 1: Storing facts
✓ Stored: Name is Aleksander, AI engineer from Ukraine
✓ Stored: Working on AuraSDK — cognitive memory for agents
✓ Stored: Prefers concise technical explanations
Phase 2: Conversations with memory context
[Recall: 0.35ms] Context injected into system prompt
[Recall: 0.48ms] Agent referenced previous preference correctly
[Recall: 0.41ms] Agent remembered project name without being told
Phase 3: Session reload (fresh Python instance)
Brain loaded from disk...
[Recall: 0.29ms] ALL context intact ✅
Total records: 12
Memory persisted: YES
LLM calls for memory: 0
The agent remembered my name, project, and communication preferences across a completely fresh Python instance — without a single LLM or embedding call.
Benchmark vs. Embedding-Based Approaches
| Metric | Aura | Embedding-based approach |
|---|---|---|
| Recall latency | 0.35ms | ~200ms |
| Embedding API calls | 0 | Required |
| Offline capable | ✅ | ❌ |
| Binary size | 2.7MB | N/A (cloud) |
| Cost per recall | $0 | API pricing |
| Speedup | ~570x faster (200ms / 0.35ms) | baseline |
Why Rust?
Three reasons:
- Performance — sub-millisecond recall requires zero garbage collection overhead
- Safety — memory systems that corrupt data are worse than no memory at all
- Portability — 2.7MB binary runs anywhere: Raspberry Pi, edge devices, air-gapped servers
19,500 lines of Rust. 217 tests. Built during power outages in Kyiv 🇺🇦
The 419 Clones
After posting in the Ollama Discord and commenting on a few Twitter threads about agent memory, the GitHub traffic spiked:
- 419 clones in 48 hours
- 90 unique cloners
- Zero comments
I think developers are quietly testing it. That's the most honest validation I could ask for — nobody clones a repo to be polite.
If you're one of those 90 people: I'd genuinely love to know what you found. What worked, what didn't, what you were trying to build.
Get Started
pip install aura-memory
- 📦 PyPI: aura-memory
- 🔗 GitHub: teolex2020/AuraSDK
- 🌐 Docs: aurasdk.dev
One Question For You
How are you handling memory in your AI agents right now?
Embeddings? Simple conversation history? Something else entirely?
I'm genuinely curious about the tradeoffs people are navigating — especially for local/offline deployments where latency and API costs actually matter.