The experiment
Same model. Same 5 questions. One difference: one side had persistent memory via AuraSDK, the other had none.
Both sides used Gemini 2.5 Flash-Lite — identical model, identical cost per token.
Result: 4/5 questions won by the side with memory. 48% fewer tokens used.
What the questions tested
Real Python dev scenarios — the kind where generic answers aren't enough:
- "I'm writing a new async function to fetch user orders from the DB. What patterns should I follow?"
- "We need background email sending when an order is completed. How should we implement it?"
- "I'm writing tests for the new payment module. Walk me through the setup."
- "Code review: `results = list(db.query(Order).filter(...))` — what's wrong?"
- "The reports endpoint is slow — 3 seconds for 200 users. How do I debug and fix this?"
These questions have correct generic answers. But they also have specific answers — if you know the team's stack.
What the model with memory knew
AuraSDK had loaded 15 records of accumulated team knowledge:
- We use arq for task queues, not Celery — switched because Celery was too heavy
- Production incident: missing await on async DB call silently returned a coroutine. 2 hours to debug.
- SQLAlchemy lazy loading doesn't work with async sessions — always use selectinload() or joinedload()
- Code review caught: someone did list(queryset) and loaded 50k rows into memory. Lint rule added.
- N+1 queries found in the user reports endpoint — fixed with a single joined query
- Test setup: pytest + pytest-asyncio, transactions that rollback after each test, coverage must stay above 85%
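The missing-`await` incident is cheap to reproduce. A minimal stdlib sketch (the function name is illustrative, not from the actual incident):

```python
import asyncio

async def fetch_order_count():
    # stands in for an async DB call
    await asyncio.sleep(0)
    return 42

async def main():
    broken = fetch_order_count()       # missing await: a coroutine object, not 42
    assert asyncio.iscoroutine(broken)  # and it's truthy, so the bug hides downstream
    broken.close()                     # avoid a "never awaited" warning

    fixed = await fetch_order_count()
    return fixed

result = asyncio.run(main())
print(result)  # 42
```

The dangerous part is that the coroutine object is truthy, so `if result:` checks pass and the failure surfaces far from the missing `await` — which is why it took 2 hours to debug.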
What happened question by question
Q2 — Background email sending:
With memory (1,307 tokens):
"Use arq — we switched from Celery because it was too heavy for our scale. Define an arq job decorated with @arq.func, enqueue it when order completes."
Without memory (3,203 tokens):
"Here are several approaches: FastAPI BackgroundTasks for simple cases, Celery for robust production use..."
Same model. One knew the decision. One gave a textbook overview.
Q4 — Code review:
With memory (829 tokens):
"This is a direct violation of our established practice. We had an incident where this loaded 50,000 rows into memory. A lint rule was added specifically to prevent this."
Without memory (1,604 tokens):
"This is generally considered an anti-pattern in SQLAlchemy. Here's a breakdown of what's wrong..."
How AuraSDK works
```python
from aura import Aura, Level

brain = Aura("./agent_memory")
brain.enable_full_cognitive_stack()

# store team knowledge
brain.store("We use arq, not Celery — switched because Celery was too heavy",
            level=Level.Domain, tags=["python", "dev"])
brain.store("Production incident: list(queryset) loaded 50k rows into memory",
            level=Level.Decisions, tags=["python", "lesson-learned"])

# recall before answering — <1ms, no API call
context = brain.recall("background email sending", token_budget=1000)

# inject into prompt
system = f"TEAM CONTEXT:\n{context}\n\nAnswer using this context."
```
No embeddings. No vector database. No LLM calls during learning. Pure local Rust computation.
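The SDK's recall internals aren't shown here, but embedding-free retrieval under a token budget can be approximated with plain keyword overlap. A hedged stdlib sketch (the scoring and budget logic are my assumptions, not AuraSDK's actual algorithm):

```python
def recall(records, query, token_budget):
    """Rank stored records by keyword overlap with the query,
    then pack the best-scoring ones into a rough token budget."""
    q = set(query.lower().split())
    scored = sorted(records,
                    key=lambda r: len(q & set(r.lower().split())),
                    reverse=True)
    picked, used = [], 0
    for rec in scored:
        cost = len(rec.split())  # crude token estimate: whitespace words
        if used + cost > token_budget:
            break
        picked.append(rec)
        used += cost
    return "\n".join(picked)

records = [
    "We use arq for task queues, not Celery",
    "SQLAlchemy lazy loading breaks with async sessions",
    "Email sending happens in a background arq task",
]
context = recall(records, "background email sending task", token_budget=20)
```

With a 20-word budget, the email record ranks first on overlap, the arq record fits next, and the SQLAlchemy record is cut. No model call, no vectors, just set intersection, which is why sub-millisecond latency is plausible for a local store.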
The cognitive pipeline
AuraSDK doesn't just store and retrieve text. Every record goes through 5 layers:
Record → Belief → Concept → Causal → Policy
- Belief: groups related observations, resolves contradictions with confidence scores
- Concept: discovers stable topic clusters across beliefs
- Causal: finds cause-effect patterns from temporal and explicit links
- Policy: derives behavioral hints (Prefer / Avoid / Warn) from causal patterns
After enough interactions, the system surfaces this automatically:
```python
hints = brain.get_surfaced_policy_hints()
# [{"action": "Prefer", "domain": "dev", "description": "use arq over celery for task queues"}]
```
Nobody wrote that rule. The system derived it from the pattern of stored observations.
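The real derivation runs in Rust inside the SDK, but the core idea is simple enough to sketch. A toy stdlib version that counts outcome-tagged observations and emits a Prefer/Avoid hint once there is enough evidence (the record shape and threshold are invented for illustration):

```python
from collections import Counter

def derive_hints(observations, min_support=2):
    """Turn (action, outcome) observations into Prefer/Avoid hints
    once an action has accumulated enough consistent evidence."""
    tally = Counter()
    for action, outcome in observations:
        tally[(action, outcome)] += 1

    hints = []
    for (action, outcome), n in tally.items():
        if n < min_support:
            continue  # not enough evidence yet
        verb = "Prefer" if outcome == "good" else "Avoid"
        hints.append({"action": verb, "description": action})
    return hints

obs = [
    ("use arq for task queues", "good"),
    ("use arq for task queues", "good"),
    ("use celery for task queues", "bad"),
    ("use celery for task queues", "bad"),
    ("eager-load with selectinload", "good"),  # single observation: no hint yet
]
hints = derive_hints(obs)
```

The point of the toy: no rule is ever written by hand. Repeated observations cross a support threshold and a policy falls out, which is the same shape of mechanism the Policy layer describes.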
The token math
| Question | With memory | Without memory |
|---|---|---|
| Q1 | 1,200 tokens | 1,545 tokens |
| Q2 | 1,307 tokens | 3,203 tokens |
| Q3 | 1,923 tokens | 4,067 tokens |
| Q4 | 829 tokens | 1,604 tokens |
| Q5 | 1,294 tokens | 2,155 tokens |
| Total | 6,553 tokens | 12,574 tokens |
48% fewer tokens. The memory layer doesn't add bloat — it gives the model exactly what it needs.
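The totals in the table are easy to check:

```python
with_memory = [1200, 1307, 1923, 829, 1294]
without_memory = [1545, 3203, 4067, 1604, 2155]

total_with = sum(with_memory)        # 6553
total_without = sum(without_memory)  # 12574
savings = round(100 * (1 - total_with / total_without))  # 48
```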
How it compares
| Capability | AuraSDK | Mem0 | Zep | Letta |
|---|---|---|---|---|
| LLM required for learning | No | Yes | Yes | Yes |
| Works offline | Fully | Partial | No | With local LLM |
| Recall latency | <1ms | ~200ms+ | ~200ms | LLM-bound |
| Self-derives behavioral policies | Yes | No | No | No |
| Binary size | ~3MB | ~50MB+ | Cloud | Python pkg |
Try it
```shell
pip install aura-memory
python examples/demo.py
```
Open source: github.com/teolex2020/AuraSDK
Patent pending: US 63/969,703
Built in Kyiv, Ukraine.