Oleksander
10x Faster Recall + Memory That Evolves: Aura v1.3 for AI Agents

The Problem

Every AI agent framework has the same weakness: memory is an afterthought. Most solutions dump everything into a vector database and hope cosine similarity finds the right context. This works until it doesn't — when your agent needs to know when it learned something, what changed since last week, or which memories are actually useful.

What Aura Does Differently

Aura is a pure-Rust cognitive memory engine. Instead of embeddings + vector search, it uses:

  • SDR Encoding (Sparse Distributed Representations) — biologically-inspired, noise-tolerant
  • RRF Fusion — 4 parallel ranking signals (SDR similarity, MinHash, Tag Jaccard, optional embeddings)
  • Temporal Decay — memories naturally fade unless reinforced
  • Graph Connections — associative, causal, and co-activation links between memories

The result: sub-millisecond recall, ~3MB binary, zero external dependencies.
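Reciprocal Rank Fusion (RRF) is the standard way to merge several independent rankings without normalizing their scores: each signal contributes 1/(k + rank) per item. As a rough illustration of how Aura's four signals could be combined (signal names and data here are made up; k=60 is the conventional default, not necessarily Aura's):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of record IDs into one scored ranking.

    Each signal contributes 1 / (k + rank); items ranked highly by
    multiple signals rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Three hypothetical signal rankings (best first)
sdr_ranking     = ["a", "b", "c"]
minhash_ranking = ["b", "a", "d"]
tag_ranking     = ["b", "c", "a"]

fused = rrf_fuse([sdr_ranking, minhash_ranking, tag_ranking])
print(fused[0][0])  # "b" — two of three signals ranked it first
```

Because RRF only looks at ranks, a signal with wildly different score scales (SDR overlap vs. Jaccard vs. cosine) can't dominate the fusion.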

10x Recall Speedup (v1.3.1)

Every recall_structured call was cloning ALL records into a new HashMap to filter by namespace. At 10K records, that's 94ms of pure waste.

The fix: pass the original HashMap through the pipeline and let each signal collector filter by namespace inline with a cheap contains() check. v1.3.1 also adds a StructuredRecallCache for repeated queries.

Records   Before   After    Speedup
1K        15 ms    2.6 ms   5.8x
5K        58 ms    5.1 ms   11.4x
10K       94 ms    8.6 ms   10.9x

Warm recall (cache hit): ~0.07 ms — constant time regardless of record count.

What's New in v1.3.0

1. Temporal Queries

from aura import Aura
import time

brain = Aura("./memory")
old_id = brain.store("User prefers dark mode", level="Domain")

# ... days pass, user changes preference ...
brain.supersede(old_id, "User prefers light mode")

# What did we know last week?
last_week_timestamp = time.time() - 7 * 24 * 3600
old_memories = brain.recall_at("user preferences", last_week_timestamp)

recall_at(query, timestamp) filters records by creation time. history(record_id) shows the full access/strength timeline. This is how you debug agent behavior — "why did it do X on Tuesday?"
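The filtering idea itself is simple enough to sketch in a few lines of plain Python (record shape and scoring are illustrative, not Aura's internals): only records that already existed at the query timestamp are eligible for ranking.

```python
# Minimal sketch of the recall_at idea: rank only records created
# at or before the given timestamp.
def recall_at(records, query, timestamp, top_k=3):
    visible = [r for r in records if r["created_at"] <= timestamp]
    ranked = sorted(
        visible,
        key=lambda r: len(set(r["text"].split()) & set(query.split())),
        reverse=True,
    )
    return [r["text"] for r in ranked[:top_k]]

records = [
    {"text": "User prefers dark mode", "created_at": 100},
    {"text": "User prefers light mode", "created_at": 200},  # supersedes the first
]
# Querying "as of" t=150 only sees the old preference
print(recall_at(records, "user prefers", 150))  # ['User prefers dark mode']
```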

2. LangChain Drop-In

from aura import Aura
from aura.langchain import AuraMemory
from langchain.chains import ConversationChain

brain = Aura("./memory")
memory = AuraMemory(brain)

# Works with any LangChain chain (llm = any LangChain-compatible LLM)
chain = ConversationChain(llm=llm, memory=memory)

AuraChatMessageHistory implements the full BaseChatMessageHistory interface. AuraMemory is duck-type compatible with ConversationBufferMemory. No changes to your existing code.

3. Adaptive Recall

# After recall, tell Aura what was useful
results = brain.recall_structured("deployment steps", top_k=5)
for score, record in results:
    if was_helpful(record):
        brain.feedback(record.id, useful=True)   # +0.1 strength
    else:
        brain.feedback(record.id, useful=False)  # -0.15 strength

Over time, noise naturally decays while valuable memories get reinforced. No other memory SDK has this built-in.
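You can see the dynamic with a toy simulation combining per-tick decay with the feedback deltas shown above (+0.1 useful, -0.15 not useful). The decay rate and [0, 1] clamping here are assumptions for illustration, not Aura's exact parameters:

```python
# Sketch of adaptive recall dynamics: exponential decay every tick,
# plus explicit feedback nudges. Decay rate and clamp are assumed values.
def step(strength, useful=None, decay=0.99):
    strength *= decay                      # temporal decay every tick
    if useful is True:
        strength += 0.1                    # reinforced
    elif useful is False:
        strength -= 0.15                   # penalized
    return max(0.0, min(1.0, strength))    # clamp to [0, 1]

good, noise = 0.5, 0.5
for _ in range(20):
    good = step(good, useful=True)    # memory that keeps getting used
    noise = step(noise)               # memory that is never touched
print(round(good, 2), round(noise, 2))  # 1.0 0.41
```

The reinforced memory saturates at full strength while the untouched one drifts toward the decay floor, which is exactly the separation you want between signal and noise.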

4. Memory Versioning

# Save state before experiment
brain.snapshot("before_refactor")

# ... agent does things ...

# Something went wrong? Roll back
brain.rollback("before_refactor")

# Or compare states
diff = brain.diff("before_refactor", "after_refactor")
print(f"Added: {diff['added']}, Removed: {diff['removed']}")

5. Agent-to-Agent Sharing

# Agent A exports relevant context
fragment = agent_a.export_context("user preferences", top_k=5)

# Agent B imports it (strength halved, tagged "shared")
agent_b.import_context(fragment)

The protocol envelope includes version and provenance metadata. Imported records arrive with reduced trust — they need to prove themselves.
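The post only specifies three things about the exchange: the envelope carries version and provenance metadata, imported strength is halved, and imports are tagged "shared". A hypothetical sketch of that contract (all field names are illustrative, not Aura's wire format):

```python
# Hypothetical envelope shape for agent-to-agent sharing.
def export_context(agent_id, records):
    return {
        "version": "1.3",
        "provenance": {"agent": agent_id},
        "records": records,
    }

def import_context(envelope, store):
    for rec in envelope["records"]:
        store.append({
            "text": rec["text"],
            "strength": rec["strength"] / 2,           # reduced trust on arrival
            "tags": rec.get("tags", []) + ["shared"],  # marked as imported
            "source": envelope["provenance"]["agent"],
        })

store = []
env = export_context("agent_a", [{"text": "User prefers dark mode", "strength": 0.8}])
import_context(env, store)
print(store[0]["strength"], store[0]["tags"])  # 0.4 ['shared']
```

Halving strength on import means a shared memory must earn reinforcement in its new home before it ranks alongside locally learned ones.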

6. C FFI — Aura as a Platform

#include "aura.h"

AuraHandle* h = aura_open("./memory");
aura_store(h, "Remember this", NULL, NULL);

char* result = aura_recall(h, "what to remember", 5);
printf("%s\n", result);
aura_free_string(result);
aura_close(h);

Working examples in Go and C#. Any language with C FFI can use Aura.

7. OpenTelemetry

[features]
telemetry = ["opentelemetry", "opentelemetry_sdk", "opentelemetry-otlp", "tracing-opentelemetry"]

17 key functions instrumented with #[instrument] spans. OTLP export to any collector. Grafana dashboard template included.

The Bug That Took 8 Hours

Fun story: our CI was timing out at 6+ hours. We tried increasing timeouts, switching to release builds, reducing the test matrix. Nothing worked.

Turns out: the Aura struct had no Drop implementation. When tests ended without calling close(), internal file handles were never released. Each test hung for 5 minutes waiting for a timeout that never came. 28 tests x 5 min = CI death.

Fix: 9 lines of code.

impl Drop for Aura {
    fn drop(&mut self) {
        self.stop_background();
        let _ = self.flush();
        let _ = self.storage.flush();
        let _ = self.index.save();
    }
}

Now 503 tests pass in 7 minutes. Sometimes the hardest bugs are the simplest ones.

Try It

pip install aura-memory
from aura import Aura

brain = Aura("./my_agent_memory")
brain.store("User prefers concise answers", level="Identity")

context = brain.recall("how should I respond?")
# Returns formatted context for your LLM's system prompt

Star the repo if this is useful. PRs and issues welcome.
