Oleksander
10x Faster Recall + Memory That Evolves: Aura v1.3 for AI Agents

The Problem

Every AI agent framework has the same weakness: memory is an afterthought. Most solutions dump everything into a vector database and hope cosine similarity finds the right context. This works until it doesn't — when your agent needs to know when it learned something, what changed since last week, or which memories are actually useful.

What Aura Does Differently

Aura is a pure-Rust cognitive memory engine. Instead of embeddings + vector search, it uses:

  • SDR Encoding (Sparse Distributed Representations) — biologically-inspired, noise-tolerant
  • RRF Fusion — 4 parallel ranking signals (SDR similarity, MinHash, Tag Jaccard, optional embeddings)
  • Temporal Decay — memories naturally fade unless reinforced
  • Graph Connections — associative, causal, and co-activation links between memories

The result: sub-millisecond recall, ~3MB binary, zero external dependencies.
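Reciprocal Rank Fusion (RRF) is the standard way to merge several independent rankings without normalizing their scores: each signal contributes 1/(k + rank) per item. As a rough illustration of how Aura's four signals could be combined (signal names and data here are made up; k=60 is the conventional default, not necessarily Aura's):

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of record IDs into one scored ranking.

    Each signal contributes 1 / (k + rank); items ranked highly by
    multiple signals rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Three hypothetical signal rankings (best first)
sdr_ranking     = ["a", "b", "c"]
minhash_ranking = ["b", "a", "d"]
tag_ranking     = ["b", "c", "a"]

fused = rrf_fuse([sdr_ranking, minhash_ranking, tag_ranking])
print(fused[0][0])  # "b" — two of three signals ranked it first
```

Because RRF only looks at ranks, a signal with wildly different score scales (SDR overlap vs. Jaccard vs. cosine) can't dominate the fusion.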

10x Recall Speedup (v1.3.1)

Every recall_structured call was cloning ALL records into a new HashMap to filter by namespace. At 10K records, that's 94ms of pure waste.

The fix: pass the original HashMap through the pipeline and let each signal collector filter by namespace inline with a cheap contains() check. v1.3.1 also adds a StructuredRecallCache for repeated queries.

Records   Before   After    Speedup
1K        15 ms    2.6 ms   5.8x
5K        58 ms    5.1 ms   11.4x
10K       94 ms    8.6 ms   10.9x

Warm recall (cache hit): ~0.07 ms — constant time regardless of record count.

What's New in v1.3.0

1. Temporal Queries

from aura import Aura
import time

brain = Aura("./memory")
old_id = brain.store("User prefers dark mode", level="Domain")

# ... days pass, user changes preference ...
brain.supersede(old_id, "User prefers light mode")

# What did we know last week?
last_week_timestamp = time.time() - 7 * 24 * 3600
old_memories = brain.recall_at("user preferences", last_week_timestamp)

recall_at(query, timestamp) filters records by creation time. history(record_id) shows the full access/strength timeline. This is how you debug agent behavior — "why did it do X on Tuesday?"
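The filtering idea itself is simple enough to sketch in a few lines of plain Python (record shape and scoring are illustrative, not Aura's internals): only records that already existed at the query timestamp are eligible for ranking.

```python
# Minimal sketch of the recall_at idea: rank only records created
# at or before the given timestamp.
def recall_at(records, query, timestamp, top_k=3):
    visible = [r for r in records if r["created_at"] <= timestamp]
    ranked = sorted(
        visible,
        key=lambda r: len(set(r["text"].split()) & set(query.split())),
        reverse=True,
    )
    return [r["text"] for r in ranked[:top_k]]

records = [
    {"text": "User prefers dark mode", "created_at": 100},
    {"text": "User prefers light mode", "created_at": 200},  # supersedes the first
]
# Querying "as of" t=150 only sees the old preference
print(recall_at(records, "user prefers", 150))  # ['User prefers dark mode']
```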

2. LangChain Drop-In

from aura import Aura
from aura.langchain import AuraMemory
from langchain.chains import ConversationChain

brain = Aura("./memory")
memory = AuraMemory(brain)

# Works with any LangChain chain (llm = any LangChain-compatible LLM)
chain = ConversationChain(llm=llm, memory=memory)

AuraChatMessageHistory implements the full BaseChatMessageHistory interface. AuraMemory is duck-type compatible with ConversationBufferMemory. No changes to your existing code.

3. Adaptive Recall

# After recall, tell Aura what was useful
results = brain.recall_structured("deployment steps", top_k=5)
for score, record in results:
    if was_helpful(record):
        brain.feedback(record.id, useful=True)   # +0.1 strength
    else:
        brain.feedback(record.id, useful=False)  # -0.15 strength

Over time, noise naturally decays while valuable memories get reinforced. No other memory SDK has this built-in.
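You can see the dynamic with a toy simulation combining per-tick decay with the feedback deltas shown above (+0.1 useful, -0.15 not useful). The decay rate and [0, 1] clamping here are assumptions for illustration, not Aura's exact parameters:

```python
# Sketch of adaptive recall dynamics: exponential decay every tick,
# plus explicit feedback nudges. Decay rate and clamp are assumed values.
def step(strength, useful=None, decay=0.99):
    strength *= decay                      # temporal decay every tick
    if useful is True:
        strength += 0.1                    # reinforced
    elif useful is False:
        strength -= 0.15                   # penalized
    return max(0.0, min(1.0, strength))    # clamp to [0, 1]

good, noise = 0.5, 0.5
for _ in range(20):
    good = step(good, useful=True)    # memory that keeps getting used
    noise = step(noise)               # memory that is never touched
print(round(good, 2), round(noise, 2))  # 1.0 0.41
```

The reinforced memory saturates at full strength while the untouched one drifts toward the decay floor, which is exactly the separation you want between signal and noise.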

4. Memory Versioning

# Save state before experiment
brain.snapshot("before_refactor")

# ... agent does things ...

# Something went wrong? Roll back
brain.rollback("before_refactor")

# Or compare states
diff = brain.diff("before_refactor", "after_refactor")
print(f"Added: {diff['added']}, Removed: {diff['removed']}")

5. Agent-to-Agent Sharing

# Agent A exports relevant context
fragment = agent_a.export_context("user preferences", top_k=5)

# Agent B imports it (strength halved, tagged "shared")
agent_b.import_context(fragment)

The protocol envelope includes version and provenance metadata. Imported records arrive with reduced trust — they need to prove themselves.
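The post only specifies three things about the exchange: the envelope carries version and provenance metadata, imported strength is halved, and imports are tagged "shared". A hypothetical sketch of that contract (all field names are illustrative, not Aura's wire format):

```python
# Hypothetical envelope shape for agent-to-agent sharing.
def export_context(agent_id, records):
    return {
        "version": "1.3",
        "provenance": {"agent": agent_id},
        "records": records,
    }

def import_context(envelope, store):
    for rec in envelope["records"]:
        store.append({
            "text": rec["text"],
            "strength": rec["strength"] / 2,           # reduced trust on arrival
            "tags": rec.get("tags", []) + ["shared"],  # marked as imported
            "source": envelope["provenance"]["agent"],
        })

store = []
env = export_context("agent_a", [{"text": "User prefers dark mode", "strength": 0.8}])
import_context(env, store)
print(store[0]["strength"], store[0]["tags"])  # 0.4 ['shared']
```

Halving strength on import means a shared memory must earn reinforcement in its new home before it ranks alongside locally learned ones.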

6. C FFI — Aura as a Platform

#include "aura.h"

AuraHandle* h = aura_open("./memory");
aura_store(h, "Remember this", NULL, NULL);

char* result = aura_recall(h, "what to remember", 5);
printf("%s\n", result);
aura_free_string(result);
aura_close(h);

Working examples in Go and C#. Any language with C FFI can use Aura.

7. OpenTelemetry

[features]
telemetry = ["opentelemetry", "opentelemetry_sdk", "opentelemetry-otlp", "tracing-opentelemetry"]

17 key functions instrumented with #[instrument] spans. OTLP export to any collector. Grafana dashboard template included.

The Bug That Took 8 Hours

Fun story: our CI was timing out at 6+ hours. We tried increasing timeouts, switching to release builds, reducing the test matrix. Nothing worked.

Turns out: the Aura struct had no Drop implementation. When tests ended without calling close(), internal file handles were never released. Each test hung for 5 minutes waiting for a timeout that never came. 28 tests x 5 min = CI death.

Fix: 9 lines of code.

impl Drop for Aura {
    fn drop(&mut self) {
        self.stop_background();
        let _ = self.flush();
        let _ = self.storage.flush();
        let _ = self.index.save();
    }
}

Now 503 tests pass in 7 minutes. Sometimes the hardest bugs are the simplest ones.

Try It

pip install aura-memory
from aura import Aura

brain = Aura("./my_agent_memory")
brain.store("User prefers concise answers", level="Identity")

context = brain.recall("how should I respond?")
# Returns formatted context for your LLM's system prompt

Star the repo if this is useful. PRs and issues welcome.
