DEV Community

Oleksander

I built a cognitive layer for AI agents that learns without LLM calls

The problem

Every time your agent starts a conversation, it starts from zero.

Sure, you can stuff a summary into the system prompt. You can use RAG. You can call Mem0 or Zep.

But all of these have the same problem: they need LLM calls to learn. To extract facts, to build a user profile, to understand what matters — you're paying per token, adding latency, and depending on a cloud service.

What if the learning happened locally, automatically, without any LLM involvement?

What AuraSDK does differently

AuraSDK is a cognitive layer that runs alongside any LLM. It observes interactions and — without any LLM calls — builds up a structured understanding of patterns, causes, and behavioral rules.

from aura import Aura, Level

brain = Aura("./agent_memory")
brain.enable_full_cognitive_stack()

# store what happens
brain.store("User always deploys to staging first", level=Level.Domain, tags=["workflow"])
brain.store("Staging deploy prevented 3 production incidents", level=Level.Domain, tags=["workflow"])

# sub-millisecond recall — inject into any LLM prompt
context = brain.recall("deployment decision")

# after enough interactions, the system derives this on its own:
hints = brain.get_surfaced_policy_hints()
# [{"action": "Prefer", "domain": "workflow", "description": "deploy to staging first"}]

Nobody wrote that policy rule. The system derived it from the pattern of stored observations.

The cognitive pipeline

AuraSDK processes every stored record through 5 layers:

Record → Belief → Concept → Causal → Policy

Each layer is bounded and deterministic:

  • Belief: groups related observations, resolves contradictions
  • Concept: discovers stable topic clusters across beliefs
  • Causal: finds cause-effect patterns from temporal and explicit links
  • Policy: derives behavioral hints (Prefer / Avoid / Warn) from causal patterns

The entire pipeline runs in milliseconds. No LLM. No cloud. No embeddings required.
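
To make that concrete, here is a minimal illustrative sketch of the kind of deterministic derivation the policy layer performs. This is plain Python, not AuraSDK internals — `derive_policy_hints` and the support threshold are invented for illustration:

```python
from collections import Counter

def derive_policy_hints(records, min_support=2):
    """Toy policy derivation: a tag observed in at least `min_support`
    supporting records yields a 'Prefer' hint. Deterministic: same input,
    same output, no model calls anywhere."""
    support = Counter(tag for _, tag in records)
    return [
        {"action": "Prefer", "domain": tag, "support": n}
        for tag, n in support.items()
        if n >= min_support
    ]

records = [
    ("User always deploys to staging first", "workflow"),
    ("Staging deploy prevented 3 production incidents", "workflow"),
    ("Unrelated one-off note", "misc"),
]
print(derive_policy_hints(records))
```

The real engine layers belief grouping, contradiction handling, and causal inference on top of this idea, but the core property is the same: the hint falls out of counting and thresholds, not generation.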

Try it in 60 seconds

pip install aura-memory
python examples/demo.py

Output:

Phase 4 - Recall in action

  Query: "deployment decision"  [0.29ms]
    1. Staging deploy prevented database migration failure
    2. Direct prod deploy skipped staging -- caused data loss

  Query: "code review"  [0.18ms]
    1. Code review caught SQL injection before merge
    2. Code review found performance regression early

5 learning cycles completed in 16ms. Recall at 0.29ms.

How it compares

| | AuraSDK | Mem0 | Zep | Letta |
| --- | --- | --- | --- | --- |
| LLM required for learning | No | Yes | Yes | Yes |
| Works offline | Fully | Partial | No | With local LLM |
| Recall latency | <1ms | ~200ms+ | ~200ms | LLM-bound |
| Self-derives behavioral policies | Yes | No | No | No |
| Binary size | ~3MB | ~50MB+ | Cloud | Python pkg |

What's new in v1.5.3

  • Full 5-layer cognitive pipeline active by default
  • enable_full_cognitive_stack() — one call to activate everything
  • Decay now driven by memory level, not manual type labels
  • Policy hints now work with explicit causal links (link_records())
  • demo.py — see it working in 60 seconds
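
The post doesn't show what an explicit causal link contains, so here is a hypothetical sketch of the concept only — plain Python, not the `link_records()` API; the tuple shape and `hints_from_links` are assumptions:

```python
def hints_from_links(links):
    """Each explicit link is (cause, effect, polarity). A 'positive'
    polarity (desirable effect) yields a Prefer hint; 'negative' yields
    Avoid. No temporal inference needed: the link carries the causality."""
    return [
        {"action": "Prefer" if polarity == "positive" else "Avoid",
         "cause": cause, "effect": effect}
        for cause, effect, polarity in links
    ]

links = [
    ("staging deploy", "prevented incident", "positive"),
    ("direct prod deploy", "data loss", "negative"),
]
print(hints_from_links(links))
```

The point of explicit links is that the policy layer can skip the weakest inference step (guessing causality from timing) when the caller already knows the causal relationship.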

Built in Rust, from Kyiv

Pure Rust core. No Python dependencies for the engine. Patent pending (US 63/969,703).

Open source: github.com/teolex2020/AuraSDK
Install: pip install aura-memory
Web: aurasdk.dev

If you're building AI agents and want deterministic, explainable, offline-capable memory — give it a try and tell me what you think.

Top comments (19)

Vemtrac Labs

the sub-millisecond recall is wild. i've been building python tools for web analysis (seo scoring, tech stack detection) and latency matters a lot when you're processing hundreds of URLs in batch.

the comparison table is really well done — makes it immediately clear where AuraSDK fits vs the LLM-dependent alternatives. the "no cloud required" angle is underrated, especially for anyone processing sensitive data.

curious about the memory persistence — if you stop the process and restart, does the cognitive layer retain everything it learned?

Mindmagic

Really interesting approach.
Removing LLM calls from the learning loop makes a lot of sense for agents that need low latency and predictable behavior.
The Record → Belief → Concept → Causal → Policy pipeline reminds me a bit of rule-based cognitive architectures, but applied in a very practical way for modern agent workflows.

Oleksander

The rule-based cognitive architecture comparison is fair — the difference is that classical rule-based systems require a human to write the rules. Here the rules emerge from observation data. The pipeline is deterministic and inspectable (you can read exactly which records formed which belief, which causal pattern seeded which policy hint), but nobody writes the logic explicitly.

Predictable behavior was a hard constraint from the start. Every reranking phase has strict bounds — belief layer can shift a result by ±5%, concept ±4%, causal ±3%, policy ±2%. The system cannot surprise you with a 10× score flip. That's what makes it safe to run in production without a human review loop.
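
The bounded reranking can be pictured like this. An illustrative sketch, assuming each layer contributes a clamped multiplicative nudge — `rerank` and the exact mechanics are not AuraSDK source, only the per-layer bounds come from the reply above:

```python
# Per-layer bounds: belief +/-5%, concept +/-4%, causal +/-3%, policy +/-2%.
LAYER_BOUNDS = {"belief": 0.05, "concept": 0.04, "causal": 0.03, "policy": 0.02}

def rerank(base_score, adjustments):
    """Apply each layer's raw adjustment, clamped to that layer's bound.
    Worst case across all four layers is roughly +/-15%, never a 10x flip."""
    score = base_score
    for layer, delta in adjustments.items():
        bound = LAYER_BOUNDS[layer]
        score *= 1.0 + max(-bound, min(bound, delta))
    return score

# Even a wildly overconfident belief signal moves the score by at most 5%.
print(rerank(1.0, {"belief": 2.0}))
```

Clamping per layer (rather than clamping the final sum) means no single layer can dominate, which is what makes the output auditable layer by layer.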

Cyber Safety Zone

This is a really compelling direction—moving away from LLM-in-the-loop toward deterministic, cognitive-layer architectures feels like a necessary evolution. The focus on consistent memory, lower latency, and cost efficiency addresses real production bottlenecks that most agent systems still struggle with. It also echoes a broader shift toward structured cognitive systems rather than prompt-driven intelligence.

Victor Okefie

The claim that matters: "Nobody wrote that policy rule. The system derived it from the pattern of stored observations." That's the difference between memory and understanding. Memory recalls what happened. Understanding infers what will happen next. Most memory tools stop at the first layer.

Oleksander

That's the most precise description of the design intent I've seen from outside the project.

The five layers exist exactly for this reason: Record is storage, Belief is pattern recognition, Concept is abstraction, Causal is inference, Policy is forward prediction. Most tools stop at layer one or two. The interesting behavior only starts at layer three.

The part that surprised me during development: the system derives policy hints that I never explicitly programmed. It observed the patterns, built the causal chain, and surfaced the recommendation on its own. That's when I knew the architecture was correct.

Max Othex

The cold-start problem in AI agents is massively underappreciated. Most people solve it by stuffing more context into the system prompt and calling it memory — which just shifts the cost from compute to token count.

What you're describing is different: deterministic, structured inference that runs locally without round-tripping through an LLM. The fact that the policy hints emerge from patterns rather than being hand-coded is the part that actually changes the equation.

One thing I'm curious about — how does AuraSDK handle contradictory patterns? E.g. if early observations say "deploy directly to prod" but later ones say "always stage first", does the causal layer weight recency, or does it try to resolve the conflict explicitly?

Apex Stack

The Record → Belief → Concept → Causal → Policy pipeline is a really elegant architecture. I run a fleet of AI agents that manage a 100K-page programmatic SEO site — checking search console data, auditing content quality, filing tickets — and the biggest pain point is exactly what you described: every agent session starts from zero context. Right now I'm stuffing summaries into system prompts and it works, but it's brittle and expensive at scale.

The sub-millisecond recall is what caught my eye. When you're running agents on scheduled tasks multiple times a day, shaving even 200ms off each context-loading step compounds fast. Curious whether you've tested AuraSDK with agents that have very domain-specific knowledge that changes frequently — like financial data or search rankings that shift daily. The decay tuning you mentioned in the comments seems like the key lever there.

Oleksander

Your use case is almost exactly the target scenario. Stuffing summaries into system prompts works until it doesn't — the context window fills up, the cost compounds, and stale summaries start actively misleading the agent.

For frequently changing domain knowledge like search rankings, decay tuning is indeed the key lever. The trend semantic type was built for exactly this: it decays faster than a fact and promotes differently, so yesterday's ranking signal doesn't linger as long as a stable domain fact. You get natural freshness without manual cleanup.

For a fleet at your scale, namespace isolation per agent or per site section means each agent builds its own belief graph without cross-contamination. The maintenance cycle runs in ~1ms per agent, so even 1,000 agents running maintenance on a schedule adds negligible overhead.

The brittle system prompt approach also breaks when the agent needs to act on a pattern it hasn't explicitly seen — Aura's causal layer would surface "content audits in category X consistently preceded ranking drops" as a policy hint without anyone writing that rule.

Worth trying on one agent in the fleet first.

klement Gunndu

The deterministic pipeline is interesting, but how does the Belief layer handle contradictions when the agent's environment changes over time? Stale beliefs becoming policy hints seems like the hardest edge case.

Oleksander

Good question — this is exactly what the decay + contradiction system handles.

When new evidence contradicts an existing belief, the belief engine tracks both as competing hypotheses with confidence scores. The older one doesn't disappear immediately; it decays based on memory level (Identity decays over weeks, Working over hours). During maintenance cycles, records with the contradiction semantic type are preserved longer, specifically for conflict analysis.

Stale beliefs reaching the policy layer is prevented by two gates: a belief must have sufficient stability score before it can seed a causal pattern, and causal patterns need minimum support count before generating policy hints. A belief that stopped receiving reinforcing evidence will decay below the stability threshold before it could produce a policy hint.

So the pipeline is: contradicting record added → competing hypothesis created → older hypothesis loses confidence → decays below stability threshold → no longer seeds causal/policy layers.

Not a perfect solution for rapid environment changes, but that's a known trade-off of the decay window tuning.
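
A toy model of how decaying below the stability gate plays out — assumed mechanics with invented numbers; the exponential form, half-lives, and threshold are illustrations, not AuraSDK's actual values:

```python
STABILITY_THRESHOLD = 0.5  # invented gate: below this, a belief can't seed policy

def decayed_confidence(confidence, hours_elapsed, half_life_hours):
    """Exponential decay: confidence halves every half_life_hours."""
    return confidence * 0.5 ** (hours_elapsed / half_life_hours)

# "Deploy direct to prod" last saw reinforcing evidence 48h ago;
# "stage first" was reinforced 2h ago. Same 12h half-life for both.
old = decayed_confidence(0.9, hours_elapsed=48, half_life_hours=12)
new = decayed_confidence(0.8, hours_elapsed=2, half_life_hours=12)

can_seed_policy = {
    "deploy direct to prod": old >= STABILITY_THRESHOLD,  # decayed out
    "stage first": new >= STABILITY_THRESHOLD,            # still stable
}
print(can_seed_policy)
```

The unreinforced hypothesis doesn't need to be deleted; it simply loses the ability to seed causal or policy layers once its confidence drifts under the gate.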
