DEV Community

Oleksander
Your AI forgets everything — this layer fixes that without retraining

Your AI model forgets everything after every conversation.

Not because it’s bad — because it has no memory system.

RAG helps retrieve context.
Fine-tuning helps adjust behavior.

But neither actually gives your system memory.

This article shows a different approach:
a cognitive layer that sits outside the model
and gets smarter over time — while the model stays frozen.


What the cognitive layer actually does

AuraSDK builds a 5-layer structure from whatever the model and users store:

Record → Belief → Concept → Causal → Policy

Each layer is derived from the one below it — no LLM calls, no embeddings, everything computed locally:

  • Record: raw stored fact with trust score, provenance, decay rate
  • Belief: competing hypotheses about the same claim, epistemically weighted
  • Concept: stable abstractions over repeated beliefs
  • Causal: learned cause→effect patterns from co-occurring evidence
  • Policy: advisory hints — Prefer, Avoid, Warn — that emerge from causal structure

Nothing in layers 2–5 is hand-authored. They emerge from what's stored and observed over time.

from aura import Aura, Level

brain = Aura("./memory")
brain.enable_full_cognitive_stack()

brain.store("Staging deploy prevented 3 production incidents", tags=["deploy"])
brain.store("Direct prod deploy caused outage in Q3", tags=["deploy"])

# After maintenance:
hints = brain.get_surfaced_policy_hints()
# → [{"action": "Prefer", "domain": "deploy", "description": "staging before production"}]

That policy hint was not written by anyone. The causal layer found the pattern. The policy layer surfaced it.
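To make the causal→policy step concrete, here is a toy sketch of how advisory hints could emerge from evidence counting. This is an illustration of the idea, not AuraSDK's actual algorithm; `derive_policy_hints`, `observations`, and the threshold are all hypothetical:

```python
from collections import Counter

def derive_policy_hints(observations, threshold=2):
    # Tally outcome evidence per (domain, practice) pair; emit an
    # advisory hint only once evidence crosses the threshold.
    tally = Counter()
    for domain, practice, outcome in observations:
        tally[(domain, practice)] += 1 if outcome == "good" else -1
    hints = []
    for (domain, practice), score in tally.items():
        if score >= threshold:
            hints.append({"action": "Prefer", "domain": domain, "description": practice})
        elif score <= -threshold:
            hints.append({"action": "Avoid", "domain": domain, "description": practice})
    return hints

observations = [
    ("deploy", "staging before production", "good"),
    ("deploy", "staging before production", "good"),
    ("deploy", "direct prod deploy", "bad"),
    ("deploy", "direct prod deploy", "bad"),
]
print(derive_policy_hints(observations))
```

The point of the sketch: no human writes "Prefer staging before production" — it falls out of accumulated outcome evidence once the pattern is strong enough.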


v1.5.4: the three things that were missing

1. The substrate now learns from the model's own output

Before v1.5.4, the cognitive layer only knew what you explicitly stored. Now it observes model responses and updates itself.

Claims are extracted. Confirmations strengthen existing beliefs. Contradictions raise volatility. The substrate evolves from inference — without retraining, without an external API.

capture = brain.capture_experience(
    prompt="How should we handle this deploy?",
    retrieved_context=context_ids,
    model_response="Always verify staging health checks before pushing to production.",
    source="model_inference",
)
brain.ingest_experience_batch([capture])
brain.run_maintenance()
# cognitive layer updated — next recall is different

Safety bounds (non-negotiable):

  • Generated claims capped at 0.70 confidence — cannot overwrite recorded facts
  • PlasticityMode::Off by default — nothing changes without explicit opt-in
  • Every mutation writes to an audit trail traceable to the prompt that caused it
  • purge_inference_records() — clean rollback when needed
  • freeze_namespace_plasticity("medical") — some domains must never adapt from inference

Recorded facts always win over model inference. Always.
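The capped-confidence rule can be sketched in a few lines of plain Python. This is an illustrative model of the policy described above, not AuraSDK's internals; the `Claim` type and `merge_claim` function are hypothetical names:

```python
from dataclasses import dataclass

INFERENCE_CONFIDENCE_CAP = 0.70  # generated claims can never exceed this

@dataclass
class Claim:
    text: str
    confidence: float
    source: str  # "recorded" or "model_inference"

def merge_claim(existing: Claim, incoming: Claim) -> Claim:
    # Inference-derived claims are capped, and a recorded fact
    # can never be displaced by model output.
    if incoming.source == "model_inference":
        incoming.confidence = min(incoming.confidence, INFERENCE_CONFIDENCE_CAP)
        if existing.source == "recorded":
            return existing  # recorded facts always win
    # Otherwise keep whichever claim carries higher confidence.
    return incoming if incoming.confidence > existing.confidence else existing

recorded = Claim("staging deploys are mandatory", 0.60, "recorded")
inferred = Claim("staging deploys are optional", 0.95, "model_inference")
print(merge_claim(recorded, inferred).source)  # → recorded
```

Even a high-confidence model claim (0.95 here) gets clamped to 0.70 and still loses to a lower-confidence recorded fact — that asymmetry is the safety bound.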

2. The substrate now knows what matters

High-frequency recall and high-significance are not the same thing. A trivial fact mentioned 20 times should not outrank a critical decision mentioned once.

v1.5.4 adds salience weighting:

brain.mark_record_salience(record_id, salience=0.9)
# → this record resists decay, ranks higher, gets preserved longer

Maintenance now also produces bounded reflection summaries: recurring blockers, unresolved tensions, patterns that keep appearing. Not "feelings" — structured synthesis from what's actually stored.
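One way to picture salience-weighted decay: high salience stretches a record's effective half-life, so significant records fade more slowly than trivial ones. A minimal sketch, assuming exponential decay — the `decayed_score` function and its constants are hypothetical, not AuraSDK's actual decay model:

```python
def decayed_score(base_score: float, age_days: float,
                  salience: float, half_life_days: float = 30.0) -> float:
    # Salience in [0, 1] stretches the effective half-life up to 5x,
    # so a high-salience record resists decay.
    effective_half_life = half_life_days * (1.0 + 4.0 * salience)
    return base_score * 0.5 ** (age_days / effective_half_life)

# A trivial record (salience 0.1) vs a critical one (salience 0.9),
# both 60 days old: the critical record retains far more weight.
print(decayed_score(1.0, 60, 0.1))
print(decayed_score(1.0, 60, 0.9))
```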

3. Contradictions are now first-class, not silently averaged

Before: conflicting evidence was weighted and averaged. The conflict was invisible.

Now:

clusters = brain.get_contradiction_clusters()
queue = brain.get_contradiction_review_queue()

Recall explanations carry explicit markers: "this recommendation depends on unresolved evidence." The operator sees the friction. The user can be told honestly.
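The core idea — surface conflicts instead of averaging them away — can be shown with a toy clustering function. Purely illustrative; `contradiction_clusters` and the claim tuples are hypothetical, not the library's data model:

```python
from collections import defaultdict

def contradiction_clusters(claims):
    # Group claims by topic; flag a cluster as contradictory when it
    # contains both supporting and opposing stances, rather than
    # collapsing them into a single averaged score.
    by_topic = defaultdict(list)
    for topic, stance, text in claims:
        by_topic[topic].append((stance, text))
    return {
        topic: entries
        for topic, entries in by_topic.items()
        if {s for s, _ in entries} >= {"pro", "con"}
    }

claims = [
    ("deploy", "pro", "Staging deploys prevented incidents"),
    ("deploy", "con", "Staging slowed the Q3 hotfix by two days"),
    ("logging", "pro", "Structured logs cut triage time"),
]
print(list(contradiction_clusters(claims)))  # → ['deploy']
```

Averaging would have blended the two deploy claims into one muddy score; clustering keeps the tension visible so it can be reviewed.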


What else ships in v1.5.4

Concept persistence — concepts used to reset on every restart. Now they survive, so the 5-layer stack stays intact across sessions.

Belief reranking active by default — in v1.5.3, BeliefRerankMode::Off was the default. The cognitive stack was engineered but not running. Now it runs.

Production integrity — startup validation, persistence manifest, concept partition cap for large corpora.


Explainability is built in

Every recall decision is traceable:

explanation = brain.explain_recall("deployment decision")
# → which records matched, why, what belief groups they belong to,
#   what salience contributed, whether unresolved evidence is present

chain = brain.provenance_chain(record_id)
# → full trace from policy hint back to source records

This is not logging. It is structural explainability derived from the cognitive layer itself.


Performance

Benchmarked on 1,000 records, Windows 10 / Ryzen 7:

Operation          Latency    vs Mem0
Store              0.09 ms    ~same
Recall             0.74 ms    ~270× faster
Recall (cached)    0.48 µs    ~400,000× faster
Maintenance        1.1 ms     no equivalent

Mem0 recall requires an embedding API call (~200ms+). AuraSDK recall is pure local computation. No embeddings required. No external service.


The positioning in one sentence

AuraSDK is not a vector database. Not a RAG wrapper. Not a fine-tuning platform. Not a generic agent framework.

It is a governable cognitive substrate for frozen AI models — the layer that makes them smarter, more consistent, and more explainable over time, without touching their weights.


Try it

pip install aura-memory

Try in browser (no install): Open in Colab

GitHub: teolex2020/AuraSDK — MIT license, patent pending (US 63/969,703)

Built in Kyiv, Ukraine 🇺🇦


What would you build with a model that actually accumulates structured experience over time?
