I Built a Long-Term Memory Engine for AI Coding Agents in Rust

#ai #machinelearning #rust #opensource

AI coding agents have a weird problem.
They fail the same way repeatedly.
You fix a bug once.
The next session forgets it existed.
A few hours later the agent repeats the exact same mistake.

Wrong float handling.
Broken retry logic.
Reintroducing old bugs.
Ignoring project-specific constraints.

That pushed me into building Memorie.
A local-first semantic memory engine for AI coding agents.

Not a chatbot history wrapper.
Not another “AI memory platform.”
Just an experiment around one question:

What would it take for coding agents to actually learn from previous work?
The mistake I made first
The first implementation was basically:

Store conversations
Embed them
Retrieve similar chunks later

It worked surprisingly well for small demos.
Then it slowly collapsed.
The memory store became full of:

duplicated ideas
low-value observations
outdated fixes
contradictory advice
random noise from failed attempts

Retrieval quality degraded over time because the system had no concept of memory quality.
That changed the entire direction of the project.
I stopped thinking about “storage.”
I started thinking about memory governance.
The core is written in Rust.
Current stack:

SQLite persistence
ONNX embeddings
HNSW search
BLAKE3 deduplication
Python bindings through ctypes
CFFI layer

The system stays fully local-first.
No external vector database.
No hosted APIs inside the core loop.
Everything lives in a single SQLite-backed memory system.
The retrieval pipeline roughly works like this:

query
→ embedding
→ similarity search
→ trust weighting
→ contradiction filtering
→ ranking
→ final memories

Each memory tracks:
trust
uncertainty
reinforcement history
failure count
recency
importance

The goal is not “remember everything.”
The goal is:
remember what consistently helps.

One design choice that mattered a lot
I originally planned to use ANN search immediately.
Bad idea.
For smaller datasets, linear scans were simpler and often fast enough.
So the current system uses:

linear scan under ~500 memories
HNSW above that threshold

That removed complexity early and made debugging easier.
I think too many infrastructure projects optimize for scale before correctness.
Reinforcement is harder than retrieval
This became the real problem.

Suppose an agent retrieves 5 memories and succeeds.
Which memory actually contributed to the success?
A lot of systems reinforce everything equally.
That creates long-term garbage accumulation.

So Memorie uses attribution gates before reinforcement:
semantic overlap checks
cosine similarity thresholds
usage matching

Still imperfect.
But enough to stop obvious false reinforcement loops.
Contradictions get ugly fast
Eventually memories start disagreeing.
Example:

Memory A:
“Always retry failed requests.”

Memory B:
“Never retry this endpoint because duplicate execution corrupts state.”

Both can be valid depending on context.
Right now contradiction handling uses:

semantic similarity
polarity detection
quality-weighted conflict resolution

Lower-trust memories get archived instead of immediately deleted.
Honestly, this part still feels very unsolved.
The part that changed my thinking
I started this project assuming memory retrieval was the hard problem.

Now I think the real challenge is:
how systems decide what deserves to survive.
Most memory systems focus on retrieval speed.
I think long-term usefulness matters more.
Without suppression, decay, and conflict handling, memory systems slowly poison themselves.

Current state
The project is still experimental and not production ready.
But it already includes:

Rust core library
SQLite persistence
semantic retrieval
trust scoring
reinforcement + penalty loops
contradiction resolution
Python bindings
MCP server support

Repo:
GITHUB

I would genuinely appreciate criticism from people working on:

AI agents
retrieval systems
vector databases
reinforcement systems
memory architectures

Especially around where this design breaks at scale.