Last year, I spent two weeks debugging why my robot kept repeating the same mistake.
Not a code bug. Not a hardware failure. The robot knew what it had done wrong the last time. I could see it in the logs. It had stored the error. It just didn't... use that knowledge when the same situation came up again.
That's when I realized I had been solving the wrong problem.
The Log-File Mental Model
Most agent memory systems I've seen follow the same pattern:
- Agent does something
- Store a text description of what happened
- Later, embed it and retrieve it with semantic search
- Inject retrieved context into the next prompt
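The steps above can be sketched in a few lines of Rust. This is a toy version: the hashed bag-of-words here stands in for a real embedding model, and the names are mine, not any particular framework's.

```rust
// Toy "embedding": hash each word into a fixed-size count vector.
// A real system would call an embedding model here instead.
fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 128];
    for word in text.split_whitespace() {
        let mut h: usize = 0;
        for b in word.to_lowercase().bytes() {
            h = h.wrapping_mul(31).wrapping_add(b as usize);
        }
        v[h % 128] += 1.0;
    }
    v
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// The log-file pattern: store text notes, retrieve the closest one later.
struct LogFileMemory {
    entries: Vec<(String, Vec<f32>)>,
}

impl LogFileMemory {
    fn store(&mut self, note: &str) {
        self.entries.push((note.to_string(), embed(note)));
    }

    fn retrieve(&self, query: &str) -> Option<&str> {
        let q = embed(query);
        self.entries
            .iter()
            .max_by(|a, b| cosine(&q, &a.1).total_cmp(&cosine(&q, &b.1)))
            .map(|(note, _)| note.as_str())
    }
}
```

Twenty minutes of work, and it genuinely does retrieve the right note for the right query. That's what makes the pattern so seductive.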
It's elegant. It works well for conversational AI — the kind that lives in a chat window and helps you write emails.
But I'm building robots. And the log-file model breaks in ways that aren't obvious until your robot crashes into the same wall for the third time.
Here's why.
What Goes Wrong on the Edge
A robot's environment produces data that doesn't fit in a text string.
When my drone hit an airflow problem near a building edge, what it experienced was:
- A 47ms IMU reading spike (accelerometer Z-axis: +3.8g)
- A camera frame showing a glass surface at 0.4m
- A motor throttle log
- A GPS coordinate
- A vibration frequency pattern
I stored a text note: "Building edge caused unexpected turbulence, compensated with throttle adjustment."
Two weeks later, same building edge. The agent retrieved the text note. The text said "throttle adjustment." The agent adjusted throttle. It still struggled, because the actual recovery wasn't about throttle — it was about yaw correction combined with a specific altitude hold. The text summary had lost the operational precision.
This is the binding problem. The memory exists. The retrieval works. But the stored representation can't carry the real-world nuance.
What Memory Actually Needs to Do (For Agents That Act)
After a lot of iteration, I've landed on a different model:
Memory for acting agents is not a recall system. It's a structured experience index.
That means:
- Multi-modal by default. An experience in the physical world involves sensor readings, visual data, timing, and spatial context — not just a text description of what happened.
- Queryable by context, not just keywords. "What did I do last time the accelerometer reading was above 3g near a glass surface?" is a different query than "what happened near buildings?"
- Lightweight enough to run on the device. A Raspberry Pi can't afford 500ms vector search round-trips mid-flight.
- Persistent across power cycles. Edge AI devices reboot. Memory has to survive that.
None of the existing options — SQLite, Redis, Chroma, even small embedded vector stores — were designed for this combination.
What I Built Instead
I spent about four months building moteDB, a Rust-native embedded database specifically for this use case.
The core design decisions:
1. Multi-modal storage as a first-class concept.
A "record" can contain a vector, a binary blob, structured fields, and a timestamp — not an ORM abstraction on top of text columns. When the drone stores an experience, it stores all modalities together, not in separate tables that need joining later.
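As a rough sketch of the idea — the field names below are illustrative, not moteDB's actual schema:

```rust
// Illustrative shape of a multi-modal record: every modality lives in
// one unit, instead of in separate tables joined at query time.
// (Field names are mine, not moteDB's real API.)
struct Experience {
    timestamp_ms: u64,          // when it happened
    embedding: Vec<f32>,        // vector for similarity search
    frame: Vec<u8>,             // raw binary blob, e.g. a camera frame
    fields: Vec<(String, f64)>, // structured sensor readings
}

fn store_incident() -> Experience {
    Experience {
        timestamp_ms: 1_700_000_000_000,
        embedding: vec![0.12, -0.40, 0.88],
        frame: vec![0xFF, 0xD8, 0xFF], // JPEG magic bytes as a stand-in
        fields: vec![
            ("accel_z_g".to_string(), 3.8),
            ("range_m".to_string(), 0.4),
        ],
    }
}
```

The point is that the IMU spike, the camera frame, and the embedding are one record with one timestamp, so nothing has to be reassembled later.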
2. No runtime dependencies.
`cargo add motedb` — that's it. It runs in the same process as your agent. No daemon, no network round-trip, no container to manage.
3. Designed for embedded constraints.
Memory budget is explicit. Old records get evicted by policy. It doesn't assume you have 32GB of RAM or SSD storage.
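A minimal version of that eviction idea, assuming a byte budget and an oldest-first policy (the real policy is configurable; this is just the shape):

```rust
use std::collections::VecDeque;

// Budget-based eviction sketch: the store tracks its own byte budget
// and drops the oldest records when a new one wouldn't fit.
struct BoundedStore {
    budget_bytes: usize,
    used_bytes: usize,
    records: VecDeque<Vec<u8>>,
}

impl BoundedStore {
    fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, records: VecDeque::new() }
    }

    fn insert(&mut self, record: Vec<u8>) {
        // Evict oldest-first until the new record fits.
        while self.used_bytes + record.len() > self.budget_bytes {
            match self.records.pop_front() {
                Some(old) => self.used_bytes -= old.len(),
                None => break, // record is larger than the whole budget
            }
        }
        if record.len() <= self.budget_bytes {
            self.used_bytes += record.len();
            self.records.push_back(record);
        }
    }
}
```

On a Pi, making the budget explicit like this matters more than any clever index: the failure mode you're avoiding is the OOM killer, not a slow query.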
4. Query by structured context.
You can ask: "Give me the 3 most similar past experiences where sensor reading X was above threshold Y and the outcome was Z." That's a spatial + vector + structured query — all in one call.
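Conceptually, that query is "filter on structured fields, rank survivors by vector similarity, take the top k." A self-contained sketch of that logic — not moteDB's actual query API:

```rust
// One record per past experience, with structured fields plus a vector.
struct Record {
    accel_z_g: f64,
    outcome_ok: bool,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Structured filter first, then vector ranking, then top-k: one call.
fn top_k_similar<'a>(
    records: &'a [Record],
    query: &[f32],
    min_accel: f64,
    want_ok: bool,
    k: usize,
) -> Vec<&'a Record> {
    let mut hits: Vec<&Record> = records
        .iter()
        .filter(|r| r.accel_z_g > min_accel && r.outcome_ok == want_ok)
        .collect();
    // Sort descending by similarity to the query vector.
    hits.sort_by(|a, b| {
        cosine(query, &b.embedding).total_cmp(&cosine(query, &a.embedding))
    });
    hits.truncate(k);
    hits
}
```

Doing the structured filter before the vector ranking is the whole trick: on-device, you can't afford to brute-force similarity over every record, so the cheap predicate shrinks the candidate set first.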
Does It Actually Help?
The real test: same building edge, three months after I started using moteDB.
The drone recalled the actual IMU trace from the previous incident — not a text summary of it. Its flight controller could compare the current sensor pattern directly to the stored pattern. Yaw correction happened 400ms earlier than it had previously.
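The comparison itself doesn't need to be fancy. A toy version of matching a live IMU window against a stored trace — the threshold and function names are made up for illustration:

```rust
// Euclidean distance between two fixed-length sensor windows.
fn trace_distance(stored: &[f32], live: &[f32]) -> f32 {
    stored
        .iter()
        .zip(live)
        .map(|(a, b)| (a - b) * (a - b))
        .sum::<f32>()
        .sqrt()
}

// Flag a match when the live pattern is close to a stored incident,
// so the controller can react before the situation fully develops.
fn matches_incident(stored: &[f32], live: &[f32], threshold: f32) -> bool {
    stored.len() == live.len() && trace_distance(stored, live) < threshold
}
```

Because the stored trace is raw numbers rather than a sentence, this check runs in microseconds and fires on the leading edge of the pattern, which is where the 400ms came from.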
No crash.
More importantly: I didn't have to rewrite the memory system every time the robot encountered a new type of sensor data. The schema is flexible. Text, vectors, binary — it stores what the agent actually experiences.
The Broader Point
The "agent memory = semantic text search" assumption works for chat-based agents because their world is text. But if you're building agents that act in physical or structured-data environments, the mismatch compounds fast.
The memory system needs to match the modality of the environment, not just the modality of the LLM interface.
I'm still iterating on this. The drone problem is mostly solved. The harder one is multi-agent memory sharing — when two robots have different experiences of the same environment, how do you merge those sensibly?
Working on it.
If you're building agents that operate in physical or data-heavy environments, I'd be curious what memory architecture you're using. Are you hitting the same log-file limitation, or have you found something that works well?
`cargo add motedb` if you want to experiment. GitHub: https://github.com/motedb/motedb