<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mote</title>
    <description>The latest articles on DEV Community by mote (@motedb).</description>
    <link>https://dev.to/motedb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3796371%2Fa075a83c-f1f4-41e4-ab40-7b42a4fe6565.png</url>
      <title>DEV Community: mote</title>
      <link>https://dev.to/motedb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/motedb"/>
    <language>en</language>
    <item>
      <title>I Burned 3 Weeks Tuning Vector Search Before Realizing the Problem Was the Index, Not the Algorithm</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Thu, 07 May 2026 23:52:07 +0000</pubDate>
      <link>https://dev.to/motedb/i-burned-3-weeks-tuning-vector-search-before-realizing-the-problem-was-the-index-not-the-algorithm-1bpb</link>
      <guid>https://dev.to/motedb/i-burned-3-weeks-tuning-vector-search-before-realizing-the-problem-was-the-index-not-the-algorithm-1bpb</guid>
      <description>&lt;p&gt;I was getting 200ms latency on vector search with only 50,000 embeddings. For a drone that needs to recognize objects in &amp;lt;50ms, that's not a database — that's a liability.&lt;/p&gt;

&lt;p&gt;So I did what any reasonable developer would do. I spent 3 weeks tuning HNSW parameters. &lt;code&gt;ef_search&lt;/code&gt;, &lt;code&gt;M&lt;/code&gt;, &lt;code&gt;ef_construction&lt;/code&gt; — I tried every combination. I switched to IVF. I tried PQ (product quantization). I even implemented a custom filtering layer to skip low-score candidates early.&lt;/p&gt;

&lt;p&gt;Nothing moved the needle. 180ms. 190ms. 210ms if the CPU was busy with sensor fusion.&lt;/p&gt;

&lt;p&gt;Then I realized the problem wasn't the search algorithm. It was the index structure itself — and the fact that I was treating an embedded database like a server database.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: Vector Search on a Drone
&lt;/h2&gt;

&lt;p&gt;I'm building moteDB, an embedded multi-modal database for edge AI. The use case: a drone needs to store and query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector embeddings&lt;/strong&gt; (image patches, for object re-identification)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-series data&lt;/strong&gt; (telemetry: altitude, GPS, battery)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State&lt;/strong&gt; (mission waypoints, current task)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All on a Raspberry Pi 4 with 8GB RAM and a heatsink that's doing its best.&lt;/p&gt;

&lt;p&gt;The vector search workload: given a query image, find the top-5 most similar patches from the last 10 minutes of flight. This is for visual odometry — if the drone loses GPS, it needs to recognize where it's been.&lt;/p&gt;

&lt;p&gt;With 50,000 embeddings (128-dimensional, float32), a brute-force search takes ~8ms on the Pi 4. That's actually fine. But I wanted to support 500,000+ embeddings (for longer missions), so I needed an index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Week 1: HNSW Tuning Hell
&lt;/h2&gt;

&lt;p&gt;I started with HNSW (Hierarchical Navigable Small World), the go-to algorithm for vector search. Rust libraries like &lt;code&gt;hnsw_rs&lt;/code&gt; implement it, and engines like Qdrant build on it. Seemed like the right choice.&lt;/p&gt;

&lt;p&gt;My first benchmark: 200ms for a single query. That's unacceptable for a drone that needs to make control decisions at 50Hz.&lt;/p&gt;

&lt;p&gt;So I did what the internet told me to do — I tuned parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;M=16, ef_construction=200&lt;/code&gt;: 200ms&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;M=32, ef_construction=400&lt;/code&gt;: 180ms, but 3x larger index&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;M=8, ef_construction=100&lt;/code&gt;: 220ms, smaller index but slower queries&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ef_search=50&lt;/code&gt;: faster (150ms) but recall dropped to 85%&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ef_search=200&lt;/code&gt;: slower (250ms) but 98% recall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No matter what I did, I couldn't get below 150ms with &amp;gt;95% recall. And that's for a single query — in production, the drone needs to run multiple queries concurrently (object detection + visual odometry + geofence checking).&lt;/p&gt;
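
&lt;p&gt;For reference, here's the shape of what I was sweeping. A hypothetical config struct, not any particular library's API; the names vary, but every HNSW implementation exposes these three knobs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Hypothetical HNSW config. Field names vary by library, but these
// are the three knobs every implementation exposes.
struct HnswConfig {
    /// Max edges per node. Higher = better recall, bigger graph.
    m: usize,               // I swept 8..=32
    /// Build-time candidate list. Higher = better graph, slower inserts.
    ef_construction: usize, // I swept 100..=400
    /// Query-time candidate list. Higher = better recall, slower queries.
    ef_search: usize,       // I swept 50..=200
}

impl Default for HnswConfig {
    fn default() -&amp;gt; Self {
        Self { m: 16, ef_construction: 200, ef_search: 100 }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;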

&lt;h2&gt;
  
  
  Week 2: Trying Other Algorithms
&lt;/h2&gt;

&lt;p&gt;By this point I'd sunk a week into making HNSW work, but I was starting to question the choice. So I benchmarked other algorithms:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IVF (Inverted File Index)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Fast query if you get the &lt;code&gt;nprobe&lt;/code&gt; right&lt;/li&gt;
&lt;li&gt;Con: Needs a training pass, and the clustering degrades as embeddings are added dynamically (which happens continuously on a drone in real time)&lt;/li&gt;
&lt;li&gt;Result: 120ms with &lt;code&gt;nprobe=32&lt;/code&gt;, but recall was inconsistent (80-95% depending on data distribution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PQ (Product Quantization)&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Compresses embeddings, less memory bandwidth&lt;/li&gt;
&lt;li&gt;Con: Lossy compression, and the quantization error is unpredictable&lt;/li&gt;
&lt;li&gt;Result: 90ms with 8-bit PQ, but recall dropped to 75% — unacceptable for visual odometry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Brute-force with SIMD&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Perfect recall, and &lt;code&gt;f32x4&lt;/code&gt; SIMD helps&lt;/li&gt;
&lt;li&gt;Con: O(n) scan, doesn't scale&lt;/li&gt;
&lt;li&gt;Result: 8ms for 50K vectors, but 80ms for 500K — and that's just the vector search, not including the time-series or state queries&lt;/li&gt;
&lt;/ul&gt;
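
&lt;p&gt;Here's the kind of kernel I mean, as a minimal sketch in plain Rust. The four-wide accumulator mirrors what &lt;code&gt;f32x4&lt;/code&gt; does explicitly, and &lt;code&gt;top_k&lt;/code&gt; is illustrative rather than moteDB's actual code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Squared L2 distance, four lanes at a time. The 4-wide accumulator
/// mirrors what f32x4 does explicitly; rustc will happily vectorize it.
fn l2_sq(a: &amp;amp;[f32], b: &amp;amp;[f32]) -&amp;gt; f32 {
    debug_assert_eq!(a.len(), b.len());
    let n4 = a.len() / 4 * 4;
    let mut acc = [0.0f32; 4];
    for (ca, cb) in a[..n4].chunks_exact(4).zip(b[..n4].chunks_exact(4)) {
        for i in 0..4 {
            let d = ca[i] - cb[i];
            acc[i] += d * d;
        }
    }
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a[n4..].iter().zip(&amp;amp;b[n4..]) {
        sum += (x - y) * (x - y);
    }
    sum
}

/// Brute force: score everything, keep the best k. O(n), perfect recall.
fn top_k(query: &amp;amp;[f32], corpus: &amp;amp;[Vec&amp;lt;f32&amp;gt;], k: usize) -&amp;gt; Vec&amp;lt;(usize, f32)&amp;gt; {
    let mut scored: Vec&amp;lt;(usize, f32)&amp;gt; = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_sq(query, v)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&amp;amp;b.1).unwrap());
    scored.truncate(k);
    scored
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;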

&lt;h2&gt;
  
  
  Week 3: The Realization
&lt;/h2&gt;

&lt;p&gt;I was staring at &lt;code&gt;perf top&lt;/code&gt; output for the 100th time when I noticed something. The CPU wasn't spending time in the HNSW graph traversal (which is what I was optimizing). It was spending time in &lt;strong&gt;page cache miss handling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every time I queried the HNSW index, the Pi had to pull graph nodes from RAM (or worse, from swap on the microSD card). The HNSW graph was ~200MB for 500K vectors, and it was randomly accessed — terrible for cache locality.&lt;/p&gt;

&lt;p&gt;The problem wasn't the algorithm. It was the &lt;strong&gt;index access pattern&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Did Wrong
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;I treated the index like a server-side structure&lt;/strong&gt;. On a server with 64GB RAM and NVMe SSD, a 200MB randomly-accessed index is fine. The page cache handles it. On a Pi with 8GB RAM (and other processes using most of it), that same index causes page faults on every query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;I didn't account for concurrent queries&lt;/strong&gt;. HNSW is fast for a single query, but when you run 3-4 queries concurrently, they compete for memory bandwidth. The Pi 4's memory controller is not designed for this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;I was storing vectors alongside the graph&lt;/strong&gt;. Every graph node stored the full 128-dimensional vector (512 bytes). That's 256MB of vectors for 500K entries, plus the graph structure. Too much for the Pi's memory.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Fix: Embedded-Aware Index Design
&lt;/h2&gt;

&lt;p&gt;I realized I needed to redesign the index for embedded constraints:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Partitioned Storage
&lt;/h3&gt;

&lt;p&gt;Instead of one global HNSW graph, I partitioned vectors by time window (10-minute buckets). Each bucket has its own small HNSW graph (~5MB for 5K vectors). Queries search the most recent N buckets (usually 3-5 for visual odometry).&lt;/p&gt;

&lt;p&gt;This fixed the cache locality problem — the active bucket's graph stays fully resident in RAM, and its hot traversal paths are small enough to stay warm in the Pi's cache instead of faulting on every hop.&lt;/p&gt;
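
&lt;p&gt;A minimal sketch of the layout, with &lt;code&gt;BucketIndex&lt;/code&gt; as a stand-in for whatever per-bucket graph you use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Stand-in for the per-bucket ANN graph, stubbed so the sketch compiles.
struct BucketIndex;
impl BucketIndex {
    fn top_k(&amp;amp;self, _query: &amp;amp;[f32], _k: usize) -&amp;gt; Vec&amp;lt;(u64, f32)&amp;gt; { Vec::new() }
}

const WINDOW_SECS: u64 = 600; // 10-minute buckets

struct Bucket {
    start_ts: u64,      // covers [start_ts, start_ts + WINDOW_SECS)
    index: BucketIndex, // ~5K vectors, its own small graph
}

struct PartitionedIndex {
    buckets: Vec&amp;lt;Bucket&amp;gt;, // ordered oldest to newest
}

impl PartitionedIndex {
    /// Search only the most recent `n` buckets, then merge.
    fn search_recent(&amp;amp;self, query: &amp;amp;[f32], k: usize, n: usize) -&amp;gt; Vec&amp;lt;(u64, f32)&amp;gt; {
        let mut hits: Vec&amp;lt;(u64, f32)&amp;gt; = self
            .buckets
            .iter()
            .rev()
            .take(n) // usually 3-5 buckets for visual odometry
            .flat_map(|b| b.index.top_k(query, k))
            .collect();
        hits.sort_by(|a, b| a.1.partial_cmp(&amp;amp;b.1).unwrap());
        hits.truncate(k);
        hits
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;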

&lt;h3&gt;
  
  
  2. Vector Separation
&lt;/h3&gt;

&lt;p&gt;I separated the graph structure (which needs random access) from the vector data (which is only accessed when a candidate is promising). The graph stores only vector IDs and distances; the actual vectors are stored sequentially and accessed only for final re-ranking.&lt;/p&gt;

&lt;p&gt;This cut memory usage by 3x.&lt;/p&gt;
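
&lt;p&gt;Sketched, under the assumption of a flat &lt;code&gt;Vec&amp;lt;f32&amp;gt;&lt;/code&gt; arena indexed by ID (the real layout may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// The graph side only ever sees u32 IDs; full vectors live in one flat,
// sequentially-laid-out arena, touched only for the final exact re-rank.
struct VectorArena {
    dim: usize,
    data: Vec&amp;lt;f32&amp;gt;, // vector `id` occupies data[id*dim .. (id+1)*dim]
}

impl VectorArena {
    fn get(&amp;amp;self, id: u32) -&amp;gt; &amp;amp;[f32] {
        let start = id as usize * self.dim;
        &amp;amp;self.data[start..start + self.dim]
    }

    /// Exact re-rank of the approximate candidates the graph produced.
    fn rerank(&amp;amp;self, query: &amp;amp;[f32], candidates: &amp;amp;[u32], k: usize) -&amp;gt; Vec&amp;lt;(u32, f32)&amp;gt; {
        let mut scored: Vec&amp;lt;(u32, f32)&amp;gt; = candidates
            .iter()
            .map(|&amp;amp;id| {
                let v = self.get(id);
                let d: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
                (id, d)
            })
            .collect();
        scored.sort_by(|a, b| a.1.partial_cmp(&amp;amp;b.1).unwrap());
        scored.truncate(k);
        scored
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;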

&lt;h3&gt;
  
  
  3. Preallocated Memory Pool
&lt;/h3&gt;

&lt;p&gt;Instead of allocating graph nodes dynamically (which causes fragmentation and unpredictable page faults), I preallocate a memory pool at database initialization and pin it, so the Pi's kernel can't swap it out mid-query.&lt;/p&gt;
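
&lt;p&gt;A sketch of that pool, assuming &lt;code&gt;mlock(2)&lt;/code&gt; via the &lt;code&gt;libc&lt;/code&gt; crate as the pinning mechanism:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// One up-front allocation, pinned so the kernel can't page it out to
// the microSD. Using mlock(2) via the libc crate is my assumption here.
struct NodePool {
    buf: Vec&amp;lt;u8&amp;gt;,
    next: usize,
}

impl NodePool {
    fn new(bytes: usize) -&amp;gt; std::io::Result&amp;lt;Self&amp;gt; {
        let buf = vec![0u8; bytes];
        let ret = unsafe { libc::mlock(buf.as_ptr() as *const libc::c_void, buf.len()) };
        if ret != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(Self { buf, next: 0 })
    }

    /// Bump allocation: no free list, no fragmentation, O(1) per node.
    fn alloc(&amp;amp;mut self, size: usize) -&amp;gt; Option&amp;lt;&amp;amp;mut [u8]&amp;gt; {
        let end = self.next.checked_add(size)?;
        if end &amp;gt; self.buf.len() {
            return None;
        }
        let slot = &amp;amp;mut self.buf[self.next..end];
        self.next = end;
        Some(slot)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One caveat: the default &lt;code&gt;RLIMIT_MEMLOCK&lt;/code&gt; on most Linux setups is far smaller than a pool like this, so you'll likely need to raise the limit for the process.&lt;/p&gt;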

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before&lt;/strong&gt;: 200ms/query, 95% recall&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After&lt;/strong&gt;: 12ms/query, 97% recall&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt;: 60MB steady-state (instead of 200MB+ spiking)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;If you're building vector search for embedded/edge scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark on target hardware&lt;/strong&gt;. My initial benchmarks were on my MacBook Pro (M2, 32GB RAM). Everything looked great. On the Pi 4, it was a different story.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cache locality &amp;gt; Algorithm complexity&lt;/strong&gt;. An O(n) scan with good locality can outperform O(log n) with random access if your memory is constrained.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't copy server designs to embedded&lt;/strong&gt;. HNSW is great for server-side vector search (Qdrant, Weaviate). But for embedded, you need to think about memory access patterns first, algorithm second.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Profile before optimizing&lt;/strong&gt;. I wasted 2 weeks tuning HNSW parameters when the real bottleneck was page cache misses. &lt;code&gt;perf&lt;/code&gt;, &lt;code&gt;htop&lt;/code&gt;, and &lt;code&gt;/proc/meminfo&lt;/code&gt; are your friends.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The moteDB Approach
&lt;/h2&gt;

&lt;p&gt;This experience shaped how I'm building moteDB. It's not just "a vector database" — it's a vector database designed for the constraints of embedded hardware:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LSM-tree storage&lt;/strong&gt; (not B-tree) for non-blocking writes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioned indexes&lt;/strong&gt; for cache locality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preallocated memory pools&lt;/strong&gt; to avoid unpredictable allocations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-modal storage&lt;/strong&gt; (vectors + time-series + state in one engine) to avoid cross-process communication overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're working on edge AI and hitting performance walls with existing databases, I'd love to hear about your use case. The constraints are different from server-side AI, and the solutions need to be different too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm building moteDB, an open-source embedded multi-modal database for edge AI. It's 100% Rust, Apache 2.0 licensed. Check it out at &lt;a href="https://github.com/motedb" rel="noopener noreferrer"&gt;github.com/motedb&lt;/a&gt; — and if you're working on embodied AI or edge inference, I'd love to collaborate.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>The Announcement Google Cloud NEXT Made That Will Actually Change How Robots Work</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:40:54 +0000</pubDate>
      <link>https://dev.to/motedb/the-announcement-google-cloud-next-made-that-will-actually-change-how-robots-work-4p0i</link>
      <guid>https://dev.to/motedb/the-announcement-google-cloud-next-made-that-will-actually-change-how-robots-work-4p0i</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-cloud-next-2026-04-22"&gt;Google Cloud NEXT Writing Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Everyone's Fixated on the Wrong Thing
&lt;/h2&gt;

&lt;p&gt;Google Cloud NEXT '26 dropped, and the tech press spent 48 hours writing up the Gemini Enterprise Agent Platform, the Apple partnership, and TPU v8. All deserved coverage. But the announcement that will actually change how robots work in the real world barely made the headlines.&lt;/p&gt;

&lt;p&gt;It's called &lt;strong&gt;Agent Space&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Agent Space Actually Is
&lt;/h2&gt;

&lt;p&gt;Agent Space is Google's platform for deploying AI agents that interact with the physical world — not chatbots that answer questions, but agents that maintain persistent state in dynamic environments, process sensor data, and execute feedback-driven task loops. It's Google's answer to a simple question: what if AI agents didn't just live in data centers, but were embedded in the physical world?&lt;/p&gt;

&lt;p&gt;This is the embodied AI problem. And it's fundamentally different from the chatbot problem.&lt;/p&gt;

&lt;p&gt;Most AI coverage conflates "agent" with "LLM-powered chatbot." They're not the same thing. A chatbot takes text in, produces text out. A robot takes sensor data in, produces action out — and then the world changes based on that action, which feeds back as new sensor input. That's a feedback loop. Chatbots don't have feedback loops. Robots do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the Feedback Loop Changes Everything
&lt;/h2&gt;

&lt;p&gt;Here's what I've learned running AI on physical hardware: the hardest part isn't getting the model to reason. It's keeping a consistent model of the world as the world changes underneath you.&lt;/p&gt;

&lt;p&gt;Your robot moved. The map is stale. The arm reached but the object slipped. The gripper force reading is noisy. The last decision was right but the outcome was wrong because the world didn't cooperate.&lt;/p&gt;

&lt;p&gt;This is where cloud AI hits a wall. A robot running on cloud inference has latency you can't engineer around. A sensor reading arrives at time T. The query goes to the cloud. Inference runs. The command comes back at T + 150ms. Meanwhile the world moved. The faster the robot, the more useless cloud inference becomes.&lt;/p&gt;

&lt;p&gt;You need local state. You need the agent to reason about persistent, structured world models — not raw sensor dumps, but spatial facts, temporal sequences, causal relationships between actions and outcomes. And you need it at the speed of physics.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Nobody Is Writing About
&lt;/h2&gt;

&lt;p&gt;The Agent Space announcement is getting covered as "Google enters the AI agent platform race." That framing misses the interesting part. Google isn't just building another agent workflow platform — they're building infrastructure for agents that live in the real world.&lt;/p&gt;

&lt;p&gt;And if you're building agents that live in the real world, you're going to hit a wall that no amount of model improvement will solve: the data layer.&lt;/p&gt;

&lt;p&gt;The models can reason. What they can't do is efficiently store, query, and update structured representations of a changing world at the speed a robot needs. That's not a model problem. That's a database problem.&lt;/p&gt;

&lt;p&gt;I've spent two years building in this space. My drone ran cloud inference plus a flat file memory layer for the first six months. Every session felt like the robot was starting from scratch. The moment I moved to a local embedded database with structured schemas — spatial indices, temporal event logs, causal chains between actions and outcomes — the robot stopped repeating failures. Not because it got smarter. Because it finally had memory.&lt;/p&gt;
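
&lt;p&gt;To make "structured" concrete, the schemas looked roughly like this (a sketch of my own layout, not anything from Agent Space):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// A sketch of my own schema; nothing to do with Agent Space internals.
struct SpatialFact {
    object_id: u64,
    position: [f32; 3], // meters, map frame
    observed_at: u64,   // unix millis
}

struct ActionOutcome {
    action_id: u64,
    started_at: u64,
    ended_at: u64,
    succeeded: bool,
}

/// Causal chain: which observation triggered which action, and how it went.
struct CausalLink {
    cause_observation: u64, // SpatialFact key
    action: u64,            // ActionOutcome key
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;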

&lt;h2&gt;
  
  
  The Real Takeaway
&lt;/h2&gt;

&lt;p&gt;Cloud AI is extraordinary at reasoning about information. Agent Space is Google's acknowledgment that the next frontier is reasoning about the physical world. These are different problems, and they require different infrastructure.&lt;/p&gt;

&lt;p&gt;The models will keep getting better. The agents will keep getting more capable. But underneath it all, the robots that actually work in production won't be the ones with the biggest models. They'll be the ones with the best data infrastructure — structured, local, real-time, and built for the speed of the physical world.&lt;/p&gt;

&lt;p&gt;Agent Space is Google betting that this matters. I think they're right.&lt;/p&gt;

&lt;p&gt;(moteDB is building the storage layer for exactly this — Rust-native, embedded, multimodal. I'm obviously biased, but I also know the problem space. If you're building anything that touches the physical world with AI, I'd want to talk.)&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OpenClaw Gets Almost Everything Right — Except How It Remembers Things</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Sat, 25 Apr 2026 12:24:54 +0000</pubDate>
      <link>https://dev.to/motedb/openclaw-gets-almost-everything-right-except-how-it-remembers-things-4ofn</link>
      <guid>https://dev.to/motedb/openclaw-gets-almost-everything-right-except-how-it-remembers-things-4ofn</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;OpenClaw is genuinely impressive. Skill orchestration, MCP tool calling, autonomous agent loops — it handles all of that with less friction than anything I’ve tried. After running it on a drone project for a few weeks, I’ve come away convinced: this is what personal AI should feel like.&lt;/p&gt;

&lt;p&gt;And then the agent forgets why it woke up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here’s what happens in practice. Your OpenClaw agent starts a session, does useful work, stores some context. You come back the next day. The agent either has no memory of yesterday, or it has a raw transcript dump that it searches through like grep.&lt;/p&gt;

&lt;p&gt;It works — sort of. But for embodied AI, this is where things fall apart.&lt;/p&gt;

&lt;p&gt;On a robot, memory isn’t a nice-to-have. It’s physics. The agent needs to know: where was the last goal location, what obstacles appeared in the last 30 seconds, which action succeeded vs failed last time. Keyword search on a flat text dump doesn’t cut it. You need temporal queries (“did this happen in the last 5 minutes?”), spatial context (“was this object near the charger?”), and structured retrieval (“what was the last completed task?”).&lt;/p&gt;

&lt;p&gt;Most people handle this by building a RAG pipeline on top of OpenClaw. Vector embeddings, chunking strategies, similarity search. It works until you need actual structured data — and then you’re fighting your own architecture.&lt;/p&gt;

&lt;p&gt;I tried the RAG approach. Spent two days tuning chunk sizes and embedding models. The agent could find “that error from before” — most of the time. But it couldn’t answer “which task failed most recently before the restart?” That’s a one-line SQL query. Except there was no SQL database. Just a folder of markdown files.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;The moment that changed things for me: I gave my robot a proper embedded database. Not as a separate service — as a library it links against at startup.&lt;/p&gt;

&lt;p&gt;Suddenly the agent could write structured logs: task ID, timestamp, location, outcome. It could query “last 10 successful navigation events within 3 meters of current position.” It could do this in under 2ms because the database lives on the same device, no network hop.&lt;/p&gt;
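
&lt;p&gt;That query, sketched over a plain in-memory log. The field names are illustrative, not moteDB's actual API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Field names are illustrative, not moteDB's actual API.
struct NavEvent {
    task_id: u64,
    timestamp: u64,
    position: [f32; 2], // meters, map frame
    succeeded: bool,
}

/// "Last n successful navigation events within radius_m of here."
fn last_successful_nearby(log: &amp;amp;[NavEvent], here: [f32; 2], radius_m: f32, n: usize) -&amp;gt; Vec&amp;lt;&amp;amp;NavEvent&amp;gt; {
    log.iter()
        .rev() // the log is append-only, so newest events sit at the end
        .filter(|e| e.succeeded)
        .filter(|e| {
            let dx = e.position[0] - here[0];
            let dy = e.position[1] - here[1];
            (dx * dx + dy * dy).sqrt() &amp;lt;= radius_m
        })
        .take(n)
        .collect()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;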

&lt;p&gt;The robot stopped repeating the same failed navigation attempt. Not because it got smarter. Because it could finally remember.&lt;/p&gt;

&lt;p&gt;This isn’t a knock on OpenClaw. The memory problem isn’t unique to OpenClaw — every AI agent framework has it. What OpenClaw gets right is the agent loop. What’s missing is the data layer underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hot Take
&lt;/h2&gt;

&lt;p&gt;OpenClaw ships with a file-based memory because files are universally accessible. That’s a reasonable default. But “universally accessible” and “actually useful for structured reasoning” are different things.&lt;/p&gt;

&lt;p&gt;If you’re running OpenClaw on anything that has to reason about the real world — a robot, a sensor rig, a drone — you’re going to hit the file memory ceiling. The ceiling is low and it comes fast.&lt;/p&gt;

&lt;p&gt;The agents that will actually work in production aren’t the ones with better prompting. They’re the ones with better data infrastructure underneath. OpenClaw is building the brain. Someone has to build the hippocampus.&lt;/p&gt;

&lt;p&gt;(moteDB is trying to be that — a Rust-native embedded multimodal DB purpose-built for exactly this kind of agent memory. Full disclosure: I work on it. But the problem is real, and I don’t think files are the answer.)&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d Like to See
&lt;/h2&gt;

&lt;p&gt;OpenClaw already supports custom storage backends through MCP. That’s the right abstraction. What I’d love to see: a first-party (or blessed third-party) skill that lets agents use a structured embedded DB as memory instead of flat files.&lt;/p&gt;

&lt;p&gt;Until then: if your OpenClaw agent is running on hardware and acting confused about context, the problem probably isn’t the agent. It’s what it’s storing memories in.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>I Built a Database Engine in Rust for My Robot and Learned That SQLite Was the Wrong Battle</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Thu, 23 Apr 2026 00:14:30 +0000</pubDate>
      <link>https://dev.to/motedb/i-built-a-database-engine-in-rust-for-my-robot-and-learned-that-sqlite-was-the-wrong-battle-1mfh</link>
      <guid>https://dev.to/motedb/i-built-a-database-engine-in-rust-for-my-robot-and-learned-that-sqlite-was-the-wrong-battle-1mfh</guid>
      <description>&lt;p&gt;The robot started ignoring me on a Tuesday afternoon.&lt;/p&gt;

&lt;p&gt;Not dramatically — no sparks, no screaming servos. It just... stopped responding to voice commands. The fix required rebooting the onboard computer, which meant walking across the shop floor, finding the reset button, and losing twenty minutes of calibration data I'd spent all morning collecting.&lt;/p&gt;

&lt;p&gt;When I finally dug into the logs, I found the culprit: a corrupted SQLite database. The engine itself wasn't to blame — SQLite doesn't corrupt easily. The problem was that my robot's 15-second startup sequence included running 47 migration scripts from six different Python packages, all hitting the same 4MB database file on an SD card, all fighting over file locks.&lt;/p&gt;

&lt;p&gt;I didn't need a better database. I needed a database that wasn't SQLite.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Got Wrong About "Embedded" Databases
&lt;/h2&gt;

&lt;p&gt;My first instinct was to replace SQLite with something designed for embedded systems. I spent a week evaluating LMDB, RocksDB, and LevelDB. All of them are genuinely impressive pieces of engineering. None of them solved my problem.&lt;/p&gt;

&lt;p&gt;Here's what I got wrong: I was thinking about storage as a durability problem — make writes survive crashes, make reads fast, make the whole thing resilient. That's the right framing for a server. It's the wrong framing for a robot.&lt;/p&gt;

&lt;p&gt;A robot doesn't need a database that survives crashes. It needs a database that &lt;strong&gt;doesn't crash in the first place&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What a robot actually needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write latency that doesn't spike&lt;/strong&gt; — even 10ms write stalls cause visible servo jitter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-modal queries&lt;/strong&gt; — "find all camera frames where the left obstacle sensor triggered within the last 3 seconds, and give me the IMU readings at those timestamps"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero configuration&lt;/strong&gt; — there's no startup script on a robot. The database just has to work when power comes on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power-loss safe writes&lt;/strong&gt; — not crash recovery, just: power cuts mid-write, robot reboots, state is consistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't the same problem. And almost no embedded database solves all four simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  The MMAP Trap
&lt;/h2&gt;

&lt;p&gt;The most seductive optimization in embedded storage is memory-mapping the database file directly into your address space. Linux handles the page faults, your reads are essentially free, and the code looks elegant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;OpenOptions&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"robot.db"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mmap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nn"&gt;Mmap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works beautifully on a server. On a robot? It's a latency landmine.&lt;/p&gt;

&lt;p&gt;When your robot's sensor loop runs at 200Hz (every 5ms), a single page fault during a read stalls the entire loop. MMAP reads are fast &lt;em&gt;on average&lt;/em&gt;. They're unpredictable &lt;em&gt;in the worst case&lt;/em&gt;. And for a real-time control system, average means nothing.&lt;/p&gt;

&lt;p&gt;I benchmarked four databases on a Raspberry Pi 4 running my robot's sensor fusion workload:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Avg Read&lt;/th&gt;
&lt;th&gt;P99 Read&lt;/th&gt;
&lt;th&gt;Write Stall&lt;/th&gt;
&lt;th&gt;Startup Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQLite (WAL)&lt;/td&gt;
&lt;td&gt;0.4ms&lt;/td&gt;
&lt;td&gt;12ms&lt;/td&gt;
&lt;td&gt;23ms&lt;/td&gt;
&lt;td&gt;140ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LMDB&lt;/td&gt;
&lt;td&gt;0.2ms&lt;/td&gt;
&lt;td&gt;0.8ms&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;td&gt;8ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RocksDB&lt;/td&gt;
&lt;td&gt;0.3ms&lt;/td&gt;
&lt;td&gt;1.1ms&lt;/td&gt;
&lt;td&gt;2ms&lt;/td&gt;
&lt;td&gt;95ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moteDB&lt;/td&gt;
&lt;td&gt;0.15ms&lt;/td&gt;
&lt;td&gt;0.4ms&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;td&gt;3ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The P99 read latency is the number that matters. SQLite's 12ms P99 is a silent killer — it doesn't show up in averages, it just occasionally makes your robot hesitate for a moment that feels like a glitch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building Around the Access Pattern
&lt;/h2&gt;

&lt;p&gt;The breakthrough came when I stopped trying to build a general-purpose database and started building around how a robot actually accesses data.&lt;/p&gt;

&lt;p&gt;A robot's data access has a specific shape:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Recent data is hot&lt;/strong&gt; — the last 10 seconds of sensor readings are queried constantly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical data is cold but needs to be queryable&lt;/strong&gt; — "show me all manipulation attempts from yesterday"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured queries need to cross modalities&lt;/strong&gt; — "give me all frames where force &amp;gt; 2N and the gripper was closing"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution was a two-tier design that most people don't think about because it's not how servers work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Ring buffer for hot data — no fsync, no WAL, no locks&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;HotStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RingBuffer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SensorReading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// ~10s at 200Hz&lt;/span&gt;
    &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BTreeMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Append-only file for cold data — durable, queryable&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;ColdStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BufWriter&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;offset_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BTreeMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;u64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hot store never touches the filesystem during writes. Readings go into a lock-free ring buffer. Reads are direct memory accesses. Durability is deferred to the cold store — if power cuts, you lose at most the ~10 seconds of readings still sitting in the ring buffer, which for my robot is an acceptable tradeoff.&lt;/p&gt;

&lt;p&gt;The cold store is append-only. New readings get written to the end of a binary file. The file never gets overwritten or updated — only appended to. This makes fsync calls cheap: you're always writing to the end of the file, and the OS can batch them optimally.&lt;/p&gt;
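
&lt;p&gt;The append path looks roughly like this, assuming &lt;code&gt;Timestamp&lt;/code&gt; is a &lt;code&gt;u64&lt;/code&gt; and a simple length-prefixed frame (the real on-disk layout is the versioned format described below):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::io::{self, Seek, Write};

impl ColdStore {
    fn append(&amp;amp;mut self, ts: Timestamp, payload: &amp;amp;[u8]) -&amp;gt; io::Result&amp;lt;()&amp;gt; {
        // BufWriter&amp;lt;File&amp;gt; is Seek, so we can ask where this record lands.
        // (stream_position flushes the buffer, which is cheap when you only
        // ever write at the end.)
        let offset = self.file.stream_position()?;
        self.file.write_all(&amp;amp;ts.to_le_bytes())?;
        self.file.write_all(&amp;amp;(payload.len() as u32).to_le_bytes())?;
        self.file.write_all(payload)?;
        self.offset_index.insert(ts, offset);
        Ok(())
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;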




&lt;h2&gt;
  
  
  The Cross-Modal Query Problem
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting. The query "find all camera frames where the force sensor exceeded 2N in the last 5 seconds" sounds simple. It's not.&lt;/p&gt;

&lt;p&gt;The naive approach is to scan all readings and filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;hot_store&lt;/span&gt;&lt;span class="nf"&gt;.iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="py"&gt;.timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="nf"&gt;.seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
       &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="py"&gt;.force&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;
       &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="py"&gt;.modality&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Camera&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works. It's also O(n) and blocks for 10ms+ on large result sets.&lt;/p&gt;

&lt;p&gt;The better approach is to build a time-indexed data structure that lets you skip irrelevant data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Each modality maintains its own index keyed by timestamp&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;MultiModalIndex&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;force&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BTreeMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Offset&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;camera&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BTreeMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Offset&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;imu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BTreeMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Offset&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Range query that jumps directly to relevant data&lt;/span&gt;
&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;time_range&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Range&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Timestamp&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;modalities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Modality&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nf"&gt;Fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;SensorReading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;SensorReading&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Vec&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;modality&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;modalities&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;start_offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;modality&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="nf"&gt;.range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time_range&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="nf"&gt;.next&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;off&lt;/span&gt;&lt;span class="p"&gt;)|&lt;/span&gt; &lt;span class="n"&gt;off&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;.unwrap_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;start_offset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;loop&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;reading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.read_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;time_range&lt;/span&gt;&lt;span class="nf"&gt;.contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="py"&gt;.timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="nf"&gt;.push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.next_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: the BTreeMap index lets us find the start of the relevant range in O(log n), and then we read sequentially. We never touch data outside the query window.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Got Right
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Lock-free hot path.&lt;/strong&gt; The sensor loop never blocks on writes. This single decision eliminated 80% of my latency spikes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Append-only cold storage.&lt;/strong&gt; The binary format is stable (typed header + variable payload), and the file is never modified after creation. I can replay the entire history by reading the file sequentially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typed accessors, not schema migrations.&lt;/strong&gt; Instead of ALTER TABLE migrations, I version the binary format header:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[repr(u8)]&lt;/span&gt;
&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;FormatVersion&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;V1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// [timestamp: u64][force: f32][camera: bool]&lt;/span&gt;
    &lt;span class="n"&gt;V2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// [timestamp: u64][force: f32][camera: bool][gyro: [f32; 3]]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New sensor types get their own format version. Old readers skip unknown fields. No migration scripts, no schema locks.&lt;/p&gt;
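
&lt;p&gt;One way to make "old readers skip unknown fields" concrete is a small length-prefixed frame in front of every record. A sketch; the documented format may differ in detail:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// [version: u8][len: u32 LE][body: len bytes]
/// Returns (version, body, bytes consumed); the caller advances by
/// `consumed` either way, so records with unknown versions skip cleanly.
fn read_record(buf: &amp;amp;[u8]) -&amp;gt; Option&amp;lt;(u8, &amp;amp;[u8], usize)&amp;gt; {
    let version = *buf.first()?;
    let len = u32::from_le_bytes(buf.get(1..5)?.try_into().ok()?) as usize;
    let body = buf.get(5..5 + len)?;
    Some((version, body, 5 + len))
}

// A V1 reader handed a V2 record parses the 13 bytes it knows
// ([timestamp: u64][force: f32][camera: bool]) and ignores the gyro.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;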

&lt;p&gt;&lt;strong&gt;Power-cut safe by design.&lt;/strong&gt; The hot store uses a write-ahead copy. Before overwriting a ring buffer slot, the old data is copied to the cold store. This adds ~0.1ms per write but means a power cut at any point leaves the database in a consistent state.&lt;/p&gt;
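
&lt;p&gt;The eviction order is the whole trick. Restated over a plain &lt;code&gt;Vec&lt;/code&gt; ring for brevity (the real hot store is lock-free, and &lt;code&gt;encode()&lt;/code&gt; stands in for the binary serializer):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;struct Ring {
    slots: Vec&amp;lt;Option&amp;lt;SensorReading&amp;gt;&amp;gt;,
    head: usize,
}

fn push(ring: &amp;amp;mut Ring, cold: &amp;amp;mut ColdStore, r: SensorReading) -&amp;gt; std::io::Result&amp;lt;()&amp;gt; {
    let i = ring.head % ring.slots.len();
    // Write-ahead copy: the slot's old reading reaches the cold store
    // *before* it is overwritten, so at any instant every reading lives
    // in at least one of the two stores.
    if let Some(old) = ring.slots[i].take() {
        cold.append(old.timestamp, &amp;amp;old.encode())?; // the ~0.1ms per write
    }
    ring.slots[i] = Some(r);
    ring.head += 1;
    Ok(())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;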




&lt;h2&gt;
  
  
  The Lesson Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Here's what I didn't find in any database comparison article:&lt;/p&gt;

&lt;p&gt;The best database for your robot isn't the one with the best benchmarks. It's the one that matches your &lt;strong&gt;failure mode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;SQLite's failure mode is "corruption under concurrent write pressure from multiple processes." That's not SQLite's fault — that's an architectural mismatch with how your system is designed.&lt;/p&gt;

&lt;p&gt;The embedded databases that look impressive in benchmarks are often designed for a different failure mode: "crash on embedded hardware without proper shutdown." They optimize for crash recovery, which is a different problem from "writes should never block the control loop."&lt;/p&gt;

&lt;p&gt;If you're building for robots, ask yourself: what does failure look like? Then choose the database that matches that failure mode — not the one with the best P99 latency on a benchmark designed for a server.&lt;/p&gt;

&lt;p&gt;My robot doesn't crash anymore. The database never does anything interesting. That's exactly the point.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're working on robot memory systems and want to compare notes, I post updates on the moteDB project. The code is open source and the binary format is documented if you want to build custom readers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>embedded</category>
      <category>opensource</category>
    </item>
    <item>
      <title>My Drone Crashed 47 Times Before I Understood What Robot Memory Actually Needs</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Mon, 20 Apr 2026 15:03:46 +0000</pubDate>
      <link>https://dev.to/motedb/my-drone-crashed-47-times-before-i-understood-what-robot-memory-actually-needs-2kfa</link>
      <guid>https://dev.to/motedb/my-drone-crashed-47-times-before-i-understood-what-robot-memory-actually-needs-2kfa</guid>
      <description>&lt;p&gt;Last Tuesday, at 3 AM in a robotics lab that smelled like solder and desperation, my drone — let's call her Doris — smashed into the same wall for the 47th time.&lt;/p&gt;

&lt;p&gt;Doris was running a SLAM algorithm. Making real-time navigation decisions. In simulation, she flew beautifully. In the real world, she became a very expensive wall ornament.&lt;/p&gt;

&lt;p&gt;The algorithms weren't wrong. The motors weren't bad. The sensors were fine.&lt;/p&gt;

&lt;p&gt;The problem? Doris had no memory.&lt;/p&gt;

&lt;p&gt;Not "forgot to save" — Doris literally couldn't remember what she'd seen five seconds ago. Each moment was completely isolated. Process a frame, make a decision, next frame, fresh start. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;I'm a Rust developer and the founder of moteDB. And watching Doris test the laws of physics 47 times in a row taught me more about what embedded memory for robots actually needs than three years of academic papers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Robotics Textbooks Get Wrong
&lt;/h2&gt;

&lt;p&gt;Every robotics course talks about world models and semantic memory. Then they tell you to use Redis. Or InfluxDB. Or SQLite + Pinecone.&lt;/p&gt;

&lt;p&gt;These solutions assume three things robots don't have:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reliable cloud connectivity&lt;/strong&gt; — aisle 12 has no WiFi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tolerance for 50ms+ database latency&lt;/strong&gt; — at 3 m/s, 50ms is 15cm of blind flight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A server-grade computer&lt;/strong&gt; — not every robot has a data center in its belly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What robots actually need is something most databases weren't designed to provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Things Robots Need to Remember
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. What they saw — Vectors
&lt;/h3&gt;

&lt;p&gt;Doris's cameras produce 512-dimensional embeddings at 30 frames per second. Over an 8-hour shift, that's 864,000 embeddings. Finding "have I seen this place before?" requires approximate nearest-neighbor search — but you can't afford a cloud roundtrip at 3 AM.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. What happened when — Time-Series
&lt;/h3&gt;

&lt;p&gt;Doris's motor current draw spiked 340% at the same waypoint three times. That pattern matters. Without temporal context, each incident looks like a new problem. With it, you can predict and avoid.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. What they're doing right now — State
&lt;/h3&gt;

&lt;p&gt;Doris's battery was at 12%. Her next task required 18% estimated power. Without state, she couldn't make that calculation. Without persistent state, she couldn't survive a reboot.&lt;/p&gt;

&lt;p&gt;A real robot memory system has to handle all three. Most databases handle one well and duct-tape the others.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Number That Made Me Stop Using SQLite
&lt;/h2&gt;

&lt;p&gt;Here's what a face-recognition task looked like on my Raspberry Pi 5 with a corpus of 1,000 embeddings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Recognition Latency&lt;/th&gt;
&lt;th&gt;RAM Overhead&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQLite + Python cosine sim&lt;/td&gt;
&lt;td&gt;340ms&lt;/td&gt;
&lt;td&gt;180MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;moteDB (native vectors)&lt;/td&gt;
&lt;td&gt;11ms&lt;/td&gt;
&lt;td&gt;62MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;340ms is a third of a second. A robot that pauses to recognize someone it's seen before feels broken. And 180MB of overhead for just 1,000 embeddings is tolerable today — but at 100,000 embeddings, it's a different conversation.&lt;/p&gt;

&lt;p&gt;The bottleneck wasn't SQLite being slow. It was SQLite being the wrong abstraction for vector similarity search.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built Instead
&lt;/h2&gt;

&lt;p&gt;moteDB is an embedded multimodal database written in 100% Rust. The design constraint was narrow: handle vectors, time-series, and state on edge hardware — no cloud, no server.&lt;/p&gt;

&lt;p&gt;The core difference is the data model. Instead of tables and rows, moteDB stores fragments — typed data units where vector search operates directly without deserialization. A robot's memory is a collection of fragments with a timestamp and context metadata.&lt;/p&gt;
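
&lt;p&gt;To give fragments some shape, here's roughly what one might look like. An illustration of the data model, not moteDB's actual public API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Illustrative only; the real fragment type lives inside moteDB.
enum FragmentBody {
    Vector(Vec&amp;lt;f32&amp;gt;),                          // e.g. a 512-d camera embedding
    TimeSeries { metric: String, value: f64 }, // e.g. motor current draw
    State { key: String, value: Vec&amp;lt;u8&amp;gt; },     // e.g. current mission waypoint
}

struct Fragment {
    timestamp: u64,                 // when it was observed
    context: Vec&amp;lt;(String, String)&amp;gt;, // metadata tags, e.g. ("waypoint", "12")
    body: FragmentBody,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;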

&lt;p&gt;Installation is &lt;code&gt;cargo add motedb&lt;/code&gt;. The Raspberry Pi binary is under 2MB. No runtime dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were starting over, I'd have asked one question before picking any database:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when this robot loses power at 70% through a task?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your answer involves "it restarts and..." — you have a memory problem, not a compute problem. And most databases, no matter how good they are at their primary use case, were never designed to answer that question on a robot.&lt;/p&gt;

&lt;p&gt;Doris is flying better now. I can't say the same for my remaining wall plaster.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the most frustrating memory-related bug you've hit in an AI or robotics project? Drop it in the comments — I read every one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>embedded</category>
      <category>robotics</category>
    </item>
    <item>
      <title>I Tried 4 Async Runtimes on a Raspberry Pi — Only One Didn't Make Me Want to Throw It Out the Window</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:08:53 +0000</pubDate>
      <link>https://dev.to/motedb/i-tried-4-async-runtimes-on-a-raspberry-pi-only-one-didnt-make-me-want-to-throw-it-out-the-window-2p77</link>
      <guid>https://dev.to/motedb/i-tried-4-async-runtimes-on-a-raspberry-pi-only-one-didnt-make-me-want-to-throw-it-out-the-window-2p77</guid>
      <description>&lt;p&gt;Last month I spent three weeks doing something that sounds simple: making an HTTP client work reliably on a Raspberry Pi 4 running a custom Rust service. The service needed to periodically sync sensor data to a cloud endpoint while also handling local database writes. Nothing fancy — maybe 200 lines of logic.&lt;/p&gt;

&lt;p&gt;It took me 2,847 lines of code, 4 different async runtimes, and one very close relationship with my debugger to get it working.&lt;/p&gt;

&lt;p&gt;Here's what actually happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 1: Tokio — The Standard Choice
&lt;/h2&gt;

&lt;p&gt;Everyone says "just use Tokio." So I did.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On my MacBook, it compiled in 12 seconds and ran perfectly. On the Raspberry Pi? Cross-compilation worked, but the binary was 8.3 MB. For a service that was supposed to be lean and embeddable, that felt wrong.&lt;/p&gt;

&lt;p&gt;But the real problem was memory. Under load (simulating 50 concurrent sensor readings + database writes), the RSS crept up to 45 MB. On a Pi with 4 GB of RAM running other services, that's not catastrophic, but it's not great either.&lt;/p&gt;

&lt;p&gt;The worst part: I needed a specific timer implementation that played nice with the Pi's real-time clock, and Tokio's &lt;code&gt;time&lt;/code&gt; module had a subtle drift that accumulated over 24 hours. We're talking milliseconds becoming seconds. When you're timestamping sensor events, that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Works, but it's like using a sledgehammer to hang a picture frame.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 2: async-std — The Alternative
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;async-std&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"attributes"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;async-std felt more ergonomic. The API is closer to what you'd expect from Rust's standard library. File I/O felt more natural. The binary was slightly smaller (7.1 MB).&lt;/p&gt;

&lt;p&gt;But then I hit the wall: async-std's networking stack had a bug with DNS resolution on ARM64 that caused a hang every ~6 hours. I found an open issue from 18 months ago with 47 upvotes and no resolution.&lt;/p&gt;

&lt;p&gt;I tried patching it myself. That's when I realized I'd rather rewrite the whole thing than debug someone else's async DNS resolver.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Promising, but production-unsafe on ARM for anything long-running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 3: smol — The Minimalist
&lt;/h2&gt;

&lt;p&gt;smol is beautiful in its simplicity. Small binary (4.2 MB), low memory footprint (22 MB RSS under the same load), and the &lt;code&gt;async-io&lt;/code&gt; crate underneath is surprisingly robust.&lt;/p&gt;

&lt;p&gt;The problem? Dependency hell. smol uses &lt;code&gt;blocking&lt;/code&gt; for sync-to-async bridging, and our database library (SQLite, via &lt;code&gt;rusqlite&lt;/code&gt;) kept deadlocking in subtle ways when called from multiple async tasks. The &lt;code&gt;blocking&lt;/code&gt; crate's thread pool would exhaust, and then... silence. No error, no panic. Just a service that stopped responding.&lt;/p&gt;

&lt;p&gt;I spent two days adding timeout wrappers around every database call before I gave up.&lt;/p&gt;
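
&lt;p&gt;For the record, the wrappers looked roughly like this: race the blocking call against a timer, using &lt;code&gt;blocking::unblock&lt;/code&gt; and &lt;code&gt;async_io::Timer&lt;/code&gt;. A sketch, not the exact code from the service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::time::Duration;
use futures_lite::future;

async fn count_readings(db_path: &amp;amp;'static str) -&amp;gt; Result&amp;lt;i64, &amp;amp;'static str&amp;gt; {
    // Push the blocking rusqlite call onto blocking's thread pool...
    let query = blocking::unblock(move || {
        let conn = rusqlite::Connection::open(db_path).map_err(|_| "open failed")?;
        conn.query_row("SELECT COUNT(*) FROM readings", [], |row| row.get(0))
            .map_err(|_| "query failed")
    });
    // ...and race it against a timer so the caller can never hang.
    let timeout = async {
        async_io::Timer::after(Duration::from_millis(250)).await;
        Err("db call timed out")
    };
    future::or(query, timeout).await
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note what this doesn't fix: the losing side keeps running on &lt;code&gt;blocking&lt;/code&gt;'s thread pool, so the timeout unblocks the caller without freeing a pool thread. That's exactly how the exhaustion kept sneaking up on me.&lt;/p&gt;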

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Perfect if you control every dependency. We didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Attempt 4: embassy — The Embedded Champion
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting. Embassy isn't really an async runtime in the traditional sense — it's an async framework designed for &lt;code&gt;no_std&lt;/code&gt; embedded systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;embassy-executor&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.6"&lt;/span&gt;
&lt;span class="py"&gt;embassy-time&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.3"&lt;/span&gt;
&lt;span class="py"&gt;embassy-net&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.4"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait, can you even run Embassy on a Raspberry Pi? Technically, Embassy targets microcontrollers (STM32, nRF, ESP32). But the executor has a std flavor that runs on Linux, and &lt;code&gt;embassy-net&lt;/code&gt; can sit on a userspace driver (the project's own std examples use TUN/TAP), so the networking and I/O abstractions carry over.&lt;/p&gt;

&lt;p&gt;The binary was 2.8 MB. Memory usage stayed flat at 15 MB under load. The timer was rock-solid (&lt;code&gt;embassy-time&lt;/code&gt; goes through a pluggable timer driver: hardware timers on microcontrollers, the monotonic clock on Linux).&lt;/p&gt;

&lt;p&gt;There was one catch: the learning curve. Embassy's model is fundamentally different. You don't spawn tasks like Tokio — you use &lt;code&gt;Spawner&lt;/code&gt; and &lt;code&gt;embassy_executor::main&lt;/code&gt;. The networking API expects you to think in terms of &lt;code&gt;TcpSocket&lt;/code&gt; objects rather than streams. It took me a full day to restructure the code.&lt;/p&gt;
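
&lt;p&gt;For flavor: a task under Embassy is a plain &lt;code&gt;async fn&lt;/code&gt; behind the task macro, with statically allocated storage. A minimal sketch (names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use embassy_time::{Duration, Timer};

// The task macro allocates this task's storage statically (no heap);
// it is then started with Spawner::spawn from main.
#[embassy_executor::task]
async fn heartbeat_task() {
    loop {
        // stand-in for the real sensor read + store
        Timer::after(Duration::from_millis(100)).await;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;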

&lt;p&gt;But once it compiled? It just worked. No memory leaks, no timer drift, no DNS hangs, no thread pool deadlocks. 72 hours of continuous testing without a single hiccup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[embassy_executor::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spawner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Spawner&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;embassy_net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Stack&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;net_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;rng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;spawner&lt;/span&gt;&lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sensor_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;&lt;span class="nf"&gt;.ok&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;spawner&lt;/span&gt;&lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sync_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="nf"&gt;.ok&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Hard Lesson
&lt;/h2&gt;

&lt;p&gt;Here's what I wish someone had told me before I started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Binary size matters on edge devices.&lt;/strong&gt; 8 MB vs 2.8 MB isn't just a number — it's the difference between fitting in a constrained update partition and failing deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Timer accuracy is a silent killer.&lt;/strong&gt; Most people don't notice until they're correlating events across devices and the timestamps don't line up.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"Standard" runtimes aren't optimized for your hardware.&lt;/strong&gt; Tokio is amazing for servers. It's not optimized for a $35 ARM board with eMMC storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The ecosystem lock-in is real.&lt;/strong&gt; Your choice of async runtime determines which libraries you can use, how you handle errors, and what your deployment looks like.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What I'm Using Now
&lt;/h2&gt;

&lt;p&gt;For our robotics work at moteDB, we ended up with a hybrid: Embassy for the embedded layer (sensor I/O, real-time control), and a minimal synchronous Rust core for database operations. We intentionally avoided async in the database layer — synchronous code with a dedicated thread is simpler, more debuggable, and has predictable performance characteristics.&lt;/p&gt;
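
&lt;p&gt;That database layer is plain std Rust: one thread owns the handle outright, and everything else talks to it over a channel. A sketch of the shape (the command set and the commented-out handle are placeholders, not moteDB's actual API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::sync::mpsc;
use std::thread;

// Placeholder command set; mirror your real DB operations here.
enum DbCommand {
    Insert(Vec&amp;lt;u8&amp;gt;),
    Flush,
}

// One thread owns the database; synchronous inside, async-friendly
// outside, and trivially debuggable with a plain backtrace.
fn start_db_thread() -&amp;gt; mpsc::Sender&amp;lt;DbCommand&amp;gt; {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // let mut db = open_database(); // placeholder for the real handle
        for cmd in rx {
            match cmd {
                DbCommand::Insert(bytes) =&amp;gt; drop(bytes), // db.insert(bytes)
                DbCommand::Flush =&amp;gt; {}                   // db.flush()
            }
        }
        // all senders dropped: the loop ends and the thread exits cleanly
    });
    tx
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;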

&lt;p&gt;Sometimes the best async architecture includes knowing when NOT to be async.&lt;/p&gt;

&lt;p&gt;Has anyone else run into the async runtime choice problem on constrained hardware? I'm curious if there are other options I missed — especially anything that bridges the gap between Tokio's ecosystem and Embassy's efficiency.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>embedded</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Spent 3 Months Tuning a Tokio Runtime for My Robot - Here's What No Tutorial Tells You</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Fri, 17 Apr 2026 09:37:50 +0000</pubDate>
      <link>https://dev.to/motedb/i-spent-3-months-tuning-a-tokio-runtime-for-my-robot-heres-what-no-tutorial-tells-you-bdj</link>
      <guid>https://dev.to/motedb/i-spent-3-months-tuning-a-tokio-runtime-for-my-robot-heres-what-no-tutorial-tells-you-bdj</guid>
      <description>&lt;p&gt;Last November, my robot arm started dropping sensor frames at exactly 47ms intervals. Not randomly - exactly 47ms, like clockwork. It would read joint angles perfectly for a while, then miss a window, then recover. The anomaly detector we'd wired into the control loop kept triggering false positives. My teammate Rui and I spent two full weeks convinced the CAN bus driver was broken.&lt;/p&gt;

&lt;p&gt;It wasn't the driver.&lt;/p&gt;

&lt;p&gt;It was &lt;code&gt;#[tokio::main]&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;We were building an AI-driven robot arm that does pick-and-place with semantic understanding. The control loop needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1ms cycle time&lt;/strong&gt; for joint position updates
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10ms window&lt;/strong&gt; to fuse sensor data before inference
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background persistence&lt;/strong&gt; - log everything to a local database so we can replay sessions offline
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stack was reasonable: Rust, Tokio for async, a custom message bus, moteDB for embedded storage (vectors + time-series + structured state in one engine). We used &lt;code&gt;#[tokio::main]&lt;/code&gt; because that's what every tutorial shows, with a thread pool spawned for the heavy inference work.&lt;/p&gt;

&lt;p&gt;It worked great on my laptop. It fell apart on the robot.&lt;/p&gt;




&lt;h2&gt;
  
  
  What #[tokio::main] Actually Does (And Doesn't Do)
&lt;/h2&gt;

&lt;p&gt;Here is the thing nobody explains in the "Getting Started with Tokio" guides: &lt;code&gt;#[tokio::main]&lt;/code&gt; spins up a multi-threaded runtime with a number of worker threads equal to the number of logical CPU cores. On a modern dev machine that's 8-16. On a Raspberry Pi 5? 4 cores - and two of them are already pressured by the camera pipeline and the neural inference engine.&lt;/p&gt;

&lt;p&gt;The bigger problem: Tokio's work-stealing scheduler doesn't know anything about real-time priorities. It will cheerfully preempt your 1ms control loop task to service a database flush, a log write, or a DNS resolution that some library decided to make async under the hood.&lt;/p&gt;

&lt;p&gt;That 47ms drop? The Tokio scheduler was occasionally parking our sensor polling task while flushing a batch write to moteDB. The flush was async, perfectly polite, and completely invisible in any standard profiling tool because it showed up as I/O wait rather than CPU time.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: Surgical Runtime Configuration
&lt;/h2&gt;

&lt;p&gt;Instead of &lt;code&gt;#[tokio::main]&lt;/code&gt;, we switched to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;control_rt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Builder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new_multi_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.worker_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.thread_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"control-loop"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.thread_priority&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;ThreadPriority&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// with tokio-runtime-extensions&lt;/span&gt;
        &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;io_rt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Builder&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new_multi_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.worker_threads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.thread_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"background-io"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;control_rt&lt;/span&gt;&lt;span class="nf"&gt;.spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;run_control_loop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="n"&gt;io_rt&lt;/span&gt;&lt;span class="nf"&gt;.block_on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;move&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;run_support_tasks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two runtimes. The control loop never shares a thread pool with storage I/O or inference scheduling. After this change, our 47ms drops disappeared entirely.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Things I Wish Someone Had Told Me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;spawn_blocking&lt;/code&gt; is not free&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every call to &lt;code&gt;spawn_blocking&lt;/code&gt; steals a thread from a shared blocking thread pool (default: 512 threads). If you're calling it in a tight loop for sensor serialization, you will exhaust the pool under load. We switched to dedicated &lt;code&gt;std::thread::spawn&lt;/code&gt; for our serialization hot path and kept &lt;code&gt;spawn_blocking&lt;/code&gt; only for true one-offs.&lt;/p&gt;
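
&lt;p&gt;The replacement was nothing exotic: one long-lived worker thread and a bounded queue instead of a &lt;code&gt;spawn_blocking&lt;/code&gt; call per frame. A sketch (types and names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::sync::mpsc;
use std::thread;

struct SensorFrame { /* fields elided */ }

fn serialize(_frame: &amp;amp;SensorFrame) -&amp;gt; Vec&amp;lt;u8&amp;gt; { Vec::new() } // stub encoder

// A dedicated worker owns the serialization hot path, so the runtime's
// shared 512-thread blocking pool is never touched per frame.
fn start_serializer() -&amp;gt; mpsc::SyncSender&amp;lt;SensorFrame&amp;gt; {
    // bounded: if serialization falls behind, senders feel backpressure
    // (from async contexts, prefer try_send so the runtime never blocks)
    let (tx, rx) = mpsc::sync_channel(256);
    thread::spawn(move || {
        for frame in rx {
            let _bytes = serialize(&amp;amp;frame);
            // hand _bytes to the storage layer here
        }
    });
    tx
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;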

&lt;p&gt;&lt;strong&gt;2. Async mutex is slower than you think at high frequency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tokio::sync::Mutex&lt;/code&gt; parks the task and hands control to the scheduler when it contends. At 1ms cycle time, this is catastrophic. For shared state between the control loop and the storage layer, we used &lt;code&gt;std::sync::Mutex&lt;/code&gt; - a blocking primitive - because the lock hold time was microseconds and the task switch overhead of the async version was larger.&lt;/p&gt;
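
&lt;p&gt;The discipline that makes this safe: keep the critical section tiny and never hold the guard across an &lt;code&gt;.await&lt;/code&gt;. A sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::sync::{Arc, Mutex};

#[derive(Default)]
struct SharedState {
    last_joint_angles: [f32; 6],
}

// std::sync::Mutex inside async code is fine when the hold time is
// microseconds and the guard never lives across an .await point.
async fn publish_angles(state: Arc&amp;lt;Mutex&amp;lt;SharedState&amp;gt;&amp;gt;, angles: [f32; 6]) {
    {
        let mut s = state.lock().unwrap();
        s.last_joint_angles = angles; // short, purely synchronous section
    } // guard dropped here, before any .await
    // safe to .await below this line
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;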

&lt;p&gt;&lt;strong&gt;3. The database write path must not block the runtime&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where moteDB's design helped us: writes are append-only to a WAL first (sub-microsecond), with the actual B-tree / vector index update deferred to the background runtime. If your embedded database does synchronous index updates on every write, you will feel it in your control loop latency. The write path and the read path need different scheduling contracts.&lt;/p&gt;
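
&lt;p&gt;The shape of that split, in generic terms (a sketch of the pattern, not moteDB's internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use std::fs::File;
use std::io::Write;
use tokio::sync::mpsc;

// Two-phase write: the hot path is a sequential WAL append; index
// maintenance travels over a channel to the background runtime.
struct WriteHandle {
    wal: File,
    index_tx: mpsc::Sender&amp;lt;Vec&amp;lt;u8&amp;gt;&amp;gt;,
}

impl WriteHandle {
    fn write(&amp;amp;mut self, record: &amp;amp;[u8]) -&amp;gt; std::io::Result&amp;lt;()&amp;gt; {
        // hot path: append only, no index work, no await
        self.wal.write_all(record)?;
        // deferred path: non-blocking handoff to the index updater; a
        // real engine would queue or retry rather than drop when full
        let _ = self.index_tx.try_send(record.to_vec());
        Ok(())
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;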




&lt;h2&gt;
  
  
  The Bigger Pattern
&lt;/h2&gt;

&lt;p&gt;Embedded AI systems are not web servers. On a web server, a 50ms hiccup on one request is invisible to other requests. On a robot, a 50ms hiccup in your control loop is a dropped object, a wrong turn, or a crash.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;#[tokio::main]&lt;/code&gt; default was designed for web services where fairness across tasks is the right trade-off. For real-time embedded work, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: critical tasks on dedicated runtimes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority&lt;/strong&gt;: OS-level thread priorities for the control loop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-blocking storage&lt;/strong&gt;: a database whose write path does not block the scheduler&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We ended up with a three-layer architecture: hard real-time control loop, soft real-time sensor fusion + inference, and best-effort persistence and telemetry. Each layer has its own Tokio runtime, and they communicate via bounded channels (&lt;code&gt;tokio::sync::mpsc&lt;/code&gt; with fixed capacity), so backpressure is explicit rather than unbounded queuing.&lt;/p&gt;
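
&lt;p&gt;The detail that matters is how the critical layer sends: it never awaits. A sketch of the producer side:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use tokio::sync::mpsc;

struct Telemetry {
    joint_angles: [f32; 6],
    timestamp_us: u64,
}

// The control loop publishes without awaiting: if the bounded queue is
// full, the sample is dropped and the loop keeps its deadline. Losing
// one telemetry sample beats missing a control cycle.
fn publish(tx: &amp;amp;mpsc::Sender&amp;lt;Telemetry&amp;gt;, sample: Telemetry) {
    let _ = tx.try_send(sample); // Err means full or closed; ignore it
}

// consumer side, on the best-effort runtime:
// let (tx, mut rx) = mpsc::channel::&amp;lt;Telemetry&amp;gt;(1024);
// while let Some(sample) = rx.recv().await { /* persist it */ }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;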

&lt;p&gt;The 47ms drops are gone. We have been running stable for three months.&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;#[tokio::main]&lt;/code&gt; is fine for most things. For embedded real-time, it is a footgun.&lt;/li&gt;
&lt;li&gt;Use separate &lt;code&gt;tokio::runtime::Builder&lt;/code&gt; instances to isolate critical paths.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;std::sync::Mutex&lt;/code&gt; beats &lt;code&gt;tokio::sync::Mutex&lt;/code&gt; when lock hold time is microseconds.&lt;/li&gt;
&lt;li&gt;Make sure your storage layer (whatever it is) has non-blocking write semantics.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Has anyone else hit scheduler interference issues in embedded Rust? I'm curious whether the community has converged on better patterns here - or whether this is still a "figure it out yourself" problem.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>iot</category>
      <category>performance</category>
      <category>rust</category>
    </item>
    <item>
      <title>I Stopped Treating My AI Agent's Memory Like a Log File</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Thu, 16 Apr 2026 11:20:00 +0000</pubDate>
      <link>https://dev.to/motedb/i-stopped-treating-my-ai-agents-memory-like-a-log-file-3fj1</link>
      <guid>https://dev.to/motedb/i-stopped-treating-my-ai-agents-memory-like-a-log-file-3fj1</guid>
      <description>&lt;p&gt;Last year, I spent two weeks debugging why my robot kept repeating the same mistake.&lt;/p&gt;

&lt;p&gt;Not a code bug. Not a hardware failure. The robot &lt;em&gt;knew&lt;/em&gt; what it had done wrong the last time. I could see it in the logs. It had stored the error. It just didn't... use that knowledge when the same situation came up again.&lt;/p&gt;

&lt;p&gt;That's when I realized I had been solving the wrong problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Log File Mental Model
&lt;/h2&gt;

&lt;p&gt;Most agent memory systems I've seen follow the same pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent does something&lt;/li&gt;
&lt;li&gt;Store a text description of what happened&lt;/li&gt;
&lt;li&gt;Later, embed it and retrieve it with semantic search&lt;/li&gt;
&lt;li&gt;Inject retrieved context into the next prompt&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's elegant. It works well for conversational AI — the kind that lives in a chat window and helps you write emails.&lt;/p&gt;

&lt;p&gt;But I'm building robots. And the log-file model breaks in ways that aren't obvious until your robot crashes into the same wall for the third time.&lt;/p&gt;

&lt;p&gt;Here's why.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Goes Wrong on the Edge
&lt;/h2&gt;

&lt;p&gt;A robot's environment produces data that doesn't fit in a text string.&lt;/p&gt;

&lt;p&gt;When my drone hit an airflow problem near a building edge, what it experienced was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 47ms IMU reading spike (accelerometer Z-axis: +3.8g)&lt;/li&gt;
&lt;li&gt;A camera frame showing a glass surface at 0.4m&lt;/li&gt;
&lt;li&gt;A motor throttle log&lt;/li&gt;
&lt;li&gt;A GPS coordinate&lt;/li&gt;
&lt;li&gt;A vibration frequency pattern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I stored a text note: &lt;em&gt;"Building edge caused unexpected turbulence, compensated with throttle adjustment."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Two weeks later, same building edge. The agent retrieved the text note. The text said "throttle adjustment." The agent adjusted throttle. It still struggled, because the &lt;em&gt;actual&lt;/em&gt; recovery wasn't about throttle — it was about yaw correction combined with a specific altitude hold. The text summary had lost the operational precision.&lt;/p&gt;

&lt;p&gt;This is the binding problem. The memory exists. The retrieval works. But the stored representation can't carry the real-world nuance.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Memory Actually Needs to Do (For Agents That Act)
&lt;/h2&gt;

&lt;p&gt;After a lot of iteration, I've landed on a different model:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory for acting agents is not a recall system. It's a structured experience index.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-modal by default.&lt;/strong&gt; An experience in the physical world involves sensor readings, visual data, timing, and spatial context — not just a text description of what happened.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queryable by context, not just keywords.&lt;/strong&gt; "What did I do last time the accelerometer reading was above 3g near a glass surface?" is a different query than "what happened near buildings?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight enough to run on the device.&lt;/strong&gt; A Raspberry Pi can't afford 500ms vector search round-trips mid-flight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent across power cycles.&lt;/strong&gt; Edge AI devices reboot. Memory has to survive that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of the existing options — SQLite, Redis, Chroma, even small embedded vector stores — were designed for this combination.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built Instead
&lt;/h2&gt;

&lt;p&gt;I spent about four months building &lt;a href="https://github.com/motedb/motedb" rel="noopener noreferrer"&gt;moteDB&lt;/a&gt;, a Rust-native embedded database specifically for this use case.&lt;/p&gt;

&lt;p&gt;The core design decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Multi-modal storage as a first-class concept.&lt;/strong&gt;&lt;br&gt;
A "record" can contain a vector, a binary blob, structured fields, and a timestamp — not an ORM abstraction on top of text columns. When the drone stores an experience, it stores all modalities together, not in separate tables that need joining later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No runtime dependencies.&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;cargo add motedb&lt;/code&gt; — that's it. It runs in the same process as your agent. No daemon, no network round-trip, no container to manage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Designed for embedded constraints.&lt;/strong&gt;&lt;br&gt;
Memory budget is explicit. Old records get evicted by policy. It doesn't assume you have 32GB of RAM or SSD storage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Query by structured context.&lt;/strong&gt;&lt;br&gt;
You can ask: "Give me the 3 most similar past experiences where sensor reading X was above threshold Y and the outcome was Z." That's a spatial + vector + structured query — all in one call.&lt;/p&gt;
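
&lt;p&gt;Concretely, storing and recalling an experience looks roughly like this. The &lt;code&gt;Mote&lt;/code&gt;/&lt;code&gt;Fragment&lt;/code&gt;/&lt;code&gt;VecQuery&lt;/code&gt; shapes match moteDB's published examples; the field names and surrounding variables are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use motedb::{Mote, Fragment, VecQuery};

// One experience = embedding + raw sensor blob + structured scalars,
// stored together as a single record rather than across tables.
let mote = Mote::new()
    .fragment(Fragment::embedding("imu_pattern", &amp;amp;imu_embedding))
    .fragment(Fragment::blob("imu_raw", imu_bytes))
    .fragment(Fragment::scalar("accel_z_peak", 3.8))
    .label("building-edge-turbulence");
db.insert(mote)?;

// Later: the 3 most similar past experiences to the current trace.
let similar = db.search(VecQuery::new("imu_pattern", &amp;amp;current_embedding).top_k(3))?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;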




&lt;h2&gt;
  
  
  Does It Actually Help?
&lt;/h2&gt;

&lt;p&gt;The real test: same building edge, three months after I started using moteDB.&lt;/p&gt;

&lt;p&gt;The drone recalled the actual IMU trace from the previous incident — not a text summary of it. Its flight controller could compare the current sensor pattern directly to the stored pattern. Yaw correction happened 400ms earlier than it had previously.&lt;/p&gt;

&lt;p&gt;No crash.&lt;/p&gt;

&lt;p&gt;More importantly: I didn't have to rewrite the memory system every time the robot encountered a new type of sensor data. The schema is flexible. Text, vectors, binary — it stores what the agent actually experiences.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Broader Point
&lt;/h2&gt;

&lt;p&gt;The "agent memory = semantic text search" assumption works for chat-based agents because their world &lt;em&gt;is&lt;/em&gt; text. But if you're building agents that act in physical or structured-data environments, the mismatch compounds fast.&lt;/p&gt;

&lt;p&gt;The memory system needs to match the modality of the environment, not just the modality of the LLM interface.&lt;/p&gt;

&lt;p&gt;I'm still iterating on this. The drone problem is mostly solved. The harder one is multi-agent memory sharing — when two robots have different experiences of the same environment, how do you merge those sensibly?&lt;/p&gt;

&lt;p&gt;Working on it.&lt;/p&gt;




&lt;p&gt;If you're building agents that operate in physical or data-heavy environments, I'd be curious what memory architecture you're using. Are you hitting the same log-file limitation, or have you found something that works well?&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cargo add motedb&lt;/code&gt; if you want to experiment. GitHub: &lt;a href="https://github.com/motedb/motedb" rel="noopener noreferrer"&gt;https://github.com/motedb/motedb&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>database</category>
      <category>embedded</category>
    </item>
    <item>
      <title>I Replaced SQLite with a Rust Database in My AI Robot — Here's What Happened</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Mon, 13 Apr 2026 01:36:15 +0000</pubDate>
      <link>https://dev.to/motedb/i-replaced-sqlite-with-a-rust-database-in-my-ai-robot-heres-what-happened-1n6k</link>
      <guid>https://dev.to/motedb/i-replaced-sqlite-with-a-rust-database-in-my-ai-robot-heres-what-happened-1n6k</guid>
      <description>&lt;p&gt;I used SQLite for everything.&lt;/p&gt;

&lt;p&gt;For years, it was my default answer to "where do I store stuff." Config files, sensor logs, user data, even model outputs — SQLite handled it. When I started building an AI-powered robot that needed on-device memory, the choice felt obvious: embed SQLite, done.&lt;/p&gt;

&lt;p&gt;Three weeks later, the robot's memory was a mess of incompatible schemas, I was writing custom serialization code for every data type, and I'd accumulated roughly 400 lines of glue code just to store and retrieve things that weren't text or numbers. Image embeddings. Audio fingerprints. Sensor state vectors.&lt;/p&gt;

&lt;p&gt;That's when I started questioning whether SQLite was actually the right tool for this problem, or just the familiar one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Agents Actually Need to Remember
&lt;/h2&gt;

&lt;p&gt;Here's the thing about robot memory that isn't obvious until you're building it: AI agents don't remember rows of data. They remember &lt;em&gt;moments&lt;/em&gt; — multimodal snapshots of state.&lt;/p&gt;

&lt;p&gt;A moment might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 512-dimensional face embedding from the camera&lt;/li&gt;
&lt;li&gt;A timestamp&lt;/li&gt;
&lt;li&gt;An associated audio clip (4KB PCM)&lt;/li&gt;
&lt;li&gt;A confidence score&lt;/li&gt;
&lt;li&gt;A label ("this is the owner")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In SQLite, storing this requires: a TEXT field for the embedding (serialized JSON or base64), a BLOB for audio, two REAL fields, and a TEXT field. Then on retrieval, you deserialize everything back. For one record, fine. For 10,000 records when you need to find the 5 most similar faces to the one in the current frame? Now you're loading blobs you don't need, deserializing vectors you'll immediately re-encode for comparison, and doing similarity math in Python one row at a time.&lt;/p&gt;
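
&lt;p&gt;In Rust terms the pattern looked like this (a reconstruction of the shape, not the original code; my actual similarity pass ran in Python, which made it slower still):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use rusqlite::Connection;

// The SQLite-era pattern: every embedding is a BLOB that has to be
// fetched and deserialized before any similarity math can happen.
fn top_match(conn: &amp;amp;Connection, query: &amp;amp;[f32]) -&amp;gt; rusqlite::Result&amp;lt;(i64, f32)&amp;gt; {
    let mut stmt = conn.prepare("SELECT id, embedding FROM faces")?;
    let rows = stmt.query_map([], |row| {
        Ok((row.get::&amp;lt;_, i64&amp;gt;(0)?, row.get::&amp;lt;_, Vec&amp;lt;u8&amp;gt;&amp;gt;(1)?))
    })?;
    let norm = |x: &amp;amp;[f32]| x.iter().map(|a| a * a).sum::&amp;lt;f32&amp;gt;().sqrt();
    let mut best = (0i64, f32::MIN);
    for row in rows {
        let (id, blob) = row?;
        // bytes -&amp;gt; f32 vector (little-endian), then cosine similarity
        let v: Vec&amp;lt;f32&amp;gt; = blob
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        let dot: f32 = v.iter().zip(query).map(|(a, b)| a * b).sum();
        let score = dot / (norm(&amp;amp;v) * norm(query) + 1e-12);
        if score &amp;gt; best.1 {
            best = (id, score);
        }
    }
    Ok(best)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;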

&lt;p&gt;I was making SQLite solve a problem it wasn't designed for.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Performance Problem
&lt;/h2&gt;

&lt;p&gt;Let me share the actual numbers from my setup (Raspberry Pi 5, 8GB RAM):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store a face embedding (512-dim float32 array): ~2.1ms avg&lt;/li&gt;
&lt;li&gt;Find top-5 similar faces in corpus of 1,000 records: ~340ms (full scan + Python cosine similarity)&lt;/li&gt;
&lt;li&gt;RAM overhead for 1,000 embeddings loaded for comparison: ~180MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; 340ms for recognition is perceptible. A robot that pauses for a third of a second to recognize someone it's seen before feels broken. And 180MB just for embeddings in a device with 8GB — fine now, but what happens at 10,000 records?&lt;/p&gt;

&lt;p&gt;The bottleneck wasn't SQLite being slow at SQL. It was SQLite being the wrong abstraction for vector similarity search.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enter moteDB
&lt;/h2&gt;

&lt;p&gt;I started working on &lt;a href="https://github.com/motedb/motedb" rel="noopener noreferrer"&gt;moteDB&lt;/a&gt; after hitting this wall. The design goal was narrow: an embedded database written in Rust that handles multimodal data natively and makes vector search a first-class operation, specifically on edge/IoT hardware.&lt;/p&gt;

&lt;p&gt;The core difference from SQLite is the data model. Instead of tables and rows, moteDB stores &lt;em&gt;fragments&lt;/em&gt; — typed data units that can be embeddings, blobs, scalars, or structured records. A "memory" is a collection of fragments with a timestamp and context metadata. Vector search operates on embedding fragments directly without deserialization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cargo.toml&lt;/span&gt;
&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;motedb&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.6"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;motedb&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Mote&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Fragment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VecQuery&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Store a face recognition moment&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;mote&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Mote&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.fragment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Fragment&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"face"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;embedding_512d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;.fragment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Fragment&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"audio_clip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;.fragment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;Fragment&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;scalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;.label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mote&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Find top-5 similar faces&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;VecQuery&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"face"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;current_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.top_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same logical operation. No serialization. No schema definition. No glue code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers After Switching
&lt;/h2&gt;

&lt;p&gt;Running the same Raspberry Pi 5 benchmarks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;moteDB approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store a face embedding: ~0.3ms avg&lt;/li&gt;
&lt;li&gt;Find top-5 similar in 1,000 records: ~8ms&lt;/li&gt;
&lt;li&gt;RAM for 1,000 embeddings active index: ~22MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 340ms → 8ms improvement on search is the one that changed the robot's behavior. Recognition went from perceptible pause to imperceptible. The 180MB → 22MB improvement means I can scale to 10x the memory corpus before hitting constraints.&lt;/p&gt;

&lt;p&gt;Are these dramatic numbers? Yes, but the comparison is a bit unfair — SQLite is doing general-purpose relational work, moteDB is doing one specific thing. The point isn't "moteDB is faster than SQLite," it's that you're comparing apples to purpose-built apples.&lt;/p&gt;

&lt;h2&gt;
  
  
  When SQLite Is Still the Right Answer
&lt;/h2&gt;

&lt;p&gt;This isn't an "SQLite is bad" post. SQLite remains the right choice for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Config and settings storage&lt;/strong&gt; — nothing beats SQLite's simplicity for "remember this preference"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relational queries across structured data&lt;/strong&gt; — if your data is naturally tabular with joins, use SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum ecosystem compatibility&lt;/strong&gt; — SQLite has bindings in every language, tooling everywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logs and event history&lt;/strong&gt; — append-only structured records are exactly what SQLite is great at&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The switch to moteDB made sense for my use case because my data was fundamentally multimodal and similarity search was a core operation. If I'd been building a robot that needed to remember &lt;em&gt;facts&lt;/em&gt; and query them relationally ("what's the last time I saw Alice?"), SQLite would have been fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Lesson
&lt;/h2&gt;

&lt;p&gt;I wasted three weeks because I reached for the familiar tool instead of asking what the problem actually required. SQLite is excellent — but "excellent general-purpose database" and "right tool for AI agent memory" are different things.&lt;/p&gt;

&lt;p&gt;If you're building AI systems that work with embeddings, sensor data, and multimodal inputs — especially on edge hardware — the database choice deserves more deliberate thought than it usually gets. The ecosystem defaults (SQLite, PostgreSQL with pgvector) work, but they're not optimized for this access pattern.&lt;/p&gt;

&lt;p&gt;I'm curious: what are others using for on-device AI memory storage? Have you hit similar schema/performance walls, or found ways to make SQLite work cleanly with embedding data?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;moteDB is open-source and written in Rust. If you're working on edge AI and want to try it: &lt;code&gt;cargo add motedb&lt;/code&gt;. GitHub: &lt;a href="https://github.com/motedb/motedb" rel="noopener noreferrer"&gt;motedb/motedb&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>ai</category>
      <category>embedded</category>
    </item>
    <item>
      <title>Your Robot Doesn't Need a Vector Store — It Needs a Memory System</title>
      <dc:creator>mote</dc:creator>
      <pubDate>Sun, 12 Apr 2026 23:40:22 +0000</pubDate>
      <link>https://dev.to/motedb/your-robot-doesnt-need-a-vector-store-it-needs-a-memory-system-26fa</link>
      <guid>https://dev.to/motedb/your-robot-doesnt-need-a-vector-store-it-needs-a-memory-system-26fa</guid>
      <description>&lt;p&gt;Last Tuesday, our robot walked into the kitchen for the 31st time that week. Same route. Same question: "What should I prepare for lunch?"&lt;/p&gt;

&lt;p&gt;It had already asked that question. And answered it. And remembered the answer — for exactly 8 hours, until the vector store's TTL kicked in and silently wiped it clean.&lt;/p&gt;

&lt;p&gt;That's when I understood: vector databases are solving the wrong problem for embodied AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Semantic Search Trap
&lt;/h2&gt;

&lt;p&gt;Vector databases are brilliant at one thing: finding similar things. "Show me documents about cooking." "Find images that look like this." Great. Wonderful.&lt;/p&gt;

&lt;p&gt;But a robot standing in a kitchen isn't asking semantic questions. It's asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Have I been here before?" (spatial + temporal)&lt;/li&gt;
&lt;li&gt;"How long has the stove been on?" (time-series)&lt;/li&gt;
&lt;li&gt;"Is this object in the same position as last time?" (structured state)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Vector similarity can't answer any of these. You need a database that thinks in sequences, timestamps, and relationships — not just embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Robots Actually Need
&lt;/h2&gt;

&lt;p&gt;After 3 months of building memory systems for autonomous robots, here's what actually matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Temporal continuity.&lt;/strong&gt; A robot doesn't just need to know what happened. It needs to know when it happened and in what order. Did the person enter before or after the door opened? Vector stores have no concept of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Spatial context.&lt;/strong&gt; "Kitchen" isn't a semantic cluster — it's a location with boundaries, objects, and states. Storing "kitchen" as a vector means you can find similar contexts, but you can't answer "has this specific kitchen been visited in the last hour?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Structured state.&lt;/strong&gt; The stove's temperature. The door's position. The battery level. These are key-value pairs that change over time and need to be queried efficiently without converting everything into vectors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Multimodal fusion.&lt;/strong&gt; Sensor data, LLM outputs, camera frames — all need to be stored together and queried across modalities.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hybrid Stack Problem
&lt;/h2&gt;

&lt;p&gt;The industry answer to this is usually: "Use a vector DB for embeddings + Redis for caching + InfluxDB for time-series + PostgreSQL for state."&lt;/p&gt;

&lt;p&gt;That's 4 databases. 4 connection pools. 4 failure modes. And about 50ms of latency per query on a good day.&lt;/p&gt;

&lt;p&gt;For a robot making decisions in real time, that's not engineering — that's architectural debt.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Better Approach: Embedded Multimodal Storage
&lt;/h2&gt;

&lt;p&gt;The alternative is a single embedded database that handles all of these natively:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;motedb&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Store&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"robot_memory.motedb"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;visit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"visits"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="s"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;Utc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="s"&gt;"robot_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"unit_7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"duration_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;342&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"sensors"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"stove_temp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;serde_json&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;json!&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="s"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;187.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"celsius"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"last_updated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nn"&gt;Utc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="nf"&gt;.store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"visits"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;.filter&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"kitchen"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;.filter&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nn"&gt;Utc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nn"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;.first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No separate services. No network calls. No latency budget eaten by database round-trips.&lt;/p&gt;
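&lt;p&gt;The temporal questions from earlier compose on the same handle. Here's a hypothetical sketch of "how long has the stove been on?", with a caveat: it assumes an &lt;code&gt;.all()&lt;/code&gt; collector and chronological ordering, neither of which is shown above, so treat it as pseudocode rather than documented API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;// Hypothetical: scan a stove temperature log in time order and report
// the window where it stayed above a threshold.
let hot = db.store("stove_log")
    .query()
    .filter(|r| r["value"].as_f64().unwrap_or(0.0) &amp;gt; 40.0)
    .all()?;   // assumed collector; only .first() appears above

if let (Some(first), Some(last)) = (hot.first(), hot.last()) {
    println!("stove hot from {} to {}", first["timestamp"], last["timestamp"]);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;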

&lt;h2&gt;
  
  
  The Real Insight
&lt;/h2&gt;

&lt;p&gt;Vector databases are infrastructure for retrieval. Robots need infrastructure for memory.&lt;/p&gt;

&lt;p&gt;Memory isn't just "find the most similar thing." It's: what happened, when, where, and how does it affect what happens next?&lt;/p&gt;

&lt;p&gt;That's a fundamentally different data problem. And it's why the next generation of embedded databases — built in Rust, designed for edge AI — is approaching it differently.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's your experience with robot memory systems? Do you use separate databases for different data types, or have you found a better unified approach?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
      <category>ai</category>
      <category>robotics</category>
    </item>
  </channel>
</rss>
