DEV Community

mote

I Spent 3 Hours Watching My Benchmark Hang, Then 6 Seconds to Fix It


Three hours. That's how long bench_column_index ran before I realized it wasn't going anywhere.

I was preparing for moteDB v0.2.0 and running the usual performance suite. Twelve DB instances in parallel, each doing SELECT WHERE col = ? queries while a background thread built indexes. Queries that should take milliseconds started taking minutes. Then hours. Then nothing.

The culprit was a single RwLock<GenericBTree> protecting every read and write to the column index. When the background thread grabbed the write lock to bulk-insert, every query blocked. Simple as that. Twelve threads fighting over one lock.

Here's what I did about it — and how I got that 3-hour hang down to 6.6 seconds.

The Architecture That Was Killing Us

v0.1.7 had a straightforward design: one B-Tree, one RwLock. Clean. Wrong.

```
SELECT WHERE col = ?    →  acquire read lock   →  traverse B-Tree  →  return
Background index build  →  acquire write lock  →  bulk insert     →  release
```

When those two paths hit the same lock simultaneously, queries queued behind the writer. With twelve instances, the queue grew faster than it drained. The system looked alive — threads were running, memory was allocated — but nothing was making progress.
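To make the contention concrete, here's a minimal sketch of that single-lock shape. The names are stand-ins (a plain `BTreeMap` plays the role of GenericBTree), not moteDB's actual code:

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

// Hypothetical sketch of the v0.1.7 layout: every reader and the
// background builder contend on the same RwLock.
struct ColumnIndex {
    tree: RwLock<BTreeMap<String, u64>>, // stand-in for GenericBTree
}

impl ColumnIndex {
    fn get(&self, key: &str) -> Option<u64> {
        // SELECT WHERE col = ? : blocks for the entire time a writer holds the lock
        self.tree.read().unwrap().get(key).copied()
    }

    fn bulk_insert(&self, rows: Vec<(String, u64)>) {
        // Background build: one long write-lock hold stalls every reader
        let mut tree = self.tree.write().unwrap();
        for (k, v) in rows {
            tree.insert(k, v);
        }
    }
}
```

With one instance the read path merely stutters; with twelve instances hammering `get()` while `bulk_insert()` runs, the read queue grows faster than it drains.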

I needed a different model. Here's what I landed on:

Two-layer architecture, RocksDB style:

  1. IndexMemBuffer — an in-memory BTreeMap with parking_lot::RwLock (nanosecond-level contention). Writes go here first.
  2. GenericBTree — the on-disk B-Tree. Reads through here. Writes only happen during background drain.
  3. drain_lock (Mutex) — serializes the buffer-to-B-Tree migration using try_lock so writers never block.
  4. tombstones (HashSet) — tracks deleted keys so drained buffers don't resurrect data.

When the memory buffer exceeds a threshold, it atomically flips to an immutable snapshot. The drain thread picks it up and builds the B-Tree without blocking readers. New writes hit the new active buffer.
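A minimal sketch of that write path, with std locks standing in for parking_lot and hypothetical names throughout:

```rust
use std::collections::{BTreeMap, HashSet};
use std::sync::{Mutex, RwLock};

// Illustrative two-layer index: writes land in the mem buffer, and a drain
// is attempted with try_lock so the writing thread never blocks on it.
struct TwoLayerIndex {
    mem: RwLock<BTreeMap<String, u64>>,  // IndexMemBuffer stand-in
    disk: RwLock<BTreeMap<String, u64>>, // GenericBTree stand-in
    tombstones: RwLock<HashSet<String>>, // deleted keys: drains must not resurrect them
    drain_lock: Mutex<()>,               // serializes the buffer-to-B-Tree migration
    threshold: usize,
}

impl TwoLayerIndex {
    fn insert(&self, key: String, val: u64) {
        let should_drain = {
            let mut mem = self.mem.write().unwrap();
            mem.insert(key, val);
            mem.len() >= self.threshold
        };
        if should_drain {
            // try_lock: if another thread is already draining, skip instead of blocking
            if let Ok(_guard) = self.drain_lock.try_lock() {
                // Atomically swap the full buffer for a fresh empty one
                let snapshot = std::mem::take(&mut *self.mem.write().unwrap());
                let dead = self.tombstones.read().unwrap();
                let mut disk = self.disk.write().unwrap();
                for (k, v) in snapshot {
                    if !dead.contains(&k) {
                        disk.insert(k, v);
                    }
                }
            }
        }
    }

    fn get(&self, key: &str) -> Option<u64> {
        if self.tombstones.read().unwrap().contains(key) {
            return None;
        }
        // Newest data first: mem buffer, then the drained B-Tree
        self.mem.read().unwrap().get(key).copied()
            .or_else(|| self.disk.read().unwrap().get(key).copied())
    }
}
```

In moteDB the drain runs on a background thread rather than inline, but the key property is the same: a writer that loses the `try_lock` race just moves on, and readers never wait behind a bulk insert.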

There's also a TOCTOU fix: get() holds the same lock through both the tombstone filter and the LRU cache write, eliminating the race window where a key could be deleted between the two operations.
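A sketch of the idea behind that fix, with hypothetical names: one guard spans both the tombstone check and the cache write, so no delete can land between them.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

// Illustrative read path: tombstone filter and cache insert share one lock.
struct ReadPath {
    // (tombstones, cache) guarded together; HashMap stands in for the LRU
    state: Mutex<(HashSet<String>, HashMap<String, u64>)>,
}

impl ReadPath {
    fn get(&self, key: &str, fetch: impl Fn() -> u64) -> Option<u64> {
        // One guard covers the filter AND the cache write: no TOCTOU window
        let mut guard = self.state.lock().unwrap();
        let (tombstones, cache) = &mut *guard;
        if tombstones.contains(key) {
            return None; // deleted: never cache a resurrected value
        }
        Some(*cache.entry(key.to_string()).or_insert_with(fetch))
    }

    fn delete(&self, key: &str) {
        let mut guard = self.state.lock().unwrap();
        guard.0.insert(key.to_string());
        guard.1.remove(key);
    }
}
```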

The result speaks for itself:

```
bench_column_index runtime: 3+ hours → 6.6 seconds
```

Three Phases of Performance Work

Beyond the core lock contention, I spent the release cycle on three performance phases.

Phase 1: Memory Layout

  • Arc<DataEntry> eliminated full-row memcpy on every get()
  • Non-vector tables got their own BTreeMap instead of the generic wrapper, saving 24 bytes per row

At 100K rows, that's roughly 10MB of memory saved without touching any query logic.
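The Arc change can be illustrated like this (hypothetical names): `get()` returns a refcount bump, roughly the cost of one atomic increment, instead of copying the row payload.

```rust
use std::sync::Arc;

// Hypothetical row type; in moteDB this is the DataEntry struct.
struct DataEntry {
    cols: Vec<String>,
}

struct Table {
    rows: Vec<Arc<DataEntry>>,
}

impl Table {
    fn get(&self, i: usize) -> Option<Arc<DataEntry>> {
        // Arc::clone copies a pointer and bumps a refcount;
        // the row payload itself is never memcpy'd.
        self.rows.get(i).map(Arc::clone)
    }
}
```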

Phase 2: Syscall Reduction

  • DiskANN insertion path optimized
  • SQ8Vectors persistent file handle reuse

Every cache miss was triggering 2 syscalls. This phase eliminated that overhead at the I/O layer.
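A sketch of the handle-reuse pattern (hypothetical names): open the file once, then serve each cache miss with a positioned read. On Unix, `read_exact_at` maps to pread, so there's no open/close pair and no separate seek per miss.

```rust
use std::fs::File;
use std::os::unix::fs::FileExt; // positioned reads (pread) on an open handle

// Illustrative persistent-handle wrapper for a vector file.
struct VectorFile {
    handle: File, // opened once, kept for the lifetime of the index
}

impl VectorFile {
    fn read_at(&self, offset: u64, len: usize) -> std::io::Result<Vec<u8>> {
        let mut buf = vec![0u8; len];
        // pread takes &self, so concurrent readers need no lock
        // and no shared cursor to fight over.
        self.handle.read_exact_at(&mut buf, offset)?;
        Ok(buf)
    }
}
```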

Phase 3: Space Index and FTS

  • i-Octree now uses Morton codes for batch loading, with leaf nodes filled in order to minimize tree splitting
  • LSM scan_range() switched to streaming scan instead of materializing everything
  • FTS switched to append-only sharded writes, with delayed merge triggering when a shard hits 5 segments
  • Columnar predicate pushdown: decode the timestamp column first to locate rows, then decode target columns on demand — avoids decoding columns that were already filtered out
  • Spatial query row cache + removing per-row HashMap allocation: 8000x speedup on spatial range queries
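For illustration, here's Morton (Z-order) encoding in 2-D as a batch-load sort key; the octree uses the 3-D analogue. Interleaving the coordinate bits makes spatially close points adjacent in sort order, so leaf nodes fill sequentially instead of splitting all over the tree.

```rust
// 2-D Morton code: interleave the bits of x and y into one sortable u64.
fn morton2(x: u32, y: u32) -> u64 {
    // Spread the 32 bits of v out to the even bit positions of a u64.
    fn spread(v: u32) -> u64 {
        let mut v = v as u64;
        v = (v | (v << 16)) & 0x0000_FFFF_0000_FFFF;
        v = (v | (v << 8))  & 0x00FF_00FF_00FF_00FF;
        v = (v | (v << 4))  & 0x0F0F_0F0F_0F0F_0F0F;
        v = (v | (v << 2))  & 0x3333_3333_3333_3333;
        v = (v | (v << 1))  & 0x5555_5555_5555_5555;
        v
    }
    spread(x) | (spread(y) << 1) // x on even bits, y on odd bits
}
```

Sorting points by `morton2` before bulk insert is what lets the loader fill each leaf in order.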

The Audit That Found 28 Problems

I ran three rounds of adversarial auditing before this release. I'm glad I did.

The findings were... extensive:

  • B-Tree: split leaf index out-of-bounds panic
  • Async index pipeline: double-insert causing text index panic
  • WAL compression + DiskANN: 3 separate deadlocks
  • close(): not notifying background threads before checkpoint
  • Column/text index: querying before async pipeline finished building — no fallback
  • SUM precision loss: switched from floating-point accumulation to a two-pass compensation algorithm
  • BTreeMap scan: materializing all results at once causing memory spikes
  • Primary key query: index missing after restart, no fallback to scan
  • glibc arena: concurrent crash on explicit db.close()

The glibc arena one was particularly fun. Under heavy concurrent close() calls, the arena allocator would crash: the teardown path simply wasn't safe to run from multiple threads the way the code was using it. Fixed by not calling close() from multiple threads simultaneously. Obvious in hindsight.

Edge Devices Finally Get Love

moteDB targets embedded and edge hardware. v0.2.0 has dedicated optimizations for that:

  • EdgeIndexConfig: DiskANN now has bounded memory index configuration, limiting graph memory footprint
  • FTS bounded shard counter + VersionStore eviction: prevents memory growth during long-running operations
  • Dead code cleanup: removed ~2200 lines, reducing binary size
  • Zero clippy warnings: everything compiles clean

Testing at Scale

The new test infrastructure handles the concurrency edge cases:

  • wait_for_indexes_ready() — polls pending_index_batches atomic counter for deterministic index readiness
  • CI adaptive data scaling — detects CI environment and automatically reduces test data volume
  • 749 new test cases, running under 4-thread concurrency, completing in ~3 minutes with zero hangs
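A sketch of what a helper like wait_for_indexes_ready() can look like (the signature here is hypothetical): poll the atomic counter against a deadline, so a stuck pipeline fails the test quickly instead of hanging it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

// Poll a pending-batch counter until it reaches zero or the deadline passes.
// Returns true if the index pipeline drained in time.
fn wait_for_indexes_ready(pending: &AtomicUsize, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while pending.load(Ordering::Acquire) > 0 {
        if Instant::now() >= deadline {
            return false; // deterministic failure instead of a 3-hour hang
        }
        thread::sleep(Duration::from_millis(1)); // yield between polls
    }
    true
}
```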

The Numbers

Here's the full picture:

  • 35 commits, 89 source files changed
  • 28,118 lines added, 14,815 deleted
  • 11 performance optimizations
  • 21 bug fixes
  • 3 new features

The release is on crates.io: run cargo add motedb, or add motedb = "0.2.0" to your Cargo.toml.

If you're running moteDB on edge hardware or need a database that won't stall your queries while building indexes in the background, this one's worth upgrading to.

The benchmark suite no longer hangs.
