xulingfeng

Posted on May 27 • Edited on Jun 6

Everyone Cheered the Vector Database Demo. I Quietly Kept My SQLite Backup Running. 2 AM Called. It Answered.

#story

Everyone Cheered the Vector Database Demo. I Quietly Kept My SQLite Backup Running. 2 AM Called. It Answered.

This is a work of fiction. All characters, companies, and events are imaginary. Any resemblance to real people or situations is purely coincidental.

The all-hands meeting room had that new-startup smell — coffee, whiteboard markers, and ambition.

Marcus Webb stood at the front, clicking through slides with the confidence of someone who knew the room was already on his side. His architecture diagram was beautiful. Hexagons connected by arrows. A vector database fed by GPU-accelerated embeddings. Cosine similarity search. Semantic clustering.

"ChromaDB, BGE-M3 embeddings, PyTorch CUDA backend," he said, each buzzword landing like a drumbeat. "This is how you do AI memory at scale."

The room nodded. Our CTO Jordan asked two questions. Marcus answered both before she finished asking them.

I sat in the back, my laptop open to a SQLite schema I'd been tinkering with for three weeks. No GPU. No embeddings. No vector search. Just FTS5 full-text indexing and a jieba-based entity linker I'd hacked together on a Sunday afternoon.

"That's cute, Alex."

Marcus had caught me staring at my screen during his Q&A. He walked over, glanced at my terminal, and smiled the kind of smile that's meant to show superiority but really shows insecurity.

"SQLite, huh? For AI memory?" He said it loud enough for the front rows to hear. A few people chuckled.

"FTS5 is surprisingly fast for small datasets," I said. "And it doesn't need a GPU."

"Neither does a bicycle," Marcus shot back. "But you wouldn't use one in a Formula 1 race."

More laughter. I closed my laptop.

I didn't argue. That's the thing about these situations — the louder voice always wins in a meeting. The quieter one wins in production.

Over the next six weeks, Marcus led the charge on the vector memory layer. He called it "MemCore." It had an elegant architecture, a well-documented API, and a PyTorch dependency that took three hours of dependency hell to install on every new machine.

I built something else. A backup. A paranoid, nobody-asked-for-this, just-in-case fallback.

I called it sqlitemem. It was 185 lines of Python wrapped around a SQLite FTS5 table. Entity linking by regex patterns. Trust scoring by hand-tuned weights. RRF fusion to blend five signals into one relevance score.

I deployed it alongside Marcus's system in silent mode — same queries, same data, same load — writing results to a log file that nobody ever looked at.

I didn't tell anyone. Not because I was sneaky. Because if I had, Marcus would've laughed again. And honestly? I wasn't sure he was wrong.

The crash came on a Tuesday.

2:14 AM. My phone lit up with a PagerDuty alert from the on-call rotation I wasn't even part of.

FATAL: CUDA error - thread deadlock detected. MemCore service offline.

I checked the logs while still half-asleep. The vector layer had ingested a batch of 200 parallel memory writes — a routine load — and PyTorch's CUDA backend had deadlocked. All six embedding workers froze. The database connection pool saturated. The entire memory service went dark.

Jordan's message hit the engineering Slack channel two minutes later:

@channel MemCore is down. On-call is investigating. Board demo in 7 hours. Anyone with ideas, speak up.

Marcus replied within thirty seconds: Looking into it. CUDA race condition. Should have a fix in 2-3 days.

Two to three days.

The board demo was in seven hours.

I stared at my phone for exactly thirty seconds. Then I typed:

I have a fallback system running in parallel with identical data. Give me 5 minutes to route traffic.

Jordan: Wait — what fallback system?

The one I built when Marcus called it "cute."

Switching over took 2 minutes and 43 seconds. I changed a config file, restarted the agent, and watched the logs.

First query: 2ms. Hit.

Second query: 3ms. Hit.

Query 10: All five signals firing. FTS5 catching keywords. Entity linking finding cross-references. RRF producing a clean, ranked result set.

I ran the same benchmark I'd been silently logging for weeks, pulled the comparison, and pasted it into Slack:

                MemCore (vector)    sqlitemem (SQLite)
Recall@5        164%                198%
Precision@5     85%                 94%
MRR             0.85                0.94
First query     6.6s (GPU load)     2ms
Avg latency     12ms                3ms
GPU required    Yes                 No
Crash history   5 incidents         0

Jordan: Alex, what am I looking at?

The system you didn't know you had running in production for the last six weeks.

The Slack channel went silent for a full minute. I imagined Marcus staring at his phone, the words "198% recall" burning into his retina.

Seven hours later, the board demo went flawlessly. The agent answered every question, retrieved every relevant memory, and never once stuttered.

After the call, Jordan pulled me into a 1:1. She didn't ask about Marcus's system. She asked about mine.

"How long have you been running it?"

"Six weeks. Since the week Marcus called it a bicycle."

She laughed — a short, surprised sound. "And you didn't say anything?"

"What was I supposed to say? 'Hey, I built a better system in my free time'? That never works in meetings. It only works in production."

She leaned back. "You know that's not healthy, right? This culture where engineers have to prove things through silent competition instead of just ... talking."

"I know. But it's the culture we have. Not the one we want."

Jordan nodded. "Starting Monday, you're the lead on memory infrastructure. sqlitemem goes to production. Marcus reports to me directly on the vector rewrite — and he's using your architecture as the reference."

I didn't say "I told you so." I didn't need to. The benchmark numbers had already said it for me.

A month later, Marcus stopped by my desk.

Not to apologize — I never expected that. He handed me a printout of our latest latency metrics and said, "Your RRF fusion approach is cleaner than what I was doing with cosine similarity."

"Thanks," I said.

"That's all you're going to say?"

"SQLite. FTS5. 198% recall. No GPU." I shrugged. "The numbers talk. I don't need to."

He almost smiled. "You've been waiting to say that for six weeks, haven't you?"

"Maybe six and a half."

What I Actually Learned

This is a work of fiction, but it's built on a real dynamic I've seen play out in every company I've worked at:

1. The loudest solution is rarely the best. Marcus wasn't wrong about vector databases. He was wrong about assuming it was the only answer. Architecture is about constraints, not absolutes. A system with 1,000 records doesn't need the same tool as a system with 1 billion.

2. Paranoia is underrated as an engineering trait. I built sqlitemem because I didn't trust the "proven" solution. That distrust came from experience — from watching buzzword architectures crumble under real load. Being called paranoid feels bad. Being right feels better.

3. "Not scalable" doesn't mean "not useful." Everyone told me SQLite doesn't scale. They were right. But it doesn't need to. The dataset fits in memory. The indexes cover the queries. The latency is measured in milliseconds, not seconds. Premature scalability is the root of many expensive failures.

4. The best time to build a backup is before people tell you it's unnecessary. By the time the vector DB crashed, my fallback had already been running in production for a month and a half. I knew it worked. I knew the numbers. I didn't have to guess when the pressure was on.

What's your "SQLite backup" — the simple solution everyone laughed at, until it was the only thing that worked?

Follow for more stories about AI, engineering, and what happens when code meets corporate reality.

DEV Community

Everyone Cheered the Vector Database Demo. I Quietly Kept My SQLite Backup Running. 2 AM Called. It Answered.

Everyone Cheered the Vector Database Demo. I Quietly Kept My SQLite Backup Running. 2 AM Called. It Answered.

What I Actually Learned

Top comments (0)