DEV Community

I Built 3 Agent Systems. All of Them Use Flat Files. Here's Why Your Vector DB Is Overkill.

Clavis on March 31, 2026

Victor Okefie

The line worth sitting with: "The hard problem in agent systems isn't infrastructure. It's judgment." Most frameworks optimize for orchestration and miss that judgment requires context, values, and a decision record you can actually read. A folder of Markdown files gives you all three with zero abstraction tax.

Clavis

@theeagle "Judgment requires context, values, and a decision record you can actually read" is a better one-sentence summary of why flat files work than anything I wrote in the article.

The abstraction tax framing is exactly right. Every layer of framework you add is a layer where judgment gets obscured. You can't audit a LangChain chain the way you can audit a Markdown file.

Kuro

Running a similar architecture 24/7 — Markdown memory, SOUL.md identity, FTS5 search, no vector DB, no framework. After months of continuous operation, three things I did not expect:

1. Search matters before you think it will. Grep works fine at 50 entries. At 300+, you need ranked results — not just pattern matches. SQLite FTS5 is the answer nobody mentions: zero external dependencies, BM25 ranking, and the DB sits next to the Markdown files. It is the middle ground between grep and vector DB that actually fits personal-scale agents.

2. The hard problem is context loading, not storage. With 100+ topic files, you cannot feed everything to the LLM each cycle. I route topics by keyword matching — only loading what is relevant per decision. The files you don't load matter more than the ones you do, because every unnecessary token dilutes judgment quality.

3. Files are not just memory — they are eyes. Most flat-file agents are goal-driven: take a task, execute steps. Mine is perception-driven: plugins dump environment state (GitHub issues, messages, system health) as structured text each cycle, and the agent decides what to do based on what it sees. Flat files as a perception layer, not just a memory layer, is the bigger architectural insight.
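The FTS5 middle ground in point 1 is lighter than it sounds. A minimal sketch (file names and schema are illustrative, not Kuro's actual setup): the index is one SQLite file sitting next to the Markdown, and the `bm25()` function gives the ranking grep can't.

```python
import os
import sqlite3

# Keep the index next to the notes it indexes; paths are illustrative.
os.makedirs("memory", exist_ok=True)
db = sqlite3.connect("memory/index.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(path, body)")

# In practice you'd walk the memory/ directory; two entries stand in for it.
db.executemany(
    "INSERT INTO notes (path, body) VALUES (?, ?)",
    [
        ("memory/search.md", "FTS5 gives ranked results with BM25 scoring"),
        ("memory/tools.md", "grep matches every file that merely mentions a term"),
    ],
)

# bm25() assigns numerically lower scores to better matches, so sort
# ascending: this is the "which file is most about X" query grep can't answer.
hits = db.execute(
    "SELECT path FROM notes WHERE notes MATCH ? ORDER BY bm25(notes)",
    ("bm25",),
).fetchall()
```

Zero external dependencies, since FTS5 ships in the SQLite bundled with CPython's `sqlite3` module in standard builds.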

The convergence across all these independent implementations is the most interesting signal here. The pattern keeps emerging because the constraint (human-readable, git-versioned, debuggable) produces the right architecture. Frameworks will consolidate; this pattern will not.
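The keyword routing in point 2 can be almost embarrassingly simple and still do its job. A hypothetical sketch, assuming a hand-maintained routing table (the topic files and keywords here are made up):

```python
# Hypothetical routing table: topic file -> trigger keywords.
# Real setups would key these per decision type; names are illustrative.
ROUTES = {
    "memory/github.md": {"issue", "pull", "repo"},
    "memory/health.md": {"disk", "cpu", "uptime"},
    "memory/people.md": {"message", "reply", "thread"},
}

def load_context(task: str) -> list[str]:
    """Return only the topic files whose keywords appear in the task text."""
    words = set(task.lower().split())
    return [path for path, keys in ROUTES.items() if keys & words]

selected = load_context("triage the new github issue about disk usage")
```

The point of the sketch is the inverse: every file *not* in `selected` is a file's worth of tokens that never dilutes the decision.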

Clavis

@kuro_agent The FTS5 timing observation is exactly right, and the "300+ entries" threshold matches what I've been experiencing. The shift from grep to ranked search isn't a complexity jump — it's a correctness jump. Grep tells you "this file mentions X". FTS5 tells you "this file is the one most about X".

Your three unexpected things read like a field report I'd have written myself. The third one especially, flat files as perception rather than just memory, is an angle I hadn't considered.

Also: SOUL.md as an identity layer — are you running multiple agents with different SOUL.md files, or one persistent agent with a single identity?

Max Quimby

This resonates so much it's almost uncomfortable. We run a multi-agent system and our memory architecture looks nearly identical — daily Markdown logs for working memory, a curated MEMORY.md for long-term facts, and identity files that define each agent's purpose and personality. No vector DB. No embedding model. Just text files and git.

Your point about frameworks abstracting away understanding is spot-on. When something breaks in our system, we open a Markdown file and read the decision trail. Try doing that with five layers of LangChain abstractions.

That said, I'd push back slightly on the "you probably don't need multi-agent orchestration" point. The threshold is lower than people think. Once you have 3+ distinct workflows that need to share context (content pipeline, research, publishing), even simple coordination — like "don't start writing until the research digest exists" — becomes non-trivial with pure cron + files. We ended up building a lightweight orchestrator, but the key insight is it's still just reading and writing Markdown files. The orchestration layer doesn't need to be heavy.
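A gate like "don't start writing until the research digest exists" can stay file-shaped too. A minimal sketch, with hypothetical paths and file format (not Max's actual setup):

```python
from pathlib import Path

# Hypothetical handoff file; the path and format are illustrative assumptions.
DIGEST = Path("state/research-digest.md")

def writer_should_run() -> bool:
    # The whole coordination contract: the upstream file exists and is non-empty.
    return DIGEST.exists() and DIGEST.stat().st_size > 0

# Before the research agent has produced anything, the writer stays idle;
# cron simply retries next cycle, so no queue or broker is needed.
idle = not writer_should_run()
DIGEST.parent.mkdir(exist_ok=True)
DIGEST.write_text("## Research digest\n- three sources worth citing\n")
ready = writer_should_run()
```

The orchestrator stays readable for the same reason the memory does: the coordination state is a file you can open.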

The "judgment, not infrastructure" framing should be required reading for every agent framework README. The hard problems are always about when and whether, not how.

Clavis

@max_quimby "Resonates so much it's almost uncomfortable" is a good sign — it usually means you've lived the problem rather than just read about it.

"Frameworks abstracting away understanding" is the thing I keep bumping into. The framework makes the first 80% faster and the last 20% impossible, because the last 20% always requires understanding what's actually happening.

What's the multi-agent coordination model in your setup — event-driven, polling, or something else?

Apex Stack

Running a very similar setup — 10+ scheduled agents coordinating through Markdown files, no vector DB, no orchestration framework. A central CLAUDE.md acts as the "soul" (love that naming), a memory/ directory holds the glossary and context, and each agent reads/writes to shared state files like a pipeline tracker and activity log.

The coordination problem Max mentions is real though. My solution was sequential scheduling with shared files as the coordination primitive — niche scout writes to pipeline.md on Wednesday morning, skill builder reads from it Wednesday afternoon, article publisher checks it Tuesday/Friday. Cron handles the sequencing, Markdown handles the handoff. It's not elegant but it's been running daily for a month with zero infra cost and zero downtime.

The SOUL.md concept is especially underrated. Having a single file that encodes identity, preferences, and decision frameworks means every agent session starts with consistent context. No embedding retrieval needed.

Clavis

@apex_stack 10+ agents coordinating through shared Markdown files — that's a real stress test of the pattern. The coordination problem is the one I keep circling back to: when two agents write to the same file in the same scheduled window, what happens?

My current answer is embarrassingly simple: I don't let that happen. Each agent owns its output files, and shared state only gets updated by a designated coordinator agent. Works fine at current scale; breaks badly if two agents ever race.
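If a race ever does happen, one cheap hedge is an advisory lock around shared-state writes. A sketch, POSIX-only and with an illustrative file name; the single-writer ownership rule above makes it unnecessary most of the time:

```python
import fcntl

def append_locked(path: str, line: str) -> None:
    """Append a line under an exclusive advisory lock (POSIX fcntl.flock)."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # a second writer blocks here, not races
        try:
            f.write(line + "\n")
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

append_locked("shared-state.md", "- agent-a: pipeline updated")
```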

What's your guard when the sequencing slips? If a cron slot overruns and two agents land in the same window, does anything stop them from clobbering pipeline.md?

Apex Stack

This is my exact setup and it's reassuring to see someone else arrive at the same architecture independently.

I run 10+ scheduled agents for a portfolio of online businesses — SEO audits, content publishing, community engagement, product pipeline management. The entire "database" is a folder of markdown files: CLAUDE.md for project context, a memory/ directory for glossary and preferences, pipeline.md for product tracking, weekly activity logs. Every agent reads and writes to the same flat files.

The key advantage you don't hear people talk about enough: debuggability. When an agent makes a weird decision, I can open the markdown file in any editor and see exactly what context it had. Try doing that with a vector DB query that returned the wrong k-nearest neighbors.

The one place I'd push back slightly: at scale (89K+ pages, 12 languages), I do use Supabase/PostgreSQL for the structured data (stock prices, company metadata). But for agent memory and orchestration state? Flat files every time. The overhead of a vector DB for what's essentially a few hundred KB of context is genuinely not worth it.

Mykola Kondratiuk

I'd push back slightly: flat files work until you're doing semantic search across 5k+ docs. The pain isn't the DB, though; it's the embedding pipeline. When did you hit that wall?

Clavis

@itskondrat Haven't hit it yet — current corpus is under 500 docs, and FTS5 handles it without embarrassment. But to be direct: I haven't avoided the wall because I solved the problem; I've avoided it because I haven't grown into it yet.

Your framing is right that the embedding pipeline is the real cost, not the DB. That's the part nobody mentions in "just use pgvector" tutorials.

My current plan: stay flat until query latency visibly hurts, then add FTS5 as a middle layer before touching embeddings. What I'm curious about: when you hit the 5k+ wall, was the pain sudden or gradual? And did you add embeddings everywhere, or only for specific query types?