Everyone is building AI agents with custom tools, vector databases, and retrieval pipelines. We tried all of that. Then we replaced it with files and a bash command. Everything got better.
This is what we learned.
The setup
Silvia is an AI personal CFO. She connects to your bank accounts, brokerage, crypto wallets, and credit cards. She tracks your net worth, monitors your spending, reviews your portfolio, and gives you financial advice based on your actual numbers. Over $30 billion in connected assets across thousands of users.
The agent has to remember things across sessions. Your risk tolerance. Your tax situation. The fact that you asked about refinancing last month. The allocation model you agreed to. Every user's financial picture is different, and the agent needs to carry that context forward.
We started building this the way everyone does.
What we tried first
Like every team building this, we started with specialized tools. A tool to update personal info. A tool to track preferences. A tool to handle document uploads. A research tool. As we mapped out the next features (agent memory, scratchpad notes, per-user context), the list was growing fast.
The problems showed up early. The model would sometimes call the wrong tool. Routing accuracy degraded as the tool list grew. Token usage climbed because the model had to reason about which tool to call before it could do anything useful.
Every new feature meant another tool, another schema, another set of edge cases. We were spending more engineering time maintaining tool routing logic than building actual financial features.
The switch
So instead of building four more tools for memory, notes, context, and per-user state, we built one. A skills tool that gives the agent a real filesystem and a shell.
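In spirit, the whole tool surface collapses to something like the sketch below: one function that runs an arbitrary shell command inside a persistent, per-user directory. The names (`run_in_workspace`, `agent-workspaces`) are illustrative, not Silvia's actual implementation.

```python
import subprocess
from pathlib import Path

# Hypothetical sketch of a single "skills" tool: one entry point that runs
# any shell command inside a persistent, per-user workspace directory.
WORKSPACES = Path("agent-workspaces")

def run_in_workspace(user_id: str, command: str, timeout: int = 30) -> str:
    workdir = WORKSPACES / user_id            # per-user isolation
    workdir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        command, shell=True, cwd=workdir,
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr
```

Everything else — memory, notes, per-user context — becomes files the agent creates through that one entry point.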
The agent gets a persistent directory. It writes whatever it wants, however it wants. User preferences go in a file. Session notes go in a file. Financial summaries go in a file. The agent organizes its own workspace the way it sees fit.
Retrieval? grep. The agent searches its own files with the same Unix commands it saw billions of times during pretraining. No embeddings. No vector index. No retrieval pipeline.
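Concretely, a memory "write" is a file write and a memory "read" is a grep. A minimal sketch, with illustrative file and directory names:

```python
import subprocess
from pathlib import Path

# Memory writes are ordinary file writes into the agent's workspace.
ws = Path("workspace")
(ws / "sessions").mkdir(parents=True, exist_ok=True)
(ws / "preferences.md").write_text("risk_tolerance: moderate\n")
(ws / "sessions" / "2024-05.md").write_text(
    "Asked about refinancing the mortgage.\n"
)

# Retrieval is grep over the workspace -- no embeddings, no vector index.
hits = subprocess.run(
    ["grep", "-ri", "refinanc", str(ws)],
    capture_output=True, text=True,
).stdout
print(hits)
```

The grep output carries the matching file path along with the line, so the agent knows exactly where to read for more context.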
Three things happened immediately:
Token usage on tool routing dropped meaningfully. The model stopped spending tokens reasoning about which of several memory and notes tools to call. There's only one. Run a command or don't.
Accuracy went up. No more routing errors. No more "I stored this in the wrong place." The agent writes a file and reads it back later. The failure mode is almost nonexistent.
Development moved faster. We stopped building tool wrappers and started building financial features. The filesystem handled storage, retrieval, and organization without us writing any orchestration code.
The model was better at navigating its own files than it was at calling our purpose-built wrappers. That shouldn't have been surprising. LLMs have massive pretraining exposure to filesystems and Unix commands: cat, grep, find, awk, sed, ls, and mkdir are among the most common patterns in the training data. We were building abstractions on top of something the model already knew how to use.
Then we noticed something else
After a few weeks, the agent's workspaces started looking different for each user. Not because we programmed it that way. Because the agent organized its own notes based on what mattered to each person.
For someone in active trading mode, a directory of monitoring rules and price alerts emerged. For someone in retirement planning mode, files tracking distributions and tax timing. For someone running a company alongside personal finances, a clean separation between the two. The agent organized differently for each life situation.
Nobody told the agent to do this. It just organized its workspace in a way that made sense for each user's situation.
That's when we realized: the filesystem isn't just a storage layer. It's the agent's understanding of the user. The longer someone uses Silvia, the more their workspace reflects their financial life. And the more the workspace reflects their life, the better the agent gets at helping them.
Every session makes the next one better. Every saved preference is context the agent doesn't have to rederive. And every note adds switching cost: move to a different product and you lose all that accumulated understanding.
The convergence nobody expected
We thought we were just solving our own problem. Then we looked around.
Karpathy published his LLM Wiki pattern in April. 20.8 million views. Over 100 articles compiled by an LLM into markdown files with cross-references and an index. No RAG. No vector database. Just files.
Anthropic shipped auto memory for Claude. The implementation? Claude writes notes to files between sessions. File-based, not vector-based.
Letta benchmarked a filesystem-based agent on LoCoMo, the standard memory retrieval benchmark. Their agent, running on gpt-4o-mini with just grep, search_files, open, and close, scored 74.0%. That beat Mem0's top-performing graph variant (68.5%). Letta's takeaway: agents are far more effective at using filesystem-style tools than at using purpose-built memory APIs, because file operations dominate their training data.
Mintlify replaced their RAG-driven session boot with ChromaFs, a virtual filesystem that their agents query with grep, cat, and ls. Session creation time dropped from 46 seconds to 100 milliseconds. 460x faster.
Anthropic's Managed Agents docs describe memory stores as file-based, mounted inside the agent's container under /mnt/memory/, read and written with standard file tools.
The agentic AI orchestration and memory systems market is at $6.27 billion and is projected to hit $28.45 billion by 2030.
Researchers, labs, infrastructure builders, and us. Everyone is arriving at the same answer independently. The filesystem is the convergence point.
It's not just public. It's happening quietly everywhere.
I've been building a personal wiki with Claude myself. So have multiple developers and non-developers I know. Nobody coordinated. They all just started doing it.
In every conversation I have with people using LLMs, developers and non-developers alike, the same thing comes up. "I have this folder of markdown files that Claude reads." "I keep a context file that the agent loads at the start of every session." People are doing this everywhere. Most of them don't even realize everyone else is doing it too.
When this many people independently arrive at the same answer, it's probably the right one.
Why this is happening
LLMs already know how to use filesystems.
This is the part that sounds too simple to be true, but it explains everything. Large language models were trained on enormous amounts of code, documentation, and technical content. They've seen billions of examples of file operations. When you give an agent a filesystem and a shell, you're not teaching it something new. You're removing the abstractions that were standing between it and something it already understood.
Every custom tool you build is something the model has to learn. Every API wrapper is a new interface to reason about. Every retrieval pipeline is a new failure mode. The tool list grows, routing accuracy drops, token usage increases, and reliability suffers.
A filesystem with Unix tools has none of these problems. The model already knows the interface. It's been trained on it. The tools compose naturally. You can pipe grep into awk into sort. There's one consistent mental model instead of five different APIs.
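That composition is worth seeing once. A hypothetical example: the agent keeps price-alert notes in a file, and one pipeline counts alerts per ticker. File contents and layout are made up for illustration.

```python
import subprocess
from pathlib import Path

# Illustrative alert notes the agent might have written over time.
notes = Path("notes")
notes.mkdir(exist_ok=True)
(notes / "alerts.txt").write_text(
    "alert AAPL above 200\nalert AAPL below 150\nalert TSLA above 300\n"
)

# One pipeline: find alert lines, pull the ticker (second field),
# then count and rank occurrences.
pipeline = "grep alert notes/alerts.txt | awk '{print $2}' | sort | uniq -c | sort -rn"
out = subprocess.run(pipeline, shell=True, capture_output=True, text=True).stdout
print(out)
```

No purpose-built "aggregate my alerts" tool was needed; four standard utilities composed on the spot.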
This is why every team that tries the filesystem approach reports the same thing: simpler code, fewer errors, lower token usage, better results.
What nobody else is talking about
Here's what's missing from every "build your own second brain" article making the rounds this week.
All of them are local. The vault lives on your laptop. The files are in your home directory. It works for one person on one machine.
That breaks the moment you want to build this for users.
If you're building an agent product, you need:
Multi-tenancy. Each user's agent gets its own isolated workspace. User A's files never touch User B's files. Scoped API keys. Namespace isolation.
Cloud persistence. The files can't live on one machine. They need to be accessible from your web app, mobile app, API, and scheduled jobs. Across machines, across sessions, across model changes.
Security. If agents are processing financial documents, medical records, or legal contracts, the sandbox needs to be real. No outbound network. Process isolation. Audit logs.
Webhooks. When the agent writes a file, your backend needs to know. Real-time events on every write so you can trigger downstream workflows.
Snapshots and rollback. Did the agent write something wrong? Roll back the workspace to a known good state.
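For the webhook requirement, the standard pattern is an HMAC signature the receiver recomputes over the payload. A hedged sketch — the header name, secret format, and payload shape here are assumptions, not any product's documented scheme:

```python
import hashlib
import hmac

# Verify a webhook by recomputing the HMAC-SHA256 of the raw body with the
# shared secret and comparing it to the signature sent alongside it.
def verify_signature(payload: bytes, signature_hex: str, secret: bytes) -> bool:
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"  # hypothetical shared secret
body = b'{"event": "file.write", "path": "notes/session.md"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
assert verify_signature(body, sig, secret)
```

`compare_digest` matters here: a naive `==` comparison leaks timing information about how many leading characters match.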
None of this exists in Obsidian. None of it exists in a local directory. None of it exists in a bash script duct-taped to Claude Code.
What we built
We built TroveFiles to solve this for ourselves, then realized every agent builder we talked to had the same problem.
TroveFiles is a cloud-managed POSIX filesystem for AI agents. Three tools: trove_exec (any shell command), trove_read, and trove_write. That's it. The model already knows the interface. Each customer gets their own isolated namespace. Full multimodal toolchain preinstalled: pdftotext, ffmpeg, imagemagick, jq, exiftool, plus everything in coreutils. Files persist across sessions. HMAC-signed webhooks fire on every write. No outbound network from the sandbox. Snapshots and rollback are built in.
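To make the shape of that interface concrete, here is a local model of the three tools as plain functions. This only mirrors their shape for illustration — the real tools are MCP tools backed by a remote sandboxed namespace, and this is not their actual API.

```python
import subprocess
from pathlib import Path

ROOT = Path("namespace-demo")  # local stand-in for an isolated namespace

def trove_write(path: str, content: str) -> None:
    target = ROOT / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)

def trove_read(path: str) -> str:
    return (ROOT / path).read_text()

def trove_exec(command: str) -> str:
    # Any shell command, scoped to the namespace root.
    return subprocess.run(
        command, shell=True, cwd=ROOT, capture_output=True, text=True
    ).stdout

trove_write("notes/goals.md", "Max out 401(k); rebalance quarterly.\n")
print(trove_exec("grep -r 401 notes/"))
```

Three verbs the model already knows cold: write a file, read a file, run a command.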
It works with any MCP-compatible client. Claude Desktop, Cursor, and others. Setup is one command, or three lines of Python if you'd rather use the SDK directly.
The filesystem is the convergence point for AI agent memory. We built the infrastructure to make it work in production.
Shain Noor is CTO of ProCap Financial and founder of cfosilvia.com, an AI personal CFO with over $30B in connected assets. TroveFiles productizes the architecture that powers Silvia's skills and memory layer.