Juan Torchia

Posted on • Originally published at juanchi.dev

TigerFS: A Full Filesystem Inside PostgreSQL (And Why This Obsession Feels Like a Symptom)

POSIX defines 17 system calls for managing files. TigerFS implements all 17 of them on top of PostgreSQL tables. When I read that in the TigerFS README, I had to close my laptop, take a breath, and open it again.

Not because it's useful. It's clearly an experiment. But because someone sat down and mapped open(), read(), write(), mkdir(), unlink() — the whole thing — onto rows and columns in Postgres. And made it work.

Last year I shoved the entire Linux git history into a database and called it archaeology. Now someone did the inverse: took something that predates modern database management systems — the concept of a filesystem itself — and crammed it inside one. There's something about this collective obsession with putting everything inside everything else that I think is a symptom of something bigger.

I installed it. Broke it twice. And I think I understand why it exists.

TigerFS and the Idea Behind a Filesystem on Postgres

TigerFS is a userspace filesystem (FUSE) that uses PostgreSQL as its storage backend. That means when you write a file, it doesn't go to disk directly — it goes to a table. When you create a directory, you insert a row. When you delete a file, you run a DELETE.

The schema is elegant in its brutality:

-- Main inode table
CREATE TABLE inodes (
  inode_id    BIGSERIAL PRIMARY KEY,
  parent_id   BIGINT REFERENCES inodes(inode_id),
  name        TEXT NOT NULL,
  type        CHAR(1) NOT NULL, -- 'f' file, 'd' directory, 'l' symlink
  size        BIGINT DEFAULT 0,
  mode        INTEGER DEFAULT 493, -- 0755 in octal
  uid         INTEGER DEFAULT 0,
  gid         INTEGER DEFAULT 0,
  atime       TIMESTAMPTZ DEFAULT NOW(),
  mtime       TIMESTAMPTZ DEFAULT NOW(),
  ctime       TIMESTAMPTZ DEFAULT NOW()
);

-- Actual data lives here, partitioned into blocks
CREATE TABLE blocks (
  inode_id    BIGINT REFERENCES inodes(inode_id) ON DELETE CASCADE,
  block_num   INTEGER NOT NULL,
  data        BYTEA NOT NULL, -- real binary content
  PRIMARY KEY (inode_id, block_num)
);

-- Critical index — without this it's unusable
CREATE INDEX idx_inodes_parent_name ON inodes(parent_id, name);

Every filesystem operation translates to SQL. A file read is a SELECT data FROM blocks WHERE inode_id = ? ORDER BY block_num. A write is an INSERT ON CONFLICT UPDATE. An ls is a SELECT name FROM inodes WHERE parent_id = ?.

FUSE bridges the kernel's syscalls to these operations. Your program writes a file, the kernel calls FUSE, FUSE calls TigerFS, TigerFS talks to Postgres.
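To make that mapping concrete, here's a minimal sketch in Python of how a single write fans out into block rows. The names and structure are illustrative assumptions, not TigerFS's actual code; only the 4KB default block size comes from the project.

```python
# Sketch: how one write becomes (inode_id, block_num, data) rows.
# Illustrative names, not TigerFS internals.
BLOCK_SIZE = 4096  # TigerFS's default block size

def split_into_blocks(inode_id, payload, block_size=BLOCK_SIZE):
    """Yield one (inode_id, block_num, data) tuple per block of the file."""
    n_blocks = -(-len(payload) // block_size)  # ceiling division
    for block_num in range(n_blocks):
        start = block_num * block_size
        yield (inode_id, block_num, payload[start:start + block_size])

# A 10 KB write with 4 KB blocks becomes three rows: 4 KB, 4 KB, 2 KB.
rows = list(split_into_blocks(42, b"x" * 10_240))
print([(num, len(data)) for (_, num, data) in rows])
```

Each tuple would then land in the blocks table via an upsert keyed on (inode_id, block_num), one statement per block, which is exactly why large files turn into thousands of round trips.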

Installation, First Contact, and How I Broke It

I started with Docker because I'm not a masochist (or not that much of one):

# Spin up Postgres first
docker run -d \
  --name tigerfs-postgres \
  -e POSTGRES_PASSWORD=tigerfs \
  -e POSTGRES_DB=tigerfs \
  -p 5432:5432 \
  postgres:16

# Wait for it to actually be ready
until docker exec tigerfs-postgres pg_isready -U postgres >/dev/null 2>&1; do
  sleep 1
done

# Install FUSE dependencies on the host
# (on newer Debian/Ubuntu the packages are fuse3 and libfuse3-dev)
sudo apt-get install -y fuse libfuse-dev

# Clone TigerFS
git clone https://github.com/[repo]/tigerfs
cd tigerfs

# Build
make build

# Create the mount point
mkdir -p /tmp/tigerfs-mount

# Mount it
./tigerfs mount \
  --dsn "postgres://postgres:tigerfs@localhost:5432/tigerfs" \
  --mountpoint /tmp/tigerfs-mount

First problem: FUSE in non-root mode on modern Linux needs user_allow_other enabled in /etc/fuse.conf. Without that, only the user who mounted it can access it. In production that matters a lot. In a weekend experiment, I added it and moved on.

First real test:

# Write something
echo "hello tigerfs" > /tmp/tigerfs-mount/test.txt

# Verify it's actually in Postgres
psql -h localhost -U postgres tigerfs -c "
  SELECT 
    i.name,
    i.size,
    encode(b.data, 'escape') as content
  FROM inodes i
  JOIN blocks b ON i.inode_id = b.inode_id
  WHERE i.name = 'test.txt';
"

-- Result:
--   name   | size |    content
-- ---------+------+------------------
--  test.txt|   14 | hello tigerfs\012

There it is. A text file sitting inside a relational database. The \012 is the newline. Everything checks out.

How I broke it the first time: I tried copying a large binary file. A 50MB executable. TigerFS defaults to 4KB blocks, which means 12,800 INSERT statements for a single file. Postgres didn't complain. But the write took 40 seconds. For a 50MB file. That's when I understood we are very, very far from ext4.

How I broke it the second time: I left a transaction open in another psql session while writing from FUSE. Deadlock. The filesystem hung. I had to unmount manually with fusermount -u /tmp/tigerfs-mount and restart.

Both failures are expected. They're the right failures for an experiment.

The Gotchas Nobody Tells You About

Gotcha 1: FUSE and Docker aren't friends by default

If you run TigerFS inside a container, you need --privileged or at least --device /dev/fuse --cap-add SYS_ADMIN. Without that, FUSE can't mount anything.

# Without --device /dev/fuse and --cap-add SYS_ADMIN, the mount fails silently
docker run --device /dev/fuse --cap-add SYS_ADMIN tigerfs-image

Gotcha 2: Block size matters enormously

With 4KB blocks, writing large files is a latency nightmare. With 1MB blocks you dramatically improve throughput but waste space on small files. No silver bullet here — it's the exact same tradeoff as any real filesystem. Always has been.
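The tradeoff is easy to put in numbers. A quick sketch (illustrative arithmetic, not TigerFS internals), where each block is one row and one round trip:

```python
# How many block rows (and thus statements) a file needs per block size.
def blocks_needed(file_size, block_size):
    """Ceiling division: number of block rows for a file of this size."""
    return -(-file_size // block_size)

big, small = 50 * 1024 * 1024, 100  # a 50 MB binary and a 100-byte file

print(blocks_needed(big, 4 * 1024))      # 12800 rows at 4 KB blocks
print(blocks_needed(big, 1024 * 1024))   # 50 rows at 1 MB blocks
# Either way the 100-byte file fits in one block; with fixed-size
# (padded) blocks, a 1 MB block would waste almost all of its space on it.
print(blocks_needed(small, 1024 * 1024)) # 1
```

With variable-length BYTEA the last block isn't padded, so the waste is smaller in practice, but per-row and TOAST overhead still grow with block count.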

Gotcha 3: Indexes are everything

Without the composite index on (parent_id, name), an ls on a directory with 1,000 files does a full sequential scan of the inode table. I learned this the hard way. Same principle as always: most Postgres performance problems come down to missing indexes, not hardware.
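The reason that index is everything: resolving a path is a chain of exact-match lookups on (parent_id, name), one per component. A toy in-memory sketch (hypothetical inode numbers, not TigerFS code):

```python
# Toy version of the lookup the (parent_id, name) index serves.
ROOT = 1
inodes = {
    (ROOT, "home"): 2,
    (2, "juan"): 3,
    (3, "notes.txt"): 4,
}

def resolve(path):
    """Walk the tree one component at a time, like repeated
    SELECT inode_id FROM inodes WHERE parent_id = ? AND name = ?"""
    inode = ROOT
    for part in path.strip("/").split("/"):
        inode = inodes[(inode, part)]
    return inode

print(resolve("/home/juan/notes.txt"))  # 4
```

Without the composite index, each of those lookups is a sequential scan, and a three-component path triggers three of them.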

Gotcha 4: Transactions and atomicity

This is where it gets genuinely interesting. Unlike a traditional filesystem, TigerFS can wrap operations in real transactions. Write 10 files, fail on the 7th, roll back, and it's like nothing happened. ext4 doesn't give you that.
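The rollback behavior is easy to demo. Here's a sketch using SQLite as a stand-in for Postgres; the transaction semantics, not the schema, are the point:

```python
import sqlite3

# Write ten "files" in one transaction, fail on the 7th, roll back.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inodes (name TEXT PRIMARY KEY)")

try:
    with db:  # one transaction for the whole batch
        for i in range(1, 11):
            if i == 7:
                raise IOError("disk full (simulated)")
            db.execute("INSERT INTO inodes VALUES (?)", (f"file{i}.txt",))
except IOError:
    pass  # the context manager already rolled back

count = db.execute("SELECT COUNT(*) FROM inodes").fetchone()[0]
print(count)  # 0: the six files written before the failure are gone too
```

The `with db:` block commits on success and rolls back on an exception, which is the "write 10 files, fail on the 7th, and it's like nothing happened" behavior in miniature.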

Gotcha 5: mtime and atime are basically free

In normal filesystems, updating atime on every read is expensive — it implies a disk write. In TigerFS it's just a field update in Postgres, which can be optimized or disabled with a flag. A minor detail, but it shows the relational model brings advantages you wouldn't think of upfront.

Why This Exists: The Bigger Symptom

There's a tendency I think about a lot. I call it "abstraction as exploration."

It's not about building something useful. It's about understanding what happens when you break assumed layers. A filesystem exists at one level of abstraction. A database exists at another. Normally you don't mix them. TigerFS asks: what if we do?

Last year I did the same thing in the opposite direction with Linux's git history. I took data that normally lives in a git repo and put it inside Postgres so I could run SQL queries on it. Same energy. Different direction.

I see the same pattern in MegaTrain trying to train 100B LLMs on a single GPU: someone asking what happens if we ignore the assumed constraint. In Project Glasswing analyzing what's actually inside AI-generated code: questioning what we assume is safe.

These projects aren't for production. They're executable thought experiments. And executable thought experiments are how you actually learn.

After the Vercel-to-Railway migration I went through — a weekend that taught me more about real infrastructure than months of tutorials ever did — I get why people build these things. Sometimes you need to break the mental model to see its edges.

TigerFS shows you the edges of a filesystem. It says: look, a filesystem is basically a metadata tree plus data blocks. That's it. Postgres can represent that. The question isn't whether it can, but what you gain and what you lose.

What you lose: performance (dramatically), compatibility with system tools, operational simplicity.

What you gain: real transactions, SQL queries over metadata, built-in replication, consistent backups with pg_dump, direct SQL access to your data. If you have a use case where those advantages outweigh the downsides — and they exist, especially in embedded systems or environments where you already have Postgres and need structured storage — TigerFS or something inspired by it makes sense.

Think document management systems. Or data pipelines where the filesystem is a coordination layer between processes. Or testing, where you want a filesystem you can inspect with SQL after your test runs. Suddenly the experiment starts having real applications.

FAQ: Filesystem on Postgres, FUSE, and TigerFS

Is TigerFS production-ready?

No, at least not in its current state. Write times for large files are orders of magnitude slower than a native filesystem. It's designed as an experiment and proof of concept. That said, the principle behind it — file metadata managed by a database — does exist in production: object stores and distributed filesystems routinely keep their metadata layer in database-like systems.

How does FUSE actually work?

FUSE (Filesystem in Userspace) pairs a Linux kernel module with a userspace library so you can implement a filesystem without touching kernel code. When an application calls open("/tmp/tigerfs-mount/file.txt"), the kernel sees that the path is mounted with FUSE and delegates the call to your userspace program. Your program responds, and the kernel returns the result to the application. The magic is that the application has no idea it's talking to Postgres — it thinks it's talking to a normal filesystem.

What's the real advantage of storing files in Postgres vs. disk?

Depending on your use case: ACID transactions (write 100 files and roll back if something fails), SQL queries over metadata (find all files modified in the last 24 hours with a simple SELECT), automatic replication if you already have Postgres replicated, and consistent backups with pg_dump. For most cases, native filesystem wins by a mile. But for specific cases — especially process coordination or auditing — the database wins.

Why do experiments like TigerFS matter if they're not used in production?

Because they're the best teachers of fundamentals. Implementing a filesystem forces you to understand what an inode is, why blocks exist, how the directory tree works. Implementing it on Postgres forces you to understand what Postgres does well and what it does badly. You don't learn that by reading documentation — you learn it by breaking things. The same principle applies to not blindly trusting AI-generated code: you need to understand the layers below to know what's actually happening.

What's the difference between TigerFS and just storing files as BLOBs in Postgres?

Good question. Storing BLOBs in Postgres is a known practice (and sometimes a valid one). TigerFS goes further: it implements the complete semantics of a filesystem — permissions, timestamps, nested directories, symlinks, atomic operations. It's not just file storage, it's a complete filesystem with its metadata tree, its block system, and its integration with the kernel's VFS via FUSE. The difference is like comparing storing HTML in a TEXT column versus implementing a full web server.

Could something like this work for testing or CI?

This is the application I find most legitimately compelling. Imagine a test that writes files to a TigerFS filesystem, runs, and then you can do SELECT * FROM inodes WHERE mtime > NOW() - INTERVAL '10 seconds' to see exactly which files your program touched. Or you can roll back the entire filesystem between tests with ROLLBACK. That's not trivial with a normal filesystem — you'd need something like overlayfs or tmpfs with custom logic. With TigerFS you get it for free.
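As a sketch of that workflow (SQLite standing in for Postgres, with an illustrative schema and made-up timestamps): snapshot a cutoff, run the "test", then ask the filesystem's own metadata what it touched.

```python
import sqlite3

# Pre-existing files, then one file written "during the test".
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inodes (name TEXT PRIMARY KEY, mtime REAL)")
db.executemany("INSERT INTO inodes VALUES (?, ?)",
               [("old.txt", 10.0), ("stale.log", 20.0)])

cutoff = 100.0  # timestamp taken just before the test runs

# the test writes one file through the mounted filesystem
db.execute("INSERT INTO inodes VALUES (?, ?)", ("output.json", 150.0))

touched = [name for (name,) in db.execute(
    "SELECT name FROM inodes WHERE mtime > ?", (cutoff,))]
print(touched)  # ['output.json']
```

The same query shape against TigerFS's real inodes table is the "which files did my program touch" assertion from the paragraph above.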

Closing: The Obsession Worth Having

I'm not going to use TigerFS in production. I wouldn't recommend it for anything that matters. But I'm going to keep it installed because every time I get stuck thinking about a storage or metadata problem, I can open psql, query the filesystem, and see the structure from a completely different angle.

Something the years have taught me — from the days diagnosing network outages at a cyber café at 11pm to nuking a production server with rm -rf in my first week — abstraction layers are agreements, not truths. A filesystem is an agreement. A database is an agreement. When you break those agreements in a controlled way, in an experiment, on a weekend, with no real consequences, you learn where the edges are.

TigerFS is that exercise. And I think it's absolutely worth doing.

If you're interested in the angle of shoving data into places it "shouldn't" go, the post on Linux's git history in a database is the natural companion to this one. And if you're worried about dependency on external tools — which is the real cost when experiments become production — the post on Anthropic and vendor lock-in in AI APIs has the same DNA.

Break things. In controlled environments. With pg_dump first.
