Anant Tyagi

Posted on Mar 23

Why File Systems Are Hard to Debug

#linux #monitoring #performance #systems

I’m building a file system from scratch. Not because I need one—but because debugging what I can’t see is guesswork.

Understanding what happens at the file system level is my first step toward kernel-level observability with eBPF.

Most file systems work fine—until they don’t.

When something slows down or behaves unexpectedly, you don’t really know why. You just see symptoms: high disk usage, latency spikes, random slowdowns.

The problem is simple. The file system is a black box.

You can monitor CPU. You can track memory. You can inspect processes.

But what actually happens inside the file system—between a read, a write, and the disk—is mostly invisible.

That’s where things break.

Debugging turns into guessing.

You don’t know:

which file caused the issue
which process triggered it
where the latency actually happened

And that’s not a tooling problem. It’s a visibility problem.

So instead of just studying file systems, I decided to build one.

Not for performance. Not for production.

But for visibility.

The goal is simple:

track every operation
measure latency
connect file activity to what caused it

Make the file system explain itself.
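The three goals above can be illustrated in user space before touching any kernel code. The Python below is a minimal sketch, not the file system this series builds. It assumes every operation goes through a wrapper, and the names OpRecord, traced, TRACE_LOG, write_file, and read_file are hypothetical. Each call is logged with the operation, the file path, the calling process ID, the latency, and whether it succeeded.

```python
import os
import tempfile
import time
from dataclasses import dataclass
from functools import wraps


# Hypothetical record of one file operation: what ran, on which file,
# from which process, how long it took, and whether it succeeded.
@dataclass
class OpRecord:
    op: str
    path: str
    pid: int
    latency_us: float
    ok: bool


TRACE_LOG: list[OpRecord] = []  # hypothetical in-memory trace buffer


def traced(op_name):
    """Wrap a file operation whose first argument is a path, and log every call."""
    def decorator(func):
        @wraps(func)
        def wrapper(path, *args, **kwargs):
            start = time.perf_counter_ns()
            ok = False
            try:
                result = func(path, *args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on error, so failed calls are logged too.
                elapsed_us = (time.perf_counter_ns() - start) / 1000
                TRACE_LOG.append(OpRecord(op_name, path, os.getpid(), elapsed_us, ok))
        return wrapper
    return decorator


@traced("write")
def write_file(path, data: bytes) -> int:
    with open(path, "wb") as f:
        return f.write(data)


@traced("read")
def read_file(path) -> bytes:
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "demo.txt")
        write_file(p, b"hello")
        read_file(p)
        try:
            read_file(os.path.join(d, "missing.txt"))  # fails, but is still traced
        except FileNotFoundError:
            pass
    for r in TRACE_LOG:
        print(f"{r.op:5} pid={r.pid} {r.latency_us:8.1f}us ok={r.ok} {r.path}")
```

A wrapper like this only sees calls that pass through it. Capturing every process's file activity, including what happens between a write and the disk, is the job the file system itself (and later eBPF) would have to take on.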

This is where I start.

This is part of a larger series where I’ll be building low-level system tools from scratch—step by step—as I work toward understanding how an operating system really comes together.

The file system is just the beginning.

In this series, I’ll explore:

how data is stored and managed
how processes interact with the system
how system behavior can be observed and debugged
and how to make these internals visible instead of opaque

The goal isn’t to build a production-ready OS.

The goal is to understand systems deeply—and make them observable.

Along the way, I’ll connect these ideas to kernel-level observability using eBPF.

Next: starting with the disk layer.

Top comments (1)

Marius-Florin Cristian • Apr 4

they are crazy hard to debug. just check man 2 rename and go to the RENAME_EXCHANGE flag :< that is the craziest one I had to deal with.

and strace logs are hard to follow: they are very verbose, some calls are expected to return error codes, but some failures cascade into other things, and at the end of the day it is just pain :D