DEV Community

Peter Teoh

Difference between filesystem and database at the low level

"At the low level" here means at the level of the CPU and the storage device's blocks.

Filesystems and databases are more similar than people think: both are storage managers that try to provide consistency, durability, and concurrency over bytes or records. Below we describe how a database's ACID properties (Atomicity, Consistency, Isolation, and Durability) map onto filesystem internals.


1. Atomicity

  • Database view: A transaction is “all or nothing.” Either all SQL operations commit, or none.
  • Filesystem equivalent:

    • rename(2) in POSIX is atomic: the new filename is guaranteed to either point to the new inode or the old one, never half-written.
    • Journaling filesystems (ext4, XFS, NTFS) use a write-ahead log; copy-on-write (COW) filesystems such as btrfs never overwrite live blocks in place. Example:
    • ext4’s journaling code in fs/jbd2/ first writes the modified metadata blocks to the journal, then writes a commit record that marks the transaction committed.
    • If a crash occurs, journal replay ensures that either the old inode/blocks or the fully new ones are visible, never a mix.
    • Code path: in Linux ext4, see fs/ext4/namei.c: ext4_rename() → calls journaling routines in fs/jbd2/transaction.c for atomic commit.
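
The atomic rename is what applications build on when they need an all-or-nothing file update. Below is a minimal sketch of the common write-to-temp-then-rename pattern, assuming a POSIX system; the function name atomic_write and the ".tmp-" prefix are illustrative choices, not part of any library.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # because rename(2) is only atomic within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # Python buffer -> kernel page cache
            os.fsync(tmp.fileno())   # page cache -> storage device
        os.replace(tmp_path, path)   # rename(2): atomic switch of the name
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry (the rename itself) as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as atomic_write("settings.json", payload). The fsync before the rename matters: without it, a crash could leave the new name pointing at a file whose data never reached the disk.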

2. Consistency

  • Database view: A transaction moves the DB from one valid state to another (constraints preserved).
  • Filesystem equivalent:

    • Journaling or COW ensures metadata integrity: block allocation bitmaps, inodes, directory entries are always in a self-consistent state after crash recovery.
    • Example: ext4 groups the metadata updates of one operation under a journal handle (handle_t, i.e. struct jbd2_journal_handle; see fs/jbd2/transaction.c) and replays them fully or not at all.
    • Constraints in FS: free block counters match actual free blocks, no directory entry points to garbage, reference counts are accurate. This is similar to a DB enforcing primary key/foreign key rules.
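
The "fully or not at all" replay rule can be illustrated with a toy model. This is a sketch only: the record layout, the replay function, and the "free_blocks"/"inode7_blocks" keys are invented for illustration and do not mirror jbd2's on-disk format.

```python
from typing import Dict, List, Tuple

# A journal record is (txn_id, kind, key, value); kind is "write" or "commit".
Record = Tuple[int, str, str, int]


def replay(journal: List[Record], disk: Dict[str, int]) -> Dict[str, int]:
    """Apply only transactions whose commit record reached the journal."""
    committed = {txn for txn, kind, _, _ in journal if kind == "commit"}
    state = dict(disk)
    for txn, kind, key, value in journal:
        if kind == "write" and txn in committed:
            state[key] = value
    return state


# Toy "metadata": a free-block counter and one inode's block count.
disk = {"free_blocks": 100, "inode7_blocks": 0}
journal = [
    (1, "write", "free_blocks", 96),
    (1, "write", "inode7_blocks", 4),
    (1, "commit", "", 0),
    (2, "write", "free_blocks", 90),  # crash happened before txn 2 committed
]
recovered = replay(journal, disk)
```

Transaction 1 is applied as a whole and the half-logged transaction 2 is ignored, so the invariant "free blocks plus allocated blocks equals 100" still holds after recovery.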

3. Isolation

  • Database view: Concurrent transactions don’t interfere (enforced by locks, MVCC).
  • Filesystem equivalent:

    • Locks: VFS provides inode-level locks (struct inode->i_rwsem, struct file_lock) to serialize writers and coordinate readers.
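    • Concurrent access: if one process writes a new file and renames it over an old one, a concurrent reader that opens the name gets either the old inode or the new one. The VFS serializes the directory update by locking the affected directories during the rename.
    • ext4 tracks the current journal handle per task (in current->journal_info), so each thread's operation is a separate unit, although many handles join the same running transaction and commit together.
    • Isolation is weaker than DB MVCC: the FS usually serializes metadata operations but gives no snapshot isolation for data blocks. A reader can see a partially completed multi-write update; O_SYNC and O_DIRECT change when data reaches the device, not what concurrent readers see. Applications that need more use advisory locks (flock(2), fcntl(2)) or the write-then-rename pattern.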
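
Because the kernel does not isolate data writes from each other, cooperating processes usually add their own advisory locks. The sketch below uses flock(2) through Python's fcntl module, assuming a Unix-like system and a local filesystem; try_exclusive_lock is a hypothetical helper name, and two file descriptors opened in one process stand in for two processes.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(fd: int) -> bool:
    """Try to take an exclusive advisory lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


# Two independent open file descriptions of the same file stand in for
# two processes; flock(2) locks belong to the open file description.
tmp_fd, path = tempfile.mkstemp()
os.close(tmp_fd)
writer_a = os.open(path, os.O_RDWR)
writer_b = os.open(path, os.O_RDWR)

a_locked = try_exclusive_lock(writer_a)              # first writer gets the lock
b_locked_while_held = try_exclusive_lock(writer_b)   # second writer is refused
fcntl.flock(writer_a, fcntl.LOCK_UN)                 # first writer releases
b_locked_after_release = try_exclusive_lock(writer_b)

os.close(writer_a)
os.close(writer_b)
os.unlink(path)
```

The lock is advisory: a process that never calls flock can still read or write the file, which is one concrete way filesystem isolation is weaker than a database's.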

4. Durability

  • Database view: Once commit is acknowledged, data survives power loss.
  • Filesystem equivalent:

    • Journaling (ext4/NTFS) or COW (btrfs, ZFS) ensures committed changes are persisted.
    • In Linux ext4: jbd2_journal_commit_transaction() flushes journal blocks with blkdev_issue_flush() before reporting success.
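    • btrfs and ZFS are copy-on-write: they write new data and metadata blocks to free space, leave the old blocks untouched, and only then switch to a new tree root (btrfs uses COW B-trees; ZFS uses a tree of checksummed block pointers). After a crash, the filesystem mounts the last fully written root, so a half-finished update is never referenced.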
    • Applications rely on fsync(2) or fdatasync(2) to push buffers to disk—similar to DB COMMIT.
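
From the application side, "commit" means not returning until fsync(2) has succeeded. Here is a minimal sketch of a durable append, assuming a POSIX system; durable_append and the log file name are hypothetical, chosen only for this example.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append one record and return only after fsync(2) has succeeded."""
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record + b"\n")
        while len(view) > 0:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        # Until this returns, the record may exist only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)


log_path = os.path.join(tempfile.mkdtemp(), "app.log")
durable_append(log_path, b"txn 1 committed")
durable_append(log_path, b"txn 2 committed")
```

os.fdatasync(fd) could replace os.fsync(fd) on platforms that provide it when file metadata such as timestamps need not be flushed. For a newly created file, the containing directory should also be fsync'd so that the directory entry itself survives a crash.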

5. Mapping Table

| ACID Property | Database | Filesystem Equivalent | Source Code (Linux/ext4) |
| --- | --- | --- | --- |
| Atomicity | Transaction commit/rollback | rename(2), journaling of inode/dir updates | fs/ext4/namei.c:ext4_rename() |
| Consistency | Constraints preserved | Journaling replays to keep block/inode maps valid | fs/jbd2/transaction.c |
| Isolation | Locks/MVCC | Inode/file locks, per-handle journaling isolation | fs/inode.c, include/linux/fs.h |
| Durability | WAL + fsync | Journal flush (jbd2_journal_commit_transaction) or COW flush | fs/jbd2/commit.c |

6. Key Difference

  • Databases implement logical consistency (foreign keys, uniqueness).
  • Filesystems implement structural consistency (inodes, block bitmaps).
  • DB durability guarantees apply once commit returns; FS durability applies only once the application calls fsync. Many applications forget fsync, leading to “committed but lost” anomalies, where data the application considered saved is gone after a crash.

Database ACID properties map fairly well onto modern journaling or COW filesystems. A filesystem is, in effect, a weaker transactional database specialized for file/inode structures. The kernel code shows the same primitives: logging, locking, atomic renames, and recovery replays.
