Jo

# Why I Bypassed FUSE: Building a Transparent Data-Tiering Engine in Rust

If you run a home lab or manage large datasets, you’ve hit this wall: NVMe drives are fast but too expensive to hoard data on. Hard drives or cloud buckets are cheap, but they are slow and a pain to manage manually.

The enterprise world solves this with HSM (Hierarchical Storage Management): automatically shuffling "cold" data to slow storage while keeping a transparent stub on the fast drive. But enterprise HSMs cost thousands of dollars and lock your data in proprietary black boxes.

I wanted this for Linux, for free. So I started building HuskHoard, an open-source data-tiering engine.

My first thought, like almost every Linux developer building a virtual filesystem, was to use FUSE (Filesystem in Userspace). But I quickly realized FUSE was the wrong tool for the job. Here is why I abandoned it, and how I used the Linux fanotify API and Rust to build a transparent archiving engine that adds zero overhead for hot data.

The Problem with FUSE

FUSE is fantastic for creating custom filesystems like SSHFS or mounting an S3 bucket. But for an HSM, it creates a massive bottleneck.

When you use FUSE, every single read and write has to cross the kernel/userspace boundary twice, with a context switch each way:
Application -> Kernel -> FUSE daemon (userspace) -> Kernel -> Physical drive.

If 90% of your data is "hot" (actively being used on your fast NVMe), forcing it through FUSE overhead completely defeats the purpose of buying expensive NVMe drives in the first place. You sacrifice native I/O performance just to manage the 10% of "cold" data.

I needed a solution where the Hot data ran at native speed, touching nothing but the XFS/Ext4 kernel drivers.

The Solution: Enter fanotify

Instead of intercepting every transaction via FUSE, I realized I only needed to intervene in one specific scenario: When a user tries to open a file that has been archived.

Linux has a kernel API called fanotify, originally designed for antivirus scanners. It allows a userspace program to monitor a mount point and, crucially, block an application from opening or reading a file until the daemon says it’s okay.

Here is how HuskHoard uses fanotify to create transparent tiering:

  1. The Janitor: A background Rust thread scans my NVMe drive. When it finds a file that hasn’t been touched in 30 days, it compresses it (Zstd) and moves the payload to a cheap HDD, LTO tape, or S3 bucket.
  2. The Husk (Stub): It leaves the original file on the NVMe drive but truncates its allocated size to 0 bytes, creating a sparse file. To the OS and the user, the file still looks like it’s 50 GB and sits in /home/movies.
  3. The Interceptor: This is where fanotify shines. The HuskHoard daemon listens for FAN_ACCESS_PERM events. If VLC media player tries to read that Husk file, the kernel pauses VLC’s read until the daemon responds.
  4. The Recall: HuskHoard intercepts the request, streams the 50 GB payload from the tape/S3 bucket back into the sparse file on the NVMe, and then tells fanotify to allow VLC to proceed.
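The Janitor, Husk, and Recall steps above can be sketched with nothing but the Rust standard library. This is a rough illustration under stated assumptions, not HuskHoard's actual code: the helper names `is_cold`, `make_husk`, and `recall` are hypothetical, coldness is judged by access time (which a `noatime` or `relatime` mount may not keep current), and the stub is produced by truncating to zero and extending back with `set_len`, which leaves a hole on most Linux filesystems. The real engine may instead punch holes or use a different idle test.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read};
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Hypothetical Janitor check: true if the file was last accessed more than
/// `max_idle` before `now`.
fn is_cold(path: &Path, max_idle: Duration, now: SystemTime) -> io::Result<bool> {
    let accessed = fs::metadata(path)?.accessed()?;
    // duration_since fails if `accessed` is later than `now`; treat that as hot.
    Ok(now
        .duration_since(accessed)
        .map(|idle| idle > max_idle)
        .unwrap_or(false))
}

/// Hypothetical Husk step: discard the file's data but keep its logical length.
/// Returns the preserved length in bytes.
fn make_husk(path: &Path) -> io::Result<u64> {
    let file = OpenOptions::new().write(true).open(path)?;
    let len = file.metadata()?.len();
    file.set_len(0)?; // drop all data
    file.set_len(len)?; // extend back; the new range reads as zeros
    Ok(len)
}

/// Hypothetical Recall step: stream the payload back into the stub from
/// offset 0. Returns the number of bytes written.
fn recall(path: &Path, payload: &mut impl Read) -> io::Result<u64> {
    let mut file = OpenOptions::new().write(true).open(path)?;
    let written = io::copy(payload, &mut file)?;
    file.sync_all()?; // make sure the data is on disk before the reader resumes
    Ok(written)
}
```

In this sketch the payload must already have been copied (and, in the real system, compressed) to the cold tier before `make_husk` runs, because the truncation is destructive.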

VLC thinks it just opened a local file. It has no idea the data was fetched from an S3 bucket moments earlier.

The Rust Implementation

Rust was the obvious choice for this. When you are blocking kernel-level I/O requests, memory safety and predictable latency are non-negotiable.

Handling the fanotify loop requires a few specific Linux capabilities (specifically CAP_SYS_ADMIN), but Rust allows us to safely manage the multithreaded heavy lifting of the Archive Worker.

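use std::sync::Arc;

pub fn run_interceptor(config: Arc<HuskConfig>, use_direct_io: bool) -> std::io::Result<()> {
    let watch_dir = &config.hot_tier;
    let db_path = &config.db_path;
    info!("\n[Daemon] Starting fanotify interceptor on '{}'...", watch_dir);
    let abs_dir = std::fs::canonicalize(watch_dir)?;

    // Permission events (FAN_*_PERM) are only delivered to groups created with
    // FAN_CLASS_CONTENT or FAN_CLASS_PRE_CONTENT; O_RDWR applies to the file
    // descriptors the kernel hands us with each event.
    let fan_fd = unsafe {
        libc::fanotify_init(libc::FAN_CLASS_PRE_CONTENT, libc::O_RDWR as u32)
    };
    if fan_fd < 0 {
        let err = std::io::Error::last_os_error();
        error!("fanotify_init failed: {}. Missing Root or Capabilities!", err);
        return Err(err);
    }

    // FAN_ACCESS_PERM: hold reads until the daemon answers.
    // FAN_CLOSE_WRITE: notify when a file opened for writing is closed.
    // FAN_EVENT_ON_CHILD: also report events for files inside marked directories.
    let mark_mask = libc::FAN_ACCESS_PERM | libc::FAN_CLOSE_WRITE | libc::FAN_EVENT_ON_CHILD;

    // 1. Recursively mark the root watch directory and all current subdirectories
    info!("[Daemon] Scanning and attaching listeners to all subdirectories...");
    mark_directory_recursive(fan_fd, &abs_dir, mark_mask, &config);

    // ... (the excerpt ends here; the rest of the function is not shown)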
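The excerpt stops before the event loop, so here is a minimal, std-only sketch of what handling one permission event could look like. It is an illustration, not the project's code: `Event`, `parse_event`, `encode_response`, and `decide` are hypothetical names, and the `recall_ok` flag stands in for whatever the real recall reports. The constants and byte layouts follow `<linux/fanotify.h>` (`struct fanotify_event_metadata` is 24 bytes; `struct fanotify_response` is an `fd` followed by a `response`).

```rust
// Values from <linux/fanotify.h>.
const FAN_ACCESS_PERM: u64 = 0x0002_0000;
const FAN_ALLOW: u32 = 0x01;
const FAN_DENY: u32 = 0x02;
const EVENT_METADATA_LEN: usize = 24;

/// The fields of `struct fanotify_event_metadata` this sketch cares about.
#[derive(Debug)]
struct Event {
    mask: u64, // which event(s) fired
    fd: i32,   // open descriptor for the accessed file
    pid: i32,  // process that triggered the event
}

fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[at..at + 8]);
    u64::from_ne_bytes(raw)
}

fn read_i32(buf: &[u8], at: usize) -> i32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&buf[at..at + 4]);
    i32::from_ne_bytes(raw)
}

/// Decode one event from the bytes read off the fanotify descriptor.
/// Layout: event_len u32 | vers u8 | reserved u8 | metadata_len u16 |
///         mask u64 | fd i32 | pid i32
fn parse_event(buf: &[u8]) -> Option<Event> {
    if buf.len() < EVENT_METADATA_LEN {
        return None;
    }
    Some(Event {
        mask: read_u64(buf, 8),
        fd: read_i32(buf, 16),
        pid: read_i32(buf, 20),
    })
}

/// Encode `struct fanotify_response { __s32 fd; __u32 response; }`.
fn encode_response(fd: i32, response: u32) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&fd.to_ne_bytes());
    out[4..].copy_from_slice(&response.to_ne_bytes());
    out
}

/// Hypothetical policy: answer only permission events, allowing the read if
/// the recall succeeded and denying it otherwise.
fn decide(event: &Event, recall_ok: bool) -> Option<[u8; 8]> {
    if event.mask & FAN_ACCESS_PERM == 0 {
        return None; // plain notification (e.g. FAN_CLOSE_WRITE): no reply expected
    }
    let verdict = if recall_ok { FAN_ALLOW } else { FAN_DENY };
    Some(encode_response(event.fd, verdict))
}
```

In a real loop the reply bytes are written back to the fanotify descriptor with `write(2)`, and the event's `fd` is closed afterwards. Until that reply arrives, the reading process stays blocked in the kernel.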

Escaping Vendor Lock-in: The "Easy Exit" Promise

One of the biggest issues with commercial HSMs is that if the daemon dies, your data is effectively gone, trapped behind proprietary metadata.

Because I was building this for the open-source community, I enforced a strict "Easy Exit" architecture:

  • Payload data is stored in standard Zstd streams verified by BLAKE3.
  • The catalog metadata (the "Brain" tracking where the cold bytes live) is an SQLite database.
  • You can natively export the entire catalog to Apache Parquet.

This means if you decide to stop using HuskHoard, you don’t need my software to get your data back. You can query your catalog with DuckDB or Python and manually extract your Zstd archives.

What’s Next?

Building HuskHoard has been a massive deep dive into Linux kernel APIs and SCSI tape drivers (yes, it natively supports physical LTO drives via /dev/nstX to prevent tape shoe-shining).

The engine currently supports automated replication across local drives, tapes, and rclone-supported cloud buckets.

If you are a Rust developer, a home lab data hoarder, or just interested in Linux storage architecture, I’d love your feedback or code reviews. It is fully AGPL v3 licensed.

Check out the repo here: [GitHub HuskHoard](https://github.com/huskhoard/huskhoard)
More architecture details: [HuskHoard Blog](https://www.huskhoard.com/blog.html)