DEV Community

dw1

Posted on • Originally published at dw1.io

Introducing mmapfile: Unlock Fast File Access in Go with Memory-Mapped I/O

Hey everyone! If you're a Go developer who's ever gotten frustrated with slow file ops, especially when dealing with big files or lots of reads and writes, this one's for you. Meet mmapfile, my little project that's basically a drop-in replacement for the standard *os.File -- but way faster. It uses memory-mapped I/O to skip a bunch of the usual overhead.

Let's chat about what it is, why it's cool, and how you can use it.

So, What is mmapfile?

mmapfile is a Go library that acts just like *os.File but under the hood, it maps files directly into your program's memory. Instead of calling the system for every little R/W operation, you get straight access to the file data. It's like having the file right there in RAM without actually loading it all up front.
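
To make the idea concrete, here's roughly what memory-mapping a file looks like with nothing but the standard library's syscall package on a Unix-like system. Treat it as a sketch of the underlying technique, not as mmapfile's actual implementation; mapFile and roundTrip are names I made up for this example.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"syscall"
)

// mapFile maps a whole file read-only into memory (Unix-like systems only).
// The caller must release the mapping with syscall.Munmap.
func mapFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // the mapping stays valid after the descriptor is closed

	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}
	if fi.Size() == 0 {
		return nil, errors.New("cannot map an empty file")
	}
	return syscall.Mmap(int(f.Fd()), 0, int(fi.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
}

// roundTrip writes content to a temp file, maps it, and returns what the mapping holds.
func roundTrip(content string) (string, error) {
	tmp, err := os.CreateTemp("", "mmap-demo-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}

	data, err := mapFile(tmp.Name())
	if err != nil {
		return "", err
	}
	defer syscall.Munmap(data)
	return string(data), nil // string() copies, so the result outlives the mapping
}

func main() {
	got, err := roundTrip("hello from a mapped file")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got)
}
```

After the one Mmap call, reading the file is just indexing into a byte slice, which is where the savings on per-operation system calls come from.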

It works across platforms, and where memory mapping isn't supported, it falls back to reading the whole file into memory. See Platform Support.

The end result is much faster file operations, especially for jumping around in files or reading A LOT.

Why?

I built this to be easy to swap in. Here are the highlights:

  • It's Basically Just *os.File: it implements all the interfaces you know and love (the method list is in the API section below).

    So, you can often just replace your os.Open calls with mmapfile.Open and be done.

    I also wrote Semgrep rules to automatically detect *os.File usage and suggest mmapfile replacements. See Semgrep Rules.

  • Zero-Copy: the star of the show is the Bytes() method. It gives you a direct view into the file's memory. No copying, no extra memory use, just point and read.

  • Works Everywhere: Linux, macOS, Windows, even the weirder Unix variants. It handles the differences for you.

  • Safe for Multiple Threads: ReadAt and WriteAt are thread-safe, so you can have goroutines hammering away without issues.

  • No Memory Waste: most operations don't allocate anything on the heap (read: keeps your GC chill).
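
The drop-in idea works best when your own code is written against interfaces rather than the concrete *os.File type. A minimal sketch: firstLine and firstLineOf are hypothetical helpers, and strings.NewReader stands in for a file so the example runs without anything on disk. Going by the method list in this post, a file from either os.Open or mmapfile.Open should fit the io.Reader parameter, but that part is an assumption I haven't wired up here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"strings"
)

// firstLine reads the first line from any io.Reader.
// It never needs to know what kind of file (or non-file) is behind r.
func firstLine(r io.Reader) (string, error) {
	line, err := bufio.NewReader(r).ReadString('\n')
	if err != nil && err != io.EOF {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

// firstLineOf runs firstLine over an in-memory string (a stand-in for a file).
func firstLineOf(text string) (string, error) {
	return firstLine(strings.NewReader(text))
}

func main() {
	line, err := firstLineOf("alpha\nbeta\n")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(line)
}
```

With code shaped like this, switching the backing file type only touches the line that opens the file.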

Getting Started

Get it with:

go get go.dw1.io/mmapfile

Quick example:

package main

import (
    "fmt"
    "log"

    "go.dw1.io/mmapfile"
)

func main() {
    f, err := mmapfile.Open("data.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // read like normal
    buf := make([]byte, 100)
    n, err := f.Read(buf)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Read %d bytes: %s\n", n, buf[:n])

    // or grab the whole thing directly
    data := f.Bytes()
    fmt.Printf("File contents: %s\n", data)
}
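
Since Bytes() hands back a plain byte slice, the standard bytes package should work on it directly, with no intermediate buffers. Here's a small sketch of that style of processing; it uses an ordinary in-memory slice as a stand-in for mapped data, and countLines and fieldAfter are hypothetical helpers, not part of mmapfile.

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines counts newline-terminated lines without copying data.
func countLines(data []byte) int {
	return bytes.Count(data, []byte{'\n'})
}

// fieldAfter returns the bytes that follow key, up to the end of that line.
// The result is a sub-slice of data, not a copy.
func fieldAfter(data, key []byte) ([]byte, bool) {
	i := bytes.Index(data, key)
	if i < 0 {
		return nil, false
	}
	rest := data[i+len(key):]
	if j := bytes.IndexByte(rest, '\n'); j >= 0 {
		rest = rest[:j]
	}
	return rest, true
}

func main() {
	data := []byte("name=demo\nmode=fast\n") // stand-in for f.Bytes()
	fmt.Println(countLines(data))
	if v, ok := fieldAfter(data, []byte("mode=")); ok {
		fmt.Printf("%s\n", v)
	}
}
```

Keep in mind that sub-slices like the one fieldAfter returns share memory with the original, so with a real mapping they stop being valid at Close() too.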

For creating files:

f, err := mmapfile.OpenFile("newfile.txt", os.O_RDWR|os.O_CREATE, 0644, 1024*1024) // 1MB file

The API (Nothing Fancy btw)

It supports most os.OpenFile flags, minus a couple we'll talk about later. Here's what you can do:

Method                    What it does
Read([]byte)              Read some bytes, move the cursor
ReadAt([]byte, int64)     Read from a spot without moving the cursor
Write([]byte)             Write bytes, move the cursor
WriteAt([]byte, int64)    Write to a spot without moving the cursor
Seek(int64, int)          Jump to a position
ReadFrom(io.Reader)       Pull data from a reader
WriteTo(io.Writer)        Push data to a writer
Close()                   Shut it down
Sync()                    Save changes to disk
Stat()                    Get file info
Name()                    File name
Len()                     File size
Bytes()                   Direct memory access

The sweet spot is random access. Use ReadAt/WriteAt for the best speed.
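
Here's what that random-access pattern can look like for fixed-size records. This is only a sketch: the 8-byte little-endian record layout and the helper names (readRecord, buildRecords, lookupIn) are invented for the example, and bytes.NewReader stands in for a file. Anything that satisfies io.ReaderAt can be passed to readRecord.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
)

const recordSize = 8 // assumed layout: each record is one little-endian uint64

// readRecord fetches record i with a single ReadAt, so no cursor is involved.
func readRecord(r io.ReaderAt, i int64) (uint64, error) {
	var buf [recordSize]byte
	n, err := r.ReadAt(buf[:], i*recordSize)
	if n < recordSize {
		return 0, err // a short read always comes with a non-nil error
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// buildRecords encodes values into the assumed fixed-size layout.
func buildRecords(values ...uint64) []byte {
	out := make([]byte, len(values)*recordSize)
	for i, v := range values {
		binary.LittleEndian.PutUint64(out[i*recordSize:], v)
	}
	return out
}

// lookupIn reads record i from an in-memory copy of the record data.
func lookupIn(data []byte, i int64) (uint64, error) {
	return readRecord(bytes.NewReader(data), i)
}

func main() {
	data := buildRecords(10, 20, 30)
	v, err := lookupIn(data, 2)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(v)
}
```

Because each lookup computes its own offset, calls never depend on where a previous read left off.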

See Go reference.

Performance

I ran some benchmarks against regular *os.File, and mmapfile just destroys it in most cases.

Reads

  • Tiny stuff (1KB): 50x faster.
  • Huge files (1GB): Still 3x faster.
  • Parallel reads: 10-12x faster (no matter the size).

Writes

  • Small (1KB): 51x faster.
  • Medium (100KB): 6x faster.
  • Big sequential (500MB+): A bit slower because the kernel's tricks win there.

Other Stuff

  • Seek: 29x faster.
  • WriteTo: 254x faster.
  • Overall: About 6x faster on average.

There are no system calls for most ops, just direct memory access. Fast, simple, no surprises.

When to Use It (and When Not To)

mmapfile shines in these spots:

  1. Big files with random access: Like databases or parsing binary files.
  2. Lots of reading: Configs, static data, lookup tables.
  3. Memory-mapped DBs: Fixed-size stuff, logs that just append.
  4. Shared memory between processes: Multiple programs reading the same file.
  5. High-frequency I/O: Thousands of small ops per second.

Skip it for:

  1. Streaming data: Like from networks or pipes.
  2. Files that grow a lot: Needs fixed size.
  3. Huge sequential writes: Kernel buffering beats user-space copies.
  4. Tiny, rare files: Setup cost isn't worth it.

Gotchas and Limits

Nothing is perfect. Watch out for:

  1. Fixed size: Can't grow files after opening. Set size when creating.
  2. No truncate: To change size, close and reopen.
  3. No append mode: os.O_APPEND is not there.
  4. Cursor ops are slower: Stick to ReadAt/WriteAt.

Also, Bytes() gives you a slice that's only good until Close(). And if you write to that slice on a file opened read-only, you WILL crash.
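
If you need some of the data to outlive the file, copy it out before closing. A minimal sketch of that habit, using a regular slice as a stand-in for the one Bytes() returns (snapshot is a hypothetical helper, not something mmapfile provides):

```go
package main

import "fmt"

// snapshot returns an independent copy of b that stays valid
// after the memory behind b is released or changed.
func snapshot(b []byte) []byte {
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	mapped := []byte("header: v1") // stand-in for the slice returned by Bytes()
	keep := snapshot(mapped[:6])

	// Simulate the backing memory changing underneath us.
	// With a real mapping, it would go away entirely at Close().
	mapped[0] = 'X'

	fmt.Printf("%s\n", keep) // the copy still holds the original bytes
}
```

The copy costs an allocation, so save it for the small pieces you really need to keep.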

Thread Safety

Built for concurrency:

  • ReadAt/WriteAt: Safe to call from multiple goroutines.
  • Read/Write/Seek: Share a cursor, so lock if concurrent.
  • Close: Don't call while others are running.
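
To show the ReadAt side of this, here's a sketch that splits a file-like source into chunks and reads each chunk from its own goroutine. It's written against io.ReaderAt and uses bytes.NewReader as a stand-in; parallelSum and sumBytes are hypothetical names for this example only.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"sync"
)

// parallelSum splits [0, size) into chunks and sums the bytes of each chunk
// in its own goroutine. Every goroutine uses ReadAt, so no cursor is shared.
func parallelSum(r io.ReaderAt, size int64, workers int) (uint64, error) {
	if workers < 1 {
		workers = 1
	}
	chunk := (size + int64(workers) - 1) / int64(workers)
	sums := make([]uint64, workers) // one slot per worker, so no locking needed
	errs := make([]error, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := int64(w) * chunk
		if start >= size {
			break
		}
		end := start + chunk
		if end > size {
			end = size
		}
		wg.Add(1)
		go func(w int, start, end int64) {
			defer wg.Done()
			buf := make([]byte, end-start)
			n, err := r.ReadAt(buf, start)
			if n < len(buf) {
				if err == nil {
					err = io.ErrUnexpectedEOF
				}
				errs[w] = err
				return
			}
			for _, b := range buf {
				sums[w] += uint64(b)
			}
		}(w, start, end)
	}
	wg.Wait()

	var total uint64
	for w := range sums {
		if errs[w] != nil {
			return 0, errs[w]
		}
		total += sums[w]
	}
	return total, nil
}

// sumBytes runs parallelSum over an in-memory slice.
func sumBytes(data []byte, workers int) (uint64, error) {
	return parallelSum(bytes.NewReader(data), int64(len(data)), workers)
}

func main() {
	total, err := sumBytes([]byte{1, 2, 3, 4, 5}, 2)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(total)
}
```

For Read/Write/Seek, the usual fix is a sync.Mutex around every call that touches the cursor, and all goroutines should finish (wg.Wait here) before anyone calls Close.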

Real-World Uses

Perfect for:

  • DB engines: Quick jumps to data pages.
  • Log munching: Parsing giant log files.
  • Config loading: Fast parsing of big configs.
  • File caches: Persistent, memory-backed caches.
  • Science stuff: Working with binary datasets.

Wrapping Up

If you're hacking on Go apps with heavy file I/O, mmapfile could be your new best friend. The speedups for reads and random access are killer for modern apps.

But hey, it's not for everything. Think about your use case: do you need growing files? Streaming? If not, give it a shot.

This is pre-1.0, so things might change. Go check it out on GitHub, try it, and let me know what you think!
