Cheng Pan

Posted on Feb 20

Learning Linux - splice

#linux #go #splice

The Simple Problem

I'm currently working on a project that requires concatenating many small files (~100k) into one large file. The file sizes range from 32KB to 1MB. At first glance, the problem is pretty straightforward: I simply read the content of each file into a user-space buffer and write it to the destination file. The following is my initial implementation:

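package main

import (
    "io"
    "os"
)

// copyFile streams the content of the file at path into dst.
func copyFile(dst io.Writer, path string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    // Deferring the close inside this helper (rather than inside main's loop)
    // releases each descriptor as soon as the file is copied; with ~100k files,
    // deferring in the loop would keep every file open until main returns.
    defer f.Close()

    _, err = io.Copy(dst, f)
    return err
}

func main() {
    for _, path := range os.Args[1:] {
        if err := copyFile(os.Stdout, path); err != nil {
            panic(err)
        }
    }
}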

The program above does almost exactly what the cat command does: it reads the data from the underlying device (whatever backs each file) and writes it to another file. That made me wonder whether there is a more performant way to implement it. Because the files are read and written as-is, there is no need for the extra hop of copying the data into user space!

After a bit of digging, it turned out that you can copy file data without reading it into user space. The answer is splice.

What's splice?

The following is from the Linux man page:

splice() moves data between two file descriptors without copying between kernel address space and user address space. It transfers up to len bytes of data from the file descriptor fd_in to the file descriptor fd_out, where one of the file descriptors must refer to a pipe.

That's exactly what we want!

Looking at its function signature:

ssize_t splice(int fd_in, off64_t *off_in, int fd_out, off64_t *off_out, size_t len, unsigned int flags);

The splice function moves up to len bytes from fd_in, reading at offset off_in, to fd_out, writing at offset off_out. (If an offset pointer is NULL, the file's current offset is used and advanced instead; for a pipe, the offset must be NULL.) Pretty straightforward. However, one interesting part is that one of the file descriptors must be a pipe. For the reason why, please refer to Linus's comment here.

After more googling, I couldn't find a good post about how to implement it in Go (hence this post), but I found this post about an implementation in Rust. Now we have a good starting point.

First Attempt

My first attempt was simply a translation of the Rust implementation into Go. Here is what I got:

package main

import (
    "fmt"
    "os"
    "syscall"

    "github.com/pkg/errors"
)

const BUF_SIZE = 16 * 1024 // Using the same 16 KB

// splice file with zero-copy to speed up the reassembling process
// https://man7.org/linux/man-pages/man2/splice.2.html
func SpliceTo(src, dst *os.File) error {
    // create the pipe
    rp, wp, err := os.Pipe()
    if err != nil {
        return errors.Wrap(err, "pipe failed")
    }
    defer rp.Close()
    defer wp.Close()

    var (
        inFd  = int(src.Fd())
        outFd = int(dst.Fd())
        wpFd  = int(wp.Fd())
        rpFd  = int(rp.Fd())
    )

    for {
        nr, err := syscall.Splice(inFd, nil, wpFd, nil, BUF_SIZE, 0)
        if err != nil {
            return errors.Wrap(err, "failed to splice to the inFd")
        }
        if nr <= 0 {
            break
        }

        _, err = syscall.Splice(rpFd, nil, outFd, nil, BUF_SIZE, 0)
        if err != nil {
            return errors.Wrap(err, "failed to splice to the outFd")
        }
    }

    return nil
}

func main() {
    in, err := os.Open(os.Args[1])
    if err != nil {
        panic(fmt.Sprintf("failed to open the file: %v", err))
    }
    defer in.Close()

    err = SpliceTo(in, os.Stdout)
    if err != nil {
        panic(fmt.Sprintf("failed to splice the file: %v", err))
    }
}

Now let's test the program. I first created a 2GiB random file from /dev/urandom, then measured the throughput with pv and got 2.02GiB/s:

head -c 2G </dev/urandom > large-file

go run main.go large-file | pv -r >/dev/null
[2.02GiB/s]

Using cat as the baseline, I got 1.78GiB/s, which is similar to the result reported in the other post:

cat large-file| pv -r > /dev/null
[1.78GiB/s]

It was surprising to me that the splice version was not as fast as I expected; it is much lower than the 5.90GiB/s reported in the other post.

Second Attempt

I was puzzled about the reason. My first thought was that the buffer size might be too small, so I increased it tenfold, from 16KB to 160KB:

const BUF_SIZE = 16 * 1024 * 10 // 160 KB

Now, let's run the program again:

go run main.go large-file | pv -r >/dev/null
[3.63GiB/s]

The throughput jumps right to 3.63GiB/s. That's much better!
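A rough count of system calls hints at why a larger len helps. Assuming the 2GiB test file (GNU head's 2G suffix means 2^31 bytes) and that each splice into the pipe moves at most 64KiB, the default Linux pipe capacity, the loop in SpliceTo, which issues two splice calls per iteration, needs:

$$
\begin{aligned}
\text{16 KiB per call:}\quad & \frac{2^{31}}{2^{14}} = 2^{17} = 131072 \text{ iterations} \;\Rightarrow\; 262144 \text{ splice calls}\\
\text{64 KiB per call:}\quad & \frac{2^{31}}{2^{16}} = 2^{15} = 32768 \text{ iterations} \;\Rightarrow\; 65536 \text{ splice calls}
\end{aligned}
$$

That is four times fewer system calls, which is one plausible contributor to the speedup rather than a measured explanation.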

As an engineer, I like to back a theory with experiments. Although I was getting much better results than the baseline, I wondered whether the program was correct. So I checked the spliced output with a hash program, e.g. sha256sum: if the baseline (cat) and my program produce the same hash, the output is correct.

Here is what I did:

# Get the hash using cat
cat large-file| sha256sum
86df6afa07b75faab97cf0a3021f884e10783d6f1e2f2447f65f2c644c969009  -

# Get the hash using splice
go run main.go large-file | sha256sum
cbdf56b4cf4e4c8b1d254c4ec937ba26fe9b64a2cf9acc3a0ae8122f52d143aa  -

Hmmm... The hash doesn't match!

Third Attempt

I was puzzled again about what might be wrong, and then one thing caught my eye. If you are a careful reader, you might have already noticed that Go's syscall.Splice returns not only an error but also the number of bytes that were spliced. What if that number is not the same as the requested length? If you have dealt with I/O before, this is a common pattern: the kernel can transfer fewer bytes than requested, for example when it is limited by available buffer space, and it signals this back pressure to the user-space program through the return value. In this case the request is the 160KB BUF_SIZE, and when the kernel can't fulfill all of it, it only returns the number of bytes that actually got written (or spliced, in our case).

Let's first test the theory by printing the number of bytes, nr, returned by the first splice call:

    for {
        nr, err := syscall.Splice(inFd, nil, wpFd, nil, BUF_SIZE, 0)
        if err != nil {
            return errors.Wrap(err, "failed to splice to the inFd")
        }
        if nr <= 0 {
            break
        }
        fmt.Fprintf(os.Stderr, "nr: %v\n", nr)

        _, err = syscall.Splice(rpFd, nil, outFd, nil, int(nr), 0)
        if err != nil {
            return errors.Wrap(err, "failed to splice to the outFd")
        }
    }

And here is what I got after running the program:

nr: 65536
...
nr: 65536

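It turned out that although we requested 160KB, only 64KB was spliced per call, which matches the default Linux pipe capacity of 16 pages (65,536 bytes with 4 KiB pages). The second splice, from the pipe to the output, can likewise move fewer bytes than the pipe holds. In the previous version, any bytes still sitting in the pipe when the input reached EOF were never written out, which would explain the mismatched hash. The fix is to keep splicing out of the pipe until everything read into it has been written. The following is the final implementation that I got: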

const BUF_SIZE = 16 * 1024 * 10 // 160 KB

// splice file with zero-copy to speed up the reassembling process
// https://man7.org/linux/man-pages/man2/splice.2.html
func SpliceTo(src, dst *os.File) error {
    // create the pipe
    rp, wp, err := os.Pipe()
    if err != nil {
        return errors.Wrap(err, "pipe failed")
    }
    defer rp.Close()
    defer wp.Close()

    var (
        inFd  = int(src.Fd())
        outFd = int(dst.Fd())
        wpFd  = int(wp.Fd())
        rpFd  = int(rp.Fd())
    )

    for {
        nr, err := syscall.Splice(inFd, nil, wpFd, nil, BUF_SIZE, 0)
        if err != nil {
            return errors.Wrap(err, "failed to splice to the inFd")
        }
        if nr <= 0 {
            break
        }

        // safe to convert int64 to int since nr never exceeds BUF_SIZE, which is an int
        toWrite := int(nr)
        // toWrite might not be fulfilled, so need a for loop for splice
        // to make sure all the toWrite are spliced
        for toWrite > 0 {
            nw, err := syscall.Splice(rpFd, nil, outFd, nil, toWrite, 0)
            if err != nil {
                return errors.Wrap(err, "failed to splice to the outFd")
            }
            toWrite -= int(nw)
        }
    }

    return nil
}

Make sure the hash is correct:

go run main.go large-file | sha256sum
86df6afa07b75faab97cf0a3021f884e10783d6f1e2f2447f65f2c644c969009  -
cat large-file| sha256sum
86df6afa07b75faab97cf0a3021f884e10783d6f1e2f2447f65f2c644c969009  -

And make sure the throughput is still high:

go run main.go large-file | pv -r >/dev/null
[3.40GiB/s]

Summary

Splice is a powerful zero-copy technique that can speed up I/O by moving file data within the kernel instead of copying it between kernel space and user space. However, there are caveats you need to be careful about to implement a program that is both performant and correct. Here is what I learned:

If the buffer size (the len passed to splice) is too small, it can limit splice throughput.
Splice returns the number of bytes actually transferred, which can be smaller than the requested size; the program needs to handle this case to produce correct results.

DEV Community
