- Book: The Complete Guide to Go Programming
- Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
A team I worked with kept asking the same question every
sprint. They had an HTTP service that accepted webhook
payloads from a partner and forwarded them to an internal
billing API. Compliance wanted a copy of every request
body archived to disk, gzipped, with a line-by-line audit
log to stdout for debugging. The first attempt pulled in
three libraries, a goroutine pool, and a Kafka producer.
None of it worked end to end after a week.
The second attempt threw the libraries away and used four
types from the Go standard library. It shipped in an
afternoon. The whole pipeline came in around fifty lines.
Forget the audit log for a second. What matters is how
io.Reader and io.Writer compose, and why a Go service
that needs to split, transform, and forward bytes rarely
needs more than the stdlib.
The four pieces
You only need four types to build the pipeline.
io.TeeReader returns a reader that copies every byte
read from a source into a writer you supply. It is a wire
tap. The downstream consumer never knows it is there.
io.MultiWriter returns a writer that writes the same
bytes to every writer you give it. It is a fan-out.
bytes.IndexByte from the bytes package finds the next
newline in a buffer. That is all you need to split a
stream into prefix-tagged lines without bufio.Scanner's
64KB-token cap.
gzip.Writer from compress/gzip wraps any
io.Writer and compresses what gets written through it.
Apart from bytes.IndexByte, which works on raw byte
slices, each of those accepts an interface and returns
something that satisfies one. That is the entire reason
this works. You plug them into each other and the bytes
flow.
The shape of the pipeline
The handler receives an HTTP request. The body is an
io.Reader. You want three things to happen to it:
- Every line in the body should land on stdout for the audit log.
- The full body should land on disk, gzipped, in a per-request file.
- The body should be forwarded upstream as the body of a new HTTP request.
The catch is that an io.Reader can only be read once.
Reading it for the audit log consumes the bytes before
the upstream request gets to see them. You need to split
the stream.
That is what io.TeeReader is for. Read from the tee,
and a copy of the bytes lands in the writer you handed
it. The downstream code reads the tee and never knows
the bytes were copied.
The handler
The full handler comes in around fifty lines of Go, no
third-party imports, runnable as-is.
package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "path/filepath"
    "time"
)

const upstream = "https://billing.internal/v1/events"

func forward(w http.ResponseWriter, r *http.Request) {
    id := fmt.Sprintf("req-%d", time.Now().UnixNano())
    path := filepath.Join("audit", id+".gz")
    if err := os.MkdirAll("audit", 0o755); err != nil {
        http.Error(w, "audit dir", 500)
        return
    }
    f, err := os.Create(path)
    if err != nil {
        http.Error(w, "audit file", 500)
        return
    }
    defer f.Close()
    gz := gzip.NewWriter(f)
    defer gz.Close()
    stdoutPrefix := newPrefixWriter(os.Stdout, id+" ")
    defer stdoutPrefix.Flush() // print a trailing line with no \n
    tap := io.MultiWriter(gz, stdoutPrefix)
    body := io.TeeReader(r.Body, tap)
    req, err := http.NewRequestWithContext(
        r.Context(), http.MethodPost, upstream, body,
    )
    if err != nil {
        http.Error(w, "upstream req", 500)
        return
    }
    req.Header = r.Header.Clone()
    // In production, strip hop-by-hop headers (Connection,
    // Keep-Alive, Proxy-*, TE, Transfer-Encoding, Upgrade)
    // and any partner auth before forwarding.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        http.Error(w, "upstream do", 502)
        return
    }
    defer resp.Body.Close()
    w.WriteHeader(resp.StatusCode)
    io.Copy(w, resp.Body)
}

func main() {
    http.HandleFunc("/forward", forward)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

The prefixWriter is the only helper you have to write
yourself. It buffers bytes until it sees a newline, then
flushes one prefixed line at a time. That keeps lines
intact across Write calls (HTTP bodies arrive in chunks
that rarely line up with \n boundaries) and avoids the
64KB-per-token cap that bufio.Scanner would impose. The
Flush method prints whatever is left over when the body
ends without a trailing newline, which is exactly what a
JSON webhook payload usually does.

type prefixWriter struct {
    out    io.Writer
    prefix string
    buf    []byte
}

func newPrefixWriter(w io.Writer, p string) *prefixWriter {
    return &prefixWriter{out: w, prefix: p}
}

func (p *prefixWriter) Write(b []byte) (int, error) {
    p.buf = append(p.buf, b...)
    for {
        i := bytes.IndexByte(p.buf, '\n')
        if i < 0 {
            break
        }
        line := p.buf[:i]
        _, err := fmt.Fprintf(p.out, "%s%s\n", p.prefix, line)
        if err != nil {
            return 0, err
        }
        p.buf = p.buf[i+1:]
    }
    return len(b), nil
}

// Flush prints any buffered bytes that never got a newline.
func (p *prefixWriter) Flush() error {
    if len(p.buf) == 0 {
        return nil
    }
    _, err := fmt.Fprintf(p.out, "%s%s\n", p.prefix, p.buf)
    p.buf = p.buf[:0]
    return err
}
Five lines in forward carry the work (error handling
elided, see the full handler above):
gz := gzip.NewWriter(f)
stdoutPrefix := newPrefixWriter(os.Stdout, id+" ")
tap := io.MultiWriter(gz, stdoutPrefix)
body := io.TeeReader(r.Body, tap)
req, _ := http.NewRequestWithContext(
    r.Context(), http.MethodPost, upstream, body,
)
gz writes compressed bytes to the audit file.
stdoutPrefix writes prefixed lines to stdout.
io.MultiWriter fans the stream out to both.
io.TeeReader makes every read from r.Body also write
to that fan-out. The upstream HTTP client reads from the
tee. Every byte the upstream sees is also gzipped to disk
and printed to stdout.
No goroutines, no manual copy loops. The runtime drives
the whole thing because the HTTP client pulls bytes out
of the body, and pulling from a tee is what makes the
side-effects happen.
One detail to keep honest: gzip.Writer only writes its
trailer on Close(). The deferred gz.Close() in the
handler is what makes the .gz file decompressible. Drop
that defer and gunzip rejects the archive as truncated,
because the gzip CRC and length trailer never get written.
Run it
go run main.go
curl -X POST -d '{"event":"signup","user":42}' \
http://localhost:8080/forward
Stdout shows:
req-1761742893214567000 {"event":"signup","user":42}
The audit/ directory has a fresh .gz file. Decompress
it and you get the original body back.
gunzip -c audit/req-1761742893214567000.gz
That is the entire pipeline. Around fifty lines of Go,
zero dependencies, three concurrent side-effects on a
single stream of bytes.
Why this composes
Go's io.Reader and io.Writer are one-method
interfaces. Read([]byte) (int, error). Write([]byte). Anything that satisfies the signature is
(int, error)
a reader or a writer. A file is one. A network
connection is one. A buffer is one. A request body is
one. A gzip writer is one. They all interchange.
Most languages have stream abstractions, but few make
them this small. Java's InputStream is a class
hierarchy with abstract methods, mark/reset semantics,
and decorators that have to extend FilterInputStream.
Node's Readable is an EventEmitter that pushes data
through a state machine. The Go interface is two
methods, no inheritance, no events. That is why the
composition reads like Lego.
The other reason it works is that the pieces in the
stdlib follow the same convention every time. A
function that wraps a reader takes an io.Reader and
returns one. A function that wraps a writer takes an
io.Writer and returns one. gzip.NewWriter,
io.TeeReader, io.MultiWriter, bufio.NewScanner,
json.NewDecoder, crypto/cipher.StreamWriter, and
compress/flate.NewWriter all follow the same shape.
Once you spot it, you assemble pipelines without
consulting docs.
When this earns its keep
The pattern shines for audit pipelines, ETL, and
proxies that need to observe or transform bytes in
flight without buffering the whole stream in memory.
Three things make it the right tool:
- The work is per-byte or per-line, not per-record. You do not need to parse the body to do the side-effect. Compression and logging are byte-level.
- The downstream consumer pulls. HTTP clients, file copies, and io.Copy all pull from the source. io.TeeReader only does work when the consumer asks for bytes, so the side-effects never get ahead of the forwarder.
- You want one shot of backpressure. If the upstream HTTP server is slow, the tee reader stops being read, the disk writer stops being written, and stdout stops printing. The bottleneck propagates for free.
A team running a webhook proxy at a few thousand
requests per second can ship this and stop thinking
about it. There is nothing in there to break.
When a streaming framework wins
Three signals that you have outgrown the stdlib pattern:
- You need durable buffering between stages. The disk writer dies and you want the upstream forward to keep going from a checkpoint. io.TeeReader ties the two together. A queue or a Kafka topic between them does not.
- You need a fan-in. Multiple producers feeding one consumer with ordering or windowing. The stdlib gives you fan-out for free. Fan-in with semantics is a framework job.
- The transformation is record-level, not byte-level. You want to parse JSON, drop fields with PII, and re-emit. That is decoder territory, not reader territory. You will end up with json.Decoder over the tee anyway, but the moment you also need schema-driven routing or DLQs, a framework like Apache Beam, Flink, or even a hand-rolled goroutine pipeline with channels carries its weight.
The mental rule that has held up: if you can describe
the work as "for every byte, also do X," the stdlib is
enough. Once it becomes "for every record, depending
on its shape, do Y or Z," reach for something with more
machinery.
The takeaway
Go's stdlib hands you a stream-processing toolkit
shaped like four interfaces and ten functions. Most
production pipelines that pretend to need a framework
need a tee, a multiwriter, a gzip writer, and a few
lines of bookkeeping. Fifty lines of code, no
goroutines, no third-party deps. The composition is the
architecture.
Read the source of io.MultiWriter and io.TeeReader
in the io package
once. Both fit on a screen. Once you see how small they
are, you start spotting opportunities to use them
everywhere.
If this was useful
The standard library is the unsung hero of most Go
codebases. The Complete Guide to Go Programming
covers io.Reader and io.Writer composition in
depth, along with the rest of the stdlib idioms that
make Go services small. The companion volume,
Hexagonal Architecture in Go, covers what to do once
the pipeline grows past a single handler and you need
to keep the boundaries clean.
