SEN LLC

Posted on Apr 16

A 600-line Rust log rotator, or: what logrotate would look like if it had no config file

#rust #cli #logs #ops

log-rotate --max-size 10M --keep 5 /var/log/app.log — one binary, no /etc/logrotate.d, no state file, no DSL. Gzip-compresses old copies, prunes beyond the retention window, supports dry-run. Written in Rust with a pure-planning core that makes the test suite almost trivial.

📦 GitHub: https://github.com/sen-ltd/log-rotate

I love logrotate(8). I've used it for years. It is also wildly over-engineered for what I actually do with it on small boxes, which is: "if this file got big, rename it, gzip it, delete anything older than N generations." That's it. That's the whole job on most of my machines. And yet logrotate asks me to write a config file, which goes in a well-known directory, which is read by a cron job that also owns a state file, which I then have to debug with -d -v when it silently stops rotating because of permissions I didn't set quite right.

So I wrote log-rotate: a single-binary Rust CLI that does one thing. No config file. You call it from cron (or systemd timer, or a healthcheck, whatever) and it either rotates or it doesn't. About 600 lines of code, 46 tests, 644 KB statically-linked musl binary, a ~20 MB Alpine runtime image. The whole thing took a Saturday.

The problem I wanted to solve

Here's the shape of the cron lines I keep writing on small boxes:

*/10 * * * * root /usr/local/bin/log-rotate --max-size 10M --keep 5 /var/log/myapp.log

That is the entire relationship I want with log rotation on 90% of my services. I don't need per-file pre/post hooks. I don't need copytruncate as a Big Deal. I don't need compresscmd and uncompresscmd and olddir. I want size-or-age-based rotation, gzip the old files, keep the last N, and get out of my way.

The trick to making this easy is refusing to make it configurable. You don't get named config files. You get CLI flags and one file argument. If you want different policies for different logs, you write different cron lines. If that sounds primitive, consider that it is the exact same interface as every other Unix command.

The rotation model

This part is not interesting, and that's on purpose. Every Unix log rotator since the 1980s does the same dance:

log        → log.1.gz
log.1.gz   → log.2.gz
log.2.gz   → log.3.gz
log.N.gz   → deleted (if N > --keep)

To open a fresh slot at log.1.gz, you walk the existing numbered files highest-index first; otherwise log.1.gz → log.2.gz stomps the old log.2.gz before you've moved it out of the way. Then the live file gets renamed to log.1, gzipped into log.1.gz, and a fresh empty log is created. A process that still holds the old file open keeps writing to the renamed inode, which is the file being gzipped and then deleted, so anything it writes after the compressor has read past that point is lost (sorry about that last half-second of bytes). That's called create mode in logrotate and it's the only mode I ever use, so it's the only mode I implemented.
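To see why the bump order matters, here is a small in-memory simulation. It is only an illustration, not code from the project: the "directory" is a map from rotation index to contents, and bump_all is a made-up helper that renames each index to the next one in the order given, overwriting whatever is already there, as a rename onto an existing file would.

```rust
use std::collections::BTreeMap;

/// Illustrative helper (not part of log-rotate): rename index n to n + 1 for
/// each n in `order`. Inserting onto an occupied index overwrites it, which
/// mirrors what renaming onto an existing file does.
fn bump_all(
    mut dir: BTreeMap<u32, &'static str>,
    order: &[u32],
) -> BTreeMap<u32, &'static str> {
    for &n in order {
        if let Some(contents) = dir.remove(&n) {
            dir.insert(n + 1, contents);
        }
    }
    dir
}
```

Starting from {1: "newest", 2: "older"}, the order [2, 1] leaves both generations in place at indices 2 and 3. The order [1, 2] first overwrites "older" with "newest" and then moves that single survivor to index 3, so one generation is silently gone.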

Where the code actually lives

The one design decision I care about is splitting decisions from actions. The library is pure; only main.rs touches the filesystem. This split is a massive multiplier on test ergonomics, and I want to show you the shape.

src/
  cli.rs          clap derive, flag surface
  size.rs         parse "10M"  → u64 bytes
  duration.rs     parse "7d"   → u64 seconds
  plan.rs         should_rotate / rotate_plan / prune_plan — NO fs calls
  main.rs         the one place bytes actually move

plan.rs is the heart of the program. It takes plain data in (sizes, ages, a list of existing rotated indices) and returns plain data out (a Vec<Action>). No Paths get read, no files get opened, no syscalls are made. Here's the Action enum in full:

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    /// Rename `from` to `to`. Used to bump `log.N.gz → log.(N+1).gz`.
    Rename { from: PathBuf, to: PathBuf },
    /// Read `src`, gzip-encode into `dst`, delete `src`.
    Gzip { src: PathBuf, dst: PathBuf },
    /// Delete a file that's beyond the retention window.
    Delete { path: PathBuf },
    /// Create the live log file if it is missing, or truncate it to zero bytes.
    Truncate { path: PathBuf },
}

rotate_plan is then a pure function: give me the live log path and a list of existing rotated indices (in any order; it sorts and dedups them itself), and I'll give you the sequence of actions that reach the post-rotation state.

pub fn rotate_plan(
    log_path: &std::path::Path,
    mut existing: Vec<u32>,
) -> Vec<Action> {
    existing.sort_unstable();
    existing.dedup();

    let mut actions = Vec::new();
    let base = log_path.to_path_buf();

    // Step 1: bump existing, highest-first.
    for &n in existing.iter().rev() {
        let from = sibling(&base, &format!("{}.gz", n));
        let to   = sibling(&base, &format!("{}.gz", n + 1));
        actions.push(Action::Rename { from, to });
    }

    // Step 2: live → .1 (uncompressed, to be gzipped next).
    let staging = sibling(&base, "1");
    actions.push(Action::Rename {
        from: base.clone(),
        to:   staging.clone(),
    });

    // Step 3: gzip .1 → .1.gz (and remove .1).
    let gz = sibling(&base, "1.gz");
    actions.push(Action::Gzip { src: staging, dst: gz });

    // Step 4: recreate an empty live file at the original path so
    // writers that reopen the log have somewhere to append.
    actions.push(Action::Truncate { path: base });

    actions
}

Note the iter().rev(). That's the "highest-first" rule from before — and now it's testable without touching disk. Which brings me to the point.
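One loose end before the tests: rotate_plan leans on a sibling helper that this post doesn't show. A minimal sketch of what it might look like, assuming it simply appends a dot and the suffix to the full path (which matches the app.log.3.gz names in the tests); the real helper in the repository may differ.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical reconstruction of `sibling`: append `.{suffix}` to the whole
/// path, so `app.log` plus `1.gz` becomes `app.log.1.gz`. It works on the raw
/// OsString, so non-UTF-8 paths pass through unchanged.
fn sibling(base: &Path, suffix: &str) -> PathBuf {
    let mut name: OsString = base.as_os_str().to_os_string();
    name.push(".");
    name.push(suffix);
    PathBuf::from(name)
}
```

Path::with_extension would not do here, because it replaces the existing extension (app.log would become app.1.gz) instead of appending to it.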

Tests without tempdirs

Because rotate_plan is pure, the tests look like this:

#[test]
fn rotate_plan_bumps_highest_first() {
    let p = Path::new("app.log");
    let plan = rotate_plan(p, vec![1, 2, 3]);

    // 3 bumps (highest first) + rename + gzip + truncate
    assert_eq!(plan.len(), 6);
    assert_eq!(
        plan[0],
        Action::Rename {
            from: PathBuf::from("app.log.3.gz"),
            to:   PathBuf::from("app.log.4.gz"),
        }
    );
    assert_eq!(
        plan[1],
        Action::Rename {
            from: PathBuf::from("app.log.2.gz"),
            to:   PathBuf::from("app.log.3.gz"),
        }
    );
}

No tempdir. No fs::write setup dance. No "can we write here?" skip logic for CI. Just: here's the input, here's the output. The test for gap handling (what if log.1.gz and log.4.gz both exist but log.2.gz and log.3.gz were deleted?) is three lines:

#[test]
fn rotate_plan_handles_gaps() {
    let plan = rotate_plan(Path::new("a"), vec![1, 4]);
    // highest-first bumps: 4 → 5, then 1 → 2
    assert_eq!(plan[0], Action::Rename {
        from: PathBuf::from("a.4.gz"),
        to:   PathBuf::from("a.5.gz"),
    });
}

That's a bug I would have found on a production box a year from now if the tests had required filesystem setup — because gap handling is an edge case I would never have bothered to write a test for without this ergonomic payoff.
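Pruning fits the same mould. prune_plan is listed in the layout above but not shown in this post, so the following is a sketch under assumptions: the signature is a guess, the Action enum is trimmed to the one variant the sketch needs, and the rule is the one from the rotation diagram (delete log.N.gz when N is greater than --keep).

```rust
use std::path::{Path, PathBuf};

/// Trimmed copy of the Action enum, just enough for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Delete { path: PathBuf },
}

/// Assumed signature: `existing` holds the rotated indices present after
/// rotation, `keep` is the value of --keep. No filesystem calls.
pub fn prune_plan(log_path: &Path, mut existing: Vec<u32>, keep: u32) -> Vec<Action> {
    existing.sort_unstable();
    existing.dedup();
    existing
        .into_iter()
        .filter(|&n| n > keep)
        .map(|n| {
            // Build `<log>.<n>.gz` by appending to the full path.
            let mut name = log_path.as_os_str().to_os_string();
            name.push(format!(".{n}.gz"));
            Action::Delete { path: PathBuf::from(name) }
        })
        .collect()
}
```

With indices [6, 1, 3, 2] and keep = 2, the plan is two Delete actions, for app.log.3.gz and app.log.6.gz, in ascending order. Gaps need no special handling because the filter only looks at each index on its own.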

The pattern is the same for the size/duration parsers. They're pure string-to-number functions, so each one got six or seven unit tests that cover every suffix, every error path, and every overflow case in about thirty lines of test code. 26 unit tests live inside the library crate. Four more live inside main.rs for the handful of helpers that work on plain Paths without actually reading them. Sixteen more run the real binary against a tempdir through assert_cmd. 46 total. The ratio of pure-function tests to full-binary tests is roughly 2:1, and that is the ratio I want in every CLI I write.

The one place bytes move

main.rs is thin on purpose:

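use std::fs::{self, OpenOptions};
use std::io;

/// Perform one planned action. This is where the plan becomes syscalls.
fn execute(action: &Action) -> io::Result<()> {
    match action {
        Action::Rename { from, to }  => fs::rename(from, to),
        Action::Gzip { src, dst }    => gzip_file(src, dst),
        Action::Delete { path }      => fs::remove_file(path),
        Action::Truncate { path }    => {
            // The live file was renamed away earlier in the plan, so this
            // normally creates a fresh empty file; `truncate` covers the
            // case where one already exists.
            OpenOptions::new()
                .create(true).write(true).truncate(true)
                .open(path)?;
            Ok(())
        }
    }
}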

That's it. The interesting logic — "what should we do?" — happened back in plan.rs, in memory, without permission to side-effect. Everything here is the boring bit: call the syscall. And because this function is so dumb, --dry-run is a three-line patch at the top of the loop: don't call execute, just print action.describe() for each one.
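To make the dry-run point concrete, here is a sketch of what such a loop could look like. It is not the code from main.rs: the Action enum is trimmed to two variants, the wording produced by describe() is invented, and the executor is passed in as a closure so the sketch itself stays free of filesystem calls.

```rust
use std::io;
use std::path::PathBuf;

/// Trimmed copy of the Action enum for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Rename { from: PathBuf, to: PathBuf },
    Delete { path: PathBuf },
}

impl Action {
    /// One human-readable line per action; the exact wording is made up.
    pub fn describe(&self) -> String {
        match self {
            Action::Rename { from, to } => {
                format!("rename {} -> {}", from.display(), to.display())
            }
            Action::Delete { path } => format!("delete {}", path.display()),
        }
    }
}

/// Run a plan, or only describe it when `dry_run` is set. Returns the lines
/// that would be printed. Stops at the first failing action.
pub fn run_plan<F>(plan: &[Action], dry_run: bool, mut execute: F) -> io::Result<Vec<String>>
where
    F: FnMut(&Action) -> io::Result<()>,
{
    let mut lines = Vec::new();
    for action in plan {
        if dry_run {
            // The whole dry-run feature: describe instead of execute.
            lines.push(format!("would {}", action.describe()));
            continue;
        }
        execute(action)?;
        lines.push(action.describe());
    }
    Ok(lines)
}
```

Because the executor is a parameter, a test can pass a closure that only counts calls and assert that a dry run never invokes it.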

gzip_file is a stream-copy through flate2::write::GzEncoder with a 64 KB buffer. I didn't benchmark it because log rotation is not hot-path work; even a naive implementation compresses faster than logs are generated on every machine I own. The one subtlety is deleting the source file only after encoder.finish() returns Ok, so a crash mid-gzip leaves the pre-rotation .1 file in place and the next run is still recoverable.

Size and duration parsing

This is the kind of thing that's always quietly awkward. The size parser handles bytes, K/KB/KiB, M/MB/MiB, and G/GB/GiB. All binary multipliers (1 K = 1024), because every log rotator I've ever used treats them that way and nobody writing a log retention policy cares about the SI/IEC distinction. The whole thing is forty lines:

pub fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    if s.is_empty() { return Err("empty size".into()); }
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, suffix) = s.split_at(split);
    if num.is_empty() { return Err(format!("size must start with digits: {s}")); }
    let n: u64 = num.parse().map_err(|_| format!("bad number in size: {num}"))?;
    let mul: u64 = match suffix.to_ascii_uppercase().as_str() {
        "" | "B"                => 1,
        "K" | "KB" | "KIB"      => 1024,
        "M" | "MB" | "MIB"      => 1024 * 1024,
        "G" | "GB" | "GIB"      => 1024 * 1024 * 1024,
        other => return Err(format!("unknown size suffix: {other}")),
    };
    n.checked_mul(mul).ok_or_else(|| format!("size overflow: {s}"))
}

The duration parser is the same shape: s, m, h, d, w. No compound durations, because 1h30m doesn't correspond to any real log retention policy I've seen a human pick on purpose. If you want to rotate every 90 minutes, you want to rewrite your service.
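Since the duration parser isn't shown, here is a sketch of how it might look if it follows parse_size line for line. The error messages are invented, and two behaviours are assumptions on my part rather than facts about the real duration.rs: suffixes are lowercase only, and a bare number is read as seconds.

```rust
/// Sketch of a duration parser in the same shape as `parse_size`.
/// Returns the duration in seconds.
pub fn parse_duration(s: &str) -> Result<u64, String> {
    let s = s.trim();
    if s.is_empty() { return Err("empty duration".into()); }
    // Split at the first non-digit: "7d" -> ("7", "d").
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, suffix) = s.split_at(split);
    if num.is_empty() { return Err(format!("duration must start with digits: {s}")); }
    let n: u64 = num.parse().map_err(|_| format!("bad number in duration: {num}"))?;
    let mul: u64 = match suffix {
        "" | "s" => 1, // assumption: a bare number means seconds
        "m" => 60,
        "h" => 60 * 60,
        "d" => 24 * 60 * 60,
        "w" => 7 * 24 * 60 * 60,
        // Compound forms such as "1h30m" land here: the suffix is "h30m".
        other => return Err(format!("unknown duration suffix: {other}")),
    };
    n.checked_mul(mul).ok_or_else(|| format!("duration overflow: {s}"))
}
```

Rejecting compound durations costs nothing extra: everything after the leading digits is treated as one suffix, so 1h30m fails the match like any other unknown suffix.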

What I left out

The scope-control list is where this program lives or dies, because a log rotator is one of those ideas where you can scope-creep for six months without shipping. I didn't:

Read a config file. Cron lines are my config file.
Manage state. logrotate keeps /var/lib/logrotate/logrotate.status so it knows when it last rotated each file. I read the file's mtime. That's the state. If you want "rotate every 7 days", I check whether the mtime is more than 7 days in the past. You get one bit of state for free from the filesystem, and I think logrotate would be a simpler program if it used it.
Support copytruncate mode. create mode is what I want, and I haven't worked at a company that truly needed copytruncate in about ten years. Long-running processes that can't reopen their log file are a configuration mistake I can usually push back on.
Add signal-based reload. postrotate scripts and kill -HUP hooks are out of scope. If you need to tell something to reopen its logs, write that as a separate cron line on the next minute.
Emit JSON output. Exit codes are the stable interface.

Every one of those is a thing logrotate does and I specifically don't. The boundary is severe on purpose. The goal is a binary I can scp to a small box and forget about for three years.
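The mtime-as-state idea from the list above fits in a few lines. A minimal sketch, assuming should_rotate receives the file size, the age in seconds, and the optional thresholds parsed from --max-size and --max-age; the real signature in plan.rs may differ, and age_secs is a helper name I made up.

```rust
use std::time::SystemTime;

/// Seconds elapsed since `mtime`, clamped to 0 if the clock went backwards.
/// Both timestamps are passed in, so this stays a pure function.
pub fn age_secs(now: SystemTime, mtime: SystemTime) -> u64 {
    now.duration_since(mtime).map(|d| d.as_secs()).unwrap_or(0)
}

/// Assumed shape of the rotation decision: rotate when forced, or when either
/// configured threshold is exceeded. Thresholds that were not given are None.
pub fn should_rotate(
    size: u64,
    age: u64,
    max_size: Option<u64>,
    max_age: Option<u64>,
    force: bool,
) -> bool {
    if force {
        return true;
    }
    let too_big = max_size.map_or(false, |limit| size > limit);
    let too_old = max_age.map_or(false, |limit| age > limit);
    too_big || too_old
}
```

In main.rs the two timestamps would come from SystemTime::now() and the file's metadata; everything after that is plain integer comparison, which is why no status file is needed.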

The Docker side

The runtime image is alpine:3.20 with a non-root user. The builder is rust:1.90-alpine with musl-dev. The Dockerfile uses the standard "deps-cache layer" trick: copy Cargo.toml and Cargo.lock first, build a dummy stub, then copy the real source and rebuild. When only source changes (which is most of the time), the dependency compilation is cached.

FROM rust:1.90-alpine AS builder
RUN apk add --no-cache musl-dev
WORKDIR /build

COPY Cargo.toml Cargo.lock ./
RUN mkdir -p src tests && \
    echo 'fn main(){}' > src/main.rs && \
    echo ''            > src/lib.rs && \
    cargo build --release && \
    rm -rf src tests

COPY src ./src
COPY tests ./tests
RUN touch src/main.rs src/lib.rs && \
    cargo build --release && \
    cargo test --release

FROM alpine:3.20
RUN adduser -D -u 1000 logrotate
USER logrotate
COPY --from=builder /build/target/release/log-rotate /usr/local/bin/log-rotate
ENTRYPOINT ["/usr/local/bin/log-rotate"]

The release profile is the usual aggressive-small one (every setting is documented in the Cargo book's profiles chapter): strip = true, lto = true, codegen-units = 1, opt-level = "z", panic = "abort". That buys me a 644 KB binary instead of the 1.8 MB you get with default settings. For a program that only does one thing, that's the right trade.
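Written out as a Cargo.toml fragment, those five settings look like this. The values are exactly the ones listed above; only the layout as a [profile.release] table is filled in.

```toml
[profile.release]
strip = true        # drop symbols from the final binary
lto = true          # link-time optimization across crates
codegen-units = 1   # one codegen unit: slower build, better optimization
opt-level = "z"     # optimize for size
panic = "abort"     # no unwinding machinery
```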

Using it

# Rotate /var/log/app.log if it's larger than 10 MB, keep 5 copies.
log-rotate --max-size 10M --keep 5 /var/log/app.log

# Rotate if the file is older than 7 days.
log-rotate --max-age 7d /var/log/app.log

# Either trigger (logrotate-compatible OR).
log-rotate --max-size 100M --max-age 1w /var/log/app.log

# Show the plan without touching anything.
log-rotate --max-size 10M --dry-run /var/log/app.log

# Force a rotation regardless of thresholds — handy from a cron job
# that always wants a daily copy.
log-rotate --force --keep 7 /var/log/app.log

The exit code is 0 on success (including "nothing needed rotating"), 2 on bad arguments or IO errors, and 3 under --exit-on-noop if no action was taken. That last one is for monitoring scripts that want to know whether cron actually did any work.
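The exit-code contract is small enough to write down as a pure function. This is a sketch, not the project's code: Outcome and exit_code are names I made up, and the mapping is taken directly from the paragraph above.

```rust
/// Hypothetical summary of what a run did.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Rotated,
    Noop,
    Failed, // bad arguments or an IO error
}

/// Map an outcome to the documented exit codes: 0 on success, 2 on failure,
/// and 3 for a no-op only when --exit-on-noop was passed.
pub fn exit_code(outcome: Outcome, exit_on_noop: bool) -> u8 {
    match outcome {
        Outcome::Rotated => 0,
        Outcome::Noop if exit_on_noop => 3,
        Outcome::Noop => 0,
        Outcome::Failed => 2,
    }
}
```

A main function returning std::process::ExitCode could finish with ExitCode::from(exit_code(outcome, exit_on_noop)), which keeps the mapping testable without spawning the binary.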

The lesson, such as it is

Every time I write a CLI in Rust I learn the same lesson again: keep the core pure. The test suite writes itself. --dry-run is a patch, not a feature. Swapping the planner for a new one later is cheap. The whole program reads linearly because the interesting code all lives in one file with no IO imports.

The reason I keep re-learning it is that most of the Rust CLI examples out there don't do it. They reach for Path and File in the library layer, and then the tests reach for tempfile, and then the tests are slow and annoying, and by the next project the author has settled on "I'll just write integration tests, unit tests are too much ceremony." And they're right, but only because they skipped the split that would have made the unit tests ceremony-free.

log-rotate is not a replacement for logrotate. It's the version I wish I could have dropped on small boxes in 2010 when all I needed was one cron line and something that didn't need a config file to prove itself.

Source, Dockerfile, full test suite: https://github.com/sen-ltd/log-rotate. MIT licensed.
