SEN LLC

Posted on Apr 15

hexview: writing a color-aware hex dump in Rust with one dependency

#rust #cli #tutorial #binary

A tiny Rust CLI that prints the kind of hex dump you actually want to read — green printable ASCII, yellow control bytes, cyan high bytes, dim nulls, and a side-by-side diff mode. One real dependency (clap), hand-rolled ANSI, ~500 lines of source. Ships as a 9 MB Alpine image.

Every engineer reaches for xxd or hexdump -C a few times a month — debugging a protocol, staring at a binary file format, confirming what's actually at offset 0x42. And every time, the output is a dense wall of gray hex that your eyes have to decode by hand. xxd works. It has worked since forever. But it's not ergonomic, and its flags are a tiny cryptic language (-s, -l, -c, -g, -b, -e…) that I re-Google every single time.

hexview is my answer. It's a small Rust CLI that does what hexdump -C does, plus:

color-highlights printable ASCII, control chars, nulls, and high bytes into four distinct visual groups;
has a --diff mode that prints two files side by side with mismatched bytes in red;
auto-disables color when piped (via std::io::IsTerminal + honoring NO_COLOR);
takes a single real dependency (clap) — the color is hand-rolled ANSI.

🔗 GitHub: https://github.com/sen-ltd/hexview

The rest of this post is a tour of the interesting bits. hexview is also what I'd hand someone as a starter Rust project: it exercises clap, error handling with Box<dyn Error>, buffered I/O, IsTerminal, a little bit of binary formatting, and integration tests with assert_cmd — all without pulling in a pile of crates.

The problem I actually wanted to fix

Here is xxd on a short HTTP request:

00000000: 4745 5420 2f61 7069 2f75 7365 7273 2048  GET /api/users H
00000010: 5454 502f 312e 310d 0a48 6f73 743a 2073  TTP/1.1..Host: s

Here is hexview on the same bytes:

00000000  47 45 54 20 2f 61 70 69  2f 75 73 65 72 73 20 48  |GET /api/users H|
00000010  54 54 50 2f 31 2e 31 0d  0a 48 6f 73 74 3a 20 73  |TTP/1.1..Host: s|

The obvious differences:

One byte per "word" instead of two. xxd packs bytes in pairs (4745), which is mildly useful for reading 16-bit words but makes it awkward to find a specific byte position by eye. hexdump -C already prints one byte per column, and hexview follows it.
Group separator in the middle. After 8 bytes there's an extra space. You can find position 12 at a glance.
ASCII panel with pipes. Again, exactly what hexdump -C does.
Colors, which classic xxd and hexdump builds don't offer, even though color-capable terminals have been the norm for decades.

And on top of that, hexview adds knobs I actually want: --format bin to see bytes as binary (for flag registers), --diff for comparing two files positionally, and --skip/--length with sane short names (-s and -n) that I can remember without the man page.

That's the whole pitch. Now let's build it.

Dependencies: just clap

My Cargo.toml is boring on purpose:

[dependencies]
clap = { version = "4", features = ["derive"] }

[dev-dependencies]
assert_cmd = "2"
predicates = "3"

No anyhow. No thiserror. No colored or owo-colors. No crossterm. For a CLI this small, Box<dyn Error> is enough, and the handful of ANSI escape codes I need fit on half a screen. Rust's standard library covers everything else — std::io::BufReader, std::fs::File, std::io::IsTerminal. The release profile is aggressive:

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

With those flags the Alpine-musl binary comes out at roughly 900 KB, and the final Docker image is 9.4 MB with a non-root hex user and a /work volume ready to go.

The dump loop

The core is ~50 lines. It reads one line's worth of bytes at a time into a reusable buffer, then hands that buffer to a row renderer:

pub fn dump<R: Read, W: Write>(
    reader: &mut R,
    writer: &mut W,
    cfg: &DumpConfig,
    limit: u64,
) -> std::io::Result<()> {
    let mut buf = vec![0u8; cfg.width];
    let mut address = cfg.start_address;
    let mut remaining = limit;

    loop {
        if remaining == 0 { break; }
        let want = (cfg.width as u64).min(remaining) as usize; // compare in u64 so a huge limit can't truncate on 32-bit targets
        let n = read_full(reader, &mut buf[..want])?;
        if n == 0 { break; }
        render_line(writer, &buf[..n], address, cfg)?;
        address = address.saturating_add(n as u64);
        remaining = remaining.saturating_sub(n as u64);
        if n < want { break; }
    }
    Ok(())
}

Two details that matter:

read_full is not read_exact. read_exact returns an UnexpectedEof error when the input ends before the buffer is full; for the last line of a file, that short read is the expected case, not a failure. So I wrote a tiny helper that loops until the buffer is full or we hit EOF, retrying on ErrorKind::Interrupted:

fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> std::io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
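To see why the distinction matters, here is a small self-contained sketch. Trickle is a hypothetical reader invented for this example: it hands out at most a few bytes per read() call, roughly the way a pipe can, so a single read() may come back short long before EOF. read_full is the same loop as above.

```rust
use std::io::{self, Read};

/// Hypothetical reader for this example: returns at most `chunk` bytes per
/// read() call, so a single read() can come back short before EOF.
struct Trickle<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n) // 0 once `data` is exhausted, i.e. EOF
    }
}

/// Same loop as read_full above: fill `buf` unless EOF arrives first.
fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
```

Over a 19-byte input with chunk: 5 and a 16-byte buffer, a single read() returns 5 bytes, read_full returns 16 and then 3 (the short last row, no error), while a second read_exact on the same data fails with ErrorKind::UnexpectedEof.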

start_address is separate from the read position. When the user passes --skip 7, I want the displayed offsets to start at 00000007, not 00000000. For files I just seek the reader and set start_address = skip. For stdin (no seek!) I drain skip bytes from the reader first, then still set the start address to skip. Same displayed output, two different underlying mechanisms.
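A minimal sketch of those two mechanisms, assuming the file path hands over something that implements Seek (as std::fs::File does) and stdin is treated as a plain Read. The names skip_seekable and skip_streaming are made up for this example; both return the value to use as start_address.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Seekable input (e.g. a File): jump straight to `skip`.
/// Returns the value to use as start_address.
fn skip_seekable<R: Read + Seek>(reader: &mut R, skip: u64) -> io::Result<u64> {
    reader.seek(SeekFrom::Start(skip))?;
    Ok(skip)
}

/// Non-seekable input (e.g. stdin): read `skip` bytes and throw them away.
/// Take caps the read at `skip` bytes; if the input is shorter, this just
/// stops at EOF. Returns the value to use as start_address.
fn skip_streaming<R: Read>(reader: &mut R, skip: u64) -> io::Result<u64> {
    io::copy(&mut reader.by_ref().take(skip), &mut io::sink())?;
    Ok(skip)
}
```

Draining through Read::take and io::copy into io::sink() keeps memory flat no matter how large the skip is. One choice the sketch makes silently: if stdin is shorter than the skip, it simply reaches EOF and the dump prints nothing.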

Rendering a row is the ugly-but-honest part — lots of small writes, all going through BufWriter so it's actually cheap:

pub(crate) fn render_line<W: Write>(
    writer: &mut W,
    row: &[u8],
    address: u64,
    cfg: &DumpConfig,
) -> std::io::Result<()> {
    let pal = &cfg.palette;

    if cfg.show_address {
        write!(writer, "{}{:08x}{}  ", pal.address(), address, pal.reset())?;
    }

    for i in 0..cfg.width {
        if cfg.group > 0 && i > 0 && i % cfg.group == 0 {
            write!(writer, " ")?;
        }
        if let Some(&b) = row.get(i) {
            let cat = Category::of(b);
            write!(writer, "{}", cat.color(pal))?;
            cfg.format.write_byte(writer, b)?;
            write!(writer, "{}", pal.reset())?;
        } else {
            for _ in 0..cfg.format.width() { write!(writer, " ")?; }
        }
        write!(writer, " ")?;
    }

    if cfg.show_ascii {
        write!(writer, " |")?;
        for &b in row {
            let cat = Category::of(b);
            let ch = if matches!(cat, Category::Printable) { b as char } else { '.' };
            write!(writer, "{}{}{}", cat.color(pal), ch, pal.reset())?;
        }
        write!(writer, "|")?;
    }

    writeln!(writer)?;
    Ok(())
}

There's one thing in here that bit me in a test and is worth calling out: short last lines must pad the byte column with spaces, because otherwise the ASCII panel drifts left on the last row and the whole dump looks broken. The test for this is one of the few cases where the obvious behaviour ("write what's there, move on") is wrong:

#[test]
fn short_last_line_keeps_column_alignment() {
    let cfg = DumpConfig::default();
    let out = dump_to_string(b"ABCDE", &cfg, u64::MAX);
    let line = out.lines().next().unwrap();
    let pipe_pos = line.find('|').unwrap();
    // 16 slots * 3 chars + group space + address "00000000  " + leading space = 60
    assert_eq!(pipe_pos, 60);
}
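The padding rule is easy to check in isolation. Below is a stripped-down, color-free sketch of the row layout; render_plain and render_to_string are hypothetical names, and it assumes the default 16-byte width, 8-byte grouping, and two-hex-digit byte format. It reproduces the GET /api/users H row shown earlier and puts the pipe at column 60 for a five-byte row.

```rust
use std::io::{self, Write};

/// Color-free sketch of render_line: address, hex column, ASCII panel.
fn render_plain<W: Write>(w: &mut W, row: &[u8], address: u64, width: usize, group: usize) -> io::Result<()> {
    write!(w, "{:08x}  ", address)?;
    for i in 0..width {
        if group > 0 && i > 0 && i % group == 0 {
            write!(w, " ")?; // extra space between groups
        }
        match row.get(i) {
            Some(b) => write!(w, "{:02x} ", b)?,
            None => write!(w, "   ")?, // pad missing bytes so the ASCII panel stays put
        }
    }
    write!(w, " |")?;
    for &b in row {
        let ch = if (0x20..=0x7e).contains(&b) { b as char } else { '.' };
        write!(w, "{}", ch)?;
    }
    writeln!(w, "|")
}

/// Render one 16-byte-wide row (8-byte groups) into a String.
fn render_to_string(row: &[u8], address: u64) -> String {
    let mut out = Vec::new();
    render_plain(&mut out, row, address, 16, 8).expect("writing to a Vec cannot fail");
    String::from_utf8(out).expect("output is ASCII")
}
```

The None arm is the whole fix: three spaces for each missing byte keep every row the same width up to the ASCII panel.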

The color helper

Instead of pulling in a crate, I wrote a one-field struct that holds a single boolean and exposes one method per color "role", each returning either an ANSI escape or the empty string:

pub struct Palette { enabled: bool }

impl Palette {
    pub const fn new(enabled: bool) -> Self { Self { enabled } }

    #[inline]
    pub fn printable(&self) -> &'static str {
        if self.enabled { "\x1b[32m" } else { "" }
    }
    // ... null(), control(), high(), address(), diff(), add(), reset() ...
}

The important move is that enabled is decided once at startup and baked into the palette, so nothing re-checks the terminal or the environment per byte. The dump loop writes pal.printable() and gets back either a green escape or an empty string; either way, write! is happy. No feature gates, no global state. Each call does branch on the bool, but the outcome never changes during a run, so the branch is perfectly predictable and costs next to nothing compared with the actual formatting.

And classifying a byte into a color category is a six-line match:

pub fn of(b: u8) -> Self {
    match b {
        0x00 => Self::Null,
        0x01..=0x1f | 0x7f => Self::Control,
        0x20..=0x7e => Self::Printable,
        _ => Self::High,
    }
}
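Put together, the palette and the classifier fit in one small, self-contained sketch. The green escape for printable bytes comes from the snippet above; the yellow, cyan, and dim codes are standard ANSI SGR sequences chosen to match the colors listed at the top of the post, not necessarily the exact ones hexview uses. The colored_hex helper is hypothetical.

```rust
/// Sketch of the palette: one bool, one method per color role.
/// address(), diff(), and add() would follow the same pattern.
#[derive(Clone, Copy)]
pub struct Palette {
    enabled: bool,
}

impl Palette {
    pub const fn new(enabled: bool) -> Self { Self { enabled } }

    /// The single decision point: the escape code, or nothing.
    fn code(&self, esc: &'static str) -> &'static str {
        if self.enabled { esc } else { "" }
    }

    pub fn printable(&self) -> &'static str { self.code("\x1b[32m") } // green
    pub fn control(&self) -> &'static str { self.code("\x1b[33m") }   // yellow (assumed)
    pub fn high(&self) -> &'static str { self.code("\x1b[36m") }      // cyan (assumed)
    pub fn null(&self) -> &'static str { self.code("\x1b[2m") }       // dim (assumed)
    pub fn reset(&self) -> &'static str { self.code("\x1b[0m") }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category { Null, Control, Printable, High }

impl Category {
    pub fn of(b: u8) -> Self {
        match b {
            0x00 => Self::Null,
            0x01..=0x1f | 0x7f => Self::Control,
            0x20..=0x7e => Self::Printable,
            _ => Self::High,
        }
    }

    /// Map a category to its color role on the given palette.
    pub fn color(self, pal: &Palette) -> &'static str {
        match self {
            Self::Null => pal.null(),
            Self::Control => pal.control(),
            Self::Printable => pal.printable(),
            Self::High => pal.high(),
        }
    }
}

/// Hypothetical helper: one byte's hex wrapped in its category color.
pub fn colored_hex(b: u8, pal: &Palette) -> String {
    format!("{}{:02x}{}", Category::of(b).color(pal), b, pal.reset())
}
```

With Palette::new(false) every role returns "", so colored_hex(0x41, ...) is just "41"; with coloring on, the same call yields "\x1b[32m41\x1b[0m".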

The should_use_color decision is where std::io::IsTerminal earns its keep. It stabilised in Rust 1.70 and lets you do the right thing without the atty crate (which is unmaintained):

use std::io::IsTerminal; // trait that provides .is_terminal()

pub fn should_use_color(force_no_color: bool) -> bool {
    if force_no_color { return false; }
    if std::env::var_os("NO_COLOR").is_some_and(|v| !v.is_empty()) {
        return false;
    }
    std::io::stdout().is_terminal()
}

Three rules, in order: explicit --no-color wins; then the NO_COLOR convention; then the stdout TTY check. The "piped to a file" case is automatic — no color codes leak into your grep pipeline.
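One way to make those three rules unit-testable is to split the decision from its inputs. This is a sketch, not hexview's actual code: decide_color is a hypothetical pure function, and should_use_color becomes a thin wrapper that gathers NO_COLOR and the TTY state.

```rust
use std::ffi::OsStr;
use std::io::IsTerminal;

/// Pure decision, same rule order as above:
/// --no-color, then a non-empty NO_COLOR, then the TTY check.
pub fn decide_color(force_no_color: bool, no_color_env: Option<&OsStr>, stdout_is_tty: bool) -> bool {
    if force_no_color {
        return false;
    }
    // NO_COLOR only counts when it is set to a non-empty value.
    if no_color_env.is_some_and(|v| !v.is_empty()) {
        return false;
    }
    stdout_is_tty
}

/// Thin wrapper that reads the real environment and terminal.
pub fn should_use_color(force_no_color: bool) -> bool {
    let env = std::env::var_os("NO_COLOR");
    decide_color(force_no_color, env.as_deref(), std::io::stdout().is_terminal())
}
```

Testing the pure function means no test has to mutate environment variables, which are process-wide state shared by tests running in parallel.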

The diff mode

I wanted hexview --diff a.bin b.bin to tell me at a glance which bytes changed, not which structural regions. That's a deliberately simpler job than diff does on text: I do a positional byte comparison and highlight mismatches in red. If file B inserts one byte at the start, every downstream byte will show as changed. That matches cmp -l's behaviour and is the right level of cleverness for a hex viewer — users who want Myers diff already have diff or delta.

The core is pleasingly boring:

pub fn diff<R1, R2, W>(a: &mut R1, b: &mut R2, writer: &mut W, cfg: &DumpConfig)
    -> std::io::Result<()>
where R1: Read, R2: Read, W: Write,
{
    let mut a_buf = Vec::new();
    let mut b_buf = Vec::new();
    a.read_to_end(&mut a_buf)?;
    b.read_to_end(&mut b_buf)?;

    let max_len = a_buf.len().max(b_buf.len());
    let mut offset = 0;
    let mut addr = cfg.start_address;
    while offset < max_len {
        let end = (offset + cfg.width).min(max_len);
        let row_a = slice_or_empty(&a_buf, offset, end);
        let row_b = slice_or_empty(&b_buf, offset, end);
        // render side A, two spaces, render side B, newline
        ...
        offset = end;
        addr = addr.saturating_add(cfg.width as u64);
    }
    Ok(())
}

The rendering side is a small twist on the regular row renderer: for each byte position, I compare against "the other side" and pick the byte's normal category color, or the red "diff" color, or the blue "present on this side only" color:

match (self_row.get(i), other_row.get(i)) {
    (Some(&b), Some(&o)) if b == o => {
        write!(w, "{}", Category::of(b).color(pal))?;
        cfg.format.write_byte(w, b)?;
        write!(w, "{}", pal.reset())?;
    }
    (Some(&b), Some(_))  => write_highlighted(w, b, pal.diff(), cfg)?,
    (Some(&b), None)     => write_highlighted(w, b, pal.add(), cfg)?,
    (None, _)            => pad(w, cfg.format.width())?,
}
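The diff() skeleton above leans on a helper, slice_or_empty, that the post doesn't show. Here is one plausible implementation (an assumption, not the repo's code), plus the per-slot decision from the match above pulled out into a hypothetical Cell enum so it can be tested without any rendering.

```rust
/// Hypothetical slice_or_empty: the part of `buf` in `start..end`, clamped to
/// the buffer's length, or an empty slice once this side has run out.
fn slice_or_empty(buf: &[u8], start: usize, end: usize) -> &[u8] {
    if start >= buf.len() {
        return &[];
    }
    &buf[start..end.min(buf.len())]
}

/// What the renderer decides for each slot, from one side's point of view.
#[derive(Debug, PartialEq, Eq)]
enum Cell {
    Same(u8),     // normal category color
    Changed(u8),  // red "diff" color
    OnlyHere(u8), // blue "present on this side only" color
    Pad,          // this side has no byte at this position
}

fn classify(self_row: &[u8], other_row: &[u8], i: usize) -> Cell {
    match (self_row.get(i), other_row.get(i)) {
        (Some(&b), Some(&o)) if b == o => Cell::Same(b),
        (Some(&b), Some(_)) => Cell::Changed(b),
        (Some(&b), None) => Cell::OnlyHere(b),
        (None, _) => Cell::Pad,
    }
}
```

For the differ / differ! example in the quick-start, the first six slots are Same on both sides; slot 6 is OnlyHere on the b.bin side and Pad on the a.bin side.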

Loading both files entirely into memory is a tradeoff — fine for the files you'd actually use this on (configs, packet captures, binaries up to a few hundred MB), wrong for 10 GB cores. The alternative is keeping two seekable readers and interleaving their reads, which is more code and doesn't materially change the happy-path story. I picked simple.
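For comparison, here is roughly what the streaming alternative could look like. It is a sketch, not what hexview does, and it assumes only plain Read inputs: if both sides advance one row at a time in lockstep, no seeking is needed. next_row and positional_diffs are hypothetical names, and instead of rendering, the sketch reports each differing position to a callback, so memory stays at two rows regardless of file size.

```rust
use std::io::{self, Read};

/// Read up to `width` bytes into `row`. A short row only happens at EOF,
/// because read_to_end on a Take keeps reading until the limit or EOF.
fn next_row<R: Read>(r: &mut R, width: usize, row: &mut Vec<u8>) -> io::Result<usize> {
    row.clear();
    r.by_ref().take(width as u64).read_to_end(row)
}

/// Compare two inputs position by position, one row at a time, calling
/// `on_diff(offset, byte_a, byte_b)` wherever they differ. `None` means that
/// side has already ended. Returns the number of differing positions.
fn positional_diffs<A, B, F>(a: &mut A, b: &mut B, width: usize, mut on_diff: F) -> io::Result<u64>
where
    A: Read,
    B: Read,
    F: FnMut(u64, Option<u8>, Option<u8>),
{
    let mut row_a = Vec::with_capacity(width);
    let mut row_b = Vec::with_capacity(width);
    let mut offset = 0u64;
    let mut count = 0u64;
    loop {
        let na = next_row(a, width, &mut row_a)?;
        let nb = next_row(b, width, &mut row_b)?;
        if na == 0 && nb == 0 {
            break; // both sides exhausted
        }
        for i in 0..na.max(nb) {
            let (x, y) = (row_a.get(i).copied(), row_b.get(i).copied());
            if x != y {
                on_diff(offset + i as u64, x, y);
                count += 1;
            }
        }
        offset += na.max(nb) as u64;
    }
    Ok(count)
}
```

It is more code than read_to_end plus slicing, which is the tradeoff the paragraph above describes.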

Error handling: `Box<dyn Error>` is fine

For a CLI this small, pulling in anyhow to get nicer error messages is overkill. The entry point returns ExitCode, and I wrap fallible operations to get a helpful prefix:

use clap::Parser; // provides Cli::try_parse()
use std::process::ExitCode;

fn main() -> ExitCode {
    let cli = match Cli::try_parse() {
        Ok(c) => c,
        Err(e) => {
            let _ = e.print();
            return match e.kind() {
                clap::error::ErrorKind::DisplayHelp
                | clap::error::ErrorKind::DisplayVersion => ExitCode::from(0),
                _ => ExitCode::from(2),
            };
        }
    };
    match run(cli) {
        Ok(()) => ExitCode::from(0),
        Err(e) => {
            eprintln!("hexview: {}", e);
            ExitCode::from(1)
        }
    }
}

Three exit codes (0 success, 1 I/O error, 2 bad args) with clap's own error path handling --help/--version and argument parsing failures. The tests assert these explicitly:

#[test]
fn bad_file_path_exits_one() { ... .failure().code(1); }
#[test]
fn bad_args_exit_two() { ... .failure().code(2); }

That's the whole error story. If hexview grew — say, if it got a plugin system, or a config file parser — I'd reach for anyhow on the spot. At this size, it's overhead.
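The "helpful prefix" part can be as small as one map_err. A sketch using only the standard library; open_input and run_dump are hypothetical names, not hexview's functions. A String converts into Box<dyn Error> via .into(), and io::Error converts automatically through ?.

```rust
use std::error::Error;
use std::fs::File;
use std::path::Path;

/// Hypothetical helper: open an input and prefix any failure with its path.
fn open_input(path: &Path) -> Result<File, Box<dyn Error>> {
    // A String converts into Box<dyn Error> via `.into()`.
    File::open(path).map_err(|e| format!("{}: {}", path.display(), e).into())
}

/// Hypothetical run-style function: every fallible step uses `?`,
/// and io::Error converts into Box<dyn Error> automatically.
fn run_dump(path: &Path) -> Result<u64, Box<dyn Error>> {
    let mut file = open_input(path)?;
    let n = std::io::copy(&mut file, &mut std::io::sink())?;
    Ok(n)
}
```

Combined with the eprintln! in main, a missing file would surface as something like "hexview: missing.bin: No such file or directory (os error 2)" on Linux.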

Tests

Nineteen end-to-end tests via assert_cmd, plus unit tests on dump, color, and diff. assert_cmd is the thing that convinced me to write this post, honestly — it makes CLI testing so low-friction that there's no excuse not to cover every flag:

use assert_cmd::Command;
use predicates::prelude::*;

#[test]
fn skip_and_length() {
    let f = write_fixture("sl.bin", b"Hello, world!");
    Command::cargo_bin("hexview").unwrap()
        .arg(&f)
        .arg("--no-color")
        .arg("--skip").arg("7")
        .arg("--length").arg("5")
        .assert()
        .success()
        .stdout(predicate::str::contains("|world|"))
        .stdout(predicate::str::contains("00000007"));
}

That one test alone verifies: file open path, the --skip seek, the --length cap, the address display starting at the skip offset, and the ASCII panel on a short line. All in thirteen lines. The test suite runs in ~10 ms after the build.
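The test relies on a write_fixture helper the post doesn't show. One plausible version (hypothetical, standard library only, in keeping with the no-extra-crates approach) writes into a per-process directory under the system temp dir, so parallel tests with distinct fixture names don't collide:

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical write_fixture: writes `bytes` to a per-process temp dir and
/// returns the path, ready to pass to Command::arg().
fn write_fixture(name: &str, bytes: &[u8]) -> PathBuf {
    let dir = std::env::temp_dir().join(format!("hexview-tests-{}", std::process::id()));
    fs::create_dir_all(&dir).expect("create fixture dir");
    let path = dir.join(name);
    fs::write(&path, bytes).expect("write fixture");
    path
}
```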

Tradeoffs I'm shipping on purpose

No memory mapping for huge files. Plain BufReader<File> is correct for any size and portable. If you're using this on a 50 GB core dump, you'll wait longer. That's fine for v0.1.
Colors aren't configurable. Four hardcoded categories, one palette. Adding themes is more code than the whole rest of the program.
--format bin is per-byte. Real low-level debugging sometimes wants to see multi-byte integers in binary; I don't cover that. Know your endianness.
No --search pattern. If you need it, pipe through grep. (Because I disabled color on pipes, the grep output is clean.)
Diff is positional, not structural. On purpose, as discussed.

Each of these is a place I could add a flag and didn't, and the test suite and the binary are smaller because of it.

Try it in 30 seconds

docker build -t hexview https://github.com/sen-ltd/hexview.git
printf 'Hello, world!\x0a\x00\x01' > /tmp/x.bin
docker run --rm -v /tmp:/work hexview x.bin
docker run --rm -v /tmp:/work hexview x.bin --format bin --width 8

For diff mode:

printf 'differ'  > /tmp/a.bin
printf 'differ!' > /tmp/b.bin
docker run --rm -v /tmp:/work hexview --diff a.bin b.bin

The image is 9.4 MB. There's no live demo (it's a terminal tool), but the GitHub repo has the full source and a working Dockerfile.

Closing

Entry #137 in a 100+ portfolio series from SEN LLC. hexview is the first Rust entry in the sweep — the rest of the Rust entries will follow the same shape: one dependency if possible, hand-rolled where the stdlib is enough, stripped release profile, multi-stage Alpine Dockerfile, integration tests via assert_cmd. Rust punishes you for overreach and rewards you for minimalism; this is an attempt to respect that.

Feedback welcome.
