hexview: writing a color-aware hex dump in Rust with one dependency
A tiny Rust CLI that prints the kind of hex dump you actually want to read: green printable ASCII, yellow control bytes, cyan high bytes, dim nulls, and a side-by-side diff mode. One real dependency (clap), hand-rolled ANSI, ~500 lines of source. Ships as a 9 MB Alpine image.
Every engineer reaches for xxd or hexdump -C a few times a month — debugging a protocol, staring at a binary file format, confirming what's actually at offset 0x42. And every time, the output is a dense wall of gray hex that your eyes have to decode by hand. xxd works. It has worked since forever. But it's not ergonomic, and its flags are a tiny cryptic language (-s, -l, -c, -g, -b, -e…) that I re-Google every single time.
hexview is my answer. It's a small Rust CLI that does what hexdump -C does, plus:
- color-highlights printable ASCII, control chars, nulls, and high bytes into four distinct visual groups;
- has a --diff mode that prints two files side by side with mismatched bytes in red;
- auto-disables color when piped (via std::io::IsTerminal, while also honoring NO_COLOR);
- takes a single real dependency (clap); the color is hand-rolled ANSI.
🔗 GitHub: https://github.com/sen-ltd/hexview
The rest of this post is a tour of the interesting bits. hexview is also what I'd hand someone as a starter Rust project: it exercises clap, error handling with Box<dyn Error>, buffered I/O, IsTerminal, a little bit of binary formatting, and integration tests with assert_cmd — all without pulling in a pile of crates.
The problem I actually wanted to fix
Here is xxd on a short HTTP request:
00000000: 4745 5420 2f61 7069 2f75 7365 7273 2048  GET /api/users H
00000010: 5454 502f 312e 310d 0a48 6f73 743a 2073  TTP/1.1..Host: s
Here is hexview on the same bytes:
00000000  47 45 54 20 2f 61 70 69  2f 75 73 65 72 73 20 48  |GET /api/users H|
00000010  54 54 50 2f 31 2e 31 0d  0a 48 6f 73 74 3a 20 73  |TTP/1.1..Host: s|
The obvious differences:
- One byte per "word" instead of two. xxd packs bytes in pairs (4745), which is mildly useful for reading 16-bit big-endian values but makes it awkward to find a specific byte position by eye. hexdump -C already does this, and hexview follows it.
- Group separator in the middle. After 8 bytes there's an extra space, so you can find position 12 at a glance.
- ASCII panel with pipes. Again, exactly what hexdump -C does.
- Colors, which xxd and hexdump don't use by default, even though essentially every terminal emulator has supported ANSI colors for decades.
And on top of that, hexview adds knobs I actually want: --format bin to see bytes as binary (for flag registers), --diff for comparing two files positionally, and --skip/--length with sane short names (-s and -n) that I can remember without the man page.
That's the whole pitch. Now let's build it.
Dependencies: just clap
My Cargo.toml is boring on purpose:
[dependencies]
clap = { version = "4", features = ["derive"] }
[dev-dependencies]
assert_cmd = "2"
predicates = "3"
No anyhow. No thiserror. No colored or owo-colors. No crossterm. For a CLI this small, Box<dyn Error> is enough, and the handful of ANSI escape codes I need fit on half a screen. Rust's standard library covers everything else — std::io::BufReader, std::fs::File, std::io::IsTerminal. The release profile is aggressive:
[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"
With those flags the Alpine (musl) binary comes out at roughly 900 KB, and the final Docker image is 9.4 MB, with a non-root hex user and a /work volume ready to go.
The dump loop
The core is ~50 lines. It reads one line's worth of bytes at a time into a reusable buffer, then hands that buffer to a row renderer:
use std::io::{Read, Write};
pub fn dump<R: Read, W: Write>(
reader: &mut R,
writer: &mut W,
cfg: &DumpConfig,
limit: u64,
) -> std::io::Result<()> {
let mut buf = vec![0u8; cfg.width];
let mut address = cfg.start_address;
let mut remaining = limit;
loop {
if remaining == 0 { break; }
let want = cfg.width.min(remaining as usize);
let n = read_full(reader, &mut buf[..want])?;
if n == 0 { break; }
render_line(writer, &buf[..n], address, cfg)?;
address = address.saturating_add(n as u64);
remaining = remaining.saturating_sub(n as u64);
if n < want { break; }
}
Ok(())
}
Two details that matter:
- read_full is not read_exact. read_exact returns an UnexpectedEof error when the input ends before the buffer is full; for the last line of a file, a short read is the expected case, not a failure. So I wrote a tiny helper that loops until the buffer is full or we hit EOF, retrying on Interrupted:
fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> std::io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => break, // EOF: return what we have
            Ok(n) => filled += n,
            Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}
- start_address is separate from the read position. When the user passes --skip 7, I want the displayed offsets to start at 00000007, not 00000000. For files I just seek the reader and set start_address = skip. For stdin (no seek!) I drain skip bytes from the reader first, then still set the start address to skip. Same displayed output, two different underlying mechanisms.
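The two --skip mechanisms can be sketched like this. The function names skip_seekable and skip_stream are hypothetical, not hexview's actual code; the stream version leans on Read::take and io::copy into io::sink so it never allocates a skip-sized buffer.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical helper for files: jump straight to `skip` and return the
/// address the dump should start displaying at.
fn skip_seekable<R: Read + Seek>(reader: &mut R, skip: u64) -> io::Result<u64> {
    reader.seek(SeekFrom::Start(skip))?;
    Ok(skip)
}

/// Hypothetical helper for non-seekable input such as stdin: read and
/// discard up to `skip` bytes, then report the same start address.
fn skip_stream<R: Read>(reader: &mut R, skip: u64) -> io::Result<u64> {
    // `take` caps the reader at `skip` bytes; copying into `sink` throws
    // them away using io::copy's small internal buffer.
    io::copy(&mut reader.by_ref().take(skip), &mut io::sink())?;
    Ok(skip)
}
```

Both return skip, which becomes the start address, so a file and the same bytes piped through stdin display identical offsets.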
Rendering a row is the ugly-but-honest part — lots of small writes, all going through BufWriter so it's actually cheap:
pub(crate) fn render_line<W: Write>(
writer: &mut W,
row: &[u8],
address: u64,
cfg: &DumpConfig,
) -> std::io::Result<()> {
let pal = &cfg.palette;
if cfg.show_address {
write!(writer, "{}{:08x}{}  ", pal.address(), address, pal.reset())?;
}
for i in 0..cfg.width {
if cfg.group > 0 && i > 0 && i % cfg.group == 0 {
write!(writer, " ")?;
}
if let Some(&b) = row.get(i) {
let cat = Category::of(b);
write!(writer, "{}", cat.color(pal))?;
cfg.format.write_byte(writer, b)?;
write!(writer, "{}", pal.reset())?;
} else {
for _ in 0..cfg.format.width() { write!(writer, " ")?; }
}
write!(writer, " ")?;
}
if cfg.show_ascii {
write!(writer, " |")?;
for &b in row {
let cat = Category::of(b);
let ch = if matches!(cat, Category::Printable) { b as char } else { '.' };
write!(writer, "{}{}{}", cat.color(pal), ch, pal.reset())?;
}
write!(writer, "|")?;
}
writeln!(writer)?;
Ok(())
}
There's one thing in here that bit me in a test and is worth calling out: short last lines must pad the byte column with spaces, because otherwise the ASCII panel drifts left on the last row and the whole dump looks broken. The test for this is one of the few cases where the obvious behaviour ("write what's there, move on") is wrong:
#[test]
fn short_last_line_keeps_column_alignment() {
let cfg = DumpConfig::default();
let out = dump_to_string(b"ABCDE", &cfg, u64::MAX);
let line = out.lines().next().unwrap();
let pipe_pos = line.find('|').unwrap();
// address "00000000  " (10) + 16 slots * 3 chars (48) + group space (1) + leading space (1) = 60
assert_eq!(pipe_pos, 60);
}
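To see where the 60 comes from, here is a color-free sketch of the same column layout. It assumes hexdump -C spacing (two spaces after the address, one separator per byte slot, one extra space between groups, one space before the ASCII panel) and hex output only; render_plain is a hypothetical name, not a function in the repo.

```rust
use std::fmt::Write;

/// Hypothetical, color-free row renderer with hexdump -C style columns.
fn render_plain(row: &[u8], address: u64, width: usize, group: usize) -> String {
    let mut out = String::new();
    // 8 hex digits plus two spaces: 10 characters.
    write!(out, "{:08x}  ", address).unwrap();
    for i in 0..width {
        // One extra space at every group boundary (not before slot 0).
        if group > 0 && i > 0 && i % group == 0 {
            out.push(' ');
        }
        match row.get(i) {
            Some(b) => write!(out, "{:02x} ", b).unwrap(),
            // Missing byte on a short row: pad two digits plus separator.
            None => out.push_str("   "),
        }
    }
    out.push_str(" |");
    for &b in row {
        out.push(if (0x20..=0x7e).contains(&b) { b as char } else { '.' });
    }
    out.push('|');
    out
}
```

For any short row, the None arm pads each missing slot with three spaces, which is what keeps the opening pipe at index 60.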
The color helper
Instead of pulling in a crate, I wrote a struct with a single boolean field and one method per color role, each returning either an ANSI escape or the empty string:
pub struct Palette { enabled: bool }
impl Palette {
pub const fn new(enabled: bool) -> Self { Self { enabled } }
#[inline]
pub fn printable(&self) -> &'static str {
if self.enabled { "\x1b[32m" } else { "" }
}
// ... null(), control(), high(), diff(), reset() ...
}
The important move is that enabled is decided once at startup and never changes after that. The dump loop writes pal.printable() and gets back either a green escape or an empty string; either way, write! is happy. The renderer never asks "is color on?" itself, and there are no feature gates. Each call is a tiny inlinable method with a branch on a bool that never changes, so it is perfectly predicted, and it hands back a &'static str: noise next to the actual formatting.
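The elided methods all have the same shape. A sketch of the complete palette, assuming the colors described at the top of the post (green printable, yellow control, cyan high, dim nulls, red diff, blue one-side-only); the address color and the private pick helper are my assumptions, not the repo's code:

```rust
/// Sketch of the full palette: every role is one branch on `enabled`.
pub struct Palette {
    enabled: bool,
}

impl Palette {
    pub const fn new(enabled: bool) -> Self {
        Self { enabled }
    }

    /// Return the escape when color is on, the empty string otherwise.
    #[inline]
    fn pick(&self, code: &'static str) -> &'static str {
        if self.enabled { code } else { "" }
    }

    #[inline] pub fn printable(&self) -> &'static str { self.pick("\x1b[32m") } // green
    #[inline] pub fn control(&self) -> &'static str { self.pick("\x1b[33m") }   // yellow
    #[inline] pub fn high(&self) -> &'static str { self.pick("\x1b[36m") }      // cyan
    #[inline] pub fn null(&self) -> &'static str { self.pick("\x1b[2m") }       // dim
    #[inline] pub fn diff(&self) -> &'static str { self.pick("\x1b[31m") }      // red
    #[inline] pub fn add(&self) -> &'static str { self.pick("\x1b[34m") }       // blue
    // Assumed: bright black for the address column.
    #[inline] pub fn address(&self) -> &'static str { self.pick("\x1b[90m") }
    #[inline] pub fn reset(&self) -> &'static str { self.pick("\x1b[0m") }
}
```

Because every role goes through pick, a disabled palette can't leak a stray escape code: reset() is empty too.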
And classifying a byte into a color category is a six-line match:
pub fn of(b: u8) -> Self {
match b {
0x00 => Self::Null,
0x01..=0x1f | 0x7f => Self::Control,
0x20..=0x7e => Self::Printable,
_ => Self::High,
}
}
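A self-contained sketch of the classifier, with a hypothetical panel_char helper for the ASCII column and a census function that checks the four ranges cover every byte value exactly once:

```rust
/// The four byte classes used for coloring.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Null,
    Control,
    Printable,
    High,
}

impl Category {
    pub fn of(b: u8) -> Self {
        match b {
            0x00 => Self::Null,
            0x01..=0x1f | 0x7f => Self::Control,
            0x20..=0x7e => Self::Printable,
            _ => Self::High, // 0x80..=0xff
        }
    }

    /// Hypothetical helper: the character shown in the ASCII panel.
    pub fn panel_char(b: u8) -> char {
        if Self::of(b) == Self::Printable { b as char } else { '.' }
    }
}

/// Count how many of the 256 byte values land in each category,
/// in the order [Null, Control, Printable, High].
pub fn census() -> [usize; 4] {
    let mut counts = [0usize; 4];
    for b in 0..=u8::MAX {
        let idx = match Category::of(b) {
            Category::Null => 0,
            Category::Control => 1,
            Category::Printable => 2,
            Category::High => 3,
        };
        counts[idx] += 1;
    }
    counts
}
```

The counts are 1 null, 32 control bytes (0x01 to 0x1f plus DEL at 0x7f), 95 printable characters and 128 high bytes, which sum to 256.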
The should_use_color decision is where std::io::IsTerminal earns its keep. It stabilised in Rust 1.70 and lets you do the right thing without the atty crate (which is unmaintained):
use std::io::IsTerminal; // trait must be in scope for .is_terminal()
pub fn should_use_color(force_no_color: bool) -> bool {
if force_no_color { return false; }
if std::env::var_os("NO_COLOR").is_some_and(|v| !v.is_empty()) {
return false;
}
std::io::stdout().is_terminal()
}
Three rules, in order: explicit --no-color wins; then the NO_COLOR convention; then the stdout TTY check. The "piped to a file" case is automatic — no color codes leak into your grep pipeline.
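One way to make those three rules unit-testable, assuming you're free to refactor: move the decision into a pure function and pass the environment and TTY state in. decide_color is a hypothetical name; the wrapper below keeps the same behavior as should_use_color above.

```rust
use std::ffi::OsStr;
use std::io::IsTerminal;

/// Hypothetical pure version of the color decision.
fn decide_color(force_no_color: bool, no_color: Option<&OsStr>, stdout_is_tty: bool) -> bool {
    if force_no_color {
        return false; // rule 1: --no-color always wins
    }
    if no_color.is_some_and(|v| !v.is_empty()) {
        return false; // rule 2: NO_COLOR set to a non-empty value
    }
    stdout_is_tty // rule 3: color only when stdout is a terminal
}

/// Thin wrapper that gathers the real inputs.
fn should_use_color(force_no_color: bool) -> bool {
    let no_color = std::env::var_os("NO_COLOR");
    decide_color(force_no_color, no_color.as_deref(), std::io::stdout().is_terminal())
}
```

An empty NO_COLOR is ignored, matching the is_empty check in the original.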
The diff mode
I wanted hexview --diff a.bin b.bin to tell me at a glance which bytes changed, not which structural regions. That's a deliberately simpler job than diff does on text: I do a positional byte comparison and highlight mismatches in red. If file B inserts one byte at the start, every downstream byte will show as changed. That matches cmp -l's behaviour and is the right level of cleverness for a hex viewer — users who want Myers diff already have diff or delta.
The core is pleasingly boring:
pub fn diff<R1, R2, W>(a: &mut R1, b: &mut R2, writer: &mut W, cfg: &DumpConfig)
-> std::io::Result<()>
where R1: Read, R2: Read, W: Write,
{
let mut a_buf = Vec::new();
let mut b_buf = Vec::new();
a.read_to_end(&mut a_buf)?;
b.read_to_end(&mut b_buf)?;
let max_len = a_buf.len().max(b_buf.len());
let mut offset = 0;
let mut addr = cfg.start_address;
while offset < max_len {
let end = (offset + cfg.width).min(max_len);
let row_a = slice_or_empty(&a_buf, offset, end);
let row_b = slice_or_empty(&b_buf, offset, end);
// render side A, two spaces, render side B, newline
...
offset = end;
addr = addr.saturating_add(cfg.width as u64);
}
Ok(())
}
The rendering side is a small twist on the regular row renderer: for each byte position, I compare against "the other side" and pick the byte's normal category color, or the red "diff" color, or the blue "present on this side only" color:
match (self_row.get(i), other_row.get(i)) {
(Some(&b), Some(&o)) if b == o => {
write!(w, "{}", Category::of(b).color(pal))?;
cfg.format.write_byte(w, b)?;
write!(w, "{}", pal.reset())?;
}
(Some(&b), Some(_)) => write_highlighted(w, b, pal.diff(), cfg)?,
(Some(&b), None) => write_highlighted(w, b, pal.add(), cfg)?,
(None, _) => pad(w, cfg.format.width())?,
}
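slice_or_empty and the per-slot decision aren't shown in full. A hypothetical sketch of both, with the match pulled out into a classify function that reports which of the four outcomes applies instead of writing anything:

```rust
/// Hypothetical helper: `buf[start..end]` clipped to the buffer,
/// or an empty slice once `start` is past its end.
fn slice_or_empty(buf: &[u8], start: usize, end: usize) -> &[u8] {
    if start >= buf.len() {
        &[]
    } else {
        &buf[start..end.min(buf.len())]
    }
}

/// What one byte slot on "this" side of the diff should look like.
#[derive(Debug, PartialEq, Eq)]
enum Cell {
    Same(u8),     // normal category color
    Changed(u8),  // red: both sides have a byte and they differ
    OnlyHere(u8), // blue: the other side is shorter
    Pad,          // this side is shorter: print spaces
}

fn classify(self_row: &[u8], other_row: &[u8], i: usize) -> Cell {
    match (self_row.get(i), other_row.get(i)) {
        (Some(&b), Some(&o)) if b == o => Cell::Same(b),
        (Some(&b), Some(_)) => Cell::Changed(b),
        (Some(&b), None) => Cell::OnlyHere(b),
        (None, _) => Cell::Pad,
    }
}
```

With differ on the left and differ! on the right, offset 6 is OnlyHere on the right and Pad on the left, so the extra ! shows up in blue on one side and as blank space on the other.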
Loading both files entirely into memory is a tradeoff: fine for the files you'd actually use this on (configs, packet captures, binaries up to a few hundred MB), wrong for 10 GB cores. The alternative is reading both inputs a row at a time in lockstep, which is more code and doesn't materially change the happy-path story. I picked simple.
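For the record, the streaming alternative is not much code. A sketch, assuming the read_full helper from the dump loop (repeated so the block compiles on its own); diff_offsets is a hypothetical name, and it collects mismatched offsets the way cmp -l lists them rather than rendering rows:

```rust
use std::io::{self, Read};

/// Fill `buf` unless EOF comes first; retry on Interrupted.
fn read_full<R: Read>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match r.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Hypothetical streaming diff: compare both inputs one `width`-byte row
/// at a time and return every offset where they differ positionally.
/// A byte present on only one side counts as a difference.
fn diff_offsets<R1: Read, R2: Read>(a: &mut R1, b: &mut R2, width: usize) -> io::Result<Vec<u64>> {
    let mut buf_a = vec![0u8; width];
    let mut buf_b = vec![0u8; width];
    let mut offset = 0u64;
    let mut diffs = Vec::new();
    loop {
        let na = read_full(a, &mut buf_a)?;
        let nb = read_full(b, &mut buf_b)?;
        if na == 0 && nb == 0 {
            break; // both inputs exhausted
        }
        for i in 0..na.max(nb) {
            // Bytes past a side's short read are treated as absent.
            let x = if i < na { Some(buf_a[i]) } else { None };
            let y = if i < nb { Some(buf_b[i]) } else { None };
            if x != y {
                diffs.push(offset + i as u64);
            }
        }
        offset += na.max(nb) as u64;
    }
    Ok(diffs)
}
```

read_full matters here: it only returns a short count at EOF, so row k of one input always lines up with row k of the other. With a bare read, a short read on one side would shift every later comparison. Apart from the returned Vec, memory stays at two width-sized buffers regardless of file size.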
Error handling: Box<dyn Error> is fine
For a CLI this small, pulling in anyhow to get nicer error messages is overkill. The entry point returns ExitCode, and I wrap fallible operations to get a helpful prefix:
fn main() -> ExitCode {
let cli = match Cli::try_parse() {
Ok(c) => c,
Err(e) => {
let _ = e.print();
return match e.kind() {
clap::error::ErrorKind::DisplayHelp
| clap::error::ErrorKind::DisplayVersion => ExitCode::from(0),
_ => ExitCode::from(2),
};
}
};
match run(cli) {
Ok(()) => ExitCode::from(0),
Err(e) => {
eprintln!("hexview: {}", e);
ExitCode::from(1)
}
}
}
Three exit codes (0 success, 1 I/O error, 2 bad args) with clap's own error path handling --help/--version and argument parsing failures. The tests assert these explicitly:
#[test]
fn bad_file_path_exits_one() { ... .failure().code(1); }
#[test]
fn bad_args_exit_two() { ... .failure().code(2); }
That's the whole error story. If hexview grew — say, if it got a plugin system, or a config file parser — I'd reach for anyhow on the spot. At this size, it's overhead.
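The "helpful prefix" wrapping can be as small as one map_err. A sketch, where open_input and input_len are hypothetical helpers rather than the repo's functions: the io::Error becomes a Box<dyn Error> that carries the path, so main's eprintln! prints something like hexview: missing.bin: No such file or directory (os error 2) on Linux.

```rust
use std::error::Error;
use std::fs::File;

/// Hypothetical helper: open a file, prefixing any error with its path.
fn open_input(path: &str) -> Result<File, Box<dyn Error>> {
    File::open(path).map_err(|e| Box::<dyn Error>::from(format!("{path}: {e}")))
}

/// Hypothetical caller: `?` converts io::Error into Box<dyn Error> for free.
fn input_len(path: &str) -> Result<u64, Box<dyn Error>> {
    let file = open_input(path)?;
    Ok(file.metadata()?.len())
}
```

No error enum, no extra crate: the String-to-Box<dyn Error> conversion from the standard library does the work.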
Tests
Nineteen end-to-end tests via assert_cmd, plus unit tests on dump, color, and diff. assert_cmd is the thing that convinced me to write this post, honestly — it makes CLI testing so low-friction that there's no excuse not to cover every flag:
use assert_cmd::Command;
use predicates::prelude::*;
#[test]
fn skip_and_length() {
let f = write_fixture("sl.bin", b"Hello, world!");
Command::cargo_bin("hexview").unwrap()
.arg(&f)
.arg("--no-color")
.arg("--skip").arg("7")
.arg("--length").arg("5")
.assert()
.success()
.stdout(predicate::str::contains("|world|"))
.stdout(predicate::str::contains("00000007"));
}
That one test alone verifies: file open path, the --skip seek, the --length cap, the address display starting at the skip offset, and the ASCII panel on a short line. All in fifteen lines. The test suite runs in ~10 ms after the build.
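The test above calls write_fixture, which the post doesn't show. One plausible version, assuming fixtures only need to live for the test run; the directory naming is my own choice, not the repo's:

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical fixture helper: write `bytes` under the system temp dir
/// and return the path. The process id keeps concurrent test processes
/// apart; within one process, distinct file names keep tests apart.
fn write_fixture(name: &str, bytes: &[u8]) -> PathBuf {
    let dir = std::env::temp_dir().join(format!("hexview-tests-{}", std::process::id()));
    fs::create_dir_all(&dir).expect("create fixture dir");
    let path = dir.join(name);
    fs::write(&path, bytes).expect("write fixture");
    path
}
```

Returning a PathBuf works directly with .arg(&f), since Command::arg accepts anything that converts to an OsStr.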
Tradeoffs I'm shipping on purpose
- No memory mapping for huge files. Plain BufReader<File> is correct for any size and portable (in dump mode; --diff reads both files into memory, as discussed above). If you're using this on a 50 GB core dump, you'll wait longer. That's fine for v0.1.
- Colors aren't configurable. Four hardcoded categories, one palette. Adding themes is more code than the whole rest of the program.
- --format bin is per-byte. Real low-level debugging sometimes wants to see multi-byte integers in binary; I don't cover that. Know your endianness.
- No --search pattern. If you need it, pipe through grep. (Because color is disabled on pipes, the grep output is clean.)
- Diff is positional, not structural. On purpose, as discussed.
Each of these is a place I could add a flag and didn't, and the test suite and the binary are smaller because of it.
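The per-byte behavior of --format bin falls out of how the formatter is shaped. render_line calls cfg.format.write_byte and cfg.format.width(); a hypothetical enum that would satisfy those calls looks like this (the Format name and its variants are assumptions):

```rust
use std::io::{self, Write};

/// Hypothetical byte formatter matching the calls in render_line.
#[derive(Debug, Clone, Copy)]
pub enum Format {
    Hex,
    Bin,
}

impl Format {
    /// Characters one byte occupies; used to pad short rows.
    pub fn width(self) -> usize {
        match self {
            Format::Hex => 2,
            Format::Bin => 8,
        }
    }

    /// Write one zero-padded byte with no separator.
    pub fn write_byte<W: Write>(self, w: &mut W, b: u8) -> io::Result<()> {
        match self {
            Format::Hex => write!(w, "{:02x}", b),
            Format::Bin => write!(w, "{:08b}", b),
        }
    }
}
```

A u16 such as 0x0102 stored big-endian is still printed as two independent bytes, 00000001 then 00000010, which is exactly the "know your endianness" caveat.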
Try it in 30 seconds
docker build -t hexview https://github.com/sen-ltd/hexview.git
printf 'Hello, world!\n\000\001' > /tmp/x.bin
docker run --rm -v /tmp:/work hexview x.bin
docker run --rm -v /tmp:/work hexview x.bin --format bin --width 8
For diff mode:
printf 'differ' > /tmp/a.bin
printf 'differ!' > /tmp/b.bin
docker run --rm -v /tmp:/work hexview --diff a.bin b.bin
The image is 9.4 MB. There's no live demo (it's a terminal tool), but the GitHub repo has the full source and a working Dockerfile.
Closing
Entry #137 in a 100+ portfolio series from SEN LLC. hexview is the first Rust entry in the sweep — the rest of the Rust entries will follow the same shape: one dependency if possible, hand-rolled where the stdlib is enough, stripped release profile, multi-stage Alpine Dockerfile, integration tests via assert_cmd. Rust punishes you for overreach and rewards you for minimalism; this is an attempt to respect that.
Feedback welcome.
