SEN LLC

Posted on Apr 16

Building a TOML Formatter in Rust — Sorted Keys, Aligned Equals, and CI-Friendly Check Mode

#rust #cli #toml #formatter

A Rust CLI that parses TOML files and outputs consistently formatted versions with sorted keys, aligned = signs, and a --check mode for CI pipelines.

TOML files accumulate entropy. Every contributor has their own style preferences — some sort keys, some don't, some align equals signs, some leave random blank lines between sections. Over time, Cargo.toml, pyproject.toml, and config files drift into inconsistency that makes diffs noisy and reviews harder.

I built toml-fmt to solve this. It's a single-binary Rust CLI that reads any TOML file, parses it into a structured representation, and emits a consistently formatted version. Sorted keys, normalized quoting, consistent spacing, and a --check mode that returns exit code 1 when formatting would change — perfect for CI gates.

📦 GitHub: https://github.com/sen-ltd/toml-fmt

Why another formatter?

You might wonder why I didn't just use taplo or another existing TOML formatter. There are good reasons to build your own:

Learning exercise — parsing and emitting structured data teaches you about the format's edge cases
Opinionated defaults — toml-fmt sorts keys by default, which most formatters don't
Zero configuration — no config files, no plugins, just flags
Tiny binary — with release optimizations, the binary is compact enough for container images

The tool handles three modes of operation:

# Print formatted output to stdout
toml-fmt Cargo.toml

# Overwrite the file in place
toml-fmt --in-place Cargo.toml

# CI mode: exit 1 if the file would change
toml-fmt --check Cargo.toml

Architecture: Library + CLI Shell

The project splits into two files: lib.rs contains the pure formatting logic, and main.rs is a thin CLI wrapper. This separation matters — the formatting function is testable without filesystem or CLI concerns.

src/
├── lib.rs    # format_toml(input, opts) -> Result<String, Error>
└── main.rs   # CLI argument parsing and I/O

The core function signature:

pub fn format_toml(input: &str, opts: &Options) -> Result<String, Error>

It takes a string, returns a string. No file I/O, no side effects. The Options struct controls behavior:

pub struct Options {
    pub sort_keys: bool,  // Alphabetical ordering within tables
    pub align: bool,      // Align = signs within each section
}
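A natural companion to this struct is a Default impl, so library callers get the same behavior as the CLI without spelling out every field. The impl below is a sketch, not code from the repository; it assumes the library defaults mirror the CLI (sorting on, alignment off).

```rust
/// Mirrors the `Options` struct shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub sort_keys: bool, // Alphabetical ordering within tables
    pub align: bool,     // Align = signs within each section
}

impl Default for Options {
    // Assumption: same defaults as the CLI, where sorting is on unless
    // --no-sort-keys is passed and alignment is off unless --align is passed.
    fn default() -> Self {
        Options {
            sort_keys: true,
            align: false,
        }
    }
}
```

With this in place, a caller who only wants alignment could write `Options { align: true, ..Options::default() }` and keep everything else at its default.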

Parsing with the `toml` crate

Rust's toml crate provides toml::Value, which represents any TOML value as an enum:

enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Datetime(Datetime),
    Array(Vec<Value>),
    Table(Map<String, Value>),
}

Parsing is a one-liner: input.parse::<toml::Value>(). The crate handles all the TOML spec edge cases — multiline strings, inline tables, dotted keys, arrays of tables. We get a clean tree structure to work with.

The trade-off: toml::Value doesn't preserve comments. A round-trip through parse-then-emit strips them. For a formatter, this is acceptable — comments are a separate concern, and preserving them would require toml_edit with significantly more complexity. If you need comment preservation, taplo is the better tool. toml-fmt is for projects that want deterministic, sorted output and can live without comment round-tripping.

The Emission Algorithm

The formatter walks the parsed tree recursively, handling four categories of content at each table level:

Simple key-value pairs — strings, integers, booleans, arrays of primitives
Sub-tables — nested [section.subsection] headers
Arrays of tables — [[section]] repeated entries
Inline tables — {key = val, ...} notation for small flat tables

The classification logic:

for key in &keys {
    let val = &table[key];
    match val {
        toml::Value::Table(_) => sub_tables.push((key, val)),
        toml::Value::Array(arr) if is_array_of_tables(arr) => {
            array_tables.push((key, arr));
        }
        _ => simple.push((key, val)),
    }
}

Simple pairs come first (they belong directly under the current [header]), then sub-tables recurse, then arrays of tables. This produces clean, predictable output where each section's own keys appear immediately after its header.
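The classification loop relies on an is_array_of_tables helper that isn't shown. Here is a minimal sketch of how that helper and the resulting emission order might look. It uses a small stand-in enum instead of toml::Value so it runs with the standard library alone; the treatment of empty arrays (emitted as a plain `[]` value) is an assumption.

```rust
use std::collections::BTreeMap;

// Stand-in for toml::Value with only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Hypothetical helper: true when the array is non-empty and every element
/// is a table. An empty array stays a simple value.
fn is_array_of_tables(arr: &[Value]) -> bool {
    !arr.is_empty() && arr.iter().all(|v| matches!(v, Value::Table(_)))
}

/// Returns the keys of one table in emission order:
/// simple pairs first, then sub-tables, then arrays of tables.
fn emission_order(table: &BTreeMap<String, Value>) -> Vec<&str> {
    let mut simple = Vec::new();
    let mut sub_tables = Vec::new();
    let mut array_tables = Vec::new();
    for (key, val) in table {
        match val {
            Value::Table(_) => sub_tables.push(key.as_str()),
            Value::Array(arr) if is_array_of_tables(arr) => array_tables.push(key.as_str()),
            _ => simple.push(key.as_str()),
        }
    }
    simple
        .into_iter()
        .chain(sub_tables)
        .chain(array_tables)
        .collect()
}
```

For a table with keys bin (an array of tables), deps (a table), and name (a plain value), this yields name, deps, bin: the plain value is emitted under the current header before any nested header appears.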

Key Sorting

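Keys within each table are collected into a Vec and, when sort_keys is enabled (the default), sorted alphabetically before emission. One caveat for the else branch: it only yields source order if the toml crate is built with its preserve_order feature, because by default the crate's Map is backed by a BTreeMap and iterates in sorted order anyway. The selection logic: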

let keys: Vec<&String> = if opts.sort_keys {
    let mut ks: Vec<&String> = table.keys().collect();
    ks.sort();
    ks
} else {
    table.keys().collect()
};

This is the single most impactful formatting rule. Sorted keys make diffs cleaner because additions and removals are localized rather than scattered. When reviewing a Cargo.toml change that adds a dependency, the diff shows exactly one line insertion in alphabetical position, not a random placement that might be near the top or bottom.
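One detail worth knowing: sort() on a Vec<&String> compares bytes, so "alphabetical" here means uppercase ASCII keys order ahead of lowercase ones. The sketch below shows that behavior next to a case-insensitive alternative. The second function is hypothetical and not something toml-fmt is stated to do.

```rust
/// Sorts keys the way `Vec<&String>::sort` does: by byte value, so
/// uppercase ASCII letters order before lowercase ones.
fn sorted_keys(keys: &[String]) -> Vec<&String> {
    let mut ks: Vec<&String> = keys.iter().collect();
    ks.sort();
    ks
}

/// Hypothetical alternative: case-insensitive order, with the byte-wise
/// comparison as a tie-breaker so the result stays deterministic.
fn sorted_keys_case_insensitive(keys: &[String]) -> Vec<&String> {
    let mut ks: Vec<&String> = keys.iter().collect();
    ks.sort_by(|a, b| {
        a.to_lowercase()
            .cmp(&b.to_lowercase())
            .then_with(|| a.cmp(b))
    });
    ks
}
```

Given the keys serde, Inflector, and anyhow, the first function puts Inflector first, while the second orders them anyhow, Inflector, serde. Either is fine for diff stability as long as the choice never changes.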

Equals Sign Alignment

The --align flag pads key names so all = signs in a section line up:

if opts.align && !simple.is_empty() {
    // Measure formatted keys in chars rather than bytes, so quoted
    // non-ASCII keys don't skew the column.
    let max_key_len = simple.iter()
        .map(|(k, _)| format_key(k).chars().count())
        .max()
        .unwrap_or(0);
    for (key, val) in &simple {
        let fk = format_key(key);
        let padding = " ".repeat(max_key_len - fk.chars().count());
        out.push_str(&format!("{fk}{padding} = {}\n", format_value(val)));
    }
}

Before:

[package]
name = "my-app"
version = "0.1.0"
description = "A longer description"
edition = "2021"

After --align (with the default key sorting):

[package]
description = "A longer description"
edition     = "2021"
name        = "my-app"
version     = "0.1.0"

Alignment is per-section, not global. Each [table] header resets the column width calculation. This prevents a long key in one section from causing excessive padding in another.
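To make the per-section reset concrete, here is a small sketch that aligns several sections independently. It works on pre-formatted key and value strings rather than toml::Value, and the function names and the blank line between sections are assumptions for illustration.

```rust
/// Pads each key so the `=` signs line up within one section.
/// Width is counted in chars (an assumption of this sketch).
fn align_pairs(pairs: &[(&str, &str)]) -> String {
    let width = pairs
        .iter()
        .map(|(k, _)| k.chars().count())
        .max()
        .unwrap_or(0);
    let mut out = String::new();
    for (key, val) in pairs {
        let padding = " ".repeat(width - key.chars().count());
        out.push_str(&format!("{key}{padding} = {val}\n"));
    }
    out
}

/// Emits each section under its header. The column width is recomputed
/// per section, so a long key in one section never pads another.
fn align_sections(sections: &[(&str, Vec<(&str, &str)>)]) -> String {
    let mut out = String::new();
    for (i, (header, pairs)) in sections.iter().enumerate() {
        if i > 0 {
            out.push('\n'); // blank line between sections (assumed style)
        }
        out.push_str(&format!("[{header}]\n"));
        out.push_str(&align_pairs(pairs));
    }
    out
}
```

A [package] section containing description and name pads name out to the width of description, while a following section with keys a and bb only pads to two columns.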

Key Formatting: Bare vs Quoted

TOML allows two key styles: bare (name) and quoted ("key with spaces"). The formatter normalizes this: bare keys whenever possible, quoted only when the key contains characters outside A-Za-z0-9_-:

fn format_key(key: &str) -> String {
    if key.is_empty() {
        return "\"\"".to_string();
    }
    if key.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_') {
        key.to_string()
    } else {
        format!("\"{}\"", escape_string(key))
    }
}

This handles edge cases like empty keys (quoted as "") and keys with dots, spaces, or unicode characters (always quoted with proper escaping).
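format_key calls an escape_string helper that isn't shown. A plausible version, assuming it targets TOML basic (double-quoted) strings, might look like the following. This is a sketch rather than the repository's implementation; escaping every control character with \uXXXX is a conservative choice that goes slightly beyond what TOML strictly requires.

```rust
/// Hypothetical sketch of the `escape_string` helper used by `format_key`.
/// Produces the body of a TOML basic string (without the surrounding quotes).
fn escape_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\u{08}' => out.push_str("\\b"),
            '\u{0C}' => out.push_str("\\f"),
            // Any remaining control character is written as a \uXXXX escape.
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            // Everything else, including non-ASCII text, passes through.
            c => out.push(c),
        }
    }
    out
}
```

Non-ASCII text is left untouched because TOML files are UTF-8; only quotes, backslashes, and control characters need escaping.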

Value Formatting

Each value type gets its own formatting rule:

Strings: Always double-quoted with proper escaping (\n, \t, \\, \")
Integers: Direct decimal representation
Floats: Decimal with guaranteed decimal point (e.g., 1.0 not 1)
Booleans: true / false
Datetimes: RFC 3339 (the ISO 8601 profile TOML uses), as parsed
Arrays: [item1, item2, item3] on a single line
Inline tables: {key1 = val1, key2 = val2} with sorted keys

The float formatting deserves mention:

toml::Value::Float(f) => {
    if f.is_nan() { "nan".to_string() }
    else if f.is_infinite() {
        if f.is_sign_positive() { "inf".to_string() }
        else { "-inf".to_string() }
    } else {
        let s = format!("{f}");
        if s.contains('.') { s } else { format!("{s}.0") }
    }
}

TOML requires floats to be distinguishable from integers, so 1 must become 1.0. The special values nan, inf, and -inf are valid TOML float literals.
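Pulling the match arm into a standalone function makes the rule easy to test in isolation. The wrapper name format_float is an assumption; the body follows the arm shown above. Note that Rust's Display for f64 does not switch to exponent notation, so very large values come out as long digit strings with ".0" appended.

```rust
/// Formats an f64 as a TOML float literal. The logic follows the match arm
/// above; the function name is hypothetical.
fn format_float(f: f64) -> String {
    if f.is_nan() {
        "nan".to_string()
    } else if f.is_infinite() {
        if f.is_sign_positive() {
            "inf".to_string()
        } else {
            "-inf".to_string()
        }
    } else {
        // Display prints whole numbers without a decimal point ("1"),
        // so append ".0" to keep the value a float when re-parsed.
        let s = format!("{f}");
        if s.contains('.') {
            s
        } else {
            format!("{s}.0")
        }
    }
}
```

This is also where idempotency can quietly break: if a whole-number float were emitted as 1, the next parse would read it as an integer.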

The CLI Shell

The CLI uses clap with derive macros for argument parsing:

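use clap::Parser;

#[derive(Parser)]
#[command(name = "toml-fmt", version, about = "A TOML formatter / pretty-printer")]
struct Cli {
    // TOML file to format
    file: Option<String>,
    // Sorting is on by default
    #[arg(long, default_value_t = true)]
    sort_keys: bool,
    // Turns key sorting off
    #[arg(long)]
    no_sort_keys: bool,
    // Align = signs within each section
    #[arg(long)]
    align: bool,
    // Overwrite the file instead of printing to stdout
    #[arg(long, short = 'i')]
    in_place: bool,
    // Exit 1 if the file would change
    #[arg(long, short = 'c')]
    check: bool,
}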

The main function returns ExitCode for proper exit status:

0 — success (or --check with no changes needed)
1 — --check detected formatting differences
2 — errors (parse failure, I/O error, invalid arguments)
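The mapping from run outcome to exit status can be kept in one small, testable place. The sketch below is an assumption about how that could be organized, not the repository's code: the Outcome enum, the function names, and the way the two sort flags are combined are all hypothetical.

```rust
use std::process::ExitCode;

/// Outcome of one run, mirroring the three documented exit statuses.
#[derive(Debug, PartialEq)]
enum Outcome {
    Clean,       // 0: nothing to do, or normal success
    WouldChange, // 1: --check found a formatting difference
    Failed,      // 2: parse failure, I/O error, invalid arguments
}

impl Outcome {
    fn code(&self) -> u8 {
        match self {
            Outcome::Clean => 0,
            Outcome::WouldChange => 1,
            Outcome::Failed => 2,
        }
    }
}

/// Assumed way of combining the two flags: --no-sort-keys wins.
fn effective_sort(sort_keys: bool, no_sort_keys: bool) -> bool {
    sort_keys && !no_sort_keys
}

/// Check mode: compare the original text with the formatter's result.
fn check(original: &str, formatted: Result<String, String>) -> Outcome {
    match formatted {
        Err(_) => Outcome::Failed,
        Ok(f) if f == original => Outcome::Clean,
        Ok(_) => Outcome::WouldChange,
    }
}

/// Converts an outcome into the value `main` would return.
fn to_exit_code(outcome: &Outcome) -> ExitCode {
    ExitCode::from(outcome.code())
}
```

Keeping the comparison a plain string equality means --check and --in-place can never disagree about whether a file is already formatted.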

Error messages use hand-rolled ANSI codes for coloring (\x1b[31m for red, \x1b[33m for yellow) — no extra crate needed for terminal colors.

Testing Strategy

The library has 24 tests covering:

Basic formatting — simple key-value pairs, booleans, integers, floats
Sorting — key ordering with sort_keys enabled/disabled
Alignment — = sign alignment within sections
Structure — nested tables, dotted paths, arrays of tables, inline tables
Edge cases — empty documents, empty arrays, empty strings, quoted keys
String escaping — newlines, special characters
Idempotency — formatting twice produces the same output
Error handling — invalid TOML input produces clear error messages
Real-world — actual Cargo.toml structure with dependencies and profiles

The idempotency test is critical for a formatter:

#[test]
fn idempotent() {
    let input = "[package]\nname = \"test\"\nversion = \"0.1.0\"\n\n[dependencies]\nclap = \"4\"\n";
    let first = fmt(input);
    let second = fmt(&first);
    assert_eq!(first, second, "formatting should be idempotent");
}

If formatting the output of a format operation changes anything, the tool is broken. Users would get different results depending on how many times they run the formatter, and CI checks would flip-flop.
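The same property can be checked for many inputs with a small generic helper. This sketch is independent of toml-fmt: is_idempotent takes any string-to-string function, and the two toy formatters exist only to show a passing and a failing case.

```rust
/// Returns true when applying `format` twice gives the same text as
/// applying it once.
fn is_idempotent<F>(format: F, input: &str) -> bool
where
    F: Fn(&str) -> String,
{
    let first = format(input);
    let second = format(&first);
    first == second
}

/// Toy formatter for demonstration: trims trailing whitespace on each line
/// and ends the output with exactly one newline.
fn trim_trailing(input: &str) -> String {
    let lines: Vec<&str> = input.lines().map(|l| l.trim_end()).collect();
    lines.join("\n") + "\n"
}

/// Deliberately broken formatter: grows the text by one newline per pass.
fn broken(input: &str) -> String {
    format!("{input}\n")
}
```

Running a helper like this over a directory of real-world TOML files is a cheap way to catch value types whose emitted form parses back as something different.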

Release Profile

The Cargo.toml uses an aggressive release profile:

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

strip = true: Strips symbols and debug info from the binary
lto = true: Link-time optimization across all crates
codegen-units = 1: Single codegen unit for maximum optimization (slower compile, faster binary)
opt-level = "z": Optimize for size over speed
panic = "abort": No unwinding overhead

This produces a small, fast binary suitable for Docker images and CI environments.

Docker

The Dockerfile uses a multi-stage build:

FROM rust:1.90-alpine AS builder
RUN apk add --no-cache musl-dev
WORKDIR /src
COPY Cargo.toml Cargo.lock ./
COPY src/ src/
RUN cargo build --release --locked

FROM alpine:3.20
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /src/target/release/toml-fmt /usr/local/bin/toml-fmt
USER app
ENTRYPOINT ["toml-fmt"]

The builder stage compiles against musl for a fully static binary. The runtime image is bare Alpine with a non-root user. The final image is small — Alpine base plus one static binary.

CI Integration

The --check flag is designed for CI pipelines:

# GitHub Actions example
- name: Check TOML formatting
  run: |
    cargo install --path .
    toml-fmt --check Cargo.toml

Or in a pre-commit hook:

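#!/bin/sh
# Only check added, copied, or modified files; a deleted path would make
# toml-fmt fail with an I/O error. Paths containing spaces are not handled.
for f in $(git diff --cached --name-only --diff-filter=ACM -- '*.toml'); do
    toml-fmt --check "$f" || exit 1
done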

The exit code convention (0 = formatted, 1 = would change, 2 = error) is in the same spirit as rustfmt --check and black --check, which also exit nonzero when a file would be reformatted. This makes it easy to integrate into existing workflows.

Limitations and Trade-offs

No comment preservation. The toml crate's Value type discards comments during parsing. This is the biggest limitation. If your TOML files have important inline comments, toml-fmt will strip them. For comment-preserving formatting, use taplo or toml_edit.

Inline tables expand. The toml::Value type doesn't distinguish between dep = { version = "1" } and a full [dep] section. The formatter emits all tables as sections. This is valid TOML but may not match your visual preference for compact dependency declarations.

No partial formatting. The tool formats the entire file. There's no range-based formatting like rustfmt supports.

These are deliberate trade-offs. The goal was a small, fast, opinionated tool with predictable output. Comment preservation and format detection would roughly triple the codebase size.

What I Learned

Building a formatter is an exercise in understanding the output format deeply. TOML has more edge cases than you'd expect:

Bare vs quoted keys — keys with dots, spaces, or unicode need quoting
Float representation — 1 is an integer, 1.0 is a float, both look similar
Table vs inline table — semantically identical, syntactically different
Array of tables — [[section]] is fundamentally different from [section] with array values
String escaping — control characters must be escaped in basic strings, and TOML provides \uXXXX and \UXXXXXXXX escapes for that

The recursive table emission was the most interesting part. TOML's hierarchical structure maps naturally to recursive descent, but handling the interleaving of simple keys, sub-tables, and arrays of tables requires careful ordering.

The toml crate does the heavy lifting for parsing. The formatter is roughly 200 lines of emission code plus tests. Rust's toml::Value enum gives you a clean tree to walk, and pattern matching makes the value formatting concise.

If you need a quick TOML formatter for your Rust projects or CI pipelines, give toml-fmt a try. It's one binary, no configuration, and it produces deterministic output every time.
