DEV Community

SEN LLC


Building a TOML Formatter in Rust — Sorted Keys, Aligned Equals, and CI-Friendly Check Mode


A Rust CLI that parses TOML files and outputs consistently formatted versions with sorted keys, aligned = signs, and a --check mode for CI pipelines.

TOML files accumulate entropy. Every contributor has their own style preferences — some sort keys, some don't, some align equals signs, some leave random blank lines between sections. Over time, Cargo.toml, pyproject.toml, and config files drift into inconsistency that makes diffs noisy and reviews harder.

I built toml-fmt to solve this. It's a single-binary Rust CLI that reads any TOML file, parses it into a structured representation, and emits a consistently formatted version. Sorted keys, normalized quoting, consistent spacing, and a --check mode that returns exit code 1 when formatting would change — perfect for CI gates.

📦 GitHub: https://github.com/sen-ltd/toml-fmt


Why another formatter?

You might wonder why not use taplo or other existing TOML formatters. There are good reasons to build your own:

  1. Learning exercise: parsing and emitting structured data teaches you about the format's edge cases
  2. Opinionated defaults: toml-fmt sorts keys by default, which many formatters don't do out of the box
  3. Zero configuration: no config files, no plugins, just flags
  4. Tiny binary: with release optimizations, the binary is compact enough for container images

The tool handles three modes of operation:

# Print formatted output to stdout
toml-fmt Cargo.toml

# Overwrite the file in place
toml-fmt --in-place Cargo.toml

# CI mode: exit 1 if the file would change
toml-fmt --check Cargo.toml

Architecture: Library + CLI Shell

The project splits into two files: lib.rs contains the pure formatting logic, and main.rs is a thin CLI wrapper. This separation matters — the formatting function is testable without filesystem or CLI concerns.

src/
├── lib.rs    # format_toml(input, opts) -> Result<String, Error>
└── main.rs   # CLI argument parsing and I/O

The core function signature:

pub fn format_toml(input: &str, opts: &Options) -> Result<String, Error>

It takes a string, returns a string. No file I/O, no side effects. The Options struct controls behavior:

pub struct Options {
    pub sort_keys: bool,  // Alphabetical ordering within tables
    pub align: bool,      // Align = signs within each section
}
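The article doesn't show where the defaults for Options come from. A minimal sketch, assuming sorting is on by default (as stated under Key Sorting) and alignment is opt-in via --align; the Default impl and the options_from_flags helper are hypothetical, not taken from toml-fmt's source:

```rust
/// Formatting options, mirroring the struct above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Options {
    pub sort_keys: bool, // Alphabetical ordering within tables
    pub align: bool,     // Align = signs within each section
}

impl Default for Options {
    // Assumed defaults: sorting on, alignment off (opt-in via --align).
    fn default() -> Self {
        Options { sort_keys: true, align: false }
    }
}

/// Hypothetical helper mapping the two CLI switches onto Options.
pub fn options_from_flags(no_sort_keys: bool, align: bool) -> Options {
    Options { sort_keys: !no_sort_keys, align }
}
```

With a Default impl in place, a caller can write format_toml(input, &Options::default()) without spelling out every field.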

Parsing with the toml crate

Rust's toml crate provides toml::Value, which represents any TOML value as an enum:

enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Datetime(Datetime),
    Array(Vec<Value>),
    Table(Map<String, Value>),
}

Parsing is a one-liner: input.parse::<toml::Value>(). The crate handles all the TOML spec edge cases — multiline strings, inline tables, dotted keys, arrays of tables. We get a clean tree structure to work with.

The trade-off: toml::Value doesn't preserve comments. A round-trip through parse-then-emit strips them. For a formatter, this is acceptable — comments are a separate concern, and preserving them would require toml_edit with significantly more complexity. If you need comment preservation, taplo is the better tool. toml-fmt is for projects that want deterministic, sorted output and can live without comment round-tripping.

The Emission Algorithm

The formatter walks the parsed tree recursively, handling four categories of content at each table level:

  1. Simple key-value pairs: strings, integers, booleans, arrays of primitives
  2. Sub-tables: nested [section.subsection] headers
  3. Arrays of tables: [[section]] repeated entries
  4. Inline tables: {key = val, ...} notation for small flat tables

The classification logic:

for key in &keys {
    let val = &table[key];
    match val {
        toml::Value::Table(_) => sub_tables.push((key, val)),
        toml::Value::Array(arr) if is_array_of_tables(arr) => {
            array_tables.push((key, arr));
        }
        _ => simple.push((key, val)),
    }
}

Simple pairs come first (they belong directly under the current [header]), then sub-tables recurse, then arrays of tables. This produces clean, predictable output where each section's own keys appear immediately after its header.
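To see how the three buckets and this ordering fit together, here is a self-contained sketch of the recursive walk. It assumes a small stand-in Value enum (a subset of toml::Value, backed by a BTreeMap so keys come out sorted) so it compiles without the toml crate. Key quoting, floats, and alignment are left out, and emit_table is a hypothetical name rather than the tool's actual function:

```rust
use std::collections::BTreeMap;

/// Stand-in for toml::Value (subset). BTreeMap iterates keys in sorted order.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Boolean(bool),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// An array counts as an array of tables if it is non-empty and every element is a table.
fn is_array_of_tables(arr: &[Value]) -> bool {
    !arr.is_empty() && arr.iter().all(|v| matches!(v, Value::Table(_)))
}

/// Single-line rendering for simple values (no string escaping, for brevity).
fn format_value(v: &Value) -> String {
    match v {
        Value::String(s) => format!("\"{s}\""),
        Value::Integer(i) => i.to_string(),
        Value::Boolean(b) => b.to_string(),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(format_value).collect();
            format!("[{}]", parts.join(", "))
        }
        Value::Table(t) => {
            let parts: Vec<String> =
                t.iter().map(|(k, v)| format!("{k} = {}", format_value(v))).collect();
            format!("{{{}}}", parts.join(", "))
        }
    }
}

/// Emit one table: its own pairs, then sub-tables (recursively), then arrays
/// of tables. `path` holds the dotted header path of the current table.
pub fn emit_table(table: &BTreeMap<String, Value>, path: &[String], out: &mut String) {
    let (mut simple, mut sub_tables, mut array_tables) = (Vec::new(), Vec::new(), Vec::new());
    for (key, val) in table {
        match val {
            Value::Table(t) => sub_tables.push((key, t)),
            Value::Array(arr) if is_array_of_tables(arr) => array_tables.push((key, arr)),
            _ => simple.push((key, val)),
        }
    }
    // Simple pairs must precede any header, or they would land in the wrong table.
    for (key, val) in simple {
        out.push_str(&format!("{key} = {}\n", format_value(val)));
    }
    for (key, sub) in sub_tables {
        let mut child = path.to_vec();
        child.push(key.clone());
        if !out.is_empty() {
            out.push('\n'); // blank line before each header
        }
        out.push_str(&format!("[{}]\n", child.join(".")));
        emit_table(sub, &child, out);
    }
    for (key, arr) in array_tables {
        let mut child = path.to_vec();
        child.push(key.clone());
        for item in arr {
            if let Value::Table(t) = item {
                if !out.is_empty() {
                    out.push('\n');
                }
                out.push_str(&format!("[[{}]]\n", child.join(".")));
                emit_table(t, &child, out);
            }
        }
    }
}
```

Calling emit_table on the root table with an empty path emits the root's own keys first, so they never end up under a [header] they don't belong to.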

Key Sorting

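When sort_keys is enabled (the default), keys within each table are collected into a Vec and sorted alphabetically before emission. Note that the unsorted branch only reflects source order if the toml crate is built with its preserve_order feature; otherwise toml's Map is backed by a BTreeMap and already iterates keys alphabetically: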

let keys: Vec<&String> = if opts.sort_keys {
    let mut ks: Vec<&String> = table.keys().collect();
    ks.sort();
    ks
} else {
    table.keys().collect()
};

This is the single most impactful formatting rule. Sorted keys make diffs cleaner because additions and removals are localized rather than scattered. When reviewing a Cargo.toml change that adds a dependency, the diff shows exactly one line insertion in alphabetical position, not a random placement that might be near the top or bottom.

Equals Sign Alignment

The --align flag pads key names so all = signs in a section line up:

if opts.align && !simple.is_empty() {
    let max_key_len = simple.iter()
        .map(|(k, _)| format_key(k).len())
        .max()
        .unwrap_or(0);
    for (key, val) in &simple {
        let fk = format_key(key);
        let padding = " ".repeat(max_key_len - fk.len());
        out.push_str(&format!("{fk}{padding} = {}\n", format_value(val)));
    }
}

Before:

[package]
name = "my-app"
version = "0.1.0"
description = "A longer description"
edition = "2021"

After --align:

[package]
description = "A longer description"
edition     = "2021"
name        = "my-app"
version     = "0.1.0"

Alignment is per-section, not global. Each [table] header resets the column width calculation. This prevents a long key in one section from causing excessive padding in another.
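A standalone sketch of per-section alignment, assuming keys and values arrive already formatted as strings; calling it once per section gives the reset behavior described above. One deliberate difference from the earlier snippet: widths are counted with chars().count() rather than len(), because len() counts UTF-8 bytes and would misalign a section containing a quoted non-ASCII key. emit_pairs is a hypothetical name:

```rust
/// Emit `key = value` lines for one section. With `align`, every key is
/// right-padded to the width of the longest key so the `=` signs line up.
/// Keys and values are assumed to be formatted already.
pub fn emit_pairs(pairs: &[(&str, &str)], align: bool) -> String {
    // Column width in chars (not UTF-8 bytes); 0 means no padding.
    let width = if align {
        pairs.iter().map(|(k, _)| k.chars().count()).max().unwrap_or(0)
    } else {
        0
    };
    let mut out = String::new();
    for (key, val) in pairs {
        let padding = " ".repeat(width.saturating_sub(key.chars().count()));
        out.push_str(&format!("{key}{padding} = {val}\n"));
    }
    out
}
```

For the [package] example, "description" (11 chars) sets the width, so "edition" gets 4 spaces of padding before " = ".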

Key Formatting: Bare vs Quoted

TOML allows two key styles: bare (name) and quoted ("key with spaces"). The formatter normalizes this: bare keys whenever possible, quoted only when the key contains a character other than an ASCII letter, digit, underscore, or hyphen:

fn format_key(key: &str) -> String {
    if key.is_empty() {
        return "\"\"".to_string();
    }
    if key.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_') {
        key.to_string()
    } else {
        format!("\"{}\"", escape_string(key))
    }
}

This handles edge cases like empty keys (quoted as "") and keys with dots, spaces, or unicode characters (always quoted with proper escaping).
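format_key relies on an escape_string helper that the article doesn't show. One possible version, assuming TOML basic-string escaping rules: \" and \\, the short escapes \b \t \n \f \r, and \uXXXX for any other control character. This escape_string is our own sketch; format_key is repeated from above so the block compiles on its own:

```rust
/// Hypothetical escape_string following TOML basic-string rules.
pub fn escape_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\u{8}' => out.push_str("\\b"),
            '\t' => out.push_str("\\t"),
            '\n' => out.push_str("\\n"),
            '\u{c}' => out.push_str("\\f"),
            '\r' => out.push_str("\\r"),
            // Remaining control characters (e.g. U+0001, U+007F) as \uXXXX.
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// format_key as shown in the article.
pub fn format_key(key: &str) -> String {
    if key.is_empty() {
        return "\"\"".to_string();
    }
    if key.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_') {
        key.to_string()
    } else {
        format!("\"{}\"", escape_string(key))
    }
}
```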

Value Formatting

Each value type gets its own formatting rule:

  • Strings: Always double-quoted with proper escaping (\n, \t, \\, \")
  • Integers: Direct decimal representation
  • Floats: Decimal with guaranteed decimal point (e.g., 1.0 not 1)
  • Booleans: true / false
  • Datetimes: RFC 3339 format (the date-time syntax TOML uses), as parsed
  • Arrays: [item1, item2, item3] on a single line
  • Inline tables: {key1 = val1, key2 = val2} with sorted keys
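A sketch of the array and inline-table rules from this list, again using a hypothetical stand-in Value enum (a subset of toml::Value) so it compiles without the toml crate. Because the stand-in stores tables in a BTreeMap, inline-table keys come out sorted without an explicit sort; string escaping is reduced to \" and \\ to keep it short:

```rust
use std::collections::BTreeMap;

/// Stand-in for toml::Value (subset); BTreeMap keeps table keys sorted.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Boolean(bool),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// Render any value on a single line.
pub fn format_value(v: &Value) -> String {
    match v {
        // Minimal escaping: backslash first, then double quote.
        Value::String(s) => format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\"")),
        Value::Integer(i) => i.to_string(),
        Value::Boolean(b) => b.to_string(),
        // Arrays: [item1, item2, item3] on one line.
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(format_value).collect();
            format!("[{}]", parts.join(", "))
        }
        // Inline tables: {key1 = val1, key2 = val2}, keys in sorted (BTreeMap) order.
        Value::Table(t) => {
            let parts: Vec<String> =
                t.iter().map(|(k, v)| format!("{k} = {}", format_value(v))).collect();
            format!("{{{}}}", parts.join(", "))
        }
    }
}
```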

The float formatting deserves mention:

toml::Value::Float(f) => {
    if f.is_nan() { "nan".to_string() }
    else if f.is_infinite() {
        if f.is_sign_positive() { "inf".to_string() }
        else { "-inf".to_string() }
    } else {
        let s = format!("{f}");
        if s.contains('.') { s } else { format!("{s}.0") }
    }
}

TOML requires floats to be distinguishable from integers, so 1 must become 1.0. The special values nan, inf, and -inf are valid TOML float literals.
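The float arm, lifted into a standalone function so the integer/float rule can be exercised in isolation (format_float is a hypothetical name). It leans on a property of Rust's Display for f64: it prints the shortest decimal that round-trips and never switches to exponent notation, so a whole number like 1e20 comes out as plain digits with no '.', and appending ".0" is the only fix needed:

```rust
/// Format an f64 as a TOML float literal.
pub fn format_float(f: f64) -> String {
    if f.is_nan() {
        "nan".to_string()
    } else if f.is_infinite() {
        if f.is_sign_positive() { "inf".to_string() } else { "-inf".to_string() }
    } else {
        // Display gives the shortest round-tripping decimal, without an exponent.
        let s = format!("{f}");
        // A whole number prints without '.', which TOML would read as an integer.
        if s.contains('.') { s } else { format!("{s}.0") }
    }
}
```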

The CLI Shell

The CLI uses clap with derive macros for argument parsing:

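use clap::Parser;

#[derive(Parser)]
#[command(name = "toml-fmt", version, about = "A TOML formatter / pretty-printer")]
struct Cli {
    // Input file path (optional positional argument)
    file: Option<String>,
    // Defaults to true; sorting is switched off with --no-sort-keys
    #[arg(long, default_value_t = true)]
    sort_keys: bool,
    // --no-sort-keys: keep keys in the order the parsed table yields them
    #[arg(long)]
    no_sort_keys: bool,
    // --align: pad keys so `=` signs line up within each section
    #[arg(long)]
    align: bool,
    // -i / --in-place: overwrite the file instead of printing to stdout
    #[arg(long, short = 'i')]
    in_place: bool,
    // -c / --check: exit 1 if formatting would change the file
    #[arg(long, short = 'c')]
    check: bool,
}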

The main function returns ExitCode for proper exit status:

  • 0: success (or --check with no changes needed)
  • 1: --check detected formatting differences
  • 2: errors (parse failure, I/O error, invalid arguments)
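A minimal sketch of how --check could map onto those codes, assuming the formatter's result is compared byte for byte with the original input. The Outcome enum and check_outcome function are hypothetical, not taken from toml-fmt's source:

```rust
use std::process::ExitCode;

/// Hypothetical result of one --check run.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Formatted,   // input already matches the formatter's output
    WouldChange, // formatting would change the file
    Failed,      // parse or I/O error
}

impl Outcome {
    /// Exit status per the convention above: 0, 1, or 2.
    pub fn code(&self) -> u8 {
        match self {
            Outcome::Formatted => 0,
            Outcome::WouldChange => 1,
            Outcome::Failed => 2,
        }
    }
}

impl From<Outcome> for ExitCode {
    fn from(o: Outcome) -> ExitCode {
        ExitCode::from(o.code())
    }
}

/// Compare the formatter's result with the original input.
pub fn check_outcome(original: &str, formatted: Result<String, String>) -> Outcome {
    match formatted {
        Ok(out) if out == original => Outcome::Formatted,
        Ok(_) => Outcome::WouldChange,
        Err(_) => Outcome::Failed,
    }
}
```

A fn main() -> ExitCode could then end with check_outcome(&input, result).into(), letting the From impl produce the process status.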

Error messages use hand-rolled ANSI codes for coloring (\x1b[31m for red, \x1b[33m for yellow) — no extra crate needed for terminal colors.
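A sketch of that hand-rolled coloring, assuming a reset sequence (\x1b[0m) follows each colored message; paint is a hypothetical helper. The enabled flag lets a caller skip escape codes when output isn't a terminal, for example by passing std::io::stderr().is_terminal() from the std::io::IsTerminal trait (stable since Rust 1.70):

```rust
/// ANSI SGR sequences: red and yellow foreground, plus reset.
pub const RED: &str = "\x1b[31m";
pub const YELLOW: &str = "\x1b[33m";
pub const RESET: &str = "\x1b[0m";

/// Wrap `msg` in `color` and a trailing reset; return it unchanged when
/// coloring is disabled (e.g. stderr redirected to a file).
pub fn paint(color: &str, msg: &str, enabled: bool) -> String {
    if enabled {
        format!("{color}{msg}{RESET}")
    } else {
        msg.to_string()
    }
}
```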

Testing Strategy

The library has 24 tests covering:

  1. Basic formatting: simple key-value pairs, booleans, integers, floats
  2. Sorting: key ordering with sort_keys enabled/disabled
  3. Alignment: alignment of = signs within sections
  4. Structure: nested tables, dotted paths, arrays of tables, inline tables
  5. Edge cases: empty documents, empty arrays, empty strings, quoted keys
  6. String escaping: newlines, special characters
  7. Idempotency: formatting twice produces the same output
  8. Error handling: invalid TOML input produces clear error messages
  9. Real-world: actual Cargo.toml structure with dependencies and profiles

The idempotency test is critical for a formatter:

#[test]
fn idempotent() {
    let input = "[package]\nname = \"test\"\nversion = \"0.1.0\"\n\n[dependencies]\nclap = \"4\"\n";
    let first = fmt(input);
    let second = fmt(&first);
    assert_eq!(first, second, "formatting should be idempotent");
}

If formatting the output of a format operation changes anything, the tool is broken. Users would get different results depending on how many times they run the formatter, and CI checks would flip-flop.
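The same property can be checked against any formatting function with a small generic helper. This is a sketch: check_idempotent is hypothetical, and toy_format is a stand-in formatter (trim lines, drop blank ones) used only to exercise the check, not toml-fmt's logic:

```rust
/// Format once, format the result again, and compare.
/// Ok(first pass) if stable, Err((first, second)) otherwise.
pub fn check_idempotent<F>(fmt_fn: F, input: &str) -> Result<String, (String, String)>
where
    F: Fn(&str) -> String,
{
    let first = fmt_fn(input);
    let second = fmt_fn(&first);
    if first == second {
        Ok(first)
    } else {
        Err((first, second))
    }
}

/// Toy formatter for demonstration only: trims each line, drops blank lines,
/// and ends with exactly one newline.
pub fn toy_format(s: &str) -> String {
    let lines: Vec<&str> = s.lines().map(str::trim).filter(|l| !l.is_empty()).collect();
    let mut out = lines.join("\n");
    out.push('\n');
    out
}
```

Returning both passes on failure makes the assertion message show exactly what the second run changed.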

Release Profile

The Cargo.toml uses an aggressive release profile:

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

  • strip: Removes debug symbols from the binary
  • lto: Link-time optimization across all crates
  • codegen-units = 1: Single codegen unit for maximum optimization (slower compile, faster binary)
  • opt-level = "z": Optimize for size over speed
  • panic = "abort": No unwinding overhead

This produces a small, fast binary suitable for Docker images and CI environments.

Docker

The Dockerfile uses a multi-stage build:

FROM rust:1.90-alpine AS builder
RUN apk add --no-cache musl-dev
WORKDIR /src
COPY Cargo.toml Cargo.lock ./
COPY src/ src/
RUN cargo build --release --locked

FROM alpine:3.20
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /src/target/release/toml-fmt /usr/local/bin/toml-fmt
USER app
ENTRYPOINT ["toml-fmt"]

The builder stage compiles against musl for a fully static binary. The runtime image is bare Alpine with a non-root user. The final image is small — Alpine base plus one static binary.

CI Integration

The --check flag is designed for CI pipelines:

# GitHub Actions example
- name: Check TOML formatting
  run: |
    cargo install --path .
    toml-fmt --check Cargo.toml

Or in a pre-commit hook:

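#!/bin/sh
# Check staged TOML files. --diff-filter=ACM skips deleted files, which
# toml-fmt could not open. Note: this checks the working-tree copy.
for f in $(git diff --cached --name-only --diff-filter=ACM -- '*.toml'); do
    toml-fmt --check "$f" || exit 1
done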

The exit code convention (0 = formatted, 1 = would change, 2 = error) mirrors the --check behavior of formatters such as rustfmt and black, which exit 0 when nothing would change and nonzero otherwise. This makes it easy to integrate into existing workflows.

Limitations and Trade-offs

No comment preservation. The toml crate's Value type discards comments during parsing. This is the biggest limitation. If your TOML files have important inline comments, toml-fmt will strip them. For comment-preserving formatting, use taplo or toml_edit.

Inline tables expand. The toml::Value type doesn't distinguish between dep = { version = "1" } and a full [dep] section. The formatter emits all tables as sections. This is valid TOML but may not match your visual preference for compact dependency declarations.

No partial formatting. The tool formats the entire file. There's no range-based formatting like rustfmt supports.

These are deliberate trade-offs. The goal was a small, fast, opinionated tool with predictable output. Comment preservation and format detection would roughly triple the codebase size.

What I Learned

Building a formatter is an exercise in understanding the output format deeply. TOML has more edge cases than you'd expect:

  • Bare vs quoted keys: keys with dots, spaces, or unicode need quoting
  • Float representation: 1 is an integer and 1.0 is a float, though they look similar
  • Table vs inline table: semantically identical, syntactically different
  • Array of tables: [[section]] is fundamentally different from [section] with array values
  • String escaping: TOML supports \uXXXX for control characters

The recursive table emission was the most interesting part. TOML's hierarchical structure maps naturally to recursive descent, but handling the interleaving of simple keys, sub-tables, and arrays of tables requires careful ordering.

The toml crate does the heavy lifting for parsing. The formatter is roughly 200 lines of emission code plus tests. Rust's toml::Value enum gives you a clean tree to walk, and pattern matching makes the value formatting concise.

If you need a quick TOML formatter for your Rust projects or CI pipelines, give toml-fmt a try. It's one binary, no configuration, and it produces deterministic output every time.
