SEN LLC

Posted on Apr 15

I Built jq for TOML Because My Shell Scripts Deserved Better

#rust #cli #toml #tutorial

A small Rust CLI that queries, edits, and reformats TOML files. Four output formats, in-place --edit, shell-friendly exit codes. cargo install and alias it next to your existing jq.

📦 GitHub: https://github.com/sen-ltd/toml-query

jq has been a permanent fixture in my shell for years. If your CI pipeline touches JSON — reading a version out of package.json, conditionally deploying based on a flag, diffing two API responses — jq is right there. It's so ubiquitous that I genuinely forget it isn't a POSIX utility.

But TOML? TOML is the config format of half my stack. Cargo.toml, pyproject.toml, Hugo's config.toml, netlify.toml, rustfmt.toml, .cargo/config.toml, every Rust workspace manifest I touch. And until recently my shell scripts handled it like this:

VERSION=$(grep '^version' Cargo.toml | head -1 | cut -d'"' -f2)

…which works, until it doesn't. What if there are two version = lines because a dependency is pinned? What if someone reformats the file? What if the field I want is nested under [package.metadata.docs.rs]? Regex-on-TOML is a trap.

So I wrote toml-query: a tiny Rust CLI that does for TOML what jq does for JSON. Path queries, array indexing, multiple output formats, in-place edits, shell-friendly exit codes. Four runtime deps. Ships as a 10 MB Alpine Docker image.

This post walks through the design: the path parser, the query walker, the --edit semantics, and the trade-offs I made to keep it a one-evening project.

The problem, concretely

Here's what I wanted to write in shell scripts:

# Get a nested value
VERSION=$(toml-query Cargo.toml dependencies.serde.version)

# List keys in a table
for dep in $(toml-query Cargo.toml --keys dependencies); do
    echo "$dep"
done

# Guard on a flag
if toml-query Cargo.toml --exists package.metadata.docs.rs; then
    cargo doc --no-deps
fi

# Pipe into jq for transformations
toml-query Cargo.toml dependencies --format json | jq 'keys | length'

# Bump a version in CI
toml-query Cargo.toml package.version --edit "$NEW_VERSION"

None of those are possible with stock tools. There is dasel, which is great and genuinely does this, but it's a polyglot with ambitions for JSON, YAML, TOML, XML, and CSV all at once. I wanted something small, TOML-first, and dependent on the official toml crate so I never have to worry about syntax drift.

Design: three small modules

The whole thing fits in five Rust files plus integration tests: three logic modules (path, query, format), the clap definitions, and main.rs.

src/
├── main.rs    # CLI dispatch + exit codes
├── cli.rs     # clap Parser derive
├── path.rs    # "a.b[2].c" → [Key("a"), Key("b"), Index(2), Key("c")]
├── query.rs   # walk a toml::Value by path; get/set/exists/keys/length
└── format.rs  # render raw/json/toml output; parse --edit RHS

Each module is independently unit-testable because none of them touch I/O. main.rs is the only file that knows about fs::read_to_string and exit codes.

The path parser

jq uses a rich path language — .foo[0].bar | select(.x > 3) — and I deliberately didn't follow it. For scripting, 95% of what you want is dotted keys plus array indexing. That's what I support:

package.name
dependencies.serde.version
bin[0].name
workspace.members[2]
matrix[1][2]

Here's the parser:

pub enum Segment {
    Key(String),
    Index(usize),
}

pub fn parse(input: &str) -> Result<Vec<Segment>, PathError> {
    let mut out = Vec::new();
    if input.is_empty() {
        return Ok(out);
    }

    let bytes = input.as_bytes();
    let mut i = 0usize;
    let mut expecting_key = true;

    while i < bytes.len() {
        if expecting_key {
            let start = i;
            while i < bytes.len() && bytes[i] != b'.' && bytes[i] != b'[' {
                i += 1;
            }
            if start == i {
                return Err(err(format!("empty key at position {}", start)));
            }
            let key = std::str::from_utf8(&bytes[start..i])?.to_string();
            out.push(Segment::Key(key));
            expecting_key = false;
            continue;
        }

        match bytes[i] {
            b'.' => {
                i += 1;
                expecting_key = true;
            }
            b'[' => {
                i += 1;
                let start = i;
                while i < bytes.len() && bytes[i].is_ascii_digit() {
                    i += 1;
                }
                if i >= bytes.len() || bytes[i] != b']' {
                    return Err(err("missing ']'"));
                }
                let digits = std::str::from_utf8(&bytes[start..i]).unwrap();
                let n: usize = digits.parse()?;
                out.push(Segment::Index(n));
                i += 1;
            }
            other => return Err(err(format!("unexpected '{}' at {}", other as char, i))),
        }
    }

    if expecting_key {
        return Err(err("trailing '.'"));
    }
    Ok(out)
}

Three things I learned:

State machines are still the right answer for tiny parsers. The expecting_key flag is the whole state of the parser. No peek, no lookahead, no nom dependency. It's about 50 lines and fits in your head.
Array indices after a key, not as a key. bin[0].name is three segments: Key("bin"), Index(0), Key("name"). Not two segments Key("bin[0]") then Key("name"). The latter is tempting because you can split on . and forget about brackets, but it pushes every consumer of the parser to re-parse.
I deliberately don't support quoted keys. TOML lets you write "weird.key with dots" = 1, and that's a real feature, but the moment you support it in the path syntax, you need escaping rules, and suddenly the parser is 200 lines. For v0.1, dotted bare keys cover every real Cargo.toml I've ever seen.
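The parser above leans on a PathError type and an err helper that aren't shown. A minimal sketch of what they could look like, assuming a message-only error: the struct layout, the Display prefix, and the parse_index helper are illustrative assumptions, not the crate's actual code.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

/// Hypothetical error type for the path parser: just a message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PathError {
    pub message: String,
}

impl fmt::Display for PathError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid path: {}", self.message)
    }
}

impl std::error::Error for PathError {}

/// Accepts both `&str` literals and `format!` output, matching the
/// two ways the parser calls it.
pub fn err(message: impl Into<String>) -> PathError {
    PathError { message: message.into() }
}

// These two conversions are what allow `?` on `from_utf8(...)`
// and on `digits.parse()` inside the parser.
impl From<Utf8Error> for PathError {
    fn from(e: Utf8Error) -> Self {
        err(format!("path is not valid UTF-8: {}", e))
    }
}

impl From<ParseIntError> for PathError {
    fn from(e: ParseIntError) -> Self {
        err(format!("bad array index: {}", e))
    }
}

/// Illustrative helper showing the `?` conversion in isolation.
pub fn parse_index(digits: &str) -> Result<usize, PathError> {
    Ok(digits.parse::<usize>()?)
}
```

With conversions like these, a path such as bin[] fails inside digits.parse() (empty digit string) and surfaces as a PathError without any explicit map_err in the parser.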

The query walker

Given a parsed path and a toml::Value, walk the tree:

pub fn get<'a>(root: &'a Value, segments: &[Segment]) -> Result<&'a Value, QueryError> {
    let mut cur = root;
    for (i, seg) in segments.iter().enumerate() {
        match seg {
            Segment::Key(k) => {
                let table = cur.as_table().ok_or_else(|| {
                    QueryError::TypeMismatch(format!(
                        "segment {} (key '{}'): parent is not a table", i, k
                    ))
                })?;
                cur = table.get(k).ok_or_else(||
                    QueryError::NotFound(format!("key '{}'", k)))?;
            }
            Segment::Index(n) => {
                let arr = cur.as_array().ok_or_else(|| {
                    QueryError::TypeMismatch(format!(
                        "segment {} ([{}]): parent is not an array", i, n
                    ))
                })?;
                cur = arr.get(*n).ok_or_else(||
                    QueryError::NotFound(format!("index {}", n)))?;
            }
        }
    }
    Ok(cur)
}

The error type distinguishes NotFound from TypeMismatch because they should map to different exit codes. Not-found is a normal result (exit 1, like grep -q). Type mismatch — e.g. package.name.inner when name is a string — is a programming error on the caller's part (exit 2).
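The mapping from error variant to exit status can be sketched in a few lines. This is an assumption about how main.rs might do it, not the actual code: the QueryError enum is re-declared here so the block stands alone, exit status 0 for success is assumed, and the function names exit_status and to_exit_code are hypothetical.

```rust
use std::process::ExitCode;

/// Mirrors the two error variants used by the walker above.
#[derive(Debug)]
pub enum QueryError {
    NotFound(String),
    TypeMismatch(String),
}

/// Numeric exit status for a query result:
/// 0 = found, 1 = not found, 2 = type mismatch.
pub fn exit_status<T>(result: &Result<T, QueryError>) -> u8 {
    match result {
        Ok(_) => 0,
        Err(QueryError::NotFound(_)) => 1,
        Err(QueryError::TypeMismatch(_)) => 2,
    }
}

/// What `main` would return. `ExitCode` has no `PartialEq`, so the
/// numeric mapping lives in `exit_status`, where it can be unit-tested.
pub fn to_exit_code<T>(result: &Result<T, QueryError>) -> ExitCode {
    ExitCode::from(exit_status(result))
}
```

Keeping the numeric mapping separate from ExitCode means the exit-code contract can be checked in a plain unit test, without spawning the binary.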

Type-preserving output

TOML has a richer type system than JSON: integers, floats, bools, strings, local and offset datetimes, arrays, and tables. When you query a leaf, what should the output be?

jq's answer for JSON is: always JSON. jq '.name' package.json gives you "my-app" — with the quotes. To strip the quotes you add -r.

I went the other way. The default output format is raw, which means scalars print bare:

$ toml-query Cargo.toml package.name
example

$ toml-query Cargo.toml package.version
0.1.0

$ toml-query Cargo.toml package.edition
2021

This is because shell capture almost always wants the bare value:

VERSION=$(toml-query Cargo.toml package.version)
# VERSION=0.1.0, not "\"0.1.0\""

For composite values (tables and arrays), raw falls back to compact JSON because you can't meaningfully "print a table bare" in one line. If you want valid JSON for everything, pass --format json. If you want TOML back out (for piping into another tool or re-serializing a sub-tree), pass --format toml.
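The raw rule (scalars bare, composites as compact JSON) can be sketched without any dependencies. The Value enum below is a simplified stand-in for toml::Value with no floats or datetimes, and render_raw, to_compact_json, and json_string are hypothetical names, so treat this as an illustration of the rule rather than the crate's format.rs.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `toml::Value` (no floats or datetimes).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Boolean(bool),
    Array(Vec<Value>),
    Table(BTreeMap<String, Value>),
}

/// JSON string literal with the escapes JSON requires.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Compact JSON: no spaces, strings quoted and escaped.
pub fn to_compact_json(v: &Value) -> String {
    match v {
        Value::String(s) => json_string(s),
        Value::Integer(i) => i.to_string(),
        Value::Boolean(b) => b.to_string(),
        Value::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_compact_json).collect();
            format!("[{}]", parts.join(","))
        }
        Value::Table(t) => {
            let parts: Vec<String> = t
                .iter()
                .map(|(k, v)| format!("{}:{}", json_string(k), to_compact_json(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Raw mode: scalars print bare, composites fall back to compact JSON.
pub fn render_raw(v: &Value) -> String {
    match v {
        Value::String(s) => s.clone(),
        Value::Integer(i) => i.to_string(),
        Value::Boolean(b) => b.to_string(),
        Value::Array(_) | Value::Table(_) => to_compact_json(v),
    }
}
```

The only difference between the two renderers is the string case: raw drops the quotes, JSON keeps them. That one branch is what makes VERSION=$(...) capture work without a -r flag.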

The toml::Value → serde_json::Value conversion is about 20 lines of recursive pattern matching:

pub fn toml_to_json(value: &TomlValue) -> JsonValue {
    match value {
        TomlValue::String(s) => JsonValue::String(s.clone()),
        TomlValue::Integer(i) => JsonValue::Number((*i).into()),
        TomlValue::Float(f) => serde_json::Number::from_f64(*f)
            .map(JsonValue::Number)
            .unwrap_or(JsonValue::Null),
        TomlValue::Boolean(b) => JsonValue::Bool(*b),
        TomlValue::Datetime(d) => JsonValue::String(d.to_string()),
        TomlValue::Array(a) => JsonValue::Array(a.iter().map(toml_to_json).collect()),
        TomlValue::Table(t) => {
            let mut obj = serde_json::Map::new();
            for (k, v) in t {
                obj.insert(k.clone(), toml_to_json(v));
            }
            JsonValue::Object(obj)
        }
    }
}

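Datetimes become strings: JSON has no datetime type, so the TOML textual form is the closest faithful representation, though a consumer can no longer tell a datetime from an ordinary string. NaN and infinite floats become JSON null, because JSON can't represent them either. Those are the only two places where the conversion loses information.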

--edit: the in-place write

The --edit flag is where it gets interesting, because you're round-tripping a file the user wrote through a parser and an emitter. Here's the full write path:

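if let Some(edit_raw) = &cli.edit {
    // `input` is the file contents read earlier; `ty` is the parsed --type flag.
    let mut doc: toml::Value = input.parse()?;
    let new_val = parse_edit_value(edit_raw, ty)?;
    query::set(&mut doc, &segments, new_val)?;
    // Re-serialize the whole document and overwrite the file in place.
    let serialized = toml::to_string(&doc)?;
    fs::write(&cli.file, serialized)?;
    return Ok(ExitCode::SUCCESS);
}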

Four lines of real logic plus I/O. The subtleties are hiding in query::set:

pub fn set(root: &mut Value, segments: &[Segment], new_value: Value)
    -> Result<(), QueryError>
{
    if segments.is_empty() {
        *root = new_value;
        return Ok(());
    }

    let mut cur = root;
    for (i, seg) in segments.iter().enumerate() {
        let is_last = i == segments.len() - 1;
        match seg {
            Segment::Key(k) => {
                let table = cur.as_table_mut().ok_or_else(|| {
                    QueryError::TypeMismatch(format!(
                        "segment {} (key '{}'): parent is not a table", i, k
                    ))
                })?;
                if is_last {
                    table.insert(k.clone(), new_value);
                    return Ok(());
                }
                // Auto-create intermediate table if missing.
                if !table.contains_key(k) {
                    table.insert(k.clone(), Value::Table(toml::value::Table::new()));
                }
                cur = table.get_mut(k).unwrap();
            }
            Segment::Index(n) => {
                let arr = cur.as_array_mut().ok_or_else(|| {
                    QueryError::TypeMismatch(format!(
                        "segment {} ([{}]): parent is not an array", i, n
                    ))
                })?;
                if *n >= arr.len() {
                    return Err(QueryError::NotFound(format!("index {}", n)));
                }
                if is_last {
                    arr[*n] = new_value;
                    return Ok(());
                }
                cur = &mut arr[*n];
            }
        }
    }
    Ok(())
}

Design choices:

Intermediate tables are auto-created. If you run toml-query Cargo.toml profile.release.opt-level --edit z on a file with no [profile.release] section, you get one. This is the path of least surprise for scripts.
Intermediate arrays are NOT auto-extended. If bin has length 2 and you try to edit bin[5].name, you get an error, not four empty entries. Index-out-of-range is almost always a typo.
Typed edits. The default --type is string, because that's what you want 90% of the time (package.version, package.name). But --type int, --type float, --type bool, and --type json let you write integers, floats, booleans, and arbitrary JSON structures. The last one is crucial for setting composite values, such as replacing a feature list in one shot: --type json --edit '["default", "async"]'. Note that this overwrites the array at that path; it does not append to it.
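The typed-edit rule can be sketched as a small function from the raw --edit string and a type tag to a value. This is an illustration under assumptions: Value is a scalar-only stand-in for toml::Value, EditType is a hypothetical mirror of the --type flag, the json variant is left out because it needs a JSON parser, and the error messages are made up.

```rust
/// Simplified stand-in for the scalar variants of `toml::Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Integer(i64),
    Float(f64),
    Boolean(bool),
}

/// Hypothetical mirror of the `--type` flag (`json` omitted here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EditType {
    String,
    Int,
    Float,
    Bool,
}

/// Turn the raw `--edit` argument into a typed value, or explain why not.
pub fn parse_edit_value(raw: &str, ty: EditType) -> Result<Value, String> {
    match ty {
        // Strings are taken verbatim: no quoting, no trimming.
        EditType::String => Ok(Value::String(raw.to_string())),
        EditType::Int => raw
            .trim()
            .parse::<i64>()
            .map(Value::Integer)
            .map_err(|e| format!("--type int: cannot parse '{}': {}", raw, e)),
        EditType::Float => raw
            .trim()
            .parse::<f64>()
            .map(Value::Float)
            .map_err(|e| format!("--type float: cannot parse '{}': {}", raw, e)),
        // Accept only TOML's own boolean spellings.
        EditType::Bool => match raw.trim() {
            "true" => Ok(Value::Boolean(true)),
            "false" => Ok(Value::Boolean(false)),
            other => Err(format!("--type bool: expected true or false, got '{}'", other)),
        },
    }
}
```

Under this sketch, --edit z with the default type writes the string "z", while --type int --edit z fails before the file is touched, which is the behavior you want from a script that must not write a wrongly typed value.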

Trade-offs I chose

No streaming. The whole TOML file is parsed into a toml::Value tree before any query runs. For Cargo.toml files this is irrelevant (they're rarely more than a few kilobytes). For enormous generated TOML, if that exists, this would be wasteful. I don't think it exists.

--edit rewrites formatting and drops comments. This is the big one. Parsing into toml::Value keeps the data and nothing else: comments are discarded, whitespace and quote style are normalized by the emitter, and unless the toml crate's preserve_order feature is enabled, keys come back in sorted order. If your Cargo.toml had a comment between [dependencies] and serde = ..., it will be gone after an edit. If you used single quotes and the emitter prefers double, you'll get double.

This is a real limitation for version-bump-in-CI workflows where you want a minimal diff. The mitigations are:

There's a separate crate, toml_edit, that preserves formatting exactly. Switching to it is a future-work item; the API is different enough that it wasn't a one-evening change.
For now, --edit is best for scratch files and cases where you control the formatting anyway. For upstream-quality version bumps, use cargo set-version from cargo-edit.

Inline tables are awkward. serde = { version = "1.0", features = ["derive"] } is an inline table. Querying dependencies.serde.version works (it prints 1.0). But editing it may reformat the inline table into a regular [dependencies.serde] section on output. Again, a toml_edit migration fixes this.

No multi-document support. TOML has no equivalent of YAML's --- document separator, so there's nothing to support. jq has --slurp for multi-JSON inputs; nothing like it is needed here.

One path per invocation. jq can transform and re-emit a whole document with complex pipelines. toml-query can only query or edit one path per run. For complex shell logic you call it multiple times. For complex transformations you pipe to --format json and then into jq.

Testing

59 tests total: 34 unit tests across path, query, and format modules, plus 25 integration tests that invoke the compiled binary via assert_cmd. Each integration test writes a sample Cargo.toml to a tempdir, runs the binary, and asserts on stdout, stderr, and exit codes.

The integration tests are the important ones. Unit tests tell you the parser is self-consistent; integration tests tell you the exit codes line up with reality, that --keys produces newline-separated output without trailing garbage, that --edit followed by a re-query gives you back what you set. They catch the gap between "the module is correct" and "the binary does what shell scripts expect".

Try it in 30 seconds

The Docker image is 10 MB. If you have Docker:

docker build -t toml-query https://github.com/sen-ltd/toml-query.git

# Query a field
docker run --rm -v "$PWD":/work toml-query /work/Cargo.toml package.name

# List keys
docker run --rm -v "$PWD":/work toml-query /work/Cargo.toml --keys dependencies

# Dump a sub-tree as JSON
docker run --rm -v "$PWD":/work toml-query /work/Cargo.toml dependencies --format json

Or install from source:

git clone https://github.com/sen-ltd/toml-query
cd toml-query
cargo install --path .

Then alias it next to your jq:

alias tq='toml-query'
tq Cargo.toml package.version

Closing

The thing I keep coming back to: useful tools don't need to do a lot. jq is beloved because every operation has a predictable mapping to the JSON tree underneath. toml-query is the same idea for TOML — a parser, a walker, and a renderer, none of which know about each other, all of which are boring. Boring tools are the ones I keep installed across machines.

Four dependencies, roughly 550 lines of Rust, 59 tests, a 10 MB Docker image, and my shell scripts are prettier. That seems like a fair trade.

If you've been writing grep '^version' Cargo.toml | cut -d'"' -f2 in your CI, maybe try this instead.

📦 Repo: https://github.com/sen-ltd/toml-query
