SEN LLC

Posted on Apr 15

I Built a jq Alternative That Speaks JSONPath — and Deliberately Wrote Almost No Parser Code

#python #cli #json #tutorial

A small Python CLI that queries JSON using JSONPath (RFC 9535) instead of jq's DSL. The interesting part isn't the query engine — I wrapped an existing library for that. The interesting part is everything around it: error reporting, exit codes, output modes, Docker packaging, and the judgment call to not rewrite a solved problem.

📦 GitHub: https://github.com/sen-ltd/jsonpath-cli

Every couple of months I end up in the same spot: I need to extract a value from a blob of JSON in a shell pipeline. I reach for jq, I get the syntax wrong, I read the man page, I get it wrong again, I curse, I eventually succeed. Next time, I've forgotten everything.

Meanwhile, in the rest of my day, I'm writing JSONPath expressions constantly — in Postman tests, in Kubernetes kubectl -o jsonpath, in RestAssured assertions, in my IDE's JSON inspector. JSONPath is in my fingers. jq's DSL is not.

This gap has a fix, and the fix is boring: build a CLI that takes JSONPath expressions, reads JSON, prints matches. That's the whole product. I'm going to walk through how I built it, why I didn't write a parser, and what I learned about a query language I thought I understood.

The problem in one example

Say you have this JSON:

{"users":[{"name":"ada","age":36},{"name":"grace","age":85}]}

You want the names of users older than 40. In jq:

jq -r '.users[] | select(.age > 40) | .name'

In JSONPath:

$.users[?(@.age > 40)].name

Both are fine. Both express the same query. But if you've been writing the bottom one in six different tools all week, translating it to the top one is a small tax that adds up across a career: you have to remember the pipe syntax, the select function, the leading dot instead of $, and the fact that .users[] iterates over the array rather than indexing into it. jq is a fantastic tool. I just want a version that speaks the dialect I already know.

What I wanted was:

jsonpath-cli '$.users[?(@.age > 40)].name'

Read stdin or a file. Print matches. Grep-style exit codes. Done.
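
If neither syntax is familiar, it may help to see what both queries compute. Here is a plain-Python restatement of the filter against the sample document above. It only illustrates the semantics; it is not how the tool evaluates anything.

```python
import json

document = json.loads(
    '{"users":[{"name":"ada","age":36},{"name":"grace","age":85}]}'
)

# Equivalent of $.users[?(@.age > 40)].name: keep the users whose age
# exceeds 40, then project the name field of each survivor.
names = [user["name"] for user in document["users"] if user["age"] > 40]

print(names)  # ['grace']
```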

Why Python, and why wrap an existing library

The first serious design decision was: do I write the JSONPath parser myself?

The RFC 9535 grammar is not huge — it fits in a few pages. I've written PEG parsers before. It would be interesting. It would also be at least two weeks of work to handle every edge case, and I'd be implementing the fifth or sixth production JSONPath parser in Python. There is already jsonpath-ng, which has been maintained for years, has a compatible extension module for filters (the ?(...) syntax), handles the ambiguous cases the original Goessner 2007 blog post left open, and is one pip install away.

The honest answer is: writing another parser would have been a vanity project. The value I can add is not in the parser. It's in how the tool behaves at the shell boundary — error messages, exit codes, output formats, Docker packaging, documentation. That's where the UX of jq is very good and the UX of random JSONPath playgrounds is almost uniformly bad. So: wrap jsonpath-ng, and spend my attention on everything else.

A few more judgment calls I made before writing any code:

CLI-first, not library-first. The Python ecosystem already has plenty of JSONPath libraries you can import. What it doesn't have is one with a clean jq-style command-line UX. So jsonpath-cli has a minimal library surface (just compile_expression, evaluate, iter_matches for tests), and the real product is cli.py.
Docker as the primary distribution channel. pip install jsonpath-cli is fine for Python folks, but the people who would benefit most from this tool don't necessarily have a clean Python environment — they have bash, a Mac, and Docker. Making docker run the first-class entry point means anyone can try it in 30 seconds without touching their system Python.
Grep-like exit codes. By default, jq exits 0 whenever a query runs successfully, regardless of whether it matched anything; you have to opt in with -e / --exit-status to get a status that reflects the output. That's fine for some workflows but awful when you're writing a shell pipeline like if jsonpath-cli '$.error' logs.json; then ... I wanted grep behavior: 0 on match, 1 on no match, 2 on expression error, 3 on I/O error or invalid JSON.

The engine layer is a very thin wrapper

Here's the core of the engine module, minus imports and docstrings:

class JSONPathParseError(JSONPathError):
    def __init__(self, expression: str, detail: str, position: int | None = None):
        self.expression = expression
        self.detail = detail
        self.position = position
        super().__init__(self._format())

    def _format(self) -> str:
        if self.position is None:
            return f"invalid JSONPath expression: {self.detail}"
        pointer = " " * self.position + "^"
        return (
            f"invalid JSONPath expression at column {self.position + 1}: {self.detail}\n"
            f"    {self.expression}\n"
            f"    {pointer}"
        )


def compile_expression(expression: str) -> Any:
    if not isinstance(expression, str):
        raise JSONPathParseError(str(expression), "expression must be a string")
    if not expression.strip():
        raise JSONPathParseError(expression, "expression is empty")
    try:
        return _ext_parse(expression)
    except (JsonPathParserError, JsonPathLexerError) as exc:
        detail, position = _extract_position(str(exc))
        raise JSONPathParseError(expression, detail, position) from exc

The whole engine is about 80 lines, and most of that is error translation. _ext_parse is jsonpath_ng.ext.parse, which understands filter expressions — the plain jsonpath_ng.parse doesn't, which tripped me up for half an hour. The ext module is the right default and not mentioned in the first page of the library's README.

The interesting design detail here is that the engine raises my exception type (JSONPathParseError), not jsonpath-ng's internal error. That matters because it lets me re-render the error with a caret pointer the user can actually read:

$ jsonpath-cli '$.users[?(' -
jsonpath-cli: invalid JSONPath expression at column 9: Parse error at 1:9 near token ( (()
    $.users[?(
            ^

The column is extracted from jsonpath-ng's own error string with a small regex. If I can't find a column, I render the error without the caret — degraded but not broken. The general principle: never let a dependency's error bubble up verbatim, even when the dependency is good, because your users don't know or care what library you wrapped.
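
The helper that pulls the column out isn't shown in this post. Below is a minimal sketch of the idea, assuming the dependency's message contains a line:column pair like the 1:9 in the output above and that the column is 1-based. The names extract_position and render, and the regex itself, are illustrative rather than the project's actual code.

```python
from __future__ import annotations

import re

# Assumed message shape: "... at <line>:<column> ..." (taken from the sample output).
_LINE_COL = re.compile(r"\bat (\d+):(\d+)\b")


def extract_position(message: str) -> tuple[str, int | None]:
    """Return (detail, zero_based_offset), or (detail, None) if no column is found."""
    found = _LINE_COL.search(message)
    if found is None:
        return message, None  # degrade gracefully: no caret will be drawn
    column = int(found.group(2))
    return message, max(column - 1, 0)


def render(expression: str, message: str) -> str:
    """Format an error with a caret line when a position is available."""
    detail, position = extract_position(message)
    if position is None:
        return f"invalid JSONPath expression: {detail}"
    return (
        f"invalid JSONPath expression at column {position + 1}: {detail}\n"
        f"    {expression}\n"
        f"    {' ' * position}^"
    )


print(render("$.users[?(", "Parse error at 1:9 near token ( (()"))
print(render("$.users[?(", "unexpected end of input"))
```

The second call shows the fallback path: when the regex finds nothing, the message is printed without a caret instead of failing.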

Output formatting is where the actual work went

The engine returns a list of matched Python values. Turning that list into text on stdout is the part users actually see, and I wanted four distinct modes:

def _format_matches(matches, *, raw, json_output, count, indent):
    if count:
        return f"{len(matches)}\n"

    if json_output:
        return json.dumps(matches, ensure_ascii=False, indent=indent) + "\n"

    lines = []
    for value in matches:
        if raw and isinstance(value, str):
            lines.append(value)
        else:
            lines.append(json.dumps(value, ensure_ascii=False, indent=indent))
    return "".join(line + "\n" for line in lines)

Four modes in 15 lines, and I could fold them into the CLI function — but keeping formatting in its own function makes the tests almost trivial. You call _format_matches([...], raw=True, ...) and compare strings. No subprocess, no captured streams, no flakes.

One subtlety in --raw mode: it only unquotes strings. If your expression matches a number or an object, raw mode falls back to JSON-encoding it. That mirrors jq -r's behavior and is what you actually want in pipelines — jsonpath-cli --raw '$.user.id' | xargs echo should work whether the ID is a string or a number.
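
To make the fallback concrete, here is a trimmed restatement of the per-line branch from the formatter above, covering only the default and raw modes. The name format_lines is made up for this sketch.

```python
import json


def format_lines(matches, *, raw=False):
    """Render one match per line; raw mode unquotes strings and nothing else."""
    lines = []
    for value in matches:
        if raw and isinstance(value, str):
            lines.append(value)  # bare string, no surrounding quotes
        else:
            lines.append(json.dumps(value, ensure_ascii=False))  # JSON-encode the rest
    return "".join(line + "\n" for line in lines)


mixed = ["ada", 7, {"id": 1}]

print(format_lines(mixed, raw=True), end="")
# ada
# 7
# {"id": 1}

print(format_lines(mixed), end="")
# "ada"
# 7
# {"id": 1}
```

Only the first line differs between the two modes, which is why a numeric ID passes through a pipeline the same way with or without --raw.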

ensure_ascii=False is there because I know some of my data is Japanese. Without it, {"名前": "ada"} would come out as "\u540d\u524d", which is technically correct JSON and absolutely useless in a terminal. One flag flip, one test case, one less paper cut.
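
The difference is easy to check with the standard library alone. A small sketch using the same sample object:

```python
import json

record = {"名前": "ada"}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are emitted as-is.
readable = json.dumps(record, ensure_ascii=False)

# escaped  is the text {"\u540d\u524d": "ada"}
# readable is the text {"名前": "ada"}

# Both strings decode to the same data; only the presentation differs.
assert json.loads(escaped) == json.loads(readable) == record
```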

Streams in, exit codes out

The CLI's main function is structured so every external thing it touches — argv, stdin, stdout, stderr — is a parameter with a default:

def main(argv=None, *, stdin=None, stdout=None, stderr=None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)

    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    stderr = stderr or sys.stderr

    try:
        data = _read_input(args.file, stdin)
    except _IOProblem as exc:
        print(f"jsonpath-cli: {exc}", file=stderr)
        return EXIT_IO_ERROR

    try:
        matches = evaluate(args.expression, data)
    except JSONPathParseError as exc:
        print(f"jsonpath-cli: {exc}", file=stderr)
        return EXIT_PARSE_ERROR

    output = _format_matches(matches, raw=args.raw, json_output=args.json_output,
                             count=args.count, indent=args.indent)
    stdout.write(output)
    return EXIT_OK if matches else EXIT_NO_MATCH

This shape — a main that returns an int and takes injectable streams — is the single biggest upgrade you can make to a Python CLI's testability. My end-to-end tests are literally this:

def _run(argv, stdin_text=""):
    stdin = io.StringIO(stdin_text)
    stdout = io.StringIO()
    stderr = io.StringIO()
    code = main(argv, stdin=stdin, stdout=stdout, stderr=stderr)
    return code, stdout.getvalue(), stderr.getvalue()

def test_no_match_returns_exit_code_1():
    code, out, _ = _run(["$.missing"], stdin_text='{"a": 1}')
    assert code == 1
    assert out == ""

No subprocess.run, no temp files for most cases, no slow startup. Twenty-odd tests run in 0.2 seconds. When a test fails, the assertion error points at the exact string mismatch, not at a wall of captured stderr.
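
The input helper that main calls isn't shown in this post. Here is a minimal sketch of what such a helper could look like, assuming that a missing file argument or - means stdin, and that read failures and invalid JSON are both folded into a single exception type. IOProblem and read_input are hypothetical stand-ins for _IOProblem and _read_input, and the message wording is invented.

```python
import io
import json


class IOProblem(Exception):
    """Hypothetical stand-in for the post's _IOProblem."""


def read_input(path, stdin):
    """Load one JSON document from a file, or from stdin when path is None or '-'."""
    try:
        if path in (None, "-"):
            text = stdin.read()
        else:
            with open(path, encoding="utf-8") as handle:
                text = handle.read()
    except (OSError, UnicodeDecodeError) as exc:
        raise IOProblem(f"cannot read {path}: {exc}") from exc

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where the JSON parser stopped, using its own line/column fields.
        raise IOProblem(
            f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc


print(read_input("-", io.StringIO('{"a": 1}')))  # {'a': 1}

try:
    read_input(None, io.StringIO("not json"))
except IOProblem as exc:
    print(f"jsonpath-cli: {exc}")
```

Because the stream is a parameter, the same in-memory StringIO trick used in the tests above works here too.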

The exit code rule is the only place I had to think. Should --count exit 1 when the count is zero? I decided yes. The count (0) is still printed to stdout, but the exit code follows the same rule as every other mode: 0 when something matched, 1 when nothing did. If you write jsonpath-cli --count '$.errors' in CI, $? tells you whether any errors were present without parsing the number. That's the whole point of grep-style exit codes.
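
The rule for --count can be restated as a small function. The constant values follow the exit codes listed earlier in this post; the function name count_result is made up for this sketch.

```python
# Exit codes as described in the post: 0 match, 1 no match,
# 2 expression error, 3 I/O error or invalid JSON.
EXIT_OK = 0
EXIT_NO_MATCH = 1
EXIT_PARSE_ERROR = 2
EXIT_IO_ERROR = 3


def count_result(matches):
    """Return (stdout_text, exit_code) for --count mode."""
    text = f"{len(matches)}\n"  # the count is always printed, even when it is 0
    code = EXIT_OK if matches else EXIT_NO_MATCH
    return text, code


print(count_result([]))            # ('0\n', 1)
print(count_result(["e1", "e2"]))  # ('2\n', 0)
```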

What JSONPath actually isn't — and where jsonpath-ng helps

Here's the thing I had to accept halfway through building this: JSONPath is not one language. It's a family of mutually incompatible dialects. Until RFC 9535 landed in February 2024, there was no official standard at all, just Stefan Goessner's 2007 blog post and whatever each implementation decided to do. That's why jsonpath-ng, Kubernetes' JSONPath, and Java's JSONPath library all disagree on edge cases like:

Does $[*] on an object iterate values or keys?
Is $..['a','b'] valid, or do you need two separate queries?
What does $.store.books[?(@.price)] return — truthy-filter semantics, existence-filter semantics, or a parse error?
Is the root $ mandatory, and what does a path without it mean?
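
The filter question is the easiest one to make concrete. The plain-Python sketch below shows the two plausible readings of $.store.books[?(@.price)] on invented sample data. As I read RFC 9535, a bare query inside a filter is an existence test, but that describes the standard, not what any particular library does.

```python
books = [
    {"title": "A", "price": 0},
    {"title": "B"},
    {"title": "C", "price": 12},
]

# Existence semantics: keep any book that has a price member at all.
by_existence = [b["title"] for b in books if "price" in b]

# Truthiness semantics: keep books whose price is a truthy value,
# so a price of 0 is dropped along with the missing one.
by_truthiness = [b["title"] for b in books if b.get("price")]

print(by_existence)   # ['A', 'C']
print(by_truthiness)  # ['C']
```

A free item priced at 0 is exactly the kind of record where two dialects return different answers for the same expression.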

jsonpath-ng picks a reasonable set of answers and mostly aligns with the RFC. "Mostly" is honest: jsonpath-ng predates the RFC by years, so full conformance shouldn't be assumed. For the 95% of queries people actually write (field access, wildcards, recursive descent, simple filters, slices), behavior is boring and consistent, and that's the part I care about. For the 5% edge cases, I documented my position: whatever jsonpath-ng does is what the CLI does, and if the RFC disagrees I'll upgrade when the library does.

This is a judgment call I'm comfortable defending. The alternative — writing my own parser that tracks the RFC to the letter — means owning a compliance treadmill for a tool I wrote in a weekend. Not a good trade.

Where this tool falls short

Honest list:

Streaming. The CLI reads the entire input into memory and parses it as one JSON document. If you pipe a 5 GB log file in, it'll OOM. A real streaming implementation would need an event-based JSON parser (ijson) and a reimplementation of the JSONPath engine in a push-based style. Neither is small work. For log files, jq --stream is still your answer.
Output ordering on recursive descent. $..price returns matches in the order jsonpath-ng walks the tree, which is stable but not necessarily what you'd expect from a textual reading of the source JSON. I documented this with a test that asserts the set matches, not the sequence.
No extension operators from other dialects. Things like $..[?(@.tags contains 'python')] work in some dialects but not in jsonpath-ng's ext module. That's upstream, not something I'm going to fix.
Docker image is ~60 MB. Multi-stage Alpine build, but the python:3.12-alpine base is still 45 MB by itself. A Go rewrite would get this under 10 MB. That's a rewrite I might actually do, because for a CLI tool installable via docker run, download size is user-facing performance.
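
On the ordering point, an order-insensitive check can be sketched without the library. The descend function below is a hand-rolled stand-in for recursive descent, written only to produce some matches to compare; it is not jsonpath-ng's traversal and makes no claim about the order that library uses.

```python
def descend(node, key):
    """Yield every value stored under key, at any depth, depth-first."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            yield from descend(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from descend(item, key)


document = {"a": {"b": {"c": 42}}, "d": {"c": 99}}
matches = list(descend(document, "c"))

# Compare as a sorted list so the check does not depend on traversal order
# (sorted rather than set, so duplicates and their counts still matter).
assert sorted(matches) == [42, 99]
```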

Try it in 30 seconds

# Build it
git clone https://github.com/sen-ltd/jsonpath-cli
cd jsonpath-cli
docker build -t jsonpath-cli .

# Basic field
echo '{"user":{"name":"ada"}}' \
  | docker run --rm -i jsonpath-cli --raw '$.user.name'
# => ada

# Filter
echo '{"users":[{"name":"ada","age":36},{"name":"grace","age":85}]}' \
  | docker run --rm -i jsonpath-cli --raw '$.users[?(@.age > 40)].name'
# => grace

# Recursive descent + JSON output
echo '{"a":{"b":{"c":42}},"d":{"c":99}}' \
  | docker run --rm -i jsonpath-cli --json '$..c'
# => [42, 99]

# Run the test suite in the image
docker run --rm --entrypoint pytest jsonpath-cli
# => 29 passed in 0.24s

Exit codes work the way you'd want:

echo '{"a":1}' | docker run --rm -i jsonpath-cli '$.nothing' ; echo $?
# => 1

echo 'not json' | docker run --rm -i jsonpath-cli '$.a' ; echo $?
# => 3

The lesson

The thing I want to leave you with isn't about JSONPath. It's about the decision I made on day one: I'm not writing the parser.

Every time I'm tempted to reimplement a solved problem "for fun" or "to understand it better," I have to weigh that against the actual product I'm trying to ship. Sometimes the learning is the point and reimplementing is right. But sometimes — most times, honestly — the value I can add is at the edges: the error message a user sees at 2am, the exit code their CI script checks, the docker run command that works without a README. Those are the things that turn a library into a tool. And those are the things no one else has already written for you.

If there's a user-facing CLI behavior you wish existed, and the hard algorithmic work is already in a library somewhere, go wrap it. The wrapper is the product.

Closing

Entry #102 in a 100+ portfolio series by SEN LLC. I'm also building this same "wrap a library, ship a great CLI" pattern for a few other query languages. If that sounds interesting, follow along.

Feedback welcome.
