Qasim

Posted on Jun 22

Parse and transform JSON on the command line with jq

#jq #bash #json #devtools

Every backend developer eventually pipes a JSON response into a terminal and needs one field out of it. You can open a REPL, write a script, and parse it — or you can pipe it to jq and get the answer in one line. jq is a small command-line tool that filters and transforms JSON the way grep and sed handle text, and once it's in your muscle memory you stop writing throwaway scripts for data you only look at once.

This is a field guide to the jq filters I actually use, built around real tasks: pulling a field, filtering an array, reshaping objects, and handling the messy cases like missing keys and mixed types. Every example runs against JSON you'd get from a typical REST API.

Install jq and run your first filter

jq ships as a single binary with no runtime dependencies, which is why it shows up in so many Docker images and CI pipelines. Install it with your package manager: brew install jq on macOS, apt install jq on Debian or Ubuntu, dnf install jq on Fedora. The 1.7 release added a few functions worth knowing, but everything here works on 1.6 too. Run jq --version to check which one you have.

The simplest filter is ., the identity filter, which pretty-prints whatever it receives. Pipe a compact API response through it and you get readable, indented JSON with syntax coloring:

echo '{"id":42,"name":"Ada","active":true}' | jq '.'

That alone justifies installing it. A minified response that's unreadable in your terminal becomes a clean, colorized object. From here, every other filter is a path or transformation applied to that input.

Pull a single field out of a response

The most common job is extracting one value. jq '.field' reaches into an object and returns the value at that key, and you chain keys with dots to go deeper. For a nested response, .user.email walks two levels down in one expression.

# Extract a top-level field
curl -s https://api.example.com/users/42 | jq '.name'

# Reach into a nested object
curl -s https://api.example.com/users/42 | jq '.user.profile.email'

By default jq prints strings with quotes, which is fine for reading but awkward when you want to use the value in a shell variable. Add the -r (raw) flag and it strips the quotes, so EMAIL=$(curl -s ... | jq -r '.email') gives you a clean string with no surrounding characters to trim. The -r flag is the single most useful option in the tool.
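To see the difference without hitting a real API, here is a minimal sketch that uses an inline payload in place of the curl call. The response value and its fields are made up for illustration.

```shell
# Hypothetical payload standing in for an API response.
response='{"id":42,"email":"ada@example.com"}'

# Without -r, the JSON string quotes are part of the output.
quoted=$(printf '%s' "$response" | jq '.email')

# With -r, jq prints the raw string, ready to use in a shell variable.
raw=$(printf '%s' "$response" | jq -r '.email')

echo "quoted: $quoted"   # quoted: "ada@example.com"
echo "raw:    $raw"      # raw:    ada@example.com
```

If the quoted form ends up in a URL or a file name, the literal quote characters come along with it, which is the usual source of the bug.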

Work through arrays with the iterator

APIs return lists constantly, and .[] is how you iterate one. It takes an array and emits each element as a separate output, so .[] on a 50-item list produces 50 results that you can filter further down the pipe. Combine it with a field access to pull one property from every element.

# Print the name of every user in a list
curl -s https://api.example.com/users | jq '.[].name'

# Index a specific element (zero-based)
curl -s https://api.example.com/users | jq '.[0]'

# Slice a range of elements
curl -s https://api.example.com/users | jq '.[2:5]'

The difference between .[] and .[0] trips people up at first. The bracket-only form is an iterator that streams every element, while an index returns one. When you want the whole array transformed rather than streamed, wrap the expression in brackets to collect results back into an array, which the next section covers.
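A side-by-side run makes the distinction concrete. This is a small sketch against an inline list; the users data is invented for the example.

```shell
# Hypothetical list standing in for the /users response.
users='[{"name":"Ada"},{"name":"Grace"},{"name":"Linus"}]'

# Iterator: three separate outputs, one per line.
streamed=$(printf '%s' "$users" | jq -r '.[].name')

# Index: exactly one element.
first=$(printf '%s' "$users" | jq -c '.[0]')

# Slice: a sub-array (end index is exclusive).
tail_names=$(printf '%s' "$users" | jq -c '.[1:3] | map(.name)')

# Brackets around the iterator collect the stream back into one array.
collected=$(printf '%s' "$users" | jq -c '[.[].name]')

echo "$streamed"     # Ada, Grace, Linus on three lines
echo "$first"        # {"name":"Ada"}
echo "$tail_names"   # ["Grace","Linus"]
echo "$collected"    # ["Ada","Grace","Linus"]
```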

Reshape objects with map and object construction

Pulling fields is half the job; the other half is building a new structure. Object construction with {} lets you select and rename fields, and map() applies a transform to every element of an array. Together they turn a verbose API payload into exactly the shape your next step needs.

# Keep only id and name from each user, renaming name to label
curl -s https://api.example.com/users \
  | jq 'map({ id: .id, label: .name })'

This is where jq replaces a small script. The map({ ... }) expression iterates the array, and for each element builds a new object with just the two fields you asked for. Shorthand exists too: { id, name } keeps both keys with their original names (bare key names, no leading dot), so map({ id, name }) is the quickest way to drop everything except a known subset of fields.
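Here is a short sketch of both forms on made-up data. As far as I know from the jq manual, the shorthand is written with bare key names, so { id, name } means the same as { id: .id, name: .name }.

```shell
# Hypothetical list standing in for the /users response.
users='[{"id":1,"name":"Ada","email":"ada@example.com"},{"id":2,"name":"Grace","email":"grace@example.com"}]'

# Explicit construction: select two fields and rename one.
renamed=$(printf '%s' "$users" | jq -c 'map({ id: .id, label: .name })')

# Shorthand: keep two fields under their original names.
subset=$(printf '%s' "$users" | jq -c 'map({ id, name })')

echo "$renamed"   # [{"id":1,"label":"Ada"},{"id":2,"label":"Grace"}]
echo "$subset"    # [{"id":1,"name":"Ada"},{"id":2,"name":"Grace"}]
```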

Filter arrays with select

Extracting matching elements is what select() does. It takes a boolean expression and passes through only the elements where it's true, so select(.active) keeps the active users and drops the rest. Pair it with the .[] iterator to filter a list, then collect the survivors back into an array with brackets.

# Keep only active users
curl -s https://api.example.com/users | jq '[.[] | select(.active)]'

# Filter on a numeric comparison
curl -s https://api.example.com/users | jq '[.[] | select(.age >= 18)]'

# Match a string field
curl -s https://api.example.com/orders | jq '[.[] | select(.status == "shipped")]'

The pattern [.[] | select(...)] reads as "iterate, keep matches, collect" and it's the workhorse for querying API data. The outer brackets matter: without them you get a stream of matching objects, which is fine for piping onward but not valid JSON as a whole. With them, the output is a proper array you can save to a file or feed to another tool.
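The stream-versus-array difference is easy to check locally. The sketch below uses an invented users list and projects each match down to its name to keep the output short.

```shell
# Hypothetical list standing in for the /users response.
users='[{"name":"Ada","active":true,"age":36},{"name":"Bob","active":false,"age":17},{"name":"Cy","active":true,"age":15}]'

# No outer brackets: a stream, one result per line.
stream=$(printf '%s' "$users" | jq -c '.[] | select(.active) | .name')

# Outer brackets: a single JSON array.
array=$(printf '%s' "$users" | jq -c '[.[] | select(.active) | .name]')

# map(f) is another spelling of [.[] | f], so this also yields an array.
adults=$(printf '%s' "$users" | jq -c 'map(select(.age >= 18) | .name)')

echo "$stream"   # "Ada" and "Cy" on two lines
echo "$array"    # ["Ada","Cy"]
echo "$adults"   # ["Ada"]
```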

Handle missing keys and mixed types safely

Real API data is inconsistent. A field is present on some objects and absent on others, or it's sometimes a string and sometimes null. A missing key on an object simply yields null, but jq errors out when you apply a key lookup to something that isn't an object or null, such as a string or a number. The fix is the ? operator, which suppresses that error and yields nothing instead. The // alternative operator supplies a default when the left side is null, false, or absent.

# Suppress errors when .address is not an object on some records
curl -s https://api.example.com/users | jq '[.[] | .address.city?]'

# Provide a default for null or missing values
curl -s https://api.example.com/users | jq '[.[] | .nickname // "unknown"]'

These two operators are what make a jq filter survive real data instead of breaking on the first record that doesn't match your assumptions. Reach for ? when a value along the path might not be an object, and // when you want a fallback for a null or missing value. A filter built with both runs cleanly across a list where every record is shaped slightly differently.
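The sketch below exercises both operators on deliberately messy, invented records: one with a proper address object, one where address is a plain string, and one with no address and a null nickname.

```shell
# Hypothetical records with inconsistent shapes.
users='[{"name":"Ada","address":{"city":"London"},"nickname":"ada"},{"name":"Bob","address":"unknown street"},{"name":"Cy","nickname":null}]'

# Without ?, indexing Bob's string address aborts the whole filter.
if ! printf '%s' "$users" | jq -c '[.[] | .address.city]' >/dev/null 2>&1; then
  strict_failed=yes
fi

# With ?, the failing record is skipped; Cy's missing address yields null.
cities=$(printf '%s' "$users" | jq -c '[.[] | .address.city?]')

# // fills in a default for missing and null values alike.
nicknames=$(printf '%s' "$users" | jq -c '[.[] | .nickname // "unknown"]')

# // also replaces false, which matters for boolean fields.
flag=$(jq -n -r 'false // "fallback"')

echo "$strict_failed"   # yes
echo "$cities"          # ["London",null]
echo "$nicknames"       # ["ada","unknown","unknown"]
echo "$flag"            # fallback
```

Note that ? drops Bob's record from the output entirely, so the result has two entries for three users. If positions need to line up, combining the two as (.address.city? // null) keeps one entry per record.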

Count, sum, and group with built-in functions

Beyond filtering, jq has aggregation built in. length returns the size of an array or string, add sums a list of numbers, and group_by() clusters elements by a key. These turn a raw list into a summary without leaving the terminal, which is exactly what you want when you're eyeballing the shape of a dataset.

# Count the elements in an array
curl -s https://api.example.com/users | jq 'length'

# Sum a numeric field across all elements
curl -s https://api.example.com/orders | jq '[.[].total] | add'

# Group orders by status and count each group
curl -s https://api.example.com/orders \
  | jq 'group_by(.status) | map({ status: .[0].status, count: length })'

The grouping example is the kind of thing you'd normally open a notebook for. group_by(.status) returns an array of arrays, one per distinct status, and the map() collapses each group into a small summary object. It's a complete aggregation pipeline in two functions.
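To make the shape of the result visible, here is the same pipeline run against a small invented orders list. The per-group total field is my own addition to the filter above, included as an illustration of combining add with group_by.

```shell
# Hypothetical list standing in for the /orders response.
orders='[{"status":"shipped","total":20},{"status":"pending","total":5},{"status":"shipped","total":12.5}]'

count=$(printf '%s' "$orders" | jq 'length')
sum=$(printf '%s' "$orders" | jq '[.[].total] | add')

# group_by sorts groups by key, so "pending" comes before "shipped".
summary=$(printf '%s' "$orders" | jq -c '
  group_by(.status)
  | map({ status: .[0].status, count: length, total: (map(.total) | add) })
')

echo "$count"     # 3
echo "$sum"       # 37.5
echo "$summary"   # [{"status":"pending","count":1,"total":5},{"status":"shipped","count":2,"total":32.5}]
```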

Convert JSON to CSV and back

jq bridges JSON and tabular formats, which matters the moment someone asks for a spreadsheet. The @csv and @tsv format strings turn an array of values into a delimited row, and you build the rows by mapping each object to an array of its fields. The -r flag is required here so the output isn't wrapped in JSON string quotes.

# Turn a JSON array into CSV rows
curl -s https://api.example.com/users \
  | jq -r '.[] | [.id, .name, .email] | @csv'

For a header row, prepend it before the data: ["id","name","email"], (.[] | [...]). The result is a clean CSV you can redirect into a file and open anywhere. Going the other direction, jq reads line-delimited input with the -R flag to treat each line as a raw string, which is how you parse log files that mix plain text and JSON.
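Spelled out in full, the header pattern looks like the sketch below, run against invented data. The second half shows one naive way to go from delimited lines back to JSON with -R; it assumes simple unquoted fields and would not survive commas inside quoted values.

```shell
# Hypothetical list standing in for the /users response.
users='[{"id":1,"name":"Ada","email":"ada@example.com"},{"id":2,"name":"Grace, Adm.","email":"grace@example.com"}]'

# Header array first, then one array per user; each is formatted by @csv.
csv=$(printf '%s' "$users" \
  | jq -r '["id","name","email"], (.[] | [.id, .name, .email]) | @csv')

echo "$csv"
# "id","name","email"
# 1,"Ada","ada@example.com"
# 2,"Grace, Adm.","grace@example.com"

# Other direction: -R hands each input line to the filter as a raw string.
rows=$(printf '1,Ada\n2,Grace\n' \
  | jq -R -c 'split(",") | { id: (.[0] | tonumber), name: .[1] }')

echo "$rows"
# {"id":1,"name":"Ada"}
# {"id":2,"name":"Grace"}
```

@csv quotes every string, which is what keeps the comma in the second name from splitting the column.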

Process newline-delimited JSON from logs

Structured logs are often NDJSON, one JSON object per line, and jq reads them natively. It treats its input as a stream of JSON values and runs the filter once per value, so a log file needs no special flag and no .[] iterator: the filter sees one log entry at a time. The -c flag keeps output compact, one object per line, which is what you want when filtering a log file down to the events you care about.

# Filter a JSON log file to error-level events, compact output
jq -c 'select(.level == "error")' app.log

# Extract two fields from each log line as a table
jq -r '[.timestamp, .message] | @tsv' app.log

This is where jq earns its place in an incident response. A multi-gigabyte NDJSON log becomes searchable by field instead of by regex, so "every error in the last deploy with a specific request ID" is one select() away. Combined with grep to pre-filter the file, it scales to logs far larger than anything you'd load into an editor.
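Here is a self-contained sketch of that workflow using a temporary file with three invented log lines. The field names (timestamp, level, message, request_id) are assumptions about the log format, not something jq requires. No extra flag is needed for jq to run the filter once per line.

```shell
# Write a tiny hypothetical NDJSON log to a temp file.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
{"timestamp":"2024-06-22T10:00:00Z","level":"info","message":"started","request_id":"a1"}
{"timestamp":"2024-06-22T10:00:01Z","level":"error","message":"db timeout","request_id":"b2"}
{"timestamp":"2024-06-22T10:00:02Z","level":"error","message":"retry failed","request_id":"b2"}
EOF

# The filter runs once per line; select drops everything but errors.
errors=$(jq -c 'select(.level == "error") | { timestamp, message }' "$log_file")

# Same selection as tab-separated columns.
table=$(jq -r 'select(.level == "error") | [.timestamp, .message] | @tsv' "$log_file")

rm -f "$log_file"

echo "$errors"
# {"timestamp":"2024-06-22T10:00:01Z","message":"db timeout"}
# {"timestamp":"2024-06-22T10:00:02Z","message":"retry failed"}
echo "$table"
```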

Pass shell values into a filter with --arg

jq filters are sandboxed from the shell, which is a feature rather than a limitation. To use a shell value inside a filter you bind it as data with --arg name value for strings or --argjson name value for parsed JSON, then reference it as $name. String-interpolating a value into the filter text invites quoting bugs and injection; binding it keeps the filter and the data separate.

# Filter by a status held in a shell variable
STATUS=shipped
curl -s https://api.example.com/orders \
  | jq --arg s "$STATUS" '[.[] | select(.status == $s)]'

# Pass a number with --argjson so it stays numeric, not a string
curl -s https://api.example.com/users \
  | jq --argjson min 18 '[.[] | select(.age >= $min)]'

The distinction matters: --arg always binds a string, and jq does not coerce types, so comparing it against a number never matches and raises no error. --argjson parses the value first and keeps it a number, boolean, or object. Use --arg for text and --argjson whenever the type has to survive.
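The silent mismatch is easy to reproduce. In this sketch, on invented data, the same filter is run three ways: with the threshold bound as a string, bound as JSON, and bound as a string but converted with tonumber inside the filter.

```shell
# Hypothetical list standing in for the /users response.
users='[{"name":"Ada","age":36},{"name":"Bob","age":17}]'

# --arg binds the string "18"; in jq's ordering every number sorts
# before every string, so the comparison is false for all records.
as_string=$(printf '%s' "$users" \
  | jq -c --arg min 18 '[.[] | select(.age >= $min) | .name]')

# --argjson binds the number 18, so the comparison is numeric.
as_number=$(printf '%s' "$users" \
  | jq -c --argjson min 18 '[.[] | select(.age >= $min) | .name]')

# Alternative: keep --arg and convert explicitly inside the filter.
converted=$(printf '%s' "$users" \
  | jq -c --arg min 18 '[.[] | select(.age >= ($min | tonumber)) | .name]')

echo "$as_string"   # []
echo "$as_number"   # ["Ada"]
echo "$converted"   # ["Ada"]
```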

Sort and deduplicate results

Turning an unordered API list into a clean, sorted set is two functions. sort_by() orders an array by a key, ascending and stable, and unique removes duplicate values. Chain reverse after a sort for descending order, which is how you get "newest first" or "highest value at the top" out of a raw list.

# Sort users by age, oldest first
curl -s https://api.example.com/users | jq 'sort_by(.age) | reverse'

# Build a unique, sorted list of every status seen across orders
curl -s https://api.example.com/orders | jq '[.[].status] | unique'

unique sorts as a side effect and collapses duplicates, so a list of 500 orders with five distinct statuses becomes a five-element array. For deduplicating by a specific field rather than the whole value, unique_by(.status) keeps one element per status.
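A quick sketch of all three on an invented orders list. For unique_by I only print the statuses of the surviving elements, since which element is kept for each status is not something I want to promise here.

```shell
# Hypothetical list standing in for the /orders response.
orders='[{"id":1,"status":"shipped","total":20},{"id":2,"status":"pending","total":5},{"id":3,"status":"shipped","total":12}]'

# Ascending sort by total, then reversed for highest first.
by_total_desc=$(printf '%s' "$orders" | jq -c 'sort_by(.total) | reverse | map(.id)')

# Distinct statuses; unique returns them sorted.
statuses=$(printf '%s' "$orders" | jq -c '[.[].status] | unique')

# One whole order object per status.
per_status=$(printf '%s' "$orders" | jq -c 'unique_by(.status) | map(.status)')

echo "$by_total_desc"   # [1,3,2]
echo "$statuses"        # ["pending","shipped"]
echo "$per_status"      # ["pending","shipped"]
```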

Things worth keeping in mind

A few habits make jq filters more reliable once you move past one-liners into scripts and pipelines.

Always reach for -r when feeding shell variables. Quoted output is the most common reason a jq value breaks a downstream command.
Wrap streaming filters in [ ] when you need valid JSON. A bare .[] | select(...) emits a stream, which isn't a single document.
Use ? and // defensively on real data. Assume some records are missing fields, because at scale they always are.
Test filters on a small slice first. Pipe through jq '.[0:3]' while you build the expression, then drop it once the shape is right.
Single-quote your filters in bash. jq syntax uses characters the shell wants to expand, so single quotes keep the filter intact.
Chain two jq calls when a pipeline gets dense. Two simple filters piped together are often clearer than one long expression, and the intermediate JSON is easy to inspect when something goes wrong.

Wrapping up

jq pays for itself the first time you skip writing a script to read a JSON file. The core moves are small: . to inspect, .field to extract, .[] to iterate, select() to filter, and map({}) to reshape. Add -r for clean output and the ? and // operators for messy data, and you can answer almost any question about a JSON payload from the command line.

Where to go next:

The official jq manual — every filter and function with examples
jq play — a browser playground for testing filters against your own JSON
jq on GitHub — releases, issues, and the changelog for 1.7

DEV Community