Modern engineers often reach for JSON parsing, temporary files, or orchestration tools; Unix pipelines still beat those approaches more often than you'd expect. Two recent backup tasks reminded me how often we over-engineer simple data-movement problems.
Both followed the same pattern: the first attempt involved temporary files, JSON parsing, and extra dependencies, and both collapsed into simple pipelines.
Problem 1: Pruning Old Backups
I needed to keep the latest seven backups in object storage and delete everything older.
First instinct: treat it like an application problem.
mc ls --recursive --json bucket/ \
| jq -r '. | "\(.lastModified) \(.key)"' \
| sort \
| … # extract keys, build array, loop, delete
Parse JSON, build arrays, extract timestamps, loop through objects.
But backups already encode timestamps in filenames. mc find already prints one file per line. sort already orders strings. head already drops lines.
The whole task collapses:
mc find bucket/ --name "backup-*.gz" \
| sort \
| head -n -7 \
| while read -r file; do mc rm "$file"; done
No JSON. No arrays. No parsing. Just text flowing through composable tools.
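Before wiring in the mc rm step, the same pipeline minus the delete makes a handy dry run, so you can see exactly which objects would go:
# Preview only: list everything except the newest seven, delete nothing
mc find bucket/ --name "backup-*.gz" \
| sort \
| head -n -7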
Problem 2: Creating Backups
A teammate needed to dump PostgreSQL, compress it, and upload to object storage.
His approach (roughly):
First, upload a script to the server:
scp backup.sh db-server:/tmp/
Then run it:
ssh db-server /tmp/backup.sh
The script itself:
#!/bin/bash
pg_dump mydb > /tmp/backup.sql    # dump to a temp file on the server's disk
gzip /tmp/backup.sql              # compress in place, producing /tmp/backup.sql.gz
mc cp /tmp/backup.sql.gz s3://bucket/backup-$(date +%Y%m%d).sql.gz    # upload
rm /tmp/backup.sql.gz             # clean up
This requires:
- Uploading the script first
- Installing mc on the database server
- Managing temporary files
- Cleanup logic
- Disk space for both uncompressed and compressed data
Each of these steps adds friction, state, and failure modes.
But none of them are actually required:
ssh db-server "pg_dump mydb | gzip" \
| mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz
From two commands plus a bash script to one line. From five requirements to zero.
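If the backup should run on a schedule, the same one-liner can sit directly in a crontab. A sketch with a hypothetical nightly schedule; note that cron treats % as special, so it has to be escaped:
# Hypothetical crontab entry: nightly at 02:00 (cron requires % to be written as \%)
0 2 * * * ssh db-server "pg_dump mydb | gzip" | mc pipe s3://bucket/backup-$(date +\%Y\%m\%d).sql.gz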
Bonus: Free Parallelism
That backup pipeline runs three processes simultaneously: pg_dump generating data, gzip compressing it, mc uploading it. No threading code. No coordination.
The temporary-file version? Strictly sequential. Each step waits for the previous to finish.
Pipes give you streaming parallelism by default.
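You can watch that streaming happen. A small sketch, assuming pv is installed on the machine running the pipeline: drop it between the stages and it reports throughput while the dump is still in progress:
# pv passes data through unchanged and prints the transfer rate to stderr,
# showing that bytes reach the upload long before pg_dump finishes
ssh db-server "pg_dump mydb | gzip" \
| pv \
| mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz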
Need more? xargs and GNU parallel scale across cores:
# Compress 100 log files with 4 workers (null-delimited so filenames with spaces are safe)
find . -name "*.log" -print0 | xargs -0 -n 1 -P 4 gzip
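The GNU parallel version, assuming it is installed, reads almost identically:
# GNU parallel equivalent: 4 jobs at a time, null-delimited input
find . -name "*.log" -print0 | parallel -0 -j 4 gzip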
Same pipeline thinking, multiplied across CPUs.
When NOT to Use Pipes
Pipelines aren't always the answer:
- Complex state management - multiple passes over data, tracking relationships
- Explicit error handling - pipelines can fail silently (though see the pipefail sketch after this list)
- Extreme scale - terabytes need Spark/Hadoop
- Bash quirkiness - arcane quoting, clunky error handling
- Framework requirements - ETL tools gain orchestration but lose composability
- Team familiarity - if pipelines are cryptic to your team, write Python
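On the error-handling point: bash won't fix silent failures entirely, but it can at least be told to surface them. A minimal sketch, assuming the backup one-liner lives in a bash script and the remote shell is also bash:
#!/bin/bash
# -e exits on error, -u flags unset variables, and pipefail makes the local
# pipeline fail if any stage (ssh or mc pipe) fails, not just the last one.
set -euo pipefail

# pipefail on the remote side too, so a failed pg_dump propagates through ssh's exit status
ssh db-server "set -o pipefail; pg_dump mydb | gzip" \
| mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz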
Unix pipes work best for glue code: moving data between systems, transforming formats, filtering streams.
The Pattern to Look For
Whenever you find yourself:
- Loading data into arrays
- Writing temporary files
- Parsing structured data just to transform it
- Installing tools on systems just to move data
…there's probably a pipeline waiting to be discovered.
Not because pipelines are "better." Because they're simpler. Simple solutions are easier to debug, modify, and maintain.
Conclusion
Fifty years later, Unix pipelines still win through elimination.
They eliminate temporary state. They eliminate dependencies. They eliminate the complexity of treating data movement as a programming exercise.
Your turn: What's a problem you recently solved with a pipeline instead of code? Or where you wrote code when a pipeline would have worked?