Sergey Dobrov

Rediscovering Unix Pipelines: Two Backup Problems, One Mindset

Modern engineers often reach for JSON parsing, temporary files, or orchestration tools, yet Unix pipelines still outperform those approaches more often than you'd expect. Two recent backup tasks reminded me how often we over-engineer simple data-movement problems.

Two backup problems, same pattern: in both cases the first instinct was a complex solution involving temporary files, JSON parsing, and extra dependencies. Both were solved with a simple pipeline instead.

Problem 1: Pruning Old Backups

I needed to keep the latest seven backups in object storage and delete everything older.

First instinct: treat it like an application problem.

mc ls --recursive --json bucket/ \
  | jq -r '. | "\(.lastModified) \(.key)"' \
  | sort \
  | … # extract keys, build array, loop, delete

Parse JSON, build arrays, extract timestamps, loop through objects.
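Fleshed out, that instinct turns into something like this hypothetical sketch (not code I actually shipped; it assumes mc ls reports a lastModified field and that object keys contain no newlines):

# Hypothetical "application problem" version: slurp the JSON, sort it in jq,
# load the keys into a bash array, then loop and delete all but the newest 7.
mapfile -t keys < <(mc ls --recursive --json bucket/ \
  | jq -rs 'sort_by(.lastModified) | .[].key')

for ((i = 0; i < ${#keys[@]} - 7; i++)); do
  mc rm "bucket/${keys[$i]}"   # assumes keys are relative to bucket/
done

It works, but it drags in jq filters, bash arrays, and index arithmetic for a job that is really just "drop the oldest lines."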

But backups already encode timestamps in filenames. mc find already prints one file per line. sort already orders strings. head already drops lines.
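The first point is what makes plain sort enough: with zero-padded dates in the name (the backup-$(date +%Y%m%d) convention used later in this post), lexicographic order is chronological order. The filenames below are made up, but they show the idea:

# Zero-padded dates make lexicographic order match chronological order
printf 'backup-20240103.gz\nbackup-20231230.gz\nbackup-20240101.gz\n' | sort
# backup-20231230.gz
# backup-20240101.gz
# backup-20240103.gz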

The whole task collapses:

mc find bucket/ --name "backup-*.gz" \
  | sort \
  | head -n -7 \
  | while read -r file; do mc rm "$file"; done

No JSON. No arrays. No parsing. Just text flowing through composable tools.
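One portability caveat: head -n -7 ("everything except the last 7 lines") is a GNU coreutils extension, and the BSD head that ships with macOS rejects a negative count. If that matters, reversing the sort and skipping the first seven lines gives the same result:

# Portable variant: newest first, then skip the newest 7 and delete the rest
mc find bucket/ --name "backup-*.gz" \
  | sort -r \
  | tail -n +8 \
  | while read -r file; do mc rm "$file"; done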

Problem 2: Creating Backups

A teammate needed to dump PostgreSQL, compress it, and upload to object storage.

His approach (roughly):

First, upload a script to the server:

scp backup.sh db-server:/tmp/

Then run it:

ssh db-server /tmp/backup.sh

The script itself:

#!/bin/bash
pg_dump mydb > /tmp/backup.sql
gzip /tmp/backup.sql
mc cp /tmp/backup.sql.gz s3://bucket/backup-$(date +%Y%m%d).sql.gz
rm /tmp/backup.sql.gz

This requires:

  • Uploading the script first
  • Installing mc on the database server
  • Managing temporary files
  • Cleanup logic
  • Disk space for both uncompressed and compressed data

Each of these steps adds friction, state, and failure modes.
But none of them are actually required:

ssh db-server "pg_dump mydb | gzip" \
  | mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz

From two commands plus a bash script to one line. From five requirements to zero.
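If this runs unattended (from cron, say), it's worth making failures loud: by default the pipeline's exit status reflects only mc pipe, so a failed pg_dump could quietly produce a truncated backup. A minimal hardening sketch, assuming bash on both ends:

#!/bin/bash
# Abort and report if any stage fails, on either side of the ssh hop
set -euo pipefail
ssh db-server "set -o pipefail; pg_dump mydb | gzip" \
  | mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz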

Bonus: Free Parallelism

That backup pipeline runs three processes simultaneously: pg_dump generating data and gzip compressing it on the database server, and mc uploading the stream from the local machine. No threading code. No coordination.

The temporary-file version? Strictly sequential. Each step waits for the previous to finish.

Pipes give you streaming parallelism by default.

Need more? xargs and GNU parallel scale across cores:

# Compress all the .log files, four at a time
find . -name "*.log" -print0 | xargs -0 -P 4 -I {} gzip {}

Same pipeline thinking, multiplied across CPUs.
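GNU parallel gives the same fan-out with slightly nicer syntax, if it's installed; a minimal equivalent of the xargs line above:

# Same idea with GNU parallel; -j 4 caps it at four concurrent gzip processes
find . -name "*.log" | parallel -j 4 gzip {}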

When NOT to Use Pipes

Pipelines aren't always the answer:

  • Complex state management - multiple passes over data, tracking relationships
  • Explicit error handling - pipelines can fail silently (concrete example after this list)
  • Extreme scale - multi-terabyte jobs that must be distributed across machines are better served by Spark/Hadoop
  • Bash quirkiness - arcane quoting, clunky error handling
  • Framework requirements - ETL tools gain orchestration but lose composability
  • Team familiarity - if pipelines are cryptic to your team, write Python
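The fail-silently bullet deserves a concrete example: a pipeline's exit status is the last command's, so an upstream failure vanishes unless you opt in to bash's pipefail (as in the hardened one-liner earlier):

false | gzip > /dev/null; echo $?   # 0 - the failure of false is invisible
set -o pipefail
false | gzip > /dev/null; echo $?   # 1 - now the pipeline reports it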

Unix pipes work best for glue code: moving data between systems, transforming formats, filtering streams.

The Pattern to Look For

Whenever you find yourself:

  • Loading data into arrays
  • Writing temporary files
  • Parsing structured data just to transform it
  • Installing tools on systems just to move data

…there's probably a pipeline waiting to be discovered.

Not because pipelines are "better." Because they're simpler. Simple solutions are easier to debug, modify, and maintain.

Conclusion

Fifty years later, Unix pipelines still win through elimination.

They eliminate temporary state. They eliminate dependencies. They eliminate the complexity of treating data movement as a programming exercise.

Your turn: What's a problem you recently solved with a pipeline instead of code? Or a time you wrote code when a pipeline would have worked?
