Sergey Dobrov

Rediscovering Unix Pipelines: Two Backup Problems, One Mindset

Modern engineers often reach for JSON parsing, temporary files, or orchestration tools, yet Unix pipelines still outperform those approaches more often than you'd expect. Two recent backup tasks reminded me how often we over-engineer simple data-movement problems.

Two backup problems, same pattern: in both cases the first instinct was a complex solution involving temporary files, JSON parsing, and extra dependencies. Both were solved with a simple pipeline instead.

Problem 1: Pruning Old Backups

I needed to keep the latest seven backups in object storage and delete everything older.

First instinct: treat it like an application problem.

mc ls --recursive --json bucket/ \
  | jq -r '. | "\(.lastModified) \(.key)"' \
  | sort \
  | … # extract keys, build array, loop, delete

Parse JSON, build arrays, extract timestamps, loop through objects.
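Fleshed out, that instinct turns into something like this hypothetical sketch (not code I actually shipped; it assumes mc ls reports a lastModified field and that object keys contain no newlines):

# Hypothetical "application problem" version: slurp the JSON, sort it in jq,
# load the keys into a bash array, then loop and delete all but the newest 7.
mapfile -t keys < <(mc ls --recursive --json bucket/ \
  | jq -rs 'sort_by(.lastModified) | .[].key')

for ((i = 0; i < ${#keys[@]} - 7; i++)); do
  mc rm "bucket/${keys[$i]}"   # assumes keys are relative to bucket/
done

It works, but it drags in jq filters, bash arrays, and index arithmetic for a job that is really just "drop the oldest lines."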

But backups already encode timestamps in filenames. mc find already prints one file per line. sort already orders strings. head already drops lines.
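The first point is what makes plain sort enough: with zero-padded dates in the name (the backup-$(date +%Y%m%d) convention used later in this post), lexicographic order is chronological order. The filenames below are made up, but they show the idea:

# Zero-padded dates make lexicographic order match chronological order
printf 'backup-20240103.gz\nbackup-20231230.gz\nbackup-20240101.gz\n' | sort
# backup-20231230.gz
# backup-20240101.gz
# backup-20240103.gz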

The whole task collapses:

mc find bucket/ --name "backup-*.gz" \
  | sort \
  | head -n -7 \
  | while read -r file; do mc rm "$file"; done

No JSON. No arrays. No parsing. Just text flowing through composable tools.
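One portability caveat: head -n -7 ("everything except the last 7 lines") is a GNU coreutils extension, and the BSD head that ships with macOS rejects a negative count. If that matters, reversing the sort and skipping the first seven lines gives the same result:

# Portable variant: newest first, then skip the newest 7 and delete the rest
mc find bucket/ --name "backup-*.gz" \
  | sort -r \
  | tail -n +8 \
  | while read -r file; do mc rm "$file"; done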

Problem 2: Creating Backups

A teammate needed to dump PostgreSQL, compress it, and upload to object storage.

His approach (roughly):

First, upload a script to the server:

scp backup.sh db-server:/tmp/

Then run it:

ssh db-server /tmp/backup.sh

The script itself:

#!/bin/bash
pg_dump mydb > /tmp/backup.sql
gzip /tmp/backup.sql
mc cp /tmp/backup.sql.gz s3://bucket/backup-$(date +%Y%m%d).sql.gz
rm /tmp/backup.sql.gz

This requires:

  • Uploading the script first
  • Installing mc on the database server
  • Managing temporary files
  • Cleanup logic
  • Disk space for both uncompressed and compressed data

Each of these steps adds friction, state, and failure modes.
But none of them are actually required:

ssh db-server "pg_dump mydb | gzip" \
  | mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz

From two commands plus a bash script to one line. From five requirements to zero.
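If this runs unattended (from cron, say), it's worth making failures loud: by default the pipeline's exit status reflects only mc pipe, so a failed pg_dump could quietly produce a truncated backup. A minimal hardening sketch, assuming bash on both ends:

#!/bin/bash
# Abort and report if any stage fails, on either side of the ssh hop
set -euo pipefail
ssh db-server "set -o pipefail; pg_dump mydb | gzip" \
  | mc pipe s3://bucket/backup-$(date +%Y%m%d).sql.gz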

Bonus: Free Parallelism

That backup pipeline runs three processes simultaneously: pg_dump generating data and gzip compressing it on the database server, and mc uploading the stream from the local machine. No threading code. No coordination.

The temporary-file version? Strictly sequential. Each step waits for the previous to finish.

Pipes give you streaming parallelism by default.

Need more? xargs and GNU parallel scale across cores:

# Compress all the .log files, four at a time
find . -name "*.log" -print0 | xargs -0 -P 4 -I {} gzip {}

Same pipeline thinking, multiplied across CPUs.
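GNU parallel gives the same fan-out with slightly nicer syntax, if it's installed; a minimal equivalent of the xargs line above:

# Same idea with GNU parallel; -j 4 caps it at four concurrent gzip processes
find . -name "*.log" | parallel -j 4 gzip {}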

When NOT to Use Pipes

Pipelines aren't always the answer:

  • Complex state management - multiple passes over data, tracking relationships
  • Explicit error handling - pipelines can fail silently (concrete example after this list)
  • Extreme scale - multi-terabyte jobs that must be distributed across machines are better served by Spark/Hadoop
  • Bash quirkiness - arcane quoting, clunky error handling
  • Framework requirements - ETL tools gain orchestration but lose composability
  • Team familiarity - if pipelines are cryptic to your team, write Python
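The fail-silently bullet deserves a concrete example: a pipeline's exit status is the last command's, so an upstream failure vanishes unless you opt in to bash's pipefail (as in the hardened one-liner earlier):

false | gzip > /dev/null; echo $?   # 0 - the failure of false is invisible
set -o pipefail
false | gzip > /dev/null; echo $?   # 1 - now the pipeline reports it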

Unix pipes work best for glue code: moving data between systems, transforming formats, filtering streams.

The Pattern to Look For

Whenever you find yourself:

  • Loading data into arrays
  • Writing temporary files
  • Parsing structured data just to transform it
  • Installing tools on systems just to move data

…there's probably a pipeline waiting to be discovered.

Not because pipelines are "better." Because they're simpler. Simple solutions are easier to debug, modify, and maintain.

Conclusion

Fifty years later, Unix pipelines still win through elimination.

They eliminate temporary state. They eliminate dependencies. They eliminate the complexity of treating data movement as a programming exercise.

Your turn: What's a problem you recently solved with a pipeline instead of code? Or a time you wrote code when a pipeline would have worked?
