Mike Whitaker

Shell command options you didn't know you needed #6

Been a while, but here's a handy one I discovered over the weekend.

Back to the trusty xargs, that rather blunt and brutish chainsaw for processing a long list of files or whatever that someone gave you.

In this case, I had a list of files that I knew with 99%+ certainty had been created on our old server and were thus encoded in iso-8859-1. They contained characters that are represented differently in utf-8 (which we had switched to on our new server), so they needed converting. I also had a handy script wrapper around iconv that does one file at a time.

All 41,000 of them. The list took four hours to generate, during which time I pondered the fact that the new server usefully has 40 cores of Xeon goodness, and that I really should have been taking advantage of them. So we ought to be able to process this list in parallel now we've got it, right? And ideally without bothering with GNU Parallel or Perl's Parallel::ForkManager?

Turns out we can!

xargs -P <n> (if supported on your OS) runs the commands generated by xargs in parallel, up to n at a time.

So:

cat <list of 41K files> | xargs -n 1 -P 100 <iconv wrapper>

We need the -n 1 as the wrapper only takes one file at a time, and this is how we tell xargs that. Deep breath. Hit RETURN.

Whoosh. Load on server briefly rockets to 45, then falls just as fast to its steady 1 and a bit. In about one minute flat, for all 41,000 files.

Not bad.
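
If you want to try the idea on a small scale first, here is a minimal sketch. It assumes iconv is installed and that your xargs supports -P. The convert.sh script is a hypothetical stand-in for my wrapper (the real one isn't shown here), and the three sample files and filelist.txt are made up for the demo.

```shell
#!/bin/sh
# Sketch only: convert.sh, the sample files and filelist.txt are
# hypothetical stand-ins, not the real wrapper or the real 41K list.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-file-at-a-time wrapper: converts a file in place
# from ISO-8859-1 to UTF-8 via a temporary file.
cat > convert.sh <<'EOF'
#!/bin/sh
set -eu
iconv -f ISO-8859-1 -t UTF-8 "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF
chmod +x convert.sh

# Three sample files containing byte 0xE9 (e-acute in ISO-8859-1).
for i in 1 2 3; do
    printf 'caf\351\n' > "file$i.txt"
done
printf '%s\n' file1.txt file2.txt file3.txt > filelist.txt

# -n 1: pass one filename per invocation of the wrapper.
# -P 4: run up to four invocations at the same time.
xargs -n 1 -P 4 ./convert.sh < filelist.txt

# Count the files that now hold the UTF-8 encoding (0xC3 0xA9).
converted=0
for f in file1.txt file2.txt file3.txt; do
    if printf 'caf\303\251\n' | cmp -s - "$f"; then
        converted=$((converted + 1))
    fi
done
echo "converted: $converted"
```

Note that plain xargs splits its input on whitespace, so a list containing filenames with spaces would need extra care. A sensible value for -P probably depends on your core count and on how I/O-bound the wrapper is.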
