DEV Community

Don Johnson

Originally published at dev.to

The Linux Commands You Forgot Exist (And Why AI Workflows Make Them Relevant Again)

These weren't in your bootcamp. They're not in most tutorials. They've been quietly available on every Linux box since before "AI workflow" was a phrase — and they're more useful now than they've ever been.

Try it yourself: clone linux-archaeology-lab, run bash setup.sh, and every command in this article has a working exercise waiting for you.


watch — monitor anything without a single line of code

watch runs a command on a repeating interval and fills your terminal with the refreshing output. That's it. No loop, no sleep, no script.

Why it's back: AI inference runs take time. watch -n1 nvidia-smi is the fastest way to see GPU memory climb and fall without touching the model process at all. watch -n2 'ls outputs/ | wc -l' tells you how far a batch job has gotten. One flag, zero instrumentation.


tee — two destinations, one stream

tee reads stdin and writes it to both stdout and a file simultaneously. Not sequentially — simultaneously, as data flows.

The pattern that comes up constantly in AI work:

agent-command 2>&1 | ts '[%H:%M:%S]' | tee run-$(date +%Y%m%d-%H%M%S).log

You see it live. It's in a timestamped log. Stderr is captured. One pipeline, three things handled.
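
A minimal sketch of the same pattern without ts, so it runs on a box that has no moreutils. agent_command here is a hypothetical stand-in for a real agent, and the log lands in a temporary directory rather than the working directory:

```shell
# Hypothetical stand-in for a real agent: writes to both stdout and stderr.
agent_command() {
  echo "step 1: loading model"
  echo "warning: cache miss" >&2
  echo "step 2: done"
}

log="$(mktemp -d)/run-$(date +%Y%m%d-%H%M%S).log"

# 2>&1 merges stderr into stdout before the pipe, so tee sees both streams.
agent_command 2>&1 | tee "$log"

# -a appends instead of truncating, handy when several steps share one log.
echo "run finished" | tee -a "$log" >/dev/null

lines=$(wc -l < "$log" | tr -d ' ')
echo "log has $lines lines"
```

One thing to keep in mind: the exit status of the pipeline is tee's, not the agent's, so a failing agent still looks like success unless the shell's pipefail option is enabled.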


pv — a progress bar for any pipeline

pv is a transparent pipe segment. Data passes through it unchanged; it prints throughput, elapsed time, and a progress bar to stderr.

You don't modify the commands on either side. You just insert pv into the middle:

cat data.jsonl | pv | python3 process.py

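A blinking cursor becomes a live readout of throughput and elapsed time. A percentage bar and an ETA need one more thing: pv has to know the total size, which it does when it reads the file itself (pv data.jsonl | python3 process.py) or when you pass the size with -s. For long inference batches (thousands of rows, slow API calls, large embeddings) pv turns a black box into something you can actually reason about.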
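
A sketch of the two ways to give pv a total, assuming a generated data.jsonl and a hypothetical consume function standing in for process.py. The block falls back to a plain read when pv is not installed:

```shell
workdir=$(mktemp -d)
# Generate 1000 hypothetical JSONL rows to push through the pipeline.
seq 1 1000 | sed 's/.*/{"id": &}/' > "$workdir/data.jsonl"

# Hypothetical consumer standing in for process.py: counts the rows it receives.
consume() { wc -l | tr -d ' '; }

if command -v pv >/dev/null 2>&1; then
  # pv opens the file itself, so it knows the size and can show % and ETA.
  rows=$(pv "$workdir/data.jsonl" | consume)
  # Mid-pipeline, pv cannot see the size; -l counts lines and -s supplies the total.
  rows_mid=$(cat "$workdir/data.jsonl" | pv -l -s 1000 | consume)
else
  echo "pv not found; reading without a progress display" >&2
  rows=$(consume < "$workdir/data.jsonl")
  rows_mid=$rows
fi
echo "processed $rows rows"
```

pv stays quiet when stderr is not a terminal, so the same line is safe to leave in a cron job or CI script.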


ts — timestamp every line of output

ts prepends a timestamp to every line it receives on stdin. Nothing else.

The power is in the relative mode:

agent-command | ts -i '%.s'

With -i, each line is prefixed with the time elapsed since the previous line, so you can see exactly where an agent spent 4 seconds between steps. (-s counts from the moment ts started instead.) No profiler. No code changes.
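
A small sketch of the timing modes, assuming moreutils ts is on the PATH. agent_steps is a hypothetical stand-in that pauses once so the gap is visible. In ts, -i reports time since the previous line and -s reports time since ts started:

```shell
# Hypothetical agent: three lines of output with a one-second pause in the middle.
agent_steps() {
  echo "plan"
  sleep 1
  echo "call tool"
  echo "answer"
}

if command -v ts >/dev/null 2>&1; then
  # Incremental: seconds since the previous line (the slow step stands out).
  agent_steps | ts -i '%.s'
  # Absolute wall-clock prefix, as used in the tee pipeline.
  out=$(agent_steps | ts '[%H:%M:%S]')
else
  echo "ts not found; install moreutils to see timestamps" >&2
  out=$(agent_steps)
fi

count=$(printf '%s\n' "$out" | wc -l | tr -d ' ')
echo "captured $count lines"
```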

ts is from moreutils. Install once on Debian or Ubuntu: sudo apt install moreutils.


sponge — safe in-place pipeline transforms

This command exists to solve one specific problem, and it solves it perfectly.

The problem: in sort file.txt > file.txt, the shell opens the output file for writing, and truncates it, before sort has read a single byte. sponge soaks up all of stdin first and only opens the output file once it sees EOF, so the input is still intact when the pipeline reads it.

sort file.txt | sponge file.txt        # safe
python3 -m json.tool cfg.json | sponge cfg.json   # safe
grep -v DEBUG app.log | sponge app.log            # safe

Also from moreutils.
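
To see the failure and the fix side by side, here is a sketch on throwaway files in a temporary directory. The file names are illustrative. When sponge is not installed it falls back to the usual temp-file-and-rename idiom:

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/file.txt"
cp "$workdir/file.txt" "$workdir/broken.txt"

# The shell truncates broken.txt before sort opens it, so sort reads nothing.
sort "$workdir/broken.txt" > "$workdir/broken.txt"
broken_size=$(wc -c < "$workdir/broken.txt" | tr -d ' ')

if command -v sponge >/dev/null 2>&1; then
  # sponge opens file.txt for writing only after sort has finished.
  sort "$workdir/file.txt" | sponge "$workdir/file.txt"
else
  # Portable fallback: write to a temp file, then rename over the original.
  sort "$workdir/file.txt" > "$workdir/file.txt.tmp" &&
    mv "$workdir/file.txt.tmp" "$workdir/file.txt"
fi

first=$(head -n 1 "$workdir/file.txt")
echo "broken.txt: $broken_size bytes; file.txt now starts with: $first"
```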


column — readable tables without Python

column formats delimited input into aligned columns. One flag for the delimiter, one flag for table mode.

Before:

model   provider    params  context_k
llama-3.1-8b    Meta    8B  128
mistral-7b  Mistral AI  7B  32

After column -t -s $'\t':

model          provider      params  context_k
llama-3.1-8b   Meta          8B      128
mistral-7b     Mistral AI    7B      32

For any command that emits structured text — tool call logs, benchmark results, model comparisons — column makes it scannable in one pipeline stage. No pandas. No formatting code.
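
A sketch that rebuilds the table above from scratch, assuming the input is tab-separated. The tab is produced with printf so the block does not depend on bash-only quoting. Passing the tab as the only separator is what keeps "Mistral AI" in one cell:

```shell
tsv=$(mktemp)
# Same rows as the example table, written with real tab characters.
printf 'model\tprovider\tparams\tcontext_k\n' > "$tsv"
printf 'llama-3.1-8b\tMeta\t8B\t128\n' >> "$tsv"
printf 'mistral-7b\tMistral AI\t7B\t32\n' >> "$tsv"

tab=$(printf '\t')

if command -v column >/dev/null 2>&1; then
  # -t builds a table; -s sets the input separator (default is any whitespace).
  table=$(column -t -s "$tab" "$tsv")
else
  echo "column not found; showing raw input" >&2
  table=$(cat "$tsv")
fi

printf '%s\n' "$table"
rows=$(printf '%s\n' "$table" | wc -l | tr -d ' ')
```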


comm — surgical set operations on text files

comm compares two sorted files and gives you three columns: lines only in file A, lines only in file B, lines in both. Suppress any column you don't need.

The comm -12 (intersection) and comm -23 (A minus B) patterns are the correct answer to "what's consistent across these two model runs?" and "what did run B drop that run A had?" — in one command, no Python, no diff | grep.

Process substitution makes it flexible:

comm -23 <(sort run-a.txt) <(sort run-b.txt)
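
A sketch of all three set operations on two tiny hypothetical run files. It uses sorted temp files instead of process substitution so it also runs under plain sh, and it pins LC_ALL=C because comm expects its input sorted in the same collation order that sort used:

```shell
export LC_ALL=C   # sort and comm must agree on ordering
workdir=$(mktemp -d)

# Hypothetical outputs of two model runs, one item per line, unsorted.
printf 'q3\nq1\nq2\n' > "$workdir/run-a.txt"
printf 'q2\nq4\nq1\n' > "$workdir/run-b.txt"

sort "$workdir/run-a.txt" > "$workdir/a.sorted"
sort "$workdir/run-b.txt" > "$workdir/b.sorted"

# -12 suppresses columns 1 and 2, leaving lines common to both runs.
both=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")
# -23 leaves lines only in run A; -13 leaves lines only in run B.
only_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")
only_b=$(comm -13 "$workdir/a.sorted" "$workdir/b.sorted")

both_count=$(printf '%s\n' "$both" | wc -l | tr -d ' ')
echo "shared: $both_count, only in A: $only_a, only in B: $only_b"
```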

tac — read any file from the bottom

tac is cat spelled backwards. It reverses line order.

The killer use case:

tac agent.log | grep -m1 'ERROR'

Find the most recent error in a log without reading the whole file. -m1 stops at the first match — which, in a reversed file, is the last occurrence. No tail, no awk, no Python.

Pair with head for newest-N-lines: tac logfile | head -20.
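
Both patterns on a throwaway log, as a quick sketch. The log lines are made up for illustration:

```shell
log=$(mktemp)
# Hypothetical agent log with two errors; the later one is the interesting one.
printf '%s\n' \
  'INFO start' \
  'ERROR timeout on step 2' \
  'INFO retry' \
  'ERROR rate limit on step 5' \
  'INFO done' > "$log"

# Reverse the file, then stop at the first match: that is the last error written.
last_error=$(tac "$log" | grep -m1 'ERROR')

# Newest two lines, most recent first.
newest=$(tac "$log" | head -n 2)

echo "last error: $last_error"
printf '%s\n' "$newest"
```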


vidir — batch rename in your text editor

vidir opens a directory listing in $EDITOR. You rename files by editing text. You delete files by deleting lines.

1   outputs/output-1.txt
2   outputs/output-2.txt
3   outputs/output-3.txt

Run :%s/output-/summary-/g, save, quit. All three files renamed. Your editor's full power — regex, macros, multicursor — applied to filesystem operations.

Replaces rename 's/pattern/replacement/' * (Perl regex you have to look up) and for f in *; do mv ...; done (quoting hell).

Also from moreutils.


parallel — concurrent tasks without threading code

GNU parallel is xargs -P with readable syntax, job control, retries, and output you can actually parse.

The batched inference pattern:

cat prompts.jsonl | parallel -j4 --pipe --block 10k inference-tool

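Up to four workers run at once, each receiving a roughly 10 KB block of JSONL split on line boundaries. No threading code. No async boilerplate. Add -k to keep output in input order, --tag to prefix each output line with the argument that produced it (most useful in argument mode), and --retries 3 to re-run failed jobs.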

For AI workloads — running the same prompt against multiple models, calling an embedding API for each document in a dataset, processing output files — parallel turns a sequential loop into concurrent execution in one command.
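
A sketch of the --pipe pattern with wc -l standing in for the hypothetical inference-tool, so the block-splitting can be checked by adding up what each worker received. The prompts file is generated. The guard looks for GNU parallel specifically, because moreutils ships a different program under the same name:

```shell
workdir=$(mktemp -d)
# 200 hypothetical prompts, one JSON object per line.
seq 1 200 | sed 's/.*/{"prompt": "question &"}/' > "$workdir/prompts.jsonl"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # --pipe splits stdin into ~1 KB blocks on newline boundaries;
  # each block is fed to one wc -l job, at most four jobs at a time.
  # -k keeps the per-block results in input order.
  total=$(parallel -j4 -k --pipe --block 1k wc -l < "$workdir/prompts.jsonl" |
    awk '{ sum += $1 } END { print sum }')
else
  echo "GNU parallel not found; counting sequentially" >&2
  total=$(wc -l < "$workdir/prompts.jsonl" | tr -d ' ')
fi

echo "records seen by workers: $total"
```

Because blocks are cut on line boundaries, no JSONL record is split across two workers, which is why the per-block counts add up to the full file.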


Load the reasoning skill into Claude Code

Knowing the commands is one thing. Knowing which one to reach for is another.

The lab repo ships .claude/skills/linux-archaeology.md, a Claude Code skill that maps natural-language descriptions to the right command. Describe your problem and it reasons through the answer:

"I need a progress bar for this pipeline" → pv
"How do I timestamp my agent logs?" → ts
"I want to rename a batch of files without writing a script" → vidir

Install in any project:

mkdir -p .claude/skills
curl -sL https://raw.githubusercontent.com/copyleftdev/linux-archaeology-lab/main/.claude/skills/linux-archaeology.md \
  > .claude/skills/linux-archaeology.md

The thread

watch, tee, pv, ts, sponge, column, comm, tac, vidir, parallel — none of these are new. They were built for the terminal long before AI workflows existed. But AI workflows surfaced the exact problems they solve: long-running processes with no visibility, streams that need to go two places, logs that need timestamps, files that need in-place transforms, tasks that need to run in parallel.

The tools were there. The problems caught up.

Run every command in this article against real data:
linux-archaeology-lab — clone it, bash setup.sh, open exercises/.

Which one did you not know about? Drop it in the comments.


Tags: linux productivity devtools ai bash


Sister article: The git Commands You Forgot Exist
