Technical Beauty — Episode 37
A disk is filling up. Somewhere under /var are thousands of stale log files, scattered across directories nobody remembers creating. The task is dreary and familiar: locate the old ones and clear them. One line does it, and reads almost like an instruction to a colleague:
find /var/log -name '*.log' -mtime +30 -delete
No loop, no temporary file, no manual descent into each directory. The tool that reads that line as a single coherent thought has been doing so since 1979, and the way it reads is the whole point of this episode.
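Because -delete is irreversible, it is worth rehearsing the line first. The sketch below assumes a throwaway directory made with mktemp as a stand-in for /var/log, and backdated files created with touch -t; the file names are invented for illustration. It runs the same tests with -print, then swaps in -delete.

```shell
#!/bin/sh
# Sketch: rehearse a cleanup in a scratch directory before touching /var/log.
set -eu
demo=$(mktemp -d)                              # hypothetical stand-in for /var/log
mkdir -p "$demo/app"
touch "$demo/app/fresh.log"                    # modified just now
touch -t 202001010000 "$demo/app/stale.log"    # backdated to January 2020
touch -t 202001010000 "$demo/app/stale.txt"    # old, but not a .log

# Step 1: same tests, with -print in place of -delete, to see what would go.
would_delete=$(find "$demo" -name '*.log' -mtime +30 -print)
echo "would delete: $would_delete"

# Step 2: only when the list looks right, swap the action.
find "$demo" -name '*.log' -mtime +30 -delete

# What survived: the fresh log and the old file that is not a .log.
remaining=$(cd "$demo" && find . -type f | sort)
echo "remaining:"
echo "$remaining"

rm -rf "$demo"
```

The tests stay identical between the two runs; only the final action changes, which is the property that makes the dry run trustworthy.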
The Grammar
Most Unix tools take flags: a verb and a handful of switches that modify it. find is different. find takes an expression.
The arguments after the starting path are not flags in the usual sense; they are terms in a small query language. There are primaries, which are tests or actions: -name matches a glob against the file's base name, -type f selects regular files, -mtime +30 means "modified more than thirty days ago" (age is counted in whole 24-hour periods), -size +100M means larger than a hundred megabytes (the M suffix is an extension that both BSD and GNU find accept), -newer ref means modified more recently than a reference file. There are operators that combine them: terms written next to each other are joined by an implicit logical AND, -o is OR, ! is NOT, and parentheses group sub-expressions (escaped from the shell as \( ... \)). AND binds more tightly than OR, which is why the parentheses matter.
find walks the directory tree, and for each file it evaluates the expression from left to right, stopping as soon as the result is decided. An action such as -delete or -exec fires only if every test before it has passed, and if the expression names no action at all, -print is implied. That is the entire model. "Find every regular file under here, larger than a hundred megabytes, not owned by root, and print it" is one expression, evaluated once per file, composed from parts you already know.
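A small experiment makes the operators concrete. This is a sketch run against a scratch tree; the file names are invented for illustration. The first query groups the OR with escaped parentheses; the second drops them to show how the precedence of AND over OR changes the result.

```shell
#!/bin/sh
# Sketch: AND, OR, NOT and grouping in one expression, on a scratch tree.
set -eu
demo=$(mktemp -d)
cd "$demo"
touch app.log app.txt keep.log notes.md
mkdir logs.log               # a directory whose name also ends in .log

# Regular files named *.log or *.txt, except anything called keep.*
# Adjacent terms are ANDed; \( \) groups the OR; ! negates the next primary.
matched=$(find . -type f \( -name '*.log' -o -name '*.txt' \) ! -name 'keep.*' | sort)
echo "grouped:"
echo "$matched"

# Without the parentheses, AND binds tighter than OR, so this parses as
# (-type f AND -name '*.log') OR (-name '*.txt' AND ! -name 'keep.*'),
# and keep.log slips through the first branch.
ungrouped=$(find . -type f -name '*.log' -o -name '*.txt' ! -name 'keep.*' | sort)
echo "ungrouped:"
echo "$ungrouped"

cd /
rm -rf "$demo"
```

The grouped query returns app.log and app.txt; the ungrouped one also returns keep.log, because the exclusion only applies to the right-hand side of the OR.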
This is the reduction the series exists to celebrate. find does not ship a flag for every conceivable query. It ships a grammar, and the grammar composes every query from a small vocabulary of primaries and three operators. The surface you must learn is tiny; the space of things you can express is enormous. That ratio, small vocabulary to large expressivity, is what elegance looks like on a command line.
The Surface
In practice, most of what anyone types is a handful of shapes:
find . -type f -name '*.conf'
find /var/log -type f -mtime +30 -delete
find . -size +100M -exec ls -lh {} +
find . -type d -empty -delete
The -exec action deserves a note, because it has two forms and the difference matters. -exec cmd {} \; runs the command once per matched file, substituting the filename for {}. -exec cmd {} + gathers as many matches as the command line allows and runs the command as few times as possible, which is dramatically faster for large match sets. The plus form is the one to reach for by default; the semicolon form is for when the command genuinely takes one argument at a time.
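The difference between the two forms can be observed directly. In this sketch the command being run is a tiny inline sh script (an illustrative stand-in for any real command) that prints how many filenames it received on each invocation.

```shell
#!/bin/sh
# Sketch: count how many filenames each -exec form passes per invocation.
set -eu
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# The inline script prints "$#", its argument count.
# The second "sh" fills $0, so the matched files land in the positional arguments.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

# Semicolon form: one line per file, each reporting a single argument.
echo "semicolon form: $(echo "$per_file" | wc -l | tr -d ' ') runs"
# Plus form: a single line reporting all three files at once.
echo "plus form: $(echo "$batched" | wc -l | tr -d ' ') run, $batched files"

rm -rf "$demo"
```

With three files the saving is trivial; with a few hundred thousand, the plus form avoids a process launch per file, which is where the speed difference comes from.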
For everything else, find composes with the rest of the toolbox through the pipe, and it does so safely:
find . -type f -print0 | xargs -0 sha256
-print0 terminates each filename with a null byte instead of a newline, and xargs -0 reads them the same way. (sha256 here is the FreeBSD base-system utility; on Linux the counterpart is sha256sum.) This is not a nicety. Filenames on Unix may contain spaces, newlines, and almost any other byte, and the naive idioms (for f in $(ls), or piping plain find output into a tool that splits on whitespace) corrupt or skip such names, occasionally with destructive results. The null-separated pipeline is the correct way to move a list of arbitrary filenames between tools, and find has supported it for decades. Beauty, here, includes correctness: the elegant idiom is also the safe one.
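The failure mode is easy to reproduce. This sketch creates three files in a scratch directory, one of them with a newline in its name (the names are invented for illustration), and compares a line-based count with a null-based one.

```shell
#!/bin/sh
# Sketch: newline-delimited versus null-delimited file lists.
set -eu
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"
touch "$demo/$(printf 'line\nbreak.txt')"   # a filename containing a newline

# Naive: one name per line, so the embedded newline is counted as an extra file.
naive=$(find "$demo" -type f | wc -l | tr -d ' ')

# Null-separated: count the terminators, one per file whatever the name contains.
safe=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c | tr -d ' ')

# The same stream fed to xargs -0: each name arrives as exactly one argument.
args=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "line-based count: $naive"
echo "null-based count: $safe"
echo "arguments seen by the command: $args"

rm -rf "$demo"
```

The line-based count reports four files where there are three; the null-separated pipeline delivers exactly three arguments.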
On FreeBSD
FreeBSD ships BSD find in the base system, BSD-licensed, at /usr/bin/find. It is the lean, POSIX-clean implementation, and on a freshly installed FreeBSD it is simply present, no package required. The same is true on OpenBSD, NetBSD and macOS, all of which carry a BSD-derived find.
GNU find, part of the GNU findutils package and licensed under the GPL, grew a larger set of primaries over the years (-printf with its own format language, several regex variants, and more) and accreted complexity accordingly. None of that is wrong, and some of the extensions are genuinely handy; it is simply a different point on the curve between "small and POSIX-clean" and "feature-rich". On FreeBSD it is a pkg install findutils away, installed as gfind, for the occasions when a script needs a specific GNU primary. For the daily load, the in-base BSD tool is the whole tool, and that is the version this episode is about: the one that fits in your head.
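For a script that has to run on both kinds of system, one possible approach is to prefer gfind when it is installed and then probe for the feature itself rather than trusting the name. This is only a sketch: the variable names FIND and has_printf are illustrative, and the probe assumes that a find without the GNU -printf primary rejects it with a non-zero exit status.

```shell
#!/bin/sh
# Sketch: pick a find binary, then check whether it understands -printf.
set -eu
if command -v gfind >/dev/null 2>&1; then
    FIND=gfind     # e.g. FreeBSD with the findutils package installed
else
    FIND=find      # whatever the base system provides
fi

# Probe the feature: an empty -printf format prints nothing and succeeds
# where the primary is supported; an unknown primary is an error.
if "$FIND" / -maxdepth 0 -printf '' 2>/dev/null; then
    has_printf=yes
else
    has_printf=no
fi
echo "using $FIND, -printf supported: $has_printf"
```

A script can then branch on has_printf, or fail early with a clear message, instead of breaking halfway through a tree walk.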
The Lineage
Dick Haight wrote find for Version 7 Unix, released in 1979, along with cpio and expr. He worked in what was then the Unix Support Group, the part of Bell Labs charged with turning the research system into something AT&T could support and ship, rather than in the research group where Unix itself was born.
There is a well-aired anecdote, preserved in the Unix history archives, that the researchers were faintly put off by the syntax of the USG tools: find did not read like the other commands, with its prefix-expression notation and its little grammar. It was, by the aesthetic of the research room, slightly foreign. They kept it anyway, because it was useful, and because once you stop expecting it to look like grep and start reading it as a query language, it is not foreign at all; it is consistent with itself.
Forty-seven years later, the expression grammar is essentially unchanged. A find one-liner from a 1980s manual runs today. The modern descendant fd (David Peter, written in Rust, 2017, MIT and Apache licensed) is faster, prettier in its output, and friendlier in its defaults (it ignores .git and respects .gitignore), and it reproduces the very same idea: a small set of predicates over a tree walk. The shape was right the first time.
find is the rare Unix tool that is a little language pretending to be a command. The researchers were right that it reads oddly. They were also right to keep it, because a small grammar that composes every case from a few parts is worth a little oddness at first sight. Learn the vocabulary once, and you can ask the filesystem almost anything, in a sentence.
Read the full article on vivianvoss.net →
By Vivian Voss, System Architect & Software Developer.
