DEV Community

Anguishe

Posted on • Originally published at bashsnippets.xyz

A For Loop Skipped Every File With a Space and Called the Backup a Success

The nightly backup looped over for f in $(ls /data/exports) and copied each file to a backup volume. It exited clean every night. For three weeks, green exit codes, no errors, nothing in the logs to suggest anything was wrong. The backup script had been written by someone who left the company six months before I started, and it had never been tested against files with spaces in their names because the original export directory only ever had files like Q3.xlsx and report-final.xlsx.

Then someone generated Q3 final.xlsx.

It took three weeks because that file was generated once a quarter and the next time someone needed it was a quarter later. The person who needed it was the CFO. The CFO does not particularly enjoy being told that the backup system that was supposed to protect the quarterly export has been silently failing for an indeterminate period of time and we are not sure which other files it missed.

I know the failure had been going on for three weeks because that was when the file first appeared in the exports directory. Every night since then, the loop had been splitting Q3 final.xlsx into two items, Q3 and final.xlsx, trying to copy two paths that did not exist, logging two harmless-looking "no such file or directory" lines that nobody read, and moving on. Every file without a space in its name backed up fine. The script looked correct because most of the time it was.

Why $(ls) word-splits

When you write for f in $(ls /data/exports), bash runs ls, captures its text output, and splits it on IFS — the internal field separator, which defaults to spaces, tabs, and newlines. Filenames are just text in the output of ls. A file named Q3 final.xlsx is one filename, but ls outputs it as the string Q3 final.xlsx, and bash splits that string on the space into two separate items before the loop ever starts.

This is not a bug in ls. This is not a bug in bash. This is exactly what $( ) does to any command's output — it captures text and bash processes it as text. The problem is that filenames are not reliably text-safe; they can contain any character except null and the path separator. Spaces are common. Tabs are less common but legal. Newlines are technically legal. Treating command output as a filename list breaks the moment any of those show up.
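To see the split happen without touching real data, here is a minimal sketch that builds a throwaway directory (the temp directory and the counter names are illustrative, not part of the original script) and counts how many items each loop style produces:

```bash
#!/usr/bin/env bash
# Demo only: works in a temporary directory, nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/Q3 final.xlsx" "$demo_dir/report-final.xlsx"

split_count=0
for f in $(ls "$demo_dir"); do     # unquoted $( ): output is split on IFS
  split_count=$((split_count + 1))
done

glob_count=0
for f in "$demo_dir"/*.xlsx; do    # glob: one item per real file
  glob_count=$((glob_count + 1))
done

echo "items from \$(ls): $split_count"
echo "items from glob:  $glob_count"
rm -rf "$demo_dir"
```

Two files go in. The $(ls) loop sees three items, because Q3 final.xlsx arrives as two words; the glob loop sees two.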

The fix is to stop treating command output as a filename list and let bash build the list from the filesystem directly:

#!/bin/bash
# Script: backup-exports.sh
# Purpose: copy export files without losing ones with spaces in their names
# Usage: ./backup-exports.sh
set -euo pipefail

CHECK="✓"
CROSS="✗"
SRC_DIR="/data/exports"
DEST_DIR="/backup/exports"
shopt -s nullglob   # empty glob expands to nothing, not the literal pattern

failed=0            # cp inside `if` is exempt from set -e, so track failures by hand
for f in "$SRC_DIR"/*.xlsx; do
  if cp -- "$f" "$DEST_DIR/"; then
    echo "$CHECK backed up: $(basename "$f")"
  else
    echo "$CROSS failed: $(basename "$f")" >&2
    failed=1
  fi
done

exit "$failed"      # non-zero if any copy failed, so the job is not reported green

for f in "$SRC_DIR"/*.xlsx asks bash to expand the glob. Bash talks directly to the filesystem and gets back a properly-separated list of matching paths. No command output, no text splitting, no ambiguity about what the separator is. A file named Q3 final.xlsx stays one item because it was never turned into text and split back apart. The -- before "$f" tells cp to stop reading flags, so a filename that starts with a dash does not get interpreted as a cp option.
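One detail worth spelling out: paths produced by "$SRC_DIR"/*.xlsx always begin with the directory prefix, so they cannot start with a dash, and the -- there is purely defensive. It starts to matter when the glob is relative. A small sketch, using a throwaway directory and a hypothetical filename:

```bash
#!/usr/bin/env bash
# Demo only: temporary directories and a made-up filename.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
touch "$work/src/-Q3.xlsx"

cd "$work/src" || exit 1
copied=0
for f in *.xlsx; do                    # relative glob: $f is "-Q3.xlsx"
  if cp -- "$f" "$work/dest/"; then    # -- ends option parsing, so $f is a filename
    copied=$((copied + 1))
  fi
done

echo "copied: $copied"
cd / && rm -rf "$work"
```

Keeping -- in place even with an absolute prefix costs nothing and means the loop stays safe if someone later changes it to cd into the directory first.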

The second half: quoting on use

The glob fixes the loop header. Quoting fixes the point of use. These are separate.

for f in "$SRC_DIR"/*.xlsx; do
  cp $f "$DEST_DIR/"    # WRONG — $f re-splits here
  cp "$f" "$DEST_DIR/"  # RIGHT — the quotes prevent the split
done

Even with a correct glob, an unquoted $f in the cp command re-splits on spaces. The glob set f to the single value /data/exports/Q3 final.xlsx, but when you expand it unquoted, the shell applies word splitting to that value and cp receives two source arguments: /data/exports/Q3 and final.xlsx. The quotes around "$f" tell bash to pass the entire value as a single argument. The rule is: glob over parse to build the list, quote on use to keep each item intact.
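A quick way to check what a command actually receives is to count its arguments. This sketch uses a hypothetical helper, count_args, and never touches the filesystem:

```bash
#!/usr/bin/env bash
# count_args is an illustrative helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

f="/data/exports/Q3 final.xlsx"

unquoted=$(count_args $f)     # word splitting applies to the expansion
quoted=$(count_args "$f")     # the quotes keep the value as one argument

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

The unquoted call reports 2 and the quoted call reports 1, which is the difference between cp looking for two files that do not exist and one that does.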

I have seen this mistake repeated in scripts at three different companies. In each case the scripts had been running for months or years and the problem was invisible because the majority of files had no spaces. The ones that did — client names, quarterly reports, anything a non-technical person named — were being silently skipped or mishandled.

Ranges and counters: where the brace trap hides

n=5

# This does NOT produce a range — it produces the literal string {1..5}
for i in {1..$n}; do echo "attempt $i"; done

# This works — C-style, evaluates variables at runtime
for ((i = 1; i <= n; i++)); do echo "attempt $i of $n"; done

Brace expansion happens before variable expansion in bash's order of operations. When the brace expansion step runs, {1..$n} is not yet a valid range, so it is left alone; $n is substituted afterwards and the result is just the string {1..5}. The loop runs exactly once, with i set to that string. This is the kind of thing that passes in testing, where the count is small and hardcoded, and produces weird output in production, where it is a variable.

The C-style loop is the correct form any time the bound is a variable. It evaluates at runtime and handles arithmetic naturally. If the count is genuinely fixed at write time, {1..10} works fine. If it is ever going to be a variable, use for (( )).
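To make the difference concrete, this sketch counts how many times each form iterates for the same n (the counter variable names are illustrative):

```bash
#!/usr/bin/env bash
n=5

brace_runs=0
for i in {1..$n}; do          # not a valid range at brace-expansion time
  brace_runs=$((brace_runs + 1))
  brace_value=$i              # the literal text the loop received
done

c_runs=0
for ((i = 1; i <= n; i++)); do
  c_runs=$((c_runs + 1))
done

echo "brace form:   $brace_runs iteration(s), i was '$brace_value'"
echo "C-style form: $c_runs iteration(s)"
```

The brace form runs once with i set to {1..5}; the C-style form runs five times.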

Arrays: the original bug wearing a different hat

servers=("web-01" "db primary" "cache-02")

# WRONG — word-splits "db primary" into two iterations
for s in ${servers[@]}; do
  ping -c1 "$s"
done

# RIGHT — quotes keep each element intact
for s in "${servers[@]}"; do
  ping -c1 "$s"
done

"${servers[@]}" with the quotes and [@] is the form that preserves each element as a single item regardless of what is in it. Without the quotes, bash word-splits the array expansion and db primary becomes two separate loop iterations — neither of which is a real hostname. This is exactly the same word-splitting mechanism as the $(ls) problem, just manifesting in arrays instead of command output.

Once you internalize that word-splitting happens wherever an unquoted variable or expansion appears, the rule becomes one rule instead of several: always quote expansions. The specific context changes; the mechanism does not.
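The same counting trick shows the array case without pinging anything. A minimal sketch, reusing the server names from the example above:

```bash
#!/usr/bin/env bash
servers=("web-01" "db primary" "cache-02")

unquoted_runs=0
for s in ${servers[@]}; do        # each element is word-split after expansion
  unquoted_runs=$((unquoted_runs + 1))
done

quoted_runs=0
for s in "${servers[@]}"; do      # one iteration per element
  quoted_runs=$((quoted_runs + 1))
done

echo "elements in array:  ${#servers[@]}"
echo "unquoted loop runs: $unquoted_runs"
echo "quoted loop runs:   $quoted_runs"
```

Three elements produce four iterations unquoted and three quoted.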

The nullglob case

shopt -s nullglob
for f in /data/exports/*.xlsx; do
  echo "$f"
done

Without shopt -s nullglob, if no .xlsx files exist, the glob *.xlsx does not expand — it stays as the literal string *.xlsx. The loop runs once with f set to the literal string /data/exports/*.xlsx. Your script then tries to process a file with that exact path, which does not exist, and produces an error or silently does nothing depending on what you do with it.

Setting nullglob tells bash to expand a non-matching glob to nothing (an empty list), so the loop simply does not run. This is almost always the right behavior when you are iterating files that might not exist.
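Both behaviors can be observed side by side against an empty temporary directory. This is a sketch for illustration; the variable names are not from the original script:

```bash
#!/usr/bin/env bash
# Demo only: an empty throwaway directory stands in for /data/exports.
empty_dir=$(mktemp -d)

shopt -u nullglob                 # bash's default setting
without_runs=0
for f in "$empty_dir"/*.xlsx; do
  without_runs=$((without_runs + 1))
  leftover=$f                     # the unexpanded pattern itself
done

shopt -s nullglob
with_runs=0
for f in "$empty_dir"/*.xlsx; do
  with_runs=$((with_runs + 1))
done

echo "without nullglob: $without_runs iteration(s), f=$leftover"
echo "with nullglob:    $with_runs iteration(s)"
rmdir "$empty_dir"
```

If changing a shell option is not acceptable in a given script, a common alternative is to leave nullglob off and start the loop body with [ -e "$f" ] || continue, which skips the literal pattern when nothing matched.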

What happened to the Q3 export

We recovered it from the CFO's local machine, where she had downloaded it before the backup was supposed to preserve it. The fix to the script took four minutes. The conversation about why the backup system had been failing silently for three weeks took longer. The monitoring that we added afterwards — a nightly check that the backup directory has at least as many files as the source directory — took another twenty minutes.

The monitoring should have been there from the start. So should the glob. So should the set -euo pipefail that would have made the copy failures loud instead of silent. These are things you add before something breaks, and the only reason to know you need them is to have seen, or caused, or read about what happens when they are missing.
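The nightly count check can be very small. The following is a sketch of the idea and not the script we actually deployed; the function names count_xlsx and check_backup are hypothetical, and the demo runs against temporary directories:

```bash
#!/usr/bin/env bash
# Hypothetical helper names; the demo uses throwaway directories.
shopt -s nullglob   # an empty directory counts as 0 files, not 1

count_xlsx() {
  local files=("$1"/*.xlsx)   # array built by the glob, one element per file
  echo "${#files[@]}"
}

# Succeeds only if the backup holds at least as many files as the source.
check_backup() {
  local src_count dest_count
  src_count=$(count_xlsx "$1")
  dest_count=$(count_xlsx "$2")
  if (( dest_count < src_count )); then
    echo "backup incomplete: $dest_count of $src_count files" >&2
    return 1
  fi
  echo "backup ok: $dest_count of $src_count files"
}

work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
touch "$work/src/Q3 final.xlsx" "$work/src/report-final.xlsx"
cp -- "$work/src/report-final.xlsx" "$work/dest/"

check_backup "$work/src" "$work/dest" && before=ok || before=incomplete
cp -- "$work/src/Q3 final.xlsx" "$work/dest/"
check_backup "$work/src" "$work/dest" && after=ok || after=incomplete

echo "before: $before, after: $after"
rm -rf "$work"
```

A count comparison is coarse: old files left in the backup directory can hide a missing new one. Comparing filenames would be stricter, but even the count would have caught this failure on the first night.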

Full examples with the safe glob, counter, C-style loop, array form, and nullglob guard: https://bashsnippets.xyz/snippets/bash-for-loop-examples

For reading a file's lines one at a time, a for loop is the wrong tool: use while IFS= read -r instead. And put set -euo pipefail at the top of any script whose loops touch real files. The rest is at https://bashsnippets.xyz
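For completeness, a minimal sketch of the line-reading form, using a temporary file with made-up contents:

```bash
#!/usr/bin/env bash
# Demo only: a temporary file stands in for whatever list you need to read.
list_file=$(mktemp)
printf '%s\n' 'Q3 final.xlsx' 'report-final.xlsx' > "$list_file"

line_count=0
while IFS= read -r line; do     # IFS= keeps leading/trailing whitespace, -r keeps backslashes
  line_count=$((line_count + 1))
  printf 'line %d: %s\n' "$line_count" "$line"
done < "$list_file"

rm -f "$list_file"
```

Each line arrives whole, spaces included, so the two-line file yields two iterations.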
