Martin Oehlert

Posted on Feb 9

From 1.4s to 53ms: Optimizing zsh Startup on macOS

#zsh #performance #shell

Every time I opened a terminal, I waited. Not long — maybe a second and a half — but long enough to notice. Long enough to be annoying. I finally decided to profile my zsh startup, and what I found took it from 1.4 seconds down to 53 milliseconds.

Here's what I learned.

Profiling with zprof

Zsh has a built-in profiler. Add zmodload zsh/zprof at the top of your .zshrc and zprof at the bottom, then open a new shell:

# top of .zshrc
zmodload zsh/zprof

# ... your config ...

# bottom of .zshrc
zprof
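If you would rather not edit .zshrc every time you profile, one option is to gate both lines behind an environment variable. This is only a sketch: the variable name ZSH_PROFILE_STARTUP is made up for this example and is not something zsh defines.

```shell
# top of .zshrc
# ZSH_PROFILE_STARTUP is a hypothetical variable name, not a zsh built-in.
if [[ -n "${ZSH_PROFILE_STARTUP:-}" ]]; then
  zmodload zsh/zprof
fi

# ... your config ...

# bottom of .zshrc
if [[ -n "${ZSH_PROFILE_STARTUP:-}" ]]; then
  zprof
fi
```

With this in place, ZSH_PROFILE_STARTUP=1 zsh -i -c exit should print the profile for that one run, while ordinary shells skip the profiler entirely.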

My initial profile told a clear story:

Culprit	Time	% of startup
NVM (`nvm.sh`)	~430ms	31%
Completion subprocesses (kubectl, helm, gh, ...)	~400ms	29%
`compinit` (full rebuild every time)	~240ms	17%
`brew shellenv`	~30ms	2%
`go env GOPATH`	~20ms	1%
Everything else	~280ms	20%

Most of this is subprocess work: things like eval "$(brew shellenv)" or source <(kubectl completion zsh) fork a process just to produce text that rarely changes. That's the low-hanging fruit.

Optimization 1: Lazy-load NVM

NVM was the single biggest offender. Sourcing nvm.sh on every shell startup cost ~430ms, and I don't use node in every terminal session. The fix: wrapper functions that defer loading until you actually call nvm, node, npm, etc.

Before:

export NVM_DIR="$HOME/.nvm"
[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
[ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && \. "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"

After:

export NVM_DIR="$HOME/.nvm"

_nvm_lazy_load() {
  unfunction nvm node npm npx corepack 2>/dev/null
  [ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
  [ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && \. "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"
}

nvm()      { _nvm_lazy_load; nvm "$@" }
node()     { _nvm_lazy_load; node "$@" }
npm()      { _nvm_lazy_load; npm "$@" }
npx()      { _nvm_lazy_load; npx "$@" }
corepack() { _nvm_lazy_load; corepack "$@" }

The wrapper functions replace themselves on first call via unfunction, then delegate to the real command. Cost at startup: zero. Cost on first node invocation: ~430ms (once).
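To see the self-replacing pattern in isolation, here is a minimal sketch with a stand-in command. The names greet, _real_init, and _load_count are hypothetical: _real_init plays the role of sourcing nvm.sh, and unset -f is the portable spelling of unfunction.

```shell
_load_count=0

# Stand-in for the expensive init script (nvm.sh in the real config).
_real_init() {
  _load_count=$((_load_count + 1))
  greet() { echo "hello $1"; }   # defines the "real" command
}

# Stub: remove itself, run the loader, then re-dispatch to the real command.
greet() {
  unset -f greet
  _real_init
  greet "$@"
}

greet world            # first call pays the load cost
greet again            # later calls go straight to the real function
echo "loads: $_load_count"
```

The loader runs once no matter how many times the command is called, which is the property the NVM wrappers rely on.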

Optimization 2: Hardcode static values

Several lines in my config were spawning subprocesses to compute values that never change:

# Before — subprocess every startup
eval "$(/opt/homebrew/bin/brew shellenv)"
export PATH="$PATH:$(go env GOPATH)/bin"
. "$HOME/.cargo/env"

These produce the same output every time. Just paste the result directly:

# After — zero subprocesses
export HOMEBREW_PREFIX="/opt/homebrew"
export HOMEBREW_CELLAR="/opt/homebrew/Cellar"
export HOMEBREW_REPOSITORY="/opt/homebrew"
export PATH="/opt/homebrew/bin:/opt/homebrew/sbin:$PATH"
[ -z "${MANPATH-}" ] || export MANPATH=":${MANPATH#:}"
export INFOPATH="/opt/homebrew/share/info:${INFOPATH:-}"

export GOPATH="$HOME/go"
export PATH="$PATH:$GOPATH/bin"

export PATH="$HOME/.cargo/bin:$PATH"

Leave a comment like # regenerate with: brew shellenv so future-you knows where the values came from.
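Hardcoded values can drift after an upgrade or a move, so it may be worth having a way to compare them against the live command now and then. The helper below is a sketch with a hypothetical name (check_static); it is meant to be run by hand, not from .zshrc, since it forks the very subprocesses this section removes.

```shell
# Hypothetical helper: compare a hardcoded value against the command that produced it.
# Usage: check_static <label> <expected> <command> [args...]
check_static() {
  local name="$1" expected="$2"
  shift 2
  local actual
  actual="$("$@" 2>/dev/null)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $name"
  else
    echo "drift: $name is now '$actual'"
    return 1
  fi
}

# Example invocations (commented out so pasting this block runs nothing):
# check_static GOPATH "$HOME/go" go env GOPATH
# check_static HOMEBREW_PREFIX /opt/homebrew brew --prefix
```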

Optimization 3: Cache completions into fpath

This was the big one. My original config eagerly sourced completions from 12 different tools on every shell startup:

# Before — 12 subprocesses, every startup
command -v kubectl  &>/dev/null && source <(kubectl completion zsh)
command -v helm     &>/dev/null && source <(helm completion zsh)
command -v minikube &>/dev/null && source <(minikube completion zsh)
command -v gh       &>/dev/null && source <(gh completion -s zsh)
# ... 8 more tools

Each source <(tool completion zsh) forks a subprocess AND evaluates thousands of lines of shell code. Minikube's completion alone is 5,000 lines.

The fix has two parts:

For completions: write them to files in an fpath directory. Compinit loads these lazily — only when you actually press TAB on that command:

ZSH_COMP_CACHE="$HOME/.zsh-completion-cache"
[[ -d "$ZSH_COMP_CACHE" ]] || mkdir -p "$ZSH_COMP_CACHE"

_cache_fpath() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/_$name"
  local -a stale=($cache_file(N.mh+24))
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
  fi
}

command -v kubectl &>/dev/null && _cache_fpath kubectl kubectl completion zsh
command -v helm    &>/dev/null && _cache_fpath helm    helm completion zsh
# ... etc

fpath=($ZSH_COMP_CACHE $fpath)
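One caveat with writing straight into the cache file: the redirection truncates it before the tool runs, so a failing command leaves an empty file that then counts as fresh for 24 hours. A possible hardening, sketched here with a hypothetical helper name (_write_cache_atomic), is to write to a temp file and move it into place only on success.

```shell
# Hypothetical helper, not part of the original config.
# Usage: _write_cache_atomic <cache_file> <command> [args...]
_write_cache_atomic() {
  local cache_file="$1"
  shift
  local tmp="$cache_file.tmp.$$"
  # Keep the new output only if the command succeeded and printed something.
  if "$@" > "$tmp" 2>/dev/null && [ -s "$tmp" ]; then
    mv -f "$tmp" "$cache_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Inside _cache_fpath, the line that runs "$@" > "$cache_file" could then become _write_cache_atomic "$cache_file" "$@". The extra mv only happens when a cache is regenerated, so it should not affect the normal startup path.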

For plugins that must run at startup (fzf keybindings, direnv hook, oh-my-posh prompt), cache their init output and zcompile for faster sourcing:

_cache_source() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/$name.zsh"
  local -a stale=($cache_file(N.mh+24))
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
    zcompile "$cache_file" 2>/dev/null
  fi
  source "$cache_file"
}

_cache_source fzf fzf --zsh
_cache_source direnv direnv hook zsh
_cache_source oh-my-posh oh-my-posh init zsh --config ~/.poshthemes/theme.omp.json --print

Both functions use a 24-hour cache expiry via zsh glob qualifiers. Delete ~/.zsh-completion-cache to force a refresh.

I also cached compinit itself — a full rebuild only runs once per day, and otherwise compinit -C skips straight to the dump file:

autoload -Uz compinit
local -a zcompdump_stale=(~/.zcompdump(N.mh+24))
if (( $#zcompdump_stale )); then
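  compinit
  # compinit may leave a still-valid dump untouched; reset the 24-hour timer explicitly
  touch ~/.zcompdump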
else
  compinit -C
fi
{ zcompile ~/.zcompdump } &!

The bug that almost ruined everything

After implementing all of this, I ran time zsh -i -c exit. The result: 1.59 seconds. Slower than before.

I profiled again and saw this:

num  calls                time            self            name
-----------------------------------------------------------------
 1)   15   1180.06  97.34%  1169.02  96.43%  _cache_completion
 2)    1     26.83   2.21%     7.49   0.62%  compinit

The caching function was taking 97% of startup time across 15 calls. The caches existed on disk but were being regenerated every single time. The staleness check was broken.

I restructured the approach — separating completions (fpath-based, lazy) from plugins (source-based, eager) — and tried again. Same problem: _cache_fpath at 72%, compinit doing full rebuilds.

The bug was in this line:

if [[ ! -f "$cache_file" || -n "$cache_file"(#qN.mh+24) ]]; then

This looks reasonable. The glob qualifier (#qN.mh+24) means "match if the file is older than 24 hours, with N (nullglob) to return empty string if no match." The -n test checks if the result is non-empty.

The problem: by default, glob qualifiers don't expand inside [[ ]].

Zsh's [[ ]] conditional construct does not perform filename generation (globbing) on its arguments. The one exception is a trailing (#q...) qualifier, which forces globbing only when the EXTENDED_GLOB option is set, and that option is off by default. Without it, the string "$cache_file"(#qN.mh+24) is treated as the literal path with (#qN.mh+24) appended as text. Since that string is always non-empty, the condition is always true. Every cache was being regenerated on every startup. The caching was doing nothing.

The same bug affected the compinit staleness check:

# Also broken — compinit was doing a full rebuild every time
if [[ -n ~/.zcompdump(#qN.mh+24) ]]; then

The fix: expand the glob into an array variable first, then check its length:

local -a stale=($cache_file(N.mh+24))
if [[ ! -f "$cache_file" ]] || (( $#stale )); then

Array assignments do perform globbing (plain scalar assignments don't). The (N.mh+24) qualifier (no #q prefix needed outside [[ ]]) is applied during that expansion, and $#stale gives us the match count. If the file is older than 24 hours, stale contains one element; otherwise it's empty. This works whether or not EXTENDED_GLOB is set.
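To watch the difference yourself, the sketch below runs the same test in a clean zsh, first with EXTENDED_GLOB off and then on, followed by the array form. The function name glob_qualifier_demo is hypothetical, and zsh -f is used so that options set in your own .zshrc don't influence the result. On a stock zsh I would expect the first test to report the fresh file as stale (the buggy verdict), the second to report it as fresh, and the array to be empty.

```shell
# Hypothetical demo wrapper; zsh -f skips startup files, so EXTENDED_GLOB starts off.
glob_qualifier_demo() {
  zsh -f -c '
    f=$(mktemp)   # freshly created, so it is not older than 24 hours

    # EXTENDED_GLOB off: the qualifier is just text, so the string is non-empty.
    if [[ -n $f(#qN.mh+24) ]]; then echo "off: stale"; else echo "off: fresh"; fi

    # EXTENDED_GLOB on: the (#q...) form forces globbing inside [[ ]].
    setopt extendedglob
    if [[ -n $f(#qN.mh+24) ]]; then echo "on: stale"; else echo "on: fresh"; fi

    # Array assignment: globbing happens here regardless of EXTENDED_GLOB.
    stale=($f(N.mh+24))
    echo "array: ${#stale}"

    rm -f $f
  '
}

glob_qualifier_demo
```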

This is a subtle footgun. The code looks correct, it doesn't produce errors, and the caches are created — they're just never reused. Without profiling, you'd never know.
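A cheap way to catch this class of bug without reading profiler output is to check whether a new shell rewrote any cache file. The sketch below assumes the cache directory used earlier; the helper name cache_untouched_since is hypothetical, and because it calls find it is meant for a one-off check, not for .zshrc.

```shell
# Hypothetical helper: succeeds if no file under <dir> is newer than <marker>.
cache_untouched_since() {
  local dir="$1" marker="$2"
  [ -z "$(find "$dir" -type f -newer "$marker" 2>/dev/null)" ]
}

# Example check (commented out so pasting this block runs nothing):
# marker="$(mktemp)"; sleep 1
# zsh -i -c exit
# cache_untouched_since ~/.zsh-completion-cache "$marker" \
#   && echo "caches reused" || echo "caches were regenerated"
```

If the second startup in a row reports regenerated caches, the staleness check is the first place to look.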

Result

$ time zsh -i -c exit
zsh -i -c exit  0.03s user 0.02s system 93% cpu 0.053 total

53 milliseconds. A 96% reduction from 1.4 seconds.

Here's what each optimization contributed:

Optimization	Savings
Lazy-load NVM	~430ms
Cache completions into fpath (lazy compinit)	~500ms
Cache plugin init scripts + zcompile	~200ms
Hardcode brew/go/cargo	~50ms
compinit -C (cached dump)	~170ms
Total	~1,350ms

The first shell open after 24 hours takes a couple of seconds to regenerate caches, but every subsequent shell is instant. You can force a full refresh anytime:

rm -rf ~/.zsh-completion-cache ~/.zcompdump*

Takeaways

Profile first. zprof told me exactly where the time was going. Don't guess.
Subprocess calls add up. Each eval $(...) or source <(...) forks a process. Twelve of them cost almost a full second.
fpath > source for completions. Compinit loads completion functions lazily from fpath. Don't eagerly source thousands of lines you might never use.
Test your caching actually works. A cache that regenerates every time is worse than no cache — it has the overhead of both the generation AND the file I/O.
Glob qualifiers don't work inside [[ ]] by default. The (#q...) form only expands there when EXTENDED_GLOB is set. This is the kind of bug that looks correct, produces no errors, and silently destroys your performance. Expand globs into an array first.