Every time I opened a terminal, I waited. Not long — maybe a second and a half — but long enough to notice. Long enough to be annoying. I finally decided to profile my zsh startup, and what I found took it from 1.4 seconds down to 53 milliseconds.
Here's what I learned.
Profiling with zprof
Zsh has a built-in profiler. Add zmodload zsh/zprof at the top of your .zshrc and zprof at the bottom, then open a new shell:
# top of .zshrc
zmodload zsh/zprof
# ... your config ...
# bottom of .zshrc
zprof
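Leaving the profiler on permanently prints a report in every new shell, so it can help to gate it behind an environment variable. A minimal sketch, assuming a variable name of my own choosing (ZSH_PROFILE is not something zsh itself knows about):

```shell
# top of .zshrc: load the profiler only when ZSH_PROFILE is set (hypothetical name)
if [ -n "${ZSH_PROFILE:-}" ]; then
  zmodload zsh/zprof
fi

# ... your config ...

# bottom of .zshrc: print the report only for profiled shells
if [ -n "${ZSH_PROFILE:-}" ]; then
  zprof
fi
```

With this in place, ZSH_PROFILE=1 zsh -i -c exit profiles a single startup without touching your normal shells.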
My initial profile told a clear story:
| Culprit | Time | % of startup |
|---|---|---|
| NVM (nvm.sh) | ~430ms | 31% |
| Completion subprocesses (kubectl, helm, gh, ...) | ~400ms | 29% |
| compinit (full rebuild every time) | ~240ms | 17% |
| brew shellenv | ~30ms | 2% |
| go env GOPATH | ~20ms | 1% |
| Everything else | ~280ms | 20% |
Four of these five are subprocess calls — things like eval "$(brew shellenv)" or source <(kubectl completion zsh) that fork a process just to produce some static text. That's the low-hanging fruit.
Optimization 1: Lazy-load NVM
NVM was the single biggest offender. Sourcing nvm.sh on every shell startup cost ~430ms, and I don't use node in every terminal session. The fix: wrapper functions that defer loading until you actually call nvm, node, npm, etc.
Before:
export NVM_DIR="$HOME/.nvm"
[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
[ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && \. "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"
After:
export NVM_DIR="$HOME/.nvm"
_nvm_lazy_load() {
  unfunction nvm node npm npx corepack 2>/dev/null
  [ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
  [ -s "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm" ] && \. "/opt/homebrew/opt/nvm/etc/bash_completion.d/nvm"
}
nvm() { _nvm_lazy_load; nvm "$@" }
node() { _nvm_lazy_load; node "$@" }
npm() { _nvm_lazy_load; npm "$@" }
npx() { _nvm_lazy_load; npx "$@" }
corepack() { _nvm_lazy_load; corepack "$@" }
The wrapper functions replace themselves on first call via unfunction, then delegate to the real command. Cost at startup: zero. Cost on first node invocation: ~430ms (once).
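To see the self-replacing pattern in isolation, here is a small sketch with a made-up command. mytool and _slow_init are hypothetical stand-ins for node and nvm.sh, and it uses unset -f (which zsh also accepts) in place of unfunction so the snippet runs in other shells too:

```shell
init_count=0

# Hypothetical stand-in for an expensive init script such as nvm.sh:
# it counts how often it runs and defines the real command.
_slow_init() {
  init_count=$((init_count + 1))
  mytool() { echo "real mytool: $*"; }
}

# Stub: removes itself, runs the init once, then re-dispatches to the real command.
mytool() {
  unset -f mytool
  _slow_init
  mytool "$@"
}

mytool first     # triggers _slow_init, then runs the real mytool
mytool second    # calls the real mytool directly
echo "init ran $init_count time(s)"
```

The second call never touches the stub, which is why the startup cost moves to the first invocation and is paid only once per shell.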
Optimization 2: Hardcode static values
Several lines in my config were spawning subprocesses to compute values that never change:
# Before — subprocess every startup
eval "$(/opt/homebrew/bin/brew shellenv)"
export PATH="$PATH:$(go env GOPATH)/bin"
. "$HOME/.cargo/env"
These produce the same output every time. Just paste the result directly:
# After — zero subprocesses
export HOMEBREW_PREFIX="/opt/homebrew"
export HOMEBREW_CELLAR="/opt/homebrew/Cellar"
export HOMEBREW_REPOSITORY="/opt/homebrew"
export PATH="/opt/homebrew/bin:/opt/homebrew/sbin:$PATH"
[ -z "${MANPATH-}" ] || export MANPATH=":${MANPATH#:}"
export INFOPATH="/opt/homebrew/share/info:${INFOPATH:-}"
export GOPATH="$HOME/go"
export PATH="$PATH:$GOPATH/bin"
export PATH="$HOME/.cargo/bin:$PATH"
Leave a comment like # regenerate with: brew shellenv so future-you knows where the values came from.
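Hardcoded values can drift after a reinstall or upgrade, so it may be worth having a check you run by hand now and then (never at startup). A sketch, where check_static is a hypothetical helper name; it compares a pasted value against what the tool reports and stays silent when they agree:

```shell
# Usage: check_static LABEL EXPECTED COMMAND [ARGS...]
# Returns 0 if the command's output matches EXPECTED (or the tool is missing), 1 on drift.
check_static() {
  local label="$1" expected="$2" actual
  shift 2
  command -v "$1" >/dev/null 2>&1 || return 0   # tool not installed: nothing to compare
  actual="$("$@" 2>/dev/null)"
  if [ "$actual" != "$expected" ]; then
    echo "$label: hardcoded '$expected' but '$*' reports '$actual'"
    return 1
  fi
}

# Example calls (run manually, not from .zshrc):
#   check_static HOMEBREW_PREFIX "$HOMEBREW_PREFIX" brew --prefix
#   check_static GOPATH "$GOPATH" go env GOPATH
```

The subprocess cost is the same one the hardcoding removed, but here it is paid only when you ask for it.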
Optimization 3: Cache completions into fpath
This was the big one. My original config eagerly sourced completions from 12 different tools on every shell startup:
# Before — 12 subprocesses, every startup
command -v kubectl &>/dev/null && source <(kubectl completion zsh)
command -v helm &>/dev/null && source <(helm completion zsh)
command -v minikube &>/dev/null && source <(minikube completion zsh)
command -v gh &>/dev/null && source <(gh completion -s zsh)
# ... 8 more tools
Each source <(tool completion zsh) forks a subprocess AND evaluates thousands of lines of shell code. Minikube's completion alone is 5,000 lines.
The fix has two parts:
For completions: write them to files in an fpath directory. Compinit loads these lazily — only when you actually press TAB on that command:
ZSH_COMP_CACHE="$HOME/.zsh-completion-cache"
[[ -d "$ZSH_COMP_CACHE" ]] || mkdir -p "$ZSH_COMP_CACHE"
_cache_fpath() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/_$name"
  local -a stale=($cache_file(N.mh+24))
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
  fi
}
command -v kubectl &>/dev/null && _cache_fpath kubectl kubectl completion zsh
command -v helm &>/dev/null && _cache_fpath helm helm completion zsh
# ... etc
fpath=($ZSH_COMP_CACHE $fpath)
For plugins that must run at startup (fzf keybindings, direnv hook, oh-my-posh prompt), cache their init output and zcompile for faster sourcing:
_cache_source() {
  local name="$1"; shift
  local cache_file="$ZSH_COMP_CACHE/$name.zsh"
  local -a stale=($cache_file(N.mh+24))
  if [[ ! -f "$cache_file" ]] || (( $#stale )); then
    "$@" > "$cache_file" 2>/dev/null
    zcompile "$cache_file" 2>/dev/null
  fi
  source "$cache_file"
}
_cache_source fzf fzf --zsh
_cache_source direnv direnv hook zsh
_cache_source oh-my-posh oh-my-posh init zsh --config ~/.poshthemes/theme.omp.json --print
Both functions use a 24-hour cache expiry via zsh glob qualifiers. Delete ~/.zsh-completion-cache to force a refresh.
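One caveat with both functions as written: the redirection creates the cache file before the tool runs, so a failing or missing command leaves an empty file that then counts as fresh for 24 hours. A possible hardening, sketched here with a hypothetical helper name (_cache_write), is to generate into a temporary file and only move it into place on success:

```shell
# Usage: _cache_write CACHE_FILE COMMAND [ARGS...]
# Replaces CACHE_FILE only if COMMAND succeeds and produces non-empty output.
_cache_write() {
  local cache_file="$1" tmp
  shift
  tmp="$cache_file.tmp.$$"
  if "$@" > "$tmp" 2>/dev/null && [ -s "$tmp" ]; then
    mv -f "$tmp" "$cache_file"   # rename within the same directory
  else
    rm -f "$tmp"                 # keep any previous cache untouched
    return 1
  fi
}
```

Inside _cache_fpath and _cache_source, the line that redirects "$@" into "$cache_file" would then become _cache_write "$cache_file" "$@". This adds an mv subprocess, but only on the rare runs that actually regenerate.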
I also cached compinit itself — a full rebuild only runs once per day, and otherwise compinit -C skips straight to the dump file:
autoload -Uz compinit
local -a zcompdump_stale=(~/.zcompdump(N.mh+24))
if (( $#zcompdump_stale )); then
  compinit
else
  compinit -C
fi
{ zcompile ~/.zcompdump } &!
The bug that almost ruined everything
After implementing all of this, I ran time zsh -i -c exit. The result: 1.59 seconds. Slower than before.
I profiled again and saw this:
num calls time self name
-----------------------------------------------------------------
1) 15 1180.06 97.34% 1169.02 96.43% _cache_completion
2) 1 26.83 2.21% 7.49 0.62% compinit
The caching function was taking 97% of startup time across 15 calls. The caches existed on disk but were being regenerated every single time. The staleness check was broken.
I restructured the approach — separating completions (fpath-based, lazy) from plugins (source-based, eager) — and tried again. Same problem: _cache_fpath at 72%, compinit doing full rebuilds.
The bug was in this line:
if [[ ! -f "$cache_file" || -n "$cache_file"(#qN.mh+24) ]]; then
This looks reasonable. The glob qualifier (#qN.mh+24) means "match if the file is older than 24 hours, with N (nullglob) to return empty string if no match." The -n test checks if the result is non-empty.
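The problem: by default, glob qualifiers don't expand inside [[ ]].
Zsh's [[ ]] conditional construct does not perform filename generation (globbing) on its arguments. The one documented exception is an explicit (#q...) qualifier at the end of a word, and that only takes effect when the EXTENDED_GLOB option is set, which it is not by default. Without it, the string "$cache_file"(#qN.mh+24) is treated as the literal path with (#qN.mh+24) appended as text. Since that string is always non-empty, the condition is always true. Every cache was being regenerated on every startup. The caching was doing nothing.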
The same bug affected the compinit staleness check:
# Also broken — compinit was doing a full rebuild every time
if [[ -n ~/.zcompdump(#qN.mh+24) ]]; then
The fix: expand the glob into an array variable first, then check its length:
local -a stale=($cache_file(N.mh+24))
if [[ ! -f "$cache_file" ]] || (( $#stale )); then
Array assignments DO perform globbing (scalar assignments do not, unless GLOB_ASSIGN is set). The (N.mh+24) qualifier (no #q prefix needed outside [[ ]]) restricts the match to regular files modified more than 24 hours ago, and $#stale gives us the match count. If the file is older than 24 hours, stale contains one element; otherwise it's empty.
This is a subtle footgun. The code looks correct, it doesn't produce errors, and the caches are created — they're just never reused. Without profiling, you'd never know.
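A cheaper way to catch this class of bug than reading profiler output is to count how often the generator actually runs. The sketch below uses a hypothetical helper, count_regenerations, which calls a cache function twice with a stand-in generator and reports how many times that generator ran:

```shell
# Usage: count_regenerations CACHE_FUNC NAME
# Calls CACHE_FUNC twice with a generator that records each run, then prints the run count.
count_regenerations() {
  local cache_func="$1" name="$2" counter runs
  counter="$(mktemp)"
  # Stand-in generator: logs one line per invocation and emits a harmless comment to cache.
  _counting_gen() { echo run >> "$counter"; echo "# generated by count_regenerations"; }
  "$cache_func" "$name" _counting_gen >/dev/null
  "$cache_func" "$name" _counting_gen >/dev/null
  runs="$(wc -l < "$counter" | tr -d ' ')"
  rm -f "$counter"
  unset -f _counting_gen
  echo "$runs"
}
```

Running count_regenerations _cache_fpath selftest should print 1 with a working staleness check (or 0 if a fresh _selftest file is already cached); 2 means every call regenerates. It leaves a _selftest file in the cache directory that you can delete afterwards.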
Result
$ time zsh -i -c exit
zsh -i -c exit 0.03s user 0.02s system 93% cpu 0.053 total
53 milliseconds. A 96% reduction from 1.4 seconds.
Here's what each optimization contributed:
| Optimization | Savings |
|---|---|
| Lazy-load NVM | ~430ms |
| Cache completions into fpath (lazy compinit) | ~500ms |
| Cache plugin init scripts + zcompile | ~200ms |
| Hardcode brew/go/cargo | ~50ms |
| compinit -C (cached dump) | ~170ms |
| Total | ~1,350ms |
The first shell open after 24 hours takes a couple of seconds to regenerate caches, but every subsequent shell is instant. You can force a full refresh anytime:
rm -rf ~/.zsh-completion-cache ~/.zcompdump*
Takeaways
- Profile first. zprof told me exactly where the time was going. Don't guess.
- Subprocess calls add up. Each eval $(...) or source <(...) forks a process. Twelve of them cost almost a full second.
- fpath > source for completions. Compinit loads completion functions lazily from fpath. Don't eagerly source thousands of lines you might never use.
- Test your caching actually works. A cache that regenerates every time is worse than no cache: it has the overhead of both the generation AND the file I/O.
- Glob qualifiers don't work inside [[ ]] by default. This is the kind of bug that looks correct, produces no errors, and silently destroys your performance. Expand globs into array variables first.