I sped up bash startup from 165 ms to 40 ms. It’s actually noticeable. Why and how did I do it?
Motivation
Whenever I need to quickly look something up (or use a calculator), I open a new terminal (using a keyboard shortcut) and start typing into it. Slow bash startup disrupts this workflow, as I often start typing before the shell prompt appears:
Daniel Parker recently wrote an excellent blog post, Faster Bash Startup, detailing his journey from 1.7 seconds to 210 ms. I start at 165 ms and need to go significantly lower than Daniel, so different techniques are needed.
Investigation
hyperfine is a brilliant command-line tool for benchmarking commands that I discovered recently (thanks to Daniel!), so let’s see where we are now:
[tomi@notes ~]$ hyperfine 'bash -i'
Benchmark #1: bash -i
Time (mean ± σ): 165.8 ms ± 0.7 ms [User: 156.3 ms, System: 12.8 ms]
Range (min … max): 164.9 ms … 167.1 ms 17 runs
Now we need to find out what’s taking so long. How to profile a bash shell script slow startup? Most Stack Overflow answers suggest some variant of set -x, which will help us find any single command that takes unusually long.
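Plain set -x output shows what ran but not when. A minimal sketch of one common variant, assuming bash 5.0 or newer (for EPOCHREALTIME): put a timestamp into PS4 so every traced line carries it, and look for large gaps between consecutive lines. The helper name trace_file is made up for illustration and is not from the Stack Overflow answers.

```shell
#!/usr/bin/env bash
# trace_file FILE: source FILE in a fresh shell with xtrace on and print the
# trace on stdout. trace_file is a hypothetical helper name, not from the post.
trace_file() {
    bash --norc --noprofile -c '
        # PS4 is re-expanded before every traced command, so each line
        # carries the current time (EPOCHREALTIME needs bash 5.0 or newer).
        PS4="+ \$EPOCHREALTIME "
        set -x
        . "$1"
    ' _ "$1" 2>&1 >/dev/null
}
```

Something like trace_file ~/.bashrc.d/10_env.sh | less would then show one timestamped line per executed command. Files that depend on variables set by earlier parts of .bashrc may behave differently when traced in isolation like this.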
man
In my case, that command was man -w, specifically this piece of my .bashrc.d/10_env.sh:
export MANPATH=$HOME/.local/share/man:
# FIXME: workaround for /usr/share/bash-completion/completions/man
MANPATH=$(man -w)
Turns out none of this is needed any more: man and manpath now add ~/.local/share/man automatically, so I can just drop it and save more than 100 ms [1].
death by a thousand cuts
But that’s it. No other single command stands out; it’s just a lot of small things that add up. Daniel says “it has to take some time,” and he’s mostly right, but I still have one trick up my sleeve.
My .bashrc is split into several smaller parts in ~/.bashrc.d, so I can profile these and see if anything stands out. My .bashrc thus becomes:
for i in ~/.bashrc.d/*.sh; do
    if [[ $__bashrc_bench ]]; then
        # Benchmark mode: print how long sourcing each file takes.
        TIMEFORMAT="$i: %R"
        time . "$i"
        unset TIMEFORMAT
    else
        . "$i"
    fi
done; unset i
Let’s see what happens…
[tomi@notes ~]$ __bashrc_bench=1 bash -i
/home/tomi/.bashrc.d/10_env.sh: 0,118
/home/tomi/.bashrc.d/20_history.sh: 0,000
/home/tomi/.bashrc.d/20_prompt.sh: 0,002
/home/tomi/.bashrc.d/30_completion_git.sh: 0,000
/home/tomi/.bashrc.d/31_completion.sh: 0,011
/home/tomi/.bashrc.d/50_aliases.sh: 0,002
/home/tomi/.bashrc.d/50_aliases_sudo.sh: 0,000
/home/tomi/.bashrc.d/50_functions.sh: 0,001
/home/tomi/.bashrc.d/50_git_dotfiles.sh: 0,008
/home/tomi/.bashrc.d/50_mc.sh: 0,000
/home/tomi/.bashrc.d/90_fzf.sh: 0,011
The 118 ms in 10_env.sh was caused by man -w, and we know what to do with that.
completions
11 ms goes to 31_completion.sh, which loads bash-completion. That’s certainly better than Daniel’s 235 ms, probably because up-to-date bash-completion only loads a few necessary completions and defers everything else until it is needed. I couldn’t live without the completions, so 11 ms is a fair price.
8 ms for 50_git_dotfiles.sh, which defines a few aliases and sets up git completions for my git-dotfiles alias, seems too much, though. The good news is that we don’t need to drop this: we can use bash-completion’s on-demand loading. Whenever completions for command cmd are needed for the first time, bash-completion looks for ~/.local/share/bash-completion/completions/cmd or /usr/share/bash-completion/completions/cmd.
Therefore, ~/.local/share/bash-completion/completions/git-dotfiles becomes:
. /usr/share/bash-completion/completions/git
complete -F _git git-dotfiles
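To make the lookup order concrete, here is a simplified model of it, checking only the two directories named above. It is a sketch for illustration, not bash-completion's actual code, and find_completion_file is a made-up name.

```shell
#!/usr/bin/env bash
# find_completion_file CMD: print the first readable completion file for CMD,
# checking the per-user directory before the system-wide one.
# A simplified model of the lookup described in the post, not bash-completion's code.
find_completion_file() {
    local cmd=$1 dir
    for dir in "$HOME/.local/share/bash-completion/completions" \
               /usr/share/bash-completion/completions; do
        if [[ -r $dir/$cmd ]]; then
            printf '%s\n' "$dir/$cmd"
            return 0
        fi
    done
    return 1
}
```

With the file above in place, find_completion_file git-dotfiles should print the path under ~/.local/share. After the first Tab press on git-dotfiles in a real shell, complete -p git-dotfiles shows whether the completion got registered.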
fzf
90_fzf.sh loads key bindings and completions code so that fzf is used when searching through history, completing ** in filenames, etc. Well worth the 11 ms it needs to load [2].
are we done yet?
After these changes, I got:
[tomi@notes ~]$ __bashrc_bench=1 bash -i
/home/tomi/.bashrc.d/10_env.sh: 0,001
/home/tomi/.bashrc.d/20_history.sh: 0,000
/home/tomi/.bashrc.d/20_prompt.sh: 0,002
/home/tomi/.bashrc.d/30_completion_git.sh: 0,000
/home/tomi/.bashrc.d/31_completion.sh: 0,012
/home/tomi/.bashrc.d/50_aliases.sh: 0,002
/home/tomi/.bashrc.d/50_aliases_sudo.sh: 0,000
/home/tomi/.bashrc.d/50_functions.sh: 0,001
/home/tomi/.bashrc.d/50_git_dotfiles.sh: 0,000
/home/tomi/.bashrc.d/50_mc.sh: 0,000
/home/tomi/.bashrc.d/90_fzf.sh: 0,011
That’s 29 ms, brilliant! Or… is it? 🤔
[tomi@notes ~]$ hyperfine 'bash -i'
Benchmark #1: bash -i
Time (mean ± σ): 55.7 ms ± 1.0 ms [User: 47.6 ms, System: 11.1 ms]
Range (min … max): 54.8 ms … 58.9 ms 53 runs
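The 29 ms total can be double-checked by summing the per-file timings. A small sketch, assuming the "file: seconds" format printed by the loop above, including the decimal comma from my locale; sum_bench_ms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# sum_bench_ms: read "<file>: <seconds>" lines on stdin and print the total in
# whole milliseconds. Accepts a decimal comma as well as a decimal point.
sum_bench_ms() {
    awk -F': ' '
        { gsub(",", ".", $2); total += $2 }
        END { printf "%d\n", total * 1000 + 0.5 }
    '
}
```

Since time writes to stderr, something like __bashrc_bench=1 bash -i -c exit 2>&1 | sum_bench_ms should work, provided nothing else in .bashrc prints lines containing a colon followed by a space.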
history
Some of those additional 26 ms are spent reading my huge (HISTSIZE=50000) .bash_history file. I will skip the details about how I investigated this, because I didn’t: I stumbled upon this by chance while testing something else.
We can see that using an empty history file brings us down to a little under 40 ms:
[tomi@notes ~]$ HISTFILE=/tmp/.bash_history_tmp hyperfine 'bash -i'
Benchmark #1: bash -i
Time (mean ± σ): 38.6 ms ± 0.7 ms [User: 34.0 ms, System: 7.8 ms]
Range (min … max): 37.8 ms … 42.3 ms 75 runs
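To see how startup time scales with history size without touching the real .bash_history, one option is to benchmark against synthetic history files. A rough sketch; make_fake_history and the /tmp paths are hypothetical, and the dummy commands are far more uniform than a real history.

```shell
#!/usr/bin/env bash
# make_fake_history N FILE: write N dummy commands to FILE so it can stand in
# for a history file of a given length (hypothetical helper, not from the post).
make_fake_history() {
    local n=$1 out=$2 i
    for ((i = 1; i <= n; i++)); do
        printf 'echo dummy command %d\n' "$i"
    done > "$out"
}
```

For example, make_fake_history 50000 /tmp/hist_50000 followed by HISTFILE=/tmp/hist_50000 hyperfine 'bash -i' mirrors the measurement above. This only works if .bashrc does not set HISTFILE itself, and HISTSIZE must be at least as large as the file for all of it to be loaded.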
Now, cutting 17 ms by sacrificing the shell history is probably not a good deal for most people. I settled for setting up a systemd timer to back up .bash_history to git once a day and lowered HISTSIZE to 5000 [3]. This still keeps my bash startup below 40 ms:
[tomi@notes ~]$ hyperfine 'bash -i'
Benchmark #1: bash -i
Time (mean ± σ): 39.9 ms ± 0.5 ms [User: 36.1 ms, System: 6.8 ms]
Range (min … max): 39.1 ms … 42.1 ms 73 runs
Conclusion
By dropping the unnecessary invocation of man -w, deferring loading of git completions to when they’re needed, and shortening my shell history file, I managed to speed up bash startup from 165 ms to 40 ms.
Benchmark #1: bash -i
Time (mean ± σ): 165.8 ms ± 0.7 ms [User: 156.3 ms, System: 12.8 ms]
Range (min … max): 164.9 ms … 167.1 ms 17 runs
Benchmark #1: bash -i
Time (mean ± σ): 39.9 ms ± 0.5 ms [User: 36.1 ms, System: 6.8 ms]
Range (min … max): 39.1 ms … 42.1 ms 73 runs
More importantly, I no longer type before the prompt, even if I try!
And at this point I can finally agree with Daniel that further tweaking will only have diminishing returns [4]. 😊
Update 1: Why not fix typing before the prompt instead?
Redditor buttellmewhynot (pun intended) comments:
I feel like it shouldn't matter that the shell starts with a delay. If you start a shell, the computer should assume that you want further input directed there and queue somewhere to send it to the shell when it's up.
I understand that there's probably a lot of weird quirks about how terminals and shells work and how processes get created but surely there's a way to do this.
They're right on both points. The input is queued somewhere, and there is a way to fix the messed-up prompt. As some might suspect, zsh handles it fine: try running sleep 5 and typing some input in the meantime:
[Screenshots: zsh (left) and bash (right) after typing during sleep 5]
We can see that:
- in all cases, the input appears twice (bit annoying, but tolerable)
- zsh prompt is never messed up
- bash prompt is messed up if there's no newline after the input [5]
- no input is discarded, in contrast to the first image of this post
Turns out my PROMPT_COMMAND, which was meant to ensure the prompt always starts on a new line, was discarding the pending input. Zsh uses a different approach, printing $COLUMNS spaces and then a carriage return (explanation), which I don't like as it messes up copy/paste. But I managed to improve my solution to correctly detect pending input and not discard it.
It's not perfect (so I'll still try to keep bash startup fast), but it's definitely an improvement, and it will be useful whenever I get impatient with a slow command and start typing the next command before the prompt appears.
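For illustration, here is one way such a check could look. This is a sketch and not necessarily what my actual PROMPT_COMMAND does: it assumes bash 4 or newer, where read -t 0 only polls for available input without consuming it, and a terminal that answers the ESC [ 6 n cursor position query. Both function names are made up.

```shell
#!/usr/bin/env bash
# has_pending_input: succeed if unread input is already waiting on stdin.
# `read -t 0` only polls; it does not consume anything.
has_pending_input() {
    read -t 0
}

# Hypothetical PROMPT_COMMAND hook: print a newline when the cursor is not in
# the first column, but skip the check when the user has typed ahead, because
# reading the terminal's reply would swallow that input along with it.
newline_unless_typing() {
    has_pending_input && return 0
    local col
    printf '\e[6n' > /dev/tty    # ask the terminal for the cursor position
    # The reply has the form ESC [ row ; col R
    IFS='[;' read -r -s -d R -t 1 _ _ col < /dev/tty || return 0
    (( ${col:-1} > 1 )) && printf '\n'
    return 0
}
```

It would be hooked up with PROMPT_COMMAND=newline_unless_typing. A race remains: input typed between the poll and the terminal's reply can still get mixed into the reply.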
Thank you buttellmewhynot for nudging me in the correct direction.
[1] At the time of publishing this post, man -w no longer takes 100+ ms thanks to several performance improvements in libseccomp.
[2] At the time of publishing this post, the latest fzf release (0.24.3) takes twice as long to load (20+ ms). I fixed this in #2246 and #2250, but it might take a short while to be released and find its way to distributions.
[3] 5000 is a bit limiting in practice, as it rolls over in a few weeks. In 2020, you’d expect your shell to keep unlimited history without slowdown. I will address this in another post soon.
[4] Some people may be even more sensitive to latency than me, but measurements by Dan Luu suggest that at this scale there are other bottlenecks: Computer latency, Keyboard latency.
[5] GNU Readline assumes the prompt starts in the first column, so it gets more messed up later, e.g. when walking through history using ↑/↓.