Stop Guessing Why Linux Boots Slowly: Practical `systemd-analyze` for Real Bottlenecks

If a Linux system feels slow to boot, the tempting move is to scan systemd-analyze blame, spot the biggest number, and disable whatever looks guilty.

That works just often enough to be dangerous.

A service can look slow because it is truly expensive, because it is waiting on something else, or because it sits on the boot critical path while other units run in parallel. The useful question is not "what has the biggest number?" It is "what is actually delaying the target I care about?"

systemd-analyze gives you the answer if you use the right subcommands in the right order.

In this guide, I'll show a practical workflow to:

  • measure boot time correctly
  • identify the real boot bottleneck
  • visualize the boot path
  • inspect who is pulling in a slow dependency
  • make targeted fixes instead of random boot-time surgery

What systemd-analyze time really measures

Start with the baseline:

```shell
systemd-analyze time
```

Example output:

```
Startup finished in 3.415s (kernel) + 6.712s (userspace) = 10.128s
graphical.target reached after 6.492s in userspace.
```

This is useful, but it is narrower than many people assume.

According to the systemd-analyze(1) manual, this measures:

  • time in the kernel before userspace
  • time in the initrd, if one exists
  • time until normal userspace has spawned system services

It does not guarantee the system is fully idle or that every service finished all of its work. Treat it as a boot baseline, not a complete performance profile.
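If you're tracking boot time across many reboots, the summary line is easy to reduce to a single number with standard tools. A minimal sketch, assuming the output format shown above:

```shell
# Extract just the total from the first summary line, e.g. for logging
# boot-time trends across reboots. The regex matches the "= 10.128s" part.
systemd-analyze time | head -n 1 | grep -oE '= [0-9.]+s' | tr -d '= '
```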

Step 1: Use blame, but don't trust it blindly

Now list the slowest-starting units:

```shell
systemd-analyze blame | head -n 15
```

Example:

```
4.277s apt-daily.service
1.672s systemd-networkd-wait-online.service
1.653s apt-daily-upgrade.service
1.636s fstrim.service
1.567s cloud-init-main.service
```

This is a good shortlist, but it is not a causal graph.

The man page explicitly warns that blame can be misleading:

  • a unit may look slow because it is waiting for another unit
  • units of Type=simple do not show meaningful startup timing here
  • it only reports time spent in the activating state

So blame tells you what took time, not necessarily what delayed boot.
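One quick way to sanity-check a blame entry is to look at each unit's Type. A small sketch that pairs the slowest units with their service type (the unit names come from your own blame output):

```shell
# Print the service Type next to each of the slowest units from blame.
# Type=simple units are considered "started" immediately, so their blame
# figures say little about real startup cost.
systemd-analyze blame | head -n 5 | awk '{print $2}' | while read -r unit; do
  printf '%-45s Type=%s\n' "$unit" "$(systemctl show -p Type --value "$unit")"
done
```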

Step 2: Find the real blocker with critical-chain

This is the command that usually matters most:

```shell
systemd-analyze critical-chain
```

Example:

```
graphical.target @6.492s
└─multi-user.target @6.490s
  └─tailscaled.service @5.680s +806ms
    └─basic.target @5.558s
      └─sockets.target @5.556s
        └─uuidd.socket @5.554s
          └─sysinit.target @5.513s
            └─cloud-init-network.service @5.104s +395ms
              └─systemd-networkd-wait-online.service @3.427s +1.672s
```

How to read this:

  • @ = when the unit became active
  • + = how long that unit itself took to start

This shows the path that actually delayed the target. In the example above, systemd-networkd-wait-online.service is on the critical path. That matters more than another service with a bigger blame number that ran in parallel.

If you only use one command after time, make it critical-chain.
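critical-chain also accepts a unit argument, so you can ask the same question about one specific service instead of the default target. A sketch, using the example units from the output above:

```shell
# Show the chain that delayed one particular unit:
systemd-analyze critical-chain tailscaled.service

# Or filter the default output down to units that themselves took time
# (the lines carrying a "+" duration):
systemd-analyze critical-chain | grep '+'
```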

Step 3: Generate a boot chart you can inspect visually

For messy boots, a picture helps:

```shell
systemd-analyze plot > bootup.svg
xdg-open bootup.svg
```

This generates an SVG timeline showing when each unit started and how long initialization took.

Why this helps:

  • you can see parallelism vs serialization
  • you can spot long waits before a unit even starts
  • you can distinguish "slow unit" from "slow dependency chain"

If you're working over SSH, copy the file locally and open it in a browser.
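For remote machines you can skip the copy step entirely and let the SVG land on your workstation, assuming `server` is the host you're debugging:

```shell
# Run plot remotely and save the SVG locally in one step:
ssh server systemd-analyze plot > bootup.svg

# Quick sanity check that you actually got an SVG back:
head -c 200 bootup.svg | grep -q '<svg\|<?xml' && echo "looks like an SVG"
```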

Step 4: Identify who actually requested the slow thing

A common boot delay is network-online.target or a wait-online service. The right fix is often not disabling it globally. The right fix is finding what needs it.

First inspect the reverse dependencies:

```shell
systemctl list-dependencies --reverse --no-pager network-online.target
```

Example:

```
network-online.target
● ├─cloud-config.service
● ├─cloud-final.service
● └─exim4.service
```

Then inspect the target itself:

```shell
systemctl show -p Wants -p Requires -p Before -p After network-online.target
```

Example:

```
Requires=
Wants=systemd-networkd-wait-online.service
Before=apt-daily.service cloud-final.service exim4.service
After=systemd-networkd-wait-online.service cloud-init-network.service network.target
```

This is where the diagnosis gets real:

  • if nothing important depends on network-online.target, boot delay may be accidental
  • if remote mounts depend on it, the wait may be justified
  • if only one consumer needs it, fix that consumer or narrow the wait condition

The systemd.special(7) manual makes an important distinction here:

  • network.target is a passive synchronization point and usually does not add much delay
  • network-online.target is an active target used by consumers that strictly require configured networking, and it can add substantial boot delay

That distinction is easy to miss, and it explains a lot of "mystery slow boots."
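To see every installed unit that opts into the active target, you can also search the unit files directly. A minimal sketch, assuming the usual unit-file locations (paths vary slightly by distro):

```shell
# List unit files that declare a dependency on network-online.target.
# These are the consumers that can make boot wait on networking.
grep -rl 'network-online\.target' \
  /usr/lib/systemd/system /etc/systemd/system 2>/dev/null | sort -u
```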

Step 5: Fix the dependency, not the symptom

Let's use a very common example: systemd-networkd-wait-online.service.

The systemd-networkd-wait-online.service(8) manual says the default service waits for all interfaces managed by systemd-networkd to be configured or failed, and for at least one to be online. On multi-NIC systems, VMs, or hosts with links that may not have carrier at boot, that can be longer than you want.

Safer fix pattern A: wait only for the interface that matters

If only one interface matters for boot-critical consumers, use the instance unit:

```shell
sudo systemctl disable systemd-networkd-wait-online.service
sudo systemctl enable systemd-networkd-wait-online@eth0.service
sudo systemctl reboot
```

That switches from "wait for everything" to "wait for eth0."

Safer fix pattern B: override the wait behavior

Create an override:

```shell
sudo systemctl edit systemd-networkd-wait-online.service
```

Use something like this:

```ini
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any --interface=eth0 --timeout=15
```

Then reload and test on the next boot:

```shell
sudo systemctl daemon-reload
sudo systemctl reboot
```

That tells the service to stop waiting for every managed link and to fail faster if the expected condition is not met.
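Before rebooting, it's worth confirming the drop-in actually merged the way you intended:

```shell
# Show the unit plus any drop-ins, so you can see the override applied:
systemctl cat systemd-networkd-wait-online.service

# The effective ExecStart should now carry your flags:
systemctl show -p ExecStart --value systemd-networkd-wait-online.service
```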

Important warning

Do not blindly remove wait-online behavior on systems that need:

  • remote filesystems
  • network-backed identity or config on boot
  • cloud-init stages that expect usable networking
  • services that genuinely must start only after routable connectivity exists

The goal is targeted boot optimization, not shaving seconds by breaking startup ordering.
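A quick way to check whether a machine falls into one of those categories is to look for network-backed mounts and the units hanging off remote-fs.target. A sketch:

```shell
# Network filesystems in fstab are a strong signal the wait is justified:
grep -E '_netdev|[[:space:]](nfs|nfs4|cifs)[[:space:]]' /etc/fstab 2>/dev/null

# Units ordered around remote filesystems:
systemctl list-dependencies --no-pager remote-fs.target
```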

Step 6: Re-measure after every change

After each boot change, run the same small checklist:

```shell
systemd-analyze time
systemd-analyze blame | head -n 15
systemd-analyze critical-chain
```

If you want a before/after record:

```shell
mkdir -p ~/boot-profiles
stamp=$(date +%F-%H%M%S)
{
  echo "## $stamp"
  systemd-analyze time
  echo
  systemd-analyze blame | head -n 20
  echo
  systemd-analyze critical-chain
} > ~/boot-profiles/"$stamp".txt
```

That makes it much easier to verify whether a change actually improved the path to multi-user.target or graphical.target.
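With two or more snapshots recorded, a plain diff of the newest pair shows exactly which numbers moved. A sketch, assuming the ~/boot-profiles layout from the script above:

```shell
# Diff the two most recent profiles, older one first.
# Note: diff exits nonzero when the files differ, which is expected here.
ls -1t ~/boot-profiles/*.txt | head -n 2 | tac | xargs diff -u
```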

A practical workflow that holds up

When a Linux boot feels slow, this is the sequence I trust:

```shell
systemd-analyze time
systemd-analyze blame | head -n 15
systemd-analyze critical-chain
systemd-analyze plot > bootup.svg
systemctl list-dependencies --reverse --no-pager network-online.target
```

That flow answers five different questions:

  1. How long did boot take?
  2. Which units consumed time?
  3. Which chain delayed the final target?
  4. What did parallel startup actually look like?
  5. Which unit asked for the expensive dependency?

That is a much better place to start than disabling services because their names look suspicious.

Final thought

Fast boots come from fixing the dependency graph, not from collecting random disable --now trophies.

systemd-analyze blame is a hint. critical-chain is the diagnosis. The SVG plot is the sanity check.

Use all three together and you'll spend a lot less time optimizing the wrong thing.
