Stop Guessing Why Linux Boots Slowly: Practical `systemd-analyze` for Real Bottlenecks

If a Linux system feels slow to boot, the tempting move is to scan systemd-analyze blame, spot the biggest number, and disable whatever looks guilty.

That works just often enough to be dangerous.

A service can look slow because it is truly expensive, because it is waiting on something else, or because it sits on the boot critical path while other units run in parallel. The useful question is not "what has the biggest number?" It is "what is actually delaying the target I care about?"

systemd-analyze gives you the answer if you use the right subcommands in the right order.

In this guide, I'll show a practical workflow to:

  • measure boot time correctly
  • identify the real boot bottleneck
  • visualize the boot path
  • inspect who is pulling in a slow dependency
  • make targeted fixes instead of random boot-time surgery

What systemd-analyze time really measures

Start with the baseline:

```shell
systemd-analyze time
```

Example output:

```
Startup finished in 3.415s (kernel) + 6.712s (userspace) = 10.128s
graphical.target reached after 6.492s in userspace.
```

This is useful, but it is narrower than many people assume.

According to the systemd-analyze(1) manual, this measures:

  • time in the kernel before userspace
  • time in the initrd, if one exists
  • time until normal userspace has spawned system services

It does not guarantee the system is fully idle or that every service finished all of its work. Treat it as a boot baseline, not a complete performance profile.
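If you're tracking boot time across many reboots, the summary line is easy to reduce to a single number with standard tools. A minimal sketch, assuming the output format shown above:

```shell
# Extract just the total from the first summary line, e.g. for logging
# boot-time trends across reboots. The regex matches the "= 10.128s" part.
systemd-analyze time | head -n 1 | grep -oE '= [0-9.]+s' | tr -d '= '
```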

Step 1: Use blame, but don't trust it blindly

Now list the slowest-starting units:

```shell
systemd-analyze blame | head -n 15
```

Example:

```
4.277s apt-daily.service
1.672s systemd-networkd-wait-online.service
1.653s apt-daily-upgrade.service
1.636s fstrim.service
1.567s cloud-init-main.service
```

This is a good shortlist, but it is not a causal graph.

The man page explicitly warns that blame can be misleading:

  • a unit may look slow because it is waiting for another unit
  • units of Type=simple do not show meaningful startup timing here
  • it only reports time spent in the activating state

So blame tells you what took time, not necessarily what delayed boot.
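One quick way to sanity-check a blame entry is to look at each unit's Type. A small sketch that pairs the slowest units with their service type (the unit names come from your own blame output):

```shell
# Print the service Type next to each of the slowest units from blame.
# Type=simple units are considered "started" immediately, so their blame
# figures say little about real startup cost.
systemd-analyze blame | head -n 5 | awk '{print $2}' | while read -r unit; do
  printf '%-45s Type=%s\n' "$unit" "$(systemctl show -p Type --value "$unit")"
done
```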

Step 2: Find the real blocker with critical-chain

This is the command that usually matters most:

```shell
systemd-analyze critical-chain
```

Example:

```
graphical.target @6.492s
└─multi-user.target @6.490s
  └─tailscaled.service @5.680s +806ms
    └─basic.target @5.558s
      └─sockets.target @5.556s
        └─uuidd.socket @5.554s
          └─sysinit.target @5.513s
            └─cloud-init-network.service @5.104s +395ms
              └─systemd-networkd-wait-online.service @3.427s +1.672s
```

How to read this:

  • @ = when the unit became active
  • + = how long that unit itself took to start

This shows the path that actually delayed the target. In the example above, systemd-networkd-wait-online.service is on the critical path. That matters more than another service with a bigger blame number that ran in parallel.

If you only use one command after time, make it critical-chain.
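critical-chain also accepts a unit argument, so you can ask the same question about one specific service instead of the default target. A sketch, using the example units from the output above:

```shell
# Show the chain that delayed one particular unit:
systemd-analyze critical-chain tailscaled.service

# Or filter the default output down to units that themselves took time
# (the lines carrying a "+" duration):
systemd-analyze critical-chain | grep '+'
```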

Step 3: Generate a boot chart you can inspect visually

For messy boots, a picture helps:

```shell
systemd-analyze plot > bootup.svg
xdg-open bootup.svg
```

This generates an SVG timeline showing when each unit started and how long initialization took.

Why this helps:

  • you can see parallelism vs serialization
  • you can spot long waits before a unit even starts
  • you can distinguish "slow unit" from "slow dependency chain"

If you're working over SSH, copy the file locally and open it in a browser.
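For remote machines you can skip the copy step entirely and let the SVG land on your workstation, assuming `server` is the host you're debugging:

```shell
# Run plot remotely and save the SVG locally in one step:
ssh server systemd-analyze plot > bootup.svg

# Quick sanity check that you actually got an SVG back:
head -c 200 bootup.svg | grep -q '<svg\|<?xml' && echo "looks like an SVG"
```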

Step 4: Identify who actually requested the slow thing

A common boot delay is network-online.target or a wait-online service. The right fix is often not disabling it globally. The right fix is finding what needs it.

First inspect the reverse dependencies:

```shell
systemctl list-dependencies --reverse --no-pager network-online.target
```

Example:

```
network-online.target
● ├─cloud-config.service
● ├─cloud-final.service
● └─exim4.service
```

Then inspect the target itself:

```shell
systemctl show -p Wants -p Requires -p Before -p After network-online.target
```

Example:

```
Requires=
Wants=systemd-networkd-wait-online.service
Before=apt-daily.service cloud-final.service exim4.service
After=systemd-networkd-wait-online.service cloud-init-network.service network.target
```

This is where the diagnosis gets real:

  • if nothing important depends on network-online.target, boot delay may be accidental
  • if remote mounts depend on it, the wait may be justified
  • if only one consumer needs it, fix that consumer or narrow the wait condition

The systemd.special(7) manual makes an important distinction here:

  • network.target is a passive synchronization point and usually does not add much delay
  • network-online.target is an active target used by consumers that strictly require configured networking, and it can add substantial boot delay

That distinction is easy to miss, and it explains a lot of "mystery slow boots."
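To see every installed unit that opts into the active target, you can also search the unit files directly. A minimal sketch, assuming the usual unit-file locations (paths vary slightly by distro):

```shell
# List unit files that declare a dependency on network-online.target.
# These are the consumers that can make boot wait on networking.
grep -rl 'network-online\.target' \
  /usr/lib/systemd/system /etc/systemd/system 2>/dev/null | sort -u
```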

Step 5: Fix the dependency, not the symptom

Let's use a very common example: systemd-networkd-wait-online.service.

The systemd-networkd-wait-online.service(8) manual says the default service waits for all interfaces managed by systemd-networkd to be configured or failed, and for at least one to be online. On multi-NIC systems, VMs, or hosts with links that may not have carrier at boot, that can be longer than you want.

Safer fix pattern A: wait only for the interface that matters

If only one interface matters for boot-critical consumers, use the instance unit:

```shell
sudo systemctl disable systemd-networkd-wait-online.service
sudo systemctl enable systemd-networkd-wait-online@eth0.service
sudo systemctl reboot
```

That switches from "wait for everything" to "wait for eth0."

Safer fix pattern B: override the wait behavior

Create an override:

```shell
sudo systemctl edit systemd-networkd-wait-online.service
```

Use something like this:

```ini
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online --any --interface=eth0 --timeout=15
```

Then reload and test on the next boot:

```shell
sudo systemctl daemon-reload
sudo systemctl reboot
```

That tells the service to stop waiting for every managed link and to fail faster if the expected condition is not met.
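Before rebooting, it's worth confirming the drop-in actually merged the way you intended:

```shell
# Show the unit plus any drop-ins, so you can see the override applied:
systemctl cat systemd-networkd-wait-online.service

# The effective ExecStart should now carry your flags:
systemctl show -p ExecStart --value systemd-networkd-wait-online.service
```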

Important warning

Do not blindly remove wait-online behavior on systems that need:

  • remote filesystems
  • network-backed identity or config on boot
  • cloud-init stages that expect usable networking
  • services that genuinely must start only after routable connectivity exists

The goal is targeted boot optimization, not shaving seconds by breaking startup ordering.
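A quick way to check whether a machine falls into one of those categories is to look for network-backed mounts and the units hanging off remote-fs.target. A sketch:

```shell
# Network filesystems in fstab are a strong signal the wait is justified:
grep -E '_netdev|[[:space:]](nfs|nfs4|cifs)[[:space:]]' /etc/fstab 2>/dev/null

# Units ordered around remote filesystems:
systemctl list-dependencies --no-pager remote-fs.target
```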

Step 6: Re-measure after every change

After each boot change, run the same small checklist:

```shell
systemd-analyze time
systemd-analyze blame | head -n 15
systemd-analyze critical-chain
```

If you want a before/after record:

```shell
mkdir -p ~/boot-profiles
stamp=$(date +%F-%H%M%S)
{
  echo "## $stamp"
  systemd-analyze time
  echo
  systemd-analyze blame | head -n 20
  echo
  systemd-analyze critical-chain
} > ~/boot-profiles/"$stamp".txt
```

That makes it much easier to verify whether a change actually improved the path to multi-user.target or graphical.target.
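With two or more snapshots recorded, a plain diff of the newest pair shows exactly which numbers moved. A sketch, assuming the ~/boot-profiles layout from the script above:

```shell
# Diff the two most recent profiles, older one first.
# Note: diff exits nonzero when the files differ, which is expected here.
ls -1t ~/boot-profiles/*.txt | head -n 2 | tac | xargs diff -u
```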

A practical workflow that holds up

When a Linux boot feels slow, this is the sequence I trust:

```shell
systemd-analyze time
systemd-analyze blame | head -n 15
systemd-analyze critical-chain
systemd-analyze plot > bootup.svg
systemctl list-dependencies --reverse --no-pager network-online.target
```

That flow answers five different questions:

  1. How long did boot take?
  2. Which units consumed time?
  3. Which chain delayed the final target?
  4. What did parallel startup actually look like?
  5. Which unit asked for the expensive dependency?

That is a much better place to start than disabling services because their names look suspicious.

Final thought

Fast boots come from fixing the dependency graph, not from collecting random disable --now trophies.

systemd-analyze blame is a hint. critical-chain is the diagnosis. The SVG plot is the sanity check.

Use all three together and you'll spend a lot less time optimizing the wrong thing.
