Ranaweera

Posted on Jun 3

Creating Robust systemd Services for Embedded Applications

#systemd #linux #embedded #devops

There is a moment every embedded Linux developer hits eventually. You have spent days building something that works beautifully — a sensor pipeline, a streaming server, an MQTT client — and then you reboot the device and everything is silent. Nothing started. You SSH in, manually run your script, and it all comes back to life. The hardware is fine. Your code is fine. You just have no way of automatically running it.

That is the gap systemd fills. It is the init system on virtually every modern Linux distribution, and on embedded Linux systems like the Raspberry Pi it is what decides what runs at boot, what gets restarted if it crashes, and where all the logs go. Once you understand how to write a service file, your applications stop being fragile scripts you need to babysit and start being first-class system services that survive reboots, network drops, and unexpected crashes.

This tutorial builds up from the simplest possible service file to a production-ready configuration, explaining every line along the way. By the end you will have a service running your own Python application, logging to the system journal, and automatically restarting itself after failures.

See the complete tutorial on GitHub: Systemd Services Tutorial

What systemd Actually Does

Before writing any configuration, it helps to understand what problem systemd is solving, because the design of service files makes much more sense once you see the underlying model.

When your Raspberry Pi boots, the Linux kernel starts and immediately hands control to process ID 1 — the very first user-space process. On modern systems, that process is systemd. Everything that happens next — mounting filesystems, bringing up the network, starting your application — is orchestrated by systemd. It reads configuration files called unit files that describe what should be started, when, in what order, and what to do if something goes wrong.

A service file is just one type of unit file (there are also unit files for timers, sockets, mount points, and more, but services are what you will use most). When you tell systemd about your application through a service file, you are essentially saying: "here is my program, here is when I want it to run, and here is how I want you to manage it." systemd takes it from there — starting it at boot, watching it, restarting it if it dies, and capturing everything it prints to stdout and stderr into a structured log called the journal.

The central tool for interacting with systemd is systemctl. You use it to start, stop, enable, disable, and inspect services. The companion tool journalctl gives you access to the journal — the logs that systemd collects from every service it manages.

A Minimal Service File

Let us start with the simplest possible service to understand the structure, then build from there. Suppose you have a Python script at /home/pi/mqtt_client/production_client.py that you want to run automatically at boot.

Service files live in /etc/systemd/system/. Create a new one:

sudo nano /etc/systemd/system/mqtt-client.service

Here is the minimal version:

[Unit]
Description=Production MQTT Edge Client

[Service]
ExecStart=/usr/bin/python3 /home/pi/mqtt_client/production_client.py

[Install]
WantedBy=multi-user.target

Even this tiny file has three sections and each one has a specific job. The [Unit] section contains metadata and dependency declarations — the Description is just a human-readable label that appears in logs and status output. The [Service] section is where the actual execution configuration lives — right now we only have ExecStart, which is the command that launches your program. The [Install] section controls how the service integrates into the boot process — WantedBy=multi-user.target means "start this service when the system reaches the normal multi-user state", which is essentially "start this at boot when the system is ready for normal operation."

To activate it, you need to do two things: reload systemd so it picks up the new file, and enable the service so it starts at boot:

# Tell systemd to re-read all unit files
sudo systemctl daemon-reload

# Enable it to start at boot
sudo systemctl enable mqtt-client.service

# Start it right now without waiting for a reboot
sudo systemctl start mqtt-client.service

Check whether it is running:

sudo systemctl status mqtt-client.service

You will see output that tells you the service state (active (running) is what you want), the process ID, when it started, and the last few lines of log output. If it failed, the status output will usually tell you exactly why: an incorrect path, a Python import error, whatever the problem is.

The difference between enable and start trips people up at first, so it is worth being explicit. enable creates a symlink that tells systemd to start the service at boot; it does not start it right now. start launches it immediately, but that does not persist across reboots. In practice you almost always want both, and sudo systemctl enable --now mqtt-client.service does both in one step.

Making it Actually Robust: Restart Policies

The minimal service above will start your application at boot, but if the application crashes — which happens, especially in edge environments with unreliable hardware or network — systemd will not do anything. It will just leave the service in a failed state. For an embedded device running unattended in the field, that is not acceptable.

The Restart directive in the [Service] section tells systemd what to do when your application exits. Here is what the most useful options mean in practice. no means never restart (the default, and not what you want for production). on-failure means restart if the process exits with a non-zero exit code, is killed by a signal such as SIGSEGV or SIGKILL, or hits a systemd timeout or watchdog limit, but not if it exits cleanly with code 0. always means restart no matter how the process exited, even with code 0. on-abnormal means restart after a fatal signal, a timeout, or a watchdog failure, but not when the process exits on its own, whatever the exit code.

For most embedded applications, on-failure is the right choice. It means "if something goes wrong and the program dies unexpectedly, bring it back", while a deliberate clean exit is left alone. Stopping the service with systemctl stop never triggers a restart, whichever Restart setting you choose.
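The restart rules are easier to compare as a lookup table. The sketch below encodes the outcome groups described in the systemd.service documentation; the constant and function names are illustrative and not part of any systemd API.

```python
# Exit outcomes, grouped the way the systemd.service documentation groups them.
CLEAN = "clean exit"                 # exit code 0, or SIGHUP/SIGINT/SIGTERM/SIGPIPE
UNCLEAN_EXIT = "unclean exit code"   # any non-zero exit code
UNCLEAN_SIGNAL = "unclean signal"    # e.g. SIGSEGV or SIGKILL
TIMEOUT = "timeout"                  # a start or stop timeout expired
WATCHDOG = "watchdog"                # the watchdog deadline was missed

# For each Restart= value, the set of outcomes that trigger a restart.
RESTART_TABLE = {
    "no": set(),
    "always": {CLEAN, UNCLEAN_EXIT, UNCLEAN_SIGNAL, TIMEOUT, WATCHDOG},
    "on-success": {CLEAN},
    "on-failure": {UNCLEAN_EXIT, UNCLEAN_SIGNAL, TIMEOUT, WATCHDOG},
    "on-abnormal": {UNCLEAN_SIGNAL, TIMEOUT, WATCHDOG},
    "on-abort": {UNCLEAN_SIGNAL},
    "on-watchdog": {WATCHDOG},
}


def would_restart(policy: str, outcome: str) -> bool:
    """Return True if the given Restart= policy restarts after this outcome."""
    return outcome in RESTART_TABLE[policy]
```

One practical consequence: if your Python script calls sys.exit(1) on a fatal error, on-failure restarts it, while on-abnormal does not.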

There is one more important setting to pair with Restart: RestartSec, which sets how long to wait before restarting. Without it, systemd waits only 100 ms (the default) before restarting, which can cause problems if your service is crashing due to a dependency that is not ready yet (like the network). A short delay, even two or three seconds, gives things time to settle.

Here is the updated service with restart policy:

[Unit]
Description=Production MQTT Edge Client

[Service]
ExecStart=/usr/bin/python3 /home/pi/mqtt_client/production_client.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Declaring Dependencies

This is where systemd really starts to earn its keep. Most embedded applications do not run in isolation — they depend on the network being up, a filesystem being mounted, or another service being ready. systemd lets you express these dependencies explicitly so that your service starts at the right time and in the right order.

The two most commonly confused directives here are After and Requires. Think of them as answering different questions. After=network.target answers the question "when should I start?": it is purely an ordering constraint that tells systemd not to launch this service until the network target has been reached, and on its own it does not pull that unit in. Requires=network.target answers the question "what must exist for me to function?": it pulls the unit in, and if that unit is stopped (or, when combined with After, fails to start), this service is stopped too.

You can use them together, and for network-dependent applications you almost always should. There is also Wants, which is a softer version of Requires — it expresses preference rather than hard dependency. If what Wants declares is not available, the service will still start rather than failing.

For an MQTT client that needs the network to do anything useful:

[Unit]
Description=Production MQTT Edge Client
After=network.target
Wants=network.target

[Service]
ExecStart=/usr/bin/python3 /home/pi/mqtt_client/production_client.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

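Wants rather than Requires here is deliberate. The MQTT client already handles network loss internally: it has exponential backoff and an offline queue for exactly this scenario. So we do not want the service's lifetime tied to the network unit, which is what Requires would do; we just want to make sure we do not start before the network stack is even initialised. Note that network.target only signals that the stack is up, not that a connection is configured. If your application needs a working connection at startup, network-online.target is the unit to order after.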
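The article does not show the client's reconnect logic, so the following is only a minimal sketch of what exponential backoff inside the application could look like. The function names, the default delays, and the connect callable are all hypothetical.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless series of delays: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_backoff(connect, max_attempts=5, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping longer after each OSError.

    `connect` is any zero-argument callable supplied by the caller.
    The last failure is re-raised so the process can exit non-zero.
    """
    for attempt, delay in zip(range(1, max_attempts + 1), backoff_delays()):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(delay)
```

Letting the final exception propagate makes the process exit with a non-zero code, which hands the problem back to systemd and its Restart=on-failure policy.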

For cases where your application would be completely broken without a dependency — a database-backed service where the database is also managed by systemd, for example — use Requires. The distinction matters in practice.

Working Directory and Environment

Two things that bite almost every developer on their first real systemd deployment are working directory and environment variables. When you run a script manually from your terminal, you have a working directory (usually your home folder) and a full set of environment variables inherited from your shell. systemd gives you neither by default: a system service starts with / as its working directory and only a minimal environment.

This matters because Python scripts often use relative paths like ./config.json or ./logs/, and those paths will fail when the working directory is not what you expect. Environment variables like PATH, HOME, and any custom variables your application reads from the environment will also be missing or wrong.

WorkingDirectory solves the path problem, and Environment or EnvironmentFile solves the variable problem:

[Unit]
Description=Production MQTT Edge Client
After=network.target
Wants=network.target

[Service]
# Set the working directory so relative paths work correctly
WorkingDirectory=/home/pi/mqtt_client

# Pass environment variables directly
Environment=MQTT_BROKER_HOST=192.168.1.100
Environment=MQTT_CLIENT_ID=warehouse_sensor_01

# Or load them from a file (better for secrets)
# EnvironmentFile=/etc/mqtt-client/config.env

ExecStart=/usr/bin/python3 /home/pi/mqtt_client/production_client.py
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

The EnvironmentFile approach is worth knowing about even if you do not use it immediately. It lets you store configuration, including secrets like passwords, in a separate file outside the service file itself. A file like /etc/mqtt-client/config.env contains simple KEY=value lines and is loaded by systemd before starting the service. This keeps credentials out of version control and makes configuration changes possible without touching the service file. Because the file may hold credentials, it is worth restricting its permissions (for example, owned by root with mode 600).
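On the application side, values set through Environment or EnvironmentFile arrive as ordinary environment variables. Here is a minimal sketch of how the Python script might read them. MQTT_BROKER_HOST and MQTT_CLIENT_ID come from the unit file above, while MQTT_BROKER_PORT, the defaults, and the ClientConfig class are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    broker_host: str
    broker_port: int
    client_id: str


def load_config(environ=None) -> ClientConfig:
    """Build the configuration from environment variables.

    Pass a plain dict as `environ` to test without touching os.environ.
    """
    env = os.environ if environ is None else environ
    if "MQTT_BROKER_HOST" not in env:
        # SystemExit with a message prints it to stderr and exits with code 1,
        # so the reason shows up in the journal.
        raise SystemExit(
            "MQTT_BROKER_HOST is not set; check Environment= or EnvironmentFile="
        )
    return ClientConfig(
        broker_host=env["MQTT_BROKER_HOST"],
        # Hypothetical variable; 1883 is the standard unencrypted MQTT port.
        broker_port=int(env.get("MQTT_BROKER_PORT", "1883")),
        client_id=env.get("MQTT_CLIENT_ID", "unnamed_client"),
    )
```

Failing fast with a clear message when a required variable is missing makes a misconfigured unit much easier to diagnose from systemctl status.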

Logging

One of systemd's genuine gifts to embedded developers is centralised logging. Everything your application prints to stdout and stderr is automatically captured by the journal, with no log file configuration required and no rotation to set up. The journal is structured, queryable in powerful ways, and persistent across reboots when persistent storage is enabled (Storage=persistent in journald.conf, or an existing /var/log/journal directory).

The basic command to read logs from your service is:

# Show all logs from your service
journalctl -u mqtt-client.service

# Follow logs in real-time (like tail -f)
journalctl -u mqtt-client.service -f

# Show only logs since the last boot
journalctl -u mqtt-client.service -b

# Show logs from the last hour
journalctl -u mqtt-client.service --since "1 hour ago"
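Two Python-specific details are worth knowing here. First, Python buffers stdout when it is not attached to a terminal, so print output can reach the journal late unless you run python3 -u or set PYTHONUNBUFFERED=1. Second, systemd reads an optional <N> priority prefix at the start of each output line, which lets journalctl filter by severity with -p. The sketch below uses the standard logging module to write prefixed lines to stderr; the class name, function name, and message format are illustrative choices.

```python
import logging
import sys

# Map Python logging levels to syslog priorities understood by the journal.
_PRIORITY = {
    logging.DEBUG: 7,
    logging.INFO: 6,
    logging.WARNING: 4,
    logging.ERROR: 3,
    logging.CRITICAL: 2,
}


class JournalPrefixFormatter(logging.Formatter):
    """Prefix each record with <N> so the journal stores its priority."""

    def format(self, record):
        priority = _PRIORITY.get(record.levelno, 6)
        return f"<{priority}>{super().format(record)}"


def make_logger(name="mqtt-client", stream=None):
    """Return a logger that writes prefixed lines to stderr (or `stream`)."""
    handler = logging.StreamHandler(sys.stderr if stream is None else stream)
    handler.setFormatter(JournalPrefixFormatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]   # replace any handlers from earlier calls
    logger.propagate = False      # avoid duplicate lines via the root logger
    return logger
```

With this in place, journalctl -u mqtt-client.service -p warning shows only warnings and errors. Note that only the first line of a multi-line message (such as a traceback) carries the prefix.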

For embedded devices with limited storage, it is worth explicitly configuring how much disk space the journal is allowed to use. You do this not in the service file but in the journal configuration:

sudo nano /etc/systemd/journald.conf

Add or modify these lines:

[Journal]
# Maximum journal size on disk
SystemMaxUse=50M

# Maximum size of a single journal file
SystemMaxFileSize=10M

Then restart the journal daemon:

sudo systemctl restart systemd-journald

For embedded deployments where the device has a small SD card or eMMC, keeping the journal small is important. Fifty megabytes is a reasonable limit for most edge devices.

A Production-Ready Example

Putting all of this together, here is what the service file looks like for a real deployment — the kind of configuration you would use for a long-running embedded application that needs to be reliable in the field. This is modelled after exactly how I deploy the MQTT edge client and the WebRTC streaming service on the telepresence robot:

[Unit]
Description=Production MQTT Edge Client with Offline Resilience
# Link to the project documentation, shown in systemctl status output
Documentation=https://github.com/ranaweerasupun/mqtt-production-client

# Start after the network is available, but do not require it —
# the application handles network loss internally with its own
# backoff and offline queue
After=network.target
Wants=network.target

# If the service restarts more than 5 times in 60 seconds, give up
# This prevents an infinite crash-restart-crash loop from overwhelming the system
# (these belong in [Unit] on systemd v230+)
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Run as a specific user rather than root — better security practice
User=pi
Group=pi

# Set working directory so relative paths in the application work
WorkingDirectory=/home/pi/mqtt_client

# Load configuration from a separate file
# This keeps secrets out of the service file
EnvironmentFile=/etc/mqtt-client/config.env

# The command that starts the application
ExecStart=/usr/bin/python3 /home/pi/mqtt_client/production_client.py

# What to do when the process exits unexpectedly
# on-failure: restart if the process crashes or exits non-zero
# but NOT if you run 'systemctl stop mqtt-client'
Restart=on-failure

# Wait 5 seconds before restarting
# This prevents rapid restart loops if something is fundamentally broken
RestartSec=5

# Give the service up to 30 seconds to stop gracefully before killing it
TimeoutStopSec=30

# Capture stdout and stderr to the journal
StandardOutput=journal
StandardError=journal

# Tag journal entries with this identifier for easy filtering
SyslogIdentifier=mqtt-client

[Install]
WantedBy=multi-user.target

The StartLimitIntervalSec and StartLimitBurst combination is worth understanding because it solves a real problem. Imagine your MQTT client has a bug that makes it crash immediately on startup, perhaps a malformed config file or a missing dependency. systemd's default limit is five starts within ten seconds, which a service that waits five seconds between restarts (RestartSec=5) never reaches. So without explicit limits, systemd would restart it, it would crash again, systemd would restart it again, and this would loop forever, consuming CPU and filling your journal with crash logs. With StartLimitBurst=5 and StartLimitIntervalSec=60, systemd allows five starts within a 60-second window; once those are used up, it marks the service as failed and stops trying. At that point systemctl status will clearly tell you the service has hit its start limit, which prompts you to actually investigate the root cause.
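To see how RestartSec interacts with these limits, it helps to simulate a service that crashes the instant it starts. The sketch below is a simplified model of the start counter (a fixed window that opens at the first start), not systemd's actual implementation, and the function name is made up for illustration.

```python
def first_refused_start(restart_sec, interval_sec, burst, horizon_sec=3600.0):
    """Return the time (seconds) at which a start is first refused, or None.

    Assumes the process crashes immediately after every start, so the next
    start attempt happens exactly `restart_sec` seconds after the previous one.
    """
    if restart_sec <= 0:
        raise ValueError("restart_sec must be positive in this model")
    window_start = None
    starts_in_window = 0
    now = 0.0
    while now <= horizon_sec:
        # Open a fresh window on the first start or once the interval elapsed.
        if window_start is None or now - window_start >= interval_sec:
            window_start = now
            starts_in_window = 0
        if starts_in_window >= burst:
            return now  # limit hit: the unit would enter the failed state
        starts_in_window += 1
        now += restart_sec
    return None  # the limit was never reached within the horizon


# Settings from the unit file above: five starts at t=0..20, refused at t=25.
print(first_refused_start(restart_sec=5, interval_sec=60, burst=5))
# systemd's default window (5 starts in 10 s) with RestartSec=5: never reached.
print(first_refused_start(restart_sec=5, interval_sec=10, burst=5))
```

Under this model, the rule of thumb is that the limit can only trigger when RestartSec multiplied by StartLimitBurst is smaller than StartLimitIntervalSec.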

The User=pi directive is also important for production embedded deployments. Running services as root is a security risk — if your MQTT client has a vulnerability, an attacker who exploits it gets root access. Running as a non-privileged user limits the damage. The tradeoff is that you need to make sure that user has permission to access the files and ports your application needs.
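TimeoutStopSec=30 only helps if the application actually reacts to the stop request. By default systemd sends SIGTERM when you run systemctl stop, then SIGKILL once the timeout expires. Here is a minimal sketch of a main loop that shuts down cleanly on SIGTERM; the names stop_requested, install_signal_handlers, and run are hypothetical, and work_once stands in for whatever the real client does per iteration.

```python
import signal
import threading

# Set by the signal handler; the main loop checks it between work items.
stop_requested = threading.Event()


def _handle_stop(signum, frame):
    stop_requested.set()


def install_signal_handlers():
    """Register handlers for SIGTERM (systemctl stop) and SIGINT (Ctrl+C)."""
    signal.signal(signal.SIGTERM, _handle_stop)
    signal.signal(signal.SIGINT, _handle_stop)


def run(work_once, poll_interval=1.0):
    """Call work_once() repeatedly until a stop is requested.

    Returns the number of completed iterations. Cleanup such as flushing an
    offline queue or disconnecting from the broker would go after the loop.
    """
    iterations = 0
    while not stop_requested.is_set():
        work_once()
        iterations += 1
        # Sleep, but wake up immediately if a stop request arrives.
        stop_requested.wait(poll_interval)
    return iterations
```

Waiting on the event instead of calling time.sleep means the loop notices a stop request right away, which leaves most of the 30-second budget for cleanup.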

Useful Commands to Know

Once your service is running, a handful of commands will cover most of what you need day-to-day:

# Check current status, recent logs, and whether it is enabled at boot
sudo systemctl status mqtt-client.service

# Start the service right now
sudo systemctl start mqtt-client.service

# Stop the service (a manual stop never triggers an automatic restart)
sudo systemctl stop mqtt-client.service

# Restart the service (useful after changing your application code)
sudo systemctl restart mqtt-client.service

# Reload the service file after you edit it (then restart to apply changes)
sudo systemctl daemon-reload
sudo systemctl restart mqtt-client.service

# Disable the service from starting at boot
sudo systemctl disable mqtt-client.service

# View full log history for this service
journalctl -u mqtt-client.service

# View logs in real-time
journalctl -u mqtt-client.service -f

One workflow note: whenever you edit the service file, you must run daemon-reload before the changes take effect. systemd caches unit file contents and will not notice your edits otherwise. Forgetting this step is a common source of confusion when your changes do not seem to be working.

What systemd Gives You for Free

It is worth pausing to appreciate what you get from writing a proper service file, because the alternative — ad hoc startup scripts in /etc/rc.local or cron @reboot jobs — gives you almost none of it.

With a systemd service you get automatic restart on crash, ordered startup relative to other system components, centralised and queryable logging, a clean mechanism to start and stop the application during development, restart rate limiting to prevent crash loops, graceful shutdown handling, and the ability to run as a non-root user easily. All of that from a text file that is, at its core, fewer than twenty lines.

For an embedded device deployed in a factory, a warehouse, or anywhere else it needs to run unattended for months at a time, that reliability infrastructure is not optional. systemd gives it to you essentially for free, as long as you take the time to describe your service correctly.
