DEV Community

Lyra

Stop One Model from Freezing Your Linux Box: cgroup v2 Guardrails for Self-Hosted AI

If you run LLMs on your own Linux machine, you’ve probably seen it:

  • one heavy inference job starts,
  • desktop/SSH gets laggy,
  • and suddenly everything feels stuck.

The fix is not "buy bigger hardware" as your first move.
The first fix is resource guardrails.

In this guide, we’ll use systemd + cgroup v2 to keep AI workloads inside clear CPU and memory boundaries, so one model run can’t tank your whole box.


What we’re building

We’ll create:

  1. a dedicated slice (ai.slice) for AI workloads,
  2. CPU and memory limits (CPUQuota, MemoryHigh, MemoryMax),
  3. a transient run pattern (systemd-run --slice=ai.slice) for ad-hoc jobs,
  4. quick observability checks (systemctl status, oomctl, memory.pressure, cpu.stat).

This works well for:

  • Ollama model pulls/runs,
  • batch embedding jobs,
  • local rerankers and eval scripts,
  • any long-running CPU/RAM-hungry process.

Prerequisites

  • Linux host using systemd
  • cgroup v2 enabled (default on modern distros)
  • sudo privileges

Verify quickly:

stat -fc %T /sys/fs/cgroup
# Expect: cgroup2fs

systemctl --version
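Beyond the filesystem type, the limits in this guide need the cpu and memory controllers to be available. A minimal sketch of that check follows; controllers_ok is a hypothetical helper name, and the default path assumes the usual /sys/fs/cgroup mount point.

```shell
#!/usr/bin/env bash
# Sketch: confirm the controllers this guide relies on are listed.
# controllers_ok is a hypothetical helper, not part of systemd.
controllers_ok() {
  local file="${1:-/sys/fs/cgroup/cgroup.controllers}"
  local needed missing=0
  for needed in cpu memory; do
    # The file holds one space-separated line, e.g. "cpuset cpu io memory pids".
    # -w matches whole words only, so "cpu" does not match "cpuset".
    if ! grep -qw -- "$needed" "$file"; then
      echo "missing controller: $needed" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as controllers_ok with no arguments on a live host; a non-zero exit status means at least one controller is not listed.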

Step 1) Create ai.slice with sensible guardrails

Create a unit file:

sudo tee /etc/systemd/system/ai.slice >/dev/null <<'EOF'
[Unit]
Description=Resource slice for self-hosted AI workloads

[Slice]
# CPU: allow up to 250% total CPU time (roughly 2.5 cores)
CPUQuota=250%

# Memory: start reclaim pressure before hard fail
MemoryHigh=12G
MemoryMax=14G

# Optional swap ceiling (set if swap exists and you want stricter bounds)
# MemorySwapMax=16G
EOF

Load and start the slice:

sudo systemctl daemon-reload
sudo systemctl start ai.slice
sudo systemctl status ai.slice --no-pager

Why both MemoryHigh and MemoryMax?

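  • MemoryHigh = throttle point: above it the kernel slows the cgroup's allocations and reclaims aggressively, but does not kill (early warning boundary)
  • MemoryMax = hard cap: if usage cannot be reclaimed below it, the kernel OOM-kills a process inside the cgroup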

Using both gives smoother behavior than using only a hard kill limit.
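The 12G/14G pair above suits one particular machine. As a rough starting point for another box, the sketch below derives both values from installed RAM. The 75% and 85% ratios are illustrative assumptions, not recommendations from systemd or the kernel, and suggest_limits is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive starting values for MemoryHigh/MemoryMax from installed RAM.
# The 75% / 85% ratios are assumptions for illustration; tune them per workload.
suggest_limits() {
  local meminfo="${1:-/proc/meminfo}"
  local total_kb total_mib
  # MemTotal is reported in kB, e.g. "MemTotal:       16384000 kB".
  total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
  total_mib=$(( total_kb / 1024 ))
  # systemd accepts the M suffix (mebibytes) for memory settings.
  echo "MemoryHigh=$(( total_mib * 75 / 100 ))M"
  echo "MemoryMax=$(( total_mib * 85 / 100 ))M"
}
```

Running suggest_limits with no arguments reads /proc/meminfo and prints two lines you can paste into the [Slice] section, leaving a gap between the two so reclaim has room to work before the hard cap.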


Step 2) Run AI jobs inside the slice

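Use systemd-run for transient one-off runs. ai.slice belongs to the system manager, so the command needs root (or polkit authorization); --uid and --same-dir keep the job running as your user from the current directory instead of as root from /:

sudo systemd-run --unit=ai-embed-$(date +%s) \
  --slice=ai.slice \
  --uid="$(id -un)" \
  --same-dir \
  --property=Type=exec \
  --collect \
  /usr/bin/env bash -lc 'python3 scripts/embed_corpus.py'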

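Example with an Ollama client call. This bounds only the client: the model itself is loaded by the Ollama server process, so if that server runs as its own systemd service, also set Slice=ai.slice in that service (for example via sudo systemctl edit ollama.service) and restart it.

sudo systemd-run --unit=ai-infer-$(date +%s) \
  --slice=ai.slice \
  --uid="$(id -un)" \
  --same-dir \
  --property=Type=exec \
  --collect \
  /usr/bin/env bash -lc 'ollama run llama3.1:8b "Summarize logs in 5 bullets"'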

Notes:

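  • --slice=ai.slice is the key line.
  • --property=Type=exec makes systemd report the unit as started only after the command has actually been executed, so launch errors surface immediately.
  • --collect unloads the transient unit after exit, even if it failed.
  • Without --wait or --pty the job runs in the background and its output goes to the journal (journalctl -u <unit name>).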
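Unit names built only from date +%s collide when two jobs start in the same second, and systemd-run refuses to start a unit whose name is already loaded. One way around that, sketched below, is a small naming helper; job_unit_name and the ai- prefix are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build a transient unit name from a free-form label.
# job_unit_name is a hypothetical helper; the naming scheme is an assumption.
job_unit_name() {
  local label safe
  label="${1:-job}"
  # Replace anything outside a conservative character set with "_".
  safe=$(printf '%s' "$label" | tr -c 'A-Za-z0-9_.-' '_')
  # Epoch seconds plus the shell PID keeps names distinct across
  # shells that launch jobs in the same second.
  printf 'ai-%s-%s-%s\n' "$safe" "$(date +%s)" "$$"
}
```

You would then pass the result as --unit="$(job_unit_name embed)" in the commands above.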

Step 3) Inspect if limits are actually working

Check unit placement and limits

systemctl status ai.slice --no-pager
systemctl show ai.slice -p CPUQuotaPerSecUSec -p MemoryHigh -p MemoryMax

Inspect pressure and throttling signals

# cgroup path for our slice
CG=/sys/fs/cgroup/ai.slice

cat "$CG/memory.current"
cat "$CG/memory.events"
cat "$CG/memory.pressure"
cat "$CG/cpu.stat"

What to look for:

  • memory.events increments (high, max, oom, oom_kill) during stress
  • cpu.stat shows nr_throttled and throttled_usec when CPU quota is hit
  • memory.pressure rising means tasks are stalling on memory pressure
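Reading raw counters gets tedious, so here is a minimal sketch that pulls single values out of cpu.stat or memory.events and turns the throttling counters into a percentage. stat_value and throttle_pct are hypothetical helper names; the keys (nr_periods, nr_throttled, oom_kill) are the ones the kernel writes to those files.

```shell
#!/usr/bin/env bash
# stat_value FILE KEY: print the value for KEY in a flat "key value"
# cgroup file such as cpu.stat or memory.events.
stat_value() {
  awk -v key="$2" '$1 == key {print $2}' "$1"
}

# throttle_pct FILE: share of CPU enforcement periods (as an integer
# percentage) in which the cgroup hit its quota, read from cpu.stat.
throttle_pct() {
  local periods throttled
  periods=$(stat_value "$1" nr_periods)
  throttled=$(stat_value "$1" nr_throttled)
  if [ "${periods:-0}" -eq 0 ]; then
    echo 0   # no quota periods recorded yet
  else
    echo $(( ${throttled:-0} * 100 / periods ))
  fi
}
```

For example, throttle_pct /sys/fs/cgroup/ai.slice/cpu.stat after a load test shows how often the quota was the bottleneck, and stat_value /sys/fs/cgroup/ai.slice/memory.events oom_kill shows how many processes were killed inside the slice.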

Step 4) Optional: protect the rest of the machine with OOM policy

If your distro enables systemd-oomd, it can make pressure-based kill decisions at cgroup level before full kernel OOM chaos. It only acts on cgroups that opt in through ManagedOOM* settings.

Quick check:

systemctl status systemd-oomd --no-pager
oomctl

If you tune ManagedOOM* settings later, test carefully in a non-production window.
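Before tuning anything, it helps to see the pressure numbers yourself. The sketch below reads the same PSI file format that memory.pressure uses and compares the 10-second average against a threshold. It is not systemd-oomd's algorithm, only a way to eyeball the signal; psi_avg10 and psi_exceeds are hypothetical helper names, and any threshold you pass in is your own assumption.

```shell
#!/usr/bin/env bash
# psi_avg10 FILE [some|full]: print the avg10 value from a PSI file whose
# lines look like "some avg10=1.23 avg60=0.50 avg300=0.10 total=12345".
psi_avg10() {
  local file="$1" kind="${2:-full}"
  awk -v kind="$kind" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$file"
}

# psi_exceeds FILE KIND LIMIT: exit 0 if avg10 is above LIMIT (a percentage).
psi_exceeds() {
  local value
  value=$(psi_avg10 "$1" "$2")
  # awk handles the floating-point comparison; the shell cannot.
  awk -v v="$value" -v limit="$3" 'BEGIN { exit !(v + 0 > limit + 0) }'
}
```

A quick check during a load test could be psi_exceeds /sys/fs/cgroup/ai.slice/memory.pressure full 10 && echo "slice is stalling on memory".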


Step 5) Make this your default execution pattern

For repeatability, add a tiny wrapper:

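sudo tee /usr/local/bin/ai-run >/dev/null <<'EOF'
#!/usr/bin/env bash
# Run a command as a transient unit inside ai.slice.
set -euo pipefail
if [ "$#" -lt 1 ]; then
  echo "Usage: ai-run <command...>" >&2
  exit 1
fi
# Epoch seconds plus shell PID, so two jobs started in the same second do not collide.
unit="ai-job-$(date +%s)-$$"
# sudo: ai.slice is a system slice; --uid/--same-dir keep your user and working directory.
exec sudo systemd-run --unit="$unit" --slice=ai.slice --uid="$(id -un)" --same-dir \
  --property=Type=exec --collect "$@"
EOF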

sudo chmod +x /usr/local/bin/ai-run

Usage:

ai-run ollama run llama3.1:8b "Give me a 10-line summary"
ai-run python3 scripts/nightly_eval.py

Common pitfalls

  1. Setting only MemoryMax

    • You get abrupt kills without early reclaim behavior. Prefer MemoryHigh + MemoryMax.
  2. Forgetting the slice on ad-hoc runs

    • If you run commands directly, they escape your guardrails.
  3. No observability loop

    • Always inspect memory.events, memory.pressure, and cpu.stat after load tests.
  4. Copy-pasting limits between machines

    • Tune limits to your actual RAM/CPU and workload profile.
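For pitfall 2, a quick way to confirm that a running process really landed in the slice is to read its cgroup path. The sketch below assumes a pure cgroup v2 host, where /proc/<pid>/cgroup holds a single 0:: line; in_slice is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# in_slice CGROUP_FILE [SLICE]: exit 0 if the process described by
# CGROUP_FILE (normally /proc/<pid>/cgroup) sits under SLICE.
in_slice() {
  local cgroup_file="$1" slice="${2:-ai.slice}"
  local path
  # cgroup v2 exposes one line like "0::/ai.slice/ai-job-123.service".
  path=$(sed -n 's/^0:://p' "$cgroup_file")
  case "$path" in
    "/$slice"|"/$slice/"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For a given process ID, in_slice "/proc/$pid/cgroup" || echo "not in ai.slice" flags anything that escaped the guardrails.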

Final takeaway

Self-hosted AI gets dramatically more stable when you treat resource isolation as a first-class feature.

A dedicated systemd slice with cgroup v2 limits gives you:

  • fewer surprise lockups,
  • better multi-tenant behavior on one host,
  • and safer experimentation when you’re testing new models.

If you only implement one thing this week, make it ai.slice + systemd-run --slice=ai.slice.


References

  1. Linux kernel docs — Control Group v2: https://docs.kernel.org/admin-guide/cgroup-v2.html
  2. Linux kernel docs — Pressure Stall Information (PSI): https://docs.kernel.org/accounting/psi.html
  3. man7 — systemd.resource-control(5): https://man7.org/linux/man-pages/man5/systemd.resource-control.5.html
  4. man7 — systemd-run(1): https://man7.org/linux/man-pages/man1/systemd-run.1.html
  5. man7 — systemd.slice(5): https://man7.org/linux/man-pages/man5/systemd.slice.5.html
  6. man7 — systemd-oomd.service(8): https://man7.org/linux/man-pages/man8/systemd-oomd.service.8.html
