TL;DR: The gap that actually hurts isn't the one after a patch drops — it's the one before your distro ships it. A CVE gets published on NVD with a CVSS score of 9.8, the fix is already merged upstream, and apt upgrade gives you nothing useful. This guide covers how to survive that window: live kernel patching, building the upstream patch yourself, and mitigations that buy time without touching the kernel binary.
📖 Reading time: ~34 min
What's in this article
- The Situation Nobody Warns You About
- First: Know What You're Actually Dealing With
- Option 1: Live Kernel Patching (No Reboot Required)
- Option 2: Build and Apply the Upstream Patch Yourself
- Option 3: Mitigation Without a Patch
- Option 4: Kernel Lockdown and Hardening to Reduce Blast Radius
- Tracking the Patch: How to Know When Your Distro Catches Up
- The Reboot Strategy: Making Reboots Less Painful
- Real-World Walkthrough: Handling a Hypothetical Local Privilege Escalation
- When NOT to Roll Your Own Patch
The Situation Nobody Warns You About
The gap that actually hurts isn't the one after a patch drops — it's the one before your distro ships it. A CVE gets published on NVD with a CVSS score of 9.8, the upstream kernel commit fixing it is already merged into Linus's tree, security researchers start publishing PoC code within 48 hours, and your apt update && apt upgrade returns exactly nothing useful. That's the window where real damage happens, and almost nobody has a written plan for it.
I ran into this firsthand during the Dirty Pipe disclosure (CVE-2022-0847). The fix landed in kernel 5.16.11 and 5.15.25, but Ubuntu 22.04 LTS was shipping 5.15.0-series kernels with its own backport schedule. The NVD entry was live, the PoC was on GitHub, and uname -r on my prod boxes showed a vulnerable kernel. Ubuntu's security team is genuinely fast compared to most distros — but "fast" in backport terms still meant roughly 4–7 days before linux-image packages reflected the fix. In a SOC 2 or PCI-DSS environment, that's not a gap you can explain away to an auditor with "we were waiting for apt."
StackRot (CVE-2023-3269) made this worse because the exploit required no special privileges — a local user could escalate to root via a flaw in the maple tree VM area replacement code. Looney Tunables (CVE-2023-4911) hit glibc rather than the kernel directly, but the pattern was identical: NVD publication → public PoC → distro patch shipping on a lag that felt uncomfortably long given the blast radius. These aren't obscure edge cases. They're the normal lifecycle of a serious Linux vulnerability, and the distro backport delay is structural, not accidental.
What you actually need during that window breaks down into three distinct actions, each with real tradeoffs:
- Live kernel patching (kpatch, KernelCare, Canonical Livepatch) — applies the fix without a reboot, but you're trusting a third party's patch quality or paying for enterprise tooling
- Building and deploying the upstream kernel yourself — gives you full control but breaks your package manager's expectations and requires a tested rollback path
- Exploit-specific mitigations — disabling the vulnerable subsystem, tightening seccomp profiles, or dropping capabilities — which buy time without touching the kernel binary at all
This guide walks through all three with actual commands, not conceptual diagrams. I'll also cover how to verify whether your specific kernel build is actually vulnerable (CVE scope statements are frequently imprecise about which kernel configs matter), because spinning up emergency patching for a kernel compiled without the vulnerable feature is a waste of an incident response weekend.
First: Know What You're Actually Dealing With
The first mistake I see teams make is treating every kernel CVE as equally urgent. They get a scanner alert, see a CVSS score of 8.5, and immediately panic. The score tells you almost nothing about whether your specific system is at risk. What actually matters: your running kernel version, your compile-time config, and whether the vulnerability has a working public exploit.
Start with the obvious check that surprisingly few people do first:
# What's actually running right now
uname -r
# 6.1.0-18-amd64
# Compare this against the CVE's affected range — if the CVE says
# "fixed in 6.1.85" and you're on 6.1.0-18, you need to dig further.
# Debian backports patches without bumping the upstream version number,
# so "uname -r" alone doesn't tell the whole story on distro kernels.
The backport issue trips people up constantly. Debian stable might ship a kernel that looks older than the fix commit but has the patch applied. Check /proc/version_signature (Ubuntu kernels only) or the distro's changelog (apt changelog linux-image-$(uname -r) on Debian-based systems) before assuming you're exposed. On RHEL/CentOS, rpm -q --changelog kernel | grep CVE-2023-XXXX is faster.
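Concretely, the check on each family looks like this (the CVE ID is a placeholder, substitute the one you're chasing):
# Ubuntu: version_signature carries the real patch level, unlike uname -r
cat /proc/version_signature
# Debian/Ubuntu: search the package changelog for the CVE ID
apt changelog linux-image-$(uname -r) | grep -i CVE-2022-0847
# RHEL/CentOS/Fedora: same idea via rpm
rpm -q --changelog kernel | grep CVE-2022-0847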
Even if your version is in the vulnerable range, the subsystem might not be compiled into your kernel at all. This eliminates a huge percentage of "critical" CVEs in practice:
# Is the vulnerable subsystem even built in?
grep -r CONFIG_BLUETOOTH /boot/config-$(uname -r)
# CONFIG_BLUETOOTH=m ← compiled as a module, loaded only if hardware present
# CONFIG_BLUETOOTH is not set ← you're clean, move on
# Check if the module is actually loaded
lsmod | grep bluetooth
# Nothing? Then even if the module exists on disk, it's not running.
For tracking down the actual fix, skip the NVD page and go straight to the source. The NVD description is often vague and the "affected versions" data can be stale or wrong. The fix commit in the Linux kernel git tree tells you exactly what changed and which stable branches got the backport:
# Clone the stable tree (large, but you need history to search commit messages;
# a --depth=1 shallow clone has only one commit to search, and the torvalds tree
# doesn't carry the stable branches)
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
# Search commit history for a specific CVE — stable maintainers often reference them in commit messages
git log --oneline --all --grep=CVE-2023-35788
# Or check a specific stable branch directly (faster for checking backport status)
git log --oneline origin/linux-6.1.y --grep=CVE-2023-35788
On the CVSS vs real exploitability gap: I've seen CVSS 9.8 scores on vulnerabilities that require a very specific hardware device to be attached, or a non-default kernel config that no production system runs. Meanwhile, a CVSS 7.0 local privilege escalation with a working exploit on GitHub and a proof-of-concept that compiles on Ubuntu 22.04 in under 30 seconds is a five-alarm fire. My rule: if there's a public PoC that runs without modification, treat it as critical regardless of the score. If it's a theoretical bug with no PoC, requires CAP_NET_ADMIN, and your systems don't expose local shell access, deprioritize it. Check nomi-sec/PoC-in-GitHub and exploit-db.com before you finalize your severity assessment.
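One way to make that check repeatable is a quick, unauthenticated query against the GitHub search API (rate-limited, so treat it as a one-off triage helper rather than monitoring; the CVE ID below is just an example):
CVE=CVE-2023-35788
curl -s "https://api.github.com/search/repositories?q=${CVE}&sort=updated" \
  | jq -r '.total_count, (.items[:5][] | .full_name)'
# First line is the hit count; anything nonzero deserves a manual look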
Option 1: Live Kernel Patching (No Reboot Required)
The thing that surprises most ops teams is that live kernel patching doesn't actually fix anything permanently — it buys you a maintenance window without the downtime tax. You're injecting compiled patches into the running kernel's memory. The moment that machine reboots for any reason (power event, OOM killer chain reaction, whatever), you're back to the unpatched kernel. That's not a bug in the approach, it's just the reality you need to plan around.
kpatch on RHEL/Fedora/CentOS
Red Hat's kpatch toolchain is the most mature open option. On RHEL 9, the workflow is tighter than it used to be — you're pulling a specific kernel-version-matched patch package rather than compiling anything yourself:
# Install the kpatch framework first
sudo dnf install kpatch kpatch-dnf
# Then pull the patch for your exact running kernel
# Replace the version string with whatever `uname -r` returns
sudo dnf install kpatch-patch-5_14_0-284_30_1
# Enable the service so patches load on... well, theoretically on boot
# (but remember: the patches themselves don't persist)
sudo systemctl enable --now kpatch
# Confirm what's actually loaded
kpatch list
The kpatch list output will show you patch module names and their status. If it's empty after install, the service didn't load — check journalctl -u kpatch. The kpatch-dnf plugin is what wires this into your normal update flow so patches get applied automatically when Red Hat ships them. Without it, you're manually tracking CVEs and matching package names yourself, which gets miserable fast.
Canonical Livepatch for Ubuntu
Canonical's offering is tighter to set up but the free tier has a hard ceiling — 3 machines, as of current pricing at livepatch.canonical.com. For a homelab or a small staging fleet, that's fine. For anything bigger you're looking at Ubuntu Pro pricing. The setup is genuinely one command once you have a token:
sudo canonical-livepatch enable <your-token-from-ubuntu.com/security/livepatch>
# Check status and what patches are applied
canonical-livepatch status --verbose
The verbose status output tells you the kernel state, which CVEs are covered, and whether a check-in with Canonical's servers succeeded. One gotcha: Livepatch only covers LTS kernels. If you're running an Ubuntu system with a hardware enablement (HWE) kernel that's newer than the LTS baseline, coverage can lag by weeks. Check the kernel matrix on their site before assuming you're covered on a machine you recently upgraded.
KernelCare (TuxCare)
KernelCare is the choice when you're managing 50+ heterogeneous hosts and want patches to apply without any human in the loop. The agent polls TuxCare's servers, downloads patches, and loads them — no commands, no cron jobs you wrote yourself, no remembering to run kpatch list on 80 machines. It covers a wider distro matrix than either kpatch or Livepatch, including older CentOS 6/7 systems that Red Hat's tooling has abandoned. The pricing is per-server-per-month, so at scale you're doing the math against the cost of scheduled maintenance windows plus the ops time to coordinate them. For most teams managing a fleet that size, KernelCare wins that calculation clearly.
The deeper issue with all three options: they're not a substitute for proper kernel updates, they're a gap-filler. The workflow should be — CVE drops, live patch immediately, then schedule a real reboot into the patched kernel within your next maintenance window. Teams that treat live patching as a permanent solution end up with kernel versions that drift badly from their package manager's expectations, and that creates its own class of debugging nightmares when something breaks six months later.
Option 2: Build and Apply the Upstream Patch Yourself
The thing that catches most people off guard is that "apply the upstream fix" sounds like a 10-minute job. It almost never is. Distro kernels — Ubuntu, RHEL, Debian — are not vanilla kernel.org trees. They carry hundreds of out-of-tree patches for driver backports, distro-specific config changes, and security hardening. Your upstream fix was written against a clean tree. You're applying it to a Frankenstein kernel. Budget 30–90 minutes of conflict resolution minimum, more if the vulnerability is in mm/ or the scheduler.
Step 1: Get the Fix Commit from kernel.org
Find the fix commit hash from the CVE advisory, kernel mailing list, or the security tracker at security-tracker.debian.org. Once you have the hash, generate a proper patch file:
# Clone torvalds/linux if you don't have it — keep enough depth that the fix
# commit is actually inside your shallow history
git clone --depth=50 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
# If the commit is older than your shallow window, deepen the clone:
# git fetch --unshallow
# Generate the patch — -1 means just that one commit
git format-patch -1 <commit-hash>
# Output: 0001-<subject-line>.patch
format-patch over a raw git diff matters here. The output includes the commit message, authorship metadata, and the diff in a format that git am can consume — which gives you cleaner conflict tracking than patch -p1 alone.
Step 2: Pull Your Distro's Kernel Source
On Ubuntu this is a one-liner, but it pulls a lot of data, so do it on a machine with real bandwidth:
# This drops the source tree in the current directory
# (requires deb-src lines enabled in your apt sources)
apt-get source linux-image-$(uname -r)
# You'll end up with a directory like linux-6.8.0/
cd linux-6.8.0/
# Now attempt the patch
patch -p1 < ../0001-your-fix.patch
If you see FAILED -- saving rejects to file, open the .rej files. Each one shows the hunk that didn't apply. This is where the real work starts — you're manually reconciling the upstream fix against whatever Ubuntu or Canonical has already touched in that file. Use git log --oneline -- drivers/your/file.c inside the distro source to see what patches they've already applied. Sometimes the distro already backported 70% of the fix and you're just missing one hunk.
Step 3: Build Using Your Distro's Config
Never build with a generic config. You need to match the exact config the running kernel was compiled with, otherwise you'll produce a kernel that boots differently — missing modules, wrong subsystem options:
# Copy the running kernel's config
cp /boot/config-$(uname -r) .config
# Ubuntu configs reference Canonical's private signing keys — blank them out or the build fails
scripts/config --set-str SYSTEM_TRUSTED_KEYS ""
scripts/config --set-str SYSTEM_REVOCATION_KEYS ""
# Update for any new config symbols in the patched tree
make olddefconfig
# Build a .deb package — much easier to install and track than raw binaries
make -j$(nproc) deb-pkg
# Output lands one directory up: ../linux-image-*.deb
ls ../*.deb
The deb-pkg target is worth knowing. It produces installable packages you can track with dpkg, roll back cleanly, and push to other machines via a local apt repo. Building directly with make install is messier and harder to undo if the new kernel panics on boot.
The Secure Boot Trap
If your system has Secure Boot enabled — and most modern servers and cloud VMs do — a custom-built kernel won't boot unless it's signed with a key that's enrolled in the MOK (Machine Owner Key) database. This is the step that bites people at 2 AM. Generate your signing key first, before you finish the build:
# Generate a self-signed cert valid for 365 days (-nodes so the key isn't passphrase-protected)
openssl req -new -x509 -newkey rsa:2048 -nodes \
-keyout signing_key.pem \
-out signing_cert.pem \
-days 365 \
-subj '/CN=Kernel Signing/'
# Sign the kernel image after the build (sbsign wants the PEM cert)
sbsign --key signing_key.pem \
--cert signing_cert.pem \
--output vmlinuz-patched \
arch/x86/boot/bzImage
# mokutil expects DER, not PEM — convert before importing
openssl x509 -in signing_cert.pem -outform DER -out signing_cert.der
# Enroll the cert into MOK — you'll need physical/console access on next boot
mokutil --import signing_cert.der
The mokutil --import step queues the certificate for enrollment, but the actual enrollment happens during the next boot via a UEFI prompt. On a headless server you need IPMI or a serial console to complete it. If you skip this and just dpkg -i the kernel, Secure Boot will silently refuse to load it and your machine will drop to firmware or boot the old kernel — no error, just confusion.
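After the enrollment reboot, two quick checks confirm you're actually in the state you think you are (grep on whatever CN you used when generating the cert):
# Is Secure Boot actually enforcing?
mokutil --sb-state
# Is your signing cert enrolled?
mokutil --list-enrolled | grep -A1 'Kernel Signing'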
Test in a VM First, Non-Negotiably
Spin up a VM with the exact same kernel version as production before touching anything live. On Ubuntu, multipass launch --name test-vm gets you a running instance in under two minutes. Copy the built .deb over, install it, and watch the boot. Specifically watch for: module load failures, any BUG: or WARNING: output in dmesg, and whether your specific vulnerable subsystem still functions correctly. The reason you need the same kernel version — not just the same distro release — is that the conflict patterns you resolved in the source are version-specific. A patch that applies cleanly and boots fine on 6.8.0-47 can behave differently on 6.8.0-51 if Canonical touched the same file in a point release.
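A minimal sketch of that canary run with multipass follows; the image release, VM name, and .deb filename are placeholders for whatever your build actually produced:
# Point DEB at the package deb-pkg generated
DEB=../linux-image-6.8.0-51-generic_6.8.0-51.52_amd64.deb
multipass launch 24.04 --name kernel-test
multipass transfer "$DEB" kernel-test:/home/ubuntu/
multipass exec kernel-test -- sudo dpkg -i "/home/ubuntu/$(basename "$DEB")"
multipass exec kernel-test -- sudo reboot || true
# Give it a minute to come back, then check ground truth and scan for regressions
multipass exec kernel-test -- uname -r
multipass exec kernel-test -- sudo dmesg --level=err,warn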
Option 3: Mitigation Without a Patch
The most underrated moment in vulnerability response is realizing you don't actually need the vulnerable thing. Before you start stacking sysctls, run lsmod | grep <module> and check if it's even loaded. If it's not needed in production — and kernel modules for obscure filesystems, old networking protocols, or legacy hardware often aren't — just kill it permanently:
# Prevent the module from loading at boot or via modprobe
echo 'install tipc /bin/false' >> /etc/modprobe.d/disable-vuln.conf
# Rebuild initramfs so the blacklist is baked in (RHEL/CentOS/Fedora)
dracut -f
# On Debian/Ubuntu, use update-initramfs instead
update-initramfs -u -k all
The install <module> /bin/false trick is stronger than blacklist. A blacklist can be overridden by a dependency chain at boot; the install override literally replaces the load command with /bin/false. Verify it worked: modprobe tipc && echo loaded should silently fail with exit code 1. If you're on a system where the module is already loaded, you need rmmod tipc first — and yes, that can cause a brief service hiccup if anything has an open socket on it.
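To double-check the override on a box where the module was already loaded, a quick sketch (using tipc as the example module, same as above):
# Unload it first if it's currently loaded (brief disruption if anything holds a socket on it)
sudo rmmod tipc
# The install override should now block any reload attempt — expect a nonzero exit
sudo modprobe tipc && echo "loaded (override NOT working)" || echo "blocked as expected"
lsmod | grep tipc || echo "tipc not loaded"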
Sysctl mitigations vary wildly in how much they actually help. Some are genuine attack surface reduction; others are theater. The ones worth deploying fast when you're dealing with privilege escalation bugs:
- kernel.yama.ptrace_scope=2 — blocks ptrace from non-privileged processes entirely. A huge chunk of local privilege escalation exploits rely on ptrace to inspect or manipulate process memory. This breaks some legitimate debugging workflows (strace on a running process, GDB attach), so warn your devs before pushing it. Scope 3 disables ptrace completely; scope 2 only allows processes with CAP_SYS_PTRACE, effectively root.
- kernel.perf_event_paranoid=3 — disables perf event access for unprivileged users. Several kernel CVEs in the last few years have used the perf subsystem as the initial primitive for an info leak or privilege escalation. Set it to 3 and most of those entry points close.
- kernel.dmesg_restrict=1 — without this, any local user can read kernel addresses from dmesg, which hands attackers a KASLR bypass for free. Should honestly be default everywhere.
# Apply immediately (survives until reboot)
sysctl -w kernel.yama.ptrace_scope=2
sysctl -w kernel.perf_event_paranoid=3
sysctl -w kernel.dmesg_restrict=1
# Persist across reboots
cat >> /etc/sysctl.d/99-vuln-mitigation.conf << 'EOF'
kernel.yama.ptrace_scope = 2
kernel.perf_event_paranoid = 3
kernel.dmesg_restrict = 1
EOF
For network-triggered bugs — the TIPC heap overflow from CVE-2022-0435, or the nftables UAF chain — your fastest mitigation is dropping at the firewall before the kernel subsystem even touches the packet. With nftables:
# Drop TIPC-over-UDP before the kernel's TIPC code ever parses it. The UDP bearer
# defaults to port 6118; TIPC can also run directly over Ethernet, which an L3 rule
# won't catch. Assumes an inet filter table with an input chain already exists.
nft add rule inet filter input udp dport 6118 drop
# Verify the rule landed
nft list chain inet filter input
For bugs in raw socket handling or protocol-specific parsers, combining the firewall block with an AppArmor profile that denies CAP_NET_RAW to anything that doesn't need it gives you defense in depth. The exploit has to clear both hurdles. AppArmor profiles live in /etc/apparmor.d/ and you can reload a single profile without a reboot: apparmor_parser -r /etc/apparmor.d/your-profile.
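If you go the AppArmor route, here's a minimal sketch of what that capability denial looks like; the profile path, binary name, and allowed rules are hypothetical placeholders, and the includes assume a standard AppArmor install:
# /etc/apparmor.d/usr.local.bin.your-daemon
#include <tunables/global>
/usr/local/bin/your-daemon {
  #include <abstractions/base>
  #include <abstractions/nameservice>
  # Deny raw sockets outright; ordinary TCP/UDP still works
  deny capability net_raw,
  network inet stream,
  network inet dgram,
  /usr/local/bin/your-daemon mr,
}
Load it with apparmor_parser -r /etc/apparmor.d/usr.local.bin.your-daemon and confirm with aa-status that the profile is in enforce mode.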
Document everything with timestamps, and I mean everything. Auditors don't take "I added a sysctl" as an answer — they want to see when it was applied, what CVE it addresses, who approved it, and when you expect to replace it with an actual patch. A minimal but audit-proof record looks like this:
# /etc/sysctl.d/99-vuln-mitigation.conf
# CVE-2022-0435 (TIPC heap overflow) — ticket SEC-4821
# Applied: 2024-03-12T14:30Z by ops-user@
# Expected remediation: kernel update to 5.15.94+ pending vendor patch
# Reviewed by: security-lead@
kernel.perf_event_paranoid = 3
Put the same note in your change management system. The comment in the conf file is for the next engineer who SSHs in at 2am wondering why perf is broken; the ticket is for the auditor who pulls change logs six months later. Both matter, and neither substitutes for the other.
Option 4: Kernel Lockdown and Hardening to Reduce Blast Radius
Most people treat hardening as something you do after a vulnerability is disclosed. Flip that thinking: if you reduce what an attacker can do with a kernel bug, the urgency of every future unpatched CVE drops significantly. A vulnerability that requires writing to arbitrary kernel memory is far less scary if lockdown mode is active and your critical processes are running under tight seccomp profiles.
Kernel Lockdown Mode (5.4+)
Lockdown LSM shipped mainline in kernel 5.4 and gives you two levels: integrity and confidentiality. Integrity mode blocks things like writing to /dev/mem, loading unsigned modules, and using kprobes to modify kernel code — exactly the primitives most kernel exploits chain together. Confidentiality mode goes further and also blocks reading from /dev/mem and restricts some tracing interfaces. I'd recommend integrity for production; confidentiality will break eBPF-based observability tools, perf, and anything that leans on PERF_EVENT_OPEN with kernel-level access.
# Check current lockdown state — the active mode is shown in [brackets]
cat /sys/kernel/security/lockdown
# Set at runtime — this is one-way: you can raise the level but not lower it without a reboot
echo integrity | sudo tee /sys/kernel/security/lockdown
# Lock it in via GRUB: add lockdown=integrity to GRUB_CMDLINE_LINUX in /etc/default/grub,
# then rebuild the grub config
sudo update-grub                              # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL/Rocky
Before enabling this on a box you actually care about, check whether your distro builds with CONFIG_SECURITY_LOCKDOWN_LSM=y. Ubuntu 20.04+ and Fedora 32+ have it enabled. CentOS 7 and older RHEL 8 variants don't. You can verify with grep LOCKDOWN /boot/config-$(uname -r). If it's not there, you're building custom kernels anyway — add it to your .config and set CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y so it activates before the LSM framework fully initializes, otherwise a clever exploit can race the enable window.
grsecurity for When You Need the Real Stuff
If you're running infrastructure that's genuinely high-value — payment processing, certificate authorities, anything where kernel exploitation is a plausible threat model — grsecurity is worth the subscription cost. It's not free; you need a commercial license from grsecurity.net. What you get is kernel-level RBAC with an automatic learning mode (a different enforcement model from SELinux/AppArmor, which are also kernel LSMs but depend on hand-written or distro-shipped policy), significantly improved ASLR entropy that closes gaps in the mainline implementation, and PaX — which enforces non-executable memory strictly and makes return-oriented programming attacks substantially harder. The thing that separates it from mainline hardening isn't any single feature; it's the patch cadence and the fact that it often addresses classes of vulnerabilities before CVEs are assigned.
Seccomp Profiles: Deny What You Don't Use
Lockdown protects the kernel from being modified. Seccomp protects you by reducing the kernel attack surface your processes can even reach. The workflow most people skip is the profiling step — they either copy a generic profile or skip it entirely. The right way is to trace your actual workload:
# Profile syscalls your app actually makes under realistic load
# Run this during a representative traffic window, not just startup
strace -f -e trace=all -o /tmp/syscalls.txt ./your-app
# Extract just the syscall names, sorted by frequency
grep "^[0-9]" /tmp/syscalls.txt | awk -F'(' '{print $1}' | awk '{print $NF}' | sort | uniq -c | sort -rn
# For a running process (less intrusive than strace on startup):
# sample for 60 seconds, then perf prints its per-syscall summary table on exit
timeout -s INT 60 perf trace --summary -p $(pgrep your-app)
Once you have the list, build a seccomp profile that allows exactly those syscalls and blocks everything else with SCMP_ACT_KILL_PROCESS — not SCMP_ACT_ERRNO, which lets the process handle the error and potentially try something else. For Docker workloads, drop this into a JSON profile and pass it with --security-opt seccomp=/path/to/profile.json. The gotcha that burns people: clone, futex, and mmap variants differ across kernel and libc versions. A profile traced on an older kernel won't list clone3 (it only exists from 5.3 onward), so the same workload on a newer kernel with a glibc that prefers clone3 gets killed — always build and test the profile against the kernel family and libc you're actually running.
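For reference, a stripped-down sketch of what that Docker profile looks like; the syscall list here is deliberately abbreviated and illustrative, not sufficient for a real workload:
{
  "defaultAction": "SCMP_ACT_KILL_PROCESS",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "openat", "close", "fstat", "mmap", "mprotect",
                "munmap", "brk", "futex", "clone", "clone3", "execve", "exit_group",
                "epoll_wait", "epoll_ctl", "accept4", "rt_sigaction", "rt_sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
Run the container with docker run --security-opt seccomp=$(pwd)/profile.json your-image and watch logs and dmesg: a missing syscall typically shows up as the process dying with SIGSYS on a specific code path.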
- Start with integrity lockdown, not confidentiality — confidentiality will break your monitoring stack on day one and you'll be disabling it under pressure at 2am.
- Profile seccomp during peak load, not just startup — many syscalls only appear during specific code paths like garbage collection, signal handling, or connection teardown.
- grsecurity RBAC requires learning time — budget a few days to run in learning mode before enforcing, or you'll lock yourself out of something critical.
- Combine layers — lockdown + seccomp + SELinux/AppArmor together means an attacker needs to chain bypasses across three independent mechanisms, which is qualitatively harder than bypassing any one of them.
Tracking the Patch: How to Know When Your Distro Catches Up
The gap between a CVE hitting the NVD and your distro shipping a patched kernel can be days, weeks, or longer depending on the severity and how backport-friendly the fix is. Most teams I've seen get burned aren't running unpatched kernels because they didn't care — they're running them because nobody noticed the patch landed. That's a process failure, not a knowledge failure, and it's fixable with about 30 minutes of setup.
Ubuntu Security Notices (USN)
The USN mailing list at usn.ubuntu.com is underrated. You get an email the moment Canonical publishes a kernel errata, before most RSS aggregators pick it up. Subscribe at https://lists.ubuntu.com/mailman/listinfo/ubuntu-security-announce. Each notice includes the affected packages, fixed version numbers, and the CVE list. The thing that surprised me: kernel updates often bundle 10-20 CVEs in one USN, including several that never made major security news. If you're only watching NVD tweets, you're missing the composite picture. Filter your inbox with a rule like subject:"USN-" AND "linux-image" so you catch kernel-specific ones without drowning in OpenSSL and curl noise.
Red Hat's Errata Tracker
Red Hat's tracker at access.redhat.com/security/vulnerabilities is genuinely the best distro-level CVE interface I've used. For each CVE you get a per-release breakdown — RHEL 8.8, 8.9, 9.2, 9.3 — showing whether the fix is available, in-progress, or explicitly marked "will not fix" with a rationale. That last category matters: Red Hat sometimes decides a kernel CVE doesn't affect their build due to a config flag they ship disabled. If you're running CentOS Stream or RHEL derivatives like AlmaLinux, the errata IDs map across directly. Pull errata JSON programmatically if you want to automate:
# RHSA ID lookup via Red Hat's public API
curl -s "https://access.redhat.com/hydra/rest/securitydata/cve/CVE-2024-1086.json" \
| jq '.affected_release[] | select(.product_name | test("Red Hat Enterprise Linux")) | {product: .product_name, package: .package, advisory: .advisory}'
Debian's Per-Branch Status at a Glance
Debian's tracker at https://security-tracker.debian.org/tracker/CVE-XXXX-XXXX shows status for each active branch — bullseye, bookworm, trixie, and sid — with a clear "fixed in X.X.X-Y" or "vulnerable" label. The thing to watch is the distinction between fixed in sid and fixed in stable. A patch can sit in unstable for two weeks before the stable backport ships. I've had situations where bookworm showed "fixed" in the tracker but the actual package version in the repo was still the vulnerable one — the tracker was reflecting an accepted upload that hadn't propagated through the mirror network yet. Always confirm what the repo actually offers (apt-cache policy on the kernel metapackage, linux-image-amd64 on a stock Debian install) against the version in the tracker.
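That confirmation is two commands, shown here for a stock amd64 install; swap the metapackage name if you run a cloud or rt flavor:
# What the repo offers vs what's installed, at the metapackage level
apt-cache policy linux-image-amd64
# And the exact versioned package you're actually booted into
dpkg -l "linux-image-$(uname -r)"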
Automated Scanning with cve-bin-tool
For teams that want something more systematic than manual tracker lookups, cve-bin-tool gives you a quick command-line scan against the running kernel version:
pip install cve-bin-tool
# Scan your running kernel version against NVD data
cve-bin-tool --kernel $(uname -r)
# Output as JSON for piping into SIEM or Slack webhook
cve-bin-tool --kernel $(uname -r) --format json -o /tmp/kernel-cve-$(date +%F).json
The first run takes a few minutes because it downloads and caches the NVD database locally (~800MB). Subsequent runs against the same kernel are fast. One honest caveat: the tool checks the upstream kernel version against CVE data, not your distro's backport version. Your distro might be running 5.15.0-101-generic with 40 backported fixes but advertising 5.15.0 as the kernel version — cve-bin-tool will flag CVEs that are actually already patched. Treat its output as a starting point for cross-referencing against USN or the Debian tracker, not as a final verdict.
A Cron Job That Actually Tells You Something Useful
This is the thing I wish I'd set up earlier on every server. A simple cron that diffs your running kernel against what's available in the repo will catch the moment a patched kernel becomes installable — no manual checking required:
#!/bin/bash
# /usr/local/bin/check-kernel-update.sh
# Compares the kernel metapackage the repo offers against what's installed.
# Assumes the -generic flavor — adjust META for HWE or cloud kernels.
# Relies on the apt-daily timer (or a prior apt-get update) for fresh package lists.
META="linux-image-generic"
RUNNING=$(uname -r)
INSTALLED=$(apt-cache policy "$META" | awk '/Installed:/{print $2}')
CANDIDATE=$(apt-cache policy "$META" | awk '/Candidate:/{print $2}')
if [ -n "$CANDIDATE" ] && [ "$CANDIDATE" != "$INSTALLED" ]; then
    echo "Kernel update available on $(hostname): running ${RUNNING}, repo offers ${META} ${CANDIDATE}" \
      | mail -s "[ACTION NEEDED] Kernel update available on $(hostname)" ops@yourteam.com
fi
Wire it up with crontab -e for a daily check at 8am:
0 8 * * * /usr/local/bin/check-kernel-update.sh
Two gotchas: first, on Ubuntu the metapackage naming is inconsistent across kernel flavors — HWE kernels use linux-image-generic-hwe-22.04 style naming, so set the META variable accordingly. Second, this script only tells you a new version is available; it doesn't tell you why. Pair it with a USN subscription so when the email fires, you can immediately pull up the notice and assess whether the CVE affects your workload.
The Reboot Strategy: Making Reboots Less Painful
The thing that catches most ops teams off guard isn't patching the kernel — it's discovering their reboot procedure is broken during an emergency. I've seen teams scramble through a CVE response only to find their GRUB config points to a kernel that was removed three months ago. Test your reboot procedure quarterly, on purpose, against a real node. Not a staging VM you spun up for the occasion — an actual production-class machine.
kexec: Skip the BIOS, Keep Your SLA
A full hardware reboot on a bare-metal node can burn 3-8 minutes just in POST and BIOS initialization before Linux even starts loading. kexec bypasses all of that by loading the new kernel directly from the running system and jumping to it, cutting reboot time to 20-40 seconds on most hardware. The sequence is two commands:
# Load the new kernel into memory first
kexec -l /boot/vmlinuz-6.1.0-21-amd64 \
--initrd=/boot/initrd.img-6.1.0-21-amd64 \
--reuse-cmdline
# Then trigger the jump — systemd handles shutdown hooks cleanly
systemctl kexec
The --reuse-cmdline flag pulls your existing kernel parameters from /proc/cmdline so you're not manually copying quiet splash root=UUID=... and inevitably mistyping it. Two gotchas: kexec does not re-run your hardware initialization, so if you have firmware bugs that manifest after extended uptime, kexec won't clear them — you'll need a full cold reboot eventually. And with Secure Boot or lockdown enabled, the legacy kexec_load syscall is blocked; use kexec -s so your signed kernel goes through kexec_file_load instead. Use kexec for velocity during a patch wave, not as a permanent substitute for real reboots.
Rolling Reboots in Kubernetes Clusters
The drain-reboot-uncordon pattern is standard, but the flags matter more than people realize. --delete-emptydir-data is the one that'll stop your drain cold if you forget it — without the flag, drain refuses to evict any pod using an emptyDir volume and errors out instead of finishing. Here's the full sequence I actually run:
# Cordon first so the scheduler stops placing new pods
kubectl cordon <node-name>
# Drain — this evicts pods respecting PodDisruptionBudgets
kubectl drain <node-name> \
--ignore-daemonsets \
--delete-emptydir-data \
--timeout=300s
# Patch and reboot the node here via your preferred method
# Then after it comes back online:
kubectl uncordon <node-name>
The --timeout=300s is something I added after watching a drain hang for 40 minutes on a misbehaving pod during a live patch event. Set it explicitly. Also: if your PodDisruptionBudgets aren't configured, drain will happily evict your entire stateful set at once. Fix your PDBs before you're rolling nodes under pressure.
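A minimal PDB sketch for context; the name and label selector are placeholders for whatever your stateful workload actually uses:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-database-pdb
spec:
  minAvailable: 2          # never let a drain take the replica count below 2
  selector:
    matchLabels:
      app: my-database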
Cloud "Managed" Reboots Are Not What You Think
AWS Scheduled Events and GCP Live Migration sound similar and are completely different. GCP Live Migration physically moves your VM to different hardware transparently — you see maybe a few seconds of increased latency, and the underlying kernel doesn't change. It handles hypervisor-level maintenance, not guest kernel vulnerabilities. You still need to reboot your guest OS to apply a kernel patch. AWS Scheduled Events for instance-reboot actually do reboot your instance, but system-maintenance events may just be hypervisor updates with no guest involvement. Check the event type before assuming anything was fixed:
# On the instance, check what AWS is actually scheduling
aws ec2 describe-instance-status \
--instance-ids i-0abc123def456 \
--query 'InstanceStatuses[].Events'
I've watched teams mark a CVE as resolved because they saw a "maintenance event" complete in their cloud console, not realizing their guest kernel was unchanged. The cloud provider patching their hypervisor does nothing for a guest kernel vulnerability like Dirty Pipe or a local privilege escalation. Verify your running kernel version after any maintenance window — uname -r is one command, there's no excuse not to check.
Making Quarterly Reboot Drills Actually Useful
A drill where you reboot a throwaway node proves nothing. Structure it to catch real failure modes:
- Boot into a kernel that's two versions back (see the sketch after this list) — confirms GRUB entries aren't getting pruned too aggressively by apt or dnf autoremove hooks
- Verify your initramfs is current — run lsinitramfs /boot/initrd.img-$(uname -r) | grep -i cryptsetup if you're using LUKS; a stale initramfs after a kernel update is a real brick scenario
- Time the full cycle — from patch application to the node passing health checks; if it's over 15 minutes, that's your actual RTO for kernel CVEs and your incident response plan needs to reflect it
- Fail one node intentionally — don't uncordon it, and verify your cluster or load balancer actually reroutes traffic correctly before you're doing this for real
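For the first drill item, grub-reboot is the safe way to do a one-shot boot into an older kernel without touching the default entry; the entry title below is illustrative, pull the real one from your own grub.cfg:
# List the boot entries GRUB currently knows about
grep -E "^\s*(menuentry|submenu)" /boot/grub/grub.cfg | cut -d"'" -f2
# One-shot boot into an older kernel; the default entry is untouched afterwards
sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-45-generic"
sudo reboot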
Real-World Walkthrough: Handling a Hypothetical Local Privilege Escalation
The thing that catches most ops teams off guard with a local privilege escalation isn't the severity — it's the false sense of safety from "requires local access." That phrase fools people into treating it like a low-priority item. It isn't. Every web shell, every compromised service account, every malicious insider is exactly the threat model for an LPE. Monday morning, CVE drops, your Ubuntu 22.04 fleet is sitting on 5.15.0-91, and the affected range is 5.15 through 6.1. Here's how I'd work through it, step by step.
Step 1: Is There a Public PoC? Check Before You Panic (or Relax)
Your first move is threat triage, not patching. Open three tabs simultaneously:
- GitHub search: search the CVE number — results show up within hours of a drop
- Exploit-DB: https://www.exploit-db.com/search?cve=CVE-YYYY-NNNNN
- Kernel security mailing list archives: https://lore.kernel.org/linux-kernel/?q=CVE-YYYY-NNNNN
If there's no working PoC on Monday and the researchers who reported it followed coordinated disclosure, you typically have a window of days to a couple of weeks before reliable exploit code circulates. That changes your urgency calculation. No PoC means you can spend 30 minutes verifying the mitigation path rather than emergency-rebooting production at 9am. A reliable, weaponized PoC already on GitHub means you compress that timeline to hours. Check lore.kernel.org specifically for the patch commit — the commit message often tells you exactly which subsystem is affected and gives you enough context to assess your real exposure.
Step 2: Apply a sysctl Mitigation If One Exists
Many LPE advisories from the kernel security team include a sysctl workaround. If this one does, apply it and document it immediately. Don't just run the command — write it to a file with a comment that links the CVE:
# /etc/sysctl.d/99-cve-2024-NNNNN-mitigation.conf
# Temporary mitigation for CVE-2024-NNNNN (LPE, affects kernels 5.15-6.1)
# Remove this file once kernel >= X.Y.Z is deployed fleet-wide
# Reference: https://nvd.nist.gov/vuln/detail/CVE-2024-NNNNN
kernel.unprivileged_userns_clone = 0
sudo sysctl --system
# Verify it took effect
sysctl kernel.unprivileged_userns_clone
The filename convention with the CVE number matters — three months later when you're cleaning up, you'll know exactly why that file exists and which kernel version makes it safe to remove. Skip the naming convention and you'll find mystery sysctl files in production that nobody wants to touch. Also, confirm the mitigation doesn't break anything critical in your stack — unprivileged_userns_clone=0 for example breaks rootless containers, so Docker and Podman in rootless mode will stop working.
Step 3: Check Canonical Livepatch Before You Accept Downtime
If you're running Ubuntu 22.04 with Livepatch already enrolled, check immediately:
canonical-livepatch status --verbose
The output will tell you whether a livepatch for this specific CVE has been applied, is pending, or isn't available yet. Canonical doesn't livepatch every CVE — they prioritize high-severity remotely exploitable bugs and widespread LPEs. For a local-only LPE on 5.15, coverage isn't guaranteed. If the status shows the patch applied, you still keep the sysctl mitigation until you reboot into a patched kernel — livepatches are in-memory and don't survive reboots. Think of a livepatch as buying you a controlled maintenance window, not an excuse to skip patching.
Step 4: Subscribe to the Ubuntu Security Tracker and Wait Deliberately
Rather than refreshing https://ubuntu.com/security/CVE-YYYY-NNNNN manually, subscribe to Ubuntu Security Notices for kernel updates. The USN list is at https://lists.ubuntu.com/mailman/listinfo/ubuntu-security-announce. You're specifically watching for the status to flip from needed to released in the Jammy (22.04) row. The tracker page also shows you which source package carries the fix — for the kernel it's usually linux or linux-hwe-5.15 depending on your meta-package. Knowing the exact source package prevents you from installing the wrong kernel flavor and wondering why the CVE tracker still shows red.
Step 5: Test the Patched Kernel on One Host Before Rolling Fleet-Wide
Once the fix lands in the -security pocket, resist the urge to push it everywhere at once. Pull it on one canary host first:
# See exactly which kernel packages would change before committing to anything
sudo apt-get update && sudo apt-get dist-upgrade --dry-run | grep linux-image
# Then pull just the new kernel onto the canary via the metapackage
sudo apt-get install --only-upgrade linux-image-generic
Run your smoke tests, confirm the application stack comes up clean after reboot, then schedule rolling reboots across the fleet. I prefer batching by 10-15% of the fleet per wave with 30 minutes between waves. That's enough time to catch a boot regression before you've taken down half your capacity. For the reboot scheduling itself, shutdown -r +30 handles delayed, coordinated timing on any distro; recent systemd releases (254+) also support systemctl reboot --when=+30m.
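A bare-bones sketch of that wave pattern follows; hosts.txt, the SSH setup, and the 10% batch size are all assumptions to adapt, and anything beyond a handful of hosts really wants your config management tool driving it instead:
#!/bin/bash
# Roll reboots across a flat host list, roughly 10% per wave, 30 minutes apart
mapfile -t HOSTS < hosts.txt
WAVE=$(( (${#HOSTS[@]} + 9) / 10 ))
for ((i = 0; i < ${#HOSTS[@]}; i += WAVE)); do
    for host in "${HOSTS[@]:i:WAVE}"; do
        ssh "$host" 'sudo shutdown -r +1 "kernel update"' &
    done
    wait
    sleep 1800   # watch dashboards during this window before the next wave
done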
Step 6: Verify and Clean Up
After each host reboots, three checks close the loop:
# Confirm you're on the patched kernel
uname -r
# Expected: something like 5.15.0-105-generic (higher than 5.15.0-91)
# If you used Livepatch, confirm a clean state post-reboot
canonical-livepatch status --verbose
# Expected: the new kernel listed with no livepatch fixes needed — the fix is baked into the kernel now
# Remove the temporary sysctl mitigation
sudo rm /etc/sysctl.d/99-cve-2024-NNNNN-mitigation.conf
sudo sysctl --system
The post-reboot livepatch check is the one people skip. After rebooting into a patched kernel, the livepatch status should show nothing left to patch — if it still lists fixes for this CVE, something is off with your kernel meta-package and you may have rebooted into the same old kernel. The uname -r output is your ground truth. Run these checks via your config management tool (Ansible, Puppet, whatever) so you get fleet-wide verification in one shot rather than SSH-ing into 200 boxes individually.
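With Ansible, for example, that fleet-wide check is two ad-hoc commands (assuming an inventory group named all and Livepatch as the live-patching tool in play):
ansible all -m command -a "uname -r"
ansible all -b -m command -a "canonical-livepatch status"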
When NOT to Roll Your Own Patch
The thing that trips up a lot of ops teams is treating every CVE like it demands immediate action. Before you spend three days backporting a patch, check the attack vector first. If the CVE requires local access and your servers have zero untrusted local users — think single-tenant VMs, dedicated bare-metal with no shell access handed out — the practical risk is close to zero. You're trading real engineering time and real regression risk for protection against a threat that isn't there. The risk/effort math just doesn't close.
Old kernels are a trap. If you're on something like 4.14 or 5.4 LTS and the upstream fix was written against 6.6, you're not backporting — you're rewriting. The subsystem internals shift enough that you end up carrying a diverged kernel with a patch that nobody has reviewed in the context you're running it. I've seen teams maintain custom-patched 4.19 kernels "just until we migrate" for eighteen months. The correct answer, when the fix doesn't apply cleanly, is to accelerate the version upgrade and mitigate via network controls or workload isolation in the meantime. A diverged kernel is a vulnerability of its own.
If you're running on managed Kubernetes nodes — EKS, GKE, AKS — stop. You do not have access to the host kernel, full stop. Trying to work around this with privileged DaemonSets or node modification scripts is the kind of thing that gets your cluster support contract voided and your security team's hair on fire. The right move is to open a support ticket with the cloud provider (they often patch faster than you'd expect, especially for high-CVSS issues), and while you wait, use Pod Security admission policies or network policies to isolate the workloads that would be exposed. Your blast radius reduction is in the workload layer, not the kernel layer.
The compliance angle is one people learn the hard way. PCI-DSS and FedRAMP both have provisions around vendor-supported software. A self-built kernel patch — even if it's literally the upstream commit cherry-picked clean — can fail an audit because your QSA or 3PAO sees "custom kernel not from a vendor" and flags it. I've watched technically correct patches cause audit findings that took months to remediate. If you're in a regulated environment, the acceptable path is usually:
- Apply a compensating control (disable the affected syscall, restrict access, use seccomp)
- Document the compensating control formally
- Wait for your distro to ship the fix, then upgrade on the next change window
The compensating control buys you the time, and the vendor-shipped fix keeps your audit clean. Rolling your own patch doesn't actually shorten the audit remediation timeline — it adds a finding on top of the original one.
# Quick check before you commit to anything:
# 1. What's the attack vector? (nvd-cve-data.json here is the CVE record you
#    pulled down from the NVD API or the NVD page's JSON export)
grep "attackVector" nvd-cve-data.json
# LOCAL means no remote exploit — reassess urgency
# 2. Can the upstream patch even apply?
git fetch upstream
git cherry-pick --no-commit <upstream-commit-sha>
# If you hit more than ~10 conflicts, you're maintaining a fork now — stop here
# 3. Are you on a VM at all? (whether the node is managed is an inventory
#    question — systemd-detect-virt only tells you about the hypervisor)
systemd-detect-virt
# "none" means bare metal — you have kernel access
# "kvm", "amazon", "microsoft", etc. mean a VM; if that VM is an EKS/GKE/AKS
# worker node, the kernel isn't yours to patch
Originally published on techdigestor.com. Follow for more developer-focused tooling reviews and productivity guides.