mohideen sahib

Posted on Oct 15

Crash Dumps in Linux: Kernel & Application Deep Dive

#linux #devops #sre #kdump

Crash dumps are essential for diagnosing system-level and application-level failures. They capture memory and execution state at the time of a crash, helping engineers identify root causes and prevent recurrence.

In Linux, there are two main types of crash dumps:

Kernel Crash Dump (kdump) – triggered when the kernel itself crashes.
Application Core Dump (coredump) – triggered when a process crashes.

1️⃣ Kernel Crash Dump (kdump)

When the Linux kernel crashes, it may leave the system unstable. kdump provides a safe way to capture a memory snapshot (vmcore) for post-mortem analysis.

How kdump Works

Crashkernel Reservation

At boot, a portion of RAM is reserved for the crash kernel via a kernel command-line parameter, usually set in the GRUB configuration:

crashkernel=512M

This memory is isolated from the main kernel, ensuring a stable environment to capture the dump.
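To confirm the reservation actually took effect after a reboot, the kernel command line can be inspected. The following is a minimal sketch; the function name `crashkernel_param` is illustrative and not part of any kdump tooling.

```shell
#!/bin/sh
# Print the crashkernel= value found in a kernel command-line string.
# Prints nothing if the parameter is absent.
crashkernel_param() {
    # Split the command line into one argument per line, then keep the match.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Usage on a live system (standard procfs/sysfs paths):
#   crashkernel_param "$(cat /proc/cmdline)"
#   cat /sys/kernel/kexec_crash_size    # reserved bytes; 0 if nothing is reserved
```

An empty result from the first command, or 0 from the second, suggests the parameter was not applied or the reservation failed.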

Kernel Panic Handling

When the main kernel encounters a panic or fatal exception, the panic handler executes.

The panic handler then transfers control through the kexec crash path to the crash kernel, which the kdump service preloaded into the reserved memory.

Crash Kernel Boot

The crash kernel boots without BIOS/UEFI initialization or full hardware reinitialization.

Minimal drivers and services are loaded to safely capture memory.

Dump Collection

The crash kernel exposes the crashed kernel's memory as /proc/vmcore, and the kdump scripts copy it to the dump target as vmcore.

Storage options: local disk, NFS, or remote crash dump server.

Crashkernel Size Recommendations

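The reservation must be large enough to boot the crash kernel, its initramfs, and the drivers needed to write the dump, without taking more RAM from the main system than necessary. It does not hold the dump itself; the vmcore is written to the configured storage target.

Typical sizing rules:

RAM Size      Crashkernel Size
< 2 GB        128–256 MB
2–8 GB        256–512 MB
8–64 GB       512–1024 MB
> 64 GB       1–2 GB

Rationale: The crash kernel's memory needs tend to grow with installed RAM and with the drivers required to reach the dump target. Too small → the crash kernel fails to boot or runs out of memory, and the dump fails; too large → reduces usable RAM.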
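The table above can be turned into a small lookup. This is only a sketch of the article's rule of thumb: the function name `suggest_crashkernel` is hypothetical, it picks the upper bound of each range, and it treats the overlapping boundaries (8 GB, 64 GB) as belonging to the lower row. Those choices are assumptions, not distribution defaults.

```shell
#!/bin/sh
# Suggest a crashkernel= value for a given amount of RAM in whole GiB,
# using the upper bound of each range from the sizing table.
suggest_crashkernel() {
    ram_gb=$1
    if [ "$ram_gb" -lt 2 ]; then
        echo "256M"
    elif [ "$ram_gb" -le 8 ]; then
        echo "512M"
    elif [ "$ram_gb" -le 64 ]; then
        echo "1024M"
    else
        echo "2G"
    fi
}

# Usage: derive whole GiB from /proc/meminfo (MemTotal is reported in kB).
#   suggest_crashkernel "$(awk '/^MemTotal/ { print int($2 / 1024 / 1024) }' /proc/meminfo)"
```

Treat the result as a starting point and verify it with a test crash, since driver and initramfs size vary between systems.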

Configuring kdump

Install kdump tools:

yum install kexec-tools # RHEL/CentOS
apt install kdump-tools # Debian/Ubuntu

Enable and start the service:

systemctl enable --now kdump        # RHEL/CentOS
systemctl enable --now kdump-tools  # Debian/Ubuntu

Configure dump storage (/etc/kdump.conf on RHEL/CentOS; Debian/Ubuntu use /etc/default/kdump-tools instead):

Local storage

path /var/crash

Remote NFS (older releases used the now-deprecated net directive)

nfs nfsserver:/kdump

Optional: Reduce dump size by compressing pages (-c) and excluding zero, cache, user, and free pages (-d 31):

core_collector makedumpfile -c --message-level 1 -d 31

Remote NFS Dumps & Cleanup

Requirements:

Network interface must be up in the crash kernel.

NFS server must be reachable during crash kernel execution.

Cleanup strategies:

Remove dumps older than 30 days

find /mnt/kdump/ -type f -mtime +30 -exec rm -f {} \;

Monitor total size (du only reports usage; it does not enforce a limit)

du -sh /mnt/kdump/

Automate via cron or systemd timers on the NFS server.
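The two cleanup steps can be combined into one script suitable for a cron job or systemd timer. This is a sketch under assumptions: `prune_dumps` is a hypothetical helper, and it relies on the `-mindepth`, `-empty`, and `-delete` options available in GNU find.

```shell
#!/bin/sh
# Delete regular files under a dump directory that are older than N days,
# remove directories left empty, then print the remaining size in KiB.
prune_dumps() {
    dump_dir=$1
    max_age_days=$2
    [ -d "$dump_dir" ] || { echo "no such directory: $dump_dir" >&2; return 1; }
    find "$dump_dir" -type f -mtime +"$max_age_days" -exec rm -f {} +
    # -mindepth 1 keeps the top-level dump directory itself.
    find "$dump_dir" -mindepth 1 -type d -empty -delete
    du -sk "$dump_dir" | cut -f1
}

# Usage on the NFS server:
#   prune_dumps /mnt/kdump 30
```

The printed size can be compared against a threshold to raise an alert, which covers the monitoring half of the strategy.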

Testing & Analysis

Manual trigger (as root; this panics the kernel immediately, so use a test machine):

echo c > /proc/sysrq-trigger
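Before running the trigger above, it can help to confirm that a crash kernel is actually loaded; otherwise the machine simply panics without producing a vmcore. A minimal sketch, where `crash_kernel_loaded` is an illustrative name and the optional path argument exists only to make the function testable:

```shell
#!/bin/sh
# Return 0 if a crash kernel is loaded, judging by the given sysfs file.
# Defaults to /sys/kernel/kexec_crash_loaded, which contains 1 or 0.
crash_kernel_loaded() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Usage:
#   if crash_kernel_loaded; then echo "kdump armed"; else echo "kdump NOT armed"; fi
```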

Verify dump:

ls -lh /var/crash

Analyze using crash (requires the kernel debuginfo package that provides vmlinux):

crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/.../vmcore

Important checks:

Kernel panic messages

Last running processes

Memory corruption / Oops logs

Device driver states
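The checks listed above map onto a handful of built-in crash commands, which can be run non-interactively from a command file. The sketch below assumes a first-pass triage set (sys, log, bt, ps, mod); the helper name and file paths are placeholders.

```shell
#!/bin/sh
# Write a batch of crash commands to a file for non-interactive analysis.
write_crash_commands() {
    cat > "$1" <<'EOF'
sys
log
bt
ps
mod
exit
EOF
}

# Usage (substitute the vmlinux and vmcore paths used when invoking crash):
#   write_crash_commands /tmp/triage.cmds
#   crash -i /tmp/triage.cmds <vmlinux> <vmcore> > triage.txt
```

Here sys and log cover the panic message and kernel log, bt and ps cover the panicking task and process list, and mod lists loaded modules as a starting point for driver state.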

2️⃣ Application Core Dump (coredump)

When an application crashes, Linux can capture a memory snapshot for debugging.

Triggering Core Dumps

Automatic: Segmentation fault, abort, unhandled exception.

Manual: Sending a signal to the process:

kill -ABRT <PID>
kill -SEGV <PID>

The signal terminates the process, so the service is unavailable until it is restarted; writing a large dump can delay that.
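The effect of a manual signal can be observed safely on a throwaway process. This sketch aborts a background sleep and reads back how it died; whether a core file is also written depends on the core limit in effect.

```shell
#!/bin/sh
# Start a throwaway process, abort it, and report how it died.
sleep 60 &
victim=$!
kill -ABRT "$victim"
wait "$victim"
status=$?
# Shells report "killed by signal N" as 128+N; SIGABRT is signal 6 on Linux.
echo "exit status: $status (signal $((status - 128)))"
```

On Linux this should report status 134, which is the same value a supervisor sees when a service aborts.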

Systemd-Based Core Dumps

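Handled by systemd-coredump, which the kernel invokes through the kernel.core_pattern sysctl.

Dependencies:

systemd-coredump.socket and systemd-coredump@.service

systemd-journald (logging)

A non-zero core limit, set with ulimit -c or LimitCORE in the unit file: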

[Service]
LimitCORE=infinity
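To check which limit a running service actually received, the soft limit can be read from its /proc limits file. A minimal sketch; `core_soft_limit` is an illustrative helper and `myapp.service` is a placeholder unit name.

```shell
#!/bin/sh
# Print the soft core-size limit from a /proc/<pid>/limits style file.
# The line looks like: "Max core file size  <soft>  <hard>  bytes".
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Usage for a running service:
#   pid=$(systemctl show -p MainPID --value myapp.service)
#   core_soft_limit "/proc/$pid/limits"
```

A result of 0 means the process will not produce a dump; unlimited corresponds to LimitCORE=infinity.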

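Not all units generate dumps. LimitCORE=0 disables them outright. Sandboxing options matter mainly when the kernel writes the core as a plain file inside the unit's own filesystem view (a core_pattern that is a path rather than a pipe); systemd-coredump itself runs outside the unit's sandbox. If dumps are missing, review these settings:

NoNewPrivileges=yes

PrivateTmp=yes

ProtectSystem=full/strict

ProtectHome=yes

ReadOnlyPaths / InaccessiblePaths

LimitCORE=0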

Core Dump Configuration (/etc/systemd/coredump.conf)

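[Coredump]
# Store dumps on disk (/var/lib/systemd/coredump)
Storage=external
# Compress dumps
Compress=yes
# Largest core that will be processed at all
ProcessSizeMax=2G
# Largest single core that will be stored on disk
ExternalSizeMax=10G
# Max total disk space for all stored dumps
MaxUse=10G
# Minimum free disk space to preserve
KeepFree=500M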

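How cleanup happens:

When it handles a new dump, systemd-coredump checks how much space the stored dumps occupy.
If usage exceeds MaxUse or free space falls below KeepFree, the oldest dumps are deleted until both limits are met.
A single dump larger than ExternalSizeMax is not stored in full.

No cron jobs required: this cleanup runs as part of dump handling. By default, systemd-tmpfiles also ages out stored dumps after a few days.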
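Retention limits can also be placed in a drop-in file rather than by editing the main configuration. The sketch below assumes the total-usage cap is set with MaxUse (the coredump.conf option for space used by all stored dumps); the helper name and the file name 50-retention.conf are arbitrary.

```shell
#!/bin/sh
# Write a coredump.conf drop-in with retention settings into the given directory.
write_coredump_dropin() {
    conf_dir=$1
    mkdir -p "$conf_dir"
    cat > "$conf_dir/50-retention.conf" <<'EOF'
[Coredump]
# Upper bound on disk space used by all stored dumps
MaxUse=10G
# Space to leave free on the filesystem
KeepFree=500M
EOF
}

# Usage (as root):
#   write_coredump_dropin /etc/systemd/coredump.conf.d
```

Comments sit on their own lines because systemd configuration files do not treat trailing text after a value as a comment.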

Enabling Core Dumps for Your Service

Set LimitCORE:

[Service]
LimitCORE=infinity

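Ensure writable storage: /var/lib/systemd/coredump or external disk.
Check that no drop-in or override sets LimitCORE=0. Sandboxing options such as NoNewPrivileges or PrivateTmp matter mainly if cores are written as plain files inside the unit's filesystem view.
For user services: Set ulimit -c unlimited in the shell or service environment.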

Reviewing Core Dumps

List dumps:

coredumpctl list

Debug with GDB:

coredumpctl gdb

Key points:

Stack traces

Faulting instruction

Thread states

Memory allocations

Linked libraries

✅ Key Takeaways

Kernel dump: For system crashes; uses crashkernel + kexec.

Crashkernel sizing: Based on RAM usage; too small → dump fails.

Remote storage: Requires cleanup and monitoring.

Core dump: For processes; retention via MaxUse and KeepFree, while ExternalSizeMax caps a single stored dump.

Units with LimitCORE=0 do not generate dumps; sandboxing options can also interfere when cores are written as plain files.

Core dumps can be triggered manually with signals; cleanup still applies.
