DEV Community

Surviving the Linux OOM Killer

Raunak Ramakrishnan on October 04, 2018

When your Linux machine runs out of memory, Out of Memory (OOM) killer is called by kernel to free some memory. It is often encountered on servers ...

Read full post

Alex North-Keys • Aug 15 '19 • Edited

I toyed with that command a bit - I wanted to get the RSS and username in there, keep the sort, and include how many procs were included and skipped. Something like (fewer than 30 procs are shown due to trimming):

#                              OOM   OOM   OOM  
#     User     PID       RSS Score ScAdj   Adj  Command (shown 30, omits 945)
  someuser   17098  13056696   198     0     0  /usr/lib/firefox/firefox -cont
  someuser    5972   3645740    55     0     0  /usr/lib/firefox/firefox -cont
  someuser   17040   2668760    40     0     0  /usr/lib/firefox/firefox -no-r
  someuser    5898   2342168    35     0     0  /usr/lib/firefox/firefox -no-r
      root    4974   1531488    24     0     0  /usr/lib/xorg/Xorg -nolisten t
  someuser   15283    433544     9     0     0  /usr/bin/java -Dosgi.requiredJ
  someuser    6014    133240     2     0     0  /usr/lib/firefox/firefox -cont
  someuser    6094    171836     2     0     0  /usr/lib/firefox/firefox -cont
  someuser   26043    101524     2     0     0  emacs somefile.py 
  someuser    1889     70088     1     0     0  xterm -name XTerm8 -tn xterm-2
  someuser    3607     64096     1     0     0  xterm -name XTerm8 -tn xterm-2
  someuser   11903     99368     1     0     0  python 
  someuser   17166    111764     1     0     0  /usr/lib/firefox/firefox -cont
  someuser   32529     44736     1     0     0  xterm -name XTerm8 -tn xterm-2
  postgres   30473     66132     1     0     0  postgres: checkpointer process
      root    2388     41156     0  -500    -8  /usr/bin/dockerd -H unix:// 
      root   20036     24344     0  -900   -15  /usr/lib/snapd/snapd 
  message+    1605      4100     0  -900   -15  /usr/bin/dbus-daemon --system 
  postgres   30448      7220     0  -900   -15  /usr/lib/postgresql/11/bin/pos
      root   28818      3032     0 -1000   -17  /lib/systemd/systemd-udevd 
      root   30988      2832     0 -1000   -17  /usr/sbin/sshd -D

Here's the result. Whether this is an argument for or against bash syntax is an exercise for the reader. The cat/tr calls can probably be obviated :-)

#!/bin/bash
#    Displays running processes in descending order of OOM score
#      (skipping those with both score and adjust of zero).
#    https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9

contents-or-0 () { if [ -r "$1" ] ; then cat "$1" ; else echo 0 ; fi ; }

{
    header='# %8s %7s %9s %5s %5s %5s  %s\n'
    format="$(echo "$header" | sed 's/^./ /')"
    declare -a lines output
    IFS=$'\r\n' command eval 'lines=($(ps -e -o user,pid,rss))'
    shown=0 ; omits=0
    for n in $(eval echo "{1..$(expr ${#lines[@]} - 1)}") ; do # 1..skip header
        line="${lines[$n]}"
        case "$line" in *[0-9]*)
            set $line ; user=$1 ; pid=$2 ; rss=$3 ; shift 3
            oom_score=$(    contents-or-0  /proc/$pid/oom_score)
            oom_adj=$(      contents-or-0  /proc/$pid/oom_adj)
            oom_score_adj=$(contents-or-0  /proc/$pid/oom_score_adj)            
            if [ -f /proc/$pid/oom_score ] && \
               [ 0 -ne $oom_score -o 0 -ne $oom_score_adj -o 0 -ne $oom_adj ]
            then
                output[${#output[@]}]="$( \
                   printf "$format" \
                          "$user" \
                          "$pid" \
                          "$rss" \
                          "$oom_score" \
                          "$oom_score_adj" \
                          "$oom_adj" \
                          "$(cat /proc/$pid/cmdline | tr '\0' ' ' )" \
                )"
                (( ++shown ))
            else
                (( ++omits ))
            fi
            ;;
        esac
    done
    printf "$header"   ''   '' '' OOM   OOM   OOM ''
    printf "$header" User PID RSS Score ScAdj Adj \
        "Command (shown $shown, omits $omits)"
    for n in $(eval echo "{0..$(expr ${#output[@]} - 1)}") ; do
        echo "${output[$n]}"
    done | sort -k 4nr -k 5rn
}

#----eof

Dustin King • Oct 5 '18 • Edited

Very interesting.

This must be a linux-specific thing, not *nix in general. My MacOS laptop doesn't seem to have a /proc.

Edit to add: This article says Mac uses the sysctl function for some things that would otherwise use /proc for.

Andrey Loskutov • Oct 14 '20

We've found an interesting issue: specific oom_score_adj values in the range [942,999] seem to produce "unexpected" oom_adj values of 16, which seem to be out of range [-17, 15].

That is at least unexpected, any idea where it is coming from and if that could affect the oom_killer behavior (e.g. task with oom_score_adj=940 will be killed before the task with oom_score_adj=999)? At least /proc/<pid>/oom_score seem to be "OK" and is higher for oom_score_adj=1000...

sudo echo "999" > /proc/<pid>/oom_score_adj
cat /<pid>/oom_score_adj
999
cat /<pid>/oom_adj
16
cat /proc/<pid>/oom_score
1008

sudo echo "1000" > /proc/<pid>/oom_score_adj
cat /<pid>/oom_score_adj
1000
cat /<pid>/oom_adj
15
cat /proc/<pid>/oom_score
1009

Raunak Ramakrishnan • Oct 5 '18 • Edited

Agreed! There is no substitute for good monitoring. It catches many issues before they become bigger problems. Ultimately, we must be fixing the root cause for high memory which is generally poor design/architecture.

What you said about the tragedy of commons is exactly what happened to nice scores for process priority.

beelong_yank • Mar 4 '25

for more information

#!/bin/bash

LOG_FILE="/var/log/oom_monitor.log"
declare -A prev_processes  

while true; do
    clear
    printf 'PID\tPPID\tOOM Score\tOOM Adj\tCommand\n'

    declare -A current_processes  

    while read -r pid ppid comm; do
        if [ -f /proc/$pid/oom_score ] && [ "$(cat /proc/$pid/oom_score)" -ne 0 ]; then
            oom_score=$(cat /proc/$pid/oom_score)
            oom_adj=$(cat /proc/$pid/oom_score_adj)
            printf '%d\t%d\t%d\t\t%d\t%s\n' "$pid" "$ppid" "$oom_score" "$oom_adj" "$comm"

            current_processes["$pid"]="$comm"
        fi
    done < <(ps -e -o pid= -o ppid= -o comm=) | sort -k 3nr

    for pid in "${!prev_processes[@]}"; do
        if [[ ! -v current_processes["$pid"] ]]; then
            timestamp=$(date "+%Y-%m-%d %H:%M:%S")
            msg="[$timestamp] ALERT: Proces '${prev_processes[$pid]}' (PID $pid) is dead by OOM Killer!"
            echo -e "\n\033[1;31m$msg\033[0m" 
            echo "$msg" >> "$LOG_FILE"
        fi
    done

    prev_processes=("${current_processes[@]}")

    sleep 2  
done

Gnoale • Feb 9 '21

little typo I spotted : instead of sudo echo -200 > /proc/42/oom_score_adj do echo -200 | sudo tee - /proc/42/oom_score_adj

Raunak Ramakrishnan • Dec 14 '21

Thanks, corrected