I wrote this post to share a recent experience at work: a problem I ran into and my temporary solution to it.
I'm new to the tech world, so I would be grateful for any suggestions for improvement, and very pleased if this post is useful to someone else.
A few months ago at my company we discovered that one of our Docker containers was having problems with CPU usage: out of nowhere, the CPU usage of that container would increase abruptly. So while the dev team was searching for the bug in the code, I implemented a temporary solution. I wrote a script that logs the container's CPU usage every 5 seconds:
#!/bin/bash
logs=/var/log/process_name.log
container_name=container_name
while :
do
# Get the CPU usage of a specific container, e.g. "12.34%"
var=$(docker stats --no-stream --format "{{.CPUPerc}}" "$container_name")
length=${#var}
if (( length == 0 )); then
echo "Container ${container_name} does not exist"
echo "$(date +'%d-%m-%Y %H:%M') | Container $container_name does not exist" >> $logs
else
# Integer part of the CPU usage (e.g. "12.34%" becomes "12")
percent="${var%%.*}"
echo "Current CPU usage: ${percent}"
# Save the current CPU usage to the log file
echo "$(date +'%d-%m-%Y %H:%M') | ${percent}" >> "$logs"
fi
sleep 5
done
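The script depends on trimming the `docker stats` value (for example `12.34%`) down to a whole number, so that the later comparison can use integer arithmetic. Here is a minimal sketch of that step on its own. The helper name `cpu_to_int` is made up for illustration and is not part of the original script; it also rejects empty or unexpected input instead of passing it along.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): turn a CPUPerc string
# such as "12.34%" into its integer part ("12").
# Prints nothing and returns 1 if the input does not look like a percentage.
cpu_to_int() {
  local raw="$1" num int
  case $raw in
    *%) num=${raw%\%} ;;   # strip the trailing percent sign
    *)  return 1 ;;        # empty or not a percentage
  esac
  int=${num%%.*}           # drop the fractional part, if any
  case $int in
    ''|*[!0-9]*) return 1 ;;   # reject anything that is not all digits
  esac
  echo "$int"
}

cpu_to_int "12.34%"                  # prints 12
cpu_to_int "300.05%"                 # prints 300
cpu_to_int "" || echo "no reading"   # prints no reading
```

A value above 100 is expected on multi-core hosts, since `docker stats` reports usage relative to a single core.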
After that I created a supervisor config to run this process:
[program:process]
command=/opt/scripts/script.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/process.err.log
stdout_logfile=/var/log/process.out.log
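Any other script that should run permanently needs a stanza of the same shape. As a rough sketch, the hypothetical helper `render_program_conf` below prints one for a given program name and script path. The names `restarter` and `/opt/scripts/restart.sh` are placeholders I made up, and sending stdout and stderr to separate files is my own choice.

```shell
#!/bin/bash
# Hypothetical helper: print a supervisor [program:x] stanza.
# $1 = program name, $2 = absolute path of the script to run.
render_program_conf() {
  local name="$1" script="$2"
  cat <<EOF
[program:${name}]
command=${script}
autostart=true
autorestart=true
stderr_logfile=/var/log/${name}.err.log
stdout_logfile=/var/log/${name}.out.log
EOF
}

# Example with placeholder names (assumptions, not from the original post)
render_program_conf restarter /opt/scripts/restart.sh
```

Assuming the Ubuntu `supervisor` package, which as far as I know includes files matching `/etc/supervisor/conf.d/*.conf`, the output could be saved there and loaded with `supervisorctl reread` followed by `supervisorctl update`. The script named in `command` has to be executable (`chmod +x`).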
Then I wrote a script to restart the problematic container, based on the logs of the previous script:
#!/bin/bash
container_name=container_name
logs_evaluated_lines=5
logs=/var/log/process_name.log
max_cpu=90
while :
do
# Lines in file
num=$(wc -l < $logs)
counter=0
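# Count how many of the last 'logs_evaluated_lines' log lines are at or above 'max_cpu'
for ((index=num; index>=num-logs_evaluated_lines+1 && index>=1; index--))
do
value=$(sed "${index}q;d" "$logs")
# The value starts at column 20, after the "dd-mm-YYYY HH:MM | " prefix
percent=$(echo "$value" | cut -c 20-)
#echo $percent
# Skip non-numeric entries such as "Container Restarted"
if [[ $percent =~ ^[0-9]+$ ]] && (( percent >= max_cpu )); then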
# echo 'greater'
counter=$((counter+1))
# else
# echo 'lower'
fi
done
echo "$(date +'%d-%m-%Y %H:%M') | Log lines at or above ${max_cpu}%: ${counter}"
echo "$(date +'%d-%m-%Y %H:%M') | Log lines analyzed: ${logs_evaluated_lines}"
if (( $counter == $logs_evaluated_lines )); then
echo "$(date +'%d-%m-%Y %H:%M') | CPU usage too high"
echo "$(date +'%d-%m-%Y %H:%M') | Restarting Container"
docker restart $container_name
echo "$(date +'%d-%m-%Y %H:%M') | Container Restarted"
echo "$(date +'%d-%m-%Y %H:%M') | Container Restarted" >> $logs
else
echo "$(date +'%d-%m-%Y %H:%M') | CPU Usage OK"
fi
echo "$(date +'%d-%m-%Y %H:%M') |"
sleep 5
done
This script evaluates the last 'logs_evaluated_lines' lines of the log and restarts the container if every one of them is at or above the 'max_cpu' value.
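The counting step can also be tried in isolation, without Docker. Below is a minimal sketch that reads the last lines with `tail` instead of indexing with `sed`; the helper name `count_high_lines` and the sample log entries are invented for illustration. It assumes the log format used above (`date | value`) and ignores lines whose value is not a plain number.

```shell
#!/bin/bash
# Hypothetical helper: count how many of the last N lines of a log file
# carry a numeric CPU value at or above a threshold.
# $1 = log file, $2 = number of lines to inspect, $3 = threshold
count_high_lines() {
  local file="$1" lines="$2" max="$3"
  tail -n "$lines" "$file" | {
    count=0
    while IFS= read -r line; do
      value=${line##*| }             # keep what follows the "date | " prefix
      case $value in
        ''|*[!0-9]*) continue ;;     # skip entries like "Container Restarted"
      esac
      if [ "$value" -ge "$max" ]; then
        count=$((count + 1))
      fi
    done
    echo "$count"
  }
}

# Demo on a throwaway file that mimics the log format (made-up values)
demo=$(mktemp)
printf '%s\n' \
  "01-01-2024 10:00 | 95" \
  "01-01-2024 10:00 | Container Restarted" \
  "01-01-2024 10:00 | 12" \
  "01-01-2024 10:00 | 91" \
  "01-01-2024 10:00 | 100" > "$demo"
count_high_lines "$demo" 3 90   # prints 2
count_high_lines "$demo" 5 90   # prints 3
rm -f "$demo"
```

A restart would then be triggered only when the returned count equals the number of lines inspected.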
Top comments (1)
My thanks @tomasggarcia, this is exactly what I needed after discovering that one of my Docker containers (homebridge running on Ubuntu server) was using 300% CPU and raising the machine temperature to 140deg! I don't really want to monitor the stats live all the time, or worry about it, so I needed something to restart the container if that happened again.
Could you be so kind as to advise what each of the files should be called, what permissions they need, and where they need to go on an Ubuntu 20.04 server? Also, are any crons needed, and if so how should they be set up? Much appreciated in advance.