When running multiple Docker containers, be it on a single server, in a Kubernetes cluster, or in Docker Swarm, it is very important to be able to monitor their resource usage.
That's where cAdvisor comes in. It provides useful metrics for building monitoring dashboards with Grafana and Prometheus.
But there is one big problem: cAdvisor's default settings.
Running the default configuration results in very high CPU usage, which is especially noticeable on low-powered devices such as the Raspberry Pis from the cluster in the graph above.
cAdvisor uses more CPU than the containers it is monitoring! 😭
Thankfully, the fix is quite simple. The parameters that make the biggest impact are:

- `--housekeeping_interval`
- `--docker_only`
- `--disable_metrics`
A detailed explanation of each can be found here.
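To get a feel for why the housekeeping interval matters so much, here is a rough back-of-the-envelope estimate. It assumes cAdvisor's default `--housekeeping_interval` of 1s and ignores dynamic housekeeping (which can stretch the interval for containers whose stats do not change). Here $N$ is the number of tracked containers and $T$ is the housekeeping interval in seconds; both symbols are only for illustration:

$$
\text{housekeeping passes per minute} = \frac{60\,N}{T}, \qquad T = 1\,\text{s} \Rightarrow 60N, \qquad T = 30\,\text{s} \Rightarrow 2N
$$

So a 30s interval does roughly 30 times fewer housekeeping passes per container. `--docker_only` addresses the other factor, $N$, by limiting tracking to Docker containers instead of every cgroup on the host.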
Docker-compose Swarm example
Below I'm defining a `--housekeeping_interval` of 30 seconds, setting `--docker_only` to `true`, and disabling the metrics I'm not interested in with `--disable_metrics`.
NOTE: I'm using an ARM build of cAdvisor, so change the image/tag accordingly.
```yaml
version: '3.4'
services:
  cadvisor:
    image: zcube/cadvisor:latest  # ARM build; swap for your platform
    ports:
      - published: 9102
        target: 9102
        mode: host
    command:
      - "--port=9102"
      # Collect container stats every 30s instead of the default 1s
      - "--housekeeping_interval=30s"
      # Only report Docker containers (plus root stats), not every cgroup
      - "--docker_only=true"
      # Skip the metric groups I'm not interested in
      - "--disable_metrics=percpu,sched,tcp,udp,disk,diskIO,accelerator,hugetlb,referenced_memory,cpu_topology,resctrl"
    volumes:
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
      - /sys:/sys:ro
      - /var/run:/var/run:ro
      - /:/rootfs:ro
      - /sys/fs/cgroup:/cgroup:ro
      - /etc/machine-id:/etc/machine-id:ro
      - /etc/localtime:/etc/localtime:ro
    deploy:
      mode: global
      update_config:
        order: stop-first
      resources:
        reservations:
          memory: 80M
    healthcheck:
      test: wget --quiet --tries=1 --spider http://localhost:9102/healthz || exit 1
      interval: 15s
      timeout: 15s
      retries: 5
      start_period: 30s
```
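If Prometheus scrapes this service, it makes sense to align its scrape interval with the housekeeping interval. As I understand it, scraping more often than cAdvisor refreshes its stats mostly returns the same values. The snippet below is a minimal sketch, not a tested setup. It assumes Prometheus shares an overlay network with a service named `cadvisor`, so that Swarm's `tasks.cadvisor` DNS name resolves to one address per task of the global service. Adjust the names to match your stack.

```yaml
# Hypothetical Prometheus scrape job for the cAdvisor service above.
# Assumes Swarm DNS "tasks.cadvisor" returns one A record per running task.
scrape_configs:
  - job_name: cadvisor
    # Match cAdvisor's 30s housekeeping interval
    scrape_interval: 30s
    dns_sd_configs:
      - names:
          - tasks.cadvisor
        type: A
        port: 9102
```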
Did it help?
I'm sure you can guess when the new settings kicked in 😏: CPU usage dropped from 0.68 vCPU to 0.08 vCPU!
Not bad at all.
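For reference, here is what that drop works out to:

$$
\frac{0.68 - 0.08}{0.68} = \frac{0.60}{0.68} \approx 0.88, \qquad \frac{0.68}{0.08} = 8.5
$$

That is roughly 88% less CPU, or about 8.5 times less than before.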