Robert Nemet

Posted on • Originally published at rnemet.dev

How to Design Metrics With Prometheus Metric Types: the USE Method

This is the third part of a series about designing metrics for event-driven systems. You can check the first part and the second part of this series before proceeding.

In the first part, I discussed the general principles of designing metrics. In the second part, I explained Prometheus metric types and applied them using the RED method. In this article, I'll explain the USE method with Prometheus. I'll finish with a short discussion of the Four Golden Signals and a conclusion covering all the methods.

Let's go...

The USE Method

The USE method by Brendan Gregg is a set of rules for designing metrics, mainly used for systems that are not exposed directly to users, such as databases, message brokers, and streaming platforms.
Its key metrics are:

  • Utilization - the level to which a resource is used, for example the share of time it is busy
  • Errors - the number of error events over time
  • Saturation - the level to which a resource has extra work that it cannot handle right away. That extra work has to wait or be dropped.

Implementation

To keep things simple and close to what we use in daily work, I'll show the USE method by observing CPU, memory, and network. The examples use docker-compose, Prometheus, and Grafana. To get metrics from the system, I'm using the node-exporter. The complete example is in my GitHub repo.

CPU Utilization

CPU utilization is the percentage of time the CPU is busy. The node-exporter provides the node_cpu_seconds_total metric. This metric is a counter that counts the number of seconds each CPU has spent in each mode. One of the modes is idle, which is when the CPU is not busy.

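Over a period, say 1m, take the per-second rate of change of the idle counter and average it across all CPUs. The result is the fraction of time the CPUs were idle. Subtracting that value from 1 gives the CPU utilization as a fraction between 0 and 1: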

1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))

It is the same principle as in the RED method. We use counters, observe the rate of change, and then calculate the average.
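To make the arithmetic behind the query above concrete, here is a minimal Python sketch of the same three steps: rate, average, subtract from 1. The sample values and the helper names `simple_rate` and `cpu_utilization` are made up for illustration. Prometheus's own `rate()` also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```python
from typing import Dict, Tuple

# One sample is (timestamp in seconds, idle counter value in seconds).
Sample = Tuple[float, float]


def simple_rate(first: Sample, last: Sample) -> float:
    """Per-second increase of a counter between two samples.

    Simplified stand-in for PromQL rate(): no reset handling, no extrapolation.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def cpu_utilization(idle_samples: Dict[str, Tuple[Sample, Sample]]) -> float:
    """1 - avg(idle rate across CPUs), as a fraction between 0 and 1."""
    rates = [simple_rate(first, last) for first, last in idle_samples.values()]
    return 1.0 - sum(rates) / len(rates)


# Hypothetical idle counters for two cores over a 60-second window.
samples = {
    "cpu0": ((0.0, 1000.0), (60.0, 1045.0)),  # idle 45 of 60 seconds
    "cpu1": ((0.0, 2000.0), (60.0, 2015.0)),  # idle 15 of 60 seconds
}

utilization = cpu_utilization(samples)
print(f"{utilization:.2f}")  # 0.50
```

In this made-up window the cores were idle 75% and 25% of the time, so the average idle fraction is 0.5 and the utilization is 0.5, or 50%.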

If you are interested, continue to the rest on my blog.
