DEV Community

Randika Madhushan Perera


Prometheus Fundamentals - Prometheus Data Model (Lesson 02)

Prometheus Fundamentals Lesson 01: Lesson 01

4. Prometheus Data Model

Having covered how to install and set up Prometheus, we're now ready to look at the actual metric data stored within Prometheus. This lesson explains the format Prometheus uses for storing and tracking data, focusing on the concept of time-series data.

4.1 What is Time-Series Data

Understanding what time-series data entails is crucial when working with Prometheus.

Time-series data is a fundamental aspect of Prometheus, with all metric data being stored in this form.

Time-series data consists of a series of values associated with different points in time.

Time-Series Metric Example 01

To illustrate, let's consider tracking the outdoor temperature. Rather than just noting the current temperature, we create a time series by recording the temperature at regular intervals, say every hour.

Outdoor temperature: 0C / 32F

  • 8.00AM --> -6C / 21F
  • 9.00AM --> -3C / 27F
  • 10.00AM -> -2C / 28F
  • 11.00AM -> 0C / 32F

Time-Series Metric Example 02

Prometheus operates on this principle. Every metric in Prometheus is essentially a time series, tracking a particular value over time, not just the current value.

For instance, Prometheus might track the available memory on a server, recording this data every minute. This approach allows Prometheus to provide a comprehensive view of how a metric evolves over time, not just its current state.
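To make the idea concrete, here is a minimal Python sketch of a time series as a list of timestamped samples. The `Sample` class, the helper functions, and the memory readings are all made up for illustration; this is not how Prometheus stores data internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: int   # Unix time in seconds
    value: float     # the measured value at that moment


# Hypothetical readings of available memory (in MiB), one per minute.
available_memory = [
    Sample(1700000000, 4200.0),
    Sample(1700000060, 4100.0),
    Sample(1700000120, 3900.0),
]


def latest(series):
    """Return the most recent sample, i.e. the 'current' value."""
    return max(series, key=lambda s: s.timestamp)


def change(series):
    """Return how much the value changed between the oldest and newest sample."""
    ordered = sorted(series, key=lambda s: s.timestamp)
    return ordered[-1].value - ordered[0].value


print(latest(available_memory).value)
print(change(available_memory))
```

Keeping every sample, rather than only the latest one, is what lets us answer questions about how the value evolved.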

4.2 Metrics and Labels

In this section, we delve into the intricacies of metrics and labels within Prometheus, focusing on the fundamental components that constitute a Prometheus metric and the methods used to reference them.

Metric Names

Every metric in Prometheus has a metric name. The metric name refers to the general feature of a system or application that is being measured.

An example of a metric name:

node_cpu_seconds_total

node_cpu_seconds_total measures the cumulative amount of CPU time used, in seconds.

Note that the metric name merely refers to the feature being measured. Metric names do not point to a specific data value but potentially a collection of many values.

Simply querying 'node_cpu_seconds_total' would likely return a list of multiple data points, such as CPU usage for multiple CPUs on a server, or even multiple servers.

Metric Labels

Prometheus uses labels to provide a dimensional data model. This means we can use labels to specify additional things, such as which node's CPU usage is being represented.

A unique combination of a metric name and a set of labels identifies a particular set of time-series data. This example uses a label called cpu to refer to the usage of a specific CPU.

node_cpu_seconds_total{cpu="0"}

Most Prometheus metrics have multiple labels.

node_cpu_seconds_total{cpu="0",instance="10.0.1.102:9100",mode="idle"} 289.98
node_cpu_seconds_total{cpu="0",instance="10.0.1.102:9100",mode="user"} 89.98

Using metric names and labels, you can write queries that do things like average CPU usage across a whole data center as well as drill down into the CPU usage of a single CPU on a single node.

Labels allow us to provide additional details, like specifying the node or CPU the data is referring to, or the CPU's operational mode. The unique combination of a metric name and its labels defines a specific time series in Prometheus.
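The sketch below illustrates, in plain Python, how a metric name plus a label set can act as the identity of a series, and how selecting by labels narrows the result. The `series_key` and `select` helpers are hypothetical names invented for this example; only the metric name, labels, and values come from the samples above.

```python
def series_key(name, labels):
    """A series is identified by its metric name plus its full set of labels."""
    return (name, frozenset(labels.items()))


# The two sample series shown above, keyed by name + labels.
store = {
    series_key(
        "node_cpu_seconds_total",
        {"cpu": "0", "instance": "10.0.1.102:9100", "mode": "idle"},
    ): 289.98,
    series_key(
        "node_cpu_seconds_total",
        {"cpu": "0", "instance": "10.0.1.102:9100", "mode": "user"},
    ): 89.98,
}


def select(store, name, **matchers):
    """Return every series with this name whose labels equal all given matchers."""
    result = {}
    for (metric, labels), value in store.items():
        label_map = dict(labels)
        if metric == name and all(label_map.get(k) == v for k, v in matchers.items()):
            result[(metric, labels)] = value
    return result


# The bare metric name matches both series; adding a label narrows it to one.
print(len(select(store, "node_cpu_seconds_total")))
print(list(select(store, "node_cpu_seconds_total", mode="idle").values()))
```

This is only an equality match on labels; PromQL selectors also support other matching operators that this sketch leaves out.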

Metric Types

Metric types refer to the different ways in which exporters represent the metric data they provide. Metric types are not represented in any special way in a Prometheus server; they are conceptual tools, and understanding them is important for interpreting your metrics properly.

1. Counter

A counter is a single number that can only increase or be reset to zero. Counters represent cumulative values. Counters are typically used for tracking quantities like the number of processed records, application restarts, or errors that have occurred.

Total HTTP requests served.

+- 0
+- 12
+- 85
+- 276

The first number in our time series will be 0, then 12 and 85, etc.

Examples:

+- Number of HTTP requests served by an application
+- Number of records processed.
+- Number of application restarts.
+- Number of errors.


This metric, being a counter, always increases or resets but never decreases.
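Because a counter only goes up or resets, a drop between two consecutive samples can be read as a reset. The sketch below is a simplified illustration of that idea, using a hypothetical `increase` helper; Prometheus functions such as rate() also account for resets, but their actual implementation involves more than this.

```python
def increase(values):
    """Total increase across consecutive counter samples.

    Any drop between two samples is treated as a reset to zero,
    so the whole new value counts as fresh increase.
    """
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The counter restarted from zero (for example after a process restart).
            total += curr
    return total


# The series from the example above: no resets.
print(increase([0, 12, 85, 276]))

# A hypothetical series where the counter resets after reaching 85.
print(increase([0, 12, 85, 5, 20]))
```

In the second series the raw value ends at 20, yet the computed increase is 105, because the 85 requests counted before the reset are not lost.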

2. Gauge

Unlike counters, gauges are numbers that can increase or decrease over time. Gauges might measure the current number of active HTTP requests, CPU usage, memory usage, or active threads. Gauges reflect the current state of a monitored resource.

Current HTTP requests active:

+- 76
+- 82
+- 24
+- 56

Examples:

+- Number of concurrent HTTP requests
+- CPU usage
+- Memory usage
+- Current active threads
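As a contrast with counters, here is a toy gauge in Python. The `Gauge` class and the started/finished request numbers are invented for illustration (chosen so the readings match the sequence above); this is not the API of any particular client library.

```python
class Gauge:
    """Toy gauge: a single value that may go up or down."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount

    def set(self, value):
        self.value = float(value)


in_flight = Gauge()
readings = []

# Hypothetical (started, finished) request counts between successive scrapes.
for started, finished in [(76, 0), (10, 4), (0, 58), (32, 0)]:
    in_flight.inc(started)
    in_flight.dec(finished)
    readings.append(in_flight.value)

print(readings)
```

Each reading is the state at that moment; unlike a counter, a lower value here is normal and does not signal a reset.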


3. Histogram

A histogram counts the number of observations falling into configurable buckets, each exposed as its own time series. Histograms are often used to measure request durations. For instance, they can group HTTP request durations into buckets such as up to 0.3 seconds, up to 0.6 seconds, and so on. Histograms also include metrics for the sum of all observed values and the total count of events.

http_request_duration_seconds_bucket{le="0.3"}
http_request_duration_seconds_bucket{le="0.6"}
http_request_duration_seconds_bucket{le="1.0"}

Prometheus histograms have cumulative buckets; for instance, the le="0.6" bucket counts all requests that took up to 0.6 seconds, so it includes everything counted in the le="0.3" bucket. Histograms also provide _sum and _count time series for the total of the observed values and the number of events. For insights like percentiles, use the histogram_quantile() function in Prometheus.
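The following Python sketch shows how cumulative buckets fill up and how a quantile can be estimated from them by interpolating linearly inside a bucket. It is a simplified illustration of the idea behind histogram_quantile(), not its actual implementation. The names `BOUNDS`, `observe_all`, and `estimate_quantile` and the sample durations are assumptions made up for the example, and an extra +Inf bucket is added to catch observations above the largest bound.

```python
import math

# Upper bounds of the buckets (the "le" label); +Inf catches everything else.
BOUNDS = [0.3, 0.6, 1.0, math.inf]


def observe_all(durations, bounds=BOUNDS):
    """Return (cumulative bucket counts, sum of values, number of observations)."""
    buckets = {bound: 0 for bound in bounds}
    for d in durations:
        for bound in bounds:
            if d <= bound:
                # Cumulative: the value counts in every bucket whose bound is >= d.
                buckets[bound] += 1
    return buckets, sum(durations), len(durations)


def estimate_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation within the matching bucket."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if math.isinf(bound) or count == lower_count:
                # Cannot interpolate into an open-ended or empty bucket.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (bound - lower_bound) * fraction
        lower_bound, lower_count = bound, count
    return lower_bound


# Hypothetical request durations in seconds.
durations = [0.1, 0.2, 0.25, 0.4, 0.5, 0.7, 0.9, 1.5]
buckets, total_sum, total_count = observe_all(durations)
print(buckets)
print(estimate_quantile(0.5, buckets))
```

Note that the estimate depends on the bucket boundaries: the histogram only knows how many observations fell into each bucket, not their exact values.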


4. Summary

Similar to histograms, summaries also break down data but use quantiles instead of discrete bucket values. A quantile is akin to a percentile, providing a way to understand the distribution of data over certain percentage thresholds.

For example, a summary can show the 95th percentile of HTTP request durations.

http_request_duration_seconds{quantile="0.95"}

Like histograms, summaries also provide _sum and _count metrics.
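As a rough illustration, the Python sketch below keeps every observation and computes quantiles with the simple nearest-rank method, alongside the sum and count. The `Summary` class here is a hypothetical toy; real client libraries typically use more memory-efficient approximations rather than storing all values.

```python
import math


class Summary:
    """Toy summary: stores all observations and reports quantiles, sum, and count."""

    def __init__(self, quantiles=(0.5, 0.95)):
        self.quantiles = quantiles
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    def collect(self):
        ordered = sorted(self.observations)
        result = {"_sum": sum(ordered), "_count": len(ordered)}
        for q in self.quantiles:
            if ordered:
                # Nearest-rank method: smallest value with at least q of the data at or below it.
                index = max(0, math.ceil(q * len(ordered)) - 1)
                result[q] = ordered[index]
        return result


# Hypothetical request durations: 1, 2, ..., 20 (arbitrary units).
request_durations = Summary()
for value in range(1, 21):
    request_durations.observe(value)

print(request_durations.collect())
```

Here the quantiles are computed where the observations are recorded, whereas with a histogram the quantile is estimated later from the bucket counts.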

