Thiago Rodrigues

Prometheus Architecture

This is the first in a series of articles focused on the architecture of the main components of a modern monitoring stack.

Initially, I planned to start by comparing the different variants and tools in the Prometheus ecosystem (such as Mimir, Thanos, and Cortex). However, I realized it makes more sense to start with Prometheus itself; after all, it is the foundation and origin of all these solutions.

Most likely, at some point in your IT journey, you have heard of, seen, or used a metric exposed in the Prometheus format for observability. Prometheus is an open-source project that graduated from the Cloud Native Computing Foundation (CNCF), only the second project to reach that status, right after Kubernetes.

It works extremely well in Kubernetes environments, but it also adapts equally well to cloud and container-based environments in general.

Collection Model

Prometheus uses a pull-based approach to collect metrics. Unlike systems where agents actively send data, Prometheus goes to the source and "pulls" the data.

Prometheus Pull-based Collection Model
Figure 1: Prometheus data collection flow with pull and push mechanisms

The simplest way to get started is to run it as a container:

# Pull the official image and expose the web UI on port 9090
docker run -d -p 9090:9090 prom/prometheus:latest

To operate it, you need a .yml configuration file, where you define global parameters, scrape frequencies, and targets.

A basic startup configuration looks like this:

global:
  scrape_interval: 15s # Scrape frequency
  evaluation_interval: 15s # Rule evaluation frequency
  external_labels:
    cluster: 'demo-cluster'
    environment: 'dev'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'prometheus-server'

The instance label (and others you add) allows you to filter and aggregate metrics later during queries, providing context to the data.
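
To tie the container and the configuration file together, a minimal docker-compose sketch could look like this (the file name, port mapping, and mount path are illustrative assumptions rather than requirements):

services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090" # Web UI and HTTP API
    volumes:
      # Mount the configuration shown above into the image's default location
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

Once it's running, the web UI at http://localhost:9090 lets you check target status and run queries.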

Core Architecture Components

A typical Prometheus deployment consists of several elements working together:

Prometheus Architecture Overview
Figure 2: Complete Prometheus architecture with all core components

  • Prometheus Server — The brain of the operation, responsible for collection and storage.
  • Targets — The endpoints that your applications or servers expose with metrics.
  • Exporters — Agents that translate metrics from third-party systems to Prometheus format.
  • Time-Series Database (TSDB) — The internal database optimized for time series.
  • PromQL — The powerful language for querying and analyzing data.
  • Push Gateway — An auxiliary component for handling short-lived jobs.
  • Alert Manager — The system responsible for managing, grouping, and routing alert notifications.
  • Client Libraries — Libraries for instrumenting custom code directly in applications.

Let's detail each of them.


Prometheus Server

The server is the central component. It performs three main functions:

  • Scraping: Periodically connects to configured targets via HTTP to fetch metrics.
  • Storage: Writes collected data to its local time-series database (TSDB).
  • Evaluation and Querying: Evaluates alert rules and responds to queries from users or visualization systems (like Grafana) via the PromQL HTTP API.

In practice, it is the heart of the operation, ensuring the continuous flow of data from source to storage.


Targets

"Targets" are the sources of your metrics. They can be practically anything: a Linux server, a Java application, an API endpoint, or Kubernetes pods.

By default, Prometheus fetches metrics from the /metrics HTTP endpoint of the target, although this path is configurable.

scrape_configs:
  - job_name: 'node-metrics'
    static_configs:
      - targets: ['instance-dev:9100']
        labels:
          instance: 'instance-dev'

Examples of common targets on an instance (a combined scrape sketch follows this list):

  • Port 9100: Where node_exporter exposes operating system metrics.
  • Port 8080: Where your application can expose custom business metrics.
  • Port 8081: Where cAdvisor exposes Docker container metrics.
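
A combined scrape configuration for those ports might look like the hedged sketch below; the job names are arbitrary, and the metrics_path override on the application job is only there to illustrate the configurable path mentioned earlier:

scrape_configs:
  - job_name: 'node-metrics'
    static_configs:
      - targets: ['instance-dev:9100'] # node_exporter

  - job_name: 'app-metrics'
    metrics_path: /internal/metrics # Custom path instead of the default /metrics
    static_configs:
      - targets: ['instance-dev:8080'] # Application business metrics

  - job_name: 'container-metrics'
    static_configs:
      - targets: ['instance-dev:8081'] # cAdvisor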

Exporters

Not all software natively exposes metrics in Prometheus format (like a MySQL or Redis database, for example). That's where Exporters come in.

Exporters are small binaries that function as translators: they collect metrics from the original system (using its native APIs) and convert them to the readable text format that Prometheus understands, exposing them on an HTTP endpoint.

There is a vast range of exporters, both official and community-maintained:

  • node_exporter: Hardware and OS metrics (CPU, memory, disk).
  • blackbox_exporter: Probing of external endpoints via HTTP, DNS, TCP, ICMP.
  • mysqld_exporter / postgres_exporter / redis_exporter: Database-specific metrics.

You can check the complete list in the official documentation.
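
In practice, an exporter is just another process or container running next to the system it translates. A minimal, hedged docker-compose sketch for node_exporter (the image tag and port below are the common defaults, so verify them for your environment):

services:
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100" # Default node_exporter port, matching the scrape examples above
    # For real host metrics you would also mount host paths such as /proc and /sys;
    # without them, the exporter only sees the container's own view of the system.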

Metrics Format

When accessing an exporter's endpoint, you'll see data in plain text. Examples of what each exporter generates:

From node_exporter (System):

node_cpu_seconds_total{instance="instance-dev", cpu="0", mode="idle"} 145893.45
node_memory_MemAvailable_bytes{instance="instance-dev"} 4294967296

From cAdvisor (Containers):

container_cpu_usage_seconds_total{instance="instance-dev", name="my-app", image="nginx:latest"} 234.56

Application Metrics:

http_requests_total{instance="instance-dev", method="GET", status="200"} 1547
http_request_duration_seconds_sum{instance="instance-dev", endpoint="/api/users"} 0.234

Time-Series Database (TSDB)

The data that Prometheus collects is, by definition, time series: numerical values that change over time, always associated with a timestamp.

To store this efficiently, Prometheus uses its own TSDB, optimized for this use case. It doesn't use a traditional SQL database.

How Prometheus Stores Data

Prometheus stores metrics on disk in structures called blocks.

Prometheus TSDB Internals: Lifecycle & Storage Flow
Figure 3: Prometheus TSDB lifecycle and storage flow

  • Recent data is kept in memory for fast access and periodically flushed to disk.
  • Each block contains compressed samples, an index for fast lookups, and metadata.
  • Architecturally, the TSDB is focused on high write performance ("append-only").

Retention: By default, Prometheus keeps data locally for only 15 days. Older blocks are deleted to free up space. Prometheus was not natively designed to be a long-term storage solution, although the retention configuration can be changed.
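
Both the storage location and the retention window are set through startup flags rather than in prometheus.yml. A hedged sketch extending the compose service from earlier (the 30-day value and the volume name are just examples):

services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus' # Where the TSDB blocks are written
      - '--storage.tsdb.retention.time=30d' # Keep data for 30 days instead of the default 15
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus # Persist the blocks across container restarts

volumes:
  prometheus-data: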


PromQL

PromQL (Prometheus Query Language) is the built-in functional query language for retrieving and analyzing data. It's the language you use to create dashboards in Grafana or define alerts.

The language allows you to select, filter, aggregate, and perform complex mathematical operations on time series.

Query Examples

# Simple metric selection (current value)
http_requests_total

# Filtering by labels
http_requests_total{instance="instance-dev", status="200"}

# Requests per second rate (average over the last 5 minutes)
rate(http_requests_total[5m])

# Global sum of request rate, aggregated by instance
sum(rate(http_requests_total[5m])) by (instance)

# Available memory percentage calculation
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# CPU usage (everything that's not 'idle')
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

PromQL is vast and includes functions for working with percentiles (histogram_quantile), linear predictions (predict_linear), and temporal comparisons (offset).


Push Gateway

As mentioned, Prometheus uses a pull model. But what about short-lived jobs (like a batch backup script) that start and finish before Prometheus has a chance to scrape?

For this, there's the Push Gateway. It functions as an intermediate cache. The short-lived job "pushes" its metrics to the Push Gateway upon completion. Prometheus, in turn, scrapes the Push Gateway at its regular interval.

Example of sending via shell script

# Sends a metric indicating how long a job took
echo "job_duration_seconds 45.2" | curl --data-binary @- \
  http://pushgateway:9091/metrics/job/batch-job/instance/worker-1
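
On the Prometheus side, the Push Gateway is scraped like any other target. A hedged sketch of that job, assuming the same pushgateway hostname used above (honor_labels keeps the job and instance labels that were pushed instead of overwriting them with the scrape target's own):

scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true # Preserve the labels pushed by the batch jobs
    static_configs:
      - targets: ['pushgateway:9091']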

Important Note

The Push Gateway is intended for very specific use cases. It should not be used to convert Prometheus into a push-based system. The pull model is preferable in most cases because it allows Prometheus to control the load, easily detect if a target is inactive (up/down), and simplifies service discovery.
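
That up/down detection comes essentially for free: Prometheus records a synthetic up metric (1 for a successful scrape, 0 for a failed one) for every target, so a dead target can be alerted on directly. A minimal rule sketch (the alert name and the 2-minute window are arbitrary choices):

groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0 # The last scrapes of this target failed
        for: 2m
        labels:
          severity: critical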


Alert Manager

It's common to confuse the responsibilities here.

  • The Prometheus Server is the one that detects the problem (evaluating a PromQL rule) and fires the alert state.
  • The Alert Manager is the one that receives this firing and decides what to do with it.

Alert Manager is a separate component that handles the logistics of notifications. Its main functionalities are:

  • Grouping: If 100 services go down simultaneously because a network switch failed, you don't want to receive 100 emails. Alert Manager groups these similar alerts into a single notification.
  • Inhibition: If a critical alert of "Data Center down" is active, it can inhibit smaller alerts like "Server X unreachable".
  • Silencing: Allows "muting" alerts during planned maintenance windows.
  • Routing: Sends different alerts to different channels (e.g., critical alerts to PagerDuty, warnings to Slack).

Flow Example

In Prometheus (Detection Rule):

groups:
  - name: example
    rules:
      - alert: HighCPUUsage
        expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.8 # Average CPU usage above 80%
        for: 5m # If the condition is true for 5 minutes...
        labels:
          severity: warning

Prometheus detects and sends to Alert Manager.
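
For that handoff to work, prometheus.yml needs to load the rule file and know where Alert Manager is listening. A minimal sketch, assuming the rules above are saved in alert-rules.yml and Alert Manager is reachable at alertmanager:9093 (its default port):

rule_files:
  - 'alert-rules.yml' # File containing the detection rule above

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']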

In Alert Manager (Routing Configuration):

route:
  # Default route
  receiver: 'team-slack'
  routes:
    # Specific route for critical cases
    - match:
        severity: critical
      receiver: 'pagerduty-oncall'

receivers:
  - name: 'team-slack'
    slack_configs:
      - channel: '#alerts-general'
  - name: 'pagerduty-oncall'
    pagerduty_configs:
      - service_key: '...'

Client Libraries

Besides using exporters for ready-made systems, the best practice is to instrument your own applications so they natively expose business and performance metrics.

Prometheus offers official client libraries for Go, Java/Scala, Python, and Ruby, in addition to several others maintained by the community (.NET, Node.js, Rust, etc.).

Example: Python Instrumentation

With just a few lines of code, your application starts serving a /metrics endpoint.

from prometheus_client import Counter, Histogram, start_http_server
import time

# 1. Define the metrics
requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests received',
    ['method', 'endpoint', 'status'] # Labels for dimensionality
)

request_duration = Histogram(
    'http_request_duration_seconds',
    'Histogram of request duration',
    ['endpoint']
)

# 2. Use in application code (e.g., using decorators)
@request_duration.labels(endpoint='/api/users').time()
def handle_user_request():
    # Application logic...
    time.sleep(0.1)
    # Increment the counter at the end
    requests_total.labels(method='GET', endpoint='/api/users', status='200').inc()

if __name__ == '__main__':
    # 3. Start an HTTP server to expose the metrics
    start_http_server(8000)
    print("Metrics server running on port 8000...")
    # Main application loop: simulate traffic so the metrics change over time
    while True:
        handle_user_request()

The libraries already handle complex details like thread-safety and correct data formatting.
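
To close the loop, Prometheus still needs a scrape job pointing at that endpoint. A hedged sketch, assuming the application runs on a host named app-server:

scrape_configs:
  - job_name: 'python-app'
    static_configs:
      - targets: ['app-server:8000'] # Port opened by start_http_server(8000)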


Why Prometheus?

Summarizing the strengths that made Prometheus the market standard:

  • Pull Model: Facilitates flow control, debugging, and failure detection in targets.
  • Multidimensional Data: The use of labels in time series allows incredibly flexible analyses.
  • PromQL: A query language designed specifically for the nature of monitoring data.
  • Operational Simplicity: A single static binary, easy to deploy, without complex external dependencies.
  • Service Discovery: Native and dynamic integration with Kubernetes, AWS, Azure, etc., to automatically discover new targets (a Kubernetes sketch follows this list).
  • Open Ecosystem: Hundreds of exporters available for almost any technology.
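
As an illustration of the Service Discovery point, a hedged sketch of Kubernetes discovery: Prometheus watches the API server for pods and keeps only the ones annotated for scraping (the prometheus.io/scrape annotation is a common convention, not a requirement):

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod # Discover every pod through the Kubernetes API
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"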

Limitations

Despite being powerful, Prometheus' design (focused on the simplicity and reliability of a single node) brings inherent limitations:

  • Single Node Architecture: The Prometheus server was not designed for native horizontal scaling. If a server gets overloaded, the solution is usually to manually divide the load (sharding) among multiple servers.
  • Local and Ephemeral Storage: Data lives on the server's local disk. If the server dies and the disk is lost, the data is gone. There's no native data replication.
  • Long-Term Retention: It's not efficient to use Prometheus to store years of historical data.
  • Fragmented Global View: If you have multiple Kubernetes clusters, each with its own Prometheus, you don't have a unified view of metrics from all clusters in one place natively.

These architectural limitations were the motivators for creating tools that "embrace and extend" Prometheus, such as Thanos, Cortex, and Mimir.


Thank you for reading this far!

In the next articles, I intend to explore how to overcome the limitations mentioned above.

For this, I want to talk about:

  • Thanos: Adding long-term storage (Object Storage) and unified global view to Prometheus.
  • Cortex: The original solution for multi-tenant and horizontally scalable Prometheus.
  • Mimir: The evolution of Cortex, focused on massive scale and ease of operation.

Tags: #Prometheus #Monitoring #Observability #CNCF #DevOps #SRE
