Observability Practices with Python, Prometheus, and Grafana

Observability Practices with a Real-World Example

Observability is essential for understanding how our applications behave in production. It involves collecting metrics, logs, and traces to detect issues, analyze trends, and optimize performance. Good observability allows teams to quickly identify bottlenecks, react to incidents, and maintain reliable systems.

What is Observability?

Observability is the ability to infer the internal state of a system by examining its outputs. In modern software, this means having visibility into metrics (quantitative measurements), logs (detailed event records), and traces (end-to-end request flows).

Real-World Example: Python, Prometheus, and Grafana

Let's look at a practical example. We build a simple Python API with Flask that exposes two key metrics using Prometheus:

http_requests_total: the number of HTTP requests received.
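http_request_latency_seconds: a summary of request latency. It records the number of observed requests and their total duration, from which the average latency can be derived.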

The script also simulates traffic automatically, so there is always data available for monitoring.

We use Docker to run Prometheus and Grafana. Prometheus scrapes the /metrics endpoint of our API every 5 seconds, and Grafana visualizes the results in near real time. The same scrape-and-visualize pattern is what you would typically use in a production environment, at a larger scale.

Example Code

import random
import threading
import time

import requests
from flask import Flask, jsonify
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Summary, generate_latest

app = Flask(__name__)
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
REQUEST_LATENCY = Summary('http_request_latency_seconds', 'HTTP request latency', ['endpoint'])

@app.route('/api')
def api():
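    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    time.sleep(random.uniform(0.1, 0.8))  # simulate variable processing time
    REQUEST_COUNT.labels(method='GET', endpoint='/api').inc()
    REQUEST_LATENCY.labels(endpoint='/api').observe(time.perf_counter() - start)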
    return jsonify({'message': 'Hello, observability!'})

@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

def generate_traffic():
    while True:
        try:
            requests.get('http://localhost:8000/api', timeout=5)
        except requests.RequestException:
            # The server may not be up yet; ignore the failure and retry.
            pass
        time.sleep(random.uniform(0.5, 2))

if __name__ == '__main__':
    threading.Thread(target=generate_traffic, daemon=True).start()
    app.run(host='0.0.0.0', port=8000)
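
To make the /metrics endpoint less abstract, here is a minimal sketch of the kind of text Prometheus scrapes from it and how the two metrics can be read back. The sample values are made up for illustration, and parse_samples is a hypothetical helper, not part of prometheus_client. It handles only the simple case shown here (no timestamps, no spaces inside label values).

```python
# Illustrative sample of what GET /metrics might return for the app above.
# The numeric values are invented for this sketch.
SAMPLE = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/api",method="GET"} 3.0
# HELP http_request_latency_seconds HTTP request latency
# TYPE http_request_latency_seconds summary
http_request_latency_seconds_count{endpoint="/api"} 3.0
http_request_latency_seconds_sum{endpoint="/api"} 1.2
'''


def parse_samples(text):
    """Return {series: value}, skipping comment and blank lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # The value is the last space-separated field on the line.
        series, value = line.rsplit(' ', 1)
        samples[series] = float(value)
    return samples


samples = parse_samples(SAMPLE)
count = samples['http_request_latency_seconds_count{endpoint="/api"}']
total = samples['http_request_latency_seconds_sum{endpoint="/api"}']
print(f'average latency so far: {total / count:.3f}s')
```

Note that the summary exposes a running count and a running sum rather than an average; the average is computed by whoever reads the data.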

Prometheus Configuration (prometheus.yml)

global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'python-app'
    static_configs:
      - targets: ['host.docker.internal:8000']

Docker Compose Example

docker-compose.yml:
version: '3.8'
services:
  prometheus:
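    image: prom/prometheus:latest
    # On Linux, map host.docker.internal to the host so the scrape target resolves.
    extra_hosts:
      - 'host.docker.internal:host-gateway'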
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'
  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    depends_on:
      - prometheus

Grafana Queries

Request rate (requests per second): sum(rate(http_requests_total[1m]))
Average latency: rate(http_request_latency_seconds_sum[1m]) / rate(http_request_latency_seconds_count[1m])
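
The arithmetic behind these two queries can be sketched in a few lines. This is a simplified illustration with hypothetical scrape values: Prometheus's real rate() also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```python
def per_second_rate(earlier, later, elapsed_seconds):
    """Per-second increase of a counter between two scrapes (simplified)."""
    return (later - earlier) / elapsed_seconds


# Hypothetical values scraped 60 seconds apart.
first = {'count': 100.0, 'sum': 42.0}
second = {'count': 160.0, 'sum': 69.0}
elapsed = 60.0

# Equivalent in spirit to rate(http_request_latency_seconds_count[1m]).
request_rate = per_second_rate(first['count'], second['count'], elapsed)

# Equivalent in spirit to rate(..._sum[1m]) / rate(..._count[1m]).
latency_rate = per_second_rate(first['sum'], second['sum'], elapsed)
average_latency = latency_rate / request_rate

print(f'{request_rate:.2f} requests/s, {average_latency:.2f}s average latency')
```

Dividing the two rates cancels the time window, so the result is the mean latency of only the requests that arrived during that window.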

Prometheus

Prometheus collects and stores metrics (such as CPU usage, memory, and network traffic) as time series and lets you query them with PromQL.

Dashboard Example

You can create a dashboard in Grafana to visualize these metrics in real time. This allows you to monitor your API's health, spot trends, and set up alerts for anomalies.

Why is this important?

With these metrics, you can:

Detect performance issues (e.g., high latency)
Monitor traffic patterns
Set up alerts for abnormal behavior
Make data-driven decisions to improve your application
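
As a rough illustration of alerting on abnormal behavior, the sketch below fires only when latency stays above a threshold for several consecutive readings, similar in spirit to the `for` clause of a Prometheus alerting rule. The function name, threshold, and readings are assumptions made for this example; in practice the rule would live in Prometheus or Grafana rather than in application code.

```python
def should_alert(latencies, threshold, consecutive):
    """Return True if the last `consecutive` readings all exceed `threshold`."""
    if consecutive < 1:
        raise ValueError('consecutive must be at least 1')
    recent = latencies[-consecutive:]
    # Require a full window so a single early spike cannot trigger the alert.
    return len(recent) == consecutive and all(v > threshold for v in recent)


# Hypothetical average-latency readings (seconds), one per evaluation.
sustained = [0.3, 0.9, 0.95, 1.1]
spiky = [0.3, 0.9, 0.4, 1.1]

print(should_alert(sustained, threshold=0.8, consecutive=3))
print(should_alert(spiky, threshold=0.8, consecutive=3))
```

Requiring several consecutive breaches trades a little detection delay for fewer false alarms from one-off spikes.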

How to Try It Yourself

Clone the repository and install dependencies:

   pip install flask prometheus_client requests

Run the Python script to start the API and metrics endpoint.
Use Docker Compose to start Prometheus and Grafana.
In Grafana, add Prometheus as a data source and create dashboards with queries like:
- sum(rate(http_requests_total[1m]))
- rate(http_request_latency_seconds_sum[1m]) / rate(http_request_latency_seconds_count[1m])
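
Before building dashboards, it can help to confirm that Prometheus is actually receiving data. The sketch below runs one of the queries above against the Prometheus HTTP API (`/api/v1/query`) using only the standard library. It assumes Prometheus is reachable at http://localhost:9090, as in the Compose file; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({'query': promql})
    return base_url.rstrip('/') + '/api/v1/query?' + params


def extract_values(payload):
    """Return [(labels, value)] from an instant-vector query response."""
    if payload.get('status') != 'success':
        raise ValueError('query failed: ' + str(payload.get('error')))
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item['metric'], float(item['value'][1]))
            for item in payload['data']['result']]


def query_prometheus(base_url, promql, timeout=5):
    """Run an instant query and return the parsed results."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return extract_values(json.load(response))


# Usage, once the stack is running (assumed address):
# print(query_prometheus('http://localhost:9090',
#                        'sum(rate(http_requests_total[1m]))'))
```

An empty result list usually means Prometheus has not scraped the target yet, or the target is unreachable; the Targets page in the Prometheus UI shows the scrape status.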

Conclusion

Observability is not just about collecting data, but about making your systems transparent and manageable. With tools like Prometheus and Grafana, and a few lines of code, you can gain valuable insights into your applications and ensure their reliability.