In the modern era of distributed systems and microservices, "monitoring" is no longer enough. We need Observability. But what exactly is the difference, and how can we implement it without getting lost in complex configurations?
In this article, we will explore the core concepts of observability and build a real-world example using Python (Flask), Prometheus, and Grafana.
What is Observability?
While monitoring tells you when something is wrong (e.g., "The server is down"), observability lets you understand why it is wrong by asking questions of your system from the outside.
It is often categorized into three pillars:
- Logs: A record of discrete events (e.g., "User X logged in").
- Metrics: Aggregated numerical data over time (e.g., "CPU usage is at 80%").
- Traces: The path of a request through your distributed system.
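To make the first two pillars concrete, here is a minimal, illustrative sketch (not from the article's repository; the values are made up) of the same activity expressed as a log event versus a metric:

```python
import json
import time

# A log: one discrete event, rich in per-event context
log_line = json.dumps({"ts": time.time(), "event": "user_login", "user": "X"})

# A metric: many events collapsed into an aggregate number over a window
request_durations = [0.12, 0.30, 0.07, 0.45]  # seconds, sampled requests
avg_latency = sum(request_durations) / len(request_durations)  # aggregate view
```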
Today, we will focus heavily on Metrics using the industry-standard tool: Prometheus.
The Stack
For our practical exercise, we will use:
- Application: A simple Python Flask API.
- Instrumentation: the prometheus-client library to expose metrics.
- Storage & Scraping: Prometheus.
- Visualization: Grafana.
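On the Python side, this stack needs only two packages. A minimal requirements.txt (the filename is an assumption about the repo layout) could look like:

```text
flask
prometheus-client
```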
Step 1: The Application Code
We need an application that doesn't just work, but talks to us. We will create a Flask app and instrument it to track:
- Request Count: How many requests we receive (broken down by status code and method).
- Latency: How long requests take to process.
Here is the core logic (app/main.py):
from flask import Flask, Response, request
import time
import random
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

# 1. Define Prometheus Metrics
REQUEST_COUNT = Counter(
    'app_request_count',
    'Application Request Count',
    ['method', 'endpoint', 'http_status']
)

REQUEST_LATENCY = Histogram(
    'app_request_latency_seconds',
    'Application Request Latency',
    ['method', 'endpoint']
)

@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    request_latency = time.time() - request.start_time
    # 2. Record metrics after every request
    REQUEST_LATENCY.labels(request.method, request.path).observe(request_latency)
    REQUEST_COUNT.labels(request.method, request.path, response.status_code).inc()
    return response

@app.route('/metrics')
def metrics():
    # 3. Expose metrics for Prometheus to scrape
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

# ... routes for / and /error ...
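The article elides the / and /error routes, but a plausible sketch looks like this (route bodies, messages, and the latency range are assumptions, not the article's actual code). It is written standalone here so it runs on its own; in main.py the routes would attach to the existing instrumented app:

```python
from flask import Flask
import random
import time

app = Flask(__name__)

@app.route('/')
def index():
    # Simulate variable work so the latency histogram gets a spread of values
    time.sleep(random.uniform(0.01, 0.05))
    return 'Hello, Observability!'

@app.route('/error')
def error():
    # Deliberately return a 500 so error-rate metrics have something to count
    return 'Something went wrong', 500
```

When running inside Docker, start the server with app.run(host='0.0.0.0', port=5000) so it is reachable from the other containers.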
Step 2: Configuring Prometheus
Prometheus needs to know where to look for metrics. We create a prometheus.yml file:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'flask_app'
    static_configs:
      - targets: ['app:5000']
This tells Prometheus to scrape our app's /metrics endpoint on port 5000 every 5 seconds.
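Once scraping is running, the metric names defined in the app can be queried directly in the Prometheus UI. For example (the second query assumes the app_request_latency_seconds histogram defined earlier):

```promql
# Total requests, broken down by method, endpoint, and status
app_request_count

# Approximate 95th-percentile request latency over the last 5 minutes
histogram_quantile(0.95, rate(app_request_latency_seconds_bucket[5m]))
```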
Step 3: Orchestration with Docker Compose
To make this easy to run anywhere, we use Docker Compose to spin up the App, Prometheus, and Grafana simultaneously.
version: '3.8'

services:
  app:
    build: ./app
    ports: ["5000:5000"]

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
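The build: ./app line assumes a Dockerfile inside the app directory. A minimal sketch (the Python version and filenames are assumptions, not from the original repo):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["python", "main.py"]
```

Remember that the Flask app must bind to 0.0.0.0, not the default 127.0.0.1, or Prometheus will not be able to reach it from its own container.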
Seeing it in Action
Once the containers are running, we can generate some traffic to our app.
- Generate Traffic: Refresh http://localhost:5000/ and http://localhost:5000/error a few times.
- Check Prometheus: Go to http://localhost:9090 and search for app_request_count. You will see the raw data increasing.
- Visualize in Grafana:
  - Go to http://localhost:3000 (admin/admin).
  - Add Prometheus as a Data Source (http://prometheus:9090).
  - Create a dashboard to visualize rate(app_request_count[1m]).
Conclusion
Observability is not just for giant tech companies. With a few lines of code and open-source tools, you can gain deep insights into your application's health. Instead of guessing why your app is slow, you can look at the histogram metrics and know.
Happy Coding!