Observability Practices with a Real-World Example
Observability is essential for understanding how our applications behave in production. It involves collecting metrics, logs, and traces to detect issues, analyze trends, and optimize performance. Good observability allows teams to quickly identify bottlenecks, react to incidents, and maintain reliable systems.
What is Observability?
Observability is the ability to infer the internal state of a system by examining its outputs. In modern software, this means having visibility into metrics (quantitative measurements), logs (detailed event records), and traces (end-to-end request flows).
Real-World Example: Python, Prometheus, and Grafana
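Let's look at a practical example. We build a simple Python API with Flask that exposes two key metrics through the Prometheus Python client (prometheus_client):
- http_requests_total: a counter of the HTTP requests received, labelled by method and endpoint.
- http_request_latency_seconds: a summary of request latency. It records the count and the sum of observed durations, from which the average latency can be derived at query time.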
The script also simulates traffic automatically, so there is always data available for monitoring.
We use Docker to run Prometheus and Grafana. Prometheus scrapes the metrics from our API, and Grafana visualizes them in real time. This setup is very similar to what you would use in a production environment.
Example Code
import random
import threading
import time

import requests
from flask import Flask, jsonify
from prometheus_client import Counter, Summary, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)

# Counter: a value that only increases, labelled by HTTP method and endpoint.
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
# Summary: tracks the count and the sum of observed durations per endpoint.
REQUEST_LATENCY = Summary('http_request_latency_seconds', 'HTTP request latency', ['endpoint'])


@app.route('/api')
def api():
    start = time.time()
    time.sleep(random.uniform(0.1, 0.8))  # simulate a variable amount of work
    REQUEST_COUNT.labels(method='GET', endpoint='/api').inc()
    REQUEST_LATENCY.labels(endpoint='/api').observe(time.time() - start)
    return jsonify({'message': 'Hello, observability!'})


@app.route('/metrics')
def metrics():
    # Expose all registered metrics in the Prometheus text format.
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}


def generate_traffic():
    # Call the API in a loop so there is always data to scrape.
    while True:
        try:
            requests.get('http://localhost:8000/api')
        except Exception:
            pass  # the server may not be up yet; retry on the next iteration
        time.sleep(random.uniform(0.5, 2))


if __name__ == '__main__':
    threading.Thread(target=generate_traffic, daemon=True).start()
    app.run(host='0.0.0.0', port=8000)
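To see what Prometheus reads when it scrapes /metrics, it can help to parse the text format by hand. The sketch below is only an illustration. The SAMPLE text is a made-up excerpt in the shape the endpoint would return (the values are not real measurements), parse_metrics is a hypothetical helper, and the regular expressions handle only simple lines without escaped quotes or timestamps.

```python
import re

# Illustrative excerpt of a /metrics response; the numbers are invented.
SAMPLE = '''\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api"} 3.0
# HELP http_request_latency_seconds HTTP request latency
# TYPE http_request_latency_seconds summary
http_request_latency_seconds_count{endpoint="/api"} 3.0
http_request_latency_seconds_sum{endpoint="/api"} 1.35
'''

# metric_name, optional {label="value",...} block, then the sample value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # HELP/TYPE metadata and blank lines carry no sample
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Each sample is a metric name, a set of labels, and a number. The summary shows up as two series, _count and _sum, which is why the latency query later in this post divides one by the other.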
Prometheus Configuration (prometheus.yml)
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'python-app'
    static_configs:
      - targets: ['host.docker.internal:8000']
Docker Compose Example
docker-compose.yml:
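version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'
    # On Linux, host.docker.internal is not defined by default; this mapping
    # (Docker Engine 20.10+) lets Prometheus reach the API running on the host.
    extra_hosts:
      - 'host.docker.internal:host-gateway'
  grafana:
    image: grafana/grafana:latest
    ports:
      - '3000:3000'
    depends_on:
      - prometheus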
Grafana Queries
- Request rate (requests per second, averaged over the last minute):
sum(rate(http_requests_total[1m]))
- Average latency (seconds per request over the last minute):
rate(http_request_latency_seconds_sum[1m]) / rate(http_request_latency_seconds_count[1m])
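The arithmetic behind these two queries can be sketched in plain Python. This is a simplification under stated assumptions: the Scrape class and both helper functions are hypothetical names, the numbers are invented, and the real rate() function also handles counter resets and extrapolates to the window boundaries, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Scrape:
    """One scrape of the two summary series (hypothetical container)."""
    timestamp: float      # seconds
    latency_sum: float    # http_request_latency_seconds_sum
    request_count: float  # http_request_latency_seconds_count


def request_rate(earlier, later):
    """Requests per second between two scrapes (what rate() approximates)."""
    elapsed = later.timestamp - earlier.timestamp
    return (later.request_count - earlier.request_count) / elapsed


def average_latency(earlier, later):
    """Seconds per request between two scrapes.

    rate(sum) / rate(count) divides both deltas by the same elapsed time,
    so the time term cancels and only delta_sum / delta_count remains.
    """
    new_requests = later.request_count - earlier.request_count
    if new_requests <= 0:
        return None  # no traffic in the window, so the average is undefined
    return (later.latency_sum - earlier.latency_sum) / new_requests


# Invented values: 30 new requests taking 15 seconds in total over one minute.
first = Scrape(timestamp=0.0, latency_sum=10.0, request_count=20.0)
second = Scrape(timestamp=60.0, latency_sum=25.0, request_count=50.0)
print(request_rate(first, second), average_latency(first, second))
```

Because both counters only grow, the average over a window depends only on how much each one increased, not on their absolute values.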
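Prometheus
Prometheus scrapes metrics from configured targets at a fixed interval, stores them as time series, and lets you query them with PromQL. The metrics can describe infrastructure (CPU usage, memory, network traffic) or, as in this example, application behavior such as request counts and latency.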
Dashboard Example
You can create a dashboard in Grafana to visualize these metrics in real time. This allows you to monitor your API's health, spot trends, and set up alerts for anomalies.
Why is this important?
With these metrics, you can:
- Detect performance issues (e.g., high latency)
- Monitor traffic patterns
- Set up alerts for abnormal behavior
- Make data-driven decisions to improve your application
How to Try It Yourself
- Clone the repository and install dependencies:
pip install flask prometheus_client requests
- Run the Python script to start the API and metrics endpoint.
- Use Docker Compose to start Prometheus and Grafana.
- In Grafana, add Prometheus as a data source (from inside the Compose network its URL is http://prometheus:9090) and create dashboards with queries like:
sum(rate(http_requests_total[1m]))
rate(http_request_latency_seconds_sum[1m]) / rate(http_request_latency_seconds_count[1m])
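The same queries can also be run outside Grafana through the Prometheus HTTP API, which is handy for a quick check that data is arriving. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090 (the port mapping from the Compose file above). The helper names build_query_url, extract_values, and instant_query are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus is published on the host at port 9090.
PROMETHEUS_URL = 'http://localhost:9090'


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the URL for an instant query against /api/v1/query."""
    return base_url + '/api/v1/query?' + urlencode({'query': expr})


def extract_values(payload):
    """Turn an instant-query response into (labels, value) pairs."""
    if payload.get('status') != 'success':
        raise ValueError('query failed: %r' % (payload,))
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item['metric'], float(item['value'][1]))
            for item in payload['data']['result']]


def instant_query(expr):
    """Run a PromQL expression and return its current values."""
    with urlopen(build_query_url(expr), timeout=5) as response:
        return extract_values(json.load(response))
```

With the stack running, calling instant_query('sum(rate(http_requests_total[1m]))') should return a single pair whose value is the current request rate. An empty list usually means Prometheus has not scraped the API yet.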
Conclusion
Observability is not just about collecting data, but about making your systems transparent and manageable. With tools like Prometheus and Grafana, and a few lines of code, you can gain valuable insights into your applications and ensure their reliability.