DEV Community

DAVID JORDAN ANAMPA PANCCA

Observability Practices: Implementing Real-World Monitoring With Python and Prometheus

Modern applications don’t just need to run; they need to be understood. When something goes wrong in production, teams must be able to detect the issue, diagnose its root cause, and watch the system’s behavior in real time.
This is where observability becomes essential.

In this article, I explain what observability is, why it matters, and how I implemented a real-world example using Python, Prometheus, and FastAPI. You can use this code to build your own monitoring pipeline.

What Is Observability?

Observability is the ability to understand the internal state of a system based on the data it produces.

It is built around three core pillars:

1. Metrics

Numeric values that reflect system state.
Examples: request latency, CPU usage, memory consumption.

2. Logs

Detailed event records generated by applications and systems.
Examples: authentication messages, errors, warnings.

3. Traces

End-to-end tracking of requests across services.
Useful in microservices and distributed systems.

Together, these help answer:

  • What is happening?
  • Why is it happening?
  • Where is it failing?

Why Observability Matters

Observability helps teams:

  • Detect issues earlier
  • Reduce downtime
  • Improve performance
  • Understand user impact
  • Monitor applications at scale
  • Make data-driven decisions

Without observability, debugging becomes slow, reactive, and inconsistent.

Real-World Example: Observability With Python + Prometheus

For this example, I implemented observability on a small API using:

  • Python
  • FastAPI
  • Prometheus (metrics collection)
  • Grafana (optional dashboards)

This setup is commonly used in startups and cloud-native environments.

1. Install Dependencies

First, install the required packages:

pip install fastapi uvicorn prometheus-client

2. Python API With Prometheus Metrics

Below is a simple FastAPI application that exposes metrics at /metrics.
Once the app is running under uvicorn (which listens on port 8000 by default), Prometheus scrapes this endpoint at the interval set in its configuration.

from fastapi import FastAPI
from fastapi.responses import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
import random
import time

app = FastAPI()

# Counter: a value that only increases; incremented once per request to "/".
REQUEST_COUNT = Counter("api_requests_total", "Total number of API requests received")
# Histogram: records request durations (in seconds) into buckets.
REQUEST_LATENCY = Histogram("api_request_latency_seconds", "API request latency")

@app.get("/")
def home():
    REQUEST_COUNT.inc()
    with REQUEST_LATENCY.time():
        # Simulate 100 to 500 ms of work.
        time.sleep(random.uniform(0.1, 0.5))
    return {"message": "API is running successfully"}

@app.get("/metrics")
def metrics():
    # Serve every registered metric in the Prometheus text format.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

What this code does:

Metric                        Description
api_requests_total            Counts requests to the / endpoint
api_request_latency_seconds   Records how long each / request takes, in seconds

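These metrics show how much traffic the API receives and how long requests take. Detecting failures additionally requires an error counter or a status label, since the code above does not record response status.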
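To see roughly what Prometheus receives when it scrapes /metrics, the two metrics can be rendered without starting the web server. This is a minimal sketch that assumes prometheus-client is installed; the private CollectorRegistry is not part of the original application and is used here only so the demo does not touch the process-wide default registry.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this demo separate from the global default registry.
registry = CollectorRegistry()

request_count = Counter(
    "api_requests_total", "Total number of API requests received", registry=registry
)
request_latency = Histogram(
    "api_request_latency_seconds", "API request latency", registry=registry
)

# Simulate a single request that took 0.3 seconds.
request_count.inc()
request_latency.observe(0.3)

# generate_latest returns bytes in the Prometheus text exposition format.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

The printed text contains one line per series: the counter value, plus the histogram's _bucket, _sum, and _count series that Prometheus later uses for latency queries.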

3. Prometheus Configuration

Create a file named prometheus.yml:

global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "python-api"
    static_configs:
      - targets: ["localhost:8000"]

Prometheus will scrape the metrics endpoint at:

http://localhost:8000/metrics

4. Run Prometheus

Download Prometheus, then run it:

./prometheus --config.file=prometheus.yml

Open the Prometheus UI at:

http://localhost:9090

Query metrics like:

api_requests_total
rate(api_requests_total[1m])
api_request_latency_seconds_bucket
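The rate() query turns an ever-increasing counter into requests per second. The pure-Python sketch below illustrates the idea on made-up samples. It is a simplification and not Prometheus's actual implementation, which also extrapolates to the edges of the time window; the function name simple_rate and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter over a window.

    `samples` is a list of (unix_timestamp, counter_value) pairs sorted by time.
    A value lower than the previous one is treated as a counter reset,
    meaning the process restarted and began counting from zero again.
    """
    if len(samples) < 2:
        return None  # at least two points are needed to compute a rate

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset: everything counted since the restart is new.
            increase += value
        else:
            increase += value - previous
        previous = value

    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes taken every 5 seconds, matching scrape_interval: 5s.
scrapes = [(0, 100), (5, 110), (10, 125), (15, 130)]
print(simple_rate(scrapes))  # 30 requests over 15 seconds
```

Because the raw counter only ever grows, graphing api_requests_total directly is rarely useful; the per-second rate is what reveals traffic spikes and drops.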

5. Optional: Grafana Dashboard

Grafana can visualize your Prometheus metrics with modern dashboards.

Typical graphs include:

  • Request rate
  • CPU and memory usage
  • Error percentage
  • Latency (p95, p99)

This is valuable when demonstrating observability to teams or stakeholders.
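Latency percentiles such as p95 and p99 are not stored directly; they are estimated from the cumulative counts in api_request_latency_seconds_bucket, which is what PromQL's histogram_quantile() function does. The following is a simplified sketch of that linear interpolation, not Prometheus's exact code. The name estimate_quantile and the bucket counts are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound and ending with the +Inf bucket, like the *_bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet

    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies in the +Inf bucket: report the highest
                # finite bound, since nothing more precise is known.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for 100 observed requests.
latency_buckets = [(0.1, 10), (0.25, 60), (0.5, 95), (1.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, latency_buckets)
p99 = estimate_quantile(0.99, latency_buckets)
print(p95, p99)
```

The estimate can never be more precise than the bucket boundaries, so buckets should be chosen around the latencies that matter for the service.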

Observability Best Practices

To implement observability professionally:

✔ Instrument every major endpoint

Expose metrics for performance-critical APIs.

✔ Standardize metric names

Avoid random or unstructured naming.
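One way to enforce this is a small check that runs in tests or code review. The sketch below validates names against the character set Prometheus accepts and applies two conventions as house rules: lowercase snake_case, and a _total suffix for counters. The function name check_metric_name and the exact rule set are assumptions for illustration.

```python
import re

# Characters Prometheus accepts in a metric name.
VALID_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def check_metric_name(name, metric_type):
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if not VALID_NAME.fullmatch(name):
        problems.append("contains characters that are not allowed in metric names")
    if name != name.lower():
        problems.append("should be lowercase snake_case (house rule)")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counters conventionally end in _total")
    return problems


print(check_metric_name("api_requests_total", "counter"))
print(check_metric_name("api-requests", "counter"))
```

The first call returns an empty list, while the second reports both the invalid hyphen and the missing _total suffix.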

✔ Include labels (tags)

Labels such as status_code, endpoint, or method add context.
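As a minimal sketch of how labels work in prometheus-client, the counter below declares three label names and increments a separate series per combination of values. It assumes prometheus-client is installed; the variable names and the private registry are illustrative and not part of the original application.

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()  # private registry, used only for this demo

requests_by_outcome = Counter(
    "api_requests_total",
    "Total number of API requests received",
    labelnames=["method", "endpoint", "status_code"],
    registry=registry,
)

# Each distinct combination of label values becomes its own time series.
requests_by_outcome.labels(method="GET", endpoint="/", status_code="200").inc()
requests_by_outcome.labels(method="GET", endpoint="/", status_code="200").inc()
requests_by_outcome.labels(method="GET", endpoint="/", status_code="500").inc()

ok_count = registry.get_sample_value(
    "api_requests_total", {"method": "GET", "endpoint": "/", "status_code": "200"}
)
error_count = registry.get_sample_value(
    "api_requests_total", {"method": "GET", "endpoint": "/", "status_code": "500"}
)
print(ok_count, error_count)
```

Label values should come from a small, bounded set. Putting user IDs or full URLs into labels creates one series per value and can overwhelm Prometheus.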

✔ Use alerts

For example:
“95th percentile latency exceeds 500ms for 3 minutes.”
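In Prometheus, a condition like this is written as an alerting rule with a for: 3m clause, so the alert fires only if the condition holds continuously for that long. The plain-Python sketch below only illustrates that logic; the function name alert_fires, its parameters, and the sample values are hypothetical.

```python
def alert_fires(evaluations, threshold_seconds=0.5, hold_seconds=180):
    """Return True if p95 latency stayed above the threshold long enough.

    `evaluations` is a list of (unix_timestamp, p95_latency_seconds) pairs
    sorted by time, one per rule evaluation.
    """
    breach_started = None
    for timestamp, p95 in evaluations:
        if p95 > threshold_seconds:
            if breach_started is None:
                breach_started = timestamp  # the condition just became true
            if timestamp - breach_started >= hold_seconds:
                return True
        else:
            breach_started = None  # any recovery resets the timer
    return False


# Evaluated once per minute: latency stays high for the full 3 minutes.
sustained = [(0, 0.6), (60, 0.7), (120, 0.65), (180, 0.8)]
# A brief recovery at t=60 resets the timer, so the alert does not fire.
interrupted = [(0, 0.6), (60, 0.4), (120, 0.7), (180, 0.8), (240, 0.9)]
print(alert_fires(sustained), alert_fires(interrupted))
```

Requiring the condition to hold for a period avoids paging someone for a single slow scrape.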

✔ Visualize everything

Dashboards make patterns obvious.

✔ Combine logs, metrics, and traces

Observability works best when all three pillars are present.

Conclusion

Observability allows teams to deeply understand how their systems behave.
Using Prometheus + FastAPI, I demonstrated how to expose useful metrics that support:

  • Faster debugging
  • Better performance insights
  • Safer deployments
  • Scalable system monitoring

This example can be expanded with tracing (OpenTelemetry), log pipelines (ELK Stack), or full cloud observability platforms like AWS CloudWatch, Datadog, or Azure Monitor.

Top comments (1)

AHMED HASAN AKHTAR OVIEDO

Your article is clear and explains observability in a simple way. The Python and Prometheus example is well done and works as a quick guide. You could make the ending a little shorter so that it closes with more force, but overall it turned out practical and easy to follow.