Modern applications don't just need to run; they need to be understood. When something goes wrong in production, teams must be able to detect issues, diagnose the root cause, and monitor the system's behavior in real time.
This is where observability becomes essential.
In this article, I explain what observability is, why it matters, and how I implemented a real-world example using Python, Prometheus, and FastAPI. You can use this code to build your own monitoring pipeline.
What Is Observability?
Observability is the ability to understand the internal state of a system based on the data it produces.
It is built around three core pillars:
1. Metrics
Numeric values that reflect system state.
Examples: request latency, CPU usage, memory consumption.
2. Logs
Detailed event records generated by applications and systems.
Examples: authentication messages, errors, warnings.
3. Traces
End-to-end tracking of requests across services.
Useful in microservices and distributed systems.
Together, these help answer:
- What is happening?
- Why is it happening?
- Where is it failing?
Why Observability Matters
Observability helps teams:
- Detect issues earlier
- Reduce downtime
- Improve performance
- Understand user impact
- Monitor applications at scale
- Make data-driven decisions
Without observability, debugging becomes slow, reactive, and inconsistent.
Real-World Example: Observability With Python + Prometheus
For this example, I implemented observability on a small API using:
- Python
- FastAPI
- Prometheus (metrics collection)
- Grafana (optional dashboards)
This setup is commonly used in startups and cloud-native environments.
1. Install Dependencies
First, install the required packages:
pip install fastapi uvicorn prometheus-client
2. Python API With Prometheus Metrics
Below is a simple FastAPI application that exposes metrics at /metrics.
Prometheus will scrape this endpoint every few seconds.
from fastapi import FastAPI
from fastapi.responses import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
import time
import random

app = FastAPI()

# Total number of requests served by the root endpoint.
REQUEST_COUNT = Counter("api_requests_total", "Total number of API requests received")
# Time spent handling each request, in seconds.
REQUEST_LATENCY = Histogram("api_request_latency_seconds", "API request latency")

@app.get("/")
def home():
    REQUEST_COUNT.inc()
    # Time the simulated work and record it in the histogram.
    with REQUEST_LATENCY.time():
        time.sleep(random.uniform(0.1, 0.5))
    return {"message": "API is running successfully"}

@app.get("/metrics")
def metrics():
    # Serve every registered metric in the Prometheus text format.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
What this code does:
Metric | Description
api_requests_total | Counts requests to the root endpoint (/)
api_request_latency_seconds | Measures how long each of those requests takes, in seconds
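These metrics show how much traffic the API receives and how quickly it responds. Detecting failures also requires an error or status-code metric, which this minimal example does not record.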
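To see what Prometheus receives when it scrapes /metrics, the sketch below records a few values and prints the resulting text output. It assumes prometheus-client is installed. The metric names demo_requests and demo_latency_seconds, the bucket boundaries, and the private registry are illustrative choices, not part of the API above.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this demo separate from the default global registry.
registry = CollectorRegistry()

# Hypothetical demo metrics; the client exposes the counter as demo_requests_total.
demo_requests = Counter("demo_requests", "Demo request counter", registry=registry)
demo_latency = Histogram(
    "demo_latency_seconds",
    "Demo request latency",
    buckets=(0.1, 0.25, 0.5, 1.0),  # assumed bucket bounds, in seconds
    registry=registry,
)

# Simulate three requests with known durations (in seconds).
for duration in (0.05, 0.3, 0.7):
    demo_requests.inc()
    demo_latency.observe(duration)

# generate_latest returns bytes in the Prometheus text exposition format.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```

In the printed output, the _bucket lines are cumulative: each le bucket counts every observation at or below that bound, and the +Inf bucket equals the total count.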
3. Prometheus Configuration
Create a file named prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "python-api"
    static_configs:
      - targets: ["localhost:8000"]
Prometheus will scrape the metrics endpoint at http://localhost:8000/metrics, which is the default metrics path on the configured target.
4. Run Prometheus
Download Prometheus, then run it:
./prometheus --config.file=prometheus.yml
Open the Prometheus UI at:
http://localhost:9090
Query metrics like:
api_requests_total
rate(api_requests_total[1m])
api_request_latency_seconds_bucket
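The _bucket series is what makes percentile queries possible, for example histogram_quantile(0.95, rate(api_request_latency_seconds_bucket[5m])). The sketch below is a simplified, pure-Python illustration of the linear interpolation idea behind that kind of estimate. It is not the Prometheus implementation, which handles more edge cases; the function name estimate_quantile and the sample bucket counts are made up for illustration.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound and ending with the +Inf bucket, like a *_bucket series.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies in the open-ended bucket, so the best
                # available answer is the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper_bound - prev_bound) * fraction
        prev_bound, prev_count = upper_bound, count
    return prev_bound


# Hypothetical cumulative counts for 100 requests.
sample_buckets = [(0.1, 10), (0.25, 60), (0.5, 95), (1.0, 100), (math.inf, 100)]
print("p50 estimate:", round(estimate_quantile(0.50, sample_buckets), 3))
print("p95 estimate:", round(estimate_quantile(0.95, sample_buckets), 3))
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket boundaries match the latencies you care about.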
5. Optional: Grafana Dashboard
Grafana can visualize your Prometheus metrics in dashboards.
Typical graphs include:
- Request rate
- CPU and memory usage
- Error percentage
- Latency (p95, p99)
This is valuable when demonstrating observability to teams or stakeholders.
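Request rate and error percentage panels are both derived from counters. As a rough sketch of the arithmetic, the example below computes an average per-second rate from two counter samples and turns two rates into an error percentage. It is a simplification of what PromQL's rate() does, the helper names are hypothetical, and the sample values are invented; the API in this article would also need an error counter before such a panel could be built.

```python
def per_second_rate(first, last, window_seconds):
    """Average per-second increase of a counter between two samples."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Counters only go up. A lower later value means the process restarted,
    # so the counter began again from zero.
    increase = last - first if last >= first else last
    return increase / window_seconds


def error_percentage(error_rate, total_rate):
    """Share of requests that failed, as a percentage."""
    if total_rate == 0:
        return 0.0
    return 100.0 * error_rate / total_rate


# Invented samples taken 60 seconds apart.
total_rate = per_second_rate(1200, 1500, 60)  # all requests
error_rate = per_second_rate(12, 27, 60)      # failed requests
print("requests per second:", total_rate)
print("error percentage:", error_percentage(error_rate, total_rate))
```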
Observability Best Practices
To implement observability professionally:
✔ Instrument every major endpoint
Expose metrics for performance-critical APIs.
✔ Standardize metric names
Avoid random or unstructured naming.
✔ Include labels (tags)
Labels such as status_code, endpoint, or method add context.
✔ Use alerts
For example:
“95th percentile latency exceeds 500ms for 3 minutes.”
✔ Visualize everything
Dashboards make patterns obvious.
✔ Combine logs, metrics, and traces
Observability works best when all three pillars are present.
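As an illustration of the labeling practice above, here is a minimal sketch of a labeled counter and histogram using prometheus-client. The metric names, the record_request helper, and the private registry are assumptions made for this example; in a real FastAPI app the helper would typically be called from middleware.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

# A private registry keeps this sketch isolated from other metrics.
registry = CollectorRegistry()

# Hypothetical metric names; the counter is exposed as http_requests_total.
HTTP_REQUESTS = Counter(
    "http_requests",
    "HTTP requests by method, endpoint, and status code",
    labelnames=["method", "endpoint", "status_code"],
    registry=registry,
)
HTTP_LATENCY = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration by method and endpoint",
    labelnames=["method", "endpoint"],
    registry=registry,
)


def record_request(method, endpoint, status_code, duration_seconds):
    """Record one finished request (illustrative helper, not a library API)."""
    HTTP_REQUESTS.labels(
        method=method, endpoint=endpoint, status_code=str(status_code)
    ).inc()
    HTTP_LATENCY.labels(method=method, endpoint=endpoint).observe(duration_seconds)


# Simulated traffic: two successful requests and one server error.
record_request("GET", "/", 200, 0.12)
record_request("GET", "/", 200, 0.30)
record_request("GET", "/", 500, 0.45)

ok_count = registry.get_sample_value(
    "http_requests_total",
    {"method": "GET", "endpoint": "/", "status_code": "200"},
)
error_count = registry.get_sample_value(
    "http_requests_total",
    {"method": "GET", "endpoint": "/", "status_code": "500"},
)
print("200 responses:", ok_count, "500 responses:", error_count)
```

Each distinct combination of label values creates a separate time series, so label values should come from a small, fixed set, such as route templates rather than raw URLs or user IDs.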
Conclusion
Observability allows teams to deeply understand how their systems behave.
Using Prometheus + FastAPI, I demonstrated how to expose useful metrics that support:
- Faster debugging
- Better performance insights
- Safer deployments
- Scalable system monitoring
This example can be expanded with tracing (OpenTelemetry), log pipelines (ELK Stack), or full cloud observability platforms like AWS CloudWatch, Datadog, or Azure Monitor.
References
- Prometheus Documentation – https://prometheus.io/docs
- Grafana Documentation – https://grafana.com/docs
- FastAPI – https://fastapi.tiangolo.com
- OpenTelemetry – https://opentelemetry.io
Top comments (1)
Your article is clear and explains observability in simple terms. The Python and Prometheus example is well done and works as a quick guide. You could make the ending a bit shorter so it closes more strongly, but overall it is practical and easy to follow.