DEV Community

Simple Observability Practice with Python and Prometheus: How to See Inside Your App Without Touching It

What are Observability Practices?

Observability is the degree to which a system's internal behavior can be understood from the signals it emits externally: metrics, logs, and distributed traces. In other words, observability isn't just about monitoring whether an app is "up or down," but about understanding how and why it behaves the way it does.

Think of your system as an airplane in flight. You can't open it up to see what's going on inside, but you have instruments: altimeter, airspeed indicator, temperature sensors, and so on. The same is true of modern software: it usually runs in production, far from the development environment, and we need comparable instruments to "see inside" without touching anything.

Observability practices are the set of strategies, tools, and conventions that allow us to systematically collect, structure, and analyze that information. These practices typically include:

  • Instrumenting code to generate metrics (such as the number of requests per second)
  • Configuring detailed logs with different levels (info, warning, error, etc.)
  • Tracing requests across services to follow each one from start to finish
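
As a minimal sketch of the logging practice above, the standard library's `logging` module is enough to emit leveled messages. The logger name `orders`, the `handle_order` function, and the amount threshold are hypothetical and chosen only for illustration:

```python
import logging

# Emit INFO and above, with a timestamp and level on every line.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("orders")  # hypothetical logger name


def handle_order(order_id, amount):
    """Validate one order and log what happened at an appropriate level."""
    logger.info("processing order id=%s amount=%.2f", order_id, amount)
    if amount <= 0:
        # ERROR: the order cannot be processed at all.
        logger.error("rejecting order id=%s: non-positive amount", order_id)
        return False
    if amount > 10_000:  # hypothetical threshold for an unusual order
        # WARNING: processing continues, but someone may want to look.
        logger.warning("unusually large order id=%s", order_id)
    return True
```

Choosing the level per message is what later makes it possible to filter production logs down to warnings and errors without losing the informational trail.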

An observable application is one that, in the event of any failure, degradation, or unusual behavior, provides the necessary signals to detect, understand, and resolve the problem, without the need for guesswork or invasive testing.

What is observability for?

Observability isn't just useful in emergencies: it's essential for operating modern software reliably, especially in distributed systems, microservices, and platforms that scale to thousands of users per minute.

Its main functions include:

  • Detecting errors or unexpected behavior in real time: for example, sudden drops in traffic, latency spikes, or database connection errors.
  • Measuring the performance of critical functions: identifying which parts of the system consume the most resources, take longer, or execute abnormally frequently.
  • Generating automatic alerts when something goes wrong: integrating with systems like Grafana, Datadog, or Prometheus Alertmanager to react before users notice.
  • Analyzing trends and usage patterns: allowing you to anticipate bottlenecks, prepare for scalability, or plan infrastructure changes.
  • Facilitating debugging in production: there is no need to replicate scenarios locally or disrupt the system, since observability gives us a clear map of the flow and state of the running system.

What problems does it help solve?

Let's take a concrete example: an order processing microservice in an online store. It works perfectly in development, passes all tests, and is deployed to production. Everything seems fine... until:

  • The response time becomes slow at certain times.
  • Some orders aren't processed correctly.
  • Performance metrics start to fluctuate without explanation.

Without observability, it would be nearly impossible to know what's going on. We'd only see users complaining or orders going missing, with no indication of where the fault lies.

With good observability practices, we could:

  • Know how many orders are processed per minute or per hour
  • Detect how many fail, and how often
  • Visualize how long each order takes on average, and when that duration spikes
  • Correlate events (such as errors or load spikes) with recent changes or external conditions
  • Have end-to-end traceability between services in a microservices architecture
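
The first three items come down to simple arithmetic over counter values. The sketch below shows that arithmetic in plain Python, assuming we already have two readings of a processed-orders counter plus a running sum and count of durations. The function names and sample numbers are hypothetical, and in practice a monitoring backend would normally compute these figures (and handle counter resets, which this sketch ignores):

```python
def per_minute_rate(count_start, count_end, window_seconds):
    """Orders per minute, from two readings of an ever-increasing counter."""
    return (count_end - count_start) / window_seconds * 60


def failure_ratio(processed_total, failed_total):
    """Fraction of all orders that failed (0.0 when nothing has run yet)."""
    total = processed_total + failed_total
    return failed_total / total if total else 0.0


def average_duration(duration_sum, duration_count):
    """Mean seconds per order, from a running sum and count of durations."""
    return duration_sum / duration_count if duration_count else 0.0


# Hypothetical readings taken 120 seconds apart.
print(per_minute_rate(1000, 1120, 120))  # 60.0 orders per minute
print(failure_ratio(950, 50))            # 0.05, i.e. 5% of orders failed
print(average_duration(47.5, 950))       # 0.05 seconds per order
```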

Simple Practical Example: Observability in Python with Prometheus

We created a small Python service that simulates order processing and exposes custom metrics through Prometheus.

Technologies used:

  1. Language: Python
  2. Metrics: prometheus_client
  3. Automation: GitHub Actions
  4. Container: Docker
  5. Repository: GitHub (public)

What does the program do?

Processes orders every second (simulated)

Records custom metrics:

  • orders_processed_total
  • orders_failed_total
  • order_processing_duration_seconds

Exposed at: http://localhost:8000/metrics
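
A service like this can be sketched with `prometheus_client` in a few lines. This is not the repository's code: the metric types (two counters and a histogram), the 10% failure probability, and the simulated sleep times are assumptions made for illustration, as are the names `process_order` and `serve`.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Metric names follow the list above; the types are an assumption.
ORDERS_PROCESSED = Counter(
    "orders_processed_total", "Orders processed successfully"
)
ORDERS_FAILED = Counter(
    "orders_failed_total", "Orders that failed during processing"
)
ORDER_DURATION = Histogram(
    "order_processing_duration_seconds", "Time spent processing one order"
)

FAILURE_PROBABILITY = 0.1  # hypothetical: roughly 1 in 10 orders fails


def process_order():
    """Simulate one order, recording its duration and outcome."""
    with ORDER_DURATION.time():  # observes the elapsed time on exit
        time.sleep(random.uniform(0.01, 0.05))  # simulated work
        failed = random.random() < FAILURE_PROBABILITY
    if failed:
        ORDERS_FAILED.inc()
    else:
        ORDERS_PROCESSED.inc()
    return not failed


def serve(port=8000):
    """Expose /metrics on the given port and process one order per second."""
    start_http_server(port)
    while True:
        process_order()
        time.sleep(1)
```

Calling `serve()` starts the metrics endpoint and the simulation loop; it runs until interrupted. A histogram is used for the duration so that both an average and a latency distribution can be derived later, but a summary would also fit the metric name.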

Metrics view in the browser

Full code on GitHub

You can see the full code, structure, Dockerfile, and automation configuration here:
https://github.com/WhiteFall20/Simple_Example_Observability.git

Conclusion

Observability is no longer a luxury or a technical "plus": it is a fundamental necessity in developing and operating modern systems. With architectures becoming more distributed, user expectations rising, and errors growing more costly, visibility into the real state of the system is key to software quality, scalability, and reliability.

This small project shows that even with basic tools like Python and Prometheus, effective observability practices are within reach: capturing custom metrics, analyzing them in real time, and automating processes through GitHub Actions. No complex or expensive infrastructure is required to get started.

Observability also does more than help resolve problems when they occur: it helps prevent them, supports informed decisions, and lets teams learn from how the system behaves in production. Ultimately, it is a tool for knowledge, continuous improvement, and technological maturity.
