DEV Community

Beyond Logs: Implementing Tracing and Golden Signals for Distributed Systems

By: João Vitor Nascimento De Mendonça

  1. The Observability Gap
    In a microservices environment, having logs is not enough. When a request fails or slows down, you need to know exactly which service is the bottleneck. I recently moved a legacy monitoring setup to an OpenTelemetry-based tracing system to track down "hidden" latencies.

  2. The Four Golden Signals
    We focused on the four Golden Signals from Google's SRE book:

Latency: Time it takes to service a request.

Traffic: Demand placed on the system.

Errors: The rate of requests that fail.

Saturation: How "full" your service is.
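
As a rough illustration of how the four signals can be derived from raw request data, here is a minimal sketch. The `Request` record, the `golden_signals` function, and the choice of p95 latency, 5xx status codes as errors, and in-flight requests over capacity as saturation are all assumptions made for this example, not details from our setup.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_ms: float  # time taken to serve the request
    status: int         # HTTP status code of the response


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Summarize one observation window into the four signals."""
    if not requests:
        raise ValueError("need at least one request in the window")
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(durations)))
    failures = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": failures / len(requests),
        "saturation": in_flight / capacity,  # hypothetical definition of "full"
    }
```

In practice a metrics backend usually computes these from histograms and counters; the point here is only how each signal maps to a concrete number.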

  3. Implementation: Distributed Tracing
    By propagating a trace ID across services, we could visualize the entire request lifecycle. We discovered that a specific middleware was adding 120 ms of unnecessary overhead to every auth request, a finding that logs alone couldn't pinpoint.
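
To make the propagation idea concrete, here is a minimal standard-library sketch. It is not the OpenTelemetry API: it only mimics the W3C Trace Context `traceparent` header (`version-traceid-spanid-flags`), and the helper names `new_traceparent` and `child_headers` are hypothetical. It also assumes header keys are already lowercased.

```python
import secrets
from typing import Dict


def new_traceparent() -> str:
    """Start a new trace: 16-byte trace ID, 8-byte span ID, sampled flag."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call.

    The caller's trace ID is kept so every hop belongs to the same trace,
    while a fresh span ID identifies this service's own unit of work.
    """
    parts = incoming.get("traceparent", "").split("-")
    if len(parts) != 4:
        # Missing or malformed header: this service becomes the trace root.
        return {"traceparent": new_traceparent()}
    version, trace_id, _parent_span_id, flags = parts
    return {"traceparent": f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"}
```

Each service calls something like `child_headers(request_headers)` before making an outbound request, so the tracing backend can stitch the spans together by trace ID. With OpenTelemetry, the SDK's propagators handle this step for you.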

  4. Conclusion
    Logs tell you what happened; traces tell you where and why. If you aren't using distributed tracing in 2026, you are flying blind.

Post 3: Data Scalability
Title: Database Sharding vs. Read Replicas: How We Scaled for 1M Concurrent Users
By: João Vitor Nascimento De Mendonça

  1. The Vertical Scaling Wall
    There comes a point where "just get a bigger instance" stops working for your database. We hit that wall when our RDS instance reached 90% CPU utilization even on the largest tier.

  2. Strategy 1: CQRS and Read Replicas
    The first step was separating concerns using the CQRS (Command Query Responsibility Segregation) pattern.

All writes go to the Primary instance.

All heavy GET requests are load-balanced across Read Replicas.

Result: CPU usage dropped by 40% instantly.
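
The routing rule above can be sketched as follows. This is an illustration under assumptions, not our production code: the `QueryRouter` class is hypothetical, it routes on the SQL verb rather than on the HTTP method, and it uses plain round-robin across replicas.

```python
import itertools
from typing import List


class QueryRouter:
    """Send reads to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # cycle() repeats the replica list forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, empty statements, and anything unrecognized go to the primary.
        return self.primary
```

One caveat with this kind of split: replicas lag behind the primary, so a read that must see a write made a moment ago may need to go to the primary as well.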

  3. Strategy 2: Application-Level Sharding
    For the most critical tables, we implemented sharding based on user_id. By distributing data across multiple physical shards, we ensured that no single database carried the full load, and that an outage on one shard affected only the users mapped to it.
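
A minimal sketch of picking a shard from `user_id` might look like this. Hash-then-modulo is an assumption chosen for the example (the mapping could equally be range-based or use a lookup table), and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A stable hash keeps the mapping identical across processes and restarts,
    # unlike Python's built-in hash(), which is salted for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The application then looks up the connection for `shard_for(user_id, num_shards)` before running the query. Note that with plain modulo, changing `num_shards` remaps most users, which is why resharding usually needs consistent hashing or a directory table.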

  4. Conclusion
    Scaling databases is 10% hardware and 90% architecture. Sharding is complex, but for massive scale, it’s the only path forward.
