DEV Community

# observability

Gaining deep insights into system behavior through metrics, logs, and traces.

Posts

I logged 300 Hermes runs to one file. trace-session-split cut it into 300.

Hermes Agent Challenge Submission: Build With Hermes Agent

2 min read
Translating LLM Telemetry Between OpenInference and OTel GenAI with Rust

5 min read
Bronto for Fastly: Real-Time CDN Logging That Actually Scales

5 min read
What GitHub Uses eBPF For (and the Layer They Have Not Ported Yet)

5 min read
Distributed tracing across FastAPI and Celery with OpenTelemetry: the part nobody shows you

2 min read
Hallucination Detection at the Trace Layer: 4 Detectors You Can Ship Today

10 min read
Eval Set Drift: How to Know When Your Golden Set Went Stale

8 min read
Per-Customer LLM Cost Reports (Without Rearchitecting Your Billing Pipeline)

8 min read
OpenTelemetry: The Foundation of Modern Cloud-Native Observability — Traces, Metrics, Logs, and the Future of Observability

8 min read
EKS Metrics: Amazon Managed Prometheus vs Self-Managed Prometheus

10 min read
When one reliability surface has to satisfy everyone

5 min read
Your Agent Just Called the Same Tool 47 Times. Here's the 20-Line Detector.

7 min read
Log Level Strategies: Balancing Observability and Cost

8 min read
Investigation Reports: When Monitors Get Smarter

4 min read
AI SRE and AI DevOps: different problems, one reliability stack

6 min read