A CTO asked me: "Should we move off Datadog? It's eating our runway."
I said: "Before you migrate, show me your retention config."
They didn't have one. Everything was still running on the defaults.
60% of the bill was DEBUG logs nobody had queried in 90 days. CloudWatch forwarders were pushing everything — access logs, auth logs, health checks. All at 30-day retention. All indexed. All paid for.
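That kind of breakdown is easy to reproduce for your own stack. A minimal sketch (the volumes below are hypothetical placeholders, not this team's real numbers; pull yours from your vendor's usage page) of how spend share by log level falls out of ingest volume:

```python
# Sketch: estimate what share of indexed-log spend each severity drives.
# At a flat per-GB indexing rate, share of volume == share of the bill.
# The GB figures here are made up for illustration.

def spend_share(ingested_gb_by_level: dict[str, float]) -> dict[str, float]:
    """Return each log level's fraction of total indexed volume."""
    total = sum(ingested_gb_by_level.values())
    return {level: gb / total for level, gb in ingested_gb_by_level.items()}

# Hypothetical month of ingest, with DEBUG dominating:
shares = spend_share({"DEBUG": 600.0, "INFO": 250.0, "WARN": 100.0, "ERROR": 50.0})
```

Run that against real usage data before any migration RFP — if one level dominates, you have a config problem, not a vendor problem.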
The migration would have taken 3 months, cost the team's sanity, and moved the same problem to Grafana.
The actual fix was a 2-week config exercise:
→ Tag logs by severity + service ownership
→ 3 retention tiers: P0 incidents keep 90d, operational 7d, DEBUG 24h
→ Stop indexing health-check logs. Archive them raw to S3 at $0.023/GB-month
→ Custom metrics audit: 18% of them weren't on any dashboard or alert
→ APM sampling reduced from 100% to 10% on non-critical services
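The tiering above is just a decision table. A sketch of it (tier durations are from this post; the function and its inputs are illustrative — in Datadog this lives in log pipeline exclusion filters and per-index retention settings, not application code):

```python
# Retention-tier routing as described above:
# P0 incident logs 90d, operational 7d, DEBUG 24h,
# health checks never indexed — archived raw to S3 instead.

RETENTION_HOURS = {"p0_incident": 90 * 24, "operational": 7 * 24, "debug": 24}

def route_log(severity: str, is_p0_incident: bool, is_health_check: bool) -> dict:
    """Decide whether a log record is indexed, and for how long."""
    if is_health_check:
        # Cheapest tier: skip indexing entirely, keep the raw copy in S3.
        return {"index": False, "archive": "s3", "retention_hours": None}
    if is_p0_incident:
        tier = "p0_incident"
    elif severity.upper() == "DEBUG":
        tier = "debug"
    else:
        tier = "operational"
    return {"index": True, "archive": None, "retention_hours": RETENTION_HOURS[tier]}
```

Writing it out like this is worth doing even if nobody runs it: it forces someone to own the answer to "who decided DEBUG needs 30 days?"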
Result: Datadog bill dropped 51% in 6 weeks. No vendor change. No re-training. No migration risk.
The observability industry loves selling you a new tool. But the problem isn't usually the tool. It's:
→ Defaults that were set when your traffic was 10x smaller
→ Nobody owns retention policy
→ Custom metrics piled up and nothing ever got deleted
→ Alerts firing so often everyone muted them
If you're about to RFP a new observability vendor: audit your current one first. You'll save 6 months and 60% of the spend.
If this sounds like your stack, repost. There's a VP Engineering reading a Grafana pitch deck right now who needs to hear it.