Samson Tanimawo

Log Management at Scale: How We Cut Costs 70% Without Losing Signal

$12,000/Month for Logs Nobody Reads

Our logging bill was $12,000/month. We were ingesting 2TB/day. When I asked the team what percentage of logs they actually looked at during incidents, the answer was embarrassing: about 5%.

We were paying to store 95% noise.

The Log Audit

First, I categorized all log sources by value:

High value (always needed during incidents):
  Application errors (stack traces)
  Authentication events
  Business transactions
  External API calls with responses
  Health check failures

Medium value (sometimes useful):
  Request/response logs (sampled)
  Performance metrics in logs
  Deployment events
  Configuration changes

Low value (almost never needed):
  Debug/trace level logs
  Health check successes
  Static asset requests
  Heartbeat messages
  Verbose framework logs
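
The audit itself can be sketched in a few lines: total the bytes each source emits, map each source to one of the value tiers above, and look at each tier's share of volume. The source names, the TIER_BY_SOURCE mapping, and the byte counts below are hypothetical placeholders, not our real numbers.

```python
from collections import Counter

# Hypothetical mapping from log source to the value tier assigned in the audit
TIER_BY_SOURCE = {
    "app_error": "high",
    "auth_event": "high",
    "request_log": "medium",
    "health_check_ok": "low",
    "debug": "low",
}


def volume_share_by_tier(bytes_by_source):
    """Return each tier's fraction of total log volume."""
    totals = Counter()
    for source, size in bytes_by_source.items():
        # Sources missing from the mapping are surfaced instead of silently ignored
        totals[TIER_BY_SOURCE.get(source, "unclassified")] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tier: size / grand_total for tier, size in totals.items()}


# Illustrative byte counts only; pull real ones from your log platform's usage report
sample = {"app_error": 5, "request_log": 15, "health_check_ok": 30, "debug": 50}
for tier, share in volume_share_by_tier(sample).items():
    print(f"{tier}: {share:.0%}")
```

Anything that lands in "unclassified" is a source the audit has not looked at yet, which is usually worth chasing before deciding what to drop.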

Strategy 1: Log Levels as a Service

We made log levels dynamic. In production, the default is WARNING. During an incident, we flip the affected service to DEBUG:

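import logging
import os

from fastapi import FastAPI, HTTPException

app = FastAPI()  # Assumption: FastAPI, which the decorator style below suggests
VALID_LEVELS = {'DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'}

# Initial log level comes from the environment and is read once at startup
LOG_LEVEL = os.environ.get('LOG_LEVEL', 'WARNING').upper()
logging.basicConfig(level=LOG_LEVEL if LOG_LEVEL in VALID_LEVELS else 'WARNING')

# Endpoint to change the log level without a restart (put it behind auth)
@app.post('/admin/log-level')
async def set_log_level(level: str):
    level = level.upper()
    if level not in VALID_LEVELS:
        raise HTTPException(status_code=400, detail=f'invalid log level: {level}')
    logging.getLogger().setLevel(level)
    return {'status': 'ok', 'level': level}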

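In Kubernetes, set the variable on the deployment. Note that kubectl set env changes the pod template and triggers a rolling restart; use the endpoint above when the level has to change without restarting pods: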

# Normal operation
kubectl set env deployment/api LOG_LEVEL=WARNING

# During incident
kubectl set env deployment/api LOG_LEVEL=DEBUG

# After incident
kubectl set env deployment/api LOG_LEVEL=WARNING
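
Flipping back to WARNING after the incident is the step that gets forgotten. As a minimal sketch, assuming the debugging happens inside a single Python process, a context manager can restore the previous level automatically. The temporary_level helper and the "payments" logger name are hypothetical, not part of our codebase.

```python
import contextlib
import logging


@contextlib.contextmanager
def temporary_level(logger_name, level):
    """Set a logger's level for the duration of a block, then restore it."""
    logger = logging.getLogger(logger_name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs even if the block raises, so the verbose level never sticks
        logger.setLevel(previous)


logging.basicConfig()  # attach a stderr handler to the root logger

# "payments" is a hypothetical logger name for the affected service
payments = logging.getLogger("payments")
payments.setLevel(logging.WARNING)

with temporary_level("payments", logging.DEBUG):
    payments.debug("emitted while the block is active")

payments.debug("suppressed again once the block exits")
```

This fits scripts and short debugging sessions. For a long-running service, the same restore-on-exit idea would need a timer or an explicit expiry on the admin endpoint.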

Strategy 2: Tiered Retention

retention_policy:
  hot_storage:  # Fast search, expensive
    duration: 7 days
    filter: "level >= WARN OR tag:business_event"

  warm_storage:  # Slower search, cheaper
    duration: 30 days
    filter: "level >= INFO"

  cold_storage:  # Archive only, cheapest
    duration: 365 days
    filter: "tag:audit OR tag:compliance"

  drop:  # Don't store at all (health check failures at WARN or above are kept)
    filter: "level = DEBUG OR (source:health_check AND level < WARN)"
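
To make the policy concrete, here is a minimal sketch of how a record could be routed under it. Several things are assumptions on my part: the drop rule is evaluated first, a record may be written to more than one tier, and health check records at WARNING or above are spared from the drop rule because the audit lists health check failures as high value. The storage_tiers function is hypothetical, not a feature of any particular log platform.

```python
import logging


def storage_tiers(level, source="", tags=()):
    """Return the tiers a record should be written to; an empty list means drop."""
    tags = set(tags)

    # Assumption: the drop rule wins over every tier filter.
    # Health check records survive only at WARNING or above.
    if level <= logging.DEBUG or (source == "health_check" and level < logging.WARNING):
        return []

    tiers = []
    if level >= logging.WARNING or "business_event" in tags:
        tiers.append("hot")   # 7 days, fast search
    if level >= logging.INFO:
        tiers.append("warm")  # 30 days, slower search
    if tags & {"audit", "compliance"}:
        tiers.append("cold")  # 365 days, archive only
    return tiers


print(storage_tiers(logging.ERROR, "api"))
print(storage_tiers(logging.INFO, "health_check"))
print(storage_tiers(logging.INFO, "api", tags={"audit"}))
```

Writing the rules as one function forces a decision about precedence, which the declarative policy leaves open.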

Strategy 3: Structured Logging

Unstructured logs are expensive to parse. Structured logs are cheap to query:

# Bad: Unstructured
logger.info(f"User {user_id} purchased {product_id} for ${amount}")
# Parsing this requires regex, which costs compute

# Good: Structured
logger.info("purchase_completed", extra={
    'user_id': user_id,
    'product_id': product_id,
    'amount': amount,
    'currency': 'USD'
})
# Output with a JSON formatter configured on the handler:
# {"message": "purchase_completed", "user_id": "u123", ...}
# Queryable without parsing
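
Python's standard logging module does not emit JSON by default, and fields passed through extra only show up if the formatter writes them out. A minimal sketch of such a formatter, using only the standard library, is below. The JsonFormatter class and the "shop" logger name are illustrative; in production a maintained JSON logging library is probably a better choice.

```python
import json
import logging
import sys

# Attribute names every LogRecord carries; anything else arrived through `extra`
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object per line, including `extra` fields."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(
            (key, value)
            for key, value in record.__dict__.items()
            if key not in _STANDARD_ATTRS
        )
        # default=str keeps non-JSON types (datetime, Decimal) from raising
        return json.dumps(payload, default=str)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("shop")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid a second, unstructured copy from the root logger

logger.info(
    "purchase_completed",
    extra={"user_id": "u123", "product_id": "p456", "amount": 19.99, "currency": "USD"},
)
```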

Strategy 4: Sample Verbose Logs

import random

def should_log_request(request):
    # Always log errors
    if request.status_code >= 400:
        return True
    # Always log slow requests
    if request.duration_ms > 1000:
        return True
    # Sample 10% of successful requests
    return random.random() < 0.10
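
One way to apply the same rule without touching every call site is a logging filter attached to the request logger. This is a sketch under assumptions: status_code and duration_ms are passed through extra on each request log call, and the SamplingFilter class is hypothetical. The random source is injectable so the behavior can be tested deterministically.

```python
import logging
import random


class SamplingFilter(logging.Filter):
    """Keep errors and slow requests; sample everything else."""

    def __init__(self, sample_rate=0.10, rng=random.random):
        super().__init__()
        self.sample_rate = sample_rate
        self._rng = rng

    def filter(self, record):
        # Never sample away warnings or errors
        if record.levelno >= logging.WARNING:
            return True
        # Assumption: these fields were attached via `extra` by the request logger
        if getattr(record, "status_code", 0) >= 400:
            return True
        if getattr(record, "duration_ms", 0) > 1000:
            return True
        return self._rng() < self.sample_rate


request_logger = logging.getLogger("requests_sampled")  # hypothetical logger name
request_logger.addFilter(SamplingFilter(sample_rate=0.10))
```

Because the filter runs before any handler, sampled-out records are never formatted or shipped, so the saving applies to CPU as well as ingestion.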

The Results

Before:
  Daily ingestion: 2 TB
  Monthly cost: $12,000
  Useful data: ~5%

After:
  Daily ingestion: 400 GB
  Monthly cost: $3,600
  Useful data: ~70%
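
The percentages follow directly from the figures above. A quick check, assuming decimal units (1 TB = 1000 GB) and a 30-day month for the per-GB estimate:

```python
GB_PER_TB = 1000     # assumption: decimal units
DAYS_PER_MONTH = 30  # assumption for the per-GB estimate

before = {"daily_gb": 2 * GB_PER_TB, "monthly_cost": 12_000}
after = {"daily_gb": 400, "monthly_cost": 3_600}

ingestion_cut = 1 - after["daily_gb"] / before["daily_gb"]
cost_cut = 1 - after["monthly_cost"] / before["monthly_cost"]


def cost_per_gb(stats):
    """Effective monthly cost divided by monthly ingested volume."""
    return stats["monthly_cost"] / (stats["daily_gb"] * DAYS_PER_MONTH)


print(f"Ingestion reduced by {ingestion_cut:.0%}")               # 80%
print(f"Cost reduced by {cost_cut:.0%}")                         # 70%
print(f"Cost per GB before: ${cost_per_gb(before):.2f}")         # $0.20
print(f"Cost per GB after:  ${cost_per_gb(after):.2f}")          # $0.30
```

Ingestion fell by 80% while the bill fell by 70%, so the effective price per GB went up even as the total came down.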

We cut costs by 70% AND improved signal quality. Searches are faster because there's less noise. Incidents resolve quicker because relevant logs surface immediately.

The Rule

Before adding a log statement, ask: "Will someone look at this during an incident?" If the answer is no, it's DEBUG level at most.

If you're spending too much on logs and want smarter log management, check out what we're building at Nova AI Ops.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
