Anderson Leite

Rethinking Observability Costs: How Structured Logging Can Save You Thousands

Long story short: logs are cheap... until they aren't.

Long story long: With modern apps emitting millions of lines per hour, unstructured logs become data debt.

Structured logging, when done right, can drastically cut ingest costs and improve your observability quality, all while making your engineering team more effective and your incident response faster.
 

The Observability Paradox

 

Here's the uncomfortable truth about modern observability: more data doesn't equal more insight.

Engineering teams default to collecting everything: every request, every error, every debug statement, hoping that when something breaks, the answer will be hiding somewhere in their log aggregation tool.
 

Then an incident happens, and they find themselves drowning in noise:

  • Searching through millions of log lines with regex
  • Waiting minutes for queries to complete
  • Finding the relevant signal buried in verbose stack traces
  • Paying exponentially more as log volume grows  

The paradox is that we're simultaneously data-rich and insight-poor. We have petabytes of logs but still struggle to answer basic questions: "What caused this latency spike?" or "Which users are affected?"
 

The problem isn't the volume per se. It's that unstructured logs don't scale.
 

Modern observability platforms like New Relic charge based on two factors: data ingested (measured in GB) and compute consumption (measured in Compute Capacity Units or CCUs). CCUs track the computational work of loading pages, executing queries, and evaluating alerts.

Unstructured logs hit you twice: they inflate your data ingest costs and require expensive full-text searches that burn through CCUs. Structured logging addresses both problems simultaneously.
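To build intuition, here's a rough back-of-the-envelope model of that bill. The $0.30/GB ingest rate matches the figure used later in this post; the per-CCU rate is a placeholder, so treat this as a sketch rather than a quote:

// Rough monthly cost model: ingest volume plus compute consumption.
// INGEST_RATE_PER_GB matches the $0.30/GB figure used later in this post;
// ASSUMED_CCU_RATE is an illustrative number - check your own contract.
const INGEST_RATE_PER_GB = 0.30;
const ASSUMED_CCU_RATE = 0.05;

function estimateMonthlyCost(ingestGb: number, ccusConsumed: number): number {
  return ingestGb * INGEST_RATE_PER_GB + ccusConsumed * ASSUMED_CCU_RATE;
}

// Unstructured logs push both inputs up: more bytes per line (ingest)
// and full-text scans instead of indexed lookups (CCUs).
console.log(estimateMonthlyCost(900, 40_000)); // 270 + 2000 = 2270 ($/month)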
 

Unstructured Chaos: Why String-Based Logs Hurt

 

Traditional logging looks like this (bear with my pseudo-code, I'm not a software engineer):

console.log('User login failed for email: ' + email + ' reason: ' + reason);
console.log('Processing payment id=' + paymentId + ' amount=' + amount + ' currency=' + currency);

Or in production logs:

[2025-10-29 10:23:45] INFO User login failed for email: user@example.com reason: Invalid password
[2025-10-29 10:24:12] ERROR Processing payment id=pay_123abc amount=99.99 currency=USD

 

This seems reasonable until you need to:
 

  • Query by field: "Show me all failed logins for this specific user" requires regex or full-text search across every log line. Slow and expensive.
  • Aggregate data: "What's the average payment amount by currency?" You can't. The data is trapped in strings (see the sketch after this list).
  • Create dashboards: Extracting values from strings at query time is CPU-intensive and fragile. Change the log format, and your dashboards break.
  • Control costs: You can't selectively route or sample based on log content. It's all or nothing.
  • Debug efficiently: Finding related logs means string matching on request IDs manually inserted into messages.  
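To make the aggregation pain concrete, here's a sketch of what you end up writing just to pull the payment amount out of the unstructured line shown earlier. Any format change silently breaks it:

// Brittle parsing of the unstructured payment log line from above
const line = '[2025-10-29 10:24:12] ERROR Processing payment id=pay_123abc amount=99.99 currency=USD';

const match = line.match(/amount=([\d.]+) currency=(\w+)/);
if (match) {
  const amount = parseFloat(match[1]); // 99.99
  const currency = match[2];           // 'USD'
  // ...and this still has to run over every single line, at query time, at scale
  console.log({ amount, currency });
}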

The Hidden Costs

 

Unstructured logging creates a tax on everything:

  • Storage costs: You're storing redundant text. The string "User login failed for email:" appears millions of times when only the email changes.
  • Ingest costs: Many observability platforms charge by volume. Verbose, unstructured logs maximize that charge.
  • Query costs: Full-text search across terabytes is expensive. Some platforms charge per query or have rate limits.
  • Engineering time: Developers waste hours crafting complex queries or writing custom parsing scripts.
  • Incident response: When every second counts, slow log queries extend outages.  

Structured Logging 101

 

Structured logging means emitting logs as key-value pairs (typically JSON) instead of free-form strings:

logger.info('User login failed', {
  event: 'user_login_failed',
  email: 'user@example.com',
  reason: 'invalid_password',
  ip_address: '192.168.1.1',
  user_agent: 'Mozilla/5.0...'
});

Output:

{
  "timestamp": "2025-10-29T10:23:45.123Z",
  "level": "info",
  "message": "User login failed",
  "event": "user_login_failed",
  "email": "user@example.com",
  "reason": "invalid_password",
  "ip_address": "192.168.1.1",
  "user_agent": "Mozilla/5.0..."
}

Now your logs are data, not just text.
 

The Power of Structure

 

  • Fast field-based queries: "Show failed logins" becomes event:user_login_failed AND reason:invalid_password — indexed, fast, cheap.
  • Aggregations and analytics: "Top 10 failure reasons by count" is trivial. Your observability tool treats fields as columns in a database.
  • Dynamic dashboards: Build visualizations that reference field names. Change log details without breaking dashboards.
  • Intelligent routing: Send high-value logs (errors, security events) to hot storage, and verbose debug logs to cold storage or drop them entirely.
  • Context enrichment: Automatically add fields like environment, service, version, trace_id to every log (see the sketch after this list).
  • Machine learning: Anomaly detection and pattern recognition work far better on structured data (any decent observability platform, such as New Relic or Datadog, offers this).
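As a concrete example of the context-enrichment point above, here's a minimal sketch using Pino (the same library as in the case study below). The service name and trace_id value are illustrative:

import pino from 'pino';

// Base fields are attached to every log line the service emits
const logger = pino({
  base: {
    service: 'checkout-api', // hypothetical service name
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
  },
});

// A child logger adds per-request context once, so every log written
// through it automatically carries the trace_id
const requestLogger = logger.child({ trace_id: 'abc-123' });

requestLogger.info({ event: 'cart_loaded', items: 3 });
requestLogger.warn({ event: 'inventory_low', sku: 'SKU-42' });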

Case Study: Optimizing Next.js Application Logging

 

Let's look at a real-world example. A Next.js application was logging to New Relic with this pattern:

Before (unstructured):

console.log(`[${new Date().toISOString()}] API request: ${req.method} ${req.url} - Status: ${res.statusCode} - Duration: ${duration}ms`);
console.error(`Error in ${functionName}: ${error.message}\n${error.stack}`);
  • Monthly costs: ~$8,000 for log ingest (924 GB × $0.30/GB after free tier) plus ~$2,400 in Compute Capacity Units (CCUs) for queries and dashboards
  • Query performance: 30-60 seconds for complex searches
  • Dashboard reliability: Frequent breaks due to log format changes  

After (structured with Pino):

import pino from 'pino';

const logger = pino({
  base: {
    service: 'nextjs-api',
    environment: process.env.NODE_ENV,
    version: process.env.APP_VERSION,
  },
  serializers: {
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
    err: pino.stdSerializers.err,
  }
});

// API logging
logger.info({
  event: 'api_request',
  method: req.method,
  url: req.url,
  status_code: res.statusCode,
  duration_ms: duration,
  user_id: req.user?.id,
  trace_id: req.headers['x-trace-id'],
});

// Error logging
logger.error({
  event: 'application_error',
  error_type: error.constructor.name,
  function_name: functionName,
  err: error, // Serialized automatically with stack trace
});

Results after 3 months:

  • Cost reduction: $10,400 → $3,800/month (63% savings)
    • Data ingest: 924 GB → 340 GB through sampling = ~$100/month (from $277)
    • CCU consumption: Reduced by 75% due to indexed queries = ~$600/month (from $2,400)
    • User seats and base platform costs remained constant at ~$3,100/month
  • Query performance: 30-60s → 2-5s average (CCUs consumed per query dropped significantly)
  • Data retention: Extended from 15 to 30 days within budget
  • Alert accuracy: False positives reduced by 40%  

What Made the Difference

 

New Relic's pricing model charges for both data ingest and Compute Capacity Units (CCUs). CCUs measure the compute consumed by customer-initiated actions such as loading pages, executing queries, and evaluating alerts. With unstructured logs, every query required full-text searches that consumed large amounts of CCUs.
 

Structured logging changed the equation:

  • Sampling by event type: Kept 100% of errors and security events, sampled routine API requests at 10%, dropped debug logs in production, drastically reducing data ingest costs.
  • Environment-based filtering with Pipeline Cloud Rules: Dropped logs from dev/qa environments by default (saving 150+ GB/month), only enabling them on-demand for troubleshooting. Also filtered out noisy AKS system workloads and repetitive errors.
  • Efficient storage: JSON compression and New Relic's columnar storage meant less redundancy and lower ingest volume.
  • Faster, cheaper queries: Indexed field queries consumed far fewer CCUs compared to regex searches across unstructured text. A query that previously cost 50 CCUs now costs just 5 CCUs.
  • Better filtering: Pre-ingest filtering with New Relic drop rules eliminated noisy logs before they hit the billing meter for both ingest and compute.  

Cost Savings in Action

 

Here's how to operationalize structured logging for maximum savings:

1. Implement Sampling Strategies

 

Not all logs have equal value:

const logSamplingRates: Record<string, number> = {
  'security_event': 1.0,      // 100% - always keep
  'error': 1.0,                // 100% - always keep
  'api_request': 0.1,          // 10% - sample routine requests
  'debug': 0.01,               // 1% - rarely keep in production
};

function shouldLog(event: string): boolean {
  const rate = logSamplingRates[event] || 1.0;
  return Math.random() < rate;
}

if (shouldLog('api_request')) {
  logger.info({ event: 'api_request', ... });
}

Impact: Reducing routine logs by 90% while keeping critical events cuts costs dramatically without losing visibility into problems.

2. Intelligent Routing with Pipeline Cloud Rules

 

One of the most powerful cost optimization strategies is using New Relic's Pipeline Cloud Rules to intelligently route and filter logs based on their value. Pipeline Cloud Rules (the modern replacement for drop rules, which are end-of-life January 7, 2026) let you control data ingestion by environment, log level, source, and custom attributes.

Why this matters: Not all logs deserve premium treatment. Error logs need immediate querying, while debug logs from dev environments can be dropped entirely. Routing logs appropriately means you only pay premium prices (and consume CCUs) for high-value data.
 

Agent-Level Configuration

 

Start with basic filtering at the agent level:

// Configure with New Relic Node.js agent - newrelic.js
module.exports = {
  application_logging: {
    forwarding: {
      enabled: true,
      log_level: 'info', // Only forward info and above (info, warn, error)
    },
    local_decorating: {
      enabled: true
    }
  }
};

Cloud Rules: Drop by Log Level & Sampling

Use the Pipeline Cloud Rules API for fine-grained control:

// Drop debug logs in production
const createCloudRule = async () => {
  const response = await fetch('https://api.newrelic.com/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'API-Key': process.env.NEW_RELIC_USER_KEY
    },
    body: JSON.stringify({
      query: `
        mutation {
          pipelineControlCreateRule(
            accountId: ${YOUR_ACCOUNT_ID}
            rule: {
              name: "Drop debug logs in production"
              description: "Reduce costs by dropping debug-level logs"
              enabled: true
              nrql: "DELETE FROM Log WHERE level = 'debug' AND environment = 'production'"
            }
          ) {
            rule { id name enabled }
            errors { description }
          }
        }
      `
    })
  });
  return response.json();
};
Or add a second rule to sample routine logs and cut volume further:

# Sample standard logs at 10% to reduce volume
mutation {
  pipelineControlCreateRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Sample standard logs"
      description: "Keep 10% of info/warn logs to reduce volume"
      enabled: true
      nrql: "DELETE FROM Log WHERE level IN ('info', 'warn') AND (timestamp % 10) != 0"
    }
  ) {
    rule { id }
  }
}

Cloud Rules: Environment-Based Filtering

Why pay to ingest verbose logs from development and QA environments when you only need them occasionally for troubleshooting?

# Drop logs from non-production environments by default
mutation {
  pipelineControlCreateRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Drop dev/qa logs by default"
      description: "Filter out non-production logs unless troubleshooting"
      enabled: true
      nrql: "DELETE FROM Log WHERE environment IN ('dev', 'qa', 'staging')"
    }
  ) {
    rule { id }
  }
}

Cloud Rules: Filter Kubernetes System Workloads

Eliminate noise from Azure/AKS infrastructure and system containers:

# Drop logs from system-level AKS workloads
mutation {
  pipelineControlCreateRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Drop AKS system workloads"
      description: "Filter out Azure-managed infrastructure logs"
      enabled: true
      nrql: "DELETE FROM Log WHERE k8s.namespace IN ('kube-system', 'kube-public', 'kube-node-lease', 'gatekeeper-system')"
    }
  ) {
    rule { id }
  }
}

# Drop logs from specific noisy applications
mutation {
  pipelineControlCreateRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Drop cert-manager noise"
      description: "Filter repetitive cert-manager authentication failures"
      enabled: true
      nrql: "DELETE FROM Log WHERE service.name = 'cert-manager' AND message LIKE '%failed to issue certificate%'"
    }
  ) {
    rule { id }
  }
}

Dynamic Troubleshooting Workflow

When you need to debug an issue in a dev/qa environment:

  1. Temporarily disable the pipeline rule via API or UI
  2. Reproduce the issue to capture logs
  3. Investigate with full log data available
  4. Re-enable the rule once troubleshooting is complete

// Helper script to toggle pipeline rules for troubleshooting
async function enableEnvironmentLogging(environment: string, durationMinutes: number = 30) {
  const ruleId = getRuleIdForEnvironment(environment); // Get rule ID from your config

  // Disable the drop rule
  await fetch('https://api.newrelic.com/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'API-Key': process.env.NEW_RELIC_USER_KEY
    },
    body: JSON.stringify({
      query: `
        mutation {
          pipelineControlUpdateRule(
            accountId: ${YOUR_ACCOUNT_ID}
            ruleId: "${ruleId}"
            rule: { enabled: false }
          ) {
            rule { id enabled }
          }
        }
      `
    })
  });

  console.log(`✅ Logging enabled for ${environment} environment`);
  console.log(`⏰ Will automatically re-enable filtering in ${durationMinutes} minutes`);

  // Schedule re-enabling
  setTimeout(async () => {
    await fetch('https://api.newrelic.com/graphql', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'API-Key': process.env.NEW_RELIC_USER_KEY
      },
      body: JSON.stringify({
        query: `
          mutation {
            pipelineControlUpdateRule(
              accountId: ${YOUR_ACCOUNT_ID}
              ruleId: "${ruleId}"
              rule: { enabled: true }
            ) {
              rule { id enabled }
            }
          }
        `
      })
    });
    console.log(`✅ Filtering re-enabled for ${environment} environment`);
  }, durationMinutes * 60 * 1000);
}

// Usage
await enableEnvironmentLogging('dev', 60); // Enable dev logs for 1 hour

Real-World Cost Impact

A company with 5 environments (dev, qa, staging, pre-prod, prod) was ingesting:

  • Production: 200 GB/month (needed)
  • Dev: 150 GB/month (rarely needed)
  • QA: 120 GB/month (rarely needed)
  • Staging: 80 GB/month (occasionally needed)
  • Pre-prod: 50 GB/month (occasionally needed)

Before pipeline rules: 600 GB/month × $0.30 = $180/month data ingest

After pipeline rules:

  • Production: 200 GB (always on)
  • Dev/QA: 5 GB (on-demand only, ~97% reduction)
  • Staging: 20 GB (25% sampling)
  • Pre-prod: 40 GB (80% sampling)
  • Total: 265 GB/month × $0.30 = $79.50/month
  • Savings: 56% reduction in data ingest costs

Multi-Destination Routing

For routing to external destinations (S3, SIEM, etc.), use log forwarders:

# Infrastructure agent logging.d/app-logs.yml
logs:
  - name: error-logs
    file: /var/log/app/errors.log
    attributes:
      log_type: error
      priority: high

  - name: security-logs
    file: /var/log/app/security.log
    attributes:
      log_type: security
      route_to: siem

  - name: standard-logs
    file: /var/log/app/app.log
    attributes:
      log_type: standard
      priority: low

Then use Pipeline Cloud Rules in New Relic to manage what stays in hot storage versus what gets exported to cold storage via Data Plus streaming features.

Important notes on Pipeline Cloud Rules:

  • Available with Advanced Compute add-on (required after January 7, 2026)
  • Existing drop rules will be automatically migrated by New Relic
  • Use DELETE FROM syntax (not the old DROP_DATA action)
  • More flexible and performant than legacy drop rules
  • Rules apply only to data arriving after creation (no retroactive deletion)

Impact: This approach saves on both data ingest charges AND CCU consumption, since you're querying far less data overall.

3. Retention Policies by Event Type

 

New Relic offers different retention periods based on data type and your plan (Standard vs Data Plus). You can further control retention using drop rules:

# Create retention rules via NerdGraph
mutation {
  nrqlDropRulesCreate(
    accountId: YOUR_ACCOUNT_ID
    rules: [
      {
        action: DROP_DATA_ON_INGEST
        nrql: "SELECT * FROM Log WHERE event_type NOT IN ('error', 'security_event') AND timestamp < ago(7 days)"
        description: "Drop standard logs older than 7 days"
      }
    ]
  ) {
    successes { id }
  }
}

Or leverage Data Plus for extended retention:

  • Standard data option: 8 days default retention for logs
  • Data Plus option: Up to 90 days extended retention with better query limits

Impact: Keep critical data longer without exploding costs. Data Plus costs $0.50/GB vs $0.30/GB for standard, but the extended retention and 3X query limits often justify the premium for high-value data.

4. Field-Level Filtering

 

Remove or redact unnecessary fields before ingest:

const sensitiveFields = ['password', 'credit_card', 'ssn'];
const verboseFields = ['full_stack_trace', 'request_body'];

function sanitizeLog(log: any) {
  const clean = { ...log };

  // Redact sensitive data
  sensitiveFields.forEach(field => {
    if (clean[field]) clean[field] = '[REDACTED]';
  });

  // Remove verbose fields in production
  if (process.env.NODE_ENV === 'production') {
    verboseFields.forEach(field => {
      delete clean[field];
    });
  }

  return clean;
}

Impact: Reduce log size per event, comply with privacy regulations, lower ingest costs.

5. Filtering Repetitive Noise

 

Structured logging makes it easy to identify and eliminate repetitive log entries that don't add value:

# Example: Drop repeated cert-manager errors after the first occurrence per hour
mutation {
  pipelineControlCreateRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Deduplicate cert-manager errors"
      description: "Keep first occurrence per hour, drop repeats"
      enabled: true
      nrql: "SELECT count(*) FROM Log WHERE service.name = 'cert-manager' AND message LIKE '%ACME authorization failed%' FACET message HAVING count(*) > 1"
      action: DROP
    }
  ) {
    rule { id }
  }
}

Or use application-level deduplication:

// Log deduplication utility
class LogDeduplicator {
  private recentLogs = new Map<string, number>();
  private readonly windowMs = 60000; // 1 minute

  shouldLog(event: string, key: string): boolean {
    const logKey = `${event}:${key}`;
    const lastSeen = this.recentLogs.get(logKey);
    const now = Date.now();

    if (lastSeen && (now - lastSeen) < this.windowMs) {
      return false; // Skip duplicate within window
    }

    this.recentLogs.set(logKey, now);
    return true;
  }

  // Cleanup old entries periodically
  cleanup() {
    const now = Date.now();
    for (const [key, timestamp] of this.recentLogs.entries()) {
      if (now - timestamp > this.windowMs) {
        this.recentLogs.delete(key);
      }
    }
  }
}

const deduplicator = new LogDeduplicator();

// Usage
if (deduplicator.shouldLog('cert_error', certDomain)) {
  logger.error({
    event: 'cert_manager_error',
    domain: certDomain,
    error: 'ACME authorization failed',
    message: 'First occurrence in this time window - investigate'
  });
}

Common sources of log noise to filter:

  • Health check endpoints: /health, /ready, /alive requests (unless they fail)
  • Static asset 404s: Bots scanning for wp-admin, phpmyadmin, etc.
  • Kubernetes system events: Repetitive pod scheduling logs, node status updates
  • Azure/AKS infrastructure: System container logs from kube-proxy, coredns, metrics-server
  • Certificate renewal attempts: cert-manager retry loops
  • Liveness probe failures: When a pod is already marked unhealthy

Impact: Companies typically find that 40-60% of their log volume is repetitive noise that provides minimal value. Filtering this noise can cut costs almost in half while actually improving signal-to-noise ratio for genuine issues.

Beyond Cost: Better Alerting and Insights

 

The real value of structured logging goes beyond just saving money:

Precise Alerting

 

Before: Alert when log message contains "error" or "failed"

  • Problem: Too noisy, many false positives

After: Alert on level:error AND service:payment-api AND error_type:DatabaseConnectionTimeout

  • Result: Actionable, low false-positive alerts (see the sketch below)
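Here's a small sketch of how that condition can be assembled from structured fields rather than substring matching. The attribute names follow the example above; adapt them to your own log schema:

// Build an NRQL-style alert query from structured fields instead of
// matching substrings in free-form messages.
function buildAlertQuery(filters: Record<string, string>): string {
  const where = Object.entries(filters)
    .map(([field, value]) => `${field} = '${value}'`)
    .join(' AND ');
  return `SELECT count(*) FROM Log WHERE ${where}`;
}

console.log(buildAlertQuery({
  level: 'error',
  service: 'payment-api',
  error_type: 'DatabaseConnectionTimeout',
}));
// SELECT count(*) FROM Log WHERE level = 'error' AND service = 'payment-api' AND error_type = 'DatabaseConnectionTimeout'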

Pattern Detection

 

Structured logs enable machine learning for anomaly detection:

# Example: detect unusual error patterns
from sklearn.ensemble import IsolationForest

# Feature extraction from structured logs
features = logs_df[['status_code', 'duration_ms', 'error_count_last_hour']]

# Train anomaly detector
clf = IsolationForest(contamination=0.1)
clf.fit(features)

# Detect anomalies in real-time
anomalies = clf.predict(new_logs_features)

This is nearly impossible with unstructured strings.

Distributed Tracing Correlation

 

With consistent trace_id fields:

logger.info({
  event: 'api_request',
  trace_id: req.headers['x-trace-id'],
  span_id: generateSpanId(),
  parent_span_id: req.headers['x-parent-span-id'],
  ...
});

Your logs automatically connect with traces, giving you end-to-end visibility.

Business Analytics

 

Structured logs become a data source for product analytics:

-- Average checkout time by payment method
SELECT 
  payment_method,
  AVG(duration_ms) as avg_duration,
  COUNT(*) as total_checkouts
FROM logs
WHERE event = 'checkout_completed'
  AND timestamp > NOW() - INTERVAL '24 hours'
GROUP BY payment_method;

Your observability platform becomes a lightweight analytics tool.

Common Pitfalls to Avoid

 

Over-structuring: Don't create 50 fields when 10 will do. Focus on queryable, actionable data.

Inconsistent field names: user_id in one service and userId in another breaks aggregation. Standardize early, but if you inherit inconsistent naming across multiple services, you don't have to refactor everything immediately: you can normalize names in a thin logging wrapper before data leaves the service, or rename fields at the log-forwarder layer before they're ingested.

// Example: a thin wrapper around the New Relic Node.js agent API that
// normalizes attribute names before errors are reported

// newrelic.js (agent config)
module.exports = {
  application_logging: {
    forwarding: {
      enabled: true
    }
  }
};

// error-reporting.js
const newrelic = require('newrelic');

function reportError(error, customAttributes = {}) {
  // Normalize field names before sending to New Relic
  newrelic.noticeError(error, normalizeAttributes(customAttributes));
}

// Normalization helper
function normalizeAttributes(attrs) {
  const mapping = {
    'userId': 'user_id',
    'requestId': 'request_id',
    'orderId': 'order_id',
    // Add your mappings here
  };

  const normalized = {};
  for (const [key, value] of Object.entries(attrs)) {
    const normalizedKey = mapping[key] || key;
    normalized[normalizedKey] = value;
  }
  return normalized;
}

Or use log forwarders like Fluent Bit with preprocessing:

[FILTER]
    Name modify
    Match *
    Rename userId user_id
    Rename requestId request_id
    Rename customerId customer_id

Why this matters: Field name normalization at the ingestion layer reduces engineering team burden. Teams can adopt structured logging incrementally without coordinating a massive refactor across all services simultaneously. The transformation happens transparently, and your dashboards and alerts work consistently across all services.

Logging sensitive data: Structure makes it easier to accidentally log PII. With GDPR, DORA, and similar regulations, it's critical to handle personal data appropriately. Instead of logging full email addresses or IP addresses, partially anonymize them before sending them to your observability tool.

// Utility functions for partial anonymization
function maskEmail(email: string): string {
  if (!email || !email.includes('@')) return '[invalid-email]';
  const [local, domain] = email.split('@');

  // Show first 2 chars and mask the rest before @
  const maskedLocal = local.length > 2 
    ? local.substring(0, 2) + '*'.repeat(local.length - 2)
    : local;

  return `${maskedLocal}@${domain}`;
}

function maskIP(ip: string): string {
  // For IPv4: mask last octet (e.g., 192.168.1.xxx)
  const parts = ip.split('.');
  if (parts.length === 4) {
    return `${parts[0]}.${parts[1]}.${parts[2]}.xxx`;
  }

  // For IPv6: mask last 4 groups
  const ipv6Parts = ip.split(':');
  if (ipv6Parts.length >= 4) {
    return ipv6Parts.slice(0, 4).join(':') + ':xxxx:xxxx:xxxx:xxxx';
  }

  return 'xxx.xxx.xxx.xxx';
}

function maskCreditCard(cc: string): string {
  // Show only last 4 digits
  return '**** **** **** ' + cc.slice(-4);
}

// Use in your logging
logger.info({
  event: 'user_login_failed',
  email: maskEmail(email), // "us**@example.com" instead of "user@example.com"
  ip_address: maskIP(ipAddress), // "192.168.1.xxx" instead of "192.168.1.105"
  reason: 'invalid_password',
  timestamp: new Date().toISOString()
});

New Relic also provides built-in obfuscation features:

// Configure log obfuscation rules in New Relic
// This can be done via the UI or NerdGraph API
const obfuscationRule = `
mutation {
  logConfigurationsCreateObfuscationRule(
    accountId: YOUR_ACCOUNT_ID
    rule: {
      name: "Mask PII in logs"
      description: "Hash emails and mask IPs for GDPR compliance"
      enabled: true
      filter: "source:'application_logs'"
      actions: [
        {
          attributes: ["email"]
          method: HASH_SHA256
          expression: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
        },
        {
          attributes: ["ip_address", "client_ip"]
          method: MASK
          expression: "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"
        }
      ]
    }
  ) {
    rule { id name }
  }
}
`;

Why this matters:

  • GDPR Article 32 requires "pseudonymisation and encryption of personal data"
  • DORA (Digital Operational Resilience Act) mandates protecting sensitive operational data
  • Partial anonymization lets you retain usefulness (you can still see patterns, like "multiple failures from same domain") without storing full PII
  • Using hashing (SHA-256) allows you to search for specific users if you have their original email, while masking makes data completely anonymous
  • New Relic automatically obfuscates credit card and SSN patterns, but custom rules give you control over other PII

Always use allowlists, not denylists when deciding what to log. It's safer to explicitly specify "log these specific fields" rather than "log everything except these fields."
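A minimal allowlist wrapper makes that policy explicit in code (a sketch; the field names and the Pino logger are illustrative):

import pino from 'pino';

const logger = pino();

// Only fields named here ever reach the log line; everything else is dropped.
const ALLOWED_FIELDS = ['event', 'user_id', 'status_code', 'duration_ms', 'trace_id'];

function pickAllowedFields(data: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const field of ALLOWED_FIELDS) {
    if (field in data) safe[field] = data[field];
  }
  return safe;
}

// Unknown or sensitive fields (like a raw password) never leak into the log
logger.info(pickAllowedFields({
  event: 'api_request',
  user_id: 42,
  password: 'hunter2', // silently excluded by the allowlist
}));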

Ignoring cardinality: Fields with millions of unique values (like full URLs) can explode index sizes. Hash or truncate.
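For example, you might keep the low-cardinality route path and hash the full URL before logging it. A sketch using Node's built-in crypto, reusing the logger from the previous snippet (the helper name is made up):

import { createHash } from 'crypto';

// Keep a queryable, low-cardinality path plus a short hash of the full URL
// so unbounded query strings don't blow up the index.
function logFriendlyUrl(fullUrl: string): { url_path: string; url_hash: string } {
  const parsed = new URL(fullUrl);
  return {
    url_path: parsed.pathname, // e.g. '/api/orders'
    url_hash: createHash('sha256').update(fullUrl).digest('hex').slice(0, 12),
  };
}

logger.info({
  event: 'api_request',
  ...logFriendlyUrl('https://shop.example.com/api/orders?id=123&utm_source=ad'),
});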

Forgetting humans: Include a readable message field. JSON is great for machines, but humans debug too.


Don't log more — log smarter. Structure turns noise into value.

Structured logging isn't just a technical upgrade — it's a strategic shift in how you think about observability. By treating logs as data from the start, you unlock better insights, faster debugging, and dramatically lower costs.

Start small: pick your highest-volume service, add structured logging, and measure the impact. You'll likely see meaningful cost savings within the first month. More importantly, you'll build a foundation for observability that scales with your business, not against it.

The goal isn't to log everything. It's to log the right things, in the right format, at the right time. Get that right, and your logs become an asset, not an expense.


Annex: A Suggested Implementation Roadmap

 

Week 1-2: Audit and Plan

  • Review current logging patterns and costs
  • Identify highest-volume log sources
  • Choose a structured logging library (Pino, Winston, Bunyan)
  • Define standard fields and event taxonomy

Week 3-4: Pilot Implementation

  • Start with one service or module
  • Implement structured logging with sampling
  • Set up routing and retention policies
  • Monitor cost and performance impact

Month 2: Expand and Refine

  • Roll out to additional services
  • Tune sampling rates based on data
  • Build dashboards using structured fields
  • Train team on new logging practices

Month 3: Optimize and Automate

  • Implement automated field standardization
  • Set up ML-based anomaly detection
  • Create logging guidelines and templates
  • Establish logging review in code reviews

Ongoing: Measure and Improve

  • Monthly cost reviews
  • Quarterly event taxonomy updates
  • Continuous sampling rate optimization
