Nikhil Kassetty
Usage Based Billing: A Practical Guide for Engineers

Usage-based billing systems power some of the largest companies in tech: AWS bills for compute usage, Stripe processes transaction-based fees, and Twilio meters every API call. These systems must handle millions of events per second while maintaining perfect financial accuracy.

This article presents a complete system design for a production-grade usage billing platform, covering architecture, data models, and scaling strategies used by companies like Stripe, AWS, and Adyen.

System Requirements

Functional Requirements

  1. Capture usage events from multiple sources
  2. Aggregate usage across time windows
  3. Apply tiered and volume-based pricing rules
  4. Generate accurate invoices
  5. Handle late-arriving events
  6. Support multiple currencies

Non-Functional Requirements

  • Throughput: 100,000 events/second
  • Latency: < 100ms for event ingestion
  • Accuracy: 100% financial correctness
  • Availability: 99.99% uptime

Capacity Estimates

  • 10 million active customers
  • 100,000 events/second (8.6 billion/day)
  • 1KB per event = 8.6TB/day
  • 7 years retention = ~22PB total
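These back-of-the-envelope numbers are easy to verify (assuming 1 KB = 1,000 bytes and 365-day years):

```python
EVENTS_PER_SEC = 100_000
BYTES_PER_EVENT = 1_000                            # ~1KB per event

events_per_day = EVENTS_PER_SEC * 86_400           # seconds per day
tb_per_day = events_per_day * BYTES_PER_EVENT / 1e12
pb_total = tb_per_day * 365 * 7 / 1e3              # 7 years of retention

print(events_per_day)       # 8,640,000,000 -> ~8.6 billion events/day
print(tb_per_day)           # 8.64 -> ~8.6 TB/day
print(round(pb_total, 1))   # ~22.1 PB over 7 years
```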

High-Level Architecture


  1. Event Ingestion: Validates and deduplicates incoming events
  2. Event Streaming: Kafka provides replay capability and decoupling
  3. Metering Pipeline: Aggregates raw events into billable usage
  4. Rating Engine: Applies pricing rules to usage quantities
  5. Billing Engine: Generates invoices with tax and currency handling
  6. Reconciliation: Verifies every event is accounted for

Key Design Decision: We separate metering from billing because they have different SLAs and scaling requirements. Metering is real-time; billing is batch-oriented.

Component 1: Event Ingestion
The ingestion layer handles burst traffic and ensures exactly-once processing.
API Contract:

```json
POST /v1/usage/events
{
  "idempotencyKey": "evt_1a2b3c",
  "customerId": "cust_123",
  "meterId": "api_calls",
  "quantity": 1,
  "timestamp": "2025-02-01T10:05:23Z"
}
```

Validation Steps:

  • Check required fields exist
  • Verify timestamp is within acceptable range
  • Confirm meterId is configured
  • Check idempotency key for duplicates
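A minimal sketch of these checks in Python; the meter catalog and the 24-hour acceptance window are illustrative assumptions, not fixed requirements:

```python
from datetime import datetime, timedelta, timezone

CONFIGURED_METERS = {"api_calls", "storage_gb"}   # illustrative meter catalog
MAX_EVENT_AGE = timedelta(hours=24)               # assumed acceptable range

def validate_event(event: dict) -> list:
    """Return a list of validation errors; empty means the event is accepted."""
    errors = []
    for field in ("idempotencyKey", "customerId", "meterId", "quantity", "timestamp"):
        if field not in event:
            errors.append(f"missing field: {field}")
    if errors:
        return errors
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    now = datetime.now(timezone.utc)
    # Reject events too old to bill or stamped in the future (small clock skew allowed)
    if not (now - MAX_EVENT_AGE <= ts <= now + timedelta(minutes=5)):
        errors.append("timestamp outside acceptable range")
    if event["meterId"] not in CONFIGURED_METERS:
        errors.append(f"unknown meter: {event['meterId']}")
    return errors
```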

Deduplication Strategy:

We use Redis with a 24-hour TTL to catch duplicate events. The idempotency key prevents double-charging even if the same event is sent multiple times.
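In real Redis this check is a single atomic `SET key value NX EX 86400`, which succeeds only if the key is absent. The sketch below substitutes a tiny in-memory stand-in so the logic is self-contained:

```python
import time

class InMemoryRedis:
    """Tiny stand-in for Redis SET key value NX EX -- for illustration only."""
    def __init__(self):
        self._store = {}

    def set_nx_ex(self, key, value, ttl_seconds):
        now = time.monotonic()
        # Purge the key if its TTL has lapsed
        expiry = self._store.get(key)
        if expiry is not None and expiry <= now:
            del self._store[key]
        if key in self._store:
            return False          # duplicate: key already present
        self._store[key] = now + ttl_seconds
        return True

DEDUP_TTL = 24 * 3600  # 24-hour window, as described above

def is_first_delivery(redis, idempotency_key: str) -> bool:
    # In production: redis.set(f"idem:{key}", "1", nx=True, ex=DEDUP_TTL)
    return redis.set_nx_ex(f"idem:{idempotency_key}", "1", DEDUP_TTL)
```

The second delivery of the same idempotency key returns `False`, and the event is acknowledged without being re-emitted downstream.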

Component 2: Metering Pipeline
The metering pipeline aggregates raw events into usage totals.

Database Schema:

```sql
CREATE TABLE meter_usage (
    customer_id VARCHAR(50),
    meter_id VARCHAR(50),
    period_start TIMESTAMP,
    period_end TIMESTAMP,
    quantity DECIMAL(20,6),
    event_count INTEGER,
    PRIMARY KEY (customer_id, meter_id, period_start)
);
```

Aggregation Process:

Each event increments a counter for the customer's billing period. We use database UPSERT to atomically increment usage values.
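A sketch of the atomic increment using UPSERT syntax, here against an in-memory SQLite database for illustration (a production store would use true DECIMAL columns rather than SQLite's REAL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meter_usage (
        customer_id  TEXT,
        meter_id     TEXT,
        period_start TEXT,
        quantity     REAL,
        event_count  INTEGER,
        PRIMARY KEY (customer_id, meter_id, period_start)
    )
""")

def record_usage(customer_id, meter_id, period_start, quantity):
    # UPSERT: insert the first event for a period, or atomically
    # increment the running totals for subsequent events
    conn.execute("""
        INSERT INTO meter_usage VALUES (?, ?, ?, ?, 1)
        ON CONFLICT (customer_id, meter_id, period_start)
        DO UPDATE SET quantity    = quantity + excluded.quantity,
                      event_count = event_count + 1
    """, (customer_id, meter_id, period_start, quantity))
```

Because the read-modify-write happens inside the database, concurrent consumers cannot lose increments to each other.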

Handling Late Events:

When events arrive after the billing period closes, we create adjustments instead of modifying finalized invoices. This maintains audit integrity.

Component 3: Rating Engine

The rating engine converts usage quantities into dollar amounts using pricing rules.

Pricing Models:

Three common models:

  • Tiered: Different rates for different ranges (most common)
  • Volume: Single rate based on total volume
  • Flat: Same price per unit always
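A sketch of tiered rating with `Decimal` arithmetic; the tier boundaries and prices here are illustrative:

```python
from decimal import Decimal

# Illustrative tier table: (range_start, range_end, unit_price);
# range_end=None means "no upper bound", mirroring the NULL-as-infinity
# convention in the pricing_tiers schema
TIERS = [
    (Decimal("0"),    Decimal("1000"), Decimal("0.010")),
    (Decimal("1000"), None,            Decimal("0.005")),
]

def rate_tiered(quantity: Decimal) -> Decimal:
    """Charge each tier's rate only for the units that fall inside that tier."""
    total = Decimal("0")
    for start, end, price in TIERS:
        if quantity <= start:
            break
        upper = quantity if end is None else min(quantity, end)
        total += (upper - start) * price
    return total
```

For 1,500 units this charges 1,000 units at $0.010 plus 500 units at $0.005. Volume pricing, by contrast, would apply the single rate of the tier the total lands in to all 1,500 units.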

Database Schema:

```sql
CREATE TABLE pricing_tiers (
    plan_id VARCHAR(50),
    meter_id VARCHAR(50),
    range_start DECIMAL(20,6),
    range_end DECIMAL(20,6),  -- NULL = infinity
    unit_price DECIMAL(10,6)
);
```

Critical Implementation Detail:

Always use DECIMAL types for financial calculations. Never use FLOAT or DOUBLE as they introduce rounding errors that compound across millions of transactions.
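The drift is easy to demonstrate:

```python
from decimal import Decimal

# Summing a $0.10 charge ten times with floats drifts immediately
float_total = sum([0.1] * 10)
print(float_total)            # 0.9999999999999999, not 1.0

# Decimal keeps exact base-10 values, as billing requires
decimal_total = sum([Decimal("0.10")] * 10)
print(decimal_total)          # Decimal('1.00') exactly
```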

Component 4: Invoice Generation

The billing engine combines rated usage across all meters into a final invoice.

Invoice Data Model:

Each invoice contains:

  • Header (customer, period, currency, status)
  • Line items (one per meter with usage and amount)
  • Totals (subtotal, tax, final total)

Generation Process:

  1. Fetch all usage for customer's billing period
  2. Apply rating rules to each meter
  3. Calculate subtotal
  4. Apply taxes based on customer location
  5. Create invoice record with DRAFT status
  6. Allow 1-hour grace period for late events
  7. Finalize invoice (becomes immutable)
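Steps 3 and 4 can be sketched as follows; the tax rate is assumed to be supplied by a location lookup that is out of scope here:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def build_invoice_totals(line_items, tax_rate):
    """Compute subtotal, tax, and total from rated line items (shapes illustrative)."""
    subtotal = sum((item["amount"] for item in line_items), Decimal("0"))
    # Round tax to the cent explicitly; never rely on float rounding
    tax = (subtotal * tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return {"subtotal": subtotal, "tax": tax, "total": subtotal + tax}
```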

Component 5: Reconciliation

Reconciliation ensures financial accuracy by verifying every event is accounted for.

Three-Level Verification:

  1. Event Count Check: Raw events = Aggregated usage count
  2. Amount Check: Recalculated amounts = Invoiced amounts
  3. Balance Check: Sum of invoices = Account balance

If any check fails, the system creates an alert for investigation. This catches data loss, pricing bugs, or calculation errors before customers are affected.
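The three checks can be sketched as a single function over precomputed aggregates (the input shapes are illustrative):

```python
def reconcile(raw_event_count, aggregated_event_count,
              recalculated_amounts, invoiced_amounts,
              invoice_sum, account_balance):
    """Run the three verification levels; return the names of any that failed."""
    failures = []
    if raw_event_count != aggregated_event_count:
        failures.append("event_count")      # events were lost or double-counted
    if recalculated_amounts != invoiced_amounts:
        failures.append("amount")           # pricing bug or stale rate applied
    if invoice_sum != account_balance:
        failures.append("balance")          # ledger and invoices disagree
    return failures                         # non-empty -> raise an alert
```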

Scaling Strategies

Event Ingestion:

  • Deploy multiple API instances behind load balancer
  • Each instance handles 10k req/sec
  • Scale to 10+ instances for 100k req/sec target

Kafka Consumers:

  • Partition topic by customer_id for ordered processing
  • Run one consumer per partition
  • Add partitions dynamically as throughput grows

Database Sharding:

```python
import hashlib

def get_shard(customer_id: str, num_shards: int = 16) -> int:
    # Python's built-in hash() is randomized per process, so use a stable
    # digest to route a customer to the same shard across restarts and hosts
    digest = hashlib.md5(customer_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Handling Traffic Spikes
Rate Limiting:
Limit events per customer to prevent abuse and protect the system.

Queue Backpressure:
Monitor Kafka lag. If lag exceeds threshold, slow down ingestion or temporarily reject events with retry-after headers.
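One common implementation of the per-customer limit is a token bucket; the rate and capacity below are illustrative:

```python
import time

class TokenBucket:
    """Per-customer rate limiter: refill at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond 429 with a Retry-After header
```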

Failure Handling
Dead Letter Queue:
When event processing fails after retries, send to DLQ for manual review:

  • Transient errors (DB timeout, network issues): Retry with exponential backoff
  • Permanent errors (invalid data): Send immediately to DLQ
  • Max retries exceeded: Send to DLQ after 3 attempts

This ensures no events are silently dropped while preventing infinite retry loops.
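The retry policy above can be sketched as follows (the handler and queue interfaces are illustrative):

```python
import time

MAX_RETRIES = 3

class PermanentError(Exception):
    """Invalid data that no amount of retrying can fix."""

def process_with_retries(event, handler, dead_letter_queue, sleep=time.sleep):
    """Retry transient failures with exponential backoff; route the rest to the DLQ."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return handler(event)
        except PermanentError:
            dead_letter_queue.append(event)      # invalid data: straight to DLQ
            return None
        except Exception:
            if attempt == MAX_RETRIES:
                break                            # retries exhausted
            sleep(2 ** (attempt - 1))            # 1s then 2s between attempts
    dead_letter_queue.append(event)
    return None
```

Injecting `sleep` keeps the backoff testable; production code would also cap the total delay and emit metrics on every DLQ write.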

Real-World Patterns
Stripe's Approach:

  • API-first metering: customers report usage via REST API
  • Mandatory idempotency keys on all requests
  • 1-hour draft period before invoice finalization
  • Webhook notifications for all invoice lifecycle events

AWS's Approach:

  • Service-side metering: each AWS service emits usage events
  • Hourly aggregation with daily rollups
  • Separate pricing catalog service
  • Complex reserved instance calculations

Twilio's Approach:

  • Real-time usage visibility in dashboard
  • Sub-account isolation for resellers
  • Usage alerts to prevent bill shock
  • Programmatic usage queries via API

Key Takeaways

  • Use event streaming (Kafka) for replay capability and decoupling
  • Implement idempotency at ingestion to prevent duplicate charges
  • Always use Decimal types for financial calculations, never floats
  • Build comprehensive reconciliation to catch errors before customers notice
  • Design for late events from day one with adjustments and grace periods
  • Separate metering from billing - they have different scaling requirements
  • Monitor consumer lag and implement backpressure handling

Conclusion
Building a production-grade usage-based billing system requires careful attention to event reliability, financial accuracy, and scalability. The architecture presented here handles 100,000+ events per second while maintaining the precision required for revenue-critical systems.

The key is separating concerns cleanly (ingestion, metering, rating, billing, reconciliation) and treating each as an independent service that can scale horizontally. With proper idempotency, reconciliation, and failure handling, this system can reliably process billions of events while maintaining financial integrity.
