Usage-based billing systems power some of the largest companies in tech: AWS bills for compute usage, Stripe processes transaction-based fees, and Twilio meters every API call. These systems must handle millions of events per second while maintaining perfect financial accuracy.
This article presents a complete system design for a production-grade usage billing platform, covering architecture, data models, and scaling strategies used by companies like Stripe, AWS, and Adyen.
System Requirements
Functional Requirements
- Capture usage events from multiple sources
- Aggregate usage across time windows
- Apply tiered and volume-based pricing rules
- Generate accurate invoices
- Handle late-arriving events
- Support multiple currencies
Non-Functional Requirements
- Throughput: 100,000 events/second
- Latency: < 100ms for event ingestion
- Accuracy: 100% financial correctness
- Availability: 99.99% uptime
Capacity Estimates
- 10 million active customers
- 100,000 events/second (8.6 billion/day)
- 1KB per event = 8.6TB/day
- 7 years retention = ~22PB total
High-Level Architecture
- Event Ingestion: Validates and deduplicates incoming events
- Event Streaming: Kafka provides replay capability and decoupling
- Metering Pipeline: Aggregates raw events into billable usage
- Rating Engine: Applies pricing rules to usage quantities
- Billing Engine: Generates invoices with tax and currency handling
- Reconciliation: Verifies every event is accounted for
Key Design Decision: We separate metering from billing because they have different SLAs and scaling requirements: metering is real-time, while billing is batch-oriented.
Component 1: Event Ingestion
The ingestion layer absorbs burst traffic and, together with idempotency keys, ensures each event is processed exactly once.
API Contract:
`POST /v1/usage/events`

```json
{
  "idempotencyKey": "evt_1a2b3c",
  "customerId": "cust_123",
  "meterId": "api_calls",
  "quantity": 1,
  "timestamp": "2025-02-01T10:05:23Z"
}
```
Validation Steps:
- Check required fields exist
- Verify timestamp is within acceptable range
- Confirm meterId is configured
- Check idempotency key for duplicates
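The validation steps above can be sketched as a single function. The field names follow the API contract shown earlier; the 24-hour timestamp window and the meter registry are assumptions for illustration, not part of any specific API:

```python
from datetime import datetime, timedelta, timezone

CONFIGURED_METERS = {"api_calls", "storage_gb"}  # hypothetical meter registry
MAX_EVENT_AGE = timedelta(hours=24)              # assumed acceptable range

def validate_event(event: dict) -> list[str]:
    """Return a list of validation errors (empty list means the event is valid)."""
    errors = []
    for field in ("idempotencyKey", "customerId", "meterId", "quantity", "timestamp"):
        if field not in event:
            errors.append(f"missing field: {field}")
    if errors:
        return errors  # can't validate further without the required fields
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    now = datetime.now(timezone.utc)
    if ts > now or now - ts > MAX_EVENT_AGE:
        errors.append("timestamp outside acceptable range")
    if event["meterId"] not in CONFIGURED_METERS:
        errors.append(f"unknown meter: {event['meterId']}")
    return errors
```

Returning a list of errors (rather than failing on the first one) lets the API reject an event with every problem reported at once.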
Deduplication Strategy:
We use Redis with a 24-hour TTL to catch duplicate events. The idempotency key prevents double-charging even if the same event is sent multiple times.
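A minimal sketch of the Redis check, assuming a redis-py-style client: `SET` with `NX` and `EX` both claims the key and sets its TTL in one atomic round trip, avoiding a check-then-set race between concurrent ingestion instances. The `dedup:` key prefix is an assumption:

```python
DEDUP_TTL_SECONDS = 24 * 60 * 60  # 24-hour deduplication window

def is_duplicate(redis_client, idempotency_key: str) -> bool:
    """Atomically claim the key; True means an earlier event already claimed it."""
    # SET NX returns a truthy value only if the key did not exist yet.
    claimed = redis_client.set(f"dedup:{idempotency_key}", "1",
                               nx=True, ex=DEDUP_TTL_SECONDS)
    return not claimed
```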
Component 2: Metering Pipeline
The metering pipeline aggregates raw events into usage totals.
Database Schema:
```sql
CREATE TABLE meter_usage (
    customer_id  VARCHAR(50),
    meter_id     VARCHAR(50),
    period_start TIMESTAMP,
    period_end   TIMESTAMP,
    quantity     DECIMAL(20,6),
    event_count  INTEGER,
    PRIMARY KEY (customer_id, meter_id, period_start)
);
```
Aggregation Process:
Each event increments a counter for the customer's billing period. We use database UPSERT to atomically increment usage values.
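The UPSERT can be sketched against the schema above. SQLite stands in for the production database here (its `ON CONFLICT ... DO UPDATE` mirrors PostgreSQL's); a real deployment would use `DECIMAL` columns rather than SQLite's floating-point `REAL`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meter_usage (
        customer_id  TEXT,
        meter_id     TEXT,
        period_start TEXT,
        quantity     REAL,
        event_count  INTEGER,
        PRIMARY KEY (customer_id, meter_id, period_start)
    )
""")

def record_usage(customer_id, meter_id, period_start, quantity):
    """Atomically insert or increment the usage row for this billing period."""
    conn.execute("""
        INSERT INTO meter_usage (customer_id, meter_id, period_start, quantity, event_count)
        VALUES (?, ?, ?, ?, 1)
        ON CONFLICT (customer_id, meter_id, period_start)
        DO UPDATE SET quantity    = quantity + excluded.quantity,
                      event_count = event_count + 1
    """, (customer_id, meter_id, period_start, quantity))

record_usage("cust_123", "api_calls", "2025-02-01", 5)
record_usage("cust_123", "api_calls", "2025-02-01", 3)
```

Because the increment happens inside a single statement, two concurrent events for the same customer cannot lose an update.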
Handling Late Events:
When events arrive after the billing period closes, we create adjustments instead of modifying finalized invoices. This maintains audit integrity.
Component 3: Rating Engine
The rating engine converts usage quantities into dollar amounts using pricing rules.
Pricing Models:
Three common models:
- Tiered: Different rates for different ranges (most common)
- Volume: Single rate based on total volume
- Flat: Same price per unit always
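The difference between tiered and volume pricing is easy to get wrong, so a sketch helps. The tier boundaries and prices below are hypothetical, following the `pricing_tiers` layout (with `None` standing in for the NULL-means-infinity convention):

```python
from decimal import Decimal

# Hypothetical tiers: (range_start, range_end, unit_price); None = infinity.
TIERS = [
    (Decimal("0"),     Decimal("1000"),  Decimal("0.10")),
    (Decimal("1000"),  Decimal("10000"), Decimal("0.05")),
    (Decimal("10000"), None,             Decimal("0.01")),
]

def rate_tiered(quantity: Decimal) -> Decimal:
    """Tiered: each unit is priced at the rate of the tier it falls into."""
    total = Decimal("0")
    for start, end, price in TIERS:
        if quantity <= start:
            break
        upper = quantity if end is None else min(quantity, end)
        total += (upper - start) * price
    return total

def rate_volume(quantity: Decimal) -> Decimal:
    """Volume: the entire quantity is priced at the rate of the tier it reaches."""
    for start, end, price in TIERS:
        if end is None or quantity <= end:
            return quantity * price
    return Decimal("0")
```

For 2,500 units these give different answers: tiered charges 1,000 × $0.10 + 1,500 × $0.05 = $175, while volume charges all 2,500 units at the $0.05 rate, i.e. $125.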
Database Schema:
```sql
CREATE TABLE pricing_tiers (
    plan_id     VARCHAR(50),
    meter_id    VARCHAR(50),
    range_start DECIMAL(20,6),
    range_end   DECIMAL(20,6), -- NULL = infinity
    unit_price  DECIMAL(10,6)
);
```
Critical Implementation Detail:
Always use DECIMAL types for financial calculations. Never use FLOAT or DOUBLE as they introduce rounding errors that compound across millions of transactions.
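The compounding error is easy to demonstrate in Python's `decimal` module:

```python
from decimal import Decimal

# Summing $0.10 a million times with floats drifts away from the exact total,
# because 0.1 has no exact binary representation...
float_total = sum(0.1 for _ in range(1_000_000))

# ...while Decimal stays exact, which is why billing code uses it end to end.
decimal_total = sum(Decimal("0.1") for _ in range(1_000_000))
```

The float sum lands a fraction of a cent away from $100,000 — invisible on one invoice, but a real discrepancy once reconciliation compares totals across millions of transactions.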
Component 4: Invoice Generation
The billing engine combines rated usage across all meters into a final invoice.
Invoice Data Model:
Each invoice contains:
- Header (customer, period, currency, status)
- Line items (one per meter with usage and amount)
- Totals (subtotal, tax, final total)
Generation Process:
- Fetch all usage for customer's billing period
- Apply rating rules to each meter
- Calculate subtotal
- Apply taxes based on customer location
- Create invoice record with DRAFT status
- Allow 1-hour grace period for late events
- Finalize invoice (becomes immutable)
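The generation steps above can be sketched end to end. The tax table and `rated_usage` shape are assumptions for illustration; in practice tax comes from a dedicated service and the amounts arrive from the rating engine:

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical tax table keyed by customer location.
TAX_RATES = {"US-CA": Decimal("0.0725")}

@dataclass
class Invoice:
    customer_id: str
    currency: str
    status: str = "DRAFT"
    line_items: list = field(default_factory=list)
    subtotal: Decimal = Decimal("0")
    tax: Decimal = Decimal("0")
    total: Decimal = Decimal("0")

def generate_invoice(customer_id, currency, rated_usage, tax_region):
    """rated_usage: {meter_id: amount} already produced by the rating engine."""
    inv = Invoice(customer_id, currency)
    for meter_id, amount in rated_usage.items():
        inv.line_items.append({"meter": meter_id, "amount": amount})
        inv.subtotal += amount
    inv.tax = (inv.subtotal * TAX_RATES.get(tax_region, Decimal("0"))).quantize(Decimal("0.01"))
    inv.total = inv.subtotal + inv.tax
    return inv  # stays DRAFT through the grace period

def finalize(invoice: Invoice):
    invoice.status = "FINALIZED"  # after this, corrections become adjustments
```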
Component 5: Reconciliation
Reconciliation ensures financial accuracy by verifying every event is accounted for.
Three-Level Verification:
- Event Count Check: Raw events = Aggregated usage count
- Amount Check: Recalculated amounts = Invoiced amounts
- Balance Check: Sum of invoices = Account balance
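The three checks reduce to straight equality comparisons over independently computed totals; the inputs below are assumed to come from separate pipelines so a bug in one cannot mask itself:

```python
from decimal import Decimal

def reconcile(raw_event_count: int, aggregated_event_count: int,
              recalculated_amount: Decimal, invoiced_amount: Decimal,
              invoice_sum: Decimal, account_balance: Decimal) -> list[str]:
    """Run the three checks; any returned string is an alert for investigation."""
    alerts = []
    if raw_event_count != aggregated_event_count:
        alerts.append("event count mismatch")       # data loss in the pipeline
    if recalculated_amount != invoiced_amount:
        alerts.append("amount mismatch")            # pricing or rating bug
    if invoice_sum != account_balance:
        alerts.append("balance mismatch")           # ledger inconsistency
    return alerts
```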
If any check fails, the system creates an alert for investigation. This catches data loss, pricing bugs, or calculation errors before customers are affected.
Scaling Strategies
Event Ingestion:
- Deploy multiple API instances behind load balancer
- Each instance handles 10k req/sec
- Scale to 10+ instances for 100k req/sec target
Kafka Consumers:
- Partition topic by customer_id for ordered processing
- Run one consumer per partition
- Add partitions dynamically as throughput grows
Database Sharding:
```python
import hashlib

NUM_SHARDS = 16

def get_shard(customer_id: str) -> int:
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it would route the same customer to different shards across restarts.
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```
Handling Traffic Spikes
Rate Limiting:
Limit events per second per customer to prevent abuse and protect the system.
Queue Backpressure:
Monitor Kafka lag. If lag exceeds threshold, slow down ingestion or temporarily reject events with retry-after headers.
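The admission decision can be sketched as a small function. The lag threshold and retry delay are illustrative; a real deployment would read lag from Kafka's consumer-group offsets:

```python
# Hypothetical thresholds for illustration.
MAX_LAG = 500_000          # events of consumer lag before shedding load
RETRY_AFTER_SECONDS = 30   # how long clients should wait before retrying

def admit_event(current_lag: int) -> tuple[int, dict]:
    """Return (HTTP status, headers): accept, or shed load with Retry-After."""
    if current_lag > MAX_LAG:
        return 429, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 202, {}
```

Because clients already retry with idempotency keys, temporarily rejecting events is safe: the same event will be accepted later without double-charging.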
Failure Handling
Dead Letter Queue:
When event processing fails after retries, send to DLQ for manual review:
- Transient errors (DB timeout, network issues): Retry with exponential backoff
- Permanent errors (invalid data): Send immediately to DLQ
- Max retries exceeded: Send to DLQ after 3 attempts
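The routing rules above can be sketched as one retry loop; the `PermanentError` class and injectable `sleep` are assumptions for illustration:

```python
MAX_ATTEMPTS = 3
BASE_DELAY = 1.0  # seconds

class PermanentError(Exception):
    """Raised for invalid data that no amount of retrying will fix."""

def process_with_retry(handler, event, dead_letter_queue, sleep=lambda s: None):
    """Retry transient failures with exponential backoff; route the rest to the DLQ."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return handler(event)
        except PermanentError:
            dead_letter_queue.append(event)   # invalid data: straight to DLQ
            return None
        except Exception:
            sleep(BASE_DELAY * 2 ** attempt)  # transient: back off and retry
    dead_letter_queue.append(event)           # retries exhausted
    return None
```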
This ensures no events are silently dropped while preventing infinite retry loops.
Real-World Patterns
Stripe's Approach:
- API-first metering: customers report usage via REST API
- Mandatory idempotency keys on all requests
- 1-hour draft period before invoice finalization
- Webhook notifications for all invoice lifecycle events
AWS's Approach:
- Service-side metering: each AWS service emits usage events
- Hourly aggregation with daily rollups
- Separate pricing catalog service
- Complex reserved instance calculations
Twilio's Approach:
- Real-time usage visibility in dashboard
- Sub-account isolation for resellers
- Usage alerts to prevent bill shock
- Programmatic usage queries via API
Key Takeaways
- Use event streaming (Kafka) for replay capability and decoupling
- Implement idempotency at ingestion to prevent duplicate charges
- Always use Decimal types for financial calculations, never floats
- Build comprehensive reconciliation to catch errors before customers notice
- Design for late events from day one with adjustments and grace periods
- Separate metering from billing - they have different scaling requirements
- Monitor consumer lag and implement backpressure handling
Conclusion
Building a production-grade usage-based billing system requires careful attention to event reliability, financial accuracy, and scalability. The architecture presented here handles 100,000+ events per second while maintaining the precision required for revenue-critical systems.
The key is separating concerns cleanly (ingestion, metering, rating, billing, reconciliation) and treating each as an independent service that can scale horizontally. With proper idempotency, reconciliation, and failure handling, this system can reliably process billions of events while maintaining financial integrity.






