Usage-based billing systems power some of the largest companies in tech: AWS bills for compute usage, Stripe processes transaction-based fees, and Twilio meters every API call. These systems must handle millions of events per second while maintaining perfect financial accuracy.
This article presents a complete system design for a production-grade usage billing platform, covering architecture, data models, and scaling strategies used by companies like Stripe, AWS, and Adyen.
System Requirements
Functional Requirements
- Capture usage events from multiple sources
- Aggregate usage across time windows
- Apply tiered and volume-based pricing rules
- Generate accurate invoices
- Handle late-arriving events
- Support multiple currencies
Non-Functional Requirements
- Throughput: 100,000 events/second
- Latency: < 100ms for event ingestion
- Accuracy: 100% financial correctness
- Availability: 99.99% uptime
Capacity Estimates
- 10 million active customers
- 100,000 events/second (8.6 billion/day)
- 1KB per event = 8.6TB/day
- 7 years retention = ~22PB total
High-Level Architecture
- Event Ingestion: Validates and deduplicates incoming events
- Event Streaming: Kafka provides replay capability and decoupling
- Metering Pipeline: Aggregates raw events into billable usage
- Rating Engine: Applies pricing rules to usage quantities
- Billing Engine: Generates invoices with tax and currency handling
- Reconciliation: Verifies every event is accounted for
Key Design Decision: We separate metering from billing because they have different SLAs and scaling requirements: metering is real-time, while billing is batch-oriented.
Component 1: Event Ingestion
The ingestion layer absorbs burst traffic and, together with idempotency keys, ensures each event is processed exactly once.
API Contract:
`POST /v1/usage/events`

```json
{
  "idempotencyKey": "evt_1a2b3c",
  "customerId": "cust_123",
  "meterId": "api_calls",
  "quantity": 1,
  "timestamp": "2025-02-01T10:05:23Z"
}
```
Validation Steps:
- Check required fields exist
- Verify timestamp is within acceptable range
- Confirm meterId is configured
- Check idempotency key for duplicates
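The validation steps above can be sketched as a single function. The field names follow the API contract shown earlier; the 24-hour timestamp window and the meter registry are assumptions for illustration, not part of any specific API:

```python
from datetime import datetime, timedelta, timezone

CONFIGURED_METERS = {"api_calls", "storage_gb"}  # hypothetical meter registry
MAX_EVENT_AGE = timedelta(hours=24)              # assumed acceptable range

def validate_event(event: dict) -> list[str]:
    """Return a list of validation errors (empty list means the event is valid)."""
    errors = []
    for field in ("idempotencyKey", "customerId", "meterId", "quantity", "timestamp"):
        if field not in event:
            errors.append(f"missing field: {field}")
    if errors:
        return errors  # can't validate further without the required fields
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    now = datetime.now(timezone.utc)
    if ts > now or now - ts > MAX_EVENT_AGE:
        errors.append("timestamp outside acceptable range")
    if event["meterId"] not in CONFIGURED_METERS:
        errors.append(f"unknown meter: {event['meterId']}")
    return errors
```

Returning a list of errors (rather than failing on the first one) lets the API reject an event with every problem reported at once.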
Deduplication Strategy:
We use Redis with a 24-hour TTL to catch duplicate events. The idempotency key prevents double-charging even if the same event is sent multiple times.
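A minimal sketch of the Redis check, assuming a redis-py-style client: `SET` with `NX` and `EX` both claims the key and sets its TTL in one atomic round trip, avoiding a check-then-set race between concurrent ingestion instances. The `dedup:` key prefix is an assumption:

```python
DEDUP_TTL_SECONDS = 24 * 60 * 60  # 24-hour deduplication window

def is_duplicate(redis_client, idempotency_key: str) -> bool:
    """Atomically claim the key; True means an earlier event already claimed it."""
    # SET NX returns a truthy value only if the key did not exist yet.
    claimed = redis_client.set(f"dedup:{idempotency_key}", "1",
                               nx=True, ex=DEDUP_TTL_SECONDS)
    return not claimed
```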
Component 2: Metering Pipeline
The metering pipeline aggregates raw events into usage totals.
Database Schema:
```sql
CREATE TABLE meter_usage (
    customer_id  VARCHAR(50),
    meter_id     VARCHAR(50),
    period_start TIMESTAMP,
    period_end   TIMESTAMP,
    quantity     DECIMAL(20,6),
    event_count  INTEGER,
    PRIMARY KEY (customer_id, meter_id, period_start)
);
```
Aggregation Process:
Each event increments a counter for the customer's billing period. We use database UPSERT to atomically increment usage values.
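The UPSERT can be sketched against the schema above. SQLite stands in for the production database here (its `ON CONFLICT ... DO UPDATE` mirrors PostgreSQL's); a real deployment would use `DECIMAL` columns rather than SQLite's floating-point `REAL`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meter_usage (
        customer_id  TEXT,
        meter_id     TEXT,
        period_start TEXT,
        quantity     REAL,
        event_count  INTEGER,
        PRIMARY KEY (customer_id, meter_id, period_start)
    )
""")

def record_usage(customer_id, meter_id, period_start, quantity):
    """Atomically insert or increment the usage row for this billing period."""
    conn.execute("""
        INSERT INTO meter_usage (customer_id, meter_id, period_start, quantity, event_count)
        VALUES (?, ?, ?, ?, 1)
        ON CONFLICT (customer_id, meter_id, period_start)
        DO UPDATE SET quantity    = quantity + excluded.quantity,
                      event_count = event_count + 1
    """, (customer_id, meter_id, period_start, quantity))

record_usage("cust_123", "api_calls", "2025-02-01", 5)
record_usage("cust_123", "api_calls", "2025-02-01", 3)
```

Because the increment happens inside a single statement, two concurrent events for the same customer cannot lose an update.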
Handling Late Events:
When events arrive after the billing period closes, we create adjustments instead of modifying finalized invoices. This maintains audit integrity.
Component 3: Rating Engine
The rating engine converts usage quantities into dollar amounts using pricing rules.
Pricing Models:
Three common models:
- Tiered: Different rates for different ranges (most common)
- Volume: Single rate based on total volume
- Flat: Same price per unit always
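The difference between tiered and volume pricing is easy to get wrong, so a sketch helps. The tier boundaries and prices below are hypothetical, following the `pricing_tiers` layout (with `None` standing in for the NULL-means-infinity convention):

```python
from decimal import Decimal

# Hypothetical tiers: (range_start, range_end, unit_price); None = infinity.
TIERS = [
    (Decimal("0"),     Decimal("1000"),  Decimal("0.10")),
    (Decimal("1000"),  Decimal("10000"), Decimal("0.05")),
    (Decimal("10000"), None,             Decimal("0.01")),
]

def rate_tiered(quantity: Decimal) -> Decimal:
    """Tiered: each unit is priced at the rate of the tier it falls into."""
    total = Decimal("0")
    for start, end, price in TIERS:
        if quantity <= start:
            break
        upper = quantity if end is None else min(quantity, end)
        total += (upper - start) * price
    return total

def rate_volume(quantity: Decimal) -> Decimal:
    """Volume: the entire quantity is priced at the rate of the tier it reaches."""
    for start, end, price in TIERS:
        if end is None or quantity <= end:
            return quantity * price
    return Decimal("0")
```

For 2,500 units these give different answers: tiered charges 1,000 × $0.10 + 1,500 × $0.05 = $175, while volume charges all 2,500 units at the $0.05 rate, i.e. $125.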
Database Schema:
```sql
CREATE TABLE pricing_tiers (
    plan_id     VARCHAR(50),
    meter_id    VARCHAR(50),
    range_start DECIMAL(20,6),
    range_end   DECIMAL(20,6), -- NULL = infinity
    unit_price  DECIMAL(10,6)
);
```
Critical Implementation Detail:
Always use DECIMAL types for financial calculations. Never use FLOAT or DOUBLE as they introduce rounding errors that compound across millions of transactions.
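The compounding error is easy to demonstrate in Python's `decimal` module:

```python
from decimal import Decimal

# Summing $0.10 a million times with floats drifts away from the exact total,
# because 0.1 has no exact binary representation...
float_total = sum(0.1 for _ in range(1_000_000))

# ...while Decimal stays exact, which is why billing code uses it end to end.
decimal_total = sum(Decimal("0.1") for _ in range(1_000_000))
```

The float sum lands a fraction of a cent away from $100,000 — invisible on one invoice, but a real discrepancy once reconciliation compares totals across millions of transactions.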
Component 4: Invoice Generation
The billing engine combines rated usage across all meters into a final invoice.
Invoice Data Model:
Each invoice contains:
- Header (customer, period, currency, status)
- Line items (one per meter with usage and amount)
- Totals (subtotal, tax, final total)
Generation Process:
- Fetch all usage for customer's billing period
- Apply rating rules to each meter
- Calculate subtotal
- Apply taxes based on customer location
- Create invoice record with DRAFT status
- Allow 1-hour grace period for late events
- Finalize invoice (becomes immutable)
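The generation steps above can be sketched end to end. The tax table and `rated_usage` shape are assumptions for illustration; in practice tax comes from a dedicated service and the amounts arrive from the rating engine:

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical tax table keyed by customer location.
TAX_RATES = {"US-CA": Decimal("0.0725")}

@dataclass
class Invoice:
    customer_id: str
    currency: str
    status: str = "DRAFT"
    line_items: list = field(default_factory=list)
    subtotal: Decimal = Decimal("0")
    tax: Decimal = Decimal("0")
    total: Decimal = Decimal("0")

def generate_invoice(customer_id, currency, rated_usage, tax_region):
    """rated_usage: {meter_id: amount} already produced by the rating engine."""
    inv = Invoice(customer_id, currency)
    for meter_id, amount in rated_usage.items():
        inv.line_items.append({"meter": meter_id, "amount": amount})
        inv.subtotal += amount
    inv.tax = (inv.subtotal * TAX_RATES.get(tax_region, Decimal("0"))).quantize(Decimal("0.01"))
    inv.total = inv.subtotal + inv.tax
    return inv  # stays DRAFT through the grace period

def finalize(invoice: Invoice):
    invoice.status = "FINALIZED"  # after this, corrections become adjustments
```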
Component 5: Reconciliation
Reconciliation ensures financial accuracy by verifying every event is accounted for.
Three-Level Verification:
- Event Count Check: Raw events = Aggregated usage count
- Amount Check: Recalculated amounts = Invoiced amounts
- Balance Check: Sum of invoices = Account balance
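The three checks reduce to straight equality comparisons over independently computed totals; the inputs below are assumed to come from separate pipelines so a bug in one cannot mask itself:

```python
from decimal import Decimal

def reconcile(raw_event_count: int, aggregated_event_count: int,
              recalculated_amount: Decimal, invoiced_amount: Decimal,
              invoice_sum: Decimal, account_balance: Decimal) -> list[str]:
    """Run the three checks; any returned string is an alert for investigation."""
    alerts = []
    if raw_event_count != aggregated_event_count:
        alerts.append("event count mismatch")       # data loss in the pipeline
    if recalculated_amount != invoiced_amount:
        alerts.append("amount mismatch")            # pricing or rating bug
    if invoice_sum != account_balance:
        alerts.append("balance mismatch")           # ledger inconsistency
    return alerts
```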
If any check fails, the system creates an alert for investigation. This catches data loss, pricing bugs, or calculation errors before customers are affected.
Scaling Strategies
Event Ingestion:
- Deploy multiple API instances behind load balancer
- Each instance handles 10k req/sec
- Scale to 10+ instances for 100k req/sec target
Kafka Consumers:
- Partition topic by customer_id for ordered processing
- Run one consumer per partition
- Add partitions dynamically as throughput grows
Database Sharding:
```python
import hashlib

NUM_SHARDS = 16

def get_shard(customer_id: str) -> int:
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so it would route the same customer to different shards across restarts.
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```
Handling Traffic Spikes
Rate Limiting:
Limit events per second per customer to prevent abuse and protect the system.
Queue Backpressure:
Monitor Kafka lag. If lag exceeds threshold, slow down ingestion or temporarily reject events with retry-after headers.
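The admission decision can be sketched as a small function. The lag threshold and retry delay are illustrative; a real deployment would read lag from Kafka's consumer-group offsets:

```python
# Hypothetical thresholds for illustration.
MAX_LAG = 500_000          # events of consumer lag before shedding load
RETRY_AFTER_SECONDS = 30   # how long clients should wait before retrying

def admit_event(current_lag: int) -> tuple[int, dict]:
    """Return (HTTP status, headers): accept, or shed load with Retry-After."""
    if current_lag > MAX_LAG:
        return 429, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 202, {}
```

Because clients already retry with idempotency keys, temporarily rejecting events is safe: the same event will be accepted later without double-charging.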
Failure Handling
Dead Letter Queue:
When event processing fails after retries, send to DLQ for manual review:
- Transient errors (DB timeout, network issues): Retry with exponential backoff
- Permanent errors (invalid data): Send immediately to DLQ
- Max retries exceeded: Send to DLQ after 3 attempts
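The routing rules above can be sketched as one retry loop; the `PermanentError` class and injectable `sleep` are assumptions for illustration:

```python
MAX_ATTEMPTS = 3
BASE_DELAY = 1.0  # seconds

class PermanentError(Exception):
    """Raised for invalid data that no amount of retrying will fix."""

def process_with_retry(handler, event, dead_letter_queue, sleep=lambda s: None):
    """Retry transient failures with exponential backoff; route the rest to the DLQ."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return handler(event)
        except PermanentError:
            dead_letter_queue.append(event)   # invalid data: straight to DLQ
            return None
        except Exception:
            sleep(BASE_DELAY * 2 ** attempt)  # transient: back off and retry
    dead_letter_queue.append(event)           # retries exhausted
    return None
```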
This ensures no events are silently dropped while preventing infinite retry loops.
Real-World Patterns
Stripe's Approach:
- API-first metering: customers report usage via REST API
- Mandatory idempotency keys on all requests
- 1-hour draft period before invoice finalization
- Webhook notifications for all invoice lifecycle events
AWS's Approach:
- Service-side metering: each AWS service emits usage events
- Hourly aggregation with daily rollups
- Separate pricing catalog service
- Complex reserved instance calculations
Twilio's Approach:
- Real-time usage visibility in dashboard
- Sub-account isolation for resellers
- Usage alerts to prevent bill shock
- Programmatic usage queries via API
Key Takeaways
- Use event streaming (Kafka) for replay capability and decoupling
- Implement idempotency at ingestion to prevent duplicate charges
- Always use Decimal types for financial calculations, never floats
- Build comprehensive reconciliation to catch errors before customers notice
- Design for late events from day one with adjustments and grace periods
- Separate metering from billing - they have different scaling requirements
- Monitor consumer lag and implement backpressure handling
Conclusion
Building a production-grade usage-based billing system requires careful attention to event reliability, financial accuracy, and scalability. The architecture presented here handles 100,000+ events per second while maintaining the precision required for revenue-critical systems.
The key is separating concerns cleanly (ingestion, metering, rating, billing, reconciliation) and treating each as an independent service that can scale horizontally. With proper idempotency, reconciliation, and failure handling, this system can reliably process billions of events while maintaining financial integrity.






