Building a Usage-Based Billing Pipeline

#webdev #programming

---
title: "Building a Usage-Based Billing Pipeline That Never Loses a Cent"
published: true
description: "Build a metering pipeline with idempotent event ingestion, PostgreSQL hypertables, and Stripe Meter API reconciliation that handles millions of events accurately."
tags: postgresql, architecture, api, backend
canonical_url: https://blog.mvpfactory.co/usage-based-billing-pipeline
---

## What We're Building

In this workshop, we'll wire up a three-stage usage-based billing pipeline: idempotent event ingestion, time-window aggregation with late-arrival handling, and reconciliation against Stripe's Meter API. By the end, you'll have the PostgreSQL hypertable + materialized view pattern that processes millions of events per day without losing a cent.

Here's the full architecture we're working toward:

SDK → Queue (SQS/Kafka) → Ingestion API → usage_events (hypertable)
↓
hourly_usage (continuous aggregate)
↓
Reconciliation Worker → Stripe Meter API
↓
Stripe Invoice Generation


## Prerequisites

- PostgreSQL with [TimescaleDB](https://docs.timescale.com/) extension installed
- A Stripe account with access to the Meter API (`/v2/billing/meter_events`)
- Familiarity with SQL aggregation and basic Python

## Step 1: Idempotent Event Ingestion

Every usage event needs an idempotency key generated at the source — the SDK or service emitting the event. Here's the minimal setup to get this working:

sql
CREATE TABLE usage_events (
id BIGINT GENERATED ALWAYS AS IDENTITY,
idempotency_key UUID NOT NULL,
customer_id TEXT NOT NULL,
meter_name TEXT NOT NULL,
quantity NUMERIC NOT NULL,
event_timestamp TIMESTAMPTZ NOT NULL,
ingested_at TIMESTAMPTZ DEFAULT now(),
UNIQUE (idempotency_key)
);


That `UNIQUE` constraint gives you exactly-once semantics at the database level. Your ingestion endpoint returns `200 OK` on conflict — the client sees success, the pipeline sees no duplicate.

**The docs don't mention this, but** — make your idempotency key a deterministic hash of the event's natural key (customer + meter + timestamp + request ID), not a random UUID. Random UUIDs break when retries come from different layers. Deterministic keys mean retries from the SDK, the queue, or the load balancer all converge to the same key.

## Step 2: Time-Window Aggregation With Late Arrivals

This is where TimescaleDB pays off. Convert `usage_events` into a hypertable, then build a continuous aggregate:

sql
SELECT create_hypertable('usage_events', 'event_timestamp');

CREATE MATERIALIZED VIEW hourly_usage
WITH (timescaledb.continuous) AS
SELECT
customer_id,
meter_name,
time_bucket('1 hour', event_timestamp) AS bucket,
SUM(quantity) AS total_quantity,
COUNT(*) AS event_count
FROM usage_events
GROUP BY customer_id, meter_name, bucket;


Now the part that actually matters — the refresh policy with a late-arrival window:

sql
SELECT add_continuous_aggregate_policy('hourly_usage',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '15 minutes'
);


That `start_offset` of 3 hours means any event arriving up to 3 hours late still gets folded into the correct bucket on the next refresh. Let me show you why this matters:

| Approach | Late-Arrival Handling | Query Speed (10M events/day) | Accuracy |
|---|---|---|---|
| Raw table SUM() | None, dropped events | 8–15s per customer | ~97–99% |
| Application-layer rollup | Manual, error-prone | 50–200ms | Depends on implementation |
| Continuous aggregate | Automatic re-aggregation | 5–20ms | 99.99%+ |

That jump from 97% to 99.99% sounds small until you're processing $2M/month in usage charges. 1% error is $20K you're either eating or fighting customers over.

## Step 3: Stripe Meter API Reconciliation

Make Stripe the sync target, not the source of truth. Your PostgreSQL aggregates are authoritative. The reconciliation loop:

1. Every billing period, query `hourly_usage` for each customer/meter
2. Compare against Stripe's meter event summaries via `/v1/billing/meters/{id}/event_summaries`
3. If the delta exceeds your threshold, emit a correction event
4. Log every reconciliation for audit

python
stripe.billing.meter_events.create(
event_name="api_requests",
payload={
"stripe_customer_id": customer.stripe_id,
"value": str(aggregated_quantity),
},
identifier=f"{customer.id}:{meter}:{bucket_iso}", # idempotency
)


The `identifier` field is Stripe's built-in idempotency mechanism for meter events. If your sync job crashes and restarts, it won't double-count.

## Gotchas

- **Random UUIDs as idempotency keys** — they break across retry boundaries. Use deterministic hashes of the event's natural key instead.
- **No late-arrival window** — without an explicit `start_offset`, events that arrive even slightly late get dropped from their billing bucket. Tune the offset based on your observed p99 delivery latency.
- **Stripe as source of truth** — at high volume, you need the audit trail in your infrastructure. Query disputes require data you control, not data behind a third-party API.
- **That 97% accuracy looks fine** — until 1% of $2M/month means $20K in billing errors every cycle.

## Wrapping Up

Here's the pattern I use in every billing project: generate deterministic idempotency keys at the source, aggregate with continuous views that handle late arrivals automatically, and own your source of truth while syncing to Stripe. This pipeline scales to millions of events per day and gives you the audit trail you'll need when — not if — a customer disputes an invoice.

Tune the 3-hour `start_offset` and 15-minute refresh cycle to match your system's actual delivery latency, and you're set.

DEV Community

Building a Usage-Based Billing Pipeline

Top comments (0)