Ray

Posted on Mar 28

10 Stripe Billing Anomalies I Monitor in Real-Time (And the Code Behind Each One)

#python #stripe #webhooks #opensource

After losing $800 in MRR to a silent charge failure cascade that went unnoticed for 4 days, I built BillingWatch — a self-hosted Stripe webhook monitor with 10 real-time anomaly detectors. Here's exactly how each one works.

The Architecture: Event-Triggered, Not Scheduled

The critical design choice: detectors run on every Stripe webhook event, not on a cron schedule. Cron-based checks have a window problem — you might miss a 5-minute card testing attack. Event-triggered detection means you catch it within seconds of the first anomalous event.

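import stripe
from fastapi import Depends, HTTPException, Request
from sqlalchemy.orm import Session

@app.post("/webhook")
async def stripe_webhook(request: Request, db: Session = Depends(get_db)):
    payload = await request.body()  # raw bytes: the signature covers the exact body
    sig_header = request.headers.get("stripe-signature")

    try:
        event = stripe.Webhook.construct_event(payload, sig_header, WEBHOOK_SECRET)
    except (ValueError, stripe.error.SignatureVerificationError):
        # Malformed payload or bad signature: reject with a 400 instead of a 500
        raise HTTPException(status_code=400, detail="Invalid payload or signature")

    store_event(event, db)          # persist first so detectors can see this event
    await run_detectors(event, db)

    return {"status": "ok"}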

Each detector is an async function that receives the event and the database session and returns either an Alert dataclass or None. All ten are dispatched together with asyncio.gather.
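A minimal sketch of what that contract could look like. The severity, detector, and message fields come from the detectors shown in this post; the created_at field and the detect_test_mode_event detector are assumptions added for illustration, not necessarily what BillingWatch defines.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Alert:
    severity: str   # e.g. "HIGH" or "CRITICAL"
    detector: str   # name of the detector that fired
    message: str    # human-readable summary for the notification
    # Assumption: each alert records when it was created.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


async def detect_test_mode_event(event: dict, db=None) -> Optional[Alert]:
    """Hypothetical detector: flag test-mode events reaching this endpoint."""
    # "livemode" is a standard boolean field on Stripe event objects.
    if event.get("livemode") is False:
        return Alert(
            severity="LOW",
            detector="test_mode_event",
            message=f"Test-mode event received: {event.get('type')}",
        )
    return None
```

Returning None for "nothing to report" keeps every detector uniform, so the runner can simply filter out the empty results.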

The 10 Detectors

1. Charge Failure Spike

Triggers when the charge failure rate exceeds 15% in the last hour. Catches both organic failure surges (bad batch of cards) and the beginning of fraud attacks.

from datetime import datetime, timedelta

async def detect_charge_failure_spike(event, db):
    # Only charge outcomes can change the failure rate
    if event["type"] not in ("charge.failed", "charge.succeeded"):
        return None

    one_hour_ago = datetime.utcnow() - timedelta(hours=1)
    window = db.query(Event).filter(Event.created_at >= one_hour_ago)

    # Count in the database instead of loading every row into memory
    total = window.filter(
        Event.type.in_(["charge.failed", "charge.succeeded"])
    ).count()
    if total < 5:  # not enough data
        return None

    failures = window.filter(Event.type == "charge.failed").count()
    failure_rate = failures / total

    if failure_rate > 0.15:
        return Alert(
            severity="HIGH",
            detector="charge_failure_spike",
            message=f"Failure rate: {failure_rate:.0%} ({failures}/{total} in last hour)"
        )
    return None

This is the most battle-tested detector. In testing, it fires with HIGH severity when the failure rate hits 100%.

2. Fraud Spike (Disputes)

Fires when disputes reach 5 in 24 hours OR exceed 1% of recent charges. Stripe will flag your account as high-risk around the 1% threshold. Better to know first.

async def detect_fraud_spike(event, db):
    if event["type"] != "charge.dispute.created":
        return None

    yesterday = datetime.utcnow() - timedelta(hours=24)
    dispute_count = db.query(Event).filter(
        Event.type == "charge.dispute.created",
        Event.created_at >= yesterday
    ).count()

    charge_count = db.query(Event).filter(
        Event.type == "charge.succeeded",
        Event.created_at >= yesterday
    ).count()

    dispute_rate = dispute_count / charge_count if charge_count > 0 else 0

    if dispute_count >= 5 or dispute_rate > 0.01:
        return Alert(severity="CRITICAL", detector="fraud_spike", ...)

In testing, this fires immediately on the 5th dispute with CRITICAL severity.

3. Duplicate Charge

Catches idempotency failures — same customer, same amount, within 5 minutes. These happen more than you'd expect when webhook handlers aren't idempotent.

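async def detect_duplicate_charge(event, db):
    if event["type"] != "charge.succeeded":
        return None

    charge = event["data"]["object"]
    customer_id = charge.get("customer")
    if customer_id is None:
        # Without a customer, the filter below would match every customer-less event
        return None

    five_min_ago = datetime.utcnow() - timedelta(minutes=5)

    duplicates = db.query(Event).filter(
        Event.type == "charge.succeeded",   # compare against charges only
        Event.customer_id == customer_id,
        Event.amount == charge.get("amount"),
        Event.created_at >= five_min_ago,
        Event.stripe_id != charge["id"],    # exclude the charge that triggered this run
    ).count()

    if duplicates > 0:
        return Alert(severity="HIGH", detector="duplicate_charge", ...)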

4. Silent Lapse

The sneaky one. Fires when a subscription gets cancelled or payment fails, but the customer's account still shows active in your database. Detects the gap between billing reality and your app state.

The implementation queries your local user/subscription table and compares against the Stripe event — the cross-system check that catches integration bugs.
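The comparison can be sketched as a pure function. This is an illustration under assumptions, not BillingWatch's actual code: the choice of the two event types, the find_silent_lapse name, and the local_status mapping (a stand-in for a query against your own subscription table) are all hypothetical.

```python
from typing import Mapping, Optional

# Stripe event types treated as "billing has lapsed" in this sketch.
LAPSE_EVENTS = {"customer.subscription.deleted", "invoice.payment_failed"}


def find_silent_lapse(event: dict, local_status: Mapping[str, str]) -> Optional[str]:
    """Return a message if Stripe reports a lapse but the app still says active."""
    if event["type"] not in LAPSE_EVENTS:
        return None

    # Both subscription and invoice objects carry a "customer" ID.
    customer_id = event["data"]["object"].get("customer")
    if customer_id is None:
        return None

    # local_status stands in for a lookup in the app's own database.
    if local_status.get(customer_id) == "active":
        return (
            f"{customer_id}: Stripe sent {event['type']} "
            "but local status is still active"
        )
    return None
```

A real check would probably allow a short grace period before comparing, since the app's own handler needs a moment to process the same event.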

5. Revenue Drop

Detects sudden drops in payment volume by comparing the current hour's revenue to the 7-day rolling hourly average. If the current hour comes in below 40% of that baseline, something's wrong (and it could be a payment processor issue rather than fraud).
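The arithmetic behind that check fits in a few lines. The sketch below assumes revenue has already been summed per hour, in cents; the is_revenue_drop name and its parameters are hypothetical.

```python
from statistics import mean
from typing import Sequence


def is_revenue_drop(
    current_hour_cents: int,
    baseline_hourly_cents: Sequence[int],
    threshold: float = 0.40,
) -> bool:
    """True if the current hour is below `threshold` of the baseline average."""
    if not baseline_hourly_cents:
        return False  # no history yet, nothing to compare against

    baseline = mean(baseline_hourly_cents)
    if baseline <= 0:
        return False  # avoid alerting when there is normally no revenue

    return current_hour_cents < threshold * baseline
```

With a baseline of 168 hourly totals (7 days) averaging $10.00, an hour that brings in $3.00 trips the check and an hour that brings in $5.00 does not.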

6. Currency Mismatch

Fires when you receive a payment in an unexpected currency. Useful if you run a single-currency business and suddenly start seeing EUR or GBP charges — often means a configuration issue or unintended geographic expansion.
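A minimal sketch of the check, assuming a single-currency USD business. Stripe reports the currency field on a charge as a lowercase three-letter ISO code; the EXPECTED_CURRENCIES set and the unexpected_currency helper are hypothetical names.

```python
from typing import FrozenSet, Optional

# Assumption: this business only expects USD charges.
EXPECTED_CURRENCIES: FrozenSet[str] = frozenset({"usd"})


def unexpected_currency(
    event: dict, expected: FrozenSet[str] = EXPECTED_CURRENCIES
) -> Optional[str]:
    """Return the currency code if a charge arrived in an unexpected currency."""
    if event["type"] != "charge.succeeded":
        return None

    currency = event["data"]["object"].get("currency")
    if currency is None:
        return None

    currency = currency.lower()
    return None if currency in expected else currency
```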

7. Negative Invoice

Catches credit notes and refunds that create negative invoice amounts. Usually harmless, but a sudden pattern of negative invoices can indicate abuse of your refund policy or a billing logic bug.
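One way to sketch the per-event part of this check, assuming the invoice object's total field (an integer in the smallest currency unit) is the amount to inspect. The negative_invoice_total helper is hypothetical, and detecting a "sudden pattern" would additionally need a count over a time window, which is omitted here.

```python
from typing import Optional


def negative_invoice_total(event: dict) -> Optional[int]:
    """Return the invoice total if it is negative, otherwise None."""
    # Only look at invoice.* events (invoice.created, invoice.finalized, ...).
    if not event["type"].startswith("invoice."):
        return None

    total = event["data"]["object"].get("total")
    if isinstance(total, int) and total < 0:
        return total
    return None
```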

8. Plan Downgrade + Data Loss Risk

Fires when a customer downgrades to a plan with lower limits. Not strictly an anomaly — but it's a signal to check if their data will be cleaned up correctly before the billing cycle renews.
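Plan limits live in your app rather than in Stripe, so the sketch below assumes a hypothetical PLAN_LIMITS table keyed by price ID and a current_usage number supplied by your own code. Extracting the old and new price IDs from the subscription event is left out.

```python
from typing import Mapping, Optional

# Hypothetical limits per Stripe price ID (e.g. maximum projects on each plan).
PLAN_LIMITS = {"price_basic": 10, "price_pro": 100}


def downgrade_risk(
    old_price: str,
    new_price: str,
    current_usage: int,
    limits: Mapping[str, int] = PLAN_LIMITS,
) -> Optional[str]:
    """Classify a plan change: None, "downgrade", or "over_limit"."""
    old_limit = limits.get(old_price)
    new_limit = limits.get(new_price)

    # Unknown plans and upgrades or lateral moves are not flagged.
    if old_limit is None or new_limit is None or new_limit >= old_limit:
        return None

    # Usage above the new limit is the case where data could be lost.
    if current_usage > new_limit:
        return "over_limit"
    return "downgrade"
```

Separating "downgrade" from "over_limit" lets the alert severity reflect whether any data is actually at risk.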

9. Timezone Billing Error

Triggers on subscriptions that renew in a timezone-ambiguous window (midnight UTC ± 2 hours). These are the ones most likely to result in off-by-one billing day errors.
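The window test itself is simple date arithmetic. This sketch assumes the renewal time is available as a Unix timestamp in seconds, which is how Stripe represents times; the in_ambiguous_window name is hypothetical.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def in_ambiguous_window(renewal_ts: int, window_hours: int = 2) -> bool:
    """True if the renewal falls within `window_hours` of midnight UTC."""
    dt = datetime.fromtimestamp(renewal_ts, tz=timezone.utc)
    seconds_since_midnight = dt.hour * 3600 + dt.minute * 60 + dt.second

    # Distance to the nearest midnight, whether just passed or upcoming.
    distance = min(seconds_since_midnight, SECONDS_PER_DAY - seconds_since_midnight)
    return distance <= window_hours * 3600
```

Measuring the distance to the nearest midnight handles both sides of the window, so 23:00 UTC and 01:00 UTC are treated the same way.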

10. Webhook Lag

Monitors the time between Stripe's event timestamp and when it hits your endpoint. If lag exceeds 30 seconds consistently, your webhook handler is either slow or Stripe is having delivery issues. Catches the "webhook queue backing up" failure mode before it cascades.
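A possible shape for the "consistently" part, sketched with a small rolling window. Stripe events carry a created field (Unix seconds); the LagMonitor class, the window size of 5, and the rule that every sample in the window must exceed the threshold are assumptions.

```python
import time
from collections import deque
from typing import Optional


class LagMonitor:
    """Tracks recent webhook lag and reports when it stays above a threshold."""

    def __init__(self, threshold_s: float = 30.0, window: int = 5):
        self.threshold_s = threshold_s
        self.lags = deque(maxlen=window)  # keeps only the newest `window` samples

    def record(self, event: dict, now: Optional[float] = None) -> bool:
        """Record one event's lag; return True if lag is consistently high."""
        now = time.time() if now is None else now
        self.lags.append(now - event["created"])

        # Only alert once the window is full and every sample is over threshold.
        window_full = len(self.lags) == self.lags.maxlen
        return window_full and all(lag > self.threshold_s for lag in self.lags)
```

Requiring a full window of slow deliveries avoids alerting on a single delayed event, such as one that Stripe redelivered after an earlier failure.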

The Alert Pipeline

Every fired detector generates an alert stored in SQLite, then routed to your configured channels:

async def run_detectors(event, db):
    detectors = [
        detect_charge_failure_spike,
        detect_fraud_spike,
        detect_duplicate_charge,
        detect_silent_lapse,
        detect_revenue_drop,
        detect_currency_mismatch,
        detect_negative_invoice,
        detect_plan_downgrade,
        detect_timezone_billing_error,
        detect_webhook_lag,
    ]

    alerts = await asyncio.gather(*[d(event, db) for d in detectors])

    for alert in filter(None, alerts):
        store_alert(alert, db)
        await notify(alert)  # email, Slack, webhook

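Dispatching them together with asyncio.gather keeps the detection pipeline's overhead under 50ms per webhook. One caveat: coroutines only overlap where they await, so detectors that query through a synchronous SQLAlchemy session still execute their queries one after another.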
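By default, asyncio.gather propagates the first exception a detector raises, which would turn one buggy detector into a failed webhook response and a Stripe retry. A possible hardening, shown here as a sketch rather than what BillingWatch does, is to pass return_exceptions=True and separate alerts from errors. The run_isolated name is hypothetical, and the db argument is dropped to keep the example self-contained.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Tuple

Detector = Callable[[dict], Awaitable[Optional[Any]]]


async def run_isolated(
    detectors: List[Detector], event: dict
) -> Tuple[List[Any], List[BaseException]]:
    """Run all detectors; collect alerts and errors without aborting the batch."""
    results = await asyncio.gather(
        *(detector(event) for detector in detectors),
        return_exceptions=True,  # exceptions come back as results, not raised
    )

    errors = [r for r in results if isinstance(r, BaseException)]
    alerts = [
        r for r in results
        if r is not None and not isinstance(r, BaseException)
    ]
    return alerts, errors
```

The errors list can then be logged or turned into an alert of its own, so a broken detector is visible instead of silent.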

What It Catches in Production

In the first week of running BillingWatch on a real Stripe account:

2 charge failure spikes caught within seconds (both were card testing probes)
1 duplicate charge from a misconfigured retry handler
3 silent lapse incidents that would have been discovered only on the next billing cycle

The $800 MRR loss that motivated building this wouldn't have happened if BillingWatch had been running. Silent lapses are quiet. Automation catches them.

Try It

BillingWatch is MIT-licensed and self-hosted — your Stripe data stays on your server.

GitHub: github.com/rmbell09-lang/BillingWatch

Setup takes about 10 minutes: clone, configure your STRIPE_WEBHOOK_SECRET and DATABASE_URL, deploy to any server that can receive Stripe webhooks.

DEV Community