The Day I Noticed the Drop
It started with a Slack notification I almost ignored: "Stripe: charge failed."
One failed charge. Not unusual. I dismissed it and kept coding.
Three days later, I was reviewing my dashboard and noticed MRR had dropped by over $800. Not a gradual slope — a cliff. Eight subscribers had silently churned while I was building features. The charge failures had been piling up, Stripe had retried and failed, and not a single alert had fired in any tool I used.
That's when I started building BillingWatch.
What "Silent Failures" Actually Look Like
The Stripe dashboard shows you events. But it doesn't tell you when a pattern is wrong.
Here's what I was missing:
- Duplicate charges: same customer, same amount, within 60 seconds. Retries are aggressive, and idempotency keys are easy to get wrong.
- Charge failure cascades: not one failed card, but five in an hour. That's probably not five bad cards. It's more likely a card-testing attack.
- Lapsed subscriptions: customer.subscription.updated fires when a subscription lapses to past_due. Easy to filter out. Easy to miss for 3 days.
- Negative invoice anomalies: unexpected credits creating negative line items. Fine if intentional. Silent bug if not.
None of these are events you can subscribe to cleanly. They're patterns across events. And patterns require a monitor.
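To make "a pattern across events" concrete, here is a minimal sketch of a sliding-window counter for the failure-cascade case. The class name, the threshold of five, and the one-hour window are illustrative assumptions, not BillingWatch's actual code; it also assumes events arrive roughly in order of their created timestamp.

```python
from collections import deque


class FailureCascadeTracker:
    """Counts charge.failed events inside a sliding time window (hypothetical helper)."""

    def __init__(self, threshold=5, window_seconds=3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # created timestamps of recent failures

    def observe(self, event):
        """Record one event; return True once the window holds `threshold` failures."""
        if event["type"] != "charge.failed":
            return False
        now = event["created"]  # Unix timestamp carried on every Stripe event
        self._timestamps.append(now)
        # Drop failures that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

No single charge.failed event trips this check. Only the count inside the window does, which is why it cannot be expressed as a plain webhook subscription.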
The Architecture That Solved It
BillingWatch is a FastAPI service that sits in front of your Stripe webhook endpoint. Every event flows through a detector engine before your app processes it.
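import os

import stripe
from fastapi import FastAPI, HTTPException, Request
from billingwatch.detectors import run_all_detectors
from billingwatch.store import save_event
# send_alerts is used below but never imported in the original; this module path is an assumption
from billingwatch.alerts import send_alerts

# Assumes the signing secret is provided through an environment variable
STRIPE_WEBHOOK_SECRET = os.environ["STRIPE_WEBHOOK_SECRET"]

app = FastAPI()

@app.post("/webhook")
async def stripe_webhook(request: Request):
    payload = await request.body()
    try:
        # Verify the signature so forged payloads never reach the detectors
        event = stripe.Webhook.construct_event(
            payload, request.headers.get("stripe-signature", ""), STRIPE_WEBHOOK_SECRET
        )
    except (ValueError, stripe.error.SignatureVerificationError):
        raise HTTPException(status_code=400, detail="Invalid payload or signature")
    # Store first, then detect
    await save_event(event)
    alerts = await run_all_detectors(event)
    if alerts:
        await send_alerts(alerts)
    return {"received": True}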
The detectors are composable. Each one takes an event and returns a list of alerts (empty when nothing looks wrong). They can look at historical events to find patterns.
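# Alert and get_charges are BillingWatch helpers; their imports are not shown here
class DuplicateChargeDetector:
    async def detect(self, event: dict) -> list[Alert]:
        if event["type"] != "charge.succeeded":
            return []
        charge = event["data"]["object"]
        # Anchor the 60-second window to the charge's own timestamp,
        # since webhook delivery can lag behind wall-clock time
        window_start = charge["created"] - 60
        recent = await get_charges(
            customer=charge["customer"],
            amount=charge["amount"],
            since=window_start,
        )
        # The current charge is already stored, so more than one match means a duplicate
        if len(recent) > 1:
            return [Alert(
                level="warning",
                message=(
                    f"Duplicate charge detected: {charge['customer']} charged "
                    f"${charge['amount'] / 100:.2f} {len(recent)} times within 60s"
                ),
                event_id=event["id"],
            )]
        return []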
Run this on every charge.succeeded event and any repeat charge for the same customer and amount inside the 60-second window raises an alert.
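One way the composable interface could be wired together is sketched below. This is an illustration under assumptions, not BillingWatch's implementation: the Alert fields mirror the ones used above, while the Detector protocol, the FailedChargeDetector toy, and the detectors parameter on run_all_detectors are hypothetical.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Alert:
    level: str
    message: str
    event_id: str


class Detector(Protocol):
    async def detect(self, event: dict) -> list: ...


class FailedChargeDetector:
    """Toy detector: raises one alert for every charge.failed event."""

    async def detect(self, event: dict) -> list:
        if event["type"] != "charge.failed":
            return []
        return [Alert(level="info", message="Charge failed", event_id=event["id"])]


async def run_all_detectors(event: dict, detectors: list) -> list:
    """Run every detector concurrently and flatten their alerts into one list."""
    results = await asyncio.gather(
        *(d.detect(event) for d in detectors), return_exceptions=True
    )
    alerts = []
    for detector, result in zip(detectors, results):
        if isinstance(result, BaseException):
            # One broken detector should not hide alerts from the others.
            logging.error("Detector %s failed", type(detector).__name__, exc_info=result)
            continue
        alerts.extend(result)
    return alerts
```

Collecting exceptions instead of letting the first one propagate keeps a single faulty detector from silencing the rest, and logging them avoids creating a new silent failure.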
What I Learned Building This
The hardest part wasn't the code. It was figuring out which anomalies matter. I talked to five other SaaS founders. Everyone had a different "silent killer":
- One had a webhook endpoint that silently returned 200 but threw an exception internally
- One had a subscription that stayed active 90 days after the credit card expired
- One had card-testing attacks going undetected for weeks
BillingWatch ships with detectors for the most common patterns. But the real value is that you can write your own detectors for the failure modes specific to your business.
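As an illustration of what a custom detector might look like, here is a sketch for the lapsed-subscription case. It assumes the same detect interface and Alert fields shown earlier; the class name is hypothetical. It relies on the previous_attributes field that Stripe includes on update events, which lists the old values of whatever changed.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Alert:
    level: str
    message: str
    event_id: str


class PastDueDetector:
    """Hypothetical detector: alerts when a subscription transitions into past_due."""

    async def detect(self, event: dict) -> list:
        if event["type"] != "customer.subscription.updated":
            return []
        subscription = event["data"]["object"]
        previous = event["data"].get("previous_attributes", {})
        # Alert only when status itself changed, not on every later update
        # to a subscription that is already past_due.
        if subscription["status"] == "past_due" and "status" in previous:
            return [Alert(
                level="warning",
                message=(
                    f"Subscription {subscription['id']} for customer "
                    f"{subscription['customer']} moved from "
                    f"{previous['status']} to past_due"
                ),
                event_id=event["id"],
            )]
        return []
```

Checking previous_attributes keeps the detector quiet on unrelated edits, so the alert fires once at the moment the subscription lapses.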
Ship It, Then Extend It
The core principle: monitor events, not just metrics.
Metrics tell you something went wrong. Event patterns tell you why — and often catch it before it shows up in metrics at all.
BillingWatch is open source. If you're running Stripe at any scale and you're not watching your webhook stream for anomalies, you're flying blind. I built it because I had to. You probably will too — better to use something that already works.