Vatsal Patel

Posted on May 10 • Edited on Jun 26

Moving 2500 Paying Customers Between Stripe Accounts in a 3-Hour Window

#stripe #architecture #migration #go

In a single 3-hour window at midnight US time, we moved 2,500 US customers from our Australian Stripe account to a new US Stripe account. Payment methods, credits, coupons, promo codes, and webhooks. Nobody got double-charged. Nobody lost access. No customer email went out, because from their side, nothing changed.

This is what it took to make that boring outcome happen.

Why we did it

The company operates in Australia and the US, but until last week both regions ran on a single AU Stripe account that took USD payments from US customers and settled them out as AUD. That setup cost us roughly 2% per US transaction in international card surcharges and currency conversion fees on top of the domestic AU rate, plus FX losses on settlement. It also made tax season painful and obscured per-region revenue reporting. The brief from the CEO was simple: clean financials, lower fees, no customer impact. The shape of the solution - a separate US Stripe account with all US customers migrated over - was mine to figure out.

Scope

Two products with inverted shapes.

The tutoring product has 2,500+ US customers, 1,100+ of them actively paying, billed per lesson. Many of them carry credits, coupons, and promo codes.

The schools product has tens of thousands of US customers but only a few dozen with active subscriptions and a few hundred with payment methods on file - most schools sit on the platform without a paid plan. So the smaller user base carried the bulk of the active billing risk, which is why I scoped the tutoring migration first. Schools is the next wave, sequenced behind a separate product refactor.

What Stripe gives you, and what it doesn't

Stripe's self-serve PAN copy tool is the only way to move card data between accounts in a PCI-compliant way. This is what it actually covers:

Copies customer objects, preserving the customer ID
Copies attached payment methods (with new payment method IDs)
Hands you a CSV mapping old payment method IDs to new ones

That's it. It does not move credits. It does not move coupons or promo codes. It does not move subscriptions, invoices, or any metadata you care about beyond the customer record itself.

Everything else was scripts I wrote, all in Go. Each had a dry-run mode that printed what it would change and against which records, and it only mutated state when explicitly invoked with a write flag. For migrations like this, where the cost of a wrong run is high and the cost of an extra dry run is zero, that's the cheapest insurance you can buy.
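A minimal sketch of that dry-run pattern, not the actual migration scripts. The `Change` type, the `Apply` helper, and the `-write` flag name are all made up for illustration; the only idea taken from the post is that every planned change is printed and mutation happens only when writing is requested explicitly.

```go
package main

import (
	"flag"
	"fmt"
)

// Change describes one planned mutation. The fields are illustrative.
type Change struct {
	CustomerID string
	Field      string
	Old, New   string
}

// Apply prints every planned change and calls mutate only when write is true.
// It returns the number of changes actually applied.
func Apply(changes []Change, write bool, mutate func(Change) error) (int, error) {
	mode := "DRY-RUN"
	if write {
		mode = "WRITE"
	}
	applied := 0
	for _, c := range changes {
		fmt.Printf("[%s] %s: %s %q -> %q\n", mode, c.CustomerID, c.Field, c.Old, c.New)
		if !write {
			continue // dry run: report only, never touch state
		}
		if err := mutate(c); err != nil {
			return applied, fmt.Errorf("customer %s: %w", c.CustomerID, err)
		}
		applied++
	}
	return applied, nil
}

func main() {
	// Hypothetical flag name; the default is the safe, read-only mode.
	write := flag.Bool("write", false, "apply changes instead of only printing them")
	flag.Parse()

	changes := []Change{
		{CustomerID: "cus_example", Field: "payment_method_id", Old: "pm_old", New: "pm_new"},
	}
	// The mutate callback is a stand-in for a real database or API write.
	n, err := Apply(changes, *write, func(Change) error { return nil })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("applied %d of %d changes\n", n, len(changes))
}
```

Making dry-run the default means a forgotten flag costs nothing, while a write run always requires a deliberate choice.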

The plan, and the timeline

From "we're doing this" to "ready to deploy" was 5 days of build. The longest single block of elapsed time was waiting a month for the new US Stripe business account to clear verification, which had nothing to do with us. The execution itself ran 3 hours, between midnight and 3 AM US time, with the payment button disabled on the US web app for the window - no one was trying to press it anyway.

The engineering work fell into a few buckets:

Customer + payment method migration. Run the PAN copy tool, ingest the CSV, run a script that updates the payment method IDs in our database against the preserved customer IDs. We only migrated the customers with Stripe accounts and payment methods on file.

Credit migration. A script that reads each customer's cash balance from the old account and recreates it on the new one. This is the part that bit us - more on that below.

Coupons and promo codes. We issue per-customer coupons with promo codes attached. Sales had no uniform convention for where they put usage limits - sometimes on the coupon, sometimes on the promo code, sometimes split across both. The migration script had to read both sides, reconcile what was actually still valid, and recreate each coupon-and-code pair on the new account with the correct remaining usages and the correct customer restriction. The "correct remaining usages" calculation was the trickiest single piece of logic in the migration, because the source of truth varied per customer.
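One way to express a "remaining usages" rule is sketched below. It is an assumed rule, not the logic the migration actually used: treat the coupon and the promo code as two optional caps, compute what is left on each, and take the tighter one. The `Limit` type and `Remaining` function are hypothetical; the two fields mirror the `max_redemptions` and `times_redeemed` values Stripe exposes on both objects.

```go
package main

import "fmt"

// Limit is the usage cap on one side (coupon or promo code).
// Max == nil means that side has no cap.
type Limit struct {
	Max      *int64
	Redeemed int64
}

// remaining reports what is left on one side, clamped at zero.
// The bool is false when the side is uncapped.
func (l Limit) remaining() (int64, bool) {
	if l.Max == nil {
		return 0, false
	}
	r := *l.Max - l.Redeemed
	if r < 0 {
		r = 0
	}
	return r, true
}

// Remaining returns the tighter of the two caps.
// ok is false when neither side sets a cap (unlimited).
func Remaining(coupon, promo Limit) (n int64, ok bool) {
	c, cOK := coupon.remaining()
	p, pOK := promo.remaining()
	switch {
	case cOK && pOK:
		if p < c {
			return p, true
		}
		return c, true
	case cOK:
		return c, true
	case pOK:
		return p, true
	}
	return 0, false
}

func i64(v int64) *int64 { return &v }

func main() {
	// Coupon allows 5 uses with 2 spent; promo code allows 10 with 9 spent.
	n, ok := Remaining(Limit{Max: i64(5), Redeemed: 2}, Limit{Max: i64(10), Redeemed: 9})
	fmt.Println(n, ok) // prints "1 true"
}
```

Keeping "unlimited" as a separate boolean rather than a magic number avoids recreating an uncapped coupon with an accidental cap of zero.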

In-flight invoices. Open and failed invoices on the AU account were the thorny data to move. These weren't static records - Stripe retries failed payments automatically over several days, and customers pay open invoices on their own time when they see the email. Voiding and recreating them at cutover would have meant cancelling invoices that were about to settle on their own, sending customers a fresh invoice with a new number, and creating support churn for billing relationships that didn't need any intervention. So we left them on the old account for a 48-hour settlement window, let natural retries and customer payments clear what they were going to clear, and then ran a script at midnight US two days after the main cutover that voided whatever was still open and recreated it on the new account. The set that needed manual intervention ended up being much smaller than the set we started with.

Webhook routing. This is where the architecture got interesting. Our backend runs as two regional clusters, sharded by region. With one Stripe account we could route webhooks directly to the AU cluster and let it forward what it needed. With two Stripe accounts, neither cluster owns "the truth" anymore.

We solved it with a single Cloud Function as the webhook entry point for both Stripe accounts. The function verifies the signature against both accounts' signing secrets, looks up the customer's region in our metadata, and forwards the webhook to the correct cluster. We already have direct REST endpoints on each backend that can receive webhooks natively, and the longer-term plan is to point each Stripe account directly at the cluster that owns its customers. I deliberately chose not to do that yet. During the settlement tail, AU-account webhooks will keep firing about US customers as their old AU-account invoices clear - and those events need to land in the US cluster, not the AU one, because that's where the customer now lives. The cloud function is the one place that knows how to route by who the customer is rather than which Stripe account sent the webhook. Direct REST endpoints assume each cluster owns its inbound events; that assumption is broken until the AU account stops emitting events about migrated customers. The function comes out in about a month, once all in-flight invoices on the old account have closed.

This is the kind of decision that's easy to get wrong by reflex. The "right" architecture is direct webhooks per region. The right next step was the cloud function, because the cleaner architecture would have caused real customer-facing breakage during the tail of the migration. Worth being explicit about.
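The two decisions the function makes, which account signed the event and which cluster should receive it, can be sketched with the standard library alone. This is an illustration, not our cloud function: it follows Stripe's documented `Stripe-Signature` scheme (a `t=` timestamp plus `v1=` HMAC-SHA256 signatures over `timestamp.payload`), it omits the timestamp tolerance check a production handler should have, and the account names, secrets, and cluster URLs are invented. Stripe's official Go library ships its own verification helper, which is the better choice in real code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign computes the hex v1 signature: HMAC-SHA256 over "<timestamp>.<payload>".
func sign(secret, timestamp string, payload []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(timestamp + "."))
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// MatchAccount returns the name of the account whose signing secret
// validates the header, or "" if none does.
func MatchAccount(header string, payload []byte, secrets map[string]string) string {
	var ts string
	var sigs []string
	for _, part := range strings.Split(header, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		switch k {
		case "t":
			ts = v
		case "v1":
			sigs = append(sigs, v)
		}
	}
	if ts == "" {
		return ""
	}
	for account, secret := range secrets {
		want := sign(secret, ts, payload)
		for _, got := range sigs {
			if hmac.Equal([]byte(got), []byte(want)) {
				return account
			}
		}
	}
	return ""
}

// Route picks the destination by the customer's region,
// not by which Stripe account sent the event.
func Route(customerID string, regionOf, clusterURL map[string]string) (string, error) {
	region, ok := regionOf[customerID]
	if !ok {
		return "", fmt.Errorf("no region for customer %s", customerID)
	}
	url, ok := clusterURL[region]
	if !ok {
		return "", fmt.Errorf("no cluster for region %s", region)
	}
	return url, nil
}

func main() {
	secrets := map[string]string{"au": "whsec_au_example", "us": "whsec_us_example"}
	body := []byte(`{"type":"invoice.paid","customer":"cus_123"}`)
	header := "t=1700000000,v1=" + sign(secrets["au"], "1700000000", body)

	// The event is signed by the AU account but concerns a migrated US customer.
	account := MatchAccount(header, body, secrets)
	url, err := Route("cus_123",
		map[string]string{"cus_123": "us"},
		map[string]string{
			"au": "https://au.example.internal/webhooks",
			"us": "https://us.example.internal/webhooks",
		})
	fmt.Println(account, url, err)
}
```

The example in `main` is the settlement-tail case: the signature matches the AU account, yet the event is forwarded to the US cluster because that is where the customer now lives.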

Archiving. Stripe doesn't let you delete or archive customers on the old account. So I wrote a script that suffixed every migrated customer's name with (ARCHIVED). Sales and finance still see them in the dashboard, but nobody on either team is going to accidentally start operating on an (ARCHIVED) record - a small piece of code that did more for the migration than most of the actual logic.
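The renaming rule is worth making idempotent, so that re-running the script never produces "(ARCHIVED) (ARCHIVED)". A minimal sketch, with a hypothetical `ArchivedName` helper and the Stripe update call left out:

```go
package main

import (
	"fmt"
	"strings"
)

const archivedTag = "(ARCHIVED)"

// ArchivedName appends the archive tag to a customer name exactly once.
func ArchivedName(name string) string {
	trimmed := strings.TrimSpace(name)
	switch {
	case trimmed == "":
		return archivedTag
	case strings.HasSuffix(trimmed, archivedTag):
		return trimmed // already tagged: leave unchanged
	}
	return trimmed + " " + archivedTag
}

func main() {
	fmt.Println(ArchivedName("Jane Doe"))            // prints "Jane Doe (ARCHIVED)"
	fmt.Println(ArchivedName("Jane Doe (ARCHIVED)")) // prints "Jane Doe (ARCHIVED)"
}
```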

The execution

A few hours before the cutover window, I ran every script in dry-run mode against production data one last time, verified the outputs, and eyeballed the planned changes. Nothing surprising came back, which is what you want from a final pre-cutover rehearsal - boring is the goal.

Midnight start. Disable the payment button on the US web app. Run the PAN copy tool.

Stripe's documentation says the PAN copy tool can take up to 3 days. Other writeups I'd read suggested up to 2 hours for under 2,000 customers was normal. We had budgeted 3 hours and had contingency plans for it taking longer.

It finished in 10 minutes.

That was the single biggest unknown in the whole plan, and it evaporated immediately. The rest of the window went to running the metadata scripts, deploying the cloud function and the backend changes, flipping the live keys, and QA-ing the result.

What went wrong

One thing, and it was instructive.

An account manager flagged it. She noticed that recently applied credits were missing from one of her accounts on the new US Stripe account. Stripe lets a customer hold cash balances in multiple currencies, and we read them through the customer API. For five customers, that API returned the AUD balance instead of the USD balance, even though those customers had real USD credit - and our migration script trusted the API and copied across a zero. Our verification script used the same API, so the bug went unnoticed at cutover and only surfaced because a human knew what the right number should have been. We still aren't sure why the API returned the zero AUD balance instead of the non-zero USD one.

The fix was small: pull balances using the cash balance API against both AUD and USD currency balances explicitly, then run a correction script on the affected customers. Five customers affected, all of whom had their correct credit restored before any of them touched the product.
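The shape of that fix, reduced to its core: ask for the wanted currency by key instead of trusting whichever single value comes back, and flag anything non-zero in another currency for review. This is a sketch under assumptions. `CreditToMigrate` is a hypothetical helper, the map stands in for per-currency balances already fetched from Stripe, and amounts are in the smallest currency unit.

```go
package main

import (
	"fmt"
	"sort"
)

// CreditToMigrate looks up the wanted currency explicitly. It also lists any
// other currency holding a non-zero amount so a human can review it.
func CreditToMigrate(balances map[string]int64, want string) (amount int64, review []string) {
	amount = balances[want] // a missing key reads as 0
	for cur, amt := range balances {
		if cur != want && amt != 0 {
			review = append(review, cur)
		}
	}
	sort.Strings(review) // stable output regardless of map order
	return amount, review
}

func main() {
	// The failure case from the post: zero AUD, real USD credit.
	balances := map[string]int64{"aud": 0, "usd": 2500}
	amt, review := CreditToMigrate(balances, "usd")
	fmt.Println(amt, review) // prints "2500 []"
}
```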

I'd call this a near-miss rather than an incident. No one was billed incorrectly, no support ticket was opened. But it's the part of the migration I'd most like back. The lesson is specific: when an API returns a structured value, validating that the returned value matches a separate source of truth is worth the extra script run. We had the validation; we just ran it after the migration instead of as part of it.

No formal post-mortem. The fix was scoped and applied within a couple of hours, before any of the affected customers noticed the problem.

Verification

Two scripts ran post-cutover:

Customer + payment method audit. For every migrated customer, confirm the customer exists on the new account, has the expected number of payment methods, and that those payment methods are attached and their IDs match the ones in our database.
Credit audit. For every customer with a non-zero balance on the old account, confirm the same balance exists on the new account.
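The credit audit boils down to a map comparison, sketched below with hypothetical names. The comparison is only as trustworthy as the independence of its two inputs: if both maps are filled from the same API call that fed the migration, a bad read passes the audit, which is exactly what happened to us. Sourcing the old-account side from something separate, such as an internal ledger or a dashboard export, is an assumption about what a better check would use.

```go
package main

import (
	"fmt"
	"sort"
)

// Mismatch records one customer whose balance did not carry over.
type Mismatch struct {
	CustomerID string
	Old, New   int64
}

// AuditCredits checks that every customer with a non-zero balance on the old
// account has the same balance on the new one. Amounts are in minor units.
func AuditCredits(oldBal, newBal map[string]int64) []Mismatch {
	var out []Mismatch
	for id, want := range oldBal {
		if want == 0 {
			continue // nothing to migrate for this customer
		}
		if got := newBal[id]; got != want {
			out = append(out, Mismatch{CustomerID: id, Old: want, New: got})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].CustomerID < out[j].CustomerID })
	return out
}

func main() {
	oldBal := map[string]int64{"cus_a": 2500, "cus_b": 0, "cus_c": 1000}
	newBal := map[string]int64{"cus_a": 2500, "cus_c": 0}
	for _, m := range AuditCredits(oldBal, newBal) {
		fmt.Printf("%s: old=%d new=%d\n", m.CustomerID, m.Old, m.New)
	}
}
```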

Subscription verification is a separate piece of work that this wave didn't need, because tutoring doesn't use Stripe subscriptions for billing - it bills per lesson. The schools migration is queued behind an unrelated product refactor and will run on the same tooling, with the small number of active subscriptions handled by a separate script.

Team

I led the migration end to end: the technical plan, the research into the PAN copy tool's behavior and limits, the backend code changes, all of the scripts, the cutover sequencing, and the verification. Another senior engineer paired on the frontend changes and the cloud function - particularly the cluster-forwarding logic, which they owned. Finance and sales were kept in the loop on the timeline but didn't have execution responsibilities during the window.

What I'd do differently

Two things, looking back. First, I'd validate balances against a second source as part of the migration script rather than after it - the cash balance API quirk was the only real surprise of the migration, and it was the one thing my plan didn't have a pre-cutover check for. Second, I'd write a short design doc on the coupon and promo-code logic before coding it: the inconsistency in how sales had set usage limits was discovered late in the build, and by then we settled for unit tests instead. The common thread is that both are pre-cutover discipline I traded for during-cutover speed - small upfront investments that would have made the cutover itself even quieter.

Takeaways

The migration was, by the metric that mattered, invisible. 2,500+ customers, a meaningful book of credits and active billing relationships, a webhook architecture change, and a 3-hour cutover that finished an hour ahead of plan. Zero customer-visible incidents, five customers requiring a same-day correction that none of them noticed.

If I had to compress the lessons:

The shape of vendor tooling determines the shape of your work. Stripe's PAN copy tool moves cards and customer IDs; everything else - credits, coupons, promo codes, the actual business state - is on you. Knowing exactly where the vendor's responsibility ends is the first task of a migration like this, not the last.
Choose the boring intermediate architecture when in-flight state spans the cutover. A cloud function router is uglier than direct webhooks per region. It's also the only thing that doesn't break the tail of customers whose invoices straddle the migration.
Build for the people who'll touch the data after you. The (ARCHIVED) suffix mattered more than most of the actual code. A migration ends when finance and sales can use the new system safely, not when the scripts finish running.
