DEV Community

Takayuki Kawazoe

We shipped a pricing page that was 50,000x off from the meter — a SaaS pricing post-mortem

I noticed it mid-conversation. A potential customer was reading our pricing page out loud — "Pro plan, ¥50,000 a month, 100 credits" — and asked, "so each credit is about ¥500?" I opened the backend. Our RateCard row had input_token_rate: 1.5, output_token_rate: 7.5, and the implicit unit was one credit equals ¥0.01. A typical repair task burns 80,000–125,000 credits. On the backend, ¥50,000 buys you five million credits. On the LP, ¥50,000 was being read as five hundred yen per credit.

The gap was 50,000x. Same word — "credit" — pointing at two different products.

I'm building an AI dev harness called Codens; the relevant context here is that the billing meter is real (PostgreSQL row-level enforcement, a RateCard table, idempotent consume_credit calls from worker pods), and the marketing copy was written months earlier when "credit" meant something else. The implementation moved. The copy didn't. Nobody re-checked.

This is the post-mortem. What the SSOT actually said, what the LP was actually saying, how the two diverged silently, and what we shipped to re-align them. I'm writing it because I suspect this exact failure mode is more common than people admit — pricing copy that drifts from the meter is the kind of bug that doesn't show up in CI.

Where the SSOT lived (and what it said)

The single source of truth for our pricing is two things, both in the Billing Control Plane (BCP) database:

  1. The billing_rate_cards table — token-to-credit conversion rates, model multipliers, minimum charge.
  2. The plans table — tier definitions: monthly credit quotas and their JPY prices.

The rate card is enforced at consume time. Every POST /credits/deduct call goes through ConsumeCreditUseCase._price, which does this:

async def _price(self, req: ConsumeCreditRequest) -> Decimal:
    card = await self.rate_card_repo.get_current()
    if card is None:
        raise NotFound("no active rate card")
    multiplier = card.get_model_multiplier(req.model_name)
    in_cost = (Decimal(req.input_tokens) * card.get_input_token_rate() * multiplier).quantize(
        Decimal("0.01")
    )
    out_cost = (
        Decimal(req.output_tokens) * card.get_output_token_rate() * multiplier
    ).quantize(Decimal("0.01"))
    total = in_cost + out_cost
    minimum = Decimal(str(card.rates.get("minimum_charge", 0)))
    if total < minimum:
        total = minimum
    return total

Note the quantize(Decimal("0.01")). That's not a coincidence. Credits are denominated to two decimal places; in this code, a credit is a fixed ¥0.01 currency unit, with charges tracked to hundredth-of-a-credit precision. The seed migration is explicit about it:

INITIAL_RATES: dict[str, object] = {
    "input_token_rate": "1.5",
    "output_token_rate": "7.5",
    "by_model": {
        "claude-haiku-4-5": "0.2",
        "claude-sonnet-4-6": "1.0",
        "claude-opus-4-7": "5.0",
    },
}

1 input token costs 1.5 credits (multiplied by model factor). 1 output token costs 7.5 credits. A typical Sonnet repair task — 50K input tokens, 20K output tokens — comes to 50000 * 1.5 + 20000 * 7.5 = 225,000 credits. With internal corrections and the 3-Retry loop, real-world averages land at 80K–125K credits. At ¥0.01 per credit, that's ¥800–¥1,250 per task in raw token cost.
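As a sanity check, that arithmetic can be reproduced in a few lines using the seed rates (a throwaway sketch; `task_cost` is not a function in the codebase):

```python
# Throwaway check of the per-task arithmetic, using the seed rates above.
from decimal import Decimal

RATES = {"input": Decimal("1.5"), "output": Decimal("7.5")}  # credits per token
YEN_PER_CREDIT = Decimal("0.01")                             # the SSOT unit

def task_cost(input_tokens: int, output_tokens: int,
              multiplier: Decimal = Decimal("1.0")) -> tuple[Decimal, Decimal]:
    """Return (credits, yen) for one task at a given model multiplier."""
    credits = (input_tokens * RATES["input"]
               + output_tokens * RATES["output"]) * multiplier
    return credits, credits * YEN_PER_CREDIT

credits, yen = task_cost(50_000, 20_000)  # the Sonnet example above
print(credits, yen)  # 225,000 credits, ¥2,250 in raw token cost
```

Swapping in the Haiku multiplier (0.2) gives 45,000 credits for the same token counts.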

That's the meter. It was correct. It was deployed. It was charging real customers correctly.

Then the plans table sits on top:

PLANS: list[tuple] = [
    ("free", "Free", 30_000, 0, 3),
    ("hobby", "Hobby", 300_000, 3_000, 5),
    ("pro", "Pro", 1_000_000, 10_000, 10),
    ("business", "Business", 5_000_000, 50_000, 25),
    ("enterprise", "Enterprise", None, 0, None),
]

Format: (tier, display_name, credit_quota, price_jpy, seat_limit). So Pro is 1,000,000 credits/month for ¥10,000. Business is 5,000,000 credits for ¥50,000. The implied price is ¥0.01 per credit, exactly matching the rate card's quantize precision. The two tables are internally consistent. The DB knows what it is.
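That internal consistency is checkable in two lines per tier. A quick sketch against the seed list above (not code from the repo):

```python
# The seed list, copied from the migration above: (tier, name, quota, yen, seats).
PLANS = [
    ("free", "Free", 30_000, 0, 3),
    ("hobby", "Hobby", 300_000, 3_000, 5),
    ("pro", "Pro", 1_000_000, 10_000, 10),
    ("business", "Business", 5_000_000, 50_000, 25),
    ("enterprise", "Enterprise", None, 0, None),
]

for tier, _, quota, price_jpy, _ in PLANS:
    if not quota or not price_jpy:  # skip free (¥0) and enterprise (no quota)
        continue
    # every paid tier implies exactly ¥0.01 per credit
    assert price_jpy / quota == 0.01, tier
```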

What marketing was saying

Now here's what the LP said before the alignment commit. I'll quote the actual diff:

<!-- BEFORE -->
<div class="text-4xl font-bold text-gray-900 mb-1">$333</div>
<div class="text-gray-500 text-sm">/month (excl. tax)</div>
<!-- ... -->
<span><strong>100 credits/month</strong></span>

Pro plan: $333/month, 100 credits/month. There was no Hobby tier. There was no mention of token-level metering. The "What is a Credit?" section said:

1 credit ≈ 1 AI repair or generation task. One credit is the unit consumed when AI completes one task. Depending on task complexity, 0.5–3 credits may be consumed per operation.

A reader doing the obvious math gets:

  • $333 / 100 credits = $3.33 per credit
  • At ¥150/USD: roughly ¥500 per credit

Then they go on to use the product. The first repair task fires. The meter charges them ~100,000 credits. They have ~900,000 credits left of a "100 credits/month" plan. The numbers on their dashboard make no sense relative to the LP. Either we're charging them 1,000x what we said, or "credit" doesn't mean what the LP said it meant.

The latter, of course. But you can't tell which from the LP. Same word, two products.

The Business tier was worse. The LP said $1,000/month for 500 credits. Implied unit price: $2/credit. Backend: 5,000,000 credits for ¥50,000 ≈ $333. So the LP wasn't just unit-confused; it was 3x more expensive than the meter, and it implied a unit price ($2 ≈ ¥300) roughly 30,000x the actual ¥0.01. Two errors stacked.

The exact arithmetic in the commit message is:

Old LP: Pro $333 = 100 credits (implied $3.33/credit ≈ ¥500/credit)
Backend SSOT: ¥0.01/credit (DB credit_packages: ¥1,000 = 100,000 credits)

¥500 vs ¥0.01. That's the 50,000x.
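Spelled out as code (the ¥150/USD rate is the one assumed earlier in this post, nothing more official):

```python
# Reproducing the commit-message arithmetic for the old Pro card.
lp_usd, lp_credits = 333, 100                          # old LP: $333/month, 100 credits
jpy_per_usd = 150                                      # assumed conversion rate
lp_yen_per_credit = lp_usd * jpy_per_usd / lp_credits  # 499.5, i.e. "~¥500"
ssot_yen_per_credit = 0.01                             # backend SSOT
gap = lp_yen_per_credit / ssot_yen_per_credit
print(round(gap))  # 49950: the "50,000x"
```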

How the gap formed

Reconstructing the timeline, here's the most charitable version of what happened. Earlier in the project, "credit" did mean "one AI task." That was the original product copy: "you get N tasks per month, each task = 1 credit." It was a clean abstraction for a landing page. It was also wrong as soon as the implementation moved to token-level metering, which it did when we needed actual cost-tracking accuracy. A "task" varies wildly — a small bug fix is 30K tokens, a multi-file refactor is 200K. Charging a flat 1 credit for both was either over-charging the small ones or eating the large ones.

So the meter migrated to per-token. The rate card got the 1.5/7.5/multiplier shape. The consume_credit use case started quantizing to ¥0.01. Internal numbers all aligned around the new unit.

The LP was never updated.

There's no single moment where the divergence "happened." It happened in the absence of a moment — the absence of a re-check pass triggered by the implementation change. The feature flag was "we are now metering by token." The follow-up work was "verify the meter, fix any consumer code that depended on the old unit, ship." Marketing copy lives in a separate repo (www-codens), is owned by a separate part of my brain, and isn't a "consumer of the new unit" in any tracked sense. Nothing in CI catches "the LP says one thing, the database says another." The two systems never have to agree because they never read each other.

The general shape: pricing copy that doesn't gate on the engineering SSOT will drift the moment the SSOT moves. This is the same class of bug as comments diverging from code, except the consequences are a customer trust failure rather than a code-review nuisance. A customer who computes ¥500/credit and then sees credits drain by the hundred thousand has been lied to, regardless of intent.

Detecting the drift

The detection event was unglamorous. I was on a call with a potential customer. They asked the unit-price question. I said, "let me pull up the actual numbers," opened the BCP DB and the LP side by side, and stared.

A few things made the gap diagnosable in seconds once both were on screen:

  1. The LP showed 100 credits/month. The DB showed 1_000_000. Four orders of magnitude off, just on the quota number.
  2. The LP showed $333 for Pro. The DB showed ¥10,000 (~$67). Five times off, in the other direction.
  3. The "What is a Credit?" copy said "1 credit ≈ 1 task." The rate card said input_token_rate: 1.5. The unit on the LP didn't even share dimensions with the unit in the meter.

The compound effect — wrong quota, wrong price, wrong unit definition — is what produced 50,000x. No single error was that large: the quota was off by 10,000x in one direction and the price by 5x in the other; stacked, they multiplied into the 50,000x unit-price gap.

The thing I want to call out: there was no monitoring that would have surfaced this. The meter was working correctly. Customer balances were tracking correctly against actual usage. CI was green. The bug was entirely between two documents that had no mechanical relationship — the rendered HTML and the seed migration. Either could be edited without touching the other and nothing would alert.

The git commit message I ended up writing pulls no punches:

fix(lp): align pricing page with backend SSOT (5 tiers, 1 credit = ¥0.01)

Marketing pricing was 5万倍ズレ vs implementation:

  • Old LP: Pro $333 = 100 credits (implied $3.33/credit ≈ ¥500/credit)
  • Backend SSOT: ¥0.01/credit (DB credit_packages: ¥1,000 = 100,000 credits)

"5万倍ズレ" is "50,000x off." I left it in Japanese in the commit because the visceral version of "we shipped pricing this wrong" reads better in the language I was thinking in when I caught it.

Re-alignment: what we shipped

The alignment commit (d082997) did three things.

One: restructure tiers to match the seed table. The LP went from a 4-tier structure (Free Trial / Pro / Business / Enterprise) to a 5-tier structure (Free Trial / Hobby / Pro / Business / Enterprise) matching the plans migration. Quotas and prices were copied directly from the seed:

<!-- AFTER -->
<div class="text-4xl font-bold text-gray-900 mb-1">$67</div>
<div class="text-gray-500 text-sm">/month (excl. tax) · ¥10,000</div>
<!-- ... -->
<span><strong>1,000,000 credits/month</strong> (~12-13 repair tasks)</span>

Note the (~12-13 repair tasks) annotation. That's the bridge between the meter unit (credits) and the human unit (tasks). The LP isn't pretending one credit equals one task anymore; it's giving you both numbers so you can do your own arithmetic.

Two: rewrite the "What is a Credit?" section to be honest about per-token pricing.

<h3>1 credit = ¥0.01 (~$0.000067)</h3>
<p>Pricing scales with actual token usage: 1 input token = 1.5 credits,
   1 output token = 7.5 credits, with model multiplier (Haiku 0.2x /
   Sonnet 1.0x / Opus 5.0x). Task consumption varies by complexity
   and retry count.</p>

This is the section I was most reluctant to rewrite. It surrenders the clean "1 credit ≈ 1 task" abstraction in exchange for technical accuracy. The trade is correct — accuracy beats simplicity when the simplicity is a lie — but it does make the LP harder to skim. The mitigation is the per-product example cards underneath: "Error Auto-Fix ≈ 80K-125K credits per task (¥800-1,250)." The reader gets the technical truth and the human-scale example side by side.

Three: fix the trial bonus to match a usable workload. The old trial was "2 credits/day (28 total)." Under the new (correct) unit, 28 credits covers about 0.03% of one task. It would have been a defective trial.

The fixed trial: 30,000 credits over 14 days. That's roughly 0.4 of one repair task — enough to start a task, see the orchestration, and (in many cases) finish a small one. Still not generous, but in the right order of magnitude. The "free trial = enough to do one real thing" rule was something I should have written down as a constraint earlier; without it, the old "28 total credits" had no anchor in real usage.

The top-up packages got the same alignment treatment, with prices matching the DB seed exactly:

¥1,000 → 100,000 / ¥5,000 → 525,000 (+5%) / ¥10,000 → 1,100,000 (+10%) / ¥50,000 → 5,750,000 (+15%)

The bonuses (5/10/15%) are a separate decision; what matters here is that the four package sizes on the LP correspond byte-for-byte to four rows in the database.
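Those four packages can be sanity-checked against the base rate in a loop (a quick sketch; the bonus percentages come from the list above):

```python
# Top-up packages as listed above: (price_jpy, credits, bonus_rate).
PACKAGES = [
    (1_000, 100_000, 0.00),
    (5_000, 525_000, 0.05),
    (10_000, 1_100_000, 0.10),
    (50_000, 5_750_000, 0.15),
]

for price_jpy, credits, bonus in PACKAGES:
    base = price_jpy * 100  # ¥0.01/credit means 100 credits per yen
    assert credits == round(base * (1 + bonus)), price_jpy
```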

Lessons for SaaS pricing as engineering

A few things I'm taking from this.

Pricing copy is downstream of the SSOT, not an upstream input. The bug existed because we treated the LP as a parallel artifact — written once, edited occasionally, sourced from "what we want to charge" rather than "what the meter actually does." Once a meter exists, the LP is a projection of the meter's state. If the meter's tiers are in a plans migration, the LP's tier cards have to be generated from (or at minimum cross-checked against) that migration. We're not yet generating the LP from the seed — that's the next step — but the alignment commit does at least make the values copy-pasted rather than independently invented.

"1 unit of consumption" framings need a meter that proves the unit. The "1 credit = 1 task" framing was honest at the moment we wrote it (when the meter was per-task) and dishonest the moment the meter moved (per-token). The general principle: any pricing-copy phrase of the form "X = Y" should be falsifiable by a query against the production database. If the LP says "100 credits per month" and the database disagrees, the database is correct and the LP is a bug.

Free trial size should match one full task, not "N credits/day." "2 credits/day" was a marketing-friendly framing that didn't survive contact with the meter. A trial isn't a daily allowance — it's a budget for the user to do one real thing and see the product work. The 30,000-credit trial gets close to that in our case (0.4 tasks; not quite); the "100,000 free for the trial period" framing other people use is more honest because the unit is the natural unit of consumption. If your trial isn't large enough to complete one canonical end-to-end use, it isn't a trial, it's a teaser.

When you find a 50,000x gap, the right move is to align AND drop prices. This is the part I almost skipped. The natural alignment was "raise the LP's credit numbers to match the SSOT" — keep the dollar prices, just stop saying "100 credits" when the meter gives you a million. But that would have meant a Pro plan at $333/month for 1,000,000 credits, which (at the actual cost-per-task) is over-priced relative to what we'd want to charge a developer dogfooding the tool. The alignment commit dropped the LP prices simultaneously: Pro $333 → $67, Business $1,000 → $333. The arithmetic justifying the drop is "we now know what it actually costs us per task; let's price the tier at a margin we'd agree to publish." Aligning copy without re-pricing would have produced a technically-correct LP that was strategically wrong.

Build the cross-check into CI. This is what I haven't done yet, and it's the thing that worries me most. There's no test that fails when the LP and the seed migration disagree. The fix is straightforward — parse the rendered HTML, extract the data-tier and data-credits attributes (which we'd need to add), assert they match a query against the plans table — but it's the kind of work that doesn't ship until it bites you a second time. I'd rather not have a second time.
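For what it's worth, here's roughly what I have in mind, sketched with the stdlib html.parser. The data-tier/data-credits attributes are hypothetical (they don't exist on the LP yet), and a real version would query the plans table instead of hard-coding the seed:

```python
# Hypothetical LP-vs-seed cross-check. The data-* attributes and the
# hard-coded seed dict are assumptions; neither exists in the repo yet.
from html.parser import HTMLParser

SEED_QUOTAS = {"hobby": 300_000, "pro": 1_000_000, "business": 5_000_000}

class TierScraper(HTMLParser):
    """Collects data-tier -> data-credits pairs from rendered LP HTML."""
    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "data-tier" in a and "data-credits" in a:
            self.found[a["data-tier"]] = int(a["data-credits"])

def assert_lp_matches_seed(rendered_html: str) -> None:
    scraper = TierScraper()
    scraper.feed(rendered_html)
    for tier, quota in SEED_QUOTAS.items():
        lp = scraper.found.get(tier)
        assert lp == quota, f"{tier}: LP says {lp}, seed says {quota}"

# Passes with aligned numbers; change any digit and it fails loudly.
assert_lp_matches_seed(
    '<div data-tier="hobby" data-credits="300000"></div>'
    '<div data-tier="pro" data-credits="1000000"></div>'
    '<div data-tier="business" data-credits="5000000"></div>'
)
```

Wired into CI with the real rendered HTML and a real query, this is the test that would have failed the day the meter moved.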

Closing

If you've shipped a SaaS product where the meter and the marketing copy live in different repos, I'm curious how you've kept them in sync — or whether you have. The failure mode I described doesn't show up in any CI signal I've seen; it shows up in customer conversations, weeks or months after the divergence happens. Has anyone wired up a "render the LP, scrape the prices, diff against the database seed" check that actually works? Or is everyone relying on "marketing reads engineering's PRs," which is the implicit policy that quietly fails the moment marketing copy is in a separate repo?

The deeper thing I haven't resolved: the LP isn't just numbers. It's also abstractions — phrases like "1 credit per task" that aren't literal claims but framings. Those drift even when the numbers are right. I don't have a CI check for "is this metaphor still true." The honest answer for now is "review the LP copy whenever the meter changes," which is a process, not a guarantee. Curious whether anyone has done better.
