VentureIO

Posted on • Originally published at hub.operatoriq.io

How to Build Pricing Tiers When AI Models Are Doing the Work

By Christine Johnson, Founder, OperatorIQ. Published 2026-06-25 at https://operatoriq.io/blog/ai-pricing-tiers-when-models-do-the-work/



"We're eating the inference cost on every request and I don't even know how to start charging for it."

That's the pricing problem most AI-native SaaS founders hit around month 8. The product works. The churn is low. But the margin math is getting weird because your LLM bill is scaling with usage and your revenue is flat.

Flat-rate pricing made sense in 2021. It does not make sense when Claude runs 400 times a month for one customer and 12 times for another, but both pay the same $49/month.

Here is the framework for building tiers that reflect what the models actually cost.

TL;DR

  • The core problem: when AI models run on every user action, your cost-per-user scales with usage and flat-rate pricing crushes margin on power users.
  • The solution: usage-gated tiers. Gate the AI feature at a usage cap on Starter. Remove the cap on Pro. Add overages or a high-volume tier above that.
  • The math: calculate loaded cost per active user per month. Set your Starter floor so the margin is positive at 80% of the cap. Set Pro price so it covers your 90th-percentile power user.
  • Credits vs caps: use caps for predictable per-call cost. Use credits when token count varies by 5x or more per request.
  • The upgrade hook: the cap is the mechanism. When a user hits 80% of their monthly cap, send an in-product nudge and an email. At 100%, hard-gate with an upgrade path. Do not apologize for it.

Why flat-rate breaks for AI products

A non-AI SaaS product has near-zero marginal cost per active user. Serving user 1,000 costs almost the same as serving user 1. Your COGS is mostly infrastructure and support headcount. Flat-rate works because the marginal cost stays flat.

An AI-native product has a material marginal cost per user action. Every time your product calls an LLM, you pay inference cost. If a power user runs your AI feature 2,000 times a month and a casual user runs it 20 times, they do not cost the same to serve. Charging them the same price is a choice, and it is the wrong choice once you have a realistic cost model.

Here is what that cost model looks like.

The inference cost baseline (Claude Haiku, 2026 pricing)

| Usage level | Calls/month | Avg tokens/call | Total tokens | Monthly inference cost |
|---|---|---|---|---|
| Light (casual user) | 40 | 800 | 32,000 | $0.40 |
| Medium (active user) | 200 | 1,000 | 200,000 | $2.50 |
| Heavy (power user) | 1,000 | 1,500 | 1,500,000 | $18.75 |
| Extreme (team lead) | 3,000 | 2,000 | 6,000,000 | $75.00 |

Based on a blended input/output rate of approximately $0.0125 per 1K tokens ($12.50 per million), the rate these rows assume for Claude Haiku. Adjust for your actual model choice: Sonnet runs ~6-8x higher per token, Opus ~20-24x.

The light user at $0.40/month in inference cost fits under almost any flat-rate plan. The extreme user at $75/month in inference cost does not fit under a $49/month flat-rate plan. You lose money on every customer at that usage level.

This is not a pricing philosophy problem. It is a unit economics problem. Fix it with tiers.
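The arithmetic behind the medium, heavy, and extreme rows can be sketched in a few lines of Python. This is a minimal sketch, not a billing implementation: the function name is hypothetical, and the rate of $0.0125 per 1K tokens is an assumption derived from those rows ($2.50 for 200,000 tokens), so substitute your provider's current pricing.

```python
# Assumed blended rate in USD per 1K tokens, implied by the table rows above.
# Replace with your provider's actual input/output pricing.
RATE_PER_1K_TOKENS = 0.0125


def monthly_inference_cost(calls_per_month: int,
                           avg_tokens_per_call: int,
                           rate_per_1k: float = RATE_PER_1K_TOKENS) -> float:
    """Estimate one user's monthly inference cost in USD."""
    total_tokens = calls_per_month * avg_tokens_per_call
    return round(total_tokens / 1000 * rate_per_1k, 2)


# Usage profiles taken from the table: (calls/month, avg tokens/call).
profiles = {
    "medium": (200, 1_000),
    "heavy": (1_000, 1_500),
    "extreme": (3_000, 2_000),
}

for name, (calls, tokens) in profiles.items():
    print(f"{name}: ${monthly_inference_cost(calls, tokens):.2f}")
```

Swapping in a different `rate_per_1k` is the quickest way to see how a move from Haiku to a larger model shifts every row.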

The 3-tier structure for AI-native products

Most AI-native products land on a 3-tier model: Starter, Pro, and Team or High-Volume. Here is the logic for each.

Starter: covers your median active user, hard-gated

Goal: make entry easy, protect margin on the majority.

Price: set so gross margin stays positive at 80% utilization of the cap.

Example: if your median active user runs 200 calls/month at 1,000 tokens average, your inference cost is $2.50. Add infra and support overhead of $3.00/user. Floor is $5.50. A $29/month plan gives ~81% gross margin at median usage. That is healthy.

Cap: 200 AI calls/month (or equivalent in tokens or credits). Not a soft limit. A hard gate with a clear upgrade path.

What goes behind the gate: the AI feature itself, not the product. The core product should work without the AI feature at Starter. The AI feature is the upgrade hook.
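The Starter example above reduces to two small calculations: loaded cost at a given utilization of the cap, and gross margin at a given price. The sketch below reproduces them; the function names are hypothetical, and the $0.0125 per 1K tokens rate is an assumption implied by the example's own figures ($2.50 for 200 calls at 1,000 tokens).

```python
def inference_cost_at_utilization(cap_calls: int,
                                  utilization: float,
                                  avg_tokens_per_call: int,
                                  rate_per_1k: float = 0.0125) -> float:
    """Inference cost in USD when a user consumes `utilization` of the cap."""
    calls = cap_calls * utilization
    return calls * avg_tokens_per_call / 1000 * rate_per_1k


def gross_margin(price: float, inference_cost: float, overhead: float) -> float:
    """Gross margin as a fraction of price, after inference and overhead."""
    loaded_cost = inference_cost + overhead
    return (price - loaded_cost) / price


# Starter at median usage: 200 calls, $2.50 inference, $3.00 overhead, $29 price.
print(f"median usage: {gross_margin(29, 2.50, 3.00):.0%}")

# Starter at 80% of the 200-call cap.
cost_80 = inference_cost_at_utilization(200, 0.8, 1_000)
print(f"80% of cap:   {gross_margin(29, cost_80, 3.00):.0%}")
```

Running the same two functions against a user who sits at 100% of the cap every month shows how much headroom the price leaves before the gate does its job.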

Pro: covers your 90th-percentile power user

Goal: grow with your serious users. Capture the value they get from high usage.

Price: set so gross margin stays positive at your 90th-percentile usage level.

Example: if your 90th-percentile power user runs 1,000 calls/month at 1,500 tokens average, inference cost is $18.75. Add $5.00 overhead. Floor is $23.75. A $99/month plan gives ~76% gross margin on that user. Still healthy.

Cap: no cap, or a very high cap (5,000 calls/month) with team sharing.

What goes in Pro: uncapped AI usage, team seats (usually 3 to 5), priority processing if your inference queue has latency, and API access if you offer it.

High-Volume / Team: covers your 99th-percentile enterprise user

Goal: capture the outlier usage without losing the sale.

Price: $299 to $999/month, or a custom quote.

Why this tier exists: at 3,000+ calls/month per user at high token counts, your inference cost runs $75 to $150/month per seat. Once overhead is added, a $99 Pro plan is thin-margin at the low end of that range and margin-negative at the high end. You need a tier that keeps the math positive at extreme usage.

What goes in Team: custom inference quotas, dedicated support, SSO, audit logs, SLA. The features that enterprise buyers need to justify the budget and get past procurement.
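One way to see why a third tier is needed is to ask, for a given loaded cost per user, which plan is the cheapest one that still clears a margin target. The sketch below does that with the article's example prices ($29, $99, $399). The function name and the 50% minimum margin are assumptions chosen for illustration, not figures from this post.

```python
from typing import Optional

# Example plan prices from this article, cheapest first.
PLANS = [("Starter", 29.0), ("Pro", 99.0), ("Team", 399.0)]


def cheapest_viable_plan(loaded_cost: float,
                         min_margin: float = 0.5) -> Optional[str]:
    """Return the cheapest plan whose gross margin meets `min_margin`.

    `loaded_cost` is monthly inference cost plus overhead for one user.
    Returns None when no listed plan clears the target (custom quote territory).
    """
    for name, price in PLANS:
        margin = (price - loaded_cost) / price
        if margin >= min_margin:
            return name
    return None


print(cheapest_viable_plan(5.50))    # median user: $2.50 + $3.00
print(cheapest_viable_plan(23.75))   # 90th percentile: $18.75 + $5.00
print(cheapest_viable_plan(85.00))   # extreme user: $75 + $10 overhead
print(cheapest_viable_plan(400.00))  # beyond every listed plan
```

With an extreme user's $85 loaded cost, Pro's margin is about 14%, which is why that segment has to land on the higher tier or a custom quote.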

Credits vs caps: when to use each

There is a practical question under the tier question: do you express your usage limit in "runs per month" or "credits per month"?

Use runs (caps) when:

  • Your per-call token cost is predictable (within a 2x range per call)
  • Your customer base is non-technical and credit math creates friction
  • Your product has a natural unit (one document analyzed = one run, one email drafted = one run)

Use credits when:

  • Your per-call cost varies by 5x or more based on input size (short vs long documents, quick vs deep analysis)
  • You want to let users allocate usage across feature types at their own discretion
  • You have a developer audience comfortable with token/credit economics

Never use credits when:

  • Your customers are small business owners who don't think in credits
  • Credits would require more than one sentence to explain on your pricing page
  • Your support team would spend 20% of their time explaining why a customer ran out of credits faster than expected

Most B2B SaaS products serving operators, founders, and small businesses should use caps. Credits are for developer-facing products and high-token-variance use cases.
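The rules above can be turned into a rough check against your own logs. The sketch below measures the spread between the 5th and 95th percentile of tokens per call and applies the 5x threshold. It is a sketch under stated assumptions: the function name is hypothetical, using percentiles rather than min/max is a choice made here to ignore outliers, and defaulting to caps in the 2x to 5x gray zone is a judgment call the text above does not specify.

```python
from statistics import quantiles


def recommend_limit_unit(tokens_per_call: list[int],
                         technical_audience: bool) -> str:
    """Suggest "caps" or "credits" from observed per-call token counts.

    Expects at least two positive token counts.
    """
    # 19 cut points for n=20; the first is p5 and the last is p95.
    cuts = quantiles(tokens_per_call, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    spread = p95 / p5

    # Credits only when cost varies 5x or more AND users can reason in credits.
    if spread >= 5 and technical_audience:
        return "credits"
    # Predictable cost, a non-technical audience, or the gray zone: use caps.
    return "caps"


steady = [900, 950, 1_000, 1_050, 1_100]      # e.g. email drafts
variable = [200, 500, 2_000, 8_000, 20_000]   # e.g. document analysis

print(recommend_limit_unit(steady, technical_audience=True))
print(recommend_limit_unit(variable, technical_audience=True))
print(recommend_limit_unit(variable, technical_audience=False))
```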

The upgrade trigger: how to turn the cap into revenue

The cap is only a revenue mechanism if you have an upgrade trigger. Without it, users hit the cap, get frustrated, and churn.

Here is the trigger sequence that works.

At 80% of cap: send an in-product banner and an email. Subject: "You've used 80% of your AI runs this month." Body: one sentence on what they've accomplished, one sentence on the upgrade path, one CTA button. No apologizing. No "we hate to limit you." Just the fact and the path forward.

At 100% of cap: hard-gate with a modal. The AI feature stops working. The modal has two buttons: Upgrade to Pro and View Usage. No dismiss button that lets them keep running on Starter after hitting the cap.

On the Upgrade to Pro page: show the math. "You used 200 runs in the first 18 days of this month. At that pace, you'd use 333 runs in a full month. Pro gives you unlimited runs at $99/month." Make the upgrade feel like the obvious next step, not a punishment.

After upgrade: send one email in 24 hours confirming what changed. "Your AI runs are now uncapped. Here's what you can do with the extra capacity." Link to 2 to 3 specific features or use cases unlocked by Pro.

This sequence (80% nudge, 100% hard gate, clear upgrade modal, post-upgrade confirmation) is the standard. It is not aggressive. It is the expected behavior for any product with usage limits. Users who want Pro will upgrade. Users who don't will stay on Starter within their cap. Both outcomes are fine.
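The threshold logic in that sequence is small enough to sketch directly. The function names and return labels below are hypothetical, and the 30-day month is an assumption that matches the "333 runs" projection in the upgrade-page copy above. Wiring the result to banners, emails, and modals is left to your own stack.

```python
def cap_action(runs_used: int, cap: int) -> str:
    """Map current usage to the trigger step: "none", "nudge", or "hard_gate"."""
    # Integer comparisons avoid floating-point edge cases at exactly 80%.
    if runs_used >= cap:
        return "hard_gate"
    if runs_used * 100 >= cap * 80:
        return "nudge"
    return "none"


def projected_monthly_runs(runs_used: int,
                           days_elapsed: int,
                           days_in_month: int = 30) -> int:
    """Extrapolate the current pace to a full month for the upgrade page."""
    return round(runs_used / days_elapsed * days_in_month)


print(cap_action(120, 200))             # below 80% of the cap
print(cap_action(160, 200))             # exactly 80% of the cap
print(cap_action(200, 200))             # cap reached
print(projected_monthly_runs(200, 18))  # pace shown on the upgrade page
```

In practice the nudge should fire once per billing period, so store a flag when it is sent rather than re-evaluating on every request.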

One mistake to avoid: tying AI access to seat count

The most common AI pricing mistake in B2B SaaS in 2026: gating AI features behind seat count instead of usage.

"Starter: 1 seat. Pro: 5 seats. Team: unlimited seats."

This is a seat-count pricing model wearing an AI costume. It creates a painful outcome: a 2-person team that uses your AI feature heavily is on Starter (cheap for them, expensive for you to run) while a 50-seat enterprise that barely touches AI features is on Team (expensive for them, cheap for you to run). You're pricing backwards.

Price the AI on what it costs you to run, which is usage, not on headcount, which is incidental.

Seats can coexist with usage in your tiers (Pro includes 5 seats + 1,000 runs/month). But the AI gate should be on the usage side, not the seat side.
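To put numbers on the seat-count inversion, the sketch below compares the two accounts from the example using the heavy and light usage profiles from the inference cost table. The assumptions are loud ones: the $29 and $399 prices are borrowed from this article's example tiers and attached to seat-based plans purely for illustration, the rate is the $0.0125 per 1K tokens implied by that table, and overhead is ignored.

```python
RATE_PER_1K_TOKENS = 0.0125  # assumed blended rate, USD per 1K tokens


def account_margin(price: float,
                   seats: int,
                   calls_per_seat: int,
                   avg_tokens_per_call: int) -> float:
    """Gross margin for a whole account, counting inference cost only."""
    total_tokens = seats * calls_per_seat * avg_tokens_per_call
    inference_cost = total_tokens / 1000 * RATE_PER_1K_TOKENS
    return (price - inference_cost) / price


# 2-person team, heavy usage (1,000 calls at 1,500 tokens each), on Starter.
small_team = account_margin(29, seats=2, calls_per_seat=1_000,
                            avg_tokens_per_call=1_500)

# 50-seat enterprise, light usage (40 calls at 800 tokens each), on Team.
enterprise = account_margin(399, seats=50, calls_per_seat=40,
                            avg_tokens_per_call=800)

print(f"small heavy team on Starter: {small_team:.0%}")
print(f"large light team on Team:    {enterprise:.0%}")
```

The small team costs more to serve than it pays, while the enterprise account is almost pure margin, which is the mismatch a usage-side gate corrects.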

What LLM visibility has to do with this

If you're building an AI-native product and pricing it right, you have a secondary problem: AI assistants need to know you exist.

When your potential customers ask Claude, ChatGPT, or Perplexity for "AI-native SaaS pricing tools" or "how to build LLM pricing tiers," do you show up?

If you don't know the answer to that question, you have an AI visibility problem as real as your pricing problem. The $197 LLMRadar Audit at operatoriq.io/tools/ runs your brand across 4 LLMs with 10 queries, returns a cited-or-not matrix, and tells you exactly what to fix. The same operators building AI-native products are the ones who will search for your category in ChatGPT next quarter. Show up for them.

The tier structure summary

| Plan | Price | AI runs/month | Loaded cost at target percentile | Gross margin |
|---|---|---|---|---|
| Starter | $29/month | 200 | $2.50 + $3 overhead (median) | ~81% |
| Pro | $99/month | Unlimited (soft cap 5,000) | $18.75 + $5 overhead (90th pct) | ~76% |
| Team | $399/month | Custom quota | $75 + $10 overhead (99th pct) | ~79% |

Numbers based on Claude Haiku blended rate. Adjust for your model choice and your actual support/infra overhead.

The table is illustrative. Your numbers will differ. The structure will not. Starter covers median usage with a hard cap. Pro covers power users at a margin you can sustain. Team covers outlier usage at a price that keeps the unit economics positive.

Run the math on your own usage data before you set prices. Segment your active users by AI feature usage, find your 50th, 90th, and 99th percentile, and price each tier so it covers the loaded cost at that percentile with room for gross margin.
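That segmentation step might look like the sketch below, assuming you can export one number per active user: AI calls in the last month. The function names are hypothetical, the sample data is synthetic, and the price floor is plain margin algebra (price = loaded cost / (1 - target margin)) rather than a recommendation for where to set the final price.

```python
from statistics import quantiles


def usage_percentiles(monthly_calls_per_user: list[int]) -> tuple[float, float, float]:
    """Return the 50th, 90th, and 99th percentile of monthly AI calls."""
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile.
    cuts = quantiles(monthly_calls_per_user, n=100, method="inclusive")
    return cuts[49], cuts[89], cuts[98]


def price_floor(loaded_cost: float, target_margin: float) -> float:
    """Lowest price that still hits `target_margin` at this loaded cost."""
    return loaded_cost / (1 - target_margin)


# Synthetic sample: mostly light users with a long tail of heavy ones.
sample_calls = [20] * 50 + [200] * 35 + [1_000] * 12 + [3_000] * 3
p50, p90, p99 = usage_percentiles(sample_calls)
print(f"p50={p50:.0f}  p90={p90:.0f}  p99={p99:.0f} calls/month")

# Floor for the Starter example: $5.50 loaded cost at an 80% margin target.
print(f"Starter floor: ${price_floor(5.50, 0.80):.2f}")
```

Feed each percentile's call count through your own cost model to get the loaded cost for that tier, then compare the resulting floor against what the market will bear.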

If you want this pricing architecture built for your product (tier logic, Stripe usage gates, upgrade trigger emails, and the cap enforcement layer), see the Concierge build at operatoriq.io/done-for-you/concierge/. Seven days. Flat fee. No calls.

Next up

This covers the tier structure. The next post covers a related question: how do you restructure your existing flat-rate plans without churning the customers who are on them? It walks through the migration sequencing, the grandfathering logic, and the messaging that keeps upgrade rates high without triggering a support spike.

Cheers,

Christine

---

Originally published on OperatorIQ on 2026-06-25.
