Alex Cloudstar

Posted on • Originally published at alexcloudstar.com

Pricing AI Features in 2026: How To Charge For LLM-Backed Products Without Bleeding Margins

The first AI feature I shipped on a flat plan lost money on the third user who discovered it. Not slowly. Immediately. He was running a script through it on a loop because the UI did not stop him from doing that, and his single account burned through more in API costs that week than the feature was supposed to make in a month. I shipped the fix on a Sunday and rewrote the pricing on a Tuesday, and I have not priced an AI feature on a flat plan since.

That is the lesson the SaaS playbook had not caught up to yet in 2024 and that most teams have finally internalized by 2026. The economics of AI features are different from the economics of CRUD features. A heavy CRUD user costs you a few extra database rows. A heavy AI user costs you real money on every action they take. If your pricing does not reflect that, your power users are an unfunded liability and your accountant is the one who finds out.

This is what pricing AI features actually looks like in 2026: what works, what backfires, and how to land on a structure that scales with your costs instead of fighting them.

Why Flat Pricing Cannot Work For AI

The pitch for flat pricing is real. It is simple. It is predictable. It is what customers expect from SaaS. The reason it stops working for AI features is that the variance in how people use AI features is enormous, and the variance lands on your bill instead of theirs.

A normal SaaS feature has a soft cap on usage. There are only so many invoices a freelancer can send in a month. There are only so many tickets a support team can write. The heavy users pay the same as the light users and you make money on the average because the gap between them is bounded.

AI features have no such cap. A user with a script and a coffee can hit your endpoint a thousand times an hour without thinking about it. A user with a clever prompt can route a hundred-page document through your most expensive model and walk away. The cost per action is high enough and the variance wide enough that "average it out" stops being a viable financial model. You will lose money on the long tail and not make enough on the short tail to cover it.

The teams I have watched try to make flat pricing work end up doing one of three things. They cap usage in a way that frustrates the users they wanted to keep. They eat the cost and watch their gross margin compress until they have to raise prices on everyone. They quietly downgrade the model behind the feature until the quality drop kicks loose enough customers to balance the bill. None of these are good outcomes. All of them are what happens when you treat AI cost like SaaS cost.

The framing that has held up is that AI features have a unit cost that is not zero, is not negligible, and is not predictable from your headcount. They are closer to a metered service than a fixed-cost feature. Pricing them like a metered service is what works.

The Three Models That Have Converged

By 2026, the pricing models that survived contact with real AI workloads have shrunk to three. There are variations, but the underlying shapes are the same.

Pure usage-based. The customer pays per unit of work. Per token, per request, per generated artifact, per minute of voice. The pricing math passes through directly to the underlying cost with a margin on top. This is the model used by API-first products and by features where the unit of work is well-defined and the customer has a mental model for what they are buying. The OpenAI and Anthropic developer APIs are the canonical examples. So is most of what you would build on the Vercel AI Gateway.

Credit or token systems. The customer buys a bucket of credits per month and spends them on actions. Different actions cost different amounts of credits. Unused credits roll over or expire. This is the model that has won for consumer-facing AI products and prosumer SaaS, because it gives the customer a predictable monthly bill while still letting the vendor charge differentially for actions with different costs. ChatGPT's plan structure, Midjourney's GPU minutes, and the credit systems on most image generation tools all use some version of this.

Hybrid plans with overage. The customer pays a monthly subscription that includes a generous-but-bounded amount of AI usage, with overage billed per unit beyond the included amount. Power users pay more, light users pay the flat rate, and the vendor is protected from the worst-case user without making everyone feel metered. This is the dominant model for B2B SaaS that has added AI features on top of existing products. Notion, Linear, and the modern incarnation of every productivity tool all run some version of this.

The right choice depends on the product, the buyer, and the cost shape of the workload. A pure API product with sophisticated buyers should usually go usage-based. A consumer product where the unit of value is the action should usually go credit-based. A B2B feature glued onto an existing flat plan should almost always go hybrid.

The thing all three have in common is that the customer's bill moves with their usage in some way. Flat pricing breaks this link, and the link is what keeps the business model intact when usage spikes.
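To make the three shapes concrete, here is a toy comparison that bills the same month of usage under each one. Every price point is hypothetical, and the math is kept in integer cents, which is a good habit in billing code anyway because it avoids floating-point drift.

```python
# Illustrative only: the same month of usage billed under the three
# pricing shapes. All price points are made up; amounts are integer cents.

def usage_based(units: int, price_per_unit_cents: int) -> int:
    """Pure usage-based: pay per unit of work, cost passes straight through."""
    return units * price_per_unit_cents

def credit_based(units: int, credits_per_unit: int, included_credits: int,
                 pack_price_cents: int, pack_size: int) -> int:
    """Credits: a monthly bucket, with extra packs bought past the bucket."""
    shortfall = max(0, units * credits_per_unit - included_credits)
    packs = -(-shortfall // pack_size)  # round up to whole packs
    return packs * pack_price_cents

def hybrid(units: int, base_fee_cents: int, included_units: int,
           overage_per_unit_cents: int) -> int:
    """Hybrid: flat subscription plus metered overage past the cap."""
    return base_fee_cents + max(0, units - included_units) * overage_per_unit_cents

# A light user and a power user under the same hypothetical price points.
for units in (40, 4000):
    print(units,
          usage_based(units, 5),                  # 5 cents per unit
          credit_based(units, 1, 500, 1000, 500), # $10 packs of 500 credits
          hybrid(units, 2000, 1000, 3))           # $20 base, 3c overage
```

Note how the light user pays nothing extra under credits, the flat rate under hybrid, and almost nothing under pure usage, while the power user's bill scales in all three. That scaling is the link flat pricing breaks.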

Pricing Close To The Unit Of Value

The hardest part of pricing AI features is figuring out what the customer is actually buying. The token is the unit your provider bills you in. It is rarely the unit your customer cares about.

A customer using an AI writer does not care how many tokens went into the article. They care that they got an article. The unit of value is "article." A customer using a code review agent does not care about the input context size. They care that they got a review they could ship. The unit of value is "review." A customer using a chatbot does not care about the round-trip token count. They care that they got an answer. The unit of value is "answer," or maybe "conversation."

If you price in tokens to a customer who is buying articles, you create a UX where the customer is doing math in their head about what their next action will cost, every action they take. That math is anxiety, and anxiety on every click is how features get used less and churn goes up. You also expose your customer to your supply-side problems. Token counts shift when models change. A new model might be twice as efficient for the same output. A new prompt might be twice as long. None of that should land in your customer's invoice, because none of it is something they did differently.

The pattern that works is to price in the unit your customer thinks in, and absorb the token-level variance behind it. An "article" costs the customer a flat number of credits regardless of how many tokens you used to generate it. A "review" costs a flat number of credits regardless of which model variant ran. The router work I covered in the LLM router pattern guide is what makes this possible. You pick a cheaper model when you can, an expensive model when you have to, and the customer never sees the difference because they are paying for the outcome, not the inputs.
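A minimal sketch of that pattern: the customer is charged a flat credit price per article, while a hypothetical router picks a cheap or expensive model per request. The model names, per-token costs, and routing rule are all invented for illustration.

```python
# Sketch of "price the outcome, absorb the token variance."
# Model names and per-token costs below are hypothetical.

CREDITS_PER_ARTICLE = 5          # what the customer sees, always
MODEL_COST_PER_1K_TOKENS = {     # what we pay, in cents, never shown
    "cheap-model": 0.2,
    "frontier-model": 3.0,
}

def route(complexity: str) -> str:
    """Toy routing rule: easy jobs go to the cheap model."""
    return "cheap-model" if complexity == "easy" else "frontier-model"

def generate_article(complexity: str, tokens_used: int) -> dict:
    model = route(complexity)
    cost_cents = tokens_used / 1000 * MODEL_COST_PER_1K_TOKENS[model]
    return {
        "charged_credits": CREDITS_PER_ARTICLE,  # flat, regardless of model
        "internal_cost_cents": cost_cents,       # varies, our problem
        "model": model,
    }

print(generate_article("easy", 2000))   # cheap model, same 5 credits
print(generate_article("hard", 8000))   # frontier model, same 5 credits
```

The customer's price never moves; the spread between the flat credit price and the variable internal cost is where the routing work pays for itself.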

The exception is when the customer is technical enough to care about the underlying mechanics. If you are selling to developers building on your API, token-level pricing is what they expect, because they are reasoning about their own cost downstream. The closer the customer is to building their own AI product on top of yours, the more you should price the way their providers price them. The further they are from that, the more you should abstract.

Setting The Margin Without Setting Yourself On Fire

The naive way to price an AI feature is to take the underlying cost, multiply by some margin, and ship. This works for an afternoon and then fails as soon as your usage mix shifts.

A few traps to avoid.

Margins that are too thin. A 20 percent margin on AI usage looks reasonable until you realize that 20 percent margin is supposed to cover the entire rest of the business. Support, hosting, the engineers building the product, marketing, taxes. Twenty percent of an API call is not a business. Three to five times the underlying cost is not unreasonable for a B2C product. For B2B, the multiples are usually higher. The customer is buying a product, not access to a wholesale price list, and they are paying for the work you did to make the feature work, not just for the model call underneath.

Margins that are too thick on simple work. Charging ten dollars for an action that costs you ten cents and that the customer could replicate by typing into ChatGPT for free is how you get a churn rate that does not survive the first competitor. Customers in 2026 know what frontier models cost. The margin you charge has to be defensible by the work you did to make the workflow actually useful. If the answer is "we just stuck their text into a system prompt," you do not get to charge fifty times the underlying cost.

Pricing that does not move with model prices. Frontier model prices have come down between two and ten times in the last two years and will keep coming down. If your pricing is locked in based on what models cost a year ago, your competitors will undercut you with the same workflow on cheaper inference. Build pricing that moves. Either pass cost reductions through to the customer (and make a marketing moment of it), or hold the price and use the margin expansion to fund features the customer actually wanted.

Prices set without an eval. "We will route this to the cheap model and charge the expensive-model price" is the move that destroys trust the second time a customer notices the quality dropped. The cost-saving routing pattern works only if your evals confirm the cheaper model is good enough for the bucket. Without that, you are quietly downgrading your product to widen your margin, and customers will figure it out. The same eval discipline I covered in AI evals for solo developers is what keeps the routing honest.

The margin that holds up is the one you can defend with both the underlying cost math and the work you did to make the product useful. Both halves matter. Either alone is a bad answer.
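The traps above can be folded into a rough sanity check on a candidate price. The 3x floor comes from the "three to five times cost" heuristic in the text; the 10x ceiling for thin wrappers is my own illustrative threshold, not a rule.

```python
# Rough price sanity check. The 3x floor follows the heuristics above;
# the 10x thin-wrapper ceiling is an illustrative assumption.

def price_check(unit_cost_cents: int, price_cents: int,
                thin_wrapper: bool) -> str:
    multiple = price_cents / unit_cost_cents
    margin = (price_cents - unit_cost_cents) / price_cents
    if multiple < 3:
        return "too thin: this margin has to fund the whole business"
    if thin_wrapper and multiple > 10:
        return "too thick: hard to defend without real product work"
    return f"ok: {multiple:.1f}x cost, {margin:.0%} gross margin"

print(price_check(10, 12, thin_wrapper=False))   # barely above cost
print(price_check(10, 500, thin_wrapper=True))   # 50x on a wrapper
print(price_check(10, 40, thin_wrapper=False))   # 4x with real product work
```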

Designing The Plan Tiers

Once you have picked a model and set a margin, the next question is how the plan tiers should be shaped. This is the part where most teams pattern-match to SaaS and end up with tiers that do not work for AI.

The tiers that have held up have a few common features.

The free tier is bounded by usage, not by feature gates. Free users get a small but real allowance of the AI feature. Not a watered-down version. The same feature, with a usage cap. This is what lets free users actually evaluate whether the feature is worth paying for, instead of bouncing off a degraded version that did not show them the real value. The usage cap protects you from cost. The feature parity protects you from churn-before-conversion.

The middle tier is where most paying customers should land. It includes enough usage that a typical paying user does not hit the cap during a normal month. The price is set so that this tier is profitable on the average user. If you are seeing a lot of customers regularly hitting overage on this tier, the included usage is too low. If you are seeing the tier lose money on the average customer, the included usage is too high or the price is too low.

The top tier is for the heavy users and the ones who want predictability. It either has a much higher allowance or it includes overage credits at a discount. The customers on this tier are often businesses with a budget and a low tolerance for surprise invoices. The plan should give them predictable spend even at high volume, even if that means slightly higher unit cost than pure usage-based pricing would imply.

Overage exists at every tier. When a customer hits the cap, they get a clear, fair, undiscounted overage rate. Not a wall. Not a forced upgrade. Just a price they can choose to pay. Walls drive churn. Overage drives revenue. The math on which is better is not close.

The mistake to avoid is segmenting tiers by feature when the differentiator should be usage. If your AI feature is good enough to be the reason people are paying you, do not gate it behind the top tier. Gate it behind a usage allowance and let everyone use it. The customers who use it heavily will pay you more. The customers who use it lightly will not, and that is fine, because they cost you less.
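The middle-tier health check described above can be sketched as a small function over one month of per-customer usage. The 25 percent overage threshold is an assumption for illustration, not a benchmark.

```python
# Sketch of the middle-tier health check. The 25 percent overage
# threshold is an illustrative assumption.

def tier_health(usages, included_units, price_cents, cost_per_unit_cents):
    over = sum(1 for u in usages if u > included_units)
    # Cost to serve the included portion; overage is billed separately.
    avg_included_cost = (sum(min(u, included_units) for u in usages)
                         / len(usages) * cost_per_unit_cents)
    problems = []
    if over / len(usages) > 0.25:
        problems.append("included usage too low: many customers hit overage")
    if avg_included_cost > price_cents:
        problems.append("tier loses money on the average customer")
    return problems or ["tier looks healthy"]

print(tier_health([200, 300, 800, 1200], included_units=1000,
                  price_cents=2000, cost_per_unit_cents=1))
```

Run this on real usage data each month and the two failure modes in the text, an allowance set too low and a price set too low, show up as flags instead of surprises in the gross margin report.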

Communicating Usage Without Inducing Anxiety

The UX of usage-based pricing is the part that usually decides whether the model works. Customers who feel a meter ticking in their head every time they click are customers who click less. Customers who get a clean dashboard and a predictable bill are customers who use the product more, find more value, and renew.

The patterns that work.

A real-time usage indicator that is informative without being alarmist. Show the customer how much of their allowance they have used this month. Show it as a percentage with a color. Do not show it as an estimated dollar amount that updates with every action. The dollar amount makes every click feel expensive. The percentage makes the cap feel like a budget.

Soft warnings before hard limits. When a customer is at 80 percent of their allowance with two weeks left in the month, send a quiet email. Tell them they are on track to hit the cap, what their options are, and how much overage would cost. Do not let them surprise themselves at 100 percent. Surprise overages are the single largest source of "I did not understand what I was buying" support tickets, and those are the tickets that turn into chargebacks.
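The "on track to hit the cap" projection behind that email can be as simple as straight-line pacing. A minimal sketch, assuming a 30-day month and an 80 percent soft threshold:

```python
# Straight-line pacing for soft usage warnings. The 80 percent
# threshold and 30-day default are illustrative assumptions.

def usage_alert(used, allowance, day_of_month, days_in_month=30):
    pct = used / allowance
    projected = used / day_of_month * days_in_month  # straight-line pace
    if pct >= 1.0:
        return "over cap: overage rates now apply"
    if pct >= 0.8 or projected > allowance:
        return "soft warning: on track to hit the cap"
    return "ok"

print(usage_alert(used=820, allowance=1000, day_of_month=16))
# 82 percent used with half the month left -> soft warning
```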

Clear unit pricing on every action that costs credits. If a workflow costs 5 credits, tell the customer it costs 5 credits before they run it. Hidden costs feel like getting cheated, even when the price is fair. Visible costs feel like a transaction, which is what they are.

A monthly summary that ties usage to value. The customer should see, at the end of the month, what they got for their money. The number of articles, reviews, conversations, whatever the unit of value is. This is the artifact that makes renewal a yes when budgets get reviewed. The customer cannot remember what they did three weeks ago. The summary remembers for them.

The thing not to do is bury usage information in a settings page nobody opens. The whole point of usage-based pricing is that the cost reflects the value. If the customer cannot see the value, the cost feels arbitrary.

What Backfires

A few patterns look smart in the planning doc and turn into pain in production.

Free trials that do not have usage caps. A 14-day free trial of an AI feature with no usage limit is an invitation to a stranger to spend your money. Cap the trial at a sensible allowance. The serious customer will see the value within the cap. The freeloader will hit it and either convert or move on, and either is better than running up your bill for two weeks.

Custom pricing for everything above the lowest tier. "Contact sales" pricing makes sense for genuine enterprise deals. It does not make sense for the second paid tier of a self-serve product. Hiding prices forces a sales conversation on customers who would have happily paid the listed price, and most of them will leave instead of starting that conversation. Show prices. Negotiate enterprise.

BYO API keys. Letting the customer bring their own provider key looks like a way to push cost off your books. It also pushes off all the things that make your product work, including the routing, the evals, the caching, and the observability. The customer ends up running a worse version of your product on their own bill, and your value capture goes to zero. It is a pattern that resurfaces whenever budgets get tight, and it is almost always a worse business than just charging for the value you provide.

Caching savings hidden from the customer. If your prompt caching layer cuts your costs by 60 percent, the customer should benefit from that, either through better pricing or through more included usage. Pocketing the savings entirely while the customer pays the pre-cache rate is a posture that survives until a competitor undercuts it. The caching architecture I wrote about in the prompt caching production guide is great for margins. It is even better when some of those margins are reinvested in price.

Pricing model changes without notice. Doubling prices, slashing allowances, or moving features between tiers without warning is the fastest way to turn happy customers into vocal critics. Every AI product I have watched make this move has eaten a wave of churn and a quarter of bad press. If the math is not working, the answer is to grandfather existing customers and change the pricing for new signups, not to break the deal mid-flight.

A Simple Framework For New Features

When I sit down to price a new AI feature now, the questions I run through have stabilized into a short list.

What is the unit of value the customer is buying? Not the token. The article, the review, the conversation, the analysis. Whatever the customer would describe to a friend.

What is the underlying cost per unit of value, including the worst-case input? Not the average. The 90th percentile. The pricing has to survive the heavy user, not just the median one.

What multiple of cost is defensible given the work my product does on top of the model? More if I am providing meaningful workflow, eval, integration, distribution. Less if I am thinly wrapping a frontier model.

How does the customer want to buy this? A meter, a credit pack, or an included allowance with overage. The answer depends on whether they are a developer, a prosumer, or a business buyer.

What does the cap look like at each tier, and where does the average user land relative to the cap? The middle tier should be profitable for the average user without making them feel restricted.

What happens when usage spikes? The pricing has to gracefully handle the customer who suddenly does ten times their normal volume, without either melting my margins or crashing into a wall that ends the relationship.

How does the price move when underlying model costs change? Either the customer benefits and I market the change, or I capture the margin and reinvest in features. The pricing should not be a fossil.
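The first two questions reduce to a small worksheet: take the 90th percentile cost per unit of value rather than the average, then apply whatever multiple the product work justifies. All numbers below are hypothetical.

```python
# Toy pricing worksheet: price against the heavy user, not the median.
# Per-unit costs and the 4x multiple are hypothetical inputs.

def p90(costs_cents):
    """90th percentile by nearest rank: the heavy user, not the average."""
    ranked = sorted(costs_cents)
    idx = max(0, -(-len(ranked) * 9 // 10) - 1)  # ceil(0.9 * n) - 1
    return ranked[idx]

per_unit_costs = [3, 4, 4, 5, 5, 6, 7, 9, 14, 40]  # cents per "article"
unit_cost = p90(per_unit_costs)  # the 90th percentile, not the mean
price = unit_cost * 4            # a defensible-multiple assumption
print(unit_cost, price)
```

The mean of that sample is under 10 cents; the 90th percentile is 14. Pricing off the mean would make every heavy user a small loss, which is exactly the failure mode the framework is built to avoid.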

The answers shape the pricing. The pricing shapes the business. The business has to survive the worst customer in the dataset, not the average one. That framing is what flips AI feature pricing from "this is hard" to "this is solvable, and here is the structure that solves it."

What Does Not Change

The thing that has stayed constant through all of this is that customers will pay good money for AI features that solve real problems. They will not pay for AI features that solve fake problems, no matter how cheap the model gets. Pricing is the conversation about how much, in what shape, and on what terms. It is not the conversation about whether the feature is worth anything in the first place. That conversation happens upstream and the pricing model cannot rescue a feature that lost it.

The other constant is that the unit economics have to work. There is no pricing model that turns negative gross margin into a sustainable business. If a feature loses money on the average user, no amount of clever tiering will save it. The fix is upstream, in the cost structure, the routing, the model choice, the caching. Pricing is downstream of unit economics. It cannot beat them.

The AI feature pricing models that have converged in 2026 are not complicated. They are usage, credits, or hybrid with overage. The work is in the details. The unit of value, the margin, the tier shape, the UX, the caps, the warnings, the summaries. Get those right and you have a feature that scales with its costs instead of fighting them. Get them wrong and your most enthusiastic users are also the ones killing your business, which is not the user feedback loop you wanted.

That weekend I rewrote my pricing was the cheapest education I have ever bought. The bill that scared me into doing it was not even that high. The bill it would have been if I had not is the one I do not have to think about, because I caught it before it got there. Every AI feature I have shipped since has had pricing built in from day one, sized for the heavy user, communicated clearly, and tied to the unit the customer actually values. That has held up across model generations, across customer segments, across price drops in the underlying APIs. The pattern outlasts the parts.

If you are shipping an AI feature on a flat plan in 2026, the pricing is the bug. Fix it before your power user finds out it is there.
