Khushi Dubey

Posted on Jun 23 • Originally published at opslyft.com

AI Costs Are Cloud Costs Now: Why FinOps Is the New Playbook for AI Spend

#infrastructure #llm #cloud #ai

Something quietly changed inside finance dashboards over the last eighteen months. The line item for AI tools used to be small and predictable. Now it sits right next to the cloud bill, growing at a pace nobody fully forecasted, and looking suspiciously similar to how AWS looked back in 2015.

This is not a coincidence. AI coding assistants, model APIs, and agent platforms all bill on usage. They are variable. They are skewed by power users. And most teams have almost no visibility into who is spending what, on which models, for which projects.

In this guide, you will learn why AI spend behaves exactly like cloud infrastructure spend, what FinOps lessons apply directly, and a practical framework you can use this quarter to bring AI costs under control without slowing your engineering teams down.

Why Today's AI Spend Looks Identical to Early Cloud Bills
Ten years ago, most finance teams treated cloud as a single line item. Engineering ran the show. Spend grew quietly until it didn't, and then everyone scrambled.

AI is repeating this exact pattern, just on a faster clock.

A few drivers explain why:

AI tools moved from fixed-seat pricing to usage-based pricing in under two years
Power-user skew is severe; a small percentage of developers often drive most of the consumption
Multiple models with very different price points create silent cost differences
Agentic workflows accumulate token cost in ways that linger long after a session ends
No engineer is incentivized to think about cost while they code
According to research from McKinsey's State of AI series, generative AI adoption inside companies more than doubled in a single year. That kind of growth curve mirrors the early AWS era, when teams discovered that elastic also meant expensive at scale.

The takeaway is simple. AI spend is not a new beast. It is the next chapter of cloud cost management, and the playbook that worked for EC2 and S3 already works for tokens and prompts.

The Visibility Gap That Is Quietly Costing Companies Millions
Walk into most engineering organizations and ask a simple question.

"Which team spent the most on AI last week?"

Silence usually follows. Or someone pulls up a vendor dashboard that shows total seats and a token total, but nothing useful below the surface.

This is the same gap cloud teams had a decade ago. The bill arrives. The total goes up. Nobody knows exactly why.

Common blind spots include:

Which developers are responsible for the largest spend
How spend splits across input tokens, output tokens, and cached tokens
Which models drive the cost across Claude, GPT, Gemini, and open-source families
Whether spend is mostly autocomplete or mostly long-running agent sessions
How spend correlates with actual engineering output
Gartner has been warning for years about shadow IT growing inside organizations. The new version is shadow AI. Developers find a tool, use it, expense it, and finance discovers it three quarters later when the consolidated invoice arrives.

The fix is not new technology. The fix is visibility, the same kind cloud cost programs built years ago.

Five FinOps Principles That Apply Directly to AI
The FinOps Foundation spent years codifying what good cloud cost management looks like. Most of it transfers cleanly to AI.

Here are five principles worth lifting straight off the shelf:

Visibility comes before control. You cannot manage what you cannot see. Get the data first.
Allocate spend to teams, projects, and outcomes. Top-line totals are useless; team-level breakdowns are actionable.
Measure unit economics, not raw spend. Dollars per PR, dollars per ticket, dollars per deploy.
Detect anomalies early. Use a daily or weekly cadence, not monthly.
Use informed guardrails, not hard caps. Educate engineers; do not lock them out.
The pattern that emerges here is not technological. It is cultural. Finance and engineering have to share the same numbers.

Tagging and Allocation for AI: Treat Tokens Like EC2 Hours
In cloud cost work, tagging is the foundation. Without it, allocation is impossible.

AI spend actually has better attribution data than most cloud services. Every API request typically includes:

The model used
Input and output token counts
Latency and request metadata
Optional custom metadata fields
Caller identity, when API keys are scoped correctly
The raw signal is rich. The challenge is converting it into something a non-technical stakeholder can actually use.

A simple mapping looks like this:

| Raw AI Data | Business Translation |
| --- | --- |
| 2.3M Opus input tokens, dev_id 472 | Payments squad refactor, week 14 |
| 800K cached tokens on agent runs | Docs team migration, ongoing |
| 1.1M output tokens, GPT family | Support ticket triage automation |
| 50K tokens, Haiku model | Inline autocomplete, all engineering |

That kind of breakdown turns a single invoice into a story finance can understand, and a budget engineering can own.
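
As a rough illustration of that translation step, the sketch below rolls raw usage records up into per-team cost. Everything named here is an assumption made for the example: the `UsageRecord` fields, the `KEY_OWNER` mapping from API key to team, the tier names, and the prices in `PRICE_PER_MTOK`, which are placeholders rather than any vendor's real rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; substitute your vendor's real rates.
PRICE_PER_MTOK = {
    ("premium", "input"): 15.00,
    ("premium", "output"): 75.00,
    ("small", "input"): 1.00,
    ("small", "output"): 5.00,
}

# Hypothetical mapping from API key to the team that owns it.
KEY_OWNER = {"key-472": "payments", "key-881": "docs"}


@dataclass
class UsageRecord:
    api_key: str
    model_tier: str   # "premium" or "small" in this sketch
    token_kind: str   # "input" or "output"
    tokens: int


def allocate(records):
    """Return total USD cost per team; unknown keys land in 'unallocated'."""
    totals = defaultdict(float)
    for r in records:
        team = KEY_OWNER.get(r.api_key, "unallocated")
        rate = PRICE_PER_MTOK[(r.model_tier, r.token_kind)]
        totals[team] += r.tokens / 1_000_000 * rate
    return dict(totals)


records = [
    UsageRecord("key-472", "premium", "input", 2_300_000),
    UsageRecord("key-881", "small", "output", 800_000),
    UsageRecord("key-000", "small", "input", 50_000),  # key with no known owner
]
costs = allocate(records)
```

The `unallocated` bucket is a deliberate choice in this sketch: spend that cannot be tied to an owner stays visible instead of being dropped, which mirrors how untagged resources are usually reported in cloud cost tools.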

If your organization already has cost allocation workflows for cloud, you do not need to start from zero. Add AI as another provider with another set of dimensions, and feed it into the same reports.

If you are still building cloud allocation muscle, the opslyft blog covers tagging strategies that translate naturally to AI spend management.

Unit Economics: The Metric That Actually Matters
Raw spend numbers do not tell you whether your AI investment is working. Unit economics do.

Consider two teams.

Team A spends $4,000 per month on AI tools and ships 80 PRs.
Team B spends $4,000 per month on AI tools and ships 35 PRs.
Same spend. Very different efficiency. Without unit economics, the dashboards look identical.

The metrics that matter most include:

Cost per PR merged. How much does it cost in AI tokens to ship a unit of code?
Cost per ticket closed. How much does it cost to resolve a unit of planned work?
Cost per deploy. Measured across the full pipeline from prompt to production.
AI cost per developer per sprint. Is utilization rising as the team learns?
Cost per AI-assisted feature. End-to-end, including review and rework.
Computing these requires connecting two data sources. The cost side comes from your AI providers (Anthropic, OpenAI, Cursor, GitHub Copilot, and so on). The output side comes from GitHub, GitLab, Linear, Jira, or your CI pipeline.
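
Using the Team A and Team B figures from earlier in this section, the join might look like the minimal sketch below. The function name `cost_per_unit` and the dictionary shapes are assumptions for illustration; in practice the spend side would come from provider exports and the output side from your source control or ticketing system.

```python
def cost_per_unit(spend_by_team, output_by_team):
    """Join spend and output on team name and return USD per unit of output.

    Teams with zero or missing output map to None instead of dividing by zero.
    """
    result = {}
    for team, spend in spend_by_team.items():
        units = output_by_team.get(team, 0)
        result[team] = round(spend / units, 2) if units else None
    return result


monthly_spend = {"team_a": 4000.0, "team_b": 4000.0}  # USD, from the example above
merged_prs = {"team_a": 80, "team_b": 35}             # PRs shipped, from the example above

cost_per_pr = cost_per_unit(monthly_spend, merged_prs)
# team_a works out to 50.0 USD per PR, team_b to 114.29 USD per PR
```

The same function works for tickets or deploys by swapping the second argument.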

When you put them together, conversations change. Instead of asking why AI costs are going up, the question becomes whether each dollar is producing more output than it did last quarter.

That is a question finance and engineering can actually answer together.

Detecting Anomalies Before They Become Invoices
Usage-based spend produces surprises. Cloud taught us this. AI is no different.

Common AI cost spikes include:

A developer leaves an agentic session running overnight with a runaway retry loop
A team switches from a lower-tier to a higher-tier model and the cost jumps 10x without anyone noticing
A long-running agent accumulates context until each turn costs five times the first
An automated workflow hits an edge case and retries hundreds of times
A new feature ships with verbose prompts and silently triples cost per request
Most of these are invisible until the monthly invoice arrives. By then the damage is done.

Anomaly detection works the same way it does in cloud. Set baselines, monitor daily or weekly, flag deviations, and surface them to the right team owner. The detection logic is identical. Only the patterns differ.

A few quick wins to set up immediately:

Daily per-developer spend baseline with a 2x threshold
Per-team weekly trend with month-over-month comparison
Model mix alert that notifies when premium model usage exceeds a percentage
Session-length alert that flags when a single agentic session exceeds a token threshold
None of this requires fancy machine learning. Simple thresholds catch the vast majority of cost surprises.
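
To make the first quick win concrete, here is a minimal sketch of a per-developer daily check with a 2x threshold. Using the median of recent days as the baseline, and skipping developers with fewer than five days of history, are assumptions chosen for this example rather than anything prescribed above.

```python
from statistics import median


def flag_spend_anomalies(history, today, multiplier=2.0, min_days=5):
    """Return developers whose spend today exceeds multiplier x their baseline.

    history: dict of developer -> list of recent daily spend in USD
    today:   dict of developer -> today's spend in USD
    The baseline is the median of the history (an assumption of this sketch).
    Developers with fewer than min_days of history are skipped.
    """
    flagged = {}
    for dev, spend in today.items():
        past = history.get(dev, [])
        if len(past) < min_days:
            continue
        baseline = median(past)
        if baseline > 0 and spend > multiplier * baseline:
            flagged[dev] = {"baseline": baseline, "today": spend}
    return flagged


history = {
    "dev_a": [12.0, 9.5, 11.0, 10.0, 13.0],
    "dev_b": [4.0, 5.0, 4.5, 6.0, 5.5],
    "dev_c": [3.0, 2.0],  # too little history to judge
}
today = {"dev_a": 14.0, "dev_b": 48.0, "dev_c": 40.0}

alerts = flag_spend_anomalies(history, today)
```

A median is less sensitive than a mean to one earlier spike, so a single bad day does not raise the baseline and hide the next one.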

Why Hard Caps Fail, and What to Use Instead
One of the harder lessons in cloud cost work was that blunt controls backfire.

Restrict instance types and engineers work around the restriction, often using more compute than it was meant to save. Cap spend at a hard limit and entire projects stall in the last week of the month.

The same applies to AI.

If you cut off a developer's access to a high-quality model, they will fall back to a cheaper one, take longer to ship, and burn more total tokens in the process. The productivity gain that justified the tool evaporates.

Better alternatives include:

Soft budgets with alerts. "You are at 80% of your typical monthly spend with two weeks left" is useful. A shutoff is not.
Task-aware model guidance. Heavy reasoning warrants a premium model. Inline autocomplete does not. Make this explicit.
Real-time session cost visibility. Show developers what a session is costing as it runs.
Default to cheaper models with easy escalation. Use the cheapest model that meets the task, with a clear path to upgrade when needed.
Education over restriction. A short internal guide on model selection beats any cap.
The pattern here is the same one that worked in cloud. Trust engineers, give them the data, and let them make informed decisions.
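
A soft budget of the kind described above can be very small. The sketch below only produces an advisory message and never blocks a request. The function name, its parameters, and the 80% default are assumptions for illustration.

```python
def soft_budget_message(spent, typical_monthly, day_of_month, days_in_month, warn_at=0.8):
    """Return an advisory string once spend reaches warn_at of typical spend, else None.

    This never blocks anything; it only produces a message to surface to the developer.
    """
    if typical_monthly <= 0:
        return None
    fraction = spent / typical_monthly
    if fraction < warn_at:
        return None
    days_left = days_in_month - day_of_month
    return (
        f"You are at {fraction:.0%} of your typical monthly spend "
        f"with {days_left} days left."
    )


message = soft_budget_message(spent=320.0, typical_monthly=400.0,
                              day_of_month=16, days_in_month=30)
quiet = soft_budget_message(spent=100.0, typical_monthly=400.0,
                            day_of_month=16, days_in_month=30)
```

Because the function returns text rather than raising or denying access, the decision about what to do next stays with the engineer.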

Three Real Scenarios Where Companies Burn Money on AI
A few patterns come up over and over in conversations with engineering and finance leaders.

The Forgotten Agent. A developer kicks off an agent on Friday afternoon to refactor a service. They go home. The agent hits a flaky test, retries, escalates context, retries again, and runs all weekend. Monday morning brings a single-developer spend equal to the rest of the team for the month.

The fix: a session-length alert and a per-session budget cap, not a per-developer cap.
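
One way to sketch a per-session budget is a small tracker that the agent loop consults after every turn. The class name `SessionBudget`, the token limit, and the three status strings are hypothetical. How the caller reacts to `"stop"` (pause, ask for confirmation, or end the session) is left open.

```python
class SessionBudget:
    """Tracks token use for one agent session against a per-session budget."""

    def __init__(self, max_tokens, warn_fraction=0.8):
        self.max_tokens = max_tokens
        self.warn_fraction = warn_fraction
        self.used = 0

    def record_turn(self, input_tokens, output_tokens):
        """Add one turn's tokens and return 'ok', 'warn', or 'stop'."""
        self.used += input_tokens + output_tokens
        if self.used >= self.max_tokens:
            return "stop"
        if self.used >= self.warn_fraction * self.max_tokens:
            return "warn"
        return "ok"


# Simulate four identical turns against a hypothetical one-million-token budget.
budget = SessionBudget(max_tokens=1_000_000)
statuses = [budget.record_turn(300_000, 20_000) for _ in range(4)]
```

Scoping the limit to the session rather than the developer means one runaway loop is caught without restricting the same person's normal work.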

The Silent Model Upgrade. A team's tooling defaults change after a vendor update. What used to call the cheaper model now calls the premium model. Output quality goes up. Nobody notices the cost has gone up 8x until the invoice arrives.

The fix: model mix monitoring with a week-over-week trend alert.
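
A week-over-week model mix check could be sketched as follows. The model names, the `PREMIUM` set, and the 15-percentage-point threshold are placeholders chosen for the example, not recommendations.

```python
def premium_share(spend_by_model, premium_models):
    """Fraction of total spend that went to models in premium_models."""
    total = sum(spend_by_model.values())
    if total == 0:
        return 0.0
    premium = sum(v for m, v in spend_by_model.items() if m in premium_models)
    return premium / total


def model_mix_shift(last_week, this_week, premium_models, max_increase=0.15):
    """Return (alert, last_share, this_share).

    alert is True when the premium share of spend rose by more than
    max_increase, measured in absolute share (0.15 = 15 percentage points).
    """
    before = premium_share(last_week, premium_models)
    after = premium_share(this_week, premium_models)
    return (after - before > max_increase, before, after)


PREMIUM = {"premium-model"}  # hypothetical model name
last_week = {"premium-model": 100.0, "small-model": 400.0}
this_week = {"premium-model": 900.0, "small-model": 100.0}

alert, before, after = model_mix_shift(last_week, this_week, PREMIUM)
```

Tracking share of spend rather than absolute dollars helps separate a change in defaults from ordinary growth in usage.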

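The Context-Bloat Session. An agent works on a large codebase. Each turn appends more context, and that context is typically resent as input on every later turn. By turn 40, a single message costs more than the entire first hour of the session. Productivity feels normal. Cost per turn keeps climbing, and the session total climbs faster still.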

The fix: real-time per-session cost surfacing, plus guidance on when to reset context.
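
The arithmetic behind context bloat is easy to work through. The sketch below assumes that every turn resends the full accumulated context as input, that each turn adds a fixed number of tokens, and that no prompt caching discount applies. The specific token counts are illustrative, not measured.

```python
def turn_input_tokens(turn, base_context=5_000, added_per_turn=3_000):
    """Input tokens sent on a given turn (1-indexed) when all prior context is resent."""
    return base_context + added_per_turn * (turn - 1)


def cumulative_input_tokens(turns, base_context=5_000, added_per_turn=3_000):
    """Total input tokens billed across the first `turns` turns of the session."""
    return sum(
        turn_input_tokens(t, base_context, added_per_turn)
        for t in range(1, turns + 1)
    )


first = turn_input_tokens(1)            # 5,000 tokens
fortieth = turn_input_tokens(40)        # 5,000 + 3,000 * 39 = 122,000 tokens
total_40 = cumulative_input_tokens(40)  # 2,540,000 tokens over 40 turns
```

Under these assumptions the per-turn input grows linearly and the running total grows roughly quadratically, which is why resetting or summarizing context partway through a long session can cut cost sharply.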

These are not edge cases. They are the new normal. Every team running AI tools at scale will hit some version of each within their first year.

How opslyft Helps Businesses Manage AI and Cloud Costs Together
Most companies trying to manage AI spend today face a familiar problem. The data sits in many places. Cursor has one dashboard. Anthropic has another. OpenAI has another. AWS has fifty. None of them talk to each other.

opslyft brings these data sources into a single view, applies cost allocation, and connects spend to engineering output. The platform was built for cloud cost management and extends naturally to AI tools, treating AI as another provider in a unified FinOps program.

Specific capabilities include:

Multi-source integration across cloud providers, AI tools, and developer platforms
Cost allocation by team, project, environment, and developer
Unit economics dashboards linking spend to PRs, tickets, and deploys
Anomaly detection with daily and weekly cadence
Soft budgets and informed guardrails that protect productivity
Optimization recommendations with measurable savings impact
Security-first deployment with read-only access patterns and SOC 2 controls
The principle is the same one that worked for cloud. Visibility first, then allocation, then unit economics, then targeted action. AI is just the next provider on the list.

Conclusion
AI spending is not a new problem. It is the next chapter of the same cloud cost story finance and engineering teams have been working through for a decade.

The companies that treat AI as just another provider inside their FinOps program will move faster, spend smarter, and avoid the budget shocks that catch everyone else by surprise.

Top comments (1)

Trigops • Jun 25

The GPU idle time point is underrated here. Most FinOps tooling assumes you're optimizing at the API call level — tokens, caching, model routing — but for teams running self-hosted inference, GPU utilization is the real lever. A half-loaded A100 at $3–4/hr is expensive dead weight that never shows up in the LLM API spend dashboards most teams are watching.

Your maturity model rings true too. The jump from Stage 2 to Stage 3 usually stalls not on tooling but on attribution — specifically, attributing LLM costs back to individual product features rather than a single shared API key. Until that's solved, optimization conversations stay abstract. Teams know they're overspending but can't point to where.

One thing worth adding to the forecasting section: usage spikes during model fine-tuning or eval runs are easy to conflate with production growth. Tagging experimental workloads separately (as you mention) is necessary, but so is making those tags automatic in your ML platform rather than relying on engineers to remember. Manual tagging compliance degrades fast under deadline pressure.