江欢（JackSoul）

Posted on Jun 5

LLM API cost attribution playbook for production SaaS teams

#ai #saas #api #costs

TL;DR

If your SaaS product calls multiple LLM providers, the invoice from OpenAI, Anthropic, Gemini, Bedrock, or OpenRouter is not enough. You need attribution at the feature, tenant, assistant, thread, model, and provider level. Otherwise every product experiment turns into one blended AI bill.

A practical LLM cost attribution stack has four layers:

One OpenAI-compatible gateway endpoint so apps route through a shared control point.
Scoped API keys per app, customer, assistant, or workflow.
Per-request metadata so calls can be grouped by tenant, feature, thread, and user.
Budget enforcement and fallback rules so spend is capped before an agent loop becomes expensive.

FerryAPI is built for teams that want this pattern without rewriting their OpenAI SDK integrations.

Why provider invoices are not enough

Provider invoices answer one narrow question: how much did the account spend overall?

They usually do not answer the questions a SaaS operator actually needs:

Which customer created the largest AI bill this week?
Which feature caused the usage spike?
Did the cost come from input tokens, output tokens, vector reads, or memory writes?
Which model/provider route was responsible?
Did a single thread or background job loop unexpectedly?
Can this customer be moved to a lower-cost route without changing the application code?

Without attribution, teams either over-restrict AI usage or absorb unpredictable margin loss.

The minimum metadata to capture

For every LLM call, store these fields:

tenant_id or organization id
user_id when available
assistant_id, agent id, or workflow id
thread_id or session id
feature name, route, or product surface
upstream provider
model name
input tokens
output tokens
cache-read tokens if supported
request cost
latency
request status / error reason

This turns AI usage into a normal product analytics problem instead of a surprise finance problem.

Where an AI API gateway helps

An OpenAI-compatible AI API gateway gives you one control plane between the app and multiple model providers.

That means you can:

keep existing OpenAI SDK clients pointed at a custom base_url
issue separate keys per customer, app, assistant, or environment
apply prepaid balances or hard quotas
route different traffic classes to different providers
preserve request logs for spend review and debugging
fall back to cheaper or free routes when a budget cap is hit

The important part is not only cheaper tokens. It is operational control.

A simple rollout plan

Step 1: route one low-risk feature through the gateway

Pick a non-critical workflow first, such as summaries, support-draft generation, or internal analytics.

Keep the same OpenAI SDK and change only:

base_url = https://api.your-gateway.example/v1
api_key  = scoped_key_for_this_feature

Step 2: attach metadata to every call

Start with tenant, feature, and thread. Add user and assistant ids later if needed.

Step 3: create budget thresholds

Use soft alerts first, then hard caps:

50% of budget: notify owner
80% of budget: switch to cheaper route for non-critical calls
100% of budget: block or fall back to free/open-source route

Step 4: review usage weekly

Look for:

high-output prompts that can be shortened
repeated context that should be cached
expensive models used for simple classification
tenants whose usage exceeds their plan economics

Checklist for evaluating a gateway

Use this checklist before adopting any AI API gateway:

Does it expose an OpenAI-compatible /v1 endpoint?
Can you create scoped API keys?
Can each key have a separate budget or prepaid balance?
Does it log provider, model, tokens, latency, and cost per request?
Can you export or filter usage by tenant, assistant, thread, or feature?
Does it support routing or fallback rules?
Are supported regions and model availability clear?
Is pricing visible enough to forecast gross margin?
Can you keep using your current SDKs and agents?

How FerryAPI fits this workflow

FerryAPI provides an OpenAI-compatible gateway for production apps that need:

one API entry point for multiple model routes
lower-cost model access options
prepaid balance and usage-based billing controls
customer API key management
dashboard-level cost visibility
integration with apps and agents that already support custom OpenAI base_url

Learn more: https://www.ferryapi.io/

Final note

AI API cost optimization is not just about picking the cheapest model. The bigger win is knowing exactly who spent what, why, and what rule should apply next time.

Once you have attribution, model routing and budget control become engineering choices instead of finance surprises.

DEV Community