TL;DR
If your SaaS product calls multiple LLM providers, the invoice from OpenAI, Anthropic, Gemini, Bedrock, or OpenRouter is not enough. You need attribution at the feature, tenant, assistant, thread, model, and provider level. Otherwise every product experiment turns into one blended AI bill.
A practical LLM cost attribution stack has four layers:
- One OpenAI-compatible gateway endpoint so apps route through a shared control point.
- Scoped API keys per app, customer, assistant, or workflow.
- Per-request metadata so calls can be grouped by tenant, feature, thread, and user.
- Budget enforcement and fallback rules so spend is capped before an agent loop becomes expensive.
FerryAPI is built for teams that want this pattern without rewriting their OpenAI SDK integrations.
Why provider invoices are not enough
Provider invoices answer one narrow question: how much did the account spend overall?
They usually do not answer the questions a SaaS operator actually needs:
- Which customer created the largest AI bill this week?
- Which feature caused the usage spike?
- Did the cost come from input tokens, output tokens, vector reads, or memory writes?
- Which model/provider route was responsible?
- Did a single thread or background job loop unexpectedly?
- Can this customer be moved to a lower-cost route without changing the application code?
Without attribution, teams either over-restrict AI usage or absorb unpredictable margin loss.
The minimum metadata to capture
For every LLM call, store these fields:
-
tenant_idor organization id -
user_idwhen available -
assistant_id, agent id, or workflow id -
thread_idor session id - feature name, route, or product surface
- upstream provider
- model name
- input tokens
- output tokens
- cache-read tokens if supported
- request cost
- latency
- request status / error reason
This turns AI usage into a normal product analytics problem instead of a surprise finance problem.
Where an AI API gateway helps
An OpenAI-compatible AI API gateway gives you one control plane between the app and multiple model providers.
That means you can:
- keep existing OpenAI SDK clients pointed at a custom
base_url - issue separate keys per customer, app, assistant, or environment
- apply prepaid balances or hard quotas
- route different traffic classes to different providers
- preserve request logs for spend review and debugging
- fall back to cheaper or free routes when a budget cap is hit
The important part is not only cheaper tokens. It is operational control.
A simple rollout plan
Step 1: route one low-risk feature through the gateway
Pick a non-critical workflow first, such as summaries, support-draft generation, or internal analytics.
Keep the same OpenAI SDK and change only:
base_url = https://api.your-gateway.example/v1
api_key = scoped_key_for_this_feature
Step 2: attach metadata to every call
Start with tenant, feature, and thread. Add user and assistant ids later if needed.
Step 3: create budget thresholds
Use soft alerts first, then hard caps:
- 50% of budget: notify owner
- 80% of budget: switch to cheaper route for non-critical calls
- 100% of budget: block or fall back to free/open-source route
Step 4: review usage weekly
Look for:
- high-output prompts that can be shortened
- repeated context that should be cached
- expensive models used for simple classification
- tenants whose usage exceeds their plan economics
Checklist for evaluating a gateway
Use this checklist before adopting any AI API gateway:
- Does it expose an OpenAI-compatible
/v1endpoint? - Can you create scoped API keys?
- Can each key have a separate budget or prepaid balance?
- Does it log provider, model, tokens, latency, and cost per request?
- Can you export or filter usage by tenant, assistant, thread, or feature?
- Does it support routing or fallback rules?
- Are supported regions and model availability clear?
- Is pricing visible enough to forecast gross margin?
- Can you keep using your current SDKs and agents?
How FerryAPI fits this workflow
FerryAPI provides an OpenAI-compatible gateway for production apps that need:
- one API entry point for multiple model routes
- lower-cost model access options
- prepaid balance and usage-based billing controls
- customer API key management
- dashboard-level cost visibility
- integration with apps and agents that already support custom OpenAI
base_url
Learn more: https://www.ferryapi.io/
Final note
AI API cost optimization is not just about picking the cheapest model. The bigger win is knowing exactly who spent what, why, and what rule should apply next time.
Once you have attribution, model routing and budget control become engineering choices instead of finance surprises.
Top comments (0)