You shipped an AI feature. Users love it. Then the monthly LLM bill arrives and it's 3x what you budgeted.
I've been there. I run AI agents that chain multiple LLM calls per task. For the first month, I had zero visibility into which calls cost what. PostHog's LLM analytics fixed that in about 10 minutes of setup. Here's exactly how.
## The Problem: LLM Costs Are a Black Box
Most AI applications have the same blind spot. You know the total monthly bill from OpenAI or Anthropic. You don't know:
- Which feature costs the most per user
- How many tokens each conversation burns
- Whether your prompt caching is actually working
- Which model calls fail and trigger expensive retries
Traditional APM tools weren't built for this. PostHog's LLM analytics was.
## Setup: 3 Lines of Python
Install the PostHog SDK with the AI extras:
```bash
pip install 'posthog[ai]'
```
Replace your OpenAI import with PostHog's wrapper:
```python
# Before
from openai import OpenAI
client = OpenAI()

# After
from posthog.ai.openai import OpenAI
client = OpenAI(
    posthog_api_key="phc_your_project_key",
    posthog_host="https://us.i.posthog.com",
)
```
That's it. Every client.chat.completions.create() call now automatically captures an $ai_generation event with the model name, input/output tokens, latency, cost, and the full prompt/response.
Your existing code doesn't change. You call OpenAI the same way you always did. PostHog intercepts the response and logs everything.
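For example, a call like this one (the model and prompt are placeholders) gets captured with no extra tracking code:

```python
# No PostHog-specific arguments needed; the wrapped client records the
# $ai_generation event when the response comes back.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence."}],
)
print(response.choices[0].message.content)
```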
## What PostHog Captures Automatically
Each $ai_generation event includes:
| Property | What it tells you |
|---|---|
| `$ai_model` | Which model handled this call |
| `$ai_input_tokens` | Prompt tokens consumed |
| `$ai_output_tokens` | Completion tokens generated |
| `$ai_latency` | Response time in seconds |
| `$ai_total_cost` | Calculated cost in USD |
| `$ai_http_status` | Whether the call succeeded |
| `$ai_input` | Full prompt (disable with privacy mode) |
| `$ai_output` | Full response |
No manual event tracking. No custom properties to maintain. The wrapper does the accounting.
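To make that concrete, the properties on a single captured event look roughly like this (values invented for illustration):

```python
# Rough shape of one $ai_generation event's properties (illustrative values)
example_ai_generation = {
    "$ai_model": "gpt-4o-mini",
    "$ai_input_tokens": 412,
    "$ai_output_tokens": 96,
    "$ai_latency": 1.8,           # seconds
    "$ai_total_cost": 0.00031,    # USD
    "$ai_http_status": 200,
    "$ai_input": "[full prompt, unless privacy mode is on]",
    "$ai_output": "[full response]",
}
```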
## Using LiteLLM? Even Easier
If you route calls through LiteLLM for multi-provider setups, the integration is two lines:
```python
import litellm

litellm.success_callback = ["posthog"]
litellm.failure_callback = ["posthog"]
```
This captures every call across every provider: OpenAI, Anthropic, Google, Cohere, local models. Same $ai_generation events, same cost tracking, same dashboards.
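Here's a slightly fuller sketch. It assumes LiteLLM picks up your PostHog credentials from the `POSTHOG_API_KEY` and `POSTHOG_API_URL` environment variables; check the callback docs for the LiteLLM version you're running.

```python
import os
import litellm

# Assumed env var names for LiteLLM's PostHog callback; verify against
# the docs for your LiteLLM version.
os.environ["POSTHOG_API_KEY"] = "phc_your_project_key"
os.environ["POSTHOG_API_URL"] = "https://us.i.posthog.com"

litellm.success_callback = ["posthog"]
litellm.failure_callback = ["posthog"]

# Same call shape for any provider LiteLLM routes to; each call
# produces an $ai_generation event in PostHog.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: checkout page returns a 500."}],
)
print(response.choices[0].message.content)
```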
## Custom Cost Overrides
PostHog knows the public pricing for major models. But if you've negotiated volume discounts or you're running a model PostHog doesn't recognize, override the cost calculation:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document"}],
    posthog_properties={
        "$ai_input_token_price": 0.0000020,  # per token, not per million
        "$ai_output_token_price": 0.0000080,
        "$ai_cache_read_token_price": 0.0000010,
    },
)
```
Two things to watch:
- Prices are per individual token, not per million. A common mistake; see the conversion sketch below.
- Both `$ai_input_token_price` and `$ai_output_token_price` must be set for the override to take effect.
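Because the per-token requirement is easy to miss, it can be safer to derive the override values from the per-million figures on your contract. A small sketch (the numbers just mirror the example above):

```python
# Contract pricing is usually quoted per million tokens (example figures)
input_usd_per_million = 2.00
output_usd_per_million = 8.00

cost_override = {
    # Divide by 1,000,000 to get the per-token values the override expects
    "$ai_input_token_price": input_usd_per_million / 1_000_000,    # 0.000002
    "$ai_output_token_price": output_usd_per_million / 1_000_000,  # 0.000008
}
```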
## The Dashboard That Actually Matters
PostHog ships a pre-built LLM metrics dashboard template. But the three numbers I check daily:
Cost per conversation. Not cost per call. A single user interaction might chain 3-5 LLM calls (classification, retrieval, generation, summarization). PostHog's trace grouping ties them together so you see the real cost of serving one user request.
Error rate by model. Rate limits, timeouts, malformed responses. If 5% of your GPT-4o calls fail and trigger a retry on a more expensive model, your effective cost jumps. I caught a retry loop that was doubling my spend on classification calls.
P95 latency trend. Cost isn't just dollars. Slow responses kill UX. The latency chart shows whether your model choice is holding up under real traffic, not just benchmarks.
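Cost per conversation only works if the chained calls share a trace. One way to wire that up is to generate a fresh ID per user request and pass it to every call, using the same `posthog_trace_id` parameter shown in the next section (the helper and the two-step chain here are made up for illustration):

```python
import uuid

def handle_request(user_message: str) -> str:
    # One trace ID for every LLM call made while serving this request,
    # so PostHog groups them into a single conversation cost.
    trace_id = str(uuid.uuid4())

    classification = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify this request: {user_message}"}],
        posthog_trace_id=trace_id,
    ).choices[0].message.content

    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"The request was classified as: {classification}"},
            {"role": "user", "content": user_message},
        ],
        posthog_trace_id=trace_id,
    )
    return answer.choices[0].message.content
```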
## Traces: Debugging Expensive Calls
The trace view is where PostHog's LLM analytics gets interesting. Every conversation groups its LLM calls into a trace. Click into any trace and you see the full chain: which prompts went in, what came back, how long each step took, what it cost.
I found a prompt that was sending 4,000 tokens of context when 800 would have worked. One trace investigation, one prompt trim, 80% cost reduction on that call path.
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    posthog_trace_id="support-ticket-classification",
    posthog_properties={
        "ticket_priority": "high",
        "customer_tier": "enterprise",
    },
)
```
Custom properties let you filter traces by business context. "Show me all enterprise customer traces where cost exceeded $0.50" becomes a one-click query.
## What This Costs
PostHog's LLM analytics is free for 100K events per month. Each LLM call generates one event. If your app makes 100K LLM calls monthly, you're covered.
For context, most early-stage AI products process 10K-50K LLM calls monthly. You'd need significant scale to outgrow the free tier.
Compare that to dedicated LLM observability tools that either require self-hosting or charge earlier. PostHog gives you LLM analytics alongside your existing product analytics, session recordings, and feature flags in one tool.
## Get Started in 10 Minutes
1. Sign up for PostHog (free tier works)
2. Install `posthog[ai]` in your Python project
3. Swap your OpenAI import for PostHog's wrapper
4. Deploy
5. Check the LLM Analytics tab after a few calls come through
You'll immediately see which models you're using, what they cost, and where the money goes. Every AI product needs this visibility. The only question is whether you set it up before or after the invoice surprise.