You shipped an AI feature. Users love it. Then the monthly LLM bill arrives and it's 3x what you budgeted.
I've been there. I run AI agents that chain multiple LLM calls per task. For the first month, I had zero visibility into which calls cost what. PostHog's LLM analytics fixed that in about 10 minutes of setup. Here's exactly how.
## The Problem: LLM Costs Are a Black Box
Most AI applications have the same blind spot. You know the total monthly bill from OpenAI or Anthropic. You don't know:
- Which feature costs the most per user
- How many tokens each conversation burns
- Whether your prompt caching is actually working
- Which model calls fail and trigger expensive retries
Traditional APM tools weren't built for this. PostHog's LLM analytics was.
## Setup: 3 Lines of Python
Install the PostHog SDK with the AI extras:
```bash
pip install 'posthog[ai]'
```
Replace your OpenAI import with PostHog's wrapper:
```python
# Before
from openai import OpenAI
client = OpenAI()

# After
from posthog.ai.openai import OpenAI
client = OpenAI(
    posthog_api_key="phc_your_project_key",
    posthog_host="https://us.i.posthog.com",
)
```
That's it. Every client.chat.completions.create() call now automatically captures an $ai_generation event with the model name, input/output tokens, latency, cost, and the full prompt/response.
Your existing code doesn't change. You call OpenAI the same way you always did. PostHog intercepts the response and logs everything.
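For example, a call like this one (the model and prompt are placeholders) gets captured with no extra tracking code:

```python
# No PostHog-specific arguments needed; the wrapped client records the
# $ai_generation event when the response comes back.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence."}],
)
print(response.choices[0].message.content)
```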
## What PostHog Captures Automatically
Each $ai_generation event includes:
| Property | What it tells you |
|---|---|
| `$ai_model` | Which model handled this call |
| `$ai_input_tokens` | Prompt tokens consumed |
| `$ai_output_tokens` | Completion tokens generated |
| `$ai_latency` | Response time in seconds |
| `$ai_total_cost` | Calculated cost in USD |
| `$ai_http_status` | Whether the call succeeded |
| `$ai_input` | Full prompt (disable with privacy mode) |
| `$ai_output` | Full response |
No manual event tracking. No custom properties to maintain. The wrapper does the accounting.
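To make that concrete, the properties on a single captured event look roughly like this (values invented for illustration):

```python
# Rough shape of one $ai_generation event's properties (illustrative values)
example_ai_generation = {
    "$ai_model": "gpt-4o-mini",
    "$ai_input_tokens": 412,
    "$ai_output_tokens": 96,
    "$ai_latency": 1.8,           # seconds
    "$ai_total_cost": 0.00031,    # USD
    "$ai_http_status": 200,
    "$ai_input": "[full prompt, unless privacy mode is on]",
    "$ai_output": "[full response]",
}
```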
## Using LiteLLM? Even Easier
If you route calls through LiteLLM for multi-provider setups, the integration is two lines:
```python
import litellm

litellm.success_callback = ["posthog"]
litellm.failure_callback = ["posthog"]
```
This captures every call across every provider: OpenAI, Anthropic, Google, Cohere, local models. Same $ai_generation events, same cost tracking, same dashboards.
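Here's a slightly fuller sketch. It assumes LiteLLM picks up your PostHog credentials from the `POSTHOG_API_KEY` and `POSTHOG_API_URL` environment variables; check the callback docs for the LiteLLM version you're running.

```python
import os
import litellm

# Assumed env var names for LiteLLM's PostHog callback; verify against
# the docs for your LiteLLM version.
os.environ["POSTHOG_API_KEY"] = "phc_your_project_key"
os.environ["POSTHOG_API_URL"] = "https://us.i.posthog.com"

litellm.success_callback = ["posthog"]
litellm.failure_callback = ["posthog"]

# Same call shape for any provider LiteLLM routes to; each call
# produces an $ai_generation event in PostHog.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Classify this ticket: checkout page returns a 500."}],
)
print(response.choices[0].message.content)
```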
## Custom Cost Overrides
PostHog knows the public pricing for major models. But if you've negotiated volume discounts or you're running a model PostHog doesn't recognize, override the cost calculation:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document"}],
    posthog_properties={
        "$ai_input_token_price": 0.0000020,  # per token, not per million
        "$ai_output_token_price": 0.0000080,
        "$ai_cache_read_token_price": 0.0000010,
    },
)
```
Two things to watch:
- Prices are per individual token, not per million. A common mistake; see the conversion sketch below.
- Both `$ai_input_token_price` and `$ai_output_token_price` must be set for the override to take effect.
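Because the per-token requirement is easy to miss, it can be safer to derive the override values from the per-million figures on your contract. A small sketch (the numbers just mirror the example above):

```python
# Contract pricing is usually quoted per million tokens (example figures)
input_usd_per_million = 2.00
output_usd_per_million = 8.00

cost_override = {
    # Divide by 1,000,000 to get the per-token values the override expects
    "$ai_input_token_price": input_usd_per_million / 1_000_000,    # 0.000002
    "$ai_output_token_price": output_usd_per_million / 1_000_000,  # 0.000008
}
```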
## The Dashboard That Actually Matters
PostHog ships a pre-built LLM metrics dashboard template. But the three numbers I check daily:
Cost per conversation. Not cost per call. A single user interaction might chain 3-5 LLM calls (classification, retrieval, generation, summarization). PostHog's trace grouping ties them together so you see the real cost of serving one user request.
Error rate by model. Rate limits, timeouts, malformed responses. If 5% of your GPT-4o calls fail and trigger a retry on a more expensive model, your effective cost jumps. I caught a retry loop that was doubling my spend on classification calls.
P95 latency trend. Cost isn't just dollars. Slow responses kill UX. The latency chart shows whether your model choice is holding up under real traffic, not just benchmarks.
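Cost per conversation only works if the chained calls share a trace. One way to wire that up is to generate a fresh ID per user request and pass it to every call, using the same `posthog_trace_id` parameter shown in the next section (the helper and the two-step chain here are made up for illustration):

```python
import uuid

def handle_request(user_message: str) -> str:
    # One trace ID for every LLM call made while serving this request,
    # so PostHog groups them into a single conversation cost.
    trace_id = str(uuid.uuid4())

    classification = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify this request: {user_message}"}],
        posthog_trace_id=trace_id,
    ).choices[0].message.content

    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"The request was classified as: {classification}"},
            {"role": "user", "content": user_message},
        ],
        posthog_trace_id=trace_id,
    )
    return answer.choices[0].message.content
```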
## Traces: Debugging Expensive Calls
The trace view is where PostHog's LLM analytics gets interesting. Every conversation groups its LLM calls into a trace. Click into any trace and you see the full chain: which prompts went in, what came back, how long each step took, what it cost.
I found a prompt that was sending 4,000 tokens of context when 800 would have worked. One trace investigation, one prompt trim, 80% cost reduction on that call path.
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    posthog_trace_id="support-ticket-classification",
    posthog_properties={
        "ticket_priority": "high",
        "customer_tier": "enterprise",
    },
)
```
Custom properties let you filter traces by business context. "Show me all enterprise customer traces where cost exceeded $0.50" becomes a one-click query.
## What This Costs
PostHog's LLM analytics is free for 100K events per month. Each LLM call generates one event. If your app makes 100K LLM calls monthly, you're covered.
For context, most early-stage AI products process 10K-50K LLM calls monthly. You'd need significant scale to outgrow the free tier.
Compare that to dedicated LLM observability tools that either require self-hosting or charge earlier. PostHog gives you LLM analytics alongside your existing product analytics, session recordings, and feature flags in one tool.
## Get Started in 10 Minutes
1. Sign up for PostHog (free tier works)
2. Install `posthog[ai]` in your Python project
3. Swap your OpenAI import for PostHog's wrapper
4. Deploy
5. Check the LLM Analytics tab after a few calls come through
You'll immediately see which models you're using, what they cost, and where the money goes. Every AI product needs this visibility. The only question is whether you set it up before or after the invoice surprise.