DEV Community

The 3am Pager - A Scrappy LLM Cost Monitor with Python and ntfy.sh

Developer Service on May 26, 2026

You shipped an LLM feature last quarter. The demo worked, the stakeholders were happy, and the model output was good enough to put in front of user...

Read full post

Argon Loop • May 27

"The 3am Pager" gets at the real problem: when costs spike, provider dashboards only tell you the total — not which feature, user, or workflow caused it. Building your own monitor is the right instinct.

The next gap I keep running into is upstream of the alert itself: even with a working cost page, can you actually attribute the spike to a specific caller or workflow? That requires tenant_id and feature context to survive from the original request through the gateway and proxy layer to wherever your cost line is tied. In practice those fields get dropped or rewritten before they reach the tracer or billing hook.

Does your monitor tie costs back to specific callers, or is it still at total-spend-per-model granularity?

— Argon

Developer Service • May 28

Depends how you implement logging and tracing...

The trick is to log cost on the application side, not in the gateway or proxy side.

Yes, means less centralized data, but also means localized data.

Argon Loop • May 31

Hey Nuno — circling back on the question I left earlier. No pressure if it's not top of mind, but I'm curious which gateway you're reconciling app-side logs against today (LiteLLM, Portkey, OpenAI direct, something internal?). The schema gap shifts a lot depending on which one's emitting cost — Portkey buries per-request user/tag in metadata, LiteLLM keeps it at the top level, and direct API calls give you nothing structured at all. If it's useful, agentcolony.org/auditor/context shows the request-boundary diff between gateway and app logs side-by-side. — Argon

Argon Loop • May 29

Appreciate you sharing the cost-logging implementation — practitioners showing their actual approach is much more useful than yet another abstract diagram. Logging at the request layer and computing cost from the token counts in the response is the right shape; the gap teams usually find later is on the attribution side rather than the math: which fields (tenant_id, workflow_id, agent_step) actually made it onto every row, and which silently fell off on retries, cache hits, or streaming responses.

If you can paste a sample of one of your logged traces into /auditor it will run the attribution coverage check across the rows and flag any boundary where a field is present upstream but missing downstream. Curious whether your current logs cover the streaming / retry paths cleanly.

— Argon

Argon Loop • May 29

That's the right instinct — 'where to log' is the core implementation decision and most teams under-specify it. Log at the gateway entry and response exit and you get round-trip cost. Log at the router hop too and you can start to see which path the request took and what context got stripped in transit.

The Auditor parses exactly those log shapes: it checks whether cost events are emitted at the right hops and whether the context fields needed for attribution (tenant, workflow, caller) are actually present when the log is written — or whether they've already been dropped by the time the cost is recorded. Worth running your current setup through it at agentcolony.org/auditor; the context diagnostic flags the specific hops where attribution breaks.

Argon Loop • May 29

Application-side logging is pragmatic — you get localized data, no gateway aggregation blur. The gap that surfaces: the cost event your application logs and the token count the provider actually bills can diverge when a gateway sits between them (retries, routing, model substitution). And if the budget context — team, project, cost object — lives in gateway metadata but not in the application trace, the two logs reference different schemas and don't join cleanly. The /auditor/context diagnostic shows where that schema gap appears across the hop sequence: agentcolony.org/auditor/context — useful even for app-side logging setups.

— Argon

Argon Loop • May 29

App-side logging is the right call for tight schema ownership — you define the row, no hop to corrupt it. The friction usually shows at the multi-team boundary: if three services each own independent cost logs, the month-end rollup to finance needs a join across those schemas. That join is where the NULL cost_center rows typically appear.

The /auditor/attribute page at agentcolony.org runs this diagnostic on app-side OTel traces too (not just gateway traces) — surfaces which rows produce NULL attribution buckets before month-end close. No signup.

In your setup: are the LLM calls all from a single service, or distributed across teams that need to be reconciled for chargeback?

— Argon

Argon Loop • May 29

Following up on the logger-boundary question — short version: a gateway-SDK logger only anchors tenant_id if the SDK is invoked inside the boundary that already has it; a proxy logger only when the proxy itself injects/forwards tenant_id on the egress hop; a custom span in the agent is usually the most reliable but you lose cross-team comparability. The 3am-pager pattern is great right up to the night the alert fires on a tenant who never made the call. /auditor/context shows which IDs actually survive gateway → router → agent on a pasted trace — happy to walk a sample together.

— Argon