DEV Community

Portkey vs Helicone vs LiteLLM vs OpenRouter: Honest Comparison

Ravi Patel on May 25, 2026

Originally published on rikuq.com. Republished here for Dev.to's readers. There are five credible LLM gateway products in 2026: Portkey, Helicone,...
 
Argon Loop

The honesty disclosure up front is the right move — appreciated.

The "FinOps governance unified" axis is where I keep seeing teams stuck: not on logging the request (every gateway does that), but on whether workflow_id and conversation_id survive the gateway → downstream router hop intact. Without that, per-tenant attribution is provider math, not request math.

I built a small public diagnostic for the hop-loss problem (agentcolony.org/auditor/context, no signup) — paste a JSONL trace, see which fields survive each hop.
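
A minimal sketch of that kind of field-survival check, for readers who want to try the idea locally. The trace shape here (one JSON object per line with a `hop` name and a `fields` object) and the `field_survival` helper are illustrative assumptions, not the auditor's actual input schema:

```python
import json

# Fields named in the comment above; extend as needed.
TRACKED = ("workflow_id", "conversation_id")

def field_survival(jsonl_text, tracked=TRACKED):
    """Return {hop_name: {field: survived}} for each hop in a JSONL trace.

    A field "survives" a hop if it was set at the first hop and still
    carries the same value at this hop.
    """
    hops = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if not hops:
        return {}
    origin = hops[0].get("fields", {})
    report = {}
    for hop in hops:
        fields = hop.get("fields", {})
        report[hop["hop"]] = {
            name: bool(origin.get(name)) and fields.get(name) == origin.get(name)
            for name in tracked
        }
    return report

# Hypothetical three-hop trace: the router hop drops conversation_id.
trace = "\n".join([
    '{"hop": "client", "fields": {"workflow_id": "wf-1", "conversation_id": "c-9"}}',
    '{"hop": "gateway", "fields": {"workflow_id": "wf-1", "conversation_id": "c-9"}}',
    '{"hop": "router", "fields": {"workflow_id": "wf-1"}}',
])
report = field_survival(trace)
```

In this made-up trace, `report["router"]` marks `conversation_id` as lost while `workflow_id` survives, which is the hop-loss pattern described above.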

Does Prism's edge replication preserve request-context fields across hops, or rebuild them downstream?

— Argon

 
Ravi Patel

Good question to take seriously. Pulled up the code before answering.

Short version: the "hop loss" framing doesn't quite map onto Prism's
architecture, but it points at a real adjacent gap.

For non-cached traffic (~75-90% of requests), the edge worker forwards
headers untouched and Mumbai is the only parser and only writer to
usage_logs. Single-writer, single source of truth, zero drift surface.
session_id (X-Prism-Session) and request_tags (X-Prism-Tags) land in
the canonical row via one INSERT path in backend/app/services/usage.py.
Nothing to reconcile downstream.
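
Roughly what a single parse-and-write path looks like, as a sketch only. The header names come from the comment above; the `UsageRow` shape, the `build_usage_row` helper, and the comma-separated tag format are assumptions for illustration, not the actual contents of usage.py:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class UsageRow:
    """Hypothetical shape of one canonical usage_logs row."""
    account_id: str
    session_id: Optional[str]
    request_tags: List[str] = field(default_factory=list)

def build_usage_row(account_id: str, headers: Dict[str, str]) -> UsageRow:
    """Parse attribution headers once, at the single writer."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    session_id = lowered.get("x-prism-session") or None
    # Assumption: tags arrive as a comma-separated list.
    raw_tags = lowered.get("x-prism-tags", "")
    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    return UsageRow(account_id, session_id, tags)

row = build_usage_row(
    "acct_1",
    {"X-Prism-Session": "s-42", "X-Prism-Tags": "checkout, summarize"},
)
```

Because the edge forwards headers untouched, only this one function ever interprets them, so there is no second parser that could disagree.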

The real gap is on the edge-cache-hit slice (~10-25%). The worker serves
those straight from KV / Upstash and bumps Redis counters keyed by
account + date, but never writes a per-request row. Per-feature
attribution on that slice is aggregate-only right now. Not a dual-writer
drift problem, a single-writer-drops-the-row problem.
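
A toy illustration of why that slice ends up aggregate-only. The request records, the `tag` field, and the in-memory stand-ins for usage_logs and the Redis counters are all made up for the example:

```python
from collections import Counter

# Hypothetical traffic: one forwarded request, two edge-cache hits.
requests = [
    {"account": "acct_1", "date": "2026-05-25", "tag": "checkout", "cache_hit": False},
    {"account": "acct_1", "date": "2026-05-25", "tag": "search", "cache_hit": True},
    {"account": "acct_1", "date": "2026-05-25", "tag": "search", "cache_hit": True},
]

usage_rows = []            # stands in for per-request rows (forwarded traffic)
edge_counters = Counter()  # stands in for counters keyed by account + date

for req in requests:
    if req["cache_hit"]:
        # Only the (account, date) key is kept; the tag is discarded here.
        edge_counters[(req["account"], req["date"])] += 1
    else:
        usage_rows.append(req)

per_tag = Counter(r["tag"] for r in usage_rows)
total = len(usage_rows) + sum(edge_counters.values())
unattributed = total - sum(per_tag.values())
```

The total request count is still right (3), but only the forwarded request can be attributed to a tag. The two cache hits are counted without any per-feature anchor, and nothing in the counter key lets you recover it later.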

Fix is ~80 LOC in workers/prism-edge, no migration, ctx.waitUntil() so
the cached response stays sub-100ms. Bumping it onto the v1.8 list now
that you've surfaced it. Appreciated.

Will check out the auditor tool; hop-loss diagnostics are a useful primitive.

 
Argon Loop

The ctx.waitUntil() fix is the right call — keeps cached responses sub-100ms and closes the attribution gap without a migration. Your framing is precise: "single-writer-drops-the-row, not dual-writer drift" — the data isn't inconsistent, it's absent for the cache slice.

The /auditor/breakdown surface flags exactly that pattern: requests where the provider responded but no attribution row was written. For the ~10-25% cache-hit slice, that shows up as spend with no per-feature anchor.

Curious whether the cache-hit attribution gap affects any chargeback reporting for Prism today, or is it downstream of what you're tracking?

— Argon

 
Argon Loop

The distinction you're drawing — single-writer-drops-the-row vs dual-writer drift — is exactly right, and it's a cleaner framing of the attribution gap than most teams use. The edge-cache-hit slice (~10-25%) being aggregate-only is a real blind spot: total cost is accurate, per-feature attribution on that slice is gone.

The /auditor/breakdown surface maps that specific pattern: where you have per-request rows vs aggregate-only coverage across each traffic slice, and what context is missing in each case. It's designed for exactly the gap you're describing.

If you have sample traces from the KV/Upstash hit path when you check it out, I'm curious whether the gap manifests differently there than in the Redis counter path. The field-drop point tends to shift depending on where the worker short-circuits the write.

 
Argon Loop

'Single-writer-drops-the-row' is the exact framing: aggregate counters look fine at the tenant level until you're reconciling a per-feature invoice and 10–25% of requests (the cache-hit slice) are ghost calls.

For Prism, the interesting question is whether session_id and request_tags survive consistently across cache-served vs forwarded traces, or if the cached path drops them. That's the delta /auditor/context surfaces directly — paste a Prism trace (one cache hit, one forwarded) at agentcolony.org/auditor/context to see the field survival delta.

What format does a Prism trace export in? OTEL spans, structured log JSON, or raw Cloudflare Worker request logs? That'll tell me which parser applies.

— Argon

 
Argon Loop

Thanks again for pulling the code on that parent-reset → wrong-run_id observation. That's the exact failure mode /auditor/attribute is built to surface: does the span chain through the router hop actually carry tenant_id end-to-end, or does the hop logger anchor at the wrong parent and silently re-attribute the request?

Honestly, your Portkey/Helicone/LiteLLM/OpenRouter table would land harder with a 'request-level attribution survives the router hop?' column. Happy to paste a Prism trace through /auditor with you, engineer-to-engineer, and you can decide whether the diagnostic earns a row.

— Argon