Portkey Alternative: I Switched Away from Portkey. Here's the Honest Reason Why.

#llm #ai #devops #mlops

TL;DR

Portkey is genuinely good for small-to-mid teams shipping fast — the DX is excellent, setup is minimal, and the observability dashboard is among the strongest for single-team use
Palo Alto Networks completed the acquisition on May 29, 2026. Portkey is now Prisma AIRS's AI Gateway. What that means for the developer-first roadmap is an open question, and it's a real factor for any long-term infrastructure decision
TrueFoundry is more to set up but covers the full lifecycle — routing, MCP governance, model deployment, data sovereignty — which is where we kept hitting Portkey's ceiling

We adopted Portkey about eighteen months ago. At the time it was the obvious call: our team was six people, we were routing to OpenAI and Anthropic, and we needed observability fast. Portkey was running in an afternoon. The dashboard was clean. Cost tracking worked. Prompt versioning was a genuine quality-of-life improvement. I'd have recommended it to anyone in that situation without hesitation.

Then things changed. The team grew to forty engineers across four squads. Two squads started using self-hosted models. A security review asked for per-team cost attribution and a full audit trail. We added agent workflows that needed MCP governance. And in April, Palo Alto Networks announced they were acquiring Portkey.

At each of those inflection points, we bumped into something Portkey didn't handle the way we needed. This post is the honest account of what those were — and why we eventually switched.

What Portkey does really well

Before getting into the friction, it's worth being clear about where Portkey is genuinely strong — because the "obvious better choice" framing is only useful if it's grounded in real tradeoffs.

Developer onboarding is excellent. Three lines of code and you have routing, retries, fallback, and observability. No config files, no YAML, no infrastructure decisions. For a team that needs to be in production this week, that matters enormously.

The prompt management UI is strong. Versioned prompts, labeled deployments, and a playground that actually works: this is the kind of tooling that turns prompt iteration from a guessing game into a discipline. Portkey's Prompt Engineering Studio is genuinely ahead of most alternatives here.

LLM-level observability is clean and actionable. Cost per provider, per model, per API key. Token usage over time. Latency distributions. It's not flashy but it's exactly what you need when you're trying to understand what your LLM usage is actually costing.

Reliability features hold up. Automatic retries, fallback chains, and load balancing across providers all behaved as expected in production.
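
For readers who haven't used a gateway, the retry-then-fallback pattern looks roughly like the sketch below. This is a generic illustration, not Portkey's implementation or SDK: the provider names, the `ProviderError` type, and the callables are hypothetical stand-ins for real provider calls.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical stand-in for a transient provider failure (timeout, 429, 5xx)."""


Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(
    providers: Sequence[Provider],
    prompt: str,
    retries_per_provider: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    """Try providers in order; retry each a few times before falling back to the next.

    Returns (provider_name, response). Raises RuntimeError if every provider fails.
    """
    last_error: Optional[Exception] = None
    for name, call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except ProviderError as err:
                last_error = err
                if attempt < retries_per_provider:
                    # Exponential backoff between retries on the same provider.
                    time.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Usage: the primary always fails, so the call falls back to the backup.
attempts = {"primary": 0}


def flaky_primary(prompt: str) -> str:
    attempts["primary"] += 1
    raise ProviderError("503 from primary")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


provider, answer = call_with_fallback([("primary", flaky_primary), ("backup", backup)], "hi")
```

The primary is tried three times (one call plus two retries) before the backup answers. A managed gateway's value is that you configure this once instead of re-implementing it in every service.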

If your situation is: one or two teams, routing to external providers, no compliance requirements, no self-hosted models, no MCP governance — Portkey is probably the right tool. Stop reading and go set it up.

Where we kept hitting walls

Wall 1: Environment isolation

This was the first thing that caught us. We had dev, staging, and production all pointing at the same Portkey workspace. Fine when you're small. Once we had four squads doing concurrent experiments, routing config changes in dev started bleeding into staging metrics. We couldn't cleanly separate observability by environment.

Portkey's workspace model is good for a single team. It's not designed around the idea that multiple environments need hard isolation at the config and observability level. We ended up running three separate Portkey accounts, which meant three billing relationships, three sets of API keys to manage, and no unified view across environments.

With TrueFoundry, workspace scoping is Kubernetes-namespace-backed — environments are physically isolated, not just logically separated in a shared dashboard. One control plane, clean separation. That's the architecture we needed.
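
The failure mode we hit was, in effect, one shared routing config read by every environment. A minimal sketch of the property we wanted, assuming a hypothetical config store keyed by environment, is below. It only models logical scoping; the physical, namespace-level isolation described above is an infrastructure property that a snippet like this can't show.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutingConfig:
    primary_model: str
    fallback_models: List[str] = field(default_factory=list)


class ScopedConfigStore:
    """Hypothetical store: routing configs keyed by environment, so writes never cross scopes."""

    def __init__(self) -> None:
        self._configs: Dict[str, RoutingConfig] = {}

    def put(self, environment: str, config: RoutingConfig) -> None:
        self._configs[environment] = config

    def get(self, environment: str) -> RoutingConfig:
        # Fail loudly rather than silently reading another environment's config.
        if environment not in self._configs:
            raise KeyError(f"no routing config for environment {environment!r}")
        return self._configs[environment]


# Usage: a dev experiment rewrites dev routing; staging is untouched.
store = ScopedConfigStore()
store.put("staging", RoutingConfig("provider-a/model-x", ["provider-b/model-y"]))
store.put("dev", RoutingConfig("provider-a/model-x"))
store.put("dev", RoutingConfig("provider-c/experimental"))
```

The model names are placeholders. The same keying applies to metrics: if every trace carries its environment, dev experiments stop polluting staging dashboards.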

Wall 2: Cost attribution across teams

Around month four, our head of engineering asked a reasonable question: "Which team is spending the most on LLM calls, and on what models?" We couldn't answer it cleanly.

Portkey does per-key and per-workspace budget tracking, but cross-team attribution requires either separate workspaces (back to the three-account problem) or careful key management discipline that breaks down as teams grow. Budget enforcement is also reactive — you get alerted after you've hit a limit, not before. We had one incident where an agent workflow hit a token spike over a weekend and ran up a bill before the Monday alert fired.
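
The difference between a post-spend alert and a pre-spend check is easiest to see with numbers. The sketch below replays a hypothetical weekend spike (50 calls at $5 against a $100 budget) through both models. It isn't either vendor's code, and it assumes the per-call cost is known up front; a real hot-path check has to work from an estimate and reconcile afterwards.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0


def record_then_alert(budget: Budget, cost_usd: float) -> bool:
    """Reactive model: the request has already run; return True if an alert should fire."""
    budget.spent_usd += cost_usd
    return budget.spent_usd >= budget.limit_usd


def check_then_spend(budget: Budget, estimated_cost_usd: float) -> bool:
    """Hot-path model: refuse the request if it would push spend past the limit."""
    if budget.spent_usd + estimated_cost_usd > budget.limit_usd:
        return False
    budget.spent_usd += estimated_cost_usd
    return True


# Hypothetical weekend spike: 50 agent calls at $5 each against a $100 budget.
spike = [5.0] * 50

reactive = Budget(limit_usd=100.0)
alerts = [record_then_alert(reactive, cost) for cost in spike]

gated = Budget(limit_usd=100.0)
admitted = [check_then_spend(gated, cost) for cost in spike]
```

In the reactive model the alert first fires on the 20th call, but all 50 calls still run, so spend ends at $250 unless someone acts. The gated model admits 20 calls and stops at exactly $100.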

TrueFoundry enforces budgets on the hot path, not as a post-spend alert. Cost attribution runs by team, user, model, and application simultaneously — including for self-hosted models, which Portkey has no visibility into at all. When we moved two squads to self-hosted Llama, their costs became completely invisible to Portkey. That's not a gap you can work around.
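
Multi-dimensional attribution is conceptually simple once every request, external or self-hosted, passes through one component that stamps it with team, user, model, and application. A sketch under that assumption follows. The record shape is hypothetical, and the self-hosted cost figure stands in for whatever internal rate (for example, amortized GPU cost) a team chooses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

DIMENSIONS = ("team", "user", "model", "application")


@dataclass(frozen=True)
class UsageRecord:
    team: str
    user: str
    model: str
    application: str
    cost_usd: float


def spend_by(records: List[UsageRecord], dimension: str) -> Dict[str, float]:
    """Total cost along one attribution dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension {dimension!r}")
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)


# Usage: one external model and one self-hosted model, attributed the same way.
records = [
    UsageRecord("search", "ana", "external/model-a", "ranker", 12.0),
    UsageRecord("search", "ben", "self-hosted/llama", "ranker", 4.0),
    UsageRecord("agents", "cho", "external/model-a", "planner", 30.0),
]
by_team = spend_by(records, "team")
by_model = spend_by(records, "model")
```

The aggregation is trivial; the hard part is that the second record exists at all. If self-hosted traffic bypasses the gateway, that $4 simply never shows up in any dimension.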

Wall 3: The compliance audit

This was the one that really forced the issue. Our security team ran a compliance review and asked two things: "Can you prove that LLM traffic from the finance squad never left our network?" and "Can you produce a per-request audit trail for MCP tool calls from the last 90 days?"

Portkey's hybrid VPC mode genuinely keeps inference payloads in-network — I want to be clear that this is real, not marketing. But the control plane — the dashboard, guardrail configuration, analytics aggregation — remains in Portkey's cloud. Our security team's position was that control plane residency matters as much as data residency. If an attacker compromises the control plane, they can change guardrail rules even if they can't see the prompts. That's not a position I'd have anticipated, but once they explained it, it's hard to argue with.

TrueFoundry runs the entire hot path — auth, rate limiting, guardrails, traces — inside our Kubernetes cluster with no external dependencies. The architecture docs describe this pretty clearly: in-memory auth, config synced via NATS, OTEL traces exported to our own backends. Nothing leaves the cluster unless we tell it to. For the compliance team, that was the distinction that mattered.
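
To make "nothing leaves the cluster" concrete, here is a toy model of such a request path: auth, per-team rate limiting, and a guardrail check all run against in-memory state, and traces go to a local sink. This is not TrueFoundry's code. The NATS subscription is replaced by a plain `apply_config` call, the OTEL exporter by a callback, and the config shape, key names, and blocked terms are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class TokenBucket:
    """Per-team rate limiter: refills at rate_per_s up to capacity."""
    rate_per_s: float
    capacity: float
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class InClusterGateway:
    """Toy request path: auth, rate limit and guardrail all use in-memory state."""

    def __init__(self, export_span: Callable[[dict], None]) -> None:
        self._keys: Dict[str, str] = {}          # API key -> team
        self._blocked_terms: Set[str] = set()    # guardrail config
        self._buckets: Dict[str, TokenBucket] = {}
        self._export_span = export_span          # stand-in for an in-cluster trace exporter

    def apply_config(self, snapshot: dict) -> None:
        """Swap in a new config snapshot (in the architecture above, a NATS subscriber would call this)."""
        self._keys = dict(snapshot["keys"])
        self._blocked_terms = {t.lower() for t in snapshot["blocked_terms"]}
        self._buckets = {team: TokenBucket(**cfg) for team, cfg in snapshot["rate_limits"].items()}

    def handle(self, api_key: str, prompt: str) -> str:
        team: Optional[str] = self._keys.get(api_key)
        bucket = self._buckets.get(team) if team is not None else None
        if team is None:
            outcome = "unauthorized"
        elif bucket is not None and not bucket.allow():
            outcome = "rate_limited"
        elif any(term in prompt.lower() for term in self._blocked_terms):
            outcome = "blocked_by_guardrail"
        else:
            outcome = "forwarded"  # a real gateway would call the model here
        self._export_span({"team": team, "outcome": outcome})
        return outcome


# Usage: a finance key with a burst of two requests and no refill.
spans: list = []
gateway = InClusterGateway(spans.append)
gateway.apply_config({
    "keys": {"key-finance": "finance"},
    "blocked_terms": ["SSN"],
    "rate_limits": {"finance": {"rate_per_s": 0.0, "capacity": 2}},
})
results = [
    gateway.handle("key-finance", "summarize Q3 spend"),
    gateway.handle("key-finance", "my SSN is ..."),
    gateway.handle("key-finance", "one more"),
    gateway.handle("unknown-key", "hello"),
]
```

The point the security team cared about is visible in the structure: every decision reads local state, so changing guardrail rules requires access to something running inside the cluster.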

On the MCP audit trail question: at the time, Portkey's MCP guardrails were still in early access, and custom tool-call validation was via an adjacent webhook path rather than a native policy engine. We couldn't produce the per-request trace with user attribution that the audit required.
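
The audit request boils down to a query over per-request tool-call records that carry user attribution. A minimal sketch, assuming a hypothetical record shape (the server and tool names are made up), shows what "every MCP tool call from finance in the last 90 days" needs to be answerable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ToolCallAudit:
    """One MCP tool invocation, attributed to the user who triggered it."""
    request_id: str
    user: str
    team: str
    mcp_server: str
    tool: str
    allowed: bool
    at: datetime


def audit_window(
    log: List[ToolCallAudit],
    now: datetime,
    days: int = 90,
    team: Optional[str] = None,
) -> List[ToolCallAudit]:
    """Every tool call in the last `days` days, optionally filtered to one team."""
    cutoff = now - timedelta(days=days)
    return [e for e in log if e.at >= cutoff and (team is None or e.team == team)]


# Usage: the auditor's question against a small hypothetical log.
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
log = [
    ToolCallAudit("r1", "ana", "finance", "ledger-mcp", "read_invoice", True, now - timedelta(days=10)),
    ToolCallAudit("r2", "ben", "finance", "ledger-mcp", "delete_invoice", False, now - timedelta(days=30)),
    ToolCallAudit("r3", "ana", "finance", "ledger-mcp", "read_invoice", True, now - timedelta(days=120)),
    ToolCallAudit("r4", "cho", "search", "web-mcp", "fetch_url", True, now - timedelta(days=5)),
]
finance_last_90 = audit_window(log, now, team="finance")
```

Note that denied calls (`r2`) belong in the trail too; auditors usually want to see what was attempted, not just what succeeded.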

Wall 4: Self-hosted models

When two of our squads moved to self-hosted Llama models on our own GPU infrastructure, we needed the gateway to route to those endpoints the same way it routed to OpenAI. Not a separate system — the same observability, the same cost attribution, the same RBAC.
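
"The same way it routed to OpenAI" means one lookup path and one permission check regardless of where the model runs. The sketch below assumes a hypothetical registry of logical model names; the URLs, including the in-cluster service name, are placeholders rather than real endpoints.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    kind: str  # "external" or "self-hosted"


# Hypothetical registry: both kinds of model sit behind the same logical names.
ROUTES: Dict[str, Endpoint] = {
    "chat-default": Endpoint("https://api.provider.example/v1", "external"),
    "llama-internal": Endpoint("http://llama.models.svc.cluster.local:8000/v1", "self-hosted"),
}


def resolve(model: str, team: str, grants: Dict[str, Set[str]]) -> Endpoint:
    """One RBAC check and one lookup for every model, wherever it is hosted."""
    if model not in grants.get(team, set()):
        raise PermissionError(f"team {team!r} is not allowed to use {model!r}")
    return ROUTES[model]


# Usage: finance may only use the self-hosted model.
grants = {
    "search": {"chat-default", "llama-internal"},
    "finance": {"llama-internal"},
}
finance_endpoint = resolve("llama-internal", "finance", grants)
```

Because callers only see `llama-internal`, moving a workload from an external provider to a self-hosted model is a registry change rather than a client change.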

Portkey doesn't host models and has no visibility into self-hosted model infrastructure. You can point it at a custom endpoint, but GPU utilization, container logs, pod health — none of that is accessible. When a self-hosted model started OOM-crashing under load, the debugging path was: notice the errors in Portkey's dashboard, switch to a completely different observability stack to diagnose the infrastructure, fix the issue, come back. Two tools, two contexts, one problem.

TrueFoundry handles both the gateway and the compute layer. When something breaks, I can see the request error in the same UI where I check the pod logs and GPU memory. That's not a minor convenience — it cut our mean time to resolution for model infrastructure incidents from hours to minutes.
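
What "one UI" buys you is a join that otherwise happens in your head across two tools: failed requests matched to pod events on the replica that served them. A sketch, assuming the gateway records which pod handled each request (a hypothetical field) and that pod events such as Kubernetes' `OOMKilled` reason are available alongside:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class RequestError:
    request_id: str
    pod: str  # replica that served the request, as recorded by the gateway (assumed)
    at: datetime


@dataclass(frozen=True)
class PodEvent:
    pod: str
    reason: str  # e.g. Kubernetes "OOMKilled"
    at: datetime


def explain_errors(
    errors: List[RequestError],
    events: List[PodEvent],
    window: timedelta = timedelta(minutes=2),
) -> Dict[str, List[str]]:
    """For each failed request, list pod events on the same pod within +/- window."""
    return {
        err.request_id: [
            ev.reason for ev in events if ev.pod == err.pod and abs(ev.at - err.at) <= window
        ]
        for err in errors
    }


# Usage: two failed requests; only one lines up with an OOM kill on its pod.
t0 = datetime(2026, 3, 1, 14, 0)
errors = [
    RequestError("req-1", "llama-7f9c", t0),
    RequestError("req-2", "llama-2b1d", t0 + timedelta(minutes=30)),
]
events = [
    PodEvent("llama-7f9c", "OOMKilled", t0 + timedelta(seconds=40)),
    PodEvent("llama-2b1d", "OOMKilled", t0 - timedelta(hours=1)),
]
explained = explain_errors(errors, events)
```

`req-1` is explained by an OOM kill 40 seconds later on the same pod; `req-2` has no nearby event and needs a different line of investigation.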

The acquisition question

On April 30, 2026, Palo Alto Networks announced they were acquiring Portkey. The deal closed May 29. Portkey is now the AI Gateway component of Prisma AIRS, their enterprise security platform.

I want to be careful not to be unfair here. Palo Alto Networks said they'll continue supporting existing Portkey customers, and the acquisition does make sense architecturally — an AI gateway is a natural adjacency for a security platform trying to govern agentic AI.

But here's the practical question we had to ask: is the product we're betting our AI infrastructure on going to keep prioritizing developer-first AI gateway features, or is it going to be prioritized around security platform consolidation for Prisma AIRS customers?

That's not a rhetorical question. I genuinely don't know the answer. What I do know is that when you're making a 2-3 year infrastructure commitment, "the roadmap might shift toward a security vendor's priorities" is a risk that didn't exist six months ago. We decided it was a risk we weren't comfortable with for a piece of core infrastructure. That might be overcautious — I wouldn't fault a team for staying on Portkey if it's working for them. But it was the final factor in our decision.

Where TrueFoundry was better suited for us

I'm not going to pretend TrueFoundry is strictly better in every dimension. It's not.

TrueFoundry is Kubernetes-native. If your team isn't already running Kubernetes or doesn't have platform engineering capacity, that is a real adoption cost. Portkey's three-line setup isn't just marketing; it reflects a lighter deployment model.

For pure API routing to external providers, Portkey is faster to get value from. If MCP governance, self-hosted models, and full data sovereignty aren't in your near-term picture, TrueFoundry's additional surface area is overhead you don't need.

What TrueFoundry does meaningfully better:

Full data sovereignty with everything running inside your VPC, control plane included
Per-team cost attribution across both external APIs and self-hosted models, enforced on the hot path
MCP governance with tool-level RBAC, pre/post guardrail hooks, Virtual MCP Servers, per-request audit trail
Unified observability from request trace down to pod logs and GPU utilization
Routing and deployment in one system, so migrating from OpenAI to a self-hosted model doesn't require a new platform
Roadmap driven by AI infrastructure, not security platform consolidation
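
For the MCP governance item, the two pieces that mattered most to us were tool-level RBAC and pre/post hooks. The sketch below is a toy proxy showing how those compose; it does not model Virtual MCP Servers, and the tool, users, and hook rules are all hypothetical examples rather than any product's API.

```python
import re
from typing import Any, Callable, Dict, Iterable, Set

PreHook = Callable[[str, Dict[str, Any]], None]  # raise to block the call
PostHook = Callable[[str, Any], Any]             # return a (possibly redacted) result


class GovernedToolRouter:
    """Toy MCP-style tool proxy: per-user tool RBAC plus pre/post guardrail hooks."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        permissions: Dict[str, Set[str]],
        pre_hooks: Iterable[PreHook] = (),
        post_hooks: Iterable[PostHook] = (),
    ) -> None:
        self.tools = tools
        self.permissions = permissions
        self.pre_hooks = list(pre_hooks)
        self.post_hooks = list(post_hooks)

    def call(self, user: str, tool: str, args: Dict[str, Any]) -> Any:
        if tool not in self.permissions.get(user, set()):
            raise PermissionError(f"{user!r} may not call {tool!r}")
        for hook in self.pre_hooks:
            hook(tool, args)  # validate arguments before the tool runs
        result = self.tools[tool](**args)
        for hook in self.post_hooks:
            result = hook(tool, result)  # inspect or redact output before returning it
        return result


def reject_wildcards(tool: str, args: Dict[str, Any]) -> None:
    if any(isinstance(v, str) and "*" in v for v in args.values()):
        raise ValueError(f"wildcard argument blocked for {tool!r}")


def redact_card_numbers(tool: str, result: Any) -> Any:
    return re.sub(r"\d{12,19}", "[REDACTED]", result) if isinstance(result, str) else result


router = GovernedToolRouter(
    tools={"read_invoice": lambda invoice_id: f"invoice {invoice_id}: card 4111111111111111"},
    permissions={"ana": {"read_invoice"}},
    pre_hooks=[reject_wildcards],
    post_hooks=[redact_card_numbers],
)
output = router.call("ana", "read_invoice", {"invoice_id": "42"})
```

Putting RBAC before the hooks means an unauthorized user never reaches argument validation at all, which keeps the audit trail simple: one denial reason per blocked call.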

The honest version: if you're a team that will stay in "routing to external providers" territory indefinitely, Portkey is probably the right call for now. If your needs include self-hosted models, regulated data environments, MCP governance, or multi-team cost attribution, we hit those walls and they were real.

What's your experience been? Curious whether others on Portkey are watching the Prisma AIRS integration and deciding what to do — or whether most teams are comfortable staying put and waiting to see how the roadmap develops. Drop it in the comments.