Stefan 🚀

Posted on Jun 4

Scaling GraphQL Federation? Here’s When OSS Starts to Hold You Back

#webdev #opensource #graphql #cosmooss

Starting with open source makes sense. It’s fast, flexible, and familiar. For teams adopting GraphQL Federation, Cosmo OSS offers one of the cleanest on-ramps out there. It runs locally, works out of the box, and doesn’t force you into a control plane.

But there’s a point where DIY federation hits its limits.

You go from evaluating architecture to running a production graph. From local tests to handling real traffic. From one team to many. And when that shift happens, OSS starts asking for more than it gives back.

This post covers when and why teams graduate from Cosmo OSS—and what Cosmo Enterprise offers that makes scaling federation sustainable.

Cosmo OSS: The Right Tool for Starting

cosmo-router and the wgc CLI are often all you need to get federation off the ground. You can self-host, deploy anywhere, and experiment without lock-in.

You get:

Federation v1 and v2 support
Apache 2.0 license
No mandatory hosted control plane
A Helm chart that just works
Schema composition and routing in one binary

It fits neatly into existing stacks—Postgres, Redis, observability, you name it. That’s why OSS is a great choice for internal testing, prototyping, or early rollout.

Why “Free” Isn’t the Same as “Supported”

Cosmo OSS is open source—not a managed service. There’s no SLA, no onboarding, and no guarantee of prioritization.

“Open source means you can use it. It doesn’t mean I’ll teach you how to use it.”
— paraphrased from The Good Thing, Episode 16

Maintainers contribute software—not labor. If you’re using OSS in production, it’s your responsibility to maintain, secure, and scale it.

Even helpful pull requests come at a cost. Every line merged becomes a line maintained. If you're working toward compliance targets, managing incidents, or deploying multi-team graphs, that tradeoff gets expensive fast.

What Cosmo Enterprise Adds

Cosmo Enterprise starts with the same OSS core, then adds everything you need to run Federation with confidence:

Governance and schema safety

Version-aware schema registry
Composition checks at every stage
Contracts for limiting field exposure
Field-level usage metrics to drive safe refactors
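
To make "usage metrics drive safe refactors" concrete, here is a minimal sketch of the idea in Python. It is not Cosmo's API or data model: the `FieldUsage` record, the `unsafe_removals` helper, and the sample numbers are all hypothetical, made up for illustration. The point is only that a field removal can be checked against recent traffic before it ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldUsage:
    """Hypothetical usage record; not Cosmo's actual data model."""
    type_name: str
    field_name: str
    requests_last_30d: int


def unsafe_removals(removed, usage, threshold=0):
    """Return the removed (type, field) pairs that still receive traffic.

    A field counts as unsafe to remove when its recent request count is
    above `threshold`. Fields with no usage record are treated as unused.
    """
    by_coord = {(u.type_name, u.field_name): u.requests_last_30d for u in usage}
    return sorted(coord for coord in removed if by_coord.get(coord, 0) > threshold)


# Illustrative data only.
usage = [
    FieldUsage("User", "legacyEmail", 0),
    FieldUsage("User", "nickname", 1240),
]
removed = {("User", "legacyEmail"), ("User", "nickname")}

blocked = unsafe_removals(removed, usage)
print(blocked)
```

In this example `User.legacyEmail` has no recent traffic and can go, while `User.nickname` is still queried and would block the change. A real check would also need to consider which clients are calling the field, not just how often.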

Built-in observability

Distributed tracing across federated services
Field-level analytics powered by ClickHouse
Native OpenTelemetry support
Real-time insights across environments and regions

Enterprise-grade security

Self-hosted router—data never leaves your environment
Only anonymized metadata is shared
Blocks OWASP Top 10 GraphQL threats
Support for SOC 2 and HIPAA compliance, with RBAC, audit logs, and SSO built in

High performance and direct support

Low-latency router built in Go
CDN-backed config updates
Handles billions of requests/month
Access to engineers who built the platform

Deployment Options That Match Your Risk Model

Every Cosmo Enterprise deployment shares one key principle: you deploy the Router.

1. Cosmo Cloud (Managed):

WunderGraph manages the control plane: schema registry, Studio, observability, usage analytics, and backend services like Kafka, Redis, and ClickHouse. You need to deploy the Router, but nothing else is required of your team.

2. Hybrid:

You host the Router inside your own infrastructure while everything else is handled by WunderGraph. Only anonymized metadata is sent to the managed control plane. All query and response data stays entirely within your systems. This model is common in regulated industries where data locality is non-negotiable.

3. Self-hosted:

Everything runs in your environment: the Router, schema registry, observability stack, Studio, and access management. It’s the most operationally intensive path—but also the most flexible. Supports air-gapped, sovereign cloud, and fully isolated deployments.

You’ve Outgrown OSS If...

Schema merges feel risky
You’re flying blind on subgraph usage
You’re maintaining ClickHouse, Redis, Postgres, and backups yourself
Your team owns observability, federation, and platform tooling all at once
You’re prepping for SOC 2 or HIPAA and don’t have RBAC or audit logs
Query latency is creeping up and you don’t know why

If any of that sounds familiar, OSS is probably costing more than it’s saving.

Migration Doesn’t Require a Rewrite

The best part? Moving from OSS to Enterprise is incremental.

Same wgc CLI
Same schema definitions
Same subgraphs and SDLs
Same Helm deployments
Routers stay live during transition
Registry changes pull from the same CDN

You can even integrate Cosmo Cloud with an existing OSS Router to unlock analytics and governance without touching your data path.

Cosmo is Apollo Federation–compatible too. Migrating from another gateway? You can reuse project structure, SDLs, and composition logic—just swap the router.

Real Teams, Real Reasons

eBay partnered directly with WunderGraph to scale OSS in a fully self-hosted model across their data centers.

SoundCloud reduced infra costs from $14K to $9.7K/month and dropped compute usage by 86%—just by switching to Cosmo Enterprise.

kHealth achieved HIPAA compliance with a hybrid deployment.

On The Beach unblocked schema coordination across teams by using Studio, contracts, and analytics.

Every one of them hit different limits—but solved them with the same foundation.
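
For a sense of scale, the SoundCloud cost figures above work out as follows. This is plain arithmetic on the two numbers quoted in this post; the annual figure assumes the monthly cost stays flat for twelve months, which is an assumption and not something the source states. The 86% compute reduction is a separate metric and is not derived here.

```python
before_monthly = 14_000  # USD per month, as quoted above
after_monthly = 9_700    # USD per month, as quoted above

monthly_saving = before_monthly - after_monthly
percent_saving = round(100 * monthly_saving / before_monthly, 1)
annual_saving = monthly_saving * 12  # assumes a flat monthly cost

print(f"${monthly_saving:,}/month ({percent_saving}%), ${annual_saving:,}/year")
```

That is roughly a 31% drop in the monthly infrastructure bill.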

TL;DR

Start with OSS.
Scale with Enterprise.
Don’t wait for fragility to become failure.
OSS is where you prove out federation.
Enterprise is where you protect it.

If you're ready to move beyond the basics, read the full post here:

👉 When to Migrate from Cosmo OSS

Or get in touch to talk migration, deployment models, or performance at scale.
