Why We Replaced GraphQL Subgraphs with gRPC Services in WunderGraph Cosmo

#grpc #graphql #api #futureoffederation

When teams adopt GraphQL Federation, they usually split their API into Subgraphs. Each Subgraph speaks GraphQL, and the Router stitches them together at runtime.

But after years of working with Federation, we kept hitting the same limits: type safety, performance, and the bottlenecks of Subgraph frameworks.

So we asked:

What if Subgraphs didn’t need to use GraphQL at all?

We built a new model in WunderGraph Cosmo. The Router still speaks GraphQL at the edge, but Subgraphs now run as gRPC services behind the scenes.

Here’s how it works and why we made the switch.

GraphQL Subgraphs Look Clean, But Aren’t Type Safe

A Subgraph might start simple:

type User @key(fields: "id") {
  id: ID!
  name: String!
}

But when the Router asks that Subgraph to resolve entities, it calls the _entities field with variables like this:

{
  "representations": [
    { "__typename": "User", "id": "1" },
    { "__typename": "User", "id": "2" }
  ]
}

The _entities field takes its representations argument as a list of _Any, a scalar with no declared shape. There’s no compile-time check that your resolver handles each representation correctly, so you might not catch issues until runtime, often in production.

By switching to gRPC, we eliminate this risk. Request and response types are generated from the schema, so type mismatches fail at code generation or compile time, not later.
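To make the difference concrete, here is a minimal Python sketch. It is not Cosmo code: both resolver functions are hypothetical, and the dataclass only stands in for a class that would be generated from the proto message of the same name. In a compiled language the typed variant would fail at build time; in Python a static type checker plays that role.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


def resolve_entities_untyped(representations: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Hypothetical _entities-style resolver: each representation is an opaque dict."""
    users = []
    for rep in representations:
        # Bug: the key is "id", not "userId". Nothing flags this until the code runs.
        users.append({"id": rep["userId"], "name": "user-" + rep["userId"]})
    return users


@dataclass(frozen=True)
class LookupUserByIdRequestKey:
    """Stand-in for a class generated from the proto message of the same name."""
    id: str


def resolve_users_typed(keys: List[LookupUserByIdRequestKey]) -> List[Dict[str, str]]:
    """Hypothetical typed resolver: the key shape is declared up front."""
    # Writing key.userId here would be reported by a static type checker
    # before the service is ever deployed.
    return [{"id": key.id, "name": "user-" + key.id} for key in keys]
```

The untyped version raises a KeyError only when a request actually arrives, while the same mistake in the typed version is visible to tooling ahead of time.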

Subgraphs Still Leave You With N+1 Problems

GraphQL has a known issue with N+1 queries. Subgraph frameworks don’t solve this for you; you’re expected to implement a data loader yourself.

That works fine for a single small service, but it doesn't scale across an organization. Every team ends up solving the same problem, again and again.

In Cosmo, the Router already handles batching and query planning. If your Subgraphs are gRPC services, they receive batched requests out of the box without needing custom logic.
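The cost of N+1 is easiest to see by counting round trips. The sketch below is illustrative only: FakeUserStore, get_one, and get_many are hypothetical names, and the in-memory dictionary stands in for a real database or downstream service.

```python
from typing import Dict, List


class FakeUserStore:
    """Hypothetical in-memory backend that counts the round trips it serves."""

    def __init__(self) -> None:
        self.round_trips = 0
        self._names: Dict[str, str] = {"1": "Ada", "2": "Grace", "3": "Linus"}

    def get_one(self, user_id: str) -> str:
        # One round trip per user: this is the N+1 pattern.
        self.round_trips += 1
        return self._names[user_id]

    def get_many(self, user_ids: List[str]) -> List[str]:
        # One round trip for the whole batch, results in the order requested.
        self.round_trips += 1
        return [self._names[user_id] for user_id in user_ids]


def resolve_one_by_one(store: FakeUserStore, user_ids: List[str]) -> List[str]:
    """What a naive per-entity resolver does without a data loader."""
    return [store.get_one(user_id) for user_id in user_ids]


def resolve_batched(store: FakeUserStore, user_ids: List[str]) -> List[str]:
    """What a handler can do when it receives all keys in a single request."""
    return store.get_many(user_ids)
```

With three ids, the first function makes three round trips and the second makes one. A data loader exists to turn the first shape into the second; a service that already receives the full list of keys starts from the second shape.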

GraphQL Adds Overhead That gRPC Skips

Every Subgraph written in GraphQL has to parse, validate, normalize, and execute the incoming request. That’s a lot of work for a small data fetch.

Even with an optimized Router, performance can tank if a Subgraph is implemented in a slower language or framework.

With gRPC, we bypass all that. The Router sends a single typed request, and the service does no GraphQL parsing, validation, or dynamic execution. That saves CPU time, latency, and developer effort.

Apollo’s Gatekeeping Slows Everyone Down

One of the biggest issues in the Federation ecosystem is Apollo's control over the Subgraph spec and its frameworks. You can’t ship new features until they do. Even when they do, it takes time for other frameworks to catch up.

With gRPC, we’re free to move faster. New Router features are immediately usable by any service that supports gRPC, with no framework rewrites required.

From GraphQL SDL to gRPC in Practice

You still write a GraphQL Subgraph SDL:

type User @key(fields: "id") {
  id: ID!
  name: String!
}

But instead of implementing that in a Subgraph framework, we compile it to this:

syntax = "proto3";
package service;

service UsersService {
  rpc LookupUserById(LookupUserByIdRequest) returns (LookupUserByIdResponse) {}
}

message LookupUserByIdRequestKey {
  string id = 1;
}

message LookupUserByIdRequest {
  repeated LookupUserByIdRequestKey keys = 1;
}

message LookupUserByIdResponse {
  repeated User result = 1;
}

message User {
  reserved 2 to 3;
  string id = 1;
  string name = 4;
}

The repeated keys field handles batching: one request carries every key the Router needs resolved. The reserved field numbers keep the numbers of removed fields from being reused as the schema evolves. It’s fast, reliable, and strictly typed.
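A handler for LookupUserById could look roughly like the Python sketch below. It avoids any gRPC library so it runs on its own: the dataclasses are hand-written stand-ins for the classes a protobuf compiler would generate from the messages above, and USERS and lookup_user_by_id are hypothetical. Returning one result per key, in key order, is an assumption of this sketch, and handling of unknown ids is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hand-written stand-ins for the generated message classes (illustration only).
@dataclass
class LookupUserByIdRequestKey:
    id: str


@dataclass
class LookupUserByIdRequest:
    keys: List[LookupUserByIdRequestKey] = field(default_factory=list)


@dataclass
class User:
    id: str
    name: str


@dataclass
class LookupUserByIdResponse:
    result: List[User] = field(default_factory=list)


# Hypothetical data source; a real service would query its own storage here.
USERS: Dict[str, str] = {"1": "Ada", "2": "Grace"}


def lookup_user_by_id(request: LookupUserByIdRequest) -> LookupUserByIdResponse:
    """Resolve every key in the batch with a single pass over the request."""
    # Assumption: one result per key, in the same order as the keys.
    result = [User(id=key.id, name=USERS[key.id]) for key in request.keys]
    return LookupUserByIdResponse(result=result)
```

Because the whole batch arrives in one request, the handler can fetch all users in one query against its storage instead of one query per key.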

You can register the service like this:

subgraphs:
  - name: users
    routing_url: localhost:4011
    grpc:
      schema_file: ./schema.graphql
      proto_file: ./generated/service.proto
      mapping_file: ./generated/mapping.json

No runtime SDL parsing. No entity resolution bugs. Just fast, typed calls between the Router and backend services.