
Stefan 🚀

Originally published at wundergraph.com

Cosmo Router: High Performance Federation v1 & v2 Router / Gateway

Launch Day 3 is all about routing federated GraphQL APIs.

Welcome back to the third day of our launch week.
Today we're launching Cosmo Router, a high performance GraphQL API Gateway / Router compatible with Apollo Federation v1 and v2.

We've been in the business of building GraphQL API Gateways for more than five years now.
When it came to building a Federation-compatible Gateway, we faced a couple of challenges that ultimately led us to refactor parts of our GraphQL Query Planner and Execution Engine.

As a result, we're happy to announce the Cosmo Router.

Like all our API Gateways, it's written in Go and we've just open sourced it under the Apache 2.0 license.

You can find the source code on GitHub in the Cosmo Monorepo.

Why Cosmo Router

The WunderGraph Cosmo stack aims to provide an open source drop-in replacement for Apollo Federation / Apollo GraphOS.

For some companies, it's not possible to use non-OSI approved licenses, especially for critical infrastructure like API Gateways.

Others are afraid of vendor lock-in or simply cannot use cloud services for compliance reasons.

For all these companies, we've built Cosmo and the Cosmo Router.

Dataloader 3.0 - Why Cosmo Router is so fast

Cosmo Router is built on top of graphql-go-tools, a high performance GraphQL engine written in Go.

We've been working on this engine for more than five years now and it's the foundation of all our GraphQL products.

In order to fully support Federation v2, we heavily refactored parts of the Query Planner,
deprecated our existing Dataloader implementation and completely rewrote it from scratch with a new algorithm.

As you can see in the following benchmark, the new Dataloader implementation gives Cosmo Router a 3.3x performance improvement over the previous implementation (WG Gateway).

Please take these numbers with a grain of salt, as they are only a snapshot of a specific benchmark.

Different benchmarks will yield different results and we're by no means claiming that Cosmo Router is faster than other Gateways.

However, we're confident that our new Dataloader implementation is more efficient and resource-friendly than the previous one.
The new algorithm uses far less concurrency and works especially well with nested lists of entities.
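To make the "nested lists of entities" case concrete, here is a minimal sketch of the general batching idea behind a dataloader in a federated setup. It is not the Cosmo Router algorithm, which this post does not describe. The `Product` and `Review` types, the `User` entity, and both function names are hypothetical; only the `_entities(representations: ...)` field comes from the Federation subgraph contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Representation identifies one entity for a subgraph's _entities field.
type Representation struct {
	Typename string `json:"__typename"`
	ID       string `json:"id"`
}

// Review and Product are hypothetical types forming a nested list:
// products -> reviews -> author (a User entity owned by another subgraph).
type Review struct {
	Body     string
	AuthorID string
}

type Product struct {
	Name    string
	Reviews []Review
}

// entitiesRequest is the JSON body of one batched subgraph request.
type entitiesRequest struct {
	Query     string `json:"query"`
	Variables struct {
		Representations []Representation `json:"representations"`
	} `json:"variables"`
}

// collectRepresentations walks the nested lists once and returns each
// distinct author exactly once, in first-seen order.
func collectRepresentations(products []Product) []Representation {
	seen := make(map[string]bool)
	var reps []Representation
	for _, p := range products {
		for _, r := range p.Reviews {
			if seen[r.AuthorID] {
				continue
			}
			seen[r.AuthorID] = true
			reps = append(reps, Representation{Typename: "User", ID: r.AuthorID})
		}
	}
	return reps
}

// buildEntitiesRequest packs all representations into a single request body
// instead of issuing one request per review.
func buildEntitiesRequest(reps []Representation) ([]byte, error) {
	var req entitiesRequest
	req.Query = `query($representations: [_Any!]!) { _entities(representations: $representations) { ... on User { name } } }`
	req.Variables.Representations = reps
	return json.Marshal(req)
}

func main() {
	products := []Product{
		{Name: "Table", Reviews: []Review{{"Solid", "u1"}, {"Wobbly", "u2"}}},
		{Name: "Chair", Reviews: []Review{{"Comfy", "u1"}, {"Fine", "u3"}}},
	}
	reps := collectRepresentations(products)
	body, err := buildEntitiesRequest(reps)
	if err != nil {
		panic(err)
	}
	fmt.Printf("representations in one batch: %d\n", len(reps))
	fmt.Println(string(body))
}
```

In this sketch, four reviews across two products collapse into one subgraph request with three representations, which is the kind of saving that matters most when lists are nested.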

Figure: Federation Router Comparison

That said, performance was not the reason for the rewrite.

Implementing directives like @shareable and @requires gave us a lot of headaches with the old implementation.

The new Dataloader algorithm is conceptually much more complex, but its implementation is much simpler and easier to understand.

As such, the new Cosmo Router is not only faster, but most importantly, it's easier to maintain and extend.

If you want to learn more about the details, we will give an in-depth presentation of how Cosmo Router works under the hood at GraphQL Conf in San Francisco on September 21.
I'd love to see you there.

Ludicrous Mode

In addition to the new Dataloader implementation,
we've also added a new feature called Ludicrous Mode.

The benchmark below shows the performance of Cosmo Router with and without Ludicrous Mode.
It's important to note that Ludicrous Mode is a theoretical optimum and will usually not be reached in production, which is why we've put it in a separate diagram.

Figure: Federation Router Comparison (with Ludicrous Mode)

Ludicrous Mode is a feature that allows Cosmo Router to send origin requests that are read-only using single-flight.

This means that if an origin request is read-only, e.g. a Query, and two or more in-flight requests are exactly the same, Cosmo Router will send only one request to the origin and share the result with all of them.

Ludicrous Mode still requires the engine to process each and every client request, but depending on the outgoing traffic, it can dramatically reduce the load on the origin.

This feature is especially useful for requests that are executed very often and are read-only,
like resolving nested lists of entities.
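The single-flight pattern described above can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the router's actual code: the `Group` type, its `Do` and `Waiters` methods, the `demo` helper, and the use of the raw query string as the deduplication key are all hypothetical. A real implementation would presumably derive the key from everything that affects the origin response, such as URL, headers, and body.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight origin request and its eventual result.
type call struct {
	done    chan struct{} // closed once the leader's fetch has returned
	body    string
	err     error
	waiters int // number of followers sharing this call
}

// Group deduplicates concurrent, identical, read-only origin requests.
type Group struct {
	mu       sync.Mutex
	inflight map[string]*call
}

// Do runs fetch at most once per key at a time. The first caller (the leader)
// performs the fetch; callers arriving while it is in flight block and receive
// the same result. The bool reports whether the result was shared.
func (g *Group) Do(key string, fetch func() (string, error)) (string, bool, error) {
	g.mu.Lock()
	if g.inflight == nil {
		g.inflight = make(map[string]*call)
	}
	if c, ok := g.inflight[key]; ok {
		c.waiters++
		g.mu.Unlock()
		<-c.done // wait for the leader
		return c.body, true, c.err
	}
	c := &call{done: make(chan struct{})}
	g.inflight[key] = c
	g.mu.Unlock()

	c.body, c.err = fetch()

	g.mu.Lock()
	delete(g.inflight, key) // later requests trigger a fresh fetch
	g.mu.Unlock()
	close(c.done)
	return c.body, false, c.err
}

// Waiters reports how many followers are currently attached to key.
func (g *Group) Waiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.inflight[key]; ok {
		return c.waiters
	}
	return 0
}

// demo fires n identical requests concurrently against a slow fake origin and
// returns how many origin calls were made and how many results were shared.
func demo(n int) (originCalls, shared int64) {
	const key = "query { topProducts { name } }"
	var g Group
	var wg sync.WaitGroup
	release := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, wasShared, _ := g.Do(key, func() (string, error) {
				atomic.AddInt64(&originCalls, 1)
				<-release // the origin stays "slow" until all followers have joined
				return `{"data":{}}`, nil
			})
			if wasShared {
				atomic.AddInt64(&shared, 1)
			}
		}()
	}
	for g.Waiters(key) < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return originCalls, shared
}

func main() {
	calls, shared := demo(10)
	fmt.Printf("origin calls: %d, shared responses: %d\n", calls, shared)
	// prints "origin calls: 1, shared responses: 9"
}
```

Note that every client request still passes through `Do`, which mirrors the point above: the engine processes each request, and only the outgoing origin traffic is reduced.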

Compatibility with Apollo Federation v1 and v2 as well as Open Federation

As announced previously, we've open sourced Open Federation,
a specification for building federated GraphQL APIs.

Open Federation is compatible with Apollo Federation v1 and v2, hence Cosmo Router is compatible with Apollo Federation v1 and v2 as well as Open Federation.

This means you can use Cosmo Router as a drop-in replacement for Apollo Gateway and Apollo Router.

OpenTelemetry (OTEL) Instrumentation

Understanding your API traffic is a key concern for running APIs in production, especially when it comes to federated GraphQL APIs.

By definition, federated GraphQL APIs are distributed systems, which means that you need to understand the performance of each individual service as well as the overall performance of the system.

You want to get insights into the performance of the gateway, all subgraphs, and other systems the subgraphs depend on.

This way, you can identify bottlenecks and optimize your system.

To satisfy this need, we've added a whole stack of OpenTelemetry (OTEL) tools to the Cosmo stack. As the Router is the entry point to your API stack, we instrument it with OTEL and send the resulting telemetry to the Cosmo OTEL Collector to give you an end-to-end view of your API traffic in Cosmo Studio.

That said, because it's OTEL, you can also send the telemetry data generated by Cosmo Router to any other OTEL-compatible backend like Jaeger, Datadog, or others.

This way, you can easily integrate Cosmo Router into your existing monitoring stack.

More info on Distributed Tracing can be found in the Docs.
If you want to instrument your own subgraphs and other services, please follow this guide.

Prometheus Metrics

In addition to OTEL tracing, it's important to have metrics.

As our goal is to build on top of existing standards, we've added Prometheus metrics to Cosmo Router.

We're using the RED Method to provide you with the most important metrics.
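RED stands for Rate, Errors, and Duration. As a rough illustration of what those three signals mean for an HTTP entry point, here is a dependency-free sketch. It deliberately does not use the Prometheus client library and does not reflect the metric names Cosmo Router actually exports; `REDStats`, `Middleware`, and `simulate` are hypothetical names, and counting only 5xx responses as errors is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// REDStats accumulates the three RED signals for one handler.
type REDStats struct {
	mu       sync.Mutex
	Requests int           // Rate: request count (divide by a time window to get a rate)
	Errors   int           // Errors: responses with a 5xx status code
	Duration time.Duration // Duration: cumulative time spent handling requests
}

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Middleware wraps next and records one RED observation per request.
func (s *REDStats) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		elapsed := time.Since(start)

		s.mu.Lock()
		defer s.mu.Unlock()
		s.Requests++
		if rec.status >= 500 {
			s.Errors++
		}
		s.Duration += elapsed
	})
}

// simulate sends ok successful and failed failing requests through the
// middleware using in-memory test requests and returns the collected stats.
func simulate(ok, failed int) *REDStats {
	stats := &REDStats{}
	handler := stats.Middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("fail") == "1" {
			w.WriteHeader(http.StatusBadGateway)
			return
		}
		w.Write([]byte(`{"data":{}}`))
	}))
	for i := 0; i < ok+failed; i++ {
		target := "/graphql"
		if i >= ok {
			target += "?fail=1"
		}
		req := httptest.NewRequest(http.MethodPost, target, nil)
		handler.ServeHTTP(httptest.NewRecorder(), req)
	}
	return stats
}

func main() {
	s := simulate(8, 2)
	avg := s.Duration / time.Duration(s.Requests)
	fmt.Printf("requests=%d errors=%d avg_duration=%s\n", s.Requests, s.Errors, avg)
}
```

One caveat for GraphQL specifically: servers often return HTTP 200 even when the response body contains errors, so a status-code check like this one would undercount failures.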

What's next?

For the future, we're working hard on adding compatibility with Federation 2.x features.
If you find any bugs or have feature requests, please open an issue on GitHub.

Conclusion

Alright, that's it for today.
Now have a look at the GitHub Repository and give it a ⭐️.

If you want to learn more about Cosmo, check out the documentation.
