In this series of blogs, I will be sharing my learnings from our journey towards Apollo GraphQL as the gateway and BFF (backend-for-frontend) layer.
Pratilipi is India’s largest digital platform connecting readers and writers in 12 Indian languages.
With around 2.5M+ stories published by 250K+ authors, and with 20M monthly active users, our services serve around 1.5 billion requests a day. Because of this, scalability is an integral part of any architectural solution we build here at Pratilipi.
Before diving into the aspects of the N+1 problem, let’s review how GraphQL queries are handled on the server side using resolvers.
GraphQL Resolvers
A GraphQL server needs to know how to populate data for every field in your schema so that it can respond to requests for that data. To accomplish this, it uses resolvers.
A resolver is a function responsible for populating the data for a single field in your schema. It can populate that data in any way you define, which for us is mostly HTTP REST calls to downstream microservices.
N+1 Query Problem: The Achilles' Heel
The N+1 problem occurs when processing a query involves performing one initial request to a data source, followed by N subsequent requests to fetch all the data fields requested by the query.
In order to understand this more clearly, let’s take a look at an example.
Suppose we have a schema which establishes a relation between a Book and an Author type:
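A minimal schema along these lines might look as follows (the field names `topBooks`, `title`, and `name` are illustrative assumptions, not the actual Pratilipi schema):

```graphql
type Book {
  id: ID!
  title: String!
  author: Author!
}

type Author {
  id: ID!
  name: String!
}

type Query {
  topBooks(limit: Int): [Book!]!
}
```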
Imagine a user trying to get the top 100 books on the platform, and their corresponding authors, using the following query:
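Assuming the illustrative schema above, such a query might look like this:

```graphql
query {
  topBooks(limit: 100) {
    title
    author {
      name
    }
  }
}
```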
The resolvers on the server for this query will look similar to this:
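Here is a minimal sketch of such resolvers, assuming Node 18+ (global `fetch`), a small hypothetical `fetchJson` helper, and an `authorId` field on each book record:

```js
// Hypothetical helper: HTTP GET that parses the JSON body.
const fetchJson = (url) => fetch(url).then((res) => res.json());

const resolvers = {
  Query: {
    // One call to the book service for the whole top-100 list.
    topBooks: (_, { limit }) =>
      fetchJson(`http://internal.pratilipi.com/books?sort=top&limit=${limit}`),
  },
  Book: {
    // Runs once PER book: 100 books mean 100 separate author calls.
    author: (book) =>
      fetchJson(`http://internal.pratilipi.com/authors/${book.authorId}`),
  },
};
```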
While this query is being resolved, an initial request goes to the book service to get the top 100 book details. For the sake of simplicity, let's assume that the URL below gives us that book list:
http://internal.pratilipi.com/books?sort=top&limit=100
Now, if we observe closely, we can see from the resolvers that the author of each book in the list is resolved independently. For the sake of simplicity, let's assume that the URL below gives us a particular author's data:
http://internal.pratilipi.com/authors/{authorId}
When we execute the above query to get the top 100 books and their corresponding author names, we first make a single call to retrieve 100 book records from the book service, and then, for each book, we make another call to the author service to fetch the author details for the given author ID.
In total, 101 requests are being made to the internal services to resolve this single query. And this is what the N+1 problem is all about.
Dataloader
Dataloader was designed by Facebook with this specific use case in mind. Essentially, it waits for all your resolvers to load in their individual keys (in our example, all the author IDs). Once it has them, it hits the downstream service once, on a batch endpoint, with all the keys, and returns a promise that resolves to an array of the values. In essence, it batches our queries instead of making them one at a time.
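A sketch of what this could look like for the author field, reusing the hypothetical `fetchJson` helper from earlier and assuming a batch endpoint of the form `/authors?ids=1,2,3`:

```js
const DataLoader = require('dataloader');

const authorLoader = new DataLoader(async (authorIds) => {
  // One batched call for all the author IDs collected from the resolvers.
  const authors = await fetchJson(
    `http://internal.pratilipi.com/authors?ids=${authorIds.join(',')}`
  );
  // DataLoader requires the results in the same order as the keys.
  return authorIds.map((id) => authors.find((a) => a.id === id));
});

const resolvers = {
  Book: {
    // .load() queues the key; all keys loaded in the same tick are
    // passed to the batch function above in a single call.
    author: (book) => authorLoader.load(book.authorId),
  },
};
```

In a real server, the loader would typically be constructed inside the per-request context so its built-in cache doesn't leak data across requests; it sits at module level here only for brevity.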
Drawbacks of a batching solution
- Support: Most REST APIs don't support batching. If they are built in a CRUD way with basic endpoints, the batch API is often left out of the implementation.
- Underutilised cache: Using a batched endpoint jeopardises caching. When you fetch data in a batch request, the response you receive is for the exact combination of resources you requested. Unless you request that same combination again, future requests for the same resources won't be served from cache.
- Cache invalidation: Even if we manage to cache some meaningful batches, a cached batch has to be invalidated when even one of the resources in it is mutated. What makes this even harder is figuring out all the batches a resource was a part of, so they can be invalidated. How we solved this problem at Pratilipi with an intelligent caching layer will be explained in depth in an upcoming blog.
Caching with Dataloaders
To give a deeper understanding of the underutilisation of cache with Dataloader, let's take a use case.
In the case of Pratilipi, we have around 2M daily active users. If we assume that each of these users ends up calling the above query once, this query will be executed 2M times a day in total.
The number of requests to the downstream services will then be as follows:

- Total downstream (without Dataloader): 2M × 101 = 202 million
- Book service: 2M (one top-100 call per query)
- Author service: 200M (authors for 2M × 100 books)
- Author service (with Dataloader): 2M (one batched call for each of the 2M book sets)
Now let's take the scale of Pratilipi into the picture. We have around 250K unique authors writing for us.
If we go by the Dataloader approach, the number of requests to the author service is 2M. In comparison to 200M, this seems a really good bargain. But if we dig deeper into the numbers, we can see that the total number of combinations in which 100 authors can be picked from a set of 250K authors is ₂₅₀₀₀₀C₁₀₀ ≈ 6.5×10³⁸¹. So caching these 2M requests will not be helpful, as these 2M author combinations are just a tiny subset of the possible combinations, and the cache hit ratio (the exact same combination occurring again) is near zero.
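As a back-of-the-envelope check of that figure, using the standard approximation for binomial coefficients when the number of picks is far smaller than the pool:

```latex
\binom{250\,000}{100} \approx \frac{250\,000^{100}}{100!}
  \approx \frac{10^{539.8}}{10^{158.0}}
  \approx 10^{381.8}
  \approx 6.5 \times 10^{381}
```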
Hence, having a usable caching layer is a distant possibility while using Dataloader.
About the Author
Roshan Alexander, Engineering Manager @Pratilipi
Passionate about anything related to using technology to make the world a better place to live in.
A hardcore programmer with several years of experience architecting highly scalable and robust distributed systems for product startups.
You can reach out to me on LinkedIn.