Andrej Szalma

The mysteries of GraphQL clients' cache - The Introduction

Recently I completed an internship at Microsoft, where I had the privilege of working with people who are experts in their fields, more specifically in the field of GraphQL. I had numerous opportunities to learn from them, and now I would like to teach you something. Let me take you on a journey into the world of GraphQL clients and their caches.

This is a two-part series in which I'll talk about GraphQL client caches and compare the clients' cache performance.

Part 1: The mysteries of GraphQL clients' cache - The Introduction
Part 2: The mysteries of GraphQL clients' cache - The Showdown


"There are only two hard things in Computer Science: cache invalidation and naming things."
-- Phil Karlton

Like every blog post that even remotely mentions the subject of caches, I too have felt the need to include this famous quote. Regardless of whether we are going to speak about cache invalidation or not, it always brightens up the mood a little bit.

A quick intro to GraphQL clients

The conventional (but not the only) way to use GraphQL is as an API that communicates through HTTP POST requests. Because the response of a GraphQL API is a bit more complex than a simple REST response, libraries have been built to make your life easier when communicating with such APIs. Libraries such as Apollo Client, Relay, URQL, etc., help you automatically handle things like batching, caching, constructing queries, managing UI state, and much more.

With GraphQL becoming ever so popular, the support and development for these libraries have skyrocketed in the past few years. When starting a new project, there are plenty to choose from, but the question is: which one is the best for you?

Caching - not the geo one

As "hinted" in the name of this post, our main focus today is going to be caching. Caching in GraphQL, as anywhere else, is used to save data received from some data layer (e.g. a server) so that the next time our application needs that data, it can read it from the cache instead of making another network request to the data layer. This way, we can save precious resources and minimize client load times. The type of cache described above can also be called an In-Memory cache.
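The read-through pattern described above can be sketched in a few lines of plain JavaScript. This is an illustrative toy, not any client's real API; the `InMemoryCache` name and synchronous `fetchFn` are assumptions made for brevity:

```javascript
// A minimal read-through in-memory cache: return cached data when present,
// otherwise ask the data layer once and remember the result.
class InMemoryCache {
  constructor(fetchFn) {
    this.fetchFn = fetchFn; // stand-in for the "data layer", e.g. a network request
    this.store = new Map();
  }

  read(key) {
    if (this.store.has(key)) return this.store.get(key); // cache hit: no network
    const data = this.fetchFn(key);                      // cache miss: fetch once
    this.store.set(key, data);
    return data;
  }
}
```

Reading the same key twice only ever hits the data layer once; that saved round trip is exactly the resource win described above.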

However, implementing and maintaining a cache is not as easy as it sounds, and every client has its own way of doing it. I have researched mainly two popular clients: Apollo Client and Relay.

Before I go any further, let me explain what data normalization is and why it is important to us.

Data normalization

Data normalization is the process of restructuring data to reduce redundancy. You might have heard about this from relational databases, where you have five normal forms to get through before you can even start thinking about being happy. :) Both clients have a normalized cache (which is the de facto standard now), and they need to perform this process to convert the JSON blob they receive from the GraphQL server into a relational structure. The normalization algorithm differs between the two; however, the principles remain the same.

Apollo client

As a GraphQL client, Apollo is the more simplistic and flexible of the two. It provides an easier way of getting started, somewhat more comprehensive documentation, and perhaps better community support. However, when it comes to the cache implementation, it is, unfortunately, lagging behind.

Before we get to compare the performance of the two, let me explain how Apollo's cache works. As mentioned previously, it uses an In-Memory cache, which consists of two main parts: the EntityStore and the ResultCache.

The EntityStore is the main cache: it holds normalized data in a flat lookup table, so data read from it has to be de-normalized first.

The ResultCache was introduced to help with this de-normalization cost. When a query is first read against the EntityStore, the de-normalized result is memoized in the ResultCache, which makes all subsequent reads of the identical query very fast. (The match has to be exact, 1:1.) However, this comes with the overhead of an extra write into the ResultCache on the first read of any query that has not yet been memoized.
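The interplay between the two stores can be sketched roughly like this. The names and data shapes are illustrative only, not Apollo's actual internals, and the "de-normalization" step is reduced to a single object copy:

```javascript
// Sketch of two-level reads: the entity store holds flat, normalized records;
// the result cache memoizes the de-normalized result of each exact query,
// so repeat reads of the identical query skip the expensive de-normalization.
const entityStore = {
  "User:1": { id: 1, firstname: "John", lastname: "Doe" },
};

const resultCache = new Map();
let denormalizations = 0; // counts how often the expensive path runs

function readQuery(queryKey, rootRef) {
  if (resultCache.has(queryKey)) return resultCache.get(queryKey); // memoized hit
  denormalizations += 1;
  const result = { ...entityStore[rootRef] }; // stand-in for real de-normalization
  resultCache.set(queryKey, result);          // the extra write on the first read
  return result;
}
```

The first `readQuery` pays for both the de-normalization and the memoization write; every identical read afterwards is a plain map lookup.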

Normalization algorithm

As previously mentioned, Apollo maintains a normalized In-Memory cache by default, and now I'd like to explain in a few words how their normalization algorithm works.

Firstly, let's say we have a GraphQL query to get all the users, which looks like this:

```graphql
query getAllUsers {
  users {
    id
    firstname
    lastname
  }
}
```

When a GraphQL response reaches the client, it comes in a JSON format, looking something like this:

```json
{
  "data": {
    "users": [
      {...},
      {...},
      {...}
    ]
  }
}
```

The contents of the data object are the actual response, so from now on we will only work with what is inside data.
Now, the first step in normalizing this data is to split our users array into individual objects.

```json
{
  "__typename": "User",
  "id": 1,
  "firstname": "John",
  "lastname": "Doe"
}

{...}

{...}
```

Notice that a __typename field has been added to our response, even though we did not request it in our query. This is because Apollo Client requests this field automatically, even if you don't explicitly do so. In the next step, you will see why.

Now that we have extracted all our objects, we can perform the next step of normalization: creating a globally unique identifier for every object so that it can be saved in a key-value lookup table (a hashmap). By default, Apollo Client uses a composite key made of __typename + id. Now you see why Apollo Client requested the __typename field automatically. However, not all objects in a response have a unique id that could go into the composite key, so Apollo lets us choose which fields to use for this unique identifier key via the typePolicies setting in the InMemoryCache config.
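A simplified version of that key construction might look like the helper below. It is a hypothetical sketch, not Apollo's implementation; the `keyFields` parameter merely mirrors the idea behind the typePolicies setting:

```javascript
// Build a cache identifier for an object: "<__typename>:<id>" by default,
// or from custom key fields when an object has no usable `id`.
function cacheId(obj, keyFields = ["id"]) {
  const keyPart = keyFields.map((field) => obj[field]).join("+");
  return `${obj.__typename}:${keyPart}`;
}
```

So a `User` with id 1 keys to `"User:1"`, and a type identified by, say, an `isbn` field can opt into that field instead.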

Once we have created the unique identifier for an object, we need to look inside it and find any nested objects. Should there be any, they are extracted and assigned unique identifier keys of their own, which are then placed in their original positions as references. See the following example:

```
{
  "__typename": "User",
  "id": 1,
  "firstname": "John",
  "lastname": "Doe",
  "address": {
    "__typename": "Address",
    "id": 1,
    "line1": ...,
    ...
  }
}

// The cache lookup table would look something like this after the normalization of the above object.

{
  "User:1": {
    "id": 1,
    ...
    "address": {
      "__ref": "Address:1"
    }
  },
  "Address:1": {
    "id": 1,
    ...
  }
}
```

It is important to mention, however, that nested values which are not GraphQL types (say, an array of plain objects stored in a scalar field) will not be extracted from the object.
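Putting the steps together, a toy normalizer for the example above could look like this. It is a rough sketch, not Apollo's actual algorithm: it only handles nested objects that carry both __typename and id, and leaves everything else inline, as described above:

```javascript
// Recursively flatten a response object into a lookup table, replacing
// nested identifiable objects with { __ref: "<Type>:<id>" } references.
function normalize(obj, table = {}) {
  const id = `${obj.__typename}:${obj.id}`;
  const record = {};
  for (const [field, value] of Object.entries(obj)) {
    if (value && typeof value === "object" && value.__typename && value.id != null) {
      record[field] = { __ref: normalize(value, table) }; // extract + reference
    } else {
      record[field] = value; // scalars and non-typed values stay inline
    }
  }
  table[id] = record; // store the flattened record under its unique key
  return id;
}
```

Running it over the User-with-Address object from the example produces the same two-entry lookup table shown above, with the address replaced by a `__ref`.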

And now you have an overview of how Apollo performs normalization. Not that hard, right? :)

Next, let's jump over to our friend Relay.

Relay

As previously mentioned, Apollo is the more simplistic and versatile one, whereas Relay is the more optimized and narrowly scoped one. What I mean by this is that while Apollo Client has implementations for multiple languages, Relay was made specifically for React and is highly optimized for it.

However, when it comes to the cache, the two are not that different in their core implementations. If you forget about Apollo's ResultCache for a minute, they are indeed very similar. Since Relay is based on granular fragments instead of whole queries as Apollo is, it does not need anything like a ResultCache. This is why Apollo Client had to add another layer of complexity to its InMemoryCache to optimize for repeat reads. (Re-reads of whole queries are very frequent in Apollo, so the memoization performed by the ResultCache helps to speed them up.)

The Store is the single source of truth for an instance of the Relay runtime; it holds a collection of entities represented by the RecordSource type. These are (as before) normalized records belonging to a single query/mutation/etc. The query response goes through a normalization process, and its entities are extracted and saved into Records, which are all collected in one RecordSource. The RecordSource object is then merged into the Store, and subscribers (observers) of any affected fragment are notified.
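In very rough terms, that publish-and-notify flow could be sketched like this. The `Store` below is a toy, not Relay's real store class, and subscribers are simplified to plain id lists:

```javascript
// Toy version of the publish flow: merge a normalized record source into
// the store, then notify only the subscribers whose records changed.
class Store {
  constructor() {
    this.records = {};     // the single source of truth
    this.subscribers = []; // [{ ids, callback }]
  }

  subscribe(ids, callback) {
    this.subscribers.push({ ids, callback });
  }

  publish(recordSource) {
    const changed = new Set(Object.keys(recordSource));
    Object.assign(this.records, recordSource); // merge the new records in
    for (const { ids, callback } of this.subscribers) {
      if (ids.some((id) => changed.has(id))) callback(this.records); // affected only
    }
  }
}
```

Note that publishing records no subscriber watches triggers no callbacks at all; that granularity is what lets fragment-level observers re-render only when their own data changes.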

Normalization algorithm

Since I have provided an in-depth explanation of the normalization algorithm in Apollo, I am not going to do the same here, as the two are similar in nature. However, one thing I believe is worth mentioning is how they choose their unique identifiers for normalized records.

As you already know, Apollo Client uses the composite key of the __typename + id fields and only extracts nested objects if they have GraphQL types (not scalars). Relay, on the other hand, takes a different approach: every nested object is extracted into a Record and assigned a DataID. This is a globally unique identifier within the scope of the cache, and it can be made from the id field of the object, or, if no such field exists, it can be based on the path to the record from the nearest object with an id (such path-based ids are called client ids). Thanks to this logic, even nested scalar objects are always extracted and normalized.
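A simplified picture of that fallback is sketched below. The helper is hypothetical (it is not Relay's internal id generator), and the path format is invented for illustration:

```javascript
// Choose an identifier for a record: use its own `id` when present, otherwise
// derive a path-based "client id" from the nearest parent that has an id.
function dataId(record, parentId, fieldPath) {
  if (record.id != null) return String(record.id); // server-provided identity
  return `${parentId}:${fieldPath}`;               // e.g. "1:profile.settings"
}
```

Because even id-less nested objects always get some identifier this way, Relay can extract and normalize every nested object, which is exactly where it diverges from Apollo's behavior described above.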

Next up

Now that I have explained the basics of how GraphQL client caches work, I would like to show you a showdown between the clients, comparing their caches head-to-head. Use this link to read more in Part 2 - The mysteries of GraphQL clients' cache - The Showdown.
