Gabriel Nordeborn

Posted on Apr 16, 2020

The magic of the Node interface

#graphql #relay

This series of articles is written by Gabriel Nordeborn and Sean Grove. Gabriel is a frontend developer and partner at the Swedish IT consultancy Arizon, and has been a Relay user for a long time. Sean is a co-founder of OneGraph.com, unifying 3rd-party APIs with GraphQL.

The official GraphQL website recently added a section called “Global Object Identification” to its collection of best practices. In this article, we’ll dive into what global object identification is, why it’s useful and what type of developer experience it enables.

What is Global Object Identification?

Global Object Identification in GraphQL is about two things:

Having each individual object in the graph be identifiable by a globally unique ID, across types. So no two objects can have the same ID, even if they're of different types.
Being able to query any single object that has an id using only that id. This is done via the Node interface, which we’ll discuss in more depth below.

This enables libraries to safely automate away several things that are otherwise left up to developers to implement and manage themselves:

A GraphQL framework can automatically update the cache of any object since all objects have a unique ID.
If we ever need to fetch new or refreshed fields on an object, all we need to know is its GraphQL type and its id.

We’ll go deeper into why these two points enable a great developer experience below.
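To make the first point concrete, here is a minimal TypeScript sketch of a cache normalized by global ID. The names NormalizedCache and GraphRecord are hypothetical and used only for illustration; this is not how Relay or any other framework implements its store. The only assumption is that every record carries a globally unique id.

```typescript
// Hypothetical record shape: any object that carries a globally unique id.
type GraphRecord = { id: string; [field: string]: unknown };

class NormalizedCache {
  private records = new Map<string, GraphRecord>();

  // Merge incoming fields into whatever is already stored under the same id.
  // The type name is not needed in the key, because ids never collide across types.
  write(incoming: GraphRecord): void {
    const existing = this.records.get(incoming.id);
    this.records.set(incoming.id, { ...existing, ...incoming });
  }

  read(id: string): GraphRecord | undefined {
    return this.records.get(id);
  }
}

const nodeCache = new NormalizedCache();
// Two responses from different parts of an app mention the same object...
nodeCache.write({ id: "MDE6VXNlcjE=", login: "sgrove" });
nodeCache.write({ id: "MDE6VXNlcjE=", followerCount: 9001 });
// ...and both end up merged into a single record.
const cachedUser = nodeCache.read("MDE6VXNlcjE=");
```

Because the id alone is the key, any response containing that id can update the same record, wherever in a query the object appeared.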

What’s the `Node` interface anyway?

In schemas implementing the Node interface, you’ll see a top-level field on the Query object called node. It takes a single argument, id, and returns a single field, id.

Wat?

When you first look at it, it’s indeed a bit strange!

    query NodeInterfaceExample1 {
      gitHub {
        node(id: "MDQ6VXNlcjM1Mjk2") {
          id # As one might expect, this is indeed "MDQ6VXNlcjM1Mjk2"
        }
      }
    }

But because Node is an interface, it is implemented by every GraphQL object type in the schema that can be fetched via its id, so we can select type-specific fields with an inline fragment:

    query NodeInterfaceExample2 {
      gitHub {
        node(id: "MDQ6VXNlcjM1Mjk2") {
          id
          ... on GitHubUser {
            login # "sgrove"
            followers {
              totalCount # it's over 9000!!!!
            }
          }
        }
      }
    }

But why is it useful?

That means that if we know the id and the GraphQL Type of a thing, then we always know how to look it up. The alternative might be something like:

    query WithoutInterfaceExample {
      gitHub {
        user(login: "sgrove") {
          login
          followers {
            totalCount
          }
        }
      }
    }

There are two challenges with this query that mean extra work for us as developers, and extra chances for us to get things wrong:

Finding a GitHubUser this way doesn’t accept its id, it only accepts a login argument. That means if we ever had, say, an ownerId value, we still need to know the *login* value. This gets trickier and trickier because every kind of object might be looked up by different keys.
Even if we had the login value, we have to build a potentially complicated query with nested selections (gitHub.user in this case) to get the extra fields we want.

Ultimately, both those bits of knowledge are implicit - we know them as developers, we have that knowledge in our head - and that means the computer and our tooling doesn’t know it.

But if our tooling does know how to look up any object given its id, then it can help us in wonderful ways.

This will also relieve some pressure on the backend. Whenever you need to refetch a node that’s deeply nested in a query, the backend will need to resolve all levels above the node in order to get to it. If it’s able to just take the id and resolve a single node directly though, that added pressure goes away.

Why globally unique ids? Aren’t integers good enough?

So with the node interface we can look up any object by its id - great! But that comes with an implication: No two objects in our entire API can have the same id!

Quick aside: This seems daunting at first, but there's an easy way to do this for any schema, which we’ll see in just a moment.

That means that every GraphQL object that has an id field (say, User.id or Post.id) must have a unique ID that no other GraphQL Object has. Take this query for example:

    query FirstUserAndFirstPost {
      user {
        id
      }
      post {
        id
      }
    }

It’s common for lots of databases to use incrementing integers as ids, so you might expect a response like:

    {
      "data": {
        "user": {
          "id": 1
        },
        "post": {
          "id": 1
        }
      }
    }

But combined with the node interface, this just won’t work. Say we have a post id and we want to get some additional fields (like its title). We should be able to run this query:

    query NodeInterfaceExample3 {
      node(id: 1) {
        id
        ... on Post {
          title
        }
      }
    }

But there are two objects that could be looked up with the id of 1! It’s now unclear if the object that will come back will be a User or a Post.

Don’t lose heart! A simple solution is at hand!

So if we want to internally keep using integers for our ID, how can we have globally unique IDs? Let’s look at the example ID used in the GitHub (whose API is beautiful and Relay compatible as well!) query: node(id: "MDQ6VXNlcjM1Mjk2")

    > atob("MDQ6VXNlcjM1Mjk2")
    "04:User35296"

Hey, I see some integers in there!

Basically, the technique is to construct a nodeId, and it looks like:

    base64Encode(`${apiVersion}:${GraphQLObjectTypeName}${ObjectId}`)
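A minimal TypeScript sketch of this scheme might look like the following. The helper names toGlobalId and fromGlobalId, the API_VERSION constant, and the parsing regex are assumptions made for illustration. The sketch also assumes integer database ids and type names that do not end in a digit, since the format has no separator between the type name and the id.

```typescript
// Hypothetical helpers for the `${apiVersion}:${TypeName}${ObjectId}` format.
const API_VERSION = "01";

interface DecodedGlobalId {
  apiVersion: string;
  typeName: string;
  databaseId: number;
}

// Build an opaque, globally unique id from a type name and an integer id.
function toGlobalId(typeName: string, databaseId: number): string {
  return btoa(`${API_VERSION}:${typeName}${databaseId}`);
}

// Reverse the encoding. The trailing run of digits is treated as the
// database id, and everything between the colon and those digits as the type name.
function fromGlobalId(globalId: string): DecodedGlobalId {
  const match = /^(\d+):([A-Za-z_]\w*?)(\d+)$/.exec(atob(globalId));
  if (match === null) {
    throw new Error(`Malformed global id: ${globalId}`);
  }
  return {
    apiVersion: match[1],
    typeName: match[2],
    databaseId: Number(match[3]),
  };
}

// The GitHub id from earlier decodes to version "04", type "User", id 35296.
const decodedGitHubUser = fromGlobalId("MDQ6VXNlcjM1Mjk2");
```

Adding a separator between the type name and the id in your own format would remove the ambiguity around type names that end in digits.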

So for our post and our user on version 1 of our API, we would expect to see:

    "MDE6UG9zdDE=" # decodes to "01:Post1"
    "MDE6VXNlcjE=" # decodes to "01:User1"

If we revisit our previous node example that had a conflict and use our new ID:

    query NodeInterfaceExample4 {
      node(id: "MDE6UG9zdDE=") {
        id
        ... on Post {
          title
        }
      }
    }

Now our single node resolver can base64-decode the id, find that it’s looking for a Post object with id of 1, and return the correct object.
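As a rough sketch of such a resolver in TypeScript, assuming in-memory maps in place of real database tables: the names resolveNode, loaders, userTable and postTable are hypothetical, and a real server would wire this into its GraphQL library's resolver for the node field.

```typescript
// Hypothetical node shapes; a real schema would define these as GraphQL types.
interface UserNode { __typename: "User"; id: string; login: string }
interface PostNode { __typename: "Post"; id: string; title: string }
type AnyNode = UserNode | PostNode;

// In-memory stand-ins for database tables keyed by integer primary key.
const userTable = new Map<number, { login: string }>([[1, { login: "sgrove" }]]);
const postTable = new Map<number, { title: string }>([[1, { title: "Hello, Node!" }]]);

// One loader per type name. A Map avoids accidental hits on Object.prototype keys.
type Loader = (databaseId: number, globalId: string) => AnyNode | undefined;
const loaders = new Map<string, Loader>([
  ["User", (databaseId, id) => {
    const row = userTable.get(databaseId);
    return row ? { __typename: "User", id, login: row.login } : undefined;
  }],
  ["Post", (databaseId, id) => {
    const row = postTable.get(databaseId);
    return row ? { __typename: "Post", id, title: row.title } : undefined;
  }],
]);

// Decode the global id, pick the loader for its type, and fetch the row.
function resolveNode(globalId: string): AnyNode | null {
  let decoded: string;
  try {
    decoded = atob(globalId);
  } catch {
    return null; // not valid base64
  }
  const match = /^\d+:([A-Za-z_]\w*?)(\d+)$/.exec(decoded);
  if (match === null) return null;
  const loader = loaders.get(match[1]);
  return loader?.(Number(match[2]), globalId) ?? null;
}
```

Returning null for unknown types, missing rows and malformed ids is a design choice in this sketch; it mirrors the fact that a node field is typically nullable.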

The reason we prefix the API version in the global id is to give us a chance to later change how we encode global ids in a non-breaking way, in case we need it. It’s a bit of flexibility we can buy for effectively nothing.

If things aren't quite clicking for you yet (especially if you don't get why an id would be anything but the database id of that object), it might help to think of the id field on a GraphQL type this way: when you’re looking at the id of a GraphQL type, you’re looking at a node in a graph, not a row in a database table. In this light, having the id of a GraphQL type be globally unique makes much more sense, since its primary objective is to be an identifier in your graph of objects, not in your database.

What does this enable?

As demonstrated above, most schemas will already have some way of resolving a single item of most things, even if it’s not via a top-level node field. So why does this whole thing matter? Is it just for making the resolution slightly more convenient?

Well, that helps too of course. But the real power comes from standardization.

If you think about it, even if your schema has a top-level field called postById(id: ID!) that does indeed resolve a post by an ID, there’s really no way for anyone but the schema creator to know that this is in fact what the field does. Without parsing the semantic meaning of postById and inferring that it means “a single post, just by its ID”, external tooling or frameworks cannot safely assume that this is what that particular field does. You may have any number of fields resolving a single Post, like postByDatabaseId(id: ID!): Post, mostLikedPost(id: ID!): Post, latestPost(id: ID!): Post and so on. This makes it impossible for external tooling to safely assume which field it can use to refetch a single Post.

The Node interface, however, is an official GraphQL best practice, and this means that frameworks and tooling can build on top of it and anyone who implements it will benefit. And there are some pretty cool things you can build with it. Let’s look at some concrete examples from Relay:

Autogeneration of pagination queries

The latest iteration of Relay (now called only Relay, but commonly referred to as Relay Modern) can automatically generate queries for refetching or paginating data, since it can refetch any GraphQL object via the Node interface. This enables a very ergonomic developer experience. Basically, Relay can take something like this:

    fragment UserFriendsList_user on User {
      id
      friends(first: $first, after: $after) {
        edges {
          node {
            firstName
            lastName
          }
        }
      }
    }

And automatically generate a pagination query for that, like this:

    query UserFriendsListPaginationQuery($id: ID!, $first: Int!, $after: String) {
      node(id: $id) {
        ... on User {
          id
          friends(first: $first, after: $after) {
            edges {
              node {
                firstName
                lastName
              }
            }
          }
        }
      }
    }

…just because it knows that User, the type the fragment UserFriendsList_user is on and where the pagination is defined, implements the node interface, and can therefore be re-fetched via its id.

It doesn’t matter where the User whose friends we’re paginating is located in the query; the Node interface lets Relay refetch any User as long as it has its id.
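To illustrate why this kind of generation is mechanical once a type is known to implement Node, here is a small TypeScript sketch that wraps a fragment's selections in a node query. It is not how the Relay compiler is implemented; FragmentInfo, buildNodeQuery and the nodeTypes set are hypothetical names, and the fragment body is passed in as plain text to keep the example short.

```typescript
// Hypothetical description of a fragment that tooling has already parsed.
interface FragmentInfo {
  typeName: string;   // type the fragment is defined on
  selections: string; // body of the fragment, as GraphQL text
  variables: ReadonlyArray<{ name: string; type: string }>; // variables the body uses
}

// Wrap the fragment's selections in a top-level node(id:) query.
// This is only safe when the fragment's type implements Node.
function buildNodeQuery(
  queryName: string,
  fragment: FragmentInfo,
  nodeTypes: ReadonlySet<string>,
): string {
  if (!nodeTypes.has(fragment.typeName)) {
    throw new Error(`${fragment.typeName} does not implement Node, so it cannot be refetched by id`);
  }
  const variableDefs = [{ name: "id", type: "ID!" }, ...fragment.variables]
    .map((v) => `$${v.name}: ${v.type}`)
    .join(", ");
  return [
    `query ${queryName}(${variableDefs}) {`,
    `  node(id: $id) {`,
    `    ... on ${fragment.typeName} {`,
    `      ${fragment.selections}`,
    `    }`,
    `  }`,
    `}`,
  ].join("\n");
}

const friendsQuery = buildNodeQuery(
  "UserFriendsListPaginationQuery",
  {
    typeName: "User",
    selections: "id friends(first: $first, after: $after) { edges { node { firstName lastName } } }",
    variables: [{ name: "first", type: "Int!" }, { name: "after", type: "String" }],
  },
  new Set(["User", "Post"]),
);
```

The generator needs no knowledge of where the User sits in the original query, which is the property the Node interface provides.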

We have a whole article on pagination in Relay you’re encouraged to check out here.

Autogeneration of queries for your typical "Show more" functionality

In the same vein, Relay makes it really easy to build a classic "Show more" functionality. Relay can help you take something like this:

    fragment ProfilePage_user on User {
      id
      name
      avatarUrl
      bio @include(if: $showMore)
    }

...and generate a query that'll let you fetch that fragment again, but now including bio by setting $showMore to true.

Your experience of building the "show more" functionality would basically be as simple as running refetch({showMore: true}) when the user presses the "Show more"-button. Relay takes care of the rest, and it can do that because it knows how to refetch that User via its id using the Node interface, regardless of where that User is found.

Wrapping up

Hopefully you’ve now got some insight into why globally unique IDs and the Node interface are a good idea. Relay makes great use of them, and since this is an official GraphQL best practice, any other tooling can build on top of them too.

Oldest comments (5)

Daniel Lo Nigro • Apr 17 '20

Note that your unique IDs like MDQ6VXNlcjM1Mjk2 don't have to be Base64 encoded; simply using user_35296 would work too. The reason they're commonly Base64 encoded is to remind users that the ID is an opaque identifier, meaning the client shouldn't manually mess with it (eg. construct an ID using 'user_' + id or anything like that) and instead just treat it as some arbitrary identifier.

Something you might find interesting is that even with the requirement of having unique IDs, Facebook still uses 64-bit integer IDs for objects in GraphQL. The reason this is possible is because the IDs are still globally unique! A fun trick is that you can go to facebook.com/{id} and it'll redirect to the correct place - This works for profiles (eg. facebook.com/731901032), Pages (eg. facebook.com/108824017345866), videos (eg. facebook.com/221489128955927), photos (eg. facebook.com/124830722411862), and pretty much everything else. The key is that each object type has a range of IDs for that object, and each database master has a range of IDs, so for a given ID we can easily tell which object type it is and which database it's located on.

You can accomplish something similar with MySQL by setting the auto_increment_increment property per session. This controls the amount IDs are auto incremented by. For example, setting it to 100 will mean the first row gets an ID of 1, the second row gets 101, the third row gets 201, etc. The intended use-case for this is master-master replication to ensure IDs between two servers don't overlap, however if you only have a single database then you can also use it to get unique per-object IDs: For example, user table can start at 1 and go 101, 201, 301, etc. and the post table can start at 2 and go 102, 202, 302, etc. Then you'd check if id % 100 == 1 then it's a user ID, if id % 100 == 2 then it's a post ID, etc.

Gabriel Nordeborn • Apr 17 '20

Oh wow, that's really interesting! And really clever with the IDs 😄 also pretty... unexpected and inspiring that that works out at the massive FB scale. Makes you wonder how many similar things are over engineered in much smaller code bases.

Thanks for sharing that!

Sebastian Fratini • Oct 2 '20

This is very interesting. I had a few questions.

How do you implement the logic to base64 encode the ID? Is it manual per resolver? Or can you "paste" that snippet somewhere so it is automatically translated to the correct GraphQL Node type for example?
What would the node resolver implementation look like? Again, my goal is to understand the best practices as we are implementing our own first GraphQL. Is it a factory pattern and you analyze each one of the types that would be returned? I am trying to make the code as reusable as possible. Thanks!

Gabriel Nordeborn • Oct 2 '20

Hi Sebastian, and thank you!

In pseudo code, something like:

const makeGloballyUniqueId = (
  typename: string,
  identifier: string
): string => btoa(`${typename}:${identifier}`); // btoa: built-in base64 encoder in browsers and Node.js

It's really not more complicated than that for generating the actual ID. You could then abstract that further and do a small function that can generate a resolver for you, that you can re-use throughout your schema.

I'd say, in the simplest possible form, it'd look like this (warning for more pseudo-code):

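// `id` is the global id passed to the node field; the decoded value gets its own name
const [typename, databaseId] = decodeGloballyUniqueId(id);

switch (typename) {
  case "User":
    return User.get(databaseId);
  case "BlogPost":
    return BlogPost.get(databaseId);
....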

So, it's just a matter of decoding the id and extracting the information you're after, and then use that to resolve the relevant object you need.

Does that make sense?

Sebastian Fratini • Oct 3 '20

Thank you! Yes, this is something similar to what I was implementing. Since I am using sequelize, I was trying to avoid code duplication and maybe thought if it was possible to have a hook or something in the graphql schema so I dont have to manually translate each ID on each resolver.

The code makes perfect sense and I'll check the best way to implement it.

Thanks again!