Bogdan Nedelcu

Posted on

# The Graph in GraphQL

GraphQL is today a ubiquitous choice for building APIs. The technology, open-sourced by Facebook, allows clients to fetch what only they need and aggregates requests under a unique query interface. With GraphQL, we can build faster applications, consume fewer data, and leverage fantastic developer tooling. I have been fascinated by GraphQL ever since its release. However, a question kept coming back to me: How does it leverage the power of graphs? In the following lines, we will start by taking a look at graphs, trees, and recursive properties. With that knowledge, let's dive deep into the original specifications and the javascript implementation of the server runtime. We will break apart the inner workings of GraphQL into its most simple and smaller parts and then put them back together. In the process, we will uncover how the data-structures are used to create the technology that changed the web as we know it.

## What is a Graph?

A long time before GraphQL there where graphs, but what are they exactly? Graphs are data structures that resemble the natural way we build mental models and relate concepts. In graphs, relationships between represented entities are as relevant as the entities themselves.\
We build graphs with abstract objects called `node`s or vertices. The connection between two nodes is called an `edge`. We can then explore a `graph` recursively following the `edges` in a specific order.

### A-Cyclical Directed Graphs

There are different types of graphs, depending on how the nodes and the edges are arranged. We will focus for now on a-cyclical directed graphs because these are the ones we find in GraphQL. A directed edge has a start and an end and can only be traversed following that direction. Adding direction to the `edges` changes the meaning of the relationship between nodes and introduces a hierarchy. Let's say, for example, we want to represent money loans with a graph. Every edge would represent money borrowed, and the direction would represent the money flow from the lender to the party taking the loan.

## From a Graph to a Tree

Graphs can transform into a different data structure depending on the constraints applied to them. A graph cycle or a circuit is a set of edges where the last edge is also the first edge. When a graph has no cycles is called an a-cyclical graph. A directional graph that is also a-cyclical is known as a `tree`.

The tree structure has several advantages because of its recursive nature. The elementary unit of a `tree` is a `root` node and one or many `children` nodes. If we model our data as a `graph` and impose the necessary constraints on it, we can leverage `tree` properties to process it. While one can transverse a `tree` as a whole is usually easier to work at a local level, node by node. Read and write operations can be extended to the full length of a `tree` by executing functions on the `root` node and then recursively on the subsequent `children`.

## Modelling with Graph(QL)

As we all know in `GraphQL`, we represent our business domain by using a `schema`. The schema itself is a `graph` composed of `type`s representing different entities. Types are extracted from a problem space using domain-driven techniques. They can have different fields, and every `field` points again to another type. In the picture above you can see that `lastname`, `firstname` and `email` point to the `scalar` type `String`. `Scalar` types do not have any subfields and they represent the leaves of the `query` tree. A path through the schema will always resolve in a collection of scalars structured like a `tree`. Most GraphQL implementations allow developers to add their own `scalars` with custom validation and serialization functions. The relationships between a `type` and its fields are unidirectional edges and are the building block of the schema. That makes the GraphQL Schema an `acyclic directed graph`. As we mentioned before this kind of graph can be read like a tree, visiting each tree once, in a process called tree traversal. A GraphQL `query` is a path in the graph, going from the root type to its subtypes until we reach scalar types with no subfields. As a result, a `query` is a projection of a certain subset of the GraphQL schema to a tree. On the backend side, every field of a type maps to a `resolver` function that returns its value when queried. The `query` result is created by merging the result of running `resolver` functions for every field extracted from the schema. GraphQL, however, does not stop here. `Tree` properties and recursive functions are used not only to model data but mainly to validate and execute queries on that schema.

## Schema parsing

The GraphQl server parses the schema document at execution time. Types are extracted and stored as plain Javascript `Objects` with references to their fields, and to the resolver functions in a dictionary called `typeMap`. When a field must be resolved the execution algorithm will look for it in the dictionary and use both the `resolver` function and the references to its subtypes to build its value.

``````// Simplified structure of the type map
let typeMap = {
rootType: {
fields: { // array with the fields of the root ype
user: {
type: {
fields: {
lastname: {...},
settings: {...},
}
},
resolve: () => ({})  // points to a resolve function for the type
},
settings: {
type: {
fields: {
membership: {...},
}
},
resolve: () => ({})  // points to a resolve function for the type
}
}
},
};
``````

As every `type` contains a reference to its `resolver` function, one can resolve the whole schema by repeating three steps:

1. Retrieve a `type` from the `typeMap` dictionary
2. Run its `resolver` function
3. Repeat the same on the `field`s of this `type`

To sum up: the GraphQL schema document is parsed on the server. During the parsing process, the types extracted and stored together with references to its `resolver` functions in a dictionary called `typeMap`. Because of its tree-like structure, the dictionary can be read and wrote using recursive functions following different transversals.

## Query Parsing

The GraphQL server parses every query from a `string` to an Abstract Syntax Tree(AST). An AST is a tree representation of the syntax of source code from a particular language. Every node in the tree represents a statement in the `query`, including its type, arguments, and position.

The `AST` is a common abstraction for compilers and is used to validate syntax correctness in a process called semantic analysis. Again, because of its tree-like structure, the `AST` can be processed and interpreted by recursive functions. This process is behind the `query` validation feature that GraphQL editors usually offer.

## Query Execution

Once a `query` operation has been converted to an `AST` and its structure validated, we can use the `tree` properties to execute the `query`. The core of the execution algorithm is a recursive function that runs on every node of the query tree following a depth-first-search order.

The traversal ensures that fields are executed and resolved in a stable and consistent order. Following the first order traversal the field execution function will be called on each field in the following sequence:
The `executeField` function contains the magic behind the field value resolution and is well described in the GraphQL specifications. The function arguments are the `name` of the `type` is being run on, the definition of that type from the `typeMap` dictionary and the `resolver` function. First, the algorithm executes the `resolver` function and stores the return. Next, it completes the field value depending on its `type`. If the field type is a `scalar`, its value is simply "coerced" using a serialization function and directly returned. If the field type is an `Object` the `completeValue` process is started. The function `collectFields` assembles all the subfields on the respective object type that have not been resolved by the `resolver` function and returns a `fieldGroup`, an `array` ordered respecting the depth‐first‐search style. Then `executeField` runs recursively on each one of the subfields collected, in parallel. Finally, the algorithm merges and coerces the values returned by the first execution of the `resolver` function and the `completeValue` return and builds the final result according to the order in the query `AST` tree.

The resolution algorithm described above is a simplification of the GraphQL specifications. Proper `error` handling and response building make the actual implementation more tricky. Parsing queries into trees simplifies the resolution algorithm by leveraging recursiveness and ensures the consistency of field execution for queries on schemas of any shape and size.

## Summing up

Graphs are the core reason why GraphQL is such a great choice to build and consume APIs. On the one hand, graphs allow developers to model the data in a natural way using directional relationships and hierarchies. The GraphQL Schema is a direct representation of a problem space based on natural language.

On the other hand, GraphQL leverages the recursive properties of AST trees to validate and execute queries. The depth first-order transversal of query trees enables stable and predictable parallel data fetching. The recursive nature of queries enabled fast development of tools like GraphiQL and the Apollo Client that leverage it for client-side query validation, caching and cache invalidation.

## Final thoughts

To build exceptional software we need a fundamental understanding of the tools we use. It is usually simple pieces put together in harmony that make up sophisticated technology. The core abstraction in GraphQL is the graph. A linear algebra concept used to represent information in a non-linear and hierarchical way, or simply put: how we think about it every other day.

Even more fascinating is the fact that at the core of any technology we find the incredible ways humans solve problems naturally.

Originally published on bogdanned.com.

Manoj Dinesh

This is an awesome article. Thoroughly enjoyed reading it. Recommended to everyone starting the GraphQL journey

Bogdan Nedelcu

Thank you a lot Manoj! I am really happy you enjoyed it!

Manoj Dinesh

youtu.be/5tgz-vNMy1Y

Hyeseong Kim

Awesome. Most introduction articles focus on "QL". This is a great article explaining an important part of GraphQL!

Daniel Norman

Bogdan Nedelcu

I am really happy to hear that Daniel! Cheers, Bogdan

Melih Yıldız • Edited

I found your article while trying to find a way to represent DAGs using graphql. Sadly, you are wrong and fooled the Google's algorithm.
A directed acyclic graph, or a DAG, is not a tree. Tree is a special type of graph where every node only has one "parent", as in only 1 ingoing edge, and multiple children, as in multiple outgoing edges. Whereas in a DAG you can find 2 edges going into 1 node. Since the edges are directed this still does not create a cycle. Please check the attached image. While it's still a DAG, it's not a tree. Even when you reverse the edges.

Jean Gérard Bousiquot

Awesome!