Why GraphQL? 🤔
GraphQL beautifully solves three crucial problems:
- Fetching data for a view used to require multiple round trips to the server. With GraphQL, you can retrieve all the initial data needed for a view in just one request. REST APIs, on the other hand, often involve complex parameters and conditions that are difficult to manage and scale.
- Clients used to be tightly coupled to servers. But with GraphQL, clients communicate using a request language that eliminates the need for servers to hardcode data shapes or sizes. This decoupling empowers us to maintain and enhance clients independently from servers.
- Front-end developers often faced a frustrating experience. Thankfully, GraphQL offers a declarative language for expressing data requirements in user interfaces. Developers can focus on what they need, without worrying about how to make it available. There's a strong connection between UI data requirements and the way developers describe that data in GraphQL. 💪
This article will delve into the details of how GraphQL tackles these challenges. But before we dive in, let's start with some simple definitions.
What is GraphQL? 🤔
Imagine GraphQL as a language. If we teach this language to a software application, that application can effectively communicate its data requirements to a backend service that also understands GraphQL.
Learning GraphQL is easier for new applications compared to mature ones, much like how children pick up languages more quickly than adults. Starting from scratch with GraphQL is a breeze, but introducing it to an existing application requires some effort.
To enable a data service to understand GraphQL, we need to implement a runtime layer and expose it to clients who want to interact with the service. This runtime layer acts as a translator, representing the data service and interpreting the GraphQL language. It's important to note that GraphQL itself isn't a storage engine; it requires a translating runtime layer.
This runtime layer, which can be implemented in any language, defines a graph-based schema that exposes the capabilities of the data service it represents. Clients speaking GraphQL can query this schema within its defined capabilities. This approach decouples clients from servers, enabling independent evolution and scalability for both.
A GraphQL request is most commonly a query (read operation) or a mutation (write operation); the specification also defines subscriptions for real-time updates. In every case, the request is a simple string that a GraphQL service can interpret, execute, and respond to with data in a specified format. JSON is commonly used for responses in mobile and web applications.
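To make the "request is a simple string" idea concrete, here is a minimal sketch of a read request and a write request. The `{ query, variables }` JSON shape is a common convention when GraphQL is served over HTTP; the field names `person` and `updatePersonName` are hypothetical and only serve the illustration.

```javascript
// Hypothetical operations: `person` and `updatePersonName` are assumed field
// names for illustration, not part of any real schema.
const readRequest = {
  query: `
    query PersonName($id: ID!) {
      person(personId: $id) {
        name
      }
    }
  `,
  variables: { id: "1" },
};

const writeRequest = {
  query: `
    mutation RenamePerson($id: ID!, $name: String!) {
      updatePersonName(personId: $id, name: $name) {
        name
      }
    }
  `,
  variables: { id: "1", name: "Anakin Skywalker" },
};

// Both operations are serialized the same way: the operation text is just a string.
const readBody = JSON.stringify(readRequest);
const writeBody = JSON.stringify(writeRequest);

console.log(readBody);
console.log(writeBody);
```

Either body could then be sent to the service in a single HTTP POST; only the operation text differs between reading and writing.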
What is GraphQL? (The Explain-it-like-I’m-5 version) 🧒📖
GraphQL is like a bridge for communication. Imagine you have a client and a server, and they need to talk to each other. The client has to tell the server what data it needs, and the server has to provide the requested data to the client. GraphQL comes in the middle to make this communication smoother.
"But why can't the client talk directly to the server?" you may ask. Well, it actually can.
However, there are reasons to consider adding a GraphQL layer between clients and servers. One of the main reasons, and the most popular one, is efficiency. Typically, the client needs to ask the server about multiple resources, while the server knows how to respond with just one resource. This results in the client making multiple round trips to the server to gather all the required data.
With GraphQL, we can shift this complexity of multiple requests to the server-side, handling it through the GraphQL layer. The client asks a single question to the GraphQL layer and receives a response that precisely meets its needs.
Using a GraphQL layer brings numerous benefits. For instance, it simplifies and standardizes communication with multiple services. When multiple clients are requesting data from various services, a GraphQL layer in the middle can streamline and unify this communication. While it's also possible to achieve this with REST APIs, GraphQL provides a structured and standardized approach.
Instead of each client directly contacting different data services, they can communicate with the GraphQL layer. The GraphQL layer then handles the communication with the individual data services. This way, GraphQL not only isolates clients from the need to communicate in multiple languages but also translates a single request into multiple requests to various services using different languages.
Imagine you have three people who speak different languages and possess different knowledge. Now, imagine you have a question that requires combining the knowledge of all three individuals. If you have a translator who speaks all three languages, answering your question becomes easy. This is precisely what a GraphQL runtime does.
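The translator analogy can be sketched in a few lines. The three services below are hypothetical stand-ins with deliberately different call styles, and `answerPersonQuestion` is an assumed name for the layer in the middle; a real GraphQL runtime would do this generically rather than per question.

```javascript
// Three hypothetical backends, each with its own "language" (call style).
const peopleService = {
  getPerson: (id) => ({ id, name: "Darth Vader", planetId: 1 }),
};
const planetService = {
  lookup: ({ planetId }) => ({ planetId, title: "Tatooine" }),
};
const filmService = {
  findByCharacter: (personId) => ["A New Hope", "Return of the Jedi"],
};

// The "translator": one question in, several backend calls out, one combined answer back.
function answerPersonQuestion(personId) {
  const person = peopleService.getPerson(personId);
  const planet = planetService.lookup({ planetId: person.planetId });
  const films = filmService.findByCharacter(personId);
  return {
    name: person.name,
    planet: { name: planet.title },
    films: films.map((title) => ({ title })),
  };
}

const answer = answerPersonQuestion("1");
console.log(JSON.stringify(answer, null, 2));
```

The client only ever sees the combined answer; it never learns that three differently shaped services were involved.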
Computers aren't yet intelligent enough to answer any question on their own, so they rely on algorithms. This is why we need to define a schema on the GraphQL runtime, which clients utilize.
The schema is like a document that lists all the questions a client can ask the GraphQL layer. There is flexibility in how the schema is used because we're dealing with a graph of interconnected nodes. The schema primarily represents the boundaries of what the GraphQL layer can answer.
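As a rough illustration of the schema acting as a boundary, the sketch below models a schema as a plain object and checks whether a client's question stays inside it. Both the object layout and the `validate` helper are assumptions made for this example; real GraphQL runtimes define schemas with a dedicated type system and validate parsed queries.

```javascript
// Hypothetical schema: type name -> field name -> field type.
const schema = {
  Query: { person: "Person" },
  Person: { name: "String", birthYear: "String", planet: "Planet", films: "[Film]" },
  Planet: { name: "String" },
  Film: { title: "String" },
};

// A selection is a nested object: `true` marks a leaf field,
// a nested object marks a field with its own sub-selection.
function validate(typeName, selection, path = typeName) {
  const fields = schema[typeName] || {};
  const errors = [];
  for (const [field, sub] of Object.entries(selection)) {
    const fieldType = fields[field];
    if (!fieldType) {
      errors.push(`Unknown field ${path}.${field}`);
    } else if (sub !== true) {
      const nestedType = fieldType.replace(/[\[\]]/g, ""); // "[Film]" -> "Film"
      errors.push(...validate(nestedType, sub, `${path}.${field}`));
    }
  }
  return errors;
}

const insideSchema = validate("Query", { person: { name: true, films: { title: true } } });
const outsideSchema = validate("Query", { person: { name: true, midichlorians: true } });

console.log(insideSchema);
console.log(outsideSchema);
```

The first question is answerable, so no errors come back; the second asks for a field the schema does not list, so it is rejected before any data is fetched.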
Still not clear? Let's call GraphQL what it truly is: a replacement for REST APIs. So, let me address the question you're likely wondering about now.
What's wrong with REST APIs? 🤔
The biggest problem with REST APIs is the nature of multiple endpoints. 🔄 REST APIs are usually a collection of endpoints, where each endpoint represents a resource. So when a client needs data from multiple resources, it has to perform multiple round trips to the API to put together the data it needs.
In a REST API, there is also no client request language, or more accurately, the language available to clients is very limited. Clients have little control over what data the server will return.
For example, the READ REST API endpoints are either:
GET /ResourceName - to get a list of all the records from that resource, or
GET /ResourceName/ResourceID - to get the single record identified by that ID.
A client can't, for example, specify which fields to select for a record in that resource. That information lives in the REST API service itself, and the service typically returns all of the fields regardless of which ones the client actually needs. GraphQL's term for this problem is over-fetching: retrieving information that's not needed. It's a waste of network and memory resources for both the client and the server.
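A small sketch can put a number on over-fetching. The record below is a hypothetical full response from a resource endpoint, and the view is assumed to need only two of its fields.

```javascript
// Hypothetical full record, as a resource endpoint might return it.
const fullRecord = {
  name: "Darth Vader",
  birthYear: "41.9BBY",
  height: "202",
  mass: "136",
  hairColor: "none",
  skinColor: "white",
  eyeColor: "yellow",
  gender: "male",
};

// The fields the view actually uses (an assumption for this example).
const neededFields = ["name", "birthYear"];
const needed = Object.fromEntries(neededFields.map((field) => [field, fullRecord[field]]));

// Compare the serialized sizes of what was sent and what was needed.
const fullSize = JSON.stringify(fullRecord).length;
const neededSize = JSON.stringify(needed).length;

console.log(`sent ${fullSize} characters, needed ${neededSize}`);
```

The difference between the two sizes is transferred, parsed, and held in memory for nothing, and it is paid again on every request.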
One other big problem with REST APIs is versioning. If you need to support multiple versions, that usually means new endpoints. This leads to more problems while using and maintaining those endpoints and it might be the cause of code duplication on the server.
The REST API problems mentioned above are the ones specific to what GraphQL is trying to solve. They are certainly not all of the problems of REST APIs, and I don't want to get into what a REST API is and is not. I am mostly talking about the popular resource-based HTTP endpoint APIs. Every one of those APIs eventually turns into a mix of regular REST endpoints and custom ad-hoc endpoints crafted for performance reasons. This is where GraphQL offers a much better alternative.
How does GraphQL do its magic? 🎩
There are a lot of concepts and design decisions behind GraphQL, but probably the most important ones are:
- A GraphQL schema is a strongly typed schema. To create a GraphQL schema, we define fields that have types. Those types can be primitive or custom, and everything else in the schema requires a type. This rich type system allows for rich features like having an introspective API and being able to build powerful tools for both clients and servers.
- GraphQL speaks to the data as a Graph, and data is naturally a graph. If you need to represent any data, the right structure is a graph. The GraphQL runtime allows us to represent our data with a graph API that matches the natural graph shape of that data.
- GraphQL has a declarative nature for expressing data requirements. GraphQL provides clients with a declarative language for them to express their data needs. This declarative nature creates a mental model around using the GraphQL language that's close to the way we think about data requirements in English, and it makes working with a GraphQL API a lot easier than the alternatives.
The last concept is why I personally believe GraphQL is a game changer.
Those are all high-level concepts. Let's get into some more details.
To solve the multiple round-trip problem, GraphQL makes the responding server just a single endpoint. Basically, GraphQL takes the custom endpoint idea to an extreme and just makes the whole server a single custom endpoint that can reply to all data questions.
The other big concept that goes with this single endpoint concept is the rich client request language that is needed to work with that custom single endpoint. Without a client request language, a single endpoint is useless. It needs a language to process a custom request and respond with data for that custom request.
Having a client request language means that the clients will be in control. They can ask for exactly what they need and the server will reply with exactly what they're asking for. This solves the over-fetching problem.
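The single-endpoint idea can be sketched with a toy executor. Here the client's request is modeled as a nested object instead of real GraphQL query text, and `db`, `select`, and `endpoint` are hypothetical names; the point is only that one entry point plus a request language lets the client receive exactly the fields it names.

```javascript
// Hypothetical in-memory data standing in for a backend.
const db = {
  person: {
    name: "Darth Vader",
    birthYear: "41.9BBY",
    height: "202",
    planet: { name: "Tatooine", climate: "arid" },
  },
};

// Return only the fields named in the selection, recursing into nested objects and lists.
function select(value, selection) {
  if (Array.isArray(value)) return value.map((item) => select(item, selection));
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    result[field] = sub === true ? value[field] : select(value[field], sub);
  }
  return result;
}

// The single "endpoint": every question goes through the same function.
function endpoint(selection) {
  return { data: select(db, selection) };
}

const response = endpoint({ person: { name: true, planet: { name: true } } });
console.log(JSON.stringify(response));
// prints {"data":{"person":{"name":"Darth Vader","planet":{"name":"Tatooine"}}}}
```

Fields such as `height` and `climate` exist on the server but never leave it, because the client did not ask for them.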
When it comes to versioning, GraphQL has an interesting take on that. Versioning can be avoided altogether. Basically, we can just add new fields without removing the old ones, because we have a graph and we can flexibly grow the graph by adding more nodes. So we can leave paths on the graph for old APIs and introduce new ones without labeling them as new versions. The API just grows.
This is especially important for mobile clients because we can't control the version of the API they're using. Once installed, a mobile app might continue to use that same old version of the API for years. On the web, it's easy to control the version of the API because we just push new code. For mobile apps, that's a lot harder to do.
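Here is a minimal sketch of growing an API without versioning it. The record and the `pick` helper are hypothetical: the assumption is that `name` was the original field, and `firstName` and `lastName` were added later without removing it.

```javascript
// Hypothetical record: `name` is the original field; `firstName` and
// `lastName` were added later, and `name` was never removed.
const person = {
  name: "Anakin Skywalker",
  firstName: "Anakin",
  lastName: "Skywalker",
};

// Each client names the fields it wants, so old and new clients coexist.
const pick = (record, fields) =>
  Object.fromEntries(fields.map((field) => [field, record[field]]));

const oldClientView = pick(person, ["name"]);
const newClientView = pick(person, ["firstName", "lastName"]);

console.log(oldClientView);
console.log(newClientView);
```

An old mobile build keeps asking for `name` and keeps working, while newer clients use the added fields from the same API.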
Not totally convinced yet? How about we do a one-to-one comparison between GraphQL and REST with an actual example?
RESTful APIs vs GraphQL APIs — Example 🌐⚔️
Let's imagine that we are the developers responsible for building a shiny new user interface to represent the Star Wars films and characters.
The first UI we've been tasked to build is simple: a view to show information about a single Star Wars person, for example Darth Vader, and all the films this person appeared in. This view should display the person's name, birth year, planet name, and the titles of all the films in which they appeared.
As simple as that sounds, we're actually dealing with 3 different resources here: Person, Planet, and Film. The relationship between these resources is simple and anyone can guess the shape of the data here. A person object belongs to one planet object and it will have one or more film objects.
The JSON data for this UI could be something like:
{
  "data": {
    "person": {
      "name": "Darth Vader",
      "birthYear": "41.9BBY",
      "planet": {
        "name": "Tatooine"
      },
      "films": [
        { "title": "A New Hope" },
        { "title": "The Empire Strikes Back" },
        { "title": "Return of the Jedi" },
        { "title": "Revenge of the Sith" }
      ]
    }
  }
}
Assuming a data service gave us this exact structure for the data, here's one possible way to represent its view with React.js:
// The Container Component:
<PersonProfile person={data.person} ></PersonProfile>

// The PersonProfile Component:
Name: {person.name}
Birth Year: {person.birthYear}
Planet: {person.planet.name}
Films: {person.films.map(film => film.title)}
This is a simple example, and while our experience with Star Wars might have helped us here a bit, the relationship between the UI and the data is very clear. The UI used all the "keys" from the JSON data object we imagined.
Let's now see how we can ask for this data using a RESTful API.
Suppose we request a single person's information from a RESTful API. Here's how we might structure the request:
GET /people/{personId}
This endpoint would return the following JSON response:
{
  "name": "Darth Vader",
  "birthYear": "41.9BBY",
  "planetId": 1,
  "filmIds": [1, 2, 3, 6],
  *** other information we do not need ***
}
To display the person's information, we would need to make additional requests to retrieve the related data. For example, to get the planet name, we would make another request using the planetId from the first response:
GET /planets/1
The response would be:
{
  "name": "Tatooine",
  *** other information we do not need ***
}
Then, to get the film titles, we would need one request per film:
GET /films/1
GET /films/2
GET /films/3
GET /films/6
Each of these requests would return the film information:
{
  "title": "A New Hope",
  *** other information we do not need ***
}
To display the data in the UI, we would need to handle these asynchronous requests and wait for all the responses before rendering the complete view. This can result in multiple round trips to the server and potential over-fetching of data. 🔄🌐
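The round trips described above can be counted with a small simulation. The `routes` table and the `get` function are hypothetical stand-ins for a real REST API and an HTTP client; the person ID `1` and the response shapes are assumptions that mirror the example.

```javascript
// Hypothetical in-memory REST API. `get` stands in for an HTTP GET and counts calls.
const routes = {
  "/people/1": { name: "Darth Vader", birthYear: "41.9BBY", planetId: 1, filmIds: [1, 2, 3, 6] },
  "/planets/1": { name: "Tatooine" },
  "/films/1": { title: "A New Hope" },
  "/films/2": { title: "The Empire Strikes Back" },
  "/films/3": { title: "Return of the Jedi" },
  "/films/6": { title: "Revenge of the Sith" },
};

let requestCount = 0;
async function get(path) {
  requestCount += 1;
  return routes[path];
}

async function loadPersonView(personId) {
  // Wave 1: the person record tells us which planet and films to ask for.
  const person = await get(`/people/${personId}`);
  // Wave 2: the planet and film requests can at least run in parallel.
  const [planet, films] = await Promise.all([
    get(`/planets/${person.planetId}`),
    Promise.all(person.filmIds.map((id) => get(`/films/${id}`))),
  ]);
  return {
    name: person.name,
    birthYear: person.birthYear,
    planet: { name: planet.name },
    films: films.map((film) => ({ title: film.title })),
  };
}

const viewPromise = loadPersonView("1");
viewPromise.then((view) => {
  console.log(view.films.length, "films loaded with", requestCount, "requests");
});
```

Even with the second wave parallelized, this one small view costs six requests in two sequential waves, and the client code has to know how the resources relate to each other.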
In contrast, with GraphQL, we can make a single request to retrieve all the required data in one go, without the need for multiple requests or over-fetching. The GraphQL query for fetching the same data would look like this:
query {
  person(personId: "1") {
    name
    birthYear
    planet {
      name
    }
    films {
      title
    }
  }
}
The GraphQL server would respond with exactly the requested data:
{
  "data": {
    "person": {
      "name": "Darth Vader",
      "birthYear": "41.9BBY",
      "planet": {
        "name": "Tatooine"
      },
      "films": [
        { "title": "A New Hope" },
        { "title": "The Empire Strikes Back" },
        { "title": "Return of the Jedi" },
        { "title": "Revenge of the Sith" }
      ]
    }
  }
}
By using GraphQL, we simplify the data retrieval process, reduce the number of network requests, and eliminate over-fetching, resulting in more efficient and flexible data fetching for our UI. 🚀💪🌟
💡💰 The Price of GraphQL's Flexibility 💰💡
There are always trade-offs, even in the world of GraphQL. While GraphQL brings tremendous flexibility, it also introduces certain challenges and considerations.
One significant concern is the potential for resource exhaustion attacks, also known as Denial of Service (DoS) attacks. GraphQL servers can be targeted with complex queries that consume excessive server resources. It's relatively easy to query deep nested relationships (e.g., user -> friends -> friends...) or use field aliases to request the same field multiple times. Resource exhaustion attacks are not unique to GraphQL, but when working with GraphQL, we must be particularly cautious about them. 🚫💥
Fortunately, there are mitigations we can employ. We can perform cost analysis on queries in advance and enforce limits on the amount of data that can be consumed. Implementing timeouts to terminate long-running requests is another effective measure. Additionally, since GraphQL functions as a resolving layer, we can handle rate limit enforcement at a lower level beneath GraphQL. These measures help protect the server from resource-heavy queries.
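One of the simplest forms of cost analysis is a depth limit. The sketch below assumes the query has already been parsed into a nested selection object (real servers would walk the parsed query instead), and `MAX_DEPTH` is an arbitrary limit chosen for the example.

```javascript
// Depth of a selection: a leaf field (`true`) counts as depth 1.
function depthOf(selection) {
  let max = 0;
  for (const sub of Object.values(selection)) {
    const depth = sub === true ? 1 : 1 + depthOf(sub);
    if (depth > max) max = depth;
  }
  return max;
}

const MAX_DEPTH = 4; // hypothetical limit

// Reject a query before executing it if it nests too deeply.
function assertWithinDepth(selection) {
  const depth = depthOf(selection);
  if (depth > MAX_DEPTH) {
    throw new Error(`Query depth ${depth} exceeds limit ${MAX_DEPTH}`);
  }
  return depth;
}

const shallow = { user: { name: true, friends: { name: true } } };
const deep = { user: { friends: { friends: { friends: { friends: { name: true } } } } } };

console.log(assertWithinDepth(shallow)); // prints 3
try {
  assertWithinDepth(deep);
} catch (error) {
  console.log(error.message); // prints "Query depth 6 exceeds limit 4"
}
```

The check runs before any resolver does, so an abusive query costs the server a cheap traversal instead of a cascade of database calls.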
If the GraphQL API endpoint we aim to safeguard is not public and is intended solely for internal consumption by our own clients (web or mobile), we can employ an allowlist approach, often called persisted queries. This involves pre-approving queries that the server can execute. Clients can then request the execution of pre-approved queries using a unique query identifier. Facebook has adopted this approach to enhance security.
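A minimal sketch of the pre-approved query idea follows. The identifier `personProfile.v1`, the query text, and the `resolveRequest` helper are all hypothetical; the executor that would actually run the query is not shown.

```javascript
// Queries approved ahead of time, keyed by a unique identifier (hypothetical values).
const approvedQueries = new Map([
  ["personProfile.v1", "query ($id: ID!) { person(personId: $id) { name birthYear } }"],
]);

// Clients send only an identifier and variables, never arbitrary query text.
function resolveRequest({ queryId, variables }) {
  const query = approvedQueries.get(queryId);
  if (!query) {
    return { errors: [{ message: `Unknown query id: ${queryId}` }] };
  }
  // In a real server, `query` and `variables` would now go to the GraphQL executor.
  return { query, variables };
}

const accepted = resolveRequest({ queryId: "personProfile.v1", variables: { id: "1" } });
const refused = resolveRequest({ queryId: "somethingElse", variables: {} });

console.log(accepted.query);
console.log(refused.errors[0].message);
```

Because the server only ever executes text it has seen in advance, an attacker cannot hand it a deliberately expensive query.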
Authentication and authorization are other critical considerations when working with GraphQL. The question arises: do we handle them before, after, or during the GraphQL resolve process?
To answer this question, let's view GraphQL as a domain-specific language (DSL) layered atop our backend data-fetching logic. Authentication and authorization become additional layers. GraphQL does not directly facilitate the implementation of authentication or authorization logic; that's not its purpose. However, if we choose to place these layers behind GraphQL, we can utilize GraphQL to transmit access tokens between clients and the enforcement logic. This approach closely resembles how authentication and authorization are handled with RESTful APIs.
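As a sketch of passing access tokens through the GraphQL layer, the example below builds a per-request context from a header and lets a resolver consult it. The `sessions` table, the `buildContext` helper, and the `secretPlans` field are hypothetical; the `(parent, args, context)` resolver shape follows a common convention in GraphQL server libraries, and the real enforcement logic would live in the layer behind it.

```javascript
// Hypothetical token table standing in for a real authentication service.
const sessions = {
  "token-123": { userId: "1", roles: ["admin"] },
};

// Built once per request: translate the access token into a viewer (or null).
function buildContext(headers) {
  const token = (headers.authorization || "").replace(/^Bearer /, "");
  return { viewer: sessions[token] || null };
}

// Resolvers receive the shared context and defer to the authorization rule.
const resolvers = {
  secretPlans: (_parent, _args, context) => {
    if (!context.viewer || !context.viewer.roles.includes("admin")) {
      throw new Error("Not authorized");
    }
    return "Death Star schematics";
  },
};

const adminContext = buildContext({ authorization: "Bearer token-123" });
const anonymousContext = buildContext({});

console.log(resolvers.secretPlans(null, {}, adminContext));
try {
  resolvers.secretPlans(null, {}, anonymousContext);
} catch (error) {
  console.log(error.message);
}
```

GraphQL itself only carries the token along; whether the viewer may see a field is decided by ordinary application logic.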
GraphQL poses some challenges when it comes to client data caching. Unlike RESTful APIs, which are naturally cache-friendly due to their resource-oriented nature, caching GraphQL responses requires a bit more effort. While we can use the query text itself as a cache key, this approach has limitations and may lead to inefficiencies and data consistency issues. Overlapping results from multiple GraphQL queries can be problematic with this basic caching approach.
Fortunately, there is an elegant solution to this problem: a Graph Query deserves a Graph Cache. By normalizing the response of a GraphQL query into a flat collection of records and assigning each record a globally unique ID, we can cache individual records instead of entire responses. However, this is not a straightforward process. Records may reference other records, resulting in a cyclic graph that requires careful handling and traversal when populating and reading from the cache. Implementing a caching logic layer becomes necessary. This approach proves to be more efficient overall compared to response-based caching. Frameworks like Relay.js adopt this caching strategy and provide automatic management of the cache.
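The normalization step can be sketched as follows. This assumes every cacheable object in a response carries a globally unique `id` field, and the `__ref` marker and `normalize` helper are hypothetical names; it illustrates the idea only and is not how Relay implements its store.

```javascript
// Flatten a response into `records`, keyed by each object's globally unique id.
// Objects with an id are replaced in their parent by a reference to the record.
function normalize(value, records) {
  if (Array.isArray(value)) return value.map((item) => normalize(item, records));
  if (value === null || typeof value !== "object") return value;
  const flat = {};
  for (const [key, child] of Object.entries(value)) {
    flat[key] = normalize(child, records);
  }
  if (flat.id === undefined) return flat; // no id: keep the object inline
  records[flat.id] = { ...records[flat.id], ...flat }; // merge with what is already cached
  return { __ref: flat.id };
}

const cache = {};

// Two overlapping responses: both mention the same film.
normalize(
  { person: { id: "person:1", name: "Darth Vader", films: [{ id: "film:1", title: "A New Hope" }] } },
  cache
);
normalize({ film: { id: "film:1", title: "A New Hope", releaseYear: 1977 } }, cache);

console.log(JSON.stringify(cache, null, 2));
```

The film is stored once, the person record points to it by reference, and the second response enriches the same record instead of creating a conflicting copy.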
Perhaps one of the most critical issues to address with GraphQL is the notorious N+1 SQL query problem. GraphQL fields are resolved by stand-alone functions, and resolving these fields with data from a database might result in a new database request per resolved field. While it's relatively straightforward to analyze, detect, and solve N+1 issues in a simple RESTful API endpoint by enhancing SQL queries, dynamically resolving fields in GraphQL poses a greater challenge. Luckily, Facebook released a utility called DataLoader to tackle this issue.
As the name suggests, DataLoader is a utility that allows us to retrieve data from databases and make it available to GraphQL resolver functions. Instead of directly querying the database with individual SQL queries, we can use DataLoader as an intermediary agent to reduce the number of actual database requests.
DataLoader employs a combination of batching and caching techniques to achieve this optimization. If a client request requires querying the database for multiple items, DataLoader can consolidate these queries and efficiently load their responses in batches. Furthermore, DataLoader caches the fetched data, making it readily available for subsequent loads of the same resources, typically within the lifetime of a single client request.
By leveraging DataLoader, we can mitigate the performance impact of N+1 SQL queries in GraphQL and improve the overall efficiency of our data retrieval process. It is a valuable tool for optimizing data fetching in GraphQL applications.
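To show the batching and caching idea in isolation, here is a simplified stand-in written from scratch. It is not the real DataLoader API: `createLoader` and `fetchFilmsByIds` are hypothetical names, and the batch function is assumed to return results in the same order as the keys it receives.

```javascript
// A simplified stand-in for batching + caching; NOT the real DataLoader API.
function createLoader(batchFn) {
  const cache = new Map(); // key -> Promise of the loaded value
  let queue = [];          // loads requested since the last flush

  // Send every queued key to the batch function in one call.
  function flush() {
    const batch = queue;
    queue = [];
    batchFn(batch.map((item) => item.key))
      .then((values) => batch.forEach((item, index) => item.resolve(values[index])))
      .catch((error) => batch.forEach((item) => item.reject(error)));
  }

  return function load(key) {
    if (cache.has(key)) return cache.get(key); // repeated key: reuse the same promise
    const promise = new Promise((resolve, reject) => {
      if (queue.length === 0) queueMicrotask(flush); // first key schedules one flush
      queue.push({ key, resolve, reject });
    });
    cache.set(key, promise);
    return promise;
  };
}

// Hypothetical batch function standing in for `SELECT ... WHERE id IN (...)`.
let dbCalls = 0;
async function fetchFilmsByIds(ids) {
  dbCalls += 1;
  return ids.map((id) => ({ id, title: `Film ${id}` }));
}

const loadFilm = createLoader(fetchFilmsByIds);

// Four loads from four "resolvers", one of them a repeat.
const filmsPromise = Promise.all([1, 2, 3, 1].map((id) => loadFilm(id)));
filmsPromise.then((films) => {
  console.log(films.length, "films,", dbCalls, "database call(s)");
});
```

Each resolver still asks for a single item, which keeps resolvers simple, while the loader quietly turns those individual asks into one batched database call.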
Thanks for reading. 📖😊