Nick Boers for Jobber

Posted on Oct 7, 2022

The GraphQL N+1 Problem and SQL Window Functions

#graphql #ruby #rails #webdev

A post by Clinton Pahl and Nick Boers, PhD

Introduction
AssociationLoader and its Drawbacks
Background for the SQL Window Loader
Aggregate Functions
Window Functions
Memory-Efficient N+1 Resolution
Introducing the WindowLoader
Conclusion

Introduction

At Jobber, we're constantly evolving a modern GraphQL API to support our Web-based interface, mobile app interfaces, and third-party integrations. GraphQL allows these clients to specify the field structure and data they need in response to API queries. For example, consider the following GraphQL query:

query JobVisits {
  jobs {
    nodes {
      visits {
        nodes {
          title
        }
      }
    }
  }
}

In this GraphQL query, the client has requested the title for each job visit. The response will contain an array of jobs, and for each job, an array of visits with titles. The structure of the JSON response is similar to the query:

{
  "data": {
    "jobs": {
      "nodes": [
        {
          "visits": {
            "nodes": [
              {
                "title": "Initial Assessment of Property"
              },
              ...
            ]
          }
        },
        ...
      ]
    }
  }
}

In our Rails application, we use the popular graphql Ruby gem to resolve GraphQL queries. When used naively, it essentially resolves queries as a depth-first tree traversal, which leads to the N+1 problem in GraphQL.

GraphQL’s N+1 problem, which might be better thought of as the 1+N problem, refers to the number of fetches from a backend data store necessary to resolve a relationship. In the previous example, a single fetch can obtain all of the jobs for an account. After obtaining all of the jobs, a naive resolver fetches the visits for the first job, the visits for the second job, and so on. After one fetch to get all the jobs, N additional fetches get the visits. Given the depth of relationships possible in a GraphQL query, these fetches from the backend data store can quickly balloon and lead to poor performance.

The poor performance will specifically be seen in API response times. Let’s assume the above query returned 100 jobs and that fetching the visits for each job from the database takes 2 ms. In this example, the additional 100 fetches will add 200 ms to the response time for just one field.

`AssociationLoader` and its Drawbacks

For the relationships subject to the N+1 problem, the graphql-batch Ruby gem and its AssociationLoader provide some relief. This gem was developed by Shopify. Using Ruby promises (provided by the promise.rb Ruby gem), it fundamentally alters the order of field resolution for GraphQL queries, and in a sense, converts the query from a depth-first traversal to a breadth-first traversal.

As a breadth-first traversal, a call to resolve visits for a single job doesn’t actually fetch data for the job’s visits. Instead, it returns a promise. The resolver returns a promise for each call to resolve visits. Once they're all batched, the data can be fetched from the backend in a single operation.

Under the hood, the AssociationLoader leverages ::ActiveRecord::Associations::Preloader. Resolving a field with promises involves collecting all of the records (e.g., jobs) where an association (e.g., visits) needs to be resolved. The Active Record Preloader then goes ahead and fetches all of the data in a single data fetch operation. After loading the data, the individual promises are fulfilled using the Active Record data loaded into memory.

Using the AssociationLoader and GraphQL Ruby field extensions, developers can easily configure a field to preload the required associations when it's resolved:

field :visits, Types::VisitType.connection_type, null: false, preload: :visits

Unfortunately, this approach has a weakness as soon as a GraphQL query includes pagination arguments. Consider the following slightly more complicated GraphQL query, which obtains the first three visits for each job.

query FirstThreeJobVisits {
  jobs {
    nodes {
      visits(first: 3) {
        nodes {
          title
        }
      }
    }
  }
}

In this example, the Active Record Preloader can still be used to satisfy the visits association for each job. Suppose jobs contains all of the jobs, resolving visits in a single fetch might involve a call like

::ActiveRecord::Associations::Preloader.new.preload(jobs, :visits)

Unfortunately, this code will fetch all of the visits for each job, even though the GraphQL response will only include the first three visits for each job. In this approach, the Rails application will need to perform the pagination, and in the process, it fetches unnecessary data from the database, which consume unnecessary memory in the Rails application.

Background for the SQL Window Loader

Our initial solution to preloading was based on the AssociationLoader, and it served us well to a point. It addressed the GraphQL N+1 problem, and it significantly reduced our API response times when compared to not addressing the N+1 problem. Unfortunately, as we increased the use of connection types (with their first, last, before, and after arguments), retrieving all associated records from the database only to paginate them in the Rails application was inefficient. For objects with many associated records, it consumes more memory than necessary.

After recognizing the problem, we brainstormed options to offload some of the work onto the database server to ultimately reduce the Rails application’s memory consumption. One particularly promising avenue involved SQL window functions. After deciding to pursue SQL window functions, we started our work by considering the WindowKeyLoader example described in the graphql-batch repository.

Given that many readers may be unfamiliar with SQL window functions, the following subsections provide some background.

Aggregate Functions

An understanding of SQL aggregate functions will help in understanding SQL window functions. Consider a traditional SQL aggregate function such as count(). In a statement involving such an aggregate function, the GROUP BY clause groups records, each distinct group becomes a row in the result. The database management system (DBMS) applies the aggregate function to each group’s records to produce the function’s output.

For example, consider the following query:

SELECT
  account_id,
  count(user_id) AS user_count
FROM
  users
GROUP BY
  account_id

This query

selects records from the users table,
groups those records by account_id and essentially flattens them so each row in the result has a distinct account_id, and
computes the field user_count for each row in the result by applying the function count() to the records in each group.

That (simplified) explanation of the GROUP BY clause provides some background for understanding window functions.

Window Functions

When using a window function, records are conceptually grouped only for the context of the function. Those groups are not flattened in the result. The OVER clause immediately following the function name will apply the function to a window, and the OVER clause itself defines the groups of that window.

For example, consider the following query:

SELECT
  account_id,
  user_id,
  login_count,
  rank() OVER partition_by_account_id AS user_rank
FROM
  users 
WINDOW partition_by_account_id AS (
  PARTITION BY account_id
  ORDER BY
    login_count DESC
)

In this example, the DBMS applies rank() to subsets of the records, where PARTITION BY account_id defines the subsets. For each unique account ID, the result includes a field user_rank with values from 1 to n where n is the number of users for the unique account ID. The users are ranked by the number of times they have logged into the system.

Memory-Efficient N+1 Resolution

Recall the following GraphQL query from earlier, which obtains the first three visits for each job.

query FirstThreeJobVisits {
  jobs {
    nodes {
      visits(first: 3) {
        nodes {
          title
        }
      }
    }
  }
}

If jobs typically have very few visits, the AssociationLoader might be a reasonable solution. If jobs have many visits, it would load many visits from the database that would ultimately be discarded because of the first: 3 filter.

Using SQL window functions, it’s possible to apply the first: 3 filter at the query level and load only the necessary records into the Rails application. For example, the following query would only load the first 3 visits for each job.

SELECT
  numbered_visits.*
FROM (
  SELECT
    *,
    row_number() OVER partition_by_job_id AS row_number
  FROM
    visits
  WHERE
    job_id IN (...)
  WINDOW partition_by_job_id AS (
    PARTITION BY job_id
    ORDER BY id ASC
  )
) AS numbered_visits
WHERE
  row_number <= 3

For direct associations (i.e., those where a foreign key links two tables), deriving this SQL is a rather mechanical process. It’s mechanical enough that we created a new WindowLoader to make use of SQL window functions for resolving these associations.

Introducing the `WindowLoader`

When data access patterns suggest an SQL window function will improve the performance resolving a field, simply adding our new window_load argument to the GraphQL field method will cause the resolver to use SQL window functions when resolving the field. The new window_load argument provides the name of the association, e.g.,

module GraphqlSchema
  module Main
    module Types
      class JobType < GraphqlSchema::Common::Types::BaseObject
        ...   
        field :visits, Types::VisitType.connection_type, window_load: :visits
        ...
      end
    end
  end
end

In our BaseField, derived from the graphql Ruby gem’s Schema::Field, the initializer accepts this window_load argument. When the argument specifies an association, the BaseField constructor adds a custom connection extension to the field (WindowConnectionExtension).

Our WindowConnectionExtension class inherits from the gem’s Schema::Field::ConnectionExtension and Schema::FieldExtension classes. This connection extension has two hooks that wrap field resolution: resolve and after_resolve. The former hook is called to resolve the field, and in this instance, it uses our WindowLoader class to obtain a Ruby promise for the resolution of the field. The latter hook is called after field resolution and after the resolution of promises, and in this instance, it uses our WindowConnection class.

The former class, GraphqlSchema::Common::Loaders::WindowLoader, which inherits from GraphQL::Batch::Loader, first records the foreign keys that will need to be used in the SQL query. In response to .load calls, it returns promises. To finally resolve the promises, it generates and runs a single SQL query that uses the previously-described window functions. Most developers using this window loader are totally unaware these steps occur behind the scenes.

The latter class, GraphqlSchema::Common::Pagination::WindowConnection, which inherits from GraphQL::Pagination::ArrayConnection, produces a result with the expected fields for pagination, e.g., cursors and total counts.

Conclusion

Naively using the graphql Ruby gem to resolve GraphQL queries for a Ruby on Rails API leads to an implementation that suffers from GraphQL’s N+1 problem. Iterating on that solution with a gem such as graphql-batch with its AssociationLoader can dramatically improve the situation by solving the N+1 problem and significantly reducing API response times. When a GraphQL query accepts arguments for pagination, a solution like the AssociationLoader can lead to loading more data than necessary from the database, and as a result, higher than necessary memory consumption in the Rails server. With SQL window functions, it’s possible to offload the pagination onto the database server so that the Rails application does not receive more records than necessary. Given the flexibility of the graphql and graphql-batch gems, it’s possible to create an easy to use interface for loading data using SQL window functions.

About Jobber

Our awesome Jobber technology teams span across Payments, Infrastructure, AI/ML, Business Workflows & Communications. We work on cutting edge & modern tech stacks using React, React Native, Ruby on Rails, & GraphQL.

If you want to be a part of a collaborative work culture, help small home service businesses scale and create a positive impact on our communities, then visit our careers site to learn more!

How to reduce TTFB

In the past few years in the web dev world, we’ve seen a significant push towards rendering our websites on the server. Doing so is better for SEO and performs better on low-powered devices, but one thing we had to sacrifice is TTFB.

In this article, we’ll see how we can identify what makes our TTFB high so we can fix it.