DEV Community

Carc

GraphQL Java Data Loader

Intro

GraphQL is a powerful and flexible query language for APIs that enables clients to request exactly the data they need, eliminating over-fetching and under-fetching of information. However, as GraphQL queries become more complex and involve multiple data sources, it can be challenging to efficiently retrieve and serve data to clients. This is where GraphQL data loaders come into play.

GraphQL data loaders are a critical component in optimizing GraphQL APIs. They are designed to tackle the notorious N+1 query problem, which occurs when a GraphQL server fetches a list of N items with one request and then issues a separate request for the related data of each item. Data loaders streamline fetching from sources such as databases, APIs, or local caches by batching and caching requests, which can significantly improve the efficiency and performance of GraphQL queries.

In this tutorial we take a deep dive into the batching feature and explore how it works by looking at the Java implementation of the data loader.

Batching

Batching is the process of collecting multiple individual data retrieval requests into a single batch request, thus reducing the number of calls made to data sources. This is especially crucial when dealing with relationships in GraphQL queries.

Consider a typical scenario where a GraphQL query requests a list of items and, for each item, additional related data such as user information. Without batching, this results in a separate database query or API request for each item, which is the N+1 query problem. With batching, these individual requests are combined into a single request, drastically reducing the number of round trips to the data source.
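To make the difference concrete, here is a minimal sketch that counts round trips with and without batching. The in-memory USER_NAMES map and the fetchUser / fetchUsers methods are hypothetical stand-ins for a real backend, where each method call would be one network request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BatchingSketch {
    // Hypothetical in-memory "backend"; every fetch method call stands for one round trip.
    static final Map<Long, String> USER_NAMES =
            Map.of(1L, "John", 2L, "Jane", 3L, "Bob", 4L, "Alice");
    static int roundTrips = 0;

    // One round trip per user id.
    static String fetchUser(long id) {
        roundTrips++;
        return USER_NAMES.get(id);
    }

    // One round trip for the whole list of ids.
    static List<String> fetchUsers(List<Long> ids) {
        roundTrips++;
        List<String> names = new ArrayList<>();
        for (Long id : ids) {
            names.add(USER_NAMES.get(id));
        }
        return names;
    }

    // Loads the users one by one and reports how many round trips were needed.
    static int countUnbatched(List<Long> ids) {
        roundTrips = 0;
        for (Long id : ids) {
            fetchUser(id);
        }
        return roundTrips;
    }

    // Loads the users with a single batched call and reports the round trips.
    static int countBatched(List<Long> ids) {
        roundTrips = 0;
        fetchUsers(ids);
        return roundTrips;
    }

    public static void main(String[] args) {
        List<Long> friendIds = List.of(2L, 3L, 4L);
        System.out.println("unbatched round trips = " + countUnbatched(friendIds)); // 3
        System.out.println("batched round trips = " + countBatched(friendIds));     // 1
    }
}
```

The unbatched count grows with the size of the list, while the batched count stays at one regardless of how many ids are requested.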

Java Data Loader Batching

Let’s say we have a GraphQL query like the one below:

{
  user {
    name
    friends {
      name
    }
  }
}

It generates the following query result

{
  "user": {
    "name": "John",
    "friends": [
      {
        "name": "Jane"
      },
      {
        "name": "Bob"
      },
      {
        "name": "Alice"
      }
    ]
  }
}

A naive implementation would perform a call to retrieve a user object for every user in the query response, i.e. 4 calls, one for the root object and one for each friend in the list.

However, the DataLoader does not perform the remote calls immediately. It just enqueues each call and returns a promise (a CompletableFuture) to deliver a user object. Once we have enqueued all the calls that build our query result, we must ask the DataLoader to start executing them. This is where the magic happens: the DataLoader extracts the user id from each queued call, puts the ids into a list, and uses that list to query the configured backend, retrieving all the users with a single request.

Batching usually happens level by level. In this case there are two levels: the root user and the user's friends. With DataLoader batching, this response requires just 2 calls.
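The level-by-level idea can be sketched without any library. The example below walks a hypothetical friend graph (the FRIENDS map, invented for illustration) and collects the ids that would go into each batched call, one call per level. It only illustrates the grouping, not how the real DataLoader is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LevelBatchingSketch {
    // Hypothetical friend graph: user id -> friend ids. Users without an entry have no friends.
    static final Map<Long, List<Long>> FRIENDS = Map.of(1L, List.of(2L, 3L, 4L));

    // Returns the id list of each batched call needed to load the root user
    // and then up to `depth - 1` further levels of friends.
    static List<List<Long>> batchesFor(long rootId, int depth) {
        List<List<Long>> batches = new ArrayList<>();
        Set<Long> currentLevel = new LinkedHashSet<>(List.of(rootId));
        for (int level = 0; level < depth && !currentLevel.isEmpty(); level++) {
            // All ids of the current level go into a single call.
            batches.add(new ArrayList<>(currentLevel));
            // The friends of everyone on this level form the next level.
            Set<Long> nextLevel = new LinkedHashSet<>();
            for (Long id : currentLevel) {
                nextLevel.addAll(FRIENDS.getOrDefault(id, List.of()));
            }
            currentLevel = nextLevel;
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(batchesFor(1L, 2)); // [[1], [2, 3, 4]]
    }
}
```

For the query above this yields two batches: one for the root user and one for all three friends.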

The code

Let’s look at some code that shows how it is used.

The first thing we need is a BatchLoader. It loads users from the user backend in batches, thus reducing the number of API calls to that backend.

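// `users` is assumed to be an in-memory List<User> standing in for the user backend.
List<User> loadUsersById(List<Long> userIds) {
   System.out.println("Api call to load users = " + userIds);
   // Return one user per requested id, in the same order as userIds
   // (null when no user matches), as a BatchLoader is expected to do.
   return userIds.stream()
         .map(id -> users.stream().filter(u -> id.equals(u.id())).findFirst().orElse(null))
         .toList();
}

BatchLoader<Long, User> userBatchLoader = new BatchLoader<>() {
   @Override
   public CompletionStage<List<User>> load(List<Long> userIds) {
      // Run the backend call asynchronously and hand the result list back to the DataLoader.
      return CompletableFuture.supplyAsync(() -> loadUsersById(userIds));
   }
};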
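One detail worth keeping in mind: java-dataloader expects the list returned by a BatchLoader to have the same size and order as the list of keys it received. A backend may return rows in any order, or skip missing ids, so it can help to realign the results before returning them. The following is a minimal sketch of that realignment. The User record and the alignToKeys helper are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyOrderSketch {
    // Hypothetical user type; only id and name matter here.
    record User(long id, String name) { }

    // Re-orders backend results so that position i holds the user for keys.get(i).
    // Keys with no matching user get a null placeholder, which keeps the sizes equal.
    static List<User> alignToKeys(List<Long> keys, List<User> unordered) {
        Map<Long, User> byId = new HashMap<>();
        for (User user : unordered) {
            byId.put(user.id(), user);
        }
        List<User> aligned = new ArrayList<>();
        for (Long key : keys) {
            aligned.add(byId.get(key));
        }
        return aligned;
    }

    public static void main(String[] args) {
        // The backend answered in a different order and did not find user 3.
        List<User> fromBackend = List.of(new User(2L, "Jane"), new User(4L, "Alice"));
        System.out.println(alignToKeys(List.of(4L, 3L, 2L), fromBackend));
        // [User[id=4, name=Alice], null, User[id=2, name=Jane]]
    }
}
```

Indexing the results by id first keeps the realignment linear in the number of keys, instead of scanning the result list once per key.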

Then we need to create a DataLoader which will use the previous BatchLoader to perform the loading of the whole user tree.

var userLoader = DataLoaderFactory.newDataLoader(userBatchLoader);

var userDTO = new UserDTO();
userLoader.load(1L).thenAccept(user -> {
   userDTO.id = user.id();
   userDTO.name = user.name();
   user.friends().forEach(friendId -> {
      userLoader.load(friendId).thenAccept(friend -> {
         userDTO.friends.add(new FriendDTO(friend.id(), friend.name()));
      });
   });
});

userLoader.dispatchAndJoin();
System.out.println(userDTO);

It will produce the following debug output

Api call to load users = [1]
Api call to load users = [2, 3, 4]
UserDTO{id=1, name='John', friends=[FriendDTO[id=2, name=Jane], FriendDTO[id=3, name=Bob], FriendDTO[id=4, name=Alice]]}

If you are curious about how this works internally, here is a custom implementation of the user DataLoader. It is not the real one, just a simplified version to help you get the whole picture.

static class UserLoader {
   BatchLoader<Long, User> userBatchLoader;

   record QueueEntry(long id, CompletableFuture<User> value) { }
   List<QueueEntry> loaderQueue = new ArrayList<>();

   UserLoader(BatchLoader<Long, User> userBatchLoader) {
      this.userBatchLoader = userBatchLoader;
   }

   CompletableFuture<User> load(long userId) {
      var future = new CompletableFuture<User>();
      loaderQueue.add(new QueueEntry(userId, future));
      return future;
   }

   List<User> dispatchAndJoin() {
      // Dispatch the lookups queued so far and wait for the batch to finish.
      List<User> results = new ArrayList<>(dispatch().join());
      // Callbacks run during a dispatch may have queued the next level, so repeat until the queue is empty.
      while (!loaderQueue.isEmpty()) {
         results.addAll(dispatch().join());
      }
      return results;
   }

   CompletableFuture<List<User>> dispatch() {
      var userIds = new ArrayList<Long>();
      final List<CompletableFuture<User>> queuedFutures = new ArrayList<>();

      // Drain the queue: collect the ids for one batch call and remember the future waiting on each id.
      loaderQueue.forEach(qe -> {
         userIds.add(qe.id());
         queuedFutures.add(qe.value());
      });

      loaderQueue.clear();

      var userFutures = userBatchLoader.load(userIds).toCompletableFuture();

      return userFutures.thenApply(users -> {
         // users.get(i) is the result for userIds.get(i), so complete the futures by position.
         for (int i = 0; i < queuedFutures.size(); i++) {
            queuedFutures.get(i).complete(users.get(i));
         }
         return users;
      });
   }
}

First, look at CompletableFuture<User> load(long userId). It does not perform any userId lookup. It just:

  1. Enqueues the lookup.
  2. Returns a CompletableFuture that lets you chain further lookups onto the one you requested. The lookups are therefore deferred until we actually request their execution with dispatchAndJoin().

Now, look at List<User> dispatchAndJoin(). It is called once we are ready to retrieve the user list. It will:

  1. Call CompletableFuture<List<User>> dispatch(), which performs the following actions:
    1. Group all userIds into one list and send it to the underlying BatchLoader, which performs the actual API call to the backend.
    2. Complete each CompletableFuture that was handed out when the lookup was registered (in CompletableFuture<User> load(long userId)). Completing a future runs the callbacks chained to it, and those callbacks may call load again, which adds new elements to loaderQueue. At this point the userId lookups for the next level are enqueued.
  2. Repeat the process while there are elements remaining in loaderQueue.
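The key mechanism in the steps above is that completing a CompletableFuture runs its chained callbacks, and those callbacks can enqueue more lookups. The sketch below isolates that behavior with plain JDK classes. The Pending record, the "user-" placeholder values, and the method names are made up for illustration, and no backend is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

class DeferredQueueSketch {
    // A queued lookup: the requested id and the future waiting for its value.
    record Pending(long id, CompletableFuture<String> future) { }

    static final Queue<Pending> queue = new ArrayDeque<>();

    // Registers a lookup without running it.
    static CompletableFuture<String> load(long id) {
        var future = new CompletableFuture<String>();
        queue.add(new Pending(id, future));
        return future;
    }

    // Completes everything queued right now and returns the ids handled in this round.
    static List<Long> dispatchOnce() {
        List<Pending> round = new ArrayList<>(queue);
        queue.clear();
        List<Long> ids = new ArrayList<>();
        for (Pending pending : round) {
            ids.add(pending.id());
            // complete() runs the callbacks chained to the future on this thread,
            // so any load() call inside them lands in the queue before we return.
            pending.future().complete("user-" + pending.id());
        }
        return ids;
    }

    // Queues a root lookup whose callback queues two more, then dispatches until the queue is empty.
    static List<List<Long>> run() {
        queue.clear();
        load(1L).thenAccept(root -> {
            load(2L);
            load(3L);
        });
        List<List<Long>> rounds = new ArrayList<>();
        while (!queue.isEmpty()) {
            rounds.add(dispatchOnce());
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [[1], [2, 3]]
    }
}
```

The second round only exists because the callback attached to the first future ran during the first dispatch, which is why the loop has to re-check the queue after every round.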

