behnam rahimpour

Posted on Aug 23

A Practical Guide to GraphQL N+1 Solutions in NestJS

#webdev #backend #softwareengineering #programming

Overview

At Comfortel, we developed a large-scale admin panel to manage a complex system with tightly coupled domains and bounded contexts. We used a GraphQL gateway to abstract the intricate communications between these domains.

Initially, the team was pleased with this abstraction, as many are when first adopting GraphQL. However, as in all software development—and indeed, in life—every decision is a trade-off. Our choice of abstraction soon led to new challenges in performance and scalability, primarily the infamous N+1 problem.

This case study details our problem-solving approach, the design of our implementation with NestJS as our gateway, and the final results.

What is the N+1 problem (why GraphQL often triggers it)

What it is: N+1 happens when we fetch a list of N parent records, then for each parent we issue an additional query to fetch nested data. That becomes 1 (for the list) + N extra queries. With deeper nesting it multiplies quickly (e.g., per-nested item queries).
Why GraphQL triggers it: GraphQL resolvers are field-based. Without batching, each field on each returned parent can cause its own data fetch. For lists, this leads to an explosion in calls to databases/brokers if left unoptimized.
Concrete example (Places → Members → Users)
- Query: list 50 places, and for each place resolve members, and for each member resolve their user.
- Naive calls:
  - 1 call: list places (DB/broker)
  - +50 calls: for each place, fetch its members
  - +P·M calls: for each member of each place, fetch the user. With an average of 10 members per place, that is +500 calls.
- Total for the naive example with 50 places and 10 members per place: 1 + 50 + 500 = 551 calls per request.
- Complexity: O(1 + P + P·M) ≈ O(P·M), where P is the number of places and M is the average number of members per place. With additional nested levels, this compounds.
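
The arithmetic above can be reproduced with a small simulation. This is a minimal sketch rather than our gateway code: the in-memory data and the fetchPlaces, fetchMembers, and fetchUser functions are hypothetical stand-ins for DB/broker calls, and a counter records every call.

```typescript
// Hypothetical in-memory stand-ins for DB/broker calls; each invocation counts as one call.
type Place = { id: string };
type Member = { id: string; placeId: string; userId: string };
type User = { id: string };

function simulateNaiveResolution(placeCount: number, membersPerPlace: number): number {
  let calls = 0;

  const fetchPlaces = (): Place[] => {
    calls += 1;
    return Array.from({ length: placeCount }, (_, i) => ({ id: `p${i}` }));
  };
  const fetchMembers = (placeId: string): Member[] => {
    calls += 1;
    return Array.from({ length: membersPerPlace }, (_, i) => ({
      id: `${placeId}-m${i}`,
      placeId,
      userId: `${placeId}-u${i}`,
    }));
  };
  const fetchUser = (userId: string): User => {
    calls += 1;
    return { id: userId };
  };

  // One fetch for the list, one per place, one per member: the N+1 shape.
  for (const place of fetchPlaces()) {
    for (const member of fetchMembers(place.id)) {
      fetchUser(member.userId);
    }
  }
  return calls;
}

const naiveCalls = simulateNaiveResolution(50, 10); // 1 + 50 + 500 = 551
```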

Solutions

Apollo Connectors batching (schema-level change)
- Batch resolver calls at the Graph schema layer using Connectors.
- Requires updating the GraphQL schema with @connect mappings.
- Docs: Apollo Connectors: Batch Requests
- Example (IDs batched via query params):

type Product
  @connect(
    source: "ecom"
    http: { GET: "/products", queryParams: "id: $batch.id" }
    selection: "id name"
    batch: { maxSize: 10 }
  ) {
  id: ID!
  name: String
}

DataLoader (application-level wrapper)
- Batch and cache per-request fetches in application resolvers using a DataLoader abstraction.
- No schema changes required. Fits where we control resolver implementation.
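
To make the batching idea concrete, here is a deliberately simplified stand-in. It is NOT the dataloader package and omits most of its features; TinyBatcher and its members are hypothetical names. It only shows the core mechanism as we understand it: keys requested in the same tick are collected, duplicates share one promise, and the batch function runs once.

```typescript
// Simplified illustration of request batching; not the dataloader library.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;
type Pending<K, V> = { key: K; resolve: (value: V) => void; reject: (err: unknown) => void };

class TinyBatcher<K, V> {
  private queue: Pending<K, V>[] = [];
  private cache = new Map<K, Promise<V>>();
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached; // duplicate keys reuse the same promise

    const promise = new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key queued in this tick schedules a single flush.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
    this.cache.set(key, promise);
    return promise;
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    // The batch function must return one value per key, in key order.
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (err) => batch.forEach((item) => item.reject(err)),
    );
  }
}

let batchCalls = 0;
const userBatcher = new TinyBatcher<string, string>(async (ids) => {
  batchCalls += 1;
  return ids.map((id) => `user:${id}`);
});

// Three loads in the same tick, one of them a duplicate key.
const loaded = Promise.all([
  userBatcher.load("1"),
  userBatcher.load("2"),
  userBatcher.load("1"),
]);
```

Once loaded resolves, the batch function has run a single time with the keys "1" and "2".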

Solution Choice

We operate at the application level and control the resolver implementations, so DataLoader is a better fit for us than schema-level connectors, both in implementation time and in efficiency.

Coding Design

Context-based DataLoader Management

Our requirements

Reusability: a shared, domain-scoped batching layer that can be reused across resolvers.
Automatic nested batching: convert N+1 per nested field into 1 batched request per nested layer:
- 1 request for the parent list
- 1 request for each nested entity layer (e.g., members, ancestors, parents)
Request-scoped caching: each GraphQL request gets its own DataLoader instances, which prevents cache pollution between requests and deduplicates repeated nested entity ids within a request.

Why Context-based approach?
DataLoader caches results per instance, so each GraphQL request needs its own instances. Because each domain's batch loader can be used by many other domains, injecting loaders as ordinary singleton providers would share one cache across all requests and complicate dependency management. We therefore configure the GraphQL module with forRootAsync and create the DataLoaders inside the GraphQL context callback, which runs for every incoming request.
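
The isolation this buys can be shown without any framework. In the sketch below, LoaderLike, RequestContext, and createContext are hypothetical names, and a plain Map stands in for a loader's cache; createContext plays the role of the GraphQL context callback.

```typescript
// Hypothetical stand-in for a loader that keeps a memoization cache.
type LoaderLike = { cache: Map<string, string> };
type RequestContext = { batchRequest: { place: LoaderLike } };

// Called once per request, analogous to a GraphQL context callback.
function createContext(): RequestContext {
  return { batchRequest: { place: { cache: new Map<string, string>() } } };
}

const requestA = createContext();
const requestB = createContext();

requestA.batchRequest.place.cache.set("p1", "value cached during request A");

// Request B got its own cache, so it cannot see entries from request A.
const leakedIntoB = requestB.batchRequest.place.cache.has("p1"); // false
```

Had both requests shared one singleton loader, the entry cached during request A would have been served to request B as well.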

Encapsulation Strategy
To encapsulate the details of managing DataLoaders and their initialization and their modules, we created BatchLoadService and BatchLoadModule. This allows us to manage injection and initialization of domain batch loaders from a centralized location.

To achieve this with DataLoader, we rely on two methods:

load: resolves a single key to a single result (which may itself be a list keyed by one id) via batching.
loadMany: resolves multiple keys at once (e.g., a list of ids) via batching.

Scenarios and how to apply the tools

Parent entity with one nested entity
- a) If the parent exposes the nested entity id: use load with the nested id.
  - Example: get members by member ids. The batch function returns an array of members, one per requested id.
- b) If the parent does NOT expose the nested entity id: use load with the parent id (the batch function fetches by parent ids).
  - Example service: get members by place ids. The batch function again returns an array of members, one per place id.
Parent entity with a list of nested entities
- a) If the parent exposes the list of nested ids: use loadMany with the nested ids.
  - Example: get ancestors by ancestor_ids. The result is an array of ancestors, one per id.
- b) If the parent does NOT expose the list of nested ids: use load with the parent id; the batch function returns an array per parent.
  - Example service: get members by place ids and return an array per place, so the batch function returns an array of arrays of members.
  - Note: the batch function must return results in the exact order of the requested keys (an array of arrays aligned to parent order).
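
For the "array per parent" case, the alignment step can be sketched as a pure function. This is an illustration under assumptions: the Member shape and the alignMembersToPlaceIds name are hypothetical, and the flat member list stands in for whatever a bulk service call returns.

```typescript
// Hypothetical member shape; only placeId matters for grouping.
type Member = { id: string; placeId: string };

// Groups a flat member list by place id, then returns one array per requested
// key, in the same order as the keys. Keys without members get an empty array.
function alignMembersToPlaceIds(
  placeIds: readonly string[],
  members: readonly Member[],
): Member[][] {
  const byPlace = new Map<string, Member[]>();
  for (const member of members) {
    const bucket = byPlace.get(member.placeId);
    if (bucket) bucket.push(member);
    else byPlace.set(member.placeId, [member]);
  }
  return placeIds.map((id) => byPlace.get(id) ?? []);
}

const aligned = alignMembersToPlaceIds(
  ["p2", "p1", "p3"],
  [
    { id: "m1", placeId: "p1" },
    { id: "m2", placeId: "p2" },
    { id: "m3", placeId: "p1" },
  ],
);
// aligned[0] holds p2's members, aligned[1] holds p1's, aligned[2] is empty.
```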

Directory design (per-domain batch layer)

Keep a reusable batching class per domain to consolidate all batchable methods for that domain, and export typed loaders for use in resolvers across domains.

src/
  infrastructure/
    common/
      batch-load/
        batch-load.module.ts        # Centralized batch loader management
        batch-load.service.ts       # Service to initialize all loaders
        batch-load.interface.ts     # Common interfaces and types
  application/
    place/
      place.batch-load.ts        # Place-specific batching (e.g., by place ids)
    service/
      service.batch-load.ts      # Example for service domain (pattern mirrors place)

Class diagram (PUML)

Implementation guideline

GraphQL Module Configuration

The GraphQL module is configured with forRootAsync so that the context callback initializes fresh DataLoaders for every request:

GraphQLModule.forRootAsync<ApolloDriverConfig>({
  driver: ApolloDriver,
  imports: [BatchLoadModule],
  useFactory: (batchLoadService: BatchLoadService) => ({
    autoSchemaFile: join(process.cwd(), "dist/schema.gql"),
    graphiql: true,
    context: () => ({ batchRequest: batchLoadService.initLoaders() }),
    buildSchemaOptions: {
      numberScalarMode: "integer",
    },
    formatError(formattedError) {
      return config().formatGraphQLError(formattedError);
    },
  }),
  inject: [BatchLoadService],
}),

As shown, the context callback runs for every incoming request, so each request gets freshly initialized DataLoaders.

BatchLoad Module Structure

@Module({
  providers: [BatchLoadService, MemberBatchLoad, PlaceBatchLoad],
  imports: [PlaceModule, UserPlaceModule],
  exports: [BatchLoadService],
})
export default class BatchLoadModule {}

Adding New Domain Batch Loaders

Create the batch loader class implementing the BatchLoad interface:

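import { Injectable } from "@nestjs/common";
// The import style for dataloader depends on your tsconfig (e.g. esModuleInterop).
import DataLoader from "dataloader";

@Injectable()
export default class NewDomainBatchLoad implements BatchLoad {
  constructor(private readonly newDomainService: NewDomainService) {}

  // Called once per GraphQL request so that every request gets fresh loaders.
  initLoaders() {
    return {
      loadByIds: this.loadByIds(),
      // ... other loaders
    };
  }

  private loadByIds() {
    // A missing id resolves to null, so the value type must allow null.
    return new DataLoader<string, NewDomain | null>(async (ids: readonly string[]) => {
      const items = await this.newDomainService.findByIds([...ids]);
      const resultMap = this.makeFrequencyCounter(items);
      // Return one result per id, in the same order as the requested ids.
      return ids.map((id) => resultMap[id] ?? null);
    });
  }

  // Despite its name, this builds an id -> item lookup map, not a count.
  private makeFrequencyCounter(items: NewDomain[]) {
    return items.reduce((acc, item) => {
      acc[item._id] = item;
      return acc;
    }, {} as Record<string, NewDomain>);
  }
}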

Add to BatchLoadService:

@Injectable()
export default class BatchLoadService {
  constructor(
    private readonly placeBatchLoad: PlaceBatchLoad,
    private readonly memberBatchLoad: MemberBatchLoad,
    private readonly newDomainBatchLoad: NewDomainBatchLoad, // Add this
  ) {}

  initLoaders() {
    return {
      place: this.placeBatchLoad.initLoaders(),
      member: this.memberBatchLoad.initLoaders(),
      newDomain: this.newDomainBatchLoad.initLoaders(), // Add this
    };
  }
}

Add to BatchLoadModule:

@Module({
  providers: [
    BatchLoadService, 
    MemberBatchLoad, 
    PlaceBatchLoad,
    NewDomainBatchLoad, // Add this
  ],
  imports: [PlaceModule, UserPlaceModule, NewDomainModule], // Add required modules
  exports: [BatchLoadService],
})
export default class BatchLoadModule {}

Resolver Usage

Resolvers now get batch loaders from the GraphQL context instead of dependency injection:

@ResolveField(() => PlaceObject, { nullable: true })
parent(@Parent() place: Place, @Context("batchRequest") batchRequest: BatchLoadContext) {
  if (!place.parent_id) return undefined;
  return batchRequest.place.listByPlaceIds.load(place.parent_id);
}

@ResolveField(() => [PlaceObject], { nullable: "itemsAndList" })
ancestors(@Parent() place: Place, @Context("batchRequest") batchRequest: BatchLoadContext) {
  // Skip the loader entirely when there are no ancestor ids.
  if (!place.ancestor_ids?.length) return [];
  return batchRequest.place.listByPlaceIds.loadMany(place.ancestor_ids);
}

@ResolveFieldSafe(() => PaginatedMemberObject, {
  nullable: "itemsAndList",
  name: "members",
})
members(@Parent() place: Place, @Context("batchRequest") batchRequest: BatchLoadContext) {
  return batchRequest.member.nestedListByPlaceIds.load(place._id);
}
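
One detail worth knowing when using loadMany: in the dataloader package it resolves to an array in which a failed key is represented by an Error value instead of rejecting the whole call. If a resolver wants to log or handle those failures itself, it can split the array first. The helper below is a hypothetical sketch of that step, not part of our codebase.

```typescript
// Splits a loadMany-style result into resolved values and per-key errors.
function partitionLoadManyResult<V>(
  results: readonly (V | Error)[],
): { values: V[]; errors: Error[] } {
  const values: V[] = [];
  const errors: Error[] = [];
  for (const result of results) {
    if (result instanceof Error) errors.push(result);
    else values.push(result);
  }
  return { values, errors };
}

// Hypothetical place shape; null marks an id that was not found.
type PlaceLike = { _id: string };

const partitioned = partitionLoadManyResult<PlaceLike | null>([
  { _id: "a" },
  new Error("place b failed to load"),
  null,
]);
// partitioned.values keeps the place and the null; partitioned.errors has one entry.
```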

Key Implementation Details

Members batch loader (parent id → list of nested entities)
- Tool used: load with parent place id, returning an array per key, keeping order.

private nestedListByPlaceIds() {
  return new DataLoader(async (placeIds: readonly string[]): Promise<Member[][]> => {
    const [members] = await this.membersService.getMembersByPlaceIds(placeIds as string[]);
    const resultFrequencyCounter = this.makeMembersFrequencyCounter(members);
    return placeIds.map((id) => resultFrequencyCounter[id] ?? []);
  });
}

Place batch loader (single id for parent, list of ids for ancestors):
- Tool used for a single id: load by place id (e.g., parent); for lists: loadMany (e.g., ancestors).
- The reducer builds a lookup map keyed by id (the "frequency counter" in the code), so combining results is O(N); order is preserved by mapping over the input keys.

private listByPlaceIds() {
  return new DataLoader(async (placeIds: readonly string[]): Promise<(Place | null)[]> => {
    const places = await this.placesService.findByIds(placeIds as string[]);
    const resultFrequencyCounter = this.makePlacesFrequencyCounter(places);
    return placeIds.map((id) => resultFrequencyCounter[id] ?? null);
  });
}

The key principle with DataLoader: the batch function must return an array of the same length as the keys, with results in the same order as the keys. To avoid an O(N²) lookup (searching the result list once per key), we build a lookup map keyed by id (the "frequency counter" in our code) and then map the input keys to their results in order. This gives O(N) combine time for each batch.
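
As a standalone illustration of that ordering step for the one-result-per-key case, here is a minimal sketch. The Place shape and the alignPlacesToIds name are assumptions made for the example; it uses a Map where our code uses a plain object.

```typescript
// Hypothetical place shape for the example.
type Place = { _id: string; name: string };

// Builds an id-keyed map in one pass, then maps the requested ids to results
// in request order. Ids that were not found resolve to null.
function alignPlacesToIds(
  ids: readonly string[],
  places: readonly Place[],
): (Place | null)[] {
  const byId = new Map<string, Place>();
  for (const place of places) byId.set(place._id, place);
  return ids.map((id) => byId.get(id) ?? null);
}

// The service may return rows in any order and may omit unknown ids.
const alignedPlaces = alignPlacesToIds(
  ["c", "a", "missing"],
  [
    { _id: "a", name: "Alpha" },
    { _id: "c", name: "Gamma" },
  ],
);
// alignedPlaces is [Gamma, Alpha, null]
```

Each key costs one map lookup, so the whole combine step is linear in the number of keys plus the number of rows.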

Results and complexity comparison

Legacy approach:
- Calls ≈ 1 (places) + P (members by place) + P·M (users by member) = O(P·M)
- Example with 50 places and 10 members per place → 551 calls
Batched approach with DataLoader:
- Calls ≈ 1 (places) + 1 (all members for all places) + 1 (all users for all members) = O(1) per nested layer
- Example with 50 places and 10 members per place → 3 calls
Impact: Dramatically fewer DB/broker requests, lower latency, and significantly reduced load on downstream services, with clear, reusable batching primitives per domain.
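
The batched call count can be checked with a simulation in which each nested layer is fetched with one bulk call. This is a sketch under assumptions: the three fetch functions are hypothetical in-memory stand-ins, and the layers are batched by hand here, whereas in the gateway DataLoader does the collecting.

```typescript
// Hypothetical member shape for the simulation.
type Member = { id: string; placeId: string; userId: string };

function simulateBatchedResolution(placeCount: number, membersPerPlace: number): number {
  let calls = 0;

  const fetchPlaceIds = (): string[] => {
    calls += 1;
    return Array.from({ length: placeCount }, (_, i) => `p${i}`);
  };
  const fetchMembersByPlaceIds = (placeIds: readonly string[]): Member[] => {
    calls += 1;
    const members: Member[] = [];
    for (const placeId of placeIds) {
      for (let i = 0; i < membersPerPlace; i++) {
        members.push({ id: `${placeId}-m${i}`, placeId, userId: `${placeId}-u${i}` });
      }
    }
    return members;
  };
  const fetchUsersByIds = (userIds: readonly string[]): string[] => {
    calls += 1;
    return userIds.map((id) => `user:${id}`);
  };

  // One bulk call per layer, regardless of how many parents there are.
  const placeIds = fetchPlaceIds();
  const members = fetchMembersByPlaceIds(placeIds);
  const uniqueUserIds = Array.from(new Set(members.map((m) => m.userId)));
  fetchUsersByIds(uniqueUserIds);
  return calls;
}

const batchedCalls = simulateBatchedResolution(50, 10); // 3
```

The count stays at 3 when the number of places or members grows; only the size of each bulk call changes.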

Last consideration: Handling Large Lists

We now gather all nested entity IDs efficiently. However, a new consideration arises: if a nested entity is a long list associated with a single parent, fetching the entire list could result in a heavy, inefficient query.

We addressed this through API design. For use cases where a nested field could return a very long list, our GraphQL API returns only the list of IDs, and a separate, dedicated API endpoint returns the contents of that list with pagination.
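
The paging logic behind such an endpoint can be sketched as follows. This is an assumption-laden illustration, not our endpoint: the Page shape, the paginateIds name, and offset/limit paging are all hypothetical choices.

```typescript
// Hypothetical page shape returned by a paginated endpoint.
type Page<T> = { items: T[]; total: number; hasNext: boolean };

// Returns one page of ids; a real endpoint would then load the contents
// for just these ids instead of the whole list.
function paginateIds(ids: readonly string[], offset: number, limit: number): Page<string> {
  const start = Math.max(0, offset);
  const size = Math.max(0, limit);
  const items = ids.slice(start, start + size);
  return { items, total: ids.length, hasNext: start + items.length < ids.length };
}

// 25 member ids attached to a single place.
const memberIds = Array.from({ length: 25 }, (_, i) => `m${i}`);

const firstPage = paginateIds(memberIds, 0, 10); // 10 items, hasNext is true
const lastPage = paginateIds(memberIds, 20, 10); // 5 items, hasNext is false
```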

Conclusion

This challenge pushed us to delve deeper into GraphQL's execution logic and ultimately implement a sophisticated solution using DataLoader patterns. The results were undeniable: we achieved a significant reduction in database load and dramatically improved response times for our large-scale admin panel, ensuring it could scale effectively.

However, no solution is perfect. As we noted, addressing one bottleneck often reveals another, leading us to design a hybrid approach for large datasets using paginated endpoints.

The key lesson? Powerful abstractions are invaluable, but they must be implemented with a clear understanding of their performance implications. Proactive monitoring and a willingness to iterate on your design are crucial for long-term success.

We’re eager to hear from other developers and architects. Share your experiences and thoughts in the comments below!

Help us reach more developers by reacting to this case study and sharing it with your network.