Currently Buffering

I instrumented 95 DataLoaders in a production GraphQL API — here's what I found

DataLoader is the standard fix for GraphQL's N+1 query problem. Batch your database calls per request, cache within the request lifecycle, done.

But once DataLoader is in production, you're flying blind. Which loaders are actually called per request? Is your cache hit rate 15% or 60%? Should your batch size be 10 or 50? APM tools tell you resolver latency, but they don't understand DataLoader batching.

I built dataloader-ai to answer those questions. Then I tested it for real by instrumenting 95 DataLoader instances in Open Collective's GraphQL API.

The problem: invisible batching

Open Collective runs one of the largest open-source GraphQL APIs on the web. Their server/graphql/loaders/ directory contains 96 DataLoader instances across 20 files — loaders for collectives, expenses, transactions, members, comments, orders, and more.

Without instrumentation, none of these questions are answerable:

  • Which loaders fire per request? You can guess from the schema, but you don't know for sure without tracing.
  • Are batches efficient? A loader called 20 times in a request should ideally create 1 batch of 20 — not 20 batches of 1.
  • What's the cache hit rate? DataLoader's cache is per-request, but hit rate varies wildly depending on query shape.
  • Is the batch size right? Too small = more round trips. Too big = slow batches. The default is often wrong.

The tool: dataloader-ai

dataloader-ai is a drop-in wrapper for the dataloader package. Same API, zero config:

// before
import DataLoader from 'dataloader'
const userLoader = new DataLoader(batchLoadUsers)

// after
import { DataLoaderAI } from 'dataloader-ai'
const userLoader = new DataLoaderAI(batchLoadUsers, { name: 'user' })

Same load()/loadMany()/clear()/prime() API. Under the hood it tracks:

  • Cache hit rate per loader (with visual bar in terminal)
  • Avg and p95 latency per batch function
  • Batch efficiency (rolling sparkline of batch sizes)
  • Batch-size recommendations based on a configurable latency target

It prints a live report to your terminal every 5 seconds:

▲ dataloader-ai 14:23:01
──────────────────────────────────────────────────────
user
  cache [████████████████░░░░░░░░] 64.2%
  avg=12.4ms p95=18.1ms batched=47 avoided=86 savings=$0.0086
  batch efficiency ▄▄█▄▅█▆▅██▄▆▇
  recommendation ↑ increase 10 → 12

product
  cache [████████░░░░░░░░░░░░░░░░] 34.1%
  avg=8.7ms p95=14.3ms batched=31 avoided=42 savings=$0.0042
  batch efficiency █▄▅▄██▄▅▆▄▅
  recommendation ↓ decrease 10 → 8

──────────────────────────────────────────────────────

No API key required. No account. No data leaves your machine. It works in local-first mode — the terminal output is the product. An optional cloud dashboard exists for teams who want historical trends and alerts.

The experiment: Open Collective's API

I forked opencollective/opencollective-api and replaced 95 of 96 DataLoader instances with DataLoaderAI, adding a descriptive name to each:

// before
new DataLoader(async (ids) => { ... })

// after
new DataLoaderAI(async (ids: readonly number[]) => { ... }, { name: 'collective-by-id' })

The changes were mechanical — 20 files, 397 insertions, 379 deletions. You can see the full fork PR here.

What I found

server/graphql/loaders/index.ts is the hotspot — 43 inline DataLoader instances in a single file (1,401 lines). This is where most collective, expense, and transaction loaders live. If you're going to instrument anything, start here.

Named loaders make debugging far easier. Before, every loader was an anonymous new DataLoader(fn). After, each one has a name like collective-by-slug, expense-attached-files, or tier-total-donated. When the terminal report prints, you immediately know which loader is slow or under-batching.

The readonly array pattern matters. DataLoaderAI tracks batch efficiency by counting keys per batch call, so the batch function should treat its keys as read-only. Typing the parameter as readonly number[] (rather than number[]) makes that explicit: the compiler rejects any attempt to mutate the keys array inside the batch function.
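As a rough sketch of what such a batch function can look like: the keys arrive as a readonly array, and the results are re-aligned so there is exactly one entry per key, in key order, which is what DataLoader requires of a batch function. The Collective shape, the in-memory rows, and both function names are invented for illustration and are not taken from the Open Collective codebase.

```typescript
// Hypothetical row shape, for illustration only.
interface Collective {
  id: number;
  slug: string;
}

// DataLoader expects one result per key, in the same order as the keys.
// A `WHERE id IN (...)` style query guarantees neither, so rows are re-aligned here.
function alignToKeys(
  keys: readonly number[],
  rows: Collective[],
): (Collective | Error)[] {
  const byId = new Map<number, Collective>();
  for (const row of rows) byId.set(row.id, row);
  // Missing rows become per-key Errors instead of shifting later results.
  return keys.map((id) => byId.get(id) ?? new Error(`No collective with id ${id}`));
}

// `keys` is readonly: the function can read and count the keys,
// but calls such as keys.push() or keys.sort() fail to compile.
async function batchLoadCollectives(
  keys: readonly number[],
): Promise<(Collective | Error)[]> {
  // Stand-in for a database call; rows come back in arbitrary order.
  const allRows: Collective[] = [
    { id: 3, slug: "example-c" },
    { id: 1, slug: "example-a" },
  ];
  const rows = allRows.filter((row) => keys.includes(row.id));
  return alignToKeys(keys, rows);
}
```

Keeping the alignment step in its own function makes it easy to unit test without a database.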

One loader stayed vanilla. The buildLoaderForAssociation helper in helpers.ts is a generic utility that creates loaders dynamically — it's not a named, domain-specific loader. It's the right call to leave it as-is rather than add a generic name that doesn't tell you anything.

How the recommendation engine works

This is not ML. It's honest heuristics, and I want to be transparent about that.

The BatchSizeOptimizer maintains a rolling window of batch latencies (default: last 20 batches). Every 5 batches, it checks:

  • If avg latency < 70% of target → increase batch size by 20% (you have headroom)
  • If avg latency > 130% of target OR p95 > 200% of target → decrease by 20% (you're overloading)
  • Otherwise → hold (near-optimal)
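The three rules above can be sketched in a few lines of TypeScript. This is an illustrative reimplementation of the heuristic as described, not the BatchSizeOptimizer source: the function name, the option names, the rounding, and the nearest-rank p95 are all assumptions. The caller is assumed to pass in the rolling window of recent batch latencies.

```typescript
interface OptimizerOptions {
  targetMs: number;     // latency target per batch call
  minBatchSize: number; // lower bound for recommendations
  maxBatchSize: number; // upper bound for recommendations
}

interface Recommendation {
  action: "increase" | "decrease" | "hold";
  from: number;
  to: number;
}

// Nearest-rank percentile over a sorted copy of the samples.
function percentile(samples: readonly number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function recommend(
  latenciesMs: readonly number[], // rolling window, e.g. the last 20 batches
  current: number,                // current batch size
  opts: OptimizerOptions,
): Recommendation {
  const hold: Recommendation = { action: "hold", from: current, to: current };
  if (latenciesMs.length === 0) return hold;

  const avg = latenciesMs.reduce((sum, ms) => sum + ms, 0) / latenciesMs.length;
  const p95 = percentile(latenciesMs, 95);
  const clamp = (n: number) =>
    Math.min(opts.maxBatchSize, Math.max(opts.minBatchSize, n));

  let to = current;
  if (avg < 0.7 * opts.targetMs) {
    to = clamp(Math.round(current * 1.2)); // headroom: grow by 20%
  } else if (avg > 1.3 * opts.targetMs || p95 > 2 * opts.targetMs) {
    to = clamp(Math.round(current * 0.8)); // overloaded: shrink by 20%
  }

  if (to === current) return hold; // in band, or already at a bound
  return { action: to > current ? "increase" : "decrease", from: current, to };
}
```

With a 50ms target and a current batch size of 10, a window of 12ms batches yields an increase to 12, and a window of 80ms batches yields a decrease to 8.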

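The default target is 50ms. If your batch function averages 12ms against that target, it sits well under the 70% threshold (35ms), so the recommendation is to increase the batch size from 10 to 12: you can safely batch more keys per call. With full batches, that works out to roughly 17% fewer round trips (the number of batch calls scales with 1/batch size, and 10/12 ≈ 0.83), at low risk because latency still has plenty of headroom.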
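To put a number on the round-trip saving from a 10 → 12 increase, here is a small worked example. It assumes a fixed number of keys and full batches; the roundTrips helper and the figure of 120 keys are made up for illustration.

```typescript
// Batch calls needed to load `keyCount` keys when each batch holds at most `batchSize` keys.
function roundTrips(keyCount: number, batchSize: number): number {
  return Math.ceil(keyCount / batchSize);
}

const before = roundTrips(120, 10); // 12 batch calls
const after = roundTrips(120, 12);  // 10 batch calls

// Fraction of batch calls saved: (12 - 10) / 12, about 0.167.
const reduction = (before - after) / before;
console.log(`${before} -> ${after} calls, ${(reduction * 100).toFixed(1)}% fewer`);
```

A 20% larger batch size therefore saves about 17% of the batch calls, not 20%, because the call count is inversely proportional to the batch size.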

This is transparent. You can see exactly why each recommendation is made. You can configure the target latency, min/max batch size, and window size. No black box.

A realistic example

The SDK ships with a realistic ecommerce example — an Apollo Server with 5 DataLoaderAI loaders (users, products, categories, reviews, orders) and a load-test script that fires 5 different query patterns.

Run it:

git clone https://github.com/currentlybuffering/dataloader-ai
cd dataloader-ai/src/examples/realistic-ecommerce
npm install
node index.ts
# in another terminal:
node load-test.ts

The terminal report shows all 5 loaders with live metrics. The orders loader (15-35ms simulated DB latency) consistently gets "increase batch size" recommendations. The category loader (3-7ms) holds steady. The reviews loader shows the most cache-hit variance because review queries overlap differently per request pattern.

What this means for your GraphQL server

If you're running DataLoader in production:

  1. Add names to your loaders. Even if you don't use dataloader-ai, naming your loaders makes debugging dramatically easier. Just add a name property to your DataLoader options.

  2. Check your batch efficiency. Are you getting 1 batch of N keys, or N batches of 1 key? If resolvers call .load() late in the cycle (after awaits), DataLoader can't batch them.

  3. Measure cache hit rate per query. A query that fetches the same user 5 times in one request should have 80% cache hit rate on the user loader. If it's 0%, something is wrong with your per-request cache lifecycle.

  4. Tune batch sizes to your actual latency. The default maxBatchSize in DataLoader is Infinity. Most teams set it to something arbitrary (10, 50, 100) without measuring. Use your actual batch function latency to pick the right value.
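Points 2 and 3 can be sanity-checked with simple arithmetic before reaching for any tooling. The helpers below are hypothetical (they are not part of dataloader or dataloader-ai) and assume a fresh per-request cache where the first load of each distinct key is a miss and every repeat is a hit.

```typescript
// Hypothetical helpers, not part of dataloader or dataloader-ai.

// Best-case hit rate for one request with a fresh per-request cache.
function expectedHitRate(requestedKeys: readonly (string | number)[]): number {
  if (requestedKeys.length === 0) return 0;
  const distinct = new Set(requestedKeys).size;
  return (requestedKeys.length - distinct) / requestedKeys.length;
}

// Average keys per batch call: 20 means one batch of 20, 1 means no batching at all.
function averageBatchSize(batchSizes: readonly number[]): number {
  if (batchSizes.length === 0) return 0;
  const totalKeys = batchSizes.reduce((sum, size) => sum + size, 0);
  return totalKeys / batchSizes.length;
}

// The same user loaded 5 times in one request: 1 miss, 4 hits.
console.log(expectedHitRate([42, 42, 42, 42, 42]));

// 20 loads dispatched together versus 20 loads dispatched one at a time.
console.log(averageBatchSize([20]));
console.log(averageBatchSize(Array(20).fill(1)));
```

The first call returns 0.8, matching the 80% figure in point 3. If the measured hit rate for a query like that is far below the expected value, the per-request cache lifecycle is the first thing to check.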

Try it

npx dataloader-ai demo

No install, no account, no API key. The demo simulates a GraphQL server and prints live metrics to your terminal.

For your own server:

npm install dataloader-ai

Then swap DataLoader for DataLoaderAI and add a name option. That's it.

  • Local mode: free forever, terminal metrics, no data leaves your machine
  • Cloud dashboard: free during beta, historical trends + alerts
  • SDK: MIT-licensed, on GitHub, on npm (1,400+ downloads/month)

I'm the solo developer behind dataloader-ai. Built it because I kept running into the same observability gap in GraphQL servers. Would love feedback from anyone running DataLoader in production.
