DEV Community

Polliog
Your API Responses Are 40x Larger Than They Need to Be

I was profiling a production API last year when I noticed something that should have been obvious from the start: the response body for a simple list endpoint was 2.4MB. The actual useful data? About 60KB.

The rest was a mix of unused fields, redundant nesting, and no compression. It had been that way since day one, and nobody had noticed because on localhost it's fast enough that it doesn't register.

This is not a rare situation. It's the default.

The Three Ways APIs Bloat Their Responses

1. No Compression

This is the easiest win and the most commonly skipped.

HTTP has supported gzip compression since 1999. Brotli has been in all major browsers since 2017. Most APIs don't enable either by default.

# Check if your API compresses responses
curl -s -o /dev/null -w "%{size_download}" https://api.example.com/users
curl -s -o /dev/null -w "%{size_download}" -H "Accept-Encoding: gzip" https://api.example.com/users

If both numbers are the same, you're not compressing.

The fix in Node.js with Fastify:

import fastifyCompress from "@fastify/compress";

await app.register(fastifyCompress, {
  global: true,
  encodings: ["br", "gzip", "deflate"],
  threshold: 1024, // don't compress responses under 1KB
});

And with Express:

import compression from "compression";

app.use(compression({
  level: 6,
  threshold: 1024,
  filter: (req, res) => {
    if (req.headers["x-no-compression"]) return false;
    return compression.filter(req, res);
  },
}));

Real numbers from a list endpoint returning 500 records:

Format            Size
No compression    2.4MB
gzip              280KB
brotli            210KB

That's an 8-11x reduction with zero changes to your data model, zero changes to clients, and about 10 lines of code.

Note: Handling compression in Node.js works well for most setups. At large scale, offloading it to your reverse proxy (NGINX) or CDN (Cloudflare) saves CPU cycles since Node.js is single-threaded and compression is CPU-intensive. If you're already behind a proxy, check whether it's compressing for you before adding it in Node.js too.

2. Over-fetching

Every ORM makes it trivial to return entire database rows. Most codebases do exactly that.

// This returns every column in the users table
const users = await db.query("SELECT * FROM users");

// Including: password_hash, internal_flags, created_at, updated_at,
// deleted_at, last_ip, raw_oauth_payload, internal_notes...

The fix is not complicated - it just requires being deliberate:

// Return only what the client actually needs
const users = await db.query(`
  SELECT id, name, email, avatar_url, role
  FROM users
  WHERE organization_id = $1
  ORDER BY created_at DESC
`, [orgId]);

If you're using an ORM like Prisma, use select explicitly:

const users = await prisma.user.findMany({
  where: { organizationId },
  select: {
    id: true,
    name: true,
    email: true,
    avatarUrl: true,
    role: true,
  },
});

The temptation to use SELECT * or skip the select clause is real because it saves two minutes of typing. The cost is paid on every request, by every client, forever.
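You can quantify the cost without touching the database: serialize a representative row both ways and compare. A sketch with illustrative field names (substitute a real row from your own table):

```javascript
// Compare the serialized size of a full row vs. only the fields the
// client needs. Column names here are hypothetical examples.
const fullRow = {
  id: 42,
  name: "Ada",
  email: "ada@example.com",
  avatar_url: "https://cdn.example.com/a/42.png",
  role: "member",
  password_hash: "$2b$12$abcdefghijklmnopqrstuv",
  internal_flags: { beta: true, migrated: false },
  created_at: "2025-01-01T00:00:00Z",
  updated_at: "2025-06-01T00:00:00Z",
  deleted_at: null,
  last_ip: "203.0.113.7",
  raw_oauth_payload: { provider: "github", scopes: ["user", "repo"] },
  internal_notes: "churn risk, contacted 2025-05",
};

// Whitelist fields in; never blacklist fields out. A blacklist silently
// leaks every column added later.
const pick = (row, keys) =>
  Object.fromEntries(keys.map((k) => [k, row[k]]));

const slim = pick(fullRow, ["id", "name", "email", "avatar_url", "role"]);

const fullSize = Buffer.byteLength(JSON.stringify(fullRow));
const slimSize = Buffer.byteLength(JSON.stringify(slim));
console.log(`full: ${fullSize} B, slim: ${slimSize} B`);
```

Multiply the difference by every row in every list response and it stops looking like a rounding error. The whitelist-over-blacklist choice also means a newly added sensitive column stays private by default.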

3. Redundant Nesting and Metadata

This one is subtler. It's the pattern where every response wraps data in a consistent envelope, which is fine, but the envelope carries metadata that nobody uses.

{
  "success": true,
  "status": 200,
  "message": "OK",
  "timestamp": "2026-04-02T10:00:00Z",
  "requestId": "abc-123",
  "version": "v1",
  "data": {
    "items": [...],
    "meta": {
      "total": 1250,
      "page": 1,
      "perPage": 20,
      "lastPage": 63,
      "from": 1,
      "to": 20,
      "currentPage": 1,
      "hasNextPage": true,
      "hasPrevPage": false
    }
  }
}

success, status, message, and version duplicate what HTTP already tells the client through the status line and headers. currentPage is page renamed, from and to are derivable from page and perPage, and hasPrevPage is just page > 1.

A cleaner version:

{
  "data": [...],
  "pagination": {
    "total": 1250,
    "page": 1,
    "perPage": 20,
    "hasNext": true
  }
}

Less noise, same information, smaller payload.
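Every field the lean envelope drops is recomputable on the client from total, page, and perPage. A hypothetical client-side helper makes the point concrete:

```javascript
// All the dropped envelope fields, derived from the three values the
// lean response keeps. Hypothetical helper - adapt names to your client.
function derivePagination({ total, page, perPage }) {
  return {
    lastPage: Math.ceil(total / perPage),
    from: (page - 1) * perPage + 1,
    to: Math.min(page * perPage, total),
    hasPrev: page > 1,
    hasNext: page * perPage < total,
  };
}

const d = derivePagination({ total: 1250, page: 1, perPage: 20 });
// Matches the verbose envelope above: lastPage 63, from 1, to 20,
// hasPrev false, hasNext true - all without shipping the fields.
```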

Keyset Pagination vs. Offset Pagination

While we're talking about list endpoints, there's a related issue worth covering.

Offset pagination (LIMIT 20 OFFSET 200) requires the database to count and skip rows. On large tables this gets slow fast, and it also makes total count queries expensive.

-- This scans and counts the entire table
SELECT COUNT(*) FROM logs WHERE organization_id = $1;

-- On a table with 50M rows this can take 2-5 seconds

Keyset pagination avoids both problems:

// Instead of page/offset, use the last seen ID.
// Request one extra row to check if there's a next page.
const logs = await db.query(`
  SELECT id, timestamp, level, message
  FROM logs
  WHERE organization_id = $1
    AND id < $2
  ORDER BY id DESC
  LIMIT $3
`, [orgId, lastSeenId, pageSize + 1]);

// If we got more than pageSize rows, there's a next page.
// Trim the extra row before sending.
const hasNext = logs.length > pageSize;
const items = logs.slice(0, pageSize);

You lose the ability to jump to arbitrary pages, which matters for some UIs but not for most. You gain: fast queries at any depth, no COUNT(*) needed, and stable pagination even when rows are being inserted concurrently.
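One common refinement, not required by the query above: expose the last seen position as an opaque cursor rather than a raw ID, so clients treat it as a token instead of constructing their own. A minimal sketch using base64url (Node 16+):

```javascript
// Wrap the keyset position in an opaque cursor. The JSON wrapper leaves
// room to add more sort keys later without breaking clients.
const encodeCursor = (lastId) =>
  Buffer.from(JSON.stringify({ id: lastId })).toString("base64url");

const decodeCursor = (cursor) =>
  JSON.parse(Buffer.from(cursor, "base64url").toString("utf8")).id;

// Server puts this in the response; client echoes it back as ?cursor=...
const cursor = encodeCursor(18734);
const lastSeenId = decodeCursor(cursor);
```

The round trip is lossless, and because the cursor is opaque you're free to change its internals (add a timestamp tiebreaker, sign it) without a client-facing API change.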

Conditional Responses with ETags

If your data doesn't change between requests, the client shouldn't have to download it again.

import { createHash } from "crypto";

// In your route handler
const data = await fetchData(orgId);

// If your data has an updated_at field, use that instead of hashing
// the full payload - much cheaper on large responses.
// const etag = createHash("md5").update(`${data.id}:${data.updatedAt}`).digest("hex");

const etag = createHash("md5")
  .update(JSON.stringify(data))
  .digest("hex");

const clientEtag = req.headers["if-none-match"];

if (clientEtag === etag) {
  return res.status(304).send(); // Not Modified - zero payload
}

res
  .header("ETag", etag)
  .header("Cache-Control", "private, max-age=0, must-revalidate")
  .send(data);

One caveat: hashing the full JSON.stringify(data) on every request is expensive if your payload is large. If the data has an updated_at timestamp in the database, derive the ETag from that instead - hash(id + updated_at) is a constant-time operation regardless of payload size and avoids blocking the event loop.

For list endpoints where data changes frequently, this won't help much. For configuration endpoints, user profile data, or static reference data, a 304 response is massively cheaper than resending the same payload on every poll.

Putting It Together

The combined impact on that 2.4MB endpoint:

Change                         Size     Reduction
Baseline                       2.4MB    -
+ brotli compression           210KB    91%
+ select only needed fields    58KB     97.5%
+ clean response envelope      55KB     97.7%
+ ETag (no change)             0KB      100%

The 40x number in the title is real. Most of it comes from compression alone - the rest is just discipline.

None of these changes require a rewrite. They don't break existing clients. They're additive. The only cost is a bit of attention to defaults that most frameworks don't set correctly out of the box.

Start with compression. It takes 10 minutes and the numbers will surprise you.


Found something I got wrong, or a pattern that works better for your stack? Drop it in the comments.
