Sarim Nadeem

Posted on Jun 13

10 API and Database Blunders Every Beginner Makes (and How to Fix Them Before Production Bites Back)

#api #webdev #beginners #database

You wrote your first REST endpoint. You ran your first SELECT *. Things worked. Life was good.

Then a colleague casually mentioned "idempotency key," another asked why your /users?page=10000 request takes 12 seconds, and your tech lead frowned at your migration script. Suddenly, "it works on my machine" is not enough anymore.

I have been through that frowning-tech-lead phase. So has every senior engineer you know. The mistakes are surprisingly predictable — and almost all of them are easy to avoid if someone points them out before you ship.

This post is that someone.

Below are 10 mistakes I see beginners make again and again with APIs and databases. For each one, you get the blunder, why it bites, and the actual fix.

TL;DR for the skimmers: Always use idempotency keys on mutating endpoints. Use cursor pagination, not offset. Version your API from day one. Normalize first, denormalize only when profiled data tells you to. Index foreign keys. Watch out for N+1 queries. Never run blocking schema changes on hot tables.

Part 1 — API Blunders

1. Shipping a POST endpoint without idempotency keys

This is the single most expensive mistake on day 1 of API design, and almost nobody catches it in a code review.

The blunder:

POST /orders
{ "user_id": 42, "amount": 99.99 }

The user taps "Pay." The phone's signal drops mid-request. The HTTP client retries. Two charges. One angry customer. One support ticket.

GET requests are naturally safe to retry — reading does not change state. But POST, PATCH, and DELETE are not. Without protection, every network blip becomes a duplicate.

The fix — accept an Idempotency-Key header:

POST /orders
Idempotency-Key: 8f14e45f-ceea-467a-9575-d7a08c9b4f1c
{ "user_id": 42, "amount": 99.99 }

On the server side:

Client generates a UUID and sends it on every mutating request.
Server checks if it has seen that key. If yes, return the stored response — do not re-execute.
If no, process the request, store the response keyed by that UUID, then return it.
Expire the record after 24 hours.

Stripe's entire business depends on this pattern. If they had shipped without it in 2011, retrofitting it would have been a breaking change for every customer they had. Bake it in from day one.

A minimal SQL schema for storing idempotency records

CREATE TABLE idempotency_keys (
  key           TEXT PRIMARY KEY,
  api_key_id    BIGINT NOT NULL,
  request_path  TEXT NOT NULL,
  request_hash  TEXT NOT NULL,
  status        TEXT NOT NULL DEFAULT 'processing',
  response_code INT,
  response_body JSONB,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
  expires_at    TIMESTAMPTZ NOT NULL DEFAULT now() + INTERVAL '24 hours'
);

Tip: wrap the idempotency check and the business operation in the same database transaction. This prevents a crash between "recorded the key" and "completed the work" from leaving you in a half-done state.

2. Using offset pagination on anything that will grow

Almost every beginner writes pagination like this:

SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 200000;

It works. On page 1. With 10,000 rows, it still works. Then your table hits 50 million rows and ?page=10000 takes 5 seconds.

Why it breaks: Offset pagination tells the database "skip the first 200,000 rows, then give me 20." The database literally scans and discards 200,000 rows before reading the ones you want. There is no index optimization that fixes this — it is how OFFSET works.

The fix — cursor-based pagination:

-- First page
SELECT id, name, created_at FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 21;

-- Next page — cursor decodes to the last seen (created_at, id)
SELECT id, name, created_at FROM orders
WHERE (created_at, id) < ('2025-12-01T10:30:00Z', 9847)
ORDER BY created_at DESC, id DESC
LIMIT 21;

Return a response shaped like this:

{
  "data": [ ... ],
  "pagination": {
    "has_more": true,
    "next_cursor": "eyJjcmVhdGVkX2F0IjoiMjAyNS0xMi0wMVQxMDozMDowMFoiLCJpZCI6OTg0N30="
  }
}

Quick comparison so you know when to pick which:

Approach	Performance at deep pages	Stable during inserts?	Best for
Offset (`?page=2`)	Slow (scans + discards)	No (page contents shift)	Tiny admin tables (under 100K rows)
Cursor (`?cursor=abc`)	Fast at any depth	Yes	Public APIs, feeds, anything that grows

Pro tip: Fetch LIMIT + 1 rows. If you get the extra one back, set has_more: true and drop it. This saves you a separate COUNT(*) query, which on a large table can take 30+ seconds on its own.

3. "We will add versioning later"

You will not. Or, more precisely, you will — and it will be a breaking change for every existing consumer.

The blunder: Shipping /users instead of /v1/users. When a breaking change inevitably comes, you have to either tell every client to update or maintain a confusing patchwork.

The fix — pick a versioning strategy on day one:

For public APIs, just use URL path versioning. It is the most discoverable and the easiest for third parties.

GET /v1/users/123

That is it. Adding /v1/ on day one is a 30-minute decision. Retrofitting it after launch is a multi-month migration.

A common pushback is "versioning is premature optimization." It is not. Adding versioning later is breaking. Adding it now is free. The asymmetry makes the decision for you.

4. Useless error responses

Compare these two error responses:

Bad:

HTTP/1.1 400 Bad Request

Good:

{
  "error": {
    "code": "invalid_currency",
    "message": "Currency 'usd' is not valid. Did you mean 'USD'? Currencies must be uppercase ISO 4217 codes.",
    "param": "currency",
    "request_id": "req_8f14e45f"
  }
}

Good error messages are the best API documentation. They tell developers exactly what went wrong, why, and how to fix it. The request_id field is a small thing that pays for itself a hundred times over when someone opens a support ticket — you can correlate it to a log entry in seconds.

Pick one consistent error envelope and use it across every endpoint. Do not let one route return { "error": "..." } and another return { "errors": [...] }. Clients build error handlers around that shape. Changing it later is a breaking change even when the HTTP status code is the same.

5. Wrong HTTP status codes (or all 200s)

A surprising number of APIs return 200 OK with { "success": false, "error": "..." } in the body. Please do not do this.

HTTP status codes exist precisely so clients, proxies, monitoring tools, and load balancers can react without parsing your JSON. The basics every beginner should memorize:

Code	Meaning	Common use
`200`	OK	Successful GET/PUT/PATCH
`201`	Created	Successful POST that created a resource
`204`	No Content	Successful DELETE
`400`	Bad Request	The request body is malformed
`401`	Unauthorized	Not authenticated (no/invalid token)
`403`	Forbidden	Authenticated, but not allowed
`404`	Not Found	The resource does not exist
`409`	Conflict	Conflicting state (e.g., duplicate email)
`422`	Unprocessable Entity	Validation failure on otherwise-valid request
`429`	Too Many Requests	Rate limited — include `Retry-After`
`500`	Internal Server Error	Your bug, not theirs

A common beginner mix-up: using 401 when you mean 403. Remember — 401 means "I do not know who you are," 403 means "I know who you are, you just cannot do this."

6. Mixing naming conventions

This one is small but it is the number-one developer complaint about inconsistent APIs:

GET /users         → { "createdAt": "...", "userName": "..." }
GET /orders        → { "created_at": "...", "user_name": "..." }
GET /products      → { "creation_date": "...", "name": "..." }

Three endpoints. Three conventions. One frustrated client developer.

Pick one — snake_case or camelCase — and apply it everywhere. If one endpoint uses created_at, every endpoint uses created_at. Add a linter (spectral is great for OpenAPI specs) to your CI so this fails the build automatically. Humans will not catch this consistently. Linters will.

Part 2 — Database Blunders

7. Denormalizing "for performance" before you have a problem

Here is a scene I have witnessed more times than I can count:

Beginner: "I am going to copy the user's name into every order so I do not have to JOIN."

Then the user updates their name. Now half the orders show "Sarah" and half show "Sara." And nobody knows which one is current.

Quick refresher:

Normalization stores each fact exactly once. The user's name lives in users. Orders reference the user via user_id. Reading an order with the user's name requires a JOIN.
Denormalization copies data for read speed. The user's name is also stored on the orders row, so you skip the JOIN.

When normalization is the right default:

You are building a new application and do not know your read patterns yet.
Data integrity matters (financial records, anything regulatory).
Writes are roughly as common as reads.

When denormalization actually earns its place:

Reads outnumber writes by a huge margin (a product catalog read 1,000× per write).
JOINs are measurably the bottleneck — you have profiled, not guessed.
You are building a read-optimized model on the side, such as a search index, analytics table, or CQRS read model — and you have a clear story for how it gets refreshed.

The order matters: normalize first, then denormalize specific paths when the data tells you to. Going the other way around — starting denormalized and trying to clean it up — is one of the most painful refactors in software.

Rule of thumb: If you are reaching for denormalization before you have a query profiler showing a JOIN is the bottleneck, you are guessing. Add proper indexes first. Most "JOIN performance problems" turn out to be missing indexes.

8. Indexing nothing — or indexing everything

Two equally common, equally painful extremes.

Indexing nothing. Every query is a full table scan. The application feels fine for a week. Then your users table hits a million rows and login takes 3 seconds because you forgot an index on email.

Indexing everything. Every INSERT, UPDATE, and DELETE now has to update all 12 indexes you created "just in case." Writes get slower and slower until your hot table cannot keep up with traffic.

Indexes are not free. Every index adds a write operation to every change of the underlying row. A table with 8 indexes means a single insert causes roughly 9 disk writes (1 for the table, 8 for the indexes).

A sane starting point:

Always index your primary key (the database does this for you).
Index every foreign key. Beginners forget this constantly, and unindexed FKs make JOINs slow and DELETE on the parent table catastrophic.
Index columns you WHERE on most often (email, status, user_id).
For multi-column filters, use a composite index — but pay attention to column order. A composite index on (user_id, created_at) helps WHERE user_id = ? AND created_at > ? but not WHERE created_at > ? alone. Put the column you filter by exact match first; put the range column last.
Then stop. Adding more indexes "for safety" actively hurts you.

Bonus blunder — functions on indexed columns

-- Does NOT use the index on `email`
SELECT * FROM users WHERE LOWER(email) = 'sara@example.com';

The function wraps the column, so the database cannot use the index. Either store the value already lowercased, or create a functional index:

CREATE INDEX idx_users_email_lower ON users (LOWER(email));

This is one of the most common reasons "I added an index but my query is still slow" tickets land in senior engineers' inboxes.

9. The N+1 query problem

This is the silent killer. Your tests pass. Your dev environment is fast. Production goes live, and every page load fires 500 database queries.

The blunder (in pseudo-code, but every ORM has its flavor):

const orders = await db.query("SELECT * FROM orders WHERE user_id = ?", userId);
// orders.length = 50

for (const order of orders) {
  // BOOM — this fires one query per order. 50 orders = 50 queries.
  order.items = await db.query("SELECT * FROM items WHERE order_id = ?", order.id);
}

One initial query (the "1") plus N follow-up queries (one per row). Hence "N+1."

The fix — fetch in batches:

-- Single query, get all items for all 50 orders at once
SELECT * FROM items WHERE order_id IN (1, 2, 3, ..., 50);

Then group them in memory by order_id. One round-trip instead of fifty.

Most ORMs have a name for this. In Rails it is includes. In Sequelize it is include. In Django it is select_related / prefetch_related. In Prisma it is include. Learn yours. If you take only one thing from this section, take this.

10. Running ALTER TABLE on a live, large table during business hours

The first time you do this in production, you find out two things very fast:

ALTER TABLE on a table with 100 million rows can lock the table for hours.
Locked means no writes. Locked means your application throws errors. Locked means an incident channel that pings you at 3 a.m.

The fix — never do a blocking schema change on a hot table. Use online schema migration tooling:

MySQL: gh-ost (built by GitHub for exactly this reason) or pt-online-schema-change.
PostgreSQL: Most simple changes (ADD COLUMN with no default, CREATE INDEX CONCURRENTLY, ADD CONSTRAINT NOT VALID then VALIDATE) can be done online if you do them in the right order.

The bigger pattern is "expand and contract":

Expand: Add the new column / table / index. Application keeps writing the old way and the new way.
Backfill: In a separate batch job, populate historical data. Do it in small batches to avoid long-running transactions.
Migrate reads: Switch the application to read from the new column.
Contract: Drop the old column.

Never combine "add the new thing" and "drop the old thing" in one deploy. That is how you cause an outage. If you roll back the deploy, the old code now has no column to write to.

A small bonus: things that look safe but are breaking changes

These are the ones that bite you after a few months in production, when someone says "I just added a tiny thing." Memorize this list:

Adding a required field to a request body — old clients break.
Renaming a field (even on a new version) — clients with cached schemas break.
Tightening validation (was 500 chars, now 255) — existing data fails on update.
Changing the sort order of a list endpoint — clients silently get wrong data.
Wrapping a response in an envelope ([...] → { "data": [...] }) — every client breaks.
Changing a timestamp format from "2025-12-01T10:30:00Z" to "2025-12-01T10:30:00+00:00" — both are valid ISO 8601, both are different strings, and string-comparing clients fail.

The pattern: anything that changes the shape or rules of a response is a breaking change, even when the data is "logically the same."

Your pre-ship checklist

A reasonable checklist for your next API or service:

[ ] Mutating endpoints accept an Idempotency-Key header.
[ ] Pagination is cursor-based by default.
[ ] URLs include /v1/ from day one.
[ ] One consistent error envelope across every endpoint.
[ ] HTTP status codes match what actually happened.
[ ] Naming convention enforced by a linter in CI.
[ ] Normalized schema by default, denormalized only on profiled hot paths.
[ ] Every foreign key has an index.
[ ] No N+1 queries — verified by logging actual SQL in dev.
[ ] Schema migrations go through expand-and-contract, never blocking changes on hot tables.

If you ship something with this list checked, you are already operating well above the level most beginner APIs hit production at.

Wrapping up

Here is the honest truth: nobody catches all of these on their first project. I did not. The people writing your favorite developer tools did not. The fastest path to leveling up is not memorizing patterns — it is recognizing the names of the failure modes when you read code, so you can spot them before they hit production.

What is the one mistake from this list that you learned the hard way? Drop it in the comments — I would love to hear it, and so would the next person reading this post.

Happy shipping.

DEV Community