DEV Community

Rahman Nugar
How Data Access Breaks Down Under Load

Most database performance problems do not begin with obviously slow queries.

In fact, many systems look perfectly fine early on. Queries execute quickly, endpoints respond in a few milliseconds, and the database appears healthy. Even when something starts slowing down, the fix usually feels simple enough. Add an index, reduce a join, cache a response, and move on.

That works for a while, which is part of what makes these issues deceptive.

Because in production systems, performance problems rarely come from a single query being expensive in isolation. More often, they emerge from how the application accesses data as traffic increases and concurrency grows.

A request that looks harmless locally may execute dozens of queries. Some of those queries may repeat unnecessarily. Some may fetch significantly more data than is actually needed. Others may hold connections longer than expected or compete with other transactions under load. None of these things look particularly dangerous on their own, but once they begin happening across hundreds or thousands of requests, the behavior of the system changes completely.

Table of Contents

  1. The Real Problem
  2. Profiling Data Access Properly
  3. N+1 Query Problem
  4. Overfetching and Data Shape
  5. Indexes and Access Patterns
  6. Connection Pools and Throughput
  7. Transactions, Isolation, and Contention
  8. Deadlocks
  9. How These Problems Compound
  10. Closing Thoughts

1. The Real Problem

A database does not receive queries in isolation. It receives a continuous stream of queries generated by application code, and that stream is shaped entirely by how requests are implemented.

As engineers, we often think in terms of individual operations:

  • fetch user
  • fetch orders
  • fetch comments

But the database does not see isolated operations. It sees concurrent requests, connection pressure, transaction overlap, repeated scans, lock contention between related requests, and large volumes of queries arriving continuously from the application layer.

This is why systems that feel “fast” in development can degrade surprisingly quickly in production.

Locally, you might test with:

  • 20 users
  • small datasets
  • almost no concurrency

Production changes all of those assumptions simultaneously. The dataset becomes larger. More requests execute at the same time. Queries that looked harmless start repeating at scale. Operations that used to take milliseconds begin waiting on locks or connections instead of execution itself.

2. Profiling Data Access Properly

Most engineers start profiling from the database. They look for slow queries, run EXPLAIN ANALYZE, and try to optimize whatever looks expensive.

That approach is incomplete, because it assumes the problem is within a single query. A query that takes 300ms once may not be a real problem. A query that takes 3ms but executes 200 times per request usually is.

In reality, you need to start from the request.

Take an endpoint and break it down in terms of what it actually does to the database. Not just the result it returns, but the work it generates.

For example:

```
GET /dashboard

query_count: 74
total_db_time: 118ms
max_query_time: 6ms
```

Nothing there looks especially alarming until you realize the endpoint executes seventy-four queries.

At low traffic, that may still feel fast enough. Under concurrency, it becomes a completely different problem because every request now generates far more database work than expected.
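One way to surface these numbers is to route every database call through a small accounting wrapper. The sketch below is illustrative (the names `QueryStats` and `tracked` are made up, not from any specific library), but the idea transfers to most ORMs and drivers that expose query hooks or middleware:

```typescript
// A minimal sketch of request-level query accounting. Every database call
// goes through `tracked`, which records how many queries a request issued,
// the total time spent, and the slowest single query.
type QueryStats = { queryCount: number; totalDbTimeMs: number; maxQueryTimeMs: number };

function newStats(): QueryStats {
  return { queryCount: 0, totalDbTimeMs: 0, maxQueryTimeMs: 0 };
}

async function tracked<T>(stats: QueryStats, run: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await run();
  } finally {
    const elapsed = Date.now() - start;
    stats.queryCount += 1;
    stats.totalDbTimeMs += elapsed;
    stats.maxQueryTimeMs = Math.max(stats.maxQueryTimeMs, elapsed);
  }
}

// Simulated handler: three "queries" for one request.
const stats = newStats();
await tracked(stats, async () => "user row");
await tracked(stats, async () => "orders");
await tracked(stats, async () => "comments");
console.log(stats.queryCount); // 3
```

Logging these stats per endpoint is often what first reveals a 74-query request, long before any single query shows up as slow.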

Repetition Is Often Worse Than Latency

A very common production issue is not necessarily slow queries, but repeated queries.

Something like:

```sql
SELECT * FROM posts WHERE user_id = ?
```

executed over and over again within the same request leads to the N+1 query problem.

Execution Time Is Not the Full Cost

Another thing that becomes important under load is separating execution time from wait time.

A query might execute quickly once it reaches the database, but still spend significant time:

  • waiting for a connection
  • waiting on another transaction
  • waiting for locks to clear

When EXPLAIN Actually Helps

EXPLAIN ANALYZE becomes useful once you confirm the issue is inside the query itself.

For example, if a query unexpectedly performs a sequential scan over a very large table:

```
Seq Scan on sessions
Rows Removed by Filter: 900000
```

that is usually a strong signal that:

  • an index is missing
  • the existing index is unusable
  • or the query shape does not match the access pattern properly

But EXPLAIN only helps with localized inefficiency. It does not tell you whether the application is generating too many queries overall, saturating the connection pool, or overfetching aggressively.

3. N+1 Query Problem

The N+1 problem is one of the easiest ways to accidentally overload a database without realizing it early enough.

The reason it slips through so often is because the code usually looks somewhat reasonable.

```ts
const users = await prisma.user.findMany()

for (const user of users) {
  const posts = await prisma.post.findMany({
    where: {
      userId: user.id
    }
  })
}
```

Nothing about this feels obviously dangerous during development.

If there are 10 users, this endpoint executes:

  • 1 query to fetch users
  • 10 additional queries to fetch posts

The issue is not that the queries are individually slow. The issue is that the amount of database work now scales with the number of records returned.

As the dataset grows:

  • 100 users → 101 queries
  • 500 users → 501 queries

The endpoint becomes progressively more expensive simply because more rows are returned.

Under concurrency, this gets worse very quickly because every request repeats the same inefficient access pattern against the database.

One common mistake is assuming that concurrency fixes the problem.

For example:

```ts
await Promise.all(
  users.map((user) =>
    prisma.post.findMany({
      where: {
        userId: user.id
      }
    })
  )
)
```

This may reduce the response time of the endpoint because the queries now execute concurrently, but the application is still generating one query per user.

The access pattern itself has not changed.

In some cases, this can actually make things worse because the application now sends many related queries to the database at the same time instead of sequentially. Under enough traffic, that creates even more connection pressure and query contention.

The proper fix is usually to fetch related data via joins or batch queries.

For example:

```ts
const users = await prisma.user.findMany({
  include: {
    posts: true
  }
})
```

or

```ts
const userIds = users.map((user) => user.id)

const posts = await prisma.post.findMany({
  where: {
    userId: {
      in: userIds
    }
  }
})
```
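With the batch approach, the related rows still need to be stitched back onto their parent records in memory. A minimal sketch in plain TypeScript (the row shapes are illustrative, not tied to any schema):

```typescript
// Group a batch-fetched child list back onto parent records in one pass.
type User = { id: number; name: string };
type Post = { id: number; userId: number; title: string };

function groupPostsByUser(users: User[], posts: Post[]): Map<number, Post[]> {
  // Seed every user with an empty list so users without posts still appear.
  const byUser = new Map<number, Post[]>(users.map((u) => [u.id, []]));
  for (const post of posts) {
    byUser.get(post.userId)?.push(post);
  }
  return byUser;
}

const users: User[] = [{ id: 1, name: "a" }, { id: 2, name: "b" }];
const posts: Post[] = [
  { id: 10, userId: 1, title: "first" },
  { id: 11, userId: 1, title: "second" },
  { id: 12, userId: 2, title: "third" },
];

const grouped = groupPostsByUser(users, posts);
console.log(grouped.get(1)?.length); // 2
```

The database now sees two queries regardless of how many users exist, and the O(n) grouping happens cheaply in application memory.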

4. Overfetching and Data Shape

In many systems, the problem is not necessarily that queries are slow. The problem is that queries return significantly more data than the request actually needs.

For example:

  • selecting entire rows instead of specific fields
  • loading large relations unnecessarily
  • returning deeply nested objects to clients that only need summaries

ORMs make this particularly easy because the queries are hidden behind ordinary application code.

Something like:

```ts
include: {
  posts: true
}
```

might technically solve an N+1 problem while still introducing unnecessary work elsewhere.

Maybe the client only needs:

  • post title
  • createdAt
  • latest 5 posts

but the query now returns:

  • full post bodies
  • metadata
  • all historical posts
  • nested relations

At low traffic, this usually feels harmless. Under load, however, it becomes expensive because every unnecessary field now contributes to:

  • larger payloads
  • higher memory usage
  • longer serialization time
  • increased network transfer
  • more cache pressure
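The fix is to shape the data to the request. In plain TypeScript (field names illustrative), that means trimming full rows down to the summary the client actually needs before serialization; with an ORM like Prisma, the same trimming can usually be pushed into the query itself via field selection and limits:

```typescript
// Trim full post rows down to what the client needs: title, createdAt,
// and only the latest 5 posts.
type FullPost = { id: number; title: string; body: string; createdAt: Date };
type PostSummary = { title: string; createdAt: Date };

function latestSummaries(posts: FullPost[], limit = 5): PostSummary[] {
  return [...posts]
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime()) // newest first
    .slice(0, limit)
    .map(({ title, createdAt }) => ({ title, createdAt })); // drop body & metadata
}

// Eight posts, each carrying a large body the client never reads.
const fullPosts: FullPost[] = Array.from({ length: 8 }, (_, i) => ({
  id: i,
  title: `post ${i}`,
  body: "a very large body...",
  createdAt: new Date(2024, 0, i + 1),
}));

const summaries = latestSummaries(fullPosts);
console.log(summaries.length); // 5
console.log(summaries[0].title); // "post 7" (the newest)
```

Pushing this into the query is strictly better than trimming afterwards, because the database then never reads or transfers the unneeded columns and rows at all.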

5. Indexes and Access Patterns

A lot of engineers think of indexes as generic speed improvements, but indexes are really about access patterns. They determine how efficiently the database can locate data as datasets grow. Without an index, the database often has no efficient way to locate matching rows, so it scans through the table looking for them.

With an index, the database can navigate directly to the relevant portion of data instead of traversing everything.

Imagine an orders table with 1,000 rows. Scanning the entire table to return one user's orders may feel fast at that size, but the same query can become extremely expensive at 10 million rows even though the query itself never changed.

Composite Indexes

Composite indexes become important once queries filter or sort across multiple fields.

For example:

```prisma
@@index([userId, revoked])
```

works well for queries like:

```sql
WHERE userId = ?
AND revoked = false
```

One thing to note about composite indexes is that they are order-sensitive: the database can only use the index efficiently when the query filters on its leading columns.

```prisma
@@index([userId, revoked])
```

works for queries that filter on both columns:

```sql
WHERE userId = ?
AND revoked = false
```

and for queries that filter on the leading column alone:

```sql
WHERE userId = ?
```

but not for a query that filters only on the trailing column:

```sql
WHERE revoked = false
```
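The intuition can be sketched in plain TypeScript by modeling the index as rows bucketed on the leading column (an illustration of the principle, not of real B-tree internals):

```typescript
// Model a composite index on (userId, revoked) as a map keyed by the
// leading column. Lookups that supply userId jump straight to a small
// bucket; a lookup by revoked alone has no entry point and must scan.
type Session = { userId: number; revoked: boolean };

const sessions: Session[] = [
  { userId: 1, revoked: false },
  { userId: 1, revoked: true },
  { userId: 2, revoked: false },
];

// The "index": rows grouped by the leading column.
const byUserId = new Map<number, Session[]>();
for (const s of sessions) {
  const bucket = byUserId.get(s.userId) ?? [];
  bucket.push(s);
  byUserId.set(s.userId, bucket);
}

// WHERE userId = 1 AND revoked = false: narrow to one bucket, then filter.
const active = (byUserId.get(1) ?? []).filter((s) => !s.revoked);

// WHERE revoked = false alone: the "index" offers no entry point, so every
// row must be examined — the equivalent of a sequential scan.
const allActive = sessions.filter((s) => !s.revoked);

console.log(active.length); // 1
console.log(allActive.length); // 2
```

A query that skips the leading column degrades to examining every row, which is exactly the sequential-scan pattern EXPLAIN revealed earlier.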

Indexes Are Not Free

Indexes improve reads at the cost of additional work elsewhere.

Every insert, update, or delete must now maintain those index structures as well.

Too many indexes can:

  • slow writes
  • increase storage usage
  • increase memory pressure
  • create unnecessary maintenance overhead

The right approach is to think in terms of trade-offs: index the columns that read-heavy queries actually filter and sort on, and avoid indexes that no query uses.

6. Connection Pools and Throughput

Every query requires a database connection to fetch data or mutate state. These connections are not lightweight resources. They consume:

  • memory
  • CPU scheduling overhead
  • active database resources

Without pooling, applications can overwhelm the database very quickly under concurrency. With a pool, instead of opening a new connection per request, the application maintains a reusable set of active connections shared across requests.

At that point, the system effectively becomes a queue:

  • requests wait for available connections
  • queries execute
  • connections return to the pool
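That queueing behavior can be sketched as a tiny semaphore-style pool (illustrative only; real pools add timeouts, health checks, and connection lifecycle handling):

```typescript
// A minimal connection-pool sketch: at most `size` callers hold a slot
// at once; the rest wait in FIFO order until a slot is released.
class TinyPool {
  private waiting: Array<() => void> = [];
  private inUse = 0;
  constructor(private size: number) {}

  async acquire(): Promise<void> {
    if (this.inUse < this.size) {
      this.inUse++;
      return;
    }
    // Pool saturated: park this caller until someone releases.
    await new Promise<void>((resolve) => this.waiting.push(resolve));
  }

  release(): void {
    const next = this.waiting.shift();
    if (next) next(); // hand the slot directly to the next waiter
    else this.inUse--;
  }
}

const pool = new TinyPool(2);
let running = 0;
let peak = 0;

async function work(): Promise<void> {
  await pool.acquire(); // may wait here, not in the "database"
  running++;
  peak = Math.max(peak, running);
  await new Promise((r) => setTimeout(r, 10)); // simulated query
  running--;
  pool.release();
}

// Five concurrent requests against a pool of two connections.
await Promise.all([work(), work(), work(), work(), work()]);
console.log(peak); // 2: concurrency never exceeds the pool size
```

Note where the time goes: the "query" itself is always 10ms, but three of the five requests spend extra time parked in `acquire` — which is exactly the execution-time-versus-wait-time split described earlier.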

Pool Saturation

Once all connections are busy:

  • new queries must wait
  • request latency increases
  • timeouts begin appearing

This is one reason systems sometimes feel slow even when individual queries are relatively fast.

The queries themselves may execute quickly once they reach the database, but they spend significant time waiting for available connections beforehand.

Pool Size Trade-offs

A pool that is too small increases queueing and latency. A pool that is too large creates a different problem.

Too many concurrent connections increase:

  • context switching
  • lock contention
  • database scheduling overhead

Eventually the database spends more time coordinating active work than executing queries efficiently.

This is also why tools like PgBouncer become important in high concurrency systems. They reduce connection overhead and help applications manage large numbers of requests more efficiently.

Pooling, however, does not fix inefficient access patterns. It only controls how much concurrent pressure reaches the database at once.

A badly optimized endpoint behind a connection pool is still a badly optimized endpoint. The pool simply delays the point at which the system begins struggling.

This distinction becomes important because many performance issues that appear to be “database problems” are actually workload problems created by the application layer itself.

7. Transactions, Isolation, and Contention

A transaction is a set of operations that succeed or fail together. Relational databases run work inside transactions, which guarantees atomicity, but atomicity alone does not guarantee correctness under concurrency.

Once systems start handling large amounts of concurrent traffic, correctness becomes just as important as performance.

Multiple requests may now attempt to:

  • update the same rows
  • read partially changing data
  • modify shared resources simultaneously

Without proper transaction control, this can create subtle inconsistencies that are extremely difficult to debug once they appear in production.

Isolation

Isolation levels determine how transactions behave while other transactions are executing at the same time.

For example:

  • can one transaction read uncommitted changes from another?
  • can the same query return different results within the same transaction?
  • can two transactions overwrite each other’s updates?

These problems become more visible once systems become highly concurrent.

One common example is the lost update problem.

Imagine two concurrent requests reading the same balance:

```
Balance = 100

Transaction A reads 100
Transaction B reads 100

Transaction A updates to 50
Transaction B updates to 50
```

The final balance becomes 50 instead of 0 because both transactions operated on stale state.
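The anomaly can be reproduced in plain TypeScript by letting both "transactions" read before either writes, along with the shape of the fix: expressing the update relative to current state (in SQL terms, `UPDATE ... SET balance = balance - 50` rather than read-then-write):

```typescript
// Lost update: both transactions read the same stale balance,
// then each writes back 100 - 50 = 50.
let balance = 100;

const readA = balance; // Transaction A reads 100
const readB = balance; // Transaction B reads 100

balance = readA - 50; // Transaction A writes 50
balance = readB - 50; // Transaction B also writes 50; A's update is lost

console.log(balance); // 50, not the expected 0

// The fix in miniature: make each update relative to current state,
// so the two decrements compose instead of overwriting each other.
let balance2 = 100;
const atomicDecrement = (amount: number) => { balance2 -= amount; };
atomicDecrement(50); // Transaction A
atomicDecrement(50); // Transaction B
console.log(balance2); // 0
```

In a real database the same effect is achieved with atomic updates, `SELECT ... FOR UPDATE`, or a sufficiently strict isolation level, rather than application-level discipline.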

This is why isolation levels exist.

Lower isolation levels improve throughput because transactions coordinate less aggressively, but they also allow more concurrency anomalies.

Higher isolation levels improve correctness but increase coordination overhead and reduce concurrency, meaning concurrent requests behave more like a one-after-another flow.

For example:

  • analytics systems may tolerate slightly stale reads
  • payment systems usually cannot

One thing that becomes important under heavy traffic is understanding that transactions also hold resources while they remain open.

Long-running transactions increase:

  • lock duration
  • contention
  • connection occupancy

which means poorly designed transaction boundaries can indirectly degrade the entire system under load.

8. Deadlocks

Deadlocks occur when multiple transactions acquire shared resources in different orders and end up waiting on each other in a cycle, so that none of them can proceed.

For example:


```
Transaction A locks Row 1
Transaction B locks Row 2

Transaction A now waits for Row 2
Transaction B now waits for Row 1
```

Neither transaction can continue, so the database resolves the deadlock by aborting one of them.

Deadlocks are a sign that the system does not have a consistent way of coordinating access.

One of the most effective ways to reduce deadlocks is maintaining consistent lock ordering.

For example, if multiple transactions must update two rows:

  • always lock the smaller ID first
  • always acquire resources in the same sequence
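The ordering rule itself is tiny. A minimal sketch (illustrative; in a real system the ordered IDs would drive row locks via `SELECT ... FOR UPDATE` or equivalent):

```typescript
// Always acquire locks in ascending ID order, regardless of which row
// the caller listed first. Any two transactions touching the same pair
// of rows then lock them in the same sequence, so a wait cycle cannot form.
function lockOrder(a: number, b: number): [number, number] {
  return a < b ? [a, b] : [b, a];
}

// Transaction A transfers 1 -> 2; Transaction B transfers 2 -> 1:
console.log(lockOrder(1, 2)); // [1, 2]
console.log(lockOrder(2, 1)); // [1, 2] — same order, no deadlock cycle
```

Without this normalization, A holds Row 1 waiting for Row 2 while B holds Row 2 waiting for Row 1, which is exactly the cycle shown above.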

Keeping transactions short also matters.

The longer a transaction remains open:

  • the longer locks are held
  • the more overlap exists with other transactions
  • the higher the probability of contention

9. How These Problems Compound

One thing that makes production performance difficult is that these issues rarely appear independently.

A system may begin with a small N+1 pattern.

That increases query volume.

Higher query volume increases connection usage.

Connection pressure increases wait time.

Longer waits increase transaction duration.

Longer transactions increase lock contention.

More contention increases the likelihood of deadlocks and throughput collapse.

At that point, the system is no longer failing because of one slow query. The entire access pattern of the application has become unstable under concurrency.

10. Closing Thoughts

Most database performance issues are not caused by one catastrophic query. They usually emerge gradually from small inefficiencies that compound under load:

  • repeated queries
  • excessive payloads
  • inefficient indexing
  • poor transaction boundaries
  • uncontrolled concurrency

Many of these systems still feel “fast” early on, which is part of why these problems are easy to ignore initially.

The challenge is that production traffic changes the behavior of the system completely. Queries that once felt harmless begin repeating at scale, transactions overlap more aggressively, connection pressure increases, and small inefficiencies start compounding into system-wide bottlenecks.

At that point, database performance is no longer just about whether a query is fast. It becomes about whether the overall pattern of data access can still hold up under concurrency.
