
Shuwen


Why We Do Indexing: Not Just for Speed, but for Cost and Hardware Reality

As software engineers, we are often asked a classic interview question:

Why do we use database indexing?

The standard answer is familiar to all of us:

  • To make queries faster
  • To reduce time complexity from O(n) to O(log n)
  • By using data structures such as B-trees or B+-trees
  • At the cost of extra storage and maintenance during insert, update, and delete operations
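
The complexity claim is easy to demonstrate without a database. Below is a minimal Python sketch that uses a sorted list as a stand-in for a B-tree's ordered keys: a linear scan touches every entry, while binary search via the standard-library `bisect` module needs only about log₂(n) comparisons:

```python
from bisect import bisect_left

# A sorted list stands in for the ordered keys of a B-tree index.
keys = list(range(1_000_000))

def linear_scan(keys, target):
    """O(n): examine entries one by one, like a full table scan."""
    steps = 0
    for k in keys:
        steps += 1
        if k == target:
            return steps
    return steps

def indexed_lookup(keys, target):
    """O(log n): binary search over ordered keys, like a B-tree descent."""
    i = bisect_left(keys, target)  # ~log2(n) comparisons, about 20 here
    return i < len(keys) and keys[i] == target

print(linear_scan(keys, 999_999))    # 1000000 steps in the worst case
print(indexed_lookup(keys, 999_999))
```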

All of this is correct.

But this answer is incomplete.

There is a deeper reason behind indexing—one that becomes obvious only when you work with real production systems, cloud databases, and billing dashboards.


The Real Problem: Disk I/O, Not CPU

At the end of the day, databases store data on disk.

Even in modern systems with caching and memory layers:

  • Memory access happens in nanoseconds
  • Disk access happens in milliseconds

That makes disk access thousands to millions of times slower, depending on the storage medium.

When a query does not use an index, the database often has no choice but to:

  1. Scan a large number of rows
  2. Read many data pages from disk
  3. Perform repeated I/O operations

This is where performance collapses—not because of algorithms, but because of physical disk access.
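
You can watch this happen with SQLite's query planner. The sketch below (table and column names are illustrative) creates an unindexed table and asks the planner how it would execute a filtered query; the plan reports a full table scan:

```python
import sqlite3

# An in-memory database with an unindexed table: for the WHERE clause,
# the query planner's only option is to scan every row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 1000, 9.99) for i in range(10_000)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan)  # the detail column contains 'SCAN': every page gets read
```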


Indexing Is an I/O Optimization

Indexes allow the database engine to:

  • Jump directly to relevant data pages
  • Read far fewer disk blocks
  • Avoid full table scans

This results in:

  • Fewer disk reads
  • Better cache utilization
  • Lower latency

In other words:

Indexing primarily reduces disk I/O, not just CPU work.
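
A quick SQLite sketch makes the difference visible (table and index names are illustrative): after `CREATE INDEX`, the planner reports an index search instead of a full scan:

```python
import sqlite3

# Same table as before, but now with an index on the filtered column.
# The planner can descend the index and jump straight to matching rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 1000, 9.99) for i in range(10_000)])
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()
print(plan)  # the detail column now contains 'USING INDEX idx_orders_customer'
```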

Speed is the symptom.
Reduced I/O is the cause.


Disk Has a Lifespan and a Cost

Every disk read:

  • Consumes I/O bandwidth
  • Adds wear to storage devices (especially SSD-based systems)
  • Competes with other workloads

In traditional on-prem systems, this means:

  • Hardware degradation
  • Capacity planning problems

In cloud systems, it means something very concrete:

Money.


Cloud Databases Charge for I/O

Managed databases such as Amazon Aurora charge based on:

  • Storage
  • Compute
  • I/O requests

Behind the scenes, every disk read is counted.

So when a poorly indexed query:

  • Scans millions of rows
  • Performs unnecessary disk reads
  • Runs repeatedly

You are not just slowing down the system—you are directly increasing your cloud bill.


A Realistic Example

Imagine a legacy system with:

  • Minimal or missing indexes
  • Heavy reporting queries
  • High read traffic

Suppose it generates:

  • 100 billion I/O requests per month

At that scale, even a tiny per-request cost becomes massive.

Now introduce:

  • Proper indexing
  • Query optimization
  • Better access patterns

If indexing reduces disk reads by 80–90%, then:

  • Query latency drops
  • System stability improves
  • Cloud cost drops dramatically

This is not theoretical.
This happens all the time in real production systems.
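
Some back-of-the-envelope arithmetic makes the scale concrete, assuming a hypothetical rate of $0.20 per million I/O requests (in the ballpark of Aurora's published standard I/O pricing; check your provider's current rates):

```python
# Hypothetical cloud I/O cost model: $0.20 per million I/O requests.
# The rate is an assumption for illustration, not a quoted price.
PRICE_PER_MILLION_IO = 0.20  # USD

monthly_io = 100_000_000_000  # 100 billion I/O requests per month
baseline_cost = monthly_io / 1_000_000 * PRICE_PER_MILLION_IO
print(f"Before indexing: ${baseline_cost:,.0f}/month")   # $20,000/month

reduction = 0.85  # indexing cuts disk reads by roughly 80-90%
optimized_cost = baseline_cost * (1 - reduction)
print(f"After indexing:  ${optimized_cost:,.0f}/month")  # $3,000/month
```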


Yes, Indexes Have a Cost — and That’s OK

Indexes do introduce trade-offs:

  • Extra storage
  • Slower writes
  • Maintenance overhead during INSERT, UPDATE, and DELETE
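
The storage cost is easy to measure. The SQLite sketch below (schema is illustrative) counts database pages before and after creating an index; the extra pages are the index's footprint:

```python
import sqlite3

# Count database pages before and after adding an index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, user_id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(i, i % 500, "x" * 50) for i in range(20_000)])

pages_before = conn.execute("PRAGMA page_count").fetchone()[0]
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
pages_after = conn.execute("PRAGMA page_count").fetchone()[0]
print(pages_before, pages_after)  # the index occupies additional pages
```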

But in most production systems:

  • Reads vastly outnumber writes
  • Disk I/O is the dominant bottleneck
  • Cloud billing is tied to I/O

In these environments, indexing is almost always the right trade-off.


A Better Answer to “Why Do We Use Indexing?”

Instead of saying:

“Indexing makes queries faster.”

A more accurate answer is:

Indexing reduces disk I/O, which improves performance, extends hardware lifespan, and significantly lowers cloud costs.


Final Takeaway

Indexing exists because:

  • Disk access is slow
  • Disk access is expensive
  • Disk access is limited
  • Disk access is billed in the cloud

Speed is just the visible benefit.
Cost and hardware reality are the real reasons.


Credits & References

Concept inspiration and database fundamentals:
Berkeley CS186 – Introduction to Database Systems
University of California, Berkeley
https://cs186berkeley.net/
