As software engineers, we are often asked a familiar interview question:
Why do we use database indexing?
The standard answer is familiar to all of us:
- To make queries faster
- To reduce time complexity from O(n) to O(log n)
- By using data structures such as B-trees or B+-trees
- At the cost of extra storage and maintenance during insert, update, and delete operations
All of this is correct.
But this answer is incomplete.
There is a deeper reason behind indexing—one that becomes obvious only when you work with real production systems, cloud databases, and billing dashboards.
The Real Problem: Disk I/O, Not CPU
At the end of the day, databases store data on disk.
Even in modern systems with caching and memory layers:
- Memory access happens in nanoseconds
- Disk access happens in milliseconds
That makes disk access roughly a million times slower.
When a query does not use an index, the database often has no choice but to:
- Scan a large number of rows
- Read many data pages from disk
- Perform repeated I/O operations
This is where performance collapses—not because of algorithms, but because of physical disk access.
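You can watch the query planner make this choice in any database. Here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are invented for illustration): without an index the planner has no option but a full scan; once an index exists, it seeks directly to the matching rows.

```python
import sqlite3

# In-memory table with an assumed schema, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, i * 1.5) for i in range(10_000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the planner must scan every row
before = conn.execute(query).fetchall()[0][3]
print(before)   # e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, it seeks straight to the matching entries
after = conn.execute(query).fetchall()[0][3]
print(after)    # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

The exact plan text varies between SQLite versions, but the shift from `SCAN` to `SEARCH ... USING INDEX` is the whole story in miniature.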
Indexing Is an I/O Optimization
Indexes allow the database engine to:
- Jump directly to relevant data pages
- Read far fewer disk blocks
- Avoid full table scans
This results in:
- Fewer disk reads
- Better cache utilization
- Lower latency
In other words:
Indexing primarily reduces disk I/O, not just CPU work.
Speed is the symptom.
Reduced I/O is the cause.
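A back-of-the-envelope sketch shows the scale of the difference. The numbers below are illustrative assumptions, not measurements: roughly 100 rows per data page and a B+-tree branching factor of about 500.

```python
import math

ROWS = 1_000_000_000     # hypothetical billion-row table
ROWS_PER_PAGE = 100      # assumed rows per data page
FANOUT = 500             # assumed B+-tree branching factor

# A full table scan must read every data page
scan_pages = ROWS // ROWS_PER_PAGE

# An index lookup reads one root-to-leaf path, then one data page
tree_height = math.ceil(math.log(ROWS, FANOUT))
index_pages = tree_height + 1

print(f"full scan:    {scan_pages:,} page reads")   # 10,000,000
print(f"index lookup: {index_pages} page reads")    # 5
```

Ten million page reads versus five: that gap, not the Big-O notation, is what you feel in production.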
Disk Has a Lifespan and a Cost
Every disk read:
- Consumes I/O bandwidth
- Adds wear to storage devices (especially SSD-based systems)
- Competes with other workloads
In traditional on-prem systems, this means:
- Hardware degradation
- Capacity planning problems
In cloud systems, it means something very concrete:
Money.
Cloud Databases Charge for I/O
Managed databases such as Amazon Aurora (under its standard pricing model) charge based on:
- Storage
- Compute
- I/O requests
Behind the scenes, every disk read is counted.
So when a poorly indexed query:
- Scans millions of rows
- Performs unnecessary disk reads
- Runs repeatedly
You are not just slowing down the system—you are directly increasing your cloud bill.
A Realistic Example
Imagine a legacy system with:
- Minimal or missing indexes
- Heavy reporting queries
- High read traffic
Suppose it generates:
- 100 billion I/O requests per month
At that scale, even a tiny per-request cost becomes massive.
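To make that concrete, here is the arithmetic with an assumed price of $0.20 per million I/O requests. Rates vary by provider, region, and pricing tier, so treat the figure purely as an illustration:

```python
monthly_io_requests = 100_000_000_000   # 100 billion I/Os per month
price_per_million = 0.20                # assumed USD rate, for illustration

monthly_cost = monthly_io_requests / 1_000_000 * price_per_million
print(f"${monthly_cost:,.0f} per month")   # $20,000 per month
```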
Now introduce:
- Proper indexing
- Query optimization
- Better access patterns
If indexing reduces disk reads by 80–90%, then:
- Query latency drops
- System stability improves
- Cloud cost drops dramatically
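Continuing the illustrative numbers from above (100 billion I/Os per month at an assumed $0.20 per million), an 80–90% reduction in disk reads translates directly into dollars:

```python
monthly_io_requests = 100_000_000_000   # 100 billion I/Os per month
price_per_million = 0.20                # assumed USD rate, for illustration
baseline = monthly_io_requests / 1_000_000 * price_per_million

for reduction in (0.80, 0.90):
    saved = baseline * reduction
    print(f"{reduction:.0%} fewer reads -> ${saved:,.0f}/month saved")
```

Under these assumptions, that is $16,000–$18,000 per month saved by adding the right indexes.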
This is not theoretical.
This happens all the time in real production systems.
Yes, Indexes Have a Cost — and That’s OK
Indexes do introduce trade-offs:
- Extra storage
- Slower writes
- Maintenance overhead during INSERT, UPDATE, and DELETE
But in most production systems:
- Reads vastly outnumber writes
- Disk I/O is the dominant bottleneck
- Cloud billing is tied to I/O
In these environments, indexing is almost always the right trade-off.
A Better Answer to “Why Do We Use Indexing?”
Instead of saying:
“Indexing makes queries faster.”
A more accurate answer is:
Indexing reduces disk I/O, which improves performance, extends hardware lifespan, and significantly lowers cloud costs.
Final Takeaway
Indexing exists because:
- Disk access is slow
- Disk access is expensive
- Disk access is limited
- Disk access is billed in the cloud
Speed is just the visible benefit.
Cost and hardware reality are the real reasons.
Credits & References
Concept inspiration and database fundamentals:
Berkeley CS186 – Introduction to Database Systems
University of California, Berkeley
https://cs186berkeley.net/