As software engineers, we are often asked a familiar interview question:
Why do we use database indexing?
The standard answer is familiar to all of us:
- To make queries faster
- To reduce time complexity from O(n) to O(log n)
- By using data structures such as B-trees or B+-trees
- At the cost of extra storage and maintenance during insert, update, and delete operations
All of this is correct.
But this answer is incomplete.
There is a deeper reason behind indexing—one that becomes obvious only when you work with real production systems, cloud databases, and billing dashboards.
The Real Problem: Disk I/O, Not CPU
At the end of the day, databases store data on disk.
Even in modern systems with caching and memory layers:
- Memory access happens in nanoseconds
- Disk access happens in milliseconds
That makes disk access roughly a million times slower.
When a query does not use an index, the database often has no choice but to:
- Scan a large number of rows
- Read many data pages from disk
- Perform repeated I/O operations
This is where performance collapses—not because of algorithms, but because of physical disk access.
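You can watch the query planner make this choice in any database. Here is a minimal sketch using Python's built-in sqlite3 module (the table and column names are invented for illustration): without an index the planner has no option but a full scan; once an index exists, it seeks directly to the matching rows.

```python
import sqlite3

# In-memory table with an assumed schema, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, i * 1.5) for i in range(10_000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the planner must scan every row
before = conn.execute(query).fetchall()[0][3]
print(before)   # e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# With the index, it seeks straight to the matching entries
after = conn.execute(query).fetchall()[0][3]
print(after)    # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

The exact plan text varies between SQLite versions, but the shift from `SCAN` to `SEARCH ... USING INDEX` is the whole story in miniature.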
Indexing Is an I/O Optimization
Indexes allow the database engine to:
- Jump directly to relevant data pages
- Read far fewer disk blocks
- Avoid full table scans
This results in:
- Fewer disk reads
- Better cache utilization
- Lower latency
In other words:
Indexing primarily reduces disk I/O, not just CPU work.
Speed is the symptom.
Reduced I/O is the cause.
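A back-of-the-envelope sketch shows the scale of the difference. The numbers below are illustrative assumptions, not measurements: roughly 100 rows per data page and a B+-tree branching factor of about 500.

```python
import math

ROWS = 1_000_000_000     # hypothetical billion-row table
ROWS_PER_PAGE = 100      # assumed rows per data page
FANOUT = 500             # assumed B+-tree branching factor

# A full table scan must read every data page
scan_pages = ROWS // ROWS_PER_PAGE

# An index lookup reads one root-to-leaf path, then one data page
tree_height = math.ceil(math.log(ROWS, FANOUT))
index_pages = tree_height + 1

print(f"full scan:    {scan_pages:,} page reads")   # 10,000,000
print(f"index lookup: {index_pages} page reads")    # 5
```

Ten million page reads versus five: that gap, not the Big-O notation, is what you feel in production.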
Disk Has a Lifespan and a Cost
Every disk read:
- Consumes I/O bandwidth
- Adds wear to storage devices (especially SSD-based systems)
- Competes with other workloads
In traditional on-prem systems, this means:
- Hardware degradation
- Capacity planning problems
In cloud systems, it means something very concrete:
Money.
Cloud Databases Charge for I/O
Managed databases such as Amazon Aurora (under its standard pricing model) charge based on:
- Storage
- Compute
- I/O requests
Behind the scenes, every disk read is counted.
So when a poorly indexed query:
- Scans millions of rows
- Performs unnecessary disk reads
- Runs repeatedly
You are not just slowing down the system—you are directly increasing your cloud bill.
A Realistic Example
Imagine a legacy system with:
- Minimal or missing indexes
- Heavy reporting queries
- High read traffic
Suppose it generates:
- 100 billion I/O requests per month
At that scale, even a tiny per-request cost becomes massive.
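To make that concrete, here is the arithmetic with an assumed price of $0.20 per million I/O requests. Rates vary by provider, region, and pricing tier, so treat the figure purely as an illustration:

```python
monthly_io_requests = 100_000_000_000   # 100 billion I/Os per month
price_per_million = 0.20                # assumed USD rate, for illustration

monthly_cost = monthly_io_requests / 1_000_000 * price_per_million
print(f"${monthly_cost:,.0f} per month")   # $20,000 per month
```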
Now introduce:
- Proper indexing
- Query optimization
- Better access patterns
If indexing reduces disk reads by 80–90%, then:
- Query latency drops
- System stability improves
- Cloud cost drops dramatically
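Continuing the illustrative numbers from above (100 billion I/Os per month at an assumed $0.20 per million), an 80–90% reduction in disk reads translates directly into dollars:

```python
monthly_io_requests = 100_000_000_000   # 100 billion I/Os per month
price_per_million = 0.20                # assumed USD rate, for illustration
baseline = monthly_io_requests / 1_000_000 * price_per_million

for reduction in (0.80, 0.90):
    saved = baseline * reduction
    print(f"{reduction:.0%} fewer reads -> ${saved:,.0f}/month saved")
```

Under these assumptions, that is $16,000–$18,000 per month saved by adding the right indexes.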
This is not theoretical.
This happens all the time in real production systems.
Yes, Indexes Have a Cost — and That’s OK
Indexes do introduce trade-offs:
- Extra storage
- Slower writes
- Maintenance overhead during INSERT, UPDATE, and DELETE
But in most production systems:
- Reads vastly outnumber writes
- Disk I/O is the dominant bottleneck
- Cloud billing is tied to I/O
In these environments, indexing is almost always the right trade-off.
A Better Answer to “Why Do We Use Indexing?”
Instead of saying:
“Indexing makes queries faster.”
A more accurate answer is:
Indexing reduces disk I/O, which improves performance, extends hardware lifespan, and significantly lowers cloud costs.
Final Takeaway
Indexing exists because:
- Disk access is slow
- Disk access is expensive
- Disk access is limited
- Disk access is billed in the cloud
Speed is just the visible benefit.
Cost and hardware reality are the real reasons.
Credits & References
Concept inspiration and database fundamentals:
Berkeley CS186 – Introduction to Database Systems
University of California, Berkeley
https://cs186berkeley.net/