Athreya aka Maneshwar

Posted on Dec 25, 2025 • Edited on Dec 26, 2025

Indexes and the Rise of the DBMS

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building **one place for all dev tools, cheat codes, and TLDRs* — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.*

In the previous post, we saw how the relational model gives structure to data entities become relations, relationships become tables, and integrity constraints define what a valid database state looks like.

But structure alone is not enough.

Once data grows beyond trivial size, how we access relations becomes just as important as what they represent.

This is where indexing and the database management system (DBMS) enter the picture.

Why Indexes Exist

A full relation scan reading every row in a table is the most basic way to answer a query.

It is also one of the most expensive operations a database can perform.

Given what we already know about disks:

Reading data happens block by block
Disk I/O dominates execution time
Sequential scans over large relations are costly

It should be clear why databases need a faster way to locate rows.

An index is a data structure built on top of a relation to support efficient lookup of rows based on values of one or more attributes.

An index does not store the data itself. Instead, it stores:

Search key values
Pointers or references to rows in the base relation
Routing information that guides the lookup along the most efficient path

Indexes are not part of the relational model itself, but they are indispensable in real systems.

Indexes and Relations

Some important properties of indexes in relational databases:

An index is always associated with one base relation
A relation may have:
- No index
- One index
- Many indexes
Indexes are defined on one or more attributes, collectively called the index search key
The search key does not have to be a key of the relation

If the index search key is the primary key of the relation, the index is called a primary index. Otherwise, it is a secondary index.

Most systems allow many secondary indexes on the same relation.

Indexes are created and dropped by users, but they are maintained automatically by the database system.

Users never directly read from or write to an index.

Whenever rows are:

Inserted
Deleted
Updated

the DBMS updates all affected indexes behind the scenes.

This maintenance is not free. Adding more indexes improves read performance but slows down writes, since every modification must also update index structures.

Hash Indexes and Tree Indexes

Two families of index structures dominate relational systems.

Hash Indexes

A hash index is optimized for fast point lookups.

It works by:

Applying a hash function to the search key value
Mapping the value to a bucket
Searching only within that bucket

Hash indexes are extremely fast when querying for exact matches.

However, they have a major limitation:

They do not support range queries efficiently

Asking for values between two bounds requires scanning many or all buckets, defeating the purpose of the index.

B-Tree and B+-Tree Indexes

A B-tree index maintains search key values in a sorted order.

This ordering allows:

Efficient point lookups
Efficient range queries
Ordered traversal without scanning the entire relation

Each node in a B-tree contains:

Search key values
Pointers to child nodes

Traversal decisions are made by comparing the search key at each node.

A widely used variant is the B+-tree:

Internal nodes contain only routing information
Leaf nodes contain the actual data references
Leaf nodes are often linked to support fast range scans

B+-trees are especially well-suited to disk-based storage because they minimize random I/O and keep tree height small.

At this point, we have the first glimpse of how logical queries begin to map onto physical disk access patterns.

Why Applications Should Not Manage Data Directly

Historically, applications manipulated database files directly:

Reading bytes from files
Modifying in-memory structures
Writing data back to disk

This file processing approach quickly breaks down.

Some fundamental problems:

Applications become tightly coupled to file formats
Integrity constraints are hard to enforce
Partial writes and crashes corrupt data
Concurrent access becomes dangerous
Recovery after failure is extremely complex

As databases grow and multiple users access them concurrently, correctness becomes nearly impossible to guarantee at the application level.

Even well-written applications can produce inconsistent states under concurrency or failure.

Enter the Database Management System

The solution is to insert a dedicated software layer between applications and storage: the Database Management System (DBMS).

A DBMS provides:

Higher-level data abstractions (relations instead of files)
Well-defined APIs for querying and updating data
Automatic index maintenance
Enforcement of integrity constraints
Concurrency control
Failure recovery

Applications no longer manipulate files. They interact only with the DBMS.

From the application’s perspective:

Data appears as tables
Operations are declarative
Storage details are hidden

From the system’s perspective:

Queries are translated into efficient execution plans
Indexes are used when beneficial
Disk I/O is carefully managed
Failures are handled transparently

The DBMS acts as a black box that bridges the gap between:

Conceptual data models
Physical storage realities

Where We Are Now

At this stage, the layers are clearly visible:

Disks define physical constraints
Relations define logical structure
Indexes define access paths
The DBMS coordinates everything in between

What remains unresolved is the hardest problem of all:

What happens when many users modify data concurrently, and failures occur mid-operation?

That question leads directly to transactions, ACID properties, transaction management, and concurrency control.

That’s where this journey goes next and where databases stop being simple data stores and start becoming serious systems.

👉 Check out: FreeDevTools

Any feedback or contributors are welcome!

It’s online, open-source, and ready for anyone to use.

⭐ Star it on GitHub: freedevtools

DEV Community