Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building **one place for all dev tools, cheat codes, and TLDRs* — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.*
In the previous post, we saw how the relational model gives structure to data entities become relations, relationships become tables, and integrity constraints define what a valid database state looks like.
But structure alone is not enough.
Once data grows beyond trivial size, how we access relations becomes just as important as what they represent.
This is where indexing and the database management system (DBMS) enter the picture.
Why Indexes Exist
A full relation scan reading every row in a table is the most basic way to answer a query.
It is also one of the most expensive operations a database can perform.
Given what we already know about disks:
- Reading data happens block by block
- Disk I/O dominates execution time
- Sequential scans over large relations are costly
It should be clear why databases need a faster way to locate rows.
An index is a data structure built on top of a relation to support efficient lookup of rows based on values of one or more attributes.
An index does not store the data itself. Instead, it stores:
- Search key values
- Pointers or references to rows in the base relation
- Routing information that guides the lookup along the most efficient path
Indexes are not part of the relational model itself, but they are indispensable in real systems.
Indexes and Relations
Some important properties of indexes in relational databases:
- An index is always associated with one base relation
- A relation may have:
- No index
- One index
- Many indexes
- Indexes are defined on one or more attributes, collectively called the index search key
- The search key does not have to be a key of the relation
If the index search key is the primary key of the relation, the index is called a primary index. Otherwise, it is a secondary index.
Most systems allow many secondary indexes on the same relation.
Indexes are created and dropped by users, but they are maintained automatically by the database system.
Users never directly read from or write to an index.
Whenever rows are:
- Inserted
- Deleted
- Updated
the DBMS updates all affected indexes behind the scenes.
This maintenance is not free. Adding more indexes improves read performance but slows down writes, since every modification must also update index structures.
Hash Indexes and Tree Indexes
Two families of index structures dominate relational systems.
Hash Indexes
A hash index is optimized for fast point lookups.
It works by:
- Applying a hash function to the search key value
- Mapping the value to a bucket
- Searching only within that bucket
Hash indexes are extremely fast when querying for exact matches.
However, they have a major limitation:
- They do not support range queries efficiently
Asking for values between two bounds requires scanning many or all buckets, defeating the purpose of the index.
B-Tree and B+-Tree Indexes
A B-tree index maintains search key values in a sorted order.
This ordering allows:
- Efficient point lookups
- Efficient range queries
- Ordered traversal without scanning the entire relation
Each node in a B-tree contains:
- Search key values
- Pointers to child nodes
Traversal decisions are made by comparing the search key at each node.
A widely used variant is the B+-tree:
- Internal nodes contain only routing information
- Leaf nodes contain the actual data references
- Leaf nodes are often linked to support fast range scans
B+-trees are especially well-suited to disk-based storage because they minimize random I/O and keep tree height small.
At this point, we have the first glimpse of how logical queries begin to map onto physical disk access patterns.
Why Applications Should Not Manage Data Directly
Historically, applications manipulated database files directly:
- Reading bytes from files
- Modifying in-memory structures
- Writing data back to disk
This file processing approach quickly breaks down.
Some fundamental problems:
- Applications become tightly coupled to file formats
- Integrity constraints are hard to enforce
- Partial writes and crashes corrupt data
- Concurrent access becomes dangerous
- Recovery after failure is extremely complex
As databases grow and multiple users access them concurrently, correctness becomes nearly impossible to guarantee at the application level.
Even well-written applications can produce inconsistent states under concurrency or failure.
Enter the Database Management System
The solution is to insert a dedicated software layer between applications and storage: the Database Management System (DBMS).
A DBMS provides:
- Higher-level data abstractions (relations instead of files)
- Well-defined APIs for querying and updating data
- Automatic index maintenance
- Enforcement of integrity constraints
- Concurrency control
- Failure recovery
Applications no longer manipulate files. They interact only with the DBMS.
From the application’s perspective:
- Data appears as tables
- Operations are declarative
- Storage details are hidden
From the system’s perspective:
- Queries are translated into efficient execution plans
- Indexes are used when beneficial
- Disk I/O is carefully managed
- Failures are handled transparently
The DBMS acts as a black box that bridges the gap between:
- Conceptual data models
- Physical storage realities
Where We Are Now
At this stage, the layers are clearly visible:
- Disks define physical constraints
- Relations define logical structure
- Indexes define access paths
- The DBMS coordinates everything in between
What remains unresolved is the hardest problem of all:
What happens when many users modify data concurrently, and failures occur mid-operation?
That question leads directly to transactions, ACID properties, transaction management, and concurrency control.
That’s where this journey goes next and where databases stop being simple data stores and start becoming serious systems.
👉 Check out: FreeDevTools
Any feedback or contributors are welcome!
It’s online, open-source, and ready for anyone to use.
⭐ Star it on GitHub: freedevtools

Top comments (0)