Introduction
As a software developer, I recently learned how to optimize database performance, particularly how to search efficiently through 1 million records stored on a hard disk. Initially, finding a record meant scanning the entire dataset, which took 100 seconds; by implementing multi-level indexing, I reduced the lookup time to just 3 milliseconds.
This article documents my learning process and the steps I took to achieve this optimization. I will:
- Explain how data is stored on a hard disk and how database queries interact with that storage.
- Implement multi-level indexing to optimize search performance.
- Calculate the disk access cost and show how indexing brings a lookup down to 3 ms.
- Compare performance before and after indexing.
Understanding Data Storage on Hard Disks
A hard disk consists of circular platters divided into concentric tracks and further subdivided into pie-shaped sectors. The intersection of a track and a sector forms a file block.
Key Definitions:
- File Block Size: approximately 4 KB.
- I/O Operations: reads and writes happen one file block at a time, never one record at a time.
How Data is Read from a Hard Disk
Before this optimization, I had a limited understanding of how data is read from a hard disk. I discovered that when an I/O operation is performed:
- The entire file block is loaded into RAM.
- The CPU processes the data in RAM.
- If the data was modified, the updated block is written back to the hard disk.
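To make this block-by-block model concrete, here is a minimal Python sketch of reading a single block; the file name data.db and the exact block size are my own assumptions for illustration:

```python
BLOCK_SIZE = 4000  # bytes per file block (the simplified 4 KB used below)

def read_block(path: str, block_number: int) -> bytes:
    """Load one entire file block from disk into RAM."""
    with open(path, "rb") as f:
        f.seek(block_number * BLOCK_SIZE)  # jump to the block's byte offset
        return f.read(BLOCK_SIZE)          # the whole block is transferred at once

# Hypothetical usage: fetch block 42 of a data file.
# block = read_block("data.db", 42)
```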
Hard Disk Structure Representation
(Figure: diagrammatic representation of a hard disk.)
Performance Analysis: Reading 1 Million Records
Given:
- Each record size = 400 bytes
- Each file block size = 4 KB (treated as 4000 bytes for simplicity)
- Number of records per block = 4000 / 400 = 10
- Total records = 1,000,000
- Number of required file blocks = 1,000,000 / 10 = 100,000 (10^5)
- Disk read time per block = 1 ms
- Total time to read all blocks = 100,000 ms = 100 sec
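As a sanity check, here is the same arithmetic in a few lines of Python (every number comes straight from the assumptions listed above):

```python
RECORD_SIZE = 400             # bytes per record
BLOCK_SIZE = 4000             # bytes per file block (simplified 4 KB)
TOTAL_RECORDS = 1_000_000
MS_PER_BLOCK_READ = 1

records_per_block = BLOCK_SIZE // RECORD_SIZE            # 10
total_blocks = TOTAL_RECORDS // records_per_block        # 100,000
full_scan_ms = total_blocks * MS_PER_BLOCK_READ          # 100,000 ms

print(f"{total_blocks} blocks -> {full_scan_ms} ms = {full_scan_ms / 1000:.0f} sec")
```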
Problem Statement:
After realizing that scanning 100,000 blocks to find a record took 100 seconds, I started researching ways to optimize search operations and bring the lookup time down to 3 ms.
Optimizing Search with Multi-Level Indexing
Step 1: Creating an Index Table
To reduce the number of blocks that need to be scanned, I created an index table that acts as a shortcut to access file blocks more efficiently.
Each index entry requires 10 bytes, and since each block stores 4KB (4000 bytes), it can hold:
4000 / 10 = 400 index entries per block
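Conceptually, each index entry is just a small (search key, block pointer) pair. Here is a minimal Python sketch of that layout; the field names are my own, chosen for illustration:

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    """One 10-byte index entry: a search key plus a block pointer."""
    key: int    # e.g. the smallest record key stored in the target block
    block: int  # number of the file block that holds the record

ENTRY_SIZE = 10                          # bytes per index entry
ENTRIES_PER_BLOCK = 4000 // ENTRY_SIZE   # 400 entries fit in one block
```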
Step 2: Reducing Block Access with Indexing
- Initially, a full scan had to touch all 100,000 data blocks.
- A sparse index keeps one entry per data block, so the index table has 100,000 entries; at 400 entries per block, it occupies 100,000 / 400 = 250 blocks.
- Scanning the index table therefore takes at most 250 ms (1 ms per index block).
- An additional 1 ms is needed to read the matching data block.
- Total optimized lookup time = 251 ms, as the sketch below confirms.
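A quick Python check of the single-level index arithmetic:

```python
DATA_BLOCKS = 100_000        # blocks holding the 1,000,000 records
ENTRIES_PER_BLOCK = 400      # 4000-byte block / 10-byte index entry

# A sparse index keeps one entry per data block.
index_blocks = DATA_BLOCKS // ENTRIES_PER_BLOCK      # 250 index blocks

# Worst case: scan every index block (1 ms each), then read 1 data block.
single_level_ms = index_blocks * 1 + 1               # 251 ms
print(f"index blocks = {index_blocks}, lookup = {single_level_ms} ms")
```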
Step 3: Implementing Multi-Level Indexing
I then discovered that a multi-level indexing approach could further optimize search efficiency. By introducing a second-level index, I could reduce block access even further.
- Each block stores 400 index entries.
- The first-level index occupies 250 blocks, so the second-level index needs 250 entries, one per first-level block.
- Since 250 < 400, the entire second-level index fits in a single block.
Now, a search takes just three block reads (simulated in the sketch after this list):
- Read the single second-level index block (1 ms).
- Read the matching first-level index block (1 ms).
- Read the actual data block (1 ms).
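To tie the steps together, here is a small self-contained simulation of the two-level lookup. It builds sparse indexes over synthetic record keys and counts block reads; the structure is my own illustration of the idea, not real database code:

```python
import bisect

ENTRIES_PER_BLOCK = 400      # index entries per 4 KB block
DATA_BLOCKS = 100_000        # blocks holding keys 0..999,999, 10 per block

# First-level (sparse) index: the smallest key in each data block.
level1_keys = [10 * b for b in range(DATA_BLOCKS)]           # 100,000 entries

# Split the first-level index into blocks of 400 entries (250 blocks),
# then build the second-level index: the smallest key in each of them.
level1_blocks = [level1_keys[i:i + ENTRIES_PER_BLOCK]
                 for i in range(0, len(level1_keys), ENTRIES_PER_BLOCK)]
level2_keys = [blk[0] for blk in level1_blocks]              # 250 entries, 1 block

def lookup(key: int) -> tuple[int, int]:
    """Return (data block number, block reads needed to find it)."""
    reads = 1                                                # read the 2nd-level block
    l1 = bisect.bisect_right(level2_keys, key) - 1
    reads += 1                                               # read one 1st-level block
    entries = level1_blocks[l1]
    data_block = l1 * ENTRIES_PER_BLOCK + bisect.bisect_right(entries, key) - 1
    reads += 1                                               # read the data block
    return data_block, reads

block, reads = lookup(123_456)
print(f"key 123456 -> block {block}, {reads} block reads = {reads} ms")
```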
(Figure: multi-level index table representation.)
Final Optimized Time
- Total lookup time: 3 ms, i.e. three 1 ms block reads (compared to 100 sec for a full scan).
- This technique, called multi-level indexing, is the core idea behind the B-tree structures that real databases use for their indexes.
Conclusion
Throughout this process, I learned the crucial role indexing plays in optimizing database search performance. Using multi-level indexing, I reduced the time to locate a record among 1 million from 100 sec to 3 ms. This method significantly improves search performance on large datasets by minimizing disk I/O operations.
This was a fascinating deep dive into database optimization, and I now have a much stronger understanding of indexing and search performance tuning.