Ayomide Olofinsawe

How I Optimized Database I/O from 100 Seconds to 3ms Using Multi-Level Indexing

Introduction

As a software developer, I recently learned how to optimize database performance, particularly focusing on improving the efficiency of reading 1 million records from a hard disk. Initially, this process took 100 seconds, but by implementing multi-level indexing, I was able to reduce the read time to just 3 milliseconds.

This article documents my learning process and the steps I took to achieve this optimization. I will:

  • Explain how data is stored on a hard disk and how database queries interact with storage.

  • Implement multi-level indexing to optimize search performance.

  • Calculate the disk access cost and optimize the I/O operation time to 3ms.

  • Compare performance improvements before and after indexing.

Understanding Data Storage on Hard Disks

A hard disk consists of circular platters divided into concentric tracks, each further subdivided into pie-shaped sectors. The intersection of a track and a sector forms a file block.

Key Definitions:

  • File block size: approximately 4KB.

  • I/O operations: read/write operations occur in units of file blocks.

How Data is Read from a Hard Disk
Before this optimization, I had a limited understanding of how data is read from a hard disk. I discovered that when an I/O operation is performed:

  1. The entire file block is loaded into RAM.

  2. The CPU processes the data in RAM.

  3. If the data was modified, the updated block is written back to the hard disk.
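
To make the block-at-a-time model concrete, here is a minimal sketch in Python (the file name and the 4KB block size are illustrative assumptions) of reading a file in whole blocks, mirroring how a disk serves I/O in blocks rather than individual records:

```python
BLOCK_SIZE = 4 * 1024  # assumed 4KB file block, matching the article

def read_in_blocks(path):
    """Read a file one block at a time, the way a disk serves I/O."""
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)  # one whole block loaded into RAM
            if not block:
                break
            yield block  # the block is then processed in memory

# Example usage: count how many block reads a file would cost.
# n_blocks = sum(1 for _ in read_in_blocks("records.dat"))
```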

Hard Disk Structure Representation

Diagrammatic representation of a hard disk

Performance Analysis: Reading 1 Million Records

Given:

  • Each record size = 400 bytes

  • Each file block size = 4KB (4000 bytes)

  • Number of records per block = 4000 / 400 = 10 records per block

  • Total records = 1,000,000

  • Number of required file blocks = 1,000,000 / 10 = 100,000 (10^5) blocks

  • Disk read time per block = 1 ms

  • Total time to read all records = 100,000 ms = 100 sec
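
The same arithmetic as a quick sanity check in Python, using only the constants listed above:

```python
RECORD_SIZE = 400          # bytes per record
BLOCK_SIZE = 4000          # bytes per file block
TOTAL_RECORDS = 1_000_000
READ_TIME_MS = 1           # disk read time per block

records_per_block = BLOCK_SIZE // RECORD_SIZE       # 10
blocks_needed = TOTAL_RECORDS // records_per_block  # 100,000
total_time_ms = blocks_needed * READ_TIME_MS        # 100,000 ms

print(f"{blocks_needed} blocks -> {total_time_ms / 1000:.0f} s")  # 100 s
```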

Problem Statement:
After realizing that reading 1 million records in 100 seconds was inefficient, I started researching ways to optimize search operations and reduce read time to 3 ms.

Optimizing Search with Multi-Level Indexing

Step 1: Creating an Index Table
To reduce the number of blocks that need to be scanned, I created an index table that acts as a shortcut to access file blocks more efficiently.

Diagram: index table

Each index entry requires 10 bytes, and since each block stores 4KB (4000 bytes), it can hold:
4000 / 10 = 400 index entries per block

Diagram: file block table

Step 2: Reducing Block Access with Indexing

  • Initially, all 100,000 data blocks had to be scanned.

  • With one 10-byte index entry per data block, the index holds 100,000 entries, which fit in 100,000 / 400 = 250 index blocks.

  • Linearly scanning the index table therefore takes 250 ms.

  • An additional 1 ms is required to read the matching data block.

  • Total optimized time = 251 ms (verified in the sketch below).
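
A short Python check of the single-level index cost, assuming (as the numbers above imply) a sparse index with one entry per data block:

```python
DATA_BLOCKS = 100_000      # blocks holding the 1 million records
ENTRY_SIZE = 10            # bytes per index entry
BLOCK_SIZE = 4000          # bytes per block
READ_TIME_MS = 1

entries_per_block = BLOCK_SIZE // ENTRY_SIZE     # 400
index_blocks = DATA_BLOCKS // entries_per_block  # 250 (one entry per data block)

scan_index_ms = index_blocks * READ_TIME_MS      # 250 ms linear scan of the index
total_ms = scan_index_ms + READ_TIME_MS          # + 1 ms for the data block itself
print(total_ms)  # 251
```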

Step 3: Implementing Multi-Level Indexing

I then discovered that a multi-level indexing approach could further optimize search efficiency. By introducing a second-level index, I could reduce block access even further.

  • Each index block stores 400 entries.
  • The first-level index spans 250 blocks, so the second-level index needs only 250 entries.
  • Since 250 < 400, the second-level index fits in a single block.

Now, the search process follows these steps:

  1. Read the second-level index (1 ms).
  2. Find the correct block in the first-level index (1 ms).
  3. Access the actual data (1 ms).
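
Below is a toy simulation of this three-read lookup, with in-memory lists standing in for disk blocks and each simulated block read costing 1 ms. The key layout (sequential integer keys, ten per block) is an assumption made purely for illustration:

```python
ENTRIES_PER_BLOCK = 400
reads = 0  # each counted read stands for one 1 ms disk access

def read_block(block):
    """Simulate a 1 ms disk read of one block."""
    global reads
    reads += 1
    return block

# Data blocks: 100,000 blocks of 10 records each (keys 0..999,999).
data = [list(range(b * 10, b * 10 + 10)) for b in range(100_000)]

# First-level sparse index: one (first key in block, block number) entry
# per data block, packed 400 entries to an index block -> 250 index blocks.
entries = [(b * 10, b) for b in range(len(data))]
l1 = [entries[i:i + ENTRIES_PER_BLOCK]
      for i in range(0, len(entries), ENTRIES_PER_BLOCK)]

# Second-level index: one entry per first-level block -> 250 entries, 1 block.
l2 = [(blk[0][0], i) for i, blk in enumerate(l1)]

def lookup(key):
    blk = read_block(l2)                             # read 1: second-level index
    i = max(j for j, (k, _) in enumerate(blk) if k <= key)
    blk = read_block(l1[blk[i][1]])                  # read 2: first-level index block
    i = max(j for j, (k, _) in enumerate(blk) if k <= key)
    blk = read_block(data[blk[i][1]])                # read 3: the data block
    return key in blk

print(lookup(777_777), reads)  # True 3 -> three block reads = 3 ms
```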

Multi-Level Index Table Representation

Diagram: multi-level index table

Final Optimized Time

  • Total read time: 3 ms, i.e., three block reads at 1 ms each (compared to 100 sec initially).
  • This technique, multi-level indexing, is the idea that B-tree index structures generalize; a rough levels calculation follows below.
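
As a rough generalization, each sparse index level shrinks the number of blocks to consult by the fan-out (400 entries per block here), so the number of index levels grows logarithmically with the data size. A small sketch of that calculation:

```python
import math

def index_levels(data_blocks, fanout=400):
    """Levels of sparse index needed until the top level fits in one block."""
    levels = 0
    blocks = data_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fanout)  # one entry per block below
        levels += 1
    return levels

print(index_levels(100_000))      # 2 index levels
print(index_levels(100_000) + 1)  # 3 total block reads, including the data block
```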

Conclusion

Throughout this process, I learned how indexing plays a crucial role in optimizing database search performance. Using multi-level indexing, I successfully reduced the read time for 1 million records from 100 sec to 3 ms. This method significantly improves search performance on large datasets by minimizing disk I/O operations.
This was a fascinating deep dive into database optimization, and I now have a much stronger understanding of indexing and search performance tuning.

If you want to learn more about databases, check out this video.
