<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abdul Qadir</title>
    <description>The latest articles on DEV Community by Abdul Qadir (@qadir21).</description>
    <link>https://dev.to/qadir21</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860671%2F17eb0cc9-db04-4f77-bd94-e400f231f40e.jpeg</url>
      <title>DEV Community: Abdul Qadir</title>
      <link>https://dev.to/qadir21</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/qadir21"/>
    <language>en</language>
    <item>
      <title>Building a Database Engine from Scratch</title>
      <dc:creator>Abdul Qadir</dc:creator>
      <pubDate>Sat, 04 Apr 2026 08:54:23 +0000</pubDate>
      <link>https://dev.to/qadir21/building-a-database-engine-from-scratch-3gii</link>
      <guid>https://dev.to/qadir21/building-a-database-engine-from-scratch-3gii</guid>
      <description>&lt;h1&gt;
  
  
  How I Built a Database Engine from Scratch in C++
&lt;/h1&gt;

&lt;p&gt;Most developers use databases every day — MySQL, PostgreSQL, SQLite — without ever thinking about what's happening underneath. I wanted to change that. So I built one from scratch.&lt;/p&gt;

&lt;p&gt;This is the story of &lt;strong&gt;4mulaQuery&lt;/strong&gt; — a custom database engine written in C++, exposed through a Java Spring Boot REST API, containerized with Docker, and deployed live on the internet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Build a Database?
&lt;/h2&gt;

&lt;p&gt;I was studying data structures and kept asking myself — &lt;em&gt;how does a real database actually store data? How does it find a record instantly among millions?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Books explain B+ Trees theoretically. But I wanted to feel it. So I built it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal was simple: understand databases from the ground up, not just use them.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
   │
   ▼
Java Spring Boot API
   │
   ▼ (stdin/stdout)
C++ Database Engine
   │
   ▼
4mulaQuery.db (Binary File)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three completely independent layers — each doing one job.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1 — C++ Storage Engine
&lt;/h2&gt;

&lt;p&gt;This is the heart of the project. No libraries. No shortcuts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: Binary File Storage
&lt;/h3&gt;

&lt;p&gt;First, I defined a fixed-width row schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Row&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// 4 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;     &lt;span class="c1"&gt;// 32 bytes&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;255&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;       &lt;span class="c1"&gt;// 255 bytes&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="c1"&gt;// Total: 291 bytes per row&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fixed-width means I can calculate any row's position instantly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;position = row_number × 291 bytes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
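
&lt;p&gt;Because the compiler may pad &lt;code&gt;Row&lt;/code&gt; in memory, the simplest way to keep each on-disk row at exactly 291 bytes is to copy every field to a fixed offset. Here is a minimal sketch, assuming the field order above; &lt;code&gt;serialize_row&lt;/code&gt;, &lt;code&gt;deserialize_row&lt;/code&gt;, &lt;code&gt;row_offset&lt;/code&gt;, and the offset constants are hypothetical names, not taken from the 4mulaQuery source:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;cstring&amp;gt;

struct Row {
    uint32_t id;
    char username[32];
    char email[255];
};

// Hypothetical layout constants for the serialized (unpadded) row.
constexpr uint32_t ID_SIZE = sizeof(uint32_t);                      // 4
constexpr uint32_t USERNAME_SIZE = 32;
constexpr uint32_t EMAIL_SIZE = 255;
constexpr uint32_t ID_OFFSET = 0;
constexpr uint32_t USERNAME_OFFSET = ID_OFFSET + ID_SIZE;           // 4
constexpr uint32_t EMAIL_OFFSET = USERNAME_OFFSET + USERNAME_SIZE;  // 36
constexpr uint32_t ROW_SIZE = EMAIL_OFFSET + EMAIL_SIZE;            // 291

// Copy each field into its slot; padding bytes are never written.
void serialize_row(const Row&amp;amp; src, char* dest) {
    std::memcpy(dest + ID_OFFSET, &amp;amp;src.id, ID_SIZE);
    std::memcpy(dest + USERNAME_OFFSET, src.username, USERNAME_SIZE);
    std::memcpy(dest + EMAIL_OFFSET, src.email, EMAIL_SIZE);
}

void deserialize_row(const char* src, Row&amp;amp; dest) {
    std::memcpy(&amp;amp;dest.id, src + ID_OFFSET, ID_SIZE);
    std::memcpy(dest.username, src + USERNAME_OFFSET, USERNAME_SIZE);
    std::memcpy(dest.email, src + EMAIL_OFFSET, EMAIL_SIZE);
}

// Byte position of a row in a file of back-to-back rows.
uint64_t row_offset(uint64_t row_number) {
    return row_number * ROW_SIZE;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;row_offset(3)&lt;/code&gt; is 873, the first byte of the fourth row.&lt;/p&gt;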



&lt;p&gt;A &lt;code&gt;Pager&lt;/code&gt; class handles all disk I/O — reading and writing raw bytes.&lt;/p&gt;
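
&lt;p&gt;A minimal sketch of such a pager, assuming the 4096-byte pages used by the B+ Tree below and a simple in-memory page cache. The &lt;code&gt;Pager&lt;/code&gt; interface here (&lt;code&gt;get_page&lt;/code&gt;, &lt;code&gt;flush&lt;/code&gt;) and the &lt;code&gt;PAGE_BYTES&lt;/code&gt; constant are hypothetical, not the project's actual class:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;array&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;fstream&amp;gt;
#include &amp;lt;map&amp;gt;
#include &amp;lt;memory&amp;gt;
#include &amp;lt;stdexcept&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;utility&amp;gt;

constexpr uint32_t PAGE_BYTES = 4096;
using Page = std::array&amp;lt;char, PAGE_BYTES&amp;gt;;

class Pager {
public:
    explicit Pager(const std::string&amp;amp; path) : path_(path) {
        { std::ofstream touch(path_, std::ios::app | std::ios::binary); }  // create if missing
        file_.open(path_, std::ios::in | std::ios::out | std::ios::binary);
        if (!file_) throw std::runtime_error("cannot open " + path_);
        file_.seekg(0, std::ios::end);
        file_length_ = static_cast&amp;lt;uint64_t&amp;gt;(file_.tellg());
    }

    // Return page n, reading it from disk on first access (zeros past end of file).
    Page&amp;amp; get_page(uint32_t n) {
        auto it = cache_.find(n);
        if (it != cache_.end()) return *it-&amp;gt;second;
        auto page = std::make_unique&amp;lt;Page&amp;gt;();
        page-&amp;gt;fill(0);
        uint64_t offset = uint64_t(n) * PAGE_BYTES;
        if (offset &amp;lt; file_length_) {
            file_.clear();
            file_.seekg(static_cast&amp;lt;std::streamoff&amp;gt;(offset));
            file_.read(page-&amp;gt;data(), PAGE_BYTES);  // a short last page keeps its zero tail
        }
        Page&amp;amp; ref = *page;
        cache_.emplace(n, std::move(page));
        return ref;
    }

    // Write every cached page back to offset n * PAGE_BYTES, in page order.
    void flush() {
        for (auto&amp;amp; [n, page] : cache_) {
            file_.clear();
            file_.seekp(static_cast&amp;lt;std::streamoff&amp;gt;(uint64_t(n) * PAGE_BYTES));
            file_.write(page-&amp;gt;data(), PAGE_BYTES);
        }
        file_.flush();
    }

private:
    std::string path_;
    std::fstream file_;
    uint64_t file_length_ = 0;
    std::map&amp;lt;uint32_t, std::unique_ptr&amp;lt;Page&amp;gt;&amp;gt; cache_;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Callers modify the returned page in place and call &lt;code&gt;flush()&lt;/code&gt; before exiting. Nothing reaches disk before that, which keeps the sketch simple but means a crash loses unflushed pages.&lt;/p&gt;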

&lt;h3&gt;
  
  
  Phase 2: B+ Tree Indexing
&lt;/h3&gt;

&lt;p&gt;The first version used linear search — O(n). Fine for 100 records. Terrible for 100,000.&lt;/p&gt;

&lt;p&gt;I upgraded to a &lt;strong&gt;B+ Tree&lt;/strong&gt;, the same family of data structure that PostgreSQL and MySQL (InnoDB) use for their indexes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why B+ Tree?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;O(log n) insert, search, delete&lt;/li&gt;
&lt;li&gt;Leaf nodes form a linked list — full table scans stay fast&lt;/li&gt;
&lt;li&gt;All data lives in leaf nodes — internal nodes are just routing&lt;/li&gt;
&lt;/ul&gt;
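
&lt;p&gt;Within a single node, the sorted keys are what make each step of a lookup logarithmic. A minimal sketch of that in-node search, assuming the node's keys have already been loaded into a sorted vector (&lt;code&gt;find_slot&lt;/code&gt; is a hypothetical helper):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;vector&amp;gt;

// Binary search over a node's sorted keys: returns the index of the first key &amp;gt;= target.
// If that slot holds target, the key exists; otherwise it is where target would be inserted.
std::size_t find_slot(const std::vector&amp;lt;uint32_t&amp;gt;&amp;amp; keys, uint32_t target) {
    std::size_t lo = 0;
    std::size_t hi = keys.size();  // search window is [lo, hi)
    while (lo &amp;lt; hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] &amp;lt; target) {
            lo = mid + 1;          // the answer lies right of mid
        } else {
            hi = mid;              // mid itself may be the answer
        }
    }
    return lo;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With keys {2, 5, 9}, &lt;code&gt;find_slot&lt;/code&gt; returns 1 for 5 (found) and 2 for 6 (insert between 5 and 9). It behaves like &lt;code&gt;std::lower_bound&lt;/code&gt;, so the same position serves both search and insert.&lt;/p&gt;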

&lt;p&gt;&lt;strong&gt;My implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Page Layout (4096 bytes per page):

Leaf Node:
  [node_type | is_root | parent | num_cells | next_leaf | cells...]
  Each cell = key(4 bytes) + Row(291 bytes) = 295 bytes
  Max cells per leaf = 13

Internal Node:
  [node_type | is_root | parent | num_keys | right_child | cells...]
  Each cell = child(4 bytes) + key(4 bytes) = 8 bytes
  Max keys = 509
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
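
&lt;p&gt;The "max cells" figures fall out of the header sizes. A sketch of that arithmetic, assuming 1-byte &lt;code&gt;node_type&lt;/code&gt; and &lt;code&gt;is_root&lt;/code&gt; fields and 4-byte &lt;code&gt;parent&lt;/code&gt;, &lt;code&gt;num_cells&lt;/code&gt;, &lt;code&gt;next_leaf&lt;/code&gt;, &lt;code&gt;num_keys&lt;/code&gt;, and &lt;code&gt;right_child&lt;/code&gt; fields (the layout above gives only the field order, so these widths are assumptions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;

constexpr uint32_t PAGE_BYTES = 4096;
constexpr uint32_t ROW_SIZE = 291;

// node_type (1) + is_root (1) + parent (4); field widths are assumptions.
constexpr uint32_t COMMON_HEADER = 1 + 1 + 4;

// Leaf: header adds num_cells (4) + next_leaf (4); cell = key (4) + row.
constexpr uint32_t LEAF_HEADER = COMMON_HEADER + 4 + 4;                      // 14
constexpr uint32_t LEAF_CELL = 4 + ROW_SIZE;                                 // 295
constexpr uint32_t LEAF_MAX_CELLS = (PAGE_BYTES - LEAF_HEADER) / LEAF_CELL;  // 4082 / 295 rounds down to 13

// Internal: header adds num_keys (4) + right_child (4); cell = child (4) + key (4).
constexpr uint32_t INTERNAL_HEADER = COMMON_HEADER + 4 + 4;                  // 14
constexpr uint32_t INTERNAL_CELL = 4 + 4;                                    // 8
constexpr uint32_t INTERNAL_FIT = (PAGE_BYTES - INTERNAL_HEADER) / INTERNAL_CELL;  // 4082 / 8 rounds down to 510

static_assert(LEAF_MAX_CELLS == 13, "13 leaf cells fit in a 4096-byte page");
static_assert(INTERNAL_FIT == 510, "510 internal cells fit in a 4096-byte page");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumed widths the leaf limit of 13 matches exactly, while 510 internal cells would physically fit, so the 509-key limit leaves one cell of headroom.&lt;/p&gt;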



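&lt;p&gt;&lt;strong&gt;Capacity&lt;/strong&gt; (an internal node with 509 keys routes to 510 children: one per cell plus &lt;code&gt;right_child&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depth 1: 13 records&lt;/li&gt;
&lt;li&gt;Depth 2: 510 × 13 = 6,630 records&lt;/li&gt;
&lt;li&gt;Depth 3: 510 × 510 × 13 = 3,381,300 records&lt;/li&gt;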
&lt;/ul&gt;
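
&lt;p&gt;The capacity figures come from multiplying fanouts level by level. A short sketch, assuming every node is completely full: each internal node holds 509 keys and therefore routes to 510 children (one per cell plus &lt;code&gt;right_child&lt;/code&gt;), and each leaf holds 13 rows. A keys-only count (509 × 13) would give 6,617 at depth 2. &lt;code&gt;max_rows&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;

// Rows held by a completely full tree of the given depth (depth 1 = one leaf that is the root).
uint64_t max_rows(unsigned depth, uint64_t leaf_cells = 13, uint64_t internal_keys = 509) {
    uint64_t fanout = internal_keys + 1;  // n keys route to n + 1 children
    uint64_t leaves = 1;
    for (unsigned level = 1; level &amp;lt; depth; ++level) {
        leaves *= fanout;                 // every internal level multiplies the leaf count
    }
    return leaves * leaf_cells;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This gives 13, 6,630, and 3,381,300 for depths 1 to 3.&lt;/p&gt;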

&lt;p&gt;The hardest part was the &lt;strong&gt;leaf split&lt;/strong&gt; — when a leaf node fills up, you have to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new leaf&lt;/li&gt;
&lt;li&gt;Redistribute records&lt;/li&gt;
&lt;li&gt;Update the linked list&lt;/li&gt;
&lt;li&gt;Insert a separator key for the new leaf into the parent node&lt;/li&gt;
&lt;li&gt;Create a new root if needed&lt;/li&gt;
&lt;/ol&gt;
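
&lt;p&gt;The five steps can be sketched on a simplified in-memory leaf. This is not the page-based 4mulaQuery code: &lt;code&gt;Leaf&lt;/code&gt;, &lt;code&gt;Split&lt;/code&gt;, and &lt;code&gt;split_leaf&lt;/code&gt; are hypothetical, rows are plain strings, and steps 4 and 5 are left to the caller, which receives the separator key to insert into the parent (or into a new root when the old leaf was the root).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;utility&amp;gt;
#include &amp;lt;vector&amp;gt;

struct Leaf {
    uint32_t page = 0;
    uint32_t next_leaf = 0;                                         // 0 = no right sibling (assumption)
    std::vector&amp;lt;std::pair&amp;lt;uint32_t, std::string&amp;gt;&amp;gt; cells;  // (key, row), sorted by key
};

struct Split {
    uint32_t separator;  // largest key left in the old leaf
    Leaf right;          // the newly created leaf
};

// Called when `left` is full (13 cells) and one more cell must go in.
Split split_leaf(Leaf&amp;amp; left, uint32_t key, std::string row, uint32_t new_page) {
    // Merge the new cell into a sorted copy of the full leaf.
    auto all = left.cells;
    auto pos = all.begin();
    while (pos != all.end() &amp;amp;&amp;amp; pos-&amp;gt;first &amp;lt; key) ++pos;
    all.insert(pos, std::make_pair(key, std::move(row)));

    // Steps 1-2: create the new leaf and move the upper half into it.
    auto mid = all.begin() + static_cast&amp;lt;std::ptrdiff_t&amp;gt;((all.size() + 1) / 2);
    Split out{0, Leaf{}};
    out.right.page = new_page;
    out.right.cells.assign(mid, all.end());
    left.cells.assign(all.begin(), mid);

    // Step 3: splice the new leaf into the linked list of leaves.
    out.right.next_leaf = left.next_leaf;
    left.next_leaf = new_page;

    // Steps 4-5: the caller inserts `separator` into the parent, or into a new root.
    out.separator = left.cells.back().first;
    return out;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 13 full cells plus the new one, each leaf ends up with 7. Using the largest left-hand key as the separator fits an internal cell that pairs a child with the maximum key in that child, which is one common convention and an assumption here.&lt;/p&gt;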

&lt;p&gt;Getting this right took me 2 days of debugging.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 2 — Java Spring Boot API
&lt;/h2&gt;

&lt;p&gt;C++ handles storage. But browsers speak HTTP, not binary.&lt;/p&gt;

&lt;p&gt;I built a Java Spring Boot REST API as the bridge — it spawns a C++ process for every query, sends the command via stdin, and reads the response from stdout.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// EngineService.java&lt;/span&gt;
&lt;span class="nc"&gt;Process&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;startProcess&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;streamHandler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;writeCommand&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;streamHandler&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readOutput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
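
&lt;p&gt;The C++ side of this contract is a small read, process, reply loop, and each reply has to be flushed: if it stays in the engine's output buffer, the Java reader can block waiting for it. A minimal sketch, assuming one comma-separated command per line in the same shape as the logged commands (&lt;code&gt;insert,1,Abdul,abdul@test.com&lt;/code&gt;, &lt;code&gt;search,1&lt;/code&gt;). &lt;code&gt;handle_command&lt;/code&gt;, &lt;code&gt;run_engine&lt;/code&gt;, and the reply strings are hypothetical, and a &lt;code&gt;std::map&lt;/code&gt; stands in for the B+ Tree:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;map&amp;gt;
#include &amp;lt;sstream&amp;gt;
#include &amp;lt;stdexcept&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

// Split "insert,1,Abdul,abdul@test.com" into its fields.
std::vector&amp;lt;std::string&amp;gt; split_fields(const std::string&amp;amp; line) {
    std::vector&amp;lt;std::string&amp;gt; fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Stand-in for the B+ Tree: id maps to "username,email".
std::map&amp;lt;uint32_t, std::string&amp;gt; store;

std::string handle_command(const std::string&amp;amp; line) {
    auto f = split_fields(line);
    try {
        if (f.size() == 4 &amp;amp;&amp;amp; f[0] == "insert") {
            uint32_t id = static_cast&amp;lt;uint32_t&amp;gt;(std::stoul(f[1]));
            bool added = store.emplace(id, f[2] + "," + f[3]).second;
            return added ? "OK" : "ERROR duplicate key";
        }
        if (f.size() == 2 &amp;amp;&amp;amp; f[0] == "search") {
            auto it = store.find(static_cast&amp;lt;uint32_t&amp;gt;(std::stoul(f[1])));
            if (it == store.end()) return "NOT FOUND";
            return f[1] + "," + it-&amp;gt;second;
        }
    } catch (const std::exception&amp;amp;) {
        return "ERROR bad id";  // std::stoul rejected the id field
    }
    return "ERROR unknown command";
}

// One command per input line, one reply per output line, flushed immediately.
void run_engine(std::istream&amp;amp; in, std::ostream&amp;amp; out) {
    std::string line;
    while (std::getline(in, line)) {
        out &amp;lt;&amp;lt; handle_command(line) &amp;lt;&amp;lt; '\n';
        out.flush();  // without this the reply can sit in the buffer while Java waits
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A &lt;code&gt;main&lt;/code&gt; would simply call &lt;code&gt;run_engine(std::cin, std::cout)&lt;/code&gt;; taking the streams as parameters lets the loop be tested with string streams.&lt;/p&gt;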



&lt;p&gt;&lt;strong&gt;The API follows the Single Responsibility Principle:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EngineService     → orchestrates everything
ProcessManager    → spawns/kills C++ process
StreamHandler     → stdin/stdout I/O
QueryLogger       → logs every query for ML
CommandType       → enum: INSERT | SEARCH | DELETE | ALL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Endpoints:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/insert?id=1&amp;amp;name=Abdul&amp;amp;email=a@b.com
GET /api/search?id=1
GET /api/delete?id=1
GET /api/all
GET /api/logs        → analytics data
POST /api/auth/login
POST /api/auth/register
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 3 — Frontend Dashboard
&lt;/h2&gt;

&lt;p&gt;A single-page app with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login / Signup / Forgot Password&lt;/li&gt;
&lt;li&gt;Dashboard with real-time DB operations&lt;/li&gt;
&lt;li&gt;Data Explorer&lt;/li&gt;
&lt;li&gt;Query Console (raw commands)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics Dashboard&lt;/strong&gt; — live charts using Chart.js&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The analytics dashboard fetches &lt;code&gt;/api/logs&lt;/code&gt; and renders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query distribution (bar chart)&lt;/li&gt;
&lt;li&gt;Average execution time per query type&lt;/li&gt;
&lt;li&gt;Success rate (pie chart)&lt;/li&gt;
&lt;li&gt;Timeline of last 20 queries&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ML Query Logging
&lt;/h2&gt;

&lt;p&gt;Every query is automatically logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;timestamp,type,execution_ms,success,command
2026-04-01 10:00:01,INSERT,6,true,"insert,1,Abdul,abdul@test.com"
2026-04-01 10:00:02,SEARCH,2,true,"search,1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
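
&lt;p&gt;The &lt;code&gt;command&lt;/code&gt; column is quoted because the command itself contains commas. A sketch of that quoting rule, written in C++ like the other examples here even though the real &lt;code&gt;QueryLogger&lt;/code&gt; is Java; &lt;code&gt;csv_field&lt;/code&gt; and &lt;code&gt;log_line&lt;/code&gt; are hypothetical helpers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;string&amp;gt;

// Quote a field that contains a comma, quote, or newline, doubling embedded quotes
// (the RFC 4180 convention).
std::string csv_field(const std::string&amp;amp; value) {
    if (value.find_first_of(",\"\n") == std::string::npos) return value;
    std::string quoted = "\"";
    for (char c : value) {
        if (c == '"') quoted += '"';  // a quote is written twice
        quoted += c;
    }
    return quoted + "\"";
}

// One line in the timestamp,type,execution_ms,success,command layout.
std::string log_line(const std::string&amp;amp; timestamp, const std::string&amp;amp; type,
                     long execution_ms, bool success, const std::string&amp;amp; command) {
    return timestamp + "," + type + "," + std::to_string(execution_ms) + "," +
           (success ? "true" : "false") + "," + csv_field(command);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Called with the values from the second sample row, &lt;code&gt;log_line&lt;/code&gt; reproduces it exactly: &lt;code&gt;2026-04-01 10:00:02,SEARCH,2,true,"search,1"&lt;/code&gt;.&lt;/p&gt;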



&lt;p&gt;I built a Python analytics script (&lt;code&gt;analyze.py&lt;/code&gt;) that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads &lt;code&gt;query_logs.csv&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generates performance graphs&lt;/li&gt;
&lt;li&gt;Identifies slow queries and patterns&lt;/li&gt;
&lt;/ul&gt;
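
&lt;p&gt;The per-type numbers the dashboard shows (average execution time, success rate) come down to a group-by over the &lt;code&gt;type&lt;/code&gt; column. This is not the author's &lt;code&gt;analyze.py&lt;/code&gt;: it is the same aggregation expressed in C++ for consistency with the other examples, assuming the five-column layout above with the header line already skipped, and using hypothetical names (&lt;code&gt;TypeStats&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;map&amp;gt;
#include &amp;lt;sstream&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;vector&amp;gt;

struct TypeStats {
    int count = 0;
    long total_ms = 0;
    int successes = 0;
    double avg_ms() const { return count ? static_cast&amp;lt;double&amp;gt;(total_ms) / count : 0.0; }
    double success_rate() const { return count ? static_cast&amp;lt;double&amp;gt;(successes) / count : 0.0; }
};

// Group data rows (header line excluded) by query type. Only the first four columns
// are split on commas, so the quoted command column is never parsed.
std::map&amp;lt;std::string, TypeStats&amp;gt; summarize(const std::vector&amp;lt;std::string&amp;gt;&amp;amp; rows) {
    std::map&amp;lt;std::string, TypeStats&amp;gt; stats;
    for (const auto&amp;amp; row : rows) {
        std::stringstream ss(row);
        std::string timestamp, type, ms, success;
        std::getline(ss, timestamp, ',');
        std::getline(ss, type, ',');
        std::getline(ss, ms, ',');
        std::getline(ss, success, ',');
        TypeStats&amp;amp; s = stats[type];
        s.count += 1;
        s.total_ms += std::stol(ms);
        if (success == "true") s.successes += 1;
    }
    return stats;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;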

&lt;p&gt;This data will eventually train an ML model to predict and optimize query execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Docker Deployment
&lt;/h2&gt;

&lt;p&gt;The whole system runs in a single Docker container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Multi-stage build&lt;/span&gt;
&lt;span class="c"&gt;# Stage 1: Compile C++ engine&lt;/span&gt;
&lt;span class="c"&gt;# Stage 2: Build Java JAR&lt;/span&gt;
&lt;span class="c"&gt;# Stage 3: Runtime with both binaries&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployed live on Render.com:&lt;br&gt;
👉 &lt;strong&gt;fourmulaquery.onrender.com&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Databases are just files&lt;/strong&gt;&lt;br&gt;
At the lowest level, every database is reading and writing bytes to disk. The magic is in how efficiently you do it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. B+ Trees are beautiful&lt;/strong&gt;&lt;br&gt;
Once I understood why leaf nodes form a linked list, everything clicked. Range scans become trivial. Full table scans stay fast.&lt;/p&gt;
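
&lt;p&gt;In code, the point is that a range scan descends the tree only once: after locating the first leaf, it follows &lt;code&gt;next_leaf&lt;/code&gt; sideways until a key passes the upper bound. A sketch over a simplified in-memory model, with a hypothetical &lt;code&gt;Leaf&lt;/code&gt; and &lt;code&gt;range_scan&lt;/code&gt;, a &lt;code&gt;std::map&lt;/code&gt; standing in for the pager, and page 0 assumed to mean "end of chain":&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;map&amp;gt;
#include &amp;lt;vector&amp;gt;

struct Leaf {
    std::vector&amp;lt;uint32_t&amp;gt; keys;  // sorted keys in this leaf
    uint32_t next_leaf = 0;        // 0 = end of the chain (assumption)
};

// Collect every key in [low, high], starting from the leaf the tree search found.
std::vector&amp;lt;uint32_t&amp;gt; range_scan(const std::map&amp;lt;uint32_t, Leaf&amp;gt;&amp;amp; pages,
                                 uint32_t start_page, uint32_t low, uint32_t high) {
    std::vector&amp;lt;uint32_t&amp;gt; out;
    uint32_t page = start_page;
    while (page != 0) {
        const Leaf&amp;amp; leaf = pages.at(page);
        for (uint32_t key : leaf.keys) {
            if (key &amp;gt; high) return out;   // keys are ordered across leaves, so stop here
            if (key &amp;gt;= low) out.push_back(key);
        }
        page = leaf.next_leaf;            // hop sideways to the next leaf
    }
    return out;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With leaves {10, 20, 30} → {40, 50} → {60, 70}, scanning [20, 55] from the first leaf returns 20, 30, 40, 50 and stops as soon as it reaches 60.&lt;/p&gt;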

&lt;p&gt;&lt;strong&gt;3. Inter-process communication is tricky&lt;/strong&gt;&lt;br&gt;
Getting Java to talk to C++ via stdin/stdout required careful stream handling — especially flushing buffers at the right time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Build the simple version first&lt;/strong&gt;&lt;br&gt;
I started with linear search. It worked. Then I upgraded to a B+ Tree. This made debugging much easier, because I always had a working baseline.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] SQL Parser — &lt;code&gt;SELECT * FROM users WHERE id &amp;gt; 5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;[ ] Tree rebalancing on delete&lt;/li&gt;
&lt;li&gt;[ ] Distributed version&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;🌐 &lt;strong&gt;Live Demo:&lt;/strong&gt; fourmulaquery.onrender.com&lt;br&gt;&lt;br&gt;
💻 &lt;strong&gt;GitHub:&lt;/strong&gt; github.com/4mulaMind/4mulaQuery&lt;/p&gt;

&lt;p&gt;If you're learning data structures, build something real with them. You'll understand them 10x better.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Abdul Qadir&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyfkzekc7fbe9uit2fvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyfkzekc7fbe9uit2fvh.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>java</category>
      <category>database</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
