The Disk-Level Architecture of OLTP vs. OLAP

#database #architecture #datascience #performance

Every backend engineer has seen this happen, you build an application on a relational database like MySQL, handling thousands of concurrent transactions effortlessly. Then, the business asks for a real time analytics dashboard. But when you run an aggregation query over historical data, suddenly the database that effortlessly managed live traffic starts thrashing, evicting your working set, and dragging application performance down.

This isn't a tuning problem, a missing index, or a badly written query. It’s a fundamental architectural collision.

OLTP (Online Transaction Processing)

OLTP encompasses nearly every concurrent digital interaction triggered across a distributed system. A user downloading a PDF, a microservice firing an automatic maintenance log, a comment on a social feed these are all transactions.

Data engineers rely on OLTP systems (like MySQL or PostgreSQL) to capture these concurrent streams of interactions for creating, updating and deleting records.

The Tree Based In-Place Engine

To reliably capture massive volumes of transactions without corrupting data or locking up the application, OLTP systems rely on a highly optimized, row oriented architecture built around the B+ Tree.

Because they must provide immediate, atomic updates to existing records, transactional databases manage state through a strict sequence of physical tree traversal and in-memory page mutation:

The B+ Tree Indexing: When a transaction reads or updates id: 1, the engine traverses a B+ Tree from the root, through the branch nodes, directly to the specific physical leaf node holding that row. This O(\log n) traversal guarantees a fast, isolated point-lookup. It ensures the application always hits the single version of the row without scanning irrelevant data.
The Buffer Pool & In-Place Updates: OLTP systems perform in place updates. The database pulls the exact page containing id: 1 from the physical disk into memory (the Buffer Pool). The specific row is mutated directly in RAM and the page is marked as 'dirty'.
Write-Ahead Logging (WAL): Because updating data in the Buffer Pool is volatile, the database must guarantee strict ACID durability. Before the dirty page is ever flushed back to the disk to overwrite the old contiguous block, the raw change event is appended to a sequential log (the WAL). Sequential disk writes are incredibly fast, guaranteeing that the transaction is permanently recorded without forcing the user to wait for a heavy disk overwrite.

This architecture makes relational databases the ultimate engine for ingesting millions of concurrent transactions safely. But the exact physical layout that makes it perfect for point updates is exactly what causes random I/O and memory spikes when you try to sequentially scan millions of those rows at once.

OLAP (Online Analytical Processing)

While OLTP is heavily optimized for real time state mutations, OLAP is engineered for multi-dimensional aggregation across historical datasets. Organizations today collect massive volumes of fragmented data across different sources like application databases, website clickstreams, and internal microservices.

Data engineers rely on OLAP systems (like StarRocks or data lakes built on Apache Iceberg) to unify these decoupled streams, executing complex aggregations, scans and joins to answer multi dimensional questions.

The Layer Based Append Only Engine

To solve the random I/O problem of the B-Tree, modern OLAP engines and data lakes (like StarRocks or Apache Iceberg) abandon in place updates entirely. Instead, they use a layered, append only architecture, heavily inspired by LSM-Trees (Log-Structured Merge-trees).

The Ingestion Layer (Layer 0)
When new data arrives or an update to an existing record occurs, the OLAP engine doesn't traverse a giant tree to find the old record. It simply writes the new record as a brand new, highly compressed Parquet or columnar file into the top layer of storage (Layer 0). This turns every write and update into a fast, sequential disk append. If id: 1 changes its status from "pending" to "active", the database doesn't overwrite the old "pending" row. It just drops a new file in Layer 0 saying id: 1 is now "active".
The Read Path (Merge-on-Read)
Because updates are just new inserts, there are technically duplicate records for id: 1 sitting on the disk. When an analytical query runs, the storage engine reads top-down. It checks the newest layers first. When it sees the "active" record in Layer 0, it dynamically masks or ignores the older "pending" record in the lower layers. The query gets the absolute latest state without ever having to traverse a massive tree structure.
The Compaction Engine
If the database only ever appended data, the disk would eventually explode with millions of tiny Parquet files, and read performance would degrade as the engine tried to merge too many layers on the fly. To handle this, background compaction run continuously. These async processes take the fragmented files from Layer 0 and merge them into larger, heavily optimized files in Layer 1. During this compaction process, the engine identifies the duplicates, physically drops the stale "pending" record for id: 1, and creates a single version of truth in the lower layer.

Bridging the Gap

If OLTP is strictly engineered for atomic writes and OLAP is optimized for massive reads, how do we get data from one to the other without destabilizing either system?

You don't do it by writing to both databases from your application, that introduces complexity and unacceptable latency. You also can't rely on heavy, periodic batch ETL jobs running directly against the transactional database without spiking resource usage and impacting live users.

The solution requires completely decoupling ingestion from analytics. In the modern data stack, this is achieved through Change Data Capture (CDC), streaming committed changes directly from the OLTP's Write-Ahead Log into the OLAP's append-only ingestion layer.

But the mechanics of how we safely stream, transform, and ingest those logs at scale without dropping events is an entirely different architectural deep-dive.