🏦 OLTP vs. OLAP: Why One Database Isn't Enough

#datawarehouse #oltp #olap #database

Running analytical queries on your production database is one of the fastest ways to take down your application. A single poorly written report can lock rows, spike CPU, and cause your checkout to time out while a business stakeholder waits for last quarter's numbers. The reason is that OLTP and OLAP systems are optimized for fundamentally different workloads, and trying to do both in one place breaks one or both.

OLTP: The Operational Database

OLTP (Online Transaction Processing) systems handle the real-time operations of your application. Every time a user places an order, transfers money, or updates their profile, that goes through an OLTP database.

The design priorities are speed on individual rows and guaranteed consistency:

Highly normalized schema to eliminate redundancy and make writes fast
Short, surgical queries that touch a few rows at a time
ACID transactions to ensure partial writes never happen
High write throughput: hundreds or thousands of concurrent writes per second

A typical OLTP query:

-- Point lookup: one row, one index, sub-millisecond response
SELECT pair, rate, fetched_at
FROM forex_rates
WHERE pair = 'USD/KES'
ORDER BY fetched_at DESC
LIMIT 1;

This query is fast because it uses an index on pair and returns immediately. The database optimizes for reads that return small amounts of data quickly.

Common OLTP systems: PostgreSQL, MySQL, Oracle, SQL Server

Common use cases: Banking transactions, e-commerce checkouts, inventory management, API backends

OLAP: The Analytical Database

OLAP (Online Analytical Processing) systems are built for analysis over large historical datasets. The queries are complex, touch millions of rows, and compute aggregations across long time ranges.

Design priorities shift entirely:

Denormalized or columnar schema to minimize joins
Wide table scans across millions of rows
Heavy aggregations with GROUP BY, window functions, and subqueries
Read-heavy workload where writes are batched, not real-time

A typical OLAP query:

-- Aggregate across 3 months of data: touches millions of rows
SELECT
    symbol,
    DATE(timestamp)                         AS trade_date,
    SUM(volume)                             AS total_volume,
    ROUND(AVG(close_price), 4)              AS avg_price,
    ROUND(
        (close_price - LAG(close_price) OVER (PARTITION BY symbol ORDER BY DATE(timestamp)))
        / NULLIF(LAG(close_price) OVER (PARTITION BY symbol ORDER BY DATE(timestamp)), 0) * 100,
        2
    )                                       AS daily_change_pct
FROM stock_prices
WHERE timestamp >= NOW() - INTERVAL '90 days'
GROUP BY symbol, DATE(timestamp), close_price
ORDER BY symbol, trade_date;

This query scans 90 days of price data, computes window functions, and returns a summary. Run this on an OLTP database mid-day and you'll cause real problems for every transaction competing for the same resources.

Common OLAP systems: BigQuery, Snowflake, Redshift, DuckDB, PostgreSQL (analytical workloads)

Common use cases: Business intelligence dashboards, trend analysis, pipeline quality checks, executive reporting

Why the Separation Matters

The conflict is physical. OLTP workloads need row-level locking and fast index lookups. OLAP workloads need sequential scans over huge datasets and full CPU for aggregations. Running both on the same database means:

Analytical queries hold locks that block your write path
CPU-intensive aggregations starve your transactional queries of resources
Index strategies that help OLAP scans conflict with write performance
Backups and maintenance windows affect both workloads at once

The standard architecture is to keep a PostgreSQL database (or equivalent) for operational transactions and load data into a warehouse (BigQuery, DuckDB, Snowflake) for analytics. Airflow handles the movement between them.

Side-by-Side Comparison

	OLTP	OLAP
Optimization target	Fast writes, row-level consistency	Fast reads, aggregate performance
Query type	Precise lookups on few rows	Broad scans, heavy aggregations
Data freshness	Real-time, live data	Historical, updated in batches
Schema design	Normalized	Denormalized or columnar
Concurrency	Thousands of short transactions	Fewer, long-running queries
Tools	PostgreSQL, MySQL	BigQuery, Snowflake, DuckDB

The Practical Rule

If a query ends with LIMIT 1 or WHERE id = ?, it belongs in your OLTP system. If it ends with GROUP BY and touches more than a few thousand rows, it belongs in your warehouse.

Keeping them separate isn't just an architectural preference. It's what keeps your production application stable while your analytics team runs reports freely.

Follow me on dev.to for more data engineering content, or browse the project code at github.com/declerke.