DEV Community

Cover image for Why Real-Time Analytics Eventually Changes Your Database Architecture
Mohamed Hussain S
Mohamed Hussain S

Posted on

Why Real-Time Analytics Eventually Changes Your Database Architecture

A lot of systems begin with a single database.

Usually PostgreSQL.

And honestly, in the beginning, that works perfectly fine.

The application stores:

  • users
  • payments
  • inventory
  • authentication
  • operational state

Dashboards query the same database.

Analytics queries also run directly on PostgreSQL.

Everything feels simple.


The Problem Usually Starts Slowly

At first, analytical queries are small.

Maybe:

  • daily reports
  • lightweight aggregations
  • small dashboards

Nothing too serious.

But over time, systems start generating:

  • more events
  • more metrics
  • more logs
  • more historical records
  • more observability data

And analytical workloads start behaving very differently from transactional workloads.

For example:

SELECT
    service_name,
    avg(response_time_ms)
FROM metrics
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY service_name;
Enter fullscreen mode Exit fullscreen mode

This is a very different kind of workload from:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;
Enter fullscreen mode Exit fullscreen mode

One is trying to preserve operational correctness.

The other is trying to analyze huge amounts of historical data.

And eventually those workloads start colliding.


PostgreSQL Slowly Becomes Responsible for Everything

This is where things usually start getting interesting.

A lot of systems unintentionally turn PostgreSQL into:

  • the transactional database
  • the reporting database
  • the analytics database
  • the observability database

all at the same time.

And honestly, modern PostgreSQL is capable enough that this can work surprisingly well for a while.

Until:

  • dashboards become heavier
  • retention windows grow
  • analytical scans become larger
  • observability traffic increases
  • aggregations become expensive

Now suddenly the same database handling:

  • payments
  • authentication
  • users
  • inventory

is also handling large analytical workloads.

And this is usually where architectural pressure starts building.


The Real Problem Is Workload Isolation

This is honestly the biggest lesson.

The issue is usually not:

“PostgreSQL is slow.”

The issue is:

transactional workloads and analytical workloads optimize for completely different things.

Transactional systems care heavily about:

  • consistency
  • operational latency
  • updates
  • row-level modifications
  • business correctness

Analytical systems care heavily about:

  • large scans
  • aggregations
  • compression
  • historical analytics
  • query throughput

Those are fundamentally different workload patterns.

And eventually trying to optimize one database perfectly for both becomes painful.


Why Observability Changes Everything So Quickly

One thing I find interesting is how fast observability workloads expose architectural limitations.

Because observability systems continuously generate:

  • logs
  • metrics
  • traces
  • events

And these workloads grow aggressively over time.

Now imagine running:

  • large aggregations
  • historical scans
  • high-cardinality queries
  • real-time dashboards

on the same database handling:

  • authentication
  • inventory
  • operational business logic
  • transactional traffic

At smaller scale this may still work.

At larger scale:

  • query contention increases
  • operational latency becomes sensitive
  • workload isolation becomes harder

And eventually systems start evolving toward separation.


This Is Usually When Analytical Databases Start Appearing

At some point, many systems evolve toward something like this:

Application
    ↓
PostgreSQL
    ↓
CDC / Kafka / Airbyte
    ↓
ClickHouse / OLAP DB
    ↓
Analytics / Dashboards / Observability
Enter fullscreen mode Exit fullscreen mode

This pattern has become extremely common in modern analytical systems.

And honestly, the reason is pretty simple:

PostgreSQL remains responsible for operational correctness.

ClickHouse becomes responsible for analytical scale.

Each system handles the workload it was actually designed for.


Not All Analytical Data Needs PostgreSQL First

One important thing though:

Not all analytical data even originates from PostgreSQL.

A lot of observability workloads:

  • logs
  • metrics
  • traces
  • telemetry events

often flow directly into ClickHouse/OLAP DB through streaming pipelines.

Something like:

Applications / Services
        ↓
Kafka / Streaming Pipelines
        ↓
ClickHouse / OLAP DB
Enter fullscreen mode Exit fullscreen mode

In many systems, PostgreSQL stores the business data while ClickHouse directly handles logs, metrics, events, and analytical workloads.

And honestly, this makes a lot of sense.

Because analytical systems are usually optimized for:

  • append-heavy ingestion
  • historical querying
  • event-style workloads

not transactional business operations.


Why Not Just Use ClickHouse for Everything?

This is another common misunderstanding.

ClickHouse is incredible for analytical workloads.

But transactional systems still require things like:

  • frequent updates
  • operational consistency
  • transactional guarantees
  • row-level modifications
  • business-critical correctness

Those are not the primary design goals of analytical databases.

You generally do not want your:

  • authentication system
  • payment workflows
  • inventory updates
  • operational application state

depending entirely on analytical database behavior.


Why CDC Pipelines Become So Important

One reason this architecture became so practical is CDC (Change Data Capture).

Instead of repeatedly exporting data manually, systems continuously stream changes from PostgreSQL into analytical systems using:

  • Kafka
  • Debezium
  • Airbyte
  • streaming pipelines

That means:

  • operational systems continue working normally
  • analytical systems receive near real-time data
  • workloads stay separated cleanly

And analytical queries no longer compete directly against transactional traffic.


Don’t Rush Into Multi-Database Architectures

One important thing though:

Most systems do not need Kafka + ClickHouse pipelines on Day 1.

Honestly, many applications can scale surprisingly far with PostgreSQL alone using:

  • proper indexing
  • query optimization
  • read replicas
  • partitioning
  • extensions like Citus

The goal is not to introduce more infrastructure as early as possible.

The real signal usually appears when analytical workloads start affecting operational user experience.

That is often when workload separation starts becoming worth the additional architectural complexity.

Because systems like:

  • CDC pipelines
  • Kafka
  • analytical databases

also introduce operational overhead of their own.

And good architecture is usually about introducing complexity only when the workload actually demands it.


The Bigger Engineering Lesson

Most systems do not start with multiple databases.

They evolve into them as workloads grow.

Transactional workloads and analytical workloads behave very differently at scale.

And eventually systems start separating:

  • operational correctness
  • analytical querying
  • observability workloads
  • historical analytics

into infrastructure optimized for each workload.


Final Thought

A lot of modern systems do not start with multiple databases.

They evolve into them.

Because transactional workloads and analytical workloads eventually want very different things from the same infrastructure.

And real-time analytics is often the thing that forces that architectural separation to happen.

Top comments (0)