Mohamed Hussain S

Posted on May 19

Why Real-Time Analytics Eventually Changes Your Database Architecture

#clickhouse #postgres #dataengineering #architecture

A lot of systems begin with a single database.

Usually PostgreSQL.

And honestly, in the beginning, that works perfectly fine.

The application stores:

users
payments
inventory
authentication
operational state

Dashboards query the same database.

Analytics queries also run directly on PostgreSQL.

Everything feels simple.

The Problem Usually Starts Slowly

At first, analytical queries are small.

Maybe:

daily reports
lightweight aggregations
small dashboards

Nothing too serious.

But over time, systems start generating:

more events
more metrics
more logs
more historical records
more observability data

And analytical workloads start behaving very differently from transactional workloads.

For example:

SELECT
    service_name,
    avg(response_time_ms)
FROM metrics
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY service_name;

This is a very different kind of workload from:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;

One is trying to preserve operational correctness.

The other is trying to analyze huge amounts of historical data.

And eventually those workloads start colliding.

PostgreSQL Slowly Becomes Responsible for Everything

This is where things usually start getting interesting.

A lot of systems unintentionally turn PostgreSQL into:

the transactional database
the reporting database
the analytics database
the observability database

all at the same time.

And honestly, modern PostgreSQL is capable enough that this can work surprisingly well for a while.

Until:

dashboards become heavier
retention windows grow
analytical scans become larger
observability traffic increases
aggregations become expensive

Now suddenly the same database handling:

payments
authentication
users
inventory

is also handling large analytical workloads.

And this is usually where architectural pressure starts building.

The Real Problem Is Workload Isolation

This is honestly the biggest lesson.

The issue is usually not:

“PostgreSQL is slow.”

The issue is:

transactional workloads and analytical workloads optimize for completely different things.

Transactional systems care heavily about:

consistency
operational latency
updates
row-level modifications
business correctness

Analytical systems care heavily about:

large scans
aggregations
compression
historical analytics
query throughput

Those are fundamentally different workload patterns.

And eventually trying to optimize one database perfectly for both becomes painful.

Why Observability Changes Everything So Quickly

One thing I find interesting is how fast observability workloads expose architectural limitations.

Because observability systems continuously generate:

logs
metrics
traces
events

And these workloads grow aggressively over time.

Now imagine running:

large aggregations
historical scans
high-cardinality queries
real-time dashboards

on the same database handling:

authentication
inventory
operational business logic
transactional traffic

At smaller scale this may still work.

At larger scale:

query contention increases
operational latency becomes sensitive
workload isolation becomes harder

And eventually systems start evolving toward separation.

This Is Usually When Analytical Databases Start Appearing

At some point, many systems evolve toward something like this:

Application
    ↓
PostgreSQL
    ↓
CDC / Kafka / Airbyte
    ↓
ClickHouse / OLAP DB
    ↓
Analytics / Dashboards / Observability

This pattern has become extremely common in modern analytical systems.

And honestly, the reason is pretty simple:

PostgreSQL remains responsible for operational correctness.

ClickHouse becomes responsible for analytical scale.

Each system handles the workload it was actually designed for.

Not All Analytical Data Needs PostgreSQL First

One important thing though:

Not all analytical data even originates from PostgreSQL.

A lot of observability workloads:

logs
metrics
traces
telemetry events

often flow directly into ClickHouse/OLAP DB through streaming pipelines.

Something like:

Applications / Services
        ↓
Kafka / Streaming Pipelines
        ↓
ClickHouse / OLAP DB

In many systems, PostgreSQL stores the business data while ClickHouse directly handles logs, metrics, events, and analytical workloads.

And honestly, this makes a lot of sense.

Because analytical systems are usually optimized for:

append-heavy ingestion
historical querying
event-style workloads

not transactional business operations.

Why Not Just Use ClickHouse for Everything?

This is another common misunderstanding.

ClickHouse is incredible for analytical workloads.

But transactional systems still require things like:

frequent updates
operational consistency
transactional guarantees
row-level modifications
business-critical correctness

Those are not the primary design goals of analytical databases.

You generally do not want your:

authentication system
payment workflows
inventory updates
operational application state

depending entirely on analytical database behavior.

Why CDC Pipelines Become So Important

One reason this architecture became so practical is CDC (Change Data Capture).

Instead of repeatedly exporting data manually, systems continuously stream changes from PostgreSQL into analytical systems using:

Kafka
Debezium
Airbyte
streaming pipelines

That means:

operational systems continue working normally
analytical systems receive near real-time data
workloads stay separated cleanly

And analytical queries no longer compete directly against transactional traffic.

Don’t Rush Into Multi-Database Architectures

One important thing though:

Most systems do not need Kafka + ClickHouse pipelines on Day 1.

Honestly, many applications can scale surprisingly far with PostgreSQL alone using:

proper indexing
query optimization
read replicas
partitioning
extensions like Citus

The goal is not to introduce more infrastructure as early as possible.

The real signal usually appears when analytical workloads start affecting operational user experience.

That is often when workload separation starts becoming worth the additional architectural complexity.

Because systems like:

CDC pipelines
Kafka
analytical databases

also introduce operational overhead of their own.

And good architecture is usually about introducing complexity only when the workload actually demands it.

The Bigger Engineering Lesson

Most systems do not start with multiple databases.

They evolve into them as workloads grow.

Transactional workloads and analytical workloads behave very differently at scale.

And eventually systems start separating:

operational correctness
analytical querying
observability workloads
historical analytics

into infrastructure optimized for each workload.

Final Thought

A lot of modern systems do not start with multiple databases.

They evolve into them.

Because transactional workloads and analytical workloads eventually want very different things from the same infrastructure.

And real-time analytics is often the thing that forces that architectural separation to happen.

DEV Community