A lot of systems begin with a single database.
Usually PostgreSQL.
And honestly, in the beginning, that works perfectly fine.
The application stores:
- users
- payments
- inventory
- authentication
- operational state
Dashboards query the same database.
Analytics queries also run directly on PostgreSQL.
Everything feels simple.
The Problem Usually Starts Slowly
At first, analytical queries are small.
Maybe:
- daily reports
- lightweight aggregations
- small dashboards
Nothing too serious.
But over time, systems start generating:
- more events
- more metrics
- more logs
- more historical records
- more observability data
And analytical workloads start behaving very differently from transactional workloads.
For example:
SELECT
service_name,
avg(response_time_ms)
FROM metrics
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY service_name;
This is a very different kind of workload from:
UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;
One is trying to preserve operational correctness.
The other is trying to analyze huge amounts of historical data.
And eventually those workloads start colliding.
PostgreSQL Slowly Becomes Responsible for Everything
This is where things usually start getting interesting.
A lot of systems unintentionally turn PostgreSQL into:
- the transactional database
- the reporting database
- the analytics database
- the observability database
all at the same time.
And honestly, modern PostgreSQL is capable enough that this can work surprisingly well for a while.
Until:
- dashboards become heavier
- retention windows grow
- analytical scans become larger
- observability traffic increases
- aggregations become expensive
Now suddenly the same database handling:
- payments
- authentication
- users
- inventory
is also handling large analytical workloads.
And this is usually where architectural pressure starts building.
The Real Problem Is Workload Isolation
This is honestly the biggest lesson.
The issue is usually not:
“PostgreSQL is slow.”
The issue is:
transactional workloads and analytical workloads optimize for completely different things.
Transactional systems care heavily about:
- consistency
- operational latency
- updates
- row-level modifications
- business correctness
Analytical systems care heavily about:
- large scans
- aggregations
- compression
- historical analytics
- query throughput
Those are fundamentally different workload patterns.
And eventually trying to optimize one database perfectly for both becomes painful.
Why Observability Changes Everything So Quickly
One thing I find interesting is how fast observability workloads expose architectural limitations.
Because observability systems continuously generate:
- logs
- metrics
- traces
- events
And these workloads grow aggressively over time.
Now imagine running:
- large aggregations
- historical scans
- high-cardinality queries
- real-time dashboards
on the same database handling:
- authentication
- inventory
- operational business logic
- transactional traffic
At smaller scale this may still work.
At larger scale:
- query contention increases
- operational latency becomes sensitive
- workload isolation becomes harder
And eventually systems start evolving toward separation.
This Is Usually When Analytical Databases Start Appearing
At some point, many systems evolve toward something like this:
Application
↓
PostgreSQL
↓
CDC / Kafka / Airbyte
↓
ClickHouse / OLAP DB
↓
Analytics / Dashboards / Observability
This pattern has become extremely common in modern analytical systems.
And honestly, the reason is pretty simple:
PostgreSQL remains responsible for operational correctness.
ClickHouse becomes responsible for analytical scale.
Each system handles the workload it was actually designed for.
Not All Analytical Data Needs PostgreSQL First
One important thing though:
Not all analytical data even originates from PostgreSQL.
A lot of observability workloads:
- logs
- metrics
- traces
- telemetry events
often flow directly into ClickHouse/OLAP DB through streaming pipelines.
Something like:
Applications / Services
↓
Kafka / Streaming Pipelines
↓
ClickHouse / OLAP DB
In many systems, PostgreSQL stores the business data while ClickHouse directly handles logs, metrics, events, and analytical workloads.
And honestly, this makes a lot of sense.
Because analytical systems are usually optimized for:
- append-heavy ingestion
- historical querying
- event-style workloads
not transactional business operations.
Why Not Just Use ClickHouse for Everything?
This is another common misunderstanding.
ClickHouse is incredible for analytical workloads.
But transactional systems still require things like:
- frequent updates
- operational consistency
- transactional guarantees
- row-level modifications
- business-critical correctness
Those are not the primary design goals of analytical databases.
You generally do not want your:
- authentication system
- payment workflows
- inventory updates
- operational application state
depending entirely on analytical database behavior.
Why CDC Pipelines Become So Important
One reason this architecture became so practical is CDC (Change Data Capture).
Instead of repeatedly exporting data manually, systems continuously stream changes from PostgreSQL into analytical systems using:
- Kafka
- Debezium
- Airbyte
- streaming pipelines
That means:
- operational systems continue working normally
- analytical systems receive near real-time data
- workloads stay separated cleanly
And analytical queries no longer compete directly against transactional traffic.
Don’t Rush Into Multi-Database Architectures
One important thing though:
Most systems do not need Kafka + ClickHouse pipelines on Day 1.
Honestly, many applications can scale surprisingly far with PostgreSQL alone using:
- proper indexing
- query optimization
- read replicas
- partitioning
- extensions like Citus
The goal is not to introduce more infrastructure as early as possible.
The real signal usually appears when analytical workloads start affecting operational user experience.
That is often when workload separation starts becoming worth the additional architectural complexity.
Because systems like:
- CDC pipelines
- Kafka
- analytical databases
also introduce operational overhead of their own.
And good architecture is usually about introducing complexity only when the workload actually demands it.
The Bigger Engineering Lesson
Most systems do not start with multiple databases.
They evolve into them as workloads grow.
Transactional workloads and analytical workloads behave very differently at scale.
And eventually systems start separating:
- operational correctness
- analytical querying
- observability workloads
- historical analytics
into infrastructure optimized for each workload.
Final Thought
A lot of modern systems do not start with multiple databases.
They evolve into them.
Because transactional workloads and analytical workloads eventually want very different things from the same infrastructure.
And real-time analytics is often the thing that forces that architectural separation to happen.
Top comments (0)