DEV Community: Charity Jelimo

Refreshing PostgreSQL Materialized Views Without Downtime

Charity Jelimo — Thu, 14 May 2026 13:09:00 +0000

Materialized views are one of PostgreSQL’s most useful features for analytics and reporting workloads.

They solve a very common problem: some queries are simply too expensive to run repeatedly in real time.

Imagine a dashboard query that joins multiple large tables, aggregates millions of rows, and calculates metrics per customer. Running that query on every request can quickly become slow, expensive, and unpredictable under load.

A materialized view solves this by storing the query result physically on disk.

Instead of recalculating the query every time, PostgreSQL precomputes the result once and serves future reads directly from the stored data.

CREATE MATERIALIZED VIEW mv_ordersummary AS
SELECT
    company_id,
    COUNT(*) AS total_orders,
    SUM(amount) AS revenue
FROM orders
GROUP BY company_id;

Querying the materialized view is fast because the expensive computation already happened earlier.

But materialized views introduce a new problem:

the data becomes stale.

Unlike normal views, materialized views do not automatically update when source tables change. You must refresh them manually.

That is where things become complicated.

Refreshing a materialized view sounds simple at first:

REFRESH MATERIALIZED VIEW mv_ordersummary;

But in production systems, especially multi-tenant systems with continuous traffic, refresh behavior becomes an operational challenge.

Large refreshes can take minutes.
Refresh operations acquire locks.
Users continue reading during refreshes.
One failed refresh should not impact other tenants.

And at scale, dozens or hundreds of materialized views may refresh continuously.

In our case, every tenant has its own set of materialized views:

mv_ordersummary_123
mv_ordersummary_456
mv_ordersummary_789

Why Refreshing Materialized Views Is Harder Than It Looks

The problem is not creating the materialized view.

The problem is replacing old data with new data while traffic is actively reading from it.

A few realities make this difficult:

Refreshes can take minutes on large datasets
PostgreSQL refresh operations acquire locks
Readers expect consistent results
Multi-tenant systems amplify operational problems
One bad tenant refresh should not break the entire refresh cycle

At small scale, REFRESH MATERIALIZED VIEW works fine.

At larger scale, you start caring about:

lock duration
transaction scope
bloat
validation
observability
failure isolation
rollback safety

The strategy you choose depends on your workload characteristics.
Let’s walk through the main approaches PostgreSQL engineers typically use.

Approach 1: Plain REFRESH MATERIALIZED VIEW

The simplest option is the default PostgreSQL refresh behavior.

REFRESH MATERIALIZED VIEW mv_ordersummary_123;

How It Works

PostgreSQL rebuilds the materialized view contents in place.

During the refresh, PostgreSQL acquires an ACCESS EXCLUSIVE lock on the materialized view.

That lock blocks:

reads
writes
concurrent refreshes

Readers wait until the refresh completes.

Pros

Very simple.
No special schema design required.
No additional storage overhead.
Usually fast for smaller views.
Cons
The lock behavior is a dealbreaker for high-concurrency systems.
If your refresh takes 3 minutes, readers block for 3 minutes.
In practice, that means:
API latency spikes
request timeouts
connection pool exhaustion
cascading failures

This becomes especially painful in multi-tenant systems where refreshes happen continuously.

Even if each refresh is “only” 30 seconds, enough concurrent tenants create constant lock pressure.

Verdict

Good for:

internal tools
small datasets
low read concurrency
maintenance windows

Not acceptable for continuously queried production workloads.

Approach 2: REFRESH MATERIALIZED VIEW CONCURRENTLY

PostgreSQL provides a safer option:

REFRESH MATERIALIZED VIEW CONCURRENTLY mv_ordersummary_123;

This is the first thing most engineers try after discovering lock contention.

How It Works

Instead of replacing the materialized view contents directly, PostgreSQL builds updated contents alongside the existing data and swaps them internally.

Readers can continue querying during the refresh.

Pros

No blocking reads.

This is the biggest advantage.

For many workloads, this alone is sufficient.

Cons

There are important caveats.

Requires a UNIQUE index

PostgreSQL requires at least one unique index covering all rows.

CREATE UNIQUE INDEX idx_mv_ordersummary_123
ON mv_ordersummary_123(order_id);

Not every materialized view naturally has a stable unique key.

Sometimes you end up manufacturing synthetic uniqueness just to satisfy refresh requirements.

Still one massive transaction

The entire concurrent refresh runs in one transaction.

For large views, that means:

long transaction lifetimes
large WAL generation
vacuum delays
replication lag risk
Slower than regular refresh

Concurrent refreshes are usually slower than standard refreshes because PostgreSQL must maintain visibility guarantees while comparing old and new rows.

Can create bloat

Frequent concurrent refreshes can generate table and index bloat over time.

You often need aggressive autovacuum tuning or periodic maintenance.

Verdict

A strong default choice when:

you can define a stable unique index
refresh duration is acceptable
storage churn is manageable But it still couples refresh execution to the live object itself.

That limitation matters at scale.

Approach 3: Incremental Refresh with Triggers or pg_ivm

The next level is incremental maintenance.

Instead of rebuilding the entire materialized view, you update only the changed rows.

This is the idea behind extensions like pg_ivm.

How It Works

Base table writes trigger incremental updates to the materialized view.

Conceptually:

INSERT into orders
    -> trigger fires
    -> materialized aggregate updated

Pros

Near real-time freshness.
No expensive full rebuilds.
No giant refresh windows.
Excellent for low-latency analytical workloads.

Cons

The complexity increases dramatically.

Write amplification

Every write to source tables now performs additional maintenance work.

Heavy OLTP systems can suffer significant write overhead.

2.** More operational complexity**

Triggers become part of the critical write path.

That introduces:

contention
debugging complexity
migration risk
transactional edge cases
SQL limitations

Incremental maintenance works best for simpler aggregation patterns.

Complex joins, window functions, or non-deterministic logic can become difficult or unsupported.

Verdict

Excellent for:

real-time analytics
event-heavy systems
carefully controlled schemas Overkill for many batch-oriented refresh pipelines.

Approach 4 (Recommended): Blue/Green Swap with Shadow Views

Instead of refreshing the live materialized view directly, we maintain a separate shadow refresh view.

For every logical view, we maintain three physical objects:

mv_ordersummary_123           -- live
mv_ordersummary_123_refresh   -- next candidate
mv_ordersummary_123_blue      -- previous generation

The key idea:

refresh happens entirely outside the live object
validation occurs before promotion
promotion is an atomic rename swap

The expensive work is isolated from readers.

The Refresh Flow

Our refresh functionality executes roughly in this sequence:

Refresh _refresh
Validate row count
Perform atomic rename swap
Reset stale refresh view
Update metadata

The critical insight is that the live object remains untouched until promotion.

Step 1: Refresh the Shadow View
REFRESH MATERIALIZED VIEW mv_ordersummary_123_refresh;

This may take minutes.

That is fine.

Nobody reads from _refresh.

The live materialized view continues serving traffic normally.

Step 2: Validate Before Promotion

We refuse to swap in obviously broken refreshes.

At minimum:

SELECT COUNT(*) 
FROM mv_ordersummary_123_refresh;

If the row count is zero, we abort promotion.

This catches failures like:

upstream ETL issues
accidental filters
empty joins
bad migrations

Validation can be more sophisticated:

row-count deltas
checksum comparisons
timestamp sanity checks
aggregate thresholds

The important thing is validating before touching production readers.

Step 3: Atomic 3-Way Rename Swap

This is the core pattern.

Inside one transaction:

BEGIN;

ALTER MATERIALIZED VIEW mv_ordersummary_123
    RENAME TO mv_ordersummary_123_blue;

ALTER MATERIALIZED VIEW mv_ordersummary_123_refresh
    RENAME TO mv_ordersummary_123;

ALTER MATERIALIZED VIEW mv_ordersummary_123_blue
    RENAME TO mv_ordersummary_123_refresh;

COMMIT;

The rename sequence is effectively atomic from the application's perspective.

Readers either see:

the old live view
or the new live view Never a half-built state.

Why This Works Well

The expensive operation is the refresh.

The rename itself is extremely fast.

The brief ACCESS EXCLUSIVE lock during rename typically lasts milliseconds.

That is fundamentally different from holding the lock during a multi-minute refresh.

Visualizing the Swap

Before:

LIVE      -> mv_ordersummary_123
REFRESH   -> mv_ordersummary_123_refresh
OLD       -> mv_ordersummary_123_blue

After refresh completes:

LIVE      -> old data
REFRESH   -> new data
OLD       -> previous generation

After atomic rename transaction:

LIVE      -> new data
REFRESH   -> old data
OLD       -> previous live generation name

Effectively:

live -> _blue
_refresh -> live
_blue -> _refresh

Readers continue querying the stable live name the entire time.

Step 4: Reset the Stale Refresh View

This is an important optimization.

After promotion, the old live view becomes the next cycle’s _refresh candidate.

We immediately clear it:

REFRESH MATERIALIZED VIEW mv_ordersummary_123_refresh
WITH NO DATA;

This does two things:

frees stale contents
guarantees the next refresh starts clean

Without this step, old data lingers indefinitely and can confuse debugging or validation logic.

The WITH NO DATA trick is underused and extremely useful in rotation-based refresh systems.

Step 5: Update Metadata

Finally:

UPDATE company_materialized_view_config
SET view_last_refreshed = NOW()
WHERE company_id = 123;

This powers:

dashboards
alerting
freshness monitoring
retry logic

Operational metadata matters as much as the refresh itself.

Why Blue/Green Works So Well

This pattern solves several operational problems simultaneously.

1.** Readers Never Touch the Refresh Process**

The live object remains stable during expensive refresh operations.

That isolation is the biggest win.

Validation Happens Before Exposure

You can reject bad refreshes safely.

That alone justifies the extra complexity.

Failures Are Easy to Contain

If refresh fails:

log it
skip promotion
continue processing other tenants

The existing live view remains untouched.

Promotion Is Fast

The swap itself is metadata-only.

That minimizes lock duration dramatically.

Tradeoffs

No solution is free.

*Approximately 2x Storage * You effectively maintain duplicate materialized views.

For very large datasets, this matters.

*Naming Conventions Become Important * Your orchestration layer must understand:

_refresh
_blue
live names

You need deterministic naming and discovery logic.

Brief Rename Locks Still Exist

The rename transaction acquires ACCESS EXCLUSIVE.

But the duration is tiny compared to full refresh locking.

Milliseconds instead of minutes is usually an acceptable trade.

Final thoughts:
Refreshing materialized views is easy in development environments.

Refreshing them safely in production is not.

As datasets grow and concurrency increases, refresh strategy becomes an operational architecture decision rather than just a SQL command.

There is no universally correct approach:

plain refresh is simplest
concurrent refresh improves availability
incremental maintenance provides real-time freshness
blue/green swaps maximize isolation and validation safety

For our workload, the blue/green pattern provided the best balance between simplicity, reliability, and operational safety.

The core idea is straightforward:

Never rebuild the object your users are actively reading.

Refresh elsewhere.

Validate.

Promote atomically.

What's a data warehouse?

Charity Jelimo — Sun, 29 Mar 2026 17:44:51 +0000

The data challenge in modern businesses

Imagine you’re running an online store. Every day, customers place orders, browse products, and make payments. All this data is being captured across different systems; your website, payment service, and customer database.

Now, at the end of the month, you’re asked a simple question: “What are our top-selling products, and how has revenue changed over time?”

Surprisingly, answering this isn’t easy.

The data is scattered, inconsistent, and stored in systems designed for daily operations, not analysis. This is the challenge most businesses face: they have data, but they can’t easily turn it into insights.

This is where a data warehouse comes in.

A data warehouse is a centralized system that collects data from multiple sources, cleans and organizes it, and stores it in a structured format optimized for analysis. Unlike operational databases that focus on fast transactions, data warehouses are designed for querying large volumes of historical data, making them ideal for reporting, dashboards, and business intelligence.

In simple terms, a data warehouse doesn’t just store data, it transforms it into something the business can actually use to make better decisions.

Data warehouse vs other data repositories

A data warehouse is just one way to store data, and it’s important to understand how it differs from other common systems:

Operational databases (OLTP): OLTP systems record business interactions as they occur in the day-to-day operation of the organization, and support querying of this data to make inferences. Fast and efficient, they keep the business running but they aren’t built to analyze historical trends or answer complex questions.ge-scale analysis.
Data lakes: Data Lakes store everything, raw and unprocessed. This includes structured data like tables, semi-structured data like JSON files, or even unstructured data like images and logs. They’re perfect for data scientists and advanced analytics, but without organization, it can be difficult to get clear answers quickly.
Data lakehouses: Data lakehouses are a hybrid. They combine the flexibility of a data lake with the structured, query-ready features of a warehouse. You can store raw data while also running analytics, giving businesses the best of both worlds.

Common Table Expressions

Charity Jelimo — Wed, 25 Feb 2026 13:46:35 +0000

A Common Table Expression (CTE) is a temporary result set that simplifies and structures SQL queries. It is defined using the WITH keyword and can improve query readability and reusability. In some cases, CTEs can also enhance performance by avoiding redundant calculations.

What are CTEs?

Temporary Result Set:
CTEs exist only during the execution of the query and are not stored in the database.
Readability and Maintainability:
By breaking complex logic into reusable components, CTEs make queries easier to understand.
Reusable Within the Query:
A CTE can be referenced multiple times within the query, avoiding repeated logic or calculations.

Why use CTEs?

Simplify Complex Queries: Break large queries into smaller, named parts for clarity.
Eliminate Redundant Calculations: Replace repeated subqueries with a single calculation in a CTE.
Improve Maintainability: Centralize repeated logic in one place, making updates easier.
Enable Recursive Queries: Handle hierarchical or iterative data using recursive CTEs.

How Do CTEs Optimize Performance?

Reduce Redundancy: CTEs calculate a result once, reducing unnecessary repetition. Example:

SELECT 
    o.OrderID
FROM Orders o
WHERE o.TotalAmount > (SELECT AVG(TotalAmount) FROM Orders)
  AND o.TotalAmount < (SELECT AVG(TotalAmount) FROM Orders);

Problem: The subquery (SELECT AVG(TotalAmount) FROM Orders) is calculated twice.

Solution Using a CTE:

WITH AvgTotalAmount AS (
    SELECT AVG(TotalAmount) AS AvgAmount
    FROM Orders
)
SELECT o.OrderID
FROM Orders o
JOIN AvgTotalAmount a
WHERE o.TotalAmount > a.AvgAmount
  AND o.TotalAmount < a.AvgAmount;

Simplify Execution Plans:
With a CTE, the database evaluates the logic once and reuses the result.
Readable Performance Gains:
Even if there’s no computational gain, a CTE often makes execution plans easier to debug.