DEV Community

Lawrence Cooke
Lawrence Cooke

Posted on

How To Use Materialized Views

There are times when a query takes a long time to run. While indexes and good query design often help, sometimes the query is inherently slow.

In cases like this, we need to find an alternative way to collect the data to prevent the queries from creating slowness on a website.

Materialized views can provide significant improvement in query performance, and are especially useful in aggregated and reporting queries.

To demonstrate this I am importing a 100 million row CSV of mock temperature data from about 400 cities.

Amsterdam;2010-06-28;15.4
Kano;2017-04-23;18.7
Calgary;2016-05-07;4.3
Reggane;2014-10-04;32.0
Fukuoka;2010-04-17;22.6
Khartoum;2017-05-29;29.8
Vilnius;2014-06-16;4.2
Murmansk;2011-09-29;3.8
Parakou;2010-09-12;10.6
Cairo;2014-03-29;42.6
Edmonton;2015-04-24;-2.1
...
Enter fullscreen mode Exit fullscreen mode

Creating and importing data into a table

To create the table in PostgreSQL the following table definition can be used:

CREATE TABLE weather_data (
    id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    city VARCHAR(100),
    recorded_date DATE,
    temperature FLOAT
);
Enter fullscreen mode Exit fullscreen mode

Importing the data can be done through psql

psql -h localhost -U postgres -W
Enter fullscreen mode Exit fullscreen mode
\copy weather_data(city, recorded_date, temperature) FROM 'measurements.csv' WITH (FORMAT csv, DELIMITER ';', HEADER);
Enter fullscreen mode Exit fullscreen mode

Querying the data

The query used here is one derived from the 1 billion row challenge.

SELECT city, 
  MIN(temperature) AS min_temperature, 
  MAX(temperature) AS max_temperature, 
  AVG(temperature) AS avg_temperature
FROM weather_data 
GROUP BY city
ORDER BY city;
Enter fullscreen mode Exit fullscreen mode

Image description

On 100 million rows, this took about 7 seconds to run. If a query like this was running on a website, it would be too slow to be practical.

An option to consider is to use Redis, or Elastic Search to take the load off the database.

Another option is to use a materialized view in PostgreSQL to store the returned results.

What are Views?

A view in SQL is a virtual table. Rather than storing data, they store a SQL query. When the virtual table is queried, it's running the more complex query underneath to return the result.

CREATE VIEW weather_stats AS
SELECT city, 
  MIN(temperature) AS min_temperature, 
  MAX(temperature) AS max_temperature, 
  AVG(temperature) AS avg_temperature
FROM weather_data 
GROUP BY city
ORDER BY city;
Enter fullscreen mode Exit fullscreen mode

To query the view we would use a query like

SELECT * FROM weather_stats;
Enter fullscreen mode Exit fullscreen mode

Since this is just running the stored SQL query, this does not provide any query execution time improvement.

To improve execution time, a materialized view would be a better option.

What are Materialized Views?

Materialized views are quite different from regular views. Instead of storing a SQL query and running the query each time it's used, materialized views store the result set on disk.

This makes accessing the result set much faster than just running the raw query.

To create a materialized view, the syntax is almost the same as a regular view

CREATE MATERIALIZED VIEW weather_stats AS
SELECT city, 
  MIN(temperature) AS min_temperature, 
  MAX(temperature) AS max_temperature, 
  AVG(temperature) AS avg_temperature
FROM weather_data 
GROUP BY city
ORDER BY city;
Enter fullscreen mode Exit fullscreen mode

The create time for the materialized view will be about the same amount of time as it would be if you just ran the query, however when querying the materialized view, instead of the query taking 7 seconds, it returns results in about 5ms.

SELECT * FROM weather_stats;
Enter fullscreen mode Exit fullscreen mode

Refreshing materialized views

Since materialized views store the result set, the data can become stale.

To prevent the data from becoming too stale, the materialized view can be refreshed as regularly as needed to keep the data up to date.

To refresh the data, a REFRESH query is run

REFRESH MATERIALIZED VIEW weather_stats;
Enter fullscreen mode Exit fullscreen mode

Refreshing the data will temporarily block access to the view, causing downtime.

Once the refresh is complete, the data will become available again. The downtime would be based on how long the underlying query takes to run.

Concurrent Materialized View Refresh

To avoid downtime, concurrent refreshes can be used.

A concurrent refresh creates a temporary copy of the result set and when the refresh is complete, switches the materialized view to the new data. This allows access to the data during a data refresh.

Concurrent refreshes require a unique index to be added to the materialized view.

CREATE UNIQUE INDEX idx_weather_stats_city ON weather_stats(city);
Enter fullscreen mode Exit fullscreen mode

Once the index has been created, a concurrent refresh can be run

REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats;
Enter fullscreen mode Exit fullscreen mode

Updating A Materialized View Schema

Updating the views schema requires dropping the view and recreating it. Downtime would occur in these cases.

DROP MATERIALIZED VIEW IF EXISTS weather_stats;
Enter fullscreen mode Exit fullscreen mode

After the view has been dropped, a new CREATE query can be run to build the materialized view again and import the result set into the new table definition.

Scheduling the materialized view refresh

To schedule the materialized view refresh, an external script could be created, and use crontab to trigger the refresh.

Alternatively, it's possible to schedule the refresh inside PostgreSQL

Installing pg_cron extension

To run cron jobs inside PostgreSQL, the PostgreSQL pg_cron extension needs to be installed.

sudo apt-get install postgresql-17-cron
Enter fullscreen mode Exit fullscreen mode

Make sure you install the version compatible with the version of PostgreSQL you have installed.

Add the extension to postgresql.conf

sudo nano /etc/postgresql/17/main/postgresql.conf
Enter fullscreen mode Exit fullscreen mode

and add the extension at the bottom of the conf file

shared_preload_libraries = 'pg_cron'
Enter fullscreen mode Exit fullscreen mode

Restart PostgreSQL

 sudo systemctl restart postgresql
Enter fullscreen mode Exit fullscreen mode

Activate the extension in PostgreSQL by running the CREATE EXTENSION query.

CREATE EXTENSION IF NOT EXISTS pg_cron;
Enter fullscreen mode Exit fullscreen mode

Now we can schedule the view refresh

SELECT cron.schedule('refresh_weather', '0 * * * *', 
    'REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats');
Enter fullscreen mode Exit fullscreen mode

This will set the cron to refresh the view every hour on the hour.

Check that the cron is running by executing this query

SELECT * FROM cron.job;
Enter fullscreen mode Exit fullscreen mode
jobid schedule command nodename nodeport database username active jobname
1 0 * * * * REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats localhost 5432 postgres postgres true refresh_weather

When not to use materialized views

While materialized views are a useful feature, they are not the right solution for every situation.

In cases where data changes frequently and/or when data consistency is critical, materialized views are not a good option to use.

Currency exchange rates, where the data needs to be up to date to the second would not be a good use case for materialized views unless you were looking to store historical data.

It's also worth noting that because materialized views are stored on disk, that the disk space needed to store the view needs to be considered.

Summary

While there are a few concepts to learn with materialized views, they can significantly improve query execution time compared to raw queries, which will benefit your website.

Materialized views are not a good option for all cases. Understanding your data and customer needs will help determine if they are a viable solution or not.

Top comments (0)