There are times when a query takes a long time to run. While indexes and good query design often help, sometimes the query is inherently slow.
In cases like this, we need to find an alternative way to collect the data to prevent the queries from creating slowness on a website.
Materialized views can provide significant improvement in query performance, and are especially useful in aggregated and reporting queries.
To demonstrate this I am importing a 100 million row CSV of mock temperature data from about 400 cities.
Amsterdam;2010-06-28;15.4
Kano;2017-04-23;18.7
Calgary;2016-05-07;4.3
Reggane;2014-10-04;32.0
Fukuoka;2010-04-17;22.6
Khartoum;2017-05-29;29.8
Vilnius;2014-06-16;4.2
Murmansk;2011-09-29;3.8
Parakou;2010-09-12;10.6
Cairo;2014-03-29;42.6
Edmonton;2015-04-24;-2.1
...
Creating and importing data into a table
To create the table in PostgreSQL the following table definition can be used:
CREATE TABLE weather_data (
id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
city VARCHAR(100),
recorded_date DATE,
temperature FLOAT
);
Importing the data can be done through psql
psql -h localhost -U postgres -W
\copy weather_data(city, recorded_date, temperature) FROM 'measurements.csv' WITH (FORMAT csv, DELIMITER ';', HEADER);
Querying the data
The query used here is one derived from the 1 billion row challenge.
SELECT city,
MIN(temperature) AS min_temperature,
MAX(temperature) AS max_temperature,
AVG(temperature) AS avg_temperature
FROM weather_data
GROUP BY city
ORDER BY city;
On 100 million rows, this took about 7 seconds to run. If a query like this was running on a website, it would be too slow to be practical.
An option to consider is to use Redis, or Elastic Search to take the load off the database.
Another option is to use a materialized view in PostgreSQL to store the returned results.
What are Views?
A view in SQL is a virtual table. Rather than storing data, they store a SQL query. When the virtual table is queried, it's running the more complex query underneath to return the result.
CREATE VIEW weather_stats AS
SELECT city,
MIN(temperature) AS min_temperature,
MAX(temperature) AS max_temperature,
AVG(temperature) AS avg_temperature
FROM weather_data
GROUP BY city
ORDER BY city;
To query the view we would use a query like
SELECT * FROM weather_stats;
Since this is just running the stored SQL query, this does not provide any query execution time improvement.
To improve execution time, a materialized view would be a better option.
What are Materialized Views?
Materialized views are quite different from regular views. Instead of storing a SQL query and running the query each time it's used, materialized views store the result set on disk.
This makes accessing the result set much faster than just running the raw query.
To create a materialized view, the syntax is almost the same as a regular view
CREATE MATERIALIZED VIEW weather_stats AS
SELECT city,
MIN(temperature) AS min_temperature,
MAX(temperature) AS max_temperature,
AVG(temperature) AS avg_temperature
FROM weather_data
GROUP BY city
ORDER BY city;
The create time for the materialized view will be about the same amount of time as it would be if you just ran the query, however when querying the materialized view, instead of the query taking 7 seconds, it returns results in about 5ms.
SELECT * FROM weather_stats;
Refreshing materialized views
Since materialized views store the result set, the data can become stale.
To prevent the data from becoming too stale, the materialized view can be refreshed as regularly as needed to keep the data up to date.
To refresh the data, a REFRESH query is run
REFRESH MATERIALIZED VIEW weather_stats;
Refreshing the data will temporarily block access to the view, causing downtime.
Once the refresh is complete, the data will become available again. The downtime would be based on how long the underlying query takes to run.
Concurrent Materialized View Refresh
To avoid downtime, concurrent refreshes can be used.
A concurrent refresh creates a temporary copy of the result set and when the refresh is complete, switches the materialized view to the new data. This allows access to the data during a data refresh.
Concurrent refreshes require a unique index to be added to the materialized view.
CREATE UNIQUE INDEX idx_weather_stats_city ON weather_stats(city);
Once the index has been created, a concurrent refresh can be run
REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats;
Updating A Materialized View Schema
Updating the views schema requires dropping the view and recreating it. Downtime would occur in these cases.
DROP MATERIALIZED VIEW IF EXISTS weather_stats;
After the view has been dropped, a new CREATE query can be run to build the materialized view again and import the result set into the new table definition.
Scheduling the materialized view refresh
To schedule the materialized view refresh, an external script could be created, and use crontab to trigger the refresh.
Alternatively, it's possible to schedule the refresh inside PostgreSQL
Installing pg_cron extension
To run cron jobs inside PostgreSQL, the PostgreSQL pg_cron
extension needs to be installed.
sudo apt-get install postgresql-17-cron
Make sure you install the version compatible with the version of PostgreSQL you have installed.
Add the extension to postgresql.conf
sudo nano /etc/postgresql/17/main/postgresql.conf
and add the extension at the bottom of the conf file
shared_preload_libraries = 'pg_cron'
Restart PostgreSQL
sudo systemctl restart postgresql
Activate the extension in PostgreSQL by running the CREATE EXTENSION query.
CREATE EXTENSION IF NOT EXISTS pg_cron;
Now we can schedule the view refresh
SELECT cron.schedule('refresh_weather', '0 * * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats');
This will set the cron to refresh the view every hour on the hour.
Check that the cron is running by executing this query
SELECT * FROM cron.job;
jobid | schedule | command | nodename | nodeport | database | username | active | jobname |
---|---|---|---|---|---|---|---|---|
1 | 0 * * * * | REFRESH MATERIALIZED VIEW CONCURRENTLY weather_stats | localhost | 5432 | postgres | postgres | true | refresh_weather |
When not to use materialized views
While materialized views are a useful feature, they are not the right solution for every situation.
In cases where data changes frequently and/or when data consistency is critical, materialized views are not a good option to use.
Currency exchange rates, where the data needs to be up to date to the second would not be a good use case for materialized views unless you were looking to store historical data.
It's also worth noting that because materialized views are stored on disk, that the disk space needed to store the view needs to be considered.
Summary
While there are a few concepts to learn with materialized views, they can significantly improve query execution time compared to raw queries, which will benefit your website.
Materialized views are not a good option for all cases. Understanding your data and customer needs will help determine if they are a viable solution or not.
Top comments (0)