DEV Community

Cover image for Faster Reads With Materialized Views
sreevardhanreddi
sreevardhanreddi

Posted on

Faster Reads With Materialized Views

This Blog aims to give you a gist of Views in PostgreSQL and how to integrate in Django.

PostgreSQL Views are quite handy, think of Views as wrapping(abstracting) your complex queries and assigning a name to them.

They are several benifits of using Views, here are few

  • Views hide the complexity

if you have a query that requires aggregating or joining multiple tables and has complex logic, you can code all that logic into a view
and then retreive data from it as if it were a normal table.

  • Views could be used for security

imagine you have a table with sensitive data in certain columns/rows and you want to give access to a user without exposing that data, views allow us to wrap only the required data and grant permission to that view instead of underlying table.

Like all the SQL databases postgres has two kinds of views

  • Views
  • Materialized Views

Here are few differences between Views and Materialized Views

Views Materialized Views
Views are not stored in the disk. Materiliazed Views are stored on the disk.
Whenever we read data from views, data is read from the underlying table. Since materialized views has its own table, reading from materialized views will read from the table directly, but not from the underlying table.

Let us try to implement Materialized Views. Consider a following scenario of a Blogging site. A typical Blogging site models would contain

  • blogs table
  • categories table
  • tags table

where the relationship between them is quite straightforward, blogs has foreign key to categories and has a many to many relationship with tags. The equivalent Django Models would look like this.


class Tags(models.Model):
    name = models.CharField(max_length=200, unique=True)

    class Meta:
        verbose_name = "Tag"
        verbose_name_plural = "Tags"
        db_table = "tags"

    def __str__(self):
        return self.name


class Category(models.Model):
    name = models.CharField(max_length=200, unique=True)

    class Meta:
        verbose_name = "Category"
        verbose_name_plural = "Categories"
        db_table = "categories"

    def __str__(self):
        return self.name


class Blog(models.Model):
    title = models.CharField(max_length=200, unique=True)
    content = models.TextField()
    created_at = models.DateTimeField(auto_now=True)
    is_published = models.BooleanField(default=True)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True)
    tags = models.ManyToManyField(Tags)

    class Meta:
        verbose_name = "Blog"
        verbose_name_plural = "Blogs"
        db_table = "blogs"

    def __str__(self):
        return self.title


Enter fullscreen mode Exit fullscreen mode

Let's assume a situation where we have to query all the three tables and get data from all the three tables, for us to achieve this we'll have to join all the tables, the resulting django query would look like this



queryset = Blog.objects.select_related("category").prefetch_related("tags").all()

# the resulting SQL query generated by Django ORM is

SELECT "blogs"."id",
    "blogs"."title",
    "blogs"."content",
    "blogs"."created_at",
    "blogs"."is_published",
    "blogs"."category_id",
    "categories"."id",
    "categories"."name"
FROM "blogs"
LEFT OUTER JOIN "categories"
ON ("blogs"."category_id" = "categories"."id")
LIMIT 21

SELECT ("blogs_tags"."blog_id") AS "_prefetch_related_val_blog_id",
       "tags"."id",
       "tags"."name"
  FROM "tags"
 INNER JOIN "blogs_tags"
    ON ("tags"."id" = "blogs_tags"."tags_id")
 WHERE "blogs_tags"."blog_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)

Enter fullscreen mode Exit fullscreen mode

Notice that Django actually performs two queries here, we can do better by writing our own query :)


SELECT b.id,
       b.title AS blog_title,
       b.created_at AS blog_created_at,
       b.is_published AS blog_is_published,
       b.category_id AS blog_category_id,
       b.content AS blog_content,
       c.id AS category_id,
       c.name AS category_name,
       bt.blog_id AS blog_tag_blog_id,
       count(t.id) AS tag_count,
       jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tag_json
FROM blogs b
LEFT JOIN categories c ON b.category_id = c.id
LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
LEFT JOIN tags t ON bt.tags_id = t.id
GROUP BY b.id,
         c.id,
         bt.blog_id
ORDER BY b.id;


Enter fullscreen mode Exit fullscreen mode

As you can see the query is quite complex, we'll need to run the query every time when we need to access the data, thankfully views provide a convenient way to make it simple.

Let's see how we can create a Materialized Views


CREATE materialized VIEW mv_blogs_with_categories_and_tags_combined AS
SELECT b.id,
       b.title AS blog_title,
       b.content AS blog_content,
       b.created_at AS blog_created_at,
       b.is_published AS blog_is_published,
       c.id AS category_id,
       c.name AS category_name,
       count(t.id) AS tag_count,
       jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tags
FROM blogs b
LEFT JOIN categories c ON b.category_id = c.id
LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
LEFT JOIN tags t ON bt.tags_id = t.id
GROUP BY b.id,
         c.id,
         bt.blog_id
ORDER BY b.id WITH NO DATA;


--- create a unique index, this is needed when we try to refresh the materialized view concurrently
CREATE UNIQUE INDEX ON mv_blogs_with_categories_and_tags_combined (id);

--- refresh the materialized view, to populate the data into it.
REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;

-------- refresh the materialized view concurrently, without locking the table
-------- when we refresh the view without concurrently , a lock is acquired on the table
-------- to prevent this we use concurrently
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_blogs_with_categories_and_tags_combined;


Enter fullscreen mode Exit fullscreen mode

In the below screenshots you can see the execution times, when querying the materialized view table vs performing the complex query.

without_materialized_views

Execution Time: 25.874 ms

Enter fullscreen mode Exit fullscreen mode

with_materialized_views

Execution Time: 0.155 ms

Enter fullscreen mode Exit fullscreen mode

Speed Up ~ 160x

All the code is available here in this repo https://github.com/sreevardhanreddi/django-materialized-views


Integration With Django

Now that the SQL is done, let us go through integrating it with Django

Create an empty migrations file using this command

python manage.py makemigrations --name materialized_blogs pg_app --empty

Enter fullscreen mode Exit fullscreen mode

This file that is generated is used to create the migrations using Django's migrate command and it like show below


# Generated by Django 3.2.6 on 2021-08-12 06:59

from django.db import migrations
from django.db.migrations.operations.special import RunSQL


class Migration(migrations.Migration):

    dependencies = [
        ("pg_app", "0001_initial"),
    ]

    operations = [
        migrations.RunSQL(
            """
            CREATE materialized VIEW mv_blogs_with_categories_and_tags_combined AS
            SELECT b.id,
                b.title AS blog_title,
                b.content AS blog_content,
                b.created_at AS blog_created_at,
                b.is_published AS blog_is_published,
                c.id AS category_id,
                c.name AS category_name,
                count(t.id) AS tag_count,
                jsonb_agg(jsonb_build_object('tag_id', t.id, 'tag_name', t.name)) AS tags
            FROM blogs b
            LEFT JOIN categories c ON b.category_id = c.id
            LEFT JOIN blogs_tags bt ON b.id = bt.blog_id
            LEFT JOIN tags t ON bt.tags_id = t.id
            GROUP BY b.id,
                    c.id,
                    bt.blog_id
            ORDER BY b.id WITH NO DATA;
            """,
            """
                DROP MATERIALIZED VIEW IF EXISTS mv_blogs_with_categories_and_tags_combined;
            """,
        ),
        migrations.RunSQL(
            """
            REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;
            """,
            """
            REFRESH MATERIALIZED VIEW mv_blogs_with_categories_and_tags_combined;
            """,
        ),
        migrations.RunSQL(
            """
            CREATE UNIQUE INDEX idx_mv_blogs_with_categories_and_tags_combined_id ON mv_blogs_with_categories_and_tags_combined (id);
            """,
            """
            DROP INDEX IF EXISTS idx_mv_blogs_with_categories_and_tags_combined_id
            """,
        ),
    ]


# RunSQL allows you to write code for migrating forwards and backwards
# i.e, applying migrations and unapplying them. here
# the first string in RunSQL is the forward SQL, the second is the reverse SQL


Enter fullscreen mode Exit fullscreen mode

Once you have added the SQL commands, run python manage.py migrate this will apply the migrations to the database.

After this create model, which reflects the materialized view


class BlogsWithCategoriesAndTagsCombined(models.Model):
    id = models.IntegerField(primary_key=True)
    blog_title = models.CharField(max_length=200, unique=True)
    blog_content = models.TextField()
    blog_created_at = models.DateTimeField()
    blog_is_published = models.BooleanField()
    category_id = models.IntegerField()
    category_name = models.CharField(max_length=200)
    tag_count = models.IntegerField()
    tags = models.JSONField()

    class Meta:
        managed = False
        db_table = "mv_blogs_with_categories_and_tags_combined"

    def __str__(self):
        return self.blog_title

Enter fullscreen mode Exit fullscreen mode

Notice the meta class here, it has managed = False , this tells django to not create the table for this model in the database.

Once this is done, querying it is as simple as


queryset = BlogsWithCategoriesAndTagsCombined.objects.all()

Enter fullscreen mode Exit fullscreen mode

Since materialized views rely on the tables for the data, when a table updates, we'll have to manually refresh the materialized views, this can be done inside a cron job using django-crontab or celery.

This was originally posted here https://sreevardhanreddi.github.io/

Top comments (0)