DEV Community

Cover image for 5 things developers must know about Materialized Views and Materialize
dpmaterialize
dpmaterialize

Posted on

5 things developers must know about Materialized Views and Materialize

From smart homes and industrial systems to web browsers and server logs, there are streaming applications everywhere around us. As these applications run, new data could roll in several hundred times a second. If you're running some kind of data analysis —an average, sum, or complex aggregation—you want results to be as fresh as possible. As an example, if you're in the e-Commerce business asking for the total amount spent on the latest sale plus the total amount spent over the customer's lifetime, you better include the latest sale in the total.

Building technology that uses SQL to query streaming data and incrementally maintain streaming data views is challenging. To solve this, enterprises collect streaming data and batch-load it into analytical data warehousing systems, which can then be queried using SQL. This increases the “time to insight”, failing to capture all the events up-to-the near millisecond. With technologies like Materialize, you can work with streaming data using SQL, without compromising on speed. Materialize offers a fully-managed streaming database built for real-time enterprise data apps that is easy to use, powerful and scalable.

We believe that there are still many people who might not know much about Materialize. Are you one of them? Continue reading — we’ll cover 5 things every developer must know about materialized views in Materialize.

Before we dive in, let’s cover some quick basics….

Why Materialized Views?

If you’re new to databases and don’t know what a materialized view is, check out Andy Hattemer’s blog on “What is a Materialized View?”. In case you’re wondering, here are some of the key reasons why you might want to use a materialized view to power your workload:

Data partitioning - Depending on which data fields need to be shown, multiple materialized views can be built on the base tables to improve performance. In that way, the view surfaces only the fields it needs, and Materialize guarantees that incoming data stream changes are immediately visible across all the materialized views using that data.

Data pre-processing - Materialized views really show their value when your data requires some pre-processing before it’s viewed. In other words, instead of just reproducing a field in a data record, you may need to run some operation such as a pre-aggregate function (a sum, for instance) or even a pre-join or filter operation that creates completely new data from the underlying fields. You can learn more in Jessica Laughlin’s “Why use a materialized view” blog.

Data enrichment - In this scenario, Materialize views don’t just combine data from tables, but also combine row data in tables with streaming data in real-time to produce net new dynamic result sets. For example, let’s say you want to combine live clickstream event data with historical pre-computed data to decide whether or not the user should be served an ad on the site.

Now that we understand why you might want to use materialized views in an application, let’s move on to 5 things every developer should know about Materialize.

1: If you know Postgres, you know Materialize

Materialize is Postgres wire-compatible, which means that if you and your team are familiar with Postgres then you can easily use Materialize. For instance, Materialize also defines materialized views using SQL, just like Postgres does. With features such as JOINs, lateral sub-queries, and more, Materialize stands out among streaming data solutions. The result of a common SQL language interface is that Materialize can integrate with a wide variety of data engineering tools, powering a diverse range of applications, including reactive web and mobile apps, data segmentation, and analytical apps.

2: In Materialize, views work on streaming data

In most traditional databases, materialized views are not recommended when the data in the base table is constantly changing. Due to their design, these systems typically cannot handle high-velocity, real-time data streams. Streaming data is at the core of Materialize's design. If you have a legacy app that stores data in Postgres tables, materialized views can be easily set up to use data from those tables. In the case of newer greenfield streaming applications, like those built using technologies like Apache Kafka®, materialized views can be built to directly consume the Kafka stream and stay up-to-date with the changing data.

3: In Materialize, views are optimized using dataflows

OLAP databases predominantly use a pull-based query execution model that moves compute to data. In these cases, the compute step happens typically at the time of data reads. At the heart of Materialize’s design is a stack of long-researched stream processing frameworks called Timely Dataflow and Differential Dataflow. Dataflow frameworks flip the programming paradigm, using directed graphs and fixed operators to accommodate data-intensive streaming workloads where data moves at high speed while queries are long-lived and reactive to data changes.

4: In Materialize, views are performant yet consistent

In Materialize, the materialized views are served straight out of memory. This results in low-latency, high-throughput query responses for the application. Also, unlike other data systems that require a lot of compute resources, the incremental computation required to maintain a materialized view is not proportional to the size of the full dataset. This translates to lesser memory requirements, compute, and overall cost for your development project. Aside from performance, Materialize also ensures strong data consistency, so events hitting the database during a particular time window are properly accounted for in the materialized view. With Materialize, you don’t have to choose between scalable performance and strong consistency - you get both.

5: Materialize scales as your business scales

In recent years, separating storage and compute layers has become a hot topic for building large-scale distributed systems. This fundamental idea is at the core of the “Unbundled architecture” that we are developing in Materialize Cloud. If you want to learn more about it, check out Frank McSherry’s recent webinar on this topic. This results in a data architecture that is horizontally scalable, highly available (through active replication), and capable of ingesting data-intensive streaming workloads even when the compute is busy or idle, while serving low latency SQL queries.

Looking ahead with Materialize

We are working on building the next generation of Materialize in the cloud. If you’d like to give it a try, sign up for early access. Additionally, if you have questions or are interested in connecting with others using Materialize, join the Materialize community. We look forward to engaging with you, hearing your feedback, and learning about the new use cases you’re building out with Materialize.

Top comments (0)