<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ReadySet</title>
    <description>The latest articles on DEV Community by ReadySet (@readysettech).</description>
    <link>https://dev.to/readysettech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7996%2Fab553ccc-064e-43f5-ab71-f0c95dc48b81.png</url>
      <title>DEV Community: ReadySet</title>
      <link>https://dev.to/readysettech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/readysettech"/>
    <language>en</language>
    <item>
      <title>Optimizing Performance: A 2024 Updated Guide to Setting Up Caching in Laravel</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 04 Mar 2024 16:50:18 +0000</pubDate>
      <link>https://dev.to/readysettech/optimizing-performance-a-2024-updated-guide-to-setting-up-caching-in-laravel-4j6l</link>
      <guid>https://dev.to/readysettech/optimizing-performance-a-2024-updated-guide-to-setting-up-caching-in-laravel-4j6l</guid>
      <description>&lt;p&gt;Laravel is an excellent framework for those who want to use PHP. It pairs well with other languages, offers clean code, and includes features for a full-stack application out-of-the-box. &lt;/p&gt;

&lt;p&gt;One of those features is caching. Laravel includes an elegant, comprehensive caching system designed to be highly flexible, allowing developers to choose from various caching drivers and strategies, tailoring the caching solution to fit their applications' specific needs and architecture. &lt;/p&gt;

&lt;p&gt;This adaptability ensures that Laravel applications remain scalable, responsive, and efficient, regardless of complexity or user base size.&lt;/p&gt;

&lt;p&gt;This article will thoroughly explore Laravel caching, examine the various cache configurations available in Laravel, and discuss the reasons for utilizing &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;&lt;u&gt;Readyset&lt;/u&gt;&lt;/a&gt;. So, let's begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching Options in Laravel
&lt;/h2&gt;

&lt;p&gt;Laravel has a robust caching system that supports various drivers, facilitating seamless caching implementation.&lt;/p&gt;

&lt;p&gt;The configuration of caching in a Laravel application is managed in the &lt;code&gt;.env&lt;/code&gt; file. To change the caching option, set the &lt;code&gt;CACHE_DRIVER&lt;/code&gt; value in &lt;code&gt;.env&lt;/code&gt; to your preferred caching driver. By default, it's set to &lt;code&gt;file&lt;/code&gt;. You can find all supported caching options in &lt;code&gt;config/cache.php&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  File
&lt;/h3&gt;

&lt;p&gt;File caching stores cache data on the server's file system. It offers a simple, disk-based caching mechanism suitable for smaller applications but may not be optimal for larger ones. To use file caching, add to your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Array
&lt;/h3&gt;

&lt;p&gt;The array cache driver stores data in a PHP array. This non-persistent method is suitable for request-based caching, especially during application testing. To use it, set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=array
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;p&gt;Database caching stores cache data in a database table, ideal for applications needing database-driven caching. First, create a cache table using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan cache:table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run migrations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, update your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memcached
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://memcached.org/"&gt;&lt;u&gt;Memcached&lt;/u&gt;&lt;/a&gt; is a memory caching system for distributed caching environments. Install the&lt;a href="https://pecl.php.net/package/memcached"&gt;&lt;u&gt; Memcached PECL package&lt;/u&gt;&lt;/a&gt;, ensure Memcached is installed and running on your server, and set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=memcached
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Redis
&lt;/h3&gt;

&lt;p&gt;Redis offers advanced key-value store caching with data persistence and high performance. Install the Redis server and either the PhpRedis PHP extension via PECL or the &lt;code&gt;predis/predis&lt;/code&gt; package via Composer (e.g., &lt;code&gt;composer require predis/predis "^1.1"&lt;/code&gt;). Then, update your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=redis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DynamoDB
&lt;/h3&gt;

&lt;p&gt;For DynamoDB caching, create a table in AWS DynamoDB and note its name. Install the AWS SDK for PHP via Composer (&lt;code&gt;composer require aws/aws-sdk-php&lt;/code&gt;), set AWS credentials (&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;, &lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt;) in your &lt;code&gt;.env&lt;/code&gt; file, update &lt;code&gt;config/cache.php&lt;/code&gt; with DynamoDB details, and set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=dynamodb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Octane
&lt;/h3&gt;

&lt;p&gt;Laravel Octane enhances performance significantly and can be used for caching. Install Octane via &lt;code&gt;composer require laravel/octane&lt;/code&gt;, choose a caching backend like Swoole Table or Redis, update &lt;code&gt;config/cache.php&lt;/code&gt; to set &lt;code&gt;octane&lt;/code&gt; as the default cache driver, and configure your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=octane
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Octane with &lt;code&gt;php artisan octane:start&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Null
&lt;/h3&gt;

&lt;p&gt;The null cache driver effectively disables caching, useful in specific environments or during development/testing when caching interference is undesirable. Set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring Cache Settings
&lt;/h3&gt;

&lt;p&gt;For further customization and configuration of cache-related settings, modify the &lt;code&gt;config/cache.php&lt;/code&gt; file. This file allows for additional settings specific to each caching driver and the overall caching strategy of your application.&lt;/p&gt;
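&lt;p&gt;As a sketch of what lives in that file, here is a hypothetical excerpt (the store names and options shown are illustrative; the file Laravel ships defines many more stores and settings):&lt;/p&gt;

```php
<?php

// Hypothetical excerpt of config/cache.php. The env() and
// storage_path() helpers are provided by Laravel; the values shown
// are illustrative defaults, not a complete file.
return [
    // Store used when none is specified; reads CACHE_DRIVER from .env.
    'default' => env('CACHE_DRIVER', 'file'),

    'stores' => [
        'file' => [
            'driver' => 'file',
            'path' => storage_path('framework/cache/data'),
        ],
        'redis' => [
            'driver' => 'redis',
            'connection' => 'cache',
        ],
    ],

    // Prefix added to every key so multiple apps can share one backend.
    'prefix' => env('CACHE_PREFIX', 'laravel_cache'),
];
```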

&lt;h2&gt;
  
  
  Laravel Cache Methods
&lt;/h2&gt;

&lt;p&gt;Laravel offers a versatile range of caching methods, allowing for the easy implementation of various caching strategies. Here's an explanation of these methods, organized by functionality:&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing Data in Cache Methods
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::put()
&lt;/h3&gt;

&lt;p&gt;This function stores data in Laravel's cache. It requires a key, the value to be stored, and an optional duration (in seconds) for how long the data should be kept. If the duration is not specified, it uses the default duration from &lt;code&gt;config/cache.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::put('key', 'value', 60); // Stores 'value' under 'key' for 60 seconds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::putMany()
&lt;/h3&gt;

&lt;p&gt;Similar to &lt;code&gt;Cache::put()&lt;/code&gt;, this function allows caching of several items simultaneously with a shared expiration duration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::putMany(['data1' =&amp;gt; 'value1', 'data2' =&amp;gt; 'value2'], 30); // Stores 'value1' and 'value2' for 30 seconds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::remember()
&lt;/h3&gt;

&lt;p&gt;This method checks for the specified key in the cache. If not found, it executes the provided closure function to obtain data, caches the result, and returns it. This is useful for reducing database queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$value = Cache::remember('key', 60, function() {
    return 'data'; // Data retrieval logic
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
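&lt;p&gt;To make the behavior concrete, here is a plain-PHP sketch of the look-up-or-compute logic (an illustration only, not Laravel's implementation, and without expiry handling). Note that the closure runs just once; the second call is served from the cache:&lt;/p&gt;

```php
<?php

// Minimal in-memory illustration of Cache::remember() semantics:
// return the cached value if present, otherwise run the callback,
// store its result, and return it.
function remember(array &$cache, string $key, callable $callback)
{
    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }

    return $cache[$key] = $callback();
}

$cache = [];
$calls = 0;

$value1 = remember($cache, 'report', function () use (&$calls) {
    $calls++; // simulate an expensive database query
    return 'expensive result';
});

$value2 = remember($cache, 'report', function () use (&$calls) {
    $calls++;
    return 'expensive result';
});

// Both calls return 'expensive result', but $calls is 1: the
// expensive closure only ran the first time.
```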



&lt;h3&gt;
  
  
  Cache::forever()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::forever()&lt;/code&gt; is used to store data in the cache indefinitely, ideal for rarely changing data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::forever('key', 'data');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Retrieving and Managing Cached Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::get()
&lt;/h3&gt;

&lt;p&gt;Retrieves the value for a specified key. If the key does not exist, it returns null or a default value if provided:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$data = Cache::get('key', 'default-value');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::many()
&lt;/h3&gt;

&lt;p&gt;Fetches multiple cache items using an array of keys and returns an array of values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$data = Cache::many(['key1', 'key2']);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::forget()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::forget()&lt;/code&gt; removes a specific item from the cache, useful for invalidating cache entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::forget('key');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::flush()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::flush()&lt;/code&gt; clears all cache data. Use with caution as it removes everything stored in the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::flush();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adjusting Cached Values: Increment and Decrement
&lt;/h3&gt;

&lt;p&gt;To increment or decrement cached values, use &lt;code&gt;Cache::increment()&lt;/code&gt; and &lt;code&gt;Cache::decrement()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::increment('key');
Cache::decrement('key');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::lock()
&lt;/h3&gt;

&lt;p&gt;For handling race conditions, Laravel offers atomic locks using &lt;code&gt;Cache::lock()&lt;/code&gt;. This is an advanced feature useful in specific scenarios.&lt;/p&gt;
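&lt;p&gt;In Laravel the usual pattern is to call &lt;code&gt;Cache::lock('name', $seconds)&lt;/code&gt;, attempt &lt;code&gt;get()&lt;/code&gt;, do the work, and then &lt;code&gt;release()&lt;/code&gt;. The plain-PHP sketch below illustrates only the acquire/release semantics (a simplification; real atomic locks coordinate across processes and honor timeouts):&lt;/p&gt;

```php
<?php

// Plain-PHP sketch (an illustration, not Laravel's implementation) of
// the semantics behind Cache::lock(): only one caller can hold a
// named lock at a time, and release() frees it for the next caller.
function acquire(array &$locks, string $name): bool
{
    if (isset($locks[$name])) {
        return false; // the lock is already held
    }

    $locks[$name] = true; // atomic in Laravel; simplified here
    return true;
}

function release(array &$locks, string $name): void
{
    unset($locks[$name]);
}

$locks = [];

$first  = acquire($locks, 'deploy'); // acquired
$second = acquire($locks, 'deploy'); // refused: already held

release($locks, 'deploy');

$third = acquire($locks, 'deploy'); // acquired again after release
```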

&lt;p&gt;While there are more methods available in Laravel's cache system, the ones mentioned above are among the most widely used. For a comprehensive list, refer to &lt;a href="https://laravel.com/docs/10.x/cache"&gt;&lt;u&gt;Laravel's official cache documentation&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful Laravel Cache Commands
&lt;/h2&gt;

&lt;p&gt;Laravel provides several commands through &lt;a href="https://laravel.com/docs/10.x/artisan"&gt;&lt;u&gt;Artisan&lt;/u&gt;&lt;/a&gt; to simplify cache management. Below are some of these essential commands:&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Laravel Cache
&lt;/h3&gt;

&lt;p&gt;To clear the application cache before it expires, especially useful in development, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan cache:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command removes all items from the cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Route Cache
&lt;/h3&gt;

&lt;p&gt;Laravel caches the application routes for improved performance. To clear this route cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan route:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to cache the routes, which is advisable in production for faster route registration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan route:cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Clear Config Cache
&lt;/h3&gt;

&lt;p&gt;When you change config files and need to refresh the cached configuration, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan config:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially important after deploying changes in a production environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Compiled Views
&lt;/h3&gt;

&lt;p&gt;Compiled view files can be cleared with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan view:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful when updating views and needing to force Laravel to recompile them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear All Cache
&lt;/h3&gt;

&lt;p&gt;For a comprehensive cache clearing that includes cache, route cache, view cache, and compiled services, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan optimize:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command is particularly helpful during deployments to ensure all cached elements are refreshed. Each of these commands plays a crucial role in efficiently managing cached data and ensuring the smooth operation of a Laravel application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Readyset?
&lt;/h2&gt;

&lt;p&gt;Caching is a widely adopted technique used to enhance data retrieval speed. It works by storing the results of data requests in memory. This way, when that same data is required again, it can be quickly fetched from memory instead of being reprocessed.&lt;/p&gt;

&lt;p&gt;However, caching isn't without its challenges. One of the primary issues is managing the cache effectively. This involves updating the cache regularly and removing (or invalidating) cached data when it's no longer accurate or relevant.&lt;/p&gt;

&lt;p&gt;Readyset is a specialized SQL caching tool designed to enhance database performance. It stands in the middle of your application and your MySQL or PostgreSQL database and caches data.&lt;/p&gt;

&lt;p&gt;Here's a simple breakdown of how Readyset works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unique Approach:&lt;/strong&gt; Unlike typical caching methods, Readyset employs a dataflow graph technique. This means it automatically updates cached data whenever new information is added to your primary database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration:&lt;/strong&gt; After capturing an initial image of your database, Readyset positions itself as a secondary database. This allows it to continuously receive and incorporate new data directly from your main database server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol Compatibility:&lt;/strong&gt; Readyset is compatible with both MySQL and PostgreSQL protocols. This compatibility means you can connect your application straight to Readyset without major adjustments. If Readyset encounters a query that it hasn't cached, it simply forwards this query to your main database to ensure consistent and accurate data retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Readyset With Laravel
&lt;/h2&gt;

&lt;p&gt;To get started using Readyset in a Laravel project, you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install Readyset on your machine or server&lt;/li&gt;
&lt;li&gt;Connect to Readyset&lt;/li&gt;
&lt;li&gt;Cache Queries&lt;/li&gt;
&lt;li&gt;Configure your Laravel application database to connect to Readyset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we know what to do, let's get started with the implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Readyset
&lt;/h3&gt;

&lt;p&gt;We need a database to connect to. For this tutorial, we will use an employee sample database. To follow along, you can find the steps to download and install it on &lt;a href="https://github.com/datacharmer/test_db"&gt;&lt;u&gt;GitHub&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The recommended approach to install Readyset is through Docker. Ensure that you have Docker installed globally. Next, download the Readyset Docker image from Docker Hub using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker pull readysettech/readyset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After downloading the image, you need your primary database credentials to start Readyset. Your credentials should be in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql://&amp;lt;username&amp;gt;:&amp;lt;password&amp;gt;@&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;db_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's start Readyset as the &lt;code&gt;root&lt;/code&gt; user with a password of &lt;code&gt;password&lt;/code&gt; and connect to the employee database we downloaded above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d -p 3307:3307 -p 6034:6034         \
--name readyset                                 \
-e UPSTREAM_DB_URL=mysql://root:password@host.docker.internal:3306/employees \
-e LISTEN_ADDRESS=0.0.0.0:3307                  \
readysettech/readyset:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start up Readyset. You can confirm that Readyset is running from your Docker console by looking at the logs of the Readyset Docker container. Your screen should be similar to the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2coLEIPr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/zV9ZiiDmmQlMiIEnzd8dtaobUSFiQ92medKgkiGFqitqiiLu8BD73phICG2CYR-H46KL2dZjhKKCdUiqI2Q8qv0d80MmnDj13PBzP7YhV1ZDBjPIHplowR12KcNAF1IBdtqbh4f1PtM18nHRfIltBDs" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2coLEIPr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/zV9ZiiDmmQlMiIEnzd8dtaobUSFiQ92medKgkiGFqitqiiLu8BD73phICG2CYR-H46KL2dZjhKKCdUiqI2Q8qv0d80MmnDj13PBzP7YhV1ZDBjPIHplowR12KcNAF1IBdtqbh4f1PtM18nHRfIltBDs" alt="Docker console showing a Log of the Readyset docker container" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also confirm this in the terminal using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker logs readyset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: Your database user must have a password to connect to Readyset; MySQL or Postgres users without passwords will not work. If your database runs locally, use &lt;code&gt;host.docker.internal&lt;/code&gt;, not &lt;code&gt;localhost&lt;/code&gt;, for the connection.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For a more detailed guide on installing Readyset, visit the &lt;a href="https://docs.readyset.io/get-started/install-rs/docker/mysql?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;&lt;u&gt;Readyset documentation&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect to Readyset
&lt;/h3&gt;

&lt;p&gt;We can connect to Readyset from the terminal the same way we would connect to MySQL. The only significant difference is the port: Readyset listens on &lt;code&gt;3307&lt;/code&gt; here, while MySQL uses &lt;code&gt;3306&lt;/code&gt; by default. Now let's connect to Readyset using the same root credentials as above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql -u root -p password -P 3307
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify that you are connected using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SHOW READYSET TABLES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x26YccSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/BYGWTxjnwaaZnTiLMF7nPsUwxSNDTjWLROF1Mjhl-NgJBmBvfPIyHSrpbD5C7gyRhlemBFr0IWnAI_GbV86ZBkOPQ0PGeDD0wYFQ_IccQe3wrHyFY36ggONKjvGaCcAuS7rXwJCwRrS3uHqDPzfm9SY" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x26YccSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/BYGWTxjnwaaZnTiLMF7nPsUwxSNDTjWLROF1Mjhl-NgJBmBvfPIyHSrpbD5C7gyRhlemBFr0IWnAI_GbV86ZBkOPQ0PGeDD0wYFQ_IccQe3wrHyFY36ggONKjvGaCcAuS7rXwJCwRrS3uHqDPzfm9SY" alt="Display of Readyset tables showing which tales can be cached with Readyset by displaying the status  raw `snapshotted` endraw " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tables with the status of &lt;code&gt;snapshotted&lt;/code&gt; can now be cached using Readyset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Cache
&lt;/h3&gt;

&lt;p&gt;To cache using Readyset, we first run a query once, check that the query is cacheable by Readyset, then create a cache for the query. To get started, let's run a simple query to count how many employees earn a salary of 100,000:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2diK7QLk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/EBmRkNzNU-bmtIH5Y9F_DEl6CAzfmQCcE3QY11cuMPSGctX8Z3b61iayJrkw0F8yNchrt-J4JIlIvrEUnD22p3Mnd6s3oRY_MrB-7oe1V94q6JVBDRm0VtiJzd_J4j7dlFDtgjp5ru8rHE--jI-1Wks" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2diK7QLk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/EBmRkNzNU-bmtIH5Y9F_DEl6CAzfmQCcE3QY11cuMPSGctX8Z3b61iayJrkw0F8yNchrt-J4JIlIvrEUnD22p3Mnd6s3oRY_MrB-7oe1V94q6JVBDRm0VtiJzd_J4j7dlFDtgjp5ru8rHE--jI-1Wks" alt="Display of results of query to count how many employees earn $100,000 salaries." width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that the request took 0.44 seconds to complete. Next, let's ensure that the query is cacheable by Readyset using this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SHOW PROXIED SUPPORTED QUERIES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TNTGGDfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/OtEMMR6eRKDPgqQayQ5CNBjTFS9h1NnqGphjYob-rN-CNObEtovUYlHGdi7t6Ie9SSu9SRKqFAcmWMOAHisr7_U6hdO7lLc6FSIvXxIm9zQAdKT2O1JNUyvJGOZyfKBXB8wncVvj4z7qCVP2eTlSDAg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TNTGGDfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/OtEMMR6eRKDPgqQayQ5CNBjTFS9h1NnqGphjYob-rN-CNObEtovUYlHGdi7t6Ie9SSu9SRKqFAcmWMOAHisr7_U6hdO7lLc6FSIvXxIm9zQAdKT2O1JNUyvJGOZyfKBXB8wncVvj4z7qCVP2eTlSDAg" alt="Display of results after running a command to see if the query is cacheable" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's create our cache by adding &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to the query and then run the query after caching to verify it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; CREATE CACHE FROM SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this command has run, repeat the &lt;code&gt;SELECT&lt;/code&gt; query to see how fast it runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iSWubqm6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/blpc7wJP7BDTU92l-dvbZ6u4XSyAlbBCp76QYgzmKcskKTD805F320tNLuuMKAVColFoP4c1vqs-k-lHmTj6R8YecrRDAI5GBlfZKoM0PQ3c9gg8T_6o7wNHoR7EHq8DHs4oWrLPpr0v9F2PFH4nNBc" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iSWubqm6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/blpc7wJP7BDTU92l-dvbZ6u4XSyAlbBCp76QYgzmKcskKTD805F320tNLuuMKAVColFoP4c1vqs-k-lHmTj6R8YecrRDAI5GBlfZKoM0PQ3c9gg8T_6o7wNHoR7EHq8DHs4oWrLPpr0v9F2PFH4nNBc" alt="Display of results after creating a cache and running the query to verify caching works" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now it takes only 0.03 seconds to run the same command that took 0.44 seconds to run earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Readyset Caching With Laravel
&lt;/h2&gt;

&lt;p&gt;Now that we have Readyset installed, let's take the implementation a step further by implementing caching in a simple Laravel API using Readyset.&lt;/p&gt;

&lt;p&gt;Remember that Readyset sits between your MySQL database and your application. To see Readyset in action, we will execute the request we already created a cache for earlier: &lt;code&gt;SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;&lt;/code&gt; in our Laravel API and measure the response time when our database is connected to Readyset and when it is not.&lt;/p&gt;
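&lt;p&gt;For the comparison, a small timing helper keeps both measurements on the same code path. The helper below is a hypothetical sketch; any callable stands in for the real &lt;code&gt;DB::select()&lt;/code&gt; call:&lt;/p&gt;

```php
<?php

// Hypothetical helper: time an arbitrary query callback so the
// cached (Readyset) and uncached runs can be measured identically.
function timeQuery(callable $query): array
{
    $start = microtime(true);
    $result = $query();

    return [$result, microtime(true) - $start];
}

// Stand-in for the real DB::select() call.
[$result, $seconds] = timeQuery(function () {
    return array_sum(range(1, 1000)); // placeholder "work"
});

// $seconds now holds the elapsed wall-clock time for this run.
```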

&lt;p&gt;To get started, let's create a new Laravel project using the Composer command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;composer create project laravel/laravel Readyset_Laravel_Caching
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the project is created, connect your application to your database. In your &lt;code&gt;.env&lt;/code&gt; file, add your database credentials like the ones below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=employees
DB_USERNAME=&amp;lt;username&amp;gt;
DB_PASSWORD=&amp;lt;password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  API Route
&lt;/h3&gt;

&lt;p&gt;The next step is to create a route for the request. Navigate to the &lt;code&gt;routes/api.php&lt;/code&gt; file and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Route::get('/test-cache', function () {
    $employees = DB::select('SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000');
    return response()-&amp;gt;json($employees);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing the API
&lt;/h3&gt;

&lt;p&gt;Now let's serve our API and compare the response time of the request with and without Readyset caching.&lt;/p&gt;

&lt;p&gt;Serve the application using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Request Without Cache
&lt;/h3&gt;

&lt;p&gt;To test the application, you can use any API client you prefer. If you find it difficult to expose your endpoint to the internet for testing, I suggest using Ngrok. If you're unfamiliar with &lt;a href="https://ngrok.com/"&gt;&lt;u&gt;Ngrok&lt;/u&gt;&lt;/a&gt;, here's a useful &lt;a href="https://arjunamrutiya.medium.com/setting-up-laravel-with-ngrok-a-step-by-step-guide-a565b2c430b5"&gt;&lt;u&gt;tutorial to help you set it up&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Response time of request without caching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Uug2Vbt8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/A5XvKGx9gF4jenvhRibHrouEv9MmuaEyhmYP9rmYtE95XorQqJCz1aocP6qcQsxId4kyuLDzmZXyhKKD7-ipgljslF3wdDjEAJu41SRK0dPn1eYJuPR8zfWKwz-8pHA92m_KXr-3sjWAeJ5s6npN2DE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Uug2Vbt8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/A5XvKGx9gF4jenvhRibHrouEv9MmuaEyhmYP9rmYtE95XorQqJCz1aocP6qcQsxId4kyuLDzmZXyhKKD7-ipgljslF3wdDjEAJu41SRK0dPn1eYJuPR8zfWKwz-8pHA92m_KXr-3sjWAeJ5s6npN2DE" alt="Display of response time of request without caching" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Request With Readyset Cache
&lt;/h3&gt;

&lt;p&gt;As Readyset caches SQL results, you can add it to Laravel via the database setup rather than through the &lt;code&gt;CACHE_DRIVER&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Connecting to Readyset can be done simply by changing the port number of your database connection in the &lt;code&gt;.env&lt;/code&gt; file from &lt;code&gt;DB_PORT=3306&lt;/code&gt; to &lt;code&gt;DB_PORT=3307&lt;/code&gt;. Let's do that using the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3307
DB_DATABASE=employees
DB_USERNAME=&amp;lt;username&amp;gt;
DB_PASSWORD=&amp;lt;password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EJ5rqfyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hpa-rmb56XuXNwOleI31zl0zfpme_vbxRw0E-xExpivJPlr4FhRPu2jssL6SsMPuqA_U5BYwxYtHYCuFShIRSGHVWsL38bGcLmzT83uv6-H3baB8qg3y-zbLW9_e--QhLI4gmzxgnQeiS3GIJC5Vues" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EJ5rqfyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hpa-rmb56XuXNwOleI31zl0zfpme_vbxRw0E-xExpivJPlr4FhRPu2jssL6SsMPuqA_U5BYwxYtHYCuFShIRSGHVWsL38bGcLmzT83uv6-H3baB8qg3y-zbLW9_e--QhLI4gmzxgnQeiS3GIJC5Vues" alt="Display of response time of request with Readyset cahce" width="800" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Easily Manage Caching in Laravel With Readyset
&lt;/h2&gt;

&lt;p&gt;Laravel developers are spoiled for choice when it comes to caching. From simple file caching to DynamoDB, you can find a caching solution for any situation. On top of that, the caching API is easy to use and understand.&lt;/p&gt;

&lt;p&gt;But you are still stuck managing that cache and all the headaches that go with it. Readyset removes those headaches and makes it easy for Laravel developers to incorporate SQL caching into their applications with zero code changes. Here, we got our hands dirty with the code by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing and running Readyset using Docker&lt;/li&gt;
&lt;li&gt;Creating a simple API in Laravel&lt;/li&gt;
&lt;li&gt;Seeding our database&lt;/li&gt;
&lt;li&gt;Connecting our database to Readyset, and implementing caching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feel free to experiment with the code as much as you can and share your experience with the rest of the community. If you are interested in using Readyset, you can &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;sign up here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>caching</category>
      <category>database</category>
    </item>
    <item>
      <title>Investigating and Optimizing Over-Querying</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Wed, 14 Feb 2024 14:59:13 +0000</pubDate>
      <link>https://dev.to/readysettech/investigating-and-optimizing-over-querying-2ek</link>
      <guid>https://dev.to/readysettech/investigating-and-optimizing-over-querying-2ek</guid>
      <description>&lt;p&gt;Imagine you're running a popular e-commerce online bookstore that offers a vast collection of titles and authors to a growing user base. However, you've noticed a troubling trend over the past few months: the website is gradually slowing down, especially during peak hours when users browse various book categories. After an initial investigation, you find that the cause of the slowdown isn't an increase in user traffic or a lack of server resources. Instead, it's rooted in the very foundation of how your application interacts with your Postgres database.&lt;/p&gt;

&lt;p&gt;The culprit? N+1 query problems. As more users navigate your site, more requests are made to the database to fetch information. Instead of being efficiently retrieved in grouped queries, each request individually pulls associated data like author details, reviews, and related books. What should have been a streamlined operation has turned into a burdensome load on your database, leading to longer load times and a compromised user experience.&lt;/p&gt;

&lt;p&gt;This scenario is not unique to your online bookstore. Regardless of size or domain, many applications encounter similar performance bottlenecks due to N+1 queries. Understanding the nature of these queries, their impact on database performance, and how to optimize them is crucial for developers and database administrators. Here, we’re going into N+1 queries in a Postgres environment, providing insights and strategies to turn a potential database nightmare into a well-optimized, efficient system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are N+1 Queries?
&lt;/h2&gt;

&lt;p&gt;N+1 queries are a common performance bottleneck in databases. This issue occurs when an application performs an initial query to retrieve a set of records, followed by additional queries for each individual record. The name 'N+1' stems from making one (1) initial query and then N additional queries, resulting in N+1 total queries for N records.&lt;/p&gt;

&lt;p&gt;Let’s say you have an application that displays user profiles and their respective posts. The application first executes a query to fetch all users. This is the "1" in N+1. The application performs another query for each user retrieved to fetch their posts. If there are ten users, this results in 10 additional queries (the "N" in N+1), totaling 11 queries. While this approach may seem straightforward, it's highly inefficient, especially as the number of users grows.&lt;/p&gt;

&lt;p&gt;Let’s look at what this looks like. Assume you have two tables: &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;posts&lt;/code&gt;. Each user has multiple posts. The &lt;code&gt;posts&lt;/code&gt; table has a foreign key that references the &lt;code&gt;users&lt;/code&gt; table. You want to display each user along with their posts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users Table:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;id&lt;/code&gt; (Primary Key)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;name&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Posts Table:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;id&lt;/code&gt; (Primary Key)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;content&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt; (Foreign Key to Users)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A naive N+1 query to get these posts might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Query 1: Fetch all users
SELECT id, name FROM users;

-- For each user obtained from the above query, execute the following query:
SELECT content FROM posts WHERE user_id = [user_id];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the first query is the "1". The second query, executed once for each &lt;code&gt;user_id&lt;/code&gt; returned, supplies the "N".&lt;/p&gt;

&lt;p&gt;As a result, if there are 100 users, the total number of queries executed will be 101: 1 for fetching all users and 100 for fetching the posts for each user. This is a classic example of an N+1 query problem and can lead to performance issues, especially with a large number of users and posts.&lt;/p&gt;
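
&lt;p&gt;To make the arithmetic concrete, here is a minimal, self-contained Python sketch (using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in database; the schema mirrors the tables above) that counts the statements issued by the N+1 pattern versus a single &lt;code&gt;JOIN&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT,
                        user_id INTEGER REFERENCES users(id));
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (content, user_id) VALUES (?, ?)",
                 [(f"post by user{i}", i) for i in range(1, 101)])

# N+1 pattern: 1 query for the users, then 1 per user for the posts
query_count = 1
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, _name in users:
    conn.execute("SELECT content FROM posts WHERE user_id = ?", (user_id,)).fetchall()
    query_count += 1

# Optimized: the same data in a single round trip
rows = conn.execute(
    "SELECT u.id, u.name, p.content "
    "FROM users u LEFT JOIN posts p ON u.id = p.user_id"
).fetchall()

print(query_count)  # 101 statements under the N+1 pattern
print(len(rows))    # 100 rows from one JOIN query
```

&lt;p&gt;For 100 users, the N+1 version issues 101 statements; the &lt;code&gt;JOIN&lt;/code&gt; fetches the same data in one.&lt;/p&gt;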

&lt;h3&gt;
  
  
  N+1 Queries in ORMs
&lt;/h3&gt;

&lt;p&gt;It’s common to find N+1 queries in frameworks that use Object-Relational Mappers (ORMs). These are designed to convert models in an application into SQL statements to query data in relational databases. However, they often exacerbate the N+1 query issue due to their internal mechanisms.&lt;/p&gt;

&lt;p&gt;Let’s use the Python web framework Django as our example. In Django, an N+1 query problem can quickly occur when you have related models and access related data without properly optimizing your queries. Let's consider a scenario where each &lt;code&gt;User&lt;/code&gt; has a foreign key to a &lt;code&gt;Profile&lt;/code&gt; model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# models.py
from django.db import models

class Profile(models.Model):
    bio = models.TextField()
    # other fields like date_of_birth, location, etc.

class User(models.Model):
    name = models.CharField(max_length=100)
    profile = models.OneToOneField(Profile, on_delete=models.CASCADE)
    # other fields like email, etc.

# views.py
from django.shortcuts import render
from .models import User

def user_list(request):
    users = User.objects.all()
    user_profiles = []
    for user in users:
        profile = user.profile  # This creates the N+1 problem
        user_profiles.append((user, profile))

    context = {'user_profiles': user_profiles}
    return render(request, 'user_list.html', context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;User.objects.all()&lt;/code&gt; query retrieves all users. This is the "1" in N+1. Inside the loop, accessing each user's profile may result in a separate database query to fetch the corresponding &lt;code&gt;Profile&lt;/code&gt; instance. This is the "N" part of N+1, where N is the number of users.&lt;/p&gt;

&lt;p&gt;Because ORMs abstract away the underlying SQL queries, developers might not immediately realize that their code generates these inefficient N+1 queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Impact of N+1 Queries on Performance
&lt;/h2&gt;

&lt;p&gt;N+1 queries impact performance, especially in large-scale systems. These queries lead to inefficient data retrieval, increased load on the database, and, ultimately, a poor user experience. Understanding the consequences of N+1 queries is crucial for database optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Increased Database Load
&lt;/h3&gt;

&lt;p&gt;Each additional query in an N+1 problem adds to the load on the database. This is particularly problematic with large datasets. For instance, if an application retrieves 1000 users and makes a separate query for each user's posts, it results in 1001 queries hitting the database instead of potentially just one. This extra load can slow down the database response times, affecting all application users.&lt;/p&gt;

&lt;p&gt;Imagine a web application displaying a list of users and their recent activities. Without optimization, fetching 1,000 users might result in 1,001 queries (one to fetch all users and 1,000 to fetch activities for each). This heavy load can lead to longer wait times for the data to load, affecting user experience.&lt;/p&gt;

&lt;p&gt;A high volume of queries also consumes more CPU and memory resources on the database server. In our previous example, instead of a single, efficient query, the server must process 1,001 queries, consuming more resources. This impacts the query in question and affects the overall efficiency of the database server, hindering its ability to handle other requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling Challenges
&lt;/h3&gt;

&lt;p&gt;Applications with N+1 query problems often struggle to scale. As the data grows, so does the number of queries. In a social media app, for example, as more users join and create posts, the N+1 issue worsens, leading to an ever-growing number of queries. This scaling challenge can cause significant performance degradation over time, requiring more hardware resources to maintain the same level of performance.&lt;/p&gt;

&lt;p&gt;The total time complexity for the N+1 queries pattern is O(N). As the number of records (N) increases, the total number of queries increases linearly. This linear growth can lead to significant performance degradation, especially with large datasets. Each query incurs a certain amount of overhead due to network latency, query parsing, execution planning, and data retrieval. This overhead, multiplied by the number of queries, can substantially affect performance.&lt;/p&gt;
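
&lt;p&gt;A back-of-the-envelope model makes that overhead visible. The per-query figures below are illustrative assumptions, not measurements:&lt;/p&gt;

```python
# Illustrative cost model: the per-query overhead and execution times below
# are assumed values, chosen only to show how the totals grow.
overhead_ms = 1.0    # network latency, parsing, planning per query (assumed)
execution_ms = 5.0   # execution time of one small query (assumed)

def n_plus_one_cost(n):
    """1 initial query plus n follow-ups: total cost grows linearly, O(n)."""
    return (n + 1) * (overhead_ms + execution_ms)

def single_join_cost():
    """One round trip; assume the JOIN itself runs twice as long as a small query."""
    return overhead_ms + 2 * execution_ms

for n in (10, 100, 1000):
    print(n, n_plus_one_cost(n), single_join_cost())
```

&lt;p&gt;At 1,000 records the modeled N+1 cost is 6,006 ms against 11 ms for the single query. The absolute numbers are invented, but the shape is the point: the per-query overhead is paid N+1 times in one case and once in the other.&lt;/p&gt;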

&lt;h3&gt;
  
  
  Poor User Experience
&lt;/h3&gt;

&lt;p&gt;The cumulative effect of these performance issues can result in longer loading times for users, negatively impacting the user experience. In an e-commerce site, if product details are fetched with N+1 queries, each additional millisecond in load time can potentially lead to lost sales as customers grow impatient and leave the site.&lt;/p&gt;

&lt;p&gt;Consider a real-time data dashboard that monitors and displays various metrics. If each metric's data is fetched using separate queries for each element, the dashboard will experience noticeable delays, failing to deliver the real-time experience expected by users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detecting and Investigating N+1 Queries
&lt;/h2&gt;

&lt;p&gt;Sometimes, the simplest way to detect N+1 queries is through careful code review. Reviewers can look for loops or iterative processes that make database calls, particularly in the context of ORMs or when accessing related data.&lt;/p&gt;

&lt;p&gt;But N+1 queries can often be subtle and not immediately evident, especially in complex applications. Here are some effective strategies and tools that can be employed to detect and investigate N+1 queries:&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Logging
&lt;/h3&gt;

&lt;p&gt;Enabling query logging in Postgres is one of the first steps in detecting N+1 query problems. Logging all executed queries allows you to analyze the logs for patterns that indicate N+1 issues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Set the logging level to log all statements
ALTER DATABASE your_database_name SET log_statement = 'all';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After enabling logging, look for sequences of similar queries that differ only in a parameter, such as multiple queries fetching details for different user IDs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024-01-23 10:00:01 UTC LOG:  statement: SELECT * FROM users
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 1
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 2
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 3
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 4
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 5
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 6
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 7
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 8
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 9
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 10
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a strong indicator of N+1 queries.&lt;/p&gt;
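
&lt;p&gt;Spotting the pattern can also be automated. The sketch below is a rough illustration (the log format and repeat threshold are assumptions): it extracts the SQL from each log line, normalizes it by replacing literals with a placeholder, and counts repeats:&lt;/p&gt;

```python
import re
from collections import Counter

# Sample lines in the log format shown above (assumed format)
log_lines = [
    "2024-01-23 10:00:01 UTC LOG:  statement: SELECT * FROM users",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 1",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 2",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 3",
]

def normalize(line):
    """Pull out the SQL and replace literals with a '?' placeholder."""
    sql = line.split("statement:", 1)[1].strip()
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return sql

counts = Counter(normalize(line) for line in log_lines)

# Any normalized statement repeated at or above the threshold is an N+1 suspect
THRESHOLD = 3  # assumed; tune for your traffic
suspects = {sql: n for sql, n in counts.items() if n >= THRESHOLD}
print(suspects)  # {'SELECT * FROM posts WHERE user_id = ?': 3}
```
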

&lt;h3&gt;
  
  
  Performance Monitoring Tools
&lt;/h3&gt;

&lt;p&gt;Tools like &lt;a href="https://github.com/darold/pgbadger"&gt;pgBadger&lt;/a&gt;, &lt;a href="https://www.datadoghq.com/blog/database-performance-monitoring-datadog/"&gt;DataDog&lt;/a&gt;, or &lt;a href="https://scoutapm.com/blog/understanding-n1-database-queries"&gt;ScoutAPM&lt;/a&gt; offer detailed insights into database performance. They can help identify inefficient query patterns that may suggest N+1 issues.&lt;/p&gt;

&lt;p&gt;With pgBadger, you can analyze your Postgres logs to get a report highlighting frequently executed queries. A high frequency of similar queries can be a sign of N+1 problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjsnrs4ldikq7lto4au8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjsnrs4ldikq7lto4au8.png" alt="Table showing query patterns and frequency of execution" width="691" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;SELECT * FROM posts WHERE user_id = ?&lt;/code&gt; query stands out due to its high frequency of execution (850 times). This pattern is indicative of an N+1 query problem. It suggests that for each user fetched by the &lt;code&gt;SELECT * FROM users&lt;/code&gt; query (executed ten times), there are multiple subsequent queries to fetch posts. The average duration of each &lt;code&gt;posts&lt;/code&gt; query is low (5ms), but the cumulative impact (total duration of 4250ms) is significant, pointing to a potential performance issue.&lt;/p&gt;

&lt;p&gt;The disparity between the number of executions of the &lt;code&gt;users&lt;/code&gt; query and the &lt;code&gt;posts&lt;/code&gt; query is a classic sign of N+1 queries. From this, the recommendations might be to investigate the application code following the execution of &lt;code&gt;SELECT * FROM users&lt;/code&gt;, especially the parts where posts for each user are accessed or displayed. Then, developers can optimize the query pattern to use JOIN operations or batch processing to reduce the total number of queries.&lt;/p&gt;

&lt;p&gt;An application performance monitoring tool like DataDog or Scout APM will often automatically highlight inefficiencies in an application, such as N+1 queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hofr1069jh7qz0ani0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hofr1069jh7qz0ani0g.png" alt="Application performance monitoring tool displaying inefficiencies in an application" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They can show the response times for these queries, the number of calls, and the query itself. This kind of report helps pinpoint where the application might be inefficiently using the database, guiding developers toward specific areas of the code that may need optimization to address the N+1 query problem.&lt;/p&gt;

&lt;p&gt;Languages also have profiling tools built in, such as Python's &lt;code&gt;cProfile&lt;/code&gt;. These can help detect N+1 queries by showing where your application spends most of its time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import cProfile
cProfile.run('function_that_loads_data()')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This profiling can help identify functions that are making excessive database calls. A large amount of time spent in database-related functions could indicate N+1 queries.&lt;/p&gt;

&lt;p&gt;Beyond language tools, most ORMs allow enabling SQL debug logging. This feature logs every SQL query the ORM executes, making spotting repetitive query patterns indicative of N+1 problems easier. Keeping with the Python theme, in Django, SQL debug logging can be activated by setting &lt;code&gt;DEBUG = True&lt;/code&gt; in your &lt;code&gt;settings.py&lt;/code&gt;, which causes all SQL queries to be printed to the console during development. You can also use Django's &lt;code&gt;django.db.connection.queries&lt;/code&gt; for query insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  EXPLAIN Command
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;EXPLAIN&lt;/code&gt; command in Postgres is an invaluable tool for understanding how your queries are being executed. It can help you identify queries that do not efficiently use indexes or perform full table scans, which could be part of an N+1 problem.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;EXPLAIN SELECT * FROM posts WHERE user_id = 1;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The output provides insights into the query execution plan, which can help identify inefficiencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QUERY PLAN
-----------------------------------------------------------
 Seq Scan on posts  (cost=0.00..35.50 rows=1560 width=2048)
   Filter: (user_id = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;EXPLAIN&lt;/code&gt; tells us we perform a “Seq Scan on posts.” This indicates that a sequential scan is being performed on the &lt;code&gt;posts&lt;/code&gt; table. Sequential scans are generally less efficient than index scans, especially for larger tables, as they involve scanning each row.&lt;/p&gt;

&lt;p&gt;The sequential scan (Seq Scan) on the &lt;code&gt;posts&lt;/code&gt; table might not be a problem for a single query. However, if similar queries are executed repeatedly for different &lt;code&gt;user_id&lt;/code&gt; values (as in an N+1 scenario), the database is performing numerous full table scans, which can be highly inefficient.&lt;/p&gt;

&lt;p&gt;The absence of an index scan suggests that there might not be an index on the &lt;code&gt;user_id&lt;/code&gt; column or the query planner did not find it efficient to use the index. For an N+1 query pattern, the repeated execution of such full table scans can significantly degrade performance.&lt;/p&gt;
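
&lt;p&gt;The usual remedy is an index on the filter column. As an illustration, the sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; rather than Postgres, so the plan text differs (SQLite reports &lt;code&gt;SEARCH ... USING INDEX&lt;/code&gt; where Postgres would show an Index Scan), but the before-and-after effect is the same idea:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO posts (content, user_id) VALUES (?, ?)",
                 [(f"post {i}", i % 50) for i in range(1000)])

def plan(sql):
    """Return the top line of SQLite's query plan for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# Without an index the planner must scan the whole table
before = plan("SELECT content FROM posts WHERE user_id = 1")
print(before)  # e.g. 'SCAN posts'

conn.execute("CREATE INDEX idx_posts_user_id ON posts(user_id)")

# With the index the planner narrows the search instead of scanning
after = plan("SELECT content FROM posts WHERE user_id = 1")
print(after)   # e.g. 'SEARCH posts USING INDEX idx_posts_user_id (user_id=?)'
```
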

&lt;h2&gt;
  
  
  Optimization and Query Design
&lt;/h2&gt;

&lt;p&gt;Optimization and better query design are the most effective strategies for significantly reducing the impact of N+1 query problems in a Postgres environment. Doing so naturally leads to better application performance and scalability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Eager loading
&lt;/h3&gt;

&lt;p&gt;When using ORMs, the best solution (given you can’t optimize the query yourself) is eager loading. Eager loading involves modifying the query to fetch all related data in a single query instead of separate queries for each record. This can be achieved in ORMs with specific methods or query options (like &lt;code&gt;.includes&lt;/code&gt; in Rails or &lt;code&gt;select_related&lt;/code&gt; in Django). These reduce the number of queries to the database, improving performance.&lt;/p&gt;

&lt;p&gt;If we want to optimize our Django example above, this can be achieved through techniques like &lt;code&gt;select_related&lt;/code&gt; or &lt;code&gt;prefetch_related&lt;/code&gt;, designed to handle database queries more efficiently for related objects.&lt;/p&gt;

&lt;p&gt;Assuming the &lt;code&gt;User&lt;/code&gt; model has a ForeignKey to another model, let's say a &lt;code&gt;Profile&lt;/code&gt; model, here's an optimized version of the Django code that avoids the N+1 query problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# models.py
from django.db import models

class Profile(models.Model):
    bio = models.TextField()
    # other fields

class User(models.Model):
    name = models.CharField(max_length=100)
    profile = models.OneToOneField(Profile, on_delete=models.CASCADE)
    # other fields

class Post(models.Model):
    user = models.ForeignKey(User, related_name='posts', on_delete=models.CASCADE)
    content = models.TextField()
    # other fields

# views.py
from django.shortcuts import render
from .models import User

def user_profiles(request):
    # Use 'select_related' to fetch the related Profile in the same query
    users = User.objects.select_related('profile').all()

    # 'prefetch_related' is used for reverse ForeignKey relationships
    # This fetches all related posts in a separate query, reducing the overall number of queries
    users = users.prefetch_related('posts')

    return render(request, 'user_profiles.html', {'users': users})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;select_related('profile')&lt;/code&gt; method is used with the &lt;code&gt;User.objects.all()&lt;/code&gt; query. This fetches the associated &lt;code&gt;Profile&lt;/code&gt; for each &lt;code&gt;User&lt;/code&gt; in the same database query, thus avoiding separate queries for each user's profile.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;prefetch_related('posts')&lt;/code&gt; method handles the reverse ForeignKey relationship from &lt;code&gt;User&lt;/code&gt; to &lt;code&gt;Post&lt;/code&gt;. It performs a separate query to fetch all related posts and then efficiently pairs them with the corresponding users, which is more efficient than doing individual queries for each user's posts.&lt;/p&gt;

&lt;p&gt;This approach significantly reduces the number of queries, particularly when you have a large number of users and posts, thus improving the performance of your Django application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;p&gt;If you have particular queries you want to return fast, you can implement caching to reduce the need to query the database each time.&lt;/p&gt;

&lt;p&gt;Readyset allows you to cache SQL queries without changes to your application code. The only change needed is to swap out your primary Postgres database connection string with a Readyset connection string. Readyset will connect to your primary database and register with the replication stream. After snapshotting your database, every query will be proxied through Readyset. Along with your monitoring tools from above, you can then also use Readyset to understand your query performance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs09qsypucro1u2uc5ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs09qsypucro1u2uc5ph.png" alt="Readyset displaying a table with query performance" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you have detected N+1 queries that can be cached (usually, read-heavy queries are optimal candidates), you can start caching. In this case, we want to cache our posts queries. To do so, we just prepend &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to our query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE CACHE FROM SELECT * FROM posts WHERE user_id = ?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results from this query will now be served from Readyset with sub-millisecond latencies. Readyset monitors the replication stream from the primary database, looking for changes to the data, so the cache will be automatically updated whenever the underlying data is updated in the primary database.&lt;/p&gt;

&lt;p&gt;Readyset also works with ORMs to help increase performance and optimize those queries under the hood.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query design
&lt;/h3&gt;

&lt;p&gt;Just as you can optimize your system around your queries, if you aren’t using an ORM, you can also optimize your queries for better performance. Here are a few options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use Joins Appropriately: Utilize SQL joins to combine data from multiple tables in a single query. Understand the differences between &lt;code&gt;INNER JOIN&lt;/code&gt;, &lt;code&gt;LEFT JOIN&lt;/code&gt;, etc., and choose the most appropriate one for your use case to avoid unnecessary data retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select Only Required Columns: Specify the columns you need in your &lt;code&gt;SELECT&lt;/code&gt; statements rather than using &lt;code&gt;SELECT *&lt;/code&gt;. This reduces the amount of data transferred and processed, making the query more efficient.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand and Use Indices: Ensure your queries effectively leverage indices, especially for columns in &lt;code&gt;JOIN&lt;/code&gt;, &lt;code&gt;WHERE&lt;/code&gt;, or &lt;code&gt;ORDER BY&lt;/code&gt; clauses. Regularly review and optimize indices based on query patterns and data changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid Looping Queries: Identify scenarios where your code iteratively executes queries (like in a loop) and refactor them to use bulk data retrieval techniques. Replace multiple small queries with fewer, more comprehensive queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limit Data with Pagination: When dealing with large datasets, use pagination to limit the data retrieved and processed in a single query. This improves both database and application performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, to optimize the basic queries from above, you could add a join into your query to fetch all the data at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT u.id, u.name, p.content
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
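
&lt;p&gt;Because the &lt;code&gt;JOIN&lt;/code&gt; returns one flat row per user/post pair, the application typically regroups the result set in memory. Here is a minimal sketch using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module with the same schema as above (the sample names are invented):&lt;/p&gt;

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 'first post', 1), (2, 'second post', 1);
""")

# One query fetches everything; LEFT JOIN keeps users without posts (Grace)
rows = conn.execute("""
    SELECT u.id, u.name, p.content
    FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
""").fetchall()

# Regroup the flat rows into one entry per user
posts_by_user = defaultdict(list)
for user_id, name, content in rows:
    key = (user_id, name)
    if content is not None:
        posts_by_user[key].append(content)
    else:
        posts_by_user[key]  # creates an empty entry for users with no posts

print(dict(posts_by_user))
# e.g. {(1, 'Ada'): ['first post', 'second post'], (2, 'Grace'): []}
```
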



&lt;h2&gt;
  
  
  Solving the N+1 Puzzle
&lt;/h2&gt;

&lt;p&gt;N+1 queries can destroy your application’s performance as you scale. What’s worse is that this issue can be abstracted from you by ORMs and other high-level frameworks, making them harder to detect and resolve. But with monitoring tools and a better understanding of the problem, you can detect these troublesome queries and work to improve your query design and optimize your performance.&lt;/p&gt;

&lt;p&gt;At Readyset, we are building ways to help developers and platform engineers do just that. By caching queries, you can significantly improve your response latencies while still serving your users' fresh data. If this sounds like something that will help you, sign up for &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Readyset Cloud&lt;/a&gt; or &lt;a href="https://join.slack.com/t/readysetcommunity/shared_invite/zt-2272gtiz4-0024xeRJUPGWlRETQrGkFw"&gt;reach out to us&lt;/a&gt; if you have any questions.&lt;/p&gt;

</description>
      <category>caching</category>
      <category>postgres</category>
      <category>postgressql</category>
      <category>database</category>
    </item>
    <item>
      <title>A Practical Guide to Caching: What to Cache and When</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 05 Feb 2024 18:49:24 +0000</pubDate>
      <link>https://dev.to/readysettech/a-practical-guide-to-caching-what-to-cache-and-when-47ha</link>
      <guid>https://dev.to/readysettech/a-practical-guide-to-caching-what-to-cache-and-when-47ha</guid>
      <description>&lt;p&gt;Caching is a Goldilocks problem. Caching too much leads to stale data, memory bloat, and a lot of cache management. Caching too little leads to longer latencies, higher database load, and the need to provision more and more storage. &lt;/p&gt;

&lt;p&gt;The aim is to cache just right, optimize cache performance, minimize database load, and enhance overall system efficiency. &lt;/p&gt;

&lt;p&gt;This is a lot easier said than done. How do you know what to cache and when? Modern databases help you understand the usage patterns of your data, including the frequency of reads versus writes, the size of the data being accessed, and the variability of query response times. These metrics are critical in informing your caching strategy, enabling you to identify hotspots of frequent access where caching can provide the most benefit. Here, we want to take you through determining what to cache, and when, using the insights your database can provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing Workload Patterns
&lt;/h2&gt;

&lt;p&gt;Understanding workload patterns in your production database is crucial for effective caching, as it directly influences what data to cache and the caching strategy to employ.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read/Write Ratio Analysis
&lt;/h3&gt;

&lt;p&gt;The read/write ratio analysis is fundamental to understanding workload patterns in database systems. This ratio illustrates the frequency of read operations (like &lt;code&gt;SELECT&lt;/code&gt; queries) compared to write operations (&lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;). A thorough analysis of this ratio can guide decisions on what data to cache and how to manage the cache effectively.&lt;/p&gt;

&lt;p&gt;A high read ratio indicates that data is frequently accessed but seldom changed. This is ideal for caching because it doesn't require frequent invalidation or updates once data is cached. A high write ratio suggests more dynamic data, which can lead to frequent cache invalidations and reduced cache effectiveness.&lt;/p&gt;

&lt;p&gt;The basic steps for analyzing your read/write ratio are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data collection.&lt;/strong&gt; Use database monitoring tools or query logs to collect data on read and write operations. Some databases provide built-in tools or extensions (e.g., &lt;code&gt;pg_stat_statements&lt;/code&gt; in PostgreSQL) that can simplify this process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantify operations.&lt;/strong&gt; Count the number of read and write operations over a given period. This can be done through script automation or database monitoring tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate the ratio.&lt;/strong&gt; Compute the proportion of reads to writes. A simple formula could be: Read/Write Ratio = Number of Reads / Number of Writes.&lt;/li&gt;
&lt;/ol&gt;
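
&lt;p&gt;The steps above reduce to simple arithmetic once you have the counts. Here is a minimal Python sketch of the calculation; the counts and the 10:1 threshold are hypothetical, used only to illustrate the decision:&lt;/p&gt;

```python
def read_write_ratio(reads: int, writes: int) -> float:
    """Read/Write Ratio = Number of Reads / Number of Writes."""
    if writes == 0:
        return float("inf")  # data that is never written is an ideal cache candidate
    return reads / writes

# Hypothetical counts collected from pg_stat_statements
reads, writes = 50_000, 5_000
ratio = read_write_ratio(reads, writes)
print(f"read/write ratio: {ratio:.1f}:1")  # read/write ratio: 10.0:1

# Illustrative threshold: treat strongly read-heavy data as cacheable
print("good cache candidate" if ratio >= 10 else "cache with care")
```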

&lt;p&gt;In Postgres, we can use the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then query &lt;code&gt;pg_stat_statements&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT query, calls, rows
FROM pg_stat_statements
WHERE query LIKE 'SELECT %' OR query LIKE 'INSERT %' OR query LIKE 'UPDATE %' OR query LIKE 'DELETE %';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will show you the frequency of &lt;code&gt;SELECT&lt;/code&gt; statements versus &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt; statements. You can then calculate the read/write ratio from these counts.&lt;/p&gt;

&lt;p&gt;High read/write ratios (e.g., 10:1) indicate that the data is read ten times more often than it is written to. This data is a good candidate for caching. A low read/write ratio (e.g., 1:2) indicates more writes than reads. You need to be cautious about caching this type of data, as the cache would need frequent updates or invalidations.&lt;/p&gt;

&lt;p&gt;For data with a high read ratio, you can cache and set a longer Time-To-Live (TTL) for that cache. If you are caching low read/write ratio data, you must design an efficient cache invalidation strategy to ensure data consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Temporal Locality and Hotspots
&lt;/h3&gt;

&lt;p&gt;Temporal locality is based on the principle that recently accessed data will likely be accessed again soon. This pattern is a key factor in identifying 'hotspots' in your data - areas where frequent access suggests a high potential benefit from caching. Understanding and identifying these hotspots allows for a more targeted and efficient caching strategy, improving performance and resource utilization.&lt;/p&gt;

&lt;p&gt;To identify hotspots, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Monitor access patterns.&lt;/strong&gt; Use database monitoring tools to track access frequency to different data elements. Look for patterns where specific rows, tables, or queries are accessed repeatedly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze query logs.&lt;/strong&gt; Analyze logs for repeated access to specific queries. Frequent execution of the same query, especially within short time frames, indicates a hotspot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand usage metrics.&lt;/strong&gt; Collect metrics such as the number of hits, execution time, and frequency of access for various database elements.&lt;/li&gt;
&lt;/ol&gt;
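
&lt;p&gt;In essence, the steps above amount to counting how often each normalized query appears in the access log. A minimal Python sketch, using made-up log data and an illustrative threshold:&lt;/p&gt;

```python
from collections import Counter

def find_hotspots(accesses, threshold=3):
    """Return (query, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(accesses)
    return [(q, n) for q, n in counts.most_common() if n >= threshold]

# Hypothetical access log of normalized query fingerprints
log = [
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM orders WHERE id = $1",
]
print(find_hotspots(log))  # [('SELECT * FROM users WHERE id = $1', 3)]
```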

&lt;p&gt;To understand the access frequency, you must configure your database to log detailed query information. In your &lt;code&gt;postgresql.conf&lt;/code&gt;, you can set the following parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;logging_collector = on # enable logging.
log_directory = 'pg_logs' # specify the directory for log files.
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # set the log file naming convention.
log_statement = 'all' # log every executed statement.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To analyze the logs, you can use something like &lt;a href="https://github.com/darold/pgbadger/tree/master"&gt;pgBadger&lt;/a&gt;. pgBadger will parse the log and create an HTML report showing detailed query statistics.&lt;/p&gt;

&lt;p&gt;Suppose you have a query log from a Postgres database and notice that specific user profiles are accessed frequently. Temporal locality suggests that user profiles accessed in the last day will likely be accessed again soon. &lt;/p&gt;

&lt;p&gt;Data or queries identified as hotspots are strong candidates for caching. The user profile data in the above example should be prioritized for caching. This is how we cache at ReadySet, a process called &lt;a href="https://docs.readyset.io/concepts/overview"&gt;partial materialization&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can think of partial materialization as a demand-driven cache-filling mechanism. With it, only a subset of the query results are stored in memory based on common input parameters to the query. For example, if a query is parameterized on user IDs, then ReadySet would only cache the results of that query for the active subset of users, since they are the ones issuing requests.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This allows us to cache relevant data while also reducing memory overhead. The goal is to minimize latency and database load by caching data showing repeated access patterns over time. By focusing on the most frequently accessed data, you can significantly improve the performance of your database and applications, ensuring that resources are allocated efficiently and that the cache serves its purpose effectively. &lt;/p&gt;

&lt;h3&gt;
  
  
  Query Analysis and Profiling
&lt;/h3&gt;

&lt;p&gt;Query analysis and profiling are critical aspects of understanding and optimizing database performance. By using query analysis tools and techniques like the &lt;code&gt;EXPLAIN&lt;/code&gt; command and examining query execution plans, engineers can gain insights into how queries are executed, which are resource-intensive, and how they can be optimized. This information is invaluable in making informed decisions about caching strategies.&lt;/p&gt;

&lt;p&gt;An execution plan shows how the database engine executes a query, including steps like scans, joins, sorts, and aggregations. The &lt;code&gt;EXPLAIN&lt;/code&gt; command in SQL provides the execution plan for a query. It reveals the database's operations and their CPU and I/O cost.&lt;/p&gt;

&lt;p&gt;To use &lt;code&gt;EXPLAIN&lt;/code&gt; with your query, just prepend your SQL query with &lt;code&gt;EXPLAIN&lt;/code&gt;. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM user_logs WHERE user_id = 1234;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the execution plan without actually running the query. You are looking for operations like sequential scans, index scans, sorts, and joins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Seq Scan on user_logs  (cost=0.00..155.00 rows=5000 width=132)
  Filter: (user_id = 1234)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, we have a sequential scan, indicating the query is scanning the entire &lt;code&gt;user_logs&lt;/code&gt; table. This can be inefficient for large tables. It also shows a filter is applied to the &lt;code&gt;user_id&lt;/code&gt;, which is inefficient if this operation is frequent or the table is large (you should use indexes instead).&lt;/p&gt;

&lt;p&gt;The estimated cost and number of rows affected help assess the query's efficiency. Cost is a unitless measurement used by Postgres to estimate the relative expense of executing the query. It combines I/O, CPU, and other factors to help compare the efficiency of different query plans. &lt;/p&gt;

&lt;p&gt;The lower the cost, the more efficient the query is expected to be. Queries that run frequently and have high costs are prime candidates for caching. Queries that involve significant data retrieval and minimal updates are ideal for caching.&lt;/p&gt;
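
&lt;p&gt;One way to act on this is to rank queries by their total burden, calls multiplied by average cost, and review the top of the list as caching candidates. A sketch in Python; the stats rows are hypothetical stand-ins for data you would pull from &lt;code&gt;pg_stat_statements&lt;/code&gt; and &lt;code&gt;EXPLAIN&lt;/code&gt;:&lt;/p&gt;

```python
def rank_cache_candidates(stats):
    """Rank queries by total cost burden (calls * mean cost), highest first."""
    return sorted(stats, key=lambda s: s["calls"] * s["mean_cost"], reverse=True)

# Hypothetical per-query statistics
stats = [
    {"query": "SELECT * FROM products WHERE category = $1", "calls": 10_000, "mean_cost": 155.0},
    {"query": "SELECT * FROM settings WHERE id = $1", "calls": 50, "mean_cost": 20.0},
]
top = rank_cache_candidates(stats)[0]
print(top["query"])  # the products query dominates the workload
```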

&lt;p&gt;Effective query analysis and profiling allow engineers to identify performance bottlenecks and optimize query execution. This is a crucial step in determining which data or queries will benefit most from caching. Regularly profiling queries, especially resource-intensive or frequently executed, can significantly enhance database performance and overall application efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Latencies
&lt;/h3&gt;

&lt;p&gt;Query latencies, the time taken to execute database queries, are a crucial metric in workload analysis. They provide insight into the performance of the database and are instrumental in identifying which queries might benefit most from caching. High latencies often indicate bottlenecks or inefficiencies that can be alleviated through strategic caching. For engineers, measuring, analyzing, and interpreting query latencies are key to optimizing database performance.&lt;/p&gt;

&lt;p&gt;We can set &lt;code&gt;log_min_duration_statement&lt;/code&gt; to log queries that exceed a specified execution time in milliseconds, either in &lt;code&gt;postgresql.conf&lt;/code&gt; or per session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set log_min_duration_statement=1000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows us to look for queries with consistently high execution times, as these are primary candidates for optimization and caching. You want to be able to analyze patterns in this data to determine if high latencies are isolated incidents or part of a pattern. Recurring high latencies during certain operations or times indicate systematic issues.&lt;/p&gt;

&lt;p&gt;For example, after enabling query logging in PostgreSQL, you might find:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  duration:1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This indicates that the query took 1050 milliseconds, which might be considered high for this operation. High latencies often point to performance bottlenecks. These could be due to unoptimized queries, lack of appropriate indexing, or heavy load on the database. Queries with high read latencies and low write operations are excellent candidates for caching. For instance, caching its results could significantly improve performance if the product query above is read-heavy and the data doesn't change often.&lt;/p&gt;
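
&lt;p&gt;Log lines like this are easy to mine with a small script. The sketch below extracts durations from Postgres-style log lines; the regex is a simplification of the real log format, for illustration only:&lt;/p&gt;

```python
import re

LOG_LINE = re.compile(r"duration:\s*(\d+(?:\.\d+)?) ms\s+statement: (.*)")

def parse_durations(lines):
    """Extract (duration_ms, statement) pairs from Postgres-style log lines."""
    out = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            out.append((float(m.group(1)), m.group(2)))
    return out

logs = [
    "LOG:  duration: 1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';",
    "LOG:  connection received",  # non-query lines are skipped
]
print(parse_durations(logs)[0])
```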

&lt;p&gt;Regularly monitoring and analyzing query latencies ensure the caching strategy aligns with current usage patterns and performance requirements.&lt;/p&gt;

&lt;p&gt;Readyset shows you the latencies for each of the queries you are running through it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32901cjnn9qh14fxjqip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32901cjnn9qh14fxjqip.png" alt="Readyset showing the 50p, 90p, and 99p latencies for query response times" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Readyset displays the 50p, 90p, and 99p latencies for query response times. They are key metrics used in performance analysis to understand the distribution of latencies in a system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;50p (50th percentile) latency.&lt;/strong&gt; Also known as the median latency. This means that 50% of your queries or requests are faster than this time, and 50% are slower. It gives a good indication of the typical experience for a user or system interaction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;90p (90th percentile) latency.&lt;/strong&gt; This indicates that 90% of your queries or requests are faster than this time, and 10% are slower. This metric helps in understanding the experience of most of your users, excluding outliers that might be unusually slow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;99p (99th percentile) latency.&lt;/strong&gt; This shows that 99% of your queries or requests are faster than this time, and 1% are slower. It is useful for identifying the long tail of slow requests, which can be critical for understanding and improving users' experience with the worst performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
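
&lt;p&gt;If your tooling doesn’t report percentiles, you can compute them from raw latency samples yourself. A nearest-rank sketch in Python (the sample values are made up):&lt;/p&gt;

```python
import math

def percentile(latencies, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ranked = sorted(latencies)
    k = math.ceil(p / 100 * len(ranked))
    return ranked[max(k - 1, 0)]

# Hypothetical latencies (ms) for one query
samples = [12, 15, 11, 14, 250, 13, 16, 12, 14, 900]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(samples, p)} ms")  # p50: 14 ms, p90: 250 ms, p99: 900 ms
```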

&lt;p&gt;You can also graph this data for specific queries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsvucick9cx6ggl5u2ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsvucick9cx6ggl5u2ad.png" alt="Readyset displaying graph of query performance" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can tell you which queries are performing poorly and whether this performance changes over time or has any pattern. Here, you can find the longest latency queries and immediately cache them with Readyset.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Example Caching Workflow With Readyset
&lt;/h2&gt;

&lt;p&gt;Imagine an e-commerce platform with a Postgres database facing performance issues. The platform has experienced high latencies and low throughput, particularly when accessing product information and user profiles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Analyzing Read/Write Ratios&lt;/strong&gt;&lt;br&gt;
We start by analyzing the read/write ratio to understand the nature of our database workload. First, we enable the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension in Postgres to track query statistics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then query &lt;code&gt;pg_stat_statements&lt;/code&gt; to get insights into our read and write operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT query, calls, total_time, rows
FROM pg_stat_statements
WHERE query LIKE 'SELECT %' OR query LIKE 'INSERT %' OR query LIKE 'UPDATE %' OR query LIKE 'DELETE %';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After analyzing the data, we find a high read/write ratio for product information queries (e.g., 15:1), indicating these are read-heavy and good candidates for caching. Conversely, user profile updates show a lower read/write ratio (e.g., 2:1), suggesting frequent updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Identifying Temporal Locality and Hotspots&lt;/strong&gt;&lt;br&gt;
We use database monitoring tools to track access patterns and identify hotspots. In the &lt;code&gt;postgresql.conf&lt;/code&gt;, we set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;logging_collector = on
log_directory = 'pg_logs'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_statement = 'all'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also set a threshold so that queries slower than one second are logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set log_min_duration_statement=1000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use pgBadger to analyze the logs. The report shows that queries related to product categories like 'Electronics' are accessed frequently, indicating a hotspot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Query Analysis and Profiling&lt;/strong&gt;&lt;br&gt;
We use the &lt;code&gt;EXPLAIN&lt;/code&gt; command to analyze query plans.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The analysis reveals inefficient sequential scans, suggesting the need for better indexing and potential caching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Addressing Query Latencies&lt;/strong&gt;&lt;br&gt;
We observe the query latencies in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  duration: 1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This high latency is a clear indicator of a performance bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Implementing Caching&lt;/strong&gt;&lt;br&gt;
Based on our analysis, we decide to cache the frequently accessed product information.&lt;/p&gt;

&lt;p&gt;We can do this through Readyset. Readyset requires no changes to our application code. All we have to do is connect Readyset to our Postgres database and then use our Readyset connection string in our application instead of the credentials for the primary Postgres database. &lt;/p&gt;

&lt;p&gt;Once this is set up, all the queries from your application to your database will initially be proxied through Readyset. Then you can start caching. In this case, we want to cache our electronics product information. To do so, we just prepend &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to our query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE CACHE FROM SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results from this query will now be served from Readyset with sub-millisecond latencies. As Readyset monitors the replication stream from the primary database, looking for changes to the data, the cache will be automatically updated whenever the underlying data is updated in the primary database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Monitoring and Continuous Improvement&lt;/strong&gt;&lt;br&gt;
We continuously monitor query performance using Readyset, which shows us the latencies for each cached query. By analyzing the 50th, 90th, and 99th percentile latencies, we adjust our caching strategies to ensure optimal performance.&lt;/p&gt;

&lt;p&gt;By following this approach, we significantly improved the e-commerce platform's performance. The product information queries' latency dropped, and the overall system throughput increased. This real-world example demonstrates the impact of strategic caching in resolving performance bottlenecks in a database-driven application.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding Data Characteristics
&lt;/h2&gt;

&lt;p&gt;The data itself will influence caching strategies. Different data types have unique characteristics that affect how they should be cached. Data size, consistency requirements, and business needs are critical in determining the most effective caching approach. Understanding these characteristics is essential to ensure that the caching strategy aligns with the nature of the data being handled.&lt;/p&gt;
&lt;h3&gt;
  
  
  Object Size
&lt;/h3&gt;

&lt;p&gt;The size of objects, including individual data items and the results of database queries, is a crucial factor to consider. Object size directly impacts cache storage efficiency, access speed, and overall system performance.&lt;/p&gt;

&lt;p&gt;Object Size here could mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data item size. This refers to the size of individual data items, like rows in a database. Large data items can consume significant cache space, potentially reducing the overall effectiveness of the cache.&lt;/li&gt;
&lt;li&gt;Query result size. The size of data returned by database queries. Some queries might return large datasets, which, when cached, can take up substantial space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many databases offer functions to measure the size of rows or tables. In Postgres, you can use &lt;code&gt;pg_column_size&lt;/code&gt; to find the size of a value (pass it a whole row to get the row’s size) or &lt;code&gt;pg_total_relation_size&lt;/code&gt; for the size of a table. E.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT pg_column_size(*) FROM user_data WHERE user_id = 123;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suppose you have a table &lt;code&gt;user_data&lt;/code&gt; in a Postgres database, and you frequently query user profiles. Analyzing the size of these profiles helps determine if they should be cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT AVG(pg_column_size(*)) FROM user_data;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the average size is relatively small, caching individual profiles might be efficient. However, if the size is large, consider caching subsets of data or using a different strategy, like partial caching.&lt;/p&gt;

&lt;p&gt;Be cautious when caching large objects, as they can lead to rapid cache eviction of other items. This can diminish the effectiveness of the cache. If objects are too large to be effectively held in memory, disk-based caching might be more appropriate, though it's slower.&lt;/p&gt;

&lt;p&gt;Object size factors into your cache configuration. Consider using eviction policies like Least Recently Used (LRU) for larger objects to maintain cache efficiency.&lt;/p&gt;
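
&lt;p&gt;To make the LRU policy concrete, here is a minimal sketch in Python. Real caches (e.g. Redis with an LRU eviction policy) implement this for you; the sketch only shows the mechanism:&lt;/p&gt;

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a" so "b" becomes least recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"), cache.get("a"))  # None 1
```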

&lt;h3&gt;
  
  
  Consistency Requirements
&lt;/h3&gt;

&lt;p&gt;Consistency here refers to how up-to-date and synchronized the cached data is with the underlying data source. Understanding whether your application needs strict or eventual consistency will guide you in choosing the right caching mechanisms and policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strict Consistency&lt;/strong&gt;&lt;br&gt;
Strict consistency guarantees that all transactions are executed sequentially, and any read reflects the latest write. It provides the strongest consistency guarantee but can be expensive and impact performance. It is ideal for mission-critical applications where data integrity is paramount. Financial applications, where transaction integrity is crucial, typically require strict consistency. An account balance, for example, must always reflect the most recent transactions.&lt;/p&gt;

&lt;p&gt;Two types of caching strategies for strict consistency are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-through cache.&lt;/strong&gt; This strategy involves writing data to both the cache and the database simultaneously. It ensures that the cache always has the most up-to-date data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache invalidation.&lt;/strong&gt; Another approach is to invalidate the relevant cache entries immediately upon any data modification in the database.&lt;/li&gt;
&lt;/ul&gt;
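
&lt;p&gt;A write-through cache fits in a few lines. The dict standing in for the database below is a hypothetical stand-in; in practice the write would be a real database call performed as part of the same operation:&lt;/p&gt;

```python
class WriteThroughCache:
    """Write-through: every write goes to the cache and the backing store together."""

    def __init__(self, store):
        self.store = store  # stands in for the database
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value  # write to the "database"...
        self.cache[key] = value  # ...and to the cache, as one operation

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # the cache always holds the latest write
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

db = {}
c = WriteThroughCache(db)
c.write("balance:42", 100)
print(c.read("balance:42"), db["balance:42"])  # 100 100
```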

&lt;p&gt;Strict consistency ensures data integrity but can lead to higher latency and increased load on the database due to frequent cache updates or invalidations. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eventual Consistency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With eventual consistency, writes are eventually propagated to all replicas, but there is no guarantee of how long it takes. Reads may reflect older writes or incomplete data until the replicas are synchronized. Eventual consistency offers high availability and scalability but can lead to inconsistencies for a short period.  Social media platforms, where seeing the most up-to-date data is not always critical (e.g., a user's number of followers), can use eventual consistency.&lt;/p&gt;

&lt;p&gt;Possible caching strategies for eventual consistency are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-To-Live (TTL):&lt;/strong&gt; Cached data is given a TTL, after which it is either refreshed from the database or evicted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy Loading:&lt;/strong&gt; Data is updated in the cache only when requested, leading to potential staleness but reduced load on the database.&lt;/li&gt;
&lt;/ul&gt;
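
&lt;p&gt;The two strategies combine naturally: entries are loaded lazily on first access and served from memory until their TTL expires. A minimal Python sketch; the &lt;code&gt;fake_db&lt;/code&gt; loader and the injectable clock are test scaffolding, not a real API:&lt;/p&gt;

```python
import time

class TTLCache:
    """Lazy-loading cache with a per-entry TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader  # fetches fresh data, e.g. a database query
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry is testable
        self.entries = {}     # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]  # fresh hit: no database work
        value = self.loader(key)  # miss or stale: lazy reload
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

loads = []
def fake_db(key):
    loads.append(key)
    return key.upper()

now = [0.0]
cache = TTLCache(fake_db, ttl_seconds=60, clock=lambda: now[0])
cache.get("user:1")  # miss: loaded from the "database"
cache.get("user:1")  # hit: served from memory
now[0] = 61.0        # advance past the TTL
cache.get("user:1")  # stale: reloaded
print(loads)         # ['user:1', 'user:1']
```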

&lt;p&gt;Eventual consistency improves read performance and reduces database load, but it can serve stale data.&lt;/p&gt;

&lt;p&gt;ReadySet is &lt;a href="https://docs.readyset.io/concepts/overview"&gt;eventually consistent&lt;/a&gt;: each new write must propagate through the dataflow graph before the impacted query results reflect it.&lt;/p&gt;

&lt;p&gt;The choice between strict and eventual consistency in caching strategies hinges on the application's specific needs. Understanding these requirements is crucial for engineers to design a caching system that balances data integrity with performance and efficiency. By carefully evaluating the nature of the data and its usage patterns, a suitable caching approach can be devised that optimally serves the application's needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Criticality
&lt;/h3&gt;

&lt;p&gt;Finally, there is a more qualitative characteristic: business criticality. It involves assessing which data is crucial for the functionality and performance of your application and should, therefore, be prioritized in caching strategies. This assessment often requires balancing the need for speed against the tolerance for staleness.&lt;/p&gt;

&lt;p&gt;This is mostly about understanding what data is important. Data that significantly impacts the user experience, such as personalized content, frequently accessed user profiles, or dashboard metrics, is often critical for caching. Data that is vital for daily business operations, like transaction records in a financial application or inventory levels in a retail system, should be prioritized for caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples of Data Types and Their Impact on Caching
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;E-Commerce Platforms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product Catalog: Frequently accessed but infrequently updated, making it a good candidate for caching.&lt;/li&gt;
&lt;li&gt;User Shopping Carts: High business criticality but requires strict consistency due to the dynamic nature of the content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Content Delivery Networks (CDNs):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media Content: Images and videos are crucial for user experience, making them important to cache. They are typically static, allowing for longer TTL values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Financial Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Account Balances: Require real-time accuracy, demanding a strict consistency approach in caching.&lt;/li&gt;
&lt;li&gt;Historical Transaction Data: Less critical for immediate operations, can be cached with eventual consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within all of these, you are trying to balance speed and staleness. Data that contributes to faster load times and smoother user interactions should be cached, especially if it's accessed frequently. Then, you can think about staleness tolerance by determining how tolerant the business process or user experience is to stale data. For example, slightly outdated product recommendations may be acceptable, but outdated pricing information is not.&lt;/p&gt;

&lt;p&gt;This is ultimately a trade-off: speed versus staleness, and cost versus benefit. You must weigh the cost of caching (in resources and management overhead) against its benefits to business operations and user satisfaction. You also have to develop a cache maintenance strategy that preserves data integrity without compromising performance. It requires a deep understanding of both the technical aspects of caching and the business or user perspective on data importance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching Done Right
&lt;/h2&gt;

&lt;p&gt;It takes a lot to find the right data to cache. But it is worth it. The effort invested in identifying what to cache, understanding the nature of your data, and aligning your caching strategy with business needs pays off in multiple ways. When done right, caching can significantly improve application performance, enhance user experience, reduce database load, and ultimately contribute to the efficiency and scalability of your systems.&lt;/p&gt;

&lt;p&gt;At Readyset, we’re making caching easier. Once you’ve identified the data you need to cache (with Readyset’s help, if needed), Readyset will start caching it automatically, giving you sub-millisecond query responses with no extra logic in your application. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudv2-prod.us.auth0.com/u/signup?state=hKFo2SBWY213dWRmRFd0Y0pQcTkzTWFKQXVEbVk0TXpNYVgxQqFur3VuaXZlcnNhbC1sb2dpbqN0aWTZIGlqQ0ZkZ3FRQXJMaFpBTUZYdTBubVRyNTJ3b19SVDZio2NpZNkgZUhCb0ZhOHZDbDFHQjhYWGxCckJiMExpODNiMnROOHE&amp;amp;utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Try Readyset Cloud&lt;/a&gt; or &lt;a href="https://readysetcommunity.slack.com/join/shared_invite/zt-2272gtiz4-0024xeRJUPGWlRETQrGkFw?&amp;amp;utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;reach out to us&lt;/a&gt; if you have any questions.&lt;/p&gt;

</description>
      <category>caching</category>
      <category>database</category>
      <category>postgresql</category>
      <category>sql</category>
    </item>
    <item>
      <title>Stateful Property Testing in Rust</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Fri, 26 Jan 2024 16:55:34 +0000</pubDate>
      <link>https://dev.to/readysettech/stateful-property-testing-in-rust-pnn</link>
      <guid>https://dev.to/readysettech/stateful-property-testing-in-rust-pnn</guid>
      <description>&lt;p&gt;How ReadySet is using stateful property testing to to find subtle bugs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbxfehsa6cc5skx5h5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbxfehsa6cc5skx5h5e.png" alt="Image description" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most developers have written unit tests: you feed your code some inputs and verify the outputs. Property tests automate this process. Instead of manually creating inputs, your test generates them for you and validates the input/output pairs against a set of properties you define. These can be tricky to write since you have to develop good generation strategies and find general properties to test (rather than just hardcoding individual test case results), but they can also help find edge cases you might not have thought to write unit tests for.&lt;/p&gt;

&lt;p&gt;But what if you’re writing property tests for a system with internal state? Testing one input at a time may not be enough anymore – you might need to run whole sequences of steps to test the way the system changes state over time.&lt;/p&gt;

&lt;p&gt;This is where stateful property testing comes in. We’ve had great successes with this technique at ReadySet and we’re excited to share it with you. To that end, we’ve written and released a general-purpose OSS library, called &lt;a href="https://crates.io/crates/proptest-stateful?ref=blog.readyset.io"&gt;proptest-stateful&lt;/a&gt;, that helps you reap the same benefits for your own projects. But before we talk more about our own experiences, let’s start by explaining the general concepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is stateful property testing?
&lt;/h2&gt;

&lt;p&gt;Stateful property tests generate a &lt;em&gt;sequence&lt;/em&gt; of steps to run and then execute them one at a time, checking postconditions for each operation along the way. If a test fails, the test framework will try to remove individual steps in the test sequence to find a minimal failing case. This process is called “shrinking” and is helpful for making test failures easy to interpret. Randomly generated sequences of steps can be very long and complex, so shrinking helps separate out what actually &lt;em&gt;caused&lt;/em&gt; the bug.&lt;/p&gt;

&lt;p&gt;If you’ve written property tests before, you might be wondering what’s so difficult about this. Can’t you just write code to generate a step, then generate a random sequence of steps and run each one in turn?&lt;/p&gt;

&lt;p&gt;There are a couple of reasons this approach falls short: later steps in the sequence may depend on the internal state generated by earlier steps, and when shrinking the generated steps to a minimal failing case, you need to be careful not to accidentally create test cases that fail for the wrong reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Use Case: Counters
&lt;/h3&gt;

&lt;p&gt;Imagine you’re testing a simple library that maintains an unsigned integer counter. There are two API operations: increment and decrement. This is a contrived example, and is simplified to help illustrate the testing techniques, but in essence, this is a database.&lt;/p&gt;

&lt;p&gt;It’s a very small memory-only database with a very strict schema, but we’re still maintaining a set of data and updating it in response to client requests, just like any other database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;struct Counter { count: usize }

impl Counter {
    fn new(count: usize) -&amp;gt; Self { Counter { count } }
    fn inc(&amp;amp;mut self) { self.count += 1; }
    fn dec(&amp;amp;mut self) { self.count -= 1; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you try to decrement the counter below 0, it will underflow and crash, which is a known limitation for this library. If you try to test this use case by generating a random sequence of increment/decrement operations, you will quickly run into this problem. As such, for each operation you generate, your generation code will need to consider whether it expects the counter value to be zero and if it does, make sure not to generate a decrement operation.&lt;/p&gt;
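&lt;p&gt;Concretely, triggering the crash takes only two lines (in a debug build, where Rust's overflow checks panic rather than wrap):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let mut counter = Counter::new(0);
counter.dec(); // panics: attempt to subtract with overflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;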

&lt;p&gt;We can define a simple model to keep track of the expected state like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;struct TestState { model_count: usize }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can keep the model count updated via a &lt;code&gt;next_state&lt;/code&gt; callback that gets run after generating each operation, and then use the model state to decide which operations we can safely generate next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn next_state(&amp;amp;mut self, op: &amp;amp;Self::Operation) {
  match op {
    CounterOp::Inc =&amp;gt; self.model_count += 1,
    CounterOp::Dec =&amp;gt; self.model_count -= 1,
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn op_generators(&amp;amp;self) -&amp;gt; Vec&amp;lt;Self::OperationStrategy&amp;gt; {
  if self.model_count &amp;gt; 0 {
    vec![Just(CounterOp::Dec).boxed(), Just(CounterOp::Inc).boxed()]
  } else {
    vec![Just(CounterOp::Inc).boxed()]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, when your test finds a bug and tries to remove steps to find a minimal failing case, a simplified case might still trigger underflow even if the original test case did not. If the original case was “start at 0, increment, decrement, increment” then you can’t remove the first increment unless you also remove the decrement! To avoid this, we must also define preconditions via a separate callback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;preconditions_met(&amp;amp;self, op: &amp;amp;Self::Operation) -&amp;gt; bool {
    match op {
        CounterOp::Inc =&amp;gt; true,
        CounterOp::Dec =&amp;gt; self.model_count &amp;gt; 0,
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This logic may seem redundant with the code in &lt;code&gt;op_generators&lt;/code&gt;, and indeed, any test case generated by &lt;code&gt;op_generators&lt;/code&gt; that doesn’t pass the precondition test will be filtered out. However, without the logic in &lt;code&gt;op_generators&lt;/code&gt; to avoid generating invalid test cases, it can be slow and inefficient to randomly stumble across cases that pass all the precondition checks. If you’re familiar with property testing in general, the same caveats apply here as with test-case filtering.&lt;/p&gt;

&lt;p&gt;The upshot here is that the framework can reuse the same model during runtime, so you can also easily check the real-world counter value after each operation to verify that it matches the value you expect. For a deeper dive into this example, check out the &lt;a href="https://github.com/readysettech/proptest-stateful/?ref=blog.readyset.io"&gt;README&lt;/a&gt;, or jump straight into a &lt;a href="https://github.com/readysettech/proptest-stateful/blob/main/tests/counter.rs?ref=blog.readyset.io"&gt;full implementation&lt;/a&gt;.&lt;/p&gt;
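&lt;p&gt;Tying it together, the runtime side of the test applies each generated operation to the real counter and then checks it against the model as a postcondition. A sketch of that step might look like this (the callback name and signature are illustrative, not necessarily the exact proptest-stateful API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn run_op(&amp;amp;self, op: &amp;amp;Self::Operation, counter: &amp;amp;mut Counter) {
    // Apply the generated operation to the real system under test...
    match op {
        CounterOp::Inc =&amp;gt; counter.inc(),
        CounterOp::Dec =&amp;gt; counter.dec(),
    }
    // ...then verify the real value matches the model's expectation.
    assert_eq!(counter.count, self.model_count);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;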

&lt;p&gt;Now let’s take a look at some real-world bugs we’ve found using these techniques, many of which would’ve been nearly impossible to find with more conventional tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug Spotlight
&lt;/h2&gt;

&lt;p&gt;We have a stateful property test suite that tests replication of data and DDL commands from a primary database into ReadySet’s caching system. Because ReadySet automatically keeps cached query results up-to-date in response to new data and schema changes, it’s critical to make sure that there are no corner cases where the cached data might lose synchronization with the database.&lt;/p&gt;

&lt;p&gt;To search for edge cases, this test suite generates sequences of DDL and DML statements and runs them against both Postgres and against ReadySet, checking after each step that ReadySet’s cached data matches that of the database without ReadySet in the mix. For this test, the model state is used to make sure that the DML we generate lines up with the tables we’ve created and their corresponding schemas.&lt;/p&gt;

&lt;p&gt;You can take a look at the &lt;a href="https://github.com/readysettech/readyset/blob/main/replicators/tests/ddl_vertical.rs?ref=blog.readyset.io"&gt;source code&lt;/a&gt; – it’s already found a plethora of bugs since its inception earlier this year, including some really subtle edge cases that would’ve been nightmares to track down in production. Let’s dive into some examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #1: Starting Small
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE t(i int);
ALTER TABLE t RENAME COLUMN i TO j;
CREATE VIEW v AS SELECT * FROM t;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a minimal failing case found by our test. It creates a table with a single column, renames the column, then creates a view referencing the table.&lt;/p&gt;

&lt;p&gt;We should be able to run queries against the ‘v’ view in ReadySet, but trying to do so resulted in a cryptic error about internal invariant violations. Something about this specific combination of steps was triggering a bug.&lt;/p&gt;

&lt;p&gt;The Fix: It turned out that there were certain types of column changes we weren’t replicating due to an overly broad filter clause in one of the internal queries ReadySet performs against PostgreSQL. Luckily, &lt;a href="https://github.com/readysettech/readyset/commit/be61d40127a2b6f65763bfd9e3392aecb0863b4e?ref=blog.readyset.io"&gt;the fix&lt;/a&gt; turned out to be a straightforward one-line change.&lt;/p&gt;

&lt;p&gt;Creating a view that selected from a table that had previously had a column renamed wasn’t something we’d thought to specifically test with any unit tests or integration suites. With millions of possible edge cases like this, enumerating every case via a unit test is impossible.&lt;/p&gt;

&lt;p&gt;While stateful property testing quickly found a failure, the initial test case still had thirteen different steps. Many of the steps were unrelated and obfuscated the root cause of the bug. Shrinking saved us hours of painful debugging–the minimal case had only three steps, making it clear that the problem was with how we were replicating &lt;code&gt;ALTER COLUMN&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;Okay, simple enough, right? Next up we have this gem:&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #2: Ramping Up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE cats (id INT);
CREATE TABLE dogs (id INT);
INSERT INTO cats VALUES (1);
ALTER TABLE dogs ADD COLUMN bark VARCHAR;
DELETE FROM cats WHERE id = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we created two tables named “cats” and “dogs”. We inserted a value into “cats”, added a column to “dogs”, and then deleted the value we inserted into “cats”.&lt;/p&gt;

&lt;p&gt;When run against ReadySet in quick succession, the row (1) would still show up in the ‘cats’ table forever. The row should have been deleted, but somehow the &lt;code&gt;DELETE&lt;/code&gt; was getting silently dropped.&lt;/p&gt;

&lt;p&gt;The oddest part? We’re not doing anything unusual to the cats table at all! We just create the table, insert a row, and delete a row.&lt;/p&gt;

&lt;p&gt;This was so odd that at first it seemed like there was something wrong with the test. It turned out this bug was easy to reproduce though, and the &lt;code&gt;DELETE&lt;/code&gt; was only lost if it happened to coincide with a separate &lt;code&gt;ALTER TABLE&lt;/code&gt; statement on a completely unrelated table!&lt;/p&gt;

&lt;p&gt;There weren’t any server-side errors that showed up alongside this bug, but because the test suite actively checks table contents as a postcondition of each test step, we were still able to detect that the expected state of the system failed to match up with the actual results. This would’ve been extremely difficult to diagnose in production since this bug left behind no evidence of when or why it was triggered, so generating a minimal set of steps to reproduce this was an incredible win for us.&lt;/p&gt;

&lt;p&gt;The Fix: This edge case was being triggered by the ‘dogs’ table being altered before any rows were written to it. Because no data had been replicated to ‘dogs’, ReadySet had not yet created some internal metadata for the table. This missing table metadata fooled ReadySet into thinking it was in the midst of a full re-snapshot of all tables. ReadySet drops outdated replication events while re-snapshotting, as we are already recopying the entire table from scratch, and the deletion fell into that window. These dropped events would normally be redundant, but since ‘cats’ wasn’t actually being re-snapshotted, the events were simply lost, and the cache fell out of sync with the database.&lt;/p&gt;

&lt;p&gt;Resolving this bug was trickier than bug #1, but we still were able to develop fixes pretty quickly (check &lt;a href="https://github.com/readysettech/readyset/commit/2cbb7103fbba1854b3a409df325a8f5e45991502?ref=blog.readyset.io"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/readysettech/readyset/commit/fb1b1c16cd510cd0650cbb8333cb9f3e2597e87e?ref=blog.readyset.io"&gt;here&lt;/a&gt; for more!).&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #3: Best For Last
&lt;/h3&gt;

&lt;p&gt;In this case we &lt;em&gt;already knew&lt;/em&gt; a bug existed, but we didn’t know what caused it. ReadySet was crashing when a specific user tried to run a series of database migrations, but it wasn’t clear why the crash was occurring.&lt;/p&gt;

&lt;p&gt;This is actually a pretty common occurrence: your user has a verifiable bug, but finding a &lt;em&gt;minimal&lt;/em&gt; repro is the hard part. Fortunately, that’s just what stateful property testing does.&lt;/p&gt;

&lt;p&gt;We had just added &lt;code&gt;ENUM&lt;/code&gt; support for a user and we did know this bug had something to do with &lt;code&gt;ENUM&lt;/code&gt; types, so we added support to the test suite for creating and altering enum types. In short order, this minimal failing case showed right up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TYPE et AS ENUM ('a');
CREATE TABLE t (e et);
DROP TABLE t;
ALTER TYPE et ADD VALUE 'b';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we create a custom enum type, use it in a table, then drop that table and alter the type.&lt;/p&gt;

&lt;p&gt;Playing around with this, it became immediately clear why this was a tricky bug to reproduce. It only occurs if you alter a type after dropping a table that had used that type in its schema. The issue is that ReadySet maintains a list of associated caches for each custom type, and we had simply neglected to update that list when dropping a table. When we then altered the type, ReadySet would try to invalidate a cache that didn’t exist and crash (see &lt;a href="https://github.com/readysettech/readyset/commit/abacf64c79dc5cb32a42f7d72e09d5967c3472ac?ref=blog.readyset.io"&gt;here&lt;/a&gt; for the fix).&lt;/p&gt;

&lt;p&gt;Hard to find by hand, but no sweat for a stateful property test!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you’ve enjoyed this post, and I hope you’re as excited as we are about making property tests stateful.&lt;/p&gt;

&lt;p&gt;As you can see, this is a great technique for database systems, but it can be useful for any other kind of stateful application as well. Server software with internal state can be an obvious fit, but there’s no need to limit yourself to backend server systems like this. People have used these techniques for everything from game engines to automotive software, so the sky’s the limit!&lt;/p&gt;

&lt;p&gt;We want to share this technique with the world, so we’ve released an open source &lt;a href="https://crates.io/crates/proptest-stateful?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;proptest-stateful crate on crates.io&lt;/a&gt; and &lt;a href="https://github.com/readysettech/proptest-stateful/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;in our GitHub repo&lt;/a&gt;, which anyone can use to write their own stateful property tests in Rust. And, of course, if writing these kinds of tests yourself sounds exciting, don’t hesitate to take a look at our &lt;a href="https://readyset.io/about?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;careers page&lt;/a&gt;. Until next time!&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
    </item>
    <item>
      <title>How Database Replication Works Under the Hood</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 22 Jan 2024 16:43:31 +0000</pubDate>
      <link>https://dev.to/readysettech/how-database-replication-works-under-the-hood-4deh</link>
      <guid>https://dev.to/readysettech/how-database-replication-works-under-the-hood-4deh</guid>
      <description>&lt;p&gt;Replicating data is a process you will encounter if you build a product. At the very least, you’ll want some kind of backup for your primary database, either through taking a snapshot or having a secondary “follower” database mirroring your primary data. You might also replicate data into a data warehouse for analysis, copy data to read-only replicas for load balancing, or replicate data while &lt;a href="https://knock.app/blog/zero-downtime-postgres-upgrades?ref=blog.readyset.io"&gt;performing&lt;/a&gt; &lt;a href="https://archive.ph/K5ZuJ?ref=blog.readyset.io"&gt;upgrades&lt;/a&gt; to your infrastructure.&lt;/p&gt;

&lt;p&gt;That’s to say, database replication is a ubiquitous and helpful tool in database administration. So how does it work? How do you get data from one database to another? &lt;/p&gt;

&lt;p&gt;It’s not quite as simple as just copying the data. If we were to do that, replication would become critically slow as the total amount of data increases. Instead, replication takes advantage of the built-in system for guaranteeing the atomicity and durability of your database commits–the write-ahead log (WAL).&lt;/p&gt;

&lt;p&gt;Here, we want to show you how the WAL works in databases and how it is used for replication.&lt;/p&gt;

&lt;h2&gt;
  
  
  The WAL is the Database
&lt;/h2&gt;

&lt;p&gt;In relational databases, we model data as tables like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4jiz3l84ori9879ddwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4jiz3l84ori9879ddwc.png" alt="Image of model data table" width="472" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tables have rows and columns, where each row represents a record, and each column represents a field within that record. This data-oriented view of databases makes sense for users, as we mostly care about the data in the database and how different fields relate.&lt;/p&gt;

&lt;p&gt;But that isn’t the only way to think about a database. If you are trying to build a database, it makes more sense to see it from an operational perspective, where the database is a time-ordered sequence of events or changes. In that scenario, you’d just have a log of every transaction or modification recorded sequentially. &lt;/p&gt;

&lt;p&gt;The table above becomes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa19apz6kfpunj6mjo4t2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa19apz6kfpunj6mjo4t2.png" alt="log of every transaction or modification recorded sequentially" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we want to add another value, it is just appended to the log:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qn3cgv95mjtuaavy1o2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qn3cgv95mjtuaavy1o2.png" alt="display of another value being appended to the log" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same goes for any other operations. If we update an entry, the UPDATE is added to the log, but we can also still see the initial value from the INSERT:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmmc3ps1t9v9rv82999u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmmc3ps1t9v9rv82999u.png" alt="display of UPDATE added to the log" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same with DELETE:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rzxacwd3j1iqryrfq1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rzxacwd3j1iqryrfq1l.png" alt="display of DELETE added to the log" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the end, our table looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy57xhwalrka67vy93hih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy57xhwalrka67vy93hih.png" alt="Data table after the changes listed above are made" width="424" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data only represent the current state of the database. We don’t know how it got there. The logs provide a historical account of changes. &lt;/p&gt;

&lt;p&gt;This chronological ledger is pivotal for both recovery and replication purposes. Write-ahead logging is a fundamental principle in database systems. Before any change is committed to the database files on disk, it is first logged in the WAL. This approach serves two primary purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Atomicity:&lt;/strong&gt; In the event of a system crash or failure, the WAL is used to replay transactions, ensuring that all completed transactions are preserved, and all incomplete transactions are rolled back, thus maintaining the database’s consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durability:&lt;/strong&gt; The WAL ensures that a committed transaction is not lost. By writing changes to a log before applying them to the database, the WAL provides a fail-safe against data loss due to unexpected failures.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In essence, the WAL acts as the backbone of the database's state. The log is the first point of contact for any change, making it the most up-to-date record of the database's evolution.&lt;/p&gt;
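&lt;p&gt;The write-ahead principle itself is simple enough to sketch in a few lines of Rust (a toy illustration of the ordering, not how any real engine is implemented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn commit(wal: &amp;amp;mut File, table: &amp;amp;mut HashMap&amp;lt;u64, String&amp;gt;, id: u64, name: String) -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    writeln!(wal, "INSERT {} {}", id, name)?; // 1. append the change to the log
    wal.sync_all()?;                          // 2. force the log record to disk
    table.insert(id, name);                   // 3. only then apply it to the table
    Ok(())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the process crashes between steps 2 and 3, the change is still in the log, so recovery can replay it; if it crashes before step 2, the change was never durable, and it is rolled back by simply not being replayed.&lt;/p&gt;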

&lt;p&gt;When it comes to database replication, logs are the source of truth. They provide a comprehensive and sequential record of all changes, making them ideal for replicating data across multiple systems. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replication Process:&lt;/strong&gt; The primary database's log is continuously streamed to followers in log-based replication. Each change recorded in the log is replicated in the secondary systems. This ensures the replicas are an exact copy of the primary database, mirroring every change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Replication:&lt;/strong&gt; With logs being the first to record any change, they enable near real-time replication. This is critical for applications requiring high data availability and up-to-date information across all nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency and Reliability:&lt;/strong&gt; Using logs for replication ensures that the data remains consistent across all replicas. Since the log records every transaction in the order they occur, the replication process respects this order, maintaining transactional integrity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The log is more than a mere database recording mechanism; it is the authoritative ledger of all transactions and changes. Its role in replication is indispensable, particularly in systems like ReadySet, where maintaining sub-millisecond latencies for query caching is critical. By leveraging logs' detailed, sequential nature, such systems ensure real-time, consistent, and reliable replication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logical Replication
&lt;/h2&gt;

&lt;p&gt;There are three main types of replication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Physical Replication:&lt;/strong&gt; Involves byte-by-byte copying of the database, replicating the exact physical state of the primary database. This method is commonly used in streaming replication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot Replication:&lt;/strong&gt; Captures the state of a database at a particular point in time, akin to taking a "snapshot." This is less dynamic and is often used for less frequently changing data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logical Replication:&lt;/strong&gt; Focuses on replicating changes at a higher level–the level of database transactions. It allows for copying specific tables or rows and columns within tables, providing much more flexibility than physical replication.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Logical replication in database systems leverages the detailed information stored in database logs. In logical replication, instead of copying raw data, a replication agent reads the transaction logs and interprets the changes recorded in the logs. This interpretation converts the low-level format of the log (which is often storage and database-specific) into a high-level representation of the changes (like SQL statements or a similar format).&lt;/p&gt;

&lt;p&gt;After interpretation, the changes are queued. This queuing mechanism ensures that they are delivered to the target database in the correct order, maintaining transactional integrity and consistency. The final step involves applying these changes to the target database by executing the high-level change records (like SQL statements) on the target database.&lt;/p&gt;

&lt;p&gt;All this is achieved through a publish/subscribe model.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Publication:&lt;/strong&gt; In this model, the primary database defines a "publication," essentially a set of database changes (inserts, updates, deletes) that it is willing to publish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscription:&lt;/strong&gt; The secondary systems, or subscribers, subscribe to these publications. They receive only the changes that are part of the subscribed publications, allowing for selective replication.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The publish/subscribe model in logical replication enables efficient distribution of data changes, as each subscriber can choose what data it needs. Subscribers can then receive updates in near real-time, making this method suitable for systems that require up-to-date information across multiple nodes.&lt;/p&gt;
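&lt;p&gt;One prerequisite before trying this yourself: logical replication requires the primary's &lt;code&gt;wal_level&lt;/code&gt; setting to be &lt;code&gt;logical&lt;/code&gt;, so that the WAL carries enough information for logical decoding. You can check it like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW wal_level;
-- If this returns 'replica', set wal_level = logical in postgresql.conf
-- (or via ALTER SYSTEM) and restart the server.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;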

&lt;p&gt;This is easy to set up. Let’s do so using Postgres. On two separate database clusters, set up two independent databases, one called “primary_db”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE DATABASE primary_db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other is called “follower_db”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE DATABASE follower_db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both, create a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE replicated_table (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    value INT
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In primary_db, populate this table with some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO replicated_table (name, value) VALUES ('Item1', 10);
INSERT INTO replicated_table (name, value) VALUES ('Item2', 20);
INSERT INTO replicated_table (name, value) VALUES ('Item3', 30);
INSERT INTO replicated_table (name, value) VALUES ('Item4', 40);
INSERT INTO replicated_table (name, value) VALUES ('Item5', 50);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you check the table in primary_db, you should see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do the same in follower_db, it should still be empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
(0 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the tables created, we can create the publication. On primary_db, create a publication for this table using the CREATE PUBLICATION command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION my_publication FOR TABLE replicated_table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then check to see if that was successfully created by checking in the pg_publication_tables table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';   

    pubname               | schemaname    |   tablename             
----------------+------------+-----------------
 my_publication   | public            | replicated_table 
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hop over to your follower_db and create a subscription to that publication using connection information from your primary database using CREATE SUBSCRIPTION:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE SUBSCRIPTION my_subscriber     
CONNECTION 'dbname=primary_db host=localhost port=5432'
PUBLICATION my_publication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Note: if you want to try this on a local instance of Postgres, create the replication slot manually by running “SELECT pg_create_logical_replication_slot('my_subscriber', 'pgoutput');” on primary_db, and then add “WITH (create_slot = false)” to the CREATE SUBSCRIPTION command above on follower_db. Otherwise, CREATE SUBSCRIPTION will hang.)&lt;/p&gt;
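&lt;p&gt;You can also confirm the subscription is up and running using Postgres’s built-in statistics views: &lt;code&gt;pg_stat_subscription&lt;/code&gt; on the subscriber side and &lt;code&gt;pg_stat_replication&lt;/code&gt; on the publisher side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- On follower_db: one row per active subscription worker
SELECT subname, received_lsn, last_msg_receipt_time FROM pg_stat_subscription;

-- On primary_db: one row per connected replication client
SELECT application_name, state, replay_lsn FROM pg_stat_replication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;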

&lt;p&gt;Now, if we check our table in follower_db again, we’ll see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we add a row to the table in primary_db:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO replicated_table (name, value) VALUES ('Item6', 60);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll immediately see the data in follower_db:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
  6 | Item6 |    60
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our data is now being replicated. If we look into postgresql.log (this isn’t the write-ahead log; rather, it’s a log of what’s happening in our Postgres server), we can see what has happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STATEMENT:  CREATE_REPLICATION_SLOT "pg_17529_sync_17519_7307265844192364778" LOGICAL pgoutput (SNAPSHOT 'use')
LOG:  starting logical decoding for slot "pg_17529_sync_17519_7307265844192364778"
DETAIL:  Streaming transactions committing after 0/A90190B0, reading WAL from 0/A9019078.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The STATEMENT shows that we’ve created a new logical replication slot named pg_17529_sync_17519_7307265844192364778. The replication slot is created with the pgoutput plugin, the built-in logical decoding output plugin provided by PostgreSQL, used for logical replication. The “SNAPSHOT 'use'” part indicates that this slot will use the snapshot associated with the transaction running CREATE_REPLICATION_SLOT. So, this has captured the state of the database at a particular point in time.&lt;/p&gt;

&lt;p&gt;The LOG and DETAIL indicate that logical decoding has started for the newly created replication slot. The message “Streaming transactions committing after 0/A90190B0, reading WAL from 0/A9019078” specifies the WAL location from which the replication is starting. &lt;/p&gt;

&lt;p&gt;This means it will start streaming changes committed after WAL location 0/A90190B0, and it's currently reading from WAL location 0/A9019078. In the WAL position format of older Postgres versions (&amp;lt;=9.2), the 0 is the WAL segment and A90190B0/A9019078 are byte offsets within that segment. &lt;a href="https://pgpedia.info/x/xlogrecptr.html?ref=blog.readyset.io"&gt;In newer versions&lt;/a&gt;, the position is a single 64-bit integer pointer rather than two 32-bit integers.&lt;/p&gt;
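&lt;p&gt;You can do arithmetic on these WAL positions directly in SQL using the built-in pg_lsn type; subtracting two LSNs gives the distance in bytes (a quick sketch using the two positions from the log above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT pg_lsn '0/A90190B0' - pg_lsn '0/A9019078' AS bytes_between;
 bytes_between
---------------
            56
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;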

&lt;h2&gt;
  
  
  How ReadySet Uses Logical Replication
&lt;/h2&gt;

&lt;p&gt;In most caching mechanisms, replication isn’t used. Instead, developers must &lt;a href="https://blog.readyset.io/dont-use-kv-stores/"&gt;write their caching logic in the application layer&lt;/a&gt; to transfer data from a database to a cache.&lt;/p&gt;

&lt;p&gt;ReadySet works differently by taking advantage of the mechanism described above. ReadySet registers itself as a consumer of a database’s replication stream, receiving the same information as a read replica: the log of changes made to the primary upstream database.&lt;/p&gt;

&lt;p&gt;When you initially register a database with ReadySet, we take a snapshot of that database. ReadySet applies every write from the replication log to this copy of the upstream database. Cache misses don't go to the upstream database. Instead, the result is computed using the &lt;a href="https://docs.readyset.io/concepts/streaming-dataflow?ref=blog.readyset.io"&gt;dataflow graph&lt;/a&gt;, and the result is cached. If any updated data in the replication stream is part of the dataflow graph for cached results, ReadySet will update the cache using this new data.&lt;/p&gt;

&lt;p&gt;In the context of ReadySet, logical replication is not just about copying data; it's about maintaining high efficiency and performance in query caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond Latencies:&lt;/strong&gt; Using logical replication, ReadySet can ensure that the cache is updated almost instantly with changes from the primary database, maintaining sub-millisecond latencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency with Primary Data:&lt;/strong&gt; The real-time nature of logical replication means that the cache is always consistent with the primary database, ensuring that the users get up-to-date and accurate data.&lt;/li&gt;
&lt;/ul&gt;
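&lt;p&gt;Once ReadySet is connected, caching a query is a single statement issued against ReadySet itself (syntax per ReadySet's documentation; the query and the $1 placeholder are illustrative, reusing the replicated_table example from earlier):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Run against ReadySet, not the upstream database
CREATE CACHE FROM SELECT id, name, value FROM replicated_table WHERE id = $1;

-- List the queries ReadySet is currently caching
SHOW CACHES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;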

&lt;p&gt;Logical replication with ReadySet provides the flexibility, speed, and accuracy required for web-scale products. If you want to start with ReadySet, sign up for &lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;early access to ReadySet Cloud&lt;/a&gt; today!&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
    </item>
    <item>
      <title>Medical Joyworks Improves Page Load Times by 500% With ReadySet</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Wed, 20 Dec 2023 18:18:49 +0000</pubDate>
      <link>https://dev.to/readysettech/medical-joyworks-improves-page-load-times-by-500-with-readyset-5hlo</link>
      <guid>https://dev.to/readysettech/medical-joyworks-improves-page-load-times-by-500-with-readyset-5hlo</guid>
      <description>&lt;h2&gt;
  
  
  About Medical Joyworks
&lt;/h2&gt;

&lt;p&gt;Medical Joyworks is a medical education company specializing in gamified learning, catering to a diverse audience, ranging from medical students to pharmaceutical companies. They elevate healthcare education through immersive case studies and user conversations, all seamlessly accessible via their app. Medical Joyworks also curates an active newsletter, reaching an audience of over 700,000 subscribers. &lt;/p&gt;

&lt;h2&gt;
  
  
  Medical Joyworks Wanted a Custom Solution
&lt;/h2&gt;

&lt;p&gt;With their subscriber list growing rapidly, Medical Joyworks turned to a custom solution for email delivery and analysis for several reasons: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To configure the delivery of custom newsletter content tailored to each subscriber’s specialty interest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To avoid the escalating costs associated with conventional email services (popular platforms charge upwards of $4,000/month for 200,000 contacts, and their pricing scales up for enterprise usage). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To gain the flexibility of switching between various email delivery providers such as Amazon SES and Postmark, depending on real-time performance. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, even with these advantages, the company encountered multiple challenges around tuning their database to perform well in this new context.  &lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenges of Database Performance Optimization
&lt;/h2&gt;

&lt;p&gt;The Medical Joyworks team was using Postgres and was running into slow page load times for their campaigns and contacts page, designed for administrators to analyze newsletter performance metrics. These delays were primarily attributed to:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The execution of multiple complex database queries to calculate metrics like open rates and link click rates. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple database requests to display campaign and open rate data for each user. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While Medical Joyworks could optimize the number of queries and introduce a different algorithm to reduce page load times, implementation would demand weeks of effort, straining the team’s bandwidth. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead, they had ReadySet up and running in 15 minutes, improving page load times by 500%. &lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opting for an expedited solution, Medical Joyworks implemented ReadySet, swapping out a connection string in their Laravel app’s database configuration file and deploying in about 15 minutes. &lt;/p&gt;

&lt;p&gt;The result? An impressive 500% improvement in page load times, achieved by caching a few queries - a process that would have taken weeks to implement manually. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Analysis is easier with ReadySet. When we are looking for patterns with different variables/filtering options, the pages load much faster.” &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By introducing ReadySet into their stack, Medical Joyworks has found a solution for a faster and more efficient analysis of vital newsletter delivery data for their expanding user base of 700,000 subscribers. Whether it’s refining open rates to contacts within specific specialties or examining various other metrics, page loads are now consistently snappy due to their usage of ReadySet. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Explore how ReadySet can enhance your stack here&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>Finding Slow Queries in Postgres Using Datadog</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Thu, 14 Dec 2023 16:04:31 +0000</pubDate>
      <link>https://dev.to/readysettech/finding-slow-queries-in-postgres-using-datadog-4o45</link>
      <guid>https://dev.to/readysettech/finding-slow-queries-in-postgres-using-datadog-4o45</guid>
      <description>&lt;p&gt;You’re starting to notice lags in your application's performance. Page loads are slowing, and the user experience is being impacted. At the same time, your AWS bills are also starting to creep up.&lt;/p&gt;

&lt;p&gt;These can be the tell-tale signs of serious database performance issues: queries are taking too long to execute and using too many resources when they do. The answer most people reach for is caching queries to reduce the load on the database. This is exactly what we built ReadySet for.&lt;/p&gt;

&lt;p&gt;But which queries? Databases like Postgres have some &lt;a href="https://www.postgresql.org/docs/current/monitoring.html?ref=blog.readyset.io" rel="noopener noreferrer"&gt;performance monitoring capabilities&lt;/a&gt;, but these are clunky to use, and you’ll end up knee-deep in statistics configurations. The answer is to add an application performance monitor (APM) to your stack. APMs, like Datadog, make it easier to monitor your database performance and can give you insights into how your database and queries perform. &lt;/p&gt;

&lt;p&gt;Here, we’ll show how you can use &lt;a href="https://www.datadoghq.com/?ref=blog.readyset.io" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt; and query metrics to understand your queries better and prioritize candidate queries for caching. &lt;/p&gt;

&lt;h2&gt;
  
  
  Use Query Metrics to Identify High-Load/High-Call Queries
&lt;/h2&gt;

&lt;p&gt;The Datadog &lt;a href="https://docs.datadoghq.com/database_monitoring/query_metrics/?ref=blog.readyset.io" rel="noopener noreferrer"&gt;Query Metrics&lt;/a&gt; dashboard provides insights into the performance of normalized queries. You can visualize performance trends and filter and group queries to analyze how queries perform. The dashboard graphs key metrics like requests and average latency and gives metrics for timing, requests, and the amount of data pulled per normalized query.&lt;/p&gt;

&lt;p&gt;Let’s go through how these can be used to find good caching candidates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using AVG LATENCY and ROWS/QUERY to identify load
&lt;/h2&gt;

&lt;p&gt;High average latency can indicate that a query could be more efficient or that the database is struggling to execute it quickly, leading to performance bottlenecks. &lt;/p&gt;

&lt;p&gt;If we look at this Query Metrics dashboard, we can see that there are a few queries with high average latency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgia9yzoysycmz8atmgnb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgia9yzoysycmz8atmgnb.png" alt="Datadog query metrics dashboard, showing a few queries with high average latency"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The line graph for AVG LATENCY (top right) shows the average time it takes for queries to execute over a specific period. Spikes in this graph indicate periods where queries are taking longer to execute, which can indicate high-load conditions or performance issues within the database.&lt;/p&gt;

&lt;p&gt;The two main spikes in the short time frame of the graph (one hour) both relate to write operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DELETE FROM UserTeams WHERE User_id = ? AND Team_id = ?. &lt;/li&gt;
&lt;li&gt;DELETE FROM Sessions WHERE Id = ?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As delete operations, these can’t be cached. However, looking at the other queries in the list, we can see some SELECT queries with long average latencies. For instance, the “SELECT s. Id s…” query seems to have a long latency, so it could be a good candidate for caching.&lt;/p&gt;

&lt;p&gt;The cumulative time spent on this query (“TOTAL TIME”) and the proportion of total database time it consumes (“PERCENT TIME”) are also high: over three-quarters of database time is spent on this query. That makes it a dominant consumer of database time and a likely contributor to performance issues.&lt;/p&gt;

&lt;p&gt;A further pointer in this direction is the average number of rows processed per query execution (“ROWS/QUERY”). A high number of rows per query can suggest that the query is processing a large amount of data, which might be optimized for better performance.&lt;/p&gt;

&lt;p&gt;This should be a query investigated for caching. And if it can’t be cached, it should be optimized to reduce the load on the database.&lt;/p&gt;
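&lt;p&gt;If you want to confirm these numbers at the database level, Postgres's pg_stat_statements extension tracks the same per-query totals (assuming the extension is installed and preloaded; the *_exec_time column names are per Postgres 13+):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Top consumers of total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;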

&lt;h2&gt;
  
  
  Using REQUESTS to analyze query frequency
&lt;/h2&gt;

&lt;p&gt;Beyond the load of an individual query on the database, caching candidates can also come from analyzing query frequency.&lt;/p&gt;

&lt;p&gt;The frequency of queries is a critical aspect of database performance. High-frequency queries, especially if they are resource-intensive, can contribute significantly to the load on the database. If specific queries are executed often, they can become candidates for optimization, such as by improving the query's efficiency, adding indexes to speed up execution, or caching the results when possible to reduce the load on the database.&lt;/p&gt;

&lt;p&gt;In Datadog, we can look at REQUESTS to understand frequency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd6bzkwatnoyqtbfr97f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd6bzkwatnoyqtbfr97f.png" alt="Datadog query metrics dashboard, showing requests overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The REQUESTS graph (top left) displays the number of queries over time, segmented by normalized query. There are fluctuations in query frequency, which can indicate peak usage times or potential bursts of activity that could stress the database. The table below provides more detailed data on the frequency of specific queries, with the "REQUESTS" column showing the total number of executions for each query.&lt;/p&gt;

&lt;p&gt;This data helps identify which queries are run most frequently and may contribute to performance issues, particularly during peak activity periods, as shown in the graph. The two top SELECT queries have millions of requests between them. This means that they could be good options for caching, as optimizing them could lead to significant performance improvements. &lt;/p&gt;

&lt;p&gt;Frequent execution implies that even minor efficiencies gained per query can substantially reduce the overall database load. Caching can serve repeated requests without needing to access the database each time, thus freeing up resources for other operations and improving the application's responsiveness.&lt;/p&gt;
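&lt;p&gt;You can cross-check Datadog's request counts directly in Postgres with pg_stat_statements (assuming the extension is enabled), this time ordering by call count:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Most frequently executed normalized queries
SELECT query, calls, rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;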

&lt;h2&gt;
  
  
  Use Explain Plans to Evaluate Query Cost
&lt;/h2&gt;

&lt;p&gt;Explain plans are a great feature of database statistics that provide insight into how the SQL optimizer will execute a query. They show the query execution path chosen by the database, including which indexes will be used, how tables are joined, and the estimated cost associated with each operation. By understanding explain plans, developers can identify potential performance issues and optimize queries, leading to faster execution and more efficient resource use.&lt;/p&gt;

&lt;p&gt;You can generate an Explain plan within SQL using the &lt;a href="https://www.postgresql.org/docs/current/using-explain.html?ref=blog.readyset.io" rel="noopener noreferrer"&gt;EXPLAIN command&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzniuwtj29s91ageigbar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzniuwtj29s91ageigbar.png" alt="Explain command written in SQL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the startup cost of the query is 0 and the total cost is 458. The cost is unitless; it represents the estimated resources required to scan the table or index. The planner expects to scan 10,000 rows, each an estimated 244 bytes wide. By using EXPLAIN with different queries, you can understand how the SQL optimizer is traversing data and what optimizations can be made.&lt;/p&gt;
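&lt;p&gt;For reference, the canonical example from the Postgres EXPLAIN documentation produces a plan with these same numbers (tenk1 is a table in the Postgres regression test database):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;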

&lt;p&gt;Datadog provides an excellent visual representation of an explain plan:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0o8zbp1mjunewn4e3q7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0o8zbp1mjunewn4e3q7.png" alt="Datadog's visual representation of an explain plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more complex queries, it visualizes the most efficient way to execute a given query by considering various query plans. This is a good proxy for latency and overall resource consumption of specific queries.&lt;/p&gt;

&lt;p&gt;In this specific plan, several "Hash Join" operations are shown, which are common in executing queries involving joining tables. The sequential scan ("Seq Scan") on pg_proc indicates a full scan of that table, which could be costly if the table is large.&lt;/p&gt;

&lt;p&gt;The importance of this data lies in the ability to analyze and optimize queries. Operations with high costs could be targets for optimization. For example, if a sequential scan has a high total cost, it might be beneficial to consider indexing the scanned table to reduce the cost. The plan rows and plan width can also inform decisions about indexes and query structure. &lt;/p&gt;

&lt;p&gt;Caching can also benefit high-cost queries as it reduces the need to perform expensive operations multiple times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor Throughput and Latency During Peak Times to Identify Bottlenecks
&lt;/h2&gt;

&lt;p&gt;A database isn’t under constant load. Instead, load ebbs and flows over days, weeks, and months, depending on the type of service your application provides. Thus, you must identify peak load times to understand when caching data is most helpful.&lt;/p&gt;

&lt;p&gt;You can use Datadog’s real-time monitoring to observe query performance during high-traffic periods, use historical data to predict future peak periods, and prepare by caching the most impactful queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlw7bdet32xkkxl4ll61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlw7bdet32xkkxl4ll61.png" alt="Datadog's real time monitoring dashboard, displaying query performance during high traffic periods"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By looking into the Query Details of a given query (that you have identified through explain plans and analysis of the metrics above), you can see when this particular query is putting strain on the system. Here, you can look at:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Requests.&lt;/strong&gt; Peaks in this graph can indicate high traffic periods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avg latency.&lt;/strong&gt; Spikes may suggest that queries take longer, possibly due to high load or inefficient execution plans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total time.&lt;/strong&gt; Spikes can indicate periods where the query is consuming more resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Percent time.&lt;/strong&gt; High consistent values or peaks suggest the query is a significant part of the database workload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows/Query.&lt;/strong&gt; Consistently high numbers indicate heavy data retrieval, which can impact performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a query-specific breakdown of the metrics from the main dashboard. Monitoring these metrics helps identify peak times for query load and execution times, critical for capacity planning, identifying potential bottlenecks, and optimizing database performance. &lt;/p&gt;

&lt;p&gt;By analyzing these graphs, developers can identify when and what to cache to reduce loads at peak times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Datadog With Postgres
&lt;/h2&gt;

&lt;p&gt;Leveraging APMs like Datadog is critical to enhancing database performance and user experience. &lt;/p&gt;

&lt;p&gt;If this has whetted your appetite for adding more monitoring to your database so you can understand query metrics, optimize performance, and cache complex, intensive queries, then you can add Datadog to any Postgres (or other) database.&lt;/p&gt;

&lt;p&gt;Datadog has written a series on Postgres performance that shows how to integrate Datadog into your application and expands on some of the techniques covered here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Caching Slow Queries with ReadySet
&lt;/h2&gt;

&lt;p&gt;By identifying high-load queries, analyzing their frequency, evaluating their cost, and monitoring performance during peak times, you can set the stage to strategically cache the right queries, which leads to faster page loads, a smoother user experience, and reduced costs. &lt;/p&gt;

&lt;p&gt;ReadySet Cloud makes the caching process easy. &lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Sign up for access here&lt;/a&gt; to start caching queries in fifteen minutes or less without making any changes to your application code. &lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
