<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ReadySet</title>
    <description>The latest articles on DEV Community by ReadySet (@readysettech).</description>
    <link>https://dev.to/readysettech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7996%2Fab553ccc-064e-43f5-ab71-f0c95dc48b81.png</url>
      <title>DEV Community: ReadySet</title>
      <link>https://dev.to/readysettech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/readysettech"/>
    <language>en</language>
    <item>
      <title>Optimizing Performance: A 2024 Updated Guide to Setting Up Caching in Laravel</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 04 Mar 2024 16:50:18 +0000</pubDate>
      <link>https://dev.to/readysettech/optimizing-performance-a-2024-updated-guide-to-setting-up-caching-in-laravel-4j6l</link>
      <guid>https://dev.to/readysettech/optimizing-performance-a-2024-updated-guide-to-setting-up-caching-in-laravel-4j6l</guid>
      <description>&lt;p&gt;Laravel is an excellent framework for those who want to use PHP. It pairs well with other languages, offers clean code, and includes features for a full-stack application out-of-the-box. &lt;/p&gt;

&lt;p&gt;One of those features is caching. Laravel includes an elegant, comprehensive caching system designed to be highly flexible, allowing developers to choose from various caching drivers and strategies, tailoring the caching solution to fit their applications' specific needs and architecture. &lt;/p&gt;

&lt;p&gt;This adaptability ensures that Laravel applications remain scalable, responsive, and efficient, regardless of complexity or user base size.&lt;/p&gt;

&lt;p&gt;This article will thoroughly explore Laravel caching, examine the various cache configurations available in Laravel, and discuss the reasons for utilizing &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;&lt;u&gt;Readyset&lt;/u&gt;&lt;/a&gt;. So, let's begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching Options in Laravel
&lt;/h2&gt;

&lt;p&gt;Laravel has a robust caching system that supports various drivers, facilitating seamless caching implementation.&lt;/p&gt;

&lt;p&gt;The configuration of caching in a Laravel application is managed in the &lt;code&gt;.env&lt;/code&gt; file. To change the caching option, set the &lt;code&gt;CACHE_DRIVER&lt;/code&gt; value in &lt;code&gt;.env&lt;/code&gt; to your preferred caching driver. By default, it's set to &lt;code&gt;file&lt;/code&gt;. You can find all supported caching options in &lt;code&gt;config/cache.php&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  File
&lt;/h3&gt;

&lt;p&gt;File caching stores cache data on the server's file system. It offers a simple, disk-based caching mechanism suitable for smaller applications but may not be optimal for larger ones. To use file caching, add to your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Array
&lt;/h3&gt;

&lt;p&gt;The array cache driver stores data in a PHP array. This non-persistent method is suitable for request-based caching, especially during application testing. To use it, set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=array
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;p&gt;Database caching stores cache data in a database table, ideal for applications needing database-driven caching. First, create a cache table using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan cache:table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, run migrations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, update your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memcached
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://memcached.org/"&gt;&lt;u&gt;Memcached&lt;/u&gt;&lt;/a&gt; is a memory caching system for distributed caching environments. Install the&lt;a href="https://pecl.php.net/package/memcached"&gt;&lt;u&gt; Memcached PECL package&lt;/u&gt;&lt;/a&gt;, ensure Memcached is installed and running on your server, and set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=memcached
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Redis
&lt;/h3&gt;

&lt;p&gt;Redis offers advanced key-value store caching with data persistence and high performance. Install the Redis server and either the PhpRedis PHP extension via PECL or the &lt;code&gt;predis/predis&lt;/code&gt; package via Composer (e.g., &lt;code&gt;composer require predis/predis "^1.1"&lt;/code&gt;). Then, update your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=redis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DynamoDB
&lt;/h3&gt;

&lt;p&gt;For DynamoDB caching, create a table in AWS DynamoDB and note its name. Install the AWS SDK for PHP via Composer (&lt;code&gt;composer require aws/aws-sdk-php&lt;/code&gt;), set AWS credentials (&lt;code&gt;AWS_ACCESS_KEY_ID&lt;/code&gt;, &lt;code&gt;AWS_SECRET_ACCESS_KEY&lt;/code&gt;, &lt;code&gt;AWS_DEFAULT_REGION&lt;/code&gt;) in your &lt;code&gt;.env&lt;/code&gt; file, update &lt;code&gt;config/cache.php&lt;/code&gt; with DynamoDB details, and set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=dynamodb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Octane
&lt;/h3&gt;

&lt;p&gt;Laravel Octane enhances performance significantly and can be used for caching. Install Octane via &lt;code&gt;composer require laravel/octane&lt;/code&gt;, choose a caching backend like Swoole Table or Redis, update &lt;code&gt;config/cache.php&lt;/code&gt; to set &lt;code&gt;octane&lt;/code&gt; as the default cache driver, and configure your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=octane
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Octane with &lt;code&gt;php artisan octane:start&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Null
&lt;/h3&gt;

&lt;p&gt;The null cache driver effectively disables caching, useful in specific environments or during development/testing when caching interference is undesirable. Set in your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CACHE_DRIVER=null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuring Cache Settings
&lt;/h3&gt;

&lt;p&gt;For further customization and configuration of cache-related settings, modify the &lt;code&gt;config/cache.php&lt;/code&gt; file. This file allows for additional settings specific to each caching driver and the overall caching strategy of your application.&lt;/p&gt;
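&lt;p&gt;As a sketch of what lives in that file, here is a hypothetical excerpt (the store names and options shown are illustrative; the file Laravel ships defines many more stores and settings):&lt;/p&gt;

```php
<?php

// Hypothetical excerpt of config/cache.php. The env() and
// storage_path() helpers are provided by Laravel; the values shown
// are illustrative defaults, not a complete file.
return [
    // Store used when none is specified; reads CACHE_DRIVER from .env.
    'default' => env('CACHE_DRIVER', 'file'),

    'stores' => [
        'file' => [
            'driver' => 'file',
            'path' => storage_path('framework/cache/data'),
        ],
        'redis' => [
            'driver' => 'redis',
            'connection' => 'cache',
        ],
    ],

    // Prefix added to every key so multiple apps can share one backend.
    'prefix' => env('CACHE_PREFIX', 'laravel_cache'),
];
```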

&lt;h2&gt;
  
  
  Laravel Cache Methods
&lt;/h2&gt;

&lt;p&gt;Laravel offers a versatile range of caching methods, allowing for the easy implementation of various caching strategies. Here's an explanation of these methods, organized by functionality:&lt;/p&gt;

&lt;h2&gt;
  
  
  Storing Data in Cache Methods
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::put()
&lt;/h3&gt;

&lt;p&gt;This function stores data in Laravel's cache. It requires a key, the value to be stored, and an optional duration (in seconds) for how long the data should be kept. If the duration is not specified, it uses the default duration from &lt;code&gt;config/cache.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::put('key', 'value', 60); // Stores 'value' under 'key' for 60 seconds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::putMany()
&lt;/h3&gt;

&lt;p&gt;Similar to &lt;code&gt;Cache::put()&lt;/code&gt;, this function allows caching of several items simultaneously with a shared expiration duration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::putMany(['data1' =&amp;gt; 'value1', 'data2' =&amp;gt; 'value2'], 30); // Stores 'value1' and 'value2' for 30 seconds.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::remember()
&lt;/h3&gt;

&lt;p&gt;This method checks for the specified key in the cache. If not found, it executes the provided closure function to obtain data, caches the result, and returns it. This is useful for reducing database queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$value = Cache::remember('key', 60, function() {
    return 'data'; // Data retrieval logic
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
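&lt;p&gt;To make the behavior concrete, here is a plain-PHP sketch of the look-up-or-compute logic (an illustration only, not Laravel's implementation, and without expiry handling). Note that the closure runs just once; the second call is served from the cache:&lt;/p&gt;

```php
<?php

// Minimal in-memory illustration of Cache::remember() semantics:
// return the cached value if present, otherwise run the callback,
// store its result, and return it.
function remember(array &$cache, string $key, callable $callback)
{
    if (array_key_exists($key, $cache)) {
        return $cache[$key];
    }

    return $cache[$key] = $callback();
}

$cache = [];
$calls = 0;

$value1 = remember($cache, 'report', function () use (&$calls) {
    $calls++; // simulate an expensive database query
    return 'expensive result';
});

$value2 = remember($cache, 'report', function () use (&$calls) {
    $calls++;
    return 'expensive result';
});

// Both calls return 'expensive result', but $calls is 1: the
// expensive closure only ran the first time.
```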



&lt;h3&gt;
  
  
  Cache::forever()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::forever()&lt;/code&gt; is used to store data in the cache indefinitely, ideal for rarely changing data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::forever('key', 'data');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Retrieving and Managing Cached Data
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::get()
&lt;/h3&gt;

&lt;p&gt;Retrieves the value for a specified key. If the key does not exist, it returns null or a default value if provided:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$data = Cache::get('key', 'default-value');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::many()
&lt;/h3&gt;

&lt;p&gt;Fetches multiple cache items using an array of keys and returns an array of values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$data = Cache::many(['key1', 'key2']);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::forget()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::forget()&lt;/code&gt; removes a specific item from the cache, useful for invalidating cache entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::forget('key');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cache::flush()
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Cache::flush()&lt;/code&gt; clears all cache data. Use with caution as it removes everything stored in the cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::flush();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adjusting Cached Values: Increment and Decrement
&lt;/h3&gt;

&lt;p&gt;To increment or decrement cached values, use &lt;code&gt;Cache::increment()&lt;/code&gt; and &lt;code&gt;Cache::decrement()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache::increment('key');
Cache::decrement('key');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cache::lock()
&lt;/h3&gt;

&lt;p&gt;For handling race conditions, Laravel offers atomic locks using &lt;code&gt;Cache::lock()&lt;/code&gt;. This is an advanced feature useful in specific scenarios.&lt;/p&gt;
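&lt;p&gt;In Laravel the usual pattern is to call &lt;code&gt;Cache::lock('name', $seconds)&lt;/code&gt;, attempt &lt;code&gt;get()&lt;/code&gt;, do the work, and then &lt;code&gt;release()&lt;/code&gt;. The plain-PHP sketch below illustrates only the acquire/release semantics (a simplification; real atomic locks coordinate across processes and honor timeouts):&lt;/p&gt;

```php
<?php

// Plain-PHP sketch (an illustration, not Laravel's implementation) of
// the semantics behind Cache::lock(): only one caller can hold a
// named lock at a time, and release() frees it for the next caller.
function acquire(array &$locks, string $name): bool
{
    if (isset($locks[$name])) {
        return false; // the lock is already held
    }

    $locks[$name] = true; // atomic in Laravel; simplified here
    return true;
}

function release(array &$locks, string $name): void
{
    unset($locks[$name]);
}

$locks = [];

$first  = acquire($locks, 'deploy'); // acquired
$second = acquire($locks, 'deploy'); // refused: already held

release($locks, 'deploy');

$third = acquire($locks, 'deploy'); // acquired again after release
```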

&lt;p&gt;While there are more methods available in Laravel's cache system, the ones mentioned above are among the most widely used. For a comprehensive list, refer to &lt;a href="https://laravel.com/docs/10.x/cache"&gt;&lt;u&gt;Laravel's official cache documentation&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Useful Laravel Cache Commands
&lt;/h2&gt;

&lt;p&gt;Laravel provides several commands through &lt;a href="https://laravel.com/docs/10.x/artisan"&gt;&lt;u&gt;Artisan&lt;/u&gt;&lt;/a&gt; to simplify cache management. Below are some of these essential commands:&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Laravel Cache
&lt;/h3&gt;

&lt;p&gt;To clear the application cache before it expires, especially useful in development, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan cache:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command removes all items from the cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Route Cache
&lt;/h3&gt;

&lt;p&gt;Laravel caches the application routes for improved performance. To clear this route cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan route:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to cache the routes, which is advisable in production for faster route registration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan route:cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Clear Config Cache
&lt;/h3&gt;

&lt;p&gt;When you change config files and need to refresh the cached configuration, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan config:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially important after deploying changes in a production environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Compiled Views
&lt;/h3&gt;

&lt;p&gt;Compiled view files can be cleared with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan view:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful when updating views and needing to force Laravel to recompile them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear All Cache
&lt;/h3&gt;

&lt;p&gt;For a comprehensive cache clearing that includes cache, route cache, view cache, and compiled services, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan optimize:clear
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command is particularly helpful during deployments to ensure all cached elements are refreshed. Each of these commands plays a crucial role in efficiently managing cached data and ensuring the smooth operation of a Laravel application.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Readyset?
&lt;/h2&gt;

&lt;p&gt;Caching is a widely adopted technique used to enhance data retrieval speed. It works by storing the results of data requests in memory. This way, when that same data is required again, it can be quickly fetched from memory instead of being reprocessed.&lt;/p&gt;

&lt;p&gt;However, caching isn't without its challenges. One of the primary issues is managing the cache effectively. This involves updating the cache regularly and removing (or invalidating) cached data when it's no longer accurate or relevant.&lt;/p&gt;

&lt;p&gt;Readyset is a specialized SQL caching tool designed to enhance database performance. It stands in the middle of your application and your MySQL or PostgreSQL database and caches data.&lt;/p&gt;

&lt;p&gt;Here's a simple breakdown of how Readyset works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unique Approach:&lt;/strong&gt; Unlike typical caching methods, Readyset employs a dataflow graph technique. This means it automatically updates cached data whenever new information is added to your primary database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration:&lt;/strong&gt; After capturing an initial image of your database, Readyset positions itself as a secondary database. This allows it to continuously receive and incorporate new data directly from your main database server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol Compatibility:&lt;/strong&gt; Readyset is compatible with both MySQL and PostgreSQL protocols. This compatibility means you can connect your application straight to Readyset without major adjustments. If Readyset encounters a query that it hasn't cached, it simply forwards this query to your main database to ensure consistent and accurate data retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Readyset With Laravel
&lt;/h2&gt;

&lt;p&gt;To get started using Readyset in a Laravel project, you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install Readyset on your machine or server&lt;/li&gt;
&lt;li&gt;Connect to Readyset&lt;/li&gt;
&lt;li&gt;Cache Queries&lt;/li&gt;
&lt;li&gt;Configure your Laravel application database to connect to Readyset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we know what to do, let's get started with the implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Readyset
&lt;/h3&gt;

&lt;p&gt;We need a database to connect to. For this tutorial, we will use an employee sample database. To follow along, you can find the steps to download and install it on &lt;a href="https://github.com/datacharmer/test_db"&gt;&lt;u&gt;GitHub&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The recommended approach to install Readyset is through Docker. Ensure that you have Docker installed globally. Next, download the Readyset Docker image from Docker Hub using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker pull readysettech/readyset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After downloading the image, you need your primary database credentials to start Readyset. Your credentials should be in this format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql://&amp;lt;username&amp;gt;:&amp;lt;password&amp;gt;@&amp;lt;host&amp;gt;:&amp;lt;port&amp;gt;/&amp;lt;db_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's start Readyset as the &lt;code&gt;root&lt;/code&gt; user with a password of &lt;code&gt;password&lt;/code&gt; and connect to the employee database we downloaded above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -d -p 3307:3307 -p 6034:6034         \
--name readyset                                 \
-e UPSTREAM_DB_URL=mysql://root:password@host.docker.internal:3306/employees \
-e LISTEN_ADDRESS=0.0.0.0:3307                  \
readysettech/readyset:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start up Readyset. You can confirm that Readyset is running from your Docker console by looking at the logs of the Readyset Docker container. Your screen should be similar to the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2coLEIPr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/zV9ZiiDmmQlMiIEnzd8dtaobUSFiQ92medKgkiGFqitqiiLu8BD73phICG2CYR-H46KL2dZjhKKCdUiqI2Q8qv0d80MmnDj13PBzP7YhV1ZDBjPIHplowR12KcNAF1IBdtqbh4f1PtM18nHRfIltBDs" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2coLEIPr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/zV9ZiiDmmQlMiIEnzd8dtaobUSFiQ92medKgkiGFqitqiiLu8BD73phICG2CYR-H46KL2dZjhKKCdUiqI2Q8qv0d80MmnDj13PBzP7YhV1ZDBjPIHplowR12KcNAF1IBdtqbh4f1PtM18nHRfIltBDs" alt="Docker console showing a Log of the Readyset docker container" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also confirm this in the terminal using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker logs readyset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: Your database user must have a password to connect to Readyset; MySQL or Postgres users without passwords will not work. If your database runs locally, use &lt;code&gt;host.docker.internal&lt;/code&gt;, not &lt;code&gt;localhost&lt;/code&gt;, for the connection.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For a more detailed guide on installing Readyset, visit the &lt;a href="https://docs.readyset.io/get-started/install-rs/docker/mysql?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;&lt;u&gt;Readyset documentation&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connect to Readyset
&lt;/h3&gt;

&lt;p&gt;We can connect to Readyset from the terminal the same way we would connect to MySQL. The only significant difference is the port: Readyset listens on &lt;code&gt;3307&lt;/code&gt; here, while MySQL uses &lt;code&gt;3306&lt;/code&gt; by default. Now let's connect to Readyset using the same root credentials as above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql -u root -p password -P 3307
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify that you are connected using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SHOW READYSET TABLES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x26YccSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/BYGWTxjnwaaZnTiLMF7nPsUwxSNDTjWLROF1Mjhl-NgJBmBvfPIyHSrpbD5C7gyRhlemBFr0IWnAI_GbV86ZBkOPQ0PGeDD0wYFQ_IccQe3wrHyFY36ggONKjvGaCcAuS7rXwJCwRrS3uHqDPzfm9SY" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x26YccSF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/BYGWTxjnwaaZnTiLMF7nPsUwxSNDTjWLROF1Mjhl-NgJBmBvfPIyHSrpbD5C7gyRhlemBFr0IWnAI_GbV86ZBkOPQ0PGeDD0wYFQ_IccQe3wrHyFY36ggONKjvGaCcAuS7rXwJCwRrS3uHqDPzfm9SY" alt="Display of Readyset tables showing which tales can be cached with Readyset by displaying the status  raw `snapshotted` endraw " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The tables with the status of &lt;code&gt;snapshotted&lt;/code&gt; can now be cached using Readyset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Cache
&lt;/h3&gt;

&lt;p&gt;To cache using Readyset, we first run a query once, check that the query is cacheable by Readyset, then create a cache for the query. To get started, let's run a simple query to count how many employees earn a salary of 100,000:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--2diK7QLk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/EBmRkNzNU-bmtIH5Y9F_DEl6CAzfmQCcE3QY11cuMPSGctX8Z3b61iayJrkw0F8yNchrt-J4JIlIvrEUnD22p3Mnd6s3oRY_MrB-7oe1V94q6JVBDRm0VtiJzd_J4j7dlFDtgjp5ru8rHE--jI-1Wks" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2diK7QLk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/EBmRkNzNU-bmtIH5Y9F_DEl6CAzfmQCcE3QY11cuMPSGctX8Z3b61iayJrkw0F8yNchrt-J4JIlIvrEUnD22p3Mnd6s3oRY_MrB-7oe1V94q6JVBDRm0VtiJzd_J4j7dlFDtgjp5ru8rHE--jI-1Wks" alt="Display of results of query to count how many employees earn $100,000 salaries." width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice that the request took 0.44 seconds to complete. Next, let's ensure that the query is cacheable by Readyset using this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SHOW PROXIED SUPPORTED QUERIES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TNTGGDfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/OtEMMR6eRKDPgqQayQ5CNBjTFS9h1NnqGphjYob-rN-CNObEtovUYlHGdi7t6Ie9SSu9SRKqFAcmWMOAHisr7_U6hdO7lLc6FSIvXxIm9zQAdKT2O1JNUyvJGOZyfKBXB8wncVvj4z7qCVP2eTlSDAg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TNTGGDfL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/OtEMMR6eRKDPgqQayQ5CNBjTFS9h1NnqGphjYob-rN-CNObEtovUYlHGdi7t6Ie9SSu9SRKqFAcmWMOAHisr7_U6hdO7lLc6FSIvXxIm9zQAdKT2O1JNUyvJGOZyfKBXB8wncVvj4z7qCVP2eTlSDAg" alt="Display of results after running a command to see if the query is cacheable" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's create our cache by adding &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to the query and then run the query after caching to verify it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; CREATE CACHE FROM SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this command has run, repeat the &lt;code&gt;SELECT&lt;/code&gt; query to see how fast it runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mysql&amp;gt; SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iSWubqm6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/blpc7wJP7BDTU92l-dvbZ6u4XSyAlbBCp76QYgzmKcskKTD805F320tNLuuMKAVColFoP4c1vqs-k-lHmTj6R8YecrRDAI5GBlfZKoM0PQ3c9gg8T_6o7wNHoR7EHq8DHs4oWrLPpr0v9F2PFH4nNBc" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iSWubqm6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/blpc7wJP7BDTU92l-dvbZ6u4XSyAlbBCp76QYgzmKcskKTD805F320tNLuuMKAVColFoP4c1vqs-k-lHmTj6R8YecrRDAI5GBlfZKoM0PQ3c9gg8T_6o7wNHoR7EHq8DHs4oWrLPpr0v9F2PFH4nNBc" alt="Display of results after creating a cache and running the query to verify caching works" width="800" height="144"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now it takes only 0.03 seconds to run the same command that took 0.44 seconds to run earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Readyset Caching With Laravel
&lt;/h2&gt;

&lt;p&gt;Now that we have Readyset installed, let's take the implementation a step further by implementing caching in a simple Laravel API using Readyset.&lt;/p&gt;

&lt;p&gt;Remember that Readyset sits between your MySQL database and your application. To see Readyset in action, we will execute the request we already created a cache for earlier: &lt;code&gt;SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000;&lt;/code&gt; in our Laravel API and measure the response time when our database is connected to Readyset and when it is not.&lt;/p&gt;
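&lt;p&gt;For the comparison, a small timing helper keeps both measurements on the same code path. The helper below is a hypothetical sketch; any callable stands in for the real &lt;code&gt;DB::select()&lt;/code&gt; call:&lt;/p&gt;

```php
<?php

// Hypothetical helper: time an arbitrary query callback so the
// cached (Readyset) and uncached runs can be measured identically.
function timeQuery(callable $query): array
{
    $start = microtime(true);
    $result = $query();

    return [$result, microtime(true) - $start];
}

// Stand-in for the real DB::select() call.
[$result, $seconds] = timeQuery(function () {
    return array_sum(range(1, 1000)); // placeholder "work"
});

// $seconds now holds the elapsed wall-clock time for this run.
```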

&lt;p&gt;To get started, let's create a new Laravel project using the Composer command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;composer create project laravel/laravel Readyset_Laravel_Caching
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the project is created, connect your application to your database. In your &lt;code&gt;.env&lt;/code&gt; file, add your database credentials like the ones below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=employees
DB_USERNAME=&amp;lt;username&amp;gt;
DB_PASSWORD=&amp;lt;password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  API Route
&lt;/h3&gt;

&lt;p&gt;The next step is to create a route for the request. Navigate to the &lt;code&gt;routes/api.php&lt;/code&gt; file and add the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Route::get('/test-cache', function () {
    $employees = DB::select('SELECT COUNT(*) FROM employees.salaries WHERE salary = 100000');
    return response()-&amp;gt;json($employees);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing the API
&lt;/h3&gt;

&lt;p&gt;Now let's serve our API and compare the response time of the request with and without Readyset caching.&lt;/p&gt;

&lt;p&gt;Serve the application using the command below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;php artisan serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Request Without Cache
&lt;/h3&gt;

&lt;p&gt;To test the application, you can use any API client you prefer. If you find it difficult to expose your endpoint to the internet for testing, I suggest using Ngrok. If you're unfamiliar with &lt;a href="https://ngrok.com/"&gt;&lt;u&gt;Ngrok&lt;/u&gt;&lt;/a&gt;, here's a useful &lt;a href="https://arjunamrutiya.medium.com/setting-up-laravel-with-ngrok-a-step-by-step-guide-a565b2c430b5"&gt;&lt;u&gt;tutorial to help you set it up&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Response time of request without caching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Uug2Vbt8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/A5XvKGx9gF4jenvhRibHrouEv9MmuaEyhmYP9rmYtE95XorQqJCz1aocP6qcQsxId4kyuLDzmZXyhKKD7-ipgljslF3wdDjEAJu41SRK0dPn1eYJuPR8zfWKwz-8pHA92m_KXr-3sjWAeJ5s6npN2DE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Uug2Vbt8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/A5XvKGx9gF4jenvhRibHrouEv9MmuaEyhmYP9rmYtE95XorQqJCz1aocP6qcQsxId4kyuLDzmZXyhKKD7-ipgljslF3wdDjEAJu41SRK0dPn1eYJuPR8zfWKwz-8pHA92m_KXr-3sjWAeJ5s6npN2DE" alt="Display of response time of request without caching" width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Request With Readyset Cache
&lt;/h3&gt;

&lt;p&gt;As Readyset caches SQL results, you can add it to Laravel via the database setup rather than through the &lt;code&gt;CACHE_DRIVER&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Connecting to Readyset can be done simply by changing the port number of your database connection in the &lt;code&gt;.env&lt;/code&gt; file from &lt;code&gt;DB_PORT=3306&lt;/code&gt; to &lt;code&gt;DB_PORT=3307&lt;/code&gt;. Let's do that using the code below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3307
DB_DATABASE=employees
DB_USERNAME=&amp;lt;username&amp;gt;
DB_PASSWORD=&amp;lt;password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EJ5rqfyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hpa-rmb56XuXNwOleI31zl0zfpme_vbxRw0E-xExpivJPlr4FhRPu2jssL6SsMPuqA_U5BYwxYtHYCuFShIRSGHVWsL38bGcLmzT83uv6-H3baB8qg3y-zbLW9_e--QhLI4gmzxgnQeiS3GIJC5Vues" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EJ5rqfyB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://lh7-us.googleusercontent.com/hpa-rmb56XuXNwOleI31zl0zfpme_vbxRw0E-xExpivJPlr4FhRPu2jssL6SsMPuqA_U5BYwxYtHYCuFShIRSGHVWsL38bGcLmzT83uv6-H3baB8qg3y-zbLW9_e--QhLI4gmzxgnQeiS3GIJC5Vues" alt="Display of response time of request with Readyset cahce" width="800" height="618"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Easily Manage Caching in Laravel With Readyset
&lt;/h2&gt;

&lt;p&gt;Laravel developers are spoiled for choice when it comes to caching. From simple file caching to DynamoDB, you can find a caching solution for any situation. On top of that, the caching API is easy to use and understand.&lt;/p&gt;

&lt;p&gt;But you are still stuck managing that cache and all the headaches that go with it. Readyset removes those headaches and makes it easy for Laravel developers to incorporate SQL caching into their applications with zero code changes. Here, we got our hands dirty with the code by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing and running Readyset using Docker&lt;/li&gt;
&lt;li&gt;Creating a simple API in Laravel&lt;/li&gt;
&lt;li&gt;Seeding our database&lt;/li&gt;
&lt;li&gt;Connecting our database to Readyset, and implementing caching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feel free to experiment with the code as much as you can and share your experience with the rest of the community. If you are interested in using Readyset, you can &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;sign up here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>caching</category>
      <category>database</category>
    </item>
    <item>
      <title>Investigating and Optimizing Over-Querying</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Wed, 14 Feb 2024 14:59:13 +0000</pubDate>
      <link>https://dev.to/readysettech/investigating-and-optimizing-over-querying-2ek</link>
      <guid>https://dev.to/readysettech/investigating-and-optimizing-over-querying-2ek</guid>
      <description>&lt;p&gt;Imagine you're running a popular e-commerce online bookstore that offers a vast collection of titles and authors to a growing user base. However, you've noticed a troubling trend over the past few months: the website is gradually slowing down, especially during peak hours when users browse various book categories. After an initial investigation, you find that the cause of the slowdown isn't an increase in user traffic or a lack of server resources. Instead, it's rooted in the very foundation of how your application interacts with your Postgres database.&lt;/p&gt;

&lt;p&gt;The culprit? N+1 query problems. As more users navigate your site, more requests are made to the database to fetch information. Instead of being efficiently retrieved in grouped queries, each request individually pulls associated data like author details, reviews, and related books. What should have been a streamlined operation has turned into a burdensome load on your database, leading to longer load times and a compromised user experience.&lt;/p&gt;

&lt;p&gt;This scenario is not unique to your online bookstore. Regardless of size or domain, many applications encounter similar performance bottlenecks due to N+1 queries. Understanding the nature of these queries, their impact on database performance, and how to optimize them is crucial for developers and database administrators. Here, we’re going into N+1 queries in a Postgres environment, providing insights and strategies to turn a potential database nightmare into a well-optimized, efficient system.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are N+1 Queries?
&lt;/h2&gt;

&lt;p&gt;N+1 queries are a common performance bottleneck in databases. This issue occurs when an application performs an initial query to retrieve a set of records, followed by additional queries for each individual record. The name 'N+1' stems from making one (1) initial query and then N additional queries, resulting in N+1 total queries for N records.&lt;/p&gt;

&lt;p&gt;Let’s say you have an application that displays user profiles and their respective posts. The application first executes a query to fetch all users. This is the "1" in N+1. The application performs another query for each user retrieved to fetch their posts. If there are ten users, this results in 10 additional queries (the "N" in N+1), totaling 11 queries. While this approach may seem straightforward, it's highly inefficient, especially as the number of users grows.&lt;/p&gt;

&lt;p&gt;Let’s look at what this looks like. Assume you have two tables: &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;posts&lt;/code&gt;. Each user has multiple posts. The &lt;code&gt;posts&lt;/code&gt; table has a foreign key that references the &lt;code&gt;users&lt;/code&gt; table. You want to display each user along with their posts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Users Table:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;id&lt;/code&gt; (Primary Key)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;name&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Posts Table:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;id&lt;/code&gt; (Primary Key)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;content&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt; (Foreign Key to Users)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A naive N+1 query to get these posts might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Query 1: Fetch all users
SELECT id, name FROM users;

-- For each user obtained from the above query, execute the following query:
SELECT content FROM posts WHERE user_id = [user_id];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the first query is the "1". The second query, executed once for each &lt;code&gt;user_id&lt;/code&gt; returned, supplies the "N".&lt;/p&gt;

&lt;p&gt;As a result, if there are 100 users, the total number of queries executed will be 101: 1 for fetching all users and 100 for fetching the posts for each user. This is a classic example of an N+1 query problem and can lead to performance issues, especially with a large number of users and posts.&lt;/p&gt;
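
&lt;p&gt;To make the arithmetic concrete, here is a minimal, self-contained Python sketch (using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in database; the schema mirrors the tables above) that counts the statements issued by the N+1 pattern versus a single &lt;code&gt;JOIN&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT,
                        user_id INTEGER REFERENCES users(id));
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (content, user_id) VALUES (?, ?)",
                 [(f"post by user{i}", i) for i in range(1, 101)])

# N+1 pattern: 1 query for the users, then 1 per user for the posts
query_count = 1
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, _name in users:
    conn.execute("SELECT content FROM posts WHERE user_id = ?", (user_id,)).fetchall()
    query_count += 1

# Optimized: the same data in a single round trip
rows = conn.execute(
    "SELECT u.id, u.name, p.content "
    "FROM users u LEFT JOIN posts p ON u.id = p.user_id"
).fetchall()

print(query_count)  # 101 statements under the N+1 pattern
print(len(rows))    # 100 rows from one JOIN query
```

&lt;p&gt;For 100 users, the N+1 version issues 101 statements; the &lt;code&gt;JOIN&lt;/code&gt; fetches the same data in one.&lt;/p&gt;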

&lt;h3&gt;
  
  
  N+1 Queries in ORMs
&lt;/h3&gt;

&lt;p&gt;It’s common to find N+1 queries in frameworks that use Object-Relational Mappers (ORMs). These are designed to convert models in an application into SQL statements to query data in relational databases. However, they often exacerbate the N+1 query issue due to their internal mechanisms.&lt;/p&gt;

&lt;p&gt;Let’s use the Python web framework Django as our example. In Django, an N+1 query problem can quickly occur when you have related models and access related data without properly optimizing your queries. Let's consider a scenario where each &lt;code&gt;User&lt;/code&gt; has a foreign key to a &lt;code&gt;Profile&lt;/code&gt; model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# models.py
from django.db import models

class Profile(models.Model):
    bio = models.TextField()
    # other fields like date_of_birth, location, etc.

class User(models.Model):
    name = models.CharField(max_length=100)
    profile = models.OneToOneField(Profile, on_delete=models.CASCADE)
    # other fields like email, etc.

# views.py
from django.shortcuts import render
from .models import User

def user_list(request):
    users = User.objects.all()
    user_profiles = []
    for user in users:
        profile = user.profile  # This creates the N+1 problem
        user_profiles.append((user, profile))

    context = {'user_profiles': user_profiles}
    return render(request, 'user_list.html', context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;User.objects.all()&lt;/code&gt; query retrieves all users. This is the "1" in N+1. Inside the loop, accessing each user's profile may result in a separate database query to fetch the corresponding &lt;code&gt;Profile&lt;/code&gt; instance. This is the "N" part of N+1, where N is the number of users.&lt;/p&gt;

&lt;p&gt;Because ORMs abstract away the underlying SQL queries, developers might not immediately realize that their code generates these inefficient N+1 queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Impact of N+1 Queries on Performance
&lt;/h2&gt;

&lt;p&gt;N+1 queries impact performance, especially in large-scale systems. These queries lead to inefficient data retrieval, increased load on the database, and, ultimately, a poor user experience. Understanding the consequences of N+1 queries is crucial for database optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Increased Database Load
&lt;/h3&gt;

&lt;p&gt;Each additional query in an N+1 problem adds to the load on the database. This is particularly problematic with large datasets. For instance, if an application retrieves 1000 users and makes a separate query for each user's posts, it results in 1001 queries hitting the database instead of potentially just one. This extra load can slow down the database response times, affecting all application users.&lt;/p&gt;

&lt;p&gt;Imagine a web application displaying a list of users and their recent activities. Without optimization, fetching 1,000 users might result in 1,001 queries (one to fetch all users and 1,000 to fetch activities for each). This heavy load can lead to longer wait times for the data to load, affecting user experience.&lt;/p&gt;

&lt;p&gt;A high volume of queries also consumes more CPU and memory resources on the database server. In our previous example, instead of a single, efficient query, the server must process 1,001 queries, consuming more resources. This impacts the query in question and affects the overall efficiency of the database server, hindering its ability to handle other requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling Challenges
&lt;/h3&gt;

&lt;p&gt;Applications with N+1 query problems often struggle to scale. As the data grows, so does the number of queries. In a social media app, for example, as more users join and create posts, the N+1 issue worsens, leading to an ever-growing number of queries. This scaling challenge can cause significant performance degradation over time, requiring more hardware resources to maintain the same level of performance.&lt;/p&gt;

&lt;p&gt;The total time complexity for the N+1 queries pattern is O(N). As the number of records (N) increases, the total number of queries increases linearly. This linear growth can lead to significant performance degradation, especially with large datasets. Each query incurs a certain amount of overhead due to network latency, query parsing, execution planning, and data retrieval. This overhead, multiplied by the number of queries, can substantially affect performance.&lt;/p&gt;
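
&lt;p&gt;A back-of-the-envelope model makes that overhead visible. The per-query figures below are illustrative assumptions, not measurements:&lt;/p&gt;

```python
# Illustrative cost model: the per-query overhead and execution times below
# are assumed values, chosen only to show how the totals grow.
overhead_ms = 1.0    # network latency, parsing, planning per query (assumed)
execution_ms = 5.0   # execution time of one small query (assumed)

def n_plus_one_cost(n):
    """1 initial query plus n follow-ups: total cost grows linearly, O(n)."""
    return (n + 1) * (overhead_ms + execution_ms)

def single_join_cost():
    """One round trip; assume the JOIN itself runs twice as long as a small query."""
    return overhead_ms + 2 * execution_ms

for n in (10, 100, 1000):
    print(n, n_plus_one_cost(n), single_join_cost())
```

&lt;p&gt;At 1,000 records the modeled N+1 cost is 6,006 ms against 11 ms for the single query. The absolute numbers are invented, but the shape is the point: the per-query overhead is paid N+1 times in one case and once in the other.&lt;/p&gt;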

&lt;h3&gt;
  
  
  Poor User Experience
&lt;/h3&gt;

&lt;p&gt;The cumulative effect of these performance issues can result in longer loading times for users, negatively impacting the user experience. In an e-commerce site, if product details are fetched with N+1 queries, each additional millisecond in load time can potentially lead to lost sales as customers grow impatient and leave the site.&lt;/p&gt;

&lt;p&gt;Consider a real-time data dashboard that monitors and displays various metrics. If each metric's data is fetched using separate queries for each element, the dashboard will experience noticeable delays, failing to deliver the real-time experience expected by users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detecting and Investigating N+1 Queries
&lt;/h2&gt;

&lt;p&gt;Sometimes, the simplest way to detect N+1 queries is through careful code review. Reviewers can look for loops or iterative processes that make database calls, particularly in the context of ORMs or when accessing related data.&lt;/p&gt;

&lt;p&gt;But N+1 queries can often be subtle and not immediately evident, especially in complex applications. Here are some effective strategies and tools that can be employed to detect and investigate N+1 queries:&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Logging
&lt;/h3&gt;

&lt;p&gt;Enabling query logging in Postgres is one of the first steps in detecting N+1 query problems. Logging all executed queries allows you to analyze the logs for patterns that indicate N+1 issues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Set the logging level to log all statements
ALTER DATABASE your_database_name SET log_statement = 'all';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After enabling logging, look for sequences of similar queries that differ only in a parameter, such as multiple queries fetching details for different user IDs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2024-01-23 10:00:01 UTC LOG:  statement: SELECT * FROM users
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 1
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 2
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 3
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 4
2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 5
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 6
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 7
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 8
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 9
2024-01-23 10:00:03 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 10
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a strong indicator of N+1 queries.&lt;/p&gt;
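
&lt;p&gt;Spotting the pattern can also be automated. The sketch below is a rough illustration (the log format and repeat threshold are assumptions): it extracts the SQL from each log line, normalizes it by replacing literals with a placeholder, and counts repeats:&lt;/p&gt;

```python
import re
from collections import Counter

# Sample lines in the log format shown above (assumed format)
log_lines = [
    "2024-01-23 10:00:01 UTC LOG:  statement: SELECT * FROM users",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 1",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 2",
    "2024-01-23 10:00:02 UTC LOG:  statement: SELECT * FROM posts WHERE user_id = 3",
]

def normalize(line):
    """Pull out the SQL and replace literals with a '?' placeholder."""
    sql = line.split("statement:", 1)[1].strip()
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals
    return sql

counts = Counter(normalize(line) for line in log_lines)

# Any normalized statement repeated at or above the threshold is an N+1 suspect
THRESHOLD = 3  # assumed; tune for your traffic
suspects = {sql: n for sql, n in counts.items() if n >= THRESHOLD}
print(suspects)  # {'SELECT * FROM posts WHERE user_id = ?': 3}
```
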

&lt;h3&gt;
  
  
  Performance Monitoring Tools
&lt;/h3&gt;

&lt;p&gt;Tools like &lt;a href="https://github.com/darold/pgbadger"&gt;pgBadger&lt;/a&gt;, &lt;a href="https://www.datadoghq.com/blog/database-performance-monitoring-datadog/"&gt;DataDog&lt;/a&gt;, or &lt;a href="https://scoutapm.com/blog/understanding-n1-database-queries"&gt;ScoutAPM&lt;/a&gt; offer detailed insights into database performance. They can help identify inefficient query patterns that may suggest N+1 issues.&lt;/p&gt;

&lt;p&gt;With pgBadger, you can analyze your Postgres logs to get a report highlighting frequently executed queries. A high frequency of similar queries can be a sign of N+1 problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjsnrs4ldikq7lto4au8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjsnrs4ldikq7lto4au8.png" alt="Table showing query patterns and frequency of execution" width="691" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;SELECT * FROM posts WHERE user_id = ?&lt;/code&gt; query stands out due to its high frequency of execution (850 times). This pattern is indicative of an N+1 query problem. It suggests that for each user fetched by the &lt;code&gt;SELECT * FROM users&lt;/code&gt; query (executed ten times), there are multiple subsequent queries to fetch posts. The average duration of each &lt;code&gt;posts&lt;/code&gt; query is low (5ms), but the cumulative impact (total duration of 4250ms) is significant, pointing to a potential performance issue.&lt;/p&gt;

&lt;p&gt;The disparity between the number of executions of the &lt;code&gt;users&lt;/code&gt; query and the &lt;code&gt;posts&lt;/code&gt; query is a classic sign of N+1 queries. From this, the recommendations might be to investigate the application code following the execution of &lt;code&gt;SELECT * FROM users&lt;/code&gt;, especially the parts where posts for each user are accessed or displayed. Then, developers can optimize the query pattern to use JOIN operations or batch processing to reduce the total number of queries.&lt;/p&gt;

&lt;p&gt;An application performance monitoring tool like DataDog or Scout APM will often automatically highlight inefficiencies in an application, such as N+1 queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hofr1069jh7qz0ani0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hofr1069jh7qz0ani0g.png" alt="Application performance monitoring tool displaying inefficiencies in an application" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;They can show the response times for these queries, the number of calls, and the query itself. This kind of report helps pinpoint where the application might be inefficiently using the database, guiding developers toward specific areas of the code that may need optimization to address the N+1 query problem.&lt;/p&gt;

&lt;p&gt;Languages also have profiling tools built in, such as Python's &lt;code&gt;cProfile&lt;/code&gt;. These can help detect N+1 queries by showing where your application spends most of its time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import cProfile
cProfile.run('function_that_loads_data()')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This profiling can help identify functions that are making excessive database calls. A large amount of time spent in database-related functions could indicate N+1 queries.&lt;/p&gt;

&lt;p&gt;Beyond language tools, most ORMs allow enabling SQL debug logging. This feature logs every SQL query the ORM executes, making spotting repetitive query patterns indicative of N+1 problems easier. Keeping with the Python theme, in Django, SQL debug logging can be activated by setting &lt;code&gt;DEBUG = True&lt;/code&gt; in your &lt;code&gt;settings.py&lt;/code&gt;, which causes all SQL queries to be printed to the console during development. You can also use Django's &lt;code&gt;django.db.connection.queries&lt;/code&gt; for query insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  EXPLAIN Command
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;EXPLAIN&lt;/code&gt; command in Postgres is an invaluable tool for understanding how your queries are being executed. It can help you identify queries that do not efficiently use indexes or perform full table scans, which could be part of an N+1 problem.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;EXPLAIN SELECT * FROM posts WHERE user_id = 1;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The output provides insights into the query execution plan, which can help identify inefficiencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QUERY PLAN
-----------------------------------------------------------
 Seq Scan on posts  (cost=0.00..35.50 rows=1560 width=2048)
   Filter: (user_id = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;EXPLAIN&lt;/code&gt; tells us we perform a “Seq Scan on posts.” This indicates that a sequential scan is being performed on the &lt;code&gt;posts&lt;/code&gt; table. Sequential scans are generally less efficient than index scans, especially for larger tables, as they involve scanning each row.&lt;/p&gt;

&lt;p&gt;The sequential scan (Seq Scan) on the &lt;code&gt;posts&lt;/code&gt; table might not be a problem for a single query. However, if similar queries are executed repeatedly for different &lt;code&gt;user_id&lt;/code&gt; values (as in an N+1 scenario), the database is performing numerous full table scans, which can be highly inefficient.&lt;/p&gt;

&lt;p&gt;The absence of an index scan suggests that there might not be an index on the &lt;code&gt;user_id&lt;/code&gt; column or the query planner did not find it efficient to use the index. For an N+1 query pattern, the repeated execution of such full table scans can significantly degrade performance.&lt;/p&gt;
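
&lt;p&gt;The usual remedy is an index on the filter column. As an illustration, the sketch below uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; rather than Postgres, so the plan text differs (SQLite reports &lt;code&gt;SEARCH ... USING INDEX&lt;/code&gt; where Postgres would show an Index Scan), but the before-and-after effect is the same idea:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO posts (content, user_id) VALUES (?, ?)",
                 [(f"post {i}", i % 50) for i in range(1000)])

def plan(sql):
    """Return the top line of SQLite's query plan for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# Without an index the planner must scan the whole table
before = plan("SELECT content FROM posts WHERE user_id = 1")
print(before)  # e.g. 'SCAN posts'

conn.execute("CREATE INDEX idx_posts_user_id ON posts(user_id)")

# With the index the planner narrows the search instead of scanning
after = plan("SELECT content FROM posts WHERE user_id = 1")
print(after)   # e.g. 'SEARCH posts USING INDEX idx_posts_user_id (user_id=?)'
```
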

&lt;h2&gt;
  
  
  Optimization and Query Design
&lt;/h2&gt;

&lt;p&gt;Optimization and better query design are the most effective strategies for significantly reducing the impact of N+1 query problems in a Postgres environment. Doing so naturally leads to better application performance and scalability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Eager loading
&lt;/h3&gt;

&lt;p&gt;When using ORMs, the best solution (given you can’t optimize the query yourself) is eager loading. Eager loading involves modifying the query to fetch all related data in a single query instead of separate queries for each record. This can be achieved in ORMs with specific methods or query options (like &lt;code&gt;.includes&lt;/code&gt; in Rails or &lt;code&gt;select_related&lt;/code&gt; in Django). These reduce the number of queries to the database, improving performance.&lt;/p&gt;

&lt;p&gt;If we want to optimize our Django example above, this can be achieved through techniques like &lt;code&gt;select_related&lt;/code&gt; or &lt;code&gt;prefetch_related&lt;/code&gt;, designed to handle database queries more efficiently for related objects.&lt;/p&gt;

&lt;p&gt;Assuming the &lt;code&gt;User&lt;/code&gt; model has a ForeignKey to another model, let's say a &lt;code&gt;Profile&lt;/code&gt; model, here's an optimized version of the Django code that avoids the N+1 query problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# models.py
from django.db import models

class Profile(models.Model):
    bio = models.TextField()
    # other fields

class User(models.Model):
    name = models.CharField(max_length=100)
    profile = models.OneToOneField(Profile, on_delete=models.CASCADE)
    # other fields

class Post(models.Model):
    user = models.ForeignKey(User, related_name='posts', on_delete=models.CASCADE)
    content = models.TextField()
    # other fields

# views.py
from django.shortcuts import render
from .models import User

def user_profiles(request):
    # Use 'select_related' to fetch the related Profile in the same query
    users = User.objects.select_related('profile').all()

    # 'prefetch_related' is used for reverse ForeignKey relationships
    # This fetches all related posts in a separate query, reducing the overall number of queries
    users = users.prefetch_related('posts')

    return render(request, 'user_profiles.html', {'users': users})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;select_related('profile')&lt;/code&gt; method is used with the &lt;code&gt;User.objects.all()&lt;/code&gt; query. This fetches the associated &lt;code&gt;Profile&lt;/code&gt; for each &lt;code&gt;User&lt;/code&gt; in the same database query, thus avoiding separate queries for each user's profile.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;prefetch_related('posts')&lt;/code&gt; method handles the reverse ForeignKey relationship from &lt;code&gt;User&lt;/code&gt; to &lt;code&gt;Post&lt;/code&gt;. It performs a separate query to fetch all related posts and then efficiently pairs them with the corresponding users, which is more efficient than doing individual queries for each user's posts.&lt;/p&gt;

&lt;p&gt;This approach significantly reduces the number of queries, particularly when you have a large number of users and posts, thus improving the performance of your Django application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;p&gt;If you have particular queries you want to return fast, you can implement caching to reduce the need to query the database each time.&lt;/p&gt;

&lt;p&gt;Readyset allows you to cache SQL queries without changes to your application code. The only change needed is to swap out your primary Postgres database connection string with a Readyset connection string. Readyset will connect to your primary database and register with the replication stream. After snapshotting your database, every query will be proxied through Readyset. Along with your monitoring tools from above, you can then also use Readyset to understand your query performance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs09qsypucro1u2uc5ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhs09qsypucro1u2uc5ph.png" alt="Readyset displaying a table with query performance" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you have detected N+1 queries that can be cached (usually, read-heavy queries are optimal candidates), you can start caching. In this case, we want to cache our posts queries. To do so, we just prepend &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to our query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE CACHE FROM SELECT * FROM posts WHERE user_id = ?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results from this query will now be served from Readyset with sub-millisecond latencies. Readyset monitors the replication stream from the primary database, looking for changes to the data, so the cache will be automatically updated whenever the underlying data is updated in the primary database.&lt;/p&gt;

&lt;p&gt;Readyset also works with ORMs to help increase performance and optimize those queries under the hood.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query design
&lt;/h3&gt;

&lt;p&gt;Just as you can optimize your system around your queries, if you aren’t using an ORM, you can also optimize your queries for better performance. Here are a few options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use Joins Appropriately: Utilize SQL joins to combine data from multiple tables in a single query. Understand the differences between &lt;code&gt;INNER JOIN&lt;/code&gt;, &lt;code&gt;LEFT JOIN&lt;/code&gt;, etc., and choose the most appropriate one for your use case to avoid unnecessary data retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select Only Required Columns: Specify the columns you need in your &lt;code&gt;SELECT&lt;/code&gt; statements rather than using &lt;code&gt;SELECT *&lt;/code&gt;. This reduces the amount of data transferred and processed, making the query more efficient.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Understand and Use Indices: Ensure your queries effectively leverage indices, especially for columns in &lt;code&gt;JOIN&lt;/code&gt;, &lt;code&gt;WHERE&lt;/code&gt;, or &lt;code&gt;ORDER BY&lt;/code&gt; clauses. Regularly review and optimize indices based on query patterns and data changes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Avoid Looping Queries: Identify scenarios where your code iteratively executes queries (like in a loop) and refactor them to use bulk data retrieval techniques. Replace multiple small queries with fewer, more comprehensive queries.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Limit Data with Pagination: When dealing with large datasets, use pagination to limit the data retrieved and processed in a single query. This improves both database and application performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thus, to optimize the basic queries from above, you could add a join into your query to fetch all the data at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT u.id, u.name, p.content
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
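
&lt;p&gt;Because the &lt;code&gt;JOIN&lt;/code&gt; returns one flat row per user/post pair, the application typically regroups the result set in memory. Here is a minimal sketch using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module with the same schema as above (the sample names are invented):&lt;/p&gt;

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT, user_id INTEGER);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 'first post', 1), (2, 'second post', 1);
""")

# One query fetches everything; LEFT JOIN keeps users without posts (Grace)
rows = conn.execute("""
    SELECT u.id, u.name, p.content
    FROM users u
    LEFT JOIN posts p ON u.id = p.user_id
""").fetchall()

# Regroup the flat rows into one entry per user
posts_by_user = defaultdict(list)
for user_id, name, content in rows:
    key = (user_id, name)
    if content is not None:
        posts_by_user[key].append(content)
    else:
        posts_by_user[key]  # creates an empty entry for users with no posts

print(dict(posts_by_user))
# e.g. {(1, 'Ada'): ['first post', 'second post'], (2, 'Grace'): []}
```
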



&lt;h2&gt;
  
  
  Solving the N+1 Puzzle
&lt;/h2&gt;

&lt;p&gt;N+1 queries can destroy your application’s performance as you scale. What’s worse is that this issue can be abstracted from you by ORMs and other high-level frameworks, making them harder to detect and resolve. But with monitoring tools and a better understanding of the problem, you can detect these troublesome queries and work to improve your query design and optimize your performance.&lt;/p&gt;

&lt;p&gt;At Readyset, we are building ways to help developers and platform engineers do just that. By caching queries, you can significantly improve your response latencies while still serving your users' fresh data. If this sounds like something that will help you, sign up for &lt;a href="https://readyset.io/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Readyset Cloud&lt;/a&gt; or &lt;a href="https://join.slack.com/t/readysetcommunity/shared_invite/zt-2272gtiz4-0024xeRJUPGWlRETQrGkFw"&gt;reach out to us&lt;/a&gt; if you have any questions.&lt;/p&gt;

</description>
      <category>caching</category>
      <category>postgres</category>
      <category>postgressql</category>
      <category>database</category>
    </item>
    <item>
      <title>A Practical Guide to Caching: What to Cache and When</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 05 Feb 2024 18:49:24 +0000</pubDate>
      <link>https://dev.to/readysettech/a-practical-guide-to-caching-what-to-cache-and-when-47ha</link>
      <guid>https://dev.to/readysettech/a-practical-guide-to-caching-what-to-cache-and-when-47ha</guid>
      <description>&lt;p&gt;Caching is a Goldilocks problem. Caching too much leads to stale data, memory bloat, and a lot of cache management. Caching too little leads to longer latencies, higher database load, and the need to provision more and more storage. &lt;/p&gt;

&lt;p&gt;The aim is to cache just right, optimize cache performance, minimize database load, and enhance overall system efficiency. &lt;/p&gt;

&lt;p&gt;This is a lot easier said than done. How do you know what to cache and when? Modern databases help you understand the usage patterns of your data, including the frequency of reads versus writes, the size of the data being accessed, and the variability of query response times. These metrics are critical in informing your caching strategy, enabling you to identify hotspots of frequent access where caching can provide the most benefit. Here, we want to take you through determining what to cache, and when, using the insights your database can provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Analyzing Workload Patterns
&lt;/h2&gt;

&lt;p&gt;Understanding workload patterns in your production database is crucial for effective caching, as it directly influences what data to cache and the caching strategy to employ.&lt;/p&gt;

&lt;h3&gt;
  
  
  Read/Write Ratio Analysis
&lt;/h3&gt;

&lt;p&gt;The read/write ratio analysis is fundamental to understanding workload patterns in database systems. This ratio illustrates the frequency of read operations (like &lt;code&gt;SELECT&lt;/code&gt; queries) compared to write operations (&lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;). A thorough analysis of this ratio can guide decisions on what data to cache and how to manage the cache effectively.&lt;/p&gt;

&lt;p&gt;A high read ratio indicates that data is frequently accessed but seldom changed. This is ideal for caching because it doesn't require frequent invalidation or updates once data is cached. A high write ratio suggests more dynamic data, which can lead to frequent cache invalidations and reduced cache effectiveness.&lt;/p&gt;

&lt;p&gt;The basic steps for analyzing your read/write ratio are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data collection.&lt;/strong&gt; Use database monitoring tools or query logs to collect data on read and write operations. Some databases provide built-in tools or extensions (e.g., &lt;code&gt;pg_stat_statements&lt;/code&gt; in PostgreSQL) that can simplify this process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantify operations.&lt;/strong&gt; Count the number of read and write operations over a given period. This can be done through script automation or database monitoring tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate the ratio.&lt;/strong&gt; Compute the proportion of reads to writes. A simple formula could be: Read/Write Ratio = Number of Reads / Number of Writes.&lt;/li&gt;
&lt;/ol&gt;
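
&lt;p&gt;The steps above reduce to simple arithmetic once you have the counts. Here is a minimal Python sketch of the calculation; the counts and the 10:1 threshold are hypothetical, used only to illustrate the decision:&lt;/p&gt;

```python
def read_write_ratio(reads: int, writes: int) -> float:
    """Read/Write Ratio = Number of Reads / Number of Writes."""
    if writes == 0:
        return float("inf")  # data that is never written is an ideal cache candidate
    return reads / writes

# Hypothetical counts collected from pg_stat_statements
reads, writes = 50_000, 5_000
ratio = read_write_ratio(reads, writes)
print(f"read/write ratio: {ratio:.1f}:1")  # read/write ratio: 10.0:1

# Illustrative threshold: treat strongly read-heavy data as cacheable
print("good cache candidate" if ratio >= 10 else "cache with care")
```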

&lt;p&gt;In Postgres, we can use the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then query &lt;code&gt;pg_stat_statements&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT query, calls, rows
FROM pg_stat_statements
WHERE query LIKE 'SELECT %' OR query LIKE 'INSERT %' OR query LIKE 'UPDATE %' OR query LIKE 'DELETE %';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will show you the frequency of &lt;code&gt;SELECT&lt;/code&gt; statements versus &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt; statements. You can then calculate the read/write ratio from these counts.&lt;/p&gt;

&lt;p&gt;High read/write ratios (e.g., 10:1) indicate that the data is read ten times more often than it is written to. This data is a good candidate for caching. A low read/write ratio (e.g., 1:2) indicates more writes than reads. You need to be cautious about caching this type of data, as the cache would need frequent updates or invalidations.&lt;/p&gt;

&lt;p&gt;For data with a high read ratio, you can cache and set a longer Time-To-Live (TTL) for that cache. If you are caching low read/write ratio data, you must design an efficient cache invalidation strategy to ensure data consistency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Temporal Locality and Hotspots
&lt;/h3&gt;

&lt;p&gt;Temporal locality is based on the principle that recently accessed data will likely be accessed again soon. This pattern is a key factor in identifying 'hotspots' in your data - areas where frequent access suggests a high potential benefit from caching. Understanding and identifying these hotspots allows for a more targeted and efficient caching strategy, improving performance and resource utilization.&lt;/p&gt;

&lt;p&gt;To identify hotspots, you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Monitor access patterns.&lt;/strong&gt; Use database monitoring tools to track access frequency to different data elements. Look for patterns where specific rows, tables, or queries are accessed repeatedly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyze query logs.&lt;/strong&gt; Analyze logs for repeated access to specific queries. Frequent execution of the same query, especially within short time frames, indicates a hotspot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand usage metrics.&lt;/strong&gt; Collect metrics such as the number of hits, execution time, and frequency of access for various database elements.&lt;/li&gt;
&lt;/ol&gt;
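
&lt;p&gt;In essence, the steps above amount to counting how often each normalized query appears in the access log. A minimal Python sketch, using made-up log data and an illustrative threshold:&lt;/p&gt;

```python
from collections import Counter

def find_hotspots(accesses, threshold=3):
    """Return (query, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(accesses)
    return [(q, n) for q, n in counts.most_common() if n >= threshold]

# Hypothetical access log of normalized query fingerprints
log = [
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM users WHERE id = $1",
    "SELECT * FROM orders WHERE id = $1",
]
print(find_hotspots(log))  # [('SELECT * FROM users WHERE id = $1', 3)]
```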

&lt;p&gt;To understand the access frequency, you must configure your database to log detailed query information. In your &lt;code&gt;postgresql.conf&lt;/code&gt;, you can set the following parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;logging_collector = on # enable logging.
log_directory = 'pg_logs' # specify the directory for log files.
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # set the log file naming convention.
log_statement = 'all' # log every executed statement.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To analyze the logs, you can use something like &lt;a href="https://github.com/darold/pgbadger/tree/master"&gt;pgBadger&lt;/a&gt;. pgBadger will parse the log and create an HTML report showing detailed query statistics.&lt;/p&gt;

&lt;p&gt;Suppose you have a query log from a Postgres database and notice that specific user profiles are accessed frequently. Temporal locality suggests that user profiles accessed in the last day will likely be accessed again soon. &lt;/p&gt;

&lt;p&gt;Data or queries identified as hotspots are strong candidates for caching. The user profile data in the above example should be prioritized for caching. This is how we cache at ReadySet, a process called &lt;a href="https://docs.readyset.io/concepts/overview"&gt;partial materialization&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can think of partial materialization as a demand-driven cache-filling mechanism. With it, only a subset of the query results are stored in memory based on common input parameters to the query. For example, if a query is parameterized on user IDs, then ReadySet would only cache the results of that query for the active subset of users, since they are the ones issuing requests.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This allows us to cache relevant data while also reducing memory overhead. The goal is to minimize latency and database load by caching data showing repeated access patterns over time. By focusing on the most frequently accessed data, you can significantly improve the performance of your database and applications, ensuring that resources are allocated efficiently and that the cache serves its purpose effectively. &lt;/p&gt;

&lt;h3&gt;
  
  
  Query Analysis and Profiling
&lt;/h3&gt;

&lt;p&gt;Query analysis and profiling are critical aspects of understanding and optimizing database performance. By using query analysis tools and techniques like the &lt;code&gt;EXPLAIN&lt;/code&gt; command and examining query execution plans, engineers can gain insights into how queries are executed, which are resource-intensive, and how they can be optimized. This information is invaluable in making informed decisions about caching strategies.&lt;/p&gt;

&lt;p&gt;An execution plan shows how the database engine executes a query, including steps like scans, joins, sorts, and aggregations. The &lt;code&gt;EXPLAIN&lt;/code&gt; command in SQL provides the execution plan for a query. It reveals the database's operations and their CPU and I/O cost.&lt;/p&gt;

&lt;p&gt;To use &lt;code&gt;EXPLAIN&lt;/code&gt; with your query, just prepend your SQL query with &lt;code&gt;EXPLAIN&lt;/code&gt;. For instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM user_logs WHERE user_id = 1234;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the execution plan without actually running the query. You are looking for operations like sequential scans, index scans, sorts, and joins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Seq Scan on user_logs  (cost=0.00..155.00 rows=5000 width=132)
  Filter: (user_id = 1234)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this case, we have a sequential scan, indicating the query is scanning the entire &lt;code&gt;user_logs&lt;/code&gt; table. This can be inefficient for large tables. It also shows a filter is applied to the &lt;code&gt;user_id&lt;/code&gt;, which is inefficient if this operation is frequent or the table is large (you should use indexes instead).&lt;/p&gt;

&lt;p&gt;The estimated cost and number of rows affected help assess the query's efficiency. Cost is a unitless measurement used by Postgres to estimate the relative expense of executing the query. It combines I/O, CPU, and other factors to help compare the efficiency of different query plans. &lt;/p&gt;

&lt;p&gt;The lower the cost, the more efficient the query is expected to be. Queries that run frequently and have high costs are prime candidates for caching. Queries that involve significant data retrieval and minimal updates are ideal for caching.&lt;/p&gt;
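
&lt;p&gt;One way to act on this is to rank queries by their total burden, calls multiplied by average cost, and review the top of the list as caching candidates. A sketch in Python; the stats rows are hypothetical stand-ins for data you would pull from &lt;code&gt;pg_stat_statements&lt;/code&gt; and &lt;code&gt;EXPLAIN&lt;/code&gt;:&lt;/p&gt;

```python
def rank_cache_candidates(stats):
    """Rank queries by total cost burden (calls * mean cost), highest first."""
    return sorted(stats, key=lambda s: s["calls"] * s["mean_cost"], reverse=True)

# Hypothetical per-query statistics
stats = [
    {"query": "SELECT * FROM products WHERE category = $1", "calls": 10_000, "mean_cost": 155.0},
    {"query": "SELECT * FROM settings WHERE id = $1", "calls": 50, "mean_cost": 20.0},
]
top = rank_cache_candidates(stats)[0]
print(top["query"])  # the products query dominates the workload
```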

&lt;p&gt;Effective query analysis and profiling allow engineers to identify performance bottlenecks and optimize query execution. This is a crucial step in determining which data or queries will benefit most from caching. Regularly profiling queries, especially resource-intensive or frequently executed, can significantly enhance database performance and overall application efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Query Latencies
&lt;/h3&gt;

&lt;p&gt;Query latencies, the time taken to execute database queries, are a crucial metric in workload analysis. They provide insight into the performance of the database and are instrumental in identifying which queries might benefit most from caching. High latencies often indicate bottlenecks or inefficiencies that can be alleviated through strategic caching. For engineers, measuring, analyzing, and interpreting query latencies are key to optimizing database performance.&lt;/p&gt;

&lt;p&gt;We can set &lt;code&gt;log_min_duration_statement&lt;/code&gt; to log queries that exceed a specified execution time in milliseconds, either in &lt;code&gt;postgresql.conf&lt;/code&gt; or per session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set log_min_duration_statement=1000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows us to look for queries with consistently high execution times, as these are primary candidates for optimization and caching. You want to be able to analyze patterns in this data to determine if high latencies are isolated incidents or part of a pattern. Recurring high latencies during certain operations or times indicate systematic issues.&lt;/p&gt;

&lt;p&gt;For example, after enabling query logging in PostgreSQL, you might find:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  duration:1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This indicates that the query took 1050 milliseconds, which might be considered high for this operation. High latencies often point to performance bottlenecks. These could be due to unoptimized queries, lack of appropriate indexing, or heavy load on the database. Queries with high read latencies and low write operations are excellent candidates for caching. For instance, caching its results could significantly improve performance if the product query above is read-heavy and the data doesn't change often.&lt;/p&gt;
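
&lt;p&gt;Log lines like this are easy to mine with a small script. The sketch below extracts durations from Postgres-style log lines; the regex is a simplification of the real log format, for illustration only:&lt;/p&gt;

```python
import re

LOG_LINE = re.compile(r"duration:\s*(\d+(?:\.\d+)?) ms\s+statement: (.*)")

def parse_durations(lines):
    """Extract (duration_ms, statement) pairs from Postgres-style log lines."""
    out = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            out.append((float(m.group(1)), m.group(2)))
    return out

logs = [
    "LOG:  duration: 1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';",
    "LOG:  connection received",  # non-query lines are skipped
]
print(parse_durations(logs)[0])
```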

&lt;p&gt;Regularly monitoring and analyzing query latencies ensure the caching strategy aligns with current usage patterns and performance requirements.&lt;/p&gt;

&lt;p&gt;Readyset shows you the latencies for each of the queries you are running through it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32901cjnn9qh14fxjqip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F32901cjnn9qh14fxjqip.png" alt="Readyset showing the 50p, 90p, and 99p latencies for query response times" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Readyset displays the 50p, 90p, and 99p latencies for query response times. They are key metrics used in performance analysis to understand the distribution of latencies in a system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;50p (50th percentile) latency.&lt;/strong&gt; Also known as the median latency. This means that 50% of your queries or requests are faster than this time, and 50% are slower. It gives a good indication of the typical experience for a user or system interaction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;90p (90th percentile) latency.&lt;/strong&gt; This indicates that 90% of your queries or requests are faster than this time, and 10% are slower. This metric helps in understanding the experience of most of your users, excluding outliers that might be unusually slow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;99p (99th percentile) latency.&lt;/strong&gt; This shows that 99% of your queries or requests are faster than this time, and 1% are slower. It is useful for identifying the long tail of slow requests, which can be critical for understanding and improving users' experience with the worst performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
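
&lt;p&gt;If your tooling doesn’t report percentiles, you can compute them from raw latency samples yourself. A nearest-rank sketch in Python (the sample values are made up):&lt;/p&gt;

```python
import math

def percentile(latencies, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ranked = sorted(latencies)
    k = math.ceil(p / 100 * len(ranked))
    return ranked[max(k - 1, 0)]

# Hypothetical latencies (ms) for one query
samples = [12, 15, 11, 14, 250, 13, 16, 12, 14, 900]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(samples, p)} ms")  # p50: 14 ms, p90: 250 ms, p99: 900 ms
```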

&lt;p&gt;You can also graph this data for specific queries:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsvucick9cx6ggl5u2ad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsvucick9cx6ggl5u2ad.png" alt="Readyset displaying graph of query performance" width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can tell you which queries are performing poorly and whether this performance changes over time or has any pattern. Here, you can find the longest latency queries and immediately cache them with Readyset.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Example Caching Workflow With Readyset
&lt;/h2&gt;

&lt;p&gt;Imagine an e-commerce platform with a Postgres database facing performance issues. The platform has experienced high latencies and low throughput, particularly when accessing product information and user profiles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Analyzing Read/Write Ratios&lt;/strong&gt;&lt;br&gt;
We start by analyzing the read/write ratio to understand the nature of our database workload. First, we enable the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension in Postgres to track query statistics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then query &lt;code&gt;pg_stat_statements&lt;/code&gt; to get insights into our read and write operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT query, calls, total_time, rows
FROM pg_stat_statements
WHERE query LIKE 'SELECT %' OR query LIKE 'INSERT %' OR query LIKE 'UPDATE %' OR query LIKE 'DELETE %';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After analyzing the data, we find a high read/write ratio for product information queries (e.g., 15:1), indicating these are read-heavy and good candidates for caching. Conversely, user profile updates show a lower read/write ratio (e.g., 2:1), suggesting frequent updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Identifying Temporal Locality and Hotspots&lt;/strong&gt;&lt;br&gt;
We use database monitoring tools to track access patterns and identify hotspots. In the &lt;code&gt;postgresql.conf&lt;/code&gt;, we set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;logging_collector = on
log_directory = 'pg_logs'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_statement = 'all'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also set a threshold so that queries slower than one second are logged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;set log_min_duration_statement=1000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use pgBadger to analyze the logs. The report shows that queries related to product categories like 'Electronics' are accessed frequently, indicating a hotspot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Query Analysis and Profiling&lt;/strong&gt;&lt;br&gt;
We use the &lt;code&gt;EXPLAIN&lt;/code&gt; command to analyze query plans.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The analysis reveals inefficient sequential scans, suggesting the need for better indexing and potential caching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Addressing Query Latencies&lt;/strong&gt;&lt;br&gt;
We observe the query latencies in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG:  duration: 1050 ms  statement: SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This high latency is a clear indicator of a performance bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Implementing Caching&lt;/strong&gt;&lt;br&gt;
Based on our analysis, we decide to cache the frequently accessed product information.&lt;/p&gt;

&lt;p&gt;We can do this through Readyset. Readyset requires no changes to our application code. All we have to do is connect Readyset to our Postgres database and then use our Readyset connection string in our application instead of the credentials for the primary Postgres database. &lt;/p&gt;

&lt;p&gt;Once this is set up, all the queries from your application to your database will initially be proxied through Readyset. Then you can start caching. In this case, we want to cache our electronics product information. To do so, we just prepend &lt;code&gt;CREATE CACHE FROM&lt;/code&gt; to our query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE CACHE FROM SELECT * FROM products WHERE category = 'Electronics';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The results from this query will now be served from Readyset with sub-millisecond latencies. As Readyset monitors the replication stream from the primary database, looking for changes to the data, the cache will be automatically updated whenever the underlying data is updated in the primary database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Monitoring and Continuous Improvement&lt;/strong&gt;&lt;br&gt;
We continuously monitor query performance using Readyset, which shows us the latencies for each cached query. By analyzing the 50th, 90th, and 99th percentile latencies, we adjust our caching strategies to ensure optimal performance.&lt;/p&gt;

&lt;p&gt;By following this approach, we significantly improved the e-commerce platform's performance. The product information queries' latency dropped, and the overall system throughput increased. This real-world example demonstrates the impact of strategic caching in resolving performance bottlenecks in a database-driven application.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding Data Characteristics
&lt;/h2&gt;

&lt;p&gt;The data itself will influence caching strategies. Different data types have unique characteristics that affect how they should be cached. Data size, consistency requirements, and business needs are critical in determining the most effective caching approach. Understanding these characteristics is essential to ensure that the caching strategy aligns with the nature of the data being handled.&lt;/p&gt;
&lt;h3&gt;
  
  
  Object Size
&lt;/h3&gt;

&lt;p&gt;The size of objects, including individual data items and the results of database queries, is a crucial factor to consider. Object size directly impacts cache storage efficiency, access speed, and overall system performance.&lt;/p&gt;

&lt;p&gt;Object Size here could mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data item size. This refers to the size of individual data items, like rows in a database. Large data items can consume significant cache space, potentially reducing the overall effectiveness of the cache.&lt;/li&gt;
&lt;li&gt;Query result size. The size of data returned by database queries. Some queries might return large datasets, which, when cached, can take up substantial space.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many databases offer functions to measure the size of rows or tables. In Postgres, you can use &lt;code&gt;pg_column_size&lt;/code&gt; to find the size of a value (pass it a whole row to get the row’s size) or &lt;code&gt;pg_total_relation_size&lt;/code&gt; for the size of a table. E.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT pg_column_size(*) FROM user_data WHERE user_id = 123;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Suppose you have a table &lt;code&gt;user_data&lt;/code&gt; in a Postgres database, and you frequently query user profiles. Analyzing the size of these profiles helps determine if they should be cached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT AVG(pg_column_size(*)) FROM user_data;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the average size is relatively small, caching individual profiles might be efficient. However, if the size is large, consider caching subsets of data or using a different strategy, like partial caching.&lt;/p&gt;

&lt;p&gt;Be cautious when caching large objects, as they can lead to rapid cache eviction of other items. This can diminish the effectiveness of the cache. If objects are too large to be effectively held in memory, disk-based caching might be more appropriate, though it's slower.&lt;/p&gt;

&lt;p&gt;Object size factors into your cache configuration. Consider using eviction policies like Least Recently Used (LRU) for larger objects to maintain cache efficiency.&lt;/p&gt;
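
&lt;p&gt;To make the LRU policy concrete, here is a minimal sketch in Python. Real caches (e.g. Redis with an LRU eviction policy) implement this for you; the sketch only shows the mechanism:&lt;/p&gt;

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # touch "a" so "b" becomes least recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"), cache.get("a"))  # None 1
```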

&lt;h3&gt;
  
  
  Consistency Requirements
&lt;/h3&gt;

&lt;p&gt;Consistency here refers to how up-to-date and synchronized the cached data is with the underlying data source. Understanding whether your application needs strict or eventual consistency will guide you in choosing the right caching mechanisms and policies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strict Consistency&lt;/strong&gt;&lt;br&gt;
Strict consistency guarantees that all transactions are executed sequentially, and any read reflects the latest write. It provides the strongest consistency guarantee but can be expensive and impact performance. It is ideal for mission-critical applications where data integrity is paramount. Financial applications, where transaction integrity is crucial, typically require strict consistency. An account balance, for example, must always reflect the most recent transactions.&lt;/p&gt;

&lt;p&gt;Two types of caching strategies for strict consistency are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-through cache.&lt;/strong&gt; This strategy involves writing data to both the cache and the database simultaneously. It ensures that the cache always has the most up-to-date data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache invalidation.&lt;/strong&gt; Another approach is to invalidate the relevant cache entries immediately upon any data modification in the database.&lt;/li&gt;
&lt;/ul&gt;
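
&lt;p&gt;A write-through cache fits in a few lines. The dict standing in for the database below is a hypothetical stand-in; in practice the write would be a real database call performed as part of the same operation:&lt;/p&gt;

```python
class WriteThroughCache:
    """Write-through: every write goes to the cache and the backing store together."""

    def __init__(self, store):
        self.store = store  # stands in for the database
        self.cache = {}

    def write(self, key, value):
        self.store[key] = value  # write to the "database"...
        self.cache[key] = value  # ...and to the cache, as one operation

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # the cache always holds the latest write
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

db = {}
c = WriteThroughCache(db)
c.write("balance:42", 100)
print(c.read("balance:42"), db["balance:42"])  # 100 100
```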

&lt;p&gt;Strict consistency ensures data integrity but can lead to higher latency and increased load on the database due to frequent cache updates or invalidations. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eventual Consistency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With eventual consistency, writes are eventually propagated to all replicas, but there is no guarantee of how long it takes. Reads may reflect older writes or incomplete data until the replicas are synchronized. Eventual consistency offers high availability and scalability but can lead to inconsistencies for a short period.  Social media platforms, where seeing the most up-to-date data is not always critical (e.g., a user's number of followers), can use eventual consistency.&lt;/p&gt;

&lt;p&gt;Possible caching strategies for eventual consistency are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-To-Live (TTL):&lt;/strong&gt; Cached data is given a TTL, after which it is either refreshed from the database or evicted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy Loading:&lt;/strong&gt; Data is updated in the cache only when requested, leading to potential staleness but reduced load on the database.&lt;/li&gt;
&lt;/ul&gt;
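
&lt;p&gt;The two strategies combine naturally: entries are loaded lazily on first access and served from memory until their TTL expires. A minimal Python sketch; the &lt;code&gt;fake_db&lt;/code&gt; loader and the injectable clock are test scaffolding, not a real API:&lt;/p&gt;

```python
import time

class TTLCache:
    """Lazy-loading cache with a per-entry TTL."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader  # fetches fresh data, e.g. a database query
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry is testable
        self.entries = {}     # key -> (value, expires_at)

    def get(self, key):
        entry = self.entries.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]  # fresh hit: no database work
        value = self.loader(key)  # miss or stale: lazy reload
        self.entries[key] = (value, self.clock() + self.ttl)
        return value

loads = []
def fake_db(key):
    loads.append(key)
    return key.upper()

now = [0.0]
cache = TTLCache(fake_db, ttl_seconds=60, clock=lambda: now[0])
cache.get("user:1")  # miss: loaded from the "database"
cache.get("user:1")  # hit: served from memory
now[0] = 61.0        # advance past the TTL
cache.get("user:1")  # stale: reloaded
print(loads)         # ['user:1', 'user:1']
```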

&lt;p&gt;Eventual consistency improves read performance and reduces database load, but it can serve stale data.&lt;/p&gt;

&lt;p&gt;ReadySet is &lt;a href="https://docs.readyset.io/concepts/overview"&gt;eventually consistent&lt;/a&gt;: each new write must propagate through the dataflow graph before the impacted query results reflect it.&lt;/p&gt;

&lt;p&gt;The choice between strict and eventual consistency in caching strategies hinges on the application's specific needs. Understanding these requirements is crucial for engineers to design a caching system that balances data integrity with performance and efficiency. By carefully evaluating the nature of the data and its usage patterns, a suitable caching approach can be devised that optimally serves the application's needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Criticality
&lt;/h3&gt;

&lt;p&gt;Finally, there is a more qualitative characteristic: business criticality. It involves assessing which data is crucial for the functionality and performance of your application and should, therefore, be prioritized in caching strategies. This assessment often requires balancing the need for speed against the tolerance for staleness.&lt;/p&gt;

&lt;p&gt;This is mostly about understanding what data is important. Data that significantly impacts the user experience, such as personalized content, frequently accessed user profiles, or dashboard metrics, is often critical for caching. Data that is vital for daily business operations, like transaction records in a financial application or inventory levels in a retail system, should be prioritized for caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Examples of Data Types and Their Impact on Caching
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;E-Commerce Platforms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product Catalog: Frequently accessed but infrequently updated, making it a good candidate for caching.&lt;/li&gt;
&lt;li&gt;User Shopping Carts: High business criticality but requires strict consistency due to the dynamic nature of the content.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Content Delivery Networks (CDNs):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Media Content: Images and videos are crucial for user experience, making them important to cache. They are typically static, allowing for longer TTL values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Financial Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Account Balances: Require real-time accuracy, demanding a strict consistency approach in caching.&lt;/li&gt;
&lt;li&gt;Historical Transaction Data: Less critical for immediate operations, can be cached with eventual consistency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within all of these, you are trying to balance speed and staleness. Data that contributes to faster load times and smoother user interactions should be cached, especially if it's accessed frequently. Then, you can think about staleness tolerance by determining how tolerant the business process or user experience is to stale data. For example, slightly outdated product recommendations may be acceptable, but outdated pricing information is not.&lt;/p&gt;

&lt;p&gt;This is ultimately a trade-off: speed versus staleness, and cost versus benefit. You must weigh the cost of caching (in resources and management overhead) against its benefits to business operations and user satisfaction. You also have to develop a cache maintenance strategy that preserves data integrity without compromising performance. It requires a deep understanding of both the technical aspects of caching and the business or user perspective on data importance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching Done Right
&lt;/h2&gt;

&lt;p&gt;It takes a lot to find the right data to cache. But it is worth it. The effort invested in identifying what to cache, understanding the nature of your data, and aligning your caching strategy with business needs pays off in multiple ways. When done right, caching can significantly improve application performance, enhance user experience, reduce database load, and ultimately contribute to the efficiency and scalability of your systems.&lt;/p&gt;

&lt;p&gt;At Readyset, we’re making caching easier. Once you’ve identified the data you need to cache (with Readyset’s help, if needed), Readyset will start caching it automatically, giving you sub-millisecond query responses with no extra logic in your application. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloudv2-prod.us.auth0.com/u/signup?state=hKFo2SBWY213dWRmRFd0Y0pQcTkzTWFKQXVEbVk0TXpNYVgxQqFur3VuaXZlcnNhbC1sb2dpbqN0aWTZIGlqQ0ZkZ3FRQXJMaFpBTUZYdTBubVRyNTJ3b19SVDZio2NpZNkgZUhCb0ZhOHZDbDFHQjhYWGxCckJiMExpODNiMnROOHE&amp;amp;utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Try Readyset Cloud&lt;/a&gt; or &lt;a href="https://readysetcommunity.slack.com/join/shared_invite/zt-2272gtiz4-0024xeRJUPGWlRETQrGkFw?&amp;amp;utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;reach out to us&lt;/a&gt; if you have any questions.&lt;/p&gt;

</description>
      <category>caching</category>
      <category>database</category>
      <category>postgresql</category>
      <category>sql</category>
    </item>
    <item>
      <title>Stateful Property Testing in Rust</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Fri, 26 Jan 2024 16:55:34 +0000</pubDate>
      <link>https://dev.to/readysettech/stateful-property-testing-in-rust-pnn</link>
      <guid>https://dev.to/readysettech/stateful-property-testing-in-rust-pnn</guid>
      <description>&lt;p&gt;How ReadySet is using stateful property testing to to find subtle bugs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbxfehsa6cc5skx5h5e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbxfehsa6cc5skx5h5e.png" alt="Image description" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most developers have written unit tests: you feed your code some inputs and verify the outputs. Property tests automate this process. Instead of manually creating inputs, your test generates them for you and validates the input/output pairs against a set of properties you define. These can be tricky to write since you have to develop good generation strategies and find general properties to test (rather than just hardcoding individual test case results), but they can also help find edge cases you might not have thought to write unit tests for.&lt;/p&gt;

&lt;p&gt;But what if you’re writing property tests for a system with internal state? Testing one input at a time may not be enough anymore – you might need to run whole sequences of steps to test the way the system changes state over time.&lt;/p&gt;

&lt;p&gt;This is where stateful property testing comes in. We’ve had great successes with this technique at ReadySet and we’re excited to share it with you. To that end, we’ve written and released a general-purpose OSS library, called &lt;a href="https://crates.io/crates/proptest-stateful?ref=blog.readyset.io"&gt;proptest-stateful&lt;/a&gt;, that helps you reap the same benefits for your own projects. But before we talk more about our own experiences, let’s start by explaining the general concepts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is stateful property testing?
&lt;/h2&gt;

&lt;p&gt;Stateful property tests generate a &lt;em&gt;sequence&lt;/em&gt; of steps to run and then execute them one at a time, checking postconditions for each operation along the way. If a test fails, the test framework will try to remove individual steps in the test sequence to find a minimal failing case. This process is called “shrinking” and is helpful for making test failures easy to interpret. Randomly generated sequences of steps can be very long and complex, so shrinking helps separate out what actually &lt;em&gt;caused&lt;/em&gt; the bug.&lt;/p&gt;

&lt;p&gt;If you’ve written property tests before, you might be wondering what’s so difficult about this. Can’t you just write code to generate a step, then generate a random sequence of steps and run each one in turn?&lt;/p&gt;

&lt;p&gt;There are a couple of reasons this approach falls short: later steps in the sequence may depend on the internal state generated by earlier steps, and when shrinking the generated steps to a minimal failing case, you need to be careful not to accidentally create test cases that fail for the wrong reasons.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Use Case: Counters
&lt;/h3&gt;

&lt;p&gt;Imagine you’re testing a simple library that maintains an unsigned integer counter. There are two API operations: increment and decrement. This is a contrived example, and is simplified to help illustrate the testing techniques, but in essence, this is a database.&lt;/p&gt;

&lt;p&gt;It’s a very small memory-only database with a very strict schema, but we’re still maintaining a set of data and updating it in response to client requests, just like any other database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;struct Counter { count: usize }

impl Counter {
    fn new(count: usize) -&amp;gt; Self { Counter { count } }
    fn inc(&amp;amp;mut self) { self.count += 1; }
    fn dec(&amp;amp;mut self) { self.count -= 1; }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you try to decrement the counter below 0, it will underflow and crash, which is a known limitation for this library. If you try to test this use case by generating a random sequence of increment/decrement operations, you will quickly run into this problem. As such, for each operation you generate, your generation code will need to consider whether it expects the counter value to be zero and if it does, make sure not to generate a decrement operation.&lt;/p&gt;
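&lt;p&gt;Concretely, triggering the crash takes only two lines (in a debug build, where Rust's overflow checks panic rather than wrap):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let mut counter = Counter::new(0);
counter.dec(); // panics: attempt to subtract with overflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;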

&lt;p&gt;We can define a simple model to keep track of the expected state like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;struct TestState { model_count: usize }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can keep the model count updated via a &lt;code&gt;next_state&lt;/code&gt; callback that gets run after generating each operation, and then use the model state to decide which operations we can safely generate next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn next_state(&amp;amp;mut self, op: &amp;amp;Self::Operation) {
  match op {
    CounterOp::Inc =&amp;gt; self.model_count += 1,
    CounterOp::Dec =&amp;gt; self.model_count -= 1,
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn op_generators(&amp;amp;self) -&amp;gt; Vec&amp;lt;Self::OperationStrategy&amp;gt; {
  if self.model_count &amp;gt; 0 {
    vec![Just(CounterOp::Dec).boxed(), Just(CounterOp::Inc).boxed()]
  } else {
    vec![Just(CounterOp::Inc).boxed()]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, when your test finds a bug and tries to remove steps to find a minimal failing case, a simplified case might still trigger underflow even if the original test case did not. If the original case was “start at 0, increment, decrement, increment” then you can’t remove the first increment unless you also remove the decrement! To avoid this, we must also define preconditions via a separate callback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;preconditions_met(&amp;amp;self, op: &amp;amp;Self::Operation) -&amp;gt; bool {
    match op {
        CounterOp::Inc =&amp;gt; true,
        CounterOp::Dec =&amp;gt; self.model_count &amp;gt; 0,
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This logic may seem redundant with the code in &lt;code&gt;op_generators&lt;/code&gt;, and indeed, any test case generated by &lt;code&gt;op_generators&lt;/code&gt; that doesn’t pass the precondition test will be filtered out. However, without the logic in &lt;code&gt;op_generators&lt;/code&gt; to avoid generating invalid test cases, it can be slow and inefficient to randomly stumble across cases that pass all the precondition checks. If you’re familiar with property testing in general, the same caveats apply here as with test-case filtering.&lt;/p&gt;

&lt;p&gt;The upshot here is that the framework can reuse the same model during runtime, so you can also easily check the real-world counter value after each operation to verify that it matches the value you expect. For a deeper dive into this example, check out the &lt;a href="https://github.com/readysettech/proptest-stateful/?ref=blog.readyset.io"&gt;README&lt;/a&gt;, or jump straight into a &lt;a href="https://github.com/readysettech/proptest-stateful/blob/main/tests/counter.rs?ref=blog.readyset.io"&gt;full implementation&lt;/a&gt;.&lt;/p&gt;
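&lt;p&gt;Tying it together, the runtime side of the test applies each generated operation to the real counter and then checks it against the model as a postcondition. A sketch of that step might look like this (the callback name and signature are illustrative, not necessarily the exact proptest-stateful API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn run_op(&amp;amp;self, op: &amp;amp;Self::Operation, counter: &amp;amp;mut Counter) {
    // Apply the generated operation to the real system under test...
    match op {
        CounterOp::Inc =&amp;gt; counter.inc(),
        CounterOp::Dec =&amp;gt; counter.dec(),
    }
    // ...then verify the real value matches the model's expectation.
    assert_eq!(counter.count, self.model_count);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;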

&lt;p&gt;Now let’s take a look at some real-world bugs we’ve found using these techniques, many of which would’ve been nearly impossible to find with more conventional tests.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug Spotlight
&lt;/h2&gt;

&lt;p&gt;We have a stateful property test suite that tests replication of data and DDL commands from a primary database into ReadySet’s caching system. Because ReadySet automatically keeps cached query results up-to-date in response to new data and schema changes, it’s critical to make sure that there are no corner cases where the cached data might lose synchronization with the database.&lt;/p&gt;

&lt;p&gt;To search for edge cases, this test suite generates sequences of DDL and DML statements and runs them against both Postgres and against ReadySet, checking after each step that ReadySet’s cached data matches that of the database without ReadySet in the mix. For this test, the model state is used to make sure that the DML we generate lines up with the tables we’ve created and their corresponding schemas.&lt;/p&gt;

&lt;p&gt;You can take a look at the &lt;a href="https://github.com/readysettech/readyset/blob/main/replicators/tests/ddl_vertical.rs?ref=blog.readyset.io"&gt;source code&lt;/a&gt; – it’s already found a plethora of bugs since its inception earlier this year, including some really subtle edge cases that would’ve been nightmares to track down in production. Let’s dive into some examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #1: Starting Small
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE t(i int);
ALTER TABLE t RENAME COLUMN i TO j;
CREATE VIEW v AS SELECT * FROM t;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a minimal failing case found by our test. It creates a table with a single column, renames the column, then creates a view referencing the table.&lt;/p&gt;

&lt;p&gt;We should be able to run queries against the ‘v’ view in ReadySet, but trying to do so resulted in a cryptic error about internal invariant violations. Something about this specific combination of steps was triggering a bug.&lt;/p&gt;

&lt;p&gt;The Fix: It turned out that there were certain types of column changes we weren’t replicating due to an overly broad filter clause in one of the internal queries ReadySet performs against PostgreSQL. Luckily, &lt;a href="https://github.com/readysettech/readyset/commit/be61d40127a2b6f65763bfd9e3392aecb0863b4e?ref=blog.readyset.io"&gt;the fix&lt;/a&gt; turned out to be a straightforward one-line change.&lt;/p&gt;

&lt;p&gt;Creating a view that selected from a table that had previously had a column renamed wasn’t something we’d thought to specifically test with any unit tests or integration suites. With millions of possible edge cases like this, enumerating every case via a unit test is impossible.&lt;/p&gt;

&lt;p&gt;While stateful property testing quickly found a failure, the initial test case still had thirteen different steps. Many of the steps were unrelated and obfuscated the root cause of the bug. Shrinking saved us hours of painful debugging–the minimal case had only three steps, making it clear that the problem was with how we were replicating &lt;code&gt;ALTER COLUMN&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;Okay, simple enough, right? Next up we have this gem:&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #2: Ramping Up
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE cats (id INT);
CREATE TABLE dogs (id INT);
INSERT INTO cats VALUES (1);
ALTER TABLE dogs ADD COLUMN bark VARCHAR;
DELETE FROM cats WHERE id = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we created two tables named “cats” and “dogs”. We inserted a value into “cats”, added a column to “dogs”, and then deleted the value we inserted into “cats”.&lt;/p&gt;

&lt;p&gt;When run against ReadySet in quick succession, the row (1) would still show up in the ‘cats’ table forever. The row should have been deleted, but somehow the &lt;code&gt;DELETE&lt;/code&gt; was getting silently dropped.&lt;/p&gt;

&lt;p&gt;The oddest part? We’re not doing anything unusual to the cats table at all! We just create the table, insert a row, and delete a row.&lt;/p&gt;

&lt;p&gt;This was so odd that at first it seemed like there was something wrong with the test. It turned out this bug was easy to reproduce though, and the &lt;code&gt;DELETE&lt;/code&gt; was only lost if it happened to coincide with a separate &lt;code&gt;ALTER TABLE&lt;/code&gt; statement on a completely unrelated table!&lt;/p&gt;

&lt;p&gt;There weren’t any server-side errors that showed up alongside this bug, but because the test suite actively checks table contents as a postcondition of each test step, we were still able to detect that the expected state of the system failed to match up with the actual results. This would’ve been extremely difficult to diagnose in production since this bug left behind no evidence of when or why it was triggered, so generating a minimal set of steps to reproduce this was an incredible win for us.&lt;/p&gt;

&lt;p&gt;The Fix: This edge case was being triggered by the ‘dogs’ table being altered before any rows were written to it. Because no data had been replicated to ‘dogs’, ReadySet had not yet created some internal metadata for the table. This missing table metadata fooled ReadySet into thinking it was in the midst of a full re-snapshot of all tables. ReadySet drops outdated replication events while re-snapshotting, as we are already recopying the entire table from scratch, and the deletion fell into that window. These dropped events would normally be redundant, but since ‘cats’ wasn’t actually being re-snapshotted, the events were simply lost, and the cache fell out of sync with the database.&lt;/p&gt;

&lt;p&gt;Resolving this bug was trickier than bug #1, but we still were able to develop fixes pretty quickly (check &lt;a href="https://github.com/readysettech/readyset/commit/2cbb7103fbba1854b3a409df325a8f5e45991502?ref=blog.readyset.io"&gt;here&lt;/a&gt; and &lt;a href="https://github.com/readysettech/readyset/commit/fb1b1c16cd510cd0650cbb8333cb9f3e2597e87e?ref=blog.readyset.io"&gt;here&lt;/a&gt; for more!).&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug #3: Best For Last
&lt;/h3&gt;

&lt;p&gt;In this case we &lt;em&gt;already knew&lt;/em&gt; a bug existed, but we didn’t know what caused it. ReadySet was crashing when a specific user tried to run a series of database migrations, but it wasn’t clear why the crash was occurring.&lt;/p&gt;

&lt;p&gt;This is actually a pretty common occurrence: your user has a verifiable bug, but finding a &lt;em&gt;minimal&lt;/em&gt; repro is the hard part. Fortunately, that’s just what stateful property testing does.&lt;/p&gt;

&lt;p&gt;We had just added &lt;code&gt;ENUM&lt;/code&gt; support for a user and we did know this bug had something to do with &lt;code&gt;ENUM&lt;/code&gt; types, so we added support to the test suite for creating and altering enum types. In short order, this minimal failing case showed right up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TYPE et AS ENUM ('a');
CREATE TABLE t (e et);
DROP TABLE t;
ALTER TYPE et ADD VALUE 'b';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we create a custom enum type, use it in a table, then drop that table and alter the type.&lt;/p&gt;

&lt;p&gt;Playing around with this, it became immediately clear why this was a tricky bug to reproduce. It only occurs if you alter a type after dropping a table that had used that type in its schema. The issue is that ReadySet maintains a list of associated caches for each custom type, and we had simply neglected to update that list when dropping a table. When we then altered the type, ReadySet would try to invalidate a cache that didn’t exist and crash (see &lt;a href="https://github.com/readysettech/readyset/commit/abacf64c79dc5cb32a42f7d72e09d5967c3472ac?ref=blog.readyset.io"&gt;here&lt;/a&gt; for the fix).&lt;/p&gt;

&lt;p&gt;Hard to find by hand, but no sweat for a stateful property test!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you’ve enjoyed this post, and I hope you’re as excited as we are about making property tests stateful.&lt;/p&gt;

&lt;p&gt;As you can see, this is a great technique for database systems, but it can be useful for any other kind of stateful application as well. Server software with internal state can be an obvious fit, but there’s no need to limit yourself to backend server systems like this. People have used these techniques for everything from game engines to automotive software, so the sky’s the limit!&lt;/p&gt;

&lt;p&gt;We want to share this technique with the world, so we’ve released an open source &lt;a href="https://crates.io/crates/proptest-stateful?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;proptest-stateful crate on crates.io&lt;/a&gt; and &lt;a href="https://github.com/readysettech/proptest-stateful/?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;in our GitHub repo&lt;/a&gt;, which anyone can use to write their own stateful property tests in Rust. And, of course, if writing these kinds of tests yourself sounds exciting, don’t hesitate to take a look at our &lt;a href="https://readyset.io/about?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;careers page&lt;/a&gt;. Until next time!&lt;/p&gt;

</description>
      <category>rust</category>
      <category>database</category>
    </item>
    <item>
      <title>How Database Replication Works Under the Hood</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Mon, 22 Jan 2024 16:43:31 +0000</pubDate>
      <link>https://dev.to/readysettech/how-database-replication-works-under-the-hood-4deh</link>
      <guid>https://dev.to/readysettech/how-database-replication-works-under-the-hood-4deh</guid>
      <description>&lt;p&gt;Replicating data is a process you will encounter if you build a product. At the very least, you’ll want some kind of backup for your primary database, either through taking a snapshot or having a secondary “follower” database mirroring your primary data. You might also replicate data into a data warehouse for analysis, copy data to read-only replicas for load balancing, or replicate data while &lt;a href="https://knock.app/blog/zero-downtime-postgres-upgrades?ref=blog.readyset.io"&gt;performing&lt;/a&gt; &lt;a href="https://archive.ph/K5ZuJ?ref=blog.readyset.io"&gt;upgrades&lt;/a&gt; to your infrastructure.&lt;/p&gt;

&lt;p&gt;That’s to say, database replication is a ubiquitous and helpful tool in database administration. So how does it work? How do you get data from one database to another? &lt;/p&gt;

&lt;p&gt;It’s not quite as simple as just copying the data. If we were to do that, replication would become critically slow as the total amount of data increases. Instead, replication takes advantage of the built-in system for guaranteeing the atomicity and durability of your database commits–the write-ahead log (WAL).&lt;/p&gt;

&lt;p&gt;Here, we want to show you how the WAL works in databases and how it is used for replication.&lt;/p&gt;

&lt;h2&gt;
  
  
  The WAL is the Database
&lt;/h2&gt;

&lt;p&gt;In relational databases, we model data as tables like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4jiz3l84ori9879ddwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4jiz3l84ori9879ddwc.png" alt="Image of model data table" width="472" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tables have rows and columns, where each row represents a record, and each column represents a field within that record. This data-oriented view of databases makes sense for users, as we mostly care about the data in the database and how different fields relate.&lt;/p&gt;

&lt;p&gt;But that isn’t the only way to think about a database. If you are trying to build a database, it makes more sense to see it from an operational perspective, where the database is a time-ordered sequence of events or changes. In that scenario, you’d just have a log of every transaction or modification recorded sequentially. &lt;/p&gt;

&lt;p&gt;The table above becomes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa19apz6kfpunj6mjo4t2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa19apz6kfpunj6mjo4t2.png" alt="log of every transaction or modification recorded sequentially" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we want to add another value, it is just appended to the log:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qn3cgv95mjtuaavy1o2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qn3cgv95mjtuaavy1o2.png" alt="display of another value being appended to the log" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same goes for any other operations. If we update an entry, the UPDATE is added to the log, but we can also still see the initial value from the INSERT:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmmc3ps1t9v9rv82999u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwmmc3ps1t9v9rv82999u.png" alt="display of UPDATE added to the log" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same with DELETE:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rzxacwd3j1iqryrfq1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rzxacwd3j1iqryrfq1l.png" alt="display of DELETE added to the log" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the end, our table looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy57xhwalrka67vy93hih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy57xhwalrka67vy93hih.png" alt="Data table after the changes listed above are made" width="424" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data only represent the current state of the database. We don’t know how it got there. The logs provide a historical account of changes. &lt;/p&gt;

&lt;p&gt;This chronological ledger is pivotal for both recovery and replication purposes. Write-ahead logging is a fundamental principle in database systems. Before any change is committed to the database files on disk, it is first logged in the WAL. This approach serves two primary purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Atomicity:&lt;/strong&gt; In the event of a system crash or failure, the WAL is used to replay transactions, ensuring that all completed transactions are preserved, and all incomplete transactions are rolled back, thus maintaining the database’s consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durability:&lt;/strong&gt; The WAL ensures that a committed transaction is not lost. By writing changes to a log before applying them to the database, the WAL provides a fail-safe against data loss due to unexpected failures.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In essence, the WAL acts as the backbone of the database's state. The log is the first point of contact for any change, making it the most up-to-date record of the database's evolution.&lt;/p&gt;
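&lt;p&gt;The write-ahead principle itself is simple enough to sketch in a few lines of Rust (a toy illustration of the ordering, not how any real engine is implemented):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn commit(wal: &amp;amp;mut File, table: &amp;amp;mut HashMap&amp;lt;u64, String&amp;gt;, id: u64, name: String) -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    writeln!(wal, "INSERT {} {}", id, name)?; // 1. append the change to the log
    wal.sync_all()?;                          // 2. force the log record to disk
    table.insert(id, name);                   // 3. only then apply it to the table
    Ok(())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the process crashes between steps 2 and 3, the change is still in the log, so recovery can replay it; if it crashes before step 2, the change was never durable, and it is rolled back by simply not being replayed.&lt;/p&gt;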

&lt;p&gt;When it comes to database replication, logs are the source of truth. They provide a comprehensive and sequential record of all changes, making them ideal for replicating data across multiple systems. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Replication Process:&lt;/strong&gt; The primary database's log is continuously streamed to followers in log-based replication. Each change recorded in the log is replicated in the secondary systems. This ensures the replicas are an exact copy of the primary database, mirroring every change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Replication:&lt;/strong&gt; With logs being the first to record any change, they enable near real-time replication. This is critical for applications requiring high data availability and up-to-date information across all nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency and Reliability:&lt;/strong&gt; Using logs for replication ensures that the data remains consistent across all replicas. Since the log records every transaction in the order they occur, the replication process respects this order, maintaining transactional integrity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The log is more than a mere database recording mechanism; it is the authoritative ledger of all transactions and changes. Its role in replication is indispensable, particularly in systems like ReadySet, where maintaining sub-millisecond latencies for query caching is critical. By leveraging logs' detailed, sequential nature, such systems ensure real-time, consistent, and reliable replication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Logical Replication
&lt;/h2&gt;

&lt;p&gt;There are three main types of replication:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Physical Replication:&lt;/strong&gt; Involves byte-by-byte copying of the database, replicating the exact physical state of the primary database. This method is commonly used in streaming replication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot Replication:&lt;/strong&gt; Captures the state of a database at a particular point in time, akin to taking a "snapshot." This is less dynamic and is often used for less frequently changing data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logical Replication:&lt;/strong&gt; Focuses on replicating changes at a higher level–the level of database transactions. It allows for copying specific tables or rows and columns within tables, providing much more flexibility than physical replication.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Logical replication in database systems leverages the detailed information stored in database logs. In logical replication, instead of copying raw data, a replication agent reads the transaction logs and interprets the changes recorded in the logs. This interpretation converts the low-level format of the log (which is often storage and database-specific) into a high-level representation of the changes (like SQL statements or a similar format).&lt;/p&gt;

&lt;p&gt;After interpretation, the changes are queued. This queuing mechanism ensures that they are delivered to the target database in the correct order, maintaining transactional integrity and consistency. The final step involves applying these changes to the target database by executing the high-level change records (like SQL statements) on the target database.&lt;/p&gt;

&lt;p&gt;All this is achieved through a publish/subscribe model.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Publication:&lt;/strong&gt; In this model, the primary database defines a "publication," essentially a set of database changes (inserts, updates, deletes) that it is willing to publish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscription:&lt;/strong&gt; The secondary systems, or subscribers, subscribe to these publications. They receive only the changes that are part of the subscribed publications, allowing for selective replication.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The publish/subscribe model in logical replication enables efficient distribution of data changes, as each subscriber can choose what data it needs. Subscribers can then receive updates in near real-time, making this method suitable for systems that require up-to-date information across multiple nodes.&lt;/p&gt;
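&lt;p&gt;One prerequisite before trying this yourself: logical replication requires the primary's &lt;code&gt;wal_level&lt;/code&gt; setting to be &lt;code&gt;logical&lt;/code&gt;, so that the WAL carries enough information for logical decoding. You can check it like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SHOW wal_level;
-- If this returns 'replica', set wal_level = logical in postgresql.conf
-- (or via ALTER SYSTEM) and restart the server.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;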

&lt;p&gt;This is easy to set up. Let’s do so using Postgres. On two separate database clusters, set up two independent databases, one called “primary_db”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE DATABASE primary_db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The other is called “follower_db”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE DATABASE follower_db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both, create a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE replicated_table (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    value INT
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In primary_db, populate this table with some data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO replicated_table (name, value) VALUES ('Item1', 10);
INSERT INTO replicated_table (name, value) VALUES ('Item2', 20);
INSERT INTO replicated_table (name, value) VALUES ('Item3', 30);
INSERT INTO replicated_table (name, value) VALUES ('Item4', 40);
INSERT INTO replicated_table (name, value) VALUES ('Item5', 50);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you check the table in primary_db, you should see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do the same in follower_db, it should still be empty:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
(0 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the tables created, we can create the publication. On primary_db, create a publication for this table using the CREATE PUBLICATION command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION my_publication FOR TABLE replicated_table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then check to see if that was successfully created by checking in the pg_publication_tables table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';   

    pubname               | schemaname    |   tablename             
----------------+------------+-----------------
 my_publication   | public            | replicated_table 
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hop over to your follower_db and create a subscription to that publication using connection information from your primary database using CREATE SUBSCRIPTION:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE SUBSCRIPTION my_subscriber     
CONNECTION 'dbname=primary_db host=localhost port=5432'
PUBLICATION my_publication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Note: if you want to try this on a local instance of Postgres, create the replication slot manually by running “SELECT pg_create_logical_replication_slot('my_subscriber', 'pgoutput');” on primary_db, and then add “WITH (create_slot = false)” to the CREATE SUBSCRIPTION command above on follower_db. Otherwise, CREATE SUBSCRIPTION will hang.)&lt;/p&gt;
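&lt;p&gt;You can also confirm the subscription is up and running using Postgres’s built-in statistics views: &lt;code&gt;pg_stat_subscription&lt;/code&gt; on the subscriber side and &lt;code&gt;pg_stat_replication&lt;/code&gt; on the publisher side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- On follower_db: one row per active subscription worker
SELECT subname, received_lsn, last_msg_receipt_time FROM pg_stat_subscription;

-- On primary_db: one row per connected replication client
SELECT application_name, state, replay_lsn FROM pg_stat_replication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;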

&lt;p&gt;Now, if we check our table in follower_db again, we’ll see the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
(5 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we add a row to the table in primary_db:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO replicated_table (name, value) VALUES ('Item6', 60);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll immediately see the data in follower_db:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM replicated_table;
 id | name  | value 
----+-------+-------
  1 | Item1 |    10
  2 | Item2 |    20
  3 | Item3 |    30
  4 | Item4 |    40
  5 | Item5 |    50
  6 | Item6 |    60
(6 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our data is now being replicated. If we look into postgresql.log (this isn’t the write-ahead log; rather, it’s a log of what’s happening in our Postgres server), we can see what has happened:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;STATEMENT:  CREATE_REPLICATION_SLOT "pg_17529_sync_17519_7307265844192364778" LOGICAL pgoutput (SNAPSHOT 'use')
LOG:  starting logical decoding for slot "pg_17529_sync_17519_7307265844192364778"
DETAIL:  Streaming transactions committing after 0/A90190B0, reading WAL from 0/A9019078.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The STATEMENT shows that we’ve created a new logical replication slot named pg_17529_sync_17519_7307265844192364778. The replication slot is created with the pgoutput plugin, the built-in logical decoding output plugin provided by PostgreSQL, used for logical replication. The “SNAPSHOT 'use'” part indicates that this slot will use the snapshot associated with the transaction running CREATE_REPLICATION_SLOT. So, this has captured the state of the database at a particular point in time.&lt;/p&gt;

&lt;p&gt;The LOG and DETAIL indicate that logical decoding has started for the newly created replication slot. The message “Streaming transactions committing after 0/A90190B0, reading WAL from 0/A9019078” specifies the WAL location from which the replication is starting. &lt;/p&gt;

&lt;p&gt;This means it will start streaming changes committed after WAL location 0/A90190B0, and it's currently reading from WAL location 0/A9019078. In the WAL position format of older Postgres versions (&amp;lt;=9.2), the 0 is the WAL segment and A90190B0/A9019078 are byte offsets within that segment. &lt;a href="https://pgpedia.info/x/xlogrecptr.html?ref=blog.readyset.io"&gt;In newer versions&lt;/a&gt;, the position is a single 64-bit integer pointer rather than two 32-bit integers.&lt;/p&gt;
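&lt;p&gt;You can do arithmetic on these WAL positions directly in SQL using the built-in pg_lsn type; subtracting two LSNs gives the distance in bytes (a quick sketch using the two positions from the log above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT pg_lsn '0/A90190B0' - pg_lsn '0/A9019078' AS bytes_between;
 bytes_between
---------------
            56
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;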

&lt;h2&gt;
  
  
  How ReadySet Uses Logical Replication
&lt;/h2&gt;

&lt;p&gt;In most caching mechanisms, replication isn’t used. Instead, developers must &lt;a href="https://blog.readyset.io/dont-use-kv-stores/"&gt;write their caching logic in the application layer&lt;/a&gt; to transfer data from a database to a cache.&lt;/p&gt;

&lt;p&gt;ReadySet works differently by taking advantage of the mechanism described above. ReadySet registers itself as a consumer of a database’s replication stream, receiving the same information as a read replica: the log of changes made to the primary upstream database.&lt;/p&gt;

&lt;p&gt;When you initially register a database with ReadySet, we take a snapshot of that database. ReadySet applies every write from the replication log to this copy of the upstream database. Cache misses don't go to the upstream database. Instead, the result is computed using the &lt;a href="https://docs.readyset.io/concepts/streaming-dataflow?ref=blog.readyset.io"&gt;dataflow graph&lt;/a&gt;, and the result is cached. If any updated data in the replication stream is part of the dataflow graph for cached results, ReadySet will update the cache using this new data.&lt;/p&gt;

&lt;p&gt;In the context of ReadySet, logical replication is not just about copying data; it's about maintaining high efficiency and performance in query caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sub-millisecond Latencies:&lt;/strong&gt; Using logical replication, ReadySet can ensure that the cache is updated almost instantly with changes from the primary database, maintaining sub-millisecond latencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency with Primary Data:&lt;/strong&gt; The real-time nature of logical replication means that the cache is always consistent with the primary database, ensuring that the users get up-to-date and accurate data.&lt;/li&gt;
&lt;/ul&gt;
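&lt;p&gt;Once ReadySet is connected, caching a query is a single statement issued against ReadySet itself (syntax per ReadySet's documentation; the query and the $1 placeholder are illustrative, reusing the replicated_table example from earlier):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Run against ReadySet, not the upstream database
CREATE CACHE FROM SELECT id, name, value FROM replicated_table WHERE id = $1;

-- List the queries ReadySet is currently caching
SHOW CACHES;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;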

&lt;p&gt;Logical replication with ReadySet provides the flexibility, speed, and accuracy required for web-scale products. If you want to start with ReadySet, sign up for &lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;early access to ReadySet Cloud&lt;/a&gt; today!&lt;/p&gt;

</description>
      <category>database</category>
      <category>devops</category>
    </item>
    <item>
      <title>Medical Joyworks Improves Page Load Times by 500% With ReadySet</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Wed, 20 Dec 2023 18:18:49 +0000</pubDate>
      <link>https://dev.to/readysettech/medical-joyworks-improves-page-load-times-by-500-with-readyset-5hlo</link>
      <guid>https://dev.to/readysettech/medical-joyworks-improves-page-load-times-by-500-with-readyset-5hlo</guid>
      <description>&lt;h2&gt;
  
  
  About Medical Joyworks
&lt;/h2&gt;

&lt;p&gt;Medical Joyworks is a medical education company specializing in gamified learning, catering to a diverse audience, ranging from medical students to pharmaceutical companies. They elevate healthcare education through immersive case studies and user conversations, all seamlessly accessible via their app. Medical Joyworks also curates an active newsletter, reaching an audience of over 700,000 subscribers. &lt;/p&gt;

&lt;h2&gt;
  
  
  Medical Joyworks Wanted a Custom Solution
&lt;/h2&gt;

&lt;p&gt;With their subscriber list growing rapidly, Medical Joyworks turned to a custom solution for email delivery and analysis for several reasons: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To configure the delivery of custom newsletter content tailored to each subscriber’s specialty interest.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To avoid the escalating costs associated with conventional email services (popular platforms charge upwards of $4,000/month for 200,000 contacts, and their pricing scales up for enterprise usage). &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To gain the flexibility of switching between various email delivery providers such as Amazon SES and Postmark, depending on real-time performance. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, even with these advantages, the company encountered multiple challenges around tuning their database to perform well in this new context.  &lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenges of Database Performance Optimization
&lt;/h2&gt;

&lt;p&gt;The Medical Joyworks team was using Postgres and was running into slow page load times for their campaigns and contacts page, designed for administrators to analyze newsletter performance metrics. These delays were primarily attributed to:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The execution of multiple complex database queries to calculate metrics like open rates and link click rates. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple database requests to display campaign and open rate data for each user. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While Medical Joyworks could optimize the number of queries and introduce a different algorithm to reduce page load times, implementation would demand weeks of effort, straining the team’s bandwidth. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead, they had ReadySet up and running in 15 minutes, improving page load times by 500%. &lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Opting for an expedited solution, Medical Joyworks implemented ReadySet, swapping out a connection string in their Laravel app’s database configuration file and deploying in about 15 minutes. &lt;/p&gt;

&lt;p&gt;The result? An impressive 500% improvement in page load times, achieved by caching a few queries - a process that would have taken weeks to implement manually. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Analysis is easier with ReadySet. When we are looking for patterns with different variables/filtering options, the pages load much faster.” &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By introducing ReadySet into their stack, Medical Joyworks has found a solution for a faster and more efficient analysis of vital newsletter delivery data for their expanding user base of 700,000 subscribers. Whether it’s refining open rates to contacts within specific specialties or examining various other metrics, page loads are now consistently snappy due to their usage of ReadySet. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Explore how ReadySet can enhance your stack here&lt;/a&gt;. &lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>Finding Slow Queries in Postgres Using Datadog</title>
      <dc:creator>ReadySet</dc:creator>
      <pubDate>Thu, 14 Dec 2023 16:04:31 +0000</pubDate>
      <link>https://dev.to/readysettech/finding-slow-queries-in-postgres-using-datadog-4o45</link>
      <guid>https://dev.to/readysettech/finding-slow-queries-in-postgres-using-datadog-4o45</guid>
      <description>&lt;p&gt;You’re starting to notice lags in your application's performance. Page loads are slowing, and the user experience is being impacted. At the same time, your AWS bills are also starting to creep up.&lt;/p&gt;

&lt;p&gt;These can be the tell-tale signs of serious database performance issues: queries are taking too long to execute and using too many resources when they do. The answer most people reach for is caching queries to reduce the load on the database. This is exactly what we built ReadySet for.&lt;/p&gt;

&lt;p&gt;But which queries? Databases like Postgres have some &lt;a href="https://www.postgresql.org/docs/current/monitoring.html?ref=blog.readyset.io" rel="noopener noreferrer"&gt;performance monitoring capabilities&lt;/a&gt;, but these are clunky to use, and you’ll end up knee-deep in statistics configurations. The answer is to add an application performance monitor (APM) to your stack. APMs, like Datadog, make it easier to monitor your database performance and can give you insights into how your database and queries perform. &lt;/p&gt;

&lt;p&gt;Here, we’ll show how you can use &lt;a href="https://www.datadoghq.com/?ref=blog.readyset.io" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt; and query metrics to understand your queries better and prioritize candidate queries for caching. &lt;/p&gt;

&lt;h2&gt;
  
  
  Use Query Metrics to Identify High-Load/High-Call Queries
&lt;/h2&gt;

&lt;p&gt;The Datadog &lt;a href="https://docs.datadoghq.com/database_monitoring/query_metrics/?ref=blog.readyset.io" rel="noopener noreferrer"&gt;Query Metrics&lt;/a&gt; dashboard provides insights into the performance of normalized queries. You can visualize performance trends and filter and group queries to analyze how queries perform. The dashboard graphs key metrics like requests and average latency and gives metrics for timing, requests, and the amount of data pulled per normalized query.&lt;/p&gt;

&lt;p&gt;Let’s go through how these can be used to find good caching candidates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using AVG LATENCY and ROWS/QUERY to identify load
&lt;/h2&gt;

&lt;p&gt;High average latency can indicate that a query could be more efficient or that the database is struggling to execute it quickly, leading to performance bottlenecks. &lt;/p&gt;

&lt;p&gt;If we look at this Query Metrics dashboard, we can see that there are a few queries with high average latency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgia9yzoysycmz8atmgnb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgia9yzoysycmz8atmgnb.png" alt="Datadog query metrics dashboard, showing a few queries with high average latency"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The line graph for AVG LATENCY (top right) shows the average time it takes for queries to execute over a specific period. Spikes in this graph indicate periods where queries are taking longer to execute, which can indicate high-load conditions or performance issues within the database.&lt;/p&gt;

&lt;p&gt;The two main spikes in the short time frame of the graph (one hour) both relate to write operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DELETE FROM UserTeams WHERE User_id = ? AND Team_id = ?. &lt;/li&gt;
&lt;li&gt;DELETE FROM Sessions WHERE Id = ?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As delete operations, these can’t be cached. However, looking at the other queries in the list, we can see some SELECT queries with long average latencies. For instance, the “SELECT s. Id s…” query seems to have a long latency, so it could be a good candidate for caching.&lt;/p&gt;

&lt;p&gt;The cumulative time spent on this query (“TOTAL TIME”) and the proportion of total database time it consumes (“PERCENT TIME”) are also high: over three-quarters of database time is spent on this query. That makes it a dominant consumer of database time and a likely contributor to performance issues.&lt;/p&gt;

&lt;p&gt;A further pointer in this direction is the average number of rows processed per query execution (“ROWS/QUERY”). A high number of rows per query can suggest that the query is processing a large amount of data, which might be optimized for better performance.&lt;/p&gt;

&lt;p&gt;This should be a query investigated for caching. And if it can’t be cached, it should be optimized to reduce the load on the database.&lt;/p&gt;
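&lt;p&gt;If you want to confirm these numbers at the database level, Postgres's pg_stat_statements extension tracks the same per-query totals (assuming the extension is installed and preloaded; the *_exec_time column names are per Postgres 13+):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Top consumers of total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;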

&lt;h2&gt;
  
  
  Using REQUESTS to analyze query frequency
&lt;/h2&gt;

&lt;p&gt;Beyond the load of an individual query on the database, caching candidates can also come from analyzing query frequency.&lt;/p&gt;

&lt;p&gt;The frequency of queries is a critical aspect of database performance. High-frequency queries, especially if they are resource-intensive, can contribute significantly to the load on the database. If specific queries are executed often, they can become candidates for optimization, such as by improving the query's efficiency, adding indexes to speed up execution, or caching the results when possible to reduce the load on the database.&lt;/p&gt;

&lt;p&gt;In Datadog, we can look at REQUESTS to understand frequency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd6bzkwatnoyqtbfr97f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbd6bzkwatnoyqtbfr97f.png" alt="Datadog query metrics dashboard, showing requests overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The REQUESTS graph (top left) displays the number of queries over time, segmented by normalized query. There are fluctuations in query frequency, which can indicate peak usage times or potential bursts of activity that could stress the database. The table below provides more detailed data on the frequency of specific queries, with the "REQUESTS" column showing the total number of executions for each query.&lt;/p&gt;

&lt;p&gt;This data helps identify which queries are run most frequently and may contribute to performance issues, particularly during peak activity periods, as shown in the graph. The two top SELECT queries have millions of requests between them. This means that they could be good options for caching, as optimizing them could lead to significant performance improvements. &lt;/p&gt;

&lt;p&gt;Frequent execution implies that even minor efficiencies gained per query can substantially reduce the overall database load. Caching can serve repeated requests without needing to access the database each time, thus freeing up resources for other operations and improving the application's responsiveness.&lt;/p&gt;
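&lt;p&gt;You can cross-check Datadog's request counts directly in Postgres with pg_stat_statements (assuming the extension is enabled), this time ordering by call count:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Most frequently executed normalized queries
SELECT query, calls, rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;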

&lt;h2&gt;
  
  
  Use Explain Plans to Evaluate Query Cost
&lt;/h2&gt;

&lt;p&gt;Explain plans are a great feature of database statistics that provide insight into how the SQL optimizer will execute a query. They show the query execution path chosen by the database, including which indexes will be used, how tables are joined, and the estimated cost associated with each operation. By understanding explain plans, developers can identify potential performance issues and optimize queries, leading to faster execution and more efficient resource use.&lt;/p&gt;

&lt;p&gt;You can generate an Explain plan within SQL using the &lt;a href="https://www.postgresql.org/docs/current/using-explain.html?ref=blog.readyset.io" rel="noopener noreferrer"&gt;EXPLAIN command&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzniuwtj29s91ageigbar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzniuwtj29s91ageigbar.png" alt="Explain command written in SQL"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the startup cost of the query is 0 and the total cost is 458. The cost is unitless; it represents the estimated resources required to scan the table or index. The planner expects to scan 10,000 rows, each an estimated 244 bytes wide. By using EXPLAIN with different queries, you can understand how the SQL optimizer is traversing data and what optimizations can be made.&lt;/p&gt;
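&lt;p&gt;For reference, the canonical example from the Postgres EXPLAIN documentation produces a plan with these same numbers (tenk1 is a table in the Postgres regression test database):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;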

&lt;p&gt;Datadog provides an excellent visual representation of an explain plan:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0o8zbp1mjunewn4e3q7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq0o8zbp1mjunewn4e3q7.png" alt="Datadog's visual representation of an explain plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more complex queries, it visualizes the most efficient way to execute a given query by considering various query plans. This is a good proxy for latency and overall resource consumption of specific queries.&lt;/p&gt;

&lt;p&gt;In this specific plan, several "Hash Join" operations are shown, which are common in executing queries involving joining tables. The sequential scan ("Seq Scan") on pg_proc indicates a full scan of that table, which could be costly if the table is large.&lt;/p&gt;

&lt;p&gt;The importance of this data lies in the ability to analyze and optimize queries. Operations with high costs could be targets for optimization. For example, if a sequential scan has a high total cost, it might be beneficial to consider indexing the scanned table to reduce the cost. The plan rows and plan width can also inform decisions about indexes and query structure. &lt;/p&gt;

&lt;p&gt;Caching can also benefit high-cost queries as it reduces the need to perform expensive operations multiple times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitor Throughput and Latency During Peak Times to Identify Bottlenecks
&lt;/h2&gt;

&lt;p&gt;A database isn’t under constant load. Instead, load ebbs and flows over days, weeks, and months, depending on the type of service your application provides. Thus, you must identify peak load times to understand when caching data is most helpful.&lt;/p&gt;

&lt;p&gt;You can use Datadog’s real-time monitoring to observe query performance during high-traffic periods, use historical data to predict future peak periods, and prepare by caching the most impactful queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlw7bdet32xkkxl4ll61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxlw7bdet32xkkxl4ll61.png" alt="Datadog's real time monitoring dashboard, displaying query performance during high traffic periods"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By looking into the Query Details of a given query (that you have identified through explain plans and analysis of the metrics above), you can see when this particular query is putting strain on the system. Here, you can look at:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Requests.&lt;/strong&gt; Peaks in this graph can indicate high traffic periods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avg latency.&lt;/strong&gt; Spikes may suggest that queries take longer, possibly due to high load or inefficient execution plans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total time.&lt;/strong&gt; Spikes can indicate periods where the query is consuming more resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Percent time.&lt;/strong&gt; High consistent values or peaks suggest the query is a significant part of the database workload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rows/Query.&lt;/strong&gt; Consistently high numbers indicate heavy data retrieval, which can impact performance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a query-specific breakdown of the metrics from the main dashboard. Monitoring these metrics helps identify peak times for query load and execution times, critical for capacity planning, identifying potential bottlenecks, and optimizing database performance. &lt;/p&gt;

&lt;p&gt;By analyzing these graphs, developers can identify when and what to cache to reduce loads at peak times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Datadog With Postgres
&lt;/h2&gt;

&lt;p&gt;Leveraging APMs like Datadog is critical to enhancing database performance and user experience. &lt;/p&gt;

&lt;p&gt;If this has whetted your appetite for adding more monitoring to your database so you can understand query metrics, optimize performance, and cache complex, intensive queries, then you can add Datadog to any Postgres (or other) database.&lt;/p&gt;

&lt;p&gt;Datadog has written a series on Postgres performance that shows how to integrate Datadog into your application and expands on some of the techniques covered here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start Caching Slow Queries with ReadySet
&lt;/h2&gt;

&lt;p&gt;By identifying high-load queries, analyzing their frequency, evaluating their cost, and monitoring performance during peak times, you can set the stage to strategically cache the right queries, which leads to faster page loads, a smoother user experience, and reduced costs. &lt;/p&gt;

&lt;p&gt;ReadySet Cloud makes the caching process easy. &lt;a href="https://readyset.io/early-access?utm_campaign=eg&amp;amp;utm_medium=social&amp;amp;utm_source=dev.to"&gt;Sign up for access here&lt;/a&gt; to start caching queries in fifteen minutes or less without making any changes to your application code. &lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
