<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sadeq Dousti</title>
    <description>The latest articles on DEV Community by Sadeq Dousti (@msdousti).</description>
    <link>https://dev.to/msdousti</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1228061%2Fa4e73d99-ad3f-4b36-946e-838e11745a2c.jpg</url>
      <title>DEV Community: Sadeq Dousti</title>
      <link>https://dev.to/msdousti</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/msdousti"/>
    <language>en</language>
    <item>
      <title>PostgreSQL + Outbox Pattern Revamped — Part 2</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sun, 29 Jun 2025 23:01:26 +0000</pubDate>
      <link>https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf</link>
      <guid>https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-1-3lai/"&gt;Part 1 of this series&lt;/a&gt; introduced the outbox pattern as a reliable approach for message delivery in distributed systems, while highlighting several implementation pitfalls that can lead to failures. These include sorting by &lt;code&gt;created_at&lt;/code&gt;, incorrect data types, suboptimal indexing, overly large batch selections, poison pill messages, and inefficient index or visibility checks. A key insight was the performance impact of stale indexes, especially in high-throughput systems. To address this, the post proposed a revamped design using partitioned tables, which simplifies cleanup and mitigates visibility issues. Below is a summary of the traditional vs. revamped design, along with the DDL for both. Throughout this article, we will use the red color to indicate the traditional design and the blue color for the revamped design. Notice the update to the &lt;code&gt;published_at&lt;/code&gt; column of the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition causes the record to be deleted from it and inserted into the &lt;code&gt;outbox_published&lt;/code&gt; partition, which is a key feature of the revamped design.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89bifb5jjc8ejbeaqwvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89bifb5jjc8ejbeaqwvf.png" alt="Traditional vs. revamped outbox" width="605" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Traditional (non-partitioned) outbox table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished_id_idx&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Revamped partitioned outbox tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
        &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_published&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
        &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;outbox_new_unpublished_id_idx&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
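
&lt;p&gt;To see the row movement in action, here is a small sketch (the payload value is arbitrary): a freshly inserted row lands in &lt;code&gt;outbox_unpublished&lt;/code&gt;, and setting its &lt;code&gt;published_at&lt;/code&gt; moves it to &lt;code&gt;outbox_published&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;INSERT INTO outbox_new (payload) VALUES ('{"event": "demo"}');

-- tableoid::regclass reveals which partition physically holds the row;
-- at this point it is outbox_unpublished.
SELECT id, tableoid::regclass AS partition FROM outbox_new;

UPDATE outbox_new SET published_at = NOW();

-- The same row is now stored in outbox_published.
SELECT id, tableoid::regclass AS partition FROM outbox_new;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;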



&lt;p&gt;Although this approach is easily applicable to new systems, applying it to existing databases is less straightforward due to PostgreSQL's limitations around retrofitting table partitioning.&lt;/p&gt;

&lt;p&gt;In this second part, we first explore migration patterns from a "traditional" outbox table to a partitioned one, covering five scenarios and introducing a migration pattern for each. Next, we turn to the &lt;code&gt;autovacuum&lt;/code&gt; settings for the outbox tables and how to tune them for optimal performance. Finally, we discuss index maintenance for the outbox tables, which is crucial for high-throughput systems.&lt;/p&gt;

&lt;h1&gt;Table of Contents&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Outbox migration patterns
  * COP: Cold Outbox Partitioning
  * COPRA: Cold Outbox Partitioning w/ Rapid Attachment
  * HOP: Hot Outbox Partitioning
  * HOPER: Hot Outbox Partitioning w/ Eventual Replacement
  * HOPIA: Hot Outbox Partitioning w/ Immediate Access
Tuning AUTOVACUUM for outbox tables
Outbox table index maintenance
Conclusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;Outbox migration patterns&lt;/h1&gt;

&lt;p&gt;Consider a scenario where we have an existing service with a traditional outbox table and want to migrate to a partitioned outbox table. We will use schematics to illustrate the migration process, and assume a rolling update approach where the new version of the service is deployed alongside the old one and the two co-exist for a while. This is the most common scenario in production systems, where we cannot afford downtime or data loss. Depending on the requirements, we may have to deploy several versions of the service. The services are shown in green, labeled with a version number like &lt;code&gt;V1&lt;/code&gt;, &lt;code&gt;V2&lt;/code&gt;, etc.&lt;/p&gt;

&lt;p&gt;Below, I describe five patterns for migrating the outbox table, in increasing order of complexity. If your requirements allow for a simpler approach, choose one of the simpler patterns; if you need more flexibility or performance, opt for a more complex one. Each pattern is given a mnemonic acronym to help you remember it, and there is a flowchart to help you decide which pattern to use. The mnemonics are meaningful names, though some of them are not well-known English words. So, below the flowchart, I used a symbol to indicate the meaning of each mnemonic. 😉&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;COP: Cold Outbox Partitioning&lt;/strong&gt;. This is the simplest pattern, where you can stop the publication of messages for a short while (e.g., 10 minutes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;COPRA: Cold Outbox Partitioning w/ Rapid Attachment&lt;/strong&gt;. This pattern is similar to COP, except that the old outbox has to be rapidly attached to the new partitioned outbox table as the &lt;code&gt;outbox_published&lt;/code&gt; partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HOP: Hot Outbox Partitioning&lt;/strong&gt;. In this pattern, your service cannot afford to stop the publication of messages, but you do not need to attach the old outbox table to the new partitioned outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HOPER: Hot Outbox Partitioning w/ Eventual Replacement&lt;/strong&gt;. This pattern is similar to HOP, except that the old outbox table eventually replaces the &lt;code&gt;outbox_published&lt;/code&gt; partition, and newly published messages are copied over. No immediate access to the old outbox table is required, but it can be accessed later if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HOPIA: Hot Outbox Partitioning w/ Immediate Access&lt;/strong&gt;. This is by far the most complex pattern, with all requirements of HOP, and your service needs uninterrupted access to all the published messages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The decision flowchart below can help you choose the right pattern for your requirements. The next sections will describe each pattern in detail.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszqe0cnkuqucizjh0ptd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszqe0cnkuqucizjh0ptd.png" alt="Deciding what pattern to use" width="634" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;COP: Cold Outbox Partitioning&lt;/h2&gt;

&lt;p&gt;COP is the simplest pattern for migration from a traditional outbox table to a partitioned one. The requirements are as relaxed as possible:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Paused publication:&lt;/strong&gt; You can stop the publication of messages for a short while (e.g., 10 minutes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent lifecycle:&lt;/strong&gt; The existing data in the old outbox table does not need to be available in the new partitioned setup. You can leave it as a separate table, or drop it to save space.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;An example is a service that uses the outbox only periodically. For instance, in our company, we have an "interest service" that uses the outbox during the nightly accrual or the monthly interest payout. At all other times, the outbox is rarely used, so we can afford to pause the publication of messages for a short while during the migration.&lt;/p&gt;

&lt;p&gt;Given these requirements, the migration process is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;V1&lt;/code&gt; is using the traditional outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V2&lt;/code&gt; is deployed with the new partitioned outbox table:

&lt;ul&gt;
&lt;li&gt;Insertions are now made into the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;Publication continues from the &lt;strong&gt;old&lt;/strong&gt; outbox table, until there are no more messages to publish.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V3&lt;/code&gt; is deployed where both insertions and publication are now performed over the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Between the &lt;code&gt;V2&lt;/code&gt; and &lt;code&gt;V3&lt;/code&gt; deployments, there is a short period during which unpublished messages accumulate in the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, so publication is effectively paused.&lt;/p&gt;

&lt;p&gt;Let's depict this process in a schematic. Notice that due to the rolling deployment, &lt;code&gt;V1&lt;/code&gt; and &lt;code&gt;V2&lt;/code&gt; co-exist for a while, and the same applies to &lt;code&gt;V2&lt;/code&gt; and &lt;code&gt;V3&lt;/code&gt;. However, &lt;code&gt;V1&lt;/code&gt; and &lt;code&gt;V3&lt;/code&gt; do not co-exist. Also, &lt;code&gt;V3&lt;/code&gt; is only deployed after all messages in the old &lt;code&gt;outbox&lt;/code&gt; table are published. To prevent name clashes, let's assume that the new partitioned outbox table is named &lt;code&gt;outbox_new&lt;/code&gt;.&lt;/p&gt;
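
&lt;p&gt;Before deploying &lt;code&gt;V3&lt;/code&gt;, confirm that the old &lt;code&gt;outbox&lt;/code&gt; table is fully drained. A minimal check, assuming the table names from the DDL above, is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Returns false once all messages in the old outbox are published,
-- i.e., once it is safe to deploy V3.
SELECT EXISTS (
    SELECT FROM outbox
    WHERE published_at IS NULL
) AS has_unpublished;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;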

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmome9g6zmixpzfru069.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcmome9g6zmixpzfru069.png" alt="The COP migration pattern" width="518" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;COPRA: Cold Outbox Partitioning w/ Rapid Attachment&lt;/h2&gt;

&lt;p&gt;COPRA is the second-simplest pattern, after COP, for migrating from a traditional outbox table to a partitioned one. The requirement on paused publication is the same as before, but the existing data in the old outbox table must remain available in the new partitioned setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Paused publication:&lt;/strong&gt; You can stop the publication of messages for a short while (e.g., 10 minutes).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid attachment:&lt;/strong&gt; The existing data in the old outbox table has to be available in the new partitioned setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;An example is when the application runs analytical queries on the outbox table, or when a logical replication setup replicates the outbox table to another database. In these cases, we cannot afford to lose the existing data in the old outbox table, and we need to attach it to the new partitioned outbox table.&lt;/p&gt;

&lt;p&gt;The DDL to create the new partitioned outbox table is a bit different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It won't create the &lt;code&gt;outbox_published&lt;/code&gt; partition, as we will attach the old outbox table to it later.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;id&lt;/code&gt; column has to be generated with a higher starting value to avoid clashes with the existing &lt;code&gt;id&lt;/code&gt; values in the old outbox table.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- The start value of the identity column must be greater than &lt;/span&gt;
    &lt;span class="c1"&gt;-- the greatest id of outbox to avoid clashes.&lt;/span&gt;
    &lt;span class="c1"&gt;-- 1_000_000 is used for illustration only.&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;START&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
        &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;outbox_new_unpublished_id_idx&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
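
&lt;p&gt;As the comment in the DDL notes, &lt;code&gt;1_000_000&lt;/code&gt; is for illustration only. One way to pick a safe start value is to read the current maximum &lt;code&gt;id&lt;/code&gt; from the old table and add a buffer for ids already handed out to in-flight transactions (a sketch; the buffer size of 100000 is an arbitrary assumption):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Pick a start value safely above the current maximum id.
-- The buffer accounts for ids assigned by in-flight transactions.
SELECT COALESCE(MAX(id), 0) + 100000 AS safe_start
FROM outbox;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;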



&lt;p&gt;The old outbox table can then be attached to the new partitioned outbox table with the following DDL statements:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The new partition may not contain an identity column.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 

&lt;span class="c1"&gt;-- This will lock the table, see below.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
    &lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
        &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may also want to change or drop the indexes on the old outbox table (e.g., the primary key). A crucial point here, according to the &lt;a href="https://www.postgresql.org/docs/current/ddl-partitioning.html" rel="noopener noreferrer"&gt;PostgreSQL documentation&lt;/a&gt;, is as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note that when running the &lt;code&gt;ATTACH PARTITION&lt;/code&gt; command, the table will be scanned to validate the partition constraint while holding an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock on that partition.&lt;/p&gt;

&lt;p&gt;It is recommended to avoid this scan by creating a &lt;code&gt;CHECK&lt;/code&gt; constraint matching the expected partition constraint on the table prior to attaching it. Once the &lt;code&gt;ATTACH PARTITION&lt;/code&gt; is complete, it is recommended to drop the now-redundant &lt;code&gt;CHECK&lt;/code&gt; constraint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the above example, the &lt;code&gt;outbox&lt;/code&gt; table is locked exclusively while PostgreSQL verifies that it satisfies the constraint of the &lt;code&gt;DEFAULT&lt;/code&gt; partition of &lt;code&gt;outbox_new&lt;/code&gt;: since the only other partition holds rows with &lt;code&gt;published_at IS NULL&lt;/code&gt;, every row must satisfy &lt;code&gt;published_at IS NOT NULL&lt;/code&gt;. If the outbox table is large, this scan can take a considerable amount of time. That can be acceptable if:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;outbox&lt;/code&gt; table does not have to be accessed during the migration, and&lt;/li&gt;
&lt;li&gt;The publication can remain paused for a while.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If #1 does not hold, you can use the &lt;code&gt;CHECK&lt;/code&gt; constraint approach, as the documentation suggests and as elaborated below. If #2 does not hold, you can use the &lt;code&gt;HOP&lt;/code&gt; pattern instead, which does not require pausing publication.&lt;/p&gt;

&lt;p&gt;To use the &lt;code&gt;CHECK&lt;/code&gt; constraint approach, follow the steps below. Note that each step must run in a separate transaction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a &lt;code&gt;CHECK&lt;/code&gt; constraint to the &lt;code&gt;outbox&lt;/code&gt; table in &lt;code&gt;NOT VALID&lt;/code&gt; mode. The constraint is then enforced only for new rows (and we expect none, since no new insertions are made into the &lt;code&gt;outbox&lt;/code&gt; table at this point).&lt;/li&gt;
&lt;li&gt;Validate the constraint. This checks all existing rows in the &lt;code&gt;outbox&lt;/code&gt; table, but does not hold an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock during the scan (at most it does so very briefly at the end, to mark the constraint as valid).&lt;/li&gt;
&lt;li&gt;Attach the &lt;code&gt;outbox&lt;/code&gt; table to &lt;code&gt;outbox_new&lt;/code&gt; as the &lt;code&gt;DEFAULT&lt;/code&gt; partition.&lt;/li&gt;
&lt;li&gt;Drop the now-redundant &lt;code&gt;CHECK&lt;/code&gt; constraint.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Add a constraint with NOT VALID option to avoid locking the table.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;outbox_published_check&lt;/span&gt;
        &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;VALID&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- In a separate transaction, validate the constraint.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="n"&gt;VALIDATE&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;outbox_published_check&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;span class="c1"&gt;-- Optional: In a separate transaction, set the published_at column to NOT NULL.&lt;/span&gt;
&lt;span class="c1"&gt;-- This uses the CHECK constraint on the table, so it will be fast.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
   &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- In a separate transaction, attach the partition.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
    &lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
        &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- In a separate transaction, drop the now-redundant constraint.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINT&lt;/span&gt; &lt;span class="n"&gt;outbox_published_check&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
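
&lt;p&gt;After the attachment, you can verify that both partitions are in place by querying the system catalogs (a sketch using &lt;code&gt;pg_inherits&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- List the partitions of outbox_new; expect outbox_unpublished
-- plus the newly attached outbox table.
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'outbox_new'::regclass;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;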



&lt;p&gt;Putting the above together, the migration process is as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;V1&lt;/code&gt; is using the traditional outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V2&lt;/code&gt; is deployed with &lt;code&gt;outbox_new&lt;/code&gt; table, having &lt;code&gt;outbox_unpublished&lt;/code&gt; as the only partition:

&lt;ul&gt;
&lt;li&gt;Insertions are now made into the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;Publication continues from the &lt;strong&gt;old&lt;/strong&gt; outbox table, until there are no more messages to publish.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V3&lt;/code&gt; is deployed which attaches the existing &lt;code&gt;outbox&lt;/code&gt; table to the new partitioned outbox table as the &lt;code&gt;DEFAULT&lt;/code&gt; partition.

&lt;ul&gt;
&lt;li&gt;This can be done with or without using the &lt;code&gt;CHECK&lt;/code&gt; constraint approach, depending on whether an exclusive lock on the &lt;code&gt;outbox&lt;/code&gt; table is acceptable.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V4&lt;/code&gt; is deployed where both insertions and publication are now performed over the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The schematics below illustrate the COPRA migration pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkwz0igcogxy3xpprfqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkwz0igcogxy3xpprfqn.png" alt="The COPRA migration pattern" width="800" height="1275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;HOP: Hot Outbox Partitioning&lt;/h2&gt;

&lt;p&gt;HOP is the first pattern for migrating from a traditional outbox table to a partitioned one in which we cannot afford to pause the publication of messages. The requirements are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immediate publication:&lt;/strong&gt; You cannot stop the publication of messages, even for a short while.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent lifecycle:&lt;/strong&gt; The existing data in the old outbox table does not need to be available in the new partitioned setup. You can leave it as a separate table, or drop it to save space.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;An example is a service that continuously uses the outbox, such as a payment service that processes transactions in real time. In this case, we cannot stop the publication of messages, since doing so would disrupt other services that depend on timely message delivery.&lt;/p&gt;

&lt;p&gt;Despite the stringent publication requirement, the migration process is fairly straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;V1&lt;/code&gt; is using the traditional outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V2&lt;/code&gt; is deployed with the new partitioned outbox table:

&lt;ul&gt;
&lt;li&gt;Insertions are now made into the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;Publication continues from the &lt;strong&gt;old&lt;/strong&gt; outbox table, until there are no more messages to publish.&lt;/li&gt;
&lt;li&gt;Then, publication continues from the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V3&lt;/code&gt; is deployed where both insertions and publication are now performed over the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The following PL/pgSQL code shows how to implement &lt;code&gt;V2&lt;/code&gt; by keeping a variable, &lt;code&gt;outbox_table&lt;/code&gt;, that points to the outbox table currently used for publication. Initially it points to the old outbox table, and it switches to the new partitioned outbox table once all messages in the old table are published. This way, the service does not have to check the source table at each iteration, which improves performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
    &lt;span class="n"&gt;outbox_table&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'outbox'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;count_updated&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;LOOP&lt;/span&gt;
        &lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
            &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
                &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;
                &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
                &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
                &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;
            &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;outbox_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;outbox_table&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;GET&lt;/span&gt; &lt;span class="k"&gt;DIAGNOSTICS&lt;/span&gt; &lt;span class="n"&gt;count_updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ROW_COUNT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="n"&gt;count_updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;outbox_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'outbox'&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt;
            &lt;span class="c1"&gt;-- No more unpublished messages in `outbox`&lt;/span&gt;
            &lt;span class="c1"&gt;-- Switch to the new partitioned outbox table.&lt;/span&gt;
            &lt;span class="n"&gt;outbox_table&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'outbox_new'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The schematics below illustrate the HOP migration pattern.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uq0ak2ewwmfltenfbyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uq0ak2ewwmfltenfbyc.png" alt="The HOP migration pattern" width="800" height="1275"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  HOPER: Hot Outbox Partitioning w/ Eventual Replacement
&lt;/h2&gt;

&lt;p&gt;HOPER is a migration pattern for the following requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immediate publication:&lt;/strong&gt; You cannot stop the publication of messages, even for a short while.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eventual replacement:&lt;/strong&gt; The existing data in the old outbox table needs to be available in the new partitioned setup, but it does not have to be immediately accessible. The eventual goal is to have all published messages in a single partition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The steps are a combination of the HOP and COPRA patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;V1&lt;/code&gt; is using the traditional outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V2&lt;/code&gt; is deployed with the new partitioned outbox table:

&lt;ul&gt;
&lt;li&gt;The starting value of the &lt;code&gt;id&lt;/code&gt; column is set to a higher value to avoid clashes with the existing &lt;code&gt;id&lt;/code&gt; values in the old outbox table, as explained in the COPRA section above.&lt;/li&gt;
&lt;li&gt;Insertions are now made into the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;Publication continues from the &lt;strong&gt;old&lt;/strong&gt; outbox table, until there are no more messages to publish.&lt;/li&gt;
&lt;li&gt;Then, publication continues from the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V3&lt;/code&gt; is deployed where both insertions and publication are now performed over the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V4&lt;/code&gt; is deployed which:

&lt;ul&gt;
&lt;li&gt;Drops &lt;code&gt;IDENTITY&lt;/code&gt; from the &lt;code&gt;outbox-&amp;gt;id&lt;/code&gt; column.&lt;/li&gt;
&lt;li&gt;First applies a &lt;code&gt;NOT VALID&lt;/code&gt; &lt;code&gt;CHECK&lt;/code&gt; constraint to the &lt;code&gt;outbox&lt;/code&gt; table, and then validates it in a separate transaction. This is fully explained in the COPRA section above.&lt;/li&gt;
&lt;li&gt;In a single transaction: Detaches the &lt;code&gt;outbox_published&lt;/code&gt; &lt;code&gt;DEFAULT&lt;/code&gt; partition from &lt;code&gt;outbox_new&lt;/code&gt;, and instead attaches the existing &lt;code&gt;outbox&lt;/code&gt; table as the &lt;code&gt;DEFAULT&lt;/code&gt; partition.&lt;/li&gt;
&lt;li&gt;Drops the now-redundant &lt;code&gt;CHECK&lt;/code&gt; constraint from the &lt;code&gt;outbox&lt;/code&gt; table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;A job is deployed which gradually copies all the rows from the &lt;code&gt;outbox_published&lt;/code&gt; partition to the &lt;code&gt;outbox&lt;/code&gt; table, and then drops the &lt;code&gt;outbox_published&lt;/code&gt; partition.&lt;/li&gt;
&lt;/ol&gt;
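
&lt;p&gt;As a rough sketch, the &lt;code&gt;V4&lt;/code&gt; steps above could translate to DDL like the following. This is a hedged illustration based on the schema used throughout this series: the constraint name &lt;code&gt;outbox_check&lt;/code&gt; is made up for the example, and the &lt;code&gt;CHECK&lt;/code&gt; condition assumes the partitioning scheme of the revamped design, so adapt both to your setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Step 1: drop IDENTITY so outbox can later accept rows with explicit ids.
ALTER TABLE outbox ALTER COLUMN id DROP IDENTITY;

-- Step 2: add the constraint as NOT VALID (fast), then validate it in a
-- separate transaction (scans the table without blocking writes).
ALTER TABLE outbox ADD CONSTRAINT outbox_check
    CHECK (published_at IS NOT NULL) NOT VALID;
ALTER TABLE outbox VALIDATE CONSTRAINT outbox_check;

-- Step 3: in a single transaction, swap the DEFAULT partition.
-- The validated constraint lets ATTACH skip a full table scan.
BEGIN;
ALTER TABLE outbox_new DETACH PARTITION outbox_published;
ALTER TABLE outbox_new ATTACH PARTITION outbox DEFAULT;
COMMIT;

-- Step 4: drop the now-redundant CHECK constraint.
ALTER TABLE outbox DROP CONSTRAINT outbox_check;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;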

&lt;p&gt;The schematics below illustrate the HOPER migration pattern.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fmsdousti%2Foutbox2%2Frefs%2Fheads%2Fmain%2Fhoper.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fmsdousti%2Foutbox2%2Frefs%2Fheads%2Fmain%2Fhoper.png" alt="The HOPER migration pattern" width="800" height="2276"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  HOPIA: Hot Outbox Partitioning w/ Immediate Access
&lt;/h2&gt;

&lt;p&gt;HOPIA is the migration pattern for the strictest requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immediate publication:&lt;/strong&gt; You cannot stop the publication of messages, even for a short while.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate access:&lt;/strong&gt; All the published data has to be immediately accessible. There is no time window to copy the data from one table to another.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While this seems very complex, there is actually a simple solution: Just follow the HOPER pattern, but add an AFTER-INSERT trigger to the &lt;code&gt;outbox_published&lt;/code&gt; partition that copies the row to the &lt;code&gt;outbox&lt;/code&gt; table. This way, all the published messages are immediately available in the &lt;code&gt;outbox&lt;/code&gt; table, and you can access them at any time: There's no need to copy them later.&lt;/p&gt;

&lt;p&gt;The trigger can be implemented as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The new partition may not contain an identity column.&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;outbox_published_insert_trigger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;outbox_published_insert&lt;/span&gt;
&lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox_published&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;EACH&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt;
&lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;outbox_published_insert_trigger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The steps are similar to the HOPER pattern, but with the addition of the trigger:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;V1&lt;/code&gt; is using the traditional outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V2&lt;/code&gt; is deployed with the new partitioned outbox table:

&lt;ul&gt;
&lt;li&gt;The starting value of the &lt;code&gt;id&lt;/code&gt; column is set to a higher value to avoid clashes with the existing &lt;code&gt;id&lt;/code&gt; values in the old outbox table, as explained in the COPRA section above.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;above trigger&lt;/strong&gt; is created on the &lt;code&gt;outbox_published&lt;/code&gt; partition.&lt;/li&gt;
&lt;li&gt;Insertions are now made into the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;Publication continues from the &lt;strong&gt;old&lt;/strong&gt; outbox table, until there are no more messages to publish.&lt;/li&gt;
&lt;li&gt;Then, publication continues from the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V3&lt;/code&gt; is deployed where both insertions and publication are now performed over the &lt;strong&gt;new&lt;/strong&gt; partitioned outbox table.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;V4&lt;/code&gt; is deployed which:

&lt;ul&gt;
&lt;li&gt;Drops &lt;code&gt;IDENTITY&lt;/code&gt; from the &lt;code&gt;outbox-&amp;gt;id&lt;/code&gt; column.&lt;/li&gt;
&lt;li&gt;First applies a &lt;code&gt;NOT VALID&lt;/code&gt; &lt;code&gt;CHECK&lt;/code&gt; constraint to the &lt;code&gt;outbox&lt;/code&gt; table, and then validates it in a separate transaction. This is fully explained in the COPRA section above.&lt;/li&gt;
&lt;li&gt;In a single transaction: Detaches the &lt;code&gt;outbox_published&lt;/code&gt; &lt;code&gt;DEFAULT&lt;/code&gt; partition from &lt;code&gt;outbox_new&lt;/code&gt;, and instead attaches the existing &lt;code&gt;outbox&lt;/code&gt; table as the &lt;code&gt;DEFAULT&lt;/code&gt; partition.&lt;/li&gt;
&lt;li&gt;Drops the now-redundant &lt;code&gt;CHECK&lt;/code&gt; constraint from the &lt;code&gt;outbox&lt;/code&gt; table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If needed, you can also drop the &lt;code&gt;outbox_published&lt;/code&gt; partition after the migration is complete, since all the published messages are now available in the &lt;code&gt;outbox&lt;/code&gt; table.&lt;/p&gt;

&lt;p&gt;The schematics are very similar to the HOPER pattern, with the addition of the trigger in &lt;code&gt;V2&lt;/code&gt; and the removal of the copy job. For brevity, I will not repeat the schematics here, but you can refer to the HOPER section above.&lt;/p&gt;

&lt;h1&gt;
  
  
  Tuning AUTOVACUUM for outbox tables
&lt;/h1&gt;

&lt;p&gt;Let's take a detour from the migration patterns and discuss how to tune the &lt;code&gt;autovacuum&lt;/code&gt; settings for the outbox tables. This is important to ensure that the outbox tables are maintained properly, especially in high-throughput systems where many messages are inserted and published frequently. This section assumes familiarity with the &lt;code&gt;autovacuum&lt;/code&gt; feature in PostgreSQL. A good, in-depth article is Laurenz Albe's &lt;a href="https://www.cybertec-postgresql.com/en/tuning-autovacuum-postgresql/" rel="noopener noreferrer"&gt;Tuning PostgreSQL autovacuum&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By default, PostgreSQL's &lt;code&gt;autovacuum&lt;/code&gt; settings are designed to work well on small and medium-sized tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres=# \dconfig *autovacuum_*
         List of configuration parameters
               Parameter               |   Value
---------------------------------------+-----------
 autovacuum_analyze_scale_factor       | 0.1
 autovacuum_analyze_threshold          | 50
 autovacuum_freeze_max_age             | 200000000
 autovacuum_max_workers                | 3
 autovacuum_multixact_freeze_max_age   | 400000000
 autovacuum_naptime                    | 1min
 autovacuum_vacuum_cost_delay          | 2ms
 autovacuum_vacuum_cost_limit          | -1
 autovacuum_vacuum_insert_scale_factor | 0.2
 autovacuum_vacuum_insert_threshold    | 1000
 autovacuum_vacuum_scale_factor        | 0.2
 autovacuum_vacuum_threshold           | 50
 autovacuum_work_mem                   | -1
 log_autovacuum_min_duration           | 10min
(14 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following table summarizes the most important settings and their default values:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AUTOVACUUM task&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vacuum run on update/delete&lt;/td&gt;
&lt;td&gt;At least 20% of the table, plus 50 rows, is updated or deleted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vacuum run on insert&lt;/td&gt;
&lt;td&gt;At least 20% of the table, plus 1000 rows, is inserted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Analyze run&lt;/td&gt;
&lt;td&gt;At least 10% of the table, plus 50 rows, is changed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
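
&lt;p&gt;For example, with the defaults above, a table holding one million rows gets vacuumed after roughly 0.2 * 1,000,000 + 50 = 200,050 rows are updated or deleted. You can estimate the current per-table trigger point with a query along these lines (a sketch; note that &lt;code&gt;reltuples&lt;/code&gt; is itself an estimate maintained by &lt;code&gt;ANALYZE&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Estimate when autovacuum (update/delete path) triggers, using the defaults.
SELECT c.relname,
       s.n_dead_tup,
       50 + 0.2 * c.reltuples AS vacuum_trigger_estimate
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE c.relname IN ('outbox_unpublished', 'outbox_published');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;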

&lt;p&gt;In our revamped outbox design, there's an imbalance between the partitions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;outbox_unpublished&lt;/code&gt; partition is frequently inserted to and updated (resulting in delete), but its size is usually small.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;outbox_published&lt;/code&gt; partition is only inserted to, but it can grow indefinitely large.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since PostgreSQL's default &lt;code&gt;autovacuum&lt;/code&gt; settings are designed for small and medium-sized tables, we usually do not have to tune them for the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition. However, keep an eye on how long &lt;code&gt;autovacuum&lt;/code&gt; takes to run on this partition, and on whether the publication query against the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition is efficient. For both purposes, you can monitor the server logs. For the latter, you would need &lt;code&gt;log_min_duration_statement&lt;/code&gt; set to a low value, such as 100ms, or use the &lt;code&gt;auto_explain&lt;/code&gt; extension. Another possibility is the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension, which aggregates execution statistics and can help you identify slow queries.&lt;/p&gt;
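
&lt;p&gt;For instance, assuming the &lt;code&gt;pg_stat_statements&lt;/code&gt; extension is installed (it requires an entry in &lt;code&gt;shared_preload_libraries&lt;/code&gt;), a query along the following lines can surface the slowest statements touching the partition; the 10-row limit is arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Requires: CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query ILIKE '%outbox_unpublished%'
ORDER BY mean_exec_time DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;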

&lt;p&gt;For the insert-only table &lt;code&gt;outbox_published&lt;/code&gt;, vacuuming is not needed except for preventing transaction ID wraparound.&lt;br&gt;
If it is not used in the application code, it does not have to be analyzed either. However, since disabling auto-analyze is not possible without disabling &lt;code&gt;autovacuum&lt;/code&gt; entirely, we can tune the parameters to make it less frequent. Here's an example that runs both vacuum and analyze on the &lt;code&gt;outbox_published&lt;/code&gt; partition every 100K rows inserted, regardless of the table size (note that underscore digit separators such as &lt;code&gt;100_000&lt;/code&gt; require PostgreSQL 16 or later). This gives you consistent behavior, but it is suitable only when the application code does not query the &lt;code&gt;outbox_published&lt;/code&gt; partition directly. There is no one-size-fits-all solution, so you may need to adjust the values based on your workload and performance requirements.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_published&lt;/span&gt;
   &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="n"&gt;autovacuum_vacuum_insert_scale_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;autovacuum_vacuum_insert_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;autovacuum_analyze_scale_factor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;autovacuum_analyze_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Outbox table index maintenance
&lt;/h1&gt;

&lt;p&gt;In the revamped outbox design, we have a single index on the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, and no indexes on the &lt;code&gt;outbox_published&lt;/code&gt; partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;outbox_new_unpublished_id_idx&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;outbox_unpublished&lt;/code&gt; partition is frequently inserted to and updated (resulting in deletes), so its index is expected to become severely bloated over time. We'll use &lt;a href="https://raw.githubusercontent.com/ioguix/pgsql-bloat-estimation/refs/heads/master/btree/btree_bloat-superuser.sql" rel="noopener noreferrer"&gt;this nice script&lt;/a&gt; by ioguix to estimate the bloat of the B-Tree index. The script requires superuser privileges to run, since it accesses the &lt;code&gt;pg_statistic&lt;/code&gt; system catalog.&lt;/p&gt;

&lt;p&gt;Let's create a view to simplify the usage of the script. In order not to clutter the blog, I will not paste the entire 90+ line script here, but you can find it in the link above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="c1"&gt;-- put btree_bloat-superuser.sql here&lt;/span&gt;
&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a freshly created &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, the following query shows that the index is empty, thus not bloated at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update the statistics of the outbox_unpublished partition.&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idxname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'outbox_new_unpublished_id_idx'&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;gx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-[ RECORD 1 ]----+------------------------------
current_database | postgres
schemaname       | public
tblname          | outbox_unpublished
idxname          | outbox_new_unpublished_id_idx
real_size        | 8192
extra_size       | 0
extra_pct        | 0
fillfactor       | 90
bloat_size       | 0
bloat_pct        | 0
is_na            | f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Column &lt;code&gt;bloat_pct&lt;/code&gt; is the bloat percentage of the index, which is 0% in this case, meaning that the index is not bloated at all. &lt;code&gt;is_na&lt;/code&gt; indicates whether the estimation is "Not Applicable": if it is true, do not trust the stats. In this case, it is false, meaning that the stats are valid and applicable.&lt;/p&gt;

&lt;p&gt;Let's insert 100K rows into the &lt;code&gt;outbox_new&lt;/code&gt; table, and see what the index looks like afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idxname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'outbox_new_unpublished_id_idx'&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;gx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-[ RECORD 1 ]----+------------------------------
current_database | postgres
schemaname       | public
tblname          | outbox_unpublished
idxname          | outbox_new_unpublished_id_idx
real_size        | 2260992
extra_size       | 237568
extra_pct        | 10.507246376811594
fillfactor       | 90
bloat_size       | 8192
bloat_pct        | 0.36231884057971014
is_na            | f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Column &lt;code&gt;bloat_pct&lt;/code&gt; is less than 0.4%, meaning that the index is still essentially not bloated.&lt;/p&gt;

&lt;p&gt;Let's update the &lt;code&gt;published_at&lt;/code&gt; column of 10K rows, and see what the index looks like afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idxname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'outbox_new_unpublished_id_idx'&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;gx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-[ RECORD 1 ]----+------------------------------
current_database | postgres
schemaname       | public
tblname          | outbox_unpublished
idxname          | outbox_new_unpublished_id_idx
real_size        | 2260992
extra_size       | 434176
extra_pct        | 19.202898550724637
fillfactor       | 90
bloat_size       | 237568
bloat_pct        | 10.507246376811594
is_na            | f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the &lt;code&gt;bloat_pct&lt;/code&gt; is 10.5%: with 10K out of 100K rows updated (and thus moved out of the partition), the index is bloated by roughly the same percentage. Let's update the rest of the rows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idxname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'outbox_new_unpublished_id_idx'&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;gx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-[ RECORD 1 ]----+------------------------------
current_database | postgres
schemaname       | public
tblname          | outbox_unpublished
idxname          | outbox_new_unpublished_id_idx
real_size        | 2260992
extra_size       | 2252800
extra_pct        | 99.6376811594203
fillfactor       | 90
bloat_size       | 2252800
bloat_pct        | 99.6376811594203
is_na            | f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wow, almost 100% bloat! Can &lt;code&gt;VACUUM (INDEX_CLEANUP on) outbox_new;&lt;/code&gt; fix it? Try it, and you'll find the answer is no. What if we insert 10K new rows?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox_new&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;bloat&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;idxname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'outbox_new_unpublished_id_idx'&lt;/span&gt; &lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;gx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-[ RECORD 1 ]----+------------------------------
current_database | postgres
schemaname       | public
tblname          | outbox_unpublished
idxname          | outbox_new_unpublished_id_idx
real_size        | 2482176
extra_size       | 2269184
extra_pct        | 91.41914191419141
fillfactor       | 90
bloat_size       | 2244608
bloat_pct        | 90.42904290429043
is_na            | f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, the &lt;code&gt;bloat_pct&lt;/code&gt; is 90.4%. The bloat is only partially masked: the ratio drops mainly because the index grew with fresh, densely packed entries, and it remains very high.&lt;/p&gt;
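&lt;p&gt;As a sanity check, the percentages in the record above follow directly from the reported sizes (&lt;code&gt;extra_pct = extra_size / real_size&lt;/code&gt;, and likewise for &lt;code&gt;bloat_pct&lt;/code&gt;). A quick shell sketch using the numbers from my run:&lt;/p&gt;

```shell
# Recompute extra_pct and bloat_pct from the sizes reported by the bloat view.
real_size=2482176
extra_size=2269184
bloat_size=2244608

awk -v r="$real_size" -v e="$extra_size" -v b="$bloat_size" \
  'BEGIN { printf "extra_pct=%.1f bloat_pct=%.1f\n", e/r*100, b/r*100 }'
# prints: extra_pct=91.4 bloat_pct=90.4
```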

&lt;p&gt;Fortunately, it's easy to remove the index bloat by running a &lt;code&gt;REINDEX&lt;/code&gt; command on the index or the table. To avoid locking the table against writes, use the &lt;code&gt;CONCURRENTLY&lt;/code&gt; option: it takes longer, but does not block other operations on the table. Since the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition is supposed to stay small, the &lt;code&gt;REINDEX&lt;/code&gt; command should not take long:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;outbox_new_unpublished_id_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, to rebuild all indexes on the partition at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;REINDEX&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;CONCURRENTLY&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Index maintenance is not a one-time operation, but a periodic task that should run regularly to keep the index in good shape. How often depends on the workload and the size of the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition. You can use the &lt;code&gt;pg_cron&lt;/code&gt; extension to schedule the task, or a job scheduler like &lt;code&gt;cron&lt;/code&gt; or K8s CronJobs to run the &lt;code&gt;REINDEX&lt;/code&gt; command periodically. There's also a nice bash script by Vitaliy Kukharik, called &lt;a href="https://github.com/vitabaks/pg_auto_reindexer" rel="noopener noreferrer"&gt;pg_auto_reindexer&lt;/a&gt;, which runs during a configured maintenance window and reindexes all indexes bloated beyond a given threshold, automating the whole process.&lt;/p&gt;
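&lt;p&gt;For example, with &lt;code&gt;pg_cron&lt;/code&gt; installed, a nightly rebuild could be scheduled like this (a sketch; the job name and schedule are arbitrary choices, and &lt;code&gt;REINDEX CONCURRENTLY&lt;/code&gt; works here because pg_cron runs each command as its own statement, outside an explicit transaction block):&lt;/p&gt;

```sql
-- Rebuild the hot index every night at 03:30 (job name and time are arbitrary).
SELECT cron.schedule(
    'reindex-outbox-unpublished',
    '30 3 * * *',
    'REINDEX INDEX CONCURRENTLY outbox_new_unpublished_id_idx'
);
```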

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this article, we explored various migration patterns for transitioning from a traditional outbox table to a partitioned one in PostgreSQL. We covered five patterns—COP, COPRA, HOP, HOPER, and HOPIA—each designed for different requirements regarding publication pauses and data accessibility. The choice of pattern depends on your specific needs, with simpler patterns like COP being suitable for systems that can tolerate brief publication pauses, while more complex patterns like HOPIA address scenarios requiring continuous publication and immediate data access.&lt;/p&gt;

&lt;p&gt;We also discussed important operational aspects, including tuning autovacuum settings for outbox tables and maintaining indexes to prevent performance degradation. The partitioned outbox design offers significant advantages in terms of performance and maintenance, particularly for high-throughput systems.&lt;/p&gt;

&lt;p&gt;By implementing these patterns and following the recommended maintenance practices, you can achieve a more efficient and reliable outbox implementation that scales well with your system's growth while minimizing the common pitfalls associated with traditional outbox tables.&lt;/p&gt;

</description>
      <category>outbox</category>
      <category>postgres</category>
      <category>performance</category>
      <category>incidents</category>
    </item>
    <item>
      <title>Connecting from EKS pods to MSK: Setting up IAM, service accounts, trust relationships, and security groups</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Wed, 25 Jun 2025 23:17:22 +0000</pubDate>
      <link>https://dev.to/msdousti/connecting-from-eks-pods-to-msk-setting-up-iam-service-accounts-trust-relationships-and-2ed1</link>
      <guid>https://dev.to/msdousti/connecting-from-eks-pods-to-msk-setting-up-iam-service-accounts-trust-relationships-and-2ed1</guid>
      <description>&lt;p&gt;Recently, we have been working on a project that requires connecting from a pod running in Elastic Kubernetes Service (EKS) to a Managed Streaming for Apache Kafka (MSK) cluster. I've done this before, but I've never set up the whole thing from scratch. I thought it would be a good idea to document the steps I took to get everything working. This includes the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up the MSK cluster&lt;/li&gt;
&lt;li&gt;Setting up the EKS cluster&lt;/li&gt;
&lt;li&gt;Adjusting the security groups&lt;/li&gt;
&lt;li&gt;Installing Kafka tools on the EKS pod&lt;/li&gt;
&lt;li&gt;Enabling identity providers for the EKS cluster&lt;/li&gt;
&lt;li&gt;Creating an IAM role for the EKS service account&lt;/li&gt;
&lt;li&gt;Creating a Kubernetes service account&lt;/li&gt;
&lt;li&gt;Adjusting the trust relationship for the IAM role&lt;/li&gt;
&lt;li&gt;Creating a Kafka topic, producing messages, and consuming messages&lt;/li&gt;
&lt;li&gt;Cleaning up the resources after testing to save costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That seems like a lot, but I'll keep the guide full of screenshots and explanations to make it easy to follow along. Let's get started!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I intentionally refrained from explaining the "behind the scenes" of how this machinery works, to keep the guide focused on the practical steps. Google &lt;code&gt;Understanding AWS IRSA&lt;/code&gt; to find resources that explain the concepts in detail, if you're interested in the underlying mechanisms.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Prerequisites
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;An AWS account:&lt;/strong&gt; You'll need an AWS account to create the resources. This will cost you a bit, though the free tier covers some services; without it, the whole setup cost me around 1 USD (estimated). YMMV.&lt;/li&gt;
&lt;li&gt;You'll need some knowledge of Kubernetes and Kafka, as well as a general understanding of AWS services.&lt;/li&gt;
&lt;li&gt;I'll use AWS Web Console (including CloudShell) for all the steps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Setting up the MSK cluster
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open the AWS Management Console&lt;/strong&gt; and navigate to the &lt;strong&gt;MSK service&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Create cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose the &lt;strong&gt;Quick create&lt;/strong&gt; option.&lt;/li&gt;
&lt;li&gt;Name your cluster &lt;code&gt;demo-msk-cluster&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Set cluster type to &lt;strong&gt;Serverless&lt;/strong&gt;. (So you pay as you go, and it scales automatically.)&lt;/li&gt;
&lt;li&gt;Leave all the other settings as default.&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Create cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It will take a few minutes to create the cluster. Once it's created, you can see it in the list of clusters, with a status of &lt;strong&gt;Active&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwcmddfzblr6chbawgo1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwcmddfzblr6chbawgo1.png" alt="demo-msk-cluster" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;code&gt;View client information&lt;/code&gt; to see the connection details. You'll need this information later when connecting from the EKS pod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3t9q5xzvzh82pjia0ui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy3t9q5xzvzh82pjia0ui.png" alt="client information" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see above, the &lt;code&gt;Authentication type&lt;/code&gt; is IAM, and the Bootstrap server endpoint is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;boot-6nbanb1s.c3.kafka-serverless.us-east-1.amazonaws.com:9098
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your cluster will have a different endpoint, so make sure to copy it down and use it in the next steps.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting up the EKS cluster
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open the AWS Management Console&lt;/strong&gt; and navigate to the &lt;strong&gt;Elastic Kubernetes Service&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Go to the &lt;strong&gt;Clusters&lt;/strong&gt; section and click on &lt;strong&gt;Create cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;Quick configuration (with EKS Auto Mode)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Name your cluster &lt;code&gt;dev&lt;/code&gt;, leave the Kubernetes version, VPC, and subnets as default.&lt;/li&gt;
&lt;li&gt;Click on the &lt;code&gt;Create recommended role&lt;/code&gt; button for &lt;code&gt;Cluster IAM role&lt;/code&gt; and &lt;code&gt;Node IAM role&lt;/code&gt;, as denoted below:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxaawzzxyte7q07f5981.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxaawzzxyte7q07f5981.png" alt="create EKS cluster" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Follow the default settings for &lt;code&gt;Create recommended role&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cluster IAM role&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trusted entity type:&lt;/strong&gt; AWS service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; EKS 👉 EKS - Auto Cluster&lt;/li&gt;
&lt;li&gt;Add the default permissions, click next, and then create the role.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Node IAM role&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trusted entity type:&lt;/strong&gt; AWS service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; EKS 👉 EKS - Auto Node&lt;/li&gt;
&lt;li&gt;Add the default permissions, click next, and then create the role.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Go back to the EKS cluster creation page, refresh the role list, and then select the appropriate roles for the cluster and node IAM roles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn7kt0jzykgds4ohhfid.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn7kt0jzykgds4ohhfid.png" alt="Assign roles" width="800" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Create&lt;/strong&gt; to create the EKS cluster. It will take a few minutes. Once it's created, you can see it in the list of clusters, with a status of &lt;strong&gt;Active&lt;/strong&gt;. (In my experience, creating an EKS cluster takes somewhat longer than an MSK cluster, so be patient.)&lt;/p&gt;

&lt;h1&gt;
  
  
  Adjusting the security groups
&lt;/h1&gt;

&lt;p&gt;Let's create our first pod in the EKS cluster to connect to the MSK cluster. Open CloudShell in the AWS Management Console, and run the following command to configure kubectl to use the EKS cluster (use the region where you created the EKS cluster):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws eks update-kubeconfig &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--name&lt;/span&gt; dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To test, let's list the nodes in the EKS cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should return a single node, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0n55i10spvh2vha8rkq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0n55i10spvh2vha8rkq.png" alt="configure kubectl" width="695" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's create an Ubuntu pod. Here's the pod manifest that we'll use to create the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu:24.04&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/bash"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# keep the container running&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In CloudShell, save the above manifest to a file named &lt;code&gt;ubuntu.yaml&lt;/code&gt;, and then run the following command to create the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ubuntu.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you immediately run &lt;code&gt;kubectl describe pods&lt;/code&gt;, you'll see a &lt;code&gt;FailedScheduling&lt;/code&gt; event with the error message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0/1 nodes are available: 1 node(s) had untolerated taint {CriticalAddonsOnly: }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is because the single node in the EKS cluster has a taint that prevents pods from being scheduled on it unless they tolerate that taint. Since the cluster runs in EKS Auto Mode and scales elastically, it automatically adds a node after some time and schedules the pod on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p5s5np2mlu71sgoidx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1p5s5np2mlu71sgoidx6.png" alt="New node created, pod scheduled" width="474" height="263"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll now get a shell into the pod to start our work. Run the following command to get a shell into the pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; ubuntu &lt;span class="nt"&gt;--&lt;/span&gt; bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your prompt should change to something like &lt;code&gt;root@ubuntu:/#&lt;/code&gt;. Let's update the package list and install some necessary network tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; net-tools netcat-traditional iputils-ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's check if the pod can resolve the MSK cluster's bootstrap server endpoint name, and access its port. Run the following command to check if the pod can resolve the endpoint (replace the endpoint with your MSK cluster's bootstrap server endpoint):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ping boot-6nbanb1s.c3.kafka-serverless.us-east-1.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yes, it resolves the endpoint to &lt;code&gt;172.31.37.226&lt;/code&gt; (in my case), but with 100% packet loss, since the cluster's security group does not yet allow the traffic through. To check the port, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nc &lt;span class="nt"&gt;-zv&lt;/span&gt; boot-6nbanb1s.c3.kafka-serverless.us-east-1.amazonaws.com 9098
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No, the port is not accessible! We need to adjust the security groups to allow access from the EKS cluster to the MSK cluster. Let's find the IP of the pod first, so we can add it to the security group of the MSK cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ifconfig
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows the IP address of the pod, which in my case is &lt;code&gt;172.31.30.160&lt;/code&gt;. But I want to allow any pod that will be created in the EKS cluster, so I need to check the CIDR blocks of the subnets in the VPC where the EKS cluster lives. There are actually several subnets, and the VPC's &lt;code&gt;172.31.0.0/16&lt;/code&gt; CIDR block is generic enough to cover them all.&lt;/p&gt;
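&lt;p&gt;To convince yourself that the pod's address is covered by that CIDR block: a &lt;code&gt;/16&lt;/code&gt; mask covers exactly the first two octets, so membership reduces to comparing them. A quick sketch with the addresses from this walkthrough:&lt;/p&gt;

```shell
# Check that the pod IP falls inside the VPC CIDR used for the security-group rule.
# A /16 mask covers exactly the first two octets, so comparing them is enough.
pod_ip=172.31.30.160
vpc_cidr=172.31.0.0/16

if [ "$(echo "$pod_ip" | cut -d. -f1-2)" = "$(echo "${vpc_cidr%/*}" | cut -d. -f1-2)" ]; then
  echo "pod IP is inside $vpc_cidr"
fi
```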

&lt;p&gt;Go to the MSK cluster &lt;code&gt;demo-msk-cluster&lt;/code&gt; 👉 Properties, and scroll down to the &lt;strong&gt;Security groups applied&lt;/strong&gt; section. Click on the security group link to open it in a new tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfgiac2rln85k7yfgt1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbfgiac2rln85k7yfgt1c.png" alt="MSK security group" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Edit inbound rules&lt;/strong&gt;, and then click on &lt;strong&gt;Add rule&lt;/strong&gt;. Select &lt;code&gt;All Traffic&lt;/code&gt; as the type, and &lt;code&gt;172.31.0.0/16&lt;/code&gt; as the custom source. Click on &lt;strong&gt;Save rules&lt;/strong&gt; to save the changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4mq5i3fcdbp8kvvwz6f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4mq5i3fcdbp8kvvwz6f.png" alt="Inbound rule" width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, if you try Netcat again (inside the pod), it should work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nc &lt;span class="nt"&gt;-zv&lt;/span&gt; boot-6nbanb1s.c3.kafka-serverless.us-east-1.amazonaws.com 9098
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vpce-07ab4e3e8f8e8a7d3-4nhymqra.vpce-svc-015711cabcd5af80a.us-east-1.vpce.amazonaws.com [172.31.26.34] 9098 (?) open
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Installing Kafka tools on the EKS pod
&lt;/h1&gt;

&lt;p&gt;OK, so we now have the basic network connectivity between the EKS pod and the MSK cluster. But to send requests to Kafka, we need to authenticate the requests. Since our MSK cluster is configured to use IAM authentication, it won't work out of the box.&lt;/p&gt;

&lt;p&gt;Let's try connecting to MSK without any authentication first, to see what happens. The following steps will be used to set up the Kafka client in the pod. First, we need to install OpenJDK and wget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; openjdk-21-jre-headless wget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, download and install Kafka tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://dlcdn.apache.org/kafka/3.9.1/kafka_2.13-3.9.1.tgz
&lt;span class="nb"&gt;tar &lt;/span&gt;xvf kafka_2.13-3.9.1.tgz
&lt;span class="nb"&gt;cd &lt;/span&gt;kafka_2.13-3.9.1

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BOOTSTRAP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;boot-6nbanb1s.c3.kafka-serverless.us-east-1.amazonaws.com:9098
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's list the topics in the MSK cluster using the &lt;code&gt;kafka-topics.sh&lt;/code&gt; script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-topics.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BOOTSTRAP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It fails, as expected. The misleading &lt;code&gt;OutOfMemoryError&lt;/code&gt; is a classic symptom of a plaintext client hitting a TLS-only port: the client reads bytes of the TLS handshake as a message-length prefix and tries to allocate a gigantic buffer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error while executing topic command : The AdminClient thread has exited. Call: listTopics
[2025-06-25 21:08:47,814] ERROR org.apache.kafka.common.errors.TimeoutException: The AdminClient thread has exited. Call: listTopics
 (org.apache.kafka.tools.TopicCommand)
[2025-06-25 21:08:47,822] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.nio.HeapByteBuffer.&amp;lt;init&amp;gt;(HeapByteBuffer.java:71)
        at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:391)
        at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:103)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:462)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:412)
        at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:694)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:596)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:501)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:596)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1542)
        at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1473)
        at java.base/java.lang.Thread.runWith(Thread.java:1596)
        at java.base/java.lang.Thread.run(Thread.java:1583)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Enabling identity providers for the EKS cluster
&lt;/h1&gt;

&lt;p&gt;To connect to the MSK cluster using IAM authentication, we need to enable identity providers for the EKS cluster. This allows the EKS cluster to use IAM roles for service accounts (IRSA) to authenticate with AWS services.&lt;/p&gt;

&lt;p&gt;Exit the pod shell by typing &lt;code&gt;exit&lt;/code&gt;, and then run the following commands in CloudShell to install &lt;code&gt;eksctl&lt;/code&gt;, following the official guide for &lt;a href="https://eksctl.io/installation/#for-unix" rel="noopener noreferrer"&gt;eksctl installation&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# for ARM systems, set ARCH to: `arm64`, `armv6` or `armv7`&lt;/span&gt;
&lt;span class="nv"&gt;ARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64
&lt;span class="nv"&gt;PLATFORM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;uname&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;_&lt;span class="nv"&gt;$ARCH&lt;/span&gt;

curl &lt;span class="nt"&gt;-sLO&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_&lt;/span&gt;&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;&lt;span class="s2"&gt;.tar.gz"&lt;/span&gt;

&lt;span class="c"&gt;# (Optional) Verify checksum&lt;/span&gt;
curl &lt;span class="nt"&gt;-sL&lt;/span&gt; &lt;span class="s2"&gt;"https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_checksums.txt"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nv"&gt;$PLATFORM&lt;/span&gt; | &lt;span class="nb"&gt;sha256sum&lt;/span&gt; &lt;span class="nt"&gt;--check&lt;/span&gt;

&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzf&lt;/span&gt; eksctl_&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;.tar.gz &lt;span class="nt"&gt;-C&lt;/span&gt; /tmp &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;eksctl_&lt;span class="nv"&gt;$PLATFORM&lt;/span&gt;.tar.gz

&lt;span class="nb"&gt;sudo mv&lt;/span&gt; /tmp/eksctl /usr/local/bin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, enable the identity providers for the EKS cluster by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eksctl utils associate-iam-oidc-provider &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dev &lt;span class="nt"&gt;--approve&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-06-25 21:16:35 [ℹ]  will create IAM Open ID Connect provider for cluster "dev" in "us-east-1"
2025-06-25 21:16:36 [✔]  created IAM Open ID Connect provider for cluster "dev" in "us-east-1"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see the OIDC issuer URL for the EKS cluster, run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws eks describe-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; dev &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"cluster.identity.oidc.issuer"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For me, it returned the following URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
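&lt;p&gt;The last path segment of this URL is the OIDC provider ID, which (along with the issuer host) shows up again when adjusting the IAM role's trust relationship. A quick way to extract it from the issuer URL (a sketch, using my URL as an example):&lt;/p&gt;

```shell
# Strip everything up to the last slash to get the OIDC provider ID.
issuer=https://oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8
echo "${issuer##*/}"
# prints: 72C2EA41B53D0AF8CA083756BF9D0CD8
```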



&lt;h1&gt;
  
  
  Creating a permission policy for the IAM role
&lt;/h1&gt;

&lt;p&gt;Use &lt;code&gt;IAM&lt;/code&gt; 👉 &lt;code&gt;Policies&lt;/code&gt;, and create a "permit-all" policy for the MSK cluster. (Obviously, this is not recommended for production, but it's fine for testing purposes.) Add the following JSON policy in the policy editor, and save it with the name &lt;code&gt;MSK-permit-all&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kafka-cluster:*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Creating an IAM role for the EKS service account
&lt;/h1&gt;

&lt;p&gt;Let's create an IAM role that the EKS service account can assume to access the MSK cluster. Go to &lt;code&gt;IAM&lt;/code&gt; 👉 &lt;code&gt;Roles&lt;/code&gt; 👉 &lt;code&gt;Create role&lt;/code&gt;, select &lt;code&gt;Web identity&lt;/code&gt; as the trusted entity type, and then select the EKS cluster's OIDC provider that we just created. For &lt;strong&gt;audience&lt;/strong&gt;, select &lt;code&gt;sts.amazonaws.com&lt;/code&gt;, and then click on &lt;strong&gt;Next&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjoe68k488p42uoxt4kw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjoe68k488p42uoxt4kw.png" alt="Creating an IAM role" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Permissions policy&lt;/strong&gt;, search for &lt;code&gt;MSK-permit-all&lt;/code&gt; (which we created in the previous section) and select it. This policy allows full access to the MSK cluster. Click &lt;strong&gt;Next&lt;/strong&gt;. Name the role &lt;code&gt;EKS-MSK-Access&lt;/code&gt;, take note of the &lt;code&gt;Trust policy&lt;/code&gt; (screenshot below; we'll edit it in the next section), and then click on &lt;strong&gt;Create role&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uv9d8zvs97cp9z9xrqp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uv9d8zvs97cp9z9xrqp.png" alt="Trust policy" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Find the role &lt;code&gt;EKS-MSK-Access&lt;/code&gt; in the list of roles, and click on it to view its details. We need the ARN of the role for the next section, so copy it down. It should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;arn:aws:iam::533267425521:role/EKS-MSK-Access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4xdmoarojvpoteok1fw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4xdmoarojvpoteok1fw.png" alt="Role ARN" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Creating a Kubernetes service account
&lt;/h1&gt;

&lt;p&gt;Each pod runs under a Kubernetes "service account," which can be connected to an IAM role. This allows the pod to assume the IAM role and use its permissions to access AWS resources, including the MSK cluster. Kubernetes mounts the service account token into the pod, and this token is used to authenticate requests. To see the token, run the following command inside the pod (using &lt;code&gt;echo&lt;/code&gt; + &lt;code&gt;cat&lt;/code&gt;, as the file does not end with a newline):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/run/secrets/kubernetes.io/serviceaccount/token&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can paste the output into &lt;a href="https://jwt.io/" rel="noopener noreferrer"&gt;jwt.io&lt;/a&gt; to decode it. It should look something like this, showing that the pod is associated with the &lt;code&gt;default&lt;/code&gt; service account in the &lt;code&gt;default&lt;/code&gt; namespace (see &lt;code&gt;default:default&lt;/code&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpnhjqnt6du37mlh0kxn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpnhjqnt6du37mlh0kxn.png" alt="Default service account" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;
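&lt;p&gt;If you'd rather not paste a credential into a website, the token payload can be decoded locally instead. The helper below is a sketch that assumes &lt;code&gt;python3&lt;/code&gt; is available in the pod:&lt;/p&gt;

```shell
# Decode the payload (the second dot-separated part) of a JWT locally,
# as an alternative to pasting the token into jwt.io.
decode_jwt_payload() {
  python3 -c 'import base64, json, sys
part = sys.argv[1].split(".")[1]
part += "=" * (-len(part) % 4)  # restore stripped base64url padding
print(json.dumps(json.loads(base64.urlsafe_b64decode(part)), indent=2))' "$1"
}

# Inside the pod:
# decode_jwt_payload "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
```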

&lt;p&gt;Now, let's create a Kubernetes service account that is associated with the IAM role we created earlier. In CloudShell, create a file named &lt;code&gt;sa.yaml&lt;/code&gt; with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::533267425521:role/EKS-MSK-Access&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
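&lt;p&gt;One way to create this file from the CloudShell prompt is a heredoc; the role ARN below is the one from this walkthrough, so substitute your own:&lt;/p&gt;

```shell
# Write the service account manifest; replace the role ARN with the
# ARN of your own EKS-MSK-Access role.
cat > sa.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mysa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::533267425521:role/EKS-MSK-Access
EOF
```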



&lt;p&gt;Then, apply the manifest to create the service account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; sa.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure the service account is created successfully. Next, edit the trust relationship of the IAM role &lt;code&gt;EKS-MSK-Access&lt;/code&gt; so that the EKS service account can assume the role. In the IAM console, open the role &lt;code&gt;EKS-MSK-Access&lt;/code&gt;, switch to the &lt;strong&gt;Trust relationships&lt;/strong&gt; tab, and click &lt;strong&gt;Edit trust relationship&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa74tq4ix7uz0rgssdt2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa74tq4ix7uz0rgssdt2p.png" alt="Edit IAM trust relationship" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The important part is to replace the &lt;code&gt;aud&lt;/code&gt; condition key with a &lt;code&gt;sub&lt;/code&gt; key whose value names the service account's namespace and name. The updated trust policy should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Federated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::533267425521:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8:sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system:serviceaccount:default:mysa"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
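&lt;p&gt;If you prefer the CLI over the console, the same trust policy can be applied with &lt;code&gt;aws iam update-assume-role-policy&lt;/code&gt;. The sketch below writes the policy to a file first; the account ID and OIDC provider ID are the ones from this walkthrough, so substitute your own:&lt;/p&gt;

```shell
# Write the updated trust policy to a file (values are from this walkthrough;
# replace the account ID and OIDC provider ID with your own).
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::533267425521:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/72C2EA41B53D0AF8CA083756BF9D0CD8:sub": "system:serviceaccount:default:mysa"
        }
      }
    }
  ]
}
EOF

# Then apply it to the role (requires AWS credentials with IAM permissions):
# aws iam update-assume-role-policy \
#   --role-name EKS-MSK-Access \
#   --policy-document file://trust.json
```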



&lt;p&gt;We now need to update the pod manifest to use the service account we just created. Let's edit the &lt;code&gt;ubuntu.yaml&lt;/code&gt; file we created earlier, and add the &lt;code&gt;serviceAccountName&lt;/code&gt; field under &lt;code&gt;spec&lt;/code&gt;. The updated manifest should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysa&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu:24.04&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/bin/bash"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sleep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;infinity"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you try to apply the manifest again, it will fail: most fields under a pod's &lt;code&gt;spec&lt;/code&gt; are immutable once the pod is created. To fix this, we need to delete the existing pod and create a new one from the updated manifest. Run the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete pod ubuntu
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; ubuntu.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get the token again using the below command, and parse it using &lt;a href="https://jwt.io/" rel="noopener noreferrer"&gt;jwt.io&lt;/a&gt;, you'll see that the &lt;code&gt;sub&lt;/code&gt; field is &lt;code&gt;"system:serviceaccount:default:mysa"&lt;/code&gt; as expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/run/secrets/kubernetes.io/serviceaccount/token&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make sure that the pod assumes the IAM role, we can use the AWS CLI inside the pod, as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install &lt;/span&gt;unzip less &lt;span class="nt"&gt;-y&lt;/span&gt;
wget &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-O&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
./aws/install
aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should return the IAM role ARN that we created earlier, something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"UserId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AROAXYKJXFDY26KQHFDH7:botocore-session-1750889575"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Account"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"533267425521"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Arn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:sts::533267425521:assumed-role/EKS-MSK-Access/botocore-session-1750889575"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Unfortunately, we now need to install the Kafka tools again, as the pod was recreated. Exec into the pod again and repeat the steps in the section "Installing Kafka tools on the EKS pod". Don't try to list the topics yet, as we need more configuration to authenticate with the MSK cluster.&lt;/p&gt;

&lt;h1&gt;
  
  
  Installing the AWS MSK IAM Authentication JAR
&lt;/h1&gt;

&lt;p&gt;While inside the Kafka tools directory, we need to download the &lt;code&gt;aws-msk-iam-auth&lt;/code&gt; JAR file, which is used to authenticate with the MSK cluster using IAM. Run the following command to download the JAR file and save it in the &lt;code&gt;libs&lt;/code&gt; directory. We also add it to the &lt;code&gt;CLASSPATH&lt;/code&gt; environment variable so that the Kafka tools can use it for authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wget https://github.com/aws/aws-msk-iam-auth/releases/download/v2.3.2/aws-msk-iam-auth-2.3.2-all.jar &lt;span class="nt"&gt;-P&lt;/span&gt; libs/
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLASSPATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;libs/aws-msk-iam-auth-2.3.2-all.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the following to &lt;code&gt;client.properties&lt;/code&gt; file in the Kafka tools directory. This file contains the configuration for the Kafka client to connect to the MSK cluster using IAM authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;sasl.mechanism&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;AWS_MSK_IAM&lt;/span&gt;
&lt;span class="py"&gt;security.protocol&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;SASL_SSL&lt;/span&gt;
&lt;span class="py"&gt;sasl.jaas.config&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;software.amazon.msk.auth.iam.IAMLoginModule required;&lt;/span&gt;
&lt;span class="py"&gt;sasl.client.callback.handler.class&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;software.amazon.msk.auth.iam.IAMClientCallbackHandler&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run the following command now, it returns no errors (but also no topics, as we haven't created any yet):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-topics.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command-config&lt;/span&gt; client.properties &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BOOTSTRAP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--list&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Creating a Kafka topic, producing messages, and consuming messages
&lt;/h1&gt;

&lt;p&gt;Use the following command to create a Kafka topic named &lt;code&gt;topic1&lt;/code&gt; with 3 partitions and a replication factor of 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-topics.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command-config&lt;/span&gt; client.properties &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BOOTSTRAP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--topic&lt;/span&gt; topic1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create&lt;/span&gt; &lt;span class="nt"&gt;--partitions&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--replication-factor&lt;/span&gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should return something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Created topic topic1.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you now list the topics as in the previous section, you should see &lt;code&gt;topic1&lt;/code&gt; in the list.&lt;/p&gt;

&lt;p&gt;To produce messages to the topic, we can use the &lt;code&gt;kafka-console-producer.sh&lt;/code&gt; script. Run the following command to start the producer. Each message occupies its own line, so pressing Enter sends the current line to the topic. You can type multiple messages, and then press Ctrl+C to exit the producer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-console-producer.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--producer&lt;/span&gt;.config client.properties &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BOOTSTRAP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--topic&lt;/span&gt; topic1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below, I sent messages &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;hello&lt;/code&gt;, and &lt;code&gt;bye&lt;/code&gt;, and then exited the producer with Ctrl+C:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63rpbr165o4eurikap1p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63rpbr165o4eurikap1p.png" alt="example producer" width="431" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's consume the messages we just produced. We can use the &lt;code&gt;kafka-console-consumer.sh&lt;/code&gt; script to consume messages from the topic. Run the following command to start the consumer, which will read messages from the beginning of the topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-console-consumer.sh &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--consumer&lt;/span&gt;.config client.properties &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; &lt;span class="nv"&gt;$BOOTSTRAP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--topic&lt;/span&gt; topic1 &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-beginning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The consumer will show all the messages produced to the topic, starting from the beginning. It then waits for new messages to be produced, until you exit it with Ctrl+C. Below, you can see the messages &lt;code&gt;1&lt;/code&gt;, &lt;code&gt;2&lt;/code&gt;, &lt;code&gt;hello&lt;/code&gt;, and &lt;code&gt;bye&lt;/code&gt; that we produced earlier:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4kzzd9zxkb97kuzl0ov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4kzzd9zxkb97kuzl0ov.png" alt="example consumer" width="431" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Cleaning up the resources after testing to save costs
&lt;/h1&gt;

&lt;p&gt;Phew! That was a lot of work, but we successfully connected from an EKS pod to an MSK cluster, produced messages, and consumed them. Now, let's clean up the resources we created to avoid any unnecessary costs.&lt;/p&gt;

&lt;p&gt;This is easy: Delete the EKS cluster and the MSK cluster, and you should be good to go.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In this guide, we've walked through the complete process of connecting an EKS pod to an MSK cluster using IAM authentication. We covered everything from setting up both services, configuring security groups and IAM roles, to producing and consuming Kafka messages.&lt;/p&gt;

&lt;p&gt;The key components we implemented include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creating and configuring MSK and EKS clusters&lt;/li&gt;
&lt;li&gt;Setting up network connectivity between the clusters&lt;/li&gt;
&lt;li&gt;Configuring IAM roles and service accounts for authentication&lt;/li&gt;
&lt;li&gt;Installing and configuring Kafka tools with AWS MSK IAM authentication&lt;/li&gt;
&lt;li&gt;Testing the connection by creating topics and exchanging messages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup provides a secure way to connect Kubernetes applications to Kafka using AWS's managed services, leveraging IAM for authentication rather than managing credentials manually. Keep in mind, though, that the IAM policy we created was deliberately permissive, and the security group accepted traffic from all VPCs. In production, follow the principle of least privilege: restrict the security group to only allow traffic from the EKS cluster's VPC and subnets, and use an IAM policy that allows only the necessary actions on the MSK cluster.&lt;/p&gt;

&lt;p&gt;The approach shown here can be extended to more complex scenarios, such as multi-cluster setups or integrating with other AWS services like Lambda or Kinesis.&lt;/p&gt;

</description>
      <category>eks</category>
      <category>msk</category>
      <category>iam</category>
      <category>aws</category>
    </item>
    <item>
      <title>PostgreSQL + Outbox Pattern Revamped — Part 1</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sun, 08 Jun 2025 13:49:09 +0000</pubDate>
      <link>https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-1-3lai</link>
      <guid>https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-1-3lai</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a two-part series; for part 2, see &lt;a href="https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The outbox pattern is a well-known design pattern used in distributed systems to ensure messages are sent reliably, by first storing messages in a local database before sending them to a message broker. The &lt;code&gt;outbox&lt;/code&gt; table often contains fields like &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;payload&lt;/code&gt;, &lt;code&gt;created_at&lt;/code&gt;, and &lt;code&gt;published_at&lt;/code&gt;. There's a scheduled process that picks up N messages from the outbox, sends them to the message broker, and marks them as published.&lt;/p&gt;

&lt;p&gt;While this seems pretty straightforward, I've seen quite a number of incidents related to the outbox pattern. In this blog post, I'll go over some of these incidents and discuss how to improve your setup to avoid them. With the last example, I'll demonstrate an issue that is often overlooked, and suggest a "revamped" solution that can significantly improve performance and reliability. This solution is based on partitioning the outbox table.&lt;/p&gt;

&lt;p&gt;I focus exclusively on PostgreSQL as my database of choice, but I think the principles can be applied to many other databases as well. All the experiments are done on PostgreSQL 17.5, on a MacBook Air M3 with 24 GB of RAM.&lt;/p&gt;

&lt;p&gt;Part 1 of this blog post gives a somewhat complete answer on how to avoid almost all outbox-pattern issues, yet it does not cover the problem of converting a regular outbox table to a partitioned table. &lt;a href="https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf"&gt;Part 2&lt;/a&gt; will tackle this problem, as well as give a more in-depth explanation of how to remove index bloat and improve performance further by adjusting the settings of the &lt;code&gt;autovacuum&lt;/code&gt; daemon for the partitioned outbox table.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Some people may favor &lt;a href="https://debezium.io/" rel="noopener noreferrer"&gt;Debezium&lt;/a&gt; or similar tools for the outbox pattern. With such tools, you only need to insert into the outbox table, and the tool handles the rest by publishing the messages to the message broker. It does this by processing the database change logs, called WAL (Write-Ahead Logging) in PostgreSQL. I'm a proponent of Debezium myself, but there are situations in which teams prefer to implement the outbox pattern manually. Examples include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;avoiding the additional complexity of using Debezium, &lt;/li&gt;
&lt;li&gt;publishing to something other than Kafka (Debezium supports other sinks but at additional complexity), and&lt;/li&gt;
&lt;li&gt;avoiding logical replication in PostgreSQL (to simplify upgrades and avoid potential issues with logical replication slots).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As such, this blog post is not about Debezium, but rather about the outbox pattern itself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Table of Contents
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Incident 1: Sorting by created_at
Incident 2: Using the wrong data types
Incident 3: Incorrect indexing
  * Bad alternative 1
  * Bad alternative 2
Incident 4: Picking too many messages at each iteration
Incident 5: Poison pill messages
Incident 6: Inefficient index and visibility checking
  * Showcasing the issue
    - Setting up the outbox table and testing the index
    - Looking at the query plan
    - Simulating outbox message insertion and processing
    - Observing the query plan changes
  * How to fix the issue
    - Step 1: Create the partitioned outbox table
    - Step 2: Test the setup
    - Step 3: Test outbox message-processing performance
Conclusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Incident 1: Sorting by created_at
&lt;/h1&gt;

&lt;p&gt;In distributed systems, deriving a total ordering of events from local timestamps is a fatal mistake. The first incident occurred when each pod of the microservice labeled its outbox messages with its own &lt;code&gt;created_at&lt;/code&gt; timestamp. The message publisher fetched messages from the outbox and sorted them by &lt;code&gt;created_at&lt;/code&gt; before sending them to the message broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This led to a situation where messages were sent in an order that was not consistent with the total ordering of events across the system, since the pods could have slightly different clocks. In our case, the difference was only a fraction of a millisecond, but it was enough to cause issues in a highly concurrent system.&lt;/p&gt;

&lt;p&gt;Setting &lt;code&gt;created_at&lt;/code&gt; using a single source of truth, such as the database server's clock, can help mitigate this issue. However, experience shows that relying on time for ordering is not a good idea in distributed systems. Instead, consider using a database-supplied sequence number.&lt;/p&gt;

&lt;p&gt;In PostgreSQL, define the &lt;code&gt;id&lt;/code&gt; column as &lt;code&gt;GENERATED ALWAYS AS IDENTITY&lt;/code&gt;, and then use it to order the messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Incident 2: Using the wrong data types
&lt;/h1&gt;

&lt;p&gt;Continuing with the previous tip, always use the &lt;code&gt;int8&lt;/code&gt; (= &lt;code&gt;bigint&lt;/code&gt;) data type for the &lt;code&gt;id&lt;/code&gt; column, rather than &lt;code&gt;int4&lt;/code&gt; (= &lt;code&gt;int&lt;/code&gt; or &lt;code&gt;integer&lt;/code&gt;). The latter allows for only about two billion entries (assuming signed integers), and I've seen incidents where the DB ran out of sequence numbers (though not in the context of the outbox).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tips:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;a href="https://github.com/postgres-ai/postgres-howtos/blob/main/0080_how_to_find_int4_pks_with_out_of_range_risks.md" rel="noopener noreferrer"&gt;this script&lt;/a&gt; to find &lt;code&gt;int4&lt;/code&gt; primary keys with out-of-range risks in a large database.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://github.com/postgres-ai/postgres-howtos/blob/main/0033_how_to_redefine_a_PK_without_downtime.md" rel="noopener noreferrer"&gt;this technique&lt;/a&gt; to redefine an &lt;code&gt;int4&lt;/code&gt; primary key to &lt;code&gt;int8&lt;/code&gt; without downtime.&lt;/li&gt;
&lt;/ol&gt;
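
&lt;p&gt;To spot this risk early, you can monitor how much headroom your sequences have left. The following query is a sketch, assuming PostgreSQL 10+ (where the &lt;code&gt;pg_sequences&lt;/code&gt; view is available); the 80% threshold is an arbitrary example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- approximate consumption of each sequence relative to the int4 maximum
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 2) AS pct_of_int4_max
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY pct_of_int4_max DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any sequence above, say, 80% deserves a migration plan before it overflows.&lt;/p&gt;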

&lt;p&gt;Also, for timestamps, always use the variant that accounts for time zones, such as &lt;code&gt;timestamptz&lt;/code&gt; in PostgreSQL.&lt;/p&gt;
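
&lt;p&gt;A quick illustration of the difference (a sketch; the exact output depends on your session's &lt;code&gt;timezone&lt;/code&gt; setting): casting to &lt;code&gt;timestamp&lt;/code&gt; silently drops the UTC offset, while &lt;code&gt;timestamptz&lt;/code&gt; honors and normalizes it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SET timezone = 'UTC';

-- the +02 offset is silently discarded
SELECT '2025-01-01 10:00:00+02'::timestamp;    -- 2025-01-01 10:00:00

-- the offset is applied and the value normalized to the session time zone
SELECT '2025-01-01 10:00:00+02'::timestamptz;  -- 2025-01-01 08:00:00+00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;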

&lt;p&gt;For the &lt;code&gt;payload&lt;/code&gt; column, you can use &lt;code&gt;text&lt;/code&gt;, or the &lt;code&gt;json/jsonb&lt;/code&gt; types in PostgreSQL. The JSON types are better if you want to store a serialized version of your data and possibly index it. However, they can be slower to store and retrieve, or bulkier. Also, if your JSON serializer may output the character &lt;code&gt;\x00&lt;/code&gt;, you can't use &lt;code&gt;jsonb&lt;/code&gt;, since it doesn't allow null characters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;o_jsonb&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;o_jsonb&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{"resp":"&lt;/span&gt;&lt;span class="se"&gt;\x&lt;/span&gt;&lt;span class="s1"&gt;00"}'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  invalid input syntax for type json
LINE 1: insert into o_jsonb values ('{"resp":"\x00"}');
                                    ^
DETAIL:  Escape sequence "\x" is invalid.
CONTEXT:  JSON data, line 1: {"resp":"\x...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More info: &lt;a href="https://vladimir.varank.in/notes/2021/01/you-dont-insert-unicode-null-character-as-postgres-jsonb/" rel="noopener noreferrer"&gt;(You don't) Insert unicode NULL character as Postgres jsonb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All in all, the outbox table should look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Incident 3: Incorrect indexing
&lt;/h1&gt;

&lt;p&gt;This is the most common incident I've seen. For the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The correct index is a partial index as follows. Note the column being indexed is &lt;code&gt;id&lt;/code&gt;, and the condition is &lt;code&gt;published_at IS NULL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx1&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's insert some data into the outbox table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- insert data with non-null published_at&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- insert data with null published_at&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the query plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which uses the index &lt;code&gt;idx1&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;              QUERY PLAN
---------------------------------------
 Limit
   -&amp;gt;  Index Scan using idx1 on outbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bad alternative 1
&lt;/h2&gt;

&lt;p&gt;If you drop the index (&lt;code&gt;DROP INDEX idx1;&lt;/code&gt;), the planner will use the primary key index instead, but since that index is not partial, it will be much bigger and less efficient; note the &lt;code&gt;Filter&lt;/code&gt; condition in the query plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                  QUERY PLAN
----------------------------------------------
 Limit
   -&amp;gt;  Index Scan using outbox_pkey on outbox
         Filter: (published_at IS NULL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the query sorted by &lt;code&gt;created_at&lt;/code&gt;, which is not indexed, instead of &lt;code&gt;id&lt;/code&gt;, Postgres would perform a sequential scan and sort, which is even worse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                  QUERY PLAN
----------------------------------------------
 Limit
   -&amp;gt;  Sort
         Sort Key: created_at
         -&amp;gt;  Seq Scan on outbox
               Filter: (published_at IS NULL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bad alternative 2
&lt;/h2&gt;

&lt;p&gt;I've seen several cases where the index is defined on the &lt;code&gt;published_at&lt;/code&gt; column instead of &lt;code&gt;id&lt;/code&gt;. This is totally useless: the B-Tree index contains nothing but &lt;code&gt;NULL&lt;/code&gt; values (the partial condition guarantees it), and there's no way to use it for both filtering and sorting by &lt;code&gt;id&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ Don't do this ❌&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx2&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Incident 4: Picking too many messages at each iteration
&lt;/h1&gt;

&lt;p&gt;PostgreSQL hates long-running transactions, and the outbox pattern is no exception. If you pick too many messages at once, you may end up with a long-running transaction that blocks other operations on the outbox table (such as autovacuum), leading to performance degradation. If N is too large, PostgreSQL may even prefer not to use the index at all!&lt;/p&gt;

&lt;p&gt;Notice that the outbox pattern does the following steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a transaction&lt;/li&gt;
&lt;li&gt;Fetch N messages from the outbox&lt;/li&gt;
&lt;li&gt;Publish the messages to the message broker&lt;/li&gt;
&lt;li&gt;Commit the transaction&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 3 entails network I/O, which can take a long time, especially if the message broker is slow or the network is congested. If you pick too many messages at once, the transaction will take longer to complete. Also, any failure will result in the entire transaction being rolled back, which means all the messages will be reprocessed.&lt;/p&gt;

&lt;p&gt;Do some performance testing to find the optimal number of messages to pick at each iteration. In my experience, picking ~100 messages at a time is a good starting point. Do not pick more than 1000 messages at once!&lt;/p&gt;

&lt;p&gt;Also, if possible, send the messages to the message broker in batches, rather than one by one. The batches should honor the message order: group messages by some criterion (e.g., user ID, account ID, etc.), and send the messages in each group in order, while allowing different groups to be processed in parallel. We use Kotlin coroutines rather than threads for maximum throughput.&lt;/p&gt;

&lt;h1&gt;
  
  
  Incident 5: Poison pill messages
&lt;/h1&gt;

&lt;p&gt;A message that cannot be processed due to some error is called a "poison pill" message. If such a message is picked up by the outbox publisher, it will cause the entire transaction to fail, and the message will be reprocessed indefinitely. Grouping messages, as mentioned in the previous incident, can help limit the impact of poison pill messages: while the DB transaction still spans all the groups, each group can be processed independently, and exceptions in one group are caught so they do not affect the others.&lt;/p&gt;

&lt;p&gt;However, this is not a perfect solution, and alerting should be in place to notify the team when a poison pill message is detected. The team should then investigate the root cause and fix it. (Example: The Kafka topic to which the message is being published has a new schema in the Schema Registry that is incompatible with the message payload.)&lt;/p&gt;
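
&lt;p&gt;One common mitigation is to track failed publish attempts per message and skip messages that have failed too often, leaving them for manual inspection. This is a sketch, not part of the schema above; the &lt;code&gt;retry_count&lt;/code&gt; column, the &lt;code&gt;:failed_id&lt;/code&gt; placeholder, and the threshold of 5 are all hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- hypothetical column counting failed publish attempts
ALTER TABLE outbox ADD COLUMN retry_count INT NOT NULL DEFAULT 0;

-- the publisher increments the counter when publishing a message fails
UPDATE outbox SET retry_count = retry_count + 1 WHERE id = :failed_id;

-- the fetch query skips suspected poison pills
SELECT * FROM outbox
WHERE published_at IS NULL AND retry_count &lt; 5
ORDER BY id
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The skipped messages are exactly what the alerting mentioned above should surface.&lt;/p&gt;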

&lt;h1&gt;
  
  
  Incident 6: Inefficient index and visibility checking
&lt;/h1&gt;

&lt;p&gt;This is by far the most complex issue, and understanding it requires a bit of background on how PostgreSQL handles visibility and indexing. It is also the main reason I wrote this blog post, as the solution we came up with was really rewarding in terms of performance and reliability. It essentially revamps the outbox pattern: we use partitioned tables without additional indexes, which allows old messages to be removed in the blink of an eye!&lt;/p&gt;

&lt;p&gt;The issue arises when the outbox table is being inserted into frequently, and at the same time the messages are being published to the message broker at a fast rate. This causes any index on the outbox table to become "inefficient" (accumulating dead tuples and becoming bloated), which in turn leads to performance degradation. The inefficient index takes up more space and requires more time to scan. Besides, the visibility checking mechanism in PostgreSQL can cause additional overhead, as it needs to check the visibility of each row in the index, and possibly fetch many rows from the heap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Showcasing the issue
&lt;/h2&gt;

&lt;p&gt;Let's first demonstrate the issue with a simple example. It is important to note that this kind of issue is most pronounced in high-throughput systems, where the outbox table is inserted into and processed at a fast rate. So, if you have a low-throughput system, you may not see this issue in practice, but it is still good to be aware of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up the outbox table and testing the index
&lt;/h3&gt;

&lt;p&gt;This can be demonstrated with a simple example. We first set up the outbox table and its partial index as described earlier. I'm using an &lt;code&gt;UNLOGGED&lt;/code&gt; table to speed up the tests, but you should use a regular table in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;UNLOGGED&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx1&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we insert a large number of messages into the outbox table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Looking at the query plan
&lt;/h3&gt;

&lt;p&gt;Next, let's check the query plan for fetching messages from the outbox. Notice that I'm only fetching the &lt;code&gt;id&lt;/code&gt; column, to enforce the so-called "index-only scan." I'm also fetching a large number of messages (1000), in contrast to what was suggested earlier (see Incident 4). Both choices are for demonstration purposes only, to make a point that will become clear shortly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query plan shows that Postgres is using the index &lt;code&gt;idx1&lt;/code&gt; to fetch the messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                              QUERY PLAN
-----------------------------------------------------------------------
 Limit (actual rows=1000 loops=1)
   -&amp;gt;  Index Only Scan using idx1 on outbox (actual rows=1000 loops=1)
         Heap Fetches: 0
 Planning Time: 0.137 ms
 Execution Time: 0.133 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As expected, the index &lt;code&gt;idx1&lt;/code&gt; is used, the planner uses an "index-only scan," and there are no heap fetches, meaning that all the required data is available in the index itself. In other words, after reading the data from the index, Postgres does not need to access the heap (the actual table) to fetch any extra information. The query is fast, and it only takes a fraction of a millisecond to execute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simulating outbox message insertion and processing
&lt;/h3&gt;

&lt;p&gt;Now, let's run two additional parallel sessions: one that inserts new messages into the outbox table, and another that processes messages from the outbox. While both parallel sessions are running, we'll check the query plan again and again to see how it changes over time. Here, I opted for a simple loop rather than a &lt;code&gt;pgbench&lt;/code&gt; script, as I felt it's more beginner-friendly and easier to understand. However, you can use &lt;code&gt;pgbench&lt;/code&gt; or any other tool to simulate the load in a more realistic way, and also gather statistics on the performance of each parallel session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel Session 1: Insert messages into the outbox table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;LOOP&lt;/span&gt;
        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;-- Commit the transaction to avoid long-running transactions&lt;/span&gt;
        &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Parallel Session 2: Process messages from the outbox table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;LOOP&lt;/span&gt;
        &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
            &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
            &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
            &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
            &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
        &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;-- Commit the transaction to avoid long-running transactions&lt;/span&gt;
        &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Observing the query plan changes
&lt;/h3&gt;

&lt;p&gt;While the two parallel sessions are running, let's check the query plan again and again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After just a few seconds, I got this awful query plan. Note that it fetches close to 100 million rows from the heap, and takes over 18.5 seconds to execute. This is a slowdown of &lt;strong&gt;5 orders of magnitude&lt;/strong&gt; compared to the initial query plan, and can easily lead to an incident.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                              QUERY PLAN
-----------------------------------------------------------------------
 Limit (actual rows=1000 loops=1)
   -&amp;gt;  Index Only Scan using idx1 on outbox (actual rows=1000 loops=1)
         Heap Fetches: 96633220
 Planning Time: 0.067 ms
 Execution Time: 18553.061 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I should point out that not all plans were this bad. I executed it several times, and the results I got were as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Heap Fetches&lt;/th&gt;
&lt;th&gt;Execution Time (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;870094&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;96633220&lt;/td&gt;
&lt;td&gt;18553&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;11000&lt;/td&gt;
&lt;td&gt;2844&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1000&lt;/td&gt;
&lt;td&gt;1791&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;11000&lt;/td&gt;
&lt;td&gt;2708&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reason for this behavior is that autovacuum kicks in, cleans up the table and the index, and updates the visibility map. As such, the number of heap fetches varies over time, but in most cases the query was still 4 orders of magnitude slower!&lt;/p&gt;
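
&lt;p&gt;You can watch this churn while the two sessions are running, by monitoring dead tuples and autovacuum activity for the table (a sketch using the standard &lt;code&gt;pg_stat_user_tables&lt;/code&gt; view):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- live vs. dead tuples, and when autovacuum last ran on the outbox table
SELECT relname, n_live_tup, n_dead_tup,
       last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'outbox';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running this repeatedly shows &lt;code&gt;n_dead_tup&lt;/code&gt; climbing between autovacuum runs and dropping after each one, which mirrors the fluctuating heap fetches above.&lt;/p&gt;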

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; While vacuum can clean up the index, it only marks dead rows in the index as invalid, and does not actually remove them from the index (because doing so would require rebuilding the index structure, which would slow down vacuum). Over time, this causes the index to become bloated. We'll discuss this in &lt;a href="https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf"&gt;Part 2&lt;/a&gt; of this blog post.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to fix the issue
&lt;/h2&gt;

&lt;p&gt;Fixing the issue is actually quite simple: Just use a partitioned outbox table. More specifically, the outbox table is partitioned by whether &lt;code&gt;published_at&lt;/code&gt; is &lt;code&gt;NULL&lt;/code&gt; or not. This way, one partition will always contain the messages that are not yet published, and the other partition will contain the messages that are already published. There will be no need for fetching tons of rows from the heap, since any row in the "unpublished" partition is guaranteed to have &lt;code&gt;published_at IS NULL&lt;/code&gt;. Furthermore, it is straightforward to remove all messages from the "published" partition: You can simply &lt;code&gt;TRUNCATE&lt;/code&gt; the partition regularly, which is a rapid operation in PostgreSQL. Finally, the operations are more "vacuum-friendly," as the "published" partition is only inserted to.&lt;/p&gt;
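&lt;p&gt;As a sketch of the cleanup just described, referring to the &lt;code&gt;outbox_published&lt;/code&gt; partition we create below: the job name and schedule here are hypothetical, and &lt;code&gt;pg_cron&lt;/code&gt; is just one option — any external scheduler works equally well.&lt;/p&gt;

```sql
-- Hypothetical nightly cleanup of already-published messages.
-- TRUNCATE only touches the "published" partition, so unpublished
-- messages are never at risk. Scheduling via pg_cron is an assumption;
-- the extension must be installed for cron.schedule to exist.
SELECT cron.schedule(
    'purge-outbox',                      -- hypothetical job name
    '0 3 * * *',                         -- every day at 03:00
    $$TRUNCATE ONLY outbox_published$$
);
```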

&lt;p&gt;Let's dive in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create the partitioned outbox table
&lt;/h3&gt;

&lt;p&gt;We create the &lt;code&gt;outbox&lt;/code&gt; table as before (note that &lt;code&gt;UNLOGGED&lt;/code&gt; is for testing purposes; use a regular table in production), with two differences:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The table is partitioned by the &lt;code&gt;published_at&lt;/code&gt; column.&lt;/li&gt;
&lt;li&gt;The table has no primary key: on a partitioned table, a primary key must include the partition key, and &lt;code&gt;published_at&lt;/code&gt; cannot be part of one because it is nullable. Instead, we rely on &lt;code&gt;GENERATED ALWAYS AS IDENTITY&lt;/code&gt; to generate unique IDs. Though in some cases (e.g., logical replication) the lack of a primary key can cause issues, this is an acceptable trade-off for an outbox table in most use cases.
&lt;/li&gt;
&lt;/ol&gt;
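&lt;p&gt;To see the constraint in action, you can try adding the primary key yourself. PostgreSQL rejects it, since a unique constraint on a partitioned table must cover the partition key (the error text below is from recent Postgres versions and may differ slightly in yours):&lt;/p&gt;

```sql
-- Attempting a primary key on id alone fails on the partitioned table:
ALTER TABLE outbox ADD PRIMARY KEY (id);
-- ERROR:  unique constraint on partitioned table must include all partitioning columns

-- Including published_at in the key is not an option either:
-- primary-key columns must be NOT NULL, while published_at is NULL
-- by design for every unpublished message.
```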

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;UNLOGGED&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt; &lt;span class="k"&gt;GENERATED&lt;/span&gt; &lt;span class="n"&gt;ALWAYS&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;LIST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;published_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we create two partitions: one for the messages that are not yet published, and another for the messages that are already published. The first partition is called &lt;code&gt;outbox_unpublished&lt;/code&gt;, and the second partition is called &lt;code&gt;outbox_published&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;UNLOGGED&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt; 
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;UNLOGGED&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;outbox_published&lt;/span&gt; 
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;OF&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;DEFAULT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we create the index on the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, as it is the one that will be queried for fetching messages to publish. The other partition will not be queried, so it does not need an index. Also, the index should no longer be a partial index, as the partition itself is already filtered by &lt;code&gt;published_at IS NULL&lt;/code&gt;. However, for demonstration purposes and showing heap fetches, I opt to create a partial index anyway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx1&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;outbox_unpublished&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;-- This is unnecessary, as the partition is already filtered by published_at IS NULL&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Test the setup
&lt;/h3&gt;

&lt;p&gt;Let's first insert some messages into the &lt;code&gt;outbox&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's check the stats of the number of tuples inserted, updated, and deleted in each partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_ins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_upd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_del&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'outbox_%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is as expected: All 1 million tuples are inserted into the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      relname       | n_tup_ins | n_tup_upd | n_tup_del
--------------------+-----------+-----------+-----------
 outbox_unpublished |   1000000 |         0 |         0
 outbox_published   |         0 |         0 |         0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if we update the &lt;code&gt;published_at&lt;/code&gt; column for a single row? The wrong way to do this, which is an easy way to create an incident, is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ❌ Don't do this ❌&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why on earth is this a bad idea? Because the outbox table has no index on the &lt;code&gt;id&lt;/code&gt; column, remember? Only the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition has an index on the &lt;code&gt;id&lt;/code&gt; column, so the above query will perform a sequential scan on the &lt;code&gt;outbox_published&lt;/code&gt; partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUMMARY&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's the query plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                     QUERY PLAN
-----------------------------------------------------
 Update on outbox
   Update on outbox_unpublished outbox_1
   Update on outbox_published outbox_2
   -&amp;gt;  Append
         -&amp;gt;  Seq Scan on outbox_unpublished outbox_1
               Filter: (id = 1)
         -&amp;gt;  Seq Scan on outbox_published outbox_2
               Filter: (id = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You see &lt;code&gt;Seq Scan&lt;/code&gt; on both partitions as I opted for a partial index on the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, but as I pointed out earlier, this is just for demonstration purposes. &lt;/p&gt;

&lt;p&gt;Regardless, the point is that the &lt;code&gt;outbox_published&lt;/code&gt; partition will be sequentially scanned. In the beginning, when &lt;code&gt;outbox_published&lt;/code&gt; is empty, this &lt;code&gt;Seq Scan&lt;/code&gt; is fast. But as soon as the partition starts to accumulate messages, the query becomes slower and slower, to the point where it can cause an incident.&lt;/p&gt;

&lt;p&gt;With partitioned tables, it's always best to include the partition key in the &lt;code&gt;WHERE&lt;/code&gt; clause. In our case, we can do this by using the &lt;code&gt;published_at IS NULL&lt;/code&gt; condition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUMMARY&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;-- This is the correct way to do it&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above will only scan the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition, and will be fast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                         QUERY PLAN
------------------------------------------------------------
 Update on outbox
   Update on outbox_unpublished outbox_1
   -&amp;gt;  Index Scan using idx1 on outbox_unpublished outbox_1
         Index Cond: (id = 1)
         Filter: (published_at IS NULL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's now actually run the update query, and check the stats again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update the published_at column for a single row&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check the stats again&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_ins&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_upd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_tup_del&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'outbox_%'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stats now show that one tuple was deleted from the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition and inserted into the &lt;code&gt;outbox_published&lt;/code&gt; partition; no tuples were updated in place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      relname       | n_tup_ins | n_tup_upd | n_tup_del
--------------------+-----------+-----------+-----------
 outbox_unpublished |   1000000 |         0 |         1
 outbox_published   |         1 |         0 |         0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
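&lt;p&gt;As a quick sanity check (not strictly necessary), you can ask which partition the moved row now lives in via the &lt;code&gt;tableoid&lt;/code&gt; system column:&lt;/p&gt;

```sql
-- tableoid identifies the physical partition holding each row;
-- casting it to regclass prints the partition's name.
-- (Fine as a one-off check, even though it scans both partitions.)
SELECT tableoid::regclass AS partition, id, published_at
FROM outbox
WHERE id = 1;
```

&lt;p&gt;The row should now be reported under &lt;code&gt;outbox_published&lt;/code&gt;, with a non-&lt;code&gt;NULL&lt;/code&gt; &lt;code&gt;published_at&lt;/code&gt;.&lt;/p&gt;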



&lt;h3&gt;
  
  
  Step 3: Test outbox message-processing performance
&lt;/h3&gt;

&lt;p&gt;Before running the message-processing query, let's check the query plan for fetching messages from the outbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Limit (actual rows=1000 loops=1)
   -&amp;gt;  Index Only Scan using idx1 on outbox_unpublished outbox (actual rows=1000 loops=1)
         Heap Fetches: 156
 Planning Time: 0.170 ms
 Execution Time: 0.165 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait, what?! There are heap fetches? Yes, because we updated the &lt;code&gt;published_at&lt;/code&gt; column for a single row, and the change is not significant enough to trigger an autovacuum. Do a manual &lt;code&gt;VACUUM outbox&lt;/code&gt;, and check the query plan again. You'll be surprised to see that there are still 156 heap fetches!&lt;/p&gt;

&lt;p&gt;The reason is that, starting from Postgres 14, VACUUM may skip index cleanup when the number of removable index entries is insignificant. This behavior is controlled by an option called &lt;code&gt;INDEX_CLEANUP&lt;/code&gt;, which is set to &lt;code&gt;AUTO&lt;/code&gt; by default. To force the index cleanup, set it to &lt;code&gt;ON&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;INDEX_CLEANUP&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is as expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Limit (actual rows=1000 loops=1)
   -&amp;gt;  Index Only Scan using idx1 on outbox_unpublished outbox (actual rows=1000 loops=1)
         Heap Fetches: 0
 Planning Time: 0.166 ms
 Execution Time: 0.135 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'll reiterate that index cleanup is different from index bloat removal. This is a topic for the next part of this blog post, where we'll discuss how to remove index bloat and improve performance even further.&lt;/p&gt;
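&lt;p&gt;If you want to observe the bloat yourself in the meantime, the &lt;code&gt;pgstattuple&lt;/code&gt; contrib extension (assuming it is available on your instance) can report how densely packed the index's leaf pages are:&lt;/p&gt;

```sql
-- pgstattuple ships with PostgreSQL contrib, but must be enabled:
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- avg_leaf_density drops as the B-tree accumulates dead or unused space;
-- a freshly built B-tree index sits around 90% (the default fillfactor).
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx1');
```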

&lt;p&gt;Okay, let's now run the two parallel sessions again. The code for session 1 is exactly the same as before, but for session 2, we need to change the query so that the &lt;code&gt;UPDATE&lt;/code&gt; statement uses the additional condition &lt;code&gt;published_at IS NULL&lt;/code&gt; to ensure that it only updates the &lt;code&gt;outbox_unpublished&lt;/code&gt; partition:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel Session 1: Insert messages into the outbox table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;LOOP&lt;/span&gt;
        &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;
        &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Parallel Session 2: Process messages from the outbox table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="n"&gt;LOOP&lt;/span&gt;
        &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
            &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
            &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
            &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
            &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
        &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;-- Ensure we only update the unpublished partition&lt;/span&gt;
        &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Interestingly, the fetch query is pretty fast, and seemingly unaffected by the parallel sessions. I ran it many times over, and it gave me a consistent result of 1000 heap fetches and an execution time of 1–3 ms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ANALYZE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TIMING&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSTS&lt;/span&gt; &lt;span class="k"&gt;OFF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;outbox&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;published_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
    &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Limit (actual rows=1000 loops=1)
   -&amp;gt;  Index Only Scan using idx1 on outbox_unpublished outbox (actual rows=1000 loops=1)
         Heap Fetches: 1000
 Planning Time: 0.042 ms
 Execution Time: 2.543 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;The outbox pattern is a powerful tool for ensuring reliable message delivery in distributed systems. However, it can lead to incidents if not implemented correctly. In this blog post, we discussed several common pitfalls and how to avoid them, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sorting by &lt;code&gt;created_at&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Using the wrong data types&lt;/li&gt;
&lt;li&gt;Incorrect indexing&lt;/li&gt;
&lt;li&gt;Picking too many messages at each iteration&lt;/li&gt;
&lt;li&gt;Poison pill messages&lt;/li&gt;
&lt;li&gt;Inefficient index and visibility checking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last point is particularly important, as it can lead to significant performance degradation in high-throughput systems. We showed how to revamp the outbox pattern by using partitioned tables, which allows for easy removal of old messages and avoids the issues with stale indexes and visibility checking.&lt;/p&gt;

&lt;p&gt;While this revamped solution is straightforward to implement on a greenfield project, it can be more challenging to apply to an existing system, because PostgreSQL does not allow converting a regular table to a partitioned table in place. In &lt;a href="https://dev.to/msdousti/postgresql-outbox-pattern-revamped-part-2-1cbf"&gt;Part 2&lt;/a&gt; of this blog post, we'll tackle this problem, and also discuss how to remove index bloat and improve performance even further by adjusting the settings of the &lt;code&gt;autovacuum&lt;/code&gt; daemon for the partitioned outbox table. Stay tuned, and we wish you an incident-free outbox pattern implementation!&lt;/p&gt;

</description>
      <category>outbox</category>
      <category>postgres</category>
      <category>performance</category>
      <category>incidents</category>
    </item>
    <item>
      <title>Investigating PostgreSQL's `ANALYZE` Runtime Behavior: Sampling, Table Size, and Cache Effects</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sun, 06 Apr 2025 16:47:06 +0000</pubDate>
      <link>https://dev.to/msdousti/investigating-postgresqls-analyze-runtime-behavior-sampling-table-size-and-cache-effects-1eln</link>
      <guid>https://dev.to/msdousti/investigating-postgresqls-analyze-runtime-behavior-sampling-table-size-and-cache-effects-1eln</guid>
<description>&lt;p&gt;PostgreSQL's &lt;code&gt;ANALYZE&lt;/code&gt; command is designed to collect statistics for the query planner by sampling a fixed number of pages from a table. As per the documentation, when &lt;code&gt;default_statistics_target&lt;/code&gt; is 100, the sample size is 30,000 pages (or less, if the table is small and does not have 30K pages). This leads to a reasonable assumption: &lt;strong&gt;the runtime of &lt;code&gt;ANALYZE&lt;/code&gt; should be independent of the table size&lt;/strong&gt;, given that the number of pages sampled is constant.&lt;/p&gt;
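&lt;p&gt;As a back-of-the-envelope check, the target sample size is a factor of 300 times the statistics target (the multiplier comes from PostgreSQL's internals), so it can be computed from the current setting:&lt;/p&gt;

```sql
-- ANALYZE samples about 300 * default_statistics_target rows
-- (reading at most that many pages), so with the default target of 100
-- the sample is 30,000.
SELECT current_setting('default_statistics_target')::int * 300
    AS target_sample_size;
```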

&lt;p&gt;However, experimental evidence suggests otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiment Setup
&lt;/h2&gt;

&lt;p&gt;We created five PostgreSQL tables of increasing size, from 100,000 to 400 million rows, using the following SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_enabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each table, we recorded (on a MacBook Air M3 with 24 GB RAM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table size (in MB and number of pages)&lt;/li&gt;
&lt;li&gt;Analyze time under three conditions:

&lt;ul&gt;
&lt;li&gt;First time after table creation&lt;/li&gt;
&lt;li&gt;Subsequent runs with data in cache&lt;/li&gt;
&lt;li&gt;After purging OS and PostgreSQL cache using:
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew services stop postgresql@17 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;purge &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew services start postgresql@17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux, use the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop postgresql &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'echo 3 &amp;gt; /proc/sys/vm/drop_caches'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;th&gt;Size (MB)&lt;/th&gt;
&lt;th&gt;Pages&lt;/th&gt;
&lt;th&gt;First Analyze (ms)&lt;/th&gt;
&lt;th&gt;Subsequent Analyze (ms)&lt;/th&gt;
&lt;th&gt;After Cache Purge (ms)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;448&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;4,480&lt;/td&gt;
&lt;td&gt;73&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000,000&lt;/td&gt;
&lt;td&gt;346&lt;/td&gt;
&lt;td&gt;44,288&lt;/td&gt;
&lt;td&gt;299&lt;/td&gt;
&lt;td&gt;209&lt;/td&gt;
&lt;td&gt;1,461&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000,000&lt;/td&gt;
&lt;td&gt;3,457&lt;/td&gt;
&lt;td&gt;442,496&lt;/td&gt;
&lt;td&gt;459&lt;/td&gt;
&lt;td&gt;454&lt;/td&gt;
&lt;td&gt;7,161&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;400,000,000&lt;/td&gt;
&lt;td&gt;13,827&lt;/td&gt;
&lt;td&gt;1,769,920&lt;/td&gt;
&lt;td&gt;1,426&lt;/td&gt;
&lt;td&gt;1,459&lt;/td&gt;
&lt;td&gt;8,630&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwtq9lzj26r01sirqexi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgwtq9lzj26r01sirqexi.png" alt="Analyze Time vs Table Size" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above chart shows that even though PostgreSQL samples a fixed number of pages, the &lt;code&gt;ANALYZE&lt;/code&gt; runtime &lt;strong&gt;increases with table size&lt;/strong&gt;, especially after purging caches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Possible Explanation: Reservoir Sampling Complexity
&lt;/h2&gt;

&lt;p&gt;One explanation lies in how PostgreSQL implements sampling. PostgreSQL uses &lt;a href="https://en.wikipedia.org/wiki/Reservoir_sampling" rel="noopener noreferrer"&gt;reservoir sampling&lt;/a&gt;, a technique that allows uniform random sampling from a stream (i.e., the full table). While the sample size is fixed (e.g., 30,000 pages), the &lt;strong&gt;time complexity of the algorithm is not constant&lt;/strong&gt;. The naive implementation is linear in the stream length; the optimized versions bring the number of random draws down to roughly logarithmic, though the items in between still have to be skipped over.&lt;/p&gt;
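&lt;p&gt;As a minimal illustration (a naive Python sketch, not PostgreSQL's implementation), the basic "Algorithm R" form of reservoir sampling keeps the reservoir at a fixed size, yet still visits every item in the stream once:&lt;/p&gt;

```python
import random

def reservoir_sample(stream, k):
    """Naive reservoir sampling (Algorithm R): uniform sample of k items.

    The reservoir never grows past k items, but every element of the
    stream is still visited once, so a pass is linear in the stream length.
    """
    reservoir = []
    for i, item in enumerate(stream):
        if len(reservoir) != k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # keep the new item with probability k / (i + 1)
            j = random.randrange(i + 1)
            if j in range(k):
                reservoir[j] = item
    return reservoir

# e.g. sample 30,000 "pages" out of a 1,000,000-page table
sample = reservoir_sample(range(1_000_000), 30_000)
```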

&lt;p&gt;PostgreSQL &lt;a href="https://github.com/postgres/postgres/blob/2e4ccf1b4508cc337bb4d0afff1e32a049d549fc/src/backend/utils/misc/sampling.c#L118-L131" rel="noopener noreferrer"&gt;uses&lt;/a&gt; Vitter's Reservoir Sampling Algorithm Z:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
 * These two routines embody Algorithm Z from "Random sampling with a
 * reservoir" by Jeffrey S. Vitter, in ACM Trans. Math. Softw. 11, 1
 * (Mar. 1985), Pages 37-57.  Vitter describes his algorithm in terms
 * of the count S of records to skip before processing another record.
 * It is computed primarily based on t, the number of records already read.
 * The only extra state needed between calls is W, a random state variable.
 *
 * reservoir_init_selection_state computes the initial W value.
 *
 * Given that we've already read t records (t &amp;gt;= n), reservoir_get_next_S
 * determines the number of records to skip before the next record is
 * processed.
 */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also &lt;a href="https://github.com/postgres/postgres/blob/5e1915439085014140314979c4dd5e23bd677cac/src/backend/utils/misc/sampling.c#L32C1-L34C14" rel="noopener noreferrer"&gt;uses&lt;/a&gt; Knuth's simpler Algorithm S, which applies when the total number of items to be sampled from is known in advance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
 * Since we know the total number of blocks in advance, we can use the
 * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
 * algorithm.
 */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
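&lt;p&gt;For illustration, here is a minimal Python sketch of Knuth's selection-sampling idea (not PostgreSQL's C implementation): each index is visited once and kept with probability (still needed) / (still remaining), which yields exactly &lt;code&gt;n&lt;/code&gt; uniformly chosen items while the scan remains linear in the total count:&lt;/p&gt;

```python
import random

def algorithm_s(total, n):
    """Knuth's selection sampling (Algorithm S, TAOCP 3.4.2).

    Works when the total number of items is known in advance: each index
    is visited once and kept with probability needed / remaining, which
    yields exactly n uniformly chosen indices in sorted order.
    """
    selected = []
    for t in range(total):
        remaining = total - t
        needed = n - len(selected)
        # keep index t with probability needed / remaining
        if random.randrange(remaining) in range(needed):
            selected.append(t)
    return selected

# e.g. choose 30,000 block numbers out of 442,496
blocks = algorithm_s(442_496, 30_000)
```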



&lt;p&gt;The source code actually mentions that the implemented algorithm &lt;a href="https://github.com/postgres/postgres/blob/5e1915439085014140314979c4dd5e23bd677cac/src/backend/utils/misc/sampling.c#L81" rel="noopener noreferrer"&gt;may deviate&lt;/a&gt; from Algorithm S:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
 * It is not obvious that this code matches Knuth's Algorithm S.
 * Knuth says to skip the current block with probability 1 - k/K.
 * If we are to skip, we should advance t (hence decrease K), and
 * repeat the same probabilistic test for the next block.
 */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means PostgreSQL must touch all pages of the table to decide whether to include them in the sample, even if it eventually keeps only 30,000 of them. This explains why larger tables take longer to &lt;code&gt;ANALYZE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But, all in all, it still seems that &lt;code&gt;analyze&lt;/code&gt; runtime depends considerably (perhaps even linearly) on the table size! In &lt;a href="https://www.postgresql.org/message-id/2r6r60h049h0lg4s9ve3qe1h38ubprpo30%40email.aon.at" rel="noopener noreferrer"&gt;an email to Postgres Hackers back in 2004&lt;/a&gt;, Manfred Koizar mentioned this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Playing around with some real numbers (for 100 tuples/page and a sample&lt;br&gt;
size of 3000) I got:&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; rel  | page
 size | reads
------+-------------
   30 |    30
  300 |   300    expectation is something like 299.9995 
  500 |   499
   1K |   990
   3K |  2.6K
  30K |    8K
 100K |   12K
   1M |   19K
  10M |   26K
 100M |   33K
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This growth rate is steeper than O(log(nblocks)).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have an idea how this could be done with O(1) page reads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What I have in mind is a kind of "Double Vitter" algorithm.  Whatever we&lt;br&gt;
do to get our sample of rows, in the end the sampled rows come from no&lt;br&gt;
more than sample_size different blocks.  So my idea is to first create a&lt;br&gt;
random sample of sample_size block numbers, and then to sample the rows&lt;br&gt;
out of this pool of blocks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The idea is implemented in Postgres, as the source code &lt;a href="https://github.com/postgres/postgres/blob/2e4ccf1b4508cc337bb4d0afff1e32a049d549fc/src/backend/utils/misc/sampling.c#L27" rel="noopener noreferrer"&gt;refers to it&lt;/a&gt;. However, based on my experiments, the running time of &lt;code&gt;analyze&lt;/code&gt; looks closer to O(nblocks) than to O(1) or O(log(nblocks)). Of course, asymptotic behavior cannot be established from numeric observations alone!&lt;/p&gt;
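&lt;p&gt;A hypothetical sketch of the two-stage idea from the mailing-list post (names and structure are illustrative, not Postgres internals): sample a fixed set of block numbers first, then draw rows only from those blocks, so page reads are bounded by the sample size rather than the table size:&lt;/p&gt;

```python
import random

def two_stage_sample(n_blocks, rows_per_block, sample_size):
    """Sketch of the "sample blocks first" idea (illustrative, not Postgres).

    Stage 1 picks sample_size distinct block numbers; stage 2 draws one
    row from each chosen block. Page reads are bounded by sample_size,
    independent of n_blocks.
    """
    chosen_blocks = random.sample(range(n_blocks), min(sample_size, n_blocks))
    rows = [(b, random.randrange(rows_per_block)) for b in chosen_blocks]
    return chosen_blocks, rows

# e.g. 30,000 pages out of a 1,769,920-page table, ~226 rows per page
chosen, rows = two_stage_sample(1_769_920, 226, 30_000)
```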
&lt;h2&gt;
  
  
  Effect of Cache Purging
&lt;/h2&gt;

&lt;p&gt;The drastic increase in analyze time after purging caches highlights the importance of PostgreSQL shared buffers and OS-level caching. Without the cache:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each page access becomes a disk I/O operation.&lt;/li&gt;
&lt;li&gt;Disk reads (even from SSDs) are significantly slower than memory access.&lt;/li&gt;
&lt;li&gt;This especially affects large tables where random disk reads dominate the runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why, for the 400 million row table, analyze time jumped from ~1.4 seconds to over 8 seconds after cache purging.&lt;/p&gt;

&lt;p&gt;Interestingly, the analyze time did not decrease on subsequent runs until I ran vacuum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="n"&gt;timing&lt;/span&gt;
&lt;span class="n"&gt;Timing&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8630&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;107&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;08&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8706&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;080&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;08&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;706&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;vacuum&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;43634&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;979&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;43&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;635&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1548&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;534&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;549&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1434&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;224&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;00&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;434&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason is probably the sampling: each &lt;code&gt;analyze&lt;/code&gt; loads a different random sample into RAM, leading to a lot of disk I/O when the data is not cached. When &lt;code&gt;vacuum&lt;/code&gt; is run for the first time, on the other hand, it reads all the disk pages, warming up the caches.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vacuum freeze&lt;/code&gt; is even more aggressive, leading to a huge reduction in &lt;code&gt;analyze&lt;/code&gt; runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;vacuum&lt;/span&gt; &lt;span class="k"&gt;freeze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;70350&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;355&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;350&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;658&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;919&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;538&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;907&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;528&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;144&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t_400_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ANALYZE&lt;/span&gt;
&lt;span class="nb"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;499&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;968&lt;/span&gt; &lt;span class="n"&gt;ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still, after running &lt;code&gt;vacuum freeze&lt;/code&gt; on the 100M-row and 400M-row tables, the latter takes about 3x as long to analyze.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>statistics</category>
      <category>sampling</category>
      <category>cache</category>
    </item>
    <item>
      <title>The effect of frequent values on histogram_bounds in PostgreSQL</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sun, 06 Apr 2025 12:45:47 +0000</pubDate>
      <link>https://dev.to/msdousti/understanding-histogram-bounds-in-postgresql-54d1</link>
      <guid>https://dev.to/msdousti/understanding-histogram-bounds-in-postgresql-54d1</guid>
      <description>&lt;p&gt;In a &lt;a href="https://substack.com/home/post/p-157149678" rel="noopener noreferrer"&gt;recent blog post&lt;/a&gt;, I explored &lt;strong&gt;statistics in PostgreSQL&lt;/strong&gt;, with a focus on how the database engine builds &lt;strong&gt;histogram bounds&lt;/strong&gt; during the &lt;code&gt;ANALYZE&lt;/code&gt; process. These histograms are critical for query planning and optimization, as they give the planner a way to estimate data distributions across columns.&lt;/p&gt;

&lt;p&gt;One part of the post demonstrated how to generate a skewed distribution and inspect the resulting histogram bounds using the &lt;code&gt;pg_stats&lt;/code&gt; view. Here's the original setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_enabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a table with 500,500 rows, where 1 is repeated 1000 times, 2 is repeated 999 times, and so on, until 1000, which is repeated only once. This distribution is shown in the following diagram:&lt;/p&gt;
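&lt;p&gt;As a quick sanity check of the 500,500 figure: value &lt;code&gt;v&lt;/code&gt; is generated once for each &lt;code&gt;i&lt;/code&gt; from &lt;code&gt;v&lt;/code&gt; to 1000, so it appears 1001 - v times, and the counts sum to 1000 * 1001 / 2:&lt;/p&gt;

```python
# Each value v (1..1000) appears 1001 - v times in the generated table.
counts = {v: 1001 - v for v in range(1, 1001)}

total = sum(counts.values())
print(total)  # 500500 = 1000 * 1001 / 2
```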

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7brvpt9gae01goqsv1oy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7brvpt9gae01goqsv1oy.png" alt="Distribution of inserted data" width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The blog post also demonstrates the concept of histogram bounds in Postgres, which can be observed after analyzing the table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;histogram_bounds&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;histogram_bounds&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_stats&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'t'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is based on a random sampling of the table, so your mileage may vary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;5, 18, 32, 40, 54, 68, 80, 100, 110, 119, 125, 136, 144, 149, 
154, 160, 165, 171, 178, 182, 189, 195, 202, 211, 216, 222, 
227, 234, 239, 245, 250, 256, 261, 268, 273, 278, 283, 289, 
295, 301, 307, 314, 320, 325, 332, 339, 346, 352, 358, 364, 
370, 376, 383, 389, 396, 402, 409, 415, 422, 429, 436, 443, 
450, 458, 466, 473, 481, 489, 496, 503, 511, 519, 527, 536, 
545, 555, 564, 573, 583, 593, 603, 613, 623, 633, 645, 657, 
668, 680, 693, 706, 721, 734, 749, 765, 783, 802, 824, 846, 
873, 910, 994
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The expectation is that the histogram would reflect this skewed distribution, assigning &lt;strong&gt;more histogram buckets&lt;/strong&gt; to regions with &lt;strong&gt;higher data density&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reader Observation: Histogram Does Not Match Intuition
&lt;/h2&gt;

&lt;p&gt;Chris Jones, a reader of the blog, &lt;a href="https://traderepublic.substack.com/p/statistics-how-postgresql-counts-55d/comment/99729446" rel="noopener noreferrer"&gt;made a sharp observation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In your output of &lt;code&gt;histogram_bounds&lt;/code&gt; for table t above, it shows that there are 7 buckets for values 0-100, and 17 buckets for 200-300, even though we know that there are more records with values in the 0-100 range. I got similar bounds when I ran the same example. I tried with lots of different statistics_target values and always got the same. It seems like this histogram implies a distribution that is not correct.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's demonstrate his point by drawing the bucket distribution for the histogram bounds (using &lt;a href="https://gist.github.com/msdousti/f94ebbb9f69204f4de1ba64391b0f6c5" rel="noopener noreferrer"&gt;this Python script&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0rqx5jigpl414e7t0ku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp0rqx5jigpl414e7t0ku.png" alt="Bucket distribution for the histogram_bounds" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is an important point. If histogram bounds are used for cardinality estimation, shouldn't denser regions have &lt;strong&gt;finer granularity&lt;/strong&gt;?&lt;/p&gt;




&lt;h2&gt;
  
  
  Digging Deeper: Why Is the Histogram Counterintuitive?
&lt;/h2&gt;

&lt;p&gt;To understand this, I looked at the PostgreSQL source code. Specifically, this &lt;a href="https://github.com/postgres/postgres/blob/bd65cb3cd48a7a5ce48b26f8031ad3968efed87e/src/backend/commands/analyze.c#L2734-L2738" rel="noopener noreferrer"&gt;comment in &lt;code&gt;analyze.c&lt;/code&gt;&lt;/a&gt; explains part of the logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cm"&gt;/*
 * Generate a histogram slot entry if there are at least two distinct
 * values not accounted for in the MCV list. (This ensures the
 * histogram won't collapse to empty or a singleton.)
 */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This suggests that histogram bounds are only created when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The bucket contains at least two distinct elements;&lt;/li&gt;
&lt;li&gt;The values are not in the MCV (most common values) list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the above distribution, high-frequency values like 1 and 2 have a higher chance of being in the MCV list than low-frequency values like 900.&lt;/p&gt;
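&lt;p&gt;To make the comparison concrete, here is a small sketch of the relative frequencies (hypothetical arithmetic to illustrate the point, not PostgreSQL's actual MCV cutoff):&lt;/p&gt;

```python
# Relative frequency of value v in the skewed table: v appears 1001 - v times.
TOTAL_ROWS = 500_500

def frequency(v):
    return (1001 - v) / TOTAL_ROWS

# Value 1 is roughly 10x more frequent than value 900, making it a much
# stronger candidate for the most-common-values (MCV) list.
print(f"freq(1) = {frequency(1):.4%}, freq(900) = {frequency(900):.4%}")
```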




&lt;h2&gt;
  
  
  Verifying the Hypothesis
&lt;/h2&gt;

&lt;p&gt;To test this explanation, I modified the data by &lt;strong&gt;adding a small random offset&lt;/strong&gt; to each value. This ensures that every value is unique (or nearly unique), avoiding the issue of MCV elimination during histogram generation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_enabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;analyze&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;histogram_bounds&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;histogram_bounds&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_stats&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;tablename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'t'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this change, the histogram bounds now behave as expected: the 0–100 range contains &lt;strong&gt;many more histogram boundaries&lt;/strong&gt; than the 100–200 and 200–300 ranges. This better reflects the actual density of the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcp3unu0c7x3bp8t7ibl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcp3unu0c7x3bp8t7ibl.png" alt="Bucket distribution for the histogram_bounds with additive random noise" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;PostgreSQL's histogram statistics do account for &lt;strong&gt;data distribution&lt;/strong&gt;, but the internal logic excludes &lt;strong&gt;MCV values&lt;/strong&gt; when selecting histogram boundaries. As Robert Haas put it, "histogram is intended to be a histogram of non-MCVs".&lt;/p&gt;

&lt;p&gt;Thanks to Chris Jones for the sharp observation that triggered a deeper dive into how PostgreSQL builds histograms, and to Robert Haas for sharing the above insight on the &lt;a href="https://discord.com/invite/yMbds24D8n" rel="noopener noreferrer"&gt;PostgreSQL Hacking Discord.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>statistics</category>
      <category>database</category>
      <category>histograms</category>
    </item>
    <item>
      <title>PostgreSQL partitioning with desired index names</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sat, 01 Feb 2025 10:31:05 +0000</pubDate>
      <link>https://dev.to/msdousti/postgresql-partitioning-with-desired-index-names-1gcd</link>
      <guid>https://dev.to/msdousti/postgresql-partitioning-with-desired-index-names-1gcd</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;A while back, Hetti wrote an article titled &lt;a href="https://hdombrovskaya.wordpress.com/2025/01/26/indexing-partitioned-table-disaster/" rel="noopener noreferrer"&gt;Indexing Partitioned Table Disaster&lt;/a&gt;. There, she explained why the lack of control over index names in PostgreSQL table partitions can lead to some issues. We discussed a bit over &lt;a href="https://www.linkedin.com/feed/update/urn:li:activity:7289344003035086848?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7289344003035086848%2C7289406302823284736%29&amp;amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287289406302823284736%2Curn%3Ali%3Aactivity%3A7289344003035086848%29" rel="noopener noreferrer"&gt;LinkedIn comments&lt;/a&gt;, and this article is an elaboration on the discussion.&lt;/p&gt;

&lt;p&gt;To keep this article self-contained, I will show "the lack of control over index names in PostgreSQL table partitions", but not how it can lead to issues. For that, go read the great article above!&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;We first need to create a base table with some indexes. Here, I'm creating two indexes on the same set of columns, but one of them is a partial index. This is used later to prove a point; in real-world applications, I wouldn't allow the partition key (&lt;code&gt;created_at&lt;/code&gt;) to be nullable, nor would I create two indexes on the same list of columns (unless there's a good reason to do so, though I can't think of one right now!)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;identity&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at_nn__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a partition
&lt;/h2&gt;

&lt;p&gt;Next, let's create a single partition for the base table. This will serve for data in February 2025:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-02-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-03-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Querying indexes
&lt;/h2&gt;

&lt;p&gt;I came up with a rather complex recursive CTE query that shows all indexes defined on table &lt;code&gt;t&lt;/code&gt; and its partitions, along with whether each index is valid. We will run this query several times in this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="k"&gt;recursive&lt;/span&gt; &lt;span class="n"&gt;partitions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="c1"&gt;-- select the base partitioned table t&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;child_oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;table_name&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'t'&lt;/span&gt;

  &lt;span class="k"&gt;union&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt;

  &lt;span class="c1"&gt;-- recursively select all partitions that inherit from their parent&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_inherits&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;partitions&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inhparent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;child_oid&lt;/span&gt;
  &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inhrelid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;case&lt;/span&gt;
         &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indisvalid&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="s1"&gt;'VALID'&lt;/span&gt;
         &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="s1"&gt;'INVALID'&lt;/span&gt;
       &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;partitions&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;pg_index&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indrelid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;child_oid&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indexrelid&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ci&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the result of running the query. It clearly shows that the partition &lt;code&gt;t_2025_02&lt;/code&gt; "inherited" the indexes from the base table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; table_name |          index_name          | status
------------+------------------------------+--------
 t          | t__id__created_at__idx       | VALID
 t          | t__id__created_at_nn__idx    | VALID
 t_2025_02  | t_2025_02_id_created_at_idx  | VALID
 t_2025_02  | t_2025_02_id_created_at_idx1 | VALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a caveat, though: we had no control over how the indexes were named:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;t_2025_02_id_created_at_idx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_2025_02_id_created_at_idx1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL uses the partition name and column names to form the index name, and if this results in a duplicate index name (as above), it appends the suffixes 1, 2, 3, ...&lt;/p&gt;

&lt;p&gt;Can we have control over the index names? The answer is yes, though it brings a bit of complexity to the table. We'll consider two cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Case 1:&lt;/strong&gt; Partition does not exist yet. We want to create it and name the indexes ourselves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Case 2:&lt;/strong&gt; Partition exists and is attached to the base table. We want to add a new index to the base table, and have control over the index name on the partition.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Case 1: Controlling index name for newly created partitions
&lt;/h1&gt;

&lt;p&gt;Let's drop the above table, to start from a clean slate where the partition does not exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a partition
&lt;/h2&gt;

&lt;p&gt;We create the partition &lt;code&gt;LIKE&lt;/code&gt; the parent table, but exclude &lt;code&gt;indexes&lt;/code&gt;. We should also exclude other properties, like &lt;code&gt;identity&lt;/code&gt;, as Postgres forbids them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;including&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;excluding&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="k"&gt;excluding&lt;/span&gt; &lt;span class="k"&gt;identity&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's create indexes with the names we desire:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at_nn__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, attach the partition to the base table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
  &lt;span class="n"&gt;attach&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-02-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-03-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the query to show indexes (see section "Querying indexes" above for the query):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; table_name |            index_name             | status
------------+-----------------------------------+--------
 t          | t__id__created_at__idx            | VALID
 t          | t__id__created_at_nn__idx         | VALID
 t_2025_02  | t_2025_02__id__created_at__idx    | VALID
 t_2025_02  | t_2025_02__id__created_at_nn__idx | VALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL intelligently finds the corresponding index on the parent table and "attaches" the child index to it. We can use the psql &lt;code&gt;\d+&lt;/code&gt; meta-command to check that each parent index has the right child: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faljyc9jifk5h7j8m78ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faljyc9jifk5h7j8m78ia.png" alt="Description of index t__id__created_at__idx" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvgnvuhcc4w3ji59qi29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvgnvuhcc4w3ji59qi29.png" alt="Description of index t__id__created_at__idx" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, if the child table does not have the corresponding index, Postgres will create it using the default naming convention.&lt;/p&gt;
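&lt;p&gt;As a scriptable alternative to &lt;code&gt;\d+&lt;/code&gt;, note that the parent-child relationship between indexes is recorded in &lt;code&gt;pg_inherits&lt;/code&gt;, just like the one between tables. A query along these lines (a sketch, not part of the setup above) lists which partition index is attached to which parent index:&lt;/p&gt;

```sql
-- Partitioned indexes have relkind 'I'; their attached partition
-- indexes appear as children in pg_inherits.
select parent.relname as parent_index,
       child.relname  as attached_index
from pg_inherits i
join pg_class parent on parent.oid = i.inhparent
join pg_class child  on child.oid  = i.inhrelid
where parent.relkind = 'I'
order by 1, 2;
```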

&lt;h1&gt;
  
  
  Case 2: Controlling index name for existing partitions
&lt;/h1&gt;

&lt;p&gt;In this section, we consider a parent table with an existing partition. We'd like to create a new index on the parent, while controlling the name of the index on the partition. Let's first drop the parent table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then create the parent and one partition for it, without any indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;bigint&lt;/span&gt; &lt;span class="k"&gt;generated&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;identity&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;
  &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-02-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-03-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating indexes on parent table only
&lt;/h2&gt;

&lt;p&gt;We'll first create an index on the parent, and not on any of its partitions. The trick is to use the Postgres syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/sql-createindex.html" rel="noopener noreferrer"&gt;create index documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When &lt;code&gt;CREATE INDEX&lt;/code&gt; is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes. Each partition is first checked to determine whether an equivalent index already exists, and if so, that index will become attached as a partition index to the index being created, which will become its parent index. If no matching index exists, a new index will be created and automatically attached; the name of the new index in each partition will be determined as if no index name had been specified in the command. If the ONLY option is specified, no recursion is done, and the index is marked invalid. (&lt;code&gt;ALTER INDEX ... ATTACH PARTITION&lt;/code&gt; marks the index valid, once all partitions acquire matching indexes.) Note, however, that any partition that is created in the future using &lt;code&gt;CREATE TABLE ... PARTITION OF&lt;/code&gt; will automatically have a matching index, regardless of whether &lt;code&gt;ONLY&lt;/code&gt; is specified.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at_nn__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the query from the section "Querying indexes" above, and you'll see the indexes are created in the &lt;code&gt;INVALID&lt;/code&gt; state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; table_name |        index_name         | status
------------+---------------------------+---------
 t          | t__id__created_at__idx    | INVALID
 t          | t__id__created_at_nn__idx | INVALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating indexes on the partition
&lt;/h2&gt;

&lt;p&gt;Here, we proceed as normal and create indexes directly on the partition, with whatever names we desire. Note the use of &lt;code&gt;concurrently&lt;/code&gt; to avoid taking exclusive locks on the table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;concurrently&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;concurrently&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at_nn__idx&lt;/span&gt;
  &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Querying the indexes, we see that the partition indexes are created in the &lt;code&gt;VALID&lt;/code&gt; state, but the parent indexes are still invalid. We'll fix that in the next section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; table_name |            index_name             | status
------------+-----------------------------------+---------
 t          | t__id__created_at__idx            | INVALID
 t          | t__id__created_at_nn__idx         | INVALID
 t_2025_02  | t_2025_02__id__created_at__idx    | VALID
 t_2025_02  | t_2025_02__id__created_at_nn__idx | VALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Attaching partition indexes to the parent
&lt;/h2&gt;

&lt;p&gt;From PostgreSQL &lt;a href="https://www.postgresql.org/docs/current/sql-alterindex.html" rel="noopener noreferrer"&gt;alter index documentation&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;

&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ATTACH&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Causes the named index (possibly schema-qualified) to become attached to the altered index. The named index must be on a partition of the table containing the index being altered, and have an equivalent definition. An attached index cannot be dropped by itself, and will automatically be dropped if its parent index is dropped.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's attach the indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at__idx&lt;/span&gt;
  &lt;span class="n"&gt;attach&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at__idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t__id__created_at_nn__idx&lt;/span&gt;
  &lt;span class="n"&gt;attach&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="n"&gt;t_2025_02__id__created_at_nn__idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, query indexes, and voilà! We have all the desired indexes in the valid state.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;table_name |            index_name             | status
------------+-----------------------------------+--------
t          | t__id__created_at__idx            | VALID
t          | t__id__created_at_nn__idx         | VALID
t_2025_02  | t_2025_02__id__created_at__idx    | VALID
t_2025_02  | t_2025_02__id__created_at_nn__idx | VALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
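&lt;p&gt;As the documentation quoted above warns, an attached index can no longer be dropped by itself. A quick sketch (the exact error wording may differ between versions):&lt;/p&gt;

```sql
-- Fails: the child index is now required by its parent index.
drop index t_2025_02__id__created_at__idx;
-- ERROR:  cannot drop index t_2025_02__id__created_at__idx because
--         index t__id__created_at__idx requires it

-- Works: dropping the parent also drops the attached child index.
drop index t__id__created_at__idx;
```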



&lt;h1&gt;
  
  
  Automation
&lt;/h1&gt;

&lt;p&gt;Everyone loves automation! Here's a function that automatically creates all of the parent's indexes on the child, making the child ready to be attached to the parent. It excludes the index backing the primary key, because that one is created anyway during attachment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;create_indexes_on_partition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_table&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;part_table&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;declare&lt;/span&gt;
    &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;sql&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;begin&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;index_oid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;index_name&lt;/span&gt;
        &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pg_index&lt;/span&gt; &lt;span class="n"&gt;ind&lt;/span&gt;
        &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;ind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indexrelid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt;
        &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;oid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indrelid&lt;/span&gt;
        &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;tbl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_table&lt;/span&gt;
          &lt;span class="c1"&gt;-- all indexes, except primary keys&lt;/span&gt;
          &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;indisprimary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
    &lt;span class="n"&gt;loop&lt;/span&gt;
        &lt;span class="k"&gt;sql&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s1"&gt;'create index %I_%I on %I %s;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;part_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;regexp_replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;part_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="c1"&gt;-- index method and column list&lt;/span&gt;
            &lt;span class="k"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_get_indexdef&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_oid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="s1"&gt;'USING.*'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;notice&lt;/span&gt; &lt;span class="s1"&gt;'executing: %'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;execute&lt;/span&gt; &lt;span class="k"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;language&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Test
&lt;/h2&gt;

&lt;p&gt;Do all the steps in the "Setup" section above, and create a partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;like&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;including&lt;/span&gt; &lt;span class="k"&gt;all&lt;/span&gt; &lt;span class="k"&gt;excluding&lt;/span&gt; &lt;span class="n"&gt;indexes&lt;/span&gt; &lt;span class="k"&gt;excluding&lt;/span&gt; &lt;span class="k"&gt;identity&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;create_indexes_on_partition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'t'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'t_2025_02'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NOTICE:  Executing: CREATE INDEX t_2025_02___id__created_at__idx ON t_2025_02 USING btree (id, created_at);
NOTICE:  Executing: CREATE INDEX t_2025_02___id__created_at_nn__idx ON t_2025_02 USING btree (id, created_at) WHERE (created_at IS NOT NULL);
 create_indexes_on_partition
-----------------------------
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, attach the partition to the base table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;alter&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
  &lt;span class="n"&gt;attach&lt;/span&gt; &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="n"&gt;t_2025_02&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-02-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2025-03-01'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the query to show indexes (see section "Querying indexes" above for the query):&lt;br&gt;
&lt;/p&gt;
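&lt;p&gt;In case the earlier query isn't handy, something along these lines works. This is a sketch, not necessarily the exact query from the "Querying indexes" section (the &lt;code&gt;like 't%'&lt;/code&gt; filter and the column aliases are my assumptions):&lt;/p&gt;

```sql
-- List non-primary-key indexes on t and its partitions, with validity.
select c.relname as table_name,
       i.relname as index_name,
       case when x.indisvalid then 'VALID' else 'INVALID' end as status
from pg_index x
join pg_class i on i.oid = x.indexrelid
join pg_class c on c.oid = x.indrelid
where c.relname like 't%'
  and not x.indisprimary
order by 1, 2;
```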

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; table_name |             index_name             | status
------------+------------------------------------+--------
 t          | t__id__created_at__idx             | VALID
 t          | t__id__created_at_nn__idx          | VALID
 t_2025_02  | t_2025_02___id__created_at__idx    | VALID
 t_2025_02  | t_2025_02___id__created_at_nn__idx | VALID
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can schedule the task of creating partitions, adding indexes, and then attaching them to the base table using &lt;a href="https://github.com/citusdata/pg_cron" rel="noopener noreferrer"&gt;pg_cron&lt;/a&gt; or any other scheduling mechanism you like!&lt;/p&gt;
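&lt;p&gt;For example, a monthly pg_cron job might look like the following sketch. It assumes pg_cron is installed; in practice you would compute the partition name and bounds dynamically instead of hardcoding them as done here:&lt;/p&gt;

```sql
-- Run at midnight on the 25th: create, index, and attach next month's partition.
select cron.schedule(
    'prepare-next-partition',
    '0 0 25 * *',
    $$
        create table t_2025_03 (
            like t including all excluding indexes excluding identity
        );
        select create_indexes_on_partition('t', 't_2025_03');
        alter table t
            attach partition t_2025_03
                for values from ('2025-03-01') to ('2025-04-01');
    $$
);
```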

</description>
      <category>postgres</category>
      <category>index</category>
      <category>partitioning</category>
    </item>
    <item>
      <title>Database: Indexing issues with random UUIDs</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Sun, 07 Jul 2024 15:00:28 +0000</pubDate>
      <link>https://dev.to/msdousti/database-indexing-issues-with-uuids-306g</link>
      <guid>https://dev.to/msdousti/database-indexing-issues-with-uuids-306g</guid>
      <description>&lt;p&gt;UUIDs are very popular among developers as the datatype for the identifier of a table. Database administrators, on the other hand, despise UUIDs, as they are not very database friendly: Most UUID types are random, meaning they are all over the place. As a result, the DBMS has to put extra effort to generate, process, store, and retrieve them.&lt;/p&gt;

&lt;p&gt;In my opinion, the use of UUIDs should generally be confined to cases where "universally unique" identifiers are needed. For an identifier that is unique within a table or a certain domain, a database sequence number is usually sufficient.&lt;/p&gt;
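&lt;p&gt;As a minimal sketch of that sequence-based alternative (the table and column names here are illustrative, not from any example below), a &lt;code&gt;bigint&lt;/code&gt; identity column gives a compact, monotonically increasing key:&lt;/p&gt;

```sql
-- Sequence-backed identifier: unique within the table, index-friendly,
-- and naturally ordered (so no random B-Tree insertions).
create table orders (
    id      bigint generated always as identity primary key,
    payload text
);
```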

&lt;blockquote&gt;
&lt;p&gt;There are certain cases where this general statement is debatable. See the &lt;strong&gt;Discussion&lt;/strong&gt; section at the end of this article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Also, there are time-sorted versions of UUID, which are preferable to random ones.&lt;/p&gt;

&lt;p&gt;In this post, we will see a few issues with indexing UUID columns. While PostgreSQL is used, I can imagine other DBMSs have similar issues with UUIDs. We also use some "sorted" versions of UUIDs, and compare them with the random ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  TLDR (from the summary section):
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The index on a randomly generated UUID column is more than 20% larger.&lt;/li&gt;
&lt;li&gt;The time required to insert randomly generated UUID values is more than twice the time required to insert ordered UUID values, when the column has an index. In particular, index creation is 3-4 times slower.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;The database schema consists of two tables with identical schemas to store random and ordered UUIDs, respectively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also use temporary tables to store intermediate data quickly. Note that these tables have no primary keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;temp&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;tmp_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;temp&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;tmp_ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I'll use &lt;code&gt;psql&lt;/code&gt; as my database client, and in particular its meta-commands to check timing (&lt;code&gt;\timing&lt;/code&gt;) and the size of tables and indexes (&lt;code&gt;\dt+&lt;/code&gt; and &lt;code&gt;\di+&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The PostgreSQL server is version 16, which supports digit grouping. This makes the code more readable: 10 million is written as &lt;code&gt;10_000_000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;All timings are reported while running both &lt;code&gt;psql&lt;/code&gt; and the PostgreSQL server on a MacBook M1 Pro.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating and storing random UUIDs
&lt;/h2&gt;

&lt;p&gt;Since version 13, PostgreSQL is equipped with the function &lt;code&gt;gen_random_uuid()&lt;/code&gt;. This function returns a version 4 (random) UUID.&lt;/p&gt;

&lt;p&gt;Next, we generate 10 million UUIDv4 values and store them in the &lt;code&gt;t_rnd&lt;/code&gt; table. Timings are reported because I enabled &lt;code&gt;\timing&lt;/code&gt; in &lt;code&gt;psql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This took around 56 seconds.&lt;/p&gt;

&lt;p&gt;Look at the table and index information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres=# \dt+ t_rnd

List of relations
┌─[ RECORD 1 ]──┬───────────┐
│ Schema        │ public    │
│ Name          │ t_rnd     │
│ Type          │ table     │
│ Owner         │ postgres  │
│ Persistence   │ permanent │
│ Access method │ heap      │
│ Size          │ 422 MB    │
│ Description   │           │
└───────────────┴───────────┘

postgres=# \di+ t_rnd_pkey

List of relations
┌─[ RECORD 1 ]──┬────────────┐
│ Schema        │ public     │
│ Name          │ t_rnd_pkey │
│ Type          │ index      │
│ Owner         │ postgres   │
│ Table         │ t_rnd      │
│ Persistence   │ permanent  │
│ Access method │ btree      │
│ Size          │ 383 MB     │
│ Description   │            │
└───────────────┴────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In particular, note the index size: &lt;code&gt;383 MB&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now let's turn our attention to an ordered UUID case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating and storing ordered UUIDs
&lt;/h2&gt;

&lt;p&gt;There are several proposals for ordered UUIDs, such as UUID Version 7 (UUIDv7). But since PostgreSQL does not yet support them natively, let's just "cast" consecutive integers to UUIDs. We will later use an application to generate UUIDv7s for us.&lt;/p&gt;

&lt;p&gt;To cast an integer &lt;code&gt;i&lt;/code&gt; to UUID, we can use the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;lpad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we generate 10 million consecutive UUIDs, and store them in the &lt;code&gt;t_ord&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_ord&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;lpad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This took around 24 seconds (as opposed to 56 seconds with random UUIDs).&lt;/p&gt;

&lt;p&gt;Look at the table and index information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres=# \dt+ t_ord

List of relations
┌─[ RECORD 1 ]──┬───────────┐
│ Schema        │ public    │
│ Name          │ t_ord     │
│ Type          │ table     │
│ Owner         │ postgres  │
│ Persistence   │ permanent │
│ Access method │ heap      │
│ Size          │ 422 MB    │
│ Description   │           │
└───────────────┴───────────┘

postgres=# \di+ t_ord_pkey

List of relations
┌─[ RECORD 1 ]──┬────────────┐
│ Schema        │ public     │
│ Name          │ t_ord_pkey │
│ Type          │ index      │
│ Owner         │ postgres   │
│ Table         │ t_ord      │
│ Persistence   │ permanent  │
│ Access method │ btree      │
│ Size          │ 301 MB     │
│ Description   │            │
└───────────────┴────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The index size is &lt;code&gt;301 MB&lt;/code&gt;, more than 20% smaller than the random-UUID index.&lt;/p&gt;

&lt;h2&gt;
  
  
  Was the timing comparison fair?
&lt;/h2&gt;

&lt;p&gt;The generation and storage of UUIDs took:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;56 seconds for random UUIDs&lt;/li&gt;
&lt;li&gt;24 seconds for ordered UUIDs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in this post, we are mostly interested in the storage time. So let's generate the UUIDs into the temp tables first, and then observe the time it takes to copy them into the main tables. To be completely fair, we recreate the tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t_ord&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, generate and insert random UUIDs into the temp table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;tmp_rnd&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Guess what? It took less than 7 seconds. Now copy the rows into the indexed table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_rnd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This took around 51 seconds. The table and index sizes of &lt;code&gt;t_rnd&lt;/code&gt; are the same as before.&lt;/p&gt;

&lt;p&gt;Now let's do the same with ordered UUIDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;tmp_ord&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;lpad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This also took less than 7 seconds. Again, copy the rows into the indexed table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_ord&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_ord&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This took around 21 seconds. The table and index sizes of &lt;code&gt;t_ord&lt;/code&gt; are the same as before.&lt;/p&gt;

&lt;p&gt;Before we wrap up this section, let's check what happens if there is no index to begin with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_rnd_no_pk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_ord_no_pk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_rnd_no_pk&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_rnd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_ord_no_pk&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_ord&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both took around 10 seconds! So the majority of the slowdown comes from maintaining the index. The following table summarizes everything we discussed so far (times in seconds):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Random UUID&lt;/th&gt;
&lt;th&gt;Ordered UUID&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to generate &amp;amp; save to tmp table&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to save to table without PK&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to save to table with PK&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table size (MB)&lt;/td&gt;
&lt;td&gt;422&lt;/td&gt;
&lt;td&gt;422&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index size (MB)&lt;/td&gt;
&lt;td&gt;383&lt;/td&gt;
&lt;td&gt;301&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Side note:&lt;/strong&gt; The extra 3 seconds (10 - 7) are spent writing the data to the WAL, which is not required for temp tables.&lt;/p&gt;
&lt;/blockquote&gt;
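&lt;p&gt;A way to observe the WAL cost in isolation (my own side experiment, not part of the timings above) is an unlogged table, which also skips WAL but, unlike a temp table, is visible to other sessions. Note that it is not crash-safe:&lt;/p&gt;

```sql
-- Unlogged tables skip WAL writes; they are truncated after a crash.
create unlogged table t_rnd_unlogged (id uuid);
insert into t_rnd_unlogged select * from tmp_rnd;
```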

&lt;p&gt;It follows that the time to create the index is &lt;code&gt;51-10 = 41&lt;/code&gt; seconds with random UUIDs, and &lt;code&gt;21-10 = 11&lt;/code&gt; seconds with ordered UUIDs. That is almost a 4x speedup! But why?&lt;/p&gt;

&lt;h2&gt;
  
  
  B-Tree indexes
&lt;/h2&gt;

&lt;p&gt;The reasons why random-UUID indexes are bigger and slower to build are explained in &lt;a href="https://vladmihalcea.com/uuid-database-primary-key/" rel="noopener noreferrer"&gt;Vlad Mihalcea's excellent blog&lt;/a&gt;, which I quote here verbatim:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Indexing random values using B+Tree causes a lot of problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Index pages will have a very low fill factor because the values come randomly. So, a page of 8kB will end up storing just a few elements, therefore wasting a lot of space, both on the disk and in the database memory, as index pages could be cached in the Buffer Pool.&lt;/li&gt;
&lt;li&gt;Because the B+Tree index needs to rebalance itself in order to maintain its equidistant tree structure, the random key values will cause more index page splits and merges as there is no predetermined order of filling the tree structure.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The low "fill factor" is the reason why they are larger, and the more index page splits and merges are why they are built slower.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fill factor
&lt;/h3&gt;

&lt;p&gt;To showcase the fill factor issue, simply install the &lt;code&gt;pgstattuple&lt;/code&gt; extension, and then query both indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;extension&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;pgstattuple&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_leaf_density&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnd_ff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_leaf_density&lt;/span&gt; &lt;span class="n"&gt;ord_ff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_fragmentation&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnd_frag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_fragmentation&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ord_frag&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pgstatindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'t_rnd_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
     &lt;span class="n"&gt;pgstatindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'t_ord_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────┬────────┬──────────┬──────────┐
│ rnd_ff │ ord_ff │ rnd_frag │ ord_frag │
├────────┼────────┼──────────┼──────────┤
│  71.19 │  90.04 │    49.77 │        0 │
└────────┴────────┴──────────┴──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The random index has a 20% lower fill factor, and almost 50% fragmentation [1]. &lt;/p&gt;

&lt;p&gt;👉 The default PostgreSQL B-Tree fill factor is 90%, which is achieved by the ordered index.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Side Note:&lt;/strong&gt; If you &lt;code&gt;reindex&lt;/code&gt; the &lt;code&gt;t_rnd_pkey&lt;/code&gt; index, it will also achieve the 90% fill factor, as all the data required to build the index is readily available. You can also try creating a new index with &lt;/p&gt;


&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t_rnd_idx&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;In my case, it took only 6 seconds, and achieved the 90% fill factor without fragmentation.&lt;br&gt;
Also, try the following to achieve a higher fill factor, though higher is not necessarily better!&lt;/p&gt;


&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="n"&gt;t_rnd_idx_2&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;t_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fillfactor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;
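&lt;p&gt;The reindex mentioned in the side note is a one-liner; on a live system, the concurrent variant (available since PostgreSQL 12) avoids blocking writes while the index is rebuilt:&lt;/p&gt;

```sql
-- Rebuild the bloated index without taking a write lock on t_rnd.
reindex index concurrently t_rnd_pkey;
```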

&lt;h3&gt;
  
  
  Index page splits and merges
&lt;/h3&gt;

&lt;p&gt;It's rather hard to demonstrate B-Tree node splits and merges, as PostgreSQL does not readily make these stats available. However, we can track the number of index pages over time, after each insert into the table. The following code inserts 100K random and ordered UUIDs into separate tables, and records the &lt;code&gt;leaf_pages&lt;/code&gt; count of each index in a tracking table. It then exports the tracked data to &lt;code&gt;/tmp/data.csv&lt;/code&gt; for further analysis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;demo_rnd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;demo_ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;demo_idx&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_pages&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_pages&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ord&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;pgstatindex&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'demo_rnd_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pgstatindex&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'demo_ord_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;
&lt;span class="k"&gt;begin&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;_000&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;
        &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_rnd&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_ord&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;lpad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;demo_idx&lt;/span&gt;
        &lt;span class="k"&gt;select&lt;/span&gt;
            &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_pages&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rnd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;leaf_pages&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;ord&lt;/span&gt;
        &lt;span class="k"&gt;from&lt;/span&gt;
            &lt;span class="n"&gt;pgstatindex&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'demo_rnd_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;pgstatindex&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'demo_ord_pkey'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;do&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;demo_idx&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;copy&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;demo_idx&lt;/span&gt; &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following Python 3 code plots the number of &lt;code&gt;leaf_pages&lt;/code&gt; after each insert for both random and ordered UUIDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;

&lt;span class="c1"&gt;# Read the CSV file
&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/data.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Plot the data
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rnd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rnd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ord&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ord&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;leaf_pages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Plot of rnd and ord leaf_pages against i&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvz6sv7vubqpr3t1pxrp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvz6sv7vubqpr3t1pxrp.png" alt="Plot of rnd and ord leaf_pages against i" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a zoomed-in version of the above plot:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38a44p8cus58ws6p4os9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38a44p8cus58ws6p4os9.png" alt="Zoomed-in version of the above plot" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since we only inserted into the tables, no page merges happened in the B-Tree. However, we can easily spot page splits by watching the index page count grow: the random UUID index (blue line) splits into many pages at an &lt;strong&gt;accelerated&lt;/strong&gt; pace and then slowly fills them in, whereas the ordered UUID index (orange line) splits pages and fills them &lt;strong&gt;steadily&lt;/strong&gt;. The slope of the latter line corresponds to the 90% fill factor.&lt;/p&gt;
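&lt;p&gt;The two growth patterns can be reproduced with a toy simulation (this is an illustrative model, not PostgreSQL's actual nbtree code; the page capacity and the 50/50 split rule are simplifying assumptions):&lt;/p&gt;

```python
import bisect
import random

CAPACITY = 100  # illustrative number of entries per leaf page

def leaf_pages_random(n):
    """Uniform random keys: a full page splits 50/50 at its midpoint,
    so average density stays well below the fill factor."""
    lows, highs, counts = [0.0], [1.0], [0]
    for _ in range(n):
        key = random.random()
        i = bisect.bisect_right(lows, key) - 1
        if counts[i] == CAPACITY:
            mid = (lows[i] + highs[i]) / 2
            lows.insert(i + 1, mid)
            highs.insert(i, mid)
            counts[i] = CAPACITY // 2
            counts.insert(i + 1, CAPACITY - CAPACITY // 2)
            if key >= mid:
                i += 1
        counts[i] += 1
    return len(counts)

def leaf_pages_ordered(n, fillfactor=0.9):
    """Strictly increasing keys: only the rightmost page splits, leaving
    pages ~fillfactor full, so page count grows linearly."""
    pages, current, limit = 1, 0, int(CAPACITY * fillfactor)
    for _ in range(n):
        if current >= limit:
            pages += 1
            current = 0
        current += 1
    return pages

print(leaf_pages_ordered(100_000))  # 1112 pages: ceil(100_000 / 90)
print(leaf_pages_random(100_000))   # noticeably more pages
```

Under this model, random inserts settle at roughly 70% average density, while ordered inserts stay near the fill factor, matching the shapes of the two curves above.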

&lt;h2&gt;
  
  
  UUIDv7
&lt;/h2&gt;

&lt;p&gt;To generate UUIDv7 values in PostgreSQL, we can use the third-party extension &lt;code&gt;pg_uuidv7&lt;/code&gt; [2]. Pre-compiled binaries are released for x86_64 Linux. For macOS, I just ran &lt;code&gt;make &amp;amp;&amp;amp; make install&lt;/code&gt;, and then created the extension by running&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;extension&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;pg_uuidv7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="err"&gt;┌──────────────────────────────────────┐&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;           &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;           &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;├──────────────────────────────────────┤&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;e8d&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a5f7&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;74&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;da5&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ca7a4b6580fc&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;└──────────────────────────────────────┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The timings are quite similar to those of the ordered UUIDs we concocted before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;temp&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_7_no_pk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_7&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;


&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 7132.475 ms (00:07.132)&lt;/span&gt;

&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_7_no_pk&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 9651.792 ms (00:09.652)&lt;/span&gt;

&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_7&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 24832.326 ms (00:24.832)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The table size is identical to before (&lt;code&gt;422 MB&lt;/code&gt;), but the index is very large (&lt;code&gt;402 MB&lt;/code&gt;), even larger than for the random UUIDs! Looking at the stats, we see an average leaf density of only 67%:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;postgres&lt;/span&gt;&lt;span class="o"&gt;=#&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pgstatindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'t_7_pkey'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="err"&gt;┌─&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;RECORD&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;───────┬───────────┐&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt;            &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;tree_level&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;index_size&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;421830656&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;root_block_no&lt;/span&gt;      &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;295&lt;/span&gt;       &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;internal_pages&lt;/span&gt;     &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;252&lt;/span&gt;       &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;leaf_pages&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;51240&lt;/span&gt;     &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;empty_pages&lt;/span&gt;        &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;deleted_pages&lt;/span&gt;      &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;         &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;avg_leaf_density&lt;/span&gt;   &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;67&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;     &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;leaf_fragmentation&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;44&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;06&lt;/span&gt;     &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;└────────────────────┴───────────┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
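&lt;p&gt;These stats are mutually consistent, as a quick back-of-the-envelope check shows. Using approximate B-Tree figures (8 kB pages minus a 24-byte page header and 16-byte special space, and roughly 28 bytes per entry for a 16-byte uuid plus index tuple header and line pointer), 67% density over 51240 leaf pages accounts for roughly the 10 million rows inserted:&lt;/p&gt;

```python
PAGE_USABLE = 8192 - 24 - 16  # page size minus page header and B-Tree special space (approx.)
ENTRY_BYTES = 28              # 16-byte uuid + 8-byte tuple header + 4-byte line pointer (approx.)
leaf_pages = 51240            # from pgstatindex
density = 0.6742              # avg_leaf_density / 100

entries = PAGE_USABLE * density / ENTRY_BYTES * leaf_pages
print(round(entries))  # within about 1% of the 10_000_000 rows inserted
```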



&lt;p&gt;This came as a surprise to me: the index is both faster to populate and larger than the index on the random UUIDs! Digging deeper, I found that the extension does NOT generate time-sorted UUIDs (see also [3]):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="err"&gt;┌────┬──────────────────────────────────────┐&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  &lt;span class="err"&gt;│&lt;/span&gt;           &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;           &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;├────┼──────────────────────────────────────┤&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7029&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="n"&gt;ad0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="n"&gt;d12ba763d90&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;715&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a88e&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e36973ae964f&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7357&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;aaa8&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ba06f7232f2d&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;757&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b47d&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;40156&lt;/span&gt;&lt;span class="n"&gt;da5f2f7&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;9&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7595&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;a37&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9024&lt;/span&gt;&lt;span class="n"&gt;f862217d&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="n"&gt;f1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b9f1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b031f29263c5&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7607&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9003&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d001115fb89b&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;a43&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b726&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;e1eca4f15698&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;ea6&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;a572&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;b6be63c8b7a&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;  &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt; &lt;span class="mi"&gt;01908&lt;/span&gt;&lt;span class="n"&gt;eee&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;c083&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="n"&gt;fa0&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;b565&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;e91d905fb1d&lt;/span&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;span class="err"&gt;└────┴──────────────────────────────────────┘&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other UUIDv7 implementations that I've seen, such as the Java library uuid-creator [4], guarantee time-sorted output by using a (randomly initialized) counter.&lt;/p&gt;
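&lt;p&gt;The counter trick can be sketched in a few lines of Python (a simplified illustration of the idea, not uuid-creator's actual implementation; placing a 12-bit counter in the rand_a field is one common choice):&lt;/p&gt;

```python
import os
import time
import uuid

_last_ms = 0
_counter = 0

def uuid7_monotonic():
    """UUIDv7-like value: 48-bit millisecond timestamp, then a counter in
    the rand_a bits, so IDs generated within the same millisecond still
    sort in generation order."""
    global _last_ms, _counter
    ms = time.time_ns() // 1_000_000
    if ms == _last_ms:
        _counter += 1  # same millisecond: bump the counter
    else:
        _last_ms = ms
        _counter = os.urandom(1)[0]  # randomly initialized per millisecond
    value = (ms & ((1 << 48) - 1)) << 80  # unix_ts_ms
    value |= 0x7 << 76                    # version 7
    value |= (_counter & 0xFFF) << 64     # counter in rand_a
    value |= 0b10 << 62                   # variant
    value |= int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # rand_b
    return uuid.UUID(int=value)

ids = [uuid7_monotonic() for _ in range(1_000)]
assert ids == sorted(ids)  # time-sorted, unlike plain pg_uuidv7
```

Since the counter occupies higher-order bits than the random tail, IDs generated within one millisecond remain strictly increasing.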

&lt;p&gt;To simulate a time-sorted sequence, let's first cluster the table &lt;code&gt;tmp_7&lt;/code&gt; on the id column. Here's the full SQL code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;temp&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_7_no_pk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_7&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;uuid_generate_v7&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="n"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 6945.381 ms (00:06.945)&lt;/span&gt;

&lt;span class="c1"&gt;--------------------------------------------&lt;/span&gt;
&lt;span class="c1"&gt;-- Clustering tmp_7 ensures proper ordering&lt;/span&gt;
&lt;span class="c1"&gt;--------------------------------------------&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;index&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 3633.888 ms (00:03.634)&lt;/span&gt;
&lt;span class="k"&gt;cluster&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;tmp_7_id_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 4858.199 ms (00:04.858)&lt;/span&gt;


&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_7_no_pk&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 9012.780 ms (00:09.013)&lt;/span&gt;

&lt;span class="k"&gt;insert&lt;/span&gt; &lt;span class="k"&gt;into&lt;/span&gt; &lt;span class="n"&gt;t_7&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tmp_7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Time: 27749.267 ms (00:27.749)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now see that the index size is &lt;code&gt;301 MB&lt;/code&gt;, as expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this analytical post, we found out two important facts about indexing UUID columns in PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The index on a randomly ordered UUID column is more than 20% larger than on a sequentially ordered one.&lt;/li&gt;
&lt;li&gt;When the column has an index, inserting randomly generated UUID values takes more than twice as long as inserting ordered UUID values. In particular, index creation takes 3-4 times longer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;At the beginning of this article, I stated a personal opinion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In my opinion, use of UUIDs must generally be confined to the cases where "universally unique" identifiers are needed. For an identifier that is unique within a table or a certain domain, a database sequence number would usually be OK.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some of my colleagues reached out and pointed out counter-arguments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Sequential identifiers are not an option when they are exposed to the outside, and the business rules require them to be "non-guessable".&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There is a certain value in the ability to assign an ID to an entity BEFORE going to the database. You can batch generate them, use any algorithm you like (snowflake, ULID, non-standard UUIDs, etc.). This works great with non-relational databases (some of them don't support auto-increment IDs at all).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easier correlation in the context of a mid-size or large organization. Auto-increment IDs will inevitably coincide across different entities (e.g., you can have an order with ID = 10, as well as a trade, a booking, and a reservation with the same ID). Using UUIDs protects you from accidentally looking at the wrong entity.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are all valid arguments, but sometimes there are workarounds that you might be interested in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Regarding batch ID generation: database sequence values can also be fetched in batches [5], but one DB query is still necessary to obtain those values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Regarding distinctness from all other entities: in this case, we can use the numbers internally, but convert them to a unique string format when communicating with external systems. For instance, the trade ID is 10 internally, but whenever we share it in events, REST responses, or logs, we use &lt;code&gt;Trade-0000000010&lt;/code&gt; as the ID. Conversely, whenever we receive this string format in a request, we map it back to the numeric value.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
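&lt;p&gt;The mapping in the last bullet is straightforward to implement. Here is a minimal sketch (the &lt;code&gt;Trade&lt;/code&gt; prefix and 10-digit zero-padding follow the example above and are arbitrary conventions):&lt;/p&gt;

```python
PAD = 10  # zero-padding width; an arbitrary convention

def to_external(entity: str, internal_id: int) -> str:
    """Render an internal numeric ID as a namespaced external identifier."""
    return f"{entity}-{internal_id:0{PAD}d}"

def to_internal(external_id: str, entity: str) -> int:
    """Parse the external form back to the numeric ID, validating the prefix."""
    prefix = f"{entity}-"
    if not external_id.startswith(prefix):
        raise ValueError(f"expected a {entity} identifier, got {external_id!r}")
    return int(external_id[len(prefix):])

print(to_external("Trade", 10))                  # Trade-0000000010
print(to_internal("Trade-0000000010", "Trade"))  # 10
```

The prefix check also gives the type-safety benefit mentioned above: passing a booking identifier where a trade is expected fails loudly instead of silently resolving to the wrong row.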

&lt;h2&gt;
  
  
  Footnotes
&lt;/h2&gt;

&lt;p&gt;[1] &lt;code&gt;leaf_fragmentation&lt;/code&gt; is the percentage of leaf pages where the following leaf page has a lower block number, and might not be a good indication for anything. See &lt;a href="https://dba.stackexchange.com/a/331067/1583" rel="noopener noreferrer"&gt;Laurenz Albe's answer&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;[2] &lt;a href="https://github.com/fboulnois/pg_uuidv7" rel="noopener noreferrer"&gt;UUIDv7 extension for PostgreSQL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[3] &lt;a href="https://github.com/fboulnois/pg_uuidv7/pull/15#issuecomment-2058272680" rel="noopener noreferrer"&gt;Comment on UUIDv7 extension&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[4] &lt;a href="https://github.com/f4b6a3/uuid-creator/wiki/1.7.-UUIDv7" rel="noopener noreferrer"&gt;uuid-creator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[5] &lt;a href="https://villim.github.io/postgresql-sequence-cache" rel="noopener noreferrer"&gt;PostgreSQL Sequence Cache&lt;/a&gt;&lt;/p&gt;

</description>
      <category>uuid</category>
      <category>index</category>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Postgres partitioning performance: Hash vs. List</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Wed, 01 May 2024 21:28:17 +0000</pubDate>
      <link>https://dev.to/msdousti/postgres-partitioning-performance-hash-vs-list-2ppn</link>
      <guid>https://dev.to/msdousti/postgres-partitioning-performance-hash-vs-list-2ppn</guid>
      <description>&lt;p&gt;In our design, we came up with a would-be large PostgreSQL table that just stores IDs of incoming (Kafka) events, for the purpose of de-duplication. The IDs are of type UUID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After some consideration, we decided to partition this table into 16 partitions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Before we continue, let's make it clear that this article is about a very specific case and workload. However, it should provide you with enough insight to customize it based on your own needs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Hash partitioning
&lt;/h1&gt;

&lt;p&gt;The initial idea was to use hash partitioning.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_00&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_01&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_02&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_03&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_04&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_05&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_06&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_07&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_08&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_09&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_10&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_11&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_12&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_13&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_14&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_15&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; 
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modulus&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
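The 16 nearly identical &lt;code&gt;create table&lt;/code&gt; statements above can also be generated programmatically instead of written out by hand. A small sketch (a convenience script of my own, not part of the original setup; a server-side &lt;code&gt;DO&lt;/code&gt; block with &lt;code&gt;format()&lt;/code&gt; would work just as well):

```python
# Generate the repetitive hash-partition DDL for a given modulus,
# matching the hand-written statements above (t_00 .. t_15).

def hash_partition_ddl(parent: str = "t", modulus: int = 16) -> str:
    stmts = []
    for remainder in range(modulus):
        stmts.append(
            f"create table {parent}_{remainder:02d}\n"
            f"    partition of {parent}\n"
            f"    for values with (modulus {modulus}, remainder {remainder});"
        )
    return "\n\n".join(stmts)

print(hash_partition_ddl())
```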



&lt;p&gt;A competing idea was to ditch &lt;code&gt;hash&lt;/code&gt; and use the first character of the &lt;code&gt;id&lt;/code&gt; (cast to text) as the partitioning key. I'll discuss this idea in the next section, but let's first benchmark the hash partitioning approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
pgbench &lt;span class="nt"&gt;-c10&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; 900 &lt;span class="nt"&gt;-j30&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I chose to run 10 client connections to the DB, each sending 900 queries, and requested 30 worker threads (pgbench caps the thread count at the number of clients, hence the 10 threads reported in the output). Each query inserts 1000 UUIDs into the table.&lt;/p&gt;

&lt;p&gt;Why these numbers? Just for fun. They would have to conform to the real traffic to be indicative of anything. But let's just see how this turns out on my personal laptop (an old 2017 model!):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 13.776 ms
initial connection time = 6.491 ms
tps = 725.901931 (without initial connection time)

real    0m12.438s
user    0m0.162s
sys     0m0.396s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It took 12.4 seconds to insert 9,000,000 rows (9,000 transactions of 1,000 rows each). The average TPS (transactions per second) is 725.9.&lt;/p&gt;
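As a quick sanity check on the benchmark arithmetic (a throwaway calculation, not from the original run):

```python
# Verify the row count and approximate TPS implied by the pgbench output.
clients = 10
transactions_per_client = 900
rows_per_transaction = 1000

total_transactions = clients * transactions_per_client   # 9,000
total_rows = total_transactions * rows_per_transaction   # 9,000,000
print(total_rows)

# pgbench's reported tps (725.9) excludes initial connection time, so it
# is slightly above the naive wall-clock figure:
print(round(total_transactions / 12.438, 1))  # 723.6
```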

&lt;p&gt;Using &lt;code&gt;psql&lt;/code&gt; metacommands, we can see the table/index sizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using &lt;code&gt;\dt+&lt;/code&gt; to see table sizes (some columns are removed for brevity):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────┬───────────────────┬─────────┐
│ Name │       Type        │  Size   │
├──────┼───────────────────┼─────────┤
│ t    │ partitioned table │ 0 bytes │
│ t_00 │ table             │ 24 MB   │
│ t_01 │ table             │ 24 MB   │
│ t_02 │ table             │ 24 MB   │
│ t_03 │ table             │ 24 MB   │
│ t_04 │ table             │ 24 MB   │
│ t_05 │ table             │ 24 MB   │
│ t_06 │ table             │ 24 MB   │
│ t_07 │ table             │ 24 MB   │
│ t_08 │ table             │ 24 MB   │
│ t_09 │ table             │ 24 MB   │
│ t_10 │ table             │ 24 MB   │
│ t_11 │ table             │ 24 MB   │
│ t_12 │ table             │ 24 MB   │
│ t_13 │ table             │ 24 MB   │
│ t_14 │ table             │ 24 MB   │
│ t_15 │ table             │ 24 MB   │
└──────┴───────────────────┴─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Using &lt;code&gt;\di+&lt;/code&gt; to see index sizes (some columns are removed for brevity):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────┬───────────────────┬─────────┐
│   Name    │       Type        │  Size   │
├───────────┼───────────────────┼─────────┤
│ t_pkey    │ partitioned index │ 0 bytes │
│ t_00_pkey │ index             │ 21 MB   │
│ t_01_pkey │ index             │ 21 MB   │
│ t_02_pkey │ index             │ 22 MB   │
│ t_03_pkey │ index             │ 20 MB   │
│ t_04_pkey │ index             │ 21 MB   │
│ t_05_pkey │ index             │ 21 MB   │
│ t_06_pkey │ index             │ 21 MB   │
│ t_07_pkey │ index             │ 20 MB   │
│ t_08_pkey │ index             │ 20 MB   │
│ t_09_pkey │ index             │ 21 MB   │
│ t_10_pkey │ index             │ 21 MB   │
│ t_11_pkey │ index             │ 21 MB   │
│ t_12_pkey │ index             │ 21 MB   │
│ t_13_pkey │ index             │ 21 MB   │
│ t_14_pkey │ index             │ 21 MB   │
│ t_15_pkey │ index             │ 21 MB   │
└───────────┴───────────────────┴─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the indexes are almost as large as the tables themselves. Also, while the data is equally distributed among partitions (24 MB each), the index sizes range from 20 to 22 MB. The total size of the indexes is &lt;code&gt;334 MB&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  List partitioning
&lt;/h1&gt;

&lt;p&gt;If we want to use the first character of the &lt;code&gt;id&lt;/code&gt; as the partitioning key, the primary key cannot be declared on the parent table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;results in error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR:  unsupported PRIMARY KEY constraint with partition key definition
DETAIL:  PRIMARY KEY constraints cannot be used when partition keys include expressions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, we decided to add the primary key to each individual partition. This still effectively enforces uniqueness across all data, because identical IDs share the same first character and therefore always land in the same partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_00&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_01&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_02&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_03&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'3'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_04&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'4'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_05&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_06&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'6'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_07&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'7'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_08&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'8'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_09&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'9'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_10&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_11&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'b'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_12&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'c'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_13&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_14&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'e'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_15&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'f'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
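To see why the partition-local primary keys are enough for global uniqueness, note that a UUID's first hex character fully determines its partition. A small sketch of the routing implied by the DDL above (&lt;code&gt;'0'&lt;/code&gt;–&lt;code&gt;'9'&lt;/code&gt; map to &lt;code&gt;t_00&lt;/code&gt;–&lt;code&gt;t_09&lt;/code&gt;, &lt;code&gt;'a'&lt;/code&gt;–&lt;code&gt;'f'&lt;/code&gt; to &lt;code&gt;t_10&lt;/code&gt;–&lt;code&gt;t_15&lt;/code&gt;):

```python
import uuid

def partition_for(id_: uuid.UUID) -> str:
    """Return the list partition a UUID is routed to, mirroring
    lower(left(id::text, 1)) in the DDL above."""
    first = str(id_)[0]  # Python renders UUIDs as lowercase hex
    return f"t_{int(first, 16):02d}"

u = uuid.UUID("a1b2c3d4-0000-0000-0000-000000000000")
print(partition_for(u))  # t_10

# Equal UUIDs always map to the same partition, so the partition-local
# primary key suffices to enforce uniqueness across the whole table.
assert partition_for(u) == partition_for(uuid.UUID(str(u)))
```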



&lt;p&gt;Now let's benchmark again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
pgbench &lt;span class="nt"&gt;-c10&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; 900 &lt;span class="nt"&gt;-j30&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 15.123 ms
initial connection time = 8.810 ms
tps = 661.264382 (without initial connection time)

real    0m13.654s
user    0m0.150s
sys     0m0.409s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;slower&lt;/strong&gt; than the hash partition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; &lt;code&gt;13.654s&lt;/code&gt; instead of &lt;code&gt;12.438s&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TPS:&lt;/strong&gt; &lt;code&gt;661.264382&lt;/code&gt; instead of &lt;code&gt;725.901931&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, we lose the primary key on the partitioned table itself, and it's even slower! Hash partitioning is a clear winner here.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;\dt+&lt;/code&gt; and &lt;code&gt;\di+&lt;/code&gt; yields almost identical results to before, so I won't repeat them here.&lt;/p&gt;
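&lt;p&gt;To see how the rows were spread across the partitions, &lt;code&gt;tableoid&lt;/code&gt; can be queried directly. This is a quick sketch; the exact counts will of course vary per run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Count rows per partition; tableoid::regclass resolves to the partition name
select tableoid::regclass as partition, count(*)
from t
group by 1
order by 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;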

&lt;h2&gt;
  
  
  Using hash indexes instead of b-tree indexes
&lt;/h2&gt;

&lt;p&gt;Another suggested approach is to enforce uniqueness using hash indexes instead of b-tree indexes. The benefit is that hash indexes are often smaller and faster than b-tree indexes when equality checks are the only operation that matters.&lt;/p&gt;

&lt;p&gt;Postgres primary keys do not yet support hash indexes, but we can approximate uniqueness with a hack: an exclusion constraint backed by a hash index, i.e. &lt;code&gt;exclude using hash (id with =)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;drop&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;exists&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;left&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_00&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_01&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'1'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_02&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'2'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_03&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'3'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_04&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'4'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_05&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'5'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_06&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'6'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_07&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'7'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_08&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'8'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_09&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'9'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_10&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_11&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'b'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_12&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'c'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_13&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_14&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'e'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;t_15&lt;/span&gt;
&lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;values&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'f'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's benchmark this as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
pgbench &lt;span class="nt"&gt;-c10&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; 900 &lt;span class="nt"&gt;-j30&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; - &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 16.686 ms
initial connection time = 7.089 ms
tps = 599.314265 (without initial connection time)

real    0m15.067s
user    0m0.127s
sys     0m0.468s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well, I didn't expect that. It's even slower now. Looking at the table sizes (&lt;code&gt;\dt+&lt;/code&gt;), they are the same as before (24MB).&lt;/p&gt;

&lt;p&gt;However, index sizes (&lt;code&gt;\di+&lt;/code&gt;) are a tiny bit smaller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┬───────┐
│     Name     │ Size  │
├──────────────┼───────┤
│ t_00_id_excl │ 20 MB │
│ t_01_id_excl │ 20 MB │
│ t_02_id_excl │ 20 MB │
│ t_03_id_excl │ 20 MB │
│ t_04_id_excl │ 20 MB │
│ t_05_id_excl │ 20 MB │
│ t_06_id_excl │ 20 MB │
│ t_07_id_excl │ 20 MB │
│ t_08_id_excl │ 20 MB │
│ t_09_id_excl │ 20 MB │
│ t_10_id_excl │ 20 MB │
│ t_11_id_excl │ 20 MB │
│ t_12_id_excl │ 20 MB │
│ t_13_id_excl │ 20 MB │
│ t_14_id_excl │ 20 MB │
│ t_15_id_excl │ 20 MB │
└──────────────┴───────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, in total, the index size was reduced from &lt;code&gt;334 MB&lt;/code&gt; to &lt;code&gt;320 MB&lt;/code&gt;, a reduction of roughly 4%.&lt;/p&gt;
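&lt;p&gt;Rather than summing the &lt;code&gt;\di+&lt;/code&gt; output by hand, the total index size can be computed in one query. A sketch, assuming the partitions follow the &lt;code&gt;t_*&lt;/code&gt; naming used above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Sum the size of all indexes attached to the t_* partitions
select pg_size_pretty(sum(pg_relation_size(indexrelid)))
from pg_index
where indrelid::regclass::text like 't\_%';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;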

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Hash partitioning outperforms list partitioning in the above example (note: this is not always the case)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hash partitioning has the added benefit that all tables have primary keys (again, specific to the above example). This is important when using logical replication. For instance, to use &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/blue-green-deployments-creating.html" rel="noopener noreferrer"&gt;AWS RDS blue/green deployment&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make sure that all tables in the DB instance have a primary key. PostgreSQL logical replication doesn't allow &lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt; operations on tables that don't have a primary key.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using hash indexes instead of b-tree indexes did not improve performance, but it did reduce the total index size by less than 5%.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
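&lt;p&gt;Since logical replication disallows &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt; on tables without a primary key, it is worth checking for offending tables up front. Here is one way to sketch such a check against the system catalogs (the schema name is an assumption; adjust as needed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Ordinary tables in schema 'public' that lack a primary key
select c.relname
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname = 'public'
  and not exists (
    select 1
    from pg_constraint
    where conrelid = c.oid
      and contype = 'p'  -- 'p' = primary key
  );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;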

&lt;h1&gt;
  
  
  Edits
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Edit (2024-05-03)
&lt;/h2&gt;

&lt;p&gt;A colleague of mine, who asked to remain anonymous, explained why list partitioning was slower:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In your example, to compute the partition key of the list-based approach, you use a cast (cast &lt;code&gt;UUID&lt;/code&gt; to &lt;code&gt;text&lt;/code&gt;), then two functions are applied (&lt;code&gt;LEFT&lt;/code&gt; and &lt;code&gt;LOWER&lt;/code&gt;). The functions should be pretty quick, but the cast is slow. That’s why the combined effect is slower than the &lt;code&gt;hash&lt;/code&gt; function, which is implemented in C and is quite fast.&lt;/p&gt;
&lt;/blockquote&gt;
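&lt;p&gt;One way to get a feeling for this overhead is to time the two expressions in isolation. A rough sketch (run in psql with &lt;code&gt;\timing&lt;/code&gt; on; &lt;code&gt;uuid_hash&lt;/code&gt; stands in here for the C-implemented hash support function of the &lt;code&gt;uuid&lt;/code&gt; type, and absolute numbers are machine-dependent):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- The cast-based expression used as the list partition key
select count(left(id::text, 1))
from (select gen_random_uuid() as id from generate_series(1, 1000000)) s;

-- The C-implemented hash function that hash partitioning relies on
select count(uuid_hash(id))
from (select gen_random_uuid() as id from generate_series(1, 1000000)) s;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;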

&lt;p&gt;Another colleague, Tim, gave a nice summary:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;So, if I got it right, in essence it says “Don’t try to be fancy, just do it in a boring way and PostgreSQL will deal with it in an optimized way.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This reminds me of the time we implemented a variant of the &lt;code&gt;strlen()&lt;/code&gt; function and observed it was slower than the GLIBC implementation by a factor of 300! I should write a post on that too 🙂&lt;/p&gt;

&lt;h1&gt;
  
  
  Further reading
&lt;/h1&gt;

&lt;p&gt;For more info on hash indexes, see &lt;a href="https://twitter.com/MSDousti/status/1783657400383225955" rel="noopener noreferrer"&gt;this Twitter thread&lt;/a&gt; and the links therein.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://tinyurl.com/sp4hp7np" rel="noopener noreferrer"&gt;Post by Erwin Brandstetter&lt;/a&gt; to pgsql-general mailing list regarding &lt;code&gt;EXCLUDE USING hash(i WITH =)&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://hakibenita.com/postgresql-hash-index" rel="noopener noreferrer"&gt;Re-Introducing Hash Indexes in PostgreSQL&lt;/a&gt;. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>postgres</category>
      <category>performance</category>
      <category>hash</category>
      <category>index</category>
    </item>
    <item>
      <title>Ask not what PostgreSQL can do for AI, ask what AI can do for PostgreSQL</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Thu, 14 Dec 2023 19:52:28 +0000</pubDate>
      <link>https://dev.to/msdousti/ask-not-what-postgresql-can-do-for-ai-ask-what-ai-can-do-for-postgresql-26n7</link>
      <guid>https://dev.to/msdousti/ask-not-what-postgresql-can-do-for-ai-ask-what-ai-can-do-for-postgresql-26n7</guid>
      <description>&lt;p&gt;Today at &lt;a href="https://2023.pgconf.eu/" rel="noopener noreferrer"&gt;PostgreSQL Conference Europe 2023&lt;/a&gt;, some key people addressed the future of PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/schedule/session/5140-building-for-the-future-and-betting-big-on-postgres/" rel="noopener noreferrer"&gt;Building for the Future and Betting Big on Postgres&lt;/a&gt; by &lt;strong&gt;Kevin Dallas&lt;/strong&gt;, CEO of EDB.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/schedule/session/5146-all-the-postgres-things-at-microsoft/" rel="noopener noreferrer"&gt;All the Postgres Things at Microsoft&lt;/a&gt; by &lt;strong&gt;Charles Feddersen&lt;/strong&gt;, Director of Product Management - Postgres on Azure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They both talked about how influential AI is, and how PostgreSQL can help. One major hot topic is natively storing and processing vector data, which AI heavily depends on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65d21cpnceis7uikq63c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F65d21cpnceis7uikq63c.png" alt="azure_ai extension" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's nothing wrong with this approach, and in fact it's quite a welcome change. However, I see a gap here. While all the talk is about how PostgreSQL can help AI, the other direction is equally important: how AI can help PostgreSQL.&lt;/p&gt;

&lt;p&gt;The rest of the post entertains this idea. I give one example that is so basic it can hardly be deemed AI, but I hope you get the idea and develop it further!&lt;/p&gt;

&lt;h2&gt;
  
  
  Interlude: The highest-scored question on StackOverflow
&lt;/h2&gt;

&lt;p&gt;Many years ago, I thought of looking at the &lt;a href="https://stackoverflow.com/questions?tab=Votes" rel="noopener noreferrer"&gt;highest-scored question on StackOverflow&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclg4cqhqzn24rxgclr17.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclg4cqhqzn24rxgclr17.png" alt="highest-scored question on StackOverflow" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stackoverflow.com/q/11227809/459391" rel="noopener noreferrer"&gt;This question&lt;/a&gt; was, and still is (as of this writing), at the top of the list. I read through the Q&amp;amp;A and learned so much from it. I'll summarize it below, but make sure to check it out.&lt;/p&gt;

&lt;p&gt;Consider this C snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;sumConditionally&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; 
                               &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;arraySize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;unsigned&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;arraySize&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It traverses the whole &lt;code&gt;data&lt;/code&gt; array, and sums up entries that are 128 or greater.&lt;/p&gt;

&lt;p&gt;Does it matter if I shuffle &lt;code&gt;data&lt;/code&gt;? From the functionality standpoint, no: Addition is commutative, so order does not matter. What about the performance standpoint? We expect no change in performance either: Regardless of the order, the loop repeats the same number of times, the comparison is performed once per loop, and the number of entries for which the comparison holds is independent of the order of data. So, the performance must be the same, right?&lt;/p&gt;

&lt;p&gt;Wrong! The original poster observed a 6x speedup when the data was sorted. How on earth is that possible? Well, now you get why it's the highest-scored question.&lt;/p&gt;

&lt;p&gt;The reasoning is quite simple: CPUs have a lot of optimizations in place. One such optimization is called "branch prediction", which essentially boils down to predicting which branch of an &lt;code&gt;if&lt;/code&gt; statement is going to be taken. When the array is sorted, the CPU predictions turn out to be correct most of the time: Initially, the CPU makes a random guess. If the guess is correct, it continues guessing the same. Otherwise, it changes its guess. As the image below depicts, for a sorted array, the guess is correct most of the time, and the CPU has to change it only once. (Of course, I'm oversimplifying!)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq47b7krie48wcgbfmcwn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq47b7krie48wcgbfmcwn.png" alt="A sample array for branch prediction" width="800" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is reinforcement learning in its most basic form (so basic, in fact, that some may argue it is not an AI technique at all, since the learning algorithm is prescribed).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; Define reinforcement learning like I'm 5.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT:&lt;/strong&gt; Reinforcement learning is like teaching a computer how to do something by giving it rewards when it does things right and punishments when it does things wrong. Just like how you learn to ride a bike better by getting a treat when you balance well and a small scold when you fall. Over time, the computer gets better at the task because it wants to get more rewards and avoid punishments. It's like a game where the computer tries to figure out the best way to do something by trial and error, learning from its mistakes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What does this have to do with PostgreSQL
&lt;/h2&gt;

&lt;p&gt;The PostgreSQL optimizer uses a cost-based model to determine the least costly plan for executing a query. To do this, it relies on basic statistics about the data. If the statistics are inaccurate or missing, PostgreSQL often makes a bad guess and thus picks a suboptimal plan.&lt;/p&gt;

&lt;p&gt;The issue, IMHO, is that PostgreSQL never corrects its guess. Run the same query 1000 times (assuming the underlying data and statistics do not change), and it always produces the same plan. For instance, it may estimate that the number of returned rows is 6 and therefore choose a NESTED LOOP JOIN, while during execution it actually fetches 1200 rows, for which the prescribed plan is awful.&lt;/p&gt;
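&lt;p&gt;&lt;code&gt;EXPLAIN (ANALYZE)&lt;/code&gt; makes the mismatch visible, since it prints the estimated row count next to the actual one; the point is that the actual count is then thrown away rather than fed back into future planning. A sketch with hypothetical table names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical tables. In the output, "rows=6" would be the planner's
-- estimate, while "actual ... rows=1200" is what really happened.
explain (analyze, buffers)
select *
from orders o
join order_items i on i.order_id = o.id
where o.status = 'pending';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;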

&lt;p&gt;To see what I mean, read &lt;strong&gt;Franck Pachot&lt;/strong&gt;'s post, where he details the issue and suggests &lt;a href="https://dev.to/yugabyte/extended-statistics-and-pghintplan-rows--j4k"&gt;using extended statistics or a PostgreSQL extension to give it hints&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That kept me thinking for a while, and I guess a more futuristic approach is to equip the PostgreSQL planner with AI. In the most basic scenario, it could use something similar to a CPU's branch-prediction algorithm: record the guesses, compare them against the actual rows fetched, and improve the guesses over time. There are many advanced machine-learning algorithms that could be used here; it's just a matter of finding one efficient enough that the planning time does not skyrocket and the implementation does not bloat the PostgreSQL core. Alternatively, it could be shipped as a PostgreSQL extension.&lt;/p&gt;
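&lt;p&gt;To make the branch-prediction analogy concrete, here is a toy sketch in Python (not PostgreSQL code; all names are made up): keep a per-query correction factor learned from the ratio of actual to estimated rows, and apply it to future estimates.&lt;/p&gt;

```python
class RowEstimateCorrector:
    """Toy feedback loop: learn a correction factor per query fingerprint."""

    def __init__(self):
        self.factor = {}  # query fingerprint -> learned correction factor

    def corrected_estimate(self, query_id, planner_estimate):
        # Apply whatever has been learned so far (1.0 = trust the planner).
        return planner_estimate * self.factor.get(query_id, 1.0)

    def feedback(self, query_id, planner_estimate, actual_rows):
        # Exponential moving average of the actual/estimate ratio, so a
        # single outlier execution does not swing the correction too hard.
        ratio = actual_rows / max(planner_estimate, 1.0)
        old = self.factor.get(query_id, 1.0)
        self.factor[query_id] = 0.8 * old + 0.2 * ratio


corrector = RowEstimateCorrector()
# The planner keeps guessing 6 rows while executions fetch 1200.
for _ in range(20):
    corrector.feedback("q1", 6, 1200)
# After a few executions the corrected estimate approaches reality.
print(round(corrector.corrected_estimate("q1", 6)))
```

&lt;p&gt;Of course, a real implementation would have to invalidate learned factors when statistics change, but the point is that the feedback loop itself is cheap.&lt;/p&gt;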

&lt;p&gt;I'm sure using AI inside PostgreSQL is not a new suggestion, and if I had searched, I would have found many others detailing it. However, I felt like writing about it myself 🙂&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>ai</category>
      <category>pgconfeu</category>
    </item>
    <item>
      <title>How Gradle disagreed with our Maven project</title>
      <dc:creator>Sadeq Dousti</dc:creator>
      <pubDate>Thu, 07 Dec 2023 10:59:30 +0000</pubDate>
      <link>https://dev.to/msdousti/how-gradle-disagreed-with-our-maven-project-3n4a</link>
      <guid>https://dev.to/msdousti/how-gradle-disagreed-with-our-maven-project-3n4a</guid>
      <description>&lt;h2&gt;
  
  
  Preamble
&lt;/h2&gt;

&lt;p&gt;I'm a maintainer of a popular open-source project called &lt;a href="https://github.com/zalando/logbook/" rel="noopener noreferrer"&gt;Logbook&lt;/a&gt;. It is "an extensible Java library for HTTP request and response logging". The project uses Maven for build automation.&lt;/p&gt;

&lt;p&gt;I have recently &lt;a href="https://github.com/zalando/logbook/commit/3004477032652125ee4b302c1d5b4e69ec4ac198#diff-97593056ecbf49ece87cde9da9d7bf29153431ca92e89a86e7f190edd7e5c621R21" rel="noopener noreferrer"&gt;made a change&lt;/a&gt; in the &lt;em&gt;dependency management&lt;/em&gt; section of one of the modules, that resulted in an unwanted effect: &lt;a href="https://github.com/zalando/logbook/issues/1705" rel="noopener noreferrer"&gt;Lombok became a transitive dependency of our project&lt;/a&gt;, but only if you use Gradle in your project! If you use Maven, you won't face this issue.&lt;/p&gt;

&lt;p&gt;But why?!&lt;/p&gt;

&lt;p&gt;I'm going to share the result of my investigations in this post. The structure of the article is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Toolchain used, plus an intro to dependency management&lt;/li&gt;
&lt;li&gt;Example: Creating a library (with a dependency management section) and using it in a Maven and a Gradle project&lt;/li&gt;
&lt;li&gt;Showing disagreement: Adding submodules to the library, overriding the dependency management, and seeing how Maven and Gradle projects interpret it differently&lt;/li&gt;
&lt;li&gt;Conclusion and personal opinion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Toolchain
&lt;/h2&gt;

&lt;p&gt;I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenJDK Temurin 17.0.9+9 &lt;/li&gt;
&lt;li&gt;Apache Maven 3.9.6&lt;/li&gt;
&lt;li&gt;Gradle 8.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the &lt;a href="https://github.com/msdousti/maven-vs-gradle" rel="noopener noreferrer"&gt;source code developed below&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency management
&lt;/h2&gt;

&lt;p&gt;Dependency management is a way to tell Maven a few key pieces of information about the dependencies of the project. These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependency version&lt;/li&gt;
&lt;li&gt;Dependency scope&lt;/li&gt;
&lt;li&gt;Excluded dependencies&lt;/li&gt;
&lt;li&gt;Optional dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a refresher, see &lt;a href="https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html" rel="noopener noreferrer"&gt;Introduction to the Dependency Mechanism&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;p&gt;Consider a simple project with the following &lt;code&gt;pom.xml&lt;/code&gt; file (here, I'm using &lt;code&gt;jcip-annotations&lt;/code&gt; as a no-fluff dependency).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt;
         &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt;
         &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-library&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;properties&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.source&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.source&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.target&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.target&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;project.build.sourceEncoding&amp;gt;&lt;/span&gt;UTF-8&lt;span class="nt"&gt;&amp;lt;/project.build.sourceEncoding&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/properties&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencyManagement&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;net.jcip&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;jcip-annotations&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;provided&lt;span class="nt"&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencyManagement&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;net.jcip&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;jcip-annotations&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that the &lt;code&gt;dependencyManagement&lt;/code&gt; section specifies both version and scope for the dependency, so we can simply include the dependency in the &lt;code&gt;dependencies&lt;/code&gt; section without specifying those.&lt;/p&gt;

&lt;p&gt;If you run &lt;code&gt;mvn dependency:tree&lt;/code&gt; in the project root (I assume you have installed Maven and Java, and they are properly configured), you'll see&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[INFO] --- dependency:3.6.1:tree (default-cli) @ maven-library ---
[INFO] io.msdousti:maven-library:jar:1.0-SNAPSHOT
[INFO] \- net.jcip:jcip-annotations:jar:1.0:provided
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This clearly shows that the JAR file of &lt;code&gt;net.jcip:jcip-annotations&lt;/code&gt; is imported with the proper version (&lt;code&gt;1.0&lt;/code&gt;) and scope (&lt;code&gt;provided&lt;/code&gt;). Cool!&lt;/p&gt;

&lt;p&gt;You can also use your IDE to see this info graphically. For instance, in IntelliJ IDEA, you can open the Maven tool window:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqbty9gjgm9uwfrsgtma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqbty9gjgm9uwfrsgtma.png" alt="Viewing dependencies in Maven tool window of IntelliJ IDEA" width="582" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a side note, IntelliJ IDEA also provides this nifty feature called "Analyze Dependencies", which is very handy in seeing which dependencies conflict and which version is ultimately chosen by Maven:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbvgnx6pt0cldnasszjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbvgnx6pt0cldnasszjn.png" alt="Analyze Dependencies feature of IntelliJ IDEA" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Including the project as a dependency in another project
&lt;/h2&gt;

&lt;p&gt;To include the project as a dependency in another project, we first need to build and install it in the local Maven repository. Run this in the root of the project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mvn &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then check the local Maven repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; ~/.m2/repository/io/msdousti/maven-library/1.0-SNAPSHOT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should contain the project files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_remote.repositories  maven-library-1.0-SNAPSHOT.jar  maven-library-1.0-SNAPSHOT.pom  maven-metadata-local.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's include it as a dependency in a Maven and a Gradle project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Maven Project
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt;
         &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt;
         &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;test-maven&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;properties&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.source&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.source&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.target&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.target&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;project.build.sourceEncoding&amp;gt;&lt;/span&gt;UTF-8&lt;span class="nt"&gt;&amp;lt;/project.build.sourceEncoding&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/properties&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-library&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you check the dependency tree by running &lt;code&gt;mvn dependency:tree&lt;/code&gt;, you will see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[INFO] --- dependency:3.6.1:tree (default-cli) @ test-maven ---
[INFO] io.msdousti:test-maven:jar:1.0-SNAPSHOT
[INFO] \- io.msdousti:maven-library:jar:1.0-SNAPSHOT:compile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;jcip-annotations&lt;/code&gt; is not included. This is because it has the scope &lt;code&gt;provided&lt;/code&gt; in the &lt;code&gt;io.msdousti:maven-library&lt;/code&gt; dependency, so it won't be transitively included.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gradle
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;build.gradle&lt;/code&gt; looks like this (notice that I added &lt;code&gt;mavenLocal()&lt;/code&gt; to the list of &lt;code&gt;repositories&lt;/code&gt;, as I want to include the dependency &lt;code&gt;maven-library&lt;/code&gt; from the local &lt;code&gt;.m2&lt;/code&gt; repository):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;plugins&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="s1"&gt;'java'&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'io.msdousti'&lt;/span&gt;
&lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'1.0-SNAPSHOT'&lt;/span&gt;

&lt;span class="n"&gt;repositories&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mavenCentral&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mavenLocal&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;dependencies&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="s1"&gt;'io.msdousti:maven-library:1.0-SNAPSHOT'&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;gradle :dependencies&lt;/code&gt;, we can view the dependencies, but as the output is bulky, I'll just show the graphical version from my IDE:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh1kvwf1oiylh5raab0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh1kvwf1oiylh5raab0s.png" alt="Dependencies of the Gradle project" width="599" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We see that &lt;code&gt;maven-library&lt;/code&gt; is included in all four Gradle "class paths". But again, &lt;code&gt;jcip-annotations&lt;/code&gt; is not included, which is in agreement with the Maven project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disagreement begins
&lt;/h2&gt;

&lt;p&gt;Maven and Gradle start to diverge when our &lt;code&gt;maven-library&lt;/code&gt; has a sub-module with an overriding &lt;code&gt;dependencyManagement&lt;/code&gt; section.&lt;/p&gt;

&lt;p&gt;What does that mean?! Let's find out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating sub-modules
&lt;/h3&gt;

&lt;p&gt;Let's add two sub-modules to &lt;code&gt;maven-library&lt;/code&gt;. The directory structure will be like this (For simplicity, I ignored directories like &lt;code&gt;src&lt;/code&gt; that are not relevant here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
|
├─ pom.xml
├─ my-bom
|   └─ pom.xml
└─ my-module
    └─ pom.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 BOM stands for &lt;strong&gt;Bill of Materials&lt;/strong&gt;. It is typically a POM-only artifact that provides the &lt;code&gt;dependencyManagement&lt;/code&gt; section for a project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the parent &lt;code&gt;pom.xml&lt;/code&gt;, we have to specify a packaging of &lt;code&gt;pom&lt;/code&gt;, plus a &lt;code&gt;modules&lt;/code&gt; element listing the two sub-modules (add them just below the project coordinates):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;groupId&amp;gt;io.msdousti&amp;lt;/groupId&amp;gt;
&amp;lt;artifactId&amp;gt;maven-library&amp;lt;/artifactId&amp;gt;
&amp;lt;version&amp;gt;1.0-SNAPSHOT&amp;lt;/version&amp;gt;

&amp;lt;!-- add below --&amp;gt;
&amp;lt;packaging&amp;gt;pom&amp;lt;/packaging&amp;gt;
&amp;lt;modules&amp;gt;
    &amp;lt;module&amp;gt;my-bom&amp;lt;/module&amp;gt;
    &amp;lt;module&amp;gt;my-module&amp;lt;/module&amp;gt;
&amp;lt;/modules&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; file of the first child (&lt;code&gt;my-bom&lt;/code&gt;) will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt;
         &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt;
         &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;parent&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-library&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/parent&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;my-bom&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;packaging&amp;gt;&lt;/span&gt;pom&lt;span class="nt"&gt;&amp;lt;/packaging&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;properties&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.source&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.source&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;maven.compiler.target&amp;gt;&lt;/span&gt;17&lt;span class="nt"&gt;&amp;lt;/maven.compiler.target&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;project.build.sourceEncoding&amp;gt;&lt;/span&gt;UTF-8&lt;span class="nt"&gt;&amp;lt;/project.build.sourceEncoding&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/properties&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencyManagement&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;net.jcip&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;jcip-annotations&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;compile&lt;span class="nt"&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencyManagement&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that it has a &lt;code&gt;dependencyManagement&lt;/code&gt; section that specifies the &lt;code&gt;compile&lt;/code&gt; scope for &lt;code&gt;jcip-annotations&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; file of the other child (&lt;code&gt;my-module&lt;/code&gt;) will look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;project&lt;/span&gt; &lt;span class="na"&gt;xmlns=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0"&lt;/span&gt;
         &lt;span class="na"&gt;xmlns:xsi=&lt;/span&gt;&lt;span class="s"&gt;"http://www.w3.org/2001/XMLSchema-instance"&lt;/span&gt;
         &lt;span class="na"&gt;xsi:schemaLocation=&lt;/span&gt;&lt;span class="s"&gt;"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;modelVersion&amp;gt;&lt;/span&gt;4.0.0&lt;span class="nt"&gt;&amp;lt;/modelVersion&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;parent&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;maven-library&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/parent&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;my-module&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;dependencyManagement&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependencies&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.msdousti&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;my-bom&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.0-SNAPSHOT&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;import&lt;span class="nt"&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;type&amp;gt;&lt;/span&gt;pom&lt;span class="nt"&gt;&amp;lt;/type&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependencies&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/dependencyManagement&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;/project&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It includes &lt;code&gt;my-bom&lt;/code&gt; in the &lt;code&gt;dependencyManagement&lt;/code&gt; section, with scope &lt;code&gt;import&lt;/code&gt; and type &lt;code&gt;pom&lt;/code&gt;. If you look at the dependencies of this module, you see that it honors the scope specified by the parent project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc16x7exdpgmmofp21a8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc16x7exdpgmmofp21a8v.png" alt="Dependencies of module my-module" width="505" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do a &lt;code&gt;mvn install&lt;/code&gt; on the parent project, and let's find out how our Maven and Gradle projects that use the library will look.&lt;/p&gt;
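&lt;p&gt;If you want to see what Maven itself resolves the imported BOM to, the &lt;code&gt;help:effective-pom&lt;/code&gt; goal prints the fully resolved model of a module (run from the parent project; &lt;code&gt;-pl&lt;/code&gt; selects the sub-module):&lt;/p&gt;

```shell
# Print the fully resolved POM of my-module, including the
# dependencyManagement entries merged from the parent and the imported BOM.
mvn help:effective-pom -pl my-module
```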

&lt;h3&gt;
  
  
  Maven project
&lt;/h3&gt;

&lt;p&gt;Refresh the Maven project &lt;code&gt;test-maven&lt;/code&gt; in your IDE (or run the Maven command &lt;code&gt;mvn dependency:tree&lt;/code&gt; at the root of the project).&lt;br&gt;
The dependencies are still the same:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm46t8eblezork6j7wuk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvm46t8eblezork6j7wuk.png" alt="dependencies of the Maven project  raw `test-maven` endraw " width="541" height="342"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Gradle project
&lt;/h3&gt;

&lt;p&gt;Refresh the Gradle project &lt;code&gt;test-gradle&lt;/code&gt; in your IDE (or run the Gradle command &lt;code&gt;gradle :dependencies&lt;/code&gt; at the root of the project).&lt;br&gt;
You'll see that &lt;code&gt;net.jcip:jcip-annotations:1.0&lt;/code&gt; will appear as a transitive dependency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv8ivukys72am1w2uvbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcv8ivukys72am1w2uvbf.png" alt="dependencies of the Gradle project  raw `test-gradle` endraw " width="586" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Looking at the &lt;a href="https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html" rel="noopener noreferrer"&gt;article&lt;/a&gt; I shared at the beginning of this post, we stumble upon this section about the &lt;code&gt;import&lt;/code&gt; scope (emphasis mine):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This scope is only supported on a dependency of type pom in the &lt;code&gt;&amp;lt;dependencyManagement&amp;gt;&lt;/code&gt; section. It indicates the dependency is to be &lt;strong&gt;replaced&lt;/strong&gt; with the effective list of dependencies in the specified POM's &lt;code&gt;&amp;lt;dependencyManagement&amp;gt;&lt;/code&gt; section. Since they are replaced, dependencies with a scope of import do not actually participate in limiting the transitivity of a dependency.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, the expectation is that the dependencies are &lt;strong&gt;replaced&lt;/strong&gt;. This is exactly what Gradle does; Maven, on the other hand, does not seem to honor this part of the documentation.&lt;/p&gt;
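&lt;p&gt;A practical consequence: Gradle users of such a library who do not want the leaked transitive dependency can exclude it on the consumer side. A hedged sketch for the &lt;code&gt;build.gradle&lt;/code&gt; of the example project above (using the coordinates from this post):&lt;/p&gt;

```groovy
dependencies {
    // Exclude the dependency that leaks in via the imported BOM's
    // dependencyManagement section when resolved by Gradle.
    implementation('io.msdousti:maven-library:1.0-SNAPSHOT') {
        exclude group: 'net.jcip', module: 'jcip-annotations'
    }
}
```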

&lt;p&gt;At least this is what I understand! If you have further insights, please feel free to share them in the comments section.&lt;/p&gt;

</description>
      <category>java</category>
      <category>maven</category>
      <category>gradle</category>
    </item>
  </channel>
</rss>
