<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Giovanni Martinez</title>
    <description>The latest articles on DEV Community by Giovanni Martinez (@iqtoolkit).</description>
    <link>https://dev.to/iqtoolkit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3820977%2F83ddc750-0615-48a2-b90d-d3bec3da4fa1.png</url>
      <title>DEV Community: Giovanni Martinez</title>
      <link>https://dev.to/iqtoolkit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iqtoolkit"/>
    <language>en</language>
    <item>
      <title>Inside Hypercore: How TimescaleDB Quietly Built a Hybrid OLTP/OLAP Engine on Postgres</title>
      <dc:creator>Giovanni Martinez</dc:creator>
      <pubDate>Tue, 02 Jun 2026 04:00:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/inside-hypercore-how-timescaledb-quietly-built-a-hybrid-oltpolap-engine-on-postgres-49ih</link>
      <guid>https://dev.to/aws-builders/inside-hypercore-how-timescaledb-quietly-built-a-hybrid-oltpolap-engine-on-postgres-49ih</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt11brya0togjiksta7h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt11brya0togjiksta7h.png" alt="Diagram of internals of TimescalDB Hypercore" width="799" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Disclosure:&lt;/strong&gt; I work at Tiger Data, the company behind TimescaleDB. This post is my own analysis based on public documentation and code, and is not an official Tiger Data publication. I've tried to write the post I would have wanted to read six months ago.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The reframe
&lt;/h2&gt;

&lt;p&gt;Here's something most people miss: Hypercore isn't a new feature. It's a rename.&lt;/p&gt;

&lt;p&gt;The hybrid row-columnar storage engine inside TimescaleDB already existed — it was the machinery behind what everyone called "compression." Tiger Data renamed the whole package to &lt;em&gt;Hypercore&lt;/em&gt; because they realized the conversion from row-oriented to column-oriented storage was what customers actually cared about. Compression was the side effect. Real-time analytics on fresh data was the prize.&lt;/p&gt;

&lt;p&gt;That shift in framing matters. If you came to TimescaleDB looking for a time-series database with nice storage savings, you were buying the wrong thing. What you actually got was a Postgres extension that fuses an OLTP-style rowstore with an OLAP-style columnstore behind a single SQL surface — with no separate analytics database, no ETL, no eventual consistency.&lt;/p&gt;

&lt;p&gt;This post is about how that actually works. Not the marketing version. The version where we look at chunks, catalogs, and compression algorithms, and ask: &lt;em&gt;why does this work, and where does it break?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The two-store mental model
&lt;/h2&gt;

&lt;p&gt;If you've worked with Postgres long enough, you already have the mental model for Hypercore — you just don't know it yet.&lt;/p&gt;

&lt;p&gt;Think of how Postgres treats a freshly-inserted row versus an old one that's been vacuumed, frozen, and is sitting in a cold page on disk. Both are "the same table," but they live in different states, have different access patterns, and respond to different optimizations. Hypercore takes that idea and makes it explicit.&lt;/p&gt;

&lt;p&gt;There are two stores:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;rowstore&lt;/strong&gt; is regular Postgres heap storage. New data lands here. Inserts are fast. Updates and deletes work normally. Indexes behave the way you expect. If you stopped here, you'd just have a normal hypertable.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;columnstore&lt;/strong&gt; is what older chunks get converted into. Each column is stored separately, compressed, and organized for scanning rather than point lookups. Aggregations fly. Scans skip irrelevant chunks entirely. Storage drops by 90–98%.&lt;/p&gt;

&lt;p&gt;The trick is that &lt;em&gt;both stores belong to the same hypertable.&lt;/em&gt; You don't query them separately. You write one SQL statement, and the planner figures out which chunks live where and reads accordingly. From the application's perspective, it's just a table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This query reads transparently across rowstore and columnstore chunks&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;time_bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'1 hour'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;device_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sensor_readings&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fresh chunks (last few hours) sit in the rowstore. The week-old chunks have been converted to the columnstore. You wrote one query. Postgres did the right thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually happens during conversion
&lt;/h2&gt;

&lt;p&gt;Let's get concrete. A chunk's lifecycle looks like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the hypertable.&lt;/strong&gt; You write a normal &lt;code&gt;CREATE TABLE&lt;/code&gt;, then &lt;code&gt;SELECT create_hypertable(...)&lt;/code&gt;. TimescaleDB partitions the table into chunks based on a time interval (default: 7 days, configurable per workload).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inserts hit the rowstore.&lt;/strong&gt; Every &lt;code&gt;INSERT&lt;/code&gt; lands in the chunk corresponding to its timestamp. These chunks are regular heap tables — Postgres doesn't know they're special. You can &lt;code&gt;\d+&lt;/code&gt; them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A columnstore policy is configured.&lt;/strong&gt; When you enable the columnstore on a hypertable, you tell TimescaleDB &lt;em&gt;when&lt;/em&gt; chunks should convert (e.g., "after they're 7 days old") and &lt;em&gt;how&lt;/em&gt; they should be organized (the &lt;code&gt;segmentby&lt;/code&gt; and &lt;code&gt;orderby&lt;/code&gt; options, which we'll get to).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sensor_readings&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enable_columnstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;segmentby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'device_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;timescaledb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;orderby&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ts DESC'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_columnstore_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'sensor_readings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;after&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A background job converts old chunks.&lt;/strong&gt; When a chunk crosses the age threshold, a background worker rewrites it. Rows get reorganized into column-major layout, compression algorithms run per column, and the result lands in internal storage that the hypertable knows how to read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The original chunk is replaced.&lt;/strong&gt; From a query planner perspective, the chunk is now "in the columnstore." It still belongs to the hypertable, still participates in queries, but is read through different machinery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modifications still work.&lt;/strong&gt; This is the part that distinguishes Hypercore from a pure columnar engine. You can &lt;code&gt;INSERT&lt;/code&gt; into a columnstore chunk, &lt;code&gt;UPDATE&lt;/code&gt; it, &lt;code&gt;DELETE&lt;/code&gt; from it. The engine handles the decompress-modify-recompress dance under transactional semantics. It's not free — there's overhead — but it works, which most columnar systems can't say.&lt;/p&gt;

&lt;p&gt;If you want to see what's happening under the hood, the catalog views are your friend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- See which chunks are in the columnstore&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;chunk_schema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_compressed&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;timescaledb_information&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;hypertable_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'sensor_readings'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;range_start&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- See compression stats&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;before_compression_total_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;before&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;after_compression_total_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;after&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;hypertable_compression_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'sensor_readings'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The compression algorithms (or: why your time-series data wants to be small)
&lt;/h2&gt;

&lt;p&gt;Generic compression — gzip, LZ4, zstd — treats your data as an opaque stream of bytes. Time-series data isn't opaque. It has &lt;em&gt;structure&lt;/em&gt;: timestamps tick forward at predictable intervals, sensor readings drift slowly, device IDs repeat for hours. A type-aware compressor can exploit that, and it's the difference between 3x and 30x ratios.&lt;/p&gt;

&lt;p&gt;Hypercore picks the right algorithm per column type and layers them:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column type&lt;/th&gt;
&lt;th&gt;Algorithm(s)&lt;/th&gt;
&lt;th&gt;Why it works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Timestamps&lt;/td&gt;
&lt;td&gt;Delta-of-delta + Simple-8b&lt;/td&gt;
&lt;td&gt;Regular intervals → second derivative is zero → store almost nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slow-moving floats&lt;/td&gt;
&lt;td&gt;Gorilla XOR (from &lt;a href="https://www.vldb.org/pvldb/vol8/p1816-teller.pdf" rel="noopener noreferrer"&gt;Facebook's Gorilla paper&lt;/a&gt;)&lt;/td&gt;
&lt;td&gt;Consecutive values share most bits; XOR leaves near-zero residual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low-cardinality strings&lt;/td&gt;
&lt;td&gt;Dictionary + RLE&lt;/td&gt;
&lt;td&gt;Repeated values collapse to a count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Any small-integer stream&lt;/td&gt;
&lt;td&gt;Simple-8b&lt;/td&gt;
&lt;td&gt;Packs multiple values into one 64-bit word&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A timestamp column compressed with delta-of-delta + simple-8b gets to a few bits per row. A &lt;code&gt;status TEXT&lt;/code&gt; column with five distinct values across a billion rows costs essentially nothing. The cumulative layering is where the 90–98% numbers come from.&lt;/p&gt;

&lt;p&gt;What you &lt;em&gt;control&lt;/em&gt; is the column layout — &lt;code&gt;segmentby&lt;/code&gt; and &lt;code&gt;orderby&lt;/code&gt; — which determines how much repetition each algorithm has to work with. Get those right and the compression engine does its job. Get them wrong and you'll wonder why your "compressed" data is still big.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;segmentby&lt;/code&gt; and &lt;code&gt;orderby&lt;/code&gt;: the two decisions that determine everything
&lt;/h2&gt;

&lt;p&gt;If you remember nothing else from this post, remember this: &lt;strong&gt;the compression ratio you get is mostly a function of two settings.&lt;/strong&gt; Not the algorithms, not the chunk size, not the hardware. Those two settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;segmentby&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;segmentby&lt;/code&gt; tells the columnstore which columns to group rows by within a chunk. Think of it like a &lt;code&gt;GROUP BY&lt;/code&gt; for storage: all rows with the same &lt;code&gt;segmentby&lt;/code&gt; value get co-located, then compressed together.&lt;/p&gt;

&lt;p&gt;Pick the right column and the compression algorithms see long runs of repeated values — RLE crushes them, dictionary encoding hands them off as a single integer, the column collapses to almost nothing. Pick the wrong column (or none at all) and the compressor sees a random salad of values and does its best, which isn't much.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; &lt;code&gt;segmentby&lt;/code&gt; should be the column you most often filter on, typically a low-to-medium cardinality identifier — &lt;code&gt;device_id&lt;/code&gt;, &lt;code&gt;tenant_id&lt;/code&gt;, &lt;code&gt;symbol&lt;/code&gt;, &lt;code&gt;host&lt;/code&gt;. Not a timestamp. Not a high-cardinality UUID. Not the metric value itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;orderby&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;orderby&lt;/code&gt; controls the row order &lt;em&gt;within&lt;/em&gt; each segment. The default is your time column descending, which is almost always right, but you can layer in additional columns.&lt;/p&gt;

&lt;p&gt;Why does this matter? Because compression algorithms exploit locality. Delta-of-delta only works if consecutive rows have similar timestamps. Gorilla XOR only works if consecutive floats are similar. If your rows are in random order, none of that lands.&lt;/p&gt;

&lt;h3&gt;
  
  
  An example with numbers
&lt;/h3&gt;

&lt;p&gt;Imagine a metrics table: &lt;code&gt;(ts, device_id, temperature, humidity)&lt;/code&gt;, one billion rows, 10,000 devices, one reading per minute per device.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Approximate ratio&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No &lt;code&gt;segmentby&lt;/code&gt;, no &lt;code&gt;orderby&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~5x&lt;/td&gt;
&lt;td&gt;Generic per-column compression, no locality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;segmentby = device_id&lt;/code&gt;, default &lt;code&gt;orderby&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~25x&lt;/td&gt;
&lt;td&gt;Per-device rows colocated, timestamps regular, floats drift slowly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;segmentby = device_id, ts&lt;/code&gt; (wrong)&lt;/td&gt;
&lt;td&gt;~3x&lt;/td&gt;
&lt;td&gt;High-cardinality segments → tiny groups → no compression headroom&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are illustrative ranges, not benchmarks — your data will vary. But the &lt;em&gt;shape&lt;/em&gt; is real: a smart &lt;code&gt;segmentby&lt;/code&gt; is the difference between "TimescaleDB saved us a lot of money" and "we don't really see why people talk about this."&lt;/p&gt;

&lt;p&gt;To choose: find the column in the &lt;code&gt;WHERE&lt;/code&gt; clause of 80% of your analytical queries, verify it has low-to-medium cardinality (aim for at least a few thousand rows per segment), then check &lt;code&gt;hypertable_compression_stats&lt;/code&gt; after setup. If you're under 10x, your &lt;code&gt;segmentby&lt;/code&gt; is wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OLTP problem (and what 2.18 did about it)
&lt;/h2&gt;

&lt;p&gt;For most of Hypercore's life, there was one big asterisk: once a chunk moved to the columnstore, you lost your indexes.&lt;/p&gt;

&lt;p&gt;This is the classic columnar-database problem. Column stores are great for "scan a billion rows and aggregate," terrible for "find the one row where &lt;code&gt;transaction_id = 0xdeadbeef&lt;/code&gt;." Postgres's B-tree indexes — the things that make point lookups instant — don't translate naturally to compressed columnar layouts.&lt;/p&gt;

&lt;p&gt;In practice, that meant: as soon as your data got compressed, looking up a single record turned into a sequential scan over the chunk. Updating a single sensor reading from three months ago became a "decompress the whole batch, modify, recompress" operation. Fine for analytics. Painful for hybrid OLTP/OLAP workloads where someone occasionally needs to correct a specific record.&lt;/p&gt;

&lt;p&gt;TimescaleDB 2.18 fixed this. B-tree and hash indexes now work on columnstore chunks (still labeled Early Access at time of writing, so check the current docs). The vendor benchmarks report something like 1,185x faster record retrievals and 224x faster inserts on indexed columnstore data. Take vendor numbers with appropriate salt — but the qualitative leap is real and matches what you'd expect once compressed chunks have a real index structure attached to them.&lt;/p&gt;

&lt;p&gt;This is the change that turns Hypercore from "real-time analytics engine that's awkward at OLTP" into something legitimately hybrid. If you evaluated TimescaleDB more than a year ago and bounced because of point-lookup performance on old data, this is the upgrade worth re-evaluating on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gotchas section
&lt;/h2&gt;

&lt;p&gt;Every honest internals post needs this section. Hypercore is impressive, but it isn't magic. Here's what to watch for in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema changes are heavier than they look
&lt;/h3&gt;

&lt;p&gt;Adding a column to a hypertable with columnstore chunks is mostly fine. Changing a column &lt;em&gt;type&lt;/em&gt; — &lt;code&gt;int&lt;/code&gt; to &lt;code&gt;bigint&lt;/code&gt;, say, on a multi-billion-row hypertable — is a different animal. Compressed chunks need to be decompressed, rewritten, and recompressed. I've watched this turn into a multi-day operation on production-sized tables. Plan it like a real migration: maintenance windows, &lt;code&gt;maintenance_work_mem&lt;/code&gt; tuning, monitoring of &lt;code&gt;BufFileWrite&lt;/code&gt; and &lt;code&gt;LWLock:WALWrite&lt;/code&gt; waits. The "ALTER TABLE is fast in Postgres" reflex will burn you here.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;UPDATE&lt;/code&gt;s on the columnstore aren't free
&lt;/h3&gt;

&lt;p&gt;You can modify columnstore data under full transactional semantics, but an update generally implies a decompression step on the affected batch. Bulk updates on cold compressed chunks can be surprisingly expensive. If your workload involves a lot of late-arriving corrections, think carefully about whether that data should stay in the rowstore longer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logical replication has limits
&lt;/h3&gt;

&lt;p&gt;Logical replication on TimescaleDB hypertables — especially with the columnstore involved — has historically had sharp edges. Don't assume "it's just Postgres, so logical replication just works." Check current docs for what's officially supported in your version before relying on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to lean on Hypercore (and when to think twice)
&lt;/h2&gt;

&lt;p&gt;A short, honest decision framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lean in when your workload is:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Append-heavy with mostly immutable historical data&lt;/li&gt;
&lt;li&gt;Analytics-shaped: aggregations, time-bucketed queries, dashboards&lt;/li&gt;
&lt;li&gt;A natural fit for Postgres (you want SQL, joins, transactions, the ecosystem)&lt;/li&gt;
&lt;li&gt;Operating at a scale where storage cost is a line item you care about&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Think carefully when your workload involves:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heavy point-lookups and mutations on old data (better with 2.18, still worth evaluating)&lt;/li&gt;
&lt;li&gt;High-cardinality data that doesn't have natural &lt;code&gt;segmentby&lt;/code&gt; candidates&lt;/li&gt;
&lt;li&gt;Strict requirement for traditional Postgres backup/recovery flows without extension awareness&lt;/li&gt;
&lt;li&gt;A team without bandwidth to learn the operational shape of hypertables and chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest framing: Hypercore is excellent at what it's built for, which is real-time analytics on time-structured data. It is not a general-purpose OLTP accelerator, and it's not free — you take on the operational complexity of an extension that controls a lot of storage-level behavior. For the workloads it fits, that trade is one of the best deals in the Postgres ecosystem. For the workloads it doesn't, you'll know within a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This post stayed deliberately narrow: what Hypercore &lt;em&gt;is&lt;/em&gt;, how it works, and where to be careful. There are at least six follow-ups worth writing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;segmentby&lt;/code&gt; and &lt;code&gt;orderby&lt;/code&gt;, deep dive&lt;/strong&gt; — with actual benchmarks on a realistic dataset, not the hand-wavy ratios from the table above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous aggregates&lt;/strong&gt; — the other half of the real-time analytics story, and the feature that makes dashboards possible on billion-row tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk skipping&lt;/strong&gt; — the under-discussed query optimization that makes Hypercore queries even faster than the column layout alone would suggest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression algorithms, deep dive&lt;/strong&gt; — delta-of-delta, Gorilla XOR, Simple-8b, and dictionary encoding each deserve more than a table row. The math behind why they combine to 90–98% is worth a full post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backups and restore&lt;/strong&gt; — logical vs. physical backups behave very differently with compressed chunks. &lt;code&gt;pg_dump&lt;/code&gt; restores need to re-run compression policies; physical backups preserve compressed state. Testing your restore path on Hypercore data is its own topic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;segmentby&lt;/code&gt; cardinality pitfalls&lt;/strong&gt; — a too-granular &lt;code&gt;segmentby&lt;/code&gt; doesn't just hurt compression ratios, it can balloon catalog metadata in ways the docs don't loudly warn you about. Worth walking through with real numbers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If there's a piece of this you want me to go deeper on, let me know. The fun part of writing about internals is that there's always another layer.&lt;/p&gt;




&lt;p&gt;Want more Postgres internals content? I write here weekly and on &lt;a href="https://linkedin.com/in/thepostgresguy" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. My book series, &lt;a href="https://leanpub.com/postgresqlinternalsmastery" rel="noopener noreferrer"&gt;PostgreSQL Internals Mastery&lt;/a&gt;, goes much deeper on the topics in this post.&lt;/p&gt;

&lt;p&gt;Have questions about TimescaleDB or Postgres performance? Reach out on &lt;a href="https://x.com/iqtoolkit" rel="noopener noreferrer"&gt;X&lt;/a&gt; or &lt;a href="https://mastodon.social/@iqtoolkit" rel="noopener noreferrer"&gt;Mastodon&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>tigerdata</category>
      <category>hypercore</category>
    </item>
    <item>
      <title>Beyond the API: DocumentDB vs. Aurora PostgreSQL for JSON Workloads</title>
      <dc:creator>Giovanni Martinez</dc:creator>
      <pubDate>Tue, 28 Apr 2026 11:44:33 +0000</pubDate>
      <link>https://dev.to/aws-builders/beyond-the-api-documentdb-vs-aurora-postgresql-for-json-workloads-4dgl</link>
      <guid>https://dev.to/aws-builders/beyond-the-api-documentdb-vs-aurora-postgresql-for-json-workloads-4dgl</guid>
      <description>&lt;p&gt;In my last post, we dug into the deep internal trade-offs of running JSON workloads on PostgreSQL vs. MongoDB. We looked at how PostgreSQL's MVCC and TOAST architectures create hidden write amplification, and how MongoDB's WiredTiger engine handles documents differently with a copy-on-write B-Tree design.&lt;/p&gt;

&lt;p&gt;But what happens when you decide you do not want to manage those database servers yourself? You migrate to AWS, open the managed services menu, and start comparing &lt;strong&gt;Amazon DocumentDB&lt;/strong&gt; with &lt;strong&gt;Amazon Aurora PostgreSQL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the plot twist most architecture discussions miss: &lt;strong&gt;at this point, you are no longer comparing PostgreSQL heap internals directly against MongoDB WiredTiger internals. You are comparing two different database compute layers built on AWS's decoupled, log-centric, multi-AZ distributed storage model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That shift completely changes the "Postgres vs. Mongo" math for JSON workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shared Foundation: The Database as a Log-Centric System
&lt;/h2&gt;

&lt;p&gt;To understand both DocumentDB and Aurora PostgreSQL, you first need to understand Aurora's storage architecture.&lt;/p&gt;

&lt;p&gt;In traditional single-instance database deployments, the database engine is responsible for page management and writes data pages plus WAL over network-attached block storage. The compute node and storage node are tightly coupled from a write path perspective.&lt;/p&gt;

&lt;p&gt;Aurora changed that model.&lt;/p&gt;

&lt;p&gt;In Aurora's architecture, the compute layer primarily emits redo records to a distributed storage subsystem spanning three Availability Zones. The storage service is responsible for durable replication and page materialization across its internal fleet. This substantially reduces the amount of page-oriented write work the compute node must manage directly.&lt;/p&gt;

&lt;p&gt;Data is replicated across multiple copies in three AZs, continuously backed up to S3, and storage scales automatically.&lt;/p&gt;

&lt;p&gt;The key insight for this comparison: &lt;strong&gt;Amazon DocumentDB and Aurora PostgreSQL both use this same architectural pattern of decoupled compute and distributed, log-oriented storage.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon DocumentDB Unmasked: Compatibility Layer, Not MongoDB Internals
&lt;/h2&gt;

&lt;p&gt;DocumentDB is often described informally as "managed MongoDB on AWS." That description is directionally useful for app teams, but technically misleading — and the distinction matters at the architecture level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon DocumentDB is not MongoDB.&lt;/strong&gt; It does not run MongoDB's codebase. It does not use MongoDB's WiredTiger storage engine. It does not contain any MongoDB SSPL-licensed code. It is a fully proprietary AWS-built database engine that implements a &lt;strong&gt;subset&lt;/strong&gt; of the MongoDB wire protocol and API surface. The "with MongoDB compatibility" qualifier in the product name is doing a lot of heavy lifting — it means your MongoDB drivers and application code can connect and issue operations, but the engine executing those operations is fundamentally different under the hood.&lt;/p&gt;

&lt;p&gt;When your application sends a MongoDB-style operation to DocumentDB, the service accepts the wire protocol request at its compute layer and executes it through DocumentDB's own engine and storage path built on AWS's distributed storage architecture. The query planner is different. The index internals are different. The replication model is different. What you get is API-level compatibility — not behavioral equivalence.&lt;/p&gt;

&lt;p&gt;As of early 2026, DocumentDB supports compatibility modes for MongoDB 3.6, 4.0, 5.0, and the recently launched 8.0 — which brings a new query planner (Planner v3), collation, views, and additional aggregation stages. However, even with 8.0, the compatibility story has real gaps that matter at the architecture level.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-offs of Compatibility by Emulation
&lt;/h2&gt;

&lt;p&gt;The decoupled storage model brings real operational advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You do not manage MongoDB replica set internals directly.&lt;/li&gt;
&lt;li&gt;Storage growth is decoupled from traditional node-local disk constraints.&lt;/li&gt;
&lt;li&gt;Read scaling is straightforward via replicas that share the underlying distributed storage model.&lt;/li&gt;
&lt;li&gt;Backups, durability, and failover characteristics inherit AWS-managed behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But compatibility-by-emulation comes with engineering consequences: &lt;strong&gt;API compatibility is not the same as internal engine equivalence or full ecosystem parity.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is where things get concrete. These are real compatibility gaps that have bitten teams mid-migration:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;$lookup&lt;/code&gt; with correlated subqueries
&lt;/h3&gt;

&lt;p&gt;DocumentDB (5.0 and earlier) supports equality-based &lt;code&gt;$lookup&lt;/code&gt; joins and uncorrelated subqueries, but &lt;strong&gt;does not support correlated subqueries&lt;/strong&gt; — the &lt;code&gt;$lookup&lt;/code&gt; variant where you use &lt;code&gt;let&lt;/code&gt; and &lt;code&gt;pipeline&lt;/code&gt; together with &lt;code&gt;$expr&lt;/code&gt; to reference parent fields inside the child pipeline. This is a common MongoDB pattern for filtered joins. Your aggregation will fail at runtime with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MongoServerError: Aggregation stage not supported:
  '$lookup on multiple join conditions and uncorrelated subquery'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your application relies on correlated &lt;code&gt;$lookup&lt;/code&gt; pipelines for cross-collection queries, this is a migration-blocking gap that forces you to restructure the aggregation into multiple application-level round trips or flatten the data model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Missing or version-gated aggregation stages
&lt;/h3&gt;

&lt;p&gt;On DocumentDB 5.0, several commonly used aggregation stages are missing entirely: &lt;code&gt;$facet&lt;/code&gt;, &lt;code&gt;$unionWith&lt;/code&gt;, &lt;code&gt;$graphLookup&lt;/code&gt;, &lt;code&gt;$setWindowFields&lt;/code&gt;, and &lt;code&gt;$merge&lt;/code&gt;. DocumentDB 8.0 adds some of these (notably &lt;code&gt;$merge&lt;/code&gt;, &lt;code&gt;$bucket&lt;/code&gt;, &lt;code&gt;$facet&lt;/code&gt;, and &lt;code&gt;$set&lt;/code&gt;/&lt;code&gt;$unset&lt;/code&gt;), but &lt;code&gt;$setWindowFields&lt;/code&gt; and &lt;code&gt;$graphLookup&lt;/code&gt; remain unavailable. If your analytics pipelines use window functions via &lt;code&gt;$setWindowFields&lt;/code&gt; — a feature MongoDB introduced in 5.0 — you will not find a DocumentDB equivalent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Index behavior differences
&lt;/h3&gt;

&lt;p&gt;DocumentDB does not support partial indexes, case-insensitive indexes, or leverage indexes for queries using certain operators. For teams relying on partial indexes to reduce index size on sparse data or conditional queries, this means either bloated indexes or redesigned query patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem tooling assumptions
&lt;/h3&gt;

&lt;p&gt;Tools that assume native MongoDB internals — such as &lt;code&gt;mongodump&lt;/code&gt;/&lt;code&gt;mongorestore&lt;/code&gt; beyond version 100.6.1, or change stream consumers expecting MongoDB-native latency characteristics — may behave differently or fail. Client-side field-level encryption and queryable encryption are not supported.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line:&lt;/strong&gt; Run the &lt;a href="https://github.com/awslabs/amazon-documentdb-tools" rel="noopener noreferrer"&gt;AWS DocumentDB compatibility tool&lt;/a&gt; against your actual workload in staging. Pay special attention to aggregation pipelines with &lt;code&gt;$lookup&lt;/code&gt; subqueries and any stage not listed in the &lt;a href="https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html" rel="noopener noreferrer"&gt;supported APIs documentation&lt;/a&gt;. Do not assume that "MongoDB-compatible" means "drop-in replacement."&lt;/p&gt;




&lt;h2&gt;
  
  
  Aurora PostgreSQL: Does It Fix the MVCC / TOAST Problem?
&lt;/h2&gt;

&lt;p&gt;If you read my previous post, you know that PostgreSQL's MVCC and TOAST mechanisms cause heavy write amplification for large, frequently updated JSON documents. So if you migrate to Aurora PostgreSQL — where the compute node sends redo records to the distributed storage layer rather than writing full pages over a network-attached volume — does that fix the Postgres JSON penalty?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The short answer: No. But the bottleneck shifts.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Aurora vastly reduces page-level network I/O, but the Aurora PostgreSQL compute node still runs the standard PostgreSQL engine in memory. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MVCC still happens.&lt;/strong&gt; When you update a &lt;code&gt;jsonb&lt;/code&gt; document, PostgreSQL still creates a new tuple version and leaves a dead tuple behind in the heap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOAST still happens.&lt;/strong&gt; Documents over 2KB are still compressed and chunked into separate TOAST tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HOT updates are still blocked on &lt;code&gt;jsonb&lt;/code&gt;.&lt;/strong&gt; Expression indexes on &lt;code&gt;jsonb&lt;/code&gt; fields prevent PostgreSQL from applying Heap-Only Tuple optimization, meaning every update — even a single key change — forces a new index entry across all indexes on that table. This is a write amplification multiplier that compounds with the number of indexes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VACUUM is still required.&lt;/strong&gt; You still have to tune autovacuum aggressively to clean up dead tuples. In Aurora, unbounded bloat also means unbounded storage cost — you pay per GB on the Aurora volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most critical Aurora-specific consideration is &lt;strong&gt;I/O cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In standard Aurora configurations, you are billed per I/O request at $0.20 per million operations. Because updating a TOASTed &lt;code&gt;jsonb&lt;/code&gt; document causes write amplification (MVCC new tuple + TOAST rewrite + index entries), a write-heavy JSON workload can generate I/O charges that aren't obvious until your bill arrives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practical tip:&lt;/strong&gt; If you are running a heavy JSON mutation workload on Aurora PostgreSQL, evaluate &lt;strong&gt;Aurora I/O-Optimized&lt;/strong&gt;. This cluster configuration eliminates per-request I/O charges in exchange for a higher baseline on compute and storage pricing. AWS's guidance is that if I/O spend exceeds 25% of your total Aurora database bill, I/O-Optimized likely saves money — and for workloads dominated by large &lt;code&gt;jsonb&lt;/code&gt; updates with multiple indexes, you can easily clear that threshold. Teams have reported 30–40% cost reductions on I/O-intensive workloads after switching.&lt;/p&gt;

&lt;p&gt;A secondary option worth evaluating: &lt;strong&gt;Aurora Optimized Reads&lt;/strong&gt; with NVMe-backed instance types (r6gd/r6id). This extends the buffer pool to local SSD, which helps with read-heavy JSON access patterns where TOAST decompression causes repeated storage round-trips.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying, Indexing, and the Compute Bottleneck
&lt;/h2&gt;

&lt;p&gt;Because both services use the same underlying distributed storage model, performance differences are won and lost in the compute layer — specifically in the buffer cache and how efficiently each engine executes queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Aurora PostgreSQL (&lt;code&gt;jsonb&lt;/code&gt; + GIN)
&lt;/h3&gt;

&lt;p&gt;PostgreSQL's query optimizer has full native visibility into your tables and indexes. When you query a &lt;code&gt;jsonb&lt;/code&gt; column using a GIN index, PostgreSQL knows exactly how to traverse the decompressed JSON structure. You can join a deeply nested JSON document against multiple normalized relational tables, filter with a CTE, and aggregate with window functions — and the query planner handles all of it natively with decades of optimization work behind it.&lt;/p&gt;

&lt;p&gt;You also get partial indexes, expression indexes, and exclusion constraints on JSON data — tools that let you build highly targeted index structures that DocumentDB cannot replicate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Amazon DocumentDB
&lt;/h3&gt;

&lt;p&gt;DocumentDB is highly efficient for standard document retrieval and basic filtering. For pure document access patterns — key lookups, single-collection queries with indexed predicates — performance is strong and the operational simplicity is real.&lt;/p&gt;

&lt;p&gt;However, complex analytics are where the emulation model shows its limits. A multi-stage aggregation pipeline (&lt;code&gt;$match&lt;/code&gt;, &lt;code&gt;$unwind&lt;/code&gt;, &lt;code&gt;$group&lt;/code&gt;, &lt;code&gt;$lookup&lt;/code&gt;) requires the DocumentDB compute layer to pull data from the shared storage volume, hold it in memory, and execute pipeline stages iteratively. The &lt;code&gt;$lookup&lt;/code&gt; operator in particular lacks the relational query optimizer that PostgreSQL uses to build efficient join execution plans — DocumentDB uses hash, sort-merge, or nested-loop algorithms but without the cost-based planner that selects between them intelligently based on table statistics.&lt;/p&gt;

&lt;p&gt;For reporting or ad-hoc analytics that span multiple collections, this becomes a compute bottleneck at scale. And because &lt;code&gt;$setWindowFields&lt;/code&gt; is unavailable, any window-function-style analytics require client-side post-processing or a separate analytics layer entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;Since the underlying storage model is shared, your decision is primarily about your application architecture, team skill set, and where your workload is headed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Amazon DocumentDB if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are doing a lift-and-shift of an existing MongoDB application and want managed AWS infrastructure without rewriting your data access layer — but &lt;strong&gt;only after&lt;/strong&gt; you have validated compatibility using AWS's tooling against your actual aggregation pipelines and index usage&lt;/li&gt;
&lt;li&gt;Your data is heavily siloed into independent documents, cross-collection joins are rare, and your aggregation pipelines use only the supported subset&lt;/li&gt;
&lt;li&gt;Schema flexibility and rapid iteration on document shape are more valuable than referential integrity — and you have a team experienced with MongoDB operational patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Aurora PostgreSQL if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building greenfield — the relational model with &lt;code&gt;jsonb&lt;/code&gt; gives you the highest ceiling for complexity as requirements evolve, without having to migrate engines later&lt;/li&gt;
&lt;li&gt;Your data model requires or will eventually require referential integrity, foreign keys, and complex &lt;code&gt;JOIN&lt;/code&gt; operations&lt;/li&gt;
&lt;li&gt;You need robust analytics: full SQL, window functions, GIN indexes, partial indexes, and expression indexes on JSON data&lt;/li&gt;
&lt;li&gt;You want one engine that handles both structured and semi-structured data without a separate system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The case that often gets decided wrong:&lt;/strong&gt; A team has a MongoDB application that uses &lt;code&gt;$lookup&lt;/code&gt; with correlated subqueries, &lt;code&gt;$facet&lt;/code&gt; for multi-faceted search, or &lt;code&gt;$setWindowFields&lt;/code&gt; for analytical aggregations. They assume DocumentDB is the natural AWS landing zone because the API looks the same. In practice, they would be better served migrating to Aurora PostgreSQL with a &lt;code&gt;jsonb&lt;/code&gt;-first schema, where the features they depend on have native equivalents that are more mature and more performant — SQL &lt;code&gt;JOIN&lt;/code&gt; with a cost-based planner instead of &lt;code&gt;$lookup&lt;/code&gt; emulation, &lt;code&gt;FILTER&lt;/code&gt; clauses and window functions instead of &lt;code&gt;$facet&lt;/code&gt; and &lt;code&gt;$setWindowFields&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If your MongoDB usage is already pushing past simple document access into relational territory, the migration to Aurora PostgreSQL is an investment that pays back in query capability and long-term maintainability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;Amazon DocumentDB is a genuine feat of cloud engineering — decoupling a MongoDB-compatible wire protocol from its underlying storage internals enabled legacy application teams to scale on AWS without rearchitecting their data layer. The 8.0 release narrows the gap significantly. For the right workload — high-throughput document access with straightforward query patterns — it earns its place.&lt;/p&gt;

&lt;p&gt;But for modern, data-intensive architectures — where data models inevitably become relational as a product matures — &lt;strong&gt;Aurora PostgreSQL&lt;/strong&gt; remains the stronger long-term foundation. Tune your autovacuum, watch your TOAST tables, evaluate Aurora I/O-Optimized for write-heavy JSON workloads, and keep an eye on HOT update limitations if you are indexing &lt;code&gt;jsonb&lt;/code&gt; fields aggressively. The operational overhead is real, but the query expressiveness and data integrity guarantees pay it back.&lt;/p&gt;

&lt;p&gt;Have you hit unexpected I/O costs from TOAST write amplification on Aurora? Or run into DocumentDB compatibility gaps mid-migration? Drop a comment — I'd love to compare notes.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>documentdb</category>
      <category>aurora</category>
      <category>postgres</category>
    </item>
    <item>
      <title>PostgreSQL vs. MongoDB for JSON: The Internal Trade-offs They Don't Tell You in the Docs</title>
      <dc:creator>Giovanni Martinez</dc:creator>
      <pubDate>Sun, 15 Mar 2026 13:02:35 +0000</pubDate>
      <link>https://dev.to/aws-builders/postgresql-vs-mongodb-for-json-the-internal-trade-offs-they-dont-tell-you-in-the-docs-40oe</link>
      <guid>https://dev.to/aws-builders/postgresql-vs-mongodb-for-json-the-internal-trade-offs-they-dont-tell-you-in-the-docs-40oe</guid>
      <description>&lt;p&gt;The question comes up constantly in architecture discussions: "Should we use MongoDB or PostgreSQL for our JSON-heavy workload?" Having managed both at scale, I can tell you the answer is not as simple as "MongoDB is for documents, Postgres is for tables." There are deep internals at play on both sides that will affect your performance, storage footprint, and operational burden in ways that a quick benchmark won't reveal. Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  First, a Fundamental Framing Problem
&lt;/h2&gt;

&lt;p&gt;MongoDB is often called a "document database," which people interpret as: great for JSON, superior to relational databases for flexible data. That framing is misleading. MongoDB is not simply a JSON store with a query layer on top. It is a non-relational database, meaning it has no native concept of joins, no foreign key enforcement, no referential integrity, and no support for multi-document ACID transactions that span arbitrary collections (multi-document transactions were added in v4.0 but carry significant performance overhead and are not the default usage pattern).&lt;/p&gt;

&lt;p&gt;To be precise for teams running MongoDB Atlas or Enterprise: while MongoDB does support multi-document transactions, they are bound by a 60-second execution limit and incur significant throughput penalties as lock contention increases. In PostgreSQL, a transaction is a first-class citizen — the default behavior for every statement. In MongoDB, a multi-document transaction is a specialized escape hatch you reach for when your schema design has failed to keep related data local to a single document. That distinction matters enormously at scale.&lt;/p&gt;

&lt;p&gt;This matters because the moment your data has relationships — orders belong to customers, line items belong to orders, products belong to categories — MongoDB forces you to either embed everything (document bloat, duplication, update anomalies) or handle joins in application code (&lt;code&gt;$lookup&lt;/code&gt; is available but is a post-processing aggregation step, not a query optimizer join). Neither is free. PostgreSQL's relational model with JSON support gives you both flexibility and the full power of set-based relational operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostgreSQL's JSON Capabilities: More Than You Think
&lt;/h2&gt;

&lt;p&gt;PostgreSQL has two JSON data types: &lt;code&gt;json&lt;/code&gt; (stored as plain text, re-parsed on each access) and &lt;code&gt;jsonb&lt;/code&gt; (stored as a parsed binary format, indexed, and operator-rich). For any production workload, use &lt;code&gt;jsonb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;jsonb&lt;/code&gt; you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GIN indexes&lt;/strong&gt; on the entire document or specific paths for fast containment queries (&lt;code&gt;@&amp;gt;&lt;/code&gt; operator)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path-based expression indexes&lt;/strong&gt;: &lt;code&gt;CREATE INDEX ON events ((payload -&amp;gt;&amp;gt; 'event_type'))&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full SQL&lt;/strong&gt;: join your JSON documents against normalized tables, filter with CTEs, aggregate with window functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial indexes&lt;/strong&gt;: index only the subset of rows where a JSON field meets a condition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema validation&lt;/strong&gt; via &lt;code&gt;CHECK&lt;/code&gt; constraints on JSON paths when you need it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB also has rich query capabilities on nested documents, but it lacks the composability of SQL. Complex reporting that mixes document access with aggregation across related collections becomes an aggregation pipeline exercise that few SQL developers would recognize as readable.&lt;/p&gt;

&lt;h2&gt;
  
  
  MVCC: The Hidden Cost in PostgreSQL JSON Workloads
&lt;/h2&gt;

&lt;p&gt;PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle concurrent reads and writes without locking. The mechanics create a write amplification problem that is especially painful for large &lt;code&gt;jsonb&lt;/code&gt; columns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How MVCC works on an UPDATE:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;BEFORE UPDATE:
[Heap Page]
 +------------------------------------------+
 | Tuple v1 (xmin=100, xmax=0) | ...data... |
 +------------------------------------------+

AFTER UPDATE (change one JSON key):
[Heap Page]
 +------------------------------------------+
 | Tuple v1 (xmin=100, xmax=200) | ...data... | &amp;lt;-- marked DEAD
 | Tuple v2 (xmin=200, xmax=0)   | ...data... | &amp;lt;-- NEW full copy
 +------------------------------------------+
                                               ^
                    Dead tuple occupies space until VACUUM runs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you update a row, PostgreSQL does not modify the existing row in place. It writes a new version of the entire row and marks the old version as dead. Even if you change a single key in a 10KB &lt;code&gt;jsonb&lt;/code&gt; document, the full 10KB is written again. Readers on older snapshots see the prior version until their transaction completes — which is excellent for read concurrency, but means dead tuples accumulate on disk.&lt;/p&gt;

&lt;p&gt;For JSON-heavy workloads with frequent partial updates, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table bloat builds faster than in equivalent workloads on narrow rows&lt;/li&gt;
&lt;li&gt;Index bloat follows, because index entries point to specific heap tuple versions&lt;/li&gt;
&lt;li&gt;Query performance degrades as the visibility map becomes stale and more pages need checking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB's WiredTiger storage engine also uses MVCC internally, but it employs a copy-on-write B-Tree model rather than PostgreSQL's heap-based tuple versioning. When you update a document, WiredTiger caches the modification in memory and appends it to a Write-Ahead Log. During its periodic checkpoint process, it writes modified pages to new block locations on disk and eventually frees the old space.&lt;/p&gt;

&lt;p&gt;While WiredTiger avoids the exact single-row write amplification seen in PostgreSQL, it is not zero-cost. It still involves writing out entire compressed pages during checkpoints, and relies heavily on background cache eviction to maintain performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  VACUUM: PostgreSQL's Maintenance Obligation
&lt;/h2&gt;

&lt;p&gt;VACUUM is PostgreSQL's answer to MVCC dead tuple accumulation. It reclaims space occupied by dead tuples, updates the visibility map (allowing Index-Only Scans to skip heap fetches), and prevents transaction ID wraparound — the catastrophic failure mode where Postgres refuses to accept new transactions.&lt;/p&gt;

&lt;p&gt;PostgreSQL has autovacuum, a background daemon that triggers based on a dead tuple threshold (&lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; defaults to 20% of table size). For large tables, this default is dangerously high — a 500 million row table would need 100 million dead tuples before autovacuum wakes up.&lt;/p&gt;

&lt;p&gt;For JSON-heavy workloads, tune aggressively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lower &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; to &lt;code&gt;0.01&lt;/code&gt; or even &lt;code&gt;0.005&lt;/code&gt; for large tables&lt;/li&gt;
&lt;li&gt;Raise autovacuum's I/O budget by reducing &lt;code&gt;autovacuum_vacuum_cost_delay&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Monitor &lt;code&gt;pg_stat_user_tables&lt;/code&gt;: track &lt;code&gt;n_dead_tup&lt;/code&gt;, &lt;code&gt;last_autovacuum&lt;/code&gt;, and &lt;code&gt;last_autoanalyze&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Consider &lt;code&gt;VACUUM ANALYZE&lt;/code&gt; after bulk loads or mass updates to refresh planner statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB does not have an equivalent to VACUUM. WiredTiger reclaims space within its B-tree pages automatically via checkpointing, and collection-level compaction can be triggered manually. There is no "transaction ID wraparound" risk, and space reclamation is generally more transparent to the application.&lt;/p&gt;

&lt;h2&gt;
  
  
  TOAST: PostgreSQL's Large Value Storage
&lt;/h2&gt;

&lt;p&gt;PostgreSQL has a hard limit: a single row must fit on one 8KB page. Since &lt;code&gt;jsonb&lt;/code&gt; documents can easily exceed 8KB, PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to handle large values.&lt;/p&gt;

&lt;p&gt;When a &lt;code&gt;jsonb&lt;/code&gt; value exceeds roughly 2KB (the TOAST threshold), PostgreSQL will automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Compress the value (using LZ compression by default)&lt;/li&gt;
&lt;li&gt;If still too large, chunk it into 2KB segments stored in a separate TOAST table (&lt;code&gt;pg_toast_&amp;lt;oid&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Store a pointer in the main heap row referencing the TOAST chunks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is largely transparent, but the performance implications are real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reads&lt;/strong&gt;: fetching a TOASTed column requires an additional heap scan on the TOAST table — extra I/O on every large document fetch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Updates&lt;/strong&gt;: updating any field in a large &lt;code&gt;jsonb&lt;/code&gt; document causes the entire value to be re-TOASTed, even if you only changed one key. Combined with MVCC write amplification, this is double the I/O penalty&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VACUUM on TOAST tables&lt;/strong&gt;: autovacuum must process the TOAST table separately; TOAST table bloat is a common source of hidden disk usage that operators miss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index access&lt;/strong&gt;: GIN indexes on &lt;code&gt;jsonb&lt;/code&gt; operate on the decompressed value, so retrieving the full document still requires a TOAST table hit, even if the query filter was satisfied entirely by a GIN index&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical recommendation: if your JSON documents regularly exceed 4–8KB, consider splitting large, rarely-queried fields into separate columns or an object store. Keep the frequently-queried JSON fields in a compact &lt;code&gt;jsonb&lt;/code&gt; column.&lt;/p&gt;

&lt;p&gt;MongoDB documents have their own size limit (16MB per document) and store data in BSON format. WiredTiger handles variable-length documents natively without a separate overflow mechanism, which gives MongoDB an advantage for workloads dominated by large, frequently-updated documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PostgreSQL (jsonb)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transactions&lt;/strong&gt;: ACID by default for every statement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Joins&lt;/strong&gt;: native SQL joins with optimizer support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrity&lt;/strong&gt;: foreign keys and relational constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update write cost&lt;/strong&gt;: full-row rewrite on updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large JSON reads&lt;/strong&gt;: TOAST can add extra I/O on large values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Space maintenance&lt;/strong&gt;: requires VACUUM tuning on write-heavy workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index options&lt;/strong&gt;: GIN, B-tree, partial, and expression indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics&lt;/strong&gt;: full SQL, CTEs, and window functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best fit&lt;/strong&gt;: mixed relational + JSON workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  MongoDB
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transactions&lt;/strong&gt;: multi-document ACID available, but with higher overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Joins&lt;/strong&gt;: &lt;code&gt;$lookup&lt;/code&gt; as an aggregation stage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrity&lt;/strong&gt;: no native relational integrity model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update write cost&lt;/strong&gt;: copy-on-write page checkpointing in WiredTiger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large JSON reads&lt;/strong&gt;: BSON inline storage up to 16MB document limit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Space maintenance&lt;/strong&gt;: automatic space reuse, optional manual compaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index options&lt;/strong&gt;: compound, multikey, text, and geospatial indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics&lt;/strong&gt;: aggregation pipeline model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best fit&lt;/strong&gt;: document-first workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxcrmxyymgndny6jycoc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxcrmxyymgndny6jycoc.jpg" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Operational Reality
&lt;/h2&gt;

&lt;p&gt;In my experience managing both at scale, PostgreSQL's MVCC + VACUUM model requires more active DBA engagement for write-heavy JSON workloads. You will fight bloat if you don't tune autovacuum aggressively. TOAST adds I/O overhead that isn't obvious until you instrument it. But the payoff — full SQL expressiveness, relational integrity, and a single database for everything — is significant.&lt;/p&gt;

&lt;p&gt;MongoDB's operational model is simpler for pure document workloads, but the moment your product evolves and relationships emerge (they always do), you pay the cost of having chosen a non-relational foundation at a time when re-architecting is expensive.&lt;/p&gt;

&lt;p&gt;The best database for JSON is the one you understand deeply enough to tune, monitor, and operate at production scale. For most teams building data-intensive applications in 2026, that database is PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;One contender I haven't covered here is Amazon DocumentDB — a MongoDB-compatible service built on the Aurora storage layer that deserves its own deep dive. I'll be publishing a follow-up post that adds DocumentDB to the mix, including what it actually is under the hood, where it diverges from native MongoDB, and how it stacks up against Aurora PostgreSQL for JSON workloads on AWS.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Have you run into TOAST bloat or MVCC write amplification in a PostgreSQL JSON workload? Or migrated from MongoDB back to Postgres? Drop a comment — I'd love to compare notes.&lt;/p&gt;

&lt;p&gt;canonical_url: &lt;a href="https://iqtoolkit.ai/blog/postgresql-vs-mongodb-json-internal-tradeoffs" rel="noopener noreferrer"&gt;https://iqtoolkit.ai/blog/postgresql-vs-mongodb-json-internal-tradeoffs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>mongodb</category>
      <category>database</category>
      <category>json</category>
    </item>
  </channel>
</rss>
