DEV Community: Giovanni Martinez

Inside Hypercore: How TimescaleDB Quietly Built a Hybrid OLTP/OLAP Engine on Postgres

Giovanni Martinez — Tue, 02 Jun 2026 04:00:00 +0000

Disclosure: I work at Tiger Data, the company behind TimescaleDB. This post is my own analysis based on public documentation and code, and is not an official Tiger Data publication. I've tried to write the post I would have wanted to read six months ago.

The reframe

Here's something most people miss: Hypercore isn't a new feature. It's a rename.

The hybrid row-columnar storage engine inside TimescaleDB already existed — it was the machinery behind what everyone called "compression." Tiger Data renamed the whole package to Hypercore because they realized the conversion from row-oriented to column-oriented storage was what customers actually cared about. Compression was the side effect. Real-time analytics on fresh data was the prize.

That shift in framing matters. If you came to TimescaleDB looking for a time-series database with nice storage savings, you were buying the wrong thing. What you actually got was a Postgres extension that fuses an OLTP-style rowstore with an OLAP-style columnstore behind a single SQL surface — with no separate analytics database, no ETL, no eventual consistency.

This post is about how that actually works. Not the marketing version. The version where we look at chunks, catalogs, and compression algorithms, and ask: why does this work, and where does it break?

The two-store mental model

If you've worked with Postgres long enough, you already have the mental model for Hypercore — you just don't know it yet.

Think of how Postgres treats a freshly-inserted row versus an old one that's been vacuumed, frozen, and is sitting in a cold page on disk. Both are "the same table," but they live in different states, have different access patterns, and respond to different optimizations. Hypercore takes that idea and makes it explicit.

There are two stores:

The rowstore is regular Postgres heap storage. New data lands here. Inserts are fast. Updates and deletes work normally. Indexes behave the way you expect. If you stopped here, you'd just have a normal hypertable.

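The columnstore is what older chunks get converted into. Each column is stored separately, compressed, and organized for scanning rather than point lookups. Aggregations are fast because a scan reads only the columns it needs, and chunks outside the query's time range are skipped entirely. Storage typically drops by 90–98% on well-structured time-series data.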

The trick is that both stores belong to the same hypertable. You don't query them separately. You write one SQL statement, and the planner figures out which chunks live where and reads accordingly. From the application's perspective, it's just a table.

-- This query reads transparently across rowstore and columnstore chunks
SELECT time_bucket('1 hour', ts) AS hour,
       device_id,
       avg(temperature)
FROM sensor_readings
WHERE ts > now() - INTERVAL '7 days'
GROUP BY hour, device_id;

The most recent chunks sit in the rowstore. Chunks older than the policy threshold have been converted to the columnstore. You wrote one query, and the planner read each chunk through the right path.

What actually happens during conversion

Let's get concrete. A chunk's lifecycle looks like this:

Create the hypertable. You write a normal CREATE TABLE, then SELECT create_hypertable(...). TimescaleDB partitions the table into chunks based on a time interval (default: 7 days, configurable per workload).

Inserts hit the rowstore. Every INSERT lands in the chunk corresponding to its timestamp. These chunks are regular heap tables — Postgres doesn't know they're special. You can \d+ them.

A columnstore policy is configured. When you enable the columnstore on a hypertable, you tell TimescaleDB when chunks should convert (e.g., "after they're 7 days old") and how they should be organized (the segmentby and orderby options, which we'll get to).

ALTER TABLE sensor_readings SET (
  timescaledb.enable_columnstore = true,
  timescaledb.segmentby = 'device_id',
  timescaledb.orderby = 'ts DESC'
);

-- add_columnstore_policy is a procedure, so it is invoked with CALL rather than SELECT
CALL add_columnstore_policy('sensor_readings', after => INTERVAL '7 days');

A background job converts old chunks. When a chunk crosses the age threshold, a background worker rewrites it. Rows get reorganized into column-major layout, compression algorithms run per column, and the result lands in internal storage that the hypertable knows how to read.

The original chunk is replaced. From a query planner perspective, the chunk is now "in the columnstore." It still belongs to the hypertable, still participates in queries, but is read through different machinery.

Modifications still work. This is the part that distinguishes Hypercore from a pure columnar engine. You can INSERT into a columnstore chunk, UPDATE it, DELETE from it. The engine handles the decompress-modify-recompress dance under transactional semantics. It's not free — there's overhead — but it works, which most columnar systems can't say.
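The lifecycle above mostly comes down to two pieces of date arithmetic: which chunk a row lands in, and when that chunk becomes eligible for conversion. Here is a minimal sketch of that logic in Python. It is a simplified model, not TimescaleDB's implementation: `chunk_range` and `eligible_for_columnstore` are hypothetical helpers, chunk boundaries are assumed to align to multiples of the interval since the Unix epoch, and a chunk is assumed to qualify once its range end is older than the policy's `after` interval.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def chunk_range(ts, chunk_interval=timedelta(days=7)):
    """Return the [start, end) range of the chunk that would hold ts.

    Assumption: chunks are aligned to multiples of chunk_interval since the epoch.
    """
    index = (ts - EPOCH) // chunk_interval
    start = EPOCH + index * chunk_interval
    return start, start + chunk_interval

def eligible_for_columnstore(chunk_end, now, after=timedelta(days=7)):
    """Assumption: a chunk qualifies once its newest possible row is older than `after`."""
    return chunk_end <= now - after

now = datetime(2026, 6, 2, tzinfo=timezone.utc)
fresh_start, fresh_end = chunk_range(now)
old_start, old_end = chunk_range(now - timedelta(days=21))

print(fresh_start, fresh_end, eligible_for_columnstore(fresh_end, now))
print(old_start, old_end, eligible_for_columnstore(old_end, now))
```

The chunk that contains "now" is never eligible under this model, which is why fresh inserts keep hitting plain heap storage.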

If you want to see what's happening under the hood, the catalog views are your friend:

-- See which chunks are in the columnstore
SELECT chunk_schema, chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY range_start DESC;

-- See compression stats
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('sensor_readings');

The compression algorithms (or: why your time-series data wants to be small)

Generic compression — gzip, LZ4, zstd — treats your data as an opaque stream of bytes. Time-series data isn't opaque. It has structure: timestamps tick forward at predictable intervals, sensor readings drift slowly, device IDs repeat for hours. A type-aware compressor can exploit that, and it's the difference between 3x and 30x ratios.

Hypercore picks the right algorithm per column type and layers them:

| Column type | Algorithm(s) | Why it works |
| --- | --- | --- |
| Timestamps | Delta-of-delta + Simple-8b | Regular intervals → second derivative is zero → store almost nothing |
| Slow-moving floats | Gorilla XOR (from Facebook's Gorilla paper) | Consecutive values share most bits; XOR leaves near-zero residual |
| Low-cardinality strings | Dictionary + RLE | Repeated values collapse to a count |
| Any small-integer stream | Simple-8b | Packs multiple values into one 64-bit word |

A timestamp column compressed with delta-of-delta + simple-8b gets to a few bits per row. A status TEXT column with five distinct values across a billion rows costs essentially nothing. The cumulative layering is where the 90–98% numbers come from.
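To make the "few bits per row" claim concrete, here is a small Python sketch of delta-of-delta encoding on a timestamp column. It illustrates the idea only and is not TimescaleDB's code: the function names are mine, and the bit-packing step (Simple-8b) is left out. The point is simply that regular intervals turn into a stream of zeros and tiny integers, which is exactly what a bit-packer wants to see.

```python
def delta_of_delta(timestamps):
    """Encode as [first value, first delta, then second-order deltas]."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    second = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]] + second

def decode(encoded):
    """Invert delta_of_delta to show the encoding is lossless."""
    if len(encoded) < 2:
        return list(encoded)
    values = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        values.append(values[-1] + delta)
    return values

# One reading per minute, with a single reading arriving 2 seconds late.
ts = [1_700_000_000 + 60 * i for i in range(8)]
ts[5] += 2

encoded = delta_of_delta(ts)
print(encoded)  # [1700000000, 60, 0, 0, 0, 2, -4, 2]
```

After the first two entries, every value fits in a handful of bits, and a perfectly regular series would be all zeros.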

What you control is the column layout — segmentby and orderby — which determines how much repetition each algorithm has to work with. Get those right and the compression engine does its job. Get them wrong and you'll wonder why your "compressed" data is still big.

`segmentby` and `orderby`: the two decisions that determine everything

If you remember nothing else from this post, remember this: the compression ratio you get is mostly a function of two settings. Not the algorithms, not the chunk size, not the hardware. Those two settings.

`segmentby`

segmentby tells the columnstore which columns to group rows by within a chunk. Think of it like a GROUP BY for storage: all rows with the same segmentby value get co-located, then compressed together.

Pick the right column and the compression algorithms see long runs of repeated values — RLE crushes them, dictionary encoding hands them off as a single integer, the column collapses to almost nothing. Pick the wrong column (or none at all) and the compressor sees a random salad of values and does its best, which isn't much.
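A toy example shows why co-location matters for run-based encodings. The Python sketch below dictionary-encodes a device column and then run-length encodes it, once in arrival order and once grouped by device. It is an illustration of the principle, not TimescaleDB's on-disk format, and `dictionary_rle` is a hypothetical helper.

```python
from itertools import groupby

def dictionary_rle(values):
    """Dictionary-encode values, then run-length encode the codes."""
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    runs = [(code, len(list(group))) for code, group in groupby(codes)]
    return dictionary, runs

# Three devices, four readings each, in arrival order (interleaved).
interleaved = ["dev-a", "dev-b", "dev-c"] * 4
# The same rows after grouping by device, as a segmentby-style layout would co-locate them.
segmented = sorted(interleaved)

_, runs_interleaved = dictionary_rle(interleaved)
_, runs_segmented = dictionary_rle(segmented)

print(len(runs_interleaved), runs_interleaved[:3])  # 12 [(0, 1), (1, 1), (2, 1)]
print(len(runs_segmented), runs_segmented)          # 3 [(0, 4), (1, 4), (2, 4)]
```

Same rows, same dictionary, but the grouped layout needs 3 runs instead of 12, and the gap widens with every additional row per device.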

Rule of thumb: segmentby should be the column you most often filter on, typically a low-to-medium cardinality identifier — device_id, tenant_id, symbol, host. Not a timestamp. Not a high-cardinality UUID. Not the metric value itself.

`orderby`

orderby controls the row order within each segment. The default is your time column descending, which is almost always right, but you can layer in additional columns.

Why does this matter? Because compression algorithms exploit locality. Delta-of-delta only works if consecutive rows have similar timestamps. Gorilla XOR only works if consecutive floats are similar. If your rows are in random order, none of that lands.
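Here is a rough way to see the locality effect for floats. The Python sketch below XORs the IEEE 754 bit patterns of consecutive values and counts the bits between the first and last set bit, which is the part a Gorilla-style encoder has to store. It is a simplification (real Gorilla encoding also spends control bits and reuses bit windows), and the helper names are mine.

```python
import struct

def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def meaningful_bits(prev, curr):
    """Bits between the first and last set bit of prev XOR curr (0 if identical)."""
    xor = float_bits(prev) ^ float_bits(curr)
    if xor == 0:
        return 0
    leading = 64 - xor.bit_length()
    trailing = (xor & -xor).bit_length() - 1
    return 64 - leading - trailing

def total_bits(values):
    """Sum of meaningful XOR bits across consecutive pairs."""
    return sum(meaningful_bits(a, b) for a, b in zip(values, values[1:]))

ordered = [20.0, 20.0, 20.5, 20.5, 21.0, 21.0]    # slow drift, in time order
scrambled = [20.0, 21.0, 20.5, 20.0, 21.0, 20.5]  # same values, order lost

print(total_bits(ordered), total_bits(scrambled))
```

The ordered series has repeated neighbours that XOR to zero; the scrambled one never does, so it pays for every pair.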

An example with numbers

Imagine a metrics table: (ts, device_id, temperature, humidity), one billion rows, 10,000 devices, one reading per minute per device.

| Configuration | Approximate ratio | Why |
| --- | --- | --- |
| No `segmentby`, no `orderby` | ~5x | Generic per-column compression, no locality |
| `segmentby = device_id`, default `orderby` | ~25x | Per-device rows colocated, timestamps regular, floats drift slowly |
| `segmentby = device_id, ts` (wrong) | ~3x | High-cardinality segments → tiny groups → no compression headroom |

These are illustrative ranges, not benchmarks — your data will vary. But the shape is real: a smart segmentby is the difference between "TimescaleDB saved us a lot of money" and "we don't really see why people talk about this."

To choose: find the column that appears in the WHERE clause of roughly 80% of your analytical queries, verify it has low-to-medium cardinality (aim for at least a few thousand rows per segment within each chunk), then check hypertable_compression_stats after setup. If the ratio comes in under about 10x, your segmentby is the first thing to revisit.
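The "rows per segment" check is plain arithmetic, so here it is for the example table, as a back-of-the-envelope Python sketch. The 7-day chunk interval is the default mentioned earlier; the assumption that each (device_id, ts) pair is unique is mine.

```python
MINUTES_PER_DAY = 24 * 60

def rows_per_segment(chunk_days, readings_per_minute=1):
    """Rows that share one segmentby value inside a single chunk."""
    return chunk_days * MINUTES_PER_DAY * readings_per_minute

devices = 10_000
total_rows = 1_000_000_000
chunk_days = 7  # default chunk interval

rows_per_day = devices * MINUTES_PER_DAY
days_of_data = total_rows / rows_per_day

per_device = rows_per_segment(chunk_days)  # segmentby = device_id
per_device_and_ts = 1                      # segmentby = device_id, ts (assumes unique pairs)

print(f"{rows_per_day:,} rows/day, about {days_of_data:.0f} days of data")
print(f"segmentby=device_id: {per_device:,} rows per segment per chunk")
print(f"segmentby=device_id, ts: {per_device_and_ts} row per segment per chunk")
```

Roughly ten thousand rows per device per chunk gives the compressor plenty to work with. One row per segment gives it nothing, which is the third row of the table.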

The OLTP problem (and what 2.18 did about it)

For most of Hypercore's life, there was one big asterisk: once a chunk moved to the columnstore, you lost your indexes.

This is the classic columnar-database problem. Column stores are great for "scan a billion rows and aggregate," terrible for "find the one row where transaction_id = 0xdeadbeef." Postgres's B-tree indexes — the things that make point lookups instant — don't translate naturally to compressed columnar layouts.

In practice, that meant: as soon as your data got compressed, looking up a single record turned into a sequential scan over the chunk. Updating a single sensor reading from three months ago became a "decompress the whole batch, modify, recompress" operation. Fine for analytics. Painful for hybrid OLTP/OLAP workloads where someone occasionally needs to correct a specific record.

TimescaleDB 2.18 fixed this. B-tree and hash indexes now work on columnstore chunks (still labeled Early Access at time of writing, so check the current docs). The vendor benchmarks report something like 1,185x faster record retrievals and 224x faster inserts on indexed columnstore data. Take vendor numbers with appropriate salt — but the qualitative leap is real and matches what you'd expect once compressed chunks have a real index structure attached to them.

This is the change that turns Hypercore from "real-time analytics engine that's awkward at OLTP" into something legitimately hybrid. If you evaluated TimescaleDB more than a year ago and bounced because of point-lookup performance on old data, this is the upgrade worth re-evaluating on.

The gotchas section

Every honest internals post needs this section. Hypercore is impressive, but it isn't magic. Here's what to watch for in production.

Schema changes are heavier than they look

Adding a column to a hypertable with columnstore chunks is mostly fine. Changing a column type — int to bigint, say, on a multi-billion-row hypertable — is a different animal. Compressed chunks need to be decompressed, rewritten, and recompressed. I've watched this turn into a multi-day operation on production-sized tables. Plan it like a real migration: maintenance windows, maintenance_work_mem tuning, monitoring of BufFileWrite and LWLock:WALWrite waits. The "ALTER TABLE is fast in Postgres" reflex will burn you here.

`UPDATE`s on the columnstore aren't free

You can modify columnstore data under full transactional semantics, but an update generally implies a decompression step on the affected batch. Bulk updates on cold compressed chunks can be surprisingly expensive. If your workload involves a lot of late-arriving corrections, think carefully about whether that data should stay in the rowstore longer.

Logical replication has limits

Logical replication on TimescaleDB hypertables — especially with the columnstore involved — has historically had sharp edges. Don't assume "it's just Postgres, so logical replication just works." Check current docs for what's officially supported in your version before relying on it.

When to lean on Hypercore (and when to think twice)

A short, honest decision framework.

Lean in when your workload is:

Append-heavy with mostly immutable historical data
Analytics-shaped: aggregations, time-bucketed queries, dashboards
A natural fit for Postgres (you want SQL, joins, transactions, the ecosystem)
Operating at a scale where storage cost is a line item you care about

Think carefully when your workload involves:

Heavy point-lookups and mutations on old data (better with 2.18, still worth evaluating)
High-cardinality data that doesn't have natural segmentby candidates
Strict requirement for traditional Postgres backup/recovery flows without extension awareness
A team without bandwidth to learn the operational shape of hypertables and chunks

The honest framing: Hypercore is excellent at what it's built for, which is real-time analytics on time-structured data. It is not a general-purpose OLTP accelerator, and it's not free — you take on the operational complexity of an extension that controls a lot of storage-level behavior. For the workloads it fits, that trade is one of the best deals in the Postgres ecosystem. For the workloads it doesn't, you'll know within a month.

What's next

This post stayed deliberately narrow: what Hypercore is, how it works, and where to be careful. There are at least six follow-ups worth writing:

segmentby and orderby, deep dive — with actual benchmarks on a realistic dataset, not the hand-wavy ratios from the table above.
Continuous aggregates — the other half of the real-time analytics story, and the feature that makes dashboards possible on billion-row tables.
Chunk skipping — the under-discussed query optimization that makes Hypercore queries even faster than the column layout alone would suggest.
Compression algorithms, deep dive — delta-of-delta, Gorilla XOR, Simple-8b, and dictionary encoding each deserve more than a table row. The math behind why they combine to 90–98% is worth a full post.
Backups and restore — logical vs. physical backups behave very differently with compressed chunks. pg_dump restores need to re-run compression policies; physical backups preserve compressed state. Testing your restore path on Hypercore data is its own topic.
segmentby cardinality pitfalls — a too-granular segmentby doesn't just hurt compression ratios, it can balloon catalog metadata in ways the docs don't loudly warn you about. Worth walking through with real numbers.

If there's a piece of this you want me to go deeper on, let me know. The fun part of writing about internals is that there's always another layer.


Beyond the API: DocumentDB vs. Aurora PostgreSQL for JSON Workloads

Giovanni Martinez — Tue, 28 Apr 2026 11:44:33 +0000

In my last post, we dug into the deep internal trade-offs of running JSON workloads on PostgreSQL vs. MongoDB. We looked at how PostgreSQL's MVCC and TOAST architectures create hidden write amplification, and how MongoDB's WiredTiger engine handles documents differently with a copy-on-write B-Tree design.

But what happens when you decide you do not want to manage those database servers yourself? You migrate to AWS, open the managed services menu, and start comparing Amazon DocumentDB with Amazon Aurora PostgreSQL.

Here is the plot twist most architecture discussions miss: at this point, you are no longer comparing PostgreSQL heap internals directly against MongoDB WiredTiger internals. You are comparing two different database compute layers built on AWS's decoupled, log-centric, multi-AZ distributed storage model.

That shift completely changes the "Postgres vs. Mongo" math for JSON workloads.

The Shared Foundation: The Database as a Log-Centric System

To understand both DocumentDB and Aurora PostgreSQL, you first need to understand Aurora's storage architecture.

In traditional single-instance database deployments, the database engine is responsible for page management and writes data pages plus WAL over network-attached block storage. The compute node and storage node are tightly coupled from a write path perspective.

Aurora changed that model.

In Aurora's architecture, the compute layer primarily emits redo records to a distributed storage subsystem spanning three Availability Zones. The storage service is responsible for durable replication and page materialization across its internal fleet. This substantially reduces the amount of page-oriented write work the compute node must manage directly.

Data is stored as six copies across three AZs, continuously backed up to S3, and storage scales automatically.

The key insight for this comparison: Amazon DocumentDB and Aurora PostgreSQL both use this same architectural pattern of decoupled compute and distributed, log-oriented storage.

Amazon DocumentDB Unmasked: Compatibility Layer, Not MongoDB Internals

DocumentDB is often described informally as "managed MongoDB on AWS." That description is directionally useful for app teams, but technically misleading — and the distinction matters at the architecture level.

Amazon DocumentDB is not MongoDB. It does not run MongoDB's codebase. It does not use MongoDB's WiredTiger storage engine. It does not contain any MongoDB SSPL-licensed code. It is a fully proprietary AWS-built database engine that implements a subset of the MongoDB wire protocol and API surface. The "with MongoDB compatibility" qualifier in the product name is doing a lot of heavy lifting — it means your MongoDB drivers and application code can connect and issue operations, but the engine executing those operations is fundamentally different under the hood.

When your application sends a MongoDB-style operation to DocumentDB, the service accepts the wire protocol request at its compute layer and executes it through DocumentDB's own engine and storage path built on AWS's distributed storage architecture. The query planner is different. The index internals are different. The replication model is different. What you get is API-level compatibility — not behavioral equivalence.

As of early 2026, DocumentDB supports compatibility modes for MongoDB 3.6, 4.0, 5.0, and the recently launched 8.0 — which brings a new query planner (Planner v3), collation, views, and additional aggregation stages. However, even with 8.0, the compatibility story has real gaps that matter at the architecture level.

The Trade-offs of Compatibility by Emulation

The decoupled storage model brings real operational advantages:

You do not manage MongoDB replica set internals directly.
Storage growth is decoupled from traditional node-local disk constraints.
Read scaling is straightforward via replicas that share the underlying distributed storage model.
Backups, durability, and failover characteristics inherit AWS-managed behavior.

But compatibility-by-emulation comes with engineering consequences: API compatibility is not the same as internal engine equivalence or full ecosystem parity.

Here is where things get concrete. These are real compatibility gaps that have bitten teams mid-migration:

`$lookup` with correlated subqueries

DocumentDB (5.0 and earlier) supports equality-based $lookup joins and uncorrelated subqueries, but does not support correlated subqueries — the $lookup variant where you use let and pipeline together with $expr to reference parent fields inside the child pipeline. This is a common MongoDB pattern for filtered joins. Your aggregation will fail at runtime with:

MongoServerError: Aggregation stage not supported:
  '$lookup on multiple join conditions and uncorrelated subquery'

If your application relies on correlated $lookup pipelines for cross-collection queries, this is a migration-blocking gap that forces you to restructure the aggregation into multiple application-level round trips or flatten the data model.
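As a rough sketch of what "application-level round trips" looks like, the Python below replaces a correlated lookup with two fetches and an in-memory match. Everything here is illustrative: the collections are plain lists of dicts, `find` is a hypothetical stand-in for a driver query, and the field names are made up for the example.

```python
from collections import defaultdict

# Stand-ins for two collections; in a real application these would be driver queries.
orders = [
    {"_id": 1, "item": "almonds", "ordered": 2},
    {"_id": 2, "item": "pecans", "ordered": 1},
    {"_id": 3, "item": "cookies", "ordered": 60},
]
warehouses = [
    {"_id": 1, "stock_item": "almonds", "warehouse": "A", "instock": 120},
    {"_id": 2, "stock_item": "pecans", "warehouse": "A", "instock": 80},
    {"_id": 3, "stock_item": "almonds", "warehouse": "B", "instock": 60},
    {"_id": 4, "stock_item": "cookies", "warehouse": "B", "instock": 40},
    {"_id": 5, "stock_item": "cookies", "warehouse": "A", "instock": 80},
]

def find(collection, predicate):
    """Hypothetical helper standing in for one round trip to the database."""
    return [doc for doc in collection if predicate(doc)]

def orders_with_stock(orders, warehouses):
    # Round trip 1: fetch the parent documents.
    parents = find(orders, lambda o: True)
    # Round trip 2: one batched fetch of candidate children (an $in-style filter).
    wanted = {o["item"] for o in parents}
    children = find(warehouses, lambda w: w["stock_item"] in wanted)
    by_item = defaultdict(list)
    for w in children:
        by_item[w["stock_item"]].append(w)
    # The correlated condition (instock >= ordered) is applied in application code.
    return [
        {**o, "stockdata": [w["warehouse"] for w in by_item[o["item"]]
                            if w["instock"] >= o["ordered"]]}
        for o in parents
    ]

result = orders_with_stock(orders, warehouses)
for row in result:
    print(row["_id"], row["item"], row["stockdata"])
```

It works, but the join condition now lives in your application, and the second fetch pulls more documents than the final result needs.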

Missing or version-gated aggregation stages

On DocumentDB 5.0, several commonly used aggregation stages are missing entirely: $facet, $unionWith, $graphLookup, $setWindowFields, and $merge. DocumentDB 8.0 adds some of these (notably $merge, $bucket, $facet, and $set/$unset), but $setWindowFields and $graphLookup remain unavailable. If your analytics pipelines use window functions via $setWindowFields — a feature MongoDB introduced in 5.0 — you will not find a DocumentDB equivalent.

Index behavior differences

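DocumentDB's index support is narrower than MongoDB's: depending on engine version, partial indexes and case-insensitive indexes may be unavailable, and queries using certain operators cannot use an index at all. For teams relying on partial indexes to reduce index size on sparse data or conditional queries, this means either bloated indexes or redesigned query patterns, so check the supported index types for your target version.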

Ecosystem tooling assumptions

Tools that assume native MongoDB internals — such as mongodump/mongorestore beyond version 100.6.1, or change stream consumers expecting MongoDB-native latency characteristics — may behave differently or fail. Client-side field-level encryption and queryable encryption are not supported.

The bottom line: Run the AWS DocumentDB compatibility tool against your actual workload in staging. Pay special attention to aggregation pipelines with $lookup subqueries and any stage not listed in the supported APIs documentation. Do not assume that "MongoDB-compatible" means "drop-in replacement."

Aurora PostgreSQL: Does It Fix the MVCC / TOAST Problem?

If you read my previous post, you know that PostgreSQL's MVCC and TOAST mechanisms cause heavy write amplification for large, frequently updated JSON documents. So if you migrate to Aurora PostgreSQL — where the compute node sends redo records to the distributed storage layer rather than writing full pages over a network-attached volume — does that fix the Postgres JSON penalty?

The short answer: No. But the bottleneck shifts.

Aurora vastly reduces page-level network I/O, but the Aurora PostgreSQL compute node still runs the standard PostgreSQL engine in memory. That means:

MVCC still happens. When you update a jsonb document, PostgreSQL still creates a new tuple version and leaves a dead tuple behind in the heap.
TOAST still happens. Values that push a row past roughly 2KB are still compressed and, if still too large, chunked into a separate TOAST table.
HOT updates are still blocked once the jsonb column is indexed. A GIN or expression index on the column means any update to it, even a single key change, cannot use the Heap-Only Tuple optimization, so PostgreSQL writes a new entry into every index on that table. This is a write amplification multiplier that compounds with the number of indexes.
VACUUM is still required. You still have to tune autovacuum aggressively to clean up dead tuples. In Aurora, unbounded bloat also means unbounded storage cost — you pay per GB on the Aurora volume.

The most critical Aurora-specific consideration is I/O cost.

In standard Aurora configurations, you are billed per I/O request, at $0.20 per million requests in us-east-1 at the time of writing (pricing varies by region). Because updating a TOASTed jsonb document causes write amplification (MVCC new tuple + TOAST rewrite + index entries), a write-heavy JSON workload can generate I/O charges that aren't obvious until your bill arrives.

The practical tip: If you are running a heavy JSON mutation workload on Aurora PostgreSQL, evaluate Aurora I/O-Optimized. This cluster configuration eliminates per-request I/O charges in exchange for a higher baseline on compute and storage pricing. AWS's guidance is that if I/O spend exceeds 25% of your total Aurora database bill, I/O-Optimized likely saves money — and for workloads dominated by large jsonb updates with multiple indexes, you can easily clear that threshold. Teams have reported 30–40% cost reductions on I/O-intensive workloads after switching.
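The 25% rule is easy to check with a quick estimate. The Python sketch below uses the $0.20 per million figure quoted above; every workload number in it (update rate, I/Os per update, compute and storage spend) is a made-up assumption, and real Aurora billing counts I/O in ways this ignores, so treat it as a way to frame the question rather than a bill predictor.

```python
def monthly_io_cost(io_requests, price_per_million=0.20):
    """Standard-configuration I/O charge; the default price is the figure quoted above."""
    return io_requests / 1_000_000 * price_per_million

def io_share(compute_cost, storage_cost, io_cost):
    """Fraction of the total bill spent on I/O."""
    return io_cost / (compute_cost + storage_cost + io_cost)

# Hypothetical workload: every number below is an illustrative assumption.
updates_per_second = 500
ios_per_update = 6  # heap tuple + TOAST chunks + index entries (assumed)
seconds_per_month = 30 * 24 * 3600

io_requests = updates_per_second * ios_per_update * seconds_per_month
io_cost = monthly_io_cost(io_requests)
share = io_share(compute_cost=2_000.0, storage_cost=300.0, io_cost=io_cost)

print(f"I/O requests per month: {io_requests:,}")
print(f"I/O cost: ${io_cost:,.2f} ({share:.0%} of the bill)")
print("Evaluate I/O-Optimized" if share > 0.25 else "Standard is probably fine")
```

With these assumed numbers the I/O line is about 40% of the bill, comfortably past the threshold. Plug in your own figures from Cost Explorer before drawing conclusions.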

A secondary option worth evaluating: Aurora Optimized Reads with NVMe-backed instance types (r6gd/r6id). This extends the buffer pool to local SSD, which helps with read-heavy JSON access patterns where TOAST decompression causes repeated storage round-trips.

Querying, Indexing, and the Compute Bottleneck

Because both services use the same underlying distributed storage model, performance differences are won and lost in the compute layer — specifically in the buffer cache and how efficiently each engine executes queries.

Aurora PostgreSQL (`jsonb` + GIN)

PostgreSQL's query optimizer has full native visibility into your tables and indexes. When you query a jsonb column using a GIN index, PostgreSQL knows exactly how to traverse the decompressed JSON structure. You can join a deeply nested JSON document against multiple normalized relational tables, filter with a CTE, and aggregate with window functions — and the query planner handles all of it natively with decades of optimization work behind it.

You also get partial indexes, expression indexes, and exclusion constraints on JSON data — tools that let you build highly targeted index structures that DocumentDB cannot replicate.

Amazon DocumentDB

DocumentDB is highly efficient for standard document retrieval and basic filtering. For pure document access patterns — key lookups, single-collection queries with indexed predicates — performance is strong and the operational simplicity is real.

However, complex analytics are where the emulation model shows its limits. A multi-stage aggregation pipeline ($match, $unwind, $group, $lookup) requires the DocumentDB compute layer to pull data from the shared storage volume, hold it in memory, and execute pipeline stages iteratively. The $lookup operator in particular lacks the relational query optimizer that PostgreSQL uses to build efficient join execution plans — DocumentDB uses hash, sort-merge, or nested-loop algorithms but without the cost-based planner that selects between them intelligently based on table statistics.

For reporting or ad-hoc analytics that span multiple collections, this becomes a compute bottleneck at scale. And because $setWindowFields is unavailable, any window-function-style analytics require client-side post-processing or a separate analytics layer entirely.
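For a sense of what client-side post-processing means in practice, here is a minimal Python sketch of a cumulative sum per partition, the kind of thing a window function would otherwise do inside the database. The `running_total` helper and the sample documents are hypothetical, and this only works when the result set fits in application memory.

```python
from collections import defaultdict
from operator import itemgetter

def running_total(docs, partition_key, sort_key, value_key, out_key="running_total"):
    """Client-side equivalent of a cumulative sum over a partitioned, sorted window."""
    totals = defaultdict(float)
    result = []
    for doc in sorted(docs, key=itemgetter(partition_key, sort_key)):
        totals[doc[partition_key]] += doc[value_key]
        result.append({**doc, out_key: totals[doc[partition_key]]})
    return result

# Documents as they might come back from a simple find().
sales = [
    {"state": "CA", "day": 2, "amount": 30.0},
    {"state": "WA", "day": 1, "amount": 5.0},
    {"state": "CA", "day": 1, "amount": 10.0},
    {"state": "WA", "day": 3, "amount": 7.5},
]

windowed = running_total(sales, "state", "day", "amount")
for row in windowed:
    print(row)
```

In SQL this is a single `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` expression, and the database never has to ship the raw rows to the client.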

The Decision Framework

Since the underlying storage model is shared, your decision is primarily about your application architecture, team skill set, and where your workload is headed.

Choose Amazon DocumentDB if:

You are doing a lift-and-shift of an existing MongoDB application and want managed AWS infrastructure without rewriting your data access layer — but only after you have validated compatibility using AWS's tooling against your actual aggregation pipelines and index usage
Your data is heavily siloed into independent documents, cross-collection joins are rare, and your aggregation pipelines use only the supported subset
Schema flexibility and rapid iteration on document shape are more valuable than referential integrity — and you have a team experienced with MongoDB operational patterns

Choose Aurora PostgreSQL if:

You are building greenfield — the relational model with jsonb gives you the highest ceiling for complexity as requirements evolve, without having to migrate engines later
Your data model requires or will eventually require referential integrity, foreign keys, and complex JOIN operations
You need robust analytics: full SQL, window functions, GIN indexes, partial indexes, and expression indexes on JSON data
You want one engine that handles both structured and semi-structured data without a separate system

The case that often gets decided wrong: A team has a MongoDB application that uses $lookup with correlated subqueries, $facet for multi-faceted search, or $setWindowFields for analytical aggregations. They assume DocumentDB is the natural AWS landing zone because the API looks the same. In practice, they would be better served migrating to Aurora PostgreSQL with a jsonb-first schema, where the features they depend on have native equivalents that are more mature and more performant — SQL JOIN with a cost-based planner instead of $lookup emulation, FILTER clauses and window functions instead of $facet and $setWindowFields.

If your MongoDB usage is already pushing past simple document access into relational territory, the migration to Aurora PostgreSQL is an investment that pays back in query capability and long-term maintainability.

The Verdict

Amazon DocumentDB is a genuine feat of cloud engineering — decoupling a MongoDB-compatible wire protocol from its underlying storage internals enabled legacy application teams to scale on AWS without rearchitecting their data layer. The 8.0 release narrows the gap significantly. For the right workload — high-throughput document access with straightforward query patterns — it earns its place.

But for modern, data-intensive architectures — where data models inevitably become relational as a product matures — Aurora PostgreSQL remains the stronger long-term foundation. Tune your autovacuum, watch your TOAST tables, evaluate Aurora I/O-Optimized for write-heavy JSON workloads, and keep an eye on HOT update limitations if you are indexing jsonb fields aggressively. The operational overhead is real, but the query expressiveness and data integrity guarantees pay it back.

Have you hit unexpected I/O costs from TOAST write amplification on Aurora? Or run into DocumentDB compatibility gaps mid-migration? Drop a comment — I'd love to compare notes.

PostgreSQL vs. MongoDB for JSON: The Internal Trade-offs They Don't Tell You in the Docs

Giovanni Martinez — Sun, 15 Mar 2026 13:02:35 +0000

The question comes up constantly in architecture discussions: "Should we use MongoDB or PostgreSQL for our JSON-heavy workload?" Having managed both at scale, I can tell you the answer is not as simple as "MongoDB is for documents, Postgres is for tables." There are deep internals at play on both sides that will affect your performance, storage footprint, and operational burden in ways that a quick benchmark won't reveal. Let's dig in.

First, a Fundamental Framing Problem

MongoDB is often called a "document database," which people interpret as: great for JSON, superior to relational databases for flexible data. That framing is misleading. MongoDB is not simply a JSON store with a query layer on top. It is a non-relational database, meaning it has no native concept of joins, no foreign key enforcement, and no referential integrity. Multi-document ACID transactions do exist (added in v4.0 for replica sets and v4.2 for sharded clusters), but they carry significant performance overhead and are not the default usage pattern.

To be precise for teams running MongoDB Atlas or Enterprise: multi-document transactions are bound by a 60-second lifetime limit by default (transactionLifetimeLimitSeconds) and incur significant throughput penalties as lock contention increases. In PostgreSQL, a transaction is a first-class citizen: every statement runs inside one by default. In MongoDB, a multi-document transaction is a specialized escape hatch you reach for when your schema design has failed to keep related data local to a single document. That distinction matters enormously at scale.

This matters because the moment your data has relationships — orders belong to customers, line items belong to orders, products belong to categories — MongoDB forces you to either embed everything (document bloat, duplication, update anomalies) or handle joins in application code ($lookup is available but is a post-processing aggregation step, not a query optimizer join). Neither is free. PostgreSQL's relational model with JSON support gives you both flexibility and the full power of set-based relational operations.

PostgreSQL's JSON Capabilities: More Than You Think

PostgreSQL has two JSON data types: json (stored as plain text, re-parsed on each access) and jsonb (stored in a parsed binary format, indexable, and operator-rich). For any production workload, use jsonb.

With jsonb you get:

GIN indexes on the entire document or specific paths for fast containment queries (@> operator)
Path-based expression indexes: CREATE INDEX ON events ((payload ->> 'event_type'))
Full SQL: join your JSON documents against normalized tables, filter with CTEs, aggregate with window functions
Partial indexes: index only the subset of rows where a JSON field meets a condition
Schema validation via CHECK constraints on JSON paths when you need it
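
To make the containment semantics behind the @> operator concrete, here is a simplified Python model of "does this document contain this pattern?". It is an illustration only, not how PostgreSQL implements jsonb: the function name jsonb_contains and the sample event are hypothetical, and the special case where a top-level array contains a bare scalar is deliberately left out.

```python
import json


def jsonb_contains(doc, pattern):
    """Simplified model of jsonb containment (doc @> pattern)."""
    if isinstance(pattern, dict):
        # Every key in the pattern must exist in the document,
        # and its value must be contained recursively.
        return isinstance(doc, dict) and all(
            key in doc and jsonb_contains(doc[key], value)
            for key, value in pattern.items()
        )
    if isinstance(pattern, list):
        # Order and duplicates are ignored: each pattern element
        # must be contained in some element of the document array.
        return isinstance(doc, list) and all(
            any(jsonb_contains(item, wanted) for item in doc)
            for wanted in pattern
        )
    # Scalars: keep JSON booleans distinct from numbers (Python treats True == 1).
    if isinstance(doc, bool) or isinstance(pattern, bool):
        return isinstance(doc, bool) and isinstance(pattern, bool) and doc == pattern
    return not isinstance(doc, (dict, list)) and doc == pattern


# Hypothetical event payload, for illustration only.
event = json.loads(
    '{"event_type": "purchase", "user": {"id": 7, "tags": ["vip", "beta"]}}'
)

print(jsonb_contains(event, {"event_type": "purchase"}))
print(jsonb_contains(event, {"user": {"tags": ["vip"]}}))
print(jsonb_contains(event, {"user": {"id": 8}}))
```

A GIN index makes this kind of check cheap by indexing the keys and values up front, so the database does not have to walk every document the way this sketch does.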

MongoDB also has rich query capabilities on nested documents, but it lacks the composability of SQL. Complex reporting that mixes document access with aggregation across related collections becomes an aggregation pipeline exercise that few SQL developers would recognize as readable.

MVCC: The Hidden Cost in PostgreSQL JSON Workloads

PostgreSQL uses Multi-Version Concurrency Control (MVCC) so that readers do not block writers and writers do not block readers. The mechanics create a write amplification problem that is especially painful for large jsonb columns.

How MVCC works on an UPDATE:

BEFORE UPDATE:
[Heap Page]
 +------------------------------------------+
 | Tuple v1 (xmin=100, xmax=0) | ...data... |
 +------------------------------------------+

AFTER UPDATE (change one JSON key):
[Heap Page]
 +--------------------------------------------+
 | Tuple v1 (xmin=100, xmax=200) | ...data... | <-- marked DEAD
 | Tuple v2 (xmin=200, xmax=0)   | ...data... | <-- NEW full copy
 +--------------------------------------------+
                                               ^
                    Dead tuple occupies space until VACUUM runs

When you update a row, PostgreSQL does not modify the existing row in place. It writes a new version of the entire row and marks the old version as dead. Even if you change a single key in a 10KB jsonb document, the full 10KB is written again. Readers on older snapshots see the prior version until their transaction completes — which is excellent for read concurrency, but means dead tuples accumulate on disk.
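
The versioning behavior described above can be sketched with a toy model. This is a minimal illustration under loud assumptions: ToyHeap and TupleVersion are hypothetical names, the "heap" holds one logical row, documents are stored as JSON text rather than real jsonb, and vacuum assumes no older snapshot still needs the dead versions.

```python
import json
from dataclasses import dataclass


@dataclass
class TupleVersion:
    xmin: int    # transaction that created this version
    xmax: int    # transaction that superseded it (0 = still live)
    data: bytes  # full serialized document


class ToyHeap:
    """Toy single-row heap: every update appends a full new version."""

    def __init__(self):
        self.tuples = []
        self.bytes_written = 0

    def insert(self, xid, doc):
        payload = json.dumps(doc).encode()
        self.tuples.append(TupleVersion(xid, 0, payload))
        self.bytes_written += len(payload)

    def update_key(self, xid, key, value):
        live = next(t for t in self.tuples if t.xmax == 0)
        doc = json.loads(live.data)
        doc[key] = value
        live.xmax = xid        # old version is superseded, not overwritten
        self.insert(xid, doc)  # the whole document is written again

    def dead_bytes(self):
        return sum(len(t.data) for t in self.tuples if t.xmax != 0)

    def vacuum(self):
        # Assumes no open snapshot can still see the dead versions.
        self.tuples = [t for t in self.tuples if t.xmax == 0]


heap = ToyHeap()
heap.insert(100, {"status": "new", "blob": "x" * 10_000})  # roughly 10KB document
for i in range(5):
    heap.update_key(200 + i, "status", f"step-{i}")        # change one small key

print("versions on disk:", len(heap.tuples))
print("bytes written:", heap.bytes_written)
print("dead bytes awaiting vacuum:", heap.dead_bytes())
```

Five one-key updates leave six versions behind, and the bytes written grow with the size of the whole document rather than the size of the change.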

For JSON-heavy workloads with frequent partial updates, this means:

Table bloat builds faster than in equivalent workloads on narrow rows
Index bloat follows, because index entries point to specific heap tuple versions
Query performance degrades as the visibility map becomes stale and more pages need checking

MongoDB's WiredTiger storage engine also uses MVCC internally, but it employs a copy-on-write B-Tree model rather than PostgreSQL's heap-based tuple versioning. When you update a document, WiredTiger caches the modification in memory and appends it to a Write-Ahead Log. During its periodic checkpoint process, it writes modified pages to new block locations on disk and eventually frees the old space.

While WiredTiger avoids the exact single-row write amplification seen in PostgreSQL, it is not zero-cost. It still involves writing out entire compressed pages during checkpoints, and relies heavily on background cache eviction to maintain performance.

VACUUM: PostgreSQL's Maintenance Obligation

VACUUM is PostgreSQL's answer to MVCC dead tuple accumulation. It reclaims space occupied by dead tuples, updates the visibility map (allowing Index-Only Scans to skip heap fetches), and prevents transaction ID wraparound — the catastrophic failure mode where Postgres refuses to accept new transactions.

PostgreSQL has autovacuum, a background daemon that triggers based on a dead tuple threshold: autovacuum_vacuum_scale_factor (default 0.2, meaning 20% of the table's rows) plus a small fixed autovacuum_vacuum_threshold (default 50 rows). For large tables, this default is dangerously high: a 500 million row table would need roughly 100 million dead tuples before autovacuum wakes up.

For JSON-heavy workloads, tune aggressively:

Lower autovacuum_vacuum_scale_factor to 0.01 or even 0.005 for large tables
Raise autovacuum's I/O budget by reducing autovacuum_vacuum_cost_delay
Monitor pg_stat_user_tables: track n_dead_tup, last_autovacuum, and last_autoanalyze
Consider VACUUM ANALYZE after bulk loads or mass updates to refresh planner statistics
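
The effect of the scale factor is easy to check with arithmetic. The sketch below applies the documented trigger formula (autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × row count) to the 500 million row example; the function name autovacuum_trigger_point is hypothetical, and the model ignores any version-specific caps on the threshold.

```python
def autovacuum_trigger_point(live_rows, scale_factor=0.2, base_threshold=50):
    """Dead tuples needed before autovacuum considers vacuuming a table.

    Defaults mirror PostgreSQL's stock settings:
    autovacuum_vacuum_scale_factor = 0.2, autovacuum_vacuum_threshold = 50.
    """
    return base_threshold + round(scale_factor * live_rows)


rows = 500_000_000
for factor in (0.2, 0.01, 0.005):
    needed = autovacuum_trigger_point(rows, factor)
    print(f"scale_factor={factor}: autovacuum triggers after {needed:,} dead tuples")
```

Dropping the scale factor from 0.2 to 0.01 moves the trigger point from about 100 million dead tuples to about 5 million on the same table, which is why per-table overrides matter most on the largest tables.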

MongoDB does not have an equivalent to VACUUM. WiredTiger reclaims space within its B-tree pages automatically via checkpointing, and collection-level compaction can be triggered manually. There is no "transaction ID wraparound" risk, and space reclamation is generally more transparent to the application.

TOAST: PostgreSQL's Large Value Storage

PostgreSQL has a hard limit: a single row must fit on one 8KB page. Since jsonb documents can easily exceed 8KB, PostgreSQL uses TOAST (The Oversized-Attribute Storage Technique) to handle large values.

When a row exceeds roughly 2KB (the TOAST threshold), PostgreSQL will automatically work through its large values, such as a jsonb column:

Compress the value (using pglz, an LZ-family algorithm, by default; lz4 is available since PostgreSQL 14)
If still too large, split it into chunks of roughly 2KB stored in a separate TOAST table (pg_toast_<oid>)
Store a pointer in the main heap row referencing the TOAST chunks
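
The steps above can be approximated in a few lines. This is a rough sketch, not PostgreSQL's actual logic: toast_plan is a hypothetical name, the 2000-byte threshold and chunk size are round-number stand-ins for the "roughly 2KB" values, zlib stands in for the real compression codec, and the model looks at a single value rather than the whole row.

```python
import math
import random
import zlib

TOAST_THRESHOLD = 2000  # assumption: stand-in for the ~2KB threshold
CHUNK_SIZE = 2000       # assumption: stand-in for the ~2KB chunk size


def toast_plan(value: bytes) -> dict:
    """Estimate how a single large value would be stored."""
    if len(value) <= TOAST_THRESHOLD:
        return {"storage": "inline", "stored_bytes": len(value), "chunks": 0}

    # Step 1: try compression (zlib here is only a stand-in codec).
    compressed = zlib.compress(value)
    stored = compressed if len(compressed) < len(value) else value
    if len(stored) <= TOAST_THRESHOLD:
        return {"storage": "inline, compressed", "stored_bytes": len(stored), "chunks": 0}

    # Steps 2 and 3: still too large, so it moves out of line in chunks,
    # and the heap row keeps only a pointer.
    return {
        "storage": "out of line",
        "stored_bytes": len(stored),
        "chunks": math.ceil(len(stored) / CHUNK_SIZE),
    }


repetitive = b'{"k": "aaaa"}' * 1000                      # compresses very well
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10_000))  # effectively incompressible

for label, value in (("small", b"x" * 500), ("repetitive", repetitive), ("noisy", noisy)):
    print(label, toast_plan(value))
```

The takeaway is that document size alone does not predict TOAST behavior: highly repetitive JSON may stay inline after compression, while high-entropy payloads of the same size spill into several chunks.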

This is largely transparent, but the performance implications are real:

Reads: fetching a TOASTed column requires an index lookup on the TOAST table plus fetching and reassembling every chunk, which is extra I/O on every large document fetch
Updates: updating any field in a large jsonb document causes the entire value to be re-TOASTed, even if you only changed one key. This stacks on top of MVCC write amplification: each update writes a new heap tuple and a full new set of TOAST chunks
VACUUM on TOAST tables: autovacuum must process the TOAST table separately; TOAST table bloat is a common source of hidden disk usage that operators miss
Index access: a GIN index on jsonb can satisfy the query filter from its extracted keys and values, but returning the full document still requires a heap fetch and a TOAST table hit

The practical recommendation: if your JSON documents regularly exceed 4–8KB, consider splitting large, rarely-queried fields into separate columns or an object store. Keep the frequently-queried JSON fields in a compact jsonb column.

MongoDB documents have their own size limit (16MB per document) and store data in BSON format. WiredTiger stores variable-length documents in its B-tree pages without a user-visible side table like TOAST, which gives MongoDB an advantage for workloads dominated by large, frequently-updated documents.

The Decision Framework

PostgreSQL (jsonb)

Transactions: ACID by default for every statement
Joins: native SQL joins with optimizer support
Integrity: foreign keys and relational constraints
Update write cost: full-row rewrite on updates
Large JSON reads: TOAST can add extra I/O on large values
Space maintenance: requires VACUUM tuning on write-heavy workloads
Index options: GIN, B-tree, partial, and expression indexes
Analytics: full SQL, CTEs, and window functions
Best fit: mixed relational + JSON workloads

MongoDB

Transactions: multi-document ACID available, but with higher overhead
Joins: $lookup as an aggregation stage
Integrity: no native relational integrity model
Update write cost: copy-on-write page checkpointing in WiredTiger
Large JSON reads: BSON inline storage up to 16MB document limit
Space maintenance: automatic space reuse, optional manual compaction
Index options: compound, multikey, text, and geospatial indexes
Analytics: aggregation pipeline model
Best fit: document-first workloads

The Operational Reality

In my experience managing both at scale, PostgreSQL's MVCC + VACUUM model requires more active DBA engagement for write-heavy JSON workloads. You will fight bloat if you don't tune autovacuum aggressively. TOAST adds I/O overhead that isn't obvious until you instrument it. But the payoff — full SQL expressiveness, relational integrity, and a single database for everything — is significant.

MongoDB's operational model is simpler for pure document workloads, but the moment your product evolves and relationships emerge (they always do), you pay the cost of having chosen a non-relational foundation at a time when re-architecting is expensive.

The best database for JSON is the one you understand deeply enough to tune, monitor, and operate at production scale. For most teams building data-intensive applications in 2026, that database is PostgreSQL.

One contender I haven't covered here is Amazon DocumentDB — a MongoDB-compatible service built on the Aurora storage layer that deserves its own deep dive. I'll be publishing a follow-up post that adds DocumentDB to the mix, including what it actually is under the hood, where it diverges from native MongoDB, and how it stacks up against Aurora PostgreSQL for JSON workloads on AWS.

Have you run into TOAST bloat or MVCC write amplification in a PostgreSQL JSON workload? Or migrated from MongoDB back to Postgres? Drop a comment — I'd love to compare notes.

canonical_url: https://iqtoolkit.ai/blog/postgresql-vs-mongodb-json-internal-tradeoffs
