Manveer Chawla

Posted on Apr 20 • Originally published at dataanalyticsguide.substack.com

ClickHouse Native JSON Support in 2026: A PR-by-PR Analysis

#data #dataengineering #clickhouse #json

TL;DR

ClickHouse has full native JSON support, and has since v25.3. The JSON type stores each path as a separate columnar subcolumn with native type preservation, primary key indexing, and selective path reads. It is 2,500x faster than MongoDB for aggregations on the JSONBench 1-billion-document benchmark. The narrative that "ClickHouse can't do JSON" is outdated by two years and 80+ merged PRs.

We analyzed 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks to trace the full evolution of JSON support from string-based functions through the modern native JSON type.
In 2021, the criticism had some basis. JSON was stored as opaque String blobs, queried via JSONExtract* functions that required full column scans on every query. The experimental Object('json') type shipped in 2022 but suffered from eager type unification, unbounded column explosion, and race conditions.
By early 2026, ClickHouse ships a production-ready native JSON type built on three foundational types (Variant, Dynamic, JSON), with configurable path limits, type hints, primary key support for JSON subcolumns, three generations of storage serialization, and a query planner that reads only the specific JSON paths your query needs. None of this requires manual schema management.
The single highest-impact storage change is advanced shared data serialization (PR #83777), which delivered 58x faster reads and 3,300x less memory for selective path access by introducing per-granule metadata with path indexes.
The native JSON type stores each path as a separate Dynamic-typed subcolumn in columnar format. The result: 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics on the same dataset, according to the JSONBench benchmark on 1 billion Bluesky documents.
The JSON type reached GA in ClickHouse 25.3 (PR #77785), with experimental flags removed and the type backported to the LTS release. The legacy Object('json') type was fully removed in v25.11 (PR #85718).
Verdict: the "ClickHouse doesn't do JSON" advice referenced a system that no longer exists. The current JSON type is a ground-up columnar implementation that preserves native types, supports primary key indexing, and reads only the paths you query. Repeating the old criticism in 2026 is misinformation.

Why People Still Say "ClickHouse Has No Native JSON Support"

If you've evaluated ClickHouse for semi-structured data, you've heard the warnings:

"ClickHouse doesn't support JSON natively"
"Flatten JSON into columns manually"
"Use JSONExtract functions on String columns" (as the primary approach)
"Use Object('JSON')" (deprecated type)
"No native JSON support"

Some of these started as legitimate observations circa 2021-2022. ClickHouse did store JSON as Strings. The JSONExtract* functions did scan the full column. The first attempt at a native type (Object('json')) did have serious architectural flaws.

Others were amplified by competitors who found a convenient story: ClickHouse is fast for scans, but it can't handle semi-structured data.

Then ClickHouse's engineering team spent three years building one of the most sophisticated columnar JSON implementations in any database. Over 80 significant pull requests merged. They built three new foundational types (Variant, Dynamic, JSON), three generations of storage serialization, a query planner that reads only needed subcolumns, primary key and skip index support for JSON paths, and clear migration paths from every legacy representation.

This article traces that evolution with PR-level evidence. No marketing claims. No benchmarks on toy datasets. Just the commit history.

Methodology: How We Analyzed ClickHouse's JSON Type Commit History

We went through ClickHouse's GitHub commit history, pull requests, changelogs, and release blogs from 2019 through early 2026. The scope covered every PR that touched JSON handling: type implementations, storage formats, function changes, planner optimizations, memory improvements, correctness fixes, and migration paths.

Each PR was classified by category (type system, storage, functions, planner, correctness, migration), impact severity, and whether it changed default behavior. We cross-referenced PR descriptions against changelog entries and benchmark results to verify claimed improvements. Where multiple PRs addressed the same subsystem, we traced the dependency chain to understand how incremental changes compounded.

The result is a ranked analysis of 80+ pull requests organized into six phases, with full provenance. Every claim in this article maps to a specific merged PR that you can verify yourself on GitHub.

ClickHouse JSON Capabilities in 2026: What Ships by Default

The current state, as of early 2026:

Native JSON data type (GA since v25.3): Each JSON path is stored as a separate Dynamic-typed subcolumn in columnar format. Full SQL query, filter, and aggregation support on JSON fields, including nested structures and arrays (Array(JSON)). Configurable max_dynamic_paths (default 1024) and max_dynamic_types (default 32) control resource usage. Known paths can be materialized as physical columns with type hints (JSON(key1 UInt32, key2 String)), while unknown paths are automatically discovered with type inference. Path filtering via SKIP and SKIP REGEXP provides fine-grained schema control.
Three foundational types: Variant (PR #58047) provides discriminated unions. Dynamic (PR #63058) extends Variant with open-ended type storage. JSON (PR #66444) combines both to store semi-structured data with native type preservation.
Primary key and skip index support: JSON subcolumns can appear in ORDER BY and data-skipping index expressions (PR #72644), enabling the same data pruning that ClickHouse applies to regular typed columns.
Advanced shared data serialization: Per-granule path indexes for selective reads of specific paths without scanning the entire JSON column (PR #83777). Three serialization modes optimized for different access patterns.
Planner-level subcolumn optimization: The query planner reads only the JSON paths referenced in your query (PR #68053), pushes subcolumn requirements through CTEs and views (PR #94105), and rewrites JSONExtract calls into direct subcolumn reads (PR #96711).
Full JSONExtract interop: All JSONExtract* functions work with native JSON columns (PR #96711). Introspection functions (distinctJSONPaths, distinctJSONPathsAndTypes) provide schema discovery from metadata alone (PR #68463, PR #92196).
Migration from every legacy format: ALTER TABLE ... MODIFY COLUMN converts String, Object('json'), Map, and Tuple columns to the native JSON type (PR #70442, PR #71784, PR #71320).

These are not experimental features behind flags. They are defaults that ship with every ClickHouse installation since v25.3.

ClickHouse JSON Myths vs. Reality: A 2026 Checklist

#	The FUD	Score	Evidence Volume	Reality (2026)
1	"No native JSON support"	False since Aug 2024	PR #66444, #77785	Native JSON type stores each path as a separate columnar subcolumn. GA since v25.3.
2	"Flatten JSON into columns manually"	False since Aug 2024	PR #66444, #72644	Automatic path flattening into Dynamic-typed subcolumns. No manual schema management.
3	"Use JSONExtract on String columns"	Outdated	PR #96711, #66444	JSONExtract works on native JSON columns and gets rewritten to direct subcolumn reads. No full-column scan.
4	"Use Object('JSON')"	Removed in v25.11	PR #85718, #66444	Object('json') was replaced by a ground-up redesign. The old type was fully removed in v25.11.
5	"JSON queries require full column scans"	False since 2024	PR #68053, #83777, #94105	Planner reads only referenced subcolumns. Advanced serialization provides per-granule path indexes. 58x faster, 3,300x less memory.
6	"Can't index JSON fields"	False since Dec 2024	PR #72644, #98886	JSON subcolumns in ORDER BY, primary key, and skip indexes. Bloom/text indexes on JSONAllPaths.
7	"JSON types lose type information"	False	PR #58047, #63058	Variant/Dynamic preserve native types (UInt32, Float64, DateTime, etc.). No String collapse.
8	"ClickHouse JSON is slower than document DBs"	False	JSONBench (1B docs)	2,500x faster than MongoDB. 10x faster than Elasticsearch. 9,000x faster than DuckDB/PostgreSQL.
9	"No schema discovery for JSON"	False since Aug 2024	PR #68463, #92196	`distinctJSONPaths()` and `distinctJSONPathsAndTypes()` read metadata only. Instant schema views.
10	"Can't migrate existing JSON String columns"	False since Oct 2024	PR #70442, #71784, #71320	ALTER TABLE converts String, Object, Map, and Tuple to native JSON. Background merge conversion.

Phase 1: ClickHouse JSON Functions and String Storage (2019-2021)

The FUD: "Use JSONExtract functions on String columns"

In this era, the criticism was fair. ClickHouse stored JSON as opaque String blobs, and every JSON query required parsing the entire string value.

ClickHouse JSONExtract Functions: simdjson-Powered but CPU-Heavy (May 2019)

PR #5235 introduced the JSONExtract* function family, powered by simdjson with a RapidJSON fallback. This was a meaningful step: SIMD instructions allowed structural element identification at near-memory-bandwidth speeds.

But the fundamental limitation remained. Every query, no matter which field it accessed, required scanning and parsing the full JSON string column. There was no way to read just event.user_id without also reading event.metadata, event.payload, and every other field.

ClickHouse provided two function families with different trade-offs:

simpleJSON / visitParam: Minimalist heuristic parsing with low CPU overhead, but strict assumptions about canonical encoding and no nested object support.
JSONExtract*: Full simdjson-powered parsing with standards-compliant extraction, but high per-row CPU cost from full document parsing.

Neither approach could avoid the core problem: 100% column scan for every query.

SQL/JSON Standard Functions (Mid-2021)

PR #24148 added JSON_VALUE, JSON_QUERY, and JSON_EXISTS with JSONPath expression support, bringing ClickHouse closer to SQL/JSON standard compliance. This improved SQL compatibility but did not change the underlying storage model. JSON was still strings.

Map(String, String): A Partial Improvement

The Map(String, String) type offered some improvement by storing JSON key-value pairs natively, eliminating the need for string parsing on every access. But it still required reading all keys to find one entry, and it lost all type information by collapsing everything to strings.

By the end of 2021, ClickHouse had capable JSON parsing functions but no native JSON storage. The gap was real, and the engineering team knew it.

Phase 2: ClickHouse Object('json') Type -- What Went Wrong (2022)

The FUD: "Use Object('JSON')"

PR #23932, merged March 2022 by Anton Popov, was the first attempt at native columnar JSON storage. It shipped in ClickHouse 22.3 LTS under allow_experimental_object_type. The implementation spanned 101 commits and proved a critical concept: JSON could be stored with each path as a separate subcolumn.

But it had serious architectural flaws:

Challenge	Impact	Consequence
Eager Type Unification	Mixed types at a path collapsed to String	Lost native type optimizations
Metadata Explosion	High memory for many unique keys	System instability with high-cardinality JSON
Race Conditions	Inconsistent results during merges	Unreliable query analysis
Schema Rigidity	Inability to handle type changes	Required manual ALTER or table rewrites
No Primary Key Support	JSON paths excluded from ORDER BY	No data pruning on JSON fields

Despite these flaws, Object('json') validated the demand for native JSON storage and identified every architectural challenge the replacement would need to solve.

Alongside the type work, ClickHouse continued improving JSON ecosystem support. PR #40910 introduced the JSONObjectEachRow format for keyed JSON objects. PR #39186 added automatic type inference from JSON strings, detecting dates, datetimes, and integers by default. PR #54427 enabled schema inference of JSON objects as named Tuples.

These format and inference improvements meant ClickHouse was getting better at ingesting JSON data. What it still lacked was a sound way to store it.

Phase 3: ClickHouse Variant and Dynamic Types -- The JSON Foundation (2024)

The FUD: "JSON types lose type information"

Rather than patching Object('json'), ClickHouse built from first principles. The redesign started with two new foundational types that solved the type-preservation problem that had plagued the original implementation.

Variant: Discriminated Union Type (January 2024)

PR #58047, by Pavel Kruglov, introduced Variant(T1, T2, ..., TN), a discriminated union storing values of different types in a single column. It uses a UInt8 discriminator column plus dense subcolumns per type variant, supporting up to 255 variants. The PR included 47 commits and roughly 5,000 lines of tests.

This solved the type-unification problem that killed Object('json'). Instead of collapsing 42 and "hello" at the same path into String, Variant stores them in their native types with a discriminator indicating which type each row contains.

A follow-up optimization (PR #62774) introduced compact discriminator serialization: when all discriminators in a granule are the same type (the common case for JSON paths), it stores 3 values instead of 8,192. This is highly effective in practice since most JSON paths have homogeneous types within a granule.

Dynamic: Open-Ended Type Storage (May 2024)

PR #63058, also by Pavel Kruglov, extended Variant with an open, self-describing type set. Dynamic has a max_types parameter (default 32); the most frequent types get their own Variant slots, and overflow types are stored in a SharedVariant as binary-encoded strings. This provided the flexibility that JSON demands without the unbounded explosion that doomed Object('json').

The PR included 39 commits and introduced the dynamicType() introspection function. A dynamic_structure.bin metadata file per data part tracks the type composition.

These two types, Variant and Dynamic, were the architectural foundation. The JSON type would combine them both.

Phase 4: ClickHouse Native JSON Type Implementation (2024)

The FUD: "ClickHouse doesn't support JSON natively"

How ClickHouse Implemented the Native JSON Data Type: PR #66444 (August 2024)

PR #66444 is the single most important commit in ClickHouse's JSON evolution. Authored by Pavel Kruglov, it implements the entirely new JSON data type in 91 commits, closing RFC #54864 ("Semistructured Columns") authored by Alexey Milovidov.

The design works as follows. JSON paths are flattened into individual Dynamic-typed subcolumns, each stored in separate column files per data part. Paths exceeding max_dynamic_paths (default 1024) overflow into a shared data structure. The type supports:

Full SQL support: Query, filter, and aggregate on any JSON field using standard SQL. Nested structures and arrays are first-class citizens.
Configurable limits: max_dynamic_paths (default 1024) and max_dynamic_types (default 32) control resource usage
Materialized known paths: Type hints like JSON(key1 UInt32, key2 String) materialize known paths as physical typed columns for maximum performance, while unknown paths are automatically created with type inference as they are discovered
Path filtering: SKIP and SKIP REGEXP to exclude noisy paths from columnar storage
Dot-notation access: json.a.b for direct path reads
Sub-object access: json.^prefix for extracting JSON subtrees
Array(JSON) support: Nested structures and arrays of JSON documents
Efficient data skipping on dynamic paths: JSON subcolumns in primary keys and skip indexes enable granule-level pruning

First shipped in ClickHouse 24.8 LTS under allow_experimental_json_type. The official blog post "How we built a new powerful JSON data type for ClickHouse" (October 2024) detailed the architecture.

20x Memory Reduction for Inserts (September 2024)

PR #69272 addressed a critical production concern: memory consumption during JSON inserts. Before this PR, inserting JSON data consumed 6.99 GiB of memory. After, 354 MiB. A 20x reduction.

The fix was adaptive write buffer sizing. Buffers start at 16 KiB and grow exponentially to a maximum of 1 MiB, selectively enabled for dynamic substreams. S3 inserts improved from 23.13 GiB to 7.65 GiB. No throughput regression.

ALTER String to JSON + Serialization V2 (October 2024)

PR #70442 delivered two major changes. First, ALTER TABLE ... MODIFY COLUMN col JSON to convert existing String columns to the JSON type. Conversion happens during background merges, so there is no downtime. Second, Serialization V2 for JSON and Dynamic types with an improved binary layout.

This was the beginning of clear migration paths. Teams no longer had to reimport data to adopt native JSON.

JSONExtract Refactoring for Native JSON (July 2024)

PR #66046 refactored the JSONExtract function family to work with the new type, splitting the implementation into reusable JSONExtractTree.h/cpp components and adding Dynamic type support. This ensured that existing queries using JSONExtract would continue to work when columns migrated to native JSON.

Introspection Functions (August 2024)

PR #68463 added distinctDynamicTypes(), distinctJSONPaths(), and distinctJSONPathsAndTypes(). These are essential schema discovery tools for semi-structured data. They were later optimized in PR #92196 to read only metadata files instead of scanning actual data, making schema diversity views effectively instant.

Subcolumn Optimization Enabled by Default (August 2024)

PR #68053 enabled optimize_functions_to_subcolumns by default. This planner optimization rewrites function calls to read only the specific subcolumns needed, which is transformative for JSON queries. A query accessing json.user.id reads only that subcolumn's data, not the entire JSON column.

Beta Promotion (November 2024)

PR #72294 moved JSON, Dynamic, and Variant to beta status, backported to 24.11. This signaled production readiness for early adopters.

JSON Subcolumns in Primary Key and Skip Indexes (December 2024)

PR #72644 was a milestone for performance. It enabled JSON subcolumns (json.path.to.key) in ORDER BY expressions and data-skipping index definitions. This means ClickHouse applies the same data pruning to JSON fields that it applies to regular typed columns.

The JSONBench benchmark uses this capability for sub-second queries over 1 billion documents. Without it, JSON columns could not participate in ClickHouse's primary mechanism for reducing scan ranges.

Migration Paths from Every Legacy Format

By the end of 2024, clear migration routes existed for every semi-structured representation:

Source Type	Migration Method	PR	Notes
`String`	`ALTER TABLE ... MODIFY COLUMN ... JSON`	#70442	Background merge conversion; `ALTER UPDATE` for immediate
`Map(String, String)`	`CAST(col AS JSON)`	#71320	Serialize-then-parse roundtrip
`Tuple`	`CAST(col AS JSON)`	#71320	Serialize-then-parse roundtrip
`Object('json')`	`ALTER TABLE ... MODIFY COLUMN ... JSON`	#71784	Must complete before upgrading past v25.11
`JSON(params_A)` to `JSON(params_B)`	`CAST` or `ALTER`	#72303	Change max_dynamic_paths, SKIP rules, type hints

Phase 5: ClickHouse JSON Reaches GA -- Performance and Storage Optimizations (2025)

The FUD: "JSON queries require full column scans"

ClickHouse JSON Production-Ready: GA in v25.3 (March 2025)

PR #77785, authored by Alexey Milovidov and expanded by Pavel Kruglov, removed all experimental and beta gates for JSON, Dynamic, and Variant. The commit message references https://jsonbench.com/. Backported to ClickHouse 25.3 LTS via cherry-pick PRs #77974 and #77975.

The 25.3 release blog stated: "About 1.5 years ago, we weren't happy with our JSON implementation, so we returned to the drawing board."

63x Memory Reduction for Read Prefetches (March 2025)

PR #77640 addressed memory consumption during read-ahead prefetches of JSON columns in Wide parts. Before: SELECT * WHERE y=1 on 1 million rows with 1,000 JSON paths consumed 69.16 GiB peak memory. After: 1.11 GiB. A 63x reduction.

4-10x Faster S3 Reads (February 2025)

PR #74827 introduced prefetches for subcolumn prefix deserialization, a cache for deserialized prefixes, and parallel prefix deserialization for JSON columns on S3. The result: 4x faster full scans and roughly 10x faster LIMIT 10 queries on remote storage. This introduced MergeTreePrefixesDeserializationThreadPool and benefits any remote filesystem with similar latency characteristics.

58x Faster Selective Reads: Advanced Shared Data Serialization (August 2025)

PR #83777 is the most impactful storage optimization in the JSON type's history. It introduced three serialization modes for shared data (the overflow storage for paths beyond max_dynamic_paths):

Mode	Read Latency	Memory	Write Cost	Ideal Use Case
`map`	High for subcolumns	High	Low	Writing data, reading whole JSON
`map_with_buckets`	Medium	Medium	Medium	Balanced workloads
`advanced`	Low for subcolumns	Low	High	Reading specific paths

The advanced mode creates per-granule .structure, .data, and .paths_marks files with a path index that enables direct lookup of specific paths without scanning the entire shared data structure.

The benchmarks speak for themselves. Reading a single key from 200,000 rows with 10,000 unique paths improved from 3.63s / 12.53 GiB to 0.063s / 3.89 MiB. That is 58x faster and 3,300x less memory. For Compact parts, non-existing key reads improved from 3.4s to 0.3s (roughly 11x faster), memory from 517 MiB to 3.7 MiB (roughly 140x reduction).

This PR contained 47 commits and is documented in the official ClickHouse blog "Making complex JSON 58x faster, use 3,300x less memory."

Substream Marks in Compact Parts (March 2025)

PR #77940 added marks for individual substreams within compact parts, extending selective subcolumn read efficiency to the compact storage format. Previously, reading any subcolumn from a compact part required reading the entire part.

Experimental Settings Obsoleted (v25.8)

PR #85934 marked the experimental and beta JSON settings as obsolete. JSON was now unconditionally enabled.

Legacy Object('json') Fully Removed (November 2025)

PR #85718 removed the deprecated Object('json') implementation entirely. 270 files changed. ColumnObjectDeprecated, DataTypeObjectDeprecated, deprecated serialization files, the JSONDataParser, and all legacy tests were deleted. This was backward-incompatible: any tables or queries referencing Object('json') must be migrated before upgrading past v25.11.

Phase 6: ClickHouse JSON Query Planner and JSONExtract Interop (2026)

The FUD: "JSONExtract on String columns is the primary approach"

ClickHouse JSONExtract Now Works with Native JSON Columns (2026)

PR #96711, by Fisnik Kastrati, extended all JSONExtract*, JSONHas, JSONLength, and JSONType functions to accept native JSON columns directly. More importantly, it introduced a FunctionToSubcolumnsPass planner optimization that rewrites constant-path JSONExtract calls into direct subcolumn reads.

This means existing queries that use JSONExtractString(json_col, 'user', 'name') now bypass text parsing entirely when the column is a native JSON type. The planner rewrites the call to a direct subcolumn read of json_col.user.name. This closed issue #88370.

Optimized has(JSON, path) Function (2026)

PR #96927 added an optimized has(json_col, 'path') function for fast path-existence checks without text parsing. This is essential for queries that filter based on whether a JSON path exists.

SubcolumnPushdownPass in Query Planner (January 2026)

PR #94105 introduced SubcolumnPushdownPass, which pushes subcolumn requirements through CTEs and views. This means wrapping a JSON table in a view or CTE no longer defeats subcolumn optimizations.

Skip Indexes on JSONAllPaths (April 2026)

PR #98886 enabled bloom and text skip indexes on JSONAllPaths(), allowing efficient filtering on JSON key presence. This gives ClickHouse the ability to skip entire granules when querying for documents that contain (or don't contain) specific paths.

SIMD Tokenizer Refactoring (2026)

PR #97871, by Amos Bird, refactored the tokenizer to a SIMD-ready stateful API, replacing the older iterator API. This lays the groundwork for continued parsing performance improvements.

Combined Subcolumn Access (2026)

PR #98788 introduced a unified combined subcolumn that returns Dynamic for both scalar and object access at a path. This simplifies queries that need to handle paths where the value might be a scalar or a nested object.

ClickHouse JSON Performance Benchmarks: MongoDB, Elasticsearch, and DuckDB Compared

The JSON type's performance has been validated by both ClickHouse's internal benchmarks and independent third-party testing. The numbers come from specific, verifiable sources.

JSONBench: 1 Billion Bluesky Documents (January 2025)

The JSONBench benchmark (https://jsonbench.com/) tested the native JSON type against other databases on 1 billion Bluesky social media documents on a single m6i.8xlarge node:

Comparison	ClickHouse Advantage	Source
vs MongoDB aggregations	2,500x faster (405ms vs ~16 min)	JSONBench
vs Elasticsearch aggregations	10x faster	JSONBench
vs DuckDB/PostgreSQL analytics	9,000x faster	JSONBench
Storage vs compressed files	20% more compact (same algorithm)	JSONBench
Storage vs MongoDB	40% more efficient	JSONBench
Peak memory (1B document count)	< 3 MiB	JSONBench

A follow-up benchmark in March 2025 scaled to 4 billion+ documents (1.6 TiB), achieving 91.84 million docs/sec throughput with sub-100ms queries.

Storage and Memory Improvements

Optimization	PR	Before	After	Improvement
Insert memory	#69272	6.99 GiB	354 MiB	20x reduction
S3 insert memory	#69272	23.13 GiB	7.65 GiB	3x reduction
Read prefetch memory	#77640	69.16 GiB	1.11 GiB	63x reduction
Selective read latency	#83777	3.63s	0.063s	58x faster
Selective read memory	#83777	12.53 GiB	3.89 MiB	3,300x reduction
S3 full scan	#74827	Baseline	4x faster	4x improvement
S3 LIMIT 10	#74827	Baseline	~10x faster	~10x improvement

Third-Party Validation

SigNoz, a ClickHouse-based observability platform, reported 30% faster log queries with the native JSON type. ClickHouse's own observability stack (ClickStack) demonstrated 9x faster queries compared to the previous Map-based approach for OpenTelemetry log attributes.

ClickHouse JSON for OpenTelemetry and Log Analytics: A Real-World Use Case

The observability domain is where the JSON type's impact is most visible. Before native JSON, log management solutions built on ClickHouse flattened attributes into Map(String, String) columns, losing type information. Queries like SUM(LogAttributes.response_size) required explicit casts on every access.

With the native JSON type, OpenTelemetry log attributes preserve their native types. The performance difference:

Dimension	Legacy (Map)	Modern (JSON)	Impact
I/O Efficiency	Read entire column	Read specific path subcolumns	Reduced disk I/O
Memory Footprint	High (String parsing)	Low (Columnar access)	Lower peak memory
Schema Migration	Manual ALTER TABLE	Fully Automatic	Simplified operations
Aggregation Speed	Slow (Cast required)	Native (No cast)	Up to 10x faster queries

ClickHouse JSON Limitations and Trade-offs in 2026

Fairness matters. A few things still require awareness:

Path explosion requires attention. Without appropriate max_dynamic_paths settings and SKIP rules, high-cardinality JSON (thousands of unique paths per document) can create many subcolumns. Set limits that match your schema shape, and use SKIP REGEXP for noisy paths.
Subcolumn pruning through SELECT * in CTEs is not yet supported. Issue #92455 documents this gap. Explicitly name columns in CTEs over JSON tables for now.
Legacy Object('json') migration is mandatory. PR #85718 enforces a hard removal. Post-upgrade to v25.12+, any tables or queries referencing Object('json') will fail. Audit schemas and run ALTER before upgrading past v25.11.
Correctness fixes are ongoing. Edge cases in JSONExtract interop (issue #102018, #102079), default value handling (issue #101721), and specific format combinations (issue #101911) show that a system this complex requires staying on the latest stable release.
Advanced shared data mode trades write cost for read performance. The per-granule path indexes that enable 58x faster reads add write overhead. For write-heavy workloads with infrequent selective reads, the simpler map mode may be more appropriate.
Type hints and path configuration require understanding your data. The defaults work well for moderate schemas (up to 1,024 unique paths). Workloads with tens of thousands of unique paths need tuning.

These are real engineering trade-offs, and understanding them is part of making an informed decision.

ClickHouse JSON Evolution Timeline (2019-2026)

Year	What Changed	Key PRs	Impact
2019	JSONExtract function family with simdjson	#5235	SIMD-accelerated extraction from String columns. Full column scan required.
2021	SQL/JSON standard functions	#24148	JSONPath support. Still string-based storage.
2022	Object('json') first attempt. Format ecosystem.	#23932, #40910, #39186	Proved columnar JSON concept. Architectural flaws identified. Schema inference improved.
2024 H1	Variant and Dynamic types. Compact discriminators.	#58047, #63058, #62774	Type-preserving foundation built. Efficient storage for homogeneous granules.
2024 H2	Native JSON type. 20x insert memory. Serialization V2. Primary key support. Beta.	#66444, #69272, #70442, #72644, #72294	Complete native JSON type ships. Migration paths established. JSON paths in ORDER BY.
2025	GA. 63x read memory. 58x selective reads. S3 optimization. Legacy removal.	#77785, #77640, #83777, #74827, #85718	Production-ready. Advanced shared data. 2,500x vs MongoDB. Legacy Object removed.
2026	JSONExtract interop. Planner intelligence. Skip indexes on paths.	#96711, #94105, #98886, #96927	Full function compatibility. Subcolumn pushdown through CTEs. Bloom indexes on JSON paths.

When Should You Use the Native JSON Type in ClickHouse?

Workload	Verdict	Reasoning
Log and event analytics with semi-structured attributes	Yes	Native type preserves types, subcolumn reads minimize I/O, primary key support enables data pruning
OpenTelemetry / observability data	Yes	Purpose-built for this. ClickStack validates 9x faster queries vs Map approach
JSON documents with known high-value fields	Yes	Use type hints for critical paths, SKIP rules for noisy paths, ORDER BY on key fields
Schema-on-read analytics over heterogeneous JSON	Yes	Dynamic type handles mixed schemas. `distinctJSONPaths()` provides instant schema discovery
Migrating from MongoDB/Elasticsearch for analytics	Yes	2,500x faster aggregations (MongoDB), 10x faster (Elasticsearch). Clear migration via ALTER
JSON with 10,000+ unique paths per document	Depends	Set appropriate `max_dynamic_paths`. Use SKIP REGEXP for noisy paths. Advanced shared data helps but requires tuning
Write-heavy JSON ingestion with rare reads	Depends	Simpler serialization modes (`map`) may be more appropriate than `advanced`
Existing String/Map JSON columns	Yes, migrate	ALTER TABLE converts in background. No downtime. Immediate query performance improvement

How to Respond When Someone Says "ClickHouse Doesn't Support JSON"

Run the PR and benchmark numbers.

When someone tells you ClickHouse can't handle JSON in 2026, ask them if they've tested against a version that includes the native JSON type (GA since v25.3, PR #77785), primary key support for JSON subcolumns (v24.12, PR #72644), advanced shared data serialization (v25.8, PR #83777), or JSONExtract interop with native JSON columns (v26.2, PR #96711).

If they're referencing Object('json'), that type was removed in v25.11. If they're recommending JSONExtract on String columns as the primary approach, the native JSON type has made that unnecessary since v24.8. If they're telling you to flatten JSON into columns manually, the type does this automatically with configurable limits and type hints.

The commit history doesn't lie. 80+ pull requests. Three foundational types. Three generations of storage serialization. Primary key indexing. Planner-level subcolumn optimization. 2,500x faster than MongoDB on real-world data.

ClickHouse's JSON implementation in 2026 bears no resemblance to the string-based functions and experimental Object type that earned those early warnings. The engineers built a ground-up columnar JSON storage system, and the evidence is in the PRs.

Test it on your workload. That's the only benchmark that matters.

ClickHouse JSON FAQ

Does ClickHouse support JSON natively in 2026?

Yes. ClickHouse's native JSON type (PR #66444) stores each JSON path as a separate Dynamic-typed subcolumn in columnar format. It reached GA in v25.3 (PR #77785) with all experimental flags removed. The legacy Object('json') type was fully removed in v25.11 (PR #85718).

What is the most impactful ClickHouse JSON optimization?

Advanced shared data serialization (PR #83777), which delivers 58x faster reads and 3,300x less memory for selective path access. For insert workloads, adaptive write buffers (PR #69272) with 20x memory reduction are equally important.

ClickHouse vs MongoDB vs Elasticsearch for JSON: Which Is Faster?

On the JSONBench benchmark (1 billion Bluesky documents, single node), ClickHouse with the native JSON type is 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics. Storage is 20% more compact than compressed files and 40% more efficient than MongoDB.

Should I migrate from JSONExtract on String columns to the native JSON type?

Yes. ALTER TABLE ... MODIFY COLUMN col JSON converts String columns to native JSON during background merges with no downtime (PR #70442). After migration, queries read only the paths they need instead of scanning the full string. JSONExtract functions continue to work on native JSON columns (PR #96711) and get rewritten to direct subcolumn reads by the planner.

What happened to ClickHouse Object('json') and how to migrate?

Object('json') (PR #23932) was ClickHouse's first attempt at native JSON storage, shipped in 2022. It suffered from type unification issues, metadata explosion, and race conditions. Rather than patching it, ClickHouse built an entirely new implementation from first principles using Variant, Dynamic, and JSON types. Object('json') was fully removed in v25.11 (PR #85718). Tables using it must be migrated via ALTER TABLE ... MODIFY COLUMN ... JSON (PR #71784) before upgrading.

Can I use JSON fields in ClickHouse primary keys and indexes?

Yes, since v24.12. PR #72644 enables JSON subcolumns in ORDER BY and data-skipping index expressions. PR #98886 adds bloom and text skip indexes on JSONAllPaths() for efficient key-presence filtering.

How does ClickHouse handle high-cardinality JSON with thousands of paths?

The max_dynamic_paths parameter (default 1024) controls how many paths get their own columnar subcolumn. Paths beyond this limit overflow into shared data storage. The advanced serialization mode (PR #83777) makes shared data reads efficient with per-granule path indexes. Use SKIP and SKIP REGEXP to exclude noisy paths from columnar storage.

Is ClickHouse JSON good for logs and observability?

Yes. ClickHouse's own observability stack (ClickStack) uses the native JSON type for OpenTelemetry log attributes, demonstrating 9x faster queries compared to the previous Map-based approach. SigNoz independently validated 30% faster log queries. The type preserves native numeric types, eliminating cast overhead for aggregations on log attributes.

Analysis based on 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks covering the period 2019-2026. Every claim maps to a specific merged PR. Verify the evidence yourself -- the commit history is public.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.