Oracle SQL/JSON: The Developer's Guide to Querying JSON Like a Pro

Rick Houlihan
SQL and JSON aren't enemies. They never were.

For a decade, the industry sold you a false choice: pick the rigidity of relational tables or the flexibility of JSON documents. Build your app on SQL or build it on a document store. Structure or freedom. Choose.

It was never a real tradeoff. It was a failure of implementation masquerading as a law of nature.

Oracle's SQL/JSON support — built into the database engine since 12c and dramatically expanded through 19c, 21c, and 26ai — proves the point. You get the full expressive power of SQL (joins, aggregations, window functions, a 40-year-old cost-based optimizer) and the full flexibility of JSON (nested documents, arrays, schema-optional structures) in the same query, the same transaction, the same execution plan.

This article is a practical developer's guide. We'll start simple and build to sophisticated. By the end, you'll understand CTEs, every major SQL/JSON function, and how they compose into queries that would require five different databases in a polyglot architecture.

Let's build.


The Sample Data

Everything in this article runs against a single table. Simple enough to follow, rich enough to be real:

CREATE TABLE orders (
  id        NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  order_doc JSON
);

INSERT INTO orders (order_doc) VALUES (JSON('{
  "orderId": 1001,
  "customer": "Acme Corp",
  "status": "shipped",
  "priority": "high",
  "orderDate": "2025-03-15",
  "shipping": {
    "method": "express",
    "address": {
      "city": "Austin",
      "state": "TX",
      "zip": "78701"
    }
  },
  "items": [
    {"product": "Widget Pro",  "quantity": 10, "unitPrice": 29.99, "category": "hardware"},
    {"product": "Gadget Plus", "quantity": 5,  "unitPrice": 49.99, "category": "electronics"},
    {"product": "Cable Kit",   "quantity": 50, "unitPrice": 4.99,  "category": "accessories"}
  ],
  "tags": ["wholesale", "priority", "Q1-promo"]
}'));

Notice the column type: JSON. Not VARCHAR2. Not CLOB. The native JSON data type — introduced in 21c — stores documents in Oracle's OSON binary format. Hash-indexed field navigation. O(1) access to any field at any depth. This matters more than most developers realize, and we'll come back to why.


1. Dot-Notation: The Easy On-Ramp

If you've never queried JSON in Oracle, start here. Dot-notation gives you direct field access using the syntax you'd expect from any programming language:

SELECT o.order_doc.customer,
       o.order_doc.status,
       o.order_doc.shipping.address.city
FROM   orders o;

That's it. Table alias, JSON column, dot-separated field path. Oracle navigates the OSON binary structure, hashes the field names, jumps directly to the offsets. No parsing. No scanning.

Types are preserved. When the column is declared as JSON (the native OSON type), Oracle knows the underlying types of your values. A number stored as a JSON number comes back as a number. A string comes back as a string. You can use dot-notation results directly in comparisons, arithmetic, and predicates — Oracle handles the type mapping:

-- Types flow naturally — no casting required
SELECT o.order_doc.customer       AS customer_name,
       o.order_doc.orderId        AS order_id,
       o.order_doc.status         AS status
FROM   orders o
WHERE  o.order_doc.orderId > 1000;   -- numeric comparison works

Item methods let you coerce types when you need explicit control — converting to a specific SQL type or applying a transformation:

SELECT o.order_doc.customer.string()       AS customer_name,
       o.order_doc.orderId.number()         AS order_id,
       o.order_doc.orderDate.date()         AS order_date,
       o.order_doc.items.size()             AS item_count
FROM   orders o
WHERE  o.order_doc.priority.string() = 'high';

The distinction matters: you don't have to call .string() to use a string value as a string. But item methods give you precision when you need it — .date() parses a JSON string into a SQL DATE you can use in date arithmetic, .number() ensures numeric precision for financial calculations, and methods like .upper(), .size(), and .sum() apply transformations inline without a separate function call.

Available item methods include: .string(), .number(), .date(), .timestamp(), .boolean(), .double(), .length(), .upper(), .lower(), .size(), .type(), .abs(), .ceiling(), .floor(), .count(), .sum(), .avg(), .minNumber(), .maxNumber().

That's not a thin wrapper around JSON. That's a full type conversion system built into the path expression itself.

Array access works how you'd expect:

-- First item
SELECT o.order_doc.items[0].product AS first_product
FROM   orders o;

-- All items (returns a JSON array)
SELECT o.order_doc.items[*].product
FROM   orders o;

-- Last item
SELECT o.order_doc.items[last].product
FROM   orders o;

Dot-notation is perfect for quick queries, dashboards, and ad-hoc exploration. When you need more control — type safety, error handling, complex path expressions — you reach for the SQL/JSON functions.


2. JSON_VALUE: Surgical Scalar Extraction

JSON_VALUE extracts a single scalar value from a JSON document and returns it as a SQL type. Think of it as dot-notation's more disciplined sibling.

SELECT JSON_VALUE(o.order_doc, '$.customer')                          AS customer,
       JSON_VALUE(o.order_doc, '$.orderId' RETURNING NUMBER)          AS order_id,
       JSON_VALUE(o.order_doc, '$.shipping.address.zip')              AS zip_code
FROM   orders o;

Why use JSON_VALUE when dot-notation exists?

Dot-notation gives you type preservation and coercion via item methods. JSON_VALUE adds two things dot-notation can't do: error handling and SQL-standard portability (it's part of the ISO SQL/JSON spec, so your queries translate across databases that support the standard).

Error handling saves you at runtime:

-- NULL ON ERROR (default): missing path returns NULL silently
SELECT JSON_VALUE(o.order_doc, '$.nonexistent' NULL ON ERROR) FROM orders o;

-- ERROR ON ERROR: missing path raises ORA exception
SELECT JSON_VALUE(o.order_doc, '$.nonexistent' ERROR ON ERROR) FROM orders o;

-- DEFAULT ON ERROR: missing path returns your fallback
SELECT JSON_VALUE(o.order_doc, '$.discount' DEFAULT 0 ON ERROR RETURNING NUMBER)
FROM   orders o;

The same three options — NULL, ERROR, DEFAULT — work for ON EMPTY (path exists but matches nothing). During development, use ERROR ON ERROR to catch path mistakes early. In production, choose the behavior that matches your business logic.
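The three behaviors are easy to hold in your head with a small simulation. Below is a hedged Python sketch of the semantics described above, using plain dicts in place of OSON documents; the `json_value` function and its parameters are invented for illustration, not Oracle internals.

```python
# Simulation of JSON_VALUE's three ON ERROR behaviors:
# NULL ON ERROR (default), ERROR ON ERROR, and DEFAULT ... ON ERROR.
from functools import reduce

def json_value(doc, path, on_error="null", default=None):
    """Resolve a dotted path like '$.shipping.address.zip' against a dict."""
    keys = path.lstrip("$.").split(".")
    try:
        value = reduce(lambda d, k: d[k], keys, doc)
    except (KeyError, TypeError):
        if on_error == "error":
            raise KeyError(f"path not found: {path}")   # ERROR ON ERROR
        return default if on_error == "default" else None
    return value

order = {"orderId": 1001, "shipping": {"address": {"zip": "78701"}}}

print(json_value(order, "$.shipping.address.zip"))    # 78701
print(json_value(order, "$.discount"))                # None  (NULL ON ERROR)
print(json_value(order, "$.discount", "default", 0))  # 0     (DEFAULT 0 ON ERROR)
```

The takeaway is the shape of the choice, not the implementation: the same lookup either swallows the miss, raises, or substitutes a fallback, and you pick per call site.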

Pro tip (26ai+): Instead of adding error clauses to every function, set it at the session level:

ALTER SESSION SET JSON_BEHAVIOR = 'ON_ERROR:ERROR';

This applies to all SQL/JSON functions in the session — JSON_VALUE, JSON_QUERY, JSON_TABLE, and JSON_EXISTS all inherit the error default. Any function that doesn't have an explicit ON ERROR clause now raises errors instead of silently returning NULL. Toggle it off when you're done debugging, or set it to NULL if you'd rather have permissive defaults.

Indexability is where both JSON_VALUE and dot-notation earn their keep in production. You can create functional B-tree indexes on JSON fields using either syntax:

-- Functional index using JSON_VALUE
CREATE INDEX idx_order_customer ON orders (
  JSON_VALUE(order_doc, '$.customer' RETURNING VARCHAR2(100) ERROR ON ERROR)
);

-- Equivalent index using dot-notation
CREATE INDEX idx_order_status ON orders o (
  o.order_doc.status.string()
);

The type-matching gotcha: The RETURNING type in your index definition and your query must match exactly. If you create the index with RETURNING NUMBER but your query omits the RETURNING clause (which defaults to VARCHAR2), the CBO won't recognize them as the same expression — and your index sits unused. This is the single most common SQL/JSON performance mistake:

-- Index defined with RETURNING NUMBER
CREATE INDEX idx_order_id ON orders (
  JSON_VALUE(order_doc, '$.orderId' RETURNING NUMBER ERROR ON ERROR)
);

-- This query USES the index (types match)
SELECT * FROM orders
WHERE  JSON_VALUE(order_doc, '$.orderId' RETURNING NUMBER) = 1001;

-- This query IGNORES the index — JSON_VALUE defaults to VARCHAR2, not NUMBER
SELECT * FROM orders
WHERE  JSON_VALUE(order_doc, '$.orderId') = 1001;

Same field. Same data. Completely different execution plans. Check your EXPLAIN PLAN — if you see a full table scan where you expected an index range scan, type mismatch is the first thing to investigate.

You can also create composite indexes across multiple JSON fields, just like relational composite indexes:

CREATE INDEX idx_cust_date ON orders (
  JSON_VALUE(order_doc, '$.customer' RETURNING VARCHAR2(100)),
  JSON_VALUE(order_doc, '$.orderDate' RETURNING DATE)
);

This powers queries that filter on customer and sort by date — a common API pattern — without touching the table at all.


3. Extended Types: What OSON Actually Stores

Here's the thing that surprises developers: OSON stores more than the JSON spec defines.

The JSON standard has six types — string, number, boolean, null, object, array. That's the entire type system. Dates are strings. Timestamps are strings. High-precision decimals are floats (with all the IEEE 754 rounding baggage that implies). Binary data? Base64-encoded strings. Vectors? Not a thing.

OSON is a superset. It stores Oracle's native scalar types directly in the document — as themselves, not as stringified approximations. Write a DATE, store a DATE, read a DATE. No parsing on the way in or out.

This is why dot-notation has so many item methods and JSON_VALUE has so many RETURNING options. They're not type conversions — they're accessors for types that are already there.

The Extended Type Table

| OSON Type | Dot-notation | JSON_VALUE RETURNING | When to use |
| --- | --- | --- | --- |
| VARCHAR2 / text | .string() | RETURNING VARCHAR2(n) | Text, identifiers, categorical data |
| NUMBER | .number() | RETURNING NUMBER | Money, quantities, anything where rounding matters |
| BINARY_DOUBLE | .double() | RETURNING BINARY_DOUBLE | IEEE 754 double — scientific math |
| BINARY_FLOAT | .float() | RETURNING BINARY_FLOAT | Single-precision float |
| BOOLEAN | .boolean() | RETURNING BOOLEAN (23ai+) | True/false flags |
| DATE | .date() | RETURNING DATE | Second-precision date+time (see note below) |
| TIMESTAMP | .timestamp() | RETURNING TIMESTAMP | Sub-second precision, no timezone |
| TIMESTAMP WITH TIME ZONE | .timestamp() | RETURNING TIMESTAMP WITH TIME ZONE | Zone-aware moments in time (see note below) |
| INTERVAL YEAR TO MONTH | | RETURNING INTERVAL YEAR TO MONTH | Month/year durations |
| INTERVAL DAY TO SECOND | | RETURNING INTERVAL DAY TO SECOND | Sub-second durations |
| RAW | .binary() | RETURNING RAW | Hashes, fingerprints, binary payloads |
| VECTOR (26ai) | | RETURNING VECTOR | Embeddings for similarity search |

A note on DATE and temporal types. Oracle's DATE, despite the name, actually stores date + time down to the second — year, month, day, hour, minute, second. There's no "date without time" primitive in Oracle. If you want day-level matching (e.g., "all orders placed on April 11" regardless of the time of day), you'll want to TRUNC() the value or use a range predicate like WHERE dt >= DATE '2026-04-11' AND dt < DATE '2026-04-12'. TIMESTAMP WITH TIME ZONE has the opposite subtlety — the same moment in time can have multiple representations (2026-04-11 14:30 UTC == 2026-04-11 10:30 EDT), so equality comparisons need to either normalize to UTC or use the right comparison operator. These aren't OSON quirks — they're inherited from Oracle's core type system. The important thing is that OSON stores these values as those types, with full fidelity, so you get the same semantics as a relational column of the same type.
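The day-level matching idiom from that note is worth seeing concretely. Here's a minimal Python sketch of the same logic, using datetime values to stand in for Oracle DATEs (the data is invented for illustration):

```python
# Day-level matching: equality against midnight misses rows that carry a
# time-of-day, while the half-open range  start <= dt < end  catches them all.
from datetime import datetime

orders = [
    ("A", datetime(2026, 4, 11, 9, 15)),       # morning of April 11
    ("B", datetime(2026, 4, 11, 23, 59, 59)),  # last second of April 11
    ("C", datetime(2026, 4, 12, 0, 0)),        # midnight April 12 -- excluded
]

day_start = datetime(2026, 4, 11)
day_end   = datetime(2026, 4, 12)

eq_match    = [name for name, dt in orders if dt == day_start]
range_match = [name for name, dt in orders if day_start <= dt < day_end]

print(eq_match)     # []
print(range_match)  # ['A', 'B']
```

The half-open range is the same shape as the SQL predicate above, and it has the same virtue: it stays index-friendly, whereas wrapping the column in TRUNC() requires a function-based index to avoid a scan.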

Why You Care

Dates are the feature developers underestimate until trouble tickets start piling up. In a text-based JSON store, your orderDate is a string — and that string can be anything. Someone stores "2026-04-11", someone else stores "2026-04-11T14:30:00Z", a third person stores "04/11/2026" because their frontend didn't normalize. Your sort order is now a lottery. Your range filter silently skips rows. Your index is polluted with mismatched formats that the database has no way to validate.

So you push validation into application code. Every write path has to sanitize the date format before it hits the database. Every integration has to agree on the convention. Every bug report starts with "wait, what format is this field actually in?"

In OSON, your orderDate is a DATE. Period. The database enforces it at insert time — anything that isn't a valid date gets rejected before it pollutes storage. Queries just work:

-- Native date comparison. No parse. No cast. Index still works.
SELECT o.order_doc.customer
FROM   orders o
WHERE  o.order_doc.orderDate.date() > SYSDATE - 7;

The same logic applies to every extended type:

  • NUMBER — decimal precision IEEE 754 can't represent. Critical for money.
  • TIMESTAMP WITH TIME ZONE — scheduling across regions without reinventing timezone math.
  • VECTOR — similarity search in the same document as your operational data. No sidecar. No sync lag.
  • RAW — binary hashes without base64 inflation.

You stop writing parsing code. The database handles it.

Getting Typed Values Into OSON

Three paths, depending on where the data comes from:

1. SQL/PL/SQL construction — automatic. JSON_OBJECT and JSON_ARRAY preserve native types:

INSERT INTO events (event_doc) VALUES (JSON_OBJECT(
  'name'      VALUE 'Product Launch',
  'scheduled' VALUE TIMESTAMP '2026-06-15 09:30:00 America/New_York',
  'price'     VALUE 299.99,     -- stored as NUMBER
  'confirmed' VALUE TRUE        -- stored as BOOLEAN
));

2. JSON_SCALAR — explicit. Force a specific SQL type into a JSON scalar:

SELECT JSON_OBJECT('expiresAt' VALUE JSON_SCALAR(SYSTIMESTAMP + INTERVAL '30' DAY))
FROM DUAL;

3. Extended JSON syntax — the text path. When JSON arrives as a string from an API, use type markers with the EXTENDED keyword:

INSERT INTO events (event_doc) VALUES (JSON('{
  "name": "Product Launch",
  "scheduled": {"$oracleTimestampTZ": "2026-06-15T09:30:00-04:00"}
}' EXTENDED));

Oracle recognizes $oracleDate, $oracleTimestamp, $oracleTimestampTZ, $oracleBinary and stores them as native types.

The Payoff

OSON makes JSON a structured document with first-class types — the type system in your database matches the type system in your application. No translation layer. No parsing code. No rounding bugs you find in production. Dates stay dates, numbers stay numbers, vectors stay vectors, all the way from storage to your response body.


4. JSON_QUERY: When You Need the Whole Object

JSON_VALUE returns scalars. JSON_QUERY returns JSON fragments — objects, arrays, or multiple values.

-- Extract the shipping object
SELECT JSON_QUERY(o.order_doc, '$.shipping' PRETTY) AS shipping_info
FROM   orders o;

-- Extract the items array
SELECT JSON_QUERY(o.order_doc, '$.items') AS all_items
FROM   orders o;

-- Extract all product names (multiple values → need a wrapper)
SELECT JSON_QUERY(o.order_doc, '$.items[*].product' WITH WRAPPER) AS product_names
FROM   orders o;
-- Result: ["Widget Pro","Gadget Plus","Cable Kit"]

The Wrapper Clause

This is the part that confuses people. Here's the rule:

| What the path returns | WITHOUT WRAPPER (default) | WITH WRAPPER | WITH CONDITIONAL WRAPPER |
| --- | --- | --- | --- |
| Single object/array | Returns as-is | Wraps in array | Returns as-is |
| Single scalar | Error/NULL | Wraps in array [42] | Wraps in array [42] |
| Multiple values | Error/NULL | Wraps in array [1,2,3] | Wraps in array [1,2,3] |

WITH CONDITIONAL WRAPPER is the pragmatic choice — it wraps only when the result isn't already a single JSON value. Use it when you're not sure whether a path will return one thing or many.
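The table reduces to a small decision function. Here's a hedged Python simulation of that logic — not Oracle's implementation, just the rules restated as code (the function and mode names are invented; matches are modeled as a Python list):

```python
# Simulate JSON_QUERY's wrapper rules. `results` is the list of JSON values
# matched by the path; a returned list represents a wrapped JSON array.
def apply_wrapper(results, mode="without"):
    if mode == "with":
        return results                    # WITH WRAPPER: always wrap
    single = len(results) == 1 and isinstance(results[0], (dict, list))
    if mode == "conditional":
        return results[0] if single else results   # wrap only when needed
    # WITHOUT WRAPPER (default): only a single object/array is legal;
    # a scalar or multiple values is the error/NULL case.
    return results[0] if single else None

products = ["Widget Pro", "Gadget Plus", "Cable Kit"]  # $.items[*].product
shipping = [{"method": "express"}]                     # $.shipping

print(apply_wrapper(products, "with"))       # all three, wrapped as an array
print(apply_wrapper(shipping, "without"))    # the object as-is
print(apply_wrapper(products, "without"))    # None -- the error/NULL case
```

Reading the `conditional` branch makes the "pragmatic choice" point concrete: a lone object passes through untouched, while a scalar or a multi-match gets wrapped so the result is always valid JSON.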

Error handling in JSON_QUERY offers additional options beyond the NULL/ERROR pattern:

-- Return empty array when path doesn't match
SELECT JSON_QUERY(o.order_doc, '$.reviews' EMPTY ARRAY ON ERROR) FROM orders o;
-- Result: []

-- Return empty object
SELECT JSON_QUERY(o.order_doc, '$.metadata' EMPTY OBJECT ON ERROR) FROM orders o;
-- Result: {}

These defaults go further than most databases give you. Sounds small — until you're building a JSON API response and don't want null checks littering your application code.


5. JSON_EXISTS: Filtering with Path Predicates

JSON_EXISTS isn't a function — it's a SQL condition. It returns TRUE or FALSE. Use it in WHERE clauses to filter rows based on JSON content.

-- Orders that have a shipping address
SELECT o.order_doc.customer.string()
FROM   orders o
WHERE  JSON_EXISTS(o.order_doc, '$.shipping.address');

-- Orders with at least one item over $25
SELECT o.order_doc.customer.string()
FROM   orders o
WHERE  JSON_EXISTS(o.order_doc, '$.items[*]?(@.unitPrice > 25)');

That ?(@.unitPrice > 25) is a filter expression. The @ symbol refers to the current element being evaluated by the filter — which for $.items[*] means each item in the array, one at a time. You can combine multiple conditions:

-- Items that are expensive AND high quantity
WHERE JSON_EXISTS(o.order_doc,
  '$.items[*]?(@.unitPrice > 25 && @.quantity >= 5)')

-- Orders with specific tags
WHERE JSON_EXISTS(o.order_doc,
  '$.tags[*]?(@.string() == "wholesale")')

-- Nested existence checks
WHERE JSON_EXISTS(o.order_doc,
  '$?(@.status == "shipped"
      && exists(@.items[*]?(@.category == "electronics")))')

Filters aren't just for arrays. A filter is a predicate that applies to whatever path step it's attached to — the step doesn't have to return an array. For example:

-- Assuming a document where customer is an object with name and address:
-- return the customer name only when the customer lives in Belmont
SELECT JSON_VALUE(o.order_doc, '$.customer?(@.address.city == "Belmont").name')
FROM   orders o;

There's no array in that path. $.customer is a single object, and the filter either passes (returning the name) or fails (returning NULL). When used with JSON_VALUE, non-matching rows come back as NULL — which is usually exactly what you want for "give me the name only if..." queries.

Path expressions transparently process intermediate arrays. This is a subtlety that catches people off guard, especially if you're coming from a language where arrays require explicit iteration. In Oracle's path expression engine, $.a.b doesn't just navigate from field a to field b — if a happens to be an array, the engine automatically iterates through every element and evaluates .b on each one. You don't need $.a[*].b — the [*] is implicit.

This matters most with JSON_EXISTS:

-- Works even if 'addresses' is an array of objects
WHERE JSON_EXISTS(doc, '$.addresses?(@.city == "Boston")')

If addresses is a single object, the filter tests that object. If addresses is an array, the filter tests each element — and returns true if any element matches. Same path expression, both shapes. You don't have to know (or care) whether the field is a scalar, an object, or an array when you write the filter. The engine handles the polymorphism.

This is particularly useful for schemas that evolve over time — a field that started as a single value and later became an array doesn't break your existing queries.
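The polymorphism is easier to trust once you've seen it as code. Below is a hedged Python sketch of the implicit-iteration rule — a plain-Python stand-in for the path engine, with an invented function name, not Oracle internals:

```python
# One predicate, both shapes: a filter step applied to an array is evaluated
# per element; applied to a single object, it tests that object directly.
def filter_matches(value, field, expected):
    """Emulate  $.addresses?(@.city == "Boston")  over either shape."""
    elements = value if isinstance(value, list) else [value]
    return any(isinstance(e, dict) and e.get(field) == expected
               for e in elements)

doc_single = {"addresses": {"city": "Boston"}}
doc_array  = {"addresses": [{"city": "Austin"}, {"city": "Boston"}]}

print(filter_matches(doc_single["addresses"], "city", "Boston"))  # True
print(filter_matches(doc_array["addresses"],  "city", "Boston"))  # True
```

The normalize-to-list step is the whole trick: callers never branch on the field's shape, which is exactly why a field that evolves from object to array doesn't break existing queries.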

Filter-then-project is a common pattern. You can chain a filter with a projection to return a subset of fields from matching elements:

-- Get the SKUs of line items that cost more than $25
SELECT JSON_QUERY(o.order_doc,
  '$.items[*]?(@.unitPrice > 25).product' WITH WRAPPER) AS expensive_skus
FROM orders o;
-- ["Widget Pro","Gadget Plus"]

Notice the WITH WRAPPER clause. When a filter-then-project returns multiple values — like the SKUs above — you need JSON_QUERY to re-wrap them into a JSON array. JSON_VALUE won't work here because it only returns single scalars. This is the most common reason developers reach for JSON_QUERY over JSON_VALUE in real applications.

The PASSING Clause: Bind Variables in Path Expressions

This is a feature most developers don't know exists — and it changes how you write dynamic JSON filters:

WHERE JSON_EXISTS(o.order_doc,
  '$.items[*]?(@.category == $cat && @.unitPrice < $max)'
  PASSING 'electronics' AS "cat", 50 AS "max")

No string concatenation. No SQL injection risk. Bind variables plugged directly into the JSON path expression.

This matters more than most developers realize. In document databases, dynamic filters typically mean building query strings in application code — concatenating user input into $match pipelines or find() predicates. Every concatenated string is a potential injection vector. Every dynamically constructed query is a unique query shape the engine has to parse and plan from scratch.

The PASSING clause eliminates both problems at once. The CBO parses the path expression once, builds an execution plan once, and reuses it across every parameter combination. Ten thousand different category/price filter combinations hit the same optimized plan. That's not just safer — it's measurably faster. Hard-parsed queries burn CPU. Soft-parsed queries with bind variables don't. At scale, that difference shows up on your cloud bill.
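The plan-reuse intuition can be sketched in a few lines. This is a toy Python model — the cache shape and function names are invented for illustration and have nothing to do with Oracle's shared pool internals — but the arithmetic is the point:

```python
# Toy model: a "plan" is keyed by the path text alone. Bind variables keep
# the text stable, so every parameter combination hits the same cached entry.
plan_cache = {}

def compile_path(path_text):
    plan_cache.setdefault(path_text, {"compiled": path_text, "hits": 0})
    plan = plan_cache[path_text]
    plan["hits"] += 1
    return plan

PATH = '$.items[*]?(@.category == $cat && @.unitPrice < $max)'

# Ten different parameter combinations -> one cached plan
for _ in range(10):
    compile_path(PATH)           # params travel separately via PASSING

print(len(plan_cache))           # 1
print(plan_cache[PATH]["hits"])  # 10

# Concatenating values into the path creates a new query shape every time
for cat in ("electronics", "tools"):
    compile_path(PATH.replace("$cat", f'"{cat}"'))

print(len(plan_cache))           # 3
```

One text, one hard parse, unlimited reuse — versus a cache entry (and a hard parse) per distinct concatenated string.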

The Full Filter Predicate Toolkit

Oracle's path expressions support far more than basic comparisons:

| Operator | Example |
| --- | --- |
| ==, !=, <, <=, >, >= | ?(@.price > 100) |
| && (AND), \|\| (OR), ! (NOT) | ?(@.price > 100 && @.quantity >= 5) |
| exists() | $?(exists(@.items[*]?(@.flagged == true))) |
| in() | ?(@.category in ("electronics","tools")) |
| has substring | ?(@.name has substring "Pro") |
| starts with | ?(@.sku starts with "WDG-") |
| like_regex | ?(@.email like_regex ".*@oracle\\.com") |

PostgreSQL users: this is roughly equivalent to jsonb_path_exists, but richer.

MongoDB users: this is your $elemMatch and $regex — except it runs inside the SQL optimizer with access to indexes, join reordering, and cost-based plan selection.


6. JSON_TABLE: The Bridge Between Worlds

If you learn one SQL/JSON function from this article, make it JSON_TABLE. It projects JSON data into relational rows and columns — the bridge between document flexibility and relational power.

SELECT jt.*
FROM   orders o,
       JSON_TABLE(o.order_doc, '$.items[*]'
         COLUMNS (
           row_num    FOR ORDINALITY,
           product    VARCHAR2(100)  PATH '$.product',
           quantity   NUMBER         PATH '$.quantity',
           unit_price NUMBER         PATH '$.unitPrice',
           category   VARCHAR2(50)   PATH '$.category'
         )
       ) jt;

Result:

| ROW_NUM | PRODUCT | QUANTITY | UNIT_PRICE | CATEGORY |
| --- | --- | --- | --- | --- |
| 1 | Widget Pro | 10 | 29.99 | hardware |
| 2 | Gadget Plus | 5 | 49.99 | electronics |
| 3 | Cable Kit | 50 | 4.99 | accessories |

Each array element becomes a row. Each JSON field becomes a column. The FOR ORDINALITY column auto-generates row numbers. And now you can do everything SQL does: GROUP BY, ORDER BY, SUM(), AVG(), JOIN to other tables, window functions — the full relational toolkit applied to JSON data.
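If it helps to see the projection outside of SQL, here's a hedged Python sketch of what JSON_TABLE('$.items[*]' ...) computes for the sample order — a simulation only, since Oracle does this inside the engine:

```python
# One output row per array element; FOR ORDINALITY numbers from 1.
order_doc = {
    "orderId": 1001,
    "items": [
        {"product": "Widget Pro",  "quantity": 10, "unitPrice": 29.99},
        {"product": "Gadget Plus", "quantity": 5,  "unitPrice": 49.99},
        {"product": "Cable Kit",   "quantity": 50, "unitPrice": 4.99},
    ],
}

rows = [
    (n, item["product"], item["quantity"], item["unitPrice"])
    for n, item in enumerate(order_doc["items"], start=1)
]

for row in rows:
    print(row)   # (1, 'Widget Pro', 10, 29.99), then rows 2 and 3
```

Once the array is rows, everything downstream — GROUP BY, joins, window functions — is ordinary relational work.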

Column Types

JSON_TABLE supports four column types:

JSON_TABLE(doc, '$' COLUMNS (
  -- Value column (like JSON_VALUE): extracts a scalar
  name      VARCHAR2(100) PATH '$.name',

  -- Query column (like JSON_QUERY): extracts JSON fragment
  metadata  VARCHAR2(4000) FORMAT JSON PATH '$.metadata',

  -- Exists column (like JSON_EXISTS): returns 'true'/'false'
  has_notes VARCHAR2(5) EXISTS PATH '$.notes',

  -- Ordinality column: auto-generated row number
  row_num   FOR ORDINALITY
))

NESTED PATH: Hierarchical Flattening

This is where JSON_TABLE gets powerful. Real-world JSON is nested. Orders contain items. Items contain variants. You need to flatten multiple levels without writing self-joins.

SELECT jt.*
FROM   orders o,
       JSON_TABLE(o.order_doc, '$'
         COLUMNS (
           order_id    NUMBER        PATH '$.orderId',
           customer    VARCHAR2(100) PATH '$.customer',
           NESTED PATH '$.items[*]' COLUMNS (
             item_num   FOR ORDINALITY,
             product    VARCHAR2(100) PATH '$.product',
             quantity   NUMBER        PATH '$.quantity',
             unit_price NUMBER        PATH '$.unitPrice'
           )
         )
       ) jt;

Here's what that produces from our sample order document:

| ORDER_ID | CUSTOMER | ITEM_NUM | PRODUCT | QUANTITY | UNIT_PRICE |
| --- | --- | --- | --- | --- | --- |
| 1001 | Acme Corp | 1 | Widget Pro | 10 | 29.99 |
| 1001 | Acme Corp | 2 | Gadget Plus | 5 | 49.99 |
| 1001 | Acme Corp | 3 | Cable Kit | 50 | 4.99 |

One JSON document became three relational rows. The parent fields (order_id, customer) repeat for each array element. The ITEM_NUM ordinality column tracks position within the array. This is the document-to-relational bridge in action — and from here, you can GROUP BY category, JOIN to a products table, run window functions, or anything else SQL gives you.

The NESTED PATH clause creates an implicit lateral join — Oracle automatically cross-applies the nested array to the parent fields. Parent rows are preserved even when the array is empty (left outer join semantics). Sibling NESTED PATHs at the same level produce a union join, not a Cartesian product — Oracle knows you don't want row explosion.
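The left-outer behavior is the detail worth internalizing: an order with no items still shows up. A hedged Python sketch of that rule (a simulation of the semantics described above; the helper name and single-level shape are invented for illustration):

```python
# NESTED PATH with left-outer semantics: an empty array still yields one
# parent row, with the nested columns as None (NULL).
def nested_path(doc, parent_field, array_field, child_field):
    parent = doc.get(parent_field)
    children = doc.get(array_field) or [None]   # preserve the parent row
    return [(parent, child.get(child_field) if child else None)
            for child in children]

with_items    = {"orderId": 1001, "items": [{"product": "Widget Pro"}]}
without_items = {"orderId": 1002, "items": []}

print(nested_path(with_items, "orderId", "items", "product"))
# [(1001, 'Widget Pro')]
print(nested_path(without_items, "orderId", "items", "product"))
# [(1002, None)]  -- parent row preserved, not dropped
```

An inner-join interpretation would silently drop order 1002 from every report that flattens items — exactly the bug the left-outer default protects you from.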

You can nest multiple levels deep, and you can have sibling NESTED PATHs at the same level. This is where the join semantics get interesting. Consider a document with both items and tags:

SELECT jt.*
FROM   orders o,
       JSON_TABLE(o.order_doc, '$'
         COLUMNS (
           order_id    NUMBER        PATH '$.orderId',
           NESTED PATH '$.items[*]' COLUMNS (
             product    VARCHAR2(100) PATH '$.product',
             quantity   NUMBER        PATH '$.quantity'
           ),
           NESTED PATH '$.tags[*]' COLUMNS (
             tag        VARCHAR2(50)  PATH '$'
           )
         )
       ) jt;

| ORDER_ID | PRODUCT | QUANTITY | TAG |
| --- | --- | --- | --- |
| 1001 | Widget Pro | 10 | |
| 1001 | Gadget Plus | 5 | |
| 1001 | Cable Kit | 50 | |
| 1001 | | | wholesale |
| 1001 | | | priority |
| 1001 | | | Q1-promo |

Sibling NESTED PATHs produce a union join, not a Cartesian product. You get 3 item rows + 3 tag rows = 6 total rows, not 3 × 3 = 9. Oracle fills the non-matching columns with NULLs. This is critical — without union join semantics, sibling arrays would cause row explosion that scales multiplicatively with array size.

Compare with MongoDB's $unwind. MongoDB's equivalent operation is the aggregation pipeline's $unwind stage, which flattens one array at a time. To flatten both items and tags, you'd write:

db.orders.aggregate([
  { $unwind: "$items" },
  { $unwind: "$tags" }
])

This produces a Cartesian product: |items| × |tags| rows per document. For our three-item, three-tag example, that's 9 rows per order, not 6. Now imagine a document with 50 line items and 20 tags: MongoDB gives you 1,000 rows. Oracle gives you 70. The ratio gets worse as the arrays grow.

You can work around this in MongoDB with $facet (which runs the two unwinds as independent sub-pipelines and merges the results), but now you're hand-engineering join semantics that Oracle handles automatically as part of the core JSON_TABLE operator. One more case where the developer is the optimizer.
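The row-count arithmetic is simple enough to check directly. A small Python sketch of the two join shapes — pure simulation of the counts discussed above, with the non-matching columns modeled as None:

```python
# Sibling NESTED PATHs union-join (|items| + |tags| rows, NULL-padded);
# chained $unwind stages multiply (|items| x |tags| rows).
items = ["Widget Pro", "Gadget Plus", "Cable Kit"]
tags  = ["wholesale", "priority", "Q1-promo"]

union_join = [(p, None) for p in items] + [(None, t) for t in tags]
cartesian  = [(p, t) for p in items for t in tags]

print(len(union_join))   # 6  (Oracle sibling NESTED PATH)
print(len(cartesian))    # 9  (MongoDB chained $unwind)

# The gap scales multiplicatively as arrays grow: 50 items, 20 tags
print(50 + 20, 50 * 20)  # 70 vs 1000
```

Additive versus multiplicative growth is the whole story: with big documents, the Cartesian shape isn't just wasteful, it's a different complexity class.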

You can also nest paths within paths for multi-level hierarchies:

NESTED PATH '$.sections[*]' COLUMNS (
  section_name VARCHAR2(100) PATH '$.name',
  NESTED PATH '$.paragraphs[*]' COLUMNS (
    para_text VARCHAR2(4000) PATH '$.text',
    para_ord  FOR ORDINALITY
  )
)

Why JSON_TABLE Matters

Here's the thing most developers miss: JSON_TABLE doesn't just make JSON queryable. It makes JSON optimizable.

Once JSON is projected into relational columns, the CBO treats it like any other relational data. It can push predicates through the projection. It can use functional indexes. It can choose between nested loops, hash joins, and merge joins. It can parallelize.

The document database world doesn't have this. MongoDB's aggregation pipeline is a sequential chain of stages — each stage processes the full result set of the previous stage. There's no cost-based optimizer reordering stages, pushing filters earlier, or choosing between join algorithms.

What does that actually mean for you as a developer? It means you stop writing optimization logic in your application code.

In a document database, you learn — usually the hard way — that the order you filter, sort, and aggregate matters. You hand-tune your aggregation pipeline stages. You restructure queries because putting $match before $unwind is faster than after. You read blog posts about which pipeline stages can use indexes and which can't. You become the optimizer. That's not your job.

With JSON_TABLE, you describe what you want and the database figures out how to get it. You write WHERE category = 'electronics' on the outer query, and the CBO decides — without you asking — to push that filter down into the JSON scan, hit the functional index, and skip 99% of the documents. You join the flattened JSON to a products table, and the CBO picks the join algorithm (nested loops for small results, hash join for large ones) based on actual table statistics, not your guess about data volume.

Add a column to the SELECT? The plan adapts. Data distribution changes as your table grows from 10K to 10M rows? The plan adapts. You didn't change a line of code. You write declarative SQL. The database does the engineering.

Oracle's CBO has had 40+ years of development. When you use JSON_TABLE, you're handing your JSON to that optimizer. That's not a small thing — it's the difference between writing query logic and writing query infrastructure.


7. What's a CTE and Why Should You Care?

A Common Table Expression (CTE) is a named, temporary result set defined in a WITH clause that exists for the duration of a single SQL statement. If you've never used one, think of it as a named subquery that you can reference multiple times — and that makes complex SQL readable.

WITH high_value_items AS (
  SELECT jt.*
  FROM   orders o,
         JSON_TABLE(o.order_doc, '$.items[*]'
           COLUMNS (
             product    VARCHAR2(100)  PATH '$.product',
             quantity   NUMBER         PATH '$.quantity',
             unit_price NUMBER         PATH '$.unitPrice'
           )
         ) jt
  WHERE  jt.unit_price > 10
)
SELECT product, quantity * unit_price AS line_total
FROM   high_value_items
ORDER  BY line_total DESC;

That's a CTE. The WITH clause defines high_value_items. The main query uses it like a table. The query reads top-to-bottom, each step building on the last.

Why CTEs Matter for JSON Work

JSON transformations are inherently multi-step: flatten the document, filter and enrich, then re-assemble into a new shape. Without CTEs, you're nesting subqueries five levels deep. With CTEs, each step is a named, testable, readable block.

WITH
  -- Step 1: Flatten JSON into rows
  raw_items AS (...),
  -- Step 2: Join with relational product catalog
  enriched_items AS (...),
  -- Step 3: Aggregate by category
  category_totals AS (...)
-- Step 4: Build the JSON response
SELECT JSON_OBJECT(...) FROM category_totals;

Four steps. Each one readable in isolation. Each one debuggable by replacing the final SELECT with SELECT * FROM step_N.

Multiple CTEs and Chaining

CTEs can reference earlier CTEs in the same WITH clause:

WITH
  step1 AS (SELECT ... FROM orders),
  step2 AS (SELECT ... FROM step1 WHERE ...),
  step3 AS (SELECT ... FROM step2 JOIN products ON ...)
SELECT * FROM step3;

CTEs vs Aggregation Pipelines: A Mental Model

If you're coming from MongoDB, you already understand multi-step data transformations — that's what the aggregation pipeline is. CTEs are the same idea, with two critical differences: named addressability and optimizer-driven execution order.

A MongoDB pipeline is an array of stages that execute in declaration order. Modern pipelines can branch via $facet, $lookup, and $unionWith, so technically the pipeline is a DAG too — but stages still run in the order you wrote them, with only limited pipelining across non-blocking stages. There's no cost-based reordering. There's no "the optimizer decided it was cheaper to filter before the unwind." If you put $match after $unwind when it should have been before, you pay for it — and so does every row that had to be flattened just to get thrown away a stage later.

And here's a detail that matters more than it looks: $facet branches execute sequentially, not in parallel. Each sub-pipeline runs to completion before the next one starts. You get the shape of a branch — separate result sets from the same input — but not the physics of parallelism. If Branch A takes 200ms and Branch B takes 300ms, you wait 500ms, not 300ms. The branches share an input document set, but the engine processes them one at a time.

CTEs are also a DAG, but with three key differences:

  1. Each step is named and addressable. The final SELECT can reach into any CTE simultaneously — join raw_items to category_totals, filter on enriched, aggregate across all three. You're not locked into the most-recent-stage-feeds-the-next-stage structure.
  2. The CBO decides the execution order. Push a predicate from the outer query into the innermost CTE? Sure. Reorder joins based on row estimates? Yes. Pick a hash join here and a nested loop there? Automatic. You describe the logic; the optimizer decides the physics.
  3. Independent branches can execute in parallel. When the CBO sees that two CTEs have no data dependency on each other, it can run them concurrently. The same two branches that take 500ms sequentially in $facet take 300ms under Oracle's parallel execution — because the optimizer knows they're independent and schedules them accordingly.

The differences compound in practice:

|  | Aggregation Pipeline | CTEs |
| --- | --- | --- |
| Step identity | Anonymous (position in array) | Named (referenced by name) |
| Data flow | Mostly linear; branches via $facet/$lookup/$unionWith | Full DAG: any step can reference any earlier step |
| Execution order | Runs in declaration order, no cost-based reordering | CBO reorders based on statistics and predicates |
| Parallelism | $facet branches execute sequentially, not in parallel | CBO parallelizes independent branches automatically |
| Debugging | Insert $out or explain() at specific stages | Replace final SELECT with SELECT * FROM step_N |
| Reuse | Can't reference earlier stages from later ones (except via $facet) | Any CTE reusable by multiple downstream steps |
| Optimization | Developer hand-tunes stage order | CBO pushes predicates, picks join methods, parallelizes |

The aggregation pipeline forces you to think about how the database should process data — stage order matters, $match belongs before $unwind, put your projections last to minimize intermediate document size. CTEs let you think about what the data should look like at each logical stage, and the CBO figures out the how. That's the core difference: you become the optimizer in one model, and you stop being the optimizer in the other.
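The stage-order cost is easy to see in miniature. Here is a toy model in Python (not Oracle or MongoDB code, just counting work) of the same query written both ways in a declaration-order engine:

```python
# Toy model of a declaration-order pipeline: the engine does exactly what
# you wrote, in the order you wrote it. Filtering after the unwind
# materializes rows that get thrown away one stage later.
orders = [
    {"status": "shipped", "items": ["a", "b", "c"]},
    {"status": "pending", "items": ["d", "e"]},
]

# Plan A: $unwind then $match -- every item gets flattened first
flattened = [(o["status"], i) for o in orders for i in o["items"]]
plan_a_work = len(flattened)                       # 5 rows materialized
plan_a = [row for row in flattened if row[0] == "shipped"]

# Plan B: $match then $unwind -- only matching orders get flattened
matching = [o for o in orders if o["status"] == "shipped"]
plan_b = [(o["status"], i) for o in matching for i in o["items"]]
plan_b_work = len(plan_b)                          # 3 rows materialized

assert plan_a == plan_b          # identical results...
assert plan_a_work > plan_b_work  # ...different amounts of work
```

In the pipeline model, choosing Plan B is your job. In the CTE model, the CBO makes that choice from statistics, even if you wrote the query in Plan A order.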

Recursive CTEs

For hierarchical data — org charts, category trees, bill-of-materials — Oracle supports recursive CTEs. The structure has two parts: an anchor query that selects the starting rows, and a recursive query that joins back to the CTE itself to walk the hierarchy:

WITH category_tree (id, parent_id, name, lvl) AS (
  -- Anchor: start at root categories (no parent)
  SELECT id, parent_id, name, 1
  FROM   categories
  WHERE  parent_id IS NULL
  UNION ALL
  -- Recursive: for each row found so far, find its children
  SELECT c.id, c.parent_id, c.name, t.lvl + 1
  FROM   category_tree t
  JOIN   categories c ON c.parent_id = t.id
)
SEARCH DEPTH FIRST BY name SET seq
SELECT LPAD(' ', (lvl-1)*2) || name AS category_hierarchy
FROM   category_tree
ORDER  BY seq;

The SEARCH clause controls the order in which the recursion visits nodes, and it generates a seq column you can use for a deterministic ORDER BY. Think of it like walking a tree — you have two choices for how to walk it:

SEARCH DEPTH FIRST BY name — go deep before going wide. Visit a node, then immediately visit its children, then their children, all the way to the leaf, before backtracking to visit siblings. This produces the indented tree output you'd expect:

Electronics
  Laptops
    Gaming Laptops
    Ultrabooks
  Phones
    Android
    iOS
Clothing
  Men
  Women

SEARCH BREADTH FIRST BY name — go wide before going deep. Visit all nodes at level 1, then all nodes at level 2, then level 3. This produces a level-by-level listing:

Electronics
Clothing
  Laptops
  Phones
  Men
  Women
    Gaming Laptops
    Ultrabooks
    Android
    iOS

Depth-first is what you want for tree displays, navigation menus, and indented reports. Breadth-first is useful when you care about levels — "show me all second-level categories" or "find everything within two hops of this node."

The BY name part determines the sort order among siblings at each level. Replace name with any column — BY created_date visits the oldest siblings first, BY priority DESC visits the highest priority first.
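If the two orderings are hard to picture, here is the same walk sketched in Python (a conceptual model, not Oracle code — the sample tree is hypothetical, and the sibling visit order that Oracle's BY clause controls is just the declared list order here):

```python
# Two walks over the same category tree, producing the two listings above.
tree = {
    None: ["Electronics", "Clothing"],
    "Electronics": ["Laptops", "Phones"],
    "Laptops": ["Gaming Laptops", "Ultrabooks"],
    "Phones": ["Android", "iOS"],
    "Clothing": ["Men", "Women"],
}

def depth_first(node=None, lvl=0):
    rows = []
    for child in tree.get(node, []):          # sibling order (BY clause)
        rows.append("  " * lvl + child)        # visit the node...
        rows += depth_first(child, lvl + 1)    # ...then its whole subtree
    return rows

def breadth_first():
    rows, frontier = [], [(None, 0)]
    while frontier:
        nxt = []
        for node, lvl in frontier:             # finish a whole level...
            for child in tree.get(node, []):
                rows.append("  " * lvl + child)
                nxt.append((child, lvl + 1))
        frontier = nxt                          # ...before going deeper
    return rows

assert depth_first()[:3] == ["Electronics", "  Laptops", "    Gaming Laptops"]
assert breadth_first()[:3] == ["Electronics", "Clothing", "  Laptops"]
```

Depth-first recursion carries the parent's level down into each subtree; breadth-first drains an entire level into the output before touching the next one. That structural difference is all the SEARCH clause is choosing between.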

Cycle detection: Hierarchical data can have bugs — a category that's its own grandparent creates an infinite loop. Oracle catches this:

CYCLE id SET is_cycle TO 'Y' DEFAULT 'N'

This adds an is_cycle column and stops recursion on any row that would revisit an already-seen id. Without it, a cyclic reference means a runaway query.

Materialization: When Performance Matters

The CBO decides whether to materialize a CTE (store results in a temp table) or inline it (copy the SQL text into the main query). The default behavior:

  • Referenced once: the CBO inlines it (allows predicate pushing, index usage)
  • Referenced multiple times: the CBO materializes it (compute once, read many)

You can override this with hints:

WITH expensive_json_parse AS (
  SELECT /*+ MATERIALIZE */ ...
)

When to materialize: The CTE does expensive work that's reused multiple times, especially when the CBO might otherwise inline and recompute. Compute it once, read it many.

When to inline: The main query has selective predicates that need to push down into the JSON scan to hit functional indexes. Inlining keeps the optimization boundary open.

Here's a case where MATERIALIZE is genuinely the right call — a self-join on an expensively-parsed CTE:

-- Cross-sell analysis: find customer pairs who both bought products
-- in the same category. The CTE is self-joined, so it's referenced twice.
WITH customer_categories AS (
  SELECT /*+ MATERIALIZE */
         JSON_VALUE(o.order_doc, '$.customer') AS customer,
         jt.category
  FROM   orders o,
         JSON_TABLE(o.order_doc, '$.items[*]'
           COLUMNS (category VARCHAR2(50) PATH '$.category')) jt
)
SELECT DISTINCT a.customer       AS customer_a,
                b.customer       AS customer_b,
                a.category       AS shared_category
FROM   customer_categories a
JOIN   customer_categories b ON a.category = b.category
                            AND a.customer < b.customer;

Without materialization, the CBO might inline customer_categories into both sides of the self-join — meaning the JSON_TABLE flattening runs twice, parsing every order document twice. With MATERIALIZE, the parse happens once, the result lands in a session-scoped temp table, and both sides of the self-join read from that temp table at memory speed.

For a 10-million-row orders table with 5 items per document on average, that's the difference between 10 million JSON parses and 20 million. Same result. Half the work.
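You can model the tradeoff outside the database. A toy sketch in Python (hypothetical, just counting invocations of the expensive step):

```python
# Toy model of materialize-vs-inline for a CTE referenced twice.
# parse_items stands in for the expensive JSON_TABLE flatten; the
# counter stands in for the parse/CPU/I-O cost.
parse_count = 0

def parse_items(doc):
    global parse_count
    parse_count += 1
    return doc["items"]

docs = [{"items": ["x", "y"]}, {"items": ["z"]}]

# Inlined CTE: each reference to the CTE re-runs the flatten
side_a = [i for d in docs for i in parse_items(d)]
side_b = [i for d in docs for i in parse_items(d)]
inlined_cost = parse_count              # every doc parsed twice

# Materialized CTE: flatten once into a temp result, read it twice
parse_count = 0
temp = [i for d in docs for i in parse_items(d)]
side_a, side_b = temp, temp
materialized_cost = parse_count         # every doc parsed once

assert inlined_cost == 2 * materialized_cost
```

Scale docs up to 10 million rows and the counter is the 10-million-versus-20-million-parses difference described above.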

Note on auto-materialization: Oracle automatically materializes CTEs that are referenced multiple times — so in the example above, the CBO often makes this decision on its own. The explicit /*+ MATERIALIZE */ hint does two things: it documents intent (so future readers understand the query was designed with materialization in mind), and it protects against CBO regressions when statistics shift or plans change between database versions. For a CTE that's expensive to compute and referenced multiple times, making the hint explicit is defensive engineering.

The INLINE hint goes the other direction — it tells Oracle to treat the CTE as if you had copy-pasted its SQL into the main query, instead of walling it off in a temp table. Why would you want that? Because materialization creates a boundary the optimizer can't see through. Once results are in a temp table, Oracle can't look at what the outer query is going to do with them — it has to compute the full CTE result, every row, before your WHERE clause even runs.

Here's what that looks like in practice:

WITH recent_orders AS (
  SELECT /*+ INLINE */
         JSON_VALUE(order_doc, '$.customer') AS customer,
         JSON_VALUE(order_doc, '$.orderDate' RETURNING DATE) AS order_date
  FROM   orders
)
SELECT * FROM recent_orders WHERE order_date > SYSDATE - 7;

Think of it like this: if the CTE is inlined, Oracle can see that you only want orders from the last 7 days. It looks at your functional index on $.orderDate, jumps straight to those rows, and reads maybe 0.1% of the table.

If the CTE is materialized, Oracle has to compute recent_orders first — which means reading every order, parsing every JSON document, building the full intermediate result — and then filtering to the last 7 days. On a 10-million-row table, that's parsing 10 million documents to return 10,000 rows. The index sits unused because the optimizer never gets a chance to use it.

The INLINE hint removes the wall. It tells Oracle "don't stage this — fold it into the main query so you can see the whole picture and optimize across it." When your outer query has a selective filter that should hit an index, inlining is how you make sure that happens.

Both hints exist for the same reason: sometimes you know the shape of your data better than the optimizer's statistics do. Check your execution plans with EXPLAIN PLAN or DBMS_XPLAN. If you see a full table scan where you expected an index range scan, materialization is often the culprit — and INLINE is the fix. If you see the same expensive CTE appearing multiple times in the plan, that's the opposite problem — and MATERIALIZE is the fix.

Hint only when the execution plan tells you to. Trust the CBO by default.


8. Building JSON: The Construction Functions

Reading JSON is half the story. The other half is constructing JSON from relational data — building API responses, materializing document views, assembling complex payloads.

Important: the default return type is VARCHAR2, not JSON. JSON_OBJECT, JSON_ARRAY, JSON_ARRAYAGG, and JSON_OBJECTAGG all default to RETURNING VARCHAR2(4000) for backward compatibility. That means without an explicit RETURNING JSON clause, you're getting a serialized text string, not the binary OSON representation. This causes two real problems:

  1. When you nest one of these inside another, you have to use FORMAT JSON to prevent double-escaping (more on this in a moment).
  2. You lose the type fidelity OSON gives you (numbers stay numbers, dates stay dates, etc.).

Always add RETURNING JSON when you mean to produce JSON-typed data, especially in CTEs that feed downstream JSON construction. The examples in this section show both forms — explicit and implicit — so you can see how each behaves. In production code, default to RETURNING JSON and only drop it when you specifically want a serialized text result.

The JSON { } and JSON [ ] shorthand (26ai). Oracle added JSON-like constructor syntax that solves the RETURNING problem by defaulting to RETURNING JSON instead of VARCHAR2. JSON { 'foo' : 'bar' } is equivalent to JSON_OBJECT('foo' : 'bar' RETURNING JSON), and JSON [ 1, 2, 3 ] is equivalent to JSON_ARRAY(1, 2, 3 RETURNING JSON).

If you're writing new code on 26ai, prefer the JSON { } form — it's shorter, the return type is correct by default, and you never have to worry about missing RETURNING JSON or FORMAT JSON. The JSON_OBJECT / JSON_ARRAY functions remain available and are the right choice when you need explicit RETURNING VARCHAR2 or FORMAT JSON control.

JSON_OBJECT: Rows to Objects

-- Key-value pairs
SELECT JSON_OBJECT(
  'customer' VALUE o.order_doc.customer.string(),
  'total'    VALUE (
    SELECT SUM(jt.quantity * jt.unit_price)
    FROM   JSON_TABLE(o.order_doc, '$.items[*]'
             COLUMNS (quantity NUMBER PATH '$.quantity',
                      unit_price NUMBER PATH '$.unitPrice')) jt
  )
  RETURNING JSON
) AS order_summary
FROM orders o;
-- {"customer":"Acme Corp","total":799.4}

Column shorthand — the column name becomes the key:

SELECT JSON_OBJECT(employee_id, first_name, last_name, email)
FROM   employees WHERE employee_id = 100;
-- {"employee_id":100,"first_name":"Steven","last_name":"King","email":"SKING"}

Wildcard — all columns at once:

SELECT JSON_OBJECT(*) FROM employees WHERE employee_id = 100;

NULL handling — choose your API contract:

-- Include nulls explicitly (default)
SELECT JSON_OBJECT('name' VALUE 'Alice', 'middle' VALUE NULL NULL ON NULL) FROM DUAL;
-- {"name":"Alice","middle":null}

-- Omit nulls entirely
SELECT JSON_OBJECT('name' VALUE 'Alice', 'middle' VALUE NULL ABSENT ON NULL) FROM DUAL;
-- {"name":"Alice"}

JSON_ARRAY: Building Arrays

What's DUAL? Think of it as Oracle's REPL. When you open a Node or Python shell and type 1 + 1, you get 2 back — no variables, no setup, just the expression evaluated. DUAL is how you do that in SQL. It's a built-in one-row table that exists purely as a target for expressions you want to run standalone. SELECT SYSDATE FROM DUAL is the SQL equivalent of typing Date.now() into a JavaScript console. Any time you see FROM DUAL in this article, we're just running an expression to show you the result — treat the code block like a REPL session.

SELECT JSON_ARRAY(1, 'two', 3.0, NULL RETURNING JSON) FROM DUAL;
-- [1,"two",3.0]   (NULL omitted by default — ABSENT ON NULL)

SELECT JSON_ARRAY(1, 'two', 3.0, NULL NULL ON NULL RETURNING JSON) FROM DUAL;
-- [1,"two",3.0,null]

26ai: Build JSON arrays directly from a query. This is the syntax you want when you're building an API response. You have rows in a table. You need them as a JSON array in the response body. In 26ai, you just write that:

SELECT JSON_ARRAY(
  SELECT JSON_OBJECT('product' VALUE product_name, 'price' VALUE list_price)
  FROM   products
  WHERE  category_id = 10
  ORDER  BY list_price DESC
  RETURNING JSON
) AS product_catalog
FROM DUAL;

Read it like you'd read code: "give me a JSON array containing an object per product in category 10, sorted by price." That's exactly what the SQL says. No extra steps.

Before 26ai, that simple intent required extra SQL plumbing — you had to stage the rows into a helper query and pull them out with a different function. For anything beyond trivial API responses, this added real noise to your code. The 26ai version is what you'd write on a whiteboard if somebody asked you how it had to work under the hood.

JSON_ARRAYAGG and JSON_OBJECTAGG: Aggregation into JSON

Here's the mental model: JSON_OBJECT and JSON_ARRAY build JSON from one row at a time. JSON_ARRAYAGG and JSON_OBJECTAGG build JSON from many rows at once. Same as the difference between writing row.toJSON() in your ORM versus calling .map(r => r.toJSON()) over a result set — the singular version shapes one record, the plural version rolls up a collection.

These are the workhorses of API response building. Any time you have a parent-child relationship — customer and their orders, order and its line items, post and its comments — you're going to use JSON_ARRAYAGG. It's the function that collapses "many rows" into "one JSON array" so the structure matches what your frontend or API consumer actually wants.

-- Roll up all items in an order into a ranked JSON array
SELECT JSON_ARRAYAGG(
  JSON_OBJECT('product' VALUE jt.product,
              'total'   VALUE jt.quantity * jt.unit_price
              RETURNING JSON)
  ORDER BY jt.quantity * jt.unit_price DESC
  RETURNING JSON
) AS items_ranked
FROM   orders o,
       JSON_TABLE(o.order_doc, '$.items[*]'
         COLUMNS (product VARCHAR2(100) PATH '$.product',
                  quantity NUMBER PATH '$.quantity',
                  unit_price NUMBER PATH '$.unitPrice')) jt;

Notice the RETURNING JSON on both the inner JSON_OBJECT and the outer JSON_ARRAYAGG. Without those, you'd be aggregating VARCHAR2 strings into a VARCHAR2 string, which means double-escaping issues and lost type fidelity. With them, you're aggregating JSON values into a JSON value — clean, binary, native.

Read that as: "take every item row, shape each one as a {product, total} object, then roll them up into a single array sorted by total descending." One statement does the row-level shaping and the collection-level rollup. No application code. No loops. No post-processing.

The ORDER BY inside the aggregate is important — it controls the order of elements in the resulting JSON array. Without it, array element order is undefined. With it, you can guarantee that your API consumers get data in the order they expect (top-N lists, chronological feeds, priority queues).

JSON_OBJECTAGG is the less common sibling — it builds a single JSON object where keys and values both come from table rows. Think of it as turning two columns into a dictionary:

-- Turn a settings table into a JSON config object
SELECT JSON_OBJECTAGG(setting_name VALUE setting_value RETURNING JSON) AS config
FROM   user_settings
WHERE  user_id = 42;
-- {"theme":"dark","language":"en","timezone":"America/Chicago"}

Use it when you have key-value pair data in rows and need a lookup object in the response. Settings, feature flags, translation dictionaries, metadata maps — anything where you'd otherwise reduce rows into an object in application code.
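That "reduce rows into an object" step is the familiar rows-to-dict loop you would otherwise write in the application layer. A sketch in Python (the rows are hypothetical sample data):

```python
# The application-code pattern JSON_OBJECTAGG replaces: fold (key, value)
# rows from a query into one lookup object for the response.
rows = [
    ("theme", "dark"),
    ("language", "en"),
    ("timezone", "America/Chicago"),
]

config = {name: value for name, value in rows}

assert config == {
    "theme": "dark",
    "language": "en",
    "timezone": "America/Chicago",
}
```

With JSON_OBJECTAGG, that fold happens inside the database and the handler receives the finished object.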

Why these are the "workhorses": Every non-trivial API response needs nested collections. Every nested collection starts as rows in a table. JSON_ARRAYAGG and JSON_OBJECTAGG are the bridge between those two worlds. Combined with JSON_OBJECT for per-row shaping and the FORMAT JSON clause for composition, they let you build arbitrarily nested API responses in a single SQL statement — no middleware, no serialization layer, no ORM-to-JSON mapping code.

The database returns the JSON. Your handler just sends it.

The FORMAT JSON Clause

Here's the gotcha that catches every developer on day one. You build a JSON array in a CTE. You reference it in a JSON_OBJECT to nest it inside a parent object. You run the query. You get back this:

{"orderItems":"[{\"product\":\"Widget Pro\"},{\"product\":\"Gadget Plus\"}]"}

That's not a nested array. That's a string containing an escaped JSON array. Your frontend just received "orderItems" as text — they'd have to call JSON.parse() on the value to use it. Broken.
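The mechanism isn't Oracle-specific — every JSON serializer behaves this way. If a value reaches the outer builder as a string, it gets quoted and escaped like any other string; if it's still a structured value, it nests cleanly. In Python terms:

```python
import json

items = [{"product": "Widget Pro"}, {"product": "Gadget Plus"}]

# Serialize too early (the VARCHAR2 default): the outer builder sees a
# plain string, so it quotes and escapes it -- the double-escaping bug.
wrong = json.dumps({"orderItems": json.dumps(items)})
assert isinstance(json.loads(wrong)["orderItems"], str)   # a string, not an array

# Keep the value structured until the very end (what RETURNING JSON does):
right = json.dumps({"orderItems": items})
assert isinstance(json.loads(right)["orderItems"], list)  # a real nested array
```

RETURNING JSON keeps the intermediate result structured; FORMAT JSON tells the builder to trust a string that happens to contain JSON. The fixes below are those two options.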

There are two fixes — and the better one is what we just talked about.

Fix #1 (preferred): use RETURNING JSON so the intermediate result is JSON, not VARCHAR2.

WITH item_array AS (
  SELECT JSON_ARRAYAGG(
    JSON_OBJECT('product' VALUE jt.product RETURNING JSON)
    RETURNING JSON
  ) AS items_json
  FROM orders o,
       JSON_TABLE(o.order_doc, '$.items[*]'
         COLUMNS (product VARCHAR2(100) PATH '$.product')) jt
)
SELECT JSON_OBJECT(
  'orderItems' VALUE items_json     -- Already JSON, no FORMAT JSON needed
  RETURNING JSON
) FROM item_array;

items_json is now a JSON-typed column. When you nest it inside the outer JSON_OBJECT, Oracle knows it's already JSON and doesn't escape it. This is the cleanest pattern — and the one Zhen Hua Liu (the OSON architect) recommends — because it preserves type fidelity all the way through the pipeline.

Fix #2 (legacy / interop): use FORMAT JSON to vouch for a VARCHAR2 value.

If you're working with a VARCHAR2 or CLOB column that already contains serialized JSON — perhaps from an external source, a legacy table, or a function that returns text — you don't have the luxury of changing the source type. That's where FORMAT JSON earns its keep:

SELECT JSON_OBJECT(
  'payload' VALUE legacy_text_column FORMAT JSON   -- Trust me, this is JSON
  RETURNING JSON
) FROM legacy_table;

FORMAT JSON tells Oracle "this string is already valid JSON, insert it as-is, don't escape it." Without that clause, Oracle assumes text — and text gets escaped for safety (otherwise a malicious value containing "} could break out of the parent structure). With it, Oracle inserts the value verbatim.

When you need FORMAT JSON: working with VARCHAR2/CLOB columns that store serialized JSON. Legacy tables. External payloads. Anywhere the source type is text but the content is JSON.

When you don't: When you control the source type. Use RETURNING JSON on every constructor in the pipeline and the problem disappears entirely.

This trips up every developer exactly once — usually on a Monday morning, usually ten minutes before a demo. Now you know how to avoid it (use RETURNING JSON everywhere) and how to fix it when you can't (use FORMAT JSON to vouch for the bytes).


9. Transforming JSON In-Place

Sometimes you don't want to decompose and rebuild — you want to modify the JSON document directly. Oracle gives you two tools.

JSON_TRANSFORM: The Swiss Army Knife

Here's the pattern every developer has written at least once: pull a document out of the database, JSON.parse it, mutate a few fields in application code, JSON.stringify it back, and write it to the database. Maybe wrap the whole thing in a read-modify-write loop with optimistic concurrency to avoid losing updates. Maybe get it wrong the first time. Probably write a retry.

JSON_TRANSFORM is the SQL-native alternative. You tell Oracle what you want changed. Oracle does the mutation on the binary OSON structure in place — no parse, no serialize, no round-trip, no race. Introduced in 21c and dramatically expanded in 26ai, it's the most expressive in-place JSON modification function in any database.

-- Multiple operations in a single call
SELECT JSON_TRANSFORM(o.order_doc,
  SET    '$.status'      = 'delivered',
  SET    '$.deliveredAt'  = SYSTIMESTAMP,
  REMOVE '$.priority',
  RENAME '$.customer'     = 'customerName',
  APPEND '$.tags'         = 'completed'
  RETURNING VARCHAR2(4000) PRETTY
)
FROM orders o;

Read that like a script: update the status, stamp the delivery time, drop the priority field, rename customer to customerName, append "completed" to the tags array. Five mutations. One function call. One atomic operation.

Core operations:

| Operation | What it does | Think of it like |
| --- | --- | --- |
| SET | Create the field if missing, replace if present | obj[key] = value (JavaScript) |
| INSERT | Create only — error if the field already exists | dict.setdefault(key, value) (Python) |
| REPLACE | Replace only — no-op if the field is missing | if key in obj: obj[key] = value |
| APPEND | Push a value onto a JSON array | arr.push(value) |
| REMOVE | Delete a field or array element | delete obj[key] |
| RENAME | Change a field name, keeping its value | obj[newKey] = obj[oldKey]; delete obj[oldKey] |
| KEEP | Whitelist: keep these fields, drop everything else | pick(obj, [...]) from Lodash |
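Strung together as plain dict and list mutations, the whole table reads like this Python sketch (a mental model only — Oracle applies all operations atomically on the OSON binary, and each has its own configurable error handling):

```python
# The JSON_TRANSFORM operations as ordinary mutations on a sample doc.
doc = {"status": "shipped", "customer": "Acme", "tags": ["rush"], "priority": "high"}

doc["status"] = "delivered"                   # SET: create or replace
if "deliveredAt" in doc:                      # INSERT: create only,
    raise ValueError("already exists")        #   error if present (default)
doc["deliveredAt"] = "2025-03-20"
if "status" in doc:                           # REPLACE: replace only,
    doc["status"] = "delivered"               #   no-op if missing
doc["tags"].append("completed")               # APPEND: push onto an array
doc.pop("priority", None)                     # REMOVE: delete a field
doc["customerName"] = doc.pop("customer")     # RENAME: move value to new key
keep = {"status", "customerName", "tags"}     # KEEP: whitelist fields
doc = {k: v for k, v in doc.items() if k in keep}

assert doc == {"status": "delivered",
               "customerName": "Acme",
               "tags": ["rush", "completed"]}
```

The difference in the database is that this entire sequence is one atomic operation on the stored document, not seven round trips.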

Each operation has its own error handling — you decide what should happen if the target path is missing, already exists, or has a NULL value:

-- Only stamp createdAt if it's not already there (won't overwrite existing audits)
UPDATE orders SET order_doc = JSON_TRANSFORM(order_doc,
  INSERT '$.audit.createdAt' = SYSTIMESTAMP   -- create the path if missing
  IGNORE ON EXISTING                          -- skip silently if already set
  RETURNING JSON
);

Why this matters: Every one of these operations happens atomically inside a single SQL transaction. No lost updates. No torn writes. No "I parsed version 1, you parsed version 1, I wrote version 2, you wrote version 2, my change is gone." You skip the entire class of concurrency bugs that the parse-mutate-write pattern introduces — because there's no parse and no intermediate state for another session to race against.

And because Oracle modifies the OSON binary format in place, updates are piecewise — only the changed portions of the document get written to disk, undo, and redo. A 2KB update to a 2MB document costs 2KB of I/O, not 2MB. At scale, that's the difference between a blog-post database and a system of record.

26ai Enhancements: NESTED PATH and CASE

Apply transformations to every element in an array:

-- Apply 10% discount to every item
SELECT JSON_TRANSFORM(o.order_doc,
  NESTED PATH '$.items[*]'
    SET '@.discountedPrice' = '@.unitPrice * 0.9'
  RETURNING VARCHAR2(4000) PRETTY
)
FROM orders o;

Conditional transformations:

SELECT JSON_TRANSFORM(o.order_doc,
  CASE
    WHEN '$.status' = '"shipped"'
      THEN SET '$.trackable' = true
    ELSE
      SET '$.trackable' = false
  END
  RETURNING VARCHAR2(4000) PRETTY
)
FROM orders o;

Array set operations: UNION, INTERSECT, MINUS, SORT — applied to JSON arrays like relational set operators.

JSON_MERGEPATCH: The Simple Alternative

For straightforward updates, JSON_MERGEPATCH implements RFC 7396:

-- Update status, add a field, remove a field (set to null)
UPDATE orders SET order_doc = JSON_MERGEPATCH(
  order_doc,
  '{"status": "delivered", "trackingUrl": "https://track.example.com/1001", "priority": null}'
);

Rules are simple: new keys are added, existing keys are replaced, null removes the key.

Limitation: You can't modify individual array elements. JSON_MERGEPATCH replaces arrays wholesale. For granular array work, use JSON_TRANSFORM.
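The whole algorithm fits in a few lines. Here is a minimal RFC 7396 sketch in Python (illustrative, not Oracle's implementation) that shows all three rules plus the wholesale-array limitation:

```python
# Minimal RFC 7396 merge patch: new keys are added, existing keys are
# replaced, null (None) removes a key, and any non-object value --
# including an array -- replaces the target wholesale.
def merge_patch(target, patch):
    if not isinstance(patch, dict):
        return patch                      # arrays/scalars replace entirely
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)         # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

doc = {"status": "shipped", "priority": "high", "items": [1, 2, 3]}
patched = merge_patch(doc, {"status": "delivered",
                            "trackingUrl": "https://track.example.com/1001",
                            "priority": None,
                            "items": [9]})

assert patched == {"status": "delivered",
                   "trackingUrl": "https://track.example.com/1001",
                   "items": [9]}          # array replaced, not merged
```

The last assertion is the limitation in action: there is no way to express "append 9 to items" in a merge patch. That is JSON_TRANSFORM territory.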


10. Putting It All Together: The CTE Pipeline

Here's where everything converges. Let's build a realistic query: flatten orders from JSON, join with a relational product catalog, compute analytics, and construct a new JSON API response. One statement. One transaction.

WITH
-- Step 1: Flatten JSON orders into relational rows
order_lines AS (
  SELECT o.id                                         AS order_pk,
         JSON_VALUE(o.order_doc, '$.orderId'
                    RETURNING NUMBER)                  AS order_id,
         JSON_VALUE(o.order_doc, '$.customer')         AS customer,
         JSON_VALUE(o.order_doc, '$.orderDate'
                    RETURNING DATE)                    AS order_date,
         jt.product,
         jt.quantity,
         jt.unit_price,
         jt.category
  FROM   orders o,
         JSON_TABLE(o.order_doc, '$.items[*]'
           COLUMNS (
             product    VARCHAR2(100) PATH '$.product',
             quantity   NUMBER        PATH '$.quantity',
             unit_price NUMBER        PATH '$.unitPrice',
             category   VARCHAR2(50)  PATH '$.category'
           )
         ) jt
),

-- Step 2: Enrich with relational product catalog
enriched AS (
  SELECT ol.*,
         p.supplier,
         p.weight_kg,
         ol.quantity * ol.unit_price AS line_total
  FROM   order_lines ol
  LEFT JOIN products p ON p.product_name = ol.product
),

-- Step 3: Aggregate by category
category_summary AS (
  SELECT customer,
         category,
         SUM(line_total)   AS category_total,
         SUM(quantity)      AS units_sold
  FROM   enriched
  GROUP  BY customer, category
)

-- Step 4: Construct the JSON API response
SELECT JSON_OBJECT(
  'customer'   VALUE cs.customer,
  'categories' VALUE JSON_ARRAYAGG(
    JSON_OBJECT(
      'category' VALUE cs.category,
      'total'    VALUE cs.category_total,
      'units'    VALUE cs.units_sold
      RETURNING JSON
    )
    ORDER BY cs.category_total DESC
    RETURNING JSON
  )
  RETURNING JSON
) AS api_response
FROM   category_summary cs
GROUP  BY cs.customer;

Read it top to bottom. Each CTE is one logical step. Each step is testable in isolation (replace the final SELECT with SELECT * FROM enriched to debug step 2). The CBO optimizes the entire pipeline as a single execution plan — it can push predicates from step 4 down into step 1, reorder joins, and choose the optimal access path for each table.

Now stop and think about what just happened. You described the data you wanted. You described how to shape it. You handed that description to Oracle. What you got back was a fully-formed JSON API response — computed, aggregated, sorted, assembled — in one round trip, one transaction, one execution plan.

Count the code that didn't have to exist:

  • No ORM.
  • No serialization layer.
  • No hand-written JSON builder walking through result sets in application code.
  • No stitching together results from multiple queries.
  • No worrying about which query reads the latest data and which one doesn't.
  • No retry logic when a mid-pipeline read returns stale state.

The handler code to deliver this API response is now one line: res.send(row). That's the whole request path.

The Multi-Model Query

This is where the gloves come off. In Oracle 26ai, the pipeline extends to vector search, graph traversal, and spatial queries — all in the same WITH clause, all in the same transaction, all under the same optimizer:

WITH
semantic_matches AS (
  -- Vector similarity search
  SELECT doc_id, chunk_text,
         VECTOR_DISTANCE(embedding, :query_vector, COSINE) AS score
  FROM   knowledge_base
  ORDER  BY score
  FETCH  FIRST 20 ROWS ONLY
),
graph_context AS (
  -- Graph traversal (SQL/PGQ)
  SELECT v1.name AS entity, v2.name AS related, e.rel_type
  FROM   GRAPH_TABLE(knowledge_graph
           MATCH (v1) -[e]-> (v2)
           WHERE v1.doc_id IN (SELECT doc_id FROM semantic_matches)
           COLUMNS (v1.name, v2.name, e.rel_type)
         )
),
enriched AS (
  -- Relational metadata
  SELECT s.doc_id, s.score, d.title, d.author
  FROM   semantic_matches s
  JOIN   documents d ON d.id = s.doc_id
)
SELECT JSON_OBJECT(
  'results' VALUE (
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT(
        'title'     VALUE e.title,
        'score'     VALUE ROUND(e.score, 4),
        'relations' VALUE (
          SELECT JSON_ARRAYAGG(
            JSON_OBJECT('entity' VALUE g.entity,
                        'related' VALUE g.related,
                        'type' VALUE g.rel_type
                        RETURNING JSON)
            RETURNING JSON
          )
          FROM graph_context g
        )
        RETURNING JSON
      ) ORDER BY e.score
      RETURNING JSON
    ) FROM enriched e
  )
  RETURNING JSON
) AS rag_context
FROM DUAL;

Vector search. Graph traversal. Relational joins. JSON construction. One query. One transaction. One optimizer.

The Alternative: Polyglot Persistence

Let's walk through building the same feature on a "purpose-built" stack. You know the pitch — "use the right tool for the job." Here are the jobs:

  • Vector similarity search → vector database (Pinecone, Weaviate, or Atlas Search)
  • Graph traversal → graph database (Neo4j, Neptune)
  • Document storage → MongoDB or DynamoDB
  • Relational metadata → Postgres or MySQL
  • Assembly → your application code, on an EC2 instance somewhere, holding the bag

Now your request path looks like this:

  1. HTTP call out to the vector service. Marshal the request, send it over TCP, wait for the response, unmarshal 20 results.
  2. HTTP call out to the graph database. Pass the IDs from step 1. Marshal, send, wait, unmarshal.
  3. HTTP call out to the document store. Pass IDs again. Marshal, send, wait, unmarshal.
  4. HTTP call out to the relational database. Pass IDs again. Marshal, send, wait, unmarshal.
  5. Write application code to JOIN all four result sets in-memory.
  6. Hand-build the JSON response object.
  7. Ship it.

Count the overhead. Four network round-trips. Four serialization cycles. Four deserialization cycles. Four connection pools. Four auth handshakes. Four monitoring dashboards. Four SDKs with four different query APIs your team has to learn. And the developer — not the database — is the one writing the join logic that used to be FROM a JOIN b.

"But we have a unified API!" — okay, let's actually look at that. A unified API over multiple backend services reduces the number of network calls your app makes, because the vendor's gateway is now doing the fan-out instead of your code. That's a real win for developer ergonomics. It is also the entire win.

What the unified API does not give you is a unified data model. MongoDB, for instance, exposes document storage, Atlas Search, vector search, time series, and a bolt-on graph primitive ($graphLookup) through a common surface. Looks clean. But underneath, there's still no relational model, no cost-based optimizer that spans those stores, and no way to join four data shapes in a single execution plan. You can $lookup your way to something that resembles a join, but you're stitching stages in an aggregation pipeline — the developer is still the optimizer, and cross-shard $graphLookup can't even participate in a multi-document transaction. Consistency? Atlas Search indexes update in a separate process with one-to-fifteen seconds of lag. Your vector results and your document results can — and routinely do — disagree about what's actually in the database right now.

So yes, the unified API saves you some network round trips. It does not save you from the thing that breaks your RAG pipeline at 3am: the underlying stores don't share a model, don't share an optimizer, and don't share a transaction boundary. The seams are still there. You just can't see them anymore — and neither can your debugger.

The Consistency Problem

Now the part that isn't obvious until it's production.

In the polyglot model, every service has its own clock. The vector database indexed your document three seconds ago. The graph database processed the relationship change yesterday. The document store has today's data. The relational metadata is fresh from the last ETL job that ran at 4am. All four sources are individually correct and collectively lying.

When you're building a dashboard, you shrug. When you're feeding a retrieval-augmented LLM, you don't. The model gets vector results pointing to a document that no longer exists, graph relationships that were severed an hour ago, and metadata from a snapshot taken last night. What does the model do with contradictory facts?

It hallucinates. With confidence. And your users — or worse, your agents taking actions on behalf of your users — treat that hallucination as truth.

There's no application fix for this. You can't retry your way out of it. You can't throw more monitoring at it. The inconsistency is baked into the architecture the moment you decided to split your data across systems that don't share a transaction boundary.

What the Converged Engine Gives You

In the Oracle query above, the CBO sees the entire statement. It knows the vector search returns ~20 rows, so it'll probably use that as the driving input. It knows the graph traversal is bounded by a predicate. It knows the documents join can use an index. It builds one execution plan that's optimized across vector, graph, relational, and JSON operations simultaneously.

But the part that matters for your users is this: every piece of data in the response came from the same moment in time. The vector search, the graph traversal, and the relational join all ran inside the same transaction against the same snapshot. There's no "the vector index says this but the document says that." There's just one answer, internally consistent, from one source of truth.

For an LLM, that's the difference between grounding and gambling. For an autonomous agent, it's the difference between correct action and expensive liability. For you — the developer on the hook when the 3am page comes in — it's the difference between "the model got confused" and "the system did what it was supposed to do."

One query. One transaction. One optimizer. One truth. That's not a feature. That's architecture — and no amount of clever SDK design on top of a polyglot stack gets you there.


11. Indexing and Performance

Good SQL/JSON queries are fast. Properly indexed SQL/JSON queries are blazing. Wrong indexes — or missing ones — turn beautiful declarative SQL into a sequential scan of your entire table, which is an experience every developer enjoys exactly once before they learn to run EXPLAIN PLAN.

Oracle gives you five different kinds of JSON indexes, each optimized for a different access pattern. Here's the decision tree:

  • Querying one scalar field with known queries → functional index on JSON_VALUE
  • Same field queried with multiple types → multiple functional indexes, one per type
  • Looking up values inside a JSON array → multivalue index (21c+)
  • Ad-hoc or unknown query patterns across the whole document → JSON search index
  • Only want to index rows matching a condition → add WHERE to any of the above: a partial index (23ai+)

The CBO picks whichever one matches your predicate. You don't hint, you don't route — you just write SQL and Oracle figures out which index to use. Most production workloads combine two or three of these. Let's walk through them.

Functional Indexes on JSON_VALUE

For known access patterns — the queries your app runs a million times a day:

CREATE INDEX idx_order_status ON orders (
  JSON_VALUE(order_doc, '$.status' RETURNING VARCHAR2(20) ERROR ON ERROR)
);

-- Composite index across multiple JSON fields
CREATE INDEX idx_order_cust_date ON orders (
  JSON_VALUE(order_doc, '$.customer' RETURNING VARCHAR2(100)),
  JSON_VALUE(order_doc, '$.orderDate' RETURNING DATE)
);

Or use dot-notation with item methods:

CREATE INDEX idx_order_customer ON orders o (
  o.order_doc.customer.string()
);

Critical rule: the expression has to match exactly. The CBO compares the index definition and the query predicate as expressions, not as semantic equivalents. "Close enough" is not enough. Specifically:

  • Return type must match. RETURNING NUMBER in the index and a query without a RETURNING clause (which defaults to VARCHAR2) = no match. Index ignored.
  • VARCHAR2 length is part of the type. RETURNING VARCHAR2(100) in the index and RETURNING VARCHAR2(400) in the query = no match. These are different type instances as far as the optimizer is concerned.
  • Dot-notation and JSON_VALUE are different expressions. An index created with o.order_doc.customer.string() will not be used by a query written with JSON_VALUE(order_doc, '$.customer' RETURNING VARCHAR2(100)) — even though both compute "the customer field as a string." The item method .string() is internally equivalent to JSON_VALUE with a default VARCHAR2 return (typically VARCHAR2(4000)), and that's rarely the same type as your RETURNING VARCHAR2(n) index definition.

The practical rule: pick one syntax for each access path — either dot-notation with item methods, or JSON_VALUE with explicit RETURNING clauses — and use it consistently for both the index definition and every query that should hit it. Mixing them is the single most common reason "my index isn't getting used" in Oracle SQL/JSON.

If you find out too late that your queries and your index are using different expressions, you have two options: rebuild the index to match, or change the queries to match the index. There's no magic reconciliation. Run EXPLAIN PLAN or check DBMS_XPLAN.DISPLAY_CURSOR — if you see a full table scan where you expected an index range scan, expression mismatch is almost always the culprit.
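To make the matching rule concrete, here's a sketch against the article's orders table and the idx_order_status index defined above — one query that can use the index and one that silently can't:

```sql
-- MATCHES idx_order_status: identical expression, identical RETURNING clause
SELECT id FROM orders
WHERE  JSON_VALUE(order_doc, '$.status'
         RETURNING VARCHAR2(20) ERROR ON ERROR) = 'shipped';

-- DOES NOT MATCH: no RETURNING clause, so this defaults to VARCHAR2(4000),
-- a different expression as far as the optimizer is concerned. Full scan.
SELECT id FROM orders
WHERE  JSON_VALUE(order_doc, '$.status') = 'shipped';

-- Verify which plan you actually got
EXPLAIN PLAN FOR
  SELECT id FROM orders
  WHERE  JSON_VALUE(order_doc, '$.status'
           RETURNING VARCHAR2(20) ERROR ON ERROR) = 'shipped';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Make the EXPLAIN PLAN check part of your workflow whenever you add a functional index — it's cheaper than finding out from a latency graph.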

What if your queries hit the same path with different types? Create multiple functional indexes — one per type. Each one is a thin B-tree, each one matches exactly one query shape, and the CBO picks whichever one fits:

-- Index for string comparisons: WHERE amount.string() = '100.00'
CREATE INDEX idx_amount_str ON orders o (o.order_doc.amount.string());

-- Index for numeric comparisons: WHERE amount.number() > 100
CREATE INDEX idx_amount_num ON orders o (o.order_doc.amount.number());

You pay for the extra storage and write amplification on that field, but every query shape gets index access. We will discuss how to avoid this in a minute. This is the "explicit coverage" approach — and it's especially useful for legacy code paths that query the same field with different conventions than your newer code.

If you'd rather not enumerate types by hand, the next option covers every type at once.

JSON Search Indexes: Index Everything, Query Anywhere

Functional and multivalue indexes are targeted — they cover specific paths you already know you'll query. But document data is document data: schemas drift, new fields appear, analysts show up with questions nobody anticipated, multi-tenant customers each care about different attributes. What do you do when you don't know the access patterns in advance?

You create a JSON search index:

CREATE SEARCH INDEX idx_orders_search ON orders (order_doc) FOR JSON;

That one statement indexes every path in every document. $.customer? Indexed. $.shipping.address.city? Indexed. $.items[*]?(@.category == "electronics")? Indexed. $.totally.new.field.you.added.last.tuesday? Indexed, automatically, the moment data with that path shows up. You didn't declare any of those paths individually. You didn't add DDL when the schema evolved. The index adapts.

Any JSON_EXISTS, JSON_VALUE, or full-text predicate on that column becomes a candidate for index access. The CBO chooses it automatically when no more targeted index fits.
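A few sketches of the ad-hoc predicates that become index candidates once idx_orders_search exists — all against the sample document from the start of the article:

```sql
-- Scalar lookup on a nested path you never declared an index for
SELECT * FROM orders
WHERE  JSON_VALUE(order_doc, '$.shipping.address.city') = 'Austin';

-- Existence test with a filter predicate inside an array
SELECT * FROM orders
WHERE  JSON_EXISTS(order_doc, '$.items[*]?(@.category == "electronics")');

-- Full-text search across the whole document
SELECT * FROM orders
WHERE  JSON_TEXTCONTAINS(order_doc, '$', 'Widget');
```

None of these required planning. That's the point of the search index.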

Comparisons with MongoDB — the honest version. People draw a few different comparisons between Oracle's JSON search index and MongoDB's indexing options. Some are closer than others. The accurate mapping:

  • MongoDB wildcard index ({ "$**": 1 }) ↔ Oracle JSON search index (scalar use): index every path for ad-hoc queries
  • MongoDB Atlas Search ↔ Oracle JSON search index (full-text use): the above plus tokenization, stemming, phrase, regex
  • MongoDB multikey index ↔ Oracle multivalue index: per-field array indexing
  • MongoDB compound index ↔ Oracle composite functional index: multiple fields in one index

The wildcard index is the closest comparison when you're just talking about "index everything for unknown access patterns" — both let you skip the up-front decision of which fields to index, and both stay in the operational engine (no sidecar). That's the fair comparison and it's worth giving MongoDB credit for the feature.

Where wildcard indexes fall short for production use:

  • One field per query. A wildcard index can only help a query that predicates on a single field. Multi-field queries like { customer: "Acme", status: "shipped" } can't use the wildcard index for both predicates — it picks one and scans for the other. Compound queries still need compound indexes, which means you're back to enumerating access patterns.
  • No complex filters. Oracle's JSON_EXISTS with filter predicates (?(@.price > 100 && @.category in ("electronics","tools"))) gets evaluated at the index level. MongoDB's wildcard index can't support that class of predicate — it has to scan after the index narrows things down.
  • Can't be unique, can't be TTL, can't be a shard key. That's a lot of functionality to give up for the convenience.
  • No full-text. If you want tokenization, stemming, or phrase search, you stand up Atlas Search as a separate process. Separate process means separate sync cycle, and here's the punchline: Atlas Search runs as a sidecar (mongot) with one to fifteen seconds of indexing lag. During that window, your document exists but isn't searchable. Your RAG pipeline reads the vector result and then can't find the corresponding document because the search index is playing catch-up. Your support team files a ticket that says "the search is broken." Again.

Oracle's JSON search index does all of it — scalar ad-hoc queries, complex filter predicates, full-text search — from a single index definition, in-kernel and transactionally consistent by default. The index updates inside the same transaction that updates the row. Commit returns. The document is searchable. Period. No sidecar process. No replication lag. No "eventual" anything.

-- Transactional (default)
CREATE SEARCH INDEX idx_orders_search ON orders (order_doc) FOR JSON
  PARAMETERS ('SYNC (ON COMMIT)');

-- Or periodic, if you'd rather trade freshness for write throughput
CREATE SEARCH INDEX idx_orders_search ON orders (order_doc) FOR JSON
  PARAMETERS ('SYNC (EVERY "FREQ=SECONDLY; INTERVAL=1")');

Your choice, your trade-off. Oracle's default is consistency. MongoDB's only option is async.

When to use it: Analytics workloads. Multi-tenant systems where each tenant queries different fields. Applications where the schema evolves faster than you can deploy new DDL. Anywhere you'd otherwise be tempted to stand up Elasticsearch as a sidecar just to make ad-hoc queries fast. The JSON search index gives you that capability inside your operational database, consistent with your operational data, without standing up new infrastructure.

The trade-off: it's heavier to maintain than a targeted functional index. Every insert tokenizes the whole document and updates the inverted lists. For write-heavy OLTP workloads with well-known access patterns, targeted functional indexes are still the better call. Most real systems use both:

  • Two or three functional indexes on the hot-path fields (status, customer, date, whatever drives 80% of your queries)
  • One JSON search index as the catch-all for everything else
  • A couple of partial indexes layered on top for high-cardinality optional fields

The CBO picks whichever index fits the query. You write normal SQL and let the optimizer sort it out.

Multivalue Indexes (21c+)

For indexing values inside JSON arrays — something ordinary functional indexes can't do:

CREATE MULTIVALUE INDEX idx_order_tags ON orders o (
  o.order_doc.tags.string()
);

-- This query uses the multivalue index
SELECT * FROM orders o
WHERE  JSON_EXISTS(o.order_doc, '$.tags[*]?(@.string() == "wholesale")');

Execution plans show INDEX RANGE SCAN (MULTI VALUE) when the optimizer picks it up.

A note on type coverage for array values. The multivalue index above is locked to .string() — it indexes the tag values as text. If your arrays can hold values of different types (say, a metrics array that mixes numbers and strings, or an audit log with mixed scalar types), the same pattern applies as with scalar indexes: create one multivalue index per type you actually query on:

-- Tags queried as strings
CREATE MULTIVALUE INDEX idx_tags_str ON orders o (o.order_doc.tags.string());

-- Numeric IDs queried as numbers
CREATE MULTIVALUE INDEX idx_ids_num  ON orders o (o.order_doc.relatedIds.number());

MongoDB's multikey indexes are strict about BSON types — and this is where developers get burned. Say two services write to the same tags array, one as strings and one as numbers:

// Inserted by Service A
{ _id: 1, tags: ["urgent", "123"] }    // "123" is a string

// Inserted by Service B
{ _id: 2, tags: ["urgent", 123] }      // 123 is a number

Now query:

db.orders.find({ tags: 123 })          // returns _id: 2 only
db.orders.find({ tags: "123" })        // returns _id: 1 only

Neither query finds both documents. MongoDB's BSON comparison treats "123" and 123 as different values, so the index stores them as different entries and queries silently miss across the type boundary. To catch both, you need $in: [123, "123"] — and now your application code has to know the full set of type variations that might exist in the database, which defeats the point of having a schema-flexible store.

Oracle's multivalue indexes normalize at index time. When you create a multivalue index with .string(), both "123" and 123 are stored as the string "123" in the index. When you create one with .number(), both are stored as the number 123. You pick the canonical representation and the index enforces it, regardless of how heterogeneous the source data is:

-- Both "123" and 123 are indexed as the string "123"
CREATE MULTIVALUE INDEX idx_tags_str ON orders o (o.order_doc.tags.string());

-- Both "123" and 123 are indexed as the number 123
CREATE MULTIVALUE INDEX idx_tags_num ON orders o (o.order_doc.tags.number());

A query using @.string() == "123" on the string index hits both documents. A query using @.number() == 123 on the number index hits both documents. You choose the type view once at index creation time and get consistent results forever, without your app code having to enumerate every type variation the source data might contain.
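A sketch of what that looks like in SQL, assuming the two mixed-type documents from the MongoDB example above were inserted into orders:

```sql
-- String view: matches BOTH documents, because the number 123
-- was normalized to the string '123' at index time
SELECT id FROM orders o
WHERE  JSON_EXISTS(o.order_doc, '$.tags[*]?(@.string() == "123")');

-- Number view: also matches BOTH documents, because the string "123"
-- was normalized to the number 123 at index time
SELECT id FROM orders o
WHERE  JSON_EXISTS(o.order_doc, '$.tags[*]?(@.number() == 123)');
```

Either way, the application picks one type view and never has to enumerate the variations the writers produced.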

If you want broad coverage across multiple types without declaring each one by hand, use a JSON search index — it catches everything.

Partial Indexes: Exclude NULL, Route Writes

Here's a question every developer asks once they start putting indexes on optional JSON fields: if most of my documents don't have a trackingUrl, am I still paying to index NULL on every insert?

For single-column B-tree indexes, the answer is automatic — Oracle doesn't index rows where every indexed column is NULL. An index on JSON_VALUE(order_doc, '$.trackingUrl') skips unshipped orders for free.

For everything else — composite indexes, indexes you want to exclude based on conditions other than NULL — partial indexes (23ai+) let you attach a WHERE clause directly to the index definition:

CREATE INDEX idx_shipped_tracking ON orders (
  JSON_VALUE(order_doc, '$.trackingUrl' RETURNING VARCHAR2(200))
)
WHERE JSON_VALUE(order_doc, '$.status') = 'shipped';

Rows where status isn't 'shipped' are never touched by this index — not on insert, not on update, not on delete. The B-tree stays small and the write path stays fast.
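One practical note, sketched below: for the optimizer to consider a partial index, the query's predicates must imply the index's WHERE clause. The easiest way to guarantee that is to repeat the filter verbatim:

```sql
-- Can use idx_shipped_tracking: the query repeats the index's filter
SELECT JSON_VALUE(order_doc, '$.trackingUrl' RETURNING VARCHAR2(200)) AS tracking
FROM   orders
WHERE  JSON_VALUE(order_doc, '$.status') = 'shipped'
AND    JSON_VALUE(order_doc, '$.trackingUrl' RETURNING VARCHAR2(200)) IS NOT NULL;

-- Cannot use it: nothing restricts status, so rows outside the index's
-- predicate could qualify, and the partial index is not a complete answer
SELECT id FROM orders
WHERE  JSON_VALUE(order_doc, '$.trackingUrl' RETURNING VARCHAR2(200)) IS NOT NULL;
```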

Pattern: Type-Routed Indexes for Polymorphic Documents

Now combine partial indexes with the multi-type coverage pattern and you get something MongoDB can't replicate cleanly: write-path routing based on document shape.

A lot of real-world document collections are polymorphic. One table holds customers, products, and invoices. Or one table holds v1 documents with amount stored as a string ("1,234.56") and v2 documents with amount stored as a NUMBER, because you evolved the schema without running a big-bang migration. Standard indexing says you either pick one type and break the other, or you index the lowest-common-denominator representation and lose query precision. Neither is great.

Partial indexes let you define one index per shape and route writes to the right index based on a discriminator attribute:

-- Schema version routing
CREATE INDEX idx_amount_v1 ON orders (
  JSON_VALUE(order_doc, '$.amount' RETURNING VARCHAR2(30))
)
WHERE JSON_VALUE(order_doc, '$.schemaVersion') = '1';

CREATE INDEX idx_amount_v2 ON orders (
  JSON_VALUE(order_doc, '$.amount' RETURNING NUMBER)
)
WHERE JSON_VALUE(order_doc, '$.schemaVersion') = '2';

Now when you insert a v1 document, only idx_amount_v1 gets maintained. When you insert a v2 document, only idx_amount_v2. Neither index stores NULL entries for the other version. Neither index contends with the other on the write path. The total write amplification stays the same as a single-version index even though the collection handles multiple schemas simultaneously.

The same pattern works for polymorphic entity types:

-- One table, multiple entity types, write-path-isolated indexes
CREATE INDEX idx_customer_name ON entities (
  JSON_VALUE(doc, '$.companyName' RETURNING VARCHAR2(200))
)
WHERE JSON_VALUE(doc, '$.entityType') = 'customer';

CREATE INDEX idx_product_sku ON entities (
  JSON_VALUE(doc, '$.sku' RETURNING VARCHAR2(50))
)
WHERE JSON_VALUE(doc, '$.entityType') = 'product';

CREATE INDEX idx_invoice_number ON entities (
  JSON_VALUE(doc, '$.invoiceNo' RETURNING NUMBER)
)
WHERE JSON_VALUE(doc, '$.entityType') = 'invoice';

Insert a customer document: only the customer index gets touched. Insert a product: only the product index. Insert an invoice: only the invoice index. Write overhead becomes proportional to the entity type you're writing, not to the total number of indexes on the table.
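At query time, nothing special is required. A predicate on the discriminator is enough for the optimizer to consider the matching partial index — a sketch against the entities table above:

```sql
-- The entityType predicate implies idx_customer_name's WHERE clause,
-- so only the customer index is a candidate for this query
SELECT id
FROM   entities
WHERE  JSON_VALUE(doc, '$.entityType') = 'customer'
AND    JSON_VALUE(doc, '$.companyName' RETURNING VARCHAR2(200)) = 'Acme Corp';
```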

This is the cleanest answer to the "polymorphic collection" problem I've seen in any database. You don't pay for indexes that don't apply. You don't deal with NULL pollution. You don't need to split the data across multiple tables to get write-path isolation. And at query time, the CBO picks the right index automatically based on the predicate — no hints, no routing logic in application code.

The trade-off: schema-version or entity-type discriminators have to be discoverable from the document. Make sure your write path sets them consistently — these become load-bearing fields. But if you're already tagging documents with a type or version (and most polyglot migrations do), you get write-path routing for free.

The OSON Advantage

Every one of these indexes is riding on top of the native JSON type's OSON binary format — and that matters more than most developers realize.

Here's why. When your JSON is stored as text (VARCHAR2, CLOB, or MongoDB's BSON), every field access has to walk the document byte-by-byte until it finds the right offset. O(n) complexity per lookup, where n is how deep into the document the field lives. Every index maintenance operation — every insert, every update — pays that parse cost again. At scale, this shows up as CPU burn on your write path and latency spikes on deeply nested field queries.

OSON doesn't parse. It navigates. Field names are hash-indexed, so looking up $.shipping.address.zip is three hash jumps — direct offset lookups, not scans. O(1) per field, regardless of document size or nesting depth. Your index maintenance is cheaper because the underlying field access is cheaper. Your queries are cheaper. Your writes are cheaper.

I've benchmarked this extensively. At field position 1000 in a document, OSON is 529x faster than BSON for field access. At position 50, it's 28.6x. This isn't marketing. It's storage engine mechanics — hash-indexed O(1) versus length-prefixed sequential O(n). Same input, same output, radically different physics.

And because OSON supports piecewise updates, modifying one field in a large document only writes the changed portion to disk, undo, and redo. Update $.status on a 2MB document and you pay for 2KB of I/O, not 2MB. Functional indexes on unchanged fields don't get touched at all. Text-stored JSON has to rewrite the entire column on every update — including BSON, which has no piecewise update path at the storage layer.
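In practice you hit the piecewise path whenever you update through JSON_TRANSFORM (or dot-notation assignment) instead of replacing the whole column value. A sketch against the orders table:

```sql
-- Touches only the status field inside the stored OSON image; unchanged
-- portions of the document are not rewritten, and functional indexes on
-- other fields are not maintained for this update
UPDATE orders
SET    order_doc = JSON_TRANSFORM(order_doc, SET '$.status' = 'delivered')
WHERE  id = 1;
```

Replacing the column with a freshly constructed document, by contrast, forfeits the piecewise optimization even on OSON — so prefer JSON_TRANSFORM for targeted changes.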

For the deep dive on the binary format mechanics, see my article: Why Binary Document Protocols Aren't All Created Equal.

The bottom line for developers: the index features in this section work on top of OSON because OSON makes them practical. You can have type-routed partial indexes and full-document search indexes and multivalue arrays and targeted functional indexes on the same column because the underlying format is fast enough that maintaining all of them on every write is affordable. On a text-based or sequentially-scanned format, this mix of indexing strategies would murder your write throughput. On OSON, it's just how you build things.


12. The Bigger Picture

We've covered the mechanics. Let's zoom out.

JSON Schema Validation (26ai)

Oracle 26ai lets you enforce document structure at the storage layer:

CREATE TABLE validated_orders (
  id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  data JSON VALIDATE '{
    "type": "object",
    "properties": {
      "orderId":  {"type": "number"},
      "customer": {"type": "string"},
      "items":    {"type": "array", "items": {
        "type": "object",
        "required": ["product", "quantity"]
      }}
    },
    "required": ["orderId", "customer", "items"]
  }'
);

Invalid documents are rejected at insert time with ORA-40875. No application-layer validation. No "trust the client" hope. The database enforces the contract. Oracle is the first major relational database to support JSON Schema validation natively as a constraint.
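A sketch of the contract in action against validated_orders: the first insert satisfies the schema, the second is missing a required field and is rejected:

```sql
-- Valid: all required fields present, types match the schema
INSERT INTO validated_orders (data) VALUES (JSON('{
  "orderId": 1002,
  "customer": "Acme Corp",
  "items": [{"product": "Widget Pro", "quantity": 3}]
}'));

-- Invalid: "items" is missing, so the insert fails with ORA-40875
INSERT INTO validated_orders (data) VALUES (JSON('{
  "orderId": 1003,
  "customer": "Acme Corp"
}'));
```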

JSON_SERIALIZE: Making Binary Readable

When you need to inspect OSON binary content as text:

-- Pretty print with alphabetically sorted keys (26ai)
SELECT JSON_SERIALIZE(order_doc ORDERED PRETTY) FROM orders;

Essential for debugging, logging, and API responses. The ORDERED keyword (26ai) sorts keys alphabetically — useful for deterministic output in tests and diffs.

JSON Relational Duality Views: The UMT Implementation

Everything in this article leads here. Duality Views are the physical implementation of Unified Model Theory — one truth (normalized relational tables), many shapes (JSON document views), zero data duplication.

CREATE JSON RELATIONAL DUALITY VIEW order_dv AS
  SELECT JSON {
    'orderId'  : o.order_id,
    'customer' : o.customer_name,
    'items'    : [ SELECT JSON {
                     'product'  : i.product_name,
                     'quantity' : i.quantity,
                     'price'    : i.unit_price
                   }
                   FROM order_items i WITH UPDATE
                   WHERE i.order_id = o.order_id ]
  }
  FROM orders o WITH INSERT UPDATE DELETE;

That's a fully updatable JSON document view over normalized relational tables. INSERT a document through the view and it decomposes into relational rows across multiple tables. Query through the view and it assembles documents from those tables. Full ACID. Optimistic concurrency via ETags. Accessible via SQL, REST, or even the MongoDB wire protocol.
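Working through the view looks like ordinary document CRUD. A sketch, assuming the underlying orders and order_items tables exist with the columns referenced in the view definition:

```sql
-- Insert a document; Oracle decomposes it into an orders row
-- plus order_items rows, inside one transaction
INSERT INTO order_dv VALUES (JSON('{
  "orderId": 2001,
  "customer": "Globex",
  "items": [
    {"product": "Widget Pro", "quantity": 2, "price": 29.99}
  ]
}'));

-- Read it back as a document, assembled from the relational rows
SELECT dv.data
FROM   order_dv dv
WHERE  dv.data.orderId = 2001;
```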

Model the domain. Project the access. One truth. Many shapes. Zero tradeoffs.


The Bottom Line

Oracle's SQL/JSON isn't a compatibility checkbox. It's not "we also do JSON." It's a complete query language for hierarchical data, integrated into the most mature optimizer in the industry, backed by a binary format designed by database research veterans, and extended with construction, transformation, validation, and duality features that don't exist anywhere else.

But the real story for developers isn't the feature list. It's what these features eliminate from your stack.

The Code That Stops Existing

Look at the shape of a typical application built around a traditional database. You have an ORM translating between objects and tables, because your tables don't look like your objects. You have DTO classes shuttling data between the ORM layer and your API layer, because your ORM objects don't look like your API responses. You have serialization code building JSON from those DTOs, because your API consumers want JSON. You have validation libraries on top of all of it, because none of those layers enforce a contract.

And when your JSON data doesn't fit the relational model cleanly, you stand up a second database — a document store, a search engine, a vector index — and now you have integration code. Change data capture. Sync jobs. Consistency compensators. Retry logic. Pipelines that fail at 2am. Tickets nobody wants.

Oracle SQL/JSON collapses this stack. Your tables can hold both normalized relational data and JSON documents. Your queries return JSON directly, shaped to match your API contract. Your indexes cover both data shapes. Your transactions span both models. Your optimizer plans across both.

Count the code that stops existing:

  • No ORM required. Your query emits the JSON structure your API consumer wants. res.send(row.api_response) is the whole handler.
  • No DTO layer. There's nothing to translate. The database returned the final shape.
  • No serialization library. The database did the serialization, and it did it on the binary OSON structure, not in your application memory.
  • No validation framework for the API response. JSON Schema validation runs at the storage layer, not the application layer.
  • No integration layer between document and relational data. Both live in the same table, same transaction, same optimizer.
  • No sidecar search or vector services. JSON search and native VECTOR types live inside the database, consistent with the operational data.
  • No consistency compensators. The engine gives you ACID across JSON, relational, graph, and vector operations in the same transaction.

Every one of these is a class of code that exists in a polyglot architecture because the underlying databases don't talk to each other. Eliminate the need for them to talk to each other, and the code evaporates. What's left is the part of your application that actually creates business value.

Let Physics Define Your Data Model

Here's the principle that should drive every storage decision you make: if data is accessed together, store it together. If access patterns aren't clear, normalize it. That's it. That's the rule. The database's job is to make both possible without forcing you to choose an architecture up front.

All data is ultimately relational — entities, attributes, and the relationships between them. That's not a philosophy; that's how information works. But storing everything as normalized relational tables is an implementation choice, and it's the wrong choice when your access patterns are well-known and involve reading hierarchical data together. You're paying for joins you don't need, on data you always fetch as a unit, to get back a result you have to re-hierarchize in application code anyway.

The physics argument goes the other way too. When your access patterns are unknown, varied, or analytical — when different consumers need the same data in different shapes, or when you need aggregation across entities — normalized relational is the only sensible choice. Embedding everything in documents means duplicating data, fighting update anomalies, and losing the ability to reason efficiently about relationships.

Oracle lets you pick per-workload without committing to a paradigm. You have the full toolkit:

  • Native JSON type columns. Store documents as first-class data alongside relational columns in the same table. When your access patterns are well-known and you want document locality, put the document in the row. One read, one fetch, one shape.
  • JSON Collections (SODA / MongoDB API). Pure document storage with the MongoDB wire protocol on top. Drop-in compatibility for existing document applications, with all the indexing, ACID, and optimizer capabilities of the Oracle engine underneath.
  • Hybrid tables. Relational columns for structured, queryable fields alongside a JSON column for flexible, schema-optional attributes. The best of both worlds when your data has both known structure and emergent properties.
  • Normalized relational tables. When access patterns are varied, analytical workloads matter, and entity relationships drive the queries, normal form is still the right answer. Use it.
  • JSON Relational Duality Views. When the same canonical data needs to serve multiple access patterns with multiple document shapes, model once in relational form and project as many document views as you need. Updates flow both ways with full ACID guarantees. One source of truth, many projections, no CDC.

Every one of these stores the same kind of data. Every one participates in the same transactions. Every one is indexed and optimized by the same cost-based optimizer. You don't have to pick one approach for the whole application — different tables can use different strategies based on how the data is actually accessed. Let the physics of your access patterns define the model, not the other way around.

This is the answer to "should we use a document database or a relational database?" that the industry has been arguing about for fifteen years. The answer isn't one or the other. The answer is: whichever shape fits the access pattern, with full SQL and full JSON support either way, inside a single engine with a single transaction boundary.

Your application code sees documents where documents make sense. Your analytics queries see tables where tables make sense. Your reports see normalized joins. Your AI agents see vector-indexed context. Nobody sees the seams because there are no seams.

Your New Stack

The developer who actually internalizes what Oracle SQL/JSON can do doesn't just write faster JSON queries. They build fundamentally simpler systems.

  • No ORM. The database returns your API response.
  • No sidecar services. Search, vector, graph, and relational share the same engine and transaction.
  • No sync jobs. There's nothing to sync — one truth, many projections.
  • No complex middleware. The query is the middleware.
  • No "which database does this belong in?" debates. The answer is always "this one."
  • No praying that five systems agree on current state. They're all the same system.

You write one query. You get one truth. Your application is smaller, your operational surface is smaller, your bug reports are fewer, and your pages at 3am have better reasons than "the search index is lagging again."

That's not a feature. That's physics. ⚡


What SQL/JSON patterns are you using in production? Have you tried Duality Views yet? I'd love to hear what's working — and what's not. 👇

#Oracle #JSON #SQL #DatabaseArchitecture #SQLJson #DeveloperTools #DataModeling #OracleDatabase #CTE #UnifiedModelTheory #DataEngineering #Performance #EnterpriseAI #TechLeadership
