Gowtham Potureddi

Posted on Jun 14

Iceberg REST Catalog, Nessie & Polaris: Open Lakehouse Catalogs Compared

#sql #python #interview #dataengineering

The iceberg rest catalog has quietly become the lingua franca of the open lakehouse — the thin OpenAPI surface that lets Spark, Trino, Flink, DuckDB, ClickHouse, Snowflake, and BigQuery all read and write the same physical Iceberg tables without a Hive metastore in sight. Two other names sit alongside it on every 2026 architecture review: Project Nessie, the git-style catalog that adds branches, tags, and atomic multi-table commits to the lakehouse; and Apache Polaris, Snowflake's open-source multi-tenant catalog donated to the ASF and now the default federation surface for cross-engine Iceberg deployments.

This guide is the senior-engineer reference for picking one. It walks through the lakehouse catalog contract (federation, governance, atomic CAS), unpacks the Iceberg REST spec endpoint by endpoint, contrasts project nessie branches/tags with vanilla snapshot-only versioning, lays out apache polaris principals / roles / grants for multi tenant catalog isolation, and closes with a four-axis decision matrix for engine fit, multi-tenancy, data versioning catalog semantics, and governance. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.

When you want hands-on reps immediately after reading, drill the dimensional modelling practice library →, rehearse on ETL design problems →, and stack the platform muscles with system design drills →.

On this page

Why a catalog — federation, governance, multi-engine access
Iceberg REST Catalog spec — open protocol, vended credentials, engine support
Project Nessie — git-like catalog, branches and tags on data
Apache Polaris — multi-tenant, Iceberg + RBAC
Picking a catalog — engine support, multi-tenancy, governance trade-offs
Catalog cheat sheet — REST, Nessie, Polaris recipes
Frequently asked questions
Practice on PipeCode

1. Why a catalog — federation, governance, multi-engine access

A catalog is the contract between the table format and every engine that reads it

The one-sentence invariant: an Iceberg catalog is the only thing that lets multiple writers and multiple engines agree on which metadata.json is the current head of a table — every other property (atomicity, federation, governance) falls out of that one rule. Once you accept that the catalog is the lakehouse control plane, the choice between iceberg catalog implementations stops being about features and starts being about which contract you can defend in a code review.

The four properties a real catalog must provide.

Federation. A single namespace tree (catalog → namespace → table) that Spark, Trino, Flink, DuckDB, ClickHouse, Snowflake, and BigQuery can all list and resolve. Without federation you are back to per-engine metastores and you cannot do a cross-engine join without copying data.
Governance. Centralised authentication, authorisation, audit trail, and storage credential vending — engines authenticate to the catalog, the catalog vends short-lived tokens to the storage layer, and no engine ever sees the root S3 / GCS / ABFS key.
Discovery. A searchable, hierarchical namespace tree that data consumers can browse without joining an internal Slack channel and asking which database the gold layer lives in.
Atomicity. The catalog holds the pointer to the current metadata.json. Every commit is a compare-and-set on that pointer — first writer wins, second writer 409s and retries — which is the only way to make a lakehouse table safely writable by two processes at the same time.

The three places a missing catalog bites.

Discovery. Without a catalog, "which table holds the gold-tier orders?" is answered by grep on an S3 bucket. Once you have ten domains, this becomes a full-time job for one engineer.
Atomicity. Two writers race a commit. Without a CAS-capable catalog, both append a new metadata.json to the same prefix and rely on S3 last-writer-wins. The result is a silently dropped commit and the senior on-call gets paged at 03:00.
Governance. Without a catalog vending credentials, every engine needs a long-lived IAM key with read/write to the whole bucket. The blast radius of a single leaked key is the entire lake.

The three-engine reality.

Once a platform has more than one query engine, the catalog is the only safe way to share Iceberg tables. Spark writes a fact table from an ETL job, Trino reads it for ad-hoc analytics, Flink streams a CDC sink into a sister table, DuckDB pulls the same table for a notebook. All four engines must agree on the current snapshot or one of them will silently read stale data. That agreement is the catalog's job.

What interviewers listen for.

Do you say "the catalog holds the pointer to the current metadata.json and commits are an atomic CAS on that pointer"? — required answer.
Do you mention vended credentials when asked how engines authenticate to storage? — senior signal.
Do you reach for RBAC at the namespace / table level the moment multi-tenant comes up? — required answer.
Do you recognise that the Hive metastore is now deprecated for lakehouse use because it lacks atomic CAS and short-lived credential vending? — senior signal.

The 2026 reality.

Hive Metastore is officially deprecated for new lakehouse builds — every major engine ships an Iceberg REST connector and treats HMS as legacy.
AWS Glue offers a REST-compatible mode alongside its native API, so existing Glue users can move to the open spec without re-importing tables.
Snowflake and Databricks both publish Iceberg via catalog — Snowflake through Polaris (GA mid-2025), Databricks through Unity Catalog's Iceberg REST endpoint (GA late-2025).
Apache Polaris is an Apache Software Foundation incubator project donated by Snowflake mid-2024, with community drivers from Spark, Trino, Flink, and Dremio.
Project Nessie ships in Dremio Sonar and as a stand-alone OSS server, and added Iceberg REST compatibility in version 0.80, so a single catalog can serve both git-style and REST clients.

Worked example — what breaks without a catalog

Detailed explanation. New lakehouse builders sometimes try to skip the catalog and let writers manage metadata.json on object storage themselves — "we'll just list the directory and pick the latest." This works exactly long enough for the second writer to show up. The moment two processes commit concurrently, one of them silently loses its append because S3 has no atomic compare-and-set on object names.

Question. Two Spark jobs commit to the same Iceberg table at the same time. The "catalog" is a directory listing on S3 that picks the alphabetically latest metadata-vN.json. Walk me through how a commit gets dropped and why a real catalog fixes it.

Input.

Time	Writer A	Writer B	S3 state
t0	reads `v3` as current	reads `v3` as current	`v3.json` present
t1	writes `v4-A.json` (new snapshot)	—	`v3.json`, `v4-A.json`
t2	—	writes `v4-B.json` (new snapshot)	`v3.json`, `v4-A.json`, `v4-B.json`
t3	next reader lists prefix	—	sees `v4-B.json` last
t4	—	next reader lists prefix	sees `v4-B.json` last

Code.

# BROKEN — no catalog, two writers race
def commit_table(snapshot, table_prefix):
    versions = list_s3(f"{table_prefix}/metadata/")
    latest = max(versions)                 # e.g. metadata-v3.json
    next_v = bump(latest)                  # metadata-v4-{uuid}.json
    write_s3(f"{table_prefix}/metadata/{next_v}", snapshot)
    return next_v

# REAL — catalog mediates the commit via atomic CAS
def commit_table_real(snapshot, table_id, catalog):
    current = catalog.load_table(table_id)
    new_meta = build_new_metadata(current, snapshot)
    # POST /v1/.../commits with requirements.assertRefSnapshotId
    catalog.commit_table(
        table_id,
        requirements=[{"type": "assert-ref-snapshot-id",
                        "ref": "main",
                        "snapshot-id": current.snapshot_id}],
        updates=new_meta.diff(),
    )

Step-by-step explanation.

Writer A reads the directory at t0 and sees v3.json as current. So does Writer B.
Writer A writes v4-A.json at t1 with snapshot-id S_A whose parent is v3.
Writer B writes v4-B.json at t2 with snapshot-id S_B whose parent is v3. Both children point at the same parent, so the timeline forks but neither writer knows.
A reader at t3 lists the prefix and picks v4-B.json (alphabetically last). Writer A's commit is now invisible — the snapshot is on disk but is not the current head.
The "real" commit through the catalog includes an assertRefSnapshotId requirement: "I claim main points at S_3 — please CAS it to S_A." The catalog sees the requirement, processes Writer A first, advances main to S_A, then sees Writer B's claim, finds that main is already at S_A not S_3, returns a 409 Conflict, and Writer B retries against the new head.

Output.

Path	Without catalog	With catalog
Writer A snapshot	silently dropped	committed
Writer B snapshot	wins by coincidence	retries → committed second
Lineage	forked, one branch invisible	linear

Rule of thumb. Never let writers pick metadata.json themselves. The catalog's compare-and-set on the current snapshot pointer is the only lakehouse primitive that turns S3 / GCS / ABFS into a safe shared table store.

Worked example — namespace + multi-engine read

Detailed explanation. A namespace tree (catalog → namespace → table) is the searchable surface for every Iceberg consumer. Once a catalog publishes it, Spark, Trino, Flink, and DuckDB all enumerate the same tree using their own SQL dialects — SHOW NAMESPACES, SHOW SCHEMAS, SHOW DATABASES, iceberg_scan('...') — all backed by the same REST endpoint.

Question. A table lives at prod.sales.orders. Show how Spark, Trino, and DuckDB each list the namespace and read the table, and explain why the answers are identical.

Input.

Engine	List command	Read command
Spark	`SHOW NAMESPACES IN prod`	`SELECT * FROM prod.sales.orders`
Trino	`SHOW SCHEMAS FROM prod`	`SELECT * FROM prod.sales.orders`
DuckDB	`SELECT * FROM duckdb_iceberg.namespaces`	`SELECT * FROM iceberg_scan('s3://.../orders')`

Code.

-- Spark — REST catalog config:
--   spark.sql.catalog.prod = org.apache.iceberg.spark.SparkCatalog
--   spark.sql.catalog.prod.catalog-impl = org.apache.iceberg.rest.RESTCatalog
--   spark.sql.catalog.prod.uri = https://catalog.example.com
SHOW NAMESPACES IN prod;
SELECT order_id, total FROM prod.sales.orders WHERE order_date = current_date;

-- Trino — REST connector:
--   iceberg.catalog.type=rest
--   iceberg.rest-catalog.uri=https://catalog.example.com
SHOW SCHEMAS FROM prod;
SELECT order_id, total FROM prod.sales.orders WHERE order_date = current_date;

-- DuckDB — Iceberg extension v1.x
INSTALL iceberg; LOAD iceberg;
CREATE SECRET (TYPE iceberg, ENDPOINT 'https://catalog.example.com', TOKEN '<bearer>');
SELECT order_id, total FROM iceberg_scan('prod.sales.orders')
WHERE order_date = current_date;

Step-by-step explanation.

Each engine's config points at the same REST catalog URL. The HTTP client inside each engine speaks the same OpenAPI spec — only the surrounding SQL surface differs.
SHOW NAMESPACES IN prod translates to GET /v1/{prefix}/namespaces?parent=prod over HTTP. The catalog returns the JSON list — Spark, Trino, and DuckDB all receive the same payload.
Reading a table translates to GET /v1/{prefix}/namespaces/{ns}/tables/{tbl} (load-table). The response includes the current metadata-location, vended storage credentials, and the snapshot manifest list. The engine then reads the parquet files directly from object storage using those credentials.
The "answer" is the same because the catalog is the source of truth. The engines are just three different consumers of the same JSON API.

Output.

Engine	Namespace list	First three orders read
Spark	`[default, sales, marketing]`	1, 2, 3
Trino	`[default, sales, marketing]`	1, 2, 3
DuckDB	`[default, sales, marketing]`	1, 2, 3

Rule of thumb. Pick a catalog whose protocol is open (REST, gRPC, or both). Vendor-specific protocols (HMS / Glue native / Unity proprietary) lock you to one engine flavour; open protocols let you swap engines without re-importing tables.

Worked example — atomic commit via catalog CAS

Detailed explanation. The catalog stores a single pointer per table: "this table's main reference points at snapshot S_n." Every commit is a CAS on that pointer: the writer claims the current value and proposes the new value. The catalog server validates the claim atomically — exactly one writer can succeed for any given parent.

Question. Show how the Iceberg REST commitTable endpoint prevents the lost-update problem, using a worked CAS sequence for two concurrent writers.

Input — two writers, same parent snapshot.

Writer	Parent snapshot	Proposed child
A	`S_3`	`S_A` (append 10k rows)
B	`S_3`	`S_B` (append 8k rows)

Code.

POST /v1/prod/namespaces/sales/tables/orders/commits
Content-Type: application/json
Authorization: Bearer <token>

{
  "requirements": [
    {
      "type": "assert-ref-snapshot-id",
      "ref": "main",
      "snapshot-id": 3
    },
    {
      "type": "assert-table-uuid",
      "uuid": "550e8400-e29b-41d4-a716-446655440000"
    }
  ],
  "updates": [
    { "action": "add-snapshot", "snapshot": { "snapshot-id": 1001, ... } },
    { "action": "set-current-snapshot", "snapshot-id": 1001 }
  ]
}

Step-by-step explanation.

Writer A and Writer B both call loadTable at the same time. Both receive main → S_3 as the current head. Each writes its parquet files to S3 and stages a new snapshot.
Writer A POSTs commitTable with assert-ref-snapshot-id: 3 → 1001. The catalog acquires its row-level lock, verifies that main is S_3, advances main to S_A (id 1001), releases the lock, and returns 200 OK.
Writer B POSTs commitTable with assert-ref-snapshot-id: 3 → 1002. The catalog acquires its row-level lock, finds main is S_A (id 1001), not S_3, and returns 409 Conflict with the current snapshot id.
Writer B retries: re-runs loadTable, re-bases its append on top of S_A, stages a new snapshot S_B' with parent S_A, and POSTs commitTable with assert-ref-snapshot-id: 1001 → 1002. This time the assertion holds and the catalog advances main to S_B'.

Output.

Path	Without CAS	With CAS
Snapshot history	forked (`S_3 → S_A`, `S_3 → S_B`)	linear (`S_3 → S_A → S_B'`)
Rows lost	8k or 10k (whoever lost)	0
Operator wake-ups	1 (at 03:00)	0

Rule of thumb. Every catalog you evaluate must offer atomic CAS on the current snapshot pointer. If the answer is "we rely on object store consistency," walk away — that is not a lakehouse catalog, that is a directory.

Senior engineering interview question on safe concurrent writes

A platform interviewer often opens with: "Two Spark jobs are appending to the same Iceberg fact table every minute. Walk me through the catalog primitives that make that safe, and the failure mode you would expect without them." It blends federation, atomic CAS, and retry semantics into a single architecture answer.

Solution Using catalog CAS for safe concurrent writes

# Senior-engineer answer — catalog CAS + bounded retry
import time
from random import uniform

def append_with_retry(client, table_id, files, max_retries=8):
    """Commit a snapshot via REST commitTable with optimistic CAS."""
    for attempt in range(max_retries):
        current = client.load_table(table_id)
        new_snapshot = build_snapshot(current, files)

        try:
            client.commit_table(
                table_id,
                requirements=[
                    {"type": "assert-table-uuid",
                     "uuid": current.uuid},
                    {"type": "assert-ref-snapshot-id",
                     "ref": "main",
                     "snapshot-id": current.snapshot_id},
                ],
                updates=[
                    {"action": "add-snapshot",
                     "snapshot": new_snapshot},
                    {"action": "set-current-snapshot",
                     "snapshot-id": new_snapshot["snapshot-id"]},
                ],
            )
            return new_snapshot["snapshot-id"]
        except CommitFailedException:        # HTTP 409 from catalog
            # someone else committed; back off and retry
            time.sleep(uniform(0.05, 0.5) * (2 ** attempt))
    raise RuntimeError("commit failed after max retries")

Step-by-step trace.

Step	Writer A	Writer B	Catalog `main` ref
1	`loadTable` → `S_3`	`loadTable` → `S_3`	`S_3`
2	stage files for `S_A`	stage files for `S_B`	`S_3`
3	`commitTable` assert `S_3 → S_A`	—	CAS succeeds → `S_A`
4	—	`commitTable` assert `S_3 → S_B`	409 (current is `S_A`)
5	—	re-`loadTable` → `S_A`; rebase to `S_B'` parent `S_A`	`S_A`
6	—	`commitTable` assert `S_A → S_B'`	CAS succeeds → `S_B'`

After six steps both appends land linearly in the snapshot log. The contention cost is one extra round-trip plus one rebase per loser per conflict.

Output:

Result	Without CAS	With CAS
Writer A snapshot	dropped	committed (`S_A`)
Writer B snapshot	wins by accident	committed (`S_B'`)
Final `main` ref	inconsistent	`S_B'`
Lost rows	up to 18k	0
Cost of contention	1 page	1 retry per loser

Why this works — concept by concept:

catalog CAS — every commit asserts "I think main is S_3" and the catalog atomically advances it to S_A only if the claim holds. This is the same compare-and-set primitive every distributed system uses; the catalog server is the single place that serialises the writes.
assertRefSnapshotId — the REST requirement encodes which reference (main, dev, a Nessie branch) and which snapshot id the writer thinks is current. Both fields are checked; a mismatch on either is a 409.
assertTableUUID — guards against the table being dropped and re-created with the same name between loadTable and commitTable. Cheap insurance against a class of subtle bugs.
retry semantics — losers re-loadTable, rebase their snapshot on the new parent, and re-submit. Exponential backoff with jitter prevents thundering-herd retries.
no S3 lock needed — because the CAS lives in the catalog, the object store can stay last-writer-wins. This is the architectural insight that unlocks lakehouse on commodity blob storage.
Cost — O(1) catalog round-trip in the happy path, O(R) round-trips under R retries, O(parquet bytes) for the file write. Atomicity is bounded by the catalog server's per-table lock, not by the size of the snapshot.

SQL
Topic — dimensional-modeling
Dimensional modelling problems (SQL)

Practice →

2. Iceberg REST Catalog spec — open protocol, vended credentials, engine support

REST catalog is the OpenAPI of Iceberg — one spec, every engine

The mental model in one line: the Iceberg REST catalog is an OpenAPI-3 contract anyone can implement (Tabular, Polaris, Nessie, Glue, Snowflake, Databricks Unity) and anyone can call (Spark, Trino, Flink, DuckDB, ClickHouse, BigQuery, Athena), so the protocol — not any one vendor — is the moat. Once the spec exists, "switching catalogs" becomes a configuration change, not a re-import.

The endpoint surface in one table.

Endpoint	HTTP	Purpose	Idempotent
`/v1/config`	GET	catalog capability discovery + auth hint	yes
`/v1/{prefix}/namespaces`	POST	createNamespace	no
`/v1/{prefix}/namespaces`	GET	listNamespaces	yes
`/v1/{prefix}/namespaces/{ns}/tables`	POST	createTable	no
`/v1/{prefix}/namespaces/{ns}/tables`	GET	listTables	yes
`/v1/{prefix}/namespaces/{ns}/tables/{tbl}`	GET	loadTable (+ vended creds)	yes
`/v1/{prefix}/namespaces/{ns}/tables/{tbl}/commits`	POST	commitTable (atomic CAS)	no
`/v1/{prefix}/namespaces/{ns}/tables/{tbl}`	DELETE	dropTable	no
`/v1/{prefix}/namespaces/{ns}/views`	POST	createView	no
`/v1/{prefix}/oauth/tokens`	POST	OAuth2 client-credentials	yes

Vended credentials explained.

When an engine calls loadTable, the response body includes a credentials block containing a short-lived storage token (STS, GCS HMAC, Azure SAS) scoped to the table's storage prefix. The engine never sees the root bucket key — it only sees a token that can read or write one table's files, expires in roughly an hour, and is automatically refreshed by another loadTable call.

This is the single feature that turns the catalog from a metadata server into a governance plane. Without vended credentials, a leaked engine credential exposes the whole lake. With vended credentials, the blast radius is one table for one hour.

Engine support matrix (2026).

Engine	REST catalog	Notes
Spark	yes (built-in)	`org.apache.iceberg.rest.RESTCatalog`
Trino	yes (built-in)	`iceberg.catalog.type=rest`
Flink	yes (built-in)	`CREATE CATALOG ... 'type'='iceberg' 'catalog-type'='rest'`
DuckDB	yes (extension)	`iceberg` extension v1+
ClickHouse	yes	`Iceberg` table engine + REST URL
Snowflake	yes	catalog-integration → REST URL
BigQuery	yes (preview)	external Iceberg metadata source
Athena	yes	REST endpoint + IAM token

OAuth2 auth in one paragraph.

The catalog publishes an oauth/tokens endpoint that accepts client_credentials grants. Engines exchange a client-id / client-secret pair for a short-lived bearer token, then send Authorization: Bearer <token> on every other call. Scopes are catalog-defined (catalog, data:read, data:write), and the catalog enforces them per principal.

The 2026 reality.

Snowflake's Polaris Catalog is REST-native — there is no proprietary protocol underneath; Polaris simply implements the spec.
Databricks Unity announces an Iceberg REST endpoint (GA late-2025) — Unity Catalog can now serve any REST-compatible engine, not just Databricks runtime.
AWS Glue exposes a REST shim alongside its native API — existing Glue users can flip a switch and gain REST clients.
Tabular (the original commercial REST catalog) was acquired by Databricks in 2024; its engineers now drive the Unity REST surface.
The spec lives at github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml and is governed by the Apache Iceberg PMC.

Worked example — list namespaces with curl

Detailed explanation. Because the REST catalog is just HTTP + JSON, anything that can speak HTTP can talk to it — curl, Postman, a Python script, an engine. The same call works the same way regardless of the client. This is the dimension that makes the protocol portable.

Question. Show how Trino, Spark, and a raw curl all hit the same listNamespaces endpoint and receive the same response. Explain why this matters for debugging in production.

Input.

Client	Command
curl	`curl -H "Authorization: Bearer $T" .../v1/prod/namespaces`
Spark	`SHOW NAMESPACES IN prod`
Trino	`SHOW SCHEMAS FROM prod`

Code.

# Raw HTTP — works from any laptop on the network
curl -H "Authorization: Bearer ${TOKEN}" \
     -H "Accept: application/json"       \
     https://catalog.example.com/v1/prod/namespaces

# Response (same payload regardless of caller):
# {
#   "namespaces": [
#     ["sales"],
#     ["marketing"],
#     ["finance"]
#   ]
# }

// Spark — same HTTP call under the hood
spark.conf.set("spark.sql.catalog.prod",
               "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.prod.catalog-impl",
               "org.apache.iceberg.rest.RESTCatalog")
spark.conf.set("spark.sql.catalog.prod.uri",
               "https://catalog.example.com")

spark.sql("SHOW NAMESPACES IN prod").show()

-- Trino — same HTTP call under the hood
-- etc/catalog/prod.properties:
--   connector.name=iceberg
--   iceberg.catalog.type=rest
--   iceberg.rest-catalog.uri=https://catalog.example.com
SHOW SCHEMAS FROM prod;

Step-by-step explanation.

Each client builds the same HTTP request: GET /v1/prod/namespaces with a bearer token and an Accept header.
The catalog server validates the token, checks that the principal has catalog:list permission on prod, and returns the namespace JSON.
Each client parses the response into its native representation (a Spark DataFrame row, a Trino result set row, a JSON object) — but the bytes on the wire are identical.
When a Trino user complains "I can't see the sales schema," the platform engineer reproduces the failure with curl in five seconds. No need to spin up a Trino cluster.

Output.

Client	Output
curl	`{"namespaces":[["sales"],["marketing"],["finance"]]}`
Spark	`[sales, marketing, finance]`
Trino	`[sales, marketing, finance]`

Rule of thumb. REST means everything is HTTP — and HTTP debuggability is one of the best operational properties a control plane can have. The day you investigate a production catalog issue, you'll be grateful you can curl it.

Worked example — commitTable with optimistic CAS

Detailed explanation. The commitTable endpoint is the heart of the spec. Every other endpoint is glorified discovery; commitTable is the one that turns the catalog into a transactional system. Its anatomy — requirements (CAS assertions) + updates (the change to apply) — is worth memorising for senior interviews.

Question. Given an Iceberg table currently at snapshot S_3, write a commitTable request body that adds a new snapshot S_4 and advances main to it. Explain each requirement field.

Input.

Field	Value
current `main`	`S_3` (snapshot-id 3)
new snapshot id	`1001`
new snapshot operation	`append`
new manifest count	4

Code.

POST /v1/prod/namespaces/sales/tables/orders/commits
{
  "requirements": [
    {
      "type": "assert-table-uuid",
      "uuid": "550e8400-e29b-41d4-a716-446655440000"
    },
    {
      "type": "assert-ref-snapshot-id",
      "ref": "main",
      "snapshot-id": 3
    },
    {
      "type": "assert-last-assigned-field-id",
      "last-assigned-field-id": 12
    }
  ],
  "updates": [
    {
      "action": "add-snapshot",
      "snapshot": {
        "snapshot-id": 1001,
        "parent-snapshot-id": 3,
        "sequence-number": 4,
        "timestamp-ms": 1718265600000,
        "operation": "append",
        "manifest-list": "s3://lake/orders/metadata/snap-1001-...avro",
        "summary": {"added-records": "10000"}
      }
    },
    {
      "action": "set-current-snapshot",
      "snapshot-id": 1001,
      "ref": "main"
    }
  ]
}

Step-by-step explanation.

assert-table-uuid guards against the table being dropped and re-created under the same name in the milliseconds between loadTable and commitTable. If the UUID changed, the commit aborts with 409.
assert-ref-snapshot-id is the CAS itself: "I believe main is at snapshot 3." If the catalog finds main is at any other snapshot, the commit aborts with 409 and returns the actual current snapshot id so the client can rebase.
assert-last-assigned-field-id guards against schema drift — if the schema gained a column between loadTable and commitTable, the commit aborts and the client must re-derive the new manifest with the new column.
add-snapshot registers the new snapshot in the table metadata. set-current-snapshot advances the main reference. Both updates apply atomically — either all updates land or none do.

Output.

State	Before	After
`main` ref	`S_3`	`S_4`
snapshots known	`[S_1, S_2, S_3]`	`[S_1, S_2, S_3, S_4]`
parent of `S_4`	—	`S_3`
HTTP status	—	`200 OK`

Rule of thumb. Every senior data engineer should know the three-field anatomy of commitTable — UUID, ref snapshot id, last field id. Those three assertions are the whole transactional contract.

Worked example — vended credentials end-to-end

Detailed explanation. loadTable is the endpoint engines call before every scan. When the catalog supports credential vending, the response includes a credentials block with short-lived storage tokens. The engine uses those tokens to read parquet files directly from S3 / GCS / ABFS — the catalog is never in the data path.

Question. Walk me through the entire chain of trust: Spark asks the catalog to load a table, the catalog returns vended credentials, Spark reads parquet directly from S3. What expires when? Who renews?

Input.

Actor	Identity	Lifetime
Spark driver	OAuth client_id / secret	long-lived
Spark bearer token	from `/oauth/tokens`	1 hour
Vended S3 token	from `loadTable` `credentials` block	1 hour

Code.

# Step 1 — Spark exchanges client-creds for a catalog bearer token
POST /v1/oauth/tokens
grant_type=client_credentials&client_id=spark-prod&client_secret=...

# Response:
# { "access_token": "eyJ...", "expires_in": 3600, "token_type": "Bearer" }

# Step 2 — Spark calls loadTable; catalog returns vended S3 creds
GET /v1/prod/namespaces/sales/tables/orders
Authorization: Bearer eyJ...

# Response:
# {
#   "metadata-location": "s3://lake/sales/orders/metadata/v42.json",
#   "metadata": { ... full table metadata ... },
#   "config": {
#     "s3.access-key-id":    "ASIA...",
#     "s3.secret-access-key": "wJal...",
#     "s3.session-token":     "FQoG...",
#     "s3.session-expires":   "1718269200000"
#   }
# }

# Step 3 — Spark reads parquet directly from S3 using those vended creds
# (no catalog round-trip per file)

Step-by-step explanation.

Spark starts up and exchanges its long-lived client-id / client-secret for a 1-hour bearer token. The client-id / client-secret never leaves the driver — only the short-lived bearer does.
Spark calls loadTable with the bearer token in the Authorization header. The catalog server authenticates the principal, authorises read on sales.orders, then asks its storage broker (STS / GCS / Azure) for a 1-hour token scoped only to the s3://lake/sales/orders/ prefix.
The catalog returns the table metadata + the vended storage token in one response. Spark caches both for the lifetime of the query.
Spark reads the manifest list, the manifests, and the data files directly from S3 using the vended token. The catalog is not in the data path — it has touched zero parquet bytes.
When the vended token expires, Spark calls loadTable again to refresh. The catalog re-checks the principal's authorisation each time, so revoking a principal in the catalog takes effect at the next refresh — typically within an hour.

Output.

Property	Without vending	With vending
Engine credential	long-lived IAM key	1-hour token
Blast radius of leak	whole bucket forever	one table prefix for one hour
Revocation latency	hours to days	<= 1 hour
Catalog in data path	no	no

Rule of thumb. Always insist on vended credentials when picking a catalog. The lakehouse story falls apart if every engine needs a long-lived IAM key — the only way to give Spark, Trino, Flink, DuckDB safe access to the same lake is for the catalog to hand each of them a scoped, expiring token.

Senior engineering interview question on REST catalog wiring across engines

A senior interviewer might frame this as: "Your platform has Spark for ETL, Trino for ad-hoc analytics, and Flink for streaming. Walk me through wiring all three to a single Iceberg REST catalog and explain what configuration each engine needs."

Solution Using a single REST catalog config across three engines

# Spark — set as session-level catalog
spark.sql.catalog.prod            = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.prod.catalog-impl = org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.prod.uri        = https://catalog.example.com
spark.sql.catalog.prod.credential = <oauth-client-id>:<oauth-client-secret>
spark.sql.catalog.prod.warehouse  = s3://lake/prod
spark.sql.catalog.prod.io-impl    = org.apache.iceberg.aws.s3.S3FileIO

# Trino — etc/catalog/prod.properties
connector.name              = iceberg
iceberg.catalog.type        = rest
iceberg.rest-catalog.uri    = https://catalog.example.com
iceberg.rest-catalog.security = OAUTH2
iceberg.rest-catalog.oauth2.credential = <client-id>:<client-secret>
iceberg.rest-catalog.warehouse = s3://lake/prod
fs.native-s3.enabled        = true

-- Flink — CREATE CATALOG via SQL
CREATE CATALOG prod WITH (
  'type'             = 'iceberg',
  'catalog-type'     = 'rest',
  'uri'              = 'https://catalog.example.com',
  'credential'       = '<client-id>:<client-secret>',
  'warehouse'        = 's3://lake/prod',
  'io-impl'          = 'org.apache.iceberg.aws.s3.S3FileIO'
);
USE CATALOG prod;

Step-by-step trace.

Step	Spark	Trino	Flink
1	exchanges client-creds for bearer	exchanges client-creds for bearer	exchanges client-creds for bearer
2	calls `listNamespaces` → `[sales, marketing, finance]`	calls `listNamespaces` → identical	calls `listNamespaces` → identical
3	calls `loadTable(sales.orders)` → metadata + vended creds	calls `loadTable(sales.orders)` → metadata + vended creds	calls `loadTable(sales.orders)` → metadata + vended creds
4	reads parquet from S3 directly	reads parquet from S3 directly	reads parquet from S3 directly
5	atomic `commitTable` for ETL writes	(read-only)	atomic `commitTable` for sink writes

After five steps all three engines have the same view of the lake, all three are authenticated via OAuth, and none of them holds a long-lived S3 key.

Output:

Engine	Auth	Storage credential	Sees same data?
Spark	OAuth bearer	vended STS (1h)	yes
Trino	OAuth bearer	vended STS (1h)	yes
Flink	OAuth bearer	vended STS (1h)	yes

Why this works — concept by concept:

one URI — every engine points at the same REST endpoint, so the catalog is the single source of truth. Changing the warehouse location is a configuration change in one place, not three.
OAuth token — every engine uses the same client-credentials flow. Revoking a client in the catalog kills its access across all three engines simultaneously.
vended credentials — none of the engines holds a long-lived S3 key. Each loadTable refreshes the storage token, and the catalog re-checks authorisation each refresh.
capability discovery — the GET /v1/config endpoint returns the catalog's capabilities (e.g. view-versions, multi-table-commits). Engines can adapt their behaviour to what the catalog supports without hard-coded vendor checks.
atomic CAS — Spark and Flink both write via commitTable. Concurrent commits from both engines serialise through the catalog's per-table lock; neither loses a snapshot.
Cost — O(1) catalog round-trip per scan plus one OAuth refresh per hour. Data-path bandwidth is unchanged from non-REST architectures because the catalog is never on the file read path.

SQL
Topic — etl
ETL design problems (SQL)

Practice →

3. Project Nessie — git-like catalog, branches and tags on data

Nessie is git for the data lakehouse

The mental model in one line: Nessie turns the lakehouse catalog into a version-controlled reference store — every commit is a hash, every table version is reachable by branch or tag, and git merge semantics apply to data tables across the entire catalog. Once you have used a Write-Audit-Publish branch in anger, the snapshot-only history of vanilla REST catalogs feels like working without version control.

The four primitives.

Branches. A branch is a named, mutable reference (like git main) that points at a commit hash. New branches are copy-on-write at the reference level — branching off main is O(1) and consumes no data bytes.
Tags. A tag is a named, immutable reference. Tags exist for releases, regulatory snapshots, and "the state of the lake at month-end" anchors. A tag cannot be force-moved.
Commits. A commit is a hash that records a snapshot of every table's state. A single Nessie commit can advance multiple tables at once — the atomic multi-table commit feature that pure REST catalogs lack.
Merges and cherry-picks. Carry curated commits from one branch to another with full git-style merge semantics, including conflict resolution.

Architecture.

Server. Java service, gRPC + REST surface, ships in Dremio and as a stand-alone OSS server (projectnessie/nessie).
Backend. RocksDB for embedded, JDBC (Postgres / MySQL) or DynamoDB for production. The backend is a key-value store of commit hashes, not the data itself.
Storage. Data files live on S3 / GCS / ABFS exactly as with any other Iceberg catalog. Nessie is purely the reference store — the parquet bytes are untouched.

The Write-Audit-Publish (WAP) pattern.

WAP is the canonical use of Nessie:

Write. Branch off main into etl_2026_06_13. Run the ETL job against that branch. The writes land in normal parquet files but the references advance only on the dev branch.
Audit. Run dbt tests, Great Expectations, or any quality check against the dev branch. Production consumers are still on main; they see none of the in-flight changes.
Publish. If audit passes, MERGE the dev branch into main. If audit fails, drop the dev branch — O(1) rollback with zero data movement.

Every regulated industry on earth invents WAP independently; Nessie ships it as a built-in.

Multi-table atomic commits.

The Iceberg REST spec has a commitMultiTable extension, but it is optional and not all catalogs implement it. Nessie is multi-table atomic by construction — every Nessie commit can advance N tables in one transaction. A MERGE across two tables (e.g. orders + order_items) lands as a single Nessie commit; either both tables advance or neither does.

Engine support.

Engine	Nessie support	Notes
Spark	yes (built-in)	`org.projectnessie.spark.NessieCatalog`
Flink	yes	nessie catalog plugin
Trino	yes (via Iceberg REST shim)	Nessie 0.80+
Dremio Sonar	yes (native)	Dremio markets Nessie as the OSS Polaris alternative
Presto	yes	nessie plugin
DuckDB	yes (via Iceberg REST shim)	Nessie 0.80+

The 2026 reality.

Nessie 0.80+ ships Iceberg REST-spec compatibility — a single Nessie server can serve both git-style clients and vanilla REST clients.
Dremio Sonar markets Nessie as the open-source alternative to Polaris for organisations that need WAP and multi-table atomicity.
Multi-table commit is unmatched by any other open catalog — Polaris is single-table-atomic, Glue is single-table-atomic, Unity is single-table-atomic.
Snowflake and BigQuery do not (yet) speak Nessie's git API, only its REST surface; the git semantics are most powerful from Spark, Flink, Trino, and Dremio.

Worked example — branch / commit / merge a WAP flow

Detailed explanation. WAP turns "did I just corrupt prod?" into "did I just corrupt my branch?" — and a corrupt branch costs nothing because dropping it is one CLI command. The reads on main see none of the in-flight writes until the publish step.

Question. Walk me through a daily ETL run that ingests orders into the gold layer using a Nessie WAP pattern. Show the CLI / SQL commands at each step.

Input.

Step	Branch state	`main` state
0	(none)	`gold.orders` at v100
1 (branch)	`etl_2026_06_13` at v100	v100
2 (write)	`etl_2026_06_13` at v101 (10k new rows)	v100
3 (audit)	run dbt tests on dev branch	v100
4 (merge)	`etl_2026_06_13` at v101 (merged)	v101

Code.

-- Step 1 — branch off main
CREATE BRANCH etl_2026_06_13 IN nessie FROM main;
USE REFERENCE etl_2026_06_13 IN nessie;

-- Step 2 — write to the dev branch only
INSERT INTO gold.orders
SELECT * FROM silver.orders_landing
WHERE ingestion_date = DATE '2026-06-13';

-- Step 3 — audit (dbt tests, GE checks) against the dev branch
SELECT COUNT(*) AS new_rows                 FROM gold.orders@etl_2026_06_13;
SELECT COUNT(*) AS dupe_keys
FROM (
    SELECT order_id, COUNT(*) c
    FROM gold.orders@etl_2026_06_13
    GROUP BY 1
    HAVING COUNT(*) > 1
);

-- Step 4 — publish (merge into main) or drop on failure
-- if audit passed:
MERGE BRANCH etl_2026_06_13 INTO main IN nessie;
-- if audit failed:
DROP BRANCH etl_2026_06_13 IN nessie;

Step-by-step explanation.

CREATE BRANCH is a metadata-only operation — no data is copied. The new branch starts as a pointer at main's current commit hash. O(1) time, O(1) space.
The INSERT on the dev branch writes parquet files to S3 (those bytes are physical) but advances only the dev branch reference. Production readers on main see exactly the same data they saw before.
Audit queries use the @etl_2026_06_13 reference syntax to read from the dev branch. The same SQL runs against the dev state — no special tooling required.
If audit passes, MERGE BRANCH fast-forwards main to the dev commit. The merge itself is a metadata-only reference update — no data movement.
If audit fails, DROP BRANCH deletes the dev reference. The parquet files become orphaned and are cleaned up by the next Iceberg expire-snapshots run. Zero downtime, zero risk of half-applied state on main.

Output.

State	After step 1	After step 2	After step 4 (success)	After step 4 (failure)
`main` ref	v100	v100	v101	v100
`etl_2026_06_13` ref	v100	v101	merged	dropped
Rows on `main`	1M	1M	1.01M	1M
Audit time	—	—	seconds	seconds

Rule of thumb. Branch every ETL job. The cost is one metadata operation; the payoff is bulletproof rollback semantics for the entire pipeline.

Worked example — multi-table atomic commit

Detailed explanation. A fact-and-dimension append (e.g. orders + order_items) must land atomically — either both tables get the new rows or neither does. Vanilla REST catalogs cannot guarantee this across two tables. Nessie can, because a single Nessie commit can advance many tables at once.

Question. Walk me through inserting into orders and order_items as a single atomic operation across both tables using Nessie. Show the failure mode under a single-table-atomic catalog.

Input.

Table	Before commit	After commit
`orders`	1,000,000 rows	1,010,000 rows
`order_items`	3,500,000 rows	3,536,000 rows

Code.

-- Spark + Nessie — multi-table commit on a single branch
USE REFERENCE etl_orders_2026_06_13 IN nessie;

INSERT INTO orders        SELECT * FROM landing.orders_new;
INSERT INTO order_items   SELECT * FROM landing.order_items_new;

-- One Nessie commit covers both INSERTs because they share a single branch.
-- When we merge the branch into main:
MERGE BRANCH etl_orders_2026_06_13 INTO main IN nessie;
-- → main advances atomically: both tables update or neither does.

Step-by-step explanation.

Both INSERTs run on the same branch (etl_orders_2026_06_13). Each INSERT advances the branch reference; under the hood, the second INSERT's commit has the first as parent.
MERGE BRANCH fast-forwards main to the head of the dev branch. The fast-forward is a single metadata operation on a single reference, but it carries the entire chain of dev commits with it — both INSERTs land as one atomic step from main's perspective.
Readers on main see either neither INSERT (before merge) or both INSERTs (after merge). There is no window in which orders has the new rows but order_items does not.
Under a single-table-atomic catalog (vanilla REST, Glue, Polaris), the same workflow would land each INSERT as a separate commit. A reader between the two commits would see orders advance without order_items — a foreign-key-inconsistent view.

Output.

Reader timing	Nessie multi-table	Single-table REST
Before either INSERT	old, old	old, old
Between INSERTs	(not possible)	old + new, old
After both INSERTs	new, new	new, new

Rule of thumb. If your tables have cross-table invariants (FK, parent-child counts, sum-equal-sum), you need a multi-table-atomic catalog. Nessie is the only mainstream open catalog that ships this; the alternative is to model both tables as a single wide table and lose the dimensional model.

Worked example — tag a daily release

Detailed explanation. Tags are immutable references — the snapshot equivalent of a git release tag. They are how you say "the lake as of midnight" for regulatory purposes, end-of-month reconciliation, or downstream consumers that want a frozen view.

Question. Show how to create an immutable daily release tag from main and how to query the lake at that tag from Spark.

Input.

Reference	Type	Mutability
`main`	branch	mutable
`release_2026_06_12`	tag	immutable

Code.

-- Create the tag (typically scheduled at 00:00 UTC)
CREATE TAG release_2026_06_12 IN nessie FROM main;

-- Optional protection — make the tag undroppable except by admin
ALTER TAG release_2026_06_12 SET access_rules = ('drop' -> 'admin-only');

-- Query the lake at the tag
USE REFERENCE release_2026_06_12 IN nessie;
SELECT COUNT(*) FROM gold.orders;
-- → returns the count as of midnight 2026-06-12

-- Or inline:
SELECT COUNT(*) FROM gold.orders@release_2026_06_12;

Step-by-step explanation.

CREATE TAG records the current commit hash of main under the new immutable reference name. The operation is metadata-only and O(1).
USE REFERENCE switches the session's read context to the tag. Every subsequent query reads tables as of the tagged commit.
The @release_2026_06_12 inline syntax lets a single query read from the tag without switching session context — useful for ad-hoc reconciliation.
Tags are immutable by default — attempting to move or overwrite a tag returns an error. Access rules can further restrict who may drop the tag at all.

Output.

Query	Result
`SELECT COUNT(*) FROM gold.orders` on `main`	varies (live)
`SELECT COUNT(*) FROM gold.orders@release_2026_06_12`	fixed (frozen at tag time)

Rule of thumb. Tag every meaningful snapshot — end-of-day, end-of-month, regulatory cut-offs. The cost is one reference write; the payoff is an immutable, reproducible view of the lake that survives any subsequent ETL run.

Senior engineering interview question on rollback via Nessie branches

A senior interviewer might frame this as: "A bad ETL run corrupted the silver layer this morning. The corrupt data has already been read by three downstream jobs. Walk me through rolling back to the previous day's state with zero data movement and zero downtime."

Solution Using Nessie ASSIGN BRANCH for instant rollback

-- Senior-engineer answer — atomic reference reassignment
-- 1) Identify the last good tag (created at midnight yesterday)
SHOW REFERENCES IN nessie;
-- → main, release_2026_06_12, etl_2026_06_13 (corrupt)

-- 2) (Optional) tag the corrupt state for forensics
CREATE TAG forensics_2026_06_13_bad IN nessie FROM main;

-- 3) Reassign main to the previous day's release tag
ASSIGN BRANCH main IN nessie TO release_2026_06_12;

-- 4) Verify
USE REFERENCE main IN nessie;
SELECT MAX(ingestion_date) FROM silver.orders;
-- → 2026-06-12 (rollback confirmed)

Step-by-step trace.

Step	Timeline	`main` ref	Reader view
1	09:00 ETL writes bad data	`S_etl_bad`	corrupt
2	09:30 alert fires	`S_etl_bad`	corrupt
3	09:35 tag forensics	`S_etl_bad`	corrupt
4	09:36 `ASSIGN BRANCH main TO release_2026_06_12`	`S_release_06_12`	clean
5	09:36 downstream reads	`S_release_06_12`	clean

After five steps, the entire lake is back to yesterday's state. The bad commit S_etl_bad still exists on disk (and under the forensics tag) — nothing has been deleted. The rollback was a single metadata reference update.

Output:

Property	Before rollback	After rollback
`main` ref	`S_etl_bad`	`S_release_06_12`
Reader view of silver	corrupt	yesterday's clean state
Data files on S3	unchanged	unchanged
Downtime	—	0
Forensics tag	none	`forensics_2026_06_13_bad`

Why this works — concept by concept:

reference, not snapshot — Nessie rolls back by re-pointing the main branch reference at an earlier commit hash. No snapshots are deleted, no manifests are rewritten, no parquet files are touched.
atomic reassign — ASSIGN BRANCH is a single transaction at the Nessie level. Either the reference moves or it does not; there is no intermediate state where half the catalog points old and half points new.
no data movement — the rollback is O(1) in bytes. Compare with a snapshot-based catalog where rolling back across multiple tables requires running an expire_snapshots per table and possibly rewriting manifests.
multi-table consistency — because Nessie commits are catalog-wide, reassigning main rolls back every table in the catalog to the tag's state. There is no risk of one table reverting while a sibling stays at the bad commit.
forensics preserved — tagging the corrupt state before rollback keeps the bad commit reachable for post-mortem. The parquet files survive until the next expire-snapshots run.
Cost — O(1) metadata writes. The whole rollback is two reference updates (forensics tag + ASSIGN BRANCH) and completes in single-digit milliseconds. The slowest part is the on-call engineer typing the command.

SQL
Topic — time-series
Time-series analysis problems (SQL)

Practice →

4. Apache Polaris — multi-tenant, Iceberg + RBAC

Polaris is Snowflake's open Iceberg catalog — multi-tenant from day one

The mental model in one line: Apache Polaris is an Iceberg REST catalog designed around a SaaS multi-tenant control plane — every entity is a CATALOG / PRINCIPAL / ROLE / GRANT, RBAC is enforced at the namespace and table level, and short-lived storage credentials are vended for every load. If your platform serves multiple product teams, lines of business, or customer tenants on one lake, Polaris is the catalog you can defend to your security team.

The Polaris object model.

Catalog. The top-level container. A catalog has a name, a default storage location, and a set of grants. Catalogs are usually one-per-tenant or one-per-environment (prod, staging, dev).
Namespace. A nested folder inside a catalog. Maps cleanly to schemas in Spark / Trino.
Table / View. The standard Iceberg objects. Grants apply at table and view level.
Principal. A user or service-account identity. Principals authenticate via OAuth2 client-credentials and own zero permissions until granted.
Role. A bundle of grants. Roles are assigned to principals. Multiple roles per principal allowed; effective permissions are the union.
Grant. A (privilege, securable) pair attached to a role. Privileges are SQL-flavoured (USAGE, SELECT, INSERT, UPDATE, MERGE, CREATE_NAMESPACE, etc.).

The federation knob.

A single Polaris server can host many catalogs. One Polaris instance can simultaneously serve:

A Snowflake account (Snowflake reads Polaris as its native Iceberg catalog).
A Spark cluster doing ETL.
A Trino cluster doing analytics.
A Flink streaming sink.
A Dremio reader.

The principals + roles + grants model isolates each consumer without splitting the catalog.

Storage credential vending.

Polaris supports STS (AWS), Workload Identity Federation (GCP), and Azure SAS for storage. On every loadTable, the catalog generates a short-lived storage token scoped to the table's prefix and returns it in the response. Engines never see the root storage credential.

The 2026 reality.

Apache Polaris is an Apache Software Foundation incubator project donated by Snowflake mid-2024. The Apache name lends governance neutrality versus a Snowflake-owned project.
GA on Snowflake mid-2025. Snowflake customers can spin up a Polaris catalog without managing it themselves.
Self-hosted Polaris runs as a Java service backed by Postgres for metadata. Single-node and HA topologies are documented.
Spark, Trino, Flink, and Dremio drivers all speak Polaris through the standard Iceberg REST spec — no Polaris-specific code paths.
Databricks Unity Catalog now also publishes Iceberg via REST, competing directly. Polaris distinguishes itself with the multi-tenant catalog model and a fully open governance story.

Worked example — create a tenant + role + grant in Polaris

Detailed explanation. Onboarding a new team to a multi-tenant lakehouse is a four-step ritual: create the catalog, create the role, create the principal, grant the role's privileges. Polaris models this with familiar SQL DDL (and equivalent REST calls).

Question. Walk me through onboarding a new "ml-platform" team to a Polaris-managed lakehouse. They need full control of their own catalog plus read-only access to the shared "dim" catalog.

Input.

Object	Name	Type
catalog	`ml_platform`	new
catalog	`dim`	existing (shared)
role	`ml_platform_owner`	new
principal	`svc-ml-platform-prod`	new

Code.

-- 1) Create the tenant catalog
CREATE CATALOG ml_platform
  TYPE = ICEBERG
  STORAGE = 's3://lake/ml-platform/'
  STORAGE_CREDENTIAL_KEY = 'aws-role-ml-platform';

-- 2) Create the role for that catalog
CREATE ROLE ml_platform_owner;

-- 3) Grant full control of the new catalog to the role
GRANT USAGE   ON CATALOG ml_platform TO ROLE ml_platform_owner;
GRANT CREATE_NAMESPACE
              ON CATALOG ml_platform TO ROLE ml_platform_owner;
GRANT MANAGE  ON CATALOG ml_platform TO ROLE ml_platform_owner;

-- 4) Grant read-only access to the shared dim catalog
GRANT USAGE ON CATALOG dim TO ROLE ml_platform_owner;
GRANT SELECT ON ALL TABLES IN CATALOG dim TO ROLE ml_platform_owner;

-- 5) Create a service-account principal and bind the role
CREATE PRINCIPAL svc_ml_platform_prod
  CLIENT_ID = 'ml-platform-prod'
  CLIENT_SECRET = '<generated>';

GRANT ROLE ml_platform_owner TO PRINCIPAL svc_ml_platform_prod;

Step-by-step explanation.

The new catalog gets its own storage prefix and IAM role. Polaris remembers the IAM role and will use it to mint vended STS tokens for any reader of this catalog.
ml_platform_owner is the role bundle — every member of the team gets this role on their principal. Adding a new team member is a one-line role grant.
The four GRANT ... ON CATALOG ml_platform statements give the role full control of its own catalog: usage, namespace creation, and full manage.
The two GRANT ... ON CATALOG dim statements add read-only access to the shared dim catalog — the team can read shared dimensions but cannot modify them.
CREATE PRINCIPAL mints OAuth client credentials. GRANT ROLE ... TO PRINCIPAL binds the role to the identity. The team can now point their Spark / Trino / Flink clients at Polaris with these credentials.

Output.

Resource	Access by `svc_ml_platform_prod`
`ml_platform` catalog	full control
`ml_platform..` tables	full CRUD
`dim` catalog	usage + read
`dim..` tables	SELECT only
any other catalog	no access

Rule of thumb. Model one catalog per tenant. Cross-tenant sharing happens through explicit GRANT USAGE + GRANT SELECT. This keeps the blast radius of any misconfiguration to one tenant and makes per-tenant billing trivial (catalog == cost-allocation unit).

Worked example — Spark reads via Polaris with OAuth client-creds

Detailed explanation. Once a principal exists, every engine logs in with the same OAuth2 client-credentials flow. Spark sees Polaris as a vanilla REST catalog — the multi-tenant model is invisible at the wire level.

Question. Configure a Spark session to read from the ml_platform.features.user_features table using the svc_ml_platform_prod principal. Walk through the request sequence.

Input.

Item	Value
Polaris URL	`https://polaris.example.com`
Catalog	`ml_platform`
Client id	`ml-platform-prod`
Client secret	`<vault-managed>`

Code.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ml-features-read")
    .config(
        "spark.sql.catalog.ml",
        "org.apache.iceberg.spark.SparkCatalog",
    )
    .config(
        "spark.sql.catalog.ml.catalog-impl",
        "org.apache.iceberg.rest.RESTCatalog",
    )
    .config("spark.sql.catalog.ml.uri",
            "https://polaris.example.com")
    .config("spark.sql.catalog.ml.warehouse", "ml_platform")
    .config("spark.sql.catalog.ml.credential",
            "ml-platform-prod:${env:ML_PLATFORM_SECRET}")
    .config("spark.sql.catalog.ml.scope", "PRINCIPAL_ROLE:ALL")
    .config("spark.sql.catalog.ml.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

df = spark.sql("""
    SELECT user_id, feature_v1, feature_v2
    FROM ml.features.user_features
    WHERE region = 'EU'
""")
df.show()

Step-by-step explanation.

Spark starts up. The driver exchanges the OAuth client-id / secret for a bearer token via POST /v1/oauth/tokens.
Spark calls GET /v1/config for capability discovery and GET /v1/ml_platform/namespaces to list the catalog. Polaris authorises the principal: it has USAGE on ml_platform, so the list succeeds.
Spark calls GET /v1/ml_platform/namespaces/features/tables/user_features. Polaris authorises SELECT on the table (granted via the role), mints a vended STS token scoped to the table prefix, and returns the metadata + creds.
Spark reads parquet from S3 using the vended STS token. The catalog is not in the data path.
The bearer token + vended STS both expire in 1 hour. The next scan triggers refresh on both.

Output.

Stage	Polaris response
OAuth	200 + bearer (1h)
listNamespaces	200 + `[features, models, serving]`
loadTable	200 + metadata + vended STS
S3 read	200 + parquet bytes
Aggregate query	rows from EU users

Rule of thumb. Polaris from an engine's perspective looks like any other REST catalog. The multi-tenant model is invisible at the wire — it shows up only in authorisation decisions on the server side.

Worked example — storage credential vending on read

Detailed explanation. The single biggest operational reason to pick Polaris is that no engine ever holds a long-lived S3 key. Every read mints a fresh, scoped, expiring STS token. Revoking a principal in Polaris kills its lake access within an hour, regardless of how many engines were running.

Question. Walk through what an attacker would see if they stole the bearer token from a Spark driver running against Polaris. Compare with a non-vending catalog.

Input.

Asset	Stolen	Useful for
Spark bearer (OAuth)	1h	calling Polaris APIs as the principal
Vended S3 STS	1h	reading one table prefix
Long-lived S3 key (counter-factual)	forever	reading the whole bucket

Code.

# Attack surface with Polaris vended creds
attacker has bearer  → can call loadTable for ANY table the principal is
                       allowed to access, for the next ~1h
attacker has STS     → can read s3://lake/ml-platform/features/user_features/
                       for the next ~1h, then 403

# Attack surface with a non-vending catalog
attacker has S3 key  → can read s3://lake/ recursively
                       forever (until key is manually rotated, days to weeks)

Step-by-step explanation.

The Polaris bearer token is scoped to the principal's role bundle. An attacker can read tables the principal can read, but cannot escalate.
The vended S3 STS is scoped to a single table prefix. An attacker who somehow lifts the STS token cannot read sibling tables, only the one that loaded.
Both tokens expire in roughly an hour. Revoking the principal in Polaris kills future minting; in-flight tokens still live to their natural expiry. The whole lake is safe again within an hour.
Under a long-lived-key model, the equivalent recovery is: rotate the S3 key, redeploy every engine, audit every read since the leak. Hours to days, not minutes.

Output.

Threat	With Polaris vended creds	With long-lived S3 key
Blast radius	one principal, one table	the whole bucket
Time to revoke	<= 1 hour automatic	hours to days manual
Lateral movement	none	cross-table, cross-tenant
Audit trail	per-load entry in catalog	none

Rule of thumb. If your security team has ever asked "what happens if a service account leaks?" — Polaris's answer is "an hour of one table." That answer alone is often enough to win the catalog decision.

Senior engineering interview question on multi-tenant catalog design

A senior interviewer might frame this as: "You're picking a single Iceberg catalog to serve 12 product teams, each with their own data and one shared dim layer. Walk me through how Polaris's catalogs / principals / roles / grants solve isolation, billing, and discoverability — and the trade-off versus per-team catalog instances."

Solution Using Polaris catalogs-as-tenants pattern

-- One catalog per team, plus one shared dim catalog
CREATE CATALOG dim          STORAGE = 's3://lake/dim/';
CREATE CATALOG sales        STORAGE = 's3://lake/sales/';
CREATE CATALOG marketing    STORAGE = 's3://lake/marketing/';
CREATE CATALOG product      STORAGE = 's3://lake/product/';
-- ... etc for 12 teams

-- Per-team owner role
CREATE ROLE sales_owner;
GRANT MANAGE ON CATALOG sales TO ROLE sales_owner;
GRANT USAGE  ON CATALOG dim   TO ROLE sales_owner;
GRANT SELECT ON ALL TABLES IN CATALOG dim TO ROLE sales_owner;

-- A global "data-platform" role for the platform team
CREATE ROLE data_platform;
GRANT MANAGE ON CATALOG dim TO ROLE data_platform;
-- (the platform team can manage shared dims, not tenant catalogs)

-- Bind principals
CREATE PRINCIPAL svc_sales_prod    CLIENT_ID = 'sales-prod';
CREATE PRINCIPAL svc_platform_prod CLIENT_ID = 'platform-prod';

GRANT ROLE sales_owner    TO PRINCIPAL svc_sales_prod;
GRANT ROLE data_platform  TO PRINCIPAL svc_platform_prod;

Step-by-step trace.

Step	What	Effect
1	one catalog per team + one shared `dim`	isolation by name + by storage prefix
2	per-team owner role with `MANAGE` on team catalog	full self-service inside the team's catalog
3	per-team owner also has `SELECT` on shared `dim`	discoverable cross-tenant joins without copies
4	platform team has `MANAGE` only on `dim`, not tenant catalogs	governance separation: platform manages shared, tenants own their own
5	each principal sees only its catalogs in `listCatalogs`	discoverability + isolation in one model

After five steps, 12 teams each have their own catalog, can self-serve namespaces and tables, can read the shared dim layer without copying, cannot read each other's data, and the platform team controls the shared dim without intruding on tenant catalogs.

Output:

Principal	Catalogs visible	`MANAGE` on	`SELECT` on shared dim
`svc_sales_prod`	`sales`, `dim`	`sales`	yes
`svc_marketing_prod`	`marketing`, `dim`	`marketing`	yes
`svc_platform_prod`	every catalog	`dim` only	yes (own)
any other	only own catalog + `dim`	own only	yes

Why this works — concept by concept:

catalog-as-tenant — one catalog per team gives a clean cost-allocation unit (storage prefix == billing key), a clean blast-radius unit (a misconfiguration affects one catalog), and a clean discovery unit (the team's catalog is what they see when they listCatalogs).
role inheritance — roles bundle grants. Adding a new team member is one GRANT ROLE statement; offboarding is one REVOKE. No per-table fiddling.
principal granularity — separate service-account principals per environment (svc_sales_prod, svc_sales_dev) keep secrets and audit trails distinct. The principal is the unit of secret rotation.
cross-catalog grants — GRANT SELECT ON CATALOG dim lets every team join against shared dimensions without copying them into per-tenant catalogs. This is the killer pattern for the "single shared dim, many fact catalogs" architecture.
governance separation — platform team has manage rights only on the shared dim catalog. Tenant teams have manage rights only on their own catalog. Neither can step on the other's toes.
Cost — O(catalogs) storage prefixes, O(catalogs) IAM roles, O(roles + grants) rows in the Polaris metadata DB. No per-table overhead — even 200 tables per team is comfortable for a single Polaris instance.

SQL
Topic — design
System design problems (SQL)

Practice →

5. Picking a catalog — engine support, multi-tenancy, governance trade-offs

Three catalogs, four axes — pick by the one that bites first

The mental model in one line: the three open Iceberg catalogs (vanilla REST, Nessie, Polaris) score differently on four axes — engine fit, multi-tenancy, versioning, governance — and the right answer is the one that nails whichever axis is most painful in your platform today. Once the axes are explicit, the choice stops being about marketing fit and starts being about which trade-off you can live with.

The feature matrix.

Catalog	Engine fit	Multi-tenancy	Versioning	Governance
REST (vanilla) — Tabular / Glue / DIY	strong (built into every engine)	partial (single-catalog, requires app-layer logic)	snapshot only	partial (depends on impl)
Nessie	strong (Iceberg + REST shim)	basic (single-server, branch as isolation)	killer (branches / tags / merge / multi-table atomic)	basic RBAC
Polaris	strong (REST-native)	killer (catalogs + principals + roles + grants)	snapshot only (no branches)	strong (ANSI-flavour GRANT / REVOKE)

When to pick which.

Pick vanilla REST when you have one team, one tenant, no version-control needs, and you want the lightest possible catalog. Glue's REST mode, Tabular, or a small Polaris instance all qualify.
Pick Nessie when WAP is non-negotiable, when multi-table atomicity matters (FK invariants, parent-child counts), or when you want git semantics on top of the lake.
Pick Polaris when multi-tenancy is non-negotiable, when you need ANSI-style RBAC, when you serve multiple product teams or external customers, or when you want a Snowflake-native Iceberg catalog.

The hybrid pattern.

Most large enterprises in 2026 end up running Polaris in front of Nessie-backed branches via a federation layer (Gravitino, custom proxy). The line between the two is blurring as REST becomes the lingua franca — Nessie 0.80+ already speaks the REST spec, and the Polaris community is openly discussing branch / tag semantics for future versions.

The 2026 reality.

REST is the lingua franca. Every serious catalog speaks it; the choice between catalogs is now mostly about features, not protocol.
Polaris vs Unity is the new vendor decision. Snowflake-aligned shops pick Polaris; Databricks-aligned shops pick Unity. Both publish via REST, so the engine layer is identical.
Nessie owns the WAP + multi-table niche. No other open catalog ships true multi-table atomic commits.
Glue's REST mode is the easy migration. AWS-native shops with existing Glue tables flip a config flag and gain REST clients without re-importing.

Worked example — three-tenant analytics platform

Detailed explanation. A data platform team supports three product teams — sales analytics, marketing analytics, finance — on one lake. Each team has 30-80 tables. The shared dim layer holds customer / product / calendar. Tenant isolation and per-team billing are explicit requirements.

Question. Pick a catalog and justify the choice on each of the four axes.

Input.

Requirement	Priority
3 teams + shared dim	high
isolation between teams	high
ANSI-style RBAC	high
Spark + Trino + Snowflake reads	high
WAP or multi-table atomic	low
branches / tags	low

Code.

Decision: Polaris

Justification by axis:
  engine fit     — strong; Spark + Trino + Snowflake all speak Polaris's REST surface
  multi-tenancy  — killer; one catalog per team + a shared dim catalog
  versioning     — sufficient; snapshot-level rollback is acceptable, no WAP needed
  governance     — strong; ANSI-style GRANT / REVOKE, per-catalog billing, vended creds

Step-by-step explanation.

Multi-tenancy is the highest priority and Polaris's catalogs / principals / roles / grants model is the cleanest fit. Per-tenant catalogs give per-tenant billing and per-tenant blast-radius isolation for free.
ANSI-style RBAC is required because the security team wants GRANT SELECT ON ... audit trails. Polaris ships exactly that surface; Nessie's RBAC is more basic.
WAP and multi-table atomicity are low-priority — the ETL team is happy with snapshot-level rollback and single-table commits. Nessie's killer features are wasted on this team.
Engine fit is uniform across all three options for the engines listed.

Output.

Axis	Polaris	Nessie	REST (plain)
Multi-tenancy	win	partial	partial
Governance	win	basic	partial
Versioning	OK	overkill	OK
Engine fit	tied	tied	tied

Rule of thumb. Multi-tenancy + RBAC → Polaris. If the requirement list opens with "isolate N teams on one lake," Polaris is the default.

Worked example — heavy ETL team needing WAP

Detailed explanation. A data team runs 200 ETL jobs per day. The CFO has asked for bulletproof rollback semantics after a previous quarter's reporting blew up due to a half-applied ETL. The data team owns one catalog; there is no multi-tenancy requirement.

Question. Pick a catalog and justify the choice on each of the four axes.

Input.

Requirement	Priority
WAP per ETL job	high
multi-table atomic commits	high
instant rollback (no data movement)	high
single-team usage	high
ANSI-style RBAC	low
12+ tenants	low

Code.

Decision: Nessie

Justification by axis:
  engine fit     — strong; Spark + Flink + Trino all speak Nessie + REST shim
  multi-tenancy  — basic; not needed for one team
  versioning     — killer; branches + tags + multi-table atomic commits
  governance     — basic RBAC, sufficient for one team

Step-by-step explanation.

Versioning is the highest priority. Nessie's branch / tag / merge / multi-table atomic commit model is the only open catalog that ships this fully.
Multi-tenancy is irrelevant — one team, no per-tenant billing required.
RBAC requirements are basic — Nessie's branch-level access rules are sufficient for one team.
Engine fit is uniform; Spark / Flink / Trino all work fine.

Output.

Axis	Polaris	Nessie	REST (plain)
Versioning	snapshot only	win	snapshot only
Multi-table atomic	no	win	no
Multi-tenancy	overkill	OK	OK
Governance	overkill	OK	OK

Rule of thumb. WAP or multi-table atomic → Nessie. If the requirement list opens with "we can never have a half-applied ETL," Nessie is the default.

Worked example — small team standardising on Spark + Trino

Detailed explanation. A small data team standardises on Spark for ETL and Trino for analytics. There is one tenant, no WAP requirement, and they want the lowest-operational-cost catalog they can get. AWS-native shop.

Question. Pick a catalog and justify the choice on each of the four axes.

Input.

Requirement	Priority
lowest operational cost	high
Spark + Trino read/write	high
AWS-native	high
1 tenant	high
WAP	low
multi-tenancy	low

Code.

Decision: AWS Glue REST mode (vanilla REST)

Justification by axis:
  engine fit     — strong; Spark + Trino out of the box
  multi-tenancy  — N/A; single tenant
  versioning     — snapshot only, acceptable
  governance     — IAM-backed; vended creds via Glue + STS

Step-by-step explanation.

Lowest operational cost is the highest priority. Glue is fully managed by AWS; there is nothing to run.
Single tenant means Polaris's multi-tenant model is overkill.
No WAP requirement means Nessie's git features are unused.
AWS-native shop already trusts IAM for everything else; Glue's IAM-backed authorisation aligns with that.

Output.

Axis	Polaris	Nessie	Glue REST
Ops cost	medium	medium	win (managed)
Engine fit	tied	tied	tied
Multi-tenancy	overkill	OK	OK
Versioning	OK	overkill	OK

Rule of thumb. If the team is one tenant, AWS-native, and snapshot-level rollback is enough, Glue REST mode is the cheapest path. Polaris is overkill; Nessie's features are wasted.

Senior engineering interview question on catalog decision

A senior interviewer might frame this as: "You inherit a Hive-metastore-based lakehouse with 4 teams and 200 tables. The CTO wants you to pick a target catalog this quarter. Walk me through the decision matrix and your final recommendation."

Solution Using a tiered axis-first decision

Decision rubric — score each axis against the platform's needs

1. ENGINE FIT
   - Spark, Trino, Snowflake (read-only) → all speak REST → tied
   - Pick: any REST-spec catalog

2. MULTI-TENANCY
   - 4 teams + 1 shared dim → per-tenant catalogs + cross-tenant SELECT
   - Pick: Polaris (catalogs as tenants)

3. VERSIONING
   - WAP per ETL job desired but not yet mandated
   - Nice-to-have: branches / tags
   - Pick: Polaris (snapshot rollback) — escalate to Nessie federation if
     mandate arrives

4. GOVERNANCE
   - Security team wants ANSI-style GRANT / REVOKE
   - Pick: Polaris

FINAL DECISION: Polaris on the control plane, with the option to layer
Nessie behind a subset of catalogs if WAP becomes a hard requirement.
Hive metastore migrates table-by-table via Iceberg's snapshot import.

Step-by-step trace.

Axis	Need	REST plain	Nessie	Polaris
engine fit	Spark / Trino / Snowflake	tied	tied	tied
multi-tenancy	4 teams + shared dim	partial	basic	win
versioning	WAP nice-to-have	snapshot	overkill	snapshot
governance	ANSI RBAC	partial	basic	win
ops cost	run one service	low	medium	medium
final	—	OK	mismatched priority	best fit

Output:

Axis	Winner	Rationale
Multi-tenancy	Polaris	catalogs-as-tenants gives clean isolation + billing
Governance	Polaris	ANSI-style GRANT / REVOKE matches security team's mental model
Versioning	Polaris	snapshot rollback is enough today; WAP can be added later
Engine fit	tied	REST spec means every engine works
Final	Polaris	wins on the axes that bite

Why this works — concept by concept:

engine-first axis — start by enumerating every engine that will touch the lake. If they all speak REST, every modern catalog is on the table; if one only speaks HMS, you have a migration problem regardless of catalog choice.
tenant-first axis — count the teams + the cost-allocation requirement. More than 2 tenants + per-tenant billing → Polaris. One tenant → Glue / Tabular is enough.
versioning-first axis — if WAP is contractual (regulatory, audit, end-of-month reconciliation), Nessie wins. If "snapshot rollback within 24h" is enough, the snapshot-only catalogs are fine.
governance-first axis — does the security team want ANSI GRANT / REVOKE audit trails? Polaris ships them. Does the security team only care about IAM? Glue is enough.
hybrid escape hatch — even after picking a primary, you can layer Nessie behind a subset of catalogs via federation if WAP becomes mandatory later. The decision is reversible because REST is the lingua franca.
Cost — O(4) axes to score, O(3) catalogs to score, plus one operational-cost tie-breaker. The whole decision fits on one whiteboard and on one slide.

SQL
Topic — dimensional-modeling
Lakehouse modelling problems (SQL)

Practice →

Catalog cheat sheet — REST, Nessie, Polaris recipes

List namespaces over HTTP. GET /v1/{prefix}/namespaces with Authorization: Bearer <token>. Same call from curl, Spark, Trino, DuckDB.
Load a table + get vended creds. GET /v1/{prefix}/namespaces/{ns}/tables/{tbl}. Response includes metadata, metadata-location, and a credentials block.
Atomic commit. POST /v1/{prefix}/namespaces/{ns}/tables/{tbl}/commits with requirements: [assertTableUUID, assertRefSnapshotId] and updates: [add-snapshot, set-current-snapshot]. CAS failure returns 409.
OAuth2 token exchange. POST /v1/oauth/tokens with grant_type=client_credentials. Bearer expires in 1h; refresh by re-posting.
Nessie WAP. CREATE BRANCH dev FROM main → write → audit on dev → MERGE BRANCH dev INTO main. Drop on failure for O(1) rollback.
Nessie tag. CREATE TAG release_YYYY_MM_DD IN nessie FROM main. Immutable; protect with access rules.
Nessie rollback. ASSIGN BRANCH main IN nessie TO TAG release_YYYY_MM_DD. O(1) metadata write.
Nessie multi-table commit. Multiple INSERTs on one branch + one MERGE. Either all tables advance or none do.
Polaris tenant onboarding. CREATE CATALOG t_team + CREATE ROLE team_owner + GRANT MANAGE ON CATALOG t_team TO ROLE team_owner + CREATE PRINCIPAL svc_team_prod + GRANT ROLE team_owner TO PRINCIPAL svc_team_prod.
Polaris cross-catalog grant. GRANT SELECT ON ALL TABLES IN CATALOG dim TO ROLE team_owner. Lets the team read shared dims without copies.
Glue REST mode. Set iceberg.rest-catalog.uri=https://glue.{region}.amazonaws.com/iceberg and use IAM SigV4 auth. No re-import required.
Pick the catalog by the axis that bites. Multi-tenancy → Polaris. WAP / multi-table → Nessie. Single team + AWS-native → Glue REST.

Frequently asked questions

What is the Iceberg REST catalog?

The Iceberg REST catalog is an OpenAPI-3 specification published alongside the Apache Iceberg project that defines the HTTP surface for an Iceberg-aware metadata store. It standardises endpoints for createNamespace, listTables, loadTable, commitTable (with optimistic compare-and-set), vendCredentials, and OAuth2 token exchange. Every modern catalog — vanilla Tabular, AWS Glue REST mode, Snowflake's Polaris, Project Nessie 0.80+, Databricks Unity — implements this spec, and every modern engine (Spark, Trino, Flink, DuckDB, ClickHouse, Snowflake, BigQuery) speaks it as a client. The point of the spec is that picking a catalog becomes a configuration change rather than a re-import.

How is Project Nessie different from a vanilla REST catalog?

Nessie adds git semantics on top of the Iceberg catalog surface. The four primitives — branches, tags, atomic multi-table commits, and merge / cherry-pick — turn the catalog into a version-controlled reference store rather than a snapshot log. A Nessie commit can advance many tables at once (multi-table atomic), which a vanilla REST catalog cannot, and any reference can be reassigned in O(1) for instant rollback. Nessie 0.80+ also speaks the Iceberg REST spec, so the same server can serve git-style clients (Spark + Nessie SQL, Dremio Sonar) and vanilla REST clients (Trino, DuckDB, Snowflake) simultaneously.

Is Apache Polaris open source?

Yes. Apache Polaris is an Apache Software Foundation incubator project, donated by Snowflake in mid-2024 and now governed by the ASF rather than by a single vendor. The server is Java, the metadata store is Postgres, and the wire protocol is the standard Iceberg REST spec. Snowflake offers a fully managed Polaris service for customers who do not want to run it themselves, but self-hosted Polaris is identical to the managed version. Community drivers from Spark, Trino, Flink, and Dremio all use the open REST spec — there is no Polaris-specific protocol underneath.

Can one engine talk to multiple catalogs?

Yes — every modern engine supports multiple catalog registrations and can join across them. Spark uses session-level catalog configs (spark.sql.catalog.prod, spark.sql.catalog.dim, etc.). Trino registers one connector instance per catalog. Flink supports CREATE CATALOG for each. The engine then resolves prod.sales.orders and dim.customer against their respective catalogs and joins them in a single query. This is how the hybrid pattern works in practice: register Polaris and Nessie side-by-side, route tenant tables through Polaris and WAP-critical tables through Nessie, and let the engine join across them transparently.

Do I still need a Hive metastore?

In 2026, no — for new lakehouse builds. Every major engine treats HMS as a legacy connector and ships an Iceberg REST connector as the default. HMS lacks atomic CAS on the current snapshot pointer and cannot vend short-lived credentials, so it fails the two foundational tests of a real lakehouse catalog. Existing HMS deployments can migrate table-by-table via Iceberg's snapshot import: register the HMS table in the REST catalog (Polaris / Nessie / Glue REST), point engines at the new catalog name, and decommission the HMS row once readers have moved.

Which catalog should I pick for a multi-tenant analytics platform?

Apache Polaris — its catalogs / principals / roles / grants model is purpose-built for multi-tenant Iceberg. Onboarding a new team is CREATE CATALOG t_team + CREATE ROLE team_owner + GRANT MANAGE ON CATALOG t_team TO ROLE team_owner + CREATE PRINCIPAL svc_team_prod + GRANT ROLE team_owner TO PRINCIPAL svc_team_prod. Cross-tenant sharing is one GRANT SELECT ON ALL TABLES IN CATALOG dim per tenant. Per-tenant billing is trivially clean because each catalog has its own S3 prefix. Vended credentials kill the long-lived-IAM-key blast-radius problem. If you also need WAP or multi-table atomic commits, layer Nessie behind a subset of catalogs via federation — the REST spec makes this hybrid feasible.

Practice on PipeCode

Drill the dimensional modelling practice library → for the fact / dimension shapes that catalogs serve.
Rehearse on ETL design problems → for WAP, atomic commits, and the cross-engine ingest patterns.
Sharpen system design drills → for multi-tenant catalog architecture.
Layer the event modelling library → for the streaming sinks that Flink writes into Iceberg.
Stack the slowly changing data practice → for the SCD-2 patterns that benefit most from multi-table atomic commits.
For the broader surface, read top data engineering interview questions →.
Stack the prerequisites with the only 5 skills you need to become a data engineer →.
Sharpen the design axis with the ETL system design for data engineering interviews course →.
For long-form schema craft, work through data modelling for DE interviews →.

Pipecode.ai is Leetcode for Data Engineering — every catalog pattern above ships with hands-on practice rooms where you write the multi-table SCD-2, the WAP branch, and the per-tenant Polaris GRANT against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you never have to wonder whether your Iceberg REST catalog answer actually holds up under a senior interviewer's "what happens if two writers race?" probe.

Practice dimensional modelling now →
ETL design drills →