DEV Community

Prithvi S

How Apache Polaris Vends Credentials: Securing Data Access Without Sharing Keys

The modern data warehouse demands a fundamental shift in how we think about access control. When you build multi-tenant systems at scale, the traditional approach - distributing long-lived API keys or database credentials - becomes a security nightmare. Apache Polaris solves this elegantly: vend temporary, scoped credentials on demand, revoke access in one place, and audit every grant.

The Problem: Why Long-Lived Credentials Don't Scale

At Netflix, Cloudera, or any major data platform, you're managing access across hundreds of users, services, and applications. If you hand out permanent API keys:

  • Revocation is slow - a compromised key stays valid until someone manually rotates it
  • Audit trails are fuzzy - you don't know which key accessed which data when
  • Compliance is painful - SOC2, HIPAA, PCI-DSS demand temporal, traceable access
  • Key rotation is a nightmare - updating thousands of clients, coordinating across teams
  • Access never expires - a key that works today still works tomorrow, even if access should have ended

This is why cloud providers moved away from permanent credentials. AWS uses temporary STS tokens. GCP uses short-lived access tokens. Azure has managed identities. The pattern is clear: trust should be ephemeral, scoped, and revocable.

Polaris applies this principle to data catalogs and table access.

How Polaris Vends Credentials

Polaris is an open-source, REST-first Iceberg catalog that implements the Iceberg REST API. Unlike traditional catalogs (which require direct database access or assume clients hold long-lived storage credentials), Polaris can mint temporary storage credentials each time a client loads a table.

Here's the flow:

1. Authorization Check (Who Are You? What Can You Do?)

When a client requests data access, Polaris first checks:

  • Is this principal (user/service) authenticated?
  • Do they have a role with permission to access this table?
  • Is the access read-only or read-write?

This uses Polaris's two-tier RBAC model:

  • Principal Roles - assigned to principals (user or service identities)
  • Catalog Roles - hold the actual privileges (TABLE_READ_DATA, TABLE_WRITE_DATA, etc.) and are granted to principal roles

Example: A data analyst gets the principal role analyst_prod, which is granted a catalog role holding TABLE_READ_DATA on catalog.sales.transactions. A service account gets the principal role etl_writer, which is granted a catalog role holding TABLE_WRITE_DATA on catalog.etl.staging.
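The two-tier lookup can be modeled in a few lines. This is a minimal sketch of the idea, not Polaris's implementation: the class names, the `is_allowed` helper, and the `sales_reader` catalog role are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRole:
    """Holds privileges on specific tables, e.g. TABLE_READ_DATA."""
    name: str
    # Maps a fully qualified table name to the privileges granted on it.
    grants: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class PrincipalRole:
    """Assigned to principals; carries one or more catalog roles."""
    name: str
    catalog_roles: list[CatalogRole] = field(default_factory=list)


def is_allowed(principal_roles: list[PrincipalRole], table: str, privilege: str) -> bool:
    """Return True if any catalog role reachable from the principal grants the privilege."""
    return any(
        privilege in catalog_role.grants.get(table, set())
        for principal_role in principal_roles
        for catalog_role in principal_role.catalog_roles
    )


# Hypothetical setup mirroring the analyst example.
sales_reader = CatalogRole("sales_reader", {"catalog.sales.transactions": {"TABLE_READ_DATA"}})
analyst_prod = PrincipalRole("analyst_prod", [sales_reader])

print(is_allowed([analyst_prod], "catalog.sales.transactions", "TABLE_READ_DATA"))
print(is_allowed([analyst_prod], "catalog.sales.transactions", "TABLE_WRITE_DATA"))
```

The indirection is the point: privileges are attached to catalog roles once, and changing who can use them means editing role assignments rather than touching any credential.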

2. Storage Configuration Lookup

Polaris queries its configuration:

  • Which cloud provider hosts this table? (AWS S3, GCS, Azure Blob)
  • What credentials should be used for minting? (Polaris's service role)
  • Are there any table-specific overrides?

3. Credential Minting

Here's where the magic happens. Polaris calls the cloud provider's API to mint temporary, scoped credentials:

For AWS (S3):

assume-role(
  role_arn=polaris-service-role,
  session_name=client-session-xyz,
  duration_seconds=900,
  session_policy=allow-only-s3://bucket/table-path/*
)

Returns: temporary AWS credentials (access key ID, secret access key, and session token) valid for 15 minutes and limited by the session policy to the specific table path.
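To make the path restriction concrete, here is a sketch of what an IAM session policy for one table prefix could look like. The `build_session_policy` helper and the bucket and prefix values are illustrative assumptions; the exact policy Polaris generates may differ.

```python
import json


def build_session_policy(bucket: str, table_prefix: str, allow_write: bool) -> dict:
    """Build an IAM session policy limited to one table's prefix in one bucket."""
    prefix = table_prefix.strip("/")
    object_actions = ["s3:GetObject"]
    if allow_write:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access under the table prefix only.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, so it is narrowed with a prefix condition.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


policy = build_session_policy("bucket", "table-path/", allow_write=False)
print(json.dumps(policy, indent=2))
```

The serialized policy would be passed as the `Policy` parameter of STS `AssumeRole` together with `DurationSeconds=900`. A session policy can only narrow what the assumed role already allows, so the resulting credentials get the intersection of the two.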

For GCS:

generate-downscoped-token(
  service_account=polaris-sa@project.iam.gserviceaccount.com,
  access_boundary={ "resource": "gs://bucket", "prefix": "table-path/" }
)

Returns: a short-lived OAuth 2.0 access token, restricted by a Credential Access Boundary to the table path. No service account key is created; those are long-lived and would defeat the purpose.
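On Google Cloud, one documented way to narrow an access token to part of a bucket is a Credential Access Boundary. The sketch below builds such a boundary for a table prefix; the `build_access_boundary` helper is hypothetical, and treating this as the mechanism behind the GCS step is an assumption of this example rather than something the article states.

```python
def build_access_boundary(bucket: str, table_prefix: str, allow_write: bool) -> dict:
    """Build a Credential Access Boundary limited to objects under one prefix."""
    prefix = table_prefix.strip("/")
    # Assumption: read maps to objectViewer and write to objectAdmin.
    role = "roles/storage.objectAdmin" if allow_write else "roles/storage.objectViewer"
    return {
        "accessBoundary": {
            "accessBoundaryRules": [
                {
                    "availableResource": f"//storage.googleapis.com/projects/_/buckets/{bucket}",
                    "availablePermissions": [f"inRole:{role}"],
                    "availabilityCondition": {
                        # CEL expression: only objects whose name starts with the table prefix.
                        "expression": (
                            "resource.name.startsWith("
                            f"'projects/_/buckets/{bucket}/objects/{prefix}/')"
                        )
                    },
                }
            ]
        }
    }


boundary = build_access_boundary("bucket", "table-path", allow_write=False)
rule = boundary["accessBoundary"]["accessBoundaryRules"][0]
print(rule["availabilityCondition"]["expression"])
```

This sketch covers object reads and writes only. Listing objects under the prefix needs an additional condition, which is omitted here for brevity.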

For Azure:

generate-user-delegation-sas(
  storage_account=zzz,
  container=table-path,
  expiry=15m,
  permissions=read,list
)

Returns: a short-lived SAS token, signed with a user delegation key and valid for 15 minutes, scoped to the container.

4. Scope Restriction

The credentials are scoped to:

  • Path - exact table location (e.g., s3://data/catalog/table/)
  • Operations - read-only (GET, LIST) vs read-write (adds PUT, DELETE)
  • Duration - a short lifetime, 15 minutes in the examples here (configurable)

A client can't use these credentials to:

  • Access other tables
  • Write data to a read-only table
  • Perform actions after expiration
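The three restrictions can be expressed as a single check. In practice the cloud provider enforces them, not the client, so the following is only a model of the rules; `ScopedCredential` and `permits` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

READ_OPS = frozenset({"GET", "LIST"})
WRITE_OPS = frozenset({"PUT", "DELETE"})


@dataclass(frozen=True)
class ScopedCredential:
    path_prefix: str      # Ends with "/" so that "table/" does not also match "table2/".
    allow_write: bool
    expires_at: datetime


def permits(cred: ScopedCredential, operation: str, path: str, now: datetime) -> bool:
    """Apply the duration, path, and operation restrictions in turn."""
    if now >= cred.expires_at:
        return False
    if not path.startswith(cred.path_prefix):
        return False
    allowed = (READ_OPS | WRITE_OPS) if cred.allow_write else READ_OPS
    return operation in allowed


issued = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
cred = ScopedCredential("s3://data/catalog/table/", False, issued + timedelta(minutes=15))
soon = issued + timedelta(minutes=5)

print(permits(cred, "GET", "s3://data/catalog/table/data/f.parquet", soon))   # in scope
print(permits(cred, "GET", "s3://data/catalog/other/data/f.parquet", soon))   # other table
print(permits(cred, "PUT", "s3://data/catalog/table/data/f.parquet", soon))   # write on read-only
print(permits(cred, "GET", "s3://data/catalog/table/data/f.parquet", issued + timedelta(minutes=16)))  # expired
```

Only the first call succeeds; each of the other three fails on exactly one of the restrictions listed above.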

5. Return to Client

Polaris returns the temporary credentials to the client. The client's query engine (Spark, Trino, Presto, DuckDB, etc.) receives these credentials and uses them to read/write data directly to object storage.

No long-lived secrets are distributed. The client never sees Polaris's service credentials.
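On the wire, an Iceberg REST client asks for vended credentials by sending the `X-Iceberg-Access-Delegation: vended-credentials` header when it loads a table, and for S3 the credentials come back as FileIO properties in the response's `config` map. The sketch below pulls them out of a response; the sample payload is hypothetical and heavily trimmed, and `extract_s3_credentials` is an illustrative helper, not part of any client library.

```python
import json

# Hypothetical, trimmed loadTable response; real responses also carry table metadata.
SAMPLE_LOAD_TABLE_RESPONSE = json.loads("""
{
  "metadata-location": "s3://bucket/table-path/metadata/v1.metadata.json",
  "config": {
    "s3.access-key-id": "ASIAEXAMPLE",
    "s3.secret-access-key": "example-secret",
    "s3.session-token": "example-session-token"
  }
}
""")

# Iceberg S3 FileIO property name -> keyword name used by common AWS SDK sessions.
S3_KEYS = {
    "s3.access-key-id": "aws_access_key_id",
    "s3.secret-access-key": "aws_secret_access_key",
    "s3.session-token": "aws_session_token",
}


def extract_s3_credentials(load_table_response: dict) -> dict:
    """Return the vended S3 credentials, or raise if the response carries none."""
    config = load_table_response.get("config", {})
    missing = [key for key in S3_KEYS if key not in config]
    if missing:
        raise KeyError(f"response has no vended S3 credentials: missing {missing}")
    return {name: config[key] for key, name in S3_KEYS.items()}


creds = extract_s3_credentials(SAMPLE_LOAD_TABLE_RESPONSE)
print(sorted(creds))
```

Query engines do this internally, so application code normally never touches these values. The sketch just shows that what the client holds is a session-scoped triple, not a permanent key.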

Why This Matters

Security Benefits

  1. Fast Revocation - remove a principal's role and new credential requests are denied immediately; credentials already issued expire within minutes
  2. Fine-Grained Access - per-table, per-operation, per-principal permissions
  3. Auditability - every credential mint event is logged (who, when, which table, read/write)
  4. Compliance-Ready - temporal credentials, immutable audit trails, no shared secrets
  5. Blast Radius - if a credential leaks, it's only valid for 15 minutes and only for one table

Operational Benefits

  1. No Manual Rotation - credentials expire on their own and fresh ones are minted on demand
  2. No Key Distribution - no need to distribute, store, or rotate permanent keys
  3. Multi-Cloud Ready - same API works with S3, GCS, Azure, MinIO
  4. Client Simplicity - clients just receive credentials and query - they don't manage them

Business Benefits

  1. Compliance Aligned - supports the time-bound, traceable access controls that SOC2, HIPAA, PCI-DSS, and FedRAMP audits look for
  2. Cost Control - audit who accessed what, charge accordingly
  3. Governance - enforce data mesh principles (teams own their data, Polaris mediates access)

Real-World Example: Data Mesh at Scale

Imagine you're running a data mesh with 50 teams, each owning their own datasets. Without Polaris:

  • Each team issues permanent API keys to consumers
  • Keys spread across configuration files, CI/CD pipelines, notebooks
  • A leaked key compromises an entire dataset
  • Revocation requires manual updates across dozens of systems
  • Audit trails are incomplete (keys used by multiple systems)

With Polaris:

  • Each consumer requests access via Polaris API (authenticated via OIDC, OAuth2, mTLS)
  • Polaris checks if consumer's identity has permission
  • Polaris mints a 15-minute credential scoped to the specific table and operation
  • Consumer queries the data with the temporary credential
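  • Once the credential expires, the consumer's next request triggers a fresh permission check and a newly minted credential
  • Revoke a consumer's role, and new credential requests fail immediately; anything already issued lapses within 15 minutes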
  • Audit logs show exactly which identity accessed which table at what time
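From the consumer's side, the loop above amounts to caching a credential and fetching a new one shortly before it expires. A minimal sketch, assuming a caller-supplied `fetch` function that performs the Polaris request; `CredentialCache`, `VendedCredential`, and the 60-second margin are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VendedCredential:
    token: str
    expires_at: float  # Seconds, measured on the same clock as `now`.


class CredentialCache:
    """Reuse a vended credential until it is close to expiry, then fetch a new one."""

    def __init__(self, fetch: Callable[[], VendedCredential],
                 now: Callable[[], float], refresh_margin: float = 60.0):
        self._fetch = fetch
        self._now = now
        self._margin = refresh_margin
        self._current: Optional[VendedCredential] = None

    def get(self) -> VendedCredential:
        if self._current is None or self._now() >= self._current.expires_at - self._margin:
            # Each fetch goes back through the catalog, so a revoked role fails here.
            self._current = self._fetch()
        return self._current


# Demo with a fake clock and a fake 15-minute (900 s) credential source.
clock = [0.0]
fetch_times = []


def fake_fetch() -> VendedCredential:
    fetch_times.append(clock[0])
    return VendedCredential(f"token-{len(fetch_times)}", clock[0] + 900)


cache = CredentialCache(fake_fetch, lambda: clock[0])
first = cache.get().token
clock[0] = 600.0
second = cache.get().token   # Still more than 60 s from expiry: reused.
clock[0] = 850.0
third = cache.get().token    # Within 60 s of expiry: refreshed.
print(first, second, third)
```

Because every refresh re-runs the authorization check, the credential lifetime is also the upper bound on how long a revoked consumer can keep reading.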

Version 1.3.0 Features (January 2026)

Recent Polaris releases added:

  1. Federated Credential Vending - Polaris can mint credentials for external catalogs (Snowflake, AWS Glue) instead of clients using their own credentials
  2. OPA Integration - externalize authorization logic to Open Policy Agent for complex policies
  3. Generic Tables - support Delta Lake and Hudi alongside Iceberg
  4. Metrics Reporting - pluggable framework to report table metrics (row/byte counts, commits)

Getting Started

Polaris is developed at the Apache Software Foundation.

If you're building a data platform, data mesh, or multi-tenant system, Polaris's credential vending model is worth studying. It's a pattern that applies beyond Iceberg - any system managing shared resource access can benefit from temporal, scoped credentials.


About the author: I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. Follow my work on GitHub.
