Alex Merced

Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)


Over the past two weeks, the Apache Iceberg, Polaris, Arrow, and Parquet mailing lists saw active development and discussion. This digest highlights notable design discussions, release planning, and community updates from each project, so you can keep up with how the lakehouse ecosystem is evolving.

Apache Iceberg

☕ Java 17 Minimum Requirement

Jean-Baptiste Onofré proposed raising the minimum supported Java version for Iceberg to JDK 17. This proposal received widespread agreement and is expected to move forward.

  • Why it matters: Enables modern Java features and aligns Iceberg with other cutting-edge data infrastructure projects.
  • Impact: Users still on Java 11 should begin preparing for an upgrade.

Discussion thread

Format V4: Indexing and Commit Optimizations

Design conversations around Iceberg Format V4 continued with emphasis on two areas:

  • Native Indexing Support: A revived discussion explored integrating indexing directly into the Iceberg table format to support faster lookups.

Indexing proposal thread

  • One-File Commit Proposal: This discussion began earlier but wrapped up during this period. The aim is to reduce commit overhead by consolidating manifests into a single file.

One-file commit thread

REST Catalog Enhancements

Several key REST Catalog improvements were discussed and voted on:

  • ETag Support: Added to CommitTableResponse for optimistic concurrency.
  • Idempotency Keys: Introduced to safely retry REST operations.
  • HTTP 429 Standardization: Formalized handling of rate limiting.
  • Storage Credentials in Planning Responses: Enables catalogs to return temporary credentials for secure data access.
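
The idempotency-key and HTTP 429 items above pair naturally on the client side. The following is a minimal sketch of that pairing, not the Iceberg REST client: the header name `Idempotency-Key`, the `send` callable, and the backoff policy are assumptions made for illustration, and the actual header and semantics are whatever the linked threads settle on.

```python
import time
import uuid
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)


def retry_delay(resp_headers: Dict[str, str], attempt: int) -> float:
    """Use Retry-After when it is a number of seconds, else back off exponentially."""
    try:
        return float(resp_headers["Retry-After"])
    except (KeyError, ValueError):
        # Header missing, or given as an HTTP date (not handled in this sketch).
        return float(2 ** attempt)


def send_with_retries(
    send: Callable[[Dict[str, str]], Response],  # hypothetical transport function
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    # One key per logical operation, reused on every attempt, so a server that
    # honors the key can recognize a retry instead of applying the change twice.
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # assumed header name
    response: Response = (0, {})
    for attempt in range(max_attempts):
        response = send(dict(headers))
        status, resp_headers = response
        # Retry only on rate limiting (429) and server errors (5xx).
        if status != 429 and status < 500:
            return response
        if attempt < max_attempts - 1:
            sleep(retry_delay(resp_headers, attempt))
    return response
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at the default.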

ETag vote thread

Idempotency keys discussion

Storage credentials planning vote

Flink Connector & View Support

  • FlinkSink Metadata Enhancements

    Proposed: allow writing user-defined stats (e.g. row count) during Flink writes.

  • Register View API

    Early discussion about adding support to register logical views as catalog entities.

Metadata proposal

Register View discussion

👥 Community Updates

  • New PMC Members: Kevin Liu and Matt Topol were added to the Project Management Committee.
  • Meetup Announced: An Iceberg Community Meetup is scheduled for Dec 11 in Amsterdam.
  • Release Activity: Patch release 1.10.1 planned, focusing on bug fixes and stability.

PMC addition announcement

Meetup announcement


Apache Polaris

1.3.0-incubating Release Approved

The Polaris community finalized and approved the release of version 1.3.0-incubating after resolving issues in the initial release candidate. The version includes:

  • Generic Table GA: Graduation of the generic table feature to production-ready status, allowing seamless cataloging of external table formats like Hudi and Delta Lake.
  • Improved Cloud Integration Tests: Strengthens stability in cloud-native environments.
  • Bug Fixes and Reliability Enhancements

Release vote result

RC0 cancellation and RC2 vote

🔔 Event Listener Refactor

Refinements were made to Polaris's catalog event model:

  • Simplified Event Hooks: Deprecated Before/AfterCommitTableEvent in favor of a cleaner notification architecture.
  • Multiple Listener Support: Differentiated notification vs. interceptor behavior to prevent listener conflicts and enhance modularity.
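
To make the notification vs. interceptor distinction concrete, here is a small sketch of the general pattern. It is not Polaris code: `CatalogEvents`, `add_interceptor`, `add_listener`, and `commit` are hypothetical names, and the real design is the one discussed in the threads below. The idea shown is that interceptors may veto an operation, while notification listeners only observe it and cannot affect each other.

```python
from typing import Callable, List

Event = dict
Interceptor = Callable[[Event], bool]  # returns False to veto the operation
Listener = Callable[[Event], None]     # observes only; its result is ignored


class CatalogEvents:
    """Hypothetical event hub separating interceptors from notifications."""

    def __init__(self) -> None:
        self._interceptors: List[Interceptor] = []
        self._listeners: List[Listener] = []

    def add_interceptor(self, fn: Interceptor) -> None:
        self._interceptors.append(fn)

    def add_listener(self, fn: Listener) -> None:
        self._listeners.append(fn)

    def commit(self, event: Event, apply: Callable[[Event], None]) -> bool:
        # Interceptors run before the change and can block it.
        if not all(check(event) for check in self._interceptors):
            return False
        apply(event)
        # Notifications run after the change; one failing listener
        # must not prevent the others from being called.
        for notify in self._listeners:
            try:
                notify(event)
            except Exception:
                continue
        return True
```

Keeping the two lists separate is what lets several notification listeners coexist safely: none of them can change the outcome of the commit.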

Listener simplification thread

Interceptors vs. notifications discussion

♻️ Idempotency and Retry Support

In alignment with Iceberg's enhancements, Polaris initiated discussions to:

  • Introduce Idempotency Keys in commit APIs
  • Ensure safe retry mechanisms in case of transient failures
  • Enhance robustness of the core catalog and REST APIs

Idempotency design thread

☁️ AWS Integration Improvements

Several threads focused on expanding Polaris support for AWS authentication patterns:

  • STS AssumeRoleWithWebIdentity: Allows AWS OIDC-based token flows (used in EKS, notebooks, etc.)
  • AWS China ARN & KMS Support: Ensures compatibility with AWS partition differences and encryption configuration.
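
The partition issue is easy to see in the ARN itself: the second colon-separated field is the partition, which is `aws` in the standard regions and `aws-cn` in the China regions. The helpers below are an illustrative sketch only (the function names and the sample account ID are made up, and this is not how Polaris parses ARNs); they show why code that hardcodes the `arn:aws:` prefix breaks outside the default partition.

```python
def arn_partition(arn: str) -> str:
    """Return the partition field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    return parts[1]


def role_arn(partition: str, account_id: str, role_name: str) -> str:
    """Build an IAM role ARN without assuming the 'aws' partition."""
    # IAM is a global service, so the region field is left empty.
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


china_role = role_arn("aws-cn", "123456789012", "demo-role")
```

Here `china_role` is `arn:aws-cn:iam::123456789012:role/demo-role`, and `arn_partition(china_role)` returns `aws-cn`.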

Web Identity auth thread

🛠️ Tooling and Dev Experience

  • Python CLI Packaging: Polaris CLI being prepped for PyPI and nightly releases.
  • Release Automation for Polaris Tools: Scripts and GitHub Actions added to streamline CLI and support package releases.
  • Early UI Proposal: Community exploring a user-friendly UI for catalog introspection and onboarding.

CLI packaging update

👨‍👩‍👧‍👦 Community and Governance

  • Community Sprint: Virtual collaboration event scheduled for Dec 16 to tackle bugs, docs, and onboarding.
  • NoSQL Sync Meeting: Held Dec 2, focusing on extending Polaris capabilities to non-relational workloads.
  • Incubator Progress: Polaris shared updates for its Apache Incubator status and roadmap alignment.

Sprint announcement

Incubator report thread


Apache Arrow

📦 Format Evolution: TimestampWithOffset

The Arrow community voted to add a new canonical type: TimestampWithOffset. This enhancement allows better timezone handling by encoding the UTC offset directly with each timestamp value.

  • Why it matters: Avoids ambiguity in interpreting timestamps across systems with different local times or daylight saving settings.
  • Vote passed unanimously, signaling strong consensus.
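
As a rough illustration of what "encoding the UTC offset with each value" buys you, the sketch below stores a UTC instant together with an offset in minutes. The class name, field names, and the microsecond/minute units are assumptions for this example; the actual storage layout is the one defined in the vote thread and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class TimestampWithOffsetValue:
    """Illustrative pair: a UTC instant plus the local UTC offset."""

    utc_micros: int      # microseconds since the Unix epoch, in UTC
    offset_minutes: int  # local offset from UTC at that instant

    @classmethod
    def from_datetime(cls, dt: datetime) -> "TimestampWithOffsetValue":
        offset = dt.utcoffset()
        if offset is None:
            raise ValueError("datetime must be timezone-aware")
        micros = (dt - EPOCH) // timedelta(microseconds=1)
        return cls(micros, offset // timedelta(minutes=1))

    def to_datetime(self) -> datetime:
        tz = timezone(timedelta(minutes=self.offset_minutes))
        return (EPOCH + timedelta(microseconds=self.utc_micros)).astimezone(tz)


# Two events at the same instant, recorded in different local offsets.
paris = TimestampWithOffsetValue.from_datetime(
    datetime(2025, 12, 1, 9, 30, tzinfo=timezone(timedelta(hours=1))))
new_york = TimestampWithOffsetValue.from_datetime(
    datetime(2025, 12, 1, 3, 30, tzinfo=timezone(timedelta(hours=-5))))
```

Both values share the same `utc_micros`, so they sort and compare as the same instant, while `offset_minutes` (60 and -300) preserves each writer's local wall-clock time.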

Vote thread

🧪 Experimental: 128-bit Timestamps

An earlier thread explored adding support for 128-bit picosecond-level timestamps, aimed at use cases requiring extreme time resolution (e.g., scientific or financial data).

  • This is still under discussion and not yet planned for inclusion.
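
A quick back-of-the-envelope calculation shows why picosecond resolution pushes toward a wider integer. This is plain arithmetic on signed integer ranges, not anything taken from the Arrow proposal itself; the function name is made up for the example.

```python
SECONDS_PER_DAY = 86_400


def signed_range_days(bits: int, ticks_per_second: int) -> float:
    """Days representable on one side of the epoch by a signed integer of the given width."""
    max_ticks = 2 ** (bits - 1) - 1
    return max_ticks / ticks_per_second / SECONDS_PER_DAY


nanos_64 = signed_range_days(64, 10 ** 9)    # about 106,752 days, roughly 292 years
picos_64 = signed_range_days(64, 10 ** 12)   # about 106.75 days
picos_128 = signed_range_days(128, 10 ** 12)  # far beyond any practical date range
```

A signed 64-bit counter of picoseconds covers only about 107 days on either side of the epoch, which is why a picosecond timestamp needs more than 64 bits to be useful for absolute dates.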

Discussion thread

🚀 Language Releases

  • Go: Arrow Go 18.5.0 release candidate (RC0) published and under vote.
  • Rust: Arrow Rust 57.1.0 was recently released, with improvements to bitwise performance being considered.
  • Java: Discussion started on Arrow Java 20.0.0, possibly decoupling its versioning for more agile releases.

Arrow Go RC0 vote

Arrow Rust changelog

Arrow Java release planning

🔄 Governance & Meetings

  • New PMC Chair: Antoine Pitrou, one of Arrow’s co-creators, named new project chair.
  • Community Meetings: Active meetings continued across Arrow working groups, including Arrow-R and general syncs.
  • New Proposal (DACP): Early concept to introduce a “Data Access and Collaboration Protocol” to the Arrow ecosystem.

PMC chair announcement

DACP intro thread


Apache Parquet

🧵 String Column Layout Optimization

Micah Kornfield started a design thread to optimize string/byte array page layouts:

  • Proposed sharing compressed dictionaries across multiple pages using FSST (Fast Static Symbol Table) encoding.
  • Goal: Improve scan speed and reduce CPU overhead for large string columns.
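
To give a feel for the idea, here is a toy symbol-table coder in the spirit of FSST: frequent byte sequences are replaced by one-byte codes, an escape byte marks literals, and a single table is reused for several pages. This is a heavily simplified sketch under stated assumptions. The table is hand-picked rather than learned from a sample, the matching is a naive longest-match scan, and none of it reflects the layout proposed in the thread.

```python
from typing import Sequence

ESCAPE = 255  # marks "the next byte is a literal"


def encode(data: bytes, symbols: Sequence[bytes]) -> bytes:
    """Replace known symbols with their one-byte code; escape everything else."""
    if len(symbols) > ESCAPE or any(len(s) == 0 for s in symbols):
        raise ValueError("need at most 255 non-empty symbols")
    # Try longer symbols first so the greedy match is as long as possible.
    by_length = sorted(range(len(symbols)), key=lambda c: -len(symbols[c]))
    out = bytearray()
    i = 0
    while i < len(data):
        for code in by_length:
            if data.startswith(symbols[code], i):
                out.append(code)
                i += len(symbols[code])
                break
        else:
            out += bytes([ESCAPE, data[i]])
            i += 1
    return bytes(out)


def decode(encoded: bytes, symbols: Sequence[bytes]) -> bytes:
    """Invert encode(): expand codes and copy escaped literals."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out += symbols[encoded[i]]
            i += 1
    return bytes(out)


# One hand-picked table shared by two "pages" of string data.
shared_table = [b"http://", b"www.", b".com", b"example"]
pages = [b"http://www.example.com/a", b"http://example.com/b"]
encoded_pages = [encode(page, shared_table) for page in pages]
```

Because decoding is a simple table lookup per code, individual values can be expanded without decompressing a whole block, and sharing one table across pages avoids paying for the table in every page.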

Discussion thread

☕ Parquet Java 1.17.0 Release Planning

Planning is underway for Parquet Java 1.17.0, which includes:

  • Dropping Java 8, moving to Java 11 minimum
  • Accumulated improvements and minor bug/security patches

Release prep thread

🔍 Metadata Cleanup: Deprecating file_path

A proposal was made to deprecate the file_path field in column chunk metadata:

  • Considered obsolete in modern Parquet workflows
  • Will remain for backward compatibility but no longer actively used

Deprecation thread

📐 Toward Parquet Format V3?

  • Developers expressed intent to finalize all outstanding v2 features (e.g., bloom filters, page checksums).
  • Early hints suggest the community may begin scoping a Parquet V3 format in 2026.

Format v2 finalization thread

📆 Community Syncs

  • Despite the U.S. holiday, the weekly sync on Nov 26 went ahead as planned.
  • Reflects the globally distributed and consistent engagement of Parquet contributors.

Meeting confirmation thread

📌 Final Thoughts

This period marked steady evolution across the lakehouse projects:

  • Iceberg is refining its API and planning for V4 features like native indexing.
  • Polaris is progressing toward graduation with mature features and API resiliency.
  • Arrow continues to invest in format flexibility and multi-language consistency.
  • Parquet is optimizing performance while laying groundwork for future format innovations.

Stay tuned for more updates in the next dev digest!
