Get Data Lakehouse Books:
- Apache Iceberg: The Definitive Guide
- Apache Polaris: The Definitive Guide
- Architecting an Apache Iceberg Lakehouse
- The Apache Iceberg Digest: Vol. 1
Lakehouse Community:
- Join the Data Lakehouse Community
- Data Lakehouse Blog Roll
- OSS Community Listings
- Dremio Lakehouse Developer Hub
The Apache developer mailing lists are where the real work happens. Between October and early November 2025, contributors across Apache Iceberg, Apache Polaris, and Apache Arrow exchanged hundreds of messages shaping the next stage of these cornerstone data projects. For data engineers and architects, these threads reveal not just incremental updates, but the direction of the entire open data ecosystem.
In this period, the Iceberg community balanced community-building with deep technical planning, from organizing the next Iceberg Summit to new proposals on idempotent REST operations and materialized views. Polaris pushed forward with its 1.2.0 release, introducing enterprise-grade features like granular authorization and improved catalog federation. Meanwhile, Arrow continued to demonstrate mature project cadence with the 22.0.0 release, aligning its roadmap more closely with the Python ecosystem.
This blog summarizes the most important conversations from each project’s developer list between October 1 and November 5, 2025. It highlights new proposals, evolving architectures, and the collaborative discussions that keep these projects at the core of modern data platforms.
Apache Iceberg: Building Reliability and Expanding the Format
Apache Iceberg’s developer list reflected a community focused on both reliability and growth. The most active discussions during October and November revolved around improving core behavior, defining new features, and strengthening collaboration across the ecosystem.
Planning the Apache Iceberg Summit
A major highlight was the announcement of the Apache Iceberg Summit, planned for 2026. Contributors proposed forming a volunteer committee to define session tracks and select talks. The initiative drew broad engagement from PMC members and community leaders, signaling Iceberg’s maturity and its growing role as a central open standard for table formats. The summit aims to bring together developers, vendors, and users around practical Iceberg adoption stories and emerging technical challenges.
Idempotent REST Operations
Another key thread centered on idempotent REST mutations. Today, clients like Spark or Flink may resend failed POST requests to a catalog, risking duplicate commits. The proposal introduces idempotency keys, allowing retries without breaking table state. This change would make Iceberg catalogs more reliable in distributed and streaming environments—an important step toward exactly-once semantics across engines and clients.
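To make the idea concrete, here is a minimal Python sketch of the retry pattern the proposal targets, assuming a hypothetical Idempotency-Key header against a placeholder catalog endpoint; the actual field name and where it lives in the REST spec are still being worked out on the list.

```python
import uuid
import requests

# Hypothetical sketch: the header name and exact placement of the key are
# still being discussed on the dev list; "Idempotency-Key" is illustrative only.
CATALOG_COMMIT_URL = "https://catalog.example.com/v1/namespaces/db/tables/events"

def commit_with_retry(commit_body: dict, max_attempts: int = 3) -> requests.Response:
    # One key per logical commit: every retry reuses it, so the catalog can
    # recognize a resend and return the original result instead of applying
    # the change a second time.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            response = requests.post(
                CATALOG_COMMIT_URL, json=commit_body, headers=headers, timeout=30
            )
        except requests.ConnectionError:
            continue  # network failure: safe to retry with the same key
        if response.status_code < 500:
            return response  # success, or a client error that retrying will not fix
    raise RuntimeError(f"commit did not succeed after {max_attempts} attempts")
```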
Materialized Views Proposal
Iceberg developers also opened a design discussion on standardized materialized views. The goal is to define a cross-engine approach for derived datasets that maintain Iceberg’s transactional guarantees. The conversation covered how views should store definitions, how refresh operations work, and how query engines interpret metadata. While still early, this proposal marks Iceberg’s evolution beyond tables toward richer analytical structures.
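Since the design is still open, the sketch below is only a guess at the kind of metadata under discussion: a view definition per dialect, a pointer to the precomputed storage table, and a record of the source snapshots the last refresh used. None of these field names come from the proposal.

```python
# Every field name and value below is a guess for illustration; the proposal
# has not settled on a metadata schema.
materialized_view = {
    "view-definition": {
        "sql": "SELECT region, SUM(amount) AS total FROM db.orders GROUP BY region",
        "dialect": "spark",
    },
    # The precomputed rows would live in an ordinary Iceberg table...
    "storage-table": "db.orders_by_region__storage",
    # ...and freshness would be tracked by recording which source-table
    # snapshots the last refresh was computed from.
    "refresh-state": {"source-snapshots": {"db.orders": 8412775021465793000}},
}

def needs_refresh(view: dict, live_snapshot_ids: dict) -> bool:
    """One way an engine could decide whether to serve the storage table
    or fall back to re-executing the view SQL."""
    recorded = view["refresh-state"]["source-snapshots"]
    return any(live_snapshot_ids.get(tbl) != snap for tbl, snap in recorded.items())
```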
Broader Ecosystem Progress
Complementing these feature discussions, the ecosystem continued to expand:
- PyIceberg 0.10.0 introduced API refinements and improved Arrow integration (see the sketch after this list).
- Iceberg Rust 0.7.0 reached the release candidate stage, strengthening non-Java interoperability.
- Conversations began around dropping Spark 3.4 support and aligning with Java 21, keeping the project current on performance and security.
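As a quick illustration of the PyIceberg-to-Arrow path mentioned above (not tied to any API introduced specifically in 0.10.0), a table scan can be materialized directly as an Arrow table; the catalog URI and table name are placeholders.

```python
from pyiceberg.catalog import load_catalog

# Placeholder catalog URI and table name; point these at your own REST catalog.
catalog = load_catalog("default", uri="https://catalog.example.com")
table = catalog.load_table("analytics.web_events")

# Filter and project on the Iceberg side, then materialize the result as Arrow.
arrow_table = table.scan(
    row_filter="event_date >= '2025-10-01'",
    selected_fields=("event_date", "user_id", "event_type"),
).to_arrow()

print(arrow_table.schema)
```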
Together, these efforts show Iceberg balancing stability with innovation—refining its foundation while preparing for a new phase of adoption across multiple runtimes and catalogs.
Apache Polaris: Strengthening Governance and Enterprise Readiness
While Iceberg focused on format evolution, Apache Polaris — the open catalog for managing Iceberg tables — spent October and November refining its architecture and releasing a major version. The developer list showed a project maturing quickly within the Apache Incubator, with clear progress toward operational robustness and enterprise-grade features.
The 1.2.0 Release
The biggest milestone was the approval of Apache Polaris 1.2.0-incubating. This release brought several production-focused improvements:
- Granular authorization: New privileges such as TABLE_ADD_SNAPSHOT give administrators precise control over table actions (see the sketch after this list).
- Federated catalog RBAC: Fine-grained access control now extends across sub-catalogs when integrating with systems like Hive or Glue.
- Credential reset management API: Operators can now reset principal credentials securely through an optional endpoint.
- Event persistence: Catalog events can be persisted or streamed to external sinks for monitoring or auditing.
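To make the new privilege concrete, the sketch below grants TABLE_ADD_SNAPSHOT to a catalog role over the management API. The endpoint path, payload shape, and names are assumptions for illustration rather than a verified excerpt of the 1.2.0 API, so treat the release documentation as the source of truth.

```python
import requests

# Assumed endpoint path and payload shape, shown only to make the privilege
# concrete; check the Polaris 1.2.0 management API docs for the real contract.
POLARIS_MGMT = "https://polaris.example.com/api/management/v1"
HEADERS = {"Authorization": "Bearer <admin-token>"}

grant = {
    "grant": {
        "type": "table",
        "namespace": ["analytics"],
        "tableName": "web_events",
        "privilege": "TABLE_ADD_SNAPSHOT",  # append snapshots without broader write rights
    }
}

resp = requests.put(
    f"{POLARIS_MGMT}/catalogs/prod_catalog/catalog-roles/etl_writer/grants",
    json=grant,
    headers=HEADERS,
    timeout=30,
)
resp.raise_for_status()
```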
These updates were heavily discussed on the mailing list, with contributors coordinating multiple release candidates, testing integrations, and verifying signatures before the final vote passed. The result is a catalog service that’s more secure, more observable, and easier to integrate with enterprise systems.
The Authorizer Chain Proposal
Polaris developers also debated a major architectural enhancement: introducing a chain of authorizers. The idea allows multiple authorization modules to run sequentially — for example, combining an external engine like Open Policy Agent (OPA) with Polaris’s internal authorizer. Each module can be marked as “required” or “optional,” providing flexible policy enforcement. This design mirrors the modularity of authentication chains in other enterprise frameworks, and was met with strong interest for its extensibility.
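The pattern itself is easy to sketch. The Python below is illustrative only, since Polaris's real interfaces are Java and the exact semantics are still being debated; it shows one plausible composition rule where required modules can veto and any module can grant.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Illustrative sketch of the chained-authorizer idea, not Polaris's actual API.
AuthCheck = Callable[[str, str, str], bool]  # (principal, action, resource) -> allowed?

@dataclass
class ChainLink:
    check: AuthCheck
    required: bool

def authorize(chain: Sequence[ChainLink], principal: str, action: str, resource: str) -> bool:
    granted = False
    for link in chain:
        allowed = link.check(principal, action, resource)
        if link.required and not allowed:
            return False  # a required module denying short-circuits the chain
        granted = granted or allowed
    return granted

# Example: an OPA-style external policy check combined with a built-in check.
chain = [
    ChainLink(check=lambda p, a, r: p != "blocked_user", required=True),
    ChainLink(check=lambda p, a, r: a == "TABLE_READ_DATA", required=False),
]
print(authorize(chain, "alice", "TABLE_READ_DATA", "analytics.web_events"))  # True
```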
Internal Refactoring and Performance
Additional discussions focused on refactoring the metastore manager to be request-scoped, reducing coupling and improving testability. Another proposal sought to add indexes to the JDBC metastore, improving lookup speed for large catalogs. Combined with ongoing work on operational metrics, these efforts aim to make Polaris more performant and transparent for production operators.
Community Growth and Governance
Polaris also saw progress on its incubation journey. The project’s quarterly report reflected steady contributor growth, with new PPMC members added and an active pipeline of features. Mailing list discussions also included naming decisions for the CLI and Python client packages, underscoring the team’s focus on polish and consistency as it approaches graduation.
Polaris’s activity this fall shows a project shifting from rapid prototyping to operational excellence. With stronger governance, a hardened feature set, and growing community engagement, it’s positioning itself as the enterprise-grade Iceberg catalog for multi-cloud environments.
Apache Arrow: Stability, Coordination, and Ecosystem Growth
The Apache Arrow developer list remained one of the most active in the Apache data ecosystem during October and November. The discussions reflected a mature project maintaining consistent release cadence, expanding its multi-language ecosystem, and fine-tuning coordination with downstream communities like Python.
The 22.0.0 Release
On October 24, the community announced Apache Arrow 22.0.0, a major quarterly release resolving more than 200 issues. The release spanned the core components, including C++, Python, and R, with improvements to compute kernels, dataset readers, and Arrow Flight RPC. Contributors on the dev list coordinated the vote process, verified build artifacts, and confirmed cross-platform packaging. The result was another stable release that underscored Arrow’s reliability as the columnar backbone for modern analytics frameworks.
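For readers who have not touched these components, here is a small PyArrow example exercising the pieces the release notes mention, namely dataset readers and compute kernels. Nothing here is specific to 22.0.0, and the file path and column names are placeholders.

```python
from datetime import date

import pyarrow.compute as pc
import pyarrow.dataset as ds

# Placeholder path; any directory of Parquet files with these columns will do.
dataset = ds.dataset("/data/events", format="parquet")

# The dataset reader pushes the filter and column projection into the scan.
table = dataset.to_table(
    filter=ds.field("event_date") >= date(2025, 10, 1),
    columns=["event_type", "latency_ms"],
)

# Compute kernels run vectorized over the columnar data.
print(pc.mean(table["latency_ms"]))
print(table.group_by("event_type").aggregate([("latency_ms", "mean")]))
```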
Aligning with Python’s Release Schedule
Shortly after 22.0.0 shipped, maintainers opened a forward-looking discussion about realigning Arrow’s release cycle with CPython’s annual schedule. Today, PyArrow releases often lag a few weeks behind new Python versions, delaying compatibility for users upgrading early. The proposal suggests moving Arrow’s feature freeze from October to August, allowing PyArrow wheels to be ready ahead of Python’s fall releases. This shift would make Arrow more responsive to the Python community’s needs and reduce downstream friction for libraries that depend on it.
Specification and Component Updates
While major specification work had slowed after the introduction of new data types in previous cycles, developers continued refining smaller aspects of the Arrow spec. Component-level discussions included maintenance releases for Arrow Go 18.4.1 and early planning for new Rust and Java improvements. These threads focused on stability and cross-language consistency, key for a project that serves as the foundation for dozens of analytics and database engines.
Expanding the Language Ecosystem
The mailing list also hosted a discussion about incorporating an Erlang implementation of Arrow through the Apache IP clearance process. This marks continued expansion into new languages and environments, reinforcing Arrow’s role as the universal in-memory standard for data interchange. Additionally, the team coordinated its quarterly Apache Board report, sharing release details and community updates that highlight a steady, healthy project lifecycle.
Arrow’s development rhythm this fall showed a balance of predictability and innovation. With strong governance, a broad contributor base, and careful coordination with the Python ecosystem, Arrow continues to serve as the connective tissue of the modern data stack — powering everything from machine learning frameworks to lakehouse engines.
Key Takeaways Across Projects
Across Iceberg, Polaris, and Arrow, the Apache developer lists tell a story of momentum and maturity. Each project has entered a phase of focused refinement — strengthening core reliability, improving interoperability, and aligning more closely with the needs of production data systems.
Shared Themes
1. Reliability and Consistency
Iceberg’s idempotent REST proposal and Arrow’s stable release cadence both emphasize predictable, fault-tolerant systems. These discussions reflect how Apache projects are hardening to meet enterprise SLAs, ensuring resilience even under distributed or high-load conditions.
2. Governance and Security
Polaris’s fine-grained authorization and authorizer chaining show a strong push toward secure, policy-driven design. These efforts align with broader enterprise needs for compliance and access control within data lakehouse environments.
3. Ecosystem Alignment
Arrow’s synchronization with Python’s release schedule and Iceberg’s cross-language client work both demonstrate an awareness of the broader developer ecosystem. The goal is to make open standards not just interoperable in theory, but easy to adopt in practice across tools and runtimes.
4. Community Growth
From Iceberg’s upcoming summit to Polaris’s expanding PPMC and Arrow’s continuous multi-language expansion, all three projects show healthy participation and collaboration. The mailing lists remain the hubs where architectural ideas evolve into production features.
The Road Ahead
Looking into 2026, these discussions foreshadow several shifts:
- Apache Iceberg is heading toward new spec features and a stronger governance structure around its growing ecosystem.
- Apache Polaris is on track for graduation from incubation, with the foundation now built for enterprise-scale deployments.
- Apache Arrow continues to cement its position as the universal columnar layer connecting compute engines, languages, and frameworks.
Together, they define the technical backbone of the open lakehouse movement. Their developer mailing lists offer a transparent view into how open collaboration continues to drive innovation in data engineering.