Apache Data Lakehouse Weekly: March 10–17, 2026

#data #dataengineering #news #opensource

The open lakehouse stack had one of its busiest weeks in months. Arrow Java 19.0.0 shipped and the community immediately proposed raising the Java floor to JDK 17 for the next major release — a move that would align Arrow with Iceberg's own Java modernization trajectory. Iceberg Rust 0.9.0 cleared its release vote while the Java side opened discussion on a 1.10.2 patch release. Polaris welcomed a new committer and launched threads on catalog federation and a web console. And Parquet saw a bold proposal for a new "File" logical type that could reshape how unstructured data lives inside columnar files. Across all four projects, a shared question surfaced: how should open source communities handle AI-generated contributions?

Apache Iceberg

Iceberg Rust 0.9.0 passed its release vote, with binding +1s from Kevin Liu, Renjie Liu, and Fokko Driesprong. The Rust implementation has been shipping at a rapid cadence — this is the fourth Rust release in six months — and its DataFusion integration is making it a serious alternative for teams that want Iceberg without a JVM dependency.

On the Java side, Amogh Jahagirdar proposed a 1.10.2 patch release to address bugs discovered since 1.10.1. Steve Loughran and Szehon Ho weighed in on candidate fixes. This keeps the production branch stable while contributors push forward on the 1.11.0 release cycle that will bring major REST spec enhancements and new V3 features.

Ryan Blue shared the draft March board report, drawing contributions from Péter Váry, Kevin Liu, Renjie Liu, and Matt Topol. The report covers a healthy project: an expanding contributor base, active V4 design work, and a release pipeline that now spans Java, Python, Rust, Go, and C++. The Iceberg Go 0.5.0 release was also announced earlier this month.

The Spark integration got significant attention. Anton Okolnychyi, Romain Manni-Bucau, and Steve Loughran debated Spark 4.1 compatibility, while Max Konstantinov proposed CREATE TABLE LIKE support, a DDL feature frequently requested by teams migrating from Hive. Hemanth Boyina reopened the discussion on specifying sort order in CTAS/RTAS operations, with Ryan Blue guiding the design toward alignment with the existing sort order API.

Antoni Reus Darder flagged two gaps that matter for V3 adoption: missing features in the Java API and an implementation status page that doesn't track V3 support across client libraries. These threads are a useful signal — V3 is shipping, but cross-library visibility hasn't kept pace.

Péter Váry continued advancing the efficient column updates proposal for wide tables. Steve Loughran started a new thread on benchmarking commit performance, seeking community input on methodology. And Varun Lakhyani proposed parallel scan task execution in Spark readers as a GSoC 2026 project.

Huaxin Gao opened what became the week's most debated thread: enforcing AI contribution guidelines. The discussion drew Holden Karau, Kevin Liu, Steve Loughran, Anurag Mantripragada, and Sung Yun into a conversation about how to maintain code quality and IP compliance as AI-assisted contributions increase. This mirrors a parallel discussion on the Polaris list and may lead to coordinated policy across both projects.

With the Iceberg Summit three weeks away on April 8–9 in San Francisco, Viktor Kessler announced a new European community meetup in Basel, Switzerland, extending the project's in-person presence to Europe as well as North America.

Apache Polaris

Polaris is building its governance muscles. Jean-Baptiste Onofré circulated the draft March 26 board report — the project's first as an independent top-level ASF project. Dmitri Bourlatchkov, Yufei Gu, and Robert Stupp contributed to a report that covers graduation, community growth, and the technical roadmap.

Christopher Lambert was welcomed as a new committer, signaling continued community expansion. And the technical discussions reflected a project that's moving fast on multiple fronts simultaneously.

The most architecturally significant thread was Joy's proposal on federation support for native Iceberg catalogs like BigQuery Metastore and AWS Glue. JB Onofré, Dmitri Bourlatchkov, Prashant Singh, Madhan Neethiraj, and Yufei Gu all engaged. The idea is for Polaris to serve as a federation layer over multiple external catalogs — a capability that would make it the natural bridge for multi-cloud and hybrid Iceberg deployments rather than a replacement for cloud-native catalogs.
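The federation idea can be pictured as a thin routing layer that dispatches each table lookup to whichever external catalog owns it. The Python sketch below illustrates only that routing pattern. It is not Polaris's API or the design proposed in the thread; `FederatedCatalog`, `InMemoryCatalog`, the `catalog.namespace.table` identifier format, and the bucket path are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Catalog(Protocol):
    """Minimal interface a federated backend is assumed to expose (hypothetical)."""

    def load_table(self, namespace: str, name: str) -> dict: ...


@dataclass
class InMemoryCatalog:
    """Stand-in for an external catalog such as Glue; holds table metadata in a dict."""

    tables: dict = field(default_factory=dict)

    def load_table(self, namespace: str, name: str) -> dict:
        return self.tables[(namespace, name)]


@dataclass
class FederatedCatalog:
    """Routes 'catalog.namespace.table' identifiers to the registered backend."""

    backends: dict[str, Catalog] = field(default_factory=dict)

    def register(self, catalog_name: str, backend: Catalog) -> None:
        self.backends[catalog_name] = backend

    def load_table(self, identifier: str) -> dict:
        # The first segment selects the backend; the remainder is passed through.
        catalog_name, namespace, table = identifier.split(".", 2)
        if catalog_name not in self.backends:
            raise KeyError(f"no federated catalog named {catalog_name!r}")
        return self.backends[catalog_name].load_table(namespace, table)


# Usage: one entry point, with the lookup delegated to the owning backend.
glue = InMemoryCatalog(
    {("sales", "orders"): {"metadata_location": "s3://example-bucket/orders/metadata.json"}}
)
federation = FederatedCatalog()
federation.register("glue", glue)
orders = federation.load_table("glue.sales.orders")
```

The point of the pattern is that clients talk to one endpoint while the external catalogs remain the source of truth for their own tables, which is what distinguishes a bridge from a replacement.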

JB Onofré opened discussion on the first release of the Polaris Console, the project's web management UI. Getting a polished admin interface into users' hands would be a significant UX milestone and lower the barrier to adoption for teams evaluating Polaris alongside commercial catalogs.

Nándor Kollár's event persistence design thread continued with input from Alexandre Dutra and Dmitri Bourlatchkov, working through how Polaris should durably store audit and notification events. Selvamohan Neethiraj's RFC on an Apache Ranger authorization plugin drew responses from Dmitri Bourlatchkov and Yufei Gu. This integration would let enterprises manage Polaris policies through Ranger alongside Hive, Spark, and Trino, addressing a real enterprise adoption blocker.

The Iceberg catalog migrator 1.0.0 is on rc3, and the Polaris 1.4.0 release is in progress after Adnan Hemani flagged and helped resolve a bug during release materials generation.

Like Iceberg, Polaris is also debating AI contribution guidelines. EJ Wang's thread drew JB Onofré, Yufei Gu, and Dmitri Bourlatchkov into what's becoming a cross-project conversation about how open source maintains quality in an era of AI-generated pull requests.

Apache Arrow

Arrow Java 19.0.0 shipped this week after its release vote passed with support from Raúl Cumplido, Sutou Kouhei, Gang Wu, David Li, and others. JB Onofré immediately followed with a proposal to set JDK 17 as the minimum for Arrow Java 20.0.0, and David Li responded supportively. This matters beyond Arrow: if both Arrow Java and Iceberg Java move to JDK 17, the entire lakehouse stack's Java floor rises together, unlocking modern language features and better dependency hygiene across the ecosystem.

Arrow Go 18.5.2 also passed its release vote, keeping the Go implementation's regular cadence intact for lightweight analytics and ETL pipelines.

The community meeting on March 11 provided a forum for synchronous coordination. Nic Crane started a discussion on using LLMs to aid with project maintenance. This is another angle on the AI-in-open-source theme that's rippling across all four projects, though Arrow's framing focuses on maintenance assistance rather than contribution policy.

Sutou Kouhei raised a technical thread on Map type key/item/value field names, drawing responses from Micah Kornfield and Antoine Pitrou. Pedro Matias continued the Flight SQL prepared statement execution path discussion with David Li, working through how Flight SQL should indicate which execution path a server uses. And Rusty Conover introduced vgi-rpc, a typed Arrow RPC framework for Python, to the community.

Apache Parquet

Parquet held its community sync on March 11, and the week's threads showed a project broadening its scope beyond traditional analytics.

The most forward-looking proposal came from Burak Yavuz, who proposed a new "File" logical type for Parquet. Antoine Pitrou and Rahil C engaged with the design. If adopted, this would allow Parquet files to embed references to or metadata about unstructured files (images, documents, audio), making the format more useful for AI/ML pipelines that need to track both structured features and unstructured source data in the same table.

Will Edwards raised a practical question about multi-frame ZSTD compression, with Andrew Pilloud and Antoine Pitrou exploring compatibility implications. The thread highlights the kind of encoding-level detail work that keeps Parquet performant across diverse workloads.

Rok Mihevc introduced Hardwood, a new Java-based Parquet parser, to the community. Julien Le Dem responded positively, and Alkis Evlogimenos joined the conversation. Alternative implementations help validate the spec and can drive performance improvements back into the reference implementation.

Rahil C's ongoing thread on configuring Parquet for vector embeddings saw continued engagement from Rok Mihevc and Andrew Lamb. Combined with the "File" logical type proposal, these threads signal that Parquet is being actively shaped by AI/ML use cases — not just as a storage format for tabular data, but as infrastructure for multimodal data management.

Arnav Balyan posted an update on the FSST (Fast Static Symbol Table) spec design for Parquet, advancing the string compression work that targets scan performance improvements for high-cardinality string columns. Blake Orth's thread on page-level GEO statistics continued with Andrew Lamb and Dewey Dunnington, pushing geospatial capabilities deeper into the format.
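For readers unfamiliar with symbol-table string compression, the toy Python sketch below shows the general idea: frequent substrings of up to 8 bytes are replaced by 1-byte codes, and an escape byte marks literals that have no symbol. It is not the Parquet spec design under discussion, and it does not reproduce FSST's actual table-construction algorithm. The frequency heuristic, the function names, and the choice of 255 as the escape byte are assumptions made for illustration.

```python
from collections import Counter

ESCAPE = 255  # assumed marker: the byte that follows is a raw literal


def build_symbol_table(samples: list[bytes], max_symbols: int = 255, max_len: int = 8) -> list[bytes]:
    """Pick substrings (2..max_len bytes) ranked by a naive 'bytes saved' estimate."""
    counts = Counter()
    for sample in samples:
        for n in range(2, max_len + 1):
            for i in range(len(sample) - n + 1):
                counts[sample[i:i + n]] += 1
    # Each use of a symbol replaces len(sym) bytes with one code byte.
    ranked = sorted(counts, key=lambda sym: (counts[sym] * (len(sym) - 1), sym), reverse=True)
    return ranked[:max_symbols]


def encode(data: bytes, symbols: list[bytes]) -> bytes:
    """Greedy longest-match encoding against a fixed symbol table."""
    if len(symbols) > ESCAPE:
        raise ValueError("symbol codes must stay below the escape byte")
    code_of = {sym: code for code, sym in enumerate(symbols)}
    longest = max((len(sym) for sym in symbols), default=0)
    out = bytearray()
    i = 0
    while i < len(data):
        for n in range(min(longest, len(data) - i), 0, -1):
            code = code_of.get(data[i:i + n])
            if code is not None:
                out.append(code)
                i += n
                break
        else:
            # No symbol matches here: emit the escape byte plus the literal.
            out += bytes((ESCAPE, data[i]))
            i += 1
    return bytes(out)


def decode(blob: bytes, symbols: list[bytes]) -> bytes:
    """Expand codes back to symbols; each value decodes independently of the others."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += symbols[blob[i]]
            i += 1
    return bytes(out)


# Usage: train once on sample values, then compress each string on its own.
urls = [b"https://example.org/a", b"https://example.org/b", b"https://example.com/c"]
table = build_symbol_table(urls)
packed = [encode(u, table) for u in urls]
```

Because the table is static and every value is encoded on its own, a reader can decode a single string without decompressing its neighbors, which is the property that makes this family of schemes attractive for columnar scans.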

Cross-Project Themes

AI is reshaping both the tooling and the community process. Iceberg and Polaris are debating contribution guidelines for AI-generated code. Arrow is exploring LLMs for maintenance. Parquet is fielding proposals (File logical type, vector embeddings) driven by AI workloads. The pressure comes from both sides: the stack is becoming infrastructure for AI data, and its communities are learning to navigate AI-assisted development.

Java modernization is converging. Arrow Java 19.0.0 shipped with a JDK 17 proposal for 20.0.0. Iceberg has been targeting JDK 17. Parquet already moved to Java 11 with 1.17.0. The lakehouse Java stack is modernizing in lockstep, which means cleaner dependencies and access to modern language features for the engines that depend on all three.

Format scope is expanding. Parquet's File logical type and vector embedding discussions, Iceberg's efficient column updates for ML feature stores, and Polaris's catalog federation for multi-cloud environments all point in the same direction: the open lakehouse is being asked to handle workloads far beyond traditional analytics.

Looking Ahead

The Iceberg Summit on April 8–9 is the community's marquee event, and the Basel meetup adds a European touchpoint. Watch for the Iceberg 1.10.2 patch release vote, the Polaris 1.4.0 release, and the Iceberg Rust 0.9.0 official announcement. On Polaris, the catalog federation discussion and Ranger authorization RFC will shape the project's enterprise story. Parquet's File logical type proposal could be the most consequential design discussion of the quarter if it gains traction. And Arrow Java 20.0.0 planning with JDK 17 will set the Java baseline for the entire stack.

Resources & Further Learning

Get Started with Dremio

Try Dremio Free — Build your lakehouse on Iceberg with a free trial
Build a Lakehouse with Iceberg, Parquet, Polaris & Arrow — Learn how Dremio brings the open lakehouse stack together

Free Downloads