DEV Community

Alex Merced


Apache Data Lakehouse Weekly: April 23–29, 2026

Three weeks past the Iceberg Summit, the lakehouse projects shifted from in-person alignment back into shipping mode. Polaris cut its 1.4.0 release and immediately followed up with a Python CLI 1.4.0, Arrow shipped its 24.0.0 major release and kicked off an arrow-rs 58.2.0 vote, and Parquet's design lists stayed dense with proposals on footers and page encoding, plus a new parquet-java release discussion. Iceberg's dev list was quieter this week as contributors digested summit follow-ups and continued narrowing on V4 design questions in the background.

Apache Iceberg

The post-summit wave of formal proposals continued translating into design work this week. The V4 metadata.json optionality direction that has anchored multiple syncs — treating catalog-managed metadata as a first-class supported mode while keeping static-table portability through explicit opt-in semantics — is still the defining V4 design conversation, with Anton Okolnychyi, Yufei Gu, Shawn Chang, and Steven Wu continuing to push edge cases on portability and Spark driver behavior. The single-file commits proposal that Russell Spitzer and Amogh Jahagirdar have been advancing remains on track for a formal write-up, with the latency and metadata footprint reductions driving urgency.

Péter Váry's efficient column updates proposal for wide tables continued attracting collaboration. The design — write only the columns that change on each commit, then stitch the result at read time — is squarely aimed at petabyte-scale feature stores with thousands of embedding and model-score columns, and the I/O savings make it one of the more practically grounded V4 proposals on the list. Anurag Mantripragada and Gábor Kaszab are working alongside Péter on POC benchmarks to support the formal proposal that should land on the dev list in the coming weeks.
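To make the "write only what changed, stitch at read time" idea concrete, here is a toy sketch in plain Python. It is not the proposal's actual file layout or any Iceberg API: the `stitch` function, the dict-of-lists column representation, and the column names are all hypothetical and chosen only for illustration. It assumes every update covers all rows of the columns it rewrites.

```python
from typing import Dict, List

# Toy column store (hypothetical): column name -> list of values, one per row.
Columns = Dict[str, List[float]]


def stitch(base: Columns, updates: List[Columns]) -> Columns:
    """Overlay partial column writes on a base file, oldest update first."""
    row_count = len(next(iter(base.values())))
    stitched = dict(base)  # shallow copy; base itself is never modified
    for update in updates:
        for name, values in update.items():
            if len(values) != row_count:
                raise ValueError(
                    f"column {name!r} has {len(values)} rows, expected {row_count}"
                )
            stitched[name] = values  # the newest write of a column wins
    return stitched


base = {"user_id": [1.0, 2.0], "embedding_0": [0.1, 0.2], "model_score": [0.5, 0.6]}
# A commit that rewrites only model_score and leaves the other columns untouched.
score_update = {"model_score": [0.9, 0.4]}

table = stitch(base, [score_update])
print(table["model_score"])  # [0.9, 0.4]
print(table["embedding_0"])  # [0.1, 0.2]
```

The point of the sketch is the I/O shape: the commit touches one column's worth of data instead of rewriting every column in the file, and the reader pays a small merge cost to reassemble the row.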

On the Rust side, the Iceberg Rust 0.9.0 release shipped earlier this development cycle and continues to anchor downstream adoption discussions, with its DataFusion integration making it a serious option for teams that want Iceberg without a JVM dependency. Iceberg Summit 2026 session recordings are also rolling out on the project's YouTube channel this week, giving the global community access to the V4 design talks, the vendor panel, and the production case studies from Apple, Bloomberg, Pinterest, and others. The AI contribution policy that Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun pushed through March is still expected to land as published guidance covering disclosure requirements and code provenance standards.

Apache Polaris

Polaris had its biggest release week of the year. Adnan Hemani announced Apache Polaris 1.4.0 on April 23, the project's first major release as a graduated top-level project. Dmitri Bourlatchkov, Yufei Gu, Xi Wen, and Alexandre Dutra all weighed in with congratulations and follow-up notes on packaging and distribution. Right behind it, Adnan kicked off and shepherded the Apache Polaris Python CLI 1.4.0 RC2 vote, which collected binding +1s from Yufei Gu, Honah J., and Jean-Baptiste Onofré, with Yong Zheng adding non-binding support. The Python CLI 1.4.0 release shipped on April 28, completing the back-to-back release pair. Jean-Baptiste also confirmed in a HEADS UP note that the project is now back on a monthly release cadence after the graduation transition.

The release had its share of post-launch fires. Alexandre Dutra opened threads on Helm chart repo inconsistency after the 1.4.0 release, a release workflow failure in step 4, and an Artifact Hub request for official status. A GitHub thread on KMS-related errors after bumping to 1.4.0 surfaced a real upgrade bug that drew immediate attention. Yufei Gu took the lead on triaging most of these, and the discussions are doing exactly what a healthy post-release cycle should — surfacing rough edges before they reach more users.

Design discussions stayed active alongside the release work. EJ Wang's DISCUSS thread on AGENTS.md for Polaris opened a conversation about adding agent-readable repository metadata, picking up engagement from Yufei Gu. Yufei separately started a discussion on narrowing the scope of SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION, which Dmitri Bourlatchkov and Dennis Huo dug into. ITing Lee's proposal to add OpenLineage to Polaris continued attracting feedback from Adnan Hemani, Jean-Baptiste Onofré, Yufei Gu, and Michael Collado. Alexandre Dutra's URL path decoding thread and his PolarisPrivilege fields and grant validation discussion both kept multiple contributors engaged through the week, and Selvamohan Neethiraj raised a PolarisPrincipal user attributes server-side bug that Alexandre and Yufei traced through.

Apache Arrow

Arrow had its own back-to-back release week. Raúl Cumplido announced Apache Arrow 24.0.0 on April 22, closing out the 24.0.0 RC0 vote that spanned mid-April. Matt Topol followed with the Apache Arrow Go 18.6.0 RC0 vote on April 22 and announced the release result on April 28, with Pedro Matias, Ian Cook, David Li, and Bryce Mecum carrying the verification work. Andrew Lamb then opened the arrow-rs 58.2.0 RC1 vote on April 28, with Bryce Mecum, Ed Seidl, Jeffrey Vo, and Raúl Cumplido moving quickly through verification of the release that last week's newsletter flagged as the next ship to watch.

Beyond releases, the design conversations stayed lively. Emil Sadek opened a DISCUSS thread on an ADBC Logo Proposal with Nic Crane, Julian Hyde, and Rusty Conover weighing in on visual identity for the database connectivity standard. Benjamin Philip kicked off a new DISCUSS thread on Arrow Erlang's grant documents, continuing the project's expansion into more language ecosystems. The pyarrow-stubs donation vote that Rok Mihevc opened on April 14 stayed active, drawing additional support this week with Rok pushing for a final tally. Mandukhai Alimaa's earlier proposal for a canonical BigDecimal extension type and Andrew Lamb's arrow-rs security policy discussion both continued generating engagement as the project tightens its production posture.

Apache Parquet

Parquet's lists were as dense as any project's this week. Ismaël Mejía opened a thread soliciting code reviews for Java performance optimization work, with Steve Loughran picking it up immediately. Manu Zhang's DISCUSS thread on a new parquet-java release drew sustained engagement from Steve Loughran, Aaron Niskode-Dossett, Fokko Driesprong, Julien Le Dem, Gang Wu, and Rahil C — covering both the timing question and what should ship in the next release. Julien Le Dem's Parquet sync on April 22 drew Manu Zhang and Micah Kornfield into the agenda discussion.

The format-level proposals continued to evolve. Will Edwards's DISCUSS thread on an alternative to the FlatBuffer footer with a lightweight byte-offset index kept pulling in design feedback from Andrew Lamb, Ed Seidl, Jan Finis, Alkis Evlogimenos, Raphael Taylor-Davies, Andrew Bell, and others. Ed Seidl's proposal to make path_in_schema optional attracted commentary from Gang Wu, Steve Loughran, and Micah Kornfield. Andrew Lamb's thread on where VariantJsonParser should live — touching the boundary between Parquet and Iceberg's variant tooling — continued with Steve Loughran and Gang Wu. Jan Finis's question on whether a too-long RLE bitpack at the end of a page is valid drew careful answers from Raphael Taylor-Davies and Micah Kornfield, the kind of spec-edge clarification that matters for cross-implementation interop. Milan Stefanovic's Geospatial CRS string format clarification continued threading toward closure with Dewey Dunnington and Micah Kornfield.

Cross-Project Themes

This week's clearest pattern is post-graduation Polaris finding its operational rhythm. The 1.4.0 release plus the Python CLI 1.4.0, the return to a monthly cadence, and the visible upgrade-path bugs and Helm packaging issues are all the work of a project growing into its TLP independence. The fact that contributors are surfacing problems publicly and triaging them on the dev list — rather than routing through a parent project — is itself the marker of a healthy graduation.

The release wave across projects also reflects how synchronized the lakehouse stack has become. Arrow 24.0.0, arrow-go 18.6.0, Polaris 1.4.0, and Polaris Python CLI 1.4.0 all landed within a single week, with arrow-rs 58.2.0 in its RC vote right behind them, and that is a coordination story. Engines and tools downstream of these libraries (Spark, Trino, Dremio, DataFusion, DuckDB, Snowflake) can pick up the new versions in a coherent batch rather than chasing staggered upgrades across half a dozen vendors. The format-level design work in Parquet (footers, optional path_in_schema, variant tooling location) and the V4 design work in Iceberg (metadata.json optionality, single-file commits, efficient column updates) are also starting to rhyme: both communities are picking apart assumptions baked into v1 and v2 spec design and asking what a leaner, AI-workload-aware format looks like.

Looking Ahead

Watch the arrow-rs 58.2.0 RC vote close out in the coming days. Polaris should publish 1.4.1 or move toward 1.5.0 planning given the monthly cadence commitment, and the AGENTS.md discussion is likely to firm into a concrete proposal. The Polaris OpenLineage RFC has the volume of feedback it needs to move toward implementation. On the Iceberg side, the formal V4 single-file commits write-up and the published AI contribution policy remain the next concrete deliverables to track. Iceberg Summit 2026 talk recordings will continue rolling out on YouTube, and the parquet-java release discussion should converge on a target version.


