DEV Community

Alex Merced

Apache Data Lakehouse Weekly: February 18–25, 2026

This week belongs to Apache Polaris. After months of incubation votes, community reviews, and governance discussions, Polaris officially graduated to a top-level Apache Software Foundation project on February 18. That announcement set the tone for a week where governance maturity and format evolution ran in parallel across all four projects.

Apache Iceberg

The Iceberg dev list continued active threads from the prior two weeks, with V4 planning gaining momentum and a community meetup landing in Warsaw on February 18.

metadata.json in V4 — The Debate Deepens
Anton Okolnychyi's thread on making the root metadata JSON file optional in Iceberg V4 remains one of the hottest discussions in the community. The problem: writing metadata.json on every commit creates performance bottlenecks for streaming write workloads, especially with HMS and Hadoop catalog backends. Two paths remain under debate. The first allows catalogs to skip writing the file entirely. The second offloads parts of it to external files. Yufei Gu raised portability concerns, noting that Spark's static tables and driver still read the file directly from storage. The community is balancing performance gains against backward compatibility.

Index Support Sync Holds First Meeting
The first dedicated sync for Iceberg's native index support feature happened on February 11, organized by Huaxin Gao and Steven Wu. Native indexing is a key V4 feature and would allow faster lookups without full table scans. The meeting covered the initial design doc and implementation planning.

Snapshot Expiration Race Condition
Krutika Dhananjay flagged a concurrency bug in snapshot expiration logic. A race window exists between when the ExpireSnapshots job computes candidate snapshots and when the commit actually runs. A concurrent ref addition during that window can cause the maintenance job to remove a live snapshot. The iceberg-go project already has a fix in place. Amogh Jahagirdar asked for a reproducible test case before confirming the same bug exists in the Java implementation.
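To make the race window concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Iceberg or iceberg-go API: `ToyTable`, `plan_expiration`, `commit_unsafe`, and `commit_revalidated` are hypothetical names invented for illustration. The guarded variant shows one possible mitigation (re-reading refs at commit time), which is an assumption and not necessarily how iceberg-go fixed it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for table metadata (not the Iceberg API)."""
    snapshots: set = field(default_factory=set)
    refs: dict = field(default_factory=dict)  # ref name -> snapshot id

    def referenced(self) -> set:
        """Snapshot ids that some ref currently points at."""
        return set(self.refs.values())


def plan_expiration(table: ToyTable) -> set:
    """Step 1: compute candidates from the refs visible right now."""
    return table.snapshots - table.referenced()


def commit_unsafe(table: ToyTable, candidates: set) -> None:
    """Step 2 (racy): trust the stale plan and remove every candidate."""
    table.snapshots -= candidates


def commit_revalidated(table: ToyTable, candidates: set) -> None:
    """Step 2 (guarded): re-read refs at commit time and spare live snapshots."""
    table.snapshots -= candidates - table.referenced()


def run(commit) -> set:
    """Plan, let a concurrent writer add a ref, then commit."""
    table = ToyTable(snapshots={1, 2, 3}, refs={"main": 3})
    candidates = plan_expiration(table)  # snapshots 1 and 2 look unreferenced
    table.refs["audit"] = 2              # concurrent ref addition in the window
    commit(table, candidates)
    return table.snapshots


unsafe_result = run(commit_unsafe)       # snapshot 2 is removed while still live
safe_result = run(commit_revalidated)    # snapshot 2 survives
print(unsafe_result, safe_result)
```

In the racy path the new `audit` ref ends up pointing at a snapshot that no longer exists. A real catalog would need the re-check and the removal to happen atomically, which this toy model does not attempt to show.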

Warsaw Community Meetup
The Iceberg community held a meetup in Warsaw on February 18, continuing the global grassroots growth that has seen meetups in Atlanta, Amsterdam, San Francisco, and beyond over the past several months.

Apache Polaris

This week's headline belongs to Polaris.

Graduation to Top-Level Apache Project
Apache Polaris officially graduated from the Apache Incubator on February 18, 2026. The IPMC graduation vote passed after the PPMC consensus round received 27 binding +1 votes. Jean-Baptiste Onofré, the incoming PMC Chair, cited six releases from 0.9 through 1.3.0, more than 100 contributors, and 2,819 merged pull requests as evidence of community maturity. The project, co-created by Dremio, is now a self-governing, vendor-neutral catalog implementing the Apache Iceberg REST Catalog specification.

Graduation means Polaris now operates with full ASF oversight, independent governance, and a community-driven roadmap. Multi-engine support spans Apache Spark, Apache Flink, Trino, StarRocks, Apache Doris, and Dremio. The project positions itself as the open alternative to proprietary catalogs like AWS Glue and Databricks Unity Catalog.

First Post-Graduation PMC Activity
With graduation complete, the new PMC is expected to begin reshaping its roadmap independently. Community watch items include credential vending expansion for non-AWS storage backends, deeper Delta Lake support through the Generic Table API, and idempotent commit operations for retry-safe catalog writes.

Apache Arrow

Arrow's dev list focused this week on IPC stream design and community infrastructure.

IPC Stream Multiplexing Discussion Continues
Rusty Conover pushed back on a proposal to use QUIC for IPC stream multiplexing. His use case requires explicit ordering across batches from different logical streams. QUIC optimizes for independent delivery, which does not fit his requirements. The thread is a niche but technically interesting look at how Arrow handles multi-schema interleaving in a single IPC channel.
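The ordering requirement can be sketched without any Arrow code. QUIC keeps bytes ordered within a single stream but makes no promise about arrival order across streams, so a consumer that needs cross-stream ordering has to carry that order itself. The envelope below is purely hypothetical (it is not part of the Arrow IPC format, and `Frame`, `multiplex`, and `demultiplex` are illustrative names); payloads are opaque bytes standing in for serialized record batches.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    """Hypothetical envelope, not part of the Arrow IPC format."""
    seq: int          # global position across all logical streams
    stream_id: str    # which logical stream (and schema) the payload belongs to
    payload: bytes    # stands in for a serialized record batch


def multiplex(batches: Iterable[tuple[str, bytes]]) -> Iterator[Frame]:
    """Tag each (stream_id, payload) pair with a global sequence number."""
    for seq, (stream_id, payload) in enumerate(batches):
        yield Frame(seq, stream_id, payload)


def demultiplex(frames: Iterable[Frame]) -> tuple[list[Frame], dict[str, list[bytes]]]:
    """Restore the global order, then split payloads per logical stream."""
    ordered = sorted(frames, key=lambda f: f.seq)
    per_stream: dict[str, list[bytes]] = defaultdict(list)
    for frame in ordered:
        per_stream[frame.stream_id].append(frame.payload)
    return ordered, dict(per_stream)


produced = [("orders", b"o1"), ("payments", b"p1"), ("orders", b"o2")]
frames = list(multiplex(produced))

# Simulate independent per-stream delivery: each stream stays internally
# ordered, but the interleaving across streams is different on arrival.
arrived = [frames[0], frames[2], frames[1]]

ordered, per_stream = demultiplex(arrived)
replayed = [(f.stream_id, f.payload) for f in ordered]
print(replayed == produced)
```

Sorting a finished list is the simplest way to show the idea. A streaming consumer would instead buffer frames until the next expected sequence number arrives, which reintroduces the cross-stream head-of-line blocking that QUIC is designed to avoid.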

Google Summer of Code 2026
Arrow is among the Apache projects accepting mentors and student contributors for GSoC 2026. A student named Prasanna expressed interest in contributing. Engineers who want to mentor should watch the dev list for formal program announcements.

Security Model Now Published
The Arrow PMC published a formal security model for the project on February 5. The documentation clarifies how Arrow handles security considerations across its libraries. The publication is part of a broader pattern of organizational maturity across the Apache lakehouse ecosystem this month.

Apache Parquet

Parquet activity this week focused on format encoding and the continued adoption of the 1.17.0 release.

ALP Encoding Spec Advancing
The Adaptive Lossless floating-Point (ALP) encoding spec continued moving through review. ALP enables more compact storage of floating-point columns, which appear frequently in ML feature tables and financial datasets. Contributors discussed whether the finalized spec should land as a pull request against the parquet-format repository. Julien Le Dem asked remaining reviewers to comment before the spec can be finalized.
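For readers new to ALP, the core intuition is that many floating-point columns hold values that started life as decimals, so they can be stored as integers plus a shared power-of-ten exponent, with the rare value that does not round-trip kept aside as an exception. The sketch below illustrates only that intuition under simplifying assumptions: it uses a single exponent per batch and skips the integer bit-packing stage entirely, so it is not the encoding defined in the Parquet spec under review. The function names are hypothetical.

```python
import math


def alp_like_encode(values, max_exponent=10):
    """Choose the power of ten that round-trips the most values losslessly.

    Returns (exponent, ints, exceptions), where exceptions maps a position to
    the original float for values that cannot be recovered exactly.
    """
    best = None
    for e in range(max_exponent + 1):
        scale = float(10 ** e)  # exact for small exponents
        ints, exceptions = [], {}
        for i, v in enumerate(values):
            d = round(v * scale) if math.isfinite(v) else None
            if d is not None and d / scale == v:
                ints.append(d)
            else:
                ints.append(0)      # placeholder; real value lives in exceptions
                exceptions[i] = v
        if best is None or len(exceptions) < len(best[2]):
            best = (e, ints, exceptions)
        if not exceptions:
            break                   # every value round-trips, stop searching
    return best


def alp_like_decode(exponent, ints, exceptions):
    """Invert the encoding, patching exceptions back in by position."""
    scale = float(10 ** exponent)
    return [exceptions.get(i, d / scale) for i, d in enumerate(ints)]


prices = [1.5, 2.25, 19.99, math.pi]
exponent, ints, exceptions = alp_like_encode(prices)
decoded = alp_like_decode(exponent, ints, exceptions)
print(exponent, ints, exceptions)
```

Here the three decimal-looking prices become small integers at exponent 2, while `math.pi` falls through as an exception and is stored verbatim. The payoff in a real encoder comes from compressing those small integers far more tightly than raw 64-bit doubles.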

Parquet Java 1.17.0 Adoption
Parquet Java 1.17.0, released January 13, is now moving through broader production adoption. This version drops Java 8 support and sets Java 11 as the new minimum. Teams still running Java 8 JVMs need to plan for an upgrade. Apache Iceberg, Trino, and Spark all depend on Parquet Java for their core read and write paths.

Cross-Project Themes

Two themes defined the Apache lakehouse ecosystem this week. The first is organizational maturity. Polaris graduated. Iceberg is formalizing AI contribution guidelines. Arrow published a formal security model. Parquet added a new PMC member in Andrew Lamb. These projects are not just technically sound. They are building the governance infrastructure that enterprise adopters require.

The second theme is format evolution for AI and streaming workloads. Iceberg's V4 metadata discussions, Polaris's credential vending roadmap, and Parquet's ALP encoding work all point in the same direction. The open lakehouse stack is adapting to handle wide-table ML workloads, high-frequency streaming commits, and multi-cloud storage backends without breaking the compatibility guarantees that production users depend on.

Looking Ahead

Watch for the first official Polaris PMC governance actions now that graduation is complete. The Iceberg metadata.json and index support threads are likely to produce formal proposals or design documents in the coming weeks. Parquet's ALP encoding spec needs its final review pass before it can be merged into the parquet-format repository. Arrow will likely surface new GSoC contributor proposals as the program opens officially. The lakehouse stack is in good shape. The work continues.
