Alex Merced

# Apache Data Lakehouse Weekly: March 20–27, 2026

With the Iceberg Summit less than two weeks away, the open lakehouse ecosystem spent this week in a final push of preparation, stabilization, and policy refinement. The AI contribution guidelines debate that erupted across Iceberg and Polaris last week continued drawing community input, while release engineering and summit logistics dominated the technical threads. Across all four projects, the mood is pre-conference focus — tying off loose ends before the community gathers in San Francisco on April 8–9.

## Apache Iceberg

The Iceberg community spent the week in summit countdown mode. With the Iceberg Summit 2026 now less than two weeks out, logistics threads picked up as the selection committee finalized the speaker lineup. The two-day event at the Marriott Marquis in San Francisco will feature hands-on workshops, deep technical sessions, and direct access to core maintainers — the largest gathering of Iceberg practitioners yet.

On the release front, the 1.10.2 patch release discussion that Amogh Jahagirdar opened in the prior week continued to gather input from the community. The patch targets bugs discovered since 1.10.1 and aims to keep the production branch stable while contributors push forward on the 1.11.0 release cycle. With production users relying on the 1.10.x line, the community is being methodical about what gets backported versus what waits for the next minor release.

The AI contribution guidelines thread that Huaxin Gao opened the previous week continued to draw thoughtful responses. Holden Karau, Kevin Liu, Steve Loughran, and Sung Yun all weighed in on the challenge of maintaining code quality and Apache IP compliance as AI-assisted pull requests increase. The conversation is zeroing in on practical guardrails: disclosure requirements, review standards, and how to handle contributions where the provenance of code is unclear. The thread looks likely to yield a formal policy proposal before the Summit.

Péter Váry's efficient column updates proposal for wide tables continued its steady progress. The design, which targets ML feature stores and vector databases with thousands of columns, has now been through multiple community syncs and is moving toward a written design document. The core idea — writing only updated columns to separate files and stitching them together at read time — would dramatically reduce write amplification for AI workloads. Steve Loughran's parallel thread on benchmarking commit performance methodology is providing the measurement framework that will validate these changes.
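To make the write-amplification argument concrete, here is a minimal sketch of the idea in plain Python. It is a toy model, not the proposal's design or any Iceberg API: the `ColumnFile` type and the `write_update`, `stitch`, and `cells_written` helpers are hypothetical names invented for illustration, and in-memory dicts stand in for data files.

```python
from typing import Dict, List

# Hypothetical stand-in for a data file: column name -> values by row position.
ColumnFile = Dict[str, List[float]]


def write_update(updates: Dict[str, List[float]]) -> ColumnFile:
    """Write only the updated columns to a new 'file' (here, a dict)."""
    return {name: list(values) for name, values in updates.items()}


def stitch(base: ColumnFile, update_files: List[ColumnFile]) -> ColumnFile:
    """Read-time merge: later update files override earlier ones, per column."""
    result = dict(base)
    for update_file in update_files:
        result.update(update_file)
    return result


def cells_written(column_file: ColumnFile) -> int:
    """Count the values a write had to materialize."""
    return sum(len(values) for values in column_file.values())


# A wide table: 1,000 feature columns, 4 rows each.
base = {f"feature_{i}": [0.0] * 4 for i in range(1000)}

# Refreshing one feature writes one column, not the whole table.
update = write_update({"feature_7": [1.0, 2.0, 3.0, 4.0]})
table = stitch(base, [update])

print(cells_written(base), cells_written(update))
```

In this toy setup, rewriting the full table would touch every value in `base`, while the column update touches only the values of the one changed column. The cost moves to the read path, where `stitch` has to combine files, which is the trade-off the benchmarking thread is meant to measure.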

Viktor Kessler's Basel, Switzerland meetup announcement from last week continued generating interest, expanding Iceberg's in-person community presence to Europe alongside the North American Summit. The combination of a major conference and grassroots meetups reflects a community that's investing heavily in face-to-face collaboration.

## Apache Polaris

Polaris continued settling into its role as an independent top-level Apache project. Jean-Baptiste Onofré circulated the project's first board report as a TLP, covering the March 26 board meeting. This is a governance milestone — the report documents community health, development progress, and strategic direction under Polaris's own PMC rather than the Incubator's oversight. The report highlighted the project's growth since graduation on February 18, including active development on credential vending expansion and the catalog federation proposal.

The catalog federation discussion that launched the previous week continued this cycle. The design would allow Polaris to federate across multiple catalog instances in multi-cloud deployments, a capability that enterprise users running Iceberg across AWS, Azure, and GCP have been requesting. Alongside federation, the web console proposal for Polaris gained traction, with contributors discussing how a browser-based UI would complement the existing CLI and REST API for catalog management.

Like Iceberg, Polaris is navigating the AI contribution guidelines question. EJ Wang's thread continued to draw JB Onofré, Yufei Gu, and Dmitri Bourlatchkov into what's becoming a cross-project conversation. The two communities are likely to produce coordinated — if not identical — policies, reflecting their shared contributor base and overlapping governance sensibilities. The 1.4.0 release, which will be Polaris's first release as a graduated project, remains in active planning with credential vending for Azure and GCS backends as the headlining feature.

## Apache Arrow

Arrow's dev list this week focused on planning for the 24.0.0 release cycle and continuing the JDK 17 discussion that JB Onofré kicked off after Arrow Java 19.0.0 shipped earlier in March. David Li's supportive response from last week was followed by additional community input on the migration timeline. If Arrow Java 20.0.0 sets JDK 17 as the minimum, it would align Arrow with Iceberg's Java modernization trajectory — effectively raising the Java floor for the entire lakehouse stack in one coordinated move.

Nic Crane's thread on using LLMs to aid with project maintenance continued generating discussion. Unlike the Iceberg and Polaris threads, which focus on AI-generated contributions from external submitters, Arrow's framing centers on how maintainers themselves can use AI tools to manage the project's growing codebase and issue backlog. Sutou Kouhei's technical thread on Map type key/item/value field names drew continued engagement from Micah Kornfield and Antoine Pitrou, who are working through naming consistency across language implementations. Google Summer of Code 2026 student proposals also continued to arrive, with interest in compute kernels and language bindings.

## Apache Parquet

Parquet's community held its bi-weekly sync this period and continued active technical discussions on two fronts. The File logical type proposal that emerged the previous week remained the project's most consequential design thread. The proposal would allow Parquet files to natively represent unstructured data — images, PDFs, audio — inside columnar files. If adopted, it would expand Parquet's role from a purely analytical format to a hybrid that can manage the unstructured data that AI/ML pipelines generate alongside the structured features they consume.

The Variant type announcement from February continued to see adoption discussion, with contributors sharing integration experiences across Spark, Trino, and Dremio. Variant brings native semi-structured data support to Parquet, eliminating the need to store JSON strings in regular columns. Combined with the File logical type proposal, Parquet is rapidly expanding its type system to handle the diverse data shapes that modern analytics and AI workloads demand. The ALP floating-point encoding spec is now through its final review, with the formal acceptance vote expected imminently.
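A rough way to see why a native semi-structured type matters is to compare a column of JSON strings, which every query has to re-parse, with values that were parsed once at write time. The sketch below uses only Python's standard `json` module. It does not reproduce Parquet's Variant binary encoding, and the sample rows and function names are made up for illustration.

```python
import json

# Semi-structured rows stored the old way: one JSON string per row.
json_column = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 10}',
    '{"user": "c"}',
]


def total_clicks_from_strings(column):
    """Every query pays the parsing cost again for every row."""
    total = 0
    for raw in column:
        doc = json.loads(raw)
        total += doc.get("clicks", 0)
    return total


def to_typed(column):
    """Parse once at write time and keep the structured values."""
    return [json.loads(raw) for raw in column]


def total_clicks_from_typed(column):
    """Queries read fields directly, with no string parsing."""
    return sum(doc.get("clicks", 0) for doc in column)


typed_column = to_typed(json_column)
print(total_clicks_from_strings(json_column), total_clicks_from_typed(typed_column))
```

Both functions return the same total. The difference is where the parsing work happens: on every read in the string case, once on write in the typed case.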

## Cross-Project Themes

The AI contribution policy conversation is the clearest cross-project theme this week. Iceberg, Polaris, and Arrow are all grappling with the same question from different angles: Iceberg and Polaris are focused on contributor-side disclosure and review standards for AI-generated code, while Arrow is exploring how maintainers can responsibly use AI for project upkeep. The fact that these conversations are happening in parallel — with many of the same people participating across lists — suggests coordinated policies will emerge. This is an open-source governance question that will define how all four projects operate in the years ahead.

The second theme is format scope expansion. Parquet's File logical type and Variant type, Iceberg's efficient column updates for wide ML tables, and Polaris's catalog federation for multi-cloud environments all point in the same direction: the open lakehouse is being asked to handle workloads far beyond traditional analytics. The stack is evolving from a data warehousing alternative into a unified platform for structured analytics, semi-structured data, unstructured files, and AI/ML feature engineering — all governed through a single open catalog.

## Looking Ahead

The Iceberg Summit on April 8–9 is the event to watch. Expect final speaker announcements, agenda details, and community organizing threads to accelerate over the next two weeks. On the release side, watch for the Iceberg 1.10.2 patch release vote to open, the Polaris 1.4.0 release planning to solidify, and the Parquet ALP encoding vote to close. The AI contribution guidelines threads across Iceberg and Polaris should produce actionable proposals — potentially in time for discussion at the Summit itself.

