Alex Merced

Posted on Sep 15

Apache Iceberg dev list digest (Sept 8–12, 2025)

#data #database #dataengineering #datascience

V1 manifests and V3 tables

Christian Thiel reported an unhandled case when upgrading v1 tables to v3 in Rust: older v1 manifests may not contain `existing_rows_count` or `added_rows_count`, leading to a NullPointerException (NPE) when they are added to a snapshot in a v3 table. He and Russell Spitzer proposed that Iceberg should fail validation with an explicit error when it encounters missing row counts, rather than throwing an NPE. They noted the issue only affects very old tables, because newer v1 clients always write these fields. Russell suggested emitting a clear error and focusing on finishing the RepairMetadata feature, which would rewrite affected manifests. Ryan Blue agreed that a clear message is appropriate and that no spec change is needed.

Thread: DISCUSS V1 Manifests without row counts break V3 Tables
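
To illustrate the "fail validation with a clear message" idea, here is a minimal Python sketch. It is not code from any Iceberg implementation: `ManifestSummary`, `MissingRowCountError` and `validate_for_v3` are hypothetical names made up for this example, and only the two field names come from the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManifestSummary:
    """Hypothetical stand-in for a manifest-list entry (not a real Iceberg class)."""
    path: str
    added_rows_count: Optional[int]
    existing_rows_count: Optional[int]


class MissingRowCountError(ValueError):
    """Hypothetical error for manifests that lack the row counts v3 needs."""


def validate_for_v3(manifest: ManifestSummary) -> None:
    # Collect the names of the row-count fields that are absent.
    missing = [
        name
        for name in ("added_rows_count", "existing_rows_count")
        if getattr(manifest, name) is None
    ]
    if missing:
        # Fail early with an actionable message instead of a null dereference later.
        raise MissingRowCountError(
            f"Manifest {manifest.path} is missing {', '.join(missing)}; "
            "rewrite it before adding it to a v3 table"
        )
```

The point of the design is that the check runs before the manifest is attached to a snapshot, so the user sees which file is affected and what to do about it.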

Refining the FileFormat API

Discussion continued about how to design the new FileFormat API. Renjie Liu said he would vote for the first approach (producing format‑specific writers) and felt the proposed “converter” API for vectorized writes was unnecessary. Péter Váry explained that the converter API is only needed for writing position‑delete files on v2 tables and that there are no plans to support vectorized writes; he suggested removing the converter once the deprecation of position deletes with row data is complete. Russell Spitzer asked whether it would be better to have a single signature and special‑case position deletes rather than maintain two distinct APIs. Péter later outlined four methods he believes a FormatModelRegistry needs (a generic write builder, plus builders for data, equality deletes and position deletes) and asked whether engines should supply a more complex write builder or keep the differences in engine‑specific code.

FGAC OpenAPI consensus

Robert Stupp summarised the security discussion on the Iceberg REST fine‑grained access control (FGAC) OpenAPI. The community agreed that Iceberg expressions plus user‑defined functions will be the sole mechanism for retrieving protection instructions. This separates policy definitions from enforcement and avoids specifying query plans in policies. Catalogs must provide user‑ and query‑specific protection instructions using this mechanism. Robert planned to update the OpenAPI proposal in line with this consensus.

Cleaning up stale issues in Iceberg Rust

Micah Kornfield proposed copying PyIceberg’s configuration for automatically marking and closing stale GitHub issues into the Iceberg‑Rust repository. Kevin Liu, Renjie Liu and Manu Zhang supported the idea and suggested testing the configuration on pull requests as well. By Sept 11 Micah observed lazy consensus and asked for a committer to review and merge the PR enabling the bot.

PyIceberg’s optional dependencies

André Luis Anastácio raised concerns about PyIceberg’s optional third‑party dependencies (pandas, polars, duckdb, ray, etc.). He noted that supporting conversion helpers (to_arrow(), to_polars(), etc.) pins PyIceberg to specific versions of these libraries, causing version conflicts (for example, Bodo uses PyArrow 19.0 while PyIceberg depends on PyArrow 21). He argued that external libraries should implement their own integrations rather than PyIceberg carrying these dependencies. Fokko Driesprong responded that PyIceberg was designed as a convenience layer and agreed in principle, but pointed out that PyIceberg itself relies on Arrow extensively and must support lower‑bound versions. He said they try to upgrade Arrow aggressively but also need to consider users locked to older versions.
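
One common way to soften this kind of conflict is to import optional libraries lazily and check only a lower bound at call time. The sketch below shows that pattern with the standard library; it is an illustration, not how PyIceberg actually handles its extras, and `parse_release` and `require_optional` are hypothetical helper names.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(text: str) -> Tuple[int, ...]:
    """Naive version parser: keeps the leading digits of each dotted piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so "21" and "21.0.0" compare as equal.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def require_optional(dist_name: str, min_version: str,
                     module_name: Optional[str] = None):
    """Import an optional dependency only when a helper actually needs it."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError as exc:
        raise ImportError(
            f"{dist_name} is not installed; install it to use this helper"
        ) from exc
    if parse_release(installed) < parse_release(min_version):
        raise ImportError(
            f"{dist_name} {installed} is older than the required {min_version}"
        )
    return import_module(module_name or dist_name)
```

A conversion helper could then call something like `require_optional("pyarrow", "19.0")` at the point of use, so users pinned to an older but still supported Arrow are not blocked at install time. The version numbers here are placeholders taken from the example in the thread, not PyIceberg's real bounds.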

Deprecation of position deletes with row data

Péter Váry formally called for a vote on deprecating position‑delete files that include row data (a rarely used v2 feature). Community members widely supported deprecation: write support for position deletes with row data would be removed in Iceberg 2.0, while read support could remain for backward compatibility. A separate vote thread was opened to approve the deprecation.

Iceberg 1.10.0 RC5 and release announcements

On Sept 11 Steven Wu asked the community to vote on Apache Iceberg 1.10.0 RC5, providing links to the source tarball, commit ID and convenience binaries. Within hours, Steven announced that the vote passed with multiple binding +1s and no objections, so RC5 became the official release. He then published an announcement thanking contributors and summarising key features of 1.10.0, such as v3 format support, improved vectorized reads and new REST catalog endpoints. The community celebrated the release and encouraged users to upgrade.

Bringing back `added-rows` in snapshots

Steven Wu opened a discussion on whether to reintroduce the `added-rows` field in snapshot metadata. In Iceberg 1.10.0 RC5 the field had been removed to reduce manifest size, but some developers argued it is still useful for incremental scans. Contributors debated whether the information can be derived from manifests or whether it should remain a top‑level snapshot field. The thread was ongoing at the end of the week.

Extending DECIMAL type evolution and REST spec enhancements

rice Zhang proposed extending Iceberg’s type promotion rules to support SQL:2011‑compliant DECIMAL evolution (for example, changing precision and scale without rewriting data files). The thread explored how Iceberg currently handles precision and scale changes and whether a compliant evolution policy would be backward compatible. Meanwhile, Prashant Singh proposed a REST‑spec change to add an optional “referenced‑by” list to loadTable responses, allowing clients to see which tables reference a given table (useful for dependency analysis). Alex Stephen suggested publishing Iceberg’s OpenAPI‑generated REST catalog models as a standalone Python package so that PyIceberg and other clients can import the official models directly instead of maintaining their own copies.
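
For the DECIMAL part of this discussion, a small sketch may help show what is at stake. As I read the current Iceberg spec, a decimal may only be promoted by widening its precision while keeping the same scale. The second function below is one possible "lossless widening" rule that also lets the scale grow; it is an assumption for illustration and is not taken from the proposal. `DecimalType` and both function names are hypothetical.

```python
from typing import NamedTuple


class DecimalType(NamedTuple):
    """Hypothetical value type for decimal(precision, scale)."""
    precision: int
    scale: int


def is_current_promotion(old: DecimalType, new: DecimalType) -> bool:
    # Current rule as I understand the spec: same scale, strictly wider precision.
    return new.scale == old.scale and new.precision > old.precision


def is_lossless_widening(old: DecimalType, new: DecimalType) -> bool:
    # Illustrative assumption only: allow the scale to grow as long as the
    # number of integer digits (precision - scale) does not shrink.
    old_integer_digits = old.precision - old.scale
    new_integer_digits = new.precision - new.scale
    return new.scale >= old.scale and new_integer_digits >= old_integer_digits
```

Under these definitions, going from decimal(10, 2) to decimal(12, 2) passes both checks, while decimal(10, 2) to decimal(12, 4) passes only the hypothetical one. A scale change also alters how stored unscaled values must be interpreted, which is part of why backward compatibility came up in the thread.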

S3 analytics accelerator and other updates

Michael Stubbs summarised progress on the Amazon S3 Analytics Accelerator integration: engineers planned to create a JIRA epic, evaluate vectored reads, benchmark heap usage with and without the accelerator and engage with third‑party storage vendors. Jian Chen requested review for PR #13301, which proposed improvements to Iceberg’s metadata API. Viktor Kessler announced a community meet‑up in Dublin to coincide with the Arrow & Iceberg Summit.

Takeaway

The second week of September 2025 saw the Iceberg community refine the 1.10.0 release (voting on RC5 and celebrating its approval), plan deprecation of rarely used position‑delete features, and debate API design choices such as how to evolve decimal types, format models and snapshot fields. Discussions about governance (marking stale issues), dependency management in PyIceberg and forthcoming S3 acceleration features highlighted the project’s growing ecosystem and the constant balance between adding convenience and maintaining flexibility.
