🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - [NEW LAUNCH] What's new in Apache Iceberg v3 and beyond (OPN201)
In this video, Ron Ortloff and Yuri Zarubin from AWS discuss Apache Iceberg V3 and V4 developments. They cover three major V3 features: variant data type for semi-structured data handling with up to 10X performance improvement over string parsing, deletion vectors that replace V2 positional deletes with bitmap-based Puffin files for write optimization, and row lineage providing automatic change tracking through hidden columns with row IDs and sequence numbers. Additional V3 features include default values, table encryption keys, multi-argument transformations, nanosecond timestamps, geotypes (geography and geometry), and the unknown data type. The session concludes with V4 proposals focusing on performance improvements: enhanced column statistics, adaptive metadata tree for small write optimization, and relative paths enabling easier table copying across storage locations.
Note: This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to Apache Iceberg V3: Project History and Specification Framework
Today's session is about what's new in Apache Iceberg V3 and beyond. My name is Ron Ortloff, and I'm a Principal Product Manager – Technical at Amazon Web Services. I'm joined here today by Yuri Zarubin, who is a Principal Software Engineer also at AWS. Just a quick side note: you're in an open source track session, so this is focused on open source and what's new in the Apache Iceberg specification. There are a bunch of other Iceberg-focused sessions specifically on AWS services leveraging V3 features, such as deletion vectors. There's a great session on how you can use AWS services with deletion vectors and many more. So just a quick note on that.
What are we going to talk about today? It's really Apache Iceberg V3 and why you should care. We'll cover the three top features that we see most of our customers talking about: the variant data type, deletion vectors, and row lineage. Those are really bubbling up to the top of the stack in terms of what people are most interested in and where they're seeing the most alignment with their use cases. We'll wrap up the V3 talk with additional features; there are a couple of buckets of great additional features in the V3 specification, and we'll talk a bit about those. Then Yuri is going to come on and take a more forward-looking view of Apache Iceberg V4 and where the community is starting to go. We're seeing some proposals formulate and gain momentum as things start to shape up for Apache Iceberg V4. We'll have a call to action, some resources, and then you'll be on your way to your next session.
Before we get into V3, I want to talk a little bit about Apache Iceberg as a project and the project history. Everything started back in 2017 at Netflix. Dan Weeks, Ryan Blue, and Jason Reed noticed a lot of common patterns around the big data solutions that they had at Netflix. They started a project that they called Iceberg. In 2018, they started a process of getting that moved into the Apache Software Foundation. So in 2018, Iceberg became an incubator project, which is the first step into the Apache Software Foundation. In 2020, it became a top-level project, which is where you get the formal governance and you're following the Apache Software Foundation processes. In 2021, version two of the specification came out. At that time, the big release and feature that came in V2 was the capability for merge on read, doing row-level deletes, which is kind of the precursor to deletion vectors, which is what we'll talk about today for V3. In 2025, this year in May, version 3 of the specification was ratified. And as I mentioned, we're starting to see some formulation around where things are going with the in-development proposals for V4.
With that foundation established, when it comes to Apache Software Foundation projects, there are a couple of terms that are important to understand when we're talking about versions and what's new, because they determine how things operate within the Apache Software Foundation. You have specifications, which are basically documents: a contract that anybody working with Iceberg needs to adhere to in order to be specification compliant. This is particularly important in the Iceberg space, where interoperability is really the golden ticket behind why Iceberg has gained so much popularity. You have multiple vendors implementing, and you want to make sure that you're adhering to the specification so that you can have that single copy of data and use a bunch of different compute engines on top of it. Specifications are just that contract. They have to be voted on by the PMC, the Project Management Committee, before they're officially ratified. But keep in mind that anybody who wants to participate in an Apache project is more than welcome to vote, have their voice heard, champion something they really want to support, or provide feedback. Everybody's welcome to participate; only the PMC members' votes count officially.
On the release side, this is where the specs start to come to life. You have spec features that are implemented in releases either through reference implementations or SDKs, but the releases go through a formal Apache Software Foundation software release process. Those are voted on by the PMC just like specs are.
So why is this important? A lot of people heard marketing buzz and talk about the Iceberg V3 spec being released back in May, and they wondered where their features were. Why couldn't they use feature X, Y, and Z that they heard someone talk about in the Iceberg V3 spec? Since the 1.7 release all the way through to the 1.10 release, this is where we are seeing the community start to implement those features that were ratified in the spec. That is going to continue to go forward, but just keep in mind that there is a lag between when things are ratified in a spec and that contract and when they actually make it to market in a reference implementation that you can get your hands on and use.
That is the whole point of the slide, giving some foundation on where things are at. The V3 spec is not 100 percent implemented in any release yet. That is really the key takeaway from this whole slide. With that all out of the way, let's jump into V3 and let's start talking about some of the features. The first one I will talk about is the variant data type. By the way, real quick show of hands, how many people are using Iceberg today? I meant to do this before, so okay. Anybody had a V3 table created yet? We have one brave soul. We have got to get this guy a prize. That is awesome. Well done. Okay, so come on up and join us. You can give the rest of the talk.
Variant Data Type: Enabling Semi-Structured Data Support with Performance Benefits
Before we jump into variant data type, let's do a quick background. There are really a few different kinds of data types in Iceberg. There are primitive types, which are atomic data types that cannot be broken down further. Think of a string. Those are your basic atomic primitive types. There are also structured types where you can stitch together multiple primitive types. Think list, map, array, those types of things. The key here is that you have a fixed schema. This is where variant comes in, with the ability to do semi-structured data support. Here you are getting flexibility to handle a varying schema. There is no fixed component to the semi-structured nature of variant. We see a lot of people leveraging the variant data type to handle and process JSON data that they receive.
In terms of components in the variant spec for V3, there are three different components. There is a metadata component used to support things like file pruning. There is an encoding where you are taking values out of the variant data set and putting a data type on those. And then the last component is shredding. Shredding is basically taking elements out of the variant data type and the variant data set and materializing those as hidden columns. We will talk a bit more about shredding in a moment.
Before we get into use cases of the variant data type, let's talk about some of the pain that those of you using Iceberg V2 may be experiencing today when you are trying to handle semi-structured data. The first thing we saw a lot of customers doing is basically trying to make their semi-structured data look like structured data. They do that through transformations, picking the closest structured type they can to align to their data. With this approach, you have to operate with a fixed schema, and you take a bit of a performance hit, which we will talk about more later as well. But we do see some people operating in this manner, and you are also paying additional costs for transforming that data.
The second pattern we see is people materializing out additional columns. They may take in a data set, find a number of elements that are key to them, and then they will run transformations and put them out into separate columns in the table.
So again, you're paying transformation costs and compute costs, but you are getting pretty good performance in this model. These are legit columns in the table that you've transformed and pulled out of a semi-structured set. They're going to have stats on them. They're going to perform quite well.
And then the last one is String. I call this kind of like the give up model. I've got a semi-structured data set. I can't fit it into a struct. I don't want to materialize columns. Let's just throw it into a string field, right? And we'll do string parsing, parse JSON, and we'll just grab data out of that thing whenever we can. This one gives you that flexible schema, but I do have performance listed on here twice because the penalty for this can be quite substantial. It really, really can.
So that's what we see customers doing today. In terms of use cases, what we really see customers going after with variant, the first one I'll talk about is IoT workloads. Here, you've got eventing type models with different event details that you may be getting from a number of different IoT devices. Schema is really best applied on read, not trying to figure out a way to get that data into a fixed schema upfront. You also have a tendency to add in fields into your semi-structured data set more often with IoT workloads as you're getting these varieties of different types. So this is a case where variant, being semi-structured in nature, can just absorb that semi-structured type data and give you an ability to report on it.
The next one is data pipelines. Here, this is more about landing the data into a variant data type, and instead of doing transformation logic to get it to fit into a schema, you're basically building logic into your data pipeline to apply a schema. We have one customer taking in data from a number of different suppliers. They couldn't scale up the operation to actually enforce schema on their data providers. So they basically had a data set. If it fit into the schema, great. It went into the main pipelines that they had. If a supplier didn't give them spec-compliant or schema-compliant data, they created something they called the unparsable fields column. That was variant. They would just dump the data into that variant field and use pipeline logic to extract the elements out of that variant field. That allowed them to keep things consistent as they moved the data through the different phases in reporting and analytics for their solution.
The last one is real-time analytics. Again, you want to be able to query the data as it lands. You want to avoid preprocessing of data. This is in a model where I need access and I need analytics on top of the data immediately. We had a FinTech customer doing lots of quote analysis with stocks, bonds, crypto, whatever. Those quotes all come in in different shapes and sizes. They don't have time for this solution to be able to do any preprocessing or transforming up front. They want to throw it into a variant field and just let the quants and the smarter people than me query on that data quickly.
So those are the variant use cases. Just to distill down some of the variant benefits, performance and cost is really a big one with variant. You're getting data put into columnar storage, which gives you stats and the ability to do predicate pushdown. I've talked a lot about flexible schema in the different use cases and built up to this slide. You're operating now on a schema that's dynamic for you. You take in new eventing scenarios and details and you almost get automatic schema evolution, right? You can still take things in that have new elements in them and operate on top of them without having to go through a formal schema evolution.
Efficient storage is another benefit, so you are getting compression when you do the shredding and build out the hidden columns. You're going to get the compression done for you out of the box. And then the last benefit is around querying the data. So you get schema navigation through dot notation on some engines.
Some engines have a slightly different syntax, but it's very easy to navigate down a hierarchy on a variant data type compared to wrapping multiple string parsing functions. I've talked a couple of times about shredding, and this is my favorite slide of the entire deck, so be prepared. I have a table here that's been defined in storage with event date as a timestamp, source ID as an integer, and event details as a variant field. That's three columns, pretty simple. I get this type of data file from my source system. I want to load that now into the table. What happens in this case with V3? The primitive types, which map easily, get loaded through my engine and put into the table. With this variant field, I'm going to pass that through logic that you would have in your Iceberg V3 engine to actually shred that out. Here's my variant shredder. That data flows through the shredder, where it gets broken out into the sub-columns. That's the engine implementation of a variant shredding process. Those elements are then put into sub-columns, hidden virtual sub-columns in the table. You see SKU with integer, account ID with integer, and pricing as decimal. I've got data types on these hidden columns. I still reference that column as event details, but underneath the covers in the implementation of variant, I have those hidden sub-columns that I can then query into. That query would look something like this. With a predicate on that sub-column, I'm actually going to be able to do pruning on that data and return just the values that match that predicate.
Any other type of model outside of variant has a higher chance of doing a full table scan, especially if I'm doing string parsing or something like that. It will certainly be doing a full table scan, which is rough when you think about it. But then also if you think about doing joins, if I wanted to do a join on account ID to another table with no statistics, I think you know how that story ends. It can be pretty ugly with mismatched join conditions, scanning multiple tables, and performance can be very bad. Just to give you a rough sense of the relative query performance difference that you may see, with variant being the 1X baseline, you could see up to 4 times additional degradation in performance with structured types. Strings could be 10X worse, and honestly that 10X number, I could craft a benchmark for you that did table scans to very tight predicate lookups and it could be much, much higher than 10X. So when you start looking at variant for potential semi-structured type data solutions, this is the type of performance profile that you can start seeing over some of those alternatives that we discussed.
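To make the shredding walkthrough concrete, here is a minimal sketch of what that table and query might look like in Spark SQL. It assumes a SparkSession wired to a hypothetical Iceberg catalog named demo, an Iceberg release with V3 variant support (which, as noted above, is still being implemented across releases), and Spark-style variant_get navigation; your engine's syntax may differ.

```python
# Minimal sketch of the table from the walkthrough above. Assumes a SparkSession
# already configured with an Iceberg catalog named "demo" and an engine/Iceberg
# release that has implemented V3 variant support (support is still rolling out).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE demo.db.events (
        event_date    timestamp,
        source_id     int,
        event_details variant        -- semi-structured payload, shredded by the engine
    )
    USING iceberg
    TBLPROPERTIES ('format-version' = '3')
""")

# Navigating into the variant; exact syntax differs by engine (dot/colon notation on
# some engines, Spark-style functions such as variant_get on others).
spark.sql("""
    SELECT variant_get(event_details, '$.account_id', 'int')        AS account_id,
           variant_get(event_details, '$.pricing', 'decimal(10,2)') AS pricing
    FROM demo.db.events
    WHERE variant_get(event_details, '$.sku', 'int') = 12345
""").show()
```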
Deletion Vectors: Optimizing Write Performance and Storage Efficiency
You're all experts on variant now, so let's move on to the next one and talk a little bit more about deletion vectors. Deletion vectors are a write optimization and storage optimization feature that enhances what's been in the spec since V2. How do the write modes inside of Iceberg work? You have the default option of copy on write that's been there since day one of Iceberg. In that model, you have a data file, you run a delete query against it, and then you rewrite a new data file with whatever was deleted removed. I use this analogy: if I've got a table with 10 records in it and I want to delete one of those records, I'm going to rewrite a new data file with 9 records in it. That's copy on write. Now that may not seem like that big of a deal, but if you do that across thousands of data files, your write amplification will be significant.
You'll be rewriting an awful lot of data. Conversely, with merge on read, that same model now with a data file and delete query, I'm just writing a delete file with one record in it, one positional delete or one delete condition, depending on the type of row-level delete I've implemented. The write amplification problem is solved. You're going to have faster writes with merge on read, and you're going to be consuming less storage. But you're going to pay a tax on the read. Now I've got to join those delete files to my full data files to figure out what the filtered result set is to return. With copy on write, I've got a new file with nine records in it. I just need to scan that entire set and not worry about putting it together with anything else.
So that's copy on write versus merge on read. When we talk about the delete types in Iceberg, in V2 we first introduced the equality deletes capability. This is writing a condition to a delete file, like remove all rows where user ID equals ABC, and then that delete condition is used on reads to filter out records. This is pretty popular for streaming workloads like Flink; we see a lot of equality delete solutions being used with Flink. The other piece that was introduced in V2 was positional deletes. You write a delete position to a delete file, and then that delete file is used on read, brought into a bitmap to filter out the result set.
Equality deletes were there in V2 and equality deletes are still there in V3 spec. Positional deletes were there in V2, but those are not in V3. They've been deprecated and replaced with the deletion vector feature now in the V3 spec. Instead of writing positional delete files, deletion vectors update the values on a bitmap and then persist that bitmap to disk in the form of a Puffin file. There's a slight nuance to that. It's still positional deletes, but it's in a much more optimized format with deletion vectors.
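As a rough sketch of how you would opt a table into this behavior in Spark, the standard Iceberg write-mode table properties control whether row-level operations use copy-on-write or merge-on-read; on a format-version 3 table, merge-on-read positional deletes are persisted as deletion vectors in Puffin files. The catalog and table names below are hypothetical, and V3 support depends on your engine's Iceberg release.

```python
# Hedged sketch: switch a (hypothetical) table to format version 3 and merge-on-read
# row-level writes. Assumes a SparkSession configured with an Iceberg catalog "demo"
# and an Iceberg release that supports V3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE demo.db.orders SET TBLPROPERTIES (
        'format-version'    = '3',
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode'  = 'merge-on-read'
    )
""")

# On a V3 table, the deleted positions are tracked in a bitmap persisted as a Puffin
# file (a deletion vector) rather than rewriting whole data files or scattering small
# V2-style positional delete files.
spark.sql("DELETE FROM demo.db.orders WHERE user_id = 'ABC'")
```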
Let's talk a little bit about use cases and why you want to implement something like deletion vectors. We talked about the write amplification piece, but there are other higher level scenarios where deletion vectors can come into play. GDPR compliance is one. We see a lot of people running into write amplification issues doing GDPR compliance. Oftentimes when you're removing information about individuals, it can be different than how the data is fully consumed for other business purposes, and that's where you end up in these sort of random write scenarios where your write amplification is very high. If you have to do GDPR deletes, deletion vectors is a feature that you should certainly look into.
Another use case is data cleanup. We see people using a medallion style architecture with bronze, silver, and gold layers, where your bronze is more of a raw staging area. You're going to land a lot of data, there's going to be maybe a lot of noise around that data, and you're going to want to do a lot of data cleanup on that bronze layer. That's where deletion vectors can come into play and aid in the write speeds for those data cleanup operations. Then the last one is incremental data pipelines. If you're running merge statements in your data pipelines, you have a potential to be doing again more of those random write type of operations, and here's where deletion vectors can play very nicely with merge operations.
Just to understand a bit how the moving parts operate within deletion vectors, in this example I've got a table with a couple of snapshots that are created on disk. We've got snapshot one and snapshot two. If I want to do another delete query, I've got a data file that I'm going to produce, and then I'm going to have a bitmap that I'm going to update. As part of that S3 transaction, we'll make sure that the data file is committed on disk and we'll make sure that the bitmap is also committed on disk.
And then that transaction is committed into the table. When it comes time to consume that data, we'll grab the data files and the bitmap, and then we'll use that bitmap to filter the result set as we're returning the data to the calling application.
To wrap up on deletion vectors, let me compare them to V2 positional deletes and hopefully drive home why this is such a great improvement. With positional delete files, there was a propensity to have a proliferation of delete files. There was no spec and no guidance given around producing delete files, so we ended up seeing a lot of customers using positional deletes on V2 with just tons of these delete files from small transactions, small deletes, small updates, and small merge statements. This puts an additional compaction burden on you because you have to clean that data up and fix up your underlying data files, which is one of the challenges we're certainly seeing with V2 and positional deletes.
The other challenge is that positional delete files are translated from a bitmap into a Parquet file, and then on read, that is reversed. You're pulling data off of a Parquet file and then building a bitmap on the fly to filter out results. With V3, many of these challenges are solved. Per the spec, you're only allowed to produce one delete file per snapshot, which eliminates the problem of all these small tiny delete files. The bitmap itself is also stored in a Puffin file directly, so there's no deconstruction and reconstruction process that happens around that bitmap and the Puffin file. The writers themselves maintain that bitmap while the write operations are happening, so you have a fully efficient bitmap that's being built and maintained.
Row Lineage: Built-in Change Tracking for Compliance and Data Pipeline Management
That's deletion vectors. Let's talk a little bit about row lineage. Row lineage is a great change tracking feature that's been brought to the V3 spec. The components of row lineage within the spec include a writer specification. To be a compliant Iceberg V3 writer, you have to be producing record-level change information. As a writer, you can use Iceberg V3 metadata to understand sequence numbers and row IDs that have been committed to snapshots, but there is a responsibility on the writer's side to produce these changed records that allow for row lineage to be consumed.
On the reader's side, there are new hidden columns added to V3 tables that give you that row lineage information automatically. You get a row ID and a sequence number on every single V3 record. From an operational standpoint, row lineage is a required feature, so it's on by default with no knob to turn it off. That information is just there for you to consume. What I like about row lineage, for those who have ever dealt with a split-brain scenario where state information gets out of sync with data, is that it brings a neat side benefit: the row lineage information is stored right on the record, so you have that state in the snapshot, in the record, and in the table itself.
If you want to do time travel, you're going to get the time travel equivalent of what that row lineage looked like at the time you query. In terms of row lineage use cases, incremental processing is a really nice sweet spot for row lineage. If you want to be reading out of V3 tables as input into data pipelines, you can leverage the sequence numbers, the row IDs, and the information on the rows to understand what's changed as a source to feed into your pipelines. Event lifecycle tracking is also another good one. You think of capturing changes to a record over time.
Think of an orders table where an order goes through a series of changes: it's been submitted, billed, processed, fulfilled, shipped, and delivered. You want to understand what the lifecycle or state changes were. With row lineage, each one of those changes is on the same row ID for that order. You'll see the sequence number increment each time, and you'll be able to quickly and easily stitch together what that lifecycle looked like.
Slowly changing dimensions, or type 2 dimensions, use a similar model. You could feed in row lineage information to help build those types of things out as well. Debugging is another use case, maybe not as flashy as some of the other applications here, but if you keep track of that row lineage information within the transformations you do and the data you enrich and build in your system, you can then trace back to how those calculations came to be. You can use the row ID and the sequence number for which you did the calculation, and then you can trace back over those events and understand how you got one plus one equals seven, for example.
Lastly, with row lineage, it's a great compliance enabler. You're going to have a sequence number stamped on each modification to that table. So you'll be able to understand if someone went in and updated a salary in an employee's table to a billion dollars. You'd be able to see that sort of change of events and trace back on those steps.
So, a couple of clicks here on how row lineage works. I have a table here with two columns: first name and last name. I've added in this gray box on the side of that, which are the row lineage columns that come automatically. If I do a select star from the table, I see first name and last name, but I don't see the row lineage columns automatically. But if I actually key those in and select four columns, you'll see all the information like you see on the screen here.
For the initial load, I put two rows in the table, with the row IDs and sequence numbers starting at one. On the next transaction at time T2, I update Diego to Pablo, and we see the sequence number increment up to two. Then for the third transaction, I'll ratchet up the complexity a tiny bit: a merge statement that inserts a new row and then updates Carlos to Chuck. Here you see the new record added with a new row ID generated, and the sequence numbers for both of those rows come in as sequence number three.
Some people, when we talk to them about this, ask why the new insert isn't sequence number one. The sequence number is monotonically increasing, and it starts at whatever the current sequence number is within that table. This is important when you want to actually start consuming those changes. Going back to the incremental pipelines I mentioned, if this select query were a source query feeding a data pipeline, you could keep track of the max sequence number from each batch you run. Then, when I come through and say where the sequence number is greater than the last one I pulled, I'm going to get those last two records that we saw come out of the merge statement in this example.
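As a hedged sketch of that incremental pattern, the query below selects the spec's row lineage metadata columns and filters on the watermark from the previous run. The table name, the watermark variable, and the assumption that your engine exposes the hidden columns under the spec names _row_id and _last_updated_sequence_number are all illustrative.

```python
# Hedged sketch: consume only the rows changed since the last pipeline run, using the
# V3 row lineage metadata columns. Assumes a SparkSession configured with an Iceberg
# catalog "demo", a V3 table, and an engine that exposes the spec's hidden columns.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

last_processed_seq = 2  # highest sequence number consumed by the previous batch (hypothetical watermark)

changed = spark.sql(f"""
    SELECT first_name,
           last_name,
           _row_id,                        -- stable identity of the row across updates
           _last_updated_sequence_number   -- bumps on each commit that touches the row
    FROM demo.db.customers
    WHERE _last_updated_sequence_number > {last_processed_seq}
""")
changed.show()

# Feed `changed` into the next pipeline stage, then persist the new max sequence
# number as the watermark for the following run.
```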
To close up on row lineage, in the V2 spec, there was in the Spark implementation a procedure and a view implementation for doing changelog. Changelog is basically a mechanism that does snapshot diffing. You're taking a snapshot at T1 and comparing that, joining that together with snapshot T2 to understand what the differences are between those two snapshots. If you compare and contrast that to row lineage, because row lineage is basically doing the same type of change tracking for you, with changelogs in V2, you're having to pay the compute cost to do that snapshot diffing. With row lineage in V3, those details are just stamped right on the record.
There's no additional compute. You just query those, and the writers are tracking it automatically. Change logs require view maintenance. If you do schema evolution, you have to make sure the change logs and the views are maintained correctly. Row lineage is done automatically. Those metadata columns, the writers, everything is done for you, and that information is there with zero conflict.
If you think about correlating changes over time through time travel queries over different snapshots, and you want to do that with changelogs, it can get a bit tricky because you have to take an iterative approach to comparing multiple snapshots over time. With row lineage, because that information is stamped on the records, it's very easy to stitch together the full lifecycle of changes with straightforward queries as opposed to doing a bunch of joins.
Additional V3 Features: Core Infrastructure Enhancements and New Data Types
To wrap things up on V3, we'll cover some of the other features that have been delivered in the spec. In terms of additional features, there are really two buckets. There's a core infrastructure set of changes or features that came in on the V3 spec, and then there's also some additional data types. We'll take a minute here to talk about the core infrastructure pieces.
The first feature that came in the V3 spec in the core infrastructure area is default values. Default values insert data when there's no value specified. You're going to create a table, specify for a column what the default value is, and then when you're running your pipelines or inserting your data, if no value is specified, you're using the default value that's configured on the schema.
What's interesting about the Iceberg implementation is that the default value itself is stored in metadata. You're not persisting a constant value in the file itself; you're using the metadata on read to fill in the value that wasn't specified. That's a nifty nuance, being able to leverage Iceberg metadata to get that default value populated in the result set. You get some conformity and ease of use from the fact that the value is persisted in the table metadata. I've worked on systems where multiple developers got creative with how they specify a default value. Some people use negative one, some people use some other value, and it can become a mess. Here, that default value is stored right with the schema, and you don't have to worry about any pipeline logic or rules for filling in default values.
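A minimal sketch of a default value in Spark SQL DDL might look like the following; the table and column names are hypothetical, and whether your engine's Iceberg integration already implements V3 default values varies by release.

```python
# Hedged sketch: a column default stored in table metadata, applied when no value
# is written. Assumes a SparkSession with an Iceberg catalog "demo" and an engine
# that supports V3 default values and DEFAULT column syntax.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE demo.db.devices (
        device_id string,
        region    string DEFAULT 'unknown',  -- filled in from schema metadata on read
        firmware  int    DEFAULT 0
    )
    USING iceberg
    TBLPROPERTIES ('format-version' = '3')
""")

# Rows inserted without region/firmware pick up the defaults from the table metadata;
# no constant is materialized into the data files.
spark.sql("INSERT INTO demo.db.devices (device_id) VALUES ('sensor-42')")
spark.sql("SELECT * FROM demo.db.devices").show()
```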
Next in the core infrastructure area is table encryption keys. Table encryption keys give you the ability to specify an encryption key at the table and at the metadata level, which gives you granular control across the tables you need to encrypt. You can integrate that encryption key with KMS, and key rotation is supported as well. This is more of an advanced feature for aligning with elevated security and compliance requirements on your data: more granular control, KMS integration, and a higher level of encryption on your Iceberg data.
The last one I'll talk about in the core infrastructure area is multi-argument transformations. With multi-argument transformations, now you're able to specify more than one field in a transformation. Transformations can be used for partitioning, or they can be used for sorting. This gives you a bit more control in scenarios where you have a harder time picking just one column. You need more than one column to actually help with query scenarios and help to avoid data skew. The net for multi-argument transformations is that this is a performance feature, so you'll be able to align your end user query patterns more efficiently by specifying multiple columns in those transformations for partitioning as well as sorting.
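To illustrate the idea (and only the idea), here is what a multi-argument bucket transform could look like in DDL. Treat this as pseudocode: the V3 spec allows transforms over multiple source columns, but the exact SQL surface each engine exposes for this may differ, and the table is hypothetical.

```python
# Illustrative pseudocode only: a single bucket transform over two source columns,
# following the familiar single-argument bucket(...) DDL shape. Confirm the actual
# syntax supported by your engine before using this.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE demo.db.trades (
        account_id string,
        symbol     string,
        trade_ts   timestamp,
        qty        bigint
    )
    USING iceberg
    PARTITIONED BY (bucket(32, account_id, symbol))  -- one transform, two columns
    TBLPROPERTIES ('format-version' = '3')
""")
```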
The last section we'll discuss regarding V3 is additional data types.
The first one in this section is nanosecond timestamps. Nanosecond timestamps have been discussed in the Iceberg space for quite a while, probably a year before the Iceberg V3 spec came to be. We had customers who have nanosecond timestamps in their Parquet data or from some other system, and they want to be able to keep that fidelity and precision as they move into Iceberg. This has been a pretty significant gap for quite a while. What you're getting is nanosecond timestamp support for your high-frequency and temporal type workloads, with an increase from microsecond to nanosecond fidelity in Iceberg V3. You're getting this both with and without timezone support. In terms of workloads, this fits very nicely into streaming workloads where you want to have more precise measurements on frequently arriving data.
The next one is the geotypes. This is actually two types on one slide: geography and geometry. You're getting support for location and mapping queries with the geography data type, and then you're getting the ability to do measurements and shapes with the geometry data type. Both of these data types are separate implementations and separate data types. They follow the open OGC standard, so we have an open format following an open standard, which is great. I would say that having the geography and geometry data types inside Iceberg V3 is really a killer feature. You're getting geo-enablement on data plus all the goodness that comes with Iceberg around schema evolution, time travel, and interoperability. Now you have the power of Iceberg and the power of geospatial on the data, which I think is really great.
The last one I'll talk about is unknown. Unknown is a bit of a cryptic implementation. It's a kind of null placeholder value. You're going to get some protection now if you're doing schema evolution and types or data doesn't exist in files. You can use the unknown data type as a known null placeholder across engines. Sometimes we see customers running into issues where null handling isn't always handled gracefully across their Iceberg engines. This helps to combat that scenario. We've actually seen some extreme cases where customers have gone to rewriting data to make sure that they're not breaking their Iceberg implementations.
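Pulling the new data types together, a table that uses them might be declared like this. The type keywords follow the names in the V3 spec (timestamp_ns, geometry, geography, unknown); the spelling your SQL engine accepts may differ, engine support is still rolling out, and the table is hypothetical.

```python
# Hedged sketch: declaring the new V3 types. Assumes a SparkSession with an Iceberg
# catalog "demo" and an engine/Iceberg release that maps these spec types to DDL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE demo.db.vehicle_pings (
        ping_id      bigint,
        ping_ts      timestamp_ns,  -- nanosecond precision timestamp
        route_shape  geometry,      -- shapes and measurements
        service_area geography,     -- location and mapping on the globe
        legacy_tag   unknown        -- known-null placeholder across engines
    )
    USING iceberg
    TBLPROPERTIES ('format-version' = '3')
""")
```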
Looking Ahead to Apache Iceberg V4: Performance-Focused Proposals
With that, I will hand it off to Yuri. Clearly, there are a lot of features in V3, and after Ron's excellent deep dive, I think we have no excuse not to know about them. What I'm going to do is cover some of the new things that the community is cooking up in V4, and then we're going to close out the session with conclusions. One thing to note is that everything I'm about to talk about hasn't been ratified. These are just proposals, and the community is still thinking about what to do. The way I would characterize the changes so far is that they're very much performance-based, focused on raw performance rather than jam-packed full of features like V3.
Let's start with improved column statistics. If you're not familiar with column statistics, it's special information about columns that's stored inside the Iceberg metadata files, and it helps query engines to effectively scan data. One of the problems with them right now is that the way these stats are implemented is not super efficient in certain use cases, especially if you have lots of columns, and that's because when query engines read them, they have to deserialize these large maps, which then creates memory pressure.
One of the things that the community is thinking about is creating a proper structure for these, which will help engines efficiently look at particular stats that they care about. If you're just using Iceberg like me, all you really need to know is that you're going to get better performance in certain use cases.
Next is the adaptive metadata tree. This is motivated by small write and delete performance, something that Ron mentioned earlier. If you have lots of small writes or deletes, it can become problematic because Iceberg has these layers, starting from the catalog layer down to the root metadata layer, and then there are three more layers: the manifest list, the manifests, and then the actual data files. Every time you want to insert a file or do some sort of update or write into your table, you have to go through all of these layers, and that can be inefficient.
The proposal is to combine the manifest list and the manifest together into a single structure that will have root nodes and leaf nodes. It's a very long proposal, which I'm not going to go into, but basically at the end of the day, it will skip one layer for the small write use cases. Again, if you're just using Iceberg, this isn't really something you're going to think about. Just know that your small write performance is going to improve with V4 if this proposal goes through.
Lastly is relative paths. This is actually super useful because one common problem with manifests is that they contain absolute paths to the Parquet files, to the data files. That's problematic because if you want to copy your table, say you have an Iceberg table on general purpose S3 and you just want to copy it to a different S3 bucket, which could be in a different account or even a different cloud storage provider, you run into an issue: once you copy the actual files, you can't read them, and none of the query engines work, because the metadata files are still referencing the data files at their previous location.
The fix to that is, well, it's in the name, right? It's to make paths relative. This is going to be a very useful feature, especially if you're using something like S3 replication, where you're replicating your Iceberg table to a different bucket for data protection, backup, or whatever. With relative paths, you're going to be able to just query that data without having to modify manifest lists.
Conclusion and Call to Action: Getting Started with V3 and Joining the Community
So just to wrap things up, Ron talked about the key V3 features, which are deletion vectors, row lineage, and the variant data type. Then he talked about the additional features, core functionality such as default values and multi-argument transforms, plus the new types: the geo types, nanosecond timestamps, and unknown. I just went through the V4 proposals: improved column statistics, the adaptive metadata tree, and relative paths.
We do invite you to go and just try things out. Only one gentleman at the beginning of this session raised his hand when asked if he had tried out V3, so we hope to move that number up. Hopefully by the end of this week, once you go back, you'll try some of this out. Whatever vendor you're using, ask them what V3 support they have, and then try converting your table from V2 to V3.
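If your engine's Iceberg release supports V3, the upgrade itself is typically just a table property change; a minimal sketch, with a hypothetical catalog and table name, is below.

```python
# Hedged sketch: upgrade an existing table to format version 3 and verify the change.
# Assumes a SparkSession configured with an Iceberg catalog "demo" and an Iceberg
# release that supports V3 tables.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("ALTER TABLE demo.db.my_table SET TBLPROPERTIES ('format-version' = '3')")

# Confirm the table now reports format version 3.
spark.sql("SHOW TBLPROPERTIES demo.db.my_table ('format-version')").show()
```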
If you're interested in learning more about Iceberg, do join the community: you can attend the meetups, join us on the mailing list or Slack, or even make a contribution if you're a developer. And with that said, thank you for coming, and I hope you enjoy the rest of your time here. Thank you.
Note: This article is entirely auto-generated using Amazon Bedrock.