<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Du Tran</title>
    <description>The latest articles on DEV Community by Du Tran (@dutvmta).</description>
    <link>https://dev.to/dutvmta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3887157%2Ff3fd04db-b7c8-43b1-a15f-fcc2a3ad3059.png</url>
      <title>DEV Community: Du Tran</title>
      <link>https://dev.to/dutvmta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dutvmta"/>
    <language>en</language>
    <item>
      <title>Migrating Apache Iceberg Tables Between AWS Accounts: What Nobody Tells You</title>
      <dc:creator>Du Tran</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:41:05 +0000</pubDate>
      <link>https://dev.to/dutvmta/migrating-apache-iceberg-tables-between-aws-accounts-what-nobody-tells-you-5en7</link>
      <guid>https://dev.to/dutvmta/migrating-apache-iceberg-tables-between-aws-accounts-what-nobody-tells-you-5en7</guid>
      <description>&lt;h1&gt;
  
  
  Migrating Apache Iceberg Tables Between AWS Accounts: What Nobody Tells You
&lt;/h1&gt;

&lt;p&gt;When my company needed to migrate to a new AWS account, I took on this project solo — and ended up successfully migrating &lt;strong&gt;nearly 2,000 Iceberg tables&lt;/strong&gt; while maintaining full data integrity across both accounts.&lt;/p&gt;

&lt;p&gt;This wasn't a straightforward lift-and-shift. It required understanding Iceberg's metadata structure at a deep level, handling edge cases that aren't documented anywhere, and verifying data consistency at scale.&lt;/p&gt;

&lt;p&gt;This post documents what I learned, so you don't have to figure it out the hard way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Is This Hard? The S3 Path Problem
&lt;/h2&gt;

&lt;p&gt;S3 bucket names are globally unique. When you create a new AWS account and set up a new bucket, it will have a different name than the source bucket — always.&lt;/p&gt;

&lt;p&gt;Apache Iceberg hardcodes the full S3 URI at every layer of its metadata. This means simply running &lt;code&gt;aws s3 sync&lt;/code&gt; to copy your files isn't enough. All the metadata still points to the old bucket, and every query on the new table will fail.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Standard Approaches Fall Short
&lt;/h2&gt;

&lt;p&gt;Before building a custom solution, I evaluated the obvious options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;aws s3 sync&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
Copies all files to the new bucket quickly and cheaply, but metadata still references the old bucket name. Queries fail immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CTAS / INSERT OVERWRITE&lt;/strong&gt;&lt;br&gt;
This works — Athena or Spark reads from the old table and writes a brand-new Iceberg table at the new location. But it rewrites every data file from scratch. For tables in the hundreds of gigabytes or terabytes, the compute cost and time are simply not acceptable when you have nearly 2,000 tables to migrate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spark &lt;code&gt;snapshot&lt;/code&gt; procedure&lt;/strong&gt;&lt;br&gt;
Iceberg's built-in snapshot procedure is designed to convert Hive-format tables to Iceberg — not to migrate between buckets. More importantly, it doesn't handle &lt;strong&gt;Iceberg v2 delete files&lt;/strong&gt;, which turned out to be the most complex part of this migration.&lt;/p&gt;


&lt;h2&gt;
  
  
  Understanding Iceberg's Metadata Structure
&lt;/h2&gt;

&lt;p&gt;To understand why migration is complex, you need to understand how Iceberg organizes its metadata. Iceberg is not a file format — it's a &lt;strong&gt;table format&lt;/strong&gt; built on multiple layers of metadata stacked on top of each other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;metadata.json
    └── snapshots
            └── manifest list  (snap-*.avro)
                        └── manifest files  (*.avro)
                                ├── data files  (*.parquet)
                                └── delete files  (*.parquet)  ← Iceberg v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;metadata.json&lt;/strong&gt; is the entry point for every Iceberg table. It contains the table's root location, schema, partition spec, and a full history of snapshots. Critically, it stores the full S3 path to the manifest list for each snapshot.&lt;/p&gt;
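&lt;p&gt;For orientation, here is a heavily trimmed and entirely hypothetical &lt;code&gt;metadata.json&lt;/code&gt;; note the full bucket name baked into every path:&lt;/p&gt;

```json
{
  "format-version": 2,
  "location": "s3://old-bucket/db/events",
  "current-snapshot-id": 8123456789,
  "snapshots": [
    {
      "snapshot-id": 8123456789,
      "manifest-list": "s3://old-bucket/db/events/metadata/snap-8123456789-1-uuid.avro"
    }
  ],
  "metadata-log": [
    {
      "timestamp-ms": 1700000000000,
      "metadata-file": "s3://old-bucket/db/events/metadata/00007-uuid.metadata.json"
    }
  ]
}
```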

&lt;p&gt;&lt;strong&gt;Manifest list&lt;/strong&gt; (&lt;code&gt;snap-*.avro&lt;/code&gt;) is an Avro file that enumerates all manifest files belonging to a specific snapshot. Each record contains &lt;code&gt;manifest_path&lt;/code&gt; — a full S3 URI — and &lt;code&gt;manifest_length&lt;/code&gt;, the file size in bytes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manifest files&lt;/strong&gt; (&lt;code&gt;*.avro&lt;/code&gt;) are where Iceberg tracks individual data and delete files. Each record contains &lt;code&gt;file_path&lt;/code&gt; pointing to an actual Parquet file on S3, along with statistics including row count, file size, and column-level min/max bounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delete files&lt;/strong&gt; are specific to &lt;strong&gt;Iceberg v2&lt;/strong&gt;. Instead of rewriting a data file on every update or delete, Iceberg writes a separate Parquet file describing which rows were removed. Inside this Parquet file is a column called &lt;code&gt;file_path&lt;/code&gt; — again, a full S3 URI — pointing to the corresponding data file.&lt;/p&gt;

&lt;p&gt;This is the core problem: &lt;strong&gt;the bucket name appears at every layer&lt;/strong&gt;, including inside binary files. A correct migration must touch all of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Approach: Rewrite Metadata, Not Data
&lt;/h2&gt;

&lt;p&gt;The key insight is that &lt;strong&gt;data files don't contain any path references&lt;/strong&gt;. Only metadata files do. So instead of rewriting expensive data files, we can rewrite just the metadata files to point to the new bucket — while simply copying the data files as-is with &lt;code&gt;aws s3 sync&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here's the high-level flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1 — Copy data files
         aws s3 sync (old bucket → new bucket)

Step 2 — Download metadata files to local server

Step 3 — Rewrite each metadata layer (replace bucket name)

Step 4 — Upload rewritten metadata to new bucket

Step 5 — Register new table in Glue Catalog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Steps 1 and 5 are straightforward. The complexity lives entirely in &lt;strong&gt;Step 3&lt;/strong&gt; — and the order in which you process each layer matters.&lt;/p&gt;
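&lt;p&gt;For Step 5, a sketch of the Glue registration. Iceberg's Glue integration locates a table through the &lt;code&gt;metadata_location&lt;/code&gt; table parameter, which must point at the rewritten &lt;code&gt;metadata.json&lt;/code&gt; in the new bucket. The database, table, and bucket names below are placeholders, and the &lt;code&gt;boto3&lt;/code&gt; call itself is left as a comment:&lt;/p&gt;

```python
def glue_table_input(table_name, table_location, metadata_location):
    """Build the TableInput for registering an already-migrated Iceberg table.

    Iceberg's Glue catalog identifies the table through the
    "metadata_location" parameter, which must point at the rewritten
    metadata.json in the NEW bucket.
    """
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "table_type": "ICEBERG",
            "metadata_location": metadata_location,
        },
        "StorageDescriptor": {"Location": table_location},
    }

# Hypothetical names for illustration:
table_input = glue_table_input(
    "events",
    "s3://new-bucket/db/events",
    "s3://new-bucket/db/events/metadata/00008-rewritten.metadata.json",
)
# In practice you would then call something like:
#   boto3.client("glue").create_table(DatabaseName="db", TableInput=table_input)
```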




&lt;h2&gt;
  
  
  Step-by-Step: Rewriting the Metadata Layers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — metadata.json
&lt;/h3&gt;

&lt;p&gt;This is the simplest layer since it's plain JSON. You need to replace the bucket name in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;location&lt;/code&gt; — the table root path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;write.object-storage.path&lt;/code&gt; inside &lt;code&gt;properties&lt;/code&gt; (if present)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;manifest-list&lt;/code&gt; path inside each snapshot entry&lt;/li&gt;
&lt;li&gt;All paths in &lt;code&gt;metadata-log&lt;/code&gt; (&lt;code&gt;snapshot-log&lt;/code&gt; entries reference snapshots by ID rather than by path, but entries for snapshots you drop should be pruned)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One important optimization: &lt;strong&gt;only process the current snapshot&lt;/strong&gt;. You don't need to migrate the full snapshot history. This significantly simplifies the work — you only need to follow the manifest list of the active snapshot forward.&lt;/p&gt;

&lt;p&gt;Note that this means &lt;strong&gt;time travel will not work&lt;/strong&gt; on the migrated table. Make sure to communicate this limitation to stakeholders before migration.&lt;/p&gt;
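&lt;p&gt;As a sketch of this step (simplified, not a production script; it assumes a plain old-to-new bucket substitution and the standard &lt;code&gt;metadata.json&lt;/code&gt; field names):&lt;/p&gt;

```python
import json

def rewrite_metadata_json(text, old_bucket, new_bucket):
    """Sketch: rewrite bucket references, keeping only the current snapshot."""
    meta = json.loads(text)

    def swap(s):
        return s.replace(old_bucket, new_bucket)

    meta["location"] = swap(meta["location"])
    props = meta.get("properties", {})
    if "write.object-storage.path" in props:
        props["write.object-storage.path"] = swap(props["write.object-storage.path"])

    # Only the active snapshot survives; time travel is given up deliberately.
    current = meta["current-snapshot-id"]
    meta["snapshots"] = [
        {**s, "manifest-list": swap(s["manifest-list"])}
        for s in meta["snapshots"]
        if s["snapshot-id"] == current
    ]
    # metadata-log entries carry full paths to previous metadata.json files.
    meta["metadata-log"] = [
        {**e, "metadata-file": swap(e["metadata-file"])}
        for e in meta.get("metadata-log", [])
    ]
    return json.dumps(meta, indent=2)

# Self-check on a tiny, hypothetical metadata.json:
_sample = json.dumps({
    "location": "s3://old-bucket/db/events",
    "properties": {},
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "s3://old-bucket/db/events/metadata/snap-1.avro"},
        {"snapshot-id": 2, "manifest-list": "s3://old-bucket/db/events/metadata/snap-2.avro"},
    ],
    "metadata-log": [{"timestamp-ms": 0,
                      "metadata-file": "s3://old-bucket/db/events/metadata/00001.metadata.json"}],
})
rewritten = rewrite_metadata_json(_sample, "old-bucket", "new-bucket")
```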

&lt;h3&gt;
  
  
  Step 2 — Manifest List
&lt;/h3&gt;

&lt;p&gt;The manifest list is an Avro file. For each record, replace &lt;code&gt;manifest_path&lt;/code&gt; with the new bucket name.&lt;/p&gt;

&lt;p&gt;Here's the first critical gotcha: &lt;strong&gt;&lt;code&gt;manifest_length&lt;/code&gt; must be accurate&lt;/strong&gt;. This field stores the file size in bytes of the corresponding manifest file. After you rewrite manifest files in the next step, their sizes may change — and you must update &lt;code&gt;manifest_length&lt;/code&gt; accordingly. Iceberg uses this field for integrity validation. Get it wrong and your table is corrupted.&lt;/p&gt;

&lt;p&gt;This means you must process manifest files &lt;strong&gt;before&lt;/strong&gt; updating the manifest list, so you have the correct new sizes available.&lt;/p&gt;
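&lt;p&gt;The per-record logic can be sketched like this (Avro I/O is deliberately elided; in practice you would read and rewrite the file with a library such as &lt;code&gt;fastavro&lt;/code&gt;, preserving the writer schema). The function operates on already-decoded records plus the map of new manifest sizes collected when the manifest files were rewritten:&lt;/p&gt;

```python
def rewrite_manifest_list(records, old_bucket, new_bucket, new_manifest_sizes):
    """Sketch: rewrite decoded manifest-list records (snap-*.avro).

    new_manifest_sizes maps each NEW manifest path to its actual size in
    bytes, collected when the manifest files themselves were rewritten.
    """
    out = []
    for rec in records:
        rec = dict(rec)  # avoid mutating the caller's records
        new_path = rec["manifest_path"].replace(old_bucket, new_bucket)
        rec["manifest_path"] = new_path
        # Iceberg validates manifest_length, so it must match the size of
        # the REWRITTEN manifest file, not the original one.
        rec["manifest_length"] = new_manifest_sizes[new_path]
        out.append(rec)
    return out

# Hypothetical record and size map:
records = [{"manifest_path": "s3://old-bucket/db/events/metadata/m0.avro",
            "manifest_length": 7011}]
result = rewrite_manifest_list(
    records, "old-bucket", "new-bucket",
    {"s3://new-bucket/db/events/metadata/m0.avro": 7043},
)
```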

&lt;h3&gt;
  
  
  Step 3 — Manifest Files
&lt;/h3&gt;

&lt;p&gt;This is the most complex layer. Beyond replacing &lt;code&gt;file_path&lt;/code&gt; in each record, there are two additional concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delete file sizes&lt;/strong&gt;: If a record references a delete file (in Iceberg v2, the &lt;code&gt;content&lt;/code&gt; field is 1 for position deletes or 2 for equality deletes), the &lt;code&gt;file_size_in_bytes&lt;/code&gt; field must be updated to reflect the size of the rewritten delete file. Just like &lt;code&gt;manifest_length&lt;/code&gt;, Iceberg validates this field. Get it wrong and queries will throw cryptic errors.&lt;/p&gt;

&lt;p&gt;This means you must process delete files &lt;strong&gt;before&lt;/strong&gt; manifest files, so you have the correct new sizes ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lower_bounds and upper_bounds&lt;/strong&gt;: Iceberg stores column-level statistics in manifest records for query pruning. In most cases these are numeric or timestamp values and don't contain any path references. However, if your table has any string column whose values happen to contain S3 paths — for example, in certain CDC patterns — those bounds will contain the bucket name encoded as raw UTF-8 bytes. You need to detect and replace these as well.&lt;/p&gt;
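&lt;p&gt;A corresponding sketch for manifest entries, again on decoded records with Avro I/O elided. The nested &lt;code&gt;data_file&lt;/code&gt; layout and the example field IDs are simplified for illustration; note the bounds are patched as raw bytes after encoding the bucket names:&lt;/p&gt;

```python
def rewrite_manifest_entries(entries, old_bucket, new_bucket, new_delete_sizes):
    """Sketch: rewrite decoded manifest entries (*.avro), layout simplified.

    new_delete_sizes maps NEW delete-file paths to their rewritten sizes.
    """
    old_b, new_b = old_bucket.encode("utf-8"), new_bucket.encode("utf-8")
    out = []
    for entry in entries:
        entry = dict(entry)
        df = dict(entry["data_file"])
        new_path = df["file_path"].replace(old_bucket, new_bucket)
        df["file_path"] = new_path
        # content: 0 = data, 1 = position deletes, 2 = equality deletes.
        # Delete files were physically rewritten, so their size changed;
        # data files were copied byte-for-byte, so theirs did not.
        if df.get("content") in (1, 2):
            df["file_size_in_bytes"] = new_delete_sizes[new_path]
        # Bounds are raw bytes: a string column holding S3 paths can embed
        # the old bucket name, so patch it at the byte level as well.
        for key in ("lower_bounds", "upper_bounds"):
            if df.get(key):
                df[key] = {fid: val.replace(old_b, new_b)
                           for fid, val in df[key].items()}
        entry["data_file"] = df
        out.append(entry)
    return out

# Hypothetical entry: a position-delete file whose bounds embed a data path.
entries = [{"data_file": {
    "content": 1,
    "file_path": "s3://old-bucket/db/events/data/del-0.parquet",
    "file_size_in_bytes": 512,
    "lower_bounds": {1: b"s3://old-bucket/db/events/data/f1.parquet"},
    "upper_bounds": None,
}}]
patched = rewrite_manifest_entries(
    entries, "old-bucket", "new-bucket",
    {"s3://new-bucket/db/events/data/del-0.parquet": 540},
)
```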

&lt;h3&gt;
  
  
  Step 4 — Delete Files
&lt;/h3&gt;

&lt;p&gt;Delete files are Parquet files with a &lt;code&gt;file_path&lt;/code&gt; column containing full S3 URIs pointing to the data files they affect. Read the file, replace the bucket name in the &lt;code&gt;file_path&lt;/code&gt; column, and write it back.&lt;/p&gt;

&lt;p&gt;For large delete files, process in batches rather than loading the entire file into memory.&lt;/p&gt;

&lt;p&gt;Two important things to get right here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preserve the exact schema&lt;/strong&gt; when writing back, including Parquet version and compression codec. Schema mismatches can make the file unreadable to Iceberg.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Record the new file size&lt;/strong&gt; — you'll need it when rewriting the manifest files, which come after the delete files in processing order.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Correct Processing Order
&lt;/h2&gt;

&lt;p&gt;The dependency chain between layers determines the order you must follow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Delete files  →  Manifest files  →  Manifest list  →  metadata.json
   (sizes)           (sizes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer needs the file sizes from the layer below before it can be correctly written. Processing in the wrong order means you won't have the information you need — and the sizes you write will be wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Gotchas
&lt;/h2&gt;

&lt;p&gt;These are the things that will cost you hours if you don't know about them upfront:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File sizes must be exact after every rewrite.&lt;/strong&gt; Iceberg validates &lt;code&gt;file_size_in_bytes&lt;/code&gt; for delete files and &lt;code&gt;manifest_length&lt;/code&gt; for manifest files. These aren't advisory fields — they're used for integrity checks. A wrong value means a corrupted table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing order is strict.&lt;/strong&gt; Bottom-up: delete files → manifest files → manifest list → metadata.json. There's no flexibility here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Only migrate the current snapshot.&lt;/strong&gt; Don't try to migrate full snapshot history. It multiplies the work enormously and the migrated table will have a fresh history anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;lower/upper bounds are bytes, not strings.&lt;/strong&gt; When scanning bounds in manifest records, the values are raw bytes. Decode to UTF-8, replace, and re-encode. Standard string replacement won't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallelism helps but adds complexity.&lt;/strong&gt; With nearly 2,000 tables and potentially thousands of delete files per table, parallel processing is essential for reasonable throughput. But shared state — particularly the dictionaries tracking new file sizes — needs to be handled carefully to avoid race conditions.&lt;/p&gt;
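&lt;p&gt;A minimal sketch of that fan-out with a lock-guarded size dictionary (the worker below is a stand-in for the real per-file rewrite):&lt;/p&gt;

```python
import threading
from concurrent.futures import ThreadPoolExecutor

new_sizes = {}          # new path -> new size, consumed by later layers
sizes_lock = threading.Lock()

def rewrite_one(path):
    """Stand-in worker: the real version would rewrite one delete file."""
    new_path = path.replace("old-bucket", "new-bucket")
    new_size = len(new_path)  # placeholder for os.path.getsize(dst)
    # The size map feeds file_size_in_bytes / manifest_length later, so
    # concurrent updates must not race with each other.
    with sizes_lock:
        new_sizes[new_path] = new_size
    return new_path

paths = [f"s3://old-bucket/db/events/data/del-{i}.parquet" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(rewrite_one, paths))
```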




&lt;h2&gt;
  
  
  Limitations
&lt;/h2&gt;

&lt;p&gt;This approach has some limitations worth being explicit about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No time travel on the migrated table&lt;/strong&gt; — only the current snapshot is migrated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The table must have no active writes&lt;/strong&gt; during migration — otherwise you risk inconsistency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Equality delete files may behave differently&lt;/strong&gt; — this approach was validated against positional delete files&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Migrating Iceberg tables between AWS accounts is not a copy-paste operation. Because Iceberg hardcodes S3 paths at every metadata layer — from the top-level JSON down to binary Avro and Parquet files — you need to rewrite metadata systematically, in the right order, with correct file sizes at each step.&lt;/p&gt;

&lt;p&gt;The metadata rewrite approach avoids rewriting data files entirely, making it dramatically more cost-effective than CTAS or INSERT OVERWRITE — especially at scale.&lt;/p&gt;

&lt;p&gt;If you're facing the same challenge, I hope this gives you a clear mental model of what's actually involved and helps you build the right solution from the start.&lt;/p&gt;

</description>
      <category>iceberg</category>
      <category>dataengineering</category>
      <category>aws</category>
      <category>python</category>
    </item>
  </channel>
</rss>
