
Danica Fine

A Dive into Apache Iceberg™'s Metadata

The promise of the data lakehouse is simple: combine the scalability of data lakes with the reliability of data warehouses. Apache Iceberg™ has emerged as the de facto table format for delivering that promise. But why?

The answer lies in Iceberg’s robust metadata layer. It’s a structured, versioned system that enables features like time travel, schema evolution, and efficient query planning. This post explores how Iceberg’s metadata architecture works, and why it’s the foundation of reliable, high-performance data operations in the modern lakehouse.

The Challenge: Finding Data Reliably in a Massive Data Lake

A data lake may contain billions of files, constantly being updated, merged, or deleted. To query it reliably, you need to know:

  • Which files belong to a specific table?
  • What was the table's state at a particular point in time?
  • How has the schema changed?
  • Which files are relevant to a query without scanning everything?

Traditional approaches often relied on directory listings, which are slow, inconsistent, and prone to errors. Iceberg solves this with a structured, versioned metadata system.

Iceberg's Metadata Layer: A Hierarchical View

A sample view of Iceberg's metadata layer situated between a catalog and the data layer of your data lake.

Catalog

The catalog maps table identifiers to their current metadata file, acting as the “address book” for tables. It's the entry point for any query or compute engine that wants to interact with your Iceberg tables.

Examples include REST catalogs like Apache Polaris (incubating), Hive Metastore, or custom implementations.
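The catalog's job can be sketched in a few lines. This is a minimal, in-memory illustration of the "address book" idea, not a real catalog implementation; the class and method names are hypothetical, and real catalogs (REST catalogs, Hive Metastore) persist this mapping durably:

```python
class InMemoryCatalog:
    """Hypothetical sketch: map table identifiers to the path of the
    table's current metadata file, which is all a catalog fundamentally
    needs to provide to query engines."""

    def __init__(self):
        self._tables = {}  # e.g. "sales.orders" -> "s3://.../v3.metadata.json"

    def register_table(self, identifier: str, metadata_path: str) -> None:
        self._tables[identifier] = metadata_path

    def load_table(self, identifier: str) -> str:
        """Return the location of the table's current metadata file."""
        return self._tables[identifier]


catalog = InMemoryCatalog()
catalog.register_table("sales.orders",
                       "s3://warehouse/orders/metadata/v3.metadata.json")
current = catalog.load_table("sales.orders")
```

An engine that resolves `sales.orders` through the catalog gets back a single metadata file location, and everything else about the table is discovered from there.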

Metadata Files

Each commit (transaction) to an Iceberg table generates a new Metadata File. This file contains table-level (non-data) information, such as:

  • Schemas: The columns in the table, including a full history of the schema as it’s evolved over time. Field IDs are also stored here to ensure that changes are handled correctly across versions.
  • Partition specs: How the data is physically partitioned, including a full history of the partition specs as they’ve evolved over time.
  • Snapshot IDs: A unique ID for each version of the table. Many snapshots can be stored, and you can configure the table to expire snapshots after a certain amount of time has passed or once a certain number of snapshots has accumulated.
  • Pointers to Manifest Lists: Specifically, each snapshot ID maps to a Manifest List file. This is important, because the Metadata File doesn’t list all data files, but rather pointers to Manifest Lists.
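To make the shape of this file concrete, here's a simplified metadata document modeled loosely on Iceberg's JSON layout (the keys mirror the spec, but the structure is heavily trimmed and the paths are made up), plus the lookup that follows the current snapshot to its manifest list:

```python
import json

# Simplified sketch of a metadata file: schemas, snapshots, and a pointer
# to the current snapshot. Real metadata files carry much more.
metadata = json.loads("""
{
  "current-snapshot-id": 200,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "customer_id", "type": "long"}]},
    {"schema-id": 1, "fields": [{"id": 1, "name": "customer_id", "type": "long"},
                                {"id": 2, "name": "order_total", "type": "double"}]}
  ],
  "snapshots": [
    {"snapshot-id": 100, "timestamp-ms": 1700000000000,
     "manifest-list": "s3://warehouse/orders/metadata/snap-100.avro"},
    {"snapshot-id": 200, "timestamp-ms": 1700000100000,
     "manifest-list": "s3://warehouse/orders/metadata/snap-200.avro"}
  ]
}
""")

def current_manifest_list(meta: dict) -> str:
    """The metadata file doesn't list data files; each snapshot points
    to a manifest list, and the current snapshot ID selects one."""
    current = meta["current-snapshot-id"]
    return next(s["manifest-list"] for s in meta["snapshots"]
                if s["snapshot-id"] == current)
```

Note that resolving the current table state is just two hops: current snapshot ID, then that snapshot's manifest list pointer.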

Manifest Lists

Each snapshot links to a Manifest List, which points to one or more Manifest Files. Summary statistics are aggregated in the manifest list and include row counts, partitions, and min/max values across all files in the snapshot.

Manifest Files

Each Manifest File tracks a subset of individual Data Files (e.g., Parquet, ORC, Avro) within a single Iceberg table. For each Data File, it stores:

  • File Path: The exact location in your object storage (S3, ADLS, GCS).
  • File Format: Parquet, ORC, etc.
  • Partition Data: Which partition this file belongs to.
  • Column-level Statistics: Min/max values, null counts, value counts for each column within that specific Data File. This is incredibly useful for pruning.
  • Status (ADDED, DELETED, EXISTING): Whether the file was added or deleted in this snapshot, or already existed in a previous one.

Note: As the table evolves, a Manifest File can be referenced by multiple Manifest Lists.
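A manifest entry can be sketched as a small record carrying exactly the fields listed above. This is an illustrative Python model (the field names and string statuses are simplified stand-ins for the binary Avro layout Iceberg actually uses), along with the filter that derives a snapshot's live files:

```python
from dataclasses import dataclass, field

@dataclass
class DataFileEntry:
    """Simplified sketch of one entry in a manifest file."""
    file_path: str                 # location in object storage
    file_format: str               # "parquet", "orc", ...
    partition: dict                # which partition the file belongs to
    lower_bounds: dict = field(default_factory=dict)  # column -> min value
    upper_bounds: dict = field(default_factory=dict)  # column -> max value
    status: str = "ADDED"          # "ADDED", "DELETED", or "EXISTING"

def live_files(entries: list[DataFileEntry]) -> list[str]:
    """Files visible in this snapshot: everything not marked DELETED."""
    return [e.file_path for e in entries if e.status != "DELETED"]


entries = [
    DataFileEntry("s3://warehouse/orders/data/p0.parquet", "parquet",
                  {"region": "EU"}, status="EXISTING"),
    DataFileEntry("s3://warehouse/orders/data/p1.parquet", "parquet",
                  {"region": "US"}, status="ADDED"),
    DataFileEntry("s3://warehouse/orders/data/p2.parquet", "parquet",
                  {"region": "US"}, status="DELETED"),
]
```

Here `live_files(entries)` would return only the first two paths, since the third file was deleted in this snapshot.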

Why Metadata Matters

Time Travel

Every commit creates a new metadata file, resulting in a version history of your entire table. Time travel is invaluable for auditing your data and for knowing exactly what a query would have returned at any given point in time.

Want to see the data from a specific snapshot? Or data from last Tuesday? It's easy!

  1. Query the system metadata table to see the snapshots and the time at which they became current.
  2. Point to an older snapshot ID in the metadata with VERSION AS OF <SNAPSHOT_ID>, or to a specific timestamp with TIMESTAMP AS OF '2025-09-30 17:53:01.284'.
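Under the hood, resolving TIMESTAMP AS OF is a simple lookup over the snapshot history stored in the metadata file. A hedged sketch of that resolution logic, using the same simplified snapshot records as above (field names modeled on the metadata layout, not Iceberg's actual implementation):

```python
def snapshot_as_of(snapshots: list[dict], ts_ms: int) -> dict:
    """Return the snapshot that was current at ts_ms: the latest
    snapshot whose commit timestamp is <= the requested time.
    This mirrors what TIMESTAMP AS OF resolves to."""
    eligible = [s for s in snapshots if s["timestamp-ms"] <= ts_ms]
    if not eligible:
        raise ValueError("no snapshot existed at that time")
    return max(eligible, key=lambda s: s["timestamp-ms"])


snapshots = [
    {"snapshot-id": 100, "timestamp-ms": 1_700_000_000_000},
    {"snapshot-id": 200, "timestamp-ms": 1_700_000_100_000},
]
```

Asking for a time between the two commits resolves to snapshot 100; asking for a time after the second commit resolves to snapshot 200.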

Schema Evolution without Rewrite

Because Iceberg tracks columns by unique ID (not position), adding, dropping, renaming, or reordering columns is a metadata-only operation. Old schemas are maintained so that you can continue to query your data without issues.
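The field-ID mechanism can be illustrated with a toy projection. Assume (hypothetically) that a data file stores values keyed by field ID; reading it through a newer schema then resolves every column correctly even after a rename, with no rewrite of the file:

```python
def project_by_field_id(row_by_id: dict, schema: dict) -> dict:
    """Resolve columns by field ID rather than by name or position,
    so renames and reorders are metadata-only changes."""
    return {f["name"]: row_by_id.get(f["id"]) for f in schema["fields"]}


# A row as written under the old schema, keyed by field ID (illustrative)
stored_row = {1: 42, 2: 19.99}

old_schema = {"fields": [{"id": 1, "name": "cust_id"},
                         {"id": 2, "name": "total"}]}
# Field 1 was renamed and field 2 renamed: same IDs, new names
new_schema = {"fields": [{"id": 1, "name": "customer_id"},
                         {"id": 2, "name": "order_total"}]}
```

Projecting `stored_row` through `new_schema` yields the values under the new column names, because the IDs, not the names, carry the binding.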

Efficient Query Planning

Query engines don't need to list directories in Iceberg. Instead, they read the current metadata file (an O(1) operation), then scan manifest lists and manifest files to assemble an efficient query plan. The column-level statistics within manifest files allow for aggressive file and column pruning. If your query only needs customer_id and order_total and filters by region='US', Iceberg knows exactly which data files and which columns within those files to read, skipping the rest of a potentially wide table.
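File pruning with min/max statistics boils down to a range check per file. A minimal sketch, assuming each file's stats are a (min, max) pair per column (the file names and stats here are made up):

```python
def may_contain(file_stats: dict, column: str, value) -> bool:
    """A file might match the filter only if `value` falls inside the
    file's recorded [min, max] range for that column; otherwise the
    whole file can be skipped without reading it."""
    lo, hi = file_stats[column]
    return lo <= value <= hi


# Hypothetical per-file column stats pulled from manifest entries
files = {
    "part-0.parquet": {"region": ("EU", "EU")},
    "part-1.parquet": {"region": ("US", "US")},
}

# Planning region='US' touches only the files whose range admits "US"
to_scan = [f for f, stats in files.items()
           if may_contain(stats, "region", "US")]
```

With these stats, `to_scan` contains only `part-1.parquet`; the EU-only file is pruned before any data is read.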

ACID Transactions

Updates, deletes, and merges are managed by creating new snapshots in the metadata, atomically swapping out old data files for new ones. Readers always see a consistent snapshot, preventing dirty reads—crucial for dependable analytics.
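The atomic swap is typically a compare-and-swap on the catalog's metadata pointer: a commit succeeds only if the table still points at the metadata file the writer started from. A simplified sketch of that check (a real catalog enforces this durably and concurrently, but the shape of the logic is the same):

```python
def commit_snapshot(catalog: dict, table: str,
                    base_metadata: str, new_metadata: str) -> bool:
    """Compare-and-swap sketch: install new_metadata only if the table
    still points at base_metadata. A stale base means another writer
    committed first, so the caller must re-plan and retry."""
    if catalog.get(table) != base_metadata:
        return False  # conflict detected; no partial state is visible
    catalog[table] = new_metadata
    return True


catalog = {"sales.orders": "v1.metadata.json"}
ok = commit_snapshot(catalog, "sales.orders",
                     "v1.metadata.json", "v2.metadata.json")   # succeeds
stale = commit_snapshot(catalog, "sales.orders",
                        "v1.metadata.json", "v2b.metadata.json")  # rejected
```

Because readers always resolve the table through a single pointer, they see either the old metadata file or the new one in its entirety, never a mix.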

Open Format, Decoupled Engines

The metadata files themselves are open (JSON, Avro) and self-describing. This allows any Iceberg-compatible compute engine (Apache Spark™, Apache Flink®, Trino, Presto, Snowflake, Dremio, etc.) to read the same table reliably, fostering true vendor-neutrality and interoperability.

The Metadata Difference

Iceberg's popularity and power as a table format isn't just in its features, but in the metadata system that makes those features possible. By managing tables through structured, versioned metadata, Iceberg transforms the raw sprawl of the data lake into a reliable, high-performance lakehouse that data engineers and data scientists alike can trust.
