DEV Community

Aki for AWS Community Builders

Interoperating Open Table Formats on AWS Using Apache XTable (Delta → Iceberg)

Original Japanese article: Apache XTableを使ったAWS上でのOpen Table Format相互運用(Delta→Iceberg)

Introduction

I'm Aki, an AWS Community Builder (@jitepengin).

Lakehouse architectures have become increasingly common in modern data platforms. In many cases, multiple Open Table Formats (OTFs) are used side by side, such as:

  • Delta Lake
  • Apache Iceberg
  • Apache Hudi

For example:
“Our existing lakehouse uses Delta Lake, but the new project wants to adopt Iceberg…”
This kind of scenario is becoming more frequent.

When this happens, interoperability between OTFs becomes a key challenge.
Interoperability here means being able to use tables written in one format from engines and lakehouses built around another, without copying or rewriting the data.

For instance, Microsoft Fabric provides partial Delta ⇔ Iceberg sync via the Iceberg Shortcut feature.
However, it still suffers from type conversion errors, and full bidirectional sync is not yet possible.

This is where Apache XTable has been gaining attention.
In this post, I’ll walk through how I used XTable to convert and synchronize Delta Lake tables to Apache Iceberg on AWS.


What Is Apache XTable?

Apache XTable (https://xtable.apache.org/) is an open-source project designed to provide seamless interoperability between the following OTFs:

  • Apache Hudi
  • Delta Lake
  • Apache Iceberg

Key features include:

TableFormatSync

Converts table metadata between formats.
For example, you can read Delta Lake tables as Iceberg tables.

CatalogSync

Synchronizes metadata between multiple external catalogs.
Currently supported:

  • Hive Metastore (HMS)
  • AWS Glue Data Catalog

Upcoming support includes Unity Catalog, Polaris, Gravitino, DataHub, and more.

Polaris Catalog, which focuses on Iceberg interoperability, is also worth watching.

With these capabilities, you can combine the strengths of each OTF (Hudi's real-time ingestion, Delta's rich feature set, Iceberg's flexibility) within a unified catalog.


Interoperating with AWS Glue Catalog

Architecture

I tested the following setup:

  • Create Delta Lake tables with Databricks (stored in S3)
  • Run XTable on EC2
  • Convert Delta → Iceberg
  • Sync the Iceberg table to AWS Glue Catalog

Unity Catalog cannot be used as a source yet, so I pointed XTable at Delta tables stored directly in S3.


Configuration

Below is an example configuration.
The sourceCatalog is of type STORAGE, so the source table is identified by its S3 path rather than by a catalog entry, and the target catalog is Glue.
The datasets section controls the Delta → Iceberg conversion.

sourceCatalog:
   catalogId: "source-catalog-id"
   catalogType: "STORAGE"
   catalogProperties: {}

targetCatalogs:
   - catalogId: "target-catalog-id-glue"
     catalogSyncClientImpl: "org.apache.xtable.glue.GlueCatalogSyncClient"
     catalogProperties:
        externalCatalog.glue.region: "ap-northeast-1"

datasets:
   - sourceCatalogTableIdentifier:
        storageIdentifier:
           tableBasePath: "s3://your-source-bucket"
           tableName: "source-table"
           tableFormat: "DELTA"

     targetCatalogTableIdentifiers:
        - catalogId: "target-catalog-id-glue"
          tableFormat: "ICEBERG"
          tableIdentifier:
             hierarchicalId: "db.delta_table"
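
One mistake that is easy to make in this file is a catalogId under targetCatalogTableIdentifiers that does not match any entry in targetCatalogs. Here is a minimal Python sketch that checks for that before launching XTable. It assumes the YAML has already been loaded into a plain dict (for example with a YAML parser of your choice); the function name find_undeclared_catalog_ids is hypothetical and not part of XTable.

```python
def find_undeclared_catalog_ids(config):
    """Return target catalogIds used by datasets but not declared in targetCatalogs."""
    declared = {c["catalogId"] for c in config.get("targetCatalogs", [])}
    missing = []
    for dataset in config.get("datasets", []):
        for target in dataset.get("targetCatalogTableIdentifiers", []):
            catalog_id = target["catalogId"]
            if catalog_id not in declared and catalog_id not in missing:
                missing.append(catalog_id)
    return missing


# The same structure as the YAML above, written as a Python dict.
config = {
    "sourceCatalog": {
        "catalogId": "source-catalog-id",
        "catalogType": "STORAGE",
        "catalogProperties": {},
    },
    "targetCatalogs": [
        {
            "catalogId": "target-catalog-id-glue",
            "catalogSyncClientImpl": "org.apache.xtable.glue.GlueCatalogSyncClient",
            "catalogProperties": {"externalCatalog.glue.region": "ap-northeast-1"},
        }
    ],
    "datasets": [
        {
            "sourceCatalogTableIdentifier": {
                "storageIdentifier": {
                    "tableBasePath": "s3://your-source-bucket",
                    "tableName": "source-table",
                    "tableFormat": "DELTA",
                }
            },
            "targetCatalogTableIdentifiers": [
                {
                    "catalogId": "target-catalog-id-glue",
                    "tableFormat": "ICEBERG",
                    "tableIdentifier": {"hierarchicalId": "db.delta_table"},
                }
            ],
        }
    ],
}

print(find_undeclared_catalog_ids(config))  # prints []
```

An empty list means every dataset points at a declared target catalog.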

Running XTable

Execute XTable with the following command:

java -cp "xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar:xtable-aws/target/xtable-aws-0.2.0-SNAPSHOT-bundled.jar:hudi-hive-sync-bundle-0.14.0.jar" \
  org.apache.xtable.utilities.RunCatalogSync \
  --catalogSyncConfig my_config_catalog.yaml
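
If you want to launch the same command from a script (for example from a cron job on the EC2 instance), a small Python wrapper is enough. This is only a sketch: the helper names are hypothetical, the JAR paths are the ones from the command above, and it assumes java is on the PATH.

```python
import os
import subprocess

MAIN_CLASS = "org.apache.xtable.utilities.RunCatalogSync"


def build_catalog_sync_command(jar_paths, config_path):
    """Build the argument list for the java invocation shown above."""
    classpath = os.pathsep.join(jar_paths)  # ':' on Linux and macOS
    return ["java", "-cp", classpath, MAIN_CLASS, "--catalogSyncConfig", config_path]


def run_catalog_sync(jar_paths, config_path):
    """Run XTable; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_catalog_sync_command(jar_paths, config_path), check=True)


jars = [
    "xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar",
    "xtable-aws/target/xtable-aws-0.2.0-SNAPSHOT-bundled.jar",
    "hudi-hive-sync-bundle-0.14.0.jar",
]
# Print the command instead of running it, so the sketch is safe to try anywhere.
print(build_catalog_sync_command(jars, "my_config_catalog.yaml"))
```

Calling run_catalog_sync(jars, "my_config_catalog.yaml") performs the actual sync. Passing the arguments as a list avoids shell quoting problems.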

Results

Databricks Source Table

XTable Execution

Glue Catalog

The Iceberg table is successfully registered in the Glue Data Catalog!
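
The registration can also be checked programmatically. The sketch below assumes the synced table carries the table_type=ICEBERG and metadata_location parameters that Iceberg tables in Glue normally have; I have not confirmed that every XTable version sets both. The helper names and the sample metadata path are hypothetical.

```python
def is_iceberg_table(get_table_response):
    """True if a Glue GetTable response looks like an Iceberg table."""
    params = get_table_response.get("Table", {}).get("Parameters", {})
    return (
        params.get("table_type", "").upper() == "ICEBERG"
        and bool(params.get("metadata_location"))
    )


def check_glue_table(database, table, region="ap-northeast-1"):
    """Fetch the table from Glue and apply the check above."""
    import boto3  # imported lazily so is_iceberg_table works without boto3 installed

    glue = boto3.client("glue", region_name=region)
    return is_iceberg_table(glue.get_table(DatabaseName=database, Name=table))


# A hand-written response shaped like Glue's GetTable output (values are made up).
sample = {
    "Table": {
        "Name": "delta_table",
        "Parameters": {
            "table_type": "ICEBERG",
            "metadata_location": "s3://your-source-bucket/metadata/example.metadata.json",
        },
    }
}
print(is_iceberg_table(sample))  # prints True
```

With AWS credentials configured, check_glue_table("db", "delta_table") runs the same check against the real catalog.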

Athena Query

Athena can query the Iceberg table via Glue without any issues.


Alternative Approach (EventBridge × Lambda)

Although true real-time sync is difficult, you can still implement periodic synchronization using EventBridge and Lambda.
For near-real-time workflows, S3 event triggers work well.

Scheduled Sync Architecture

  • Data written to Delta Lake (S3)
  • EventBridge triggers Lambda
  • Lambda runs XTable Java libraries via JPype
  • Converts Delta → Iceberg and syncs metadata to Glue Catalog
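
A Lambda handler for the scheduled variant could look roughly like this. It is a sketch under several assumptions: the function is packaged (for example as a container image) together with a JVM, JPype, and the XTable JARs; the environment variable names XTABLE_JARS and XTABLE_CONFIG and the default config path are hypothetical; and RunCatalogSync.main is assumed to return normally instead of terminating the JVM.

```python
import os

XTABLE_MAIN_CLASS = "org.apache.xtable.utilities.RunCatalogSync"


def build_sync_args(config_path):
    """Arguments passed to RunCatalogSync, same as on the command line."""
    return ["--catalogSyncConfig", config_path]


def run_catalog_sync(jar_paths, config_path):
    import jpype  # imported lazily; only needed inside the Lambda runtime

    # A JVM can be started only once per process, so reuse it on warm invocations.
    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=list(jar_paths))

    main_class = jpype.JClass(XTABLE_MAIN_CLASS)
    java_args = jpype.JArray(jpype.JString)(build_sync_args(config_path))
    main_class.main(java_args)


def lambda_handler(event, context):
    # Hypothetical environment variables: a ':'-separated JAR list and a config path.
    jars = os.environ["XTABLE_JARS"].split(":")
    config = os.environ.get("XTABLE_CONFIG", "/var/task/my_config_catalog.yaml")
    run_catalog_sync(jars, config)
    return {"status": "synced", "config": config}
```

An EventBridge schedule rule targeting this function is then all that is needed to run the sync periodically.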

Event-Driven Architecture (Near Real-Time)

  • Data written to Delta Lake (S3)
  • S3 event triggers Lambda
  • Lambda uses JPype to run XTable
  • Syncs Iceberg/Delta metadata to Glue Catalog

This enables low-cost, near-real-time synchronization.
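
With S3 triggers, every written object fires an event, including Parquet data files and checkpoints. One way to avoid redundant syncs is to react only to new Delta commit files, which the Delta protocol names as 20-digit zero-padded versions under _delta_log/. The sketch below shows such a filter; the handler only reports its decision, and the actual XTable call is left as a placeholder.

```python
import re
from urllib.parse import unquote_plus

# Matches e.g. "source-table/_delta_log/00000000000000000003.json".
_COMMIT_FILE = re.compile(r"(^|/)_delta_log/\d{20}\.json$")


def has_new_delta_commit(event):
    """True if any S3 record in the event refers to a Delta commit file."""
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        if _COMMIT_FILE.search(key):
            return True
    return False


def lambda_handler(event, context):
    if not has_new_delta_commit(event):
        return {"status": "skipped"}
    # Placeholder: trigger the XTable sync here.
    return {"status": "sync-requested"}


commit_key = "source-table/_delta_log/%020d.json" % 3
sample_event = {"Records": [{"s3": {"object": {"key": commit_key}}}]}
print(lambda_handler(sample_event, None))  # prints {'status': 'sync-requested'}
```

The same filtering can also be approximated with a prefix/suffix filter on the S3 event notification itself, which saves the Lambda invocation entirely.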


What I Learned While Using XTable

Small EC2 Instances Are Not Enough

Building and running XTable pulls in a fair amount of tooling and dependencies:

  • Java 11
  • Maven
  • hudi-hive-sync
  • Various XTable JARs

As a result, t3.micro (1 GiB) and t3.small (2 GiB) instances frequently run out of memory and fail.

Typical issues:

  • Java heap errors
  • Spark startup failures
  • JAR load errors
  • Class conflicts

I recommend using t3.medium (4 GiB) or larger.
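
To fail fast on an undersized host, a script can check physical memory before starting the JVM. This is a minimal sketch for Linux; the 3 GiB threshold is my own assumption, chosen to sit between what a t3.small (2 GiB) and a t3.medium (nominally 4 GiB, slightly less as seen by the OS) report.

```python
import os

MIN_MEMORY_GIB = 3.0  # assumed threshold, not an official XTable requirement


def total_memory_gib():
    """Physical memory of this host in GiB (POSIX systems only)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3


def has_enough_memory(total_gib, minimum_gib=MIN_MEMORY_GIB):
    return total_gib >= minimum_gib


print(has_enough_memory(2.0))  # a t3.small-sized host: prints False
```

On the instance itself, has_enough_memory(total_memory_gib()) gives the answer for the actual host.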


Type Conversion Errors Are Common

Just like Microsoft Fabric’s Iceberg Shortcut, type conversion compatibility is still immature.
Cross-OTF schema compatibility is challenging, and full bidirectional sync remains difficult.

Even though each OTF emphasizes schema evolution, interoperability introduces new considerations, because the formats do not share an identical type system.
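
One concrete difference between the type systems: Delta has 8-bit (byte) and 16-bit (short) integers, while the smallest integer in the Iceberg spec is the 32-bit int. The sketch below scans a Delta schema (the JSON schemaString stored in the table's metaData action) for such columns so they can be reviewed before conversion. It is only an illustration; I am not claiming these are the types that failed in my tests, and the function name and sample schema are hypothetical.

```python
import json

# Delta integer types that have no same-width counterpart in Iceberg.
NARROW_INT_TYPES = {"byte", "short"}


def find_narrow_int_columns(schema_string):
    """Return top-level column names whose Delta type is byte or short."""
    schema = json.loads(schema_string)
    return [
        field["name"]
        for field in schema["fields"]
        # Nested types (struct, array, map) are dicts; only primitives are strings.
        if isinstance(field["type"], str) and field["type"] in NARROW_INT_TYPES
    ]


schema_string = json.dumps(
    {
        "type": "struct",
        "fields": [
            {"name": "id", "type": "long", "nullable": False, "metadata": {}},
            {"name": "status_code", "type": "short", "nullable": True, "metadata": {}},
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        ],
    }
)
print(find_narrow_int_columns(schema_string))  # prints ['status_code']
```

Nested fields are not inspected here; a fuller check would walk struct, array, and map types recursively.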


Conclusion

In this post, I introduced Apache XTable, a powerful open-source tool for interoperating between different Open Table Formats on a unified catalog.

However, from hands-on experience:

  • Type conversion between Delta and Iceberg is still unstable
  • You need sufficiently large EC2 instances
  • Schema evolution requires careful testing
  • And the project is still incubating

So full production adoption is difficult at this stage.

That said, XTable is one of the fastest-growing solutions in the OTF interoperability space.
It has the potential to break down the walls between lakehouses, reduce data silos, and enable more flexible data architectures in the future.

I hope this article helps anyone exploring modern data platform design or multi-OTF environments.
