Original Japanese article: Apache XTableを使ったAWS上でのOpen Table Format相互運用(Delta→Iceberg)
Introduction
I'm Aki, an AWS Community Builder (@jitepengin).
Lakehouse architectures have become increasingly common in modern data platforms. In many cases, multiple Open Table Formats (OTFs) are used side by side, such as:
- Delta Lake
- Apache Iceberg
- Apache Hudi
For example:
“Our existing lakehouse uses Delta Lake, but the new project wants to adopt Iceberg…”
This kind of scenario is becoming more frequent.
When this happens, interoperability between OTFs becomes a key challenge.
Interoperability refers to the ability to integrate different table formats and lakehouses smoothly and seamlessly.
For instance, Microsoft Fabric provides partial Delta ⇔ Iceberg sync via the Iceberg Shortcut feature.
However, it still suffers from type conversion errors, and full bidirectional sync is not yet possible.
This is where Apache XTable has been gaining attention.
In this post, I’ll walk through how I used XTable to convert and synchronize Delta Lake tables to Apache Iceberg on AWS.
What Is Apache XTable?
Apache XTable (https://xtable.apache.org/) is an open-source project designed to provide seamless interoperability between the following OTFs:
- Apache Hudi
- Delta Lake
- Apache Iceberg
Key features include:
TableFormatSync
Converts table metadata between formats.
For example, you can read Delta Lake tables as Iceberg tables.
CatalogSync
Synchronizes metadata between multiple external catalogs.
Currently supported:
- Hive Metastore (HMS)
- AWS Glue Data Catalog
Upcoming support includes Unity Catalog, Polaris, Gravitino, DataHub, and more.
Polaris Catalog, which focuses on Iceberg interoperability, is also worth watching.
With these capabilities, you can combine the strengths of each OTF (Hudi's real-time ingestion, Delta's rich feature set, Iceberg's flexibility) within a unified catalog.
Interoperating with AWS Glue Catalog
Architecture
I tested the following setup:
- Create Delta Lake tables with Databricks (stored in S3)
- Run XTable on EC2
- Convert Delta → Iceberg
- Sync the Iceberg table to AWS Glue Catalog
Unity Catalog cannot be used as a source yet, so I pointed XTable at Delta tables stored directly in S3.
Configuration
Below is an example configuration.
The sourceCatalog uses a storage backend, and the target catalog is Glue.
The dataset section controls the Delta → Iceberg conversion.
sourceCatalog:
  catalogId: "source-catalog-id"
  catalogType: "STORAGE"
  catalogProperties: {}
targetCatalogs:
  - catalogId: "target-catalog-id-glue"
    catalogSyncClientImpl: "org.apache.xtable.glue.GlueCatalogSyncClient"
    catalogProperties:
      externalCatalog.glue.region: "ap-northeast-1"
datasets:
  - sourceCatalogTableIdentifier:
      storageIdentifier:
        tableBasePath: "s3://your-source-bucket"
        tableName: "source-table"
        tableFormat: "DELTA"
    targetCatalogTableIdentifiers:
      - catalogId: "target-catalog-id-glue"
        tableFormat: "ICEBERG"
        tableIdentifier:
          hierarchicalId: "db.delta_table"
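When several tables need the same treatment, the configuration can be generated rather than written by hand. The following is a minimal sketch that rebuilds the structure above as a Python dict. The function name `build_catalog_sync_config` and the tuple layout of `tables` are my own conventions for illustration and are not part of XTable.

```python
import json


def build_catalog_sync_config(region, tables):
    """Build the catalog sync configuration as a plain dict.

    `tables` is a list of (base_path, source_name, hierarchical_id) tuples.
    That tuple layout is a convention of this sketch, not of XTable.
    """
    target_id = "target-catalog-id-glue"
    datasets = []
    for base_path, source_name, hierarchical_id in tables:
        datasets.append({
            "sourceCatalogTableIdentifier": {
                "storageIdentifier": {
                    "tableBasePath": base_path,
                    "tableName": source_name,
                    "tableFormat": "DELTA",
                }
            },
            "targetCatalogTableIdentifiers": [{
                # Must match the catalogId declared under targetCatalogs.
                "catalogId": target_id,
                "tableFormat": "ICEBERG",
                "tableIdentifier": {"hierarchicalId": hierarchical_id},
            }],
        })
    return {
        "sourceCatalog": {
            "catalogId": "source-catalog-id",
            "catalogType": "STORAGE",
            "catalogProperties": {},
        },
        "targetCatalogs": [{
            "catalogId": target_id,
            "catalogSyncClientImpl": "org.apache.xtable.glue.GlueCatalogSyncClient",
            "catalogProperties": {"externalCatalog.glue.region": region},
        }],
        "datasets": datasets,
    }


config = build_catalog_sync_config(
    "ap-northeast-1",
    [("s3://your-source-bucket", "source-table", "db.delta_table")],
)
print(json.dumps(config, indent=2))
```

The dict is printed as JSON here only to keep the sketch free of third-party imports. To produce the YAML file itself, a YAML library such as PyYAML (`yaml.safe_dump`) can serialize the same dict.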
Running XTable
Execute XTable with the following command:
java -cp "xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar:xtable-aws/target/xtable-aws-0.2.0-SNAPSHOT-bundled.jar:hudi-hive-sync-bundle-0.14.0.jar" \
org.apache.xtable.utilities.RunCatalogSync \
--catalogSyncConfig my_config_catalog.yaml
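If you want to call the same command from a script (for example from cron on the EC2 instance), a thin wrapper is enough. This is a sketch under a few assumptions: the jar paths are copied from the command above, the helper names `build_command` and `run_catalog_sync` are hypothetical, and the optional `-Xmx` heap flag is a standard JVM option that I added for convenience, not something XTable requires.

```python
import os
import subprocess

# Jar paths copied from the command above; adjust them to your build output.
CLASSPATH_JARS = [
    "xtable-utilities/target/xtable-utilities_2.12-0.2.0-SNAPSHOT-bundled.jar",
    "xtable-aws/target/xtable-aws-0.2.0-SNAPSHOT-bundled.jar",
    "hudi-hive-sync-bundle-0.14.0.jar",
]
MAIN_CLASS = "org.apache.xtable.utilities.RunCatalogSync"


def build_command(config_path, max_heap=None):
    """Return the java command as an argument list (no shell quoting needed)."""
    cmd = ["java"]
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # standard JVM heap limit, e.g. "2g"
    cmd += ["-cp", os.pathsep.join(CLASSPATH_JARS), MAIN_CLASS,
            "--catalogSyncConfig", config_path]
    return cmd


def run_catalog_sync(config_path, max_heap=None):
    """Run the sync and raise if the JVM exits with a non-zero status."""
    result = subprocess.run(
        build_command(config_path, max_heap),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Keep only the tail of stderr so the error message stays readable.
        raise RuntimeError(f"XTable sync failed:\n{result.stderr[-2000:]}")
    return result.stdout
```

Calling `run_catalog_sync("my_config_catalog.yaml")` is then equivalent to the command shown above.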
Results
Databricks Source Table
XTable Execution
Glue Catalog
The Iceberg table is successfully registered in the Glue Data Catalog!
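The registration can also be checked programmatically instead of in the console. The sketch below assumes the synced table follows the usual Iceberg-on-Glue convention of carrying `table_type` and `metadata_location` in the table parameters; I have not confirmed every parameter XTable writes, so treat the keys as an assumption. The function name is hypothetical, and the Glue client is passed in so the function itself has no AWS dependency.

```python
def describe_iceberg_registration(glue_client, database, table):
    """Summarize whether a Glue table looks like a registered Iceberg table.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    params = response["Table"].get("Parameters", {})
    return {
        # Assumed convention: Iceberg tables in Glue carry table_type=ICEBERG.
        "is_iceberg": params.get("table_type", "").upper() == "ICEBERG",
        # Points at the current Iceberg metadata JSON file, when present.
        "metadata_location": params.get("metadata_location"),
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue", region_name="ap-northeast-1")
#   print(describe_iceberg_registration(glue, "db", "delta_table"))
```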
Athena Query
Athena can query the Iceberg table via Glue without any issues.
Alternative Approach (EventBridge × Lambda)
Although true real-time sync is difficult, you can still implement periodic synchronization using EventBridge and Lambda.
For near-real-time workflows, S3 event triggers work well.
Scheduled Sync Architecture
- Data written to Delta Lake (S3)
- EventBridge triggers Lambda
- Lambda runs XTable Java libraries via JPype
- Converts Delta → Iceberg and syncs metadata to Glue Catalog
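The Lambda side of this flow could look roughly like the sketch below. It rests on several assumptions that are not part of the setup described above: the function runs from a container image that bundles a JVM and the `jpype` package, the jars and config file live under `/opt/xtable`, and the environment variable names `XTABLE_JAR_DIR` and `XTABLE_CONFIG` are made up for illustration. It calls the same `RunCatalogSync` entry point as the command line; I have not verified how that `main` method behaves on exit inside a long-lived JVM, so test it before relying on it.

```python
import os

MAIN_CLASS = "org.apache.xtable.utilities.RunCatalogSync"


def list_jars(jar_dir):
    """Return the sorted paths of all .jar files directly under jar_dir."""
    return sorted(
        os.path.join(jar_dir, name)
        for name in os.listdir(jar_dir)
        if name.endswith(".jar")
    )


def sync_args(config_path):
    """Arguments passed to RunCatalogSync, mirroring the CLI invocation."""
    return ["--catalogSyncConfig", config_path]


def run_xtable_in_jvm(jar_paths, config_path):
    """Start the JVM once per Lambda container and invoke XTable's main()."""
    import jpype  # third-party; imported lazily so the helpers above stay importable

    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=jar_paths)
    main_class = jpype.JClass(MAIN_CLASS)
    # Convert the Python list into a Java String[] for main(String[] args).
    java_args = jpype.JArray(jpype.JString)(sync_args(config_path))
    main_class.main(java_args)


def handler(event, context):
    """Lambda entry point for the EventBridge schedule."""
    jar_dir = os.environ.get("XTABLE_JAR_DIR", "/opt/xtable")  # hypothetical
    config_path = os.environ.get(
        "XTABLE_CONFIG", "/opt/xtable/my_config_catalog.yaml"  # hypothetical
    )
    run_xtable_in_jvm(list_jars(jar_dir), config_path)
    return {"status": "synced", "config": config_path}
```

Keeping the JVM start behind `isJVMStarted()` matters because a warm Lambda container reuses the Python process, and JPype can only start the JVM once per process.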
Event-Driven Architecture (Near Real-Time)
- Data written to Delta Lake (S3)
- S3 event triggers Lambda
- Lambda uses JPype to run XTable
- Syncs Iceberg/Delta metadata to Glue Catalog
This enables low-cost, near-real-time synchronization.
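For the event-driven variant, the Lambda first has to work out which table changed. One way, sketched below, is to react only to Delta commit files: each Delta commit writes a zero-padded 20-digit JSON file under the table's `_delta_log/` directory, so data file uploads and checkpoints can be ignored. The function name `changed_table_paths` is hypothetical, and in practice you would also narrow the S3 notification itself with a suffix filter.

```python
import re
from urllib.parse import unquote_plus

# Matches "<table prefix>/_delta_log/<20-digit version>.json".
# The table prefix is optional for tables stored at the bucket root.
COMMIT_KEY = re.compile(r"^(?:(?P<table>.*)/)?_delta_log/\d{20}\.json$")


def changed_table_paths(event):
    """Return the sorted, de-duplicated table base paths touched by an S3 event."""
    paths = set()
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        match = COMMIT_KEY.match(key)
        if not match:
            continue  # data file, checkpoint, or unrelated object
        prefix = match.group("table")
        paths.add(f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}")
    return sorted(paths)
```

Each returned path corresponds to a `tableBasePath` in the XTable configuration, so several commits to the same table within one event trigger a single sync.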
What I Learned While Using XTable
Small EC2 Instances Are Not Enough
XTable loads multiple dependencies:
- Java 11
- Maven
- hudi-hive-sync
- Various XTable JARs
As a result, t3.micro or t3.small instances run out of memory and fail frequently.
Typical issues:
- Java heap errors
- Spark startup failures
- JAR load errors
- Class conflicts
→ Use t3.medium or larger
Type Conversion Errors Are Common
Just like Microsoft Fabric’s Iceberg Shortcut, type conversion compatibility is still immature.
Cross-OTF schema compatibility is challenging, and full bidirectional sync remains difficult.
Even though OTFs emphasize schema evolution, interoperability introduces new considerations.
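Before syncing a table, it can help to list its column types so you know what the conversion will have to handle. The sketch below reads them from the `metaData` action in a Delta commit file, where the schema is stored as a JSON string in `schemaString`. The function names are hypothetical, and the set of types to review is deliberately left to the caller, since I am not claiming which specific types fail.

```python
import json


def describe_type(delta_type):
    """Primitive types are plain strings; complex types are dicts with a 'type' key."""
    if isinstance(delta_type, str):
        return delta_type
    return delta_type.get("type", "unknown")  # "struct", "array", or "map"


def column_types_from_commit(commit_text):
    """Return {column: type} from the first metaData action in a commit file.

    `commit_text` is the content of one _delta_log/*.json file
    (one JSON action per line). Returns {} if no metaData action is present.
    """
    for line in commit_text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "metaData" in action:
            schema = json.loads(action["metaData"]["schemaString"])
            return {f["name"]: describe_type(f["type"]) for f in schema["fields"]}
    return {}


def columns_to_review(column_types, review_types):
    """Columns whose type is in a caller-supplied set of types to double-check."""
    return sorted(name for name, t in column_types.items() if t in review_types)
```

Note that not every commit contains a `metaData` action; it appears in the first commit and whenever the schema or table properties change, so you may need to scan back to the most recent one.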
Conclusion
In this post, I introduced Apache XTable, a powerful open-source tool for making different Open Table Formats interoperate on a unified catalog.
However, from hands-on experience:
- Type conversion between Delta and Iceberg is still unstable
- You need sufficiently large EC2 instances
- Schema evolution requires careful testing
- The project is still incubating
So full production adoption is difficult at this stage.
That said, XTable is one of the fastest-growing solutions in the OTF interoperability space.
It has the potential to break down the walls between lakehouses, reduce data silos, and enable more flexible data architectures in the future.
I hope this article helps anyone exploring modern data platform design or multi-OTF environments.






