Aki for AWS Community Builders

Posted on Jul 1

Does Amazon S3 Tables Replace AWS Glue Data Catalog? Understanding Their Relationship

#aws #dataengineering #iceberg

Original Japanese article: S3 TablesはGlue Data Catalogを置き換えるのか考えてみた ("Thinking About Whether S3 Tables Replaces Glue Data Catalog")

Introduction

I'm Aki, an AWS Community Builder (@jitepengin).

When I first started exploring Amazon S3 Tables, one question immediately came to mind:

"Does this service eventually replace AWS Glue Data Catalog?"

Not everyone may share that impression. However, because S3 Tables provides its own Iceberg REST Catalog endpoint and can create and manage namespaces and tables without any Glue Data Catalog integration, I couldn't help wondering.

The official documentation uses the term "integration", which suggests that the relationship is more nuanced than a simple replacement. Still, it can be difficult to understand how these two services actually fit together.

In this article, I'd like to clarify the relationship between Amazon S3 Tables and AWS Glue Data Catalog based on the official AWS documentation.

To summarize upfront:

Amazon S3 Tables does not replace AWS Glue Data Catalog. Instead, it should be viewed as a new managed table storage service that provides its own Iceberg REST Catalog while also integrating with Glue Data Catalog.

Understanding the Roles of S3 Tables and Glue Data Catalog

To begin with, Amazon S3 Tables is not a replacement for Glue Data Catalog.

At the same time, it is not entirely accurate to think of S3 Tables as "just storage."

S3 Tables provides its own Iceberg REST Catalog endpoint:

https://s3tables.<region>.amazonaws.com/iceberg

Even without integrating with Glue Data Catalog, you can create, list, delete, and read/write namespaces and tables directly through this endpoint.

The official documentation describes this standalone endpoint as being suitable for scenarios where you only need basic read/write access to a single table bucket. For other scenarios, AWS recommends using the Glue Iceberg REST endpoint, which provides:

Integrated table management
Centralized governance
Fine-grained access control
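In client configuration, the difference between the two endpoints is easy to see. Below is a minimal Python sketch that builds the connection properties a REST-capable Iceberg client would need for each one. The property names follow PyIceberg's REST catalog options as they appear in AWS examples. The account ID, Region, and bucket values are placeholders. Treat this as an illustration and check the current AWS and PyIceberg documentation before relying on it.

```python
# Sketch: REST catalog properties for the two Iceberg REST endpoints.
# Assumption: keys follow PyIceberg's REST catalog options as used in AWS examples.


def s3tables_rest_properties(region: str, table_bucket_arn: str) -> dict:
    """Standalone S3 Tables endpoint: one catalog per table bucket."""
    return {
        "type": "rest",
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        # The warehouse is the ARN of the single table bucket being exposed.
        "warehouse": table_bucket_arn,
        # Requests are signed with SigV4 for the s3tables service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    }


def glue_rest_properties(region: str, account_id: str, table_bucket: str) -> dict:
    """Glue endpoint: the bucket is addressed as a child of s3tablescatalog."""
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # The warehouse names the child catalog under the s3tablescatalog parent.
        "warehouse": f"{account_id}:s3tablescatalog/{table_bucket}",
        # Requests are signed with SigV4 for the glue service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


# Placeholder values for illustration only.
standalone = s3tables_rest_properties(
    "us-east-1", "arn:aws:s3tables:us-east-1:111122223333:bucket/analytics-bucket"
)
via_glue = glue_rest_properties("us-east-1", "111122223333", "analytics-bucket")
```

If PyIceberg is installed, either dictionary could be passed as `load_catalog("my_catalog", **standalone)`. Three values change between the two endpoints: the URI, the warehouse (a bucket ARN versus an `<account-id>:s3tablescatalog/<bucket>` catalog name), and the SigV4 signing name.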

The roles of the two services can be summarized as follows:

Service	Role
Amazon S3 Tables	A storage layer that stores Iceberg table data and metadata while also providing an Iceberg REST Catalog for a single table bucket
AWS Glue Data Catalog	A centralized metadata catalog for AWS tables and databases, including S3 Tables, providing unified access and governance across analytics services such as Athena and Redshift

In other words, the central catalog for governance across AWS analytics services is still Glue Data Catalog.

S3 Tables is simply one of the catalog sources that can be integrated into Glue Data Catalog as a federated catalog.

Given recent features such as S3 Tables, Catalog Federation, S3 Metadata, and S3 Annotations, all of which bring different metadata sources under Glue Data Catalog, its role may become even more important in the future.

If you only need to manage a single table bucket, the standalone REST Catalog provided by S3 Tables may be sufficient.

However, if you need to work across multiple analytics services or multiple catalog sources, using Glue Data Catalog as the entry point is likely to be much easier operationally.

Understanding Federated Catalogs

By integrating S3 Tables with Glue Data Catalog, you can use a single catalog to discover and query data in your Amazon S3 data lakes and even join it with data stored in S3 Tables.

A federated catalog allows users to access metadata through Glue Data Catalog without needing to know where that metadata is actually stored.

From a user's perspective, the S3 Tables integration works the same way: each table bucket shows up as another catalog within Glue Data Catalog, and you don't need to know where its metadata physically resides.

The integration maps S3 Tables resources into Glue Catalog objects as follows:

An S3 table bucket becomes a Data Catalog catalog.
An S3 namespace becomes an AWS Glue database.
An S3 table becomes an AWS Glue table.

When you enable the integration through the Amazon S3 console, AWS automatically adds another layer on top: a federated catalog named s3tablescatalog, which acts as the parent catalog for all existing and future S3 table buckets in that account and Region.
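The mapping above, together with the s3tablescatalog parent, can be expressed as a small data model. This is a hypothetical Python sketch for the same-account case only. The class and function names are mine and not part of any AWS SDK, and the bucket, namespace, and table names are examples.

```python
from dataclasses import dataclass

# Parent federated catalog created by the console integration (same-account case).
PARENT_CATALOG = "s3tablescatalog"


@dataclass(frozen=True)
class S3TableRef:
    table_bucket: str
    namespace: str
    table: str


@dataclass(frozen=True)
class GlueTableRef:
    catalog: str  # e.g. "s3tablescatalog/analytics-bucket"
    database: str
    table: str

    def path(self) -> str:
        """Slash-separated path as shown conceptually in Glue Data Catalog."""
        return f"{self.catalog}/{self.database}/{self.table}"


def to_glue(ref: S3TableRef) -> GlueTableRef:
    """Map S3 Tables resources onto Glue Data Catalog objects."""
    return GlueTableRef(
        catalog=f"{PARENT_CATALOG}/{ref.table_bucket}",  # table bucket -> catalog
        database=ref.namespace,  # namespace -> database
        table=ref.table,  # table -> table
    )


glue_ref = to_glue(S3TableRef("analytics-bucket", "sales", "transactions"))
print(glue_ref.path())  # s3tablescatalog/analytics-bucket/sales/transactions
```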

From the perspective of query engines, the architecture looks like this:

Athena / Redshift / Glue ETL
              │
              ▼
      Glue Data Catalog
              │
              ├── Traditional Glue Catalog Tables
              ├── S3 Tables (via s3tablescatalog)
              └── External Catalogs (via Catalog Federation)

Regardless of where the metadata actually resides, all of these sources appear as tables under Glue Data Catalog.

As seen from Glue Data Catalog, the hierarchy for an S3 table looks like this:

s3tablescatalog (federated catalog)
        └── analytics-bucket (child catalog = S3 table bucket)
                └── sales (database = S3 namespace)
                        └── transactions (table = S3 table)

For example, if you have a table bucket named analytics-bucket containing a namespace called sales and a table called transactions, it can conceptually be represented in Glue Data Catalog as:

s3tablescatalog/analytics-bucket/sales/transactions

In Athena SQL, the same table is referenced as:

"s3tablescatalog/analytics-bucket"."sales"."transactions"

The important thing to remember is that the parent catalog layer, s3tablescatalog, sits in front of the table bucket.
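Because the catalog name contains a slash, it has to be written as a quoted identifier in Athena SQL. Here is a minimal sketch of building that reference for DML queries such as SELECT. It assumes standard SQL delimited-identifier rules: wrap each name in double quotes and double any embedded double quote. The helper names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Delimit an identifier with double quotes, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def athena_table_reference(table_bucket: str, namespace: str, table: str) -> str:
    """Fully qualified catalog.database.table reference for an S3 table."""
    catalog = f"s3tablescatalog/{table_bucket}"
    # The catalog name contains "/", so it must be quoted; every part is quoted
    # here to match the form used in this article.
    return ".".join(quote_identifier(part) for part in (catalog, namespace, table))


sql = f"SELECT * FROM {athena_table_reference('analytics-bucket', 'sales', 'transactions')} LIMIT 10"
print(sql)
# SELECT * FROM "s3tablescatalog/analytics-bucket"."sales"."transactions" LIMIT 10
```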

Note:
The four-level hierarchy described above applies to same-account scenarios.
In cross-account scenarios, individual S3 table buckets must be mounted manually into Data Catalog, resulting in a three-part hierarchy.

Trying It Out

Let's verify how an S3 table bucket becomes visible from Glue Data Catalog and how it can be queried from Athena.

Creating a Table Bucket

Create a table bucket in the console and select the "Enable integration" checkbox.

If you're using the CLI:

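aws s3tables create-table-bucket --name <table-bucket-name>

Note that the CLI call only creates the bucket. Because s3tablescatalog is the parent catalog for all existing and future table buckets in the account and Region, the new bucket appears in Glue Data Catalog automatically once the integration has been enabled there.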

Querying from Athena

To query the table from Athena, specify:

s3tablescatalog/<table-bucket-name>

as the catalog.
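You can also supply the same catalog setting programmatically. The sketch below builds the keyword arguments for boto3's Athena `start_query_execution` call. It assumes the default `primary` workgroup already has a query result location configured. The helper name and the bucket and namespace values are hypothetical.

```python
def athena_query_kwargs(table_bucket: str, namespace: str, sql: str,
                        workgroup: str = "primary") -> dict:
    """Keyword arguments for boto3 Athena start_query_execution (hypothetical helper)."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {
            # Catalog: the table bucket's child catalog under s3tablescatalog.
            "Catalog": f"s3tablescatalog/{table_bucket}",
            # Database: the S3 Tables namespace.
            "Database": namespace,
        },
        # Assumes this workgroup already has a query result location configured.
        "WorkGroup": workgroup,
    }


kwargs = athena_query_kwargs("analytics-bucket", "sales", "SELECT * FROM transactions LIMIT 10")
# With boto3 installed and credentials configured:
#   boto3.client("athena").start_query_execution(**kwargs)
```

Setting `Database` in the execution context lets the query use the bare table name instead of the fully quoted three-part reference.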

Viewing It from Glue Data Catalog

When you open the Glue Data Catalog console, you'll see the table bucket under s3tablescatalog, followed by its namespaces and tables.

You can also view the schema information directly.

One particularly nice aspect of this integration is that, without any special configuration, you can access S3 Tables data directly from the familiar Athena query editor and Glue Data Catalog console.

Existing Access Control Mechanisms Continue to Work

From an access management perspective, the existing Glue Data Catalog and Lake Formation mechanisms continue to work.

Data Catalog supports two access control modes for S3 Tables integration.

Mode	Description
IAM Access Control	Controls access to both S3 Tables and Data Catalog through IAM policies
AWS Lake Formation Access Control	Uses Lake Formation permissions in addition to Glue IAM permissions and supports database-, table-, column-, and row-level security
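In the IAM access control mode, a principal needs permissions on both sides: the Glue catalog objects and the S3 Tables resources. The following hypothetical read-only policy, built as a Python dictionary, shows the general shape of such a policy. The action list is my own selection. The Glue ARN patterns for s3tablescatalog resources are assumptions based on how the hierarchy maps to Glue objects, so verify them against the AWS documentation before use.

```python
import json


def read_only_policy(region: str, account_id: str, table_bucket: str, namespace: str) -> dict:
    """Hypothetical read-only identity policy for IAM access control mode."""
    bucket_arn = f"arn:aws:s3tables:{region}:{account_id}:bucket/{table_bucket}"
    glue = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Glue side: resolve catalog, database (namespace), and table metadata.
                # ARN patterns for s3tablescatalog resources are assumptions; verify them.
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": ["glue:GetCatalog", "glue:GetDatabase",
                           "glue:GetTable", "glue:GetTables"],
                "Resource": [
                    f"{glue}:catalog",
                    f"{glue}:catalog/s3tablescatalog",
                    f"{glue}:catalog/s3tablescatalog/{table_bucket}",
                    f"{glue}:database/s3tablescatalog/{table_bucket}/{namespace}",
                    f"{glue}:table/s3tablescatalog/{table_bucket}/{namespace}/*",
                ],
            },
            {
                # S3 Tables side: read bucket/namespace info, table metadata, and data.
                "Sid": "S3TablesRead",
                "Effect": "Allow",
                "Action": ["s3tables:GetTableBucket", "s3tables:GetNamespace",
                           "s3tables:ListTables", "s3tables:GetTable",
                           "s3tables:GetTableMetadataLocation", "s3tables:GetTableData"],
                "Resource": [bucket_arn, f"{bucket_arn}/table/*"],
            },
        ],
    }


# Placeholder account, Region, bucket, and namespace.
policy = read_only_policy("us-east-1", "111122223333", "analytics-bucket", "sales")
print(json.dumps(policy, indent=2))
```

In the Lake Formation mode, fine-grained table, column, and row permissions come from Lake Formation grants rather than from a policy like this one.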

One important detail concerns credentials when using Lake Formation.

If a registered role is configured and credential vending is enabled, principals do not need direct S3 Tables IAM permissions.

This is because Lake Formation issues credentials on behalf of the principal using the registered role.

I have another article covering AWS Lake Formation in detail if you're interested.
Organizing How to Use AWS Lake Formation

Because you can migrate between access control modes as requirements evolve, a practical approach might be to start with IAM-only permissions and later move to Lake Formation for finer-grained control.

If Nothing Is Being Replaced, What Actually Changed?

The position of Glue Data Catalog itself has not changed.

What has changed is the operational layer around data management and table maintenance.

Traditionally, Iceberg on S3 consisted of:

A general-purpose S3 bucket
Glue Data Catalog
Table maintenance mechanisms

Even in the traditional architecture, Glue Table Optimizer can already manage:

Compaction
Snapshot retention
Orphan file deletion

Therefore, the major difference between traditional Iceberg on S3 and S3 Tables is not necessarily the existence of these features, but rather that they are enabled by default.
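In the traditional setup, each optimizer is enabled separately for each table. Here is a sketch of the requests for boto3's Glue `create_table_optimizer`. It assumes an IAM role that the optimizer can assume. The account ID, database, table, role ARN, and helper name are placeholders.

```python
def glue_optimizer_requests(account_id: str, database: str, table: str,
                            role_arn: str) -> list:
    """One create_table_optimizer request per optimizer type (hypothetical helper)."""
    return [
        {
            "CatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Type": optimizer_type,
            # The optimizer runs with this role; each type is enabled individually.
            "TableOptimizerConfiguration": {"roleArn": role_arn, "enabled": True},
        }
        for optimizer_type in ("compaction", "retention", "orphan_file_deletion")
    ]


optimizer_requests = glue_optimizer_requests(
    "111122223333", "sales", "transactions",
    "arn:aws:iam::111122223333:role/GlueTableOptimizerRole",  # placeholder role
)
# With boto3: for r in optimizer_requests: boto3.client("glue").create_table_optimizer(**r)
```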

Item	Traditional Iceberg on S3	S3 Tables
Storage	General-purpose S3 bucket	Dedicated table bucket
Catalog	Glue Data Catalog	Native Iceberg REST Catalog or Glue Data Catalog
Compaction	Glue Table Optimizer (manual enablement)	Managed and enabled by default
Snapshot retention	Glue Table Optimizer (manual enablement)	Managed and enabled by default
Orphan file deletion	Glue Table Optimizer (manual enablement)	Managed and enabled by default
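With S3 Tables, you only need to touch maintenance settings when you want to override the managed behavior. The sketch below builds the JSON payloads passed as `--value` to `aws s3tables put-table-maintenance-configuration` at the table level. It also covers `put-table-bucket-maintenance-configuration` at the bucket level, which handles unreferenced file removal, the S3 Tables counterpart to orphan file deletion. The payload structure reflects my understanding of the S3 Tables maintenance API, and the numbers are illustrative rather than AWS defaults.

```python
import json


def compaction_value(target_file_size_mb: int) -> dict:
    """Table-level payload for type icebergCompaction."""
    return {
        "status": "enabled",
        "settings": {"icebergCompaction": {"targetFileSizeMB": target_file_size_mb}},
    }


def snapshot_management_value(min_snapshots_to_keep: int, max_snapshot_age_hours: int) -> dict:
    """Table-level payload for type icebergSnapshotManagement."""
    return {
        "status": "enabled",
        "settings": {
            "icebergSnapshotManagement": {
                "minSnapshotsToKeep": min_snapshots_to_keep,
                "maxSnapshotAgeHours": max_snapshot_age_hours,
            }
        },
    }


def unreferenced_file_removal_value(unreferenced_days: int, non_current_days: int) -> dict:
    """Bucket-level payload for type icebergUnreferencedFileRemoval."""
    return {
        "status": "enabled",
        "settings": {
            "icebergUnreferencedFileRemoval": {
                "unreferencedDays": unreferenced_days,
                "nonCurrentDays": non_current_days,
            }
        },
    }


# Illustrative values only; serialize for use as the CLI's --value argument.
print(json.dumps(compaction_value(256)))
# {"status": "enabled", "settings": {"icebergCompaction": {"targetFileSizeMB": 256}}}
```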

Of course, S3 Tables also provides dedicated resource types, optimized pricing models, and specialized APIs.

However, one of the biggest practical differences is the reduced onboarding effort, because these capabilities are enabled from the start.

From the perspective of Glue Data Catalog, S3 Tables is simply another catalog source integrated as a federated catalog.

It does not require replacing existing crawler-based Glue Data Catalog environments. Both approaches can coexist.

When Should You Use It?

S3 Tables integration seems particularly well suited for:

Building new Iceberg tables with maintenance enabled by default.
Querying across multiple table buckets from Athena or Redshift.
Leveraging Lake Formation's fine-grained access controls.

Things to keep in mind:

Cross-account scenarios require manual mounting.
Query engines require the s3tablescatalog parent catalog path.
Existing crawler-based Glue Catalog environments do not need to be migrated to S3 Tables.

My Thoughts on the Future

As more specialized storage services like S3 Tables emerge, I believe Glue Data Catalog may evolve beyond being merely a Hive Metastore-compatible catalog and become more of an AWS-wide metadata hub.

Catalog Federation already allows external catalogs such as Snowflake Horizon Catalog and Databricks Unity Catalog to connect under Glue Data Catalog.

This suggests a future where:

The data can reside anywhere, but the catalog entry point is centralized in Glue Data Catalog.

The recently announced Amazon S3 Annotations feature also seems to support this direction.

S3 Annotations allows rich, mutable, and queryable metadata to be attached directly to S3 objects.

When annotation tables are enabled, S3 automatically indexes those annotations into fully managed Apache Iceberg tables that can be queried using Athena.

Interestingly, the official examples reference:

"s3tablescatalog/aws-s3"."b_my_media_bucket"."annotation"

which shows that the s3tablescatalog hierarchy is now being used beyond the table buckets you create yourself.

While S3 Tables turns data into Iceberg tables and integrates them into Glue Data Catalog, S3 Annotations appears to do something similar for object metadata.

AWS has not explicitly stated this direction.

However, when looking at S3 Tables, S3 Metadata, S3 Annotations, and Catalog Federation together, Glue Data Catalog increasingly looks like an AWS-wide metadata plane rather than simply a Hive Metastore-compatible service.

It feels as though AWS is moving toward a future where both data and metadata can be accessed through a common Iceberg-based access model.

If this trend continues, Glue Data Catalog may become even more important as the metadata plane for AWS data services.

The arrival of S3 Tables does not diminish the importance of Glue Data Catalog.

If anything, it clarifies its role as the hub that integrates multiple data sources and metadata sources.

Conclusion

To summarize:

Amazon S3 Tables does not replace AWS Glue Data Catalog.
S3 Tables integrates into Glue Data Catalog as a federated catalog.
Glue Data Catalog may become even more important as a hub that integrates multiple metadata sources.
The catalog hierarchy consists of:

Federated Catalog (s3tablescatalog)
        ↓
Child Catalog (table bucket)
        ↓
Database (namespace)
        ↓
Table

Access control can be implemented using either IAM or Lake Formation and migrated later if necessary.
The major change introduced by S3 Tables is that storage, metadata management, and table maintenance are now provided in a more integrated and managed manner through table buckets.

For building new AWS-native data lakes or lakehouses, S3 Tables is becoming a compelling option.

However, this is not because it replaces Glue Data Catalog.

Rather, it provides a more managed way to operate Iceberg tables on top of a metadata foundation centered around Glue Data Catalog.

I hope this article helps clarify the relationship between Amazon S3 Tables and AWS Glue Data Catalog.
