Access Glue Iceberg tables via the Iceberg REST API
AWS has quietly released Iceberg REST API support for Glue. The Iceberg REST catalog API is a standard interface for accessing Iceberg tables on different platforms. More information can be found here: https://iceberg.apache.org/concepts/catalog/
PyIceberg is a Python library with generic Iceberg support, including the REST API. Other tools, such as PySpark, support it as well.
Example code that uses a Glue catalog via the Iceberg REST API:
from pyiceberg.catalog import load_catalog
import logging

# Set up logging to show debug messages
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Specifically for PyIceberg logging
logger = logging.getLogger('pyiceberg')
logger.setLevel(logging.DEBUG)


def main():
    rest_catalog = load_catalog(
        "ibtest1",
        **{
            "type": "rest",
            "uri": "https://glue.eu-central-1.amazonaws.com/iceberg",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "glue",
            "rest.signing-region": "eu-central-1"
        }
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("ibtest"))
    print(rest_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()
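The endpoint and signing region in the example are tied to eu-central-1. A minimal sketch of a helper that derives the same property set for another region follows. It assumes other regions use the same endpoint pattern, `https://glue.<region>.amazonaws.com/iceberg`, which this post only verified for eu-central-1. `glue_rest_properties` is a hypothetical name, not part of PyIceberg.

```python
def glue_rest_properties(region: str) -> dict[str, str]:
    """Build the REST catalog properties used above for a given AWS region.

    Assumption: other regions follow the same endpoint pattern as eu-central-1.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


props = glue_rest_properties("eu-central-1")
print(props["uri"])
```

The resulting dictionary can be passed on unchanged, for example `load_catalog("ibtest1", **props)`.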
Glue Catalog version
For comparison, this is the native Glue catalog version in PyIceberg. It goes through boto3 instead of the REST endpoint.
def main():
    glue_catalog = load_catalog("glue", **{"type": "glue"})
    print(glue_catalog.list_namespaces())
    print(glue_catalog.list_tables("ibtest"))
    print(glue_catalog.load_table("ibtest.ibtest1").scan().to_pandas())
Output
I'll add the full debug output here to show that it only uses the REST API. No requests to S3 appear in the log. I ran it with AWS credentials that have Glue and S3 permissions, selected via AWS_PROFILE in the environment.
ListNameSpace
2024-12-22 15:13:41,061 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/config
accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials
accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,062 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,213 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/config HTTP/1.1" 200 327
2024-12-22 15:13:41,237 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces
accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials
accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,237 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,435 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces HTTP/1.1" 200 48
[('ibtest',), ('sourcedata_sales',)]
ListTable
2024-12-22 15:13:41,462 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables
accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials
accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Signature:
xxxx
2024-12-22 15:13:41,541 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables HTTP/1.1" 200 59
[('ibtest', 'ibtest1')]
Scan Tables
2024-12-22 15:13:41,567 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1
accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials
accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Signature: xxxxxx
2024-12-22 15:13:41,712 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 HTTP/1.1" 200 2123
    id  name                 created
0  001  test 2024-12-22 13:48:31.381
Urls
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/config
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables
- https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1
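These URLs share one pattern: the endpoint, the `v1` API version, a `catalogs/<account id>` prefix, and then the namespace and table path. A small sketch that rebuilds them is shown here. It assumes the prefix is always `catalogs/<account id>`, as seen in the log (presumably handed to the client by the `/v1/config` call). `glue_iceberg_urls` is a hypothetical helper, not a PyIceberg function.

```python
from urllib.parse import quote


def glue_iceberg_urls(region: str, account_id: str, namespace: str, table: str) -> dict[str, str]:
    """Rebuild the four REST URLs observed in the debug log."""
    base = f"https://glue.{region}.amazonaws.com/iceberg/v1"
    # Assumption: the catalog prefix is "catalogs/<account id>", as in the log.
    prefix = f"{base}/catalogs/{quote(account_id, safe='')}"
    ns = quote(namespace, safe="")
    return {
        "config": f"{base}/config",
        "namespaces": f"{prefix}/namespaces",
        "tables": f"{prefix}/namespaces/{ns}/tables",
        "table": f"{prefix}/namespaces/{ns}/tables/{quote(table, safe='')}",
    }


urls = glue_iceberg_urls("eu-central-1", "123456789012", "ibtest", "ibtest1")
for name, url in urls.items():
    print(f"{name}: {url}")
```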
Curl option
I remembered that curl has a SigV4 option (--aws-sigv4). I tried it with the same IAM credentials and the provider string aws:amz:<region>:glue.
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/config --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
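Both curl's `--aws-sigv4` and botocore implement AWS Signature Version 4. As a rough sketch of what happens under the hood, the following standard-library code rebuilds the StringToSign seen in the debug log and derives a signature from it. The secret key is a dummy and the canonical request is simplified to two signed headers, so the resulting signature is illustrative only. `sigv4_signature` is a hypothetical helper, not a curl or botocore API.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signature(secret_key: str, amz_date: str, region: str, service: str,
                    canonical_request: str) -> tuple[str, str]:
    """Return (string_to_sign, signature) for an already-built canonical request."""
    date_stamp = amz_date[:8]  # e.g. 20241222 from 20241222T141341Z
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    hashed_request = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, hashed_request])

    # Derive the signing key by chaining HMACs over date, region, and service.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")

    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return string_to_sign, signature


# Simplified canonical request: only host and x-amz-date are signed here,
# fewer headers than PyIceberg signs in the log above.
canonical_request = "\n".join([
    "GET",
    "/iceberg/v1/config",
    "",  # empty query string
    "host:glue.eu-central-1.amazonaws.com",
    "x-amz-date:20241222T141341Z",
    "",  # blank line that ends the canonical headers
    "host;x-amz-date",
    hashlib.sha256(b"").hexdigest(),  # hash of the empty request body
])

string_to_sign, signature = sigv4_signature(
    "dummy-secret-key",  # placeholder, not a real credential
    "20241222T141341Z", "eu-central-1", "glue", canonical_request,
)
print(string_to_sign)
print(signature)
```

The first three lines of the printed StringToSign match the log: the algorithm, the timestamp, and the scope `20241222/eu-central-1/glue/aws4_request`.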
Combined with the URLs discovered above, the curl call returns the following output.
Get the namespaces
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
{"namespaces":[["ibtest"],["sourcedata_sales"]]}
Get the Table Info
curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue" | jq
{
"config": {
"metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
"previous_metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
"table_type": "ICEBERG"
},
"metadata": {
"current-schema-id": 0,
"current-snapshot-id": 968789183104214971,
"default-sort-order-id": 0,
"default-spec-id": 0,
"format-version": 2,
"last-column-id": 3,
"last-partition-id": 999,
"last-sequence-number": 1,
"last-updated-ms": 1734875312126,
"location": "s3://ibtest-123456789012/ibtest/ibtest1",
"metadata-log": [
{
"metadata-file": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
"timestamp-ms": 1734874992670
}
],
"partition-specs": [
{
"fields": [],
"spec-id": 0
}
],
"partition-statistics-files": [],
"properties": {
"write.parquet.compression-codec": "zstd"
},
"refs": {
"main": {
"snapshot-id": 968789183104214971,
"type": "branch"
}
},
"schemas": [
{
"fields": [
{
"doc": "",
"id": 1,
"name": "id",
"required": false,
"type": "string"
},
{
"doc": "",
"id": 2,
"name": "name",
"required": false,
"type": "string"
},
{
"doc": "",
"id": 3,
"name": "created",
"required": false,
"type": "timestamp"
}
],
"schema-id": 0,
"type": "struct"
}
],
"snapshot-log": [
{
"snapshot-id": 968789183104214971,
"timestamp-ms": 1734875312126
}
],
"snapshots": [
{
"manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
"schema-id": 0,
"sequence-number": 1,
"snapshot-id": 968789183104214971,
"summary": {
"changed-partition-count": "1",
"added-data-files": "1",
"total-equality-deletes": "0",
"added-records": "1",
"trino_query_id": "20241222_134831_00070_aiuhg",
"total-position-deletes": "0",
"added-files-size": "507",
"total-delete-files": "0",
"total-files-size": "507",
"total-records": "1",
"total-data-files": "1",
"operation": "append"
},
"timestamp-ms": 1734875312126
}
],
"sort-orders": [
{
"fields": [],
"order-id": 0
}
],
"statistics-files": [],
"table-uuid": "d4dbfb4a-93b4-4255-9ce3-cfaa280fa40c"
},
"metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
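The table response carries everything a client needs to plan a read: the schemas, the current snapshot, and the metadata location. A short sketch of picking the current schema and snapshot out of such a response is shown here. The JSON is a trimmed copy of the output above, and the variable names are illustrative only.

```python
import json

# Trimmed copy of the response above; only the keys used below are kept.
response_text = """
{
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "schemas": [
      {
        "schema-id": 0,
        "fields": [
          {"id": 1, "name": "id", "required": false, "type": "string"},
          {"id": 2, "name": "name", "required": false, "type": "string"},
          {"id": 3, "name": "created", "required": false, "type": "timestamp"}
        ]
      }
    ],
    "snapshots": [
      {
        "snapshot-id": 968789183104214971,
        "summary": {"operation": "append", "total-records": "1"}
      }
    ]
  }
}
"""

metadata = json.loads(response_text)["metadata"]

# Pick the schema and snapshot that the table currently points at.
schema = next(s for s in metadata["schemas"]
              if s["schema-id"] == metadata["current-schema-id"])
snapshot = next(s for s in metadata["snapshots"]
                if s["snapshot-id"] == metadata["current-snapshot-id"])

columns = [(f["name"], f["type"]) for f in schema["fields"]]
print(columns)
print(snapshot["summary"]["operation"], snapshot["summary"]["total-records"])
```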
Conclusion
With the latest release of Glue, you can access Iceberg tables on AWS through the standard Iceberg REST API, which opens the infrastructure to multiple tools.
The only AWS-specific part is signing the requests with SigV4: https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html. Many tools already use it for S3 access.
This decouples the code from AWS-specific access and allows you to use more generic tools.
Next steps
- Test with more tools (curl update added)
- Test with the new S3 Tables (Iceberg)
- Can we use this with Unity?