
Jacob for AWS Community Builders


Glue Iceberg REST API and PyIceberg

Access Glue Iceberg tables via the Iceberg REST API

AWS quietly released Iceberg REST API support for Glue. The Iceberg REST catalog API is a standard interface for accessing Iceberg tables across different platforms. More information can be found here: https://iceberg.apache.org/concepts/catalog/

PyIceberg is a Python library with generic Iceberg support, including the REST API. Other tools, such as PySpark, support it as well.

Example code that uses a Glue catalog through the Iceberg REST API:

from pyiceberg.catalog import load_catalog
import logging

# Set up logging to show debug messages
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Specifically for PyIceberg logging
logger = logging.getLogger('pyiceberg')
logger.setLevel(logging.DEBUG)



def main():
    # Connect to the Glue Iceberg REST endpoint; requests are signed with SigV4
    rest_catalog = load_catalog(
        "ibtest1",
        **{
            "type": "rest",
            "uri": "https://glue.eu-central-1.amazonaws.com/iceberg",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "glue",
            "rest.signing-region": "eu-central-1",
        },
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("ibtest"))
    # Load the table and read it into a pandas DataFrame
    print(rest_catalog.load_table("ibtest.ibtest1").scan().to_pandas())



if __name__ == "__main__":
    main()

Glue Catalog version

For comparison, this is the native Glue catalog in PyIceberg, which goes through the boto3 API instead of the REST endpoint.

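from pyiceberg.catalog import load_catalog


def main():
    # Native Glue catalog: credentials and region come from the environment
    glue_catalog = load_catalog("glue", **{"type": "glue"})

    print(glue_catalog.list_namespaces())
    print(glue_catalog.list_tables("ibtest"))
    print(glue_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()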

Output

The full debug output is included here to show that the catalog calls only use the REST API; no requests to S3 appear in the log. I ran it with AWS credentials that have Glue and S3 permissions, selected in the environment via AWS_PROFILE.

ListNameSpace

2024-12-22 15:13:41,061 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/config

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,062 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,213 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/config HTTP/1.1" 200 327
2024-12-22 15:13:41,237 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,237 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,435 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces HTTP/1.1" 200 48
[('ibtest',), ('sourcedata_sales',)]
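
The StringToSign lines in the log follow the SigV4 layout: the algorithm name, the request timestamp, the credential scope, and the SHA-256 hash of the canonical request. The following is a minimal sketch of that layout using only the Python standard library. It is not the botocore implementation, and the secret key and canonical request passed in are placeholders (the log redacts the real values), so the resulting signature will not match the one in the log.

```python
import hashlib
import hmac


def credential_scope(date: str, region: str, service: str) -> str:
    """Build the scope line, e.g. 20241222/eu-central-1/glue/aws4_request."""
    return f"{date}/{region}/{service}/aws4_request"


def string_to_sign(amz_date: str, scope: str, canonical_request: str) -> str:
    """Assemble the four-line string that SigV4 signs."""
    hashed = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, hashed])


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the signing key by chaining HMACs over date, region and service."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign(secret_key: str, amz_date: str, region: str, service: str,
         canonical_request: str) -> str:
    """Return the hex signature for a canonical request."""
    date = amz_date[:8]  # the scope uses only the YYYYMMDD part
    scope = credential_scope(date, region, service)
    to_sign = string_to_sign(amz_date, scope, canonical_request)
    key = signing_key(secret_key, date, region, service)
    return hmac.new(key, to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder inputs: the real secret and canonical request are redacted above.
    print(credential_scope("20241222", "eu-central-1", "glue"))
    print(sign("not-a-real-secret", "20241222T141341Z", "eu-central-1", "glue",
               "GET\n/iceberg/v1/config\n"))
```

The region and service name in the scope are the same values passed to PyIceberg as rest.signing-region and rest.signing-name.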

ListTable

2024-12-22 15:13:41,462 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Signature:
xxxx
2024-12-22 15:13:41,541 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables HTTP/1.1" 200 59

[('ibtest', 'ibtest1')]

Scan Tables

2024-12-22 15:13:41,567 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Signature: xxxxxx
2024-12-22 15:13:41,712 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 HTTP/1.1" 200 2123

    id  name                 created
0  001  test 2024-12-22 13:48:31.381

Urls
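
The request paths in the debug output all follow one pattern: a config endpoint, then namespace and table paths under catalogs/<account id>. The helper below is a small sketch that rebuilds those URLs. The function names are hypothetical, and it assumes the account id segment is used exactly as it appears in the logged paths.

```python
from urllib.parse import quote

# Base URI taken from the PyIceberg configuration in this post
BASE = "https://glue.eu-central-1.amazonaws.com/iceberg"


def config_url(base: str = BASE) -> str:
    """First call made by the client: GET /v1/config."""
    return f"{base}/v1/config"


def namespaces_url(account_id: str, base: str = BASE) -> str:
    """List namespaces in the catalog of the given account."""
    return f"{base}/v1/catalogs/{quote(account_id, safe='')}/namespaces"


def tables_url(account_id: str, namespace: str, base: str = BASE) -> str:
    """List tables in one namespace."""
    return f"{namespaces_url(account_id, base)}/{quote(namespace, safe='')}/tables"


def table_url(account_id: str, namespace: str, table: str, base: str = BASE) -> str:
    """Load the metadata of a single table."""
    return f"{tables_url(account_id, namespace, base)}/{quote(table, safe='')}"


if __name__ == "__main__":
    print(config_url())
    print(namespaces_url("123456789012"))
    print(tables_url("123456789012", "ibtest"))
    print(table_url("123456789012", "ibtest", "ibtest1"))
```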

Curl option

I remembered that curl has a SigV4 option. I tried it with the same IAM credentials and a signing scope of aws:amz:<region>:glue.

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/config --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"

Combining this with the URLs discovered above gives the following output.

Get the namespaces

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
{"namespaces":[["ibtest"],["sourcedata_sales"]]}
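
The JSON body holds the same information that PyIceberg printed earlier as a list of tuples. As a small illustration (the function name is hypothetical, and this is not PyIceberg's own parsing code), each namespace arrives as a list of name levels and can be turned into a tuple:

```python
import json
from typing import List, Tuple


def namespaces_from_response(body: str) -> List[Tuple[str, ...]]:
    """Convert a namespaces response body into a list of tuples."""
    payload = json.loads(body)
    # Each namespace is a list of levels, e.g. ["ibtest"]
    return [tuple(levels) for levels in payload["namespaces"]]


if __name__ == "__main__":
    body = '{"namespaces":[["ibtest"],["sourcedata_sales"]]}'
    print(namespaces_from_response(body))
    # prints [('ibtest',), ('sourcedata_sales',)]
```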

Get the Table Info

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1  --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue" | jq
{
  "config": {
    "metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
    "previous_metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
    "table_type": "ICEBERG"
  },
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "default-sort-order-id": 0,
    "default-spec-id": 0,
    "format-version": 2,
    "last-column-id": 3,
    "last-partition-id": 999,
    "last-sequence-number": 1,
    "last-updated-ms": 1734875312126,
    "location": "s3://ibtest-123456789012/ibtest/ibtest1",
    "metadata-log": [
      {
        "metadata-file": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
        "timestamp-ms": 1734874992670
      }
    ],
    "partition-specs": [
      {
        "fields": [],
        "spec-id": 0
      }
    ],
    "partition-statistics-files": [],
    "properties": {
      "write.parquet.compression-codec": "zstd"
    },
    "refs": {
      "main": {
        "snapshot-id": 968789183104214971,
        "type": "branch"
      }
    },
    "schemas": [
      {
        "fields": [
          {
            "doc": "",
            "id": 1,
            "name": "id",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 2,
            "name": "name",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 3,
            "name": "created",
            "required": false,
            "type": "timestamp"
          }
        ],
        "schema-id": 0,
        "type": "struct"
      }
    ],
    "snapshot-log": [
      {
        "snapshot-id": 968789183104214971,
        "timestamp-ms": 1734875312126
      }
    ],
    "snapshots": [
      {
        "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
        "schema-id": 0,
        "sequence-number": 1,
        "snapshot-id": 968789183104214971,
        "summary": {
          "changed-partition-count": "1",
          "added-data-files": "1",
          "total-equality-deletes": "0",
          "added-records": "1",
          "trino_query_id": "20241222_134831_00070_aiuhg",
          "total-position-deletes": "0",
          "added-files-size": "507",
          "total-delete-files": "0",
          "total-files-size": "507",
          "total-records": "1",
          "total-data-files": "1",
          "operation": "append"
        },
        "timestamp-ms": 1734875312126
      }
    ],
    "sort-orders": [
      {
        "fields": [],
        "order-id": 0
      }
    ],
    "statistics-files": [],
    "table-uuid": "d4dbfb4a-93b4-4255-9ce3-cfaa280fa40c"
  },
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
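
The response carries everything a client needs to start reading the table: the schema, the current snapshot, and the S3 location of the manifest list. The sketch below pulls those fields out of a trimmed copy of the response above. The function name and the shape of the summary are illustrative choices, not part of any library.

```python
# Trimmed copy of the table response shown above (only the fields used here)
SAMPLE_RESPONSE = {
    "metadata": {
        "current-schema-id": 0,
        "current-snapshot-id": 968789183104214971,
        "schemas": [
            {
                "schema-id": 0,
                "fields": [
                    {"id": 1, "name": "id", "type": "string"},
                    {"id": 2, "name": "name", "type": "string"},
                    {"id": 3, "name": "created", "type": "timestamp"},
                ],
            }
        ],
        "snapshots": [
            {
                "snapshot-id": 968789183104214971,
                "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
                "summary": {"total-records": "1", "operation": "append"},
            }
        ],
    },
    "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
}


def summarize_table(response: dict) -> dict:
    """Pick the current schema and snapshot out of a table response."""
    meta = response["metadata"]
    schema = next(
        s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"]
    )
    snapshot = next(
        s for s in meta["snapshots"]
        if s["snapshot-id"] == meta["current-snapshot-id"]
    )
    return {
        "metadata_location": response["metadata-location"],
        "columns": [(f["name"], f["type"]) for f in schema["fields"]],
        "manifest_list": snapshot["manifest-list"],
        # Summary values are strings in the response, so convert the count
        "total_records": int(snapshot["summary"]["total-records"]),
    }


if __name__ == "__main__":
    for key, value in summarize_table(SAMPLE_RESPONSE).items():
        print(key, "=", value)
```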

Conclusion

With the latest release of Glue, you can access Iceberg tables on AWS through the standard Iceberg REST API, which opens the infrastructure to multiple tools.
The only AWS-specific part is signing the requests with SigV4 (https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html), which many tools already use for S3 access.

This decouples the code from AWS-specific access and allows you to use more generic tools.

Next steps

  • Test with more tools (curl update added)
  • Test with the new S3 Tables (Iceberg)
  • Can we use this with Unity?
