DEV Community

Jacob for AWS Community Builders


Glue Iceberg REST API and PyIceberg

Access Glue Iceberg tables via the Iceberg REST API

AWS has quietly released Iceberg REST API support in Glue. The Iceberg REST catalog API is a standard API for accessing Iceberg tables from different platforms. More information can be found here: https://iceberg.apache.org/concepts/catalog/

PyIceberg is a Python library with generic Iceberg support, including the REST catalog API. PySpark is another tool that supports it.

Example code that accesses the Glue catalog through the Iceberg REST API:

import logging

from pyiceberg.catalog import load_catalog

# Set up logging to show debug messages (including the signed HTTP requests)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Specifically for PyIceberg logging
logger = logging.getLogger('pyiceberg')
logger.setLevel(logging.DEBUG)


def main():
    # REST catalog pointing at the Glue Iceberg endpoint; every request is
    # signed with SigV4 for the "glue" service in eu-central-1
    rest_catalog = load_catalog(
        "ibtest1",
        **{
            "type": "rest",
            "uri": "https://glue.eu-central-1.amazonaws.com/iceberg",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "glue",
            "rest.signing-region": "eu-central-1",
        },
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("ibtest"))
    print(rest_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()
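The endpoint URI and the signing region in the example above must name the same region, or the SigV4 signature will not match. A small sketch that derives all REST properties from one region string; `glue_rest_properties` is a hypothetical helper, and it assumes the `https://glue.<region>.amazonaws.com/iceberg` pattern holds for regions other than eu-central-1.

```python
from typing import Dict


def glue_rest_properties(region: str) -> Dict[str, str]:
    """Build PyIceberg REST catalog properties for the Glue Iceberg endpoint.

    Hypothetical helper: it derives the properties used in the example above
    from one region string, so the endpoint URI and the SigV4 signing region
    cannot drift apart.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


if __name__ == "__main__":
    for key, value in glue_rest_properties("eu-central-1").items():
        print(f"{key} = {value}")
    # With PyIceberg installed and AWS credentials available, the result can be
    # passed to load_catalog just like the inline dict above:
    #   catalog = load_catalog("ibtest1", **glue_rest_properties("eu-central-1"))
```

Keeping the region in one place also makes it easy to read it from configuration instead of hard-coding it.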

Glue Catalog version

For comparison, this is the native Glue catalog implementation in PyIceberg, which uses the boto3 Glue API instead of the REST endpoint.

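from pyiceberg.catalog import load_catalog


def main():
    # Native Glue catalog: talks to the Glue API via boto3, not the REST endpoint
    glue_catalog = load_catalog("glue", **{"type": "glue"})

    print(glue_catalog.list_namespaces())
    print(glue_catalog.list_tables("ibtest"))
    print(glue_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()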

Output

Below is the full debug output. It shows that all catalog calls go through the Glue REST API; no S3 requests appear in the log. I ran it with AWS credentials that have Glue and S3 permissions, selected through the AWS_PROFILE environment variable.

ListNamespaces

2024-12-22 15:13:41,061 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/config

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,062 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,213 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/config HTTP/1.1" 200 327
2024-12-22 15:13:41,237 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,237 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,435 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces HTTP/1.1" 200 48
[('ibtest',), ('sourcedata_sales',)]
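The first request in the log is GET /iceberg/v1/config, and every later request has catalogs/123456789012 in its path. In the Iceberg REST spec, the config response carries "defaults" (applied before the client's own properties) and "overrides" (applied after them), and an optional `prefix` property becomes a path segment after /v1/. A minimal sketch of that flow; the config body below is hypothetical, because the post does not show the actual 327-byte response, and both function names are illustrative.

```python
from typing import Dict

# Hypothetical GET /iceberg/v1/config response: the real body is not shown,
# only that later requests use the path prefix "catalogs/123456789012".
CONFIG_RESPONSE = {"defaults": {}, "overrides": {"prefix": "catalogs/123456789012"}}


def merge_catalog_config(client_props: Dict[str, str], config: dict) -> Dict[str, str]:
    """Apply server defaults, then client properties, then server overrides."""
    merged = dict(config.get("defaults", {}))
    merged.update(client_props)
    merged.update(config.get("overrides", {}))
    return merged


def table_url(props: Dict[str, str], namespace: str, table: str) -> str:
    """Build a table URL, inserting the optional catalog prefix after /v1."""
    parts = [props["uri"].rstrip("/"), "v1"]
    if props.get("prefix"):
        parts.append(props["prefix"].strip("/"))
    # Single-level namespace only; nested namespaces need extra encoding
    parts += ["namespaces", namespace, "tables", table]
    return "/".join(parts)


props = merge_catalog_config(
    {"uri": "https://glue.eu-central-1.amazonaws.com/iceberg"}, CONFIG_RESPONSE
)
print(table_url(props, "ibtest", "ibtest1"))
```

With this assumed prefix, the printed URL is the same one PyIceberg requests in the "Scan Tables" log.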

ListTables

2024-12-22 15:13:41,462 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Signature:
xxxx
2024-12-22 15:13:41,541 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables HTTP/1.1" 200 59

[('ibtest', 'ibtest1')]
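The log only records the size of the list-tables response (59 bytes), not its body. The Iceberg REST spec describes it as a list of "identifiers", each with a "namespace" array and a table "name". A sketch that turns such a body into the tuples PyIceberg printed; the body is an assumption based on that schema (its length happens to be 59 bytes too, which is suggestive but not proof), and `parse_table_identifiers` is a hypothetical name.

```python
import json
from typing import List, Tuple

# Assumed response body, shaped after the ListTablesResponse schema in the
# Iceberg REST spec; the post does not show the actual body.
LIST_TABLES_BODY = '{"identifiers":[{"namespace":["ibtest"],"name":"ibtest1"}]}'


def parse_table_identifiers(body: str) -> List[Tuple[str, ...]]:
    """Turn a list-tables response into (namespace levels..., table) tuples."""
    data = json.loads(body)
    return [tuple(ident["namespace"]) + (ident["name"],) for ident in data["identifiers"]]


print(parse_table_identifiers(LIST_TABLES_BODY))
```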

Scan Tables

2024-12-22 15:13:41,567 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Signature: xxxxxx
2024-12-22 15:13:41,712 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 HTTP/1.1" 200 2123

    id  name                 created
0  001  test 2024-12-22 13:48:31.381

URLs

Curl option

curl has a built-in SigV4 option, --aws-sigv4. I tried it with the same IAM credentials and the SigV4 provider string aws:amz:<region>:glue:

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/config --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
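What curl's --aws-sigv4 (and botocore, in the PyIceberg log above) computes is the canonical request, the string to sign and the signature visible in the debug output. A minimal standard-library sketch of SigV4 for a GET without a query string, for illustration only: `sigv4_headers` is a hypothetical name, it signs only the host and x-amz-date headers, it does not URI-encode the path, and it does not handle temporary credentials (those also need an X-Amz-Security-Token header). For real use, rely on curl or botocore.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Dict, Optional


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(
    method: str,
    host: str,
    path: str,
    region: str,
    service: str,
    access_key: str,
    secret_key: str,
    payload: bytes = b"",
    now: Optional[datetime] = None,
) -> Dict[str, str]:
    """Return signed headers for one request without a query string."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    headers = {"host": host, "x-amz-date": amz_date}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    payload_hash = hashlib.sha256(payload).hexdigest()

    # Canonical request: method, path, empty query, headers, signed list, payload hash
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: secret -> date -> region -> service -> aws4_request
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers


if __name__ == "__main__":
    demo = sigv4_headers(
        "GET", "glue.eu-central-1.amazonaws.com", "/iceberg/v1/config",
        "eu-central-1", "glue", "AKIDEXAMPLE", "dummy-secret",
    )
    print(demo["authorization"])
```

The credential scope (date/region/glue/aws4_request) is the same one shown in the StringToSign lines of the log; the service name "glue" corresponds to the last part of curl's provider string.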

Combining this with the URLs discovered in the debug log above gives the following output.

Get the namespaces

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
{"namespaces":[["ibtest"],["sourcedata_sales"]]}
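In the REST response each namespace is a list of levels, which is why PyIceberg printed tuples such as ('ibtest',) in the ListNamespaces log. A short sketch of that conversion using the curl output above; `parse_namespaces` is a hypothetical name, not a PyIceberg function.

```python
import json
from typing import List, Tuple

# Response body copied from the curl output above
NAMESPACES_BODY = '{"namespaces":[["ibtest"],["sourcedata_sales"]]}'


def parse_namespaces(body: str) -> List[Tuple[str, ...]]:
    """Convert each namespace (a list of levels) into a tuple."""
    return [tuple(levels) for levels in json.loads(body)["namespaces"]]


print(parse_namespaces(NAMESPACES_BODY))
```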

Get the Table Info

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1  --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue" | jq
{
  "config": {
    "metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
    "previous_metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
    "table_type": "ICEBERG"
  },
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "default-sort-order-id": 0,
    "default-spec-id": 0,
    "format-version": 2,
    "last-column-id": 3,
    "last-partition-id": 999,
    "last-sequence-number": 1,
    "last-updated-ms": 1734875312126,
    "location": "s3://ibtest-123456789012/ibtest/ibtest1",
    "metadata-log": [
      {
        "metadata-file": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
        "timestamp-ms": 1734874992670
      }
    ],
    "partition-specs": [
      {
        "fields": [],
        "spec-id": 0
      }
    ],
    "partition-statistics-files": [],
    "properties": {
      "write.parquet.compression-codec": "zstd"
    },
    "refs": {
      "main": {
        "snapshot-id": 968789183104214971,
        "type": "branch"
      }
    },
    "schemas": [
      {
        "fields": [
          {
            "doc": "",
            "id": 1,
            "name": "id",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 2,
            "name": "name",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 3,
            "name": "created",
            "required": false,
            "type": "timestamp"
          }
        ],
        "schema-id": 0,
        "type": "struct"
      }
    ],
    "snapshot-log": [
      {
        "snapshot-id": 968789183104214971,
        "timestamp-ms": 1734875312126
      }
    ],
    "snapshots": [
      {
        "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
        "schema-id": 0,
        "sequence-number": 1,
        "snapshot-id": 968789183104214971,
        "summary": {
          "changed-partition-count": "1",
          "added-data-files": "1",
          "total-equality-deletes": "0",
          "added-records": "1",
          "trino_query_id": "20241222_134831_00070_aiuhg",
          "total-position-deletes": "0",
          "added-files-size": "507",
          "total-delete-files": "0",
          "total-files-size": "507",
          "total-records": "1",
          "total-data-files": "1",
          "operation": "append"
        },
        "timestamp-ms": 1734875312126
      }
    ],
    "sort-orders": [
      {
        "fields": [],
        "order-id": 0
      }
    ],
    "statistics-files": [],
    "table-uuid": "d4dbfb4a-93b4-4255-9ce3-cfaa280fa40c"
  },
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
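The table response already contains the full table metadata, so a client can find the current schema and the current snapshot's manifest list without reading the metadata file. A scan then starts from that manifest list, which (together with the manifests and data files it leads to) lives in S3. A sketch that picks out those pieces from a trimmed copy of the response above; `summarize_table` is a hypothetical name and only the fields it uses are kept.

```python
import json

# Trimmed copy of the response above: only the fields used below are kept.
LOAD_TABLE_BODY = """
{
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "schemas": [
      {"schema-id": 0, "type": "struct", "fields": [
        {"id": 1, "name": "id", "required": false, "type": "string"},
        {"id": 2, "name": "name", "required": false, "type": "string"},
        {"id": 3, "name": "created", "required": false, "type": "timestamp"}
      ]}
    ],
    "snapshots": [
      {"snapshot-id": 968789183104214971,
       "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
       "summary": {"operation": "append", "total-records": "1"}}
    ]
  }
}
"""


def summarize_table(body: str) -> dict:
    """Return the current schema columns and the entry point for a scan."""
    result = json.loads(body)
    meta = result["metadata"]
    schema = next(s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"])
    snapshot = next(
        s for s in meta["snapshots"] if s["snapshot-id"] == meta["current-snapshot-id"]
    )
    return {
        "metadata_location": result["metadata-location"],
        "columns": [(f["name"], f["type"]) for f in schema["fields"]],
        "manifest_list": snapshot["manifest-list"],
        "total_records": int(snapshot["summary"]["total-records"]),
    }


for key, value in summarize_table(LOAD_TABLE_BODY).items():
    print(f"{key}: {value}")
```

The three columns and the single record match the DataFrame printed in the "Scan Tables" output.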

Conclusion

With the latest release of Glue, you can access Iceberg tables on AWS through the standard Iceberg REST API, which opens the infrastructure to many more tools.
The only AWS-specific part is signing requests with SigV4 (https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html), which many tools already support for S3 access.

This decouples your code from AWS-specific APIs and lets you use more generic tools.

Next steps

  • Test with more tools (curl section added)
  • Test with the new S3 Tables (Iceberg)
  • Can we use this with Unity?
