Jacob for AWS Community Builders

Glue Iceberg REST API and PyIceberg

Access Glue Iceberg tables via the Iceberg REST API

AWS quietly released Iceberg REST API support in Glue. This is a standard API for accessing Iceberg tables across different platforms. More information can be found here: https://iceberg.apache.org/concepts/catalog/

PyIceberg is a Python library with generic Iceberg support, including the REST API. Other tools, such as PySpark, support it too.

Example code to use a Glue catalog via the Iceberg REST API:

from pyiceberg.catalog import load_catalog
import logging

# Set up logging to show debug messages
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Specifically for PyIceberg logging
logger = logging.getLogger('pyiceberg')
logger.setLevel(logging.DEBUG)



def main():
    # Load the Glue catalog through the Iceberg REST endpoint.
    # Requests are signed with SigV4 for the "glue" service.
    rest_catalog = load_catalog(
        "ibtest1",
        **{
            "type": "rest",
            "uri": "https://glue.eu-central-1.amazonaws.com/iceberg",
            "rest.sigv4-enabled": "true",
            "rest.signing-name": "glue",
            "rest.signing-region": "eu-central-1",
        },
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("ibtest"))
    print(rest_catalog.load_table("ibtest.ibtest1").scan().to_pandas())



if __name__ == "__main__":
    main()
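
The catalog properties above hard-code the region in two places. A minimal sketch of a helper that derives them from a single region argument is shown below. The function names `glue_rest_properties` and `load_glue_rest_catalog` are hypothetical, and the assumption that other regions follow the same `glue.<region>.amazonaws.com/iceberg` endpoint pattern is mine; only eu-central-1 is confirmed by this post.

```python
def glue_rest_properties(region: str) -> dict:
    """Build PyIceberg REST catalog properties for the Glue endpoint in `region`.

    Assumption: the endpoint pattern seen for eu-central-1 holds for other regions.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


def load_glue_rest_catalog(name: str, region: str):
    """Load the catalog with the properties above (hypothetical wrapper)."""
    # Imported lazily so glue_rest_properties() works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **glue_rest_properties(region))


if __name__ == "__main__":
    # Print the properties only; loading the catalog needs AWS credentials.
    print(glue_rest_properties("eu-central-1"))
```

Calling `load_glue_rest_catalog("ibtest1", "eu-central-1")` should then be equivalent to the `load_catalog` call in the example above.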

Glue Catalog version

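For comparison, this is the native Glue catalog in PyIceberg, which uses the boto API.

from pyiceberg.catalog import load_catalog


def main():
    # The "glue" catalog type talks to the Glue API directly through boto.
    glue_catalog = load_catalog("glue", **{"type": "glue"})

    print(glue_catalog.list_namespaces())
    print(glue_catalog.list_tables("ibtest"))
    print(glue_catalog.load_table("ibtest.ibtest1").scan().to_pandas())


if __name__ == "__main__":
    main()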

Output

Here is the full debug output, to show that it only uses the REST API. No requests to S3 appear in the log. I ran it with AWS credentials that have Glue and S3 permissions, selected through AWS_PROFILE in the environment.

ListNameSpace

2024-12-22 15:13:41,061 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/config

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,062 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,062 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,213 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/config HTTP/1.1" 200 327
2024-12-22 15:13:41,237 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,237 - botocore.auth - DEBUG - Signature:
xxxxxx
2024-12-22 15:13:41,237 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): glue.eu-central-1.amazonaws.com:443
2024-12-22 15:13:41,435 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces HTTP/1.1" 200 48
[('ibtest',), ('sourcedata_sales',)]
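
The CanonicalRequest, StringToSign and Signature lines in the log are the standard SigV4 steps. As a rough sketch of what botocore does at that point, the snippet below rebuilds the StringToSign and derives a signature with the standard library only. The canonical request and the secret key are placeholders, not the real values from the log, and the function names are hypothetical.

```python
import hashlib
import hmac


def string_to_sign(canonical_request: str, amz_date: str, region: str, service: str) -> str:
    """Assemble the SigV4 StringToSign for a canonical request."""
    scope = f"{amz_date[:8]}/{region}/{service}/aws4_request"
    digest = hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()
    return "\n".join(["AWS4-HMAC-SHA256", amz_date, scope, digest])


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sign(secret_key: str, to_sign: str, amz_date: str, region: str, service: str) -> str:
    """Derive the SigV4 signing key and return the hex signature."""
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), amz_date[:8])
    for part in (region, service, "aws4_request"):
        key = _hmac_sha256(key, part)
    return hmac.new(key, to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder inputs for illustration; the real canonical request also
# contains the headers shown in the log.
sts = string_to_sign("GET\n/iceberg/v1/config\n", "20241222T141341Z", "eu-central-1", "glue")
signature = sign("not-a-real-secret", sts, "20241222T141341Z", "eu-central-1", "glue")
print(sts)
print(signature)
```

The first three lines of the printed StringToSign have the same shape as in the log: the algorithm, the timestamp, and the scope `20241222/eu-central-1/glue/aws4_request`.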

ListTable

2024-12-22 15:13:41,462 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxxxx
2024-12-22 15:13:41,463 - botocore.auth - DEBUG - Signature:
xxxx
2024-12-22 15:13:41,541 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables HTTP/1.1" 200 59

[('ibtest', 'ibtest1')]

Scan Tables

2024-12-22 15:13:41,567 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Calculating signature using v4 auth.
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - CanonicalRequest:
GET
/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1

accept:*/*
accept-encoding:gzip, deflate
content-type:application/json
host:glue.eu-central-1.amazonaws.com
x-amz-date:20241222T141341Z
x-client-version:0.14.1
x-iceberg-access-delegation:vended-credentials

accept;accept-encoding;content-type;host;x-amz-date;x-client-version;x-iceberg-access-delegation
xxxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - StringToSign:
AWS4-HMAC-SHA256
20241222T141341Z
20241222/eu-central-1/glue/aws4_request
xxxxx
2024-12-22 15:13:41,567 - botocore.auth - DEBUG - Signature: xxxxxx
2024-12-22 15:13:41,712 - urllib3.connectionpool - DEBUG - https://glue.eu-central-1.amazonaws.com:443 "GET /iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1 HTTP/1.1" 200 2123

    id  name                 created
0  001  test 2024-12-22 13:48:31.381

URLs
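
The request paths in the debug output follow a regular pattern. The sketch below builds them from a region, an account ID, and optional namespace and table names. The function name `glue_iceberg_url` is hypothetical, and treating the account ID as the catalog segment of the path is an assumption taken from the log lines in this post.

```python
from typing import Optional
from urllib.parse import quote


def glue_iceberg_url(
    region: str,
    account_id: str,
    namespace: Optional[str] = None,
    table: Optional[str] = None,
) -> str:
    """Build a Glue Iceberg REST URL matching the paths seen in the debug log."""
    url = f"https://glue.{region}.amazonaws.com/iceberg/v1/catalogs/{account_id}/namespaces"
    if namespace is None:
        return url  # list namespaces
    url = f"{url}/{quote(namespace, safe='')}/tables"
    if table is None:
        return url  # list tables in the namespace
    return f"{url}/{quote(table, safe='')}"  # load one table


print(glue_iceberg_url("eu-central-1", "123456789012"))
print(glue_iceberg_url("eu-central-1", "123456789012", "ibtest"))
print(glue_iceberg_url("eu-central-1", "123456789012", "ibtest", "ibtest1"))
```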

Curl option

I remembered that curl has a built-in SigV4 option (--aws-sigv4). I tried it with the same IAM credentials and the --aws-sigv4 value aws:amz:<region>:glue.

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/config --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"

Combining this with the URLs discovered in the debug output above gives the following results.

Get the namespaces

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue"
{"namespaces":[["ibtest"],["sourcedata_sales"]]}
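
The raw JSON maps directly onto what PyIceberg printed earlier. A minimal sketch, using only the standard library and the response body above copied in as a string:

```python
import json

# Response body copied from the curl call above.
response_body = '{"namespaces":[["ibtest"],["sourcedata_sales"]]}'

# Each namespace is a list of levels; PyIceberg presents them as tuples.
namespaces = [tuple(levels) for levels in json.loads(response_body)["namespaces"]]
print(namespaces)  # [('ibtest',), ('sourcedata_sales',)]
```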

Get the Table Info

curl https://glue.eu-central-1.amazonaws.com/iceberg/v1/catalogs/123456789012/namespaces/ibtest/tables/ibtest1  --user "$AWS_KEY:$AWS_SEC" --aws-sigv4 "aws:amz:eu-central-1:glue" | jq
{
  "config": {
    "metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json",
    "previous_metadata_location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
    "table_type": "ICEBERG"
  },
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "default-sort-order-id": 0,
    "default-spec-id": 0,
    "format-version": 2,
    "last-column-id": 3,
    "last-partition-id": 999,
    "last-sequence-number": 1,
    "last-updated-ms": 1734875312126,
    "location": "s3://ibtest-123456789012/ibtest/ibtest1",
    "metadata-log": [
      {
        "metadata-file": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00000-60501f31-ad65-4cd8-92be-85dc2cc99d70.metadata.json",
        "timestamp-ms": 1734874992670
      }
    ],
    "partition-specs": [
      {
        "fields": [],
        "spec-id": 0
      }
    ],
    "partition-statistics-files": [],
    "properties": {
      "write.parquet.compression-codec": "zstd"
    },
    "refs": {
      "main": {
        "snapshot-id": 968789183104214971,
        "type": "branch"
      }
    },
    "schemas": [
      {
        "fields": [
          {
            "doc": "",
            "id": 1,
            "name": "id",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 2,
            "name": "name",
            "required": false,
            "type": "string"
          },
          {
            "doc": "",
            "id": 3,
            "name": "created",
            "required": false,
            "type": "timestamp"
          }
        ],
        "schema-id": 0,
        "type": "struct"
      }
    ],
    "snapshot-log": [
      {
        "snapshot-id": 968789183104214971,
        "timestamp-ms": 1734875312126
      }
    ],
    "snapshots": [
      {
        "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
        "schema-id": 0,
        "sequence-number": 1,
        "snapshot-id": 968789183104214971,
        "summary": {
          "changed-partition-count": "1",
          "added-data-files": "1",
          "total-equality-deletes": "0",
          "added-records": "1",
          "trino_query_id": "20241222_134831_00070_aiuhg",
          "total-position-deletes": "0",
          "added-files-size": "507",
          "total-delete-files": "0",
          "total-files-size": "507",
          "total-records": "1",
          "total-data-files": "1",
          "operation": "append"
        },
        "timestamp-ms": 1734875312126
      }
    ],
    "sort-orders": [
      {
        "fields": [],
        "order-id": 0
      }
    ],
    "statistics-files": [],
    "table-uuid": "d4dbfb4a-93b4-4255-9ce3-cfaa280fa40c"
  },
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
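
To show what a generic client can do with this response, here is a small sketch that pulls out the current schema, the current snapshot and the metadata location. The JSON is a trimmed copy of the response above, keeping only the fields the code reads, and `summarize_table` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the table response; only the fields used below are kept.
response_body = """
{
  "metadata": {
    "current-schema-id": 0,
    "current-snapshot-id": 968789183104214971,
    "last-updated-ms": 1734875312126,
    "schemas": [
      {"schema-id": 0, "type": "struct", "fields": [
        {"id": 1, "name": "id", "required": false, "type": "string"},
        {"id": 2, "name": "name", "required": false, "type": "string"},
        {"id": 3, "name": "created", "required": false, "type": "timestamp"}
      ]}
    ],
    "snapshots": [
      {"snapshot-id": 968789183104214971,
       "manifest-list": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/snap-968789183104214971-1-86672725-3389-414f-b8f4-7f4aaa6401b0.avro",
       "summary": {"operation": "append", "total-records": "1"}}
    ]
  },
  "metadata-location": "s3://ibtest-123456789012/ibtest/ibtest1/metadata/00001-5801d3f4-952a-4a22-b63a-415aa4378d69.metadata.json"
}
"""


def summarize_table(body: str) -> dict:
    """Extract a few key facts from a load-table response."""
    doc = json.loads(body)
    meta = doc["metadata"]
    # Pick the schema and snapshot that the metadata marks as current.
    schema = next(s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"])
    snapshot = next(s for s in meta["snapshots"] if s["snapshot-id"] == meta["current-snapshot-id"])
    return {
        "metadata_location": doc["metadata-location"],
        "columns": {f["name"]: f["type"] for f in schema["fields"]},
        "manifest_list": snapshot["manifest-list"],
        "total_records": int(snapshot["summary"]["total-records"]),
        "last_updated": datetime.fromtimestamp(meta["last-updated-ms"] / 1000, tz=timezone.utc),
    }


summary = summarize_table(response_body)
print(summary["columns"])
print(summary["last_updated"].isoformat())
```

The manifest list and metadata locations point into S3, so reading the actual data still requires S3 access on top of the catalog call.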

Conclusion

With the latest release of Glue you can access Iceberg tables on AWS using the standard Iceberg REST API, which opens the infrastructure to multiple tools.
The only AWS-specific part is signing the requests with SigV4 (https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html), which many tools already use for S3 access.

This decouples your code from AWS-specific access and lets you use more generic tools.

Next steps

  • Test with more tools (Curl update added)
  • Test with the new S3 Tables (iceberg)
  • Can we use this with Unity?
