
Jacob for AWS Community Builders


Duckberg!

I previously wrote a short blog post about PyIceberg and the Glue Iceberg REST API.

This week I saw the announcement of Duckberg, which combines all the favorites in a single library: PyIceberg, DuckDB and Iceberg.

I rewrote my previous code to use it. Make sure you have the following dependencies installed with pip, poetry, or uv:

dependencies = [
  "duckberg>=0.3.1",
  "pyarrow>=19.0.1",
]

Code

from duckberg import DuckBerg


def main():
    region = "eu-central-1"

    # Glue exposes an Iceberg REST endpoint; requests are signed with SigV4.
    catalog_config: dict[str, str] = {
        "type": "rest",  # Iceberg catalog type
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }

    db = DuckBerg(
        catalog_name="aws_glue",
        database_names=["ibtest"],
        catalog_config=catalog_config,
    )

    # List the Iceberg tables found in the configured databases.
    print(db.list_tables())

    # The table name must be written as 'database.table' in single quotes.
    query = "SELECT * FROM 'ibtest.ibtest1'"
    df = db.select(sql=query).read_pandas()
    print(df)


if __name__ == "__main__":
    main()
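
If you query more than one region, the catalog settings can be factored into a small helper. This is only a sketch that rebuilds the same dictionary as the script above; `glue_rest_catalog_config` is a hypothetical name and not part of Duckberg or PyIceberg.

```python
def glue_rest_catalog_config(region: str) -> dict[str, str]:
    """Build the REST catalog settings for the Glue Iceberg endpoint in `region`.

    Hypothetical helper: it returns the same keys used in the script above.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


if __name__ == "__main__":
    # Print the settings for the region used in this post.
    for key, value in glue_rest_catalog_config("eu-central-1").items():
        print(f"{key} = {value}")
```

The result can be passed as `catalog_config` exactly like the inline dictionary.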



Run the script with your AWS credentials in the environment (AWS_PROFILE, or AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) and the correct region.

  • catalog_name = any name
  • database_names = your Glue database(s)
  • table name in the query = your Glue table

Output:

['ibtest.ibtest1']
    id  name                 created
0  001  test 2024-12-22 13:48:31.381
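
The credentials mentioned above can be checked before running the script. The sketch below only looks at environment variables, so it is an assumption-laden pre-flight check: credentials from other sources (for example an instance role) would not be detected, and `has_env_credentials` is a hypothetical helper name.

```python
import os
from collections.abc import Mapping


def has_env_credentials(env: Mapping[str, str]) -> bool:
    """Return True if `env` names a profile or holds a full access key pair."""
    if env.get("AWS_PROFILE"):
        return True
    # Static keys only work as a pair: key id plus secret.
    return bool(env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"))


if __name__ == "__main__":
    if has_env_credentials(os.environ):
        print("AWS credentials found in the environment")
    else:
        print("Set AWS_PROFILE, or AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY")
```

Taking the environment as a parameter keeps the check easy to test with a plain dictionary.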
Note:

A SQL parser is included that extracts the table name from the query to check whether it is an Iceberg table. This parser requires the table to be written in 'database.table' format, in single quotes.
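
To illustrate why the quoting matters, here is a minimal sketch of a pattern-based extractor that only recognizes the single-quoted 'database.table' form. It is not Duckberg's actual parser; the regular expression and the `quoted_tables` name are assumptions made for this example.

```python
import re

# Matches a single-quoted 'database.table' reference after FROM or JOIN.
QUOTED_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+'(\w+)\.(\w+)'", re.IGNORECASE)


def quoted_tables(sql: str) -> list[tuple[str, str]]:
    """Return (database, table) pairs written as 'database.table' in `sql`."""
    return QUOTED_TABLE.findall(sql)


if __name__ == "__main__":
    # Quoted form: the database and table are picked up.
    print(quoted_tables("SELECT * FROM 'ibtest.ibtest1'"))
    # Unquoted form: nothing matches, so no table would be recognized.
    print(quoted_tables("SELECT * FROM ibtest.ibtest1"))
```

A parser built this way simply does not see an unquoted table name, which is consistent with the behaviour described above.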

This could be a nice place to add sqlglot, an advanced SQL parsing library.
