Duckberg!

#iceberg #duckdb #awsglue

I previously wrote a short blog post about PyIceberg and the AWS Glue Iceberg REST API.

This week I saw the announcement of DuckBerg, which combines all my favorites in a single library: PyIceberg, DuckDB and Iceberg.

I rewrote my previous code to use it. Make sure you have the following dependencies installed with pip, Poetry or uv:

```toml
dependencies = [
  "duckberg>=0.3.1",
  "pyarrow>=19.0.1",
]
```

Code

```python
from duckberg import DuckBerg


def main():
    region = "eu-central-1"

    catalog_config: dict[str, str] = {
        "type": "rest",  # Iceberg catalog type: the Glue Iceberg REST endpoint
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",  # sign requests with AWS SigV4
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }

    db = DuckBerg(
        catalog_name="aws_glue",
        database_names=["ibtest"],
        catalog_config=catalog_config,
    )

    # Print the tables found in the configured Glue database(s)
    print(db.list_tables())

    # The table must be referenced as 'database.table' in single quotes
    query = "SELECT * FROM 'ibtest.ibtest1'"
    df = db.select(sql=query).read_pandas()
    print(df)


if __name__ == "__main__":
    main()
```

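Run the script with your AWS credentials in the environment (for example `AWS_PROFILE`, or `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) and make sure the region matches the one your Glue catalog lives in. Note that `read_pandas()` needs pandas to be installed.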
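If you run this against several regions, the catalog config can be built from the region instead of hard-coding it. The sketch below is a hypothetical helper (`glue_rest_catalog_config` is not part of DuckBerg or PyIceberg); it assumes you want to fall back to the `AWS_REGION`, then `AWS_DEFAULT_REGION`, environment variable when no region is passed, and it produces the same keys as the config in the script above.

```python
import os
from typing import Optional


def glue_rest_catalog_config(region: Optional[str] = None) -> dict[str, str]:
    """Hypothetical helper: build the Glue Iceberg REST catalog config.

    Assumption: when no region is given, use AWS_REGION, then AWS_DEFAULT_REGION,
    so the catalog region matches the environment holding your credentials.
    """
    region = region or os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION")
    if not region:
        raise ValueError("No region given and AWS_REGION / AWS_DEFAULT_REGION are unset")
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }
```

You would then pass `catalog_config=glue_rest_catalog_config()` to `DuckBerg(...)`; the region used for the endpoint URI and for SigV4 signing always stays in sync.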

- `catalog_name`: any name
- `database_names`: your Glue database(s)
- table name: your Glue table, written as 'database.table' in the query

Output:

```
['ibtest.ibtest1']
    id  name                 created
0  001  test 2024-12-22 13:48:31.381
```

Note:

DuckBerg includes an SQL parser that extracts the table name from the query to validate that it is an Iceberg table. This parser requires the table to be written in 'database.table' format, wrapped in single quotes.
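To make the quoting requirement concrete, here is a minimal sketch of that kind of check. It is not DuckBerg's actual parser: `extract_tables` and `validate_query` are hypothetical names, and the regex only recognizes `FROM`/`JOIN` targets written as `'database.table'` in single quotes, which is exactly why an unquoted `ibtest.ibtest1` would not be picked up.

```python
import re

# Hypothetical pattern: a FROM/JOIN target written as 'database.table' in single quotes.
_QUOTED_TABLE = re.compile(r"\b(?:FROM|JOIN)\s+'([A-Za-z_]\w*)\.([A-Za-z_]\w*)'", re.IGNORECASE)


def extract_tables(sql: str) -> list[str]:
    """Return every 'database.table' reference that follows FROM or JOIN."""
    return [f"{db}.{tbl}" for db, tbl in _QUOTED_TABLE.findall(sql)]


def validate_query(sql: str, known_tables: list[str]) -> list[str]:
    """Check that every referenced table is one of the known Iceberg tables."""
    tables = extract_tables(sql)
    if not tables:
        raise ValueError("No 'database.table' reference found; quote it as 'db.table'")
    unknown = [t for t in tables if t not in known_tables]
    if unknown:
        raise ValueError(f"Not Iceberg tables in the catalog: {unknown}")
    return tables
```

With the script above, `validate_query(query, db.list_tables())` would accept `SELECT * FROM 'ibtest.ibtest1'`, since `list_tables()` returned `['ibtest.ibtest1']`. A regex like this breaks easily on subqueries, aliases or comments, which is where a real SQL parser helps.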

sqlglot, a more advanced SQL parsing library, could be a nice option here.

DEV Community
