Jacob for AWS Community Builders

Unity Catalog Iceberg REST API and PyIceberg

Access Unity tables via the Iceberg REST API

After working with the Glue catalog in the previous article, I wanted to test the same approach with Unity Catalog, the open-source catalog from Databricks.

Installing Unity Catalog

I followed the quickstart steps from: https://docs.unitycatalog.io/quickstart/.
It needs Java 17, but my Mac had Java 23 installed by default, so I used jenv to switch quickly between versions.

Install Java 17 and jenv with Homebrew

brew install openjdk@17 jenv
jenv add /opt/homebrew/opt/openjdk@17

Clone the repo and set the local Java version to 17

git clone git@github.com:unitycatalog/unitycatalog.git
cd unitycatalog
jenv local 17
jenv rehash

Start the server (the first run also builds it):

bin/start-uc-server

Use UniForm tables

Unity Catalog does not support native Iceberg tables yet, but it does support the Delta UniForm format, which can be read by both Delta and Iceberg clients.

To set up the test environment, follow: https://docs.unitycatalog.io/usage/tables/uniform/

cp -r etc/data/external/unity/default/tables/marksheet_uniform /tmp/marksheet_uniform

REST API

The Iceberg API is available at: http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg/

According to the documentation, tables are identified using the following format, but I had to change this:

When querying Iceberg REST Catalog for Unity Catalog, tables are identified using the following pattern iceberg... (e.g. iceberg.unity.default.marksheet_uniform).
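
A minimal sketch of how the documented identifier maps onto what PyIceberg needed in this setup, based only on the example name `iceberg.unity.default.marksheet_uniform`. The helper `split_documented_identifier` is a hypothetical name, not part of PyIceberg or Unity Catalog, and it does nothing more than string handling.

```python
def split_documented_identifier(identifier: str) -> tuple[str, str]:
    """Split a name like 'iceberg.unity.default.marksheet_uniform'.

    Returns (warehouse, 'schema.table'): the first value is what goes into the
    "warehouse" property, the second is what is passed to load_table().
    Hypothetical helper for illustration only.
    """
    parts = identifier.split(".")
    if len(parts) != 4 or parts[0] != "iceberg":
        raise ValueError(
            f"expected 4 dot-separated parts starting with 'iceberg', got {identifier!r}"
        )
    _, catalog, schema, table = parts
    return catalog, f"{schema}.{table}"


warehouse, table_name = split_documented_identifier(
    "iceberg.unity.default.marksheet_uniform"
)
print(warehouse)   # unity
print(table_name)  # default.marksheet_uniform
```

So with PyIceberg the `iceberg` prefix is dropped, the catalog name moves into the `warehouse` property, and only `default.marksheet_uniform` is used as the table identifier.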

Code

By default, the local Unity Catalog server doesn't use authentication. I'll test that later.
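
For when authentication is switched on, here is an untested sketch of how the catalog properties might be assembled. It assumes the server accepts a bearer token and relies on the `token` property of PyIceberg's REST catalog; the helper `build_catalog_properties` and the `UC_TOKEN` environment variable are hypothetical names, and none of this has been run against a secured Unity Catalog server.

```python
import os
from typing import Optional


def build_catalog_properties(
    uri: str, warehouse: str, token: Optional[str] = None
) -> dict:
    """Build the properties dict for a PyIceberg REST catalog.

    Hypothetical helper: the "token" key is only added when a token is given,
    so the same function works for the unauthenticated local server.
    """
    props = {"type": "rest", "uri": uri, "warehouse": warehouse}
    if token:
        props["token"] = token
    return props


props = build_catalog_properties(
    "http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg",
    "unity",
    token=os.environ.get("UC_TOKEN"),  # assumed variable name, unset by default
)
print(sorted(props))
```

The resulting dict can be passed on as `load_catalog("databricks", **props)`.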

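import logging

from pyiceberg.catalog import load_catalog

# Log the HTTP requests so the REST calls made by PyIceberg are visible.
# The format is assumed from the log lines shown in the output below.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)


def main():
    # "warehouse" must match the Unity Catalog catalog name.
    rest_catalog = load_catalog(
        "databricks",
        **{
            "type": "rest",
            "warehouse": "unity",
            "uri": "http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg",
        },
    )
    print(rest_catalog.list_namespaces())
    print(rest_catalog.list_tables("default"))
    # Read the whole table into a pandas DataFrame.
    print(rest_catalog.load_table("default.marksheet_uniform").scan().to_pandas())


if __name__ == "__main__":
    main()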

Output

2024-12-29 12:53:45,806 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080
2024-12-29 12:53:45,809 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "GET /api/2.1/unity-catalog/iceberg/v1/config?warehouse=unity HTTP/1.1" 200 55
2024-12-29 12:53:45,810 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080
2024-12-29 12:53:45,815 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "GET /api/2.1/unity-catalog/iceberg/v1/catalogs/unity/namespaces HTTP/1.1" 200 28
[('default',)]
2024-12-29 12:53:45,836 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "GET /api/2.1/unity-catalog/iceberg/v1/catalogs/unity/namespaces/default/tables HTTP/1.1" 200 70
[('default', 'marksheet_uniform')]
2024-12-29 12:53:45,853 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "GET /api/2.1/unity-catalog/iceberg/v1/catalogs/unity/namespaces/default/tables/marksheet_uniform HTTP/1.1" 200 2238
2024-12-29 12:53:45,853 - pyiceberg.io - INFO - Defaulting to PyArrow FileIO
    id        name  marks
0    1  nWYHawtqUw    930
1    2  uvOzzthsLV    166
2    3  WIAehuXWkv    170
3    4  wYCSvnJKTo    709

Difference

The implementation seems slightly different from the Glue Iceberg catalog and is not fully featured yet.

The only "issue" is that the setup requires a warehouse parameter. I pass it in the config as "warehouse": "unity", where the warehouse is the same as the Unity catalog name.
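
The request log in the Output section shows where the warehouse value ends up: first as a query parameter on the config call, then as part of the path (`catalogs/unity`). Here is a small sketch that rebuilds those URLs with only the standard library. The function names are hypothetical, and the path layout is copied from the logged requests, not taken from the specification.

```python
from urllib.parse import quote, urlencode

BASE_URI = "http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg"


def config_url(base_uri: str, warehouse: str) -> str:
    """URL of the config call; the warehouse is sent as a query parameter."""
    return f"{base_uri.rstrip('/')}/v1/config?{urlencode({'warehouse': warehouse})}"


def table_url(base_uri: str, warehouse: str, namespace: str, table: str) -> str:
    """URL of a load-table call; the warehouse appears as 'catalogs/<name>'."""
    prefix = f"catalogs/{quote(warehouse, safe='')}"
    return (
        f"{base_uri.rstrip('/')}/v1/{prefix}"
        f"/namespaces/{quote(namespace, safe='')}"
        f"/tables/{quote(table, safe='')}"
    )


print(config_url(BASE_URI, "unity"))
print(table_url(BASE_URI, "unity", "default", "marksheet_uniform"))
```

Both printed URLs match the paths in the debug log, which is a quick way to check which catalog a given warehouse value resolves to.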

After that, it responds the same as the Glue Iceberg catalog REST API: I'm able to list the namespaces and tables and view the data.

The full source code of the Iceberg REST service in Unity Catalog can be found here: https://github.com/unitycatalog/unitycatalog/blob/main/server/src/main/java/io/unitycatalog/server/service/IcebergRestCatalogService.java

Next

  1. Writing data
  2. Trying real Iceberg tables
  3. Comparing the functionality of the REST APIs