Apache Gravitino Iceberg REST Catalog Access Control Deployment Guide

1. Overview

1.1 Product Introduction

Apache Gravitino IRC (Iceberg REST Catalog) is an Iceberg REST catalog service based on Gravitino, providing unified Iceberg table management capabilities. Starting from v1.1.0, Gravitino IRC supports access control for Iceberg tables.

1.2 Key Features

  • ✅ Table operation authorization
  • ✅ Multi-tenancy support
  • ✅ RESTful API interface
  • ✅ Seamless integration with Spark
  • ✅ Role-based access control (RBAC)

1.3 Current Status

Gravitino IRC currently supports table-level operation authorization. More access control features are planned for future releases.

2. System Architecture

2.1 Architecture Diagram

[Figure: architecture diagram]

2.2 Component Description

  • Gravitino Server: Core metadata service, primarily managing table permission information in this scenario; port 8090
  • Iceberg REST Service: Iceberg REST catalog service that connects to Gravitino Server via API to retrieve permission information; port 9002
  • MySQL: Metadata storage for both Gravitino and IRC
  • Object Storage: Data file storage

3. Environment Requirements

3.1 System Requirements

| Resource | Minimum  | Recommended |
|----------|----------|-------------|
| CPU      | 4 cores  | 8 cores     |
| Memory   | 8GB      | 16GB        |
| Disk     | 100GB    | 500GB       |
| Network  | Gigabit  | 10 Gigabit  |

3.2 Software Dependencies

| Software | Version | Notes            |
|----------|---------|------------------|
| Java     | JDK 17+ | Required         |
| MySQL    | 5.7+    | Metadata storage |
| Spark    | 3.4+    | Optional, client |

4. Configuration

4.1 Core Configuration File

Create the gravitino.conf configuration file in $GRAVITINO_HOME/conf:

# ============================================
# Gravitino Service Basic Configuration
# ============================================

# Service shutdown timeout
gravitino.server.shutdown.timeout = 3000

# ============================================
# Web Server Configuration
# ============================================

# Web server host address
gravitino.server.webserver.host = 0.0.0.0
# HTTP port
gravitino.server.webserver.httpPort = 8090
# Minimum threads
gravitino.server.webserver.minThreads = 24
# Maximum threads
gravitino.server.webserver.maxThreads = 200
# Stop timeout
gravitino.server.webserver.stopTimeout = 30000
# Idle timeout
gravitino.server.webserver.idleTimeout = 30000

# ============================================
# Entity Store Configuration (MySQL)
# ============================================

gravitino.entity.store = relational
gravitino.entity.store.relational = JDBCBackend
gravitino.entity.store.relational.jdbcUrl = jdbc:mysql://192.168.194.152:3306/gravitino
gravitino.entity.store.relational.jdbcDriver = com.mysql.cj.jdbc.Driver
gravitino.entity.store.relational.jdbcUser = gravitino
gravitino.entity.store.relational.jdbcPassword = gravitino

# ============================================
# Cache Configuration
# ============================================

gravitino.cache.enabled = true
gravitino.cache.maxEntries = 10000
gravitino.cache.expireTimeInMs = 3600000
gravitino.cache.enableWeigher = true
gravitino.cache.implementation = caffeine

# ============================================
# Authorization Configuration
# ============================================

gravitino.authorization.enable = true
gravitino.authorization.impl = org.apache.gravitino.server.authorization.jcasbin.JcasbinAuthorizer
# Admin account for creating metalake
gravitino.authorization.serviceAdmins = admin
gravitino.authenticators = simple

# ============================================
# Iceberg REST Service Configuration
# ============================================

gravitino.auxService.names = iceberg-rest
gravitino.iceberg-rest.classpath = iceberg-rest-server/libs,iceberg-rest-server/conf
gravitino.iceberg-rest.host = 0.0.0.0
gravitino.iceberg-rest.httpPort = 9002
gravitino.iceberg-rest.catalog-config-provider = dynamic-config-provider
gravitino.iceberg-rest.gravitino-uri = http://localhost:8090/
# Metalake name to use
gravitino.iceberg-rest.gravitino-metalake = my_metalake
# User for IRC service to fetch catalog info
gravitino.iceberg-rest.gravitino-simple.user-name = rest-catalog
gravitino.iceberg-rest.default-catalog-name = catalog_iceberg
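A quick check can read individual settings back from the file and flag any line where a comment trails a value on the same line. The sketch below assumes the file is parsed as Java-style properties, where `#` starts a comment only at the beginning of a line, so trailing text would become part of the value. `conf_get` and `conf_inline_comments` are hypothetical helper names, not part of Gravitino.

```shell
# Sketch only: hypothetical helpers, not shipped with Gravitino.

# Print the value of one key from a "key = value" file.
# $1 = file, $2 = key
conf_get() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                 # skip comment lines
    index($0, "=") == 0 { next }        # skip lines without "="
    {
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim the value
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# List settings whose value is followed by " #" on the same line.
# $1 = file
conf_inline_comments() {
  grep -nE '^[[:space:]]*[^#[:space:]][^=]*=.*[[:space:]]#' "$1"
}

# Usage (paths assume the layout from section 4.1):
#   conf_get "$GRAVITINO_HOME/conf/gravitino.conf" gravitino.iceberg-rest.httpPort
#   conf_inline_comments "$GRAVITINO_HOME/conf/gravitino.conf"
```

If the second command prints anything, moving each comment onto its own line avoids the ambiguity.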

5. Deployment Process

5.1 Database Initialization

# Navigate to scripts directory
cd distribution/package/scripts

# Execute SQL based on database type
# MySQL example
mysql -h <host> -u <user> -p -D <database> < xxx.sql

5.2 Download Dependencies

# Download MySQL driver
cd $GRAVITINO_HOME
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.27/mysql-connector-java-8.0.27.jar
cp mysql-connector-java-8.0.27.jar libs/
cp mysql-connector-java-8.0.27.jar catalogs/lakehouse-iceberg/libs
cp mysql-connector-java-8.0.27.jar iceberg-rest-server/libs

# Copy bundle jar files
cp -r bundles/aws-bundle/build/libs/*.jar distribution/package/catalogs/lakehouse-iceberg/libs
cp -r bundles/aws-bundle/build/libs/*.jar distribution/package/iceberg-rest-server/libs

wget https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.9.2/iceberg-aws-bundle-1.9.2.jar
cp iceberg-aws-bundle-1.9.2.jar distribution/package/iceberg-rest-server/libs
cp iceberg-aws-bundle-1.9.2.jar distribution/package/catalogs/lakehouse-iceberg/libs

5.3 Start Services

# Start Gravitino service
/bin/bash bin/gravitino.sh start

# Check service status
/bin/bash bin/gravitino.sh status
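Before sending the API requests in the following sections, it can help to wait until both HTTP ports respond. The sketch below polls a URL with curl and treats any HTTP response as "up". `wait_for_http` is a hypothetical helper, and the two probe paths in the usage comment (`/api/version` on port 8090 and `/iceberg/v1/config` on port 9002) are assumptions to adjust for your deployment.

```shell
# Sketch only: hypothetical helper, not part of Gravitino.
# $1 = URL to probe, $2 = number of attempts (default 30, one second apart)
wait_for_http() {
  wf_url="$1"
  wf_tries="${2:-30}"
  while [ "$wf_tries" -gt 0 ]; do
    # Without -f, curl exits 0 on any HTTP response, including 4xx.
    if curl -sS -o /dev/null --max-time 2 "$wf_url" 2>/dev/null; then
      return 0
    fi
    wf_tries=$((wf_tries - 1))
    if [ "$wf_tries" -gt 0 ]; then
      sleep 1
    fi
  done
  return 1
}

# Usage (probe paths are assumptions):
#   wait_for_http http://localhost:8090/api/version || echo "Gravitino Server not reachable"
#   wait_for_http http://localhost:9002/iceberg/v1/config || echo "IRC not reachable"
```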

5.4 Create Metalake

If you haven't created a metalake yet, use the API (or web UI) to create one named my_metalake:

# Create Metalake with admin privileges
curl -X POST -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{
    "name": "my_metalake",
    "comment": "",
    "properties": {}
  }' http://localhost:8090/api/metalakes
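The requests in this guide repeat the same headers, so a small wrapper can cut down on copy-paste errors. This is a minimal sketch: `basic_auth_header` and `gravitino_post` are hypothetical helpers, and the `GRAVITINO_URL`, `GRAVITINO_USER`, and `GRAVITINO_PASSWORD` variables are assumed names with defaults taken from the examples in this guide.

```shell
# Sketch only: hypothetical helpers, not part of Gravitino.

# Build the HTTP Basic header for a user and password.
basic_auth_header() {
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# POST a JSON body to a Gravitino API path.
# $1 = path (for example /api/metalakes), $2 = JSON body
gravitino_post() {
  curl -sS -X POST \
    -H "Accept: application/vnd.gravitino.v1+json" \
    -H "Content-Type: application/json" \
    -H "$(basic_auth_header "${GRAVITINO_USER:-admin}" "${GRAVITINO_PASSWORD:-password}")" \
    -d "$2" \
    "${GRAVITINO_URL:-http://localhost:8090}$1"
}

# Usage:
#   gravitino_post /api/metalakes '{"name": "my_metalake", "comment": "", "properties": {}}'
```

Using `printf` instead of `echo -n` keeps the encoded credentials free of a stray newline on shells where `echo -n` is not supported.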

5.5 Create Iceberg Catalog

Register an Iceberg Catalog in Gravitino; it needs to use the same backend (such as HMS or JDBC) as the running Iceberg REST Service:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{
    "name": "catalog_iceberg",
    "type": "RELATIONAL",
    "provider": "lakehouse-iceberg",
    "comment": "Iceberg catalog",
    "properties": {
      "uri": "jdbc:mysql://mysql-host:3306/iceberg_db",
      "catalog-backend": "jdbc",
      "warehouse": "s3://bucket/iceberg/warehouse/",
      "jdbc-user": "mysql_user",
      "jdbc-password": "mysql_password",
      "jdbc-driver": "com.mysql.cj.jdbc.Driver",
      "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
      "s3-secret-access-key": "your_secret_key",
      "s3-access-key-id": "your_access_key",
      "s3-region": "ap-southeast-1",
      "authentication.type": "simple",
      "credential-providers": "s3-token",
      "s3-endpoint": "http://s3.ap-southeast-1.amazonaws.com",
      "jdbc-initialize": "true",
      "s3-role-arn": "arn:aws:iam::730335553010:role/sts_s3_access_role"
    }
  }' http://localhost:8090/api/metalakes/my_metalake/catalogs
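The catalog request carries database and S3 credentials, so you may prefer not to type them into a command line or script. One option, sketched below, is to render the same payload from environment variables. `catalog_payload` is a hypothetical helper and every variable name (`ICEBERG_JDBC_URL`, `S3_ACCESS_KEY_ID`, and so on) is an assumption. The property keys are the ones used in the request above. Values are inserted verbatim, so they must not contain double quotes or backslashes.

```shell
# Sketch only: hypothetical helper; variable names are assumptions.
catalog_payload() {
  cat <<EOF
{
  "name": "catalog_iceberg",
  "type": "RELATIONAL",
  "provider": "lakehouse-iceberg",
  "comment": "Iceberg catalog",
  "properties": {
    "uri": "${ICEBERG_JDBC_URL}",
    "catalog-backend": "jdbc",
    "warehouse": "${ICEBERG_WAREHOUSE}",
    "jdbc-user": "${ICEBERG_JDBC_USER}",
    "jdbc-password": "${ICEBERG_JDBC_PASSWORD}",
    "jdbc-driver": "com.mysql.cj.jdbc.Driver",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "s3-secret-access-key": "${S3_SECRET_ACCESS_KEY}",
    "s3-access-key-id": "${S3_ACCESS_KEY_ID}",
    "s3-region": "${S3_REGION}",
    "authentication.type": "simple",
    "credential-providers": "s3-token",
    "s3-endpoint": "${S3_ENDPOINT}",
    "jdbc-initialize": "true",
    "s3-role-arn": "${S3_ROLE_ARN}"
  }
}
EOF
}

# Usage: export the variables, then pipe the payload to curl with -d @-
#   catalog_payload | curl -X POST -H "Content-Type: application/json" \
#     -H "Accept: application/vnd.gravitino.v1+json" \
#     -H "Authorization: Basic $(printf '%s' 'admin:password' | base64)" \
#     -d @- http://localhost:8090/api/metalakes/my_metalake/catalogs
```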

6. Access Control Management

Next, we will use Gravitino's RBAC permission model to configure access control for the Iceberg Catalog.

6.1 Permission Model

Gravitino provides the following privileges related to catalog/schema/table:

| Privilege Type | Description                 | Applicable Objects     |
|----------------|-----------------------------|------------------------|
| USE_CATALOG    | Permission to use catalog   | Catalog                |
| USE_SCHEMA     | Permission to use schema    | Schema, Catalog        |
| SELECT_TABLE   | Permission to query table   | Table, Schema, Catalog |
| MODIFY_TABLE   | Permission to modify table  | Table, Schema, Catalog |
| CREATE_TABLE   | Permission to create table  | Schema, Catalog        |
| CREATE_SCHEMA  | Permission to create schema | Catalog                |
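The table can be read as a lookup from privilege to the object types it may be attached to. The sketch below encodes exactly the rows above so a role definition can be checked before it is sent. `privilege_applies` is a hypothetical helper, not a Gravitino command, and it models only catalog, schema, and table objects. Section 6.2 also grants USE_CATALOG at the metalake level, which this sketch deliberately leaves out.

```shell
# Sketch only: hypothetical helper mirroring the table in section 6.1.
# $1 = privilege name, $2 = object type (CATALOG, SCHEMA, or TABLE)
# Returns 0 if the table lists the object type for that privilege.
privilege_applies() {
  case "$1:$2" in
    USE_CATALOG:CATALOG) return 0 ;;
    USE_SCHEMA:SCHEMA|USE_SCHEMA:CATALOG) return 0 ;;
    SELECT_TABLE:TABLE|SELECT_TABLE:SCHEMA|SELECT_TABLE:CATALOG) return 0 ;;
    MODIFY_TABLE:TABLE|MODIFY_TABLE:SCHEMA|MODIFY_TABLE:CATALOG) return 0 ;;
    CREATE_TABLE:SCHEMA|CREATE_TABLE:CATALOG) return 0 ;;
    CREATE_SCHEMA:CATALOG) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   privilege_applies CREATE_TABLE TABLE || echo "CREATE_TABLE is not a table-level privilege"
```

Granting a privilege at a higher level, such as SELECT_TABLE on a schema, covers the objects beneath it, which is why most rows list more than one object type.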

6.2 Create Roles and Permissions

Create a schema, then create a role named "data_reader" that holds USE_CATALOG on the catalog plus USE_SCHEMA and CREATE_TABLE on the schema. Adjust the catalog and schema names to match your environment.

# Create schema
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -H "Content-Type: application/json" -d '{
  "name": "schema1",
  "comment": "comment",
  "properties": {
    "key1": "value1"
  }
}' http://localhost:8090/api/metalakes/my_metalake/catalogs/catalog_iceberg/schemas

# Create role
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{
    "name": "data_reader",
    "properties": {"description": "data read"},
    "securableObjects": [
      {
        "fullName": "catalog_iceberg.schema1",
        "type": "SCHEMA",
        "privileges": [
          {"name": "CREATE_TABLE", "condition": "ALLOW"},
          {"name": "USE_SCHEMA", "condition": "ALLOW"}
        ]
      },
      {
        "fullName": "catalog_iceberg",
        "type": "CATALOG",
        "privileges": [{"name": "USE_CATALOG", "condition": "ALLOW"}]
      }
    ]
  }' http://localhost:8090/api/metalakes/my_metalake/roles

Create a role for the IRC service user (rest-catalog, as set in gravitino.conf) so that it can fetch catalog information:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{
    "name": "catalog_reader",
    "properties": {"description": "load catalog infos"},
    "securableObjects": [
      {
        "fullName": "my_metalake",
        "type": "METALAKE",
        "privileges": [{"name": "USE_CATALOG", "condition": "ALLOW"}]
      }
    ]
  }' http://localhost:8090/api/metalakes/my_metalake/roles
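The data_reader role above carries no table-level privilege. If a user should also query one specific table, a role with SELECT_TABLE on that table is one option. The sketch below generates a role payload in the same shape as the requests above. `role_payload` is a hypothetical helper, `table_reader` is an illustrative role name, and the three-part `catalog.schema.table` form of `fullName` for a TABLE object is an assumption to verify against your Gravitino version.

```shell
# Sketch only: hypothetical helper, not part of Gravitino.
# $1 = role name, $2 = securable object full name, $3 = object type,
# remaining arguments = privilege names, each granted with ALLOW.
role_payload() {
  rp_role="$1"
  rp_full_name="$2"
  rp_type="$3"
  shift 3
  rp_privs=""
  for rp_p in "$@"; do
    # Append a comma only when the list already has an entry.
    rp_privs="${rp_privs:+$rp_privs,}{\"name\":\"$rp_p\",\"condition\":\"ALLOW\"}"
  done
  printf '{"name":"%s","properties":{},"securableObjects":[{"fullName":"%s","type":"%s","privileges":[%s]}]}\n' \
    "$rp_role" "$rp_full_name" "$rp_type" "$rp_privs"
}

# Usage: send the output to the roles endpoint with curl -d
#   role_payload table_reader catalog_iceberg.schema1.table1 TABLE SELECT_TABLE
```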

6.3 Create Users and Grant Permissions

Create a user such as spark_user in Gravitino and grant them the role created above:

# Create user
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{"name": "spark_user"}' \
  http://localhost:8090/api/metalakes/my_metalake/users

# Grant permissions to user
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{"roleNames": ["data_reader"]}' \
  http://localhost:8090/api/metalakes/my_metalake/permissions/users/spark_user/grant

Create the user rest-catalog in Gravitino and grant it the catalog_reader role so that the IRC service can load catalog information:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{"name": "rest-catalog"}' \
  http://localhost:8090/api/metalakes/my_metalake/users

# Grant permissions to user
curl -X PUT -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $(echo -n 'admin:password' | base64)" \
  -d '{"roleNames": ["catalog_reader"]}' \
  http://localhost:8090/api/metalakes/my_metalake/permissions/users/rest-catalog/grant
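Each user needs the same two calls: a POST to create the user and a PUT to grant a role. When several users are involved, printing the planned requests first makes them easy to review. The sketch below is a dry run only and sends nothing. `grant_plan` is a hypothetical helper that takes `user:role` pairs and uses the endpoint paths shown above.

```shell
# Sketch only: hypothetical dry-run helper; prints requests, sends nothing.
# $1 = Gravitino base URL, $2 = metalake, remaining arguments = user:role pairs
grant_plan() {
  gp_base="$1"
  gp_metalake="$2"
  shift 2
  for gp_pair in "$@"; do
    gp_user="${gp_pair%%:*}"   # text before the first colon
    gp_role="${gp_pair#*:}"    # text after the first colon
    printf 'POST %s/api/metalakes/%s/users {"name":"%s"}\n' \
      "$gp_base" "$gp_metalake" "$gp_user"
    printf 'PUT %s/api/metalakes/%s/permissions/users/%s/grant {"roleNames":["%s"]}\n' \
      "$gp_base" "$gp_metalake" "$gp_user" "$gp_role"
  done
}

# Usage:
#   grant_plan http://localhost:8090 my_metalake spark_user:data_reader rest-catalog:catalog_reader
```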

7. Spark Integration

After configuring permissions in Gravitino, you can test and verify on the client side.

7.1 Spark Configuration

Using Spark as an example, configure the username on the client and point the Iceberg REST catalog URI at the IRC service address.

spark-sql \
  --jars "/path/to/iceberg-aws-bundle-1.9.2.jar,/path/to/iceberg-spark-runtime-3.4_2.12-1.9.2.jar" \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rest.type=rest \
  --conf spark.sql.catalog.rest.uri=http://localhost:9002/iceberg/ \
  --conf spark.sql.catalog.rest.header.X-Iceberg-Access-Delegation=vended-credentials \
  --conf spark.sql.catalog.rest.rest.auth.type=basic \
  --conf spark.sql.catalog.rest.rest.auth.basic.username=spark_user \
  --conf spark.sql.catalog.rest.rest.auth.basic.password=user_password
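All catalog settings share the `spark.sql.catalog.<name>` prefix, and Iceberg's REST client sends properties under `header.` as HTTP headers, which is how the `X-Iceberg-Access-Delegation` value reaches the IRC service. The sketch below generates the settings for a given catalog name so the same values can be reused for `spark-sql`, `spark-shell`, or a `spark-defaults.conf` file. `irc_spark_conf` is a hypothetical helper, not a Spark or Gravitino tool.

```shell
# Sketch only: hypothetical helper, not part of Spark or Gravitino.
# $1 = Spark catalog name, $2 = IRC URI, $3 = user name, $4 = password
irc_spark_conf() {
  isc_prefix="spark.sql.catalog.$1"
  printf '%s\n' \
    "$isc_prefix=org.apache.iceberg.spark.SparkCatalog" \
    "$isc_prefix.type=rest" \
    "$isc_prefix.uri=$2" \
    "$isc_prefix.header.X-Iceberg-Access-Delegation=vended-credentials" \
    "$isc_prefix.rest.auth.type=basic" \
    "$isc_prefix.rest.auth.basic.username=$3" \
    "$isc_prefix.rest.auth.basic.password=$4"
}

# Usage: print one "--conf key=value" argument per line
#   irc_spark_conf rest http://localhost:9002/iceberg/ spark_user user_password | sed 's/^/--conf /'
```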

7.2 Usage Examples

-- Show available tables
SHOW TABLES IN rest.schema1;

-- Query data
SELECT * FROM rest.schema1.table1;

-- Create table
CREATE TABLE rest.schema1.table2 (
  id BIGINT,
  name STRING
) USING iceberg;

Summary

This guide covered:

  1. Complete Deployment Process - End-to-end guidance from environment preparation, database initialization, dependency downloads to service startup
  2. Access Control System - Understanding Gravitino's RBAC permission model, learning to create roles, assign permissions, and manage users
  3. Real-world Application Scenarios - Learning how to use IRC access control in production through Spark integration examples
  4. Core Configuration Points - Mastering key configuration parameters for Gravitino Server and Iceberg REST Service

With this setup, Gravitino IRC enforces table-level permissions on Iceberg tables through Gravitino's RBAC model, and clients such as Spark access those tables through the standard Iceberg REST interface.