Author: Yu hui
Last Updated: 2026-02-11
Overview
In this tutorial, you will learn how to integrate Apache Gravitino with Trino to enable query federation across multiple data sources through a unified metadata layer. By the end of this guide, you'll be able to configure Trino to automatically load catalogs from Gravitino and run cross-catalog queries seamlessly.
What you'll accomplish:
- Connect Trino to Gravitino to enable automatic loading of Gravitino-managed catalogs
- Create catalogs from Trino SQL including Iceberg and MySQL examples
- Execute federated queries that join data across heterogeneous sources
- Validate catalog discovery and inspect catalogs using Trino SQL
Trino is a distributed SQL query engine designed for fast analytic queries against data of any size. In modern data architectures, organizations often need to query data across multiple heterogeneous systems (like MySQL, PostgreSQL, Iceberg, Hive) without moving or copying data. This is where query federation becomes essential.
Apache Gravitino simplifies this by acting as a unified metadata control plane. By using the Gravitino Trino Connector, you can access multiple data sources through a single catalog interface in Trino, with automatic catalog discovery and centralized metadata management.
Key benefits:
- Unified catalog access: Query MySQL, Iceberg, Hive, and other sources through a single interface
- Automatic catalog discovery: Catalogs created in Gravitino are automatically available in Trino
- Zero data movement: Join across heterogeneous systems without copying data
- Centralized management: Create and update catalogs in one place, reflected everywhere
Architecture overview: Trino loads the Gravitino connector as a catalog; the connector syncs catalog definitions from the configured Gravitino metalake, and each synced catalog delegates queries to its underlying data source (Iceberg, MySQL, and so on).
Prerequisites
Before starting this tutorial, you will need:
System Requirements:
- Linux or macOS operating system with outbound internet access for downloads
- JDK 17 or higher installed and properly configured
Required Components:
- Gravitino server installed and running (see 02-setup-guide/README.md)
- Trino server (coordinator + workers, or single-node for testing)
- Trino version 435, or a version compatible with your Gravitino Trino connector release
Optional Components:
- MySQL or PostgreSQL for JDBC federation examples
- Hive Metastore for Iceberg catalog backend
- Object storage (S3/GCS/Azure) for cloud-based table storage
Before proceeding, verify your Java installation:
${JAVA_HOME}/bin/java -version
Important: Ensure your Gravitino server is configured to use simple authentication mode. The Gravitino Trino connector currently connects as an anonymous user and does not propagate user authentication.
Setup
How the Integration Works
The Gravitino Trino Connector enables Trino to dynamically load catalogs from Gravitino:
- The connector is configured as a Trino catalog named gravitino via etc/catalog/gravitino.properties
- You create additional catalogs (like iceberg_test and mysql_test) through Gravitino stored procedures or REST APIs
- Trino automatically syncs Gravitino-managed catalogs every 10 seconds (configurable via gravitino.metadata.refresh-interval-seconds)
- You query federated data using standard catalog.schema.table naming
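The sync loop described above can be sketched in a few lines: periodically list the catalogs registered in the metalake and diff them against the previously known set. This is a minimal illustration, not the connector's actual implementation; the Gravitino listing endpoint path is taken from its REST API, but the exact response shape (an "identifiers" array of objects with a "name" field) is an assumption to verify against your Gravitino version.

```python
import json
import urllib.request

def list_catalogs(gravitino_uri: str, metalake: str) -> set[str]:
    """Fetch the catalog names currently registered in a Gravitino metalake."""
    url = f"{gravitino_uri}/api/metalakes/{metalake}/catalogs"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"identifiers": [{"name": ...}, ...]}
    return {ident["name"] for ident in payload.get("identifiers", [])}

def diff_catalogs(known: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) relative to the previously known catalog set."""
    return current - known, known - current
```

On each refresh tick the connector would register the `added` names as new Trino catalogs and unregister the `removed` ones, which is why new catalogs appear in Trino without a restart.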
Values Used in This Tutorial
Replace these values with your environment settings:
- Gravitino URI: http://gravitino-server:8090
- Metalake: trino_metalake
- Iceberg HMS URI: thrift://hive-host:9083
- Iceberg warehouse: hdfs://namenode:9000/user/iceberg/warehouse
- MySQL JDBC URL: jdbc:mysql://mysql-host:3306?useSSL=false
- MySQL credentials: trino/ds123
Step 1: Install and Configure Gravitino Trino Connector
The Gravitino Trino Connector must be installed on all Trino nodes (coordinator and workers).
Install the Connector Plugin
1. Download the connector
Download the Gravitino Trino connector from the Apache Gravitino download page or build from source.
2. Install on all Trino nodes
Extract the connector and copy it to Trino's plugin directory:
# Extract the connector
tar -xzf gravitino-trino-connector-<version>.tar.gz
# Copy to Trino plugin directory (on coordinator and all workers)
cp -r gravitino-trino-connector-<version> ${TRINO_HOME}/plugin/gravitino
Enable Dynamic Catalog Management
Configure Trino for dynamic catalogs
Edit ${TRINO_HOME}/etc/config.properties on the coordinator node:
catalog.management=dynamic
Configure the Gravitino Catalog
Create etc/catalog/gravitino.properties
On each Trino node, create the Gravitino catalog configuration file, pointing to your Gravitino server and metalake:
connector.name=gravitino
gravitino.uri=http://gravitino-server:8090
gravitino.metalake=trino_metalake
Note: The metalake specified in gravitino.metalake must already exist in Gravitino. If not, create it via the Web UI or REST API:
curl -X POST -H "Content-Type: application/json" \
  -d '{"name":"trino_metalake","comment":"Metalake for Trino federation","properties":{}}' \
  http://gravitino-server:8090/api/metalakes
Restart Trino
After creating the configuration file on each node, restart Trino to load the connector.
Verify Installation
Check the gravitino catalog is loaded
SHOW CATALOGS;
You should see gravitino in the catalog list, confirming successful installation.
Step 2: Create Catalogs from Trino SQL
Once the gravitino catalog is configured, you can create additional catalogs using stored procedures in gravitino.system.
Create an Iceberg Catalog
Example using Hive Metastore backend
CALL gravitino.system.create_catalog(
'iceberg_test',
'lakehouse-iceberg',
MAP(
ARRAY['uri', 'catalog-backend', 'warehouse'],
ARRAY['thrift://hive-host:9083', 'hive', 'hdfs://namenode:9000/user/iceberg/warehouse']
)
);
Note: For S3 or other cloud storage, you may need to pass additional Trino-specific settings as properties with the trino.bypass. prefix; see Apache Gravitino Trino connector - Iceberg catalog for the supported properties.
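A small helper makes the prefixing convention concrete: Gravitino-native properties are passed as-is, while Trino passthrough properties get the trino.bypass. namespace. This is an illustrative sketch; the iceberg.table-statistics-enabled key below is only a stand-in example of a Trino-side property, so check the connector documentation for the properties that actually apply to your setup.

```python
def with_trino_bypass(gravitino_props: dict[str, str],
                      trino_props: dict[str, str]) -> dict[str, str]:
    """Merge catalog properties, namespacing Trino-specific ones with the
    'trino.bypass.' prefix so Gravitino passes them through to the connector."""
    merged = dict(gravitino_props)
    for key, value in trino_props.items():
        merged[f"trino.bypass.{key}"] = value
    return merged

props = with_trino_bypass(
    {"uri": "thrift://hive-host:9083", "catalog-backend": "hive"},
    {"iceberg.table-statistics-enabled": "true"},  # hypothetical example key
)
```

The resulting dictionary maps directly onto the MAP(ARRAY[...], ARRAY[...]) argument of gravitino.system.create_catalog.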
Create a MySQL Catalog
Example JDBC catalog for MySQL
CALL gravitino.system.create_catalog(
'mysql_test',
'jdbc-mysql',
MAP(
ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver'],
ARRAY['jdbc:mysql://mysql-host:3306?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver']
)
);
Tip: To ignore "already exists" errors, use named arguments with ignore_exist => true.
Verify Catalog Creation
Inspect Gravitino catalogs
SELECT * FROM gravitino.system.catalog;
Expected output:
name | provider | properties
--------------+-------------------+-------------------------------
iceberg_test | lakehouse-iceberg | {...}
mysql_test | jdbc-mysql | {...}
Step 3: Validate Catalog Discovery
After creating catalogs in Gravitino, verify they are visible in Trino.
Confirm Catalog Visibility
Check available catalogs and schemas
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg_test;
SHOW SCHEMAS FROM mysql_test;
Note: Trino syncs catalogs from Gravitino according to the configured refresh interval (10 seconds by default). If catalogs don't appear immediately, wait for the next refresh cycle and retry.
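Because the sync is asynchronous, validation scripts should poll rather than check once. The following generic helper, a sketch rather than any official tooling, retries a catalog listing (for example, a wrapper around SHOW CATALOGS or the Gravitino REST listing) until the expected name appears or a deadline passes.

```python
import time

def wait_for_catalog(list_fn, name: str, timeout: float = 30.0,
                     interval: float = 2.0) -> bool:
    """Poll list_fn() until `name` is present or `timeout` seconds elapse.

    list_fn is any zero-argument callable returning an iterable of catalog
    names. Returns True as soon as the catalog shows up, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if name in list_fn():
            return True
        time.sleep(interval)
    return False
```

With the default 10-second refresh interval, a 30-second timeout leaves room for a couple of missed sync cycles before the check fails.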
Step 4: Prepare Sample Data
Create sample schemas and tables to demonstrate federation capabilities.
Create MySQL Dimension Table
Set up a users dimension table
-- Create schema
CREATE SCHEMA mysql_test.demo;
-- Create users table
CREATE TABLE mysql_test.demo.users (
user_id BIGINT,
user_name VARCHAR
);
-- Insert sample data
INSERT INTO mysql_test.demo.users VALUES
(1, 'alice'),
(2, 'bob');
-- Verify data
SHOW TABLES FROM mysql_test.demo;
SELECT * FROM mysql_test.demo.users;
Create Iceberg Fact Table
Set up an events fact table
-- Create schema
CREATE SCHEMA iceberg_test.demo;
-- Create events table
CREATE TABLE iceberg_test.demo.events (
user_id BIGINT,
event_type VARCHAR,
ts TIMESTAMP
);
-- Insert sample data
INSERT INTO iceberg_test.demo.events VALUES
(1, 'click', TIMESTAMP '2024-01-01 10:00:00'),
(2, 'view', TIMESTAMP '2024-01-01 10:01:00');
-- Verify data
SHOW TABLES FROM iceberg_test.demo;
SELECT * FROM iceberg_test.demo.events;
Step 5: Execute Federated Queries
These examples demonstrate the core value of query federation: joining data across heterogeneous sources in a single query.
Pattern 1: Cross-Catalog JOIN
Join dimension and fact tables
SELECT
e.user_id,
u.user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
Pattern 2: Aggregation Across Catalogs
Count events by user
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
WHERE e.ts >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY u.user_name
ORDER BY event_cnt DESC, u.user_name;
Pattern 3: Semi-Join Filter
Filter fact table by dimension membership
SELECT e.*
FROM iceberg_test.demo.events e
WHERE EXISTS (
SELECT 1
FROM mysql_test.demo.users u
WHERE u.user_id = e.user_id
);
Pattern 4: LEFT JOIN with Unmatched Rows
Keep all events, even without matching users
SELECT
e.user_id,
COALESCE(u.user_name, 'unknown') AS user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
LEFT JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
Step 6: Understanding Federation Mechanics
In federated queries, understanding where work happens is crucial for optimization:
How queries execute:
- Connector-level reads: Each connector (Iceberg, MySQL) reads from its respective source
- Trino-level joins: Trino combines results from multiple sources in memory
- Pushdown optimization: Some filters and predicates may be pushed to source systems
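The pushdown point above is easiest to see with a toy model: compare how many rows cross from the source to the engine when a filter runs after transfer versus at the source. This is a deliberately simplified simulation of the idea, not Trino's execution model.

```python
# 10,000 synthetic fact rows; ts doubles as a monotonically increasing timestamp.
events = [{"user_id": i % 100, "ts": i} for i in range(10_000)]

def scan_then_filter(rows, predicate):
    """No pushdown: every source row crosses the wire, the engine filters after."""
    transferred = list(rows)
    return [r for r in transferred if predicate(r)], len(transferred)

def pushdown_filter(rows, predicate):
    """Pushdown: the source applies the predicate, only matching rows move."""
    transferred = [r for r in rows if predicate(r)]
    return transferred, len(transferred)

recent = lambda r: r["ts"] >= 9_000
matches_a, moved_a = scan_then_filter(events, recent)   # 10,000 rows moved
matches_b, moved_b = pushdown_filter(events, recent)    # 1,000 rows moved
```

Both strategies return the same result set; the difference is the 10x gap in rows transferred, which is exactly what a time filter on a large fact table buys you when the connector can push it down.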
Query optimization tips:
- Filter early: Apply partition/time filters on large tables to reduce data scanned
- Align join keys: Use consistent data types across sources (e.g., BIGINT for IDs)
- Small dimension pattern: Join large fact tables with small dimension tables for efficiency
- Review query plans: Use EXPLAIN to understand the execution strategy
Analyze query execution
EXPLAIN
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
GROUP BY u.user_name;
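The "small dimension" tip has a concrete mechanical reason: a join engine can build an in-memory hash table on the small side and stream the large side past it once. The sketch below shows that shape in miniature, using the tutorial's users/events data; it illustrates the pattern only and is not how Trino distributes work across workers.

```python
def hash_join(facts, dims, fact_key, dim_key):
    """Build a hash table on the small dimension side, then stream the large
    fact side once, probing the table per row (inner-join semantics)."""
    build = {d[dim_key]: d for d in dims}      # build side: small table in memory
    joined = []
    for f in facts:                            # probe side: large table, one pass
        d = build.get(f[fact_key])
        if d is not None:
            joined.append({**f, **d})
    return joined

users = [{"user_id": 1, "user_name": "alice"}, {"user_id": 2, "user_name": "bob"}]
events = [{"user_id": 1, "event_type": "click"}, {"user_id": 3, "event_type": "view"}]
rows = hash_join(events, users, "user_id", "user_id")
# Only the user_id=1 event survives; user_id=3 has no matching dimension row.
```

Keeping the build side small is what makes the cross-catalog joins in Step 5 cheap even when the fact table is large.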
Step 7: Clean Up Resources
Remove sample data and catalogs
-- Drop tables
DROP TABLE mysql_test.demo.users;
DROP TABLE iceberg_test.demo.events;
-- Drop schemas
DROP SCHEMA mysql_test.demo;
DROP SCHEMA iceberg_test.demo;
-- Drop catalogs
CALL gravitino.system.drop_catalog('mysql_test');
CALL gravitino.system.drop_catalog('iceberg_test');
Troubleshooting
Common issues and their solutions:
Connector installation issues:
- Catalog not found: Ensure the Gravitino connector plugin is installed on all Trino nodes and gravitino.properties exists in etc/catalog/
- Dynamic catalog not working: Verify catalog.management=dynamic is set in etc/config.properties on the coordinator
Connection issues:
- Cannot connect to Gravitino: Check that the Gravitino server is running and gravitino.uri is correct
- Metalake not found: Ensure the metalake specified in gravitino.metalake exists in Gravitino
Catalog sync issues:
- Catalogs not appearing: Wait for the sync interval (default 10 seconds) or adjust gravitino.metadata.refresh-interval-seconds
- Stale catalog information: Restart Trino or wait for the next sync cycle
Query execution issues:
- Table not found: Verify the fully qualified table name format: catalog.schema.table
- Permission denied (Gravitino metadata): Verify that the Trino identity (or the mapped Gravitino user, or the anonymous user if the connector runs anonymously) has the required catalog/schema/table privileges in Gravitino for the objects being queried
- Permission denied (underlying data source): If Gravitino privileges are correct but the error persists, check the credentials and permissions for the underlying data sources (for example, the MySQL user/password or Hive/HDFS/S3 ACLs) configured in the corresponding Gravitino catalog
Congratulations
You have successfully completed the Gravitino Trino query federation tutorial!
You now have a fully functional Trino environment with Gravitino integration, including:
- A configured Gravitino Trino Connector for automatic catalog discovery
- Multiple registered catalogs (Iceberg and MySQL) accessible from Trino
- Working federated queries that join data across heterogeneous sources
- Understanding of query optimization patterns and federation mechanics
Your Trino environment is now ready to leverage Gravitino for unified metadata management and cross-system query federation.
Further Reading
For more advanced configurations and detailed documentation:
- Review the Gravitino Trino Connector Documentation for advanced configuration options
- Learn about Trino Query Federation for deeper understanding
- Explore Trino Performance Tuning for optimization strategies
Next Steps
- Explore Using Gravitino with Flink for streaming processing
- Follow and star Apache Gravitino Repository
Apache Gravitino is rapidly evolving, and this article is written based on the latest version 1.1.0. If you encounter issues, please refer to the official documentation or submit issues on GitHub.