Author: Yu hui
Last Updated: 2026-02-11
Overview
In this tutorial, you will learn how to integrate Apache Gravitino with Trino to enable query federation across multiple data sources through a unified metadata layer. By the end of this guide, you'll be able to configure Trino to automatically load catalogs from Gravitino and run cross-catalog queries seamlessly.
What you'll accomplish:
- Connect Trino to Gravitino to enable automatic loading of Gravitino-managed catalogs
- Create catalogs from Trino SQL including Iceberg and MySQL examples
- Execute federated queries that join data across heterogeneous sources
- Validate catalog discovery and inspect catalogs using Trino SQL
Trino is a distributed SQL query engine designed for fast analytic queries against data of any size. In modern data architectures, organizations often need to query data across multiple heterogeneous systems (like MySQL, PostgreSQL, Iceberg, Hive) without moving or copying data. This is where query federation becomes essential.
Apache Gravitino simplifies this by acting as a unified metadata control plane. By using the Gravitino Trino Connector, you can access multiple data sources through a single catalog interface in Trino, with automatic catalog discovery and centralized metadata management.
Key benefits:
- Unified catalog access: Query MySQL, Iceberg, Hive, and other sources through a single interface
- Automatic catalog discovery: Catalogs created in Gravitino are automatically available in Trino
- Zero data movement: Join across heterogeneous systems without copying data
- Centralized management: Create and update catalogs in one place, reflected everywhere
Architecture overview: Trino loads the Gravitino connector as a catalog; the connector syncs catalog definitions from the configured Gravitino metalake, and each synced catalog delegates queries to its underlying data source (Iceberg, MySQL, and so on).
Prerequisites
Before starting this tutorial, you will need:
System Requirements:
- Linux or macOS operating system with outbound internet access for downloads
- JDK 17 or higher installed and properly configured
Required Components:
- Gravitino server installed and running (see 02-setup-guide/README.md)
- Trino server (coordinator + workers, or single-node for testing)
- Trino version 435, or a version compatible with your Gravitino Trino connector release
Optional Components:
- MySQL or PostgreSQL for JDBC federation examples
- Hive Metastore for Iceberg catalog backend
- Object storage (S3/GCS/Azure) for cloud-based table storage
Before proceeding, verify your Java installation:
${JAVA_HOME}/bin/java -version
Important: Ensure your Gravitino server is configured to use simple authentication mode. The Gravitino Trino connector currently connects as an anonymous user and does not propagate user authentication.
Setup
How the Integration Works
The Gravitino Trino Connector enables Trino to dynamically load catalogs from Gravitino:
- The connector is configured as a Trino catalog named gravitino via etc/catalog/gravitino.properties
- You create additional catalogs (like iceberg_test and mysql_test) through Gravitino stored procedures or REST APIs
- Trino automatically syncs Gravitino-managed catalogs every 10 seconds (configurable via gravitino.metadata.refresh-interval-seconds)
- You query federated data using standard catalog.schema.table naming
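The sync loop described above can be sketched in a few lines: periodically list the catalogs registered in the metalake and diff them against the previously known set. This is a minimal illustration, not the connector's actual implementation; the Gravitino listing endpoint path is taken from its REST API, but the exact response shape (an "identifiers" array of objects with a "name" field) is an assumption to verify against your Gravitino version.

```python
import json
import urllib.request

def list_catalogs(gravitino_uri: str, metalake: str) -> set[str]:
    """Fetch the catalog names currently registered in a Gravitino metalake."""
    url = f"{gravitino_uri}/api/metalakes/{metalake}/catalogs"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"identifiers": [{"name": ...}, ...]}
    return {ident["name"] for ident in payload.get("identifiers", [])}

def diff_catalogs(known: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) relative to the previously known catalog set."""
    return current - known, known - current
```

On each refresh tick the connector would register the `added` names as new Trino catalogs and unregister the `removed` ones, which is why new catalogs appear in Trino without a restart.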
Values Used in This Tutorial
Replace these values with your environment settings:
- Gravitino URI: http://gravitino-server:8090
- Metalake: trino_metalake
- Iceberg HMS URI: thrift://hive-host:9083
- Iceberg warehouse: hdfs://namenode:9000/user/iceberg/warehouse
- MySQL JDBC URL: jdbc:mysql://mysql-host:3306?useSSL=false
- MySQL credentials: trino/ds123
Step 1: Install and Configure Gravitino Trino Connector
The Gravitino Trino Connector must be installed on all Trino nodes (coordinator and workers).
Install the Connector Plugin
1. Download the connector
Download the Gravitino Trino connector from the Apache Gravitino download page or build from source.
2. Install on all Trino nodes
Extract the connector and copy it to Trino's plugin directory:
# Extract the connector
tar -xzf gravitino-trino-connector-<version>.tar.gz
# Copy to Trino plugin directory (on coordinator and all workers)
cp -r gravitino-trino-connector-<version> ${TRINO_HOME}/plugin/gravitino
Enable Dynamic Catalog Management
Configure Trino for dynamic catalogs
Edit ${TRINO_HOME}/etc/config.properties on the coordinator node:
catalog.management=dynamic
Configure the Gravitino Catalog
Create etc/catalog/gravitino.properties
On each Trino node, create the Gravitino catalog configuration file, pointing to your Gravitino server and metalake:
connector.name=gravitino
gravitino.uri=http://gravitino-server:8090
gravitino.metalake=trino_metalake
Note: The metalake specified in gravitino.metalake must already exist in Gravitino. If not, create it via the Web UI or REST API:
curl -X POST -H "Content-Type: application/json" \
  -d '{"name":"trino_metalake","comment":"Metalake for Trino federation","properties":{}}' \
  http://gravitino-server:8090/api/metalakes
Restart Trino
After creating the configuration file on each node, restart Trino to load the connector.
Verify Installation
Check the gravitino catalog is loaded
SHOW CATALOGS;
You should see gravitino in the catalog list, confirming successful installation.
Step 2: Create Catalogs from Trino SQL
Once the gravitino catalog is configured, you can create additional catalogs using stored procedures in gravitino.system.
Create an Iceberg Catalog
Example using Hive Metastore backend
CALL gravitino.system.create_catalog(
'iceberg_test',
'lakehouse-iceberg',
MAP(
ARRAY['uri', 'catalog-backend', 'warehouse'],
ARRAY['thrift://hive-host:9083', 'hive', 'hdfs://namenode:9000/user/iceberg/warehouse']
)
);
Note: For S3 or other cloud storage, you may need to pass additional Trino-specific settings as properties with the trino.bypass. prefix; see Apache Gravitino Trino connector - Iceberg catalog for the supported properties.
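A small helper makes the prefixing convention concrete: Gravitino-native properties are passed as-is, while Trino passthrough properties get the trino.bypass. namespace. This is an illustrative sketch; the iceberg.table-statistics-enabled key below is only a stand-in example of a Trino-side property, so check the connector documentation for the properties that actually apply to your setup.

```python
def with_trino_bypass(gravitino_props: dict[str, str],
                      trino_props: dict[str, str]) -> dict[str, str]:
    """Merge catalog properties, namespacing Trino-specific ones with the
    'trino.bypass.' prefix so Gravitino passes them through to the connector."""
    merged = dict(gravitino_props)
    for key, value in trino_props.items():
        merged[f"trino.bypass.{key}"] = value
    return merged

props = with_trino_bypass(
    {"uri": "thrift://hive-host:9083", "catalog-backend": "hive"},
    {"iceberg.table-statistics-enabled": "true"},  # hypothetical example key
)
```

The resulting dictionary maps directly onto the MAP(ARRAY[...], ARRAY[...]) argument of gravitino.system.create_catalog.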
Create a MySQL Catalog
Example JDBC catalog for MySQL
CALL gravitino.system.create_catalog(
'mysql_test',
'jdbc-mysql',
MAP(
ARRAY['jdbc-url', 'jdbc-user', 'jdbc-password', 'jdbc-driver'],
ARRAY['jdbc:mysql://mysql-host:3306?useSSL=false', 'trino', 'ds123', 'com.mysql.cj.jdbc.Driver']
)
);
Tip: To ignore "already exists" errors, use named arguments with ignore_exist => true.
Verify Catalog Creation
Inspect Gravitino catalogs
SELECT * FROM gravitino.system.catalog;
Expected output:
name | provider | properties
--------------+-------------------+-------------------------------
iceberg_test | lakehouse-iceberg | {...}
mysql_test | jdbc-mysql | {...}
Step 3: Validate Catalog Discovery
After creating catalogs in Gravitino, verify they are visible in Trino.
Confirm Catalog Visibility
Check available catalogs and schemas
SHOW CATALOGS;
SHOW SCHEMAS FROM iceberg_test;
SHOW SCHEMAS FROM mysql_test;
Note: Trino syncs catalogs from Gravitino according to the configured refresh interval (10 seconds by default). If catalogs don't appear immediately, wait for the next refresh cycle and retry.
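Because the sync is asynchronous, validation scripts should poll rather than check once. The following generic helper, a sketch rather than any official tooling, retries a catalog listing (for example, a wrapper around SHOW CATALOGS or the Gravitino REST listing) until the expected name appears or a deadline passes.

```python
import time

def wait_for_catalog(list_fn, name: str, timeout: float = 30.0,
                     interval: float = 2.0) -> bool:
    """Poll list_fn() until `name` is present or `timeout` seconds elapse.

    list_fn is any zero-argument callable returning an iterable of catalog
    names. Returns True as soon as the catalog shows up, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if name in list_fn():
            return True
        time.sleep(interval)
    return False
```

With the default 10-second refresh interval, a 30-second timeout leaves room for a couple of missed sync cycles before the check fails.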
Step 4: Prepare Sample Data
Create sample schemas and tables to demonstrate federation capabilities.
Create MySQL Dimension Table
Set up a users dimension table
-- Create schema
CREATE SCHEMA mysql_test.demo;
-- Create users table
CREATE TABLE mysql_test.demo.users (
user_id BIGINT,
user_name VARCHAR
);
-- Insert sample data
INSERT INTO mysql_test.demo.users VALUES
(1, 'alice'),
(2, 'bob');
-- Verify data
SHOW TABLES FROM mysql_test.demo;
SELECT * FROM mysql_test.demo.users;
Create Iceberg Fact Table
Set up an events fact table
-- Create schema
CREATE SCHEMA iceberg_test.demo;
-- Create events table
CREATE TABLE iceberg_test.demo.events (
user_id BIGINT,
event_type VARCHAR,
ts TIMESTAMP
);
-- Insert sample data
INSERT INTO iceberg_test.demo.events VALUES
(1, 'click', TIMESTAMP '2024-01-01 10:00:00'),
(2, 'view', TIMESTAMP '2024-01-01 10:01:00');
-- Verify data
SHOW TABLES FROM iceberg_test.demo;
SELECT * FROM iceberg_test.demo.events;
Step 5: Execute Federated Queries
These examples demonstrate the core value of query federation: joining data across heterogeneous sources in a single query.
Pattern 1: Cross-Catalog JOIN
Join dimension and fact tables
SELECT
e.user_id,
u.user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
Pattern 2: Aggregation Across Catalogs
Count events by user
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
WHERE e.ts >= TIMESTAMP '2024-01-01 00:00:00'
GROUP BY u.user_name
ORDER BY event_cnt DESC, u.user_name;
Pattern 3: Semi-Join Filter
Filter fact table by dimension membership
SELECT e.*
FROM iceberg_test.demo.events e
WHERE EXISTS (
SELECT 1
FROM mysql_test.demo.users u
WHERE u.user_id = e.user_id
);
Pattern 4: LEFT JOIN with Unmatched Rows
Keep all events, even without matching users
SELECT
e.user_id,
COALESCE(u.user_name, 'unknown') AS user_name,
e.event_type,
e.ts
FROM iceberg_test.demo.events e
LEFT JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
ORDER BY e.ts;
Step 6: Understanding Federation Mechanics
In federated queries, understanding where work happens is crucial for optimization:
How queries execute:
- Connector-level reads: Each connector (Iceberg, MySQL) reads from its respective source
- Trino-level joins: Trino combines results from multiple sources in memory
- Pushdown optimization: Some filters and predicates may be pushed to source systems
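The pushdown point above is easiest to see with a toy model: compare how many rows cross from the source to the engine when a filter runs after transfer versus at the source. This is a deliberately simplified simulation of the idea, not Trino's execution model.

```python
# 10,000 synthetic fact rows; ts doubles as a monotonically increasing timestamp.
events = [{"user_id": i % 100, "ts": i} for i in range(10_000)]

def scan_then_filter(rows, predicate):
    """No pushdown: every source row crosses the wire, the engine filters after."""
    transferred = list(rows)
    return [r for r in transferred if predicate(r)], len(transferred)

def pushdown_filter(rows, predicate):
    """Pushdown: the source applies the predicate, only matching rows move."""
    transferred = [r for r in rows if predicate(r)]
    return transferred, len(transferred)

recent = lambda r: r["ts"] >= 9_000
matches_a, moved_a = scan_then_filter(events, recent)   # 10,000 rows moved
matches_b, moved_b = pushdown_filter(events, recent)    # 1,000 rows moved
```

Both strategies return the same result set; the difference is the 10x gap in rows transferred, which is exactly what a time filter on a large fact table buys you when the connector can push it down.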
Query optimization tips:
- Filter early: Apply partition/time filters on large tables to reduce data scanned
- Align join keys: Use consistent data types across sources (e.g., BIGINT for IDs)
- Small dimension pattern: Join large fact tables with small dimension tables for efficiency
- Review query plans: Use EXPLAIN to understand the execution strategy
Analyze query execution
EXPLAIN
SELECT
u.user_name,
COUNT(*) AS event_cnt
FROM iceberg_test.demo.events e
JOIN mysql_test.demo.users u
ON e.user_id = u.user_id
GROUP BY u.user_name;
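The "small dimension" tip has a concrete mechanical reason: a join engine can build an in-memory hash table on the small side and stream the large side past it once. The sketch below shows that shape in miniature, using the tutorial's users/events data; it illustrates the pattern only and is not how Trino distributes work across workers.

```python
def hash_join(facts, dims, fact_key, dim_key):
    """Build a hash table on the small dimension side, then stream the large
    fact side once, probing the table per row (inner-join semantics)."""
    build = {d[dim_key]: d for d in dims}      # build side: small table in memory
    joined = []
    for f in facts:                            # probe side: large table, one pass
        d = build.get(f[fact_key])
        if d is not None:
            joined.append({**f, **d})
    return joined

users = [{"user_id": 1, "user_name": "alice"}, {"user_id": 2, "user_name": "bob"}]
events = [{"user_id": 1, "event_type": "click"}, {"user_id": 3, "event_type": "view"}]
rows = hash_join(events, users, "user_id", "user_id")
# Only the user_id=1 event survives; user_id=3 has no matching dimension row.
```

Keeping the build side small is what makes the cross-catalog joins in Step 5 cheap even when the fact table is large.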
Step 7: Clean Up Resources
Remove sample data and catalogs
-- Drop tables
DROP TABLE mysql_test.demo.users;
DROP TABLE iceberg_test.demo.events;
-- Drop schemas
DROP SCHEMA mysql_test.demo;
DROP SCHEMA iceberg_test.demo;
-- Drop catalogs
CALL gravitino.system.drop_catalog('mysql_test');
CALL gravitino.system.drop_catalog('iceberg_test');
Troubleshooting
Common issues and their solutions:
Connector installation issues:
- Catalog not found: Ensure the Gravitino connector plugin is installed on all Trino nodes and gravitino.properties exists in etc/catalog/
- Dynamic catalog not working: Verify catalog.management=dynamic is set in etc/config.properties on the coordinator
Connection issues:
- Cannot connect to Gravitino: Check that the Gravitino server is running and gravitino.uri is correct
- Metalake not found: Ensure the metalake specified in gravitino.metalake exists in Gravitino
Catalog sync issues:
- Catalogs not appearing: Wait for the sync interval (default 10 seconds) or adjust gravitino.metadata.refresh-interval-seconds
- Stale catalog information: Restart Trino or wait for the next sync cycle
Query execution issues:
- Table not found: Verify the fully qualified table name format: catalog.schema.table
- Permission denied (Gravitino metadata): Verify that the Trino identity (or the mapped Gravitino user, or the anonymous user if the connector runs anonymously) has the required catalog/schema/table privileges in Gravitino for the objects being queried
- Permission denied (underlying data source): If Gravitino privileges are correct but the error persists, check the credentials and permissions for the underlying data sources (for example, the MySQL user/password or Hive/HDFS/S3 ACLs) configured in the corresponding Gravitino catalog
Congratulations
You have successfully completed the Gravitino Trino query federation tutorial!
You now have a fully functional Trino environment with Gravitino integration, including:
- A configured Gravitino Trino Connector for automatic catalog discovery
- Multiple registered catalogs (Iceberg and MySQL) accessible from Trino
- Working federated queries that join data across heterogeneous sources
- Understanding of query optimization patterns and federation mechanics
Your Trino environment is now ready to leverage Gravitino for unified metadata management and cross-system query federation.
Further Reading
For more advanced configurations and detailed documentation:
- Review the Gravitino Trino Connector Documentation for advanced configuration options
- Learn about Trino Query Federation for deeper understanding
- Explore Trino Performance Tuning for optimization strategies
Next Steps
- Explore Using Gravitino with Flink for streaming processing
- Follow and star Apache Gravitino Repository
Apache Gravitino is rapidly evolving, and this article is written based on the latest version 1.1.0. If you encounter issues, please refer to the official documentation or submit issues on GitHub.