Author: Qi Yu
Last Updated: 2026-01-23
Overview
In this tutorial, you will learn how to configure and use the Gravitino Lance REST service. By the end of this guide, you'll have a fully functional Lance REST service that enables Lance clients to interact with Gravitino through HTTP APIs.
The Gravitino Lance REST service provides a RESTful interface for managing Lance datasets, implementing the standard Lance REST API. It acts as a centralized catalog service that allows Lance clients (like Spark and Ray) to discover and access Lance datasets managed by Gravitino.
Key concepts:
- Lance REST catalog: A standard HTTP API for Lance dataset operations
- Gravitino Lance REST service: Implements the Lance REST API and integrates with Gravitino's metadata system
- Unified Metadata: Stores Lance dataset metadata in Gravitino, enabling centralized governance
The REST endpoint base path is http://<host>:<port>/lance/.
Architecture overview: Lance clients such as Spark and Ray talk to the Gravitino Lance REST service over HTTP, and the service stores and retrieves dataset metadata through the Gravitino server's unified metadata layer.
Prerequisites
Before starting this tutorial, you will need:
System Requirements:
- Linux or macOS operating system with outbound internet access for downloads
- Python environment (3.10+) for running PySpark or Ray clients
Required Components:
- Gravitino server installed and configured (see 02-setup-guide/README.md)
Optional Components:
- Apache Spark with Lance runtime JARs for client verification (recommended for testing)
- Ray framework for distributed Lance data processing
Before proceeding, verify your Python installation and install required packages:
python --version
pip install pyspark==3.5.0 lance-ray==0.1.0 lance-namespace
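You can also confirm from Python that the client packages resolved correctly. This quick check uses importlib.metadata from the standard library and the same package names as the pip command above:
# Sanity check that the client packages from the pip install above are resolvable.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("pyspark", "lance-ray", "lance-namespace"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")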
Setup
Step 1: Start a Gravitino server with Lance REST service
Use this approach if you want the Lance REST service embedded in a full Gravitino server (with Web UI, unified REST APIs, etc.).
Configure Lance REST as auxiliary service
1. Install Gravitino server distribution
Follow the previous tutorial 02-setup-guide/README.md to download or build the Gravitino server package.
2. Enable Lance REST as an auxiliary service
Modify conf/gravitino.conf to enable the lance-rest service and configure it:
# Enable Lance REST service
gravitino.auxService.names = lance-rest
gravitino.lance-rest.httpPort = 9101
gravitino.lance-rest.host = 0.0.0.0
gravitino.lance-rest.namespace-backend = gravitino
gravitino.lance-rest.gravitino-uri = http://localhost:8090
gravitino.lance-rest.gravitino-metalake = lance_metalake
Note: The lance_metalake metalake must exist in Gravitino before you access the Lance REST service. If it doesn't exist, you can create it via the Gravitino REST API or Web UI after starting the Gravitino server.
3. Start the Gravitino server
./bin/gravitino.sh start
4. Create the Metalake (if not exists)
curl -X POST -H "Content-Type: application/json" \
-d '{"name":"lance_metalake","comment":"comment"}' \
http://localhost:8090/api/metalakes
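If you prefer scripting this step, the same call can be made from Python with the requests library (assuming you have it installed); the endpoint and payload mirror the curl command above:
# Equivalent of the curl command above: create the metalake via the Gravitino REST API.
import requests

resp = requests.post(
    "http://localhost:8090/api/metalakes",
    json={"name": "lance_metalake", "comment": "comment"},
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)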
5. Check server logs (optional)
tail -f logs/gravitino-server.log
Step 2: Verify the Lance REST endpoint and create a catalog namespace
Test the service endpoint
You can verify that the service is running with the following command:
curl -X GET http://localhost:9101/lance/v1/namespace/$/list \
-H 'Content-Type: application/json'
On success, you should see a JSON response with namespace information.
Create a catalog namespace
Create a catalog namespace (e.g., lance_catalog) that will hold your Lance schemas and tables:
curl -X POST http://localhost:9101/lance/v1/namespace/lance_catalog/create \
-H 'Content-Type: application/json' \
-d '{
"id": ["lance_catalog"],
"mode": "exist_ok"
}'
If successful, it returns the namespace information.
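The same request can be issued from Python with requests (again assuming it is installed), which is convenient if you are automating the setup; the URL and JSON body are identical to the curl command above:
# Equivalent of the curl command above: create the catalog namespace via the Lance REST API.
import requests

resp = requests.post(
    "http://localhost:9101/lance/v1/namespace/lance_catalog/create",
    json={"id": ["lance_catalog"], "mode": "exist_ok"},
)
print(resp.status_code, resp.text)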
Step 3: Connect with Spark
Configure your PySpark session to use the Lance REST catalog.
Configure Spark with Lance REST catalog
Prerequisites:
- Install pyspark: pip install pyspark==3.5.0
- Download the lance-spark bundle jar matching your Spark version (e.g., lance-spark-bundle-3.5_2.12-0.0.15.jar)
Execute sample operations
Run the following Python script:
from pyspark.sql import SparkSession
import os
# Set path to your lance-spark bundle
os.environ["PYSPARK_SUBMIT_ARGS"] = (
"--jars /path/to/lance-spark-bundle-3.5_2.12-0.0.15.jar "
"--conf \"spark.driver.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED\" "
"--conf \"spark.executor.extraJavaOptions=--add-opens=java.base/sun.nio.ch=ALL-UNNAMED\" "
"--master local[1] pyspark-shell"
)
spark = SparkSession.builder \
.appName("lance_rest_demo") \
.config("spark.sql.catalog.lance", "com.lancedb.lance.spark.LanceNamespaceSparkCatalog") \
.config("spark.sql.catalog.lance.impl", "rest") \
.config("spark.sql.catalog.lance.uri", "http://localhost:9101/lance") \
.config("spark.sql.catalog.lance.parent", "lance_catalog") \
.config("spark.sql.defaultCatalog", "lance") \
.getOrCreate()
# Create a schema and table
spark.sql("CREATE DATABASE IF NOT EXISTS demo_schema")
spark.sql("""
CREATE TABLE demo_schema.test_table (id INT, value STRING)
USING lance
LOCATION '/tmp/lance_catalog/demo_schema/test_table'
""")
# Insert and query data
spark.sql("INSERT INTO demo_schema.test_table VALUES (1, 'test')")
spark.sql("SELECT * FROM demo_schema.test_table").show()
Step 4: Connect with Ray
You can also access the data created by Spark from Ray, using the Lance Ray integration.
Configure Ray with Lance REST catalog
Prerequisites:
- Install required packages: pip install lance-ray==0.1.0 lance-namespace
Execute sample operations
import ray
import lance_namespace as ln
from lance_ray import read_lance, write_lance
ray.init()
# Connect to Lance REST
namespace = ln.connect("rest", {"uri": "http://localhost:9101/lance"})
# Read the table created by Spark
# Note: Table ID is [catalog, schema, table]
ds = read_lance(namespace=namespace, table_id=["lance_catalog", "demo_schema", "test_table"])
print(f"Row count: {ds.count()}")
ds.show()
# Perform filtering operation
result = ds.filter(lambda row: row["id"] < 100).count()
print(f"Filtered row count: {result}")
Troubleshooting
Common issues and their solutions:
Service connectivity issues:
- Service fails to start: Check logs/gravitino-server.log for startup errors and configuration issues
- Connection refused: Verify that gravitino.lance-rest.httpPort (default 9101) is open and accessible (a quick connectivity check is sketched at the end of this section)
- curl returns 404: Confirm that the Lance REST base path is /lance and the port matches the configuration
Client connection issues:
- Spark ClassNotFoundException: Ensure the lance-spark-bundle jar is correctly referenced in PYSPARK_SUBMIT_ARGS or --jars
- Namespace not found: Remember to create the parent catalog namespace (e.g., lance_catalog) before creating schemas or tables
- Ray connection errors: Verify that the lance-ray and lance-namespace packages are installed and the REST endpoint is accessible
Configuration issues:
- Metalake not found: Ensure the metalake specified in gravitino.lance-rest.gravitino-metalake exists in Gravitino
- Permission errors: Check that the Gravitino server has proper access to the configured storage locations
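When it is unclear whether the problem is the service or the client, a quick low-level check helps narrow it down. This is a minimal sketch assuming the default host and port from the configuration above:
# Quick connectivity check for the Lance REST service.
import socket
import requests

host, port = "localhost", 9101

# 1. Is the port reachable at all?
with socket.socket() as s:
    s.settimeout(3)
    result = s.connect_ex((host, port))
print("port open" if result == 0 else f"port closed (errno {result})")

# 2. Does the Lance REST base path respond?
resp = requests.get(f"http://{host}:{port}/lance/v1/namespace/$/list", timeout=5)
print(resp.status_code, resp.text[:200])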
Congratulations
You have successfully completed the Gravitino Lance REST service configuration tutorial!
You now have a fully functional Lance REST service with:
- A configured Lance REST endpoint running on port 9101
- A catalog namespace configured for organizing Lance datasets
- Verified client connectivity through Apache Spark and Ray
- Understanding of Lance dataset operations across different compute engines
Your Gravitino Lance REST service is ready to serve Lance clients across your data ecosystem.
Further Reading
For more advanced configurations and detailed documentation:
- Check the Lance REST Integration Guide for compatibility matrices and advanced configuration
- Learn more about Lance format and its capabilities
Next Steps
- Continue reading Spark ETL
- Follow and star Apache Gravitino Repository
Apache Gravitino is rapidly evolving; this article is based on the latest version, 1.1.0. If you encounter issues, please refer to the official documentation or submit an issue on GitHub.


