This step-by-step tutorial shows you how to install and run a PrestoDB Java Coordinator with a Prestissimo (C++) Worker using Docker. This setup uses Meta's high-performance Velox engine for worker-side query execution.
We'll create a lightweight Presto cluster and run a test query with the built-in TPCH connector.
Introducing Prestissimo (Presto C++ Worker)
Prestissimo is the C++ native implementation of the Presto Worker, designed as a drop-in replacement for the traditional Java worker. It is built using Velox, a high-performance open-source C++ database acceleration library created by Meta.
The shift to a C++ execution engine is a major performance innovation for Presto, offering several significant advantages for data lake analytics:
Massive Performance Boost: Prestissimo leverages native C++ execution, vectorization, and SIMD (Single Instruction, Multiple Data) instructions, which can dramatically increase CPU efficiency and reduce query latency, with production results showing fleet sizes shrinking to nearly a third.
Eliminates Java GC Issues: By moving the execution engine out of the JVM, it removes the unpredictable performance spikes and pauses associated with Java Garbage Collection (GC), leading to more consistent and stable query times.
Explicit Memory Control: The Velox memory management framework provides explicit memory accounting and arbitration, offering finer control over resource consumption than the JVM.
Prerequisites ✅
- Docker installed (I am using OrbStack).
- Basic familiarity with the Terminal and shell commands.
Directory-Structure 🗃️
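The layout that the steps below build up, inferred from the paths they use, can be printed with a small helper. The function name expected_layout is only illustrative, and the worker-2 entries assume Step 3 is repeated for the second worker that docker-compose.yml references.

```shell
# Print the directory tree that Steps 1-4 produce under ~/presto-lab.
# (expected_layout is a hypothetical helper, not part of Presto.)
expected_layout() {
  cat <<'EOF'
presto-lab/
├── docker-compose.yml
├── coordinator/
│   └── etc/
│       ├── config.properties
│       ├── jvm.config
│       ├── node.properties
│       └── catalog/
│           └── tpch.properties
├── worker-1/
│   └── etc/
│       ├── config.properties
│       ├── node.properties
│       └── catalog/
│           └── tpch.properties
└── worker-2/
    └── etc/            # same three files as worker-1
EOF
}

expected_layout
```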
Let's Get Started! 🛠️
Step 1 - Create a Working Directory
Create a clean directory to hold all your configuration files and the docker-compose.yml.
mkdir -p ~/presto-lab
cd ~/presto-lab
Step 2 - Configure the Presto Java Coordinator
The Coordinator requires configuration for its role, the discovery service, and a catalog to query.
- Create Directory:
mkdir -p coordinator/etc/catalog
- Create coordinator/etc/config.properties: This enables the coordinator and discovery server, and sets the port.
# coordinator/etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
coordinator=true: Enables coordinator mode.
discovery-server.enabled=true: Coordinator also hosts the worker discovery service.
- Create coordinator/etc/jvm.config: Standard Java 17 flags for Presto.
# coordinator/etc/jvm.config
-server
-Xmx1G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.ref=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.security=ALL-UNNAMED
--add-opens=java.base/javax.security.auth=ALL-UNNAMED
--add-opens=java.base/javax.security.auth.login=ALL-UNNAMED
--add-opens=java.base/java.text=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/java.util.regex=ALL-UNNAMED
--add-opens=java.base/jdk.internal.loader=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
- Create coordinator/etc/node.properties: Sets the node environment and data directory.
# coordinator/etc/node.properties
node.id=${ENV:HOSTNAME}
node.environment=test
node.data-dir=/var/lib/presto/data
- Add TPCH Catalog coordinator/etc/catalog/tpch.properties: The TPCH connector lets you run test queries against TPC-H data that it generates on the fly, so no external storage is needed.
# coordinator/etc/catalog/tpch.properties
connector.name=tpch
Step 3 - Configure the Prestissimo (C++) Worker
The Worker needs to know where the Coordinator/Discovery service is and how to identify itself within the network.
- Create Directory
mkdir -p worker-1/etc/catalog
- Create worker-1/etc/config.properties: This configuration points the worker to the discovery service.
# worker-1/etc/config.properties
discovery.uri=http://coordinator:8080
presto.version=0.288-15f14bb
http-server.http.port=7777
shutdown-onset-sec=1
runtime-metrics-collection-enabled=true
discovery.uri=http://coordinator:8080: Uses the coordinator service name (defined in docker-compose) for in-network communication.
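Because both images are pulled as latest while presto.version is pinned to one specific build, the two can drift apart. If a worker does not show up in the UI later, one thing worth checking is whether that value matches the version the coordinator reports. This is an assumption on my part, not something this setup guarantees. A minimal sketch, assuming the coordinator's /v1/info endpoint returns JSON with a nodeVersion.version field; extract_version is a hypothetical helper:

```shell
# Pull the "version" string out of the coordinator's /v1/info JSON on stdin.
# Plain sed is used so that no JSON tool needs to be installed.
extract_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Once the coordinator is running (Step 5), compare the two values:
#   curl -s http://localhost:8080/v1/info | extract_version
#   grep '^presto.version=' worker-1/etc/config.properties
```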
- Create worker-1/etc/node.properties: Defines the worker's internal address for network registration.
# worker-1/etc/node.properties
node.environment=test
node.internal-address=worker-1
node.location=docker
node.id=worker-1
node.internal-address=worker-1: Matches the service name for reliable registration.
- Add TPCH Catalog worker-1/etc/catalog/tpch.properties: The worker needs the catalog definition to execute the query stages.
# worker-1/etc/catalog/tpch.properties
connector.name=tpch
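Repeat Step 3 for each additional worker. The docker-compose.yml in Step 4 expects a second worker, so create worker-2/etc with the same files, replacing worker-1 with worker-2 in node.internal-address and node.id.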
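Repeating Step 3 by hand is mostly copy and paste, since node.properties is the only worker file that names the node. A minimal sketch that clones worker-1 into worker-N, assuming it is run from ~/presto-lab after worker-1/etc has been created; add_worker is a hypothetical helper name:

```shell
# Clone worker-1's config into worker-N, rewriting the node identity.
add_worker() {
  n="$1"
  src="worker-1/etc"
  dst="worker-$n/etc"
  mkdir -p "$dst/catalog"
  # These two files do not mention the worker name, so copy them unchanged.
  cp "$src/config.properties" "$dst/config.properties"
  cp "$src/catalog/tpch.properties" "$dst/catalog/tpch.properties"
  # node.properties carries node.id and node.internal-address: rewrite both.
  sed "s/worker-1/worker-$n/g" "$src/node.properties" > "$dst/node.properties"
}

# Example, from ~/presto-lab:
#   add_worker 2
```

Each new worker still needs its own service entry in docker-compose.yml.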
Step 4 - Create docker-compose.yml
- This file orchestrates both the Java Coordinator and the C++ Worker containers. Create the file docker-compose.yml in your ~/presto-lab directory:
# docker-compose.yml
services:
  coordinator:
    image: public.ecr.aws/oss-presto/presto:latest
    platform: linux/amd64
    container_name: presto-coordinator
    hostname: coordinator
    ports:
      - "8080:8080"
    volumes:
      - ./coordinator/etc:/opt/presto-server/etc:ro
    restart: unless-stopped
  worker-1:
    image: public.ecr.aws/oss-presto/presto-native:latest
    platform: linux/amd64
    container_name: prestissimo-worker-1
    hostname: worker-1
    depends_on:
      - coordinator
    volumes:
      - ./worker-1/etc:/opt/presto-server/etc:ro
    restart: unless-stopped
  worker-2:
    image: public.ecr.aws/oss-presto/presto-native:latest
    platform: linux/amd64
    container_name: prestissimo-worker-2
    hostname: worker-2
    depends_on:
      - coordinator
    volumes:
      - ./worker-2/etc:/opt/presto-server/etc:ro
    restart: unless-stopped
- coordinator uses the standard Java Presto image (presto:latest).
- worker-1 and worker-2 use the Prestissimo (C++ Native) image (presto-native:latest).
- platform: linux/amd64 pins the containers to the amd64 image. This matters on Apple Silicon Macs, where Docker would otherwise look for an arm64 variant.
- volumes mount your local configuration directories (etc) into the container's expected path (/opt/presto-server/etc).
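Since the volumes are mounted read-only from local directories, a missing file only shows up as a container error at startup. A quick pre-flight check can be sketched as follows; check_layout is a hypothetical helper, and the file list assumes the two-worker setup from the compose file above:

```shell
# Report any config file that docker-compose.yml expects but that is missing.
# Run from ~/presto-lab. Returns non-zero if something is absent.
check_layout() {
  missing=0
  for f in \
    docker-compose.yml \
    coordinator/etc/config.properties \
    coordinator/etc/jvm.config \
    coordinator/etc/node.properties \
    coordinator/etc/catalog/tpch.properties \
    worker-1/etc/config.properties \
    worker-1/etc/node.properties \
    worker-1/etc/catalog/tpch.properties \
    worker-2/etc/config.properties \
    worker-2/etc/node.properties \
    worker-2/etc/catalog/tpch.properties
  do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: validate the files, then let Compose validate the YAML itself.
#   check_layout && docker compose config --quiet
```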
Step 5 - Start the Cluster and Verify
docker compose up -d
Verify: Open the Presto Web UI at http://localhost:8080. You should see 3 active workers: the two Prestissimo workers plus the coordinator, which is counted because node-scheduler.include-coordinator=true.
- To check the detailed status and metadata of every node (coordinator and workers), run the query below.
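The coordinator can take a little while to come up after docker compose up -d returns. A minimal sketch of a readiness poll, assuming the coordinator's /v1/info endpoint reports a "starting" flag in its JSON and that curl is installed on the host; wait_for_presto is a hypothetical helper:

```shell
# Poll /v1/info until the coordinator reports that it has finished starting.
# Arguments: base URL, number of attempts, seconds between attempts.
wait_for_presto() {
  url="${1:-http://localhost:8080}"
  tries="${2:-30}"
  pause="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url/v1/info" 2>/dev/null \
      | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'; then
      echo "coordinator is up"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "coordinator did not become ready" >&2
  return 1
}

# Example:
#   wait_for_presto http://localhost:8080 30
# If it times out, the container state and logs are the next place to look:
#   docker compose ps
#   docker compose logs worker-1
```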
select * from system.runtime.nodes;
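One way to run the query without installing a client locally is to execute it inside the coordinator container. This assumes the coordinator image ships the presto-cli client on its PATH, which is worth checking if the command is not found. run_query is a hypothetical wrapper, and presto-coordinator is the container_name from docker-compose.yml.

```shell
# Run one SQL statement through presto-cli inside the coordinator container.
# Assumes presto-cli is available in the image (not verified here).
run_query() {
  docker exec presto-coordinator presto-cli --execute "$1"
}

# Examples:
#   run_query 'select * from system.runtime.nodes'
#   run_query 'select count(*) from tpch.tiny.nation'
```

The second example exercises the TPCH catalog, so its stages are executed by the Prestissimo workers rather than answered from coordinator metadata alone.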
Follow Presto on the official website, LinkedIn, and YouTube, and join the Slack channel to interact with the community.