Apache SeaTunnel is a high-performance distributed data integration platform, and properly tuning its JVM parameters is essential for better throughput, lower latency, and stable execution.
So how should you tune JVM parameters?
In this article, we’ll walk through where to configure them, how precedence works, the key parameters to focus on, and some practical tuning strategies.
1. Configuration File Locations
SeaTunnel manages JVM parameters through configuration files under $SEATUNNEL_HOME/config/. Depending on the deployment role, there are four main files:
| File Name | Scope | Default Example |
|---|---|---|
| `jvm_options` | Hybrid mode (`master_and_worker`), where Master and Worker run in the same process | `-Xms2g -Xmx2g -XX:+UseG1GC` |
| `jvm_master_options` | Dedicated Master node, responsible for scheduling and state management (no computation) | `-Xms2g -Xmx2g` |
| `jvm_worker_options` | Dedicated Worker node, responsible for data reading, transformation, and writing (the main memory consumer) | `-Xms2g -Xmx2g` |
| `jvm_client_options` | Client side (`seatunnel.sh`), used to parse configs and submit jobs | `-Xms256m -Xmx512m` |
2. Parameter Precedence
Understanding parameter precedence is critical when troubleshooting.
SeaTunnel loads JVM parameters in the following order, and later ones override earlier ones (for example, the last -Xmx wins):
1. Environment variable `JAVA_OPTS`: loaded first. You can define it in system environment variables or in `config/seatunnel-env.sh`.
2. Configuration files (`config/jvm_*_options`): loaded next, and override anything set in `JAVA_OPTS`.
3. Command-line parameters (`-DJvmOption`): loaded last, with the highest priority.
Example:
If JAVA_OPTS="-Xmx4g", the config file sets -Xmx2g, and the startup command includes -DJvmOption="-Xmx8g", then the effective value will be 8g.
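The last-wins merging can be sketched in a few lines. This is an illustrative simulation of the precedence rules, not SeaTunnel's actual loading code:

```python
def effective_opts(env_opts: str, file_opts: str, cli_opts: str) -> dict:
    """Simulate JVM option precedence: env var, then config file, then CLI.

    For flags like -Xmx the key is the flag prefix, so a value from a later
    (higher-priority) source replaces an earlier one. Parsing is simplified.
    """
    merged = {}
    for source in (env_opts, file_opts, cli_opts):  # lowest to highest priority
        for opt in source.split():
            # -Xmx4g -> key "-Xmx"; -XX:Foo=1 -> key "-XX:Foo"
            key = opt[:4] if opt.startswith(("-Xmx", "-Xms", "-Xss")) else opt.split("=")[0]
            merged[key] = opt
    return merged

# Mirrors the article's example: the command-line -Xmx8g wins.
print(effective_opts("-Xmx4g", "-Xmx2g", "-Xmx8g")["-Xmx"])  # -Xmx8g
```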
3. Key JVM Tuning Parameters
3.1 Heap Memory
Heap memory is the most important part of JVM tuning. It directly determines how much data SeaTunnel can process in parallel without running into OOM (Out Of Memory).
- `-Xms`: Initial heap size
- `-Xmx`: Maximum heap size
Best practices:
- Worker nodes: it's strongly recommended to set `-Xms` and `-Xmx` to the same value (for example, `-Xms8g -Xmx8g`). This avoids runtime heap resizing, reduces performance fluctuations, and helps prevent memory fragmentation.
- Master nodes: memory requirements are relatively low. In most cases, `2g`–`4g` is sufficient. Increase it only if the cluster handles many jobs.
- Client: the default `512m` is usually enough. If your job configuration (SQL/JSON) is very large (tens of thousands of lines), consider increasing it to `1g` or more.
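Putting these recommendations together, a `jvm_worker_options` for a dedicated Worker with an 8GB heap might look like this (values are illustrative for this sizing, not shipped defaults):

```
# Heap: identical -Xms and -Xmx to avoid runtime resizing
-Xms8g
-Xmx8g
# Garbage collector (see section 3.3)
-XX:+UseG1GC
```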
3.2 Off-Heap Memory
Important note:
You may notice that the actual physical memory (RSS) used by SeaTunnel is significantly larger than the `-Xmx` value.
Why? SeaTunnel uses Netty for network communication, which relies heavily on off-heap (direct) memory for zero-copy data transfer. In addition, thread stacks (`-Xss` × number of threads), Metaspace, and JVM overhead also consume non-heap memory.
Risk: if the machine runs out of physical memory, the Linux OOM Killer may terminate the process (usually a Worker).
Recommendations:
- Reserve memory for the OS: on an 8GB machine, keep `-Xmx` below `5g`, leaving around 3GB for off-heap memory and the operating system.
- Docker/Kubernetes: the container memory limit must be larger than `-Xmx` plus estimated off-heap usage. A common rule is to set it to about 1.5× `-Xmx`.
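The 1.5× rule of thumb can be expressed as a quick sizing check. The helper below is just this article's guidance as arithmetic, not a SeaTunnel API:

```python
def container_limit_gib(xmx_gib: float, headroom_factor: float = 1.5) -> float:
    """Rule-of-thumb container memory limit: heap plus off-heap headroom."""
    return xmx_gib * headroom_factor

# A Worker with -Xmx8g suggests roughly a 12 GiB container limit.
print(container_limit_gib(8))  # 12.0
```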
3.3 Garbage Collector
SeaTunnel’s Zeta engine recommends using G1GC, which provides more predictable pause times for large heaps.
- `-XX:+UseG1GC`: Enable G1 GC (enabled by default)
- `-XX:MaxGCPauseMillis=200`: Target maximum GC pause time (in milliseconds)
  - Real-time workloads: if latency is critical, you can lower this value (e.g., `100`). Keep in mind this may increase GC frequency and slightly reduce overall throughput.
  - Batch workloads: the default `200ms` is usually a good balance.
- `-XX:InitiatingHeapOccupancyPercent=45`: Heap occupancy threshold that triggers concurrent GC. If you observe frequent Full GC, try lowering it (e.g., `40`) so GC starts earlier.
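For a latency-sensitive Worker, the options above might be combined in `jvm_worker_options` like this (illustrative values, per the guidance above):

```
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
-XX:InitiatingHeapOccupancyPercent=40
```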
3.4 Metaspace
Metaspace stores class metadata. SeaTunnel consumes metaspace when loading connectors.
- `-XX:MaxMetaspaceSize`: Maximum metaspace size. The default (`2g`) is usually sufficient. If you encounter `java.lang.OutOfMemoryError: Metaspace`, increase it accordingly.
3.5 Troubleshooting
When OOM happens, heap dumps are extremely helpful for diagnosis.
- `-XX:+HeapDumpOnOutOfMemoryError`: Generate a heap dump automatically on OOM
- `-XX:HeapDumpPath=/tmp/seatunnel/dump/`: Path to store dump files

Notes:
- Make sure the disk has enough space (at least larger than `-Xmx`)
- In container environments, ensure the path is mounted to the host; otherwise, dumps will be lost after restart
4. JDK Compatibility
- Java 8 (JDK 1.8) or Java 11: recommended; these are the most thoroughly tested versions.
- Java 17+: generally supported, but due to the module system introduced in Java 9+, you may encounter `InaccessibleObjectException` caused by restricted reflection access.

Solution: if this happens, add `--add-opens` options in `jvm_options`, for example:
--add-opens java.base/java.lang=ALL-UNNAMED
--add-opens java.base/java.util=ALL-UNNAMED
5. Production Tuning Scenarios
Scenario 1: Large-Scale Batch Processing
- Characteristics: Large data volume (TB scale), throughput is the priority
Worker recommendation:
-Xms8g -Xmx8g
-XX:+UseG1GC
-XX:ParallelGCThreads=8
Notes:
- If the source reads data too quickly, memory may build up
- Besides increasing heap size, consider:
  - Limiting `read_limit.rows_per_second`
  - Adjusting `parallelism`
Scenario 2: Real-Time CDC Synchronization
- Characteristics: Long-running jobs, latency-sensitive, relatively stable memory usage
Worker recommendation:
-Xms4g -Xmx4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100
Notes:
- Checkpoint frequency also affects memory usage (state backend caching)
- If memory pressure is high, consider increasing `checkpoint.interval`
Scenario 3: Low-Memory Deployment (e.g., 4GB)
- Risk: High chance of being killed by the OS
Worker recommendation:
-Xmx2560m
- Allocate about 2.5GB to heap
- Leave the remaining 1.5GB for:
  - Off-heap memory (Netty)
  - OS
  - Other processes
6. How to Verify Your Configuration
After starting SeaTunnel, run:
jps -v | grep SeaTunnel
Example output:
12345 SeaTunnelServer ... -Xms8g -Xmx8g -XX:+UseG1GC ...
Make sure your parameters (e.g., -Xmx8g) appear at the end of the list (or are not overridden by later ones).
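To double-check which `-Xmx` actually took effect, you can scan the `jps -v` line for its last occurrence, since later flags win. This small helper is assumed for illustration; in practice a quick visual check is usually enough:

```python
import re

def effective_xmx(jps_line: str) -> str:
    """Return the last -Xmx flag on a `jps -v` output line (later flags win)."""
    matches = re.findall(r"-Xmx\S+", jps_line)
    return matches[-1] if matches else ""

print(effective_xmx("12345 SeaTunnelServer -Xms8g -Xmx2g -Xmx8g -XX:+UseG1GC"))  # -Xmx8g
```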
7. Docker / Kubernetes-Specific Configuration
7.1 Recommended Approach: Container-Aware Memory
In Kubernetes, memory is typically controlled via resources.limits.memory.
Instead of hardcoding -Xmx, it’s better to use percentage-based settings so the JVM can adapt automatically.
Example:
env:
- name: JAVA_OPTS
value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0 -XshowSettings:vm"
Explanation:
- `-XX:+UseContainerSupport`: Allows the JVM to detect container limits
- `-XX:MaxRAMPercentage=70.0`: Sets heap to 70% of container memory
Why 70%?
The remaining 30% is needed for:
- Direct memory (Netty)
- Metaspace
- Thread stacks
- JVM overhead
7.2 Resource Limits
Make sure Kubernetes resource settings align with JVM needs.
Example: you want an 8GB heap
- JVM heap fraction: 70%
- K8s limit: 8 / 0.7 ≈ 11.5GB, so set it to `12Gi`
resources:
requests:
memory: "12Gi"
cpu: "4"
limits:
memory: "12Gi"
cpu: "4"
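The sizing arithmetic above generalizes to any target heap; a sketch, rounding up to whole GiB the same way the article rounds 11.5 up to `12Gi`:

```python
import math

def k8s_memory_limit_gib(heap_gib: float, max_ram_percentage: float = 70.0) -> int:
    """Container memory limit needed so MaxRAMPercentage yields the target heap."""
    return math.ceil(heap_gib / (max_ram_percentage / 100.0))

# An 8 GiB heap at MaxRAMPercentage=70 needs about a 12Gi limit.
print(k8s_memory_limit_gib(8))  # 12
```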
7.3 Overriding Default Config
If default config files already define memory settings, they may override JAVA_OPTS.
To ensure your settings take effect:
- Use command-line parameters (highest priority):
args: ["-DJvmOption=-XX:MaxRAMPercentage=70.0"]
- Mount custom config files via ConfigMap
7.4 Common Pitfalls
- ❌ Setting `limits.memory = 4Gi` and `-Xmx4g`: no space is left for non-heap memory, so the process will be killed
- ❌ Not setting `requests`: the Pod may be scheduled on a node without enough memory
Code References
- `jvm_options`
- `seatunnel-cluster.sh`
- `values.yaml`