Apache SeaTunnel

Posted on Mar 27

Apache SeaTunnel Performance Tuning: How to Set JVM Parameters the Right Way

#apacheseatunnel #opensource #jvm #ai

Apache SeaTunnel is a high-performance distributed data integration platform, and tuning its JVM parameters properly is essential if you want better throughput, lower latency, and stable execution.

So how should you tune JVM parameters?
In this article, we’ll walk through where to configure them, how precedence works, the key parameters to focus on, and some practical tuning strategies.

1. Configuration File Locations

SeaTunnel manages JVM parameters through configuration files under $SEATUNNEL_HOME/config/. Depending on the deployment role, there are four main files:

| File Name | Scope | Default Example |
| --- | --- | --- |
| `jvm_options` | Hybrid mode (`master_and_worker`), where Master and Worker run in the same process | `-Xms2g -Xmx2g -XX:+UseG1GC` |
| `jvm_master_options` | Dedicated Master node, responsible for scheduling and state management (no computation) | `-Xms2g -Xmx2g` |
| `jvm_worker_options` | Dedicated Worker node, responsible for data reading, transformation, and writing (main memory consumer) | `-Xms2g -Xmx2g` |
| `jvm_client_options` | Client side (`seatunnel.sh`), used to parse configs and submit jobs | `-Xms256m -Xmx512m` |

2. Parameter Precedence

Understanding parameter precedence is critical when troubleshooting.

SeaTunnel loads JVM parameters from three sources in the order below. When the same flag appears more than once, the last occurrence wins (for example, the last -Xmx):

1. Environment variable JAVA_OPTS: loaded first. You can define it in the system environment or in config/seatunnel-env.sh.
2. Configuration files (config/jvm_*_options): loaded next; they override anything set in JAVA_OPTS.
3. Command-line parameters (-DJvmOption): loaded last, with the highest priority.

Example:
If JAVA_OPTS="-Xmx4g", the config file sets -Xmx2g, and the startup command includes -DJvmOption="-Xmx8g", then the effective value will be 8g.
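
The "last one wins" rule can be sketched in a few lines of shell. The helper name `effective_xmx` is hypothetical and the option string is simply the example above concatenated in load order; this illustrates the rule and is not how SeaTunnel's startup scripts are written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the last -Xmx flag in an option string,
# mirroring the JVM rule that the last occurrence of a repeated flag wins.
effective_xmx() {
  local last="" opt
  for opt in $1; do            # intentional word splitting on spaces
    case "$opt" in
      -Xmx*) last="${opt#-Xmx}" ;;
    esac
  done
  printf '%s\n' "$last"
}

# Options in load order: JAVA_OPTS, then config file, then -DJvmOption
effective_xmx "-Xmx4g -Xmx2g -Xmx8g"   # prints 8g
```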

3. Key JVM Tuning Parameters

3.1 Heap Memory

Heap memory is the most important part of JVM tuning. It directly determines how much data SeaTunnel can process in parallel without running into OOM (Out Of Memory).

-Xms: Initial heap size
-Xmx: Maximum heap size

Best practices:

Worker nodes:
It’s strongly recommended to set -Xms and -Xmx to the same value (for example, -Xms8g -Xmx8g).
This avoids runtime heap resizing, reduces performance fluctuations, and helps prevent memory fragmentation.
Master nodes:
Memory requirements are relatively low. In most cases, 2g–4g is sufficient. Increase it only if the cluster handles many jobs.
Client:
The default 512m is usually enough. If your job configuration (SQL/JSON) is very large (tens of thousands of lines), consider increasing it to 1g or more.

3.2 Off-Heap Memory

Important note:
You may notice that the actual physical memory (RSS) used by SeaTunnel is significantly larger than the -Xmx value.

Why?
SeaTunnel uses Netty for network communication, which relies heavily on off-heap (direct) memory for zero-copy data transfer.
In addition, thread stacks (-Xss * number of threads), Metaspace, and JVM overhead also consume non-heap memory.
Risk:
If the machine runs out of physical memory, the Linux OOM Killer may terminate the process (usually a Worker).
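
To see why RSS exceeds -Xmx, it can help to add up the pieces. The sketch below is a rough back-of-the-envelope estimate only: the function name is hypothetical, and the thread count, Metaspace, and direct memory figures are illustrative assumptions, not measured SeaTunnel values. The 1024 KB stack size is the usual default on 64-bit Linux.

```shell
#!/usr/bin/env bash
# estimate_rss_mb HEAP_MB THREADS STACK_KB METASPACE_MB DIRECT_MB
# Rough estimate: heap + thread stacks + Metaspace + direct memory.
# It ignores GC structures, code cache, and other JVM overhead.
estimate_rss_mb() {
  echo $(( $1 + ($2 * $3) / 1024 + $4 + $5 ))
}

# 8 GB heap, 200 threads with 1 MB stacks, 256 MB Metaspace, 1 GB direct memory
estimate_rss_mb 8192 200 1024 256 1024   # prints 9672
```

Even with these modest assumptions the process needs roughly 1.5 GB beyond the heap, which is the gap the recommendations below are meant to cover.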

Recommendations:

Reserve memory for the OS:
On an 8GB machine, keep -Xmx below 5g, leaving around 3GB for off-heap memory and the operating system.
Docker/Kubernetes:
The container memory limit must be larger than -Xmx plus estimated off-heap usage.
A common rule is to set it to about 1.5× -Xmx.

3.3 Garbage Collector

G1GC is the recommended collector for SeaTunnel’s Zeta engine because it provides more predictable pause times on large heaps.

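-XX:+UseG1GC: Enable G1 GC (already present in the default jvm_options). G1 is the JVM's own default only since JDK 9; on Java 8 the default is the Parallel collector, so keep this flag explicit.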
-XX:MaxGCPauseMillis=200: Target maximum GC pause time (in milliseconds)
- Real-time workloads: If latency is critical, you can lower this value (e.g., 100). Keep in mind this may increase GC frequency and slightly reduce overall throughput.
- Batch workloads: The default 200ms is usually a good balance.
-XX:InitiatingHeapOccupancyPercent=45:
Heap occupancy threshold (as a percentage) at which G1 starts a concurrent marking cycle. 45 is the JVM default.
If you observe frequent Full GC, try lowering it (e.g., 40) so that marking starts earlier.
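
To observe Full GC frequency you need a GC log. On JDK 9+ one can be enabled with -Xlog:gc*:file=<path>, and on Java 8 with -XX:+PrintGCDetails -Xloggc:<path>. The sketch below counts Full GC events in such a file. The function name is hypothetical, and the match patterns assume the usual log wording ("Pause Full" in unified logging, "Full GC" on Java 8), so check them against your own log.

```shell
#!/usr/bin/env bash
# count_full_gc LOGFILE
# Counts lines that look like Full GC events.
# 'Pause Full' is assumed for JDK 9+ unified logging, 'Full GC' for Java 8 logs.
count_full_gc() {
  grep -cE 'Pause Full|Full GC' "$1" || true   # grep exits 1 when the count is 0
}

# Usage (the path is a placeholder):
#   count_full_gc /path/to/gc.log
```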

3.4 Metaspace

Metaspace stores class metadata. SeaTunnel consumes metaspace when loading connectors.

-XX:MaxMetaspaceSize: Maximum metaspace size

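The default in SeaTunnel's configuration files (2g) is usually sufficient. Without this flag, the JVM itself places no upper limit on Metaspace.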
If you encounter java.lang.OutOfMemoryError: Metaspace, increase it accordingly.

3.5 Troubleshooting

When OOM happens, heap dumps are extremely helpful for diagnosis.

-XX:+HeapDumpOnOutOfMemoryError: Generate a heap dump automatically on OOM
-XX:HeapDumpPath=/tmp/seatunnel/dump/: Path to store dump files

Notes:

Make sure the disk has enough free space: a heap dump can be as large as the heap, so allow at least the value of -Xmx.
In container environments, make sure the path is mounted to the host or a persistent volume; otherwise, dumps are lost when the container restarts.
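
The disk-space note can be checked before a job starts. This is a minimal sketch: `has_space_for_dump` is a hypothetical helper, and it assumes a POSIX `df` whose `-P` output has the available space in the fourth column.

```shell
#!/usr/bin/env bash
# has_space_for_dump DIR HEAP_MB
# Succeeds if DIR's filesystem has at least HEAP_MB megabytes available.
has_space_for_dump() {
  local avail_kb
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# Check the current directory against an 8 GB heap
if has_space_for_dump . 8192; then
  echo "enough space for a heap dump"
else
  echo "not enough space for a heap dump"
fi
```

In practice you would pass the directory from -XX:HeapDumpPath and the heap size from -Xmx.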

4. JDK Compatibility

Recommended versions: Java 8 (JDK 1.8) or Java 11
These are the most thoroughly tested versions.
Java 17+:
Generally supported. However, the module system introduced in Java 9 strongly encapsulates JDK internals, and this is enforced by default since Java 16, so you may encounter InaccessibleObjectException caused by restricted reflection access.

Solution:
If this happens, add --add-opens options in jvm_options, for example:

--add-opens java.base/java.lang=ALL-UNNAMED
--add-opens java.base/java.util=ALL-UNNAMED

5. Production Tuning Scenarios

Scenario 1: Large-Scale Batch Processing

Characteristics: Large data volume (TB scale), throughput is the priority

Worker recommendation:

-Xms8g -Xmx8g
-XX:+UseG1GC
-XX:ParallelGCThreads=8

Notes:

If the source reads data faster than the sink can write it, records accumulate in memory
Besides increasing heap size, consider:
- Throttling the source with read_limit.rows_per_second
- Lowering parallelism
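
Both settings go in the env block of the job config. A minimal sketch follows; the numbers are illustrative assumptions, not recommendations, and should be tuned against your sink's write rate.

```hocon
env {
  job.mode = "BATCH"
  # Fewer parallel readers means fewer in-flight buffers per Worker
  parallelism = 4
  # Throttle the source; 50000 is an illustrative value
  read_limit.rows_per_second = 50000
}
```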

Scenario 2: Real-Time CDC Synchronization

Characteristics: Long-running jobs, latency-sensitive, relatively stable memory usage

Worker recommendation:

-Xms4g -Xmx4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100

Notes:

Checkpoint frequency also affects memory usage (state backend caching)
If memory pressure is high, consider increasing checkpoint.interval
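
The interval is set in the env block of the job config, in milliseconds. A minimal sketch, where 30000 is an illustrative assumption:

```hocon
env {
  job.mode = "STREAMING"
  # Checkpoint interval in milliseconds; 30000 is an illustrative value
  checkpoint.interval = 30000
}
```

A longer interval reduces checkpoint overhead, but more data has to be reprocessed after a failure, so treat this as a trade-off rather than a free win.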

Scenario 3: Low-Memory Deployment (e.g., 4GB)

Risk: High chance of being killed by the OS

Worker recommendation:

-Xmx2560m

Allocate about 2.5GB to heap
Leave the remaining 1.5GB for:
- Off-heap memory (Netty)
- OS
- Other processes

6. How to Verify Your Configuration

After starting SeaTunnel, run:

jps -v | grep SeaTunnel

Example output:

12345 SeaTunnelServer ... -Xms8g -Xmx8g -XX:+UseG1GC ...

If a flag appears more than once in the output, the last occurrence is the one that takes effect, so check that your intended value (e.g., -Xmx8g) is not followed by a later, conflicting one.
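
The command line shows what was requested. To see what the JVM actually resolved, you can also query the running process with `jinfo -flag MaxHeapSize <pid>`, which prints a line such as `-XX:MaxHeapSize=8589934592`. The sketch below converts that line to MiB; the helper name is hypothetical and the PID in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# max_heap_mib LINE
# Converts a line such as "-XX:MaxHeapSize=8589934592" (bytes) to MiB.
max_heap_mib() {
  local bytes="${1#*MaxHeapSize=}"
  echo $(( bytes / 1024 / 1024 ))
}

max_heap_mib "-XX:MaxHeapSize=8589934592"   # prints 8192

# Against a live process (12345 is a placeholder PID):
#   max_heap_mib "$(jinfo -flag MaxHeapSize 12345)"
```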

7. Docker / Kubernetes-Specific Configuration

7.1 Recommended Approach: Container-Aware Memory

In Kubernetes, memory is typically controlled via resources.limits.memory.
Instead of hardcoding -Xmx, it’s better to use percentage-based settings so the JVM can adapt automatically.

Example:

env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0 -XshowSettings:vm"

Explanation:

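-XX:+UseContainerSupport: Lets the JVM detect container (cgroup) limits; enabled by default on JDK 10+ and Java 8u191+
-XX:MaxRAMPercentage=70.0: Caps the maximum heap at 70% of container memory; ignored if -Xmx is also set
-XshowSettings:vm: Prints the resolved VM settings (including the maximum heap size) at startup, so you can check the result in the pod logs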

Why 70%?
The remaining 30% is needed for:

Direct memory (Netty)
Metaspace
Thread stacks
JVM overhead

7.2 Resource Limits

Make sure Kubernetes resource settings align with JVM needs.

Example: Want 8GB heap

JVM: 70%
K8s limit: 8 / 0.7 ≈ 11.4GB → set to 12Gi

resources:
  requests:
    memory: "12Gi"
    cpu: "4"
  limits:
    memory: "12Gi"
    cpu: "4"
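
The same arithmetic works for any target heap and percentage. A minimal sketch using integer ceiling division follows; `container_limit_gi` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# container_limit_gi HEAP_GIB PERCENT
# Smallest whole-Gi limit such that PERCENT% of it is at least HEAP_GIB.
container_limit_gi() {
  local heap_gib=$1 pct=$2
  echo $(( (heap_gib * 100 + pct - 1) / pct ))   # ceiling of heap / (pct / 100)
}

container_limit_gi 8 70   # prints 12
```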

7.3 Overriding Default Config

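If the default config files already define memory settings, they override JAVA_OPTS, because config files are loaded after it. Note also that the JVM ignores -XX:MaxRAMPercentage whenever an explicit -Xmx is present, regardless of order.

To ensure your settings take effect:

Use command-line parameters (highest priority) for flags that should win on ordering:

args: ["-DJvmOption=-XX:MaxRAMPercentage=70.0"]

Mount custom config files via ConfigMap, with -Xms/-Xmx removed if you rely on percentage-based sizing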

7.4 Common Pitfalls

❌ Setting limits.memory = 4Gi and -Xmx4g
→ No space left for non-heap memory → process will be killed
❌ Not setting requests
→ Pod may be scheduled on a node without enough memory

Code References

jvm_options
seatunnel-cluster.sh
values.yaml
