DEV Community

Apache SeaTunnel

Apache SeaTunnel Performance Tuning: How to Set JVM Parameters the Right Way

Apache SeaTunnel is a high-performance distributed data integration platform, and tuning its JVM parameters properly is essential if you want better throughput, lower latency, and stable execution.

So how should you tune JVM parameters?
In this article, we’ll walk through where to configure them, how precedence works, the key parameters to focus on, and some practical tuning strategies.

1. Configuration File Locations

SeaTunnel manages JVM parameters through configuration files under $SEATUNNEL_HOME/config/. Depending on the deployment role, there are four main files:

File Name | Scope | Default Example
jvm_options | Hybrid mode (master_and_worker), where Master and Worker run in the same process | -Xms2g -Xmx2g -XX:+UseG1GC
jvm_master_options | Dedicated Master node, responsible for scheduling and state management (no computation) | -Xms2g -Xmx2g
jvm_worker_options | Dedicated Worker node, responsible for data reading, transformation, and writing (main memory consumer) | -Xms2g -Xmx2g
jvm_client_options | Client side (seatunnel.sh), used to parse configs and submit jobs | -Xms256m -Xmx512m

2. Parameter Precedence

Understanding parameter precedence is critical when troubleshooting.

SeaTunnel loads JVM parameters in the following order, and later ones override earlier ones (for example, the last -Xmx wins):

  1. Environment variable JAVA_OPTS
    Loaded first. You can define it in system env variables or in config/seatunnel-env.sh.

  2. Configuration files (config/jvm_*_options)
    Loaded next, and override anything set in JAVA_OPTS.

  3. Command-line parameters (-DJvmOption)
    Loaded last, with the highest priority.

Example:
If JAVA_OPTS="-Xmx4g", the config file sets -Xmx2g, and the startup command includes -DJvmOption="-Xmx8g", then the effective value will be 8g.
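
The "last one wins" rule can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not SeaTunnel's actual startup logic; the function name effective_heap_flags and the three sample strings are assumptions made for the example.

```python
import re

# Matches heap-size flags such as -Xmx4g or -Xms512m.
_HEAP_FLAG = re.compile(r"^-(Xmx|Xms)(\d+[kKmMgG]?)$")


def effective_heap_flags(*sources):
    """Return the last value seen for -Xms/-Xmx across ordered sources.

    Sources are passed in load order (earliest first), so a later
    occurrence of the same flag overwrites an earlier one.
    """
    effective = {}
    for source in sources:
        for token in source.split():
            match = _HEAP_FLAG.match(token)
            if match:
                effective[match.group(1)] = match.group(2)
    return effective


# Hypothetical inputs mirroring the example in the text.
java_opts = "-Xmx4g"      # environment variable JAVA_OPTS
config_file = "-Xmx2g"    # contents of a config/jvm_*_options file
command_line = "-Xmx8g"   # value passed via -DJvmOption

result = effective_heap_flags(java_opts, config_file, command_line)
print(result)
```

Passing the sources in load order keeps the sketch faithful to the rule: whichever source is read last supplies the value that survives.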

3. Key JVM Tuning Parameters

3.1 Heap Memory

Heap memory is the most important part of JVM tuning. It directly determines how much data SeaTunnel can process in parallel without running into OOM (Out Of Memory).

  • -Xms: Initial heap size
  • -Xmx: Maximum heap size

Best practices:

  • Worker nodes:
    It’s strongly recommended to set -Xms and -Xmx to the same value (for example, -Xms8g -Xmx8g).
    This avoids runtime heap resizing, reduces performance fluctuations, and helps prevent memory fragmentation.

  • Master nodes:
    Memory requirements are relatively low. In most cases, 2g–4g is sufficient. Increase it only if the cluster handles many jobs.

  • Client:
    The default 512m is usually enough. If your job configuration (SQL/JSON) is very large (tens of thousands of lines), consider increasing it to 1g or more.

3.2 Off-Heap Memory

Important note:
You may notice that the actual physical memory (RSS) used by SeaTunnel is significantly larger than the -Xmx value.

  • Why?
    SeaTunnel uses Netty for network communication, which relies heavily on off-heap (direct) memory for zero-copy data transfer.
    In addition, thread stacks (-Xss * number of threads), Metaspace, and JVM overhead also consume non-heap memory.

  • Risk:
    If the machine runs out of physical memory, the Linux OOM Killer may terminate the process (usually a Worker).

Recommendations:

  • Reserve memory for the OS:
    On an 8GB machine, keep -Xmx below 5g, leaving around 3GB for off-heap memory and the operating system.

  • Docker/Kubernetes:
    The container memory limit must be larger than -Xmx plus estimated off-heap usage.
    A common rule is to set it to about 1.5× -Xmx.
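
To make the heap versus total-footprint distinction concrete, here is a rough Python estimate of process memory. Every figure in it (thread count, direct memory, JVM overhead, OS reserve) is an assumed placeholder rather than a measured SeaTunnel value; only the 1024 KB default thread stack reflects the usual -Xss default on 64-bit Linux.

```python
def estimate_process_memory_mb(heap_mb, metaspace_mb, thread_count,
                               thread_stack_kb=1024, direct_mb=0,
                               jvm_overhead_mb=256):
    """Rough upper bound on resident memory: heap plus the non-heap areas."""
    thread_stacks_mb = thread_count * thread_stack_kb / 1024
    return (heap_mb + metaspace_mb + thread_stacks_mb
            + direct_mb + jvm_overhead_mb)


def fits_on_host(process_mb, host_mb, os_reserve_mb=1024):
    """True if the estimated footprint plus an OS reserve fits on the host."""
    return process_mb + os_reserve_mb <= host_mb


# Assumed example: 5g heap on an 8GB machine, 200 threads, 1GB direct memory.
estimate = estimate_process_memory_mb(
    heap_mb=5120, metaspace_mb=256, thread_count=200, direct_mb=1024)
print(estimate, fits_on_host(estimate, host_mb=8192))
```

Replace the assumed inputs with values observed on your own Workers before drawing conclusions from the result.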

3.3 Garbage Collector

G1GC is the recommended collector for SeaTunnel’s Zeta engine because it provides more predictable pause times for large heaps.

  • -XX:+UseG1GC: Enable G1 GC (already the JVM default on JDK 9 and later; on Java 8 it must be set explicitly)

  • -XX:MaxGCPauseMillis=200: Target maximum GC pause time (in milliseconds)

    • Real-time workloads: If latency is critical, you can lower this value (e.g., 100). Keep in mind this may increase GC frequency and slightly reduce overall throughput.
    • Batch workloads: The default 200ms is usually a good balance.
  • -XX:InitiatingHeapOccupancyPercent=45:
    Heap occupancy threshold that triggers concurrent GC.
    If you observe frequent Full GC, try lowering it (e.g., 40) so GC starts earlier.

3.4 Metaspace

Metaspace stores class metadata. SeaTunnel consumes metaspace when loading connectors.

  • -XX:MaxMetaspaceSize: Maximum metaspace size

The default (2g) is usually sufficient. Note that the JVM itself leaves metaspace uncapped when this flag is not set.
If you encounter java.lang.OutOfMemoryError: Metaspace, increase it accordingly.

3.5 Troubleshooting

When OOM happens, heap dumps are extremely helpful for diagnosis.

  • -XX:+HeapDumpOnOutOfMemoryError: Generate a heap dump automatically on OOM
  • -XX:HeapDumpPath=/tmp/seatunnel/dump/: Path to store dump files

Notes:

  • Make sure the disk has enough free space (at least as much as -Xmx, since a dump can be as large as the heap)
  • In container environments, ensure the path is mounted to the host; otherwise, dumps will be lost after restart
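
The disk-space note can be turned into a quick pre-flight check. The sketch below is a hypothetical helper, not part of SeaTunnel: it converts a JVM size string into bytes and compares it with the free space reported by Python's shutil.disk_usage for an existing directory.

```python
import shutil

# Multipliers for the size suffixes the JVM accepts on -Xms/-Xmx.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(value):
    """Convert a JVM size string such as '8g' or '2560m' to bytes."""
    suffix = value[-1].lower()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # no suffix means the value is already in bytes


def has_room_for_heap_dump(dump_dir, xmx):
    """True if dump_dir (which must exist) has at least xmx bytes free."""
    free_bytes = shutil.disk_usage(dump_dir).free
    return free_bytes >= parse_jvm_size(xmx)
```

For example, has_room_for_heap_dump("/tmp/seatunnel/dump/", "8g") could be run before starting a Worker, assuming that directory has already been created.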

4. JDK Compatibility

  • Recommended versions: Java 8 (JDK 1.8) or Java 11
    These are the most thoroughly tested versions.

  • Java 17+:
    Generally supported, but because the module system (introduced in Java 9) is strongly enforced from Java 17 onward, you may encounter InaccessibleObjectException caused by restricted reflection access.

Solution:
If this happens, add --add-opens options in jvm_options, for example:

--add-opens java.base/java.lang=ALL-UNNAMED
--add-opens java.base/java.util=ALL-UNNAMED

5. Production Tuning Scenarios

Scenario 1: Large-Scale Batch Processing

  • Characteristics: Large data volume (TB scale), throughput is the priority

Worker recommendation:

-Xms8g -Xmx8g
-XX:+UseG1GC
-XX:ParallelGCThreads=8

Notes:

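  • -XX:ParallelGCThreads=8 sets the number of GC worker threads used during stop-the-world phases; keep it at or below the number of CPU cores available to the Worker
  • If the source reads faster than the sink can write, records accumulate in memory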
  • Besides increasing heap size, consider:

    • Limiting read_limit.rows_per_second
    • Adjusting parallelism

Scenario 2: Real-Time CDC Synchronization

  • Characteristics: Long-running jobs, latency-sensitive, relatively stable memory usage

Worker recommendation:

-Xms4g -Xmx4g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=100

Notes:

  • Checkpoint frequency also affects memory usage (state backend caching)
  • If memory pressure is high, consider increasing checkpoint.interval

Scenario 3: Low-Memory Deployment (e.g., 4GB)

  • Risk: With so little headroom, an oversized heap makes it likely that the Linux OOM Killer terminates the Worker

Worker recommendation:

-Xmx2560m

  • Allocate about 2.5GB to heap
  • Leave the remaining 1.5GB for:

    • Off-heap memory (Netty)
    • OS
    • Other processes

6. How to Verify Your Configuration

After starting SeaTunnel, run:

jps -v | grep SeaTunnel

Example output:

12345 SeaTunnelServer ... -Xms8g -Xmx8g -XX:+UseG1GC ...

Make sure your parameters (e.g., -Xmx8g) appear in the output and are not followed by a later occurrence of the same flag, since the last one wins.

7. Docker / Kubernetes-Specific Configuration

7.1 Recommended Approach: Container-Aware Memory

In Kubernetes, memory is typically controlled via resources.limits.memory.
Instead of hardcoding -Xmx, it’s better to use percentage-based settings so the JVM can adapt automatically.

Example:

env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=70.0 -XshowSettings:vm"

Explanation:

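  • -XX:+UseContainerSupport: Lets the JVM detect container memory and CPU limits (already on by default in JDK 10+ and 8u191+)
  • -XX:MaxRAMPercentage=70.0: Caps the maximum heap at 70% of the container memory limit
  • -XshowSettings:vm: Prints the resolved VM settings (including the estimated maximum heap) at startup, so the result can be checked in the pod logs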

Why 70%?
The remaining 30% is needed for:

  • Direct memory (Netty)
  • Metaspace
  • Thread stacks
  • JVM overhead

7.2 Resource Limits

Make sure Kubernetes resource settings align with JVM needs.

Example: target an 8GB heap

  • JVM: -XX:MaxRAMPercentage=70.0
  • K8s limit: 8 / 0.7 ≈ 11.4GB, rounded up to 12Gi

resources:
  requests:
    memory: "12Gi"
    cpu: "4"
  limits:
    memory: "12Gi"
    cpu: "4"
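
The sizing arithmetic above can be written as a small Python helper. The function names are made up for illustration; the calculation simply divides the target heap by the MaxRAMPercentage share and rounds up to a whole GiB.

```python
import math


def container_limit_gi(target_heap_gi, max_ram_percentage):
    """Smallest whole-GiB limit whose heap share reaches the target heap."""
    return math.ceil(target_heap_gi * 100 / max_ram_percentage)


def max_heap_gi(limit_gi, max_ram_percentage):
    """Maximum heap the JVM would derive from a given container limit."""
    return limit_gi * max_ram_percentage / 100


limit = container_limit_gi(target_heap_gi=8, max_ram_percentage=70.0)
print(f"limits.memory: {limit}Gi")
print(f"resulting max heap: {max_heap_gi(limit, 70.0):.1f}Gi")
```

Because the limit is rounded up, the resulting heap is slightly larger than the 8GB target, which is usually the safer direction.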

7.3 Overriding Default Config

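If the default config files already define memory settings, they override JAVA_OPTS. In addition, an explicit -Xmx takes precedence over -XX:MaxRAMPercentage regardless of order, so percentage-based sizing only takes effect once -Xmx is removed from the config files.

To ensure your settings take effect:

  1. Use command-line parameters (highest priority):

args: ["-DJvmOption=-XX:MaxRAMPercentage=70.0"]

  2. Mount custom config files (without a hardcoded -Xmx) via ConfigMap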

7.4 Common Pitfalls

  • ❌ Setting limits.memory = 4Gi and -Xmx4g
    → No headroom is left for non-heap memory, so the container is likely to be OOM-killed under load

  • ❌ Not setting requests
    → Pod may be scheduled on a node without enough memory

Code References

  • jvm_options
  • seatunnel-cluster.sh
  • values.yaml
