"I’ll write about the DolphinScheduler integration when I have time; I owe too much content already." Well, the project is about to be deployed, so it’s time to settle the "debt".
1. Why Integrate with DolphinScheduler?
We’ve already verified that SeaTunnel’s Local mode works fine for ETL tasks. However, in a production environment, we need:
- Scheduled Dispatching: Automatic execution of data sync tasks daily or hourly.
- Task Dependencies: Triggering downstream tasks only after upstream data is ready.
- Alarm Notifications: Sending alerts when tasks fail (a dedicated on-call role is still uncommon in smaller cities, so usually we just wait for things to explode).
- O&M Management: Visualizing task status and historical execution records.
Honestly, I’m mostly just too lazy to use the command line. Executing tasks via a Web UI is much easier, and checking logs is convenient. If it’s a bit slower, that’s just more time for a water break.
DolphinScheduler and SeaTunnel are natively integrated, supporting SeaTunnel job configuration directly via the Web UI to meet all the above needs.
2. Deployment Environment
| Component | Version | Description |
|---|---|---|
| DolphinScheduler | 3.1.7+ | Scheduling Platform |
| SeaTunnel | 2.3.8+ / 2.3.12 | Data Sync Engine |
| Zeta Engine | Built-in | SeaTunnel Execution Engine |
Architecture Logic: DS handles scheduling and workflow orchestration; SeaTunnel handles the actual data reading and writing.
3. Integration Methods
3.1 Method 1: Calling SeaTunnel CLI via Shell Node
This is the most direct way, and the "Shell approach" fits most scenarios.
Steps:
- Install the SeaTunnel client on the DolphinScheduler runtime node (API service not required).
- Call the seatunnel.sh script within a Shell node.
```shell
#!/bin/bash
cd /opt/apache-seatunnel-2.3.12/bin
./seatunnel.sh --config /data/jobs/mysql_to_doris.conf -m local
```
Pros: Simple configuration, good compatibility, and avoids exposing sensitive database info.
Cons: Config files must be debugged in advance; modifications require using vim on the server (a headache just thinking about it).
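The Shell node script from Method 1 can be hardened a little so that a missing install or config file fails with a readable message. This is only a sketch: the function name `run_seatunnel_job` is hypothetical, and it assumes seatunnel.sh can be invoked by its absolute path instead of after a `cd` into the bin directory.

```shell
#!/bin/bash
# Sketch of a Shell-node wrapper. The install path and flags come from the
# example above; run_seatunnel_job is a hypothetical helper name.

SEATUNNEL_HOME="${SEATUNNEL_HOME:-/opt/apache-seatunnel-2.3.12}"

run_seatunnel_job() {
  local config="$1"
  local launcher="${SEATUNNEL_HOME}/bin/seatunnel.sh"

  # Fail early with a clear message instead of a cryptic shell error.
  if [ ! -x "$launcher" ]; then
    echo "seatunnel.sh not found or not executable: $launcher" >&2
    return 127
  fi
  if [ ! -f "$config" ]; then
    echo "job config not found: $config" >&2
    return 2
  fi

  # Same invocation as in the plain script; its exit status becomes
  # the function's return value.
  "$launcher" --config "$config" -m local
}
```

In the Shell node, end the script with `run_seatunnel_job /data/jobs/mysql_to_doris.conf; exit $?` so that a non-zero exit code reaches DolphinScheduler and the task instance is marked as failed.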
3.2 Method 2: Submitting via SeaTunnel API or SeaTunnel Web
If you need granular control (task cancellation, status queries), use the API method.
- I haven't tried this because it seemed too troublesome...
3.3 Method 3: Official SeaTunnel Node
This method uses the SeaTunnel node in DolphinScheduler with the Zeta engine. I found the node has no setting for a remote SeaTunnel IP, meaning DolphinScheduler and SeaTunnel must run on the same machine.
Consequently, SeaTunnel must be installed on every machine where DolphinScheduler is installed. Since DS is a cluster, tasks could be assigned to any node. For quick validation, I copied the local SeaTunnel version to all DS nodes instead of reinstalling the cluster version.
3.3.1 Validation with Default Config
Using default parameters (a script that generates test data and outputs to the console) resulted in an error:
Line 5: /bin/seatunnel.sh: No such file or directory.
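Integration failed because SEATUNNEL_HOME wasn't configured for the worker: the variable expanded to an empty string, so the task looked for /bin/seatunnel.sh at the filesystem root instead of under the SeaTunnel install directory.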
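To see where the odd path in that error comes from, here is a tiny reproduction. It assumes the generated task script builds the launcher path as `${SEATUNNEL_HOME}/bin/seatunnel.sh`, which is my reading of the error message and not something I checked in the DolphinScheduler source; `resolve_launcher` is a hypothetical helper.

```shell
#!/bin/bash
# Assumption: the task script calls "${SEATUNNEL_HOME}/bin/seatunnel.sh".
resolve_launcher() {
  # $1 is the value of SEATUNNEL_HOME as the worker sees it (may be empty).
  local home="$1"
  echo "${home}/bin/seatunnel.sh"
}

resolve_launcher ""                # prints /bin/seatunnel.sh
resolve_launcher "/opt/seatunnel"  # prints /opt/seatunnel/bin/seatunnel.sh
```

The first call matches the path in the error, which is why setting the variable is the next step.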
3.3.2 Modifying DolphinScheduler Environment Config
On the main DS node, modify the dolphinscheduler_env.sh file located in /opt/dolphinscheduler/bin/env:
Update: export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/seatunnel} (where /opt/seatunnel is your installation path).
Restart the cluster. Official docs say this automatically updates the environment for all Worker and Master servers. If it doesn't work, manually update the conf directories on each node. Ensure all Workers, Masters, and API servers have the SEATUNNEL_HOME configured.
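Before rerunning the task, it can save time to confirm on each node that the variable really points at a usable install. A minimal sketch, where `check_seatunnel_home` is a hypothetical helper and not part of DolphinScheduler or SeaTunnel:

```shell
#!/bin/bash
# Hypothetical helper: confirms that SEATUNNEL_HOME (or the path passed as
# the first argument) contains an executable bin/seatunnel.sh.
check_seatunnel_home() {
  local home="${1:-${SEATUNNEL_HOME:-}}"

  if [ -z "$home" ]; then
    echo "SEATUNNEL_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "${home}/bin/seatunnel.sh" ]; then
    echo "no executable seatunnel.sh under ${home}/bin" >&2
    return 1
  fi
  echo "OK: ${home}"
}
```

On a node, source the environment file first and then run the check, for example `source /opt/dolphinscheduler/bin/env/dolphinscheduler_env.sh && check_seatunnel_home`. A non-zero return on any Worker means tasks scheduled there will hit the same "No such file or directory" error.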
3.3.3 Re-verifying Integration
Rerun the task instance. Once you see the green checkmark, you’re good! Checking the logs shows the SeaTunnel logo and sync info. Integration successful.
3.3.4 Viewing Detailed Logs in a Cluster
Query the DS database using the task instance ID (e.g., 203971):
```sql
SELECT * FROM t_ds_task_instance WHERE id = 203971;
```
The node IP and directory are recorded, but the actual log content must be retrieved by scanning the corresponding log file on that node.
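Once the query has given you the node and the log location, pulling the log can be scripted. The sketch below assumes the node value is stored as `ip:port` and that the log location is an absolute file path on that worker; please verify both against your own table before relying on it. `worker_ip` and `tail_task_log` are hypothetical helper names, and passwordless ssh to the workers is assumed.

```shell
#!/bin/bash
# Assumption: the node value from t_ds_task_instance looks like "ip:port"
# and the log value is an absolute path on that worker.

# Strip an optional ":port" suffix so the value can be passed to ssh.
worker_ip() {
  echo "${1%%:*}"
}

# Print the last lines of a task log on the worker that ran it.
tail_task_log() {
  local host="$1" log_path="$2" lines="${3:-200}"
  ssh "$(worker_ip "$host")" "tail -n ${lines} '${log_path}'"
}
```

Call it as `tail_task_log "$host" "$log_path"` with the two values copied from the query result; an optional third argument changes how many lines are shown.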
4. DolphinScheduler Timezone Issues
Incorrect scheduling time is a major pain, often resulting in an 8-hour offset. DS has timezone settings (likely dependent on Java's xx_jackson_time_zone). If DS is started via systemctl, global Java variables might not work; modifying the DS configuration files directly is the most effective fix.
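For reference, this is roughly what such a change could look like in dolphinscheduler_env.sh. Two assumptions here: that the setting mentioned above is the `SPRING_JACKSON_TIME_ZONE` variable, and that the 8-hour offset means the target zone is UTC+8, for which `Asia/Shanghai` is only an example value.

```shell
# dolphinscheduler_env.sh (fragment)
# Assumption: SPRING_JACKSON_TIME_ZONE is the timezone setting in question;
# Asia/Shanghai is an example value for a UTC+8 deployment.
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-Asia/Shanghai}
```

Because the line keeps the `${VAR:-default}` form, a value already exported in the service's environment still wins; the default only applies when nothing else sets it. Restart the DS services afterwards so the new value is picked up.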
5. Summary
SeaTunnel’s strength lies in its multiple integration options and its ability to automatically create tables with templates. Integrating with DolphinScheduler adds management power, allowing you to manage .conf files via UI and making debugging much more convenient.
