
Apache SeaTunnel

Apache DolphinScheduler 3.4.1 Released with Task Dispatch Timeout Detection

The community has officially released Apache DolphinScheduler 3.4.1. As a maintenance release in the 3.4.x series, it focuses on improving scheduling stability, enhancing task execution control, and fixing system issues.

The new version introduces a task dispatch timeout detection mechanism and maximum runtime control for tasks, while also resolving multiple issues in scheduling logic, plugin functionality, and API behavior. In addition, system documentation, development processes, and project structure have been further optimized.

Key Highlights

Task Dispatch Timeout Detection Mechanism

The Master scheduling module now checks for task dispatch timeouts. When a task is dispatched to a Worker for execution and the Worker Group does not exist or no Worker nodes are available, the scheduler can detect the dispatch failure within a certain period and handle it accordingly.

This mechanism prevents tasks from remaining in a waiting state for an extended time and improves the system’s fault tolerance in scenarios involving resource anomalies (#17795, #17796).
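
The decision described above can be illustrated with a minimal sketch. It is not DolphinScheduler's implementation (the project itself is written in Java); the names `PendingTask`, `check_dispatch`, `dispatch_timeout_s`, and the three result strings are hypothetical and chosen only for illustration. The assumption is that a pending task is dispatched as soon as its Worker Group has at least one available Worker, and is flagged once it has waited longer than a configured limit.

```python
from dataclasses import dataclass


@dataclass
class PendingTask:
    """A task waiting to be dispatched (hypothetical structure)."""
    name: str
    worker_group: str
    enqueued_at: float  # seconds, on the same clock as `now`


def check_dispatch(task, workers_by_group, now, dispatch_timeout_s):
    """Return 'DISPATCH', 'WAIT', or 'DISPATCH_TIMEOUT' for one pending task.

    A missing Worker Group and an empty Worker Group are treated the same:
    in both cases there is no Worker to dispatch to.
    """
    workers = workers_by_group.get(task.worker_group, [])
    if workers:
        return "DISPATCH"
    if now - task.enqueued_at >= dispatch_timeout_s:
        return "DISPATCH_TIMEOUT"
    return "WAIT"


# Usage: the "reports" group does not exist, so the task waits, then times out.
workers = {"default": ["worker-1"], "gpu": []}
task = PendingTask("daily_report", "reports", enqueued_at=0.0)
print(check_dispatch(task, workers, now=30.0, dispatch_timeout_s=300.0))   # WAIT
print(check_dispatch(task, workers, now=300.0, dispatch_timeout_s=300.0))  # DISPATCH_TIMEOUT
```

What happens after a timeout is detected (failing the task, raising an alert, or both) is left out here, since the release notes only state that the scheduler handles it accordingly.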

Support for Configuring Maximum Runtime for Workflow and Task Instances

The new version allows users to configure a maximum runtime for both Workflow Instances and Task Instances.

If the runtime exceeds the configured threshold, the system can trigger timeout handling, which prevents tasks from hanging or occupying resources indefinitely and improves overall operational controllability (#17931, #17932).
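
As a rough illustration of the threshold check, the sketch below compares elapsed runtime against a configured limit. The function name `exceeded_max_runtime` and the convention that `None` means "no limit configured" are assumptions made for this example, not DolphinScheduler's API.

```python
from datetime import datetime, timedelta
from typing import Optional


def exceeded_max_runtime(
    started_at: datetime,
    now: datetime,
    max_runtime: Optional[timedelta],
) -> bool:
    """Return True when a running instance has outlived its configured limit.

    Assumption: max_runtime=None means no limit is configured, so the
    instance is never considered timed out.
    """
    if max_runtime is None:
        return False
    return now - started_at > max_runtime


# Usage: a task limited to 30 minutes that has been running for 45.
start = datetime(2025, 1, 1, 12, 0, 0)
print(exceeded_max_runtime(start, start + timedelta(minutes=45), timedelta(minutes=30)))  # True
```

The same check applies to a workflow instance or a task instance; only the start time and the configured limit differ.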

Key Fixes and Improvements

Scheduling System Stability Fixes

  • Fixed an issue where task timeout alerts were not triggered (#17820, #17818)
  • Fixed an issue where the workflow failure strategy did not take effect (#17834, #17851)
  • Tasks are now automatically marked as failed when task execution context initialization fails (#17758, #17821)
  • Fixed incorrect parallelism calculation in backfill tasks under parallel execution mode (#17831, #17853)

Database and Compatibility Fixes

  • Fixed SQL execution errors for dependent tasks in PostgreSQL environments (#17690, #17837)
  • Fixed mismatched INT/BIGINT column types in database tables (#17979, #17988)

API and Permission Fixes

  • Removed the WAIT_TO_RUN state and added a FAILOVER state when querying workflow instances (#17838, #17839)
  • Added tenant validation for the Workflow API (#17969, #17970)
  • Fixed an issue where non-admin users could not delete their own Access Tokens (#17995, #17997)

Plugin and Task Execution Fixes

  • Fixed incorrect JVM parameter position in Java Task (#17848, #17850)
  • Fixed an issue where Procedure Task parameters could not be passed correctly (#17967, #17968)
  • Fixed an issue where Procedure Task could not return parameters or execute stored procedures that run queries (#17971, #17973)
  • Fixed an issue where the HTTP plugin could not send nested JSON structures (#17912, #17911)
  • Fixed inconsistent timeout units in the HTTP alert plugin (#17915, #17920)

UI and Documentation Fixes

  • Removed the STOP state from task instances in the UI (#17864, #17865)
  • Fixed an issue where locks were not released when workflow definition list loading failed (#17984, #17989)
  • Fixed the Keycloak login icon 404 issue (#18006, #18007)
  • Corrected errors in the installation documentation (#17901, #17903)
  • Fixed a SeaTunnel documentation link 404 issue (#17904, #17905)

In-Depth Feature Analysis

In modern data platform architectures, scheduling systems often serve as key infrastructure connecting various computing engines. Tasks from systems such as Apache Spark, Apache Flink, and Apache Hive are commonly orchestrated through a unified scheduler.

However, in production environments, scheduling systems often face challenges such as:

  • Worker resource anomalies preventing tasks from being scheduled
  • Uncontrollable task execution time
  • Unstable plugin execution behavior

The newly introduced task dispatch timeout detection mechanism enables the scheduler to quickly identify cases where a Worker Group does not exist or no Worker nodes are available, preventing tasks from waiting indefinitely (#17795, #17796).

At the same time, maximum runtime control gives operators a more flexible way to manage task execution. By setting a maximum runtime for workflows or tasks, the system can take action when tasks hang or run abnormally long, preventing resources from being occupied for extended periods (#17931, #17932).

These improvements further enhance DolphinScheduler’s stability and controllability in production-grade data platform environments.

Acknowledgements

The release of Apache DolphinScheduler 3.4.1 would not have been possible without the contributions of community developers. Special thanks to the release manager @ruanwenjun and the following contributors for their work on this version:

  • SbloodyS
  • njnu-seafish
  • Mrhs121
  • ylq5126
  • qiong-zhou
  • XpengCen
  • iampratap7997-dot
  • yzeng1618
  • Alexander1902
  • maomao199691
  • asadjan4611
  • dill21yu

Final Thoughts

Apache DolphinScheduler 3.4.1 is a maintenance release focused on improving scheduling stability and enhancing task runtime control.

With the introduction of scheduling fault-tolerance mechanisms, maximum task runtime control, and numerous bug fixes, this version further strengthens the system’s reliability in production environments.

As the community continues to grow, Apache DolphinScheduler is steadily improving its capabilities in the data workflow orchestration space, providing enterprises with a more stable and efficient infrastructure for building modern data platforms. We welcome more contributors to join the community and help drive the project forward.
