The May 2026 DolphinScheduler community update can be summarized with two keywords: stability and precision.
On one hand, major stability risks have been addressed, most notably Master failover issues, which can severely affect production environments when they occur. On the other hand, long-standing usability problems, including API authorization gaps, plugin dependency conflicts, and RemoteShell null pointer exceptions, have been systematically fixed.
This monthly report highlights the key changes merged into the dev branch during May, including their impact on users, whether upgrades should be considered, and how to validate them.
Monthly Statistics
- Merged PRs: 50
- Contributors: 7
- Code Changes: +10,036 / -8,542
- Major Modules Involved: API, DAO, Master, Task Plugin, CI/Testing
You will notice that testing-related changes account for a large proportion of the updates. This reflects the community's effort to build a stronger foundation for future iterations. Stable and efficient CI/UT pipelines enable faster feature delivery and more reliable bug fixes.
Who Should Read This?
- End users and business teams: Want to know which common issues were fixed and whether production environments will become more stable.
- Operations and platform engineers: Care about failover, permissions, logging, and plugin stability.
- Developers: Want a quick overview of recent engineering governance efforts, including CI, unit testing, and quality assurance improvements.
The 6 Improvements Users Will Notice Most
1. More Reliable Master Failover
Typical scenario: after a Master node crashes, cluster recovery is slow or failover becomes stuck.
One of May's major fixes addresses failover lock leaks, reducing the likelihood that the scheduler remains unavailable for an extended period after failures.
Related PR:
https://github.com/apache/dolphinscheduler/pull/18207
2. More Rigorous Authorization for Critical APIs
Project-level authorization checks have been added to APIs such as view-gantt, view-variables, and trigger workflow.
This makes the permission model more intuitive: users without proper authorization should not be able to access these resources.
Related PR:
https://github.com/apache/dolphinscheduler/pull/18212
3. Fewer Null Pointer Exceptions in RemoteShell Tasks
Null pointer exceptions in remote tasks are notoriously difficult to troubleshoot due to distributed logs and complex execution contexts.
This month introduces fixes for RemoteShell-related NPEs, making task failures easier to understand and resolve.
Related PR:
https://github.com/apache/dolphinscheduler/pull/18210
4. Improved Dependency Conflict Management for Task Plugins
Plugins such as AliyunServerlessSpark previously suffered from dependency conflicts that could surface as ClassNotFoundException, NoSuchMethodError, or other compatibility errors.
Enhancements to dependency management and exception handling improve overall plugin reliability.
Related PR:
https://github.com/apache/dolphinscheduler/pull/18180
5. Faster and More Reliable CI and Unit Testing
This is not a user-facing feature, but it matters greatly.
More stable CI pipelines catch problems before code is merged, and stronger testing reduces the likelihood of production incidents.
Related PRs:
- https://github.com/apache/dolphinscheduler/pull/18213
- https://github.com/apache/dolphinscheduler/pull/18214
- https://github.com/apache/dolphinscheduler/pull/18205
6. More Flexible Region and Endpoint Support for AWS S3 Remote Logs
Users relying on S3-compatible storage services or private endpoints now have greater flexibility when configuring regions and endpoints.
This reduces troubleshooting time for connectivity issues caused by storage configuration differences.
Related PR:
https://github.com/apache/dolphinscheduler/pull/18268
Upgrade and Validation Recommendations
This report is based on PRs merged into the dev branch during May 2026, making it valuable for tracking development trends and performing early validation.
If you are running DolphinScheduler in production, prioritize upgrades based on risk:
Recommended for Immediate Attention
- Master failover improvements
- Authorization and security-related fixes
- Task plugin stability enhancements
Can Be Adopted as Needed
- CI and testing optimizations
- Documentation and formatting updates
- Return-type migration and engineering quality improvements
Since all changes were merged into the dev branch, validation in testing or integration environments is recommended:
git fetch origin dev
git checkout dev
git pull --rebase
Focus regression testing on the following scenarios:
- Master restart and failover recovery
- Critical API authorization validation
- Common task plugins such as RemoteShell and ServerlessSpark
Contributor Acknowledgements
Thanks to all contributors who submitted and merged PRs to Apache DolphinScheduler during May 2026.
Your contributions continue to improve the platform's stability, usability, and ecosystem capabilities.
| GitHub Username | Main Contribution | Merged PRs | +Lines | -Lines | Score |
|---|---|---|---|---|---|
| @ruanwenjun | Test Cases | 40 | 7367 | 6506 | 349.69 |
| @SbloodyS | Test Cases | 4 | 2503 | 1988 | 45.83 |
| @hiSandog | Documentation | 2 | 34 | 7 | 15.12 |
| @leocook | Debug & Fix | 1 | 34 | 29 | 9.15 |
| @includetts | Debug & Fix | 1 | 16 | 6 | 9.06 |
| @llphxd | Documentation | 1 | 4 | 4 | 9.02 |
| @wcmolin | Test Cases | 1 | 78 | 2 | 8.26 |
In-Depth Analysis of Key Technical Changes
A total of 50 PRs were merged this month.
The primary focus areas include:
- Stability
- Security and Authorization
- Plugin Reliability
- CI and Testing Efficiency
To help readers quickly understand the most important developments, the following section analyzes five representative changes in detail.
1. [Fix-18197][Master] Fix master failover lock leak (#18207)
- Link: https://github.com/apache/dolphinscheduler/pull/18207
- Author: @ruanwenjun
- Base/Head: dev ← dev_wenjun_fix18197
- Diff Stats: +171 / -10
Background and Challenges
Master failover relies on distributed locks to ensure that failover for a given address is not executed concurrently.
If lock release logic is incorrect, lock nodes may leak, preventing future failover operations and leaving the cluster unable to resume scheduling after failures.
Design and Implementation
The lock acquisition interface was redesigned to return an AutoCloseable handle.
Using try-with-resources guarantees symmetric acquire/release behavior.
Additionally, callers now retain the exact lock path, preventing subtle mistakes such as releasing parent paths.
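The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `InMemoryRegistry`, `LockHandle`, `FailoverLockSketch`, and the `/lock/failover/` path prefix are all hypothetical names, and the in-memory set stands in for a real registry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a real registry (an assumption for this sketch). */
final class InMemoryRegistry {
    private final Set<String> heldLocks = ConcurrentHashMap.newKeySet();

    /** Takes the lock and returns a handle that remembers the exact path it locked. */
    LockHandle acquire(String lockPath) {
        if (!heldLocks.add(lockPath)) {
            throw new IllegalStateException("Lock already held: " + lockPath);
        }
        return new LockHandle(this, lockPath);
    }

    void release(String lockPath) {
        heldLocks.remove(lockPath);
    }

    int heldLockCount() {
        return heldLocks.size();
    }
}

/** Releases exactly the path that was acquired, never a parent path. */
final class LockHandle implements AutoCloseable {
    private final InMemoryRegistry registry;
    private final String lockPath;

    LockHandle(InMemoryRegistry registry, String lockPath) {
        this.registry = registry;
        this.lockPath = lockPath;
    }

    String lockPath() {
        return lockPath;
    }

    @Override
    public void close() {
        registry.release(lockPath);
    }
}

final class FailoverLockSketch {
    /** Runs the failover action under the lock; the lock is released even if the action throws. */
    static void failover(InMemoryRegistry registry, String masterAddress, Runnable action) {
        String lockPath = "/lock/failover/" + masterAddress; // hypothetical path layout
        try (LockHandle ignored = registry.acquire(lockPath)) {
            action.run();
        }
    }
}
```

Because the handle owns the path, the caller cannot forget the release or pass a different path to it, which is the class of mistake the redesign is meant to rule out.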
Suggested Metrics
Simulate failover storms in a three-Master cluster by repeatedly killing Master processes with kill -9 and letting them restart automatically.
Compare the following before and after the fix:
- Failover success rate
- Mean Time To Recovery (MTTR)
- Failover thread blocking duration
Registry lock node count should also be monitored, as lock leaks accumulate over time.
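As a rough sketch of how the first two metrics could be computed from recorded trials, the helper below assumes each kill-and-restart cycle is logged as one trial. `FailoverTrial` and `FailoverMetrics` are hypothetical names introduced for illustration, and MTTR is averaged over successful recoveries only, which is one possible convention.

```java
import java.time.Duration;
import java.util.List;

/** One simulated failover trial: whether scheduling resumed and how long recovery took. */
final class FailoverTrial {
    final boolean recovered;
    final Duration recoveryTime;

    FailoverTrial(boolean recovered, Duration recoveryTime) {
        this.recovered = recovered;
        this.recoveryTime = recoveryTime;
    }
}

final class FailoverMetrics {
    /** Fraction of trials in which the cluster recovered, in the range [0, 1]. */
    static double successRate(List<FailoverTrial> trials) {
        if (trials.isEmpty()) {
            return 0.0;
        }
        long recovered = trials.stream().filter(t -> t.recovered).count();
        return (double) recovered / trials.size();
    }

    /** Mean time to recovery, averaged over successful trials only. */
    static Duration meanTimeToRecovery(List<FailoverTrial> trials) {
        long count = 0;
        long totalMillis = 0;
        for (FailoverTrial trial : trials) {
            if (trial.recovered) {
                totalMillis += trial.recoveryTime.toMillis();
                count++;
            }
        }
        return count == 0 ? Duration.ZERO : Duration.ofMillis(totalMillis / count);
    }
}
```

Running the same computation on trials collected before and after the fix gives directly comparable numbers.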
Compatibility and Rollback
Interface signature changes may affect callers.
Rollback is straightforward but requires cleanup of leaked lock nodes to prevent continued service disruption.
2. [Fix][API] Add missing project authorization on view-gantt/view-variables and trigger workflow APIs (#18212)
- Link: https://github.com/apache/dolphinscheduler/pull/18212
- Author: @ruanwenjun
- Base/Head: dev ← dev_wenjun_fixCvePermissionCheck
- Diff Stats: +321 / -16
Background and Challenges
Workflow APIs without project-level authorization checks can create privilege escalation risks.
In multi-tenant enterprise environments, this becomes a serious security concern.
Design and Implementation
Authorization validation was added to:
- view-gantt
- view-variables
- trigger workflow
Permission checks are enforced consistently through Controller and Service layers.
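The general shape of a project-level check at the service layer might look like the sketch below. It is an illustration under stated assumptions rather than the project's actual code: `ProjectGrants`, `ProjectAccessDeniedException`, and `WorkflowViewService` are hypothetical, and the returned string is a stub for the real response.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical store of which users may access which projects. */
final class ProjectGrants {
    private final Map<Integer, Set<Long>> projectsByUser = new HashMap<>();

    void grant(int userId, long projectCode) {
        projectsByUser.computeIfAbsent(userId, k -> new HashSet<>()).add(projectCode);
    }

    boolean isGranted(int userId, long projectCode) {
        return projectsByUser.getOrDefault(userId, Collections.emptySet()).contains(projectCode);
    }
}

/** Hypothetical exception signalling a failed project authorization check. */
final class ProjectAccessDeniedException extends RuntimeException {
    ProjectAccessDeniedException(String message) {
        super(message);
    }
}

final class WorkflowViewService {
    private final ProjectGrants grants;

    WorkflowViewService(ProjectGrants grants) {
        this.grants = grants;
    }

    /** Checks project authorization first, then returns a stubbed variables view. */
    String viewVariables(int userId, long projectCode, long workflowInstanceId) {
        checkProjectAccess(userId, projectCode);
        return "variables-of-" + workflowInstanceId;
    }

    private void checkProjectAccess(int userId, long projectCode) {
        if (!grants.isGranted(userId, projectCode)) {
            throw new ProjectAccessDeniedException(
                "User " + userId + " has no access to project " + projectCode);
        }
    }
}
```

Putting the check at the top of the service method means every caller, not just one controller, goes through it. A cross-project regression test then reduces to calling the method with a user who holds a grant on a different project and expecting the exception.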
Suggested Validation
Benchmark authorization overhead before and after implementation.
Security regression tests should include cross-project access attempts.
Best Practices
Enterprise users should enable stricter tenant isolation policies and audit sensitive API operations.
3. [Fix-18201][TaskPlugin] Fix RemoteShell task NullPointerException and… (#18210)
- Link: https://github.com/apache/dolphinscheduler/pull/18210
- Author: @leocook
- Base/Head: dev ← fix-18201-remoteshell-npe
- Diff Stats: +34 / -29
Background and Challenges
RemoteShell tasks are commonly used for operations and integration workloads.
Network interruptions, command output handling differences, and SSH channel inconsistencies can easily lead to NPEs and incomplete logs.
Design and Implementation
Input/output stream handling for SSH channels was improved to eliminate null pointer scenarios.
Exception handling paths were also enhanced to preserve root-cause information.
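A minimal sketch of the two ideas, null-safe stream handling and preserving the root cause, is shown below. It uses only the JDK and does not reproduce the plugin's SSH code; `RemoteOutputReader` and its methods are hypothetical names, and treating a missing stream as empty output is an assumption of this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class RemoteOutputReader {
    /** Reads all output; a null stream (for example, a channel that never opened) yields empty output. */
    static String readAll(InputStream stream) throws IOException {
        if (stream == null) {
            return "";
        }
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return output.toString();
    }

    /** Adds host context to failures while keeping the original exception as the cause. */
    static String readOrFail(String host, InputStream stream) {
        try {
            return readAll(stream);
        } catch (IOException e) {
            throw new IllegalStateException("Failed to read output from " + host, e);
        }
    }
}
```

Passing the original exception as the cause keeps the underlying I/O error visible in the stack trace instead of replacing it with a generic failure message.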
Suggested Validation
Inject failures such as:
- Remote disconnections
- Empty output streams
- Immediate command termination
Execute 1,000 test runs and compare:
- NPE occurrence rates
- Log completeness
Risks and Rollback
Changes are isolated to the plugin layer and are relatively easy to revert.
Regression tests should continue covering:
- Empty output
- Large output
- Non-zero exit codes
4. [Fix-18177][Task Plugin] Fix AliyunServerlessSpark plugin dependency conflicts and improve exception handling (#18180)
- Link: https://github.com/apache/dolphinscheduler/pull/18180
- Author: @includetts
- Base/Head: dev ← fix/aliyun-serverless-spark-deps-v2
- Diff Stats: +16 / -6
Background and Challenges
Dependency conflicts are classic runtime problems that often manifest as:
- NoSuchMethodError
- NoSuchFieldError
They are difficult to reproduce because they only occur under specific dependency combinations.
Design and Implementation
Critical dependency versions were corrected and exception wrapping improved.
Users can now directly identify conflicting classes and methods from logs.
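One way to make such conflicts readable in logs is to catch the linkage error and report which JAR the suspect class was loaded from. The sketch below illustrates that idea with JDK APIs only. It is not the change made in the PR, and `LinkageDiagnostics` and its methods are hypothetical.

```java
import java.security.CodeSource;

final class LinkageDiagnostics {
    /** Describes where a class was loaded from, which helps spot an unexpected JAR on the classpath. */
    static String describeOrigin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
            ? "<bootstrap or unknown>"
            : source.getLocation().toString();
        return type.getName() + " loaded from " + location;
    }

    /** Runs the action and converts linkage failures into an exception that names the suspect class. */
    static void runWithDiagnostics(Class<?> suspect, Runnable action) {
        try {
            action.run();
        } catch (LinkageError e) {
            // LinkageError covers NoSuchMethodError, NoSuchFieldError and NoClassDefFoundError.
            throw new IllegalStateException(
                "Dependency conflict: " + e + "; " + describeOrigin(suspect), e);
        }
    }
}
```

The wrapped message carries both the missing method or field from the original error and the location of the class that was actually loaded, so the conflicting artifact can usually be identified from a single log line.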
Suggested Validation
Execute smoke tests under multiple Hadoop and Spark dependency trees.
Measure:
- Startup success rate
- Exception readability
- Time-to-diagnosis
Best Practices
Production environments should consider dependency isolation techniques such as:
- Shading
- Relocation
- Dedicated ClassLoaders
5. [Chore] Unit-Test performance optimize (#18213)
- Link: https://github.com/apache/dolphinscheduler/pull/18213
- Author: @SbloodyS
- Base/Head: dev ← ut_performance_optimize
- Diff Stats: +22 / -6
Background and Challenges
Slow, flaky, or frequently skipped tests delay problem detection until production deployment.
Testing infrastructure directly impacts community development speed and software quality.
Design and Implementation
Unit test execution and CI configurations were optimized.
Temporary safeguards were also introduced to maintain CI stability during environmental issues.
Suggested Validation
Compare:
- Total CI duration
- Number of executed unit tests
- Percentage of skipped tests
- Flaky test rerun counts
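The comparison above can be reduced to a small helper that takes two sets of pipeline numbers and prints the deltas. This is only a sketch: `CiRunStats` and `CiComparison` are hypothetical, and the figures would have to come from your own CI system.

```java
import java.util.Locale;

/** Summary numbers for one CI run, collected by hand or from the CI system's reports. */
final class CiRunStats {
    final long durationSeconds;
    final int executedTests;
    final int skippedTests;
    final int flakyReruns;

    CiRunStats(long durationSeconds, int executedTests, int skippedTests, int flakyReruns) {
        this.durationSeconds = durationSeconds;
        this.executedTests = executedTests;
        this.skippedTests = skippedTests;
        this.flakyReruns = flakyReruns;
    }

    /** Skipped tests as a percentage of all tests (executed plus skipped). */
    double skippedPercentage() {
        int total = executedTests + skippedTests;
        return total == 0 ? 0.0 : 100.0 * skippedTests / total;
    }
}

final class CiComparison {
    /** Formats the change in each metric between two runs. */
    static String summarize(CiRunStats before, CiRunStats after) {
        return String.format(Locale.ROOT,
            "duration %+ds, executed %+d, skipped %.1f%% -> %.1f%%, reruns %+d",
            after.durationSeconds - before.durationSeconds,
            after.executedTests - before.executedTests,
            before.skippedPercentage(),
            after.skippedPercentage(),
            after.flakyReruns - before.flakyReruns);
    }
}
```

Tracking the skipped percentage alongside duration guards against a pipeline that only became faster because more tests were disabled.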
Risks and Rollback
Temporary test disablement should always include a documented recovery plan.
Conditions for re-enabling tests should be tracked through issues and PRs.
Appendix
- PR #18204: https://github.com/apache/dolphinscheduler/pull/18204
- PR #18208: https://github.com/apache/dolphinscheduler/pull/18208
- PR #18206: https://github.com/apache/dolphinscheduler/pull/18206
- PR #18207: https://github.com/apache/dolphinscheduler/pull/18207
- PR #18205: https://github.com/apache/dolphinscheduler/pull/18205
- PR #18213: https://github.com/apache/dolphinscheduler/pull/18213
- PR #18209: https://github.com/apache/dolphinscheduler/pull/18209
- PR #18180: https://github.com/apache/dolphinscheduler/pull/18180
- PR #18212: https://github.com/apache/dolphinscheduler/pull/18212
- PR #18210: https://github.com/apache/dolphinscheduler/pull/18210
- PR #18214: https://github.com/apache/dolphinscheduler/pull/18214
- PR #18221: https://github.com/apache/dolphinscheduler/pull/18221
- PR #18218: https://github.com/apache/dolphinscheduler/pull/18218
- PR #18225: https://github.com/apache/dolphinscheduler/pull/18225
- PR #18227: https://github.com/apache/dolphinscheduler/pull/18227
- PR #18241: https://github.com/apache/dolphinscheduler/pull/18241
- PR #18240: https://github.com/apache/dolphinscheduler/pull/18240
- PR #18226: https://github.com/apache/dolphinscheduler/pull/18226
- PR #18228: https://github.com/apache/dolphinscheduler/pull/18228
- PR #18229: https://github.com/apache/dolphinscheduler/pull/18229
- PR #18232: https://github.com/apache/dolphinscheduler/pull/18232
- PR #18223: https://github.com/apache/dolphinscheduler/pull/18223
- PR #18230: https://github.com/apache/dolphinscheduler/pull/18230
- PR #18234: https://github.com/apache/dolphinscheduler/pull/18234
- PR #18242: https://github.com/apache/dolphinscheduler/pull/18242
- PR #18236: https://github.com/apache/dolphinscheduler/pull/18236
- PR #18233: https://github.com/apache/dolphinscheduler/pull/18233
- PR #18245: https://github.com/apache/dolphinscheduler/pull/18245
- PR #18250: https://github.com/apache/dolphinscheduler/pull/18250
- PR #18251: https://github.com/apache/dolphinscheduler/pull/18251
- PR #18252: https://github.com/apache/dolphinscheduler/pull/18252
- PR #18257: https://github.com/apache/dolphinscheduler/pull/18257
- PR #18270: https://github.com/apache/dolphinscheduler/pull/18270
- PR #18271: https://github.com/apache/dolphinscheduler/pull/18271
- PR #18258: https://github.com/apache/dolphinscheduler/pull/18258
- PR #18253: https://github.com/apache/dolphinscheduler/pull/18253
- PR #18260: https://github.com/apache/dolphinscheduler/pull/18260
- PR #18259: https://github.com/apache/dolphinscheduler/pull/18259
- PR #18256: https://github.com/apache/dolphinscheduler/pull/18256
- PR #18263: https://github.com/apache/dolphinscheduler/pull/18263
- PR #18262: https://github.com/apache/dolphinscheduler/pull/18262
- PR #18261: https://github.com/apache/dolphinscheduler/pull/18261
- PR #18254: https://github.com/apache/dolphinscheduler/pull/18254
- PR #18279: https://github.com/apache/dolphinscheduler/pull/18279
- PR #18284: https://github.com/apache/dolphinscheduler/pull/18284
- PR #18288: https://github.com/apache/dolphinscheduler/pull/18288
- PR #18268: https://github.com/apache/dolphinscheduler/pull/18268
- PR #18296: https://github.com/apache/dolphinscheduler/pull/18296
- PR #18300: https://github.com/apache/dolphinscheduler/pull/18300
- PR #18301: https://github.com/apache/dolphinscheduler/pull/18301


