<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chen Debra</title>
    <description>The latest articles on DEV Community by Chen Debra (@chen_debra_3060b21d12b1b0).</description>
    <link>https://dev.to/chen_debra_3060b21d12b1b0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1533306%2Fc0ea3a94-ba17-47c8-9304-4571fb1adaf9.png</url>
      <title>DEV Community: Chen Debra</title>
      <link>https://dev.to/chen_debra_3060b21d12b1b0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chen_debra_3060b21d12b1b0"/>
    <language>en</language>
    <item>
      <title>Part 9 | Beyond Scheduling: How Data Platforms Evolve into DataOps Systems</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 24 Apr 2026 02:20:41 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-9-beyond-scheduling-how-data-platforms-evolve-into-dataops-systems-36em</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-9-beyond-scheduling-how-data-platforms-evolve-into-dataops-systems-36em</guid>
      <description>&lt;p&gt;In the continuous evolution of data platforms, many teams encounter a critical turning point: the scheduling system is already stable, and tasks run on time, yet overall efficiency does not improve. Instead, as the scale grows, the system becomes increasingly difficult to maintain. The root cause is that the platform still operates at the level of “task scheduling” rather than advancing to the level of “engineering governance.”&lt;/p&gt;

&lt;p&gt;This article focuses on that transformation—how scheduling evolves from an execution tool into the core platform supporting DataOps, along with the key methodologies and practical approaches involved. It also uses Apache DolphinScheduler as a concrete example to illustrate this transition.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Evolution of the Scheduler’s Role&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the beginning, scheduling systems were essentially enhanced tools for timed execution. Tasks existed as scripts triggered by time, with few clear dependency relationships between them. This model worked while the number of tasks was small, but as data pipelines grew more complex, issues began to emerge: tasks affected one another without visibility, retry strategies were lacking, and pipeline states were difficult to trace.&lt;/p&gt;

&lt;p&gt;To address these problems, scheduling systems gradually introduced workflow orchestration mechanisms, organizing tasks into Directed Acyclic Graphs (DAGs), enabling structured representation of data processing flows. For example, a standard ETL process can be clearly connected through dependencies.&lt;/p&gt;

&lt;p&gt;At this stage, the key improvement is that scheduling is no longer just a “trigger,” but becomes the “organizer” of data workflows. However, it still remains at the execution layer and does not solve deeper management challenges.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzp4fhsthcnm0zlpswxu9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzp4fhsthcnm0zlpswxu9.jpg" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Engineering Transformation Driven by Standards&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As the number of tasks continues to grow, teams often realize that the real bottleneck is not scheduling capability, but the disorder of tasks themselves. The same data is repeatedly developed, naming conventions vary across tasks, code reuse is limited, and lineage relationships are difficult to track. At the core, these issues stem from a lack of unified standards.&lt;/p&gt;

&lt;p&gt;As a result, the focus of platform development shifts from “enhancing scheduling capabilities” to “establishing engineering standards.” By abstracting a unified development model and standardizing the data processing workflow, maintainability can be significantly improved. For instance, tasks can be uniformly divided into three stages: extract, transform, and load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd3dgytyh2k7ap2mz0ug.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgd3dgytyh2k7ap2mz0ug.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Based on this abstraction, individual tasks only need to implement their own logic, avoiding repetitive development.&lt;/p&gt;
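&lt;p&gt;As a rough illustration of such an abstraction (class and method names here are hypothetical, not tied to any specific platform), the platform can own the stage order while each task supplies only its own logic:&lt;/p&gt;

```python
from abc import ABC, abstractmethod


class EtlTask(ABC):
    """Base class: the platform fixes the stage order; tasks fill in the logic."""

    def run(self):
        # Fixed pipeline skeleton: extract -> transform -> load.
        data = self.extract()
        result = self.transform(data)
        self.load(result)

    @abstractmethod
    def extract(self): ...

    @abstractmethod
    def transform(self, data): ...

    @abstractmethod
    def load(self, result): ...


class DailyUserCount(EtlTask):
    """Example task: counts records pulled from a stubbed source."""

    def extract(self):
        return ["u1", "u2", "u3"]  # stand-in for a real source query

    def transform(self, data):
        return len(data)

    def load(self, result):
        self.loaded = result  # stand-in for writing to a warehouse
```

&lt;p&gt;Because &lt;code&gt;run()&lt;/code&gt; lives in one place, retry, logging, and monitoring hooks can later be added to every task at once.&lt;/p&gt;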

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgygvtwlh4ajowqbf73v3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgygvtwlh4ajowqbf73v3.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
Once these standards are gradually implemented, tasks are no longer scattered scripts but become structured engineering units, laying the foundation for subsequent governance capabilities.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;How Scheduling Platforms Support Engineering Governance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After task standardization is achieved, the role of the scheduling platform undergoes a qualitative transformation. It is no longer just responsible for executing tasks but becomes the control center of the entire data engineering process. By centrally managing task metadata—such as owners, retry strategies, and priorities—the platform enables full lifecycle control over tasks.&lt;/p&gt;
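&lt;p&gt;A minimal sketch of centralized task metadata, assuming illustrative field names rather than DolphinScheduler's actual schema:&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class TaskMeta:
    """Illustrative metadata record; field names are assumptions only."""
    name: str
    owner: str
    max_retries: int = 3
    priority: str = "MEDIUM"
    tags: list = field(default_factory=list)


# A single registry lets the platform answer lifecycle questions
# ("who owns this task?", "how often may it retry?") in one place.
registry = {}


def register(meta):
    registry[meta.name] = meta


register(TaskMeta(name="extract_users", owner="alice", priority="HIGH"))
```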

&lt;p&gt;At the same time, dependency relationships built through workflows naturally form data lineage, supporting impact analysis and issue diagnosis.&lt;/p&gt;

&lt;p&gt;Observability becomes a critical capability at this stage. By continuously monitoring metrics such as execution duration, success rate, and resource consumption, the platform can proactively identify risks. For example, adding simple monitoring logic during execution allows timely alerts when anomalies occur:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;send_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Furthermore, when the scheduling platform is integrated with code repositories, data development can be incorporated into CI/CD processes, enabling automated validation and deployment. Every change is recorded, and every release is verified, gradually bringing data development in line with software engineering practices.&lt;/p&gt;
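&lt;p&gt;As one sketch of what such automated validation might check before deployment (the function below is illustrative, not a DolphinScheduler API), a CI step can verify that a workflow definition references only declared tasks and contains no dependency cycles:&lt;/p&gt;

```python
from collections import defaultdict, deque


def validate_workflow(tasks, dependencies):
    """Return True if every dependency references a declared task and the
    dependency graph is acyclic (checked via Kahn's topological sort)."""
    names = {t["name"] for t in tasks}
    if any(up not in names or down not in names for up, down in dependencies):
        return False

    indegree = {n: 0 for n in names}
    downstream = defaultdict(list)
    for up, down in dependencies:
        downstream[up].append(down)
        indegree[down] += 1

    queue = deque(n for n in names if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return visited == len(names)  # all nodes sorted => no cycle
```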

&lt;h3&gt;
  
  
  &lt;strong&gt;DataOps Practices with Apache DolphinScheduler&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When applying the above concepts to a real system, Apache DolphinScheduler provides a representative implementation path. It is not merely a scheduling tool but has progressively evolved to include key capabilities of a DataOps platform.&lt;/p&gt;

&lt;p&gt;First, in terms of &lt;strong&gt;task standardization&lt;/strong&gt;, DolphinScheduler defines a hierarchical structure of “project–workflow–task,” clearly separating development boundaries, resource isolation, and execution units. Each task must specify execution type, resources, retry strategies, and other metadata. This effectively enforces engineering standards rather than allowing arbitrary script integration.&lt;/p&gt;
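&lt;p&gt;To make this concrete, here is an illustrative sketch of the hierarchy with a check that rejects tasks missing mandatory metadata (all field names are assumptions, not DolphinScheduler's actual schema):&lt;/p&gt;

```python
# Illustrative only: a minimal "project -> workflow -> task" hierarchy.
project = {
    "name": "user_analytics",
    "tenant": "data_team",  # resource-isolation boundary
    "workflows": [
        {
            "name": "daily_user_pipeline",
            "tasks": [
                {
                    "name": "extract_users",
                    "type": "SQL",
                    "owner": "alice",
                    "max_retries": 3,
                },
            ],
        },
    ],
}


def missing_metadata(task, required=("type", "owner", "max_retries")):
    # Reject tasks that omit mandatory metadata instead of accepting raw scripts.
    return [f for f in required if f not in task]
```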

&lt;p&gt;Second, in &lt;strong&gt;workflow governance&lt;/strong&gt;, DolphinScheduler uses visual DAG orchestration to clearly represent complex dependencies. For example, a typical data pipeline can be defined programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tasks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dependencies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extract&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;load&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure is not only used for execution but can also support lineage analysis and impact assessment.&lt;/p&gt;
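&lt;p&gt;For example, a simple traversal over the dependency list (a sketch, not a built-in API) answers the impact question "which tasks are affected if this one fails?":&lt;/p&gt;

```python
from collections import defaultdict, deque


def downstream_of(task, dependencies):
    """All tasks transitively reachable from `task`, i.e. everything that
    could be affected if `task` fails or its output changes."""
    edges = defaultdict(list)
    for up, down in dependencies:
        edges[up].append(down)

    affected, queue = set(), deque([task])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected
```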

&lt;p&gt;Furthermore, in terms of &lt;strong&gt;resource governance&lt;/strong&gt;, DolphinScheduler integrates with underlying resource management systems such as YARN or Kubernetes. Through tenant mechanisms, scheduling maps directly to actual computing resources. This means scheduling is not just about “arranging tasks,” but about controlling resource boundaries and preventing interference between tasks.&lt;/p&gt;

&lt;p&gt;In terms of &lt;strong&gt;observability&lt;/strong&gt;, DolphinScheduler provides built-in capabilities such as task logs, execution tracking, and alerting mechanisms, making task execution traceable and auditable. When a node fails, engineers can quickly locate the specific task instance instead of manually searching through logs.&lt;/p&gt;

&lt;p&gt;Finally, in &lt;strong&gt;engineering capabilities&lt;/strong&gt;, DolphinScheduler integrates with code management systems to support version control and release management of workflows. Through APIs or automation pipelines, it enables a complete delivery lifecycle from development to testing to production, which is a core aspect of “continuous delivery” in DataOps.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Evolution Path of Enterprise Data Platforms&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;From a broader perspective, enterprise data platforms typically evolve through a progressive process. They start with simple script-based and time-triggered systems, then move to workflow-oriented scheduling platforms, further incorporate metadata management and access control, and ultimately evolve into DataOps platforms with automation, observability, and governance capabilities.&lt;/p&gt;

&lt;p&gt;The essence of this evolution is the continuous upward shift of focus—from “whether tasks run” to “whether data is reliable,” and finally to “whether engineering is governable.” Each stage reduces complexity while improving controllability and system stability.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;A Governable Data Task in Practice&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When these concepts are applied in practice, it becomes possible to build data tasks with governance capabilities. Before execution, schema validation can be performed; after execution, runtime metrics can be reported, ensuring full lifecycle control.&lt;/p&gt;

&lt;p&gt;At the scheduling layer, task behavior is constrained through unified configurations such as SLA, retry strategies, and alert mechanisms. This approach ensures that tasks no longer depend on individual experience but operate within a standardized governance framework.&lt;/p&gt;
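&lt;p&gt;One way to picture such a framework is a wrapper that validates before each attempt, reports metrics afterwards, and retries within a configured budget. The hook names below are illustrative, not a real scheduler API:&lt;/p&gt;

```python
import time


def governed_run(task_fn, validate_schema, report_metrics, max_retries=2):
    """Hypothetical governance wrapper: pre-execution validation,
    post-execution metrics, and bounded retries."""
    for attempt in range(max_retries + 1):
        validate_schema()  # pre-execution check, e.g. input schema
        start = time.monotonic()
        try:
            result = task_fn()
        except Exception:
            report_metrics(status="failed", duration=time.monotonic() - start)
            if attempt == max_retries:
                raise  # retry budget exhausted; surface the failure
            continue
        report_metrics(status="success", duration=time.monotonic() - start)
        return result
```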

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtkh4j19gn5nqdpc1ltd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtkh4j19gn5nqdpc1ltd.png" alt="1" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The ultimate goal of a scheduling system is never just to “run tasks faster,” but to “make data development manageable.” When a platform can enforce standards, organize workflows, ensure stability through monitoring, and support evolution through automation, it has completed the transformation from scheduling to DataOps.&lt;/p&gt;

&lt;p&gt;Scheduling systems represented by Apache DolphinScheduler are evolving from the execution layer to the governance layer—marking the true arrival of the DataOps era.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous articles:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-ffeaf159f534" rel="noopener noreferrer"&gt;Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-1c97d8d0327e" rel="noopener noreferrer"&gt;Part 7 | Where Scheduling Systems Really Break and the Hidden Bottlenecks Beyond CPU and Scale&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-8-boundaries-collaboration-and-best-practices-between-apache-dolphinscheduler-and-flink-4992ae5e1bc5" rel="noopener noreferrer"&gt;Part 8 | Boundaries, Collaboration, and Best Practices Between Apache DolphinScheduler and Flink &amp;amp; Spark&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next: From Scheduling to DataOps: DolphinScheduler as the Control Plane&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>apachedolphinscheduler</category>
      <category>dataops</category>
      <category>systems</category>
      <category>opensource</category>
    </item>
    <item>
<title>How a Leading Manufacturing Enterprise in Shenzhen Deployed Apache DolphinScheduler Across Dozens of Factories Within One Day</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:24:25 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/how-a-leading-manufacturing-enterprise-in-shenzhen-deploys-apache-dolphinscheduler-across-dozens-of-53k7</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/how-a-leading-manufacturing-enterprise-in-shenzhen-deploys-apache-dolphinscheduler-across-dozens-of-53k7</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/OKjCaqQgHoU" rel="noopener noreferrer"&gt;https://youtu.be/OKjCaqQgHoU&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the wave of digital transformation sweeps across the globe, intelligent manufacturing has become the core engine driving high-quality growth in the manufacturing industry.&lt;br&gt;
On the path toward intelligence, however, enterprises face a wide range of challenges: data silos across multiple systems, complex scheduling dependencies, and delayed monitoring and alerting continue to emerge.&lt;/p&gt;

&lt;p&gt;At a recent Apache DolphinScheduler online user meetup, the community invited Qiu Zhongbiao, a senior software engineer from a large intelligent manufacturing enterprise in Shenzhen, who gave a detailed talk on the practical application of Apache DolphinScheduler in real manufacturing scenarios.&lt;/p&gt;

&lt;p&gt;This article organizes the key content from that talk to explore how the enterprise achieved a qualitative leap in its scheduling platform with Apache DolphinScheduler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;About the Author&lt;/strong&gt;&lt;br&gt;
Qiu Zhongbiao is a senior software engineer at a large intelligent manufacturing enterprise in Shenzhen. He focuses on data technology research and practice in intelligent manufacturing and is dedicated to advancing the digital transformation of the manufacturing industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Era of Intelligent Manufacturing
&lt;/h2&gt;

&lt;p&gt;With the continuous advancement of Industry 4.0, intelligent manufacturing has become the focus of global competition in the manufacturing sector. The intelligent manufacturing maturity model is divided into multiple levels, from low to high: enterprises need to progressively improve their capabilities in automation, digitalization, and networking, and ultimately achieve fully intelligent production.&lt;/p&gt;

&lt;p&gt;In this process, data becomes a core production factor, and how to collect, process, and schedule it efficiently, stably, and reliably has become a critical challenge for every manufacturing enterprise.&lt;/p&gt;

&lt;p&gt;The data environment in modern manufacturing enterprises is becoming increasingly complex.&lt;br&gt;
On one hand, enterprises operate a large number of business systems, including MES (Manufacturing Execution System), ERP (Enterprise Resource Planning), WMS (Warehouse Management System), WCS (Warehouse Control System), CRM (Customer Relationship Management), QMS (Quality Management System), PLM (Product Lifecycle Management), SCM (Supply Chain Management), and APS (Advanced Planning and Scheduling).&lt;/p&gt;

&lt;p&gt;Data exchange between these systems is often implemented through hard-coded integrations.&lt;br&gt;
This leads to highly complex inter-system relationships, high maintenance costs, poor scalability, and difficulty in troubleshooting.&lt;/p&gt;

&lt;p&gt;On the other hand, enterprises also face complex network environments, including corporate production networks, factory internal networks, and international/domestic dedicated-line networks. Different network environments impose different requirements on data collection, transmission, and scheduling, so achieving unified management and task isolation under such conditions becomes a major challenge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03qnjzoh6i8sg3szvrgw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03qnjzoh6i8sg3szvrgw.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges of Traditional Data Processing Approaches
&lt;/h2&gt;

&lt;p&gt;In the process of promoting data-driven transformation in intelligent manufacturing, enterprises are facing pain points across multiple dimensions.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Diversity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1. Protocol complexity: the device layer uses PLC protocols such as Siemens S7, the edge layer uses MQTT/CoAP, and the system layer uses REST/SOAP.&lt;br&gt;2. Data format heterogeneity: device data includes binary and hexadecimal formats, while database tables are often semi-structured formats such as JSON/XML.&lt;br&gt;3. Vendor differences: multiple vendors for robots and devices, with significant variations across production lines.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-System / Cross-Factory Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1. Complex data links: involving devices, gateways, local systems, MES, SAP, APS, WMS, and remote factories.&lt;br&gt;2. Mixed network environments: factory intranet, on-site servers, cross-factory dedicated lines, public network, and international network connections.&lt;br&gt;3. High real-time requirements: production scheduling, capacity planning, and other business functions demand strong timeliness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lack of Visualization &amp;amp; Traceability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1. Invisible data pipelines: traditional systems cannot visually display data processing flows.&lt;br&gt;2. Disconnected logs: data transmission between systems relies on manual logging, making it difficult to store and track complete logs across all nodes.&lt;br&gt;3. Difficult traceability: tracking data flow across systems requires manual effort and high labor costs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unreliable Data Collection Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1. Diverse anomalies: network failures, device errors, system exceptions, and duplicate data collection.&lt;br&gt;2. Delayed issue detection: multiple anomalies are often discovered only after they impact downstream systems, relying on manual intervention.&lt;br&gt;3. Difficult root cause analysis: multi-system interactions make it hard to locate faults, requiring full-chain understanding of data flows.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;First is the foundational barrier caused by data diversity. Device protocols are highly diverse, covering PLC protocols such as Siemens S7 as well as general protocols like MQTT, and data formats range from binary to semi-structured. Combined with differences among vendors and production lines, this makes it extremely difficult to standardize data.&lt;/p&gt;

&lt;p&gt;On top of that, cross-system and cross-factory data collaboration is particularly challenging. Data links span multiple stages, including devices, various systems, and geographically distributed factories, while network environments mix intranets, dedicated lines, and the public internet. At the same time, business scenarios such as production scheduling and capacity calculation have very high real-time requirements, all of which further increases the complexity of collaboration.&lt;/p&gt;

&lt;p&gt;Meanwhile, data visualization and traceability capabilities are insufficient.&lt;br&gt;
Traditional systems cannot intuitively present data flow nodes.&lt;br&gt;
Logs are stored in a scattered manner, leading to inefficient troubleshooting.&lt;br&gt;
Building a complete traceability system also requires significant manual effort.&lt;/p&gt;

&lt;p&gt;Finally, the quality of data collection lacks guarantees. Network and device anomalies occur frequently, their detection is often delayed, and manual recovery is inefficient. In multi-system interactions, fault localization still relies heavily on familiarity with the entire data pipeline, and all of these issues further undermine data reliability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Apache DolphinScheduler Solution
&lt;/h2&gt;

&lt;p&gt;In response to the above challenges, Apache DolphinScheduler provides a comprehensive solution.&lt;br&gt;
As a distributed, highly extensible, and visual workflow scheduling platform, it demonstrates strong capabilities in manufacturing scenarios.&lt;/p&gt;

&lt;h3&gt;
  
  
  Worker Node Grouping: A Solution for Complex Network Environments
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fdzximlqtvik0rgps3t.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fdzximlqtvik0rgps3t.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In terms of Worker node grouping, Apache DolphinScheduler provides a flexible isolation strategy tailored to complex network environments in manufacturing enterprises.&lt;/p&gt;

&lt;p&gt;Worker nodes can be grouped by network environments, such as corporate production network Workers, factory internal network Workers, and international/domestic dedicated-line Workers.&lt;br&gt;
They can also be grouped by business types, such as PLC device data collection, production data processing, and quality data analysis.&lt;/p&gt;

&lt;p&gt;This enables task isolation across different network environments and business scenarios.&lt;br&gt;
It ensures the security and reliability of data collection.&lt;/p&gt;
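&lt;p&gt;Conceptually, the grouping acts as a routing rule from task category to an isolated pool of Workers. A minimal sketch (the group names below are hypothetical; in practice worker groups are configured on the DolphinScheduler platform itself):&lt;/p&gt;

```python
# Hypothetical routing table: task category -> worker group name.
WORKER_GROUPS = {
    "plc_collection": "factory-intranet",
    "production_processing": "production-network",
    "quality_analysis": "production-network",
    "overseas_sync": "international-dedicated-line",
}


def worker_group_for(task_category):
    # Unmapped categories fall back to the platform's default group.
    return WORKER_GROUPS.get(task_category, "default")
```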

&lt;p&gt;This solution effectively supports key application scenarios such as production data lake ingestion, customer data feedback, and cross-network data synchronization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Collection
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjv7wix5lv8jgw3rcblw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frjv7wix5lv8jgw3rcblw.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In terms of data collection, Apache DolphinScheduler builds a complete data processing pipeline.&lt;/p&gt;

&lt;p&gt;The data source layer includes IoT devices, such as device sensors, heartbeat data, status monitoring, and device operation data.&lt;br&gt;
It also includes business systems such as MES, WMS, ASP, and SAP databases.&lt;br&gt;
In addition, it includes AGENT probes and user-uploaded data.&lt;/p&gt;

&lt;p&gt;The processing layer uses DataX for offline data synchronization.&lt;br&gt;
It uses Flink for real-time stream processing.&lt;br&gt;
Kafka is used as a message queue buffer.&lt;/p&gt;

&lt;p&gt;Finally, data is unified into a data lake.&lt;br&gt;
This supports BI analysis and AI applications.&lt;/p&gt;

&lt;p&gt;Through unified scheduling with Apache DolphinScheduler, enterprises can achieve end-to-end management from data collection to processing to application.&lt;/p&gt;
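An offline synchronization step in such a pipeline is typically a DataX job file that the scheduler merely triggers. A minimal sketch (the reader/writer names follow DataX plugin conventions; all endpoints and paths are placeholders):

```shell
# Generate a minimal DataX job: pull rows from a MySQL business system
# (e.g. MES) and land them in HDFS for the data lake. Placeholder config.
printf '%s\n' \
  '{ "job": { "content": [ {' \
  '    "reader": { "name": "mysqlreader" },' \
  '    "writer": { "name": "hdfswriter" } } ],' \
  '  "setting": { "speed": { "channel": 3 } } } }' > mes_to_lake.json

# DolphinScheduler's DataX task type (or a Shell task) then runs roughly:
#   python ${DATAX_HOME}/bin/datax.py mes_to_lake.json
grep -q 'mysqlreader' mes_to_lake.json && echo "job file ready"
```

The point of the sketch is the division of labor: connection details and transfer speed live in the job file, while the scheduler only decides when the job runs and what depends on it.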

&lt;h3&gt;
  
  
  Data Interaction
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr9ej4re3cvdbkn8g8md.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxr9ej4re3cvdbkn8g8md.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the traditional model, systems interact with each other point-to-point, which leads to a highly complex web of relationships between systems.&lt;/p&gt;

&lt;p&gt;After introducing Apache DolphinScheduler, all data interactions are unified through the scheduling center.&lt;/p&gt;

&lt;p&gt;This enables centralized management of all data interaction tasks.&lt;br&gt;
It allows visual monitoring of task execution status.&lt;br&gt;
It provides unified exception handling and alerting mechanisms.&lt;/p&gt;

&lt;p&gt;At the same time, it reduces coupling between systems.&lt;br&gt;
It improves the reliability of data interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Template-Based Data Collection and Distribution Across Multiple Factories
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3xncvkb5a2cja7qop63.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3xncvkb5a2cja7qop63.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For manufacturing enterprises with multiple factories, Apache DolphinScheduler provides a template-based solution.&lt;/p&gt;

&lt;p&gt;For homogeneous systems, such as unified MES/WMS systems or the same types of PLC devices, how can rapid deployment be achieved?&lt;/p&gt;

&lt;p&gt;The approach is to solidify core processes into reusable templates.&lt;br&gt;
These processes include reading task lists, parameter injection, execution of data collection or distribution, and completion or exception marking.&lt;/p&gt;

&lt;p&gt;At the same time, task configuration tables are introduced.&lt;br&gt;
These include data source configurations, SQL statements, system IDs for distribution or collection, custom parameters, and checkpoint settings.&lt;/p&gt;

&lt;p&gt;This enables a flexible model of “template standardization + parameter customization.”&lt;/p&gt;
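The "template standardization + parameter customization" model can be pictured as one fixed script plus a per-factory row in a configuration table. A minimal shell sketch; the factory IDs, hosts, and paths are invented for illustration:

```shell
# Task configuration table: one row of parameters per factory.
printf '%s\n' \
  'factory_id,db_host,target_path' \
  'f01,10.0.1.5,/lake/f01' \
  'f02,10.0.2.5,/lake/f02' > factories.csv

collect_for_factory() {
  # The reusable template: look up this factory's row (parameter
  # injection), then run the same collection logic everywhere.
  local id="$1" row host path
  row=$(grep "^${id}," factories.csv) || { echo "unknown factory: ${id}"; return 1; }
  host=$(printf '%s' "$row" | cut -d, -f2)
  path=$(printf '%s' "$row" | cut -d, -f3)
  echo "collect from ${host} into ${path}"
}

collect_for_factory f02
```

Rolling out a new factory then means adding one row to the table, not writing a new workflow, which is what makes same-day deployment across dozens of sites realistic.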

&lt;p&gt;This template-based solution brings several significant advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parameterized configuration: the core process is standardized as a template, while factory-specific parameters such as IP addresses, accounts, and paths are configured separately.&lt;/li&gt;
&lt;li&gt;Batch deployment: enterprises can complete deployment across dozens of factories within one day, greatly improving efficiency.&lt;/li&gt;
&lt;li&gt;Unified iteration: when templates are updated, all factories are synchronized automatically, with no need for manual adjustments.&lt;/li&gt;
&lt;li&gt;Flexible extensibility: template version management allows customized templates to be derived for individual factories from a base template, for example when some factories require additional data fields.&lt;/li&gt;
&lt;li&gt;Cross-scenario support: both “multi-factory data collection to headquarters” and “headquarters data distribution to multiple factories,” such as unified production plan distribution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A Qualitative Leap: From Manual Workshop to Industrial Pipeline
&lt;/h2&gt;

&lt;p&gt;After introducing Apache DolphinScheduler, the enterprise achieved a qualitative leap in data processing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Traditional Coding&lt;/th&gt;
&lt;th&gt;Apache DolphinScheduler&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Development Efficiency&lt;/td&gt;
&lt;td&gt;Requires writing data processing logic, exception handling, retry logic, etc.; high human effort&lt;/td&gt;
&lt;td&gt;Drag-and-drop configuration, built-in components and plugins, development completed in minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency Management&lt;/td&gt;
&lt;td&gt;Difficult to handle complex task dependencies; prone to issues such as missing or inconsistent dependencies&lt;/td&gt;
&lt;td&gt;Visual DAG-based workflow orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring &amp;amp; Alerting&lt;/td&gt;
&lt;td&gt;Requires custom development of monitoring or logging, leading to lagging issue detection&lt;/td&gt;
&lt;td&gt;Built-in monitoring, real-time task execution status, logs, and alert notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fault Tolerance &amp;amp; Retry&lt;/td&gt;
&lt;td&gt;Requires manual modification of code/scripts; complex recovery process&lt;/td&gt;
&lt;td&gt;One-click retry/stop; built-in fault-tolerant retry mechanisms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Scheduling&lt;/td&gt;
&lt;td&gt;Lacks unified management; prone to CPU/memory contention and uneven resource allocation&lt;/td&gt;
&lt;td&gt;Distributed, centralized resource management; dynamic scaling via integration with compute engines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In the traditional approach, developers needed to write code for data connections, exception handling, and retry logic modules.&lt;br&gt;
This required significant human effort.&lt;/p&gt;

&lt;p&gt;In contrast, Apache DolphinScheduler uses a drag-and-drop configuration approach.&lt;br&gt;
It comes with numerous built-in plugins.&lt;br&gt;
Development tasks can be completed within minutes.&lt;/p&gt;

&lt;p&gt;In terms of dependency management, traditional approaches struggle to handle complex cross-system scheduling; issues such as idempotency and consistency must be handled by hand, which makes the process error-prone.&lt;/p&gt;

&lt;p&gt;In contrast, Apache DolphinScheduler provides intuitive and convenient visual DAG operations.&lt;/p&gt;

&lt;p&gt;The improvement in monitoring and alerting capabilities is particularly significant. Traditional approaches require developers to write monitoring scripts or manually check logs, which delays fault detection and resolution.&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler ships with built-in monitoring: it supports real-time viewing of task execution status and logs, and it integrates with multiple alerting channels such as WeCom, DingTalk, and email.&lt;/p&gt;

&lt;p&gt;In terms of fault tolerance and recovery, traditional approaches require manual modification of code and scripts, and data recovery logic is complex. Apache DolphinScheduler provides one-click rerun and stop functions, along with built-in automatic retry on failure.&lt;/p&gt;

&lt;p&gt;Resource scheduling capabilities are also greatly improved. Traditional approaches lack unified resource management, which often overloads the CPU and memory of single machines and causes crashes, while ad hoc distributed setups consume significant resources of their own.&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler adopts a distributed, decentralized cluster management architecture. It supports rapid dynamic scaling driven by monitoring and enables fine-grained resource management.&lt;/p&gt;

&lt;p&gt;These improvements bring real value at multiple levels.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Development&lt;/th&gt;
&lt;th&gt;Business&lt;/th&gt;
&lt;th&gt;Decision Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Drag-and-drop development&lt;/td&gt;
&lt;td&gt;1. Visualized monitoring&lt;/td&gt;
&lt;td&gt;1. De-personalization (processes not dependent on individuals)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Automated parameterization&lt;/td&gt;
&lt;td&gt;2. Alert assurance&lt;/td&gt;
&lt;td&gt;2. Operation auditing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Log-based issue localization&lt;/td&gt;
&lt;td&gt;3. Flexible parameters&lt;/td&gt;
&lt;td&gt;3. Data security (centralized data configuration)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Low O&amp;amp;M cost&lt;/td&gt;
&lt;td&gt;4. Cross-system orchestration&lt;/td&gt;
&lt;td&gt;4. Elimination of black-box operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;5. Reduced development dependencies&lt;/td&gt;
&lt;td&gt;5. Resource utilization &amp;amp; measurability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At the development level, drag-and-drop workflows lower the technical barrier, parameter automation improves development efficiency, second-level log tracing shortens troubleshooting time, and operational costs drop significantly.&lt;/p&gt;

&lt;p&gt;At the business level, visual monitoring provides a clear view of task status, multi-channel alerting ensures timely response to issues, and flexible data recovery strategies handle various anomalies. Cross-system coordination enables unified data flow management and reduces dependence on individual developers.&lt;/p&gt;

&lt;p&gt;At the decision-making level, knowledge is no longer tied to individuals but becomes an organizational asset. Complete audit logs meet compliance requirements, centralized database configuration reduces security risks, transparent workflows make management and optimization easier, and quantified resource usage supports refined decision-making.&lt;/p&gt;

&lt;p&gt;Together, these values form a solid foundation for enterprise digital transformation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results and Future Outlook
&lt;/h2&gt;

&lt;p&gt;Through the practical application of Apache DolphinScheduler, this intelligent manufacturing enterprise has achieved significant improvements across multiple dimensions.&lt;/p&gt;

&lt;p&gt;These include improved development efficiency, shortened deployment cycles, significantly reduced operational costs and manpower, and greatly increased task success rates.&lt;/p&gt;

&lt;p&gt;At the same time, the system supports rapid scaling: new factories can be brought online within one day, with standardized processes, transparent management, and data-driven decision-making.&lt;/p&gt;

&lt;p&gt;Looking ahead, as intelligent manufacturing continues to advance, data scheduling will play an increasingly important role.&lt;/p&gt;

&lt;p&gt;As an open-source project, Apache DolphinScheduler will continue to evolve in multiple directions.&lt;/p&gt;

&lt;p&gt;In terms of AI enablement, it will introduce AI capabilities to achieve intelligent scheduling and predictive maintenance.&lt;/p&gt;

&lt;p&gt;In terms of cloud-native architecture, it will deeply adapt to cloud-native environments to improve elasticity and scalability.&lt;/p&gt;

&lt;p&gt;In terms of ecosystem expansion, it will enrich the plugin ecosystem to cover more business scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the journey of intelligent manufacturing, data scheduling is not the destination, but the starting point.&lt;/p&gt;

&lt;p&gt;Apache DolphinScheduler helps enterprises solve the “last mile” problem of data processing.&lt;br&gt;
It allows enterprises to focus more on business innovation and value creation.&lt;/p&gt;

&lt;p&gt;The road to digital transformation is long and challenging.&lt;br&gt;
But with persistence, progress will be made.&lt;/p&gt;

&lt;p&gt;May more manufacturing enterprises leverage the power of open source to achieve a transformation from “manufacturing” to “intelligent manufacturing.”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>apachedolphinscheduler</category>
    </item>
    <item>
      <title>Part 8 | Boundaries, Collaboration, and Best Practices Between Apache DolphinScheduler and Flink &amp; Spark</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 17 Apr 2026 07:09:24 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-8-boundaries-collaboration-and-best-practices-between-apache-dolphinscheduler-and-flink--39n2</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-8-boundaries-collaboration-and-best-practices-between-apache-dolphinscheduler-and-flink--39n2</guid>
      <description>&lt;p&gt;In the continuous evolution of data platforms, a very common yet subtle misconception is that teams unconsciously allow the scheduling system to take on more and more responsibilities that do not belong to it, such as writing complex business logic in the scheduling layer, controlling computation parameters, and even attempting to centrally manage execution details across different computing engines.&lt;/p&gt;

&lt;p&gt;In the short term, this may seem to improve efficiency, but in the long run, such a design often makes the system highly coupled, difficult to maintain, and even causes it to lose stability as scale increases.&lt;/p&gt;

&lt;p&gt;Therefore, before discussing specific practices, we must first clarify one thing: the boundary between the scheduling system and data engines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Responsibilities and Boundaries Between the Scheduler and Data Engines
&lt;/h3&gt;

&lt;p&gt;To understand how the entire system operates, it is helpful to remember a very core principle: the scheduling system is only responsible for “when to run” and “dependency relationships,” while “how to compute” must be left to execution engines such as Spark, Flink, or SeaTunnel.&lt;/p&gt;

&lt;p&gt;In other words, DolphinScheduler is the orchestrator of workflows, not the executor of computation.&lt;/p&gt;

&lt;p&gt;From an engineering perspective, this division of responsibilities can be clearly expressed in the following table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Core Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DolphinScheduler&lt;/td&gt;
&lt;td&gt;DAG orchestration, task scheduling, dependency management, failure retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark&lt;/td&gt;
&lt;td&gt;Offline batch processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flink&lt;/td&gt;
&lt;td&gt;Real-time stream processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SeaTunnel&lt;/td&gt;
&lt;td&gt;Data integration (batch / streaming / CDC)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In actual development, the place where this boundary is most easily broken is often the Shell task.&lt;/p&gt;

&lt;p&gt;Many people are accustomed to writing complex branching logic in a single node, for example, deciding which Spark job to execute based on the date:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$day&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"2026-04-01"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;spark-submit job_a.py
&lt;span class="k"&gt;else
  &lt;/span&gt;spark-submit job_b.py
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although this approach “works,” it brings three problems: first, the logic is hidden inside the script and is invisible to the DAG; second, dependency relationships are no longer explicit, undermining the scheduler’s visualization capability; third, maintenance and troubleshooting costs grow significantly over time.&lt;/p&gt;

&lt;p&gt;A more reasonable approach is to explicitly model the branching logic in the workflow and control the execution path through conditional nodes, so that the entire process is visible and controllable in the UI.&lt;/p&gt;
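One way to surface the decision to the DAG is to let a small upstream Shell task emit only the branch choice as an output parameter, and let a Switch/Conditions node route on it, so each spark-submit lives in its own visible task. A sketch, assuming DolphinScheduler's `${setValue(...)}` output-parameter syntax for Shell tasks:

```shell
# Upstream "decide" task: compute the branch and emit it as an output
# parameter; the actual spark-submit calls live in separate DAG nodes.
day="2026-04-01"   # in a real workflow this would come from ${biz_date}

if [ "$day" = "2026-04-01" ]; then
  branch="job_a"
else
  branch="job_b"
fi

# DolphinScheduler parses this marker from stdout and exposes `branch`
# to downstream Switch conditions (syntax assumed from the Shell task docs).
echo "\${setValue(branch=${branch})}"
```

The branching itself then appears as edges in the DAG rather than as an `if` buried in one node, which is exactly the visibility the scheduling layer is supposed to provide.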

&lt;h3&gt;
  
  
  Differences in Scheduling Between Batch, Streaming, and CDC
&lt;/h3&gt;

&lt;p&gt;With the boundaries clear, a look at how different types of tasks are scheduled reveals that they are essentially three completely different models, rather than simple variations of the same scheduling logic.&lt;/p&gt;

&lt;p&gt;First is batch processing, which is the type of scenario that best fits the traditional scheduling model, such as T+1 tasks in a data warehouse or aggregation computations running hourly.&lt;/p&gt;

&lt;p&gt;Such tasks have clear time windows and well-defined upstream and downstream dependencies, making them very suitable to be expressed through DAGs.&lt;/p&gt;

&lt;p&gt;In practice, they are usually split into layers such as ODS, DWD, and DWS, with each layer corresponding to one or more independent tasks, and driven by parameters (such as ${biz_date}).&lt;/p&gt;

&lt;p&gt;For example, a typical Spark submission method is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;spark-submit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--class&lt;/span&gt; com.example.ETLJob &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--master&lt;/span&gt; yarn &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deploy-mode&lt;/span&gt; cluster &lt;span class="se"&gt;\&lt;/span&gt;
  etl-job.jar &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--date&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;biz_date&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this process, the responsibility of the scheduling system is to connect task relationships, control execution order, and handle failure retries, rather than diving into the specific computation logic.&lt;/p&gt;

&lt;p&gt;In contrast to batch processing, streaming tasks are fundamentally “continuously running,” rather than “periodically triggered.”&lt;/p&gt;

&lt;p&gt;If a scheduling system is used to start a Flink job every few minutes, it is essentially solving the problem in the wrong way.&lt;/p&gt;

&lt;p&gt;A well-designed streaming task should rely on Flink’s own state management and checkpoint mechanism to run continuously, while DolphinScheduler plays more of a “guardian” role, responsible for initial startup, status detection, and exception recovery, rather than frequent intervention.&lt;/p&gt;
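A "guardian" task of this kind boils down to a status check the scheduler runs periodically: query the engine (for example Flink's REST API, e.g. `GET /jobs/overview`) and restart only if no running instance exists. A minimal sketch; the job name and the canned JSON are illustrative, and a real check would parse the response properly:

```shell
# Decide whether a long-running Flink job needs a restart, based on the
# JSON returned by the cluster's REST API. Here the JSON is a canned sample.
needs_restart() {
  # Returns success (0, "restart needed") unless a RUNNING instance exists.
  local jobs_json="$1" job_name="$2"
  echo "$jobs_json" | grep -q "\"name\": \"${job_name}\".*\"state\": \"RUNNING\"" && return 1
  return 0
}

sample='[{"name": "user_metrics", "state": "RUNNING"}]'
if needs_restart "$sample" "user_metrics"; then
  echo "restart user_metrics"
else
  echo "user_metrics healthy"
fi
```

Scheduled every few minutes, a check like this gives DolphinScheduler its guardian role without ever touching the job's computation or state, which stays entirely inside Flink.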

&lt;p&gt;Looking further at CDC scenarios: CDC is essentially a form of streaming processing as well, but one oriented toward data integration, which is exactly a typical application scenario for SeaTunnel.&lt;/p&gt;

&lt;p&gt;Through SeaTunnel, it is very convenient to implement real-time synchronization from databases to message queues, for example, from MySQL to Kafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hocon"&gt;&lt;code&gt;&lt;span class="nl"&gt;env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;execution.parallelism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;MySQL-CDC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;hostname&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3306&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;username&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;password&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123456"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;database-names&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"test_db"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;table-names&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"test_db.user"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nl"&gt;sink&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;Kafka&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;topic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_cdc"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;bootstrap.servers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"localhost:9092"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The corresponding startup command is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./bin/seatunnel.sh &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--config&lt;/span&gt; config/mysql_cdc.conf &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the scheduling level, the principle of CDC is consistent with streaming processing: start once, run continuously, and ensure stability through status detection mechanisms, rather than repeatedly triggering through periodic scheduling.&lt;/p&gt;

&lt;p&gt;From this perspective, the core difference between batch processing, streaming processing, and CDC actually lies in whether it needs to be repeatedly scheduled.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Scheduling System Should Not Intrude into the Execution Engine
&lt;/h3&gt;

&lt;p&gt;As the system gradually scales, a deeper question will emerge: why do we repeatedly emphasize that the scheduling system should remain “restrained”?&lt;/p&gt;

&lt;p&gt;The reason is that once the scheduling system begins to intrude into the responsibility scope of the execution engine, the controllability of the entire architecture will rapidly decline.&lt;/p&gt;

&lt;p&gt;For example, directly writing Spark resource parameters in the scheduling script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;spark-submit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--executor-memory&lt;/span&gt; 8G &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--conf&lt;/span&gt; spark.sql.shuffle.partitions&lt;span class="o"&gt;=&lt;/span&gt;500 &lt;span class="se"&gt;\&lt;/span&gt;
  job.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem with this approach is that it hardcodes execution-layer configurations into the scheduling layer, making parameter management scattered and difficult to unify.&lt;/p&gt;

&lt;p&gt;Once resource configurations need to be adjusted, the scheduling task must be modified, or even the workflow must be redeployed.&lt;/p&gt;

&lt;p&gt;A more reasonable approach is to place these parameters in the Spark configuration center or manage them within the job itself, allowing DolphinScheduler to only be responsible for triggering execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;spark-submit job.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This decoupling approach can significantly improve system maintainability, allowing each layer to focus on its own responsibilities.&lt;/p&gt;

&lt;p&gt;From an overall architectural perspective, a mature data platform can usually be abstracted into a three-layer structure: the top layer is the scheduling layer represented by DolphinScheduler, responsible for workflow orchestration; the middle layer is the execution layer represented by Spark, Flink, and SeaTunnel, responsible for specific computation and data processing; and the bottom layer is the resource layer such as YARN or Kubernetes, responsible for resource allocation and isolation.&lt;/p&gt;

&lt;p&gt;Only when the boundaries of these three layers are clear can the system maintain stability as complexity increases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbtab9nbdtooajy14g2o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbtab9nbdtooajy14g2o.jpg" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  A Practical Architecture Example Integrating SeaTunnel
&lt;/h3&gt;

&lt;p&gt;In real production environments, this layered thinking is usually reflected in complete data pipelines.&lt;/p&gt;

&lt;p&gt;For example, SeaTunnel can be used to implement CDC from MySQL to Kafka to synchronize real-time data; then Flink performs real-time computation to produce online metrics; at the same time, the data is landed into storage systems, and then Spark completes offline data warehouse processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbk5rlvjjtsmq1tiuazm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbk5rlvjjtsmq1tiuazm.jpg" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this process, DolphinScheduler is responsible for unified orchestration of these tasks, including starting CDC, monitoring streaming tasks, and scheduling offline computations.&lt;/p&gt;

&lt;p&gt;From a process perspective, it can be abstracted into a clear data link: data enters from the source, goes through SeaTunnel into the real-time channel, is processed by Flink to serve online systems, is simultaneously written into storage, and then processed by Spark for layered transformation, while DolphinScheduler always acts as the “central hub,” coordinating execution order and dependency relationships across all stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary: Let the System Return to “Each Doing Its Own Job”
&lt;/h3&gt;

&lt;p&gt;Returning to the original question, the design principle of the entire system can actually be summarized in one sentence: DolphinScheduler is the “brain,” while Spark, Flink, and SeaTunnel are the “muscles.”&lt;/p&gt;

&lt;p&gt;The scheduling system is responsible for decision-making and orchestration, while the execution engines are responsible for specific computation and processing.&lt;/p&gt;

&lt;p&gt;In practical implementation, it can be further summarized into three simple but very critical principles: first, all process logic must be reflected in the DAG, rather than hidden in scripts; second, all computation logic must be pushed down into the execution engines to avoid expansion of the scheduling layer; third, streaming processing and CDC tasks must be designed based on “long-running” operation, rather than being scheduled repeatedly in a batch-processing manner.&lt;/p&gt;

&lt;p&gt;When these three points are strictly followed, the data platform can evolve from “just able to run” to “stable, scalable, and governable,” which is also a key step from engineering to systematic architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous articles:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-ffeaf159f534" rel="noopener noreferrer"&gt;Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-1c97d8d0327e" rel="noopener noreferrer"&gt;Part 7 | Where Scheduling Systems Really Break and the Hidden Bottlenecks Beyond CPU and Scale&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next: From Scheduling to DataOps: DolphinScheduler as the Control Plane&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>apachedolphinscheduler</category>
      <category>spark</category>
    </item>
    <item>
      <title>Part 7 | Where Scheduling Systems Really Break and the Hidden Bottlenecks Beyond CPU and Scale</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:58:17 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-lgj</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-lgj</guid>
      <description>&lt;p&gt;In production environments, performance issues in a scheduling platform are never caused by a single bottleneck. Instead, they arise from the combined effects of scheduling decisions, task execution, metadata storage, and coordination mechanisms. Taking Apache DolphinScheduler as an example, focusing on just one component, such as the Master or Worker, often leads to misidentifying the root cause.&lt;/p&gt;

&lt;p&gt;This article is based on real-world production experience. It systematically breaks down performance bottlenecks in a scheduling platform and provides practical, actionable optimization strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. From the overall architecture, where exactly are the bottlenecks?
&lt;/h2&gt;

&lt;p&gt;The core workflow of DolphinScheduler can be abstracted as:&lt;/p&gt;

&lt;p&gt;Scheduling → Execution → Storage → Coordination&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hcoleuc4r1dorc1ym9m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hcoleuc4r1dorc1ym9m.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any layer can become a bottleneck, but the most common issues are concentrated in four areas:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Insufficient scheduling throughput on the Master&lt;/li&gt;
&lt;li&gt;Mismatch between Worker execution capacity and workload&lt;/li&gt;
&lt;li&gt;Excessive pressure on the database (MySQL/PostgreSQL)&lt;/li&gt;
&lt;li&gt;Latency or instability in ZooKeeper (coordination layer)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  2. The Master bottleneck is not CPU, but the “scheduling model”
&lt;/h2&gt;

&lt;p&gt;Many assume the Master’s CPU is the issue. In practice, the real bottleneck is the combination of the scheduling model and database I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Scheduling mechanism
&lt;/h3&gt;

&lt;p&gt;The Master’s core loop looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MasterSchedulerService.java&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ProcessInstance&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findNeedScheduleProcessInstances&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ProcessInstance&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;submitProcessInstance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a polling + database-driven model. The key limitation is that scheduling capacity is directly tied to database throughput.&lt;/p&gt;
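&lt;p&gt;As a back-of-envelope illustration of that coupling (a hypothetical model, not DolphinScheduler code; all names and numbers are assumptions), the dispatch rate is capped by the sustainable database QPS divided by the round-trips each dispatched instance costs:&lt;/p&gt;

```java
// Illustrative capacity model: scheduling throughput is bounded by DB I/O,
// not CPU. Names and numbers here are assumptions, not project code.
public class PollingCapacity {

    // Max instances dispatched per second, given sustainable DB queries/sec
    // and the DB round-trips spent per dispatched instance.
    public static double maxDispatchRate(double dbQps, double queriesPerDispatch) {
        return dbQps / queriesPerDispatch;
    }

    public static void main(String[] args) {
        // e.g. a DB sustaining 2000 QPS, ~4 round-trips per dispatch
        // (scan share + state update + task insert + queue write)
        System.out.println(maxDispatchRate(2000, 4)); // prints 500.0
    }
}
```

&lt;p&gt;Under these assumptions, adding Masters cannot push past 500 dispatches per second; only cutting round-trips per dispatch or raising database capacity moves the ceiling.&lt;/p&gt;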

&lt;h3&gt;
  
  
  2.2 Typical symptoms
&lt;/h3&gt;

&lt;p&gt;High scheduling latency:&lt;/p&gt;

&lt;p&gt;Tasks are ready but delayed by tens of seconds before execution, while Master CPU usage remains low and database QPS is high.&lt;/p&gt;

&lt;p&gt;Low throughput:&lt;/p&gt;

&lt;p&gt;The system may only schedule a few hundred tasks per minute, and adding more Masters yields limited improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Optimization strategies
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Reduce database scanning pressure
&lt;/h4&gt;

&lt;p&gt;Typical SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t_ds_process_instance&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optimization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_state_priority_time&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t_ds_process_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additional measures include limiting scan batch sizes and tuning scheduling intervals to avoid excessive polling.&lt;/p&gt;
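&lt;p&gt;As a sketch, the scan batch size is capped in the Master configuration (key names vary across DolphinScheduler versions; verify against your &lt;code&gt;application.yaml&lt;/code&gt;):&lt;/p&gt;

```yaml
master:
  # how many commands each polling cycle pulls from the database
  fetch-command-num: 10
```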

&lt;h4&gt;
  
  
  Increase scheduling concurrency
&lt;/h4&gt;

&lt;p&gt;Key configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
  &lt;span class="na"&gt;dispatch-task-number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Practical guidelines:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;exec-threads&lt;/code&gt; should be roughly 2 to 4 times the number of CPU cores.&lt;br&gt;
&lt;code&gt;dispatch-task-number&lt;/code&gt; should be kept modest so that Workers are not overwhelmed.&lt;/p&gt;
&lt;h4&gt;
  
  
  Scale out Masters
&lt;/h4&gt;

&lt;p&gt;DolphinScheduler supports multiple Masters, but scaling is not linear due to shared database bottlenecks and ZooKeeper coordination overhead.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. More Workers is not always better
&lt;/h2&gt;

&lt;p&gt;Adding more Workers blindly can overload the database and worsen queuing.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.1 Worker configuration
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Workers act as both execution units and resource isolation boundaries.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.2 Estimation formula
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker count ≈ Total concurrent tasks / Per-Worker concurrency
Per-Worker concurrency ≈ CPU cores × (2 to 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3.3 Example
&lt;/h3&gt;

&lt;p&gt;For 1,000 concurrent tasks and 16-core Workers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Per Worker ≈ 32 to 64 concurrent tasks
Required Workers ≈ 1000 / 50 ≈ 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
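&lt;p&gt;The sizing rule above can be written out directly (the 2 to 4 per-core factor is a heuristic; 3 is used here as a midpoint assumption):&lt;/p&gt;

```java
// Sketch of the Worker sizing heuristic; the factor and task counts
// are assumptions to be tuned per workload.
public class WorkerSizing {

    // Per-Worker concurrency = CPU cores x heuristic factor (2 to 4)
    public static int perWorkerConcurrency(int cpuCores, int factor) {
        return cpuCores * factor;
    }

    // Worker count = total concurrent tasks / per-Worker concurrency, rounded up
    public static int requiredWorkers(int totalConcurrentTasks, int perWorker) {
        return (int) Math.ceil((double) totalConcurrentTasks / perWorker);
    }

    public static void main(String[] args) {
        int perWorker = perWorkerConcurrency(16, 3); // 48 concurrent tasks
        System.out.println(requiredWorkers(1000, perWorker)); // prints 21
    }
}
```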



&lt;h3&gt;
  
  
  3.4 Task type matters more
&lt;/h3&gt;

&lt;p&gt;Short tasks (&amp;lt;5 seconds):&lt;/p&gt;

&lt;p&gt;Scheduling overhead exceeds execution time, making the Master the bottleneck.&lt;/p&gt;

&lt;p&gt;Long tasks (&amp;gt;10 minutes):&lt;/p&gt;

&lt;p&gt;Workers become resource bottlenecks due to long occupation time.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Different strategies for short and long tasks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Short tasks optimization
&lt;/h3&gt;

&lt;p&gt;Typical scenarios include SQL queries and API calls.&lt;/p&gt;

&lt;p&gt;Batching example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Before: multiple small queries&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- After: batch query&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other strategies include coarsening DAG granularity (merging many tiny tasks into fewer, larger ones) and moving tight loops into scripts rather than modeling each iteration as a separate task.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Long tasks optimization
&lt;/h3&gt;

&lt;p&gt;Typical scenarios include Spark or Flink jobs.&lt;/p&gt;

&lt;p&gt;The bottleneck lies in resource systems rather than the scheduler.&lt;/p&gt;

&lt;p&gt;Strategies:&lt;/p&gt;

&lt;p&gt;Bind workloads to YARN queues or Kubernetes namespaces and enforce concurrency limits.&lt;/p&gt;
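&lt;p&gt;For YARN, this binding is typically done at submission time (the queue name and limits below are examples, not defaults):&lt;/p&gt;

```shell
# Pin a long-running Spark job to a dedicated YARN queue and cap its
# executors so it cannot starve other workloads.
spark-submit \
  --master yarn \
  --queue long_running_etl \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  etl-job.jar
```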

&lt;h2&gt;
  
  
  5. The database bottleneck is the most underestimated
&lt;/h2&gt;

&lt;p&gt;Around 80% of production performance issues ultimately relate to the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Common problems
&lt;/h3&gt;

&lt;p&gt;Slow queries&lt;br&gt;
Row-level lock contention&lt;br&gt;
Connection pool exhaustion&lt;/p&gt;
&lt;h3&gt;
  
  
  5.2 Typical SQL
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;t_ds_task_instance&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'RUNNING'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Frequent updates to the same rows lead to lock contention and reduced throughput.&lt;/p&gt;
&lt;h3&gt;
  
  
  5.3 Optimization strategies
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Read-write separation
&lt;/h4&gt;

&lt;p&gt;Masters handle writes, while APIs and queries use read replicas.&lt;/p&gt;
&lt;h4&gt;
  
  
  Reduce update frequency
&lt;/h4&gt;

&lt;p&gt;Inefficient pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RUNNING → RUNNING → RUNNING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optimization:&lt;/p&gt;

&lt;p&gt;Reduce the heartbeat frequency so that unchanged states are not rewritten on every tick.&lt;/p&gt;
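&lt;p&gt;For example (key names differ between DolphinScheduler versions; treat this as a sketch and confirm against your configuration):&lt;/p&gt;

```yaml
worker:
  # report liveness/state less often to cut redundant writes
  heartbeat-interval: 10s
```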

&lt;h4&gt;
  
  
  Batch updates
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Batch update task states&lt;/span&gt;
&lt;span class="n"&gt;updateBatch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;taskInstances&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
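&lt;p&gt;At the SQL level, such a batch can collapse many single-row updates into one statement (the IDs and states below are illustrative):&lt;/p&gt;

```sql
-- One round-trip and one locking pass instead of three separate updates
UPDATE t_ds_task_instance
SET state = CASE id
    WHEN 101 THEN 'SUCCESS'
    WHEN 102 THEN 'FAILURE'
    WHEN 103 THEN 'SUCCESS'
END
WHERE id IN (101, 102, 103);
```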



&lt;h2&gt;
  
  
  6. ZooKeeper as a hidden bottleneck
&lt;/h2&gt;

&lt;p&gt;ZooKeeper is responsible for coordination, including Master election, Worker registration, and heartbeat management.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Common symptoms
&lt;/h3&gt;

&lt;p&gt;Scheduling jitter under high load&lt;br&gt;
Workers falsely marked as dead&lt;br&gt;
Frequent Master failovers&lt;/p&gt;
&lt;h3&gt;
  
  
  6.2 Root causes
&lt;/h3&gt;

&lt;p&gt;Improper session timeout settings&lt;br&gt;
Too many nodes and connections&lt;br&gt;
Network instability&lt;/p&gt;
&lt;h3&gt;
  
  
  6.3 Optimization
&lt;/h3&gt;

&lt;p&gt;Example configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;tickTime&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;span class="py"&gt;initLimit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;span class="py"&gt;syncLimit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recommendations:&lt;/p&gt;

&lt;p&gt;Increase session timeout to at least 20 seconds to tolerate transient failures.&lt;br&gt;
Deploy ZooKeeper independently to avoid resource contention.&lt;/p&gt;
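&lt;p&gt;In DolphinScheduler, the session timeout lives in the registry section of the configuration (a sketch; confirm the exact keys for your version):&lt;/p&gt;

```yaml
registry:
  type: zookeeper
  zookeeper:
    # at least 20s, per the recommendation above
    session-timeout: 30s
```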
&lt;h2&gt;
  
  
  7. A real-world optimization case
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Background
&lt;/h3&gt;

&lt;p&gt;Daily tasks: 200,000&lt;br&gt;
DAGs: 30,000&lt;br&gt;
Masters: 2&lt;br&gt;
Workers: 30&lt;/p&gt;
&lt;h3&gt;
  
  
  Issues
&lt;/h3&gt;

&lt;p&gt;Scheduling latency exceeded 1 minute during peak hours&lt;br&gt;
Database CPU usage reached 90 percent&lt;/p&gt;
&lt;h3&gt;
  
  
  Optimization process
&lt;/h3&gt;

&lt;p&gt;Step 1: Database indexing&lt;br&gt;
Result: latency reduced by 40 percent&lt;/p&gt;

&lt;p&gt;Step 2: Reduce short tasks&lt;br&gt;
Result: DAG count reduced by 30 percent&lt;/p&gt;

&lt;p&gt;Step 3: Adjust Master parameters&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50 → &lt;/span&gt;&lt;span class="m"&gt;120&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: throughput doubled&lt;/p&gt;

&lt;h3&gt;
  
  
  Final results
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduling latency reduced from 60 seconds to 8 seconds&lt;/li&gt;
&lt;li&gt;Database CPU usage reduced from 90 percent to 50 percent&lt;/li&gt;
&lt;li&gt;Overall throughput improved by 2 to 3 times&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Summary: the essence of scheduling performance optimization
&lt;/h2&gt;

&lt;p&gt;The core insight is that performance is a balance of:&lt;/p&gt;

&lt;p&gt;Scheduling capacity × Execution capacity × Storage capacity × Coordination capability&lt;/p&gt;

&lt;p&gt;Optimization must be holistic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Master controls the scheduling rhythm&lt;/li&gt;
&lt;li&gt;Workers provide execution capacity&lt;/li&gt;
&lt;li&gt;The database defines system limits&lt;/li&gt;
&lt;li&gt;ZooKeeper ensures coordination stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately:&lt;/p&gt;

&lt;p&gt;The limit of a scheduling system is not how many tasks it can dispatch, but how much load its database can sustain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous articles:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-ffeaf159f534" rel="noopener noreferrer"&gt;Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next: The boundaries between DolphinScheduler and Flink, Spark, and SeaTunnel&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>scheduling</category>
      <category>apachedolphinscheduler</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Can Your Scheduler Fix Itself at 2 AM? Inside the DolphinScheduler Agent Meetup</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 10:18:14 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/can-your-scheduler-fix-itself-at-2-am-inside-the-dolphinscheduler-agent-meetup-3ae0</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/can-your-scheduler-fix-itself-at-2-am-inside-the-dolphinscheduler-agent-meetup-3ae0</guid>
      <description>&lt;p&gt;If you’ve ever worked with scheduling systems, you’ve probably had moments like this:&lt;/p&gt;

&lt;p&gt;At 2 AM, your phone suddenly lights up.&lt;br&gt;
Not a message—an alert. A job has failed.&lt;/p&gt;

&lt;p&gt;You stare at the screen, with only one thought in your head:&lt;br&gt;
&lt;strong&gt;“Can it just fix itself?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It sounds a bit idealistic.&lt;br&gt;
But this time, we actually want to take it seriously.&lt;/p&gt;

&lt;p&gt;Soon, the Apache DolphinScheduler community will host a new online Meetup.&lt;/p&gt;

&lt;p&gt;This time, we won’t dive into grand architectures or complex theories.&lt;br&gt;
Instead, we’ll start with a very “engineer-like” question:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Can a scheduling system require less human effort?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  📅 &lt;strong&gt;Event Info&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt;: April 21, 2026, 14:00–15:00&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format&lt;/strong&gt;: Online livestream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register your seat now:&lt;/strong&gt; &lt;a href="https://meeting.tencent.com/dm/sdXKjKfLewVe" rel="noopener noreferrer"&gt;https://meeting.tencent.com/dm/sdXKjKfLewVe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎤 &lt;strong&gt;Who’s Speaking?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv37xjqy2ier6myjfztsi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv37xjqy2ier6myjfztsi.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This session features &lt;strong&gt;Liu Xiaodong&lt;/strong&gt;,&lt;br&gt;
an algorithm engineer from Shanghai FamilyMart Co., Ltd.&lt;/p&gt;

&lt;p&gt;His self-introduction is quite fun:&lt;/p&gt;

&lt;p&gt;Not limited to one direction—he tinkers with everything.&lt;br&gt;
Writes code, builds systems, explores new ideas.&lt;br&gt;
And occasionally “wanders around Hyrule to discover new landscapes.”&lt;/p&gt;

&lt;p&gt;Sounds like this won’t be a conventional talk.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 &lt;strong&gt;What’s the Topic?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The topic is simple yet vivid:&lt;br&gt;
&lt;strong&gt;“DolphinScheduler Agent: I Just Want to Lie Down and Still Get Work Done”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It starts from a very real idea:&lt;/p&gt;

&lt;p&gt;The dream state of a “lazy engineer” is:&lt;br&gt;
When something breaks, the system detects and fixes it automatically.&lt;br&gt;
Humans just take a glance and say a word—everything else is handled.&lt;/p&gt;

&lt;p&gt;Sounds exaggerated?&lt;/p&gt;

&lt;p&gt;This talk will explore:&lt;br&gt;
👉 &lt;strong&gt;How far can we actually go in this direction?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 &lt;strong&gt;What Will You Learn?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is not a purely conceptual talk, but an &lt;strong&gt;ongoing exploration&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The design of DolphinScheduler Agent&lt;/li&gt;
&lt;li&gt;How to make scheduling systems more “self-healing”&lt;/li&gt;
&lt;li&gt;Real-world attempts and lessons learned&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;working demo&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than giving standard answers, it’s more like:&lt;br&gt;
&lt;strong&gt;a journey recap + new ways of thinking&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🎁 &lt;strong&gt;Bonus&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There will also be a &lt;strong&gt;lucky draw&lt;/strong&gt; during the livestream 🎉&lt;/p&gt;

&lt;p&gt;You might even win a custom Apache DolphinScheduler keychain—&lt;br&gt;
a must-have for community members!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jcki55c7hnkop7heatr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jcki55c7hnkop7heatr.png" alt="DS 钥匙扣" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  👀 &lt;strong&gt;Who Should Join?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This Meetup is for you if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re using or exploring DolphinScheduler&lt;/li&gt;
&lt;li&gt;You’re interested in automation, agents, or intelligent operations&lt;/li&gt;
&lt;li&gt;You want to see real demos, not just slides&lt;/li&gt;
&lt;li&gt;Or you simply want to “work less” in a smarter way&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📢 &lt;strong&gt;Final Thought&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We’re used to fixing problems when they occur.&lt;br&gt;
But rarely do we ask:&lt;br&gt;
&lt;strong&gt;Can systems prevent problems—or even solve them on their own?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Maybe that’s the next step for scheduling systems.&lt;/p&gt;

&lt;p&gt;📅 April 21&lt;br&gt;
Let’s talk about building systems that are a little less exhausting.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>apachedolphinscheduler</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Apache DolphinScheduler Local Setup Made Simple: A Beginner-Friendly Guide</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 10:08:09 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-local-setup-made-simple-a-beginner-friendly-guide-108e</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-local-setup-made-simple-a-beginner-friendly-guide-108e</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm86el1td22eufuncrqu7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm86el1td22eufuncrqu7.jpg" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is intended for developers who want to read and debug the core source code of Apache DolphinScheduler locally. The example environment is based on &lt;code&gt;Windows + IntelliJ IDEA + Docker Desktop + PostgreSQL + ZooKeeper&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you only want to quickly try out features rather than debug the full chain of &lt;code&gt;master / worker / api&lt;/code&gt;, it is recommended to use &lt;code&gt;StandaloneServer&lt;/code&gt; first. If you want to debug the distributed scheduling workflow, follow this guide to start services separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Use Cases&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start &lt;code&gt;MasterServer&lt;/code&gt;, &lt;code&gt;WorkerServer&lt;/code&gt;, and &lt;code&gt;ApiApplicationServer&lt;/code&gt; individually in IntelliJ IDEA&lt;/li&gt;
&lt;li&gt;Use Docker Desktop to host PostgreSQL and ZooKeeper&lt;/li&gt;
&lt;li&gt;Debug Java services locally on the host machine&lt;/li&gt;
&lt;li&gt;Run the frontend locally and connect it to backend APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Environment Requirements&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Docker Desktop&lt;/li&gt;
&lt;li&gt;JDK 8 or 11&lt;/li&gt;
&lt;li&gt;Maven 3.8+ (or use the built-in &lt;code&gt;mvnw.cmd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Node.js 16+&lt;/li&gt;
&lt;li&gt;pnpm 8+&lt;/li&gt;
&lt;li&gt;IntelliJ IDEA&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;java.version&lt;/code&gt; in the root &lt;code&gt;pom.xml&lt;/code&gt; is &lt;code&gt;1.8&lt;/code&gt;. It is recommended to use JDK 8 or 11 for local debugging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Start PostgreSQL and ZooKeeper&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, navigate to the &lt;code&gt;deploy/docker&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler\deploy\docker&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using the &lt;code&gt;docker-compose-windows.yml&lt;/code&gt; provided in the appendix, ensure that &lt;code&gt;dolphinscheduler-zookeeper&lt;/code&gt; exposes port &lt;code&gt;2181&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;master&lt;/code&gt;, &lt;code&gt;worker&lt;/code&gt;, and &lt;code&gt;api&lt;/code&gt; all connect to &lt;code&gt;localhost:2181&lt;/code&gt; by default. If ZooKeeper runs only inside the container without port mapping, Java processes started in IDEA will fail to connect.&lt;/p&gt;

&lt;p&gt;Ensure the following configuration exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dolphinscheduler-zookeeper&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;zookeeper:3.8&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2181:2181"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker-compose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docker-compose-windows.yml&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;up&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler-postgresql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler-zookeeper&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optional verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ps&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;127.0.0.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;5432&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;localhost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;2181&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port &lt;code&gt;5432&lt;/code&gt; is reachable&lt;/li&gt;
&lt;li&gt;Port &lt;code&gt;2181&lt;/code&gt; is reachable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are using local or remote installations instead of Docker, skip this step but ensure configurations match your environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Build the Project&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\mvnw.cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;spotless:apply&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\mvnw.cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clean&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DskipTests&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;spotless:apply&lt;/code&gt; formats code to avoid check failures&lt;/li&gt;
&lt;li&gt;The first build may take a while&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Initialize PostgreSQL Metadata Database&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before starting &lt;code&gt;master&lt;/code&gt; and &lt;code&gt;api&lt;/code&gt;, initialize metadata tables.&lt;/p&gt;

&lt;p&gt;SQL script location:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_postgresql.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Docker PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Get-Content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler-dao\src\main\resources\sql\dolphinscheduler_postgresql.sql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;exec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-e&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docker-dolphinscheduler-postgresql-1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;psql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, use DataGrip, DBeaver, or &lt;code&gt;psql&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: This script contains &lt;code&gt;DROP TABLE IF EXISTS&lt;/code&gt;. Do NOT run it on production databases.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;t_ds_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected: one record returned (e.g., &lt;code&gt;3.4.0&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Verify Local Configuration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Default configs (usually no changes needed):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL: &lt;code&gt;127.0.0.1:5432&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;DB: &lt;code&gt;dolphinscheduler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Username: &lt;code&gt;root&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;root&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ZooKeeper: &lt;code&gt;localhost:2181&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Config files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-master/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-api/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-worker/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If needed, modify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.url&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.username&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.password&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;registry.zookeeper.connect-string&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
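
&lt;p&gt;For reference, the relevant section of &lt;code&gt;application.yaml&lt;/code&gt; matching the defaults above looks roughly like this (an illustrative sketch; verify the keys and values against your actual file):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spring:
  datasource:
    driver-class-name: org.postgresql.Driver
    url: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler
    username: root
    password: root

registry:
  type: zookeeper
  zookeeper:
    connect-string: localhost:2181
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;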

&lt;p&gt;Do NOT use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dspring.profiles.active=mysql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dspring.profiles.active=postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;5. Configure IntelliJ IDEA Run Configurations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Common settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JDK: 8 or 11&lt;/li&gt;
&lt;li&gt;Use the classpath of the module&lt;/li&gt;
&lt;li&gt;Enable: &lt;code&gt;Add dependencies with "provided" scope to classpath&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Working directory: project root&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The "provided" scope option is critical: without it, IDEA leaves &lt;code&gt;provided&lt;/code&gt;-scope dependencies off the runtime classpath and the servers fail to start with missing classes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Create these configurations:&lt;/p&gt;

&lt;h3&gt;
  
  
  MasterServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.server.master.MasterServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPC: 5678&lt;/li&gt;
&lt;li&gt;Spring Boot: 5679&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  WorkerServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.server.worker.WorkerServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPC: 1234&lt;/li&gt;
&lt;li&gt;Spring Boot: 1235&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ApiApplicationServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.api.ApiApplicationServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP: 12345&lt;/li&gt;
&lt;li&gt;Gateway: 25333&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Startup order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MasterServer&lt;/li&gt;
&lt;li&gt;WorkerServer&lt;/li&gt;
&lt;li&gt;ApiApplicationServer&lt;/li&gt;
&lt;/ol&gt;
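
&lt;p&gt;For each of the three run configurations, set the active profile via VM options, as noted in Section 4 (a minimal example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dspring.profiles.active=postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;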

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Start Frontend&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler\dolphinscheduler-ui&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;pnpm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;pnpm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dev&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:5173
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;dolphinscheduler123&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Verification&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  API
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/actuator/health&lt;/code&gt; → should return &lt;code&gt;UP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/swagger-ui&lt;/code&gt; → should load successfully&lt;/li&gt;
&lt;/ul&gt;
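
&lt;p&gt;From PowerShell, the health endpoint can be checked like this (assuming the API server's default &lt;code&gt;/dolphinscheduler&lt;/code&gt; context path; adjust if yours differs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invoke-RestMethod http://localhost:12345/dolphinscheduler/actuator/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;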

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Access UI and log in successfully&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Check for fatal errors in the IDEA console.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;8. Common Issues&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ZooKeeper connection failed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;ZooKeeper is not running&lt;/li&gt;
&lt;li&gt;Port 2181 not exposed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Missing &lt;code&gt;t_ds_version&lt;/code&gt; table
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;DB not initialized&lt;/li&gt;
&lt;li&gt;Wrong database&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Missing dependencies in IDEA
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Check the “provided scope” option&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Port 12345 occupied
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Stop conflicting processes&lt;/li&gt;
&lt;/ul&gt;
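
&lt;p&gt;To find and stop the occupying process on Windows (built-in cmdlets; replace &lt;code&gt;&amp;lt;pid&amp;gt;&lt;/code&gt; with the &lt;code&gt;OwningProcess&lt;/code&gt; ID reported by the first command):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Get-NetTCPConnection -LocalPort 12345 | Select-Object OwningProcess
Stop-Process -Id &amp;lt;pid&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;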

</description>
      <category>beginners</category>
      <category>opensource</category>
      <category>apachedolphinscheduler</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Built by the Community: Apache DolphinScheduler March 2026 Highlights</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:59:10 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/built-by-the-community-apache-dolphinscheduler-march-2026-highlights-4nmp</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/built-by-the-community-apache-dolphinscheduler-march-2026-highlights-4nmp</guid>
      <description>&lt;p&gt;Hey there! The March 2026 monthly report is here! The Apache DolphinScheduler community has been on fire 🔥&lt;/p&gt;

&lt;p&gt;A total of 13 contributors actively submitted code. Version &lt;strong&gt;3.4.1&lt;/strong&gt; was released, bringing enhanced scheduling, upgraded task plugins, an improved API &amp;amp; UI, and 15+ bug fixes.&lt;/p&gt;

&lt;p&gt;Meanwhile, infrastructure has also been upgraded. Both enterprise and individual users are encouraged to upgrade and explore the latest features. Let’s grow with the community 🚀&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reporting period: March 1, 2026 – March 30, 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Release&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Release Date&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3.4.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2026-03-01&lt;/td&gt;
&lt;td&gt;Latest stable release&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;📎 Download: &lt;a href="https://dolphinscheduler.apache.org/download" rel="noopener noreferrer"&gt;https://dolphinscheduler.apache.org/download&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Key Feature Updates&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1 Scheduling Enhancements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Configurable Max Runtime&lt;/td&gt;
&lt;td&gt;Set maximum runtime limits for workflows/tasks&lt;/td&gt;
&lt;td&gt;#17932&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker Group Optimization&lt;/td&gt;
&lt;td&gt;Allow creation of Worker Groups without Workers&lt;/td&gt;
&lt;td&gt;#17927&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling Timeout Detection&lt;/td&gt;
&lt;td&gt;Handle cases with missing or unavailable Workers&lt;/td&gt;
&lt;td&gt;#17796&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2 Task Plugin Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Java Task&lt;/td&gt;
&lt;td&gt;Support built-in &amp;amp; custom variables&lt;/td&gt;
&lt;td&gt;#17860&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zeppelin Task&lt;/td&gt;
&lt;td&gt;Support parameter parsing&lt;/td&gt;
&lt;td&gt;#17862&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedure Task&lt;/td&gt;
&lt;td&gt;Support cancellation &amp;amp; output parameters&lt;/td&gt;
&lt;td&gt;#17696, #17973&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP Task&lt;/td&gt;
&lt;td&gt;Fix nested JSON sending issue&lt;/td&gt;
&lt;td&gt;#17911&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.3 API &amp;amp; UI Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Remove import/export (DSIP-104)&lt;/td&gt;
&lt;td&gt;#17941&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Improve Spark parameter validation&lt;/td&gt;
&lt;td&gt;#17958&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Fix Keycloak icon 404 issue&lt;/td&gt;
&lt;td&gt;#18007&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Fix lock not released on request failure&lt;/td&gt;
&lt;td&gt;#17989&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Bug Fixes&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix timeout alert failure&lt;/td&gt;
&lt;td&gt;#17818&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix workflow failure strategy issue&lt;/td&gt;
&lt;td&gt;#17851&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix task not marked failed on init error&lt;/td&gt;
&lt;td&gt;#17821&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependent&lt;/td&gt;
&lt;td&gt;Fix PostgreSQL dependency SQL error&lt;/td&gt;
&lt;td&gt;#17837&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Fix token deletion issue for non-admin users&lt;/td&gt;
&lt;td&gt;#17997&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Add tenant validation&lt;/td&gt;
&lt;td&gt;#17970&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAO&lt;/td&gt;
&lt;td&gt;Fix type mismatch in workflow_definition_code&lt;/td&gt;
&lt;td&gt;#17988&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alert&lt;/td&gt;
&lt;td&gt;Fix timeout unit inconsistency&lt;/td&gt;
&lt;td&gt;#17920&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SeaTunnel&lt;/td&gt;
&lt;td&gt;Fix broken documentation link&lt;/td&gt;
&lt;td&gt;#17905&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Params&lt;/td&gt;
&lt;td&gt;Fix Procedure Task param passing issue&lt;/td&gt;
&lt;td&gt;#17968&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Community Updates&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Top Contributors&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In March, &lt;strong&gt;31 PRs&lt;/strong&gt; were merged. Thanks to all &lt;strong&gt;9 contributors&lt;/strong&gt; 🙌&lt;/p&gt;

&lt;p&gt;Full list: &lt;a href="https://github.com/apache/dolphinscheduler/graphs/contributors" rel="noopener noreferrer"&gt;https://github.com/apache/dolphinscheduler/graphs/contributors&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Infrastructure Updates&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Upgrade ZooKeeper to 3.8.3&lt;/li&gt;
&lt;li&gt;Upgrade Testcontainers to 1.21.4&lt;/li&gt;
&lt;li&gt;Update license year&lt;/li&gt;
&lt;li&gt;Add AI usage confirmation to PR template&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Enterprise Recommendations&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔧 Upgrade Advice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Production environments should upgrade to &lt;strong&gt;3.4.1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Includes multiple bug fixes and stability improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  📋 Key Features to Watch
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Runtime limits for workflows/tasks&lt;/li&gt;
&lt;li&gt;Flexible Worker Group management&lt;/li&gt;
&lt;li&gt;Enhanced Procedure Task capabilities&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ⚠️ Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No major API changes this month&lt;/li&gt;
&lt;li&gt;Follow official docs for latest configurations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Statistics&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;March Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Releases&lt;/td&gt;
&lt;td&gt;1 (3.4.1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Improvements&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug Fixes&lt;/td&gt;
&lt;td&gt;15+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contributors&lt;/td&gt;
&lt;td&gt;13+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>community</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>Meet ASF’s New Member Xiang Zihao: How He Impacts the Community with Code and the Apache Way</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:24:08 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/meet-asfs-new-member-xiang-zihao-how-he-impacts-the-community-with-code-and-the-apache-way-4ko9</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/meet-asfs-new-member-xiang-zihao-how-he-impacts-the-community-with-code-and-the-apache-way-4ko9</guid>
      <description>&lt;p&gt;Congratulations to &lt;a class="mentioned-user" href="https://dev.to/xiang"&gt;@xiang&lt;/a&gt; Zihao on being recently invited to become an ASF Member! As a PMC Member of Apache DolphinScheduler, the community is truly delighted by this well-deserved recognition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleh1v3557mvdxcnoyadc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleh1v3557mvdxcnoyadc.png" alt="467d043346d43a87f99395f5ff9e631c" width="560" height="949"&gt;&lt;/a&gt;&lt;br&gt;
Over the years, his continuous contributions to the community have been evident to all—from documentation improvements to code enhancements, from active discussions to helping newcomers. His presence can be seen everywhere. Beyond Apache DolphinScheduler, he is also deeply involved in multiple ASF open source projects, consistently practicing the Apache Way year after year. All his persistent efforts have finally led him to this milestone.&lt;/p&gt;

&lt;p&gt;On this occasion, the community conducted another in-depth interview with him. This time, through five chapters—Personal Background, Open Source Contributions &amp;amp; Growth, Becoming an ASF Member, DolphinScheduler Community Development, and Open Source Culture—we take a closer look at his journey, his growth story in open source, and the passion and persistence he has accumulated within the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Personal Background
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q1: Could you briefly introduce yourself, including how you entered the big data and open source fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I’m Xiang Zihao / SbloodyS 👋&lt;br&gt;
My hobbies include coding during the day, gaming at night, taking my kid out on weekends, backpacking during holidays, and enjoying tea chats when I need a break.&lt;br&gt;
My life philosophy is: explore the world through code, and heal through life.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: When did you start contributing to Apache DolphinScheduler? What was the trigger?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I first encountered Apache DolphinScheduler in 2021. It was actually quite accidental—an opportunity at work introduced me to this scheduling system. Unexpectedly, this “chance encounter” gradually drew me in, and I began contributing to the community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3: What key work or features have you contributed to DolphinScheduler?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I have mainly worked on documentation optimization, performance improvements, bug fixes, code reviews, and CI/CD optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: Open Source Contributions &amp;amp; Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q4: In open source collaboration, what do you think is the most important ability? Technical skills, communication, or something else?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I believe the most important ability in open source collaboration is not a single dimension, but a combination of technical skills, communication ability, and an open mindset.&lt;br&gt;
Technical skills are the foundation, communication determines efficiency and quality, and an open mindset is the key to long-term growth.&lt;br&gt;
If I had to prioritize, I’d say openness is the most fundamental—it determines whether you are willing to learn, ask, and evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5: What advice would you give to newcomers in open source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Start by “using” rather than “building.”&lt;br&gt;
Become a real user first, identify problems during usage, submit issues, then gradually move to documentation fixes, bug fixes, and eventually core feature development.&lt;br&gt;
Don’t aim to contribute “big features” right away—every small PR is the beginning of building trust with the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Becoming an ASF Member
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q6: Congratulations on becoming an ASF Member! What was your first reaction?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Thank you! Honestly, my first reaction was a mix of surprise and gratitude.&lt;/p&gt;

&lt;p&gt;Surprise—because becoming an ASF Member was never my initial goal. In 2021, I simply started contributing to solve problems and give back to the community, and I never imagined this journey would lead here.&lt;/p&gt;

&lt;p&gt;Gratitude—because this honor represents the trust and support of the entire community. Without patient reviewers and fellow contributors, I wouldn’t be here today.&lt;/p&gt;

&lt;p&gt;For me, becoming an ASF Member is not an endpoint, but a new beginning. It means greater responsibility and a commitment to give back even more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q7: How closely related is this achievement to DolphinScheduler? What other factors contributed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: DolphinScheduler was an important foundation, but not the only reason.&lt;/p&gt;

&lt;p&gt;On one hand, it’s the first Apache project I deeply engaged in, where I built experience and credibility through contributions.&lt;/p&gt;

&lt;p&gt;On the other hand, ASF evaluates broader impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-project contributions&lt;/li&gt;
&lt;li&gt;Community-building efforts&lt;/li&gt;
&lt;li&gt;Practicing the Apache Way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, DolphinScheduler was my starting point, but sustained and sincere contributions to the broader Apache ecosystem made this possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q8: What does becoming an ASF Member mean to you and the community?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: For me, it’s recognition from the global open source community—not for one achievement, but for long-term commitment. It’s also a responsibility to keep improving.&lt;/p&gt;

&lt;p&gt;For the community, ASF Members are core contributors responsible for project incubation, governance, and cultural inheritance.&lt;/p&gt;

&lt;p&gt;For China’s open source ecosystem, more ASF Members represent growing global recognition and diversity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q9: How important is the Apache Way to project success?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It can be summed up in one phrase: “Community Over Code.”&lt;br&gt;
Code can be replaced, but a healthy, collaborative community cannot.&lt;br&gt;
The Apache Way ensures openness, transparency, and consensus-driven development—proven principles behind many successful projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4: DolphinScheduler Community Development
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q10: What are the key milestones in DolphinScheduler’s growth?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Three major turning points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Donation to Apache&lt;/li&gt;
&lt;li&gt;Graduation from incubation&lt;/li&gt;
&lt;li&gt;Globalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These milestones transformed it into a globally recognized project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q11: How do you see its positioning and future?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: DolphinScheduler is evolving into a next-generation cloud-native workflow orchestration platform, connecting the full data lifecycle.&lt;br&gt;
Its future lies in integrating with modern data stacks and becoming essential for data engineers worldwide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q12: What are your future plans as an ASF Member?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Three directions: Deepening, Expanding, and Passing On.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepening: continue contributing to core tech and governance&lt;/li&gt;
&lt;li&gt;Expanding: engage in more Apache projects&lt;/li&gt;
&lt;li&gt;Passing On: help more developers enter open source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open source has given me a lot—I want to pass it forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5: Open Source Culture &amp;amp; Personal Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q13: How has open source changed you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It reshaped my definition of growth.&lt;br&gt;
Before, growth meant improving skills. Now, it means expanding impact—helping others grow.&lt;br&gt;
I’ve transformed from a solo problem-solver into a global collaborator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q14: How would you summarize the spirit of open source in one sentence?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Open source is a belief that sharing is more powerful than owning.&lt;/p&gt;

&lt;p&gt;That concludes our interview! If you found this inspiring, feel free to like, share, and spread the word so more people can discover valuable insights from the open source world 🏅&lt;/p&gt;

</description>
      <category>asf</category>
      <category>opensource</category>
      <category>apachedolphinscheduler</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:22:57 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-f4n</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-f4n</guid>
      <description>&lt;p&gt;In Apache DolphinScheduler, multi-tenancy is not just an “auxiliary permission feature,” but the core execution model of the scheduling system. What it truly solves is not “who can use the system,” but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Under what identity tasks run, what resources they consume, and how to prevent interference between them&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Only by understanding this can we grasp the essence of DolphinScheduler’s multi-tenant design.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Are Single-Tenant and Multi-Tenant?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, let’s clarify what single-tenant and multi-tenant mean.&lt;/p&gt;

&lt;p&gt;In enterprise scheduling platforms, how different teams or business units share platform resources is a fundamental design concern. &lt;strong&gt;Single-tenancy and multi-tenancy&lt;/strong&gt; are two common models, with clear differences in resource isolation, stability, and scalability. Understanding these differences helps organizations choose the right architecture for efficient and controllable scheduling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwni70i5va5v2agogbq1k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwni70i5va5v2agogbq1k.jpg" width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;single-tenant&lt;/strong&gt; system serves only one team or business unit. All tasks share the same execution environment, resource pool, and permission system.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;multi-tenant&lt;/strong&gt; system, on the other hand, allows multiple teams to share one platform. Each team is logically isolated as an independent Tenant and mapped to underlying execution identities (Linux users), resource queues (YARN queues), or cloud-native namespaces (Kubernetes namespaces), enabling independent management of tasks and resources.&lt;/p&gt;

&lt;p&gt;Compared with single-tenancy, multi-tenancy provides significant advantages in resource isolation, stability, and scalability. While single-tenancy is simple to deploy and manage, resource contention and task interference become inevitable as the number of users grows. Multi-tenancy avoids this by clearly isolating Tenants and assigning dedicated resource pools per team or environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Core Mechanism: Tenant-Centric Execution Model&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To overcome the limitations of single-tenancy, Apache DolphinScheduler adopts a multi-tenant design.&lt;/p&gt;

&lt;p&gt;At the heart of this design is a single concept: &lt;strong&gt;Tenant&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, a Tenant is not just a logical label—it is an &lt;strong&gt;execution context container&lt;/strong&gt;. When a task is scheduled, the system determines three key aspects based on the Tenant:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Execution Identity
&lt;/h3&gt;

&lt;p&gt;Tasks do not run abstractly on Worker nodes; they must run as a specific OS user. A Tenant is bound to a Linux user, and tasks execute under that identity, inheriting file permissions and system-level isolation.&lt;/p&gt;

&lt;p&gt;Example: Executing tasks as a Linux user&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Switch to the Linux user corresponding to the Tenant
sudo su - team_alpha_user

# Execute workflow task
spark-submit --class com.example.Job /opt/jobs/job.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Tenant is bound to an OS user, and tasks run under this identity on Worker nodes, achieving file permission and environment isolation.&lt;/li&gt;
&lt;li&gt;Tip: Ensure each Tenant has an independent home directory to avoid unauthorized access.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Resource Ownership
&lt;/h3&gt;

&lt;p&gt;When tasks are submitted to engines like Spark or Flink, they must enter a resource pool. The Tenant determines the target resource queue or namespace, ensuring controlled resource usage.&lt;/p&gt;

&lt;p&gt;Example: Create a Tenant and bind a YARN Queue&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://dolphinscheduler-api:12345/tenants \
  -H "Content-Type: application/json" \
  -d '{
        "name": "team_alpha",
        "queue": "team_alpha_queue",
        "description": "Team Alpha Tenant"
      }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Each Tenant corresponds to a YARN Queue or K8s Namespace, ensuring exclusive resource usage.&lt;/li&gt;
&lt;li&gt;Tip: After creating a Tenant, remember to configure the queue or namespace in the resource scheduling system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Isolation Boundary
&lt;/h3&gt;

&lt;p&gt;Tenant defines a clear boundary for data access, task execution, and resource usage, forming logical isolation between teams.&lt;/p&gt;

&lt;p&gt;Together, these three aspects form the foundation of DolphinScheduler’s multi-tenant mechanism.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How Resource Isolation Is Achieved&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy alone at the scheduling layer is not enough. The key design of DolphinScheduler is mapping Tenants to &lt;strong&gt;real underlying resource systems&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;YARN-Based Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In traditional big data architectures, Tenants are mapped to YARN queues. Each Tenant corresponds to a queue with defined capacity and limits. Tasks are submitted with queue information and scheduled accordingly, preventing resource contention.&lt;/p&gt;

&lt;p&gt;YARN Mapping Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Queue configuration

&amp;lt;queue name="team_alpha_queue"&amp;gt;
  &amp;lt;capacity&amp;gt;30&amp;lt;/capacity&amp;gt;
  &amp;lt;maximum-capacity&amp;gt;50&amp;lt;/maximum-capacity&amp;gt;
  &amp;lt;user-limit-factor&amp;gt;1.0&amp;lt;/user-limit-factor&amp;gt;
&amp;lt;/queue&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Tasks automatically enter the queue when submitted, avoiding resource conflicts between Tenants.&lt;/li&gt;
&lt;li&gt;Tip: Capacity and maximum capacity can be dynamically adjusted based on team workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if one team submits a large number of tasks, it only consumes resources within its own queue.&lt;/p&gt;
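&lt;p&gt;For illustration, a job submitted on behalf of this Tenant carries its queue explicitly (a sketch only; the queue name, class, and jar path reuse the hypothetical names from the earlier examples):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Submit a Spark job into the Tenant's dedicated YARN queue
spark-submit \
  --master yarn \
  --queue team_alpha_queue \
  --class com.example.Job /opt/jobs/job.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;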

&lt;h3&gt;
  
  
  &lt;strong&gt;Kubernetes-Based Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In cloud-native environments, Tenants are mapped to Kubernetes namespaces. Tasks run as Pods, and:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ResourceQuota&lt;/strong&gt; limits total resource usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LimitRange&lt;/strong&gt; restricts per-task resource consumption
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    cpu: "20"
    memory: "64Gi"
    pods: "50"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Limits total resources and number of Pods to achieve cloud-native isolation.&lt;/li&gt;
&lt;li&gt;Tip: Combine with LimitRange to control per-task resource limits and prevent a single task from monopolizing resources.&lt;/li&gt;
&lt;/ul&gt;
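&lt;p&gt;A minimal LimitRange to pair with the quota above might look like this (a sketch; the default and request values are illustrative, not recommendations):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
    - type: Container
      default:          # applied when a task Pod sets no limits
        cpu: "2"
        memory: "4Gi"
      defaultRequest:   # applied when a task Pod sets no requests
        cpu: "500m"
        memory: "1Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;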

&lt;p&gt;This approach isolates not only resources but also runtime environments and networking.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;OS-Level Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the execution layer, Linux users provide the final isolation boundary. Even on the same machine, tasks from different Tenants cannot access each other’s files or scripts.&lt;/p&gt;
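&lt;p&gt;The boundary is visible in the permission mode of each Tenant’s home directory. A minimal sketch, using a temporary directory to stand in for a home such as /home/team_alpha_user (assumes GNU coreutils):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Stand-in for a Tenant home directory
tenant_home=$(mktemp -d)
# Owner-only access: users of other Tenants cannot enter or read it
chmod 700 "$tenant_home"
stat -c '%a' "$tenant_home"   # prints: 700
rm -r "$tenant_home"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;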

&lt;h2&gt;
  
  
  &lt;strong&gt;End-to-End Execution Flow&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Putting everything together, the execution flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A workflow is triggered in DolphinScheduler&lt;/li&gt;
&lt;li&gt;The system determines the Tenant&lt;/li&gt;
&lt;li&gt;The Master assigns tasks to Workers&lt;/li&gt;
&lt;li&gt;Workers switch to the corresponding Linux user&lt;/li&gt;
&lt;li&gt;Tasks are submitted with resource metadata (YARN queue / K8s namespace)&lt;/li&gt;
&lt;li&gt;Tasks run within the assigned resource pool under defined limits&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsd2mrke5qbm1g0xhrtu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsd2mrke5qbm1g0xhrtu.jpg" width="791" height="326"&gt;&lt;/a&gt;&lt;br&gt;
This creates full isolation from scheduling logic to resource execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Technical Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The architecture can be understood in three layers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmr03er3ohwfjkz947e3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmr03er3ohwfjkz947e3.jpg" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top Layer&lt;/strong&gt;: DolphinScheduler (Tenant / Workflow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middle Layer&lt;/strong&gt;: Mapping (Linux User / YARN Queue / K8s Namespace)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottom Layer&lt;/strong&gt;: Resource systems (Compute nodes / Big data clusters / Kubernetes clusters)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key idea is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The scheduling layer does not directly manage resources—it controls them through Tenant mapping&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Design Works in Enterprises&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This design becomes especially powerful in enterprise environments.&lt;/p&gt;

&lt;p&gt;When multiple teams share a platform, resource contention is inevitable. Without Tenant-to-resource mapping, a high-load workload could impact the entire system. With proper isolation, each team operates within its own boundaries.&lt;/p&gt;

&lt;p&gt;It also simplifies troubleshooting. Issues can be traced to a specific Tenant and then to its corresponding resource pool, without affecting the entire system.&lt;/p&gt;

&lt;p&gt;Most importantly, the design is highly scalable. Adding new teams or integrating new compute engines only requires extending Tenant mappings, without redesigning the scheduling system.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;DolphinScheduler’s multi-tenant design is essentially a way to &lt;strong&gt;embed the scheduling system into the resource ecosystem&lt;/strong&gt;. Instead of relying on complex logic, it leverages operating systems, resource schedulers, and container platforms to build a stable, clear, and controllable execution model.&lt;/p&gt;

&lt;p&gt;For engineers, the real focus is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How to create a Tenant”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but rather:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How to map Tenants to resources effectively to achieve true isolation and stability”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the core value of multi-tenant design.&lt;/p&gt;

&lt;p&gt;Previous articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next article preview: Part 7 | Where Are the Performance Bottlenecks in Scheduling Platforms?&lt;/p&gt;

</description>
      <category>dolphinscheduler</category>
      <category>opensource</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>Apache SeaTunnel 2.3.13 Major Release! Top 10 Features You Should Know</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:35:27 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-seatunnel-2313-major-release-top-10-features-you-should-know-j02</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-seatunnel-2313-major-release-top-10-features-you-should-know-j02</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
The Apache SeaTunnel community has officially released &lt;strong&gt;version 2.3.13&lt;/strong&gt;! This milestone release brings important features such as the &lt;strong&gt;Checkpoint API, a Flink engine upgrade, large-file parallel processing, multi-table sync, AI Embedding Transform, and richer connector extensions&lt;/strong&gt;. Whether for batch processing or real-time CDC syncing to a Lakehouse, SeaTunnel can now support your data integration tasks more efficiently, stably, and intelligently.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;strong&gt;50+ community contributors&lt;/strong&gt;, this release includes &lt;strong&gt;100+ PRs&lt;/strong&gt; of new features, optimizations, and bug fixes. If you are building &lt;strong&gt;data warehouses, real-time sync platforms, or AI data pipelines&lt;/strong&gt;, this release is worth your attention.&lt;/p&gt;

&lt;p&gt;No time to read the full Release Notes? No worries: here are the &lt;strong&gt;Top 10 features of this release&lt;/strong&gt;, with PR numbers for reference.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Release Note: &lt;a href="https://github.com/apache/seatunnel/releases/tag/2.3.13" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/releases/tag/2.3.13&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  01 New Checkpoint API Enhances Task Fault Tolerance
&lt;/h2&gt;

&lt;p&gt;In data sync tasks, checkpoints are one of the core mechanisms to ensure task reliability. SeaTunnel 2.3.13 introduces &lt;strong&gt;Checkpoint API&lt;/strong&gt; (#10065), making task state management more flexible and providing a solid foundation for future scheduling and operation capabilities. The Zeta engine supports &lt;strong&gt;min-pause configuration&lt;/strong&gt; (#9804) to avoid system pressure caused by frequent checkpoints.&lt;/p&gt;

&lt;p&gt;Monitoring has also been enhanced, such as adding Sink commit metrics and calculating commit rate (#10233), returning PendingJobs information in the task overview interface (#9902), and providing REST API to view the Pending queue (#10078).&lt;/p&gt;

&lt;p&gt;These capabilities help users better understand task execution status and optimize checkpoint strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  02 Flink 1.20.1 Support and Enhanced CDC
&lt;/h2&gt;

&lt;p&gt;On the engine side, this version improves Apache Flink support. SeaTunnel now supports &lt;strong&gt;Flink 1.20.1&lt;/strong&gt; (#9576), and CDC sync capabilities have been enhanced. CDC Source now supports &lt;strong&gt;Schema Evolution&lt;/strong&gt; (#9867), automatically adapting sync tasks to source table structure changes.&lt;/p&gt;

&lt;p&gt;Additionally, NO_CDC Source also supports checkpoints (#10094), improving task recovery. These changes make SeaTunnel more stable in scenarios with frequent database schema changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  03 Large File Parallel Reading Significantly Improved
&lt;/h2&gt;

&lt;p&gt;In real data platforms, large amounts of data often exist as files, such as HDFS, object storage, or local file systems.&lt;/p&gt;

&lt;p&gt;This release significantly optimizes file processing performance. HDFS File Connector supports true large file parallel splitting (#10332), LocalFile Connector supports CSV, Text, JSON large file parallel reading (#10142), and Parquet files now support Logical Split (#10239).&lt;/p&gt;

&lt;p&gt;HDFS File also supports multi-table reading (#9816). These improvements significantly increase throughput for TB-scale file processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  04 File Connector Adds Update Sync Mode
&lt;/h2&gt;

&lt;p&gt;Previously, file sync tasks only supported append or overwrite. In this version, multiple file connectors add &lt;strong&gt;sync_mode=update&lt;/strong&gt;, including FTP, SFTP, and LocalFile Source (#10437), and HdfsFile Source (#10268). This allows file sync tasks to support update semantics, better fitting incremental data processing scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  05 Connector Ecosystem Expansion
&lt;/h2&gt;

&lt;p&gt;SeaTunnel 2.3.13 continues to expand and enhance the connector ecosystem. For analytical databases, it adds DuckDB Source and Sink support (#10285), suitable for local analysis and data exploration.&lt;/p&gt;

&lt;p&gt;New or enhanced connectors include Apache HugeGraph Sink (#10002), AWS DSQL Sink (#9739), Lance Dataset Sink (#9894), IoTDB 2.x Source and Sink (#9872).&lt;/p&gt;

&lt;p&gt;Existing connectors have also been improved: PostgreSQL supports TIMESTAMP_TZ (#10048), Hive Sink supports SchemaSaveMode and DataSaveMode (#9743), MongoDB Sink supports multi-table writing and adds SaveMode (#9958 / #9883).&lt;/p&gt;

&lt;p&gt;These updates significantly improve SeaTunnel’s adaptability in database and Lakehouse scenarios and the efficiency of building data pipelines.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Connector&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Feature Highlights&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analytical DB&lt;/td&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Read and write data from DuckDB, suitable for local analysis and exploration&lt;/td&gt;
&lt;td&gt;#10285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph DB&lt;/td&gt;
&lt;td&gt;Apache HugeGraph&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into HugeGraph&lt;/td&gt;
&lt;td&gt;#10002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL Lakehouse&lt;/td&gt;
&lt;td&gt;AWS DSQL&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into AWS DSQL&lt;/td&gt;
&lt;td&gt;#9739&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File/Dataset&lt;/td&gt;
&lt;td&gt;Lance Dataset&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into Lance Dataset&lt;/td&gt;
&lt;td&gt;#9894&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Series DB&lt;/td&gt;
&lt;td&gt;IoTDB 2.x&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Add IoTDB 2.x source and sink support&lt;/td&gt;
&lt;td&gt;#9872&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relational DB&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;Support TIMESTAMP_TZ type&lt;/td&gt;
&lt;td&gt;#10048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Warehouse&lt;/td&gt;
&lt;td&gt;Hive&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support SchemaSaveMode and DataSaveMode&lt;/td&gt;
&lt;td&gt;#9743&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document DB&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support multi-table write and new SaveMode&lt;/td&gt;
&lt;td&gt;#9958 / #9883&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  06 Kafka Supports Protobuf Schema Registry
&lt;/h2&gt;

&lt;p&gt;In real-time scenarios, Kafka often uses Schema Registry. This release adds &lt;strong&gt;Protobuf Schema Registry Wire Format support&lt;/strong&gt; (#10183) to Kafka Connector, allowing SeaTunnel to directly parse Protobuf data managed via Schema Registry, making real-time pipeline construction easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  07 New AI Embedding Transform
&lt;/h2&gt;

&lt;p&gt;With AI and data engineering integration, more companies need vector data pipelines.&lt;/p&gt;

&lt;p&gt;SeaTunnel adds &lt;strong&gt;Multimodal Embedding Transform&lt;/strong&gt; (#9673) in the Transform component, generating vector data directly in pipelines for vector databases, RAG systems, and AI retrieval applications. &lt;strong&gt;RegexExtract Transform&lt;/strong&gt; (#9829) further enhances data cleaning.&lt;/p&gt;

&lt;h2&gt;
  
  
  08 Markdown Parser Supports RAG Scenarios
&lt;/h2&gt;

&lt;p&gt;Markdown documents are common in AI data preparation. This release adds &lt;strong&gt;Markdown Parser&lt;/strong&gt; (#9760) and related documentation (#9834) for parsing and structuring Markdown, facilitating RAG pipeline construction.&lt;/p&gt;

&lt;h2&gt;
  
  
  09 Stability and Performance Improvements
&lt;/h2&gt;

&lt;p&gt;This release includes numerous stability and performance optimizations, such as ClickHouse Connector parallel read strategy (#9801), MySQL Connector shard calculation (#9975), JSON parsing for nested structures (#10000), Zeta engine task metrics (#9833), and more.&lt;/p&gt;

&lt;p&gt;It also fixes production issues like Zeta engine memory leak on task cancellation (#10315), ClickHouse ThreadLocal memory leak (#10264), MongoDB multi-task submit (#10116), HBase Source scan exception (#10287), Hive Sink init failure (#10331), etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  10 Bug Fixes and Documentation Updates
&lt;/h2&gt;

&lt;p&gt;Fixes include CDC Snapshot Split null pointer (#10404), ClickHouse memory leak (#10264), MongoDB multi-task submit (#10064, #10116), HBase scan exceptions (#10336, #10287), JDBC schema merge overflow (#10387, #9942, #10093), Hive Sink overwrite semantics (#10279, #9823, #9743), Elasticsearch Sink task exit issue (#10038), and other Connector, Transform, Engine, UI, CI fixes (#10422, #10013, etc.).&lt;/p&gt;

&lt;p&gt;Documentation improvements include SeaTunnel MCP &amp;amp; x2SeaTunnel docs (#10108), connector config examples (#10283, #10250, #10241, #10202), multi-table sync examples (#10241), upgrade incompatibility notes (#10068), and doc structure optimizations (#10262, #10395, #10351, #10420, #10438, #10424, #10109, #10382, #10385), helping new users get started and developers better understand architecture and features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thanks to Contributors ❤️
&lt;/h2&gt;

&lt;p&gt;Special thanks to release manager @xiaochen-zhou for strong support in planning and execution. Thanks to all volunteers; your efforts keep the SeaTunnel community growing!&lt;/p&gt;

&lt;p&gt;Adam Wang, AzkabanWarden.Gf, Bo Schuster, cloud456, CloverDew, corgy-w, CosmosNi, Cyanty, David Zollo, dotfive-star, dy102, dyp12, Frui Guo, Jarvis, Jast, Jeremy, JeremyXin, Jia Fan, Joonseo Lee, krutoileshii, 老王, Leon Yoah, Li Dongxu, LiJie20190102, limin, LimJiaWenBrenda, liucongjy, loupipalien, mengxpgogogo-eng, misi, 巧克力黑, shfshihuafeng, silenceland, Sim Chou, Steven Zhao, wanmingshi, wtybxqm, yzeng1618, zhan7236, zhangdonghao, zhuxt2015, zy&lt;/p&gt;

&lt;h2&gt;
  
  
  Download &amp;amp; Try
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Download: &lt;a href="https://seatunnel.apache.org/download" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/download&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Upgrade Guide: &lt;a href="https://seatunnel.apache.org/docs/upgrade-guide" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/docs/upgrade-guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Upgrade Note&lt;/strong&gt;: If you are on &lt;strong&gt;SeaTunnel 2.3.x&lt;/strong&gt;, upgrading to 2.3.13 is generally safe as it focuses on feature enhancement and stability. Back up config files and test in staging. For tasks using checkpoints, stop tasks and confirm state consistency to avoid checkpoint conflicts. Check connector config changes (Hive, MongoDB, Kafka). If using Flink engine, consider upgrading to Flink 1.20.x for better compatibility and CDC support.&lt;/p&gt;

</description>
      <category>apacheseatunnel</category>
      <category>release</category>
      <category>datascience</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Airflow Is Overkill for Most Teams: Here’s a Better Option</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:32:35 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/airflow-is-overkill-for-most-teams-heres-a-better-option-342h</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/airflow-is-overkill-for-most-teams-heres-a-better-option-342h</guid>
      <description>&lt;p&gt;Last year, when our team was selecting a data platform, my boss said directly: &lt;strong&gt;“Airflow is too heavy. The operational cost is too high. Find a lighter alternative.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To be honest, I was a bit overwhelmed at the time. Airflow is indeed heavy. There are a lot of Python dependencies, and the Celery Executor also requires Redis or RabbitMQ. Once the scale grows a bit, you basically need to use Kubernetes.&lt;/p&gt;

&lt;p&gt;But our data team only has a few people. Asking them to maintain crontab scripts? That would be going backwards.&lt;/p&gt;

&lt;p&gt;Later, after browsing GitHub, I found DolphinScheduler in the Apache Incubator. It has 14.1K stars, is under the Apache 2.0 license, and was open-sourced by a Chinese company (Analysys). Now it has graduated and become a top-level Apache project.&lt;/p&gt;

&lt;p&gt;After trying it out, I found that this thing really has something special.&lt;/p&gt;

&lt;h2&gt;
  
  
  Low-Code Drag-and-Drop, You Can Get Things Done Without Writing YAML
&lt;/h2&gt;

&lt;p&gt;Everyone knows how Airflow’s DAGs are configured: workflows are written in Python code. That is flexible, but data analysts can’t read it.&lt;/p&gt;

&lt;p&gt;DolphinScheduler directly provides you with a visual drag-and-drop interface. You can configure task dependencies just by clicking and dragging with your mouse.&lt;/p&gt;

&lt;p&gt;It supports more than 30 task types: Shell, SQL, Spark, Flink, HTTP, DataX, Python… basically covering all common tasks in big data scenarios.&lt;/p&gt;

&lt;p&gt;Want to run a Hive SQL? Drag a SQL node, configure the data source and script, connect upstream dependencies, done. No need to write a single line of Python, and no need to deal with BashOperator or SparkSubmitOperator.&lt;/p&gt;

&lt;p&gt;This is much more friendly to non-developer roles. Data analysts can configure workflows themselves, without coming to you every day asking you to write DAGs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decentralized High Availability, No Dependence on ZooKeeper
&lt;/h2&gt;

&lt;p&gt;Everyone knows Airflow’s architecture: the Scheduler is a single point. Although later versions support multi-Scheduler HA, they still rely on database locks to ensure tasks are not scheduled twice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6mja8di8nya2hrnubew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6mja8di8nya2hrnubew.png" alt="DS去中心化架构" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DolphinScheduler was designed with decentralization from the very beginning. The architecture is very clear, with five core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Server: the entry point for frontend interaction, including workflow configuration and user permission management&lt;/li&gt;
&lt;li&gt;Master Server: DAG parsing and task distribution; multiple Masters can be deployed, and each can work independently&lt;/li&gt;
&lt;li&gt;Worker Server: task execution nodes that receive tasks from Master and return results&lt;/li&gt;
&lt;li&gt;Alert Server: alert notifications, supporting email, DingTalk, WeCom, Feishu, and more&lt;/li&gt;
&lt;li&gt;Registry: registry center responsible for service discovery and distributed locks, supporting three options: JDBC, ZooKeeper, and Etcd&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s focus on the Master’s decentralized design.&lt;/p&gt;

&lt;p&gt;There is no master-slave relationship between multiple Masters. After starting, each Master registers itself to the Registry, and then competes for tasks using a slot partitioning algorithm.&lt;/p&gt;

&lt;p&gt;How is the partitioning done? It uses modulo on ID:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command ID % total number of Masters = the slot of the current Master&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if you have 3 Masters, and the Command ID is 1001, then it will be assigned to slot 2 (1001 % 3 = 2, slots start from 0).&lt;/p&gt;

&lt;p&gt;If one Master goes down, its slot will be taken over by other Masters, and tasks will not be lost.&lt;/p&gt;

&lt;p&gt;This design is much simpler than Airflow’s Scheduler HA. It does not require complex leader election logic, and Masters can scale horizontally at any time.&lt;/p&gt;
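&lt;p&gt;The example above can be checked directly in a shell (a sketch of the modulo rule only, not the actual DolphinScheduler implementation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Slot assignment: Command ID modulo the number of Masters
masters=3
command_id=1001
echo "slot $((command_id % masters))"   # prints: slot 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;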

&lt;h2&gt;
  
  
  Use JDBC as Registry, Say Goodbye to ZooKeeper Dependency
&lt;/h2&gt;

&lt;p&gt;In the past, when building distributed scheduling systems, you couldn’t avoid ZooKeeper. Early versions of Airflow also relied on ZK. Later it switched to database locks, but there are still performance bottlenecks.&lt;/p&gt;

&lt;p&gt;DolphinScheduler supports three types of registries: JDBC, ZooKeeper, and Etcd.&lt;/p&gt;

&lt;p&gt;The official recommendation is to use JDBC. You can directly reuse your business database (MySQL or PostgreSQL), without deploying additional ZK or Etcd clusters.&lt;/p&gt;

&lt;p&gt;For small and medium-sized teams, maintaining one less component means reducing cost and improving efficiency.&lt;/p&gt;

&lt;p&gt;Of course, if you already have a ZK cluster, or have extremely high performance requirements (tens of thousands of concurrent scheduling tasks), you can still choose ZK or Etcd.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task Dispatch Mechanism: Active Push Instead of Pull
&lt;/h2&gt;

&lt;p&gt;Airflow’s Celery Executor is a typical task queue model. The Scheduler puts tasks into a Redis queue, and Workers pull them themselves.&lt;/p&gt;

&lt;p&gt;This approach is flexible, but when the queue gets backlogged, it becomes troublesome.&lt;/p&gt;

&lt;p&gt;DolphinScheduler uses active push. After the Master parses the DAG, it directly pushes tasks to Workers via Netty RPC.&lt;/p&gt;

&lt;p&gt;Workers do not need to poll. The Master tells them exactly what to do.&lt;/p&gt;

&lt;p&gt;During task allocation, load balancing is performed: by default, a dynamic weighted round-robin strategy considers each Worker’s CPU, memory, and thread pool usage, assigning tasks to the nodes with lower load.&lt;/p&gt;

&lt;p&gt;If a Worker is about to be overloaded, the Master will automatically schedule tasks to other nodes.&lt;/p&gt;

&lt;p&gt;The advantage of this push mechanism is low scheduling latency. The Master can grasp Worker status in real time, and tasks will not sit in the queue for dozens of seconds waiting to be consumed.&lt;/p&gt;
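&lt;p&gt;As a rough sketch of the idea (not DolphinScheduler’s actual Java implementation; the worker fields and weight formula below are invented for illustration), smooth weighted round-robin over per-Worker headroom can look like this:&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cpu: float      # utilization, 0.0 to 1.0
    memory: float   # utilization, 0.0 to 1.0
    pool: float     # task thread-pool usage, 0.0 to 1.0
    credit: float = 0.0

    @property
    def weight(self) -> float:
        # More headroom means a higher weight (hypothetical formula).
        return 3.0 - (self.cpu + self.memory + self.pool)


def dispatch(workers):
    """Smooth weighted round-robin: each round every worker gains credit
    equal to its weight, the highest-credit worker takes the task and
    pays back the total, so heavily loaded nodes are deprioritized but
    not starved entirely."""
    total = sum(w.weight for w in workers)
    for w in workers:
        w.credit += w.weight
    chosen = max(workers, key=lambda w: w.credit)
    chosen.credit -= total
    return chosen


workers = [
    Worker("worker-1", cpu=0.9, memory=0.8, pool=0.7),
    Worker("worker-2", cpu=0.3, memory=0.4, pool=0.2),
]
picks = [dispatch(workers).name for _ in range(5)]
print(picks)  # the lightly loaded worker-2 dominates the picks
```

&lt;p&gt;Over repeated dispatches, the lightly loaded node receives tasks roughly in proportion to its spare capacity, while busier nodes still get an occasional task instead of being skipped forever.&lt;/p&gt;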

&lt;h2&gt;
  
  
  Plugin-Based Architecture, Replace Anything You Want
&lt;/h2&gt;

&lt;p&gt;DolphinScheduler’s plugin system is quite thorough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task plugins: more than 30 built-in task types, and you can write your own plugins&lt;/li&gt;
&lt;li&gt;Alert plugins: email, DingTalk, WeCom, Feishu, Telegram; if not enough, implement the Alert Plugin interface yourself&lt;/li&gt;
&lt;li&gt;Data source plugins: MySQL, PostgreSQL, Hive, Spark SQL, ClickHouse… supporting hundreds of data sources&lt;/li&gt;
&lt;li&gt;Storage plugins: task logs and resource files can be stored locally, on HDFS, S3, or OSS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to switch an alert channel? Write a plugin, package it into a JAR, drop it in, restart the service—done.&lt;/p&gt;

&lt;p&gt;No need to modify source code, and maintenance cost is low.&lt;/p&gt;
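&lt;p&gt;The real plugin contracts are Java interfaces packaged as JARs, but the underlying registry pattern is easy to picture in a few lines of Python (class and method names here are hypothetical, purely to illustrate why the core never needs modification):&lt;/p&gt;

```python
class AlertPlugin:
    """Hypothetical minimal plugin contract; the core scheduler only
    ever talks to this interface, never to a concrete channel."""
    name = "base"

    def send(self, message: str) -> str:
        raise NotImplementedError


# The core keeps a registry of discovered plugins, keyed by name.
REGISTRY: dict[str, AlertPlugin] = {}


def register(plugin: AlertPlugin) -> None:
    REGISTRY[plugin.name] = plugin


class EmailAlert(AlertPlugin):
    name = "email"

    def send(self, message: str) -> str:
        return f"email: {message}"


class FeishuAlert(AlertPlugin):
    name = "feishu"

    def send(self, message: str) -> str:
        return f"feishu: {message}"


# Dropping in a new JAR corresponds to these register() calls:
register(EmailAlert())
register(FeishuAlert())

# Swapping the alert channel is a lookup, not a code change.
print(REGISTRY["feishu"].send("task failed"))  # feishu: task failed
```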

&lt;h2&gt;
  
  
  Flexible Deployment, One-Click Experience with Docker
&lt;/h2&gt;

&lt;p&gt;The project officially provides four deployment methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standalone: single-machine mode, for development and testing, can run with one command&lt;/li&gt;
&lt;li&gt;Cluster: cluster mode, standard for production, manually deploy each component&lt;/li&gt;
&lt;li&gt;Docker: start a complete environment with one click, suitable for quick experience&lt;/li&gt;
&lt;li&gt;Kubernetes: deploy with Helm Chart, preferred for cloud-native teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to try quickly, just use Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose -f docker/docker-compose.yaml up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the containers start, open your browser at:&lt;br&gt;
&lt;a href="http://localhost:12345/dolphinscheduler" rel="noopener noreferrer"&gt;http://localhost:12345/dolphinscheduler&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Default account: admin / dolphinscheduler123&lt;/p&gt;

&lt;p&gt;Drag a Shell task and try it—you can run a workflow in a few minutes.&lt;/p&gt;

&lt;p&gt;For production, at least 3 Masters plus several Workers are recommended. Back them with a replicated MySQL or PostgreSQL database, and choose JDBC as the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights of Version 3.4.0
&lt;/h2&gt;

&lt;p&gt;Version 3.4.0, released at the end of last year, focused on several improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task priority queue: high-priority tasks can jump the queue instead of waiting&lt;/li&gt;
&lt;li&gt;Dynamic resource allocation: Workers can dynamically adjust thread pool size based on task type&lt;/li&gt;
&lt;li&gt;Workflow version management: DAG changes automatically save history versions, supporting one-click rollback&lt;/li&gt;
&lt;li&gt;Enhanced lineage analysis: visualization of upstream and downstream dependencies of data tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most practical of these is the task priority queue. Previously, inserting an urgent task meant manually pausing other tasks to free resources; now you simply assign it a high priority, and the scheduler handles the rest.&lt;/p&gt;
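&lt;p&gt;The behavior of such a queue is easy to picture: tasks are ordered by priority first and submission order second, so an urgent task dequeues ahead of earlier, lower-priority ones. A minimal sketch (the task names and numeric convention are invented for illustration):&lt;/p&gt;

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker keeping FIFO order within a priority

queue = []


def submit(task: str, priority: int) -> None:
    # Lower number = higher priority; heapq pops the smallest tuple first.
    heapq.heappush(queue, (priority, next(counter), task))


submit("daily_etl", priority=5)
submit("report_sync", priority=5)
submit("urgent_backfill", priority=1)  # submitted last, but jumps the queue

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # ['urgent_backfill', 'daily_etl', 'report_sync']
```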

&lt;h2&gt;
  
  
  What Kind of Teams Is It Suitable For?
&lt;/h2&gt;

&lt;p&gt;Having listed so many advantages, it’s only fair to discuss where the tool actually fits.&lt;/p&gt;

&lt;p&gt;Suitable teams for DolphinScheduler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data teams with fewer than 10 people and limited operational resources&lt;/li&gt;
&lt;li&gt;Tasks mainly based on offline batch processing, such as ETL, data synchronization, reporting scheduling&lt;/li&gt;
&lt;li&gt;Need for a low-code platform so that analysts and business users can configure workflows&lt;/li&gt;
&lt;li&gt;Already using MySQL/PostgreSQL and do not want to deploy ZooKeeper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scenarios where it is a poorer fit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mainly real-time streaming tasks (although Flink is supported, scheduling granularity is still batch-oriented)&lt;/li&gt;
&lt;li&gt;Heavy reliance on Python ecosystem with highly customized workflow logic (Airflow is more flexible)&lt;/li&gt;
&lt;li&gt;Extremely large task volume with tens of thousands of concurrent scheduling tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Overall, DolphinScheduler’s positioning is a &lt;strong&gt;user-friendly, stable, and lightweight data scheduling platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn’t have as many fancy features as Airflow, but all the core capabilities are there, and the maintenance cost is much lower.&lt;/p&gt;

&lt;p&gt;After our team migrated from Airflow to DolphinScheduler, the cluster size was reduced from 5 nodes to 3 nodes, and operational manpower was cut by half.&lt;/p&gt;

&lt;p&gt;Now data analysts configure workflows themselves, and no longer have to chase me every day to write DAGs.&lt;/p&gt;

&lt;p&gt;There is no absolute good or bad scheduling tool. The one that fits your team is the best.&lt;/p&gt;

&lt;p&gt;If you are also looking for an alternative to Airflow, you might want to try DolphinScheduler—it might be exactly what you need.&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>apachedolphinschedu</category>
      <category>opensource</category>
      <category>tooling</category>
    </item>
    <item>
      <title>See You in Beijing This August! CFP for Community Over Code Asia 2026 Is Now Open</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:09:11 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/see-you-in-beijing-this-august-cfp-for-community-over-code-asia-2026-is-now-open-epd</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/see-you-in-beijing-this-august-cfp-for-community-over-code-asia-2026-is-now-open-epd</guid>
      <description>&lt;p&gt;Community Over Code Asia 2026 will take place from &lt;strong&gt;August 7–9, 2026 in Beijing&lt;/strong&gt;, and the Call for Proposals (CFP) is now officially open.&lt;/p&gt;

&lt;p&gt;Developers, Apache Committers, open-source contributors, technology leaders, and practitioners from around the world will gather in Beijing to explore the latest practices in AI, cloud-native technologies, big data, open-source community governance, and the broader Apache ecosystem.&lt;/p&gt;

&lt;p&gt;If you are contributing to an open-source project or using the Apache technology stack in production, this is the perfect opportunity to share your experience and step onto a global stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi9uo2s8xui2ryg5d5m6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi9uo2s8xui2ryg5d5m6.jpg" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conference Info
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; August 7 – August 9, 2026&lt;br&gt;
&lt;strong&gt;Location:&lt;/strong&gt; Zhongguancun National Innovation Demonstration Zone Conference Center, Beijing&lt;/p&gt;

&lt;h2&gt;
  
  
  19 Tracks Covering Key Areas of the Apache Ecosystem
&lt;/h2&gt;

&lt;p&gt;This year’s conference will run for three days and feature &lt;strong&gt;19 technical tracks&lt;/strong&gt;, showcasing the latest technical breakthroughs in Apache projects and innovative practices from the Apache Incubator.&lt;/p&gt;

&lt;p&gt;The conference invites developers, technical experts, and open-source contributors worldwide to submit proposals and share insights into Apache projects, cutting-edge technologies, and open-source collaboration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68n2m52946mub46rf7lr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68n2m52946mub46rf7lr.jpg" width="702" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Submit to the DataOps Track!
&lt;/h2&gt;

&lt;p&gt;If you have hands-on experience using Apache DolphinScheduler, optimization practices, or deep insights into new features, you are welcome to submit a talk to the DataOps Track and share your experience with the global community.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conference website:&lt;/strong&gt; &lt;a href="https://asia.communityovercode.org/" rel="noopener noreferrer"&gt;https://asia.communityovercode.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submit your proposal now:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0fqbot1wgsinkhirlx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0fqbot1wgsinkhirlx.jpg" width="197" height="185"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submission deadline:&lt;/strong&gt; April 21, 2026, 23:59 (Beijing Time, UTC+8)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submission language:&lt;/strong&gt; Please submit proposals in English. Presentations can be delivered in either Chinese or English.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes Community Over Code Asia 2026 Special?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Curated by top Apache experts&lt;/strong&gt;&lt;br&gt;
Each track is led by experienced contributors from the Apache Software Foundation who carefully curate high-quality sessions focusing on real technical innovation and open collaboration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A one-stop event for Apache ecosystem trends&lt;/strong&gt;&lt;br&gt;
From Agentic Coding and AI Infrastructure to Data + AI and Streaming, the conference covers the most important topics in modern open-source development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connect with global open-source leaders&lt;/strong&gt;&lt;br&gt;
Meet Apache Committers, foundation members, and open-source contributors face-to-face. Exchange ideas, grow your network, and explore the spirit of “The Apache Way”.&lt;/p&gt;

&lt;p&gt;Open source is more than code — it’s a way of collaboration and a culture of innovation. Whether you are an experienced Apache Committer or someone who just submitted your first pull request, Community Over Code Asia 2026 welcomes your voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;See you in Beijing this August.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cfp</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
      <category>communityovercodeasia</category>
    </item>
  </channel>
</rss>
