<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chen Debra</title>
    <description>The latest articles on DEV Community by Chen Debra (@chen_debra_3060b21d12b1b0).</description>
    <link>https://dev.to/chen_debra_3060b21d12b1b0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1533306%2Fc0ea3a94-ba17-47c8-9304-4571fb1adaf9.png</url>
      <title>DEV Community: Chen Debra</title>
      <link>https://dev.to/chen_debra_3060b21d12b1b0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chen_debra_3060b21d12b1b0"/>
    <language>en</language>
    <item>
      <title>Part 7 | Where Scheduling Systems Really Break and the Hidden Bottlenecks Beyond CPU and Scale</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:58:17 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-lgj</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-7-where-scheduling-systems-really-break-and-the-hidden-bottlenecks-beyond-cpu-and-scale-lgj</guid>
      <description>&lt;p&gt;In production environments, performance issues in a scheduling platform are rarely caused by a single bottleneck. Instead, they arise from the combined effects of scheduling decisions, task execution, metadata storage, and coordination mechanisms. In Apache DolphinScheduler, for example, focusing on just one component, such as the Master or Worker, often leads to misidentifying the root cause.&lt;/p&gt;

&lt;p&gt;This article is based on real-world production experience. It systematically breaks down performance bottlenecks in a scheduling platform and provides practical, actionable optimization strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. From the overall architecture, where exactly are the bottlenecks?
&lt;/h2&gt;

&lt;p&gt;The core workflow of DolphinScheduler can be abstracted as:&lt;/p&gt;

&lt;p&gt;Scheduling → Execution → Storage → Coordination&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hcoleuc4r1dorc1ym9m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hcoleuc4r1dorc1ym9m.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Any layer can become a bottleneck, but the most common issues are concentrated in four areas:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Insufficient scheduling throughput on the Master&lt;/li&gt;
&lt;li&gt;Mismatch between Worker execution capacity and workload&lt;/li&gt;
&lt;li&gt;Excessive pressure on the database (MySQL/PostgreSQL)&lt;/li&gt;
&lt;li&gt;Latency or instability in ZooKeeper (coordination layer)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  2. The Master bottleneck is not CPU, but the “scheduling model”
&lt;/h2&gt;

&lt;p&gt;Many assume the Master’s CPU is the issue. In practice, the real bottleneck is the combination of the scheduling model and database I/O.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Scheduling mechanism
&lt;/h3&gt;

&lt;p&gt;The Master’s core loop looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MasterSchedulerService.java (simplified)&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ProcessInstance&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findNeedScheduleProcessInstances&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ProcessInstance&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;submitProcessInstance&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a polling + database-driven model. The key limitation is that scheduling capacity is directly tied to database throughput.&lt;/p&gt;
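
&lt;p&gt;A rough way to see why the database caps scheduling capacity is to estimate the ceiling of one polling loop. The sketch below is illustrative only: the class name, the method, and all timing figures are assumptions rather than DolphinScheduler code or measurements. It assumes each cycle fetches one batch, submits every instance at a fixed per-instance cost (mostly database writes), and then sleeps.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Hypothetical back-of-the-envelope model of a polling scheduler loop.
final class PollingThroughput {

    // Upper bound on instances scheduled per minute by a single loop.
    // Cycle time = one scan query + per-instance submit cost + sleep between scans.
    static long maxInstancesPerMinute(int batchSize, long queryMs, long submitMsPerInstance, long sleepMs) {
        long cycleMs = queryMs + batchSize * submitMsPerInstance + sleepMs;
        return batchSize * 60_000L / cycleMs;
    }

    public static void main(String[] args) {
        // Assumed figures: 100-row batch, 200 ms scan, 50 ms of writes per instance, 1 s sleep.
        System.out.println(maxInstancesPerMinute(100, 200, 50, 1_000)); // 967
        // Same loop if submission were free: only the scan and the sleep remain.
        System.out.println(maxInstancesPerMinute(100, 200, 0, 1_000));  // 5000
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these assumed numbers, the per-instance database cost dominates the cycle rather than CPU, which is consistent with a Master that shows low CPU usage while database QPS stays high.&lt;/p&gt;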

&lt;h3&gt;
  
  
  2.2 Typical symptoms
&lt;/h3&gt;

&lt;p&gt;High scheduling latency:&lt;/p&gt;

&lt;p&gt;Tasks are ready but delayed by tens of seconds before execution, while Master CPU usage remains low and database QPS is high.&lt;/p&gt;

&lt;p&gt;Low throughput:&lt;/p&gt;

&lt;p&gt;The system may only schedule a few hundred tasks per minute, and adding more Masters yields limited improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Optimization strategies
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Reduce database scanning pressure
&lt;/h4&gt;

&lt;p&gt;Typical SQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;t_ds_process_instance&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'READY'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optimization: add a composite index that leads with the filtered column, &lt;code&gt;state&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_state_priority_time&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t_ds_process_instance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_time&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additional measures include limiting scan batch sizes and tuning scheduling intervals to avoid excessive polling.&lt;/p&gt;

&lt;h4&gt;
  
  
  Increase scheduling concurrency
&lt;/h4&gt;

&lt;p&gt;Key configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
  &lt;span class="na"&gt;dispatch-task-number&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Practical guidelines:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;exec-threads&lt;/code&gt; should be approximately 2 to 4 times the number of CPU cores.&lt;br&gt;
Keep &lt;code&gt;dispatch-task-number&lt;/code&gt; moderate so that a single dispatch round does not overwhelm Workers.&lt;/p&gt;
&lt;h4&gt;
  
  
  Scale out Masters
&lt;/h4&gt;

&lt;p&gt;DolphinScheduler supports multiple Masters, but scaling is not linear due to shared database bottlenecks and ZooKeeper coordination overhead.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. More Workers is not always better
&lt;/h2&gt;

&lt;p&gt;Adding more Workers blindly can overload the database and worsen queuing.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.1 Worker configuration
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;worker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Workers act as both execution units and resource isolation boundaries.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.2 Estimation formula
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Worker count ≈ Total concurrent tasks / Per-Worker concurrency
Per-Worker concurrency ≈ CPU cores × (2 to 4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3.3 Example
&lt;/h3&gt;

&lt;p&gt;For 1,000 concurrent tasks and 16-core Workers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Per Worker ≈ 16 × (2 to 4) = 32 to 64 concurrent tasks
Required Workers ≈ 1000 / 50 = 20 (assuming about 50 tasks per Worker, near the middle of that range)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
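
&lt;p&gt;The estimate above can be written as a small helper. This is a minimal sketch, not part of DolphinScheduler; the class and method names are hypothetical, and it rounds up so that the fleet never falls short of the target concurrency.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Hypothetical sizing helper based on the estimation formula in section 3.2.
final class WorkerSizing {

    // Per-Worker concurrency range: CPU cores x (2 to 4), returned as {low, high}.
    static int[] perWorkerRange(int cpuCores) {
        return new int[] {cpuCores * 2, cpuCores * 4};
    }

    // Worker count = total concurrent tasks / per-Worker concurrency, rounded up.
    static int requiredWorkers(int totalConcurrentTasks, int perWorkerConcurrency) {
        return (totalConcurrentTasks + perWorkerConcurrency - 1) / perWorkerConcurrency;
    }

    public static void main(String[] args) {
        int[] range = perWorkerRange(16);                    // {32, 64}
        System.out.println(requiredWorkers(1000, range[0])); // 32 Workers at 32 tasks each
        System.out.println(requiredWorkers(1000, 50));       // 20 Workers at about 50 tasks each
        System.out.println(requiredWorkers(1000, range[1])); // 16 Workers at 64 tasks each
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Depending on where in the 2x to 4x range a Worker can really operate, the same load needs between 16 and 32 Workers, so the multiplier is worth measuring rather than guessing.&lt;/p&gt;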



&lt;h3&gt;
  
  
  3.4 Task type matters more
&lt;/h3&gt;

&lt;p&gt;Short tasks (&amp;lt;5 seconds):&lt;/p&gt;

&lt;p&gt;Scheduling overhead can exceed the execution time itself, making the Master the bottleneck.&lt;/p&gt;

&lt;p&gt;Long tasks (&amp;gt;10 minutes):&lt;/p&gt;

&lt;p&gt;Workers become resource bottlenecks due to long occupation time.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Different strategies for short and long tasks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Short tasks optimization
&lt;/h3&gt;

&lt;p&gt;Typical scenarios include SQL queries and API calls.&lt;/p&gt;

&lt;p&gt;Batching example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Before: multiple small queries&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- After: batch query&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;my_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Other strategies include making DAGs coarser-grained (merging many tiny tasks into fewer nodes) and moving loops into scripts.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Long tasks optimization
&lt;/h3&gt;

&lt;p&gt;Typical scenarios include Spark or Flink jobs.&lt;/p&gt;

&lt;p&gt;The bottleneck lies in resource systems rather than the scheduler.&lt;/p&gt;

&lt;p&gt;Strategies:&lt;/p&gt;

&lt;p&gt;Bind workloads to YARN queues or Kubernetes namespaces and enforce concurrency limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The database bottleneck is the most underestimated
&lt;/h2&gt;

&lt;p&gt;In our experience, around 80% of production performance issues ultimately trace back to the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Common problems
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slow queries&lt;/li&gt;
&lt;li&gt;Row-level lock contention&lt;/li&gt;
&lt;li&gt;Connection pool exhaustion&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  5.2 Typical SQL
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;t_ds_task_instance&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'RUNNING'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Frequent updates to the same rows lead to lock contention and reduced throughput.&lt;/p&gt;
&lt;h3&gt;
  
  
  5.3 Optimization strategies
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Read-write separation
&lt;/h4&gt;

&lt;p&gt;Masters handle writes, while APIs and queries use read replicas.&lt;/p&gt;
&lt;h4&gt;
  
  
  Reduce update frequency
&lt;/h4&gt;

&lt;p&gt;Inefficient pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RUNNING → RUNNING → RUNNING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optimization:&lt;/p&gt;

&lt;p&gt;Reduce heartbeat frequency.&lt;/p&gt;
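
&lt;p&gt;Besides lowering the heartbeat frequency, redundant RUNNING → RUNNING writes can be skipped by comparing against the last persisted value before issuing the update. The sketch below is an assumption about how such a guard could look, not DolphinScheduler's implementation; the class and method names are made up for illustration, and it tracks a single task for brevity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Hypothetical guard that suppresses redundant state writes for one task.
final class StateWriteGuard {

    private String lastWrittenState; // null until the first write

    // Returns true only when the state differs from the last persisted one.
    boolean shouldWrite(String newState) {
        if (newState.equals(lastWrittenState)) {
            return false; // e.g. RUNNING reported again: skip the UPDATE
        }
        lastWrittenState = newState;
        return true;
    }

    public static void main(String[] args) {
        StateWriteGuard guard = new StateWriteGuard();
        System.out.println(guard.shouldWrite("RUNNING")); // true
        System.out.println(guard.shouldWrite("RUNNING")); // false
        System.out.println(guard.shouldWrite("SUCCESS")); // true
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;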

&lt;h4&gt;
  
  
  Batch updates
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Batch update task states&lt;/span&gt;
&lt;span class="n"&gt;updateBatch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;taskInstances&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
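
&lt;p&gt;What &lt;code&gt;updateBatch&lt;/code&gt; does internally is not shown in the snippet above. One common approach, sketched here as an assumption, is to collapse many single-row updates that share the same target state into one statement with an &lt;code&gt;IN&lt;/code&gt; list. The class and method names are hypothetical; the table and column names follow the SQL in section 5.2.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Collections;

// Hypothetical builder for a single-statement batch state update.
final class BatchUpdateSql {

    // Builds a parameterized UPDATE covering idCount task instances.
    // The first placeholder is the new state, the rest are task ids.
    static String build(int idCount) {
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "UPDATE t_ds_task_instance SET state = ? WHERE id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // One round trip and one statement instead of three.
        System.out.println(build(3));
        // UPDATE t_ds_task_instance SET state = ? WHERE id IN (?, ?, ?)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The resulting string is meant to be used with a prepared statement. Very long ID lists should be split into chunks, since databases limit statement size and parameter count.&lt;/p&gt;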



&lt;h2&gt;
  
  
  6. ZooKeeper as a hidden bottleneck
&lt;/h2&gt;

&lt;p&gt;ZooKeeper is responsible for coordination, including Master election, Worker registration, and heartbeat management.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.1 Common symptoms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduling jitter under high load&lt;/li&gt;
&lt;li&gt;Workers falsely marked as dead&lt;/li&gt;
&lt;li&gt;Frequent Master failovers&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  6.2 Root causes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Improper session timeout settings&lt;/li&gt;
&lt;li&gt;Too many nodes and connections&lt;/li&gt;
&lt;li&gt;Network instability&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  6.3 Optimization
&lt;/h3&gt;

&lt;p&gt;Example configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;tickTime&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2000&lt;/span&gt;
&lt;span class="py"&gt;initLimit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;span class="py"&gt;syncLimit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recommendations:&lt;/p&gt;

&lt;p&gt;Increase session timeout to at least 20 seconds to tolerate transient failures.&lt;br&gt;
Deploy ZooKeeper independently to avoid resource contention.&lt;/p&gt;
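
&lt;p&gt;When raising the session timeout, keep in mind that the ZooKeeper server negotiates the value a client requests. By default it clamps the request to between 2 and 20 times &lt;code&gt;tickTime&lt;/code&gt;, unless &lt;code&gt;minSessionTimeout&lt;/code&gt; or &lt;code&gt;maxSessionTimeout&lt;/code&gt; is set explicitly. The helper below is a hypothetical sketch of that rule, not ZooKeeper code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Hypothetical illustration of ZooKeeper's default session timeout bounds.
final class SessionTimeout {

    // Default bounds: minimum = 2 x tickTime, maximum = 20 x tickTime.
    static int negotiatedMs(int tickTimeMs, int requestedMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.min(Math.max(requestedMs, min), max);
    }

    public static void main(String[] args) {
        System.out.println(negotiatedMs(2000, 20_000)); // 20000: within bounds
        System.out.println(negotiatedMs(2000, 60_000)); // 40000: capped at 20 x tickTime
        System.out.println(negotiatedMs(2000, 1_000));  // 4000: raised to 2 x tickTime
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the example &lt;code&gt;tickTime&lt;/code&gt; of 2000 ms, a 20-second session timeout fits within the default window of 4 to 40 seconds.&lt;/p&gt;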
&lt;h2&gt;
  
  
  7. A real-world optimization case
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Background
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Daily tasks: 200,000&lt;/li&gt;
&lt;li&gt;DAGs: 30,000&lt;/li&gt;
&lt;li&gt;Masters: 2&lt;/li&gt;
&lt;li&gt;Workers: 30&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Issues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduling latency exceeded 1 minute during peak hours&lt;/li&gt;
&lt;li&gt;Database CPU usage reached 90 percent&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Optimization process
&lt;/h3&gt;

&lt;p&gt;Step 1: Database indexing&lt;br&gt;
Result: latency reduced by 40 percent&lt;/p&gt;

&lt;p&gt;Step 2: Reduce short tasks&lt;br&gt;
Result: DAG count reduced by 30 percent&lt;/p&gt;

&lt;p&gt;Step 3: Adjust Master parameters&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;exec-threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50 → &lt;/span&gt;&lt;span class="m"&gt;120&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: throughput doubled&lt;/p&gt;

&lt;h3&gt;
  
  
  Final results
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduling latency reduced from 60 seconds to 8 seconds&lt;/li&gt;
&lt;li&gt;Database CPU usage reduced from 90 percent to 50 percent&lt;/li&gt;
&lt;li&gt;Overall throughput improved by 2 to 3 times&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Summary: the essence of scheduling performance optimization
&lt;/h2&gt;

&lt;p&gt;The core insight is that performance is a balance of:&lt;/p&gt;

&lt;p&gt;Scheduling capacity × Execution capacity × Storage capacity × Coordination capability&lt;/p&gt;

&lt;p&gt;Optimization must be holistic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Master controls the scheduling rhythm&lt;/li&gt;
&lt;li&gt;Workers provide execution capacity&lt;/li&gt;
&lt;li&gt;The database defines system limits&lt;/li&gt;
&lt;li&gt;ZooKeeper ensures coordination stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately:&lt;/p&gt;

&lt;p&gt;The limit of a scheduling system is not how many tasks it can dispatch, but how long the database can sustain the load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Previous articles:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-ffeaf159f534" rel="noopener noreferrer"&gt;Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next: The boundaries between DolphinScheduler and Flink, Spark, and SeaTunnel&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>scheduling</category>
      <category>apachedolphinscheduler</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Can Your Scheduler Fix Itself at 2 AM? Inside the DolphinScheduler Agent Meetup</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 10:18:14 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/can-your-scheduler-fix-itself-at-2-am-inside-the-dolphinscheduler-agent-meetup-3ae0</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/can-your-scheduler-fix-itself-at-2-am-inside-the-dolphinscheduler-agent-meetup-3ae0</guid>
      <description>&lt;p&gt;If you’ve ever worked with scheduling systems, you’ve probably had moments like this:&lt;/p&gt;

&lt;p&gt;At 2 AM, your phone suddenly lights up.&lt;br&gt;
Not a message—an alert. A job has failed.&lt;/p&gt;

&lt;p&gt;You stare at the screen, with only one thought in your head:&lt;br&gt;
&lt;strong&gt;“Can it just fix itself?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It sounds a bit idealistic.&lt;br&gt;
But this time, we actually want to take it seriously.&lt;/p&gt;

&lt;p&gt;Soon, the Apache DolphinScheduler community will host a new online Meetup.&lt;/p&gt;

&lt;p&gt;This time, we won’t dive into grand architectures or complex theories.&lt;br&gt;
Instead, we start with a very “engineer-like” question:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Can a scheduling system require less human effort?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  📅 &lt;strong&gt;Event Info&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time&lt;/strong&gt;: April 21, 2026, 14:00–15:00&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format&lt;/strong&gt;: Online livestream&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Register your seat now:&lt;/strong&gt; &lt;a href="https://meeting.tencent.com/dm/sdXKjKfLewVe" rel="noopener noreferrer"&gt;https://meeting.tencent.com/dm/sdXKjKfLewVe&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎤 &lt;strong&gt;Who’s Speaking?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv37xjqy2ier6myjfztsi.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv37xjqy2ier6myjfztsi.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This session features &lt;strong&gt;Liu Xiaodong&lt;/strong&gt;,&lt;br&gt;
an algorithm engineer from Shanghai FamilyMart Co., Ltd.&lt;/p&gt;

&lt;p&gt;His self-introduction is quite fun:&lt;/p&gt;

&lt;p&gt;Not limited to one direction—he tinkers with everything.&lt;br&gt;
Writes code, builds systems, explores new ideas.&lt;br&gt;
And occasionally “wanders around Hyrule to discover new landscapes.”&lt;/p&gt;

&lt;p&gt;Sounds like this won’t be a conventional talk.&lt;/p&gt;

&lt;h2&gt;
  
  
  💡 &lt;strong&gt;What’s the Topic?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The topic is simple yet vivid:&lt;br&gt;
&lt;strong&gt;“DolphinScheduler Agent: I Just Want to Lie Down and Still Get Work Done”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It starts from a very real idea:&lt;/p&gt;

&lt;p&gt;The dream state of a “lazy engineer” is:&lt;br&gt;
When something breaks, the system detects and fixes it automatically.&lt;br&gt;
Humans just take a glance and say a word—everything else is handled.&lt;/p&gt;

&lt;p&gt;Sounds exaggerated?&lt;/p&gt;

&lt;p&gt;This talk will explore:&lt;br&gt;
👉 &lt;strong&gt;How far can we actually go in this direction?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 &lt;strong&gt;What Will You Learn?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is not a purely conceptual talk, but an &lt;strong&gt;ongoing exploration&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The design of DolphinScheduler Agent&lt;/li&gt;
&lt;li&gt;How to make scheduling systems more “self-healing”&lt;/li&gt;
&lt;li&gt;Real-world attempts and lessons learned&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;working demo&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than giving standard answers, it’s more like:&lt;br&gt;
&lt;strong&gt;a journey recap + new ways of thinking&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🎁 &lt;strong&gt;Bonus&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There will also be a &lt;strong&gt;lucky draw&lt;/strong&gt; during the livestream 🎉&lt;/p&gt;

&lt;p&gt;You might even win a custom Apache DolphinScheduler keychain—&lt;br&gt;
a must-have for community members!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jcki55c7hnkop7heatr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jcki55c7hnkop7heatr.png" alt="DS keychain" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  👀 &lt;strong&gt;Who Should Join?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This Meetup is for you if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re using or exploring DolphinScheduler&lt;/li&gt;
&lt;li&gt;You’re interested in automation, agents, or intelligent operations&lt;/li&gt;
&lt;li&gt;You want to see real demos, not just slides&lt;/li&gt;
&lt;li&gt;Or you simply want to “work less” in a smarter way&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📢 &lt;strong&gt;Final Thought&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We’re used to fixing problems when they occur.&lt;br&gt;
But rarely do we ask:&lt;br&gt;
&lt;strong&gt;Can systems prevent problems—or even solve them on their own?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Maybe that’s the next step for scheduling systems.&lt;/p&gt;

&lt;p&gt;📅 April 21&lt;br&gt;
Let’s talk about building systems that are a little less exhausting.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>apachedolphinscheduler</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Apache DolphinScheduler Local Setup Made Simple: A Beginner-Friendly Guide</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 10:08:09 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-local-setup-made-simple-a-beginner-friendly-guide-108e</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-local-setup-made-simple-a-beginner-friendly-guide-108e</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm86el1td22eufuncrqu7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm86el1td22eufuncrqu7.jpg" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article is intended for developers who want to read and debug the core source code of Apache DolphinScheduler locally. The example environment is based on &lt;code&gt;Windows + IntelliJ IDEA + Docker Desktop + PostgreSQL + ZooKeeper&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you only want to quickly try out features rather than debug the full chain of &lt;code&gt;master / worker / api&lt;/code&gt;, it is recommended to use &lt;code&gt;StandaloneServer&lt;/code&gt; first. If you want to debug the distributed scheduling workflow, follow this guide to start services separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Use Cases&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start &lt;code&gt;MasterServer&lt;/code&gt;, &lt;code&gt;WorkerServer&lt;/code&gt;, and &lt;code&gt;ApiApplicationServer&lt;/code&gt; individually in IntelliJ IDEA&lt;/li&gt;
&lt;li&gt;Use Docker Desktop to host PostgreSQL and ZooKeeper&lt;/li&gt;
&lt;li&gt;Debug Java services locally on the host machine&lt;/li&gt;
&lt;li&gt;Run the frontend locally and connect it to backend APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Environment Requirements&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Docker Desktop&lt;/li&gt;
&lt;li&gt;JDK 8 or 11&lt;/li&gt;
&lt;li&gt;Maven 3.8+ (or use the built-in &lt;code&gt;mvnw.cmd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Node.js 16+&lt;/li&gt;
&lt;li&gt;pnpm 8+&lt;/li&gt;
&lt;li&gt;IntelliJ IDEA&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;java.version&lt;/code&gt; in the root &lt;code&gt;pom.xml&lt;/code&gt; is &lt;code&gt;1.8&lt;/code&gt;. It is recommended to use JDK 8 or 11 for local debugging.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Start PostgreSQL and ZooKeeper&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, navigate to the &lt;code&gt;deploy/docker&lt;/code&gt; directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler\deploy\docker&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using the &lt;code&gt;docker-compose-windows.yml&lt;/code&gt; provided in the appendix, ensure that &lt;code&gt;dolphinscheduler-zookeeper&lt;/code&gt; exposes port &lt;code&gt;2181&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;master&lt;/code&gt;, &lt;code&gt;worker&lt;/code&gt;, and &lt;code&gt;api&lt;/code&gt; all connect to &lt;code&gt;localhost:2181&lt;/code&gt; by default. If ZooKeeper runs only inside the container without port mapping, Java processes started in IDEA will fail to connect.&lt;/p&gt;

&lt;p&gt;Ensure the following configuration exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dolphinscheduler-zookeeper&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;zookeeper:3.8&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2181:2181"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker-compose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docker-compose-windows.yml&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;up&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler-postgresql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler-zookeeper&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optional verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ps&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;127.0.0.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;5432&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;localhost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;2181&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port &lt;code&gt;5432&lt;/code&gt; is reachable&lt;/li&gt;
&lt;li&gt;Port &lt;code&gt;2181&lt;/code&gt; is reachable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use locally installed or remote PostgreSQL and ZooKeeper instead of Docker, skip this step, but make sure the settings in step 4 match your environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Build the Project&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\mvnw.cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;spotless:apply&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\mvnw.cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clean&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DskipTests&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;spotless:apply&lt;/code&gt; formats code to avoid check failures&lt;/li&gt;
&lt;li&gt;The first build may take a while&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Initialize PostgreSQL Metadata Database&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before starting &lt;code&gt;master&lt;/code&gt; and &lt;code&gt;api&lt;/code&gt;, initialize metadata tables.&lt;/p&gt;

&lt;p&gt;SQL script location:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_postgresql.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Docker PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Get-Content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler-dao\src\main\resources\sql\dolphinscheduler_postgresql.sql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;exec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-e&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docker-dolphinscheduler-postgresql-1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;psql&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-U&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dolphinscheduler&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, use DataGrip, DBeaver, or &lt;code&gt;psql&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: This script contains &lt;code&gt;DROP TABLE IF EXISTS&lt;/code&gt;. Do NOT run it on production databases.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="k"&gt;version&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;t_ds_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected: one record returned (e.g., &lt;code&gt;3.4.0&lt;/code&gt;)&lt;/p&gt;
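
&lt;p&gt;If you are using the Docker PostgreSQL container, the same check can be run without a separate SQL client. This is a minimal sketch that reuses the container name, user, password, and database from the import command above; adjust them if your setup differs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Run the verification query inside the PostgreSQL container.
# Container name and credentials are the ones used by the import command above.
docker exec -e PGPASSWORD=root docker-dolphinscheduler-postgresql-1 psql -U root -d dolphinscheduler -c "select version from t_ds_version;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;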

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Verify Local Configuration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Default configs (usually no changes needed):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL: &lt;code&gt;127.0.0.1:5432&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;DB: &lt;code&gt;dolphinscheduler&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Username: &lt;code&gt;root&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;root&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ZooKeeper: &lt;code&gt;localhost:2181&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Config files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-master/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-api/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dolphinscheduler-worker/.../application.yaml&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If needed, modify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.url&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.username&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;spring.datasource.password&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;registry.zookeeper.connect-string&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do NOT use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dspring.profiles.active=mysql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-Dspring.profiles.active=postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;5. Configure IntelliJ IDEA Run Configurations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Common settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JDK: 8 or 11&lt;/li&gt;
&lt;li&gt;Use the classpath of the module&lt;/li&gt;
&lt;li&gt;Enable: &lt;code&gt;Add dependencies with "provided" scope to classpath&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Working directory: project root&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
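&lt;p&gt;Enabling the “provided” scope option is critical. Without it, dependencies declared with that scope are left off the classpath and the services fail at startup with missing-dependency errors.&lt;/p&gt;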
&lt;/blockquote&gt;

&lt;p&gt;Create these configurations:&lt;/p&gt;

&lt;h3&gt;
  
  
  MasterServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.server.master.MasterServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPC: 5678&lt;/li&gt;
&lt;li&gt;Spring Boot: 5679&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  WorkerServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.server.worker.WorkerServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RPC: 1234&lt;/li&gt;
&lt;li&gt;Spring Boot: 1235&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ApiApplicationServer
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Main class: &lt;code&gt;org.apache.dolphinscheduler.api.ApiApplicationServer&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP: 12345&lt;/li&gt;
&lt;li&gt;Gateway: 25333&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Startup order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MasterServer&lt;/li&gt;
&lt;li&gt;WorkerServer&lt;/li&gt;
&lt;li&gt;ApiApplicationServer&lt;/li&gt;
&lt;/ol&gt;
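
&lt;p&gt;Once all three services are running, a quick way to confirm they are listening is to probe the ports listed above. The loop below is only a sketch: it assumes the default ports (Master RPC 5678, Worker RPC 1234, API HTTP 12345) have not been changed in your configuration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Probe the default Master RPC, Worker RPC, and API HTTP ports on this machine.
foreach ($port in 5678, 1234, 12345) {
    Test-NetConnection -ComputerName localhost -Port $port |
        Select-Object RemotePort, TcpTestSucceeded
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;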

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Start Frontend&lt;/strong&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;your-path&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;\dolphinscheduler\dolphinscheduler-ui&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;pnpm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;pnpm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;dev&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:5173
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default credentials:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username: &lt;code&gt;admin&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Password: &lt;code&gt;dolphinscheduler123&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Verification&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  API
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/actuator/health&lt;/code&gt; → should return &lt;code&gt;UP&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/swagger-ui&lt;/code&gt; → should load successfully&lt;/li&gt;
&lt;/ul&gt;
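
&lt;p&gt;The health check can also be scripted. The snippet below is a sketch, not an official procedure: it assumes the API server runs on port 12345 and is served under the context path &lt;code&gt;/dolphinscheduler&lt;/code&gt;, so change &lt;code&gt;$baseUrl&lt;/code&gt; if your configuration differs. A healthy server should report &lt;code&gt;UP&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Assumption: API on port 12345 with context path /dolphinscheduler.
$baseUrl = "http://localhost:12345/dolphinscheduler"

# Invoke-RestMethod parses the JSON response into an object.
$health = Invoke-RestMethod -Uri "$baseUrl/actuator/health"
$health.status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;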

&lt;h3&gt;
  
  
  Frontend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Access UI and log in successfully&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Check the IDEA console of each service for fatal errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;8. Common Issues&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ZooKeeper connection failed
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;ZooKeeper is not running&lt;/li&gt;
&lt;li&gt;Port 2181 not exposed&lt;/li&gt;
&lt;/ul&gt;
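
&lt;p&gt;Both causes can be checked from PowerShell. This sketch assumes ZooKeeper runs in Docker and that the container name contains &lt;code&gt;zookeeper&lt;/code&gt;, as it does for the &lt;code&gt;dolphinscheduler-zookeeper&lt;/code&gt; service started earlier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# List running containers whose name contains "zookeeper".
docker ps --filter "name=zookeeper"

# Check that port 2181 is reachable from the host.
Test-NetConnection -ComputerName localhost -Port 2181 |
    Select-Object RemotePort, TcpTestSucceeded
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;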

&lt;h3&gt;
  
  
  Missing &lt;code&gt;t_ds_version&lt;/code&gt; table
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;DB not initialized&lt;/li&gt;
&lt;li&gt;Wrong database&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Missing dependencies in IDEA
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;code&gt;Add dependencies with "provided" scope to classpath&lt;/code&gt; in each run configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Port 12345 occupied
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Stop conflicting processes&lt;/li&gt;
&lt;/ul&gt;
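
&lt;p&gt;To see which process holds the port before stopping anything, something like the following can help. It is a sketch using standard Windows cmdlets; the &lt;code&gt;Stop-Process&lt;/code&gt; line is left commented out so nothing is terminated by accident.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Find the listener(s) on port 12345; returns nothing if the port is free.
$conn = Get-NetTCPConnection -LocalPort 12345 -State Listen -ErrorAction SilentlyContinue

if ($conn) {
    # IPv4 and IPv6 listeners may both appear, so de-duplicate the process IDs.
    $processIds = $conn.OwningProcess | Select-Object -Unique
    Get-Process -Id $processIds | Select-Object Id, ProcessName
    # Stop-Process -Id $processIds   # uncomment only after confirming the process is safe to stop
} else {
    "Port 12345 is free"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;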

</description>
      <category>beginners</category>
      <category>opensource</category>
      <category>apachedolphinscheduler</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Built by the Community: Apache DolphinScheduler March 2026 Highlights</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:59:10 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/built-by-the-community-apache-dolphinscheduler-march-2026-highlights-4nmp</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/built-by-the-community-apache-dolphinscheduler-march-2026-highlights-4nmp</guid>
      <description>&lt;p&gt;Hey there! The March 2026 monthly report is here! The Apache DolphinScheduler community has been on fire 🔥&lt;/p&gt;

&lt;p&gt;A total of 13 contributors actively submitted code. Version &lt;strong&gt;3.4.1&lt;/strong&gt; was released, bringing enhanced scheduling, upgraded task plugins, API &amp;amp; UI improvements, and 15+ bug fixes.&lt;/p&gt;

&lt;p&gt;Meanwhile, infrastructure has also been upgraded. Both enterprise and individual users are encouraged to upgrade and explore the latest features. Let’s grow with the community 🚀&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reporting period: March 1, 2026 – March 30, 2026&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Release&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Release Date&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3.4.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2026-03-01&lt;/td&gt;
&lt;td&gt;Latest stable release&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;📎 Download: &lt;a href="https://dolphinscheduler.apache.org/download" rel="noopener noreferrer"&gt;https://dolphinscheduler.apache.org/download&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. Key Feature Updates&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1 Scheduling Enhancements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Configurable Max Runtime&lt;/td&gt;
&lt;td&gt;Set maximum runtime limits for workflows/tasks&lt;/td&gt;
&lt;td&gt;#17932&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker Group Optimization&lt;/td&gt;
&lt;td&gt;Allow creation of Worker Groups without Workers&lt;/td&gt;
&lt;td&gt;#17927&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling Timeout Detection&lt;/td&gt;
&lt;td&gt;Handle cases with missing or unavailable Workers&lt;/td&gt;
&lt;td&gt;#17796&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2 Task Plugin Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Java Task&lt;/td&gt;
&lt;td&gt;Support built-in &amp;amp; custom variables&lt;/td&gt;
&lt;td&gt;#17860&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zeppelin Task&lt;/td&gt;
&lt;td&gt;Support parameter parsing&lt;/td&gt;
&lt;td&gt;#17862&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedure Task&lt;/td&gt;
&lt;td&gt;Support cancellation &amp;amp; output parameters&lt;/td&gt;
&lt;td&gt;#17696, #17973&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP Task&lt;/td&gt;
&lt;td&gt;Fix nested JSON sending issue&lt;/td&gt;
&lt;td&gt;#17911&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.3 API &amp;amp; UI Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Remove import/export (DSIP-104)&lt;/td&gt;
&lt;td&gt;#17941&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Improve Spark parameter validation&lt;/td&gt;
&lt;td&gt;#17958&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Fix Keycloak icon 404 issue&lt;/td&gt;
&lt;td&gt;#18007&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI&lt;/td&gt;
&lt;td&gt;Fix lock not released on request failure&lt;/td&gt;
&lt;td&gt;#17989&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Bug Fixes&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix timeout alert failure&lt;/td&gt;
&lt;td&gt;#17818&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix workflow failure strategy issue&lt;/td&gt;
&lt;td&gt;#17851&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;Fix task not marked failed on init error&lt;/td&gt;
&lt;td&gt;#17821&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependent&lt;/td&gt;
&lt;td&gt;Fix PostgreSQL dependency SQL error&lt;/td&gt;
&lt;td&gt;#17837&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Fix token deletion issue for non-admin users&lt;/td&gt;
&lt;td&gt;#17997&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Add tenant validation&lt;/td&gt;
&lt;td&gt;#17970&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DAO&lt;/td&gt;
&lt;td&gt;Fix type mismatch in workflow_definition_code&lt;/td&gt;
&lt;td&gt;#17988&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alert&lt;/td&gt;
&lt;td&gt;Fix timeout unit inconsistency&lt;/td&gt;
&lt;td&gt;#17920&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SeaTunnel&lt;/td&gt;
&lt;td&gt;Fix broken documentation link&lt;/td&gt;
&lt;td&gt;#17905&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Params&lt;/td&gt;
&lt;td&gt;Fix Procedure Task param passing issue&lt;/td&gt;
&lt;td&gt;#17968&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Community Updates&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Top Contributors&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In March, &lt;strong&gt;31 PRs&lt;/strong&gt; were merged. Thanks to all &lt;strong&gt;9 contributors&lt;/strong&gt; 🙌&lt;/p&gt;

&lt;p&gt;Full list: &lt;a href="https://github.com/apache/dolphinscheduler/graphs/contributors" rel="noopener noreferrer"&gt;https://github.com/apache/dolphinscheduler/graphs/contributors&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Infrastructure Updates&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Upgrade ZooKeeper to 3.8.3&lt;/li&gt;
&lt;li&gt;Upgrade Testcontainers to 1.21.4&lt;/li&gt;
&lt;li&gt;Update license year&lt;/li&gt;
&lt;li&gt;Add AI usage confirmation to PR template&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Enterprise Recommendations&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  🔧 Upgrade Advice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We recommend upgrading production environments to &lt;strong&gt;3.4.1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The release includes multiple bug fixes and stability improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  📋 Key Features to Watch
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Runtime limits for workflows/tasks&lt;/li&gt;
&lt;li&gt;Flexible Worker Group management&lt;/li&gt;
&lt;li&gt;Enhanced Procedure Task capabilities&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ⚠️ Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No major API changes this month&lt;/li&gt;
&lt;li&gt;Follow the official docs for the latest configuration options&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Statistics&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;March Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Releases&lt;/td&gt;
&lt;td&gt;1 (3.4.1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Improvements&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bug Fixes&lt;/td&gt;
&lt;td&gt;15+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contributors&lt;/td&gt;
&lt;td&gt;13+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
      <category>community</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>Meet ASF’s New Member Xiang Zihao: How He Impacts the Community with Code and the Apache Way</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:24:08 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/meet-asfs-new-member-xiang-zihao-how-he-impacts-the-community-with-code-and-the-apache-way-4ko9</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/meet-asfs-new-member-xiang-zihao-how-he-impacts-the-community-with-code-and-the-apache-way-4ko9</guid>
      <description>&lt;p&gt;Congratulations to Xiang Zihao on being recently invited to become an ASF Member! He is a PMC Member of Apache DolphinScheduler, and the community is truly delighted by this well-deserved recognition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleh1v3557mvdxcnoyadc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fleh1v3557mvdxcnoyadc.png" alt="467d043346d43a87f99395f5ff9e631c" width="560" height="949"&gt;&lt;/a&gt;&lt;br&gt;
Over the years, his continuous contributions to the community have been evident to all—from documentation improvements to code enhancements, from active discussions to helping newcomers. His presence can be seen everywhere. Beyond Apache DolphinScheduler, he is also deeply involved in multiple ASF open source projects, consistently practicing the Apache Way year after year. All his persistent efforts have finally led him to this milestone.&lt;/p&gt;

&lt;p&gt;On this occasion, the community conducted another in-depth interview with him. This time, through five chapters—Personal Background, Open Source Contributions &amp;amp; Growth, Becoming an ASF Member, DolphinScheduler Community Development, and Open Source Culture—we take a closer look at his journey, his growth story in open source, and the passion and persistence he has accumulated within the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Personal Background
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q1: Could you briefly introduce yourself, including how you entered the big data and open source fields?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I’m Xiang Zihao / SbloodyS 👋&lt;br&gt;
My hobbies include coding during the day, gaming at night, taking my kid out on weekends, backpacking during holidays, and enjoying tea chats when I need a break.&lt;br&gt;
My life philosophy is: explore the world through code, and heal through life.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2: When did you start contributing to Apache DolphinScheduler? What was the trigger?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I first encountered Apache DolphinScheduler in 2021. It was actually quite accidental—an opportunity at work introduced me to this scheduling system. Unexpectedly, this “chance encounter” gradually drew me in, and I began contributing to the community.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3: What key work or features have you contributed to DolphinScheduler?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I have mainly worked on documentation optimization, performance improvements, bug fixes, code reviews, and CI/CD optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: Open Source Contributions &amp;amp; Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q4: In open source collaboration, what do you think is the most important ability? Technical skills, communication, or something else?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: I believe the most important ability in open source collaboration is not a single dimension, but a combination of technical skills, communication ability, and an open mindset.&lt;br&gt;
Technical skills are the foundation, communication determines efficiency and quality, and an open mindset is the key to long-term growth.&lt;br&gt;
If I had to prioritize, I’d say openness is the most fundamental—it determines whether you are willing to learn, ask, and evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5: What advice would you give to newcomers in open source?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Start by “using” rather than “building.”&lt;br&gt;
Become a real user first, identify problems during usage, submit issues, then gradually move to documentation fixes, bug fixes, and eventually core feature development.&lt;br&gt;
Don’t aim to contribute “big features” right away—every small PR is the beginning of building trust with the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Becoming an ASF Member
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q6: Congratulations on becoming an ASF Member! What was your first reaction?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Thank you! Honestly, my first reaction was a mix of surprise and gratitude.&lt;/p&gt;

&lt;p&gt;Surprise—because becoming an ASF Member was never my initial goal. In 2021, I simply started contributing to solve problems and give back to the community, and I never imagined this journey would lead here.&lt;/p&gt;

&lt;p&gt;Gratitude—because this honor represents the trust and support of the entire community. Without patient reviewers and fellow contributors, I wouldn’t be here today.&lt;/p&gt;

&lt;p&gt;For me, becoming an ASF Member is not an endpoint, but a new beginning. It means greater responsibility and a commitment to give back even more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q7: How closely related is this achievement to DolphinScheduler? What other factors contributed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: DolphinScheduler was an important foundation, but not the only reason.&lt;/p&gt;

&lt;p&gt;On one hand, it’s the first Apache project I deeply engaged in, where I built experience and credibility through contributions.&lt;/p&gt;

&lt;p&gt;On the other hand, ASF evaluates broader impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-project contributions&lt;/li&gt;
&lt;li&gt;Community-building efforts&lt;/li&gt;
&lt;li&gt;Practicing the Apache Way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, DolphinScheduler was my starting point, but sustained and sincere contributions to the broader Apache ecosystem made this possible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q8: What does becoming an ASF Member mean to you and the community?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: For me, it’s recognition from the global open source community—not for one achievement, but for long-term commitment. It’s also a responsibility to keep improving.&lt;/p&gt;

&lt;p&gt;For the community, ASF Members are core contributors responsible for project incubation, governance, and passing on the culture.&lt;/p&gt;

&lt;p&gt;For China’s open source ecosystem, more ASF Members represent growing global recognition and diversity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q9: How important is the Apache Way to project success?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It can be summed up in one phrase: “Community Over Code.”&lt;br&gt;
Code can be replaced, but a healthy, collaborative community cannot.&lt;br&gt;
The Apache Way ensures openness, transparency, and consensus-driven development—proven principles behind many successful projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 4: DolphinScheduler Community Development
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q10: What are the key milestones in DolphinScheduler’s growth?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Three major turning points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Donation to Apache&lt;/li&gt;
&lt;li&gt;Graduation from incubation&lt;/li&gt;
&lt;li&gt;Globalization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These milestones transformed it into a globally recognized project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q11: How do you see its positioning and future?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: DolphinScheduler is evolving into a next-generation cloud-native workflow orchestration platform, connecting the full data lifecycle.&lt;br&gt;
Its future lies in integrating with modern data stacks and becoming essential for data engineers worldwide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q12: What are your future plans as an ASF Member?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Three directions: Deepening, Expanding, and Passing On.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deepening: continue contributing to core tech and governance&lt;/li&gt;
&lt;li&gt;Expanding: engage in more Apache projects&lt;/li&gt;
&lt;li&gt;Passing On: help more developers enter open source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open source has given me a lot, and I want to pay it forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5: Open Source Culture &amp;amp; Personal Growth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q13: How has open source changed you?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: It reshaped my definition of growth.&lt;br&gt;
Before, growth meant improving skills. Now, it means expanding impact—helping others grow.&lt;br&gt;
I’ve transformed from a solo problem-solver into a global collaborator.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q14: How would you summarize the spirit of open source in one sentence?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Open source is a belief that sharing is more powerful than owning.&lt;/p&gt;

&lt;p&gt;That concludes our interview! If you found this inspiring, feel free to like, share, and spread the word so more people can discover valuable insights from the open source world 🏅&lt;/p&gt;

</description>
      <category>asf</category>
      <category>opensource</category>
      <category>apachedolphinscheduler</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Part 6 | Enterprise Multi-Tenancy and Resource Isolation Techniques in DolphinScheduler You Might Not Know</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 27 Mar 2026 03:22:57 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-f4n</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-6-enterprise-multi-tenancy-and-resource-isolation-techniques-in-dolphinscheduler-you-might-f4n</guid>
      <description>&lt;p&gt;In Apache DolphinScheduler, multi-tenancy is not just an “auxiliary permission feature,” but the core execution model of the scheduling system. What it truly solves is not “who can use the system,” but:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Under what identity tasks run, what resources they consume, and how to prevent interference between them&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Only by understanding this can we grasp the essence of DolphinScheduler’s multi-tenant design.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Are Single-Tenant and Multi-Tenant?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;First, let’s clarify what single-tenant and multi-tenant mean.&lt;/p&gt;

&lt;p&gt;In enterprise scheduling platforms, how different teams or business units share platform resources is a fundamental design concern. &lt;strong&gt;Single-tenancy and multi-tenancy&lt;/strong&gt; are two common models, with clear differences in resource isolation, stability, and scalability. Understanding these differences helps organizations choose the right architecture for efficient and controllable scheduling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwni70i5va5v2agogbq1k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwni70i5va5v2agogbq1k.jpg" width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;single-tenant&lt;/strong&gt; system serves only one team or business unit. All tasks share the same execution environment, resource pool, and permission system.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;multi-tenant&lt;/strong&gt; system, on the other hand, allows multiple teams to share one platform. Each team is logically isolated as an independent Tenant and mapped to underlying execution identities (Linux users), resource queues (YARN queues), or cloud-native namespaces (Kubernetes namespaces), enabling independent management of tasks and resources.&lt;/p&gt;

&lt;p&gt;Compared with single-tenancy, multi-tenancy provides significant advantages in resource isolation, stability, and scalability. While single-tenancy is simple to deploy and manage, resource contention and task interference become inevitable as the number of users grows. Multi-tenancy avoids this by clearly isolating Tenants and assigning dedicated resource pools per team or environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Core Mechanism: Tenant-Centric Execution Model&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To overcome the limitations of single-tenancy, Apache DolphinScheduler adopts a multi-tenant design.&lt;/p&gt;

&lt;p&gt;At the heart of this design is a single concept: &lt;strong&gt;Tenant&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, a Tenant is not just a logical label—it is an &lt;strong&gt;execution context container&lt;/strong&gt;. When a task is scheduled, the system determines three key aspects based on the Tenant:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Execution Identity
&lt;/h3&gt;

&lt;p&gt;Tasks do not run abstractly on Worker nodes; they must run as a specific OS user. A Tenant is bound to a Linux user, and tasks execute under that identity, inheriting file permissions and system-level isolation.&lt;/p&gt;

&lt;p&gt;Example: Executing tasks as a Linux user&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Switch to the Linux user corresponding to the Tenant
sudo su - team_alpha_user

# Execute workflow task
spark-submit --class com.example.Job /opt/jobs/job.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Tenant is bound to an OS user, and tasks run under this identity on Worker nodes, achieving file permission and environment isolation.&lt;/li&gt;
&lt;li&gt;Tip: Ensure each Tenant has an independent home directory to avoid unauthorized access.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Resource Ownership
&lt;/h3&gt;

&lt;p&gt;When tasks are submitted to engines like Spark or Flink, they must enter a resource pool. The Tenant determines the target resource queue or namespace, ensuring controlled resource usage.&lt;/p&gt;

&lt;p&gt;Example: Create a Tenant and bind a YARN Queue&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://dolphinscheduler-api:12345/tenants \
  -H "Content-Type: application/json" \
  -d '{
        "name": "team_alpha",
        "queue": "team_alpha_queue",
        "description": "Team Alpha Tenant"
      }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Each Tenant corresponds to a YARN Queue or K8s Namespace, so its resource usage stays within the limits defined for that queue or namespace.&lt;/li&gt;
&lt;li&gt;Tip: After creating a Tenant, remember to configure the queue or namespace in the resource scheduling system.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Isolation Boundary
&lt;/h3&gt;

&lt;p&gt;Tenant defines a clear boundary for data access, task execution, and resource usage, forming logical isolation between teams.&lt;/p&gt;

&lt;p&gt;Together, these three aspects form the foundation of DolphinScheduler’s multi-tenant mechanism.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How Resource Isolation Is Achieved&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy alone at the scheduling layer is not enough. The key design of DolphinScheduler is mapping Tenants to &lt;strong&gt;real underlying resource systems&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;YARN-Based Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In traditional big data architectures, Tenants are mapped to YARN queues. Each Tenant corresponds to a queue with defined capacity and limits. Tasks are submitted with queue information and scheduled accordingly, preventing resource contention.&lt;/p&gt;

&lt;p&gt;YARN Mapping Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Queue configuration

&amp;lt;queue name="team_alpha_queue"&amp;gt;
  &amp;lt;capacity&amp;gt;30&amp;lt;/capacity&amp;gt;
  &amp;lt;maximum-capacity&amp;gt;50&amp;lt;/maximum-capacity&amp;gt;
  &amp;lt;user-limit-factor&amp;gt;1.0&amp;lt;/user-limit-factor&amp;gt;
&amp;lt;/queue&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Tasks automatically enter the queue when submitted, avoiding resource conflicts between Tenants.&lt;/li&gt;
&lt;li&gt;Tip: Capacity and maximum capacity can be dynamically adjusted based on team workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if one team submits a large number of tasks, it can consume resources only up to its own queue’s maximum capacity.&lt;/p&gt;
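&lt;p&gt;To make the capacity numbers concrete, here is a small worked calculation. It is only a sketch: the 1,000 GB cluster size and the &lt;code&gt;queue_limits&lt;/code&gt; helper are assumptions for illustration, and it assumes the queue sits directly under root so the percentages apply to the whole cluster.&lt;/p&gt;

```python
def queue_limits(cluster_memory_gb: float, capacity_pct: float,
                 max_capacity_pct: float) -> tuple[float, float]:
    """Return (guaranteed, ceiling) memory in GB for a queue directly under root."""
    guaranteed = cluster_memory_gb * capacity_pct / 100
    ceiling = cluster_memory_gb * max_capacity_pct / 100
    return guaranteed, ceiling


# Hypothetical 1,000 GB cluster with capacity=30 and maximum-capacity=50
guaranteed, ceiling = queue_limits(1000, 30, 50)
print(f"guaranteed: {guaranteed:.0f} GB, ceiling: {ceiling:.0f} GB")
# prints: guaranteed: 300 GB, ceiling: 500 GB
```

&lt;p&gt;The queue is guaranteed 30% of the cluster and can borrow idle capacity up to 50%, but never more.&lt;/p&gt;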

&lt;h3&gt;
  
  
  &lt;strong&gt;Kubernetes-Based Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In cloud-native environments, Tenants are mapped to Kubernetes namespaces. Tasks run as Pods, and:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ResourceQuota&lt;/strong&gt; limits total resource usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LimitRange&lt;/strong&gt; restricts per-task resource consumption
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    cpu: "20"
    memory: "64Gi"
    pods: "50"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Description: Limits total resources and number of Pods to achieve cloud-native isolation.&lt;/li&gt;
&lt;li&gt;Tip: Combine with LimitRange to control per-task resource limits and prevent a single task from monopolizing resources.&lt;/li&gt;
&lt;/ul&gt;

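&lt;p&gt;This approach isolates not only resources but also runtime environments. Network isolation between namespaces additionally requires NetworkPolicy rules, since namespaces do not restrict traffic by default.&lt;/p&gt;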

&lt;h3&gt;
  
  
  &lt;strong&gt;OS-Level Isolation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the execution layer, Linux users provide the final isolation boundary. Even on the same machine, tasks from different Tenants cannot access each other’s files or scripts, provided file permissions restrict access to the owning user.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;End-to-End Execution Flow&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Putting everything together, the execution flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A workflow is triggered in DolphinScheduler&lt;/li&gt;
&lt;li&gt;The system determines the Tenant&lt;/li&gt;
&lt;li&gt;The Master assigns tasks to Workers&lt;/li&gt;
&lt;li&gt;Workers switch to the corresponding Linux user&lt;/li&gt;
&lt;li&gt;Tasks are submitted with resource metadata (YARN queue / K8s namespace)&lt;/li&gt;
&lt;li&gt;Tasks run within the assigned resource pool under defined limits&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsd2mrke5qbm1g0xhrtu.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgsd2mrke5qbm1g0xhrtu.jpg" width="791" height="326"&gt;&lt;/a&gt;&lt;br&gt;
This creates full isolation from scheduling logic to resource execution.&lt;/p&gt;
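&lt;p&gt;The flow above can be sketched in a few lines of Python. This is not DolphinScheduler’s actual code: the &lt;code&gt;Tenant&lt;/code&gt; class, the &lt;code&gt;TENANTS&lt;/code&gt; registry, and &lt;code&gt;build_command&lt;/code&gt; are hypothetical names used only to show how a Tenant resolves to a Linux user and a YARN queue before a Spark job is submitted.&lt;/p&gt;

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    code: str
    linux_user: str   # execution identity on the Worker
    yarn_queue: str   # resource pool the job is submitted to


# Hypothetical in-memory registry; a real platform would read this from metadata storage.
TENANTS = {
    "team_alpha": Tenant("team_alpha", "team_alpha_user", "team_alpha_queue"),
}


def build_command(tenant_code: str, job_class: str, jar_path: str) -> str:
    """Resolve the Tenant, then wrap spark-submit with its user and queue."""
    tenant = TENANTS[tenant_code]
    spark = ["spark-submit", "--queue", tenant.yarn_queue,
             "--class", job_class, jar_path]
    return shlex.join(["sudo", "-u", tenant.linux_user, *spark])


print(build_command("team_alpha", "com.example.Job", "/opt/jobs/job.jar"))
# prints: sudo -u team_alpha_user spark-submit --queue team_alpha_queue --class com.example.Job /opt/jobs/job.jar
```

&lt;p&gt;The point of the sketch is that the task definition itself carries no user or queue; both are derived from the Tenant at dispatch time.&lt;/p&gt;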

&lt;h2&gt;
  
  
  &lt;strong&gt;Technical Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The architecture can be understood in three layers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmr03er3ohwfjkz947e3.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmr03er3ohwfjkz947e3.jpg" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top Layer&lt;/strong&gt;: DolphinScheduler (Tenant / Workflow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middle Layer&lt;/strong&gt;: Mapping (Linux User / YARN Queue / K8s Namespace)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottom Layer&lt;/strong&gt;: Resource systems (Compute nodes / Big data clusters / Kubernetes clusters)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key idea is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The scheduling layer does not directly manage resources—it controls them through Tenant mapping&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Design Works in Enterprises&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This design becomes especially powerful in enterprise environments.&lt;/p&gt;

&lt;p&gt;When multiple teams share a platform, resource contention is inevitable. Without Tenant-to-resource mapping, a high-load workload could impact the entire system. With proper isolation, each team operates within its own boundaries.&lt;/p&gt;

&lt;p&gt;It also simplifies troubleshooting. Issues can be traced to a specific Tenant and then to its corresponding resource pool, without affecting the entire system.&lt;/p&gt;

&lt;p&gt;Most importantly, the design is highly scalable. Adding new teams or integrating new compute engines only requires extending Tenant mappings, without redesigning the scheduling system.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Summary&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;DolphinScheduler’s multi-tenant design is essentially a way to &lt;strong&gt;embed the scheduling system into the resource ecosystem&lt;/strong&gt;. Instead of relying on complex logic, it leverages operating systems, resource schedulers, and container platforms to build a stable, clear, and controllable execution model.&lt;/p&gt;

&lt;p&gt;For engineers, the real focus is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How to create a Tenant”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but rather:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“How to map Tenants to resources effectively to achieve true isolation and stability”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the core value of multi-tenant design.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous articles:
&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/codex/part-5-what-happens-when-tasks-fail-e0ba3c38a3dc" rel="noopener noreferrer"&gt;Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next article preview:
Part 7 | Where Are the Performance Bottlenecks in Scheduling Platforms?&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>dolphinscheduler</category>
      <category>opensource</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>Apache SeaTunnel 2.3.13 Major Release! Top 10 Features You Should Know</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:35:27 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-seatunnel-2313-major-release-top-10-features-you-should-know-j02</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-seatunnel-2313-major-release-top-10-features-you-should-know-j02</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqif2qqdenxyzo3u7zwsg.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
The Apache SeaTunnel community has officially released &lt;strong&gt;version 2.3.13&lt;/strong&gt;! This release is a milestone for Apache SeaTunnel, bringing important features such as &lt;strong&gt;Checkpoint API, Flink engine upgrade, large file parallel processing, multi-table sync, AI Embedding Transform, and richer connector extensions&lt;/strong&gt;. Whether you run batch jobs or sync real-time CDC data into a Lakehouse, SeaTunnel can now support your data integration tasks more efficiently, stably, and intelligently.&lt;/p&gt;

&lt;p&gt;Thanks to &lt;strong&gt;50+ community contributors&lt;/strong&gt;, this release includes &lt;strong&gt;100+ PRs&lt;/strong&gt; of new features, optimizations, and bug fixes. If you are building &lt;strong&gt;data warehouses, real-time sync platforms, or AI data pipelines&lt;/strong&gt;, this release is worth your attention.&lt;/p&gt;

&lt;p&gt;No time to read the full Release Notes? No worries, here are the &lt;strong&gt;Top 10 features of this release&lt;/strong&gt;, each with its PR number for reference.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Release Note: &lt;a href="https://github.com/apache/seatunnel/releases/tag/2.3.13" rel="noopener noreferrer"&gt;https://github.com/apache/seatunnel/releases/tag/2.3.13&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  01 New Checkpoint API Enhances Task Fault Tolerance
&lt;/h2&gt;

&lt;p&gt;In data sync tasks, checkpoints are one of the core mechanisms to ensure task reliability. SeaTunnel 2.3.13 introduces &lt;strong&gt;Checkpoint API&lt;/strong&gt; (#10065), making task state management more flexible and providing a solid foundation for future scheduling and operation capabilities. The Zeta engine supports &lt;strong&gt;min-pause configuration&lt;/strong&gt; (#9804) to avoid system pressure caused by frequent checkpoints.&lt;/p&gt;

&lt;p&gt;Monitoring has also been enhanced, such as adding Sink commit metrics and calculating commit rate (#10233), returning PendingJobs information in the task overview interface (#9902), and providing REST API to view the Pending queue (#10078).&lt;/p&gt;

&lt;p&gt;These capabilities help users better understand task execution status and optimize checkpoint strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  02 Flink 1.20.1 Support and Enhanced CDC
&lt;/h2&gt;

&lt;p&gt;On the engine side, this version improves Apache Flink support. SeaTunnel now supports &lt;strong&gt;Flink 1.20.1&lt;/strong&gt; (#9576), and CDC sync capabilities have been enhanced. CDC Source now supports &lt;strong&gt;Schema Evolution&lt;/strong&gt; (#9867), automatically adapting sync tasks to source table structure changes.&lt;/p&gt;

&lt;p&gt;Additionally, NO_CDC Source also supports checkpoints (#10094), improving task recovery. These changes make SeaTunnel more stable in scenarios with frequent database schema changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  03 Large File Parallel Reading Significantly Improved
&lt;/h2&gt;

&lt;p&gt;In real data platforms, large amounts of data often exist as files on HDFS, object storage, or local file systems.&lt;/p&gt;

&lt;p&gt;This release significantly optimizes file processing performance. HDFS File Connector supports true large file parallel splitting (#10332), LocalFile Connector supports CSV, Text, JSON large file parallel reading (#10142), and Parquet files now support Logical Split (#10239).&lt;/p&gt;

&lt;p&gt;HDFS File also supports multi-table reading (#9816). These improvements significantly increase throughput for TB-scale file processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  04 File Connector Adds Update Sync Mode
&lt;/h2&gt;

&lt;p&gt;Previously, file sync tasks only supported append or overwrite. In this version, multiple file connectors add &lt;strong&gt;sync_mode=update&lt;/strong&gt;, including FTP, SFTP, and LocalFile Source (#10437), and HdfsFile Source (#10268). This allows file sync tasks to support update semantics, better fitting incremental data processing scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  05 Connector Ecosystem Expansion
&lt;/h2&gt;

&lt;p&gt;SeaTunnel 2.3.13 continues to expand and enhance the connector ecosystem. For analytical databases, it adds DuckDB Source and Sink support (#10285), suitable for local analysis and data exploration.&lt;/p&gt;

&lt;p&gt;New or enhanced connectors include Apache HugeGraph Sink (#10002), AWS DSQL Sink (#9739), Lance Dataset Sink (#9894), IoTDB 2.x Source and Sink (#9872).&lt;/p&gt;

&lt;p&gt;Existing connectors have also been improved: PostgreSQL supports TIMESTAMP_TZ (#10048), Hive Sink supports SchemaSaveMode and DataSaveMode (#9743), MongoDB Sink supports multi-table writing and adds SaveMode (#9958 / #9883).&lt;/p&gt;

&lt;p&gt;These updates significantly improve SeaTunnel’s adaptability in database and Lakehouse scenarios and the efficiency of building data pipelines.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Connector&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Feature Highlights&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Analytical DB&lt;/td&gt;
&lt;td&gt;DuckDB&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Read and write data from DuckDB, suitable for local analysis and exploration&lt;/td&gt;
&lt;td&gt;#10285&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph DB&lt;/td&gt;
&lt;td&gt;Apache HugeGraph&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into HugeGraph&lt;/td&gt;
&lt;td&gt;#10002&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distributed SQL DB&lt;/td&gt;
&lt;td&gt;AWS DSQL&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into AWS DSQL&lt;/td&gt;
&lt;td&gt;#9739&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File/Dataset&lt;/td&gt;
&lt;td&gt;Lance Dataset&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Write data into Lance Dataset&lt;/td&gt;
&lt;td&gt;#9894&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time Series DB&lt;/td&gt;
&lt;td&gt;IoTDB 2.x&lt;/td&gt;
&lt;td&gt;Source/Sink&lt;/td&gt;
&lt;td&gt;Add IoTDB 2.x source and sink support&lt;/td&gt;
&lt;td&gt;#9872&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relational DB&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Source&lt;/td&gt;
&lt;td&gt;Support TIMESTAMP_TZ type&lt;/td&gt;
&lt;td&gt;#10048&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Warehouse&lt;/td&gt;
&lt;td&gt;Hive&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support SchemaSaveMode and DataSaveMode&lt;/td&gt;
&lt;td&gt;#9743&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document DB&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Sink&lt;/td&gt;
&lt;td&gt;Support multi-table write and new SaveMode&lt;/td&gt;
&lt;td&gt;#9958 / #9883&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  06 Kafka Supports Protobuf Schema Registry
&lt;/h2&gt;

&lt;p&gt;In real-time scenarios, Kafka often uses Schema Registry. This release adds &lt;strong&gt;Protobuf Schema Registry Wire Format support&lt;/strong&gt; (#10183) to Kafka Connector, allowing SeaTunnel to directly parse Protobuf data managed via Schema Registry, making real-time pipeline construction easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  07 New AI Embedding Transform
&lt;/h2&gt;

&lt;p&gt;With AI and data engineering integration, more companies need vector data pipelines.&lt;/p&gt;

&lt;p&gt;SeaTunnel adds &lt;strong&gt;Multimodal Embedding Transform&lt;/strong&gt; (#9673) in the Transform component, generating vector data directly in pipelines for vector databases, RAG systems, and AI retrieval applications. &lt;strong&gt;RegexExtract Transform&lt;/strong&gt; (#9829) further enhances data cleaning.&lt;/p&gt;

&lt;h2&gt;
  
  
  08 Markdown Parser Supports RAG Scenarios
&lt;/h2&gt;

&lt;p&gt;Markdown documents are common in AI data preparation. This release adds &lt;strong&gt;Markdown Parser&lt;/strong&gt; (#9760) and related documentation (#9834) for parsing and structuring Markdown, facilitating RAG pipeline construction.&lt;/p&gt;

&lt;h2&gt;
  
  
  09 Stability and Performance Improvements
&lt;/h2&gt;

&lt;p&gt;This release includes numerous stability and performance optimizations, such as ClickHouse Connector parallel read strategy (#9801), MySQL Connector shard calculation (#9975), JSON parsing for nested structures (#10000), Zeta engine task metrics (#9833), and more.&lt;/p&gt;

&lt;p&gt;It also fixes production issues like Zeta engine memory leak on task cancellation (#10315), ClickHouse ThreadLocal memory leak (#10264), MongoDB multi-task submit (#10116), HBase Source scan exception (#10287), Hive Sink init failure (#10331), etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  10 Bug Fixes and Documentation Updates
&lt;/h2&gt;

&lt;p&gt;Fixes include CDC Snapshot Split null pointer (#10404), ClickHouse memory leak (#10264), MongoDB multi-task submit (#10064, #10116), HBase scan exceptions (#10336, #10287), JDBC schema merge overflow (#10387, #9942, #10093), Hive Sink overwrite semantics (#10279, #9823, #9743), Elasticsearch Sink task exit issue (#10038), and other Connector, Transform, Engine, UI, CI fixes (#10422, #10013, etc.).&lt;/p&gt;

&lt;p&gt;Documentation improvements include SeaTunnel MCP &amp;amp; x2SeaTunnel docs (#10108), connector config examples (#10283, #10250, #10241, #10202), multi-table sync examples (#10241), upgrade incompatibility notes (#10068), and doc structure optimizations (#10262, #10395, #10351, #10420, #10438, #10424, #10109, #10382, #10385), helping new users get started and developers better understand architecture and features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thanks to Contributors ❤️
&lt;/h2&gt;

&lt;p&gt;Special thanks to release manager @xiaochen-zhou for strong support in planning and execution. Thanks to all volunteers; your efforts keep the SeaTunnel community growing!&lt;/p&gt;

&lt;p&gt;Adam Wang, AzkabanWarden.Gf, Bo Schuster, cloud456, CloverDew, corgy-w, CosmosNi, Cyanty, David Zollo, dotfive-star, dy102, dyp12, Frui Guo, Jarvis, Jast, Jeremy, JeremyXin, Jia Fan, Joonseo Lee, krutoileshii, 老王, Leon Yoah, Li Dongxu, LiJie20190102, limin, LimJiaWenBrenda, liucongjy, loupipalien, mengxpgogogo-eng, misi, 巧克力黑, shfshihuafeng, silenceland, Sim Chou, Steven Zhao, wanmingshi, wtybxqm, yzeng1618, zhan7236, zhangdonghao, zhuxt2015, zy&lt;/p&gt;

&lt;h2&gt;
  
  
  Download &amp;amp; Try
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Download: &lt;a href="https://seatunnel.apache.org/download" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/download&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Upgrade Guide: &lt;a href="https://seatunnel.apache.org/docs/upgrade-guide" rel="noopener noreferrer"&gt;https://seatunnel.apache.org/docs/upgrade-guide&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Upgrade Note&lt;/strong&gt;: If you are on &lt;strong&gt;SeaTunnel 2.3.x&lt;/strong&gt;, upgrading to 2.3.13 is generally safe as it focuses on feature enhancement and stability. Back up config files and test in staging. For tasks using checkpoints, stop tasks and confirm state consistency to avoid checkpoint conflicts. Check connector config changes (Hive, MongoDB, Kafka). If using Flink engine, consider upgrading to Flink 1.20.x for better compatibility and CDC support.&lt;/p&gt;

</description>
      <category>apacheseatunnel</category>
      <category>release</category>
      <category>datascience</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Airflow Is Overkill for Most Teams-Here’s a Better Option</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:32:35 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/airflow-is-overkill-for-most-teams-heres-a-better-option-342h</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/airflow-is-overkill-for-most-teams-heres-a-better-option-342h</guid>
      <description>&lt;p&gt;Last year, when our team was selecting a data platform, my boss said bluntly: &lt;strong&gt;“Airflow is too heavy. The operational cost is too high. Find a lighter alternative.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To be honest, I was a bit overwhelmed at the time. Airflow is indeed heavy. There are a lot of Python dependencies, and the Celery Executor also requires Redis or RabbitMQ. Once the scale grows a bit, you basically need to use Kubernetes.&lt;/p&gt;

&lt;p&gt;But our data team only has a few people. Asking them to maintain crontab scripts? That would be going backwards.&lt;/p&gt;

&lt;p&gt;Later, after browsing GitHub, I found DolphinScheduler in the Apache Incubator. It has 14.1K stars, is under the Apache 2.0 license, and was open-sourced by a Chinese company (Analysys). Now it has graduated and become a top-level Apache project.&lt;/p&gt;

&lt;p&gt;After trying it out, I found that this thing really has something special.&lt;/p&gt;

&lt;h2&gt;
  
  
  Low-Code Drag-and-Drop, You Can Get Things Done Without Writing Code
&lt;/h2&gt;

&lt;p&gt;Airflow’s DAG configuration is familiar to most engineers: workflows are written in Python code. It’s flexible, but many data analysts find it hard to read and maintain.&lt;/p&gt;

&lt;p&gt;DolphinScheduler directly provides you with a visual drag-and-drop interface. You can configure task dependencies just by clicking and dragging with your mouse.&lt;/p&gt;

&lt;p&gt;It supports more than 30 task types: Shell, SQL, Spark, Flink, HTTP, DataX, Python… basically covering all common tasks in big data scenarios.&lt;/p&gt;

&lt;p&gt;Want to run a Hive SQL? Drag a SQL node, configure the data source and script, connect upstream dependencies, done. No need to write a single line of Python, and no need to deal with BashOperator or SparkSubmitOperator.&lt;/p&gt;

&lt;p&gt;This is much more friendly to non-developer roles. Data analysts can configure workflows themselves, without coming to you every day asking you to write DAGs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decentralized High Availability, No Dependence on ZooKeeper
&lt;/h2&gt;

&lt;p&gt;Airflow’s Scheduler was originally a single point of failure. Airflow 2.0 added multi-Scheduler HA, but it still relies on database row locks to ensure tasks are not scheduled twice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6mja8di8nya2hrnubew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz6mja8di8nya2hrnubew.png" alt="DS去中心化架构" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DolphinScheduler was designed with decentralization from the very beginning. The architecture is very clear, with five core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Server: the entry point for frontend interaction, including workflow configuration and user permission management&lt;/li&gt;
&lt;li&gt;Master Server: DAG parsing and task distribution; multiple Masters can be deployed, and each can work independently&lt;/li&gt;
&lt;li&gt;Worker Server: task execution nodes that receive tasks from Master and return results&lt;/li&gt;
&lt;li&gt;Alert Server: alert notifications, supporting email, DingTalk, WeCom, Feishu, and more&lt;/li&gt;
&lt;li&gt;Registry: registry center responsible for service discovery and distributed locks, supporting three options: JDBC, ZooKeeper, and Etcd&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s focus on the Master’s decentralized design.&lt;/p&gt;

&lt;p&gt;There is no master-slave relationship between multiple Masters. After starting, each Master registers itself to the Registry, and then competes for tasks using a slot partitioning algorithm.&lt;/p&gt;

&lt;p&gt;How is the partitioning done? It uses modulo on ID:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Command ID % total number of Masters = the slot of the current Master&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, if you have 3 Masters, and the Command ID is 1001, then it will be assigned to slot 2 (1001 % 3 = 2, slots start from 0).&lt;/p&gt;

&lt;p&gt;If one Master goes down, its slot will be taken over by other Masters, and tasks will not be lost.&lt;/p&gt;
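&lt;p&gt;The modulo rule and the takeover behavior can be illustrated with a short Python sketch. The function names and the master host names are hypothetical, and the takeover step assumes that surviving Masters simply recompute slots against the new Master count; the real implementation may differ in detail.&lt;/p&gt;

```python
def slot_for(command_id: int, master_count: int) -> int:
    """Slot index for a command: Command ID modulo the number of live Masters."""
    return command_id % master_count


def owner(command_id: int, masters: list[str]) -> str:
    """Pick the Master whose position in the sorted list equals the slot."""
    ordered = sorted(masters)
    return ordered[slot_for(command_id, len(ordered))]


masters = ["master-0", "master-1", "master-2"]
print(owner(1001, masters))  # 1001 % 3 = 2, prints: master-2

# master-2 goes down; the remaining Masters recompute with a count of 2
masters.remove("master-2")
print(owner(1001, masters))  # 1001 % 2 = 1, prints: master-1
```

&lt;p&gt;Because every Master can compute the same mapping from the registry’s list of live Masters, no leader has to hand out work.&lt;/p&gt;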

&lt;p&gt;This design is much simpler than Airflow’s Scheduler HA. It does not require complex leader election logic, and Masters can scale horizontally at any time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use JDBC as Registry, Say Goodbye to ZooKeeper Dependency
&lt;/h2&gt;

&lt;p&gt;In the past, when building distributed scheduling systems, you often couldn’t avoid ZooKeeper, and early versions of DolphinScheduler required it too. Airflow coordinates its Schedulers through database locks instead, which can become a performance bottleneck.&lt;/p&gt;

&lt;p&gt;DolphinScheduler supports three types of registries: JDBC, ZooKeeper, and Etcd.&lt;/p&gt;

&lt;p&gt;The official recommendation is to use JDBC. You can directly reuse your business database (MySQL or PostgreSQL), without deploying additional ZK or Etcd clusters.&lt;/p&gt;

&lt;p&gt;For small and medium-sized teams, maintaining one less component means reducing cost and improving efficiency.&lt;/p&gt;

&lt;p&gt;Of course, if you already have a ZK cluster, or have extremely high performance requirements (tens of thousands of concurrent scheduling tasks), you can still choose ZK or Etcd.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task Dispatch Mechanism: Active Push Instead of Pull
&lt;/h2&gt;

&lt;p&gt;Airflow’s Celery Executor is a typical task queue model. The Scheduler puts tasks into a Redis queue, and Workers pull them themselves.&lt;/p&gt;

&lt;p&gt;This approach is flexible, but when the queue gets backlogged, it becomes troublesome.&lt;/p&gt;

&lt;p&gt;DolphinScheduler uses active push. After the Master parses the DAG, it directly pushes tasks to Workers via Netty RPC.&lt;/p&gt;

&lt;p&gt;Workers do not need to poll. The Master tells them exactly what to do.&lt;/p&gt;

&lt;p&gt;During task allocation, load balancing is performed. By default, it uses dynamic weighted round-robin, considering CPU, memory, and thread pool usage of Workers, and assigning tasks to nodes with lower load.&lt;/p&gt;

&lt;p&gt;If a Worker is about to be overloaded, the Master will automatically schedule tasks to other nodes.&lt;/p&gt;
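&lt;p&gt;As a rough illustration of load-aware dispatch, the sketch below picks the least-loaded Worker and skips any Worker above an overload threshold. This is a deliberately simplified stand-in, not DolphinScheduler’s dynamic weighted round-robin: the &lt;code&gt;Worker&lt;/code&gt; class, the equal weighting of the three metrics, and the 0.9 threshold are all assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Worker:
    host: str
    cpu_usage: float          # 0.0 to 1.0
    memory_usage: float       # 0.0 to 1.0
    thread_pool_usage: float  # 0.0 to 1.0

    def load(self) -> float:
        """Combined load score; equal weights are an assumption."""
        return (self.cpu_usage + self.memory_usage + self.thread_pool_usage) / 3


def pick_worker(workers: list[Worker], overload_threshold: float = 0.9) -> Worker:
    """Return the least-loaded Worker below the threshold."""
    candidates = [w for w in workers if overload_threshold > w.load()]
    if not candidates:
        raise RuntimeError("all Workers are overloaded")
    return min(candidates, key=Worker.load)


workers = [
    Worker("worker-a", 0.80, 0.70, 0.90),
    Worker("worker-b", 0.30, 0.40, 0.20),
    Worker("worker-c", 0.95, 0.95, 0.95),  # above the threshold, never selected
]
print(pick_worker(workers).host)  # prints: worker-b
```

&lt;p&gt;A production scheduler would refresh these metrics from Worker heartbeats rather than use static values.&lt;/p&gt;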

&lt;p&gt;The advantage of this push mechanism is low scheduling latency. The Master can grasp Worker status in real time, and tasks will not sit in the queue for dozens of seconds waiting to be consumed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plugin-Based Architecture, Replace Anything You Want
&lt;/h2&gt;

&lt;p&gt;DolphinScheduler’s plugin system is quite thorough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task plugins: more than 30 built-in task types, and you can write your own plugins&lt;/li&gt;
&lt;li&gt;Alert plugins: email, DingTalk, WeCom, Feishu, Telegram; if not enough, implement the Alert Plugin interface yourself&lt;/li&gt;
&lt;li&gt;Data source plugins: MySQL, PostgreSQL, Hive, Spark SQL, ClickHouse… supporting hundreds of data sources&lt;/li&gt;
&lt;li&gt;Storage plugins: task logs and resource files can be stored locally, on HDFS, S3, or OSS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to switch an alert channel? Write a plugin, package it into a JAR, drop it in, restart the service—done.&lt;/p&gt;

&lt;p&gt;No need to modify source code, and maintenance cost is low.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flexible Deployment, One-Click Experience with Docker
&lt;/h2&gt;

&lt;p&gt;The project officially provides four deployment methods:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standalone: single-machine mode, for development and testing, can run with one command&lt;/li&gt;
&lt;li&gt;Cluster: cluster mode, standard for production, manually deploy each component&lt;/li&gt;
&lt;li&gt;Docker: start a complete environment with one click, suitable for quick experience&lt;/li&gt;
&lt;li&gt;Kubernetes: deploy with Helm Chart, preferred for cloud-native teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to try quickly, just use Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose -f docker/docker-compose.yaml up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the containers start, open your browser at:&lt;br&gt;
&lt;a href="http://localhost:12345/dolphinscheduler" rel="noopener noreferrer"&gt;http://localhost:12345/dolphinscheduler&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Default account: admin / dolphinscheduler123&lt;/p&gt;

&lt;p&gt;Drag a Shell task and try it—you can run a workflow in a few minutes.&lt;/p&gt;

&lt;p&gt;For production deployment, it is recommended to have at least 3 Masters plus several Workers. Use MySQL master-slave or PostgreSQL for the database, and choose JDBC as the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlights of Version 3.4.0
&lt;/h2&gt;

&lt;p&gt;Version 3.4.0, released at the end of last year, mainly brought the following improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task priority queue: high-priority tasks can jump the queue instead of waiting&lt;/li&gt;
&lt;li&gt;Dynamic resource allocation: Workers can dynamically adjust thread pool size based on task type&lt;/li&gt;
&lt;li&gt;Workflow version management: DAG changes automatically save history versions, supporting one-click rollback&lt;/li&gt;
&lt;li&gt;Enhanced lineage analysis: visualization of upstream and downstream dependencies of data tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most practical one is the task priority queue. Previously, when inserting urgent tasks, you had to manually pause other tasks to free resources. Now you just assign a high priority label, and the scheduler will handle it automatically.&lt;/p&gt;
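
&lt;p&gt;The queue-jumping behavior can be pictured with a small priority queue. This is a sketch under stated assumptions, not the scheduler’s implementation: the &lt;code&gt;TaskQueue&lt;/code&gt; class, the three priority levels, and the task names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import heapq
import itertools

# Illustrative priority levels; a smaller number is served first.
PRIORITY = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


class TaskQueue:
    def __init__(self):
        self._heap = []
        # The counter breaks ties so tasks of equal priority stay in FIFO order.
        self._counter = itertools.count()

    def submit(self, name, priority="MEDIUM"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._counter), name))

    def next_task(self):
        # Pop the highest-priority task, or return None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = TaskQueue()
queue.submit("daily_report", "LOW")
queue.submit("etl_orders")
queue.submit("urgent_fix", "HIGH")

order = [queue.next_task() for _ in range(3)]
print(order)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Although &lt;code&gt;urgent_fix&lt;/code&gt; is submitted last, it is handed out first, and nothing else has to be paused by hand.&lt;/p&gt;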

&lt;h2&gt;
  
  
  What Kind of Teams Is It Suitable For?
&lt;/h2&gt;

&lt;p&gt;After listing so many advantages, it’s only fair to discuss where it actually fits.&lt;/p&gt;

&lt;p&gt;DolphinScheduler is a good fit for data teams that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have fewer than 10 people and limited operational resources&lt;/li&gt;
&lt;li&gt;Mainly run offline batch workloads, such as ETL, data synchronization, and report scheduling&lt;/li&gt;
&lt;li&gt;Need a low-code platform so that analysts and business users can configure workflows&lt;/li&gt;
&lt;li&gt;Already use MySQL/PostgreSQL and do not want to deploy ZooKeeper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Less suitable scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mainly real-time streaming tasks (although Flink is supported, scheduling granularity is still batch-oriented)&lt;/li&gt;
&lt;li&gt;Heavy reliance on Python ecosystem with highly customized workflow logic (Airflow is more flexible)&lt;/li&gt;
&lt;li&gt;Extremely large task volume with tens of thousands of concurrent scheduling tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Overall, DolphinScheduler positions itself as a &lt;strong&gt;user-friendly, stable, and lightweight data scheduling platform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It doesn’t have as many fancy features as Airflow, but all the core capabilities are there, and the maintenance cost is much lower.&lt;/p&gt;

&lt;p&gt;After our team migrated from Airflow to DolphinScheduler, the cluster size was reduced from 5 nodes to 3 nodes, and operational manpower was cut by half.&lt;/p&gt;

&lt;p&gt;Now data analysts can configure workflows themselves, and no longer need to urge me every day to write DAGs.&lt;/p&gt;

&lt;p&gt;No scheduling tool is absolutely good or bad. The one that fits your team is the best.&lt;/p&gt;

&lt;p&gt;If you are also looking for an alternative to Airflow, you might want to try DolphinScheduler—it might be exactly what you need.&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>apachedolphinschedu</category>
      <category>opensource</category>
      <category>tooling</category>
    </item>
    <item>
      <title>See You in Beijing This August! CFP for Community Over Code Asia 2026 Is Now Open</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 20 Mar 2026 07:09:11 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/see-you-in-beijing-this-august-cfp-for-community-over-code-asia-2026-is-now-open-epd</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/see-you-in-beijing-this-august-cfp-for-community-over-code-asia-2026-is-now-open-epd</guid>
      <description>&lt;p&gt;Community Over Code Asia 2026 will take place from &lt;strong&gt;August 7–9, 2026 in Beijing&lt;/strong&gt;, and the Call for Proposals (CFP) is now officially open.&lt;/p&gt;

&lt;p&gt;Developers, Apache Committers, open-source contributors, technology leaders, and practitioners from around the world will gather in Beijing to explore the latest practices in AI, cloud-native technologies, big data, open-source community governance, and the broader Apache ecosystem.&lt;/p&gt;

&lt;p&gt;If you are contributing to an open-source project or using the Apache technology stack in production, this is the perfect opportunity to share your experience and step onto a global stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi9uo2s8xui2ryg5d5m6.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi9uo2s8xui2ryg5d5m6.jpg" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conference Info
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; August 7 – August 9, 2026&lt;br&gt;
&lt;strong&gt;Location:&lt;/strong&gt; Zhongguancun National Innovation Demonstration Zone Conference Center, Beijing&lt;/p&gt;

&lt;h2&gt;
  
  
  19 Tracks Covering Key Areas of the Apache Ecosystem
&lt;/h2&gt;

&lt;p&gt;This year’s conference will run for three days and feature &lt;strong&gt;19 technical tracks&lt;/strong&gt;, showcasing the latest technical breakthroughs in Apache projects and innovative practices from the Apache Incubator.&lt;/p&gt;

&lt;p&gt;The conference invites developers, technical experts, and open-source contributors worldwide to submit proposals and share insights into Apache projects, cutting-edge technologies, and open-source collaboration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68n2m52946mub46rf7lr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68n2m52946mub46rf7lr.jpg" width="702" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Submit to the DataOps Track!
&lt;/h2&gt;

&lt;p&gt;If you have hands-on experience using Apache DolphinScheduler, optimization practices, or deep insights into new features, you are welcome to submit a talk to the DataOps Track and share your experience with the global community.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conference website:&lt;/strong&gt; &lt;a href="https://asia.communityovercode.org/" rel="noopener noreferrer"&gt;https://asia.communityovercode.org/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submit your proposal now:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0fqbot1wgsinkhirlx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fss0fqbot1wgsinkhirlx.jpg" width="197" height="185"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submission deadline:&lt;/strong&gt; April 21, 2026, 23:59 (Beijing Time, UTC+8)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Submission language:&lt;/strong&gt; Please submit proposals in English. Presentations can be delivered in either Chinese or English.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes Community Over Code Asia 2026 Special?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Curated by top Apache experts&lt;/strong&gt;&lt;br&gt;
Each track is led by experienced contributors from the Apache Software Foundation who carefully curate high-quality sessions focusing on real technical innovation and open collaboration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A one-stop event for Apache ecosystem trends&lt;/strong&gt;&lt;br&gt;
From Agentic Coding and AI Infrastructure to Data + AI and Streaming, the conference covers the most important topics in modern open-source development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connect with global open-source leaders&lt;/strong&gt;&lt;br&gt;
Meet Apache Committers, foundation members, and open-source contributors face-to-face. Exchange ideas, grow your network, and explore the spirit of “The Apache Way”.&lt;/p&gt;

&lt;p&gt;Open source is more than code — it’s a way of collaboration and a culture of innovation. Whether you are an experienced Apache Committer or someone who just submitted your first pull request, Community Over Code Asia 2026 welcomes your voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;See you in Beijing this August.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cfp</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
      <category>communityovercodeasia</category>
    </item>
    <item>
      <title>Part 5 | What Happens When Tasks Fail? A Complete Guide to Retry and Backfill in Apache DolphinScheduler</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 13 Mar 2026 08:26:12 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/part-5-what-happens-when-tasks-fail-a-complete-guide-to-retry-and-backfill-in-apache-45c4</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/part-5-what-happens-when-tasks-fail-a-complete-guide-to-retry-and-backfill-in-apache-45c4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ot1ed5ar9y7bcs8x5k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ot1ed5ar9y7bcs8x5k.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is the &lt;strong&gt;fifth installment of the series “Understanding Apache DolphinScheduler: From Scheduling Principles to DataOps Practices.”&lt;/strong&gt; Using Apache DolphinScheduler as an example, it explains failure retry, manual rerun, and backfill mechanisms in scheduling systems, clarifies the meaning of Exactly Once semantics in scheduling, and summarizes common misuse scenarios and best practices to help build a stable and reliable data scheduling system.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the daily operation of data platforms, task failures are almost inevitable. Network fluctuations, insufficient resources, downstream dependency failures, and code bugs can all cause scheduled tasks to fail. When failures occur, many teams rely on &lt;strong&gt;automatic retries, manual reruns, or backfill operations&lt;/strong&gt; to recover the data pipeline.&lt;br&gt;
However, an often overlooked fact is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Failure retry, manual rerun, and backfill in scheduling systems actually have completely different semantics.&lt;/strong&gt;&lt;br&gt;
If these differences are not clearly understood, it can easily lead to &lt;strong&gt;duplicate data, data misalignment, or even data corruption&lt;/strong&gt;. This article analyzes the design mechanisms of Apache DolphinScheduler to explain three of the most common but frequently misunderstood capabilities in scheduling systems: &lt;strong&gt;failure retry, manual rerun, and backfill&lt;/strong&gt;, and further explores the &lt;strong&gt;real meaning of “Exactly Once” in scheduling systems&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;
  
  
  1 Failure Retry vs Manual Rerun: Two Completely Different Recovery Mechanisms
&lt;/h1&gt;

&lt;p&gt;In scheduling systems, failed tasks are usually recovered in two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Automatic Retry&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manual Rerun&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many people assume the only difference between them is how they are triggered. In reality, they are fundamentally different in terms of &lt;strong&gt;execution semantics&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  1 Automatic Retry: Re-execution within the Same Instance
&lt;/h2&gt;

&lt;p&gt;In Apache DolphinScheduler, every schedule generates a &lt;strong&gt;Workflow Instance&lt;/strong&gt;, which contains multiple &lt;strong&gt;Task Instances&lt;/strong&gt;.&lt;br&gt;
When a task fails and &lt;strong&gt;Retry Times&lt;/strong&gt; is configured, the system automatically retries the task within the same task instance.&lt;br&gt;
Its characteristics include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Belongs to the same workflow instance&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keeps the same Schedule Time&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dependency relationships remain unchanged&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Only the failed task is re-executed&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design goal of automatic retry is to handle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Transient failures&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network fluctuations&lt;/li&gt;
&lt;li&gt;Temporary resource shortages&lt;/li&gt;
&lt;li&gt;Short-term unavailability of external systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In such cases, automatic retry can usually restore the task quickly.&lt;/p&gt;
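
&lt;p&gt;The retry behavior described in this section can be sketched as a loop that re-runs one task in place, which mirrors a retry staying inside the same task instance. This is only an illustration: &lt;code&gt;run_with_retry&lt;/code&gt;, &lt;code&gt;TransientError&lt;/code&gt;, and the parameter names are hypothetical and not part of DolphinScheduler.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


class TransientError(Exception):
    """Stands in for a network blip or a briefly unavailable external system."""


def run_with_retry(task, retry_times=3, retry_interval=0.0):
    # Re-run the same task until it succeeds or the retries are used up.
    # Returns the number of attempts; re-raises the error after the last retry.
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return attempts
        except TransientError:
            if attempts > retry_times:
                raise
            time.sleep(retry_interval)


outcomes = iter([False, False, True])  # fail twice, then succeed


def flaky_task():
    if not next(outcomes):
        raise TransientError("temporary network failure")


attempts = run_with_retry(flaky_task, retry_times=3)
print(attempts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;retry_times=3&lt;/code&gt; the task gets one initial attempt plus up to three retries. A failure that is not transient, such as a code bug, would simply use up the retries and fail again.&lt;/p&gt;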
&lt;h2&gt;
  
  
  2 Manual Rerun: Creating a New Instance
&lt;/h2&gt;

&lt;p&gt;Unlike automatic retry, a &lt;strong&gt;manual rerun creates a new instance&lt;/strong&gt;.&lt;br&gt;
In Apache DolphinScheduler, users can choose to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rerun failed nodes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rerun from the current node&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rerun the entire workflow from the beginning&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these scenarios, the system generates a new &lt;strong&gt;Workflow Instance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means two instances &lt;strong&gt;may process data for the same logical time&lt;/strong&gt;, and downstream tasks &lt;strong&gt;may write data repeatedly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If tasks are not &lt;strong&gt;idempotent&lt;/strong&gt;, this may lead to &lt;strong&gt;duplicate data issues&lt;/strong&gt;.&lt;/p&gt;
&lt;h1&gt;
  
  
  2 Backfill and Data Recovery: Reconstructing Time in Scheduling Systems
&lt;/h1&gt;

&lt;p&gt;In data warehouse scenarios, &lt;strong&gt;backfill&lt;/strong&gt; is a very common operation. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backfilling historical data after creating a new task&lt;/li&gt;
&lt;li&gt;Rerunning tasks for days when execution failed&lt;/li&gt;
&lt;li&gt;Filling missing data due to upstream delays&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Apache DolphinScheduler, backfill is typically performed using &lt;strong&gt;Backfill Run&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  1 The Nature of Backfill: Creating Multiple Historical Instances
&lt;/h2&gt;

&lt;p&gt;Assume a task runs &lt;strong&gt;daily&lt;/strong&gt;.&lt;br&gt;
Backfill range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2025-03-01 → 2025-03-05
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system will create multiple instances:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instance (2025-03-01)
Instance (2025-03-02)
Instance (2025-03-03)
Instance (2025-03-04)
Instance (2025-03-05)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each instance has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Independent execution status&lt;/li&gt;
&lt;li&gt;Independent dependency relationships&lt;/li&gt;
&lt;li&gt;Independent parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The schedule time will be set to the corresponding historical time.&lt;/p&gt;
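
&lt;p&gt;As a rough illustration of how a date range expands into one instance per day, here is a short Python sketch. The function name, the dictionary layout, and the &lt;code&gt;SUBMITTED&lt;/code&gt; state are assumptions for this example, not DolphinScheduler’s data model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date, timedelta


def backfill_schedule_times(start, end):
    # One logical schedule time per day, with both ends of the range included.
    days = (end - start).days + 1
    return [start + timedelta(days=offset) for offset in range(days)]


# Each instance carries its own schedule time and its own state.
instances = [
    {"schedule_time": day.isoformat(), "state": "SUBMITTED"}
    for day in backfill_schedule_times(date(2025, 3, 1), date(2025, 3, 5))
]
for instance in instances:
    print(instance)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;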

&lt;h2&gt;
  
  
  2 The Key to Backfill: Schedule Time vs Execution Time
&lt;/h2&gt;

&lt;p&gt;In scheduling systems, two concepts are extremely important.&lt;br&gt;
&lt;strong&gt;Schedule Time&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Logical data time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Execution Time&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Actual task runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Schedule Time : 2025-03-01
Execution Time: 2025-03-10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the SQL uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WHERE dt = ${schedule_time}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Backfill is safe.&lt;br&gt;
But if the SQL uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WHERE dt = today()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Backfill will produce &lt;strong&gt;incorrect data&lt;/strong&gt;.&lt;br&gt;
This is also the root cause of many data quality issues.&lt;/p&gt;
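
&lt;p&gt;The difference is easy to reproduce outside the scheduler. In the sketch below, the &lt;code&gt;orders&lt;/code&gt; table and both helper functions are hypothetical; the point is only that one filter depends on the logical date passed in, while the other follows the wall clock.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date


def build_query(schedule_time):
    # The partition filter depends only on the logical date passed in.
    return f"SELECT * FROM orders WHERE dt = '{schedule_time.isoformat()}'"


def build_query_unsafe():
    # Anti-pattern: the filter silently follows the day the task happens to run.
    return f"SELECT * FROM orders WHERE dt = '{date.today().isoformat()}'"


# A backfill for 2025-03-01 that actually executes on some later day.
safe = build_query(date(2025, 3, 1))
unsafe = build_query_unsafe()
print(safe)
print(unsafe)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first query always targets the 2025-03-01 partition, no matter when it runs. The second one targets whatever day the backfill is executed, so every backfilled instance would read the same, wrong partition.&lt;/p&gt;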
&lt;h1&gt;
  
  
  3 Exactly Once in Scheduling Systems: What Does It Really Mean?
&lt;/h1&gt;

&lt;p&gt;In stream processing systems such as Apache Flink, Exactly Once usually means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each record is processed only once.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, in scheduling systems, &lt;strong&gt;Exactly Once has a completely different meaning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A scheduling system cannot guarantee that tasks will not run multiple times, nor can it guarantee that data will not be written repeatedly. This is because automatic retries may re-execute tasks, manual reruns may re-execute tasks, and backfill may rerun historical logic.&lt;/p&gt;

&lt;p&gt;Therefore, in scheduling systems, &lt;strong&gt;Exactly Once is closer to the idea that:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Only one logical instance is generated for the same schedule time.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the task itself may still run multiple times. Thus, true Exactly Once semantics must be guaranteed by &lt;strong&gt;idempotent task logic&lt;/strong&gt;. Common implementations include:&lt;/p&gt;
&lt;h3&gt;
  
  
  1 Overwrite Write
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT OVERWRITE TABLE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  2 Partition-based Writing
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;partition dt='${schedule_time}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3 Deduplicated Writing
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MERGE INTO
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
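
&lt;p&gt;To see why these patterns make reruns safe, here is a self-contained sketch using Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. SQLite has no &lt;code&gt;INSERT OVERWRITE&lt;/code&gt;, so the example emulates overwriting a partition with a delete followed by an insert inside one transaction. The &lt;code&gt;daily_sales&lt;/code&gt; table and the &lt;code&gt;write_partition&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3


def write_partition(conn, dt, rows):
    # Replace the whole partition in one transaction so reruns cannot duplicate rows.
    with conn:
        conn.execute("DELETE FROM daily_sales WHERE dt = ?", (dt,))
        conn.executemany(
            "INSERT INTO daily_sales (dt, item, amount) VALUES (?, ?, ?)",
            [(dt, item, amount) for item, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (dt TEXT, item TEXT, amount INTEGER)")

rows = [("apple", 3), ("pear", 5)]
write_partition(conn, "2025-03-01", rows)
write_partition(conn, "2025-03-01", rows)  # simulated rerun of the same schedule time

count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(count)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After two runs for the same schedule time the table still holds two rows. A plain append would have left four.&lt;/p&gt;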

&lt;h1&gt;
  
  
  4 Common Misuse Scenarios
&lt;/h1&gt;

&lt;p&gt;Many data incidents actually stem from misunderstandings of scheduling semantics.&lt;/p&gt;
&lt;h2&gt;
  
  
  1 Using Current Time as Data Date
&lt;/h2&gt;

&lt;p&gt;Incorrect example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dt = today()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correct approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dt = ${schedule_time}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2 Non-idempotent Writes
&lt;/h2&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the task is rerun:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;duplicate data will occur
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3 Manually Rerunning the Entire Workflow
&lt;/h2&gt;

&lt;p&gt;Many users habitually do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Failure → rerun from the beginning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But a safer approach is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rerun only the failed nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  5 Best Practice Recommendations
&lt;/h1&gt;

&lt;p&gt;Based on experience using Apache DolphinScheduler, several important practices can be summarized.&lt;/p&gt;

&lt;h3&gt;
  
  
  1 Tasks Must Be Designed to Be Idempotent
&lt;/h3&gt;

&lt;p&gt;All tasks should allow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;repeated execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;without affecting data correctness.&lt;/p&gt;

&lt;h3&gt;
  
  
  2 Data Logic Must Be Based on Schedule Time
&lt;/h3&gt;

&lt;p&gt;Avoid using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;now()
today()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Always use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;${schedule_time}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3 Use Retry Strategies Appropriately
&lt;/h3&gt;

&lt;p&gt;Recommended configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retry Times: 1~3
Retry Interval: 1~5 min
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Avoid infinite retries.&lt;/p&gt;

&lt;h3&gt;
  
  
  4 Control Concurrency During Backfill
&lt;/h3&gt;

&lt;p&gt;If the backfill range is too large:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a large number of instances may be generated at once
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which may cause:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scheduling queue congestion&lt;/li&gt;
&lt;li&gt;cluster resource exhaustion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recommendation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;perform backfill in batches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
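
&lt;p&gt;One way to batch a large backfill is to split the date range into smaller windows and submit them one after another. The helper below is a hypothetical sketch of that splitting step only; how each window is submitted depends on your own tooling.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import date, timedelta


def backfill_batches(start, end, batch_days):
    # Split an inclusive date range into consecutive (first_day, last_day) windows.
    if 1 > batch_days:
        raise ValueError("batch_days must be at least 1")
    batches = []
    current = start
    while end >= current:
        last = min(current + timedelta(days=batch_days - 1), end)
        batches.append((current, last))
        current = last + timedelta(days=1)
    return batches


batches = backfill_batches(date(2025, 3, 1), date(2025, 3, 10), batch_days=4)
for first, last in batches:
    print(first.isoformat(), last.isoformat())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Ten days with a batch size of four gives three windows, the last one covering only the remaining two days.&lt;/p&gt;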



&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In data platforms, scheduling systems are often regarded as simple “task triggers.” In reality, they are responsible for &lt;strong&gt;time management, dependency control, and failure recovery&lt;/strong&gt;.&lt;br&gt;
By understanding the true semantics of &lt;strong&gt;failure retry, manual rerun, and backfill&lt;/strong&gt;, we can build &lt;strong&gt;stable and reliable data production systems&lt;/strong&gt;.&lt;br&gt;
Modern scheduling systems, such as Apache DolphinScheduler, already provide powerful mechanisms. However, the ultimate factor determining data quality is still:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Correct understanding of scheduling semantics + idempotent data task design.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Only in this way can data platforms remain &lt;strong&gt;recoverable, traceable, and reconstructable&lt;/strong&gt; even when failures occur.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Previous articles:&lt;br&gt;
&lt;a href="https://medium.com/codex/part-1-a-scheduler-is-more-than-just-a-timer-4503be32a187?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 1 | Scheduling Systems Are More Than Just “Timers”&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-2-the-core-abstraction-model-of-apache-dolphinscheduler-ac28ecac83f5?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 2 | The Core Abstraction Model of Apache DolphinScheduler&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/codex/part-3-how-does-scheduling-actually-start-running-773580dbc5e5" rel="noopener noreferrer"&gt;Part 3 | How Scheduling Actually Runs&lt;/a&gt;&lt;br&gt;
&lt;a href="https://medium.com/@ApacheDolphinScheduler/part-4-why-state-machines-power-reliable-scheduling-systems-35d00b8307bf?source=your_stories_outbox---writer_outbox_published-----------------------------------------" rel="noopener noreferrer"&gt;Part 4 | The State Machine: The Real Soul of Scheduling Systems&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next article preview:&lt;br&gt;
Part 6 | Multi-Tenant and Resource Isolation Design in Apache DolphinScheduler&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>apachedolphinscheduler</category>
      <category>ai</category>
      <category>datascience</category>
      <category>programming</category>
    </item>
    <item>
      <title>Apache DolphinScheduler 3.4.1 Released with Task Dispatch Timeout Detection</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Fri, 13 Mar 2026 08:06:35 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-341-released-with-task-dispatch-timeout-detection-3l0l</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-341-released-with-task-dispatch-timeout-detection-3l0l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ia1426s8ss2jsv1x1wy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ia1426s8ss2jsv1x1wy.jpg" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;3.4.1 version&lt;/strong&gt; of Apache DolphinScheduler has been officially released by the community. As a maintenance release in the &lt;strong&gt;3.4.x series&lt;/strong&gt;, this update focuses on &lt;strong&gt;improving scheduling stability, enhancing task execution control, and fixing system issues&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The new version introduces a &lt;strong&gt;task dispatch timeout detection mechanism&lt;/strong&gt; and &lt;strong&gt;maximum runtime control for tasks&lt;/strong&gt;, while also resolving multiple issues in scheduling logic, plugin functionality, and API behavior. In addition, system documentation, development processes, and project structure have been further optimized.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For more details, see the Release Note:
&lt;a href="https://github.com/apache/dolphinscheduler/releases/tag/3.4.1" rel="noopener noreferrer"&gt;https://github.com/apache/dolphinscheduler/releases/tag/3.4.1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Source code download:
&lt;a href="https://dolphinscheduler.apache.org/zh-cn/download/3.4.1" rel="noopener noreferrer"&gt;https://dolphinscheduler.apache.org/zh-cn/download/3.4.1&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Key Highlights
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Task Dispatch Timeout Detection Mechanism
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Task dispatch timeout checking logic&lt;/strong&gt; has been added to the Master scheduling module. When a task is dispatched to a Worker for execution, if the &lt;strong&gt;Worker Group does not exist or no Worker nodes are available&lt;/strong&gt;, the scheduler can detect the dispatch exception within a certain period and handle it accordingly.&lt;/p&gt;

&lt;p&gt;This mechanism prevents tasks from remaining in a waiting state for an extended time and improves the system’s fault tolerance in scenarios involving resource anomalies (#17795, #17796).&lt;/p&gt;
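
&lt;p&gt;Conceptually, the check resembles the Python sketch below. It is not the code from the 3.4.1 release: the &lt;code&gt;PendingTask&lt;/code&gt; class, the three result strings, and the 300-second timeout are assumptions chosen for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class PendingTask:
    name: str
    worker_group: str
    submitted_at: float  # seconds, for example from time.monotonic()


def check_dispatch(task, workers_by_group, now, timeout_seconds):
    # Return "DISPATCH", "WAIT", or "DISPATCH_TIMEOUT" for one pending task.
    # A missing group and an empty group are treated the same way.
    if workers_by_group.get(task.worker_group):
        return "DISPATCH"
    if now - task.submitted_at >= timeout_seconds:
        return "DISPATCH_TIMEOUT"  # give up instead of waiting forever
    return "WAIT"


workers_by_group = {"default": ["worker-1"], "gpu": []}
task = PendingTask("train_model", worker_group="gpu", submitted_at=0.0)

early = check_dispatch(task, workers_by_group, now=30.0, timeout_seconds=300)
late = check_dispatch(task, workers_by_group, now=300.0, timeout_seconds=300)
print(early, late)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The task keeps waiting while the timeout has not elapsed, and is flagged once it has, so it can be failed or alerted on rather than left pending.&lt;/p&gt;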

&lt;h2&gt;
  
  
  Support for Configuring Maximum Runtime for Workflow and Task Instances
&lt;/h2&gt;

&lt;p&gt;The new version allows users to configure a &lt;strong&gt;maximum runtime&lt;/strong&gt; for both &lt;strong&gt;Workflow Instances&lt;/strong&gt; and &lt;strong&gt;Task Instances&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Users can define the maximum execution duration for tasks or workflows. If the runtime exceeds the configured threshold, the system can trigger timeout handling mechanisms, preventing tasks from hanging or occupying resources indefinitely and improving overall operational controllability (#17931, #17932).&lt;/p&gt;

&lt;h1&gt;
  
  
  Key Fixes and Improvements
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Scheduling System Stability Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;task timeout alerts were not triggered&lt;/strong&gt; (#17820, #17818)&lt;/li&gt;
&lt;li&gt;Fixed the issue where the &lt;strong&gt;workflow failure strategy did not take effect&lt;/strong&gt; (#17834, #17851)&lt;/li&gt;
&lt;li&gt;Automatically mark a task as failed when &lt;strong&gt;task execution context initialization fails&lt;/strong&gt; (#17758, #17821)&lt;/li&gt;
&lt;li&gt;Fixed incorrect &lt;strong&gt;parallelism calculation in backfill tasks under parallel execution mode&lt;/strong&gt; (#17831, #17853)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Database and Compatibility Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed SQL execution errors for &lt;strong&gt;dependent tasks in PostgreSQL environments&lt;/strong&gt; (#17690, #17837)&lt;/li&gt;
&lt;li&gt;Fixed mismatched &lt;strong&gt;INT/BIGINT column types in database tables&lt;/strong&gt; (#17979, #17988)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  API and Permission Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Removed the &lt;code&gt;WAIT_TO_RUN&lt;/code&gt; state and added a &lt;strong&gt;FAILOVER state&lt;/strong&gt; when querying workflow instances (#17838, #17839)&lt;/li&gt;
&lt;li&gt;Added &lt;strong&gt;tenant validation&lt;/strong&gt; for the Workflow API (#17969, #17970)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;non-admin users could not delete their own Access Tokens&lt;/strong&gt; (#17995, #17997)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Plugin and Task Execution Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fixed incorrect &lt;strong&gt;JVM parameter position in Java Task&lt;/strong&gt; (#17848, #17850)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;Procedure Task parameters could not be passed correctly&lt;/strong&gt; (#17967, #17968)&lt;/li&gt;
&lt;li&gt;Fixed an issue where the &lt;strong&gt;Procedure Task could not return parameters or execute query-type stored procedures&lt;/strong&gt; (#17971, #17973)&lt;/li&gt;
&lt;li&gt;Fixed an issue where the &lt;strong&gt;HTTP plugin could not send nested JSON structures&lt;/strong&gt; (#17912, #17911)&lt;/li&gt;
&lt;li&gt;Fixed inconsistent &lt;strong&gt;timeout units in the HTTP alert plugin&lt;/strong&gt; (#17915, #17920)&lt;/li&gt;
&lt;/ul&gt;
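&lt;p&gt;Inconsistent timeout units are a common class of bug: one layer stores a value in seconds while another interprets it as milliseconds. The sketch below shows one general way to avoid that, by converting every configured value to a single duration type at the boundary. The unit labels and the &lt;code&gt;to_timeout&lt;/code&gt; helper are hypothetical and are not taken from the HTTP alert plugin.&lt;/p&gt;

```python
from datetime import timedelta

# Hypothetical unit labels mapped to timedelta keyword arguments.
_UNIT_TO_KWARG = {"ms": "milliseconds", "s": "seconds", "min": "minutes"}


def to_timeout(value: float, unit: str) -> timedelta:
    """Convert a (value, unit) pair into a single canonical duration."""
    if unit not in _UNIT_TO_KWARG:
        raise ValueError(f"unsupported timeout unit: {unit!r}")
    return timedelta(**{_UNIT_TO_KWARG[unit]: value})


# The same duration expressed in two different units compares equal.
from_seconds = to_timeout(120, "s")
from_minutes = to_timeout(2, "min")

# Convert once, explicitly, when a client expects milliseconds.
client_timeout_ms = int(from_seconds.total_seconds() * 1000)
print(from_seconds == from_minutes, client_timeout_ms)
```

&lt;p&gt;Keeping the unit attached to the value until the last possible moment makes a mismatch visible in code review rather than in production.&lt;/p&gt;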

&lt;h2&gt;
  
  
  UI and Documentation Fixes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Removed the &lt;strong&gt;STOP state&lt;/strong&gt; from task instances in the UI (#17864, #17865)&lt;/li&gt;
&lt;li&gt;Fixed an issue where &lt;strong&gt;locks were not released when workflow definition list loading failed&lt;/strong&gt; (#17984, #17989)&lt;/li&gt;
&lt;li&gt;Fixed the &lt;strong&gt;Keycloak login icon 404 issue&lt;/strong&gt; (#18006, #18007)&lt;/li&gt;
&lt;li&gt;Corrected errors in the &lt;strong&gt;installation documentation&lt;/strong&gt; (#17901, #17903)&lt;/li&gt;
&lt;li&gt;Fixed a &lt;strong&gt;SeaTunnel documentation link 404 issue&lt;/strong&gt; (#17904, #17905)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  In-Depth Feature Analysis
&lt;/h1&gt;

&lt;p&gt;In modern data platform architectures, scheduling systems often serve as key infrastructure connecting various computing engines. Tasks from systems such as Apache Spark, Apache Flink, and Apache Hive are commonly orchestrated through a unified scheduler.&lt;/p&gt;

&lt;p&gt;However, in production environments, scheduling systems often face challenges such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Worker resource anomalies preventing tasks from being scheduled&lt;/li&gt;
&lt;li&gt;Uncontrollable task execution time&lt;/li&gt;
&lt;li&gt;Unstable plugin execution behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The newly introduced &lt;strong&gt;task dispatch timeout detection mechanism&lt;/strong&gt; enables the scheduler to quickly identify anomalies when Workers do not exist or resources are unavailable, preventing tasks from waiting indefinitely (#17795, #17796).&lt;/p&gt;
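&lt;p&gt;The decision described above can be sketched as a small function. This is an illustration under stated assumptions, not the Master’s actual code: &lt;code&gt;DispatchDecision&lt;/code&gt;, &lt;code&gt;decide_dispatch&lt;/code&gt;, the timeout parameter, and the choice to fail the task on timeout are all hypothetical.&lt;/p&gt;

```python
from enum import Enum


class DispatchDecision(Enum):
    DISPATCH = "dispatch"
    KEEP_WAITING = "keep_waiting"
    FAIL_TIMEOUT = "fail_timeout"


def decide_dispatch(
    worker_groups: dict[str, list[str]],
    group_name: str,
    waited_seconds: float,
    dispatch_timeout_seconds: float,
) -> DispatchDecision:
    """Decide what to do with a task that is waiting to be dispatched.

    A missing group and an empty group are treated the same way: there is
    no Worker that can take the task right now.
    """
    workers = worker_groups.get(group_name, [])
    if workers:
        return DispatchDecision.DISPATCH
    if waited_seconds >= dispatch_timeout_seconds:
        # Assumption of this sketch: a timed-out dispatch fails the task.
        return DispatchDecision.FAIL_TIMEOUT
    return DispatchDecision.KEEP_WAITING


# Hypothetical cluster state: one healthy group and one empty group.
groups = {"default": ["worker-1"], "gpu": []}

healthy = decide_dispatch(groups, "default", 0, 300)
empty_group = decide_dispatch(groups, "gpu", 10, 300)
missing_group = decide_dispatch(groups, "missing", 301, 300)
print(healthy.value, empty_group.value, missing_group.value)
```

&lt;p&gt;Without the timeout branch, a task assigned to a missing or empty Worker group would stay in the waiting state indefinitely, which is the situation the new check is meant to prevent.&lt;/p&gt;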

&lt;p&gt;At the same time, the &lt;strong&gt;maximum runtime control capability&lt;/strong&gt; gives users a direct way to bound task execution time. By setting a maximum runtime for workflows or tasks, the system can take action when tasks hang or run abnormally long, preventing resources from being occupied for extended periods (#17931, #17932).&lt;/p&gt;

&lt;p&gt;These improvements further enhance DolphinScheduler’s &lt;strong&gt;stability and controllability in production-grade data platform environments&lt;/strong&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Acknowledgements
&lt;/h1&gt;

&lt;p&gt;The release of &lt;strong&gt;Apache DolphinScheduler 3.4.1&lt;/strong&gt; would not have been possible without the contributions of community developers. Special thanks to the release manager &lt;strong&gt;@ruanwenjun&lt;/strong&gt; and the following contributors for their work on this version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SbloodyS&lt;/li&gt;
&lt;li&gt;njnu-seafish&lt;/li&gt;
&lt;li&gt;Mrhs121&lt;/li&gt;
&lt;li&gt;ylq5126&lt;/li&gt;
&lt;li&gt;qiong-zhou&lt;/li&gt;
&lt;li&gt;XpengCen&lt;/li&gt;
&lt;li&gt;iampratap7997-dot&lt;/li&gt;
&lt;li&gt;yzeng1618&lt;/li&gt;
&lt;li&gt;Alexander1902&lt;/li&gt;
&lt;li&gt;maomao199691&lt;/li&gt;
&lt;li&gt;asadjan4611&lt;/li&gt;
&lt;li&gt;dill21yu&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Apache DolphinScheduler 3.4.1&lt;/strong&gt; is a maintenance release focused on &lt;strong&gt;improving scheduling stability and enhancing task runtime control&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With the introduction of scheduling fault-tolerance mechanisms, maximum task runtime control, and numerous bug fixes, this version further strengthens the system’s reliability in production environments.&lt;/p&gt;

&lt;p&gt;As the community continues to grow, Apache DolphinScheduler is steadily improving its capabilities in the data workflow orchestration space, providing enterprises with a more stable and efficient infrastructure for building modern data platforms. We welcome more contributors to join the community and help drive the project forward.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Apache DolphinScheduler February 2026: What the Community Shipped</title>
      <dc:creator>Chen Debra</dc:creator>
      <pubDate>Thu, 05 Mar 2026 09:41:59 +0000</pubDate>
      <link>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-february-2026-what-the-community-shipped-4598</link>
      <guid>https://dev.to/chen_debra_3060b21d12b1b0/apache-dolphinscheduler-february-2026-what-the-community-shipped-4598</guid>
      <description>&lt;p&gt;In February 2026, the Apache DolphinScheduler community maintained an active development pace. This month’s work mainly focused on improving system stability, enhancing existing features, and optimizing code quality. Community members made significant contributions in bug fixing, user experience improvements, documentation updates, and advancing important architectural decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Support for Configurable Maximum Runtime of Workflow/Task Instances
&lt;/h3&gt;

&lt;p&gt;One of the most important features introduced this month is support for a configurable maximum runtime for workflow and task instances (Feature-17931). Users can now set a maximum runtime for a workflow or an individual task. When an instance exceeds this limit, the system handles it automatically (for example, by marking it as failed or canceling it). This gives users tighter control over resource usage and helps prevent runaway tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Master Node Dispatch Timeout Check Logic
&lt;/h3&gt;

&lt;p&gt;To improve system robustness, timeout checking logic for task dispatch has been added to the Master node (Improvement-17795). When a Worker group does not exist or has no available Workers, the Master can now detect that dispatch has timed out, which prevents tasks from remaining in a waiting state for a long time and improves scheduling reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Proposal to Remove the Import/Export Feature
&lt;/h3&gt;

&lt;p&gt;The community is discussing an important improvement proposal (DSIP-104) that suggests removing the import and export functionality from the project. This may indicate that the community is considering more modern and reliable ways to manage and migrate workflows, such as GitOps or other version-control-friendly approaches. This is an architectural evolution worth watching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixes and Improvements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  UI/UX
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fixed an issue where the Keycloak login icon returned a 404 error (Fix-18006).&lt;/li&gt;
&lt;li&gt;Improved the validation logic for Spark parameters, enhancing the experience when configuring Spark tasks (Improvement-17957).&lt;/li&gt;
&lt;li&gt;Fixed an issue where the workflow definition list loading lock was not released when a request failed (Fix-17984).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  API and Backend
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Stored Procedure Task Enhancements: This month the community focused on improving and fixing the Procedure task type. Issues where parameters could not be passed (Fix-17967) and where local parameters were not passed correctly (Fix-17971) were resolved, improving the stability of this task type.&lt;/li&gt;
&lt;li&gt;Fixed a permission issue where non-admin users could not delete their own access tokens (Fix-17995).&lt;/li&gt;
&lt;li&gt;Fixed missing tenant validation in workflows, strengthening multi-tenant security (Fix-17969).&lt;/li&gt;
&lt;li&gt;Fixed inconsistent timeout unit settings in the HTTP alert plugin (Fix-17915).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Database
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fixed the mismatch between the INT and BIGINT types of the workflow_definition_code field in the t_ds_serial_command table (Fix-17979), ensuring database stability and data consistency.&lt;/li&gt;
&lt;/ul&gt;
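&lt;p&gt;To see why an INT/BIGINT mismatch matters, consider what happens when a value that fits a 64-bit column is written to a 32-bit one. The sketch below uses Python’s &lt;code&gt;struct&lt;/code&gt; module to test whether a value fits each width. The sample code value is hypothetical and only assumes that such codes can exceed the 32-bit range.&lt;/p&gt;

```python
import struct


def fits(value: int, fmt: str) -> bool:
    """Return True if value can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


INT32_FORMAT = ">i"   # signed 32-bit integer, comparable to an INT column
INT64_FORMAT = ">q"   # signed 64-bit integer, comparable to a BIGINT column

# Hypothetical workflow definition code, larger than 2**31 - 1.
hypothetical_code = 13_000_000_000_000

fits_int = fits(hypothetical_code, INT32_FORMAT)
fits_bigint = fits(hypothetical_code, INT64_FORMAT)
print(fits_int, fits_bigint)
```

&lt;p&gt;Depending on the database and its settings, writing such a value to the narrower column may fail or be rejected, which is why the column type needs to match the width of the values it stores.&lt;/p&gt;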

&lt;h3&gt;
  
  
  Other Improvements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Added support for creating Worker groups without Workers, providing more flexible resource configuration (Improvement-17926).&lt;/li&gt;
&lt;li&gt;Hardened the startup scripts and parameter handling for SeaTunnel tasks (Improvement-17994).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Community and Ecosystem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Community members fixed several spelling and wording issues in multiple README files (Doc).&lt;/li&gt;
&lt;li&gt;Added a section on frontend code checks in the development documentation to help new contributors better follow project standards (Doc-17913).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Code Quality and Refactoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Upgraded the ZooKeeper dependency to version 3.8.3 (Chore).&lt;/li&gt;
&lt;li&gt;Upgraded the Testcontainers dependency to version 1.21.4 to resolve Docker environment issues in CI (Chore).&lt;/li&gt;
&lt;li&gt;Refactored the datasource plugin manager and processor manager to improve code structure (Chore).&lt;/li&gt;
&lt;li&gt;Refactored Kubernetes task code by moving the generateK8sTaskExecutionContext method into the more specific K8sTaskParameters class, making code responsibilities clearer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Community Governance and Continuous Integration (CI)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Added AI usage confirmation to the PR template, reflecting the community’s attention to code contribution quality and originality (Chore).&lt;/li&gt;
&lt;li&gt;Updated the CI configuration so that when new commits are pushed to a PR, previous review comments automatically become outdated. This helps ensure that code reviews are always based on the latest changes, improving collaboration efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Acknowledgments to Contributors
&lt;/h2&gt;

&lt;p&gt;Thanks to all community members who contributed to Apache DolphinScheduler in February (in no particular order):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wenjun Ruan&lt;/li&gt;
&lt;li&gt;xiangzihao&lt;/li&gt;
&lt;li&gt;yzeng1618&lt;/li&gt;
&lt;li&gt;Divyansh Pratap Singh&lt;/li&gt;
&lt;li&gt;dill&lt;/li&gt;
&lt;li&gt;Muhammad Asad&lt;/li&gt;
&lt;li&gt;huangsheng&lt;/li&gt;
&lt;li&gt;XpengCen&lt;/li&gt;
&lt;li&gt;njnu-seafish&lt;/li&gt;
&lt;li&gt;maomao_zero&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Special thanks to Wenjun Ruan, who was very active in February and contributed numerous fixes, improvements, and code refactoring to the community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Outlook
&lt;/h2&gt;

&lt;p&gt;The February updates show that the Apache DolphinScheduler community continues to move steadily toward a more stable, more user-friendly, and more powerful system. In the coming months, the community is expected to keep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improving stability: Bug fixes and system improvements will remain a top priority.&lt;/li&gt;
&lt;li&gt;Advancing architectural optimization: As seen in discussions about the import/export feature, the community will keep exploring and implementing better architectural solutions.&lt;/li&gt;
&lt;li&gt;Focusing on user experience: Ongoing UI/UX improvements will provide users with a better operational experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to all developers who contributed to the DolphinScheduler community.&lt;/p&gt;

&lt;p&gt;Note: The references in parentheses (for example, Fix-18006) correspond to issue or pull request numbers in the DolphinScheduler GitHub repository, where readers can find more detailed information.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>apachedolphinscheduler</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
