
Chen Debra

A Beginner-Friendly Guide to Apache DolphinScheduler: Hands-on Cloud-Native Job Scheduling

Why Do You Need DolphinScheduler?

If you are still stitching jobs together with crontab and hand-written shell scripts, dependencies, retries, and failure alerts all have to be managed by hand. DolphinScheduler replaces that with a visual, DAG-based workflow console, built-in retry and alerting, and tight integration with big data engines such as Spark, which is exactly the pain this guide shows you how to leave behind.

3-Minute Quick Deployment (Beginner-Friendly)

Environment Preparation

Minimum requirements (for development):
JDK 8+  
MySQL 5.7+  
ZooKeeper 3.8+
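
Before installing anything, it is worth confirming the toolchain versions with a few generic shell commands (nothing DolphinScheduler-specific):

# Quick sanity check of the local toolchain versions.
java -version        # expect 1.8 or newer
mysql --version      # expect 5.7 or newer
zkServer.sh status   # requires ZooKeeper's bin directory on your PATH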

One-Command Docker Startup (Recommended to Avoid Pitfalls)

# Set a real datasource password, and note that localhost here refers to the
# container itself: point the URL at a MySQL host the container can reach.
docker run -d --name dolphinscheduler \
  -e DATABASE_TYPE=mysql \
  -e SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/ds?useUnicode=true&characterEncoding=UTF-8" \
  -e SPRING_DATASOURCE_USERNAME=root \
  -e SPRING_DATASOURCE_PASSWORD=<your-password> \
  -p 12345:12345 \
  apache/dolphinscheduler:3.2.0
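
Once the container is up, a quick way to catch silent startup failures is to tail its logs and confirm the UI port answers (the container name and port are the ones used above):

# Watch the startup logs and check that the web port responds.
docker logs -f dolphinscheduler                   # Ctrl+C to stop following
curl -I http://localhost:12345/dolphinscheduler   # expect an HTTP response once the service is up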

Core Concepts

Term     | Real-World Analogy          | Technical Definition
Workflow | Factory production line     | A collection of tasks organized as a DAG
Task     | A production process step   | An execution unit such as Shell, SQL, or Spark
Instance | A daily production batch    | A specific run of a workflow
Alert    | Factory broadcasting system | Configuration of failure alert channels

Step-by-Step: Create Your First Workflow (With Code)

Scenario: Daily User Behavior Analysis

Step 1: Log in to the Console
Open http://localhost:12345/dolphinscheduler in a browser (default account: admin / dolphinscheduler123).

Step 2: Create a Workflow

In Project Management, create a project, then open its workflow definition list and create a new workflow; tasks are added by dragging them onto the DAG canvas and connecting them.

Step 3: Configure a Shell Task (Key Code)

#!/bin/bash
# Example of parameter injection: ${sys_date} is a system variable,
# ${begin_date} and ${end_date} are workflow parameters.
spark-submit \
  --master yarn \
  --name behavior_analysis_${sys_date} \
  /opt/jobs/user_analysis.py ${begin_date} ${end_date}
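
To sanity-check the command outside the scheduler first, the same call can be dry-run by hand, with literal example values standing in for the scheduler variables:

# Manual one-off run of the same job; the dates and the job name suffix
# are placeholder literals replacing the scheduler variables.
spark-submit \
  --master yarn \
  --name behavior_analysis_20240101 \
  /opt/jobs/user_analysis.py 2024-01-01 2024-01-02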

Step 4: Set a Scheduling Policy

0 0 2 * * ? *   # Runs daily at 2 AM (Quartz cron expression)
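
For reference, a few more expressions in the same Quartz format (fields: second, minute, hour, day of month, month, day of week, year):

0 0 2 * * ? *      # every day at 02:00
0 30 1 ? * MON *   # every Monday at 01:30
0 0/15 * * * ? *   # every 15 minutes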

Unlock Advanced Features (Beginner-Friendly)

  1. Parameter Passing (Cross-Task Value Transfer)
# In a Python node, retrieve upstream output
context.getUpstreamOutParam('uv_count')  
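
For completeness, here is a sketch of how the upstream task might publish that value in the first place. It assumes a Shell upstream task using the ${setValue(...)} OUT-parameter syntax; the parameter typically also needs to be declared as an OUT custom parameter on the task, so treat the details as an assumption to verify against the docs for your version.

#!/bin/bash
# Upstream shell task: expose a computed value as the OUT parameter
# uv_count so downstream tasks (like the Python node above) can read it.
echo '${setValue(uv_count=102400)}'   # 102400 is a placeholder value
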
  2. Automatic Retry on Failure
# Part of workflow definition
task_retry_interval: 300  # Retry every 5 minutes  
retry_times: 3            # Retry up to 3 times
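
Retries only help if the task actually reports failure; a minimal, generic shell pattern is to exit non-zero whenever the underlying job fails, for example:

#!/bin/bash
# Exit non-zero on any failure so the retry policy above
# (3 attempts, 5-minute interval) gets triggered.
set -e
spark-submit --master yarn /opt/jobs/user_analysis.py ${begin_date} ${end_date}
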
  3. Conditional Branching (Dynamic Routing)
# Determine if it's a weekend
if [ ${week} -gt 5 ]; then  
   echo "skip weekend processing"  
   exit 0  
fi
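
The ${week} variable above is not a built-in; one way to derive it inside the same Shell task (an assumption, not necessarily the author's setup) is:

#!/bin/bash
# Derive the day-of-week number locally: 1 = Monday ... 7 = Sunday,
# then skip processing on Saturday and Sunday.
week=$(date +%u)
if [ "${week}" -gt 5 ]; then
  echo "skip weekend processing"
  exit 0
fi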

Pitfall Prevention Tips (From Real-World Experience)

  1. Resource Misconfiguration: Spark out-of-memory → Adjust in conf/worker.properties
worker.worker.task.resource.limit=true  
worker.worker.task.memory.max=8g  # Adjust based on your cluster
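
Independently of the scheduler-side limit, the Spark job itself usually needs explicit memory settings; here is the same spark-submit with driver and executor memory pinned (the numbers are placeholders to tune for your cluster):

# Pin Spark memory explicitly instead of relying on cluster defaults.
spark-submit \
  --master yarn \
  --driver-memory 2g \
  --executor-memory 4g \
  --num-executors 4 \
  /opt/jobs/user_analysis.py ${begin_date} ${end_date}
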
  2. Timezone Trap: Scheduled task delayed by 8 hours → Fix in common.properties
spring.jackson.time-zone=GMT+8  
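
A quick way to diagnose the drift (generic Linux and JVM commands, nothing DolphinScheduler-specific) is to compare the timezone the host and the JVM each report:

# Compare the OS timezone (systemd-based Linux) with the JVM's default;
# an 8-hour schedule drift usually means one of them is still on UTC.
date
timedatectl | grep "Time zone"
java -XshowSettings:properties -version 2>&1 | grep user.timezone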

Efficiency Comparison (Convincing Metrics)

Metric                       | Crontab | Airflow | DolphinScheduler
Visualization Level          |         | ⭐⭐⭐    | ⭐⭐⭐⭐
High Availability Deployment |         | ⭐⭐     | ⭐⭐⭐⭐
Big Data Integration Degree  |         | ⭐⭐⭐    | ⭐⭐⭐⭐⭐
Learning Curve               | ⭐⭐⭐⭐  | ⭐⭐     | ⭐⭐⭐

Final Thoughts

Apache DolphinScheduler is rapidly becoming the de facto standard in big data workflow scheduling. Its cloud-native architecture and user-friendly interface help developers escape the pain of managing complex task flows. For beginners, this guide is a great starting point to explore more advanced features like cross-cluster task dispatching and Kubernetes integration.
