DEV Community

Chen Debra


Zero-Risk Operations! A Lossless Scaling Guide for DolphinScheduler in High Availability Architecture

Preface

Apache DolphinScheduler is an open-source distributed task scheduling system, and in production its cluster often has to be scaled up or down as business demand changes. This article walks through both expanding and shrinking a DolphinScheduler cluster step by step, to help operations teams adjust cluster size safely and efficiently.

Cluster Expansion Operations

1. Pre-expansion Preparation

Before expanding the cluster, confirm the following:

  • Node type to be added: Master or Worker
  • Number of nodes to be added
  • Whether the physical machine hosting each new node already has the required software installed

Important Tip: A single physical machine should not run more than one Master process or more than one Worker process at the same time.
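A quick way to respect this rule is to check a host before starting a service on it. The following bash sketch is illustrative only: the function names are hypothetical, and it assumes `jps` lists the services under the main class names `MasterServer` and `WorkerServer`, which you should confirm on your version. It takes `jps` output as text so the logic can be tried without a running cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count processes of one DolphinScheduler role in `jps` output.
# Assumes jps shows the services by main class name (MasterServer / WorkerServer).
count_role_procs() {
  local jps_output="$1" role="$2" class
  case "$role" in
    master) class="MasterServer" ;;
    worker) class="WorkerServer" ;;
    *) echo "unknown role: $role" >&2; return 2 ;;
  esac
  # Each jps line is "<pid> <MainClass>"; count lines whose second field matches.
  printf '%s\n' "$jps_output" | awk -v c="$class" '$2 == c { n++ } END { print n + 0 }'
}

# Hypothetical guard: succeeds only if no process of the given role is running yet.
can_start_role() {
  local n
  n=$(count_role_procs "$1" "$2") || return 2
  [ "$n" -eq 0 ]
}

# Usage on a candidate host:
#   can_start_role "$(jps)" worker || echo "A WorkerServer is already running here"
```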

2. Basic Environment Setup

2.1 Required Software Installation
All new nodes must have the following installed:

  • JDK 1.8+, with the JAVA_HOME environment variable configured
  • Basic tools such as wget and tar

Optional for Worker nodes:

  • Hadoop/Hive/Spark clients (if corresponding task types are to be executed)
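The prerequisites above can be checked on each new node before going further. This is a minimal bash sketch with hypothetical function names; it only verifies that the named tools are on `PATH` and that `JAVA_HOME` points at a directory containing an executable `bin/java`, not the JDK version itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each listed tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical check: succeeds only if JAVA_HOME is set and $JAVA_HOME/bin/java is executable.
java_home_ok() {
  [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]
}

# Usage on a new node:
#   missing=$(missing_tools java wget tar)
#   [ -z "$missing" ] || echo "Install first: $missing"
#   java_home_ok || echo "JAVA_HOME is not set or does not point to a JDK"
```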

2.2 Obtain Installation Package

  1. Confirm the version of the existing cluster and download the installation package of the same version
  2. Choose a unified installation directory (e.g., /opt/dolphinscheduler)
  3. Extract the installation package into the target directory
  4. Add the database driver JAR (e.g., mysql-connector-java) to the new installation, in the same location as on the existing nodes
# Run as root; replace <version> with the existing cluster's version
mkdir -p /opt
tar -zxvf apache-dolphinscheduler-<version>-bin.tar.gz -C /opt
mv /opt/apache-dolphinscheduler-<version>-bin /opt/dolphinscheduler
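Because every node must run the same version, it can help to compare the package name against the version of the existing cluster before extracting. A bash sketch, assuming the package follows the `apache-dolphinscheduler-<version>-bin.tar.gz` naming shown above; the helper names are hypothetical and you supply the cluster version yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract <version> from a file named
# apache-dolphinscheduler-<version>-bin.tar.gz (the naming used above).
pkg_version() {
  local name
  name=$(basename "$1")
  name=${name#apache-dolphinscheduler-}
  name=${name%-bin.tar.gz}
  echo "$name"
}

# Hypothetical check: fails (status 1) if the package version differs from the cluster's.
same_version() {
  local cluster_version="$1" pkg="$2" v
  v=$(pkg_version "$pkg")
  if [ "$v" != "$cluster_version" ]; then
    echo "version mismatch: cluster=$cluster_version package=$v" >&2
    return 1
  fi
}

# Usage:
#   same_version "<existing cluster version>" apache-dolphinscheduler-<version>-bin.tar.gz
```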

3. System User Configuration

Create the deployment user (the same user as on the existing nodes) and configure sudo privileges on all new nodes:

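# Run as root on each new node
useradd dolphinscheduler
# Example password only; replace it. passwd --stdin exists on RHEL-family systems
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler
# Grant passwordless sudo (validate the file afterwards with visudo -c)
echo 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
# Comment out "Defaults requiretty" so sudo works without a TTY
sed -i 's/^Defaults    requiretty/#Defaults    requiretty/' /etc/sudoers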
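Appending to /etc/sudoers with `>>` adds a duplicate entry every time the setup is re-run. The hypothetical helper below appends a line only when an identical line is absent. It is a sketch: on a real host, validate the result with `visudo -c` afterwards, since a broken sudoers file can lock you out of sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if an identical line is not
# already present, so re-running the setup does not duplicate entries.
ensure_line() {
  local file="$1" line="$2"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   ensure_line /etc/sudoers 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL'
#   visudo -c
```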

4. Configuration File Adjustments

4.1 Copy Configuration Files
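Copy the conf directory from an existing node to the new node, then double-check the files below (names follow the 1.3.x layout; in 3.x the equivalent settings live mainly in bin/env/dolphinscheduler_env.sh and each service's conf/application.yaml):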

  • datasource.properties: Database connection info
  • zookeeper.properties: ZooKeeper connection info
  • common.properties: Resource storage configuration
  • dolphinscheduler_env.sh: Environment variables
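To double-check the copied files mechanically, you can compare the new node's conf directory against a reference copy taken from an existing node. This bash sketch uses a hypothetical function name and plain `find`/`cmp`; it reports files that are missing or differ, and leaves it to you to decide whether a difference is intended.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files that are missing or differ between a reference
# conf directory (copied from an existing node) and the new node's conf directory.
diff_conf() {
  local ref="$1" new="$2" rel
  ( cd "$ref" && find . -type f ) | sort | while IFS= read -r rel; do
    if [ ! -f "$new/$rel" ]; then
      echo "missing: ${rel#./}"
    elif ! cmp -s "$ref/$rel" "$new/$rel"; then
      echo "differs: ${rel#./}"
    fi
  done
}

# Usage (reference copy fetched from an existing node beforehand):
#   diff_conf /tmp/ref_conf /opt/dolphinscheduler/conf
```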

4.2 Configure Environment Variables
Edit conf/env/dolphinscheduler_env.sh so the paths match this node. Sample configuration:

export HADOOP_HOME=/opt/soft/hadoop
export JAVA_HOME=/opt/soft/java
export PATH=$JAVA_HOME/bin:$PATH

Create a symbolic link so that java is also available at /usr/bin/java:

sudo ln -s "$JAVA_HOME/bin/java" /usr/bin/java

4.3 Update Cluster Configuration
Edit bin/env/install_env.sh on all nodes:

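# Expansion example: ips lists every host in the cluster, existing and new
ips="master1,master2,worker1,worker2,ds1,ds2,ds3"
# Add Master nodes ds1 and ds2
masters="master1,master2,ds1,ds2"

# Add Worker node ds3 (format: host:workerGroup)
workers="worker1:default,worker2:default,ds3:default"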
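Since masters and workers must refer to hosts that are also deployed, a quick consistency check on the edited values can catch a forgotten entry. This is a bash sketch with a hypothetical function name; it assumes the comma-separated formats shown above, with worker entries written as `host:group`.

```shell
#!/usr/bin/env bash
# Hypothetical consistency check for install_env.sh values: print every host
# named in masters or workers (worker entries are host:group) that is not in ips.
hosts_missing_from_ips() {
  local ips="$1" masters="$2" workers="$3" h
  local -a ip_list entries
  IFS=',' read -r -a ip_list <<< "$ips"
  IFS=',' read -r -a entries <<< "$masters,$workers"
  for h in "${entries[@]}"; do
    h=${h%%:*}                      # drop the ":group" suffix of worker entries
    [ -z "$h" ] && continue         # skip empty items (e.g. empty masters)
    if [[ ! " ${ip_list[*]} " == *" $h "* ]]; then
      echo "$h"
    fi
  done | sort -u
}

# Usage after editing the file:
#   source bin/env/install_env.sh
#   hosts_missing_from_ips "$ips" "$masters" "$workers"
```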

5. Permission Setup & Cluster Restart

Set directory permissions:

sudo chown -R dolphinscheduler:dolphinscheduler /opt/dolphinscheduler

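Restart the cluster from the deployment node. The scripts read install_env.sh and connect to each host over SSH, so the deployment user needs passwordless SSH access to every node: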

# Stop all services
bin/stop-all.sh

# Start all services
bin/start-all.sh
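The start script can return before every service has finished starting, so scripted checks usually poll rather than test once. The helper below is a hypothetical bash sketch that re-runs any check command until it succeeds or a timeout expires; the example checks assume `jps` shows the Worker as `WorkerServer`.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: re-run a check command until it succeeds or the
# timeout expires. Arguments: timeout (s), interval (s, integer), command...
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example: wait up to 120 s for the Worker to appear
#   wait_until 120 5 sh -c 'jps | grep -qw WorkerServer' || echo "WorkerServer did not start"
# The same helper can wait for a service to disappear after a stop:
#   wait_until 60 5 sh -c '! jps | grep -qw WorkerServer'
```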

6. Expansion Verification

  1. Run jps on each new node to check that the expected service process is running
  2. Check the service log files on each node for errors
  3. Confirm in the Web UI Monitoring Center that the new nodes are listed and healthy
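Steps 1 and 2 can be combined into one check per node. The bash sketch below is hypothetical: it takes `jps` output, the expected main class name, and a log file path (the log location varies with version and layout, so pass your own), then reports whether the process is present and whether the log contains ERROR lines.

```shell
#!/usr/bin/env bash
# Hypothetical post-expansion check for one node.
#   $1 = jps output, $2 = expected main class (e.g. MasterServer or WorkerServer),
#   $3 = path to that service's log file (location depends on version/layout).
# Prints a short report; returns non-zero if the process is absent,
# the log is unreadable, or the log contains ERROR lines.
verify_node() {
  local jps_output="$1" class="$2" log="$3" errors
  if ! printf '%s\n' "$jps_output" | awk -v c="$class" '$2 == c { found = 1 } END { exit !found }'; then
    echo "FAIL: $class not running"
    return 1
  fi
  if [ ! -r "$log" ]; then
    echo "FAIL: cannot read $log"
    return 1
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; || true keeps that quiet.
  errors=$(grep -c 'ERROR' "$log" || true)
  if [ "$errors" -gt 0 ]; then
    echo "WARN: $errors ERROR line(s) in $log"
    return 1
  fi
  echo "OK: $class running, no ERROR lines in $log"
}

# Usage on a new Worker node:
#   verify_node "$(jps)" WorkerServer /path/to/worker.log
```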

Cluster Shrinking Operations

1. Pre-shrink Preparation

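Identify exactly which nodes (Master or Worker) and how many will be removed, and make sure the removal will not affect running tasks: let task instances on the affected nodes finish first, and check that every worker group will still have at least one Worker afterwards.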
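Removing Workers can leave a worker group with no members, in which case tasks assigned to that group cannot run. This bash sketch, with a hypothetical function name, lists the groups that would become empty given the workers value from install_env.sh and the hosts to remove. It assumes every entry is `host:group` and treats an entry without a group as `default`, which is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical pre-shrink check: print each worker group that would have no
# Worker left after removing the given hosts.
#   $1 = workers value (host:group,...), $2 = comma-separated hosts to remove
orphaned_groups() {
  local workers="$1" remove=",$2," entry host group
  local -a entries
  IFS=',' read -r -a entries <<< "$workers"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue
    host=${entry%%:*}
    group=${entry#*:}
    # Assumption: an entry without ":group" belongs to "default".
    [ "$group" = "$entry" ] && group=default
    if [[ "$remove" == *",$host,"* ]]; then
      echo "$group drop"
    else
      echo "$group keep"
    fi
  done | awk '{ seen[$1] = 1 } $2 == "keep" { kept[$1] = 1 }
              END { for (g in seen) if (!(g in kept)) print g }' | sort
}

# Usage:
#   orphaned_groups "worker1:default,worker2:default,ds3:default" "ds3"   # prints nothing
```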

2. Shrinking Steps

2.1 Stop Services on Target Nodes
Run the command matching each node's role on the nodes to be removed:

# Stop Master service
bin/dolphinscheduler-daemon.sh stop master-server

# Stop Worker service
bin/dolphinscheduler-daemon.sh stop worker-server

Use jps to confirm that the services have stopped.

2.2 Update Cluster Configuration
Edit bin/env/install_env.sh on all nodes and remove the entries for the removed nodes:

# Master shrink example
masters="master1,master2"  # Removed ds1, ds2

# Worker shrink example
workers="worker1:default,worker2:default"  # Removed ds3

# Also remove ds1, ds2 and ds3 from ips if they leave the cluster entirely
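When editing these lists by hand it is easy to leave a stray comma or miss a host. The hypothetical bash helper below removes hosts from a comma-separated value, matching on the host part so that `ds3` also removes `ds3:default`; it prints the new value for you to paste into install_env.sh on every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop hosts from a comma-separated install_env.sh value.
#   $1 = current value, $2 = comma-separated hosts to remove
# Matches on the host part, so "ds3" also removes "ds3:default".
remove_hosts() {
  local list="$1" remove=",$2," entry out=""
  local -a entries
  IFS=',' read -r -a entries <<< "$list"
  for entry in "${entries[@]}"; do
    [ -z "$entry" ] && continue                 # ignore empty items
    if [[ "$remove" == *",${entry%%:*},"* ]]; then
      continue                                  # host is being removed
    fi
    out+="${out:+,}$entry"                      # add a comma only between items
  done
  echo "$out"
}

# Usage:
#   remove_hosts "master1,master2,ds1,ds2" "ds1,ds2"   # master1,master2
```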

3. Post-shrink Check

  1. Confirm the remaining nodes are running properly
  2. Check that tasks are still being scheduled and executed normally
  3. Monitor resource usage on the remaining nodes, which now carry the removed nodes' workload

Notes

  1. Version Consistency: Ensure all nodes use the same version of DolphinScheduler
  2. Config Synchronization: All nodes must have identical configuration files
  3. Service Dependencies: Worker nodes must have the clients required by the task types they run
  4. Resource Permissions: Ensure the deployment user has sufficient permissions for the resource storage system
  5. Operation Order: Always stop services before modifying configurations to avoid inconsistencies

By following these steps, you can safely scale a DolphinScheduler cluster up or down as business needs change. Perform these operations during off-peak hours, and back up the system (at least the metadata database and configuration files) beforehand.
