In today’s global wave of digital transformation, intelligent manufacturing has become the core engine driving high-quality development in the manufacturing industry. However, on the path toward intelligence, enterprises face numerous challenges: data silos across multiple systems, complex scheduling dependencies, delayed monitoring and alerting, and many other issues that emerge one after another.
At a recent Apache DolphinScheduler online user meetup, the community invited Qiu Zhongbiao, a Senior Software Engineer from a large intelligent manufacturing enterprise in Shenzhen, to share practical experience on how Apache DolphinScheduler is applied in real-world intelligent manufacturing scenarios within the company.
This article organizes and presents the core content of his talk, offering an in-depth look at how this enterprise achieved a qualitative leap in its scheduling platform through Apache DolphinScheduler.
About the Author
Qiu Zhongbiao is a Senior Software Engineer at a large intelligent manufacturing enterprise in Shenzhen. He focuses on data technology research and practice in the intelligent manufacturing domain and is dedicated to driving digital transformation in the manufacturing industry.
The Era of Intelligent Manufacturing
With the deep advancement of Industry 4.0, intelligent manufacturing has become a focal point of global manufacturing competition. Intelligent manufacturing maturity models are typically divided into multiple levels from low to high. Enterprises must gradually enhance automation, digitalization, and networking capabilities, ultimately achieving intelligent production.
Throughout this process, data becomes a core production factor. How to efficiently, reliably, and stably collect, process, and schedule data has become a major challenge faced by every manufacturing enterprise.
Modern manufacturing enterprises operate in increasingly complex data environments. On one hand, they rely on numerous business systems, including MES (Manufacturing Execution Systems), ERP (Enterprise Resource Planning), WMS (Warehouse Management Systems), WCS (Warehouse Control Systems), CRM (Customer Relationship Management), QMS (Quality Management Systems), PLM (Product Lifecycle Management), SCM (Supply Chain Management), APS (Advanced Planning and Scheduling), and more.
Data interactions among these systems are often implemented through hard-coded integrations, resulting in intricate inter-system relationships, high maintenance costs, poor scalability, and difficulties in troubleshooting.
On the other hand, enterprises also face complex network environments, such as corporate production networks, internal factory networks, and international or domestic leased-line networks. Data collection, transmission, and scheduling requirements vary significantly across these environments, making unified management and task isolation a major challenge.
The Dilemma of Traditional Data Processing Approaches
As intelligent manufacturing advances data-driven initiatives, enterprises encounter multidimensional pain points.
First, there are foundational barriers caused by data diversity: device protocols are highly heterogeneous, ranging from proprietary PLC protocols such as Siemens S7 to general-purpose protocols like MQTT. Data formats vary as well, from binary to semi-structured data. Combined with differences across vendors and production lines, this makes standardization extremely difficult.
On top of that, cross-system and cross-factory data collaboration becomes particularly challenging. Data pipelines span devices, multiple systems, and geographically distributed factories. Network environments mix intranets, dedicated lines, and public networks. Meanwhile, business scenarios such as production scheduling and capacity calculation impose strict real-time data requirements, further increasing complexity.
At the same time, data visualization and traceability capabilities are insufficient. Traditional systems cannot intuitively display data flow nodes. Logs are stored in a scattered manner, making exception diagnosis inefficient. Building a complete traceability system requires significant additional manpower.
Finally, data collection quality lacks effective guarantees. Various exceptions related to networks and devices occur frequently, while anomaly detection is delayed and manual recovery is inefficient. When failures occur during multi-system interactions, root cause analysis often relies on deep familiarity with the entire data chain, further undermining data reliability.
Apache DolphinScheduler as the Solution
To address these challenges, Apache DolphinScheduler provides a comprehensive solution. As a distributed, highly scalable, and visual workflow scheduling platform, it has demonstrated strong capabilities in manufacturing scenarios.
Worker Node Grouping: Adapting to Multi-Network Environments
To cope with complex manufacturing network environments, Apache DolphinScheduler introduces flexible isolation strategies through Worker node grouping. Worker nodes are grouped by network environment—such as corporate production network Workers, factory internal network Workers, and international/domestic leased-line Workers—and further divided by business type, including PLC device data collection, production data processing, and quality data analysis.
This design enables task isolation across different network environments and business scenarios, ensuring data collection security and reliability. It effectively supports critical use cases such as production data lake ingestion, customer data feedback, and cross-network data synchronization.
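The isolation model described above can be sketched in a few lines. This is an illustrative simulation, not the actual DolphinScheduler API — the group names, worker hostnames, and the `dispatch` helper are all hypothetical; in the real platform, a task is simply assigned a worker group in its definition and the Master dispatches it only to workers registered under that group.

```python
# Minimal sketch of worker-group based task isolation. Group names and
# worker hostnames are illustrative placeholders.
WORKER_GROUPS = {
    "prod-network": ["worker-01", "worker-02"],      # corporate production network
    "factory-intranet": ["worker-11", "worker-12"],  # factory internal network
    "leased-line": ["worker-21"],                    # international/domestic leased lines
}

def dispatch(task_name: str, worker_group: str) -> str:
    """Pick a worker from the task's group; tasks never cross groups."""
    workers = WORKER_GROUPS.get(worker_group)
    if not workers:
        raise ValueError(f"unknown worker group: {worker_group}")
    # deterministic round-robin by task-name length keeps the sketch reproducible
    return workers[len(task_name) % len(workers)]

print(dispatch("plc_collect_line_3", "factory-intranet"))
```

Because a PLC-collection task can only ever land on a factory-intranet worker, a misconfigured job cannot accidentally reach out over a leased line.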
Data Collection
In terms of data collection, Apache DolphinScheduler builds a complete data processing pipeline.
The data source layer includes IoT devices (such as sensors, heartbeat data, status monitoring, and operational data), business systems (MES, WMS, APS, SAP, and other databases), agent probes, and user-uploaded data.
For processing, DataX is used for offline data synchronization, Flink for real-time stream processing, and Kafka as a message queue buffer. All data is eventually ingested into a unified data lake to support BI analysis and AI applications. Through Apache DolphinScheduler’s unified orchestration, enterprises can manage the entire lifecycle from data collection and processing to downstream applications.
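For the offline-synchronization leg, DataX jobs are described as JSON documents with a reader/writer pair per content entry. The sketch below shows the general shape of such a job; the hostnames, table, columns, and lake path are placeholders, not the enterprise's actual configuration.

```python
import json

# Sketch of a DataX-style offline sync job: one reader/writer pair plus
# a channel (parallelism) setting. Connection details are placeholders.
job = {
    "job": {
        "setting": {"speed": {"channel": 3}},
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "etl_user",
                    "password": "***",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://mes-db:3306/mes"],
                        "table": ["work_orders"],
                    }],
                    "column": ["order_id", "line_id", "qty", "updated_at"],
                },
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {"path": "/lake/ods/mes/work_orders", "fileType": "orc"},
            },
        }],
    }
}

print(json.dumps(job, indent=2))
```

A DolphinScheduler DataX task node then runs this job on a schedule, with the scheduler handling dependencies, retries, and alerting around it.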
Data Interaction
In traditional architectures, systems interact with each other in a point-to-point manner, resulting in highly complex relationships. After introducing Apache DolphinScheduler, all data interactions are centralized through a scheduling hub.
This enables centralized management of all data interaction tasks, visual monitoring of execution status, unified exception handling and alerting mechanisms, while also reducing coupling between systems and significantly improving reliability.
Template-Based Data Collection and Distribution Across Multiple Factories
For manufacturing enterprises operating multiple factories, Apache DolphinScheduler offers a template-based solution.
How can homogeneous systems—such as unified MES/WMS deployments or identical PLC device types—be rolled out rapidly?
By solidifying core workflows (task list reading, parameter injection, collection/distribution execution, completion or exception marking) into reusable templates, and combining them with task configuration tables (including data source settings, SQL statements, source/target system IDs, dedicated parameters, and checkpoint settings), enterprises achieve a flexible model of “common templates with customized parameters.”
This approach delivers significant advantages:
- Parameterized configuration allows core workflows to be fixed as templates, while factory-specific parameters (IP addresses, accounts, paths) are configured independently.
- Batch deployment capabilities enable dozens of factories to be deployed within a single day, dramatically improving efficiency.
- A unified iteration mechanism ensures that template updates are automatically synchronized to all factories without manual adjustments.
- Flexible extensibility supports template version management, allowing customized derivatives for specific factories (e.g., additional data fields).
- Cross-scenario support accommodates both “multi-factory data collection to headquarters” and “headquarters data distribution to multiple factories” (such as unified production plan delivery).
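The "common templates with customized parameters" model above can be sketched as a shared template plus a per-factory row from the task configuration table. All field names, hosts, and SQL here are illustrative assumptions, not the enterprise's actual schema:

```python
# Sketch of "common template, customized parameters": one reusable SQL
# template plus a per-factory config-table row. Values are placeholders.
TEMPLATE_SQL = "SELECT * FROM {table} WHERE updated_at >= '{checkpoint}'"

factory_configs = [
    {"factory": "SZ-01", "host": "10.1.0.5", "table": "mes_output", "checkpoint": "2024-01-01"},
    {"factory": "DG-02", "host": "10.2.0.5", "table": "mes_output", "checkpoint": "2024-01-01"},
]

def render_job(cfg: dict) -> dict:
    """Inject one factory's parameters into the shared template."""
    return {
        "factory": cfg["factory"],
        "source": cfg["host"],
        "sql": TEMPLATE_SQL.format(table=cfg["table"], checkpoint=cfg["checkpoint"]),
    }

jobs = [render_job(c) for c in factory_configs]
for j in jobs:
    print(j["factory"], "->", j["sql"])
```

Rolling out to a new factory then means adding one row to the configuration table, not writing a new workflow — which is what makes same-day deployment across dozens of sites feasible.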
A Qualitative Leap: From Manual Workshops to Industrial Assembly Lines
After introducing Apache DolphinScheduler, the enterprise achieved a qualitative transformation in data processing.
| Dimension | Traditional Coding | Apache DolphinScheduler |
|---|---|---|
| Development Efficiency | Database connections, exception handling, and repeated boilerplate must be hand-coded; high labor input | Drag-and-drop configuration with dozens of built-in task types; development completed in minutes |
| Dependency Management | Complex dependencies (e.g., lineage, priority) are difficult to manage and error-prone | Visual DAG interface makes dependency orchestration intuitive |
| Monitoring & Alerting | Manual monitoring and log review; delayed fault discovery | Automatic monitoring, real-time task status tracking, real-time alerts (DingTalk, WeChat, email, etc.) |
| Fault Recovery | Manual code modification; tedious rollback and recovery | One-click rerun/stop, built-in automatic retries |
| Resource Scheduling | No unified management; single-machine CPU/memory overload and wasted distributed resources | Distributed, decentralized resource management with visual, rapid dynamic scaling |
Traditional coding approaches require extensive development of data connections, exception handling, and retry logic, resulting in heavy manpower investment. In contrast, Apache DolphinScheduler offers drag-and-drop configuration and dozens of built-in plugins, enabling development to be completed in minutes.
Dependency management improvements are equally significant. Traditional methods struggle with complex cross-system scheduling and require careful handling of idempotency and consistency, making them error-prone. Apache DolphinScheduler’s visual DAG makes dependency orchestration intuitive and efficient.
Monitoring and alerting capabilities see the most dramatic improvement. Traditional approaches rely on custom scripts or manual log inspection, leading to delayed issue detection. Apache DolphinScheduler provides built-in monitoring with real-time task status and logs, and integrates with enterprise messaging tools such as WeChat, DingTalk, and email for alerts.
Fault tolerance and recovery are also enhanced. Traditional methods require manual code or script modifications and complex backfill logic, while Apache DolphinScheduler offers one-click reruns, stop controls, and built-in automatic retries.
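The contrast is concrete: under the traditional approach, every job carried hand-rolled retry logic like the sketch below, written and maintained per task, whereas DolphinScheduler exposes retry count and interval as task-level settings. The `flaky` task and backoff scheme here are illustrative:

```python
import time

def run_with_retry(task, max_retries: int = 3, backoff_s: float = 1.0):
    """Hand-rolled retry loop of the kind the scheduler's built-in
    retry setting replaces. `task` is any zero-argument callable."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

# Simulated task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retry(flaky, max_retries=3, backoff_s=0.01))  # → ok
```

Moving this logic into the platform means it is configured once, applied uniformly, and visible in the same UI as the task's status and logs.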
Resource scheduling capabilities are likewise strengthened. Traditional setups lack unified management, often leading to CPU or memory exhaustion and downtime. Apache DolphinScheduler adopts a distributed, decentralized cluster architecture, enabling rapid dynamic scaling through monitoring and fine-grained resource management.
These improvements generate tangible value at multiple levels.
At the development level, drag-and-drop workflows lower technical barriers, parameter automation boosts efficiency, pinpointing logs in seconds accelerates troubleshooting, and overall operations costs are significantly reduced.
At the business level, visual monitoring provides full transparency, multi-channel alerts ensure timely response, flexible backfill strategies handle diverse exceptions, and unified cross-system coordination reduces excessive reliance on developers.
At the decision-making level, depersonalization turns knowledge into organizational assets resilient to staff turnover. Comprehensive audit trails meet compliance requirements. Centralized database configuration reduces security risks. Transparent workflows enable optimization, while quantified resource usage supports data-driven decision-making. Together, these benefits form a solid foundation for digital transformation.
Practical Outcomes and Future Outlook
Through Apache DolphinScheduler, this intelligent manufacturing enterprise achieved significant improvements across multiple dimensions: higher development efficiency, shorter deployment cycles, sharply reduced operations costs and manpower investment, and a substantial increase in task success rates.
The platform supports rapid expansion, enabling new factories to be deployed within one day, while realizing standardized processes, transparent management, and data-driven decision-making.
Looking ahead, as intelligent manufacturing continues to evolve, data scheduling will play an increasingly critical role. As an open-source project, Apache DolphinScheduler will continue to evolve in several directions:
AI enablement for intelligent scheduling and predictive maintenance, deeper cloud-native integration for elastic scalability, and ecosystem expansion through richer plugin coverage for more business scenarios.
Conclusion
On the journey toward intelligent manufacturing, data scheduling is not the destination—it is the starting point. Apache DolphinScheduler helps enterprises solve the “last mile” of data processing, enabling them to focus more on business innovation and value creation.
Digital transformation is a long and challenging road, but steady progress leads to success. May more manufacturing enterprises harness the power of open source and achieve a remarkable transformation from “manufacturing” to “intelligent manufacturing.”