DEV Community

Chen Debra


How a Leading Manufacturing Enterprise in Shenzhen Deployed Apache DolphinScheduler Across Dozens of Factories in One Day

https://youtu.be/OKjCaqQgHoU

As the wave of digital transformation sweeps across the globe, intelligent manufacturing has become the core engine driving high-quality growth in the manufacturing industry.
However, on the path toward intelligence, enterprises are facing a wide range of challenges: data silos across multiple systems, complex scheduling dependencies, and delayed monitoring and alerting issues continue to emerge.

At a recent Apache DolphinScheduler online user meetup, the community invited Qiu Zhongbiao, a senior software engineer from a large intelligent manufacturing enterprise in Shenzhen.

During the session, he gave a detailed talk on the practical application of Apache DolphinScheduler in real manufacturing scenarios.

This article organizes the key content from that talk to explore how this enterprise achieved a qualitative leap in its scheduling platform with Apache DolphinScheduler.

About the Author
Qiu Zhongbiao is a senior software engineer at a large intelligent manufacturing enterprise in Shenzhen.

He focuses on data technology research and practice in the field of intelligent manufacturing.

He is dedicated to promoting the digital transformation of the manufacturing industry.

The Era of Intelligent Manufacturing

With the continuous advancement of Industry 4.0, intelligent manufacturing has become the focus of global competition in the manufacturing sector.

The maturity model of intelligent manufacturing is divided into multiple levels from low to high.

Enterprises need to progressively improve their capabilities in automation, digitalization, and networking, and ultimately achieve fully intelligent production.

In this process, data becomes a core production factor.

How to efficiently, stably, and reliably collect, process, and schedule this data has become a critical challenge faced by every manufacturing enterprise.

The data environment in modern manufacturing enterprises is becoming increasingly complex.
On one hand, enterprises operate a large number of business systems, including MES (Manufacturing Execution System), ERP (Enterprise Resource Planning), WMS (Warehouse Management System), WCS (Warehouse Control System), CRM (Customer Relationship Management), QMS (Quality Management System), PLM (Product Lifecycle Management), SCM (Supply Chain Management), and APS (Advanced Planning and Scheduling).

Data exchange between these systems is often implemented through hard-coded integrations.
This leads to highly complex inter-system relationships, high maintenance costs, poor scalability, and difficulty in troubleshooting.

On the other hand, enterprises also face complex network environments.

These include corporate production networks, factory internal networks, and international/domestic dedicated-line networks.

Different network environments have different requirements for data collection, transmission, and scheduling.

How to achieve unified management and task isolation under such conditions becomes a major challenge.

Challenges of Traditional Data Processing Approaches

In the process of promoting data-driven transformation in intelligent manufacturing, enterprises are facing pain points across multiple dimensions.

| Category | Details |
| --- | --- |
| Data Diversity | 1. Protocol complexity: the device layer uses proprietary protocols such as PLC/S7, the edge layer uses MQTT/CoAP, and the system layer uses REST/SOAP. 2. Data format heterogeneity: device data includes binary and hexadecimal formats, while database tables are often semi-structured formats such as JSON/XML. 3. Vendor differences: multiple vendors for robots and devices, with significant variations across production lines. |
| Cross-System / Cross-Factory Collaboration | 1. Complex data links: involving devices, gateways, local systems, MES, SAP, APS, WMS, and remote factories. 2. Mixed network environments: factory intranet, on-site servers, cross-factory dedicated lines, public network, and international network connections. 3. High real-time requirements: production scheduling, capacity planning, and other business functions demand strong timeliness. |
| Lack of Visualization & Traceability | 1. Invisible data pipelines: traditional systems cannot visually display data processing flows. 2. Disconnected logs: data transmission between systems relies on manual logging, making it difficult to store and track complete logs across all nodes. 3. Difficult traceability: tracking data flow across systems requires manual effort and high labor costs. |
| Unreliable Data Collection Quality | 1. Diverse anomalies: network failures, device errors, system exceptions, and duplicate data collection. 2. Delayed issue detection: multiple anomalies are often discovered only after they impact downstream systems, relying on manual intervention. 3. Difficult root cause analysis: multi-system interactions make it hard to locate faults, requiring full-chain understanding of data flows. |

First is the foundational barrier caused by data diversity.

Device protocols are highly diverse, covering proprietary protocols such as PLC/S7 as well as general protocols like MQTT.

Data formats include binary data and semi-structured data.

Combined with differences among vendors and production lines, this makes it extremely difficult to standardize data.

On top of that, cross-system and cross-factory data collaboration is particularly challenging.

Data links involve multiple stages, including devices, various systems, and geographically distributed factories.

Network environments are mixed, including intranets, dedicated lines, and the public internet.

At the same time, business scenarios such as production scheduling and capacity calculation have very high requirements for real-time data.

All of these factors further increase the complexity of collaboration.

Meanwhile, data visualization and traceability capabilities are insufficient.
Traditional systems cannot intuitively present data flow nodes.
Logs are stored in a scattered manner, leading to inefficient troubleshooting.
Building a complete traceability system also requires significant manual effort.

Finally, the quality of data collection lacks guarantees.

Various anomalies frequently occur due to networks and devices.

Detection of these anomalies is often delayed.

Manual recovery is inefficient.

In multi-system interactions, fault localization still relies heavily on familiarity with the entire data pipeline.

All of these issues further impact data reliability.

The Apache DolphinScheduler Solution

In response to the above challenges, Apache DolphinScheduler provides a comprehensive solution.
As a distributed, highly extensible, and visual workflow scheduling platform, it demonstrates strong capabilities in manufacturing scenarios.

Worker Node Grouping: A Solution for Complex Network Environments

In terms of Worker node grouping, Apache DolphinScheduler provides a flexible isolation strategy tailored to complex network environments in manufacturing enterprises.

Worker nodes can be grouped by network environments, such as corporate production network Workers, factory internal network Workers, and international/domestic dedicated-line Workers.
They can also be grouped by business types, such as PLC device data collection, production data processing, and quality data analysis.

This enables task isolation across different network environments and business scenarios.
It ensures the security and reliability of data collection.

This solution effectively supports key application scenarios such as production data lake ingestion, customer data feedback, and cross-network data synchronization.
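The grouping idea can be sketched as a simple routing rule: each task carries a network zone and a business type, and the scheduler dispatches it to the matching worker group. This is a minimal illustration; the group names and zone labels below are hypothetical, not the enterprise's actual configuration.

```python
# Hypothetical worker-group routing: map a task's network zone and
# business type to the worker group it should run on. In Apache
# DolphinScheduler this mapping is configured per task; the names
# here are illustrative placeholders.

WORKER_GROUPS = {
    ("production_net", "plc_collection"): "prod-plc-workers",
    ("factory_intranet", "production_data"): "factory-data-workers",
    ("dedicated_line", "cross_network_sync"): "dedicated-sync-workers",
}

def pick_worker_group(network_zone: str, business_type: str) -> str:
    """Return the worker group for a task, falling back to 'default'."""
    return WORKER_GROUPS.get((network_zone, business_type), "default")

print(pick_worker_group("production_net", "plc_collection"))  # prod-plc-workers
print(pick_worker_group("office_net", "reporting"))           # default
```

Because the mapping is data rather than code, adding a new network zone or business type means adding a row, not changing scheduling logic.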

Data Collection

In terms of data collection, Apache DolphinScheduler builds a complete data processing pipeline.

The data source layer includes IoT devices, such as device sensors, heartbeat data, status monitoring, and device operation data.
It also includes business systems such as MES, WMS, APS, and SAP databases.
In addition, it includes AGENT probes and user-uploaded data.

The processing layer uses DataX for offline data synchronization.
It uses Flink for real-time stream processing.
Kafka is used as a message queue buffer.

Finally, data is unified into a data lake.
This supports BI analysis and AI applications.

Through unified scheduling with Apache DolphinScheduler, enterprises can achieve end-to-end management from data collection to processing to application.
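For the offline-sync leg of this pipeline, a DataX task is typically described by a JSON job file pairing a reader with a writer. The sketch below builds such a job description in Python; the connection details, table, and path are hypothetical placeholders, not the enterprise's actual setup.

```python
# Illustrative DataX job: pull rows from a factory MES database into a
# data-lake staging path. All hosts, credentials, tables, and columns
# are hypothetical; a real job would also declare writer columns and
# file naming per the DataX plugin documentation.
import json

datax_job = {
    "job": {
        "setting": {"speed": {"channel": 3}},  # parallel transfer channels
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "mes_reader",
                    "password": "***",
                    "column": ["work_order_id", "line_id", "qty", "updated_at"],
                    "connection": [{
                        "table": ["mes_work_order"],
                        "jdbcUrl": ["jdbc:mysql://mes-db:3306/mes"],
                    }],
                },
            },
            "writer": {
                "name": "hdfswriter",
                "parameter": {
                    "defaultFS": "hdfs://datalake:8020",
                    "path": "/staging/mes/work_order",
                    "fileType": "orc",
                    "writeMode": "append",
                },
            },
        }],
    }
}

print(json.dumps(datax_job, indent=2))
```

A DolphinScheduler task then simply invokes DataX with this job file, so the scheduler owns scheduling, retry, and alerting while DataX owns the data movement.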

Data Interaction

In the traditional model, systems interact with each other in a point-to-point manner.

This leads to highly complex relationships between systems.

After introducing Apache DolphinScheduler, all data interactions are unified through the scheduling center.

This enables centralized management of all data interaction tasks.
It allows visual monitoring of task execution status.
It provides unified exception handling and alerting mechanisms.

At the same time, it reduces coupling between systems.
It improves the reliability of data interactions.

Template-Based Data Collection and Distribution Across Multiple Factories

For manufacturing enterprises with multiple factories, Apache DolphinScheduler provides a template-based solution.

For homogeneous systems, such as unified MES/WMS systems or the same types of PLC devices, how can rapid deployment be achieved?

The approach is to solidify core processes into reusable templates.
These processes include reading task lists, parameter injection, execution of data collection or distribution, and completion or exception marking.

At the same time, task configuration tables are introduced.
These include data source configurations, SQL statements, system IDs for distribution or collection, custom parameters, and checkpoint settings.

This enables a flexible model of “template standardization + parameter customization.”

This template-based solution brings several significant advantages.
First, parameterized configuration allows the core process to be standardized as a template, while factory-specific parameters such as IP addresses, accounts, and paths are configured separately.
Second, batch deployment capability allows enterprises to complete deployment across dozens of factories within one day, greatly improving efficiency.
Third, a unified iteration mechanism ensures that when templates are updated, all factories are automatically synchronized without the need for manual adjustments.
Fourth, flexible extensibility supports template version management, allowing customized templates to be derived for different factories based on a base template.
For example, some factories may require additional data fields.
Fifth, cross-scenario support enables both “multi-factory data collection to headquarters” and “headquarters data distribution to multiple factories,” such as unified production plan distribution.
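The "template standardization + parameter customization" model above can be sketched as follows: one shared collection template, plus a configuration table holding only the values that differ per factory (hosts, checkpoints, and so on). The table contents and SQL here are hypothetical; in practice the configurations would live in a database and each rendered task would be handed to the scheduler.

```python
# Minimal sketch of "template standardization + parameter customization".
# One solidified core template is expanded into a concrete task per
# factory by injecting that factory's parameters. All names, hosts,
# and SQL are illustrative placeholders.
from string import Template

# The reusable core process, with placeholders for injected parameters.
SQL_TEMPLATE = Template(
    "SELECT * FROM $source_table WHERE updated_at > '$checkpoint'"
)

# Task configuration table: only factory-specific values differ.
FACTORY_CONFIGS = [
    {"factory": "SZ01", "host": "10.1.0.5", "source_table": "mes_output",
     "checkpoint": "2024-01-01 00:00:00"},
    {"factory": "DG02", "host": "10.2.0.5", "source_table": "mes_output",
     "checkpoint": "2024-01-01 00:00:00"},
]

def render_tasks(configs):
    """Expand the shared template into one concrete task per factory."""
    return [
        {
            "factory": cfg["factory"],
            "jdbc_host": cfg["host"],
            "sql": SQL_TEMPLATE.substitute(cfg),  # parameter injection
        }
        for cfg in configs
    ]

for task in render_tasks(FACTORY_CONFIGS):
    print(task["factory"], task["sql"])
```

Deploying to a new factory then means appending one configuration row, and updating the template propagates to every factory on the next render, which is what makes one-day rollout across dozens of sites feasible.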

A Qualitative Leap: From Manual Workshop to Industrial Pipeline

After introducing Apache DolphinScheduler, the enterprise achieved a qualitative leap in data processing.

| Dimension | Traditional Coding | Apache DolphinScheduler |
| --- | --- | --- |
| Development Efficiency | Requires writing data processing logic, exception handling, retry logic, etc.; high human effort | Drag-and-drop configuration, built-in components and plugins, development completed in minutes |
| Dependency Management | Difficult to handle complex task dependencies; prone to issues such as missing or inconsistent dependencies | Visual DAG-based workflow orchestration |
| Monitoring & Alerting | Requires custom development of monitoring or logging, leading to lagging issue detection | Built-in monitoring, real-time task execution status, logs, and alert notifications |
| Fault Tolerance & Retry | Requires manual modification of code/scripts; complex recovery process | One-click retry/stop; built-in fault-tolerant retry mechanisms |
| Resource Scheduling | Lacks unified management; prone to CPU/memory contention and uneven resource allocation | Distributed, centralized resource management; dynamic scaling via integration with compute engines |

In the traditional approach, developers needed to write code for data connections, exception handling, and retry logic modules.
This required significant human effort.

In contrast, Apache DolphinScheduler uses a drag-and-drop configuration approach.
It comes with numerous built-in plugins.
Development tasks can be completed within minutes.

In terms of dependency management, traditional approaches struggle to handle complex cross-system scheduling.

Issues such as idempotency and consistency must be considered.
This makes the process error-prone.

In contrast, Apache DolphinScheduler provides intuitive and convenient visual DAG operations.

The improvement in monitoring and alerting capabilities is particularly significant.

Traditional approaches require developers to write monitoring scripts or manually check logs.

This leads to delayed fault detection and resolution.

Apache DolphinScheduler comes with built-in monitoring capabilities.

It supports real-time viewing of task execution status and logs.

It can also integrate with multiple alerting channels such as WeCom, DingTalk, and email.

In terms of fault tolerance and recovery, traditional approaches require manual modification of code and scripts.

Data recovery logic is complex.

Apache DolphinScheduler provides one-click rerun and stop functions.

It also includes built-in automatic retry mechanisms for failures.
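To make the contrast concrete, this is the kind of retry wrapper every traditional script previously had to hand-roll, and which the platform's built-in failure retry now replaces. The retry count and interval policy below are illustrative, not DolphinScheduler's internal implementation.

```python
# A hand-rolled retry wrapper of the kind traditional scripts needed,
# now provided out of the box by the scheduler. Retry policy values
# here are illustrative.
import time

def run_with_retry(task, max_retries=3, interval_sec=1):
    """Run `task`, retrying up to max_retries times on any exception."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure
            time.sleep(interval_sec)

# Example: a task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retry(flaky, max_retries=3, interval_sec=0))  # ok
```

With the scheduler handling this uniformly, per-script retry code (and its inconsistencies) disappears, and failed instances can also be rerun with one click from the UI.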

Resource scheduling capabilities are also greatly improved.

Traditional approaches lack unified resource management.

This often leads to CPU and memory overload on single machines, causing crashes.
Distributed approaches also consume significant resources.

Apache DolphinScheduler adopts a distributed and decentralized cluster management architecture.

It supports rapid dynamic scaling through monitoring.
It enables fine-grained resource management.

These improvements bring real value at multiple levels.

| Development | Business | Decision Layer |
| --- | --- | --- |
| 1. Drag-and-drop development | 1. Visualized monitoring | 1. De-personalization (processes not dependent on individuals) |
| 2. Automated parameterization | 2. Alert assurance | 2. Operation auditing |
| 3. Log-based issue localization | 3. Flexible parameters | 3. Data security (centralized data configuration) |
| 4. Low O&M cost | 4. Cross-system orchestration | 4. Elimination of black-box operations |
| 5. Reduced development dependencies | 5. Resource utilization & measurability | |

At the development level, drag-and-drop workflows lower the technical barrier.

Parameter automation improves development efficiency.

Second-level log tracing shortens troubleshooting time.

Operational costs are significantly reduced.

At the business level, visual monitoring provides a clear view of task status.

Multi-channel alerting ensures timely response to issues.

Flexible data recovery strategies handle various anomalies.

Cross-system coordination enables unified data flow management.

Dependence on individual developers is reduced.

At the decision-making level, knowledge is no longer tied to individuals.

It becomes an organizational asset.

Complete audit logs meet compliance requirements.

Centralized database configuration reduces security risks.

Transparent workflows make management and optimization easier.

Quantified resource usage supports refined decision-making.

These values together form a solid foundation for enterprise digital transformation.

Results and Future Outlook

Through the practical application of Apache DolphinScheduler, this intelligent manufacturing enterprise has achieved significant improvements across multiple dimensions.

These include improved development efficiency, shortened deployment cycles, significantly reduced operational costs and manpower, and greatly increased task success rates.

At the same time, the system supports rapid scaling.

New factories can be deployed within one day.

This enables standardized processes, transparent management, and data-driven decision-making.

Looking ahead, as intelligent manufacturing continues to advance, data scheduling will play an increasingly important role.

As an open-source project, Apache DolphinScheduler will continue to evolve in multiple directions.

In terms of AI enablement, it will introduce AI capabilities to achieve intelligent scheduling and predictive maintenance.

In terms of cloud-native architecture, it will deeply adapt to cloud-native environments to improve elasticity and scalability.

In terms of ecosystem expansion, it will enrich the plugin ecosystem to cover more business scenarios.

Conclusion

In the journey of intelligent manufacturing, data scheduling is not the destination, but the starting point.

Apache DolphinScheduler helps enterprises solve the “last mile” problem of data processing.
It allows enterprises to focus more on business innovation and value creation.

The road to digital transformation is long and challenging.
But with persistence, progress will be made.

May more manufacturing enterprises leverage the power of open source to achieve a transformation from “manufacturing” to “intelligent manufacturing.”
