RoTSL

Posted on Mar 12 • Originally published at Medium on Feb 23

Designing a Fully Automated Machine Learning System for Weather-Based Edge Control

#machinelearningops #edgecomputing #weatherforecasts #cyberphysicalsystems

This article presents the design, implementation, and deployment of an end-to-end machine learning system for short-term rainfall prediction and automated mechanical actuation. The system integrates cloud-based data ingestion, continuous model retraining, performance monitoring, static dashboard publishing, and edge deployment on embedded hardware. A six-hour rainfall forecasting model is used to trigger automated shutter control approximately ten minutes prior to predicted precipitation events. Emphasis is placed on reproducibility, security, resource efficiency, and long-term operational stability.

Source code: https://github.com/rotsl/weather-ml (hybrid cloud-edge ML system for predictive rain control with automated retraining, monitoring, and Raspberry Pi hardware actuation).

1. Introduction

Short-term weather forecasting remains a critical component in environmental monitoring, agriculture, and building automation systems. While many predictive models exist, practical deployment often suffers from limited automation, insufficient monitoring, and weak integration with physical systems.

This work aims to address these challenges by developing a self-maintaining machine learning pipeline that continuously adapts to new data and operates reliably on low-power edge hardware. The system demonstrates how modern software engineering practices can be combined with classical machine learning methods to enable autonomous environmental control.

2. System Objectives

The primary objectives of the system are:

• Continuous acquisition of high-resolution weather data

• Periodic retraining of predictive models

• Quantitative monitoring of model performance

• Secure publication of system status

• Autonomous deployment to embedded devices

• Real-time actuation based on predictions

Additional constraints include strict API quota management, isolation of credentials, and offline-capable inference.

3. Data Acquisition and Preprocessing

3.1 Data Source

Hourly meteorological observations are obtained via the Visual Crossing Timeline API. Retrieved variables include temperature, humidity, pressure, cloud cover, wind metrics, precipitation, and solar radiation.

To ensure quota compliance, data collection is limited to a rolling window of recent observations and scheduled at fixed intervals.

3.2 Data Cleaning

Raw observations are merged with historical records and processed using:

• Temporal deduplication

• Hourly resampling

• Linear interpolation

• Forward/backward filling

• Outlier handling

This preprocessing yields a gap-free hourly time series suitable for feature extraction.
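The cleaning steps can be sketched in plain Python for a single variable. This is a minimal illustration, not the repository's code: the function name `clean_hourly` and the (timestamp, value) input format are assumptions, and outlier handling is left out because the article does not say which rule is used.

```python
from datetime import datetime, timedelta


def clean_hourly(records):
    """Clean (datetime, value-or-None) pairs into a gap-free hourly series.

    Hypothetical helper; input may be unordered and contain duplicates.
    """
    # Temporal deduplication: a later record for the same hour wins.
    by_hour = {}
    for ts, value in records:
        by_hour[ts.replace(minute=0, second=0, microsecond=0)] = value
    if not by_hour:
        return []

    # Hourly resampling: build a regular grid from first to last hour.
    start, end = min(by_hour), max(by_hour)
    n = int((end - start).total_seconds() // 3600) + 1
    grid = [start + timedelta(hours=i) for i in range(n)]
    values = [by_hour.get(ts) for ts in grid]

    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(zip(grid, values))

    # Linear interpolation between neighboring known points.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            values[i] = values[left] + step * (i - left)

    # Backward fill before the first known value, forward fill after the last.
    for i in range(known[0]):
        values[i] = values[known[0]]
    for i in range(known[-1] + 1, n):
        values[i] = values[known[-1]]

    return list(zip(grid, values))
```

Interpolation handles interior gaps, while the fill steps only touch missing values at the edges of the series, where there is no second point to interpolate toward.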

3.3 Feature Engineering

Three precipitation-derived features are constructed:

• Binary rainfall indicator (rain_1h)

• Rolling 6-hour precipitation sum

• Rolling 24-hour precipitation sum

These features encode short- and medium-term rainfall persistence.
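A minimal sketch of these three features, assuming a list of hourly precipitation amounts ordered oldest to newest. Only `rain_1h` is named in the article; the keys `precip_6h` and `precip_24h`, the function name, and the "any precipitation above zero counts as rain" rule are assumptions made for illustration.

```python
def precipitation_features(precip):
    """Return one feature dict per hour from hourly precipitation amounts."""
    features = []
    for t in range(len(precip)):
        features.append({
            # Binary indicator: did it rain during this hour? (assumed rule: > 0)
            "rain_1h": 1 if precip[t] > 0 else 0,
            # Trailing sums include the current hour; early rows use a shorter window.
            "precip_6h": sum(precip[max(0, t - 5): t + 1]),
            "precip_24h": sum(precip[max(0, t - 23): t + 1]),
        })
    return features
```

The windows look only backward in time, so no future information leaks into the features.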

4. Predictive Modeling

4.1 Model Selection

The system employs a HistGradientBoostingClassifier due to its:

• Robustness to missing values

• High performance on tabular datasets

• Support for early stopping

• Computational efficiency

This approach avoids the overhead associated with deep learning models while maintaining strong predictive performance.

4.2 Target Construction

For a forecast horizon h, the target variable is defined as:

$$y_t = \max_{i \in \{1, \dots, h\}} \mathrm{rain}_{t+i}$$

This formulation predicts whether rainfall occurs at any time within the future horizon.
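As a small sketch of this definition, the function below builds the target from a list of binary hourly rain indicators. The function name and the choice to drop the last h rows (whose future window is incomplete) are assumptions; the default of 6 follows the six-hour horizon mentioned in the abstract.

```python
def build_targets(rain, horizon=6):
    """y_t = max(rain[t+1], ..., rain[t+horizon]) for every t with a full window."""
    n = len(rain)
    # The final `horizon` rows have no complete future window and are dropped.
    return [max(rain[t + 1: t + 1 + horizon]) for t in range(n - horizon)]
```

The window starts at t+1, so the current hour's rain is a feature and never part of its own label.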

4.3 Training Strategy

Data is split chronologically (80/20) to preserve temporal ordering. Early stopping is applied based on internal validation loss to prevent overfitting.
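A chronological split amounts to cutting the time-ordered rows at a fixed position instead of shuffling. A minimal sketch, assuming the rows are already sorted oldest to newest (the function name is hypothetical):

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows so the test set is strictly later than the training set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

Every test row comes after every training row, which mirrors how the model is used in operation: trained on the past, applied to the future.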

Performance is evaluated using:

• Receiver Operating Characteristic Area Under Curve (ROC-AUC)

• Precision-Recall Area Under Curve (PR-AUC)

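Both metrics are threshold-independent. PR-AUC is the more informative of the two here, because rainfall events are the minority class and ROC-AUC alone can look optimistic under class imbalance.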
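To make the two metrics concrete, here is a dependency-free sketch of both. In practice a library implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would normally be used; the versions below are simplified for illustration, and the average-precision function assumes there are no tied scores.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values measured at each positive, ranked by score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive samples")
    return total / hits
```

Average precision is one common way to summarize the precision-recall curve as a single number.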

5. Automated Model Lifecycle Management

5.1 Continuous Retraining

Model retraining is orchestrated via GitHub Actions and executed every 48 hours. Each retraining cycle performs:

• Data update

• Feature regeneration

• Model fitting

• Performance evaluation

• Artifact rotation

5.2 Model Versioning

Three tiers of artifacts are maintained:

• Current model

• Previous model

• Timestamped snapshots

This enables rapid rollback and longitudinal analysis.
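One way to implement this three-tier rotation is sketched below. The file names (`model_current.pkl`, `model_previous.pkl`, the `snapshots` directory) and the function name are assumptions for illustration; the repository may use a different layout.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def rotate_artifacts(new_model: Path, model_dir: Path, now=None) -> Path:
    """Promote a freshly trained model and keep the previous one for rollback."""
    now = now or datetime.now(timezone.utc)
    snapshots = model_dir / "snapshots"
    snapshots.mkdir(parents=True, exist_ok=True)

    current = model_dir / "model_current.pkl"    # hypothetical file name
    previous = model_dir / "model_previous.pkl"  # hypothetical file name

    # Demote the existing current model before overwriting it.
    if current.exists():
        shutil.copy2(current, previous)
    shutil.copy2(new_model, current)

    # Timestamped copy for longitudinal analysis.
    snapshot = snapshots / f"model_{now:%Y%m%dT%H%M%SZ}.pkl"
    shutil.copy2(new_model, snapshot)
    return snapshot
```

Rolling back then amounts to copying `model_previous.pkl` over `model_current.pkl`.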

5.3 Metrics Archiving

Performance metrics are appended to a persistent history file. These records support trend analysis and drift detection.
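An append-only history file can be as simple as one JSON object per line. The sketch below assumes that format; the file layout, function names, and record fields are illustrative and not taken from the repository.

```python
import json
from pathlib import Path


def append_metrics(history_path: Path, record: dict) -> None:
    """Append one retraining run as a single JSON line."""
    with history_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def load_history(history_path: Path) -> list:
    """Read all recorded runs, oldest first; a missing file means no history yet."""
    if not history_path.exists():
        return []
    with history_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each run only appends a line, earlier records are never rewritten, which keeps the history easy to diff in version control.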

6. Monitoring and Visualization

6.1 Static Dashboard

A static HTML dashboard is generated during each retraining cycle and published using GitHub Pages. It displays:

• Model metadata

• Performance trends

• Dataset statistics

• Health indicators

No client-side API calls are performed, ensuring security and cost stability.
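A static page of this kind can be rendered at build time from values the pipeline already holds. The sketch below is an assumption about how such a generator might look: the function name and the dictionary keys (`name`, `n_rows`, `trained_at`, `roc_auc`, `pr_auc`) are hypothetical.

```python
from html import escape


def render_dashboard(model_info: dict, history: list) -> str:
    """Render a self-contained HTML page with no scripts or external requests."""
    rows = "\n".join(
        "<tr><td>{}</td><td>{:.3f}</td><td>{:.3f}</td></tr>".format(
            escape(str(run["trained_at"])), run["roc_auc"], run["pr_auc"]
        )
        for run in history
    )
    name = escape(str(model_info["name"]))
    return (
        '<!doctype html>\n<html><head><meta charset="utf-8">'
        f"<title>{name}</title></head><body>\n"
        f"<h1>{name}</h1>\n"
        f"<p>Training rows: {int(model_info['n_rows'])}</p>\n"
        "<table><tr><th>Trained at</th><th>ROC-AUC</th><th>PR-AUC</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )
```

Since every value is baked into the HTML during the CI run, the published page needs no API key and makes no requests when viewed.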

Weather ML Dashboard: https://rotsl.github.io/weather-ml/

6.2 README-Based Reporting

Key indicators are embedded directly in the repository README using automated scripts. This provides immediate visibility without external tooling.

6.3 Degradation Detection

Performance drops exceeding predefined thresholds trigger automated warnings. These appear consistently across all reporting interfaces.
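A simple version of such a check compares the latest run against the mean of the runs before it. The threshold of 0.05, the five-run window, and the record format are assumptions chosen for illustration; the article does not state the actual values.

```python
def detect_degradation(history, metric="roc_auc", max_drop=0.05, window=5):
    """Return a warning string if the latest run fell too far below the recent mean."""
    if len(history) < 2:
        return None  # nothing to compare against yet
    latest = history[-1][metric]
    baseline_runs = history[-(window + 1):-1]
    baseline = sum(run[metric] for run in baseline_runs) / len(baseline_runs)
    drop = baseline - latest
    if drop > max_drop:
        return f"WARNING: {metric} dropped {drop:.3f} below recent mean {baseline:.3f}"
    return None
```

Returning the message as a plain string lets the same text be reused by each reporting interface.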

7. Security and Credential Management

The system enforces strict separation of concerns:

• API credentials stored exclusively in CI secrets

• No client-side data requests

• No credentials on edge devices

• No hard-coded locations

This design prevents credential leakage and unauthorized usage.
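In the CI job, the credential would typically reach the ingestion script as an environment variable populated from a repository secret. A minimal sketch, assuming a hypothetical variable name `WEATHER_API_KEY`:

```python
import os


def load_api_key(name="WEATHER_API_KEY", env=None):
    """Read the API key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; configure it as a CI secret")
    return key
```

Failing early when the variable is absent avoids sending unauthenticated requests, and the key never has to appear in the repository or on the edge device.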

8. Edge Deployment Architecture

8.1 Hardware Platform

The control system operates on a Raspberry Pi connected to:

• Servo motor or relay module

• Manual override button

• Power management circuitry

8.2 Edge Inference

The device periodically retrieves the latest model artifacts and runs inference locally. Network access is needed only to fetch updated artifacts; predictions themselves are generated without it.
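One way to combine periodic updates with offline operation is to keep the last successfully downloaded model on disk and fall back to it when a download fails. The sketch below assumes a caller-supplied `fetch` function that returns the model bytes; both that callable and the function name are hypothetical.

```python
from pathlib import Path


def refresh_model(fetch, cache_path: Path) -> bool:
    """Try to update the cached model; return True if a usable copy exists afterwards."""
    try:
        payload = fetch()  # hypothetical callable returning the model as bytes
    except OSError:
        # Offline or server unreachable: keep whatever was cached earlier.
        return cache_path.exists()
    # Write to a temporary file first so a crash never leaves a half-written model.
    tmp = cache_path.with_suffix(".part")
    tmp.write_bytes(payload)
    tmp.replace(cache_path)
    return True
```

Network errors raised by Python's standard library, such as `ConnectionError` and `urllib.error.URLError`, are subclasses of `OSError`, so a single `except` clause covers them.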

8.3 Control Logic

The actuation policy incorporates hysteresis and state persistence:

| Condition | Action |
| --- | --- |
| Probability ≥ threshold | Close shutters |
| Probability ≤ safe margin | Open shutters |
| Manual override | Immediate release |

State information is persisted to enable recovery after power loss.
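The policy can be sketched as a small state machine with two thresholds. The threshold values, the state file format, and the function names below are assumptions; "immediate release" is interpreted here as opening the shutters, which may differ from the actual device behavior.

```python
import json
from pathlib import Path

CLOSE_THRESHOLD = 0.6  # hypothetical: close at or above this rain probability
OPEN_THRESHOLD = 0.3   # hypothetical safe margin: reopen at or below this


def next_state(current: str, probability: float, override: bool = False) -> str:
    """Return the new shutter state ("open" or "closed")."""
    if override:
        return "open"  # assumed meaning of "immediate release"
    if probability >= CLOSE_THRESHOLD:
        return "closed"
    if probability <= OPEN_THRESHOLD:
        return "open"
    return current  # inside the hysteresis band: hold the current state


def load_state(path: Path, default: str = "open") -> str:
    """Restore the last known state, e.g. after a power loss."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))["shutter"]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return default


def save_state(path: Path, state: str) -> None:
    """Persist the state via a temporary file and rename to avoid partial writes."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"shutter": state}), encoding="utf-8")
    tmp.replace(path)
```

The gap between the two thresholds is what prevents the shutters from chattering when the predicted probability hovers around a single cut-off.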

9. Software Packaging

To facilitate reuse, the inference subsystem is distributed as an npm package.

The package is available at https://www.npmjs.com/package/weather-ml-edge

The package provides:

• Secure model loading

• Prediction interfaces

• Configuration management

No network access or credential handling is included.

10. System Integration

The complete operational pipeline is summarized as:

Data API → CI Pipeline → Training → Validation → Deployment → Edge Inference → Actuation

This closed-loop architecture ensures long-term autonomy.

11. Evaluation and Results

Over extended operation, the system demonstrated:

• Stable ROC-AUC > 0.90

• Consistent PR-AUC > 0.80

• Low false positive rates

• Robust offline performance

Automated shutter actuation reliably preceded rainfall events in most observed cases.

12. Lessons Learned

12.1 Importance of Automation

Sustained ML systems require continuous retraining, validation, and deployment. Manual pipelines are not scalable.

12.2 Engineering Over Algorithms

System reliability was primarily determined by pipeline design rather than model complexity.

12.3 Security by Architecture

Credential isolation must be embedded at the system level, not added retroactively.

12.4 Edge Intelligence

Local inference enables resilient operation independent of cloud availability.

13. Future Work

Planned extensions include:

• Integration of physical rain sensors

• Multi-location modeling

• Adaptive thresholding

• Energy-aware scheduling

• Seasonal ensemble models

• Mobile notification interfaces

14. Conclusion

This work demonstrates that robust, production-grade machine learning systems can be constructed using lightweight tools and disciplined engineering practices. By combining automated retraining, secure deployment, and edge-based inference, the system achieves long-term autonomy in a resource-constrained environment.

The presented architecture is applicable to a wide range of cyber-physical systems requiring predictive control under uncertainty.

Acknowledgements

The author acknowledges the open-source community and the maintainers of Python, scikit-learn, GitHub Actions, and Visual Crossing for enabling this work.

References
[1] rotsl, "Weather-ML: Automated Rain Forecasting and Edge Control System," GitHub repository. Available: https://github.com/rotsl/weather-ml. 
[2] rotsl, "Weather-ML Public Monitoring Dashboard," GitHub Pages. Available: https://rotsl.github.io/weather-ml/. 
[3] Visual Crossing Corporation, "Weather API - Timeline Endpoint," Visual Crossing Weather. Available: https://www.visualcrossing.com/weather-api. 
[4] Visual Crossing Corporation, "Weather API Documentation," Visual Crossing Resources. Available: https://www.visualcrossing.com/resources/documentation/weather-api.
[5] GitHub, "GitHub Actions Documentation," GitHub Docs. Available: https://docs.github.com/en/actions.
[6] rotsl, "NPM Package," NPM. Available: https://www.npmjs.com/package/weather-ml-edge.
