This article presents the design, implementation, and deployment of an end-to-end machine learning system for short-term rainfall prediction and automated mechanical actuation. The system integrates cloud-based data ingestion, continuous model retraining, performance monitoring, static dashboard publishing, and edge deployment on embedded hardware. A six-hour rainfall forecasting model is used to trigger automated shutter control approximately ten minutes prior to predicted precipitation events. Emphasis is placed on reproducibility, security, resource efficiency, and long-term operational stability.
1. Introduction
Short-term weather forecasting remains a critical component in environmental monitoring, agriculture, and building automation systems. While many predictive models exist, practical deployment often suffers from limited automation, insufficient monitoring, and weak integration with physical systems.
This work aims to address these challenges by developing a self-maintaining machine learning pipeline that continuously adapts to new data and operates reliably on low-power edge hardware. The system demonstrates how modern software engineering practices can be combined with classical machine learning methods to enable autonomous environmental control.
2. System Objectives
The primary objectives of the system are:
• Continuous acquisition of high-resolution weather data
• Periodic retraining of predictive models
• Quantitative monitoring of model performance
• Secure publication of system status
• Autonomous deployment to embedded devices
• Real-time actuation based on predictions
Additional constraints include strict API quota management, isolation of credentials, and offline-capable inference.
3. Data Acquisition and Preprocessing
3.1 Data Source
Hourly meteorological observations are obtained via the Visual Crossing Timeline API. Retrieved variables include temperature, humidity, pressure, cloud cover, wind metrics, precipitation, and solar radiation.
To ensure quota compliance, data collection is limited to a rolling window of recent observations and scheduled at fixed intervals.
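A minimal sketch of how a bounded request window might be built is shown below. The endpoint path and query parameters reflect the publicly documented Timeline API as best understood here and should be checked against the current documentation; the three-day window, the helper names, and the location value are illustrative assumptions, not the project's actual settings.

```python
from datetime import date, timedelta
from urllib.parse import quote, urlencode

# Assumed base path of the Visual Crossing Timeline endpoint; verify before use.
BASE_URL = (
    "https://weather.visualcrossing.com/VisualCrossingWebServices"
    "/rest/services/timeline"
)


def rolling_window(today: date, days_back: int = 3) -> tuple[date, date]:
    """Return (start, end) dates of a short rolling window ending today."""
    return today - timedelta(days=days_back), today


def build_timeline_url(location: str, start: date, end: date, api_key: str) -> str:
    """Build a request URL for hourly observations within [start, end]."""
    path = f"{BASE_URL}/{quote(location)}/{start.isoformat()}/{end.isoformat()}"
    query = urlencode(
        {
            "unitGroup": "metric",   # metric units
            "include": "hours",      # hourly records only
            "key": api_key,          # supplied by the CI environment, never hard-coded
            "contentType": "json",
        }
    )
    return f"{path}?{query}"
```

Keeping the window short bounds the number of records requested per run, which is what keeps a fixed schedule within quota.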
3.2 Data Cleaning
Raw observations are merged with historical records and processed using:
• Temporal deduplication
• Hourly resampling
• Linear interpolation
• Forward/backward filling
• Outlier handling
This preprocessing yields a gap-free hourly time series suitable for feature extraction.
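The cleaning steps listed above can be sketched with pandas as follows. This is an illustration under assumptions rather than the project's code: the column names, the plausible-range table used for outlier handling, and the choice to let newer observations win over older duplicates are all hypothetical.

```python
import pandas as pd

# Hypothetical physical bounds used for outlier handling (None means unbounded).
PLAUSIBLE_RANGES = {"humidity": (0.0, 100.0), "precip": (0.0, None)}


def clean_hourly(history: pd.DataFrame, raw: pd.DataFrame) -> pd.DataFrame:
    """Merge new observations into history and return a gap-free hourly frame.

    Both frames are assumed to be indexed by timestamp (DatetimeIndex).
    """
    merged = pd.concat([history, raw])
    # Temporal deduplication: rows from `raw` come last, so the newest copy wins.
    merged = merged[~merged.index.duplicated(keep="last")].sort_index().copy()

    # Outlier handling: clip values to plausible physical ranges.
    for col, (low, high) in PLAUSIBLE_RANGES.items():
        if col in merged.columns:
            merged[col] = merged[col].clip(lower=low, upper=high)

    # Hourly resampling: missing hours appear as NaN rows.
    hourly = merged.resample(pd.Timedelta(hours=1)).mean()

    # Linear interpolation for interior gaps, then forward/backward fill for edges.
    return hourly.interpolate(method="linear").ffill().bfill()
```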
3.3 Feature Engineering
Three precipitation-derived features are constructed:
• Binary rainfall indicator (rain_1h)
• Rolling 6-hour precipitation sum
• Rolling 24-hour precipitation sum
These features encode short- and medium-term rainfall persistence.
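As a rough sketch, the three features could be derived from an hourly precipitation column like this. Only rain_1h is named in the text; the column names precip, precip_sum_6h, and precip_sum_24h are hypothetical, and the row-based windows assume one row per hour.

```python
import pandas as pd


def add_rain_features(hourly: pd.DataFrame, precip_col: str = "precip") -> pd.DataFrame:
    """Add the binary rain indicator and rolling precipitation sums."""
    out = hourly.copy()
    # 1 if any precipitation was recorded in the hour, else 0.
    out["rain_1h"] = (out[precip_col] > 0).astype(int)
    # Rolling sums over the current hour and the preceding 5 / 23 hours.
    # min_periods=1 avoids NaN at the start of the series.
    out["precip_sum_6h"] = out[precip_col].rolling(6, min_periods=1).sum()
    out["precip_sum_24h"] = out[precip_col].rolling(24, min_periods=1).sum()
    return out
```

Because the windows end at the current hour, the features use only past and present values and do not leak future rainfall into the inputs.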
4. Predictive Modeling
4.1 Model Selection
The system employs a HistGradientBoostingClassifier due to its:
• Robustness to missing values
• High performance on tabular datasets
• Support for early stopping
• Computational efficiency
This approach avoids the overhead associated with deep learning models while maintaining strong predictive performance.
4.2 Target Construction
For a forecast horizon h, the target variable is defined as:
y_t = \max_{1 \le i \le h} \mathrm{rain}_{t+i}
Here rain_{t+i} is the binary rainfall indicator (rain_1h) at hour t+i, so y_t = 1 if rainfall occurs in any hour of the forecast horizon and y_t = 0 otherwise.
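One way to compute this target from the hourly indicator is sketched below. The function name and the default horizon of six hours (taken from the six-hour model mentioned in the abstract) are assumptions; the row shifts assume a regular hourly index.

```python
import pandas as pd


def make_target(rain_1h: pd.Series, horizon: int = 6) -> pd.Series:
    """Return y_t = max(rain_{t+1}, ..., rain_{t+horizon}) for each hour t."""
    # Column i holds the indicator shifted i hours back, i.e. rain_{t+i} at row t.
    future = pd.concat(
        [rain_1h.shift(-i) for i in range(1, horizon + 1)], axis=1
    )
    # skipna=False leaves NaN wherever the horizon runs past the end of the data.
    target = future.max(axis=1, skipna=False)
    # Drop the incomplete trailing rows rather than guessing their labels.
    return target.dropna().astype(int)
```

For example, with a horizon of 2 and the indicator sequence 0, 0, 1, 0, 0, 0, the first two hours are labelled 1 (rain falls within their next two hours), the next two are labelled 0, and the last two are dropped because their horizon is incomplete.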
4.3 Training Strategy
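Data is split chronologically (80/20) to preserve temporal ordering: the earliest 80% of observations are used for training and the most recent 20% are held out for evaluation. Early stopping is applied based on internal validation loss to prevent overfitting.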
Performance is evaluated using:
• Receiver Operating Characteristic Area Under Curve (ROC-AUC)
• Precision-Recall Area Under Curve (PR-AUC)
These metrics are suitable for imbalanced rainfall events.
5. Automated Model Lifecycle Management
5.1 Continuous Retraining
Model retraining is orchestrated via GitHub Actions and executed every 48 hours. Each retraining cycle performs the following steps in order:
• Data update
• Feature regeneration
• Model fitting
• Performance evaluation
• Artifact rotation
5.2 Model Versioning
Three tiers of artifacts are maintained:
• Current model
• Previous model
• Timestamped snapshots
This enables rapid rollback and longitudinal analysis.
5.3 Metrics Archiving
Performance metrics are appended to a persistent history file. These records support trend analysis and drift detection.
6. Monitoring and Visualization
6.1 Static Dashboard
A static HTML dashboard is generated during each retraining cycle and published using GitHub Pages. It displays:
• Model metadata
• Performance trends
• Dataset statistics
• Health indicators
The page makes no client-side API calls, so no credentials are exposed to visitors and page views consume no API quota.
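To illustrate the idea, a static page can be rendered entirely at build time from values already computed by the pipeline. The function below is a simplified sketch with hypothetical field names; the real dashboard is presumably richer.

```python
from html import escape


def render_dashboard(metadata: dict, metric_history: list[dict]) -> str:
    """Render a self-contained HTML page with no scripts and no external requests."""
    meta_rows = "\n".join(
        f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in metadata.items()
    )
    metric_rows = "\n".join(
        "<tr><td>{}</td><td>{:.3f}</td><td>{:.3f}</td></tr>".format(
            escape(str(m["timestamp"])), m["roc_auc"], m["pr_auc"]
        )
        for m in metric_history
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Model status</title></head>\n'
        "<body>\n"
        "<h1>Model status</h1>\n"
        f"<table>\n{meta_rows}\n</table>\n"
        "<h2>Performance history</h2>\n"
        "<table>\n<tr><th>Run</th><th>ROC-AUC</th><th>PR-AUC</th></tr>\n"
        f"{metric_rows}\n</table>\n"
        "</body></html>\n"
    )
```

Since every value is baked into the markup during the CI run, the published page needs nothing beyond static file hosting.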
6.2 README-Based Reporting
Key indicators are embedded directly in the repository README using automated scripts. This provides immediate visibility without external tooling.
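One common way to automate this is to rewrite a marked region of the README on each run. The marker comments and the line format below are hypothetical conventions chosen for this sketch.

```python
import re

# Hypothetical markers delimiting the auto-generated region of the README.
START = "<!-- METRICS:START -->"
END = "<!-- METRICS:END -->"


def update_readme(readme_text: str, metrics: dict) -> str:
    """Replace the text between the markers with the latest indicators."""
    lines = [f"- {name}: {value:.3f}" for name, value in metrics.items()]
    block = "\n".join([START, *lines, END])
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme_text):
        raise ValueError("README is missing the metrics markers")
    # A function replacement avoids backslash interpretation in the new text.
    return pattern.sub(lambda _match: block, readme_text, count=1)
```

Everything outside the markers is left untouched, so hand-written README content survives each automated update.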
6.3 Degradation Detection
Performance drops exceeding predefined thresholds trigger automated warnings. These appear consistently across all reporting interfaces.
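A simple threshold check of this kind is sketched below. The baseline definition (median of the last few runs), the window size, and the 0.05 drop threshold are illustrative assumptions; the source does not state the actual thresholds.

```python
from statistics import median


def detect_degradation(
    history: list[float],
    latest: float,
    max_drop: float = 0.05,
    window: int = 5,
) -> bool:
    """Return True if the latest score fell more than max_drop below the baseline.

    The baseline is the median of the most recent `window` historical scores,
    which is less sensitive to a single unusual run than the previous value alone.
    """
    if not history:
        return False  # nothing to compare against on the first run
    baseline = median(history[-window:])
    return (baseline - latest) > max_drop
```

Computing the flag once during the CI run and writing it into each report is one way to keep the warning consistent across interfaces.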
7. Security and Credential Management
The system enforces strict separation of concerns:
• API credentials stored exclusively in CI secrets
• No client-side data requests
• No credentials on edge devices
• No hard-coded locations
This design prevents credential leakage and unauthorized usage.
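In practice, CI secrets are typically exposed to a job as environment variables. A minimal sketch of reading the key that way, and failing loudly when it is absent, is given below; the variable name VISUAL_CROSSING_API_KEY is an assumption.

```python
import os


def get_api_key(env_var: str = "VISUAL_CROSSING_API_KEY") -> str:
    """Read the API key from the environment instead of source code or config files."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; it is expected to be injected from CI secrets"
        )
    return key
```

Only the data-collection step in CI needs this function; the edge device and the dashboard never call it.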
8. Edge Deployment Architecture
8.1 Hardware Platform
The control system operates on a Raspberry Pi connected to:
• Servo motor or relay module
• Manual override button
• Power management circuitry
8.2 Edge Inference
The device periodically retrieves the latest model artifacts and executes local inference. Because inference runs against the locally cached model, generating a prediction does not itself require a network connection.
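The refresh step can be made safe against connectivity loss by replacing the cached artifact only after a complete download. The sketch below is written in Python for illustration and takes the download function as a parameter; it is not the code of the published package, and the names are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def refresh_model(cache_path: Path, fetch: Callable[[], bytes]) -> bool:
    """Try to update the cached model; keep the old copy if the fetch fails.

    Returns True when the cache was replaced, False when the old copy was kept.
    """
    try:
        payload = fetch()
    except OSError:
        # Network errors (including urllib's URLError) derive from OSError.
        return False

    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then swap atomically so
    # a power cut mid-write never leaves a truncated model in place.
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent)
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
    os.replace(tmp_name, cache_path)
    return True
```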
8.3 Control Logic
The actuation policy incorporates hysteresis and state persistence:
Condition | Action
Probability ≥ threshold | Close shutters
Probability ≤ safe margin | Open shutters
Manual override | Immediate release
State information is persisted to enable recovery after power loss.
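The policy above can be sketched as a small state machine with a persisted state file. The threshold values, state names, and file format are assumptions, and "released" is modelled here as a separate state in which the actuator is freed from automatic control, which is one interpretation of the manual override row.

```python
import json
import os
from pathlib import Path

CLOSE_THRESHOLD = 0.6  # hypothetical: close at or above this rain probability
OPEN_THRESHOLD = 0.3   # hypothetical safe margin: open at or below this


def next_state(current: str, probability: float, manual_override: bool = False) -> str:
    """Apply the actuation policy; probabilities between the thresholds keep the state."""
    if manual_override:
        return "released"
    if probability >= CLOSE_THRESHOLD:
        return "closed"
    if probability <= OPEN_THRESHOLD:
        return "open"
    return current  # hysteresis band: avoid toggling on small fluctuations


def save_state(path: Path, state: str) -> None:
    """Persist the state atomically so a power cut cannot corrupt the file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"state": state}), encoding="utf-8")
    os.replace(tmp, path)


def load_state(path: Path, default: str = "open") -> str:
    """Restore the last known state after a restart, or fall back to the default."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))["state"]
    except (OSError, ValueError, KeyError):
        return default
```

The gap between the two thresholds is what prevents the shutters from oscillating when the predicted probability hovers near a single cut-off.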
9. Software Packaging
To facilitate reuse, the inference subsystem is distributed as an npm package.
It is available at https://www.npmjs.com/package/weather-ml-edge
The package provides:
• Secure model loading
• Prediction interfaces
• Configuration management
No network access or credential handling is included.
10. System Integration
The complete operational pipeline is summarized as:
Data API → CI Pipeline → Training → Validation → Deployment → Edge Inference → Actuation
This closed-loop architecture ensures long-term autonomy.
11. Evaluation and Results
Over extended operation, the system demonstrated:
• Stable ROC-AUC > 0.90
• Consistent PR-AUC > 0.80
• Low false positive rates
• Robust offline performance
In most observed cases, automated shutter actuation preceded the rainfall event.
12. Lessons Learned
12.1 Importance of Automation
Sustained ML systems require continuous retraining, validation, and deployment. Manual pipelines are not scalable.
12.2 Engineering Over Algorithms
System reliability was primarily determined by pipeline design rather than model complexity.
12.3 Security by Architecture
Credential isolation must be embedded at the system level, not added retroactively.
12.4 Edge Intelligence
Local inference enables resilient operation independent of cloud availability.
13. Future Work
Planned extensions include:
• Integration of physical rain sensors
• Multi-location modeling
• Adaptive thresholding
• Energy-aware scheduling
• Seasonal ensemble models
• Mobile notification interfaces
14. Conclusion
This work demonstrates that robust, production-grade machine learning systems can be constructed using lightweight tools and disciplined engineering practices. By combining automated retraining, secure deployment, and edge-based inference, the system achieves long-term autonomy in a resource-constrained environment.
The presented architecture is applicable to a wide range of cyber-physical systems requiring predictive control under uncertainty.
Acknowledgements
The author acknowledges the open-source community and the maintainers of Python, scikit-learn, GitHub Actions, and Visual Crossing for enabling this work.
References
[1] rotsl, "Weather-ML: Automated Rain Forecasting and Edge Control System," GitHub repository. Available: https://github.com/rotsl/weather-ml.
[2] rotsl, "Weather-ML Public Monitoring Dashboard," GitHub Pages. Available: https://rotsl.github.io/weather-ml/.
[3] Visual Crossing Corporation, "Weather API - Timeline Endpoint," Visual Crossing Weather. Available: https://www.visualcrossing.com/weather-api.
[4] Visual Crossing Corporation, "Weather API Documentation," Visual Crossing Resources. Available: https://www.visualcrossing.com/resources/documentation/weather-api.
[5] GitHub, "GitHub Actions Documentation," GitHub Docs. Available: https://docs.github.com/en/actions.
[6] rotsl, "NPM Package," NPM. Available: https://www.npmjs.com/package/weather-ml-edge.
