DEV Community

wellallyTech

The HRV Engineer: Building a Machine Learning Fatigue Warning System with Scikit-learn

Have you ever pushed through a workout only to feel absolutely wrecked the next day? 😵‍💫 That's often because we ignore our Central Nervous System (CNS). While your muscles might feel fine, your nervous system speaks a different language: Heart Rate Variability (HRV).

In this tutorial, we are going to build a personalized Exercise Fatigue Early Warning Engine. We'll use the Garmin Connect API to fetch data, Pandas for feature engineering, and a Random Forest model via Scikit-learn to predict whether you're ready to smash a PR or if you desperately need a rest day. 🚀

By the end of this, you'll understand how to transform raw wearable data into actionable health insights using predictive recovery algorithms and machine learning for fitness.


🏗️ The Architecture

Before we dive into the code, let's look at the data flow. We are moving from raw time-series sensor data to a categorical "Recovery Recommendation."

graph TD
    A[Garmin Wearable] -->|Sync| B(Garmin Connect API)
    B -->|JSON Data| C[Data Preprocessing - Pandas]
    C -->|Feature Extraction: RMSSD, SDNN| D{Random Forest Model}
    D -->|Prediction| E[Fatigue Level: Low/Med/High]
    E -->|UI| F[Streamlit Dashboard]
    F -->|Recommendation| G[Rest vs. Train]

🛠️ Prerequisites

To follow along, you'll need the following tech stack:

  • Python 3.9+
  • Scikit-learn: For the Random Forest Classifier.
  • Garmin Connect API: Accessed through a Python wrapper such as garminconnect.
  • Pandas: For time-series manipulation.
  • Streamlit: For our shiny frontend.

Step 1: Fetching HRV Data

HRV isn't just one number; it's the variation in the time between consecutive heartbeats. Specifically, we look at RMSSD (Root Mean Square of Successive Differences), which summarizes how much each beat-to-beat interval differs from the one before it.
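
To make the definition concrete, here is a minimal sketch of how RMSSD (and SDNN, which shows up in the architecture diagram) can be computed from a list of beat-to-beat (RR) intervals. The interval values are made up for illustration, and this is not how Garmin computes its own HRV figures; it only shows what the two acronyms mean.

```python
import math
import statistics

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (ms)."""
    return statistics.stdev(rr_ms)

# Hypothetical RR intervals in milliseconds, for illustration only
rr = [800, 810, 790, 805, 795]
print(round(rmssd(rr), 2), round(sdnn(rr), 2))
```

RMSSD reacts to beat-to-beat changes, while SDNN reflects the overall spread of the intervals, which is why the two are often reported together.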

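from datetime import date

import pandas as pd
from garminconnect import Garmin

# Initialize the Garmin client (load real credentials from the environment, not from source code)
client = Garmin("your_email", "your_password")
client.login()

def get_hrv_data(day: date) -> pd.DataFrame:
    """Fetch the HRV readings for one calendar day as a DataFrame."""
    hrv_data = client.get_hrv_data(day.isoformat())
    # Guard against days with no HRV data in the response
    if not hrv_data or not hrv_data.get('hrvReadings'):
        return pd.DataFrame()
    return pd.DataFrame(hrv_data['hrvReadings'])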
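
The fetch function above returns the readings for a single date, while the next step works on a series of daily values. One way to bridge that gap is to loop over a date range and keep one summary value per day. The sketch below is only an illustration: `fetch_readings` is a hypothetical callable standing in for the Garmin call, and the `hrvValue` field name is an assumption about the shape of each reading, so check it against what your account actually returns.

```python
from datetime import date, timedelta
from statistics import mean

def build_daily_series(fetch_readings, start, days):
    """Return [(day, mean_hrv), ...] for `days` consecutive dates from `start`.

    `fetch_readings` is a hypothetical callable: it takes a date and returns
    a list of dicts with an 'hrvValue' key (an assumed field name), or an
    empty list when nothing was recorded for that date.
    """
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        values = [r["hrvValue"] for r in fetch_readings(day)
                  if r.get("hrvValue") is not None]
        if values:  # skip days without usable readings
            rows.append((day, mean(values)))
    return rows

# Stand-in fetcher with made-up numbers so the sketch runs offline
def fake_fetch(day):
    if day.day % 2 == 0:
        return []
    return [{"hrvValue": 60}, {"hrvValue": 70}]

series = build_daily_series(fake_fetch, date(2024, 1, 1), 4)
print(series)
```

Passing the fetcher in as an argument keeps the aggregation logic testable without logging in to Garmin Connect.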

Step 2: Feature Engineering with Pandas

Machine learning models thrive on good features. For fatigue detection, we don't just want today's HRV; we also want a baseline to compare it against (the 7-day moving average).

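def preprocess_features(df):
    # Expects one row per day, in date order, with a daily 'rmssd' column
    # RMSSD: short-term recovery indicator (SDNN, an overall stress indicator, is not computed here)
    df = df.copy()  # avoid mutating the caller's DataFrame

    # Baseline: 7-day moving average of RMSSD (NaN for the first 6 rows)
    df['rolling_avg_7d'] = df['rmssd'].rolling(window=7).mean()
    # Ratio of today's RMSSD to the baseline; values below 1 mean a drop
    df['hrv_drop_ratio'] = df['rmssd'] / df['rolling_avg_7d']

    # Labeling (for training purposes, we assume a fixed threshold)
    # 1: High Fatigue (more than 15% below baseline), 0: Ready to Train
    df['fatigue_label'] = (df['hrv_drop_ratio'] < 0.85).astype(int)

    return df.dropna()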
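
One design choice worth a second look: a rolling mean that includes today's value pulls the baseline toward the very number you are comparing against it. A possible variation, sketched here in plain Python with made-up values, is to build the baseline from the previous seven days only. This is an alternative for illustration, not the approach used in the function above.

```python
from statistics import mean

def drop_ratios(daily_rmssd, window=7):
    """Ratio of each day's RMSSD to the mean of the previous `window` days.

    Days without a full window of history get None.
    """
    ratios = []
    for i, today in enumerate(daily_rmssd):
        if i < window:
            ratios.append(None)
            continue
        baseline = mean(daily_rmssd[i - window:i])  # excludes today
        ratios.append(today / baseline)
    return ratios

# Made-up values: a stable week at 70 ms, then one poor night at 56 ms
values = [70, 70, 70, 70, 70, 70, 70, 56]
ratios = drop_ratios(values)
labels = [None if r is None else int(r < 0.85) for r in ratios]
print(ratios[-1], labels[-1])
```

For the last day the baseline is 70, so the ratio is 56 / 70 = 0.8 and the day is labeled as high fatigue. With today's value included in the window, the baseline would be 68 and the ratio about 0.82, a slightly smaller drop for the same night.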

Step 3: Training the Random Forest Engine

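Why Random Forest? It's excellent for handling non-linear relationships and is robust against outliers (which happen often with wearable sensors!). One caveat: because fatigue_label is derived directly from hrv_drop_ratio, the model here mostly relearns the 0.85 threshold. Swapping in real labels, such as self-reported fatigue, would give it something less obvious to learn.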

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assume 'data' is the DataFrame returned by preprocess_features()
X = data[['rmssd', 'rolling_avg_7d', 'hrv_drop_ratio']]
y = data['fatigue_label']

# Hold out 20% of the days for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the engine (random_state makes the forest reproducible)
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out days
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
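
A dashboard usually runs as a separate script, so the trained model needs to get from the training code to the app somehow. One common option is to persist it with joblib. The sketch below trains on synthetic random numbers (not real HRV data) purely so it runs on its own, and the file name `fatigue_model.joblib` is a hypothetical choice.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training data (random numbers, not real HRV)
rng = np.random.default_rng(42)
rmssd = rng.uniform(40, 90, size=200)
baseline = rng.uniform(55, 80, size=200)
ratio = rmssd / baseline
X = np.column_stack([rmssd, baseline, ratio])
y = (ratio < 0.85).astype(int)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X, y)

# Hypothetical file name; any path the dashboard can read would do
model_path = os.path.join(tempfile.gettempdir(), "fatigue_model.joblib")
joblib.dump(model, model_path)

# Reload and confirm the restored model predicts the same labels
restored = joblib.load(model_path)
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Only load joblib files you created yourself, since loading a pickle-based file can execute arbitrary code.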

🥑 The "Official" Way to Build Health Apps

While this project is a great "Learning in Public" experiment, building production-grade health tech involves complex data privacy (HIPAA/GDPR) and more sophisticated signal processing.

For advanced patterns on integrating multi-modal health data and deploying robust AI models in the cloud, I highly recommend checking out the Wellally Tech Blog. It's a goldmine for developers looking to scale their health-tech stack beyond a local script.


Step 4: The Streamlit Dashboard

Finally, let's wrap this in a user-friendly interface so you don't have to look at a terminal every morning.

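import joblib
import pandas as pd
import streamlit as st

# Assumption: the model from Step 3 was saved beforehand with
# joblib.dump(model, "fatigue_model.joblib")
model = joblib.load("fatigue_model.joblib")

st.title("🛡️ HRV Fatigue Guard")

user_rmssd = st.number_input("Enter Today's RMSSD:", min_value=0.0, value=65.0)
user_baseline = st.number_input("Enter 7-Day Baseline:", min_value=1.0, value=72.0)

if st.button("Analyze Recovery"):
    ratio = user_rmssd / user_baseline
    # Use the same column names as in training so scikit-learn sees matching features
    features = pd.DataFrame(
        [[user_rmssd, user_baseline, ratio]],
        columns=['rmssd', 'rolling_avg_7d', 'hrv_drop_ratio'],
    )
    prediction = model.predict(features)

    if prediction[0] == 1:
        st.error("🚨 WARNING: High Neural Fatigue Detected. Suggestion: Active Recovery or Rest.")
    else:
        st.success("✅ SYSTEM READY: Nervous system recovered. Go for that PR!")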
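
The architecture diagram promises a Low/Med/High fatigue level, while the classifier only outputs 0 or 1. One way to get three levels, sketched below, is to bucket the predicted probability of the fatigue class, which scikit-learn exposes through `model.predict_proba(features)[0][1]`. The helper name `fatigue_level` and the cutoffs of 0.33 and 0.66 are placeholders picked for illustration, not validated thresholds.

```python
def fatigue_level(p_high, low_cutoff=0.33, high_cutoff=0.66):
    """Map the predicted probability of the fatigue class to Low/Med/High.

    The cutoffs are illustrative placeholders, not validated thresholds.
    """
    if not 0.0 <= p_high <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_high < low_cutoff:
        return "Low"
    if p_high < high_cutoff:
        return "Med"
    return "High"

# Example probabilities, made up for illustration
levels = [fatigue_level(p) for p in (0.1, 0.5, 0.9)]
print(levels)
```

In the dashboard, the three levels could drive `st.success`, `st.warning`, and `st.error` respectively.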

Conclusion

Building an HRV-based fatigue engine is a perfect way to combine your passion for fitness with data science. By moving from "I feel tired" to "My RMSSD is 15% below baseline," you're using biometric data to train smarter, not harder.

What's next?

  1. Try adding sleep scores from the Garmin API as a feature.
  2. Experiment with XGBoost to see if you can beat the Random Forest's accuracy.
  3. Check out the Wellally Tech Blog for more production-ready examples of wearable integrations!

Drop a comment below if you've tried building something similar or if you have questions about the Garmin API! 🏃‍♂️💨
