Detecting Burnout Before It Hits: Building an HRV Anomaly Detector with Isolation Forest 🚀

#machinelearning #datascience #python #wearables

Have you ever woken up feeling like a truck hit you, even though you "rested"? Or maybe you smashed a PR in the gym only to be sidelined by a cold 24 hours later? Our bodies often send distress signals long before we feel the symptoms. One of the most powerful signals is Heart Rate Variability (HRV).

In this tutorial, we’re going to build a predictive health pipeline using HRV anomaly detection, Isolation Forest, and Python. By leveraging unsupervised learning, we can flag "outlier" days that may signal early overtraining or an oncoming infection. If you're looking to master wearable data analysis and Scikit-learn, you're in the right place. 🥑

The Science: Why HRV?

HRV measures the variation in the time between successive heartbeats. A high HRV usually indicates a well-recovered nervous system, while a sudden drop (or a weirdly high spike) often precedes physical "crashes." A single fixed threshold isn't enough because everyone's "normal" is different. That’s where Isolation Forest comes in: it doesn't need labeled data to know when your body is acting "weird."
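To make the "everyone's normal is different" point concrete, here is a minimal sketch that scores each day against a person's own trailing baseline. The `personal_deviation` helper, the 14-day window, and both sample series are made up for illustration.

```python
import pandas as pd

def personal_deviation(hrv: pd.Series, window: int = 14) -> pd.Series:
    """Z-score of each day's HRV against that person's own trailing baseline."""
    # shift(1) keeps today's reading out of its own baseline
    baseline = hrv.shift(1).rolling(window, min_periods=7)
    return (hrv - baseline.mean()) / baseline.std()

# Two hypothetical people who both read 45 ms on the final day
athlete = pd.Series([80, 82, 79, 81, 83, 80, 78, 82, 81, 79, 45], dtype=float)
office_worker = pd.Series([44, 46, 45, 43, 47, 45, 44, 46, 45, 44, 45], dtype=float)

print(personal_deviation(athlete).iloc[-1])        # strongly negative: far below this person's norm
print(personal_deviation(office_worker).iloc[-1])  # near zero: an ordinary day for this person
```

The same 45 ms reading is a red flag for one person and business as usual for the other, which is why a global cutoff such as "HRV below 50 is bad" falls short.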

The Architecture 🏗️

We need a system that ingests time-series data, processes it through our ML model, and visualizes the alerts.

graph TD
    A[Wearable Device / Apple Health] -->|Export| B(InfluxDB)
    B --> C{Python Analytics Engine}
    C --> D[Scikit-learn: Isolation Forest]
    D -->|Identify Outliers| E[Grafana Dashboard]
    E -->|Alert| F[User: Take a Rest Day!]
    style D fill:#f9f,stroke:#333,stroke-width:2px

Prerequisites

To follow along, you'll need:

Python 3.9+
pandas and influxdb-client (for pulling the data into DataFrames)
Scikit-learn (for the ML magic)
InfluxDB (optimized for time-series wearable data)
Grafana (for the sexy dashboards)

Step 1: Pulling Data from InfluxDB 📥

First, we need to grab our historical HRV data. InfluxDB is perfect here because health data is essentially one long time series.

import pandas as pd
from influxdb_client import InfluxDBClient

# Setup connection
token = "YOUR_INFLUX_TOKEN"
org = "YourOrg"
bucket = "HealthData"

client = InfluxDBClient(url="http://localhost:8086", token=token, org=org)

def fetch_hrv_data():
    query = f'''
    from(bucket: "{bucket}")
      |> range(start: -30d)
      |> filter(fn: (r) => r["_measurement"] == "heart_rate_variability")
      |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
    '''
    df = client.query_api().query_data_frame(query)
    return df

# Let's assume df has columns: ['_time', 'hrv_ms', 'sleep_duration_hr']
df = fetch_hrv_data()
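One thing to watch: `query_data_frame` can return a list of DataFrames when the result contains several tables, and the frames carry Flux bookkeeping columns alongside the fields. The helper below is a hypothetical cleanup step, not part of the client library, and the `fake` frame only imitates what a query result might look like.

```python
import pandas as pd

# Bookkeeping columns that Flux results usually carry alongside the data
META_COLUMNS = ["result", "table", "_start", "_stop", "_measurement"]

def tidy_influx_frame(result) -> pd.DataFrame:
    """Return one time-indexed DataFrame holding only the field columns."""
    # Merge the frames if the client handed back a list
    if isinstance(result, list):
        result = pd.concat(result, ignore_index=True)
    df = result.drop(columns=META_COLUMNS, errors="ignore")
    df["_time"] = pd.to_datetime(df["_time"], utc=True)
    return df.sort_values("_time").set_index("_time")

# Stand-in for a query result, so the helper can be tried without a database
fake = pd.DataFrame({
    "result": ["_result"] * 2,
    "table": [0, 0],
    "_time": ["2024-01-02T06:00:00Z", "2024-01-01T06:00:00Z"],
    "_measurement": ["heart_rate_variability"] * 2,
    "hrv_ms": [61.0, 58.0],
    "sleep_duration_hr": [7.5, 6.9],
})
clean = tidy_influx_frame(fake)
print(clean)
```

In the real pipeline you would call it as `tidy_influx_frame(fetch_hrv_data())`.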

Step 2: Detecting Anomalies with Isolation Forest 🌲

The Isolation Forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between that feature's minimum and maximum. Since anomalies are few and different, they get isolated in fewer splits (shorter paths in the tree) than normal points.
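The "shorter paths" idea is easier to see on toy data. This sketch uses purely synthetic numbers (60 typical days plus one invented bad day) and scikit-learn's `score_samples`, where a lower score means the point was easier to isolate.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 60 synthetic "typical" days: HRV around 60 ms, sleep around 7.5 h
typical = np.column_stack([rng.normal(60, 4, 60), rng.normal(7.5, 0.5, 60)])
# One synthetic "bad" day: HRV of 30 ms after 4 hours of sleep
bad_day = np.array([[30.0, 4.0]])
X = np.vstack([typical, bad_day])

forest = IsolationForest(n_estimators=100, random_state=42).fit(X)
scores = forest.score_samples(X)  # lower = shorter average path = more anomalous

print("bad day score:", scores[-1])
print("median typical score:", np.median(scores[:-1]))
```

The bad day sits far from the cluster, so random splits cut it off quickly and its score drops well below the typical days.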

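from sklearn.ensemble import IsolationForest

def detect_overtraining(df):
    # We focus on HRV and sleep duration as our primary features.
    # Drop days with missing readings so the model only sees complete rows;
    # .copy() keeps the caller's DataFrame untouched.
    feature_cols = ['hrv_ms', 'sleep_duration_hr']
    scored = df.dropna(subset=feature_cols).copy()

    # contamination=0.05 sets the cutoff so that roughly 5% of the
    # days we fit on are labeled "anomalous"
    model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)

    # Fit the model and predict a label per day: 1 = normal, -1 = anomaly
    # (these are labels, not continuous scores)
    scored['anomaly_label'] = model.fit_predict(scored[feature_cols])

    # Filter for potential red flags
    red_flags = scored[scored['anomaly_label'] == -1]
    return red_flags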

anomalies = detect_overtraining(df)
print(f"Flagged {len(anomalies)} unusual days that deserve a closer look!")
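A caveat about `contamination=0.05`: it fixes the share of flagged days rather than discovering it. The sketch below fits on synthetic data that has no real anomalies at all and still gets about 5% of days flagged.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# 200 synthetic days drawn from one well-behaved distribution: no real anomalies
calm = np.column_stack([rng.normal(60, 4, 200), rng.normal(7.5, 0.5, 200)])

labels = IsolationForest(contamination=0.05, random_state=42).fit_predict(calm)
flagged = int((labels == -1).sum())
print(f"{flagged} of {len(calm)} days flagged")  # about 5%, by construction
```

So a non-zero count is not evidence of stress on its own. It is worth looking at how extreme the flagged days are, or fitting on a known-good stretch and scoring new days with `predict`.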

The "Official" Way to Scale 💡

While this script is a great start for a personal project, building a production-grade health monitoring system requires handling missing data, sensor noise, and baseline drift.
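As a rough idea of what that handling could look like, here is a minimal pandas sketch. The `prepare_features` helper, the two-day interpolation limit, and the 28-day baseline window are all assumptions chosen for illustration, and it expects a DataFrame indexed by timestamp.

```python
import numpy as np
import pandas as pd

def prepare_features(df: pd.DataFrame, baseline_days: int = 28) -> pd.DataFrame:
    """Clean HRV/sleep readings and express them relative to a trailing baseline."""
    cols = ["hrv_ms", "sleep_duration_hr"]

    # Sensor noise: take the daily median so one glitchy reading does not define the day
    daily = df[cols].resample("1D").median()

    # Missing data: fill at most two consecutive missing days by linear interpolation
    daily = daily.interpolate(limit=2, limit_area="inside")

    # Baseline drift: subtract a trailing median so slow fitness changes are not flagged
    baseline = daily.shift(1).rolling(baseline_days, min_periods=7).median()
    return (daily - baseline).dropna()

# Synthetic example: HRV drifting slowly upward, with one missed night
idx = pd.date_range("2024-01-01", periods=40, freq="D", tz="UTC")
raw = pd.DataFrame(
    {"hrv_ms": np.linspace(55, 65, 40), "sleep_duration_hr": np.full(40, 7.5)},
    index=idx,
)
raw.iloc[10, 0] = np.nan
features = prepare_features(raw)
print(features.head())
```

The output holds deviations from each person's recent norm rather than raw values, so a gradual fitness gain does not register as an anomaly.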

For advanced architectural patterns on biometric data processing and more production-ready examples of health-tech integrations, I highly recommend checking out the WellAlly Official Blog. They dive deep into how to turn raw wearable signals into actionable clinical insights.

Step 3: Visualizing in Grafana 📊

Once the Python script identifies an anomaly, we write a "flag" back to InfluxDB. In Grafana, you can plot HRV in a Time series panel and add a State timeline panel to highlight these red zones.
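Writing the flag back might look like the sketch below. The `hrv_anomalies` measurement name matches the Grafana query in this step, while the `severity` field and both helper functions are assumptions for illustration. `write_flags` needs a reachable InfluxDB instance, so only the record formatting is exercised here.

```python
from datetime import datetime, timezone

def to_line_protocol(day: datetime, hrv_ms: float, severity: int) -> str:
    """Format one flagged day as an InfluxDB line-protocol record."""
    # Pass a timezone-aware datetime; line protocol defaults to nanosecond timestamps
    ts_ns = int(day.timestamp()) * 1_000_000_000
    # Measurement, then fields (the "i" suffix marks an integer), then timestamp
    return f"hrv_anomalies hrv_ms={float(hrv_ms)},severity={int(severity)}i {ts_ns}"

def write_flags(records, url, token, org, bucket):
    """Send line-protocol records with influxdb-client (not called in this sketch)."""
    from influxdb_client import InfluxDBClient
    from influxdb_client.client.write_api import SYNCHRONOUS

    with InfluxDBClient(url=url, token=token, org=org) as client:
        client.write_api(write_options=SYNCHRONOUS).write(
            bucket=bucket, org=org, record=records
        )

record = to_line_protocol(datetime(2024, 1, 15, tzinfo=timezone.utc), hrv_ms=38.0, severity=2)
print(record)
```

Keeping the formatting separate from the network call makes it easy to check records before anything touches the database.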

Pro-tip: Use yellow for mild deviations and red for "Stop training immediately" signals.

// Example Flux query for Grafana
from(bucket: "HealthData")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hrv_anomalies")
  |> yield(name: "anomalies")
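The yellow/red split from the pro-tip needs more than a 1/-1 label. One possible approach is to bucket the continuous `decision_function` score, as sketched below. The `severity_levels` helper and the `red_cutoff` of -0.10 are hypothetical and would need tuning on your own data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def severity_levels(model: IsolationForest, X: np.ndarray, red_cutoff: float = -0.10) -> np.ndarray:
    """Map decision_function scores to 0 (normal), 1 (yellow) or 2 (red)."""
    scores = model.decision_function(X)  # negative = flagged; more negative = more unusual
    levels = np.zeros(len(scores), dtype=int)
    levels[scores < 0] = 1           # mild deviation
    levels[scores < red_cutoff] = 2  # strong deviation
    return levels

rng = np.random.default_rng(2)
# Synthetic training history: HRV around 60 ms, sleep around 7.5 h
history = np.column_stack([rng.normal(60, 4, 200), rng.normal(7.5, 0.5, 200)])
model = IsolationForest(contamination=0.05, random_state=42).fit(history)

# One ordinary day and one extreme day (both invented)
new_days = np.array([[60.0, 7.5], [20.0, 3.0]])
levels = severity_levels(model, new_days)
print(levels)
```

Storing the level as a numeric field lets a State timeline panel map each value to a color through value mappings.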

Conclusion: Data > Intuition 🧘‍♂️

By the time you feel "burned out," your HRV has likely been trending downward for days. By using Scikit-learn's Isolation Forest, we move from reactive recovery to proactive health management.

Summary of what we built:

Connected to InfluxDB for time-series retrieval.
Implemented an unsupervised ML model to find health outliers.
Visualized the results to help spot possible infection or overtraining early.

What are you tracking? Are you using Oura, Whoop, or Apple Watch data? Drop a comment below and let’s talk about the best features for anomaly detection! 👇