Ever looked at the squiggly lines on your smartwatch and wondered how a tiny wrist-worn device knows your heart just skipped a beat? Welcome to the fascinating world of wearable technology and ECG signal processing. As wearable sensors become more ubiquitous, the ability to transform noisy, raw electrical signals into actionable clinical insights is becoming a superpower for developers.
In this tutorial, we are going to dive deep into R-R interval detection and arrhythmia feature engineering. We’ll learn how to take raw voltage data, clean it, and extract the features a machine learning model needs to flag rhythms such as Premature Ventricular Contractions (PVCs) or Atrial Fibrillation (AFib).
The Architecture of a Heartbeat
Before we touch the code, let’s look at the data pipeline. Processing an ECG signal isn't just about finding the "highest point"; it's about filtering noise (like muscle movements) and precisely timing the electrical signature of the heart's ventricles.
graph TD
A[Raw ECG Signal] --> B[Bandpass Filtering]
B --> C[R-Peak Detection / Pan-Tompkins]
C --> D[R-R Interval Calculation]
D --> E[Feature Engineering: HRV & Morphology]
E --> F[Lightweight ML Model: Scikit-learn]
F --> G{Arrhythmia Detected?}
G -- Yes --> H[Alert/Clinical Log]
G -- No --> I[Normal Sinus Rhythm]
Prerequisites
To follow along, you'll need a Python environment with the following "Bio-Stack":
- NeuroKit2: The powerhouse for physiological signal processing.
- NumPy/SciPy: For heavy mathematical lifting.
- Scikit-learn: To turn features into predictions.
pip install neurokit2 numpy scipy scikit-learn matplotlib
Step 1: Cleaning the Noise
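ECG signals from wearables are notoriously noisy. Every time you move your arm, you introduce motion artifacts and slow drift known as "baseline wander." A bandpass filter (typically 0.5 Hz to 40 Hz) keeps the heart's electrical signal while tossing out the drift below that band and the high-frequency noise above it. NeuroKit2's ecg_clean handles this step for us; note that its default "neurokit" method applies a 0.5 Hz high-pass Butterworth filter followed by powerline filtering rather than a strict 0.5 Hz to 40 Hz bandpass.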
import neurokit2 as nk
import numpy as np
import matplotlib.pyplot as plt
# Generate a synthetic ECG signal for demonstration
# In a real app, this would be your raw ADC counts from the sensor
ecg_signal = nk.ecg_simulate(duration=10, sampling_rate=1000, heart_rate=70)
# Clean the signal: Removes baseline wander and high-frequency noise
cleaned_ecg = nk.ecg_clean(ecg_signal, sampling_rate=1000, method="neurokit")
plt.plot(ecg_signal[:1000], label="Raw Noisy Signal")
plt.plot(cleaned_ecg[:1000], label="Cleaned Signal", color='red')
plt.legend()
plt.show()
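If you want to see the explicit 0.5 Hz to 40 Hz bandpass described above, here is a minimal sketch using SciPy instead of NeuroKit2. The helper name `bandpass_ecg`, the filter order, and the synthetic test tones are all assumptions chosen for illustration, not something NeuroKit2 does internally.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_ecg(signal, sampling_rate, low_hz=0.5, high_hz=40.0, order=4):
    """Zero-phase Butterworth bandpass (hypothetical helper, illustrative defaults)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass",
                 fs=sampling_rate, output="sos")
    # Filtering forward and backward cancels the phase shift, so peaks stay in place
    return sosfiltfilt(sos, signal)

# Synthetic stand-in for an ECG: a 10 Hz in-band tone plus two kinds of noise
fs = 250                                        # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
in_band = np.sin(2 * np.pi * 10 * t)            # content we want to keep
drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)      # slow baseline wander
hf_noise = 0.5 * np.sin(2 * np.pi * 100 * t)    # high-frequency noise

filtered = bandpass_ecg(in_band + drift + hf_noise, fs)

# Compare against the clean tone away from the edges, where filters misbehave
middle = slice(3 * fs, 7 * fs)
max_error = np.max(np.abs(filtered[middle] - in_band[middle]))
print(f"Max deviation from the clean tone: {max_error:.4f}")
```

The zero-phase filtering matters for ECG work because a phase shift would move the R-peaks in time and distort the intervals we measure in the next step.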
Step 2: R-Peak Detection & R-R Intervals
The "R-peak" is the most prominent spike in a typical ECG: the apex of the QRS complex, which marks the ventricles depolarizing. The time between two consecutive R-peaks is the R-R interval. Beat-to-beat variability in these intervals is the key to detecting arrhythmias.
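# Detect R-peaks. ecg_peaks uses NeuroKit2's default detector here;
# pass method="pantompkins1985" to use its Pan-Tompkins implementation instead
signals, info = nk.ecg_peaks(cleaned_ecg, sampling_rate=1000)
# Extract R-peak indices (sample positions in the signal)
r_peaks = info["ECG_R_Peaks"]
# Calculate R-R intervals: np.diff returns samples, so convert to milliseconds
sampling_rate = 1000
rr_intervals = np.diff(r_peaks) / sampling_rate * 1000
print(f"Detected {len(r_peaks)} heartbeats. Average R-R: {np.mean(rr_intervals):.1f} ms")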
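Most wearables do not sample at 1000 Hz, so it helps to keep the unit conversion in one place. The sketch below shows one way to turn R-peak sample indices into R-R intervals in milliseconds and instantaneous heart rate; the function names and the 250 Hz peak positions are hypothetical values made up for the example.

```python
import numpy as np

def rr_intervals_ms(r_peak_samples, sampling_rate):
    """Convert R-peak sample indices into R-R intervals in milliseconds."""
    r_peak_samples = np.asarray(r_peak_samples, dtype=float)
    return np.diff(r_peak_samples) / sampling_rate * 1000.0

def heart_rate_bpm(rr_ms):
    """Instantaneous heart rate: 60,000 ms per minute divided by each interval."""
    return 60000.0 / np.asarray(rr_ms, dtype=float)

# Hypothetical R-peak positions from a sensor sampled at 250 Hz
peaks_250hz = [100, 300, 500, 700]
rr_ms = rr_intervals_ms(peaks_250hz, sampling_rate=250)
bpm = heart_rate_bpm(rr_ms)

print(rr_ms)  # 200 samples at 250 Hz is 0.8 s, i.e. 800 ms per interval
print(bpm)    # 800 ms per beat corresponds to 75 beats per minute
```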
Step 3: Feature Engineering for Arrhythmia
This is where the magic happens. To identify a "skipped beat" or an irregular rhythm, we need to extract specific features:
- HRV (Heart Rate Variability): Time-domain stats like RMSSD (Root Mean Square of Successive Differences).
- Morphological Features: The shape of the wave itself.
- Non-linear features: Entropy and Poincaré plots.
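# Extract time-domain heart rate variability features.
# A 10-second recording is too short for reliable frequency-domain or
# non-linear HRV indices, so we use nk.hrv_time rather than the full nk.hrv
hrv_features = nk.hrv_time(info, sampling_rate=1000)
# Example: get RMSSD, a common marker of parasympathetic nervous system activity
rmssd = hrv_features["HRV_RMSSD"].values[0]
print(f"RMSSD: {rmssd:.2f} ms")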
# For arrhythmia, we look for 'outlier' R-R intervals (premature beats)
def detect_premature_beats(rr_intervals):
    mean_rr = np.mean(rr_intervals)
    # A rough heuristic: an interval shorter than 75% of the mean RR
    # suggests the beat that ends it arrived early
    premature_indices = np.where(rr_intervals < 0.75 * mean_rr)[0]
    return premature_indices

anomalies = detect_premature_beats(rr_intervals)
print(f"Found {len(anomalies)} potential premature beats.")
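To make the time-domain statistics less of a black box, here is a minimal NumPy sketch that computes them by hand from a list of R-R intervals. The helper name `time_domain_hrv` and the example interval series are assumptions for illustration; NeuroKit2's own implementation may differ in details such as interval cleaning.

```python
import numpy as np

def time_domain_hrv(rr_ms):
    """Hand-rolled time-domain HRV stats from R-R intervals in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)  # successive differences between neighboring intervals
    return {
        "mean_rr": float(np.mean(rr)),
        # SDNN: sample standard deviation of the intervals themselves
        "sdnn": float(np.std(rr, ddof=1)),
        # RMSSD: root mean square of the successive differences
        "rmssd": float(np.sqrt(np.mean(diffs ** 2))),
        # pNN50: percentage of successive differences larger than 50 ms
        "pnn50": float(np.mean(np.abs(diffs) > 50) * 100),
    }

# Hypothetical series: a steady rhythm with one short interval (500 ms)
# followed by a long pause (1100 ms), the pattern a premature beat can leave
rr_example = [800, 810, 790, 800, 500, 1100, 800]
features = time_domain_hrv(rr_example)
print(features)
```

A single early beat is enough to inflate RMSSD dramatically in a short window, which is why these statistics are useful inputs for a classifier but also sensitive to false R-peak detections.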
The "Official" Way: Advanced Patterns
While the code above works for a hobby project, production-grade wearable applications require robust handling of motion artifacts and cloud-based inference. If you're looking to scale these types of biosensor algorithms into a real-world product, I highly recommend checking out the technical deep-dives over at WellAlly Blog.
They provide excellent resources on enterprise-grade healthcare data pipelines, advanced signal processing architectures, and how to deploy these ML models on the edge. It was a massive source of inspiration for the feature engineering patterns used in this guide!
Step 4: Building a Lightweight Classifier
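Once we have our features (RMSSD, SDNN, Mean RR), we can feed them into a classifier such as a RandomForest or SVM to label each window as "Normal" or "Arrhythmia". The three rows used here are mock values that only demonstrate the API; a real model needs a properly labeled dataset.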
from sklearn.ensemble import RandomForestClassifier
# Mock feature set: [RMSSD, SDNN, MeanRR]
X = [[45.2, 50.1, 850], [12.5, 110.2, 700], [48.1, 52.3, 845]] # Sample data
y = [0, 1, 0] # 0: Normal, 1: Arrhythmia
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X, y)
# Predict on a new heartbeat window
prediction = clf.predict([[15.2, 115.0, 710]])
print("Classification Result:", "Arrhythmia Detected" if prediction[0] == 1 else "Normal")
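To connect Step 3 and Step 4, the sketch below builds the [RMSSD, SDNN, MeanRR] feature rows from windows of R-R intervals and evaluates the classifier on held-out windows. Everything about the data is an assumption: `simulate_window` is a hypothetical generator, and the jitter values are arbitrary numbers chosen to make the two classes differ, not physiological measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(rr_ms):
    """One feature row per window, in the order [RMSSD, SDNN, MeanRR]."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)
    return [np.sqrt(np.mean(diffs ** 2)), np.std(rr, ddof=1), np.mean(rr)]

rng = np.random.default_rng(0)

def simulate_window(irregular, n_beats=30):
    """Hypothetical R-R window in ms; jitter sizes are arbitrary assumptions."""
    jitter = 120.0 if irregular else 20.0
    return 800.0 + rng.normal(0.0, jitter, size=n_beats)

# 100 synthetic windows with alternating labels (0: Normal, 1: Arrhythmia)
labels = np.array([0, 1] * 50)
X = np.array([window_features(simulate_window(bool(label))) for label in labels])

# Train on the first 80 windows and hold out the last 20 for evaluation
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[:80], labels[:80])
accuracy = clf.score(X[80:], labels[80:])
print(f"Held-out accuracy on synthetic windows: {accuracy:.2f}")
```

High accuracy here only shows that the pipeline is wired correctly, since the synthetic classes are easy to separate by design. Real recordings need labeled data and a split by patient so that windows from the same person never land in both train and test sets.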
Conclusion: Data-Driven Wellness
Turning raw electrical signals into health insights is one of the most rewarding challenges in modern software development. By combining NeuroKit2 for signal processing and Scikit-learn for classification, you've just built a first version of the core engine of a digital health app!
Summary of what we covered:
- Filtering raw ECG signals to remove noise.
- Precise R-peak detection using Python.
- Calculating R-R intervals and HRV metrics.
- Identifying premature beats via feature engineering.
What’s next? Try connecting a real ESP32 with an AD8232 sensor and stream your heart rate live to your terminal!
Have you ever worked with physiological data? Drop a comment below or share your favorite signal processing hack! 👇