DEV Community

Sreeram Achutuni

Human Activity Recognition Using Wearable Sensors: An End-to-End ML Pipeline

Published: January 2025 | Reading Time: 12 minutes

Introduction

Can a machine learning model tell what you're doing just by looking at sensor data from your smartwatch? In this project, I built an end-to-end ML pipeline that processes about 2.87 million sensor readings and classifies 18 different human activities with 94.2% accuracy.

This post walks through the complete journey: from raw sensor data to a deployed web application with real-time predictions.

What You'll Learn:

  • Processing massive time-series datasets efficiently
  • Feature engineering for sensor data
  • Building ensemble models for activity classification
  • Deploying ML models with Streamlit

Key Results:

  • ✅ 94.2% accuracy across 18 activity classes
  • ✅ Ensemble model beats the Random Forest baseline (91.2%) by 3 percentage points and the LSTM (88.7%) by 5.5 points
  • ✅ Real-time inference with sliding window approach
  • ✅ Interactive web app with explainability features

The Dataset: PAMAP2

What is PAMAP2?

The Physical Activity Monitoring dataset (PAMAP2) contains sensor recordings from:

  • 9 subjects wearing sensors while performing activities
  • 3 IMU sensors (Inertial Measurement Units):
    • Hand sensor
    • Chest sensor
    • Ankle sensor
  • Each IMU provides:
    • 3-axis acceleration (x, y, z)
    • 3-axis gyroscope
    • 3-axis magnetometer and temperature
  • A separate heart rate monitor records heart rate

Activities include:
Walking, running, cycling, sitting, standing, watching TV, playing soccer, rope jumping, and 10 more.
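The loading code below reads the file with `header=None`, so the columns start out as integers, while later steps refer to columns by name (for example `window['activity']`). Here is a minimal sketch that assigns names following the 54-column layout documented with PAMAP2: timestamp, activity ID, heart rate, then 17 columns per IMU (temperature, two 3-axis accelerometers at ±16g and ±6g, gyroscope, magnetometer, and 4 orientation values). The specific names are my own hypothetical choices, not part of the dataset.

```python
def pamap2_column_names():
    """Hypothetical names for the 54 columns of a PAMAP2 data file."""
    cols = ['timestamp', 'activity', 'heart_rate']
    for loc in ('hand', 'chest', 'ankle'):
        cols.append(f'{loc}_temp')
        # Two accelerometers (±16g and ±6g), gyroscope, magnetometer: 3 axes each
        for sensor in ('acc16', 'acc6', 'gyro', 'mag'):
            cols += [f'{loc}_{sensor}_{axis}' for axis in 'xyz']
        cols += [f'{loc}_orient_{i}' for i in range(4)]  # 4 orientation values
    return cols

COLUMNS = pamap2_column_names()
# Everything except the timestamp and the label is a sensor channel
SENSOR_COLUMNS = [c for c in COLUMNS if c not in ('timestamp', 'activity')]

# Then load with names instead of integer column labels:
# df = pd.read_csv('PAMAP2_Dataset.txt', sep=' ', header=None, names=COLUMNS)
print(len(COLUMNS), len(SENSOR_COLUMNS))  # 54 52
```

Keeping the label out of `SENSOR_COLUMNS` matters later: any step that loops over "all columns" would otherwise turn the activity label into a feature.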

Data Statistics

import pandas as pd

# Load data
df = pd.read_csv('PAMAP2_Dataset.txt', sep=' ', header=None)

print(f"Total samples: {len(df):,}")
print(f"Total features: {df.shape[1]}")
print(f"Missing values: {df.isnull().sum().sum()}")

Output:

Total samples: 2,872,533
Total features: 54
Missing values: 438,291 (15.3%)

Challenge: How do we handle 15% missing data in sensor readings?


Data Preprocessing

Step 1: Handling Missing Values

Sensors sometimes fail to record. We can't simply drop rows with missing values (we would lose too much data) or fill them with the column mean (that ignores the temporal ordering of the signal).

Solution: Forward-fill + Interpolation

def handle_missing_values(df):
    """
    Forward fill for short gaps, interpolate for longer gaps
    """
    # Forward fill (carry last valid observation)
    df_filled = df.fillna(method='ffill', limit=10)

    # Linear interpolation for remaining gaps
    df_filled = df_filled.interpolate(method='linear', limit_direction='both')

    # Drop any remaining NaNs (start/end of sequences)
    df_filled = df_filled.dropna()

    return df_filled

# Apply preprocessing
df_clean = handle_missing_values(df)
print(f"Samples after cleaning: {len(df_clean):,}")
Enter fullscreen mode Exit fullscreen mode
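Note that `limit=10` caps how many NaNs are forward-filled per gap; it does not skip long gaps. In a long gap, the first 10 samples are carried forward and only the remainder is interpolated. A tiny synthetic example with `limit=2` makes this visible (toy values, not PAMAP2 data):

```python
import numpy as np
import pandas as pd

# Toy sensor channel with a 5-sample dropout (synthetic values)
s = pd.Series([0.0, np.nan, np.nan, np.nan, np.nan, np.nan, 8.0])

step1 = s.ffill(limit=2)  # only the first 2 NaNs of the gap are carried forward
step2 = step1.interpolate(method='linear', limit_direction='both')

print(step1.tolist())  # [0.0, 0.0, 0.0, nan, nan, nan, 8.0]
print(step2.tolist())  # [0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 8.0]
```

The result is a flat stretch followed by a ramp. Also, because the function runs on the whole table at once, a gap at the boundary between two subjects' recordings gets bridged too; applying it per subject file avoids that.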

Step 2: Feature Engineering

Individual raw readings say little on their own, so we summarize each window of readings with time-domain and frequency-domain features.

import numpy as np
from scipy import stats
from scipy.signal import find_peaks

def extract_features(window):
    """
    Extract statistical features from a window of sensor data

    Args:
        window: DataFrame with shape (window_size, num_sensors)

    Returns:
        features: Dictionary of computed features
    """
    features = {}

    # For each sensor column
    for col in window.columns:
        signal = window[col].values

        # Time domain features
        features[f'{col}_mean'] = np.mean(signal)
        features[f'{col}_std'] = np.std(signal)
        features[f'{col}_min'] = np.min(signal)
        features[f'{col}_max'] = np.max(signal)
        features[f'{col}_range'] = np.ptp(signal)  # Peak-to-peak
        features[f'{col}_median'] = np.median(signal)
        features[f'{col}_mad'] = np.median(np.abs(signal - np.median(signal)))

        # Statistical moments
        features[f'{col}_skewness'] = stats.skew(signal)
        features[f'{col}_kurtosis'] = stats.kurtosis(signal)

        # Percentiles
        features[f'{col}_25percentile'] = np.percentile(signal, 25)
        features[f'{col}_75percentile'] = np.percentile(signal, 75)

        # Energy (mean squared amplitude)
        features[f'{col}_energy'] = np.sum(signal ** 2) / len(signal)
        # Note: identical to _energy as defined here, so one of the two is redundant
        features[f'{col}_power'] = np.mean(signal ** 2)

        # Zero crossing rate
        zero_crossings = np.where(np.diff(np.sign(signal)))[0]
        features[f'{col}_zcr'] = len(zero_crossings) / len(signal)

        # Peak detection
        peaks, _ = find_peaks(signal, distance=5)
        features[f'{col}_num_peaks'] = len(peaks)

        # Frequency domain (simple)
        fft_vals = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1 / 100)  # assumes 100 Hz sampling
        features[f'{col}_fft_mean'] = np.mean(fft_vals)
        features[f'{col}_fft_max'] = np.max(fft_vals)
        # Skip the DC bin (index 0) and report the peak in Hz, not as a bin index
        features[f'{col}_dominant_freq'] = freqs[1:][np.argmax(fft_vals[1:])]

    return features

Result: 100+ features per time window!
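Two details of the frequency features are easy to miss. `np.argmax` over the `rfft` magnitudes returns a bin index; converting it to Hz needs the sampling rate, because the bin spacing is fs / window length (with 100-sample windows at 100 Hz the index happens to equal Hz, but not in general). And for accelerometer channels, the gravity offset puts most of the energy in the DC bin, so a raw argmax often just returns 0. A small sketch on a synthetic signal, assuming a 100 Hz sampling rate; `dominant_frequency_hz` is a hypothetical helper:

```python
import numpy as np

def dominant_frequency_hz(signal, fs=100.0):
    """Frequency (Hz) of the strongest non-DC component of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean (e.g. the gravity offset) so the DC bin does not dominate
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)  # bin centers in Hz
    return freqs[np.argmax(spectrum)]

# Synthetic 2-second accelerometer trace: gravity offset plus a 2 Hz oscillation
fs = 100.0
t = np.arange(200) / fs
accel = 9.81 + 0.5 * np.sin(2 * np.pi * 2.0 * t)

print(dominant_frequency_hz(accel, fs))       # 2.0
print(np.argmax(np.abs(np.fft.rfft(accel))))  # 0 (raw argmax lands on the DC bin)
```

With 200 samples at 100 Hz the bins are 0.5 Hz apart, so the 2 Hz peak sits in bin 4; reporting `freqs[...]` instead of the index keeps the feature meaningful if the window length changes.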

Step 3: Sliding Window Approach

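A single reading says little about the activity being performed, so we split the continuous stream into fixed-length, overlapping windows and compute features per window: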

def create_windows(df, window_size=100, step_size=50, label_col='activity'):
    """
    Create overlapping windows from continuous time-series

    Args:
        window_size: Number of samples per window (1 second at 100Hz)
        step_size: Stride between window starts (50 -> 50% overlap)
        label_col: Name of the activity label column (assumes named columns)
    """
    windows = []
    labels = []

    # Only sensor channels go into the features: passing the label column
    # to extract_features would leak the answer (drop the timestamp as well)
    sensor_cols = [c for c in df.columns if c != label_col]

    for start_idx in range(0, len(df) - window_size, step_size):
        end_idx = start_idx + window_size
        window = df.iloc[start_idx:end_idx]

        # Extract features from the sensor channels only
        features = extract_features(window[sensor_cols])
        windows.append(features)

        # Label is the mode of activities in window
        label = window[label_col].mode()[0]
        labels.append(label)

    return pd.DataFrame(windows), np.array(labels)

# Create dataset
X, y = create_windows(df_clean, window_size=100, step_size=50)
print(f"Created {len(X):,} windows")
print(f"Feature dimension: {X.shape[1]}")

Output:

Created 57,220 windows
Feature dimension: 108
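The LSTM and the hybrid model below consume raw window tensors (`X_sequences`) rather than feature rows. One way to build them with the same window and step settings is NumPy's `sliding_window_view` (NumPy 1.20 or later). This is a sketch: `make_sequence_windows` is a hypothetical helper, labels are assumed to be non-negative integers, and each window gets a majority-vote label like in `create_windows`. Row *i* of `X_seq` lines up with row *i* of the feature table only if both use the same window starts.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_sequence_windows(values, labels, window_size=100, step_size=50):
    """Cut a (n_samples, n_channels) array into overlapping raw windows.

    Returns X_seq with shape (n_windows, window_size, n_channels) and one
    majority-vote label per window, where
    n_windows = (n_samples - window_size) // step_size + 1.
    Labels must be non-negative integers (np.bincount requirement).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    # View of shape (n_samples - window_size + 1, n_channels, window_size); keep every step_size-th
    win = sliding_window_view(values, window_shape=window_size, axis=0)[::step_size]
    X_seq = np.ascontiguousarray(win.transpose(0, 2, 1))
    label_win = sliding_window_view(labels, window_shape=window_size)[::step_size]
    # Majority vote; ties go to the smallest label, like pandas .mode()[0]
    y_win = np.array([np.bincount(w).argmax() for w in label_win])
    return X_seq, y_win

# Synthetic check: 1000 samples, 3 channels, label switches from 0 to 1 at sample 500
values = np.arange(3000, dtype=float).reshape(1000, 3)
labels = np.repeat([0, 1], 500)
X_seq, y_win = make_sequence_windows(values, labels)
print(X_seq.shape)  # (19, 100, 3)
```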

Model Development

Approach 1: Random Forest (Baseline)

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train Random Forest
rf_model = RandomForestClassifier(
    n_estimators=200,
    max_depth=30,
    min_samples_split=5,
    min_samples_leaf=2,
    n_jobs=-1,
    random_state=42
)

rf_model.fit(X_train, y_train)

# Evaluate
y_pred = rf_model.predict(X_test)
print(f"Random Forest Accuracy: {(y_pred == y_test).mean():.3f}")

Results:

Random Forest Accuracy: 0.912 (91.2%)
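One caveat about this evaluation: with 50% overlap, neighboring windows share half their samples, and a random `train_test_split` puts many such near-duplicates on both sides of the split, which can make test accuracy optimistic. A stricter check holds out whole subjects. Below is a minimal sketch with scikit-learn's `GroupShuffleSplit`, assuming you keep a per-window subject ID (the hypothetical `window_subjects`, e.g. taken from the subject file each window came from). scikit-learn's `LeaveOneGroupOut` gives the full leave-one-subject-out variant.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def subject_wise_split(X, y, subjects, test_size=0.2, random_state=42):
    """Split so that no subject appears in both train and test.

    With a float test_size, GroupShuffleSplit holds out that fraction of *subjects*.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(X, y, groups=subjects))
    return train_idx, test_idx

# Synthetic example: 9 subjects (as in PAMAP2), 40 windows each
rng = np.random.default_rng(0)
window_subjects = np.repeat(np.arange(101, 110), 40)  # hypothetical subject IDs
X_demo = rng.normal(size=(len(window_subjects), 5))
y_demo = rng.integers(0, 3, size=len(window_subjects))

train_idx, test_idx = subject_wise_split(X_demo, y_demo, window_subjects)
print(sorted(set(window_subjects[test_idx].tolist())))  # the held-out subjects
```

Expect accuracy under a subject-wise split to be lower than under a random split; the gap tells you how much the model relies on having seen the same person during training.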

Approach 2: LSTM for Temporal Patterns

Random Forest treats each window independently. LSTMs can capture temporal dependencies.
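One preparation step before training: `nn.CrossEntropyLoss` expects targets in `0..num_classes-1`, while PAMAP2 activity IDs are not contiguous (rope jumping, for example, is ID 24), and ID 0 marks transient periods that the dataset documentation says to discard. A minimal sketch of the re-encoding; `encode_activity_ids` and the variable names are illustrative:

```python
import numpy as np

def encode_activity_ids(activity_ids):
    """Drop transient windows (ID 0) and map the remaining IDs to 0..K-1.

    Returns (keep_mask, classes, y_encoded); classes[k] is the original ID for index k.
    """
    activity_ids = np.asarray(activity_ids)
    keep_mask = activity_ids != 0
    classes, y_encoded = np.unique(activity_ids[keep_mask], return_inverse=True)
    return keep_mask, classes, y_encoded

keep, classes, y_enc = encode_activity_ids([24, 1, 0, 7, 1])
print(classes.tolist())  # [1, 7, 24]
print(y_enc.tolist())    # [2, 0, 1, 0]
```

Apply `keep` to the feature rows and sequences as well, and keep `classes` around to translate predictions back to activity IDs.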

import torch
import torch.nn as nn

class ActivityLSTM(nn.Module):
    def __init__(self, input_size, hidden_size=128, num_layers=2, num_classes=18):
        super().__init__()

        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.3
        )

        self.fc = nn.Sequential(
            nn.Linear(hidden_size, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )

    def forward(self, x):
        # x shape: (batch, seq_len, features)
        lstm_out, _ = self.lstm(x)

        # Take last timestep
        last_output = lstm_out[:, -1, :]

        # Classification
        logits = self.fc(last_output)
        return logits

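# Training loop
from torch.utils.data import DataLoader, TensorDataset

# X_train_sequences: float array of shape (n_windows, window_size, n_channels);
# the timestamp and activity-label columns must not be among the channels.
# y_train: labels encoded as 0..17 (CrossEntropyLoss needs contiguous class indices).
# batch_size=64 is an assumed value.
train_loader = DataLoader(
    TensorDataset(torch.as_tensor(X_train_sequences, dtype=torch.float32),
                  torch.as_tensor(y_train, dtype=torch.long)),
    batch_size=64,
    shuffle=True,
)

# input_size must equal the number of channels per time step
model = ActivityLSTM(input_size=X_train_sequences.shape[2], hidden_size=128)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    model.train()
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()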

LSTM Results:

LSTM Accuracy: 0.887 (88.7%)

Wait, the LSTM performs worse than the Random Forest? Why?


The Winning Approach: Ensemble Model

Insight: Combine Both Approaches

  • Random Forest: Great at capturing complex feature interactions
  • LSTM: Good at temporal patterns but struggles with high-dimensional features

Solution: Use both!

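import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier


class HybridActivityClassifier:
    """Soft-voting ensemble: Random Forest on engineered features + LSTM on raw sequences."""

    def __init__(self, input_size=54, rf_weight=0.7):
        self.rf_model = RandomForestClassifier(n_estimators=200)
        self.lstm_model = ActivityLSTM(input_size=input_size)
        self.rf_weight = rf_weight  # RF gets more weight, based on validation

    def fit(self, X_features, X_sequences, y):
        """
        X_features: Engineered features for Random Forest
        X_sequences: Raw sequences for LSTM, shape (n_windows, window_size, n_channels)
        y: Labels encoded as 0..num_classes-1, so the RF probability columns
           and the LSTM logits refer to the same classes in the same order
        """
        # Train Random Forest on engineered features
        self.rf_model.fit(X_features, y)

        # Train LSTM on raw sequences (train_lstm: a helper wrapping the training loop above)
        train_lstm(self.lstm_model, X_sequences, y)
        return self

    def _lstm_proba(self, X_sequences):
        # nn.Module has no predict_proba: apply softmax to the logits ourselves
        self.lstm_model.eval()
        with torch.no_grad():
            logits = self.lstm_model(torch.as_tensor(X_sequences, dtype=torch.float32))
        return torch.softmax(logits, dim=1).numpy()

    def predict_proba(self, X_features, X_sequences):
        rf_probs = self.rf_model.predict_proba(X_features)
        lstm_probs = self._lstm_proba(X_sequences)
        # Weighted average of class probabilities (soft voting)
        return self.rf_weight * rf_probs + (1 - self.rf_weight) * lstm_probs

    def predict(self, X_features, X_sequences):
        ensemble_probs = self.predict_proba(X_features, X_sequences)
        # Map column indices back to class labels
        return self.rf_model.classes_[np.argmax(ensemble_probs, axis=1)]


# Train ensemble
hybrid_model = HybridActivityClassifier()
hybrid_model.fit(X_train_features, X_train_sequences, y_train)

# Evaluate
y_pred = hybrid_model.predict(X_test_features, X_test_sequences)
print(f"Ensemble Accuracy: {(y_pred == y_test).mean():.3f}")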

Ensemble Results:

Ensemble Accuracy: 0.942 (94.2%)

A 3-percentage-point improvement over Random Forest alone (91.2% → 94.2%), cutting the error rate from 8.8% to 5.8%!
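The 0.7/0.3 weighting is a tunable hyperparameter. One simple way to pick it is a grid search over held-out validation predictions. A sketch, assuming both probability arrays have their columns in the same class order and `y_val` uses that encoding; `choose_ensemble_weight` is a hypothetical helper and the toy numbers are made up:

```python
import numpy as np

def choose_ensemble_weight(rf_probs, lstm_probs, y_val, grid=None):
    """Grid-search the RF weight w that maximizes validation accuracy of
    w * rf_probs + (1 - w) * lstm_probs (ties go to the smaller w)."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    y_val = np.asarray(y_val)
    best_w, best_acc = None, -1.0
    for w in grid:
        pred = np.argmax(w * rf_probs + (1 - w) * lstm_probs, axis=1)
        acc = float(np.mean(pred == y_val))
        if acc > best_acc:
            best_w, best_acc = float(w), acc
    return best_w, best_acc

# Toy validation predictions (3 windows, 2 classes): neither model alone
# gets all 3 right, but a blend does
rf_val = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
lstm_val = np.array([[0.4, 0.6], [0.3, 0.7], [0.1, 0.9]])
y_val = np.array([0, 1, 1])

best_w, best_acc = choose_ensemble_weight(rf_val, lstm_val, y_val)
print(best_w, best_acc)
```

The weight should come from a validation split that is separate from the test set; tuning it on test data would inflate the reported accuracy.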


Model Analysis

Confusion Matrix

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Compute confusion matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_test, y_pred)

# Plot; activity_names must list display names in sorted label order,
# matching the row/column order of cm
plt.figure(figsize=(12, 10))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=activity_names,
            yticklabels=activity_names)
plt.title('Confusion Matrix - Hybrid Model')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=300)

Per-Class Performance

| Activity | Precision | Recall | F1-Score |
|---|---|---|---|
| Walking | 96% | 98% | 0.97 |
| Running | 99% | 97% | 0.98 |
| Cycling | 92% | 90% | 0.91 |
| Sitting | 94% | 96% | 0.95 |
| Standing | 89% | 87% | 0.88 |
| Watching TV | 91% | 93% | 0.92 |
| Average | 94% | 94% | 0.94 |
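The per-class numbers follow directly from the confusion matrix: with rows as true labels and columns as predictions, precision is the diagonal divided by the column sums, recall is the diagonal divided by the row sums, and F1 is their harmonic mean. A small NumPy sketch on a toy 2-class matrix (scikit-learn's `classification_report` reports the same quantities):

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix (rows = true, cols = predicted).

    A class that is never predicted has a zero column sum, which yields nan precision.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm_toy = [[8, 2],
          [1, 9]]
precision, recall, f1 = per_class_metrics(cm_toy)
print(recall)  # [0.8 0.9]
```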

Observations:

  • ✅ Dynamic activities (walking, running) are the easiest to classify
  • ⚠️ Static activities (sitting, standing) are confused more often
  • ⚠️ Similar activities (e.g., walking vs. Nordic walking) have lower precision

Feature Importance

# Get feature importance from Random Forest
importances = rf_model.feature_importances_
feature_names = X_train.columns

# Sort and plot top 20
indices = np.argsort(importances)[::-1][:20]

plt.figure(figsize=(10, 6))
plt.bar(range(20), importances[indices])
plt.xticks(range(20), [feature_names[i] for i in indices], rotation=45, ha='right')
plt.title('Top 20 Most Important Features')
plt.tight_layout()
plt.savefig('feature_importance.png', dpi=300)

Key Insights:

  • 🥇 Hand accelerometer features are most important
  • 🥈 Heart rate is surprisingly useful
  • 🥉 Gyroscope helps distinguish rotation-heavy activities
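To check claims like "hand accelerometer features matter most" at the sensor level rather than feature by feature, you can sum the Random Forest importances by name prefix. A sketch assuming feature names start with the sensor location (e.g. `hand_acc16_x_mean`, a hypothetical naming scheme); `importance_by_prefix` is illustrative. Keep in mind that impurity-based importances can be biased toward high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import numpy as np
import pandas as pd

def importance_by_prefix(feature_names, importances, sep='_'):
    """Sum feature importances by the first token of each feature name."""
    s = pd.Series(np.asarray(importances, dtype=float), index=list(feature_names))
    prefixes = np.array([str(name).split(sep)[0] for name in s.index])
    return s.groupby(prefixes).sum().sort_values(ascending=False)

# Toy values; with the real model you would pass X_train.columns and rf_model.feature_importances_
names = ['hand_acc16_x_mean', 'hand_gyro_y_std', 'chest_acc16_z_mean', 'heart_rate_mean']
scores = [0.4, 0.2, 0.3, 0.1]
result = importance_by_prefix(names, scores)
print(result)
```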

Deployment: Real-Time Activity Recognition

Building the Streamlit App

import streamlit as st
import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt

# Load model
@st.cache_resource
def load_model():
    with open('activity_model.pkl', 'rb') as f:
        return pickle.load(f)

model = load_model()

# App title
st.title('🏃 Real-Time Activity Recognition')
st.write('Upload sensor data or connect to live sensor stream')

# File upload
uploaded_file = st.file_uploader("Upload sensor data (CSV)", type='csv')

if uploaded_file is not None:
    # Read data
    df = pd.read_csv(uploaded_file)

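    # Preprocess (preprocess_sensor_data is the project's own helper that builds the
    # engineered features and raw windows the same way as for training; not shown here)
    with st.spinner('Processing sensor data...'):
        X_features, X_sequences = preprocess_sensor_data(df)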

        # Predict
        predictions = model.predict(X_features, X_sequences)
        probabilities = model.predict_proba(X_features, X_sequences)

    # Display results
    st.subheader('Detected Activities')

    # Activity timeline
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(predictions)
    ax.set_xlabel('Time Window')
    ax.set_ylabel('Activity')
    ax.set_title('Activity Over Time')
    st.pyplot(fig)

    # Activity distribution
    st.subheader('Activity Distribution')
    activity_counts = pd.Series(predictions).value_counts()
    st.bar_chart(activity_counts)

    # Confidence scores
    st.subheader('Model Confidence')
    avg_confidence = np.max(probabilities, axis=1).mean()
    st.metric("Average Confidence", f"{avg_confidence:.1%}")

    # SHAP explainability
    st.subheader('Feature Importance (SHAP)')
    import shap
    explainer = shap.TreeExplainer(model.rf_model)
    shap_values = explainer.shap_values(X_features[:100])

    fig, ax = plt.subplots()
    shap.summary_plot(shap_values, X_features[:100], show=False)
    st.pyplot(fig)
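The app processes an uploaded file in one batch. For a live sensor stream, the same 100-sample window with a 50-sample step can be maintained incrementally: keep the most recent samples in a fixed-size buffer and emit a window every `step_size` new samples. A minimal sketch (the class name is hypothetical); each emitted window would then go through the same feature extraction and model as the offline windows.

```python
from collections import deque

import numpy as np

class SlidingWindowBuffer:
    """Collects streaming samples and emits a window every `step_size` samples once full."""

    def __init__(self, window_size=100, step_size=50):
        self.window_size = window_size
        self.step_size = step_size
        self.buffer = deque(maxlen=window_size)  # oldest samples drop out automatically
        self._since_last = 0

    def push(self, sample):
        """Add one sample (1-D array of channel values).

        Returns a (window_size, n_channels) array when a window is ready, else None.
        """
        self.buffer.append(np.asarray(sample, dtype=float))
        self._since_last += 1
        if len(self.buffer) == self.window_size and self._since_last >= self.step_size:
            self._since_last = 0
            return np.stack(self.buffer)
        return None

# Simulated stream: 300 samples of a single channel
buf = SlidingWindowBuffer(window_size=100, step_size=50)
stream = np.arange(300, dtype=float).reshape(300, 1)
windows = [w for w in (buf.push(x) for x in stream) if w is not None]
print(len(windows))  # 5
```

At 100 Hz this yields a new prediction every 0.5 s, each based on the most recent 1 s of data.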

Demo

Try it live: https://data230.streamlit.app/
Heads up: the app may be asleep after a period of inactivity, so you might need to wake it up first.


Lessons Learned

What Worked Well

  1. Feature Engineering is King: Hand-crafted features beat raw sequences
  2. Ensemble Methods: Combining different approaches gives best results
  3. Domain Knowledge: Understanding sensor physics helps feature design

Challenges

  1. Imbalanced Classes: Some activities had 10x more samples than others
    • Solution: Used stratified sampling and class weights
  2. Sensor Placement: Hand sensor was most informative, but not always practical
    • Future: Test with phone-only sensors
  3. Real-Time Processing: Sliding windows cause latency
    • Solution: Reduced window overlap for deployment
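For the class weights mentioned above, a common choice is "balanced" weighting, n_samples / (n_classes × class_count), which is the formula scikit-learn uses for `class_weight='balanced'`. A NumPy sketch (`balanced_class_weights` is a hypothetical helper); the resulting dict could be passed to `RandomForestClassifier(class_weight=...)`, or, as a tensor ordered by class index, to `nn.CrossEntropyLoss(weight=...)`:

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

weights = balanced_class_weights([0, 0, 0, 1])
print(weights)  # class 1 (3x rarer) gets 3x the weight of class 0
```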

Future Improvements

  • [ ] Add more subjects for generalization
  • [ ] Test with smartphone sensors (accelerometer + gyroscope only)
  • [ ] Implement online learning for personalization
  • [ ] Optimize for mobile deployment

Conclusion

This project demonstrates a complete ML pipeline from raw sensor data to deployed application. Key takeaways:

  1. Data quality matters: Proper preprocessing and feature engineering are crucial
  2. Model selection: Sometimes simpler models (Random Forest) beat deep learning
  3. Ensemble power: Combining approaches gives the best results
  4. Deployment: Real-world constraints (latency, resources) influence design

The complete code is available on GitHub: https://github.com/sriram2930/Physical-Activity-Prediction-Using-ML-methods-


Code Repository

activity-recognition/
├── data/
│   ├── PAMAP2_Dataset/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_model_training.ipynb
│   └── 04_model_evaluation.ipynb
├── src/
│   ├── preprocessing.py
│   ├── features.py
│   ├── models.py
│   └── utils.py
├── app/
│   └── streamlit_app.py
├── requirements.txt
└── README.md

Want to learn more? Check out my other posts:

  • Building Real-Time Object Detection for Edge Devices
  • Fine-Tuning BERT for Sentiment Analysis
  • Hybrid Movie Recommendation Systems

Questions? Reach out at sreeramachutuni@gmail.com
