DEV Community

timbo4u


I Built a Physics Certification Layer for Motion Data — Here's What I Found

Update March 9, 2026: Since this post went up I found and fixed two real bugs in the physics engine: the jerk calculation was taking four derivatives instead of three, and gravity bias wasn't being removed before the jerk calculation. Both are fixed. I also ran hybrid model experiments — the results below are updated with the new numbers.


TL;DR: I trained a classifier on robot motion data and kept getting weird failures. The data looked fine. It wasn't fine. So I wrote a tool that checks whether sensor data actually obeys the laws of physics before you train on it. Here's what I learned.

The Problem Nobody Talks About

When you train a model on images or text, bad data is annoying but recoverable — you clean it, re-label it, filter it. The model is usually forgiving.

When you train a physical AI system — a prosthetic hand, a robot arm, a rehabilitation exoskeleton — bad training data doesn't just hurt accuracy. It teaches the system physically impossible movement patterns. A prosthetic hand trained on corrupted EMG data fails the person wearing it. A humanoid robot trained on synthetic motion data that violates rigid-body kinematics learns to move like a cartoon.

The problem is that most motion datasets have no quality floor. They contain:

  • Synthetic data generated without real sensors (no actual physics coupling)
  • Corrupted recordings with dropped samples and sensor drift
  • Mislabeled actions where the label doesn't match the measured physics

And there's no standard way to detect any of this.

I decided to build one.

The Idea: Check the Physics, Not the Labels

Instead of asking "does this data look human?" (subjective, learnable by fakes), I asked: does this data obey the physical laws that govern human movement?

A real human arm moving through space has to satisfy:

  • Rigid body kinematics — accelerometer and gyroscope on the same limb must couple: a = α×r + ω×(ω×r). Two sensors on the same rigid body cannot disagree.
  • Jerk bounds — human motion minimizes jerk (third derivative of position). Flash & Hogan proved this in 1985. Superhuman jerk = sensor spike or synthetic artifact.
  • EMG-acceleration timing — muscle electrical activation precedes limb acceleration by ~75ms. If acceleration comes first, something is wrong.
  • Resonance frequency — human forearm tremor is 8–12Hz. Always. Vibrations at 40Hz = mechanical noise.
  • BCG heartbeat — a wrist IMU on a resting human shows the mechanical heartbeat signature at 1–3Hz. No signal = not a human.

These aren't heuristics. They're physics. You can't fake them without running a full skeletal simulation.
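To make one of these laws concrete, here's a minimal pure-Python sketch of a jerk-bound check. The function names and the 50 m/s³ threshold are mine for illustration, not the S2S implementation: difference the acceleration magnitude once (jerk is the derivative of acceleration), then compare the 95th percentile against a bound.

```python
def jerk_p95(accel, dt):
    """accel: list of [ax, ay, az] samples in m/s^2; dt: sample period in s.
    Returns the 95th-percentile jerk magnitude estimate in m/s^3."""
    mags = [(a[0] ** 2 + a[1] ** 2 + a[2] ** 2) ** 0.5 for a in accel]
    # Jerk estimate: first difference of acceleration magnitude over dt
    jerks = sorted(abs(mags[i + 1] - mags[i]) / dt for i in range(len(mags) - 1))
    return jerks[int(0.95 * (len(jerks) - 1))]

JERK_BOUND = 50.0  # m/s^3 — illustrative, not the actual S2S threshold

def passes_jerk_bound(accel, dt):
    return jerk_p95(accel, dt) < JERK_BOUND
```

A steady signal sails under the bound; a signal riddled with sample-to-sample spikes blows past it, which is exactly the "superhuman jerk" signature of sensor glitches and synthetic artifacts.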

What I Built: S2S

Pure Python, zero external dependencies, runs anywhere including embedded systems.

```python
from s2s_standard_v1_3.s2s_physics_v1_3 import PhysicsEngine

result = PhysicsEngine().certify(
    imu_raw={
        "timestamps_ns": [...],
        "accel": [...],   # [[ax, ay, az], ...] m/s²
        "gyro": [...],    # [[gx, gy, gz], ...] rad/s
    },
    segment="forearm"
)

print(result['tier'])               # GOLD / SILVER / BRONZE / REJECTED
print(result['physical_law_score']) # 0–100
print(result['laws_passed'])        # ['rigid_body_kinematics', 'jerk_bounds', ...]
```
```shell
pip install s2s-certify
```

The Result That Surprised Me

Real iPhone 11 IMU data (37 seconds of natural hand movement) versus synthetic data generated to look similar:

| Metric | Real Human | Synthetic |
| --- | --- | --- |
| Rigid body coupling r | 0.35 | -0.01 |
| Jerk P95 (m/s³) | 25.8 | 54.0 |
| Resonance (Hz) | 5.4 | 13.3 |
| Physics score | 69/100 | 53/100 |
| Certification tier | SILVER | BRONZE |

r=0.35 (real) vs r=-0.01 (synthetic) — physics alone, no labels, no training required.

Applied to 10,360 windows from UCI HAR + PAMAP2: 9,050 certified (87.4%), 1,310 rejected for physics violations. Those 1,310 windows aren't low quality — they're physically impossible.
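Using the tiers as a hard filter before training is a few lines of glue. A minimal sketch, where `certify_window` is a hypothetical stand-in for calling `PhysicsEngine().certify` on one window (the helper name and accepted-tier set are mine; the tier labels match the ones above):

```python
ACCEPTED_TIERS = {"GOLD", "SILVER", "BRONZE"}

def filter_by_tier(windows, certify_window):
    """Split windows into physics-certified and rejected sets.

    certify_window: callable returning a dict with a 'tier' key,
    standing in for PhysicsEngine().certify on a single window.
    """
    kept, rejected = [], []
    for w in windows:
        tier = certify_window(w)["tier"]
        (kept if tier in ACCEPTED_TIERS else rejected).append(w)
    return kept, rejected
```

On the datasets above this kind of filter is what produced the 87.4% / 12.6% split.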

Level 4: Where It Gets Interesting

The single-sensor laws are powerful, but the most interesting result came from kinematic chain consistency across multiple sensors.

PAMAP2 has 3 IMUs — hand, chest, ankle. These sensors don't just need to look right individually. They have to be consistent with each other at the physics level:

  • Ankle leads chest in jerk timing by 50–100ms (force propagates up the skeleton)
  • Dominant locomotion frequency must agree across all three sensors
  • Coupling between segments must respect joint constraints

A synthetic generator can fool single-sensor checks by learning the right statistics. It cannot fake cross-sensor timing without running a complete rigid-body skeletal simulation.
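The timing check itself can be sketched in a few lines. This is my simplified illustration, not the S2S implementation: slide the chest jerk signal against the ankle jerk signal, take the lag with the highest plain dot-product correlation, and require it to land in the 50–100 ms window quoted above.

```python
def best_lag_ms(ankle_jerk, chest_jerk, fs, max_lag=None):
    """Delay (ms) at which chest best matches a shifted ankle signal.
    fs is the sample rate in Hz; signals are jerk magnitudes."""
    n = min(len(ankle_jerk), len(chest_jerk))
    max_lag = max_lag if max_lag is not None else n // 4
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag):
        # Correlate ankle[i] with chest[i + lag]: chest delayed by `lag`
        corr = sum(ankle_jerk[i] * chest_jerk[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag * 1000.0 / fs

def passes_chain_timing(ankle_jerk, chest_jerk, fs):
    """Ankle should lead chest by 50-100 ms (force propagates up the skeleton)."""
    return 50.0 <= best_lag_ms(ankle_jerk, chest_jerk, fs) <= 100.0
```

A generator that matches per-sensor statistics but emits all three streams in lockstep fails this check immediately, because its best lag sits at 0 ms.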

Results on PAMAP2 (12 activity classes):

| Method | F1 Score |
| --- | --- |
| Single chest IMU baseline | 0.7969 |
| Multi-sensor naive concat | 0.8308 (+3.39%) |
| S2S kinematic chain filter | 0.8399 (+0.91% over concat) |
| Net vs single sensor | +4.23% |

Level 5: Physics Features vs Raw IMU — What Actually Happened

This is where I expected a clean win and got a more honest result instead.

I extracted 19 physics features per window (jerk bounds, resonance confidence, rigid body coupling, etc.) and trained a Random Forest to classify walk/sit/run on 63,314 windows from 22 subjects.

Results:

| Model | Accuracy | Features |
| --- | --- | --- |
| Raw IMU | 79.6% | 768 |
| Physics only | 70.5% | 19 |
| Hybrid (raw + physics) | 83.7% | 787 |

The hybrid beats raw IMU alone by +4.13%. Physics adds signal that raw data misses.

But here's the honest part: only 2 of the 19 physics features actually contributed — rigid_rms_measured and resonance_peak_energy. The other 17 had zero importance.

That's not a failure. Those 17 features were designed to detect corrupted vs valid data, not to classify activities. They're doing their job — just not the job I asked them to do in this experiment. The two that matter are measuring motion intensity and frequency content, which directly distinguish walk/sit/run.

The lesson: physics features are highly efficient (2.7x more predictive per feature than raw IMU) but you need to match the feature to the question being asked.
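Pruning by importance is mechanical once you have importances from the forest (e.g. scikit-learn's `feature_importances_`). A minimal sketch; the first two names are the features that actually mattered in my run, while `jerk_p95_margin` and all the importance values are illustrative placeholders:

```python
def select_features(names, importances, threshold=0.0):
    """Keep only features whose importance exceeds the threshold."""
    return [n for n, imp in zip(names, importances) if imp > threshold]

# Importance values are made up for illustration; in practice they come
# from a trained model such as RandomForestClassifier.feature_importances_.
names = ["rigid_rms_measured", "resonance_peak_energy", "jerk_p95_margin"]
importances = [0.21, 0.14, 0.0]
```

Running `select_features(names, importances)` on the real 19-feature set is what surfaced the 2-of-19 result above.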

Using the Physics Score as a Training Loss

You don't have to use S2S as a hard filter. Use the score as a soft loss term:

```python
# s2s_torch.py — physics-aware training loss
import torch
from s2s_standard_v1_3.s2s_physics_v1_3 import PhysicsEngine

class S2SPhysicsLoss(torch.nn.Module):
    def __init__(self, task_loss_fn, lambda_physics=0.1, segment="forearm"):
        super().__init__()
        self.task_loss = task_loss_fn
        self.lambda_physics = lambda_physics
        self.segment = segment
        self.engine = PhysicsEngine()

    def forward(self, predictions, targets, imu_batch):
        task_l = self.task_loss(predictions, targets)
        # Certify each raw sample. The scores are data-dependent constants:
        # no gradient flows through the physics engine itself.
        scores = []
        for sample in imu_batch:
            result = self.engine.certify(imu_raw=sample, segment=self.segment)
            scores.append(result['physical_law_score'] / 100.0)
        physics_scores = torch.tensor(scores, dtype=torch.float32)
        physics_penalty = (1.0 - physics_scores).mean()
        return task_l + self.lambda_physics * physics_penalty
```
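Stripped of the torch machinery, the penalty term is plain arithmetic: the task loss plus lambda times the mean physics deficit over the batch. A dependency-free sketch (the function name is mine, not part of s2s):

```python
def physics_penalized_loss(task_loss, scores, lam=0.1):
    """task_loss: scalar task loss for the batch.
    scores: per-sample physical_law_score values in [0, 100].
    Returns task_loss + lam * mean(1 - score/100)."""
    penalty = sum(1.0 - s / 100.0 for s in scores) / len(scores)
    return task_loss + lam * penalty
```

So a batch that certifies perfectly adds nothing, and a batch of physics violators pushes the loss up by at most `lam`.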

What's Next

The most useful thing right now: if you work with motion data for any application — robotics, prosthetics, sports science, rehab — try running your dataset through S2S and tell me what the rejection rate is. Every new dataset that gets certified (or fails interestingly) teaches something about what's actually in these benchmarks.

The open question I'm working on: which physics features matter for which problems? Rigid body coupling matters for activity classification. What matters for gait analysis? For prosthetic control? That's the research direction.

S2S — Physics-Certified Sensor Data

Physics-certified motion data for prosthetics, robotics, and Physical AI.

IMU sensor data is silently corrupted more often than people realize. S2S catches it using physics laws, not statistics. Proven on 5 real datasets. One line to install.



Live Demos

No install needed. All processing runs on your device. No data sent anywhere.


The Problem

Physical AI (robots, prosthetics, exoskeletons) is trained on motion data. But most datasets contain synthetic data that violates physics, corrupted recordings, and mislabeled actions — with no way to verify the data came from a real human moving in physically valid ways.

A robot trained on bad data learns bad motion. A prosthetic hand trained on uncertified data fails its…

BSL-1.1 license — free for research/education, converts to Apache 2.0 on 2028-01-01.

PyPI: pip install s2s-certify · DOI: 10.5281/zenodo.18878307

Top comments (1)

timbo4u
Thanks for the reaction, klement! Curious whether the physics-hard-filter approach is relevant to your agentic AI work at Netanel Systems — fault tolerance at the data layer rather than the model layer.