DEV Community

timbo4u


I Built a Physics Certification Layer for Motion Data — Here's What I Found

Update March 9, 2026: Since this post went up I found and fixed two real bugs in the physics engine: the jerk calculation was taking four derivatives instead of three, and gravity bias wasn't being removed before the jerk calculation. Both are fixed. I also ran hybrid model experiments — the results below are updated with the new numbers.


TL;DR: I trained a classifier on robot motion data and kept getting weird failures. The data looked fine. It wasn't fine. So I wrote a tool that checks whether sensor data actually obeys the laws of physics before you train on it. Here's what I learned.

The Problem Nobody Talks About

When you train a model on images or text, bad data is annoying but recoverable — you clean it, re-label it, filter it. The model is usually forgiving.

When you train a physical AI system — a prosthetic hand, a robot arm, a rehabilitation exoskeleton — bad training data doesn't just hurt accuracy. It teaches the system physically impossible movement patterns. A prosthetic hand trained on corrupted EMG data fails the person wearing it. A humanoid robot trained on synthetic motion data that violates rigid-body kinematics learns to move like a cartoon.

The problem is that most motion datasets have no quality floor. They contain:

  • Synthetic data generated without real sensors (no actual physics coupling)
  • Corrupted recordings with dropped samples and sensor drift
  • Mislabeled actions where the label doesn't match the measured physics

And there's no standard way to detect any of this.

I decided to build one.

The Idea: Check the Physics, Not the Labels

Instead of asking "does this data look human?" (subjective, learnable by fakes), I asked: does this data obey the physical laws that govern human movement?

A real human arm moving through space has to satisfy:

  • Rigid body kinematics — accelerometer and gyroscope on the same limb must couple: a = α×r + ω×(ω×r). Two sensors on the same rigid body cannot disagree.
  • Jerk bounds — human motion minimizes jerk (third derivative of position). Flash & Hogan proved this in 1985. Superhuman jerk = sensor spike or synthetic artifact.
  • EMG-acceleration timing — muscle electrical activation precedes limb acceleration by ~75ms. If acceleration comes first, something is wrong.
  • Resonance frequency — human forearm tremor is 8–12Hz. Always. Vibrations at 40Hz = mechanical noise.
  • BCG heartbeat — a wrist IMU on a resting human shows the mechanical heartbeat signature at 1–3Hz. No signal = not a human.

These aren't heuristics. They're physics. You can't fake them without running a full skeletal simulation.
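To make one of these laws concrete, here's a minimal pure-Python sketch of a jerk-bound check. The function names and the 50 m/s³ threshold are mine for illustration, not the S2S implementation: difference the acceleration magnitude once (jerk is the derivative of acceleration), then compare the 95th percentile against a bound.

```python
def jerk_p95(accel, dt):
    """accel: list of [ax, ay, az] samples in m/s^2; dt: sample period in s.
    Returns the 95th-percentile jerk magnitude estimate in m/s^3."""
    mags = [(a[0] ** 2 + a[1] ** 2 + a[2] ** 2) ** 0.5 for a in accel]
    # Jerk estimate: first difference of acceleration magnitude over dt
    jerks = sorted(abs(mags[i + 1] - mags[i]) / dt for i in range(len(mags) - 1))
    return jerks[int(0.95 * (len(jerks) - 1))]

JERK_BOUND = 50.0  # m/s^3 — illustrative, not the actual S2S threshold

def passes_jerk_bound(accel, dt):
    return jerk_p95(accel, dt) < JERK_BOUND
```

A steady signal sails under the bound; a signal riddled with sample-to-sample spikes blows past it, which is exactly the "superhuman jerk" signature of sensor glitches and synthetic artifacts.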

What I Built: S2S

Pure Python, zero external dependencies, runs anywhere including embedded systems.

```python
from s2s_standard_v1_3.s2s_physics_v1_3 import PhysicsEngine

result = PhysicsEngine().certify(
    imu_raw={
        "timestamps_ns": [...],
        "accel": [...],   # [[ax, ay, az], ...] m/s²
        "gyro": [...],    # [[gx, gy, gz], ...] rad/s
    },
    segment="forearm"
)

print(result['tier'])               # GOLD / SILVER / BRONZE / REJECTED
print(result['physical_law_score']) # 0–100
print(result['laws_passed'])        # ['rigid_body_kinematics', 'jerk_bounds', ...]
```
```shell
pip install s2s-certify
```

The Result That Surprised Me

Real iPhone 11 IMU data (37 seconds of natural hand movement) versus synthetic data generated to look similar:

| Metric | Real Human | Synthetic |
| --- | --- | --- |
| Rigid body coupling r | 0.35 | -0.01 |
| Jerk P95 (m/s³) | 25.8 | 54.0 |
| Resonance (Hz) | 5.4 | 13.3 |
| Physics score | 69/100 | 53/100 |
| Certification tier | SILVER | BRONZE |

r=0.35 (real) vs r=-0.01 (synthetic) — physics alone, no labels, no training required.

Applied to 10,360 windows from UCI HAR + PAMAP2: 9,050 certified (87.4%), 1,310 rejected for physics violations. Those 1,310 windows aren't low quality — they're physically impossible.
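Using the tiers as a hard filter before training is a few lines of glue. A minimal sketch, where `certify_window` is a hypothetical stand-in for calling `PhysicsEngine().certify` on one window (the helper name and accepted-tier set are mine; the tier labels match the ones above):

```python
ACCEPTED_TIERS = {"GOLD", "SILVER", "BRONZE"}

def filter_by_tier(windows, certify_window):
    """Split windows into physics-certified and rejected sets.

    certify_window: callable returning a dict with a 'tier' key,
    standing in for PhysicsEngine().certify on a single window.
    """
    kept, rejected = [], []
    for w in windows:
        tier = certify_window(w)["tier"]
        (kept if tier in ACCEPTED_TIERS else rejected).append(w)
    return kept, rejected
```

On the datasets above this kind of filter is what produced the 87.4% / 12.6% split.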

Level 4: Where It Gets Interesting

The single-sensor laws are powerful, but the most interesting result came from kinematic chain consistency across multiple sensors.

PAMAP2 has 3 IMUs — hand, chest, ankle. These sensors don't just need to look right individually. They have to be consistent with each other at the physics level:

  • Ankle leads chest in jerk timing by 50–100ms (force propagates up the skeleton)
  • Dominant locomotion frequency must agree across all three sensors
  • Coupling between segments must respect joint constraints

A synthetic generator can fool single-sensor checks by learning the right statistics. It cannot fake cross-sensor timing without running a complete rigid-body skeletal simulation.
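The timing check itself can be sketched in a few lines. This is my simplified illustration, not the S2S implementation: slide the chest jerk signal against the ankle jerk signal, take the lag with the highest plain dot-product correlation, and require it to land in the 50–100 ms window quoted above.

```python
def best_lag_ms(ankle_jerk, chest_jerk, fs, max_lag=None):
    """Delay (ms) at which chest best matches a shifted ankle signal.
    fs is the sample rate in Hz; signals are jerk magnitudes."""
    n = min(len(ankle_jerk), len(chest_jerk))
    max_lag = max_lag if max_lag is not None else n // 4
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag):
        # Correlate ankle[i] with chest[i + lag]: chest delayed by `lag`
        corr = sum(ankle_jerk[i] * chest_jerk[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag * 1000.0 / fs

def passes_chain_timing(ankle_jerk, chest_jerk, fs):
    """Ankle should lead chest by 50-100 ms (force propagates up the skeleton)."""
    return 50.0 <= best_lag_ms(ankle_jerk, chest_jerk, fs) <= 100.0
```

A generator that matches per-sensor statistics but emits all three streams in lockstep fails this check immediately, because its best lag sits at 0 ms.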

Results on PAMAP2 (12 activity classes):

| Method | F1 Score |
| --- | --- |
| Single chest IMU baseline | 0.7969 |
| Multi-sensor naive concat | 0.8308 (+3.39%) |
| S2S kinematic chain filter | 0.8399 (+0.91% over concat) |
| Net vs single sensor | +4.23% |

Level 5: Physics Features vs Raw IMU — What Actually Happened

This is where I expected a clean win and got a more honest result instead.

I extracted 19 physics features per window (jerk bounds, resonance confidence, rigid body coupling, etc.) and trained a Random Forest to classify walk/sit/run on 63,314 windows from 22 subjects.

Results:

| Model | Accuracy | Features |
| --- | --- | --- |
| Raw IMU | 79.6% | 768 |
| Physics only | 70.5% | 19 |
| Hybrid (raw + physics) | 83.7% | 787 |

The hybrid beats raw IMU alone by +4.13%. Physics adds signal that raw data misses.

But here's the honest part: only 2 of the 19 physics features actually contributed — rigid_rms_measured and resonance_peak_energy. The other 17 had zero importance.

That's not a failure. Those 17 features were designed to detect corrupted vs valid data, not to classify activities. They're doing their job — just not the job I asked them to do in this experiment. The two that matter are measuring motion intensity and frequency content, which directly distinguish walk/sit/run.

The lesson: physics features are highly efficient (2.7x more predictive per feature than raw IMU) but you need to match the feature to the question being asked.
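Pruning by importance is mechanical once you have importances from the forest (e.g. scikit-learn's `feature_importances_`). A minimal sketch; the first two names are the features that actually mattered in my run, while `jerk_p95_margin` and all the importance values are illustrative placeholders:

```python
def select_features(names, importances, threshold=0.0):
    """Keep only features whose importance exceeds the threshold."""
    return [n for n, imp in zip(names, importances) if imp > threshold]

# Importance values are made up for illustration; in practice they come
# from a trained model such as RandomForestClassifier.feature_importances_.
names = ["rigid_rms_measured", "resonance_peak_energy", "jerk_p95_margin"]
importances = [0.21, 0.14, 0.0]
```

Running `select_features(names, importances)` on the real 19-feature set is what surfaced the 2-of-19 result above.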

Using the Physics Score as a Training Loss

You don't have to use S2S as a hard filter. Use the score as a soft loss term:

```python
# s2s_torch.py — physics-aware training loss
import torch
from s2s_standard_v1_3.s2s_physics_v1_3 import PhysicsEngine

class S2SPhysicsLoss(torch.nn.Module):
    def __init__(self, task_loss_fn, lambda_physics=0.1, segment="forearm"):
        super().__init__()
        self.task_loss = task_loss_fn
        self.lambda_physics = lambda_physics
        self.segment = segment
        self.engine = PhysicsEngine()

    def forward(self, predictions, targets, imu_batch):
        task_l = self.task_loss(predictions, targets)
        # Certify each raw sample. The scores are data-dependent constants:
        # no gradient flows through the physics engine itself.
        scores = []
        for sample in imu_batch:
            result = self.engine.certify(imu_raw=sample, segment=self.segment)
            scores.append(result['physical_law_score'] / 100.0)
        physics_scores = torch.tensor(scores, dtype=torch.float32)
        physics_penalty = (1.0 - physics_scores).mean()
        return task_l + self.lambda_physics * physics_penalty
```
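Stripped of the torch machinery, the penalty term is plain arithmetic: the task loss plus lambda times the mean physics deficit over the batch. A dependency-free sketch (the function name is mine, not part of s2s):

```python
def physics_penalized_loss(task_loss, scores, lam=0.1):
    """task_loss: scalar task loss for the batch.
    scores: per-sample physical_law_score values in [0, 100].
    Returns task_loss + lam * mean(1 - score/100)."""
    penalty = sum(1.0 - s / 100.0 for s in scores) / len(scores)
    return task_loss + lam * penalty
```

So a batch that certifies perfectly adds nothing, and a batch of physics violators pushes the loss up by at most `lam`.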

What's Next

The most useful thing right now: if you work with motion data for any application — robotics, prosthetics, sports science, rehab — try running your dataset through S2S and tell me what the rejection rate is. Every new dataset that gets certified (or fails interestingly) teaches something about what's actually in these benchmarks.

The open question I'm working on: which physics features matter for which problems? Rigid body coupling matters for activity classification. What matters for gait analysis? For prosthetic control? That's the research direction.

S2S — Physics-Certified Sensor Data

Physics-certified motion data for prosthetics, robotics, and Physical AI.

IMU sensor data is silently corrupted more often than people realize. S2S catches it using physics laws, not statistics. Proven on 5 real datasets. One line to install.



Live Demos

No install needed. All processing runs on your device. No data sent anywhere.


The Problem

Physical AI (robots, prosthetics, exoskeletons) is trained on motion data. But most datasets contain synthetic data that violates physics, corrupted recordings, and mislabeled actions — with no way to verify the data came from a real human moving in physically valid ways.

A robot trained on bad data learns bad motion. A prosthetic hand trained on uncertified data fails its…

BSL-1.1 license — free for research/education, converts to Apache 2.0 on 2028-01-01.

PyPI: pip install s2s-certify · DOI: 10.5281/zenodo.18878307

Top comments (1)

timbo4u
Thanks for the reaction, klement! Curious whether the physics-hard-filter approach is relevant to your agentic AI work at Netanel Systems — fault tolerance at the data layer rather than the model layer.