We ran 216 RoboTurk robot teleoperation episodes through a physics
checker. 21.9% failed.
> "This is not synthetic data. These are publicly used datasets."
Not mislabeled. Not missing values. Physically impossible motion —
data that violates Newton's laws, rigid-body kinematics, or IMU
internal consistency.
These episodes were being used to train robot arms.
## What we checked
Seven biomechanical laws validated per window:
- Newton's Second Law (F = ma coupling)
- Segment resonance frequency
- Rigid body kinematics
- Jerk bounds (human motion ≤ 500 m/s³, Flash & Hogan 1985)
- IMU internal consistency
- Ballistocardiography
- Joule heating (EMG + thermal)
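To make one of these checks concrete, here is a minimal sketch of a jerk-bound test: differentiate acceleration once and compare the peak magnitude against the ~500 m/s³ limit from Flash & Hogan (1985). This is an illustration of the idea, not the tool's actual implementation, and the function names are ours.

```python
import numpy as np

def max_jerk(accel, fs):
    """Peak jerk magnitude (m/s^3) for an (N, 3) acceleration array sampled at fs Hz."""
    jerk = np.diff(accel, axis=0) * fs  # finite-difference time derivative
    return float(np.max(np.linalg.norm(jerk, axis=1)))

def passes_jerk_bound(accel, fs, bound=500.0):
    """Human motion rarely exceeds ~500 m/s^3 (Flash & Hogan, 1985)."""
    return max_jerk(accel, fs) <= bound
```

A smooth 1 Hz sinusoid at 100 Hz sampling passes easily; a single-sample spike of 100 m/s² produces a jerk of ~10,000 m/s³ and fails.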
Each window gets a score 0–100 and a tier:
GOLD / SILVER / BRONZE / REJECTED
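The score-to-tier mapping can be pictured as a simple threshold function. The cutoffs below are illustrative placeholders, not the tool's actual thresholds:

```python
def tier_from_score(score):
    """Map a 0-100 physics score to a certification tier.
    Thresholds here are illustrative; s2s-certify's real cutoffs may differ."""
    if score >= 90:
        return "GOLD"
    if score >= 75:
        return "SILVER"
    if score >= 50:
        return "BRONZE"
    return "REJECTED"
```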
## Results on real datasets
| Dataset | Result |
|---|---|
| RoboTurk Open-X (216 episodes) | 21.9% rejected as physically invalid |
| PAMAP2 (100Hz IMU) | +4.23% F1 after filtering corrupted windows |
| WESAD (stress classification) | +3.1% F1 improvement |
| UCI HAR | +2.51% F1 vs corrupted baseline |
| WISDM 2019 | +1.74% F1 improvement |
The F1 gains come from training the same classifier on certified-only
data versus on all data. Not a bigger model. Not more data. Just cleaner data.
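The "certified-only" training setup amounts to dropping REJECTED windows before fitting, keeping features, labels, and tiers aligned. A minimal sketch (our own helper, not part of the s2s-certify API):

```python
def filter_certified(X, y, tiers, keep=frozenset({"GOLD", "SILVER", "BRONZE"})):
    """Keep only windows whose certification tier is in `keep`.
    X, y, tiers are parallel sequences of windows, labels, and tiers."""
    pairs = [(x, label) for x, label, t in zip(X, y, tiers) if t in keep]
    if not pairs:
        return [], []
    X_kept, y_kept = zip(*pairs)
    return list(X_kept), list(y_kept)
```

The filtered `X`, `y` then feed into whatever classifier you were already using; nothing about the model changes.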
## Why this matters for Physical AI
When you train on images, bad data hurts accuracy.
When you train a robot arm, bad data teaches physically impossible
movement patterns. The arm learns to move like it has no mass.
A prosthetic hand trained on corrupted EMG data fails the person
wearing it. A humanoid robot trained on synthetic motion that violates
rigid-body kinematics learns to move like a cartoon.
There's no standard quality floor for motion data.
We built one.
## The tool
```shell
pip install s2s-certify
s2s-certify your_imu_data.csv --segment forearm
```
```python
from s2s_standard_v1_3 import S2SPipeline

# ts, acc, gyro: timestamp and sensor arrays from your recording
pipe = S2SPipeline(segment="forearm")
result = pipe.certify(imu_raw={
    "timestamps_ns": ts,
    "accel": acc,
    "gyro": gyro,
})
print(result["tier"])   # GOLD / SILVER / BRONZE / REJECTED
print(result["score"])  # 0-100
```
Zero runtime dependencies. 116/116 tests passing.
## Reference benchmark
We published a reproducible benchmark: 29 windows from NinaPro DB5,
PAMAP2, and WESAD — real data, not synthetic.
- real_human: 20/21 correctly certified (95%)
- corrupted: correctly rejected or downgraded
- Results: experiments/s2s_reference_benchmark.json
Anyone can run it and get the same numbers.
## What we found that surprised us
The most common failure mode in RoboTurk wasn't jerk violations.
It was IMU internal consistency — the translation and rotation
channels showed different teleoperation latency.
52% of rejected windows had this pattern.
This means the robot arm's internal sensors were decoupled during
recording — the motion looked valid to a human reviewer but the
physics said otherwise.
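One way to detect this kind of channel decoupling is to cross-correlate a translational signal (e.g. acceleration magnitude) against a rotational one (gyro magnitude) and look for a non-zero lag at the correlation peak. The sketch below is our own illustration of the idea, not the tool's consistency check:

```python
import numpy as np

def channel_lag_samples(accel_mag, gyro_mag):
    """Estimate how many samples the second signal lags the first,
    via the peak of the full cross-correlation. A sizable non-zero lag
    suggests the translation and rotation channels were recorded with
    different latencies."""
    a = accel_mag - accel_mag.mean()
    g = gyro_mag - gyro_mag.mean()
    corr = np.correlate(a, g, mode="full")
    # index (len(g) - 1) corresponds to zero lag
    return (len(g) - 1) - int(np.argmax(corr))
```

On a well-synchronized recording the estimated lag should sit near zero; a consistent offset of several samples is the "different teleoperation latency" signature described above.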
GitHub: https://github.com/timbo4u1/S2S
PyPI: pip install s2s-certify
DOI: 10.5281/zenodo.18878307
If you're working with IMU, EMG, or robot teleoperation data and
want to know what percentage of your dataset is physically valid —
run it and see.