
TildAlice

Posted on • Originally published at tildalice.io

PPO vs SAC: Real Robot Benchmark on 3 Manipulation Tasks

The Simulation Trap

Most RL comparisons stop at MuJoCo. Clean physics, deterministic dynamics, unlimited resets. Then you deploy to hardware and PPO's variance suddenly matters. SAC's sample efficiency looks less impressive when each episode takes 4 minutes and your servo overheats after 100 trials.

I ran both algorithms on three real manipulation tasks: peg insertion, door opening, and cable routing. Same reward functions, same hyperparameters (within reason), same hardware budget. The results don't match what you'd expect from simulation benchmarks.

This isn't another "PPO is general-purpose, SAC handles continuous actions" post. This is what happens when your training loop includes motor calibration drift, inconsistent object placement, and a robot that needs 30 seconds to reset between episodes.


Hardware Setup and Why It Matters


Continue reading the full article on TildAlice
