Why Your First RL Algorithm Choice Costs You 10x Compute
Pick the wrong algorithm for continuous control and you'll burn through cloud credits before seeing a working policy. I've watched DQN (with a discretized action space, since it can't handle continuous actions natively) struggle on HalfCheetah for 48 hours while SAC converged in 4. The advice online is generic: "PPO is stable, SAC is sample-efficient, DQN is simple." But what does that actually mean when you're staring at a flat reward curve at 3am?
This benchmark measures wall-clock training time and sample efficiency across three MuJoCo continuous control tasks. Same hardware (M1 MacBook Pro, 16GB RAM), same total timesteps (1M), same network architecture where applicable. The goal: find out which algorithm gets you to a working policy fastest.
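Concretely, the measurement loop is just a wall-clock timer around each training run. A minimal sketch (the `dummy_train` stub and its names are hypothetical placeholders — a real run would call a library trainer, e.g. Stable-Baselines3's `model.learn()`):

```python
import time

TOTAL_TIMESTEPS = 1_000_000  # identical budget for every algorithm

def benchmark(train_fn, env_id, total_timesteps=TOTAL_TIMESTEPS):
    """Return wall-clock seconds for one training run."""
    start = time.perf_counter()
    train_fn(env_id, total_timesteps)
    return time.perf_counter() - start

def dummy_train(env_id, total_timesteps):
    # Stand-in trainer; a real run would build the env and train here,
    # e.g. SAC("MlpPolicy", env).learn(total_timesteps=total_timesteps)
    pass

elapsed = benchmark(dummy_train, "HalfCheetah-v4")
```

Holding the timestep budget, hardware, and network architecture fixed means the timer measures the algorithm, not the setup.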
The Setup: Leveling the Playing Field
I used Gymnasium 0.29.1 with MuJoCo 2.3.7 on three environments:
- HalfCheetah-v4: Run forward as fast as possible (6-dim action space)
- Hopper-v4: One-legged robot staying upright (3-dim action space)
- Ant-v4: Four-legged walker (8-dim action space)
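As a sanity check, a few lines can verify those action-space dimensions against the live environments (this assumes `gymnasium` with the MuJoCo extras installed; the sketch skips the live check gracefully when it isn't):

```python
# Expected action-space dimensions, per the Gymnasium MuJoCo env docs
EXPECTED_ACTION_DIMS = {"HalfCheetah-v4": 6, "Hopper-v4": 3, "Ant-v4": 8}

try:
    import gymnasium as gym
    for env_id, dim in EXPECTED_ACTION_DIMS.items():
        env = gym.make(env_id)
        # action_space is a Box; its shape is a 1-tuple of the action dim
        assert env.action_space.shape == (dim,), env_id
        env.close()
except ImportError:
    pass  # gymnasium/MuJoCo not installed; skip the live check
```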