Most RL Tutorials Get the Algorithm Choice Backwards
Pick DQN for CartPole. Pick PPO for MuJoCo. That's the advice I see everywhere, and it's mostly wrong.
The real decision isn't continuous vs. discrete action spaces: it's how your reward structure interacts with value function approximation error. I spent two weeks running benchmarks on Gymnasium environments, and the results completely flipped my assumptions. DQN converged in 100k steps on LunarLander where PPO needed 500k. PPO hit a 95% success rate on Acrobot in 200k steps; DQN never broke 70%.
Here's what actually matters: whether your environment has a clear "good action" at each state (DQN wins) or requires exploring stochastic policies to find solutions (PPO wins). The action space type is just a proxy for this deeper difference.
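To make the "clear good action" case concrete, here is a minimal sketch using tabular Q-learning (the ancestor of DQN, swapped in here so the example stays dependency-free) on a hypothetical toy chain environment I made up for illustration: five states in a row, and moving right always leads toward the goal. Because every state has one unambiguously best action, a greedy value-based learner finds the optimal policy quickly.

```python
import random

# Toy chain MDP (illustrative, not a Gymnasium env): states 0..4,
# start at state 0, goal at state 4. Action 1 moves right, action 0
# moves left; reaching the goal pays +1, everything else pays 0.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the current value estimates
            if rng.random() < EPS:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # standard Q-learning update toward the bootstrapped target
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q = train()
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)  # [1, 1, 1, 1] -- the greedy policy is "always move right"
```

When the environment instead rewards only a stochastic mixture of actions, this greedy readout is exactly what fails, which is where a policy-gradient method like PPO earns its keep.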
Why Discrete Actions Amplify DQN's Strengths