PPO vs SAC: 1-GPU Memory & Compute Cost Benchmark

#reinforcementlearning #ppo #sac #gpumemory

SAC Uses 40% More VRAM Than PPO on the Same Task

I expected PPO to be the memory hog. It stores entire rollouts for its on-policy updates, while SAC only needs a replay buffer that can sit in CPU RAM. But when I profiled both algorithms training Humanoid-v4 on an RTX 3090, SAC consistently peaked at 4.2 GB of VRAM versus PPO's 3.0 GB.

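The culprit? In Stable Baselines3, SAC keeps five networks (an actor, twin Q-networks, and a target copy of each Q-network), three more than PPO's actor-critic pair. Automatic entropy tuning adds only a single learnable scalar on top of that.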
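
To make the network count concrete, here is a back-of-the-envelope sketch in plain Python. It rests on assumptions that are not stated in the article: Humanoid-v4's 376-dimensional observation and 17-dimensional action, and a separate two-layer, 256-unit MLP for every network. The helper names (`linear`, `trunk`, `ppo_params`, `sac_params`) are hypothetical, and the layout is an approximation of what the library builds, not a dump of its internals.

```python
# Back-of-the-envelope parameter count. Assumed, not taken from the article:
# Humanoid-v4 sizes (376-dim observation, 17-dim action) and separate
# two-layer, 256-unit MLPs for every network.
OBS_DIM, ACT_DIM, HIDDEN = 376, 17, 256


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def trunk(n_in: int) -> int:
    """Two hidden layers of HIDDEN units each."""
    return linear(n_in, HIDDEN) + linear(HIDDEN, HIDDEN)


def ppo_params() -> dict[str, int]:
    return {
        # Gaussian policy: mean head plus a state-independent log-std vector.
        "actor": trunk(OBS_DIM) + linear(HIDDEN, ACT_DIM) + ACT_DIM,
        "value": trunk(OBS_DIM) + linear(HIDDEN, 1),
    }


def sac_params() -> dict[str, int]:
    # Each Q-network scores an (observation, action) pair.
    q_net = trunk(OBS_DIM + ACT_DIM) + linear(HIDDEN, 1)
    return {
        # Mean and log-std heads both depend on the state.
        "actor": trunk(OBS_DIM) + 2 * linear(HIDDEN, ACT_DIM),
        "q1": q_net,
        "q2": q_net,
        "q1_target": q_net,
        "q2_target": q_net,
    }


if __name__ == "__main__":
    for name, nets in (("PPO", ppo_params()), ("SAC", sac_params())):
        print(f"{name}: {len(nets)} networks, {sum(nets.values()):,} parameters")
```

Counts like these cover weights only. Peak VRAM also includes activations, gradients, optimizer state, and the CUDA context, so a parameter count alone does not translate directly into gigabytes.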


The Experiment Setup

I ran both algorithms on three MuJoCo environments using Stable Baselines3 v2.3.0 and Gymnasium 0.29.1, on a single RTX 3090 (24 GB) with PyTorch 2.1 and CUDA 12.1. Both use the same network architecture where possible: a 2-layer MLP with 256 hidden units per layer.


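```python
import time

import gymnasium as gym
import psutil
import torch
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.vec_env import SubprocVecEnv


def make_env(env_id, seed):
    """Return a thunk that builds and seeds one environment, as SubprocVecEnv expects."""
    def _init():
        env = gym.make(env_id)
        env.reset(seed=seed)  # seed the environment's RNG for reproducibility
        return env
    return _init
```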
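
The excerpt does not show how peak VRAM was measured. One option, sketched here as an assumption and not as the author's method, is to poll `nvidia-smi` during training and keep the highest reading. The names `parse_memory_used_mib`, `query_memory_used_mib`, and `PeakTracker` are hypothetical helpers; the only external piece is the `nvidia-smi --query-gpu=memory.used` query, which reports device-wide usage in MiB.

```python
import shutil
import subprocess
from typing import Callable


def parse_memory_used_mib(smi_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's headerless CSV output."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def query_memory_used_mib() -> list[int]:
    """Return used VRAM per GPU in MiB, or [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used_mib(result.stdout)


class PeakTracker:
    """Keeps the highest reading seen for one GPU across repeated samples."""

    def __init__(
        self,
        read: Callable[[], list[int]] = query_memory_used_mib,
        gpu_index: int = 0,
    ):
        self.read = read
        self.gpu_index = gpu_index
        self.peak_mib = 0

    def sample(self) -> int:
        """Take one reading and return the peak observed so far, in MiB."""
        readings = self.read()
        if self.gpu_index < len(readings):
            self.peak_mib = max(self.peak_mib, readings[self.gpu_index])
        return self.peak_mib
```

Calling `tracker.sample()` every few hundred steps, for example from a training callback, gives a rough peak. Because the reading is device-wide, it includes the CUDA context and any other process on the GPU, so it will be higher than PyTorch's `torch.cuda.max_memory_allocated()`, which only counts tensors held by the caching allocator.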


---

*Continue reading the full article on [TildAlice](https://tildalice.io/ppo-vs-sac-1-gpu-memory-compute-benchmark/)*
