SAC Uses 40% More VRAM Than PPO on the Same Task
I expected PPO to be the memory hog. It stores entire trajectories for on-policy updates, while SAC only needs a replay buffer that can sit on the CPU. But when I actually profiled both algorithms training Humanoid-v4 on an RTX 3090, SAC consistently peaked at 4.2 GB of VRAM versus PPO's 3.0 GB.
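The culprit? SAC carries three more neural networks than PPO's actor-critic pair: a second Q-network, plus a target copy of each Q-network. Automatic entropy tuning adds a learnable temperature on top of that, though it is a single scalar rather than a network.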
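A rough way to see where the extra networks come from is to count them. The sketch below assumes Humanoid-v4's 376-dimensional observations and 17-dimensional actions, two hidden layers of 256 units in every network, and a layer layout modeled on Stable Baselines3's default MLP policies. `mlp_params` and the dictionary keys are illustrative names, not part of any library.

```python
# Assumed dimensions: Humanoid-v4 observations (376) and actions (17),
# with two hidden layers of 256 units in every network.
OBS, ACT, HIDDEN = 376, 17, [256, 256]


def mlp_params(sizes):
    """Weights plus biases of a fully connected net with these layer sizes."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


# PPO: one actor and one critic. The actor also holds a log-std vector
# with one entry per action dimension.
ppo = {
    "actor": mlp_params([OBS, *HIDDEN, ACT]) + ACT,
    "critic": mlp_params([OBS, *HIDDEN, 1]),
}

# SAC: one actor with separate mean and log-std heads, twin Q-networks,
# and a target copy of each Q-network. The entropy coefficient is a
# single scalar, so it is not counted as a network here.
q_net = mlp_params([OBS + ACT, *HIDDEN, 1])
sac = {
    "actor": mlp_params([OBS, *HIDDEN]) + 2 * mlp_params([HIDDEN[-1], ACT]),
    "q1": q_net,
    "q2": q_net,
    "q1_target": q_net,
    "q2_target": q_net,
}

print(f"PPO: {len(ppo)} networks, {sum(ppo.values()):,} parameters")
print(f"SAC: {len(sac)} networks, {sum(sac.values()):,} parameters")
```

This counts weights only. Optimizer state and the activations kept for backpropagation are separate and grow with the number of networks being trained. The target copies are updated by Polyak averaging, so they carry no optimizer state of their own.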
The Experiment Setup
I ran both algorithms on three MuJoCo environments using Stable Baselines3 v2.3.0 and Gymnasium 0.29.1. All runs used a single RTX 3090 (24 GB) with PyTorch 2.1 and CUDA 12.1. Same network architecture where possible: a 2-layer MLP with 256 hidden units per layer.
```python
import time

import gymnasium as gym
import psutil
import torch
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.vec_env import SubprocVecEnv


def make_env(env_id, seed):
    """Return a thunk that builds a seeded environment, as SubprocVecEnv expects."""
    def _init():
        env = gym.make(env_id)
        env.reset(seed=seed)  # seed the environment's RNG on the first reset
        return env
    return _init
```
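For reference, here is a minimal sketch of one common way to read peak GPU memory in PyTorch. It is not necessarily how the numbers above were collected, and the helper name `peak_vram_mb` is hypothetical. It only uses `torch.cuda.reset_peak_memory_stats` and `torch.cuda.max_memory_allocated`, which report memory handed out by PyTorch's caching allocator, so the result reads lower than `nvidia-smi`, which also includes the CUDA context.

```python
import torch


def peak_vram_mb(fn, device=0):
    """Run fn() and return the peak memory (MiB) PyTorch allocated on `device`.

    Returns None when CUDA is unavailable (fn still runs).
    """
    if not torch.cuda.is_available():
        fn()
        return None
    torch.cuda.synchronize(device)              # finish pending kernels first
    torch.cuda.reset_peak_memory_stats(device)  # start the peak counter from here
    fn()
    torch.cuda.synchronize(device)              # wait for fn's GPU work to complete
    return torch.cuda.max_memory_allocated(device) / 1024**2
```

With a Stable Baselines3 model already constructed, a call might look like `peak_vram_mb(lambda: model.learn(total_timesteps=10_000))`, where the step count is an arbitrary placeholder.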
---
*Continue reading the full article on [TildAlice](https://tildalice.io/ppo-vs-sac-1-gpu-memory-compute-benchmark/)*
