Here’s a complete, consolidated guide to manually setting up vLLM on a Google Cloud VM with an NVIDIA L4 GPU, using Debian and pip. This includes GPU driver installation, Python environment setup, and vLLM deployment.
🧱 1. Create a VM with NVIDIA L4 GPU
- Go to Google Cloud Console
- Navigate to Compute Engine > VM instances
- Click Create Instance
- Choose:
  - Machine type: g2-standard-4 or higher (on Google Cloud, L4 GPUs attach to G2 machine types)
  - GPU: NVIDIA L4 (1 unit)
  - Boot disk: Debian 12 (Bookworm)
  - Disk size: at least 100 GB recommended
- Enable API access, SSH, and firewall rules if needed
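If you prefer the CLI, the same VM can be created with `gcloud`. This is a sketch: the instance name `vllm-l4`, the zone, and the disk size are example values you should adjust, and L4 availability varies by zone.

```shell
# Create a Debian 12 VM with one NVIDIA L4 on a G2 machine type.
# GPU VMs require --maintenance-policy=TERMINATE (no live migration).
gcloud compute instances create vllm-l4 \
  --zone=us-central1-a \
  --machine-type=g2-standard-4 \
  --accelerator=type=nvidia-l4,count=1 \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size=100GB \
  --maintenance-policy=TERMINATE
```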
⚙️ 2. Install System Dependencies
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential dkms linux-headers-$(uname -r)
🧩 3. Enable Non-Free Repositories
echo "deb http://deb.debian.org/debian bookworm main contrib non-free non-free-firmware" | sudo tee /etc/apt/sources.list.d/non-free.list
sudo apt update
🔧 4. Install NVIDIA Driver (DKMS method)
sudo apt install -y nvidia-driver
sudo reboot
After reboot, verify:
nvidia-smi
✅ You should see your NVIDIA L4 GPU listed with driver and CUDA version.
🐍 5. Set Up Python Environment
sudo apt install -y python3 python3-pip python3-venv
python3 -m venv vllm-env
source vllm-env/bin/activate
pip install --upgrade pip
🔥 6. Install PyTorch with CUDA Support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
The L4 (compute capability 8.9) works with both CUDA 11.8 and 12.x, but recent vLLM wheels are built against CUDA 12.x, so the cu121 index is the safer choice. Note that installing vLLM pulls in a compatible CUDA-enabled PyTorch on its own, so this step is optional.
🧠 7. Install vLLM
pip install vllm
✅ 8. Verify GPU Access
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0))"
Expected output:
CUDA available: True
Device: NVIDIA L4
🧪 9. Run vLLM Server
python -m vllm.entrypoints.openai.api_server --model facebook/opt-1.3b
To expose externally:
python -m vllm.entrypoints.openai.api_server --model facebook/opt-1.3b --host 0.0.0.0 --port 8000
Make sure port 8000 is open in your firewall settings.
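Opening the port and checking the endpoint can both be done from the command line. This is a sketch: the firewall rule name `allow-vllm` is an example, `EXTERNAL_IP` is a placeholder for your VM's external address, and the request assumes the server above is running with that model.

```shell
# Allow inbound TCP traffic on port 8000 (rule name is an example).
gcloud compute firewall-rules create allow-vllm \
  --allow=tcp:8000 --direction=INGRESS

# Smoke-test the OpenAI-compatible endpoint from another machine.
curl http://EXTERNAL_IP:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-1.3b", "prompt": "Hello,", "max_tokens": 16}'
```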
🧼 Optional Cleanup & Tips
- Use tmux or screen to keep the server running after logout
- Monitor GPU usage with nvidia-smi
- Upgrade disk size if loading larger models
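For example, a minimal tmux workflow (session name `vllm` is an example) for keeping the server alive across SSH disconnects:

```shell
tmux new -s vllm                 # start a named session
python -m vllm.entrypoints.openai.api_server --model facebook/opt-1.3b
# Detach with Ctrl-b then d; the server keeps running in the background.
tmux attach -t vllm              # reattach later to check on it
```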