Tobrun Van Nuland

🚀 Deploying vLLM on Your Linux Server

Running vLLM as a persistent, reliable background service is one of the best ways to expose a fast local LLM API on your Linux machine.

This guide walks through:

  • Installing dependencies
  • Creating a virtual environment
  • Setting up a systemd service
  • Running vLLM from a fixed directory (/home/nurbot/ws/models)
  • Checking logs and debugging
  • Enabling auto-start on boot

🧰 1. Install System Dependencies

sudo apt-get update
sudo apt-get install -y python3-pip python3-venv docker.io

Docker is optional but useful if you want containerized workflows.


🎮 2. Verify NVIDIA GPU Support (Optional but Recommended)

Check whether the machine has working NVIDIA drivers:

nvidia-smi

If the command is missing or errors out, install the NVIDIA drivers before running GPU-backed vLLM.
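
For a more compact summary, nvidia-smi also supports query flags; this is plain driver tooling, nothing vLLM-specific:

nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv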


๐Ÿ 3. Create the vLLM Virtual Environment

We place it in /opt/vllm-env:

sudo python3 -m venv /opt/vllm-env
sudo chown -R $USER:$USER /opt/vllm-env
source /opt/vllm-env/bin/activate

Install vLLM, plus the openai Python client for testing the API later:

pip install vllm openai
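
As an optional sanity check from inside the venv, confirm that vLLM imports and that PyTorch (installed as a vLLM dependency) can see your GPU:

python -c "import vllm; print(vllm.__version__)"
python -c "import torch; print(torch.cuda.is_available())"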

๐Ÿ“ 4. Configure where vLLM Runs From

We want vLLM to run from:

/home/nurbot/ws/models

This directory will contain the start script, at infrastructure/scripts/start_vllm.sh.
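
The post assumes this script already exists. If you need a starting point, here is a minimal sketch; the host and port values are assumptions, and MODEL_NAME is supplied by the systemd unit in step 5:

#!/usr/bin/env bash
set -euo pipefail

# Activate the venv from step 3
source /opt/vllm-env/bin/activate

# Launch the OpenAI-compatible server; falls back to a small default model
exec python -m vllm.entrypoints.openai.api_server \
  --model "${MODEL_NAME:-facebook/opt-125m}" \
  --host 0.0.0.0 \
  --port 8000

Recent vLLM releases also ship a vllm serve CLI that wraps the same server, if you prefer that form.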

Ensure the start script is executable:

chmod +x /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh

🧩 5. Create the systemd Service

Create the service file:

sudo nano /etc/systemd/system/vllm.service

Paste:

[Unit]
Description=vLLM Inference Server
After=network.target

[Service]
Type=simple
User=nurbot
WorkingDirectory=/home/nurbot/ws/models
ExecStart=/home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh
Restart=always
Environment=MODEL_NAME=facebook/opt-125m

[Install]
WantedBy=multi-user.target

Then reload systemd:

sudo systemctl daemon-reload

โ–ถ๏ธ 6. Starting, Stopping, and Enabling the Service

Start vLLM:

sudo systemctl start vllm

Check its status:

systemctl status vllm

Enable auto-start on boot:

sudo systemctl enable vllm
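
With the service running, a quick liveness check against the OpenAI-compatible API (this assumes the start script serves on port 8000, as in the sketch from step 4):

curl http://localhost:8000/v1/models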

📡 7. Checking Logs

To see the real-time logs from vLLM:

journalctl -u vllm -f

To see historical logs:

journalctl -u vllm

To see recent errors:

journalctl -u vllm -xe

🛠 8. Troubleshooting

Service says "failed"

Run:

systemctl status vllm
journalctl -u vllm -xe

Common issues:

  • Wrong ExecStart path
  • Missing execute permission
  • Python crash inside vLLM
  • GPU not available / out of memory (see the quick checks below)
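
A few quick checks for the items above; the direct-launch line assumes your start script is self-contained, as in the step 4 sketch:

# Confirm the ExecStart path exists and is executable
ls -l /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh

# Run the script directly as the service user to surface Python tracebacks
sudo -u nurbot /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh

# Check GPU visibility and free memory
nvidia-smi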

๐ŸŽฏ Conclusion

You now have a fully functional vLLM OpenAI-compatible server running as a background service on Linux. It's stable, auto-starts on reboot, logs to systemd, and runs from a clean virtual environment, with GPU acceleration whenever the NVIDIA drivers from step 2 are in place.
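
As a final smoke test, send a completion request to the server (again assuming port 8000; the model name must match what the service loaded):

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello, my name is", "max_tokens": 16}'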
