## WSL2 systemd Support
To enable systemd in WSL2, add the following to `/etc/wsl.conf`:

```ini
# Add to /etc/wsl.conf
[boot]
systemd=true
```
To apply the change, restart WSL2 (run from PowerShell or Command Prompt):

```powershell
wsl --shutdown
```
After restarting, you can list all units, including inactive ones, with `systemctl --all`. To have user services start when WSL2 launches, even before you log in, enable lingering with `loginctl enable-linger <user>`.
## vLLM systemd Unit Files
### Startup Script (~/vllm_server.sh)
```bash
#!/bin/bash
set -e
export CUDA_VISIBLE_DEVICES=0
python3 -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 8000 \
  --model nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code \
  --tensor-parallel-size 1
```
### systemd Unit File (~/.config/systemd/user/vllm.service)
```ini
[Unit]
Description=vLLM Inference Server
After=network.target

[Service]
Type=simple
WorkingDirectory=%h/vllm
ExecStart=%h/vllm_server.sh
Restart=always
RestartSec=5s

[Install]
WantedBy=default.target
```
Key points of the configuration:

- `CUDA_VISIBLE_DEVICES=0` pins the server to a single GPU.
- `--trust-remote-code` allows custom code shipped with Hugging Face models to run. Use it only with models you trust.
- `Restart=always` restarts the server automatically if it crashes.
### Activation Commands
```bash
systemctl --user daemon-reload
systemctl --user enable vllm.service
systemctl --user start vllm.service
```
## Running the Flask API as a Service
Next, we turn a Flask application that wraps the vLLM API into a systemd user service.
```python
# ~/flask_api/app.py
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    prompt = request.json['prompt']
    response = requests.post(
        'http://localhost:8000/v1/completions',
        json={'model': 'nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese',
              'prompt': prompt, 'max_tokens': 200}
    )
    return response.json()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8510)
```
### systemd Unit File
```ini
[Unit]
Description=Flask API for vLLM
After=vllm.service

[Service]
Type=simple
WorkingDirectory=%h/flask_api
ExecStart=%h/.venv/bin/python app.py
Restart=always
RestartSec=3s

[Install]
WantedBy=default.target
```
With `After=vllm.service`, the Flask API unit is started only after the vLLM unit. Note that `After=` only controls ordering: with `Type=simple`, systemd considers vLLM "started" as soon as its process launches, which can be well before the model has finished loading, so the Flask side should tolerate a briefly unavailable backend.
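Because the model load happens after the process starts, a client-side readiness check helps bridge that gap. A minimal poller sketch (the `/v1/models` listing route is part of vLLM's OpenAI-compatible API; the timeout and interval values here are arbitrary choices):

```python
import time
import urllib.request
import urllib.error


def wait_for_server(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet (connection refused / timed out); retry.
            pass
        time.sleep(interval)
    return False
```

The Flask service could call `wait_for_server("http://localhost:8000/v1/models")` once at startup before accepting requests.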
## Using systemd Timers Instead of cron
For scheduled tasks such as the daily report, we use systemd timers rather than cron.
### Timer Configuration (~/.config/systemd/user/daily-report.timer)
```ini
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```
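A `.timer` unit triggers a service unit with the same base name, so a matching `daily-report.service` is also needed. A minimal sketch (the `%h/daily_report.sh` script path is a placeholder assumption, not from the original setup):

```ini
# ~/.config/systemd/user/daily-report.service
[Unit]
Description=Daily report generation

[Service]
Type=oneshot
ExecStart=%h/daily_report.sh
```

`Type=oneshot` is the usual choice for timer-driven jobs: the unit is considered finished once the script exits.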
```bash
systemctl --user enable daily-report.timer
```
The advantages of systemd timers over cron:

- Centralized log management with `journalctl`.
- Explicit specification of dependencies.
- Automatic catch-up of missed runs via `Persistent=true`.
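`OnCalendar=*-*-* 02:00:00` fires every day at 02:00, and `Persistent=true` runs the job immediately on the next boot if a scheduled run was missed while the machine was off. The behavior can be sketched roughly like this (an illustrative model, not systemd's actual implementation):

```python
from datetime import datetime, timedelta


def next_run(now: datetime) -> datetime:
    """Next daily 02:00 trigger at or after `now` (mimics OnCalendar=*-*-* 02:00:00)."""
    candidate = now.replace(hour=2, minute=0, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def should_run_on_boot(last_run: datetime, boot_time: datetime) -> bool:
    """Persistent=true semantics: fire at boot if a trigger was missed while off."""
    missed_trigger = next_run(last_run + timedelta(seconds=1))
    return missed_trigger <= boot_time
```

With cron, a machine that is off at 02:00 simply skips that day's job; the `Persistent=true` catch-up is what makes timers safer for a WSL2 instance that is not always running.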
## Startup Order and Dependencies
- vLLM: the foundational inference engine.
- Flask API: depends on vLLM (`After=vllm.service`).
- Daily report generation: references logs from both.
The dependency tree can be inspected with:

```bash
systemctl --user list-dependencies vllm.service
```
## Checking Logs with journalctl
```bash
# Follow the logs of a specific service
journalctl --user -u vllm.service -f
# Show the last 100 log lines
journalctl --user -u vllm.service -n 100
# Extract only error-related lines
journalctl --user -u vllm.service --since "24 hours ago" | grep -i "error\|fail"
```
## Summary
This post explained how to build an operational environment on WSL2 that uses systemd to seamlessly integrate vLLM, Flask, and scheduled tasks.
- vLLM now runs as a service, making full use of CUDA 12.8 and the RTX 5090.
- Dependency management with `After=` is the key to avoiding startup-order errors.
- Real-time monitoring with `journalctl` supports stable operation.
This setup can serve as the infrastructure for running Nemotron-based Japanese inference models in practice.
This article was generated by Nemotron-Nano-9B-v2-Japanese, and Gemini 2.5 Flash performed formatting and verification.