NitroGen is an open research project from the MineDojo / NVIDIA ecosystem.
It trains AI agents to play games purely from RGB screen pixels by imitating real human controller actions.
No game APIs.
No memory reading.
No engine hooks.
Just:
screen → neural network → controller input
In this guide, we extend NitroGen to build a vision-based farming bot for World of Warcraft (Retail, Classic, TBC, or private servers).
Important: This article is for research and educational purposes only.
Core Idea
Instead of scripting rotations or reading game memory, the agent learns by watching gameplay videos and copying human behavior.
The model:
- sees health bars, enemies, minimap, cooldowns
- predicts the next controller action
- executes it like a human would
This makes the bot:
- engine-agnostic
- patch-resistant
- closer to human behavior than classic bots
Key Features
- Fully vision-based (no hooks, no addons)
- Works with pretrained NitroGen models
- Fine-tuning on your own WoW footage improves results
- Human-like imperfection (non-deterministic loops)
- Scales to multiple characters / accounts (research only)
What the Bot Can Do
With sufficient training data, the agent can learn:
- Mob grinding
- Herbalism / mining routes
- Basic navigation paths
- Simple combat rotations
- Dungeon queue farming (LFD)
Performance depends almost entirely on the size and quality of the training dataset.
System Requirements
- Windows 11 (for WoW process capture)
- NVIDIA GPU (RTX 30/40 recommended, ≥8GB VRAM)
- Python 3.12+
- Xbox / PlayStation controller (or virtual controller via ViGEmBus)
- World of Warcraft client installed
- Screen resolution fixed during data collection
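On the output side, the model's predicted actions must be translated into virtual controller state (on Windows, typically via ViGEmBus through a wrapper such as the `vgamepad` package). The exact action layout depends on the checkpoint; the helper below is a hypothetical sketch that clamps continuous stick values and thresholds button activations before they are handed to the virtual pad — names and vector layout are assumptions, not NitroGen's actual API.

```python
# Hypothetical mapping from a model's raw action vector to a controller
# state dict. Assumes a [lx, ly, rx, ry, b0, b1, ...] layout -- adjust
# to match the action head of the checkpoint you are actually running.
def to_controller_state(action, button_threshold=0.5):
    clamp = lambda v: max(-1.0, min(1.0, v))
    sticks = [clamp(v) for v in action[:4]]
    buttons = [v > button_threshold for v in action[4:]]
    return {
        "left_stick": (sticks[0], sticks[1]),
        "right_stick": (sticks[2], sticks[3]),
        "buttons": buttons,  # e.g. A/B/X/Y on an Xbox pad
    }
```

The resulting dict maps naturally onto `vgamepad` calls such as `left_joystick_float(...)` and button press/release methods.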
Installation
1. Clone NitroGen
git clone https://github.com/MineDojo/NitroGen.git
cd NitroGen
2. Create Virtual Environment
python -m venv venv
venv\Scripts\activate
3. Install Dependencies
pip install -r requirements.txt
Make sure PyTorch detects your GPU:
python -c "import torch; print(torch.cuda.is_available())"
Preparing World of Warcraft
Recommended setup:
- Windowed or borderless fullscreen
- Fixed resolution (e.g. 1920x1080)
- Controller-friendly UI layout
- Consistent camera distance
- Disable UI animations where possible
NitroGen works best with stable visual layouts.
Collecting Training Data
1. Record Gameplay
Record:
- grinding sessions
- gathering routes
- combat encounters
At the same time, record:
- controller inputs
- button presses
- joystick movements
This produces frame_t → action_t pairs for training.
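One way to persist those pairs on disk — the function name and layout here are assumptions for illustration, not part of NitroGen's recorder:

```python
import json
from pathlib import Path

def save_step(root, step, frame_png_bytes, action):
    """Write one frame_t -> action_t pair with matching zero-padded stems."""
    root = Path(root)
    (root / "frames").mkdir(parents=True, exist_ok=True)
    (root / "actions").mkdir(parents=True, exist_ok=True)
    stem = f"{step:06d}"
    (root / "frames" / f"{stem}.png").write_bytes(frame_png_bytes)
    # The action is a snapshot of controller state at capture time;
    # its exact schema is an assumption here.
    (root / "actions" / f"{stem}.json").write_text(json.dumps(action))
    return stem
```

Keeping the stems identical is what lets training code join a frame to its action later.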
2. Dataset Structure
Example:
dataset/
├── frames/
│   ├── 000001.png
│   ├── 000002.png
│   └── ...
└── actions/
    ├── 000001.json
    ├── 000002.json
    └── ...
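Before feeding this into a training pipeline, the two directories have to be joined by stem. A minimal pure-stdlib sketch (the layout above is the only assumption) that a PyTorch `Dataset` could wrap:

```python
import json
from pathlib import Path

def index_dataset(root):
    """Return sorted (frame_path, action_dict) pairs, skipping orphans."""
    root = Path(root)
    frames = {p.stem: p for p in (root / "frames").glob("*.png")}
    pairs = []
    for action_path in sorted((root / "actions").glob("*.json")):
        frame = frames.get(action_path.stem)
        if frame is None:
            continue  # action without a matching frame: drop it
        pairs.append((frame, json.loads(action_path.read_text())))
    return pairs
```

Dropping orphaned files up front avoids off-by-one misalignment between frames and actions, which silently poisons imitation data.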
Training the Model
Fine-tune NitroGen on WoW footage:
python scripts/train.py \
--config configs/wow_train.yaml \
--dataset datasets/wow_grinding
Key parameters:
- frame resolution
- temporal window size
- action discretization
- batch size
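Action discretization in particular is worth understanding: continuous stick positions are typically binned so the model can treat them as a classification target. A hypothetical sketch (the bin count is an assumption — check the actual config):

```python
def discretize(value, bins=11):
    """Map a stick value in [-1, 1] to a bin index in [0, bins - 1]."""
    value = max(-1.0, min(1.0, value))
    return round((value + 1.0) / 2.0 * (bins - 1))

def undiscretize(index, bins=11):
    """Inverse mapping, used at inference time to recover a stick value."""
    return index / (bins - 1) * 2.0 - 1.0
```

More bins give finer control but a harder prediction problem; fewer bins do the opposite.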
Because this is supervised imitation (behavior cloning) rather than trial-and-error, training is much more stable than reinforcement learning.
Running the Bot
python scripts/run_agent.py \
--config configs/wow_eval.yaml \
--checkpoint checkpoints/wow_finetuned.pt
The agent will:
- read frames from the WoW window
- predict controller actions
- execute them in real time
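Under the hood this is a fixed-rate perceive → predict → act cycle. A simplified sketch with capture, model, and controller injected as callables (the real script's internals will differ):

```python
import time

def run_agent(capture, predict, act, fps=30, max_steps=None):
    """Fixed-rate control loop: grab a frame, predict an action, execute it."""
    interval = 1.0 / fps
    step = 0
    while max_steps is None or step < max_steps:
        start = time.monotonic()
        frame = capture()        # e.g. screenshot of the WoW window
        action = predict(frame)  # model forward pass
        act(action)              # emit virtual controller state
        step += 1
        # Sleep off whatever remains of this frame's time budget
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    return step
```

Pinning the loop to a fixed rate matters: the model was trained on human-paced input, so running it faster or slower than the recording rate shifts it off-distribution.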
Limitations
- No long-term quest planning
- Weak at PvP
- Sensitive to UI changes
- Requires significant training data
- Not competitive with human players
This is a research agent, not a perfect farmer.
Detection and Ethics
NitroGen:
- does not inject code
- does not read memory
- does not modify the client
However:
- automation is still detectable on live servers
- Blizzard actively bans bots
Use responsibly.
Why This Matters
World of Warcraft is one of the richest interactive environments ever built.
Using it as a benchmark helps research:
- embodied AI
- vision-based control
- human-like decision making
- long-horizon tasks
The techniques here apply far beyond games.