Ever thought of building an actual robot? Only to be faced with the high price tags for hardware (with a high chance of equipment damage)?
You're not alone. For most of us, physical robots aren't an option. A decent mobile robot platform costs hundreds or thousands of dollars, breaks often, and requires space we don't have. But here's the thing: hardware shouldn't stop you from learning robotics. You don't need an expensive setup to build those amazing projects you've always envisaged.
Simulation gets you remarkably close to real-world environments; close enough to learn, experiment, and prototype effectively. And reinforcement learning (RL) in simulation shouldn't feel abstract. Sure, understanding policy gradients, PPO, SAC, and all those acronyms matters, but there's something uniquely satisfying about watching an agent you trained actually navigate a world that looks and behaves like reality.
This is where Webots comes in: industry-grade physics, used by researchers and companies worldwide, completely free. In this tutorial, we're connecting Webots with Stable Baselines3, pairing a professional simulator with battle-tested RL algorithms.
By the end of this tutorial, you'll have a complete simulation environment ready for RL training. No hardware required, just Python and a handful of curiosity.

An example of a trained car in Webots
What You'll Build
By the end of this tutorial, you'll have:
- [ ] A working Webots simulation world with a robot and target
- [ ] Python virtual environment with Stable Baselines3 installed
- [ ] External controller setup for running RL code from your IDE
- [ ] Verified connection between Python and Webots
- [ ] Foundation ready for building a Gymnasium environment (next tutorial)
The task: A robot that will learn to navigate toward a target from any starting position. The setup is intentionally simple but powerful: once you understand this foundation, you can extend it to complex scenarios like autonomous driving.
Background: RL and Simulation
Reinforcement Learning (RL) is a branch of Artificial Intelligence that trains agents through trial and error. Mathematically, it can be represented as an optimization problem where we design closed-loop control policies that maximize accumulated reward over time. RL has proven its success in modern systems ranging from LLMs to robotics and autonomous vehicles.
Simulation involves using computer software to create virtual environments that mimic real-world physics and dynamics. Instead of testing your RL agent on expensive hardware that can break or cause safety issues, you train it in a controlled digital replica. Think of it as a sandbox where your agent can fail thousands of times without consequences, learning what works before ever touching physical hardware.
Why This Stack?
Webots gives you industry-standard, physics-accurate simulation that's completely free and robot-agnostic. Whether you're working with wheeled robots, drones, or manipulator arms, Webots handles the physics engine, sensors, and actuators so you can focus on your RL and control logic.
Stable Baselines3 provides production-ready RL algorithms (PPO, SAC, TD3, etc.) with clean APIs, excellent documentation, and active maintenance. Instead of implementing DDPG from scratch and debugging it for weeks, you get reliable, tested implementations.
By connecting Webots with Stable Baselines3, you get professional-grade tools on both ends. Simulation realistic enough to matter, and algorithms robust enough to work.
Prerequisites
Knowledge:
- Basic Python programming
- Familiarity with RL concepts (agent, environment, reward, policy)
- A sprinkle of curiosity to learn is often all you need ✨
Software:
- Python 3.8+ (I'm using Python 3.12.0)
- Webots R2023b or later
- Stable Baselines3 and dependencies
Hardware:
- Any modern computer (Windows, macOS, or Linux)
- 4GB+ RAM recommended
Installation
Step 1: Install Python
Download and install Python from python.org. Make sure to check "Add Python to PATH" during installation.
Verify installation:
python --version
Step 2: Install Webots
- Visit https://cyberbotics.com/
- Download the package for your operating system
- Run the installer and follow the prompts (agree to all defaults)
- Launch Webots to verify installation
Project Setup
Create Your Webots World
- Open Webots
- File → New → New Project Directory
- Use the Project Creation Wizard:
  - Directory name: `Webots_SB3_Tutorial`
  - World name: `robot_navigation`
  - Check "Add a rectangle arena"
- Click Finish
Webots will create the project structure and open your new world with a basic arena.
Set Up Python Environment
Here's something important: Webots uses its own Python environment. Traditional virtual environments don't work directly with Webots controllers. When you set a controller in Webots, it launches a subprocess using the system Python, completely ignoring your activated virtual environment.
For RL/ML workflows with external libraries like Stable Baselines3, we use External Controllers. This lets you run your code from your terminal or IDE (where your virtual environment is active) while connecting to the Webots simulation.
Navigate to your project folder and create a virtual environment:
# Navigate to your Webots project
cd {path-to-your}\Webots_SB3_Tutorial
# Create virtual environment in the project folder
python -m venv webots_rl_env
# Activate it
# On Windows:
webots_rl_env\Scripts\activate
# On macOS/Linux:
source webots_rl_env/bin/activate
Install Required Packages:
pip install stable-baselines3[extra] gymnasium numpy
Verify installation:
python -c "import stable_baselines3; print(stable_baselines3.__version__)"
Set Webots Environment Variable:
For external controllers to work, Python needs to know where Webots is installed. Set this once:
# Windows PowerShell:
$env:WEBOTS_HOME = "C:\Program Files\Webots"
# Windows CMD:
set WEBOTS_HOME=C:\Program Files\Webots
# macOS/Linux:
export WEBOTS_HOME=/Applications/Webots.app
# or wherever you installed Webots
To make this permanent, add it to your system environment variables or shell profile.
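If you want to sanity-check the variable from Python, a small helper like this can show where the Webots Python controller bindings typically live. The helper name and the fallback path are ours, and the `lib/controller/python` layout is an assumption based on recent Webots releases; check your install if imports still fail:

```python
import os

def controller_python_path(webots_home: str) -> str:
    # Recent Webots releases ship their Python controller bindings in
    # this subdirectory; the layout may differ on older versions.
    return os.path.join(webots_home, "lib", "controller", "python")

# Read the variable we just set; the fallback is only an example path.
home = os.environ.get("WEBOTS_HOME", r"C:\Program Files\Webots")
print(controller_python_path(home))
```

Adding that directory to `PYTHONPATH` (or `sys.path`) is what lets `from controller import Supervisor` work outside Webots' bundled interpreter.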
Your project structure should now look like this:
```
Webots_SB3_Tutorial/
├── webots_rl_env/          # Your virtual environment
├── controllers/
├── libraries/
├── plugins/
├── worlds/
│   └── robot_navigation.wbt
└── protos/
```
Building Your Simulation World
Now we'll add the components our RL agent needs: a robot to control, a target to reach, and a supervisor to manage the training loop.
Understanding the Architecture
Before we build, let's understand how the pieces connect:
Webots runs like this:
Initialize world → Update physics → Read sensors → Control actuators → Repeat
Gymnasium (the RL standard) expects:
reset() → observation, info
step(action) → observation, reward, terminated, truncated, info
The bridge: We create a Gymnasium-compatible environment that:
- Controls the Webots simulation timestep
- Reads sensor data and converts to observations
- Receives actions and sends to robot actuators
- Calculates rewards based on task progress
- Detects episode termination
Webots & Stable Baselines3 interaction. We'll implement this bridge in the next tutorial.
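To make the loop concrete before we wire in Webots, here is a toy stand-in for that bridge. Nothing below touches Webots or Gymnasium; the class and its "physics" are entirely made up, and it only mimics the `reset()`/`step()` signatures we'll implement for real in the next tutorial:

```python
import random

class ToyNavEnv:
    """Toy sketch of the Gymnasium loop (not the real Webots bridge)."""

    def reset(self):
        # New episode: random start position on a 1-D line, fixed target.
        self.pos = random.uniform(-1.0, 1.0)
        self.target = 0.5
        self.steps = 0
        return self._obs(), {}  # observation, info

    def step(self, action):
        # Action 1 moves toward +x, anything else toward -x.
        self.pos += 0.1 if action == 1 else -0.1
        self.steps += 1
        dist = abs(self.target - self.pos)
        reward = -dist                 # closer to the target is better
        terminated = dist < 0.05       # reached the target
        truncated = self.steps >= 100  # episode time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        # Relative observation: distance to target, as discussed below.
        return [abs(self.target - self.pos)]

env = ToyNavEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(1)
```

The five-element return of `step()` is exactly what Stable Baselines3 will expect from our real environment.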
The Navigation Task
We're building a simple but powerful setup:
- A robot starts at random positions in the arena
- A target (goal) is placed somewhere in the arena
- The robot learns to drive toward the target using relative observations (distance and angle), not absolute positions
This approach means once trained, you can move the target anywhere and the robot will adapt. The policy learns "navigate toward what I see" rather than "go to coordinates (x, y)."
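One common way to compute such relative observations is a distance plus a bearing angle in the robot's own frame. The function below is our own sketch (the name and argument layout are not from Webots); it assumes a 2-D pose with heading in radians:

```python
import math

def relative_observation(robot_xy, robot_heading, target_xy):
    """Distance and bearing to the target, in the robot's frame.

    The returned angle is wrapped to [-pi, pi], with 0 meaning
    "target dead ahead". Names and conventions are ours, chosen
    to illustrate the idea of relative observations.
    """
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - robot_heading
    angle = math.atan2(math.sin(angle), math.cos(angle))  # wrap to [-pi, pi]
    return distance, angle
```

Because the policy only ever sees this pair, moving the target just changes the numbers, not the task.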
Add the Robot
- Add a robot to your world:
  - In Webots, click the Add button (+ icon) in the scene
  - Navigate to: PROTO nodes (Webots Projects) → robots → gctronic → e-puck → E-puck (Robot), or search for "E-puck" in the `Add a node` pop-up
  - Click Add
- Give the robot a DEF name:
  - Click on the E-puck in the scene tree
  - At the very top of the node properties, add `ROBOT` to the `DEF:` field
  - This allows our Python code to reference this specific robot
- Set the robot controller to external:
  - In the properties panel, find the `controller` field
  - Change it from `"e-puck"` to `<extern>`
  - This tells Webots we'll control it from our Python script
Add the Target
We need a visible target for the robot to navigate toward. We'll use a Solid node so it can be repositioned programmatically (for testing different positions), but we'll make it non-colliding so the robot can reach the exact center.
- Add a Solid node:
  - Click the Add button
  - Select Base nodes → Solid
- Give the target a DEF name:
  - Select the Solid node
  - Add `TARGET` to the `DEF:` field
  - This allows our Python code to reference and move this object
- Add visual appearance:
  - Expand the Solid node in the scene tree
  - Right-click on `children []` → Add New → choose Shape
  - Expand the Shape node
  - Right-click on `geometry NULL` → Add New → choose Cylinder
  - Configure the Cylinder: set `radius` to `0.01` and `height` to `0.05`
  - Right-click on `appearance NULL` → Add New → choose PBRAppearance
  - Expand PBRAppearance and set `baseColor` to red: `1 0 0`
- Position the target:
  - Find the `translation` field
  - Set it to `0.3 0.025 0.3` (x, y, z coordinates)
Why use Solid without collision?
- Solid nodes can be moved programmatically via the Supervisor API (useful for testing)
- We skip physics and boundingObject so the robot can drive through the marker
- The target is purely visual: a goal marker, not a physical obstacle
- Later, you can add physics if you want obstacle avoidance training
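Repositioning the target later will go through the Supervisor API's `getField`/`setSFVec3f` calls, which are real Webots methods. The wrapper function below is our own sketch; the y-up convention and the 0.025 height match the translation we set above, but check the "up" axis of your world before reusing it:

```python
def place_target(translation_field, x, z, height=0.025):
    """Move the TARGET solid to a new spot on the floor (sketch).

    `translation_field` is what target_node.getField("translation")
    returns in a Supervisor controller. We assume a y-up world here,
    consistent with the 0.3 0.025 0.3 translation used in this tutorial.
    """
    translation_field.setSFVec3f([x, height, z])
```

Inside a real controller you would call it as `place_target(target_node.getField("translation"), 0.3, 0.3)`.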
Add the Supervisor
For RL to work, we need a "supervisor" that can:
- Reset the robot position between episodes
- Read positions of both robot and target
- Calculate rewards
- Control the simulation
- Add a Robot node for supervision:
  - Click Add
  - Select Base nodes → Robot
- Configure it as a supervisor:
  - Set `name` to `"supervisor_controller"`
  - Set the `supervisor` field to `TRUE`
  - Set `controller` to `<extern>`
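As a preview of what this supervisor will do, here is a sketch of an episode reset. `getFromDef`, `getField`, `setSFVec3f`, and `resetPhysics` are the real Supervisor API; the function itself, the arena size, and the spawn height are our assumptions for this world:

```python
import random

def reset_robot(robot_node, arena_half=0.4):
    """Place ROBOT at a random position for a new episode (sketch).

    `robot_node` comes from supervisor.getFromDef("ROBOT"). The 0.4 m
    half-extent roughly matches the default rectangle arena, and we
    assume a y-up world, so the robot sits at y = 0 on the floor.
    """
    x = random.uniform(-arena_half, arena_half)
    z = random.uniform(-arena_half, arena_half)
    robot_node.getField("translation").setSFVec3f([x, 0.0, z])
    robot_node.resetPhysics()  # zero out velocities from the last episode
    return x, z
```

We'll fold this logic into the Gymnasium environment's `reset()` in the next tutorial.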
Save Your World
File → Save World
Your scene tree should now look like this:
Your scene should look like this:
What we just built:
- ROBOT (E-puck): The agent that will learn to navigate
- TARGET (red cylinder): The goal position
- Supervisor: The "brain" that runs our RL training loop
Verifying Your Setup
Let's make sure everything is connected properly.
Create the test controller:
- In your project, create a new folder: `controllers/test_supervisor/`
- Inside that folder, create a file: `test_supervisor.py`
Your folder structure should look like:
```
Webots_SB3_Tutorial/
├── webots_rl_env/
├── controllers/
│   └── test_supervisor/
│       └── test_supervisor.py
├── worlds/
│   └── robot_navigation.wbt
└── protos/
```
Add this code to test_supervisor.py:
```python
from controller import Supervisor

# Initialize supervisor
supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

# Test: Can we access our nodes?
robot_node = supervisor.getFromDef("ROBOT")
target_node = supervisor.getFromDef("TARGET")

if robot_node and target_node:
    print("✅ Setup successful!")
    print(f"   Robot found at: {robot_node.getPosition()}")
    print(f"   Target found at: {target_node.getPosition()}")

    # Test moving the target
    trans_field = target_node.getField("translation")
    current_pos = trans_field.getSFVec3f()
    print(f"   Target can be moved: {current_pos}")
else:
    print("❌ Setup error!")
    if not robot_node:
        print("   Missing: ROBOT (check DEF name on E-puck)")
    if not target_node:
        print("   Missing: TARGET (check DEF name on Solid)")

# Run one simulation step
supervisor.step(timestep)
print("✅ Simulation step successful!")
```
To run the test:
- In Webots, open your `robot_navigation.wbt` world
- Select the Robot (supervisor_controller) node in the scene tree
- Change its `controller` field from `<extern>` to `test_supervisor`
- Click the Play button (▶️) in Webots (you might need to click Restart too)
- Check the Webots console (bottom panel)
Expected output in the Webots console:
```
INFO: test_supervisor: Starting controller: python.exe -u test_supervisor.py
✅ Setup successful!
   Robot found at: [0.0, 0.0, 0.0]
   Target found at: [0.3, 0.025, 0.3]
   Target can be moved: [0.3, 0.025, 0.3]
✅ Simulation step successful!
```
After testing:
- Important: change the supervisor's `controller` field back to `<extern>` (we'll need this for the next tutorial)
- File → Save World
Optional test: Hold Shift + Left Click and drag the target in the 3D view. It should move freely, confirming the physics setup is correct.
What You Accomplished
🎉 Congratulations! You've built a complete foundation for RL training in Webots:
- Installed Webots and Python environment
- Created a simulation world with robot and target
- Configured external controller setup
- Verified Python can communicate with Webots
- Ready to build a Gymnasium environment (next tutorial)
Next Steps
Coming in the next tutorial: "Building a Gymnasium Environment for Webots Robot Control"
We'll write the code that bridges Stable Baselines3 and Webots:
- Creating a custom Gymnasium environment class
- Implementing `reset()` and `step()` methods
- Defining observation and action spaces
- Designing a reward function
- Handling episode termination
Resources:
- 📦 Complete code: https://github.com/sam-dude/Webots_SB3_Tutorial
- 📚 Webots Documentation
- 📚 Stable Baselines3 Documentation
- 📚 Gymnasium Documentation
Final Thoughts
You might find interacting with Webots a bit confusing at first. It can feel daunting getting introduced to a tool with seemingly endless features. But here's the good news: the best way to learn is by playing around.
Go beyond what we've covered in this tutorial. Experiment with the "pre-made" robots available in Webots. Try out certain ideas you have by adding and customizing different nodes. Webots allows you to create custom environments, and hands-on exploration is often the fastest way to get comfortable with any new tool.
Conclusion
You now have a professional-grade simulation setup ready for RL experimentation. This foundation uses the same tools researchers and companies use for real robotics projectsβno expensive hardware required.
The key insight we've established: by using relative observations (distance and angle to target) instead of absolute positions, our future trained agent will generalize. Move the target anywhere, and the robot will adapt.
In the next tutorial, we will connect our Webots environment to Gymnasium.
Thank you for reading this piece to the end. If you face any issues during implementation, drop a comment and I'll do my best to respond promptly.






