IBIYEMI Samuel O.
Setting Up Webots with Stable Baselines3 for Reinforcement Learning

Ever thought of building an actual robot, only to be put off by the high price of hardware (and the very real chance of breaking it)?

You're not alone. For most of us, physical robots aren't an option. A decent mobile robot platform costs hundreds or thousands of dollars, breaks often, and requires space we don't have. But here's the thing: hardware shouldn't stop you from learning robotics. You don't need an expensive setup to build those amazing projects you've always envisaged.

Simulation gets you remarkably close to real-world environments; close enough to learn, experiment, and prototype effectively. And reinforcement learning (RL) in simulation shouldn't feel abstract. Sure, understanding policy gradients, PPO, SAC, and all those acronyms matters, but there's something uniquely satisfying about watching an agent you trained actually navigate a world that looks and behaves like reality.

This is where Webots comes in: industry-grade physics, used by researchers and companies worldwide, completely free. In this tutorial, we're connecting Webots with Stable Baselines3, pairing a professional simulator with battle-tested RL algorithms.

By the end of this tutorial, you'll have a complete simulation environment ready for RL training. No hardware required, just Python and a bit of curiosity 😉.

An example of a trained car in Webots


What You'll Build

By the end of this tutorial, you'll have:

  • [ ] A working Webots simulation world with a robot and target
  • [ ] Python virtual environment with Stable Baselines3 installed
  • [ ] External controller setup for running RL code from your IDE
  • [ ] Verified connection between Python and Webots
  • [ ] Foundation ready for building a Gymnasium environment (next tutorial)

The task: A robot that will learn to navigate toward a target from any starting position. The setup is intentionally simple but powerful: once you understand this foundation, you can extend it to complex scenarios like autonomous driving.


Background: RL and Simulation

Reinforcement Learning (RL) is a branch of Artificial Intelligence that trains agents through trial and error. Mathematically, it can be represented as an optimization problem where we design closed-loop control policies that maximize accumulated reward over time. RL has proven its success in modern systems ranging from LLMs to robotics and autonomous vehicles.

Simulation involves using computer software to create virtual environments that mimic real-world physics and dynamics. Instead of testing your RL agent on expensive hardware that can break or cause safety issues, you train it in a controlled digital replica. Think of it as a sandbox where your agent can fail thousands of times without consequences, learning what works before ever touching physical hardware.

Why This Stack?

Webots gives you industry-standard, physics-accurate simulation that's completely free and robot-agnostic. Whether you're working with wheeled robots, drones, or manipulator arms, Webots handles the physics engine, sensors, and actuators so you can focus on your RL and control logic.

Stable Baselines3 provides production-ready RL algorithms (PPO, SAC, TD3, etc.) with clean APIs, excellent documentation, and active maintenance. Instead of implementing DDPG from scratch and debugging it for weeks, you get reliable, tested implementations.

By connecting Webots with Stable Baselines3, you get professional-grade tools on both ends. Simulation realistic enough to matter, and algorithms robust enough to work.


Prerequisites

Knowledge:

  • Basic Python programming
  • Familiarity with RL concepts (agent, environment, reward, policy)
  • A sprinkle of curiosity to learn is often all you need✨

Software:

  • Python 3.8+ (I'm using Python 3.12.0)
  • Webots R2023b or later
  • Stable Baselines3 and dependencies

Hardware:

  • Any modern computer (Windows, macOS, or Linux)
  • 4GB+ RAM recommended

Installation

Step 1: Install Python

Download and install Python from python.org. Make sure to check "Add Python to PATH" during installation.

Verify installation:

python --version

Step 2: Install Webots

  1. Visit https://cyberbotics.com/
  2. Download the package for your operating system
  3. Run the installer and follow the prompts (agree to all defaults)
  4. Launch Webots to verify installation

Project Setup

Create Your Webots World

  1. Open Webots
  2. File → New → New Project Directory
  3. Use the Project Creation Wizard:
    • Directory name: Webots_SB3_Tutorial
    • World name: robot_navigation
    • Check "Add a rectangle arena"
    • Click Finish

Webots will create the project structure and open your new world with a basic arena.

Creating a new project in Webots

Set Up Python Environment

Here's something important: Webots uses its own Python environment. Traditional virtual environments don't work directly with Webots controllers. When you set a controller in Webots, it launches a subprocess using the system Python, completely ignoring your activated virtual environment.

For RL/ML workflows with external libraries like Stable Baselines3, we use External Controllers. This lets you run your code from your terminal or IDE (where your virtual environment is active) while connecting to the Webots simulation.

Navigate to your project folder and create a virtual environment:

# Navigate to your Webots project
cd {path-to-your}\Webots_SB3_Tutorial

# Create virtual environment in the project folder
python -m venv webots_rl_env

# Activate it
# On Windows:
webots_rl_env\Scripts\activate

# On macOS/Linux:
source webots_rl_env/bin/activate

Install Required Packages:

pip install stable-baselines3[extra] gymnasium numpy

Verify installation:

python -c "import stable_baselines3; print(stable_baselines3.__version__)"

Set Webots Environment Variable:

For external controllers to work, Python needs to know where Webots is installed. Set this once:

# Windows PowerShell:
$env:WEBOTS_HOME = "C:\Program Files\Webots"

# Windows CMD:
set WEBOTS_HOME=C:\Program Files\Webots

# macOS/Linux:
export WEBOTS_HOME=/Applications/Webots.app
# or wherever you installed Webots

To make this permanent, add it to your system environment variables or shell profile.
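For example, you could add lines like these (paths assume the default install locations; adjust to yours):

```shell
# macOS/Linux: append to ~/.bashrc or ~/.zshrc
export WEBOTS_HOME=/Applications/Webots.app   # macOS default
# export WEBOTS_HOME=/usr/local/webots        # common Linux location

# Windows: persist for your user account (run once in CMD)
# setx WEBOTS_HOME "C:\Program Files\Webots"
```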

Your project structure should now look like this:

Webots_SB3_Tutorial/
├── webots_rl_env/          # Your virtual environment
├── controllers/
├── libraries/
├── plugins/
├── worlds/
│   └── robot_navigation.wbt
└── protos/

Building Your Simulation World

Now we'll add the components our RL agent needs: a robot to control, a target to reach, and a supervisor to manage the training loop.

Understanding the Architecture

Webots & Stable Baselines3 implementation

Before we build, let's understand how the pieces connect:

Webots runs like this:

Initialize world → Update physics → Read sensors → Control actuators → Repeat

Gymnasium (the RL standard) expects:

reset() → observation, info
step(action) → observation, reward, terminated, truncated, info

The bridge: We create a Gymnasium-compatible environment that:

  1. Controls the Webots simulation timestep
  2. Reads sensor data and converts to observations
  3. Receives actions and sends to robot actuators
  4. Calculates rewards based on task progress
  5. Detects episode termination

Webots & Stable Baselines3 interaction. We'll implement this bridge in the next tutorial.
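To make the Gymnasium loop concrete, here's a minimal, runnable sketch of that bridge's shape. The Webots Supervisor calls are replaced with toy 2D "physics" so it runs anywhere; the class name and all numbers here are illustrative, not part of the Webots API:

```python
import math
import numpy as np

class ToyNavigationEnv:
    """Gymnasium-style shape of the bridge: reset() -> (obs, info),
    step(action) -> (obs, reward, terminated, truncated, info).
    The real Webots calls (motor commands, supervisor.step) are stubbed."""

    def __init__(self):
        self.robot_pos = np.zeros(2)
        self.target_pos = np.array([0.3, 0.3])

    def reset(self, seed=None):
        # In Webots: reset the simulation and move DEF ROBOT to a random pose
        rng = np.random.default_rng(seed)
        self.robot_pos = rng.uniform(-0.4, 0.4, size=2)
        return self._observation(), {}

    def step(self, action):
        # In Webots: send wheel velocities, then advance one timestep
        self.robot_pos = self.robot_pos + 0.01 * np.asarray(action, dtype=float)
        obs = self._observation()
        distance = float(obs[0])
        reward = -distance              # closer to the target is better
        terminated = distance < 0.05    # episode ends at the target
        return obs, reward, terminated, False, {}

    def _observation(self):
        # Relative observation: distance and angle to the target
        delta = self.target_pos - self.robot_pos
        return np.array([np.linalg.norm(delta), math.atan2(delta[1], delta[0])])
```

In the next tutorial, the stubbed lines become real Supervisor and motor calls, while the reset/step structure stays exactly the same.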

The Navigation Task

We're building a simple but powerful setup:

  • A robot starts at random positions in the arena
  • A target (goal) is placed somewhere in the arena
  • The robot learns to drive toward the target using relative observations (distance and angle), not absolute positions

This approach means once trained, you can move the target anywhere and the robot will adapt. The policy learns "navigate toward what I see" rather than "go to coordinates (x, y)."
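In code, that relative observation might look like the following sketch (the function name and frame convention are mine, not part of the Webots API):

```python
import math

def relative_observation(robot_xy, robot_heading, target_xy):
    """Distance to the target and its bearing in the robot's own frame."""
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    # Bearing relative to the robot's heading, wrapped to [-pi, pi]
    bearing = math.atan2(dy, dx) - robot_heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing
```

With this observation, (0.3, 0.0) means "target 0.3 m dead ahead" no matter where the robot or target actually sit in the arena.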

Add the Robot

  1. Add a robot to your world:
    • In Webots, click the Add button (+ icon)
    • Navigate to: PROTO nodes (Webots Projects) → robots → gctronic → e-puck → E-puck (Robot)
    • Alternatively, search for "E-puck" in the Add a node dialog
    • Click Add

Add E-Puck robot in Webots

  2. Give the robot a DEF name:
    • Click on the E-puck in the scene tree
    • At the very top of the node properties, add ROBOT to the DEF: field
    • This allows our Python code to reference this specific robot

  3. Set the robot controller to external:
    • In the properties panel, find the controller field
    • Change it from "e-puck" to <extern>
    • This tells Webots we'll control it from our Python script

Setting a robot to be controlled from Python in Webots

Add the Target

We need a visible target for the robot to navigate toward. We'll use a Solid node so it can be repositioned programmatically (for testing different positions), but we'll make it non-colliding so the robot can reach the exact center.

  1. Add a Solid node:

    • Click the Add button
    • Select Base nodes → Solid
  2. Give the target a DEF name:

    • Select the Solid node
    • Add TARGET to the DEF: field
    • This allows our Python code to reference and move this object
  3. Add visual appearance:

    • Expand the Solid node in the scene tree
    • Right-click on children [] → Add New → Choose Shape
    • Expand the Shape node
    • Right-click on geometry NULL → Add New → Choose Cylinder
    • Configure the Cylinder:
      • Set radius to 0.01
      • Set height to 0.05
    • Right-click on appearance NULL → Add New → Choose PBRAppearance
    • Expand PBRAppearance and set baseColor to red: 1 0 0
  4. Position the target:

    • Find the translation field
    • Set it to: 0.3 0.025 0.3 (x, y, z coordinates)

Why use Solid without collision?

  • Solid nodes can be moved programmatically via the Supervisor API (useful for testing)
  • We skip physics and boundingObject so the robot can drive through the marker
  • The target is purely visual: a goal marker, not a physical obstacle
  • Later, you can add physics if you want obstacle avoidance training

Add the Supervisor

For RL to work, we need a "supervisor" that can:

  • Reset the robot position between episodes
  • Read positions of both robot and target
  • Calculate rewards
  • Control the simulation

  1. Add a Robot node for supervision:

    • Click Add
    • Select Base nodes → Robot
  2. Configure it as a supervisor:

    • Set name to "supervisor_controller"
    • Set supervisor field to TRUE
    • Set controller to <extern>

Save Your World

File → Save World

Your scene tree should now look like this:

Scene tree

Your scene should look like this:

Complete Webots Scene

What we just built:

  • ROBOT (E-puck): The agent that will learn to navigate
  • TARGET (red cylinder): The goal position
  • Supervisor: The "brain" that runs our RL training loop

Verifying Your Setup

Let's make sure everything is connected properly.

Create the test controller:

  1. In your project, create a new folder: controllers/test_supervisor/
  2. Inside that folder, create a file: test_supervisor.py

Your folder structure should look like:

Webots_SB3_Tutorial/
├── webots_rl_env/
├── controllers/
│   └── test_supervisor/
│       └── test_supervisor.py
├── worlds/
│   └── robot_navigation.wbt
└── protos/

Add this code to test_supervisor.py:

from controller import Supervisor

# Initialize supervisor
supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

# Test: Can we access our nodes?
robot_node = supervisor.getFromDef("ROBOT")
target_node = supervisor.getFromDef("TARGET")

if robot_node and target_node:
    print("✓ Setup successful!")
    print(f"  Robot found at: {robot_node.getPosition()}")
    print(f"  Target found at: {target_node.getPosition()}")

    # Test moving the target
    trans_field = target_node.getField("translation")
    current_pos = trans_field.getSFVec3f()
    print(f"  Target can be moved: {current_pos}")

else:
    print("✗ Setup error!")
    if not robot_node:
        print("  Missing: ROBOT (check DEF name on E-puck)")
    if not target_node:
        print("  Missing: TARGET (check DEF name on Solid)")

# Run one simulation step
supervisor.step(timestep)
print("✓ Simulation step successful!")

To run the test:

  1. In Webots, open your robot_navigation.wbt world
  2. Select the Robot (supervisor_controller) node in the scene tree
  3. Change its controller field from <extern> to test_supervisor
  4. Click the Play button (▶) in Webots (you may need to click Restart first)
  5. Check the Webots console (bottom panel)

Expected output in the Webots console:

INFO: test_supervisor: Starting controller: python.exe -u test_supervisor.py
✓ Setup successful!
  Robot found at: [0.0, 0.0, 0.0]
  Target found at: [0.3, 0.025, 0.3]
  Target can be moved: [0.3, 0.025, 0.3]
✓ Simulation step successful!

After testing:

  1. Important: Change the supervisor's controller field back to <extern> (we'll need this for the next tutorial)
  2. File → Save World

Optional test: Hold Shift + Left Click and drag the target in the 3D view. It should move freely, confirming the physics setup is correct.


What You Accomplished

🎉 Congratulations! You've built a complete foundation for RL training in Webots:

  • Installed Webots and Python environment
  • Created a simulation world with robot and target
  • Configured external controller setup
  • Verified Python can communicate with Webots
  • Ready to build a Gymnasium environment (next tutorial)

Next Steps

Coming in the next tutorial: "Building a Gymnasium Environment for Webots Robot Control"

We'll write the code that bridges Stable Baselines3 and Webots:

  • Creating a custom Gymnasium environment class
  • Implementing reset() and step() methods
  • Defining observation and action spaces
  • Designing a reward function
  • Handling episode termination
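As a preview of the reward-design step, navigation rewards often combine progress shaping with a terminal bonus. Here's a sketch with illustrative coefficients (the actual values are worked out in the next tutorial):

```python
def navigation_reward(prev_distance, distance, reached,
                      step_cost=0.01, goal_bonus=10.0):
    """Progress toward the target, minus a small per-step time penalty,
    plus a bonus on reaching the goal. All coefficients are illustrative."""
    reward = (prev_distance - distance) - step_cost
    if reached:
        reward += goal_bonus
    return reward
```

The shaping term rewards moving closer each step, the step cost discourages dawdling, and the bonus makes actually reaching the target clearly worthwhile.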



Final Thoughts

You might find interacting with Webots a bit confusing at first; a tool with this many features can feel daunting. But here's the thing: the best way to learn is by playing around.

Go beyond what we've covered in this tutorial. Experiment with the pre-made robots available in Webots. Try out your own ideas by adding and customizing different nodes. Webots lets you create fully custom environments, and hands-on exploration is often the fastest way to get comfortable with any new tool.

Conclusion

You now have a professional-grade simulation setup ready for RL experimentation. This foundation uses the same tools researchers and companies use for real robotics projects, no expensive hardware required.

The key insight we've established: by using relative observations (distance and angle to target) instead of absolute positions, our future trained agent will generalize. Move the target anywhere, and the robot will adapt.

In the next tutorial, we'll connect our Webots environment to Gymnasium.


Thank you for reading to the end. If you run into any issues during implementation, drop a comment and I'll do my best to respond promptly.
