IBIYEMI Samuel O.
Setting Up Webots with Stable Baselines3 for Reinforcement Learning

Ever thought of building an actual robot, only to be put off by the high price of hardware (and the very real chance of breaking it)?

You're not alone. For most of us, physical robots aren't an option. A decent mobile robot platform costs hundreds or thousands of dollars, breaks often, and requires space we don't have. But here's the thing: hardware shouldn't stop you from learning robotics. You don't need an expensive setup to build those amazing projects you've always envisaged.

Simulation gets you remarkably close to real-world environments; close enough to learn, experiment, and prototype effectively. And reinforcement learning (RL) in simulation shouldn't feel abstract. Sure, understanding policy gradients, PPO, SAC, and all those acronyms matters, but there's something uniquely satisfying about watching an agent you trained actually navigate a world that looks and behaves like reality.

This is where Webots comes in: industry-grade physics, used by researchers and companies worldwide, completely free. In this tutorial, we're connecting Webots with Stable Baselines3, pairing a professional simulator with battle-tested RL algorithms.

By the end of this tutorial, you'll have a complete simulation environment ready for RL training. No hardware required, just Python and a bit of curiosity 😉.

An example of a trained car in Webots


What You'll Build

By the end of this tutorial, you'll have:

  • [ ] A working Webots simulation world with a robot and target
  • [ ] Python virtual environment with Stable Baselines3 installed
  • [ ] External controller setup for running RL code from your IDE
  • [ ] Verified connection between Python and Webots
  • [ ] Foundation ready for building a Gymnasium environment (next tutorial)

The task: A robot that will learn to navigate toward a target from any starting position. The setup is intentionally simple but powerful: once you understand this foundation, you can extend it to complex scenarios like autonomous driving.


Background: RL and Simulation

Reinforcement Learning (RL) is a branch of Artificial Intelligence that trains agents through trial and error. Mathematically, it can be represented as an optimization problem where we design closed-loop control policies that maximize accumulated reward over time. RL has proven its success in modern systems ranging from LLMs to robotics and autonomous vehicles.

Simulation involves using computer software to create virtual environments that mimic real-world physics and dynamics. Instead of testing your RL agent on expensive hardware that can break or cause safety issues, you train it in a controlled digital replica. Think of it as a sandbox where your agent can fail thousands of times without consequences, learning what works before ever touching physical hardware.

Why This Stack?

Webots gives you industry-standard, physics-accurate simulation that's completely free and robot-agnostic. Whether you're working with wheeled robots, drones, or manipulator arms, Webots handles the physics engine, sensors, and actuators so you can focus on your RL and control logic.

Stable Baselines3 provides production-ready RL algorithms (PPO, SAC, TD3, etc.) with clean APIs, excellent documentation, and active maintenance. Instead of implementing DDPG from scratch and debugging it for weeks, you get reliable, tested implementations.

By connecting Webots with Stable Baselines3, you get professional-grade tools on both ends. Simulation realistic enough to matter, and algorithms robust enough to work.


Prerequisites

Knowledge:

  • Basic Python programming
  • Familiarity with RL concepts (agent, environment, reward, policy)
  • A sprinkle of curiosity to learn is often all you need✨

Software:

  • Python 3.8+ (I'm using Python 3.12.0)
  • Webots R2023b or later
  • Stable Baselines3 and dependencies

Hardware:

  • Any modern computer (Windows, macOS, or Linux)
  • 4GB+ RAM recommended

Installation

Step 1: Install Python

Download and install Python from python.org. Make sure to check "Add Python to PATH" during installation.

Verify installation:

python --version

Step 2: Install Webots

  1. Visit https://cyberbotics.com/
  2. Download the package for your operating system
  3. Run the installer and follow the prompts (agree to all defaults)
  4. Launch Webots to verify installation

Project Setup

Create Your Webots World

  1. Open Webots
  2. File → New → New Project Directory
  3. Use the Project Creation Wizard:
    • Directory name: Webots_SB3_Tutorial
    • World name: robot_navigation
    • Check "Add a rectangle arena"
    • Click Finish

Webots will create the project structure and open your new world with a basic arena.

Creating a new project in Webots

Set Up Python Environment

Here's something important: Webots uses its own Python environment. Traditional virtual environments don't work directly with Webots controllers. When you set a controller in Webots, it launches a subprocess using the system Python, completely ignoring your activated virtual environment.

For RL/ML workflows with external libraries like Stable Baselines3, we use External Controllers. This lets you run your code from your terminal or IDE (where your virtual environment is active) while connecting to the Webots simulation.

Navigate to your project folder and create a virtual environment:

# Navigate to your Webots project
cd {path-to-your}\Webots_SB3_Tutorial

# Create virtual environment in the project folder
python -m venv webots_rl_env

# Activate it
# On Windows:
webots_rl_env\Scripts\activate

# On macOS/Linux:
source webots_rl_env/bin/activate

Install Required Packages:

pip install stable-baselines3[extra] gymnasium numpy

Verify installation:

python -c "import stable_baselines3; print(stable_baselines3.__version__)"

Set Webots Environment Variable:

For external controllers to work, Python needs to know where Webots is installed. Set this once:

# Windows PowerShell:
$env:WEBOTS_HOME = "C:\Program Files\Webots"

# Windows CMD:
set WEBOTS_HOME=C:\Program Files\Webots

# macOS/Linux:
export WEBOTS_HOME=/Applications/Webots.app
# or wherever you installed Webots

To make this permanent, add it to your system environment variables or shell profile.
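For example, you could add lines like these (paths assume the default install locations; adjust to yours):

```shell
# macOS/Linux: append to ~/.bashrc or ~/.zshrc
export WEBOTS_HOME=/Applications/Webots.app   # macOS default
# export WEBOTS_HOME=/usr/local/webots        # common Linux location

# Windows: persist for your user account (run once in CMD)
# setx WEBOTS_HOME "C:\Program Files\Webots"
```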

Your project structure should now look like this:

Webots_SB3_Tutorial/
├── webots_rl_env/          # Your virtual environment
├── controllers/
├── libraries/
├── plugins/
├── worlds/
│   └── robot_navigation.wbt
└── protos/

Building Your Simulation World

Now we'll add the components our RL agent needs: a robot to control, a target to reach, and a supervisor to manage the training loop.

Understanding the Architecture

Webots & Stable Baselines3 implementation

Before we build, let's understand how the pieces connect:

Webots runs like this:

Initialize world → Update physics → Read sensors → Control actuators → Repeat

Gymnasium (the RL standard) expects:

reset() → observation, info
step(action) → observation, reward, terminated, truncated, info

The bridge: We create a Gymnasium-compatible environment that:

  1. Controls the Webots simulation timestep
  2. Reads sensor data and converts to observations
  3. Receives actions and sends to robot actuators
  4. Calculates rewards based on task progress
  5. Detects episode termination

Webots & Stable Baselines3 interaction. We'll implement this bridge in the next tutorial.
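To make the Gymnasium loop concrete, here's a minimal, runnable sketch of that bridge's shape. The Webots Supervisor calls are replaced with toy 2D "physics" so it runs anywhere; the class name and all numbers here are illustrative, not part of the Webots API:

```python
import math
import numpy as np

class ToyNavigationEnv:
    """Gymnasium-style shape of the bridge: reset() -> (obs, info),
    step(action) -> (obs, reward, terminated, truncated, info).
    The real Webots calls (motor commands, supervisor.step) are stubbed."""

    def __init__(self):
        self.robot_pos = np.zeros(2)
        self.target_pos = np.array([0.3, 0.3])

    def reset(self, seed=None):
        # In Webots: reset the simulation and move DEF ROBOT to a random pose
        rng = np.random.default_rng(seed)
        self.robot_pos = rng.uniform(-0.4, 0.4, size=2)
        return self._observation(), {}

    def step(self, action):
        # In Webots: send wheel velocities, then advance one timestep
        self.robot_pos = self.robot_pos + 0.01 * np.asarray(action, dtype=float)
        obs = self._observation()
        distance = float(obs[0])
        reward = -distance              # closer to the target is better
        terminated = distance < 0.05    # episode ends at the target
        return obs, reward, terminated, False, {}

    def _observation(self):
        # Relative observation: distance and angle to the target
        delta = self.target_pos - self.robot_pos
        return np.array([np.linalg.norm(delta), math.atan2(delta[1], delta[0])])
```

In the next tutorial, the stubbed lines become real Supervisor and motor calls, while the reset/step structure stays exactly the same.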

The Navigation Task

We're building a simple but powerful setup:

  • A robot starts at random positions in the arena
  • A target (goal) is placed somewhere in the arena
  • The robot learns to drive toward the target using relative observations (distance and angle), not absolute positions

This approach means once trained, you can move the target anywhere and the robot will adapt. The policy learns "navigate toward what I see" rather than "go to coordinates (x, y)."
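In code, that relative observation might look like the following sketch (the function name and frame convention are mine, not part of the Webots API):

```python
import math

def relative_observation(robot_xy, robot_heading, target_xy):
    """Distance to the target and its bearing in the robot's own frame."""
    dx = target_xy[0] - robot_xy[0]
    dy = target_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    # Bearing relative to the robot's heading, wrapped to [-pi, pi]
    bearing = math.atan2(dy, dx) - robot_heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing
```

With this observation, (0.3, 0.0) means "target 0.3 m dead ahead" no matter where the robot or target actually sit in the arena.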

Add the Robot

  1. Add a robot to your world:
    • In Webots, click the Add button (+ icon)
    • Navigate to: PROTO nodes (Webots Projects) → robots → gctronic → e-puck → E-puck (Robot)
    • Alternatively, search for "E-puck" in the Add a node dialog
    • Click Add

Add E-Puck robot in Webots

  2. Give the robot a DEF name:
    • Click on the E-puck in the scene tree
    • At the very top of the node properties, add ROBOT to the DEF: field
    • This allows our Python code to reference this specific robot

  3. Set the robot controller to external:
    • In the properties panel, find the controller field
    • Change it from "e-puck" to <extern>
    • This tells Webots we'll control it from our Python script

Setting a robot to be controlled from Python in Webots

Add the Target

We need a visible target for the robot to navigate toward. We'll use a Solid node so it can be repositioned programmatically (for testing different positions), but we'll make it non-colliding so the robot can reach the exact center.

  1. Add a Solid node:

    • Click the Add button
    • Select Base nodes → Solid
  2. Give the target a DEF name:

    • Select the Solid node
    • Add TARGET to the DEF: field
    • This allows our Python code to reference and move this object
  3. Add visual appearance:

    • Expand the Solid node in the scene tree
    • Right-click on children [] → Add New → Choose Shape
    • Expand the Shape node
    • Right-click on geometry NULL → Add New → Choose Cylinder
    • Configure the Cylinder:
      • Set radius to 0.01
      • Set height to 0.05
    • Right-click on appearance NULL → Add New → Choose PBRAppearance
    • Expand PBRAppearance and set baseColor to red: 1 0 0
  4. Position the target:

    • Find the translation field
    • Set it to: 0.3 0.025 0.3 (x, y, z coordinates)

Why use Solid without collision?

  • Solid nodes can be moved programmatically via the Supervisor API (useful for testing)
  • We skip physics and boundingObject so the robot can drive through the marker
  • The target is purely visual: a goal marker, not a physical obstacle
  • Later, you can add physics if you want obstacle avoidance training

Add the Supervisor

For RL to work, we need a "supervisor" that can:

  • Reset the robot position between episodes
  • Read positions of both robot and target
  • Calculate rewards
  • Control the simulation

  1. Add a Robot node for supervision:

    • Click Add
    • Select Base nodes → Robot
  2. Configure it as a supervisor:

    • Set name to "supervisor_controller"
    • Set supervisor field to TRUE
    • Set controller to <extern>

Save Your World

File → Save World

Your scene tree should now look like this:

Scene tree

Your scene should look like this:

Complete Webots Scene

What we just built:

  • ROBOT (E-puck): The agent that will learn to navigate
  • TARGET (red cylinder): The goal position
  • Supervisor: The "brain" that runs our RL training loop

Verifying Your Setup

Let's make sure everything is connected properly.

Create the test controller:

  1. In your project, create a new folder: controllers/test_supervisor/
  2. Inside that folder, create a file: test_supervisor.py

Your folder structure should look like:

Webots_SB3_Tutorial/
├── webots_rl_env/
├── controllers/
│   └── test_supervisor/
│       └── test_supervisor.py
├── worlds/
│   └── robot_navigation.wbt
└── protos/

Add this code to test_supervisor.py:

from controller import Supervisor

# Initialize supervisor
supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

# Test: Can we access our nodes?
robot_node = supervisor.getFromDef("ROBOT")
target_node = supervisor.getFromDef("TARGET")

if robot_node and target_node:
    print("✓ Setup successful!")
    print(f"  Robot found at: {robot_node.getPosition()}")
    print(f"  Target found at: {target_node.getPosition()}")

    # Test moving the target
    trans_field = target_node.getField("translation")
    current_pos = trans_field.getSFVec3f()
    print(f"  Target can be moved: {current_pos}")

else:
    print("✗ Setup error!")
    if not robot_node:
        print("  Missing: ROBOT (check DEF name on E-puck)")
    if not target_node:
        print("  Missing: TARGET (check DEF name on Solid)")

# Run one simulation step
supervisor.step(timestep)
print("✓ Simulation step successful!")

To run the test:

  1. In Webots, open your robot_navigation.wbt world
  2. Select the Robot (supervisor_controller) node in the scene tree
  3. Change its controller field from <extern> to test_supervisor
  4. Click the Play button (▶) in Webots (you may need to click Restart first)
  5. Check the Webots console (bottom panel)

Expected output in the Webots console:

INFO: test_supervisor: Starting controller: python.exe -u test_supervisor.py
✓ Setup successful!
  Robot found at: [0.0, 0.0, 0.0]
  Target found at: [0.3, 0.025, 0.3]
  Target can be moved: [0.3, 0.025, 0.3]
✓ Simulation step successful!

After testing:

  1. Important: Change the supervisor's controller field back to <extern> (we'll need this for the next tutorial)
  2. File → Save World

Optional test: Hold Shift + Left Click and drag the target in the 3D view. It should move freely, confirming the physics setup is correct.


What You Accomplished

🎉 Congratulations! You've built a complete foundation for RL training in Webots:

  • Installed Webots and Python environment
  • Created a simulation world with robot and target
  • Configured external controller setup
  • Verified Python can communicate with Webots
  • Ready to build a Gymnasium environment (next tutorial)

Next Steps

Coming in the next tutorial: "Building a Gymnasium Environment for Webots Robot Control"

We'll write the code that bridges Stable Baselines3 and Webots:

  • Creating a custom Gymnasium environment class
  • Implementing reset() and step() methods
  • Defining observation and action spaces
  • Designing a reward function
  • Handling episode termination
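As a preview of the reward-design step, navigation rewards often combine progress shaping with a terminal bonus. Here's a sketch with illustrative coefficients (the actual values are worked out in the next tutorial):

```python
def navigation_reward(prev_distance, distance, reached,
                      step_cost=0.01, goal_bonus=10.0):
    """Progress toward the target, minus a small per-step time penalty,
    plus a bonus on reaching the goal. All coefficients are illustrative."""
    reward = (prev_distance - distance) - step_cost
    if reached:
        reward += goal_bonus
    return reward
```

The shaping term rewards moving closer each step, the step cost discourages dawdling, and the bonus makes actually reaching the target clearly worthwhile.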



Final Thoughts

You might find interacting with Webots a bit confusing at first; a tool with this many features can feel daunting. But here's the thing: the best way to learn is by playing around.

Go beyond what we've covered in this tutorial. Experiment with the pre-made robots available in Webots. Try out your own ideas by adding and customizing different nodes. Webots lets you create fully custom environments, and hands-on exploration is often the fastest way to get comfortable with any new tool.

Conclusion

You now have a professional-grade simulation setup ready for RL experimentation. This foundation uses the same tools researchers and companies use for real robotics projects, no expensive hardware required.

The key insight we've established: by using relative observations (distance and angle to target) instead of absolute positions, our future trained agent will generalize. Move the target anywhere, and the robot will adapt.

In the next tutorial, we'll connect our Webots environment to Gymnasium.


Thank you for reading to the end. If you run into any issues during implementation, drop a comment and I'll do my best to respond promptly.
