DEV Community

Coding Lad
Coding Lad

Posted on

Monte Carlo Tree Search (MCTS) Tutorial – Complete Guide with Examples

Monte Carlo Tree Search (MCTS) Tutorial – Complete Guide with Examples

Master Monte Carlo Tree Search (MCTS) with this beginner-friendly guide. Learn the theory, step-by-step algorithm, and practical examples used in AI and game-playing systems.

🧠 What is MCTS?

Monte Carlo Tree Search (MCTS) is a powerful decision-making algorithm used in games and simulations. It balances:

  • Exploration: trying new or unknown moves
  • Exploitation: selecting moves that are known to work well

⚙️ Four Steps of MCTS

1. Selection

Traverse the tree from the root using the UCB1 formula to select the most promising child:


UCB1(S\_i) = (t\_i / n\_i) + C \* sqrt(ln(N) / n\_i)

Enter fullscreen mode Exit fullscreen mode

Where:

  • t_i: total value of node S_i
  • n_i: visit count of node S_i
  • N: total visits of the parent
  • C: exploration constant (usually √2 or 2)

2. Expansion

Once a node is selected, generate one or more child nodes (if any unvisited children exist).


3. Rollout (Simulation)

Run a random simulation (or using a default policy) from the newly expanded node until a terminal state is reached.


4. Backpropagation

Propagate the result of the simulation up the tree, updating value and visit count for each node on the path.


🔁 This 4-step process is repeated X times (number of simulations/iterations).


🧩 MCTS Example – Step-by-Step

Image description

Given a snapshot of an MCTS tree, we perform 2 rollouts with return values 15 and 75.

💡 Always pick the right node when there's a tie/random choice.


▶️ Rollout 1 (return value = 15)

Step 1: Selection

Compute UCB1 for S1 and S2:

S1: t=60, n=2
S2: t=18, n=1
Parent (S0): n=3, ln(3) ≈ 1.0986, C = 2

UCB1(S1) ≈ 30 + 2 * sqrt(1.0986/2) ≈ 31.48  
UCB1(S2) ≈ 18 + 2 * sqrt(1.0986/1) ≈ 20.09  
Enter fullscreen mode Exit fullscreen mode


`

Choose S1

Step 2: Expansion

S1 has an unvisited child S3. We expand the tree by adding S3.

Step 3: Rollout

Run a random simulation from S3 → terminal value = 15.

Step 4: Backpropagation

Update values:


S3: t=15, n=1
S1: t=75, n=3
S0: t=98, n=4


▶️ Rollout 2 (return value = 75)

Step 1: Selection from S0


S1: t=75, n=3 → UCB1 ≈ 26.36
S2: t=18, n=1 → UCB1 ≈ 20.35

Choose S1

Step 2: Selection within S1


S3: t=15, n=1 → UCB1 ≈ 17.09
S4: t=40, n=1 → UCB1 ≈ 42.09

Choose S4

Step 3: Expansion

S4 has an unvisited child → expand to S6.

Step 4: Rollout

Simulation from S6 returns 75.

Step 5: Backpropagation


S6: t=75, n=1
S4: t=115, n=2
S1: t=150, n=4
S0: t=173, n=5

📖 Original blog with diagrams + YouTube video here:
👉 Monte Carlo Tree Search (MCTS) Guide


✍️ Written by Dipto Mistry — Full-stack developer & AI enthusiast.

Top comments (0)