This article builds upon last week’s minimax searching article. I won’t discuss minimax in depth here, so please check out that article if you have questions about minimax.
Games are fun! Programming computers to play games is also fun!
Today I’m going to walk you through what is takes to write a tic tac toe bot.
Tic tac toe might be a simple game, but it’s useful to teach strategies for how some computers approach playing games.
How Humans Play Tic Tac Toe
Imagine you wanted to play tic tac toe. How would you go about doing that?
You’d first learn the rules. Then start playing games.
Which deciding on a move, you would probably mentally play out how the game would progress.
If I play there, and he plays there, then I could play there, so on and so on.
Over time, you begin to understand which strategies are better or worse.
You begin to gather an intuition about the game.
How Bots Play Tic Tac Toe
On the other hand, how do bots play games? It’s harder for a bot to build an “intuition” about games.
Humans have to program bots how to play games. Historically that’s done via a rules or brute force based approach. Our programs tell bots exactly what to do in each circumstance. That’s changing a bit with the rise of deep learning.
Today we’re going to teach our bot how to play tic tac toe using a brute force solution, minimax searching.
We’ll leverage minimax searching to simulate every possible move and counter move. That way our bot will always pick the best move given a particular game state.
This type of strategy gets computationally infeasible for more complicated games, but useful for solving games of this complexity.
Setting Up Our Game
The first thing that we’ll do is write classes and methods that keep track of our game state. All source code is located on github.
Player
First, we'll define our player enum.
import enum | |
class Player(enum.Enum): | |
x = 1 | |
o = 2 | |
@property | |
def other(self): | |
return Player.x if self == Player.o else Player.o |
Tic tac toe is a two player game, so naturally we’ll have two players. We’ll also provide a useful property, other
, to always allow us to easily get the other player.
Board
Next we’ll turn out attention towards our board. Our board handles most of the logic for running a game. It tracks the history of moves, determines when someone wins and ensures everyone plays according to the rules.
We’ll start of by defining a helper, our board constructor and a print method to print our board.
import copy | |
from .player import Player | |
MARKER_TO_CHAR = { | |
None: ' . ', | |
Player.x: ' x ', | |
Player.o: ' o ', | |
} | |
class Board(): | |
def __init__(self): | |
self.dimension = 3 | |
self.grid = [ [ None for y in range (self.dimension) ] for x in range (self.dimension ) ] | |
self.moves = [] | |
def print(self): | |
print() | |
for row in range(self.dimension): | |
line = [] | |
for col in range(self.dimension): | |
line.append(MARKER_TO_CHAR[self.grid[row][col]]) | |
print('%s' % (''.join(line))) |
MARKER_TO_CHAR
allows us to convert between how we track our players and how we display them.
Our constructor initializes an empty board and empty moves list.
Here’s what an empty board would look like.
And here’s what it would look like after two turns.
Definitely simple, but effective. Now lets turn out attention to determining if someone has won our game.
Winning
How does someone win a game of tic tac toe? Someone wins if they get three in a row. So these are all winning moves for o.
So we’ll add a method called has_winner()
to our board class that checks those 4 cases.
def has_winner(self): | |
# need at least 5 moves before x hits three in a row | |
if (len(self.moves) < 5): | |
return None | |
# check rows for win | |
for row in range(self.dimension): | |
unique_rows = set(self.grid[row]) | |
if (len(unique_rows) == 1): | |
value = unique_rows.pop() | |
if (value != None): | |
return value | |
# check columns for win | |
for col in range(self.dimension): | |
unique_cols = set() | |
for row in range(self.dimension): | |
unique_cols.add(self.grid[row][col]) | |
if (len(unique_cols) == 1): | |
value = unique_cols.pop() | |
if (value != None): | |
return value | |
# check backwards diagonal (top left to bottom right) for win | |
backwards_diag = set() | |
backwards_diag.add(self.grid[0][0]) | |
backwards_diag.add(self.grid[1][1]) | |
backwards_diag.add(self.grid[2][2]) | |
if (len(backwards_diag) == 1): | |
value = backwards_diag.pop() | |
if (value != None): | |
return value | |
# check forwards diagonal (bottom left to top right) for win | |
forwards_diag = set() | |
forwards_diag.add(self.grid[2][0]) | |
forwards_diag.add(self.grid[1][1]) | |
forwards_diag.add(self.grid[0][2]) | |
if (len(forwards_diag) == 1): | |
value = forwards_diag.pop() | |
if (value != None): | |
return value | |
# found no winner, return None | |
return None |
We only check after our game is 5 moves or older, a tic tac toe game is unwinnable before that point.
If it finds a winner it will return that player, else return None
indicating that there is no winner yet.
Helper Functions
Then lastly we’ll wrap up our board with several helper functions.
def make_move(self, row, col, player): | |
if (self.is_space_empty(row, col)): | |
self.grid[row][col] = player | |
self.moves.append([row,col]) | |
else: | |
raise Exception("Attempting to move onto already occupied space") | |
def last_move(self): | |
return self.moves[-1] | |
def is_space_empty(self, row, col): | |
return self.grid[row][col] is None | |
def get_legal_moves(self): | |
choices = [] | |
for row in range(self.dimension): | |
for col in range(self.dimension): | |
if (self.is_space_empty(row, col)): | |
choices.append([row,col]) | |
return choices | |
def __deepcopy__(self, memodict={}): | |
dp = Board() | |
dp.grid = copy.deepcopy(self.grid) | |
dp.moves = copy.deepcopy(self.moves) | |
return dp |
make_move()
takes a row, col and player and will mark that space for that player if the space is empty.
get_legal_moves()
determines which of the board spaces is unoccupied and returns those spaces in a [row, col] format.
__deepcopy__()
, most of the bots that we’re going to build generate copies of boards. This method helps ensure that our simulated boards don’t mess up our real board.
Game
The last class we’ll introduce before diving into bots is our game class. It’s job is to simulate and record stats for games against two different bots.
from .board import Board | |
from .player import Player | |
class Game(): | |
def __init__(self, num_of_games): | |
self.num_of_games = num_of_games | |
self.x_wins = 0 | |
self.o_wins = 0 | |
self.ties = 0 | |
def simulate(self, xbot, obot, print_game = False): | |
for _ in range(self.num_of_games): | |
board = Board() | |
current_turn = Player.x | |
winner = None | |
for i in range(9): | |
choice = [] | |
if (current_turn == xbot.player): | |
choice = xbot.select_move(board) | |
else: | |
choice = obot.select_move(board) | |
board.make_move(choice[0], choice[1], current_turn) | |
winner = board.has_winner() | |
if print_game: | |
board.print() | |
if (winner != None): | |
print ("Congrats " + str(winner)) | |
break | |
elif (i == 8): | |
print ("It's a tie!") | |
break | |
current_turn = current_turn.other | |
if (winner == Player.x): | |
self.x_wins = self.x_wins + 1 | |
elif (winner == Player.o): | |
self.o_wins = self.o_wins + 1 | |
else: | |
self.ties = self.ties + 1 | |
print ("x wins: " + str(self.x_wins)) | |
print ("o wins: " + str(self.o_wins)) | |
print ("ties: " + str(self.ties)) |
It can simulate a configurable amount of games. We’ll use this to determine how our bots fair against either other over a larger quantity of games.
Now lets dive into bots!
RandomBot
The first bot we’re going to build makes random moves. It’s the simplest bot.
import random | |
class RandomBot(): | |
def __init__(self, player): | |
self.player = player | |
def select_move(self, board): | |
candidates = board.get_legal_moves() | |
return random.choice(candidates) |
All it does is take a board, gets the legal moves it could make and picks one at random.
Lets create a runner class to pit two random bots against each other and see how they perform.
from ttt.player import Player | |
from ttt.game import Game | |
from ttt.random_bot import RandomBot | |
def main(): | |
xbot = RandomBot(Player.x) | |
obot = RandomBot(Player.o) | |
game = Game(1000) | |
game.simulate(xbot, obot) | |
if __name__ == '__main__': | |
main() |
If we simulate 1000 games here are the results:
- x wins 577 times
- o wins 312 times
- ties 111 times
You can run this multiple times and the percentages are in this ballpark. Even though both are playing randomly, the x win percentage is higher because x always goes first. X has a first turn advantage.
OneLayer Bot
A random strategy is really bad. Lets make our bot a little smarter. Our next bot looks at the current board, if there is a winning move it will pick it. Otherwise it will default to the random strategy.
import random | |
import copy | |
class OneLayerBot(): | |
def __init__(self, player): | |
self.player = player | |
def select_move(self, board): | |
# gets legal moves | |
candidates = board.get_legal_moves() | |
# loops through legal moves | |
for i in range(len(candidates)): | |
row = candidates[i][0] | |
col = candidates[i][1] | |
newboard = copy.deepcopy(board) | |
newboard.make_move(row, col, self.player) | |
# if move results in player winning, return this move | |
if (newboard.has_winner() == self.player): | |
return [row,col] | |
# otherwise make a random choice | |
return random.choice(candidates) |
The comments describe what our bot is doing. We’re basically performing a one layer search, hence the bot name. Our bot can look one move into the future and pick the best move if it is a win.
Lets create another runner to test our bot, we’ll bit it against the randombot to see whether it improves or reduces winning percentages.
from ttt.player import Player | |
from ttt.game import Game | |
from ttt.random_bot import RandomBot | |
from ttt.one_layer_bot import OneLayerBot | |
def main(): | |
xbot = RandomBot(Player.x) | |
obot = OneLayerBot(Player.o) | |
game = Game(1000) | |
print ("randombot (x) vs onelayerbot (o)") | |
game.simulate(xbot, obot) | |
xbot = OneLayerBot(Player.x) | |
obot = RandomBot(Player.o) | |
game = Game(1000) | |
print ("onelayerbot (x) vs randombot (o)") | |
game.simulate(xbot, obot) | |
xbot = OneLayerBot(Player.x) | |
obot = OneLayerBot(Player.o) | |
game = Game(1000) | |
print ("onelayerbot (x) vs onelayerbot (o)") | |
game.simulate(xbot, obot) | |
if __name__ == '__main__': | |
main() |
We play 1000 games in each combination of bots. Not surprisingly our onelayerbot improves our odds of winning.
- randombot (x) vs. onelayerbot (o)
- x wins: 390
- o wins: 529
- ties: 81
- onelayerbot (x) vs randombot (o)
- x wins: 805
- o wins: 109
- ties: 86
- onelayerbot (x) vs onelayerbot (o)
- x wins: 689
- o wins: 266
- ties: 45
The first set of results is interesting, our one layer bot is able to reverse the first turn advantage that x has because it has a (marginally) better strategy.
TwoLayer Bot
Lets build another bot that looks two moves into the future. It still picks a winning move if there is one, but it now blocks a winning move for the opponent.
import random | |
import copy | |
class TwoLayerBot(): | |
def __init__(self, player): | |
self.player = player | |
def select_move(self, board): | |
losing_move = None | |
first_candidates = board.get_legal_moves() | |
for i in range(len(first_candidates)): | |
# check for winning move | |
row = first_candidates[i][0] | |
col = first_candidates[i][1] | |
newboard = copy.deepcopy(board) | |
newboard.make_move(row, col, self.player) | |
if (newboard.has_winner() == self.player): | |
return [row,col] | |
second_candidates = newboard.get_legal_moves() | |
for j in range(len(second_candidates)): | |
# check if opponent can win on their next turn | |
r = second_candidates[j][0] | |
c = second_candidates[j][1] | |
nb = copy.deepcopy(newboard) | |
nb.make_move(r, c, self.player.other) | |
if (nb.has_winner() == self.player.other): | |
losing_move = [r,c] | |
# if opponent has a winning move, block it | |
if (losing_move != None): | |
return losing_move | |
# otherwise return a random choice | |
return random.choice(first_candidates) |
Lines 21-29 are the new bit. It calculates determines if its opponent can win on it’s next turn. Playing in that space to prevent them from winning.
Again lets create another runner to pit our bots against each other.
from ttt.player import Player | |
from ttt.game import Game | |
from ttt.random_bot import RandomBot | |
from ttt.two_layer_bot import TwoLayerBot | |
def main(): | |
xbot = RandomBot(Player.x) | |
obot = TwoLayerBot(Player.o) | |
game = Game(1000) | |
print ("randombot (x) vs twolayerbot (o)") | |
game.simulate(xbot, obot) | |
xbot = TwoLayerBot(Player.x) | |
obot = RandomBot(Player.o) | |
game = Game(1000) | |
print ("twolayerbot (x) vs randombot (o)") | |
game.simulate(xbot, obot) | |
xbot = TwoLayerBot(Player.x) | |
obot = TwoLayerBot(Player.o) | |
game = Game(1000) | |
print ("twolayerbot (x) vs twolayerbot (o)") | |
game.simulate(xbot, obot) | |
if __name__ == '__main__': | |
main() |
- randombot (x) vs twolayerbot (o)
- x wins: 50
- o wins: 713
- ties: 237
- twolayerbot (x) vs randombot (o)
- x wins: 879
- o wins: 13
- ties: 108
- twolayerbot (x) vs twolayerbot (o)
- x wins: 310
- o wins: 164
- ties: 526
Winning percentages continue to go up for our new bot against the random bot. You’ll also notice that when our twolayerbot plays against itself, it ties over half the games. it’s defense is a lot better.
InvinciBot
And lastly, for the moment we’ve all been waiting for, a bot that will never lose a game of tic tac toe.
InvinciBot!!!! 👍👏
How does our bot do it? It implements a minimax search algorithm to calculate every possible combination of move. Therefore, ensuring with precision, to never make a move that would result in a loss.
Because tic tac toe is a rather simple game, if both players always choose the an optimal strategy the game will always result in a tie. Which is why I say our bot will never lose, instead of saying it will always win. If it’s playing against an inferior opponent it will win most of the time!
Our minimax implementation is a recursive depth first search algorithm. For a particular board, it calculates whether each move would result in a win, tie or lose.
import copy | |
class Choice(): | |
def __init__(self, move, value, depth): | |
self.move = move | |
self.value = value | |
self.depth = depth | |
def __str__(self): | |
return (str(self.move) + ": " + str(self.value)) | |
class InvinciBot(): | |
def __init__(self, player): | |
self.player = player | |
def minimax(self, board, is_max, current_player, depth): | |
# if board has a winner or is a tie | |
# return with appropriate values | |
winner = board.has_winner() | |
if (winner == self.player): | |
return Choice(board.last_move(), 10 - depth, depth) | |
elif (winner == self.player.other): | |
return Choice(board.last_move(), -10 + depth, depth) | |
elif (len(board.moves) == 9): | |
return Choice(board.last_move(), 0, depth) | |
# otherwise, call minimax on each possible board combination | |
candidate_choices = [] | |
candidates = board.get_legal_moves() | |
for i in range(len(candidates)): | |
row = candidates[i][0] | |
col = candidates[i][1] | |
newboard = copy.deepcopy(board) | |
newboard.make_move(row, col, current_player) | |
result = self.minimax(newboard, not is_max, current_player.other, depth + 1) | |
result.move = newboard.last_move() | |
candidate_choices.append(result) | |
max_choice = None | |
max_value = -100 | |
min_choice = None | |
min_value = 100 | |
# determine which board combinations result in | |
# best move for particular agent | |
for i in range(len(candidate_choices)): | |
choice = candidate_choices[i] | |
if (is_max and choice.value > max_value): | |
max_choice = choice | |
max_value = choice.value | |
elif (not is_max and choice.value < min_value): | |
min_choice = choice | |
min_value = choice.value | |
# pick whichever move is the best for the | |
# particular agent | |
if (is_max): | |
return max_choice | |
else: | |
return min_choice | |
def select_move(self, board): | |
choice = self.minimax(board, True, self.player, 0) | |
return choice.move |
Because it simulates every possible move, this bot takes a lot longer to play. Running locally, it takes about 30 secs to make the first move.
Lets put together a runner to test out our bot (this runner took 25 minutes to run locally).
from ttt.player import Player | |
from ttt.game import Game | |
from ttt.random_bot import RandomBot | |
from ttt.invincibot import InvinciBot | |
from datetime import datetime | |
def main(): | |
dateTimeObj = datetime.now() | |
print(dateTimeObj) | |
xbot = InvinciBot(Player.x) | |
obot = RandomBot(Player.o) | |
game = Game(15) | |
print ("invincibot (x) vs randombot (o)") | |
game.simulate(xbot, obot) | |
xbot = RandomBot(Player.x) | |
obot = InvinciBot(Player.o) | |
game = Game(15) | |
print ("randombot (x) vs invincibot (o)") | |
game.simulate(xbot, obot) | |
xbot = InvinciBot(Player.x) | |
obot = InvinciBot(Player.o) | |
game = Game(15) | |
print ("invincibot (x) vs invincibot (o)") | |
game.simulate(xbot, obot) | |
dateTimeObj = datetime.now() | |
print(dateTimeObj) | |
if __name__ == '__main__': | |
main() |
- invincibot (x) vs randombot (o)
- x wins: 15
- o wins: 0
- ties: 0
- randombot (x) vs invincibot (o)
- x wins: 0
- o wins: 13
- ties: 2
- invincibot (x) vs invincibot (o)
- x wins: 0
- o wins: 0
- ties: 15
InvinciBot dominates randombot. It’s not even close. And when invincibots play against each other, the game always ends in a tie. That is the optimal results for a game of tic tac toe. If each player using the best strategy, the game will always tie.
How InvinciBot Plays
I’ll touch briefly on how InvinciBot is so good at tic tac toe. As mentioned before, it’s good (and takes so long) because it uses minimax to search through each possible move.
Here is what a sample search would look like. It generates every possible combination for each player.
I left off most of the board states because I can’t draw that many. But the search doesn’t stop until a particular board ends with a win or a tie. Here’s the number of boards that are generated to pick each move in the game.
- 1: 294,778
- 2: 31,973
- 3: 3,864
- 4: 478
- 5: 104
- 6: 26
- 7: 8
- 8: 3
- 9: 1
To make the 1st move, InvinciBot generates over a quarter of a million potential boards. To make the 9th and last move, there’s only one open spot, so it only generates one board.
As it generates possible board states, it remembers whenever a game ends in a win, tie or lose. Then at each level compares the best outcome for that particular board, ensuring that the player makes the best possible move.
This strategy works well for simple games, but breaks down quickly in more complex games.
Final Thoughts
I really enjoyed making these tic tac toe playing bots! It was fun to see a bot play tic tac toe. While the game is simple, we can still learn a lot of strategies to help us beat other games.
There’s definitely a lot of room for optimization, alpha beta pruning being the logical optimization to make.
If you’re interested in learning more about game playing bots, follow me on twitter, I’ll be learning, writing and tweeting about it as I do it.
I’m currently reading Deep Learning and the Game of Go, this post is inspired by my reading there. Specifically player and printing the board are inspired by that book. I would recommend the book to anyone looking for an in depth introduction to game playing bots.
Please see my github repo for full source code.
Originally published at thesharperdev.com
Top comments (0)