
Andrés Álvarez Iglesias


Django 10 - Implementing TicTacToe with AI

NOTE: This article was initially posted on my Substack, at https://andresalvareziglesias.substack.com/

Hi all!

The Tic Magical Line experiment is approaching its end. In the previous articles, we learned how to build a full-stack Django version of the TicTacToe game, inside a containerized environment with the help of Docker.

Our TicTacToe is a (sort of) MMORPG. Each player can battle against other players... but also against the CPU, disguised as a dragon.

Let's make the dragon's brain and play a bit with the mysterious world of AI and Machine Learning...

Thanks for reading A Python journey to Full-Stack! Subscribe for free to receive new posts and support my work.

Articles in this series


CPU player without Machine Learning

TicTacToe is a simple game, and the CPU player logic can be really simple too. We can do something like this:

import random


class DragonPlay:
    def __init__(self, board, type="ai"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            raise NotImplementedError("Not implemented yet!")

    def getEmptyPositions(self):
        # Collect the indexes of all empty ("E") cells of the board
        emptyPositions = []

        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)

        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)

This simple agent makes random movements in a very dumb way... but it allows a player to play against the CPU. Very useful for testing the game logic of our Django application so far... but a bit boring in the long run.
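For example, asking the simple dragon for a move looks like this (a small usage sketch; the board is the same nine-character string of "E", "X" and "O" cells used by the game logic):

# Assuming the DragonPlay class above is importable:
play = DragonPlay("XOEEXEEEO", type="simple")
position = play.chooseMovement()  # index 0-8 of an empty cell, or -1
print(f"The dragon plays at position {position}")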

We need a smarter dragon...

CPU player with Machine Learning

Make easy things hard, just for fun. Let's create the same CPU player, but using a bit of AI and Machine Learning this time:

import random
import os

from tensorflow.keras.models import load_model

from game.tictactoe.dragonagent import DragonAgent


class DragonPlay:
    def __init__(self, board, type="ai"):
        self.board = board
        self.type = type

    def chooseMovement(self):
        if self.type == "simple":
            return self.simpleMovement()
        else:
            return self.aiMovement()

    def getEmptyPositions(self):
        emptyPositions = []

        for i in range(0, 9):
            if self.board[i] == "E":
                emptyPositions.append(i)

        return emptyPositions

    def simpleMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        if random.choice([True, False]):
            # Choose the first empty position and play there
            return emptyPositions[0]
        else:
            # Choose a random empty position and play there
            return random.choice(emptyPositions)

    def aiMovement(self):
        emptyPositions = self.getEmptyPositions()
        if len(emptyPositions) == 0:
            print("No empty position to play!")
            return -1

        # Play greedily at inference time: no random exploration
        agent = DragonAgent(exploration_rate=0.0)
        if os.path.exists('/game/tictactoe/model/dragon.keras'):
            agent.model = load_model('/game/tictactoe/model/dragon.keras')

        position = agent.start(self.boardToState(self.board))
        if self.board[position] != "E":
            # The model suggested an occupied cell: fall back to a random
            # empty one instead of looping forever on the same prediction
            position = random.choice(emptyPositions)

        return position

    def boardToState(self, board):
        # Encode the board for the network: empty=0, X=1, O=-1
        state = []

        for cell in board:
            if cell == 'E':
                state.append(0)
            elif cell == 'X':
                state.append(1)
            elif cell == 'O':
                state.append(-1)

        return state

This code loads an agent class and a Machine Learning model. The agent is a TensorFlow-based implementation of Q-learning, a reinforcement learning algorithm that learns by playing:

import numpy as np
import tensorflow as tf


class DragonAgent:
    def __init__(self, alpha=0.5, discount=0.95, exploration_rate=1.0):
        self.alpha = alpha                        # learning rate
        self.discount = discount                  # weight given to future rewards
        self.exploration_rate = exploration_rate  # probability of a random move
        self.state = None
        self.action = None

        # A small network: 9 board cells in, one Q-value per cell out
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(32, input_shape=(9,), activation='relu'),
            tf.keras.layers.Dense(32, activation='relu'),
            tf.keras.layers.Dense(9)
        ])

        self.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=alpha), loss='mse')

    def start(self, state):
        # First move of a game: remember the state and the chosen action
        self.state = np.array(state)
        self.action = self.get_action(state)
        return self.action

    def get_action(self, state):
        # Epsilon-greedy policy: either explore randomly or exploit the model
        if np.random.uniform(0, 1) < self.exploration_rate:
            action = np.random.choice(9)
        else:
            q_values = self.model.predict(np.array([state]))
            action = np.argmax(q_values[0])
        return action

    def learn(self, state, action, reward, next_state):
        # Q-learning update: reward plus the discounted best future value
        q_update = reward
        if next_state is not None:
            q_values_next = self.model.predict(np.array([next_state]))
            q_update += self.discount * np.max(q_values_next[0])

        q_values = self.model.predict(np.array([state]))
        q_values[0][action] = q_update

        self.model.fit(np.array([state]), q_values, verbose=0)

        # Explore a little less after every update
        self.exploration_rate *= 0.99

    def step(self, state, reward):
        # Regular turn: pick a move and learn from the previous one
        action = self.get_action(state)
        self.learn(self.state, self.action, reward, state)
        self.state = np.array(state)
        self.action = action
        return action

It looks a bit confusing; we need to see how the agent is used to really understand it. It will all make sense in the end, believe me :)
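The key piece is the update inside learn(): the predicted value of the action we took is pushed toward the reward we got, plus the discounted best value the network predicts for the next state. With plain numbers (a toy illustration, not game code):

# The Q-learning target computed in DragonAgent.learn():
#   q_update = reward + discount * max(Q(next_state))
discount = 0.95
reward = 0                        # intermediate moves earn nothing yet
q_values_next = [0.1, -0.2, 0.4]  # what the network predicts for next_state
q_update = reward + discount * max(q_values_next)
print(q_update)  # 0.38: the new training target for the chosen action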

How to train your dragon

In this line, the previous code loaded a pre-trained model:

load_model('/game/tictactoe/model/dragon.keras')

But how can we train this model? We can teach a couple of dragons how to play TicTacToe, rewarding them for each victory and punishing them for each defeat. The dragons then play one game, and another, and another, and another... You get the idea.
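In terms of the DragonAgent API above, one training game follows this pattern (a minimal sketch with hypothetical board strings, assuming a standalone boardToState() encoder like the one in DragonPlay above):

# One training game, reduced to its reward scheme:
agent = DragonAgent()

state = boardToState("EEEEEEEEE")    # empty board
action = agent.start(state)          # first move of the game

# ... the opponent answers, the board changes ...
state = boardToState("XOEEEEEEE")
action = agent.step(state, 0)        # game still open: reward 0

# ... and when the game finally ends:
agent.learn(state, action, 1, None)  # victory: +1 (defeat would be -1)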

How can we implement this? Simple: get a TicTacToe board and a couple of DragonAgent instances, and let the play begin:

from tensorflow.keras.models import load_model
import tensorflow
import os
import random
import sys

from dragonagent import DragonAgent
from tictactoe import TicTacToe


def boardToState(board):
    # Encode the board for the network: empty=0, X=1, O=-1
    state = []

    for cell in board:
        if cell == 'E':
            state.append(0)
        elif cell == 'X':
            state.append(1)
        elif cell == 'O':
            state.append(-1)

    return state

def agentPlay(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        if game.freeBoardPositions() > 1:
            position = agent.get_action(boardToState(game.board))
        else:
            position = game.getUniquePossibleMovement()

        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")

    # Return the played position so the caller can feed it to agent.learn()
    return position


def agentStart(prefix, name, game, agent, symbol):
    validMove = False
    while not validMove:
        position = agent.start(boardToState(game.board))

        validMove = game.makeMove(symbol, position)
        if validMove:
            print(f"{prefix} > {name}: Plays {symbol} at position {position} | State: {game.board}")

    # Return the played position so the caller can feed it to agent.learn()
    return position

def playGame(prefix, agent, opponent):
    emptyBoard = "EEEEEEEEE"

    game = TicTacToe(emptyBoard)

    # Choose who starts the game
    agentIsO = random.choice([True, False])
    print(f"{prefix} > NOTE: In this game the agent is {'O' if agentIsO else 'X'}")

    agentInitialized = False
    opponentInitialized = False

    while not game.checkGameOver() and not game.noPossibleMove():
        if agentIsO:
            # Give an immediate reward of +1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'O')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'O')
                agentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'X')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'X')
                opponentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break

        else:
            # Give an immediate penalty of -1 if the opponent wins
            if opponentInitialized:
                position = agentPlay(prefix, "Opponent", game, opponent, 'O')
            else:
                position = agentStart(prefix, "Opponent", game, opponent, 'O')
                opponentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Opponent wins! Agent's reward is: -1")
                agent.learn(boardToState(game.board), position, -1, None)
                break

            # Give an immediate reward of +1 if the agent wins
            if agentInitialized:
                position = agentPlay(prefix, "Agent", game, agent, 'X')
            else:
                position = agentStart(prefix, "Agent", game, agent, 'X')
                agentInitialized = True

            if game.checkGameOver():
                print(f"{prefix} > Agent wins! Agent's reward is: +1")
                agent.learn(boardToState(game.board), position, 1, None)
                break

        # If no one wins, give a reward of 0
        agent.step(boardToState(game.board), 0)

    print(f'{prefix} > Game over! Winner: {game.winner}')
    game.dumpBoard()

    # +1 win, 0 draw ('D'), -1 loss, from the agent's point of view
    if (agentIsO and game.winner == 'O') or (not agentIsO and game.winner == 'X'):
        return 1
    elif game.winner == 'D':
        return 0
    else:
        return -1

# Reopen the trained model if available
agent = DragonAgent()
if os.path.exists('/game/tictactoe/model/dragon.keras'):
    agent.model = load_model('/game/tictactoe/model/dragon.keras')

# The opponent must be more exploratory; set it to 1.0 to always choose
# random actions (exploration_rate goes from 0.0 to 1.0)
opponent = DragonAgent(exploration_rate=0.9)

# We can optionally set the number of games from the command line
try:
    numberOfGames = int(sys.argv[1])
except (IndexError, ValueError):
    numberOfGames = 10

# Disable the interactive Keras training messages
tensorflow.keras.utils.disable_interactive_logging()

# Play each game
wins = 0
draws = 0
losses = 0

for numGame in range(numberOfGames):
    prefix = f"{numGame+1}/{numberOfGames}"

    print(f"Playing game {prefix}...")
    result = playGame(prefix, agent, opponent)

    if result == 1:
        wins += 1
    elif result == 0:
        draws += 1
    else:
        losses += 1

    # Save the trained model after each game
    agent.model.save('/game/tictactoe/model/dragon.keras')

    print(f'{prefix} > Training result so far: {wins} wins, {losses} losses, {draws} draws')
    print()

I'm sure there is a better way of doing this, but remember, we are still learning: start with something that (sort of) works and improve it later 🙂

This piece of code performs any number of AI battles, learning along the way and storing the training result in a model file. Later, we can use this model file in the Tic Magical Line application.

Not very useful... but fun!
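Plugging the trained dragon into the Django side then only takes a few lines wherever the game logic needs a CPU move. A hypothetical sketch (the module path and function name here are my own illustration, not the app's actual code):

# Hypothetical glue code for the Django application:
from game.tictactoe.dragonplay import DragonPlay

def dragon_move(board: str) -> int:
    # Ask the trained dragon for its next move on a 9-cell board string;
    # DragonPlay loads dragon.keras if a trained model file exists
    play = DragonPlay(board, type="ai")
    return play.chooseMovement()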

What we have learned so far

This experiment has been, from beginning to end, an excuse to learn how to build a Django application inside a Dockerized environment. Everything else (the TicTacToe part, the dragons and the machine learning) is just a bit of spice to make the learning more fun.

We have learned so far that Django is awesome. It is full of functionality, very well organized, and has a ton of plugins and extensions. Very, very useful.

Now we can use this fantastic framework to build more useful applications.

Thanks for reading A Python journey to Full-Stack! Subscribe for free to receive new posts and support my work.

About the list

Among the Python and Docker posts, I will also write about other related topics (always tech and programming topics, I promise... fingers crossed), like:

  • Software architecture
  • Programming environments
  • Linux operating system
  • Etc.

If you find an interesting technology, programming language or whatever, please let me know! I'm always open to learning something new!

About the author

I'm Andrés, a full-stack software developer based in Palma, on a personal journey to improve my coding skills. I'm also a self-published fantasy writer with four published novels to my name. Feel free to ask me anything!
