Introducing the Whot! AI Model
PROBLEM STATEMENT
The task is to build a practical AI model that can play the Whot! card game at human level. As we all know, Whot! is a blend of skill and luck: skill in how effectively you use the cards in your hand, and luck in whether you happen to draw more win-bound special cards than your opponent.
INTRODUCTION
As Artificial Intelligence keeps advancing in this age, it seems there is hardly anything left untouched by its effects. We wish to build an AI model, deployable to the public, that is capable of playing the Whot! card game; this vision has now given birth to the Whot! AI. We can also use this project to demystify AI: you might think AI is one big unreachable thing, but no, you can build your own AI. If you have studied a bit of Statistics, then Machine Learning, the subset of AI that deals with the algorithms of Artificial Intelligence, is similar to the way you predict the future from an available dataset using your knowledge of Statistics. For example, if the price is $1000 under certain conditions, what will the price look like under some new conditions?
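To make that price example concrete, here is a minimal sketch of the statistical idea using ordinary least squares. The features (size and age) and the numbers are invented purely for illustration, not taken from any real dataset:

```python
import numpy as np

# Toy "conditions": each row is [size_in_sqm, age_in_years].
X = np.array([[50, 10], [80, 5], [120, 2], [60, 8]], dtype=float)
y = np.array([700, 1000, 1500, 820], dtype=float)  # observed prices in $

# Fit price ~ w0 + w1*size + w2*age by ordinary least squares.
X_design = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predict the price under a new condition: 100 sqm, 3 years old.
new_condition = np.array([1.0, 100.0, 3.0])  # intercept, size, age
print(f"predicted price: ${new_condition @ w:.0f}")
```

This is the essence of learning from data: fit parameters to past observations, then apply them to new conditions.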
THE WHOT! AI
The new Whot! AI is an AI model that plays the Whot! card game. The model is trained using a Machine Learning approach known as Reinforcement Learning (RL). Reinforcement Learning is a family of ML techniques in which an AI agent is placed in an environment and learns by directly interacting with it. Unlike supervised learning, it does not require labelled data for training. In the typical supervised ML approach, we gather labelled data; labelling, in layman's terms, means knowing the questions and answers about the data. Questions like: does the animal have a small weight, does it have whiskers, does it have a long tail, is its face round or oval, and what is its weight in kilograms? By asking and answering such questions, we can build an ML model that classifies an image of an animal as either a dog or a cat. This kind of learning works well when we have enough such data; when we do not, it becomes very difficult to build an AI model that we can trust.
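As a minimal sketch of that supervised cat-vs-dog idea, here is a 1-nearest-neighbour classifier over hand-made features (weight in kg, whisker length, tail length); the data points are invented for illustration:

```python
# Each labelled example: ((weight_kg, whisker_cm, tail_cm), label).
labelled_data = [
    ((4.0, 6.0, 30.0), "cat"),
    ((3.5, 7.0, 28.0), "cat"),
    ((20.0, 2.0, 35.0), "dog"),
    ((30.0, 1.5, 40.0), "dog"),
]

def classify(features):
    """1-nearest-neighbour: predict the label of the closest known example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labelled_data, key=lambda item: distance(item[0], features))
    return label

print(classify((5.0, 6.5, 29.0)))  # -> "cat"
```

With only four examples this is fragile, which is exactly the point of the paragraph above: supervised learning is only as trustworthy as the quantity and quality of its labelled data.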
For a rule-based game like Whot!, gathering such data would not only be impractical; it would also be difficult, if not impossible, to train an AI that performs 100% correctly using the traditional supervised learning approach, since the game rules must be followed 100% of the time. If, for instance, the call card is Circle 7 and the model predicts a card that matches neither the shape nor the number, say Star 3, this breaks the game rules. So using the traditional supervised learning technique for such gaming applications is nearly infeasible.
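To make the legality constraint concrete, here is a simplified sketch of a validity check. It assumes the common Whot! convention that a card may be played if it matches the call card by shape or by number, and that the Whot! card (number 20) is a wildcard; the actual engine's rules may have more cases:

```python
from collections import namedtuple

Card = namedtuple("Card", ["shape", "number"])

def is_valid_play(call_card: Card, candidate: Card) -> bool:
    """A card is playable if it matches the call card by shape or number,
    or if it is the wild Whot! card (assumed to be number 20)."""
    if candidate.number == 20:
        return True
    return (candidate.shape == call_card.shape
            or candidate.number == call_card.number)

print(is_valid_play(Card("circle", 7), Card("star", 3)))    # False: breaks the rule
print(is_valid_play(Card("circle", 7), Card("circle", 3)))  # True: shape matches
```

A supervised model has no such hard constraint built in; it can only approximate the rule from examples, and any approximation error is an illegal move.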
REINFORCEMENT LEARNING AND WHOT! AI
This barrier can be broken using Reinforcement Learning! In 2016, AlphaGo, developed by Google DeepMind using Reinforcement Learning, defeated the human world champion Lee Sedol. Reinforcement Learning is applied not only in gaming but also in finance, for instance in fraud detection, and it also finds itself in Robotics. One beautiful thing about Reinforcement Learning is that we don't need to tell the agent how to do something; we just tell it what to do, and it will learn how to do it by itself.
How then does it work? As described earlier, this technique involves putting an AI agent in an environment and allowing it to learn by interacting with that environment. This environment can, for instance, be the Whot! game environment. An environment has states and actions. A state is the condition of the environment at a particular time, and an action is the "right", i.e. most proper, thing to do in such a state. We will use the Whot! game to explain this more clearly. The state of the game can include the current call card, the draw-pile size, your hand size, your opponent's hand size, and so on. So we can ask: if the call card is Pick Two, and you have a Pick Two card in your hand, what is the best action to take in such a state? According to the Whot! rules, there are two valid actions in this state (sketched in code after the list):
Action 1: Play another Pick Two card to defend.
Action 2: Draw two cards from the draw pile.
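Here is a minimal sketch of how such a state and its valid actions could be encoded; the field names and the tuple encoding are illustrative assumptions, not the actual training code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WhotState:
    call_card: tuple        # (shape, number), e.g. ("circle", 2) for Pick Two
    hand_size: int
    opponent_hand_size: int
    draw_pile_size: int
    pending_pick: int       # cards the agent must draw unless it defends

def valid_actions(state: WhotState, hand: list) -> list:
    """Enumerate the legal actions when a Pick Two is in play."""
    actions = []
    for card in hand:
        if card[1] == 2:                          # holding another Pick Two
            actions.append(("play", card))        # Action 1: defend
    actions.append(("draw", state.pending_pick))  # Action 2: draw the penalty
    return actions

state = WhotState(call_card=("circle", 2), hand_size=4,
                  opponent_hand_size=5, draw_pile_size=20, pending_pick=2)
hand = [("star", 2), ("circle", 10), ("cross", 5), ("square", 11)]
print(valid_actions(state, hand))
# [('play', ('star', 2)), ('draw', 2)]
```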
This leads us to state-value and state-action value functions. Our goal is now to derive a policy, called π (pi), that maps a given state to an action; if we find such a policy, then we can model an AI that plays the Whot! game. At first the AI agent may not know the right action to take in any state, so it is allowed to take some random valid actions; later on it no longer acts randomly, but rather starts taking the actions that maximize its rewards. We basically develop a Q-table (think of this as a database table) that the agent can reference for actions: for any state, it checks the table for the action with the maximum expected reward. But when the state space is large and continuous, such a table becomes ineffective, so we adopt a Q-network approach using Deep Learning, hoping the Q-network will generalize better than the Q-table.

The core idea is to design a reward function to guide the agent's learning journey. If it takes a good action, like defending a Pick Two when it holds a Pick Two card, we give it a positive reward like +1 (encouragement); this positive reward is like saying to it, "Good boy! You are doing well!" If it takes a bad action, like drawing two cards when it could defend, we give it a small negative reward like -1 (punishment). We then update its learning policy accordingly using a mathematical equation known as the Bellman equation. The agent continues to make moves and get encouraged or punished, and this can be repeated for 10,000 episodes or more. Check the Whot! AI training details: because the Whot! game is so stochastic, the model was trained for 50,000 episodes. Through this training, the agent learns to avoid the actions that earn it negative rewards and to repeat the ones that earn it positive rewards. After the episodes, we hope it will have learned to interact with the game environment optimally.
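Here is a minimal sketch of the tabular version of this idea, classic Q-learning with an epsilon-greedy exploration strategy. The actual Whot! AI uses a Q-network rather than a table, and the hyperparameter values below are illustrative assumptions:

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor for future rewards
EPSILON = 0.1  # probability of exploring a random valid action

q_table = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state, valid_actions):
    """Epsilon-greedy: sometimes explore randomly, otherwise exploit the table."""
    if random.random() < EPSILON:
        return random.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state, next_valid_actions):
    """One Bellman update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max((q_table[(next_state, a)] for a in next_valid_actions),
                    default=0.0)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```

One episode of training loops over turns: the agent observes the state, calls `choose_action`, receives a reward such as +1 or -1 from the environment, and calls `update`; repeating this for tens of thousands of episodes is what gradually shapes the policy. Replacing `q_table` with a neural network that takes the state as input gives the Q-network variant mentioned above.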
To explain further, let's consider a Robotics application and an example of how Reinforcement Learning might be useful there. Say we want to train AI models to play follow-the-leader dancing, with a lead dancer, possibly a human. The task is to train the models to follow the same dance steps as the leader: if the leader raises their hands, we expect the robots to raise their hands too. We apply RL here as well. Each robot is initially allowed to take some random dance steps, which may not match the leader's; each time a model takes a wrong dance step, we give it a negative reward, and each time it takes the same step as the leader, we give it a positive reward. After repeating the task thousands of times, the models learn to mirror the leader's dance movements.
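The reward design for this analogy is tiny; a sketch, with made-up step names:

```python
import random

DANCE_STEPS = ["raise_hands", "step_left", "step_right", "spin"]

def dance_reward(leader_step: str, robot_step: str) -> float:
    """+1 for mirroring the leader's step, -1 for anything else."""
    return 1.0 if robot_step == leader_step else -1.0

# Early in training the robot picks random steps, so rewards are mixed:
leader = random.choice(DANCE_STEPS)
robot = random.choice(DANCE_STEPS)
print(leader, robot, dance_reward(leader, robot))
```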
THE TRADITIONAL RULE-BASED ENGINE AND RL
However, to train an AI model to play Whot!, we need the game rules. So we use the traditional rule-based engine. The Whot! game you have been playing up to now is called a rule-based engine; it is so called because the rules are explicitly programmed and the computer follows those instructions to play the game. The idea is to use this rule-based engine to train the AI model. Essentially, the agent plays against this rule-based engine, later plays against itself, and keeps gathering experience through the reward function. As you can see from the training details, even though the agent was trained against this traditional rule-based engine, by the end of 50,000 episodes it had won 28,315 games by getting rid of its cards first and 18,095 by counting; that is over 46,000 wins out of 50,000! The AI agent has definitely outplayed and outperformed the rule-based engine that trained it, which means the agent has learned the patterns and rules of the Whot! game better than the traditional rule-based engine. The traditional engine is itself a tough opponent: it is explicitly programmed, for instance, to compute the card order that gives it the longest streak, something like Hold On, Hold On, Suspension, General Market, Pick Two, Check Up! For an agent to outplay such an opponent means it has generalized beyond the engine.
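A sketch of how such a streak search might look inside a rule-based engine: a depth-first search for the longest chain of special cards, where each card matches the previously played one. The set of special numbers (1 Hold On, 2 Pick Two, 8 Suspension, 14 General Market) and the simplified matching rule are assumptions; the real engine's logic may differ:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    shape: str   # e.g. "circle", "star", "triangle"
    number: int

# Assumed mapping of special-card numbers; the actual engine may differ.
SPECIAL_NUMBERS = {1, 2, 8, 14}  # Hold On, Pick Two, Suspension, General Market

def matches(prev: Card, nxt: Card) -> bool:
    """Simplified matching rule: same shape or same number."""
    return prev.shape == nxt.shape or prev.number == nxt.number

def longest_streak(call_card: Card, hand: list) -> list:
    """DFS over the hand for the longest playable chain of special cards."""
    best = []

    def dfs(current, remaining, chain):
        nonlocal best
        if len(chain) > len(best):
            best = chain[:]
        for i, card in enumerate(remaining):
            if card.number in SPECIAL_NUMBERS and matches(current, card):
                dfs(card, remaining[:i] + remaining[i + 1:], chain + [card])

    dfs(call_card, hand, [])
    return best

hand = [Card("circle", 1), Card("circle", 8), Card("triangle", 8),
        Card("triangle", 14), Card("star", 4)]
print(longest_streak(Card("circle", 5), hand))
# [Card(shape='circle', number=1), Card(shape='circle', number=8),
#  Card(shape='triangle', number=8), Card(shape='triangle', number=14)]
```

Hand-crafted search like this plays strongly but only as well as its programmed heuristics; the RL agent, by contrast, discovers its own patterns, which is how it came to outplay its teacher.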
DOWNLOAD AND TEST THE MODEL
You can test the model by downloading its Android app from the link below. As add-ons, we also implemented the traditional rule-based engine (a classic computer opponent) as well as a multiplayer feature over Bluetooth, Wi-Fi, and online. We hope you enjoy the Whot! game in a modern way. Please play the game, then rate and review the app. Thank you.
If you are a developer, you can check out the model training and the Whot! game rules here:
And check out the card image recognition here: