Six months ago, I started working on Mahjong AI. I assumed it would be easier than Go AI.
Go's state space is about 10^170 — "more possible positions than atoms in the universe." Mahjong has only 136 tiles. Intuitively, it should be simpler.
I was completely wrong. Here's why, and what we learned building a multi-rule Mahjong AI engine.
Challenge 1: Imperfect Information
Go is a perfect information game. Both players see the entire board. AlphaGo's brilliance was in search + evaluation — exploring future board states and judging which ones are good.
Mahjong is imperfect information. You see 13 tiles in your hand. The other 123 tiles? You know some (discards are visible), but most are hidden. You're making decisions with ~70% of the information missing.
This breaks MCTS (Monte Carlo Tree Search), the backbone of Go AI. MCTS assumes you can simulate future states accurately. In Mahjong, you can't — because you don't know what tiles other players hold.
Our approach: Instead of tree search, we use LSTM networks that learn to infer hidden information from observable signals (discard patterns, timing, claim/pass decisions). Think of it as teaching the AI to "read" opponents the way human experts do.
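Before any learned model, there's a simpler baseline for what "reading" hidden information means: just counting which tiles are still unseen. The sketch below shows that counting baseline — the function names are illustrative, and the real system replaces this with learned LSTM features over discard patterns, timing, and claim/pass decisions.

```python
from collections import Counter

# Counting baseline for hidden-information inference (illustrative only;
# the actual approach described above uses learned LSTM features).
# Each of the 34 tile types appears exactly 4 times in a 136-tile set.

def unseen_counts(my_hand, visible_discards, melds=()):
    """How many copies of each *observed* tile type are still unseen.
    Tile types never observed still have all 4 copies hidden."""
    seen = Counter(my_hand) + Counter(visible_discards)
    for meld in melds:               # opponents' revealed claim sets
        seen.update(meld)
    return {tile: 4 - n for tile, n in seen.items()}

def discard_danger(tile, visible_discards):
    """Crude safety signal: in Riichi-style rules a tile an opponent already
    discarded is safe against them (furiten blocks their ron on it)."""
    return 0.0 if tile in visible_discards else 1.0
```

An LSTM effectively learns a far richer version of this: instead of raw counts, it weights each unseen tile by how likely opponents are to hold or need it, given everything they have done so far.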
Challenge 2: 200+ Rule Variants
"Mahjong" isn't one game. It's a family of 200+ games.
Changsha Mahjong has "Zha Niao" (bird catching) — after winning, you flip tiles to determine bonus multipliers. Sichuan Mahjong has "Xue Zhan Dao Di" (bloody fight to the end) — the game continues after the first winner until only one loser remains. Japanese Riichi has entirely different scoring, with concepts like "furiten" (you can't win on a tile you previously discarded).
Naively, each variant requires its own model, and training eight separate models from scratch would be prohibitively expensive.
Our approach: Shared base model + rule-specific adapter layers. The base model learns general Mahjong skills (tile efficiency, defense, hand reading). Adapter layers encode variant-specific rules. This is similar to how multilingual NLP models handle different languages.
Result: Training a new variant takes ~40% less compute compared to training from scratch. The model transfers skills like "don't discard tiles your opponent might need" across all variants.
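Structurally, the idea looks like the sketch below: a frozen shared trunk plus one lightweight head per variant. All names, dimensions, and the plain-linear architecture here are illustrative assumptions, not the engine's actual code.

```python
import numpy as np

# Shared-base + per-variant adapter sketch (illustrative, not the real engine).
rng = np.random.default_rng(0)

class MahjongPolicy:
    def __init__(self, obs_dim=64, hidden=128, n_actions=50):
        # Base weights: general Mahjong skill, trained once and then frozen.
        self.W_base = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.adapters = {}           # one small head per rule variant
        self.hidden, self.n_actions = hidden, n_actions

    def add_variant(self, name):
        # Only this head needs training for a new variant.
        self.adapters[name] = rng.normal(0.0, 0.1, (self.hidden, self.n_actions))

    def logits(self, obs, variant):
        shared = np.tanh(obs @ self.W_base)      # variant-agnostic features
        return shared @ self.adapters[variant]   # variant-specific action scores

policy = MahjongPolicy()
policy.add_variant("riichi")
policy.add_variant("sichuan")
obs = rng.normal(size=64)
riichi_logits = policy.logits(obs, "riichi")
```

The compute saving comes from the size asymmetry: the shared trunk holds most of the parameters and is reused, so adding a variant means fitting only the small head.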
Challenge 3: Multi-Agent Dynamics
Go is 1v1. Mahjong is 4-player free-for-all (or 2v2 in some variants).
In a 4-player game, optimal strategy isn't just "maximize my winning probability." It's "maximize my winning probability WHILE considering that three other rational agents are doing the same." This is significantly harder than 2-player zero-sum games.
Example: You're one tile away from winning. But the tile you need was just discarded by the player to your left. Should you claim it? In some variants, claiming a discard to win is legal but reveals information. In Riichi Mahjong, you might actually choose NOT to claim it if you're in furiten.
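The furiten part of that decision is mechanical enough to pin down in code. Here's a simplified check — real furiten has more cases (temporary furiten, riichi furiten); this covers only the permanent form:

```python
# Simplified furiten check for Riichi Mahjong: you cannot win by ron if ANY
# of your waiting tiles appears in your own discard pile. (Temporary and
# riichi furiten are omitted for brevity.)

def in_furiten(waits, my_discards):
    return any(tile in my_discards for tile in waits)

def should_claim_ron(discarded_tile, waits, my_discards):
    """Claiming is only legal if the tile completes the hand AND we are
    not furiten — even though the tile would otherwise win the game."""
    return discarded_tile in waits and not in_furiten(waits, my_discards)
```

So in the example above: if any of your waits ever passed through your own discards, `should_claim_ron` returns `False` even for the exact tile you need.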
Our approach: We train with self-play across 4 agents simultaneously, using Deep Monte Carlo (DMC) methods. Each agent learns not just its own optimal strategy, but also models of what the other three agents are likely to do.
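The defining property of DMC is that every decision in an episode regresses toward the episode's final return, with no bootstrapping between turns. The skeleton below shows that structure; the trivial environment and uniform terminal payoff are stand-ins (a real Mahjong simulator and variant-specific scoring are assumed, not shown).

```python
import random

random.seed(0)

# Skeleton of Deep Monte Carlo (DMC) self-play with 4 agents. The toy
# "environment" and payoff scheme are illustrative stand-ins only.

def dmc_episode(policies, n_turns=20, n_actions=50):
    trajectories = {p: [] for p in range(4)}
    for turn in range(n_turns):
        for player in range(4):
            state = (turn, player)               # placeholder observation
            action = policies[player](state)
            trajectories[player].append((state, action))
    # Mahjong pays out only at game end; pick a toy winner here.
    winner = random.randrange(4)
    returns = {p: (1.0 if p == winner else -1.0 / 3) for p in range(4)}
    # DMC: every visited (state, action) gets the SAME Monte Carlo target,
    # the final return — no intermediate value bootstrapping.
    return [(s, a, returns[p]) for p in range(4) for (s, a) in trajectories[p]]

random_policy = lambda state: random.randrange(50)
batch = dmc_episode([random_policy] * 4)   # (state, action, return) tuples
```

In training, each agent's network would regress on its slice of these tuples while the other three agents keep learning too, which is what forces it to model their behavior rather than a fixed opponent.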
Challenge 4: Reward Signal Sparsity
In Go, every move changes the board state, providing rich feedback signals. In Mahjong, a game can last 20+ turns before anyone wins — and most of those turns are "draw a tile, discard a tile" with no immediate feedback on whether you're playing well.
Our approach: Auxiliary reward signals. Beyond win/lose, we give partial rewards for:
- Hand efficiency improvements (getting closer to a winning hand)
- Successful defensive plays (avoiding dealing into opponents' wins)
- Information gathering (making discards that reveal useful information)
This dramatically accelerates training convergence.
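The shaping scheme above can be sketched as a single reward function. The weights and signal names below are illustrative assumptions, not the tuned values from training:

```python
# Auxiliary-reward sketch; weights are illustrative, not the tuned values.
WEIGHTS = {"efficiency": 0.1, "defense": 0.05, "info": 0.02}

def shaped_reward(terminal, shanten_before, shanten_after,
                  dealt_into_win, info_gained):
    r = terminal  # sparse win/lose signal: e.g. +1/-1 at game end, else 0
    # Hand efficiency: reward reducing shanten (distance to a winning hand).
    r += WEIGHTS["efficiency"] * max(0, shanten_before - shanten_after)
    # Defense: penalize discards that deal into an opponent's win.
    r -= WEIGHTS["defense"] if dealt_into_win else 0.0
    # Information gathering: small bonus for probing discards.
    r += WEIGHTS["info"] if info_gained else 0.0
    return r
```

The design constraint is that the auxiliary terms stay small relative to the terminal reward, so they speed up early learning without distorting what the agent ultimately optimizes.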
Challenge 5: Stochastic Elements
Go has zero randomness. Every game state is deterministic.
Mahjong has massive randomness. The tile draw sequence is random. Your starting hand is random. Other players' hands are random. A "perfect" AI can still lose to a novice due to unlucky draws.
This means evaluation requires thousands of games to reach statistical significance. A 2% win-rate improvement that would be obvious in Go can take 10,000+ games to confirm in Mahjong.
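That figure is consistent with a standard two-proportion power calculation. Assuming a 25% baseline win rate (a fair four-player game) lifted to 27%, with the usual 5% significance level and 80% power:

```python
import math

# Sample size per condition to detect a win-rate lift p0 -> p1, using the
# standard two-proportion power formula (normal approximation). The 25%/27%
# scenario is an assumed example, not a measured result.

def games_needed(p0, p1, z_alpha=1.959964, z_beta=0.841621):
    p_bar = (p0 + p1) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1)))
    return math.ceil((num / abs(p1 - p0)) ** 2)

n = games_needed(0.25, 0.27)   # games per condition (baseline vs. candidate)
```

This lands around 7,500 games per condition — roughly 15,000 total for an A/B comparison, the same order of magnitude as the 10,000+ figure above.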
What We Learned (Technical Summary)
| Aspect | Go AI | Mahjong AI |
|---|---|---|
| Information | Perfect | Imperfect (~70% hidden) |
| Core technique | MCTS + neural net | LSTM + DMC self-play |
| Rules | Single ruleset | 200+ variants |
| Players | 2 (zero-sum) | 4 (general-sum) |
| Randomness | None | High (tile draws) |
| Evaluation | Few games sufficient | Thousands needed |
| State space | Larger (10^170) | Smaller but hidden |
| Action space | ~300/move | ~50/move but context-dependent |
| Training data | Public game records | Variant-specific, often scarce |
The Surprising Takeaway
The hardest part of Mahjong AI isn't any single technical challenge. It's that all five challenges exist simultaneously. Go AI researchers can focus on search algorithms because information is perfect and rules are fixed. Poker AI researchers can focus on imperfect information because the ruleset is fixed and the best-studied settings are two-player.
Mahjong AI requires solving imperfect information + multi-agent dynamics + stochastic outcomes + variable rule sets, all at once. It's a uniquely challenging benchmark for game AI research.
What's Next
- Transformer exploration — attention mechanisms might better capture "who played what" relationships than LSTM
- Online adaptation — adjusting strategy in real-time based on opponent tendencies
- Natural language coaching — using LLMs to translate AI decisions into human-readable explanations ("Don't play 3-wan because your opponent likely needs it for a straight")
I'm building a multi-rule game AI engine covering 7 Mahjong variants + Guandan + Dou Di Zhu + Texas Hold'em. If you're working on game AI or imperfect information games, I'd love to compare notes in the comments.