The most efficient way to evaluate poker hands at scale is not to write logic that checks for straights and flushes at runtime. Instead, the industry standard for high-frequency trading (HFT) and real-money poker is pre-computed Lookup Tables (LUTs) combined with bitwise operations and SIMD (Single Instruction, Multiple Data) vectorization.
The goal is to reduce hand evaluation to a single CPU instruction or a few array lookups, achieving nanosecond-level latency. This allows the engine to evaluate millions of hands per second for Monte Carlo simulations, AI training, or real-time fairness checks.
1. The Core Architecture: Lookup Tables (LUTs)
The fundamental insight is that there are a finite number of 5-card combinations from a 52-card deck: $\binom{52}{5} = 2,598,960$.
For 7-card games (Hold'em/Omaha), it's $\binom{52}{7} = 133,784,560$.
The Strategy:
- Pre-computation: Before the server starts, generate a massive array where every possible 5-card (or 7-card) combination maps to a unique integer "strength" value.
- Runtime: Convert the player's hand into a unique integer key (hash).
- Lookup: Access the array at that index. The value returned is the hand's strength.
Memory Trade-off:
- 5-Card LUT: ~2.6 million entries. If each entry is a 4-byte integer, this is ~10 MB. Fits easily in L3 CPU cache.
- 7-Card LUT: ~133 million entries. ~534 MB. Fits in RAM, but may spill out of cache on large servers.
- Optimization: Most engines use a 5-card LUT and a 7-to-5 reduction algorithm. They generate all $\binom{7}{5} = 21$ combinations, look them up, and pick the best.
2. The Bitwise Implementation (The "Cactus Kev" / "TwoPlusTwo" Method)
The most famous high-performance implementation is the Cactus Kev algorithm (popularized by poker-eval in C/C++). It maps cards to prime numbers or bitmasks to perform arithmetic operations that instantly detect flushes and straights.
Data Representation
- Cards: Represented as integers.
- Ranks: 2-14 (2-Ace).
- Suits: 4 bits (1 for Spade, 2 for Heart, etc.).
- Bitmasks: A 52-bit integer where the $i$-th bit is 1 if the $i$-th card is present.
The Evaluation Function (Pseudocode)
// Pre-computed table: maps a 5-card hash to a strength score (0-7462)
// Lower score = better hand (or higher, depending on convention)
int lookup_table[2598960];
int evaluate_hand(int hand_mask) {
// 1. Check for Flush: Sum of suit bits. If any suit sum >= 4 (5 cards), it's a flush.
// This is done via bitwise AND with suit masks (e.g., 0xF0F0F0F0F0F0F0F0)
if (is_flush(hand_mask)) {
return lookup_flush[hand_mask]; // Pre-computed flush strength
}
// 2. Check for Straight: Use a "rank bitboard".
// Shift bits to detect sequences of 5 consecutive bits.
if (is_straight(hand_mask)) {
return lookup_straight[hand_mask];
}
// 3. Standard Evaluation (Pairs, Trips, etc.)
// Use the "Cactus Kev" prime multiplication method.
// Each rank is a prime: 2=2, 3=3, ..., A=41.
// Multiply the primes of the 5 cards.
// The product is unique for every hand type (e.g., A-A-A-K-K always yields X).
int product = 1;
for (int i = 0; i < 5; i++) {
product *= RANK_PRIMES[get_rank(hand_mask, i)];
}
return lookup_table[product];
}
Why this is fast:
- No Loops: The logic uses bitwise shifts and masks, which are single-cycle CPU instructions.
- Cache Friendly: The lookup table is small enough to stay in the CPU's L1/L2 cache.
- Branch Prediction: The
ifstatements are highly predictable (most hands are high-card, few are straights), minimizing pipeline stalls.
3. Scaling to Millions of Hands: SIMD & Vectorization
For Monte Carlo simulations (e.g., "What is my equity if I run this hand 100,000 times?"), single-threaded evaluation is too slow. You must use SIMD (AVX2, AVX-512 on x86; NEON on ARM).
The Approach:
- Load 8 or 16 hands into a single 256-bit or 512-bit register.
- Perform the bitwise operations (flush check, straight check) on all 8/16 hands simultaneously.
- Use lookup tables that are vectorized (or use
shuffleinstructions to map multiple keys to multiple values).
Library Example: HandEvaluator in Rust/C++ with AVX2
// Conceptual SIMD evaluation (using a crate like `hand-eval-simd`)
fn evaluate_batch(hands: &[Hand], results: &mut [u32]) {
// Load 8 hands into a 256-bit register
let hand_vec = load_simd(hands);
// Parallel Flush Check
let flush_mask = check_flush_simd(hand_vec);
// Parallel Straight Check
let straight_mask = check_straight_simd(hand_vec);
// Parallel Lookup (requires a specialized vectorized LUT or scatter/gather)
let scores = vectorized_lookup(hand_vec, flush_mask, straight_mask);
store_simd(results, scores);
}
This can achieve 100M+ hands per second per core on modern CPUs.
4. Architecture for Production Systems
The "Evaluator Service" Microservice
In a distributed architecture, do not embed the heavy evaluation logic in the main game loop if you need to run simulations. Offload it to a dedicated Evaluator Service.
- Protocol: gRPC or UDP (for low latency).
- Language: C++ or Rust for the evaluator core (maximum performance).
- Interface:
-
EvaluateHandsRequest:{ hand_1: [c1, c2...], hand_2: [...] } -
EvaluateHandsResponse:{ score_1: 1234, score_2: 5678 }
-
- Scaling: Run the service as a stateless pod in Kubernetes. Use Horizontal Pod Autoscaling (HPA) based on CPU utilization.
Database & Caching
- Redis: Cache the results of common board textures (e.g.,
A-K-Q-J-Ton board) if the same board appears frequently in simulations. - PostgreSQL: Store the final hand history, including the
hand_strength_scoreandwinner_ids. Do not store the raw bitmasks unless needed for debugging.
5. Security & Compliance
- Determinism: The evaluator must be pure. No random numbers, no external state. If the same 7 cards are passed in, the same score must be returned. This is non-negotiable for regulatory compliance.
- Certification: The LUT generation algorithm must be certified by an independent lab (e.g., eCOGRA, GLI). The source code for the LUT generator is often audited, not just the runtime binary.
- Anti-Cheating: Never expose the evaluator logic to the client. The client sends cards only when allowed; the server evaluates.
6. Real-World Implementation Example
Scenario: A 6-max No-Limit Hold'em table.
- Pre-flop: Player A (AA) vs Player B (KK).
- Simulation: The engine runs 100,000 Monte Carlo simulations to determine equity for the "All-In" button.
- Execution:
- The "Simulator" service generates random boards (5 cards).
- It calls the Evaluator 200,000 times (2 hands x 100k simulations).
- The Evaluator uses the Cactus Kev LUT + AVX2 to process 8 hands in parallel.
- Total time: ~50ms.
- Result: Player A has 82% equity. The server updates the UI.
Trade-offs:
- Memory vs. Speed: A full 7-card LUT is fast but memory-heavy (500MB+). A 5-card LUT + 21 combinations is slower (21x lookups) but uses only 10MB.
- Solution: Hybrid approach. Use a 5-card LUT. If the board is a flush/straight, use the optimized path. Otherwise, use the standard lookup.
7. Code Snippet: The "Fast" Evaluator (Node.js with N-API)
Since Node.js is single-threaded and slow for raw math, use N-API to bind a C++ evaluator.
// evaluator.cc (C++ N-API Module)
#include <node_api.h>
#include "poker_eval.h" // The C++ LUT engine
napi_value EvaluateHand(napi_env env, napi_callback_info info) {
// 1. Extract card array from JS
// 2. Convert to bitboard (uint64_t)
// 3. Call C++ evaluate(bitboard) -> int
// 4. Return int to JS
return napi_create_int32(env, result);
}
// evaluator.js (Node.js)
const evalModule = require('./build/Release/evaluator.node');
function getHandStrength(holeCards, communityCards) {
// Hole: [0, 1], Board: [2, 3, 4, 5, 6]
const allCards = [...holeCards, ...communityCards];
return evalModule.evaluate(allCards);
}
FAQs
Q1: Why not just write a simple if (isFlush) return ... function in Python/JavaScript?
Because a standard if-else approach involves loops to check suits, loops to check ranks, and sorting. This is $O(N)$ or $O(N \log N)$ and involves many conditional branches. In a high-load environment (10k hands/sec), this creates a bottleneck and high CPU usage. A LUT approach is $O(1)$ and uses bitwise operations, which are orders of magnitude faster.
Q2: How do you handle 7-card hands (Omaha/Hold'em) with a 5-card LUT?
You generate all $\binom{7}{5} = 21$ possible 5-card combinations from the 7 available cards. You run the 5-card LUT on each of the 21 combinations and pick the highest score. This is still extremely fast because the LUT lookup is instant. Advanced engines optimize this by pre-computing the "best 5 of 7" directly into a 7-card LUT, but the 21-lookup method is often sufficient and uses less memory.
Q3: Can I use this for Web3/Blockchain poker?
Yes, but with a caveat. On-chain evaluation (e.g., on Ethereum) is too expensive for full LUTs due to gas costs. The standard pattern is Off-chain Evaluation with On-chain Verification. The server (or a trusted oracle) evaluates the hand off-chain using the LUT, signs the result, and the smart contract verifies the signature. The contract never runs the evaluator.
Q4: What is the "Cactus Kev" method?
It is a specific algorithm developed by Cactus Kev (Eric Persson) that maps each card rank to a prime number. The product of the primes of 5 cards is unique for every hand type (e.g., a Full House of Aces over Kings has a unique product). This allows the engine to determine the hand type and strength using a single multiplication and a lookup table, avoiding complex conditional logic.
Q5: How do I ensure the LUT is correct and not buggy?
You must verify the LUT against a known reference implementation (like the PokerStars or WSOP logic) or a brute-force generator.
- Generate all 2.6M 5-card hands.
- Evaluate them with a slow, correct (but inefficient) reference algorithm.
- Evaluate them with your fast LUT.
- Assert that the results match for every single hand. This verification process is part of your CI/CD pipeline.
Top comments (0)