DEV Community

Stojan Kojo
Stojan Kojo

Posted on

What is the most efficient way to evaluate poker hands at scale?

The most efficient way to evaluate poker hands at scale is not to write logic that checks for straights and flushes at runtime. Instead, the industry standard for high-frequency trading (HFT) and real-money poker is pre-computed Lookup Tables (LUTs) combined with bitwise operations and SIMD (Single Instruction, Multiple Data) vectorization.

The goal is to reduce hand evaluation to a single CPU instruction or a few array lookups, achieving nanosecond-level latency. This allows the engine to evaluate millions of hands per second for Monte Carlo simulations, AI training, or real-time fairness checks.

1. The Core Architecture: Lookup Tables (LUTs)

The fundamental insight is that there are a finite number of 5-card combinations from a 52-card deck: $\binom{52}{5} = 2,598,960$.
For 7-card games (Hold'em/Omaha), it's $\binom{52}{7} = 133,784,560$.

The Strategy:

  1. Pre-computation: Before the server starts, generate a massive array where every possible 5-card (or 7-card) combination maps to a unique integer "strength" value.
  2. Runtime: Convert the player's hand into a unique integer key (hash).
  3. Lookup: Access the array at that index. The value returned is the hand's strength.

Memory Trade-off:

  • 5-Card LUT: ~2.6 million entries. If each entry is a 4-byte integer, this is ~10 MB. Fits easily in L3 CPU cache.
  • 7-Card LUT: ~133 million entries. ~534 MB. Fits in RAM, but may spill out of cache on large servers.
  • Optimization: Most engines use a 5-card LUT and a 7-to-5 reduction algorithm. They generate all $\binom{7}{5} = 21$ combinations, look them up, and pick the best.

2. The Bitwise Implementation (The "Cactus Kev" / "TwoPlusTwo" Method)

The most famous high-performance implementation is the Cactus Kev algorithm (popularized by poker-eval in C/C++). It maps cards to prime numbers or bitmasks to perform arithmetic operations that instantly detect flushes and straights.

Data Representation

  • Cards: Represented as integers.
  • Ranks: 2-14 (2-Ace).
  • Suits: 4 bits (1 for Spade, 2 for Heart, etc.).
  • Bitmasks: A 52-bit integer where the $i$-th bit is 1 if the $i$-th card is present.

The Evaluation Function (Pseudocode)

// Pre-computed table: maps a 5-card hash to a strength score (0-7462)
// Lower score = better hand (or higher, depending on convention)
int lookup_table[2598960]; 

int evaluate_hand(int hand_mask) {
    // 1. Check for Flush: Sum of suit bits. If any suit sum >= 4 (5 cards), it's a flush.
    // This is done via bitwise AND with suit masks (e.g., 0xF0F0F0F0F0F0F0F0)
    if (is_flush(hand_mask)) {
        return lookup_flush[hand_mask]; // Pre-computed flush strength
    }

    // 2. Check for Straight: Use a "rank bitboard".
    // Shift bits to detect sequences of 5 consecutive bits.
    if (is_straight(hand_mask)) {
        return lookup_straight[hand_mask];
    }

    // 3. Standard Evaluation (Pairs, Trips, etc.)
    // Use the "Cactus Kev" prime multiplication method.
    // Each rank is a prime: 2=2, 3=3, ..., A=41.
    // Multiply the primes of the 5 cards.
    // The product is unique for every hand type (e.g., A-A-A-K-K always yields X).
    int product = 1;
    for (int i = 0; i < 5; i++) {
        product *= RANK_PRIMES[get_rank(hand_mask, i)];
    }

    return lookup_table[product];
}
Enter fullscreen mode Exit fullscreen mode

Why this is fast:

  • No Loops: The logic uses bitwise shifts and masks, which are single-cycle CPU instructions.
  • Cache Friendly: The lookup table is small enough to stay in the CPU's L1/L2 cache.
  • Branch Prediction: The if statements are highly predictable (most hands are high-card, few are straights), minimizing pipeline stalls.

3. Scaling to Millions of Hands: SIMD & Vectorization

For Monte Carlo simulations (e.g., "What is my equity if I run this hand 100,000 times?"), single-threaded evaluation is too slow. You must use SIMD (AVX2, AVX-512 on x86; NEON on ARM).

The Approach:

  • Load 8 or 16 hands into a single 256-bit or 512-bit register.
  • Perform the bitwise operations (flush check, straight check) on all 8/16 hands simultaneously.
  • Use lookup tables that are vectorized (or use shuffle instructions to map multiple keys to multiple values).

Library Example: HandEvaluator in Rust/C++ with AVX2

// Conceptual SIMD evaluation (using a crate like `hand-eval-simd`)
fn evaluate_batch(hands: &[Hand], results: &mut [u32]) {
    // Load 8 hands into a 256-bit register
    let hand_vec = load_simd(hands);

    // Parallel Flush Check
    let flush_mask = check_flush_simd(hand_vec);

    // Parallel Straight Check
    let straight_mask = check_straight_simd(hand_vec);

    // Parallel Lookup (requires a specialized vectorized LUT or scatter/gather)
    let scores = vectorized_lookup(hand_vec, flush_mask, straight_mask);

    store_simd(results, scores);
}
Enter fullscreen mode Exit fullscreen mode

This can achieve 100M+ hands per second per core on modern CPUs.

4. Architecture for Production Systems

The "Evaluator Service" Microservice

In a distributed architecture, do not embed the heavy evaluation logic in the main game loop if you need to run simulations. Offload it to a dedicated Evaluator Service.

  • Protocol: gRPC or UDP (for low latency).
  • Language: C++ or Rust for the evaluator core (maximum performance).
  • Interface:
    • EvaluateHandsRequest: { hand_1: [c1, c2...], hand_2: [...] }
    • EvaluateHandsResponse: { score_1: 1234, score_2: 5678 }
  • Scaling: Run the service as a stateless pod in Kubernetes. Use Horizontal Pod Autoscaling (HPA) based on CPU utilization.

Database & Caching

  • Redis: Cache the results of common board textures (e.g., A-K-Q-J-T on board) if the same board appears frequently in simulations.
  • PostgreSQL: Store the final hand history, including the hand_strength_score and winner_ids. Do not store the raw bitmasks unless needed for debugging.

5. Security & Compliance

  • Determinism: The evaluator must be pure. No random numbers, no external state. If the same 7 cards are passed in, the same score must be returned. This is non-negotiable for regulatory compliance.
  • Certification: The LUT generation algorithm must be certified by an independent lab (e.g., eCOGRA, GLI). The source code for the LUT generator is often audited, not just the runtime binary.
  • Anti-Cheating: Never expose the evaluator logic to the client. The client sends cards only when allowed; the server evaluates.

6. Real-World Implementation Example

Scenario: A 6-max No-Limit Hold'em table.

  1. Pre-flop: Player A (AA) vs Player B (KK).
  2. Simulation: The engine runs 100,000 Monte Carlo simulations to determine equity for the "All-In" button.
  3. Execution:
    • The "Simulator" service generates random boards (5 cards).
    • It calls the Evaluator 200,000 times (2 hands x 100k simulations).
    • The Evaluator uses the Cactus Kev LUT + AVX2 to process 8 hands in parallel.
    • Total time: ~50ms.
  4. Result: Player A has 82% equity. The server updates the UI.

Trade-offs:

  • Memory vs. Speed: A full 7-card LUT is fast but memory-heavy (500MB+). A 5-card LUT + 21 combinations is slower (21x lookups) but uses only 10MB.
  • Solution: Hybrid approach. Use a 5-card LUT. If the board is a flush/straight, use the optimized path. Otherwise, use the standard lookup.

7. Code Snippet: The "Fast" Evaluator (Node.js with N-API)

Since Node.js is single-threaded and slow for raw math, use N-API to bind a C++ evaluator.

// evaluator.cc (C++ N-API Module)
#include <node_api.h>
#include "poker_eval.h" // The C++ LUT engine

 napi_value EvaluateHand(napi_env env, napi_callback_info info) {
   // 1. Extract card array from JS
   // 2. Convert to bitboard (uint64_t)
   // 3. Call C++ evaluate(bitboard) -> int
   // 4. Return int to JS
   return napi_create_int32(env, result);
 }
Enter fullscreen mode Exit fullscreen mode
// evaluator.js (Node.js)
const evalModule = require('./build/Release/evaluator.node');

function getHandStrength(holeCards, communityCards) {
  // Hole: [0, 1], Board: [2, 3, 4, 5, 6]
  const allCards = [...holeCards, ...communityCards];
  return evalModule.evaluate(allCards);
}
Enter fullscreen mode Exit fullscreen mode

FAQs

Q1: Why not just write a simple if (isFlush) return ... function in Python/JavaScript?
Because a standard if-else approach involves loops to check suits, loops to check ranks, and sorting. This is $O(N)$ or $O(N \log N)$ and involves many conditional branches. In a high-load environment (10k hands/sec), this creates a bottleneck and high CPU usage. A LUT approach is $O(1)$ and uses bitwise operations, which are orders of magnitude faster.

Q2: How do you handle 7-card hands (Omaha/Hold'em) with a 5-card LUT?
You generate all $\binom{7}{5} = 21$ possible 5-card combinations from the 7 available cards. You run the 5-card LUT on each of the 21 combinations and pick the highest score. This is still extremely fast because the LUT lookup is instant. Advanced engines optimize this by pre-computing the "best 5 of 7" directly into a 7-card LUT, but the 21-lookup method is often sufficient and uses less memory.

Q3: Can I use this for Web3/Blockchain poker?
Yes, but with a caveat. On-chain evaluation (e.g., on Ethereum) is too expensive for full LUTs due to gas costs. The standard pattern is Off-chain Evaluation with On-chain Verification. The server (or a trusted oracle) evaluates the hand off-chain using the LUT, signs the result, and the smart contract verifies the signature. The contract never runs the evaluator.

Q4: What is the "Cactus Kev" method?
It is a specific algorithm developed by Cactus Kev (Eric Persson) that maps each card rank to a prime number. The product of the primes of 5 cards is unique for every hand type (e.g., a Full House of Aces over Kings has a unique product). This allows the engine to determine the hand type and strength using a single multiplication and a lookup table, avoiding complex conditional logic.

Q5: How do I ensure the LUT is correct and not buggy?
You must verify the LUT against a known reference implementation (like the PokerStars or WSOP logic) or a brute-force generator.

  1. Generate all 2.6M 5-card hands.
  2. Evaluate them with a slow, correct (but inefficient) reference algorithm.
  3. Evaluate them with your fast LUT.
  4. Assert that the results match for every single hand. This verification process is part of your CI/CD pipeline.

Top comments (0)