ty215

Posted on Jun 27

Why I Built a Tiny Repeated-Game Poker Analysis Tool

#python #ai #opensource #showdev

Most poker solvers answer one question very well: given a single hand and a single decision tree, what is the equilibrium strategy? (Yes, there is subgame solving, node locking, and plenty more — but the default frame is still one hand, one equilibrium.)

I kept getting stuck on a different question. What if the same kind of spot shows up over and over, and a player can commit to a fixed strategy across those repetitions? In a few toy games I had a hunch, worked out by hand, that committing to a fixed strategy could change that strategy's value relative to the one-shot picture. I wanted a tool that could make that commitment value precise, so that I could analyze it rather than just believe it. (Whether any of this rises to a repeated-game equilibrium is a much stronger claim, and one I am deliberately not making here.)

I'm still learning software engineering, so until recently I couldn't implement this — I was stuck reasoning about toy games on paper. AI tooling made the analysis feasible, so I finally started building it: repeated-poker-analysis.

It's a small research project: write one narrow model down, run small examples, and record what the model does and doesn't justify.

What `repeated-poker-analysis` is

It is an experimental Python toolkit for small abstract poker games. The current MVP covers:

fixed Hero commitment candidates,
exact Villain best-response diagnostics in small finite trees,
candidate generation and filtering,
T_deadline, an economic adaptation deadline,
local T_detect, an observable-distribution sensitivity estimate,
analysis reports and Markdown summaries.

It is small on purpose. It is not a full solver and it is not wired to real solver ranges. It starts from one toy game — a river spot — that is tiny enough to inspect and test by hand.

That toy spot is one where showdown always chops but rake still bites. In a single-hand view, putting more money into a raked pot can be locally unattractive. Across repeated occurrences the same spot raises a commitment question: if one player commits to a fixed pattern of not folding, how does the other respond, and how fast would that response have to come for the commitment to stop being worth it?

This is the question I wanted a tool to make precise — not a claim that any new equilibrium exists.

Why repeated poker is tempting — and where the trap is

Repeated games sound like a natural home for reputation, punishment, and adaptation, and poker has obvious repeated structure: similar river spots, similar blind-vs-blind situations, similar sizings, similar pools.

Here is the trap I had to respect. If the number of repetitions is known, the game is fully observed, each spot is independent, and both players are perfectly rational, then a finite repeated game often collapses back toward the one-shot equilibrium by backward induction: nothing after the last repetition can reward or punish, so it is played like a one-shot game, and the same argument then applies to each earlier repetition in turn. "This spot happens five times" is not by itself enough to claim a reputation equilibrium. That is the standard game-theory result, and it is the reason the project keeps the layers below separate.

So the project keeps several ideas apart that are easy to blur:

a one-hand baseline strategy,
a fixed Hero commitment candidate,
Villain's exact best response to that fixed Hero strategy,
an economic deadline for adaptation,
a local estimate of how visible the change is,
and the much stronger claim of a repeated-game equilibrium.

The MVP mostly lives in the commitment-analysis layer: if Hero is fixed to a candidate strategy in the supplied tree, what are Villain's exact best responses, and what happens to Hero EV under conservative tie handling?

What the MVP can do

(This describes the MVP on main at the time of writing. I'm still changing it, so details may move.)

It runs an end-to-end candidate-analysis pipeline on a small abstract game:

build tiny finite two-player game trees with rake,
evaluate fixed Hero and Villain mixed strategies,
enumerate exact Villain best responses for small trees,
report Hero EV under worst- and best-case Villain best-response tie rules,
generate simple Hero candidates from a baseline,
filter candidates before comparison,
compare candidate values against a baseline profile,
compute T_deadline and local T_detect,
render a Markdown summary.

In plain terms, the analysis loop is:

Start from a baseline profile: a fixed Hero strategy and a fixed Villain strategy on the supplied tree. These are action probabilities at each information set — not hand ranges; the tool does not model or import real solver ranges.
Generate Hero candidates by shifting probability between two actions at a single Hero information set (a blind, systematic enumeration of small shifts — not a search aimed at hurting Villain).
For each candidate, lock Hero to it and compute Villain's exact best response. That yields Hero's worst-case EV after Villain adapts.
Flag a candidate robustly_profitable only when that post-response worst-case Hero EV is strictly higher than Hero's EV in the baseline profile. The point is not "positive EV" — it is "still better than the one-shot baseline even after the opponent best-responds."
T_deadline / T_detect then add repeated-game timing on top of the candidates that survive.
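
The candidate step and the robustly_profitable test above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the dict-of-dicts strategy layout, the helper names `shift_probability` and `is_robustly_profitable`, and the 50/50 baseline are all assumptions made up for the example.

```python
from typing import Dict, Optional

# Hypothetical layout: info set name -> action name -> probability.
Strategy = Dict[str, Dict[str, float]]


def shift_probability(
    strategy: Strategy, info_set: str, src: str, dst: str, amount: float
) -> Optional[Strategy]:
    """Move `amount` of probability from `src` to `dst` at one info set.

    Returns None when `src` does not hold enough probability, so the
    caller can skip that candidate instead of producing an invalid mix.
    """
    probs = strategy[info_set]
    if probs[src] < amount - 1e-12:
        return None
    # Copy every info set so the baseline is left untouched.
    shifted = {name: dict(actions) for name, actions in strategy.items()}
    shifted[info_set][src] = probs[src] - amount
    shifted[info_set][dst] = probs[dst] + amount
    return shifted


def is_robustly_profitable(post_response_worst_ev: float, baseline_ev: float) -> bool:
    """Strict comparison against the baseline EV, not against zero."""
    return post_response_worst_ev > baseline_ev


# Made-up baseline: Hero mixes 50/50 at a single info set.
baseline = {"H1": {"check": 0.5, "bet": 0.5}}
candidate = shift_probability(baseline, "H1", "check", "bet", 0.1)
```

The strict `>` mirrors the rule in step 4: a candidate whose worst-case EV only ties the baseline is not flagged.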

The main entry point is run_candidate_analysis_pipeline. To run the MVP check script:

python scripts/check_mvp.py

A simplified workflow:

from nuts_chop_river import build_nuts_chop_river, default_hero_strategy
from candidate_library import baseline_villain_strategy

from repeated_poker import (
    CandidateFilterConfig,
    CandidateGenerationConfig,
    run_candidate_analysis_pipeline,
)

tree = build_nuts_chop_river()
baseline_hero = default_hero_strategy()
baseline_villain = baseline_villain_strategy()

result = run_candidate_analysis_pipeline(
    tree,
    baseline_hero,
    baseline_villain,
    generation=CandidateGenerationConfig(shift_amounts=[0.1, 0.2]),
    horizon=5,
    profit_tolerance=-2.0,
    max_selection_l1_distance=0.3,
    detection_log_likelihood_threshold=3.0,
    detection_occurrence_probability_per_opportunity=0.5,
    filtering=CandidateFilterConfig(
        max_l1_distance=0.3,
        min_required_observations=5,
    ),
)

print(result.markdown_summary)

The output is a diagnostic report for the model you supplied, not a poker recommendation. Here is an excerpt from the actual output of examples/analysis_pipeline.py on the nuts-chop river toy game (I've trimmed the Configurations block and some columns; 8 candidates generated, 6 dropped by the filter, 2 compared):

generated=8 kept=2 excluded=6
compared=2

## Candidate Analysis Summary

### Summary Counts
- total: 2
- eligible: 2
- excluded: 0
- minimum_villain_ev: 1
- pareto_frontier: 2

### Candidate Rows

| candidate_id            | fixed_hero_ev | post_response_hero_ev_worst | robustly_profitable | t_detect | exclusion_reasons |
| ----------------------- | ------------- | --------------------------- | ------------------- | -------- | ----------------- |
| H1\|check->bet\|shift=0.1 | 0.625         | -0.850                      | no                  | 278      | -                 |
| H1\|bet->check\|shift=0.1 | 0.275         | -0.750                      | no                  | 294      | -                 |

The baseline Hero EV in this run is +0.45. The column that matters is robustly_profitable: it is yes only when post_response_hero_ev_worst exceeds that baseline. Here both candidates are no (-0.85 and -0.75 are below +0.45). A candidate that clears the baseline is rare, but one can exist in constructed cases, and the tool's job is to search the candidates and find it when it does. The next section is a hand-built spot where one does.

A toy game where the commitment beats the baseline

I needed at least one example where the machinery clearly does what it is meant to: a known spot where committing to a fixed strategy leaves Hero better off than the one-shot baseline, even after the opponent best-responds. This nuts-chop steal is that example, and I wrote a dedicated test for it (tests/test_nuts_chop_steal_commitment.py). Treat it as a check that the tool can detect the effect at all — not as the end goal, and not as a claim about real games. Outside this constructed spot I do not know which situations, if any, are profitable to commit in.

The spot: a river where the board is already the nuts, so every showdown chops. There is no value betting; the only reason to bet (shove) is fold equity. Rake is taken from any pot that reaches showdown (5%, capped at 4 here), so a called pot just bleeds chips to the house. With a small starting pot and a big shove, a single hand looks like this:

initial commitment = 1, initial pot = 2, bet = 98, rake = 5%, cap = 4

| Line         | Hero/IP EV | Villain/OOP EV |
|--------------|-----------:|---------------:|
| check-check  |      -0.05 |          -0.05 |
| bet-fold     |      -1.00 |          +1.00 |
| bet-call     |      -2.00 |          -2.00 |

In one hand the caller folds: -1.00 (fold) beats -2.00 (call). So the one-shot subgame answer is OOP bets / IP folds — a pure steal, since the board is a chop and there is no value in betting.
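
The three EVs in the table can be reproduced by hand or with a few lines of exact arithmetic. A minimal sketch, assuming that rake is charged only when the pot reaches showdown (which is what the +1.00 / -1.00 bet-fold row implies) and using a made-up helper name, `line_evs`. With these numbers, 5% of the 198-chip called pot would be 9.9, so the cap of 4 is what is actually charged.

```python
from fractions import Fraction


def line_evs(commit, bet, rake_rate, rake_cap):
    """Return {line: (ip_ev, oop_ev)} for the three river lines.

    Assumes each player has already put `commit` into the pot, every
    showdown chops, and rake is only taken from pots that reach showdown.
    """

    def chop(pot, invested):
        rake = min(rake_rate * pot, rake_cap)
        return (pot - rake) / 2 - invested

    start_pot = 2 * commit
    check_check = chop(start_pot, commit)
    bet_call = chop(start_pot + 2 * bet, commit + bet)
    return {
        "check-check": (check_check, check_check),
        # IP folds and forfeits its commitment; OOP collects it unraked.
        "bet-fold": (-commit, commit),
        "bet-call": (bet_call, bet_call),
    }


evs = line_evs(Fraction(1), Fraction(98), Fraction(5, 100), Fraction(4))
for line, (ip, oop) in evs.items():
    print(f"{line:12s} IP {float(ip):+.2f}  OOP {float(oop):+.2f}")
```

Using `Fraction` keeps the values exact, so the comparison between -1.00 and -2.00 does not depend on floating-point rounding.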

Now lock IP to always call and ask the tool for OOP's exact best response. The steal's only profit source (fold equity) is gone, a called pot is -2.00 for OOP, so OOP's exact best response flips to check — and check-check is -0.05 for both. The test asserts exactly this: solve_exact_response returns {"OOP_river": "check"} once Hero is locked to call.

And crucially, this clears the baseline: Hero's EV goes from -1.00 (the one-shot steal baseline) to -0.05 after OOP adapts — still negative, but strictly better than the baseline, which is exactly the robustly_profitable condition. That is the whole point of the project stated in one example: the one-shot subgame answer (bet/fold) is not the answer under the fixed commitment I wanted to test (check/check). The commitment to call removes the opponent's only incentive to bet. (Whether this constitutes a repeated-game equilibrium is the stronger claim I am deliberately not making — this is a commitment-analysis result, not an equilibrium proof.)
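
The flip from bet/fold to check/check can be checked directly from the three EVs in the table above. The sketch below is a standalone illustration with made-up helper names; it does not call solve_exact_response or any other function from the repository.

```python
# EVs copied from the table above, as (IP/Hero EV, OOP/Villain EV).
PAYOFFS = {
    ("check", None): (-0.05, -0.05),
    ("bet", "fold"): (-1.00, 1.00),
    ("bet", "call"): (-2.00, -2.00),
}


def outcome(oop_action, ip_reply):
    """Look up the payoff pair; IP's reply only matters after a bet."""
    key = ("check", None) if oop_action == "check" else ("bet", ip_reply)
    return PAYOFFS[key]


def ip_best_reply_to_bet():
    """One-shot view: IP picks the reply to a bet that maximizes its own EV."""
    return max(("fold", "call"), key=lambda reply: outcome("bet", reply)[0])


def oop_best_action(ip_reply):
    """OOP's best action when IP's reply to a bet is fixed to `ip_reply`."""
    return max(("check", "bet"), key=lambda action: outcome(action, ip_reply)[1])


# One-shot baseline: IP folds to a bet, so OOP bets.
one_shot_reply = ip_best_reply_to_bet()         # "fold"
one_shot_oop = oop_best_action(one_shot_reply)  # "bet"
baseline_hero_ev = outcome(one_shot_oop, one_shot_reply)[0]  # -1.00

# Commitment: IP is locked to call, so OOP's best response becomes check.
locked_oop = oop_best_action("call")            # "check"
post_response_hero_ev = outcome(locked_oop, "call")[0]  # -0.05

print(one_shot_oop, baseline_hero_ev, locked_oop, post_response_hero_ev)
```

The comparison that matters is the last one: -0.05 is strictly greater than -1.00, which is the robustly_profitable condition in this spot.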

The tool also puts a number on how long that commitment stays worth it. With baseline Hero EV = -1.00 (steal), pre-adaptation = -2.00 (locked call while OOP still bets), post-adaptation = -0.05 (OOP has switched to check), T_deadline comes out as floor(1 + 19N/39):

| N (horizon) | T_deadline |
|------------:|-----------:|
|          10 |          5 |
|          20 |         10 |
|          50 |         25 |
|         100 |         49 |
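
The closed form can be recovered by comparing totals over N repetitions. A minimal sketch, assuming a timing convention that reproduces the formula above (it is not confirmed against the repository): Villain adapts at the start of repetition T, so Hero earns the pre-adaptation EV for T - 1 repetitions and the post-adaptation EV for the remaining N - T + 1. The commitment is at least as good as the baseline when -2(T - 1) - 0.05(N - T + 1) >= -N, which rearranges to T <= 1 + 19N/39. The function names are made up for this example.

```python
from fractions import Fraction


def t_deadline_closed_form(n: int) -> int:
    """floor(1 + 19N/39), computed with integer arithmetic."""
    return 1 + (19 * n) // 39


def t_deadline_brute_force(
    n: int,
    baseline: Fraction = Fraction(-1),
    pre: Fraction = Fraction(-2),
    post: Fraction = Fraction(-1, 20),
) -> int:
    """Latest adaptation time T at which the locked policy still does not
    fall below the baseline total over n repetitions.

    Assumed convention: repetitions 1..T-1 pay `pre`, repetitions T..n pay `post`.
    """
    latest = 0
    for t in range(1, n + 1):
        total = pre * (t - 1) + post * (n - t + 1)
        if total >= baseline * n:
            latest = t
    return latest


for n in (10, 20, 50, 100):
    print(n, t_deadline_closed_form(n), t_deadline_brute_force(n))
```

Under this convention an exact tie with the baseline counts as meeting the deadline, which is what the floor in the formula implies. A tie only occurs when N is a multiple of 39, so none of the rows in the table is affected.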

The honest caveat: this is a tiny, hand-built tree, and the EVs are ones I can check by hand — that is exactly why I trust this result more than anything else in the repo. It is not evidence about real games; it is evidence that the model and the code agree on one constructed example built to validate the effect.

Verification on my machine:

python -m pytest tests/test_nuts_chop_steal_commitment.py -v → 15 passed
python -m pytest -q → 500 passed
python scripts/check_mvp.py → passes
git diff --check → clean

How I worked with AI

I supplied the algorithm and the poker model. Codex wrote the implementation instructions and reviewed the results; Claude Code wrote the code. I checked Codex's prompts and corrected wrong premises, but I did not review the code line by line — I relied on the Codex/Claude review loop and the test suite (currently 500 passing tests).

Two things from that process are worth recording:

The assistant kept drifting toward the general case. For the commitment analysis I wanted Hero fully fixed and only Villain's exact best response computed, but it repeatedly tried to set up CFR — wasted machinery when Hero is fixed. Stopping it led to a side question I hadn't considered: CFR with one side frozen looks like a fixed-environment learning problem. I'm noting that as a question, not a result.
Explaining the toy game to the model was harder than explaining it to a person — it over-generalized and assumed things that don't apply (e.g. Villain value-bets on a board that is already the nuts). I ended up brushing each spec up in a chat first, then handing the cleaned version to the coding agent.

A note on terms the code keeps separate: T_deadline is economic (how late Villain can adapt while the locked policy still beats the baseline); T_detect is visibility (how many local observations before the candidate's action distribution looks distinguishable from baseline). They are different questions.
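
To make the visibility question concrete, here is one generic way to estimate it. This is a textbook-style log-likelihood-ratio estimate and an assumption on my part, not the formula the repository uses for T_detect: the expected log-likelihood ratio per observation is the KL divergence between the candidate's and the baseline's action distributions, and the estimate is the number of opportunities needed for the expected total to reach a threshold. The function names and the 50/50 baseline are made up for the example.

```python
import math
from typing import Dict, Optional


def expected_log_likelihood_ratio(
    candidate: Dict[str, float], baseline: Dict[str, float]
) -> float:
    """KL(candidate || baseline) in nats: the average evidence per observed action."""
    total = 0.0
    for action, p in candidate.items():
        if p > 0.0:
            q = baseline.get(action, 0.0)
            if q <= 0.0:
                # The baseline never takes this action, so one sighting is decisive.
                return math.inf
            total += p * math.log(p / q)
    return total


def observations_to_detect(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    llr_threshold: float,
    occurrence_probability: float = 1.0,
) -> Optional[int]:
    """Opportunities needed for the expected log-likelihood ratio to reach
    `llr_threshold`, when the spot only occurs with `occurrence_probability`
    per opportunity. Returns None if the distributions are identical.
    """
    drift = expected_log_likelihood_ratio(candidate, baseline)
    if drift <= 0.0:
        return None
    return max(1, math.ceil(llr_threshold / (occurrence_probability * drift)))


# Made-up distributions: a 0.1 shift and a 0.2 shift from a 50/50 baseline.
base = {"check": 0.5, "bet": 0.5}
small_shift = observations_to_detect({"check": 0.4, "bet": 0.6}, base, 3.0, 0.5)
large_shift = observations_to_detect({"check": 0.3, "bet": 0.7}, base, 3.0, 0.5)
print(small_shift, large_shift)
```

The qualitative behavior is the part worth taking from this: smaller shifts and rarer spots both push the estimate up, independently of whether the shift is economically worthwhile.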

What I learned

Best-response ties matter. If Villain has several best responses with identical Villain EV, Hero's EV can still differ across them. Returning one arbitrary response would hide that risk, so the MVP reports both ev_h_worst and ev_h_best across the tie set. (Verified: BestResponseResult exposes both values, along with how actions vary across the optimal pure strategies.)
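
A small sketch of why the tie set needs both numbers. The payoffs and the function `best_response_tie_report` are hypothetical and only illustrate the idea; the keys ev_h_worst and ev_h_best are borrowed from the names mentioned above, and nothing here reproduces the actual BestResponseResult type.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

PureStrategy = Dict[str, str]  # Villain info set -> chosen action


def best_response_tie_report(
    villain_info_sets: Dict[str, List[str]],
    payoff: Callable[[PureStrategy], Tuple[float, float]],
    tol: float = 1e-9,
) -> dict:
    """Enumerate Villain pure strategies against a locked Hero and report
    Hero's worst and best EV across all Villain-optimal strategies.

    `payoff` returns (hero_ev, villain_ev) for one Villain pure strategy.
    """
    names = list(villain_info_sets)
    scored = []
    for choice in product(*(villain_info_sets[name] for name in names)):
        strategy = dict(zip(names, choice))
        scored.append((payoff(strategy), strategy))

    best_villain_ev = max(evs[1] for evs, _ in scored)
    tie_set = [(evs, s) for evs, s in scored if evs[1] >= best_villain_ev - tol]
    hero_evs = [evs[0] for evs, _ in tie_set]
    return {
        "ev_h_worst": min(hero_evs),
        "ev_h_best": max(hero_evs),
        "tie_set": [s for _, s in tie_set],
    }


# Made-up payoffs: fold and call tie for Villain (-1.0) but not for Hero.
TOY = {"fold": (1.0, -1.0), "call": (-0.5, -1.0), "raise": (0.0, -3.0)}
report = best_response_tie_report(
    {"V1": ["fold", "call", "raise"]}, lambda s: TOY[s["V1"]]
)
print(report["ev_h_worst"], report["ev_h_best"])  # -0.5 1.0
```

Picking either tied response arbitrarily would report Hero's EV as 1.0 or -0.5 depending on enumeration order, which is exactly the risk the worst-case column is there to expose.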

Small examples are not a weakness. The nuts-chop river benchmark is tiny on purpose: easier to hand-check, harder to mistake for a real-money recommendation.

Current limitations

The main one: the code has not had an independent human code review. Tests pass, but I haven't read the implementation line by line and nobody else has either. Rather than rely on reading the code, I plan to validate it from the outside — design the verification to be as exhaustive as I can make it, run simulations across many configurations, and check that the results hold up. Whether static or property-based checking can give that coverage is something I'm still working out.

The narrower limits: it is not a full solver, does not import real solver ranges yet, does not solve large no-limit games, and does not do STT / ICM / preflop push-fold yet. The exact response engine enumerates Villain pure strategies, so it is meant for small abstract trees only — there is an explicit max_pure_strategies ceiling, default 100,000. Candidate generation is simple: finite shifts from a baseline, not a continuous strategy space.
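
The reason for a ceiling is that the number of pure strategies is the product of the action counts over all of Villain's information sets, so it grows exponentially with tree size. A minimal sketch of that guard: the parameter name and default mirror the max_pure_strategies ceiling mentioned above, but the functions themselves are hypothetical and not taken from the repository.

```python
import math
from itertools import product
from typing import Dict, List


def count_pure_strategies(info_sets: Dict[str, List[str]]) -> int:
    """One action per info set, so the count is the product of the action counts."""
    return math.prod(len(actions) for actions in info_sets.values())


def enumerate_pure_strategies(
    info_sets: Dict[str, List[str]], max_pure_strategies: int = 100_000
) -> List[Dict[str, str]]:
    """List every pure strategy, refusing up front if there are too many."""
    count = count_pure_strategies(info_sets)
    if count > max_pure_strategies:
        raise ValueError(
            f"{count} pure strategies exceeds the ceiling of {max_pure_strategies}"
        )
    names = list(info_sets)
    return [
        dict(zip(names, choice))
        for choice in product(*(info_sets[name] for name in names))
    ]


# 16 binary info sets fit under the default ceiling; 17 do not.
sixteen = {f"V{i}": ["check", "bet"] for i in range(16)}
seventeen = {f"V{i}": ["check", "bet"] for i in range(17)}
print(count_pure_strategies(sixteen), count_pure_strategies(seventeen))  # 65536 131072
```

Checking the count before enumerating means an oversized tree fails immediately with a clear message rather than after a long enumeration.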

Most importantly: positive EV inside this model does not guarantee profitable play. The model can be wrong if the abstraction, action tree, rake rule, ranges, or adaptation assumptions are wrong.

This is not gambling, bankroll, financial, or legal advice.

Next steps

The toy game confirmed the effect, so next I want to extend the tool: analyze with hand ranges rather than abstract action probabilities, and model the opponent adapting gradually (e.g. a Bayesian update of their response over repetitions) instead of switching to an exact best response in one step. Alongside that, I want to firm up the outside-in verification described above before trusting results on new spots.

Links

Repository: guriguri215-lang/repeated-poker-analysis
MVP walkthrough: docs/mvp_walkthrough.md
Assumptions and limitations: docs/assumptions_and_limitations.md
Publication policy: docs/publication_policy.md

Disclosure: I used AI assistance throughout this project and to draft this article. The division of labor was deliberate: I supplied the algorithm and the poker model, Codex handled instructions and review, and Claude Code wrote the code; I checked the prompts and relied on automated review and tests for the implementation. This article was also drafted with AI help and then rewritten to reflect my own decisions, mistakes, and open questions. Technical claims are marked where I have verified them against the code myself; where I say something is provisional or unreviewed, that is literally true.
