Mike Czerwinski

Posted on Jun 23

I built a football bot that doesn't watch football. It's #2 in our World Cup league.

#ai #agents #llmops #showdev

The bot is sitting in second place out of fourteen. Three points behind the leader.

The thirteen humans in this league watch a lot more football than I do, and the leader watches more football than most of them. The bot has never seen a match.

The setup

A friends-only prediction league, World Cup 2026, thirteen humans and one bot. The league is private, so I won't link it. The bot's name in the standings is "mike," same as mine, which the others find funnier than I do.

I wrote two files. client.py does session-cookie auth against the league's API and exposes matches(), me(), and put_bet(match_id, home, away). revise.py runs from cron every fifteen minutes, looks at upcoming matches in a window 90 to 200 minutes before kickoff, asks Claude for a verdict, and writes the bet if anything has changed. An idempotent state guard makes sure the same match doesn't get revised twice in the same pre-kickoff window, and every decision is saved, with its reasoning, to a per-match JSON record. The whole thing is under 250 lines.
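Here is a minimal sketch of the window check and the state guard, assuming kickoff times arrive as timezone-aware datetimes. The post doesn't publish revise.py. It gives only the 90-to-200-minute window and the once-per-window rule, so the function names (`in_revision_window`, `should_revise`) and the shape of the state dict are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Window bounds from the post: revise between 200 and 90 minutes before kickoff.
WINDOW_OPEN = timedelta(minutes=200)
WINDOW_CLOSE = timedelta(minutes=90)


def in_revision_window(kickoff: datetime, now: datetime) -> bool:
    """True if `now` is between 200 and 90 minutes (inclusive) before kickoff."""
    return WINDOW_CLOSE <= kickoff - now <= WINDOW_OPEN


def should_revise(match_id: str, kickoff: datetime, now: datetime, state: dict) -> bool:
    """Allow at most one revision per match per pre-kickoff window.

    `state` is a hypothetical dict persisted between cron runs (for example as a
    JSON file), mapping match_id -> kickoff ISO string of the window already handled.
    Keying on the kickoff time means a rescheduled match gets a fresh window.
    """
    if not in_revision_window(kickoff, now):
        return False
    window_key = kickoff.isoformat()
    if state.get(match_id) == window_key:
        return False  # an earlier cron run already handled this window
    state[match_id] = window_key  # record it so later runs in the same window skip
    return True


kickoff = datetime(2026, 6, 20, 18, 0, tzinfo=timezone.utc)  # illustrative time
state = {}
print(should_revise("m1", kickoff, kickoff - timedelta(minutes=150), state))  # True
print(should_revise("m1", kickoff, kickoff - timedelta(minutes=135), state))  # False
```

With a 15-minute cron and a 110-minute window, each match passes through seven or eight runs. The guard turns every run after the first into a no-op.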

The three rules

The prompt to Claude is structured and small. It asks for: home goals, away goals, confidence level, and a one-sentence reason that has to cite exactly one of three rules.

Squad-value ratio. Estimated market value of each squad. If one side is clearly more expensive than the other, that side scores more.
Class gap. Is one of these teams a debutant or a member of a weak federation, and the other a top-eight nation by recent results? If yes, the modal score gets pushed harder.
Pace mismatch. Does one side's attack-speed style obviously punish the other side's defensive shape?
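Asking for a fixed structure pays off on the receiving end, because a reply that doesn't cite one of the three rules can simply be rejected. What follows is a minimal sketch that assumes the model is asked to answer in JSON. The field names, the low/medium/high confidence scale, and the helpers `build_prompt` and `parse_verdict` are all hypothetical, and the actual call to Claude is left out.

```python
import json
from dataclasses import dataclass

# The three rules named in the post. The real prompt wording is not published,
# and the confidence scale below is an assumption.
RULES = ("squad-value ratio", "class gap", "pace mismatch")
CONFIDENCE_LEVELS = ("low", "medium", "high")


@dataclass(frozen=True)
class Verdict:
    home: int
    away: int
    confidence: str
    rule: str
    reason: str


def build_prompt(home_team: str, away_team: str) -> str:
    """Hypothetical prompt asking for a JSON verdict that cites exactly one rule."""
    rules = ", ".join(RULES)
    levels = "/".join(CONFIDENCE_LEVELS)
    return (
        f"Predict {home_team} vs {away_team}.\n"
        f"Reply with JSON only, with keys home and away (integer goals), "
        f"confidence ({levels}), rule (exactly one of: {rules}), "
        f"and reason (one sentence citing that rule)."
    )


def parse_verdict(raw: str) -> Verdict:
    """Validate the model's reply and raise instead of guessing.

    json.JSONDecodeError is a subclass of ValueError, so malformed JSON is
    rejected like a reply that breaks the rules; a missing field raises KeyError.
    """
    data = json.loads(raw)
    home, away = data["home"], data["away"]
    if not all(isinstance(g, int) and g >= 0 for g in (home, away)):
        raise ValueError("goals must be non-negative integers")
    confidence = str(data["confidence"]).strip().lower()
    rule = str(data["rule"]).strip().lower()
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    if rule not in RULES:
        raise ValueError(f"reason must cite one of the three rules, got {rule!r}")
    return Verdict(home, away, confidence, rule, str(data["reason"]).strip())
```

A rejected reply can leave the existing bet in place, which fits the post's rule of writing a bet only when something has changed.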

That's the whole rulebook. No "team chemistry." No "the manager's been under pressure." No "momentum from the last group game." No "this is the kind of fixture Brazil tends to drop points in." Just three crude structural proxies for "is one of these teams obviously the better one?", plus a confidence level that determines how aggressive the modal scoreline gets.

The standings

Full table as of this morning. Other players are anonymized as single letters; the bot's row is "bot mike":

Place	Player	Pts	Exact	Diff	Result	Bets
1	A	71	6	5	27	72
2	bot mike	68	3	6	28	44
3	B	67	4	5	27	72
4	C	65	4	3	27	54
5	D	63	6	3	24	48
6	E	59	4	3	24	49
7	F	59	2	5	25	60
8	G	57	3	5	23	49
9	H	53	4	3	21	69
10	I	50	4	2	20	44
11	J	49	0	3	23	63
12	K	45	4	3	17	35
13	L	37	3	1	15	39
14	M	30	1	0	14	26
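The post doesn't state the league's scoring rules, but every row of the table is consistent with Pts = 2 × Exact + Diff + 2 × Result. One reading produces exactly this formula, though it is an assumption the league hasn't confirmed. Under that reading, an exact score earns 4 points, the right goal difference earns 3, and the right result alone earns 2. Exact and Diff are counted separately, and Result counts every correctly called result, including the exact scores and correct goal differences. Here is a quick check over the table:

```python
# (player, pts, exact, diff, result), copied from the standings table above.
STANDINGS = [
    ("A", 71, 6, 5, 27),
    ("bot mike", 68, 3, 6, 28),
    ("B", 67, 4, 5, 27),
    ("C", 65, 4, 3, 27),
    ("D", 63, 6, 3, 24),
    ("E", 59, 4, 3, 24),
    ("F", 59, 2, 5, 25),
    ("G", 57, 3, 5, 23),
    ("H", 53, 4, 3, 21),
    ("I", 50, 4, 2, 20),
    ("J", 49, 0, 3, 23),
    ("K", 45, 4, 3, 17),
    ("L", 37, 3, 1, 15),
    ("M", 30, 1, 0, 14),
]


def implied_points(exact: int, diff: int, result: int) -> int:
    """Assumed 4/3/2 scoring, where `result` also counts exact and goal-difference hits."""
    result_only = result - exact - diff
    return 4 * exact + 3 * diff + 2 * result_only


mismatches = [name for name, pts, e, d, r in STANDINGS if implied_points(e, d, r) != pts]
print(mismatches)  # []
```

If that reading is right, the bot's 68 points break down into 3 exact scores (12 points), 6 correct goal differences (18), and 19 results only (38).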

The top of the table looks like this in prose: player A is at 71 points with six exact scores and a prediction in every single match of the tournament. From past chat in our group, this person watches football constantly — clubs across three leagues, knows squad rotations, follows transfer news. A serious follower. The bot is three points behind, with three exact scores. Player B at #3, also a serious follower, is one point further back.

The bot is sitting one row below a real follower of the game. Not by being smarter. Not by knowing football. By being a small structured system run by someone who, on a normal day, has to be reminded which group Senegal is in.

Why this is the part that interests me

Most of the people in this league are serious football followers. They watch matches. They form intuitions. They have opinions about which team underperforms its squad value, which side fades in second halves, which manager makes the wrong substitution against teams with fast wingers. That knowledge is real, and it isn't easily replaced.

And yet a 250-line script with three rules and a cron job is sitting above twelve of them.

The bot doesn't beat the leader. It doesn't outperform actual football intuition at the top of the table. What it does is beat almost everyone else, despite having none of the inputs they have. That gap — between domain-rich intuition and a small disciplined system in a fresh domain — is the thing I keep noticing in my other work, and I didn't expect to see it lit up this cleanly in a prediction league.

The discipline isn't football-specific. It's this: define a small structured prompt, run it on cron, write a per-match record, let the model do the worker's job, and let the structure around the model decide when to act and when to keep the previous bet. That structure is doing more of the work than the model is.
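The "act or keep the previous bet" step is small enough to show. This is a minimal sketch that assumes bets are (home, away) tuples and that the per-match record is a JSON-lines file appended on every decision. `decide_and_log` and the record fields are hypothetical and don't come from the author's code.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Tuple

Bet = Tuple[int, int]  # (home goals, away goals)


def decide_and_log(match_id: str, previous: Optional[Bet], proposed: Bet,
                   rule: str, reason: str, log_dir: Path) -> bool:
    """Return True if the proposed bet should be written; log the decision either way."""
    changed = previous is None or tuple(previous) != tuple(proposed)
    record = {
        "match_id": match_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "previous": list(previous) if previous is not None else None,
        "proposed": list(proposed),
        "action": "write" if changed else "keep",
        "rule": rule,
        "reason": reason,
    }
    log_dir.mkdir(parents=True, exist_ok=True)
    # One JSON-lines file per match; each decision appends one line.
    with (log_dir / f"{match_id}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return changed
```

The caller would go on to put_bet(match_id, home, away), the method the post describes in client.py, only when this returns True. Because "keep" decisions are logged too, the record shows every time the model was consulted, not only the times it changed its mind.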

A few honest caveats

The bot probably doesn't stay at #2 through the whole tournament. The knockout rounds get messier. Squad-value ratios get less reliable in tournament football because the variance is high, and three crude rules will miss the kind of nuance the top human catches in tight quarter-finals. The bot's lead over the rest of the league is real today. Whether it holds is a separate question.

Also: the bot is not clever in the prediction itself. Claude isn't running a Bayesian model under the hood. It's pattern-matching on a small structured prompt with three rules. The cleverness, such as it is, is in the architecture around the model — the cron cadence, the state guard, the structured rule set, the per-match JSON log. The model is the worker. The structure decides when the worker is allowed to ship.

That distinction is most of the post. The bot isn't beating football fans because the model is smart. It's beating most of them because the system around the model is small, structured, and consistent — and most casual prediction isn't.

Footer

If this sounds like the vibe-coding-is-not-a-level framing — it is. In a fresh domain, a small disciplined system can get surprisingly close to domain intuition — and sometimes beat loosely applied intuition outright. I was writing about this in software last week. I'm writing about it in football this week. The shape doesn't care which domain it lives in.

The bot's predictions will keep landing through the group stage. The standings will move. I'll write the follow-up when they do.

Top comments (6)

born2frag • Jun 24

This is great so far. We will see what happens in the play-off stage :)

Mike Czerwinski • Jun 24

what is your bet for the bot? ;)

born2frag • Jun 24

It's gonna win ;)

Mike Czerwinski • Jun 24

omg - lol - who knows :D

born2frag • Jun 30

Now it's the knockout phase, so it's going to be a little harder. Yesterday, Germany and the Netherlands were knocked out. Those are two surprises, and it's just the beginning of the playoffs :). We will see, but I think the bot will win.

Mike Czerwinski • Jun 30

Knockout phase is where the bot's strategy actually gets tested. League play forgives a lot because the same model gets dozens of chances to mean-revert. Single elimination punishes any biased calibration immediately. If the bot keeps winning into the playoffs, the prior is real. If it collapses, the league record was variance, not edge. Same problem in trading, different costume. Germany and Netherlands going out is exactly the kind of variance the bot has to survive, not predict.