For years, sports commentators have repeated the same clichés about T20 cricket: "You have to win the powerplay," "You need an anchor to win," and "Team X always chokes."
I wanted to know if any of that was actually true.
The problem is that querying 15+ years of historical ball-by-ball telemetry (over 294,000 deliveries) via commercial sports APIs is painfully slow and brutally rate-limited.
So, I built Midwicket, an open-source SDK that bypasses APIs entirely. It pulls raw open data into a local DuckDB and PyArrow engine and turns your laptop into a local sports data warehouse with sub-millisecond queries.
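The ingestion step comes down to flattening nested per-match records into one row per delivery. That column-oriented shape is what PyArrow and DuckDB work with. Below is a minimal stdlib-only sketch. It assumes a hypothetical nested JSON layout (innings → overs → deliveries). The field names are illustrative and are not Midwicket's actual schema.

```python
import json
from typing import Any

# Hypothetical raw match record (innings -> overs -> deliveries).
# Field names are illustrative only, NOT Midwicket's real schema.
RAW_MATCH = json.loads("""
{
  "match_id": "demo-001",
  "innings": [
    {"team": "Team A",
     "overs": [
       {"over": 0, "deliveries": [
         {"batter": "A1", "bowler": "B1", "runs": 4, "wicket": false},
         {"batter": "A1", "bowler": "B1", "runs": 0, "wicket": true}
       ]}
     ]},
    {"team": "Team B",
     "overs": [
       {"over": 0, "deliveries": [
         {"batter": "B2", "bowler": "A3", "runs": 1, "wicket": false}
       ]}
     ]}
  ]
}
""")


def flatten_deliveries(match: dict[str, Any]) -> dict[str, list[Any]]:
    """Flatten one nested match into column lists, one entry per delivery."""
    columns: dict[str, list[Any]] = {
        "match_id": [], "innings": [], "team": [], "over": [],
        "ball": [], "batter": [], "bowler": [], "runs": [], "wicket": [],
    }
    for inn_no, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for ball_no, d in enumerate(over["deliveries"], start=1):
                columns["match_id"].append(match["match_id"])
                columns["innings"].append(inn_no)
                columns["team"].append(innings["team"])
                columns["over"].append(over["over"])
                columns["ball"].append(ball_no)
                columns["batter"].append(d["batter"])
                columns["bowler"].append(d["bowler"])
                columns["runs"].append(d["runs"])
                columns["wicket"].append(d["wicket"])
    return columns


cols = flatten_deliveries(RAW_MATCH)
print(len(cols["runs"]), sum(cols["runs"]))  # 3 5
```

If pyarrow is installed, `pyarrow.Table.from_pydict(cols)` turns this dictionary of columns into an Arrow table, and DuckDB can run SQL directly against Arrow tables.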
To test the architecture, I wrote a logistic regression Win Probability model (AUC 0.87) trained on 1,239 IPL matches. Because the query engine is entirely local, I could calculate the Win Probability Added (WPA) for every delivery in the dataset. WPA is the change in win probability from just before a ball to just after it.
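To make WPA concrete, here is a minimal sketch of a logistic-regression-style WP function and the per-ball WPA it produces. The three features (runs needed, balls left, wickets in hand) and all coefficients are assumptions with made-up values. The post does not describe the real model's features or weights.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ChaseState:
    runs_needed: int      # runs still required to win
    balls_left: int       # legal balls remaining (120 at the start of a T20 chase)
    wickets_in_hand: int  # 10 minus wickets lost


# Hypothetical coefficients for illustration only; NOT the fitted Midwicket model.
INTERCEPT = 1.0
COEF_REQ_RATE = -0.6  # higher required run rate -> lower win probability
COEF_WICKETS = 0.35   # more wickets in hand -> higher win probability


def win_probability(s: ChaseState) -> float:
    """Logistic WP for the chasing side: sigmoid of a linear score."""
    if s.runs_needed <= 0:
        return 1.0  # target reached
    if s.balls_left <= 0 or s.wickets_in_hand <= 0:
        return 0.0  # out of balls or all out
    req_rate = 6.0 * s.runs_needed / s.balls_left  # runs per over still needed
    z = INTERCEPT + COEF_REQ_RATE * req_rate + COEF_WICKETS * s.wickets_in_hand
    return 1.0 / (1.0 + math.exp(-z))


def wpa(before: ChaseState, after: ChaseState) -> float:
    """Win Probability Added by one delivery: WP after minus WP before."""
    return win_probability(after) - win_probability(before)


# Usage: a boundary off the first ball of the final over raises the chaser's WP.
before = ChaseState(runs_needed=10, balls_left=6, wickets_in_hand=5)
after = ChaseState(runs_needed=6, balls_left=5, wickets_in_hand=5)
print(round(wpa(before, after), 3))
```

Each ball's WPA is a difference between consecutive WPs. Summing WPA over a run of consecutive deliveries therefore telescopes: the total equals the WP at the end of the run minus the WP at its start.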
When I ran the numbers, the data completely broke some of the sport's biggest myths.
1. The "Choke Index" is real, and it's ruthless.
I queried the database for every match in which the chasing team reached an 80% Win Probability at any point in the chase but still lost.
| Franchise | Chases Reaching 80%+ WP | Chokes | Choke % | Rating |
|---|---|---|---|---|
| Royal Challengers Bangalore (RCB) | 9 | 6 | 66.7% | Notorious |
| Mumbai Indians (MI) | 10 | 4 | 40.0% | High |
| Chennai Super Kings (CSK) | 10 | 3 | 30.0% | Moderate |
| Kolkata Knight Riders (KKR) | 9 | 2 | 22.2% | Composed |
The Finding: RCB has a staggering 66.7% choke rate from commanding positions. In this sample, once they reached an 80% probability of winning, they lost more of those chases (6 of 9) than they won. For context, a well-calibrated model expects a team that has just reached 80% to lose only about 1 in 5 such chases. KKR, on the other hand, converts almost 78% (7 of 9) of these positions.
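The query behind the table above can be sketched in SQL. This is a minimal sketch, not Midwicket's actual query. The tables `deliveries(match_id, chasing_team, chase_wp)` and `matches(match_id, winner)` are hypothetical, and the rows are toy data. Python's built-in `sqlite3` stands in for DuckDB so the example runs without installs. The SQL uses only a CTE, `GROUP BY` and `CASE`, all of which DuckDB also supports.

```python
import sqlite3

# sqlite3 (stdlib) stands in for DuckDB; table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE deliveries (match_id INTEGER, chasing_team TEXT, chase_wp REAL);
CREATE TABLE matches (match_id INTEGER, winner TEXT);
""")
# Toy data: chasing side's WP after each delivery (made up, not real matches).
con.executemany("INSERT INTO deliveries VALUES (?, ?, ?)", [
    (1, "Team A", 0.55), (1, "Team A", 0.85), (1, "Team A", 0.30),
    (2, "Team A", 0.70), (2, "Team A", 0.90), (2, "Team A", 1.00),
    (3, "Team A", 0.60), (3, "Team A", 0.10),
    (4, "Team B", 0.82), (4, "Team B", 1.00),
])
con.executemany("INSERT INTO matches VALUES (?, ?)", [
    (1, "Team X"), (2, "Team A"), (3, "Team Y"), (4, "Team B"),
])

CHOKE_SQL = """
WITH peak AS (
    -- Highest WP the chasing side reached at any point in each chase
    SELECT match_id, chasing_team, MAX(chase_wp) AS peak_wp
    FROM deliveries
    GROUP BY match_id, chasing_team
)
SELECT p.chasing_team AS team,
       COUNT(*) AS chases_80,
       -- A NULL winner (no result) makes the comparison NULL, so it is not a choke
       SUM(CASE WHEN m.winner <> p.chasing_team THEN 1 ELSE 0 END) AS chokes,
       ROUND(100.0 * SUM(CASE WHEN m.winner <> p.chasing_team THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS choke_pct
FROM peak AS p
JOIN matches AS m ON m.match_id = p.match_id
WHERE p.peak_wp >= 0.8
GROUP BY p.chasing_team
ORDER BY choke_pct DESC
"""
rows = con.execute(CHOKE_SQL).fetchall()
for row in rows:
    print(row)
```

The conversion rate is 100 minus the choke %. KKR's 2 chokes in 9 chances leave 7 of 9 converted, or 77.8%.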
2. The Anchor vs. The Finisher (Kohli vs. Dhoni)
Who is more valuable in a run chase: the guy who bats for 15 overs to build a foundation (Kohli), or the guy who comes in at the end (Dhoni)?
The WPA math heavily favors the finisher.
| Player | Innings Analyzed | Innings with Positive WPA | Average WPA per Innings (percentage points) |
|---|---|---|---|
| MS Dhoni | 10 | 80% | +46.5 |
| Virat Kohli | 10 | 70% | +25.8 |
The Finding: Kohli is reliably positive (7 of 10 innings), but Dhoni is positive even more often (8 of 10) and adds about 20.7 percentage points more win probability per innings on average (+46.5 vs. +25.8). The model suggests run chases are highly non-linear. The leverage of each ball climbs steeply in the final 3 overs, so the same runs swing the team's win probability far more for a finisher than for a batter in the middle overs.
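The table's two metrics can be built from per-ball WPA in two steps. First, sum each batter's WPA within an innings. Then summarise across innings. Below is a minimal sketch. It assumes hypothetical `(player, innings_id, wpa)` records with made-up values, and it assumes "positive WPA" means an innings whose summed WPA is above zero. The post does not spell out its exact aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-delivery records: (player, innings_id, WPA as a probability delta).
# Values are made up for illustration.
BALL_WPA = [
    ("Finisher", 1, 0.05), ("Finisher", 1, 0.20), ("Finisher", 1, 0.15),
    ("Finisher", 2, -0.10), ("Finisher", 2, 0.04),
    ("Anchor", 3, 0.02), ("Anchor", 3, 0.03), ("Anchor", 3, -0.01),
    ("Anchor", 4, 0.06),
]


def innings_summary(records):
    """Sum per-ball WPA into per-innings WPA, then summarise each player."""
    per_innings = defaultdict(float)
    for player, innings_id, ball_wpa in records:
        per_innings[(player, innings_id)] += ball_wpa

    by_player = defaultdict(list)
    for (player, _innings_id), total in per_innings.items():
        by_player[player].append(total)

    summary = {}
    for player, totals in by_player.items():
        summary[player] = {
            "innings": len(totals),
            # Share of innings whose summed WPA is above zero
            "positive_pct": 100.0 * sum(t > 0 for t in totals) / len(totals),
            # Mean per-innings WPA, expressed in percentage points
            "avg_wpa_pp": 100.0 * mean(totals),
        }
    return summary


for player, stats in innings_summary(BALL_WPA).items():
    print(player, stats)
```

WPA is a difference of probabilities, so the average is reported in percentage points rather than as a relative percentage.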
3. "Winning the Powerplay" is a myth. Over 19 is the cliff.
When I simulated a catastrophic 2-wicket collapse at each stage of a chase, the Powerplay (Overs 1-6) proved surprisingly forgiving: a collapse there drops your win probability by about 25%.
However, a collapse in Over 19 drops it by 60.9%.
By the 19th over, so few balls remain that the margin of error all but disappears. In this model, it is the breaking point of a T20 chase.
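The collapse experiment can be sketched by plugging a simulated state into a WP model and measuring the drop. This is a minimal sketch under stated assumptions. The WP function and its coefficients are hypothetical toys. The chase is assumed to be exactly on par (9 runs per over, 7 wickets in hand) at the start of the over. The collapse is modelled as two wickets off two consecutive dot balls. The numbers it prints come from made-up coefficients and will not reproduce the 25% and 60.9% figures above.

```python
import math

# Hypothetical coefficients for illustration only (not Midwicket's fitted model).
INTERCEPT, COEF_REQ_RATE, COEF_WICKETS = 4.0, -0.8, 0.45


def win_probability(runs_needed: int, balls_left: int, wickets_in_hand: int) -> float:
    """Toy logistic WP for the chasing side."""
    if runs_needed <= 0:
        return 1.0
    if balls_left <= 0 or wickets_in_hand <= 0:
        return 0.0
    req_rate = 6.0 * runs_needed / balls_left  # runs per over still needed
    z = INTERCEPT + COEF_REQ_RATE * req_rate + COEF_WICKETS * wickets_in_hand
    return 1.0 / (1.0 + math.exp(-z))


def collapse_cost(over: int, par_rate: float = 9.0, wickets_in_hand: int = 7) -> float:
    """WP lost to a 2-wicket collapse at the start of `over` (1-20).

    Assumptions: the chase is exactly on par when the over starts, and the
    collapse is two wickets off two consecutive dot balls.
    """
    balls_left = 6 * (21 - over)                    # 120 before over 1, 6 before over 20
    runs_needed = round(par_rate * balls_left / 6)  # on-par target remaining
    before = win_probability(runs_needed, balls_left, wickets_in_hand)
    after = win_probability(runs_needed, balls_left - 2, wickets_in_hand - 2)
    return before - after


for over in (1, 6, 15, 19):
    print(f"over {over:2d}: WP drops by {100 * collapse_cost(over):.1f} points")
```

In this toy model the cost of the same collapse grows as the innings runs out. Two wasted deliveries are a much larger share of 12 remaining balls than of 120, so the required rate jumps sharply late in the chase.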
The Open Source Engine
If you’re a sports analytics nerd, a data engineer, or just someone who wants to run SQL queries against 15 years of sports data without paying for an API, I’ve open-sourced the entire engine, the PyArrow schema, and the trained models.
Check out the repo here: Midwicket on GitHub
I’d love to hear your thoughts on the architecture, the DuckDB implementation, or the data findings! What team should I run the choke index on next?