CodersAcademy006
I Built a Local DuckDB Engine to Mathematically Settle Cricket Debates

For years, sports commentators have repeated the same clichés about T20 cricket: "You have to win the powerplay," "You need an anchor to win," and "Team X always chokes."

I wanted to know if any of that was actually true.

The problem is that querying 15+ years of historical ball-by-ball telemetry (over 294,000 deliveries) via commercial sports APIs is painfully slow and brutally rate-limited.

So I built Midwicket, an open-source SDK that bypasses APIs entirely. It pulls raw open data into a local engine built on DuckDB and PyArrow, turning your laptop into a sports data warehouse that answers queries in under a millisecond.

To test the architecture, I wrote a logistic regression Win Probability model (AUC 0.87) trained on 1,239 IPL matches. Because the query engine is entirely local, I could calculate the exact Win Probability Added (WPA) for every single ball ever bowled.
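To make the WPA idea concrete, here is a minimal sketch of how a logistic win probability model turns into a per-ball WPA number. It is not the trained Midwicket model: the features (required run rate, wickets in hand), the coefficients in `COEF`, and the `PAR_RATE` constant are all made-up assumptions chosen only so the example runs.

```python
import math

# Hypothetical coefficients for illustration only; not the trained Midwicket model.
COEF = {"intercept": 0.0, "required_rate": -0.6, "wickets_in_hand": 0.25}
PAR_RATE = 8.0  # assumed "par" runs per over, used only to centre the feature


def win_probability(runs_needed, balls_left, wickets_in_hand):
    """Chasing side's win probability from a toy logistic model."""
    if runs_needed <= 0:
        return 1.0  # target already reached
    if balls_left <= 0 or wickets_in_hand <= 0:
        return 0.0  # chase is over
    required_rate = 6 * runs_needed / balls_left
    z = (COEF["intercept"]
         + COEF["required_rate"] * (required_rate - PAR_RATE)
         + COEF["wickets_in_hand"] * (wickets_in_hand - 5))
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def wpa(before, after):
    """Win Probability Added by one delivery: WP after minus WP before."""
    return win_probability(*after) - win_probability(*before)


# States are (runs_needed, balls_left, wickets_in_hand).
# 20 needed off 12 balls with 5 wickets in hand; the batter hits a six.
six = wpa(before=(20, 12, 5), after=(14, 11, 5))
# Same starting state, but the batter is dismissed instead.
wicket = wpa(before=(20, 12, 5), after=(20, 11, 4))
print(f"WPA for the six: {six:+.3f}, for the wicket: {wicket:+.3f}")
```

Summing these per-ball deltas over a batter's deliveries gives that batter's WPA for the innings.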

When I ran the numbers, the data completely broke some of the sport's biggest myths.


1. The "Choke Index" is real, and it's ruthless.

I queried the database for every match where a chasing team reached an 80% Win Probability at any point, but still managed to lose the game.

| Franchise | 80%+ WP Matches | Chokes | Choke % | Rating |
| --- | --- | --- | --- | --- |
| Royal Challengers Bangalore (RCB) | 9 | 6 | 66.7% | Notorious |
| Mumbai Indians (MI) | 10 | 4 | 40.0% | High |
| Chennai Super Kings (CSK) | 10 | 3 | 30.0% | Moderate |
| Kolkata Knight Riders (KKR) | 9 | 2 | 22.2% | Composed |

The Finding: RCB has a staggering 66.7% choke rate from commanding positions. In these nine matches, once they reached an 80% probability of winning, they were more likely to throw it away than to finish the job. KKR, on the other hand, converted 7 of its 9 such positions (almost 78%).
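The choke index boils down to a filter and a count. Here is a minimal sketch in plain Python, assuming each chase is stored as a `(team, wp_trace, won)` tuple; that record layout and the sample traces are hypothetical, not Midwicket's schema or data.

```python
from collections import defaultdict


def choke_index(chases, threshold=0.80):
    """Per team: (qualifying chases, chokes, choke %).

    `chases` is an iterable of (team, wp_trace, won) tuples, where wp_trace
    is the chasing side's ball-by-ball win probability. A chase qualifies if
    the trace reaches `threshold` at any point; it is a choke if the team
    still lost.
    """
    reached = defaultdict(int)
    choked = defaultdict(int)
    for team, wp_trace, won in chases:
        if wp_trace and max(wp_trace) >= threshold:
            reached[team] += 1
            if not won:
                choked[team] += 1
    return {team: (n, choked[team], 100.0 * choked[team] / n)
            for team, n in reached.items()}


# Made-up traces, for illustration only.
sample = [
    ("Team A", [0.55, 0.82, 0.40, 0.05], False),  # peaked at 82%, lost: a choke
    ("Team A", [0.60, 0.90, 0.99], True),         # peaked at 99%, won
    ("Team B", [0.45, 0.70, 0.30], False),        # never reached 80%: ignored
]
result = choke_index(sample)
for team, (n, chokes, pct) in result.items():
    print(f"{team}: {chokes}/{n} = {pct:.1f}%")
```

The same arithmetic applied to the table's counts gives 6/9 = 66.7% for RCB and 2/9 = 22.2% for KKR.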


2. The Anchor vs. The Finisher (Kohli vs Dhoni)

Who is more valuable in a run chase: the guy who bats for 15 overs to build a foundation (Kohli), or the guy who comes in at the end (Dhoni)?

The WPA math heavily favors the finisher.

| Player | Total Innings Analyzed | Positive WPA % | Average WPA per Innings |
| --- | --- | --- | --- |
| MS Dhoni | 10 | 80% | +46.5% |
| Virat Kohli | 10 | 70% | +25.8% |

The Finding: Kohli is highly consistent, but Dhoni contributes about 20 percentage points more win probability per innings (+46.5% vs. +25.8%). The model reveals that run chases are highly non-linear; the leverage in the final 3 overs is so massive that a finisher essentially holds the team's entire win probability in their hands.
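The two columns in the table are simple aggregates over per-innings WPA totals, and the gap between the two averages is a difference in percentage points. A small sketch of both calculations follows; `summarize_wpa` is a hypothetical helper name, and only the two averages are taken from the table.

```python
def summarize_wpa(innings_wpa):
    """Return (share of innings with positive WPA, mean WPA per innings).

    `innings_wpa` is a list of per-innings WPA totals in percentage points.
    """
    positive = sum(1 for x in innings_wpa if x > 0)
    return positive / len(innings_wpa), sum(innings_wpa) / len(innings_wpa)


# Average WPA per innings, as reported in the table above.
dhoni_avg, kohli_avg = 46.5, 25.8

gap_points = dhoni_avg - kohli_avg  # absolute gap, in percentage points
ratio = dhoni_avg / kohli_avg       # relative size of the two averages

print(f"gap: {gap_points:.1f} points, ratio: {ratio:.2f}x")
```

So the gap is 20.7 percentage points, which in relative terms means Dhoni's average is about 1.8 times Kohli's.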


3. "Winning the Powerplay" is a myth. Over 19 is the cliff.

When I simulated a catastrophic 2-wicket collapse at different points in a chase, the Powerplay (Overs 1-6) turned out to be surprisingly forgiving: a collapse there drops your win probability by about 25%.

However, a collapse in Over 19 drops it by 60.9%.

By the 19th over, the margin for error has compressed to almost nothing. It is the definitive breaking point of a chase.
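A collapse simulation like this can be sketched as "score the state, remove two wickets, score it again." The version below assumes the two wickets fall on consecutive balls with no runs scored, and it uses a toy logistic model with made-up coefficients as a stand-in, so it will not reproduce the 25% and 60.9% figures above. The match states in the loop are hypothetical too.

```python
import math


def toy_model(runs_needed, balls_left, wickets_in_hand):
    """Stand-in win probability model with made-up coefficients."""
    if runs_needed <= 0:
        return 1.0
    if balls_left <= 0 or wickets_in_hand <= 0:
        return 0.0
    required_rate = 6 * runs_needed / balls_left
    z = -0.6 * (required_rate - 8.0) + 0.25 * (wickets_in_hand - 5)
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def collapse_impact(model, runs_needed, balls_left, wickets_in_hand, wickets_lost=2):
    """Drop in win probability when `wickets_lost` wickets fall on
    consecutive scoreless balls. `model` is any callable taking
    (runs_needed, balls_left, wickets_in_hand)."""
    before = model(runs_needed, balls_left, wickets_in_hand)
    after = model(runs_needed,
                  balls_left - wickets_lost,
                  max(wickets_in_hand - wickets_lost, 0))
    return before - after


# Hypothetical chase states: (runs_needed, balls_left, wickets_in_hand).
states = {
    "early (over 3)": (150, 102, 9),
    "late (over 19)": (15, 12, 4),
}
for label, state in states.items():
    drop = collapse_impact(toy_model, *state)
    print(f"{label}: win probability drops by {100 * drop:.1f} points")
```

Because `collapse_impact` only needs a callable, a trained model can be swapped in for `toy_model` without changing the simulation loop.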


The Open Source Engine

If you’re a sports analytics nerd, a data engineer, or just someone who wants to run SQL queries against 15 years of sports data without paying for an API, I’ve open-sourced the entire engine, the PyArrow schema, and the trained models.

Check out the repo here: Midwicket on GitHub

I’d love to hear your thoughts on the architecture, the DuckDB implementation, or the data findings! What team should I run the choke index on next?
