I cleaned every IPL delivery from 2008 to 2026 into one tidy ball-by-ball table — 295,732 rows — and added the columns I always end up recomputing by hand (match phase, running run-rate, per-batter tallies, and fantasy box-scores). Then I asked the data three simple questions.
1. Chasing wins — but the toss barely matters
Across 1,218 completed matches, the toss-winner won just 51.6% of the time. Close to a coin flip — until you look at what they do with it.
Captains who won the toss and chose to field (chase) won 54.7% of their matches. Those who chose to bat first won only 45.3%. A ~9-point swing from one decision — chasing is a real, persistent edge in the IPL.
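For anyone who wants to reproduce this kind of split, here is a minimal sketch of the calculation on a match-level table. The column names `toss_winner`, `toss_decision` and `winner` are assumptions for illustration, and the six rows are toy data, not real IPL results.

```python
import pandas as pd

# Toy match-level data, invented for illustration only (not real IPL results).
# Column names are assumptions; the real dataset may use different ones.
matches = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "C", "B", "C"],
    "toss_decision": ["field", "bat", "field", "field", "bat", "bat"],
    "winner":        ["A", "A", "B", "C", "B", "A"],
})

# True when the side that won the toss also won the match.
matches["toss_winner_won"] = matches["toss_winner"] == matches["winner"]

# Overall share of matches won by the toss winner.
overall = matches["toss_winner_won"].mean()

# Same share, split by what the toss winner chose to do.
by_decision = matches.groupby("toss_decision")["toss_winner_won"].mean()

print(f"toss winner won: {overall:.1%}")
print(by_decision.round(3))
```

Because the flag is a boolean, its mean is directly the win rate, so the same two lines work for the overall figure and for the per-decision split.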
2. The death overs are a different sport
Run rate by phase, across every IPL ball:
| Phase | Run rate | Wickets / over |
|---|---|---|
| Powerplay (1–6) | 7.78 | 0.23 |
| Middle (7–15) | 7.81 | 0.26 |
| Death (16–20) | 9.78 | 0.52 |
Scoring barely moves from powerplay to middle (7.78 to 7.81), then jumps by about two runs an over at the death, while the wicket rate doubles from 0.26 to 0.52 per over. If you're modeling fantasy points or win probability, treating all overs the same leaves signal on the table.
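A table like the one above can be sketched as a single groupby. This is an illustration under assumptions: `over` (numbered 1 to 20) and `is_wicket` are hypothetical column names, the rows are toy data, and the phase boundaries are the ones in the table (1–6, 7–15, 16–20).

```python
import pandas as pd

# Toy deliveries, invented for illustration. `over` (1-20) and `is_wicket`
# are assumed column names; the real dataset may differ.
balls = pd.DataFrame({
    "over":       [1, 3, 6, 7, 10, 15, 16, 18, 20, 20],
    "runs_total": [1, 4, 0, 1, 2, 0, 6, 4, 0, 2],
    "is_wicket":  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
})

# Bin overs into the three phases from the table: 1-6, 7-15, 16-20.
# pd.cut uses right-closed intervals by default: (0, 6], (6, 15], (15, 20].
balls["phase"] = pd.cut(
    balls["over"],
    bins=[0, 6, 15, 20],
    labels=["Powerplay", "Middle", "Death"],
)

# Per-delivery means times 6 give per-over rates.
per_over = (
    balls.groupby("phase", observed=True)[["runs_total", "is_wicket"]].mean() * 6
)
per_over.columns = ["run_rate", "wickets_per_over"]
print(per_over.round(2))
```

If the dataset numbers overs from 0 (as raw Cricsheet files do), the bin edges would need shifting by one.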
3. Do it yourself
The dataset is one flat CSV, so this is a three-liner:
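import pandas as pd
# Load the flat ball-by-ball table (one row per delivery).
df = pd.read_csv("ipl_ballbyball.csv")
# Runs per over by phase: total runs * 6 / deliveries. Every row counts as one ball.
print(df.groupby("phase")["runs_total"].sum() * 6 / df.groupby("phase")["ball_in_innings"].count())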
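One caveat: the snippet above divides by a count of every row, so any wides and no-balls in the table are treated as balls bowled. If you want a run rate per legal over instead, a sketch might look like the following. The `wides` and `noballs` columns are hypothetical names for per-delivery extras, and the rows are toy data.

```python
import pandas as pd

# Toy deliveries, invented for illustration. `wides` and `noballs` are
# hypothetical columns holding the extras conceded on each delivery.
balls = pd.DataFrame({
    "phase":      ["death"] * 8,
    "runs_total": [1, 1, 4, 0, 6, 2, 1, 1],
    "wides":      [0, 1, 0, 0, 0, 0, 0, 0],
    "noballs":    [0, 0, 0, 0, 0, 0, 1, 0],
})

# A delivery is legal when it is neither a wide nor a no-ball.
balls["legal"] = (balls["wides"] == 0) & (balls["noballs"] == 0)

grouped = balls.groupby("phase")
# All runs count toward the total, but only legal balls count toward overs.
run_rate = grouped["runs_total"].sum() * 6 / grouped["legal"].sum()
print(run_rate.round(2))
```

On this toy data the per-row version would report 12.0 and the legal-ball version 16.0, so it is worth deciding which definition you mean before comparing against published run rates.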
The data
- Free sample on Kaggle: IPL ball-by-ball + fantasy box-scores
- Full 19-season history (295,732 deliveries + 27,374 player-match fantasy box-scores), auto-refreshed weekly: $12 on Gumroad
- Internationals too: 765k+ T20I deliveries on Kaggle
Built from Cricsheet (ODC-BY). Columns include match phase, live run rate, running score, per-batter tallies, and per-player fantasy box-scores — so you can model straight away instead of parsing raw JSON.