What 295,732 IPL balls reveal: chase to win, and the death overs are a different game

#datascience #python #cricket #tutorial

I cleaned every IPL delivery from 2008 to 2026 into one tidy ball-by-ball table — 295,732 rows — and added the columns I always end up recomputing by hand (match phase, running run-rate, per-batter tallies, and fantasy box-scores). Then I asked the data three simple questions.

1. Chasing wins — but the toss barely matters

Across 1,218 completed matches, the toss-winner won just 51.6% of the time. Close to a coin flip — until you look at what they do with it.

Captains who won the toss and chose to field (chase) won 54.7% of their matches. Those who chose to bat first won only 45.3%. A ~9-point swing from one decision — chasing is a real, persistent edge in the IPL.
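The split above can be reproduced with a short groupby. This is a minimal sketch, not the exact code behind the numbers: it assumes one row per completed match and hypothetical column names `toss_winner`, `toss_decision` (with values "field" or "bat") and `winner`, and it runs on toy data rather than real IPL results.

```python
import pandas as pd


def toss_summary(matches: pd.DataFrame) -> dict:
    """Toss-winner win rate, overall and split by toss decision.

    Assumes one row per completed match with hypothetical columns
    toss_winner, toss_decision ("field" or "bat") and winner.
    """
    # True where the side that won the toss also won the match.
    toss_won = matches["toss_winner"] == matches["winner"]
    by_decision = toss_won.groupby(matches["toss_decision"]).mean()
    return {
        "overall": toss_won.mean(),
        "field": by_decision.get("field", float("nan")),
        "bat": by_decision.get("bat", float("nan")),
    }


# Toy data for illustration only, not real IPL results.
toy = pd.DataFrame({
    "toss_winner":   ["A", "B", "C", "D", "A", "C"],
    "toss_decision": ["field", "field", "bat", "bat", "field", "field"],
    "winner":        ["A", "B", "D", "D", "B", "C"],
})
summary = toss_summary(toy)
print(summary)
```

If the match-level fields are repeated on every ball-by-ball row, you would first reduce to one row per match (for example by dropping duplicates on a match identifier) so that long matches are not over-weighted.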

2. The death overs are a different sport

Run rate by phase, across every IPL ball:

Phase (overs)	Run rate (runs / over)	Wickets / over
Powerplay (1–6)	7.78	0.23
Middle (7–15)	7.81	0.26
Death (16–20)	9.78	0.52

Scoring barely moves from powerplay to middle (7.78 to 7.81 runs an over), then jumps by about two runs an over at the death, while the wicket rate doubles from 0.26 to 0.52 per over. If you're modeling fantasy points or win probability, treating all overs the same leaves signal on the table.
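If your own table lacks a phase column, the boundaries in the table above are easy to derive from the over number. The sketch below does that and computes both rates per phase. The column names `over` (assumed 1-based, 1 to 20) and `is_wicket` (0/1) are hypothetical, and the data is a toy example with one over per phase.

```python
import pandas as pd


def phase_of(over: int) -> str:
    """Map a 1-based over number to Powerplay (1-6), Middle (7-15) or Death (16-20)."""
    if over <= 6:
        return "Powerplay"
    if over <= 15:
        return "Middle"
    return "Death"


def phase_rates(balls: pd.DataFrame) -> pd.DataFrame:
    """Runs per over and wickets per over by phase.

    Assumes hypothetical columns: over (1-20), runs_total, is_wicket (0/1).
    Every row is counted as a ball, so wides and no-balls are included.
    """
    phase = balls["over"].map(phase_of).rename("phase")
    grouped = balls.groupby(phase).agg(
        runs=("runs_total", "sum"),
        wickets=("is_wicket", "sum"),
        balls=("runs_total", "count"),
    )
    overs = grouped["balls"] / 6
    return pd.DataFrame({
        "run_rate": grouped["runs"] / overs,
        "wickets_per_over": grouped["wickets"] / overs,
    })


# Toy data: one six-ball over in each phase (illustration only).
toy = pd.DataFrame({
    "over":       [1] * 6 + [10] * 6 + [20] * 6,
    "runs_total": [1, 0, 4, 0, 2, 1,  1, 1, 0, 2, 1, 1,  6, 0, 4, 1, 0, 1],
    "is_wicket":  [0, 0, 0, 0, 0, 0,  0, 0, 1, 0, 0, 0,  0, 1, 0, 0, 1, 0],
})
rates = phase_rates(toy)
print(rates)
```

Note that raw Cricsheet files number overs from 0, so the thresholds would shift by one if you apply this to unprocessed data.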

3. Do it yourself

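The dataset is one flat CSV, so the run-rate column takes only a few lines of pandas:

import pandas as pd
df = pd.read_csv("ipl_ballbyball.csv")
# Runs per over by phase: total runs * 6 / deliveries bowled.
# count() includes every row, so any wides and no-balls are in the denominator.
by_phase = df.groupby("phase").agg(runs=("runs_total", "sum"), balls=("ball_in_innings", "count"))
print(by_phase["runs"] * 6 / by_phase["balls"])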
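The snippet above divides by every row in the table. If wides and no-balls have their own rows, as they do in raw Cricsheet data, a stricter run rate would count only legal deliveries in the denominator. Here is a rough sketch of that variant. The columns `wides` and `noballs` are hypothetical names for per-delivery extras (0 on a legal ball), so check them against the actual schema.

```python
import pandas as pd


def run_rate_legal(balls: pd.DataFrame) -> pd.Series:
    """Runs per over by phase, counting only legal deliveries as balls.

    Assumes hypothetical columns wides and noballs holding the extras
    conceded on each delivery (0 when the ball was legal).
    """
    legal = (balls["wides"] == 0) & (balls["noballs"] == 0)
    runs = balls.groupby("phase")["runs_total"].sum()      # all runs still count
    legal_balls = legal.groupby(balls["phase"]).sum()      # only legal balls count
    return runs * 6 / legal_balls


# Toy over with one wide: 7 rows, 6 legal balls, 9 runs (illustration only).
toy = pd.DataFrame({
    "phase":      ["Death"] * 7,
    "runs_total": [1, 0, 1, 4, 0, 2, 1],
    "wides":      [0, 0, 1, 0, 0, 0, 0],
    "noballs":    [0, 0, 0, 0, 0, 0, 0],
})
rr = run_rate_legal(toy)
naive = toy.groupby("phase")["runs_total"].sum() * 6 / toy.groupby("phase")["runs_total"].count()
print(rr["Death"], naive["Death"])
```

On the toy over the legal-ball rate is 9.0 while the row-count rate is lower, because the wide adds a row without adding a legal ball. The gap on real data depends on how often extras occur.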

The data

Free sample on Kaggle: IPL ball-by-ball + fantasy box-scores
Full 19-season history (295,732 deliveries + 27,374 player-match fantasy box-scores), auto-refreshed weekly: $12 on Gumroad
Internationals too: 765k+ T20I deliveries on Kaggle

Built from Cricsheet (ODC-BY). Columns include match phase, live run rate, running score, per-batter tallies, and per-player fantasy box-scores — so you can model straight away instead of parsing raw JSON.