Naomi Jepkorir

Posted on Aug 6

⚽ Can We Predict the Next Premier League Champion with Binomial Probability?

#datascience #python #api #math

What are the chances your favorite EPL team wins the league next season? Time to let math do the talking! 🎲

🧠 Idea Behind the Madness

Every football fan has asked it:

"Can my team win the league next season?"

Instead of relying on blind hope, I decided to use binomial probability to calculate each team's chances of taking the crown in the next Premier League season, based entirely on how they performed last time.

We’ll:

Fetch last season’s final standings from an API.
Use the binomial distribution to compute two things:
- The probability of a team repeating its exact win total.
- The probability of a team reaching a typical championship threshold (≈27.6 wins, rounded up to 28).
Rank them accordingly.
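For reference, the two quantities in that plan can be written out as follows. This is a sketch under the model's own assumption that each of a team's n matches is an independent win/no-win trial with the same win probability p, estimated as last season's win rate k / n. The symbols X, n, k and p are introduced here for illustration.

```latex
P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \qquad p = \frac{k}{n}

P(X \ge 28) = \sum_{j = 28}^{n} \binom{n}{j}\, p^{j} (1 - p)^{n - j}
```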

🛰️ Step 1: Fetching EPL Data Using an API

I used the football-data.org API to pull the standings. You’ll need a free API token. Save it in a .env file like this:

API_TOKEN=your_football_data_token

Now fetch the standings:

import requests
import os
from dotenv import load_dotenv

load_dotenv()

def fetch_epl_standings():
    token = os.getenv("API_TOKEN")
    if not token:
        raise ValueError("API_TOKEN not found in env")

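    # Use HTTPS so the token is not sent in clear text
    uri = "https://api.football-data.org/v4/competitions/PL/standings?season=2024"
    headers = { 'X-Auth-Token': token }

    # A timeout keeps the script from hanging if the API is unreachable
    response = requests.get(uri, headers=headers, timeout=10)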

    if response.status_code != 200:
        raise Exception(f"API request failed with status code {response.status_code}: {response.text}")

    data = response.json()
    return data["standings"][0]["table"]

standings = fetch_epl_standings()
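The function above returns the first entry of data["standings"]. If a response ever contains more than one table (say, separate home and away tables), a small helper could pick the overall one explicitly. This is only a sketch: extract_total_table is a hypothetical helper, and the "type": "TOTAL" field is my assumption about the payload shape, so check it against a real response before relying on it.

```python
def extract_total_table(payload):
    """Return the overall league table from a standings payload.

    Hypothetical helper: assumes each entry in payload["standings"]
    has a "type" key and that the overall table is marked "TOTAL".
    """
    for block in payload.get("standings", []):
        if block.get("type") == "TOTAL":
            return block["table"]
    raise KeyError("No TOTAL standings block found in payload")


# Tiny hand-made payload (not real API output) to show the behaviour.
sample_payload = {
    "standings": [
        {"type": "HOME", "table": [{"position": 1}]},
        {"type": "TOTAL", "table": [{"position": 1, "won": 25}]},
    ]
}
print(extract_total_table(sample_payload))
```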

Convert to DataFrame:

import pandas as pd

data_rows = []
for team in standings:
    data_rows.append({
        "Pos": team["position"],
        "Team": team["team"]["name"],
        "Matches": team["playedGames"],
        "Wins": team["won"],
        "Draws": team["draw"],
        "Losses": team["lost"],
        "Points": team["points"],
        "+/-": team["goalDifference"],
        "Goals": f'{team["goalsFor"]}:{team["goalsAgainst"]}'
    })

df = pd.DataFrame(data_rows)
df.to_csv('epl_standings.csv', index=False)

🎯 Step 2: Binomial Probability of Exact Win Count

Now let's calculate the probability of each team repeating the exact number of wins they had last season.

import math

# Loop through each row and calculate binomial probability
for index, row in df.iterrows():
    team = row['Team']
    n = int(row['Matches'])  # total games
    k = int(row['Wins'])     # wins
    p = k / n                # estimated win probability

    try:
        binom_prob = math.comb(n, k) * (p**k) * ((1 - p)**(n - k))
    except OverflowError:
        binom_prob = 0.0

    print(f"{team}: P( {k} wins)  = {binom_prob:.6f}")

Sample Output:

Liverpool FC: P( 25 wins)  = 0.135388
Arsenal FC: P( 20 wins)  = 0.128761
Ipswich Town FC: P( 4 wins)  = 0.206486
Southampton FC: P( 2 wins)  = 0.278054
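As a cross-check, the same numbers can be computed for all rows at once with scipy.stats.binom.pmf instead of a Python loop. This is a minimal sketch that rebuilds a small table from the four win totals shown above rather than calling the API; the sample and P_repeat names are mine.

```python
import pandas as pd
from scipy.stats import binom

# Win totals quoted in the sample output; each team played 38 matches.
sample = pd.DataFrame({
    "Team": ["Liverpool FC", "Arsenal FC", "Ipswich Town FC", "Southampton FC"],
    "Matches": [38, 38, 38, 38],
    "Wins": [25, 20, 4, 2],
})

# Estimated win probability per team, then P(X = Wins) for every row at once.
p_hat = sample["Wins"] / sample["Matches"]
sample["P_repeat"] = binom.pmf(sample["Wins"], sample["Matches"], p_hat)

print(sample[["Team", "Wins", "P_repeat"]].round(6))
```

The values should agree with the loop version to within floating-point rounding.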

📉 What These Results Tell Us

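Because p is estimated as k / n, each team's actual win total is also the single most likely outcome under the model, so this number mostly measures how spread out the distribution is. Teams with win rates closer to 50%, like Arsenal and Liverpool, have lower exact-repeat probabilities: the variance n·p·(1 − p) is largest there, so the probability is spread over more possible win totals.
Lower-table teams, with win rates near zero, have tighter distributions and therefore higher repeat chances, but don't celebrate just yet...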
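To see where the pattern comes from, it helps to put the standard deviation of the win count next to the repeat probability. The sketch below uses hypothetical helper names (repeat_probability, win_std) and a handful of win totals, including the midpoint of 19 wins that I added for comparison.

```python
import math

def repeat_probability(k, n=38):
    """P(X = k) when the win probability is estimated as k / n."""
    p = k / n
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def win_std(k, n=38):
    """Standard deviation of the win count, sqrt(n * p * (1 - p))."""
    p = k / n
    return math.sqrt(n * p * (1 - p))

# 19 wins (p = 0.5) is an illustrative midpoint, not a team from the table.
for k in (2, 4, 19, 20, 25):
    print(f"{k:>2} wins: sd = {win_std(k):.2f}, P(repeat) = {repeat_probability(k):.4f}")
```

The wider the spread, the less probability is left for any single win total, which is why a 50% win rate gives the lowest repeat chance.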

🏆 Step 3: Probability of Title-Winning Season (≥ 28 Wins)

Next, we model the probability of each team reaching 28 or more wins, a common threshold to win the league.

We'll use the cumulative binomial distribution:

from scipy.stats import binom

def title_probability(wins, matches=38, threshold=28):
    p = wins / matches
    return 1 - binom.cdf(threshold - 1, matches, p)

for index, row in df.iterrows():
    team = row['Team']
    wins = int(row['Wins'])
    prob = title_probability(wins, threshold=28)
    print(f"{team}: P(Wins ≥ 28) = {prob:.6f}")

Sample Output (selected teams, shown as percentages):

Team	P(Wins ≥ 28)
Liverpool FC	19.78%
Manchester City FC	1.54%
Arsenal FC	0.66%
Chelsea FC	0.66%
Newcastle United	0.66%
Manchester United FC	0.00%
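The ranking step itself is not shown above, so here is one way it might look. This is a sketch, not the exact code behind the table: it uses binom.sf (the survival function, which is 1 − cdf but more precise for tiny tails), a three-team table built from win totals quoted in this post, and column names of my own choosing.

```python
import pandas as pd
from scipy.stats import binom

# Win totals quoted in this post (deliberately unsorted).
wins_table = pd.DataFrame({
    "Team": ["Arsenal FC", "Manchester United FC", "Liverpool FC"],
    "Wins": [20, 11, 25],
})

MATCHES = 38
THRESHOLD = 28

# sf(k) = P(X > k), so sf(THRESHOLD - 1) = P(X >= THRESHOLD).
wins_table["P_title"] = binom.sf(
    THRESHOLD - 1, MATCHES, wins_table["Wins"] / MATCHES
)

# Rank from most to least likely and format as percentages.
ranked = wins_table.sort_values("P_title", ascending=False).reset_index(drop=True)
ranked["P_title_pct"] = (ranked["P_title"] * 100).round(2)
print(ranked[["Team", "Wins", "P_title_pct"]])
```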

📊 Interpretation

Liverpool is the most likely to hit 28+ wins, based on last season's win rate.

City, Chelsea and the others trail far behind because they won fewer matches last season. The model looks only at wins, so it cannot tell a draw from a loss.

Man United? Their chance rounds to zero. Ouch 😬.

🫣 United fans, this model says your 11-win season gives you a statistically negligible shot at the title. You might want to pray harder than you code.

⚠️ Limitations

Let’s be honest, binomial probability isn’t a crystal ball. Here's why:

It ignores real-world dynamics: transfers, injuries, managerial changes.
It assumes independent, identically distributed matches (which football is not).
It is based on a single 38-match season, which is too small a sample for deep insight.
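To get a feel for the small-sample problem, one rough check is to nudge the estimated win probability by one standard error in each direction and see how much the title probability moves. This is a back-of-the-envelope sketch using Liverpool's 25 wins; the standard-error formula for a proportion is a textbook approximation, not something the model above uses.

```python
import math
from scipy.stats import binom

n, k, threshold = 38, 25, 28   # Liverpool's 25 wins from the sample output
p_hat = k / n

# Rough standard error of a proportion estimated from n matches.
std_err = math.sqrt(p_hat * (1 - p_hat) / n)

scenarios = {
    "p_hat - 1 SE": p_hat - std_err,
    "p_hat": p_hat,
    "p_hat + 1 SE": p_hat + std_err,
}

# P(Wins >= threshold) under each assumed win probability.
title_probs = {label: binom.sf(threshold - 1, n, p) for label, p in scenarios.items()}

for label, p in scenarios.items():
    print(f"{label:>13}: p = {p:.3f}, P(Wins >= {threshold}) = {title_probs[label]:.3f}")
```

If the three results differ a lot, the headline percentage depends heavily on a win rate that one season cannot pin down.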

But hey, it’s fun and statistically grounded!

🧪 Want to Take This Further?

Here’s how you can level up the model:

Use Poisson regression to simulate goals per match.
Integrate Elo ratings or other power metrics.
Run full Monte Carlo simulations of future fixtures.
Track the model live across the season for dynamic probabilities.
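As a first step toward the Monte Carlo idea, here is a deliberately simple sketch. It does not simulate fixtures: it draws each team's season win total independently from the same binomial model used above and counts how often each team finishes with the most wins. The function name, the five-team subset and the random tie-break are all my own assumptions for illustration.

```python
import numpy as np

# Win totals quoted in this post (a subset of the 20 teams).
last_season_wins = {
    "Liverpool FC": 25,
    "Arsenal FC": 20,
    "Manchester United FC": 11,
    "Ipswich Town FC": 4,
    "Southampton FC": 2,
}

def simulate_most_wins(wins_by_team, matches=38, n_sims=20_000, seed=0):
    """Share of simulated seasons in which each team has the most wins."""
    rng = np.random.default_rng(seed)
    teams = list(wins_by_team)
    p = np.array([wins_by_team[t] / matches for t in teams])

    # One row per simulated season, one column per team.
    sim_wins = rng.binomial(matches, p, size=(n_sims, len(teams)))

    # Jitter in [0, 1) breaks ties at random without reordering distinct totals.
    jitter = rng.random(sim_wins.shape)
    winners = np.argmax(sim_wins + jitter, axis=1)

    counts = np.bincount(winners, minlength=len(teams))
    return dict(zip(teams, counts / n_sims))

shares = simulate_most_wins(last_season_wins)
for team, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{team}: {share:.2%}")
```

Unlike the fixed 28-win threshold, these shares add up to 100% across the teams included, which makes them easier to read as title chances. A fuller version would simulate individual fixtures so that one team's win is another team's loss.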

💭 Final Thoughts

While this model won’t help you win your fantasy league, it does give a math-driven glimpse into who’s statistically positioned to succeed. Liverpool fans? You have reason to dream. Southampton? Maybe next year...

Football is unpredictable, and that's what makes it beautiful. But every now and then, it's fun to let the math have a shot at calling the game. ⚽📊
