In this project, I explored the 2024/2025 Premier League by digging into match outcomes and calculating the win probabilities of different teams. Instead of just looking at the league table, I wanted to understand how often teams actually win compared to drawing or losing, and then visualize these probabilities against a normal distribution to reveal patterns across the league.
📂Dataset
For this analysis, I worked with publicly available Premier League data for the 2024/2025 season.
You can access the dataset I used here: https://www.google.com/search?q=premier+league+table+2024%2F25&oq=premier+league+table+&gs_lcrp=EgZjaHJvbWUqBwgBEAAYgAQyBwgAEAAYjwIyBwgBEAAYgAQyBwgCEAAYgAQyBwgDEAAYgAQyBwgEEAAYgAQyBwgFEAAYgAQyBwgGEAAYgAQyBwgHEAAYgAQyBwgIEAAYgAQyBwgJEAAYgATSAQg5MjExajBqN6gCALACAA&sourceid=chrome&ie=UTF-8
🛠️Step 1: Preparing the Data
I started by creating a dataset of Premier League teams with their number of wins, draws, and losses. From this, I calculated the total games played and derived probabilities for each outcome.
import pandas as pd
# Create the dataset
data = {
    'Team': ['Manchester City', 'Liverpool', 'Chelsea', 'Arsenal', 'Manchester United',
             'Tottenham', 'Newcastle', 'Aston Villa', 'West Ham', 'Brighton'],
    'Wins': [20, 18, 16, 17, 15, 14, 13, 12, 11, 10],
    'Draws': [5, 7, 8, 6, 9, 8, 10, 7, 6, 9],
    'Losses': [3, 3, 4, 5, 6, 7, 5, 9, 11, 12]
}
df = pd.DataFrame(data)
# Calculate games played and probabilities
df['Games Played'] = df['Wins'] + df['Draws'] + df['Losses']
df['Win Probability'] = df['Wins'] / df['Games Played']
df['Draw Probability'] = df['Draws'] / df['Games Played']
df['Loss Probability'] = df['Losses'] / df['Games Played']
# Print all ten teams (df.head() would show only the first five)
print(df)
This gives us a clean table with each team’s probabilities for winning, drawing, or losing.
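A quick sanity check can be useful at this point. The sketch below rebuilds the same table from the figures above, confirms that the three probabilities add up to 1 for every team, and ranks the teams by win probability. The names `prob_cols`, `totals`, and `ranked` are my own for illustration.

```python
import pandas as pd

# Same illustrative figures as in Step 1
df = pd.DataFrame({
    'Team': ['Manchester City', 'Liverpool', 'Chelsea', 'Arsenal', 'Manchester United',
             'Tottenham', 'Newcastle', 'Aston Villa', 'West Ham', 'Brighton'],
    'Wins': [20, 18, 16, 17, 15, 14, 13, 12, 11, 10],
    'Draws': [5, 7, 8, 6, 9, 8, 10, 7, 6, 9],
    'Losses': [3, 3, 4, 5, 6, 7, 5, 9, 11, 12],
})
df['Games Played'] = df['Wins'] + df['Draws'] + df['Losses']
df['Win Probability'] = df['Wins'] / df['Games Played']
df['Draw Probability'] = df['Draws'] / df['Games Played']
df['Loss Probability'] = df['Losses'] / df['Games Played']

# Win, draw, and loss cover every possible outcome, so each row should sum to 1
prob_cols = ['Win Probability', 'Draw Probability', 'Loss Probability']
totals = df[prob_cols].sum(axis=1)
assert (totals - 1.0).abs().max() < 1e-9

# Rank by win probability and show games played alongside it
ranked = df.sort_values('Win Probability', ascending=False).reset_index(drop=True)
print(ranked[['Team', 'Games Played', 'Win Probability']].round(3).to_string(index=False))
```

In this table the teams have played between 28 and 31 games, which is why dividing by games played gives a fairer comparison than raw win counts.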
📈Step 2: Statistical Distribution of Wins
Next, I wanted to see how each team's win probability compares with the league-wide distribution. For this, I calculated the mean and standard deviation of the win probabilities, used them to define a normal distribution curve, and marked each team's value on that curve.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# Extract win probabilities
win_probs = df['Win Probability']
# Mean and standard deviation
mean, std_dev = win_probs.mean(), win_probs.std()
# Normal distribution
x = np.linspace(min(win_probs), max(win_probs), 100)
y = norm.pdf(x, mean, std_dev)
# Plot
plt.figure(figsize=(10,6))
plt.plot(x, y, color='blue', label='Normal Distribution')
plt.title("Distribution of Win Probabilities – Premier League 2024/25")
plt.xlabel("Win Probability")
plt.ylabel("Density")
# Mark each team's probability
for team, wp in zip(df['Team'], win_probs):
    # Dashed vertical line at the team's win probability, labelled with the team name
    plt.axvline(wp, linestyle='--', color='orange', alpha=0.7)
    plt.text(wp, 0.02, team, rotation=90, verticalalignment='bottom', fontsize=8)
plt.legend()
plt.show()
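One detail worth knowing about the code above: `win_probs.std()` is the pandas method, which computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`). The short sketch below compares the two on the same ten win probabilities. The variable names `sample_std` and `population_std` are mine, and which version suits this analysis is a judgment call rather than something the dataset settles.

```python
import numpy as np
import pandas as pd

# Wins and games played, taken from the Step 1 table
wins = pd.Series([20, 18, 16, 17, 15, 14, 13, 12, 11, 10])
games = pd.Series([28, 28, 28, 28, 30, 29, 28, 28, 28, 31])
win_probs = wins / games

sample_std = win_probs.std()                    # pandas default: ddof=1
population_std = np.std(win_probs.to_numpy())   # NumPy default: ddof=0

print(f"sample std (ddof=1):     {sample_std:.4f}")
print(f"population std (ddof=0): {population_std:.4f}")
print(f"ratio:                   {sample_std / population_std:.4f}")
```

With n values the ratio between the two is sqrt(n / (n - 1)), so with only ten teams the sample version comes out about 5% larger and the fitted curve is slightly wider.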
💡Step 3: Insights
Top teams (like Manchester City & Liverpool) are well above the mean win probability, clustering at the higher end of the curve.
Mid-table teams sit around the league average, showing balanced but less dominant results.
Lower-end teams fall well below the mean, reflecting how rarely they turn matches into wins.
Looking at the league this way shows not just who’s winning, but how consistently teams are doing so compared to their peers.
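To put numbers on "above" and "below" the mean, one option is a z-score: each team's win probability minus the league mean, divided by the standard deviation. This is a minimal sketch using the same figures as Step 1. The `z_scores` name and the use of the sample standard deviation are my assumptions, not part of the original analysis.

```python
import pandas as pd

teams = ['Manchester City', 'Liverpool', 'Chelsea', 'Arsenal', 'Manchester United',
         'Tottenham', 'Newcastle', 'Aston Villa', 'West Ham', 'Brighton']
wins = [20, 18, 16, 17, 15, 14, 13, 12, 11, 10]
games = [28, 28, 28, 28, 30, 29, 28, 28, 28, 31]

# Win probability per team, indexed by team name
win_probs = pd.Series(wins, index=teams) / pd.Series(games, index=teams)

# z-score: distance from the mean, measured in standard deviations
z_scores = (win_probs - win_probs.mean()) / win_probs.std()

print(z_scores.sort_values(ascending=False).round(2))
```

A positive score means a team wins more often than the average of these ten teams, and a negative score means less often. In this table Manchester City and Liverpool sit more than one standard deviation above the mean, while Brighton sits more than one below it.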
Conclusion
This project shows how Python can turn raw sports statistics into meaningful insights. By combining data preparation, probability calculations, and statistical visualization, I was able to map out a clearer picture of Premier League team performances in 2024/25.
What’s powerful about this approach is that it isn’t limited to football. The same workflow can be applied anywhere probabilities matter: sales performance, business forecasting, or even academic results.
Would love to hear your thoughts!