Lisandra Melo

Using Probability and Statistics to Predict Sportive Results

This article is an English translation of my article, which was written in Brazilian Portuguese and posted here on my dev.to profile.

Initial Considerations

In this article we will use mathematical concepts such as Expected Value and Probability Distribution. If you don't know much about these concepts, you should still be able to follow everything done in the article. If you want to learn more, I recommend the Khan Academy website, especially the modules on Probability Distribution and Average - Expected Value; they are short, clear videos about these concepts.

Introduction to the Project

In this project, we will use probability and statistics to predict the results of football matches. For this, we will use Python and its NumPy library, along with concepts of probability and statistics.

The process is as follows: we will read a file containing the results of all AFC Ajax matches in the Dutch football league (Eredivisie) during the 18/19 season and, for each round, predict the score of the team's next match. Each prediction consists of the Expected Value (EV) of goals scored by the club and the EV of goals conceded.

What Happens If We Guess Random Values?

For future reference, we will look at what would happen if we tried to guess the results of the matches using random values.

We will consider that Ajax can score from 0 (the minimum number of goals scored in a match in our data) to 8 (the maximum number recorded by Ajax in a match in our data) goals, a total of 9 possible results, and allow from 0 (recorded minimum) to 6 (recorded maximum) goals, a total of 7 possibilities. Getting a match score prediction right means getting both the number of goals scored and the number of goals conceded right, so if we choose both values at random, uniformly and independently within these intervals, we will have:

$$
P(goalsScored) = \frac{1}{9} \newline
P(goalsAllowed) = \frac{1}{7} \newline
P(matchScore) = P(goalsScored) \cdot P(goalsAllowed) = \frac{1}{9} \cdot \frac{1}{7} = \frac{1}{63} \approx 0.0159
$$

The Dutch league season has a total of 34 rounds. We will not make a prediction for the first round, as we have no previous data to base it on. That leaves 33 matches to predict, so we multiply 33 by the probability of a right match score, which gives an expected value of around 0.5238 right scores. This means that without mathematical tools, using random values, we expect to get the right score in less than one of the 33 matches analyzed. For the number of goals scored in a match, we have an expected value of 3.6667 (33 * 1/9) right results, and for goals conceded 4.7143 (33 * 1/7).
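The baseline arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are my own, and it assumes the goal ranges stated in the text (9 possible values for goals scored, 7 for goals conceded) with independent, uniform random guesses.

```python
from fractions import Fraction

# Assumed ranges from the text: 0-8 goals scored (9 values), 0-6 conceded (7 values)
p_scored = Fraction(1, 9)
p_conceded = Fraction(1, 7)
# Both guesses are drawn independently, so the probabilities multiply
p_match = p_scored * p_conceded

n_predictions = 33  # rounds 2 to 34; the first round has no prior data

# Expected number of right guesses = number of tries * probability of success
expected_right_matches = float(n_predictions * p_match)
expected_right_scored = float(n_predictions * p_scored)
expected_right_conceded = float(n_predictions * p_conceded)

print(f"P(matchScore) = {float(p_match):.4f}")
print(f"Expected right match scores: {expected_right_matches:.4f}")
print(f"Expected right goals scored: {expected_right_scored:.4f}")
print(f"Expected right goals conceded: {expected_right_conceded:.4f}")
```

Using `Fraction` keeps the probabilities exact until the final conversion to a decimal.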

So let's try to improve these values (which are very low) using math and programming.

Project Implementation

To create our project, we will first create our scores file. Each line of this file has a specific format:

goalsscored,goalsconceded

For example, if Ajax scored 4 goals and conceded 2 in a match we will have in the file:

4,2

This file will be named resultados.txt, and it is available in the project repository.

Now we are going to start the coding part of our project! We will begin by importing the necessary library.

import numpy as np

Then we will open our scores file.

# Opening the file with our scores
fileResults = open("resultados.txt", "r")

After opening the file, we will insert its contents into a list called matchesScores using a list comprehension, which is a concise way of building lists in Python. With this tool, we can iterate over a sequence and fill a list within a single line of code.

At the end of the iteration, we will close the file (resultados.txt) that was opened at the beginning of our code.

# Declaring our score list
matchesScores = []

# The for loop processes one line of the file per iteration
for lineofFile in fileResults:
    # The next line appends the contents of one file line to our list.
    # Inside the brackets we have a list comprehension, which does
    # the same work as the following code:
    #     scores = []
    #     for x in lineofFile.split(","):
    #         scores.append(int(x))
    #     matchesScores.append(scores)
    matchesScores.append([int(x) for x in lineofFile.split(",")])

# Then we will close our file
fileResults.close()

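As an alternative to opening and closing the file by hand, the same reading step could be sketched with a `with` block, which closes the file automatically even if a line fails to parse. The helper name `read_scores` and the sample file are hypothetical and not part of the original project; the sample data is made up and is not the real Ajax results.

```python
import os
import tempfile


def read_scores(path):
    """Return a list of [goals_scored, goals_conceded] pairs read from path."""
    scores = []
    with open(path, "r") as results_file:
        for line in results_file:
            line = line.strip()
            if not line:
                # Skip blank lines; int("") would raise a ValueError
                continue
            scores.append([int(x) for x in line.split(",")])
    return scores


# Demo with made-up sample data written to a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_results.txt")
    with open(sample_path, "w") as sample_file:
        sample_file.write("4,2\n1,0\n\n3,3\n")
    sample_scores = read_scores(sample_path)

print(sample_scores)  # [[4, 2], [1, 0], [3, 3]]
```

Skipping blank lines is a design choice of this sketch: a stray empty line at the end of the file would otherwise stop the program with an error.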

Now we will start analyzing the data obtained. But first, we will initialize some variables that will store our formatted data.

# We will declare two lists, one containing the goals scored and one with the goals conceded
goals_scored = []
goals_conceded = []

# We will count how many times we got the goals scored, the goals conceded, and both of them right
right_round = 0
right_goals_scored = 0
right_goals_conceded = 0

We will then iterate through the entire matchesScores list, separating the values it contains into goals scored and goals conceded, and then calculating the expected value of each of these categories to obtain a score prediction for the next round.

To do this, we will obtain the frequency of each number of goals, that is, how many times the team has scored 0 goals, 1 goal, 2 goals, and so on. We will do the same with the goals conceded. With the frequency of each number of goals, we will have the data to calculate our expected value.

For example, we could have a frequency like the one shown in the graph below (this is not the actual frequency of the data).

Example of what the frequency could look like

To define the goals scored and conceded we will code:

"""
We will go through our list of scores round by round.
For each round we calculate the expected value of goals
scored and conceded so far, use these values as the
prediction for the next round, and then check whether
the prediction matches the result that actually
happened in that match.
"""
for round in range(len(matchesScores)):
    goals_scored.append(matchesScores[round][0])
    goals_conceded.append(matchesScores[round][1])

    # Now we will get the frequency of each number of goals scored so far
    num_goals, freq_num_goals = np.unique(goals_scored, return_counts=True)
    # For organizational reasons, we will store the values in a dictionary {goals: frequency}
    dic_goals_scored = dict(zip(num_goals, freq_num_goals))

    # We will do the same with the goals conceded
    num_goals, freq_num_goals = np.unique(goals_conceded, return_counts=True)
    # Again, we store the values in a dictionary {goals: frequency}
    dic_goals_conceded = dict(zip(num_goals, freq_num_goals))
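To make the frequency step concrete, here is a small standalone sketch of what `np.unique` with `return_counts=True` produces. The list `sample_goals` is made-up data, not the real Ajax results.

```python
import numpy as np

# Made-up goals scored over six matches (not the real data)
sample_goals = [1, 3, 1, 0, 3, 1]

# np.unique returns the sorted distinct values and how often each one occurs
values, counts = np.unique(sample_goals, return_counts=True)

# Convert NumPy integers to plain ints so the dictionary prints cleanly
frequency = {int(v): int(c) for v, c in zip(values, counts)}

print(frequency)  # {0: 1, 1: 3, 3: 2}
```

Here the team scored 0 goals once, 1 goal three times, and 3 goals twice. Values that never occurred (such as 2) simply do not appear in the dictionary.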

After that, we will calculate the expected value of the goals, that is, the value expected in the next match given the results of the previous rounds. To calculate it, we multiply each key in the dictionary (a number of goals) by its probability of occurrence (its frequency divided by the number of rounds played so far) and add up the products. We do this once for goals scored and once for goals conceded.

    expected_scored=0
    for goal in dic_goals_scored.keys():
        expected_scored += goal*(dic_goals_scored[goal]/len(goals_scored))

    expected_conceded=0 
    for goal in dic_goals_conceded:
        expected_conceded += goal*(dic_goals_conceded[goal]/len(goals_conceded))
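Because the probabilities here are just observed frequencies divided by the number of rounds, the expected value computed this way is the same as the arithmetic mean of the goals observed so far. The sketch below checks this on made-up data (`sample_goals` is not the real Ajax data).

```python
import numpy as np

# Made-up goals scored over six matches (not the real data)
sample_goals = [1, 3, 1, 0, 3, 1]

values, counts = np.unique(sample_goals, return_counts=True)

# Empirical probability of each goal count: frequency / number of matches
probabilities = counts / counts.sum()

# Expected value: sum of (value * probability)
expected_value = float(np.sum(values * probabilities))

# The plain average of the observed goals, for comparison
mean_value = float(np.mean(sample_goals))

print(expected_value, mean_value)
```

Both numbers agree (up to floating-point rounding), so under this assumption `np.mean` on the list of goals would be a shorter route to the same prediction. The frequency-based version keeps the link to the definition of expected value explicit.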

After calculating our expected values, we will print our prediction and compare it with the result of the next round to see whether we predicted the full match score, the number of goals scored, and the number of goals conceded correctly.

    # After calculating our prediction we will print it and compare to the real result

    # The next line will round our values to the closest integer
    expected_scored = int(np.around(expected_scored))
    expected_conceded = int(np.around(expected_conceded))

    """
    If we are in the last round we have no future round
    to predict so we will stop our iteration
    """
    if (round+1 == len(matchesScores)):
        break
    """
    Now we will print our expected value for the next round
     as lists start at number 0 we have to add
     1 to the round value to get the round currently being read,
     that is, we have to add 2 to the number of the `round`
     to get the value of the NEXT round.
    """
    print(f'At the {round+2} round we predicted a result of Ajax  {expected_scored} x {expected_conceded} opponent')
    print(f'At the {round+2} we got a result of Ajax  {matchesScores[round+1][0]} x {matchesScores[round+1][1]} opponent')

    # We will check the results
    if(expected_scored==matchesScores[round+1][0] and expected_conceded==matchesScores[round+1][1]):
        right_round += 1
    if(expected_scored==matchesScores[round+1][0]):
        right_goals_scored += 1
    if(expected_conceded==matchesScores[round+1][1]):
        right_goals_conceded += 1
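One detail of the rounding step is worth knowing: `np.around` rounds values that are exactly halfway between two integers to the nearest even integer, not always upward. The short sketch below illustrates this with a few hand-picked values.

```python
import numpy as np

# Hand-picked values, including exact halves
test_values = [0.5, 1.5, 2.5, 2.4, 2.6]

# Same rounding expression as in the prediction code
rounded = [int(np.around(value)) for value in test_values]

for value, result in zip(test_values, rounded):
    print(f"{value} -> {result}")
# 0.5 -> 0
# 1.5 -> 2
# 2.5 -> 2
# 2.4 -> 2
# 2.6 -> 3
```

So an expected value of exactly 2.5 goals would be predicted as 2, not 3. This only matters for exact ties, which are rare once many rounds have been played.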

After the loop execution, we will check our number of right guesses.

# We will print the results
# Number of predictions made: every round except the first (33 for this season)
num_predictions = len(matchesScores) - 1

print("We got {0:1d} of the matches results right, this is, {1:2.2f}%".format(right_round, (right_round/num_predictions)*100))

print("We got {0:1d} of the goals scored in a match right, this is, {1:2.2f}%".format(right_goals_scored, (right_goals_scored/num_predictions)*100))

print("We got {0:1d} of the goals conceded in a match right, this is, {1:2.2f}%".format(right_goals_conceded, (right_goals_conceded/num_predictions)*100))


The output of our program will look like this:

> At the 2 round we predicted a result of Ajax  1 x 1 opponent
> At the 2 we got a result of Ajax  1 x 0 opponent
...
> At the 34 round we predicted a result of Ajax  3 x 1 opponent
> At the 34 we got a result of Ajax  4 x 1 opponent
> We got 4 of the matches results right, this is, 12.12%
> We got 7 of the goals scored in a match right, this is, 21.21%
> We got 15 of the goals conceded in a match right, this is, 45.45%

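Note that we predicted 4 complete match scores correctly, about 8 times the random-guess expectation of 0.5238. We also predicted the number of goals scored correctly 7 times, about 2 times the random expectation of 3.6667, and the number of goals conceded correctly 15 times, about 3 times the random expectation of 4.7143.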
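The comparison with random guessing can be reproduced with a short calculation. This sketch takes the counts from the program output above (4, 7 and 15) and the random-guess probabilities from the earlier section; the dictionary names are my own.

```python
n_predictions = 33  # rounds 2 to 34

# Right guesses reported by the program output
observed = {"match score": 4, "goals scored": 7, "goals conceded": 15}

# Expected right guesses with uniform random values (1/63, 1/9 and 1/7)
baseline = {
    "match score": n_predictions / 63,
    "goals scored": n_predictions / 9,
    "goals conceded": n_predictions / 7,
}

# How many times better than the random baseline each result is
ratios = {name: observed[name] / baseline[name] for name in observed}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times the random baseline")
# match score: 7.64 times the random baseline
# goals scored: 1.91 times the random baseline
# goals conceded: 3.18 times the random baseline
```

The rounded figures of 8, 2 and 3 times quoted in the text correspond to these ratios.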

The use of expected values helped a lot to improve our number of correct guesses. This shows how powerful simple concepts of probability and statistics can be in data analysis.

The program developed in this article is available in my GitLab repository. I hope this article has helped you in some way. If you have any problems or questions, feel free to leave a comment on this post or send me an email ;).
