Lisandra Melo

Posted on Aug 13, 2020

Using Probability and Statistics to Predict Sports Results

#python #statistics #datascience #football

This article is an English translation of my article originally written in Brazilian Portuguese and posted here on my dev.to profile.

Initial Considerations

In this article we will use mathematical concepts such as Expected Value and Probability Distribution. If you don't know much about these concepts, you can still follow everything that is done in the article, but if you want to learn more, I recommend the Khan Academy website, especially the modules on Probability Distribution and Average - Expected Value; they are short, very explanatory videos about these concepts.

Introduction to the Project

In this project, we will use probability and statistics to predict the results of football matches, implementing the calculations in Python with the NumPy library.

The process is as follows: we will read a file containing the results of all AFC Ajax matches in the Dutch football league (Eredivisie) during the 18/19 season and, for each round, predict the score of the team's next match. This prediction consists of the Expected Value (EV) of the goals scored by the club and the EV of the goals conceded.

What Happens If We Guess Random Values?

As a baseline for later comparison, let's first look at what would happen if we tried to guess the results of the matches using random values.

We will consider that Ajax can score from 0 (the minimum number of goals scored in a match in our data) to 8 (the maximum recorded by Ajax in a match in our data) goals, a total of 9 possible values, and concede from 0 (the recorded minimum) to 6 (the recorded maximum) goals, a total of 7 possibilities. Getting a match score prediction right means getting both the number of goals scored and the number of goals conceded right, so if we pick each value uniformly at random within these intervals, independently of each other, we have:

$$P(\text{goalsScored}) = \frac{1}{9}, \qquad P(\text{goalsConceded}) = \frac{1}{7}$$

$$P(\text{matchScore}) = P(\text{goalsScored}) \cdot P(\text{goalsConceded}) = \frac{1}{9} \cdot \frac{1}{7} = \frac{1}{63} \approx 0.0159$$
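The product rule can be double-checked by brute force. This sketch uses Python's standard `fractions` and `itertools` modules; the "real" score `(4, 2)` is only a hypothetical example. It enumerates all 9 × 7 = 63 pairs a random guesser could pick and counts how many of them match the real score:

```python
from fractions import Fraction
from itertools import product

scored_values = range(0, 9)    # 0..8 goals scored: 9 options
conceded_values = range(0, 7)  # 0..6 goals conceded: 7 options

# Every (scored, conceded) pair a random guesser could pick
all_guesses = list(product(scored_values, conceded_values))

# Hypothetical real result; any in-range pair gives the same answer
real_score = (4, 2)

# Favorable outcomes / possible outcomes
p_match = Fraction(sum(guess == real_score for guess in all_guesses), len(all_guesses))

# Product of the two individual probabilities
p_scored = Fraction(1, len(scored_values))
p_conceded = Fraction(1, len(conceded_values))

print(len(all_guesses), p_match, p_scored * p_conceded)  # 63 1/63 1/63
print(f"{float(p_match):.4f}")                           # 0.0159
```

Counting favorable pairs and multiplying the two separate probabilities give the same 1/63, which is what the independence of the two guesses guarantees.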

The Dutch league has a total of 34 matches, and we will not make a prediction for the first round, since there is no previous data to base it on. That leaves 33 predictions. Multiplying 33 by the probability of a correct match score gives the expected number of correct scores: 33 × 1/63 ≈ 0.5238. In other words, guessing at random, we expect to get the exact score right in less than one of the 33 matches analyzed. For the number of goals scored in a match, the expected number of correct guesses is 33 × 1/9 ≈ 3.6667, and for goals conceded it is 33 × 1/7 ≈ 4.7143.
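The same baseline numbers can be reproduced in a few lines of Python. This sketch computes the expected counts directly. It also computes the probability of getting at least one exact score right in 33 rounds, which is a different quantity from the expected count. Finally, it runs a small Monte Carlo simulation of a random guesser as a sanity check. The simulated "real" results are random placeholders, not the Ajax data; that is enough here because a uniform random guess hits any in-range result with the same probability.

```python
import random

ROUNDS = 33           # rounds we predict (round 1 has no history)
SCORED_OPTIONS = 9    # 0..8 goals scored
CONCEDED_OPTIONS = 7  # 0..6 goals conceded

# Linearity of expectation: expected hits = rounds x probability of a hit per round
p_match = (1 / SCORED_OPTIONS) * (1 / CONCEDED_OPTIONS)
ev_match = ROUNDS * p_match
ev_scored = ROUNDS * (1 / SCORED_OPTIONS)
ev_conceded = ROUNDS * (1 / CONCEDED_OPTIONS)
print(f"{ev_match:.4f} {ev_scored:.4f} {ev_conceded:.4f}")  # 0.5238 3.6667 4.7143

# Probability of at least one exact score in 33 independent random guesses
p_at_least_one = 1 - (1 - p_match) ** ROUNDS
print(f"{p_at_least_one:.2f}")  # 0.41


def simulate_random_guesser(seasons, seed=0):
    """Average number of exactly-right match scores per season of random guessing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(seasons):
        for _ in range(ROUNDS):
            # Placeholder real result: its value does not affect the hit rate
            real = (rng.randrange(SCORED_OPTIONS), rng.randrange(CONCEDED_OPTIONS))
            guess = (rng.randrange(SCORED_OPTIONS), rng.randrange(CONCEDED_OPTIONS))
            hits += guess == real
    return hits / seasons


simulated = simulate_random_guesser(5000)
print(round(simulated, 2))  # should land near 0.52
```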

So let's try to improve these values (which are very low) using math and programming.

Project Implementation

To build our project, we first create our scores file. Each line of this file holds the result of one match in the following format:

goalsscored,goalsconceded

For example, if Ajax scored 4 goals and conceded 2 in a match, the file will contain the line:

4,2

This file will be named resultados.txt, and it is available in the project repository.

Now we can start the coding part of our project! We begin by importing the necessary library.

import numpy as np

Then we will open our scores file.

# Opening the file with our scores
fileResults = open("resultados.txt", "r")

After opening the file, we insert its contents into a list called matchesScores using a list comprehension, a concise Python syntax for building a new list from an iterable. With it, we can loop over the values and fill the list within a single line of code.
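To make the comprehension concrete, here is what it does with a single line of the file, using the `4,2` example from the file format description:

```python
line = "4,2\n"  # one line of resultados.txt, including its newline

# split(",") gives the two fields as strings
print(line.split(","))                     # ['4', '2\n']

# int() ignores surrounding whitespace, so the trailing newline is harmless
print([int(x) for x in line.split(",")])   # [4, 2]
```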

After the loop finishes, we close the file (resultados.txt) that was opened at the beginning of our code.

# Declaring our score list
matchesScores = []

# The for loop processes one line of the file in each iteration
for lineofFile in fileResults:
    # Skip blank lines (e.g. an empty line at the end of the file),
    # which would otherwise make int('') raise a ValueError
    if not lineofFile.strip():
        continue
    # The list comprehension inside the brackets does the same work as:
    #     score = []
    #     for x in lineofFile.split(","):
    #         score.append(int(x))
    #     matchesScores.append(score)
    matchesScores.append([int(x) for x in lineofFile.split(",")])

# Then we close our file
fileResults.close()
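As an alternative to the manual loop, NumPy can read this kind of comma-separated file directly with `np.loadtxt`. This is not the approach used in this article. The sketch below uses an in-memory `io.StringIO` with made-up sample lines in place of resultados.txt, so that it runs on its own. With the real file you would pass the filename instead, e.g. `np.loadtxt("resultados.txt", delimiter=",", dtype=int, ndmin=2)`.

```python
import io

import numpy as np

# Made-up contents in the resultados.txt format: goalsscored,goalsconceded
sample_file = io.StringIO("4,2\n1,0\n3,1\n")

# Parse the comma-separated integers into a 2-D array in one call;
# ndmin=2 keeps the result 2-D even if the file has a single line
scores = np.loadtxt(sample_file, delimiter=",", dtype=int, ndmin=2)

print(scores.shape)   # (3, 2)
print(scores[:, 0])   # goals scored per round: [4 1 3]
print(scores[:, 1])   # goals conceded per round: [2 0 1]

# Convert back to the list-of-lists shape used by matchesScores
matchesScores = scores.tolist()
print(matchesScores)  # [[4, 2], [1, 0], [3, 1]]
```

Column slicing such as `scores[:, 0]` would then replace building the separate goals_scored and goals_conceded lists by hand.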

Now we will start analyzing the data obtained. But first, we will initialize some variables that will store our formatted data.

# We declare two lists, one for the goals scored and one for the goals conceded
goals_scored = []
goals_conceded = []

# Counters for how many times we predicted the goals scored, the goals conceded, and the full score correctly
right_round = 0
right_goals_scored = 0
right_goals_conceded = 0

We will then iterate through the entire matchesScores list, separating its values into goals scored and goals conceded and then calculating the expected value of each of these categories to predict the score of the next round.

To do this, we obtain the frequency of each number of goals, that is, how many times the team has scored 0 goals, 1 goal, 2 goals, and so on, and we do the same for the goals conceded. These frequencies give us the data we need to calculate the expected value.

For example, the frequencies could look like the ones in the graph below (this is not the actual frequency of the data).

[Figure: Example of how the frequency could look]
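To see what `np.unique` returns before the loop uses it, here is a small example on hypothetical data (six made-up rounds, not the real Ajax results). It builds the same `{goals: frequency}` dictionary as the loop and prints a crude text version of the frequency chart:

```python
import numpy as np

# Hypothetical goals scored in the first six rounds (not the real Ajax data)
goals_scored = [1, 3, 1, 0, 3, 3]

# Sorted distinct values and how often each one occurs
num_goals, freq_num_goals = np.unique(goals_scored, return_counts=True)
print(num_goals)       # [0 1 3]
print(freq_num_goals)  # [1 2 3]

# {goals: frequency}; tolist() converts to plain Python ints so the dict prints cleanly
dic_goals_scored = dict(zip(num_goals.tolist(), freq_num_goals.tolist()))
print(dic_goals_scored)  # {0: 1, 1: 2, 3: 3}

# A crude text "bar chart" of the frequencies
for goals, freq in dic_goals_scored.items():
    print(f"{goals} goals | {'#' * freq}")
```

Note that numbers of goals that never occurred (2 in this sample) simply do not appear in the result, which is fine for the expected value because they would contribute zero anyway.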

To collect the goals scored and conceded and count their frequencies, we write:

"""
We will go through our list of scores per round
and calculate the expected value of goals scored
and conceded for each round,
we will predict with these values and
then we will check if these values correspond
to the result that happened in the match.
"""
for round in range(len(matchesScores)):
    goals_scored.append(matchesScores[round][0])
    goals_conceded.append(matchesScores[round][1])

    # Now we will get the frequency of the number of goals scored so far
    num_goals, freq_num_goals = np.unique(goals_scored, return_counts=True)
    # For organizational reasons, we will transform our values into a dictionary 'goals': frequency
    dic_goals_scored = dict(zip(num_goals, freq_num_goals))

    # We will do the same with the goals conceded
    num_goals, freq_num_goals = np.unique(goals_conceded, return_counts=True)
    # For organizational reasons, we will transform our values into a dictionary 'goals': frequency
    dic_goals_conceded = dict(zip(num_goals, freq_num_goals))

After that, we calculate the expected value of the goals, that is, the average number of goals we expect in the next match based on the previous rounds. To calculate it, we multiply each number of goals in the dictionary by its probability of occurrence (its frequency divided by the number of rounds so far) and add up the products, which gives us our expected values.
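In symbols (notation introduced here for illustration): write $n$ for the number of rounds played so far, $x_i$ for the goals in round $i$, and $n_k$ for the number of rounds with exactly $k$ goals. Grouping the rounds by their number of goals shows that this expected value is simply the average of the goals so far:

$$E[X] = \sum_{k} k \cdot \frac{n_k}{n} = \frac{1}{n} \sum_{k} k \cdot n_k = \frac{1}{n} \sum_{i=1}^{n} x_i$$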

    expected_scored=0
    for goal in dic_goals_scored.keys():
        expected_scored += goal*(dic_goals_scored[goal]/len(goals_scored))

    expected_conceded=0 
    for goal in dic_goals_conceded:
        expected_conceded += goal*(dic_goals_conceded[goal]/len(goals_conceded))
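A quick numerical check of that identity, on hypothetical data (not the real Ajax scores): the frequency-weighted sum, computed the same way as in the loop above, matches `np.mean` of the same list.

```python
import numpy as np

# Hypothetical goals scored so far (not the real Ajax data)
goals_scored = [1, 3, 1, 0, 3, 3]

num_goals, freq_num_goals = np.unique(goals_scored, return_counts=True)
dic_goals_scored = dict(zip(num_goals, freq_num_goals))

# Frequency-weighted expected value, computed as in the article's loop
expected_scored = 0
for goal in dic_goals_scored:
    expected_scored += goal * (dic_goals_scored[goal] / len(goals_scored))

# The same number computed as a plain average
mean_scored = np.mean(goals_scored)

print(round(float(expected_scored), 4), round(float(mean_scored), 4))  # 1.8333 1.8333
```

So `np.mean(goals_scored)` would be a shorter way to get the same prediction, up to floating-point rounding.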

After calculating our expected values, we round them to whole numbers of goals, print our prediction, and compare it with the result of the next round to check whether we got the match score, the number of goals scored, and the number of goals conceded right.
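Because a predicted score must be a whole number of goals, the loop rounds each expected value with `np.around`. One detail worth knowing: NumPy rounds exact halves to the nearest even integer, so an average of exactly 2.5 goals becomes a prediction of 2, while 3.5 becomes 4. This sketch illustrates the rule. The half-up variant at the end is only an option to be aware of; the article does not use or evaluate it.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# np.around rounds exact halves to the nearest EVEN integer
print(np.around(halves))  # [0. 2. 2. 4.]

# Python's built-in round() follows the same "round half to even" rule
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])  # [0, 2, 2, 4]

# One way to always round halves up, valid for non-negative values
print(np.floor(halves + 0.5))  # [1. 2. 3. 4.]
```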

    # After calculating our prediction we will print it and compare it to the real result

    # The next lines round our values to the closest integer
    # (np.around rounds exact halves to the nearest even number, e.g. 2.5 -> 2)
    expected_scored = int(np.around(expected_scored))
    expected_conceded = int(np.around(expected_conceded))

    """
    If we are in the last round we have no future round
    to predict so we will stop our iteration
    """
    if (round+1 == len(matchesScores)):
        break
    """
    Now we will print our expected value for the next round
     as lists start at number 0 we have to add
     1 to the round value to get the round currently being read,
     that is, we have to add 2 to the number of the `round`
     to get the value of the NEXT round.
    """
    print(f'At the {round+2} round we predicted a result of Ajax  {expected_scored} x {expected_conceded} opponent')
    print(f'At the {round+2} we got a result of Ajax  {matchesScores[round+1][0]} x {matchesScores[round+1][1]} opponent')

    # We will check the results
    if(expected_scored==matchesScores[round+1][0] and expected_conceded==matchesScores[round+1][1]):
        right_round += 1
    if(expected_scored==matchesScores[round+1][0]):
        right_goals_scored += 1
    if(expected_conceded==matchesScores[round+1][1]):
        right_goals_conceded += 1

After the loop execution, we will check our number of right guesses.

# We Will print the results
print("We got {0:1d} of the matches results right, this is, {1:2.2f}%".format(right_round, (right_round/33)*100))

print("We got {0:1d} of the goals scored in a match right, this is, {1:2.2f}%".format(right_goals_scored, (right_goals_scored/33)*100))

print("We got {0:1d} of the goals conceded in a match right, this is, {1:2.2f}%".format(right_goals_conceded, (right_goals_conceded/33)*100))

The output of our program will look like this

> At the 2 round we predicted a result of Ajax  1 x 1 opponent
> At the 2 we got a result of Ajax  1 x 0 opponent
...
> At the 34 round we predicted a result of Ajax  3 x 1 opponent
> At the 34 we got a result of Ajax  4 x 1 opponent
> We got 4 of the matches results right, this is, 12.12%
> We got 7 of the goals scored in a match right, this is, 21.21%
> We got 15 of the goals conceded in a match right, this is, 45.45%

Note that we got 4 complete match scores right, about 7.6 times the 0.52 expected from random guessing; 7 predictions of goals scored, about 1.9 times the 3.67 expected at random; and 15 predictions of goals conceded, about 3.2 times the 4.71 expected at random.
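These ratios can be recomputed from the program's output and the random-guessing baseline. This small sketch simply hard-codes the counts printed by the program (4, 7, and 15 correct out of 33 predictions) together with the 1/9 and 1/7 probabilities from the baseline section:

```python
# Counts reported by the program output
right_round, right_goals_scored, right_goals_conceded = 4, 7, 15
predictions = 33

# Expected correct guesses when picking uniformly at random
random_match = predictions * (1 / 9) * (1 / 7)
random_scored = predictions * (1 / 9)
random_conceded = predictions * (1 / 7)

for label, hits, baseline in [
    ("match score", right_round, random_match),
    ("goals scored", right_goals_scored, random_scored),
    ("goals conceded", right_goals_conceded, random_conceded),
]:
    print(f"{label}: {hits} vs {baseline:.2f} expected at random -> {hits / baseline:.1f}x")
```

This prints 7.6x, 1.9x, and 3.2x for the three categories.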

Using expected values noticeably improved our number of correct guesses, which shows how powerful simple concepts of probability and statistics can be in data analysis.
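As a compact recap, the whole procedure can be condensed into a single function. This is a sketch, not the author's published program. The function name `evaluate_ev_predictions` and the sample scores are hypothetical, and the sample is not the 18/19 Ajax data. It computes the frequency-weighted expected value with `np.unique` in the same way as the article's loop, rounds it with `np.around`, and compares it with the next round:

```python
import numpy as np


def evaluate_ev_predictions(scores):
    """Predict each round from the expected values of all previous rounds.

    scores: list of (goals_scored, goals_conceded) pairs, one per round, in order.
    Returns (right_match, right_scored, right_conceded, num_predictions).
    """
    right_match = right_scored = right_conceded = 0
    for i in range(1, len(scores)):
        history = np.array(scores[:i])  # all rounds before round i
        predictions = []
        for column in (0, 1):  # 0 = goals scored, 1 = goals conceded
            values, counts = np.unique(history[:, column], return_counts=True)
            # Expected value: each number of goals weighted by its relative frequency
            ev = 0
            for goal, count in zip(values, counts):
                ev += goal * (count / len(history))
            predictions.append(int(np.around(ev)))
        real_scored, real_conceded = scores[i]
        hit_scored = predictions[0] == real_scored
        hit_conceded = predictions[1] == real_conceded
        right_scored += hit_scored
        right_conceded += hit_conceded
        right_match += hit_scored and hit_conceded
    return right_match, right_scored, right_conceded, len(scores) - 1


# Hypothetical five-round season (not real data)
sample = [(2, 0), (2, 1), (3, 1), (1, 1), (2, 1)]
print(evaluate_ev_predictions(sample))  # (1, 2, 2, 4)
```

Passing the parsed matchesScores list instead of `sample` would apply the same procedure to the real season.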

The program developed in this article is available in my GitLab repository. I hope this article has helped you; if you have any problems or questions, feel free to leave a comment on this post or send me an email ;).
