In the previous examples, we calculated the Poisson probability for a single data point. Here, I am going to calculate the probabilities of soccer teams scoring a particular number of goals in a match. These probabilities are used to derive odds from historical results. Note that this is a simplified approach: it does not account for other important factors, e.g. injured players, players' positions in a particular match, new players and coaches, etc. I used English Premier League data for this analysis; the link to the data source is here.
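As a reminder of the building block: if a team is expected to score λ goals in a match, the Poisson probability that it scores exactly k goals is λ^k e^(−λ) / k!. A minimal Python sketch; the expected-goals value of 1.5 is a made-up example, not a figure from the EPL data:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected number of goals is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


# Hypothetical example: a team expected to score 1.5 goals in a match.
expected_goals = 1.5
for goals in range(6):
    print(goals, round(poisson_pmf(goals, expected_goals), 3))
```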
Game outcome probabilities
I used data from the 2017-18 season to calculate each team's attack and defense strength. In that season, 20 teams played 380 games in total. Because three teams were new to the league in 2018-19 and therefore have no 2017-18 Premier League data, I show results for only 17 teams. You can find the Python script here.
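The post does not spell out the formulas for attack and defense strength, so the sketch below uses one common convention, which is an assumption: a team's attack strength is its average goals scored divided by the league average, its defense strength is its average goals conceded divided by the league average (each computed separately for home and away games), and a match's expected goals are the product of the home team's attack, the away team's defense, and the league average. The match list and team names (A, B, C) are hypothetical placeholders for the 2017-18 results, and `team_strengths` / `expected_goals` are illustrative names, not the author's script.

```python
from collections import defaultdict

# Hypothetical results: (home_team, away_team, home_goals, away_goals).
# In the real analysis this would be the full 2017-18 season (380 matches).
matches = [
    ("A", "B", 2, 1),
    ("B", "C", 0, 0),
    ("C", "A", 1, 3),
    ("B", "A", 1, 1),
    ("C", "B", 2, 0),
    ("A", "C", 0, 1),
]


def team_strengths(matches):
    """Attack/defense strengths relative to league averages (assumed convention).

    Assumes every team played at least one home and one away game.
    """
    n = len(matches)
    avg_home = sum(m[2] for m in matches) / n  # league average goals per match by home teams
    avg_away = sum(m[3] for m in matches) / n  # league average goals per match by away teams

    # Per team: [goals scored, goals conceded, games played]
    home = defaultdict(lambda: [0, 0, 0])
    away = defaultdict(lambda: [0, 0, 0])
    for h, a, hg, ag in matches:
        home[h][0] += hg
        home[h][1] += ag
        home[h][2] += 1
        away[a][0] += ag
        away[a][1] += hg
        away[a][2] += 1

    strengths = {}
    for team in set(home) | set(away):
        h_scored, h_conceded, h_games = home[team]
        a_scored, a_conceded, a_games = away[team]
        strengths[team] = {
            "home_attack": h_scored / h_games / avg_home,
            "home_defense": h_conceded / h_games / avg_away,
            "away_attack": a_scored / a_games / avg_away,
            "away_defense": a_conceded / a_games / avg_home,
        }
    return strengths, avg_home, avg_away


def expected_goals(strengths, avg_home, avg_away, home_team, away_team):
    """Expected goals (Poisson means) for the home and away team in one match."""
    h, a = strengths[home_team], strengths[away_team]
    lam_home = h["home_attack"] * a["away_defense"] * avg_home
    lam_away = a["away_attack"] * h["home_defense"] * avg_away
    return lam_home, lam_away


strengths, avg_home, avg_away = team_strengths(matches)
print(expected_goals(strengths, avg_home, avg_away, "B", "A"))
```

A strength above 1 means the team scores (or concedes) more than the league average, so multiplying strengths scales the league average up or down for a specific pairing.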
Let's apply the 2017-18 historical data to calculate the outcome probabilities of the first matches of the 2018-19 season. In particular, we convert the probabilities of each team scoring n goals into the probabilities of a home win, an away win, and a draw.
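The conversion from goal probabilities to match outcomes can be sketched as follows, assuming the home and away goal counts are independent Poisson variables: sum the joint probability P(home = i) · P(away = j) over all scorelines with i > j (home win), i = j (draw), and i < j (away win). The cut-off `max_goals` is an assumption that truncates the infinite sum, and the expected-goals values in the usage line are illustrative only.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=10):
    """(home win, draw, away win) probabilities, assuming independent Poisson goal counts."""
    home_win = draw = away_win = 0.0
    for i in range(max_goals + 1):        # goals scored by the home team
        p_i = poisson_pmf(i, lam_home)
        for j in range(max_goals + 1):    # goals scored by the away team
            p = p_i * poisson_pmf(j, lam_away)
            if i > j:
                home_win += p
            elif i == j:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Illustrative expected-goals values, not taken from the 2017-18 data.
home_win, draw, away_win = outcome_probabilities(1.6, 1.1)
print(f"home {home_win:.3f}, draw {draw:.3f}, away {away_win:.3f}")
```

Because the scoreline grid is truncated at `max_goals`, the three probabilities sum to slightly less than 1; with typical expected-goals values the missing mass is negligible.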
In this example, the actual results of 11 out of 14 games were consistent with the calculated probabilities. As mentioned at the beginning, these probabilities do not account for the specifics of individual games. Also, I skipped a test of whether the Poisson distribution is a good approximation for the goal counts.
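The skipped goodness-of-fit check could look like the sketch below (an illustration, not the author's script). It compares the observed distribution of goals per match with a Poisson distribution that has the same mean, and reports the dispersion index (variance divided by mean, close to 1 for Poisson data) and Pearson's chi-square statistic. The function name `poisson_fit_summary`, the sample goal list, and the pooling of counts of `max_bin` or more goals into one bin are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_fit_summary(goals, max_bin=5):
    """Compare observed goal counts with a Poisson fit of the same mean.

    Returns (lam, dispersion, chi2): the fitted mean, the dispersion index
    (population variance / mean), and Pearson's chi-square statistic with all
    counts >= max_bin pooled into one bin. Assumes the mean is > 0.
    """
    n = len(goals)
    lam = mean(goals)
    dispersion = pvariance(goals) / lam
    counts = Counter(min(g, max_bin) for g in goals)
    chi2 = 0.0
    for k in range(max_bin + 1):
        if k < max_bin:
            p = poisson_pmf(k, lam)
        else:  # tail bin: P(X >= max_bin)
            p = 1.0 - sum(poisson_pmf(i, lam) for i in range(max_bin))
        expected = n * p
        chi2 += (counts.get(k, 0) - expected) ** 2 / expected
    return lam, dispersion, chi2


# Hypothetical goals per match for one team.
home_goals = [2, 1, 0, 3, 1, 1, 2, 0, 1, 4]
print(poisson_fit_summary(home_goals, max_bin=3))
```

With `max_bin + 1` bins and one estimated parameter (λ), the statistic would be compared against a chi-square distribution with `max_bin - 1` degrees of freedom, for example via `scipy.stats.chi2.sf(chi2, max_bin - 1)`. The chi-square approximation is only reliable when each bin's expected count is reasonably large (a common rule of thumb is at least 5), so for a single team's 19 home games a lower `max_bin` may be needed.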
As part of feature engineering, I would like to check whether this approach could be adapted to horse races, e.g. by using gate numbers instead of home and away teams.