DEV Community

Jaakko Kangasharju
Jaakko Kangasharju

Posted on

Tennis and Amplifying Small Differences

A while ago there was a nice post by @teckert on probabilities in a basketball game

Generalization

Thinking about the two situations in the post I came up with the following generalization: You have an odd number of chances and you have to make more than half the shots. This fits both the "1 out of 1" and "2 out of 3" cases that were in the post. And it lets us visualize the effect that @walker pointed out in a comment, that the more chances you have, the more the final result of the game is determined by your individual shot probability.

In the original post, @teckert computed the probability in the "2 out of 3" game by hand. But if we want to generalize to really long games, the formulas would be really unwieldy to derive and handle. We could also write a loop that would generate @teckert 's table, letting the computer handle the heavy lifting. Is there a better way?

Turns out, there is. We have a situation where we have n chances and probability p of making each of them, independently of how the others have gone. This means that the number of successes follows the binomial distribution. And SciPy has a ready-made function for computing this.

We want to determine the total probability of all the cases where the made shots are more than half the number of chances. In probability, such a calculation can be done using the cumulative distribution function (often abbreviated as CDF). The CDF gives the total probability of getting a value less than its argument, so to get the total probability of getting a value more than half, we can subtract the CDF value from 1 (note that we're assuming an odd number of chances so it is not possible to make exactly half the shots).

from scipy.stats import binom

def winning_probability(number_of_chances, shot_probability):
    return 1 - binom.cdf((number_of_chances - 1) / 2, number_of_chances, shot_probability)

Equipped with this function, we can plot the game winning probability curves for various numbers of chances.

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(10, 6))
p_range = np.linspace(0, 1, 100)
for n in [1, 3, 5, 9, 19, 39, 79]:
    probs = [winning_probability(n, p) for p in p_range]
    plt.plot(p_range, probs, label=str(n))
plt.ylabel("Probability of Winning Game")
plt.xlabel("Probability of Making a Single Shot")
plt.legend()
plt.show()
display()

The resulting graph shows how, the more chances we have, the more the probability approaches just 0 or 1, depending on whether we're worse or better than average.

The Case of Tennis

So what does this have to do with tennis? Well, tennis shows a similar amplification of small differences. But in tennis, the effect happens on multiple levels. First, you need to win four points to win a game, you need to win six games to win a set, and finally you need to win three sets to win the match. If you lose a game or set, it doesn't matter further on how much you lost by.

Tennis also presents a complication. To win a game it's not enough to win four points, you also need to be two points ahead of the opponent. So we can't use the binomial distribution anymore. But we can set up an equation for the probability and solve it. The key is noticing that, when the score is tied with two consecutive points needed to win, there are three possible situations

  1. You win two points in a row
  2. Your opponent wins two points in a row
  3. You both win one point of the next two

In case 1, you win the game. In case 2, your opponent wins. And in case 3, the situation ends up back in the beginning. So we can set up a formula for the total probability, which will be a sum of two terms: the probability of case 1 and the probability of you winning in case 3. These work out to

case_1_probability = point_probability * point_probability
case_3_probability = 2 * point_probability * (1 - point_probability) * total_probability
total_probability = case_1_probability + case_3_probability

This doesn't exactly work out as computer code because of the recursive definition (total_probability depends on case_3_probability which in turn depends on total_probability), but we can expand the definition of total_probability to make a perfectly solvable mathematical formula

t_p = p_p * p_p + 2 * p_p * (1 - p_p) * t_p

Standard algebraic manipulations produce the solution

t_p = p_p * p_p / (1 - 2 * p_p * (1 - p_p)) = p_p * p_p / (1 - 2 * p_p + 2 * p_p * p_p)

which we can implement in code

def two_wins_in_a_row(point_prob):
    return (point_prob * point_prob) / (1 - 2 * point_prob + 2 * point_prob * point_prob)

So this is the probability of winning a game when you and your opponent both have, say, 3 points, and you need to win two points in a row.

Let's start putting things together for computing the probability of winning a game when we know the probability of winning a point. Conceptually, we will fill a table indexed by your and your opponent's score with your winning probability from that score. Then, the probability of you winning the game is simply the value in the table cell (0, 0).

O\Y 0 1 2 3 4 5 ...
0   1 1 ...
1   1 1 ...
2   1 1 ...
3 T 1 ...
4 0 0 0   T   ...
5 0 0 0 0 T ...

We can start by filling in the value 1 in each cell where you have 4 or more points and your opponent has at least 2 points less than you, since you've already won those. Similarly, we fill in 0 in the symmetrical cases where your opponent has already won. And finally, for 3-3 and even scores after that, we fill in T to signify the value is computed directly by calling two_wins_in_a_row.

We want to end up with a function signature like

def game_win_prob_with_scores(point_prob, score_you, score_opponent):

which will return the value in the table cell indexed by score_you and score_opponent. Then the probability of you winning a game will be game_win_prob_with_scores(point_prob, 0, 0). From the table that we filled in, we can already implement some simple cases

    if score_you >= 4 and score_you >= score_opponent + 2:
        return 1
    elif score_opponent >= 4 and score_opponent >= score_you + 2:
        return 0
    elif score_you >= 2 and score_you == score_opponent:
        return two_wins_in_a_row(point_prob)

(We noticed that in fact T is the probability already when the game is 2-2)

How to handle the remaining cases? We notice that if the game is now at score x-y, the probability is point_prob that it will next be x+1-y, and 1 - point_prob that it will be x-y+1. So if we know the values in the table right and below some cell, we can compute the value in that cell from those other two values. And in the table we see the "border" going from 4-0 to 4-2 to 3-3 to 2-4 to 0-4 that will let us eventually compute any value to the top and left of it.

So now we can write the code for the else

    else:
        return point_prob * game_win_prob_with_scores(point_prob, score_you + 1, score_opponent) + (1 - point_prob) * game_win_prob_with_scores(point_prob, score_you, score_opponent + 1)

Because the border in the table is implemented by the first three conditions in the if-elif chain, and both recursive calls go towards the border, the recursion will eventually terminate (always make sure recursion terminates, especially in cases like this where it's not so obvious at first glance).

There is still one problem with the code. Say we start from score_you == 0 and score_opponent == 0. The left recursive call will have parameters score_you == 1 and score_opponent == 0 and the right recursive call score_you == 0 and score_opponent == 1. The problem is that from both these calls, the function will be called with parameters score_you == 1 and score_opponent == 1, so there will be duplicate work.

This could be solved by rewriting the function to be non-recursive and to process the table cells in an order so that each cell is computed only when its dependencies to the right and down are known. A simpler way is to apply memoization to the recursive version. This is especially easy in Python, since there is a decorator in functools that converts any function into a memoized version. So we add

from functools import lru_cache

at the top, and

    @lru_cache(maxsize=None)

before the function game_win_prob_with_scores and we're done.

From the game-winning probability we can use the same technique to convert it into a set-winning probability. The winner of a set is the first player to reach 6 won games with the other having at most 4, or reach 7-5, or win the tie break at 6-6. The tie break is the same as a game, except it's played until 7 points.

The match is easiest to handle by special-casing the three different possibilities, 3-0, 3-1, and 3-2.

def match_win_prob(point_prob):
    set_prob = set_win_prob(point_prob)
    three_nil_prob = set_prob ** 3
    three_one_prob = 3 * set_prob ** 3 * (1 - set_prob)
    three_two_prob = 6 * set_prob ** 3 * (1 - set_prob) ** 2
    return three_nil_prob + three_one_prob + three_two_prob

The multipliers 3 and 6 count in how many ways each type of win can happen. For instance, in a 3-1 win the 1 set won by the opponent can be the first, second, or third.

And this brings us to the final graph, of showing how the rules of tennis amplify even small differences in point-winning probability to make it much more certain for even a slightly stronger player to win a match.

This was of course somewhat simplified, since in real tennis the probability of winning a point is not constant. Especially having the serve is a major advantage. It would be possible to perform a similar analysis, considering two different probabilities, but that would be unlikely to demonstrate any additional interesting point.

Appendix

Here is the full code for the tennis example, refactored to take advantage of game and tie break being exactly the same, except for a different target number of wins.

import matplotlib.pyplot as plt
import numpy as np
from functools import lru_cache

def two_wins_in_a_row(point_prob):
    return (point_prob * point_prob) / (1 - 2 * point_prob + 2 * point_prob * point_prob)

@lru_cache(maxsize=None)
def simple_win_prob_with_scores(point_prob, target, score_you, score_opponent):
    if score_you >= target and score_you >= score_opponent + 2:
        return 1
    elif score_opponent >= target and score_opponent >= score_you + 2:
        return 0
    elif score_you >= target - 2 and score_you == score_opponent:
        return two_wins_in_a_row(point_prob)
    else:
        return (
            point_prob * simple_win_prob_with_scores(point_prob, target, score_you + 1, score_opponent)
            + (1 - point_prob) * simple_win_prob_with_scores(point_prob, target, score_you, score_opponent + 1)
        )

def tie_break_win_prob(point_prob):
    return simple_win_prob_with_scores(point_prob, 7, 0, 0)

def game_win_prob(point_prob):
    return simple_win_prob_with_scores(point_prob, 4, 0, 0)

@lru_cache(maxsize=None)
def set_win_prob_with_scores(point_prob, score_you, score_opponent):
    if (score_you == 6 or score_you == 7) and score_you >= score_opponent + 2:
        return 1
    elif (score_opponent == 6 or score_opponent == 7) and score_opponent >= score_you + 2:
        return 0
    elif score_you == 6 and score_opponent == 6:
        return tie_break_win_prob(point_prob)
    else:
        game_prob = game_win_prob(point_prob)
        return (
            game_prob * set_win_prob_with_scores(point_prob, score_you + 1, score_opponent)
            + (1 - game_prob) * set_win_prob_with_scores(point_prob, score_you, score_opponent + 1)
        )

def set_win_prob(point_prob):
    return set_win_prob_with_scores(point_prob, 0, 0)

def match_win_prob(point_prob):
    set_prob = set_win_prob(point_prob)
    three_nil_prob = set_prob ** 3
    three_one_prob = 3 * set_prob ** 3 * (1 - set_prob)
    three_two_prob = 6 * set_prob ** 3 * (1 - set_prob) ** 2
    return three_nil_prob + three_one_prob + three_two_prob

p_range = np.linspace(0, 1, 100)
probs = [match_win_prob(p) for p in p_range]
plt.figure(figsize=(10, 6))
plt.plot(p_range, probs)
plt.ylabel("Probability of Winning Match")
plt.xlabel("Probability of Winning a Point")
plt.show()
display()

Latest comments (0)