Tennis Statistics

Probability of Winning a Tennis Game
The probability of winning a tennis game is somewhat complicated by the fact that the number of points that will be played is not predetermined, and the eventual winner always wins the last point. Let p be the fundamental probability of you winning a tennis point, assumed to be a given quantity, and define q = 1 - p. Also, let P(m,n) be the likelihood that if n points were played, you would win m of them. P(m,n) is thus just the standard binomial expression given by P(m,n) = [n!/(m!(n-m)!)] p^m q^(n-m).
While the number of points in a game is not fixed, each game consists of at least four points. The following formula, explained below, then yields the probability, w, that you will win the game.

w = P(4,4) + P(3,4)[p + qp + q²p_d] + P(2,4)[p_d] + P(1,4)[p²p_d]

In this expression, p_d is the probability that you will win the game once you have reached a deuce situation, a term to be considered shortly.
This equation consists of four terms on the right-hand side. The first is the likelihood of winning the first four points, and thus the game, and can be calculated directly from the binomial formula. The second term corresponds to winning three of the first four points, and then either winning the next (and final) point, or losing a point before winning the final point, or losing two points, which places you in a deuce situation that you win with probability p_d. The third term assumes that you win two of the first four points. The score is then level at 30-30, which is equivalent to deuce because you must still win by two points from a level score, and thus again involves p_d. Finally, the fourth term corresponds to winning only one of the first four points, and requires that you win the next two points to reach deuce. Winning none of the first four points loses the game, so that case contributes nothing.
The deuce probability p_d remains to be defined before w can be evaluated. It satisfies p_d = p² + pqp_d + qpp_d, since the probability of winning at deuce is the probability of winning the next two points, plus the probability of winning and then losing a point, which gets you back to deuce again, plus the probability of losing and then winning a point, which also gets you back to deuce. Solving for p_d gives

p_d = p²/(1 - 2pq)
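
The algebra behind this step is short and can be written out as follows. The last equality is an extra observation, not part of the original text; it relies only on p + q = 1, so that 1 - 2pq = (p + q)² - 2pq = p² + q².

```latex
\begin{align*}
p_d &= p^2 + pq\,p_d + qp\,p_d = p^2 + 2pq\,p_d \\
p_d\,(1 - 2pq) &= p^2 \\
p_d &= \frac{p^2}{1 - 2pq} = \frac{p^2}{p^2 + q^2}
\end{align*}
```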
p_d can now be substituted into the relation for w, as can the P(i,4) binomial expression terms, and w follows as

w = p⁴[1 + 4q + 10q²] + 20p⁵q³/(1 - 2pq)
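
As a rough numerical check, the four-term expression and the closed form can both be coded and compared. This is a minimal Python sketch, not part of the original derivation; the function names (`binom`, `deuce_win_prob`, `game_win_prob_terms`, `game_win_prob_closed`) are illustrative assumptions.

```python
from math import comb


def binom(m: int, n: int, p: float) -> float:
    """P(m,n): probability of winning exactly m of n points."""
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def deuce_win_prob(p: float) -> float:
    """p_d = p^2 / (1 - 2pq), the probability of winning from deuce."""
    q = 1.0 - p
    return p * p / (1.0 - 2.0 * p * q)


def game_win_prob_terms(p: float) -> float:
    """w built from the four terms of the formula in the text."""
    q = 1.0 - p
    p_d = deuce_win_prob(p)
    return (
        binom(4, 4, p)
        + binom(3, 4, p) * (p + q * p + q * q * p_d)
        + binom(2, 4, p) * p_d
        + binom(1, 4, p) * (p * p * p_d)
    )


def game_win_prob_closed(p: float) -> float:
    """w from the closed form p^4[1 + 4q + 10q^2] + 20p^5q^3/(1 - 2pq)."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q * q) + 20 * p**5 * q**3 / (1 - 2 * p * q)


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6):
        print(p, game_win_prob_terms(p), game_win_prob_closed(p))
```

Both functions should agree to rounding error for any p between 0 and 1. For p = 0.5 the result is 0.5, as symmetry requires, and for p = 0.6 it is roughly 0.736, which shows how the game structure amplifies a small edge on individual points.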

Probability of Winning the Set
The probability of your winning the set follows much the same procedure as that described above, but using the calculated value of w, instead of p, as the driving parameter. Here we define z = 1 - w, and note that each set involves at least six games. We will also adopt the notation that P(r to s) denotes the probability of a set score of r to s, and R(m,n) is the binomial formula for winning m of n games, namely R(m,n) = [n!/(m!(n-m)!)] w^m z^(n-m). Using a rationale similar to that employed before:

P(6 to 0) = R(6,6) = w⁶
P(6 to 1) = R(5,6)w = 6w⁶z
P(6 to 2) = R(5,6)zw + R(4,6)w² = 21w⁶z²
P(6 to 3) = R(5,6)z²w + R(4,6)[wz + zw]w + R(3,6)w³ = 56w⁶z³
P(6 to 4) = R(5,6)z³w + R(4,6)[3w²z²] + R(3,6)[3w³z] + R(2,6)w⁴ = 126w⁶z⁴
P(5 to 5) = R(5,6)z⁴ + R(4,6)[4wz³] + R(3,6)[6w²z²] + R(2,6)[4w³z] + R(1,6)w⁴ = 252w⁵z⁵
P(7 to 5) = P(5 to 5)w² = 252w⁷z⁵
P(6 to 6) = P(5 to 5)[2wz] = 504w⁶z⁶
P(7 to 6) = P(6 to 6)p_t
where p_t is the probability of winning a tie-break game, in which the winner is the first player to reach at least 7 points with a margin of at least 2 points.
The only remaining detail is to relate p_t to the point probabilities p and q. This is done in precisely the same way as in computing w, except that the tie-break requires a minimum of seven points rather than four. The algebra is somewhat messier, with the result that

p_t = p⁷[1 + 7q + 28q² + 84q³ + 210q⁴ + 462q⁵] + 924p⁶q⁶p_d
where p_d is derived in the previous section.
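
The tie-break formula can be sketched in Python as follows. This is an illustrative sketch with an assumed function name (`tiebreak_win_prob`); it generates the coefficients 1, 7, 28, 84, 210, 462 and 924 from binomial coefficients rather than hard-coding them, and, like the text, it uses a single point probability p for every point.

```python
from math import comb


def tiebreak_win_prob(p: float) -> float:
    """p_t: probability of winning a first-to-7, win-by-2 tie-break."""
    q = 1.0 - p
    # Probability of winning from a level score when two clear points are needed.
    p_d = p * p / (1.0 - 2.0 * p * q)
    # Wins by 7-0 through 7-5: the last point is won, and the k lost points
    # are spread among the first 6 + k points.
    win_before_six_all = sum(comb(6 + k, k) * p**7 * q**k for k in range(6))
    # Reaching 6-6 takes 12 points, 6 won by each side; from there the
    # situation is the same as deuce.
    win_after_six_all = comb(12, 6) * p**6 * q**6 * p_d
    return win_before_six_all + win_after_six_all


if __name__ == "__main__":
    print([comb(6 + k, k) for k in range(6)], comb(12, 6))
    print(tiebreak_win_prob(0.5))
```

A useful sanity check is that tiebreak_win_prob(p) + tiebreak_win_prob(1 - p) equals 1 up to rounding, since one of the two players must win the tie-break.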
We have now derived everything necessary to calculate all of the possible set scores starting only with the point probability, p. The previous section yields w and p_d, and this section takes those values and evaluates the set scores. The likelihood of winning the set, s, is simply the sum of the probabilities of all winning set scores, namely

s = P(6 to 0) + P(6 to 1) + P(6 to 2) + P(6 to 3) + P(6 to 4) + P(7 to 5) + P(7 to 6)
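
The set-score probabilities and their sum can be sketched in Python as below. The function names (`set_score_probs`, `set_win_prob`) are assumptions made for illustration, and the game and tie-break probabilities w and p_t are taken as inputs so that the sketch stands on its own.

```python
def set_score_probs(w: float, p_t: float) -> dict:
    """Probability of each winning set score, given the game win
    probability w and the tie-break win probability p_t."""
    z = 1.0 - w
    five_all = 252 * w**5 * z**5      # P(5 to 5)
    six_all = five_all * 2 * w * z    # P(6 to 6)
    return {
        "6-0": w**6,
        "6-1": 6 * w**6 * z,
        "6-2": 21 * w**6 * z**2,
        "6-3": 56 * w**6 * z**3,
        "6-4": 126 * w**6 * z**4,
        "7-5": five_all * w**2,
        "7-6": six_all * p_t,
    }


def set_win_prob(w: float, p_t: float) -> float:
    """s: the sum of the probabilities of all winning set scores."""
    return sum(set_score_probs(w, p_t).values())


if __name__ == "__main__":
    for score, prob in set_score_probs(0.5, 0.5).items():
        print(score, prob)
    print("s =", set_win_prob(0.5, 0.5))
```

With w = 0.5 and p_t = 0.5 the seven scores sum to 0.5, as expected for evenly matched players.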
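
Probability of Winning the Match
This is a much easier derivation, given the set win probability, s. Assuming that the match goes to the first player to win two sets, the likelihood of such a win, m, is the probability of winning the first two sets plus the probability of splitting the first two sets, in either order, and then winning the third: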

m = s² + 2s²(1 - s)
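
For completeness, the match formula can be written as a small Python function. The name `match_win_prob` is an assumption for illustration, and the sketch covers only the best-of-three format assumed in the text.

```python
def match_win_prob(s: float) -> float:
    """m for a best-of-three match, given the set win probability s."""
    straight_sets = s**2                 # win the first two sets
    three_sets = 2 * s**2 * (1.0 - s)    # lose one of the first two sets, win the other two
    return straight_sets + three_sets


if __name__ == "__main__":
    for s in (0.4, 0.5, 0.6):
        print(s, match_win_prob(s))
```

For s = 0.6 this gives 0.36 + 0.288 = 0.648, so the match format adds a further, smaller amplification on top of the set win probability.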
w, s, and m are presented as a function of p in the first graph, and specific set score probabilities are displayed in the second figure.
