Do Extra Innings Games Predict World Series Longevity?

Now that baseball season has ended, I find myself going through withdrawal. And with spring training several months away, I need something to fill the void left in my heart. To that end, let's take a moment and look back -- with a mathematical eye, of course -- on the 2015 World Series.

In case you missed it, Game 1 was one for the record books. The New York Mets and the Kansas City Royals duked it out for 14 innings. This was long enough to take the crown for the longest Game 1 in World Series history (though it tied Game 2 in 1916 and Game 3 in 2005, which were also 14 innings long). Here are the highlights:

Even before Game 1, many pundits predicted that the series would last a full seven games. For example, all 5 members of the CBS Sports staff polled in this article predicted a seven-game series, though they were split on which team would emerge victorious.

It shouldn't be surprising, then, that after Game 1 analysts maintained their faith that the series would be a long one. And at the time, this belief seemed entirely reasonable: if two teams are so evenly matched that they can grind out 14 innings, then surely the series should be more likely to last seven games. Even bookies agreed with this assessment; here's how the odds for World Series betting changed between Game 1 and Game 2 (courtesy of Bleacher Report):

World Series Duration

| Games | Odds (Before Game 1) | Odds (Before Game 2) | Percent Change |
| --- | --- | --- | --- |
| Four Games | 19-4 | 15-2 | ≈ +57.9% |
| Five Games | 9-4 | 11-4 | ≈ +22.2% |
| Six Games | 7-4 | 17-10 | ≈ -2.86% |
| Seven Games | 7-4 | 13-8 | ≈ -7.14% |

(Here, the percent change is calculated with respect to a $1 bet. For instance, a $1 bet on a four-game series before Game 1 would've earned you $19/4 = $4.75, but after Game 1 and before Game 2 it would've earned $15/2 = $7.50, a roughly 58% increase.)
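If you'd like to check the percent-change column yourself, here's a minimal sketch. It assumes the odds are fractional odds written as "win-stake" (so 19-4 pays 19/4 dollars on a one-dollar bet), which matches the arithmetic in the note above; the names `payout` and `percent_change` are just illustrative.

```python
from fractions import Fraction

def payout(odds: str) -> Fraction:
    """Winnings on a one-dollar bet for fractional odds written like '19-4'."""
    win, stake = odds.split("-")
    return Fraction(int(win), int(stake))

def percent_change(before: str, after: str) -> float:
    """Percent change in the one-dollar payout between two sets of odds."""
    old, new = payout(before), payout(after)
    return float((new - old) / old * 100)

# Odds copied from the table: (before Game 1, before Game 2)
ODDS = {
    "Four Games": ("19-4", "15-2"),
    "Five Games": ("9-4", "11-4"),
    "Six Games": ("7-4", "17-10"),
    "Seven Games": ("7-4", "13-8"),
}

for length, (before, after) in ODDS.items():
    print(f"{length}: {percent_change(before, after):+.2f}%")
```

Using `Fraction` keeps the payouts exact until the final conversion to a percentage, so the only rounding is in the printed result.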

Odds on a seven-game series had the biggest downward change after Game 1, reflecting the fact that more people thought a seven-game series was likely.

It was somewhat surprising, then, when the Royals defeated the Mets in only five games.

This raises an interesting question: are extra-innings games actually any good at predicting how long a World Series lasts? Of course, in theory the answer is yes: if a World Series has 7 extra-innings games, you can be pretty confident that the World Series lasted 7 games. Practically speaking, though, only one World Series (in 1991) has had as many as three extra-innings games; the rest have all had two or fewer.

Let's take a look, then, at whether the presence of extra-inning games has any impact on the likelihood of a seven-game series. Below is a bunch of data I obtained from baseball-reference.com. (Note: the data excludes the 1903, 1919, 1920, and 1921 World Series, which were all best-of-nine rather than best-of-seven; it also excludes 1907, 1912, and 1922 because those years featured tie games.)

| Series Length (World Series) | Extra-Innings Games? No | Extra-Innings Games? Yes |
| --- | --- | --- |
| 4 games | 11 | 8 |
| 5 games | 18 | 7 |
| 6 games | 13 | 11 |
| 7 games | 18 | 18 |
| Totals | 60 | 44 |

Based on this, you might find it tempting to argue that extra innings games are a reasonable predictor of series longevity: 30% (18/60) of series with no extra innings games went to seven games, compared to nearly 41% (18/44) of series with at least one extra innings game. However, the numbers we're dealing with aren't large enough for this difference to be statistically significant (more on that below).

What if we want more data? One approach is to simply wait for more World Series to be played. But I'm impatient. So, I decided to expand my search from just World Series to League Championship Series (LCS), which have also been best-of-seven series since 1985. This gives us an additional 60ish data points to consider, since there are two League Championship Series per year (one for the American League, one for the National League).

Here's what the data looks like for the LCS:

| Series Length (League Championship Series) | Extra-Innings Games? No | Extra-Innings Games? Yes |
| --- | --- | --- |
| 4 games | 4 | 4 |
| 5 games | 11 | 4 |
| 6 games | 11 | 11 |
| 7 games | 5 | 10 |
| Totals | 31 | 29 |

In this case, there's an even larger difference. Of the League Championship Series with no extra-innings games, roughly 16% (5/31) lasted seven games. But if the series had extra-innings games, that percentage more than doubles, to more than 34% (10/29). Even so, the more pronounced difference still isn't enough to make the result statistically significant.

But here's where things get a little weird. Combine these two sets together, and we get a cumulative table with data from all relevant World Series and League Championship series:

| Series Length (WS + LCS) | Extra-Innings Games? No | Extra-Innings Games? Yes |
| --- | --- | --- |
| 4 games | 15 | 12 |
| 5 games | 29 | 11 |
| 6 games | 24 | 22 |
| 7 games | 23 | 28 |
| Totals | 91 | 73 |
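To double-check the bookkeeping, here's a small sketch that adds the World Series and LCS counts together and recomputes the share of seven-game series in each column. The counts are copied from the tables above; the dictionary layout and the helper names `combine` and `seven_game_share` are just one convenient choice, not anything canonical.

```python
# Counts from the tables: {series length: (no extra innings, extra innings)}
WORLD_SERIES = {4: (11, 8), 5: (18, 7), 6: (13, 11), 7: (18, 18)}
LCS = {4: (4, 4), 5: (11, 4), 6: (11, 11), 7: (5, 10)}

def combine(a, b):
    """Add two count tables row by row."""
    return {k: (a[k][0] + b[k][0], a[k][1] + b[k][1]) for k in a}

def seven_game_share(table):
    """Share of series that went seven games: (without, with) extra innings."""
    total_no = sum(no for no, _ in table.values())
    total_yes = sum(yes for _, yes in table.values())
    seven_no, seven_yes = table[7]
    return seven_no / total_no, seven_yes / total_yes

COMBINED = combine(WORLD_SERIES, LCS)

for name, table in [("WS", WORLD_SERIES), ("LCS", LCS), ("WS + LCS", COMBINED)]:
    no, yes = seven_game_share(table)
    print(f"{name}: {no:.1%} without extra innings, {yes:.1%} with")
```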

In this case, the difference between the proportions (23/91 and 28/73) qualifies as significant! You can play around with the numbers yourself in the area below, if you're curious.

So, are World Series with extra-innings games more likely to last 7 games? Thus far, the data isn't strong enough to say so. But more generally, best-of-seven series with extra-innings games appear to be more likely to last 7 games. Cool, right? Statistics is weird.

(Note: based on my introductory anecdote, it may be better to ask whether having the first game in a best-of-seven series go extra innings is a predictor of series longevity, but there's simply not enough data to reach a conclusion. Come back in another 50 years or so, and maybe we'll be able to tackle this question.)

Want to dig into statistical significance calculations? Here's the place!

I don't really want to talk much about the details of statistical significance. There are plenty of resources out there that can explain the concept (for example, this one or this one).

What I would like to do, though, is give you the ability to mess around with sample proportions and see whether differences are statistically significant or not. So here's a little widget that lets you do just that.

The rules are relatively simple. We have two sets of sample data drawn from two populations. Each one has a number of trials (n1 in the first case, n2 in the second case), and a number of successes (s1 and s2). We want to know whether the proportions of success in the two populations are likely to be different, based on the samples. We do this by comparing the sample proportions s1/n1 and s2/n2, and calculating what's called a P-value. The lower the P-value, the stronger the evidence that the two population proportions differ.
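For the curious, one standard way to turn those four numbers into a P-value is the pooled two-proportion z-test. This is an assumption, since nothing above names a specific test. Writing $\hat{p}_1 = s_1/n_1$ and $\hat{p}_2 = s_2/n_2$ for the sample proportions, the test statistic is

$$
\hat{p} = \frac{s_1 + s_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
$$

and the P-value is the standard normal tail probability $P(Z \ge z)$ if you only care about one direction (is the second proportion larger?), or $P(|Z| \ge |z|)$ if you care about a difference in either direction.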

For example, in the World Series data above, we could take the population to be the space of all past and future World Series; the samples we're comparing are the proportion of series with no extra-innings games that went seven games and the proportion of series with at least one extra-innings game that went seven games.

It's worth noting that there are some caveats on the numbers involved: e.g. both s1 and n1 - s1 should be sufficiently large (say, at least 5), and likewise for s2 and n2 - s2.
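If you'd rather run the numbers in code, here's a minimal sketch of the procedure described above. It assumes a pooled two-proportion z-test with a normal approximation, which is one common choice and not necessarily the exact calculation behind the results quoted earlier. The function name `two_proportion_z_test` is hypothetical, and it reports both a one-sided and a two-sided P-value because the text doesn't say which one is intended.

```python
import math

def two_proportion_z_test(s1, n1, s2, n2):
    """Pooled two-proportion z-test (assumed here; the post names no test).

    Returns (z, one_sided_p, two_sided_p), where the one-sided alternative
    is that the second population's proportion is larger than the first's.
    """
    # Rule of thumb from the text: at least 5 successes and 5 failures each.
    for s, n in ((s1, n1), (s2, n2)):
        if min(s, n - s) < 5:
            raise ValueError("need at least 5 successes and 5 failures per sample")
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    two_sided = math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|)
    return z, one_sided, two_sided

# Seven-game counts and totals from the tables: (s1, n1, s2, n2),
# sample 1 = no extra-innings games, sample 2 = at least one.
SAMPLES = {
    "WS": (18, 60, 18, 44),
    "LCS": (5, 31, 10, 29),
    "WS + LCS": (23, 91, 28, 73),
}

for name, counts in SAMPLES.items():
    z, p_one, p_two = two_proportion_z_test(*counts)
    print(f"{name}: z = {z:.3f}, one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```

Under these assumptions, the combined WS + LCS data falls below 0.05 only for the one-sided P-value, so the choice of alternative matters when you read the result.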

But enough jibber-jabber. Play around with these inputs if you want to build an intuition. (Note: a p-value of 0.05 or less is typically held as being statistically significant, though this benchmark, as discussed beautifully in Jordan Ellenberg's book How Not to Be Wrong, is somewhat arbitrary.)

[Interactive widget: inputs s1 and n1 for Sample 1, s2 and n2 for Sample 2; outputs the P-value and whether it is significant at 0.05.]

Please be sure that s1, n1 - s1, s2, and n2 - s2 are all at least 5.