
Playoff Probabilities

Continuing with last week’s theme, and since we are in the midst of playoffs, I’d like to take a moment now to discuss another link between baseball and mathematics.  This link is particularly timely since the scuttlebutt on the internet suggests that next year the playoff rules for baseball will be changed: the number of teams competing for the World Series will increase from 8 to 10, and because of that, another round of playoff games will be introduced.

Currently, the playoffs consist of three rounds.  The first round is the Division Series, in which eight teams compete in best-of-five match-ups (equivalently, first-to-three match-ups, i.e. the first team to win three games wins the series).  The second and third rounds, better known as the Championship Series and World Series, are composed of four and two teams, respectively, and are both best-of-seven (equivalently, first-to-four).  Because these three rounds each consist of several games, the playoff season is already quite long; therefore, it has been suggested that the proposed new round consist of either a single game or a best-of-three (first-to-two) series between the two teams.

Many people take issue with such a short series on the grounds of fairness.  In a season where each team plays 162 games, they say, it’s not fair for a team’s World Series hopes to ride on a single game, or even a short series composed of at most three games.  There are even those who suggest that the Division Series is too short, and that all three of the current rounds should be a best-of-seven.  These are noble sentiments, but are they reasonable?  We can use mathematics to try and answer this question.

Suppose two teams are meeting for a playoff series, and the probability that one team (call it, I don’t know, the Giants) will win a single game is p (this model is fairly simple, and does not take into account advantages associated with the starting pitcher, for example, but let’s keep things basic for now).  Then the probability that this team will win a one game series is again p, since the series consists of a single game.

What if the series is three games long?  In this case, the Giants win the series if they win the first two games, or if they split the first two games and win the third.  So there are three winning outcomes: WW, WLW, and LWW.  The probability of the first outcome is p^2, while the probabilities of the second and third outcomes are both p^2(1-p) (a team that wins each game with probability p loses each game with probability 1-p).  Adding these three probabilities gives the total probability that the team will win a best-of-three series:

p^2 + 2p^2(1-p) = p^2(3-2p).

In a best-of-five series, the Giants will win if they win three in a row, two of the first three and the fourth, or two of the first four and the fifth.  Using combinations to count the possibilities, we see that in this case, the probability of the Giants winning the series is equal to

p^3 + \binom{3}{2}p^3(1-p)+\binom{4}{2}p^3(1-p)^2

= p^3(10-15p+6p^2).

With the same type of argument you can calculate the probability that the Giants win a best-of-seven series.  I’ll spare you the details: the result is p^4(35-84p+70p^2-20p^3).
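All of these formulas follow one pattern.  To win a first-to-k series, the Giants must win the deciding game after winning k-1 of the earlier games and losing some number j of them (0 ≤ j ≤ k-1), so the probability is the sum over j of \binom{k-1+j}{j}p^k(1-p)^j.  Below is a minimal Python sketch of that sum under the same simplified model (independent games, each won with the same probability p); the function name `series_win_prob` is my own, and the loop checks it against the four closed forms above.

```python
from math import comb

def series_win_prob(p: float, wins_needed: int) -> float:
    """Probability of winning a first-to-`wins_needed` series,
    assuming independent games, each won with probability p."""
    q = 1.0 - p
    # j = number of losses before the clinching win. The clinching game
    # is always the last one, so the j losses fall among the first
    # wins_needed - 1 + j games.
    return sum(
        comb(wins_needed - 1 + j, j) * p**wins_needed * q**j
        for j in range(wins_needed)
    )

# Closed forms from the text, keyed by the number of wins needed.
closed_forms = {
    1: lambda p: p,
    2: lambda p: p**2 * (3 - 2 * p),
    3: lambda p: p**3 * (10 - 15 * p + 6 * p**2),
    4: lambda p: p**4 * (35 - 84 * p + 70 * p**2 - 20 * p**3),
}

for k, formula in closed_forms.items():
    for p in (0.3, 0.5, 0.55, 0.6, 0.7):
        assert abs(series_win_prob(p, k) - formula(p)) < 1e-12
```

Best-of-one, -three, -five, and -seven correspond to `wins_needed` = 1, 2, 3, and 4.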

In each case, the probability of winning the series is a polynomial in p, the probability of winning a single game.  But how do these polynomials compare?  Let’s turn to technology to lead the way!

Probabilities for a one, three, five, and seven game series.

Above is a graph of these four functions: the x-axis represents the probability p, while the y-axis represents the probability of winning the series.  The dark blue graph is for a single-game series (the function is p), the light blue graph is for a three-game series (the function is p^2(3-2p)), the light green graph is for a five-game series (the function is p^3(10-15p+6p^2)), and the red graph is for a seven-game series (the function is p^4(35-84p+70p^2-20p^3)).  What can we deduce from the picture above?

First, note that a longer series helps the stronger team (p > 0.5) and hurts the weaker team (p < 0.5); this makes intuitive sense, if you think about it.  Also, for teams that are perfectly evenly matched (i.e. p = 0.5), the length of the series doesn’t affect the probability of winning the series, which is 50% in every case.

But what about teams with a slight, moderate, or strong advantage over their competition?  How does the length of the series affect the probability of winning the series?  Let’s look at a small table of values, in the cases p = .55, p = .6, and p = .7.

p | Best-of-one odds of success | Best-of-three odds of success | Best-of-five odds of success | Best-of-seven odds of success
--- | --- | --- | --- | ---
0.55 | 0.550 | 0.575 | 0.593 | 0.608
0.60 | 0.600 | 0.648 | 0.683 | 0.710
0.70 | 0.700 | 0.784 | 0.837 | 0.874

As you can see from the table (or the graph), the more evenly matched the teams, the less of a difference the length of the series makes.  If your team has a 55% chance of winning a given game, moving from a single game to a best-of-seven series raises its chance of winning the series by a little less than 6 percentage points.  With a 60% chance of winning a given game, the increase is 11 percentage points, and with a 70% chance of winning a given game, it is over 17 percentage points.
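To reproduce the table and the percentage-point gains quoted above, here is a short Python sketch.  It re-declares the same hypothetical `series_win_prob` helper (a sum over the number of losses before the clinching win) so that it runs on its own, and it again assumes independent games with a fixed p.

```python
from math import comb

def series_win_prob(p, wins_needed):
    """First-to-`wins_needed` series win probability, independent games."""
    q = 1.0 - p
    return sum(comb(wins_needed - 1 + j, j) * p**wins_needed * q**j
               for j in range(wins_needed))

def table_row(p):
    # Best-of-1, 3, 5, 7 are first-to-1, 2, 3, 4.
    return [series_win_prob(p, k) for k in (1, 2, 3, 4)]

for p in (0.55, 0.60, 0.70):
    one, three, five, seven = table_row(p)
    print(f"p={p:.2f}: {one:.3f} {three:.3f} {five:.3f} {seven:.3f} | "
          f"1->7 gain {100 * (seven - one):.1f} pts, "
          f"5->7 gain {100 * (seven - five):.1f} pts")
```

The values are kept unrounded in `table_row`; rounding happens only when printing.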

Note also that the change from a best-of-five series to a best-of-seven series isn’t very large.  Even if your team is heavily favored (a 70% probability of winning each game), moving from best-of-five to best-of-seven raises its chance of winning the series by less than four percentage points.  With more evenly matched teams, the difference is even smaller, suggesting that expanding the Division Series from a maximum of five games to a maximum of seven isn’t necessarily a great idea.

On the other hand, the largest change in probabilities comes from the jump from a best-of-one series to a best-of-three series.  While the change isn’t so significant for evenly matched teams (and evenly matched teams would be the most likely to play each other in this round under the suggested rule changes), for match-ups in which one team is heavily favored, the difference can be substantial.

Whether or not one wants longer series or shorter series depends, I suppose, on one’s baseball philosophy.  It certainly seems like having everything ride on a single game after a season of more than 150 games is a little unbalanced, but from a mathematical standpoint, the stronger team will most likely gain only a small advantage by moving to a three game series.  Of course, this simplified model can only tell us so much, and it’s possible that the advantages of a longer series are being underrepresented here.  To err on the side of caution, I’d be more inclined to support a best-of-three series, though whether or not this is possible without stretching the season too long is something that the folks who are paid better than me to think about these matters will have to decide.

 

Moneyball

This weekend, mathematics played a supporting role to Brad Pitt in one of fall’s first critical darlings, Moneyball. Based on the Michael Lewis book of the same name, the film profiles the Oakland A’s during their 2002 bid for World Series glory.  What allegedly separates their story from the story of other teams during that season is the way General Manager Billy Beane, played by Brad Pitt, deals with the budget constraints imposed on him by the team’s owners.

With a payroll roughly a third the size of the Yankees’, Beane understood that the playing field was not a level one from an economic standpoint.  What’s more, at the end of the 2001 season, three of the A’s star players left Oakland for bigger paychecks.  To fill the void, the film (and book) show how Beane took a more analytic approach, and used statistical analysis to uncover players who were undervalued and could be purchased for much less than they were worth.  Beane, together with Paul DePodesta (Peter Brand in the film, and played by Jonah Hill), used a sabermetric approach to lead the A’s to a league-leading 103 wins for the season. While their first-place ranking for number of wins that year was shared with the Yankees, they spent much less per win than their New York counterparts (the A’s spent the least per win, while the Yankees spent the third most).  Here’s a table comparing the teams; the payroll numbers are taken from here, and differ slightly from the numbers that appear in the book.

Team | Wins | Losses | Payroll | Cost per win (millions)
--- | --- | --- | --- | ---
Oakland Athletics | 103 | 59 | $40,004,167 | $0.388
Minnesota Twins | 94 | 67 | $40,225,000 | $0.428
Montreal Expos | 83 | 79 | $38,670,500 | $0.466
Florida Marlins | 79 | 83 | $41,979,917 | $0.531
Cincinnati Reds | 78 | 84 | $45,050,390 | $0.578
Pittsburgh Pirates | 72 | 89 | $42,323,599 | $0.588
Los Angeles Angels | 99 | 63 | $61,721,667 | $0.623
Tampa Bay Rays | 55 | 106 | $34,380,000 | $0.625
San Diego Padres | 66 | 96 | $41,425,000 | $0.628
Chicago White Sox | 81 | 81 | $57,052,833 | $0.704
Philadelphia Phillies | 80 | 81 | $57,954,999 | $0.724
Houston Astros | 84 | 78 | $63,448,417 | $0.755
Kansas City Royals | 62 | 100 | $47,257,000 | $0.762
St. Louis Cardinals | 97 | 65 | $74,660,875 | $0.770
Colorado Rockies | 73 | 89 | $56,851,043 | $0.779
San Francisco Giants | 95 | 66 | $78,299,835 | $0.824
Seattle Mariners | 93 | 69 | $80,282,668 | $0.863
Milwaukee Brewers | 56 | 106 | $50,287,833 | $0.898
Baltimore Orioles | 67 | 95 | $60,493,487 | $0.903
Atlanta Braves | 101 | 59 | $93,470,367 | $0.925
Toronto Blue Jays | 78 | 84 | $76,864,333 | $0.985
Detroit Tigers | 55 | 106 | $55,048,000 | $1.001
Los Angeles Dodgers | 92 | 70 | $94,850,953 | $1.031
Arizona Diamondbacks | 98 | 64 | $102,819,999 | $1.049
Cleveland Indians | 74 | 88 | $78,909,449 | $1.066
Chicago Cubs | 67 | 95 | $75,690,833 | $1.130
Boston Red Sox | 93 | 69 | $108,366,060 | $1.165
New York Yankees | 103 | 58 | $125,928,583 | $1.223
New York Mets | 75 | 86 | $94,633,593 | $1.262
Texas Rangers | 72 | 90 | $105,726,122 | $1.468
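If you’d like to recompute the last column, cost per win is just payroll divided by wins.  The Python sketch below does this for a handful of rows copied from the table (the list is deliberately partial, and the helper name `cost_per_win_millions` is hypothetical), then sorts the teams from the cheapest win to the most expensive.

```python
# (team, wins, payroll in dollars): a few rows copied from the table above
teams = [
    ("Oakland Athletics", 103, 40_004_167),
    ("Minnesota Twins", 94, 40_225_000),
    ("Atlanta Braves", 101, 93_470_367),
    ("New York Yankees", 103, 125_928_583),
    ("New York Mets", 75, 94_633_593),
    ("Texas Rangers", 72, 105_726_122),
]

def cost_per_win_millions(wins, payroll):
    """Payroll divided by wins, expressed in millions of dollars."""
    return payroll / wins / 1_000_000

# Sort from cheapest to most expensive win.
ranked = sorted(teams, key=lambda t: cost_per_win_millions(t[1], t[2]))
for name, wins, payroll in ranked:
    print(f"{name:<20} ${cost_per_win_millions(wins, payroll):.3f}M per win")
```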

 

Their new approach threw out many pieces of conventional baseball wisdom: stealing bases and bunting were strict no-no’s, for example.  Naturally, these changes brought about some tension, and it’s this tension that provides the dramatic thrust of the film.  As a result, mathematics takes a back seat, though there are some little cameos for those who are paying attention.

The most significant piece of mathematics making an appearance in the film is the Pythagorean Expectation, a formula discovered by Bill James that estimates a team’s win percentage in terms of its runs scored and runs allowed.  More specifically, the formula asserts that a team’s win percentage is approximately equal to

\frac{\textup{runs scored}^2}{\textup{runs scored}^2+\textup{runs allowed}^2}=\frac{1}{1+\textup{(runs allowed/runs scored)}^2}.

For example, the 2002 A’s scored a total of 800 runs, and allowed a total of 654 runs, for a Pythagorean Expectation of

\frac{800^2}{800^2+654^2} \approx 0.599.

(Relevant stats can be found here.)  This compares to the team’s actual win percentage of 103/162, which is around 0.636.
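The Pythagorean Expectation is easy to compute directly.  Here is a minimal Python sketch (the function name is my own) using the 2002 figures quoted above:

```python
def pythagorean_expectation(runs_scored, runs_allowed):
    """Bill James's estimate of win percentage: RS^2 / (RS^2 + RA^2)."""
    return runs_scored**2 / (runs_scored**2 + runs_allowed**2)

estimate = pythagorean_expectation(800, 654)   # 2002 A's
actual = 103 / 162
print(f"Pythagorean estimate: {estimate:.3f}")
print(f"Actual win percentage: {actual:.3f}")
print(f"Estimated wins in 162 games: {162 * estimate:.1f}")
```

Over 162 games, the estimate works out to roughly 97 wins, compared with the 103 the A’s actually won.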

In the film, Peter Brand applies this formula in order to estimate the number of runs the team needs to score, along with the maximum number of runs it can allow, in order to secure a playoff spot.  In one scene, he tells Billy Beane that he thinks the A’s will need to win at least 99 games to guarantee a playoff spot.  In a 162-game season, this equates to a win percentage of around 0.611.  To ensure that the Pythagorean Expectation is at least this large, we require

\frac{1}{1+\textup{(runs allowed/runs scored)}^2} > 0.611.

With a little algebra, this is the same as

\textup{runs allowed/runs scored} < \sqrt{\frac{1}{.611}-1}\approx 0.798.

Brand then informs Beane that in order for this to happen, the team needs to score at least 814 runs, and can allow no more than 645 runs.  This gives a runs allowed to runs scored ratio of 645/814, or around 0.792 < 0.798 (though, if I were being anal, I would point out that with 814 runs scored, the team could allow as many as 649 runs and still have a runs allowed to runs scored ratio that is less than 0.798).
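Brand’s numbers can be checked the same way.  The sketch below (helper names hypothetical, using the 0.611 target from the text) computes the bound on the runs allowed to runs scored ratio, then finds the largest whole number of runs a team scoring 814 can allow while staying strictly below that bound.

```python
import math

def max_ratio(target_pct):
    """Bound on r = runs allowed / runs scored: 1/(1 + r^2) > target
    is equivalent to r < sqrt(1/target - 1)."""
    return math.sqrt(1 / target_pct - 1)

def max_runs_allowed(runs_scored, target_pct):
    """Largest whole number of runs allowed that keeps the ratio
    strictly below max_ratio(target_pct)."""
    limit = runs_scored * max_ratio(target_pct)
    allowed = math.floor(limit)
    if allowed == limit:  # strict inequality: step down if exactly on the bound
        allowed -= 1
    return allowed

target = 0.611
print(f"ratio must be below {max_ratio(target):.3f}")
print(f"Brand's ratio: {645 / 814:.3f}")
print("max runs allowed with 814 scored:", max_runs_allowed(814, target))
```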

While the math formulas on display in the film are accurate, I would be remiss if I did not briefly discuss Hill’s portrayal of Peter Brand.  Overall, Hill does a good job; though Brand is clearly a nerd, Hill’s portrayal usually avoids caricature.

Like every other film featuring characters who are good at math, though, Moneyball can’t resist including a scene that shows how good Brand is at math by having him do mental calculations quickly.  This particular scene takes place when Brand is sitting in his first meeting with Beane and the rest of the baseball scouts, and though it serves to highlight the tension that exists between Brand’s new school of thought and more traditional baseball thinking, I think the scene could have been just as effective without the clichéd math exercise.

Also, in the interest of full disclosure, I should point out that there are some who feel the story told in Moneyball (both the film and the book) is an exaggeration.  More specifically, as this Slate article discusses, many people believe that the reason for the A’s success during the early aughts had less to do with sabermetrics, and more to do with the fact that they had awesome pitchers in Tim Hudson, Mark Mulder, and Barry Zito, none of whom feature prominently in the book or film.  While I don’t feel knowledgeable enough to weigh in decisively on this issue, the role of the defense certainly appears to be underrepresented here.

To try and convince you of this, recall that the A’s made it to the playoffs in four consecutive years, from 2000 through 2003.  Here is some data on how many runs they scored and how many runs they allowed during each of those years, and in 2004, when they did not make the playoffs:

Observe that, especially from 2001 to 2003, while the A’s offense declined, their defense remained consistent in allowing relatively few runs.  Of course, these numbers should not be viewed in a vacuum, but rather in relation to how baseball as a whole performed.  Therefore, it is better to consider not the raw numbers of runs scored and runs allowed, but the A’s runs scored and runs allowed divided by the American League average in each category.  With this slight adjustment, we get the following picture:

Note in the above that a proportion of 1 means that the A’s were performing at the league-average rate.  For runs scored, a proportion greater than 1 indicates above-average offense; for runs allowed, the reverse holds, so a proportion less than 1 indicates above-average defense.  As we can see from the data, in 2001-2003 the A’s were allowing runs at a rate well below the average; in other words, the defense was relatively strong.  On the other hand, during the same period, the offense weakened year over year, so that the number of runs the A’s scored was below the league average in 2003 and 2004.  In particular, during the 2002 season profiled in Moneyball, the number of runs scored took a sharp downturn relative to the league average, while the number of runs allowed remained well below average.  This indicates, to me at least, that the defense was an important factor in the A’s playoff runs during the 2002 and 2003 seasons.  Note also that in the 2004 season the number of runs allowed rose sharply relative to the league average; without a corresponding uptick in runs scored, the A’s didn’t make it to the playoffs.

Nevertheless, I don’t think the issue is binary; excellent pitching and a sabermetric approach probably combined to help the A’s.  Even though Moneyball only explores one of these issues, it’s still a film well worth seeing.  If you’re no fan of mathematics, don’t worry: there isn’t much on display.  And if you’re no fan of baseball, surprisingly, I think you might enjoy the movie anyway.

Wedding Mathematics, Part 3

Today I would like to wrap up my series on mathematics and weddings (a series begun here and continued here) with a little advice for soon-to-be brides and grooms who are looking to integrate some math into their celebrations.  If this describes you, then congratulations – not only on your upcoming nuptials, but also on the classy way you are looking to celebrate them.

For our own wedding, my bride and I decided it would be natural to incorporate some mathematics into the table numbers.  There is some freedom in how one decides to do this.  For example, we initially toyed with the idea of using numbers for the tables that were somehow significant to us and our relationship, but found it too difficult to come up with examples meeting this criterion.  If one wants intrinsically interesting numbers, there are many examples among the whole numbers (I was particularly fond of using 1729, the smallest whole number expressible as the sum of two positive cubes in two different ways).  In the end, though, we decided to expand the realm of possibilities beyond the range of whole numbers.  This turned out to be a good decision, both aesthetically and educationally.
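For the curious, a brute-force search recovers the smallest whole number that is a sum of two positive cubes in two different ways.  This Python sketch (the helper name is hypothetical) tabulates a^3 + b^3 for 1 ≤ a ≤ b ≤ 20.  Any representation of a number no larger than 1729 needs b ≤ 12 (since 13^3 = 2197), so the search cannot miss a smaller example.

```python
from collections import defaultdict

def smallest_two_way_cube_sum(limit=20):
    """Smallest n = a^3 + b^3 (1 <= a <= b <= limit) that has at least
    two different representations, returned with those representations."""
    reps = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            reps[a**3 + b**3].append((a, b))
    return min((n, r) for n, r in reps.items() if len(r) >= 2)

n, representations = smallest_two_way_cube_sum()
print(n, representations)
```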

Table number e. Hat tip to Caroline for the shot.

If you are looking for a way to incorporate some math into your celebration, the table numbers are certainly one option.  At each of our tables we had a small placard, with the number on one side and a brief description of the number (and some table exercises!) on the reverse.  I tried to have sympathy for our audience, and give descriptions that a general audience would be able to understand, though I gave myself more flexibility with a table occupied by other math students.  For the sake of completeness, here are all the numbers we used, along with their descriptions (see if you can tell which table had the math students!).  In no particular order:

1. \pi (see here for more).

The ratio of a circle’s circumference to its diameter, \pi is perhaps the most famous irrational number. Historically, \pi has also been known as Archimedes’ constant, and Archimedes himself proved that 3\frac{10}{71}<\pi<3\frac{1}{7}.

More than one trillion digits of the decimal expansion of \pi have been computed, and folks with nothing better to do than recite those digits come together each \pi day (March 14th, naturally) to see who has memorized the longest string of numbers in the decimal expansion. If you’re looking for more interesting properties of \pi, though, here are a few to mull over:

\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \ldots,

\frac{2}{\pi} = \frac{\sqrt{2}}{2}\cdot \frac{\sqrt{2+\sqrt{2}}}{2}\cdot \frac{\sqrt{2+\sqrt{2+\sqrt{2}}}}{2} \ldots,

 

\frac{\pi}{2} = \frac{2}{1}\cdot \frac{2}{3}\cdot \frac{4}{3}\cdot \frac{4}{5}\cdot \frac{6}{5}\cdot \frac{6}{7}\ldots .
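These three formulas can be compared numerically.  The Leibniz series and the Wallis product converge slowly, while Viète’s product gains accuracy much faster.  A small Python sketch (function names hypothetical) that truncates each formula and compares the result with `math.pi`:

```python
import math

def leibniz(terms):
    """4 * (1 - 1/3 + 1/5 - ...), truncated after `terms` terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

def viete(factors):
    """2 / (sqrt(2)/2 * sqrt(2 + sqrt(2))/2 * ...), truncated."""
    product, radical = 1.0, 0.0
    for _ in range(factors):
        radical = math.sqrt(2 + radical)  # sqrt(2), sqrt(2 + sqrt(2)), ...
        product *= radical / 2
    return 2 / product

def wallis(pairs):
    """2 * (2/1 * 2/3) * (4/3 * 4/5) * ..., truncated after `pairs` pairs."""
    product = 1.0
    for k in range(1, pairs + 1):
        product *= (2 * k / (2 * k - 1)) * (2 * k / (2 * k + 1))
    return 2 * product

for name, value in [("Leibniz", leibniz(100_000)),
                    ("Viete", viete(20)),
                    ("Wallis", wallis(100_000))]:
    print(f"{name:<8} {value:.12f}  error {abs(value - math.pi):.1e}")
```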

Table exercises!

1. Use geometry to show that 2\sqrt{2}<\pi<4. These bounds are not as good as those of Archimedes, but they are easier to derive.

2. (Harder!) Explain why \pi is irrational, i.e. why it cannot be written as a fraction p/q where p and q are integers.

2. e (see here for more).

e, a.k.a. Euler’s number, a.k.a. Napier’s Constant, is an irrational number of fundamental importance. While it lacks the general public awareness of a number like \pi, I assure you it is no less charming. Typically defined as the limit

e:=\lim_{n\rightarrow\infty}\left(1+\frac{1}{n}\right)^{n},

e enjoys many other identities, including

e=1+\frac{1}{1!}+\frac{1}{2!}+\frac{1}{3!}+\frac{1}{4!}+\ldots,

and

e=\lim_{n\rightarrow\infty}\frac{n}{\sqrt[n]{n!}}.
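These three descriptions of e can be compared numerically.  In the Python sketch below (function names hypothetical), the third function uses `math.lgamma(n + 1)`, which equals ln(n!), to avoid computing a huge factorial directly.

```python
import math

def e_limit(n):
    """(1 + 1/n)^n for a finite n."""
    return (1 + 1 / n) ** n

def e_series(terms):
    """1 + 1/1! + 1/2! + ..., truncated after `terms` terms."""
    return sum(1 / math.factorial(k) for k in range(terms))

def e_factorial_root(n):
    """n / (n!)^(1/n), computed via logarithms to avoid overflow."""
    return n / math.exp(math.lgamma(n + 1) / n)

print(e_limit(10**6), e_series(20), e_factorial_root(10**6), math.e)
```

The series converges very quickly; the other two approach e much more slowly.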

e is also the base of the exponential function e^{x}, which is unique among exponential functions because it is equal to its own derivative.

Table exercises!

1. Use one of the identities above to verify that e < 3.

2. Use one of the identities above to verify that e is irrational, i.e. that it cannot be written as a ratio p/q where p and q are integers.

3. Suppose each of you has brought a hat to this wedding. Everyone leaves their hat inside, and when people leave, they can’t be bothered to search for the hat they brought, and simply take one from the hat pile at random. Show that the probability that nobody ends up with the hat they came in with tends to 1/e as the number of people increases.
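If you’d rather check the hat claim than prove it, here is a Python sketch (helper names hypothetical).  One function counts exactly, by brute force over all n! ways to hand out the hats, how often nobody gets their own hat back; a Monte Carlo version shuffles the hat pile for larger n.  Both should hover near 1/e ≈ 0.3679.

```python
import itertools
import math
import random

def prob_nobody_gets_own_hat(n):
    """Exact probability, by brute force over all n! hat assignments."""
    good = sum(
        all(hat != person for person, hat in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    return good / math.factorial(n)

def simulate(n, trials=50_000, seed=1):
    """Monte Carlo estimate: shuffle the hat pile and look for matches."""
    rng = random.Random(seed)
    hats = list(range(n))
    hits = 0
    for _ in range(trials):
        rng.shuffle(hats)
        hits += all(hat != person for person, hat in enumerate(hats))
    return hits / trials

for n in range(1, 9):
    print(n, round(prob_nobody_gets_own_hat(n), 6))
print("simulated, 20 guests:", simulate(20), " 1/e:", 1 / math.e)
```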

3. \zeta(3) (see here for more).

Take all the perfect cubes (1^{3}=1, 2^{3}=8, 3^{3}=27, and so on), take the reciprocals of all those perfect cubes, and add them all together. You will end up with a number that is sometimes called Apéry’s constant, and is written

\zeta(3) = 1+\frac{1}{2^{3}}+\frac{1}{3^{3}}+\frac{1}{4^{3}}+\ldots \approx 1.202\ldots .

The constant is named in honor of Roger Apéry, who proved in 1978 that this number is irrational. Intuitively, one can interpret \frac{1}{\zeta(3)} as the probability that three randomly chosen whole numbers will have no prime factors in common.

One can consider more general numbers as well. For example, for any whole number k bigger than 1, the sum

\zeta(k)=1+\frac{1}{2^{k}}+\frac{1}{3^{k}}+\frac{1}{4^{k}}+\ldots

will yield some finite value. When k is even, one has nice formulas for the values, for instance \zeta(2)=\frac{\pi^{2}}{6}, \zeta(4)=\frac{\pi^{4}}{90}.
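Partial sums make these values easy to check.  A Python sketch (the helper name is hypothetical); note that the sum for k = 2 converges slowly, with an error of roughly 1/N after N terms.

```python
import math

def zeta_partial(k, terms):
    """Partial sum 1 + 1/2^k + ... + 1/terms^k."""
    return sum(1 / n**k for n in range(1, terms + 1))

print(zeta_partial(2, 100_000), math.pi**2 / 6)
print(zeta_partial(4, 10_000), math.pi**4 / 90)
print(zeta_partial(3, 100_000))      # Apery's constant, about 1.202
print(1 / zeta_partial(3, 100_000))  # about 0.832
```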

In fact, the definition of \zeta(k) can be extended to every complex number k other than 1. The function one gets is called the Riemann zeta function, and lies at the center of one of the most famous unsolved problems in mathematics.

Table exercises!

1. Show that \zeta(1)=\infty.

2. Given that \zeta(2)=\frac{\pi^{2}}{6}, show that 1+\frac{1}{3^{2}}+\frac{1}{5^{2}}+\frac{1}{7^{2}}+\ldots=\frac{\pi^{2}}{8}.

4. \gamma (see here for more).

\gamma, a.k.a. the Euler-Mascheroni constant (not to be confused with Euler’s number e), is perhaps best introduced geometrically. Consider the following figure:

The black portion of the area pictured above is found by drawing rectangles between two integers n and n + 1 with height 1/n (the rectangle between 1 and 2 has height 1, the rectangle between 2 and 3 has height 1/2, and so on), and subtracting the area under the graph of the function y = 1/x.  The total black area, if this picture were extended out to infinity, would be the number \gamma.

\gamma can be approximated by its decimal expansion, \gamma\approx0.5772\ldots, and while this number comes up quite naturally in number theory and mathematical statistics, surprisingly little is known about it. For example, it is unknown whether or not \gamma is a rational number (unlike constants such as \pi or e, which are known to be irrational).
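The figure’s construction translates directly into code.  After n rectangles, the black area is 1 + 1/2 + ... + 1/n minus the area under y = 1/x from 1 to n + 1, which is ln(n + 1).  A minimal Python sketch (the helper name is hypothetical), using `math.fsum` for an accurately rounded sum:

```python
import math

def black_area(n_rectangles):
    """Area of the first n rectangles (heights 1, 1/2, ..., 1/n, width 1)
    minus the area under y = 1/x from x = 1 to x = n + 1."""
    rectangles = math.fsum(1 / k for k in range(1, n_rectangles + 1))
    under_curve = math.log(n_rectangles + 1)  # integral of 1/x from 1 to n+1
    return rectangles - under_curve

for n in (10, 1_000, 1_000_000):
    print(n, black_area(n))
# For comparison, gamma = 0.5772156649...
```

The area grows toward \gamma from below, since each new rectangle adds a positive sliver.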

Table exercises!

1. Using geometry and the figure above, show that \gamma>\frac{1}{4}+\frac{1}{12}+\frac{1}{24}+\frac{1}{40}+\ldots.

2. Show that the sum on the right hand side of the inequality in the first exercise equals \frac{1}{2}, so that \gamma>\frac{1}{2}.

5. \infty (see here for more).

\infty is a concept of central importance in mathematics, and ergo, a concept of central importance in all things. While the figure-eight symbol for infinity is known and loved by all, it was not introduced until the year 1655, though many ancient cultures grappled with the idea of the infinite.

Though \infty may seem like a single idea, great minds have shown that not all infinities are created equal. For example, the mathematician Georg Cantor showed that even though there are infinitely many whole numbers, and there are infinitely many real numbers, there are (in a sense that can be made rigorous) infinitely many more real numbers than whole numbers.

On a related note, the love Matt and Meg feel for you all for standing with them on this day is undoubtedly infinite. How this compares to their love for one another, however, is a problem that has yet to be investigated.

Table exercises!

1. Show that there are infinitely many prime numbers.

2. How does the number of even integers compare to the number of integers? Are there more of one type of number?

3. Suppose a set is finite with N elements. Show that the set of subsets of the original set is finite with 2^{N} elements.

6. \varphi (see here for more).

Suppose two line segments have lengths a and b, with a larger than b. If the ratio of a to b is the same as the ratio of a + b to a, this ratio is called the golden ratio, and is written \varphi. In other words,

\varphi=\frac{a}{b} = \frac{a+b}{a} = 1 + \frac{1}{\varphi}.

This, in turn, implies that \varphi^{2}-\varphi-1=0, or (by the quadratic formula)

\varphi=\frac{1+\sqrt{5}}{2}\approx1.618\ldots.

The golden ratio has a rich history, both mathematically and artistically. It is also closely related to the Fibonacci sequence, the sequence of numbers whose first two terms are 0 and 1, and where all subsequent terms are found by adding the previous two terms. In other words, the sequence begins 0,1,1,2,3,5,8,13,\ldots. If we let F_{n} denote the n^{th} Fibonacci number (so F_{0}=0, F_{7}=13, and so on), then

\varphi=\lim_{n\rightarrow\infty}\frac{F_{n+1}}{F_{n}}.

Table exercises!

1. Show why the above limit formula for \varphi is true.

2. Show that F_{n}=\frac{\varphi^{n}-(1-\varphi)^{n}}{\sqrt{5}}.

3. Show that for any n, F_{0}+F_{1}+F_{2}+\ldots+F_{n}=F_{n+2}-1.
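A quick numerical sanity check of the limit and of the two identities in the exercises, for small n only (so this is a check rather than a proof).  The helper name below is hypothetical.

```python
import math

def fibonacci(count):
    """Return [F_0, F_1, ..., F_{count-1}] with F_0 = 0 and F_1 = 1."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]

phi = (1 + math.sqrt(5)) / 2
F = fibonacci(40)

# The ratio of consecutive terms approaches phi.
print(F[39] / F[38], phi)

for n in range(30):
    # Binet's formula (exercise 2), rounded to the nearest integer.
    binet = (phi**n - (1 - phi) ** n) / math.sqrt(5)
    assert round(binet) == F[n]
    # Partial-sum identity (exercise 3).
    assert sum(F[: n + 1]) == F[n + 2] - 1
```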

7. \Lambda (see here for more).

The de Bruijn-Newman constant, the value of which is currently unknown, is intimately connected to the Riemann Hypothesis. There exists a class of functions H_{t}(x), one for each real number t. H_{0}(x) is essentially the Riemann \xi function, and in particular, the Riemann Hypothesis is true if and only if H_{0}(x) has only real zeros.

Here are some properties of the family of functions H_{t}(x):

1. H_{t}(x) has only real zeros for any t\geq1/2.

2. If H_{t}(x) has only real zeros, then for any t^{\prime}\geq t, H_{t^{\prime}}(x) has only real zeros too.

3. There exists a real value t_{*} such that H_{t_{*}}(x) has at least one non-real zero.

These properties combine to show the existence of a constant \Lambda, lying somewhere in the range -\infty<\Lambda\leq1/2, such that H_{t}(x) has only real zeros if and only if t\geq\Lambda. This is how the de Bruijn-Newman constant is defined. Moreover, the Riemann Hypothesis is equivalent to the statement that \Lambda\leq0.

The current best estimates for \Lambda state that

-2.7\times10^{-9}<\Lambda\leq1/2,

so if the Riemann Hypothesis is true, it is, in some sense, “just barely” true. In particular, it’s possible that \Lambda=0, in which case you are really just sitting at the 0 table. But while your table may be marked as such, you should know that none of you are zeros in our hearts.

Table exercises!

1. Prove or disprove the Riemann Hypothesis.

8. i (see here for more).

i, often described as the square root of -1, is defined to be one of the two solutions to the equation x^{2}=-1 (the other solution being -i).

While this might seem like an arbitrary construction, in the larger context of history, it makes perfect sense. Just as the whole numbers are perfectly good for solving basic counting problems, but may be insufficient for problems involving debts or losses (where negative numbers play a prominent role), or problems involving rates or ratios (where fractions take the spotlight), the extension of numbers to include i leads to a wide variety of applications. These include (but are not limited to) applications in electrical engineering, signal processing, and fluid dynamics.

i is also one of the key ingredients in Euler’s identity, one of the most popular formulas in mathematics. This formula states that e^{i\pi}+1=0, and is noted for its unification of five constants of fundamental importance in mathematics: e, \pi, i, 1 and 0.
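Python’s built-in complex numbers make Euler’s identity easy to test numerically.  Because \pi is stored only approximately, the result is zero only up to rounding error (about 10^{-16}).  The sketch below also builds the first few powers of i by repeated multiplication.

```python
import cmath

# e^(i*pi) + 1: zero up to floating-point rounding.
print(abs(cmath.exp(1j * cmath.pi) + 1))

# Powers i^1 through i^8, built by repeated multiplication.
powers = []
z = 1 + 0j
for _ in range(8):
    z *= 1j
    powers.append(z)
print(powers)
```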

Table exercises!

1. Show that i^{n}=1 whenever n is divisible by 4.

2. Find all x satisfying the equation x^{4}-1=0.

3. The set of complex numbers is defined as the set of all a + bi, where a and b are real numbers. 1 + i is a complex number, as is \sqrt{2}-7i. Can you define an addition law on the set of complex numbers? A multiplication law?

9. \rho (see here for more).

The plastic constant \rho can be viewed as a cousin to the golden ratio \varphi (see the \varphi table for more information). Formally, \rho is equal to the real root of the equation x^{3}=x+1. The value of \rho is

\rho=\sqrt[3]{\frac{1}{2}+\frac{1}{6}\sqrt{\frac{23}{3}}}+\sqrt[3]{\frac{1}{2}-\frac{1}{6}\sqrt{\frac{23}{3}}}\approx1.3247\ldots.

Just as the golden ratio is intimately related to the Fibonacci sequence, the plastic constant is related to a sequence known as the Padovan sequence. The first three numbers in the Padovan sequence are given by

P_{0}=P_{1}=P_{2}=1,

and the nth term is given by adding two earlier terms in the sequence:

P_{n}=P_{n-2}+P_{n-3}.

For example, the first few terms in the sequence are given by 1,1,1,2,2,3,4,5,7,9,\ldots.

One can similarly construct a sequence known as the Perrin sequence. This sequence is similar to the Padovan sequence, but in this case, the equations needed to get started are A_{0}=3,A_{1}=0,A_{2}=2,A_{n}=A_{n-2}+A_{n-3}. In either case,

\lim_{n\rightarrow\infty}\frac{A_{n+1}}{A_{n}}=\rho=\lim_{n\rightarrow\infty}\frac{P_{n+1}}{P_{n}}.
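The formula for \rho and the two limits can be checked numerically.  In this Python sketch (helper names hypothetical), `recurrence` generates either sequence from its three starting values using the shared rule a_n = a_{n-2} + a_{n-3}.

```python
import math

def real_cbrt(x):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(x) ** (1 / 3), x)

s = math.sqrt(23 / 3) / 6
rho = real_cbrt(0.5 + s) + real_cbrt(0.5 - s)

def recurrence(first_three, count):
    """First `count` terms of a(n) = a(n-2) + a(n-3)."""
    seq = list(first_three)
    while len(seq) < count:
        seq.append(seq[-2] + seq[-3])
    return seq

padovan = recurrence([1, 1, 1], 200)
perrin = recurrence([3, 0, 2], 200)

print(rho, rho**3 - rho - 1)  # the second value should be about 0
print(padovan[-1] / padovan[-2], perrin[-1] / perrin[-2])
```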

Table exercises!

1. Show why the limit formulas given above are true.

2. Show that the first few terms of the Perrin sequence are 3,0,2,3,2,5,5,7,10,\ldots.

3. Show that if p is a prime number, A_{p} is divisible by p.

10. \sqrt{2} (see here for more).

Along with \pi, \sqrt{2} is probably the best-known number on display here. While it may seem mundane, \sqrt{2} has an interesting mathematical history, notably because it was one of the first examples of an irrational number (i.e. a number that cannot be expressed as a fraction p/q where p and q are both integers). An early proof of this fact is attributed to the Greek thinker Hippasus, a follower of Pythagoras; legend has it that when he discovered \sqrt{2} was irrational, the result was so controversial that he was thrown out to sea by his colleagues and drowned.

These days, mathematics is (for the most part) less fraught with peril. The following elegant identities involving \sqrt{2} have been met with much less controversy:

\sqrt{2} = 1 + \frac{1}{2+\frac{1}{2+\frac{1}{2+\ldots}}},

\sqrt{2} = \left ( 1+\frac{1}{1} \right )\left ( 1-\frac{1}{3} \right )\left ( 1+\frac{1}{5} \right )\ldots,

\sqrt{2}^{\sqrt{2}^{\sqrt{2}^{\sqrt{2}^{\ldots}}}} = 2.
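Here is a Python sketch (function names hypothetical) that truncates each of the three expressions.  The continued fraction and the power tower settle down quickly, while the alternating product creeps toward \sqrt{2} much more slowly.

```python
import math

def continued_fraction(depth):
    """1 + 1/(2 + 1/(2 + ...)), truncated after `depth` twos."""
    tail = 2.0
    for _ in range(depth - 1):
        tail = 2 + 1 / tail
    return 1 + 1 / tail

def alternating_product(factors):
    """(1 + 1/1)(1 - 1/3)(1 + 1/5)(1 - 1/7)..., truncated."""
    product = 1.0
    for k in range(factors):
        product *= 1 + (-1) ** k / (2 * k + 1)
    return product

def power_tower(height):
    """sqrt(2) ** sqrt(2) ** ..., evaluated from the top down."""
    value = math.sqrt(2)
    for _ in range(height - 1):
        value = math.sqrt(2) ** value
    return value

print(continued_fraction(20), alternating_product(200_000), math.sqrt(2))
print(power_tower(100))
```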

Table exercises!

1. Prove that \sqrt{2} is irrational (make sure you are removed from any large bodies of water).

2. Try to prove the identities written above.

3. For which whole numbers m is \sqrt{m} a rational number?

Enjoy the table exercises!

Wedding Planning and the Ménage Problem

Last month I wrote a wedding-themed post on some statistics behind the show Four Weddings.  Now, fully refreshed from my own two-week honeymoon, I would like to take some time to discuss some other areas of intersection between weddings and mathematics.

One of the things I most looked forward to during the planning of our wedding was the determination of the seating chart.  Searching for an optimal arrangement given people’s preferences to sit next to their friends and away from their enemies was a fun little challenge.  In the end, though, perhaps I made things too easy on myself.  Although I assigned people to specific tables, I did not assign seats within the tables themselves.  Instead, people were free to sit however they chose once they found their table.

An example of our seating. Hat tip to Dave Gilbert for the shot!

Were I truly a glutton for punishment, I would have assigned seats at each table.  And were I a traditionalist, I would have arranged the seating so that couples were separated, and genders alternated.  In other words, more traditional etiquette would dictate that at each table, the seats alternate from boy to girl to boy to girl (and so on), so that no two people who sit next to each other are in a relationship.  While these additional restrictions would have complicated my job, they would have also opened the door to some nice mathematics.  More specifically: if I had imposed these restrictions, in how many ways could I have seated people at a particular table?

Though our wedding was recent, this problem dates back to the 19th century.  It is now commonly referred to as the “Ménage problem,” or, less titillatingly, the “Married Couples Problem.”  The solution to this problem is a formula which tells us in how many ways n couples can be seated at a round table so that (a) the genders alternate, and (b) no two people in a couple are seated next to each other.  A derivation of the formula can be found here; the conclusion is that the number of ways to seat n couples at a round table in a way that would satisfy Miss Manners, if denoted by the symbol M_{n}, is given by the formula

M_{n}=2\cdot n!\sum_{k=0}^{n}(-1)^{k}\frac{2n}{2n-k}\binom{2n-k}{k}(n-k)!.

(As outlined in earlier posts, \binom{n}{k} = \frac{n!}{k!(n-k)!}, and n! denotes the product of all the whole numbers from 1 up to n.) In the case of our wedding, where each table could seat 10 people (or 5 couples), this amounts to a total of

M_{5} = 2\cdot 5!\left ( 5!-10\cdot 4!+35\cdot 3! - 50\cdot 2!+ 25\cdot 1! - 2 \cdot 0!\right )

= 240(120-240+210-100+25-2)

= 28800 - 57600 + 50400 - 24000 + 6000 - 480 = 3120.

In other words, there are 3,120 ways to seat 5 couples at a round table subject to the restrictions (a) and (b) given above.
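As a quick sanity check, here is a minimal Python sketch of the formula for M_{n}. It assumes, as the counts in this post do, that the seats are distinguishable, so two seatings that differ only by a rotation are counted separately. The function name `menage_count` is just a label chosen for this example.

```python
from math import comb, factorial

def menage_count(n: int) -> int:
    """Seatings of n couples at a round table with 2n labeled seats, genders
    alternating and no couple side by side (the formula for M_n above)."""
    total = 0
    for k in range(n + 1):
        # 2n/(2n-k) * C(2n-k, k) is always a whole number (it counts ways to pick
        # k non-overlapping pairs of neighboring seats), so integer division is exact.
        pairs = 2 * n * comb(2 * n - k, k) // (2 * n - k)
        total += (-1) ** k * pairs * factorial(n - k)
    return 2 * factorial(n) * total

print(menage_count(5))  # 3120
```

Working in integers throughout means no floating-point rounding can creep into the count.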

Ms. Manners loves math.

Why does this formula work?  While a rigorous proof is provided at the page linked above, let me try to give a discussion that is less concerned with the details.  First, suppose you ignored condition (b) and only wanted to count the number of ways you could seat people at a table so that men and women alternated.  There are two ways to start the alternation, since the first seat can be filled by either a man or a woman; once this choice has been made, there are n! ways to seat the men, and n! ways to seat the women, so the total number of ways to seat people is 2\cdot (n!)^{2}. In particular, for 5 couples, 5! = 120, so this expression yields a value of 2 x 120 x 120 = 28,800. Note that this number, not coincidentally, appears in the calculation above.
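If you would rather not take any of this on faith, a computer can simply list every alternating seating and check condition (b) directly. The sketch below uses a hypothetical helper, `brute_force_counts`, and assumes couple i consists of man i and woman i, with seats numbered 0 through 2n - 1 around the table. For 5 couples it walks through all 28,800 alternating seatings, which should only take a moment.

```python
from itertools import permutations

def brute_force_counts(n: int) -> tuple[int, int]:
    """Return (number of alternating seatings, number that also keep every couple apart)
    for n couples at a round table with 2n labeled seats (0, 1, ..., 2n - 1)."""
    seats = 2 * n
    evens, odds = range(0, seats, 2), range(1, seats, 2)
    alternating = 0
    valid = 0
    # Two ways to start the alternation: men in the even seats, or men in the odd seats.
    for men_seats, women_seats in ((evens, odds), (odds, evens)):
        for men in permutations(range(n)):          # n! ways to seat the men
            for women in permutations(range(n)):    # n! ways to seat the women
                couple_at = [0] * seats             # couple number of whoever sits in seat s
                for s, m in zip(men_seats, men):
                    couple_at[s] = m
                for s, w in zip(women_seats, women):
                    couple_at[s] = w
                alternating += 1
                # Seat s is next to seat s + 1 (wrapping around the table). Since genders
                # alternate, two neighbors are a couple exactly when their couple numbers match.
                if all(couple_at[s] != couple_at[(s + 1) % seats] for s in range(seats)):
                    valid += 1
    return alternating, valid

print(brute_force_counts(5))  # (28800, 3120)
```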

28,800 is much larger than the true answer of 3,120.  The reason, of course, is that we have ignored the restriction that no couples sit together, and we have therefore over-counted the number of possible configurations.  So let us now try to remedy the situation.  One thing we can do is try to subtract out the number of ways we can have a seating arrangement with one couple seated together – by eliminating these cases from our count, we should end up with a more reliable count, right?

How many ways can we arrange things so that a couple is guaranteed to sit together?  Well, first we must choose the couple (5 ways to do this).  Then we must decide how to start the male/female alternation of seats, just as in the previous case (2 ways to do this).  Then we must choose where to seat the couple we have chosen to put together (10 ways to do this, since a round table with 10 seats has 10 pairs of adjacent seats), and from there we must decide where to seat the remaining four men (4! ways to do this) and four women (4! ways to do this).  Therefore, the number we want to subtract out (by the fundamental counting principle) is 5 x 2 x 10 x 4! x 4! = 57,600.  Notice the appearance of this number in our earlier calculation as well.

But this can’t be right, because 28,800 – 57,600 is negative!  While we’ve subtracted, it turns out we’ve subtracted too much.  What we’re witnessing here is an example of the inclusion-exclusion principle, and this principle is what lies behind the alternating positive and negative signs in the expression 28800 – 57600 + 50400 – 24000 + 6000 – 480.

As it turns out, what we need to do is add in the number of ways we can arrange things so that two couples are guaranteed to sit together (50,400 ways).  This will once again lead to a number that is too high, so we must subtract out the number of ways we can arrange things so that three couples are guaranteed to sit together (24,000 ways).  This, in turn, will lead to a number that is too low, so we must add in the number of ways we can arrange things so that four couples are guaranteed to sit together (6,000 ways), and, as it turns out, this number will be too high, so we must finally subtract out the number of ways we can arrange things so that all five couples are guaranteed to sit together (480 ways).  Only then will we have the right answer.
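The alternating sum can be reproduced term by term. In this sketch (the helper name `inclusion_exclusion_terms` is made up for the example), the k-th term is 2\cdot n!\cdot\frac{2n}{2n-k}\binom{2n-k}{k}(n-k)!, that is, the total, over every choice of k couples, of the alternating seatings in which those k couples sit together.

```python
from math import comb, factorial

def inclusion_exclusion_terms(n: int) -> list[int]:
    """k-th entry: 2 * n! * [2n/(2n-k)] * C(2n-k, k) * (n-k)!, the number of
    alternating seatings that force k chosen couples together, summed over all
    choices of k couples (seats are treated as labeled)."""
    terms = []
    for k in range(n + 1):
        pairs = 2 * n * comb(2 * n - k, k) // (2 * n - k)  # exact: always a whole number
        terms.append(2 * factorial(n) * pairs * factorial(n - k))
    return terms

terms = inclusion_exclusion_terms(5)
print(terms)  # [28800, 57600, 50400, 24000, 6000, 480]
# Inclusion-exclusion: add, subtract, add, ... as described above.
print(sum((-1) ** k * t for k, t in enumerate(terms)))  # 3120
```

The k = 1 entry matches the 5 x 2 x 10 x 4! x 4! count worked out by hand above.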

The inclusion-exclusion principle can be understood in certain cases with the use of Venn diagrams (see the linked Wikipedia article for more information).

Of course, we live in the 21st century – it’s not necessarily true that everyone will come to a wedding with a date, and for those who do, there is no guarantee that the date will be of the opposite sex!  In fact, if one has a table with an equal number of same-sex male couples and same-sex female couples, any table arrangement in which genders alternate will automatically fulfill the condition that no couples sit next to each other.  This raises a question, which I will leave you with for now: suppose a round table is to be arranged so that it will sit n same-sex male couples, n same-sex female couples, and m heterosexual couples.  In how many ways can the seats be arranged so that conditions (a) and (b) written above remain satisfied?

Four Weddings and Some Statistics

When my fiancée was in the midst of the wedding planning, part of her research (or perhaps it was simply a guilty pleasure) involved watching wedding shows on basic cable.  For those of you who have not had the pleasure, between stations like WE tv and TLC, there are no fewer than nine different wedding-themed reality shows airing weekly.  Many of them are appealing in a rubbernecking sort of way; much like a car crash, the spectacle is too ridiculous to turn away from (I’m looking at you, My Big Fat Gypsy Wedding).

Of all of these shows, though, the one that most piques my mathematical interest is TLC’s Four Weddings.  Based on a British show with the same name, the premise is as follows: four brides-to-be, unknown to one another, meet and attend each other’s weddings.  When one bride gets married, the other three score various aspects of the wedding, and the bride with the highest score among the four wins a honeymoon (contingencies are in place in the event of a tie, though these are not always explained and seem to vary from season to season).  In order to make for good TV, the show frequently manages to bring out the worst aspects of these women, as they nitpick and pass judgment on everyone else’s wedding.  Here’s a short clip to give you a taste of what this show is all about:

How are the weddings scored?  This process is explained in detail during the course of each episode.  The wedding is broken down into four categories: dress, venue, food, and overall experience.  For overall experience, the other three brides in attendance give a score from 1 to 10 (though I’ve never seen a bride give another wedding a 10).  For the rest of the categories, though, the brides can only rank the weddings as being 1st, 2nd, or 3rd in the given category.  1st place gets 10 points, 2nd place gets 6, and 3rd gets 3.  The total possible number of points a wedding can score is therefore 120.  At the end of each episode, the scores for each bride are broken out in detail.  The wedding budget and headcount are provided as well.
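To see where the 120-point ceiling comes from, here is a small Python sketch of the scoring rules as described above: three ranked categories worth 10, 6, or 3 points, plus an overall score from 1 to 10, from each of the three other brides. The ballot layout (one dictionary per judging bride) and the name `wedding_score` are assumptions made for illustration, not anything the show publishes.

```python
RANK_POINTS = {1: 10, 2: 6, 3: 3}           # 1st, 2nd, 3rd place
RANKED_CATEGORIES = ("dress", "venue", "food")

def wedding_score(ballots: list[dict]) -> int:
    """Total one wedding's points from the other brides' ballots. Each ballot maps
    a ranked category to a place (1, 2 or 3) and 'overall' to a score from 1 to 10
    (a hypothetical data layout for this sketch)."""
    total = 0
    for ballot in ballots:
        for category in RANKED_CATEGORIES:
            total += RANK_POINTS[ballot[category]]
        overall = ballot["overall"]
        if not 1 <= overall <= 10:
            raise ValueError("overall score must be between 1 and 10")
        total += overall
    return total

# A perfect night: all three judging brides give 1st place everywhere and a 10 overall.
perfect = [{"dress": 1, "venue": 1, "food": 1, "overall": 10}] * 3
print(wedding_score(perfect))  # 120
```

The ceiling is (3 ranked categories x 10 points + 10 overall points) x 3 judges = 120.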

After watching a few episodes, it seemed like the best way to ensure a win was to simply outspend your competitors.  Certainly a large budget can help improve the guest experience or score a hip dress, but I was curious as to what overall trend (if any) could be found between, say, money spent on a wedding and the wedding’s overall score.

With this noble goal in mind, I proceeded to DVR 28 episodes of this show.  After recording the scores for 112 weddings, some of my questions were as follows: is the amount a person spends on a wedding correlated with the score they receive from the other competitors?  What about the amount a person spends per guest?  Finally, did the most expensive wedding win more frequently than would be expected by pure chance?

Let’s look at some data.  Here is a scatterplot of each wedding’s budget vs. the total points earned.


As the dots suggest, there is a slight positive correlation between the amount one spends on a wedding, and the score one receives from one’s fellow competitors.  The coefficient of correlation here is approximately 0.286, though if we discard the two $150,000 weddings, this improves to around 0.348.
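For readers who want to try this kind of analysis on their own data, here is a minimal sketch of the correlation coefficient, assuming the usual Pearson coefficient, together with a hypothetical helper for dropping the most expensive weddings before recomputing it. I have not reproduced the 112 recorded weddings here, so the numbers in the example are made up; on Python 3.10 or later, `statistics.correlation` computes the same coefficient.

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and spreads are left unscaled, since the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def drop_budgets_at_or_above(budgets, scores, cutoff):
    """Keep only the (budget, score) pairs whose budget is below the cutoff."""
    kept = [(b, s) for b, s in zip(budgets, scores) if b < cutoff]
    return [b for b, _ in kept], [s for _, s in kept]

# Made-up example data (NOT the show's numbers): budgets in dollars, scores out of 120.
budgets = [20_000, 35_000, 50_000, 150_000, 150_000]
scores = [72, 80, 77, 95, 70]
print(round(pearson_r(budgets, scores), 3))
trimmed_budgets, trimmed_scores = drop_budgets_at_or_above(budgets, scores, 150_000)
print(round(pearson_r(trimmed_budgets, trimmed_scores), 3))
```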

Of course, a $10,000 wedding for 10 people might be a much nicer affair than a $10,000 wedding for 1,000 people.  If you spend more money per guest, does this translate into a higher score as well?  The dots don’t lie; here’s another scatterplot:


There is quite clearly an outlier in this set of data – this corresponds to a bride who spent $150,000 for a 120-person wedding, for a whopping $1,250 spent per guest (this particular bride did end up taking first place).  Eliminating this outlier, though, the correlation here is weaker than the correlation for actual cost, at a meager 0.098.  In other words, there may not be a linear relationship between cost per guest and total score.  This may be because certain fixed costs, such as the dress and the venue, won’t necessarily vary much with the guest total, unlike something like food.

This analysis, though, obscures a key point.  In order to win the honeymoon, you don’t necessarily need a high score; you only need a higher score than your three other competitors.  With this in mind, it may be simpler to just look at the scores episode-by-episode, and see how the amount spent on a wedding compares to the wedding’s relative ranking.

In the graph below, the blue bars count the number of times the most expensive wedding was given a certain rank.  The red bars count the number of times the wedding with the highest cost per guest was given a certain rank.  Note that if cost had no bearing on the rank, we would expect an equal number of weddings in each rank (in this case, about 7 per rank).


As you can see, spending the most money seems to bestow an advantage: fully 50% (14 out of 28) of the most expensive weddings were ranked first.  This would be unlikely if cost had no impact on ranking.
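How unlikely? One way to put a number on it is a binomial calculation, under a simplifying assumption of my own: if cost had no effect, the most expensive wedding in each of the 28 episodes would be equally likely to finish in any of the four places, independently from episode to episode. The sketch below computes the chance of 14 or more first-place finishes under that assumption.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# 28 episodes, chance 1/4 of first place per episode under the no-effect assumption.
p_value = binomial_upper_tail(28, 14, 0.25)
print(f"{p_value:.4f}")  # 0.0038
```

That is roughly 4 chances in 1,000, though the result leans heavily on the equal-odds and independence assumptions.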

The picture is a little murkier for cost per guest: the frequency for each ranking sticks pretty closely to 7, so it’s not clear that spending more per guest gives any advantage.

In conclusion, there is a small positive correlation between amount spent and score received, though this does not transfer to the amount spent per guest.  Compared only to one’s other three competitors, the amount spent appears to confer an even greater advantage, so if you are on this show and want to show your competitors that you are better than them, my advice would be to simply outspend them.  If all you’re interested in is a nice vacation, though, it may be cheaper to just stick to your budget, and plan a vacation on your own.

Batman Interlude

Hi everyone.  Apologies for flying under the radar lately.  I am getting married soon, and along with life’s usual habit of getting in the way, preparations are surprisingly time consuming.

Having said that, I have a couple of articles in the pipeline specifically addressing the intersection of mathematics and weddings (the intersection is non-empty, I assure you).  In the meantime, if you’re looking for a mathematical fix, you need look no further than this link, which gives an explicit function whose graph bears a striking resemblance to the Batman logo.  Mathematicians who need to contact crime fighters need no longer live in fear.

Na-na-na-na Na-na-na-na MATH GRAPHS!!!!

Want to see your favorite superhero’s logo memorialized in the Cartesian coordinate plane?  Give it a shot!

I’ll be back soon with some more substantial content.  Hat tip to Nate for the link to this crime-reducing function.

Math Jams

Sorry I’m so late to the party on this one, but I wanted to draw your attention to this NPR article from a couple of months back.  It profiles the “Songwriter in Residence” program at the University of Tennessee’s National Institute for Mathematical and Biological Synthesis (or NIMBioS if you feel like spitting a bunch of letters out of your mouth).  The experimental program hires songwriters for one-month stints at the Institute, during which time they work with researchers to develop two songs on current scientific/mathematical research.  Here’s one of the residents performing a song on sexual selection:

While combining the arts with the sciences is nothing new, it’s cool to see a program embrace the intersection of these disciplines with such gusto.  Of course, it can be difficult to squeeze educational content out of a song with a science focus, but if School House Rock has taught me anything, it is that education and fly jams need not be mutually exclusive.  If you feel, however, that NIMBioS’s song on sexual selection doesn’t quite make the cut on the education front, here are some other math and science songs to ease you through your hump day.

Here is “The First and Second Law” by Flanders and Swann (this one is highly recommended), a song about thermodynamics:

The University of Tennessee isn’t the only school to join songwriting and science.  Here’s an evolutionary jam courtesy of the University of Chicago:

For the mathematically inclined, there isn’t much that can beat “Finite Simple Group (of Order Two)”:

Though, if those jokes don’t make much sense, one can always listen to muppets singing about math instead. Who knew there was math in tube socks?

(Hat tip to Meg for the NPR link!)

More Shameless Self-Promotion

Hi all.  As a small gift for you going into this weekend, here’s a link to an article from The Numbers Guy at the Wall Street Journal.  I was one of several people interviewed for my thoughts on the preponderance of math holidays that have been in the news recently.  If you’ve been reading this blog for a while, you will already know my general feelings towards these holidays.  More details, though, can be found here or here.  If you’re curious, you can probably find other articles in which I jump on the soapbox.

I’ll be back next week with something more substantive.  In the meantime, enjoy your weekend, and if you’re in Los Angeles, Happy Carmageddon!

Some Readership Statistics

This week marks the third anniversary of Math Goes Pop!  As such, I thought it might be appropriate to engage in a bit of navel-gazing.  But since I can gaze at my own navel whenever I please, I’d like to flip the script, as it were, and turn my attention towards the collective navels of my readership.

Our cat's third birthday is also this week. It is unclear which event he is celebrating, although the dilated pupils suggest he is celebrating a bit too hard.

I’d like to share with you some data on the geographic distribution of my US readers.  While there is a large California bias, people from all over the country seem to have stumbled upon this corner of the internet, and have hopefully enjoyed their time here.

This represents you, gentle reader. Darker green means more viewers.

Of course, a California bias shouldn’t be all that surprising.  After all, California is the most populous state in the country, accounting for roughly 12% of the country’s total population, according to this 2010 Census data.  A more interesting thing to think about, then, is not the map pictured above, but how the map contrasts with actual population data for each state.  For example, New York is a hair darker than Texas on this map, even though Texas has a larger share of our population: roughly 8% as compared to 6% for New York.

One way to compare population data with Math Goes Pop visitor data is through rankings.  This table compares the rankings, and shows the relative difference for each state (feel free to play around with the data to your heart’s content):

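State | View Proportion Rank | Population Proportion Rank | Rank Difference
California | 1 | 1 | 0
New York | 2 | 3 | +1
Michigan | 3 | 8 | +5
Texas | 4 | 2 | -2
Illinois | 5 | 5 | 0
Pennsylvania | 6 | 6 | 0
New Jersey | 7 | 11 | +4
Massachusetts | 8 | 14 | +6
Florida | 9 | 4 | -5
Virginia | 10 | 12 | +2
Washington | 11 | 13 | +2
North Carolina | 12 | 10 | -2
Ohio | 13 | 7 | -6
Maryland | 14 | 19 | +5
Georgia | 15 | 9 | -6
Missouri | 16 | 18 | +2
Connecticut | 17 | 29 | +12
Minnesota | 18 | 21 | +3
Oregon | 19 | 27 | +8
Arizona | 20 | 16 | -4
Colorado | 21 | 22 | +1
Wisconsin | 22 | 20 | -2
Tennessee | 23 | 17 | -6
Indiana | 24 | 15 | -9
Washington DC | 25 | 50 | +25
South Carolina | 26 | 24 | -2
Louisiana | 27 | 25 | -2
Kentucky | 28 | 26 | -2
Iowa | 29 | 30 | +1
Utah | 30 | 34 | +4
Oklahoma | 31 | 28 | -3
Alabama | 32 | 23 | -9
Kansas | 33 | 33 | 0
Nebraska | 34 | 38 | +4
Arkansas | 35 | 32 | -3
Nevada | 36 | 35 | -1
Mississippi | 37 | 31 | -6
New Hampshire | 38 | 42 | +4
Rhode Island | 39 | 43 | +4
New Mexico | 40 | 36 | -4
Idaho | 41 | 39 | -2
Maine | 42 | 41 | -1
West Virginia | 43 | 37 | -6
Hawaii | 44 | 40 | -4
Vermont | 45 | 49 | +4
Alaska | 46 | 47 | +1
Montana | 47 | 44 | -3
Delaware | 48 | 45 | -3
North Dakota | 49 | 48 | -1
South Dakota | 50 | 46 | -4
Wyoming | 51 | 51 | 0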

The rankings give us some information: we see that Indiana and Alabama are not as well represented in readership as one might expect given their population rankings (both states have Math Goes Pop readership rankings 9 levels below their population rankings).  On the other hand, folks from DC, Connecticut, and Oregon are visiting this site more than would be expected based on population numbers alone; readership rankings for these states are 25, 12, and 8 levels above their population rankings, respectively.
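The Rank Difference column in the table is the population rank minus the readership (view) rank, so a positive number means a state ranks higher among readers than its population alone would suggest. Here is a quick sketch, with a hypothetical helper `rank_difference`, using the rows mentioned in this paragraph:

```python
def rank_difference(view_rank: int, population_rank: int) -> int:
    """Positive when a state ranks higher in readership than in population."""
    return population_rank - view_rank

# (state, view proportion rank, population proportion rank), copied from the table.
rows = [
    ("Washington DC", 25, 50),
    ("Connecticut", 17, 29),
    ("Oregon", 19, 27),
    ("Indiana", 24, 15),
    ("Alabama", 32, 23),
]
for state, view_rank, population_rank in rows:
    print(f"{state}: {rank_difference(view_rank, population_rank):+d}")
```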

But while the rankings tell us some things, they leave a great deal out.  For instance, while California is ranked first in both the proportion of the US population and the proportion of visitors to this site, the rankings tell us nothing about how these proportions compare to each other.  In fact, while California ranks first in both categories, its share of Math Goes Pop viewers is more than twice its share of the US population (26% of my viewers vs. 12% of the US population).

Comparison of the proportions in this way allows us to get a better understanding of how visitors to this site are distributed across the country, as compared to the overall population distribution.  While the population is distributed more heavily in California, the proportion of California Math Goes Pop visitors is greater than can be explained by simple population data.

If we compare the state-by-state readership proportions to overall population proportions, we get the following picture:


Big ups to our nation’s capital for having the largest share of viewership relative to its overall share of the country’s population.  In addition to DC, 10 states account for a larger share of readership than of the country’s population: California, Massachusetts, Michigan, New York, Connecticut, Maryland, Oregon, New Jersey, Washington, and Vermont.  For example, while Massachusetts accounts for roughly 2% of the country’s population, thus far it has accounted for nearly 3.5% of Math Goes Pop readership.
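One simple way to measure over- or underrepresentation is to divide a state’s share of readers by its share of the population, so that a ratio above 1 means the state is overrepresented. This is only a sketch of that idea (I am not claiming the chart was built exactly this way), using the rounded percentages quoted in this post and a hypothetical helper `representation_ratio`.

```python
def representation_ratio(reader_share: float, population_share: float) -> float:
    """Share of readers divided by share of population; above 1 means overrepresented."""
    return reader_share / population_share

# Rounded percentages quoted in the post.
print(round(representation_ratio(26, 12), 2))   # California: 2.17
print(round(representation_ratio(3.5, 2), 2))   # Massachusetts: 1.75
```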

I won’t go into an analysis of why some states might be over- or underrepresented in the blog’s readership according to this metric.  I just thought it might be appropriate to share some of this data to commemorate the third anniversary of Math Goes Pop!  I hope you will continue to enjoy the material posted here.  And if you live in Wyoming, Mississippi, Alabama, or any of the 40 states I didn’t mention in the paragraph above, let’s see what we can do to get some mathematical love flowing in your neighborhood.

Some Shameless Self-Promotion

Looking for a way to procrastinate before the three-day weekend?  Then feel free to check out this interview I gave to the Journal of Media Literacy Education.  I gave the interview some time ago, but just happened to stumble upon it in published form this week.  If you want some behind-the-scenes perspective into how this blog started, and my general philosophy behind writing it, this interview is a good place to start.

Hope the long weekend treats you well!