<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Math Goes Pop! &#187; statistics</title>
	<atom:link href="http://www.mathgoespop.com/tag/statistics/feed" rel="self" type="application/rss+xml" />
	<link>http://www.mathgoespop.com</link>
	<description>Ruminations on the Intersection Between Mathematics and Popular Culture</description>
	<lastBuildDate>Sat, 04 Feb 2012 02:21:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Four Weddings and Some Statistics</title>
		<link>http://www.mathgoespop.com/2011/08/four-weddings-and-some-statistics.html</link>
		<comments>http://www.mathgoespop.com/2011/08/four-weddings-and-some-statistics.html#comments</comments>
		<pubDate>Wed, 17 Aug 2011 17:46:59 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Math Gets Around]]></category>
		<category><![CDATA[Math on TV]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[weddings]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=1347</guid>
		<description><![CDATA[<p>When my fiancee was in the midst of the wedding planning, part of her research (or perhaps it was simply a guilty pleasure) involved watching wedding shows on basic cable.  For those of you who have not had the pleasure, between stations like WE tv and TLC, there are no fewer than nine different wedding-themed reality <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2011/08/four-weddings-and-some-statistics.html">Four Weddings and Some Statistics</a></span>]]></description>
			<content:encoded><![CDATA[<p>When my fiancee was in the midst of the wedding planning, part of her research (or perhaps it was simply a guilty pleasure) involved watching wedding shows on basic cable.  For those of you who have not had the pleasure, between stations like <a href="http://www.wetv.com/">WE tv</a> and <a href="http://tlc.discovery.com/">TLC</a>, there are no fewer than nine different wedding-themed reality shows airing weekly.  Many of them are appealing in a rubbernecking sort of way; much like a car crash, the spectacle is too ridiculous to turn away from (I&#8217;m looking at you, <a href="http://tlc.howstuffworks.com/tv/my-big-fat-gypsy-wedding">My Big Fat Gypsy Wedding</a>).</p>
<p style="text-align: center;"><object width="640" height="390"><param name="movie" value="http://www.youtube.com/v/2HAUmII_hcg?version=3&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="640" height="390" src="http://www.youtube.com/v/2HAUmII_hcg?version=3&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Of all of these shows, though, the one that most piques my mathematical interest is TLC&#8217;s <a href="http://en.wikipedia.org/wiki/Four_Weddings">Four Weddings</a>.  Based on a British show with the same name, the premise is as follows: four brides-to-be, unknown to one another, meet and attend each other&#8217;s weddings.  When one bride gets married, the other three score various aspects of the wedding, and the bride with the highest score among the four wins a honeymoon (contingencies are in place in the event of a tie, though these are not always explained and seem to vary from season to season).  In order to make for good TV, the show frequently manages to bring out the worst aspects of these women, as they nitpick and pass judgment on everyone else&#8217;s wedding.  Here&#8217;s a short clip to give you a taste of what this show is all about:</p>
<p style="text-align: center;"><object width="640" height="390"><param name="movie" value="http://www.youtube.com/v/PeI09kys09Q?version=3&amp;hl=en_US" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed type="application/x-shockwave-flash" width="640" height="390" src="http://www.youtube.com/v/PeI09kys09Q?version=3&amp;hl=en_US" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>How are the weddings scored?  This process is explained in detail during the course of each episode.  The wedding is broken down into four categories: dress, venue, food, and overall experience.  For overall experience, the other three brides in attendance give a score from 1 to 10 (though I&#8217;ve never seen a bride give another wedding a 10).  For the rest of the categories, though, the brides can only rank the weddings as being 1st, 2nd, or 3rd in the given category.  1st place gets 10 points, 2nd place gets 6, and 3rd gets 3.  Each bride can therefore award a wedding at most 40 points, so the maximum score a wedding can earn from its three judges is 120.  At the end of each episode, the scores for each bride are broken out in detail.  The wedding budget and headcount are provided as well.</p>
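<p>For the programmers in the audience, the scoring rules above can be sketched in a few lines of Python.  This is only my reading of the rules as described in this post; the names <code>RANK_POINTS</code>, <code>bride_score</code>, and <code>wedding_total</code> are made up for illustration and have nothing to do with how the show actually tallies its scores.</p>

```python
# Points awarded for each rank in the dress, venue, and food categories,
# as described in the post.
RANK_POINTS = {1: 10, 2: 6, 3: 3}


def bride_score(dress_rank, venue_rank, food_rank, overall):
    """Points one visiting bride awards to a single wedding."""
    if overall not in range(1, 11):
        raise ValueError("overall experience must be an integer from 1 to 10")
    ranked = sum(RANK_POINTS[r] for r in (dress_rank, venue_rank, food_rank))
    return ranked + overall


def wedding_total(scorecards):
    """Sum the scorecards from the three visiting brides."""
    return sum(bride_score(*card) for card in scorecards)


# Best case: all three brides rank the wedding 1st everywhere and give a 10.
perfect = [(1, 1, 1, 10)] * 3
print(wedding_total(perfect))  # 3 * (10 + 10 + 10 + 10) = 120
```

<p>At the other extreme, a wedding ranked 3rd in every category with an overall score of 1 from each bride would earn 3 * (3 + 3 + 3 + 1) = 30 points.</p>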
<p>After watching a few episodes, it seemed like the best way to ensure a win was to simply outspend your competitors.  Certainly a large budget can help improve the guest experience or score a hip dress, but I was curious as to what overall trend (if any) could be found between, say, money spent on a wedding and the wedding&#8217;s overall score.</p>
<p>With this noble goal in mind, I proceeded to DVR 28 episodes of this show.  After recording the scores for 112 weddings, some of my questions were as follows: is the amount a person spends on a wedding correlated with the score they receive from the other competitors?  What about the amount a person spends <em>per guest</em>?  Finally, did the most expensive wedding win more frequently than would be expected by pure chance?</p>
<p>Let&#8217;s look at some data.  Here is a scatterplot of each wedding&#8217;s budget, vs. the total points earned.</p>
<div id="attachment_1357" class="wp-caption aligncenter" style="width: 620px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-5.png"><img class="size-full wp-image-1357" title="CostVsScore" src="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-5.png" alt="" width="610" height="311" /></a><p class="wp-caption-text">Click to embiggen!</p></div>
<p>As the dots suggest, there is a slight positive correlation between the amount one spends on a wedding, and the score one receives from one&#8217;s fellow competitors.  The <a href="http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient">coefficient of correlation</a> here is approximately 0.286, though if we discard the two $150,000 weddings, this improves to around 0.348.</p>
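<p>If you would like to compute a correlation coefficient like this yourself, here is a minimal Python sketch of the Pearson formula.  The function name <code>pearson_r</code> is my own, and the budgets and scores in the example are placeholder numbers for illustration only; they are not the data I recorded from the show.</p>

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two non-empty lists of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only, NOT the data from the show.
budgets = [12000, 25000, 40000, 18000, 60000]
scores = [70, 82, 75, 64, 90]
print(round(pearson_r(budgets, scores), 3))
```

<p>A value near 1 means the points hug an upward-sloping line, a value near -1 means they hug a downward-sloping line, and a value near 0 means there is little linear relationship.  Note that this sketch will fail with a division by zero if every value in one of the lists is the same.</p>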
<p>Of course, a $10,000 wedding for 10 people might be a much nicer affair than a $10,000 wedding for 1,000 people.  If you spend more money per guest, does this translate into a higher score as well?  The dots don&#8217;t lie; here&#8217;s another scatterplot:</p>
<div id="attachment_1359" class="wp-caption aligncenter" style="width: 649px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-6.png"><img class="size-full wp-image-1359" title="CPGvsScore" src="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-6.png" alt="" width="639" height="326" /></a><p class="wp-caption-text">Click to embiggen!</p></div>
<p>There is quite clearly an outlier in this set of data &#8211; this corresponds to a bride who spent $150,000 for a 120 person wedding, for a whopping $1,250 spent per guest (this particular bride did end up taking first place).  Eliminating this outlier, though, the correlation here is weaker than the correlation for actual cost, at a meager 0.098.  In other words, there may not be a linear relationship between cost per guest and total score.  This may be because certain fixed costs, such as the dress and the venue, won&#8217;t necessarily vary much with the guest total, unlike something like food.</p>
<p>This analysis, though, obscures a key point.  In order to win the honeymoon, you don&#8217;t necessarily need a high score; you only need a higher score than your three other competitors.  With this in mind, it may be simpler to just look at the scores episode-by-episode, and see how the amount spent on a wedding compares to the wedding&#8217;s relative ranking.</p>
<p>In the graph below, the blue bars count the number of times the most expensive wedding was given a certain rank.  The red bars count the number of times the wedding with the highest cost per guest was given a certain rank.  Note that if cost had no bearing on the rank, we would expect an equal number of weddings in each rank (in this case, about 7 per rank).</p>
<div id="attachment_1362" class="wp-caption aligncenter" style="width: 649px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-7.png"><img class="size-full wp-image-1362" title="costranks" src="http://www.mathgoespop.com/wp-content/uploads/2011/08/Picture-7.png" alt="" width="639" height="336" /></a><p class="wp-caption-text">Click to embiggen!</p></div>
<p>As you can see, spending the most money seems to bestow an advantage: fully 50% (14 out of 28) of the most expensive weddings were ranked first.  This would be unlikely if cost had no impact on ranking.</p>
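<p>To put a rough number on &#8220;unlikely,&#8221; here is a small Python sketch.  It assumes that, if cost truly had no impact, the most expensive wedding would have a 1 in 4 chance of winning each episode, independently of the other episodes, and it ignores ties; those are simplifying assumptions of mine, not anything stated on the show.  Under them, the chance of 14 or more wins in 28 episodes is a binomial tail probability, and it comes out below 1%.</p>

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 28 episodes, 4 weddings each, so a 1-in-4 chance per episode under the
# "cost does not matter" assumption (ties and shared top budgets ignored).
p_value = binomial_tail(28, 14, 0.25)
print(f"P(14 or more wins out of 28) = {p_value:.4f}")
```

<p>The same function can be pointed at the red bars: just replace 14 with the number of first-place finishes for the wedding with the highest cost per guest.</p>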
<p>The picture is a little murkier for cost per guest &#8211; the frequency for each ranking sticks pretty closely to 7, so it&#8217;s not clear that spending more per guest gives any advantage.</p>
<p>In conclusion, there is a small positive correlation between amount spent and score received, though this does not transfer to the amount spent per guest.  Compared only to one&#8217;s other three competitors, the amount spent appears to confer an even greater advantage, so if you are on this show and want to show your competitors that you are better than them, my advice would be to simply outspend them.  If all you&#8217;re interested in is a nice vacation, though, it may be cheaper to just stick to your budget, and plan a vacation on your own.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2011/08/four-weddings-and-some-statistics.html/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Some Readership Statistics</title>
		<link>http://www.mathgoespop.com/2011/07/some-readership-statistics.html</link>
		<comments>http://www.mathgoespop.com/2011/07/some-readership-statistics.html#comments</comments>
		<pubDate>Sat, 09 Jul 2011 00:40:18 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Et cetera]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=1294</guid>
		<description><![CDATA[<p>This week marks the third anniversary of Math Goes Pop!  As such, I thought it might be appropriate to engage in a bit of navel-gazing.  But since I can gaze at my own navel whenever I please, I&#8217;d like to flip the script, as it were, and turn my attention towards the collective navels of my <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2011/07/some-readership-statistics.html">Some Readership Statistics</a></span>]]></description>
			<content:encoded><![CDATA[<p>This week marks the third anniversary of Math Goes Pop!  As such, I thought it might be appropriate to engage in a bit of navel-gazing.  But since I can gaze at my own navel whenever I please, I&#8217;d like to flip the script, as it were, and turn my attention towards the collective navels of my readership.</p>
<div id="attachment_1295" class="wp-caption aligncenter" style="width: 233px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-1.01.43-PM.png"><img class="size-medium wp-image-1295" title="Screen shot 2011-07-08 at 1.01.43 PM" src="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-1.01.43-PM-223x300.png" alt="" width="223" height="300" /></a><p class="wp-caption-text">Our cat&#39;s third birthday is also this week.  It is unclear which event he is celebrating, although the dilated pupils suggest he is celebrating a bit too hard.</p></div>
<p>I&#8217;d like to share with you some data on the geographic distribution of my US readers.  While there is a large California bias, people from all over the country seem to have stumbled upon this corner of the internet, and have hopefully enjoyed their time here.</p>
<div id="attachment_1298" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-1.05.22-PM.png"><img class="size-medium wp-image-1298" title="Screen shot 2011-07-08 at 1.05.22 PM" src="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-1.05.22-PM-300x197.png" alt="" width="300" height="197" /></a><p class="wp-caption-text">This represents you, gentle reader.  Darker green means more viewers.</p></div>
<p>Of course, a California bias shouldn&#8217;t be all that surprising.  After all, California is the most populous state in the country, accounting for roughly 12% of the country&#8217;s total population, according to <a href="http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population">this</a> 2010 Census data.  A more interesting thing to think about, then, is not the map pictured above, but how the map contrasts with actual population data for each state.  For example, New York is a hair darker than Texas on this map, even though Texas has a larger share of our population: roughly 8% as compared to 6% for New York.</p>
<p>One way to compare population data with Math Goes Pop visitor data is through rankings.  This table compares the rankings, and shows the relative difference for each state (feel free to play around with the data to your heart&#8217;s content):</p>
<p style="text-align: center;">
<table id="wp-table-reloaded-id-2-no-1" class="wp-table-reloaded wp-table-reloaded-id-2">
<thead>
	<tr class="row-1 odd">
		<th class="column-1">State</th><th class="column-2">View Proportion Rank</th><th class="column-3">Population Proportion Rank</th><th class="column-4">Rank Difference</th>
	</tr>
</thead>
<tbody>
	<tr class="row-2 even">
		<td class="column-1">California</td><td class="column-2">1</td><td class="column-3">1</td><td class="column-4">0</td>
	</tr>
	<tr class="row-3 odd">
		<td class="column-1">New York</td><td class="column-2">2</td><td class="column-3">3</td><td class="column-4">+1</td>
	</tr>
	<tr class="row-4 even">
		<td class="column-1">Michigan</td><td class="column-2">3</td><td class="column-3">8</td><td class="column-4">+5</td>
	</tr>
	<tr class="row-5 odd">
		<td class="column-1">Texas</td><td class="column-2">4</td><td class="column-3">2</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-6 even">
		<td class="column-1">Illinois</td><td class="column-2">5</td><td class="column-3">5</td><td class="column-4">0</td>
	</tr>
	<tr class="row-7 odd">
		<td class="column-1">Pennsylvania</td><td class="column-2">6</td><td class="column-3">6</td><td class="column-4">0</td>
	</tr>
	<tr class="row-8 even">
		<td class="column-1">New Jersey</td><td class="column-2">7</td><td class="column-3">11</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-9 odd">
		<td class="column-1">Massachusetts</td><td class="column-2">8</td><td class="column-3">14</td><td class="column-4">+6</td>
	</tr>
	<tr class="row-10 even">
		<td class="column-1">Florida</td><td class="column-2">9</td><td class="column-3">4</td><td class="column-4">-5</td>
	</tr>
	<tr class="row-11 odd">
		<td class="column-1">Virginia</td><td class="column-2">10</td><td class="column-3">12</td><td class="column-4">+2</td>
	</tr>
	<tr class="row-12 even">
		<td class="column-1">Washington</td><td class="column-2">11</td><td class="column-3">13</td><td class="column-4">+2</td>
	</tr>
	<tr class="row-13 odd">
		<td class="column-1">North Carolina</td><td class="column-2">12</td><td class="column-3">10</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-14 even">
		<td class="column-1">Ohio</td><td class="column-2">13</td><td class="column-3">7</td><td class="column-4">-6</td>
	</tr>
	<tr class="row-15 odd">
		<td class="column-1">Maryland</td><td class="column-2">14</td><td class="column-3">19</td><td class="column-4">+5</td>
	</tr>
	<tr class="row-16 even">
		<td class="column-1">Georgia</td><td class="column-2">15</td><td class="column-3">9</td><td class="column-4">-6</td>
	</tr>
	<tr class="row-17 odd">
		<td class="column-1">Missouri</td><td class="column-2">16</td><td class="column-3">18</td><td class="column-4">+2</td>
	</tr>
	<tr class="row-18 even">
		<td class="column-1">Connecticut</td><td class="column-2">17</td><td class="column-3">29</td><td class="column-4">+12</td>
	</tr>
	<tr class="row-19 odd">
		<td class="column-1">Minnesota</td><td class="column-2">18</td><td class="column-3">21</td><td class="column-4">+3</td>
	</tr>
	<tr class="row-20 even">
		<td class="column-1">Oregon</td><td class="column-2">19</td><td class="column-3">27</td><td class="column-4">+8</td>
	</tr>
	<tr class="row-21 odd">
		<td class="column-1">Arizona</td><td class="column-2">20</td><td class="column-3">16</td><td class="column-4">-4</td>
	</tr>
	<tr class="row-22 even">
		<td class="column-1">Colorado</td><td class="column-2">21</td><td class="column-3">22</td><td class="column-4">+1</td>
	</tr>
	<tr class="row-23 odd">
		<td class="column-1">Wisconsin</td><td class="column-2">22</td><td class="column-3">20</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-24 even">
		<td class="column-1">Tennessee</td><td class="column-2">23</td><td class="column-3">17</td><td class="column-4">-6</td>
	</tr>
	<tr class="row-25 odd">
		<td class="column-1">Indiana</td><td class="column-2">24</td><td class="column-3">15</td><td class="column-4">-9</td>
	</tr>
	<tr class="row-26 even">
		<td class="column-1">Washington DC</td><td class="column-2">25</td><td class="column-3">50</td><td class="column-4">+25</td>
	</tr>
	<tr class="row-27 odd">
		<td class="column-1">South Carolina</td><td class="column-2">26</td><td class="column-3">24</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-28 even">
		<td class="column-1">Louisiana</td><td class="column-2">27</td><td class="column-3">25</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-29 odd">
		<td class="column-1">Kentucky</td><td class="column-2">28</td><td class="column-3">26</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-30 even">
		<td class="column-1">Iowa</td><td class="column-2">29</td><td class="column-3">30</td><td class="column-4">+1</td>
	</tr>
	<tr class="row-31 odd">
		<td class="column-1">Utah</td><td class="column-2">30</td><td class="column-3">34</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-32 even">
		<td class="column-1">Oklahoma</td><td class="column-2">31</td><td class="column-3">28</td><td class="column-4">-3</td>
	</tr>
	<tr class="row-33 odd">
		<td class="column-1">Alabama</td><td class="column-2">32</td><td class="column-3">23</td><td class="column-4">-9</td>
	</tr>
	<tr class="row-34 even">
		<td class="column-1">Kansas</td><td class="column-2">33</td><td class="column-3">33</td><td class="column-4">0</td>
	</tr>
	<tr class="row-35 odd">
		<td class="column-1">Nebraska</td><td class="column-2">34</td><td class="column-3">38</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-36 even">
		<td class="column-1">Arkansas</td><td class="column-2">35</td><td class="column-3">32</td><td class="column-4">-3</td>
	</tr>
	<tr class="row-37 odd">
		<td class="column-1">Nevada</td><td class="column-2">36</td><td class="column-3">35</td><td class="column-4">-1</td>
	</tr>
	<tr class="row-38 even">
		<td class="column-1">Mississippi</td><td class="column-2">37</td><td class="column-3">31</td><td class="column-4">-6</td>
	</tr>
	<tr class="row-39 odd">
		<td class="column-1">New Hampshire</td><td class="column-2">38</td><td class="column-3">42</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-40 even">
		<td class="column-1">Rhode Island</td><td class="column-2">39</td><td class="column-3">43</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-41 odd">
		<td class="column-1">New Mexico</td><td class="column-2">40</td><td class="column-3">36</td><td class="column-4">-4</td>
	</tr>
	<tr class="row-42 even">
		<td class="column-1">Idaho</td><td class="column-2">41</td><td class="column-3">39</td><td class="column-4">-2</td>
	</tr>
	<tr class="row-43 odd">
		<td class="column-1">Maine</td><td class="column-2">42</td><td class="column-3">41</td><td class="column-4">-1</td>
	</tr>
	<tr class="row-44 even">
		<td class="column-1">West Virginia</td><td class="column-2">43</td><td class="column-3">37</td><td class="column-4">-6</td>
	</tr>
	<tr class="row-45 odd">
		<td class="column-1">Hawaii</td><td class="column-2">44</td><td class="column-3">40</td><td class="column-4">-4</td>
	</tr>
	<tr class="row-46 even">
		<td class="column-1">Vermont</td><td class="column-2">45</td><td class="column-3">49</td><td class="column-4">+4</td>
	</tr>
	<tr class="row-47 odd">
		<td class="column-1">Alaska</td><td class="column-2">46</td><td class="column-3">47</td><td class="column-4">+1</td>
	</tr>
	<tr class="row-48 even">
		<td class="column-1">Montana</td><td class="column-2">47</td><td class="column-3">44</td><td class="column-4">-3</td>
	</tr>
	<tr class="row-49 odd">
		<td class="column-1">Delaware</td><td class="column-2">48</td><td class="column-3">45</td><td class="column-4">-3</td>
	</tr>
	<tr class="row-50 even">
		<td class="column-1">North Dakota</td><td class="column-2">49</td><td class="column-3">48</td><td class="column-4">-1</td>
	</tr>
	<tr class="row-51 odd">
		<td class="column-1">South Dakota</td><td class="column-2">50</td><td class="column-3">46</td><td class="column-4">-4</td>
	</tr>
	<tr class="row-52 even">
		<td class="column-1">Wyoming</td><td class="column-2">51</td><td class="column-3">51</td><td class="column-4">0</td>
	</tr>
</tbody>
</table>
</p>
<p>The rankings give us some information: we see that Indiana and Alabama are not as well represented in readership as one might expect given their population rankings (both states have Math Goes Pop readership rankings 9 levels below their population rankings).  On the other hand, folks from DC, Connecticut, and Oregon are visiting this site more than would be expected based on population numbers alone; readership rankings for these states are 25, 12, and 8 levels above their population rankings, respectively.</p>
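<p>The last column of the table is just the population rank minus the view rank.  Here is a short Python sketch that reproduces it for the five states mentioned above, using rows copied from the table; the function name <code>rank_difference</code> is only for illustration.</p>

```python
# (view rank, population rank) for a few rows copied from the table above.
ranks = {
    "Indiana": (24, 15),
    "Alabama": (32, 23),
    "Washington DC": (25, 50),
    "Connecticut": (17, 29),
    "Oregon": (19, 27),
}


def rank_difference(view_rank, population_rank):
    """Positive means the state reads more than its population rank suggests."""
    return population_rank - view_rank


diffs = {state: rank_difference(*pair) for state, pair in ranks.items()}
# Print the most over-represented states first, with an explicit sign.
for state, diff in sorted(diffs.items(), key=lambda item: item[1], reverse=True):
    print(f"{state}: {diff:+d}")
```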
<p>But while the rankings tell us some things, they leave a great deal out.  For instance, while California is ranked first in both the proportion of the US population and the proportion of visitors to this site, the rankings tell us nothing about how these proportions compare to each other.  In fact, while California boasts the number 1 proportion in both categories, the proportion of California viewers to Math Goes Pop is more than twice the proportion of California&#8217;s population (26% of my viewers vs. 12% of the US population).</p>
<p>Comparison of the proportions in this way allows us to get a better understanding of how visitors to this site are distributed across the country, as compared to the overall population distribution.  While the population is distributed more heavily in California, the proportion of California Math Goes Pop visitors is greater than can be explained by simple population data.</p>
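<p>One simple way to make this comparison concrete is to divide a state&#8217;s share of readers by its share of the population.  The Python sketch below does this for California using the rounded percentages quoted above; the name <code>representation_ratio</code> is my own label for this quantity, not a standard term.</p>

```python
def representation_ratio(view_share, population_share):
    """Ratio above 1 means a state is over-represented among readers."""
    return view_share / population_share


# Rounded shares quoted in the post for California.
california = representation_ratio(0.26, 0.12)
print(f"California: {california:.2f}x its population share")
```

<p>A ratio of exactly 1 would mean a state supplies readers in proportion to its population, so anything above 2 is a hefty surplus.</p>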
<p>If we compare the state-by-state readership proportions to overall population proportions, we get the following picture:</p>
<div id="attachment_1305" class="wp-caption aligncenter" style="width: 610px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-5.26.02-PM.png"><img class="size-full wp-image-1305" title="StateData" src="http://www.mathgoespop.com/wp-content/uploads/2011/07/Screen-shot-2011-07-08-at-5.26.02-PM.png" alt="" width="600" height="307" /></a><p class="wp-caption-text">Click to Embiggen!</p></div>
<p>Big ups to our nation&#8217;s capital for having the largest share of viewership relative to its overall share of the country&#8217;s population.  In addition to DC, the proportion of readership from 10 states is greater than the state&#8217;s proportional population: California, Massachusetts, Michigan, New York, Connecticut, Maryland, Oregon, New Jersey, Washington, and Vermont.  For example, while Massachusetts accounts for roughly 2% of the country&#8217;s population, thus far it has accounted for nearly 3.5% of Math Goes Pop readership.</p>
<p>I won&#8217;t go into an analysis of why some states might be over- or underrepresented in the blog&#8217;s readership according to this metric.  I just thought it might be appropriate to share some of this data in commemoration of Math Goes Pop!  I hope you will continue to enjoy the material posted here.  And if you live in Wyoming, Mississippi, Alabama, or any of the other 37 states I didn&#8217;t mention in the paragraph above, let&#8217;s see what we can do to get some mathematical love flowing in your neighborhood.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2011/07/some-readership-statistics.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>MTV/Oscar Showdown</title>
		<link>http://www.mathgoespop.com/2011/06/mtvoscar-showdown.html</link>
		<comments>http://www.mathgoespop.com/2011/06/mtvoscar-showdown.html#comments</comments>
		<pubDate>Thu, 09 Jun 2011 18:16:46 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Math in the Movies]]></category>
		<category><![CDATA[Math on TV]]></category>
		<category><![CDATA[mtv]]></category>
		<category><![CDATA[oscars]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[twilight]]></category>
		<category><![CDATA[voting]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=1263</guid>
		<description><![CDATA[<p>For many of us, summer is thought of as the time between Memorial Day and Labor Day.  For folks of a younger generation, though, trendier bookends are provided by two MTV Award shows: The Movie Awards at the beginning of the summer, and the Video Music Awards at the end.  Continuing this noble tradition, <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2011/06/mtvoscar-showdown.html">MTV/Oscar Showdown</a></span>]]></description>
			<content:encoded><![CDATA[<p>For many of us, summer is thought of as the time between Memorial Day and Labor Day.  For folks of a younger generation, though, trendier bookends are provided by two MTV Award shows: <a href="http://en.wikipedia.org/wiki/MTV_Movie_Awards">The Movie Awards</a> at the beginning of the summer, and the <a href="http://en.wikipedia.org/wiki/MTV_Video_Music_Awards">Video Music Awards</a> at the end.  Continuing this noble tradition, the 20th iteration of the MTV Movie Awards was broadcast this weekend.  If you missed it, don&#8217;t worry; I&#8217;m sure it will be shown another 300,000 or so times before the summer is out.</p>
<p>As a shining beacon of what is hip, MTV has a responsibility during its movie awards to highlight the most popular films of the year.  This is in stark contrast to the priorities of higher-brow award shows such as the Oscars, for which artistic achievement is placed on the highest pedestal.  This is not to say that these two goals need be mutually exclusive; indeed, since the first MTV Movie Awards was broadcast in 1992, the &#8220;Best Film&#8221; has agreed with the Academy Award winning best film three times (1997&#8217;s <a href="http://www.imdb.com/title/tt0120338/">Titanic</a>, 2000&#8217;s <a href="http://www.imdb.com/title/tt0172495/">Gladiator</a>, and 2003&#8217;s <a href="http://www.imdb.com/title/tt0167260/">Lord of the Rings: The Return of the King</a>).  Even so, a quick glance at the nominated films from these two awards shows each year reveals a fairly small overlap, in general.  But if MTV&#8217;s goal is to prop up films from an angle focused more on pop culture, it is natural to ask how good a job they do.</p>
<div id="attachment_1268" class="wp-caption aligncenter" style="width: 234px"><a href="http://www.mtv.com/ontv/movieawards/2011/"><img class="size-medium wp-image-1268" title="mtvma" src="http://www.mathgoespop.com/wp-content/uploads/2011/06/mtvma-224x300.jpg" alt="" width="224" height="300" /></a><p class="wp-caption-text">Delicious metal popcorn. (Photo: Jason McDonald/MTV)</p></div>
<p>This question begets another one: how can we best measure a film&#8217;s popularity?  My first thought was to consider the rankings on <a href="http://www.imdb.com/">IMDB</a>.  There, users can give any film a score from 1 to 10; as a prime example of a <a href="http://rangevoting.org/">range voting</a> system, this seemed like a good place to measure the public&#8217;s reception of a film.</p>
<p>The results were mixed.  With this metric, comparing the 20 years that both awards shows have been around, the MTV best film scored higher than the Oscar winning best film only 5 times.  The Oscar winner came out ahead 12 times, and the two awards tied three times.  The trend is also worth mentioning &#8211; after 5 consecutive years (1999-2003) in which the MTV winner beat or tied the Oscar winner in IMDB score, the Oscar winning film has bested the MTV winning film ever since.  The disparity has become even larger in recent years (I call this the <a href="http://en.wikipedia.org/wiki/Twilight_%28series%29">Twilight</a> effect, as the Twilight films have won best film at the MTV awards for three years running).  Here&#8217;s a graph of the scores over time (the list of MTV Best film winners is <a href="http://en.wikipedia.org/wiki/MTV_Movie_Award_for_Best_Movie">here</a>; Oscar winners can be found <a href="http://en.wikipedia.org/wiki/Academy_Award_for_Best_Picture">here</a>):</p>
<p style="text-align: center;"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.11.17-AM.png"><img class="size-full wp-image-1269 aligncenter" title="Screen shot 2011-06-09 at 10.11.17 AM" src="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.11.17-AM.png" alt="" width="601" height="393" /></a></p>
<p>While you may argue there isn&#8217;t enough data here to draw a strong conclusion, the recent trend is fairly convincing.  By this metric, Oscar winning films seem to have been more popular, at least over the past few years.</p>
<p>You might expect to get a better sense of the popularity of the films on display, though, by looking at all of the nominees rather than just the winners.  If we take the average IMDB score for all nominated best films at each awards show, we get the following picture:</p>
<p style="text-align: center;"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.19.18-AM.png"><img class="size-full wp-image-1270 aligncenter" title="AvgIMDB" src="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.19.18-AM.png" alt="" width="600" height="386" /></a></p>
<p>It&#8217;s come close, but the average IMDB score of MTV nominated films has never been greater than the average IMDB score for Oscar nominated films.  We see the Twilight effect among the averages as well, though it was dampened somewhat this year due to the inclusion of critical darlings <a href="http://www.imdb.com/title/tt1375666/">Inception</a>, <a href="http://www.imdb.com/title/tt1285016/">The Social Network</a>, and <a href="http://www.imdb.com/title/tt0947798/">Black Swan</a> on the MTV nominee list.</p>
<p>&#8220;Hold up,&#8221; you may be thinking to yourself, &#8220;this is all a bunch of hooey.&#8221;  You may think that IMDB scores are not a very good measure of a film&#8217;s popularity.  It may be quite likely that IMDB scores are biased towards those same features of a film that make it more likely for Oscar consideration.  Perhaps the type of person who goes onto the website to rate films is more likely to be somewhat of a connoisseur, and therefore the tastes reflected by the IMDB community are more likely to reflect the tastes of the Oscars.  At the very least, it seems likely that teenage girls are not voting on the website in droves; how else can one explain the Twilight series&#8217; limp average of only 4.9?</p>
<p>What else can we use to measure a film&#8217;s popularity?  Well, to return to teenage girls, they don&#8217;t show their support for the Twilight series by rating it highly on the internet.  They show their support by going out and seeing the movie multiple times (and, of course, by picking sides in the bitter feud between Edward and Jacob).  So perhaps we should look at box office receipts rather than IMDB score.  What sort of picture do we see in this case?</p>
<p>If we only consider the winning film from each awards show, the data looks like this (I&#8217;m only considering US box office numbers here):</p>
<p><a href="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.39.40-AM.png"><img class="aligncenter size-full wp-image-1271" title="BestMoney" src="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.39.40-AM.png" alt="" width="600" height="385" /></a>(I&#8217;ve cut the graph so that you can&#8217;t see the leap in 1997, when Titanic took top prize at both shows.)  Things look a little more erratic, but if you look closely, you&#8217;ll see that the MTV award winner has taken in more money than its Oscar winning counterpart 14 times out of 20.  The Oscar has favored the larger cash cow only three times.</p>
<p>As with the IMDB rankings, we can try to smooth things out by looking at the average box office returns among nominees, rather than just returns for the winner.  This yields the following graph:</p>
<p><a href="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.49.21-AM.png"><img class="aligncenter size-full wp-image-1272" title="AvgMoney" src="http://www.mathgoespop.com/wp-content/uploads/2011/06/Screen-shot-2011-06-09-at-10.49.21-AM.png" alt="" width="600" height="378" /></a>These numbers aren&#8217;t adjusted for inflation, which may explain in part the growth trend (2009 numbers are bumped up because of <a href="http://www.imdb.com/title/tt0499549/">Avatar</a>, as well).  I&#8217;m less interested in the actual numbers than I am in the difference between the two graphs.  And here we see, in contrast to the IMDB case, that the roles of the two awards shows have flipped.  While the average IMDB score of Oscar nominees has always been higher than the average IMDB score of MTV nominees, the average box office return of MTV nominees has always been higher than the average box office return of Oscar nominees.</p>
<p>To return to the original question: do the MTV movie awards highlight more popular films than the Oscars?  Well, it depends on how you define &#8220;popular.&#8221;  If popular means highly rated, the claim is somewhat dubious.  But if popularity is measured by the almighty dollar, then this seems like a fair conclusion to draw.  Whatever your line of thinking, though, I&#8217;m fairly confident that current trends will continue, at least until the last of the Twilight films has exited theaters.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2011/06/mtvoscar-showdown.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Scoreboard Stats</title>
		<link>http://www.mathgoespop.com/2011/05/scoreboard-stats.html</link>
		<comments>http://www.mathgoespop.com/2011/05/scoreboard-stats.html#comments</comments>
		<pubDate>Thu, 26 May 2011 21:28:19 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Math in the News]]></category>
		<category><![CDATA[Sports]]></category>
		<category><![CDATA[baseball]]></category>
		<category><![CDATA[e]]></category>
		<category><![CDATA[poisson distribution]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=1247</guid>
		<description><![CDATA[<p>A couple of weeks ago I noticed this article on the Yahoo Sports page, which highlighted a statistically rare event that occurred in the American League on Sunday, May 8th.  On that day, 7 baseball games were played on the AL schedule, and in all of those games one team scored exactly 5 runs.  The post <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2011/05/scoreboard-stats.html">Scoreboard Stats</a></span>]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago I noticed <a href="http://sports.yahoo.com/mlb/blog/big_league_stew/post/Gimme-Five-American-League-scoreboard-features-?urn=mlb-wp5759">this</a> article on the Yahoo Sports page, which highlighted a statistically rare event that occurred in the American League on Sunday, May 8th.  On that day, 7 baseball games were played on the AL schedule, and in all of those games one team scored exactly 5 runs.  The post then links to <a href="http://news.yahoo.com/s/ap/20110509/ap_on_sp_ba_ne/bba5_alive">this</a> article from the AP, which gives this rare event the following context:</p>
<blockquote><p>It was the first time in 18 years that such a quirky thing happened with a full schedule. On Aug. 10, 1993, all seven NL games featured one team scoring precisely two runs, STATS LLC said.</p>
<p>The last time it occurred with five or more runs was July 20, 1955, when all four AL games had at least one team score exactly six, STATS LLC said.</p></blockquote>
<p>When I read this article, some questions immediately came to mind: exactly how rare is it for one team in each of a collection of 7 baseball games to have a common score of 5?  Also, if 7 teams in 7 games have the same score, which score are they most likely to share?  Are the 7 games with a common score of 2 more or less likely to occur than the 7 games with a common score of 5?</p>
<p>We can answer these questions with some (relatively) simple probability models, given some caveats.  I&#8217;d like to estimate these probabilities using only one parameter: the average number of runs a team scores during a game.  Of course, that average will vary from team to team, and also from year to year (in particular, runs per game have declined from the heyday of steroid-mania that gripped baseball at the turn of the millennium).  Due to different rules, there may also be variation between the American and National Leagues.  Let me ignore this, though, and consider only an average number of runs per game overall &#8211; what we lose in precision we will more than make up for in clarity.</p>
<div id="attachment_1249" class="wp-caption aligncenter" style="width: 320px"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/05/dingers.jpg"><img class="size-full wp-image-1249" title="dingers" src="http://www.mathgoespop.com/wp-content/uploads/2011/05/dingers.jpg" alt="" width="310" height="230" /></a><p class="wp-caption-text">Ahh, the late 90&#39;s, when it was easier to sock a few dingers.</p></div>
<p>The question remains: how many runs are scored on average in a baseball game?  I found some data online which is somewhat outdated, but I&#8217;ll stick to it for convenience (and, more importantly, out of laziness) &#8211; any alteration in this number is easy to propagate throughout the following discussion.  In <a href="http://www.hardballtimes.com/main/article/runs-per-game/">this</a> article from 2005, the author tabulated the average number of runs per game in MLB over a 5 year span from 2000-2004 (that&#8217;s over 12,000 games!).  He has a nice looking graph of the distribution of scores as well:</p>
<p><a href="http://www.hardballtimes.com/main/article/runs-per-game/"><img class="aligncenter size-full wp-image-1250" title="runspergame" src="http://www.mathgoespop.com/wp-content/uploads/2011/05/runspergame.gif" alt="" width="439" height="369" /></a>A savvy probability student might see the long tail of this probability distribution and liken it to the <a href="http://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a>, a distribution encountered in many probability courses, and which is frequently motivated by a desire to model &#8220;rare events.&#8221;  I put the term in quotations since what constitutes &#8220;rare&#8221; is frequently left undefined, and in any event, is not really pertinent to this discussion.</p>
<p>Let us suppose, then, that the number of runs scored per game by each team follows a Poisson distribution.  French aside, this means that the probability a team will score <em>n</em> runs is equal to</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=e%5E%7B-A%7D%5Cfrac%7BA%5En%7D%7Bn%21%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='e^{-A}\frac{A^n}{n!}' title='e^{-A}\frac{A^n}{n!}' class='latex' />,</p>
<p style="text-align: left;">where A is the average number of runs scored per game &#8211; in this case, 4.82, and <em>e</em> is the unsung hero sometimes known as <a href="http://www.mathgoespop.com/2010/01/e-day.html">Euler&#8217;s number</a>.  Don&#8217;t worry too much about this formula; if you prefer, the graph of the function <img src='http://s.wordpress.com/latex.php?latex=e%5E%7B-4.82%7D%5Cfrac%7B4.82%5En%7D%7Bn%21%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='e^{-4.82}\frac{4.82^n}{n!}' title='e^{-4.82}\frac{4.82^n}{n!}' class='latex' /> looks like this (courtesy of <a href="http://www.wolframalpha.com/">Wolfram Alpha</a>):</p>
<p style="text-align: left;"><a href="http://www.mathgoespop.com/wp-content/uploads/2011/05/Picture-2.png"><img class="aligncenter size-full wp-image-1252" title="Poisson482" src="http://www.mathgoespop.com/wp-content/uploads/2011/05/Picture-2.png" alt="" width="320" height="193" /></a>Note that the fit isn&#8217;t perfect &#8211; this graph starts much lower at 0 than the graph of the actual data pictured above, for example &#8211; but there is precedent for using the Poisson distribution to model runs in a baseball game (<a href="http://www.jstor.org/pss/2684837">this</a> article provides one such example, but a subscription is required to view it in its entirety).  More careful analysis is possible, and can be found in resources like <a href="http://books.google.com/books?id=1mNZfyil2ecC&amp;lpg=PA168&amp;ots=oXZDh_q7X5&amp;dq=probability%20distribution%20of%20runs%20scored%20in%20a%20baseball%20game&amp;pg=PP1#v=onepage&amp;q=probability%20distribution%20of%20runs%20scored%20in%20a%20baseball%20game&amp;f=false">this</a> one, but again, I want to keep things relatively simple.</p>
<p style="text-align: left;">So, let us suppose that the probability that a team scores <em>n</em> runs is <img src='http://s.wordpress.com/latex.php?latex=e%5E%7B-4.82%7D%5Cfrac%7B4.82%5En%7D%7Bn%21%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='e^{-4.82}\frac{4.82^n}{n!}' title='e^{-4.82}\frac{4.82^n}{n!}' class='latex' />.  What, then, is the probability that in a baseball game, one of the teams will score <em>n</em> runs?  Either team A can score <em>n</em> runs or team <em>B</em> can score <em>n</em> runs, but they can&#8217;t both score <em>n</em> runs since baseball games can&#8217;t end in a tie.  This means that the probability of A or B scoring <em>n</em> runs is simply the probability that A scores <em>n</em> runs plus the probability that <em>B</em> scores <em>n</em> runs, or <img src='http://s.wordpress.com/latex.php?latex=2e%5E%7B-4.82%7D%5Cfrac%7B4.82%5En%7D%7Bn%21%7D&#038;bg=T&#038;fg=000000&#038;s=0' alt='2e^{-4.82}\frac{4.82^n}{n!}' title='2e^{-4.82}\frac{4.82^n}{n!}' class='latex' />.</p>
<p style="text-align: left;">For the odds that this happens in all 7 games, we then raise this number to the 7th power (lurking under this is the assumption that runs scored in different games are <a href="http://en.wikipedia.org/wiki/Independence_%28probability_theory%29">independent</a>, which seems like an entirely reasonable assumption to make).  To summarize, we estimate that the probability that one team in each of 7 games scores <em>n</em> runs is</p>
<p style="text-align: center;"><img src='http://s.wordpress.com/latex.php?latex=%282e%5E%7B-4.82%7D%5Cfrac%7B4.82%5En%7D%7Bn%21%7D%29%5E7.&#038;bg=T&#038;fg=000000&#038;s=0' alt='(2e^{-4.82}\frac{4.82^n}{n!})^7.' title='(2e^{-4.82}\frac{4.82^n}{n!})^7.' class='latex' /></p>
<p style="text-align: left;">If <em>n</em> = 5 (as it did earlier this month), the probability is roughly .064%.  In other words, if 7 AL games were played every day, you would expect this outcome once every 1,560 days or so.  Having said that, with more careful analysis it&#8217;s possible to show that in fact, if 7 games will have teams scoring the same number of runs, 5 is the most likely number.  For comparison, when <em>n</em> = 2 the probability is only a paltry 0.000812%, making what happened on May 8th over 75 times more likely than what happened on August 10, 1993.  Of course, it&#8217;s not fair to compare these records to the 6 run record in 1955, since in that case only 4 games were played, rather than 7.  Nevertheless, it&#8217;s not difficult to adjust this model from 7 games to 4 games (or an arbitrary number of games).</p>
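<p style="text-align: left;">For readers who would like to check the arithmetic, here is a minimal Python sketch of the model described above.  It assumes nothing beyond the formula in this post; the function names <code>poisson_pmf</code> and <code>shared_score_probability</code> are made up for illustration, and 4.82 is the average quoted earlier.</p>

```python
import math

AVG_RUNS = 4.82  # average runs per team per game, as quoted in the post


def poisson_pmf(n, mean=AVG_RUNS):
    """Probability that a Poisson variable with the given mean equals n."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def shared_score_probability(n, games=7, mean=AVG_RUNS):
    """Probability that one team scores exactly n runs in each of `games` games.

    Mirrors the post: twice the single-team probability (the two teams
    cannot tie), raised to the number of games (games assumed independent).
    """
    return (2 * poisson_pmf(n, mean)) ** games


# Print the estimate for the 1993 event (n = 2) and the 2011 event (n = 5).
for n in (2, 5):
    p = shared_score_probability(n)
    print(f"n = {n}: {100 * p:.6f}%  (about once every {1 / p:,.0f} days)")
```

<p style="text-align: left;">Passing <code>games=4</code> adapts the same estimate to a four-game schedule like the one in 1955.</p>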
<p style="text-align: left;">So, rather than some murky intuition telling us this event should be unlikely, with a little more effort we can attempt to quantify exactly how unlikely this event should be.  More sophisticated models for runs could be used, but perhaps that is a topic I will save for another day.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2011/05/scoreboard-stats.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lying with Statistics in Football</title>
		<link>http://www.mathgoespop.com/2010/02/lying-with-statistics.html</link>
		<comments>http://www.mathgoespop.com/2010/02/lying-with-statistics.html#comments</comments>
		<pubDate>Mon, 08 Feb 2010 17:33:00 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Math in the News]]></category>
		<category><![CDATA[Sports]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=158</guid>
		<description><![CDATA[In the aftermath of the Super Bowl, some of you fans may be dreading the next six months.  To kick off this football drought, I&#8217;d like to highlight this article, which was featured on Yahoo yesterday.  The article says that Saints quarterback Drew Brees should hope to lose the coin toss at the start of the <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2010/02/lying-with-statistics.html">Lying with Statistics in Football</a></span>]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">In the aftermath of the Super Bowl, some of you fans may be dreading the next six months.  To kick off this football drought, I&#8217;d like to highlight <a href="http://sports.yahoo.com/nfl/blog/shutdown_corner/post/If-Saints-win-coin-toss-Super-Bowl-could-be-ove?urn=nfl,217725">this article</a>, which was featured on Yahoo yesterday.  The article says that Saints quarterback Drew Brees should hope to lose the coin toss at the start of the game, because in the past 43 Super Bowls, the team that won the coin toss had only won 20 times.</div>
<div id="attachment_163" class="wp-caption aligncenter" style="width: 385px"><a href="http://www.mathgoespop.com/wp-content/uploads/2010/02/xlivcoin.jpg"><img class="size-full wp-image-163" title="xlivcoin" src="http://www.mathgoespop.com/wp-content/uploads/2010/02/xlivcoin.jpg" alt="" width="375" height="375" /></a><p class="wp-caption-text">An unlucky coin?  Unlikely.</p></div>
<p>Um&#8230;what?  Who cares?  While 20/43 is slightly less than the expected 50%, this difference is not even close to being statistically significant.  Actually, the fact that this ratio is only 1 1/2 games shy of the mean is pretty good.  Matt Springer has posted an <a href="http://scienceblogs.com/builtonfacts/2010/02/super_bowl_coin_toss_mathemati.php">article</a> that discusses why we shouldn&#8217;t really care about this difference.</p>
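<p>To put a number on &#8220;not even close,&#8221; here is a small Python sketch of one standard check, a two-sided binomial test against a fair 50/50 split.  This is my own illustration rather than anything from the articles linked above, and the function name <code>two_sided_p_value</code> is made up.</p>

```python
from math import comb


def two_sided_p_value(wins, games):
    """Two-sided binomial p-value under the null that each game is a fair 50/50 split."""
    low = min(wins, games - wins)
    # The fair-coin binomial is symmetric, so double the lower tail
    # P(X is at most low); cap at 1 in case the two tails overlap.
    tail = sum(comb(games, k) for k in range(low + 1)) / 2 ** games
    return min(1.0, 2 * tail)


p_value = two_sided_p_value(20, 43)
print(f"p-value for 20 wins in 43 games: {p_value:.3f}")
```

<p>The p-value that comes out is far above the conventional 0.05 cutoff, which is another way of saying that a 20-23 record is entirely ordinary for a fair coin.</p>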
<div style="text-align: justify;">
<p>Of course, the sample size is naturally restricted by the small number of Super Bowls, but if the author (Mark Pesavento) had really been interested in the question of whether or not the coin toss is correlated with the winner in a football game, he could&#8217;ve easily collected data over a couple of seasons and obtained an answer to the question.  At the very least, he could&#8217;ve owned up to the fact that his analysis is worthless, but instead, to the critics he offers only the following rebuttal: &#8220;because of the small sample size, some statisticians argue that the win-loss record of coin-toss winners is statistically insignificant.&#8221;</p>
<p>This is completely disingenuous, because it suggests that there would be a debate among statisticians about the significance in the data Pesavento uses, when no such debate exists.  Anyone with even a rudimentary background in statistics would understand that the sample size here would be too small to draw the conclusion he draws.</p>
<p>Moreover, Pesavento falls for one of the most common traps in statistics: mistaking correlation for causation.  Even if the data was much stronger in indicating that the coin toss winner is at a disadvantage, this would not imply that Brees should hope to lose the toss.  A correlation between these two effects does not imply a causal relationship between the two.  I feel like I&#8217;ve discussed this before, but just in case, here&#8217;s a thorough <a href="http://en.wikipedia.org/wiki/Correlation_does_not_imply_causation">discussion</a> of this misconception.</p>
<p>Here this point is moot, since we don&#8217;t even have a correlation.  I thought no one would need to point out that &#8220;No correlation does not imply causation,&#8221; but apparently we do.</p>
<p>Thankfully, most of the comments on Pesavento&#8217;s post are scathing in regards to his methods.  But that&#8217;s cold comfort in light of the fact that the article was deemed fit for posting on the front page of Yahoo.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2010/02/lying-with-statistics.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Football Pools, Part 3</title>
		<link>http://www.mathgoespop.com/2010/01/football-pools-part-3.html</link>
		<comments>http://www.mathgoespop.com/2010/01/football-pools-part-3.html#comments</comments>
		<pubDate>Sun, 31 Jan 2010 16:00:06 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Sports]]></category>
		<category><![CDATA[betting]]></category>
		<category><![CDATA[digital root]]></category>
		<category><![CDATA[football]]></category>
		<category><![CDATA[pool]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/?p=125</guid>
		<description><![CDATA[
<p>This is the third in a series of posts about pools used for betting on the outcome of football games (part one can be found here, and part two here).  Let me briefly recall the setting, which is probably familiar to anyone who has been to a Super Bowl party.  Typically, one bets on the outcome <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2010/01/football-pools-part-3.html">Football Pools, Part 3</a></span>]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">
<p>This is the third in a series of posts about pools used for betting on the outcome of football games (part one can be found <a href="http://www.mathgoespop.com/2009/02/a-variant-of-the-traditional-football-pool.html">here</a>, and part two <a href="http://www.mathgoespop.com/2009/10/more-on-football-pools.html">here</a>).  Let me briefly recall the setting, which is probably familiar to anyone who has been to a Super Bowl party.  Typically, one bets on the outcome of a football game using a 10 x 10 grid.  People can buy any number of the 100 squares on the grid, and when all the squares have been purchased, each row and each column is assigned a random digit from 0 to 9.</p>
<p>Suppose, for example, that you buy four squares, and after the rows and columns have been labeled, you find that you own square 3-7, square 2-5, square 9-0, and square 6-6.  You will win money if, at the end of any one of the four quarters, the last digit in each team&#8217;s score matches your pair.  For example, if the score after the 3rd quarter is 13-27, you will win some money, since the last two digits are 3 and 7, and you own square 3-7.  There are variants of this: some pools only pay out every half, not every quarter, and usually the payouts vary by quarter, so that having the right square at the end of the game wins you more money than having the right square at the end of the first quarter.</p>
</div>
<div id="attachment_133" class="wp-caption aligncenter" style="width: 468px"><a href="http://www.mathgoespop.com/wp-content/uploads/2010/01/fpool3.jpg"><img class="size-full wp-image-133" title="fpool3" src="http://www.mathgoespop.com/wp-content/uploads/2010/01/fpool3.jpg" alt="" width="458" height="227" /></a><p class="wp-caption-text">Here&#39;s an example of a football pool which has been tagged in the four squares mentioned above.</p></div>
<div style="text-align: justify;">
<p>In the first part of this discussion, we introduced a new way to conduct the pool: rather than looking at the last digit of a team&#8217;s score, we looked instead at the digital root of the team&#8217;s score.  Recall that the digital root of a team&#8217;s score is obtained by adding the digits in their score.  If that sum is between 1 and 9, we stop &#8211; if it is larger than 9, we compute the digital root again, until we get a digit between 1 and 9.  For example, the digital root of 14 is 1 + 4 = 5, while the digital root of 38 is 2, since 3 + 8 = 11, and 1 + 1 = 2.  We then analyzed the distribution of scores, and found that the digital roots of teams&#8217; scores are spread more evenly over 1 through 9 than the last digits are over 0 through 9 (this is subject to the convention that we assign 0 a digital root of 9, since 0 is the only number with digital root equal to 0).</p>
<p>In the second part of the discussion, we tackled questions of independence.  Namely, we asked whether the last digit in one team&#8217;s score is independent of the last digit of the other team&#8217;s score, and similarly we asked whether the digital root in one team&#8217;s score is independent of the digital root of the other team&#8217;s score.  In both cases, we found the answer to be negative.</p>
<p>The subject of this article is based on the following observation: when you have wagered in a traditional football pool, it&#8217;s not uncommon for a small number of squares to be hit with high frequency during the course of a game.  For example, suppose you watch a game in which one team scores 7 points, then 3, then 7, while the opposing team never scores.  This means that the game&#8217;s score will go from 0-0, to 7-0, to 10-0, and then to 17-0.  So, while there are four unique scores in the game, with the usual football pool, only two squares will be hit: the 0-0 square, and the 7-0 square.  However, with the digital root pool, four squares will be hit: again using the convention that we assign 0 a digital root of 9, the squares will be 9-9, 7-9, 1-9, and 8-9.</p>
<p>The reason the digital root pool hits more squares in this case is because whenever one team increases its score by 10, the last digit of their score will return to a previous value.  However, with the digital root method, if a team increases its score by 10, the digital root increases by 1.  Because a score increase of 10 is a relatively common occurrence in football (all one needs is a touchdown, extra point, and field goal), one may therefore guess that using the digital root pool, more squares should be hit during the course of the game.</p>
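<p>The 17-0 example above can be replayed with a short Python sketch.  The helper names (<code>digital_root</code>, <code>last_digit</code>, <code>squares_hit</code>) are hypothetical, and the closed-form expression for the digital root is a standard shortcut for repeated digit-summing rather than the step-by-step procedure described earlier.</p>

```python
def digital_root(score):
    """Digital root of a non-negative score, with 0 mapped to 9 as in the post."""
    if score == 0:
        return 9
    # For positive integers, repeatedly summing digits gives 1 + (score - 1) mod 9.
    return 1 + (score - 1) % 9


def last_digit(score):
    """Label used by the traditional pool."""
    return score % 10


def squares_hit(scores, label):
    """Distinct squares hit by a sequence of (team_a, team_b) scores under a labelling rule."""
    return {(label(a), label(b)) for a, b in scores}


# Running score of the example game: 0-0, 7-0, 10-0, 17-0.
game = [(0, 0), (7, 0), (10, 0), (17, 0)]

print("traditional:  ", sorted(squares_hit(game, last_digit)))
print("digital root: ", sorted(squares_hit(game, digital_root)))
```

<p>Feeding in the running score of any real game in place of <code>game</code> gives the number of distinct squares each pool would have hit.</p>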
<p>Whether one would want more squares to be hit or not is up for debate, but I see certain benefits.  For example, if more squares are hit during the game, then more people will have something invested in the game as it airs.  If you are sitting on the square that represents the current score, you want the score to remain the same through the end of the quarter so that you can reap the rewards &#8211; but if the winning squares keep bouncing around between a small number of people, there may be fewer people actively interested in the score as the game progresses.  This is especially true in Super Bowl parties, when many of the attendees are less interested in the game than they should be.</p>
<p>In other words, I&#8217;m of the belief that if more squares are hit, it&#8217;s a good thing.  It therefore becomes natural to ask whether or not the digital root pool actually does hit more unique squares than the traditional pool.  Thankfully, we have a wealth of data which we can use to answer this question.</p>
<p>I looked at all the games from this current season, and counted the number of boxes that would have been hit in each game using the traditional pool and the digital root pool.  Averaged over 331 games (this includes preseason and postseason), the number of squares hit using the traditional pool is approximately 6.84.  By comparison, the number of squares hit using the digital root pool is 8.43 &#8211; an increase of 1.59 boxes, or an increase of about 23%.  This effect is amplified when one considers the fact that the digital root pool uses only 81 squares, as opposed to the traditional pool&#8217;s 100.  This means that as a proportion of the total number of squares, the traditional pool hits about 6.84% of its squares, while the digital root pool hits 10.4% &#8211; here we have an increase of over 50%!</p>
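<p>The percentages in the previous paragraph follow from the two averages alone.  Here is a quick Python check, with variable names chosen purely for illustration:</p>

```python
TRADITIONAL_SQUARES = 10 * 10   # digits 0 through 9 on each axis
DIGITAL_ROOT_SQUARES = 9 * 9    # digital roots 1 through 9 on each axis

avg_hit_traditional = 6.84      # averages over 331 games, as reported in the post
avg_hit_digital_root = 8.43

# Increase in the raw number of squares hit.
raw_increase = avg_hit_digital_root / avg_hit_traditional - 1

# Squares hit as a share of each grid, and the increase in that share.
share_traditional = avg_hit_traditional / TRADITIONAL_SQUARES
share_digital_root = avg_hit_digital_root / DIGITAL_ROOT_SQUARES
share_increase = share_digital_root / share_traditional - 1

print(f"raw increase in squares hit: {raw_increase:.1%}")
print(f"share of grid hit: {share_traditional:.2%} vs {share_digital_root:.2%}")
print(f"increase in share of grid hit: {share_increase:.1%}")
```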
<p>This is strong evidence that the digital root pool hits more squares than the traditional pool.  In fact, the data shows that an average game will have a change in the score approximately 8.73 times, which is only a bit higher than the average number of boxes hit by the digital root pool.  This makes sense when we slice the data another way: of the 331 games analyzed, in 252 of them the number of squares hit with the digital root pool was equal to the number of changes of score, meaning that no square got hit more than once.  The same cannot be said of the traditional pool &#8211; in this case, the number of games in which no square got hit more than once was only 62.</p>
<p>The data has convinced me that the digital root pool may be better suited for festive gatherings, where wagering on football will be but one of many activities designed to induce merriment.  At the very least, it&#8217;s hard to argue that the traditional pool will hit as many squares as the digital root pool. Some may balk at a break from football pool tradition, but that&#8217;s ok.  I won&#8217;t watch football games with them anyway.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2010/01/football-pools-part-3.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Restructuring the Math Pyramid?</title>
		<link>http://www.mathgoespop.com/2009/07/restructuring-the-math-pyramid.html</link>
		<comments>http://www.mathgoespop.com/2009/07/restructuring-the-math-pyramid.html#comments</comments>
		<pubDate>Thu, 23 Jul 2009 03:22:00 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Math Education]]></category>
		<category><![CDATA[calculus]]></category>
		<category><![CDATA[pedagogy]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.mathgoespop.com/2009/07/restructuring-the-math-pyramid.html</guid>
		<description><![CDATA[A friend recently shared with me the following video from TED (see below).  In it, mathematician (or, in this case, mathemagician) Arthur Benjamin gives a brief argument for eliminating calculus as the top of the &#8220;mathematical pyramid&#8221; in high school education, and replacing it with probability and statistics.  The main reason for this shift is <span style="color:#777"> . . . &#8594; Read More: <a href="http://www.mathgoespop.com/2009/07/restructuring-the-math-pyramid.html">Restructuring the Math Pyramid?</a></span>]]></description>
			<content:encoded><![CDATA[<div style="text-align: justify;">A friend recently shared with me the following video from <a href="http://www.ted.com/">TED</a> (see below).  In it, mathematician (or, in this case, mathemagician) <a href="http://www.ted.com/speakers/arthur_benjamin.html">Arthur Benjamin</a> gives a brief argument for eliminating calculus as the top of the &#8220;mathematical pyramid&#8221; in high school education, and replacing it with probability and statistics.  The main reason for this shift is that unless you are planning to have a career in a technical field, it&#8217;s unlikely you&#8217;ll find a use for calculus in your everyday life, but an understanding of statistics can benefit you no matter what you do.  For example, it can help you to build an intuition about day to day decision making when risk and uncertainty are involved.  Here&#8217;s the video (it&#8217;s short, only a couple of minutes):</div>
<p><center><object height="326" width="446"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"><param name="allowFullScreen" value="true"><param name="wmode" value="transparent"><param name="bgColor" value="#ffffff"><param name="flashvars" value="vu=http://video.ted.com/talks/embed/ArthurBenjamin_2009-embed_high.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=587"><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgcolor="#ffffff" allowfullscreen="true" flashvars="vu=http://video.ted.com/talks/embed/ArthurBenjamin_2009-embed_high.flv&amp;su=http://images.ted.com/images/ted/tedindex/embed-posters/ArthurBenjamin-2009.embed_thumbnail.jpg&amp;vw=432&amp;vh=240&amp;ap=0&amp;ti=587" height="326" width="446"></embed></object></center><br />A noble goal, to be sure, and it&#8217;s certainly a solution that wouldn&#8217;t cost a whole lot.  There is an argument to be made for such a change lurking in here somewhere, but coming in at under 3 minutes, Benjamin&#8217;s argument barely scratches the surface.  In no particular order, here are some of the problems I have with his proposal:
<div style="text-align: justify;">1) Arguing that students shouldn&#8217;t learn calculus because they may not use it in their everyday life is specious.  By this reasoning, I should never have taken any courses in history, biology, or chemistry.  The purpose of high school education in this country seems to be not only determining what educational avenues students want to pursue further, but also what avenues they don&#8217;t want to pursue.  If you want to argue that students should only be learning things that they can apply to their everyday lives, then you are arguing for a much more sweeping reform of education.</p>
<p>I do acknowledge that there is an opportunity cost at work when we spend a year teaching a student calculus rather than statistics, and certainly the average student will find more use later in life for the latter.  But there&#8217;s also an opportunity cost at work when we spend a year teaching a student statistics rather than calculus, especially for students who aren&#8217;t sure in what direction their academic future will head.  If anything, this seems to be an argument for offering both statistics and calculus for students, rather than forcing them into one option or the other.</p>
<p>2) About 2/3rds of the way through the talk, Benjamin asserts that &#8220;if our students, if our high school students, if all of the American citizens knew about probability and statistics, we wouldn&#8217;t be in the economic mess we&#8217;re in today.&#8221;  This is met with some cheers from the audience, but is it actually true?</p>
<p>The answer depends on your definition of &#8220;knowing&#8221; probability and statistics.  I agree that having some knowledge of statistics is a good thing for the population at large, and there are no doubt many fundamental principles that could be taught at a high school level &#8211; for example, the idea that correlation does not imply causation, or the ways in which one can manipulate data or graphs of data.  These topics, among others, are discussed in the book <a href="http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728">How to Lie with Statistics</a>, which is written for a general audience and would make great required reading for any teacher trying to impart intuition and a healthy dose of skepticism to his or her students.</p>
<p>However, even if everyone in America had this basic level of knowledge, it&#8217;s not at all clear that this would have somehow saved us from economic catastrophe.  If you work in finance, odds are pretty good that you already have a knowledge of statistics that goes beyond a high school level, but this didn&#8217;t stop the economy from tanking.</p>
<p>Nassim Nicholas Taleb wrote an excellent <a href="http://www.edge.org/3rd_culture/taleb08/taleb08_index.html">article</a> last year on the limits of statistics, which is well worth a read if you can spare the time.  One of the arguments he makes is that part of the reason the financial models caused such an economic implosion was that these models are necessarily unable to predict <a href="http://en.wikipedia.org/wiki/Black_swan_theory">black swan</a> events, which can have a tremendously negative impact, but are also tremendously rare.  In fact, he argues that statistics is actually quite poor at trying to predict what will happen in extremely complex systems where rare extreme events can have a profound effect on the system.</p>
<p>However, this is not what one learns in a high school or even undergraduate class on statistics.  Most problems at this level involve simple systems (games of chance, for example).  In other words, studying statistics at a low level does not expose one to the subtleties and limitations of the subject &#8211; in particular, I don&#8217;t think it&#8217;s feasible to say that if every high school graduate had taken a course in statistics, somehow we would have prevented the current economic catastrophe.  To do so would have required a much deeper understanding of statistics among those applying the financial models than can be supplied at the high school level.</p>
<p>This brings me to my third point&#8230;</p>
<p>3) To have a good understanding of statistics, one must already have a working knowledge of calculus.  There is a limit to the amount of depth a probability or statistics course can explore when calculus is not a prerequisite, and because of this many results (such as the Central Limit Theorem) are stated without proof.  This is fine if you are simply trying to expose students to some of the standard tools in the subject, but if you can&#8217;t go deeper, there really is a limit to the level of understanding a student can achieve.</p>
<p>I agree that there is a great deal of value in teaching statistics to high school students, even at the level of pre-calculus.  One can still impart a significant amount of intuition at this level.  However, for students who plan to use statistics in any significant capacity, it&#8217;s important that they develop a working knowledge of calculus as well.</p>
<p>If you&#8217;re not planning on going into a technical field, certainly you&#8217;ll get more value out of a basic statistics class than you will out of a calculus class.  But students who dislike math in general will probably still dislike statistics, even though a statistics course offers more for someone who&#8217;s not interested in math to like than a calculus course does.</p>
<p>This, in turn, brings me to my next point&#8230;</p>
<p>4) In the larger debate over the failings of math education in America, the choice of whether to teach statistics or calculus in secondary school misses the point entirely.  By the time students reach the later years of their high school career, most already have a pretty well developed sense of their relationship to mathematics &#8211; either they had good teachers and enjoyed the subject, or, through a series of misfortunes which may have been out of their control, they have come to feel that math is a subject they will never understand.  Students in the second group will struggle with math until they have the freedom to stop taking math classes, and are finally free from its iron grip.</p>
<p>Sadly, the problems with math education in this country run much deeper, and swapping out calculus for stats at higher levels won&#8217;t alleviate the fundamental problems students have with mathematics.  When I grade papers in a calculus class, students make just as many (if not more) algebra mistakes as they do calculus mistakes.  In other words, many students leave the year without having mastered the math concepts presented to them during that year.  Compound this over several years, and it doesn&#8217;t matter if you give them a calculus book or a statistics book &#8211; they will have trouble because they haven&#8217;t mastered the prerequisites.</p>
<p>Certainly one can argue that there are fewer prerequisites in a statistics class, but prerequisites are still present, and algebra is certainly one of them.  If a student has a poor understanding of algebra, it&#8217;s reasonable to assume he will have significant gaps in his understanding of statistics, and if the goal is to give students an intuition for randomness and understanding data that can help them in their everyday lives, gaps in statistical understanding are significant problems.  Therefore, achieving this goal isn&#8217;t as simple as making sure every high school senior has taken a statistics class &#8211; we really need to insist that every student first has a working knowledge of algebra.  This is a problem we have already, and is not resolved by Benjamin&#8217;s proposal.</p>
<p>5) This is a small point, but important.  What really bothers me about this talk is when Benjamin makes the statement that, &#8220;If [probability and statistics] is taught properly, it can be a lot of fun!&#8221;  Well, yes, but this is true of any subject.  Implicit here seems to be the idea that calculus cannot be fun, even if taught well.  I&#8217;m sure this isn&#8217;t Benjamin&#8217;s intention, but it&#8217;s easy to misinterpret, especially if you are someone who has never taken a calculus class, or has fallen victim to the commonly held opinion that calculus is some kind of black magic whose secrets only a chosen few can hope to unravel.</p>
<p>The truth (and one that Benjamin knows) is that any math class can be fun if taught properly.  A more accurate statement might be &#8220;it&#8217;s <span style="font-style: italic;">easier</span> to make probability and statistics fun for students,&#8221; because of the vast applicability to everyday life, from games of chance to calculating the probability that someone in a family is colorblind.  But to suggest that statistics is inherently more fun for students than calculus does a disservice to all the great teachers of calculus.  Either class can be fun and valuable if taught well, or traumatizing if taught poorly.</p>
<p><center><object height="344" width="425"><param name="movie" value="http://www.youtube.com/v/nJ3qw4McwO0&amp;hl=en&amp;fs=1&amp;"><param name="allowFullScreen" value="true"><param name="allowscriptaccess" value="always"><embed src="http://www.youtube.com/v/nJ3qw4McwO0&amp;hl=en&amp;fs=1&amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" height="344" width="425"></embed></object><br /><span style="font-size:78%;">Mr. Prezbo, using probability to make math fun.  No doubt he could work this magic on other math subjects as well.</span></center><br />I understand what Benjamin is saying, and I also understand its appeal.  The argument works well as a 3 minute sound clip, but upon further reflection, there are some significant questions that need to be addressed.  There are many problems with math education in this country, and I&#8217;m not sure which, if any, are solved by this proposal.</p>
<p>From my own experience, no students in my high school were forced to take either calculus or statistics, although both courses were offered.  Preparing students exclusively for either one or the other will of course do a disservice to some, so perhaps putting both on the table is the best compromise, although this becomes a problem for schools with limited resources.  I am confident, however, that simply putting statistics on the pedestal currently occupied by calculus doesn&#8217;t do a whole lot in terms of fixing everything that&#8217;s broken.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.mathgoespop.com/2009/07/restructuring-the-math-pyramid.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

