The Danger of a Small Sample Size

Fair warning: little geographic content makes it onto Twelve Mile Circle today. Mostly I’ll focus on statistics. No, no, don’t go running for the door quite yet. It will be fun, and the statistical slant will be relatively mild: grossly oversimplified, full of sweeping generalizations, and involving no actual mathematics.

I went to a Washington Nationals baseball game yesterday evening. If you’re a baseball fan I already know what you’re thinking, but actually this is a happy story.


Final Home Game of the 2009 Season

Nationals Park. Photo by howderfamily.com; (CC BY-NC-SA 2.0)
Nationals Park on September 30, 2009; a crappy image taken with my mobile phone. Check out the size of the crowd

Imagine the Nats down 3-4 in the bottom of the ninth inning, bases loaded, two outs, a batter facing a full count, and BAM, a walk-off grand slam homer with the Nationals taking the game 7-4 over the Mets.(1) The Post declared “the last scene of the 81st and final home game of the season was unlike any moment that preceded it, perfect and delirious…”

This was the last home game of the year, a final chance to catch a ballgame in Washington until next April. I’m not a huge fan of any given team. I have no loyalties. However, I do enjoy attending live sporting events. I love being part of the crowd, soaking up the atmosphere, participating in the pageantry, and drinking a couple of beers with friends. Buy me some peanuts and cracker jacks, and all that. I’d only been to one other game the entire season so I figured I should fit this last opportunity into my schedule when it was offered to me.


Earlier That Season

Ryan Zimmerman Washington Nationals 2009. Photo by howderfamily.com; (CC BY-NC-SA 2.0)
Ryan Zimmerman on May 24, 2009; when I remembered to bring a much better camera

Oddly, the only other game that I’d happened to attend this year also featured a grand slam home run. That time, on May 24, 2009, it came off the bat of Adam Dunn. He put the Nats out front in the seventh inning. They left the field with a victory over the Orioles, 8-5.

Getting back to the title of this article, the Danger of a Small Sample Size: if one were to examine only these two instances, one might conclude that the Washington Nationals are a fantastic team, producing a grand slam home run every game as they cruise to another victory.


Not As it Seems

Let’s get real. Grand slams don’t happen often. The chance of seeing one occur in two separate pseudorandom games — the only two games attended in an entire season — has to be considerably less than one percent. I’m sure the baseball statisticians out there could find the actual number of grand slams, the actual number of games, and the joint probability of two such occurrences. If there are any baseball math nerds out there who’d like to take this on as a challenge and post the calculation/results in the comments below, it would be a truly outstanding contribution. Until then I’m going to forgo precision and guess “it is fairly remote.”

Let’s look at the other improbable thought. This one is easy to dispel because the entire population is well known, and in fact is published widely every day. As of October 1, 2009, the Nationals have won 55 games and lost 103, with four games remaining in the regular season. This isn’t good. In fact it’s the worst record in Major League Baseball. Any season with a hundred or more losses is considered pretty dismal, and the Nationals have accomplished this dubious achievement two years in a row.

Sample size of two: the Nationals are a great team.
Larger sample: well, not so much.
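For anyone who wants to see the contrast as numbers, here is a minimal sketch. The function name win_pct is made up for illustration, and the only inputs are the two games attended and the 55-103 record quoted above.

```python
def win_pct(wins, losses):
    """Fraction of decided games that were won."""
    return wins / (wins + losses)

# Sample of two: both games attended this season were Nationals wins.
sample = win_pct(2, 0)

# Larger sample: the record as of October 1, 2009, as quoted in the post.
season = win_pct(55, 103)

print(f"Two-game sample: {sample:.3f}")
print(f"158-game record: {season:.3f}")
```

The two-game sample comes out to a perfect winning percentage, while the 158-game record lands a bit under .350.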

Either way, I’ll be out at the ballpark next spring. Maybe I’ll bring them some luck.


12MC Loves Footnotes!

(1) My apologies to those of you who live in parts of the world not familiar with this sport. It’s probably best to simply note that this was “an exciting and somewhat rare event.”

Comments

3 responses to “The Danger of a Small Sample Size”

  1. Craig

    Taking a crack at this….

    Each team plays in 162 games in the regular season, and there are 30 teams in MLB. Every game involves two teams, so there are 2430 games played per season.

    From 2001 to 2008, the average number of grand slams was 129.6 +/- 4.7 per season. (I note that 1999 and 2000 were each higher at 140 and 176, but I’m no baseball statistician, so I don’t know why.)

    I have read that having two grand slams in one game has only happened 13 times in the history of baseball, so I’m going to assume that most grand slams in a season are in distinct games.*

    That means that the probability of seeing one in a game, lately, has been about 5.3+/-0.2%. And seeing one in each of two independently chosen games should be about 0.28+/-0.02%.

    That makes your chances of choosing two games and having a grand slam be hit in both equal to about one in a little over 350.

    That being said, obviously, chances vary depending on the teams, the hitters, the pitchers, the weather conditions, etc.

    (*Even though I know that the Nats’ Josh Willingham had two in one game this season in an away game against the Brewers.)
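The arithmetic in the comment above can be sketched in a few lines. This is a rough illustration, not a rigorous model: the constants are the figures quoted in the comment (not re-verified here), the function names are made up, and it assumes at most one grand slam per game and that the two games are independent.

```python
TEAMS = 30
GAMES_PER_TEAM = 162
GRAND_SLAMS_PER_SEASON = 129.6  # 2001-2008 average quoted in the comment


def games_per_season(teams=TEAMS, games_per_team=GAMES_PER_TEAM):
    # Each game involves two teams, so halve the team-game count.
    return teams * games_per_team // 2


def p_slam_in_one_game(slams=GRAND_SLAMS_PER_SEASON):
    # Assumption: every grand slam falls in a distinct game.
    return slams / games_per_season()


def p_slam_in_both_games(slams=GRAND_SLAMS_PER_SEASON):
    # Assumption: the two games are independent, so probabilities multiply.
    return p_slam_in_one_game(slams) ** 2


p_one = p_slam_in_one_game()
p_two = p_slam_in_both_games()
print(f"Games per season:      {games_per_season()}")
print(f"P(slam in one game):   {p_one:.2%}")
print(f"P(slam in both games): {p_two:.2%}")
print(f"Roughly one in {1 / p_two:.0f}")
```

Under those assumptions the result lines up with the "one in a little over 350" figure. Team, hitter, and ballpark effects are ignored entirely.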

    1. Twelve Mile Circle

      Craig: I went through a similar mental exercise albeit less precise and came up with a figure within the same (ahem) ballpark. I got the 2430 pretty quickly but didn’t have access to the grand slam per season figures (except for a single season, 2008 I think, that was listed in Wikipedia) so I didn’t have great confidence in it. You’ve obviously added some much needed precision which is much appreciated. One in ~350 is definitely pretty unusual but hardly anything approaching lightning-striking territory. I think what you’re saying is that I’m not lucky enough to quit my day job and play the lotto full time? 😉

  2. Greg

    This is remarkably close to a crossover with my second-favorite blog, Five Thirty Eight (a politics-meets-stats blog run by a guy who got his start as a professional baseball statistician).

    If I had to guess why the ’99 and ’00 grand slam numbers were so much higher (and I say this without looking anything up, so I’ll probably be wrong here), it would be that those were basically two big Steroid Era years. So maybe home runs in general happened much more often, and grand slams rose with them.
