When discussing predictive statistics in baseball, year-to-year correlations are an important, yet sometimes overlooked, factor. In a previous post, I noted that baseball teams play 162 games a season in the hope that, over such a long schedule, the best teams will emerge because of their skill level rather than random luck.

Similarly, we often hear the phrase “small sample size” as a means of refuting a belief that a player’s true talent level can be deciphered after only a few chances. For example, looking at Brett Lawrie’s six home runs after 25 games at the Major League level, we can’t assume that he’d then hit 36 home runs over a full season.
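
To put a rough number on how easily a hot start can happen, here is a minimal sketch. It is not a model of Lawrie himself: the "true talent" rate of 20 home runs per 600 plate appearances and the figure of about four plate appearances per game are assumptions of mine, and treating every plate appearance as an independent coin flip is a simplification.

```python
from math import comb


def prob_at_least(k, n, p):
    """Exact binomial probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical true talent: 20 home runs per 600 plate appearances.
TRUE_HR_RATE = 20 / 600
# Assumption: roughly 4 plate appearances per game over 25 games.
PLATE_APPEARANCES = 100

expected_hr = TRUE_HR_RATE * PLATE_APPEARANCES
p_hot_start = prob_at_least(6, PLATE_APPEARANCES, TRUE_HR_RATE)

print(f"Expected home runs in {PLATE_APPEARANCES} PA: {expected_hr:.1f}")
print(f"Chance of hitting 6 or more anyway: {p_hot_start:.1%}")
```

Under those assumptions, a hitter we'd expect to hit three or four home runs in that span still reaches six somewhere around one time in ten, which is why 25 games can't tell us much about who is a 20-homer hitter and who is a 36-homer hitter.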

Think about it in terms of a scientific experiment. If you run an experiment one time, can you safely assume that you’ll always get that result? Who would you trust more: the doctor who claims that smoking cigarettes cures cancer because he found one smoker whose cancer cells went into remission; or the doctor who claims that smoking cigarettes causes cancer because he found thousands of smokers who were all found to have similar types of cancer?

While we’re not exactly breaking new ground by concluding that the more data you have the better, baseball fans, even with a whole season’s worth of information, can still misuse and misunderstand what has been collected. More often than not, we look at a previous year’s number and automatically assume that it is a proper indication of skill or talent level.

Enter Beyond The Box Score’s Hitting Metrics Correlation Chart.

BTB’s Bill Petti looked at every batter from 2001 to 2008 who had at least 300 plate appearances in back-to-back seasons and compared their numbers. The idea is that numbers that stay close from year to year are a better indication of skill, whereas numbers that vary widely point to elements outside of the batter’s control.
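
For anyone who wants to try this kind of comparison at home, here is a rough sketch of the idea. It is not Petti's actual code or data: the function names and the handful of rows are made up for illustration, and I'm assuming a plain Pearson correlation between a stat in one season and the same stat the next.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_to_year_pairs(rows, min_pa=300):
    """Pair each qualifying player-season with the same player's next season.

    rows: iterable of (player, season, plate_appearances, stat_value).
    A pair is kept only if both seasons reach min_pa plate appearances.
    """
    by_key = {(player, season): (pa, value) for player, season, pa, value in rows}
    pairs = []
    for (player, season), (pa, value) in sorted(by_key.items()):
        following = by_key.get((player, season + 1))
        if following is None:
            continue
        next_pa, next_value = following
        if pa >= min_pa and next_pa >= min_pa:
            pairs.append((value, next_value))
    return pairs


# Made-up walk rates, purely for illustration.
rows = [
    ("A", 2001, 600, 0.10), ("A", 2002, 610, 0.11),
    ("B", 2001, 550, 0.05), ("B", 2002, 500, 0.06),
    ("C", 2001, 450, 0.08), ("C", 2002, 480, 0.08),
    ("D", 2001, 200, 0.15), ("D", 2002, 520, 0.04),  # dropped: under 300 PA in 2001
]

pairs = year_to_year_pairs(rows)
r = pearson([first for first, _ in pairs], [second for _, second in pairs])
print(f"{len(pairs)} qualifying pairs, year-to-year r = {r:.2f}")
```

A value of r close to 1 means hitters tend to repeat the stat from one season to the next, while a value near 0 means last year's number tells you little about this year's.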

It makes sense that a player’s plate discipline numbers have a high correlation from year to year, because they best describe a hitter’s approach at the plate, which is unlikely to change drastically from one year to the next. After that, we see similarly high correlations for a batter’s ground ball to fly ball ratio, meaning that not only is contact similar year to year, but what happens to that contact is also similar.

Despite this, we see relatively low year-to-year correlations for batting average and batting average on balls in play (BABIP), which to me suggests that, even though there’s such a thing as ground ball hitters and fly ball hitters, the frequency with which batted balls find open ground as opposed to a fielder’s glove is maybe a little bit more random than I would’ve thought.

We also see a low year-to-year correlation for line drive rate, which itself often correlates with BABIP. So, while we can conclude that the angle at which a ball comes off the bat is important in deciding whether or not it’s a hit, actually controlling that angle isn’t necessarily a repeatable skill. For more on line drive rates, check out this article from The Hardball Times.

Petti goes over some of his other findings on the site, and it’s well worth a read.