The real baseball season is dawning (or has already dawned), and for most fans optimism still reigns. Of all the things to look forward to, we are most optimistic about the players who had big seasons in 2011. They “broke out,” after all, so surely they will retain that level of performance!
The rise of sabermetric analysis and its dissemination by way of the Interweb has dampened that sort of enthusiasm a bit, as the notion of “regression” looms. However, while the term is used a fair bit, in my experience people still seem to get a bit confused as to what it means in certain contexts. Rather than offering yet another primer, perhaps a few simple and concrete examples of regression would make things clear.
Or, if you simply think that it is all sabermetric hocus-pocus, remember the following principle: “Regression to the mean does not apply to my favorite player.”
Here is one excellent, non-technical introduction to the concept of regression to the mean for baseball fans, and this article by Dave Studeman at The Hardball Times is also a classic. I cannot better those, but for those who want a quick explanation right here, I will try to give one that does not contain too many howlers. After all, I’m not all that different from the average joe in my mathematical (in)competence.
When we use the term “regression” in a sabermetric context, we do not necessarily mean “get worse.” We mean that a player’s true talent is probably closer to the average of the population to which he belongs. We use it as a tool to estimate what his “true talent” is as opposed to his observed performance.
For example, when a player goes 0-4, we do not think he is “really” a .000 hitter. The same player could go 4-4 in the next game, and no one would think he was a 1.000 hitter, or, combined with the previous game, a true talent .500 hitter. We know this because baseball fans have a basic, intuitive understanding of things like “sample size” and “random variation,” even if we do not use those words. We understand that players will sometimes play above or below their true talent for a period of time.
What sometimes gives us trouble is understanding that even one season is not a huge sample. We can understand a one-month hot or cold streak, but a whole season? That is not to say that one season does not tell us something important about a player’s true talent — it does. It simply does not tell us as much as we often think. When estimating a player’s true talent, we have to figure out what population he belongs to and “regress” his performance a certain amount toward the average of that population — the player’s “mean.” The more data we have of a player’s performance, the less we have to regress, although good projections always regress. But we don’t want to get too far into the mechanics and philosophy of projections (I’ve written about it here and here) at the moment.
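For those who like to see the arithmetic, here is a minimal sketch in Python of what “regressing a certain amount” can look like. It assumes one common shrinkage form, in which the observed rate and the population mean are blended according to sample size. The function name, the .320 league-average wOBA, and the regression constant of 220 plate appearances are all illustrative assumptions, not figures from this article or from any particular projection system.

```python
def regress_to_mean(observed, sample_size, population_mean, regression_constant):
    """Blend an observed rate with the population mean.

    regression_constant is a hypothetical tuning value: the sample size
    (for example, plate appearances) at which the observed rate and the
    population mean receive equal weight.
    """
    if sample_size < 0 or regression_constant <= 0:
        raise ValueError("sample_size must be >= 0 and regression_constant > 0")
    # More data means more weight on what the player actually did.
    weight = sample_size / (sample_size + regression_constant)
    return weight * observed + (1 - weight) * population_mean


# Assumed values, for illustration only.
LEAGUE_WOBA = 0.320
REGRESSION_PA = 220

# The same .400 observed wOBA is trusted more as plate appearances pile up.
for plate_appearances in (50, 300, 650):
    estimate = regress_to_mean(0.400, plate_appearances, LEAGUE_WOBA, REGRESSION_PA)
    print(plate_appearances, round(estimate, 3))
```

The estimate never quite reaches the observed .400, no matter how many plate appearances are supplied, which matches the point that good projections always regress at least a little.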
Perhaps we can intellectually acknowledge this, but the idea that the best are not as good as they look and the worst are not as bad still may “feel wrong.” You’re telling me that Roy Halladay isn’t as good as he has pitched? Well, not exactly. Regression to the mean and projections generally have trouble “nailing” single player true talent. But the idea is really to use statistical tools to be right for populations in general, and to be close on groups of individuals rather than “nailing” specific players.
So as a way of illustrating that regression does, in fact, happen for groups, I have taken a simple idea inspired by The Book. As we start the season, we may be expecting that last year’s best players will repeat their performances, so I thought it was a good time to look at the leaders (and trailers) from 2010 to see what happened with them in 2011. Some were better, some were worse, but what did they do collectively?
To begin with, let’s check out 2010’s top 15 qualified hitters by wOBA and compare their 2011 performances:
|Player||2010 wOBA||2011 wOBA|
As you can see, while most of the hitters listed here did worse in 2011 than in 2010, some did better. Does this show that regression “does not apply” to some hitters?
Not really. Remember, we never really “know” what a player’s true talent is. Even after years of performance, we still have to adjust for age. Again, regression is a tool to help us reduce our error in estimates. The hitters with the best numbers in every year generally are playing “over their heads.”
In any case, just comparing two years without any other information is not enough to tell us which year contains more random variation: we do not know just from this whether Jose Bautista was more over his head in 2010 or 2011. You can come up with possible “reasons” for specific declines (Choo’s injury, for example). But again, this is a very basic demonstration. Generally, most of the best are not as good as they looked.
“Regression” is most often taken to mean that a player will get worse, and that is how the word is used in other contexts. However, as mentioned earlier, it also means that players who are part of the population in question are probably closer to average than they appear. So let’s look at 2010’s worst hitters by wOBA:
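If you would rather see the group effect without arguing about any particular player, here is a toy simulation in Python. Everything in it is invented for illustration: the players are imaginary, and the spread of true talent (.030) and of single-season noise (.025) around a .320 average are assumptions, not estimates from real data. It picks the top and bottom 15 by “year one” results and checks how those same groups do in “year two,” when their talent is unchanged and only the luck is redrawn.

```python
import random


def simulate_leaders(n_players=150, n_leaders=15, seed=0):
    """Compare year-one leaders and trailers with their own year two.

    All distribution parameters below are assumptions for illustration.
    """
    if not 0 < n_leaders <= n_players:
        raise ValueError("n_leaders must be between 1 and n_players")
    rng = random.Random(seed)
    talent_mean, talent_sd, noise_sd = 0.320, 0.030, 0.025

    # True talent is fixed; each season adds independent random variation.
    talents = [rng.gauss(talent_mean, talent_sd) for _ in range(n_players)]
    year1 = [t + rng.gauss(0, noise_sd) for t in talents]
    year2 = [t + rng.gauss(0, noise_sd) for t in talents]

    # Rank players by year-one results only, best first.
    ranked = sorted(range(n_players), key=lambda i: year1[i], reverse=True)
    top, bottom = ranked[:n_leaders], ranked[-n_leaders:]

    def group_mean(values, indices):
        return sum(values[i] for i in indices) / len(indices)

    return {
        "top_year1": group_mean(year1, top),
        "top_year2": group_mean(year2, top),
        "bottom_year1": group_mean(year1, bottom),
        "bottom_year2": group_mean(year2, bottom),
    }


for label, value in simulate_leaders().items():
    print(label, round(value, 3))
```

Individual simulated players can improve or decline, just like the real ones, but in a setup like this the leaders as a group tend to fall back toward the average and the trailers tend to climb toward it, even though nobody’s talent changed.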
|Player||2010 wOBA||2011 wOBA|
Again, as a group they were considerably better in 2011 than in 2010. Does this mean that Melky Cabrera and Jeff Francoeur are “really” the good hitters they were in 2011 rather than the dreadful ones they were in 2010?
Well, no. There was probably plenty of “luck” (bad in 2010 and good in 2011) at work in both cases. Regression and projection are generally going to miss… a lot. But using these techniques helps cut down that error by not getting “caught up” in one really good or bad year.
How about pitchers? Let’s check out 2010’s best qualified starting pitchers by ERA (using FIP would be sort of cheating, since it already effectively regresses balls in play) compared to their 2011 ERA. Note that there are only 13 pitchers on this list because two of 2010’s ERA leaders, Adam Wainwright and Johan Santana, did not pitch in 2011:
|Player||2010 ERA||2011 ERA|
The nature of pitching and the ERA statistic itself mean that ERA is generally going to vary more year-to-year than something like wOBA, so as you can see, the group appears to have regressed quite a bit compared to the hitters (although that is just a subjective comparison, not a statistical statement). Trevor Cahill stands out. Remember how after 2010 certain people who shall go nameless insisted that he could outpitch his peripherals? That was awesome.
The best wouldn’t be the best if there wasn’t a worst. Or something. So we present 2010’s worst qualified pitchers by ERA compared to 2011. Again, this list is shorter because a couple did not pitch in 2011 (selection bias at work):
|Name||2010 ERA||2011 ERA|
At this point, you’ve probably gotten the idea. Again, this group “improved” in year two. (Kyle Kendrick’s 2011 numbers, by the way, are just as a starter. I don’t think I missed anyone else too badly.)
One thing to remember with regression is that it means regression to the mean of the population from which the individual is drawn. When doing careful projections, one can get more specific in determining what population a player belongs to by handedness, size, position, and so on. I say this all as a way of noting that while Kyle Davies seemingly did not regress toward the mean in 2011, it is, um, “questionable” whether he really ever belonged to the population of major-league quality pitchers. It’s too bad we don’t get to see a Davies-Mathis battery in Toronto.
All that is a roundabout way of saying that you can basically stick to two principles:
1) A player’s true talent is usually closer to average than it appears.
2) But regression to the mean totally does not apply to your favorite player.