Runs Created, Base Runs, Estimated Runs Produced, wOBA, EqA/TAv… all of these sabermetric stats are meant to estimate an individual’s, team’s, or league’s expected runs. They fall into different categories of run estimators. They can be used for many different purposes, but the one most of us are immediately concerned with is how they measure the relative performances of individual hitters. Thus, we want the one that is the most “accurate.”
So much has been written about which is the “best” over the years that there is no point in trying to do it again. Instead, I want to offer some observations and reflections on the practical side of things: how much of a difference does using one particular sort of run estimator over another really make?
If there is anything I’ve learned from dabbling in sabermetrics over the last few years it is this: while we have learned a lot about baseball through objective research, that amount is dwarfed by what we still do not know. That “what we do not know” region is the “fog” that Bill James defined as the task of sabermetrics (and what gives this column its oh-so-clever title).
However, if there is one thing that sabermetrics has done pretty well, it is run estimation. As I wrote above, I will not get into everything about run estimators here; check out the links I have provided in different places in the post. When you see “Runs Created” or “Batting Runs” on a player page at a particular site, what does it mean?
It usually means the number of runs we would expect the events that player is responsible for to produce, given the same number of plate appearances (or outs, another issue too large and complex to get into here), on an average team. This use of run estimators as applied to individuals thus attempts to put players on a level playing field. Yes, each walk by a player on the 2012 Cardinals is worth more than one by a player on the 2012 Pirates, but if we are comparing the performances of all the hitters in the league, would taking that into account really be fair to Andrew McCutchen? I would guess that for most of our discussions, the answer would be “no.”
The truth is that there are not giant differences, generally speaking, between run estimators in each season. Yes, some are more accurate than others. On an individual level (which is different from a team or league level, but that is a topic for another time), the best tests have generally shown that linear weights-based estimators (of which wOBA/wRAA, TAv/Tr, and Batting Runs are all varieties) are superior to stuff like OPS and James’ Runs Created. But how does that work itself out in a practical way? I’ve had people say that a preference for linear weights-based estimators is simply idiosyncratic, so (if only for the sake of nostalgia) let’s take a brief, closer look using very basic versions of run estimators.
For the sake of simplicity, I have chosen two very basic estimators that, at least before wOBA made all of this much easier for everyone, are pretty simple and easy-to-use. Bill James’ Runs Created is an early and very simple dynamic run estimator. James has added later, more technical versions, but they are considerably more complicated, and simplicity was one of the main reasons the original version was so attractive. The formula I will use here is like James’ Basic Runs Created, except that for walks I use walks minus intentional walks plus hit by pitches. Thus:
RC = ((Hits + Walks – Intentional Walks + Hit by Pitches) x Total Bases)/(At-Bats + Walks)
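For anyone who wants to play along at home, here is a minimal Python sketch of the formula above. The function name and the stat line are made up for illustration (it is not a real player season), and this is the raw figure before any league-level scaling.

```python
def basic_rc(h, bb, ibb, hbp, tb, ab):
    """Basic Runs Created, with the walk term replaced by BB - IBB + HBP."""
    on_base = h + bb - ibb + hbp   # times on base (modified walk term)
    opportunities = ab + bb        # denominator exactly as written above
    return on_base * tb / opportunities


# Hypothetical stat line, chosen only to show the arithmetic.
rc = basic_rc(h=180, bb=80, ibb=10, hbp=5, tb=320, ab=550)
print(round(rc, 1))  # 255 * 320 / 630 -> 129.5
```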
There are many versions of linear weights estimators, but an early one that is simple enough to “match up” with Basic RC is Paul Johnson’s Estimated Runs Produced. It also has an ironic historical connection with James, who once claimed it was better than linear weights… although it turned out that ERP was itself a linear weights estimator. I’ve used Patriot’s simplification of the original ERP formula with the same modification for walks described above. Thus:
ERP = (Total Bases + 0.5 x Hits + Walks – Intentional Walks + Hit by Pitches – 0.3 x (At-Bats – Hits)) all times a constant, usually around .32.
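The same sort of sketch works for ERP. Again, the function name and the stat line are my own illustrative choices, and the default constant of 0.32 is just the "usually around .32" figure from the formula, not a fitted value.

```python
def erp(h, bb, ibb, hbp, tb, ab, constant=0.32):
    """Simplified Estimated Runs Produced with the modified walk term."""
    walks = bb - ibb + hbp         # same walk substitution used for RC
    batting_outs = ab - h          # at-bats that did not end in a hit
    return (tb + 0.5 * h + walks - 0.3 * batting_outs) * constant


# Hypothetical stat line, chosen only to show the arithmetic.
runs = erp(h=180, bb=80, ibb=10, hbp=5, tb=320, ab=550)
print(round(runs, 2))  # (320 + 90 + 75 - 111) * 0.32 -> 119.68
```

Because every event has a fixed coefficient, the whole thing is a weighted sum, which is what makes it a linear weights estimator.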
[Four too-brief technical notes: 1) To "even things out," for each season I multiply each formula by a constant so that RC or ERP equals actual runs on a league-wide level. 2) Remember that these calculate "absolute" runs created, not the runs above average figures that are used in, e.g., WAR(P) calculations. That is not a conflict, simply a choice made here for the sake of simplicity. 3) I have not adjusted for parks, leagues, or eras, as the point is to compare the results of the run estimators, not the players. Doing so would add unnecessary complications. 4) These "absolute" values may not match up with other linear weights-based absolute values you might find on the web, but that has to do with using outs rather than plate appearances as a denominator. That comes down to a difference in presentation. Thanks to Patriot (my "father confessor" when it comes to run estimation) for helping to clear up this last point in private correspondence.]
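The scaling in the first note can be sketched as a single multiplier per season. The league totals below are invented, and how the raw league estimate is obtained (summing player estimates or applying the formula to league totals) is left open here; either way, the idea is the same.

```python
def league_multiplier(raw_league_estimate, actual_league_runs):
    """Factor that makes an estimator's league total match actual runs."""
    return actual_league_runs / raw_league_estimate


# Hypothetical season: the raw estimator comes in a bit low league-wide.
raw_league_estimate = 19500.0
actual_league_runs = 20000
k = league_multiplier(raw_league_estimate, actual_league_runs)

# Every player's raw estimate for that season gets the same multiplier.
raw_player_estimate = 119.68  # hypothetical player figure
scaled_player_estimate = k * raw_player_estimate
print(round(k, 4), round(scaled_player_estimate, 1))
```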
As I said above, my concern is not to attempt one more “accuracy” test. I think that issue has been decided pretty clearly for linear weights-based estimators on an individual level. I want to see what differences there are on an individual and somewhat anecdotal level. Using the formulas above, let’s see where the biggest differences lie looking at individual hitter seasons from 1955 to 2011 with at least 500 plate appearances as found in the Lahman database.
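Here is a rough sketch of that comparison. The rows are invented, but they use the column names from the Lahman Batting table (AB, H, 2B, 3B, HR, BB, IBB, HBP, SH, SF). Some loudly labeled assumptions: plate appearances are approximated as AB + BB + HBP + SH + SF, the league-level scaling is skipped, and players with multiple stints in a season would need to be combined first.

```python
def total_bases(row):
    return row["H"] + row["2B"] + 2 * row["3B"] + 3 * row["HR"]


def plate_appearances(row):
    # Approximation: ignores rarer ways of reaching base.
    return row["AB"] + row["BB"] + row["HBP"] + row["SH"] + row["SF"]


def basic_rc(row):
    walks = row["BB"] - row["IBB"] + row["HBP"]
    return (row["H"] + walks) * total_bases(row) / (row["AB"] + row["BB"])


def erp(row, constant=0.32):
    walks = row["BB"] - row["IBB"] + row["HBP"]
    outs = row["AB"] - row["H"]
    return (total_bases(row) + 0.5 * row["H"] + walks - 0.3 * outs) * constant


def rank_by_diff(rows, min_pa=500):
    """Qualified seasons sorted by RC minus ERP, largest first."""
    qualified = [r for r in rows if plate_appearances(r) >= min_pa]
    diffs = [(r["playerID"], r["yearID"], basic_rc(r) - erp(r)) for r in qualified]
    return sorted(diffs, key=lambda item: item[2], reverse=True)


# Hypothetical seasons, not real players.
rows = [
    {"playerID": "slugger01", "yearID": 2000, "AB": 550, "H": 190, "2B": 35,
     "3B": 3, "HR": 45, "BB": 100, "IBB": 15, "HBP": 5, "SH": 0, "SF": 5},
    {"playerID": "average01", "yearID": 2000, "AB": 520, "H": 135, "2B": 25,
     "3B": 2, "HR": 12, "BB": 45, "IBB": 3, "HBP": 4, "SH": 2, "SF": 4},
    {"playerID": "parttime01", "yearID": 2000, "AB": 200, "H": 60, "2B": 10,
     "3B": 1, "HR": 5, "BB": 20, "IBB": 1, "HBP": 1, "SH": 1, "SF": 1},
]

ranked = rank_by_diff(rows)
for player, year, diff in ranked:
    print(player, year, round(diff, 1))
```

Reversing the sort gives the seasons ERP favors most, and sorting by the absolute value of the difference gives the seasons where the two estimators agree.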
Who are the five hitters that Basic RC favors most over ERP?
["Diff" is short for "difference," which is here defined as RC minus ERP.]
Those are five of the best (unadjusted) offensive seasons of modern baseball. In one sense, that is not surprising, since better hitters typically have the largest raw totals of positive events in their stat lines, so little differences will be exaggerated. But those are some pretty big differences: these days, 30 runs, or three wins, is worth about $15 million a year in free agency.
Looking at the five hitter seasons most favored by ERP relative to RC during the same era starts to give us the clue as to what is going on:
You have probably noticed that while there are some good seasons on this list, they are not nearly as great as the seasons RC favors. This confirms what research has shown. For one thing, RC tends to (over)value “great” seasons much more because it is a dynamic run estimator applied to an individual. Briefly: it assumes that once Mickey Mantle takes a walk, he can also drive himself in. That is obviously a mistake, but the focus here is not on which estimator is correct, simply on noting that there is a difference.
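One way to see the "dynamic" part is to ask what one extra walk is worth to two different hitters. This is a sketch with two invented stat lines and my own helper names, using the unscaled formulas from earlier in the post. Under Basic RC, the value of the walk depends on the rest of the hitter's own line; under ERP, it is the same fixed amount for everyone.

```python
def basic_rc(h, bb, ibb, hbp, tb, ab):
    return (h + bb - ibb + hbp) * tb / (ab + bb)


def erp(h, bb, ibb, hbp, tb, ab, constant=0.32):
    return (tb + 0.5 * h + bb - ibb + hbp - 0.3 * (ab - h)) * constant


def extra_walk_value(estimator, line):
    """Change in estimated runs from adding one unintentional walk."""
    with_walk = dict(line, bb=line["bb"] + 1)
    return estimator(**with_walk) - estimator(**line)


# Hypothetical stat lines, not real seasons.
great = dict(h=190, bb=100, ibb=15, hbp=5, tb=366, ab=550)
ordinary = dict(h=135, bb=45, ibb=3, hbp=4, tb=200, ab=520)

for name, line in (("great", great), ("ordinary", ordinary)):
    print(name,
          round(extra_walk_value(basic_rc, line), 3),
          round(extra_walk_value(erp, line), 3))
```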
Moreover, the difference comes about because RC values hits, especially singles, more highly relative to other events than do linear estimators like ERP. The hitters on the first list all had extremely high batting averages (maybe not Bonds, but early 00s Barry Bonds pretty much “breaks” every run estimator anyway for a number of reasons), so that makes sense. The hitters that ERP likes more than RC generally get a greater portion of their value from walks and other non-hits. So whichever one you think is correct, there is certainly an important difference lurking. [Insert Dayton Moore joke here.]
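The relative weight on singles can be checked the same way, by comparing what one extra single is worth against one extra walk under each formula. This is a sketch with an invented stat line and hypothetical helper names; the extra event is added on top of the existing line rather than swapped for an out, which is a simplifying assumption.

```python
def basic_rc(h, bb, ibb, hbp, tb, ab):
    return (h + bb - ibb + hbp) * tb / (ab + bb)


def erp(h, bb, ibb, hbp, tb, ab, constant=0.32):
    return (tb + 0.5 * h + bb - ibb + hbp - 0.3 * (ab - h)) * constant


def extra_event_value(estimator, line, **changes):
    """Change in estimated runs after bumping the given stats."""
    bumped = dict(line)
    for stat, amount in changes.items():
        bumped[stat] += amount
    return estimator(**bumped) - estimator(**line)


# Hypothetical stat line, not a real season.
line = dict(h=135, bb=45, ibb=3, hbp=4, tb=200, ab=520)
single = dict(h=1, tb=1, ab=1)   # one more hit, base, and at-bat
walk = dict(bb=1)                # one more unintentional walk

rc_ratio = (extra_event_value(basic_rc, line, **single)
            / extra_event_value(basic_rc, line, **walk))
erp_ratio = (extra_event_value(erp, line, **single)
             / extra_event_value(erp, line, **walk))
print(round(rc_ratio, 2), round(erp_ratio, 2))
```

For this line, Basic RC puts a noticeably higher price on the single relative to the walk than ERP does, which is the pattern described above.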
The larger point is that if one looks at all players grouped together, there might not be much of a difference, but for certain extreme players, there can be a pretty big one. To illustrate this from a different direction, let’s look at the five players with the smallest differences between their RC and ERP totals (I had to go to four decimals for this!).
See? Hardly any difference! So what’s the big deal, you ask? Well, I would say the main difference between these seasons and the seasons in the first two lists is that these are the kinds of seasons and players that, frankly, we generally do not care too much about. Whether in free agency or, especially, in historical discussions, we are more concerned with those players who are at the extremes, or who have unusual lines. Thus, the big differences between run estimators in those cases weigh much more heavily than those for “meh” seasons.
So how is that for a justification of extended nerd-ery?