It is always worth reminding ourselves: sabermetric research has a long way to go. The (rather obvious) truth is that the objective accounting of everything in baseball is what Edmund Husserl called, in a different context, an “infinite task.” In other words, there is a goal that guides sabermetric research – one that can never be reached. Thus, the actual sabermetric project is never ending.
Fielding and even pitching value are just two of the many areas in which sabermetric research is well known to have a long way to go. One area in which sabermetrics has done well over the years, however, is run estimation. Run estimators do what their name suggests: they estimate the number of runs that would result from a given combination of events. While we already know how many runs actually scored, there are situations in which one would want a different sort of estimate. For example, one might want to know how many runs a player would produce if his events were added to an “average” team, in order to establish his neutral offensive value. Or one might want to know how many runs a projected lineup would produce. Things of that nature.
In any case, there are a number of types of run estimators. The most commonly used type is the linear weights run estimator, and there are various good reasons for this. While the principles behind the various linear weights offensive run estimators are basically the same, they can differ in the exact weights given or in the way those weights are derived. It can be helpful to use the resources of a dynamic run estimator to get more accurate linear weights appropriate to a particular run environment while still being able to apply them to individual players.
This is not the place for a full historical or systematic overview of run estimators. (Here is a good rough outline, and here is a more detailed series of discussions.) Of particular relevance here is the difference between linear and dynamic run estimators.
Linear run estimators are the more commonly used. They assign a coefficient to each event; each coefficient is multiplied by the number of times that event occurred, and the products are added together to produce the runs created total. This can be done for any entity: an individual, a team, a league, and so on. The coefficients can be derived in various ways, with different baselines. One can use either “absolute” linear weights runs created or, more commonly, linear weights baselined against average, that is, relative to the runs a batter or team in an average context would be expected to produce in the same number of plate appearances or outs. One classic linear weights formula is Pete Palmer’s Batting Runs: .47*1B + .85*2B + 1.02*3B + 1.40*HR + .33*(BB + HBP) – ABF*(AB – H).
Its application should be pretty obvious, and it shares the same basic principles with other linear estimators like Paul Johnson’s Estimated Runs Produced or the linear weights used as the basis for wOBA and wRAA as implemented at FanGraphs and Baseball-Reference (which uses wRAA as the basis for Batting Value in rWAR).
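To make the arithmetic concrete, here is a minimal Python sketch of a Batting Runs style calculation using the Palmer weights quoted above. The dictionary keys, the function name `batting_runs`, the sample batting line, and the ABF value of 0.27 are all hypothetical, chosen only for illustration; in practice ABF is recomputed for each run environment.

```python
# Sketch of a linear run estimator in the style of Palmer's Batting Runs.
# Weights are the ones quoted in the text; ABF is a parameter because it
# is recomputed for each run environment.
PALMER_WEIGHTS = {"1B": 0.47, "2B": 0.85, "3B": 1.02, "HR": 1.40, "BB_HBP": 0.33}


def batting_runs(stats: dict, abf: float) -> float:
    """Multiply each event count by its coefficient, sum, and charge ABF per out."""
    w = PALMER_WEIGHTS
    weighted_events = (
        w["1B"] * stats["1B"]
        + w["2B"] * stats["2B"]
        + w["3B"] * stats["3B"]
        + w["HR"] * stats["HR"]
        + w["BB_HBP"] * (stats["BB"] + stats["HBP"])
    )
    return weighted_events - abf * (stats["AB"] - stats["H"])


# Hypothetical batting line and hypothetical ABF, for illustration only.
player = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 50, "HBP": 5, "AB": 550, "H": 155}
print(round(batting_runs(player, abf=0.27), 2))  # prints 17.1
```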
One element that may strike one as strange in Batting Runs is ABF. ABF is the factor that contextualizes Batting Runs so that the total of batting runs above and below average for the league is zero. In this way, the linear weights of the events are “customized” for the seasonal (or other) run environment, since every event’s value ends up being measured relative to the out, that is, the AB – H factor. Similar adjustments can be, and are, made to other linear run estimators like wRAA for a particular season.
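The ABF logic can be sketched the same way: solve for the factor that makes the league’s weighted events exactly offset its AB – H, so league-wide Batting Runs sum to zero. This is a minimal sketch assuming the Palmer weights quoted above; the function names and the league totals are hypothetical.

```python
# Sketch: solve for ABF so that league-wide Batting Runs sum to zero.
# Weights are Palmer's as quoted in the text; league totals are hypothetical.
WEIGHTS = {"1B": 0.47, "2B": 0.85, "3B": 1.02, "HR": 1.40, "BB_HBP": 0.33}


def weighted_events(s: dict) -> float:
    """Sum of coefficient * count for every non-out event."""
    return (
        WEIGHTS["1B"] * s["1B"]
        + WEIGHTS["2B"] * s["2B"]
        + WEIGHTS["3B"] * s["3B"]
        + WEIGHTS["HR"] * s["HR"]
        + WEIGHTS["BB_HBP"] * (s["BB"] + s["HBP"])
    )


def solve_abf(league: dict) -> float:
    """ABF = league weighted events / league (AB - H), so league runs above average = 0."""
    return weighted_events(league) / (league["AB"] - league["H"])


def batting_runs(s: dict, abf: float) -> float:
    return weighted_events(s) - abf * (s["AB"] - s["H"])


league = {"1B": 28000, "2B": 8500, "3B": 900, "HR": 4900,
          "BB": 14500, "HBP": 1800, "AB": 166000, "H": 42300}
abf = solve_abf(league)
print(round(abf, 3))
```

A higher-offense league yields a larger ABF, which is how the out value, and with it every event’s value relative to the out, gets tied to the run environment.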
The basic advantage of a linear run estimator is simplicity: it requires only basic arithmetic. Moreover, it can easily be applied to individual players without logical problems. The disadvantage is that at the collective level, especially for teams and leagues, linear weights estimators are not as accurate as well-designed dynamic run estimators. This is true not only of overall run estimation, but also of an inning-level analysis of how many runs individual events produce. Without getting into all the details of an inning-level analysis, one can see that the simplicity of the multiplicative adjustment comes at the cost of accuracy. While an adjustment like ABF does adjust to the run environment, the basic proportional differences between events stay the same from year to year, even if the numbers themselves change.
One option is to use empirical linear weights for each particular season (or other period) by measuring the empirical average change in run expectancy associated with each event. The basic weights for Batting Runs and wRAA are derived from these in a general way, even if they are not calculated directly for each season. In principle, this is the most accurate way to get seasonal linear weights for each event, since one is directly measuring the average change in run expectancy for each event in a season. Ideally, then, this is the way one would do it. The problems are that without all the play-by-play data for the season in question it cannot be done accurately, and even with that data it is not easy, though some do it (Baseball Prospectus’ TAV currently does this). Moreover, it really only works retrospectively: outside of running a simulation, applying these sorts of linear weights to a projected season or a hypothetical combination of players is difficult.
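The empirical approach reduces to averaging, for each event type, the change in run expectancy plus any runs that scored on the play. The sketch below assumes a run expectancy value for each base-out state has already been computed from the same season’s play-by-play (not shown); the `Play` record, the function name, and the toy numbers are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Play(NamedTuple):
    event: str         # event label, e.g. "1B", "BB", "OUT"
    re_before: float   # run expectancy of the base-out state before the play
    re_after: float    # run expectancy after the play (0.0 if it ended the inning)
    runs_scored: int   # runs that scored on the play


def empirical_weights(plays: Iterable[Play]) -> dict:
    """Average change in run expectancy (plus runs scored) for each event type."""
    total = defaultdict(float)
    count = defaultdict(int)
    for p in plays:
        total[p.event] += p.re_after - p.re_before + p.runs_scored
        count[p.event] += 1
    return {event: total[event] / count[event] for event in total}


# Toy plays with made-up run expectancy values, for illustration only.
plays = [
    Play("HR", 0.50, 0.50, 1),   # solo HR with the bases empty: base-out state unchanged
    Play("HR", 0.90, 0.27, 3),   # three-run HR: runners cleared, same number of outs
    Play("OUT", 0.50, 0.27, 0),  # bases-empty out (nobody out to one out)
]
weights = empirical_weights(plays)
print(weights)
```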
Dynamic run estimators, on the other hand, try to model the run-scoring process so that the value of each event changes based not only on runs per out or per plate appearance, but on the specific combination of events itself. Bill James’s various versions of Runs Created are dynamic run estimators, although they have problems of their own that are discussed at length elsewhere. Over the last several years, David Smyth’s Base Runs (as also theorized by Patriot) has been acknowledged by saberists as an excellent dynamic run estimator. One good explication of Base Runs can be found here.
The basic form of Base Runs is A*B/(B + C) + D, where A is baserunners, B is advancement, C is outs, and D is home runs. How each element is calculated varies from formula to formula, and Patriot provides some starters here. Base Runs has been found to be more accurate than both basic linear estimators and Runs Created, although empirical linear weights will obviously be just as good, and are even needed, at least in a basic form, to get Base Runs started.
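As a rough illustration, the generic formula and one commonly quoted simple set of components can be written as follows. The component definitions here are an assumption for illustration (they are not the Patriot formula used later in this post, which includes more events), and the team totals are hypothetical.

```python
def base_runs(a: float, b: float, c: float, d: float) -> float:
    """Generic Base Runs: A * B / (B + C) + D."""
    return a * b / (b + c) + d


def basic_components(ab: float, h: float, tb: float, hr: float, bb: float) -> tuple:
    """One commonly quoted simple version of the components (an assumption here)."""
    a = h + bb - hr                                       # baserunners, excluding HR
    b = 1.02 * (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb)   # advancement
    c = ab - h                                            # outs
    d = hr                                                # home runs always score
    return a, b, c, d


# Hypothetical team totals.
team_runs = base_runs(*basic_components(ab=5500, h=1400, tb=2200, hr=150, bb=500))
print(round(team_runs, 1))  # prints 699.5
```

The term B/(B + C) acts as the share of baserunners who score, which is what lets the value of each event shift with the rest of the team’s events.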
Base Runs is great for generating team or league run totals, and it is also promising as the basis for individual pitching metrics (I am surprised it has not been more widely used for this). It is not, however, proper to apply Base Runs formulas directly to individual batters. This is because Base Runs models the run-scoring process (runners get on, runners get moved over), and a runner cannot drive himself in.
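The problem with applying Base Runs to a lone batter shows up in an extreme, hypothetical case: a batter who only walks and never makes an out. The sketch assumes a commonly quoted simple set of Base Runs components (an assumption, not Patriot’s formula). Because C is zero, B/(B + C) equals one, so the formula credits every walk with a run, as if the batter were his own team and kept driving himself in.

```python
def base_runs(ab: float, h: float, tb: float, hr: float, bb: float) -> float:
    """Base Runs with simple, assumed components (not Patriot's formula)."""
    a = h + bb - hr                                       # baserunners
    b = 1.02 * (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb)   # advancement
    c = ab - h                                            # outs
    return a * b / (b + c) + hr


# A hypothetical "batter" whose entire line is 100 walks and no outs.
walks_only = base_runs(ab=0, h=0, tb=0, hr=0, bb=100)
print(walks_only)  # close to 100: with C = 0, every walk "scores"
```

In a real team context a walk is worth only a fraction of a run, which is why the indirect approaches below are needed for individual batters.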
However, Base Runs can be used indirectly for individual batters. One way is the theoretical team model: for example, putting a hitter and his events on a theoretical team with eight league-average players to see what difference he makes in runs scored. That is useful, if complex.
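One possible way to set up the theoretical team model is sketched below: scale league-average team totals to eight-ninths to represent eight average hitters, add the batter’s line, and take the difference in Base Runs. The simple component formula, the `Line` class, and all the totals are assumptions for illustration; other setups (for example, prorating by plate appearances) are also used.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Line:
    """Counting stats needed by the simple Base Runs components assumed here."""
    ab: float
    h: float
    tb: float
    hr: float
    bb: float

    def scaled(self, k: float) -> "Line":
        return Line(*(k * getattr(self, f.name) for f in fields(self)))

    def __add__(self, other: "Line") -> "Line":
        return Line(*(getattr(self, f.name) + getattr(other, f.name) for f in fields(self)))


def base_runs(s: Line) -> float:
    a = s.h + s.bb - s.hr                                        # baserunners
    b = 1.02 * (1.4 * s.tb - 0.6 * s.h - 3 * s.hr + 0.1 * s.bb)  # advancement
    c = s.ab - s.h                                               # outs
    return a * b / (b + c) + s.hr


def theoretical_team_runs(batter: Line, league_team: Line) -> float:
    """Runs a batter adds to a lineup of eight league-average hitters."""
    eight_average = league_team.scaled(8 / 9)
    return base_runs(eight_average + batter) - base_runs(eight_average)


# Hypothetical league-average team totals and a hypothetical batter.
league_team = Line(ab=5500, h=1400, tb=2200, hr=150, bb=500)
batter = Line(ab=550, h=165, tb=290, hr=30, bb=70)
print(round(theoretical_team_runs(batter, league_team), 1))
```

Because this Base Runs formula scales in proportion when every count is multiplied by the same factor, a batter whose line is exactly one-ninth of the league-average team is credited with exactly one-ninth of the team’s runs, which makes a handy sanity check.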
Another way is to use Base Runs to generate intrinsic linear weights for an entity, e.g., a major league system, and then apply those linear weights to individuals. Before FanGraphs published its Guts page, I used to publish basic custom wOBA and linear weights using Tango’s old scripts. Since then, I have wanted to do something similar using Base Runs. I did more than just this, but the other results (e.g., a rate metric along the same lines as OPS+ or wRC+) are not ready for prime time.
With help from Patriot a long time ago (and he should not be held responsible for my screw-ups), I found the intrinsic linear weights using this method. As its basis, I used Patriot’s Base Runs formula that matches empirical weights from 1960-2004. But, as noted, because a dynamic run estimator is used to calculate the values for each season, they are adjusted not only for the number of runs per out in each season, but also for all the other events and how they affect one another’s value.
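One standard way to turn a dynamic estimator into intrinsic linear weights is to take the partial derivative of Base Runs with respect to each event, evaluated at an entity’s totals (for example, a league season). The sketch below does this for a simple, assumed component set with only six event types, not the fuller Patriot formula used here; the event labels and league totals are hypothetical.

```python
# Per-event contributions (a, b, c, d) to the Base Runs components A, B, C, D,
# using a simple, assumed component set:
#   A = H + BB - HR, B = 1.02 * (1.4*TB - 0.6*H - 3*HR + 0.1*BB), C = AB - H, D = HR
EVENT_COMPONENTS = {
    "1B": (1, 1.02 * (1.4 * 1 - 0.6), 0, 0),
    "2B": (1, 1.02 * (1.4 * 2 - 0.6), 0, 0),
    "3B": (1, 1.02 * (1.4 * 3 - 0.6), 0, 0),
    "HR": (0, 1.02 * (1.4 * 4 - 0.6 - 3), 0, 1),
    "BB": (1, 1.02 * 0.1, 0, 0),
    "OUT": (0, 0, 1, 0),  # each out in AB - H
}


def components(counts: dict) -> tuple:
    """Total A, B, C, D for an entity from its event counts."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for event, n in counts.items():
        for i, x in enumerate(EVENT_COMPONENTS[event]):
            totals[i] += n * x
    return tuple(totals)


def base_runs(counts: dict) -> float:
    a, b, c, d = components(counts)
    return a * b / (b + c) + d


def intrinsic_weights(counts: dict) -> dict:
    """Partial derivative of Base Runs with respect to each event count."""
    a_tot, b_tot, c_tot, _ = components(counts)
    denom = b_tot + c_tot
    weights = {}
    for event, (a, b, c, d) in EVENT_COMPONENTS.items():
        # d/dn [A*B/(B+C)] = a*B/(B+C) + A*(b*C - B*c)/(B+C)^2, plus d for the D term
        weights[event] = a * b_tot / denom + a_tot * (b * c_tot - b_tot * c) / denom**2 + d
    return weights


# Hypothetical league-season totals.
league = {"1B": 28000, "2B": 8500, "3B": 900, "HR": 4900, "BB": 14500, "OUT": 123700}
weights = intrinsic_weights(league)
for event, w in weights.items():
    print(f"{event:>3}: {w:+.3f}")
```

Because every term of this formula scales in proportion to the counts, applying these weights to the league’s own totals reproduces its Base Runs total exactly (Euler’s theorem for homogeneous functions), while a league with a different mix of events yields different weights.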
The Patriot formula I used also includes a number of specific events that are not always taken into account. Some improvements could probably be made, but I think this is pretty good, and it at least demonstrates what can be done. The table below with each individual season’s values might be a bit confusing, so some explanation of a few columns is in order (I rounded to three decimal places so the table would fit the page).
Because of the way I generated the formula, these are almost all “absolute” linear weights values as opposed to values above/below average, although one can get to runs above/below average, too. “R/O” is league runs per out (one could separate the NL and AL, but I wanted to keep this relatively simple). This comes in handy for making the switch to runs above/below average. “Oval” is the absolute out value for each season. “Aout” is the value of an out above or below average; it is just Oval minus runs per out. This is not used directly in the formulas for individual RC below, but I have included it here so it can be compared to the out values given by other run estimators. “Kout” is the absolute value of a strikeout as opposed to other outs. “SH” and “SF” are sacrifice hits and sacrifice flies; the values listed are absolute values, and the corresponding above-average values are negative. “GIDP” is the value of a double play beyond the value of the first out.
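The relationship between R/O, Oval, and Aout is simple arithmetic; the sketch below just makes the sign conventions explicit. It assumes Oval is entered as the negative number shown in the table and counts outs as AB – H + SF + SH + GIDP, matching the out count in the above-average formula further down; the function names and inputs are hypothetical.

```python
def runs_per_out(runs: float, ab: float, h: float, sf: float, sh: float, gidp: float) -> float:
    """League runs per out, with outs counted as AB - H + SF + SH + GIDP."""
    return runs / (ab - h + sf + sh + gidp)


def above_average_out_value(oval: float, r_per_out: float) -> float:
    """Aout = Oval - R/O, with Oval signed as in the table (a negative number)."""
    return oval - r_per_out


# Hypothetical inputs, for illustration only.
rpo = runs_per_out(runs=700, ab=5500, h=1400, sf=45, sh=50, gidp=120)
aout = above_average_out_value(oval=-0.10, r_per_out=rpo)
print(round(rpo, 3), round(aout, 3))
```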
To get the formula for an individual player’s runs created from these values, one can simply treat them as one would in any linear run estimator. Let’s use the listed 2012 coefficients in the examples. Because some values in the table are negative, writing everything as addition gets confusing in this transition, so I subtract the negative events (using their magnitudes) instead of adding negative values.
For “absolute” runs created:
.449*1B + .715*2B + .967*3B + 1.363*HR + .316*(BB – IBB) + .193*IBB + .172*SB – .233*CS + .057*SH + .135*SF – (AB – H – K)*Oval – K*Kout – .418*GIDP
For runs created above/below average, we simply incorporate the season’s runs per out (.161 for 2012), charged for each out:
.449*1B + .715*2B + .967*3B + 1.363*HR + .316*(BB – IBB) + .193*IBB + .172*SB – .233*CS + .057*SH + .135*SF – (AB – H – K)*Oval – K*Kout – .418*GIDP – (AB – H + SF + SH + GIDP)*.161
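Both formulas can be wrapped in a single function, sketched below under the sign convention described earlier (negative events are entered as positive magnitudes and subtracted). The 2012 coefficients come from the formulas above, but the Oval and Kout values in the example are placeholders, not the table’s values, and the player line is hypothetical.

```python
# 2012 coefficients from the formulas above; CS and GIDP are stored as
# magnitudes and subtracted, as in the written formulas.
W2012 = {"1B": 0.449, "2B": 0.715, "3B": 0.967, "HR": 1.363, "uBB": 0.316,
         "IBB": 0.193, "SB": 0.172, "CS": 0.233, "SH": 0.057, "SF": 0.135,
         "GIDP": 0.418}


def runs_created(s: dict, oval: float, kout: float, r_per_out: float = 0.0) -> float:
    """Absolute RC when r_per_out is 0; runs above/below average otherwise.

    oval and kout are the magnitudes of the season's out and strikeout values,
    which must be supplied from the seasonal table.
    """
    w = W2012
    rc = (w["1B"] * s["1B"] + w["2B"] * s["2B"] + w["3B"] * s["3B"] + w["HR"] * s["HR"]
          + w["uBB"] * (s["BB"] - s["IBB"]) + w["IBB"] * s["IBB"]
          + w["SB"] * s["SB"] - w["CS"] * s["CS"]
          + w["SH"] * s["SH"] + w["SF"] * s["SF"]
          - (s["AB"] - s["H"] - s["K"]) * oval - s["K"] * kout
          - w["GIDP"] * s["GIDP"])
    outs = s["AB"] - s["H"] + s["SF"] + s["SH"] + s["GIDP"]
    return rc - outs * r_per_out


# Placeholders only: substitute the 2012 Oval and Kout magnitudes from the table.
OVAL_2012 = 0.10
KOUT_2012 = 0.11
player = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 50, "IBB": 5, "SB": 10,
          "CS": 4, "SH": 1, "SF": 5, "AB": 550, "H": 155, "K": 110, "GIDP": 12}
absolute_rc = runs_created(player, OVAL_2012, KOUT_2012)
rc_above_avg = runs_created(player, OVAL_2012, KOUT_2012, r_per_out=0.161)
print(round(absolute_rc, 1), round(rc_above_avg, 1))
```

Keeping the runs-per-out charge as a separate parameter mirrors how the two written formulas differ only by the final outs term.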
It is interesting to compare the event values given here with the static linear weights given elsewhere, and the runs created values with those given for players elsewhere. It is not all apples to apples: I included pitcher hitting here, and some do not. I also use outs as the denominator (it was more straightforward given my method), while others use plate appearances.
[My thanks to P for all the help and encouragement he gave me a long time ago with this stuff. Hopefully I have gotten it basically right, although it is not his fault if I did not.]