
Fichte started using the word Wissenschaftslehre as a substitute for philosophy in a post-Kantian vein simply because, as he put it somewhere (I can’t find the exact quote or reference), since the Germans invented this science, they should get to name it. “Sabermetrics” also became a word for a discipline via authorial fiat. It does not have anything like a long Latin etymological lineage. It was simply a neologism created by Bill James in order to put a name to the sort of research that people like himself and (more influentially, if far less famously) Pete Palmer were doing at the time. There is not much to the story. James simply decided to name this new discipline after the Society for American Baseball Research. Intentionally or not, that was a bit ironic, because, although things have changed in recent years, at the time SABR was more focused on historical discussions than on sophisticated statistical analysis as we think of it now.

Origin stories aside, James simply defined sabermetrics as the “search for objective knowledge about baseball.” That sounds a lot like good ol’ science’s goal, and while I am sometimes wary of the rhetorical uses to which “sabermetrics is (or should be) a science” is put, in a simple sense, it works well enough for me. At least on a fairly surface view (and who really wants to get into big debates in post-empiricist philosophy of science here? Not me. Not anyone, I hope.), science as a search for objective truth requires forward motion. It requires putting forth hypotheses and theories that fit the available evidence to the best of our powers of understanding, then replacing those hypotheses and theories with (hopefully) better ones. (One last time: I know this is a crude and even controversial way of characterizing science to some, but I am not trying to be technical here.)

Yet it is also (to the point of cliché) true that the past has much to teach us. Even more established sciences are not a linear, triumphant march, even if certain Whig histories would have us believe so. That is all a lengthy introduction to this post and what may become an occasional sub-series on older metrics and what usefulness they might retain. Today, I will briefly discuss how range factor, one of the original defensive metrics, can still be useful despite the emergence of more advanced fielding stats.

A blog post is no place for a full history of the development of and controversies around, well, anything, including measuring fielding in baseball. This post will touch on those things, but those debates go on in more depth and with more competence elsewhere. It will simply touch on the aspects that may be helpful for seeing why plain ol’ range factor should not necessarily be completely ignored. (I am not going to deal directly with an even older metric, fielding percentage, here. It is relevant, but for now see Branch Rickey‘s comments in his classic Life Magazine article Goodbye to Some Old Baseball Ideas.)

Bill James came up with Range Factor. It is simply a way of expressing the rate of plays made per game by a fielder, using the formula ((Assists + Putouts) / Innings Played) * 9. Note that it uses only objective stats: assists, putouts, and innings are defined by the rules of baseball, and thus objective in that realm, unlike subjectively scored errors. Simple enough. There are obvious problems when really trying to use this stat to value player contributions. A relatively minor one is that different positions have different average rates; if one is building a total value stat, one can simply use the positional average as the baseline.
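The formula is simple enough to sketch in a few lines of code (the example numbers are invented for illustration):

```python
def range_factor_per_nine(putouts: int, assists: int, innings: float) -> float:
    """Range Factor per nine innings: ((Assists + Putouts) / Innings) * 9."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (assists + putouts) / innings * 9

# Hypothetical second baseman: 250 putouts and 400 assists in 1200 innings.
rf9 = range_factor_per_nine(putouts=250, assists=400, innings=1200.0)
print(rf9)  # (250 + 400) / 1200 * 9 = 4.875
```

Every input is a counting stat straight from the official record, which is the whole appeal.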

More significant is the problem of using innings as the denominator. After all, for every three outs, there could be any number of balls in play while the fielder is on the field. Every inning is three outs, but one inning could be three strikeouts, while another could feature seven hits on balls in play and three putouts; this difference is not reflected in range factor. This can be adjusted for, and has been, in various adjusted range factors, by using balls in play during the time the fielder is in action as the denominator.
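The adjustment amounts to swapping the denominator. As a simplified sketch (not any published formula, and with invented numbers):

```python
def adjusted_range_factor(plays_made: int, balls_in_play: int) -> float:
    """Plays made per ball in play while the fielder was on the field,
    rather than per inning (a simplified sketch of the adjustment)."""
    return plays_made / balls_in_play

# Two hypothetical fielders with identical plays made over the same innings,
# but B's pitchers allowed far more balls in play, so his rate is lower.
rate_a = adjusted_range_factor(plays_made=100, balls_in_play=1000)
rate_b = adjusted_range_factor(plays_made=100, balls_in_play=1400)
print(rate_a, rate_b)
```

Plain range factor per nine would rate these two identically; the balls-in-play denominator separates them.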

A further problem, even with that simple adjustment, is figuring out whether or not the fielder would have been “responsible” for a play that was not made. Range factor does not consider hit location, mostly because this was not really available or recorded, in even a crude form, when the metric was invented. That obviously makes a difference. Why should a shortstop be rewarded or punished for a long fly to right field that is either caught by the outfielder or falls in for a hit? This can be somewhat accounted for by taking into account the average number of plays a team makes on balls in play as well as the average proportion of balls fielded by a specific position. For seasons with more specific play-by-play information available, one can also take into account batter and pitcher handedness on batted balls, since those show tendencies in which positions field the ball and in hit rates on balls in play. One can even adjust for parks.
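That baseline can be sketched as a simple product: the team's balls in play, the league's out rate on balls in play, and the share of those outs a position typically records. All rates below are invented for illustration:

```python
def expected_plays(team_balls_in_play: int, league_out_rate_on_bip: float,
                   position_share: float) -> float:
    """Estimate how many plays an average fielder at a position would make,
    given his team's balls in play (a sketch; all inputs are assumptions)."""
    return team_balls_in_play * league_out_rate_on_bip * position_share

# Hypothetical: 4000 team balls in play, 69% converted to outs league-wide,
# with shortstops recording 14% of those outs.
exp = expected_plays(4000, 0.69, 0.14)  # roughly 386 expected plays
plays_made = 410
print(plays_made - exp)  # plays above the positional expectation
```

A fielder clearing that expectation made more plays than an average player at his position would have, given the same workload.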

Despite all the advances made in range factor, starting from objective, given facts and proceeding up to estimations of hit location, it was obvious that they were just band-aids covering the bleeding wound fielding statistics leave on baseball analysis. Not only was the location of balls in play a problem, but even then, some balls are obviously more difficult to make plays on than others. Even the most complex adjusted range factors leave these obviously significant dimensions out, for better or worse. (Baseball Prospectus’ new FRAA might be seen as a particularly well-thought-out and sophisticated variant of Range Factor, although I do not know if they would agree with that description. I simply thought it was worth mentioning for the curious.)

Fast-forward past the intervening steps (such as the original Zone Rating) to the current era of publicly available “advanced” defensive metrics such as Ultimate Zone Rating and Defensive Runs Saved (formerly known as Plus/Minus). I will spare you yet another primer on these; the links can be followed for more information. While they have their differences, philosophically the two share mostly common ground. To determine hit location, the field of play is divided into a number of zones. Balls hit into those zones are manually classified, by stringers in the stadium and/or on video, by difficulty; even grounders, liners, and fly balls are subdivided by difficulty.

Summarizing a bunch of different stuff: the systems then take each kind of ball in each location and see how often each fielding position makes those plays on average. That becomes the baseline for seeing how many plays the fielder makes above or below average for his position relative to “expected outs.” (This leaves out important aspects such as outfield arms and double plays, but range is the most problematic aspect and is directly addressed by range factor, so it is worth pursuing.) At least initially, this makes sense; while there is a fair bit of year-to-year fluctuation in the advanced measures of range, there is enough correlation to give them prima facie credibility, for better or worse. At least methodologically, this seems to be an improvement over even adjusted range factor, as it does not extrapolate from league-wide frequencies and batter handedness, but rather uses specific observations on balls in play.
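The zone-based accounting can be sketched like so. The zones, batted-ball types, and league out rates below are invented for illustration, not UZR's or DRS's actual buckets or values:

```python
# League-wide out rate per (zone, batted-ball type) bucket; invented numbers.
league_out_rate = {
    ("zone_5", "ground_hard"): 0.45,
    ("zone_5", "ground_soft"): 0.85,
    ("zone_6", "ground_hard"): 0.30,
}

def plays_above_average(chances):
    """chances: list of (zone, batted_ball_type, made_play) for one fielder.
    Each chance is credited against the league's out rate on that bucket."""
    total = 0.0
    for zone, bb_type, made in chances:
        expected = league_out_rate[(zone, bb_type)]
        total += (1.0 if made else 0.0) - expected
    return total

chances = [("zone_5", "ground_hard", True),
           ("zone_5", "ground_soft", True),
           ("zone_6", "ground_hard", False)]
print(round(plays_above_average(chances), 2))  # 0.4
```

Note how the credit scales with difficulty: converting the hard chance earns 0.55 of a play, while converting the easy one earns only 0.15, and the missed tough chance costs only 0.30. Runs saved are then a matter of attaching run values to those plays.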

If you are reading Getting Blanked, chances are you know that these advanced metrics remain controversial in many circles, sabermetric and otherwise. Leaving the “otherwise” objections aside for now and eliding the technical details, I will point out two specific issues that have been raised. The first is whether zones are an appropriate way to determine batted ball location, and thus fielder responsibility for expected outs. The other is how good the batted ball classifications really are; some inconsistencies have been found from stadium to stadium. This encompasses a number of issues about observer and range bias. This is not an accusation that the stringers are biased towards one team or the other, simply that, for various reasons, they are not consistent enough from player to player, batted ball to batted ball, and stadium to stadium for the data they provide to do the work it needs to do.

I am not going to adjudicate those debates here. The issues are pretty serious for those interested in having publicly available defensive metrics be useful. If the zones are not really an adequate method for determining hit location and expected outs, then the baselines are shaky. Moreover, if the locations are not very consistent, then the run values for each batted ball are likely to be off, too.

What the “minimum” standard for the metrics should be is not something I am qualified to judge. For the sake of getting back to the point, it is simply worth noting that however much the batted ball data strives for objectivity (and it is not as if the stringers and video observers are unsupervised hacks), there are limitations there, and “subjectivity” is creeping in. This is not to dismiss advanced defensive metrics, but simply to acknowledge (or be reminded) that they are far from perfect. Yes, Fieldf/x is probably a serious advance. (For those hoping it will come in and save the day, well, I will simply say that no one should be holding her or his breath waiting for that data to be released publicly.)

Am I saying we should go back to some variant of Range Factor and abandon the supposedly “advanced” metrics? No. Personally, I look at the advanced defensive metrics. However, I do not think it is a one-or-the-other kind of thing. I do not want to leave this as a Golden Mean cop-out, though. Let me be more specific: Range Factor and its variants can be useful as a “check” on how much work the “extra” mechanics of UZR and DRS might be doing when it comes to measuring range, precisely because Range Factor involves only the objective facts of putouts, assists, and innings played (or balls in play, for the more advanced versions).
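To make the “check” concrete, here is a minimal sketch (all player names and numbers invented) that flags fielders whose rank by simple Range Factor diverges sharply from their rank by an advanced range metric:

```python
# Hypothetical second basemen: Range Factor per nine vs. advanced range runs.
rf9 = {"A": 5.1, "B": 4.9, "C": 4.4, "D": 4.2}
range_runs = {"A": 9.5, "B": -6.0, "C": 3.0, "D": -2.0}

def ranks(metric):
    """Map each player to his rank (1 = best) under the given metric."""
    ordered = sorted(metric, key=metric.get, reverse=True)
    return {player: i + 1 for i, player in enumerate(ordered)}

rf_rank, rr_rank = ranks(rf9), ranks(range_runs)
for player in sorted(rf9):
    gap = abs(rf_rank[player] - rr_rank[player])
    verdict = "look closer" if gap >= 2 else "agree"
    print(player, rf_rank[player], rr_rank[player], verdict)
```

Player B here is the Hill-style case: near the top by raw plays made, near the bottom by the advanced metric, which tells you the zone and difficulty machinery is doing a lot of the work in his rating.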

I have gone on (far too) long enough, so I will conclude with an example of how range factor might be used to “check up” on the other metrics. Let’s take last year’s UZR second base ratings for Range Runs. Take Darwin Barney: he was second in UZR Range Runs at about 10 runs above average in 2012. His Range Factor per nine innings as a second baseman was third in baseball, and he had the most putouts. In other words, the simplest counting and rate metrics are generally in line with his making a lot of plays relative to most second basemen last year, whether or not hit location and play difficulty are taken into account.

Now look at Aaron Hill‘s numbers. According to UZR, he was about two runs below average in terms of range in 2012. That is still about average, and it is just one year. However, according to the older metrics, he was one of the better second basemen in baseball in this respect in 2012: he was fourth in both putouts and Range Factor per nine innings. Obviously, I am taking liberties, to put it kindly, in comparing absolute, unbaselined measures like range factors with what is intended to be a relative-to-average measure like UZR Range Runs. Despite that, I think the overall ranking makes the point. According to UZR, Hill was middle-of-the-pack in terms of range in 2012, while the older metrics see him as one of the best in the league.

The issue is not which is to be preferred, but rather to see how Range Factor can be useful even to those most enamored of advanced fielding metrics. In Barney’s case, the old and new stats are in basic agreement. In Hill’s case, while he fielded a lot of balls, it may be that he simply had more opportunities (although Baseball Prospectus’ FRAA, which adjusts for that, also sees him as being tremendous in 2012). I am not here to say which set of metrics is right about Hill. What I am saying is that Range Factor can help us understand that we might want to be less confident in what UZR is telling us about Hill. For example, Hill may have encountered easier-to-field batted balls in 2012, and UZR may be giving him less credit for those. The problem is, as discussed above, that difficulty (for example, precise location, fielder positioning, speed of the ball off the bat) is a particularly nebulous area for the advanced metrics, whether due to the zone-based methodology or the batted ball data.

Thus, the limitations of Range Factor and its components are the very things that make it useful as a way of tempering evaluations based on advanced metrics. And that is how I use it.

Comments (1)

  1. I have no problem with using advanced defensive metrics as a tool for evaluating player defense; having more data to reference is never a bad thing.

    But I think defensive metrics are causing a large problem in player evaluation because of the way they are integrated into WAR. Not only is there a lot of debate as to the validity of individual metrics, there is also (and should be) lots of debate as to how important defense really is to a player’s overall value. But very few people cite oWAR (which we should all be able to agree is a very good aggregate of offensive and baserunning ability; I have a very difficult time finding objectionable player comparisons using oWAR). Almost everyone cites WAR.

    This annoyed me to no end in the Cabrera-Trout MVP debate. I personally feel that Trout was the better player, but it was an obviously close comparison. oWAR supports that assessment, but using WAR makes it seem like a rout. Similarly, BBRef WAR considers Yunel Escobar’s 2012 season superior to Derek Jeter’s. With all due respect, no one could credibly make that argument.

    Until defensive metrics become more credible, or unless they are weighted less heavily to reflect their volatility, their inclusion in WAR is a barrier, not a boon, to effective analysis and debate.
