## Roto-Relevant Research: Strikeouts Minus Walks

We’ve spent a lot of time and energy trying to improve our ERA estimators. In order to really show what the pitcher can control, we’ve taken things out, added things, and moved away from our simplest estimators in a search for the tiniest slivers of r-squared value. In trying to better predict next year’s ERA, we’ve run through FIP, xFIP, tERA, SIERA and more. Plenty of cooks, plenty of spices, plenty of tweaks in this alphabet soup.

Turns out, we coulda stopped before we started.

At least when it comes to in-season ERA estimation, the simplest estimator looks like it beats all the fancy ones. Glenn DuPaul did the legwork on The Hardball Times today, and he found that (K-BB)/IP beat SIERA out for in-season ERA prediction.

Here is the most concise table of results in his piece:

| Predictor | R-Squared | RMSE |
| --- | --- | --- |
| (K-BB)/IP | 8.84% | 1.092 |
| SIERA | 5.99% | 1.127 |
| xFIP | 4.48% | 1.145 |
| tERA | 3.04% | 1.162 |
| FIP | 2.42% | 1.170 |
| ERA | 1.45% | 1.185 |

That last column (root mean squared error) measures roughly how far the typical prediction landed from the actual outcome, with big misses weighted more heavily, so the smaller number is the better one. While SIERA did indeed improve upon ERA, FIP and the like, it’s still in a group that falls short of the simplest predictor. And though SIERA and xFIP added batted-ball information to give a nod to the ground-ballers (not the ground ballers) out there, that information didn’t add in-season value in DuPaul’s study. The chicken soup beats the consommé.
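For anyone who wants to see what those two columns actually measure, here is a minimal Python sketch. It assumes R-squared means the squared Pearson correlation between the predictor and the outcome, which is a common convention but not something the table itself confirms. The ERA values are made up for illustration and are not data from DuPaul's study.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: square each miss, average them, take the square root."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(x, y):
    """Squared Pearson correlation between x and y (an assumed definition)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

# Hypothetical numbers for illustration only, not from the study
predicted_era = [3.50, 4.00, 4.50, 3.00]
actual_era = [4.50, 3.00, 4.50, 3.00]

print(round(rmse(predicted_era, actual_era), 3))
print(round(r_squared(predicted_era, actual_era), 3))
```

Because the misses are squared before averaging, one ugly miss hurts more than several small ones, which is why RMSE is only roughly an "average distance."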

The whole thing is worth reading, but since it espouses an Occam’s Razor-type result, I’ll eschew more complicated explanations and run with the simplest: the most important facets of pitching are striking people out and not allowing walks. Get outs, don’t allow free base runners. That’s it!
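The stat itself is about as simple as arithmetic gets. A minimal sketch in Python, with a hypothetical function name and a made-up stat line rather than any real pitcher's numbers:

```python
def k_minus_bb_per_ip(strikeouts, walks, innings_pitched):
    """Strikeouts minus walks, divided by innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return (strikeouts - walks) / innings_pitched

# Hypothetical first-half line: 110 strikeouts, 30 walks, 100 innings.
# Innings must be a true decimal here; box-score notation such as "100.1"
# means 100 and one-third innings and would need converting first.
rate = k_minus_bb_per_ip(110, 30, 100.0)
print(f"{rate:.2f}")
```

The parentheses matter: subtract the walks first, then divide by innings.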

The implications for fantasy baseball are obvious and enormous, if not immediate. At this point in the season, fantasy baseball is all about matchups and milking the schedule for a few final pieces of value. But we can look backward, perhaps to the halfway mark this season, and think about some of the ERA outliers that we were considering as sell-highs. We might find that (K-BB)/IP could have been a better guidepost for these players, considering it was for baseball as a whole. Certainly, Jeff Zimmerman has suggested this before, and now the research has backed him up.

Thanks to DuPaul once again, we have a list of the 27 first-half pitchers who either over- or under-performed their ERA estimators by a full run. It’s a delicious group when it comes to fantasy, rife with opportunity. Let’s sort by (K-BB)/IP to show what our most important ERA estimator had to say about ’em:

| Name | (K-BB)/IP | ERA | FIP | xFIP | tERA | SIERA | 2nd-Half ERA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Max Scherzer | 1.26 | 5.11 | 3.79 | 3.36 | 4.34 | 3.15 | 2.51 |
| Tim Lincecum | 1.1 | 5.57 | 3.62 | 3.71 | 4.38 | 3.89 | 4.59 |
| Bud Norris | 1.04 | 5.23 | 4.93 | 3.9 | 5 | 3.58 | 5.00 |
| Cliff Lee | 1 | 4.07 | 3.01 | 3.11 | 3.2 | 2.98 | 2.64 |
| Francisco Liriano | 0.98 | 5.53 | 4.64 | 4.46 | 5.55 | 4.46 | 5.01 |
| Jeff Samardzija | 0.97 | 5.91 | 3.92 | 3.86 | 4.76 | 3.92 | 2.58 |
| James McDonald | 0.9 | 2.49 | 3.09 | 3.8 | 3.58 | 3.7 | 6.18 |
| Chris Capuano | 0.89 | 2.7 | 3.74 | 3.87 | 3.89 | 3.76 | 4.70 |
| Mike Minor | 0.84 | 6.35 | 5.81 | 4.92 | 5.84 | 4.65 | 2.35 |
| Joe Blanton | 0.84 | 4.77 | 4.2 | 3.51 | 4.83 | 3.49 | 5.15 |
| C.J. Wilson | 0.82 | 2.33 | 3.48 | 4.04 | 3.65 | 4.13 | 5.35 |
| Ryan Dempster | 0.81 | 1.62 | 3.07 | 3.77 | 3.78 | 3.85 | 3.69 |
| Jered Weaver | 0.81 | 1.93 | 2.92 | 3.98 | 3.47 | 3.88 | 3.17 |
| Jarrod Parker | 0.79 | 2.19 | 3.09 | 3.93 | 4.56 | 4.25 | 4.33 |
| Johnny Cueto | 0.73 | 2.2 | 3.1 | 3.8 | 3.69 | 3.73 | 3.77 |
| Luke Hochevar | 0.73 | 5.28 | 3.94 | 4.29 | 4.59 | 4.23 | 5.79 |
| Wade Miley | 0.7 | 2.7 | 3.33 | 3.9 | 3.89 | 3.89 | 3.39 |
| Ryan Vogelsong | 0.69 | 2.51 | 3.67 | 4.48 | 3.42 | 4.34 | 5.50 |
| Ross Detwiler | 0.68 | 3.11 | 4.15 | 4.48 | 3.8 | 4.26 | 3.06 |
| Jordan Zimmermann | 0.67 | 2.67 | 3.89 | 3.78 | 4.4 | 3.9 | 3.33 |
| Jason Marquis | 0.66 | 6.55 | 5.8 | 4.59 | 6.21 | 4.57 | 4.57 |
| Jeremy Hellickson | 0.65 | 3.9 | 5.5 | 5.07 | 5.57 | 4.91 | 3.00 |
| Randy Wolf | 0.64 | 5.78 | 4.57 | 4.71 | 5.03 | 4.67 | 5.57 |
| Kyle Lohse | 0.57 | 2.81 | 3.71 | 4.19 | 4.32 | 4.33 | 2.82 |
| Blake Beavan | 0.56 | 3.8 | 5 | 5.34 | 4.81 | 5.37 | 4.02 |
| Barry Zito | 0.56 | 3.8 | 5 | 5.34 | 4.81 | 5.37 | 4.73 |
| Jeremy Guthrie | 0.51 | 6.67 | 6.52 | 5.25 | 8.28 | 5.1 | 2.58 |

The most interesting situations, of course, are those in which (K-BB)/IP had something different to say about the pitcher than the other estimators did. For example, Jeff Samardzija had okay estimators, all around four, but a top (K-BB)/IP number, and he went on to a strong second half. Big score for the simple sauce. By the same standard, Francisco Liriano counts as a loss for (K-BB)/IP: the rate was strong, and the second half was not. James McDonald (good simple sauce, bad estimators) is another loss. Mike Minor is a win!

On the bottom half of the ledger, Jordan Zimmermann is a loss for (K-BB)/IP, since he’s below average in that stat, though the other estimators did no better. Kyle Lohse’s stats tell the same story. Ryan Vogelsong, on the other hand, had his demise more correctly predicted by the simpler estimator.
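The win/loss scoring in the last two paragraphs is informal, but it can be made explicit. Here is one hedged way to do it in Python, using numbers copied from the table above. The two cutoffs are hypothetical, picked only for illustration, since the piece never defines what counts as a "good" rate or a "good" second half.

```python
# Hypothetical cutoffs chosen for illustration; the article defines neither
GOOD_RATE = 0.80   # first-half (K-BB)/IP at or above this counts as "good"
GOOD_ERA = 4.00    # second-half ERA at or below this counts as "good"

# (first-half (K-BB)/IP, second-half ERA), copied from the table above
pitchers = {
    "Jeff Samardzija": (0.97, 2.58),
    "Francisco Liriano": (0.98, 5.01),
    "James McDonald": (0.90, 6.18),
    "Mike Minor": (0.84, 2.35),
    "Jordan Zimmermann": (0.67, 3.33),
    "Kyle Lohse": (0.57, 2.82),
    "Ryan Vogelsong": (0.69, 5.50),
}

def call(rate, second_half_era):
    """Return 'win' when the rate's verdict matches the second-half result."""
    predicted_good = rate >= GOOD_RATE
    actually_good = second_half_era <= GOOD_ERA
    return "win" if predicted_good == actually_good else "loss"

results = {name: call(*stats) for name, stats in pitchers.items()}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

With these particular cutoffs, the sketch lands on the same seven verdicts given in the text. Move either cutoff and some of the calls could flip, so treat it as a way to make the reasoning visible rather than as a scoring system.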

Jeremy Guthrie? Who knows what’s going on there.

The easiest takeaway is that there’s now a simpler way to do all of this. You can look at strikeout rate, walk rate, and boom, you’re done. Another nice thing is that you can then focus your more advanced work on those two rates in particular. Look at James McDonald. Sure, he had a good (K-BB)/IP rate at the halfway mark, but his first-strike percentage was not good, and that predicted a worse walk rate in the second half. Pairing the simple estimator with that kind of targeted check would probably be the best mix of efficiency and efficacy, and it would have worked just as well to tell you that while Tim Lincecum was going to be better in the second half, he wasn’t likely to suddenly have a better walk rate or be the Timmeh of old.

If you make a simple sauce, it’s easy to evaluate the ingredients. The more complicated the sauce, the more likely you’re left wondering which input was the spoiled one. Everything we needed to know about pitching we learned in the kitchen, it seems.