In my first year of university, my friend Jeff and I would usually begin the night by playing a few games of chess while having a few beers. Chess is a great game, but it requires patience and thought that don’t really suit university dorm decorum. Just two beers can really change the course of a game, and we generally got a little worse as the night went along.
Never really thought those nights would become the introduction to a blog post six years later, but there was a lot to learn from them. Even when I beat Jeff at chess, sober or not, I always knew it was an aberration. He won about 65% of the games we played, but since we played so many times, it’s very likely there were stretches in which I beat him cleanly, like six-out-of-ten or eight-out-of-fifteen. If I had played a friend, Rob, three times and beaten him on each occasion, we’d have assumed I was a much better player, but that could just be a problem with the sample. Jeff and I played dozens of games against each other, and it wasn’t until about halfway through that year that I figured out he was a much better player than I was, or at least that his game matched up better against mine than mine did against his.
I still have the chessboard, and we’ll play a game or two when one of us visits. I may not be able to tell you his particular strategy off the top of my head, but I bet when we play this week I’ll subconsciously remember that he likes to protect his king early, or use knights for discovered checks. If I played against anybody else, my mind wouldn’t be as tuned to my opponent’s strengths or weaknesses, which could be a disadvantage, until you consider that the new opponent also hasn’t seen my style, or my own strengths and weaknesses.
Chess, unlike hockey, is quite pure. It doesn’t play to the whims of bouncing pucks and bad ice. Even when the circumstances are equal in terms of sobriety and wakefulness, the better player isn’t going to win every game. You always have to factor in sample size and regression, even in chess, a game determined almost entirely by skill.
Anybody can appreciate the difference between chess and hockey when it comes to the luck element. We’ve seen bad hops and hot goaltenders and crazy deflections. With so many more moving pieces, so much more can go wrong for the right strategy, but the inverse is that so much can go right for a team with a poor strategy. This is sort of what the hockey blogosphere has been in an uproar about this week, in relation to Greg Cronin and his comments about quality possessions.
By now, everybody’s seen the comments and they’ve been broken down several ways. Cronin talks about teams that take a lot of shots as if they’re teams that don’t generate any quality shots. His example uses extremes, but his extremes don’t exist in reality. While Cronin says that “shots can be a misleading stat”, the five teams that have taken the highest percentage of shot attempts in the last three seasons are Detroit, Los Angeles, Chicago, Boston and St. Louis. The five teams that have taken the lowest percentage are the New York Islanders, Toronto, Anaheim, Edmonton and Minnesota. Which group of five teams would you say has performed better since the 2010-2011 season?
I discussed shot quality on Friday, and using out-of-sample data, made the point that all teams are susceptible to regression to the mean. Teams that shot a combined 9% in Year 1 shot 8% the next season, while teams that shot 6.8% in Year 1 shot 7.8% the next. This doesn’t mean that all teams shoot equally and that everybody will catch up; it just means that it’s something to take into consideration when predicting how your team will perform the next season.
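That regression effect can be sketched with a toy simulation. This is a minimal illustration, not the article’s actual data: it assumes, for argument’s sake, that every team has identical true shooting talent (7.9%) and that observed percentages vary only with luck over a season’s worth of shots. The teams that run hot in Year 1 fall back toward the mean in Year 2.

```python
import random

random.seed(1)

# Hypothetical assumption: every team has the same true shooting talent.
TRUE_SH_PCT = 0.079
SHOTS = 2000   # roughly a season of even-strength shots
TEAMS = 30

def observed_sh_pct():
    # One team-season's observed shooting percentage: luck on TRUE_SH_PCT.
    goals = sum(random.random() < TRUE_SH_PCT for _ in range(SHOTS))
    return goals / SHOTS

year1 = [observed_sh_pct() for _ in range(TEAMS)]
year2 = [observed_sh_pct() for _ in range(TEAMS)]

# Take the ten hottest-shooting teams from Year 1...
hot = sorted(range(TEAMS), key=lambda i: year1[i], reverse=True)[:10]
y1_avg = sum(year1[i] for i in hot) / 10
y2_avg = sum(year2[i] for i in hot) / 10

print(f"Hot teams, Year 1: {y1_avg:.3f}")   # well above 0.079
print(f"Same teams, Year 2: {y2_avg:.3f}")  # back near 0.079
```

The point of the sketch is only that selecting on a hot Year 1 guarantees nothing about Year 2 when the underlying talent is the same; the real spread in NHL shooting talent is a separate question.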
“Feature, or Bug” is a concept explored by Nate Silver in the chess portion of The Signal and the Noise, his book about making more accurate forecasts. Deep Blue, the system designed to defeat grandmaster Garry Kasparov, had to be tweaked endlessly by its programmers. It was never perfect, although Kasparov’s fatal flaw was assuming it was.
In fact, the bug was anything but unfortunate for Deep Blue: it was likely what allowed the computer to beat Kasparov. In the popular recounting of Kasparov’s match against Deep Blue, it was the second game in which his problems originated—when he had made the almost unprecedented error of forfeiting a position that he could probably have drawn. But what had inspired Kasparov to commit this mistake? His anxiety over Deep Blue’s forty-fourth move in the first game—the move in which the computer had moved its rook for no apparent purpose. Kasparov had concluded that the counterintuitive play must be a sign of superior intelligence. He had never considered that it was simply a bug.
Silver’s “general advice, in the broader context of forecasting, is to lean heavily toward the ‘bug’ interpretation when [the] model produces an unexpected or hard-to-explain result.”
In this case, when trying to answer “why did the Toronto Maple Leafs make the playoffs last season,” the first instinct for Cronin, who has a vested interest in the answer, is to chalk the Leafs’ high shooting percentage up to coaching style—a feature—rather than consider that it might be a bug.
It’s very difficult to predict hockey, and you can barely forecast a team’s upcoming season just by looking at their goal rate (GF%, the share of total even strength goals that went to the club in question) the previous year, and their shot rate is no more help. The fact is that 84% of teams generally score and allow as many goals as they should given their shot rate, and their improvement or non-improvement is difficult to quantify until they’ve actually taken to the ice. Much of our work online is done with the remaining 16%, determining whether their lack of success is a feature (the New Jersey Devils’ inability to find offensive players or goaltending) or a bug (Sergei Bobrovsky’s .941 even strength save percentage).
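As a quick illustration of the GF% definition above: it’s simply goals for divided by total goals in a team’s games. The numbers below are hypothetical, not any team’s actual totals.

```python
def gf_pct(goals_for: int, goals_against: int) -> float:
    """Share of all even-strength goals that went the team's way."""
    return goals_for / (goals_for + goals_against)

# A hypothetical team that scores 110 and allows 90 at even strength:
print(round(gf_pct(110, 90), 3))  # 0.55
```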
As for outliers in PDO (the sum of a team’s shooting and save percentages) and outliers in Corsi (the shots-for-and-against metric we’re using), it’s preferable to be a high-end outlier in Corsi when looking ahead to the next season. Again, using the raw year-by-year data available at Hockey Analysis between 2007 and 2012, I sorted 150 team seasons into five 30-team “buckets”, in descending order:
| Year 1 PDO | Year 1 GF% | Year 2 GF% |
|------------|------------|------------|
While the teams that had the highest PDO scored a tonne of goals one year, they didn’t keep it up the next. They were still the best teams out of the sample, but their advantage was curbed. The less-than-3% difference in goals-for rate from Year 1 to Year 2 works out to about 8 even strength goals, which doesn’t seem like a whole lot, but it will probably push a bubble team into the playoffs, or a playoff team back onto the bubble.
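The bucketing procedure behind these tables can be sketched in a few lines: sort the team-seasons by a Year 1 stat, split them into five equal buckets, and average a Year 2 stat within each bucket. The field names and the toy records below are hypothetical stand-ins for the 150 real team-seasons from Hockey Analysis.

```python
def bucket_averages(seasons, key_stat, follow_stat, n_buckets=5):
    """Sort seasons by key_stat (descending), split into n_buckets equal
    groups, and return each group's average follow_stat."""
    ordered = sorted(seasons, key=lambda s: s[key_stat], reverse=True)
    size = len(ordered) // n_buckets
    averages = []
    for b in range(n_buckets):
        chunk = ordered[b * size:(b + 1) * size]
        averages.append(round(sum(s[follow_stat] for s in chunk) / len(chunk), 3))
    return averages

# Toy data: ten made-up team-seasons instead of the real 150.
toy = [{"pdo": 0.98 + 0.004 * i, "gf_pct_next": 0.47 + 0.01 * (i % 3)}
       for i in range(10)]
print(bucket_averages(toy, "pdo", "gf_pct_next"))
```

With real data, `key_stat` would be Year 1 PDO (or CF%) and `follow_stat` the Year 2 goals-for rate, one row per bucket in the tables above.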
Do the same thing for Corsi For rate (found at Hockey Analysis under CF%), and again, five buckets of teams in descending order:
| Year 1 CF% | Year 1 GF% | Year 2 GF% |
|------------|------------|------------|
Teams that had the highest Corsi rates one year actually scored more goals the following season. The distribution isn’t as random as it was in our PDO chart, since we’re dealing with a real look at quality teams. The team that should win 52% of its games won’t always win 52% of its games, but put 30 teams that should win between 51-53% of their games together and they’ll probably combine to win 52%. You can only figure out these rates with a really big sample.
But it’s the Year 1 GF% column that has me interested. The variation between the top of the chart and the bottom isn’t as pronounced as it is in the PDO table. It’s nature’s way of warning us against trusting outlying percentages. Corsi, for all its flaws, stabilizes faster than shooting percentage, goal rates and PDO. I think coaches and managers have to be more willing to accept that there are factors that can’t be predicted or coached that determine outcomes. If machines can be fallible, then so can humans, especially when we have to deal with raw emotion and sometimes booze.
In the end, the Leafs’ acquisition of seven years of David Clarkson, five years of Tyler Bozak and an attempted upgrade in goal are all the result of being very confident in what the surface data says about the 2013 Toronto Maple Leafs. I wouldn’t say their failure next year is a sure thing, but I wouldn’t bet on them to make it back to the playoffs.
Concluding thought from Mr. Silver:
> We have big brains, but we live in an incomprehensibly large universe. The virtue in thinking probabilistically is that you will force yourself to stop and smell the data—slow down, and consider the imperfections in your thinking. Over time, you should find that this makes your decision making better.
Corsi isn’t perfect, but I think it’s illuminating in certain cases during the offseason.