Hockey’s counting problem

alberts orr

So the analytics movement has gone mainstream. I’ve seen references to the website Behind the Net in several major daily newspapers over the last two weeks. And now, Elliotte Friedman has written a fairly popular blog post on the subject at CBC. The post itself has a line or two that I’d quibble with, but it’s pretty enlightening about the state of analytics in hockey:

The biggest problem for the NHL is the sport just doesn’t have the statistical bent of others.

“We are third, behind baseball and basketball,” [Washington assistant GM Don] Fishman said.

So teams are creating their own. Because there is no consensus, they are notoriously secretive. One thing I believe some teams do is remove “second assists” from players and see how many points are left over. But good luck trying to confirm that.

I find that a lot of useful hockey information has made its way onto the Internet. A lot of the statistical bloggers who have been hired to do consulting work for NHL teams still post quality information on their websites, so it’s not as if the whole project is going dark.
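The “second assists” idea Friedman mentions is easy to sketch: strip secondary assists out of a player’s scoring line and see what’s left. A minimal illustration, with invented player names and totals:

```python
# Sketch of the "primary points" idea: count only goals and first
# (primary) assists, dropping second assists. All data here is
# hypothetical, for illustration only.

def primary_points(goals, first_assists):
    """Points counting only goals and first assists."""
    return goals + first_assists

players = {
    # name: (goals, first assists, second assists)
    "Player A": (20, 15, 25),
    "Player B": (20, 25, 5),
}

for name, (g, a1, a2) in players.items():
    total = g + a1 + a2
    print(f"{name}: {total} points, {primary_points(g, a1)} primary points")
```

Both hypothetical players finish with 60 points, but Player B drove far more of his scoring directly, which is presumably the signal teams are after.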

That said, some teams that I know dabble in analytics come up with some confusing decisions. The Vancouver Canucks, for instance, who have attended the Sloan conference in each of the last four seasons, have tried to make it through the first half of the shortened season with a single NHL-tested top-nine centreman. The Calgary Flames, one of four teams that list an analytics director as a staff member on the team page, have made an unending string of bizarre offseason decisions resulting in consecutive playoff misses, and are on their way to a third. The Buffalo Sabres are the second-worst team in the Eastern Conference and fired their coach this season for the first time in a generation.

This isn’t to say that analytics make teams perform poorly, but part of the advantage of having peer-reviewed research in the public sphere is that other people can check your work for inaccuracies. I can imagine a huddled room of Montreal Canadiens executives coming up with the basis for the “plus-minus” statistic, and not realizing until years later that there was little of value to be drawn from the information.

Or not. I’ll admit that, being pretty young, I have no memory of anything before the free-agency era, and every player acquisition I paid attention to had some sort of number attached to it: X goals, Y assists, Z penalty minutes, whatever it was. I’ve written before about how, for all the information the NHL releases, so little of it is actually useful. A lot of people suggest that you can’t build a team with just numbers, and I guess until about 10 years ago that was true, because there was so little reputable, publicly available information. I think times have changed to the point where, even if a computer couldn’t build a team for you, it could give you some damn good ideas.

One of the things I’ve seen is that it can take a couple of years for a player’s true talent to be recognized. Players like David Booth, Scott Gomez, Daniel Winnik, Michael Frolik or Patrick Dwyer are useful players if you’re patient enough to wait for pucks to go into the net. Part of the problem with hedging bets is that it can take years for a bet to really pay out, because so much of what happens over the short term can be attributed to variance.
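The variance point is easy to demonstrate with a toy simulation. Assume, purely for illustration, a shooter whose “true talent” is an 8% shooting percentage: over a few weeks of shots his observed percentage swings all over the place, while over a career-sized sample it settles near his real level.

```python
# Toy simulation: a "true talent" 8% shooter. Small samples of shots
# produce wildly different observed shooting percentages; large
# samples converge on the true rate. The 8% figure is an assumption.
import random

random.seed(1)

TRUE_SH_PCT = 0.08

def observed_sh_pct(shots):
    """Simulate `shots` attempts; return the observed shooting %."""
    goals = sum(1 for _ in range(shots) if random.random() < TRUE_SH_PCT)
    return goals / shots

# Five stretches of 50 shots each -- roughly a month of hockey apiece.
small_samples = [observed_sh_pct(50) for _ in range(5)]
# One big multi-season sample.
large_sample = observed_sh_pct(5000)

print([round(p, 2) for p in small_samples], round(large_sample, 3))
```

The small samples can easily look like a 4% plug or a 14% sniper; only the large sample reliably identifies the 8% talent, which is why judging a player on a half-season of goals is so hazardous.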

Mark Cuban said this at Sloan:

“We take input from everybody, but in the end, the GM and owner have to make the final decisions. A lot of it is how you evaluate players organizationally. It’s not about any one individual, it’s what the organization is thinking.

“A lot of GMs measure their own mortality relative to their job. If they feel they’re at risk, they’ll make different decisions than if they feel safe. That’s typical in any job. People want to keep their jobs. Man loves hierarchy. GMs want to feel safe and have longevity, and hopefully they also want to win championships. If he feels there is a risk of losing his job, he’ll behave differently than if there’s no chance he’ll lose his job.”

Basically what you have is teams struggling with concepts they can’t prove, all while working within a very narrow window to succeed. I can imagine that if I were a team owner, and for two years I was paying an analytics department to wrestle with the concept of zone entries while the team lost games on the ice and nobody was paying for tickets, I wouldn’t have much patience with the direction of the club. The work done by Eric, Geoff, Robert and Corey on entries took a lot of dedication, and they made a pretty good breakthrough in hockey research. What helped, though, was being able to reach out to the online community for help with tracking individual games and coming up with accurate results.

Finally, here’s baseball analyst Voros McCracken, who created “defence-independent pitching statistics,” which judge pitchers on the things they can control: home runs, walks and strikeouts:

“Just because everyone knows OBP is important doesn’t mean OBP isn’t important. Just because we learned something a long time ago doesn’t mean we should unlearn it. We should keep it and add to it. There are a lot of people who are itching to do the next new thing. That’s great, it’s just that mindset can cause you to forget some of the basics.

“You might say to yourself, ‘I want a stat that can measure this.’ Then video technology comes out and gives you the stat you wanted to measure. There is a tendency to think, ‘Ooh, I’ve been waiting for this, and now I’ve got it, and it’s the greatest stat in the world.’ But you haven’t even looked at it yet. You haven’t looked at what it actually says — what its weaknesses are. There’s a hazard there. You want to know more things than your competition. What you don’t want is to know something your competition doesn’t, and it’s wrong. If everybody is wrong about something it doesn’t hurt you too bad, but if you’re the only one, you have 29 teams taking advantage of your mistake.”

(That’s from the same Fangraphs link as the Cuban quote)

There’s a reference in the Friedman piece to Craig MacTavish walking around looking for the “Aha!” moment when it comes to hockey analytics. I don’t think MacTavish has realized that half the hockey world is a step ahead of him in that regard. The “Aha!” moment comes when you realize that shots are a hell of a lot more predictive of future results than goals are. As soon as you realize that hockey is a game between two teams trying to take shots on goal, I think the rest of it falls into place. Shooting percentages, PDO, “clutch” performances in the playoffs… once you get over the hurdle of recognizing that shots are a more sustainable currency than goals, lots of new concepts make sense.
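PDO, one of the concepts named above, is a good example of how simple the arithmetic is once you accept shots as the currency: it’s just on-ice shooting percentage plus on-ice save percentage, scaled so that league average sits around 100. A minimal sketch, with invented numbers:

```python
# PDO: on-ice shooting % plus on-ice save %, scaled to ~100.
# Teams far above or below 100 tend to drift back toward it,
# because both percentages are heavily variance-driven.
# All counts below are invented for illustration.

def pdo(goals_for, shots_for, goals_against, shots_against):
    """On-ice shooting % + on-ice save %, times 100."""
    sh_pct = goals_for / shots_for
    sv_pct = 1 - goals_against / shots_against
    return 100 * (sh_pct + sv_pct)

# A team shooting 10% while getting .930 goaltending:
print(round(pdo(30, 300, 21, 300), 1))  # 103.0 -- likely running hot
```

A PDO of 103 doesn’t prove a team is lucky, but it’s a flag that the goals, rather than the shots, are doing the flattering.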

To apply that to McCracken’s point, say you’re a coach looking to dress the defenceman who is best at exiting the zone, and you have to pick between Andrew Alberts and Albert Andrews. The data shows that Alberts is the better of the two at getting the puck out of his own zone, so the coach sends Andrews to the press box.

What the coach doesn’t see in zone-exit data is that his team can’t get shots on goal when Alberts is on the ice. Regardless of how good he is at clearing the zone, the team generates far fewer shots than the opposition with him out there. Shot-differential statistics are useful because they paint a good picture of what happened while a player was on the ice, and microstats are useful because they can strip out the ‘team’ components of differential stats and show an individual contribution, but I don’t think you want to get carried away by any one micro statistic. Some team, for instance, is going to sign the faceoff-proficient Tyler Bozak to a big-money contract this offseason and get burned on the return.
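An on-ice shot-share stat of the kind described here is also trivial to compute: of all the shots taken while a player is on the ice, what fraction belong to his team? A sketch with hypothetical shot counts for the two defencemen in the example:

```python
# On-ice shot share: the team's percentage of all shots taken while
# a given player is on the ice. The counts below are hypothetical,
# chosen to match the Alberts/Andrews example in the text.

def shot_share(shots_for, shots_against):
    """Team share of on-ice shots, as a percentage."""
    return 100 * shots_for / (shots_for + shots_against)

on_ice = {
    # name: (shots for while on ice, shots against while on ice)
    "Andrew Alberts": (180, 260),
    "Albert Andrews": (240, 220),
}

for name, (sf, sa) in on_ice.items():
    print(f"{name}: {shot_share(sf, sa):.1f}% of on-ice shots")
```

In this made-up sample the zone-exit ace gets badly outshot while the benched player breaks even, which is exactly the blind spot the paragraph describes when a coach leans on one microstat.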

The limitations of numbers as they apply to hockey aren’t so much the “you can’t track a fast-moving game” or “you can’t track leadership” straw-man arguments. It’s more that, in the rush to obtain information, teams are preventing others from seeing what they’re doing, but at the same time preventing others from adding to their work. I’m unfamiliar with soccer analytics, but I’m thinking along the lines of Manchester City releasing a season’s worth of data to the public. The longer information stays behind imaginary lines, the less chance I think there is of something useful coming of it. It takes patience, though, and presumably a lot of people, not only to accumulate the data, but to make something of it, and to have smart people in the background making sure you’re doing it right.