Many of you already versed in this stuff will know about Ted Knutson’s excellent Stat Definitions guide. I don’t want to add much to that (bookmark/pocket it if you haven’t already), but I want to give my own two cents on each one. Keep in mind this is entirely subjective and so you may want to clarify your own concerns in the place where one traditionally does that sort of thing.
This post is dedicated to the guy who said I never talked about what analytics actually does. I also want to stress that most of the data below, if not all of it, comes from shot statistics. That’s it. An easy-to-measure bit of data.
1. Total Shots Ratio
What is it? The measure of a team’s ability to take more shots than their opponents on a consistent basis.
How is it calculated? It’s simply a team’s share of all the shots taken in its matches: TSR = (Total shots for)/(Total shots for + Total shots against). Anyone can calculate it. It’s easy! It should be noted, too, that it is often expressed in different ways, including as a percentage and as ‘Shot Dominance.’
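Since the formula is just a ratio, a minimal Python sketch is enough to show it. The function name and the season totals below are hypothetical, picked only for illustration. A TSR above 0.5 means a team took more shots than it conceded; multiply by 100 for the percentage form.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a team's matches that the team itself took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total

# Hypothetical season totals, purely for illustration.
print(round(total_shots_ratio(600, 400), 3))  # → 0.6
```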
What is it for? TSR is a really good metric for getting a basic idea of the underlying strength of a particular team. To quote its developer James Grayson, “Not only does TSR correlate well with the number of points a team scores in a given season (R^2 = 0.66), but it also is an incredibly repeatable metric.” If you want a rough idea of what it looks like in practice, take a gander at these tables. You’ll note a pretty clear general correlation between shots ratio and position, with some interesting exceptions. Like most stats, it’s best used as an anchor against other measurements. Users should also keep in mind that while the points correlation is strong, it’s not absolute.
What isn’t it for? It’s not yet clear whether TSR applies across all possible worlds. Based on some highly informal work of my own, I didn’t notice a strong correlation in MLS between TSR and points totals, though schedule imbalance may have played a major part in that. And, like most stats, it’s probably best not used in isolation to make claims about how a team will do. See Spurs this season…which brings us to…
2. PDO
What is it? A team’s shooting percentage plus its save percentage. It varies quite a bit over short periods, which makes it a good way to measure a team’s relative fortune or misfortune.
How is it calculated? I’m just going to follow Knutson and quote Grayson directly:
“PDO is the sum of a teams shooting percentage (goals/shots on target) and its save percentage (saves/shots on target against). It treats each shot as having an equal chance of being scored – regardless of location, the shooter, or the identity or position of the ‘keeper and any defenders. Despite this obvious shortcoming it regresses heavily towards the mean – meaning that it has a large luck component. In fact, over the course of a Premiership season, the distance a teams PDO is from 1000 is ~60% luck.”
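Here is a minimal Python sketch of that definition on the usual 1000 scale, where 1000 is the league-average baseline. Shooting percentage is taken as goals divided by shots on target. The function name and the numbers in the example are hypothetical.

```python
def pdo(goals_for: int, shots_on_target_for: int,
        saves: int, shots_on_target_against: int) -> float:
    """PDO on the 1000 scale: (shooting % + save %) x 1000.

    Assumes both shots-on-target counts are non-zero.
    """
    shooting_pct = goals_for / shots_on_target_for
    save_pct = saves / shots_on_target_against
    return (shooting_pct + save_pct) * 1000

# Hypothetical team: scores 30 of 100 on-target shots, saves 70 of 100 faced.
print(round(pdo(30, 100, 70, 100)))  # → 1000
```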
What is it for? Generally, to see whether a team is overperforming or underperforming relative to other metrics, like TSR. One way to use it is to take a struggling team’s previous ten games, calculate their PDO, and then compare it to their TSR to get a picture of whether the team might be enduring an unlucky streak.
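The ten-game check can be sketched in Python like this. The `Match` record and its field names are hypothetical, and saves are inferred as on-target shots conceded minus goals conceded. That inference is an assumption: it ignores own goals and other edge cases.

```python
from dataclasses import dataclass

@dataclass
class Match:
    """Hypothetical per-match team record; field names are illustrative."""
    shots_for: int
    shots_against: int
    sot_for: int        # shots on target taken
    sot_against: int    # shots on target conceded
    goals_for: int
    goals_against: int

def recent_form(matches: list[Match], n: int = 10) -> tuple[float, float]:
    """Return (TSR, PDO) over the last n matches, pooling the raw counts."""
    recent = matches[-n:]
    shots_for = sum(m.shots_for for m in recent)
    shots_against = sum(m.shots_against for m in recent)
    sot_for = sum(m.sot_for for m in recent)
    sot_against = sum(m.sot_against for m in recent)
    goals_for = sum(m.goals_for for m in recent)
    goals_against = sum(m.goals_against for m in recent)
    tsr = shots_for / (shots_for + shots_against)
    # Assumption: every on-target shot conceded that wasn't a goal was a save.
    saves = sot_against - goals_against
    pdo = (goals_for / sot_for + saves / sot_against) * 1000
    return tsr, pdo

# Ten hypothetical matches: outshooting opponents but failing to score.
last_ten = [Match(15, 9, 5, 4, 0, 1) for _ in range(10)]
tsr, pdo = recent_form(last_ten)
print(f"TSR {tsr:.2f}, PDO {pdo:.0f}")
```

A high TSR sitting next to a PDO well below 1000, as in this made-up example, is the pattern that hints at bad luck rather than a bad team.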
What isn’t it for? Making definitive claims. See the drawbacks Grayson mentions. PDO can be distorted by several things over the short term, including shot locations (perhaps the team’s shooting percentage dropped because they couldn’t get into good shooting positions). And PDO isn’t as luck-based in football as it is in hockey, particularly since teams at the top of the table post consistently high PDOs. It’s best used as part of an initial diagnosis, before moving on to look at other things.
3. Expected Goals
What is it? Expected Goals (ExpG). This is usually someone’s secret formula for calculating the number of goals a player or team was ‘expected’ to score, based on league averages (hopefully with a nice big fat sample) and accounting for shot location, shot type, body-to-goal-frame data, distance, whatever.
How is it calculated? Dunno, because the people who use it keep their inputs to themselves. You can make your own if you like! A good ExpG model cleans up a lot of the dirt in misleading stats like shot percentages or conversion rates.
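Since the real inputs are secret, this is not anyone’s actual model. It is a minimal “make your own” sketch of the league-average idea. It assumes every shot has already been labelled with a hypothetical distance bucket and shot type. The sketch computes the league conversion rate for each label pair, then adds up those rates over a player’s shots. Comparing actual goals with that total gives the under/over-performance reading. A serious model would likely use a regression over many more features.

```python
from collections import defaultdict

# Each league shot: (distance_bucket, shot_type, scored). The bucket and type
# labels are hypothetical; real models use far richer shot data.
Shot = tuple[str, str, bool]

def fit_conversion_rates(league_shots: list[Shot]) -> dict[tuple[str, str], float]:
    """League-average conversion rate for each (distance bucket, shot type)."""
    attempts: dict[tuple[str, str], int] = defaultdict(int)
    goals: dict[tuple[str, str], int] = defaultdict(int)
    for bucket, kind, scored in league_shots:
        attempts[(bucket, kind)] += 1
        goals[(bucket, kind)] += scored  # True counts as 1, False as 0
    return {key: goals[key] / attempts[key] for key in attempts}

def expected_goals(shots: list[tuple[str, str]],
                   rates: dict[tuple[str, str], float]) -> float:
    """Sum the league-average rate of each shot; unseen labels raise KeyError."""
    return sum(rates[shot] for shot in shots)

# Tiny made-up league sample, far too small for real use.
league = [("0-12m", "foot", True), ("0-12m", "foot", False),
          ("12-18m", "foot", False), ("12-18m", "foot", True),
          ("12-18m", "foot", False), ("12-18m", "foot", False),
          ("0-12m", "head", False), ("0-12m", "head", True)]
rates = fit_conversion_rates(league)

player_shots = [("0-12m", "foot"), ("12-18m", "foot"), ("12-18m", "foot")]
player_goals = 1
xg = expected_goals(player_shots, rates)
print(f"ExpG {xg:.2f}, goals minus ExpG {player_goals - xg:+.2f}")
```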
What is it for? Getting a good idea of whether a team or player is under- or overperforming relative to the league average. It may end up being really important down the line, as Daniel Altman demonstrated in a pretty integral post on the subject today. Some analysts believe this metric is the future, a way of accounting for shot quality.
What isn’t it for? Well, here’s my problem with this metric. You look at a player for a particular season, you see they have a low ExpG rate based on where they took their shots, what kind of shots they were etc. etc. But what kind of sample are we talking about here over what period of time? Does the ExpG for each player fluctuate each year, reflecting a certain amount of luck involved? (It apparently does). I think ExpG will have a lot of useful applications, but I’d like to see some of these concerns allayed.
4. Game States
What is it? Basically, a specific set of effects on how teams behave when tied, a goal ahead, a goal behind, etc.
How is it calculated? It’s not. It’s a tendency that shows up in broad samples: teams leading by a goal tend to shoot less often but with a higher conversion rate, while teams trailing by a goal shoot more often but less accurately. It’s pretty cool. You can read about it here.
What is it for? Game states can help explain why certain metrics fail to give the whole picture. For example, Man United in 2012-13 had a low TSR but won the league with a very respectable points total. There were several explanations for this, and one of them included United’s league-leading time spent at +1 GS, which would have dropped their shot dominance a bit.
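A figure like “time spent at +1” can be tallied from each match’s goal timeline. This Python sketch assumes whole-minute goal times and 90-minute matches, and it ignores stoppage time. The function name and the `"for"`/`"against"` labels are hypothetical. Summing the per-match dictionaries over a season gives a team’s total minutes in each game state.

```python
def minutes_by_game_state(goals: list[tuple[int, str]],
                          match_length: int = 90) -> dict[int, int]:
    """Minutes spent at each goal difference (team's goals minus opponent's).

    goals: (minute, side) pairs, side being "for" or "against", in any order.
    """
    state_minutes: dict[int, int] = {}
    state, last_minute = 0, 0
    for minute, side in sorted(goals):
        if side not in ("for", "against"):
            raise ValueError(f"unknown side: {side!r}")
        # Credit the time since the previous goal to the state we were in.
        state_minutes[state] = state_minutes.get(state, 0) + (minute - last_minute)
        state += 1 if side == "for" else -1
        last_minute = minute
    # Whatever is left of the match is spent in the final state.
    state_minutes[state] = state_minutes.get(state, 0) + (match_length - last_minute)
    return state_minutes

# Hypothetical match: score on 20', concede on 70'.
print(minutes_by_game_state([(20, "for"), (70, "against")]))  # → {0: 40, 1: 50}
```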
What isn’t it for? Probably single-game analysis. Also, Altman again proposes an interesting tweak to Game States, looking at game ‘paths’ instead: the differences in transitions from one state to another in the first half and the second half. Something to look at in the future.
There are some others besides but these are the ones in most common use at the moment. Maybe next week I’ll throw in some others…