So the Big Show is nearly upon us, and I thought it might be good to think about what to watch for in football analytics developments for this upcoming domestic season.
The Importance of Statistical Anomalies
The major outlier in the Premier League last season (that I seem to harp on endlessly) was Manchester United’s points total, which was unexpectedly high considering their relatively low Total Shots Ratio. James Grayson has a great definition of TSR here that you should just bloody well bookmark at this rate.
As with all great outliers, the discrepancy provoked a number of important discussions which led to some interesting developments in footy analysis that I will touch on below. While this season might go completely according to plan, it will be interesting to see if the data this season defies expectations in another, equally thought-provoking way.
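For anyone new to TSR, the arithmetic is simple: a team's shots divided by all shots taken in that team's matches. Here is a minimal Python sketch. The function name and the season totals are my own invention for illustration, not real figures for any club.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a team's matches that the team itself took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Made-up season totals, for illustration only.
tsr = total_shots_ratio(shots_for=540, shots_against=460)
print(f"TSR = {tsr:.3f}")
```

A TSR above 0.5 means a team outshoots its opponents over the sample, and one below 0.5 means it gets outshot.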
The Rise of Game States
One of the explanations for Manchester United’s low TSR last season came via the idea of game states (whether a team is at +2, +1, 0, -1, -2, etc.) and how they affect the number of shots a team takes. For example, teams that trail by a single goal tend to take more shots from worse areas, while teams that are a goal ahead take fewer shots but in some cases have a higher shooting percentage (sh%). United, for example, spent more time at +1 than any other team in the Premier League, and this likely lowered their TSR considerably while increasing their sh% and their save percentage (sv%).
There are still many interesting questions that game states (GS) raise that I’ve yet to see concrete answers to, like: is the GS effect tactical in nature, or psychological? And can GS be integrated into TSR to get a better, more accurate picture of how a team is likely to do at the end of the season? Hopefully some of the data sites will include this metric in their team evaluations.
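One naive way to start folding game state into TSR is to compute the ratio separately for each state. The sketch below assumes a shot log where each shot is tagged with the team's goal difference at that moment. The log format, the function name and the numbers are all hypothetical, and nobody should read this as an established method.

```python
from collections import defaultdict


def tsr_by_game_state(shots):
    """shots: iterable of (game_state, taken_by_team) pairs, where game_state
    is the team's goal difference at the moment the shot was taken."""
    counts = defaultdict(lambda: [0, 0])  # state -> [shots for, shots against]
    for state, taken_by_team in shots:
        counts[state][0 if taken_by_team else 1] += 1
    return {
        state: shots_for / (shots_for + shots_against)
        for state, (shots_for, shots_against) in sorted(counts.items())
    }


# Hypothetical shot log for one team: (game state, did this team take the shot?)
shot_log = [
    (0, True), (0, True), (0, False),
    (1, True), (1, False), (1, False), (1, False),
    (-1, True), (-1, True), (-1, False),
]

for state, tsr in tsr_by_game_state(shot_log).items():
    print(f"game state {state:+d}: TSR = {tsr:.2f}")
```

A side that spends most of its season at +1 could post a modest overall TSR while looking perfectly healthy at level scores, which is the kind of split the overall number hides.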
A Move from Analytics Into Tactics
Out of the discussion of Game States came an increased recognition that certain teams may be better than others at taking advantage of the space teams sometimes leave open when chasing a one-goal deficit. Which obviously raises the question: what are those teams doing that others are not? Does it have to do with individual player talent? Or is it a team tactic?
I have my own ideas, which are completely speculative and might therefore get me into trouble. United under Ferguson, for example, favoured accurate shots on a minimum number of touches, and indeed, as we know, practised for it; could this have in some way influenced their impressive shot percentage?
I suspect there will be increased exploration of player positioning too, using X,Y positioning data in a more public way to give us a better sense of whether, for example, certain defensive postures increase shot conversion rates (pace Ted Knutson). And by ‘suspect’ I really mean ‘hope.’ And from this, we might open up a broader conversation about ideal tactical choices in given circumstances. ’Tis now only a pipe dream.
Shot Quality Controversy
Right now, Constantinos Chappas has a great post up over at Stats Bomb on Goal Expectation and Efficiency in football. Both Chappas and Colin Trainor have been working with a new metric which compares “expected goals” (based on a secret data recipe) to actual goals for several individual clubs and players. While the authors will not disclose their specific inputs, it’s likely safe to say it involves both average expected conversion rates from certain areas of the pitch and shot placement statistics (generally shots closer to the posts convert at a higher rate). Trainor recently used this metric to rate European football’s top ten young attackers on Stats Bomb.
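Since the real recipe is secret, the best I can offer is a toy version of the general idea: give every shot the average conversion rate for the zone it was taken from, add those up, and compare the total with the goals actually scored. The zone names and rates below are invented for illustration, and shot placement is left out entirely. This is not Chappas and Trainor's model.

```python
# Hypothetical average conversion rates by pitch zone (invented numbers).
ZONE_CONVERSION = {
    "six_yard_box": 0.30,
    "penalty_area": 0.12,
    "outside_box": 0.03,
}


def expected_goals(shot_zones):
    """Sum the average conversion rate of the zone each shot was taken from."""
    return sum(ZONE_CONVERSION[zone] for zone in shot_zones)


# A made-up set of shots for one player, and a made-up goal tally.
player_shots = [
    "six_yard_box", "penalty_area", "penalty_area",
    "outside_box", "outside_box",
]
actual_goals = 2

xg = expected_goals(player_shots)
print(f"expected goals: {xg:.2f}")
print(f"goals above expectation: {actual_goals - xg:+.2f}")
```

The gap between actual and expected goals is the "efficiency" part, and the open question is how much of that gap is skill rather than luck.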
The major question, though, is whether these metrics are repeatable for individual players. At the team level, we know that raw shot percentage quickly regresses to the mean, indicating it’s more a function of random variation than skill. So it’s great that Cristian Tello got so many shots near the right corner last season, but is this repeatable? Over two seasons? A career? And how do we isolate shot conversion from the quality of the team in terms of the quality of the final through ball or cross?
Perhaps a kind soul will use this data and run a linear regression to see if any of it holds up. If so, it could be tremendously useful in player evaluation. But the shot quality controversy will only heat up this season.
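For the curious, the regression I have in mind would look something like this: take each player's conversion rate in one season and see how well it predicts his rate in the next. The sketch below fits an ordinary least-squares line by hand. The player figures are made up, and a real test would need far more players and a minimum-shots cutoff.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x.
    Returns (slope, intercept, correlation coefficient r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Made-up conversion rates for five players in consecutive seasons.
season_one = [0.10, 0.14, 0.08, 0.18, 0.12]
season_two = [0.12, 0.11, 0.11, 0.13, 0.12]

slope, intercept, r = fit_line(season_one, season_two)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.2f}")
```

Roughly speaking, a slope close to 1 would suggest that what a player did last season carries over, while a slope close to 0 would suggest heavy regression to the mean.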
Soccer Analytics in the Media: Goodbye Moneyball, Hello Advanced Stats
I’ve long held that ice hockey analytics has a lot to teach those interested in soccer analytics. PDO for example was originally a hockey metric, and Grayson’s TSR has a lot in common in some ways with Fenwick and Corsi. It makes a lot of sense; the basic idea of hockey (“He shoots, he scores!”) is the same in football, although in miniature—two teams invade each other’s territory to get an object into the opposing net.
While it’s never a good idea to say one sport will follow the same historical development as another, I think as individual metrics gain traction in mainstream media publications we will see a move away from the “Moneyball” blather that has plagued media coverage of performance analysis for so long. Instead, we might see a public debate over “advanced stats,” similar to the one that continues to rage in hockey circles. Certain media figures will come out in early support, while others will staunchly oppose. My own hunch is that, in contrast to hockey, the opposition to advanced stats will come not from conservative Daily Mail readers, but football’s angry, young and intelligent romantics, several of whom write or wrote for this very blog.
I’m sure Chris Anderson and David Sally’s book The Numbers Game (which I’m set to read this week) will also drive the conversation away from the idea of analytics as a means to buy good players on the cheap. I’ll have more to say on that next week.
The Disappearing Stats Blogs
When the EPL Index announced its data service would be ending due to proprietary reasons laid out by the Premier League, it sent ripples through the soccer analytics community. The Index (to which I was a subscriber) provided some interesting and important information to users unavailable elsewhere. While other sites like Squawka and Who Scored still offer important information to users, there is a sense the data could stop flowing at any moment at the snap of a powerful finger.
The same principle applies in the analytics blogging community. Many talented writers are getting hoovered up by data firms, where they do much of their work behind the scenes. Others must weigh their own professional lives against their interesting hobby, and so their posts are few and far between. Still others have come together to start aggregate sites like Stats Bomb, which is a very welcome development, and yet even there some metrics are kept private (for perfectly good reasons, of course).
Even so, I feel it’s vital that the soccer analytics discussion remain in the public realm whenever possible. The silencing of a few integral voices now could scupper something great. I truly hope this isn’t the trend for next season, and that younger, ambitious voices continue to try their hand at analytics. Which brings me to…
The Re-Education of Yours Truly
In order to provide you with better, more interesting work, I’m going to dedicate my free time this season to going back to school and diving into statistical math and modelling so that I can do some of my own work rather than simply pointing you in the direction of good work done elsewhere. As a philosophy major, I’m good with theory but not so good with praxis, so bear with me. Should be a very fun season around here indeed.