Manchester United v Tottenham Hotspur - Barclays Premier League

So another terrible article on the bane of analytics made its way to a newspaper website that I generally quite like. That isn't so bad in itself, except that it won lots of approval from people who should know better.

It does make an important point (one that has been better made elsewhere) about people reducing games or players to a single set of context-dependent numbers, which is pretty stupid. The problem is that it extrapolates from stats abuse into declarative and wholly wrong statements about analytics, which is something else entirely.

First, let me remind everyone that statistics and analytics are not the same thing. Let me repeat this point: STATISTICS AND ANALYTICS ARE NOT THE SAME. This should be self-evident, and yet it gets clouded over once in a while by people constructing beautiful polemics to nowhere.

Let me take the Merriam-Webster definition of the word ‘statistic’:

a number that represents a piece of information

So let me now blow your mind and tell you that most of football is statistics. Last night Manchester City played Chelsea in the Premier League. The outcome, a 0-1 victory for Chelsea, was determined by a statistic: which team scored the most goals. In winning the match, Chelsea earned three points in the league table, a statistic that will eventually determine the league champion. Some of the players on both teams were bought for great sums of money (the amounts are statistics), in part on the strength of their statistics, such as goals scored or first-team appearances at their previous clubs. We all follow these statistics closely, and we all “deserve football.”

The winner of the Premier League won’t walk listlessly around the field worrying about the context of their three-point wins all year, many of which came from fortuitous results. Nor will there be angry young men on the sidelines writing op-eds against the recording of goals, wins and points totals in order to preserve the “romance” of the game. No, they will celebrate like drunken lords, and so will some of you.

So our general concern isn’t with statistics per se, but with the use of statistics to make claims they don’t have the power to make. The author gives an excellent example:

In the aftermath of November’s fixture between Tottenham and Manchester United, a remarkable match report appeared on the Squawka website – an organisation that claim to have built their own “scientific player performance scoring algorithm”. Their analysis of the game certainly featured some unique conclusions.

The author argued that United won the midfield battle because Phil Jones and Tom Cleverley had successfully completed all of their tackles (three out of three for Jones, two out two for Cleverley). However, anyone who saw the match immediately noticed a fundamental flaw in this analysis: Cleverley was at fault for Spurs’ first goal. There wasn’t a statistic for “being done by your opponent” so Cleverley completed the game with an unblemished record.

Without context statistics are meaningless.

This last sentence may appear profound, but it’s redundant. Of course statistics are meaningless without context, because football isn’t a collection of measurements but rather a ninety-minute game played by two sides of eleven human beings trying to get a ball into the opposition net. Just as words are not the essence of the things they represent, neither are statistics the essence of football. Jozy Altidore scored 31 goals for AZ last season and has scored 2 for Sunderland this season. Chelsea scored more goals than Barcelona in the 2011-12 Champions League semis, but few would say Chelsea was the superior team. Tom Cleverley was at fault for Spurs’ first goal, but he may have contributed positively to United’s performance in ways that ball-following pundits missed (he probably didn’t, though). Chelsea beat City twice this season, but City may still win the Premier League. Football is inscrutable.

Analytics takes this inscrutability as a given. It serves to test the degree to which various statistics (volume of shots) correlate with other statistics (points totals). And in this process, it blows up a lot of the assumptions whose demise understandably pisses off people like our Guardian writer. Take the supposed importance, per se, of successfully completed tackles in winning football matches.
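To make "testing the degree of correlation" concrete, here is a minimal sketch in Python. It assumes the Pearson correlation coefficient as the measure, which is one common choice rather than a method this post prescribes, and the five-team figures are made up purely for illustration, not real league data. A coefficient near 1 says the two numbers tend to rise together; it says nothing about why.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical, made-up figures for five teams (NOT real league data):
shot_share = [0.62, 0.55, 0.51, 0.46, 0.38]  # share of all shots in their matches
points = [81, 70, 58, 49, 35]                # season points totals
print(round(pearson_r(shot_share, points), 2))
```

The result is a matter of degree between -1 and 1, which is exactly the kind of hedged claim analytics makes: strong or weak association, never certainty about any single match.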

Put another way, analytics is about absence where statistics (and the bulk of essentialist football commentary) are about presence. Analytics tests our assumptions about what certain numbers mean with regard to football. It is a tool for opening up doubt where blithe certainty used to be. It makes no absolute claims about ideal tactics or players. And when analytics work does note a strong correlation between two different metrics, that correlation is always a matter of degree. Teams that regularly outshoot opponents tend to get more points in the league table. Shot conversion rates tend to be heavily influenced by random variation. From there, analytics can offer any team real insight into separating signal from noise. It lets you say “I don’t know” in good conscience, whether to a stat or a cliché.
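One way to see why conversion rates are noisy is a small simulation. This is a hedged sketch with assumed parameters, not real data: twenty hypothetical teams, 500 shots each, and every team given the identical true conversion probability of 10%. Any spread in their observed rates is therefore produced by chance alone.

```python
import random
from statistics import mean, pstdev

def simulate_conversion_rates(n_teams=20, shots_per_team=500, true_rate=0.10, seed=1):
    """Observed conversion rate per team when every team has the SAME true rate.

    All parameters are illustrative assumptions, not figures from any league.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    rates = []
    for _ in range(n_teams):
        # Each shot is an independent trial that scores with probability true_rate.
        goals = sum(1 for _ in range(shots_per_team) if rng.random() < true_rate)
        rates.append(goals / shots_per_team)
    return rates

rates = simulate_conversion_rates()
print(f"average {mean(rates):.3f}, lowest {min(rates):.3f}, highest {max(rates):.3f}")
print(f"observed spread (sd): {pstdev(rates):.3f}")
# Binomial theory for the spread from chance alone: sqrt(p * (1 - p) / n)
print(f"theoretical sd from chance: {(0.10 * 0.90 / 500) ** 0.5:.3f}")
```

If real teams' conversion rates spread out by roughly as much as this pure-chance model predicts, that is a reason to treat a hot or cold finishing season as mostly noise rather than as evidence of skill.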

If people were willing to sniff around a little, they’d realize that analytics is an ally in the fight against stats abuse. Analytics questions whether we have the right to simply point at a set of numbers and say this means that. In doing so, it goes further than any of these tortured romantics in preserving football’s essential mystery.