## The State of Analytics: Too many soccer analytics posts lack Statistical Power

I mentioned at the start of these columns that I know next to nothing about even basic statistics. But it appears that if I, or indeed any of us, are going to help develop useful metrics and gain a better understanding of best practices in football, that might have to change.

The reason is that one of the major misunderstandings among amateur soccer analytics writers concerns the importance of Statistical Power, which is largely determined by an adequate and representative sample size. The name of the game is eliminating the possibility of Type I and II errors: the errors made when rejecting, or failing to reject, a null hypothesis (no statistical relationship between factor X and result Y).
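To make that concrete, here is a rough sketch with entirely made-up numbers: a small simulation of how the power of a simple pooled two-proportion z-test (the chance of detecting a real difference between two players' pass-completion rates) depends on how many passes we actually observe. Nothing here comes from the article under discussion; the rates and sample sizes are hypothetical.

```python
import math
import random

def detection_rate(p_a, p_b, n_passes, trials=500, alpha_z=1.959964):
    """Fraction of simulated samples in which a pooled two-proportion
    z-test (two-sided, 5% level) detects the true difference between
    completion rates p_a and p_b, each observed over n_passes passes."""
    hits = 0
    for _ in range(trials):
        a = sum(random.random() < p_a for _ in range(n_passes))
        b = sum(random.random() < p_b for _ in range(n_passes))
        pooled = (a + b) / (2 * n_passes)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_passes)
        if se > 0 and abs(a - b) / n_passes / se > alpha_z:
            hits += 1
    return hits / trials

random.seed(1)
# Hypothetical true completion rates: 75% vs 80%
print(detection_rate(0.75, 0.80, n_passes=80))    # a couple of games' worth: power is poor
print(detection_rate(0.75, 0.80, n_passes=2000))  # a season's worth: the difference shows up
```

With only a game or two of passes, a genuine five-point gap in completion rate goes undetected most of the time; over a season's worth it is picked up reliably. That is all "statistical power" means here.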

Now, as of writing, I can only use common sense to determine which conclusions can safely be drawn from a particular sample size and which cannot, and there will come a time when that alone won’t be good enough for me or for anybody purporting to be an amateur soccer stats person. But it’s clear that some work being done under the guise of meaningful analysis of soccer statistics is woefully inaccurate.

The amateur soccer analytics community is still fairly small, so I don’t want to ruffle too many feathers. Here is one example from the EPL Index. It contains a sample error so basic, it’s a wonder it was ever published:

> Using EPL Index Opta Stats we can analyse Cole and Carroll’s performances this season. We’ll start by looking at how many games they’ve played and how much pitch time they’ve accumulated.
>
> Cole’s extra minutes are because Carroll joined West Ham late and suffered a hamstring injury.

From this data, the author goes on to compare pass-completion statistics, aerials won, and possession statistics in order to draw broad conclusions about the value of the two players to Sam Allardyce’s West Ham.

Now, the author’s conclusions about Carroll’s skill in comparison to that of Carlton Cole are probably correct, but these statistics don’t qualify as evidence. You simply cannot take an entirely unrepresentative sample (TWO starts for Carroll!) and make extraordinary claims about a player’s comparative value. The playing time here is not large enough to correct for position, type of opposition, Allardyce’s tactical instructions on the day, or whether Carroll’s and Cole’s roles remained static, were interchangeable, or differed. Moreover, the lopsided sample size for Cole compared to Carroll further undermines the conclusions, regardless of the “per minute” distinctions, which involve a total playing time so small as to be practically meaningless.

This article isn’t an outlier. Less egregious forms of these errors seem to occur in even the most cursory analytics articles, or tactics pieces that attempt to use measurements with no statistical power to draw conclusions.

One of the more common defenses for this kind of abuse is the ‘author caveat,’ in which a writer will defend the general lack of statistical power with a warning—e.g. “We should be careful not to read too much into this apparent correlation”—before they dive headlong in and do just that. It’s hard to be charitable to this approach, particularly when the “new information” in the piece is drawn from a sample group so paltry it tells us absolutely nothing.

Not all analytics writers are guilty of this, and many have long corrected for reversion to the mean to determine just what team behaviours tend to produce more goals over the long-term. Nor should tactics writers be afraid to look at in-game data to make observations about the tactical approach of a given team, particularly if they’re backed up by the manager himself (that said, there is still a danger of the post hoc fallacy rearing its head from limited data).

Of course, this is sports, and moreover, it’s a team invasion sport. It’s all impossibly vague, so why not just use the data to tell us what we already know? This approach however simply plays into the hands of those who would sneer at analytics as an odious bit of Americanism, rather than the future of football.

For dumb ass journalists who don’t want to be caught out on this sort of thing, a resource.

1. Agreed, a lot of people peddling “statistics” related to football these days really have no idea what they are talking about. There, I said it. Not that I’m going to claim to be a statistical genius, but a stats 101 course and some common sense can quite easily punch holes in a lot of claims.

BTW, that’s not a knock on the analytics blogs you’ve linked to (I’ve never read them, so can’t comment), but some of the stuff I’ve seen from non-statistical people blogging or whatnot is atrocious.

• Some of these inaccuracies leak into the general discourse, which can be a bit maddening.

2. Richard, the point about considering statistical power cannot be made often enough. Well done. I do want to add that the line about reducing Type 1 and Type 2 errors is somewhat misleading: A) you never eliminate either, and B) the two move in opposition to each other, so, all else being equal, decreasing the likelihood of one will increase the likelihood of the other.

And to Alex, I would add that while I do agree that stats 101 is nice, it can be more of a problem than a solution. With the advent of point-and-click programs like SPSS, it is all too easy to perform statistical analyses without actually thinking about what you are doing. Therefore a little knowledge can often be more damaging than no knowledge. It’s not that most relevant statistical concepts are beyond the grasp of anyone reading this; they’re not. It’s just that statistical analysis is like scuba diving: anyone can assemble the tools and do the activity with almost no training whatsoever; the trick is knowing what to do when something goes wrong.
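The trade-off the commenter describes between the two error types can be shown with a few lines of arithmetic. This is a textbook one-sided z-test calculation with invented numbers, not anything from the articles above: for a fixed (hypothetical) effect size and sample size, tightening the significance level alpha necessarily inflates beta, the Type 2 (miss) rate.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def error_rates(effect, n, z_alpha):
    """For a one-sided z-test of a standardized effect size with n
    observations: returns (alpha, beta), the Type 1 and Type 2 rates."""
    alpha = 1 - norm_cdf(z_alpha)
    beta = norm_cdf(z_alpha - effect * math.sqrt(n))
    return alpha, beta

# Hypothetical small standardized effect (0.3), fixed n = 25:
for z in (1.645, 2.326, 3.090):  # alpha = 5%, 1%, 0.1%
    a, b = error_rates(0.3, 25, z)
    print(f"alpha={a:.3f}  beta={b:.3f}")
```

As the printed rows show, each stricter alpha buys its lower false-positive rate at the cost of a higher chance of missing the real effect; only a larger sample relaxes both at once.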

3. Hi Richard

Two points in response to your article.

1) There is no “sample error” in my article. The sample size we have used is a small one, but that does not constitute an error. As I’m sure you’re aware, there is no moment where, all of a sudden, the sample becomes suitably large. All samples are, to some extent, imperfectly representative of the data set.

2) Extending the sample size would have affected the topicality of the article. I could have data-mined back in time to my heart’s content, but games played even a few months ago can be so materially different as to affect the credibility of the analysis. Using a small, recent sample of games allows a comparison of players playing in similar teams, with similar teammates, within the same season. This is something with which everyone conducting statistical analysis must wrestle – every time a sample is chosen, it must balance the twin (and opposing) demands of size and topicality.

To conclude, there are indeed few data points, but not so few as to render the analysis “practically meaningless”.

Reece Haines-Aubert

• This was your stated intention in the article: to “investigate the differences between the two West Ham front men so we can examine why Carroll is doing so much better than Cole.”

Based on this rather ambitious goal, sample size certainly does matter, for the reasons I laid out in the article. 176 minutes of discrete events is simply not a good indicator of a player’s absolute ability in comparison with players in a similar position.

Why? Perhaps on the days Carroll played, his wingers were facing an inferior defence and were better able to get into positions to send in better crosses, or West Ham played against a high-attacking team, thereby suiting a more counterattacking approach which would have seen more long balls sent to the player.

This might explain 90 of those minutes, for example, which came against Arsenal as part of a 3-1 defeat. So the claim that “it’s clear to see that Carroll has been far more effective than Cole” is skewed by the individual circumstances of one particular game.

It’s fine to use small samples, for instance when drawing on evidence to make a point about tactics on the day. But that’s not what your goal was. Remember, it was, in your words, to “investigate the differences between the two West Ham front men so we can examine why Carroll is doing so much better than Cole.”

You attempt to draw absolute conclusions about the two players’ individual abilities from a very limited sample, and that’s why it was a poor article.