100 – Arsene Wenger has just texted 100 people asking for the phone number of the Ukrainian goalkeeper. Howler.
— OptaJoke (@OptaJoke) June 19, 2012
Just as there is a lot of confusion in football over the excessive (mis)use of the term “Moneyball” (“I think it means teams that buy good players that are cheap to win, right?”), so is there confusion on what exactly constitutes useful statistics in soccer.
Examples abound in these Euros, from misunderstanding the importance of sample size in using one or two key metrics in understanding a particular performance, to using possession percentages or interception rates without making reference to a team’s tactical approach, to relying on historical results dating back more than a decade in order to yield predictive data over the group or quarterfinal stages.
As I’ve written in these digital pages before, part of the problem is that we’re at a relatively early phase in the conventional use and understanding of statistical data in football. But one of the biggest hindrances to average football fans learning how to interpret shots-per-goal ratios, or the meaning and context of a key pass, is the popular medium in which this information is usually disclosed: Twitter.
Twitter, like many social media platforms when they first emerge, is often blamed for a whole host of ills, usually involving the deterioration of the written word. More often than not these charges are anecdotal and unfair. For example, for all the huffing and puffing from some literary types on how Twitter is shortening attention spans, little air time is given to how the character restriction can often sharpen a verbose writer’s editing skills (guilty!).
But one area where Twitter does have a negative effect is in football analytics. This may seem like a minor point, but it’s an important one. To give you an example of what I’m talking about, here is a real Tweet (I won’t name the author but this is hardly atypical across the football writing spectrum) I read this morning:
Greece Stats: 19 shots conceded pg (3rd), 21.7 interceptions pg (1st), 16.3 tackles pg (9th), 10 aerial duels won pg (1st).
(pg is per game, of course). There is some very interesting information here, but devoid of context, it’s hard to determine its intrinsic value in making predictive statements about Greece as a team. First, how do these metrics compare to other nations in Euro 2012? Is 19 shots conceded per game particularly bad? Is 10 aerial duels won good?
Moreover, if this data is taken from Greece’s first three group stage matches alone, doesn’t the small sample size affect the predictive quality of these figures? Surely the tactics and quality of Greece’s opponents would affect this data. A more aggressive opponent playing higher up the pitch would arguably increase the average Greek tackles in the short term, for example.
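To put the sample-size worry in rough numbers, here is a minimal Python sketch. The three per-match figures are invented for illustration, chosen only so that they average to the 19 shots conceded per game quoted in the tweet; they are not Greece's actual match data, and the function name is my own.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-match values."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator), scaled by sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, standard_error


# HYPOTHETICAL shots conceded across three group matches (not real data).
shots_conceded = [14, 19, 24]

mean, standard_error = mean_and_standard_error(shots_conceded)
print(f"mean = {mean:.1f}, standard error = {standard_error:.1f}")
```

With only three matches, even a modest match-to-match swing leaves the per-game average uncertain by several shots, which is the kind of caveat a single tweet has no room for.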
Additionally, these figures are intimately linked with Greece’s style of play. The number of interceptions per game can tell you a lot about a team, but only in the context of their general tactical approach. A team with fewer tackles but a higher rate of interceptions in the course of a game would arguably be more positionally disciplined in defence, but that isn’t an absolute measure of quality.
Unfortunately, Twitter as a medium doesn’t offer the space for this kind of nuance without writing six or seven Tweets in a row (bad etiquette, as I’ve learned). But these short, data-strewn tweets have the aura of authority. The temptation is for the follower to read the numbers as absolute fact and move on.
And there is another, slightly different problem with stats tweets, one that was unfortunately first perpetuated by the stats firms themselves. I’m speaking here of the now very familiar “Opta Joe” style accounts which trade in short, interesting bits of statistical information, mostly isolated curiosities.
The vast majority of these Tweets are interesting and informative. Here’s a recent example:
17 – There have now been 17 headed goals at Euro 2012, the most of any European Championships finals. Leap
— Opta Sports (@OptaJoe) June 19, 2012
This is stat as “bauble,” a little keepsake one can throw at the end of a match report under the category “Things that make you say hmmm”. But while this information is great, its existence on Twitter as an isolated 140-character ‘microblog’ (as Twitter once dubbed itself) often prevents any follow-up discussion.
For example, why have there been so many headers? Are they primarily arising from set-pieces? Or open-play crosses? Is this an indication, as a few analysts have speculated, of the return of the traditional centre-forward to the European stage? Or is it simply a curious coincidence?
Yes, there are blogs for this kind of thing, but we’ve yet to see this kind of analysis in mainstream media reports (which still eat up most of the public eyeballs), save for Michael Cox’s columns for the Guardian. Soccer stats currently enjoy the most exposure on Twitter, and that short-form medium has led to the abuse of data and has prevented its transition toward information (to call back to StatsZone developer Colm McMullan’s reference to the DIKW hierarchy).
Here, for example, is a popular Reddit thread on Theo Walcott’s game contribution against Ukraine, showing one lowly successful pass.
As hilarious as it is, it doesn’t take into consideration that Walcott was subbed on in the 70th minute, or how generally poor England was on the right side throughout. Here, for example, is England’s total pass activity in the final twenty minutes, when Walcott played. Walcott’s pass was one of only three England completed in that period on the right flank of the Ukraine half. These are James Milner’s total game actions for the 70 minutes before Theo came on. That includes 9 successful passes out of 13 attempted.
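Even Milner's 9 from 13 is a small sample. As a rough sketch (my own illustration, not anything the stats providers publish), a Wilson score interval shows how wide the plausible range for an underlying completion rate is when there are only 13 attempts. The function name and the 95% level (z = 1.96) are assumptions made for the example, and the calculation treats each pass as an independent trial, which real passes are not.

```python
import math


def wilson_interval(successes, attempts, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    denominator = 1 + z ** 2 / attempts
    centre = (p + z ** 2 / (2 * attempts)) / denominator
    half_width = (
        z
        * math.sqrt(p * (1 - p) / attempts + z ** 2 / (4 * attempts ** 2))
        / denominator
    )
    return centre - half_width, centre + half_width


# Milner's passing before the substitution, as quoted above: 9 of 13.
low, high = wilson_interval(9, 13)
print(f"completion rate 9/13 = {9 / 13:.0%}, interval {low:.0%} to {high:.0%}")
```

The interval spans roughly forty percentage points, so the figure on its own says little about whether Milner passed well or badly on that flank.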
But again, even these data points provoke more questions. What was England’s tactical approach throughout? Did they deliberately play through the left hand side because of a weakness in Ukraine? Was Ukraine successful in marking out England’s right flank?
The conclusion here is that single stats don’t merely fail to tell the whole story; devoid of extensive, Twitter-unfriendly context, they’re deceiving. The temptation to flash a few numbers to a waiting audience without follow-up must be avoided. Better to write a lengthy, hundred thousand-character blog post and tweet the link instead.



Even after showing a relation between raw data and a tactical approach, what is the conclusion? What purpose did it serve? It’s a means to its own end. Not all stats, but most, basically say: “Through these stats I can show that these stats show what I said they’d show, i.e. I was right.”
The purpose is to promote a culture in which blithe, subjective statements about the game of football are supported by some measure of evidence. That can only be a good thing.
I certainly agree that these 140-character stat attacks are fascinating yet flawed but, even in a larger framework, if those stats are reliant on context then they surely become as open to subjective opinion as anything else?
The question “What was England’s tactical approach throughout?” (for example) could have a number of different answers depending on who you ask, making the interpretation of what the related numbers mean equally varied.
I’m coming round to the point of view that such stats are great for provoking debate but not so good for resolving it.