Joe Allen (left) and Leon Britton tag-team Park Ji-Sung

The last column took a look at some of the fairly rudimentary primary source statistics sites available for free to the “casual fan” (in football, I’ve found there’s really no such thing). Yet as FourFourTwo Stats Zone app developer Colm McMullan mentioned last week, one of the main issues with soccer analytics is that we’re still largely at the “data” phase and have yet to progress to the “knowledge” part, i.e. a hard consensus on the best way to employ soccer statistics. We have oodles of ready-at-hand metrics at our disposal; the question is how to use them.

Unlike baseball, in which sabermetric formulas identify “good” players beyond our intuitive sense (and ERA, batting averages and RBIs), most of the time in football (with some very interesting exceptions) statistics are mere signposts that either confirm or deny what we already know about the sport. They offer an added dimension to traditional game analysis. But that doesn’t mean they’re a mere gimmick or needless bauble.

That’s because for decades (and to this day), football match reports have required a great deal of faith from the reader in the sportswriter who will breathlessly tell us who was the “man of the match,” which central defender was “stalwart,” which young midfielder “stood out,” all without a jot of supporting evidence. Here are some examples from match reports or previews taken from today:

-A player “was not able to deliver anything truly threatening.”

-A player’s “greatest strength is his ability to finish”

-A player is “coming off a season of questionable form.”

There is nothing demonstrably false about any of these statements, but when the entire edifice of mainstream soccer analysis rests on the subjective opinions of a small handful of appointed scribes (and the biases of a hundred thousand blogs, forums, Facebook pages, etc.), it allows for blind spots to emerge. That’s why it’s important for more and more fans, writers, and bloggers to feel comfortable using the data. And that means taking risks that could be subject to open critique and revision.

Common Sense

As Michael Cox—editor of Zonal Marking and frequent user of pass graphs and certain statistical measures in his reports—put it, there is no single definitive stat in football one could point to and say the higher the number, the better the player (save for goals perhaps, but even that cannot be divorced from team/league context). Individual statistics cannot often be divorced from the player’s specific role within the team, for example. And besides, says Cox, “Some stats are just interesting.”

Take Leon Britton, about whom I wrote a slightly controversial column last week. I noted his average pass completion rate was 93.5%. But that in and of itself doesn’t mean a whole lot. For one, a player who completes 9 out of only 10 attempted passes per game isn’t going to be very useful for a team. As Cox points out, “total passes attempted is almost more important, because it indicates the player is actively involved, having an impact, and is in good positions to receive the ball” (Britton incidentally ranks third in the team on attempted passes, behind Ashley Williams and Angel Rangel).
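To make Cox's point concrete, here is a minimal Python sketch that ranks two players first by completion rate and then by completed passes. The 9-out-of-10 line comes from the example above; the second player's numbers, the `PassingLine` class and the `rank_by` helper are all made up for illustration and are not real match data.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """Per-game passing numbers for one player (illustrative only)."""
    name: str
    attempted: float  # passes attempted per game
    completed: float  # passes completed per game

    @property
    def completion_rate(self) -> float:
        # Guard against division by zero for a player with no attempts.
        return self.completed / self.attempted if self.attempted else 0.0


# Hypothetical figures: only the 9-of-10 case is taken from the text.
players = [
    PassingLine("Low-volume player", attempted=10, completed=9),
    PassingLine("High-volume player", attempted=50, completed=44),
]


def rank_by(players, key):
    """Return player names sorted from highest to lowest on the given key."""
    return [p.name for p in sorted(players, key=key, reverse=True)]


by_rate = rank_by(players, key=lambda p: p.completion_rate)
by_volume = rank_by(players, key=lambda p: p.completed)

for p in players:
    print(f"{p.name}: {p.completion_rate:.1%} of {p.attempted:.0f} attempts")
print("Ranked by rate:  ", by_rate)
print("Ranked by volume:", by_volume)
```

The two rankings disagree: the low-volume player edges it on completion rate, while the high-volume player is far more involved in the game. Neither number settles anything alone, which is rather the point.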

One reader pointed out he only makes an average of 1.7 tackles per game, but again that in and of itself doesn’t mean much. His role at Swansea may not be identical to that of a Busquets or a Mascherano, which makes sense as he isn’t the sole holding midfielder. Brendan Rodgers’ team often lined up in a 4-2-3-1, in which Britton was paired with Joe Allen behind the midfield three. Allen’s job seems to be to win back possession and Britton’s job is to cycle the ball. Here, for example, are Allen’s tackles from Swansea’s 3-2 win against Arsenal and their 2-1 victory over West Brom, the most made by any player on either team. Allen in fact leads the team in average tackles per game.

Note I haven’t done anything particularly groundbreaking here. This kind of analysis can be made more complex by comparing other individual player metrics, average positions, or players’ specific tactical roles within the wider team according to Rodgers’ use of “levels” in building his preferred formations. Nor does it contradict any of the general consensus on how Swansea plays.

The major working principle here isn’t a convoluted algorithm, but common sense. Obviously this data can be both used and abused, but as the use of rudimentary statistics becomes more popular, over time a consensus will emerge on the best way to use them.

Different levels

This isn’t to say that there isn’t a level of football analytics that will eventually yield an “If player X does Y number of Z actions per game, they will increase their team’s chances of winning by Q percent” type formula. It’s likely individual clubs are at the forefront of this type of research and will forever keep it away from prying eyes, but it’s also being done at the grassroots level among the more involved analytics blogs. This kind of algorithm may eventually enter the realm of common knowledge, although even then it will likely depend an awful lot on team/tactical context.

Neither does it mean even the most rudimentary data might not have “Billy Beane” potential to identify less-favoured players. In a recent (must-read) two-part interview with Soccernomics authors Simon Kuper and Stefan Szymanski in Forbes, Kuper cites a few examples:

For example, I am giving a talk in Moscow next week, and Opta sent me a data set on the players in all the leagues they cover with the best shot conversion rate. Of the shots you have on goal how many go in? Well the best in the world is Gonzalo Higuain at 44%. Which in a sense is not very interesting because you know Higuain is a great striker. There are names in the top ten or fifteen that you don’t know. There’s a guy named Seydou Doumbia at CSKA Moscow. He’s scoring on nearly 30% of his shots. Rodrigo Palacio at Genoa, who I must admit I had never heard of, is scoring 30%. Shooting on goals is surely the most watched thing in soccer. It’s hardly a secret. You look at a chart like that and say, “Wow! I might go have a look at guys like that.” In fact, an old friend of mine says that a number of clubs are having a look at those two guys right now.

This kind of thing will always be subject to intense debate, particularly when it comes to more ineffable but no less important factors like experience, leadership, psychology etc. But soccer is no different in that respect than any sport subject to statistical analysis.
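For what it's worth, the conversion rate Kuper describes is just goals divided by shots, so anyone with a spreadsheet of shots and goals can build a similar chart. Below is a rough Python sketch of that calculation. The player rows are invented, and the `min_shots` cutoff is my own assumption (it is not something Kuper or Opta mention), added so that a player with a handful of shots doesn't top the list by accident.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Share of shots that end up as goals; 0.0 when no shots were taken."""
    return goals / shots if shots else 0.0


# Hypothetical (player, goals, shots) rows, not real Opta data.
rows = [
    ("Striker A", 11, 25),
    ("Striker B", 9, 30),
    ("Striker C", 2, 4),
]


def leaderboard(rows, min_shots=10):
    """Rank players by conversion rate, skipping those below min_shots.

    The min_shots threshold is an illustrative assumption, not a standard.
    """
    eligible = [
        (name, conversion_rate(goals, shots))
        for name, goals, shots in rows
        if shots >= min_shots
    ]
    return sorted(eligible, key=lambda row: row[1], reverse=True)


table = leaderboard(rows)
for name, rate in table:
    print(f"{name}: {rate:.0%}")
```

With the cutoff removed (`min_shots=1`), Striker C's two goals from four shots would lead the table, which is a small reminder of how much context even a simple number needs.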

The most important point when attempting to use soccer statistics to study football is to avoid simple reductionism. This of course works both ways. Just as it’s true that simply quoting single statistical measures to make any kind of point is a v. bad idea in football (save perhaps for average key passes, says Cox cautiously), it’s equally misleading to reduce football to a simple game of twenty-two guys running after a ball for ninety minutes (at which point Germany wins, goes the old joke), where romance, passion and temperament count for everything.

Writers are often laughed off for doing the former, but the latter has been de rigueur practice in mainstream soccer journalism since forever. It’s up to us to move toward the happy middle…