First, going to keep plugging this. The esteemed James Grayson set up this forum based on my analytics post last Friday, and we need more voices to reach critical mass. If you haven’t already, have a look, post something, return, keep returning, and tell your nerdlinger friends.
Second, what is football analytics for?
This week, I’ve spent some time looking at some numbers—particularly Total Shots Ratio, a good, simple, predictive metric which measures a club’s ability to control the ball, and tends to indicate a team’s ability to win over the long term; and PDO, which assesses whether or not a club is performing above or below the general mean, whether due to luck or incredible skill—in order to make the case that it would be foolish in the extreme for Real Madrid to sack Jose Mourinho.
In those pieces, I didn’t pay much attention to rumours of a poor relationship between RM president Florentino Perez and Mourinho, or to any individual instances of questionable ‘leadership’ from Mourinho. Perhaps more controversially, I didn’t discuss particular games or individual tactical mistakes Mourinho may or may not have made that may or may not have led to a negative outcome.
It should be said that any or all of the above factors could, in theory, have a detrimental effect on the team’s ability to continue performing at a high standard over the long-term. At the very least though, Perez owes it to the club’s fans—particularly considering Mourinho’s phenomenal success rate at Madrid when faced with defeating one of the best club sides in footballing history—to make a clear case they would.
Otherwise, Perez could be unnecessarily putting the club at risk of a dip in form, at least if Barcelona continue to perform to their incredible standard, including a conversion rate of one goal for every four shots.
And so, I ask again, in light of this, what purpose does football analytics serve?
In the minds of most fans, sports analytics is meant to tease out counter-intuitive metrics or algorithms in order to give less financially able teams an unseen market advantage. This is, of course, the Moneyball approach. However, it’s fair to say that any first-mover advantage in this regard is short-lived (as it was in baseball, despite the romanticism that still surrounds the 2003 Oakland Athletics). As for football, it’s hard to make the case that first-mover metrics exist at all. If they do, they likely only offer small, in-game dividends that may not even translate into added points.
Some others have looked to analytics, particularly statistical absolutes like the Reep Ratio of one goal for every nine shots, in order to make normative statements about the “ideal” way to play football. This is closely related to the Moneyball approach, and for some, it’s the basis of their interest in the field. If statistics once famously revealed that most gridiron teams would be better off ‘going for it’ than punting on a fourth down for example, perhaps there are similar inefficiencies in how most teams go about trying to score goals and prevent goals from being scored.
This area of analytics is controversial and subject to skepticism, for good reason. For one, there is a major risk in complex team invasion sports, which are open ended and free-flowing, of over-generalization—even in broad samples. Large samples drawn exclusively from a particular league or region may be influenced by unseen regional biases based on cultural preferences and tactics. For instance, Spanish sides generally lead Europe in interceptions, and English sides generally lead in tackles. It would therefore be erroneous to draw normative conclusions based on those individual factors for football as a whole.
Even statistical patterns observed over a large number of matches, accounting for both historical samples and regional differences, do not account for the fact that football is a game of particulars: single matches with determined outcomes played by sides of greatly varying quality. This is one of the major reasons why Charles Reep’s theories fell into disrepute. Reep believed that because the ratio of shots to goals was generally fixed, and because goals on average arose from a limited number of passes, teams would do well to beat the odds by punting the ball up toward goal as quickly as possible to rack up the shots.
But this is using the general to abuse the particular. As Jonathan Wilson pointed out in his famous criticism of Reep in Inverting the Pyramid, there is evidence from Reep’s own numbers that, at the time he made his measurements, 91.5% of all moves in football involved three or fewer passes. If short moves were the more efficient route to goal, they should account for more than 91.5% of goals. However, only 80% of goals in his examples came from moves of three passes or fewer, indicating that moves of more than three passes are actually more efficient. Moreover, Reep fails to account for crucial particulars, including evidence that higher-skilled teams were more likely to score goals from longer passing chains. As Wilson concludes, “That is not to say that direct football is wrong at all times, merely that fundamentalism in tactics tends to be misguided as in any sphere.”
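To make that arithmetic explicit, here is a minimal sketch using only the two percentages quoted above. The function name and the "relative efficiency" index (share of goals divided by share of moves) are my own framing for illustration, not something Reep or Wilson defined.

```python
def relative_efficiency(share_of_moves, share_of_goals):
    """Goals produced relative to how often this type of move occurs.

    1.0 means the move type scores exactly in proportion to its frequency.
    This index is an illustrative assumption, not Reep's or Wilson's metric.
    """
    return share_of_goals / share_of_moves


# The two figures quoted above: moves of three passes or fewer.
short_moves, short_goals = 0.915, 0.80
# Everything else: moves of more than three passes.
long_moves, long_goals = 1 - short_moves, 1 - short_goals

short_eff = relative_efficiency(short_moves, short_goals)
long_eff = relative_efficiency(long_moves, long_goals)

print(f"short moves: {short_eff:.2f}x their share of play")
print(f"long moves:  {long_eff:.2f}x their share of play")
print(f"long vs short: {long_eff / short_eff:.1f}x")
```

On these numbers, longer moves produce goals at roughly 2.7 times the rate of short ones per move attempted, which is the opposite of what Reep's prescription assumes.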
It’s for this reason that those in the analytics field would be wise to avoid normative statements based on broad statistical correlations unless they have very strong evidence of their universal application.
Finally, some other fans have looked to soccer analytics as a means to beat betting odds. More power to you, if that’s your interest. But here I think is the closest analytics will come to broad utility.
As I’ve spent more and more time on this subject over the last few months, it’s become clear to me that the single, major advantage of football analytics—particularly those that maintain strong predictive power over a reasonably small sample size—is to provide a clear picture of the relative strength of a certain team over the longish term.
Why is this important?
Well, just as broad samples can be over-generalized to particular matches, so can instances from particular matches be abused when applying them to the long term. I wrote a piece making a simple argument that Real Madrid’s unnaturally low PDO this season, coupled with their impressive TSR, meant they’re possibly due for an improvement in total points. A commenter wrote in response, “The matches against Dortmund were illustrative of the decline the coach is overseeing in the club.” On what evidence though has there been any ‘decline’?
TSR tells us, broadly speaking, whether or not a team is controlling the ball. That control is normally commensurate with a higher points total over a domestic league season (it obviously cannot be similarly applied in the Champions League for lack of statistical power). PDO tells us, broadly speaking, whether a team is performing above or below the mean for “luck”: how many shots for are being converted to goals, and how many shots against are leading to conceded goals.
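For anyone who wants to see the two metrics as arithmetic, here is a rough sketch. It assumes the commonly used definitions: TSR as a team's share of all shots in its matches, and PDO as shooting percentage plus save percentage, scaled so that 1000 is average. Using shots on target for PDO is an assumption on my part (some versions use all shots), and the season totals below are invented purely for illustration.

```python
def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in a team's matches that the team itself took."""
    return shots_for / (shots_for + shots_against)


def pdo(goals_for, sot_for, goals_against, sot_against):
    """Shooting % plus save %, scaled so that 1000 is league average.

    Assumes both percentages are computed from shots on target (sot).
    """
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Hypothetical season-to-date totals, not real Real Madrid data.
tsr = total_shots_ratio(shots_for=280, shots_against=140)
team_pdo = pdo(goals_for=30, sot_for=100, goals_against=15, sot_against=45)

print(f"TSR: {tsr:.3f}")       # well above 0.5 means the team out-shoots opponents
print(f"PDO: {team_pdo:.0f}")  # below 1000 suggests results lagging the underlying shots
```

A side with a TSR well above 0.5 and a PDO below 1000, like this made-up one, is the profile I have in mind: dominant on shots, but with conversion and save rates that tend to drift back toward average over time.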
Both of these metrics offer a partial piece of the puzzle, in addition to any clear signs a manager has “lost the plot” tactically. But compiling evidence for the latter, and moreover, arguing that any individual tactical error in a game will suddenly affect a team’s ability to control the ball or to very quickly bring their shot-conversion rates back to the mean, is very, very difficult. Particularly if you think these tactical preferences will have an effect over 20 games, in sharp contrast to the first 14 of the season.
Yet it matters, because Real Madrid is about to sack a manager who, on the evidence, has produced a high-performance team whose point total reflects some bad, but inevitable, luck. Chances are that luck will cost Real Madrid the title this season. But to sack him based on an incident with the fans, or calling out Sergio Ramos, or even some errors against Borussia Dortmund, is exactly the kind of epistemic error analytics can help prevent. It’s not as sexy as providing the means for soccer’s 2003 A’s, but it’s vitally important, and doesn’t require any of the quackery most football fans have rightly been wary of from the outset.