First, going to keep plugging this. The esteemed James Grayson set up this forum based on my analytics post last Friday, and we need more voices to reach critical mass. If you haven’t already, have a look, post something, return, keep returning, and tell your nerdlinger friends.

Second, what is football analytics for?

This week, I’ve spent some time looking at some numbers—particularly Total Shots Ratio, a good, simple, predictive metric which measures a club’s ability to control the ball, and tends to indicate a team’s ability to win over the long term; and PDO, which assesses whether or not a club is performing above or below the general mean, whether due to luck or incredible skill—in order to make the case that it would be foolish in the extreme for Real Madrid to sack Jose Mourinho.

In those pieces, I didn’t pay much attention to rumours of a poor relationship between RM president Florentino Perez and Mourinho, or to any individual instances of questionable ‘leadership’ from Mourinho. Perhaps more controversially, I didn’t discuss particular games or individual tactical mistakes Mourinho may or may not have made that may or may not have led to a negative outcome.

It should be said that any or all of the above factors could, in theory, have a detrimental effect on the team’s ability to continue performing at a high standard over the long-term. At the very least though, Perez owes it to the club’s fans—particularly considering Mourinho’s phenomenal success rate at Madrid when faced with defeating one of the best club sides in footballing history—to make a clear case they would.

Otherwise, Perez could be unnecessarily putting the club at risk of a dip in form, at least if Barcelona continue to perform to their incredible standard, including a conversion rate of one goal for every four shots.

And so, I ask again, in light of this, what purpose does football analytics serve?

In the minds of most fans, sports analytics is meant to tease out counter-intuitive metrics or algorithms in order to give less financially able teams an unseen market advantage. This is, of course, the Moneyball approach. However, it’s fair to say that the advantage of any first mover in this regard is short-lived in sports analytics (as it was in baseball, despite the romanticism that still surrounds the 2003 Oakland Athletics). As for football, it’s hard to make the case that first-mover metrics exist at all. If they do, they likely only offer small, in-game dividends that may not even translate into added point totals.

Some others have looked to analytics, particularly statistical absolutes like the Reep Ratio of one goal for every nine shots, in order to make normative statements about the “ideal” way to play football. This is closely related to the Moneyball approach, and for some, it’s the basis of their interest in the field. If statistics once famously revealed that most gridiron teams would be better off ‘going for it’ than punting on a fourth down for example, perhaps there are similar inefficiencies in how most teams go about trying to score goals and prevent goals from being scored.

This area of analytics is controversial and subject to skepticism, for good reason. For one, there is a major risk in complex team invasion sports, which are open ended and free-flowing, of over-generalization—even in broad samples. Large samples drawn exclusively from a particular league or region may be influenced by unseen regional biases based on cultural preferences and tactics. For instance, Spanish sides generally lead Europe in interceptions, and English sides generally lead in tackles. It would therefore be erroneous to draw normative conclusions based on those individual factors for football as a whole.

Even statistical patterns observed over a large number of matches, accounting for both historical samples and regional differences, do not account for the fact that football is a game of particulars: single matches with determined outcomes played by sides of greatly varying quality. This is one of the major reasons why Charles Reep’s theories came into disrepute. Reep believed that because the ratio of shots to goals was generally fixed, and because goals on average arose from a limited number of passes, teams would do well to beat the odds by punting the ball up to goal as quickly as possible to rack up the shots.

But this is using the general to abuse the particular. As Jonathan Wilson pointed out in his famous criticism of Reep in Inverting the Pyramid, there is evidence from Reep’s own numbers that, at the time he made his measurements, 91.5% of all moves in football involved three or fewer passes. So if goals were more likely to be scored from fewer passes, the share of goals coming from such moves should exceed 91.5%. However, only 80% of goals in his examples came from moves of three passes or fewer, indicating that moves of more than three passes are actually more efficient. Moreover, Reep fails to account for crucial particulars, including evidence that higher-skilled teams were more likely to score goals from longer passing chains. As Wilson concludes, “That is not to say that direct football is wrong at all times, merely that fundamentalism in tactics tends to be misguided as in any sphere.”
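
Wilson’s arithmetic can be made explicit. What follows is only a rough sketch using the two percentages quoted above, and it assumes every move falls into one of two buckets (three passes or fewer, or more than three). The function and parameter names are mine, purely for illustration.

```python
def relative_efficiency(short_move_share: float, short_goal_share: float) -> float:
    """Goals per move for long moves, divided by goals per move for short moves.

    short_move_share: fraction of all moves with three passes or fewer.
    short_goal_share: fraction of all goals scored from those moves.
    Assumes every move is either "short" or "long", so the long shares are
    simply the complements of the short ones.
    """
    short_rate = short_goal_share / short_move_share
    long_rate = (1 - short_goal_share) / (1 - short_move_share)
    return long_rate / short_rate


# The two figures quoted from Reep's data: 91.5% of moves, 80% of goals.
ratio = relative_efficiency(short_move_share=0.915, short_goal_share=0.80)
print(f"A longer move is roughly {ratio:.1f} times as likely to produce a goal")
```

On those figures alone, the 8.5% of moves with more than three passes account for 20% of the goals, which is the imbalance Wilson is pointing to. It says nothing about team quality or match context.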

It’s for this reason that those in the analytics field would be wise to avoid normative statements based on broad statistical correlations unless they have very strong evidence of their universal application.

Finally, some other fans have looked to soccer analytics as a means to beat betting odds. More power to you, if that’s your interest. But here I think is the closest analytics will come to broad utility.

As I’ve spent more and more time on this subject over the last few months, it’s become clear to me that the single, major advantage of football analytics—particularly those that maintain strong predictive power over a reasonably small sample size—is to provide a clear picture of the relative strength of a certain team over the longish term.

Why is this important?

Well, just as broad samples can be over-generalized to particular matches, so can instances from particular matches be abused when applying them to the long term. I wrote a piece making a simple argument that Real Madrid’s unnaturally low PDO this season, coupled with their impressive TSR, meant they’re possibly due for an improvement in total points. A commenter wrote in response, “The matches against Dortmund were illustrative of the decline the coach is overseeing in the club.” On what evidence though has there been any ‘decline’?

TSR tells us, broadly speaking, whether or not a team is controlling the ball. That control is normally commensurate with a higher points total over a domestic league season (it obviously cannot be similarly applied in the Champions League for lack of statistical power). PDO tells us, broadly speaking, whether a team is performing above or below the median for “luck,” how many shots for are being converted to goals, and how many shots against are leading to conceded goals.
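
For readers who want to see how the two numbers are put together, here is a minimal sketch. It assumes the usual definitions: TSR as a team’s share of all shots in its matches, and PDO as shooting percentage plus save percentage scaled so that 1000 is average. Taking both percentages over shots on target is an assumption on my part, and the function names and sample figures are hypothetical.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all shots in a team's matches that the team itself took."""
    return shots_for / (shots_for + shots_against)


def pdo(goals_for: int, sot_for: int, goals_against: int, sot_against: int) -> float:
    """Shooting percentage plus save percentage, scaled so 1000 is average.

    Assumption: both percentages are taken over shots on target ("sot").
    Counts are assumed to be non-zero.
    """
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Hypothetical numbers, for illustration only.
print(f"TSR: {total_shots_ratio(shots_for=7, shots_against=14):.3f}")
print(f"PDO: {pdo(goals_for=30, sot_for=100, goals_against=20, sot_against=80):.0f}")
```

A side that takes 7 of the 21 shots in its matches has a TSR of one third, whatever its possession figure. A side that scores and concedes at the same rate per shot on target lands on a PDO of 1000.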

Both of these metrics offer a partial piece of the puzzle, in addition to any clear signs a manager has “lost the plot” tactically. But compiling evidence for the latter, and moreover, arguing that any individual tactical error in a game will suddenly affect a team’s ability to control the ball or very quickly improve their shot-conversion rates to the mean, is very, very difficult. Particularly if you think these tactical preferences will have an effect over 20 games, in sharp contrast to the first 14 of the season.

Yet it matters, because Real Madrid is about to sack a manager that, on the evidence, has produced a high-performance team whose point total reflects some bad, but inevitable, luck. Chances are it will cost Real Madrid the title this season. But to sack him based on an incident with the fans or calling out Sergio Ramos, or even some errors against Borussia Dortmund is exactly the kind of epistemic error analytics can help prevent. It’s not as sexy as providing the means for soccer’s 2003 As, but it’s vitally important, and doesn’t require any of the quackery most football fans have rightly been wary of since the offing.

Comments (9)

  1. Yeah, but have you ever played the game?

  2. It’s not just for Sergio Ramos or the fans whistling thing that Mourinho’s time appears to be coming to an end. If you read Marca’s truncated English translation of their Spanish piece, perhaps that is the impression you are left with. Read the Spanish language pieces in Marca, AS, El Pais etc. since Mourinho took over and the case has been made (by Florentino Perez or others) that a long term (4+ years) relationship was always going to be difficult. Just a few things:

    Madrid gave Mourinho full control of transfers only to find players like Altintop, Essien & Coentrao being signed because they were represented by Jorge Mendes or one of his associates.

    Contract negotiations for the players represented by Mendes became increasingly complex. For example, Mourinho and Mendes lobbied the club for months to give Carvalho a new extended contract. He is now a reserve player.

    Not to mention the nearly weekly bust-up with someone or the other in La Liga – Preciado, Vilanova, Manzano etc. The ridiculous complaints about referees, opposition teams and league authorities all to try and create a siege mentality that was always an awkward fit at a club like Madrid.

    It all adds up. They put up with him because of the results in the 2nd season, but it has always been a tense relationship.

    As for what soccer analytics is for, IMO it is to add a layer to our understanding of football. To perhaps confirm or refute our thoughts about a team, player or league. They shouldn’t be used as the sole source to form those thoughts.

    We can perform a detailed statistical analysis of the Polish league, but without watching any matches and having knowledge of the off-field factors, there is no context for the figures obtained.

    (While mentioning the statistics for Mourinho, shouldn’t those of his predecessor also be examined to at least provide some sort of reference?)

    • Your concerns are interesting and I am aware of them. Contrary to some of the Madridistas who seem to think that because my first language is English, or because I write a lot on the Premier League, I’m unable to read Marca, AS, El Pais, LLL, Jimmy Burns, Graham Poll, Sid Lowe etc. in fine detail, I do know the goings-on behind Madrid fairly well.

      Moreover, you’re likely right on the money. However, as part of a general commercial strategy, these obstacles to my mind can’t compete with the benefit of a manager who is more qualified than any other to provide RM with a route to knockout stage CL football with its attendant fees, or to continue to push for a La Liga title against a plainly superior Barcelona. Things like “siege mentalities” or “difficult working relationships” are not normally single party affairs. Mourinho is famously difficult. An intelligent club would absorb this as an annoying, non-monetary cost of doing business.

      I’m also a little disappointed you didn’t read this piece as closely as you might’ve. I would never ever argue that analytics should be “the sole source to form” thoughts on football, but they can offer a unique tool in holding up those thoughts to the light of day. Would you make important decisions based solely on personal bias and untested impression? Wouldn’t you want some sort of insurance? Wise planners would.

      As for his predecessor, I did write on him earlier today, though in passing. Manuel Pellegrini is the only other Real Madrid manager to earn a higher win percentage, at 75%.

  3. I appreciate the effort you have put into writing this piece, but …

    1. Please supply links in your post to articles you reference (none of the Mourinho pieces have links to them and I’m too lazy to look them all up).

    2. TSR does NOT tell us if a team is “good at controlling the ball” … possession % does that, TSR is only the ratio between the total number of shots that a team takes in comparison to the opponent … you assume that if a team is “good at controlling the ball” they then have a higher TSR, this is not always the case, but, yes, often is. A stat that would couple possession % and TSR would indicate a team good at controlling the ball and most likely to obtain good results over the duration of a season.

    3. TSR varies wildly per game … thus actually a tactical error or substitution etc. COULD affect the individual game TSR (both in a positive or negative way) but will not affect TSR of future games nor, most likely the TSR found over a season …
    As an indication, I have information on a team that had their TSR fluctuate between 0.032(L) (yes that low) and 0.636(W) on an individual game basis, but was 0.360 for the year (last season).
    On a side note … possession % varied between 17%(L) (different game to lowest TSR game) and 57%(T) (different game to highest TSR game) with a season average of 38.9%.
    Have indicated game result in brackets behind numbers for novelty value.

    4. People always drag Moneyball into the conversation … interesting though that book/movie/idea may be, here’s the thing … Moneyball looked at stats for individual players (mainly on base percentage) as that was perceived to be the best stat to indicate success for a team and came at a low cost, as it was not rated highly by others.
    The stats thrown about above are team based stats … now … unless one team wants to buy up a complete other team, TSR cannot be transformed with a few shrewd buys (e.g look at Newcastle TSR or Swansea TSR over this and last season … see James’ blog). Buying an individual who “shoots a lot” does not directly translate into success (I only have to mention L.Suarez of Liverpool) … a team would have to buy a player that creates a positive delta TSR when replacing the player currently occupying that position. As far as I know, no one has looked into that.
    Hence, when talking about team stats, don’t throw around the “moneyball” term, as it does not make sense. Football (soccer for the US) is a long way away from finding a wins above replacement level statistic or on base percentage type stat for players.

    5. I did look up that season PDO you mentioned for Real Madrid … if I read it correctly you state that it is currently at 993 … not really that “unnaturally low” as you make it out to be … PDO will regress to 1000 in the long run, as you probably know, not to a number ABOVE 1000!! i.e. Real Madrid hasn’t been that unlucky at all, have almost been spot on average with regards to luck so far this season … you win some you lose some is a familiar expression.

    That’s all I could come up with after a brief read of your post … sorry if I missed anything … will give the post a second read if need be.
    cheers,
    b.

    • ps – forgot to add to #3 …

      That team’s PDO fluctuated between 1596(L) (thus very lucky, but still lost) to 750(L) for a season average of 979. Both game PDOs came at different games from the possession% numbers and the TSR numbers. Thus all 6 highs/lows came during 6 separate games (total games 38).
      On a side note … that team did obtain a PDO of 1833 (a maximum number of 2000 is theoretically possible) during a cup match, it was a loss but they still went to the next round on away goals rule. The TSR was 0.111 with a possession % of 31%. The cup matches have not been included in the season total numbers.

    • Hi Bart,

      Thanks for taking the time to comment. I’ll try to answer these in turn.

      1. I’ll be sure to provide more links in future, though in this case the piece I was referring to was from earlier in the day.

      2. I strongly disagree with your interpretation of what TSR reveals. It’s important to remember it’s primarily a ratio, specifically between the number of shots a team makes, and how many it concedes. The stat assumes that teams are doing what most teams do on a football pitch: score goals and prevent goals from being scored. Therefore “control” in this sense is not quite the same thing as the ability merely to have the ball and hold on to it, but to use the ball effectively to create more chances than your opponent. It’s for that reason that while TSR often overlaps with possession, they’re not the same thing. For example, a team could enjoy 60% of the possession outright, but concede 14 shots and attempt only 7. While we can’t know any specifics without watching the game, we know that the team with 40% of the ball was far better in controlling the ball toward their objective: creating more chances than the opposition.

      3. You’re right in that TSR doesn’t discriminate between the quality of shots (which is why it’s perhaps a good idea to couple it with the shots on target ratio (SOTR)), and that’s why it’s not a good idea to read meaning into the stat over the course of a single match. TSR begins to yield predictive power over a very reasonable 4-6 games. My sample here included all of RM’s La Liga matches and CL games for the season, I believe 19 games in total.

      4. I raised “Moneyball” here only to address what many believe is a common purpose of sports analytics, and to demonstrate it likely does not have a strong application in the game, at least not yet. I too am tired of reading about it, but it’s constantly raised in newspaper op-eds on the subject so it needed addressing here. My entire point reflects the one you made here: no, it doesn’t make sense in complex team invasion sports.

      5. You’re right in a sense. However (and I didn’t state this well enough in the piece, mea culpa), in addition to luck, PDO in very highly skilled teams reflects an extraordinary level of skill. Both RM and Barcelona for example finished last La Liga season with a PDO above 1100 for the season. Remember, it measures Sh% + Sv%. Barcelona score a lot of goals on a comparatively low number of shots, but this is because they employ Xavi, Iniesta, and Lionel Messi. Ditto for Real Madrid. It does tend to be a luck-based metric, and does regress heavily to the mean, but it can also reflect an immensely talented team. Real Madrid’s PDO so far this season is historically low for Madrid.

      Hope these are clear!

      • Hi,

        Thanks for replying … here’s a reply from me … :)

        1, 3, 4. Okay.

        2. Well … I guess it is down to semantics then. My interpretation of “controlling the ball” is having the ball in one’s possession and keeping it so (pass related) …
        I guess I would call your explanation “controlling the game” … this is where your example comes in … my example: many times in the past the Dutch national team has had crazy amounts of possession of the ball but their opponent (e.g. Italy) has been okay with that as the Dutch would not do much (i.e. shoot, create scoring chances) with the possession, however, when the other team got it there was a quick counter with a resulting shot (goal or not).
        Johan Cruijff argues that to have control of the game one must first have control of the ball because for the time that one has the ball the other team cannot score (crazy own goals are the exception here).
        Control of the game is basically how teams like Real Madrid or Napoli or Valencia under Cuper operate … not that worried about possession, but deadly in shooting/TSR … see last night’s R.Mad v Ajax game.
        Anyway … the reason I mentioned it was that in paragraph 3 you say “ball”, where in my opinion it should say “game” …
        semantics really, as for me ball does not = game.

        At this point I’ll agree to disagree.
        These are actually things the statistics community has to figure out … what is each stat actually telling us? what are we rationally/emotionally inferring from this number? etc.
        (discussion for another time)

        5. That’s my big problem with PDO … James and others (11tegen11 blog) have stated that PDO is luck-based and basically such stats as save% are more luck-based than skill-based … and, as such, it would always, in all cases, regress to 1000 … there’s something that has not sat well with me ever since I first read about PDO, as one of the things implied is that it does not matter who a team puts in goal, over time each option will converge to a similar save%, as luck is involved.
        Anyway, I understand where you are coming from, but then it would have been handy to include past PDO numbers to compare then … also if Real Madrid are always “lucky” then what does their PDO “regress” to?
        PDO and TSR are not related so one cannot say that an average TSR team will eventually have a 1000 PDO if somehow, somewhere skill is involved, as you hint at in your reply.
        Anyway, I’m still not convinced by PDO and am still mulling it over …

        oh … and as to shooting percentages … C.Ronaldo has one terrible shooting% when compared to efficient strikers! Guess R.Madrid’s PDO “luck” is coming from the goalie end then. ;)

        cheers,
        b.

        • Okay, I’ll respond to these too!

          2. Fair point on the “ball/game” distinction, but I still think ‘ball’ works just as well, because the point, ultimately, is to get it to do more of what you want it to do than your opponents, if that makes sense. All TSR measures is whether a team is able to take more shots than their opponents over several games. TSR is usually commensurate with table position. The means whereby they maintain that kind of dominance likely varies a lot.

          However, we do know with good certainty that while an approach that concedes possession (and shots, and TSR) in the hope of scoring on a set-piece or on the counter will win the odd game, you can’t generally hope to win the league with that approach (the Champions League, however, is another story CHELSEA CHELSEA CHELSEA). ‘Possession’ meanwhile is practically a bauble. It’s tactically vital, but statistically empty.

          5. PDO should not be used in and of itself as an absolute measure. The “lucky” aspect of a team’s particular PDO comes not from the measure itself, but its tendency to regress to the mean over a reasonably short period of time. RM’s PDO is lower than their historical mean which is in the 1100s, so chances are results could even out over the rest of the season.

          • 2. Enough said about it … we both understand each other’s points I think. :)
            As for possession … yup … that has been what has frustrated me on Dutch commentary for years … always going on and on about holding the ball in the team, yet not really having much to show for it at the end of the day.
            Which is why I like the style of play that sets out to get the ball asap in whatever part of the field and then does something with the ball asap … I refer you to Athletic Bilbao (more of last season than this) for example … but this is letting preference into style of play … we’re talking stats here. :)

            5. Yes I know … but here’s the thing … if RM’s PDO progresses to the historic mean of 1100 … then that implies that some other team’s PDO must historically regress to a number below 1000 … this is because over time we are assuming that under equal circumstances save% and shooting% will regress to create a PDO of 1000 for a league average.
            What this implies is that all goalies on all other teams are not good, but Casillas is (one extreme) or strikers/shooters are good and other teams have below average strikers/shooters (other extreme).
            And here’s the crux … now what happens if the goalie or shooters are replaced? How will PDO shift? Will it shift? If PDO is luck-based then it doesn’t matter who is put in goal, the save% will always add up with the shooting% to get to 1100 (over time)!
            If PDO has talent in it then maybe what is being noticed is a drop in talent and not luck at RM. This could be due to numerous factors, players getting old, injured etc. so not being as sharp as in the past etc.
            Do you see my issues with PDO? … I get how it works and how it regresses to a mean etc.
            Also … if the element of talent is in it then PDO is useless to compare between teams … it would need to be related to the historic average of that team before it could be compared to how a PDO of another team currently is in relation to that 2nd team’s historic average …

            Just as an example … that one team mentioned before has a historic PDO list like this:
            2008/09 – 998
            2009/10 – 990
            2010/11 – 975
            2011/12 – 979

            (for novelty these are the same team’s TSR numbers over those years: 0.511, 0.439, 0.486, 0.360. and the position they finished in 12, 16, 12, 20.)

            Anyway … enough discussion … tired and well … tired,
            cheers,
            b.
