Billy Beane

A guy who’s poor at math(s) tries to understand soccermetrics.

It’s strange that a book about the first mainstream application of analytics and sabermetrics in Major League Baseball general management was made into a movie. Clearly Columbia Pictures saw something in Michael Lewis’ book Moneyball—on Oakland Athletics GM Billy Beane’s use of statistics when buying players—right off the bat (chuckles); they optioned it a year after its release in 2003.

The movie is structured around Beane’s redemption as a failed baseball player, relying on revolutionary player analysis as a kind of revenge against the Old Guard that fed him lies about his long-term prospects as a pro. But the means whereby Beane made his name in baseball is, cinematically speaking, quite compelling, even though in the film it’s generally treated as a subplot.

Beane essentially relies on a Paul DePodesta stand-in, whom the movie calls Peter Brand, to use advanced statistics centered on on-base percentage to assemble a contending team on a small wage bill. This is what much of the sporting world knows today as Moneyball. So when someone mentions player metrics or analytics in soccer, this is generally how it’s popularly understood.

Unfortunately, the term is now so widespread that it’s trotted out whenever a player bought relatively cheaply plays extraordinary football, regardless of what we know about the rationale behind the acquisition. Hence observers have often spoken of “Moneyball” in passing when referring to Newcastle United’s Papiss Cisse, who was purchased for around £10 million in the January transfer window and has since scored 13 times in 13 appearances. This despite the fact that we know little about what statistical rubrics, if any, the club used when considering the player, other than Alan Pardew’s words at the time: “He is a finisher with an already established CV in the Bundesliga, where we have monitored him for the best part of two years.”

I write this to illustrate that the popular understanding of how statistical analysis might be applied in soccer is limited by this narrow view. It is highly unlikely, for example, that soccer analytics will dramatically change how football is played, or allow managers to buy players on the cheap who will render a whole team greater than the sum of its parts and increase a club’s win percentage. The integration of advanced analytics in soccer will be more evolutionary than revolutionary.

As Sarah Rudd, Vice President of Analytics and Software Development at StatDNA, told me, soccer is similar to basketball in that the numbers confirm rather than contradict conventional tactical wisdom. Best practices in football are fluid and change over time to adapt to present circumstances and trends, as Jonathan Wilson’s history of football tactics and formations Inverting the Pyramid illustrates. In practice, advanced soccer metrics simply allow players to tweak their performances and adjust from one game to the next based on subtle but measurable weaknesses on the opposing team, for example. See Jen Chang’s recent look at Everton’s use of performance analytics.

This is not to say analytics can’t be used to deduce whether a club is overpaying mediocre players. At the MIT Sloan Sports Analytics Conference held in Boston this past March, Michael Fotopoulos and Andrew Opatkiewicz released a paper titled “Salary Allocation Strategies for Major League Soccer,” a means to measure ability against wages using former Chelsea and Watford manager Gianluca Vialli’s geometric framework for player evaluation.

It’s not a perfect science by any means, but as player analysis technology improves, managers and coaches will be better able to measure player performance against wages, which is particularly important in single-entity, salary-capped MLS.

Chances are, however, there will be no single metric that will change the game, no hidden on-base percentage or other sabermetric tool that will forever alter the way the casual fan, or player, or manager, views the sport. Rather, analytics will be used to make incremental improvements in a number of diffuse areas of the game: the optimal mix between defense and attack, best practices in formational play based on available personnel, the proficient execution of set-pieces, and whether crossing from the flanks is an efficient use of possession. It will also help establish which of the many newly available and accepted metrics, like pass completion percentages or the dreaded “assist number,” are actually useful for the layperson (or blogger) in evaluating a team or player performance. The latter will be particularly important in the evolution of soccer media, although it’s unlikely to provoke the same kind of split we see in baseball between the narrative-driven romantics and the small-sample-size-obsessed numbers nerds.

Each of these in turn will be enhanced as player analysis technology improves. It will be a slow, arduous process, but the fun part is there is still a lot to learn. Unfortunately, many of the most advanced metrics are kept secret by both large analytics firms and clubs, which largely leaves the average Joe out of the loop on more complex player and team metrics. That will be the subject of a future column…

Comments (3)

  1. What shocks me is that statistics have been used in soccer (and hockey) for decades, maybe half a century. The former Soviet Union sides used to have a point system based on passes, passes completed, shots on target, and all the other possession stats we drooled upon. However, their tasteless yet efficient sides lacked Spanish flair, so it drew and draws little press.

  2. Papiss Cissé was of course no Moneyball purchase. After the 10/11 season there were a host of Premier League clubs interested, most notably Blackburn. But they were all turned off by the transfer fee that Freiburg demanded. £15 million for a player who had only had one outstanding season was a bit steep.
    But after he turned into a bit of a locker room cancer, and with Freiburg struggling at the bottom of the Bundesliga, they decided to cut him loose during the winter break. Newcastle was still interested and snapped him up for £10 million. Still a bit of a gamble but mostly a solid transfer.

    Newcastle’s true strokes of genius came twelve and six months earlier, respectively. Extracting maximum money out of panicking Liverpool/Chelsea in the Andy Carroll-Fernando Torres saga was very smart business. Getting rid of overpaid, overrated, underperforming English egomaniacs in Kevin Nolan and Joey Barton. That is smart business. Replacing them with cheap, underrated, hardworking, committed guys such as Yohan Cabaye. That is supersmart.

    That is Moneyball. Acquiring players whom the public does not value and, even more important, whom your opposition does not value – that is Moneyball. Selling players who are overvalued because of certain properties (in Newcastle’s case being an Englishman) – that is Moneyball.

    Statistics don’t matter. They were just a tool for Baseball analytics. The teams, against which Billy Beane’s Oakland A’s competed, also used statistics (though not exclusively). But they used the WRONG statistics.

  3. A few quick thoughts in reference to the paper mentioned in this article (Fotopoulos and Opatkiewicz 2012):

    The only thing one could reasonably ascertain from this paper is that going with academy players vs. NCAA vs. Internationals is equally a crap shoot (i.e. all have an average AREA score of ~50, and StDev. of ~30). They state as much in their conclusion; however, technically I don’t think you can actually make this claim without p-values and confidence intervals on their aggregated numbers. High standard deviations can be the result of large variation in a given population, or a small sample size (i.e. an underpowered study), or flaws in methodology yielding an unrepresentative sample that doesn’t reflect the true population.

    Moreover, I would be curious to see if there is a correlation between age and AREA score, as well as potentially further subdividing their three groups (i.e. Academy, NCAA, International) into age ranges. I would intuitively think there could be a correlation between age and AREA score, thus comparing these three groups – which likely have vastly different average ages – may be an apples to oranges comparison.

    Has their AREA calculation method been validated in any way? E.g. a prospective study to correlate past scores with future performance for individual players, and/or correlation with some sort of team success metric (and of course, with some form of regression analysis to determine if winning = higher average team AREA score is a chicken-or-egg phenomenon)? If so, they certainly didn’t indicate as much in their paper.

    I could get more technical here, but I think I’ll sum things up by saying it was an interesting read, but (sports analogy alert), kind of bush league from an academic standpoint.
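
To make the confidence-interval point in the third comment a little more concrete, here is a minimal sketch in Python. It takes the summary figures quoted in that comment (an average AREA score of about 50 with a standard deviation of about 30) and shows how wide an approximate 95% interval around a group average would be for a few group sizes. The group sizes are hypothetical, since the comment does not give them, and the function name is purely illustrative and does not come from the paper. The sketch uses a normal approximation (z = 1.96), which is rough for small groups.

```python
import math


def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Approximate confidence interval for a group mean.

    Uses the normal approximation: mean +/- z * std_dev / sqrt(n).
    With the default z = 1.96 this is roughly a 95% interval.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = std_dev / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical group sizes; the mean and standard deviation are the
# rough figures quoted in the comment (~50 and ~30).
for n in (10, 40, 160):
    low, high = mean_confidence_interval(50.0, 30.0, n)
    print(f"n={n:>3}: approx. 95% CI for the mean = ({low:.1f}, {high:.1f})")
```

The interval narrows only with the square root of the group size, so small groups with a spread this large leave a lot of room for their true averages to differ. Heavily overlapping intervals are not a formal test, but they are a reasonable first hint that the data cannot separate the groups.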
