I found out about this potentially momentous news while boarding a ferry from PEI to Nova Scotia, so I’m a little late to the party. Still, if only to keep the momentum going a tad longer, here’s my take on the announcement that Manchester City Football Club, in cooperation with Opta Pro, has released a full XLS spreadsheet of analytics data from the 2011-12 Premier League season.
First, let’s get the minor caveats out of the way. While it is a treasure trove of data, it is limited to a single season in a single elite professional domestic league, which runs into a bit of a small-sample-size problem. Second, it comes in a very raw form: an Excel spreadsheet broken down by player actions in individual games. That is a minor problem really, but perhaps a deterrent to the very casual user (although there’s an opportunity here, which I will speak to shortly).
Despite those minor quibbles, this is a major coup in the fight to make advanced metrics available to the tinkering public. While I’m more in the wait-and-see camp than zealots like Forbes’ Zach Slaton, I mostly agree that this is a significant step forward in a discipline that is barely out of its infancy.
To that end, the intentions of Gavin Fleig, Manchester City’s head of performance analysis, with respect to the long-term goals of the football analytics community should perhaps be celebrated even more than the dataset itself.
When I started this informal series on the state of analytics in football, essentially as a complete and utter statistics rube, I arrived with some fairly major prejudices about the analytics community at large. Essentially, I believed that data-gathering companies cared more about dollar signs than advanced research, and that club performance analysts lived in a permanent state of Cold War-esque paranoia, ever fearful of ceding the market advantage gained from particularly revealing and counterintuitive algorithms as yet unknown to the general public.
Furthermore, I assumed that because soccer is a “complex team invasion sport” in which discrete events are not as easily measured by eager amateurs as they were in baseball in the early days of SABR, resource-intensive data would forever live behind moneyed walls. Meanwhile, the rest of us could only pick at the scraps Opta and Infostrada offered for broad commercial use.
This has, in the last few months, turned out to be nonsense. What I quickly discovered is that club performance analysts and data firms alike are very interested in wider public recognition for their efforts and, more than any other figures in football, are willing (within reason) to go on the record about exactly what they do. Many are admirers of Bill James and the crowd-sourced development of Sabermetrics in baseball, and even more understand the inherent power of the Internet to overcome the limitations of the Allen curve. This is a venture that needs not only media attention but accurate reportage, particularly in the wake of the swathes of idiocy that have dogged the football analytics movement since Damien Comolli’s Simon Kuper-assisted self-characterization as football’s Billy Beane.
This move is yet another strong signal that City do not match the fly-by-night moneybags caricature painted by the press. They were early movers in producing original, revealing digital content, and now they are positioning themselves as first movers in the race to take advantage of crowd-sourcing. As detailed in the press release:
To continue giving exposure to this discipline we will be running a research competition in the coming months whereby the work submitted to us will be reviewed by our Performance Analysis department and Opta, with the best projects being published on our MCFC Analytics page and the OptaPro website to share with the world. Furthermore, as recognition of the contribution to the performance analytics discipline we will invite those with the best projects to come to our training ground to present their work and to then share with us the match day experience, observing how the Performance Analysis department supports our first team during a Premier League game.
One only has to read between the lines to see the avenue to gainful employment through talented, useful research.
While this release seems to target amateur analytics bloggers, statisticians, and the media, I hope talented software developers will also look for ways to use this data to provide an interactive experience for their users. This would give the more casual fan a means to join the party, to sift through the data, and to stumble across correlations of their own. And if the software proves popular, it will put further pressure on Opta Pro or MCFC to update the available data regularly, season by season.
As it stands, football doesn’t have its Fangraphs or Baseball Prospectus, although there have been a few early contenders like whoscored.com and footballrrating.com. While it’s putting the cart before the horse to demand statistics sites before we’ve even agreed on broadly useful metrics, it would behove both software and web developers to find ways of presenting this kind of data in a more dynamic, graphical form, and to open up space for blog posts and discussion.
Besides that, more of this sort of thing, please.