May 14, 2013 at 8:14 pm

The Limits of Data Analysis

First off, let me put my cards on the table. I’m the son of a mathematician, and although I’m not a data analyst, I’m pretty good with numbers, and think I understand data and statistics to a certain degree. Saying that, I’m sure there is nuance that is lost on me, so I’d be happy to have any ignorance that might be informing my doubts pointed out so I can rid myself of it.

I’m not a ‘statto’ or a ‘fanalyst’, but I do recognise the value of stats-based analysis, and I enjoy and am grateful for the work of people like Dan Kennet, Lee Mooney and Andrew Beasley (this article is not aimed at any of these individuals; I’m merely naming a few data analysts whose work I respect), as well as others. I think their work is helpful for confirming impressions, and for disproving certain fallacies often arrived at via confirmation bias or just blatant prejudice.

More recently there has been a lot of talk about fanalysts turning their attention to data-scouting, or using data models to identify potential targets for clubs. It’s my view that there are fundamental reasons why this is a waste of time. I’m not saying there is no value in it whatsoever; I’m certain some clubs already use databases like the one in Football Manager. Rather, I’m saying it will not be possible to do any more or better than the Football Manager database already does, and without any qualitative data (on things like heading ability, pace, etc) which can only be derived via traditional scouting, it will be an inferior dataset of no interest to clubs.

If the data used to identify players relates to players’ recognition and achievements in the game (international caps, stature of the club employed by), as opposed to value judgements (pace, creativity), all that data will end up telling you is who has already been effectively scouted. Of course there is a correlation between a player being high quality and playing for a big team or being capped internationally, but those two factors – big club and international recognition – are forms of recognition in themselves. Players who are internationals at big clubs are, by definition, the very players clubs don’t need data on, or need brought to their attention, because they have been discovered and exposed already. Any focus on internationally capped players, or players at ‘big’ clubs, automatically filters out the real rough diamonds who have yet to be discovered, while ensuring the names of recognised players are put forward even if they are clearly not good enough.

The aim of any data model should be the precise opposite: to identify players for whom there is no indication that they have ability (international recognition, contract with a big club) other than their ability itself. If I was a manager given a list of players who have been capped internationally, or who are already at big clubs, I’d be wondering why. What I want is a list of players totally unknown and unrecognised, who people proven to be good judges assure me are huge talents. With modern television coverage there are armchair fans with extensive knowledge of players across the world, and not just England, France, Germany and Spain, but leagues like the Brazilian, Ukrainian and Dutch. I think it’s naive to think that a Premiership football club, with numerous scouts and coaches paid handsomely to make football their life, doesn’t also have a huge wealth of knowledge in its ranks. If I can name you great prospects from Ligue 2 and Serie B – which I can – then I’m pretty sure there are people at Liverpool who can too.

There’s no way that, as a manager, I would ever say to a scout “find me a defender from Germany who is over six foot and under 23 years old” – the kind of data that can be easily compiled without scouting. I might say, “I need a quick centre-back who has experience playing as part of a back three.” Now, it wouldn’t be too difficult to compile extensive data on the formations players have played in, and although it would be extremely time and money consuming, it’s technically possible to apply a non-subjective pace rating to every player by timing them running over different distances and with or without the ball. But even if those two things could be done, it wouldn’t rule out quick defenders who have played in a back three who are absolute rubbish, because player selection – when done right – is about far more than a few pieces of specific data. A good manager looks at a player’s movement, tendencies and body-shape, and thinks about how they will blend with the other players around them.

There’s a kind of chemistry to it, not so different to selecting ingredients to go into a meal, and what you are trying to cook is all important. For instance, there is nothing objectively wrong with chocolate as an ingredient, but I wouldn’t want it in an omelette. You could say something similar about Andy Carroll. In a team playing a direct game with wingers crossing consistently, Carroll is ideal, but if you want to keep possession, working the ball into the box on the floor, he’s just not suitable.

But how do you derive suitability to styles of play and specific roles from stats alone? Physical attributes aren’t enough. Ibrahimovic is tall and heavy but ideal for a patient, passing game on the floor. Again, stats on things like pace, ball-control and mobility could be helpful, but you only obtain them by scouting the player in the traditional way in the first place or carrying out rigorous tests, which defeats the point of a cheap, easy to use database. To provide any genuinely valuable results, qualitative data must be obtained via either the human eye, or intense scientific measurements in the first place.

Without that qualitative data, any results will at best tell us what Football Manager already can, and at worst, be a list of players who have already been scouted by big clubs and international teams, when the aim is to uncover as yet undiscovered gems. Once a player is at Bayern and capped by Germany, it’s too late for any model for identifying players. The player’s reputation is such at that point that he will be known to clubs, and will likely have been scouted at some point, too.

This isn’t a case of the right or wrong data. It’s a paradox inherent in the concept of using non-qualitative data to identify quality players. Using only non-qualitative data to identify players is the equivalent of selecting an ingredient for a meal based on its statistical size, density and water-content, rather than how it actually tastes, or how well that taste blends with other tastes, and you could end up with oranges rather than tomatoes – no good if you’re making a pasta sauce!

I don’t believe it is possible to ever produce statistics that can accurately describe a player’s ability. I’m a huge fan of good passers. I love a great pass more than a great goal, and I believe passing is the most important aspect of the game. But can stats really inform me about passing ability? Completed pass percentage? No. The fact that centre-backs – for whom passing is not the most important aspect of the game – regularly feature at the top of passing stats makes it clear enough that pass completion stats are of little value. That’s because they are routinely playing extremely safe passes. Zinedine Zidane was one of the best passers I’ve ever seen, but there will be many inferior passers with a better pass completion percentage, because his tendency to try ‘difficult’ passes – those likely, if successful, to result in a chance – meant that even with his ability, some would fail. What kind of stats could represent the ability Coutinho has to play perfectly weighted passes through tight spaces without telegraphing them? It would take such an extraordinarily detailed accumulation of data that it simply would not be worth doing, and when you actually see a player with that kind of ability, their talent is so obvious – or ought to be to anyone working in football – that stats confirming it would be pointless.
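
To put rough numbers on the pass completion point, here is a minimal sketch in Python. Every figure in it is invented purely for illustration (these are not real players or real match data), and the names PassLog, completion_pct and chances_per_100 are hypothetical helpers, not anything a data provider offers. It simply shows how a player who only plays safe passes can top a completion table while creating almost nothing.

```python
from dataclasses import dataclass


@dataclass
class PassLog:
    """Season totals for one player. All values used below are made up."""
    attempted: int
    completed: int
    chances_created: int  # completed passes that led directly to a chance


def completion_pct(log: PassLog) -> float:
    """Share of attempted passes that reached a teammate, as a percentage."""
    return 100.0 * log.completed / log.attempted


def chances_per_100(log: PassLog) -> float:
    """Chances created per 100 attempted passes."""
    return 100.0 * log.chances_created / log.attempted


# Hypothetical centre-back: mostly short, low-risk passes.
safe_centre_back = PassLog(attempted=1000, completed=920, chances_created=5)
# Hypothetical playmaker: regularly attempts difficult passes, so more fail.
risky_playmaker = PassLog(attempted=1000, completed=780, chances_created=60)

for name, log in [("centre-back", safe_centre_back), ("playmaker", risky_playmaker)]:
    print(f"{name}: {completion_pct(log):.1f}% completed, "
          f"{chances_per_100(log):.1f} chances per 100 passes")
```

On these invented totals the centre-back wins on completion percentage and the playmaker wins on chances created, which is the whole problem with reading passing ability off the first number. Even the second number says nothing about the weight or disguise of a pass.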

I also think context is vital in any statistical analysis. How did the player’s teammates score compared to him? What’s the average in the league? How good were the opposition the player faced? Another missing aspect that limits the usefulness of data is ‘intention’. If a player is told by his manager to ‘shoot on sight’ all season, the stats he generates will indicate what he has done, rather than what he is capable of doing. If he happens to be a player adept at dribbling and finishing from close range, but poor at long-shots, his stats will look ugly, but they say nothing about how good he actually is. A trained eye can spot that a player is being asked to perform a role he is ill-suited to, or in a team playing a style he’s unsuited to. Stats can’t. This also applies when evaluating how well a team is performing. We might look at one team’s statistics and see that they don’t win many duels, but if their game-plan places little significance on winning duels, it won’t be very relevant. On the other hand, a team might look like they are doing well if they are winning an above average number of duels, but if they are deliberately set up to play in a way that requires them to win a high percentage of duels, just above average might actually be poor given what they are trying to achieve. So ideally recorded stats would be seen in relation to the statistics we would expect to see from the perfect application of any given approach.
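
The duels example can be sketched in a few lines of Python. This assumes something nobody can easily supply, namely a target win rate that the game-plan is supposed to require; that target, the league figures and the function name duel_report are all hypothetical and chosen only to illustrate the comparison.

```python
from statistics import mean


def duel_report(team_rate: float, league_rates: list[float], target_rate: float) -> dict[str, float]:
    """Compare a team's duel win rate with two different baselines.

    team_rate    -- share of duels the team won (0.0 to 1.0)
    league_rates -- duel win rates of the teams in the league (invented here)
    target_rate  -- win rate the team's approach is assumed to require (a guess)
    """
    league_avg = mean(league_rates)
    return {
        "vs_league": team_rate / league_avg,   # above 1.0 means better than average
        "vs_target": team_rate / target_rate,  # below 1.0 means short of the plan
    }


# Invented figures: the team wins 52% of its duels in a league averaging 50%,
# but is set up in a way that is assumed to need a 60% win rate.
report = duel_report(team_rate=0.52,
                     league_rates=[0.48, 0.50, 0.52, 0.50],
                     target_rate=0.60)
print(report)
```

Measured against the league the team looks fine; measured against what it is trying to do it is falling short. The hard part, of course, is that the target figure has to come from someone who understands the manager's intentions, not from the data itself.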

I personally think there are a lot of currently unexplored uses for statistics in the game. One thing I would like to see become as prevalent as teams’ possession stats is the average depth of teams’ defensive lines. I think this would be far more informative than possession stats, as it provides an indication as to the style of play a team has adopted, or has been forced to adopt.
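
As a rough idea of how such a figure might be calculated, here is a minimal Python sketch. It assumes positional snapshots are available in which each defender's distance from his own goal line is recorded in metres, and it defines the line's depth in a snapshot as the mean of those distances. Both the data format and that definition are assumptions made for illustration, and the numbers are invented.

```python
from statistics import mean


def line_depth(defender_distances: list[float]) -> float:
    """Depth of the defensive line in one snapshot: the mean distance (in
    metres) of the defenders from their own goal line. One possible
    definition among several."""
    return mean(defender_distances)


def average_line_depth(snapshots: list[list[float]]) -> float:
    """Average line depth over all snapshots of a match."""
    return mean([line_depth(s) for s in snapshots])


# Invented snapshots for two hypothetical teams, four defenders each.
high_line_team = [[40.0, 42.0, 41.0, 39.0], [44.0, 45.0, 43.0, 44.0]]
deep_line_team = [[20.0, 22.0, 21.0, 19.0], [18.0, 17.0, 19.0, 18.0]]

print(f"high line: {average_line_depth(high_line_team):.1f} m from own goal")
print(f"deep line: {average_line_depth(deep_line_team):.1f} m from own goal")
```

A single average like this would not say whether a team chose to sit deep or was pinned back, so it would still need to be read alongside the opposition and the state of the game.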


I’d welcome any comments on this post pointing out mistakes I’ve made, false assumptions, or things I may be missing. I’d especially like to hear any suggestions of metrics which could be used to identify players suitable for a specific style, or a particular role, more effectively than Football Manager already can. 
