The Limits of Observation

*This article is a piece I wrote for an NHL annual this past summer. It’s rather extensive. Fair warning.

An enduring debate in hockey analysis circles currently centers on “observation” versus “stats”. Like most arguments that enter the public domain, the debate has become polarized to such a degree that the dichotomy presented is an utterly false one. The truth is, rather than “observation versus stats”, the actual debate is between traditional analysis (a mix of observation, counting numbers and conventional perception about what wins hockey games) and so-called “advanced” analysis (which foregrounds testing observation and perceptions with statistical methods). Advocates of the former more or less subscribe to an “I know what I like and I like what I see” method of player and team evaluation, while advocates of the latter couch performance in a framework of norms, means, percentages and rates in order to strip, as best as possible, subjectivity from the analysis. For the purposes of this piece, I’ll focus on explaining this latter viewpoint: why observation alone is inadequate for evaluation.

Observation is the most data-rich method of gathering information. It is also the one most fraught with error. Unfortunately, the human mind isn’t the most accurate or reliable instrument when it comes to collecting and interpreting data: attention, perception and recall can all be skewed, biased or influenced in a multitude of ways, potentially distorting the signal and sullying the final analysis. As such, observation is a necessary but not sufficient element of hockey analysis. Alone and untested against measurements of performance, even the most ardent or experienced hockey fan or pundit can be led astray by his or her observations.

Cognition is Conservative

In general, humans tend to fight for that which they already know. That is, instead of re-ordering our views of the world or even minor aspects of it in particular whenever we’re presented with a new piece of info, people tend to either shoe-horn the new data into already established frameworks or act to dismiss the data as either irrelevant or incorrect. While this tendency sounds maladaptive, it actually serves a valuable purpose: imagine, for example, having to continually re-assess or re-evaluate your entire lexicon of knowledge whenever a potentially conflicting bit of info is presented. The world would dissolve into incoherence.

Of course, conservatism comes with its obvious costs as well. By establishing and protecting assumptions and cognitive rules-of-thumb (hereafter referred to as heuristics), the human mind finds ways to distort or ignore new information, thereby skewing perspectives and offering an incomplete or incorrect view of reality.

Perhaps the best-known heuristic amongst sports fans is confirmation bias, defined as the tendency to seek to confirm original beliefs or theories. Even when information is incomplete or sullied by confounding factors, people tend to generate hypotheses about its meaning and then try to confirm these original guesses thereafter. For example, an Oilers fan sees a 30-second clip of an Edmonton prospect on YouTube. He has no other real information on the player initially, but the clip shows the prospect flying through the opposition and scoring a highlight-reel goal. The perception in the Oilers fan’s mind is now established: Prospect X is good. Afterwards, the fan will be motivated to seek out and foreground data that confirms this belief (other highlights, performances at training camp, in drills, etc.) and disregard data that conflicts with this belief (poor counting numbers in a prior season, a bad shift during a game, etc.). Of course, the more info the fan finds to confirm his perception, the more ingrained it will become as a fact in his mind and, therefore, the more resistant he will become to conflicting input. By actively finding information consistent with the perception (while rejecting data inconsistent with the belief), the confirmation bias acts as a sort of psychological positive feedback loop over time.

This is helpful when the hypothesis or perception is an accurate one, but hinders decision making when it’s not. Of course, confirmation bias is often employed to defuse cognitive dissonance (a feeling of discomfort when two conflicting thoughts are held simultaneously), particularly when people are emotionally oriented towards a given perception or outcome. Visit any NHL team-focused messageboard, for example, and casually suggest that one of the club’s favored players isn’t as good as he’s generally perceived to be by the fan base. Then sit back and watch the stirred hornet’s nest.

Even if one were to include a host of defensible facts or stats, there would be little chance of swaying public opinion or assuaging the resultant tidal wave of abuse. Not only are fans motivated to confirm their beliefs (like everyone else), they’re motivated to construct their beliefs in a fashion that accords with their experiences and established self-identity as a fan of Team X. Positive associations based on emotional experiences can powerfully influence the construction of a perception and, therefore, the maintenance of a bias. And sports fans aren’t the only folks subject to these perceptual machinations: consider how often NHL GMs re-sign players to an obviously inflated salary in the wake of an emotional or improbable playoff run.

There are other flavors of mental shortcuts beyond the confirmation bias. The availability heuristic, for example, refers to the tendency to base a judgement on how easy it is to bring specific examples of something to mind. The pitfall of the availability heuristic is that what is easily recalled is not necessarily accurate or representative of the object or person in question. This can be observed in hockey analysis frequently: a player who appears on a lot of post-game highlight reels is often considered to be good. If a guy makes high-impact, highly memorable plays during a game, his performance will be easily recalled (this can work in both good and bad directions). This can result in accurate performance evaluation if the player’s game is more or less congruent with the exciting or obvious plays. If it isn’t, however, the availability heuristic will work to skew the observer’s perceptions in the direction of the “highlights” and away from the mean level of his performance in general.

The halo effect can be a result of the availability heuristic. In hockey this generally applies to players whose performance markers are a little more subtle. The halo effect is a bias in which a general impression of a person affects inferences about future expectations. This tendency is especially obvious when it comes to players who play or act in a pleasant or likable manner, either on the ice (always works hard! Sticks up for teammates!) or with the media/fans (funny, easily approachable, etc.). In the NHL, these are typically the players who consistently fail to produce worthwhile results but tend to stick around year after year due to some perceived quality that both fans and hockey decision makers tend to value. These acquisitions are usually rationalized in terms of “intangibles” – character, leadership, work ethic – and the guys are valued even when their tangible contributions don’t tend to actually help the team win.

Similarly, evaluations frequently suffer from illusory correlation, which is the tendency to perceive a relationship where none exists. The human brain is excellent at identifying (and sometimes fabricating) patterns, but lousy at identifying true relationships within those patterns. In hockey analysis, it’s easy to fall prey to the post hoc ergo propter hoc fallacy, for example, which is the tendency to assume that since one event follows another, the earlier event must have caused the later one. Of course, the sequence of events can be entirely coincidental rather than causal. In what can be referred to as “building a narrative”, fans will observe a sequence of events and codify the entirety into a typical story structure featuring heroes, villains, rising action, a climax, etc. The narrative lends structure and meaning to observed events, but often substitutes archetypes and assumptions for analysis.

To illustrate:

The New York Rangers and Philadelphia Flyers are tied in the third period of a hockey game. The fourth lines for each team take the ice and the enforcers decide it’s time to dance. Derek Boogaard wrestles with Jody Shelley for a minute and a half before landing a devastating left hook and scoring a decisive victory. The fans cheer, the players bang their sticks on the boards and the goons go to the penalty box. Three shifts later, a puck skips over Pronger’s stick, Gaborik skates in alone and scores the game-winning goal. After the game, pundits and teammates alike talk about the Boogaard fight as the decisive momentum swing and a causal agent in the Gaborik goal (and, ultimately, the Ranger victory). The relationship between the Boogaard fight and Gaborik goal is perceived to be causal due to the sequencing of events and resultant story that can be told afterwards.

And thus, fans watching the game might be tempted to come to that conclusion. However, the question remains: was Boogaard truly a causal agent in the Gaborik tally? Or did the fight merely precede the goal? One sequence of events similar to the ones described doesn’t prove anything either way, but it does work to plant the idea in the minds of players, coaches, GMs and fans alike who observed it. In the future, when a Boogaard fight doesn’t result in any positive events, all the heuristics described previously will likely come into play for the observers in question: the confirmation bias will cause them to either ignore or downplay the competing evidence that perhaps fights don’t lead to game-changing momentum swings. At the same time, the availability heuristic will cause them to recall previous instances where an obvious Boogaard victory preceded a favorable result, while the halo effect may cause fans and management alike to view him in a favorable light despite his other various faults as a hockey player. I’m not sure this completely explains why Glen Sather would choose to pay Boogaard $6.5 million over four years, but it probably comes close.

Attention and Memory

Much of the above could be applied to both observation and statistical analysis: after all, fans frequently cherry-pick counting numbers to justify certain evaluations of players or teams. Confirmation bias doesn’t only apply to watching and mentally evaluating the sport, for instance. Heuristics can be marshaled to skew a perspective in both areas of analysis.

However, observation tends to be more susceptible to such cognitive short-cuts because the act of encoding information during something as fast paced and emotionally involved as a hockey game is rife with other limitations. Consider that attention is in fact volitional and is both directed and framed by preconceptions and expectations. Meaning, an observer doesn’t passively absorb everything that’s happening during a hockey game: his or her attention is directed like a spotlight at certain aspects or events. What the observer focuses on is dependent on idiosyncratic factors: who is she cheering for? What players does she like? Not like? What’s happening in the game? Where are her eyes focused on the ice? And so forth. Of course, the ability to concentrate our attention on certain stimuli is actually an adaptive response (having to attend to all input, all the time would make for a highly chaotic, disruptive world), but it also means that some data is necessarily lost when it’s being encoded. What data (and how much) is, again, dependent on the individual variables described above as well as filters such as the heuristics listed in the previous section. This is what gives rise to subjectivity in observations: the difference in expectations, perceptions, attentional faculties and biases in each observer often results in competing interpretations of the same event(s).

Recall of events is similarly problematic. Human memory is re-constructive in nature. It is not a simple recording of events: data is filtered during the encoding phase (observation) and is also subject to revision during recall. In fact, research has shown that memories of an event can be altered after the fact by something as simple as the wording used to describe it. For example:

  1. “In the third period of the game last night, Robyn Regehr crushed Ryan Kesler at the Flames blueline.”
  2. “In the third period of the game last night, Robyn Regehr checked Ryan Kesler at the Flames blueline.”

If you were to read the first description of the play in question, what are the chances you’d be more likely to remember the Regehr bodycheck as a game-changing ‘BIG HIT’ later? According to studies done in this area of human memory, the chances are very good. In 1974, Elizabeth Loftus found that after viewing a car crash sequence, later recall by participants could be influenced by the wording of the crash description. For example, the inclusion of the verb “smashed” in the description instead of “bumped” or “hit” meant that participants were more likely to answer “yes” to the question “did you see any broken glass?” one week later (even though there was no broken glass in the film). Participants who read the “smashed” description were also more likely to estimate higher speeds for the cars in the video. So not only did a simple verb change increase the perceived severity of the crash, when combined with a leading question (“did you see any broken glass?”) it was able to implant a wholly new detail in the memory itself.

The Constraints of Time and the Perception of Means

The issues mentioned are an indicative but not exhaustive survey of the human mind’s inherent ability to skew the intake, recall and interpretation of data. There are ways to combat observational biases, obviously: highly experienced viewers, assuming some level of competence, can work to set aside personal preferences and emotional attachments in favor of what is relevant (think of a seasoned scout versus a casual fan, for instance). Strict definitions for observable behavior can also cut back on subjectivity. The amateur scoring chance counters during the past season agreed on a specific definition of “scoring chances” in order to preserve a reasonable level of agreement across counters. In addition, merely being cognizant of the above issues can help an observer to actively resist their effects. That said, no one can be perfectly rational all the time, which is why the scientific method of double-blind trials (and the replication of results) is considered the best process available for discovering the truth or falsehood of a hypothesis. Perception and observation alone are considered inadequate.

However, even if one were able to enforce an inhuman level of discipline in an observer, the limits of observation extend beyond perceptual filters, biases and memory re-construction. For example, humans are inherently innumerate. Patterns and generalities are easy for the brain. What’s difficult is effectively perceiving things like rates, means, medians and variance in behaviors or a performance measure within a given population. For example, imagine a perfectly rational hockey scout: a man who has basically conquered all the biases discussed earlier. He’s able to detach personal investment, resist confirmation bias, defuse other mental heuristics and observe hockey with an informed but impartial eye. Now, such a scout would seem theoretically invaluable and his employer would likely send him to observe as many junior games as possible. Let’s say 200 games per season. For the purposes of this thought experiment, let’s also assume that the scout is completely qualitative in his performance evaluations, meaning he doesn’t bother to record or look up stuff like point totals, plus/minus, shot rates etc.

His value at the end of the season would actually be fairly limited. His sense of the prospects and their performances would be sketched in general terms like: some, many, few, often, usually, a lot and rarely. It would be impossible to determine the validity and degree of these impressions. He would have almost no way to contextualize each prospect’s performance over the season. The distribution and variance of the junior population for each performance measure would be a total mystery to our hypothetical scout. He might have some dim awareness of approximate totals, but he’d never be able to judge rates and norms without gathering the requisite information and conducting calculations. The human brain just doesn’t do this naturally, especially over a large number of events and/or a long period of time. So even if he could observe a prospect’s performance through a completely rational framework, he’d be unable to properly couch the prospect’s level of performance in the context of his peers without the use of quantitative methods.
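To make that concrete, here is a minimal sketch of the kind of context the purely qualitative scout is missing. The six prospects and their point totals are invented for illustration, and points per game is just one possible performance measure; the idea is only to show how a rate, a mean and a spread place one player among his peers.

```python
from statistics import mean, pstdev

# Hypothetical (points, games played) lines for six junior forwards.
# These numbers are invented for illustration only.
PROSPECTS = {
    "Prospect A": (70, 68),
    "Prospect B": (35, 40),
    "Prospect C": (51, 66),
    "Prospect D": (22, 61),
    "Prospect E": (90, 67),
    "Prospect F": (44, 55),
}


def points_per_game(points, games):
    """Convert a raw total into a rate so unequal games played are comparable."""
    return points / games


RATES = {name: points_per_game(p, g) for name, (p, g) in PROSPECTS.items()}
GROUP_MEAN = mean(RATES.values())
GROUP_SD = pstdev(RATES.values())  # population standard deviation of the group


def z_score(name):
    """How many standard deviations a prospect's rate sits from the group mean."""
    return (RATES[name] - GROUP_MEAN) / GROUP_SD


def percentile_rank(name):
    """Share of the group (in percent) scoring at or below this prospect's rate."""
    at_or_below = sum(1 for rate in RATES.values() if rate <= RATES[name])
    return 100 * at_or_below / len(RATES)


if __name__ == "__main__":
    for name in sorted(RATES, key=RATES.get, reverse=True):
        print(f"{name}: {RATES[name]:.2f} pts/game, "
              f"z = {z_score(name):+.2f}, percentile = {percentile_rank(name):.0f}")
```

In this made-up group, Prospect C has more raw points than Prospect B but a lower scoring rate, which is exactly the sort of distinction that “some”, “many” and “a lot” cannot capture.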

A related, non-trivial problem with observation is the high time and resource cost associated. Watching a single hockey game is a two-to-four hour investment, minimum. Watching 200 games, like the perfectly rational scout envisioned above, would represent an entire six-month season of attending multiple rinks in different cities for 3 hours at a time. Even then, assuming his interest is in five different prospects, he’d only observe about 40 games for each. That’s just over half a season of work for a junior in the CHL. In statistical terms, that’s a relatively small sample to work with: a ten-game scoring streak or drought could completely skew a player’s results (and the resultant perception of his abilities), for instance. If one considers 30 teams and hundreds of players at the NHL level, the issue grows many times over: at roughly three hours apiece, it would take one observer about 3,690 hours – or about 154 straight days – to watch each of the 1,230 games in a typical 82-game season once. Not including playoffs. That’s obviously just a straightforward viewing of each contest. The inclusion of replays or the close inspection of certain sequences would vastly increase the time and effort commitment.
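The arithmetic behind these viewing totals can be sketched as follows. The 30 teams and three hours per game come from the text; the 82-game schedule is my assumption, as is the simplification that each of the scout’s 200 games features exactly one of his five prospects. Note that the league-wide total depends on whether a game is counted once or once for each of the two teams involved.

```python
# Assumed inputs: 30 teams and ~3 hours per game are taken from the article;
# the 82-game regular-season schedule is an assumption added here.
TEAMS = 30
GAMES_PER_TEAM = 82
HOURS_PER_GAME = 3


def viewing_load(teams=TEAMS, games_per_team=GAMES_PER_TEAM,
                 hours_per_game=HOURS_PER_GAME):
    """Return the cost of watching a full regular season once."""
    team_games = teams * games_per_team   # every game counted once per team
    unique_games = team_games // 2        # each game involves two teams
    hours_unique = unique_games * hours_per_game
    return {
        "team_games": team_games,
        "unique_games": unique_games,
        "hours_if_counted_per_team": team_games * hours_per_game,
        "hours_unique": hours_unique,
        "days_unique": hours_unique / 24,  # round-the-clock viewing
    }


def games_per_prospect(total_games, prospects):
    """Games seen per prospect, assuming one tracked prospect per game."""
    return total_games / prospects


if __name__ == "__main__":
    load = viewing_load()
    print(f"Unique games: {load['unique_games']}")
    print(f"Hours, each game once: {load['hours_unique']} "
          f"(about {load['days_unique']:.0f} straight days)")
    print(f"Hours, counted once per team: {load['hours_if_counted_per_team']}")
    print(f"Scout's games per prospect: {games_per_prospect(200, 5):.0f}")
```

Under these assumptions the season has 1,230 distinct games and 2,460 team-games, so the total comes to 3,690 hours if each game is watched once and 7,380 hours if it is counted once per team.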

The result for fans and pundits committed to the “saw him good” school of analysis is that their impression of a player or team is mostly made up of a relatively limited number of viewings, particularly of players/teams they don’t habitually follow, meaning they’ll fall prey to the issue of small sample size and the variance/luck confounders that accompany it. In addition, their analysis will be cluttered with biases and skewed information: perceptually obvious plays and “highlight reel” events, high impact plays at critical moments (i.e., “clutch”), as well as easily recalled (but potentially altered) memories that may not even be truly indicative of overall impact or performance of the entity in question. They will also lack important referential data, such as rates and means, which place an individual team’s or player’s results in the proper context from which accurate conclusions can be made.

As mentioned at the outset, observation is the primary, most data-rich source of information. It is also beset by psychological pitfalls and other limitations. While it is widely considered folly in conventional hockey analysis circles to only consider stats absent observation, the opposite is equally true: observation absent stats can leave one hopelessly lost amongst one’s own preconceptions and assumptions.

**This post would not have been possible without the excellent Elliot Aronson textbook “The Social Animal”.