Berkson’s Paradox

Why correlations break down in the presence of "range restriction"

There is a positive correlation between a high school student’s grades and standardized test scores.

Yet… high standardized test scores before entering university do not predict university grades.

I’ll let you think a bit about possible reasons for that.

Some more statements:

  1. “Smart students are less athletic.”
  2. “Good books make bad movies.”
  3. “Height does not correlate with performance in the NBA”
  4. “People winning engineering contests are not that good at their job.”
  5. “In a good restaurant, the least appetizing sounding items likely taste better.”

You may have nodded along with 1 and 2. Too bad they are illusions.

You may have been surprised by 3, 4, and 5.

These are all examples of Berkson’s Paradox. What the heck is that?

Via Wikipedia:

The most common example of Berkson’s paradox is a false observation of a negative correlation between two positive traits, i.e., that members of a population which have some positive trait tend to lack a second. Berkson’s paradox occurs when this observation appears true when in reality the two properties are unrelated—or even positively correlated—because members of the population where both are absent are not equally observed.

Here Wikipedia summarizes the same example I highlighted from Jordan Ellenberg’s book How Not To Be Wrong: the idea that attractive men are rude.

Suppose Alex will only date a man if his niceness plus his handsomeness exceeds some threshold. Then nicer men do not have to be as handsome to qualify for Alex’s dating pool. So, among the men that Alex dates, Alex may observe that the nicer ones are less handsome on average (and vice versa), even if these traits are uncorrelated in the general population. Note that this does not mean that men in the dating pool compare unfavorably with men in the population. On the contrary, Alex’s selection criterion means that Alex has high standards. The average nice man that Alex dates is actually more handsome than the average man in the population (since even among nice men, the ugliest portion of the population is skipped). Berkson’s negative correlation is an effect that arises within the dating pool: the rude men that Alex dates must have been even more handsome to qualify. 
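If you want to see the effect rather than take it on faith, here is a minimal simulation sketch. The setup is my own assumption, not something from Wikipedia or Ellenberg: niceness and handsomeness are independent standard normal scores, and the threshold of 1.0 is arbitrary. The function name simulate_dating_pool is made up for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5

def simulate_dating_pool(n=100_000, threshold=1.0, seed=0):
    rng = random.Random(seed)
    # Assumption: the two traits are independent standard normals,
    # so they are uncorrelated in the general population.
    men = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Alex's rule: niceness + handsomeness must exceed the threshold.
    pool = [(nice, handsome) for nice, handsome in men if nice + handsome > threshold]
    population_corr = pearson([m[0] for m in men], [m[1] for m in men])
    pool_corr = pearson([m[0] for m in pool], [m[1] for m in pool])
    return population_corr, pool_corr

population_corr, pool_corr = simulate_dating_pool()
print(f"population: {population_corr:+.2f}, dating pool: {pool_corr:+.2f}")
```

Under these assumptions the population correlation should come out near zero while the correlation inside the dating pool comes out clearly negative. Nothing about the men changed; only the filter did.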

The key to understanding all of these examples is that the correlations you expect break down when the sample is narrowed. This is commonly referred to as “range restriction”.

Let’s go back to the surprising statement I opened with. This time I’ll boldface the restrictor.

High standardized test scores before entering university do not predict university grades. 

Brilliant.org created a handy visual for understanding what’s happening. Even though SATs and GPA are positively correlated at large, at any particular university you may see a negative correlation.

They explain:

The admissions committee accepts students who have either a sufficiently high GPA, a sufficiently high SAT score, or some combination of the two. However, applicants who have both high GPAs and high SAT scores will likely get into a higher-tier school and not attend, even if they are accepted. The range of students that actually attend the school is given by the blue dots in the plot in the introduction. These dots show a downward trend even though the overall population (red and blue dots) show an upward trend. This trend reversal is the “paradox,” though there is nothing truly paradoxical about it. It is the result of a trade-off between GPA and SAT scores in the people reviewed.
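The reversal they describe can be reproduced with a rough sketch. Everything specific here is an assumption of mine rather than Brilliant.org's model: GPA and SAT are standardized scores with a population correlation of 0.5, and a student attends this school only if the combined score falls in a band (high enough to be admitted, not so high that a higher-tier school wins them over). The band edges 1.0 and 2.0 and the name simulate_admissions are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5

def simulate_admissions(n=200_000, rho=0.5, low=1.0, high=2.0, seed=1):
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        gpa = rng.gauss(0, 1)
        # Mix GPA with fresh noise so SAT has population correlation rho with GPA.
        sat = rho * gpa + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        applicants.append((gpa, sat))
    # Attendees: combined score clears the bar (low) but stays below the
    # level at which a higher-tier school would take the student (high).
    attendees = [(g, s) for g, s in applicants if low < g + s < high]
    all_corr = pearson([a[0] for a in applicants], [a[1] for a in applicants])
    campus_corr = pearson([a[0] for a in attendees], [a[1] for a in attendees])
    return all_corr, campus_corr

all_corr, campus_corr = simulate_admissions()
print(f"all applicants: {all_corr:+.2f}, this campus: {campus_corr:+.2f}")
```

With these made-up parameters the correlation across all applicants stays positive while the correlation on campus turns negative. The narrower the band, the stronger the reversal, since a student inside a thin band can only have more of one score by having less of the other.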

Why is this so important?

Because it shows up everywhere! It’s tempting to see counterintuitive correlations and try to create a story about them, but they are often not surprising once we realize that the narrow pool selects on two dominating attributes. Consider the NBA, where players are selected largely on skill and height. Height does not correlate with performance in the NBA because a short player must have abnormally high skill to have made it into the restricted range known as the NBA. Similarly, a random 7-footer in the population is an order of magnitude more likely to play in the NBA than a person of average height. It’s the same reason why you shouldn’t be surprised when a small NFL player like Wes Welker or Steve Tasker is a badass (I see you 90s Bills. I also hated you, but Tasker was a maniac).
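Here is a toy version of the NBA story, as a sketch only. The assumptions are mine: height and skill are independent standard normal scores, performance is simply their sum, and the league takes anyone whose performance clears a cutoff of 3.0. The name simulate_league and the "very tall" and "average height" buckets are hypothetical choices, and none of the numbers are real NBA data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5

def simulate_league(n=200_000, cutoff=3.0, seed=2):
    rng = random.Random(seed)
    # Assumption: height and skill are independent standard normals,
    # and performance is just height + skill.
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    league = [(h, s) for h, s in people if h + s > cutoff]
    pop_corr = pearson([h for h, s in people], [h + s for h, s in people])
    league_corr = pearson([h for h, s in league], [h + s for h, s in league])
    # Chance of making the league for very tall people (height score > 2)
    # versus people of roughly average height (|height score| < 0.5).
    tall = [(h, s) for h, s in people if h > 2]
    average = [(h, s) for h, s in people if abs(h) < 0.5]
    tall_rate = sum(h + s > cutoff for h, s in tall) / len(tall)
    average_rate = sum(h + s > cutoff for h, s in average) / len(average)
    return pop_corr, league_corr, tall_rate, average_rate

pop_corr, league_corr, tall_rate, average_rate = simulate_league()
print(f"height vs performance, everyone: {pop_corr:+.2f}, league only: {league_corr:+.2f}")
print(f"chance of making the league, very tall: {tall_rate:.3f}, average height: {average_rate:.4f}")
```

In this toy model height helps enormously with getting in, yet inside the league the link between height and performance is much weaker than in the general population. It weakens rather than vanishes here; how close it gets to zero depends on how strict the cutoff is.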

So now I’ll go back to the original statements and boldface the “range restrictors”.

  1. “Smart students are less athletic.”
  2. “Good books make bad movies.”
  3. “Height does not correlate with performance in the NBA”
  4. “People winning engineering contests are not that good at their job.”
  5. “In a good restaurant, the least appetizing sounding items likely taste better.”

The visual versions of all of these can be found in the @page_eco Twitter thread that inspired this post.

More examples

  • Grit and violinists

    David Epstein spots a “restriction of range” problem in the book Grit, which cites a study of 30 violinists. When you squash the range of a variable that is correlated with the dependent variable, you risk understating the correlation with the restricted variable. In this case, the sample was violinists who had already been accepted to a famous academy. We have squashed their innate talent even though it likely has a wide range. He also articulates the NBA example: if you studied the correlation of height to points scored in basketball for NBA players, you would find a jarring negative correlation, but that is because you are selecting from a sample of abnormally tall players to begin with. You’ve squashed the height variable, which would lead people to think that height has no impact on points scored.
  • Surgeons

    Nassim Taleb has warned that you should be wary of surgeons who look like the actors who play surgeons on TV.
  • Hedge fund managers

    This isn’t due diligence advice, but I’m guessing you would have wanted to invest with a Black or female hedge fund manager in, say, the 1980s.

Finally, one last example from Byrne Hobart who has a “range restriction” detector in his brain:

Institutional Investor highlights a study showing that CEOs get more authority relative to boards when they benefit from lucky economic conditions they had nothing to do with. There’s always some range restriction at work when analyzing the performance of CEOs. In a simple model, a CEO gets the job through some combination of a) underlying skill, and b) the ability to persuade. That persuasive skill means that we should expect the worst CEOs to be overcompensated, and the best to be underpaid. And it makes sense that one channel through which charisma works is in taking credit for good news and dodging blame for bad news, irrespective of who was really responsible for either. 

It’s not surprising that Byrne spots Berkson’s Paradox everywhere. He wrote Brilliant Jerks, Crazy Hotties, and Other Artifacts of Range Restriction.