There’s Gold In Them Thar Tails: Part 1

Selection from strictly narrowed criteria

If you were accepted to a selective college or job in the 90s, have you ever wondered if you’d get accepted in today’s environment? I wonder myself. It leaves me feeling grateful because I think the younger version of me would not have gotten into Cornell or SIG today. Not that I dwell on this too much. I take Heraclitus at his word that we do not cross the same river twice. Transporting a fixed mental impression of yourself into another era is naive (cc the self-righteous who think they’d be on the right side of history on every topic). Still, my self-deprecation has teeth. When I speak to friends with teens I hear too many stories of kids with sterling resumes bulging with 3.9 GPAs, extracurriculars, and varsity sports letters being warned: “don’t bother applying to Cal”.

A close trader friend explained his approach. His daughter is a high achiever. She’s also a prolific writer. Her passion is the type all parents hope their children will be lucky enough to discover. My friend recognizes that the bar is so high to get into a top school that acceptance above that bar is a roulette wheel. With so much randomness lying above a strict filter, he de-escalates the importance of getting into an elite school. “Do what you can, but your life doesn’t depend on the whim of an admissions officer”. She will lean into getting better at what she loves wherever she lands. This approach is not just compassionate but correct. She’s thought ahead, got her umbrella, but she can’t control the weather.

My friend’s insight that acceptance above a high threshold is random is profound. And timely. I had just finished reading Rohit Krishnan’s outstanding post Spot The Outlier, and immediately sent it to my friend.

I chased down several citations in Rohit’s post to improve my understanding of this topic.

In this post, we will tie together:

  1. Why the funnels are getting narrower
  2. The trade-offs in our selection criteria
  3. The nature of the extremes: tail divergence
  4. Strategies for the extremes

We will extend the discussion in a later post with:

  1. What this means for intuition in general
  2. Applications to investing

Why Are The Funnels Getting Narrower?

The answer to this question is simple: abundance.

In college admissions, the number of candidates in aggregate grows with the population. But this isn’t the main driver behind the increased selectivity.  The chart below shows UC acceptance rates plummeting as total applications outstrip admits.

The spread between applicants and admissions has exploded. UCLA received almost 170k applications for the 2021 academic year! Cal receives over 100k applicants for about 10k spots. Your chances of getting in have cratered in the past 20 years. Applications have lapped population growth due to a familiar culprit: connectivity. It is much easier to apply to schools today. The UC system now uses a single boilerplate application for all of its campuses.

This dynamic exists everywhere. You can apply to hundreds of jobs without a postage stamp. Artists, writers, analysts, coders, designers can all contribute their work to the world in a permissionless way with as little as a smartphone. Sifting through it all necessitated the rise of algorithms — the admissions officers of our attention.

Trade-offs in Selection Criteria

There’s a trade-off between signal and variance. What if Spotify employed an extremely narrow recommendation engine indexed solely on artist? If listening to Enter Sandman only led you to Metallica’s deepest cuts, the engine would be failing to aid discovery. If it indexed by “year”, you’d get a lot more variance since it would choose across genres, but headbangers don’t want to listen to Color Me Badd. That prediction fails to delight the user.

Algorithms are smarter than my cardboard examples but the tension remains. Our solution to one problem exacerbates another. Rohit describes the dilemma:

The solution to the problem of discovery is better selection, which is the second problem. Discovery problems demand you do something different, change your strategy, to fight to be amongst those who get seen.

There’s plenty of low-hanging fruit to find recommendations that reside between Color Me Badd and St. Anger. But once it’s picked, we are still left with a vast universe of possible songs for the recommendation engine to choose from.

Selection problems reinforce the fact that what we can measure and what we want to measure are two different things, and they diverge once you get past the easy quadrant.

In other words, it’s easy enough to rule out B students, but we still need to make tens of thousands of coinflip-like decisions between the remaining A students. Are even stricter exams an effective way to narrow an unwieldy number of similar candidates? Since in many cases predictors poorly map to the target, the answer is probably no. Imagine taking it to the extreme and setting the cutoff to the lowest SAT score that would satisfy Cal’s expected enrollment. Say that’s 1400. This feels wrong for good reasons (and this is not even touching the hot stove topic of “fairness”). Our metrics are simply imperfect proxies for who we want to admit. In mathy language we can say: the best person at Y (our target variable) is not likely to come from the best candidates we screened if the screening criterion, X, is an imperfect correlate of success (Y).
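To make this concrete, here’s a minimal simulation. It’s a sketch with made-up numbers (a correlation of 0.6 between screening score X and outcome Y, a top-10% cutoff), not anyone’s actual admissions data. It counts how often a strict filter on X rejects the applicant who would have been best at Y:

```python
import numpy as np

# Toy model: screening score X (think SAT) is an imperfect correlate of the
# outcome Y we actually care about. All numbers here are illustrative.
rng = np.random.default_rng(0)
rho = 0.6            # assumed correlation between X and Y
n_applicants = 10_000
n_admits = 1_000     # strict filter: admit the top 10% by X
n_trials = 500

misses = 0
for _ in range(n_trials):
    x = rng.standard_normal(n_applicants)
    # Y shares a rho-weighted component with X plus independent "everything else"
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n_applicants)

    admitted = set(np.argsort(x)[-n_admits:])   # indices of the top scorers on X
    best_at_y = int(np.argmax(y))               # the person best at the target
    if best_at_y not in admitted:
        misses += 1

print(f"Best-at-Y applicant fell below the X cutoff in {misses / n_trials:.0%} of trials")
```

In this toy setup the filter works fine in the middle of the distribution; it’s only near the top that it starts behaving like a roulette wheel, and tightening the cutoff further only makes the misses more frequent.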

The cost of this imperfect correlation is a loss of diversity or variance. Rohit articulates the true goal of selection criteria (emphasis mine):

Since no exam perfectly captures the necessary qualities of the work, you end up over-indexing on some qualities to the detriment of others. For most selection processes the idea isn’t to get those that perfectly fit the criteria as much as a good selection of people from amongst whom a great candidate can emerge.

This is even true in sports. Imagine you have a high NBA draft pick. A great professional must endure 82 games (plus a long playoff season), fame, money, and most importantly, a sustained level of unprecedented competition. Until the pros, they were kids. Big fish in small ponds. If you are selecting for an NBA player with narrow metrics, even beyond the well-understood requisite screens for talent, then those metrics are likely to be a poor guide to how the player will handle such an outlier life. The criteria will become more squishy as you try to parse the right tail of the distribution.

In the heart of the population distribution, the contribution to signal of increasing selectivity is worth the loss of variance. We can safely rule out B students for Cal and D3 basketball players for the NBA.  But as we get closer to elite performers, at what point should our metrics give way to discretion? Rohit provides a hint:

When the correlation between the variable measured and outcome desired isn’t a hundred percent, the point at which the variance starts outweighing the mean error is where dragons lie!

Nature Of The Extremes: Tail Divergence

To appreciate why the signal of our predictive metrics becomes random at the extreme right tail we start with these intuitive observations via LessWrong:

Extreme outliers of a given predictor are seldom similarly extreme outliers on the outcome it predicts, and vice versa. Although 6’7″ is very tall, it lies within a couple of standard deviations of the median US adult male height – there are many thousands of US men taller than the average NBA player, yet are not in the NBA. Although elite tennis players have very fast serves, if you look at the players serving the fastest serves ever recorded, they aren’t the very best players of their time. It is harder to look at the IQ case due to test ceilings, but again there seems to be some divergence near the top: the very highest earners tend to be very smart, but their intelligence is not in step with their income (their cognitive ability is around +3 to +4 SD above the mean, yet their wealth is much higher than this).

The trend seems to be that even when two factors are correlated, their tails diverge: the fastest servers are good tennis players, but not the very best (and the very best players serve fast, but not the very fastest); the very richest tend to be smart, but not the very smartest (and vice versa). 

The post uses simple scatterplots to demonstrate. Here are 2 self-explanatory charts. 

LessWrong continues: Given a correlation, the envelope of the distribution should form some sort of ellipse, narrower as the correlation goes stronger, and more circular as it gets weaker.

If we zoom into the far corners of the ellipse, we see ‘divergence of the tails’: as the ellipse doesn’t sharpen to a point, there are bulges where the maximum x and y values lie with sub-maximal y and x values respectively:

Say X is SAT score and Y is college GPA. We shouldn’t expect that the person with the highest SATs will earn the highest GPA. SAT is an imperfect correlate of GPA. LessWrong’s interpretation is not surprising:

The fact that a correlation is less than 1 implies that other things matter to an outcome of interest. Although being tall matters for being good at basketball, strength, agility, hand-eye-coordination matter as well (to name but a few). The same applies to other outcomes where multiple factors play a role: being smart helps in getting rich, but so does being hard working, being lucky, and so on.
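The mechanism in that quote is easy to reproduce. Here’s a minimal sketch (the 0.5 weight and the variable names are my own illustrative assumptions): if ability is only partly a function of height, the tallest player in a large sample is very good but almost never the very best, and the best player is tall but not the tallest.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Toy decomposition: ability = part height, part everything else
# (agility, coordination, drive, luck...). The weights are made up.
height = rng.standard_normal(n)
everything_else = rng.standard_normal(n)
ability = 0.5 * height + np.sqrt(1 - 0.5**2) * everything_else

def percentile_of(values, v):
    """Percentile rank of v within values."""
    return (values < v).mean() * 100

tallest = np.argmax(height)
best = np.argmax(ability)

print(f"tallest player's ability percentile: {percentile_of(ability, ability[tallest]):.1f}")
print(f"best player's height percentile:     {percentile_of(height, height[best]):.1f}")
```

Both prints land high but short of the very top, which is exactly the bulge at the corners of the ellipse.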

Pushing this even further, if we zoom in on the extreme of a distribution we may find correlations invert! This scatterplot via Brilliant.org shows a positive correlation over the full sample (pink) but a negative correlation for a slice (blue). 

This is known as Berkson’s Paradox and can appear when you measure a correlation over a “restricted range” of a distribution (for example, if we restrict our sample to the best 20 basketball players in the world we might find that height is negatively correlated to skill if the best players were mostly point guards).
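Here’s a small sketch of that restricted-range effect (a toy model with made-up numbers, not Brilliant.org’s actual data): height and skill are positively correlated across everyone, but inside the group that clears a high combined bar to “make the league”, the correlation flips negative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Toy population: skill is mildly boosted by height, so the full-sample
# correlation is positive. All parameters are illustrative.
height = rng.standard_normal(n)
skill = 0.3 * height + rng.standard_normal(n)

# Selection: only players whose combined package clears a high bar are kept.
in_league = (height + skill) > 3.0

print(f"correlation, everyone:    {np.corrcoef(height, skill)[0, 1]:+.2f}")
print(f"correlation, league only: {np.corrcoef(height[in_league], skill[in_league])[0, 1]:+.2f}")
```

Nothing about the tall players changed; the flip comes entirely from which slice of the distribution we are allowed to see.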

[I’ve written about Berkson’s Paradox here. Always be wary of someone trying to show a correlation from a cherry-picked range of a distribution. Once you internalize this you will see it everywhere! I’d be charitable to the perpetrator. I suspect it’s usually careless thinking rather than a nefarious attempt to persuade.]

Strategies For The Extremes

In 1849, assayer Dr. M. F. Stephenson shouted ‘There’s gold in them thar hills’ from the steps of the Lumpkin County Courthouse in a desperate bid to keep the miners in Georgia from heading west to chase riches in California. We know there’s gold in the tails of distributions but our standard filters are unfit to sift for it.

Let’s pause to take inventory of what we know. 

  1. As the number of candidates or choices increases we demand stricter criteria to keep the field to a manageable size.
  2. At some cutoff, in the extreme of a distribution, selection metrics can lead to random or even misleading predictions.¹

    I’ll add a third point to what we have already established:
  3. Evolution in nature works by applying competitive pressures to a diverse population to stimulate adaptation (a form of learning). Diversity is more than a social buzzword. It’s an essential input to progress. Rohit implicitly acknowledges the dangers of inbreeding when he warns against putting folks through a selection process that reflexively molds them into rule-following perfectionists rather than those who are willing to take risks to create something new.

With these premises in place we can theorize strategies for both the selector and the selectee to improve the match between a system’s desired output (the definition of success depends on the context) and its inputs (the criteria the selector uses to filter). 

Selector Strategies

We can continue to rely on conventional metrics to filter the meat of the distribution for a pool of candidates. As we get into the tails, our adherence to and reverence for measures should be put aside in favor of increasing diversity and variance. Remember, the output of an overly strict filter in the tail is arbitrary anyway. Instead we can be deliberate about the randomness we let seep into selections to maximize the upside of our optionality.

Rohit summarizes the philosophy:

Change our thinking from a selection mindset (hire the best 5%) to a curation mindset (give more people a chance, to get to the best 5%).

Practically speaking this means selectors must widen the top of the funnel then…enforce the higher variance strategy of hire-and-train.

Rohit furnishes examples:

  • Tyler Cowen’s strategy of identifying unconventional talent and placing small but influential bets on the candidates. This is easier to say than do but Tony Kulesa finds some hints in Cowen’s template. 
  • The Marine Corps famously funnels wide, electing not to focus so much on incoming qualifications but rather to recruit a large class and bank on attrition to select the right few.
  • Investment banks and consulting firms hire a large group of generically smart associates, and let attrition decide who is best suited to stick around.

David Epstein, author of Range and The Sports Gene, has spent the past decade studying the development of talent in sports and beyond. He echoes these strategies:

One practice we’ve often come back to: not forcing selection earlier than necessary. People develop at different speeds, so keep the participation funnel wide, with as many access points as possible, for as long as possible. I think that’s a pretty good principle in general, not just for sports.

I’ll add 2 meta observations to these strategies:

  1. The silent implication is that the upside of matching the right talent to the right role is potentially massive. If you were hiring someone to bag groceries, the payoff to finding the fastest bagger on the planet is capped. An efficient checkout process is not the bottleneck to a supermarket’s profits. There’s a predictable ceiling to optimizing it to the microsecond. That’s not the case with roles in the above examples. 
  2. Increasing adoption of these strategies requires thoughtful “accounting” design. High-stakes busts, whether they are first round draft picks or 10x engineers, are expensive in time and money for the employer and the candidate. If we introduce more of a curation mindset, cast wider nets, and hire more employees, we need to understand that the direct costs of doing so should be weighed against the opaque and deferred costs of taking a full-size position in expensive employees from the outset.

    Accrual accounting is an attempt to match a business’s economic mechanics to meaningful reports of stocks and flows so we can extract insights that lead to better bets. Fully internalized, we must recognize that some amount of churn is expected as “breakage”. Lost option premiums need to be charged against the options that have paid off 100x. If an organization fails to design its incentive and accounting structures in accordance with curation/optionality thinking, it will be unable to maintain its discipline to the strategy.

Selectee Strategies

For the selectee trying to maximize their own potential there are strategies which exploit the divergence in the tails. 

To understand, we first recognize that in any complicated domain, the effort to become the best is not linear. You could devote a few years to becoming an 80th or 90th percentile golfer or chess player. But in your lifetime you wouldn’t become Tiger or Magnus. The rewards to effort decay exponentially after a certain point. Anyone who has lifted weights knows you can spend a year progressing rapidly, only to hit a plateau that lasts just as long.

The folk wisdom of the 80/20 rule captures this succinctly: 80% of the reward comes from 20% of the effort, and the remaining 20% of the reward requires 80% of the effort. The exact numbers don’t matter. Divorced from context, it’s more of a guideline.

This is the invisible foundation of Marc Andreessen and Scott Adams’s career advice to level up your skills in multiple domains. Say coding and public speaking, or writing plus math. If it’s exponentially easier to get to the 90th percentile than the 99th, then consider the arithmetic².

a) If you are in the 99th percentile you are 1 in 100. 

b) If you are top 10% in 2 different (statistically independent) domains then you are also 1 in 100 because 10% x 10% = 1%.

It’s exponentially easier to achieve the second scenario because of the effort scaling function. 
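A toy version of that arithmetic (the 10x-per-“nine” effort scaling below is my own illustrative assumption, not Andreessen’s or Adams’s model):

```python
import math

# Toy effort model: each extra "nine" of percentile (90th -> 99th -> 99.9th)
# costs roughly 10x more effort. Purely illustrative.
def effort(percentile, scale=10.0):
    nines = -math.log10(1 - percentile)   # 90th pct = 1 nine, 99th = 2, ...
    return scale ** nines

one_deep_skill   = effort(0.99)          # 99th percentile in a single domain
two_broad_skills = 2 * effort(0.90)      # 90th percentile in two independent domains

# Both profiles are roughly 1-in-100 rare: 1% vs 10% x 10% = 1%.
print(f"effort for one skill at the 99th percentile:  {one_deep_skill:.0f} units")
print(f"effort for two skills at the 90th percentile: {two_broad_skills:.0f} units")
```

Under this scaling the breadth route buys the same 1-in-100 rarity for a fraction of the effort; the exact multiple depends entirely on the assumed curve.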

If this feels too stifling you can simply follow your curiosity. In Why History’s Greatest Innovators Optimized for Interesting, Taylor Pearson summarizes the work of Juergen Schmidhuber, which contends that curiosity is the desire to make sense of, or compress, information in such a way that we make it more beautiful or useful in its newly ordered form. If learning (or as I prefer to say – adapting) is downstream from curiosity, we should optimize for interesting.

Lawrence Yeo unknowingly takes the baton in True Learning Is Done With Agency with his practical advice. He tells us that to truly learn we must:

decouple an interest from its practical value. Instead of embarking on something with an end goal in mind, you do it for its own sake. You don’t learn because of the career path it’ll open up, but because you often wonder about the topic at hand.

…understand that a pursuit truly driven by curiosity will inevitably lend itself to practical value anyway. The internet has massively widened the scope of possible careers, and it rewards those who exercise agency in what they pursue.

Conclusion

Rohit’s essay anchored Part 1 of this series. I can’t do better than let his words linger before moving on to Part 2.

If measurement is too strict, we lose out on variance.

If we lose out on variance, we miss out on what actually impacts outcomes.

If we miss what actually impacts outcomes, we think we’re in a rut.

But we might not be.

Once you’ve weeded out the clear “no”s, then it’s better to bet on variance rather than trying to ascertain the true mean through imprecise means.

We should at least recognize that our problems might be stemming from selection efforts. We should probably lower our bars at the margin and rely on actual performance [as opposed to proxies for performance] to select for the best. And face up to the fact that maybe we need lower retention and higher experimentation.

Looking Ahead

In Part 2, we will explore what divergence in the tails can tell us about life and investing.


Footnotes

  1. LessWrong has an intuitive explanation for some subset of Berkson’s Paradox observations — the “too much of a good thing” hypothesis. It goes like this: One candidate explanation would be that more isn’t always better, and the correlations one gets looking at the whole population doesn’t capture a reversal at the right tail. Maybe being taller at basketball is good up to a point, but being really tall leads to greater costs in terms of things like agility. Maybe although having a faster serve is better all things being equal, but focusing too heavily on one’s serve counterproductively neglects other areas of one’s game. Maybe a high IQ is good for earning money, but a stratospherically high IQ has an increased risk of productivity-reducing mental illness. Or something along those lines. I would guess that these sorts of ‘hidden trade-offs’ are common.
  2. This LessWrong post builds a more detailed toy model than the hand-waving I did. I appreciate the effort even if it feels like NFL refs stretching the first down chains. The juxtaposition of precisely measuring a spot that was eyeballed from several yards away in the midst of over-fed colliding bodies is a smidge of humor that has faded into our autumn Sunday armchairs.