The pattern

Pick someone who did exceptionally well at something — say, a rookie baseball player who had the best batting average in the league. Watch them next year.

Statistically, they will almost certainly do worse. Not because something is wrong, not because of pressure, not because of the "sophomore slump curse" — just because their first year's performance was a combination of their actual ability and good luck, and the luck won't repeat consistently.

This is regression to the mean, and it explains an astonishing fraction of what we mistake for cause and effect.

Where it comes from

Most measurements have two components: the true value (what you're trying to measure) and noise (random fluctuation from many small factors).

When you select for an extreme observation — the highest, the lowest, the most unusual — you're not just selecting for an extreme true value. You're also selecting for extreme noise, because someone with average ability who happened to have a very lucky day can show up in the same extreme position as someone with high ability who had an average day.

When you measure them again, the noise resets. Their true value (high or low) stays roughly the same, but the noise is drawn fresh from its distribution — mostly close to zero, not extreme. So the second measurement sits closer to the true value, which is closer to average than the first measurement was.

That's the whole mechanism. Selection partly captures noise; noise reverts; measurements regress.
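The mechanism is easy to demonstrate directly. The sketch below (Python; every number in it is an invented assumption) gives each of 100,000 "players" a fixed skill plus a fresh dose of luck per season, selects the top 1% of year-one performers, and re-measures them:

```python
import random
import statistics

random.seed(0)

# Hypothetical model: each "player" has a fixed true skill, and each
# season's measurement adds fresh, independent noise ("luck").
N = 100_000
skills = [random.gauss(0, 1) for _ in range(N)]

def season(skills):
    # One noisy measurement per player: true skill + luck.
    return [s + random.gauss(0, 1) for s in skills]

year1 = season(skills)
year2 = season(skills)

# Select the top 1% of year-one performers, then look at them in year two.
cutoff = sorted(year1)[int(0.99 * N)]
top = [i for i in range(N) if year1[i] >= cutoff]

avg_y1 = statistics.mean(year1[i] for i in top)
avg_y2 = statistics.mean(year2[i] for i in top)
avg_skill = statistics.mean(skills[i] for i in top)

print(f"top 1% in year 1:   {avg_y1:.2f}")
print(f"same group, year 2: {avg_y2:.2f}")
print(f"their true skill:   {avg_skill:.2f}")
```

The top group's year-two average lands almost exactly on their true skill level: still well above the overall average, but well below their lucky year-one showing.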

Galton's discovery

Francis Galton noticed this in the 1880s while studying inherited heights. He found that tall fathers had sons who were taller than average — but not as tall as the fathers. Short fathers had sons closer to average than themselves.

This puzzled him. If tallness was inherited, why didn't tall fathers produce equally tall sons? Why this drift toward the middle?

The answer wasn't biological. It was statistical. Height in any individual is partly genetic and partly random (nutrition, gestation, measurement error). Selecting tall fathers selects partly for genes and partly for the random factors that happened to favor them. Their sons inherit the genes but not the random luck, so they fall closer to average.

Galton called this "regression toward mediocrity," and the entire statistical concept of "regression" got its name from this. It's an unfortunate name — there's nothing about mediocrity involved, and "regression" doesn't really mean what it suggests in normal English. But the math is solid.

The Sports Illustrated cover curse

A famous example: athletes who appear on the cover of Sports Illustrated tend to underperform in the next year. Sportswriters speculated about pressure, complacency, hubris, the curse of fame.

The real explanation: SI tends to put athletes on the cover right after an exceptional performance. That performance was a combination of skill and luck. The luck won't repeat. They'll perform closer to their actual skill level — which, while still good, is worse than the lucky streak that got them on the cover.

No curse. Just regression.

The same logic explains:

  • The "sophomore slump" in any field — second albums, second novels, second seasons.
  • Why "best schools" don't keep their #1 ranking — they got there partly through good luck (a strong cohort, a favorable year of test scoring); the next year will be more typical.
  • Why "fastest-growing companies" usually slow down — their growth rate was an extreme observation that included favorable conditions.
  • Why "the boss is brilliant when she yells, and useless when she encourages" — yelling happens after bad performance (which tends to be followed by better, regardless), and encouragement happens after good performance (which tends to be followed by worse).

This last one is Kahneman's most famous example. Israeli air force instructors told him that praise spoiled pilots while criticism improved them. He pointed out that an unusually bad landing is partly bad luck, so the next attempt tends to be better no matter what the instructor says; an unusually good landing is partly good luck, and tends to be followed by a worse one. Praise versus criticism wasn't doing anything. Pure regression.
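The flight-instructor story can be reproduced with no causal effect at all. In this sketch (hypothetical numbers throughout), a trainee's landing scores are pure noise around a constant skill level, yet criticism appears to "work" and praise appears to "backfire":

```python
import random
import statistics

random.seed(1)

# Hypothetical model: landing scores are pure noise around a constant
# skill level. Feedback has NO effect by construction.
scores = [random.gauss(0, 1) for _ in range(200_001)]

crit_before, crit_after = [], []      # around unusually bad landings
praise_before, praise_after = [], []  # around unusually good landings
for i in range(200_000):
    if scores[i] < -1.5:              # bad landing -> instructor yells
        crit_before.append(scores[i])
        crit_after.append(scores[i + 1])
    elif scores[i] > 1.5:             # good landing -> instructor praises
        praise_before.append(scores[i])
        praise_after.append(scores[i + 1])

# Criticism looks like it helped; praise looks like it hurt.
print(f"yelled at: {statistics.mean(crit_before):+.2f} "
      f"-> {statistics.mean(crit_after):+.2f}")
print(f"praised:   {statistics.mean(praise_before):+.2f} "
      f"-> {statistics.mean(praise_after):+.2f}")
```

Both follow-up averages sit near zero, the trainee's actual level: the apparent effect of feedback is entirely the luck resetting.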

How it fakes evidence

Regression to the mean is one of the cleanest ways to manufacture apparent treatment effects.

Imagine you test a new headache cure. You recruit subjects with the worst headaches. You give them the drug. A week later, their headaches are better.

Did the drug work? You can't tell. Because you selected for "worst headache," you partly selected for people whose headaches were temporarily extreme. Even with no treatment, most of them would have improved. Their headaches regress toward their personal averages.

This is why placebo-controlled trials matter. Without a placebo control, regression alone can manufacture a 30-40% "improvement" in populations selected for extreme initial scores. Any treatment will look effective. Pharmaceutical trial design has accounted for this since the 1960s, but pseudoscience and quack medicine often don't.
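A minimal simulation (all parameters invented) makes the headache-trial point concrete: recruit only the worst-off subjects, apply no treatment at all, and re-measure a week later:

```python
import random
import statistics

random.seed(2)

# Hypothetical model: headache severity fluctuates around a personal
# baseline on a rough 0-10 pain scale. No treatment is ever applied.
N = 50_000
baselines = [random.gauss(5, 1) for _ in range(N)]
day1 = [b + random.gauss(0, 2) for b in baselines]
day8 = [b + random.gauss(0, 2) for b in baselines]

# Recruit only the people with the worst headaches on day 1.
threshold = 8.0
recruited = [i for i in range(N) if day1[i] >= threshold]

before = statistics.mean(day1[i] for i in recruited)
after = statistics.mean(day8[i] for i in recruited)
improvement = (before - after) / before

print(f"severity at recruitment: {before:.1f}")
print(f"severity a week later:   {after:.1f}")
print(f"apparent improvement:    {improvement:.0%}")
```

The "improvement" comes entirely from selection plus fresh noise; a placebo arm recruited the same way would show the same change.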

How to spot it

Whenever you hear a story of the form "X was extreme, then [intervention happened], then X was less extreme, so the intervention worked," ask whether regression to the mean is doing the work:

  • Did the subjects get selected because they were extreme?
  • Is the measurement itself noisy?
  • Was there a control group of similarly-selected subjects who didn't get the intervention?

If the answers are yes, yes, and no, the apparent effect might be regression, not the intervention.

How to deal with it in your own analysis

If you can, don't select on the outcome. Random assignment removes the problem.

If you must select on extremes:

  • Pre-register your hypothesis and methods before looking at the data.
  • Have a control group, also selected on the same extremes.
  • Report measurement reliability — high noise means high regression.
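That last bullet can be made quantitative. A standard regression estimate (sometimes called Kelley's formula) shrinks an observed deviation from the mean by the measurement's reliability, i.e. the fraction of score variance that is true signal rather than noise:

```python
def expected_retest(score, mean, reliability):
    """Expected re-measurement: shrink the observed deviation from the
    mean by the reliability (true-score variance / total variance)."""
    return mean + reliability * (score - mean)

# With reliability 0.6, someone scoring 130 on a mean-100 scale is
# expected to land around 118 on retest; a perfectly reliable
# measurement (reliability 1.0) predicts no regression at all.
print(expected_retest(130, 100, 0.6))
print(expected_retest(130, 100, 1.0))
```

Reliability 1.0 means no noise and no regression; reliability 0 means the score is all noise and the best retest prediction is simply the mean.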

If you're reading research, look for these in the design. Studies that lack a proper control group, or that select on extreme initial scores, should be assumed to be at least partly capturing regression.

If you'd like a guided 5-minute walkthrough of regression to the mean with worked examples, NerdSip can generate a course on it.

The cousins

Regression to the mean is related to several other statistical effects:

  • Reversion to the mean in finance: high-performing stocks tend to underperform later. Mostly regression, plus some real economic dynamics.
  • Mean reversion in physics: a system perturbed from equilibrium tends to return. Often real (thermodynamics), occasionally a regression illusion.
  • The intuition that "things even out": sometimes valid (regression, when you're re-measuring a noisy quantity) and sometimes wrong (the gambler's fallacy — see the gambler's fallacy article).

The hardest part is keeping track of which is which. Regression to the mean applies when you're re-measuring a noisy underlying value. The gambler's fallacy is about independent events with fixed probabilities. They look similar but make opposite predictions: regression correctly expects an extreme measurement to be followed by a less extreme one, while the gambler's fallacy wrongly expects independent outcomes to compensate for a past streak.
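The contrast is easy to check numerically. In this sketch (invented parameters), fair coin flips show no compensation after a streak, while a noisy measurement of a fixed quantity really does fall back after an extreme reading:

```python
import random

random.seed(3)

# Gambler's fallacy side: fair coin flips are independent. The chance
# of heads after a run of tails is still ~0.5 -- nothing "evens out".
flips = [random.random() < 0.5 for _ in range(300_000)]
after_3_tails = [flips[i + 3] for i in range(len(flips) - 3)
                 if not any(flips[i:i + 3])]
p = sum(after_3_tails) / len(after_3_tails)
print(f"P(heads | 3 tails in a row) = {p:.3f}")

# Regression side: a noisy measurement of a fixed quantity DOES move
# back toward its true value after an extreme reading.
true_value = 10.0
readings = [true_value + random.gauss(0, 2) for _ in range(300_000)]
nexts = [readings[i + 1] for i in range(len(readings) - 1)
         if readings[i] > 13]
mean_next = sum(nexts) / len(nexts)
print(f"mean reading after an extreme one = {mean_next:.2f}")
```

The coin never compensates; the noisy measurement always regresses. That asymmetry is the whole distinction.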

The takeaway

Regression to the mean is a statistical certainty: when measurements have noise, selecting for extreme observations and then re-measuring will produce values closer to the average. Any "intervention" applied between the selections gets unfairly credited for what regression was always going to do. It's been quietly creating the appearance of treatment effects, training improvements, and miracle cures since people started measuring things. Once you can see it, half of "X did Y and look what happened" stories take on a different colour.