Posts Tagged ‘normal distribution’
“It’s the bell curve again”*…
Joseph Howlett on how the central limit theorem, which started as a bar trick for 18th-century gamblers, became something on which scientists rely every day…
No matter where you look, a bell curve is close by.
Place a measuring cup in your backyard every time it rains and note the height of the water when it stops: Your data will conform to a bell curve. Record 100 people’s guesses at the number of jelly beans in a jar, and they’ll follow a bell curve. Measure enough women’s heights, men’s weights, SAT scores, marathon times — you’ll always get the same smooth, rounded hump that tapers at the edges.
Why does the bell curve pop up in so many datasets?
The answer boils down to the central limit theorem, a mathematical truth so powerful that it often strikes newcomers as impossible, like a magic trick of nature. “The central limit theorem is pretty amazing because it is so unintuitive and surprising,” said Daniela Witten, a biostatistician at the University of Washington. Through it, the most random, unimaginable chaos can lead to striking predictability.
It’s now a pillar on which much of modern empirical science rests. Almost every time a scientist uses measurements to infer something about the world, the central limit theorem is buried somewhere in the methods. Without it, it would be hard for science to say anything, with any confidence, about anything.
“I don’t think the field of statistics would exist without the central limit theorem,” said Larry Wasserman, a statistician at Carnegie Mellon University. “It’s everything.”
Perhaps it shouldn’t come as a surprise that the push to find regularity in randomness came from the study of gambling…
Read on for the fascinating story of: “The Math That Explains Why Bell Curves Are Everywhere,” from @quantamagazine.bsky.social.
Howlett concludes by observing that “The central limit theorem is a pillar of modern science, ultimately, because it’s a pillar of the world around us. When we combine lots of independent measurements, we get clusters. And if we’re clever enough, we can use those clusters to find out something interesting about the processes that made them”– which follows from the story he shares.
Still, we’d do well to remember that there are limits to its applicability, both descriptively (as Nassim Nicholas Taleb points out, “because the bell curve ignores large deviations, cannot handle them, yet makes us confident that we have tamed uncertainty”) and prescriptively (as Benjamin Bloom argues, “The bell-shaped curve is not sacred. It describes the outcome of a random process. Since education is a purposeful activity… the achievement distribution should be very different from the normal curve if our instruction is effective”).
For (much) more, see Peter Bernstein’s wonderful Against the Gods: The Remarkable Story of Risk.
* Robert A. Heinlein, Time Enough for Love
###
As we noodle on the normal distribution, we might send curve-shattering birthday greetings to Norman Borlaug; he was born on this date in 1914. An agronomist, he developed and led initiatives worldwide that contributed to the voluminous increases in agricultural production we call “the Green Revolution.” Borlaug was awarded multiple honors for his work, including the Nobel Peace Prize, the Presidential Medal of Freedom, and the Congressional Gold Medal; he’s one of only seven people to have received all three of those awards.
“Chance, too, which seems to rush along with slack reins, is bridled and governed by law”*…
And the history of our understanding of those laws is, as Tom Chivers explains (in an excerpt from his book, Everything is Predictable), both fascinating and illuminating…
Traditionally, the story of the study of probability begins in French gambling houses in the mid-seventeenth century. But we can start it earlier than that.
The Italian polymath Gerolamo Cardano had attempted to quantify the maths of dice gambling in the sixteenth century. What, for instance, would the odds be of rolling a six on four rolls of a die, or a double six on twenty-four rolls of a pair of dice?
His working went like this. The probability of rolling a six is one in six, or 1/6, or about 17 percent. Normally, in probability, we don’t give a figure as a percentage, but as a number between zero and one, which we call p. So the probability of rolling a six is p = 0.17. (Actually, 0.1666666… but I’m rounding it off.)
Cardano, reasonably enough, assumed that if you roll the die four times, your probability is four times as high: 4/6, or about 0.67. But if you stop and think about it for a moment, that can’t be right, because it would imply that if you rolled the die six times, your chance of getting a six would be one-sixth times six, or one: that is, certainty. But obviously it’s possible to roll six times and have none of the dice come up six.
What threw Cardano is that the average number of sixes you’ll see on four dice is 0.67. But sometimes you’ll see three, sometimes you’ll see none. The odds of seeing a six (or, separately, at least one six) are different.
In the case of the one die rolled four times, you’d get it badly wrong—the real answer is about 0.52, not 0.67—but you’d still be right to bet, at even odds, on a six coming up. If you used Cardano’s reasoning for the second question, though, about how often you’d see a double six on twenty-four rolls, it would lead you seriously astray in a gambling house. His math would suggest that, since a double six comes up one time in thirty-six (p ≈ 0.03), then rolling the dice twenty-four times would give you twenty-four times that probability, twenty-four in thirty-six or two-thirds (p ≈ 0.67, again).
This time, though, his reasonable but misguided thinking would put you on the wrong side of the bet. The probability of seeing a double six in twenty-four rolls is 0.49, slightly less than half. You’d lose money betting on it. What’s gone wrong?
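For readers who would rather check those two figures than take them on faith, here is a minimal simulation sketch. It is our illustration, not part of Chivers’s excerpt; the function name `estimate`, the trial count, and the seed are all arbitrary choices of ours.

```python
import random

def estimate(trials, rolls, dice, seed=0):
    """Estimate the chance that at least one of `rolls` throws shows a six on every die."""
    rng = random.Random(seed)  # fixed seed (our choice) so reruns match
    hits = 0
    for _ in range(trials):
        # any() stops the trial as soon as one throw comes up all sixes
        if any(
            all(rng.randint(1, 6) == 6 for _ in range(dice))
            for _ in range(rolls)
        ):
            hits += 1
    return hits / trials

one_six = estimate(50_000, rolls=4, dice=1)
double_six = estimate(50_000, rolls=24, dice=2)
print(f"at least one six in 4 rolls:         {one_six:.3f}")
print(f"at least one double six in 24 rolls: {double_six:.3f}")
```

With this many trials the two estimates should land close to 0.52 and 0.49 respectively, on either side of an even-money bet and well away from Cardano’s 0.67.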
A century or so later, in 1654, Antoine Gombaud, a gambler and amateur philosopher who called himself the Chevalier de Méré, was interested in the same questions, for obvious professional reasons. He had noticed exactly what we’ve just said: that betting that you’ll see at least one six in four rolls of a die will make you money, whereas betting that you’ll see at least one double six in twenty-four rolls of two dice will not. Gombaud, through simple empirical observation, had got to a much more realistic position than Cardano. But he was confused. Why were the two outcomes different? After all, six is to four as thirty-six is to twenty-four. He recruited a friend, the mathematician Pierre de Carcavi, but together they were unable to work it out. So they asked a mutual friend, the great mathematician Blaise Pascal.
The solution to this problem isn’t actually that complicated. Cardano had got it exactly backward: the idea is not to look at the chances that something would happen by the number of goes you take, but to look at the chances it wouldn’t happen…
…
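The excerpt skips the working at this point, so what follows is our own sketch of the standard complement calculation, not a reconstruction of the book’s text: work out the chance that the event never happens across all the attempts, then subtract that from one. The helper name `at_least_once` is ours.

```python
def at_least_once(p_single, attempts):
    """Chance of at least one success in independent attempts, via the complement."""
    p_never = (1 - p_single) ** attempts  # every attempt has to miss
    return 1 - p_never

one_six = at_least_once(1 / 6, 4)        # a six in four rolls of one die
double_six = at_least_once(1 / 36, 24)   # a double six in 24 rolls of two dice
cardano = 4 * (1 / 6)                    # Cardano's multiply-it-up shortcut

print(f"one six in 4 rolls:        {one_six:.3f}")
print(f"double six in 24 rolls:    {double_six:.3f}")
print(f"Cardano's figure for both: {cardano:.3f}")
```

This gives roughly 0.518 and 0.491, matching the 0.52 and 0.49 quoted above, and it shows why the Chevalier’s proportional reasoning fails: the probability of never succeeding shrinks multiplicatively with each attempt, not in a straight line.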
… Pascal came up with a cheat. He wasn’t the first to use what we now call Pascal’s triangle—it was known in ancient China, where it is named after the mathematician Yang Hui, and in second-century India. But Pascal was the first to use it in problems of probability.
It starts with 1 at the top, and fills out each layer below with a simple rule: on every row, add the number above and to the left to the number above and to the right. If there is no number in one of those places, treat it as zero…
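The construction rule is short enough to sketch in a few lines. This is an illustrative version of ours (the name `pascal_rows` is hypothetical), counting the single 1 at the top as row zero, which we assume is the numbering that makes “row seven” in the excerpt the row for seven flips.

```python
def pascal_rows(n):
    """Return rows 0 through n of Pascal's triangle as lists of ints."""
    rows = [[1]]
    for _ in range(n):
        # Pad with zeros so a missing neighbor counts as zero
        prev = [0] + rows[-1] + [0]
        # Each entry is the sum of the two numbers above it
        rows.append([prev[i] + prev[i + 1] for i in range(len(prev) - 1)])
    return rows

for row in pascal_rows(7):
    print(row)
```

The last row printed is 1, 7, 21, 35, 35, 21, 7, 1.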
… Now, if you want to know what the possibility is of seeing exactly Y outcomes, say heads, on those seven flips:
It’s possible that you’ll see no heads at all. But it requires every single coin coming up tails. Of all the possible combinations of heads and tails that could come up, only one (tails on every single coin) gives you zero heads and seven tails.
There are seven combinations that give you one head and six tails. Of the seven coins, one needs to come up heads, but it doesn’t matter which one. There are twenty-one ways of getting two heads. (I won’t enumerate them all here; I’m afraid you’re going to have to trust me, or check.) And thirty-five of getting three.
You see the pattern? 1 7 21 35—it’s row seven of the triangle…
Pascal’s triangle is only one way of working out the probability of seeing some number of outcomes, although it’s a very neat way. In situations where there are two possible outcomes, like flipping a coin, it’s called a “binomial distribution.”
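To turn those counts into probabilities for seven fair coin flips, divide each by the total number of head-and-tail sequences, 2^7 = 128. The sketch below is ours and assumes a fair coin; it uses Python’s `math.comb` (available from Python 3.8) to get the same numbers as row seven of the triangle.

```python
from math import comb

flips = 7
total = 2 ** flips  # 128 equally likely sequences of heads and tails

# Number of sequences with exactly k heads, for k = 0..7
ways = [comb(flips, heads) for heads in range(flips + 1)]
probabilities = [w / total for w in ways]

for heads, (w, p) in enumerate(zip(ways, probabilities)):
    print(f"{heads} heads: {w:2d} of {total} ways, p = {p:.4f}")
```

The chance of no heads at all is 1 in 128, and the eight probabilities sum to one, as they must.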
But the point is that when you’re trying to work out how likely something is, what we need to talk about is the number of outcomes— the number of outcomes that result in whatever it is you’re talking about, and the total number of possible outcomes. This was, I think it’s fair to say, the first real formalization of the idea of “probability.”..
On the historical origins of the science of probability and statistics: “Rolling the Dice: What Gambling Can Teach Us About Probability,” from @TomChivers in @lithub.
See also: Against the Gods, by Peter Bernstein.
And for a look at how related concepts shape thinking among quantum physicists, see “The S-Matrix Is the Oracle Physicists Turn to in Times of Crisis.”
* Boethius, The Consolation of Philosophy
###
As we roll the bones, we might send carefully-calculated birthday greetings to a central player in this saga, Abraham de Moivre; he was born on this date in 1667. A mathematician, he’s known for de Moivre’s formula, which links complex numbers and trigonometry, and (more relevantly to the piece above) for his work on the normal distribution and probability theory. de Moivre was the first to postulate the central limit theorem (TLDR: the probability distribution of averages of outcomes of independent observations will closely approximate a normal distribution)– a cornerstone of probability theory. And in his time, his book on probability, The Doctrine of Chances, was prized by gamblers.
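That TLDR is easy to watch in action. The following is a rough simulation sketch of ours, with an arbitrary seed, 30 dice per average, and 20,000 averages; none of those numbers comes from de Moivre or from the pieces above.

```python
import random
import statistics

rng = random.Random(1667)   # arbitrary seed so reruns match
rolls_per_average = 30      # dice averaged together in each trial
trials = 20_000

# A single die is flat (every face equally likely), nothing like a bell
averages = [
    statistics.fmean(rng.randint(1, 6) for _ in range(rolls_per_average))
    for _ in range(trials)
]

center = statistics.fmean(averages)
spread = statistics.pstdev(averages)
within_one = sum(abs(a - center) <= spread for a in averages) / trials

print(f"mean of the averages: {center:.3f}")
print(f"standard deviation:   {spread:.3f}")
print(f"share within one standard deviation: {within_one:.3f}")
```

The averages cluster around 3.5, and roughly two-thirds of them fall within one standard deviation of the center, close to what a normal curve predicts, even though each individual roll is uniform.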




