(Roughly) Daily

Posts Tagged ‘statistical analysis’

“Gentlemen, you need to add armor-plate where the holes aren’t, because that’s where the holes were on the airplanes that didn’t return”*…

Diagram of bullet-holes in WWII bombers that returned

Allied bombers were key to Britain’s air offensive against Germany during the second world war. As such, the RAF wanted to armour their bombers to prevent them from being shot down. But armour is heavy – you cannot reinforce an entire bomber and still have it fly. So statistician Abraham Wald was asked to advise on where armour should be placed on a bomber.

After each wave of bombing, every returning aircraft was meticulously examined and a note was made of where each aircraft had sustained damage by the Germans. The image [above] conceptualises what Wald’s data might have looked like visually.

So what was Wald’s advice? Where should armour be added?

He essentially advised the RAF to add armour to places where you do not find bullet holes. Wait… what?!

Wald wisely understood that the data was based only on planes that survived. The planes that did not survive were likely to have sustained damage on the areas where we do not observe bullet holes – such as around the engine or cockpit…
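Wald's reasoning can be illustrated with a toy simulation. This is a minimal sketch, not Wald's actual method or data: the four sections, the per-hit loss probabilities, and the assumption that hits land uniformly at random are all invented for illustration.

```python
import random

# Hypothetical chance that a single hit to each section brings the plane down.
# Illustrative numbers only; these are not Wald's figures.
LOSS_PROB_PER_HIT = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return (hits on all planes, hits on returning planes), counted per section."""
    rng = random.Random(seed)
    sections = list(LOSS_PROB_PER_HIT)
    all_hits = dict.fromkeys(sections, 0)
    survivor_hits = dict.fromkeys(sections, 0)
    for _ in range(n_planes):
        # Assumption: hits land uniformly at random across the four sections.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # The plane returns only if every hit fails to bring it down.
        survived = all(rng.random() >= LOSS_PROB_PER_HIT[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits


def shares(counts):
    """Convert raw hit counts into each section's share of the total."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


if __name__ == "__main__":
    all_hits, survivor_hits = simulate()
    for s in LOSS_PROB_PER_HIT:
        print(f"{s:9s} all planes: {shares(all_hits)[s]:.2f}   "
              f"returned: {shares(survivor_hits)[s]:.2f}")
```

Under these assumptions every section is hit about equally often across the whole fleet, yet engine and cockpit hits are under-represented among the planes that return, which is the pattern Wald read as a reason to armour those areas.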

Making better decisions: one of the most prevalent– and insidious– forms of selection bias, survivorship bias, illustrated: “How to armour a WWII bomber.”

See also: “How to avoid being duped by survivorship bias.”

###

As we think clearly, we might send productive birthday greetings to W. Edwards Deming; he was born on this date in 1900. An engineer, statistician, professor, author, lecturer, and management consultant, he helped develop the sampling techniques still used by the U.S. Department of the Census and the Bureau of Labor Statistics.

But he is better remembered as the champion of statistically-based production management techniques that first gained traction in post-WWII Japan, where many credit Deming as a key ingredient in what has become known as the Japanese post-war economic miracle of 1950 to 1960, when Japan rose from the ashes of war onto its path to becoming the second-largest economy in the world– through processes shaped by the ideas Deming taught. In 1951, the Japanese government established the Deming Prize in his honor.

While his impact in Japan (finally) brought him to the attention of business leaders in the U.S., he was only just beginning to win widespread recognition in the U.S. at the time of his death in 1993.

source

“Devise, wit; write, pen; for I am for whole volumes in folio”*…

source

The ever-illuminating Jason Kottke dips into Statistical Reasoning for Everyday Life (Bennett, Briggs, and Triola; Addison Wesley Longman; Second Edition, 2002) for a measure of Shakespeare’s vocabulary.  Using a method recounted here, the authors concluded:

This means that in addition to the 31,534 words that Shakespeare knew and used, there were approximately 35,000 words that he knew but didn’t use. Thus, we can estimate that Shakespeare knew approximately 66,534 words.

Linguist Richard Lederer observes (as cited in this piece) that Shakespeare hadn’t begun to reach the bottom of the barrel: there are currently over 600,000 entries in the Oxford English Dictionary (and in Shakespeare’s time things were especially fluid– as witnessed by the Bard’s own fevered invention of new words and phrases).

Still, Shakespeare’s facility is easier to appreciate in context when we recognize that the average English speaker has a vocabulary of (only) 10,000 to 20,000 words, and, as Lederer observes, actually uses only a fraction of that (the rest being recognition or recall vocabulary).

* Love’s Labour’s Lost I,ii

As we reach for our copies of Word Power, we might wish a glittering birthday to Anita Loos, who was born on this date in 1888. A writer from childhood, she sold a movie idea to D.W. Griffith at Biograph while she was still in her teens– and began a career through which she wrote plays, movies, stories/novels, magazine articles, and finally memoirs.

She’s probably best remembered for her 1925 novel Gentlemen Prefer Blondes. Loos claimed to have written the spoof, which she started on a long train ride, as an entertainment for her friend H. L. Mencken (who reputedly had a fondness for Lorelei Lee-like blondes). In any case, the book was an international bestseller, printed in 14 languages and in over 85 editions. It was a hit on Broadway in 1949, then adapted again into a movie musical in 1953– the Howard Hawks classic in which Marilyn Monroe reminds us that “Diamonds Are a Girl’s Best Friend.”

Loos with fellow writer (and sometime husband) John Emerson
by Edward Steichen for Vanity Fair, July 1928

Playing the odds…

A P value is the probability of an observed (or more extreme) result arising only from chance.
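That definition can be made concrete with a coin-flip example. This is a minimal sketch, assuming a fair coin as the "chance" model and a one-sided test; the function name and the choice of 60 heads in 100 flips are illustrative and do not come from the Science News piece.

```python
from math import comb


def one_sided_p_value(heads, flips):
    """Probability of at least `heads` heads in `flips` tosses of a fair coin."""
    # Count the outcomes at least as extreme as the one observed...
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # ...and divide by the number of equally likely head/tail sequences.
    return favourable / 2**flips


if __name__ == "__main__":
    # How surprising is 60 heads in 100 flips if the coin is fair?
    print(one_sided_p_value(60, 100))
```

Note what the number does and does not say: it is the probability of data this extreme given that chance alone is at work, not the probability that chance alone is at work given the data.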

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”
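The basic message can be sketched with Bayes' rule. This is a simplified illustration under stated assumptions, not Ioannidis's full analysis: `prior` (the share of tested hypotheses that are actually true), `power`, and `alpha` are hypothetical inputs, and bias and multiple competing teams are ignored.

```python
def positive_predictive_value(prior, power, alpha):
    """Share of 'statistically significant' findings that reflect a real effect.

    prior: fraction of tested hypotheses that are true (assumed, rarely known)
    power: chance a real effect is detected
    alpha: chance a non-effect is flagged as significant anyway
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Well-powered studies of fairly plausible hypotheses.
    print(positive_predictive_value(prior=0.10, power=0.8, alpha=0.05))
    # Under-powered studies of long-shot hypotheses.
    print(positive_predictive_value(prior=0.01, power=0.2, alpha=0.05))
```

With these made-up inputs the first case gives 0.64 and the second roughly 0.04, so whether most significant findings are false depends heavily on how plausible the tested hypotheses were to begin with, which is exactly the quantity nobody can measure directly.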

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense”…

What’s one to make of the stream of “eat this,” “avoid that” studies surfacing nearly daily?  It’s an odds-on bet that readers will find out in the complete Science News story, “Odds Are, It’s Wrong.”

As we tell Monty that we’ll take what’s behind Door #2, we might recall that it was on this date in 1905 that Albert Einstein kicked off his “Annus Mirabilis” with the publication of the first of his four epoch-making papers in Annalen der Physik– this one, proposing energy “quanta”– and so began the year in which he reinvented physics and our understanding of reality.

The second of those papers, on Brownian motion, was the very first work of “statistical physics.”

Einstein, dressed for the patent office, 1905

Happy Náw-Rúz! This date in 1844 was the first day of the first year of the Bahá’í calendar.