(Roughly) Daily

Posts Tagged ‘statistical analysis’

“Devise, wit; write, pen; for I am for whole volumes in folio”*…


The ever-illuminating Jason Kottke dips into Statistical Reasoning for Everyday Life (Bennett, Briggs, and Triola; Addison Wesley Longman; Second Edition, 2002) for a measure of Shakespeare’s vocabulary.  Using a method recounted here, the authors concluded:

This means that in addition to the 31,534 words that Shakespeare knew and used, there were approximately 35,000 words that he knew but didn’t use. Thus, we can estimate that Shakespeare knew approximately 66,534 words.

Linguist Richard Lederer observes (as cited in this piece) that Shakespeare hadn’t begun to reach the bottom of the barrel:  there are currently over 600,000 entries in the Oxford English Dictionary (and in Shakespeare’s time things were especially fluid– as witnessed by the Bard’s own fevered invention of new words and phrases).

Still, Shakespeare’s facility is easier to appreciate in context when we recognize that the average English speaker has a vocabulary of (only) 10,000 to 20,000 words, and, as Lederer observes, actually uses only a fraction of that (the rest being recognition or recall vocabulary).

* Love’s Labour’s Lost I,ii

As we reach for our copies of Word Power, we might wish a glittering birthday to Anita Loos, who was born on this date in 1888. A writer from childhood, she sold a movie idea to D.W. Griffith at Biograph while she was still in her teens– and began a career through which she wrote plays, movies, stories/novels, magazine articles, and finally memoirs.

She’s probably best remembered for her 1925 novel Gentlemen Prefer Blondes.  Loos claimed to have written the spoof, which she started on a long train ride, as an entertainment for her friend H. L. Mencken (who reputedly had a fondness for Lorelei Lee-like blonds).  In any case, the book was an international bestseller, printed in 14 languages and in over 85 editions. It was a hit on Broadway in 1949, then adapted again into a movie musical in 1953– the Howard Hawks classic in which Marilyn Monroe reminds us that “Diamonds Are a Girl’s Best Friend.”

Loos with fellow writer (and sometime husband) John Emerson
by Edward Steichen for Vanity Fair, July 1928

Playing the odds…

A P value is the probability of an observed (or more extreme) result arising only from chance.
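To make that definition concrete, here is a minimal sketch (not taken from the Science News piece) that computes an exact one-sided P value for a coin-flip experiment. The function name `binomial_p_value` and the 8-heads-in-10-flips scenario are illustrative assumptions; "chance" is modeled here as a fair coin.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided P value for a binomial experiment (hypothetical helper).

    Returns the probability, assuming only chance is at work (each trial
    succeeds with probability p_null), of observing `successes` or more
    successes in `trials` trials.
    """
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Illustrative scenario: 8 heads in 10 flips of a supposedly fair coin.
    p = binomial_p_value(8, 10)
    print(f"P value for 8 or more heads in 10 flips: {p:.4f}")
```

Note what the number does and does not say: it is the probability of a result at least this extreme if the coin is fair, not the probability that the coin is fair given the result. Confusing the two is one of the misreadings the article goes on to describe.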

It’s science’s dirtiest secret: The “scientific method” of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.

Replicating a result helps establish its validity more securely, but the common tactic of combining numerous studies into one analysis, while sound in principle, is seldom conducted properly in practice.

Experts in the math of probability and statistics are well aware of these problems and have for decades expressed concern about them in major journals. Over the years, hundreds of published papers have warned that science’s love affair with statistics has spawned countless illegitimate findings. In fact, if you believe what you read in the scientific literature, you shouldn’t believe what you read in the scientific literature.

“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense”…

What’s one to make of the stream of “eat this,” “avoid that” studies surfacing nearly daily?  It’s an odds-on bet that readers will find out in the complete Science News story, “Odds Are, It’s Wrong.”

As we tell Monty that we’ll take what’s behind Door #2, we might recall that it was on this date in 1905 that Albert Einstein kicked off his “Annus Mirabilis” with the publication of the first of his four epoch-making papers in Annalen der Physik (this one proposing energy “quanta”), the year in which he reinvented physics and our understanding of reality.

The second of those papers, on Brownian motion, was the very first work of “statistical physics.”

Einstein, dressed for the patent office, 1905

Happy Naw-Rúz! This date in 1844 was the first day of the first year of the Bahá’í calendar.
