When scientists in England reported last April that what a woman eats around the time she conceives can affect whether she has a boy or a girl—the headline-making finding of the study, titled “You Are What Your Mother Eats,” was that women who ate breakfast cereal were more likely to have a boy—it was picked up by more newspapers and websites than you can count. Basically, they reported that 56 percent of women who consumed the most calories (including breakfast cereal) before conceiving had boys, while only 45 percent of women who consumed the fewest calories did. Now comes the not-so-fast part.
In a paper published online in the same journal as the original study, scientists led by statistician S. Stanley Young, assistant director of the National Institute of Statistical Sciences, call the link between eating cereal and giving birth to a boy “most likely a multiple testing false positive.” They contend that the correlation between eating cereal and having a boy is easily explained by chance, and is not a true cause-and-effect relationship.
The statistical argument is somewhat arcane, but can be boiled down to this: if you test lots and lots of things (behaviors, genes, anything) to see whether any of them correlates with some outcome (getting a disease, having a boy, whatever), then you are bound to get a hit by chance alone. Think of it this way: you are curious about whether particular articles of clothing correlate with getting run over by a car, and you test hundreds of clothes (pink shirts, blue shirts, black pants, blue jeans, do-rags, cowboy hats . . . ). By statistical fluke, people wearing some article of clothing are going to be more likely to get run over than people wearing any other article of clothing. Look at enough pairings of supposed cause and effect, and by chance alone you’ll find one.
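You can watch this happen in a quick simulation. The sketch below is purely illustrative—the sample size and significance test are assumptions, not details from the study—but it shows how, when nothing at all is going on, running hundreds of comparisons still produces a crop of "significant" results:

```python
import random

random.seed(0)

N_TESTS = 396   # number of separate comparisons run (the critics' figure)
N_WOMEN = 100   # hypothetical sample size per comparison (an assumption)
Z_CUTOFF = 1.96 # two-sided 5 percent significance threshold

false_positives = 0
for _ in range(N_TESTS):
    # Under the null hypothesis each birth is a fair coin flip:
    # no food item truly influences the sex of the baby.
    boys = sum(random.random() < 0.5 for _ in range(N_WOMEN))
    p_hat = boys / N_WOMEN
    # z-score of the observed proportion of boys against the null of 0.5
    z = (p_hat - 0.5) / (0.25 / N_WOMEN) ** 0.5
    if abs(z) > Z_CUTOFF:
        false_positives += 1

# Roughly 5 percent of the 396 null comparisons—about 20 of them—will
# clear the significance bar by luck alone.
print(false_positives)
```

Every one of those hits is a "cereal effect" waiting for a press release, even though the simulated data contain no effect whatsoever.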
“Multiple testing can easily lead to false discoveries,” write Young and his co-authors. “Hundreds of comparisons were conducted . . . [so] the claimed effects are readily explainable by chance.”
The authors of the original paper, led by Fiona Mathews of the University of Exeter, are having none of it. They deny running hundreds of tests in a search for something that correlates with sex determination. Firing back, they say that their critics' “account of our work bears little relationship to the methods, results or conclusions we report. For example, [they] claim that we used 396 tests to address our primary hypothesis. In fact, we used two.” (Young disputes this. How many tests Mathews ran depends on what you mean by “tests.” And, probably, by “how many.”)
While we let the two teams duke it out, I am reminded of the cautionary note that John Ioannidis of Tufts University and the University of Ioannina School of Medicine (in Greece) sounded in 2005. The title of his paper, “Why Most Published Research Findings Are False,” says it all.