
On Why Most Published Research Findings Are False

Here's a stimulating article: Why Most Published Research Findings Are False by John P. A. Ioannidis (PLoS Medicine, 2005). It focuses on research that aims to find statistical relationships in data, and asserts that most such relationships claimed in the literature are in fact false. Distilling the discussion, I find these compelling reasons why it would be so (a small simulation after the list shows how they combine):

  • standard tests of "statistical significance" are taken as proof of a proposition,
  • there is bias in experimental design & interpretation,
  • researchers and journals prefer positive results,
  • experiments tend not to be independently reproduced.
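To see how these feed each other, here is a small simulation of my own; the numbers (5% of tested hypotheses real, roughly 80% power, the usual p < 0.05 threshold) are illustrative assumptions, not figures from the article. If only "significant" results get written up, a majority of the findings that reach publication are false positives.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n_per_group, effect = 10_000, 25, 0.8   # ~80% power when the effect is real

    false_published = true_published = 0
    for real in rng.random(n_studies) < 0.05:           # only 5% of hypotheses are real
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect if real else 0.0, 1.0, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:                                     # "significant", so it gets published
            if real:
                true_published += 1
            else:
                false_published += 1

    # Roughly 55% of the "published" findings here are false positives.
    print(false_published / (true_published + false_published))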

This last point is particularly damning—few things are more essential to the scientific method than reproducible experiments, yet the article blithely says (and I readily believe) that most biomedical studies are not reproduced. In fact, the competitive publication cycle works against this: merely to confirm an existing result is not very publishable; to contradict an existing result may be publishable, but this means, as Ioannidis notes, that there can be an alternation of positive, then negative, then positive, then negative results on a particular question, as each team becomes interested in upsetting the last published result. Far from amassing independent evidence on a question, this is just another selection bias that works against the scientific process.

Interestingly, the article is itself wholly unscientific. Without stating its assumptions, it works those assumptions through to conclusions. Along the way, it presents a bunch of formulas, which add the gloss of analysis to what is essentially a work of persuasive writing—but I don't buy the formulas, which include unobservable (and perhaps ill-defined) quantities such as "the ratio of true relationships to no-relationship pairs in a field" and "the false-negative error rate." (How amusing would it be if this article were a false research finding?) But methodology aside, I do believe it: that many, if not most, published research findings are false.
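For readers who want to see the arithmetic anyway: the article's central quantity is the positive predictive value of a claimed finding, PPV = (1 − β)R / (R − βR + α), where R is that ratio of true relationships to no-relationship pairs, α the significance level, and β the false-negative rate. A minimal sketch, with illustrative values of my own, of how a low R drags PPV below one half:

    def ppv(R, alpha=0.05, beta=0.20):
        """Positive predictive value of a claimed finding,
        PPV = (1 - beta) * R / (R - beta * R + alpha), per the article."""
        return (1 - beta) * R / (R - beta * R + alpha)

    # With the conventional alpha = 0.05 and 80% power, PPV falls below 1/2
    # whenever R < alpha / (1 - beta) = 1/16, i.e. whenever fewer than about
    # one in seventeen tested relationships is real.
    print(ppv(1 / 19))   # ~0.46: most claimed findings false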

I'd be interested to see someone look at the issue in other kinds of fields—fields that aren't quantitative, for example. In the field of programming languages, people make a lot of claims that are justified by proofs. How often are these proofs actually correct, I wonder? And how often are the claims correct? Moreover, much PL research is not even "claim-based," as such. Many papers simply present a feature or technique and tout its virtues—and don't make falsifiable claims at all. And often, this is the most valuable research: someone tried something and described the experience. We can learn from others' experience, even without proven claims.

How do we assess the value of a research field, such as that of programming languages? How do we know when we're doing a good job?

Comments

It was interesting for me to come into programming languages (PL) from computer architecture (CA). In the latter, papers are generally published based on comparative analysis and performance (or, e.g., power) improvements. These are quantitative changes that are often shown through simulation and benchmarking.

One issue that has plagued the CA field is the (un-)reliability of simulation for analysis. Most of the time, it is not correlated to real systems, because real systems are so much more complex. And yet, it is prohibitively expensive for most academic entities to build real systems. So, the cycle continues.

With PL, I have been pleasantly surprised at the type of research being published. Like you said, many papers simply present a technique. As a reader, it is a way to expand your mind and to debate the true usefulness of a given idea. Unlike with CA, I don't feel like I'm overrun with numbers that may or may not have a valid meaning. I enjoy reading PL papers so much more.
