" /> Ezra's Research: January 2008 Archives


January 19, 2008

MapReduce slides from Google

Google has published a series of tutorial presentations about MapReduce, the Google File System (GFS), and distributed algorithms. I love this kind of stuff: the fruitful interaction of my two passions, programming languages and distributed computing.

The presentation shows functional programming in a good light; I have one quibble so far: the 3rd slide of the 2nd talk ("MapReduce Theory and Implementation") says, "Functional operations do not modify data structures: They always create new ones." And then, "Original data still exists in unmodified form." But this is an implementation detail; the compiler needn't keep the original data if it's not going to be used.

More helpful would be to note that, in a functional language, a variable's value never changes, thus making it easy to read the code. Whether input data stays around after an operation is a different issue, and one the compiler can easily handle better than humans, on average.
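
To make that concrete, here is a toy sketch (mine, not from the slides) in Haskell:

    -- Toy example: the binding xs is never mutated; squares xs is a new value.
    squares :: [Int] -> [Int]
    squares xs = map (^ 2) xs

    main :: IO ()
    main = do
      let xs = [1, 2, 3]
      print (squares xs)  -- [1,4,9]
      print xs            -- [1,2,3]: xs still denotes the same value
      -- Whether xs is physically kept around after its last use is up to
      -- the compiler and garbage collector, not something the programmer sees.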

January 14, 2008

Classics

I was surprised to find that these classic papers are available online:

It's interesting to read Church's papers to see how he's approaching the project. Initially, he's trying to clarify the notion of a free variable in logic, or rather to chuck it out in favor of a clear discipline for binding variables. Because he's looking at foundational issues in logic, he's at great pains to identify his assumptions.

A minor historical fact: alpha- and beta-conversion are defined in these early Church papers, but not with those names; they're identified as I-conversion and II-conversion (there is a conversion rule III, which looks like the reverse of beta-conversion).
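
For reference, here are the rules as they're usually stated today (my paraphrase, not Church's notation):

    % Rule I (alpha-conversion): rename a bound variable
    %   (side condition: y does not occur free in M)
    \lambda x.\, M \;\longrightarrow\; \lambda y.\, M[x := y]
    % Rule II (beta-conversion): apply a function to its argument
    (\lambda x.\, M)\, N \;\longrightarrow\; M[x := N]
    % Rule III: the expansion that reverses beta-conversion
    M[x := N] \;\longrightarrow\; (\lambda x.\, M)\, N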

January 7, 2008

On Why Most Published Research Findings Are False

Here's a stimulating article: Why Most Published Research Findings Are False by John P. A. Ioannidis (PLoS Medicine, 2005). It focuses on research that aims to find statistical relationships in data, and asserts that most such relationships claimed in the literature are in fact false. Distilling the discussion, I find these compelling reasons why it would be so:

  • standard tests of "statistical significance" are taken as proof of a proposition,
  • there is bias in experimental design & interpretation,
  • researchers and journals prefer positive results,
  • experiments tend not to be independently reproduced.

This last point is particularly damning—few things are more essential to the scientific method than reproducible experiments, yet the article blithely says (and I readily believe) that most biomedical studies are not reproduced. In fact, the competitive publication cycle works against this: merely to confirm an existing result is not very publishable; to contradict an existing result may be publishable, but this means, as Ioannidis notes, that there can be an alternation of positive, then negative, then positive, then negative results on a particular question, as each team becomes interested in upsetting the last published result. Far from amassing independent evidence on a question, this is just another selection bias that works against the scientific process.

Interestingly, the article is wholly unscientific. Without stating its assumptions, it works those assumptions through to conclusions. Along the way, it presents a bunch of formulas, which add the gloss of analysis to what is essentially a work of persuasive writing—but I don't buy the formulas, which include unobservable (and perhaps ill-defined) quantities such as "the ratio of true relationships to no-relationship pairs in a field" and "the false-negative error rate." (How amusing would it be if this article were a false research finding?) But methodology aside, I do believe it: many, if not most, published research findings are false.
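
For concreteness, the central formula (restated here from memory, in my own notation) gives the post-study probability that a claimed relationship is true, the positive predictive value, as:

    % R:     pre-study odds that a probed relationship is true
    %        (the unobservable "ratio of true relationships to no relationships")
    % alpha: Type I error rate
    % beta:  Type II (false-negative) error rate
    \mathrm{PPV} = \frac{(1 - \beta)\, R}{R - \beta R + \alpha}

The paper goes on to add a bias term, but even this simple version turns entirely on R and beta, exactly the quantities I doubt anyone can observe.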

I'd be interested to see someone look at the issue in other kinds of fields—fields that aren't quantitative, for example. In the field of programming languages, people make a lot of claims that are justified by proofs. How often are these proofs actually correct, I wonder? And how often are the claims correct? Moreover, much PL research is not even "claim-based," as such. Many papers simply present a feature or technique and tout its virtues—and don't make falsifiable claims at all. And often, this is the most valuable research: someone tried something and described the experience. We can learn from others' experience, even without proven claims.

How do we assess the value of a research field, such as that of programming languages? How do we know when we're doing a good job?