January 19, 2008

MapReduce slides from Google

Google has published a series of tutorial presentations about MapReduce, the GFS, and distributed algorithms. I love this kind of stuff: the fruitful interaction of my two passions, programming languages and distributed computing.

The presentation shows functional programming in a good light; I have one quibble so far: The 3rd slide of the 2nd talk ("MapReduce Theory and Implementation") says, "Functional operations do not modify data structures: They always create new ones." And then, "Original data still exists in unmodified form." But this is an implementation detail; the compiler needn't keep the original data if it's not going to be used.

More helpful would be to note that, in a functional language, a variable's value never changes, thus making it easy to read the code. Whether input data stays around after an operation is a different issue, and one the compiler can easily handle better than humans, on average.
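To illustrate the distinction, here is a minimal Python sketch (Python standing in for a functional language; the function name is my own):

```python
def double_all(xs):
    # Functional style: build a new list; the argument is never mutated.
    return [2 * x for x in xs]

original = [1, 2, 3]
doubled = double_all(original)

# The binding 'original' is unchanged -- that is the property that matters
# for reasoning about the code. Whether its storage could be reclaimed or
# reused (say, had we written 'original = double_all(original)') is an
# implementation detail the runtime is free to exploit.
```

The reader-visible guarantee is the unchanged binding; the fate of the old storage is the compiler's business.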

February 9, 2007

Whistle While You Work

A recent tech report from Berkeley surveys the state of parallel computing [via Tim Bray]—particularly with an eye to the importance and difficulty of coding for multicore processors (one of which is helping me author this article). The bulk of the article deals with hardware design but also discusses programming systems (that's my bailiwick) and application design issues.

I've not read the whole thing yet, but I've already found some surprises:

  • For measuring success, the authors promote "dwarfs"—something like patterns of algorithms—as opposed to conventional benchmarks on particular tasks. It's not clear why this is better for measuring purposes, but the dwarfs do convey nicely some areas of challenge (e.g. "linear algebra," "spectral methods," "Monte Carlo methods").
  • They list twelve recent changes to the conventional wisdom (from no-brainers like "power is free, transistors are expensive"—which has now inverted—to more surprising ones such as "Researchers [can no longer] demonstrate new ideas by building chips.")
  • Transactional memory, a hot topic in PL research, is given some significant airtime. The authors don't mince words over the conventional alternative, locking:
    These locking schemes are notoriously difficult to program, as the programmer has to remember to associate a lock with every critical data structure and to access only these locks using a deadlock-proof locking scheme. Locking schemes are inherently noncomposable and thus cannot form the basis of a general parallel programming model. Worse, these locking schemes are implemented using spin waits, which cause excessive coherence traffic and waste processor power.
    Transactional memory seems like a clear win, though it's not widely implemented and needs to get some experience behind it.
  • With regard to programming paradigms, they put a great emphasis on the role of human programmers in the process of making software:
    We believe that future successful programming models must be more human-centric. They will be tailored to the human process of productively architecting and efficiently implementing, debugging, and maintaining complex parallel applications on equally complex manycore hardware.
    This seems to me quite wise, although it's still unclear whether we can understand the cognitive (or psychological) processes in programming well enough to design for them.
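The contrast with locking can be sketched concretely. Below is a toy optimistic-concurrency cell in Python—a sketch of the transactional idea only, not a real STM implementation (the class and method names are my own; the internal lock exists only to make the compare-and-swap step atomic in this sketch):

```python
import threading

class VersionedCell:
    """A toy transactional cell: optimistic read, validate, retry."""
    def __init__(self, value):
        self._value = value
        self._version = 0
        self._lock = threading.Lock()  # only to make the CAS step atomic here

    def read(self):
        return self._value, self._version

    def _cas(self, expected_version, new_value):
        with self._lock:
            if self._version == expected_version:
                self._value = new_value
                self._version += 1
                return True
            return False

    def transact(self, fn):
        # Optimistic loop: compute against a snapshot, commit only if the
        # cell is unchanged, otherwise retry. No lock is held while fn runs,
        # so there is nothing for the programmer to forget or deadlock on.
        while True:
            value, version = self.read()
            if self._cas(version, fn(value)):
                return

cell = VersionedCell(0)
threads = [threading.Thread(target=cell.transact, args=(lambda v: v + 1,))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
# All 100 increments commit exactly once, despite concurrent retries.
```

Unlike lock-based code, two such transactions compose: running them in sequence needs no global lock-ordering discipline.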

Section 5.4 seems a haunting parable for language designers:

Programming languages, compilers, and architectures have often placed their bets on one style of parallel programming, usually forcing programmers to express all parallelism in that style. Now that we have a few decades of such experiments, we think that the conclusion is clear: some styles of parallelism have proven successful for some applications, and no style has proven best for all.

There seems to be a lot of wisdom collected here, which cuts across subfields of CS—as in a list of which "dwarfs" are commonplace in which application domains. This appears to be a quite useful reference for those of us who don't work in any particular application domain but make systems for programmers to use. I nudge you gently in the direction of this article.

August 1, 2006

Lambda 5: Modal Logic Inspires a "Distributed" Programming Language

At the Oregon Summer School 2006, Robert Harper presented his Lambda 5, a modal-logic-as-distributed-programming-language. The idea here was that modal logic allows you to reason about things that are true in some "possible world"—and since, via Curry-Howard, true statements are like types of values that are known to exist, the programming-language interpretation of "proposition true at a possible world" is "data available at a possible world"; Harper et al. interpret a "possible world" as a node on a network. Thus, Lambda 5 gives you a way to write terms that are local or mobile: you create local values by publishing an address, which others can use; you create mobile values by packaging them up in a context-independent way. You consume the latter by simply "unboxing" them. The address of a local value is harder to consume—you need to use a binding construct whose body is executed in a hypothetical world where only the addressed value is available.
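A loose analogy in Python may help fix the ideas (this is my own toy model, not Lambda 5 syntax; `Box`, `Addr`, `letd`, and the `worlds` table are all invented for illustration):

```python
class Box:
    """A 'mobile' value: packaged context-independently, so any world
    (network node) may unbox it."""
    def __init__(self, thunk):
        self._thunk = thunk
    def unbox(self):
        return self._thunk()

class Addr:
    """A 'local' value: an address published by one world. Consuming it
    requires evaluating at that world (modelled here as a table lookup)."""
    def __init__(self, world, key):
        self.world, self.key = world, key

# Each 'possible world' holds its own local data.
worlds = {"A": {"secret": 42}, "B": {}}

def letd(addr, body):
    # The binding construct: run 'body' in a hypothetical world where only
    # the addressed value is available.
    return body(worlds[addr.world][addr.key])

mobile = Box(lambda: 1 + 1)      # safe to ship anywhere and unbox there
local = Addr("A", "secret")      # usable only via the binding construct

result = letd(local, lambda v: v + mobile.unbox())
```

The asymmetry is the point: mobile values are trivially consumed anywhere, while local values force the computation to come to them.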

Harper's presentation was fast & furious, and if you weren't already adept with propositions-as-types, you were probably lost at the beginning. I held on with my toes. Afterward, I got the paper [1, 2] and have been looking at it, trying to figure out if it's going to help me with the issues of locality and distribution that I've been dealing with in the context of Links.

There's an important drawback, or at least caveat, about Lambda 5 as a theoretical approach to distributed computing. It explicitly avoids issues of concurrency. The entire multi-node network of locations is assumed to have a single thread of control. When you consider that there is a substantial bulk of research called "Distributed Computing" and that this research is almost all primarily concerned with issues of concurrency, you might consider this omission rather serious. In the second paper [2], Harper and company use letcc to make control flow explicit, arguing that this gives a way to recover concurrency. But if I understand what they're doing, it amounts to a kind of explicit, cooperative multi-tasking—a poor approximation to the situation (and the problems) of distributed systems. My question for this line of research: What are you trying to accomplish?

Oregon Summer School: Further Notes

I've been putting off writing up the rest of the lectures from the Oregon Summer School because I seem to have lost my trusty notebook where all the brilliant things were intoned. But I guess it's gone on long enough; the world must know what happened at Oregon Summer School 2006.

Some of the presenters I mentioned before talked on more than one topic. Michael Hicks, after talking about futures, went on to discuss some work on "dynamic software update" or DSU. Hicks has developed a set of techniques for dynamically loading a new version of an existing program, and switching it over while it runs, with zero downtime. This includes translating old data representations to new ones. As I listened to the talk, I kept thinking, "OK, sure, we're going to design our code in this highly-restricted way, and then as long as we're willing to bend over backwards to do some tricky retrofitting, then hypothetically we might be able to update the code without downtime."

I was wrong! Hicks and his team have actually taken three years' worth of updates to sshd and vsftpd (amounting to a couple dozen updates, including some whole-number revisions) and updated running instances with each successive version, all without crashing or taking the server down. I was quite astonished that these techniques could be applied to changes that have already been made in the wild. Of course, they had to write some code to translate in-memory data structures on the fly—but they didn't have to re-architect the application to make it fit. Everyone in the seminar room woke up when Hicks showed the slide showing all the versions, with their dates, that had been dynamically loaded into these servers.
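The data-translation half of the story can be sketched. When code is hot-swapped, live in-memory state has to be mapped from the old representation to the new one; something like the following, where the type names, fields, and defaults are my own illustration rather than Hicks's actual API:

```python
from dataclasses import dataclass

@dataclass
class ConnV1:           # old version: single "host:port" address string
    addr: str
    bytes_sent: int

@dataclass
class ConnV2:           # new version: split host/port, plus a new field
    host: str
    port: int
    bytes_sent: int
    bytes_received: int

def migrate_conn(old: ConnV1) -> ConnV2:
    host, _, port = old.addr.partition(":")
    # Existing state is preserved; fields the old version never tracked
    # get a conservative default.
    return ConnV2(host=host, port=int(port or 22),
                  bytes_sent=old.bytes_sent, bytes_received=0)

live_state = [ConnV1("10.0.0.5:22", 1024)]
live_state = [migrate_conn(c) for c in live_state]  # applied at update time
```

The hard part in practice is that these translators must be written for every representation change in every update, while the process keeps serving traffic.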

I would be interested to see whether these DSU techniques turn out to be a good software-engineering tradeoff in the long run. Most of the time, just having an extra machine to handle load while you bounce individual servers to the new version is a cheap way to get the same result. And you still have the challenge of writing your updates so that they're compatible on the wire: you can update sshd's internal structures on the fly, but updating the protocol might be more challenging. Also, to be slightly critical, sshd and vsftpd together make a pretty constrained class of software: slow-changing servers that mainly wait for connections and spawn off processes to handle them. Would this work for a more sophisticated system like a fancy real-time game system, where the gamers are actively interacting through the system?

Matthew Flatt argued for programming-language features inspired by OS features. The case was reasonably compelling: an IDE like DrScheme needs to run user programs in a special bomb-proof box, so that user programs can't impinge on the workings of DrScheme itself. This extends to lots of issues: device ownership, memory consumption, non-termination. Flatt argued for an abstraction called a "custodian" that manages all those resources together; killing the custodian frees up all the resources it manages. At the same time, he wants to enable sharing of data between programs, as an OS might allow. This makes the memory-management problem much harder, of course, since you need a policy for determining which custodian is "charged" for a block of memory, when it's shared between many. Flatt outlined a policy, whose details I didn't get, which apparently works in his setting.
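The custodian idea reduces to a simple invariant—every resource is charged to exactly one owner, and killing the owner releases everything—which a small sketch can capture (class and method names are mine; real custodians in Flatt's system also cover threads, ports, and memory accounting):

```python
class Custodian:
    """Sketch of the 'custodian' abstraction: one object owns a set of
    resources; shutting it down releases everything it manages."""
    def __init__(self):
        self._resources = []

    def manage(self, resource):
        self._resources.append(resource)
        return resource

    def shutdown(self):
        # Release in reverse acquisition order, then forget everything.
        for r in reversed(self._resources):
            r.close()
        self._resources.clear()

class FakeHandle:
    """Stand-in for a file, socket, or device owned by a user program."""
    def __init__(self): self.closed = False
    def close(self): self.closed = True

cust = Custodian()
h1 = cust.manage(FakeHandle())
h2 = cust.manage(FakeHandle())
cust.shutdown()   # frees every resource charged to this custodian at once
```

The memory-sharing complication Flatt raised doesn't appear in this sketch: it arises exactly when one resource is reachable from programs under two different custodians, and some policy must pick who is charged.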

Sandhya Dwarkadas talked about transactional memory from the hardware point of view. Unfortunately, her talk was pitched in the vocabulary of computer architects, so I didn't understand any of it! At a high level, what I took away was that transactional memory might be easy for processor makers to provide, by taking advantage of the cache-coherency systems that are already being included in multiprocessor machines.

Jeff Foster talked about another system for statically detecting race conditions, like Flanagan's for Java, but this time for C code. It amounts to a kind of pointer alias analysis, and the details are very complicated. A question that wasn't raised, which just occurred to me: Why was alias analysis necessary in C but not in Java? I think the answer will be that the Java system may assume that most accesses to data members happen within the class definition (and thus are not through aliased references).

Shaz Qadeer had the true misfortune of presenting last, after we'd patiently sat through 48 hours of lectures. For myself, I know I didn't follow his (or Jeff Foster's) presentation as closely as most of the others. Someone has to go last, I guess. Qadeer's presentation was on model-checking concurrent software. Some of the material he presented was basic model-checking stuff (like "What is LTL?") but he quickly jumped ahead to cover fancy techniques for state-space reduction. I'm always surprised when speakers do that. If you assume that I don't know the basics, then why do you expect me to absorb both those and some advanced material in one lecture? If you want to focus on the advanced stuff, then why not just say, "This is for people who already know X," and give a quick refresher for X? The advanced students were probably bored while we newbies asked questions about LTL, and we newbies got bored once our intuition had been outstripped and we couldn't follow the lecture closely anymore.

All in all, the quality of the presentations at the Summer School was quite high. I was surprised that I could follow about 40 of the 48 hours of lectures, and got something out of almost every one (the previous 48 seminars I'd attended didn't have half that hit rate).

We also had a great time: Jim Allen's nightly walks around Eugene were quite nice, and we always ended up at a pub (if you like beer, they have lots of good ones in Oregon [my favorite: the Black Butte Porter, not to everyone's taste]). I met loads of people there and really enjoyed it. To PhD students in the US and abroad, I'd suggest you go to the Oregon Summer School in future years.

May 7, 2006

Pi-calculus solves consensus

Dolev, Dwork and Stockmeyer in 1987 examined the variations of asynchronous message-passing models where consensus is solvable [1]. They examined the axes of processor synchrony (can processors sleep indefinitely or is their sleep bounded?), network synchrony (are messages received in a bounded number of steps after being sent?) and message-order synchrony (are messages on a channel received in the order they're sent?). The result was that processor synchrony doesn't help (1-resilient consensus is unsolvable whether processors are synchronous or not) but either of the other two parameters alone makes the difference between no fault-tolerant consensus and n-resilient consensus (any number of process failures can be tolerated).

The (synchronous, or classic) pi-calculus is a "model of concurrency" where messages are rendezvous points: messages are received right when they're sent and thus in order. This implies that pi-calculus can solve n-resilient consensus.
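The intuition for why ordered delivery makes consensus easy can be sketched: if every process observes messages in the order they were sent, everyone can simply decide the first value ever sent. The model below is my own illustration (a shared ordered log standing in for the channel), not a pi-calculus encoding, which would use rendezvous:

```python
class OrderedChannel:
    """Messages are observed in exactly the order they were sent."""
    def __init__(self):
        self._log = []

    def send(self, value):
        self._log.append(value)

    def first(self):
        return self._log[0]

chan = OrderedChannel()
proposals = {"p1": "red", "p2": "green", "p3": "blue"}
for pid, value in proposals.items():
    chan.send(value)        # some senders may crash after this point

# Each process decides the first-sent value; order agreement gives
# value agreement, regardless of which processes later fail.
decisions = {pid: chan.first() for pid in proposals}
```

The asynchronous message-passing model forbids exactly this move: without an order (or timing) guarantee, no process can ever be sure it has seen the "first" message.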

In case you're not convinced, here's a pi-calculus expression that solves it*.

* For pedants: my solution is only (n–2)-resilient: it deadlocks if only one process is non-faulty.

UPDATE: C. Palamidessi had already shown that the synchronous pi-calculus can solve consensus and that the asynchronous pi-calculus cannot [2]. Wischik and Wischik have a protocol for synchronous rendezvous [3], which suggests that pi-calculus rendezvous is implementable outright given the right primitives, and probabilistically implementable in any case.

[1] Dolev, D., Dwork, C., and Stockmeyer, L. 1987. On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan. 1987), 77-97.
[2] Palamidessi, C. 1997. Comparing the expressive power of the synchronous and the asynchronous π-calculus. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Paris, France, January 15 - 17, 1997). POPL '97. ACM Press, New York, NY, 256-265.
[3] Wischik, L. and Wischik, D. 2004. A Reliable Protocol for Synchronous Rendezvous. Technical Report UBLCS-2004-1, Dept. of Computer Science, Univ. of Bologna, Feb. 2004.

April 21, 2006

What Can Be Implemented Anonymously?

Here's another interesting paper in distributed computing that I read recently: "What Can Be Implemented Anonymously?" by Guerraoui and Ruppert in DISC 2005.

Lots of the classic distributed algorithms require every process to have (and use) a unique identifier; this makes some problems easy to solve. The paper asks which problems can be solved, in a shared-memory model, without such IDs, and gives good answers. Of course, this anonymity is a technical one and is quite different from the notion of personal privacy, but the existence of such anonymous algorithms might facilitate some real applications that offer such privacy.

First they implement an object called a weak counter, which returns non-decreasing numbers over time, and they use this to implement the other algorithms in the paper. Then they give efficient implementations of consensus and snapshot objects, with space complexity bounds for their algorithms. Finally, they characterize the distributed objects that can be implemented anonymously: every idempotent object is implementable. All of the implementations are wait-free, non-blocking, or obstruction-free.
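The weak counter's interface is easy to sketch. The version below is a much-simplified, sequential illustration of the idea—an array of one-bit shared registers where a process finds the first unset bit and sets it—not the paper's actual wait-free construction, which searches more carefully:

```python
class WeakCounter:
    """Returns non-decreasing values over time; no process IDs required.
    Anonymous: every process runs the identical code on shared bits."""
    def __init__(self, size=1024):
        self._bits = [0] * size   # shared one-bit registers

    def get(self):
        # Find the first unset bit, claim it, and return its index.
        i = 0
        while self._bits[i]:
            i += 1
        self._bits[i] = 1
        return i

wc = WeakCounter()
a, b, c = wc.get(), wc.get(), wc.get()
# Values never decrease over time -- the only guarantee the paper's
# later algorithms rely on.
```

Note the weakness being sold here: concurrent callers may get equal or skipped values, and that's fine, because "non-decreasing" is all the downstream consensus and snapshot algorithms need.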

The paper is a brisk and readable 14 pages and has good, clear arguments of correctness.

March 26, 2006

100s of Results

The most exciting thing I've come across lately is the survey paper "Hundreds of Impossibility Results for Distributed Computing" by Faith Fich and Eric Ruppert. It shows what an exciting field distributed computing is, in that there are so many strong results covering a small number of models.

The paper is a good introduction to the field in general, giving a description of some of the key problems (consensus, broadcast, leader election, resource allocation), the important models of distributed computation (synchronous/asynchronous, message-passing/shared-memory, faulty/non-faulty), and also a concise presentation of some of the clever proof techniques that are used to establish impossibility results.

Researchers in this area have a fairly clear task in front of them: present algorithms that solve one of the problems on one of the models—or present a proof that there is no algorithm for that problem and model. Any such result is an exciting, solid advancement: either something is possible, and we've learned how to do it, or it is not possible and one must strengthen the underlying model in order to achieve it.

Of course, there is no shortage of generality in the field—it's not all about individual algorithms. There is an appealing and powerful result known as Herlihy's hierarchy, or the consensus hierarchy, which states that primitives in the shared-memory model can be characterized by the number of processes for which they can be used to solve (wait-free) consensus. Some primitives, such as read/write, cannot solve it for any number of processes. Others, such as test-and-set, can be used to build algorithms for consensus for small numbers of processes. Certain primitives are universal: they can solve consensus for any number of processes. What's more, consensus itself is universal: any other shared object can be implemented on top of a system that solves consensus. Characterising the primitives in this way has clear implications for building real systems.
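Test-and-set's position at level 2 of the hierarchy is concrete enough to sketch: two processes each write their proposal to a shared register, then race on a single test-and-set; the winner decides its own value and the loser adopts the winner's. This is the classic textbook protocol, written here in Python with a lock simulating the atomicity of the hardware primitive:

```python
import threading

class TestAndSet:
    """One test-and-set register: enough for wait-free consensus
    between exactly two processes (level 2 of Herlihy's hierarchy)."""
    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()  # simulates the primitive's atomicity

    def test_and_set(self):
        with self._lock:
            old = self._flag
            self._flag = True
            return old

def two_process_consensus():
    tas = TestAndSet()
    proposals = [None, None]   # ordinary shared read/write registers
    decisions = [None, None]

    def propose(pid, value):
        proposals[pid] = value           # announce before racing
        if not tas.test_and_set():       # first to flip the flag wins
            decisions[pid] = value       # winner decides its own proposal
        else:
            decisions[pid] = proposals[1 - pid]  # loser adopts the winner's
    t0 = threading.Thread(target=propose, args=(0, "apple"))
    t1 = threading.Thread(target=propose, args=(1, "banana"))
    t0.start(); t1.start(); t0.join(); t1.join()
    return decisions

decisions = two_process_consensus()
```

The same register provably cannot solve consensus for three processes, which is exactly what slots it below universal primitives like compare-and-swap.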

Contrast this with the theory of "process calculus." Those who know me know that I am not at all fond of this vein of research—I believe its merits are unproven after decades of research. The field of process calculus has hundreds of theorems, no doubt—but these theorems are all about the bisimilarity of one process calculus with another, or about when two individual systems can be considered equivalent. A result of Milner's shows that any CCS formula can be factored uniquely into a parallel composition of non-concurrent processes. But this is a statement about the process calculus itself, rather than about the systems that it could model. The field seems to spend all its time proving that its formalisms are consistent or not entirely unrealistic, rather than cutting to the chase and proving interesting statements about computational phenomena.

Proofs of the kind summarized in Fich and Ruppert's paper must surely be considered theory, rather than systems research: they are strong mathematical arguments about a broad class of phenomena. But the proofs in "distributed computing" are eminently useful, whereas those in "process calculus" are largely abstracted to the point of dubious utility.

If anyone reading this knows more about process calculus/process algebra, and wants to challenge me on this, please do. I've not read the process algebra literature as deeply as I've read the distributed computing literature. I'd love to be convinced that I'm wrong.

February 9, 2006

Distributed Modeling

Amazon wants to do more modeling of their distributed environment. Werner Vogels has the details.