
June 12, 2009

Dept. of Wheel Reinvention

From the brief on Apple's new concurrency framework, Grand Central Dispatch:

Blocks are a simple extension to C (as well as Objective-C and C++) that make it easy for you to define self-contained units of work. A block in code is denoted by a caret at the beginning of a function. For example, you could declare a block and assign it to x by writing:

    x = ^{ printf("hello world\n"); }

This turns the variable x into a way of calling the function so that calling x( ); in the code would print the words hello world.

What’s really powerful about blocks is that they enable you to wrap much more complex functions—as well as their arguments and data—in a way that can be easily passed around in a program, much as a variable can be easily referenced and passed.

Underneath this is a diagram showing that a block is like a function with some associated data.

I think I've seen this idea before somewhere. :-)
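
For instance (a small sketch of the analogy in Haskell, nothing to do with Apple's actual API): a closure is likewise a piece of code bundled with the data it captured, bound to a variable and run later.

    main :: IO ()
    main = do
      let name = "world"
          x = putStrLn ("hello " ++ name)   -- x closes over name; nothing runs yet
      x                                     -- running it prints "hello world"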

Ah, but a closure by any other name would smell as sweet...

March 17, 2007

New Erlang Book

Interesting: Joe Armstrong is writing a new book about Erlang.

This is good because the old one was short on helpful material for beginners. I had the same problem when I started learning Erlang as I had with Perl: I was dead in the water trying to get a simple program to run, for hours or days—until, with Perl, I found Learning Perl. With Erlang it was just a tremendous amount of banging my head against the wall. The first hours of learning a new language, for me, are often spent with trivial frustrations, like figuring out what "main" is called, or how to include the standard library. Maybe that's because I usually don't start with examples, the way many people do. Rob Sayre says the book is good.

February 9, 2007

Whistle While You Work

A recent tech report from Berkeley surveys the state of parallel computing [via Tim Bray]—particularly with an eye to the importance and difficulty of coding for multicore processors (one of which is helping me author this article). The bulk of the article deals with hardware design but also discusses programming systems (that's my bailiwick) and application design issues.

I've not read the whole thing yet, but I've already found some surprises:

  • For measuring success, the authors promote "dwarfs"—something like patterns of algorithms—as opposed to conventional benchmarks on particular tasks. It's not clear why this is better for measuring purposes, but the dwarfs do convey nicely some areas of challenge (e.g. "linear algebra," "spectral methods," "Monte Carlo methods").
  • They list twelve recent changes to the conventional wisdom (from no-brainers like "power is free, transistors are expensive"—which has now inverted—to more surprising ones such as "Researchers [can no longer] demonstrate new ideas by building chips.")
  • Transactional memory, a hot topic in PL research, is given some significant airtime. The authors don't mince words over the conventional alternative, locking:
    These locking schemes are notoriously difficult to program, as the programmer has to remember to associate a lock with every critical data structure and to access only these locks using a deadlock-proof locking scheme. Locking schemes are inherently noncomposable and thus cannot form the basis of a general parallel programming model. Worse, these locking schemes are implemented using spin waits, which cause excessive coherence traffic and waste processor power.
    Transactional memory seems like a clear win, though it's not yet widely implemented and needs to get some real-world experience behind it (see the sketch just after this list).
  • With regard to programming paradigms, they put a great emphasis on the role of human programmers in the process of making software:
    We believe that future successful programming models must be more human-centric. They will be tailored to the human process of productively architecting and efficiently implementing, debugging, and maintaining complex parallel applications on equally complex manycore hardware.
    This seems to me quite wise, although it's still unclear whether we can understand the cognitive (or psychological) processes in programming well enough to design for them.
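
On the composability point in that locking quote, here is a minimal sketch (mine, not from the report; it assumes GHC's stm package) of how two STM actions glue together into one atomic action, which is exactly what lock-based code can't do in general:

    import Control.Concurrent.STM

    withdraw, deposit :: TVar Int -> Int -> STM ()
    withdraw acct n = do
      bal <- readTVar acct
      writeTVar acct (bal - n)
    deposit acct n = do
      bal <- readTVar acct
      writeTVar acct (bal + n)

    -- The composed transfer is atomic as a whole; callers don't have to
    -- remember any lock-ordering discipline.
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to n = atomically (withdraw from n >> deposit to n)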

Section 5.4 seems a haunting parable for language designers:

Programming languages, compilers, and architectures have often placed their bets on one style of parallel programming, usually forcing programmers to express all parallelism in that style. Now that we have a few decades of such experiments, we think that the conclusion is clear: some styles of parallelism have proven successful for some applications, and no style has proven best for all.

There seems to be a lot of wisdom collected here, which cuts across subfields of CS—as in a list of which "dwarfs" are commonplace in which application domains. This appears to be a quite useful reference for those of us who don't work in any particular application domain but make systems for programmers to use. I nudge you gently in the direction of this article.

August 1, 2006

Oregon Summer School: Further Notes

I've been putting off writing up the rest of the lectures from the Oregon Summer School because I seem to have lost my trusty notebook where all the brilliant things were intoned. But I guess it's gone on long enough; the world must know what happened at Oregon Summer School 2006.

Some of the presenters I mentioned before talked on more than one topic. Michael Hicks, after talking about futures, went on to discuss some work on "dynamic software update" or DSU. Hicks has developed a set of techniques for dynamically loading a new version of an existing program, and switching it over while it runs, with zero downtime. This includes translating old data representations to new ones. As I listened to the talk, I kept thinking, "OK, sure, we're going to design our code in this highly restricted way, and then, as long as we're willing to bend over backwards to do some tricky retrofitting, hypothetically we might be able to update the code without downtime."

I was wrong! Hicks and his team have actually taken three years' worth of updates to sshd and vsftpd (amounting to a couple dozen updates, including some whole-number revisions) and updated running instances with each successive version, all without crashing or taking the server down. I was quite astonished that these techniques could be applied to changes that have already been made in the wild. Of course, they had to write some code to translate in-memory data structures on the fly—but they didn't have to re-architect the application to make it fit. Everyone in the seminar room woke up when Hicks showed the slide showing all the versions, with their dates, that had been dynamically loaded into these servers.

I would be interested to see whether these DSU techniques turn out to be a good software-engineering tradeoff in the long run. Most of the time, just having an extra machine to handle load while you bounce individual servers to the new version is a cheap way to get the same result. And you still have the challenge of writing your updates so that they're compatible on the wire: you can update sshd's internal structures on the fly, but updating the protocol might be more challenging. Also, to be slightly critical, sshd and vsftpd together form a pretty constrained class of software: slow-changing servers that mainly wait for connections and spawn off processes to handle them. Would this work for a more sophisticated system, like a real-time game where the players are actively interacting through the system?

Matthew Flatt argued for programming-language features inspired by OS features. The case was reasonably compelling: an IDE like DrScheme needs to run user programs in a special bomb-proof box, so that user programs can't impinge on the workings of DrScheme itself. This extends to lots of issues: device ownership, memory consumption, non-termination. Flatt argued for an abstraction called a "custodian" that manages all those resources together; killing the custodian frees up all the resources it manages. At the same time, he wants to enable sharing of data between programs, as an OS might allow. This makes the memory-management problem much harder, of course, since you need a policy for determining which custodian is "charged" for a block of memory, when it's shared between many. Flatt outlined a policy, whose details I didn't get, which apparently works in his setting.

Sandhya Dwarkadas talked about transactional memory from the hardware point of view. Unfortunately, her talk was pitched in the vocabulary of computer architects, so I didn't understand any of it! At a high level, what I took away was that transactional memory might be easy for processor makers to provide, by taking advantage of the cache-coherency systems that are already being included in multiprocessor machines.

Jeff Foster talked about another system for statically detecting race conditions, like Flanagan's for Java, but this time for C code. It amounts to a kind of pointer alias analysis, and the details are very complicated. A question that wasn't raised, which just occurred to me: why was alias analysis necessary in C but not in Java? I think the answer may be that the Java system assumes most accesses to data members are made from within the class definition (and thus are not by reference).

Shaz Qadeer had the true misfortune of presenting last, after we'd patiently sat through 48 hours of lectures. For myself, I know I didn't follow his (or Jeff Foster's) presentation as closely as most of the others. Someone has to go last, I guess. Qadeer's presentation was on model-checking concurrent software. Some of the material he presented was basic model-checking stuff (like "What is LTL?") but he quickly jumped ahead to cover fancy techniques for state-space reduction. I'm always surprised when speakers do that. If you assume that I don't know the basics, then why expect me to absorb both the basics and some advanced material in one lecture? If you want to focus on the advanced stuff, then why not just say, "This is for people who already know X," and give a quick refresher for X? The advanced students were probably bored while we newbies asked questions about LTL, and we newbies got bored once our intuition had been outstripped and we couldn't follow the lecture closely anymore.

All in all, the quality of the presentations at the Summer School was quite high. I was surprised that I could follow about 40 of the 48 hours of lectures, and got something out of almost every one (the previous 48 seminars I'd attended didn't have half that hit rate).

We also had a great time: Jim Allen's nightly walks around Eugene were quite nice, and we always ended up at a pub (if you like beer, they have lots of good ones in Oregon [my favorite: the Black Butte Porter, not to everyone's taste]). I met loads of people there and really enjoyed it. To PhD students in the US and abroad, I'd suggest you go to the Oregon Summer School in future years.

July 20, 2006

Oregon Summer School: Early Lectures

I am having a great time at the Oregon Summer School for Language-based Techniques in Concurrent and Distributed Systems, and I'm learning loads of interesting things. I'll summarize a few of the earlier lectures, but there are lots more besides these.

Cormac Flanagan presented his technique for checking and inferring that Java programs observe a locking discipline that gives desired atomicity properties. A motivation for this is that Java's synchronized keyword allows you to protect a block with a lock, but it is up to you to make sure that all of the right locks are held; in general it can be hard to tell whether a piece of code is atomic with respect to the rest of the program. Flanagan's system allows you to annotate a method or a block with atomic; a static checker then infers whether it is truly atomic by virtue of the locks it holds (vis-à-vis other bits of code in the program). The analysis is somewhat conservative, in that it may reject programs that are actually correct, but the techniques seem to lead you to write the kind of lock-based code that is ordinarily used in practice; Flanagan's team has run the checker successfully on large bodies of threaded benchmark code, and has even found bugs in the Java standard library (e.g. with the append function on Strings). The biggest drawback to this work is that it still relies on locks, and deadlock can still occur.

Dan Grossman gave a nice survey of possible semantics for transactions. Here again, programmers would wrap blocks of code with an atomic keyword, but now we are proceeding from semantics to implementation, rather than the other way around. Some of the semantic questions surround the interaction of transactions with exceptions, the nesting of transactions, and the distinction between weak and strong atomicity [1]. Dan convinced me that when an exception escapes an atomic block, you should not roll back the transaction. One good reason for this (among many) is that it preserves "serial elision" [2]: if you erase the atomic keywords, you get a sequential program that behaves the same as the original program would behave in a sequential context.

Strong and weak atomicity are distinguished by how you treat reads and writes that are not protected by an atomic block. An interesting tidbit is that Haskell's STM system moots the distinction by cleanly separating transactional memory from non-transactional memory (they have different types). This means that the low-level implementation can provide only weak atomicity, but at the language level there is no risk of other threads altering transactional memory.
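
For concreteness, a tiny sketch of that type separation (my own example, assuming GHC's stm package):

    import Control.Concurrent.STM
    import Data.IORef

    -- Transactional memory: a TVar can only be read or written inside an STM
    -- action, so other code can't touch it except through a transaction.
    bumpT :: TVar Int -> IO ()
    bumpT v = atomically $ do
      n <- readTVar v
      writeTVar v (n + 1)

    -- Non-transactional memory: an IORef lives in IO and doesn't participate
    -- in transactions, so the weak/strong question doesn't arise for it.
    bumpRef :: IORef Int -> IO ()
    bumpRef r = do
      n <- readIORef r
      writeIORef r (n + 1)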

Dan's thesis is that if you provide explicit threads and mutable shared memory, *then* transactions are a great solution to the problems that arise—but that it's not clear whether threads with shared memory are the best model of concurrency.

By contrast, we've had two presentations on alternate approaches to concurrency. Charles Leiserson presented Cilk, a conservative, faithful extension of C that offers declarative parallelism. Designed for big parallel algorithms (think FFT, matrix multiplication, etc.), this language allows you to tag subexpressions for evaluation in separate threads—the principal mode of communication is simply the return value of the subexpression. This model removes the chance of race conditions and deadlocks (although the full power of C is still available, so you can still shoot yourself in the foot). The language design seems reasonably elegant (just a few keywords are added) and it has the property they call "serial elision"; a drawback is that return values need to be used in specific ways (e.g., assigning directly to a local variable) and there are ways to use it unsafely (e.g., trying to use a spawned return value before the spawned thread has actually returned).

Leiserson also gave some very simple and interesting ways to analyze the parallelism of an algorithm, which give you a good guideline for how much speedup you can expect as you add more processors. Essentially, you add up the total amount of work done as well as the length of the critical path (the longest dependency chain) and look at the ratio. I hope to post more about this another time.
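
Roughly, as I understood it (my notation, not necessarily Leiserson's exact formulation):

    -- work     = total operations, summed across all processors
    -- critPath = length of the longest chain of dependent operations
    -- Their ratio bounds the speedup you can hope for, no matter how many
    -- processors you throw at the problem.
    parallelism :: Double -> Double -> Double
    parallelism work critPath = work / critPath

    -- e.g. parallelism 1000000 1000 == 1000.0: a million units of work over a
    -- thousand-unit critical path can't be sped up much beyond 1000x.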

Matthew Flatt of PLT Scheme fame (currently at the University of Utah) gave a really neat demo of the Concurrent ML primitives in Scheme, in real time: he built up a program iteratively, running it each time, while we watched. This worked surprisingly well. At times it was easy to get confused, but asking lots of questions helped me grasp the ideas. The concurrency primitives are a lot like the pi-calculus, in that they allow synchronous rendezvous and a "choice" combinator. This sublanguage is pure in the way that Erlang is: locks are not needed. Of course, the synchrony means the primitives are hard to implement (efficiently and reliably) in a distributed setting.

Michael Hicks presented "futures," a language feature first implemented in MULTILISP; futures permit declarative concurrency like Cilk, but the language handles some of the necessary mechanics: futures automatically collect or wait on result values when they're needed, whereas Cilk requires the programmer to explicitly wait for the results (and dies if this is done incorrectly!).
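
Here is a rough sketch of the idea using plain Haskell threads and MVars (mine, not MULTILISP or Hicks's system; real futures force implicitly when the value is touched, whereas here the forcing is explicit):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar
    import Control.Exception (evaluate)

    newtype Future a = Future (MVar a)

    -- Start the work in another thread and hand back a placeholder for the result.
    future :: IO a -> IO (Future a)
    future work = do
      box <- newEmptyMVar
      _ <- forkIO (work >>= putMVar box)
      return (Future box)

    -- Block only if the result isn't ready yet.
    force :: Future a -> IO a
    force (Future box) = readMVar box

    main :: IO ()
    main = do
      f <- future (evaluate (sum [1 .. 1000000 :: Int]))
      -- ... do other work here ...
      n <- force f
      print n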

[1] Weak atomicity requires only that transactions are atomic with respect to one another; strong atomicity requires that they be atomic even with respect to code outside atomic blocks. The latter is, of course, much harder to implement.

[2] So-called by the Cilk people.

June 19, 2006

STM in Pugs

Pugs (the Perl6 implementation in Haskell) has started to add STM, à la Haskell. Audrey says they're missing certain combinators, which sound to me like nearly all of the necessary ones, but still it sounds like they "get it." Audrey also hints that STM may become part of the standard concurrency model for Perl6 generally, and that a Summer of Code project is working on adding STM to Parrot.

I'll be pretty interested to see how all this pans out.

May 7, 2006

Pi-calculus solves consensus

Dolev, Dwork and Stockmeyer in 1987 examined the variations of asynchronous message-passing models where consensus is solvable [1]. They examined the axes of processor synchrony (can processors sleep indefinitely or is their sleep bounded?), network synchrony (are messages received in a bounded number of steps after being sent?) and message-order synchrony (are messages on a channel received in the order they're sent?). The result was that processor synchrony doesn't help (1-resilient consensus is unsolvable whether processors are synchronous or not) but either of the other two parameters alone makes the difference between no fault-tolerant consensus and n-resilient consensus (any number of process failures can be tolerated).

The (synchronous, or classic) pi-calculus is a "model of concurrency" where messages are rendezvous points: messages are received right when they're sent and thus in order. This implies that pi-calculus can solve n-resilient consensus.

In case you're not convinced, here's a pi-calculus expression that solves it*.

* For pedants: my solution is only (n–2)-resilient: it deadlocks if only one process is non-faulty.

UPDATE: C. Palamidessi had shown that synchronous pi-calculus can solve consensus and that asynchronous pi-calculus cannot [2]. Wischik and Wischik have a protocol for synchronous rendezvous [3], which suggests that synchronous rendezvous is implementable outright given the right primitives, and implementable probabilistically in any case.

[1] Dolev, D., Dwork, C., and Stockmeyer, L. 1987. On the minimal synchronism needed for distributed consensus. J. ACM 34, 1 (Jan. 1987), 77-97.
[2] Palamidessi, C. 1997. Comparing the expressive power of the synchronous and the asynchronous π-calculus. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Paris, France, January 15-17, 1997). POPL '97. ACM Press, New York, NY, 256-265. DOI= http://doi.acm.org/10.1145/263699.263731
[3] Wischik, L. and Wischik, D. 2004. A Reliable Protocol for Synchronous Rendezvous. Technical Report UBLCS-2004-1, Dept. of Computer Science, Univ. of Bologna. Feb. 2004.

April 13, 2006

Concurrency Theory

Ironically, just a week after I posted a screed about the lack of purpose in process-calculus research, I found myself trying to write a semantic model for a distributed programming language.

My purpose was not to prove anything about computation, but just to explicate the language precisely.

I have some strange requirements. I want to show that my semantics are realistic for a certain kind of environment, having peculiar limitations (for example, threads exist in particular locations, which could fail, and not all primitives are available in all locations; some components of the system are single-threaded...), so I want to choose a system that is close to that environment. CCS, pi-calculus, and join calculus are not at all realistic for distributed computation of any sort, so I shied away from those.

Fortunately, not all process calculists are blind to the issues that arise in distributed computing. Andrew Gordon has a nice page on "Nominal Calculi for Security and Mobility," summarizing various attempts to resolve the unrealisms. He observes that "Synchronisation on global channels is the basic communication primitive of the pi calculus. Many authors note that this is an expensive primitive to implement in a distributed system."

Fournet, et al., motivate their "Calculus of Mobile Agents" [1] with this acknowledgement:

Suppose ... that we want to implement a concurrent calculus with CCS-like communication channels and with processes running on different physical sites. If we do not locate channels, we quickly face a global consensus problem for nearly every communication which uses the interconnection network.

Later, they introduce the distributed reflexive chemical machine (DRCHAM), an extension of CHAM, which offers located solutions—that is, bags of processes which are collected together and marked with a location—and limits communication between locations. As a result, "The transport is deterministic, static, and point-to-point, and synchronization is only done locally on the receiving site during message treatment."

The mobile ambient calculus of Cardelli and Gordon [2] is similar to the DRCHAM model in that it provides named locations. I haven't read this work in detail yet so I can't compare the two.

The upshot is that there may be a calculus which is realistic enough to model the issues that our programming language is designed to solve, and hence it may be worthwhile modeling our language in that calculus. Using the classical process calculi would not have been fruitful, any more than modeling it on a regular old Turing machine.

[1] Cédric Fournet and Georges Gonthier and Jean-Jacques Lévy and Luc Maranget and Didier Rémy. A Calculus of Mobile Agents. Proceedings of the 7th International Conference on Concurrency Theory. Springer-Verlag, 1996.
[2] Luca Cardelli and Andrew D. Gordon. Mobile Ambients. Foundations of Software Science and Computation Structures: First International Conference. Springer-Verlag, 1998.

February 17, 2006

How Much Contention in STM?

Suppose you have a set of processes interacting via STM, and suppose that for a given region of time, each process performs one transaction which writes just once to one shared variable (and may read from any number). We can give tight bounds on the number of retries that each process will perform, and thus the amount of extra work.

Describe the relationships between the processes using a directed graph (V, E), where the nodes in V represent the processes and an edge u → v asserts that v reads from u during its transaction.

Such a set of transactions can be executed in a variety of ways.

To describe the different ways a set of processes can execute, define a history as a 1-1 function h : {1 ... |V|+|E|} ↔ V ∪ E. This attaches a sequential number to each process and to each edge: h(v) is the time of v's write operation, and h(u → v) is the time of v's first read from u. We require h(v) > h(u → v) for every edge u → v; that is, a process writes only after all of its reads. However, v may read from u before or after u actually writes its value—such is the nature of concurrency and STM. Thus, whenever a node u writes, every v with u → v and h(u → v) < h(u) < h(v) will retry. (If h(v) < h(u), then v has already written and committed its transaction.)

How many retries occur, then, for a given history? Note that a node u will retry no more times than the number of processes it reads from (its in-neighbors). Those processes only write once; so even if upstream processes forced the in-neighbors to retry many times, u will only retry when an immediate neighbor finally writes. (This is not true in the general situation of multiple transactions per process.) In fact, the retries correspond exactly to {t → u | h(t → u) < h(t) < h(u)}, i.e., those edges whose read happens before the source's write.

There is a first node u0, having the least h value of all the nodes; this node never retries. There is a second node u1; this node retries if it has a read from u0 and h(u0 → u1) < h(u0); thus u1 retries 0 or 1 times. The ith node retries some number of times r(ui) >= 0. We can bound this above in two ways: first by the number of in-edges (as observed already) and second by i.

Thus, in one pathological case, the graph is a clique with all the first reads happening before any writes; here node i (for 0 <= i <= |V|-1) retries exactly i times, leading to a total of 0 + 1 + ... + (|V|-1) = |V|(|V|-1)/2 retries, or about |V|²/2—a slowdown of about a factor |V|/2.

If the graph is large but narrow—if it has bounded in-degree d—then the worst-case number of retries cannot be more than d|V|, and thus the slowdown factor is bounded by d.

This may be useful if you know you have a large number of STM processes but they are interacting only in small groups. The assumption of only one transaction per process for a given time period is reasonable if your model involves synchronous time-steps, as it would for a game that's rendered frame-by-frame.

For the future: It might be interesting to check, also, what the expectation of these numbers would be for a random assignment of h.
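
Here's a rough sketch of how one might check that empirically (my own toy code, assuming GHC and the random package): draw random histories for a small clique and count retries by the rule above.

    import Control.Monad (forM, replicateM)
    import Data.List (sortOn)
    import System.Random (randomRIO)

    type Node = Int
    type Edge = (Node, Node)                 -- (u, v) means v reads from u

    -- An edge u -> v causes a retry exactly when h(u -> v) < h(u) < h(v).
    retryCount :: [(Node, Int)] -> [(Edge, Int)] -> Int
    retryCount hNode hEdge =
      length [ () | ((u, v), he) <- hEdge
                  , Just hu <- [lookup u hNode]
                  , Just hv <- [lookup v hNode]
                  , he < hu && hu < hv ]

    -- Draw a uniformly random history: rank all processes and edges by random keys.
    randomHistory :: [Node] -> [Edge] -> IO ([(Node, Int)], [(Edge, Int)])
    randomHistory vs es = do
      keys <- replicateM (length vs + length es) (randomRIO (0, 1 :: Double))
      let order = map snd (sortOn fst (zip keys (map Left vs ++ map Right es)))
          times = zip order [1 ..]
      return ( [ (v, t) | (Left v,  t) <- times ]
             , [ (e, t) | (Right e, t) <- times ] )

    -- Estimate the expected retries for a 5-process clique (everyone reads everyone).
    main :: IO ()
    main = do
      let vs = [0 .. 4]
          es = [ (u, v) | u <- vs, v <- vs, u /= v ]
      samples <- forM [1 .. 10000 :: Int] $ \_ -> do
        (hn, he) <- randomHistory vs es
        return (retryCount hn he)
      print (fromIntegral (sum samples) / 10000 :: Double)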

February 15, 2006

3-body Problem in Haskell

So last week I implemented the n-body simulation in Erlang with message-passing, as an exploration of one-process-per-particle concurrency. Now I've got it up and running in Haskell, one version using STM and another using just MVars (individual atomic variables).

I found the shared-memory approach to be a more natural style than message-passing for this kind of physical model. My technique is simple: for each step in simulation time, there is a shared vector of the states of the particles. Each process reads the current step's values in sequence (blocking on those particles that are not yet filled in), and when it has read the whole thing, it performs its next-state computation on those states; it then writes its own next state into the t+1 shared state vector. I want to reiterate that this technique works only because the processes necessarily work synchronously through simulation time.
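
Roughly, the MVar version has this shape (a from-memory sketch with placeholder physics, not the actual code linked below):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar
    import Control.Monad (forM, forM_)

    type State = (Double, Double)      -- stand-in for a particle's real state

    -- Placeholder physics: next state of particle i, given every state at time t.
    step :: Int -> [State] -> State
    step i states = states !! i

    simulate :: Int -> Int -> [State] -> IO [State]
    simulate nSteps n initial = do
      -- One MVar per (time step, particle); row 0 holds the initial condition.
      grid <- forM [0 .. nSteps] $ \_ -> forM [1 .. n] (const newEmptyMVar)
      forM_ (zip (head grid) initial) (uncurry putMVar)
      forM_ [0 .. n - 1] $ \i -> forkIO $
        forM_ [0 .. nSteps - 1] $ \t -> do
          row <- mapM readMVar (grid !! t)             -- blocks until row t is complete
          putMVar (grid !! (t + 1) !! i) (step i row)  -- fill in my slot at t+1
      mapM readMVar (last grid)                        -- the final states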

Why MVars instead of STM? MVars implement a medium-power concurrency primitive—something like test-and-set—and it was plenty powerful enough to write this application without much fuss. Transactional memory is a significantly higher-level primitive, and its added flexibility wasn't necessary in this case. I'd like to make a loose argument that MVars are generally sufficient for the concurrency that's likely to be found in games.

The STM version works very similarly, but instead of just blocking on an unfilled particle-state, it retries. I believe that this is inefficient, because it's not necessary to repeat the reads that have already been performed at a given timestep: the variables in question are all single-assignment. Also, I suspect that the semantics of STM will lead to lots of retries, the threads tending to swarm around an unfilled variable and all of them retrying the whole transaction together. By contrast, the MVar implementation is efficient in the sense that when a state-variable is filled in, one waiting thread is woken, whose take/put action causes another waiting thread to be woken, etc.

Here's the code, for reference.

Continue reading "3-body Problem in Haskell" »

February 10, 2006

3-body Problem in Erlang

UPDATE: The same problem in Haskell.

I've been learning Erlang, and trying to understand the pros and cons of its approach to concurrency, particularly for physical simulations that are computationally intensive and perhaps hard to parallelize. The press packets keep saying that Erlang makes it easy: if you want to model something that has n parts, just fork off n processes that each do the work of one part, communicating when they need to. This is by no means the only way to build a model of such a system, but it's what the Erlang evangelists tout. To start testing the wisdom of that, I made a simulation of a simple gravitational system. What follows are the tribulations of a neophyte, and there may well be better ways to do it. What's the upshot? I'm not sold on Erlang's ease-of-use for one-to-one modeling of systems.

The first conceptual problem I hit was the question of synchronization. In order for the sim to be a decent finite approximation to the continuous world of physics, we need to break it into discrete time steps, each of which depends on the previous time step (at least, that's the only way I know of to get a fair approximation). This means that each particle can't just crunch away at its own speed, working as fast as it can to calculate its position at various times. The particles could easily drift out of sync with one another, making for an inaccurate physical model. So right at the start, Erlang's asynchronous model seems to be a disadvantage.

Another problem is that in this particular model, every process depends on every other. As such, it would be nice if each process could simply fetch the needed information about the others. In Erlang, that's not trivial, since it would require each particle acting as a server, responding to requests for its position data, and each one acting as a client, sending off requests and waiting for responses. That sounded overly complicated, so I chose a different approach: each particle broadcasts its position as soon as it has been calculated. This basically dispenses with one half of the client-server interaction; but it means that each process can receive messages that it doesn't immediately care about, so it has to manage that information.

I solved the synchronization problem by attaching a time stamp (in model time, not real time) to each position message. When it has enough information about time t, a process calculates its position for time t+1 and broadcasts a message that says, "My position at t+1 is (x, y)." The others capture this information, and when they know all they need to know about time t+1, they each calculate their position at time t+2, and so on ad nauseam.

As I said, a process can receive messages that it doesn't care about yet. In the code, you'll see that each process keeps track of a list of "current" positions of particles, and also a list of "future" positions. In this particular model, I know that a process can only receive data that's at most one time step ahead of it. That's because a process that is "at" time t (i.e., it has not yet computed its t+1 position) cannot receive information about others' t+2 positions, because those t+2 positions would depend on its own t+1 position. That makes the future-data management a little easier in this case. A different model might not have that constraint, and would require better management of the future data.
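
To pin down the bookkeeping, here is a rough sketch of the same timestamped-broadcast-with-future-buffer scheme, written with Haskell channels rather than Erlang processes (my own toy restatement with placeholder physics; the real Erlang code is at the end of the post):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan
    import Control.Concurrent.MVar
    import Control.Monad (forM, forM_, replicateM)
    import Data.List (partition)

    type Pos = Double
    type Msg = (Int, Int, Pos)                  -- (particle id, time step, position)

    -- Placeholder physics: particle i's next position, given everyone's position at t.
    step :: Int -> [Pos] -> Pos
    step i ps = ps !! i + 0.1

    -- Pull `need` messages stamped with time t out of my inbox, starting from a
    -- buffer of early arrivals; anything stamped later goes back into the buffer.
    collect :: Chan Msg -> Int -> Int -> [Msg] -> IO ([Msg], [Msg])
    collect inbox t need buffered = go (partition (\(_, s, _) -> s == t) buffered)
      where
        go (now, future)
          | length now == need = return (now, future)
          | otherwise = do
              m@(_, s, _) <- readChan inbox
              if s == t then go (m : now, future) else go (now, m : future)

    particle :: Int -> Int -> [Chan Msg] -> Int -> [Pos] -> IO Pos
    particle n i chans nSteps start = go 0 start []
      where
        inbox = chans !! i
        go t states future
          | t == nSteps = return (states !! i)
          | otherwise = do
              let next = step i states
              -- broadcast my t+1 position to everyone else
              forM_ [ c | (j, c) <- zip [0 ..] chans, j /= i ] $ \c ->
                writeChan c (i, t + 1, next)
              -- wait for the other n-1 positions at t+1, buffering anything early
              (now, future') <- collect inbox (t + 1) (n - 1) future
              let states' = [ if j == i then next
                              else head [ p | (k, _, p) <- now, k == j ]
                            | j <- [0 .. n - 1] ]
              go (t + 1) states' future'

    main :: IO ()
    main = do
      let n = 3; nSteps = 5; start = [0.0, 1.0, 2.0]
      chans <- replicateM n newChan
      boxes <- forM [0 .. n - 1] $ \i -> do
        box <- newEmptyMVar
        _ <- forkIO (particle n i chans nSteps start >>= putMVar box)
        return box
      mapM takeMVar boxes >>= print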

This model is particularly pernicious, though, since it has total mutual interdependence. I'm interested in how well Erlang would work for big game worlds; maybe in those worlds each object has a small neighborhood of other objects that can affect it. But, I expect that the coding style would be the same within that neighborhood. What's more, if objects can move around and change neighborhoods, then there will be the issue of managing the set of objects in each process's neighborhood. Is this unnecessary overhead?

A final note about the paradigm: a lot of my work went into managing the messages, since they can be received at any time and in any order. The machine does a lot of work in handling the messages, too: at each time step there are n distinct messages but the machine has to deliver n² messages. In this case, the particle positions are only written by one process, and they are read by all the others. A shared-memory approach might have an advantage here, since locking per se might not be needed.

At last, the code. To run it, start up erl and type c(atoms)., then atoms:start().

Continue reading "3-body Problem in Erlang" »

January 23, 2006

Message-passing and Entanglement

Also today, I pondered some issues around concurrency. I had the realization that, although message-passing may be a useful language mechanism (as a sole concurrency mechanism) for structuring a lot of programs, it's probably not good for applications that have a lot of data interdependence and need strong atomicity properties. Here was my insight:

Suppose you're building software for a complex system—like a simulator, say—and there are discrete units in the system that seem like they are largely independent from one another. It may be attractive to implement these units as distinct processes (threads) in the way you structure the software. You can write a single process description that sends messages whenever the real, modelled object has an interaction with other objects. This might seem appealing, but it's not necessarily the right thing to do.

The critical thing is that if, as in Erlang, message-passing is the sole means of concurrency-control, then passing a message is the only kind of atomic operation. That means that if variables x and y are updated together in some atomic operation, they have to be managed by the same process. And if y and z are also part of some atomic operation, they too have to be managed by the same process.

So the natural division of your program into threads, mirroring the division of modelled objects, may not be correct, since there may be some operation which must be atomic, and must involve two or more "natural" objects. Furthermore, it's not hard to imagine most or all of the variables in a program becoming entangled in this way.

This is not to say message-passing is not useful, or that there aren't lots of programs that could profitably be written that way. Many applications, after all, don't require strict atomicity in every one of the operations that are in principle atomic. But I think some real safety-critical applications are going to have data that is so entangled that, if they had to use the message-passing approach, they'd lose out on the benefits of concurrency.

So I continue my search for other concurrency abstractions that the language can usefully support.