" /> Ezra's Research: February 2006 Archives

« January 2006 | Main | March 2006 »

February 26, 2006

Breaking Engima (still)

The M4 Project is a vounteer distributed-computing effort to break messages encoded with the Enigma in 1942 and supposedly still unbroken. [via Schneier]

February 24, 2006

Interactivity Helps

Scott Aaronson (of the Complexity Zoo) has a nice post called "The Fable of the Chessmaster" that talks about what computational complexity means in real terms—layman's terms, if you like.

He hits the nail on the head in pointing out that interactivity greatly helps communication: if you can ask questions, you can pursue the information you most want to know. This goes equally well for algorithms that are trying to decide some question as it does for human beings talking, whether in a formal presentation or chatting about bills. Interactivity helps.

In the human case, you may not always know what information you want to know, but you can also ask questions to help discover it. I suspect that fact has a complexity-theory analogue, though I don't know what it is. I am but a poor programming-language designer...

Finally, while I agree implicitly with Aaronson that complexity is about much more than computers, I don't think he's made the case. A chess game is a computation, isn't it? Does complexity theory cover things other than computations? Are there any computations that we'd meaningfully want to do away from a computer? And if so, are they large enough in scale that asymptotic complexity is relevant?

February 21, 2006

The Week in Web Languages

This was a hot week for web-language news.

Generators (yield and next statements) were implemented in FireFox's JavaScript a few days ago, in what looks like about two weeks of work by one guy. Cheers to Brendan Eich.

Tim Bray posted a round-up of arguments for and against PHP (mostly against). I was interested to read the Zend CEO's claim that half the websites of the world are powered by PHP. I'm disappointed that the Netcraft survey doesn't seem to be tracking that data.

Speaking as a language designer and as someone who briefly coded PHP for money, I'm pretty convinced that PHP is a bad language design. Yet, it's not entirely clear that language design matters.

Joe Armstong, the Erlang doyen, announced Jaws, a web framework for Erlang this week. It has some of the features of Links, such as easy calls from client to server. One difference I note right away is that finding a server method from the client is, in Jaws, a matter of tacking on the right string to a URL. Links treats the method definition, as a name binding, and calls from client to server have to be to bound names. Not that mis-typing the function name is a great source of error amongst web programmers!

Finally, a bit off-topic, Greg Linden of Findory writes about Blogger losing data and asking its users to cut-and-paste their entries from the web into their posting interface. The irony is astounding, of course. Greg thinks the problem is lax data-management at Google. They might be lax, but this tells me that they don't care much about the Blogger product. It's free, after all.

February 19, 2006

Poet at OOPSLA

WTF? Robert Hass was an invited speaker at OOPSLA 2005 (yes, that Robert Hass, the former US Poet Laureate). I'd love to know what he said.

February 17, 2006

How Much Contention in STM?

Suppose you have a set of processes interacting via STM, and suppose that for a given region of time, each process performs one transaction which writes just once to one shared variable (and may read from any number). We can give tight bounds on the number of retries that each process will perform, and thus the amount of extra work.

Describe the relationships between the processes using a directed graph (V, E), where the nodes in V represent the processes and an edge u → v asserts that v reads from u during its transaction.

Such a set of transactions can be executed in a variety of ways.

To describe the different ways a set of processes can execute, define a history as a 1-1 function h : {1 ... |V|+|E|} ↔ V ∪ E. This attaches a sequential number to each process and each read, denoting the order of the write operations and the order of the first read operation for a given u → v. h(v), the time of v's write operation, must be h(v) > h(u, v) for all u → v. That is, it must write after all of its reads. However, it may read from u before or after u actually writes its value—such is the nature of concurrency and STM. Thus whenever a node u writes, all v for which u → v and h(u, v) < h(u) < h(v) will retry. (If h(v) < *h(u) then v has already written and committed its transaction.)

How many retries occur, then, for a given history? Note that a node u will retry no more times than the number of processes it reads from (it's in-neighbors). Those processes only write once; so even if upstream processes forced the in-neighbors to retry many times, u will only retry when an immediate neighbor finally writes. (This is not true in the general situation, of multiple transactions per process.) In fact, the retries exactly correspond to {t → u | h(t, u) < h(t) < h(u)}, i.e., those edges whose read is before the source's write.

There is a first node u0, having the least h(u) of all the nodes. This node never retries. There is a second node u0; this node retries if it has a read from u0 and h(u0 → u1) < h(u0); thus u1 retries 0 or 1 times. The ith node retries a number of times r(ui) >= 0. We can bound this above in two ways: first by the number of of in-edges (as observed already) and second by i.

Thus, in one pathological case, the graph is a clique, with all the first reads happening before any writes; here each node i, 0 <= i <= |V|, retries exactly i times, leading to a total of \sumi i retries, or about |V|2/2—a slowdown of about a factor |V|/2.

If the graph is large but narrow—if it has bounded in-degree d—then the worst-case number of retries can not be more than d|V| and thus the slowdown factor is bounded by d.

This may be useful if you know you have a large number of STM processes but they are interacting only in small groups. The assumption of only one transaction per process for a given time period is reasonable if your model involves synchronous time-steps, as it would for a game that's rendered frame-by-frame.

For the future: It might be interesting to check, also, what the expectation of these numbers would be for a random assignment of h.

February 15, 2006

3-body Problem in Haskell

So last week I implemented the n-body simulation in Erlang with message-passing, as an exploration of one-process-per-particle concurrency. Now I've got it up and running in Haskell, one version using STM and another using just MVars (individual atomic variables).

I found the shared-memory approach to be a more natural style than message-passing for this kind of physical model. My technique is simple: for each time step in simulation time, there is a shared vector of the states of the particles. Each process constantly tries to read the current moment values in sequence (blocking on those particles that are not filled in) and when it has read the whole thing, it performs its next-state computation on those states; it then writes its own next-state into the t+1 shared state vector. I want to reiterate that this technique works only because the processes necessarily work synchronously through simulation time.

Why MVars instead of STM? MVars implement a medium-power concurrency primitive—something like test-and-set—and it was plenty powerful enough to write this application without much fuss. Transactional memory is a significantly higher-level primitive, and its added flexibility wasn't necessary in this case. I'd like to make a loose argument that MVars are generally suffficient for the concurrency that's likely to be found in games.

The STM version works very similarly, but instead of just blocking on an unfilled particle-state, it retries. I believe that this is inefficient, because it's not necessary to repeat the reads that have already been performed at a given timestep: the variables in question are all single-assignment. Also, I suspect that the semantics of STM will lead to lots of retries, the threads tending to swarm around an unfilled variable and all of them retrying the whole transaction together. By contrast, the MVar implementation is efficient in the sense that when a state-variable is filled in, one waiting thread is woken, whose take/put action causes another waiting thread to be woken, etc.

Here's the code, for reference. (In case there are any physicists reading: I don't think the model is quite physically correct. It doesn't seem to be conserving energy. If anyone can find my error, I'd be grateful. Hopefully it demonstrates the principles of Concurrent Haskell better than it demonstrates the mechanics of interacting particles! )

-- Ezra's hackneyed gravity sim inConcurrent Haskell, MVar version
import Control.Concurrent

(+:+) :: Num a => (a, a) -> (a, a) -> (a, a)
(+:+) (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

(*:*) :: Num a => a -> (a, a) -> (a, a)
(*:*) s (x, y) = (x * s, y * s)

force :: (Ord s, Fractional s) => (s, s) -> (s, s) -> (s, s)
force (ax, ay) (bx, by) = if distsq > 0 then (dx / distsq, dy / distsq) else (0, 0)
      where dx = (bx - ax)
            dy = (by - ay)
            distsq = dx * dx + dy * dy

timestep :: (Ord s, Fractional s) => (s, s) -> (s, s) -> [(s, s)] -> ((s, s), (s, s))
timestep pos vel pos_vector = (pos +:+ vel, vel +:+ dv)
          where dv = 0.001 *:* (foldr1 (+:+) (map (\pos' -> force pos pos') pos_vector))

readAllMVars :: [MVar a] -> IO [a]
readAllMVars vars = sequence (map readMVar vars)

rounds = 100
n = 3

new_pos_vec = sequence (map (\_-> newEmptyMVar) [1..n])

particle :: Int -> Int -> ((Float, Float), (Float, Float))
         -> [MVar (Float, Float)] -> [[MVar (Float, Float)]]
         -- -> ((Float, Float), (Float, Float)) 
         -> IO ()
particle i t state currentPosns [] = return ()
particle i t (pos, vel) currentPosns (nextPosns:futurePosns) =
      do {
         nextState <- do { 
             posnsVec <- readAllMVars currentPosns;
             (pos, vel) <- return(timestep pos vel posnsVec) ;
             putMVar (nextPosns !! i) pos;
             return (pos, vel)
            };
         particle i (t+1) nextState nextPosns futurePosns
      }

-- end when futurePosns is exhausted?

glue g [] = ""
glue g [x] = x
glue g (h:t) = h ++ g ++ glue g t

watch [] = return ()
watch (state : futureStates) = 
  do { posns <- readAllMVars state;
       putStr $ (glue ", " $ map show posns) ++ "\n";
       watch futureStates
     }

main =
    -- the initial position and velocity of each particle. They all start
    -- out at rest; their x co-ordinates are 1, 2, 3, 4, ... and their
    -- y co-ords alternate +1, -1, +1, -1, ... This way they don't all
    -- sit on a line, which would be boring.
    let initialStates = [((fromIntegral proc, fromIntegral (proc `mod` 2) * 2 - 1), (0.00, 0.00)) | proc <- [0..n-1]] in
  do {
    -- stateVectors and initialPubStates are the shared variables where
    -- the processes will communicate their state. initialPubStates is
    -- the the initialized first position, whereas the stateVectors all
    -- start out empty.
    stateVectors <- sequence [new_pos_vec | t <- [1..rounds]] ;
    initialPubStates <- sequence $ map (\(pos, vel) -> newMVar pos) initialStates;
    sequence $ zipWith (\s i -> forkIO $ particle i 0 s initialPubStates stateVectors) initialStates [0..n-1];
    watch stateVectors;
 }

Crazy MacOS Header Values

For a while now, I've been getting horrible errors whenever I compile some package on my MacOS X machine. Some of the errors were of the form, MAC_OS_X_VERSION_MIN_REQUIRED must be <= MAC_OS_X_VERSION_MAX_ALLOWED. Others were complaining about some precompiled header being out of date.

It turned out that I had an environment variable, MACOSX_DEPLOYMENT_TARGET set to 10.3. I'd set this a long time ago because it alleviated problems that I seemed to have with every Perl module I tried to install.

I unset MACOSX_DEPLOYMENT_TARGET and ran the magical sudo fixprecomps. This fixed it.

February 14, 2006

Idiots are a Lot Smarter

Simon Willison (et al.) took some nice notes on a talk by Josh Schachter, developer of del.icio.us. Speaking from my experience developing web applications at Amazon.com and Six Apart, I agree with every one of these points. A lot of these are easy no-brainers that lots of web apps don't get right—like dropping users back where they were after registering.

Some particularly nice tidbits:

Scaling: avoid early optimization. SQL doesn't map well to these problems—think about how to split up data over multiple machines. Understand indexing strategies, profile every SQL statement. Nagios or similar for monitoring.

There's no explanation what "these" problems are. Still, I think it's the right sentiment. The relational model is a weird model that was foisted on computing decades ago. It's become a crutch that developers—and particularly web developers—rely on excessively.

"Idiots are a lot smarter than you"—wait to see what breaks before you fix it.

There's little I hate more in engineering that attempts to fix unbroken stuff. Christopher Alexander's best line is, "Every act of building is an act of repair." We're here to make the world better, only.

February 10, 2006

3-body Problem in Erlang

UPDATE: The same problem in Haskell.

I've been learning Erlang, and trying to understand the pros and cons of its approach to concurrency, particularly for physical simuilations that are computationally-intensive and perhaps hard to parallelize. The press packets keep saying that Erlang makes it easy: if you want to model some thing that has n parts, just fork off n processes that each do the work of one part, communicating when they need to. This is by no means the only way to build a model of such a system, but it's what the Erlang evangelists tout. To start testing the wisdom of that, I made a simulation for a simple gravitional system. What follows are the tribulations of a neophyte, and there may well be better ways to do it. What's the upshot? I'm not sold on Erlang's ease-of-use for one-to-one modeling of systems.

The first conceptual problem I hit was the question of synchronization. In order for the sim to be a decent finite approximation to the continuous world of physics, we need to break it into discrete time steps, each of which depends on the previous time step (at least, that's the only way I know of to get a fair approximation). This means that each particle can't just crunch away at it's own speed, working as fast as it can to calculate its position at various times. Easily the particles could grow out of sync with one another, making an inaccurate physical model. So right at the start, Erlang's asynchronous model seems to be a disadvantage.

Another problem is that in this particular model, every process depends on every other. As such, it would be nice if each process could simply fetch the needed information about the others. In Erlang, that's not trivial, since it would require each particle acting as a server, responding to requests for its position data, and each one acting as a client, sending off requests and waiting for responses. That sounded overly complicated, so I chose a different approach: each particle broadcasts its position as soon as it has been calculated. This basically dispenses with one half of the client-server interaction; but it means that each process can receive messages that it doesn't immediately care about, so it has to manage that information.

I solved the synchronization problem by attaching a time stamp (in model time, not real time) to each position message. When it has enough information about time t, a process calculates its position for time t+1 and broadcasts a message that says, "My position at t+1 is (x, y)." The others capture this information, and when they know all they need to know about time t+1, they each calculate their position at time t+2, and so on ad nauseam.

As I said, a process can receive messages that it doesn't care about yet. In the code, you'll see that each process keeps track of a a list of "current" positions of particles, and also a list of "future" positions. In this particular model, I know that a process can only receive data that's at most one time step ahead of it. That's because a process that is "at" time t (i.e., it has not yet computed its t+1 position) cannot receive information about others' t+2 positions, because those t+2 positions would depend on its own t+1 position. That makes the future-data management a little easier in this case. A different model might not have that constraint, and would require better management of the future data.

This model is particularly pernicious, though, since it has total mutual interdependence. I'm interested in how well Erlang would work for big game worlds; maybe in those worlds each object has a small neighborhood of other objects that can affect it. But, I expect that the coding style would be the same within that neighborhood. What's more, if objects can move around and change neighborhoods, then there will be the issue of managing the set of objects in each process's neighborhood. Is this unneccessary overhead?

A final note about the paradigm: a lot of my work went into managing the messages, since they can be received at any time and in any order. The machine does a lot of work in handling the messages, too: at each time step there are n distinct messages but the machine has to deliver n2 messages. In this case, the particle positions are only written by one process, and they are read by all the others. A shared-memory approach might have an advantage here, since locking per se might not be needed.

At last, the code. To run it, start up erl and type c(atoms)., then atoms:start().

-module(atoms).
-export([start/0, particle_loop/10, monitor/2]).

sum_vec([], Init) -> Init;
sum_vec([{X, Y}|Etc], Init) ->
    {Rx, Ry} = sum_vec(Etc, Init),
    {Rx + X, Ry + Y}.

% force()
% Calculate the force between two particles, assumed to be of the same mass

force(Ax, Ay, Bx, By) -> 
  DistSq =((Bx - Ax) * (Bx - Ax) + (By - Ay) * (By - Ay)),
  {0.0001 * (Bx - Ax) / DistSq,
   0.0001 * (By - Ay) / DistSq}.

% timestep(X, Y, Vx, Vy, Others)
% Calculate the next position and veloctiy for a particle at X, Y, 
% moving at velocity Vx, Vy, acted upon by the forces of the 
% particles in Others

timestep(X, Y, Vx, Vy, Others) ->
  {NewVx, NewVy} = sum_vec(lists:map(fun({N, Nx, Ny}) -> force(X, Y, Nx, Ny) end,
                                     Others), {Vx, Vy}),
  {NewX, NewY} = sum_vec([{Vx, Vy}], {X, Y}),
  {NewX, NewY, NewVx, NewVy}.

% inform_pos()
% Inform another process about my position at this time

inform_pos(To, From, T, X, Y) ->
    To ! {posof, From, T, X, Y}.

% particle_loop()
% This routine is the main control loop for each particle thread.
% It keeps track of the current time (according to this particle),
% the particles own current position, a list of the other particle
% processes, and a set of messages that have been received from
% others, regarding their current and future positions.
%
% Synchronization is achieved by keeping this explicit time 
% parameter, T. No particle can process a time step until it gets
% all the messages for all the other particles for that time step.
% In the event it receives a message about a future time step, it
% queues that up in the ParticlesLater set. Such a future message
% can never be for a time later than T+1, because the other particle
% must be waiting for this one to issue its position for T+1.
% A slightly more robust model would allow information to be received 
% about any future time step at any time, and would carefully 
% determine which information is relevant to the iteration it's 
% presently calculating.

particle_loop(Name, T, X, Y, Vx, Vy, Monitor, Procs, ParticlesNow, ParticlesLater) ->
  receive
    % the 'neighbors' message just lets a process know the PIDs of all 
    % the other particles.
    {neighbors, MsgProcs} -> 
        Others = [P || P <- MsgProcs, P =/= self()],
        [inform_pos(P, Name, T, X, Y)  || P <- [Monitor|Others] ],
        particle_loop(Name, T, X, Y, Vx, Vy, Monitor, Others, 
             ParticlesNow, ParticlesLater);

    {posof, Sender, MsgT, OtherX, OtherY} -> 
       if
         MsgT == T -> 
           NowParticlesNow = [{Sender, OtherX, OtherY}|ParticlesNow],
           if (length(NowParticlesNow) == length(Procs)) -> 
                {NewX, NewY, NewVx, NewVy} = timestep(X, Y, Vx, Vy, NowParticlesNow),
                [P ! {posof, Name, T+1, NewX, NewY} || P <- [Monitor|Procs]],
                particle_loop(Name, T+1, NewX, NewY, NewVx, NewVy, Monitor, Procs, ParticlesLater, []);
              true ->
            particle_loop(Name, T, X, Y, Vx, Vy, Monitor, Procs, NowParticlesNow, ParticlesLater)
           end;
     MsgT == T+1 ->
        particle_loop(Name, T, X, Y, Vx, Vy, Monitor, Procs, ParticlesNow, [{Sender, OtherX, OtherY}|ParticlesLater]);
         true -> io:format("posof recd, no match\n")
       end;
    Msg -> io:format("rec'd unknown message"), io:write(Msg), io:nl()
  end.

start() -> 
          Monitor = spawn(atoms, monitor, [0, []]),
          Procs = [spawn(atoms, particle_loop, [N, 0, N, 0, 0, 0, Monitor, [], [], []])
                     || N <- [1, 2, 3]],
          [P ! {neighbors, Procs} || P <- Procs],
          io:format("initialized\n").

monitor(T, State) ->
  receive
    {posof, Particle, PT, X, Y} -> 
        NewState = [{Particle, X, Y}|State],
        if (length(NewState) == 3) ->             % fixme: assumes exactly 3 particles
            Particles = lists:keysort(1, NewState),
            Posns = [{X, Y} || {N, X, Y} <- Particles],
            io:format("~p\t~p\t~p\n", Posns),
            monitor(T+1, []);
          true ->
            monitor(T, NewState)
        end
  end.

February 9, 2006

Distributed Modeling

Amazon wants to do more modeling of their distributed environment. Werner Vogels has the details.