How Much Contention in STM?
Suppose you have a set of processes interacting via STM, and suppose that for a given region of time, each process performs one transaction which writes just once to one shared variable (and may read from any number). We can give tight bounds on the number of retries that each process will perform, and thus the amount of extra work.
Describe the relationships between the processes using a directed graph (V, E), where the nodes in V represent the processes and an edge u → v asserts that v reads from u during its transaction.
Such a set of transactions can be executed in a variety of ways.
To describe the different ways a set of processes can execute, define a history as a bijection h : V ∪ E → {1 ... |V|+|E|}. This attaches a distinct sequential number to each process and to each read, giving the order of the write operations and of the first read operation for each edge u → v. Since a transaction must write after all of its reads, we require h(v) > h(u → v) for every edge u → v. However, v may read from u before or after u actually writes its value; such is the nature of concurrency and STM. Thus whenever a node u writes, every v with u → v and h(u → v) < h(u) < h(v) will retry. (If h(v) < h(u), then v has already written and committed its transaction.)
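This setup can be made concrete in code. Below is a small sketch (the dict representation and the function name are my own, not from the article): nodes and edges get timestamps in a dict, and a history is valid when the timestamps form a bijection onto 1 ... |V|+|E| and every node writes after all of its reads.

```python
# Sketch: a history h as a dict from items (nodes and edges) to times.
# Nodes are process names; an edge (u, v) means v reads from u.

def is_valid_history(nodes, edges, h):
    """Check that h is a bijection onto 1..|V|+|E| and that every
    node writes only after all of its reads: h[v] > h[(u, v)]."""
    times = sorted(h[x] for x in list(nodes) + list(edges))
    if times != list(range(1, len(nodes) + len(edges) + 1)):
        return False
    return all(h[v] > h[(u, v)] for (u, v) in edges)

# Example: u writes, then v performs its first read of u, then v writes.
nodes = ["u", "v"]
edges = [("u", "v")]
h = {"u": 1, ("u", "v"): 2, "v": 3}
print(is_valid_history(nodes, edges, h))  # True
```

Note that a history where v's first read comes after v's own write would be rejected, matching the constraint h(v) > h(u → v).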
How many retries occur, then, for a given history? Note that a node u will retry no more times than the number of processes it reads from (its in-neighbors). Those processes write only once; so even if upstream processes forced the in-neighbors to retry many times, u itself retries only when an immediate neighbor finally writes. (This is not true in the general situation of multiple transactions per process.) In fact, the retries correspond exactly to {t → u | h(t → u) < h(t) < h(u)}, i.e., those edges whose first read occurs before the source's write.
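The retry-counting formula translates directly into a one-liner; here is a hedged sketch (the function name is mine): an edge (t, u) causes a retry exactly when u's first read of t happens before t's write, which happens before u's own write.

```python
# Count retries for a history: |{(t, u) in E : h[(t,u)] < h[t] < h[u]}|.

def count_retries(edges, h):
    return sum(1 for (t, u) in edges if h[(t, u)] < h[t] < h[u])

edges = [("u", "v")]

# v reads from u *before* u writes, so v must retry once:
h = {("u", "v"): 1, "u": 2, "v": 3}
print(count_retries(edges, h))  # 1

# v's first read comes after u's write: no retry.
h2 = {"u": 1, ("u", "v"): 2, "v": 3}
print(count_retries(edges, h2))  # 0
```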
There is a first node u0, having the least h-value among the nodes; this node never retries. There is a second node u1; this node retries if it reads from u0 and h(u0 → u1) < h(u0), so u1 retries 0 or 1 times. In general, the ith node (in write order) retries some number of times r(ui) >= 0. We can bound this above in two ways: first by its number of in-edges (as observed already), and second by i, since a retry of ui requires an in-neighbor that writes earlier, and only i nodes write before ui.
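Both upper bounds, r(ui) <= in-degree(ui) and r(ui) <= i, can be checked empirically. Below is an illustrative sketch (the graph, the history generator, and all names are my own assumptions): random valid histories are produced by shuffling the writes and then placing each first read somewhere before its sink's write, so h(v) > h(u → v) always holds.

```python
# Sketch: check r(u_i) <= min(in-degree(u_i), i) over random histories.
import random

def random_valid_history(nodes, edges, rng):
    """Shuffle the writes, then insert each first read at a random
    position before its sink's write (inserts only shift items right,
    so earlier placements stay valid)."""
    order = list(nodes)
    rng.shuffle(order)
    for e in edges:
        sink_pos = order.index(e[1])
        order.insert(rng.randint(0, sink_pos), e)
    return {item: t for t, item in enumerate(order, start=1)}

rng = random.Random(0)
nodes = list(range(6))
# A fixed small example graph (arbitrary choice for illustration).
edges = [(t, u) for u in nodes for t in nodes if t < u][:8]

ok = True
for _ in range(100):
    h = random_valid_history(nodes, edges, rng)
    write_order = sorted(nodes, key=lambda v: h[v])
    for i, u in enumerate(write_order):
        r = sum(1 for (t, w) in edges if w == u and h[(t, w)] < h[t] < h[w])
        indeg = sum(1 for (t, w) in edges if w == u)
        ok = ok and r <= min(indeg, i)
print(ok)  # True
```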
Thus, in one pathological case, the graph is a clique with all the first reads happening before any writes; here the ith node (0 <= i < |V|) retries exactly i times, leading to a total of ∑ᵢ i = |V|(|V|−1)/2 retries, or about |V|²/2: a slowdown of about a factor of |V|/2.
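This worst case is easy to reproduce: schedule every first read before any write, with writes in node order. A sketch (construction mine, following the argument above):

```python
# Pathological clique: n processes, each reading from all others, with
# every first read scheduled before any write. Node i in write order
# then retries once per earlier writer, i.e. exactly i times.

def clique_worst_case_retries(n):
    nodes = list(range(n))
    edges = [(t, u) for t in nodes for u in nodes if t != u]
    h = {}
    time = 1
    for e in edges:          # all first reads happen first...
        h[e] = time
        time += 1
    for v in nodes:          # ...then the writes, in node order
        h[v] = time
        time += 1
    return sum(1 for (t, u) in edges if h[(t, u)] < h[t] < h[u])

print(clique_worst_case_retries(4))  # 6 == 4*3/2
```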
If the graph is large but narrow, with in-degree bounded by d, then the worst-case number of retries cannot exceed d|V|, and thus the slowdown factor is bounded by d.
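As an illustration of the bounded-degree case (the ring graph here is my own choice, not from the article): a ring of n processes, each reading only from its predecessor, has d = 1, so even with every first read scheduled before any write the total stays at or below n.

```python
# Ring of n processes: node i reads only from node (i-1) mod n, so the
# in-degree bound d = 1 gives at most d*n = n total retries.

def ring_retries(n):
    nodes = list(range(n))
    edges = [((i - 1) % n, i) for i in nodes]
    h = {e: t for t, e in enumerate(edges, start=1)}  # all reads first
    h.update({v: n + 1 + v for v in nodes})           # writes in node order
    return sum(1 for (t, u) in edges if h[(t, u)] < h[t] < h[u])

n = 10
print(ring_retries(n), ring_retries(n) <= 1 * n)  # 9 True
```

(Node 0 commits before its upstream neighbor n−1 writes, so only the other n−1 edges cause retries.)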
This may be useful if you have a large number of STM processes that interact only in small groups. The assumption of one transaction per process per time period is reasonable if your model involves synchronous time-steps, as it would for a game rendered frame-by-frame.
For the future: it might also be interesting to compute the expected number of retries under a uniformly random choice of h.