Ezra's Research: Tricky Query Optimizations

Optimizing language-integrated queries is trickier than it looks.

Besides the structural problem I talked about yesterday, there are cases where a naive "optimization" can easily make the result run much slower than the direct version. This will happen whenever some time-consuming operation occurs between an inner and an outer comprehension. Suppose t is a table with loads of big numbers in the number column; consider this Links algorithm:

for x <- t in
    factors = factors_of(x.number);  # (costly)
    for y <- s in
        (y, factors)

A naive optimizer would say, "I should combine these two queries rather than re-running the inner query over and over again." It would yield an algorithm that works like this:

for (x, y) <- (t, s) in
    factors = factors_of(x.number);  # (costly)
    (y, factors)

But this algorithm might evaluate factors_of many more times than the original, since there may be more than one pair (x, y) for any given value of x. The optimization improved performance with respect to the database, but worsened the performance of the algorithm itself.

Ezra's Research

Programming languages. Ezra Cooper.

Tricky Query Optimizations