Tricky Query Optimizations
Optimizing language-integrated queries is trickier than it looks.
Besides the structural problem I talked about yesterday, there are cases where a naive "optimization" can easily make the result run much slower than the direct version. This will happen whenever some time-consuming operation occurs between an inner and an outer comprehension. Suppose t
is a table with loads of big numbers in the number
column; consider this Links algorithm:
for x <- t in
factors = factors_of(x.number); # (costly)
for y <- s in
(y, factors)
A naive optimizer would say, "I should combine these two queries rather than re-running the inner query over and over again." It would yield an algorithm that works like this:
for (x, y) <- (t, s) in
factors = factors_of(x.number); # (costly)
(y, factors)
But this algorithm might evaluate factors_of
many more times than the original, since there may be more than one pair (x, y)
for any given value of x
. The optimization improved performance with respect to the database, but worsened the performance of the algorithm itself.