Ezra's Research: August 2008 Archives

Sensible SQL: The second typing for grouped columns

SQL keeps turning out to be more sensible than I thought it was.

If you're like me, you thought that GROUP BY clauses partitioned your columns into "grouping columns" and "non-grouping columns" and that the MUST to be processed with aggregate functions (count, sum, max, min, average) and that the second MUST NOT be processed that way. This would have been a problem for converting Peyton Jones and Wadler's Comprehensive Comprehensions to SQL, since that technique uses a ncie uniform treatment of group-by clauses: they transform all the query's bound variables to list type for the remainder of the query. Exampleâ€”here's a simple query that takes a table, groups it on values from columna a, and returns pairs of the value from a and the sum of the list of corresponding column b values:

select a, sum(b) from t group by a

Fine and simple. And as you may know, the following produces an error:

select a, b from t group by a

Because we've grouped on a, we can't just select column b within each grouping because there are multiple column b values. The DBMS will insist that we use an aggregate function to convert this collection to a primitive-type value. Such are the restrictions of the relational model.

But Peyton Jones and Wadler's proposal simply treats all columns, when grouped, as lists. So

[(a, b) | (a, b) <- t, group by a]

is a perfectly legal expression of type [([A], [B])] (letting A and B be the raw column types of a and b, resp.). What they expect you to do, to make SQLizable expressions, is to apply some other function to reduce each column, a typical example being

[(the a, sum b) | (a, b) <- t, group by a]

which corresponds to the first SQL query above: it collects the values from a and matches these with the sums of corresponding values from b. But the above syntax allows you to do things like this:

[(sum a, sum b) | (a, b) <- t, group by a]

which seems to correspond to SQL that applies an aggregate function to a grouped column as well as an ungrouped column. "Whither SQL!?" I worried! Yet all was not lost. SQL treats grouped columns as being alternately bag-typed or primitive-typed, depending on the context. So the following is a valid query:

select sum(a), sum(b) from s group by a

Huzzah! This shows SQL to be in closer accordance with a sensible, uniform, orthogonal languge such as Peyton Jones and Wadler's than I'd imagined. In other ways, too, I keep finding it easier and easier to make SQL queries out of difficult FP expressions by using little-known SQL features. Cheers, guys.

Posted by Ezra elias kilty Cooper at 11:50 PM | Permalink

Slow blog year chez Ezra

Hey, I haven't been blogging very much! Sorry about that, guys. It's just, well, research is so vague and inconclusive most of the time. But, I'll try and do better. Watch this space!

Posted by Ezra elias kilty Cooper at 2:54 PM | Permalink

NamedGZipStream, Covariance and Contravariance

Favorite MSDN article title of the month: NamedGZipStream, Covariance and Contravariance.

That's Microsoft, as always, blending the sublime and the ridiculous.

(Thanks also for the bogus etymology of "covariance": "In mathematics, covariance is a measure of the degree to which two variables move up or down together. The term was co-opted by the OO crowd to describe..." Way to overlook category theory, guys!)

Posted by Ezra elias kilty Cooper at 2:50 PM | Permalink

Ezra's Research

Programming languages. Ezra Cooper.

August 27, 2008

Sensible SQL: The second typing for grouped columns

August 22, 2008

Slow blog year chez Ezra

NamedGZipStream, Covariance and Contravariance