March 27, 2012

Which exactly answers its end

I happened upon this nice letter to the Communications of the ACM by Jeremy Gibbons. Like him, I find it disappointing how often people focus on the superficial aspects of "code beauty," at the expense of a program's denotational simplicity, which makes a hard problem easy.
As an admirer of the “artistic flare, nuanced style, and technical prowess that separates good code from great code” explored by Robert Green and Henry Ledgard in their article “Coding Guidelines: Finding the Art in the Science” (Dec. 2011), I was disappointed by the authors’ emphasis on “alignment, naming, use of white space, use of context, syntax highlighting, and IDE choice.” As effective as these aspects of beautiful code may be, they are at best only skin deep. Beauty may indeed be in the eye of the beholder, but there is a more compelling beauty in the deeper semantic properties of code than layout and naming. I also include judicious use of abstraction, deftly balancing precision and generality; elegant structuring of class hierarchies, carefully trading between breadth and depth; artful ordering of parameter lists, neatly supporting common cases of partial application; and efficient reuse of library code, leveraging existing definitions with minimum effort. These are subjective characteristics, beyond the reach of objective scientific analysis—matters of taste not of fact—so represent aspects of the art rather than the science of software. Formalizing such semantic properties is more difficult than establishing uniform coding conventions; we programmers spend our professional lifetimes honing our writing skills, not unlike novelists and journalists. Indeed, the great American essayist Ralph Waldo Emerson (1803–1882) anticipated the art in the science of software like this: “We ascribe beauty to that which is simple; which has no superfluous parts; which exactly answers its end; which stands related to all things; which is the mean of many extremes.” It is to this standard I aspire. Jeremy Gibbons, Oxford, U.K.
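One of the letter's semantic beauties, the artful ordering of parameter lists to support partial application, is easy to see in Haskell. This is my illustration, not Gibbons's: foldr takes the combining function and the seed before the list, precisely so that common folds fall out of partial application alone.

```haskell
-- Because foldr takes the function and the seed before the list,
-- common folds can be defined by partial application alone.
sum' :: [Int] -> Int
sum' = foldr (+) 0

concat' :: [[a]] -> [a]
concat' = foldr (++) []

main :: IO ()
main = do
  print (sum' [1, 2, 3])           -- 6
  print (concat' [[1], [2], [3]])  -- [1,2,3]
```

Had the list come first, each of these definitions would need an explicit lambda; the parameter order is doing quiet aesthetic work.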

January 14, 2011

Compiling regex-posix in Haskell for Windows

With a coworker, I just hit a problem with a fresh installation of the Haskell Platform on Windows, for which we couldn't find a ready fix online. Here's what happened and how we fixed it, in hopes of helping others work around it.

The package regex-posix-0.94.2, installed as part of haskell-platform-2010.2.0.0, had a bug in its installation script: it didn't properly locate a library it depends on (the libregex library), and so it would fail to link when loaded.

Mailing-list traffic identified this as the core problem, and it sounded well enough understood that a fix ought to exist, though none was obvious. I could see that regex-posix had a newer version, 0.94.4, and while I couldn't find a changelog for the package, I conjectured that it might contain the fix. So I upgraded the regex-posix package, as follows:

cabal install regex-posix-0.94.4

Apparently, the cabal upgrade regex-posix command might have had the same effect.

Then we had to reinstall the packages that depended on it, which we determined from the output of

ghc-pkg dot

In our case, the reinstall commands were as follows:

cabal install --reinstall test-framework
cabal install --reinstall test-framework-hunit
cabal install --reinstall test-framework-quickcheck2

It seems that, until we did this, these packages still depended on the old version. Oddly, when we ran cabal install --reinstall regex-compat, it kept its dependency on the old regex-posix, even though regex-compat's .cabal file had no version bound barring the new version. Anyone know what's up with that?

It seems Haskell Platform should get a point release depending on the newer regex-posix. I shall have to ask the Haskell Platform folks whether that's the right thing.

Update: Two further mailing-list postings about this: one from Sept 2010 and one from Nov 2010.
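A quick way to check that the upgraded package really links is to exercise it from GHCi or a tiny program. A sketch (any pattern will do; the point is that this call is exactly the kind that blew up at load time before the fix):

```haskell
import Text.Regex.Posix ((=~))

main :: IO ()
main = do
  -- If regex-posix can't link against its regex library, a call
  -- like this fails as soon as the module is loaded.
  print ("abc" =~ "b+" :: Bool)                                  -- True
  print ("haskell-platform-2010.2.0.0" =~ "[0-9.]+" :: String)   -- "2010.2.0.0"
```

If both lines print, the new regex-posix is linked and working.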

November 3, 2009

Function calls are not stack frames

Tim Bray is spreading more misinformation about tail recursion. He describes it this way:

It looks like a subroutine call, but in the case where it occurs as the last thing in the routine, it magically, silently, and automatically gets turned into, now how did I put it? “A highly controlled and structured GOTO.”

A tail-call is a subroutine call. An efficient implementation does not magically transform it into something else; if it doesn't create a stack frame for such a call, it's because one simply isn't needed.

The essential observation behind the efficient-tail-call implementation (not "optimization"; more on that in a moment) is as follows: in most programming languages, a stack frame is needed not for a subroutine call as such, but only for an argument evaluation, that is, an evaluation whose result is temporary and needs further processing. Calls in the middle of a procedure are "argument" evaluations, because their results need further processing. It's really the temporary, non-final nature of the result that forces us to do the book-keeping that remembers where to come back to.
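The distinction is easy to see in a pair of toy definitions. This is my sketch, in Haskell-ish terms: in sumUp the recursive call's result is fed to (+), so each call has pending work and must remember where to come back to; in sumAcc the recursive call's result simply is the answer, so there is nothing to come back to.

```haskell
-- Non-tail call: the recursive result is an argument to (+),
-- so each call leaves pending work behind. Space grows with n.
sumUp :: Int -> Int
sumUp 0 = 0
sumUp n = n + sumUp (n - 1)

-- Tail call: the call's result is the final answer; nothing is
-- pending, so no frame is needed. (In Haskell one also forces the
-- accumulator, here with seq, to avoid building up lazy thunks.)
sumAcc :: Int -> Int -> Int
sumAcc acc 0 = acc
sumAcc acc n = acc `seq` sumAcc (acc + n) (n - 1)
```

Both compute the same value (sumUp 10 and sumAcc 0 10 are each 55); only their cost differs, which is exactly the semantics-versus-cost-model point below.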

Another confusion is between semantics and cost model:

Call me old-fashioned, but it just seems wrong when the semantics of two identical function invocations vary wildly as a function of whether there’s an executable statement between the invocation and the function exit.

The semantics of the call doesn't change: the result and the side effects are the same (and that's what's usually meant by "semantics" anyway). The cost, on the other hand, can be quite different, depending on whether a stack frame is needed.

Unfortunately, efficient tail recursion has often been described as a transparent "optimization": it might or might not kick in, and the programmer can't tell in advance.

Efficient tail calls (or space-consuming ones) really should be part of the official "cost model" of the language, something that goes along with the semantics, as a peer to the semantics, in fact. The cost model tells you how expensive you can expect things to be, but should be a little less binding than the semantics, since language implementors should have some freedom to do additional optimizations.

The idea that stack frames should correspond directly to the call structure is just odd. Maybe we want to know the call structure at runtime; in that case, we should capture that as debugging information, or as a reflection feature of the runtime system, but not as a core language design. Let the language implementation use the stack as efficiently as possible!

To summarize:

  • "Tail-call optimization" is a terrible misnomer.
  • The programmer should have assurance as to whether tail-calls have a cost.
  • Most languages have no reason to use up space on every function call; only calls whose result will be fed to another expression need to use space.

May 20, 2009

Discovering the platform, and the problem

I appreciated Dan Weinreb's comment on why MIT switched from Scheme to Python in teaching freshman programming:

[C]omputer engineering was [once] based on starting with clearly-defined things (primitives or small programs) and using them to build larger things that ended up being clearly-defined. Composition of these fragments was the name of the game. However, nowadays, a real engineer is given a big software library, with a 300-page manual that’s full of errors. He’s also given a robot, whose exact behavior is extremely hard to characterize (what happens when a wheel slips?). The engineer must learn to perform scientific experiments to find out how the software and hardware actually work, at least enough to accomplish the job at hand.

It's very perceptive, this idea that a programmer's job is, in part, to work out how an existing system really behaves--through "scientific experiments." It flies in the face of the mathematical approach to programming, where we compose perfect units to accomplish our well-defined goal.

What's more, many programming tasks don't have a clearly-defined goal. You might instead want to explore a certain area by slapping things together. Or the goal may shift as you discover the problem terrain through coding.

None of this is incompatible with functional languages--but it might suggest an argument against purity: it might be very valuable for a language to have back doors, which allow you to do something improper, at least for a time, while you're poking around at the system or the domain.

But "back doors" are also an argument in favor of orthogonal language design. Being able to put functions anywhere, for example, allows you to code routines that are extremely unreadable--truly a vice. But when you're bodging around, it's very handy to be able to plop in a function where there once was a value, or the like. Functional languages win here, since they nearly always take orthogonality much further than imperative ones. (By this measure, though, Perl rates as a functional language!) Hopefully, after discovering what you needed to discover, you'll rewrite the code in a clear way, without very deep nesting of constructs, for example.
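Haskell itself keeps just such a back door: the Debug.Trace module lets you smuggle an impure print into pure code while you're poking around at a system, on the understanding that it comes out again once you've discovered what you needed to. A small illustration of mine, not from the post:

```haskell
import Debug.Trace (trace)

-- A pure function with a temporary, improper peephole:
-- trace prints its message as a side effect, then returns
-- its second argument unchanged.
step :: Int -> Int
step x = trace ("step saw " ++ show x) (x * 2 + 1)

main :: IO ()
main = print (map step [1, 2, 3])  -- [3,5,7], with traces on stderr
```

The function's value is untouched, so the exploratory scaffolding can be deleted later without changing any result.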