December 1, 2011

The Scala-Yammer note

This criticism of Scala-in-production is most astute. I wish we had more of these things to drive functional language development. I'm very pleased to see that the Yammer people took their use of Scala as far as they did, and hope that young startups continue to experiment with modern languages and that languages continue to meet such needs better and better every year.

February 4, 2011

Polis and Oikos online

All contributors to this debate (at the New York Times) argue that women are underrepresented among Wikipedia editors (according to a recent survey) because the community is unwelcoming to women in subtle ways. This seems to me a surprising conclusion to draw, for a couple of reasons.

First, how do we know it is so unwelcoming? The data in the survey doesn't directly support that, and the debaters don't give any new evidence that it is so. (I wouldn't be shocked to see such evidence, but I haven't yet.) Some of the characterizations seem downright wrong to me: Joseph Reagle says, "a culture of hacker elitism can be unappealing to those unable or unwilling to hew to the stereotypical features of the hacker." Is Wikipedia a culture of hacker elitism? Maybe twenty years ago, any "online community" had some of that hacker flavor, even if it centered on, say, recipe sharing. But now that the web is so mainstream, I don't believe this is always true. If the space is unwelcoming, it's not for this precise reason.

Second, and more importantly, what about the fact that identity is, at least ostensibly, so easy to cloak on the internet? Wikipedia is a place of handles and obscure user pages. What percentage of Wikipedia editors have a recognizable gender on the site? Is it hostile only to editors who are actually seen as women? Or is the activity in discussion pages somehow untenable for women's participation, even when they are so cloaked?

I wouldn't want Wikipedia's women to rely on such cloaking, but still I'm surprised that more women aren't participating under such cover, or at least that the Times' debaters aren't more interested in the question of how gender is represented and enforced on Wikipedia, if that is the claim. For example, one debater, Anna North, says, "I suspect ... one factor [leading fewer women to contribute] may be that solitarily editing an article on an online encyclopedia may be seen as antisocial." But, seen as antisocial by whom? Who's doing this policing?

A very interesting point comes from Susan Herring, who has studied online gender dynamics from mailing lists forward:

Wikipedia ... is public. Men ... tend to feel a greater sense of entitlement to occupy public space. This is reflected in my finding that men regularly post longer messages to online discussion forums than women do, and they rarely apologize for message length, even when they go on for 20 screens, whereas women apologize even for short messages. Some women may lack the confidence to contribute to Wikipedia or feel that it would be presumptuous of them to do so.

Elsewhere, it's been reported that online discussions can be "intimidating" to women, but the above quote gives some color to the sort of intimidation we're talking about: not threats as such, or even nasty remarks necessarily. Sometimes it's just weight.

Herring also notes the relatively higher participation of women in blogs and on Facebook, which might be seen as "private" spaces. This seems a more compelling mechanism for the gender gap: this entitlement that men feel toward public places, as well as the importance they place on fighting over it. There is only one "Egypt" article on Wikipedia—who gets to affect it? Perhaps men will feel a willingness, if not a compulsion, to work for that space, while women may simply feel it's "not worth it" to engage in the flame wars (even if not especially bitter or gender-biased in their content).

To me, it is that "feeling of presumptuousness" in modifying a central public space (a forum, a polis) that is most telling, not to mention the genuine presumptuousness of someone who'd ramble on linguistics for twenty screenfuls, taking the right to fill that space, and perhaps fight over it.

Much more research needs to be done on this!

October 17, 2010

Installing Gtk2Hs on MacOS

After months of quiet yearning & frustration, I finally chewed into a big bullet and got Gtk2Hs installed on my Mac (OS 10.6.4 with GHC 6.12.3).

It was a long and harrowing affair. There were many confusing error messages. Ultimately, though, it was just a process of being patient with the various package managers and manually chasing down dependencies that are external to cabal. I had to use both Fink and MacPorts (and had to do lots of time-consuming update and selfupdate runs on these).

The Packages

Here are the packages I can remember having to install:

  • glib (from Fink)
  • gtk+2 (from Fink; NB -shlib and -dev packages)
  • freetype (??)
  • fontconfig (??)
  • cairo (from MacPorts)
  • pango (from MacPorts)

After all that, cabal install gtk just worked. I was able to run example programs from this Gtk2Hs tutorial and get the windows to pop up and work correctly.

Double-Fisting

Running Fink and MacPorts is like having bacon on your sausage; it's a little redundant and includes much cholesterol, but it's easy to do and feels good at the time. Why I did this, or felt I had to, is now lost to me as well.

Misc. Stumbling Blocks

It bears noting that gtk and gtk+ are essentially synonyms, so don't go hunting for one when you've got the other. On the other hand, a "2" suffix there is essential: gtk2 and gtk-2.0 have nothing to do with "gtk" or "gtk 1.x"; they're essentially different products.

Several times I thought I was set having installed, say, the gtk+2 and gtk+2-shlibs packages from Fink, only to receive a complaint that gtk+2.pc (the pkg-config data file which other packages use to locate gtk+2) was missing, which made me gnash my teeth. The fix was to also install gtk+2-dev, which apparently should be called gtk+2-without-which-nothing.

Wordsize Conundra

I've had a number of minor 32- vs 64-bit problems with GHC. It seems that Haskell Platform installs a 32-bit ghc. (Why would this be? GHC does run 64-bit on MacOS, doesn't it?) Sometimes package managers install fat versions of libraries, which is good, but when using both Fink and MacPorts, a 64-bit-only version of a library from one might shadow the fat version from the other.

Somewhere in the darkness, I had a problem with zlib being 64-bit and hence unusable, and I couldn't determine which library was using it in order to muck with it (it wasn't gtk itself, glib, nor the Haskell zlib bindings). This was fixed by un- and re-installing GHC via Haskell Platform, which entailed a minor version upgrade as well as perhaps clearing any strange customizations that I can't deny making to the 32-bitness of my GHC install.

(I'm surprised there's not a better system for managing such library flavors on OS X. Couldn't we keep separate system paths like /usr/local/lib64 and such? Or would this be a greater nightmare in itself?)

Apologies

I wish this were a more helpful document. If you try this soon, and hit a snag, let me know and I'll either remember how I resolved it, or try to add to what's here.

November 3, 2009

Function calls are not stack frames

Tim Bray is spreading more misinformation about tail recursion. He describes it this way:

It looks like a subroutine call, but in the case where it occurs as the last thing in the routine, it magically, silently, and automatically gets turned into, now how did I put it? “A highly controlled and structured GOTO.”

A tail-call is a subroutine call. The efficient implementation does not magically transform it into something else; if it doesn't create a stack frame on such a call, it's because one simply isn't relevant.
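To see that these really are ordinary calls, and not loops in disguise, here is a small sketch in OCaml, whose implementations do not grow the stack on tail calls. The names is_even and is_odd are just for illustration, and the functions assume a non-negative argument. Each function calls the other, so there is no single loop to jump back into, yet neither call needs anything remembered about its caller.

```ocaml
(* Two mutually recursive functions (illustrative names; assumes n >= 0).
   Each call to the other function is the last thing the caller does,
   so there is nothing to come back to once the callee returns. *)
let rec is_even n = if n = 0 then true else is_odd (n - 1)
and is_odd n = if n = 0 then false else is_even (n - 1)

let () =
  (* A million calls deep, bouncing between two different functions. *)
  assert (is_even 1_000_000);
  assert (not (is_odd 1_000_000))
```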

The essential observation behind the efficient-tail-call implementation (not "optimization"; more on which in a moment) is as follows: For most programming languages, a stack frame is needed not for a subroutine call but only for an argument evaluation, that is, an evaluation whose result is temporary and needs further processing. Calls in the middle of a procedure are "argument" evaluations, because their results need further processing. It's really the temporary, non-final nature of the result that forces us to do the book-keeping that remembers where to come back to.

Another confusion is between semantics and cost model:

Call me old-fashioned, but it just seems wrong when the semantics of two identical function invocations vary wildly as a function of whether there’s an executable statement between the invocation and the function exit.

The semantics of the call doesn't change; the result and the side-effects are the same (that's what's usually meant by "semantics" anyway). The cost, on the other hand, might be quite different depending on whether a stack frame is needed.

Unfortunately, efficient tail recursion has often been described as a transparent "optimization," so that it might or might not be efficient and the programmer can't tell in advance.

Efficient tail calls (or space-consuming ones) really should be part of the official "cost model" of the language, something that goes along with the semantics, as a peer to the semantics, in fact. The cost model tells you how expensive you can expect things to be, but should be a little less binding than the semantics, since language implementors should have some freedom to do additional optimizations.

The idea that stack frames should correspond directly to the call structure is just odd. Maybe we want to know the call structure at runtime; in that case, we should capture that as debugging information, or as a reflection feature of the runtime system, but not as a core language design. Let the language implementation use the stack as efficiently as possible!

To summarize:

  • "Tail-call optimization" is a terrible misnomer.
  • The programmer should have assurance as to whether tail-calls have a cost.
  • Most languages have no reason to use up space on every function call; only calls whose result will be fed to another expression need to use space.
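The distinction between the two kinds of call can be sketched in OCaml. The function names sum_to and sum_to_acc are mine, and both assume a non-negative argument. The two functions have the same semantics, in the sense that they compute the same result; what differs is the cost. How deep the first can go before the stack runs out depends on the implementation and its settings, so I make no claim about where it fails.

```ocaml
(* The recursive call here is an argument to (+): its result is temporary
   and needs further processing, so the caller has to be remembered. *)
let rec sum_to n = if n = 0 then 0 else n + sum_to (n - 1)

(* The recursive call here is the final result of the caller: nothing is
   left to do afterwards, so no book-keeping is needed for it. *)
let rec sum_to_acc acc n = if n = 0 then acc else sum_to_acc (acc + n) (n - 1)

let () =
  (* Same result either way; only the space used differs. *)
  assert (sum_to 1_000 = sum_to_acc 0 1_000);
  Printf.printf "%d\n" (sum_to_acc 0 1_000_000)
```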

May 9, 2009

Apologies

It's very disappointing when someone gets up to do a talk and starts with an apology about the quality of the slides, or even the work itself. In the audience, we expect to see the presenter's best. A talk is, in many ways, a final outcome of the research work, not an intermediate state, and so it needs to be, simply, the best you can do.

You never tune in to your favorite TV show to see a disclaimer like this scroll across the screen:

Tonight's show is not going to be very funny and includes lots of shots of the outside of the house. That's because the writer was hung over all week and one of the actors turned up an hour late because the car wouldn't start—you know how that is!! We're really sorry and we promise that next week's show will be better.

TV shows, like academic talks, are performances, and final products—rather than working products—of the work that went in to produce them.

"The best you can do" doesn't have to be the best thing ever. There's a range in every community, and that's part of life. If someone gives a mediocre presentation one day and a better one another day, we'd say he or she improved. But we never adjust our estimation of someone's work because of excuses.

When you know you've done suboptimal work, or you're not fully prepared, it's best just to press on and give the best performance you can. Maybe your slides are crap—either skip some, or clarify them in words, or redraw them on a whiteboard, or something. Don't just apologize and expect the audience to swallow something bitter-tasting. It may be important to characterize what you've done modestly, for example, being clear about what problems you have and haven't solved, but that's quite different from an apology. You might give a talk where you say, "We tried to solve these three problems but we haven't really solved any of them." And maybe this failure is even because of desultory work; it still might be worth giving a talk and passing on what you've learned. "Here's why these problems are hard," you could say.

Apologies are important, in general, to signal that you know you've made a mistake. In a relationship of trust, when you've messed up, you need to acknowledge it; otherwise your confidantes may think the bad behavior was typical of you, or that you would do it again in the same situation. Such a relationship might be an intimate one—say you forgot your lover's birthday—or it could be a working relationship—the writer on the sitcom, above, who showed up for work hung over, was trusted by his co-workers to give a strong effort on the show. That he turned in dodgy work, causing the show to come out crap, is a mistake worthy of an apology. The apology allows you to regain trust, and of course many mistakes are forgivable. An apology signals that you learned something from the mistake, and so others can risk trusting you again.

Contrariwise, TV viewers are not in a relationship of trust with the production team. They want a damn good show with no buts about it. Anything else will simply drive them elsewhere—likewise, the audiences of technical and academic talks.

May 6, 2009

Bon mot

It is important, and difficult too, to be unsnobbish when passing on a sense of taste.

January 7, 2008

On Why Most Published Research Findings Are False

Here's a stimulating article: Why Most Published Research Findings Are False by John P. A. Ioannidis (PLoS Medicine, 2005). It focuses on research that aims to find statistical relationships in data, and asserts that most such relationships claimed in the literature are in fact false. Distilling the discussion, I find these compelling reasons why it would be so:

  • standard tests of "statistical significance" are taken as proof of a proposition,
  • there is bias in experimental design & interpretation,
  • researchers and journals prefer positive results,
  • experiments tend not to be independently reproduced.

This last point is particularly damning—few things are more essential to the scientific method than reproducible experiments, yet the article blithely says (and I readily believe) that most biomedical studies are not reproduced. In fact, the competitive publication cycle works against this: merely to confirm an existing result is not very publishable; to contradict an existing result may be publishable, but this means, as Ioannidis notes, that there can be an alternation of positive, then negative, then positive, then negative results on a particular question, as each team becomes interested in upsetting the last published result. Far from amassing independent evidence on a question, this is just another selection bias that works against the scientific process.

Interestingly, the article is wholly unscientific. Without stating its assumptions, it works these assumptions through to conclusions. Along the way, it presents a bunch of formulas, which add the gloss of analysis to what is essentially a work of persuasive writing, but I don't buy the formulas, which include unobservable (and perhaps ill-defined) quantities such as "the ratio of true relationships to no-relationship pairs in a field" and "the false-negative error rate." (How amusing would it be if this article were a false research finding?) But methodology aside, I do believe it: that many, if not most, published research findings are false.
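For reference, the article's central formula, as I read it in its simplest form (no bias term, a single study), gives the probability that a claimed relationship is true. Here R is the ratio of true relationships to no-relationships among those tested, and alpha and beta are the false-positive and false-negative error rates:

```latex
\mathrm{PPV} = \frac{(1 - \beta)\,R}{R - \beta R + \alpha}
```

With made-up values, chosen only to show the arithmetic, of R = 0.1, alpha = 0.05 and beta = 0.2, this comes to 0.08 / 0.13, or roughly 0.62. The complaint above stands: R in particular is not something anyone can observe.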

I'd be interested to see someone look at the issue in other kinds of fields—fields that aren't quantitative, for example. In the field of programming languages, people make a lot of claims that are justified by proofs. How often are these proofs actually correct, I wonder? And how often are the claims correct? Moreover, much PL research is not even "claim-based," as such. Many papers simply present a feature or technique and tout its virtues—and don't make falsifiable claims, at all. And often, this is the most valuable research: someone tried something and described the experience. We can learn from others' experience, even without proven claims.

How do we assess the value of a research field, such as that of programming languages? How do we know when we're doing a good job?

September 10, 2007

Abstraction and Variation

Fellow researchers may be interested in this fanciful article on the techniques, advantages, and supposed pitfalls of abstraction in software engineering.

June 18, 2007

Missing Features of HTML

I've just been looking at this position paper, "Declarative Models for Ubiquitous Web Applications" [1]; amongst other things, it makes some criticisms of HTML that are in line with my goals for the Links project:
  • A declarative expression language notation is not specified in (X)HTML. As a result developers have to code dynamic aspects of the user interface by means of server side scripts or Javascript DOM [DOM] manipulation.
  • (X)HTML classic forms [Forms] are severely limited in terms of typing information. For example, It cannot be specified that an input element accepts a date, a number or a patterned text. Furthermore, due to limitations in the submission formats, applications have to convert everytime, typically at server side, strings to the correct type expected by the application logic.
  • Lack of data binding mechanisms. Data binding is a technique that allows to associate data between the model and user interface components in a declarative way. Developers typically write server side scripts or Javascript AJAX-based code to populate user interface components with data. As a result, in (X)HTML-based user interfaces there is no clear separation between the model, the view and the controller.
It looks like the authors are concerned more with the phenomenon of web apps deployed on mobile devices and inside other web apps than with "ordinary" browser–server apps. Who knows what will come of it.

[1] Fonseca, J. M. C., Prendes, I. M., Soriano, J., Hierro, J. J. Declarative Models for Ubiquitous Web Applications. April, 2007.

February 25, 2007

DrScheme UI complaints

DrScheme is a very good app for writing Scheme code. It's much better than any other that I've used, or am aware of. But I do have some complaints; I'm lodging them here for later reference.

  1. When I get an error that my parens don't match, the blinky-paren-matching feature is turned off! (The error is highlighted in pink, and paren-blinking is turned off in the pink region.)

  2. In the REPL, there's no keystroke for "execute this": instead I have to go to the end of the expression, then hit Return. (Something like Shift-Return would do the trick.)

  3. In the REPL, there's no easy way to retrieve the last command for editing. (In command shells, this is usually just Up-arrow. Is there any reason not to use that here? Surely I'll never need to use the keyboard to select random interpreter output, if that's what it's being saved for. I'd be quite happy, though, with Ctrl-Up-arrow.)

  4. Cmd-Left-arrow (on the Mac) takes me to a point before the interpreter prompt. It's inconceivable I could ever want to go there. Make it take me to the beginning of my expression, just after the prompt.

  5. The Modifier-Arrow combinations don't work at all in the Help window's input field, and neither does double-clicking to select by words.

  6. The dock/undock feature for "palette" windows is not very compelling, but it does take up space and confuse matters. On the Mac in particular, the Find/Replace palette comes up as a "sheet," extending from the top of the window, which is entirely the wrong mode, for at least two reasons. First, because it's modal: it doesn't let me interact with the underlying window. Second, because it sometimes obscures the very text I'm looking for. Pretty much every text editor on the Mac does Find/Replace with a floating window—why not do it that way? Perhaps the platform-independence of DrScheme makes this hard to implement. In that case, it might be okay to use an ordinary (non-floating) window: one that can be hidden by other windows. In that case, there is a question of which definitions window will be searched. Possible right answers are: (1) the one that was frontmost when I opened this Find/Replace window, or (2) whichever one is frontmost when I hit the "Find" button.

  7. The behaviors of Ctrl-arrows and Alt-arrows are reversed, IMHO. On the Mac, Option (Alt) always means "move by words" and Ctrl usually means something else, appropriate to the context. DrScheme's Ctrl-arrows move by word, while Option-arrows move by expression; this should be the other way around to keep with the Mac conventions. I wonder if other Mac users agree?

November 7, 2005

OCaml Gripes

Gripes with OCaml:

  • Types that are declared in an interface file also need to be redeclared in the implementation file.
  • let is a stranger word than fun (as in SML) for declaring functions.
  • OCamlMakefile doesn't detect dependencies if the interface file is not mentioned in SOURCES; the result is an error of the form "X and Y make inconsistent assumptions".
  • No built-in "compose" combinator, can't use o (as in SML) or . (as in Haskell)
  • No before form as in SML (very useful when debugging, checking return values)
  • A record type cannot be given directly within a variant constructor declaration (e.g. Foo of {field : string}).
  • Syntax errors are often unhelpful (Syntax error)
  • Never helpfully points out that you might've partially applied a function when you meant to fully apply it (Haskell does help in this way!).
  • Interactive shell is very difficult to use:
    • It is picky about terminating each expression with exactly two semicolons.
    • There are lots of special commands for loading modules (open FooModule is never enough) and they have different conventions: for example, some begin with # and require an exact filename. I never know what file needs to be #loaded.
  • The build system detects when a module's interface has changed, but forces the programmer to do something about it.
  • The built-in regexp interface is very clunky:
    • Str.quote needlessly returns a string instead of a regexp, requiring me to then call Str.regexp whenever I use it.
    • Str.string_match needlessly requires an integer argument designating the index at which I'd like to start matching. This should, of course, be an optional argument or a distinct form of the function (Perl seems to have taken over the world without a silly index argument on regex matching).
  • The Map module for finite maps provides a map function for transforming the values of the map, but doesn't provide a way to transform the keys.
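A few of these gaps are easy enough to paper over by hand. What follows is only a sketch using the standard library: the names ( % ), before, StringMap and map_keys are my own inventions, not anything OCaml provides under those names. Note that before takes its second argument as a thunk, since OCaml leaves the evaluation order of function arguments unspecified, and SML's before promises that the first expression is evaluated first.

```ocaml
(* Composition, standing in for SML's `o` and Haskell's `.`.
   The operator name (%) is an arbitrary choice. *)
let ( % ) f g x = f (g x)

(* A stand-in for SML's `before`: [x] has already been evaluated at the
   call site; run the side effect [f], then return [x]. *)
let before x (f : unit -> unit) = f (); x

module StringMap = Map.Make (String)

(* Rebuild a map with every key passed through [f]. If [f] sends two keys
   to the same result, one binding replaces the other. *)
let map_keys f m =
  StringMap.fold (fun k v acc -> StringMap.add (f k) v acc) m StringMap.empty

let () =
  let shout = String.uppercase_ascii % String.trim in
  assert (shout "  hi " = "HI");
  let trace = ref [] in
  let r = before (1 + 2) (fun () -> trace := "computed" :: !trace) in
  assert (r = 3 && !trace = ["computed"]);
  let m = StringMap.add "a" 1 (StringMap.add "b" 2 StringMap.empty) in
  let m' = map_keys String.uppercase_ascii m in
  assert (StringMap.find "A" m' = 1 && StringMap.find "B" m' = 2)
```

The key-transforming function walks the whole map and rebuilds it, which is about the best one can do from outside the module.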