Ezra's Research: December 2005 Archives

December 22, 2005

Simplifying XML

Don't give up hope, Tim. A simplified/revised XML spec would make some of our lives much easier.

I'm working on a programming language for the web that integrates XML syntax into the language. This is dandy, except that there are times when we need to look at the Doctype in order to parse a program. In our language, for example, you can't use &nbsp; until you've indicated a Doctype of XHTML, and until our wee little parser has fetched and parsed that whole mess and made a big table of entities.
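To make the entity problem concrete, here is a small sketch in Python (not our language). It assumes the standard library's non-validating parser as a stand-in for ours; the helper name `parses` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def parses(source: str) -> bool:
    """Return True if a non-validating XML parser accepts `source`."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# &amp; is one of the five entities that XML itself predefines.
predefined_ok = parses("<p>fish &amp; chips</p>")

# &nbsp; is declared only in the XHTML DTD, which this fragment never names,
# so the parser has no way to know what it stands for.
nbsp_ok = parses("<p>fish&nbsp;chips</p>")

# A "big table of entities" is what a parser builds after reading the DTD.
# Python ships a ready-made HTML one, used here purely as a stand-in.
nbsp_codepoint = name2codepoint["nbsp"]
```

The first fragment parses and the second does not, even though both look equally innocent on the page. That is the dependency on the Doctype that a simplified XML could remove.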

Also, DTDs don't do much for us. The newer schema languages for validating documents (e.g. RELAX NG) fit better with the kind of validity that programming-language people are inclined to think about (namely, "regular trees").

We do integrate XML Namespaces, so having that pulled in probably wouldn't hurt, either.

December 16, 2005

What Web Apps Are

Let's all get on the same page about what makes up a web app. Here are some concepts and terms for talking about it. Users have a clear conceptual model of a web app; web developers should understand this model well, and tools should take it into account. What follows is a first draft and will probably need revision, so I'm interested in feedback from active web app developers. Is this accurate from your point of view? Are there any nuances of the web it doesn't take into account?

Every web app has a model, which is some body of information that the user wants to interact with (view and alter). To the user, the app offers a particular experience of interaction with the model.

The fundamental experiential unit of the web is the request, a unit that comes from the infrastructure of HTTP. A "request," for present purposes, is a round-trip exchange of data with the server which results in the browser (a) adding a new item to its history and (b) completely redrawing the page contained in the response. It is possible to redraw the page without such a request, and it is possible to get data from the server without redrawing the page, but it is not possible to add an item to the browser's history without a request.

Different application designs (in particular, ways of breaking down into requests) may offer different experiences over a model.

From an application designer’s point of view, a web experience is composed of two distinct kinds of requests: actions and views.

An action transforms the app's model and redirects the user to a view; an action should not display a page by itself (see below for an exception).

A view should not modify the model, but it does display a page to the user. This separation guarantees an important user expectation: that hitting “reload” will merely refresh the information I’m looking at; it never has consequences beyond my browser.

An action may display a page directly if something has gone wrong in performing the action. In this case, hitting “Reload” has a clear meaning to the user: it means that the app should “retry” the action that failed. If an action succeeds, however, it should redirect to a view rather than display a page directly. Otherwise, reloading the page will retry an already-performed action, when the user would expect it merely to refresh the displayed information.
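Here is a rough sketch of that split, in plain Python with no web framework. Everything in it is an assumption made for illustration: the wine-list model, the handler names, the `store_available` flag that simulates a failure, and the choice of status codes (303 for the redirect, 503 for the failure page).

```python
from typing import Dict, Tuple

# A response is (status, headers, body).
Response = Tuple[int, Dict[str, str], str]

# The model: the body of information the user wants to view and alter.
model: Dict[int, str] = {1: "Merlot", 2: "Riesling"}


def view_wines() -> Response:
    """View: displays a page and never touches the model."""
    body = ", ".join(f"{wine_id}: {name}" for wine_id, name in sorted(model.items()))
    return 200, {}, body


def action_delete(wine_id: int, store_available: bool = True) -> Response:
    """Action: changes the model, then redirects to a view."""
    if not store_available:
        # Failure is the one case where an action shows a page itself;
        # reloading that page retries the action.
        return 503, {}, "Could not delete the wine. Reload to retry."
    model.pop(wine_id, None)
    # Success: send the browser to a view, so that reload only refreshes.
    return 303, {"Location": "/wines"}, ""


failed = action_delete(2, store_available=False)
redirect = action_delete(1)
page = view_wines()
reloaded = view_wines()  # "reload" just runs the view again
```

Because the successful action answers with a redirect, the page the user ends up looking at is the view, and reloading it only asks for the same information again.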

What the designer or the user thinks of as an (informal) “action” may in fact display several pages to the user, accumulating local state over the course of that interaction, and modifying the model only at the end of the series of requests. For example, an “action” that deletes some object from the model may first confirm that the user really wants to do so, or a flight-booking app may have several pages of questions to collect before recording the ticket as booked. Call this series of requests a pipeline rather than an action; the intermediate pages are merely views in our terminology, and the final request is the only action. These intermediate views may need to store intermediate state information; but this state is ancillary to the model.
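A pipeline might be sketched like this, again in plain Python. The `bookings` list, the `session` dict, the airport codes, and the function names are all hypothetical; the point is only that the intermediate requests write to ancillary state and the last request is the sole one that changes the model.

```python
from typing import Dict, List

bookings: List[Dict[str, str]] = []  # the model
session: Dict[str, str] = {}         # ancillary pipeline state, not the model


def view_step(field: str, value: str) -> str:
    """Intermediate view: records one answer in the session only."""
    session[field] = value
    return f"Recorded {field}; {len(session)} answer(s) so far"


def action_book() -> str:
    """Final action: the only request in the pipeline that changes the model."""
    bookings.append(dict(session))
    session.clear()
    return "/bookings"  # the view to redirect to


view_step("origin", "EDI")
view_step("destination", "PHL")
bookings_before_commit = list(bookings)  # the model is still untouched here
redirect_target = action_book()
```

Reloading either intermediate page just stores the same answer again, which is harmless, while the model changes exactly once.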

Sticking to this discipline (which, broadly speaking, good web apps already do) ensures that the browser's history accumulates a list of things the user saw, not a list of actions taken. It then becomes impossible to take an unintended action by navigating the browser's history.

December 14, 2005

Del.icio.us Technology Revealed

del.icio.us uses HTML::Mason, the Perl web framework. I got a Mason error message today when the database was unreachable.

December 6, 2005

Removing Duplicates From a Database

You've inserted multiple copies of the same data into a table. You want to get rid of all but one copy of each, and you want to keep the one with the lowest ID. You are using a DBMS with sub-selects, such as PostgreSQL.

You will do this:

select min(wine_id) as canonical from
        wine, (select distinct wine_name from wine) as names
        where wine.wine_name = names.wine_name
        group by names.wine_name;

If that looks right, you will proceed:

delete from wine where wine_id not in
    (select min(wine_id) as canonical from
        wine, (select distinct wine_name from wine) as names
        where wine.wine_name = names.wine_name
        group by names.wine_name);

More generally, if you have a table T with primary key T_ID and the desired data is unique on some column UNIQUE_COLUMN, the query that picks the row to keep in each group is:

select min(T_ID) as canonical from
        T, (select distinct UNIQUE_COLUMN from T) as equiv_classes
        where T.UNIQUE_COLUMN = equiv_classes.UNIQUE_COLUMN
        group by equiv_classes.UNIQUE_COLUMN;
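If you'd like to try the recipe on throwaway data first, here is a sketch that runs the wine queries against an in-memory SQLite database from Python. SQLite is only a stand-in for PostgreSQL here, and the sample rows and the names `KEEPERS` and `dedupe` are made up. It also checks an assumption of mine: that for non-NULL names a plain `group by wine_name` picks the same rows.

```python
import sqlite3

# The preview query from above: the lowest wine_id for each distinct name.
KEEPERS = """
    select min(wine_id) as canonical from
        wine, (select distinct wine_name from wine) as names
        where wine.wine_name = names.wine_name
        group by names.wine_name
"""


def dedupe(conn: sqlite3.Connection) -> list:
    """Preview the ids that will survive, then delete every other row."""
    preview = sorted(row[0] for row in conn.execute(KEEPERS))
    conn.execute(f"delete from wine where wine_id not in ({KEEPERS})")
    return preview


conn = sqlite3.connect(":memory:")
conn.execute("create table wine (wine_id integer primary key, wine_name text)")
conn.executemany(
    "insert into wine (wine_name) values (?)",
    [("Merlot",), ("Riesling",), ("Merlot",), ("Syrah",), ("Riesling",), ("Merlot",)],
)

# Assumed-equivalent shorter form (for non-NULL names), computed before deleting.
simpler = sorted(
    row[0] for row in conn.execute("select min(wine_id) from wine group by wine_name")
)

kept_ids = dedupe(conn)
remaining = conn.execute(
    "select wine_id, wine_name from wine order by wine_id"
).fetchall()
```

On this sample data, `kept_ids` and `simpler` agree, and `remaining` holds one row per wine name, each with the lowest ID that name had.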

December 5, 2005

Items to do from today's meeting


  • Active Links code as the target of a/@href. (done)
  • Table declarations at top level are too eager.
  • Label expressions rather than serializing them in full (when serializing).

Major tasks:

  • Implement Sam & Dave's Wineshop.
  • Package up Links interpreter for neophytes to build it.
  • Think about building Links interpreter as a web service.

December 1, 2005

Shared Memory

I've been thinking about concurrency.

I think the model of "message-passing with no destructive update" may be the ultimate solution. This is the Erlang approach, and it has a lot going for it.

When you disallow updating data, you can do one incredible, massive optimization: whenever process A sends a huge chunk of data to process B (on the same machine), you just send a pointer. You don't copy the data. You don't marshal it, you don't do nothing. You just give process B a copy of the pointer that A is using. Now, A may later "update" that data by constructing a very similar data structure and throwing away its old pointer. At that point, B is "out of date." If B needs to stay in sync, you should send the new data as well. Since this is cheap, there's no reason not to.
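A loose analogy in Python, not Erlang: two threads in one process stand in for processes A and B, a `queue.Queue` stands in for B's mailbox, and nested tuples stand in for immutable data. All the names are made up for this sketch.

```python
import queue
import threading

mailbox: queue.Queue = queue.Queue()
received = []


def process_b() -> None:
    """Receiver: collects messages until it sees the None sentinel."""
    while True:
        message = mailbox.get()
        if message is None:
            break
        received.append(message)


worker = threading.Thread(target=process_b)
worker.start()

# An immutable cons list built from nested tuples: (head, tail).
data_v1 = ("b", ("a", None))
mailbox.put(data_v1)  # enqueues a reference; nothing is copied or marshalled

# A's "update" builds a new structure that shares the old one as its tail.
data_v2 = ("c", data_v1)
mailbox.put(data_v2)  # keeping B in sync costs one more reference

mailbox.put(None)
worker.join()
```

After the join, `received[0] is data_v1` holds, and so does `received[1][1] is data_v1`: B has the very same objects A built, and the second message shares its tail with the first. Neither side can disturb the other, because nobody can modify a tuple in place.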

Whenever I've done concurrent programming, I've started out with a shared-memory model and ultimately found that it's too hard to manage the timing and notification issues. I've always abandoned it for message-passing sooner or later.

Do you have an application that absolutely requires mutable shared memory, either for performance reasons or just to solve the problem at all?

I want to hear about it.