Raw Data Lacking on the Web

Why isn't there more data on the web?

There seem to be a lot of news stories and opinion articles, but comparatively little data.

For example, I was just reading a piece that asserted that book sales were up over the last decade and provided a link. At the end of the link, I was hoping to find a nice graph showing this supposed spike in book sales. Instead, it was a USA Today article with a bunch of, you know, text. Why would I want to read a bunch of text with nothing more behind it than the editorial voice of USA Today?

Surely some reputable institute has counted the total number of books sold in each of the last ten years. Why isn't that study available for free on the web?

I'd expect to see the raw numbers, a graph displaying them, and ideally the full study, which puts those numbers in context and describes the methods. Sometimes you can get this kind of data; for example, in government studies like these power industry statistics. But most of the time, bloggers and others are pointing at news reports, which causes at least two problems: first, it adds too many layers of editorializing (the news editor and the journalist, on top of me, the blogger, and the original scholars), and second, it breaks down valuable raw data into a journalist's tag lines and pull quotes.

Why isn't more of this available?

One issue is probably that lots of data is owned; it's gathered at some cost, and the institute wants to sell it to people who can get some business use out of it. I'd like to see more intellectual property owners open up access to at least some of their property. I get a lot out of the New York Times online even as a non-subscriber, and I'm glad that they show as much as they do, though I wish there were more.

Leaving that aside, another issue is that it's hard to search for raw data. It's easy to search the web for a topic, but hard to specify that you're interested in a certain level of detail. Words like "data," "table" and "figures" aren't likely to help. Perhaps a search engine could specialize in recognizing stuff that looks like tabular numerical data, and offer a way to say you want that kind of stuff.
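To make the idea of "recognizing stuff that looks like tabular numerical data" a bit more concrete, here is a rough sketch of the kind of check I have in mind. It is only a guess at one possible heuristic, not how any real search engine works: it counts what fraction of a page's lines split into several cells that are mostly numbers. The function names, the regular expressions, and the 0.5 thresholds are all my own assumptions, and the sample figures are made up for illustration.

```python
import re

# Hypothetical heuristic; the patterns and thresholds are illustrative guesses.
# A "number" here is something like 1995, 1,200, -3.5, $40 or 12%.
NUMBER = re.compile(r"^[-+]?[$]?\d[\d,]*(\.\d+)?%?$")
# Cells are assumed to be separated by tabs, runs of 2+ spaces, or pipes.
CELL_SPLIT = re.compile(r"\t|\s{2,}|\s*\|\s*")

def numeric_row_fraction(text):
    """Return the fraction of non-empty lines that look like rows of numbers."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    numeric_rows = 0
    for line in lines:
        cells = [c for c in CELL_SPLIT.split(line) if c]
        if len(cells) < 2:
            continue  # a single cell is ordinary prose, not a table row
        numeric = sum(1 for c in cells if NUMBER.match(c))
        if numeric / len(cells) >= 0.5:
            numeric_rows += 1
    return numeric_rows / len(lines)

def looks_like_data(text, threshold=0.5):
    """Guess whether a block of text is mostly tabular numerical data."""
    return numeric_row_fraction(text) >= threshold

# Made-up sample values, purely for illustration.
sample_table = "Year  Sales\n1995  1,200\n1996  1,350\n1997  1,410"
sample_prose = "Book sales were up over the last decade, according to a report."

print(looks_like_data(sample_table))
print(looks_like_data(sample_prose))
```

A search engine could compute a score like this when it indexes a page and then let you filter on it, which would do more than adding the word "table" to your query. A real version would obviously need to handle HTML tables, spreadsheets, and much messier layouts.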

