Wednesday, April 05, 2006

Methodology for the Infinite Archive

In a widely circulated talk, the mathematician Richard Hamming suggested that researchers should ask themselves what the most important problems in their field are and, as a follow-up question, why they are not working on them [see Hamming, You and Your Research, and for an interesting response, Graham, Good and Bad Procrastination]. Historians seem comfortable enough asking about the significance of a particular piece of current work (most directly with some form of the "so what?" question), but we seem less willing, or perhaps less able, to discuss the relative importance of current approaches or schools. Perhaps it is the historiographic tradition: with the benefit of hindsight we can all agree on the importance of the Annales school. For more recent approaches, like big history, most historians seem to be adopting a wait-and-see attitude.

Now I have to admit that I have reservations about big history, some of which I've spelled out in a forthcoming article in the journal Rethinking History. Setting those reservations aside, however, I think that what the big historians are attempting is important. They are trying to put the past, from the Big Bang to the present day, into a single, coherent narrative. Such an ambitious project is bound to fail in some ways, of course, but the failures promise to be interesting and informative. What more could we want? We know that the next generation will revise our interpretations ... let's give them something that is worth revising. [For an introduction to big history, see David Christian's wonderfully readable Maps of Time.]

It probably won't come as much of a surprise that I think the questions raised by digital history are some of the most important that we face. The explosion of printed material that began in the fifteenth century fundamentally changed scholarship: it made it much easier to compare different editions of the same text, made it possible to read extensively as well as intensively, and created the conditions for widespread literacy [see, for example, the essays in The Renaissance Computer]. We are currently in the midst of another such transformation, one that will give us nearly instantaneous access to the contents of the world's great libraries and archives, will radically democratize knowledge production, and will force us to think of machines as part of our audience.

So does this mean that we have to throw out everything we hold dear? Of course not. There's still no substitute for being able to read closely and critically; as Timothy Burke put it a few months ago in Easily Distracted, "interpretation is the antibody" against viral marketing and other kinds of spin and propaganda. Given the low average quality of online information and the read/write nature of the web, we need the work of archivists, librarians and curators more than ever.

We also need some new skills. We need to be able to digitize and digitally archive existing sources; to create useful metadata; to find and interpret sources that were "born digital"; to expose repositories through APIs; to write programs that search, spider, scrape and mine; to create bots, agents and mechanical turks that interact seamlessly with one another and with human analysts.

I've noticed that many people, otherwise very erudite, feel comfortable coming up to me and saying, "I'm a Luddite," as if it were something to be proud of. So how well did that turn out the first time? Don't we study history, in part, so we don't have to repeat it? [See Thompson's Making of the English Working Class for more on the Luddites.]