Friday, February 10, 2006

Historiographical Process(ing)

In my first post, I posed the problem of developing methodologies for an archive that is constantly changing and effectively infinite. Obviously this has implications for the way we think of traditional activities like creating bibliographies and writing historiography. Consider the way that dissertations are usually written: the student does a literature review, writes a historiographical introduction and dissertation proposal, then does archival work and writes the rest of the monograph. Research handbooks suggest that one should check for new literature when the monograph is nearing completion, so that it can be as up-to-date as possible. Given the hell that is the academic job search, this may or may not happen.

One problem with the traditional model is that most academics don't seem to realize that the world of scholarship has completely changed within the last seven years. Even the most newly minted of PhDs began his or her dissertation in the aftermath of the dot.bomb, when it wasn't clear which companies would survive or what the web of the future would be like. Google was coming to prominence as the interface to the web, and new sources for just about any research topic could be turned up regularly.

One measure of this historical shift comes from the data mining research group at the Online Computer Library Center (OCLC), the folks who bring us WorldCat, among other things. In a presentation from last year, they show that the number of records for digital materials entered into WorldCat stayed under roughly 20,000 per year from the mid-1980s through 1998. In 1999, it jumped to about 30,000. In 2000, it jumped again, this time to over 160,000, and in every year since then more than 100,000 records for digital materials have been entered. And that is just what shows up in WorldCat; it doesn't count Google's relatively new project to make 30 million books full-text searchable.

It's time to rethink bibliography and historiography as processes, or better yet, as processing: something that our bots can be working on continually in the background.

One vision for this comes from the unsettling world of contemporary data mining. In O'Harrow's No Place to Hide, he quotes Jeff Jonas, chief scientist at Systems Research & Development:

Our work is about perpetual analytics, instant intelligence, as fast as something is introduced, instantaneously being able to tell if that means something important to you. You're sitting under an ocean of data, and every day millions of gallons are being added, and every day you have to go through zillions of drops to find out whether there's something important in there. You're slicing time down to the nanosecond, so you can see every drop hit. So when each drop hits you can see where it lands, what it's next to. You can measure the ripples, and there is an instant where you can make interesting decisions about what has changed.
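Jonas's "perpetual analytics" suggests what a background bibliography bot might look like: instead of rerunning a literature search once at the start of a project and once near the end, check each new record the moment it arrives against the topics you care about. What follows is only a minimal Python sketch under stated assumptions. The `Record` class, the `watch` function, the simple keyword-overlap rule, and the sample records are all hypothetical, and a real bot would still need to pull its stream of records from an actual catalog or search feed, which isn't shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Record:
    """A hypothetical bibliographic record: a title plus subject headings."""
    title: str
    subjects: list[str]


def watch(stream: Iterable[Record], interests: set[str],
          threshold: int = 1) -> Iterator[tuple[Record, set[str]]]:
    """Examine each record as it arrives and yield it, along with the matching
    terms, if it shares at least `threshold` words with the interest list.

    The keyword-overlap rule is an illustrative assumption, not a real
    relevance model.
    """
    wanted = {term.lower() for term in interests}
    for record in stream:
        # Split the title and every subject heading into lowercase words.
        words = set(record.title.lower().split())
        for subject in record.subjects:
            words.update(subject.lower().split())
        hits = words & wanted
        if len(hits) >= threshold:
            yield record, hits


if __name__ == "__main__":
    # Hypothetical incoming records standing in for a live feed.
    incoming = [
        Record("Fur Trade Posts of the Northwest", ["fur trade", "Canada"]),
        Record("A Cookbook for Beginners", ["cooking"]),
    ]
    for record, hits in watch(incoming, {"fur", "trade", "Hudson's"}):
        print(record.title, sorted(hits))
    # prints: Fur Trade Posts of the Northwest ['fur', 'trade']
```

Because `watch` is a generator, the same loop would work on an open-ended iterator that keeps yielding records as they are catalogued, which is closer to the "see every drop hit" idea than a one-off batch search.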
