Digital History Hacks (2005-08): Results When and Where You Need Them

In my previous post I complained about a taken-for-granted model that carves the research process into discrete stages of information gathering, analysis, writing and publication. As I noted, I don't think that this model really makes sense anymore. I've been trying to figure out where it came from, and more to the point, why it persists.

We all have preferred ways of coming up with explanations, and one of my favorites is to start with an unshakeable belief in the second law of thermodynamics and go from there. In the wake of any event, there is a range of material and documentary sources that can be used to make inferences about what happened. Time continues, however. Memories are reworked, documents are lost, physical evidence decays and is disrupted. Contexts for understanding various pasts change, too, of course. We might even say that "all is flux." Against this inexorable dissolution, we've tried to create little islands of stasis. These include libraries, museums and archives, and also brass plaques, time capsules, heirloom species, national parks, and mathematical laws.

In Into the Cool, Schneider and Sagan summarize the second law by saying that "nature abhors a gradient." To the extent that we don't share that abhorrence, we have to pay to maintain our gradients. For example, there are information and transaction costs associated with learning anything. (In this case, the gradient that you are trying to maintain is your own wit and wisdom. If you're reading this, you may find it easier to maintain than your waistline, but they're all losing battles in the long run.) In the past, the highest of these costs came from moving historians to distant documents and keeping them near those documents for a while. When I did archival work and fieldwork for my dissertation, I was acutely aware of the cost of being 3,000 miles from home. I had the sense that it really mattered which box I requested next at the archive, or which place I decided to visit in the field. Many researchers describe having had similar experiences... it's part of the fun, the frisson, of archival work. But the high cost of doing research in the material world forces research time into clumps.

Most academic researchers also have to teach to support themselves, and this introduces another kind of temporal clumping. Research trips are rarely taken during the school year, and writing is often deferred, too. I'm trying hard to suffuse my own research and writing throughout the year, but I'm aware that I went for 25 days without posting to my blog last December, and have written five posts in the last 12 days. I start teaching again tomorrow, attending job talks, and so on.

I'm not going to change the costs associated with working in the material world, of course. I'm not going to change the university calendar to a year-round, part-time engagement, either. But to the extent that the digital world changes the landscape of transaction and information costs that we face, it will make a big difference in our shared research model.

As I see it, many of the programs that we are currently using impede the unification of the research process. At a minimum, most historians probably rely on a word processor and web browser. They may also use a spreadsheet, bibliographic database and more specialized programs like an RSS feed reader, relational database, statistical package, GIS, or concordancer. Each of these programs is designed to be "sovereign," to use Alan Cooper's term, to be "the only [program] on the screen, monopolizing the user's attention for long periods of time." The move to Web 2.0 has put a lot of functionality in the browser, and programs like Zotero are clearly a step in the right (only) direction. But the fact remains that most of our own research materials are locked into little silos. Moving from one of these silos to another imposes its own granularity on our activities.

How could this be different? Think of your Zotero bibliography as the core of your research process. Every item in it is there because it is relevant to your work. Suppose you keep your notes and drafts in Zotero, too. Then for the purposes of digital history, a good statistical description of your Zotero database is the best and most up-to-the-minute description of your research process. That description will be more accurate to the extent that you can incorporate other streams of information into it, like the feeds that you read, the books that you purchase, and the research-related web searches that you do. I think that the development of Zotero in the near future will allow more and more of this kind of incorporation, and the fact that the software is open source and provides an API bodes well for using it as a platform for mining. The key point that I want to emphasize, however, is that measurements of your Zotero bibliography will be most useful to the extent that they are fed back into your research when and where you need them. Suppose you do a quick analysis of a text that you are in the process of reading. It is quite simple to provide the results of that analysis both as information that you can read, and as a vector that can be used to refine automatic searching or spidering for related material.
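
A minimal sketch of that feedback loop, in Python, might look like the following. It assumes the text you are reading has already been pulled out of Zotero (or anywhere else) as a plain string; it does not touch Zotero's API, and the function names (term_vector, search_query, rank_candidates) are hypothetical. The analysis here is a simple bag-of-words term count, only one of many possible choices: the most frequent terms are the human-readable result, and the same counts, treated as a vector, become either a query string for a search engine or a way to rank candidate documents by cosine similarity.

```python
import math
import re
from collections import Counter

# A small, illustrative stopword list (an assumption); real work would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "were", "it", "that", "this", "with", "as", "by"}


def term_vector(text):
    """Return a bag-of-words frequency vector (a Counter) for a text."""
    words = re.findall(r"[a-z]+", text.lower())
    # Drop stopwords and very short tokens so the counts reflect content words.
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (Counters)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search_query(vec, k=5):
    """Turn the k most frequent terms into a query string for automatic searching."""
    return " ".join(term for term, _ in vec.most_common(k))


def rank_candidates(vec, candidates):
    """Order candidate documents (a dict of title -> text) by similarity to vec."""
    scored = [(cosine(vec, term_vector(text)), title)
              for title, text in candidates.items()]
    return [title for _, title in sorted(scored, reverse=True)]


if __name__ == "__main__":
    reading = ("The fur trade shaped the river valley. Traders moved furs "
               "down the river to the fort, and the fort controlled the trade.")
    vec = term_vector(reading)
    # Human-readable result: the most frequent content words.
    print(vec.most_common(5))
    # Machine-usable result: a query for a search engine or spider.
    print(search_query(vec, k=3))
    # Machine-usable result: candidate documents ordered by relevance.
    print(rank_candidates(vec, {
        "Fur trade posts": "The fort and the fur trade on the river.",
        "Railway finance": "Bonds and shares financed the railway.",
    }))
```

Raw counts with cosine similarity are used here only because they are easy to compute and easy to read back; a fuller system might instead weight terms by TF-IDF across the whole bibliography, so that words common to everything you collect count for less.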

Tags: analysis and synthesis | browser | data mining | digital history | entropy | flows | information costs | Zotero

Tuesday, January 08, 2008

William J. Turkel
