Saturday, January 05, 2008

All is Flux

If you wanted a motto for digital history, it would be hard to find a better one than the saying attributed to Heraclitus around 500 BCE, something to the effect that 'all is flux' or 'everything flows' or 'you can't step into the same river twice'.

I think that many historians have a research model which looks a bit like this:
  1. Formulate question
  2. Do research
    1. Collect a bunch of sources
    2. Decide which look most promising and skim through those
    3. Read the most relevant ones carefully
    4. Take good notes
  3. Write
  4. Publish

We all agree that the stages of the research process are indistinct and blend into one another. We all agree that there is a lot of movement to-and-fro and back-and-forth, and time for visions and revisions. Nevertheless, this research model--what the heck, let's call it Parmenidean--is widely enough understood that many professors ask their graduate students questions like "Have you done your research yet?" or "When are you going to start writing?" The students, in turn, reply with answers that may please or displease their advisors, but which are understood to be felicitous in the pragmatic sense.

Digital historians, on the other hand, have to be thoroughgoing Heracliteans and reject questions like "Have you done your research yet?" The only sensible way to do research online is to be doing everything all at once all the time. The research model looks like this:
  • Until your interpretation stabilizes...
    • You keep refining your ensemble of questions
    • Your spiders and feeds provide a constant stream of potential sources
    • Unsupervised learning methods reveal clusters which help to direct your attention
    • Adaptive filters track your interests as they fluctuate
    • You create or contribute to open source software as needed
    • You write/publish incrementally in an open access venue
    • Your research process is subject to continual peer review
    • Your reputation develops
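
To make one of those bullets a little more concrete, here is a minimal sketch of what an "adaptive filter" might look like. Everything in it is an assumption made for illustration: the class name `AdaptiveFilter`, the `decay` value of 0.8, the bag-of-words scoring, and the sample headlines are hypothetical, and a real system would draw on much better techniques from the information retrieval literature. The idea is only that each item you keep nudges an interest profile, older interests fade, and incoming items are ranked against the profile as it stands right now.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for a piece of text (lowercased)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class AdaptiveFilter:
    """Hypothetical interest profile that drifts toward what the reader keeps."""

    def __init__(self, decay=0.8):
        self.decay = decay   # assumed value: how quickly old interests fade
        self.profile = {}    # term -> weight

    def score(self, text):
        """Similarity between an item and the current interest profile."""
        return cosine(self.profile, term_vector(text))

    def keep(self, text):
        """Record that the reader found this item useful."""
        for term in self.profile:
            self.profile[term] *= self.decay
        for term, count in term_vector(text).items():
            self.profile[term] = self.profile.get(term, 0.0) + count

    def rank(self, items):
        """Return items ordered from most to least relevant right now."""
        return sorted(items, key=self.score, reverse=True)


# Made-up headlines standing in for whatever your spiders and feeds deliver.
incoming = [
    "Parish registers and baptism records from rural Ontario",
    "Steam engine patents and textile mills in Lancashire",
    "Census records and parish boundaries in Upper Canada",
]

reading = AdaptiveFilter()
reading.keep("Parish records as a source for demographic history")
for item in reading.rank(incoming):
    print(round(reading.score(item), 3), item)
```

Call `keep` on a few items about steam engines and the same three headlines come back in a different order, which is the Heraclitean point: the filter never has a finished state, only a current one.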

Do we have what we need to fully implement this strategy? A lot of the pieces are already in place, including massive textual databases, search engines with APIs, XML, RSS feeds and feed readers, high-level programming languages, and tools for online scholarship like Zotero. The combined literature of statistical natural language processing, text and data mining, machine learning, and information retrieval provides a cornucopia of useful techniques. If you know how to program you're already most of the way there; if not, now is as good a time as any to begin learning how.
