Sunday, June 04, 2006

Experimenting with the TAPoR Tools

This summer I'm in the process of developing a new graduate course on digital history. One of the things that we will study is the creation of online historical materials, and for this, I plan to assign Cohen and Rosenzweig's Digital History. I would also like to emphasize the new computational techniques that historians will increasingly need to use with digital sources. This raises some interesting challenges. I can't assume that my students will know how to program or that they will be familiar with markup languages like HTML or XML. We don't even really have time for the systematic exploration of a particular language, like Perl. (Although we will have time for some fun stuff.) I've decided to focus on specific problems faced by historians working in the digital realm, and show how computation makes them tractable. I'll say more about the course in future posts; for now, suffice it to say that it will teach stepwise refinement, be very hands-on and, no doubt, a bit hackish.

The beta release of the TAPoR Text Analysis Portal gives students the chance to experiment with text processing without having to code everything from scratch. It allows the user to enter the URL of a digital source and then explore the text with an interactive concordance.

For example, suppose you want to get (or convey) a sense of how the historian's job of interpretation can be augmented with computational tools. Go to the online Dictionary of Canadian Biography and choose an entry at random. I picked Robert McLaughlin, someone with whom I wasn't already familiar. Using the TAPoR tool, it is possible to find the most frequently occurring distinctive words and phrases in McLaughlin's biography:

mclaughlin carriage
in oshawa
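
I don't know exactly how TAPoR decides which words and phrases count as distinctive, but the basic idea of counting frequent words and word pairs can be sketched in a few lines of Python. Everything here is my own assumption for illustration: the function names, the tiny stop list and the choice to count adjacent word pairs are not taken from TAPoR.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "he", "his",
              "was", "which", "by", "as", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_words(text, n=5):
    """Return the n most frequent words that are not stop words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(n)

def top_bigrams(text, n=5):
    """Return the n most frequent pairs of adjacent words."""
    tokens = tokenize(text)
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(p) for p in pairs).most_common(n)
```

Calling top_words and top_bigrams on the text of a biography gives a rough list of candidate keywords and phrases. A real tool would presumably also compare the counts against ordinary English usage, so that only words unusually common in this particular text are reported.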

It is also possible to get information about keywords in context. For example, clicking on "carriage" returns the following:

Enniskillen, where he built a carriage works, which, in at least
him to build the Oshawa Carriage Works, a three-storey brick
which became known as McLaughlin Carriage about , was facilitated by careful
new designs (some influenced by Carriage Monthly, a Philadelphia journal), and
patents (and buying others), refining carriage mechanisms, tabulating the credit ratings
mostly wholesale business of McLaughlin Carriage is all the more impressive
transportation. Boosted as the largest carriage maker in the British empire
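
A keyword-in-context listing like the one above is not hard to produce. Here is a minimal sketch in Python, assuming the biography is already available as a plain string. The function name kwic, the width parameter and the square brackets around the keyword are my own choices for illustration and have nothing to do with how TAPoR is actually implemented.

```python
import re

def kwic(text, keyword, width=30):
    """Return each occurrence of keyword with `width` characters of
    context on either side, matching whole words case-insensitively."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group(0)}]{right}")
    return lines

if __name__ == "__main__":
    sample = ("He built a carriage works. "
              "Later the Oshawa Carriage Works opened.")
    for line in kwic(sample, "carriage", width=20):
        print(line)
```

Cutting the context by characters rather than by words means that lines can begin or end in the middle of a word, which is also how the listing above behaves.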

Without reading the biography yet, I can now guess that Robert McLaughlin lived in Oshawa and founded a carriage works which became very successful. At this point, it is reasonable to object that I could have learned the same thing by reading his biography. The point, however, is that a computer can't learn by reading, but it can make use of text processing to produce more useful output. For example, suppose you wanted to create a "smarter" search engine. If you type "Robert McLaughlin" into Google, you get the following results.

  1. An art gallery in Oshawa
  2. (ditto)
  3. Bible Ministries
  4. (ditto)
  5. A photographer in Glasgow
  6. An art gallery in Oshawa
  7. A book about the battle of Okinawa in WWII
  8. A role-playing game called "Cthulhu Live"
  9. Realtors in New Jersey
  10. The blog of a Californian graphic artist

Now these results have less to do with one another than the animals in Borges' "Chinese Encyclopedia". But what if your search engine were to recognize "Robert McLaughlin" as a proper name, first submit the search to the Dictionary of Canadian Biography, process the resulting text for keywords, and then submit the query "Robert McLaughlin"+oshawa+carriage to Google? Then the first ten results would look like this:

  1. A Wikipedia entry on Oshawa with information about the McLaughlin Carriage Company
  2. The entry on Oshawa with information about the McLaughlin Carriage Company
  3. A popular history website (Mysteries of Canada) with an article about the McLaughlin Carriage Company and General Motors
  4. The history page of the City of Oshawa website with information about McLaughlin and his carriage company
  5. An art gallery in Oshawa
  6. The Canadian Encyclopedia entry on Oshawa with information about the McLaughlin Carriage Company
  7. The Oshawa Community Museums and Archives page about the McLaughlin Carriage Company
  8. An art gallery in Oshawa
  9. An article about McLaughlin from the Financial Post, reproduced by the Business Library at the University of Western Ontario
  10. A history page on the GM Canada website which talks about McLaughlin and his company
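
The last step of that "smarter" search, turning a name and a few keywords into a query, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the base URL is simply the ordinary Google search address, and the earlier steps (recognizing the proper name and fetching the biography) are not shown.

```python
from urllib.parse import urlencode

def build_query(name, keywords):
    """Quote the name as a phrase and append each keyword with a plus sign."""
    return '"' + name + '"' + "".join("+" + k for k in keywords)

def search_url(query, base="https://www.google.com/search"):
    """Encode the query into a search URL (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})

if __name__ == "__main__":
    query = build_query("Robert McLaughlin", ["oshawa", "carriage"])
    print(query)
    print(search_url(query))
```

The keywords passed in would come from the text processing step, for example the most frequent distinctive words in the biography.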

