The beta release of the TAPoR Text Analysis Portal gives students the chance to experiment with text processing without having to code everything from scratch. It allows the user to enter the URL of a digital source and then explore the text with an interactive concordance.
For example, suppose you want to get (or convey) a sense of how the historian's job of interpretation can be augmented with computational tools. Go to the online Dictionary of Canadian Biography and choose an entry at random. I picked Robert McLaughlin, someone with whom I wasn't already familiar. Using the TAPoR tool, it is possible to find the most frequently occurring distinctive words and phrases in McLaughlin's biography:
- mclaughlin carriage
- in oshawa
- company
- toronto
- motor
- automobiles
- business
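To give a rough idea of what sits behind a list like this, here is a minimal Python sketch that counts the most frequent words in a text once common function words are dropped. It is not how TAPoR works internally (I don't know its method, and a real tool would likely compare against a reference corpus to decide what is "distinctive"). The stopword list and the function names `tokenize` and `top_words` are my own assumptions, and the sample text is just two fragments quoted from the concordance output in this post.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real tools use much longer ones).
STOPWORDS = {"a", "an", "the", "of", "in", "at", "to", "and", "he", "him",
             "his", "which", "where", "was", "is", "by", "as"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens with their counts."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(n)

# Two fragments taken from the keyword-in-context lines for "carriage".
sample = ("Enniskillen, where he built a carriage works, which, in at least "
          "him to build the Oshawa Carriage Works, a three-storey brick")

print(top_words(sample, 2))
```

On a full biography the same counting would surface place names and trade terms, which is roughly the kind of signal the list above provides.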
It is also possible to get information about keywords in context. For example, clicking on "carriage" returns the following:
Enniskillen, where he built a | carriage | works, which, in at least |
him to build the Oshawa | Carriage | Works, a three-storey brick |
which became known as McLaughlin | Carriage | about , was facilitated by careful |
new designs (some influenced by | Carriage | Monthly, a Philadelphia journal), and |
patents (and buying others), refining | carriage | mechanisms, tabulating the credit ratings |
mostly wholesale business of McLaughlin | Carriage | is all the more impressive |
transportation. Boosted as the largest | carriage | maker in the British empire |
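A keyword-in-context display like the one above is simple to sketch. The following Python is a minimal version, assuming a fixed window of five words on either side of the keyword (which matches the lines shown, though I am guessing at the setting TAPoR actually uses). The function name `kwic` is hypothetical.

```python
import re

def kwic(text, keyword, width=5):
    """Return (left, match, right) tuples for each occurrence of keyword,
    with up to `width` words of context on each side."""
    tokens = text.split()
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        # Strip punctuation and case so "Carriage," still matches "carriage".
        if re.sub(r"\W+", "", tok).lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# A fragment quoted from the concordance output above.
sample = "Enniskillen, where he built a carriage works, which, in at least"

for left, match, right in kwic(sample, "carriage"):
    print(f"{left} | {match} | {right} |")
```

Run over the whole biography, each occurrence of the keyword would produce one such row.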
Without reading the biography yet, I can now guess that Robert McLaughlin lived in Oshawa and founded a carriage works that became very successful. At this point, it is reasonable to object that I could have learned the same thing by reading his biography. The point, however, is that a computer can't learn by reading, but it can make use of text processing to produce more useful output. For example, suppose you wanted to create a "smarter" search engine. If you type "Robert McLaughlin" into Google, you get the following results:
- An art gallery in Oshawa
- (ditto)
- Bible Ministries
- (ditto)
- A photographer in Glasgow
- An art gallery in Oshawa
- A book about the battle of Okinawa in WWII
- A role-playing game called "Cthulhu Live"
- Realtors in New Jersey
- The blog of a Californian graphic artist
Now these results have less to do with one another than the animals in Borges' "Chinese Encyclopedia". But what if your search engine were to recognize "Robert McLaughlin" as a proper name, first submit the search to the Dictionary of Canadian Biography, process the text for keywords, and then submit the query "Robert McLaughlin"+oshawa+carriage to Google? Then the first ten results would look like this:
- A Wikipedia entry on Oshawa with information about the McLaughlin Carriage Company
- The Answers.com entry on Oshawa with information about the McLaughlin Carriage Company
- A popular history website (Mysteries of Canada) with an article about the McLaughlin Carriage Company and General Motors
- The history page of the City of Oshawa website with information about McLaughlin and his carriage company
- An art gallery in Oshawa
- The Canadian Encyclopedia entry on Oshawa with information about the McLaughlin Carriage Company
- The Oshawa Community Museums and Archives page about the McLaughlin Carriage Company
- An art gallery in Oshawa
- An article about McLaughlin from the Financial Post, reproduced by the Business Library at the University of Western Ontario
- A history page on the GM Canada website which talks about McLaughlin and his company
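The refinement step itself can be sketched in a few lines of Python. This is only an outline under stated assumptions: `fetch_biography` and `extract_keywords` are hypothetical callables standing in for the Dictionary of Canadian Biography lookup and the keyword extraction, neither of which is implemented here, and nothing is actually sent to Google. The only part taken from the example above is the shape of the final query string.

```python
def refine_query(name, keywords):
    """Quote the proper name and append each keyword with '+',
    following the form of the example query in this post."""
    return "+".join([f'"{name}"', *keywords])

def refined_search(name, fetch_biography, extract_keywords, n=2):
    """Look the name up, pull keywords from the text, and build a refined query.
    Both callables are supplied by the caller (hypothetical stand-ins)."""
    biography = fetch_biography(name)
    keywords = extract_keywords(biography)[:n]
    return refine_query(name, keywords)

# Stubs in place of a real lookup and a real keyword extractor.
query = refined_search(
    "Robert McLaughlin",
    fetch_biography=lambda name: "(biography text would go here)",
    extract_keywords=lambda text: ["oshawa", "carriage", "company"],
)
print(query)  # "Robert McLaughlin"+oshawa+carriage
```

Passing the lookup and extraction in as functions keeps the sketch honest about what it does not know: any biographical source or keyword method could be swapped in.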