Sunday, February 19, 2006

Text Mining the DCB, Part 5

My original plan for this series of posts was to show how it is relatively simple to spider an online historical collection like the Dictionary of Canadian Biography, scrape out some information, and use it to make possibly novel inferences. Up until now, we've been storing information in text files, which is OK for a simple demo but unwieldy if we want to try something more ambitious.

As a result, I've decided to make a few modifications so that we can store information in a relational database instead. Eventually, I will probably move to something open source running on a server, like MySQL. In the meantime, however, I already have MS Access installed on my computer, so that is what I am going to use. It shouldn't be too hard to port the database later.

In my next post I will describe the initial data tables and the way we get Perl to talk to our database...
