Most professors get paid for some agreed-upon mix of research, teaching and service. I've spent most of my time in this blog making a case for the ways that web programming can and will change the research and teaching of history. This week, however, I had the unexpected pleasure of doing a little service-related hacking.
Like every graduate program in the province, our department is subject to periodic review by the Ontario Council on Graduate Studies. We thought it might be useful to find ways to assess the suitability of our library holdings for various kinds of historical projects. My colleague Allyson May had the good idea of checking books reviewed on various H-Net lists against our library holdings. Unlike reviews in regular journals, H-Net reviews tend to come out quickly, and thus are a better measure of recent literature of interest to historians. Of course, there are 170 H-Net discussion lists, some defunct, some with hundreds of book reviews over the last 13 years, and some with none. If you think this doesn't sound like something you'd want to do by hand, you're right.
In the following two hacks we automate the process. First, we go to the page of H-Net lists and get the name of each discussion list. Then we go to the review search page and, one by one, search for all of the book reviews published on each discussion list. We save the links to each individual book review. We then spider the reviews, scrape the ISBN out of each, and save it to a file. Next, we submit each of the ISBNs to our library catalogue and do a bit of rudimentary scraping to see whether each book is in the library. Finally, we accumulate the results in a spreadsheet for whatever further analysis is required.
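To make the middle of that pipeline concrete, here is a minimal Python sketch of the two offline steps: pulling ISBNs out of the text of a review, and tabulating whether each one is held. It is an illustration under stated assumptions, not the code used for the actual hacks. The regular expression assumes ISBNs appear as "ISBN" followed by hyphenated or unbroken digits, the function names are made up for this example, and `in_library` is a hypothetical stand-in for whatever catalogue lookup you have; the spidering and catalogue requests are left out entirely.

```python
import csv
import io
import re

# Assumed format: "ISBN", an optional "-10"/"-13" label and colon, then digits
# with optional hyphens. Real review pages may need a different pattern.
ISBN_PATTERN = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9\-]{8,15}[0-9Xx])")


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of a hyphen-free string."""
    if re.fullmatch(r"\d{9}[\dX]", isbn):
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(isbn))
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", isbn):
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(isbn))
        return total % 10 == 0
    return False


def extract_isbns(text):
    """Return the valid ISBNs found in text, in order, without duplicates."""
    found = []
    for match in ISBN_PATTERN.finditer(text):
        isbn = match.group(1).replace("-", "").upper()
        if is_valid_isbn(isbn) and isbn not in found:
            found.append(isbn)
    return found


def holdings_report(isbns, in_library):
    """Build CSV text with one row per ISBN and a yes/no 'held' column.

    in_library is a hypothetical callable: it takes an ISBN string and
    returns True if the catalogue has the book.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["isbn", "held"])
    for isbn in isbns:
        writer.writerow([isbn, "yes" if in_library(isbn) else "no"])
    return buffer.getvalue()
```

For example, `extract_isbns(review_text)` could be run over each saved review page, and the combined list passed to `holdings_report` along with a function that queries the catalogue. The resulting CSV text opens directly in a spreadsheet. Validating the check digit is a design choice that filters out mistyped or truncated numbers before they are sent to the catalogue.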
Tags: bibliography | hacking | h-net | spidering | web programming