Thursday, January 25, 2007

Exploratory Bibliography

A couple of weeks ago, I posted a reading list for a graduate exam in digital history. These kinds of lists customarily run to around 80-100 books, so I didn't have room to include everything that's in my digital history library, or even everything that seemed essential (the list is quite light on programming books, for example). Some of my students and colleagues were kind enough to send me suggested additions, which I may include in a subsequent post.

It's been quite a while since I posted a hack, so I decided to do something with the reading list. Suppose you're interested in digital history and you want to know what other books I might've included on the list, or what similar books someone else might have recommended. Is there any way to automate the process of searching for similar books? Sure.

The first step is to get a complete set of recommendations via the Amazon API. We pass my blog post through a simple scraper to get the ASIN (Amazon Standard Identification Number) for each book on the list. In a loop, we then submit each ASIN to Amazon and get back an XML file that includes the ASINs of recommended books. From these we build a big list of (ASIN, ASIN) pairs. Each pair is a recommendation: customers who bought the first book also bought the second one. Since we will want to play with these data later on, we use a high-level Python module called pickle to save them to disk. Python source code for the first step is here.
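For readers who just want the shape of the first step, here is a minimal sketch rather than the linked source. It assumes that the book links in the post carry a ten-character ASIN after /ASIN/, /dp/ or /gp/product/, and it leaves the Amazon request itself to a hypothetical lookup_similar function that you would have to supply. The names extract_asins, collect_pairs and save_pairs are made up for illustration.

```python
import pickle
import re

# Assumption: each Amazon link contains a 10-character ASIN right after
# /ASIN/, /dp/ or /gp/product/. Other link styles would need other patterns.
ASIN_PATTERN = re.compile(r"/(?:dp|gp/product|ASIN)/([A-Z0-9]{10})")


def extract_asins(html):
    """Return the ASINs found in the page, in order, without duplicates."""
    asins = []
    for asin in ASIN_PATTERN.findall(html):
        if asin not in asins:
            asins.append(asin)
    return asins


def collect_pairs(asins, lookup_similar):
    """Build a list of (ASIN, recommended ASIN) pairs.

    lookup_similar is a hypothetical callable standing in for the Amazon
    request: it takes one ASIN and returns a list of recommended ASINs.
    """
    pairs = []
    for asin in asins:
        for recommended in lookup_similar(asin):
            pairs.append((asin, recommended))
    return pairs


def save_pairs(pairs, path):
    """Pickle the list of pairs to disk so later steps can reuse it."""
    with open(path, "wb") as f:
        pickle.dump(pairs, f)
```

Keeping the network call behind a function argument means the scraping and pickling can be tried out with a stub before any real requests are made.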

Now that we've got a big list of recommendations, the second step is to concentrate on the books that come up most frequently. We unpickle our data, then create a list of recommended books and filter out any that appear on the original list. We count the number of times each remaining book is recommended and sort the counts to create a frequency list, with the most frequently recommended books at the top. Python source for the second step is here.
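The counting step might look something like the sketch below. Again, this is not the linked source. It assumes the pickled file holds a list of (ASIN, recommended ASIN) tuples, and the names load_pairs and frequency_list are invented for the example. It also leans on collections.Counter, a convenience of more recent Python versions.

```python
import pickle
from collections import Counter


def load_pairs(path):
    """Unpickle the list of (ASIN, recommended ASIN) pairs."""
    with open(path, "rb") as f:
        return pickle.load(f)


def frequency_list(pairs, original_asins, min_count=1):
    """Count how often each new book is recommended, most frequent first.

    Books already on the original list are filtered out, and books
    recommended fewer than min_count times are dropped.
    """
    original = set(original_asins)
    counts = Counter(rec for _, rec in pairs if rec not in original)
    return [(asin, n) for asin, n in counts.most_common() if n >= min_count]
```

Calling frequency_list(pairs, original_asins, min_count=4) would keep only the books recommended four or more times.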

(In the two steps above, I've linked to earlier posts on doing simple tasks with Python. If you compare those posts with the source code presented here, you can see how I modified code that I had already written to solve new problems.)

So how well did it do? All of the books that were recommended four or more times appear in the list below. I have most of them and think any could easily have been included on the original list.