It's been quite a while since I posted a hack, so I decided to do something with the reading list. Suppose you're interested in digital history and you want to know what other books I might've included on the list, or what similar books someone else might have recommended. Is there any way to automate the process of searching for similar books? Sure.
The first step is to get a complete set of recommendations via the Amazon API. We pass my blog post through a simple scraper to get the ASIN (Amazon Standard Identification Number) for each book on the list. In a loop, we then submit each ASIN to Amazon and get back an XML file that includes the ASINs of recommended books. From these we build a big list of (ASIN, ASIN) pairs. Each pair is a recommendation: customers who bought the first book also bought the second one. Since we will want to play with these data later on, we use Python's standard pickle module to save them to disk. Python source code for the first step is here.
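To make the shape of the data concrete, here is a minimal sketch of the pair-building and pickling part of this step. It is not the linked source code. It assumes the XML responses have already been fetched and that each recommended book appears as a `<SimilarProduct>` element containing an `<ASIN>` child; that layout is simplified (no XML namespaces), and the function names and placeholder ASINs are made up for illustration.

```python
import pickle
import xml.etree.ElementTree as ET


def recommendation_pairs(responses):
    """Build (source ASIN, recommended ASIN) pairs.

    `responses` maps each ASIN from the reading list to the XML text
    returned for it. The <SimilarProduct>/<ASIN> layout is an assumption.
    """
    pairs = []
    for source_asin, xml_text in responses.items():
        root = ET.fromstring(xml_text)
        for product in root.iter("SimilarProduct"):
            asin = product.findtext("ASIN")
            if asin:
                pairs.append((source_asin, asin.strip()))
    return pairs


def save_pairs(pairs, path):
    """Pickle the list of pairs to disk so later steps can reuse it."""
    with open(path, "wb") as f:
        pickle.dump(pairs, f)


# Placeholder data standing in for one real response (hypothetical ASINs).
sample_responses = {
    "A000000001": (
        "<Item><ASIN>A000000001</ASIN><SimilarProducts>"
        "<SimilarProduct><ASIN>B000000002</ASIN></SimilarProduct>"
        "<SimilarProduct><ASIN>B000000003</ASIN></SimilarProduct>"
        "</SimilarProducts></Item>"
    )
}
pairs = recommendation_pairs(sample_responses)
```

Calling `save_pairs(pairs, "recommendations.pkl")` would then write the list to a file of that (arbitrary) name for the second step to pick up.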
Now that we've got a big list of recommendations, the second step is to concentrate on the books that come up most frequently. We unpickle our data, then create a list of recommended books and filter out any that already appear on the original list. We count the number of times each remaining book is recommended and sort by that count, so the most frequently recommended books are at the top of the frequency list. Python source for the second step is here.
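The counting and sorting can be sketched in a few lines. Again, this is an illustration and not the linked source: it uses `collections.Counter` from the standard library, the function names are my own, and the `min_count` cutoff is an added convenience for trimming the list to books recommended at least a given number of times.

```python
import pickle
from collections import Counter


def load_pairs(path):
    """Unpickle the (ASIN, ASIN) pairs saved by the first step."""
    with open(path, "rb") as f:
        return pickle.load(f)


def frequency_list(pairs, original_asins, min_count=1):
    """Return (ASIN, count) tuples, most frequently recommended first.

    Books already on the original list are filtered out, and books
    recommended fewer than `min_count` times are dropped.
    """
    original = set(original_asins)
    counts = Counter(rec for _, rec in pairs if rec not in original)
    return [(asin, n) for asin, n in counts.most_common() if n >= min_count]


# Toy data: A, B and C are on the original list; X and Y are new books.
toy_pairs = [
    ("A", "X"), ("B", "X"), ("A", "Y"),
    ("B", "A"), ("C", "X"), ("C", "Y"),
]
ranked = frequency_list(toy_pairs, ["A", "B", "C"])
# ranked == [("X", 3), ("Y", 2)]
```

With the real data, something like `frequency_list(load_pairs("recommendations.pkl"), original_asins, min_count=4)` would keep only the books recommended four or more times; the file name and `original_asins` are placeholders here.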
(In the two steps above, I've linked to earlier posts on doing simple tasks with Python. If you compare those posts with the source code presented here, you can see how I modified code that I had already written to solve new problems.)
So how well did it do? All of the books that were recommended four or more times appear in the list below. I have most of them and think any could easily have been included on the original list.
- Anderson, Chris. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, 2006. [10 recommendations]
- Baker, Nicholson. Double Fold: Libraries and the Assault on Paper. Vintage, 2002. [4]
- Berry, Michael, ed. Survey of Text Mining: Clustering, Classification and Retrieval. Springer, 2003. [6]
- Berry, Michael and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, 2005. [6]
- Campbell-Kelly, Martin and William Aspray. Computer: A History of the Information Machine, 2nd ed. Westview, 2004. [4]
- Greenfield, Adam. Everywhere: The Dawning Age of Ubiquitous Computing. Peachpit, 2006. [10]
- Hiltzik, Michael A. Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age. Collins, 2000. [4]
- Hutchins, Edwin. Cognition in the Wild. MIT, 1996. [4]
- Jackson, Peter and Isabelle Moulinier. Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization. John Benjamins, 2002. [4]
- Jenkins, Henry. Fans, Bloggers, and Gamers: Media Consumers in a Digital Age. NYU, 2006. [4]
- Kuhn, Thomas S. The Structure of Scientific Revolutions, 3rd ed. Chicago, 1996. [4]
- Lanham, Richard A. The Electronic Word: Democracy, Technology, and the Arts. Chicago, 1995. [5]
- Litman, Jessica. Digital Copyright: Protecting Intellectual Property on the Internet. Prometheus, 2000. [5]
- Moggridge, Bill. Designing Interactions. MIT, 2006. [4]
- Ryan, Marie-Laure. Narrative as Virtual Reality: Immersion and Interactivity in Literature and Electronic Media, new ed. Johns Hopkins, 2003. [4]
- Standage, Tom. The Victorian Internet. Berkeley, 1999. [6]
- Suchman, Lucy A. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge, 1987. [4]
- Sunstein, Cass R. Infotopia: How Many Minds Produce Knowledge. Oxford, 2006. [4]
- Vaidhyanathan, Siva. The Anarchist in the Library: How the Clash between Freedom and Control is Hacking the Real World and Crashing the System, new ed. Basic, 2005. [4]
- Waldrop, M. Mitchell. The Dream Machine: J.C.R. Licklider and the Revolution that Made Computing Personal. Penguin, 2002. [4]
- Wardrip-Fruin, Noah and Pat Harrigan, eds. First Person: New Media as Story, Performance, and Game. MIT, 2004. [4]