Tuesday, January 30, 2007

Keywords and Clues

Back in the days of Usenet, people occasionally played a game on some of the book-related discussion lists that I followed. One person would post a transcript of a book index, and other people would try to guess the book. One that sticks in my mind for some reason had an index that included the following items. (See if you can guess the book before checking your answer here.)
  • Absent-mindedness
  • Aeneid, The
  • Alexander the Great
  • Amsterdam
  • Anti-semitism
  • Arthurian legend
  • Augustine, St.
  • Auto-erotism
  • Bed-wetting
  • Bills, forgetting to pay
  • Bosnia and Herzegovina, customs of the Turks in
  • Castor and Pollux
  • Cervantes
  • Chance and determinism
  • Cheating at cards
  • Day-dreams
  • Doctors, bungled actions by
  • Dreams
  • Education
  • Enuresis
  • ...
Now let's be wildly anachronistic. Imagine Dr. Freud in his forties, working on The Psychopathology of Everyday Life and "going to the Internet to do some research," as the TV / movie cliché has it. He sits at his laptop typing queries into Google: "bed-wetting OR enuresis site:edu", "+bosnia +herzegovina +turks", and so on. Meanwhile, pattern-matching agents are targeting him for advertisements: "Find out what causes bed-wetting", "Drug free in home bedwetting treatment program with counseling", "Searching for bed wetting alarms?", "Find a huge choice of medical cosmetic items that meet your needs!", "Cheap Sarajevo Flights", "Help Kosovo Orphans", and so on.

John Battelle coined the phrase "the database of intentions" to highlight the sheer amount of information that search histories contain about people's interests and desires. In some cases, these keywords can be revealing enough that they can be tied to specific individuals, as in the well-publicized case of AOL searcher 4417749. As the Freud example suggests, however, people search for a variety of reasons. A friend of mine once bought me a CD as a present, and then was irritated because Amazon kept recommending other music that he didn't like based on that purchase. Amazon has since introduced a gift-wrapping option that, I suspect, serves a double purpose: knowing which items are gifts helps their recommendation engine avoid such gaffes.

The problem is that it is rarely possible to read directly from one piece of evidence to one inference. Carlo Ginzburg's essay on "Clues" is particularly useful here. Starting with Sherlock Holmes, Freud and an art critic named Giovanni Morelli, Ginzburg traces the intellectual genealogy of what he calls the 'evidential' or 'venatic' paradigm: a mode of inference that works by integrating clues, telltale details, traces, signs, symptoms. Pattern-matching agents often fail because they do only what their name says: they match patterns in isolation rather than weighing clues against one another. An intelligent humanist, poring over the total collection of traces left by Dr. Freud in our whimsical example, won't assume there's a bed-wetting problem in Freud's family if there is no other evidence for one. In other words, sometimes the key piece of evidence is what's not there, rather than what is. This, of course, was also famously true for Holmes in "Silver Blaze." (For more, see Rita C. Manning, "Why Sherlock Holmes Can't Be Replaced by an Expert System," Philosophical Studies 51, no. 1 (1987): 19-28.)
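The difference between matching a pattern and weighing clues can be made concrete with a toy sketch in Python. Everything in it is an assumption made up for illustration: the hypothesis names, the trigger and corroboration keyword sets, and the function names are hypothetical, and no real advertising system is this simple. The naive matcher fires on any single keyword; the second one also notices when the supporting evidence is missing.

```python
# Hypothetical hypotheses, each with trigger keywords and the kind of
# corroborating keywords we would expect if the hypothesis were true.
HYPOTHESES = {
    "bed-wetting in the household": {
        "trigger": {"bed-wetting", "enuresis"},
        "corroboration": {"alarm", "mattress", "pediatrician"},
    },
    "planning a trip to Sarajevo": {
        "trigger": {"bosnia", "herzegovina"},
        "corroboration": {"flights", "hotel", "visa"},
    },
}


def tokens(queries):
    """Collect the lowercased words from all queries, minus + and quote marks."""
    words = set()
    for query in queries:
        for word in query.lower().split():
            words.add(word.strip('+"'))
    return words


def naive_match(queries):
    """Accept a hypothesis as soon as any trigger keyword appears."""
    seen = tokens(queries)
    return sorted(name for name, h in HYPOTHESES.items() if h["trigger"] & seen)


def corroborated_match(queries):
    """Accept a hypothesis only if a trigger AND some corroboration appear."""
    seen = tokens(queries)
    return sorted(
        name
        for name, h in HYPOTHESES.items()
        if h["trigger"] & seen and h["corroboration"] & seen
    )


# The two queries from the Freud example above.
freud_queries = [
    "bed-wetting OR enuresis site:edu",
    "+bosnia +herzegovina +turks",
]

print(naive_match(freud_queries))
print(corroborated_match(freud_queries))
```

On the two Freud queries, the naive matcher accepts both hypotheses while the corroborated matcher accepts neither, because the expected supporting clues never show up. This is still keyword matching, of course, and a long way from the integration of traces that Ginzburg describes; it only shows that absence can be treated as evidence.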

