Sunday, July 20, 2008

Towards a Computational History

[Cross-posted to Cliopatria & Digital History Hacks]

Given that relatively few of our colleagues are familiar with digital history yet--and that those of us who practice some form of it aren't sure what to call it: digital history? history and computing? digital humanities?--it may seem a bit perverse to start talking about computational history. Nevertheless, it's an idea that we need, and the sooner we start talking and thinking about it, the better.

From my perspective, digital history simply refers to the idea that many of our potential sources are now online and available on the internet. It is possible, of course, to expand this definition and tease out many of its implications. (For more on that, see the forthcoming interchange on "The Promise of Digital History" in the September 2008 issue of The Journal of American History.) To some extent we're all digital historians already, as it is quickly becoming impossible to imagine doing historical research without making use of e-mail, discussion lists, word processors, search engines, bibliographical databases and electronic publishing. Some day pretty soon, the "digital" in "digital history" is going to sound redundant, and we can drop it and get back to doing what we all love.

Or maybe not. By that time, I think, it will have become apparent that having networked access to an effectively infinite archive of digital sources, and to one another, has completely changed the nature of the game. Here are a few examples of what's in store.

Collective intelligence. Social software allows large numbers of people to interact efficiently and focus on solving problems that may be too difficult for any individual or small group. Does this sound utopian? Present-day examples are easy to find in massive online games, open source software, and even the much-maligned Wikipedia. These efforts all involve unthinkably complex assemblages of people, machines, computational processes and archives of representations. We have no idea what these collective intelligences will be capable of. Is it possible for an ad hoc, international, multi-lingual group of people to engage in a parallel and distributed process of historical research? Is it possible for a group to transcend the historical consciousness of the individuals who make it up? How does the historical reasoning of a collective intelligence differ from the historical reasoning of more familiar kinds of historian?

Machines as colleagues. Most of us are aware that law enforcement and security agencies routinely use biometric software to search through databases of images and video and identify people by facial characteristics, gait, and so on. Nothing precludes the use of similar software with historical archives. But here's the key point. Suppose you have a photograph of known provenance, depicting someone in whom you have an interest. Your biometric software skims through a database of historical images and matches your person to someone in a photo of a crowd at an important event. If the program is 95% sure that the match is valid, are you justified in arguing that your person was in the crowd that day?
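One way to think about that question is to ask what "95% sure" would have to mean. The sketch below assumes one particular reading, which is mine and not a property of any real biometric package: the program flags a true match 95% of the time and wrongly flags a non-match 5% of the time. The function name and the prior values are hypothetical, chosen only for illustration. Under those assumptions, Bayes' rule shows that the weight of the match depends heavily on how plausible it was beforehand that your person was in the crowd at all.

```python
def posterior_match(prior, sensitivity, false_positive_rate):
    """Probability the person really is in the photo, given a reported match.

    prior: how likely we thought it was, before running the software,
           that the person is in the photo (an assumption we must supply).
    sensitivity: chance the software reports a match when the person is there.
    false_positive_rate: chance it reports a match when the person is not there.
    """
    # Total probability of seeing a reported match at all.
    p_reported = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' rule: share of reported matches that are genuine.
    return sensitivity * prior / p_reported


if __name__ == "__main__":
    # Illustrative priors only: a long shot versus an even chance.
    for prior in (0.01, 0.5):
        print(prior, round(posterior_match(prior, 0.95, 0.05), 3))
```

With a prior of one in a hundred, the reported match still leaves the identification far from certain; with an even prior, it supports the claim much more strongly. The software's confidence is one piece of evidence to weigh alongside provenance and context, not a verdict.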

Archives with APIs. Take it a step further. Most online archives today are designed to allow human users to find sources and read and cite them in traditional ways. It is straightforward, however, for the creators of these archives to add an application programming interface (API), a way for computer programs to request and make use of archival sources. You could train a machine learner to recognize pictures of people, artifacts or places and turn it loose on every historical photo archive with an API. Trained learners can be shared amongst groups of colleagues, or subjected as populations to a process of artificial selection. At present, APIs are most familiar in the form of mashups, websites that integrate data from different sources on the fly. The race is on now to provide APIs for some of the world's most important online archival collections.
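To make the idea of an archive API a little more concrete, here is a minimal sketch of what a program-to-archive exchange might look like. Everything specific in it is an assumption made for illustration: the endpoint at archive.example.org, the parameter names, and the shape of the JSON response are hypothetical and do not describe any real archive. The sample response is canned, so the sketch runs without touching the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real archive's API.
BASE_URL = "https://archive.example.org/api/search"


def build_query_url(term, year_from, year_to, limit=10):
    """Build the URL a program would request to search the archive."""
    # Parameter names are assumptions, not a documented interface.
    params = {"q": term, "from": year_from, "to": year_to, "limit": limit}
    return BASE_URL + "?" + urlencode(params)


def parse_results(response_text):
    """Turn a JSON response into a list of (id, title) pairs."""
    payload = json.loads(response_text)
    return [(item["id"], item["title"]) for item in payload.get("results", [])]


# Canned response standing in for what such an archive might return.
SAMPLE_RESPONSE = '{"results": [{"id": "doc-001", "title": "Sample trial record"}]}'

if __name__ == "__main__":
    print(build_query_url("pickpocket", 1750, 1760))
    print(parse_results(SAMPLE_RESPONSE))
```

The point of the design is that the same request a person makes through a search box becomes something a program can issue thousands of times, and the structured response can be handed directly to a learner, a mashup, or another archive's tools.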

Models. Agent-based and other approaches from complex adaptive systems research are beginning to infiltrate the edges of the discipline, particularly amongst researchers more inclined toward the social sciences. Serious games appeal to a generation of researchers that grew up with not-so-serious ones. People who might once have found quantitative history appealing are now building geographic information systems. In every case, computational processes become tools to think with. I was recently at the Metropolis on Trial conference, loosely organized around the 120-million-word online archive of the Old Bailey proceedings. At the conference, historians talked and argued about sources and interpretations, of course, but also about optical character recognition and statistical tables and graphs and search results generated with tools on the website. We're not yet at a point where these discussions involve much nuanced analysis of layers of computational mediation... but it is definitely beginning.
