Digital History Hacks (2005-08): April 2008

Thursday, April 10, 2008

Fitness Functions

[Cross-posted to Cliopatria & Digital History Hacks]

One of the distinctions that applied mathematicians make is between linear and nonlinear problems. In a linear problem, you have a set of variables that you can tweak, and as you adjust each variable you can get ever closer to an optimal configuration. Using techniques such as linear programming, it is straightforward to determine precisely how many scoops of raisins to put in your box of bran, or how many Cherries will make a Garcia. Many problems, alas, don't admit of this kind of solution. In the days before digital everything, it was all too common to futz around with the brightness knob, color balance, rabbit ears, and position of pets and small children to try and get a TV signal that didn't look like it was being relayed from the dark side of the moon. The slightest change could make things drastically better or worse, with no apparent logic.

The problem with nonlinear problems is that you pretty much have to get every variable right at the same time. Think of the space of all possible states of your problem as a kind of dark landscape, and the optimal solution as the highest point in that space. Linear problems have smooth landscapes. If you start groping your way up a hill, you end up at the top and that's the best you can do overall. Nonlinear problems have jagged landscapes. It is easy to feel your way up a low peak and get stuck there, unaware of higher peaks elsewhere.

There are different methods for solving nonlinear optimization problems; one of the more popular makes use of genetic algorithms. First you find a way of representing all of the possible solutions to your problem. In the TV example, you might want to represent the angle of each of the two antennas, the xy coordinates of the napping cat, the rotational angle of the brightness knob, and so on. A list of each of these variables is known as a genome, and a list of particular values as a genotype. Generate a small random population of genotypes, and test each one to see how good it is. This test is called the fitness function. In our example, it is the person sitting on the couch shouting "not bad," "pretty good" or "awful" each time an adjustment is made. Once you know how well each of your solutions performed, you make a new generation of solutions by mutating and recombining the genomes of your old ones. Over time, the fitness of the population increases, and the artificial selection mechanism eventually finds solutions that are near optimal. (If you want to start programming your own GAs, I recommend Mitchell's Introduction and Goldberg's Genetic Algorithms as good places to start).

One of the perennial tragedies of academia is that we constantly pretend that our careers or those of our students are linear optimization problems. Grades are the most obvious way that we do this. Students learn that their mark on one test is independent of their mark on another, that it is better to have a high GPA than to risk taking hard courses that interest them, that exploration and failure will usually be punished. Teachers justify marks by appealing to rubrics, bemoaning grade inflation and students "who look good on paper." Too many of us think of a good career in terms of lines on a CV, a list of so many independent accomplishments, each of which can be attained and then forgotten.

On a rainy day in 1992, I wandered into a Vancouver technical bookstore on my way home from school. I think I was probably avoiding a problem set or some other homework, as I've never been very good at doing what I should be doing rather than what I want to be doing. Anyway, I remember finding a copy of John Holland's Adaptation in Natural and Artificial Systems on the shelf of new releases and really wanting to buy it. I stood in the store holding the book for the longest time. It was more than I could afford, it was a distraction from my school work, I had a bad habit of buying books and losing interest in them. I had been doing a lot of exploring and a fair bit of failing. I finally made the decision that was, in context at least, sub-optimal. I bought the book and went home to read it rather than doing my schoolwork.

I often tell my students that they should follow their curiosity, take chances and not be afraid to fail. You never really know what whim, what chance encounter or distraction is going to change your life. In my case, I read a lot of science fiction and graphic novels and ate a lot of guacamole. I played role playing games and got married early and happily. I watched TV. I got bad grades in linear algebra and analysis, but I liked math enough to keep trying until I got better at it. And my first published work was on a subject that was novel and trendy enough that my reputation as an up-and-coming researcher outweighed my uneven transcript: genetic algorithms. It's tempting to look back at that moment in the bookstore as a crucial inflection point in my life, but that would be too linear. The choices that we make affect our fitness, but never in a way that makes it easy to assign credit or blame.

Tags: feedback | genetic algorithms | nonlinear optimization | pedagogy

Saturday, April 05, 2008

Visualizing the Emergence of a Strategic Knowledge Cluster

In the summer of 2004, when I had just arrived at the University of Western Ontario, my new colleague Alan MacEachern invited me to join a small group that was putting together a grant application. The federal agency SSHRC had just announced funding for the design of something called 'research clusters'. At the time none of us was particularly clear what these clusters were supposed to be, and like many of the best kinds of opportunity, I don't think that SSHRC was really clear either. We eventually settled on the idea that the main task of clusters was 'knowledge mobilization', which left the matter nicely open.

Our initial grant application was successful, and five of us set to work to develop NiCHE, the Network in Canadian History & Environment / Nouvelle initiative canadienne en histoire de l'environnement. As we tried various things we kept track of activities and participants, allowing us to visualize the emergence of our research network. I should say up front that NiCHE doesn't cause research and is prohibited from directly funding research per se. Instead we find ways to facilitate research and training in environmental history broadly construed, and to mobilize the knowledge that researchers create.

One of the tools that we use for visualization is an open source package called Graphviz. We create a file that specifies entities (people, publications, field trips, etc.) and the relationships between them, then we hand off that file to Graphviz, which uses sophisticated algorithms to figure out a neat way to plot the network. We've found such visualization to be very useful, even though it can only ever show the tip of a much larger social iceberg. In our graphs, two people may be linked because they attended the same meeting or each published a chapter in a book. Our data doesn't show whether they knew each other in grad school, have a longstanding rivalry, or both secretly like Buffy the Vampire Slayer.

The original NiCHE executive group worked quite closely together. One of the interesting facts about networks is that the number of possible pairwise relations between entities grows much faster than the number of entities as the network gets larger. Two people have at most one relationship, three people can have three (AB, BC, AC), four people can have six (AB, AC, AD, BC, BD, CD). The ten possible pairwise relationships between the five of us looked like this:

One of the first things that we tried to do was provide licenses for Groove collaborative software to all of the people who were interested in joining NiCHE. For people with Windows machines the software worked very well. Unfortunately, it never really worked for people with Macs. We had to supplement Groove with other software, find suboptimal workarounds, and eventually abandon it. For a while, however, it gave us a way to interact relatively closely with NiCHE members who also happened to be tech-savvy Windows users. Our network took on a hub-and-spoke form.

To reach out to more potential participants, we formed an advisory group and held a meeting in Toronto. Instead of one hub, we now had two, with some bridging members who participated in both online and face-to-face activities.

The executive group split up to host regional meetings in other cities across Canada.

We put together an online directory so members could add information about themselves. The directory allowed us to contact people and tell them about upcoming activities. Since it was publicly accessible, the directory also allowed NiCHE members to learn more about one another.

Although adding one's name to a directory is a relatively weak form of participation, we found that many people became more active in NiCHE over time. The network seemed to extend to new participants, many of whom would then get involved in a number of subsequent projects. There is a saying in free / open source software, "contribute nothing, expect nothing." Conversely we could say that the people who contributed something to NiCHE could expect something from us. Some of them contributed articles to a special issue of the journal Environmental History. Some contributed chapters to a new textbook, Method and Meaning in Canadian Environmental History.

Subsequent activities like a summer school and a graduate student workshop brought in some new participants, and brought back many more:

When SSHRC announced a much larger grant for strategic knowledge clusters, we were able to include a version of the last figure as part of our application. (The Graphviz script that generated it is here.)

A year and half later, we're in the process of scaling up NiCHE activities by a couple of orders of magnitude. Network visualization gives us some insight into the work of a few hundred people who are loosely affiliated with NiCHE and collaborating in many different ways. We can identify people who have energy and initiative to share, and try to help them. Some provide 'bonding capital', tying tightly-linked groups closer together. Some provide 'bridging capital', mobilizing knowledge from one region or disciplinary specialization to another. We can also be more strategic about developing the connections that still need to be made, to make our network stronger and more effective. (For more about social networks, see Clay Shirky's new Here Comes Everybody.)

What is more exciting is that we are getting closer to the point where we can make these kind of tools available to everyone in NiCHE. People will be able to enter their own information about research collaborations and interests, and explore social connections within the network. It will become much easier to find joint acquaintances to make introductions or to find people with particular skills or expertise.

Tags: Graphviz | social network analysis | SSHRC (Social Sciences and Humanities Research Council of Canada) | visualization

Digital History Hacks (2005-08)

Thursday, April 10, 2008

Fitness Functions

Saturday, April 05, 2008

Visualizing the Emergence of a Strategic Knowledge Cluster

William J. Turkel

Blog Archive

The Programming Historian

Digital Historians / Humanists

Digital History / Humanities

Hacking