The results for the accuracy measure are shown below, in the form of a bar graph rather than the scatterplot-style figure we used before. In this graph and the next one, we can see that the performance of the learner is about as good for data that it hasn't seen (i.e., the 1840s trials) as it is for the data that were used to train it. Most of the measures are around two or less, which is comparable to what we saw before. The performance has actually improved for many of the offence categories, like assault, fraud, perjury, conspiracy, kidnapping, receiving and robbery. We do notice, however, some performance degradation for a number of sexual offences, including sexual assault with sodomitical intent, bigamy, indecent assault, rape and sodomy. This might be a statistical anomaly. On the other hand, it might be a sign that the language that was used to describe sexual offences changed somewhat in the 1840s, causing a learner trained on 1830s data to miss later cases. This is one of the ways that tools like machine learning can be used to generate new research questions.

The next figure shows the results for the precision measure. In general the learner makes more false positive errors than misses, which is exactly what we want, given that the false positives can be useful in themselves. We don't see quite the same clear difference between sexual and non-sexual offence categories that we saw with the accuracy measure ... and for some reason it is quite hard for our learner to pick out cases of perverted justice in the 1840s.

Tags: archive | data mining | digital history | feature space | machine learning | text mining