
From the graph, it is pretty clear that larceny, the best-attested offence we looked at, is the easiest to learn to categorize. We can also see that the TFIDF-15 learner does particularly poorly, missing many instances of the less frequent offences. Increasing the number of features the learner can make use of seems to improve performance up to a point. After that, adding more features increases the number of false positives the learner makes. We want the performance of our learner to be relatively robust whether an offence category is frequently or infrequently attested, which means we want the learner with the tightest grouping of results for these test categories (in other words, TFIDF-50).
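
One rough way to put a number on "tightest grouping" is sketched below. It assumes each learner's result for an offence category can be summarised as a precision/recall pair computed from counts of hits, false positives and misses; the graph itself may use different axes. The function names, the learner and category labels, and all of the counts are made up for illustration and are not results from our experiment.

```python
from statistics import pstdev


def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts; 0.0 if a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def spread(points):
    """Sum of the population std devs of precision and recall.

    A smaller value means the per-category results sit closer together.
    This is one possible measure of "tightness", chosen here as an assumption.
    """
    precisions = [p for p, _ in points]
    recalls = [r for _, r in points]
    return pstdev(precisions) + pstdev(recalls)


# Hypothetical (hits, false positives, misses) per category. NOT real results.
toy_counts = {
    "learner_a": {"frequent": (90, 10, 10), "rare": (10, 0, 40)},
    "learner_b": {"frequent": (85, 15, 15), "rare": (35, 10, 15)},
}

spreads = {
    name: spread([precision_recall(*counts) for counts in cats.values()])
    for name, cats in toy_counts.items()
}
tightest = min(spreads, key=spreads.get)
print(tightest, spreads)
```

With these toy numbers, the learner that misses most of the rare category ends up with a much larger spread, which is the pattern described above for a learner with too few features.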
Note that in this test, we only ran each learner once on each data set, rather than doing ten-fold cross-validation. Our experiments with cross-validation suggested that the different versions of the learner were relatively insensitive to the order in which training and testing trials were presented. Since this is exploratory work, we will make the (possibly incorrect) assumption that a single trial is representative. This will let us do a lot more testing in the same amount of time.
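
For comparison, here is a minimal sketch of the difference between ten-fold cross-validation and the single-trial shortcut. It is not the code we used. The helper names are hypothetical, and `train_and_score` stands in for whatever function trains a learner on one list of trials and scores it on another.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k folds over a shuffled copy of items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds, round-robin.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(items, train_and_score, k=10, seed=0):
    """Average train_and_score(train, test) over all k folds."""
    scores = [train_and_score(train, test)
              for train, test in k_fold_splits(items, k, seed)]
    return sum(scores) / len(scores)


def single_trial(items, train_and_score, k=10, seed=0):
    """Score only the first fold: roughly a k-th of the work."""
    train, test = next(k_fold_splits(items, k, seed))
    return train_and_score(train, test)


# Placeholder scorer (an assumption): just reports the test fraction.
def dummy_score(train, test):
    return len(test) / (len(train) + len(test))


print(cross_validate(range(100), dummy_score))
print(single_trial(range(100), dummy_score))
```

The single trial costs about a tenth as much as the full cross-validation, which is where the extra testing time comes from. The price is that we never see how much the score varies from fold to fold.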