Compression can also be used to automatically refine searches. Suppose you are interested in the explorer Martin Frobisher. If you type "Frobisher" into Yahoo! some of the first few pages of hits are relevant and some are not. Usually you have to wade through the results (or specify more search keywords and hope you don't eliminate something interesting by being too specific.)
An alternate strategy is to enter a broad search keyword (e.g., "Frobisher") and use the NCD to automatically compare the summary that Yahoo! returns for each hit with a "probe" text such as Frobisher's DCB entry. A short Python program to do exactly that is listed here. The search engine results can then be ranked according to increasing NCD from the probe text.
The figure below shows the first 31 of 50 hits for "Frobisher" before and after this search refinement process. I used red font to indicate the irrelevant results. As can be seen, this use of compression and a probe text does a good job of floating the relevant hits to the top of the pile.

Tags: data compression | data mining | Dictionary of Canadian Biography | digital history | Kolmogorov complexity | search | visualization