
Kruskal

Inasmuch as the theory behind correct is that misspellings are just perturbations in the normal process of generating spellings by applying the rules of English sound-spelling correspondences, one might hypothesize that misspellings are more likely to be fairly similar to the correct word than widely different. If there is a certain probability that a writer will choose the wrong spelling correspondence for any particular phoneme in the word, then the probability that several or all of the phonemes will be misspelled should be smaller. Indeed, [Pollock 1984] reports that in keycoded text, 90-95% of all misspellings contain only one error. Some measure of the distance between a misspelling and a candidate respelling would therefore seem to be a good way of ranking several candidate respellings.

The module Kruskal calculates the Levenshtein distance between two words. It follows the algorithm set forth by Kruskal in [Sankoff 1983], under the simple assumption that insertions, deletions, and substitutions all carry equal weight. It returns an integer giving the minimum number of such operations needed to transform one word into the other.
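As an illustration only, here is a minimal sketch in Python of a unit-cost Levenshtein computation under the same equal-weight assumption. It uses the standard dynamic-programming recurrence and keeps just two rows of the table. The function name `levenshtein` is hypothetical; this is not the source of the Kruskal module, whose implementation language and interface are not described here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of unit-cost insertions, deletions, and
    substitutions needed to turn string a into string b.
    (Hypothetical helper, not the Kruskal module's actual code.)"""
    # prev[j] = distance between the first i-1 letters of a and the first j letters of b.
    # Row 0: turning the empty prefix of a into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[len(b)]
```

For example, `levenshtein("recieve", "receive")` is 2: because all three operations have equal weight and no transposition operation is assumed, swapping two adjacent letters counts as two substitutions. Candidate respellings could then be ranked with something like `sorted(candidates, key=lambda w: levenshtein(misspelling, w))`.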



Brett Kessler
Wed Dec 27 22:16:48 PST 1995