
The problem

Attempts at automatic spelling correction are dominated by transformations and comparisons directly between misspellings and candidate respellings. Alberga [Alberga 1967] gives a thorough overview of many such approaches (see also [Pollock 1984] for a more recent project in this vein), and reports experiments with a 98% success rate in using such techniques to verify whether a misspelt word is an attempt at a particular target spelling. Automated spelling correctors are in general not nearly so good, in part because it would be prohibitively expensive to apply sophisticated string comparisons between a misspelling and all the words in a dictionary. But an influential paper [Damerau 1964], following an early analysis by Gates [Gates 1937], showed that a large majority of errors could be corrected by the insertion, deletion or substitution of a single letter, or the transposition of a single pair of letters: the sort of transformations which can be efficiently processed automatically. As a result of the popularity of this successful model, spelling correctors tend to be better at correcting mechanical keystroke errors (which the typist could easily have corrected without any help beyond flagging the error) than at fixing correctly typed misspellings which one can recognize as an attempt to spell out the pronunciation of a word. For example, the widely used Unix spelling corrector ispell can easily correct the form hpantom to phantom, but not the misspelling fantom.
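The single-error model described above can be illustrated with a minimal sketch. It is not ispell's implementation; the function names and the three-word dictionary are assumptions made purely for illustration. It generates every string one insertion, deletion, substitution or adjacent transposition away from the misspelling and keeps those found in the dictionary.

```python
import string


def single_edit_variants(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one single-letter edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    transpositions = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutions = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    insertions = {a + c + b for a, b in splits for c in alphabet}
    return (deletions | transpositions | substitutions | insertions) - {word}


def suggest_single_edit(misspelling, dictionary):
    """Dictionary words reachable from the misspelling by one edit."""
    return sorted(single_edit_variants(misspelling) & dictionary)


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"phantom", "receive", "believe"}

print(suggest_single_edit("hpantom", toy_dictionary))  # ['phantom']
print(suggest_single_edit("recieve", toy_dictionary))  # ['receive']
print(suggest_single_edit("fantom", toy_dictionary))   # []
```

The form fantom is left uncorrected because reaching phantom takes two edits (substituting p for f and inserting h), even though the two spellings sound the same.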

Such systems have their niche, partly because selecting a correction from a menu can be faster than editing text, and partly because many true (conceptual) spelling errors are indistinguishable from such mechanical errors. For example, if the computer analyzes the misspelling recieve as a simple mechanical transposition error, the user is still grateful for the correction, even if what really occurred was a spelling error influenced by analogy with believe and a host of similar words. But one might wonder whether a large class of spelling errors could be handled more effectively if the programme modeled the processes which led the human to make the error. Although some misspellings surely arise from misremembering the details of a memorized spelling (and therefore may be similar in many ways to keystroke errors), others are clearly the product of trying to spell out pronunciations, based on sound-to-spelling correspondences the speller is familiar with from other words. An unpublished PhD thesis cited in [Alberga 1967], namely [Masters 1927], reported that in a study of spelling errors made by pupils and students from eighth grade through college, 65% were phonetically correct and another 14% nearly so. Discouragingly, Alberga goes on to report low performance when he followed up an unpublished suggestion by H. B. Savin of the University of Pennsylvania to do just the sort of sound-spelling analysis I am suggesting here: only 64% of the misspellings he paired with the correct spelling were verified as matches by his programme. Alberga unfortunately gives few details of his algorithm, perhaps in part because it performed much worse than every other algorithm he tried, but this result does seem to confirm that only about two thirds of all spelling errors use normal sound-spelling correspondences. I nevertheless undertook to develop an orthographically based spelling corrector. The intent was partly to verify Alberga's results in actual use (he simply asked the programme to verify whether two words might be intended to be the same). But even if the low results were sustained, this approach might cover a different set of words than traditional spelling correctors, and so might complement them.

In summary, my approach assumes that the user is aware of the basic rules of English orthography, but may be vague on some of the exceptions and idiosyncrasies of particular words or morphemes. If the user types a j, this approach does not assume that this was a random insertion, or possibly a substitution for h or k or some other letter, but rather that the user meant to spell some sound that is in some context spelt by a j, viz., /j/ (or conceivably /y/ as in hallelujah). In the intended word, the correct spelling of that /j/ may turn out to be g, dg or d as well as j. The system I have developed is aware of the true pronunciation of English words, has a full inventory of the sound-spelling correspondences used in English orthography, and knows how often such correspondences are regularly employed in English spelling. On encountering a misspelling, it generates a list of possible respellings and attempts to position the most likely guesses at the top. A prototype called correct is now available and has undergone preliminary evaluation of its effectiveness.
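The general idea of matching by sound rather than by letter edits can be sketched as follows. This is a toy illustration and not the algorithm used in correct: the correspondence table, the phoneme symbols (with "@" standing in for an unstressed vowel), the two-word lexicon and all names are assumptions invented for the example, and the frequency-based ranking described above is omitted.

```python
# Hypothetical toy table: each grapheme maps to the sounds it may spell.
SPELLS = {
    "f": {"f"}, "ph": {"f"}, "p": {"p"}, "h": {"h"},
    "a": {"a"}, "e": {"e"}, "o": {"o", "@"},
    "n": {"n"}, "t": {"t"}, "m": {"m"},
    "j": {"j"}, "g": {"g", "j"}, "dg": {"j"}, "d": {"d", "j"},
}

# Hypothetical toy lexicon: word -> pronunciation as a tuple of sounds.
LEXICON = {
    "phantom": ("f", "a", "n", "t", "@", "m"),
    "gem": ("j", "e", "m"),
}


def pronunciations(spelling):
    """Yield every sound sequence the spelling could stand for under SPELLS."""
    if not spelling:
        yield ()
        return
    for grapheme, sounds in SPELLS.items():
        if spelling.startswith(grapheme):
            for rest in pronunciations(spelling[len(grapheme):]):
                for sound in sounds:
                    yield (sound,) + rest


def suggest_by_sound(misspelling):
    """Lexicon words whose pronunciation the misspelling could be spelling."""
    candidates = set(pronunciations(misspelling))
    return sorted(w for w, pron in LEXICON.items() if pron in candidates)


print(suggest_by_sound("fantom"))   # ['phantom']
print(suggest_by_sound("jem"))      # ['gem']
print(suggest_by_sound("hpantom"))  # []
```

In this sketch the typed j in jem is read as the sound /j/, which the lexicon entry gem happens to spell with g. A keystroke error such as hpantom, by contrast, does not spell out any pronunciation in the lexicon, which suggests how a sound-based method might complement a single-edit one.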

The next chapter of this report describes the product and its implementation, and Chapter 3 presents the results of running evaluation suites on it. Chapter 4 concludes the report with some details on how the system was developed.


Brett Kessler
Wed Dec 27 22:16:48 PST 1995