next up previous
Next: Kruskal Up: The programme Previous: DictTrie

Respell

This module uses the DictTrie and its own knowledge of sound-spelling rules to generate respellings for a misspelling.

The guesses are generated by assuming that each letter in the input is intended either to directly spell one or more phonemes, or to do so in conjunction with one or more other letters. Thus I assume the existence of correspondences like a = /a/, x = /ks/, and sh = /S/. The hypothesis of silent letters is ruled out, since there is effectively no way to distinguish a silent letter from a partner in a digraph. That is, one cannot say whether sh is a digraph, or whether s spells /S/ before a silent h. [Lucassen 1983] implicitly agrees on the issue, but went the other way, saying that there are no digraphs, just silent letters.

Respell contains (as include file RuleTrie.c) the full set of 1,053 context-free grapheme to phoneme correspondences found in the dictionary.gif These correspondences are stored in a trie which branches on the spelling, but is conceptually much like the DictTrie index. For example, to find the phonemic correspondences for the spelling ai, one begins at the root of the trie. One would take the a branch to get to the node for a, and take its i branch to get to the node for ai. This might have other continuation nodes, but it also has a termination pointer to arrays of pronunciations and the frequencies with which these pronunciations correspond to ai. These frequencies represent the number of times the sound-spelling correspondence occurs in the dictionary word types. It is often argued that such frequencies should be weighted by the frequency of the words themselves in written text, since that would represent how often the correspondence is encountered in daily life. I hypothesize however that in general humans analogize on the basis of word types rather than tokens. One rarely sees for example the letter f used to spell the /v/ sound in misspelt words, even though that correspondence appears in 4% of words in running English text (in the word of [Francis 1982]).

The Respell algorithm in presented and explained in the attached source code listings, but it is important enough to be sketched briefly here. The misspelt word is stored in global space while a recursive procedure Recur works its way through it with pointers. Briefly, each iteration of Recur matches part of the word with a pronunciation, based on a single sound-spelling correspondence, then recurses to match the next part of the word. On reaching the end of the misspelling, the word(s) matching that string of phonemes is looked up in the dictionary and passed on to the client. Recur also passes along the number of correspondences so far matched and the sum of their frequency, so that at the end the average frequency can be reported along with the spellings. As discussed under DictTrie above, pronunciations can be looked up in the dictionary one phoneme at a time by moving from node to node via MoveUpTrie. Respell therefore merely passes the current node on from one invocation to the next. As soon as the pronunciation it is generating begins to represent an initial segment not found in English, Respell can abort that path instead of following fruitlessly to the end of the word.

Of course, as so far described, Recur could be a simple iterative procedure instead of a recursive one. There are two complications which explain the need for the overhead of recursion. One is that for a given spelling segment, there may be more than one corresponding pronunciation, a well-known feature of English orthography. The use of recursion allows us to follow multiple paths through the dictionary index, branching at each alternative pronunciation. Thus Recur is called again for each pronunciation associated with a spelling. The other complication is that a spelling correspondence may have one or more letters, but there is no easy way to predict whether a known multiletter string should be parsed as a multiletter correspondence in a particular word (e.g., sh in bishop) or as simpler letter correspondences that just happen to be adjacent (e.g., sh in mishap). Recur addresses this problem by trying all possibilities at each recursion. For example, when it works its way to the sh in a word, it first checks out all pronunciations for s and calls itself recursively with them, then checks out all the pronunciations of sh and calls itself recursively with those. The need to look at spelling rules one letter at a time is just like the need to look up pronunciations one phoneme at a time, and explains why the correspondences are organized as a trie.

When Recur gets to the end of a misspelling, the initial phoneme segment generated may or might not be a complete word. For example, processing the spelling ka to the end will successfully generate /ka/ as a possible spelling, which won't have been pruned because it is an initial segment of words like cat and Kathy, but there won't actually be a word pronounced /ka/. Respell calls the LookUp routine of DictTrie to see if the word exists. If not, it simply returns quietly from that invocation, backtracking to pursue other paths. If the word does exist, Respell calls GetEntry, then calls the client's callback procedure with the resultant spelling and word class. It repeats this as long as it gets from GetEntry words with the same pronunciation.



next up previous
Next: Kruskal Up: The programme Previous: DictTrie



Brett Kessler
Wed Dec 27 22:16:48 PST 1995