Dictionary index

The next task is to build the index file. Because an index trie can use large amounts of memory, this is not done all at once. Rather, the programme splitInv first derives from oneDict a set of 39 smaller dictionaries, one for each phoneme, assigning each word to a file according to its initial phoneme. For each of those text files, new-make-trie then constructs a trie index. Each index is in the format that correct requires, but it would be inconvenient to have to manipulate 39 index files, so the programme collapseDict joins them into one large file.

The truly curious are invited to look at the source files for further details. The basic idea is that new-make-trie builds a trie in memory, then prints it out; the only trick is that it prints all children before their parents, so that by the time a parent is printed, its children's locations are already known and can be stored as continuation pointers. collapseDict constructs a new master root node; otherwise the tries for the subdictionaries are simply copied, with an offset added to the continuation and termination pointers to account for the nodes and dictionary entries that were inserted before the subtrie.
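The two steps can be illustrated with a small Python sketch. It is not the code of new-make-trie or collapseDict, and it does not reproduce the index format that correct reads: the record layout (a pair of continuation pointers and termination pointers held in Python lists) and the names build, serialize, collapse and lookup are all assumptions made for illustration. It also assumes that every key is non-empty and that each subdictionary covers a distinct initial symbol, so that the root of each subtrie can be dropped and replaced by the master root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """In-memory trie node (hypothetical layout, for illustration only)."""
    children: dict = field(default_factory=dict)  # symbol -> Node
    entries: list = field(default_factory=list)   # entries whose key ends here


def build(pairs):
    """Build an in-memory trie from (key, entry) pairs."""
    root = Node()
    for key, entry in pairs:
        node = root
        for sym in key:
            node = node.children.setdefault(sym, Node())
        node.entries.append(entry)
    return root


def serialize(root):
    """Flatten a trie into (nodes, entries), children before parents.

    Each node record is (continuation, termination): continuation maps a
    symbol to the index of the child record, termination lists indices
    into entries. The root is always the last record.
    """
    nodes, entries = [], []

    def emit(node):
        # Emit every child first, so its index is known to the parent.
        cont = {sym: emit(child) for sym, child in sorted(node.children.items())}
        term = []
        for entry in node.entries:
            term.append(len(entries))
            entries.append(entry)
        nodes.append((cont, term))
        return len(nodes) - 1

    emit(root)
    return nodes, entries


def shift(record, node_offset, entry_offset):
    """Return a copy of a record with both kinds of pointer offset."""
    cont, term = record
    return ({s: i + node_offset for s, i in cont.items()},
            [t + entry_offset for t in term])


def collapse(subtries):
    """Join serialized subtries under one new master root (assumed scheme)."""
    nodes, entries, root_cont = [], [], {}
    for sub_nodes, sub_entries in subtries:
        node_offset, entry_offset = len(nodes), len(entries)
        # Copy everything except the subtrie's own root, which is last.
        for record in sub_nodes[:-1]:
            nodes.append(shift(record, node_offset, entry_offset))
        # The old root's children become children of the master root.
        root_cont.update(shift(sub_nodes[-1], node_offset, entry_offset)[0])
        entries.extend(sub_entries)
    nodes.append((root_cont, []))  # the master root is again the last record
    return nodes, entries


def lookup(nodes, entries, key):
    """Follow continuation pointers from the root; return matching entries."""
    index = len(nodes) - 1
    for sym in key:
        cont, _ = nodes[index]
        if sym not in cont:
            return []
        index = cont[sym]
    return [entries[t] for t in nodes[index][1]]
```

To try it, group some (key, entry) pairs by the first symbol of the key, call serialize(build(group)) on each group, pass the resulting list to collapse, and query the joined index with lookup. Because each subtrie's root is its last record, dropping it leaves the indices of the remaining records unchanged, which is why a single offset per subtrie is enough.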



Brett Kessler
Wed Dec 27 22:16:48 PST 1995