Many of the ideas in this programme were inspired by discussions in a course taught at Stanford University by Jared Bernstein of SRI and András Kornai of CSLI and the Hungarian Academy of Sciences.

Originally written by Pace Willisson.

The phoneme notation used here is explained in Section 2.1.1. Although more intuitive phonemic representations could be generated, even without the use of IPA founts, it was felt that readers who wish to study the source code and dictionary files would be better served if the representations used in this paper were the same as those used in the files themselves.

For a discussion of where these correspondences come from, see Section 4.

It does give precedence to the more common pronunciations for a given string, but it also has the possibly undesirable property of giving precedence to correspondences with shorter spelling strings, especially on the left side of the word.

Inspection shows that many of the lower-ranked words, such as duece/deuce, can be described as transpositions. If the Kruskal algorithm were modified to treat a transposition as an atomic operation (weighted 1 instead of 2), as is done in most systems inspired by [Gates 1937], rankings might be even better.
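
The effect of such a modification can be illustrated with a minimal sketch. It is not the programme used in this paper: the function name edit_distance, the parameter transposition_cost, and the unit costs for insertion, deletion, and substitution are all assumptions made for illustration. The transposition step follows the common restricted ("optimal string alignment") variant, in which two adjacent symbols may be swapped as a single operation.

```python
def edit_distance(a, b, transposition_cost=None):
    """Weighted edit distance between sequences a and b.

    Assumed costs (illustrative only): insertion, deletion, and
    substitution each cost 1. If transposition_cost is given, swapping
    two adjacent symbols counts as one operation with that cost;
    otherwise a swap must be built from the basic operations.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            best = min(d[i - 1][j] + 1,        # deletion
                       d[i][j - 1] + 1,        # insertion
                       d[i - 1][j - 1] + sub)  # match or substitution
            # Optional atomic transposition of two adjacent symbols.
            if (transposition_cost is not None and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                best = min(best, d[i - 2][j - 2] + transposition_cost)
            d[i][j] = best
    return d[-1][-1]


without_swap = edit_distance("duece", "deuce")
with_swap = edit_distance("duece", "deuce", transposition_cost=1)
print(without_swap, with_swap)
```

Under these assumed costs, duece/deuce is two operations away when transpositions are not atomic and one operation away when they are, which is the kind of change that would move such words up the ranking.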

This and most similar pre-processing tasks were undertaken using the Awk programming language [Aho 1987] and standard Unix tools. This particular programme was convShPr.nawk.

Brett Kessler
Wed Dec 27 22:16:48 PST 1995