Words are from the Random House Dictionary, 2nd ed., with two simplifications to its transcriptions. We collapse the distinction between /A/ (khat), /O/ (cot), and /c/ (caught), representing all three as /A/. We also collapse the distinction between /w/ (witch) and /H/ (which), representing both as /w/.
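The two mergers above can be sketched as a simple symbol substitution. This is only an illustration, not the procedure actually used to build the data: it assumes each phoneme is written as a single character, and the name collapse_phonemes is hypothetical.

```python
# Illustrative sketch only. Assumes one character per phoneme symbol;
# the helper name and the sample transcriptions are hypothetical.
COLLAPSE = str.maketrans({
    "O": "A",  # /O/ (cot) merged into /A/
    "c": "A",  # /c/ (caught) merged into /A/
    "H": "w",  # /H/ (which) merged into /w/
})

def collapse_phonemes(transcription: str) -> str:
    """Return the transcription with the merged symbols substituted."""
    return transcription.translate(COLLAPSE)
```

Under these assumptions, transcriptions that differ only in /A/, /O/, or /c/ come out identical after collapsing, as do those that differ only in /w/ versus /H/.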
Frequencies are reported either by word types or by word tokens. A type frequency counts each dictionary word equally, whereas a token frequency weights each string's frequency by the frequency of the word in which it is found. The word frequencies are taken from the Brown corpus (Francis & Kucera). Words not found in that corpus are assigned the frequency 0.3.
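The difference between the two counts can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the reported figures: the function name, the dictionary layout, and the choice of fixed-length substrings are all hypothetical, and each occurrence of a string within a word is counted separately.

```python
from collections import Counter

DEFAULT_FREQ = 0.3  # frequency assigned to words absent from the corpus

def string_frequencies(lexicon, corpus_freq, n=2):
    """Count phoneme strings of length n by word type and by word token.

    lexicon:     hypothetical dict mapping word -> transcription
    corpus_freq: hypothetical dict mapping word -> corpus frequency
    """
    type_counts = Counter()
    token_counts = Counter()
    for word, phonemes in lexicon.items():
        weight = corpus_freq.get(word, DEFAULT_FREQ)
        for i in range(len(phonemes) - n + 1):
            s = phonemes[i:i + n]
            type_counts[s] += 1        # every word counts once per occurrence
            token_counts[s] += weight  # weighted by the word's frequency
    return type_counts, token_counts
```

With a toy lexicon in which one word is missing from the frequency table, that word contributes 1 to the type count of each of its strings but only 0.3 to the token count.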