Regressions for Experiment 2:

Frequency of syllable components predicting word-acceptability ratings.

Two sets of regression data are presented here. The first presents the results of regression analysis using the S-PLUS package, for log frequency counts using the two most useful measures: D, which counts only consonants at the edges of words, and E, which counts only consonants that are not in clusters. The variables for this analysis are presented here. The second set of data gives a more detailed analysis of one or two variables at a time.

One way of deciding whether speakers are sensitive to the frequency of rimes is to perform linear regressions to see how well those frequencies predict the word-likeness ratings assigned by the subjects in Experiment 2. I compare that with the predictive power of the other variables, because the variables are heavily correlated with each other. The variables considered were the frequencies of the Head (H; onset with vowel); Rime (vowel with coda); Shell (onset with coda); Onset; Vowel; and Coda. Frequencies were taken from a list of the monomorphemic monosyllabic words in the Random House Dictionary, 2nd edition.

It was not however immediately clear what sort of measures to use. Should the frequencies be weighted by word frequencies (Token counts) or not (Types)? Should the frequencies be on a log scale or not (Raw)? And what are the relevant comparanda? For example, when figuring the relevant frequencies in /syg/, should the frequency of /s/ be calculated based on the frequency of /s/ anywhere in the word? Only /s/ that appear in onsets? Only /s/ that appear immediately before vowels? Only /s/ that appear at the beginning of words? Only /s/ that occur as the entire onset of a word? To address this question, the frequencies were computed in five different environments, labelled A-E:

A: Count phones as they appear anywhere in the word. For pairs, this may even result in counting pairs that occur in the wrong order. Thus for /syg/, the frequency for /yg/ will include words that have /gy/ in them, for example.
B: Count phones that occur in the appropriate part of the syllable. For /syg/, count /s/ occurring anywhere in the onset, /g/ anywhere in the coda, even if they appear in clusters.
C: Count consonants that occur only in the appropriate part of the syllable, adjacent to the vowel.
D: Count consonants that occur only in the appropriate part of the syllable, at the beginning or end of a word.
E: Count consonants that only occur in the appropriate part of the syllable, but not in clusters.
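
The five environments can be illustrated with a minimal sketch for a single onset consonant. Everything in it is an assumption made for illustration: the toy lexicon, its token frequencies, the ASCII vowel symbols, and the function names are invented, and the original counts were not necessarily computed this way.

```python
# Toy lexicon, invented for illustration: (onset, vowel, coda, token frequency).
# Onsets and codas are tuples of consonant symbols.
LEXICON = [
    (("s",), "I", ("t",), 120),       # sit
    (("s", "t"), "A", ("p",), 80),    # stop
    (("t", "r"), "I", ("p",), 30),    # trip
    (("t",), "V", ("s", "k"), 5),     # tusk
    (("m",), "I", ("s", "t"), 20),    # mist
    (("g",), "E", ("s",), 50),        # gas
]


def matches_onset(consonant, onset, coda, env):
    """Does this word count toward the onset-consonant frequency in env?"""
    if env == "A":   # anywhere in the word
        return consonant in onset or consonant in coda
    if env == "B":   # anywhere in the onset, clusters included
        return consonant in onset
    if env == "C":   # in the onset, adjacent to the vowel
        return bool(onset) and onset[-1] == consonant
    if env == "D":   # in the onset, at the beginning of the word
        return bool(onset) and onset[0] == consonant
    if env == "E":   # the entire onset, not part of a cluster
        return onset == (consonant,)
    raise ValueError(f"unknown environment: {env}")


def onset_frequency(consonant, env, tokens=False):
    """Type count (tokens=False) or token-weighted count (tokens=True)."""
    return sum(
        (freq if tokens else 1)
        for onset, _vowel, coda, freq in LEXICON
        if matches_onset(consonant, onset, coda, env)
    )


# Print type and token counts for onset /t/ in each environment.
for env in "ABCDE":
    print(env, onset_frequency("t", env), onset_frequency("t", env, tokens=True))
```

Codas would be handled by mirroring the same tests (last consonant of the word for D, first consonant of the coda for C). Log counts are the logarithm of these figures.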

The following table shows the R-squared (proportion of sums of squares accounted for by the regression) for all five environments, and the four different ways of counting frequencies in those environments:

Env	Tokens		Types
	log	raw	log	raw	SUM
A	0.2476	0.0869	0.2900	0.1763	0.8008
B	0.3363	0.2487	0.4613	0.2149	1.2612
C	0.4253	0.2271	0.3823	0.1557	1.1904
D	0.3688	0.3294	0.4776	0.3506	1.5264
E	0.4743	0.3832	0.4256	0.2767	1.5598
SUM	1.8523	1.2753	2.0368	1.1742
	3.1276		3.2110

Log counts are always better than raw counts, averaging 59% better.

Environment A is worse than every other environment, except in one instance (raw Type counts) where it is slightly better than C; its summed R-squared is 33% to 49% lower than that of any other environment. The other environments vary depending on which type of measure one looks at. However, measures that look at the frequency of consonants only when they appear in the matching syllable constituents at the edge of words (D and E) are generally much better than those that have no such requirement (B and C). D is the best one for type counts, and E is the best for token counts. E is marginally (2.2%) better overall. But D provides the analysis with the highest R-squared, explaining 47.8% of the variation in the log types measure.

There doesn't seem to be much to distinguish type and token counts (type counts give only 3% better performance than token counts), so the rest of this analysis will look at both Type and Token counts, for both of the highest-scoring environments (D and E). Only the log of frequencies will be considered.
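
The percentages quoted above follow directly from the sums in the R-squared table. The sketch below recomputes them from the table values; the variable and function names are mine, and the comparison method (ratios of summed R-squared) is inferred from the SUM row and column.

```python
# R-squared values copied from the table above.
# Column order: Token log, Token raw, Type log, Type raw.
R2 = {
    "A": (0.2476, 0.0869, 0.2900, 0.1763),
    "B": (0.3363, 0.2487, 0.4613, 0.2149),
    "C": (0.4253, 0.2271, 0.3823, 0.1557),
    "D": (0.3688, 0.3294, 0.4776, 0.3506),
    "E": (0.4743, 0.3832, 0.4256, 0.2767),
}
TOKEN_LOG, TOKEN_RAW, TYPE_LOG, TYPE_RAW = range(4)


def column_sum(*columns):
    """Sum the chosen columns over all five environments."""
    return sum(row[c] for row in R2.values() for c in columns)


def percent_better(a, b):
    """How much larger a is than b, in percent."""
    return 100 * (a / b - 1)


env_sum = {env: sum(row) for env, row in R2.items()}

log_vs_raw = percent_better(column_sum(TOKEN_LOG, TYPE_LOG),
                            column_sum(TOKEN_RAW, TYPE_RAW))
type_vs_token = percent_better(column_sum(TYPE_LOG, TYPE_RAW),
                               column_sum(TOKEN_LOG, TOKEN_RAW))
e_vs_d = percent_better(env_sum["E"], env_sum["D"])

# How far A falls short of each other environment, in percent.
a_shortfall = {env: 100 * (1 - env_sum["A"] / env_sum[env]) for env in "BCDE"}

print(f"log vs raw:    {log_vs_raw:.1f}%")
print(f"type vs token: {type_vs_token:.1f}%")
print(f"E vs D:        {e_vs_d:.1f}%")
print({env: round(v, 1) for env, v in a_shortfall.items()})
```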

When doing multiple regression under all four conditions (D and E by Token and Type), the contribution of the Rime is always considered statistically significant by a t-test on its coefficient (p < .05). In 3 of the 4 conditions, the Head is also significant, and in the type counts, the Vowel and the Coda are significant as inverse factors (i.e., the higher their frequency, the lower the mean Wordhood rating). The following table shows which coefficients are significant for each analysis:

	Token	Type
D	H+R	R-V-C
E	H+R	H+R-V-C

It is however always difficult to interpret what significance tests of coefficients mean. Another way of looking at things is to see what the individual contributions of the variables are when they are added one at a time. In the following analysis, a separate regression is first run on each available variable, and the variable that gives the highest R-squared is selected. Then, taking the selected variables as a base, each of the remaining variables is tested to see which should be added next. This continues until all are factored in. In the tables below, the first column shows the R-squared for each variable taken independently, and the following columns show the order in which the remaining variables enter in; each column heading lists the variables already in the model. Each variable is prefixed with a + or - to show whether the contribution is positive or negative, i.e., whether higher frequencies predict higher or lower word-likeness ratings.

D token
---		H+		HR+		HRS+		HRSC+		HRSCO+
H	.191	R	.3138	S	.3363	C	.3661	O	.3688	V	.3688
S	.114	S	.2148	C	.3293	V	.3412	V	.3661
p>.05:
R	.096	V	.2121	O	.3248	O	.3389
O	.053	C	.2058	V	.3154
C	.012	O	.1975
V	.005

D type
---		H+		HV+		HVC+		HVCR+		HVCRS+
H	.289	V	.3120	C	.3573	R	.4503	S	.4596	O	.4776
O	.242	S	.3064	R	.3351	S	.3828	O	.4588
S	.221	C	.2960	O	.3158	O	.3607
p>.05:
C	.058	R	.2900	S	.3133
R	.005	O	.2899
V	.0004

E token
---		H+		HR+		HRS+		HRSV+		HRSVC+
H	.316	R	.4482	S	.4578	V	.4720	C	.4732	O	.4743
R	.109	V	.3534	V	.4554	C	.4641	O	.4731
S	.100	S	.3292	C	.4489	O	.4589
p>.05:
O	.044	C	.3173	O	.4482
C	.018	O	.3160
V	.005

E type
---		H+		HV+		HVC+		HVCR+		HVCRS+
H	.276	V	.2902	C	.3258	R	.4192	S	.4230	O	.4256
O	.202	C	.2868	R	.3148	S	.3355	O	.4196
S	.127	S	.2853	S	.2915	O	.3263
p>.05:
C	.052	R	.2819	O	.2903
R	.002	O	.2782
V	.0004
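
The step-by-step procedure behind the four tables above can be sketched as a greedy forward selection on R-squared. This is a minimal illustration, not the original S-PLUS analysis: it assumes NumPy, the data are synthetic, and the function names and the coefficients used to generate the ratings are invented. It also does not track the sign of each contribution.

```python
import numpy as np


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def forward_selection(predictors, y):
    """At each step, add the remaining variable that gives the highest R-squared."""
    chosen, steps = [], []
    remaining = list(predictors)
    while remaining:
        scores = {
            name: r_squared(
                np.column_stack([predictors[v] for v in chosen + [name]]), y
            )
            for name in remaining
        }
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
        steps.append((best, scores[best]))
    return steps


# Synthetic stand-in data: six log-frequency predictors and a rating that
# depends (by construction) on H and R only. Not the experimental data.
rng = np.random.default_rng(0)
n = 200
predictors = {name: rng.normal(size=n) for name in "HRSOVC"}
rating = 2.0 * predictors["H"] + 1.0 * predictors["R"] + 0.5 * rng.normal(size=n)

steps = forward_selection(predictors, rating)
for name, r2 in steps:
    print(f"{name}\t{r2:.4f}")
```

Each printed row corresponds to the top entry of one column in the tables: the variable that enters at that step and the R-squared of the model once it is included.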

First, I consider the variables as individual predictors (first column in each condition, ranked by decreasing strength). In all conditions, the Head frequency is clearly the most important, and the Vowel frequency is the least important. By and large, constituents containing the onset fare better than the others. By tests of significance, Head and Shell each make a significant regression in all conditions; Onset in the Type-based counts; Rime in the E Token count. In contrast, Vowel and Coda are never significant predictors in themselves. Significance is indicated in the first columns of the above tables by placing "p>.05" after the last row where the individual regression is considered significant.

The situation is less clear when multiple variables are admitted. Rime comes in either next (Token count) or after its components, Vowel and Coda, are added as negative predictors. But even though it might be the 4th variable to be added, and be an insignificant predictor by itself, the Rime always makes a large contribution (9 to 13 percentage points) when it is brought in---much larger than the contribution made by any variable except Head frequency. I have no idea what this interaction with Head, Vowel and Coda means.

Another thing to look at is the relative contribution of constructs (Head, Rime, Shell) versus single phonemes (Onset, Vowel, Coda). As individual predictors, the above tables show that the Head is always better than the Onset or the Vowel. The Rime is always better than the Vowel, but its strength relative to the Coda depends on whether one considers Tokens (R > C) or Types (C > R). Similarly, the Shell is always better than the Coda, but its strength relative to the Onset depends on whether one is considering Token counts (S > O) or Types (O > S).

We can also look at how powerful each construct is relative to a multiple regression that uses the frequencies of its two components (first and second columns in the tables below):

D token
H .1913	OV .0649	OVH .2213
R .0957	VC .0128	VCR .1161
S .1140	OC .0715	OCS .1609

D type
H .2893	OV .2415	OVH .3158
R .0047	VC .0879	VCR .1593
S .2208	OC .2726	OCS .3639

E token
H .3156	OV .0556	OVH .3534
R .1087	VC .0183	VCR .1341
S .1003	OC .0658	OCS .1715

E type
H .2756	OV .2021	OVH .2903
R .0024	VC .0655	VCR .1146
S .1273	OC .2427	OCS .2746

A comparison of the Head with the Onset+Vowel regression shows that the Head is always a better predictor than a combination of its constituents. Rimes may be better (Token) or worse (Type) than Vowel+Coda, and the same is true of Shells vis-a-vis Onset+Coda. It may however be important to note that a 3-variable combination of Rime, Vowel, and Coda (third column above) always has a higher R-squared than the sum of the R-squared values for R and for V+C. Such interactions are not consistently found for the other constituents.
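
The comparisons in this paragraph can be checked mechanically against the four tables. The sketch below copies the table values and tests, for each construct, whether the three-variable R-squared exceeds the sum of the construct's R-squared and its two components' R-squared; the dictionary layout and function name are mine.

```python
# Values copied from the four tables above:
# construct: (construct alone, its two components, all three together).
TABLES = {
    "D token": {"H": (.1913, .0649, .2213), "R": (.0957, .0128, .1161), "S": (.1140, .0715, .1609)},
    "D type":  {"H": (.2893, .2415, .3158), "R": (.0047, .0879, .1593), "S": (.2208, .2726, .3639)},
    "E token": {"H": (.3156, .0556, .3534), "R": (.1087, .0183, .1341), "S": (.1003, .0658, .1715)},
    "E type":  {"H": (.2756, .2021, .2903), "R": (.0024, .0655, .1146), "S": (.1273, .2427, .2746)},
}


def exceeds_sum(construct):
    """Conditions where the 3-variable R-squared beats construct + components."""
    return [
        condition
        for condition, rows in TABLES.items()
        if rows[construct][2] > rows[construct][0] + rows[construct][1]
    ]


# Conditions where the construct alone beats the pair of its components.
construct_beats_pair = {
    c: [cond for cond, rows in TABLES.items() if rows[c][0] > rows[c][1]]
    for c in "HRS"
}

print("construct alone > components:", construct_beats_pair)
for c in "HRS":
    print(c, "3-variable > sum in:", exceeds_sum(c))
```

On these figures the Rime shows the pattern in all four conditions, the Head in none, and the Shell only in the E token condition.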

Finally, one may ask whether, overall, the frequency of phonemes or constructs is a better predictor of Wordhood. In all four conditions, a combination of the three constructs beats a combination of the three phonemes. This is less striking in the Type conditions, where the figures probably represent mostly the strength of the Head:

	HRS	OVC
D Token:	.3363	.0751
D Type:	.3093	.2950
E Token:	.4578	.0686
E Type:	.2932	.2532

In summary, the clearest finding is that speakers are sensitive to the frequency of the Head of nonsense syllables, judging them to be more word-like the more frequent the Head is. The frequency of that construct is more important than the frequency of its two constituents, the Onset and the Vowel, either independently or taken together. There is also some evidence that the other constructs (the Rime and the Shell) are important. In particular, the contribution of the Rime is statistically significant in multiple linear regressions that factor in the frequency of the three constructs and their three components, and it always adds at least 9 percentage points in explaining the word-likeness ratings.

It should not however be concluded that Heads are more important than other constituents. The manipulations of the data in this experiment resulted in suppressing the variability of the Rime frequencies. The onset, on the other hand, was chosen randomly, and so the variability of the Head was greater. The following table shows the variance for Head and Rime, using environment E; F-tests for equality between Head and Rime variance are not significant at p<.05. It is reasonable to assume that speakers would be more sensitive to large differences in frequency than to small differences. An experiment designed specifically for the purpose of regression analysis should attempt to maintain similar variations across the various components, or at least select the entire syllable on a random basis.

	Head	Rime
Token	7.37	5.80
Type	1.10	0.61
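
The statistic behind an F-test for equality of variances is simply the ratio of the two variances. The sketch below computes that ratio from the table above. It stops short of a p-value, because that requires the degrees of freedom (the number of stimuli), which are not given here; the names are mine.

```python
# Variances copied from the table above (environment E).
VARIANCE = {
    "Token": {"Head": 7.37, "Rime": 5.80},
    "Type": {"Head": 1.10, "Rime": 0.61},
}

# F statistic: larger variance (Head) over smaller variance (Rime).
f_ratio = {
    count: values["Head"] / values["Rime"] for count, values in VARIANCE.items()
}

for count, f in f_ratio.items():
    print(f"{count}: F = {f:.2f}")
```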

Webster: Brett Kessler

Last change 2004-08-27.