Binomial probabilities for word-rating experiment

This experiment uses sets of quadruplets as stimuli. Each quadruplet consists of 4 nonsense CVC syllables. Two of them (Hi) have rimes that are more frequent than the rimes of the other two (Lo), but otherwise the two pairs are permutations of the same bodies (CV-) and codas (-C). For example:

Subjects are asked to rate how much these syllables are like words, on a scale of 1 (not at all) to 7 (very wordlike). The hypothesis is that any difference between the Hi and Lo pairs would have to be due to the rimes, because in all other respects the pairs are identical.

One approach to significance is to evaluate the statistics within each quadruplet, testing whether both His are preferred over both Los. Here I use binomial probabilities. Assuming that subjects with no actual preferences would assign ratings uniformly between 1 and 7, then for any pair of syllables Hi1 and Lo1 there would be a 7/49 (1/7) chance of a tie, a 21/49 chance that Hi1 would be rated above Lo1, and a 21/49 chance that Lo1 would be rated above Hi1. The same would obtain for the other 3 possible comparisons (Hi1 vs Lo2, Hi2 vs Lo1, Hi2 vs Lo2). Treating the 4 comparisons as independent, the probability that all of them would come out "right" (i.e., both His having values above both Los) would be 21/49 to the 4th power, or .0337.

If we have 26 subjects evaluating a quadruplet, we would expect on average about 1 subject (26 × .0337 = 0.8762) to give the right values. However, by the binomial distribution, we would have to see at least 4 such answers before we could conclude that the deviation from the expectation is statistically significant at p < .05: the probability of getting 4 or more events of probability .0337 for N=26 is .0107, whereas for 3 or more events it is .0559.

Because the experiment actually used two sets of quadruplets (an octuplet) for each pair of rimes, one could also pool the two quadruplets and treat this as an N=52 experiment, in which case we would need to find a total of 5 or more correct responses per octuplet to achieve significance (p=.0305).
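The figures in this section can be checked with a few lines of code. The following Python sketch is not the Perl program used for the original computations; the function name upper_tail and the variable names are illustrative assumptions. It enumerates the 49 equally likely rating pairs, raises the 21/49 probability to the 4th power as the text does (that is, treating the four comparisons as independent), and sums binomial terms to get the upper-tail probabilities.

```python
from fractions import Fraction
from math import comb


def upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All 49 equally likely (Hi, Lo) rating pairs on the 1-7 scale.
ratings = range(1, 8)
pairs = [(hi, lo) for hi in ratings for lo in ratings]
p_tie = Fraction(sum(hi == lo for hi, lo in pairs), len(pairs))
p_hi_above = Fraction(sum(hi > lo for hi, lo in pairs), len(pairs))

# Assumption taken from the text: the four Hi-vs-Lo comparisons are
# treated as independent, so their probabilities are simply multiplied.
p_right = float(p_hi_above**4)

print(p_tie, p_hi_above)   # 1/7 3/7 (3/7 is 21/49 in lowest terms)
print(round(p_right, 4))   # 0.0337

# Upper-tail probabilities for one quadruplet (N=26) and one octuplet (N=52).
print("N=26, 3 or more:", round(upper_tail(26, 3, p_right), 4))
print("N=26, 4 or more:", round(upper_tail(26, 4, p_right), 4))
print("N=52, 5 or more:", round(upper_tail(52, 5, p_right), 4))
```

The printed tail values may differ from the quoted ones in the last decimal place, depending on whether the per-subject probability is rounded to .0337 before the tail is computed.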

The results for this experiment show that 3 of the 5 octuplets achieved significance at the octuplet level, getting 5 or more correct out of 52 opportunities. Furthermore, 3 of the 10 quadruplets achieved significance in their own right, with one of the octuplets showing significance for both of its quadruplets.

Applying the binomial distribution to those results, the probability of getting 3 or more significant octuplets out of 5 at probability level .05 is .0012; but because the level actually attained by each octuplet was .0305, the overall probability here is .0003. At the quadruplet level, 3 out of 10 showed significance. The probability of that happening at significance level .05 is just over .0115. However, since the level actually attained by each quadruplet was .0107, the probability of 3 or more out of 10 trials achieving significance is just .00014.
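These second-level figures follow from the same binomial upper-tail sum, now with octuplets or quadruplets as the trials. A minimal Python sketch, again with an illustrative helper name (upper_tail) that is an assumption rather than anything from the original Perl programs:

```python
from math import comb


def upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact terms."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# (label, number of trials, significant trials, nominal level, attained level)
cases = [
    ("octuplets", 5, 3, 0.05, 0.0305),
    ("quadruplets", 10, 3, 0.05, 0.0107),
]

for label, n, k, nominal, attained in cases:
    at_nominal = upper_tail(n, k, nominal)
    at_attained = upper_tail(n, k, attained)
    print(f"{label}: {k}+ of {n} at {nominal}: {at_nominal:.4f}; "
          f"at {attained}: {at_attained:.5f}")
```

Each tail is summed directly instead of being computed as 1 minus the lower tail, which avoids subtracting two nearly equal numbers when the result is very small.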

It would therefore appear that the hypothesis is correct: there is a statistically significant preference for both Hi syllables over both Lo syllables, even though the effect has not been shown to obtain for every quadruplet.

These computations were done in part with a Perl program to analyse the experimental data, and another Perl program to compute binomial probabilities.


Webster: Brett Kessler
Last change 2004-08-27.