Kessler, Brett. 2001. The significance of word lists: Statistical tests for investigating historical connections between languages. Stanford, CA: CSLI Publications.


Historical linguistics has no generally accepted methodology for calculating whether the connections it documents between languages are statistically significant. This lack has led to unusually strident controversies. One linguist spends years documenting evidence of special similarities between languages and calculates the odds as ten billion to one that they are related; another linguist dismisses the evidence as a tissue of coincidences. This polarization is particularly strong between those who accept and those who reject the technique of mass lexical comparison, which groups languages into families by collecting lists of words for the same concepts and noting exceptional similarities. Opponents (traditional comparativists) claim that the kind of evidence mass lexicalists gather is in principle incapable of revealing connections between languages, and some even look with disfavor on the whole idea of comparing lists of words. In this book I step back and look at the challenge from a statistical standpoint: What evidence is available for statistically proving that two languages are historically connected, and how can that evidence be treated in a mathematically convincing and unbiased manner? My conclusions are perhaps surprising. As currently practiced, the traditional comparativist method is much more reliable than mass lexical comparison. But when a rigorous statistical methodology is introduced, the most effective and reliable type of evidence turns out to be something that looks a lot like mass lexical comparison: collecting lists of words and comparing, at a fairly superficial level, those that name the same concept. The kind of evidence most favored by traditional comparativists turns out to be intractable, or less powerful, in the context of a statistical experiment. I hope this book will not simply antagonize both camps and lead to another schism. Rather, I hope that readers will be convinced that the difference between the two schools really hinges on a rather narrow issue of statistical significance, and that this difference can be bridged by an objective method of testing for such significance.
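This summary does not spell out the book's statistical procedure, so the following is not presented as Kessler's method. It is a minimal sketch of one standard, objective way to ask whether two word lists aligned by concept are more similar than chance would predict: a permutation (Monte Carlo) test. The matching criterion (same initial letter), the function names `same_initial`, `match_count` and `permutation_test`, and the toy English and German lists are all hypothetical choices made for illustration.

```python
import random


def same_initial(a: str, b: str) -> bool:
    # Hypothetical, deliberately superficial criterion: two words
    # "match" if they begin with the same letter.
    return bool(a) and bool(b) and a[0] == b[0]


def match_count(list_a: list[str], list_b: list[str]) -> int:
    # Number of concepts whose words in the two lists match.
    return sum(same_initial(a, b) for a, b in zip(list_a, list_b))


def permutation_test(list_a: list[str], list_b: list[str],
                     trials: int = 10_000, seed: int = 0) -> tuple[int, float]:
    """Return the observed match count and a Monte Carlo p-value."""
    if len(list_a) != len(list_b):
        raise ValueError("word lists must be aligned concept by concept")
    rng = random.Random(seed)
    observed = match_count(list_a, list_b)
    shuffled = list(list_b)
    at_least = 0
    for _ in range(trials):
        # Re-pair each word of list_a with the word for a random concept in
        # list_b: both vocabularies stay intact, only the alignment is broken.
        rng.shuffle(shuffled)
        if match_count(list_a, shuffled) >= observed:
            at_least += 1
    # Add-one correction: the observed pairing counts as one arrangement.
    return observed, (at_least + 1) / (trials + 1)


if __name__ == "__main__":
    # Toy, hand-picked English and German words for eight concepts
    # (lowercased, with German ß written as ss).
    english = ["hand", "foot", "water", "fish", "mother", "three", "house", "night"]
    german = ["hand", "fuss", "wasser", "fisch", "mutter", "drei", "haus", "nacht"]
    observed, p = permutation_test(english, german)
    print(f"{observed} matches, estimated p = {p:.4f}")
```

Shuffling one list, rather than comparing against invented random words, keeps each language's actual inventory of word shapes, so chance resemblances are measured against the languages' own structure. Here a random re-pairing almost never reproduces seven initial-letter matches, so the estimated p-value is very small. Hand-picked cognates overstate the evidence, however; a real test would need longer lists, a phonetically informed criterion, and words prepared with careful linguistic research.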

This is not one of those proposals that claim that traditional linguistic research can be thrown out the window and replaced by a simple numerical formula. Much of the book deals with the havoc that can be wreaked if one simply pulls words out of a dictionary and runs them through a computer without first conducting painstaking linguistic research into the structure and history of each individual word. For that reason I refrain from directly investigating the specific claims for large language families that mass lexicalists often put forth: it is much preferable for specialists in those areas to do so. Nevertheless, to avoid drifting off into the aether of the purely theoretical, I give many examples, using a suite of eight languages, of how my methodology would be applied and what the results would be like. I also present explicit guidelines for experts who wish to apply my methodology to their own languages. They should be warned, however, that the statistics involved necessarily entail the use of specialized computer programs. Linguists who wish to explore these techniques before investing in a programming project are invited to contact me at


