Kessler, Brett. 2007. [Review of the book Language Classification by Numbers, by April McMahon & Robert McMahon]. Anthropological Linguistics 49(3–4). 435–438.

Unofficial submitted manuscript:

Historical linguistics is in the curious position of having no widely accepted quantitative or statistical methodology. However much this lack may be celebrated by those of us who went into the field precisely because it does not require advanced mathematics, there would be advantages to a more computational approach. The development of computer software would not only increase the productivity of linguists, but potentially enable research on huge data sets that are beyond the grasp of mere mortals. Methodologies that produce statistics such as confidence intervals and significance values could help us understand to what extent the patterns we uncover may be due merely to chance. And well-defined algorithms could open up the discussion about the classification of languages beyond the small community who are proficient in the comparative method, not to mention the even smaller community who have expertise in applying it to a specific set of languages.

Increasing numbers of researchers have been trying to rectify that lack of a quantitative methodology. The book under review is a select survey of numerical techniques that have been adduced for describing connections between languages, i.e., proving languages to be related and subgrouping them. The survey is not exhaustive. I regretted finding no mention of the work of Kondrak (e.g., 2002), or of Lowe and Mazaudon’s attempt to automate the comparative method (1994). It was also surprising that the discussions of the groundbreaking dialectometry studies of Hans Goebl and others were limited to a sentence or two, especially since the present book heavily emphasizes precisely the kind of contact phenomena that Goebl’s methods are especially well suited for visualizing. Instead, the selection is strongly slanted toward methodologies and software packages that were originally developed for biology. This is not a bad decision, given both Robert McMahon’s expertise in genetics and the current interest in applying biological classification techniques to linguistics.

The result is a survey that is compact yet gives reasonably broad attention to the methods it does discuss, usually with multiple case studies. There is fairly extensive discussion of Swadeshian word lists and how they have been used by researchers such as Don Ringe, Robert L. Oswalt, and Alexis Manaster Ramer to test whether two languages are related to each other. There is also an explanation of the work of Ringe, Tandy Warnow, and colleagues that uses computers to infer the family tree of Indo-European, and a discussion of how John Nerbonne, Wilbert Heeringa, and Paul Heggarty use phonetic comparison algorithms in language and dialect comparisons.

The bulk of the book, however, is devoted to case studies using publicly available software packages developed for phylogenetic studies in biology. Three programs from Felsenstein’s PHYLIP package are discussed (Fitch, Kitsch, and Neighbor) that draw unrooted trees based on data showing the similarity or distance between languages; such similarity is often measured by whether languages share words of common origin for each of a list of concepts, though the final chapter shows that phonetic similarity can also be used. The authors also discuss programs (Network, SplitsTree, and NeighborNet) that draw networks: structures that do not attempt to force all data into an idealized tree, but add extra strokes that explicitly show to what extent factors such as borrowing or parallel development have made it impossible to assign characters to one specific ancestor language.
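
To make concrete what "similarity measured by shared words of common origin" can mean, here is a minimal sketch of how a distance matrix of the kind fed to tree-drawing programs might be computed. The languages A to D, the concept labels, and the cognate judgments are all invented for illustration; the function names are hypothetical and are not taken from the book or from any of the packages it discusses.

```python
from itertools import combinations

# Hypothetical cognate judgments (toy data, not from the book).
# Within one concept, languages with the same number are judged to have
# words of common origin.
COGNATE_CLASSES = {
    "concept1": {"A": 1, "B": 1, "C": 2, "D": 2},
    "concept2": {"A": 1, "B": 1, "C": 2, "D": 3},
    "concept3": {"A": 1, "B": 1, "C": 1, "D": 1},
    "concept4": {"A": 1, "B": 2, "C": 3, "D": 4},
}


def cognate_distance(lang_a, lang_b, data=COGNATE_CLASSES):
    """Fraction of concepts for which the two languages are NOT cognate."""
    shared = [c for c, row in data.items() if lang_a in row and lang_b in row]
    if not shared:
        raise ValueError("the two languages have no concepts in common")
    mismatches = sum(1 for c in shared if data[c][lang_a] != data[c][lang_b])
    return mismatches / len(shared)


def distance_matrix(languages, data=COGNATE_CLASSES):
    """Symmetric matrix of pairwise distances, stored as nested dicts."""
    matrix = {a: {b: 0.0 for b in languages} for a in languages}
    for a, b in combinations(languages, 2):
        matrix[a][b] = matrix[b][a] = cognate_distance(a, b, data)
    return matrix


if __name__ == "__main__":
    for lang, row in distance_matrix(["A", "B", "C", "D"]).items():
        print(lang, row)
```

A matrix like this records only how many items each pair of languages shares; it says nothing about why they share them, which is why borrowing and shared retentions can mislead any method that starts from it.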

It is inspiring to see the progress that has been made in developing automated and computer-assisted procedures for language classification. The reader learns about intriguing new ways to visualize and quantify relations between languages. The book is replete with case studies, mostly involving Indo-European languages, with trees and networks suggesting or confirming interesting hypotheses, such as Italo-Celtic (the Italic and Celtic branches belong to a subgroup of their own) or Indo-Hittite (all branches of Indo-European except Anatolian constitute a subgroup).

At the same time, the authors themselves repeatedly advise that these new methodologies should supplement, not replace, the traditional work of linguists, and that they should be considered experimental and in the early stages of their development. Indeed, the authors are forthright in printing results that confound the reader’s expectations. When using the Fitch software to plot a tree of English dialects based on differences in pronunciation (p. 229), for example, their colleague Heggarty reported a tree with several peculiarities. Standard German was grouped closer to the modern English dialects than Old English was. American English was grouped with Scottish and Ulster accents, rather than with the accents of southern England with which it actually groups historically. It is easy to guess the reason for this latter result. Fitch groups objects (here, dialects) based on their similarity to each other, and over 20 percent of the words Heggarty used in computing these similarities happened to originally have postvocalic /r/. Those sounds were derhotacized in his reference southern English accents, but not in his reference accents of American, Scottish, and Irish English. The agreement between American, Scottish, and Irish English in so many words overwhelmed the cases in which American English was more like the English of southern England. The authors show how standard statistical software can reveal which words were responsible for the historically suspect analysis, and indeed eight words with postvocalic /r/ are at the top of the list (p. 233).

It was instructive to see how software can help one diagnose problems caused by other software, but in the end we still do not get a credible phylogenetic tree of the sort linguists have always been interested in: a tree that plots the historical development of the dialects as they diverged. The authors point out that the case can be made that the product is a credible phenetic tree, that is, an analysis that quantifies the similarities that the dialects have with each other here and now, prescinding from their historical origin. Undeniably, phenetic analysis can be very useful, and it is especially appropriate when studying dialects of the same language, which typically borrow very heavily from each other. But phylogenetic systematics, or cladistics, is also useful. Most of the techniques the authors discuss are essentially phenetic, but are deployed with the hope that the simpler and more efficient phenetic analyses will correlate sufficiently with true phylogenetic history. Certainly there are many occasions where imprecise heuristics are useful and necessary. At the same time, linguists have a habit of dealing severely with students who solve a subgrouping assignment by simply grouping together languages based on nothing more than how similar they look to each other. A good student would apply what she knows about directionality of sound change to conclude that postvocalic r must be the original state, and therefore that r in American and Scottish English is a shared retention (symplesiomorphy) and thus of no value in setting up a subgroup, which can only be established on the basis of shared innovations (synapomorphies). Further, the techniques presented take no account of what kinds of language change are likely to happen many times independently (homoplasies) and therefore are less probative evidence for subgrouping. Finally, many of the techniques implicitly assume that language change proceeds at a reasonably constant pace. If language B is much more like language A than like language C, then distance-based techniques will tend to group B with A. But C could be historically closer to B and now be widely divergent simply because it independently underwent linguistic change faster than the other languages (Blust 2000). Curiously, though the authors vigorously object to glottochronology (which assigns dates of divergence between languages and subgroups) because they know that the rate of linguistic change is not constant and universal, they do not overtly object to language classification heuristics that implicitly rely on such constancy.
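
The rate problem can be illustrated with a toy calculation. In the sketch below, B and C are by construction sister languages, but C has changed six times as fast as B. All branch lengths are invented, and the closest-pair step is only a deliberately naive stand-in for phenetic grouping; it is not the algorithm of any program discussed in the book.

```python
from itertools import combinations

# Hypothetical amounts of change on each branch of the TRUE tree ((B, C), A).
# "BC_stem" is the branch leading to the common ancestor of B and C.
# All numbers are invented for illustration.
BRANCH = {"A": 1.0, "B": 1.0, "C": 6.0, "BC_stem": 1.0}


def path_distance(x, y):
    """Total change along the path joining two languages in the true tree."""
    if {x, y} == {"B", "C"}:
        return BRANCH["B"] + BRANCH["C"]
    # Any path between A and either B or C also crosses the B-C stem.
    return BRANCH[x] + BRANCH[y] + BRANCH["BC_stem"]


def closest_pair(languages):
    """Naive phenetic grouping: join the two most similar languages."""
    return min(combinations(languages, 2), key=lambda pair: path_distance(*pair))


if __name__ == "__main__":
    for pair in combinations(["A", "B", "C"], 2):
        print(pair, path_distance(*pair))
    print("grouped first:", closest_pair(["A", "B", "C"]))
```

With these numbers the A-B distance is 3, the B-C distance is 7, and the A-C distance is 8, so the naive step joins A with B even though the tree that generated the distances groups B with C.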

Most of these cladistic issues are brought up here and there in the text, but the authors could have given more explicit guidance as to what principles are being set aside and when. Many readers, especially those still stinging from the disappointments of glottochronology and multilateral comparison, will want constant reassurance that relaxation of basic cladistic theory is not being done out of carelessness, and that the risks of doing so are controllable and outweighed by the benefits.

I noticed few outright errors. Many are concentrated in the introductory chapter, where the comparative method is discussed with understandable brevity and also with several regrettable slips that could put off the historical linguists whom the book seeks to attract. For example, the Germanic shift of Proto-Indo-European d to t is explained as a word-final devoicing (p. 9), and it is stated that there are three Proto-Indo-European labiovelars, /kʷʰ/, /gʷ/, and /kʷ/, the last of which often appears in Sanskrit as /s/, as in asvas ‘horse’ (p. 12). Many of the software and algorithm names are misspelled throughout (e.g., “Splitstree,” “Kitch,” “Neighbour”). The discussion of Nerbonne and Heeringa’s work on phonetically based dialectometry (pp. 210–214) is, in many respects, quite misinformed. For example, their work in developing Levenshtein (string edit) techniques to give the most plausible automatic phonetic alignment of words is misdescribed as though it were simple linear alignment, and then those techniques are invidiously compared to Heggarty’s method of aligning by hand. The discussion comes across as uncharacteristically negative, so much so that I fear the reader will be steered away from some excellent research (e.g., Heeringa 2004).
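
The difference between string edit distance and simple linear alignment is easy to see in a small sketch. The code below implements the textbook Levenshtein algorithm with unit costs, which is a simplifying assumption of this example and not a description of the costs used in the research cited; the function names and the pair of transcriptions are likewise illustrative only.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (unit costs assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if seg_a == seg_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def linear_mismatches(a, b):
    """Naive comparison: match segments position by position and
    count every leftover segment as a mismatch."""
    mismatches = sum(1 for seg_a, seg_b in zip(a, b) if seg_a != seg_b)
    return mismatches + abs(len(a) - len(b))


if __name__ == "__main__":
    # Illustrative transcriptions differing by one inserted vowel.
    w1, w2 = "melk", "melək"
    print("edit distance:", levenshtein(w1, w2))
    print("position-by-position:", linear_mismatches(w1, w2))
```

For this pair the edit distance is 1, because the algorithm in effect aligns the final k of both forms and treats the extra vowel as a single insertion, whereas the position-by-position count is 2, because the inserted vowel throws the remaining segment out of step.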

The book is very readable and well written, often slyly humorous, and filled with much good sense. It fills a great need for an introductory survey of numerical methods in language classification. It is not, however, a comprehensive handbook. One will not find full discussions of algorithms here, nor instructions on how to use the software that is discussed. Even the case studies do not, by and large, stand on their own feet; for example, the many fascinating trees and networks presented have no keys for the language labels, retaining instead the often opaque symbols of their sources. But it is quite likely that masses of additional detail would have detracted from the book’s literate and engaging style. It serves very well as a gentle yet informative introduction to the field, and does so in a very accessible way that makes it a good read for the general linguist, as well as for anyone contemplating taking up numerical classification in the future.


Blust, Robert. 2000. Why lexicostatistics doesn’t work: The “universal” constant hypothesis and the Austronesian languages. In Colin Renfrew, April McMahon, and Larry Trask (eds.), Time Depth in Historical Linguistics, vol. 2, 311–331. Cambridge, England: McDonald Institute for Archaeological Research.

Heeringa, Wilbert. 2004. Measuring dialect pronunciation differences using Levenshtein distance. Groningen, Netherlands: Rijksuniversiteit Groningen Ph.D. dissertation.

Kondrak, Grzegorz. 2002. Algorithms for language reconstruction. Toronto, Canada: University of Toronto Ph.D. dissertation.

Lowe, John B. & Martine Mazaudon. 1994. The Reconstruction Engine: A computer implementation of the comparative method. Computational Linguistics 20. 381–417.

APA citation:

Kessler, B. (2007). [Review of the book Language Classification by Numbers, by A. McMahon & R. McMahon]. Anthropological Linguistics, 49, 435–437.

Last change 2009-08-06T13:50:47-0500