WESLALEX is the first cross-linguistic database of words in children’s schoolbooks in the West Slavic languages: Czech, Slovak, and Polish. The database contains morphologically and phonetically tagged words extracted from the most widely used instructional textbooks for grades 1 to 3, 4, or 5. Through a simple web interface, users can search for statistics about words and their morphological and phonological attributes.
WESLALEX was inspired by the British English Children’s Printed Word Database (CPWD; Masterson et al., 2003). A major feature of WESLALEX is that it generates information about the words that primary-school children encounter in reading in Czech, Slovak, and (soon) Polish, in a form comparable to what the CPWD provides for British English. A unique extension of WESLALEX is the wealth of grammatical information it can generate, which is of key importance for the highly inflected Slavic languages. WESLALEX can serve a wide variety of single-language and cross-linguistic research and educational purposes.
Files in this directory are encoded in UTF-8. If your browser displays garbled characters, try setting its character encoding manually. For example, in Firefox, select View / Character Encoding / Unicode (UTF-8), or View / Character Encoding / Auto-detect / Universal.
Language | Wordforms | Lemmas | Excel workbook |
Czech | CSV | CSV | Wordforms and lemmas |
Polish | CSV | (not available) | Wordforms only |
Slovak | CSV | CSV | Wordforms and lemmas |
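The CSV files can be read with any UTF-8-aware tool. The following is a minimal sketch in Python, not part of the WESLALEX distribution. It assumes that the first row of each CSV holds the column names and that the delimiter is a comma, semicolon, or tab; this README does not document either point. The helper name `load_lexicon_csv` is hypothetical. The function loads one download into a list of dictionaries keyed by column header.

```python
import csv
import io
from pathlib import Path


def load_lexicon_csv(path):
    """Load a WESLALEX CSV download as a list of dicts keyed by column header.

    Hypothetical helper. Assumptions (not documented by WESLALEX): the first
    row holds column names, and the delimiter is a comma, semicolon, or tab.
    """
    # "utf-8-sig" decodes plain UTF-8 and also drops a byte-order mark if present,
    # so diacritics such as č, ł, or ô come through intact.
    text = Path(path).read_text(encoding="utf-8-sig")
    # Guess the delimiter from the start of the file rather than hard-coding one.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t")
    return list(csv.DictReader(io.StringIO(text), dialect=dialect))
```

For example, calling `load_lexicon_csv("czech_wordforms.csv")` (a hypothetical file name) and printing `list(rows[0])` lists the actual column names. These can then be checked against the comments in the Excel files. Sniffing the delimiter avoids baking in a format detail that this page does not state.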
Fields are as follows. Additional information is given as comments in the Excel files.
The principal investigator for this project is Markéta Caravolas, Bangor University. The project was supported by a grant from the British Academy.
Kessler, B., & Caravolas, M. (2011). Weslalex: West Slavic lexicon of child-directed printed words. Retrieved from http://spell.psychology.wustl.edu/weslalex