The Global Lexicostatistical Database: Mission statement











In comparative-historical linguistics, uniform basic wordlists of related or potentially related languages are used primarily for two purposes: lexicostatistics, a simple but often effective technique that derives the relative genetic proximity of languages from the percentage of «shared» items (that is, items going back to the same historical ancestor) on the Swadesh list; and glottochronology, a slightly more complex procedure that assigns absolute historical dates to these common ancestors, based on the idea that basic lexicon is replaced over time at a constant or, at least, regularly shifting rate that can serve as a rough equivalent of a «glotto-clock».
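
The two calculations described above can be illustrated with a minimal sketch. It is not the GLD's own procedure: the function names, the toy wordlists, and the cognate class numbers are hypothetical, and the dating step uses the classical constant-rate formula t = ln(c) / (2 ln(r)), with the conventional textbook retention rate r = 0.86 per millennium taken as an assumption rather than the «regularly shifting» rate mentioned in the text.

```python
import math


def shared_fraction(classes_a, classes_b):
    """Lexicostatistics: share of meanings whose words fall in the same cognate class.

    Each argument maps a wordlist meaning to a cognate class index,
    or to None when no word is attested for that meaning.
    """
    # Compare only the meanings attested in both wordlists.
    compared = [
        meaning
        for meaning in classes_a
        if meaning in classes_b
        and classes_a[meaning] is not None
        and classes_b[meaning] is not None
    ]
    if not compared:
        raise ValueError("no meanings attested in both wordlists")
    shared = sum(1 for m in compared if classes_a[m] == classes_b[m])
    return shared / len(compared)


def divergence_millennia(shared, retention_rate=0.86):
    """Glottochronology, classical constant-rate form: t = ln(c) / (2 ln(r)).

    retention_rate is an assumed value (share of the basic list that one
    language retains per millennium), not a figure taken from the GLD.
    """
    if not 0.0 < shared <= 1.0:
        raise ValueError("shared fraction must lie in (0, 1]")
    return math.log(shared) / (2 * math.log(retention_rate))


# Hypothetical toy data: identical class numbers mark words of common origin.
language_a = {"fire": 1, "water": 1, "stone": 1, "eye": 1, "tongue": None}
language_b = {"fire": 1, "water": 1, "stone": 2, "eye": 1, "tongue": 1}

fraction = shared_fraction(language_a, language_b)  # 3 of 4 compared meanings
age = divergence_millennia(fraction)                # estimated age in millennia
print(f"shared: {fraction:.0%}, estimated separation: {age:.2f} millennia")
```

A real wordlist would cover the full Swadesh list rather than five meanings, and the quality of the result depends entirely on how well the cognate class numbers reflect actual historical-comparative analysis.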


Although both lexicostatistics and, especially, glottochronology have been frequently criticized on various grounds (some of the criticisms are discussed, answered, and refuted in GLD-related papers published on the site), together they remain a viable, promising, and, most importantly, the only universally applicable method of language classification. The exactness and reliability of the results, however, depend significantly on how well the lexicostatistical calculations are aligned with the results of the historical-comparative analysis of the basic lexicon.


The past decade has seen a major increase in interest in various lexicostatistical techniques, much of it stemming from the development of new phylogenetic classification methods in biology and a desire to test them in other fields of study; a veritable swarm of publications in prestigious journals now applies complex statistical and probabilistic algorithms to wordlists of languages. Very few of these publications, however, have so far managed to make a serious impact on the general field of historical linguistics. For the most part, what they offer is statistical approximations that do not deal with individual «word histories» and sometimes go so far as to contradict historical reality and common sense, owing to false priors, to a failure to take into account all the necessary factors, or, as is very often the case, to inadequate data collections.


The chief mission of the GLD is to assemble a unified and ordered collection of basic lexical data on all, or at least most, of the world's languages that may be easily submitted to various automated algorithms of analysis but, above everything else, would acknowledge and take advantage of, rather than routinely ignore, all the achievements of historical linguistics. To that end, the data, wherever possible, should be accompanied by notes on available synchronic and diachronic information; cognation indexes that tie together words of common origin should be explicated and justified; and, most importantly, thorough attention should be paid to the construction of the wordlist itself. Experience shows that mistakes in the data are a very frequent problem with some «popular» Swadesh wordlists, which are indiscriminately employed by researchers without a background in historical linguistics.


The wordlists that are collected and annotated on the GLD site may serve a variety of purposes. Beyond the obvious one, serving as a basis for genetic classifications, they will be of significant use to various typological studies, particularly in the field of the typology of phonetic change. Another advantage of the annotation system employed in the database is the possibility of its eventual use for the study of semantic shifts in language; any progress in this sphere will have serious repercussions in almost every area of the linguistic sciences.




© 2011-2016 George Starostin (site design, data input coordination)
© 2011-2016 Phil Krylov (programming, technical support)