The Global Lexicostatistical Database: Plans











The Global Lexicostatistical Database is a long-term project with a limited number of specialists and resources at its disposal; this makes it impossible to implement all the desired features at the very start, or even to come up with a definitive list of such features. Nevertheless, the GLD is currently updated on at least a bi-weekly basis, and we hope that the project will gradually pick up speed as time goes by. Besides what is already there, users should eventually expect the following additions:


1. More lists: no dazzling interface or complex analytical machinery is worth anything without lots and lots of accompanying data! Swadesh wordlists are being added to the database all the time, and, if you are willing to master the format, you can contribute your own additions as well (see Collaboration for further details).

It is also quite possible that at least some databases will later be expanded to include «complete» Swadesh 200-item lists and, perhaps, even move beyond that limit. In particular, work has recently begun on the construction and testing of an expanded 400-item list that not only adds more data, but also makes it possible to take into account «trivial» semantic shifts between items (empirically attested as polysemies or reconstructed beyond reasonable doubt in low-level families), so as to arrive at more complex models of lexicostatistical classification. Preliminary results of that research are expected to appear on the website relatively soon.


2. User-defined versions: registered users will be able to build their own «unauthorised» copies of the uploaded Swadesh wordlists online, adding their own notes and modifying cognation indexes if they disagree with the original etymological judgements or wish to test different hypothetical configurations. This will make the GLD a significant working tool for all historical-comparative linguists working with lexicostatistics. However, this particular feature will require quite some time to implement. In the meantime, it is always possible to work with the databases offline, using StarLing.


3. More tree-building options: additional parameters will be introduced into the tree-building algorithm, including different variants of the standard glottochronological formula, error margin indications, and the possible incorporation of character-based rather than distance-based algorithms.
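
To make items 2 and 3 more concrete, here is a minimal Python sketch of how a user-modified set of cognation indexes could feed a distance-based date estimate. It is an illustration only and not the GLD or StarLing implementation: the toy wordlists, the function names, and the rule that equal indexes mean "cognate" are all assumptions made for the example. The date uses the classic Swadesh formula t = ln(c) / (2 ln(r)), with an assumed retention rate r = 0.86 per millennium, a value commonly quoted for 100-item lists; the variants planned for the GLD may differ.

```python
import math

# Hypothetical toy data: Swadesh item -> cognation index, per language.
# Assumption for this sketch: equal indexes mean the forms are judged cognate.
ORIGINAL = {
    "lang_a": {"eye": 1, "fire": 1, "stone": 1, "water": 1},
    "lang_b": {"eye": 1, "fire": 2, "stone": 1, "water": 1},
}


def with_overrides(lists, overrides):
    """Return a copy of `lists` with user-supplied cognation indexes applied.

    `overrides` maps (language, item) pairs to a new index; the original
    data is left untouched, mimicking a user's private copy.
    """
    copy = {lang: dict(items) for lang, items in lists.items()}
    for (lang, item), index in overrides.items():
        copy[lang][item] = index
    return copy


def shared_cognate_share(list_a, list_b):
    """Share of items, among those present in both lists, with equal indexes."""
    common = [item for item in list_a if item in list_b]
    if not common:
        raise ValueError("the two lists have no items in common")
    matches = sum(1 for item in common if list_a[item] == list_b[item])
    return matches / len(common)


def divergence_time(share, retention=0.86):
    """Classic formula t = ln(c) / (2 ln(r)); the result is in millennia."""
    if not 0 < share <= 1:
        raise ValueError("share must lie in the interval (0, 1]")
    if share == 1:
        return 0.0
    return math.log(share) / (2 * math.log(retention))


if __name__ == "__main__":
    base = shared_cognate_share(ORIGINAL["lang_a"], ORIGINAL["lang_b"])
    print(f"original judgements: share={base:.2f}, t={divergence_time(base):.2f}")

    # A user who considers the two "fire" forms cognate after all:
    mine = with_overrides(ORIGINAL, {("lang_b", "fire"): 1})
    revised = shared_cognate_share(mine["lang_a"], mine["lang_b"])
    print(f"user's copy: share={revised:.2f}, t={divergence_time(revised):.2f}")
```

Changing a single cognation judgement changes the shared-cognate share and therefore the estimated date, which is why keeping user copies separate from the original judgements matters. A character-based method would work from the individual item-by-item indexes rather than from the single summary share computed here.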


