The Tower of Babel

A brief list and description of the databases on the site

For a detailed description of the database interface and its capabilities, please consult the F.A.Q. file in the corresponding section of the site.

1. Long-range etymologies: Compiled by Sergei Starostin from several different sources and featuring some of his own additions, this database contains a large set of tentative parallels between all the major language families of Eurasia. It is not yet a true etymological database in any sense, but will hopefully become one in the future.

2. Nostratic etymologies: Compiled by Sergei Starostin from the main Nostratic dictionaries (Illich-Svitych, Dolgopolsky, Bomhard), with additions of his own and from other members of the Moscow school. The database is relatively raw, since many problems of Nostratic comparative phonology remain unsolved.

3. Indo-European etymology: Compiled by Sergei Nikolayev on the basis of A. Walde and J. Pokorny's dictionary, with Anatolian (Hittite) and Tocharian material added by S. Nikolayev and S. Starostin. Subordinate databases include Germanic and Baltic (also compiled by S. Nikolayev), as well as scanned, OCR'd, and database-converted versions of M. Vasmer's etymological dictionary of Russian (currently serving as a substitute for a comparative Slavic database) and J. Pokorny's unmodified dictionary of Indo-European. NOTE: the Indo-European database usually follows Pokorny in representing proto-forms in their "traditional" state (i.e. 'laryngeal-free', except where laryngeals are actually attested in Anatolian). However, many of the dictionary's sub-entries have been granted independent etymological status, thus neutralizing Walde/Pokorny's bias towards "primary" verbal roots.

4. Altaic etymology: Compiled by Sergei Starostin, Anna Dybo, and Oleg Mudrak, this is the online version of the three authors' Etymological Dictionary of the Altaic Languages (Brill Academic Publishers, 2003. ISBN 90-04-13153-1). It includes both the main database for Altaic etymology and the subordinate databases for Turkic, Mongolic, Tungus-Manchu, Korean, and Japanese etymology. In its current state, the database differs very little from the published dictionary, but it will undergo modifications as the remaining problems are gradually resolved.

5. Uralic etymology: Compiled by Sergei Starostin, this is still the weakest spot in our Nostratic databases, since it relies far too heavily on K. Rédei's (far from perfect) dictionary and does not include subordinate databases for any subbranches. Several project members (S. Nikolayev, Yu. Normanskaya, M. Zhivlov) are currently working on a better representation system for Uralic data.

6. Kartvelian etymology: Compiled by Sergei Starostin on the basis of the best comparative Kartvelian dictionaries available (by G. Klimov and by H. Faehnrich and Z. Sardzhveladze), with notes added by Starostin, mostly on the external relations of Kartvelian. The database is rather small, which is not surprising considering the small number of Kartvelian languages.

7. Dravidian etymology: Compiled by George Starostin on the basis of the Dravidian Etymological Dictionary by T. Burrow and M. B. Emeneau, with extensive modifications of the existing reconstruction and various new comments from the compiler. The database is quite detailed, with a large set of subordinate etymologies for various subbranches of Dravidian.

8. Chukchee-Kamchatkan etymology: Compiled by Oleg Mudrak on the basis of his own Etymological Dictionary of Chukchee-Kamchatkan Languages (Moscow, 2000. ISBN 5-7859-0141-2). The database system includes separate files on Chukchee-Koryak and Itelmen etymologies. Unfortunately, most of the meanings and comments are still available only in Russian.

9. Eskimo etymology: Compiled by Oleg Mudrak and representing his own reconstruction of Proto-Eskimo, together with subordinate databases on Yupik and Inupik.

10. Afroasiatic etymology: Compiled by Alexander Militarev and Olga Stolbova on the basis of multiple published sources as well as ongoing new research. Both the main Afroasiatic database and all of its numerous subordinate databases are in a state of near-permanent construction and contain much raw data that still has to be polished; nevertheless, even in its current state the database is a considerable improvement on previously available etymological dictionaries. Subordinate databases include files with Semitic, Berber, Egyptian, Cushitic, and Chadic data (the Chadic data courtesy of O. Stolbova, most of the others maintained by A. Militarev). Arguably the most solidly presented database is the one on Semitic languages, which represents the results of the collaborative effort between A. Militarev and Leonid Kogan (an effort that has, by now, yielded two volumes of the monumental Semitic Etymological Dictionary).

11. Sino-Caucasian etymology: Compiled by Sergei Starostin almost entirely on the basis of original research, this database presents the main evidence for a Sino-Caucasian macrofamily consisting of North Caucasian, Sino-Tibetan, Yeniseian, and Burushaski; a solid number of Basque etymologies have also been added by John Bengtson. For a detailed explanation of the reconstruction and phonetic correspondences, please see Sergei Starostin's "Sino-Caucasian" ms. in the "Articles and Books" section.

12. North Caucasian etymology: This is the electronic version of the "North Caucasian Etymological Dictionary" by Sergei Starostin and Sergei Nikolayev. The preface to the dictionary, featuring a concise presentation of the complex phonological correspondences between the various subbranches of North Caucasian, is available separately in the "Articles and Books" section. The main database is linked to a series of subordinate databases for Nakh, Avar-Andi, Tsezi, Dargwa, Lak, Khinalug, Lezghian, and West Caucasian (Abkhaz-Adyghe) languages.

13. Sino-Tibetan etymology: The main database here is the electronic version of the "Sino-Tibetan Etymological Dictionary" by Sergei Starostin and Ilya Peiros, the result of direct comparison between Old Chinese, Tibetan, Burmese, Jingpo, and Lushai, with Proto-Kiranti added later by Sergei Starostin and data on Lepcha added by Olga Mazo. Subordinate databases include the Proto-Kiranti reconstruction by S. Starostin, as well as four electronic dictionaries of various Kiranti languages kindly provided to us by our Leiden colleagues (George van Driem, R. Rutgers, and J. Tolsma).

14. Chinese characters: The "Bigchina" database is Sergei Starostin's unfinished project of supplying a full-fledged etymological background for all Chinese characters attested in pre-Han times. It is linked to the main Sino-Tibetan database and contains Sergei Starostin's reconstructions for Middle and Old Chinese along with various comments. Readings for different sub-periods of Old Chinese have been added by George Starostin based on Sergei Starostin's relative chronology of phonetic developments in Chinese. The subordinate database on Chinese dialects has been kindly provided to us by Dr. William Wang; it includes comparative phonetic data on 18 different Chinese dialects for more than 2,500 characters.

15. Yeniseian etymology: Compiled by Sergei Starostin on the basis of his own reconstruction of Proto-Yeniseian (the basic details of this reconstruction are outlined in his 1982 article in the "Articles and Books" section), with additional comments on Heinrich Werner's alternative reconstruction. Linked to the Sino-Caucasian database.

16. Burushaski etymology: Compiled by Sergei Starostin from data on three main Burushaski dialects and linked to the Sino-Caucasian database.

17. Basque etymology: Compiled by John Bengtson and linked to the Sino-Caucasian database. It features Bengtson's own reconstruction of Proto-Basque along with comments on the more "traditional" variants of Basque reconstruction.

18. Austric etymology: Compiled by Sergei Starostin and Ilya Peiros. The database consists of several hundred comparisons, both old and new, reflecting a rather sketchy and preliminary set of correspondences between Austric languages worked out by the compilers (although these correspondences do not appear to be very complex). Compared data include Austro-Asiatic and Thai-Kadai reconstructions by Ilya Peiros and others; Austronesian reconstructions by Otto Dempwolff, Robert Blust, and others (unfortunately, no Austronesian database is yet available on the site); and occasional, separately added parallels from Miao-Yao (Hmong-Mien) languages.

19. Austro-Asiatic etymology: Compiled by Ilya Peiros. This is a huge set of databases, most of which, however, are still under heavy construction. The proposed proto-forms for Austro-Asiatic are based on the author's own research, which also takes into account many previous achievements. Subordinate databases include Katuic, Bahnaric, Khmer, Pearic, Viet-Muong, and other etymologies, with varying degrees of completion (the Bahnaric ones are arguably among the most accomplished).

20. Thai-Kadai etymology: Compiled by Ilya Peiros. This is still a preliminary version, waiting to be perfected. Its most accomplished part is the Zhuang-Tai etymological database, most of whose data have been taken from Li Fang-kuei's excellent reconstruction of this family.

21. Khoisan etymology: Compiled by George Starostin. This system of databases is tentatively tied together by the "Macro-Khoisan etymology" database, which presents various comparanda between such distantly related families as Peripheral Khoisan and Central Khoisan (Khoe), as well as such language isolates as Sandawe (most probably a true "Khoisan" language) and Hadza (more likely either a non-Khoisan language or a Khoisan-Afroasiatic hybrid language). Most of the reconstructions across the entire set of Khoisan databases belong to G. Starostin, except for the Central Khoisan system, which more or less closely follows Rainer Vossen's research on that family.

22. Global linguistic database: Compiled by Merritt Ruhlen, this database contains a certain amount of typological information on most of the world's languages. A detailed description of the structure of the database is available as a PDF file.

23. Bibliographic data: Compiled by Sergei Starostin and George Starostin. It contains bibliographic information on many (and, eventually, all) of the data sources used in the Tower of Babel databases, and can be accessed both separately and through hyperlinks placed in particular linguistic databases.