The Global Lexicostatistical Database: News and updates


 

02.26.2017. Today's update:

1) A wordlist for Pekon Kayan added to the Karen database (Sino-Tibetan section). Based essentially on Ken Manson's fieldwork, re-compiled and annotated by G. Starostin.

2) A wordlist for the Dinka language added to the West Nilotic database (East Sudanic section). The wordlist is primarily based on Arthur Nebel's dictionary (Rek dialect), but includes extensive notes on various other Dinka dialects as well, synthesized from a variety of sources. Compiled and annotated by G. Starostin.

 

02.12.2017. Today's update:

1) A wordlist for Surui-Paiter (Monde group) added to the Tupi section of the database. Compiled and annotated by A. Nikulin.

2) A wordlist for the Bench language added to the Gonga-Gimojan database (Omotic section). Compiled and annotated by G. Starostin based on comparison of two different published sources.

 

01.22.2017. The first, and quite large, update for 2017 includes:

1) From A. Nikulin, two more wordlists in the South American section: the Arawakan family gets its first coverage with lists for two languages of the Maritime group, Lokono and Añu. Compiled and annotated based on a variety of recent sources.

2) From A. Trofimov, a wordlist for the extinct Avestan language (largely based on Young Avestan data) added to the Iranian database; compiled based on Bartholomae's classic dictionary and cross-checked with actual texts.

3) The Koman database (Komuz family) has been completed with an incomplete, but testable wordlist for the extinct, distantly related Gule (Anej) language, compiled by G. Starostin based on M. L. Bender's data, as well as several much earlier sources. Additionally, the information on Uduk (Twampa) has been significantly updated, courtesy of Don Killian who was kind enough to provide comments based on his own fieldwork.

4) The Talodi database (Kordofanian family) has been updated with wordlists for the closely related Dagik (Masakin) and Ngile (Daloka) languages. Compiled and annotated by G. Starostin based on Th. Schadeberg's wordlists and some additional control sources.

5) In the Sino-Tibetan section, the Karen database has been updated with wordlists for Kayah Monu, Brek Kayaw and Yintale languages. Largely based on data from a comparative study by Myar Doo Myar Reh from 2004, as well as some additional control sources. Compiled and annotated by G. Starostin.

6) A proto-wordlist has been added to the Khoekhoe database (Central Khoisan section), with detailed comments. Current reconstruction by G. Starostin, but largely based on R. Vossen's research originally published in 1997.

 

12.05.2016. This week's update:

1) More lists in the South American section: four varieties of Wichi (Eastern, Bazan, Nocten, Güisnay) and Iyojwa'ja Chorote, all belonging to the Matacoan group and family. Compiled and annotated by A. Nikulin based on a variety of mostly recent sources.

2) The Koman database is nearly completed with a wordlist for the last major living Koman language, Uduk. Compiled by G. Starostin based on Beam & Cridland's classic dictionary from 1970, as well as a set of more recent control sources.

 

11.14.2016. Another lengthy break, but here are more updates:

1) The Gondi-Kui (Central Dravidian) database updated with a wordlist for the Konda language. Compiled and annotated by G. Starostin based on the work of Bh. Krishnamurti, Th. Burrow and S. Bhattacharya in the 1960s.

2) The Kordofanian section is expanded with the first two wordlists from the Talodi group: Talodi proper (Jomang) and Nding. Compiled and annotated based on Th. Schadeberg's lists.

3) A new database added to the Omotic (Afro-Asiatic) section: the Gonga-Gimojan database (Ometo languages and their closest relatives) now features lists for Yem (Yemsa) and Chara languages. Compiled and annotated by G. Starostin based on a variety of recent sources.

 

10.09.2016. After a summer break, the GLD updates are back with a bang!

1) From M. Saenko, three more wordlists to round out the Romance database: one for the Asturian language and two for two different stages of the Latin language - based on a detailed analysis of textual contexts for Plautus (Archaic Latin) and Apuleius (Late Classical Latin), as well as some additional texts as control sources.

2) From A. Trofimov, a detailed wordlist on Vedic Sanskrit, largely based on the Atharvaveda, but with strong attention to the Rigveda as well; the annotations contain both references to classic dictionaries (Grassmann, etc.) and to particular textual locations that help select the optimal candidates for the Swadesh wordlist.

3) From A. Kassian, another update to the Athabaskan database: a wordlist for Degexit'an, compiled and annotated from existing dictionaries and auxiliary sources.

4) From A. Nikulin, five wordlists for the Tupari language group of South America, including Tupari proper, Akuntsu, Wayoro, Makurap, and Mekens. Compiled based on existing dictionaries, wordlists, grammar sketches, and other rare sources.

5) From G. Starostin, seven wordlists on the Heiban languages of the Kordofanian language family: Utoro, Shirumba, Tiro, Moro, Ko, Warnang, and Logol. Compiled and annotated based on Th. Schadeberg's lists, but also cross-checked with other sources on these languages whenever they are available (esp. for Tiro, Utoro, and Moro). This more or less completes the Heiban database.

6) Also from G. Starostin: the first wordlist of a Nilotic language - Nuer, belonging to the West Nilotic group. Compiled and annotated based on J. Kiggen's classic dictionary and cross-checked with two additional sources.

7) Finally, from Timothy Usher, four wordlists for two different languages (more precisely, two subdialectal varieties of one language and two very closely related dialects in another branch) of the Bulaka River family of New Guinea: Maklew (two different wordlists) and Jelmek (including Jabsch). More information on these languages may be found on Timothy's own website, Newguineaworld: https://sites.google.com/site/newguineaworld/families/bulaka-river.

 

06.06.2016. Two updates:

1) Three more wordlists in the Romance database: for Turinese Piemontese, Savoyard Franco-Provençal, and Walloon. As usual, compiled by M. Saenko based on work with language speakers and comparison with previously published sources where available.

2) A new database has been added to the Nilo-Saharan section: wordlists for Koman languages Kwama (with a separate wordlist for its dialectal variety called Begi Mao), Komo, and Opo. (The wordlist for the fourth Koman language, Uduk/Twampa, is forthcoming). Compiled and annotated by G. Starostin based on a variety of old and recent sources.

 

22.05.2016. Two Indo-European updates today:

1) The Romance database has been updated with two classical lists: Old Italian, based on the preserved corpus of Dante Alighieri, and Old French, based on the corpus of Chrétien de Troyes. Both wordlists compiled and annotated by M. Saenko.

2) The Iranian database has been updated with lists for Yaghnobi and Parachi languages, compiled and annotated by our new contributor, Artem Trofimov, based on a variety of published sources.

 

16.05.2016. Two updates:

1) The Gondi-Kui database (Dravidian section) has been updated with a wordlist for Kuwi, compiled by G. Starostin based on M. Israel's dictionary and additional control sources.

2) The Hmong database has been updated with a wordlist for Eastern Xiangxi, compiled by G. Starostin based on a comparative monograph by Yang Zaibiao. Additionally, some mistakes were corrected and extra notes written for Western Xiangxi as well.

 

11.04.2016. Updates:

1) The Karen group database (Sino-Tibetan family) has been expanded with a list for Western Kayah Li, compiled and annotated by G. Starostin based on several recent data sources. Some changes have been made to the Eastern Kayah Li wordlist as well.

2) A wordlist for the Tapirape language (Tupi-Guarani group) has been compiled and annotated by A. Nikulin, based on Antonio Almeida's grammar and glossary as well as a more recent control source.

3) Another two wordlists added to the Romance database: for the Foligno dialect of Italian and the Picard language. Compiled by M. Saenko based on data collected from actual language speakers.

 

04.04.2016. Some African updates:

1) The Dizoid group database (Omotic family) has been updated with two wordlists for the Nayi and Sheko languages, compiled and annotated by G. Starostin based on a variety of sources.

2) The Heiban database (Kordofanian family) has been updated with a wordlist for the Laru language, compiled and annotated by G. Starostin based on Th. Schadeberg's wordlist and some additional control sources.

3) A new proto-wordlist, for the Proto-Taa language, has been added to the Taa database (Peripheral Khoisan). Compiled by G. Starostin based on his own preliminary reconstruction.

 

08.03.2016. This week's update consists of three wordlists added to the Romance database: for Galician, (Standard) Portuguese, and (Standard) French. All three wordlists compiled by M. Saenko based on existing dictionaries, grammars, and work with native language speakers.

 

01.03.2016. This week's update:

1) The Germanic database has been updated with a wordlist for the Faroese language, primarily based on Young & Clewer's dictionary with some additional data. Compiled and annotated by G. Starostin.

2) The Athapaskan database is expanded again, with a wordlist for the nearly extinct Sarsi language. Compiled by A. Kassian based on a variety of published sources.

 

22.02.2016. After another long break, we are back with some important updates:

1) The Daju database (East Sudanic section) has been completed, with five new wordlists for Sila, Nyala, Eref, Lagowa, and Nyalgulgule varieties of Western Daju. Compiled by G. Starostin based on works by Robin Thelwall and a variety of newer control sources.

2) In the South American section, a new database for Cahuapanan languages is made available, courtesy of the compiler A. Nikulin, with wordlists for the Shiwilu and Shawi languages, based on the most recent sources, as well as a partially reconstructed proto-list for Proto-Cahuapanan.

 

29.12.2015. Our last update for this year, once again, comes in several parts:

1) More Athapaskan wordlists – this time, for the Central Carrier and Koyukon languages. Compiled and extensively annotated by A. Kassian based on a large variety of existing sources (dictionaries, grammars, texts, etc.).

2) More South American wordlists – for the Arikapú and Djeoromitxí languages of the Jabuti group (Macro-Je family). Compiled and annotated by A. Nikulin based on several recent dictionaries; also accompanied by his own reconstruction of the Proto-Jabuti wordlist.

3) A couple of Kordofanian wordlists added to the Niger-Congo section, for the closely related Ebang (= Heiban) and Abul (= Abul-Heiban) dialects. Compiled and annotated by G. Starostin based on Thilo Schadeberg's fieldwork plus additional sources.

 

19.11.2015. We are alive and well, with a whole new bunch of updates:

1) In the North American section, a wordlist for the Lower Tanana language, based on James Kari's dictionary and a variety of other sources, has been compiled/annotated by A. Kassian and added to the Athapaskan database.

2) We finally have our first Australian wordlist online, for the Gunwinyguan language Ngalakan, compiled and annotated by M. Zhivlov based on F. Merlan's vocabulary and grammar.

3) Coverage of the East Sudanic family goes on with two wordlists for Daju languages Logorik (= Lagori) and Caning (= Shatt), compiled and annotated by G. Starostin on the basis of a variety of sources (most notably Robin Thelwall's published fieldwork).

4) Another addition to the Nilo-Saharan section is a set of wordlists for four dialectal varieties of the Gumuz language (Sai, Sese, Metemma, Gojjam), also compiled and annotated by G. Starostin. The primary source is a comparative paper by M. L. Bender, but many newer sources have been consulted as well for additional verification.

5) Finally, the Karen database (Sino-Tibetan section) has been expanded with a wordlist for Eastern Kayah Li, based on D. Solnit's monographic description of this language. Compiled and annotated by G. Starostin.

 

11.10.2015. Two updates:

1) In the North American section, a wordlist for the Ineseño (Samala) language of the small Chumashan family has been compiled and annotated by M. Zhivlov, based on the works of R. B. Applegate and the Santa Ynez Band of Chumash Indians.

2) In the African section, a wordlist for the Dizi language (Dizoid group, Omotic family) has been compiled and annotated by G. Starostin based on recent work by M. D. Beachy and its comparison with older Swadesh wordlists published by a variety of researchers.

 

05.10.2015. Update in the East Sudanic section: three wordlists for the small Temein group of Nuba Mountain languages (Temein proper, Doni /= Keiga Jirru/, and Tese /= These, Teisei umm Danab/), drawn largely from the manuscripts of Roland C. Stevenson and compared with some additional sources, have been compiled, annotated, and uploaded by G. Starostin.

 

25.09.2015. Three updates today:

1) A wordlist for Mesa Grande 'Iipay has been uploaded to the Yuman database (Hokan family). Compiled and annotated by Mikhail Zhivlov based on several sources from the 1970s.

2) Two wordlists for the Pharasa and the Cappadocian (Aravan) dialects of the Greek language have been compiled and heavily annotated by Alexei Kassian, based on a variety of old sources.

3) The up-to-now largely empty section on South America gets a small, but significant boost from our most recent contributor, André Nikulin: three wordlists for languages of the Nadahup group (Hup, Dâw, Nadëb) have been compiled and annotated by him, based largely on recent sources. More to come on South America in the near future!

 

05.09.2015. Two updates to kick off the fall season:

1) A wordlist for Southern Tsakonian Greek, based on major comprehensive dictionaries as well as some fairly recent sources, has been compiled and annotated by Alexei Kassian for the Greek database.

2) The East Sudanic section has been expanded with a database for two languages of the small Nyimang group in the Nuba Mountains: Ama (or Nyimang proper) and Afitti (Dinik). Both lists compiled and annotated by G. Starostin based on a large variety of older and more recent sources.

 

18.08.2015. Two updates in the African section:

1) Finally, a proto-wordlist has been uploaded for Proto-!Wi, the common ancestor of the largest known (but mostly extinct) branch of South Khoisan (= !Wi-Taa). Although the reconstructions are quite preliminary and approximate (largely due to the unreliable nature of the data), they nevertheless illustrate, thanks to the accompanying notes section, some interesting and important diachronic phonetic and semantic processes in !Wi languages, and may be used for external lexicostatistical comparison on higher levels. All work on the reconstructions performed by G. Starostin.

2) A wordlist for the Gaam (= Gaahmg, Ingassana) language of the Jebel group of Eastern Sudanic has been compiled and annotated by G. Starostin based on a comparative set of older and more recent data (the primary source is M. L. Bender and Malik Ayre's dictionary of the language). This completes the Jebel database, except for a proto-wordlist.

 

06.08.2015. A small, but important update: a wordlist for the deeply isolated Shabo language of Ethiopia has been added to the Nilo-Saharan section (very schematically, since there is really no strong evidence that would tie Shabo to «Nilo-Saharan») of the site. Compiled and annotated by G. Starostin based on a variety of published sources.

 

03.08.2015. Another massive update of the Romance database today: three wordlists for Venetian dialects (Venice, Primiero, Bellunese), four wordlists for Sicilian dialects (Palermitan, Messinese, Catanian, South-Eastern), six wordlists for Catalan dialects (Central, North-Western, Minorcan, Castello de la Plana, Valencian, Manises), one wordlist for Castilian Spanish, and one wordlist for Provençal Occitan. All of the wordlists, as usual, were compiled and annotated by M. Saenko based on information from native speakers, except for the Provençal wordlist, compiled from a 1995 dictionary.

 

01.07.2015. Two updates:

1) A wordlist for Standard Swedish added to the Germanic database. Compiled and annotated by G. Starostin on the basis of several large dictionaries. All the major «literary» Scandinavian languages are now included.

2) A wordlist for Grosseto Tuscan (Italian), compiled and annotated by M. Saenko based on information from native speakers, has been uploaded to the Romance database.

 

27.06.2015. More updates:

1) Still more Romance wordlists: this time, for two dialects of Lombard (Bergamo and Plesio), the Neapolitan dialect, and one more dialect of Sardinian (Campidanese). Original data collected from native speakers, compiled and annotated by Mikhail Saenko.

2) Two Bantu wordlists, for the Mwetug and Elung dialects of the Akoose language, added to the newly created «Bantu A» database. Compiled and annotated by G. Starostin based on publications by Robert Hedinger.

3) A new database for the «Jebel» or «East Jebel» language group added to the East Sudanic section of the site. The database contains four wordlists for the minor tribal languages Aka, Molo, Kelo, and Beni Sheko. Compiled and annotated by G. Starostin based mainly on comparative data from works by M. L. Bender (also taking into consideration the earlier and less reliable data by E. Evans-Pritchard).

 

17.06.2015. Another large update:

1) More Romance wordlists from Mikhail Saenko, this time with a focus on Italy: Ravennate Romagnol, 3 dialects of Emiliano (Ferrarese, Carpigiano, Reggiano), 3 dialects of Ligurian (Genoese, Stella and Rapallo), a wordlist for Literary Italian, and, finally, a wordlist for the very distinct Logudorese dialect of Sardinian. This brings the total number of lects in the Romance database to 25 and makes it the single largest database on the site in terms of the number of languages covered - good job! Big thanks also to all the native informants (listed in the database description) who provided the raw materials for M. Saenko.

2) The Surmic database has been more or less completed (except for the proto-wordlist) with the inclusion of data for the single known North Surmic language, Majang (Masongo), compiled from a variety of sources (mostly notes by M. L. Bender, but cross-checked with earlier data) by G. Starostin. Also, the notes sections on several other Surmic languages have been expanded with annotated items from M. L. Bender's wordlists in his comparative survey on the languages of Ethiopia (1971) - a useful source, albeit suffering from a number of phonetic and semantic inaccuracies (which makes it all the more important to comment on these inaccuracies whenever they are detected).

3) The Gondi-Kui database has been expanded with a wordlist for Kui, put together on the basis of W. W. Winfield's classic dictionary for the Udayagiri dialect and cross-checked with data from two newer sources on the Balliguda and Kuṭṭiya dialects. Compiled and annotated by G. Starostin.

 

22.05.2015. HUGE update today:

1) No less than eight new wordlists, many of them containing totally original data, added to the Romance database. The lists are for the literary Rumantsch Grischun language; three different colloquial dialects of Romansh (Sursilvan, Surmiran, and Vallader); and four colloquial dialects of the Piemontese language (Lanzo Torinese, Barbania, Carmagnola, and Vercellese). Compiled and annotated by Mikhail Saenko largely on the basis of surveys recently completed by native speakers, but also taking into consideration previously available descriptions and dictionaries.

The other updates are in the African section:

2) Two wordlists (unfortunately, containing serious gaps) for the Akunnu and Ekiromi (Ikorom) dialects of the Akpes language, constituting a separate subbranch of the Benue-Congo family, compiled and annotated by G. Starostin based on several brief wordlists and comparative studies.

3) Two wordlists added to the Nubian database, for the Karko and Wali varieties of Hill Nubian. Compiled and annotated by G. Starostin, based on the recent SIL survey of several Kordofanian languages by Amy Krell.

4) A wordlist for the Kwegu (Koegu) language, compiled and annotated by G. Starostin based on materials of M. Yigezu and O. Hieda, has been added to the Surmic database. This completes the South Surmic subsection of the base.

 

11.05.2015. Two updates:

1) Another three wordlists added to the Romance database by Mikhail Saenko: one for Friulian, and two for dialects of Ladin: Gardenese and Fassano. Compiled and annotated based on existing dictionaries as well as original consultations with native speakers.

2) Three more wordlists added to the Hmongic database: Northern Pa-Hng, Southern Pa-Hng, and Hm-Nai (Wunai). Compiled and annotated by G. Starostin based on Mao Zongwu and Li Yunbing's materials, originally published in 1997.

 

27.04.2015. Two updates in the African section:

1) A wordlist on the Me'en language (Surmic group) has been compiled and annotated by G. Starostin on the comparative basis of several different sources, with elements of dialectal comparison.

2) A wordlist for the Rere dialect of the Koalib language (Heiban group, Kordofanian family) has been compiled and annotated by G. Starostin on the basis of Thilo Schadeberg's original Swadesh wordlist, contrasted with later publications on Koalib phonology by Nicholas Quint. More wordlists on Kordofanian languages should be expected within the year.

 

15.04.2015. Two updates:

1) A new Romance wordlist from Mikhail Saenko, this time for the extinct Dalmatian language, based on a comprehensive source that summarizes nearly all of the recorded data on Dalmatian, extracted from the last known speaker, Antonio Udina, in the late 19th century.

2) Another wordlist in the Sino-Tibetan section: Geba Karen, based on a recent description of the grammar of this language accompanied by a list of basic lexicon. Compiled and annotated by G. Starostin.

 

02.04.2015. The first wordlist for a Mande language has been added to the Niger-Congo section of the site: Bobo (Southern Bobo Madaré), culled from Le Bris & Prost's dictionary and compared with the results of a more recent dialectal survey. Compiled and annotated by G. Starostin.

 

23.03.2015. Two updates:

1) Two more wordlists compiled and annotated by Mikhail Saenko: one on the Aromanian language and another one on Standard Romanian (the literary variety), allowing now for a comprehensive lexicostatistical analysis of the Eastern Romance (Vlach) languages.

2) A wordlist on the Mursi language (Surmic group) compiled and annotated by G. Starostin on the comparative basis of three different sources.

 

11.03.2015. Two Indo-European updates:

1) In the Romance database, a wordlist for the Istro Romanian language has been compiled and annotated by Mikhail Saenko, based on a variety of old and new sources.

2) In the Germanic database, a wordlist for the standard Danish language has been compiled and annotated by G. Starostin.

 

25.02.2015. Two updates:

1) A wordlist for the Megleno Romanian language has been compiled and annotated by Mikhail Saenko, our most recent contributor, based on several available sources. More wordlists for various forms of Romanian and other Romance languages are to be expected within the year.

2) A wordlist for the Serer language (Fula-Serer group of the alleged North Atlantic family) has been compiled and annotated by G. Starostin, based on L. Crétois's enormous dictionary and an auxiliary control source.

 

14.02.2015. Another Surmic wordlist, for the Chai dialect of the Suri language, has been compiled from three different sources, with M. Yigezu's (2001) data supported by two earlier publications as control sources. Compiled and annotated by G. Starostin.

 

31.01.2015. A list for the Pengo language added to the Dravidian section of the site. Compiled and annotated by G. Starostin on the basis of T. Burrow and S. Bhattacharya's description (1970).

 

20.01.2015. Two updates:

1) A long-awaited revival of the Athapaskan database: a new list added for the Upper Tanana language, compiled and annotated by A. Kassian on the basis of the most recent sources, with some aid from Paul Milanowski.

2) The Surmic database has been updated again with a list for the Baale (Kacipo-Balesi) language, compiled and annotated by G. Starostin on the basis of M. Yigezu's and Gerrit J. Dimmendaal's data from 1998-2001.

 

23.12.2014. The Hmong database is updated with wordlists for the Xiaozhai and Huangluo (sub)dialects of the Younuo language, compiled and annotated by G. Starostin, based on data published by Mao Zongwu and Li Yunbing in 2007.

 

16.12.2014. One more list added to the Surmic database, for the Murle language, based on M. Yigezu's list (2001) with R. Lyth's old description of the language used as an important secondary source. Compiled and annotated by G. Starostin.

 

22.11.2014. A list has been added for the Bwe Karen language, based on Eugenie Henderson's dictionary of that language. More lists will be added to the Karen database (Sino-Tibetan subsection) in the next year.

 

06.11.2014. Two updates in the African section:

1) A list added for the Tennet language of the Surmic group, compiled and annotated by G. Starostin on the basis of M. Yigezu's list (2001) as well as independent research by Scott Randal.

2) Two more lists added to the Krongo-Kadugli database, for Keiga (Deiga) and Tumtum languages, compiled and annotated by G. Starostin on the basis of Th. Schadeberg's comparative wordlists and additional data by M. Reh. This completes the Krongo-Kadugli database, pending the inclusion of a reconstructed proto-wordlist for this rather unique group of languages in the Nuba Mountains.

 

25.10.2014. Another Surmic list added, one for the Didinga language, compiled and annotated by G. Starostin based on a comparison of M. Yigezu's list (2001) with several older sources.

 

25.09.2014. The large Surmic group of languages (East Sudanic family) is now represented with a wordlist for the Narim (Longarim) language, compiled and annotated by G. Starostin based on several available sources. More to follow.

 

16.09.2014. A small addition to the Dravidian section: The Gondi-Kui («Gondwan») database has been opened with the construction of a wordlist for the Manda language, based on a recently published dictionary as well as older fieldnotes by Th. Burrow and S. Bhattacharya. Compiled and annotated by G. Starostin.

 

20.08.2014. Two updates:

1) The (particularly obscure!) Slavic group of languages is finally represented on the site with a wordlist for Macedonian (Dihovo dialect), compiled and annotated by A. Kassian with supplementary data on other Macedonian dialects.

2) The Hmong database is updated with wordlists for the Longhua and Liuxiang dialects of the Jiongnai language, compiled and annotated by G. Starostin, based on data published by Mao Zongwu and Li Yunbing in 2002.

 

15.08.2014. Two updates:

1) The Bai database has been updated with a wordlist for Bijiang (Northern) Bai, the most divergent variety of this macrolanguage. All the three main dialects are now represented in the database. Compiled and annotated by G. Starostin based on the same sources as older lists for Jianchuan and Dali Bai.

2) The Dargwa database (North Caucasian section) has been initialized with some GLD-exclusive content, kindly supplied and annotated in GLD format by Oleg Belyaev from his own field data on three poorly studied dialects: Shiri, Amuzgi, Ashti Dargwa.

 

03.08.2014. The Tsezic database has been completed with a proto-wordlist for Proto-Tsezic, reconstructed by A. Kassian in accordance with general GLD methodology based on available lexical data and the original Proto-Tsezic reconstruction by S. Nikolayev in the «North Caucasian Etymological Dictionary», with a few modifications.

 

28.07.2014. Two updates:

1) The Krongo-Kadugli database has been expanded with a wordlist for Krongo, compiled and annotated by G. Starostin on the basis of M. Reh's description and Th. Schadeberg's comparative wordlist.

2) The Sinitic database sees the arrival of the first two wordlists on non-Mandarin Chinese "dialects": Jian'ou Min and Wenchang Hainanese (also Min), compiled and annotated by Elena Kuzmina based on several available sources.

 

15.07.2014. A wordlist for Bokmål Norwegian (based on the current orthographic norm and general usage as indicated in some of the most recent dictionaries) has been compiled and annotated by G. Starostin and included in the Germanic database.

 

04.07.2014. Design update: We have now introduced nice icon links that take the user directly from the name of the language in the database to the corresponding info not only in the Ethnologue, but also in the Glottolog language list. The system is now working in test mode on several databases (Bantu-F, Bantu-L, Upper Sepik), but will gradually be expanded to the entire system of databases. This will help the user get easy extra access to additional information, including Glottolog's large bibliographical lists.

 

26.06.2014. A new complete database in the East Sudanic section: 6 wordlists for the Tama language group (Tama, Erenga, Sungor, Miisiirii, Ibiri, Abuu Shaarib), compiled from the comparative vocabulary of Tama languages by John Edgar and an additional source for Tama proper and annotated by G. Starostin, have been uploaded.

 

19.05.2014. Two updates:

1) The Greek database has been updated with a list for Ancient Attic Greek, based on the idiolect of Plato as reflected in the latter's collected works. Compiled and annotated by A. Kassian.

2) The Northeastern Dravidian database has been updated with a list for the Malto language, compiled and annotated by G. Starostin based on one old and one relatively recent dictionary.

 

26.04.2014. Another double update for April:

1) The Tsezic database has been updated with a wordlist for Sagada Dido and two dialects of Khwarshi (Khwarshi proper and Inkhokwari), compiled and annotated by A. Kassian based on recent fieldwork by A. Abdulaev and R. Karimova.

2) Two wordlists for the Nara language have been added to the Eastern Sudanic subsection of the African section of the site. Compiled by G. Starostin based on the material of a 19th century source («Old Nara») and on M. L. Bender's data, collected in the 1960s («Modern Nara»).

 

04.10.2014. After a month without updating, three new contributions, two of them provided by new participants:

1) The Greek database is updated with a wordlist for Modern Demotic Greek, freshly collected by Alexandra Evdokimova from native speakers, and converted into GLD format by A. Kassian.

2) We are introducing a new database for Iranian languages, with two wordlists for the principal dialects of Ossetic (Iron and Digor), also freshly collected by Oleg Belyaev from native speakers, and converted into GLD format (with additional etymological annotations) by A. Kassian.

3) The Hmong database has been updated with three wordlists for different dialects of the Bunu language (Bunu proper, Baonao, and Numao), compiled by G. Starostin based on recent Chinese sources.

 

03.06.2014. Two updates:

1) The Miwokan database is completed (except for the proto-list) with a wordlist for Lake Miwok, compiled and annotated by M. Zhivlov.

2) The Nubian database is also completed (except for the proto-list) with wordlists for Birgid and Midob Nubian, compiled and annotated by G. Starostin.

 

02.21.2014. A wordlist for Dali Bai has been added to the Bai database, compiled by G. Starostin based on the same sources as the ones earlier used for Jianchuan Bai.

 

02.11.2014. Wordlists for Kadaru and Debri, two small Hill Nubian languages / dialects, have been added to the site based on published selections from R. C. Stevenson's materials. Unfortunately, the wordlists contain multiple gaps, but nevertheless remain of some use.

 

02.01.2014. Two updates:

1) A wordlist for the Kidero dialect of the Dido language, compiled and annotated by A. Kassian, (temporarily) completes the Tsezic database.

2) Three more wordlists added for the Krongo-Kadugli languages (Tulishi, Kanga, Tumtum), compiled and briefly annotated by G. Starostin based exclusively on comparative wordlists by Thilo Schadeberg.

 

01.23.2014. The Nubian database is expanded with the first wordlist for a Hill Nubian language, Dilling (Deleny), compiled and annotated by G. Starostin based on D. Kauczor's grammatical description and auxiliary sources.

 

01.09.2014. After some temporary trouble (relocation to a new server), the GLD is back up and functioning as always, with the first two updates of the new year:

1) The Tsezic database is expanded with a list for the Hinukh language, compiled and annotated by A. Kassian;

2) A wordlist for Modern Icelandic added to the Germanic database by G. Starostin, based on some bilingual dictionaries (and further tested on a random selection of Internet sources for additional precision). In the process of comparing Modern Icelandic forms with their Old Norse equivalents, a few mistakes have also been corrected for the Old Norse list (some of them, with the assistance of I. Sverdlov).

 

12.24.2013. A wordlist for the Kurux language has been uploaded to the newly added database for the Northeastern group of Dravidian languages. Compiled by G. Starostin on the basis of A. Grignard's classic dictionary, compared with a recent SIL survey.

 

12.04.2013. Two updates:

1) The Hmong database has been expanded with a list for Hmong Njua, compiled and annotated by G. Starostin from Th. Lyman's dictionary.

2) The Tsezic database has been expanded with three lists for Bezhta (Bezhta proper; Khoshar-Khota; Tlyadal), compiled and annotated by A. Kassian based on a variety of old and recent sources.

 

11.20.2013. Two updates:

1) A new database in the American section: the Uto-Aztecan family (Takic group) is introduced with a wordlist for Cahuilla, compiled by M. Zhivlov on the basis of a recent dictionary.

2) A new database in the Sino-Tibetan section: the Bai cluster of dialects is introduced with a wordlist for Jianchuan Bai, compiled by G. Starostin on the basis of two comparative sources.

 

10.30.2013. Two updates:

1) A new database in the North Caucasian section - Tsezic, for now, with only one language (Hunzib), soon to be expanded with more. Compiled and annotated by A. Kassian on the basis of both recent and older lexicographic material.

2) The Krongo-Kadugli database has been expanded with wordlists for the closely related Kadugli (proper) and Miri languages. Compiled and annotated by G. Starostin on the basis of field data published by Thilo Schadeberg and other authors.

 

10.19.2013. A wordlist for Kenuzi Nubian has been added to the Nubian database; this exhausts the list of all the languages in the Nile-Nubian subgroup. Compiled by G. Starostin.

 

10.13.2013. Two updates:

1) The Athapaskan database has been expanded with a new wordlist for the Tanacross language, compiled by A. Kassian.

2) The Sino-Tibetan section of the site has been expanded with a new database that contains five wordlists for three subdialects of Northern Tujia (Tasha Tujia, Duogu Tujia, Dianfang Tujia) and two subdialects of Southern Tujia (Boluo Tujia, Tanxi Tujia), based on fieldwork published in Chinese and European sources. Compiled by G. Starostin.

 

09.20.2013. Two updates:

1) Two more wordlists added to the Hmong database, for the Chuanqiandian Hmong and Diandongbei Hmong dialects, spoken in China. Compiled and annotated by G. Starostin on the basis of comparative Hmong-Mien lexical data, published in 1987.

2) Three wordlists added for different varieties of Miwok (Bodega Miwok, Central Sierra Miwok, Southern Sierra Miwok), based mostly on dictionaries and grammatical descriptions from the 1960s-1970s. Compiled and annotated by M. Zhivlov.

 

09.12.2013. A wordlist for the Khinalug language, compiled by A. Kassian based on F. Ganieva's dictionary and older sources, has been uploaded to the North Caucasian section of the site in its own database.

 

08.14.2013. Two updates:

1) A wordlist for the Katcha language (Krongo-Kadugli group, of disputable affiliation) has been added to the African section. Compiled and annotated by G. Starostin based on the published fieldwork of Thilo Schadeberg and Roland Stevenson.

2) A wordlist for the Klon language (Bring dialect), based on a variety of new and old sources, has been added by A. Kassian to the former Alor (now Alor-Pantar) database in the small New Guinean section of the site.

 

07.31.2013. Six new wordlists for different varieties of the Pomoan languages (Kashaya; Northern, Central, Northeastern, Southeastern, and Southern) have been added to the Pomo database by M. Zhivlov. The annotated lists are mostly based on Robert L. Oswalt's publications, with some additional sources also considered.

 

07.28.2013. It's been long in the making, but it's finally here: a reconstructed Swadesh wordlist for Proto-Yeniseian, compiled, annotated, and explained in detail by G. Starostin, based primarily on S. A. Starostin's Proto-Yeniseian reconstruction, but with numerous modifications through additional phonetic, semantic, and distributional analysis of cognate forms.

 

07.25.2013. The Nubian database is expanded with a list for Dongolawi (Dongolese) Nubian, culled by G. Starostin from Charles Armbruster's classic dictionary and cross-referenced with G. von Massenbach's earlier data.

 

06.10.2013. Large update:

1) A wordlist for the Dogrib language, compiled by A. Kassian, has been added to the Athapaskan database.

2) A wordlist for the Plains Miwok language, compiled by M. Zhivlov, has been added to the Miwok database.

3) The Hmong database has been expanded with a new wordlist for Qiandong Hmong, compiled by G. Starostin.

4) The Dravidian language family makes its first appearance in the GLD with a wordlist for Brahui, compiled by G. Starostin based on Denis Bray's classic dictionary.

 

05.11.2013. Two updates:

1) The Athapaskan database has been expanded with new wordlists for Central and Mentasta Ahtena dialects, compiled by A. Kassian on the basis of James Kari's dictionaries and additional sources.

2) The Benue-Congo section of the site now has a «Bantu-S» database with a wordlist for the Xhosa language, re-edited by G. Starostin from an older version by Ye. Chekmeneva. More Southern Bantu wordlists to be expected within the year.

 

04.28.2013. A wordlist for the moribund Konkow language has been added to the Maidu database. Compiled by M. Zhivlov.

 

04.17.2013. The Burushaski database has been completed with a wordlist for Hunza Burushaski (the third Burushaski dialect, Nagar, is not differentiated from Hunza on a lexicostatistical basis). Compiled by G. Starostin, based on H. Berger's data. Some inaccuracies in the Yasin wordlist corrected as well.

 

04.08.2013. A wordlist for modern Nobiin (= Fadidja-Mahas) has been added to the Nubian database, compiled by G. Starostin on the comparative basis of several recent and older sources.

 

03.28.2013. A wordlist for the nearly extinct Washo language isolate (sometimes tentatively grouped with Hokan, but such an affiliation is highly questionable) has been added to the American section of the site. Compiled by M. Zhivlov, based on William H. Jacobsen's research.

 

03.27.2013. The recent conference on «Comparative-Historical Linguistics of the XXIst Century», held at RSUH, Moscow, on March 20-22, featured presentations from all the major contributors to the GLD project as well as numerous other specialists in the field. The program, materials, and even videos of the conference can be found on the «Meetings» page of the «Tower of Babel» site.

 

03.26.2013. The Athapaskan database (renamed from the former Pacific Coast Athapaskan) has been expanded to include wordlists for four different dialects of the Tanaina language, based on dictionaries by J. Kari and other sources. Compiled by A. Kassian.

 

02.24.2013. Two more wordlists added to the Ekoid database, for the closely related Ekparabong and Balep dialects (extracted from D. Crabb's comparative wordlist).

 

02.13.2013. The first list for a «Nilo-Saharan» language uploaded today: Old Nubian, with 75 Swadesh items extracted from Gerald M. Browne's dictionary, opens the brand new Nubian database. Compiled by G. Starostin.

 

01.29.2013. Two updates:

1) The Hokan section of the site has been expanded with a database for the extinct Yana group, containing the wordlists for Northern Yana, Central Yana, and Yahi dialects, documented by E. Sapir in the early 20th century. Compiled by M. Zhivlov.

2) The Germanic database has been expanded with a wordlist for Old Norse, compiled by G. Starostin based on Cleasby's dictionary and cross-checked against earlier lists.

 

01.03.2013. A wordlist uploaded for the Yasin dialect of the Burushaski isolate, based on H. Berger's published materials (compiled by G. Starostin).

 

12.19.2012. Two new lists uploaded today:

1) A wordlist for the extinct Shasta language (of the small and completely extinct Shastan group), hypothetically belonging to the Hokan family; based on a selection of sources dating mostly to the 1950s / 1960s. Compiled by M. Zhivlov.

2) A wordlist for the click language Hadza, an isolate of Tanzania traditionally assigned to the "Khoisan" macrofamily, but without any sufficient basis. Compiled by G. Starostin mostly on the basis of relatively recent fieldwork by B. Sands, but adding data from numerous older sources as well. With the addition of this wordlist, all of the "Khoisan" languages / dialects for which sufficient amounts of data have been attested are now properly represented in the GLD, without a single exception.

 

12.14.2012. A wordlist for the isolated language Sandawe (possibly distantly related to the Central Khoisan family) has been uploaded, based on recent fieldwork publications, with additional comparative data from several earlier sources.

 

12.12.2012. The Pacific Coast Athapaskan database has been expanded with a wordlist for Taldash Galice, an extinct dialect, data on which was collected by H. Hoijer and H. Landar from the last living speaker in the 1960s-1970s.

 

12.04.2012. A small list for the extinct Kwadi language uploaded in the Central Khoisan section. Unfortunately, only a little over 50% of the entries could be filled in due to the extreme scarceness of data; nevertheless, the list was still included due to the importance of this link for Khoisan studies.

 

12.02.2012. Four lists altogether uploaded on this day — all of them, incidentally, on languages that are no longer living:

1) The Pacific Coast Athapaskan database has been expanded with a wordlist for the extinct Kato, based mainly on P. E. Goddard's fieldwork.

2) A new database on the Chimariko isolate, with data mainly taken from E. Sapir's field notes, added to the Hokan section.

3) The Yeniseian database is finally completed (except for the proto-wordlist) with lists for the long-extinct Arin and Pumpokol (the latter with some significant gaps due to scarceness of data), constructed from available XVIIIth century sources.

 

11.17.2012. Two updates:

1) The Pacific Coast Athapaskan database has been expanded with a wordlist for Mattole, based primarily on Li Fang-kuei's description from the 1930s.

2) The Kalahari Khoe database has been expanded with a wordlist for Hiechware, based on S. S. Dornan's old description. This completes the database as far as all attested dialects suitable for lexicostatistical analysis are concerned.

 

11.14.2012. The Lezgian database has finally been capped off with a wordlist for Proto-Lezgian, reconstructed based on GLD standards by A. Kassian, with extensive notes justifying the details.

 

11.07.2012. Two updates:

1) The Yeniseian database is expanded with a wordlist for the long-extinct Kott, compiled mostly from M. Castrén's data, originally published in 1858, with the addition of materials from even earlier sources.

2) The Yuman database has been expanded with wordlists for Yavapai and Jamul Tiipay, compiled from several recent sources on these languages.

 

10.11.2012. Large update to the former West Khoe database, now retitled "Kalahari Khoe" and including seven more wordlists on minor Khoe languages of Botswana: Cara, ǀXaise, Danisi, Ts'ixa, Deti, Kua, Tsua. All the data have been extracted from publications based on fieldwork carried out by R. Vossen in the 1980s.

 

09.26.2012. Another wordlist added to the Ekoid Bantu database: this time, for the Ekajuk language, with limited information on dialectal variety.

 

09.25.2012. The Cocopa list has been added to the Yuman database (Hokan family).

 

08.21.2012. Three more wordlists added to the West Khoe database, for the ǂHaba, ǀGui, and ǁGana languages of Botswana.

 

08.09.2012. First two wordlists added to the Coast Salish database: Upriver Halkomelem and Island Halkomelem Salish, based on recent comprehensive dictionaries for these dialects and personal communication with the authors. Both wordlists have been compiled by Elena Barreiro, our newest contributor; more Salish data are expected in the near future.

 

08.03.2012. After a month-long break, finally the next update: a wordlist for the extinct Yugh dialect (closely related to Ket), based on H. Werner's and earlier sources and including comparative notes on Common Ket-Yugh. A few mistakes corrected in the proper Ket section of the database as well.

 

06.30.2012. Last couple of updates for June:

1) A wordlist for the Lezgi language (Gyune dialect), along with comparative notes on numerous other Lezgi dialects based on a variety of sources;

2) A wordlist for the Naro language (West Khoe subgroup), based on two dictionaries and R. Vossen's comparative notes.

 

06.21.2012. Another Hmong-Mien update: a list for Hmong Daw (White Hmong) has been added, based on E. Heimbach's detailed dictionary of this language.

 

06.09.2012. A new list added for the Tol (Eastern Jicaque) language (Jicaquean group, possibly of the Hokan family).

 

06.07.2012. Another update in the Lezgian group database: this time, with two wordlists for two different dialects of the Tabasaran language (Northern and Southern), based on a variety of old and recent sources.

New feature: The «Language Comparison» option on the main page of the site («Lists for specific languages») has been significantly upgraded. It is now possible not only to view any two different wordlists for any two languages side by side, but also to highlight phonetically similar forms between them (similarity is determined based on the same algorithm as the «objectively generated tree of lexical similarity», see here for details). This is particularly useful for determining the quantitative differences between the numbers of accidental look-alikes on lexicostatistical lists and the average numbers of true cognates that still preserve archaic phonological shapes.
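
The side-by-side comparison described above can be illustrated with a short sketch. The news item does not spell out the similarity algorithm, so the Python code below assumes a consonant-class comparison, a common approach in automated lexicostatistics: two forms count as similar when their first two consonants fall into the same classes. The class table, the function names, and the toy forms are all hypothetical and are not taken from the GLD.

```python
# A minimal sketch, NOT the GLD's actual algorithm or class table.
# Assumption: two forms are "phonetically similar" when their first two
# consonants belong to the same (simplified, hypothetical) classes.
CLASSES = {
    "P": "pbfv",
    "T": "tdθð",
    "S": "szcʃʒ",
    "K": "kgxqɣ",
    "M": "m",
    "N": "nŋɲ",
    "R": "rl",
    "W": "w",
    "J": "jy",
}
SOUND_TO_CLASS = {ch: cls for cls, chars in CLASSES.items() for ch in chars}


def skeleton(form, length=2):
    """Return the classes of the first `length` consonants of a form.

    Vowels and unknown symbols are skipped; short forms are padded
    with "H", used here as a placeholder for "no consonant".
    """
    classes = [SOUND_TO_CLASS[ch] for ch in form.lower() if ch in SOUND_TO_CLASS]
    classes += ["H"] * (length - len(classes))
    return "".join(classes[:length])


def similar(form_a, form_b):
    """True if both forms share the same consonant-class skeleton."""
    return skeleton(form_a) == skeleton(form_b)


def compare_lists(list_a, list_b):
    """Pair up two wordlists by meaning and flag similar-looking forms."""
    rows = []
    for meaning, form_a in list_a.items():
        if meaning in list_b:
            form_b = list_b[meaning]
            rows.append((meaning, form_a, form_b, similar(form_a, form_b)))
    return rows


# Invented toy wordlists (not real language data).
toy_a = {"hand": "kata", "water": "mina", "stone": "pula"}
toy_b = {"hand": "gada", "water": "suru", "stone": "fila"}

for meaning, fa, fb, flag in compare_lists(toy_a, toy_b):
    marker = "*" if flag else " "
    print(f"{marker} {meaning}\t{fa}\t{fb}")
```

Rows marked with an asterisk are the ones a viewer would highlight. Whether a highlighted pair is a true cognate or an accidental look-alike still has to be decided by etymological analysis.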

 

06.01.2012. Another Khoisan update: new list uploaded for the Kxoe (Khwe) language (Central Khoisan family), based on a recent dictionary by Christa Kilian-Hatz and older works by O. Köhler.

 

05.26.2012. A new list added for the Highland Oaxaca Chontal language (Tequistlatecan group, possibly of the Hokan family).

 

05.12.2012. Two new lists added to the Ekoid database: Nkum and Nnam.

 

05.06.2012. The Lezgian group database has been expanded with five wordlists for five different dialects of the Aghul language (Keren, Koshan, Gequn, Fite, and Aghul proper), based on a variety of old and recent sources.

 

05.02.2012. Uploaded a list for the extinct Khoekhoe language !Ora, drawn from two short vocabularies published in 1920 and 1930. This completes the Khoekhoe database, since not enough data are available on the remaining extinct members of the group to perform proper lexicostatistics.

 

04.29.2012. A list for the Gothic language, compiled with ample references not only to dictionaries, but to the existing text corpus as well (Ulfilas' Bible), initiates the new database for Germanic languages.

 

04.10.2012. A big day for updates:

[1] The Lezgian group database has been expanded with three wordlists for three different dialects of Rutul (Mukhad, Ixrek, and Luchek), based, as usual, on a mix of old and recent sources. A few updates to other Lezgian wordlists have also been made.

[2] A list for Maidu (Maiduan group, Penutian family) has been compiled, based on F. Shipley's dictionary (1963).

[3] The Khoekhoe lexicostatistical database is initiated with a wordlist for Nama (Khoekhoegowab), compiled from the recent highly informative dictionary by W. Haacke & E. Eiseb (with references to the older Krönlein-Rust dictionary as well).

[4] Finally, the Sinitic database has been expanded with a list for Standard Chinese (= Pǔtōnghuà or Standard Mandarin). With the aid of the accompanying comments, it is now possible to trace, in detail, the evolution of the Chinese basic lexicon from Early Zhou (XI-VIIIth centuries BC) to modern times, by following the database.

 

03.26.2012. Two updates:

[1] The Taa database (Peripheral Khoisan family) is completed (except for the proto-list) with a wordlist for Nǀuǁen (Nǀusan), extracted, like the list for Kakia, from D. F. Bleek's semi-reliable materials. It is the third and last dialect of Taa for which enough data exist to make it suitable for lexicostatistical purposes.

[2] The Hmong-Mien languages make their first appearance on the GLD site with a list for Xiangxi Hmong, based on cross-examination of data from one general and one comparative lexicographical source. New wordlists for other varieties of Hmong may be expected before the year is out.

 

03.13.2012. The Taa database (Peripheral Khoisan family) is expanded with a wordlist for Kakia (Masarwa), extracted from D. F. Bleek's materials: not a very reliable source, but the only one that exists for this presumably extinct dialect.

 

03.05.2012. Two new lists added to the Ekoid database: Nselle and Nta (dialects of the Nde-Nselle-Nta cluster; lexicostatistics based on available data shows practically no lexical discrepancies between the three).

New feature: The "Build a tree" procedure on the site now includes the option "Show lexicostatistical matrix", which yields all cognacy percentages between the languages in the database in the form of a standard table (which can be easily copy-pasted into a document).
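
As a rough illustration of what such a matrix contains, the Python sketch below computes pairwise cognacy percentages from cognate-class codes and prints them as a tab-separated table. The data layout (one cognate-class number per meaning, `None` for a gap), the function names, and the toy languages are assumptions made for this example. They do not describe the site's actual implementation.

```python
from itertools import combinations


def cognacy_percentage(list_a, list_b):
    """Percentage of meanings with matching cognate classes.

    Only meanings attested in both lists (non-None) are counted.
    Returns None when the two lists share no attested meanings.
    """
    shared = [(x, y) for x, y in zip(list_a, list_b)
              if x is not None and y is not None]
    if not shared:
        return None
    matches = sum(1 for x, y in shared if x == y)
    return 100.0 * matches / len(shared)


def lexicostat_matrix(wordlists):
    """Build a symmetric matrix of cognacy percentages between languages."""
    names = sorted(wordlists)
    matrix = {name: {name: 100.0} for name in names}
    for p, q in combinations(names, 2):
        pct = cognacy_percentage(wordlists[p], wordlists[q])
        matrix[p][q] = matrix[q][p] = pct
    return names, matrix


def format_table(names, matrix):
    """Render the matrix as tab-separated text, easy to paste into a document."""
    rows = ["\t".join([""] + names)]
    for n in names:
        cells = ["-" if matrix[n][m] is None else f"{matrix[n][m]:.0f}"
                 for m in names]
        rows.append("\t".join([n] + cells))
    return "\n".join(rows)


# Hypothetical toy data: five meanings, invented languages.
# Equal numbers in the same position mean "same cognate class".
toy = {
    "LangA": [1, 1, 1, 1, 1],
    "LangB": [1, 1, 2, 1, None],
    "LangC": [2, 1, 2, 3, 1],
}

names, matrix = lexicostat_matrix(toy)
print(format_table(names, matrix))
```

Gaps are excluded from the denominator in this sketch, so a pair of incomplete lists is compared only on the meanings that both of them attest.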

 

02.29.2012. The Lezgian group database has been expanded with three wordlists for three different dialects of the Tsakhur language (Mishlesh, Mikik, and Gelmets), based on a variety of old and recent sources. A few updates to the Budukh wordlist as well.

 

02.28.2012. The database for Taa (one of the two branches of South Khoisan, along with !Kwi) is initiated with a wordlist for !Xóõ, the only surviving member of this branch, extracted from the extensive dictionary by Anthony Traill and properly annotated.

 

02.23.2012. Two new wordlists added to the West Caucasian database: one for the «literary» Abzhuwa dialect of Abkhaz and a different one for the Bzyb dialect, although no lexical discrepancies in the 100-wordlist have been elicited (there are, however, significant phonetic differences between the two dialects).

 

02.21.2012. A new Hokan list uploaded, this time, for the moribund (if not already extinct) Eastern Pomo language of the Pomo group, based on data published by Sally McLendon.

 

02.13.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǀHaasi (based on one single known recording, made by R. Story in 1937).

 

02.05.2012. New wordlist edited and uploaded for the Ket language, the only survivor of the Yeniseian family, based on data from G. Werner's dictionaries, with references to M. Castrén's earlier data from the XIXth century.

 

01.19.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǀʼAuni (based on data collected by D. F. Bleek in 1911 and 1936).

 

01.17.2012. The Lezgian group database has been expanded with a wordlist for the Budukh language.

 

01.13.2012. A new wordlist edited and uploaded for the extinct Abipon (Guaicuruan group, presumably Mataco-Guaicuruan family), based on two vocabularies compiled in the late XVIIIth century.

 

01.07.2012. The !Kwi group (Peripheral Khoisan family) database has been expanded with a wordlist for the extinct language ǁXegwi (based on data mostly recorded in the 1950s).

 

01.02.2012. Two new databases added: (1) A wordlist for Seri (Seri group, presumably Hokan family); (2) Five new wordlists for the Lezgian group (North Caucasian family): Udi (Nidzh and Vartashen dialects), along with additional notes on Common Udi; Archi; Kryts (two dialects - Kryts «proper» and Alyk).

 

12.31.2011. We now have a Facebook group for The Global Lexicostatistical Database. Please join for quicker updates!

 

12.31.2011. The Sinitic group (Sino-Tibetan family) wordlists have been expanded by a list constructed for Late Middle Chinese on the basis of the (semi-)vernacular document, The Record of Linji (≈ IX-X centuries A.D.).

 

11.10.2011. The Ekoid group (Benue-Congo family) database has been expanded with lists from two more languages: Efutop and Nde.

 

10.31.2011. A list for the ancient extinct Hurrian language (Hurro-Urartian group and family) has been uploaded (unfortunately, only 66 out of 110 Swadesh meanings are recoverable from known sources).

 

10.24.2011. Lists for Abé and Abidji (Agneby group, Kwa family) have been uploaded in all formats.

 

10.20.2011. After more than a year in the making, the GLD finally goes public, with 67 different annotated Swadesh wordlists and 2 reconstructed proto-wordlists from 29 language groups of Eurasia, Africa, America, and Papua. New updates coming soon!

 


 

     © 2011-2014 George Starostin (site design, data input coordination)
    © 2011-2014 Phil Krylov (programming, technical support)