Annotated Swadesh wordlists for the West Nilotic group (East Sudanic family).

Languages included: Nuer [wnl-nue]; Dinka [wnl-dnk]; Thok Reel [wnl-rel]; Mabaan [wnl-mab]; Jumjum [wnl-jum]; Kurmuk [wnl-krm]; Mayak [wnl-may]; Shilluk [wnl-shl].


I. General

Bender 1971 = Bender, Lionel M. 1971. The Languages of Ethiopia: A New Lexicostatistic Classification and Some Problems of Diffusion. In: Anthropological Linguistics 13(5): 165-288. // A lexicostatistical study of most of the languages of Ethiopia. Includes slightly modified Swadesh wordlists for a large number of Cushitic, Omotic, Ethiosemitic, and Nilo-Saharan (Nilotic, Surmic, Koman, etc.) languages. Unfortunately, the survey suffers from numerous inaccuracies of phonetic transcription and semantic glossing, making it practically unusable as a primary source for any of the languages concerned.

Storch 2005 = Storch, Anne. 2005. The Noun Morphology of Western Nilotic. Köln: Rüdiger Köppe Verlag. // A monograph on the synchronic and diachronic mechanisms of nominal paradigm formation in West Nilotic languages. Includes a lot of illustrative data, much of it collected by the author in person.

II. Nuer

Kiggen 1948 = Kiggen, J. 1948. Nuer-English Dictionary. Nederland: Drukkeri van het Missiehuis, Steyl bij Tegelen. // As of now, this source remains the single largest officially published document on Nuer lexicography. Precise dialect is not indicated, although the dictionary occasionally indicates dialectal variants. Accuracy of transcription is debatable, since the author clearly did not have a full comprehension of the extremely complicated vowel system of Nuer.

Frank 1999 = Frank, Wright Jay. 1999. Nuer Noun Morphology. M. A. Thesis, State University of New York at Buffalo. // Detailed description of Nuer nominal morphology with plenty of well-transcribed illustrative data. Unfortunately, the paradigmatic data are (predictably) limited to large numbers of Nuer nouns.

III. Dinka

Nebel 1979 = Nebel, Arthur. 1979. Dinka-English / English-Dinka Dictionary. Bologna: Editrice Missionaria Italiana. // A large dictionary of Dinka, concentrating on the Rek dialect of the language, but also containing a large number of specially marked dialectal forms. Phonetic transcription (particularly in the area of Dinka's complicated vowel system) is not highly accurate, and tones are not marked at all.

Duerksen 2005 = Duerksen, John et al. 2005. Dinka-English Dictionary. Ms.: SIL. // A large compilation of various sources on Dinka lexicon, including Nebel's dictionary as well as multiple addenda from fieldwork seemingly conducted by SIL personnel. Since the individual sources of data are not indicated, can only be used as an occasional control source for lexicostatistical purposes.

Roettger 1989 = Roettger, Larry; Roettger, Lisa. 1989. A Dinka Dialect Survey. In: Occasional Papers in the Study of Sudanese Languages 6: 1-65. // A sociolinguistic and lexicostatistical study of the complex network of Dinka dialects. Includes 150-item wordlists collected for 20 subdialects of the 4 major Dinka dialects, as well as for the separate language of Atuot (Thok Reel). However, the phonetic and semantic accuracy of the data is questionable.

Andersen 1987 = Andersen, Torben. 1987. The Phonemic System of Agar Dinka. In: Journal of African Languages and Linguistics 9: 1-27. // A detailed description of the complex phonology of Agar Dinka, based on the author's own fieldwork; well illustrated by accurately transcribed lexical material.

Andersen 2002 = Andersen, Torben. 2002. Case inflection and nominal head marking in Dinka. In: Journal of African Languages and Linguistics 23: 1-30. // This paper on the nominal morphology of Agar Dinka has a lot of information on the paradigmatic behavior of many basic nouns of this dialect.

Andersen 2007 = Andersen, Torben. 2007. Auxiliary verbs in Dinka. In: Studies in Language 31: 89-116. // This paper contains some useful information on Agar Dinka verbs and verbal morphology.

IV. Thok Reel

Reid 2010 = Reid, Tatiana. 2010. Aspects of phonetics, phonology and morphophonology of Thok Reel. Ph.D. thesis, University of Edinburgh. // A detailed description of the phonetic and phonological aspects of Thok Reel, illustrated by a large number of lexical examples, but without an accompanying glossary.

V. Mabaan

Miller 2006 = Miller, Betty. 2006. Mabaan Dictionary. Draft version. Ms., available at:\ // A large dictionary of the Mabaan language, although with some transcriptional inconsistencies and no prosodic notation.

Andersen 1992 = Andersen, Torben. 1992. Aspects of Mabaan Tonology. In: Journal of African Languages and Linguistics 13: 183-204. // Detailed description of the prosodic system of Mabaan, well illustrated by examples and containing notes on the general phonology of the language as well.

Andersen 1999 = Andersen, Torben. 1999. Vowel quality alternation in Mabaan and its Western Nilotic history. In: Journal of African Languages and Linguistics 20: 97-120. // Detailed description of the vowel system and vocalic morphophonology of Mabaan.

Andersen 1999b = Andersen, Torben. 1999. Anti-Logophoricity and Indirect Mode in Mabaan. In: Studies in Language 23(3): 499-530. // This paper contains some important information on the general verbal system of Mabaan, among other things.

Andersen 2006 = Andersen, Torben. 2006. Layers of number inflection in Mabaan (Western Nilotic). In: Journal of African Languages and Linguistics 27: 1-27. // Detailed description of the nominal morphology of Mabaan, well illustrated by paradigmatic examples.

VI. Jumjum

Andersen 2004 = Andersen, Torben. 2004. Jumjum phonology. In: Studies in African Linguistics 33(2): 133-162. // A detailed description of the phonological system of Jumjum, well illustrated by lexical examples.

Andersen 2006b = Andersen, Torben. 2006. [ATR] reversal in Jumjum. In: Diachronica 23(1): 3-28. // A diachronic study of Jumjum vocalism with comparative data from other West Nilotic languages.

VII. Kurmuk

Andersen 2007b = Andersen, Torben. 2007. Kurmuk phonology. In: Studies in African Linguistics 36(1): 29-90. // A detailed description of the phonological system of Kurmuk, well illustrated by lexical examples.

Andersen 2015 = Andersen, Torben. 2015. Syntacticized topics in Kurmuk: a ternary voice-like system in Nilotic. In: Studies in Language 39(3): 508-554. // A study of some syntactic properties of Kurmuk (topicalization), illustrated by numerous textual examples.

VIII. Mayak

Andersen 1999c = Andersen, Torben. 1999. Consonant Alternation and Verbal Morphology in Mayak (Northern Burun). In: Afrika und Übersee 82: 65-97. // A study of some aspects of verbal morphology and morphophonology of the Mayak dialect of Burun.

Andersen 1999d = Andersen, Torben. 1999. Vowel harmony and vowel alternation in Mayak (Western Nilotic). In: Studies in African Linguistics 28(1): 1-29. // A study of Mayak morphophonology, illustrated with comparative lexical data on Mayak and Mabaan.

Andersen 2000 = Andersen, Torben. 2000. Number Inflection in Mayak (Northern Burun). In: R. Vossen, A. Mietzner, A. Meissner (eds.). "Mehr als nur Worte...": Afrikanistische Beiträge zum 65. Geburtstag von Franz Rottland. Köln: Rüdiger Köppe Verlag, 29-43. // A study of the basic nominal paradigm of Mayak, well illustrated by examples of nouns in the singular and plural numbers.

IX. Shilluk

Gilley 1992 = Gilley, Leoma. 1992. An autosegmental approach to Shilluk phonology. (Summer Institute of Linguistics and University of Texas at Arlington. Publications in Linguistics, 103.) Dallas: Summer Institute of Linguistics. // Detailed analysis of the phonology and phonetics of Shilluk, with large amounts of illustrative lexical data.

Gilley 2000 = Gilley, Leoma. 2000. Singulars and plurals in Shilluk: a search for order. In: Occasional papers in the study of Sudanese languages 8: 1-21. // Analysis of the number category in Shilluk, well illustrated by examples of nominal paradigms.

Heasty 1937 = Heasty, J. A. 1937. English-Shilluk / Shilluk-English Dictionary. Dolieb Hill, The Anglo-Egyptian Sudan: The American Mission. // One of the two most comprehensive Shilluk dictionaries to date; phonetic transcription, however, is somewhat inadequate (especially in reference to vowels; prosody remains completely unmarked).

Kohnen 1994 = Kohnen, B. 1994. Dizionario Shilluk. A cura di Manuela Brovarone. Roma: Missionari Comboniani. // Another large dictionary of Shilluk, published long after the compiler's death. Transcription is poor, but the dictionary is exceptionally well illustrated by contexts and detailed semantic notations.


I. Nuer.

I.1. General.

Our default source on the Nuer language remains [Kiggen 1948], as the single most comprehensive collection of lexical data, well illustrated by examples / contexts and ideally suited to the extraction of the Swadesh wordlist. For additional control, we have also checked the (not very reliable in itself) 100-item list in [Bender 1971], and also included transcriptions for noun stems from [Frank 1999] (important for the purpose of comparison and reconstruction, since Frank's description of Nuer's vowel system is much more detailed than Kiggen's).

Nuer, like its close relative Dinka, is well-known for its complicated system of vocalic and consonantal gradations in nominal and verbal paradigms; for this reason, we consistently adduce paradigmatic information (noun singulars and plurals; verbs in the infinitive and in the 3rd p. sg.) where it is available in Kiggen's dictionary and/or in Frank's thesis, since this information is critical for external comparison and reconstruction.

I.2. Transcription.

The system of transcription in [Kiggen 1948] is relatively simple, since the author simplifies the complex network of vocalic oppositions. The only amendments introduced are as follows:

(a) long vowels (aa, ee, etc.) have been converted to UTS standards (aː, eː, etc.);

(b) palatal affricates c, j are transcribed as ɕ, ʓ;

(c) the opposition between two series of coronal consonants that Kiggen transcribes as t / th, d / dh, n / nh, is converted to UTS t / t̪, d / d̪, n / n̪ respectively. Kiggen indicates that th, dh are pronounced as interdentals (θ, ð), but existing descriptions of Nuer phonetics disagree with one another and suggest that the pronunciation of the second series actually varies between interdental and dental (stop) articulation. For reasons of consistency with data in related languages, we prefer to re-transcribe the phonemes as dental stops (this also makes the data more consistent with the marking of the dental nasal, since n̪ cannot be properly realised as "interdental").
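The recodings in (a)-(c) can be sketched as a small conversion routine. This is only an illustration and not the procedure actually used to compile the database: the function name, the vowel inventory in the regular expression, and the use of the length mark ː for UTS long vowels are all assumptions.

```python
import re

LENGTH_MARK = "\u02d0"  # IPA length mark (ː); assumed UTS notation for long vowels

# Digraphs come first so that "th" is never read as "t" followed by "h".
KIGGEN_TO_UTS = [
    ("th", "t\u032a"),  # dental t (t + combining bridge below)
    ("dh", "d\u032a"),  # dental d
    ("nh", "n\u032a"),  # dental n
    ("c", "ɕ"),         # palatal affricates
    ("j", "ʓ"),
]


def kiggen_to_uts(form):
    """Recode one Kiggen-style form into UTS-style notation (hypothetical helper)."""
    for source, target in KIGGEN_TO_UTS:
        form = form.replace(source, target)
    # A doubled vowel letter becomes a single vowel plus the length mark.
    # The vowel set below is an assumed inventory, not taken from Kiggen.
    return re.sub(r"([aeiouɛɔ])\1", lambda m: m.group(1) + LENGTH_MARK, form)
```

For a made-up input string such as "caa" (not an actual Nuer word), the routine replaces c with ɕ and collapses the doubled vowel into a single vowel followed by the length mark.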

In the thesis [Frank 1999], consonantal transcription largely agrees with Kiggen, but the vocalic system is analyzed in a completely different way, with many more oppositions in timbre and a major additional feature (breathiness) added as distinctive. The correlations between Frank's system and the UTS, based on Frank's own explanation, are as follows:

[Frank 1999] UTS
a a
a_ æʰ
i i
ɛ ɛ
ɛ̈ ɛʰ
e e
o o
o_ ʋʰ
ɔ ɔ
ɔ_ ɔʰ

II. Dinka.

II.1. General.

The Dinka language, spoken by close to a million and a half native speakers, is usually regarded as a "macrolanguage", with at least four or five major dialect clusters that could formally qualify as separate languages: (a) Northeastern Dinka (Padang-Ageer), (b) Northwestern Dinka (Pan Aru-Ruweng), (c) South-Central Dinka (Agar), (d) Southeastern Dinka (Bor), (e) Southwestern Dinka (Rek). Ideally, one should have at least one representative wordlist from each of these dialects (languages?). However, the situation is difficult: despite a lot of fieldwork conducted with speakers of all these varieties, the only coherent dictionary to focus on one particular dialect, produced so far, is [Nebel 1979], systematically describing the most widely spoken Rek dialect.

The single most useful comparative source on Dinka dialects is [Roettger 1989], with 150-item wordlists provided for no fewer than 20 different subdialects of the language: (a) Padang dialect: Abiliang, Paloc/Ageer, Dongjol, Ngok-Sobat, Thoi, Rut, Luac, Ruweng, Alor, Ngɔk; (b) Rek dialect: Rek, Luac, Twic, Malual; (c) Agar dialect: Agar, Aliab, Gɔk, Ciec; (d) Bor dialect: Bor, Twic, Nyarweng, Ghɔl. Including all of this data in our wordlists would be overkill (most of the individual subdialects within one dialect have completely or almost completely coinciding Swadesh lists), but at least one representative wordlist from each dialect would, in theory, be useful. However, careful analysis of the data in Roettger's wordlists and its comparison with other sources on Dinka raises certain doubts as to the complete semantic accuracy of the entries - a very important detail when dealing with dialects whose Swadesh wordlists rarely differ from one another by more than 10%.

In the light of this, we currently prefer not to rely on Roettger's data as primary sources. All of it is, however, included in the Notes section on Dinka, and therefore, open to manual comparison. Addition of extra control sources, such as, e. g., data from a series of papers by T. Andersen on the phonology and grammar of the Agar dialect, shows that there are, indeed, some significant divergences between dialects (for Agar, cf. at least the following entries: 'cloud', 'give', 'leaf', 'mountain', 'new', 'road', 'sun', 'snake'), but the construction of a detailed and accurate set of wordlists on the most divergent dialects of the language remains a task for the future.

II.2. Transcription.

The main discrepancies between the Dinka alphabet used in [Nebel 1979] and UTS are summarized in the following table.

[Nebel 1979]      UTS             Notes
c, j              ɕ, ʓ            Palatal affricates.
th, dh, nh        t̪, d̪, n̪         Dental consonants.
ny                ɲ               Palatal nasal.
q                 ʕ               Laryngeal articulation acc. to Nebel.
è, ò              ɛ, ɔ            "Open" e and o.
                                  Breathy articulation of vowel.
VV                Vː              Long vowels.
ä, ï, ö, ë        ɐ, ɨ, ɵ, ɘ      Centralized vowels.

It should be noted that Nebel's notation of the complex vowel system in Dinka is notoriously inaccurate; a much better source to ascertain the base quality and various secondary features of the vocalism is T. Andersen's description of the phonology of Agar (e.g. Andersen 1987), but it does not include a full coverage of the basic lexicon, and it is not entirely clear how precisely it correlates with the phonological systems of other dialects. Fortunately, this is not highly significant for basic lexicostatistical purposes, where Nebel's lexicon remains perfectly usable.

The main differences between Andersen's and Nebel's notation and description are as follows:

(a) In the consonantal system, Andersen marks the palatal nasal as ɲ; Nebel's laryngeal q (= UTS ʕ) corresponds to Andersen's velar voiced fricative ɣ.

(b) The base vowel system is described by Andersen as i, u, e, o, ɛ, ɔ, a, i.e. it is practically the same as Nebel's. No special "centralized" phonemes or allophones are postulated in his description. Breathy vowels in Andersen's description are opposed to creaky vowels. We do not specially mark creakiness in Andersen's entries, since it seems to be the default (unmarked) quality (Andersen himself marks it inconsistently in his records, since it is phonologically superfluous).

(c) Andersen postulates three degrees of length for Dinka: short (V), medium (VV), and long (VVV), as opposed to only two in Nebel's description. We transliterate Andersen's "medium" vowels as long (Vː) and his "long" vowels as "ultra-long" (VVː).

(d) Andersen postulates two level tones (low V̀ and high V́), as well as one contour tone (high-low V̂) for Dinka. His are the only sources so far that consistently note prosodic information for Dinka.
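The three-way length convention described in (c) above can be illustrated with a short sketch. It is a simplified illustration under stated assumptions: the function name is hypothetical, the vowel set follows the base system listed in (b), and the input is assumed to consist of bare vowel letters, so tone and voice-quality diacritics would have to be handled separately.

```python
import re

LENGTH_MARK = "\u02d0"  # IPA length mark (ː)


def andersen_length_to_uts(form):
    """Map Andersen-style V / VV / VVV onto V / Vː / VVː (hypothetical helper)."""

    def recode(match):
        vowel = match.group(1)
        run_length = len(match.group(0))
        if run_length == 2:   # "medium" vowel -> long
            return vowel + LENGTH_MARK
        return vowel + vowel + LENGTH_MARK  # run of three: "long" -> "ultra-long"

    # Match runs of two or three identical vowel letters; single vowels are untouched.
    return re.sub(r"([aeiouɛɔ])\1{1,2}", recode, form)
```

Applied to schematic strings (not attested Dinka words), "tik" stays unchanged, "tiik" becomes "tiːk", and "tiiik" becomes "tiiːk".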

III. Thok Reel.

III.1. General.

Thok Reel, or Atuot (the name of the small ethnicity speaking the language in the Yirol West county of Lakes State of South Sudan), is a small and relatively recently discovered variety of Nuer-Dinka. It is sufficiently distant from both the Nuer and the Dinka dialectal clusters to be considered a separate language; in many respects (including lexicostatistics) it is genetically closer to Nuer than to Dinka, even though its speakers have migrated into a Dinka-occupied area.

Data on the language are very scarce; however, a large wordlist may be found in [Roettger 1989], appended to the large collection of wordlists for various Dinka dialects. Like all of the survey materials in this source, certain inclusions are semantically problematic, and the accuracy of phonetic representation is also questionable (not to mention lack of any prosodic information), but some of the data may be double-checked by means of [Reid 2010], a detailed phonetic and grammatical description of the idiom - unfortunately, this source cannot be used as primary for the lexicostatistical wordlist due to huge gaps. In between the two sources, however, a more or less accurate picture of the language still emerges, although it is certainly liable to future amendments as (if) more and better data become available.

III.2. Transcription.

Transcription in [Roettger 1989] is the same as for the Dinka dialects in that source. It almost completely coincides with the orthography of the [Nebel 1979] Dinka dictionary, except that -ATR vowels ɛ, ɔ are transcribed as such.

IV. Mabaan.

IV.1. General.

Mabaan, spoken by about 50,000 speakers in Mabaan county (Upper Nile state), remains a somewhat poorly described language. As our main source, we have chosen the formally unpublished dictionary [Miller 2006], kindly made available for the general public by Roger Blench; it is a comprehensive source, well illustrated by textual examples, but one that suffers from various inaccuracies (e.g. the same word may be transcribed in different ways throughout the dictionary) and incomplete understanding of the language's phonology (particularly vocalism and prosody, with tonal information completely missing from transcriptions).

Of a far higher quality are the numerous works by Torben Andersen [1992, 1999, 1999b, 2006] that deal with various aspects of Mabaan phonetics, morphophonology, and morphology: Andersen's field data are accurately transcribed, and he always illustrates his observations with a plethora of examples. Unfortunately, all of these are still short papers, and it is impossible to put together a comprehensive Swadesh wordlist on the basis of Andersen's data alone. In a few cases where Miller's data are unavailable or most likely erroneous, we still include Andersen's elicitations in the primary slot (e.g. 'fat', 'louse'); but for the most part, we list his data in the notes section, since it is extremely important for historical reconstruction, but not so important for straightforward lexicostatistics.

A defective 100-item wordlist for Mabaan is also available in [Bender 1971: 269]; it has been made almost completely obsolete by the availability of Miller's and Andersen's results, but we still include the data in the notes section for the sake of completeness.

IV.2. Transcription.

The transcription used in [Miller 2006] is slightly simplified compared to T. Andersen's (largely for typographic reasons). For the most part, we do not introduce any amendments, except for the following:

(a) Mabaan distinguishes between two series of coronal stops: interdental and post-alveolar / retroflex. Miller writes the interdental series as plain t, d and marks the post-alveolar series with a diacritic; Andersen usually does the reverse, marking the interdental series and writing the post-alveolar series as plain t, d. To avoid ambiguity, we use diacritic marks everywhere, transcribing the first series as t̪, d̪ and the second as ʈ, ɖ.

(b) Mabaan palatal affricates are transcribed as c, j by Miller and as c, ɟ by Andersen; they are transliterated as UTS ɕ, ʓ. The palatal nasal is transliterated as ɲ.

(c) Long vowels are transcribed as doubled aa, oo, etc. by Miller and by Andersen; they are transliterated as UTS aː, oː, etc.

V. Jumjum.

V.1. General.

The Jumjum language, spoken in the Blue Nile Province to the north of Mabaan by about 25,000 speakers (Ethnologue), remains poorly described. No systematic grammatical description or vocabulary has been published so far, and the majority of information on select grammatical aspects and lexicon of the language is to be gained from several publications of Torben Andersen [Andersen 2004; Andersen 2006b], containing high quality, but, unfortunately, limited field data collected by the author.

Alternate sources of data on Jumjum are scarce and far less reliable. Where available, we always quote the equivalents from M. L. Bender's Swadesh wordlist on Jumjum [Bender 1971: 268], and sometimes, when Andersen has no equivalent, Bender's data provide the only possibility to fill in the primary slot. These entries, however, are always marked with #, since Bender's data typically suffer from phonetic and semantic inaccuracies.

V.2. Transcription.

T. Andersen's transcription is IPA-based and requires only minimal cosmetic amendments to UTS. We transcribe long vowels (aa, oo, etc.) as aː, oː, etc.; palatal plosives/affricates c, j as ɕ, ʓ.

VI. Burun (Kurmuk, Mayak).

VI.1. General.

The Burun language, spoken by several thousand people to the north of the Jumjum area, consists of several closely related dialects that Torben Andersen considers to be separate languages (belonging to the Northern Burun subgroup as opposed to Southern Burun, consisting of Mabaan and Jumjum); their names vary depending on the source, but according to Andersen, the main dialects include Mayak, Kurmuk, and Surkum.

Unfortunately, not a single exhaustive grammatical description or vocabulary exists for any of these dialects; like Jumjum, most of the phonetically and semantically accurate information on them has to be extracted from T. Andersen's papers (see the complete list of references in the Data sources section). Seriously gapped, but workable Swadesh lists may be extracted for Mayak and Kurmuk (but not for Surkum, data on which are quite minimal). Additionally, some gaps in the Mayak list may be tentatively filled in by data from M. L. Bender's [Bender 1971: 272] wordlist on Burun, with the same caveats as for Jumjum (see above). We also rely on [Storch 2005] for supportive information, since Storch's data on Mayak nouns is partially original and can be used to fill in a few gaps and resolve some controversial situations.

VI.2. Transcription.

More or less the same cosmetic amendments to UTS are relevant for T. Andersen's transcription of Mayak and Kurmuk data as for his transcription of Jumjum data (see above).

VII. Shilluk.

VII.1. General.

Despite the large number of Shilluk speakers and the overall importance of the Shilluk ethnicity in South Sudan, there is as of now no modern-level dictionary or grammar for Shilluk. Lexicostatistical and etymological studies, therefore, have to take place at the intersection of older lexicographic sources (not highly reliable in terms of phonology) and partial new studies that provide better insights into the grammar and phonetics of Shilluk, but do not have enough data to properly fill up the entire Swadesh wordlist.

Our main source is [Heasty 1937], with [Kohnen 1994] (the actual data was collected in the first decades of the 20th century) selected as a supporting source. Where possible, we also provide the corresponding equivalents from more recent works by Leoma Gilley [Gilley 1992, 2000] that give a better idea of the words' phonetic shape; unfortunately, there are no wordlists in Gilley's works, and, moreover, some of the author's own data vary in shape from one work to another.

VII.2. Transcription.

The transcription (alphabet) system of Heasty is generally simple and does not require a lot of transliteration efforts. The following systematic recodings have been performed:

(a) Palatals: Heasty's c, j, ny = UTS ɕ, ʓ, ɲ;
(b) Dentals (interdentals): Heasty's th, dh, nh = UTS t̪, d̪, n̪;
(c) Heasty consistently distinguishes between "unmarked" vowels (a, e, ɛ, i, o, ɔ, u) and their "centralized" variants (which he transcribes as ä, ë, ɛ̈, ï, ö, ɔ̈). This is not the same as the well-known +/-ATR opposition and rather seems to correspond to the rare feature that Gilley [1992: 28] calls "expanded pharynx". Since in the other Luo languages the same feature is often analyzed as an opposition between breathy and non-breathy vowels, we tentatively mark Heasty's "centralized" vowels (= Gilley's [+EX] vowels) as breathy, i.e. aʰ, eʰ, ɛʰ, iʰ, oʰ, ɔʰ. It should, however, be noted that Heasty's and Gilley's transcriptions are frequently uncorrelated, hinting at possible mistakes that could have been made by either of them.
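A minimal sketch of the recodings in (a)-(c) is given below. It is an illustration only, not the tool used for the database: the function name is hypothetical, the dental targets t̪, d̪, n̪ are assumed from the conventions used for the other languages, and the diaeresis is assumed to be the only mark of Heasty's "centralized" vowels. Unicode decomposition is used so that precomposed letters such as ä and combining sequences such as ɛ̈ are treated alike.

```python
import unicodedata

DIAERESIS = "\u0308"    # combining diaeresis marking Heasty's "centralized" vowels
BREATHY_MARK = "\u02b0" # modifier letter small h (ʰ), tentative UTS breathy mark

# Digraphs first, so that "th" and "ny" are not split into single letters.
HEASTY_CONSONANTS = [
    ("th", "t\u032a"),  # dental t (assumed UTS target)
    ("dh", "d\u032a"),  # dental d
    ("nh", "n\u032a"),  # dental n
    ("ny", "ɲ"),
    ("c", "ɕ"),
    ("j", "ʓ"),
]


def heasty_to_uts(form):
    """Recode one Heasty-style form into UTS-style notation (hypothetical helper)."""
    # Decompose so that every diaeresis is a separate combining character.
    form = unicodedata.normalize("NFD", form)
    form = form.replace(DIAERESIS, BREATHY_MARK)
    for source, target in HEASTY_CONSONANTS:
        form = form.replace(source, target)
    return unicodedata.normalize("NFC", form)
```

Because the breathy mark ʰ is a distinct character from the letter h, inserting it before the consonant pass cannot accidentally create a new "th" or "nh" digraph.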

The system of transcription in [Kohnen 1994] is almost exactly the same as Heasty's, except that Kohnen omits any indications of the centralized / uncentralized opposition. L. Gilley [1992, 2000] uses standard IPA to transcribe all the forms.

Database compiled and annotated by: G. Starostin (last update: January 2018).