Annotated Swadesh wordlists for the Yeniseian group (Yeniseian family).

Languages included: Ket [yen-ket], Yugh [yen-yug], Kott [yen-kot], Arin [yen-ari], Pumpokol [yen-pum].
Reconstruction: Proto-Yeniseian reconstruction available.


I. General.

Main sources

Castrén 1858 = Castrén, M. A. Versuch einer jenissei-ostjakischen und kottischen Sprachlehre nebst Wörterverzeichnissen aus den genannten Sprachen. Sankt-Petersburg. // First systematic and detailed description of Ket and Kott grammar, accompanied by representative vocabularies for both languages. The Ket part is mostly obsolete in the light of newer data, but still contains important information on some phonetic peculiarities of XIXth century Ket. The Kott part is the most important source of data on that language.

Dulzon 1961 = Dulzon, A. P. Slovarnyje materialy XVIII v. po ketskim narechijam [XVIIIth century vocabularies of Ket idioms]. In: Uchenyje zapiski Tomskogo gosudarstvennogo pedagogicheskogo instituta, Tomsk, pp. 152-189. // Transliterated and briefly annotated data collections, extracted from old field records and compiled sources. Besides containing valuable old data on Ket and Kott, this publication serves as the only source of data for the extinct Pumpokol language, and the primary source of data for the equally extinct Arin.

Werner 2002 = Werner, Heinrich. Vergleichendes Wörterbuch der Jenissej-Sprachen. Wiesbaden: Harrassowitz Verlag. // Huge comparative lexicon that includes most of the known lexical data on Yeniseian languages. Also contains the author's own Proto-Yeniseian reconstructions.

S. Starostin 1995 = Starostin, S. A. Sravnitel'nyj slovar' jenisejskix jazykov [Comparative dictionary of Yeniseian languages]. In: Ketskij sbornik. Lingvistika [Ket Volume, Linguistics], Moscow, pp. 176-315. // First comprehensive comparative dictionary of Yeniseian languages that includes protolanguage reconstructions based on S. Starostin's system of phonetic correspondences.

YED = Starostin, S. A. Yeniseian Etymological Database. // Computerized version of the Proto-Yeniseian corpus, available online. Includes some etymologies that were not included in [S. Starostin 1995], mostly because the author took into account the new data that became available with the publication of [Werner 2002]. A significant number of old reconstructions have also been revised in this version.

II. Ket, Yugh.

Werner 1977 = Werner, G. K. Akcentirovannyje sravnitel'nyje slovarnyje materialy po sovremennym jenisejskim dialektam [Accentuated comparative lexical data on modern Yeniseian dialects]. In: Jazyki i toponimija [Languages and Toponymy], Tomsk, pp. 131-195. // First ever comparative vocabulary of Ket dialects that fully accounts for the suprasegmental features of all the words. Most of the data have been incorporated into [Werner 2002].

Werner 1993 = Werner, G. K. Ketsko-russkiy / russko-ketskiy slovar' [Ket-Russian, Russian-Ket dictionary]. Saint-Petersburg. // Large dictionary of Ket, based primarily on the South Imbat dialect. All data are given in "official" Ket Cyrillic orthography, since the dictionary's primary purpose is educational.

Werner 2011 = Werner, Heinrich. Die Jugen (Sym-Jenissejer) im Lichte ihrer Sprache. München: Lincom Europa. // Large monograph on Yugh, most of which is occupied by an extensive German-Yugh dictionary. This latest source on Yugh lexicon is used as the default source for Yugh data, although it should be noted that most of the data are directly copied from [Werner 2002].

III. Kott.

Verner 1990 = Verner, G. K. Kottskij jazyk [The Kott Language]. Rostov-na-Donu. // A large monograph (in Russian) describing the phonetics, grammar, and available lexical data on Kott. In terms of data, it is generally dependent on [Castrén 1858], but it also adds valuable materials from earlier, less accurate sources on Kott.


I. Ket.

I.1. General.

The main entry, transliterated into UTS, is quoted after [Werner 2002], reflecting the Southern dialect. It is immediately followed by the standard Cyrillic orthographic representation of the word as presented in [Werner 1993]. Comments include:

(a) basic grammatical info on the words, such as gender and plural form for nouns, and information on conjugation, along with some paradigmatic evidence, for verbs. All of the info is also quoted from [Werner 2002];

(b) re-transliteration of the item's representation in [Werner 1977], along with all the dialectal data presented there. This information is useful to assess the dialectal variety of Ket. Abbreviated names of the dialects are: S.-Imb. = South Imbat; N.-Imb. = North Imbat; Bak. = Baklanikha; Sur. = Surgutikha; Kur. = Kureyka;

(c) re-transliteration of the item's representation in the early dictionary of M. Castrén [1858]. It should be noted that, although Castrén generally distinguishes between Ket (Imbat) and Sym (Yugh), forms from both dialects are frequently conflated in the dictionary without adequate differentiation. Forms specifically marked as "Sym" in Castrén's dictionary, or forms that specifically betray Sym phonetic features without being marked as such (e. g. entries beginning with f- as opposed to Imbat h-), are not entered in the notes on "Ket", but this does not necessarily mean that each single form quoted in the notes belongs to the Imbat variety of the language.
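
The filtering rule described in (c) can be sketched roughly as follows. This is a minimal illustration only: the function name, the `marked_sym` flag, and the sample forms are hypothetical, and the check covers just the one diagnostic named above (initial f- as opposed to Imbat h-), not every criterion that may have been applied in practice.

```python
def is_probably_sym(form: str, marked_sym: bool = False) -> bool:
    """Return True if a Castrén form should be kept out of the Ket notes.

    Assumption: only two signals are checked, an explicit "Sym" label and
    an initial f- (Imbat has h- in the corresponding position).
    """
    return marked_sym or form.lower().startswith("f")


# Hypothetical placeholder forms, not actual dictionary entries.
entries = [("fVC", False), ("hVC", False), ("hVC", True)]
ket_notes = [form for form, marked in entries if not is_probably_sym(form, marked)]
print(ket_notes)
```

As the text warns, a form that passes this check is not thereby proven to be Imbat; the sketch only excludes forms that are positively identifiable as Sym.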

I.2. Transliteration.

UTS Werner 2002 Werner 1993 Castrén 1858
p p п p
b b б b
m m м m
t t т t
d d д d
s s с s
сь s with stroke
l l л l
ль ɫ
r r р r
р ~ рь r
n n н n
нь n with stroke
k k к k
g g г g
ɣ ɣ г g
q q қ k`
ʁ R ґ g`
ŋ ŋ ӈ ŋ
h h х h
y j й -i ~ -j-
ʔ ʔ ʼ ʼ ~ not marked
a a а a
ä ä я ä ~ eä
e e е e
ɛ ɛ э ä ~ eä
i i и i
o o ө o
ɔ ɔ о o
u u у u
ɨ ɨ ы y
ɜ ʌ ъ
ǝ ǝ ǝ
V V ~ V̂


M. Castrén's data differ notably from XXth century data and may reflect both peculiarities of earlier phonetics and the author's own mistranscriptions (the author does not distinguish between o and ɔ, or between ɛ and ä; his g` should be graphically interpreted as the voiced uvular stop ɢ, but probably represented the fricative ʁ, etc.).

Tones. [Werner 1977] and subsequent publications by the same author consistently mark four different tones, plus two more contour tones on polysyllabic forms. The numeric notation has been reproduced here in the notes section, next to the quoted variants of forms from [Werner 1977]. However, tonal notation as such is mainly superfluous in Ket words if the accompanying features are marked instead, such as:

(1) Tone 1 is automatically correlated with semi-long vowels (Vˑ);
(2) Tone 2 is automatically correlated with the presence of a glottal stop (ʔ);
(3) Tone 3 is automatically correlated with fully long vowels (Vː);
(4) Only Tone 4 in South Imbat dialects, with a falling contour (V̂), is fully phonologized as such. North Imbat dialects usually accompany this contour with additional vowel length and a reduced ǝ at the end of the word (cf. South Imbat tîɣ 'snake' vs. North Imbat tîːɣǝ id.), but, since the South Imbat form does not have this ǝ, it is important to mark the tone explicitly.

Tones 5 and 6 may be interpreted as different types of stress in a bisyllabic word: tone 5 = stress on the first syllable, tone 6 = stress on the second syllable. We mark stress position according to the data in [Werner 2002] and list the numeric tonal notation for the corresponding entries in [Werner 1977].
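
The relation between the numeric tones and the features marked in the main entries can be illustrated with a short sketch. It assumes the Ket (South Imbat) correlates listed above; the function names are hypothetical, and the sketch handles only a single onset-vowel-coda syllable rather than real entries.

```python
import unicodedata

# Segmental correlates of tones 1 and 3 as described above (Ket values).
LENGTH_MARK = {1: "\u02d1", 3: "\u02d0"}  # semi-long, fully long


def mark_monosyllable(onset: str, vowel: str, coda: str, tone: int) -> str:
    """Write a monosyllable with the feature that accompanies its tone."""
    if tone in LENGTH_MARK:
        nucleus = vowel + LENGTH_MARK[tone]
    elif tone == 2:
        nucleus = vowel + "\u0294"  # glottal stop after the vowel
    elif tone == 4:
        nucleus = vowel + "\u0302"  # combining circumflex: falling contour
    else:
        raise ValueError("tones 5 and 6 apply to bisyllabic forms")
    return unicodedata.normalize("NFC", onset + nucleus + coda)


def stressed_syllable(tone: int) -> int:
    """Tones 5 and 6 read as stress on the first or second syllable."""
    return {5: 1, 6: 2}[tone]


print(mark_monosyllable("t", "i", "ɣ", 4))  # South Imbat 'snake', cited above
```

Only Tone 4 requires a dedicated diacritic here; for Tones 1 to 3 the length mark or glottal stop already carries the information, which is the sense in which the numeric notation is superfluous.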

II. Yugh.

II.1. General.

The default source for data on the now extinct Yugh (= Sym) dialect consists of fieldnotes collected by H. Werner and subsequently published in numerous sources, chief among them [Werner 1977, 2002, 2011]. Apart from that, Yugh data are consistently quoted from [Castrén 1858] where they are explicitly marked as such or betray phonetic peculiarities, usually archaic, that are specifically characteristic of the Yugh dialect (such as f instead of Imbat h, r instead of Imbat l, etc.).

Most of the grammatical and other types of notes are the same as for Ket proper (Imbat), to which Yugh is very closely related.

II.2. Transliteration.

Transliteration rules are mostly the same as for Ket (see above). Minor additions are as follows:

UTS Werner 2002/2011 Werner 1977 Castrén 1858
f f ф f
χ χ х k`
č č ч t with stroke
dj д' d with stroke

Castrén's transcriptions of k` and stroked t actually surmise phonetic values of q and , which may have been the pronunciation norm in the XIXth century; in Werner's transcriptions, these sounds consistently correspond to χ and č.

Tones. The basic tonal system of Yugh is the same as in Ket, but the phonetic realization of particular tones may be slightly different. Namely, Tone 1 is correlated with short vowels (instead of semi-long in Ket); Tone 4 is correlated with long breathy vowels (as in some, but not all, subdialects of Ket); Tones 2 and 3 are essentially the same as in Ket.

Additionally, Yugh distinguishes between three degrees of vowel length (short, semi-long, long); cf. the difference between 'fish' (semi-long) and 'eye' (short) on the wordlist.

I-IIa. Notes on Common Ket-Yugh.

This field contains the intermediate reconstruction for Ket-Yugh. The forms are taken either from [S. Starostin 1995] (where the reconstructions, distribution-wise, are exclusively Ket-Yugh rather than Proto-Yeniseian) or constructed by G. Starostin based on S. Starostin's system of correspondences, with minor modifications.

III. Kott.

III.1. General.

The default source for data on the long-extinct Kott language is [Castrén 1858], a source that is fairly reliable, although hardly free of occasional phonetic and semantic inaccuracies.

Small selections of Kott data have also been recorded in earlier sources; all of them are summarized in [Verner 1990] and, where necessary, quoted in transliterated forms in the Notes section. These are marked as follows: (M.) = lexical data from G. F. Miller's records (collected in 1731); (Dict.) = lexical data from the anonymous "Dictionary of Five Arin Lects", supposedly from the mid-XVIIIth century; (Pal.) = data from P. S. Pallas' late-XVIIIth century collections, usually derived from (Dict.); (Kl.) = data from J. Klaproth's "Asia Polyglotta", for the most part, also derived from (Dict.); (Kh.) = XVIIIth century archival data, discovered and published by Ye. Khelimsky in 1986.
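
For anyone processing the Notes section automatically, the source tags listed above could be expanded along these lines. This is a sketch only: the note format and the function name are assumptions, and "FORM" stands in for a real transliterated form.

```python
import re

# Abbreviations exactly as listed above.
KOTT_SOURCES = {
    "M.": "G. F. Miller's records (1731)",
    "Dict.": 'anonymous "Dictionary of Five Arin Lects"',
    "Pal.": "P. S. Pallas' collections",
    "Kl.": 'J. Klaproth\'s "Asia Polyglotta"',
    "Kh.": "archival data published by Ye. Khelimsky (1986)",
}

TAG = re.compile(r"\((M\.|Dict\.|Pal\.|Kl\.|Kh\.)\)")


def sources_in_note(note: str) -> list[str]:
    """Return the expanded source names for every tag found in a note."""
    return [KOTT_SOURCES[tag] for tag in TAG.findall(note)]


# Hypothetical note text; "FORM" is a placeholder, not a Kott word.
print(sources_in_note("FORM (M.), FORM (Pal.)"))
```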

III.2. Transliteration.

Castrén's Kott data have been recorded according to the same principles as Ket-Yugh data; UTS transliteration issues are, therefore, mostly the same as already specified for Ket-Yugh. Data from earlier sources have already been retransliterated into "conventional" Latinized notation in [Verner 1990] and are, for the most part, left unchanged in the database (with the exception of standard UTS conventions, e. g. changing j to y).

IV. Arin; Pumpokol.

IV.1. General.

Both the Arin and Pumpokol languages, unfortunately, became extinct before proper scholarly work, at least on M. Castrén's level of professionalism, could be done on either of them. Most of the available sources, dating from exploratory work performed in the XVIIIth century, were carefully assembled, transliterated, and reprinted by A. P. Dulzon in [Dulzon 1961], which remains the major source of Arin and Pumpokol data. The sources compiled in this work are the same ones that have already been listed above in the section on Kott (M., Dict., Pal., Kl.). The only notable source to be added to these is a set of records of Arin made in 1735 by A. Loskutov, found and published by Ye. Khelimsky in 1986 and later reprinted in [Werner 2002] (Kh.).

As may be expected, most of the lexical data were recorded with relatively poor transcriptional quality, and the semantic accuracy of the transcribed words is also frequently open to doubt. In addition, the sources on Pumpokol are heavily flawed by regularly mixing "proper" Pumpokol words with words that, in reality, represent one of the Yugh dialects - this can be very easily established through a large number of "doublet" forms, where one of the two members of the "doublet" coincides with or is very close phonetically to the corresponding Yugh word. Most of these suspicious cases have been filtered out in the lexicostatistical list, but the status of a small handful of entries is still unclear. Nevertheless, it has been possible to fill in almost 60 positions in the Pumpokol list and close to 80 positions in the Arin list, which enables us to make important classificatory conclusions based on these results. (For Arin, it may be assumed that Loskutov's and Miller et al.'s data come from more or less the same dialect, with minor variations possibly reflecting the inaccuracy of data collectors.)

IV.2. Transliteration.

Transliteration from A. Dulzon's Cyrillic-based system into the Latin-based UTS system generally follows the same straightforward principles as adopted by H. Werner in [Werner 2002] and hardly needs detailed explanation. The only non-trivial convention employed by Dulzon is to mark velar k as Cyrillic к and uvular q as Latin k. An entirely different question is how well XVIIIth century transcriptions actually convey all the phonological oppositions of Arin and Pumpokol; for a detailed discussion of the matter, see [Dulzon 1961].
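
The one non-trivial convention mentioned above can be sketched as a character mapping. The sketch is deliberately partial: it applies only the velar/uvular rule, the function name is hypothetical, and uppercase letters and all other Cyrillic characters are left untouched rather than transliterated.

```python
# Dulzon's convention as stated above: Cyrillic "к" = velar k, Latin "k" = uvular q.
DULZON_TO_UTS = str.maketrans({"\u043a": "k", "k": "q"})


def velar_uvular_to_uts(text: str) -> str:
    """Apply only the velar/uvular rule; all other characters pass through."""
    return text.translate(DULZON_TO_UTS)


# Hypothetical two-character string: Cyrillic к followed by Latin k.
print(velar_uvular_to_uts("\u043ak"))
```

A single-pass translation table is used so that a Cyrillic к, once rewritten as Latin k, is not rewritten a second time as q.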

V. Proto-Yeniseian.

V.1. General.

The first comprehensive attempt at a systematic reconstruction of the Proto-Yeniseian phonological system was published by Sergei A. Starostin in 1982 (Sergei Starostin. Prayeniseyskaya rekonstrukciya i vneshniye sv'azi yeniseyskix yazykov /Proto-Yeniseian reconstruction and the external relations of Yeniseian languages/, in: Ketskiy sbornik. Antropologiya, etnografiya, mifologiya, lingvistika /The Ket Volume. Anthropology, ethnography, mythology, linguistics/, Leningrad, Nauka publishers, pp. 144-237.) A decade later, it was followed by a compact comparative-etymological dictionary of the Yeniseian family [S. Starostin 1995], which featured very minor "cosmetic" changes to the reconstruction. All of the Proto-Yeniseian etymologies were also computerized in the StarLing database format [YED]; the database was significantly expanded and updated by S. Starostin around 2003-2004, after the publication of H. Werner's comparative dictionary.

Alternate variants of the Proto-Yeniseian reconstruction have been offered by H. Werner and E. Vajda, although neither of the two specialists has published a separate, sufficiently detailed description. H. Werner's reconstructions for multiple Proto-Yeniseian lexical items have, however, been published in [Werner 2002]: many of them are significantly different from S. Starostin's, and have often been criticized by the latter in his 2003-2004 notes in [YED].

The present attempt at the reconstruction of a Swadesh wordlist for Proto-Yeniseian takes S. Starostin's reconstruction as its starting point; however, Werner's alterations to the reconstructions are considered on a regular basis, and some modifications to the etymologies have also been suggested by G. Starostin (all such modifications are stated and justified in the notes section).

The phonetic correspondences between Ket-Yugh, Kott, Arin, and Pumpokol are relatively complex; for a detailed explanation, the user should probably refer to [Starostin 1982]. In this introductory section, however, it is possible to summarize the major correspondences in a short table.

Proto-Yeniseian Ket Yugh Kott Arin Pumpokol
*p h- / -0- / -p f / -p f- ~ pʰ- / p p- ~ pʰ- ~ f- / p pf- ~ f- ~ p- / p
*b b- / -b- / -p b / -p p p p
*m m m m m m
*w b- / -0- / -w b- / -0- / -w b- / -p- / -w b / -w w ~ m
*t t t tʰ- / t t / -d- ~ -t- t / -d- ~ -t-
*d d- / -d- ~ -r- / -t d / -t t- / r t- / -0 ~ -y d-
*n n n n n n
*r lʸ ~ l l ~ r r r r
*l lʸ ~ l l l l ~ r l
*c t č- / tʸ h- ~ t- / t k- ~ t- / t x- ~ c- / t
d- / -d- ~ -r- / -t d / -t dʸ- / y k- / y k- / d
*r1 lʸ ~ l r l l l
*s sʸ ~ s s š- / -č- / -š ~ -t s ~ š ~ č / -s ~ -š ~ -t t- ~ c- ~ s- / -t ~ -č ~ -š
t č- / tʸ š- / -č- s- ~ š- ~ č- ~ k- x- ~ k- / -č
d- / -d- ~ -r- / -t dʸ- / -tʸ č- / y s- ~ š- / y č- / -y- ~ -dʸ-
n n n ɲ ~ n
*rʸ l- / r dʸ- / y t- ~ d- / l l
*y 0- / y 0- / y dʸ- / y 0- ~ y- / y d- ~ 0- / -y
dʸ- / l r ~ l l
*k k- / -ɣ- k / -g- h- / k ~ g ~ x k- ~ x- / g ~ y ~ 0 k- ~ x- / -0- ~ -y- / -t ~ -č
*g k- / -g- / -ŋ k- / -g- k- / -k- ~ -g- k- / -g- -k- ~ -g-
ŋ ŋ ŋ ~ n ŋ ~ g ŋ
*x 0- / -ɣ- 0- / -g- / -k 0- / -y- ~ -0- 0- ~ k- / 0 0- ~ h-
*q q- / -ʁ- ~ -0- x- / -x ~ -q x- ~ kʰ- / k ~ g ~ x k- ~ q- / 0 k- ~ x- / -k
q- / 0 x- / 0 k- / k ~ g ~ x k- ~ q- / -0- ~ -g- x- / -k- / -0
q- / -0- / -k x- / -0- / -k h- / -0- ~ -ʔ- ~ -y- / -k ~ -g ~ -x k- ~ q- / -g- ~ -0- ~ -y- / -0 k- ~ x- / g ~ k
*h 0- 0- h- 0- ?
*i i i i ~ e i (a, e) i (a, e)
*e e ~ ɛ e ~ ɛ e i (a, e) a (e, i, u)
a a e a (i) a (ɨ, o, e)
ɨ ɨ ɨ e (i, a, u, o) i (ɨ, o, a)
ɜ ɜ i ~ a ~ e a (u, o, i, e) a (o, i, ɨ, u)
*u u ~ ɨ u ~ ɨ u u (o, i, e) u (o)
*o o ~ ɔ ~ u o ~ ɔ o o (u, e, a) o (u, e)
o ~ ɔ o ~ ɔ a o (a, u) a

Additionally, Ket-Yugh prosody, well studied and described by H. Werner (accurate data on the prosodic features of other, now extinct, Yeniseian languages are non-existent) is projected by S. Starostin onto the Proto-Yeniseian level as follows:

Proto-Yeniseian Ket Yugh
*CV̆ CVˑ1 CV1
*CV̆ʔ CVʔ2 CVʔ2
*CV̆ʔC CVʔC2 CVʔC2
*CVːC CVːCǝ4 CVːʰC4
*CV̆Ce CVˑCǝ1 CVˑC1
*CV̆ʔCV CVʔC(ǝ)2 CVʔC2
*CVːCV CVːCǝ4 CVːʰC4


(1) In the table, the slash sign ( / ) separates positional reflexes; the tilde sign ( ~ ) separates "fluctuating" reflexes that are sometimes conditioned by phonetic context, sometimes by the dialectal affiliation of the form (especially in the case of Ket), and sometimes represent conflicting orthographies in old, phonetically inaccurate sources (especially true for Arin and Pumpokol, less so for Kott). More detailed information on all this may be found in S. Starostin's paper from 1982.

(2) S. Starostin's reconstruction model may be defined as "maximalist", assigning as many series of phonetic correspondences as possible to individual Proto-Yeniseian phonemes: this particularly concerns the affricates (*c, , , ), the uvular series (, ), the velar fricative *x, and some of the resonants (*r1, *rʸ), none of which are found as autonomous phonemes in attested languages. In comparison, H. Werner's reconstruction is more "cautious", trying to stick to actually attested phonetic inventories when reconstructing Proto-Yeniseian forms. Nevertheless, the extra series of correspondences, described by S. Starostin, do exist, and most of them cannot be easily explained away as unmotivated splits of reflexation. This does not necessarily mean that S. Starostin's system should be regarded as completely finalized and "waterproof", but it does mean that the phonological oppositions set up therein should be respected until one can come up with a suitable explanation for all the "extra" splitting of reflexes.

(3) In our reconstruction of the Swadesh wordlist, we have eliminated only one consonant from the Starostin model: the Proto-Yeniseian uvular fricative *χ. In word-initial position, it is only distinguished from *q- because of a questionable reflex splitting in Kott (*q- > x- ~ kʰ-, but *χ- > h-), which may at least partially be explained by contextual conditioning; in word-medial position, it is practically indistinguishable from , and in word-final position, from *k. On the other hand, the velar fricative *x is still necessary in order to account for Arin word-initial k- in such cases as Ket uːsʸǝ, Kott uːča = Arin kus 'birch tree', Ket ɨn, Kott iːna = Arin kina 'two', etc.

(4) Multiple questions with individual etymologies still remain unresolved - in particular, vocalism of the first and especially the second syllable still remains reconstructed very approximately. Unfortunately, this is at least partially caused by very poor transcription quality in the early sources on Arin and Pumpokol. In the table above, the most frequent ("default") vowel reflexes for Arin and Pumpokol are listed at the beginning, then all the alternate (statistically less frequent) representations are listed in parentheses; it is practically impossible to determine which of them represent real phonetic developments and which ones are simply the result of inadequate transcription.
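
The slash and tilde notation explained in note (1) can be read mechanically, as in the following sketch. It rests on assumptions not stated in the text: that hyphens mark word position (x- initial, -x- medial, -x final), that "0" stands for a zero reflex, and that the position of a slot can be taken from its first variant. The function name is hypothetical.

```python
def parse_reflexes(cell: str) -> list[tuple[str, list[str]]]:
    """Split one table cell into (position, variants) pairs."""
    parsed = []
    for slot in cell.split(" / "):
        variants = [v.strip() for v in slot.split("~")]
        first = variants[0]
        if first.startswith("-") and first.endswith("-"):
            position = "medial"
        elif first.endswith("-"):
            position = "initial"
        elif first.startswith("-"):
            position = "final"
        else:
            position = "unspecified"
        parsed.append((position, [v.strip("-") for v in variants]))
    return parsed


# Ket reflexes of *p, copied from the table.
print(parse_reflexes("h- / -0- / -p"))
```

The parse recovers only the layout of a cell; it cannot tell whether a fluctuating reflex is contextual, dialectal, or an artifact of transcription, which is the open question raised in notes (1) and (4).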

Database compiled and annotated by: G. Starostin (last update: July 2013).