Annotated Swadesh wordlists for the Karen group (Sino-Tibetan family).

Languages included: Bwe Karen [krn-bwe]; Geba Karen [krn-geb]; Eastern Kayah Li [krn-kli]; Western Kayah Li [krn-wkl]; Kayah Monu [krn-mnu]; Brek Kayaw [krn-bre]; Yintale [krn-yin]; Kayan, Pekon [krn-pek]; Kayan, Lahta [krn-lah]; Yinbaw [krn-ynb].



Manson Ms. = Manson, Ken. Lexical data on various Karen languages. Ms. // Data from fieldwork conducted by Ken Manson on various Karen languages, including Pekon Kayan and Yinbaw. Apart from small samples in published sources, the data are also partially available as part of the STEDT (Sino-Tibetan Etymological Dictionary and Thesaurus) databases. The latter are currently available publicly at

I. Bwe Karen.

Henderson 1997 = Henderson, Eugenie J. A. 1997. Bwe Karen dictionary. With texts and English-Karen word list. Anna J. Allott (ed.). London: School of Oriental and African Studies. // An extensive dictionary of Bwe Karen, with most words accompanied by examples of usage, as well as a large collection of illustrative texts published in a separate volume. Essentially the only source of linguistic knowledge on this language so far, but a nearly exhaustive one from a lexicographical point of view.

II. Geba Karen.

Shee 2008 = Shee, Naw Hsar. 2008. A descriptive grammar of Geba Karen. M.A. Thesis: Payap University, Chiang Mai. // A detailed grammatical description of the Geba Karen language, including examples of texts collected by the author and a list of basic words.

III. Kayah Li (Eastern, Western).

Kirkland & Dawkins 2007 = Kirkland, Cortney; Dawkins, Erin. 2007. A Sociolinguistic Survey of Eastern Kayah Li in Thailand. Chiang Mai: Payap University. // This sociolinguistic report on several varieties of Kayah Li includes a 155-item comparative wordlist elicited by the authors from two dialects of Eastern Kayah Li (Huai Phung and Huang Chai Kham villages), as well as a separate wordlist on Eastern Kayah Li and another on Western Kayah Li, taken from data collected by Fraser Bennett in 1991-1992.

Solnit 1997 = Solnit, David B. 1997. Eastern Kayah Li. University of Hawai'i Press. // A detailed grammatical description of the Eastern Kayah Li language. Includes some samples of texts collected by the author and a brief vocabulary.

Bennett Ms. = Bennett, Fraser. Additional data on Eastern and Western Kayah Li, collected by Fraser Bennett in 1991-1992 and available in manuscript form. Much, but not all, of it was officially published in [Kirkland & Dawkins 2007].

IV. Kayah Monu; Brek Kayaw; Yintale.

Myar 2004 = Myar Doo Myar Reh. 2004. A phonological comparison of selected Karenic language varieties of Kayah State. M.A. Thesis: Payap University, Chiang Mai. // A detailed phonetic description and comparison of several varieties of Karen, including Kayah, Kayah Monu (Monumanaw), Brek (Bre) Kayaw, and Yintale. Includes 400+ item lexical wordlists for each of the described varieties.

Wai 2013 = Wai Lin Aung. 2013. A descriptive grammar of Kayah Monu. M.A. Thesis: Payap University, Chiang Mai. // A detailed grammatical description of Kayah Monu; although not lexically rich, it can nevertheless be used as a control source in some cases.

V. Kayan, Pekon.

Manson 2007 = Manson, Ken. 2007. Pekon Kayan phonology. Chiang Mai: Payap University. // Detailed phonological description of Pekon Kayan, well illustrated by examples.

VI. Kayan, Lahta.

Ywar 2013 = Ywar, Naw Hsa Eh. 2013. A Grammar of Kayan Lahta. M.A. Thesis: Payap University, Chiang Mai. // Detailed phonetic and grammatical description of Lahta Kayan, well illustrated by examples, but without a glossary.


I. Bwe Karen.

I.1. General.

Bwe Karen is a Central Karen variety spoken by approximately 17,000 people in the states of Kayin and Kayah. All major sources on this language have been created by Prof. Eugénie J. A. Henderson, whose large dictionary of the language [Henderson 1997], based on the Western dialect of Bwe Karen, serves as the main source for the wordlist.

I.2. Transcription.

E. Henderson's transcription largely follows basic IPA standards. The following symbols were subject to recoding in the UTS system:

(a) palato-alveolar affricates c, ch, j have been recoded as ɕ, ɕʰ, ʓ respectively;
(b) aspirated plosives and affricates ph, th, ch, kh have been recoded as pʰ, tʰ, ɕʰ, kʰ respectively;
(c) palato-alveolar ʃ has been recoded as ʆ;
(d) "velar unrounded semivowel", marked as ʀ, is apparently the velar approximant ɰ and has been marked as such (only in the word ɰū 'snake');
(e) Bwe Karen has three tones (high, mid, and low), of which high and low are consistently marked in the dictionary in the regular manner (V́ and V̀), and the middle tone is left unmarked; we recode it as V̄.
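
The recodings in (a)-(e) can be sketched in Python as follows. This is a minimal illustration and not the procedure actually used to compile the database: the function name, the vowel inventory in VOWELS, and the restriction to one syllable per call are all assumptions made for the example.

```python
import re
import unicodedata

# Segmental recodings from items (a)-(d).
SEGMENTS = {
    "ch": "ɕʰ", "ph": "pʰ", "th": "tʰ", "kh": "kʰ",
    "c": "ɕ", "j": "ʓ", "ʃ": "ʆ", "ʀ": "ɰ",
}
# Longer source symbols are tried first, so "ch" is never split into "c" + "h".
SEGMENT_RE = re.compile(
    "|".join(sorted(map(re.escape, SEGMENTS), key=len, reverse=True))
)

ACUTE, GRAVE, MACRON = "\u0301", "\u0300", "\u0304"
# Hypothetical vowel inventory, used only to locate the tone-bearing vowel.
VOWELS = set("aeiouɛɔəɯ")


def recode_bwe_syllable(syllable: str) -> str:
    """Recode a single syllable from Henderson's notation to UTS."""
    # Decompose so that tone diacritics become separate combining characters.
    text = unicodedata.normalize("NFD", syllable)
    text = SEGMENT_RE.sub(lambda m: SEGMENTS[m.group(0)], text)
    # Item (e): a syllable without a high or low tone mark carries the mid
    # tone, which is made explicit with a macron on the first vowel.
    if ACUTE not in text and GRAVE not in text:
        for i, ch in enumerate(text):
            if ch in VOWELS:
                text = text[: i + 1] + MACRON + text[i + 1 :]
                break
    return unicodedata.normalize("NFC", text)


print(recode_bwe_syllable("ʀu"))  # ɰū
```

Working on the decomposed (NFD) form keeps the tone check independent of whether the source text stores accented vowels as single precomposed characters or as vowel plus combining mark.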

II. Geba Karen.

II.1. General.

Geba Karen, which is quite close to Bwe Karen, has so far been properly described in only one source, the descriptive grammar [Shee 2008]; fortunately, the grammar also includes a short wordlist and may be used for lexicostatistical purposes.

II.2. Transcription.

Naw Hsar Shee's transcription of Geba Karen is generally based on the IPA standard and does not need a lot of recoding, except for some usual IPA > UTS conversions (post-alveolar fricatives and affricates ʃ, tʃʰ, dʒ > UTS š, čʰ, ǯ).

III. Kayah Li.

III.2. Transcription.

The phonological system of Eastern Kayah Li as described by David Solnit is relatively simple, with a basic opposition between voiceless aspirated, voiceless non-aspirated, and voiced stops and affricates. Specific comments on transcription and transliteration are as follows:

(a) Solnit's c and ch = UTS ɕ and ɕʰ, respectively (alveo-palatal affricates);
(b) Solnit's /j/ is described as varying "between standard palatal glide and voiced palatal fricative, also occasionally appearing as a slightly prenasalized alveopalatal affricate [ndʑ], especially in the Low Falling tone". Since the conditions for this variation are not mentioned, we consistently re-transcribe it as UTS ʓ (voiced correlate to the voiceless phonemes ɕ and ɕʰ), except for those cases where it is encountered in word-medial position (i. e. initial clusters such as pj-, bj-, etc. = UTS py-, by-, etc.);
(c) Solnit's velar nasal ŋ is described as having a palatal allophone ɲ before front vowels and the glide /j/. Since this is not reflected in his orthography, we retain the phonemic transcription of ŋ in all cases;
(d) Solnit's r is described as a "retroflex approximant", but "an alveolar trill" "in emphatic speech". We retain the simplified transcription of r.
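
The segmental rules in (a)-(d), in particular the position-dependent treatment of j in (b), can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the input is taken to be a single syllable, and any j that is not syllable-initial is assumed to be the second member of an initial cluster. The test strings are schematic, not attested Kayah Li words.

```python
def recode_solnit_syllable(syllable: str) -> str:
    """Recode the consonants of one syllable from Solnit's notation to UTS."""
    # (b) Syllable-initial j is the phoneme that is recoded as UTS ʓ ...
    if syllable.startswith("j"):
        syllable = "ʓ" + syllable[1:]
    # ... whereas j after another consonant (pj-, bj-, etc.) becomes y.
    syllable = syllable.replace("j", "y")
    # (a) The aspirated affricate is handled first so that "ch" is not
    # misread as plain "c" followed by "h".
    syllable = syllable.replace("ch", "ɕʰ").replace("c", "ɕ")
    # (c), (d): ŋ and r are retained as they are.
    return syllable


for form in ("ja", "pja", "cha", "ca", "ŋi"):
    print(form, ">", recode_solnit_syllable(form))
```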

The tonal system of Kayah Li consists of five tonemes: mid (33), low level (11), low falling (21), high (55), high falling (52). Our recoding correlates with Solnit's notation as follows:

Toneme Solnit's transcription UTS recoding
Mid (33)
Low level (11) V [unmarked]
Low falling (21) V̰̀
High (55)
High falling (52) V52

Note that the low falling tone is described as "passing... into a brief stretch of creaky voice that shades immediately into glottal stop"; this allows us to distinguish it notationally from low level tone with the addition of a subscript tilde for "creaky" articulation, rather than introduce an inconvenient additional tonal diacritic. The high-falling tone is met very rarely and is generally restricted to grammatical morphemes, which is why it is not encountered in the basic lexicon at all.

The transcription system in [Kirkland & Dawkins 2007], as well as the system used by Fraser Bennett, employs standard IPA conventions.

IV. Kayah Monu; Brek Kayaw; Yintale.

IV.1. General.

Our main source on these Karenic varieties is the comparative study [Myar 2004], containing enough lexical material to construct adequately filled out Swadesh wordlists. Of these, Kayah Monu (= Manu, Manumanaw) and Brek Kayaw (= Bre Kayaw, Kayaw) are usually described as belonging to the Central branch; Yintale is taken in some classifications (e. g. Ken Manson's) to belong to the Northern branch, but lexicostatistically also aligns itself with Central languages, despite some serious phonetic archaisms (e. g. preservation of final ) that seem to be absent in the rest of the Central group.

IV.2. Transcription.

The transcription system used in [Myar 2004], our main source on lexical data for Kayah Monu, Brek Kayaw, and Yintale, is largely IPA-compatible, with the usual "cosmetic" changes between IPA and UTS: post-alveolar ʃ = UTS š, palatal ç = UTS ʆ, palatal approximant j = UTS y, affricate dʒ = UTS ǯ.

Tones are represented with tone letters in the source, recoded to UTS diacritics. The typical four-way opposition (Kayah Monu, Brek Kayaw) is high (marked as ), high-mid (marked as ), mid (marked as ), and low (marked as ). Additionally, Yintale also has contour tones (rising , falling ), and some of the vowels in some of the dialects may also be characterized by additional breathiness (, recoded to UTS ).

V. Kayan, Pekon.

V.1. General.

Our main data on the Kayan language come from fieldwork by Ken Manson [Manson 2007; Manson Ms.] and represent the Pekon variety (considered the standard, prestigious dialect of the language, which is spoken altogether by over 130,000 people). Dialectal variation cannot be measured in lexicostatistical terms, given how little data is currently available on other forms of Kayan.

V.2. Transcription.

Manson's data are strictly IPA-based and only require the usual cosmetic changes when transcribed to UTS, namely:

(a) palatal affricates c, cʰ, ɟ > UTS ɕ, ɕʰ, ʓ respectively;

(b) palatal glide j > UTS y;

(c) a system of six tones is postulated, which can also be phonologically interpreted as four tones plus an additional parameter of breathiness. The correlation between phonetic registers, Manson's transcription, and UTS is as follows:

Registers Manson's transcription UTS
[54] á̤ áʰ
[22] à̤ àʰ
[33] ā ā
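
The conversions in (a)-(c) can be sketched in Python as follows. This is a hedged illustration only: the function name is hypothetical, and breathiness is assumed to be encoded in the source data as the combining diaeresis below (U+0324), as in the rows shown above.

```python
import re
import unicodedata

BREATHY = "\u0324"  # combining diaeresis below (assumed encoding of breathiness)
# A base character followed by one or more combining diacritics.
MARKED = re.compile("([^\u0300-\u036f])([\u0300-\u036f]+)")


def recode_pekon(form: str) -> str:
    """Recode a Pekon Kayan form from Manson's IPA notation to UTS."""
    # Decompose so that tone and breathiness marks can be handled separately.
    text = unicodedata.normalize("NFD", form)

    def move_breathiness(match):
        base, marks = match.group(1), match.group(2)
        if BREATHY in marks:
            # (c) Drop the under-diacritic and write breathiness as a
            # following ʰ, keeping the tone mark on the vowel.
            return base + marks.replace(BREATHY, "") + "ʰ"
        return match.group(0)

    text = MARKED.sub(move_breathiness, text)
    # (a) Palatal affricates; cʰ becomes ɕʰ because ʰ is left untouched.
    text = text.replace("ɟ", "ʓ").replace("c", "ɕ")
    # (b) Palatal glide.
    text = text.replace("j", "y")
    return unicodedata.normalize("NFC", text)


print(recode_pekon("a\u0324\u0301"))  # áʰ
```

Because the input is normalized first, the result does not depend on the order in which the tone mark and the breathiness mark were typed.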

VI. Kayan, Lahta.

VI.1. General.

The only source on Kayan Lahta, a variety of Karen spoken in approximately 40 villages in the Shan and Kayah states of Burma, is [Ywar 2013], a description of the phonology and grammar of the language that is well illustrated by examples (including phrases that give a better perspective on the semantics of the analyzed items) but contains no separate glossary. Unfortunately, this leaves a large number of gaps in the current version of the wordlist (approximately a quarter of the total number of items), which makes the data somewhat unreliable for lexicostatistical classification. Nevertheless, we have decided to include Kayan Lahta, pending further publication of data, since the wordlist is at least useful for etymological research on Karen.

VI.2. Transcription.

Ywar's data are strictly IPA-based and require only the usual cosmetic changes when transcribed to UTS, namely:

(a) palatal fricative ç > UTS ʆ;

(b) palatal glide j > UTS y.

Ywar transcribes the diphthongs of Lahta Kayan with raised indexes. These are re-transcribed as follows: ei > ei̯, ou > ou̯, ai > ai̯.

Kayan Lahta has four tones, transcribed by Ywar with tone letters and re-transcribed in the UTS with diacritics: low (V̀), mid (V̄), high (V́) and high glottalic (V́ʔ).

VII. Yinbaw.

VII.1. General.

The only extensive collection of data on Yinbaw Karen, spoken by approximately 7,000 individuals in the Shan and Kayah states of Burma (to the southeast of the Lahta-speaking area), is found in the as yet unpublished fieldwork by Ken Manson [Manson Ms.], available to us in the form of Excel databases. A more or less complete Swadesh wordlist for Yinbaw has been extracted from that database.

VII.2. Transcription.

The transcription is essentially the same as for Pekon Kayan.

Database compiled and annotated by: George Starostin (last update: June 2017).