Annotated Swadesh wordlists for the Sinitic group (Sino-Tibetan family).

Languages included: Early Zhou Chinese [sin-ezh], Classical Chinese [sin-cch], Late Middle Chinese [sin-mch], Standard (Mandarin, Putonghua) Chinese [sin-pth], Jian'ou Chinese [sin-jou], Wenchang Hainanese [sin-wch].


I. Early Zhou Chinese

Schuessler 1987 = Schuessler, Axel. A Dictionary of Early Zhou Chinese. Honolulu: University of Hawaii Press. // Large dictionary of Chinese characters and words (includes the author's own version of Old Chinese phonological reconstruction) covering all known literary documents from the Early Zhou era (Shangshu, Shijing, Yijing, etc.) as well as epigraphic material from Early Zhou bronze inscriptions.

II-IV. Classical Chinese; Late Middle Chinese; Standard Chinese

HYDCD = Hànyǔ dà cídiǎn [漢語大詞典]. Ed. by Luo Zhufeng (羅竹風) et al. 13 vols. Shanghai: Cishu chubanshe. // Monumental dictionary of the Chinese language, covering all stages of the literary and vernacular language; each entry and each meaning is usually accompanied by references to the source of first attestation, so that the data can be stratified chronologically.

Additional source for Standard Chinese

DEHCD 1985 = Dà É-Hàn cídiǎn [大俄漢詞典]. Beijing: Shangwu yinshuguan. // A huge Russian-Chinese dictionary (approx. 157,000 entries) that provides the principal Standard Chinese equivalents of Russian vocabulary, as used in both colloquial and official speech.

V. Min dialects: Jian'ou Chinese

Li 1998 = Jiàn'ōu fāngyán cídiǎn [建甌方言詞典]. Ed. by Li Rong (李榮) et al. Nanjing: Jiangsu jiaoyu chubanshe. // Large dictionary of the Jian'ou vernacular, focusing primarily on lexicon that is distinct from Standard Chinese.
Huang 1958 = Huang Diancheng. Jiàn'ōu fāngyán chūtán [建甌方言初探]. // A brief phonetic description of the Jian'ou dialect, accompanied by selected illustrative lexicon and text examples.

VI. Min dialects: Wenchang Hainanese

Hashimoto 1976 = Hashimoto, Mantaro J. The Wênch'ang Dialect of the Hainan Language. In: Journal of Asian and African Studies, Vol. 11, pp. 65-86. // Short paper with basic phonetic and lexical information on the Wenchang dialect of Hainan, including a complete 200-item Swadesh list.


1. General.

General note on historical varieties of Chinese: Quotations from original textual sources for Old and Middle Chinese indicate the name of the document and include references to chapter (poem) and, if available, sub-chapter (verse) number. All quotations from Old Chinese literary monuments may be easily verified through, the largest and most convenient source of ancient Chinese texts on the Web. For an electronic version of the text of Línjì lù, see, for instance,

I. Early Zhou Chinese

"Early Zhou Chinese" is understood as the language of the most archaic parts of the Five Classics, most notably the Shījīng and the Shàngshū (Shūjīng), as well as that of the epigraphic monuments of Early Zhou, with more or less the same limits on relevant sources as those imposed in [Schuessler 1987]. Dialectal differences are acknowledged for these texts, but it has so far been impossible to confirm their relevance for the Swadesh wordlist.

II. Classical Chinese

"Classical Chinese" is understood as the language of literary texts spanning approximately the 5th to the 3rd centuries B.C. (the "Warring States" era). Since dialectal differences here are generally more pronounced than in the case of Early Zhou Chinese, the primary source material for the wordlist is defined as "Early Confucian texts", i. e. the Lùnyǔ (authorship generally attributed to Confucius' disciples) and the Mèngzǐ (authorship generally attributed to Mencius' disciples), which are usually ascribed to the same dialect (although the language of the Lùnyǔ is slightly more archaic, or archaicized, than that of the Mèngzǐ). Most of the words are, therefore, accompanied by quotations from these texts to confirm their presence. If the required word is not attested in Early Confucian texts, data from other texts of the same era may be brought in, provided that (a) there is only one basic equivalent for the term throughout all these texts and (b) its basic usage coincides with either earlier (Early Zhou) or later (e. g. modern Chinese) data, confirming the absence of specific replacements.

III. Late Middle Chinese

"Late Middle Chinese" is a problematic but, nevertheless, extremely important inclusion. Most of the literary texts written in the Middle Chinese period (approx. 6th to 12th centuries AD) are composed either in a form of wényán (Literary Old Chinese) or in a hybrid of wényán and the contemporary vernacular. In addition, the problem of the dialectal attribution of vernacular-based texts remains as relevant as it was for the Classical Chinese period. It is, therefore, virtually impossible to offer a "pure" Middle Chinese 100-wordlist that would at the same time (a) claim to represent a particular form of living speech and (b) be complete.

The database offers a compromise version: a wordlist based primarily on the analysis of a single, more or less uniform text, the Línjì lù (臨濟錄), "Records of (Master) Linji", generally dated to the end of the 9th or the beginning of the 10th century (i. e. "Late Middle Chinese") and clearly based on a vernacular dialect, as abundant grammatical and lexical evidence shows. Additional textual data were not consulted, in order to preserve "dialectal purity", because even the yǔlù genre of moralist/religious literature aimed at general listeners/readers was highly diverse in terms of linguistic form. Nevertheless, it is still possible to fill in more than 80 positions of the standard Swadesh list, based on evidence of varying quality. Almost every word is accompanied by one or more contexts, not all of which are diagnostic according to GLD standards. However, a "compromise" decision has been taken: if a particular word, encountered in a questionable context in the Línjì lù, is known to serve as the basic equivalent for the required Swadesh meaning both in earlier forms of Chinese (e. g. Classical) and in later forms (e. g. modern dialects), it is included in the list as a "reasonable" candidate for that position in vernacular Middle Chinese.

IV. Standard Chinese

"Standard Chinese" is understood here as the equivalent of Pǔtōnghuà [普通話], the official national standard of Modern Chinese. Although "Standard Chinese" is generally based on the present-day Běijīng dialect, the two lects do not completely coincide, since certain phonetic and lexical peculiarities of "pure" Běijīnghuà are not reflected in the national language. For lexicostatistical purposes, however, these differences are generally insignificant, so the 100-wordlist for "Standard Chinese" may be understood as representing the same dialect as "Běijīng Mandarin Chinese".

2. Transcription.

I-III. Reconstructed varieties (Early Zhou Chinese; Classical Chinese; Late Middle Chinese)

Phonological reconstructions for Early Zhou, Classical, and Middle Chinese are based on Sergei Starostin's version as originally published in: [Starostin, Sergei. Rekonstrukcija drevnekitajskoj fonologicheskoj sistemy [Reconstruction of the Phonological System of Old Chinese]. Moscow, 1989.] Particular reconstructions are transliterated into the UTS from S. Starostin's etymological database of Chinese characters (bigchina.dbf), available online at

IV. Standard Chinese

For Standard Chinese, the official pinyin (Latin transcription) equivalents of the words have not been included in the main field, so as not to clutter the entries. Textual examples in the notes are, however, reprinted in pinyin. In the conversion from pinyin to UTS, a mixed phonetico-phonological representation was chosen (i. e. some phonetic details, such as the non-syllabic character of glide medials or the fronting of a between medial i and word-final n, have been indicated, but not all the vocalic allophones of the spoken language have been marked). The general transliteration from pinyin to UTS is as follows:


Pinyin UTS
p- pʰ-
b- p-
f- f-
m- m-
w- w-
t- tʰ-
d- t-
n- n-
l- l-
c- cʰ-
z- c-
s- s-
ch- ʰ-
zh- -
sh- ʂ-
r- ɻ-
q- ɕʰ-
j- ɕ-
x- ʆ-
y- y-
k- kʰ-
g- k-
h- h-


Pinyin UTS
-a -a
-ai -ai
-an -an
-ang -aŋ
-ao -ao
-ei -ei
-en -en
-eng -ɤŋ
-i -i / -ɨ
-ia -i̯a
-ian -i̯än
-iang -i̯aŋ
-ie -i̯e
-in -in
-ing -iŋ
-iu -i̯ou
-o -o
-ong -uŋ
-ou -ou
-u -u
-ua -u̯a
-uai -u̯ai
-uan -u̯an
-uang -u̯aŋ
-ui -u̯ei
-un -u̯en
-uo -u̯o
-üe -ü̯e
-üan -ü̯an
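The correspondences above can be applied mechanically, one syllable at a time. Below is a minimal Python sketch, not part of the database tools: the function name `pinyin_to_uts` is hypothetical, the input is assumed to be a single lowercase toneless pinyin syllable, and the initial is found by longest match (so sh- wins over s-). ch- and zh-, finals not listed above (e.g. -e, -er), and pinyin's spelling of ü as u after j, q, x, y are not handled; such input is rejected rather than guessed. -uang is rendered -u̯aŋ, in line with -ang and -iang.

```python
# Hypothetical helper: transliterate one toneless pinyin syllable into UTS,
# following the initial and final correspondences listed above.

INITIALS = {
    "p": "pʰ", "b": "p", "f": "f", "m": "m", "w": "w",
    "t": "tʰ", "d": "t", "n": "n", "l": "l",
    "c": "cʰ", "z": "c", "s": "s", "sh": "ʂ", "r": "ɻ",
    "q": "ɕʰ", "j": "ɕ", "x": "ʆ", "y": "y",
    "k": "kʰ", "g": "k", "h": "h",
    # ch- and zh- are deliberately not handled in this sketch.
}

FINALS = {
    "a": "a", "ai": "ai", "an": "an", "ang": "aŋ", "ao": "ao",
    "ei": "ei", "en": "en", "eng": "ɤŋ", "i": "i",
    "ia": "i̯a", "ian": "i̯än", "iang": "i̯aŋ", "ie": "i̯e",
    "in": "in", "ing": "iŋ", "iu": "i̯ou", "o": "o", "ong": "uŋ",
    "ou": "ou", "u": "u", "ua": "u̯a", "uai": "u̯ai", "uan": "u̯an",
    "uang": "u̯aŋ", "ui": "u̯ei", "un": "u̯en", "uo": "u̯o",
    "üe": "ü̯e", "üan": "ü̯an",
}

# Pinyin -i stands for the apical vowel ɨ after these initials
# (zh- and ch- would belong here too).
APICAL_INITIALS = {"z", "c", "s", "sh", "r"}


def pinyin_to_uts(syllable: str) -> str:
    s = syllable.lower()
    if s.startswith(("zh", "ch")):
        raise ValueError(f"initial {s[:2]}- is not handled in this sketch")
    # Longest match first; if nothing matches, the syllable has a zero initial.
    for length in (2, 1):
        if s[:length] in INITIALS:
            initial, final = s[:length], s[length:]
            break
    else:
        initial, final = "", s
    if final == "i" and initial in APICAL_INITIALS:
        vowel = "ɨ"
    elif final in FINALS:
        vowel = FINALS[final]
    else:
        raise ValueError(f"final -{final} is not in the correspondence table")
    return INITIALS.get(initial, "") + vowel
```

For example, `pinyin_to_uts("zi")` returns "cɨ", the UTS form of the neutral-tone second syllable in ye4-cɨ 'leaf'.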

The special "post-terminal" -r, characteristic of Beijing speech, is transcribed as .

The four tones of Standard Chinese are marked as numbers rather than diacritics, since UTS tonal diacritics significantly differ from the standard tonal markings employed in pinyin and may therefore look quite confusing onscreen. The correlations between numeric notation, pinyin markings, and most common phonetic realization of the four tones in Standard Chinese are as follows:

Number Pinyin Tonal characteristics
1 ā High level (55)
2 á Mid-rising (35)
3 ǎ Dipping (21 / 214)
4 à High-falling (51)

In bisyllabic compounds where the second, unaccented syllable has no tone of its own ("neutral" tone), no tonal marking is given (e. g. ye4-cɨ 'leaf').
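Converting pinyin tone diacritics into this numeric notation amounts to removing the combining tone mark and appending its number. A minimal Python sketch, assuming hyphen-separated syllables as input (an unhyphenated word such as "Běijīng" would receive only one number); the names `split_tone` and `numeric_notation` are hypothetical. Unicode NFD decomposition separates each tone mark from its vowel, the diaeresis of ü is kept, and a syllable without a mark is treated as neutral tone and left unnumbered.

```python
import unicodedata
from typing import Optional, Tuple

# Combining diacritics used for the four pinyin tones -> tone number.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def split_tone(syllable: str) -> Tuple[str, Optional[int]]:
    """Return the syllable without its tone mark and the tone number (None = neutral)."""
    tone = None
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # base letters and the diaeresis of ü survive
    return unicodedata.normalize("NFC", "".join(kept)), tone


def numeric_notation(word: str) -> str:
    """Rewrite a hyphenated pinyin word with tone numbers; neutral syllables stay bare."""
    out = []
    for syllable in word.split("-"):
        base, tone = split_tone(syllable)
        out.append(base if tone is None else f"{base}{tone}")
    return "-".join(out)
```

For instance, `numeric_notation("yè-zi")` gives "ye4-zi"; converting the segments to UTS is a separate step.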

V. Dialectal data.

So as not to overcomplicate things, we prefer to keep this section reasonably brief. Most sources on Chinese dialects (a.k.a. "Sinitic languages") from the past 20-30 years are generally consistent in transcribing the data according to IPA standards, so that usually only the standard IPA vs. UTS discrepancies require fixing.

Below we list the approximate tonal correspondences for Min dialects, compared with the standard Middle Chinese system.

Middle Chinese Jian'ou [Huang 1958] Jian'ou [Li 1998] Wenchang [Hashimoto 1976]
1 Dark level (陰平) 1 (high-falling: 53) 1 (high-dipping: 54) 1 (mid-rising)
2 Light level (陽平) 5 (mid-level: 33) 3 (mid-level: 33) 2 (mid-level)
3 Dark rising (陰上) 3 (bottom-level: 11) 2 (low-dipping: 21) 3 (low-falling)
4 Light rising (陽上) 3 (bottom-level: 11) 2 (low-dipping: 21) 3 (low-falling)
5 Dark departing (陰去) 2 (low-level: 22) 3 (mid-level: 33) 4 (low-falling)
6 Light departing (陽去) 8 (high-level: 55) 4 (high-level: 55) 5 (high-falling)
7 Dark entering (陰入) 4 (low-rising: 13) 5 (low-rising: 24) 6 (high checked)
8 Light entering (陽入) 7 (mid-falling: 31) 6 (high-falling: 42) 7 (mid checked)
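The table can also be read as a record of mergers: wherever two Middle Chinese categories share a single reflex, the source shows no distinction between them. A small Python sketch that encodes the tone numbers from the table and groups the merged categories; the names `MC_TO_MIN`, `SOURCES` and `merged_categories` are hypothetical, and since the correspondences are approximate, the result only summarizes the table.

```python
from collections import defaultdict
from typing import Dict, List

# Tone numbers copied from the correspondence table above.
# Key: Middle Chinese category 1-8.
# Value: (Jian'ou per Huang 1958, Jian'ou per Li 1998, Wenchang per Hashimoto).
MC_TO_MIN = {
    1: (1, 1, 1),  # dark level 陰平
    2: (5, 3, 2),  # light level 陽平
    3: (3, 2, 3),  # dark rising 陰上
    4: (3, 2, 3),  # light rising 陽上
    5: (2, 3, 4),  # dark departing 陰去
    6: (8, 4, 5),  # light departing 陽去
    7: (4, 5, 6),  # dark entering 陰入
    8: (7, 6, 7),  # light entering 陽入
}
SOURCES = ("Jian'ou (Huang)", "Jian'ou (Li)", "Wenchang")


def merged_categories(source: str) -> Dict[int, List[int]]:
    """Map each dialect tone that continues 2+ MC categories to those categories."""
    column = SOURCES.index(source)
    groups: Dict[int, List[int]] = defaultdict(list)
    for mc_category, reflexes in MC_TO_MIN.items():
        groups[reflexes[column]].append(mc_category)
    return {tone: cats for tone, cats in groups.items() if len(cats) > 1}
```

For example, `merged_categories("Jian'ou (Li)")` returns {3: [2, 5], 2: [3, 4]}: light level and dark departing both surface as tone 3, and the two rising categories as tone 2.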

Database compiled and annotated by: G. Starostin (Standard Chinese; all forms of Old and Middle Chinese); E. Kuzmina (Min dialects). (Latest update: July 2014).