12.1.10.23:59: *B-UT *WATA-BOUT THE ONSETS? (PART 1)
My previous entry only dealt with the medial consonant of Old Japanese wata and Middle Korean patah ~ parʌr 'sea'. The initial consonants of those words don't match. (The final consonants are also problematic, but I'll deal with them later.) If those words are indeed related, it's unlikely that Japonic speakers borrowed Koreanic *p- as w-. Therefore one or both initials must be innovations.
Some derive OJ w from Proto-Japonic *b. PJ *b- is closer than PJ *w- to MK p- but is still not a perfect match. Given that MK p- normally corresponds to PJ *p- in words such as
MK pər < *pətɯ* : PJ *pati 'bee'
why would early Koreanic *p- sometimes be borrowed as early western Japonic (EWJ)** *b-?
Solution 1: Early Koreanic *p- was unaspirated, whereas EWJ *p- was aspirated [ph]. Speakers of languages like English and, in this scenario, EWJ, with initial [ph] and [b] but no initial [p] may perceive foreign [p] as being like their [ph] or [b]. Hence early Koreanic *p- was borrowed at random as either EWJ *p- or EWJ *b-.
Solution 1a: If Koreanic and Japonic are related (which I doubt), EWJ *p- and EWJ *b- corresponding to Koreanic *p- are from two different strata of vocabulary: one inherited from Proto-Koreo-Japonic and another borrowed from Koreanic.
Solution 1b: If Koreanic and Japonic are not related, EWJ *p- and EWJ *b- corresponding to Koreanic *p- are from two different strata of borrowing from Koreanic.
(1a and 1b added 1.11.2:18.)
Solution 2: Early Koreanic had a *p-/*b-distinction and the Early Koreanic word for 'sea' had initial *b- which was borrowed as EWJ *b-. There are two problems with this scenario.
First, none of the attested transcriptions of 'sea' on the Korean peninsula had initial *b-:
Koguryo*** 波且: Middle Chinese *patshjaʔ (regarded by Ryu 1983: 520 as an error for 波旦 *patanh)
Koguryo 波利: Middle Chinese *palih
Shilla 波珍: Middle Chinese *paʈin
Shilla 波澄: Middle Chinese *paɖɨŋ
This does not rule out the possibility that EWJ speakers happened to borrow the word from a Koreanic language (Paekche?) that had not yet shifted *b- to *p-. Unfortunately, there is no known Paekche cognate of this word.
Second, interchangeable initial transcription characters in other words on the Korean peninsula such as
Koguryo: 夫 *puə ~ *buə : 伏 *buk (are there better examples in initial position?)
Paekche: 沸 *pujh : 避 *bieh
Paekche: 富 *puh : 伐 *buat
Shilla: 發 *puat : 伐 *buat
imply that early Koreanic did not have an initial *p-/*b-distinction and that early Koreanic speakers pronounced Chinese *p- and *b- as *p-.
Next: *D-u-*b-ious Voiced Stops in Proto-Japonic
*1.11.1:55: MK pər could be from Proto-Koreanic *pərɯ or *pətɯ with a LH pitch accent but must be from the latter if it is related to PJ *pati.
MK patʌri HHR ~ HHH 'wasp' is vaguely similar, though its vowels belong to the low class whereas *pəCɯ LH had high class vowels with a very different pitch accent pattern. Nonetheless, one could try to relate patʌri to *pəCɯ by positing a common root *pVttV with a geminate *-tt- that was simplified to *-t- and lenited in one dialect but not another.
In my last post, I wrote (emphasis mine),
In Hebrew, intervocalic single stops lenited, whereas intervocalic geminates were simplified [...] The same thing happened in one dialect of Koreanic
I did not intend to imply that Hebrew and Koreanic underwent exactly the same sound changes. In fact, none of the modern Hebrew and Koreanic reflexes of lenited stops are the same:
| Lenited intervocalic stop | VpV | VtV | VkV |
| Modern Hebrew | VfV | VtV < VθV | VxV |
| Modern Korean (< Middle Korean) | VwV < VβV | VrV | VV < VɣV |
The Middle and Modern Korean reflexes are similar to the lenited stops of Tangut and Vietnamese:
Tangut vV, lV, ɣV < *VPV, *VTV, *VKV
Vietnamese [vV zV zV ɣV] < *VPV, *VTV, *VCV, *VKV (*C = palatal stop)
Moreover, I should have noticed that lenition also occurred in final position after vowels in Hebrew, whereas lenition was purely intervocalic in Korean and Tangut.
1.11.2:22: Finally, Korean and Tangut had lenited fricatives and affricates:
Modern Korean VV < Middle Korean VzV < Proto-Korean *V(t)sV
Tangut zV, ʒV < *V(T)SV, *V(T)ŠV
No fricatives lenited in Hebrew, which originally had no affricates. (Modern Hebrew ts is from earlier emphatic s.)
**1.11.0:44: I use the term 'early western Japonic' here instead of PJ because Western OJ wata has no Ryukyuan or Eastern OJ cognates that would allow me to reconstruct it at the PJ level.
***1.11.1:59: I use terms like 'Koguryo', 'Paekche', and 'Shilla' to represent any languages or dialects spoken in those kingdoms. I am agnostic about the number of dialects or languages on the Korean peninsula prior to unification. I tentatively assume that the peninsular languages were all Koreanic with remnants of a Japonic substratum.
12.1.9.23:59: A HEBREW HINT FOR A MARITIME MYSTERY?
Japanese has two words for 'sea', one shared with Okinawan (umi < Proto-Japonic *omi) and another shared with Korean (Old Japanese wata). According to Vovin (2010: 12-32), Korean intervocalic *-t- became -r- at some point after Japanese borrowed wata from Korean, so one would expect the later Korean word to have -r-. But there are two Middle Korean words for 'sea', and only one has *-r-!
patah
parʌr (ʌ may be a reduction of *a)
Vovin derived MK intervocalic -t- from earlier *-nt-. But if the earlier Korean word were *panta, it should correspond to Old Japanese wada [wanda], not wata.
Here's what I think happened. In Hebrew, intervocalic single stops lenited, whereas intervocalic geminates were simplified: e.g.,
saapar > safar 'he counted'
sappaar > sapar 'barber'
(Examples from Hetzron 1993: 695.)
The same thing happened in one dialect of Koreanic:
*kətan > MK *kəran (unattested?) > modern kŏran 'Khitan'
*pattak > MK patah > modern pada 'sea'
However, in another Koreanic dialect, simplified intervocalic geminates also lenited:
*pattar > *patar > MK parʌr > (no modern descendant)
Thus Old Japanese wata corresponds to an early Koreanic *pat(t)a with or without a geminate prior to lenition.
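The difference between the two Koreanic dialects can be read as a difference in rule ordering. Here is a minimal Python sketch of that reading, assuming only the medial *-t- and *-tt- matter. The function names and the vowel inventory are my own shorthand for illustration, and final consonants (*-k > -h) and vowel reduction (*a > ʌ) are deliberately not modelled.

```python
import re

# Illustrative vowel inventory; just enough to cover the forms cited above.
VOWELS = "aeiouəʌɯ"

def lenite(form: str) -> str:
    """Lenite a single intervocalic t to r (V t V > V r V)."""
    return re.sub(rf"(?<=[{VOWELS}])t(?=[{VOWELS}])", "r", form)

def degeminate(form: str) -> str:
    """Simplify geminate tt to single t."""
    return form.replace("tt", "t")

def dialect_a(form: str) -> str:
    """Lenition first, then simplification: old geminates escape lenition."""
    return degeminate(lenite(form))

def dialect_b(form: str) -> str:
    """Simplification first, then lenition: old geminates lenite too."""
    return lenite(degeminate(form))

for proto in ("kətan", "pattak", "pattar"):
    print(proto, "A:", dialect_a(proto), "B:", dialect_b(proto))
```

Under these assumptions dialect A turns *kətan into kəran and *pattak into patak (cf. MK patah), while dialect B turns *pattar into parar (cf. MK parʌr).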
Next: *B-ut *W-hat about the Onsets?
Then: A C-l-as-h of Codas
1.10.1:40: Unfortunately, the earliest Chinese character transcriptions of Koreanic words for 'sea' do not point to Vovin's *-nt- or my *-tt-:
Koguryo 波且: Middle Chinese *patshjaʔ (regarded by Ryu 1983: 520 as an error for 波旦 *patanh; *-n could transcribe foreign *-r)
Koguryo 波利: Middle Chinese *palih (for *parih?; there was no MC *r)
Shilla 波珍: Middle Chinese *paʈin < Old Chinese *tər
Shilla 波澄: Middle Chinese *paɖɨŋ
I would expect OJ wata to be a borrowing from Paekche, the peninsular state that was the source of literacy and Buddhism in Japan, but the Paekche word for 'sea' was transcribed as 内米 MC *nəjmejʔ which vaguely resembles Japanese nami 'wave'. 内米 could also refer to ponds, so it may have meant 'body of water'. If Paekche had a word meaning only 'sea', it might have been cognate to MK patah and parʌr.
The earliest Chinese character transcriptions of names and titles from Japan had clusters that might represent geminates or tense consonants: e.g.,
邪馬臺 Late Old Chinese *jæmæʔdə 'name of the state of Yamatai' (for *yamaddə?; Yamatai is the modern Sino-Japanese reading of the transcription; even the alternate spelling 邪馬壹 *jæmæʔʔit may have represented *yamaʔʔit(V) with a geminate)
彌馬獲支 Late Old Chinese *miemæʔwɛkkie 'a title of Yamatai' (for *mema(w)wekke?; cf. Proto-Ryukyuan *weke 'male' [Thorpe 1983: 304])
己百支 Late Old Chinese *kɨəʔpakkie 'name of a state' (for *kəppakke?)
好古都 Late Old Chinese *xouʔkɔʔtɔ 'name of a state' (for *hokkotto?; *h may have merged with zero in early Japonic; modern Japanese h- is from proto-Japonic *p-, not PJ *h-)
對蘇 Late Old Chinese *tuəssɔ 'name of a state' (for *tusso?)
Although the linguistic affiliation of these names is unknown, perhaps early Japonic also had geminates that were later reduced to Old Japanese single consonants and the Koreanic word for 'sea' could have been borrowed as *watta with a geminate. The geminates of modern Japanese would be unrelated to these early geminates.
12.1.8.23:59: JURCHEN POLYPHONY 3: THE WE BACK TO THE CAPITAL
I began this series with a Jurchen character
transcribed in Chinese as 苦 *ku 'bitter' and as 都蠻 *duman 'capital-southern barbarian', and I am ending it with another 'urban' Jurchen character which has a record number of different readings:
~
~
Kiyose 70 (hereafter K70): <her> (J: <hele>) 'city'
phonogram for <hu>, <u>, <we>, (J: <huwe>), <e>, (Y: <o>), <du> (transcribed as 都 *du 'capital'), (J: <ke>)
The readings are from Kiyose (1977: 65, 127) except for those marked with 'J' from Jin (1984: 35) and 'Y' from Yamaji Hiroaki.
If a word spelled with <huwe> came to be spelled with <we>
>
![]()
<huwe> > <huwe.we> = huwe
then <huwe> could have been reinterpreted as a phonogram <hu>.
And just as <clha> might have once been <ilha>, <we> might have once been <huwe>:
>
<huwe> > <hu.huwe> reinterpreted as <hu.we> = huwe
If those derivations are correct, the number of readings of K70 can be reduced to seven: <her>/<hele>, <huwe>, <u>, <e>, <o>, <du>, <ke>. Perhaps each of the characters now regarded as variants originally had only one or two of these readings. Were there originally up to seven distinct characters?
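The count of seven can be rechecked with a few lines of Python. This is only a bookkeeping sketch: the list below is my own flattening of the K70 readings given above, and the mapping records the two proposed reinterpretations.

```python
# All readings listed for K70 above, with <her>/<hele> counted as one reading.
readings = ["her/hele", "hu", "u", "we", "huwe", "e", "o", "du", "ke"]

# Proposed secondary readings and the reading they would derive from.
derived_from = {"hu": "huwe", "we": "huwe"}

# Keep only the readings that are not explained as reinterpretations.
reduced = [r for r in readings if r not in derived_from]

print(len(readings), "->", len(reduced))
print(reduced)
```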
1.9.3:15: Are we seeing the Jurchen equivalent of merging the mostly unrelated though similar-looking Chinese characters below (all readings are in Cantonese) into a single 'character'?
田 tin
由 yau
甴 jaat (derived from inversion of 曱 below; Ct 曱甴 gaatjaat is a disyllabic word 'cockroach')
申 san < Old Chinese *hlin
电 din < Old Chinese *lins (derived from 申 above)
甲 gaap
曱 gaat (derived from near-homophone 甲?)
Next: A Hebrew Hint for a Maritime Mystery?
1.9.1:15: Jin (1984: 35) noted that K70
~
~
resembled Chinese 左 *tso 'left' which was Jurchenized as
![]()
<dzo>
since Chinese unaspirated obstruents were borrowed as Jurchen voiced obstruents.
The Jurchen word for 'left' was
<hai.su> (cf. Manchu has'hu; <su> is derived from the right side of Chn 穌 *su)
<dzo>, <hai.su>, and their graphs bear no resemblance to K70 and its readings. So is the resemblance between K70 and Chn 左 'left' coincidental? The reading <o> of K70 is vaguely like Middle Korean 왼 oyn 'left'. Was K70 based on a Parhae modification of 左 representing a Koreanic word for 'left'?
1.9.1:29: <her> [xər]/<hele> [xələ] 'city' must be related to the Koguryo word for 'fortress' transcribed as 忽 Late Old Chinese *xwət ~ Middle Chinese *xot. Chinese *-t might correspond to a Koguryo *-r or *-l. LOC and MC did not have liquid codas.
The reading <du> for K70 could be a Jurchenization of Chn 都 *tu 'capital' which in turn might have been a loose translation of <her>/<hele> 'city'.
I don't know why Kiyose (1977: 65) reconstructed <her> with <r>. The Chinese transcription was 黑勒 *xəj-ləj, not 黑兒 *xəj-r̩ which would correspond to <her>.
12.1.7.17:09: JURCHEN POLYPHONY 2: SCIENTIA ALBA
Jurchen has a single character with variants~
~
for both shang 'white' (cf. Manchu shanggiyan 'id.') and sa- 'to know' (cf. Manchu sa- 'id.'*). (Jin 1984: 98 only listed sa as a reading for the third variant.) Did one or more variants originally represent shang while the other(s) represented sa-? Kiyose (1977: 68) wrote, "It is impossible to say whether similar characters with different pronunciations were erroneously written the same way."
Jin (1984: ) derived this graph from the Khitan large script graph
<sha>
~
in the Khitan title
<sha.ri>
transcribed in Chinese as 沙里 *shali and translated as 郎君.
There is also a Khitan large script character
<?> '?'
resembling one of the Jurchen variants.
'White' was
<?>
in the Khitan large script. Janhunen (2003: 397) regarded Manchu shanggiyan 'white' as a loan from a Para-Mongolic cognate of Proto-Mongolian *cagaxan with a Para-Mongolic innovation *c- > sh-. One might think that the Manchu and Jurchen words for 'white' were borrowed from Khitan, but Khitan had a c- in addition to an sh- even in native words (implying that the c- > sh- shift had not taken place) and the Khitan word for 'white' is unknown, so it is doubtful that Khitan had a sh-word for 'white'. In any case, the KLS graph for 'white' only very vaguely resembles the Jurchen characters and is probably not related to them.
If the Jurchen characters were derived from the KLS:
| Khitan large script | Early Jurchen | Later Jurchen (and/or mistakes in the Sino-Jurchen Vocabulary) |
| <sha> | <shang>? (but not <sha>!) | <shang> ~ <sa> (but not <sha>!) |
| <?> (<sa> like its Jurchen derivative?) | <sa> | |
If the Jurchen and KLS characters were independently derived from common (Parhae?) prototypes:
| Parhae | Derivatives |
| ? | Khitan large script: <sha> |
| | Jurchen: <shang> ~ <sa> (but not <sha>!) |
| ? | Khitan large script: <?> (<sa> like its Jurchen counterpart?) |
| | Jurchen: <sa> |
Next: The We Back to the City
*A shaman is a sa-man 'knower' in Manchu. The same suffix was in Jurchen
<sori.duman>
soridu-man 'fighting'
from soridu- 'to fight' in part 1.
12.1.6.23:56: JURCHEN POLYPHONY 1: BITTER URBAN BARBARIANS
Kiyose (1977: 80) listed two Jurchen characters in a row with identical shapes:
399: <ku>; transcribed in Chinese as 苦 *ku 'bitter'; phonogram for the second syllable of takura- 'send':
<ta.ku.ra> (cf. Manchu takūra- 'id.')
400: <duman>; transcribed in Chinese as 都蠻 *duman 'capital-southern barbarian'; phonogram for the second half of soridu-man 'fighting, melee':
<sori.duman> (cf. Manchu soridu- 'to fight' [Kiyose 1977: 122; I can't find this word in any Manchu lexicon at hand])
One might wonder if Jurchen characters often had multiple readings like the Chinese characters used in Japanese. However, I could only find three Jurchen characters in Kiyose (1977) with multiple readings. I'll look at the other two in parts 2 and 3 of this series.
Kiyose (1977: 80) wrote,
This character [400] seems to be exactly the same as character 399 as far as appearance goes. These characters were, however, perhaps different from each other, and one or the other is presumably a scribal error.
Either possibility raises questions I cannot answer:
If 399 and 400 are a single polyphonous character (i.e., a character with multiple readings), which reading came first? Or was the character designed with two readings in mind?
If 399 and 400 were originally distinct characters, what did the lost other character look like?
1.7.00:59: Jin (1984: 160) listed two variants of 399/400:
The second was read <ku>. I do not know which reading(s) belonged to the first. Could one or both of these variants actually be distinct characters? For example, perhaps
≠
~
or
~
≠
![]()
were <ku> and <duman> or vice versa.
In either case,
what is the relationship between the shape(s) of 399/400 and its readings?
why was a phonogram <duman> created even though there was no suffix -duman? Kiyose (1977: 122) analyzed soriduman as soridu-man with a nominal suffix -man, and Kiyose (1977: 80) speculated that -du- "could be the cooperative verbal suffix; cf. Ma. -ndu- id." I know of no Manchu word duman, so there may not have been a Jurchen word duman.
1.7.1:28: why not write the infrequent syllable sequence duman with phonograms as, say,
<du.man>
(The character <man> is in Jin's (1984: 231) entry for the place name 滿涇站 <man.ging.jan> but does not have an entry of its own in that dictionary.)
or
<du.ma.an>?
Next: Scientia alba
12.1.5.18:20: FLORA DIVINA
Kiyose's (1977) A Study of the Jurchen Language and Script: Reconstruction and Decipherment has a list of 728 Jurchen (large script) characters arranged by number of strokes:
| Number of strokes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| Number of characters | 1 | 4 | 14 | 73 | 165 | 197 | 155 | 88 | 22 | 9 |
| Percentage of total | 0.1% | 0.5% | 2% | 10% | 23% | 27% | 21% | 12% | 3% | 1% |
The characters in Jin Qizong's (1984) Jurchen dictionary have a similar distribution.
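The percentages in the table above can be rechecked from the raw counts. A minimal Python sketch, using only the numbers in the table (the variable names are mine):

```python
# Stroke count -> number of characters, copied from the table above.
counts = {1: 1, 2: 4, 3: 14, 4: 73, 5: 165, 6: 197, 7: 155, 8: 88, 9: 22, 10: 9}

total = sum(counts.values())

# Share of the total for each stroke count, as a percentage.
percentages = {strokes: 100 * n / total for strokes, n in counts.items()}

print("total characters:", total)
for strokes, pct in percentages.items():
    print(f"{strokes:>2} strokes: {counts[strokes]:>3} characters, {pct:.1f}%")
```

The counts sum to 728, and characters with five to seven strokes account for about 71% of the list.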
I'd like to make a similar table for the number of syllables in a Jurchen character. There is no correlation between graphic complexity and the complexity of a Jurchen reading: e.g.,
~
<r> (phonogram; zero syllables: eight strokes)
~
<uyewunju> 'ninety' (logogram; four syllables: three strokes)
Kiyose lists only one Jurchen character whose reading begins with a consonant cluster:
~
<lha> [lχɑ] (the variant is from Jin 1984)
(Written Tibetan lha [l̥a] is 'god'.)
This character only appears in the spelling
<i.lha>
for ilha 'flower'. No Jurchen word could begin with <lh>, and no other Jurchen character had a reading beginning with a nonhomorganic cluster. Why wasn't 'flower' spelled
<il.ha>
just as ilhahong 'shallow' was spelled
<il.ha.hong>?
The above two readings are based on those in Jin (1984). However, Kiyose (1977: 135, #694) read 'shallow' as <ir.ha.hun> with <r> instead of <l> on the basis of the Chinese transcription 一兒哈洪 *i ri xa xuŋ. Kiyose did not reconstruct <il> as the reading of any Jurchen character. Hence if Kiyose is correct, a spelling <il.ha> would not be possible. There was no Jurchen character read <l>, so <i.l.ha> would also not be possible.
Why was irhahun (possibly irhahũ?) written as <ir.ha ...> whereas 'flower' was written as <i.lha>? Why not write Ch-clusters consistently either as <C.h> or as <.Ch>?
I was surprised a <lhV> graph exists at all because I assumed that
- there were no <lh>-initial (or <Ch>-initial) words in Jurchen just as there were no such words in Manchu. The absence of initial consonant clusters is a trait of 'Altaic' languages, though there are exceptions: e.g., Middle Korean had ᄣ pst- and Monguor developed st- under Tibetan influence (Ramsey 1987: 202).
- Jurchen character readings would be pronounceable in isolation.
Then again, Jurchen did not have syllabic r, so
~
<r>
would also not be pronounceable in isolation.
Literate Jurchen knew of the Khitan scripts, which also contained characters with subsyllabic readings, though such characters might have been read with inherent vowels in isolation.
I wonder if
was originally a disyllabic logogram <ilha> 'flower' (could it even be a drawing of a flower?) and
<i.lha>
was added later, so the later two-character spelling for ilha
should be transliterated as <i.ilha> and Jurchen referred to the second character in isolation as <ilha>, not <lha> with an initial <lh> they couldn't pronounce.
There are many Jurchen words written as <logogram.phonogram(.phonogram)>: e.g.,
<guru.un> 'country' (the first character is related to Chn and Khitan large script 囯 'country').
I presume these words appeared as <logogram> sans <phonogram> in 女眞字書 The Book of Jurchen Characters in which "[a]lmost all the individual characters [...] represent complete words" (Kane 1989: 8-9): e.g.,
>
?
Kane (1989: 29) lists an extreme case of a logogram (a drawing of a saddle?) being replaced by a logogram-phonogram-phonogram sequence:
>
<engemer> (BJC spelling) > <engemer.ge.mer> (later spelling) engemer 'saddle'
Is <i.ilha> the only case of a logogram later preceded by a phonogram, or are there others?
Next: The Bitter City of the Southern Barbarians
Maybe Someday: Initial Consonant Clusters in Jin's Jurchen Reconstruction
12.1.4.16:09: *K-OUNTING IN TANGUT
Last Friday, I proposed that Tangut 'six' and 'seven' shared a *k-prefix. The following day, I wondered if I could reconstruct a *k-prefix in other Tangut numerals. Let's see how far I can go with that. But first, let me predict the effects of *k-prefixes on different pre-Tangut initial classes:
1. *k- + nonfricative obstruent = aspirated nonfricative obstruent
2. *k- + fricative = fricative + tense vowel
3. *k- + nasal = nasal
4. *k- + glide = ʔ- + glide
5. *k-r- > *k- + grade II rhyme
6. *k-l- > lh-
7. *kV- may condition
lenition of the following consonant
bending of the root vowel
upward if *V is *ɯ
downward if *V is *ʌ
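Some of these predictions can be checked mechanically against the numerals. The following Python sketch covers only rules 1, 2, 3, and 6; the class labels and trait labels are my own shorthand, and the traits are read off the attested readings in this entry's numeral table, so treat it as a rough illustration rather than a full test of the hypothesis.

```python
# Predicted trait of a root initial after a *k- prefix (rules 1, 2, 3, 6 above).
# Keys and values are illustrative labels, not established notation.
PREDICTED = {
    "obstruent": "aspirated",    # rule 1: *k- + nonfricative obstruent
    "fricative": "tense vowel",  # rule 2: *k- + fricative
    "nasal": "plain nasal",      # rule 3: *k- + nasal
    "l": "lh-",                  # rule 6: *k-l- > lh-
}

# (gloss, class of root initial, trait observed in the attested reading)
NUMERALS = [
    ("two", "nasal", "plain nasal"),        # 1niəə
    ("three", "fricative", "tense vowel"),  # 1sọ
    ("four", "l", "l-"),                    # 1lɨəəʳ: l-, not lh-
    ("six", "obstruent", "aspirated"),      # 1tʃhɨiw
    ("seven", "fricative", "tense vowel"),  # 1ʃɨạ
    ("nine", "obstruent", "unaspirated"),   # 1giəə: g-, not kh-
]

def compatible_with_k_prefix(root_class: str, observed: str) -> bool:
    """True if the observed trait matches the predicted *k- reflex."""
    return PREDICTED[root_class] == observed

results = {gloss: compatible_with_k_prefix(cls, obs) for gloss, cls, obs in NUMERALS}
for gloss, ok in results.items():
    print(f"{gloss}: {'compatible with' if ok else 'rules out'} *k-")
```

A match only shows compatibility: a tense vowel could equally reflect *s-, and a plain nasal offers no internal evidence for a prefix at all.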
How many of those predicted reflexes can be found among Tangut numerals?
| Gloss | Tangraph | Reading | Pre-Tangut | Tibetan transcription | Also cf. | Notes |
| one | ![]() | 1lew | *Cʌ-tek | gliH, gli, kli | OC 隻 *Cɯ-tek, WT gcig | Pre-Tangut and OC prefixes could have had initial *k- |
| two | ![]() | 1niəə | *(k-)niəə | gniH, gni, nyi | WT gnyis | *k-prefix possible, but no internal evidence for it: *kn- > *hn- > n- |
| three | ![]() | 1sọ | *k/s-so | gsoH, gso, bso, so | WT gsum | Prefix could be *k- or *s- |
| four | ![]() | 1lɨəəʳ | *r-ləə | ldiH, ldi, zlaH | Mawo Qiang gʐə | Initial l- instead of lh- < *k-l- rules out *k-prefix; I could claim the prefix dropped before it left a trace, but I'd rather not do so |
| five | ![]() | 1ŋwə | *(k-)pʌ-ŋə | bngiH, rngwa | WT lnga | *k-prefix possible, but no internal evidence for it: *(k-)pʌ-ŋ- > *kpŋ- > *kŋw- > *hŋw- > ŋw- |
| six | ![]() | 1tʃhɨiw | *k-trik | chiH, chi | WT drug | Aspirate initial points to *k-prefix |
| seven | ![]() | 1ʃɨạ | *k/s-ʃa | sha, gshaH | Mawo Qiang stə, Taoping Qiang ɕiŋ | Prefix could be *k- or *s- |
| eight | ![]() | 1ʔjaʳ | *k-rja | rye, na (sic!) | WT brgyad < *bryad | Unsure if Tangut had ʔj- instead of j- |
| nine | ![]() | 1giəə | *gəə | HgiH, dgiH | Mawo Qiang rguə | Initial g- instead of kh- < *k-g- (cf. 'ten thousand' below) rules out *k-prefix; I could claim the prefix dropped before it left a trace, but I'd rather not do so |
| ten | ![]() | 2ɣạ | *(k-)sʌ-KaH | Hga, k.ha, dgaH | Daofu zʁa | No need for a *k-prefix, but if one existed, it could have fused with *s-: *k-s- > *kʃ- > *ʃ- > *s-. |
| hundred | ![]() | 1ʔjiʳ | *k-rji | (none) | Mawo Qiang khiʴ, WT brgya < *brya | Unsure if Tangut had ʔj- instead of j- |
| thousand | ![]() | 1təụ | *(k-)sʌ-tu | tu (?) | Taoping Qiang χto < *st-?, WT stong | No need for a *k-prefix, but if one existed, it could have fused with *s-: *k-s- > *kʃ- > *ʃ- > *s-. |
| ten thousand | ![]() | 2khiə | *k-gəH | (none) | Taoping Qiang χgya < *s-g-?, Daofu khʂə < *s-g-?, WT khri | Not possible to reconstruct *k-prefix using internal evidence due to lack of kh- ~ g- alternation; root *g- reconstructed on the basis of Taoping Qiang |
In a 'strong-k' scenario in which I reconstruct as many *k-prefixes as possible, the only numbers without them are 'four' and 'nine' (unless I resort to the 'prefix dropped without a trace' trick - ugh).
In a 'weak-k' scenario in which I reconstruct as few *k-prefixes as possible, the only number with a prefix is 'six', and even its *k- is debatable because there is no Tangut-internal alternation tʃ- ~ tʃh- suggesting a prefix.
I suspect several, though not all, of the numerals above had *k-. I am tempted to interpret the k- and g- in the Tibetan transcriptions as a prefix rather than as a tone letter, so 'one', 'two', and 'three' and perhaps 'ten' may have had k- in the transcribed dialect.
I don't understand why prefixation isn't consistent even in the 'strong-k' scenario or in Written Tibetan:
| WT prefix | g- | b- | l- | d- | s- | none? |
| WT numeral | gcig 'one', gnyis 'two', gsum 'three', perhaps dgu 'nine' via Sa-skya Pandita's law (named by Hill 2011): *g- > d- before grave consonants. | bzhi 'four', bdun 'seven', brgyad 'eight', bcu 'ten', brgya 'hundred' | lnga 'five' | drug 'six'; the d- of dgu 'nine' could be from *g- (see the g-column) | stong 'thousand' | khri 'ten thousand' |
Janhunen (1994) pointed out that there is no consistent numerical radical in the Tangut characters for numerals. The component
(alphacode: dex)
appears in 'one', 'four', 'six', and 'nine' and at least 1183 other characters. One out of five characters contains dex, the most frequent of 825 different character components. Determining the function(s) of dex could be a key to the Tangut script.
Next: Flora divina
12.1.3.2:31: THE *REK-ONING
On Friday I stumbled on Schuessler's (2009: 132) entry for the homophones 歷 'to count, to experience, calendar' and 曆 'calendar', both Old Chinese *rek. Schuessler compared those words to
Written Burmese ရေ re 'to count'
Kanauri ri (no definition given)
Written Tibetan rtsi-ba < *rhji < *rhi 'to count', rtsis-pa 'astronomer'
This comparison also appeared on p. 73 of his 2007 dictionary.
I was surprised by his derivation of rtsi from *rhi. I couldn't find any sound change like -ts- < *-h- before *i in Nathan Hill's recent (2011) compilation of Tibetan sound laws. Such a change looked odd to me until I inserted some extra steps (in bold):
0. *rhi
1. *rhyi (palatalization of *rh before *i)
2. *rhji (fortition of *y)
3. *rhci (devoicing of *j after *h)
4. *rci (loss of *-h-)
5. rtsi (shift of *c from palatal to alveolar - why?)
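The chain above can be written as an ordered list of string rewrites, which makes it easy to see that each step feeds the next. This is a minimal Python sketch using the post's own ASCII-style notation; the rule encoding is mine, is not taken from Schuessler or Hill, and is tuned only to this one form.

```python
# Ordered rewrite rules: (description, pattern, replacement).
# Each pattern is a plain substring, chosen only to reproduce the chain above.
CHANGES = [
    ("palatalization of *rh before *i", "rhi", "rhyi"),
    ("fortition of *y", "y", "j"),
    ("devoicing of *j (adjacent to *h)", "hj", "hc"),
    ("loss of *-h-", "rh", "r"),
    ("shift of *c from palatal to alveolar", "c", "ts"),
]

def derive(form: str) -> list[str]:
    """Apply every change in order and return all intermediate stages."""
    stages = [form]
    for _description, old, new in CHANGES:
        form = form.replace(old, new)
        stages.append(form)
    return stages

print(" > ".join(derive("rhi")))
```

Reordering the rules breaks the derivation (for example, losing *-h- first leaves nothing to devoice *j), which is one way to see why the intermediate steps have to be assumed in this order.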
Written Tibetan has no rc-, though other similar clusters are possible (derivations from Jacques 2004):
| lc- < *hly-, *lt-y- | (no rc-) | rts- |
| lj < *n-ly- | rj- < *r-ly- | rdz- |
Is the absence of rc a chance gap (was there no pre-Tibetan *r-hly-?) or is rts- partly derived from *rc-?
Benedict derived Tibetan rtsi- from Proto-Tibeto-Burman *r-tsiy, later revised to *r-tśrəy (Matisoff 2003: 79; STEDT etymon 2738), and linked it to his Old Chinese 數 *śri̭u = my *sroʔ 'to count'. However, the vowels of the PTB and Chinese forms do not match. OC 數 *sroʔ might share a *s-r-ʔ root with OC 算 *sonʔ < ?*sorʔ 'to calculate' (cf. Jpn soroban 'abacus').
Neither Schuessler's *rhi nor Benedict's PTB forms have codas corresponding to the *-k of OC 歷/曆 *rek. Would Schuessler regard that *-k as a "k-extension"?
I think there might be a relationship between OC *rek and WB re, but am hesitant to relate them to WT rtsi-.
There are no TSr-clusters in WT, so rts- might be partly from *tsr-. However, I do not know of any Tibetan prefix ts- and hence cannot derive rtsi- from *ts-ri- with a root *ri cognate to the OC and WB r-words.
Schuessler's *rhi avoids the problem of explaining what *ts was, but are there any other examples of WT rts- from *rh-, and why did Tibetan have *rh- instead of simple *r-?
My *r- [rˁ] was an allophone of */r/ before and after nonhigh vowels that became phonemic after presyllabic loss and reduction:
*/re ra ro/ [rˁeˁ rˁaˁ rˁoˁ] > */rˁe rˁa rˁo/ *re *ra *ro
*/ri rə ru/ [ri rə ru] > */ri rə ru/ *ri *rə *ru
*/Cʌ-rV/ [CˁʌˁrˁVˁ] > */(C)rˁV/ *(C)rV
*/Cɯ-rV/ [CɯrV] > */(C)rV/ (C)rV
The phonemicization of pharyngealization (a.k.a. 'emphasis' on this site) in Old Chinese is similar to the phonemicization of palatalized consonants in Slavic: e.g.,
Russian тьма /tʲma/ (one syllable) < *tĭma (two syllables) 'darkness'
In both cases, vowels that conditioned allophony were lost and the allophones became phonemes.
12.1.2.2:02: DISSECTING DRAGONS
(I originally intended to only dissect the Tangut character for 'dragon', but why stop there?)
There are two words for 'dragon' in Chinese languages, the noncalendrical 龍 (with at least 51 variants!) and the calendrical 辰 (with at least 19 variants!).
The right side of 龍 looks like a drawing of a dragon, but the left side initially seems to defy explanation: 立 is 'to stand' and 月 is either 'moon' or 'flesh'. One might wonder if the Chinese think of dragons as standing on the moon. In Shuowen (100 AD), Xu Shen analyzed 龍 *luoŋ as an abbreviation of a phonetic 童 *doŋ plus 月 'flesh' and the shape (of a dragon, presumably) in flight. However, *d-phonetics are otherwise unknown in *l-graphs and as far as I know, 龍 originated as a drawing of a dragon that was later split into three components. Two resemble unrelated components 立 'to stand' and 月 'moon/flesh' while the third is only found in 龍 and its variants and compounds.
According to Richard S. Cook (1995), 辰 is in fact a representation of a scorpion in striking position as seen in profile. It is shown that this representation bears directly upon the once vigorous traditions relating to the ancient equinoctial position of the star Antares in the Breast of the Celestial Scorpion. And though certain stellar concepts betray the likelihood of an early (pre-OBI [oracle bone inscription]) Sino-Mesopotamian relation (stimulus diffusion), these concepts nevertheless took peculiar Chinese form, such that it is possible to demonstrate the cognacy of Chinese 辰 chén and ‘scorpion’ words in Sino-Tibetan.
I have not yet read this monograph, so I don't know how 辰 'scorpion' came to mean 'dragon'. I would reconstruct 辰 as Old Chinese *dər which only shares a *d with Matisoff's Proto-Tibeto-Burman *s-diik 'scorpion' and doesn't have any strong matches in the STEDT database or in Tangut.
One might expect the Tangut, Jurchen, and Khitan graphs for 'dragon' to resemble some of the 70+ Chinese graphs for 'dragon', but none have any obvious Chinese origin:
| Khitan large script | Jurchen (large) script | Khitan small script | Tangut |
| ![]() | ![]() ~ ![]() ![]() | ![]() | ![]() |
| <lu> | <mudu.r> = mudur | <lu> | 1vəi |
The Khitan large script character and Jurchen <mudu> are obviously related, though it is not certain whether the Jurchen character was derived from its KLS equivalent or if both were derived from a common Parhae prototype.
The second Jurchen character <r> may have been added later if the first character originally stood for <mudur>. <r> has nine strokes and is surprisingly complex for a graph representing a single consonant. Then again, its Chinese equivalent 兒 -r has eight strokes. To Chinese eyes, <r> looks like two 人 people standing atop a 羊 sheep minus one horizontal stroke. The rationale for the structure of <r> is unknown. It also has a variant with Xs instead of 'people':
No other Jurchen characters have 人x 2 or X x 2 as top elements.
The Khitan small script character <lu> may or may not be derived from its large script equivalent.
Khitan <lu> was borrowed from Chinese 龍 *liuŋ, though their graphs are completely different.
Jurchen mudur is from Proto-Tungusic *muduri. It is vaguely similar to Middle Korean mirɯ < ?*mitɯ 'dragon', but the vowels do not match. Japanese mi 'snake (calendrical)' might be related to the Korean word. But if it's from *mi rather than *məi, *moi, or *mui, it could just be an abbreviation of Old Japanese pəymi, itself probably a loan from a relative of Middle Korean pʌyam 'snake'.
Tangut 1vəi may be from *Cʌ-Pi. The presyllabic vowel conditioned the lenition of the following labial consonant and the partial lowering of *i.
The Tangut character for 'dragon' has four parts:
=
+
+
+
The Tangraphic Sea analysis of 'dragon' is
=
+
0083 1vəi 'dragon' =
top of 0111 (first half of
1lɨə 1lwɨụ 'to crawl' - a reduplicated root?) +
bottom of 4234 (first half of
1vəi 1məuʳ 'dragon tree' (lit. 'dragon dark' = 'dark dragon')
I doubt that the character for the first half of 'dragon tree' was devised before the much more frequent character 'dragon'. 'Dragon tree' looks like 'dragon' and 1məuʳ 'dark' plus the 'wood' radical. The Tangraphic Sea analyses confirm that:
=
+
4234 1vəi = top of 4250 1si 'wood' + bottom of 0083 1vəi 'dragon'
=
+
4117 1məuʳ = top of 4250 1si 'wood' + all of 1məuʳ 'dark'
The second half of 'to crawl' is derived from 'dragon' and the first half of 'to crawl':
=
+
+
+
0047 1lwɨụ (2nd half of 'to crawl') =
top of 0083 1vəi 'dragon' +
bottom right of 0111 1lɨə (1st half of 'to crawl') +
bottom left of 4169 1tshõ 'desolate' (why?) +
bottom left of 0080 2phɔ 'snake'
The first half of 'to crawl' is not derived from the second:
=
+
0111 1lɨə (1st half of 'to crawl') =
0054 1tswa 'hair worn in a bun' (why?)
0338 1lɨə 'to lock up' (phonetic)
The top component of 0054 may mean 'top'. 'Dragon' is a top animal and hair worn in a bun is near or at the top of the body, but things crawl on the bottom, not the top.
The analysis of 0054 implies that the top element does mean 'top':
=
+
0054 1tswa 'hair worn in a bun' =
0055 2tʃɨw 'top of the head' +
2061 2pɛ̃ 'hair'
Unfortunately, no analysis of 0055 is known, so the chain of characters with 'top' ends there.
If the top element of 'dragon'
is 'top', what is the bottom? There is only one other tangraph with the same bottom elements
+
+
as 'dragon' and the first half of 'dragon tree':
1188 2ŋa 'egg' (analysis unknown)
The function of the top element ユ is unknown. Were dragons 'top eggs'? 1188 in turn had a derivative
=
+
1210 2dʒæ̃ 'egg' =
frame of 1188 2ŋa 'egg' +
? of 0088 1tew 'egg' (defined as 1210 in Tangraphic Sea)
No part of 0088 matches the bottom center of 1188. Is this Precious Rhymes of the Tangraphic Sea analysis really a list of synonyms?
Why did Tangut have three words for 'egg'? How did these words differ?
What if 'dragon' had nothing to do with eggs? Three of the four parts of 'dragon' vaguely resemble the components of 龍:
: 立
:月
: right of 龍
But what about the fourth part 干? Is it the horizontal lines 二 from 月 plus an additional vertical line?
:月?
If 'dragon' is not a heavily disguised 龍, what is it?
Next: The *Rek-oning
12.1.1.12:21: DISSECTING THE DATE 2012
Here's the solution to the problem I posted last night:
The five Tangut characters under 'dragon'
say '2012 year'. Can you identify the characters for
1. 'two'
2. 'thousand'
3. 'ten'
4. 'year'
And can you figure out whether the line at the bottom is read from left to right or right to left?
The first clue is '2012 year'. The un-English order of these words is absolute. '2012' comes first, followed by 'year'. If the line was meant to be read from left to right, 'year' should be the character on the right:
Conversely, if the line was meant to be read from right to left, 'year' should be the character on the left:
Since
appears twice and '2012 year' only contains one 'year', that character cannot mean 'year'. So by process of elimination, the line must read '2012 year' from right to left:
| [image] | [image] | [image] | [image] | [image] |
| 4. year | ? | ? | ? | ? |
| 2012 | ||||
My next clues were in the questions. I asked if you could identify characters for 'two', 'thousand', 'ten', and 'year'. We already know what 'year' is, so 'two', 'thousand', and 'ten' must be among the remaining four characters.
The gloss 'ten' hints that 'twelve' must contain 'ten' in it: 'ten two' or 'two ten' (cf. Sanskrit dvaa-daśa 'two-ten' = 'twelve').
The character
appears twice, so it must be the 'two' in 'two thousand' and 'twelve' (= 10 + 2 or 2 + 10):
| [image] | [image] | [image] | [image] | [image] |
| 4. year | 1. two | ? | ? | 1. two |
| 2012 | ||||
Often the key to solving my puzzles lies in finding a character that appears more than once correlated with something that appears more than once in the gloss. Once this character is identified, the rest of the pieces fall into place.
In theory, the line could be either
'two ten thousand two year' ((2+10) + (1000 x 2))
or
'two thousand ten two year' ((2 x 1000) + (10 + 2))
read from right to left, but I was hoping the reader would assume that the order I listed the glosses in
1. 'two'
2. 'thousand'
3. 'ten'
4. 'year'
was the order the characters were read in:
| [image] | [image] | [image] | [image] | [image] |
| 4. year | 1. two (again) | 3. ten | 2. thousand | 1. two |
| 2012 | ||||
I was also hoping that the reader would have the English phrase two thousand (and) twelve in mind. The Tangut equivalent 'two thousand ten two' is close.
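The arithmetic behind the two candidate readings can be sketched in a few lines of Python. This is only an illustration: the gloss-to-value table and the function name `read_numeral` are hypothetical labels of mine, and the rule applied (a unit before 'ten' or 'thousand' multiplies it, everything else adds) is an assumption that fits this one example, not a general account of Tangut numerals.

```python
# Hypothetical gloss values for illustration; not a Tangut lexicon.
VALUES = {"two": 2, "ten": 10, "thousand": 1000}
MULTIPLIERS = {10, 1000}


def read_numeral(glosses):
    """Sum a gloss sequence, letting a unit multiply a following 'ten'/'thousand'."""
    total = 0
    pending = 0  # a unit waiting to see whether a multiplier follows
    for gloss in glosses:
        value = VALUES[gloss]
        if value in MULTIPLIERS:
            total += (pending or 1) * value  # bare 'ten' counts as 1 x 10
            pending = 0
        else:
            total += pending
            pending = value
    return total + pending


# Glosses as the characters sit on the page, left to right (from the table above).
page_order = ["year", "two", "ten", "thousand", "two"]
reading_order = page_order[::-1]  # the line is read right to left
numeral, unit = reading_order[:-1], reading_order[-1]

attested = read_numeral(numeral)  # 'two thousand ten two'
exotic = (2 + 10) + (1000 * 2)    # 'two ten thousand two'
print(numeral, unit, attested, exotic)
```

Both bracketings come out to the same total, which is why the order of the glosses, rather than the sum, has to serve as the hint.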
If Tangut had the more exotic word order
'two ten thousand two year' ((2+10) + (1000 x 2))
I would have listed the glosses in that order as a hint: 'two', 'ten', 'thousand'. Or I might not have asked the question. I'd be reluctant to ask someone to figure out that French
quatre-vingt-quatre
lit. 'four-twenty-four' ((4 x 20) + 4)
is 'eighty-four' unless I gave the hints 'four' and 'twenty'. And even then, one would not know if the structure of that numeral were
((4 x 20) + 4) = 84
or
(4 + (4 x 20)) = 84
without another example like
quatre-vingts
(4 x 20s) = 80
Moreover, one might even think that 'four-twenty-four' could be 'ninety-six' (4 x 24). Sanskrit has numerals like tri-nava 'three-nine' (3 x 9) for 'twenty-seven', though I haven't seen any Sanskrit numeral as complex as catuś-catur-viṃśati 'four-four-twenty' (4 x (4 + 20)).
(Skt catur is cognate to Eng four and Fr quatre. Its final -r becomes -ś before c-: catuś-catur. Skt viṃśati 'twenty' is cognate to Fr vingt.)
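To see how many values a bare sequence like 'four-twenty-four' could stand for, here is a minimal sketch that tries both bracketings with addition and multiplication in each slot. The function name `readings` is a hypothetical one, and restricting the operations to + and x is my assumption; real numeral systems may use others (e.g., subtraction).

```python
from itertools import product
from operator import add, mul


def readings(a, b, c):
    """Map every bracketing of a, b, c (in that order, using + and x) to its value."""
    results = {}
    operators = [("+", add), ("x", mul)]
    for (name1, op1), (name2, op2) in product(operators, repeat=2):
        # Left-branching: the first two elements combine first.
        results[f"(({a} {name1} {b}) {name2} {c})"] = op2(op1(a, b), c)
        # Right-branching: the last two elements combine first.
        results[f"({a} {name1} ({b} {name2} {c}))"] = op1(a, op2(b, c))
    return results


four_twenty_four = readings(4, 20, 4)
for structure, value in sorted(four_twenty_four.items(), key=lambda item: item[1]):
    print(structure, "=", value)
```

Without a second data point like quatre-vingts, nothing in the sequence itself picks out one of these structures.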
Next: Dissecting the Dragon (I originally meant to include that in this post, but I decided to separate the two topics.)
12.1.1.3:31: HAPPY SIW YEAR 2012
This siw (Tangut: 'new') year is associated with the vəi (Tangut: 'dragon'):
The five Tangut characters under 'dragon' say '2012 year'. Can you identify the characters for
1. 'two'
2. 'thousand'
3. 'ten'
4. 'year'
And can you figure out whether the line at the bottom is read from left to right or right to left?
No knowledge of Tangut is required. Logic is sufficient.
11.12.31.20:23: TRAGICAL TANGRAPHY
This description of a cryptic crossword clue reminded me of some tangraphic analyses:
15D Very sad unfinished story about rising smoke (8)
is a clue for TRAGICAL. This breaks down as follows.
15D indicates the location and direction (down) of the solution in the grid
"Very sad" is the definition
"unfinished story" gives "tal" ("tale" with one letter missing; i.e., unfinished)
"rising smoke" gives "ragic" (a "cigar" is a smoke and this is a down clue so "rising" indicates that "cigar" should be written up the page; i.e., backwards)
"about" means that the letters of "tal" should be put either side of "ragic", giving "tragical"
"(8)" says that the answer is a single word of eight letters.
There are many "code words" or "indicators" that have a special meaning in the cryptic crossword context. (In the example above, "about", "unfinished" and "rising" all fall into this category). Learning these, or being able to spot them, is a useful and necessary part of becoming a skilled cryptic crossword solver.
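The breakdown above can be replayed as string operations. This is a minimal sketch: the function name `solve_tragical` is hypothetical, and the point at which "tal" is split around "ragic" is an assumption, since the description only says the letters go "either side".

```python
def solve_tragical():
    """Rebuild the answer from the clue parts described above."""
    definition = "very sad"

    story = "tale"
    unfinished_story = story[:-1]  # "unfinished": drop the last letter

    smoke = "cigar"
    rising_smoke = smoke[::-1]     # "rising" in a down clue: write it backwards

    split = 1  # assumption: one letter goes before, the rest after
    answer = unfinished_story[:split] + rising_smoke + unfinished_story[split:]
    return definition, answer


definition, answer = solve_tragical()
print(definition, "->", answer, f"({len(answer)})")
```

Each step plays the role of an indicator: one truncates, one reverses, one wraps.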
Tangraphs have no components equivalent to "15D" or "(8)", but "very sad" is like a semantic element of a tangraph and "unfinished story" and "rising smoke" are like cryptophonetics in tangraphs: e.g.,
=
+
5916 1xã (transcription of Chinese 漢 *xã 'Chinese') =
all of 5882 1zaʳ 'Chinese' (cryptophonetic referring to its Chinese translation 漢 *xã 'Chinese'; a semantic compound of
=
+
'small' + 'insect') +
right of 0789 2ɣʊ 'the surname Ghu' (function unknown)
High-frequency elements like ヒ (alphacode: cin) on the right of 5916 and 547 other tangraphs might be like the "code words" or "indicators" of cryptic crossword puzzles.
The indicator "about" reminds me of the term
5258 1ʔɔ̣ 'round'
used in tangraphic analyses to mean 'take the surrounding elements of the preceding character': e.g., 2634 is made up of the surrounding elements of 2639 plus the right side of 3678:
=
2634 1dʒwiõ 'publicize; propagate; declare; spread; to name' =
2639 2miee 'name' (semantic)
5258 (take the surrounding elements of 2639)
3678 2to 'to be born; to rise' (semantic)
2705 (take the right side of 3678)
I translate 5258 as 'frame' in analyses: e.g.,
2634 = frame of 2639 + right of 3678
I still do not know whether the analyses from the Tangraphic Sea reflect the intent of the creator(s) of the script or were (independently?) devised later as mnemonic devices.
11.12.31.13:33: LEDYARD ON "THE SO-CALLED JURCHEN SCRIPT"
While looking through The Korean Alphabet for Middle Korean ss-words last night, I found this passage by Gari Ledyard (1997: 54; emphasis mine):
The so-called Jurchen script was more a code than a writing system; to this day its complete decipherment is unattained and probably unattainable given the few written texts that still exist. Although what exists is often partly decipherable because of surviving Sino-Jurchen glossaries, no one yet has figured out the principle of this writing - indeed it may not have had any. If it did no more than discourage Koreans from imitating it in developing their own writing, it made a noble contribution [to the development of hangul, the Korean alphabet].
14 years later, I have yet to see anyone explain the principle(s) of the Jurchen script. When I first took a serious look at it 15 years ago, it struck me as a random imitation of Chinese characters. Its strokes were mostly Chinese, but they weren't combined into phonetic or semantic elements recycled in multiple characters. Learning one character with a certain component would not help you learn the pronunciation or meaning of any other characters sharing that component. The recurring shape 山 has no apparent recurring function in Jurchen. How could anyone learn such a nonsystem of c. 1,000 characters, excluding variants? I used to think that the Jurchen script was to sinography what the Cherokee syllabary was to the Roman alphabet - a recycling of shapes without regard for phonetics.
Some [Cherokee] symbols do resemble the Latin, Greek and even the Cyrillic scripts' letters, but the sounds are completely different (for example, the sound /a/ is written with a letter that resembles Latin D).
However, my analogy was incorrect because the Jurchen elite were literate in Chinese, whereas Sequoyah was not literate in English. Sequoyah did not know how alphabets worked, so he independently invented a syllabary. The Jurchen, on the other hand, must have understood the semantophonetic principles of sinography, so why did they create a script that had no (obvious) principle?
Juha Janhunen did not think the Jurchen actually created a script. He viewed the Jurchen script as an offshoot of a Manchurian branch of the Chinese script:
| Proto-sinography | ||
| Sinography proper | Manchurian sinography | |
| (the existence of the Parhae script is still controversial) | ||
| Khitan large script | Jurchen (large) script | |
Although I think Janhunen is correct, his view leads to more questions. What was the principle of the Khitan large script? Why does the Manchurian sinographic tradition seem to be based on different principles (if any?) from mainstream sinography? Do the Khitan and Jurchen (large) scripts seem to lack principles because they were originally designed for a third language spoken in Parhae? That third language would most likely be Koreanic (or possibly even Japonic) since Parhae was a successor to Koguryo. But I don't recall seeing anything hinting at Japonic-based phonetic elements in the Khitan and Jurchen (large) scripts and my attempts to find Koreanic-based phonetic elements have been unconvincing:
Koreanic *an 'not' in "An-certain about Oxen in Jurchen"
Koreanic *on- 'to come' in "Getting Back on the Jurchen Track"
I am more interested in the Khitan and Jurchen (large) scripts than the Khitan small script because the principles of the former are a mystery, whereas the principles of the latter are at least somewhat understood, though the details remain hazy and the phonetic values of many symbols await identification.
The stacking principle of the Khitan small script (and the Jurchen small script?) is very reminiscent of the stacking principle of hangul. I am still not certain that this similarity is just a coincidence. Could the stacking in all three scripts reflect stacking in an earlier fourth script (i.e., Parhae) rather than Khitan and/or Jurchen influence on hangul? If the occasional ligatures of the Khitan large script such as
<
<muɣoo> < <mu> + <ɣoo> 'snake'
predate the Khitan small script, they could be forerunners to the stacking of the Khitan small script.
11.12.30.23:59: SSEQUENCES (SSIC)
While looking up 0586 in Li Fanwen (2008: 100) last night for the analysis of 1306 in line 95 of the Golden Guide, I saw the entry for 0585
2śjị 'cogon grass' (in Gong's reconstruction; mine is 2ʃɨị)
and wondered how it was pronounced in pre-Tangut if Tangut tense vowels (in rhymes 61-75 in Gong's reconstruction and mine) were conditioned by earlier *s-clusters:
*sCV > *CCV > *CCṾ > CṾ
Was 'cogon grass' once *sʃiH? (The -ɨ- in later 2ʃɨị is nonphonemic. /i/ is [ɨi] after alveopalatals: cf. Russian ши [ʃɨ]. *-H - a glottal stop or fricative - is the source of the second tone. *-H may ultimately be from an *-s in at least some cases.)
Just as Russian SS-clusters came from earlier *SVS-sequences: e.g.,
ссора < съсора (the prerevolutionary spelling) 'quarrel'
Tangut SS-clusters could have had similar origins: e.g.,
*sʃiH < *sɯʃiH 'cogon grass'
*ɯ is my cover symbol for a pre-Tangut vowel that conditioned high vowels in Tangut.
But the simple prefix *s- that Gong proposed may be another source.
A third source might be *h(V) or *χ(V) or *x(V): cf. Ramsey's (1997: 135) emphatic prefix *hɯ- in proto-Korean on the basis of the 雞林類事 Jilin leishi (1103-1104) transcription of 'to write' as
核薩 *xəʔ saʔ (cf. Late Middle Korean ssɯ-)
Qiang languages have χC- and xC-clusters. Ronghong Qiang has xs- corresponding to Mawo Qiang khs-:
RQ xsə : MQ khsə 'new' < *k-sə (the root is *sə, cognate to Tangut
1siw < *sik
and Old Chinese 新 *sin 'new')
Perhaps pre-Tangut *h- or *χ- or *x- could be from an even earlier *kV- that lenited to a fricative before fricatives after its vowel was lost: e.g.,
*kVSV > *kSV > *HSV > *SSV > *SSṾ > SṾ
I derive Tangut aspirates from pre-Tangut *k-C- if they alternate with nonaspirates: e.g.,
1pị < *s-pi 'to aim at' (*s- is a transitive verb prefix)
1phi < *k-pi 'aim' (noun)
There are no fricative-initial words with such alternations since Tangut has no aspirated fricatives. Perhaps the reflexes of *kS-clusters may be found among SṾ-words with SV-cognates: e.g.,
2sie < *Cɯ-seH 'to know; knowledge'
2siẹ (rather than 2shie) < *kɯ-seH 'knowledge'
12.31.11:36: The Tangut words for 'know' are cognate to Tibetan shes- 'to know'. According to von Koerber's rule*, sh- is from *sy-. So I could rewrite the Tangut derivations as
2sie < *sjeH 'to know; knowledge'
2siẹ (rather than 2shie) < *k-sjeH 'knowledge'
I would no longer need *ɯ-prefixes to account for the upward bending of *e to ie. ie would simply be a glide-vowel sequence /je/ reanalyzed as a diphthong /ie/.
Tangut -H was probably from an *-s corresponding to the final -s of Tibetan shes- < *syes-. So pre-Tangut *sjes and pre-Tibetan *syes were identical. (The choice of j or y for the palatal glide is merely a convention.) Of course, one should not expect all pre-Tangut and pre-Tibetan forms to be identical: e.g., Tangut ʃɨạ 'seven' is not cognate to Tibetan bdun 'id.'
Could Tangut 'six' and 'seven' share a *k(V)-prefix?
1tʃhɨiw < *k(ɯ)-trik or *k(ɯ)-drik 'six' (cf. Tibetan drug; could Tangut -i- be from a *-y- < *-u- that assimilated to a front vowel *-i- in the prefix?)
1ʃɨạ < *kɯ-ʃa 'seven'
(or did ʃ < *kʃ- < *ks- < *kɯ-ʃ-? cf. Skt kṣ [kʂ] < *ks)
(or did ʃ < *ʃt- < *st-? cf. Mawo Qiang stə 'seven' and German st [ʃt] < *st)
12.31.12:09: Here are several kinds of *s-/*k-presyllables and their effects on Tangut syllables.
I. Dropped without a trace
Presyllable vowel matches height class of following vowel:
*Cɯ-Ci > Ci
*Cʌ-Ca > Ca
II. Dropped with a trace
Presyllable vowel height causes following vowel to bend:
*Cɯ-Ca > *Cɯ-Cia > Cia
*Cʌ-Ci > *Cʌ-Cəi > Cəi
III. Fused before presyllabic vowel (if any) can condition lenition
*s(V)-CV > *sCV > *CCV > *CCṾ > CṾ
*k(V)-CV > *kCV > ChV (if C is not a fricative; Sh- is not possible)
*k(V)-SV > *kSV > *xSV > *hSV > *SSV > *SSṾ > SṾ (S is any fricative)
IV. Fused after presyllabic vowel conditioned lenition
*sV-sV > *sV-zV > *szV > *zzV > *zzṾ > zṾ
*kV-tsV > *kV-dzV > *kV-zV > *kzV > *gzV > *ɣzV > *ɦzV > *zzV > *zzṾ > zṾ
I couldn't think of cover symbols for 'lenited fricative' or 'lenited nonfricative', so I gave specific examples above.
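Types I and II above reduce to a single comparison of vowel height, which can be sketched as follows. The dictionaries and the function name `reflex` are hypothetical labels for illustration; only the four vowel combinations listed above are covered, and types III and IV (fusion and lenition) are deliberately left out.

```python
# Height classes as used in the examples above (illustrative labels only).
PRESYLLABIC_HEIGHT = {"ɯ": "high", "ʌ": "low"}
MAIN_HEIGHT = {"i": "high", "a": "low"}

# Outcome when the presyllabic vowel's height differs from the main vowel's.
BENT_VOWEL = {"a": "ia", "i": "əi"}


def reflex(presyllabic_vowel, main_vowel):
    """Return the Tangut reflex of *CV-CV for types I and II only."""
    if PRESYLLABIC_HEIGHT[presyllabic_vowel] == MAIN_HEIGHT[main_vowel]:
        return "C" + main_vowel          # type I: dropped without a trace
    return "C" + BENT_VOWEL[main_vowel]  # type II: dropped with a trace


for pre in ("ɯ", "ʌ"):
    for main in ("i", "a"):
        print(f"*C{pre}-C{main} > {reflex(pre, main)}")
```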
Many consonants merged in lenition:
| Consonant class (Homophones chapter) | Before lenition | After lenition |
| Labials (I) | *-p-, *-ph-, *-b- | v- |
| Dentals (III) | *-t-, *-th-, *-d- | l- |
| Alveolars (VI) | *-s-, *-ts-, *-tsh-, *-dz- | z- |
| Alveopalatals (VII) | *-ʃ-, *-tʃ-, *-tʃh-, *-dʒ- | ʒ- |
| Velars (V, VIII) | *-x-, *-k-, *-kh-, *-g- | ɣ- |
Perhaps the glottal stop and sonorant consonants (nasals, liquids, and glides including v- /w/) did not lenite.
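The mergers can be expressed as a lookup from each pre-lenition consonant to its lenited outcome. This is a minimal sketch with hypothetical names (`LENITION`, `lenite`); returning other consonants unchanged reflects the tentative "perhaps" above, not an established result.

```python
# Lenited outcome -> the pre-lenition consonants that merged into it.
LENITION = {
    "v": ["p", "ph", "b"],
    "l": ["t", "th", "d"],
    "z": ["s", "ts", "tsh", "dz"],
    "ʒ": ["ʃ", "tʃ", "tʃh", "dʒ"],
    "ɣ": ["x", "k", "kh", "g"],
}

# Inverted table: pre-lenition consonant -> lenited outcome.
LENITED = {source: outcome for outcome, sources in LENITION.items() for source in sources}


def lenite(consonant):
    """Return the lenited reflex, or the consonant unchanged if it is not in the table."""
    return LENITED.get(consonant, consonant)


print(lenite("p"), lenite("dz"), lenite("kh"), lenite("m"))
```

The inversion makes the loss of information visible: eighteen earlier consonants map onto five outcomes, so a lenited initial alone cannot tell us which consonant it came from.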
*I use Nathan Hill's (2011) names for Tibetan sound laws.
11.12.29.20:49: THE GOLDEN GUIDE: LINE 95: TANGRAPHS 471-475
95. Three out of five tangraphs are transcriptive characters not associated with any specific morpheme:
| Tangraph number | 471 | 472 | 473 | 474 | 475 |
| Tangraph | [image] | [image] | [image] | [image] | [image] |
| Li Fanwen number | 1936 | 0707 | 4660 | 3774 | 1306 |
| My reconstructed pronunciation | 2xɛ̃ | 1tʃɨw | 1ʔiã | 2ʃɨõ | 1kiõ |
| Tangraph gloss | (transcription of Chinese) | district | (transcription of Chinese) | to guard | (transcription of Chinese) |
| Word | the surname 解 Xie (*xɛ) | the surname 周 Zhou (*tʃɨw) | the surname 燕/閆/鄢 Yan (*jã) | the surname 尚/商/賞 Shang (*ʃɨõ) or 昌/常 Chang (*tʃhɨõ) | the surname 龔/弓/宮/鞏 Gong (*kiũ) or 姜 Jiang (*kiõ) |
| Translation | Xie, Zhou, Yan, Shang/Chang, Gong/Jiang | ||||
471: 'High' on the left of 1936 is an abbreviated phonetic. Was there a Xie family that raised livestock?
=
+
1936 2xɛ̃ (transcription of Chinese) =
left of 2949 2xɛ̃ 'skill' +
all of 2306 1pə 'small livestock'
I am not sure that 1936 should be reconstructed with a nasal vowel. It could transcribe Chinese syllables with oral and nasal vowels:
解薤 *xɛ
衡 *xɛ̃
Perhaps 1936 was 2xɛj with a -j (cf. Gong's reconstruction 2xiəj).
(12.30.13:30: Li Fanwen 2008: 322 phonetically glossed 1936 as 郝 *xa, but the vowel doesn't match.)
472: 0707 is a semantic compound:
=
+
0707 1tʃɨw 'district' (borrowed from Chn 州 *tʃɨw) =
bottom left of 1408 1lhiooʳ 'place, site, market, street, military formation' +
left of 2627 2lɨə̣ 'earth'
473: Were the components of 4660 meant to be reminiscent of Chn 炎/焱/焰 *jã 'flames'? 炎 and 焱 both consist of multiple 火 fires.
=
+
4660 1ʔiã (transcription of Chinese) =
bottom right of 4408 1məə 'fire' +
left of 5659 1veʳ 'flourishing, luxuriant'
I am not sure whether 4660 had a simple initial j- (as reconstructed by Arakawa) or an initial ʔ- (as reconstructed by Gong). I chose ʔ- because of its fanqie initial speller 0932:
But perhaps 0932 also had initial j-. 0932 transcribed Chinese syllables which were *ʔi and *ji in Middle Chinese. It is not clear whether the *ʔi/*ji distinction survived into Tangut period northwestern Chinese.
=
+
4660 1ʔiã (transcription of Chinese) =
0932 1ʔɨi 'many, more, much' +
1102 1kiã (transcription of Chinese)
474: I suppose guarding enables the guarded to evade the effects of evil, but I would have expected a semantic compound like 'evil' + 'shield':
=
+
3774 2ʃɨõ 'to guard' =
left of 3551 2niõ 'evil, wicked, bad' +
center and right of 3789 1phie 'to escape, evade'
(12.30.12:21: Possibly borrowed from Chn 避 *phi 'to avoid'? But I would expect that to correspond to Tangut 1phi, not 1phie. Tangut -ie matches the -ie of Early Middle Chinese *bieh, but the initials don't match. Could Tangut ph- be from *k-b- with a native prefix *k- rather than from Tangut period NW Chn *ph- from EMC *b-?)
3774 could represent affricate-initial as well as fricative-initial Chinese syllables:
章 *tʃɨõ
昌 *tʃhɨõ
Why not transcribe those syllables with tangraphs for tʃɨõ and tʃhɨõ, syllables which existed in Tangut?
Although all Chinese syllables transcribed with 3774 had nasal vowels, Gong reconstructed it as 2ɕjow and I wonder if its rhyme was -ow with a nasal vowel. Gong's glide codas correspond to my nasal vowels in his rhyme groups VIII and XI:
| Rhyme group | Rhyme | Grade | Gong | This site (nasal interpretation) | This site (glide interpretation) |
| VIII | 41 | I | -əj | -ẽ | -ej |
| VIII | 42 | II | -iəj | -ɛ̃ | -ɛj |
| VIII | 43a | III | -jɨj | -ɨẽ | -ɨej |
| VIII | 43b | IV | -jɨj | -iẽ | -iej |
| XI | 56 | I | -ow | -õ | -ow |
| XI | 57 | II | -iow | -ɔ̃ | -ɔw |
| XI | 58a | III | -jow | -ɨõ | -ɨow |
| XI | 58b | IV | -jow | -iõ | -iow |
(I have excluded tense, retroflex, and long vowel rhymes for simplicity. Unlike Gong, I recognize a Grade IV distinct from Grade III.)
2ʃɨow without a nasal vowel is close to Chn 守 *ʃɨw 'to guard', but I doubt the former was borrowed from the latter because the vowels don't match. I would expect Chn *ʃɨw to correspond to ʃɨw, a syllable that exists in Tangut.
475: 1306 represented the 龔 Gong of the late Tangutologist 龔煌城 Gong Hwang-cherng in the Forest of Categories.
1306 1kiõ was not a perfect match for Chn 龔/弓/宮/鞏 *kiũ but it was the best available match other than 1kiu. There was no Tangut rhyme -iũ.
Were any Gongs or Jiangs related to Su and/or Qian families?
=
+
1306 1kiõ (transcription of Chinese) =
0586 2siu (transcription of Chinese: e.g., the surnames 蘇 *su [without *-i-!] and 宿 *siu, now both Su in modern standard Mandarin)
3277 2tshia (transcription of Chinese: e.g., the surname 錢 *tshiã, now Qian in modern standard Mandarin)
3277 only transcribed Chn 千潛賤淺錢踐 *tshiã with a nasal vowel even though it belongs to the oral vowel rhyme group IV rather than the nasal rhyme group V. Were Tangut period northwestern Chinese vowels losing nasalization?
11.12.28.23:15: THE GOLDEN GUIDE: LINE 94: TANGRAPHS 466-470
94. Four out of these five are transcription characters not associated with any specific morpheme:
| Tangraph number | 466 | 467 | 468 | 469 | 470 |
| Tangraph | [image] | [image] | [image] | [image] | [image] |
| Li Fanwen number | 5916 | 2152 | 2635 | 2138 | 3617 |
| My reconstructed pronunciation | 1xã | 1ʃɨi | 1xiõ | 2bəəu | 2xwe |
| Tangraph gloss | (transcription of Chinese) | (transcription of Chinese) | (transcription of Chinese) | grave | (transcription of Chinese) |
| Word | the surname 韓 Han (*xã) | the surname 施/史/時/石/師 Shi (*ʃɨĩ)? | the surname 馮/鳳/豐/酆/封 Feng (*fɨũ) or 方/房 Fang (*fɨõ) or 向 Xiang (*xɨõ) | the surname 慕 Mu (*mbəu)? | the surname 惠 Hui (*xwej) |
| Translation | Han, Shi, Feng/Fang/Xiang, Mu, Hui. | ||||
466: 5916 has 5882 as a cryptophonetic (its Chinese translation was 漢 *xã) plus the mysterious right-hand element ヒ (alphacode cin):
=
+
5916 1xã (transcription of Chinese 漢/韓/邯 *xã) =
all of 5882 1zaʳ 'Chinese' +
right of 0789 2ɣʊ 'the surname Ghu'
Does 0789 represent a Ghu family related to the Han?
The 馬韓 Mahan confederacy in Korea was called
2bæ 1xã (cf. Tangut period NW Chn *mbæ xã)
so Korea, the 韓國 'Han country', might be known as
1xã 2lhiẹ 'Han country'
in modern Tangut. Only two strokes (cin) would distinguish 'Korea' from 'Chinese'!
<>
1xã 'Korea' <> 1zaʳ 'Chinese'
467: Were the Shi the 'elder Nga'?
=
+
+
2152 1ʃɨi (transcription of Chinese) =
2888 2mə 'surname' +
1633 2pəụ 'elder' +
2075 2ŋa 'the surname Nga'
468: 2635 looks like a combination of 'earth' (indicating a geographic name? from which tangraph?) plus an element of unknown function (alphacode: dol) found in only eight other tangraphs that don't sound like xiõ.
=
+
Although Nishida and Arakawa have reconstructed Tangut f-, I am skeptical because Chinese f-syllables were transcribed with tangraphs like this one listed in chapter VIII (glottal initials) of Homophones. (Velar x- is treated as a glottal initial and may have been glottal [h].)
469: The analysis of 2138 is unknown. It looks like 'earth' (cf. the 土 'earth' in Chn 墓 'grave') plus 'hand' plus a right-hand element of unknown function (alphacode: dal) found in 80 other tangraphs:
=
+
+
2bəəu 'grave' is borrowed from Tangut period northwestern Chinese 墓 *mbəu 'id.' The reason for the Tangut long vowel is unknown. Could it compensate for the loss of a native Tangut suffix?
*CV-X > CVV
470: The analysis of 3617 is unknown:
=
+
+
Its left component is 'person' but I don't know what the other two (alphacodes bal and juu) are doing. The sequence baljuu does not occur anywhere else. There are no other tangraphs pronounced xwe.
12.29.1:25: Could 'person' be from 2888 'surname' as in 2152 above?
11.12.27.23:45: THE ROOTS OF RAWNESS
Having just written about the etymology of Zhuang sawgun 'Chinese character', I should write about the etymology of the second half of sawndip. (The saw is the same.)
Despite the spelling, ndip 'raw' is [ɗip7] without a nasal. d without n is unaspirated [t] in Zhuang spelling. This usage is a carryover from Pinyin* in which d and t respectively represent unaspirated [t] and aspirated [th]. Zhuang has no [th], though it does have a [θ] written as s. The n of nd [ɗ] differentiates it from d [t]. The 1957-1982 spelling of [ɗ] was Ƌ, which might be a mirror image of the 1957-1982 letter Ƃ [ɓ] as well as a derivative of d.
Zhuang even-numbered tones usually developed in syllables with *voiced initials, but syllables with voiced implosive initials developed the odd-numbered tones associated with *voiceless initials:
| *Proto-voicing | *Proto-initial | Tones |
| voiceless | *p-, *t- ... | 1, 3, 5, 7 |
| voiced | *ɓ-, *ɗ- ... | 1, 3, 5, 7 |
| voiced | *b-, *d- ... | 2, 4, 6, 8 |
Tone 1 is not indicated in spelling. Tones 2-6 are indicated by silent letters following a syllable:
| Tone | 1957 spelling | 1982 spelling |
| 2 | -ƨ | -z |
| 3 | -з | -j |
| 4 | -ч | -x |
| 5 | -ƽ | -q |
| 6 | -ƅ | -h |
Note how similar the 1957 letters are to the numerals 2-6 and the Cyrillic letters г (italic), з, ч, and ь. (ƽ doesn't look like any Cyrillic letter.)
h can also be an initial letter in Zhuang, but z, j, x, and q are always tonal.
Syllables ending in stops can only have tones 7 and 8 which are indicated by the spelling of the stops:
| Tone | Spelling |
| 7 | -p, -t, -k |
| 8 | -b, -d, -g |
Tones 7 and 8 are identical to 5 and 6, but this spelling convention avoids final digraphs like -pq for -p with tone 5, etc.
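The spelling conventions above amount to a lookup on the last letter of a syllable. Here is a minimal sketch; the function name `tone_of` and the table names are hypothetical, it handles one syllable at a time, and it assumes the syllable has already been split off from any compound (e.g., saw and gun rather than sawgun).

```python
# Silent tone letters from the tables above.
TONE_LETTERS_1982 = {"z": 2, "j": 3, "x": 4, "q": 5, "h": 6}
TONE_LETTERS_1957 = {"ƨ": 2, "з": 3, "ч": 4, "ƽ": 5, "ƅ": 6}

# In stop-final syllables the spelling of the stop itself marks the tone.
STOP_TONES = {"p": 7, "t": 7, "k": 7, "b": 8, "d": 8, "g": 8}


def tone_of(syllable, tone_letters=TONE_LETTERS_1982):
    """Return the tone number implied by the spelling of a single Zhuang syllable."""
    final = syllable[-1]
    if final in tone_letters:
        return tone_letters[final]
    if final in STOP_TONES:
        return STOP_TONES[final]
    return 1  # tone 1 is unmarked


for word in ("ndip", "gun", "goenz", "denj", "cih", "saw"):
    print(word, tone_of(word))
```

Only the final letter is inspected, so an initial h- (a real consonant) is never mistaken for the tone-6 letter.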
Li Fang-Kuei (1977: 129) reconstructed 'raw' in Proto-Tai as *dl/rip. The reflexes of PT *dl/r- in 'raw' vary from d- to ɗ- to n- to r-. Some of the sawndip spellings of ndip imply earlier phonetic similarity with Middle Chinese *ɳ- (which is r-like) and *l-:
生 'raw' + 尼 *ɳi
立 MC *lip + 生 'raw'
生 'raw' + 立 MC *lip
月 < 肉 'meat' + 立 MC *lip
立 MC *lip by itself
Other spellings have no (?) phonetic:
米 'rice' + 生 'raw' over 失 'lose'
生 'raw' + 勺 'ladle'
㐅 '?' + 力 *lɨk 'strength' (phonetic?; *-k is a grave consonant like -p)
㐅 appears in at least 13 sawndip characters. I don't know what its function is.
Sawndip may be the earliest indigenous Tai writing system. It would be interesting to reexamine existing reconstructions of Proto-Tai with sawndip evidence in mind. Although the spellings of ndip 'raw' imply an earlier liquid or even nasal, Pittayaporn (2009) reconstructed Proto-Tai 'raw' as *C̥.dip without either a liquid or a nasal. *C̥- is a presyllable with a voiceless initial. Pittayaporn compares his PT *C̥.dip with Blust's Proto-Austronesian *quDip. PAN *D is [ɖ]. Could PAN *quɖip have been borrowed into Proto-Kra-Dai as *qudrip**, simplifying to PT *C̥.dip and Norquest's (2008: 277) Proto-Hlai *Curiip and Proto-Be *Curjəp? Or did PKD inherit 'raw' from an ancestor shared with PAN or even from PAN itself?
Benedict's Austro-Tai (Austro-Kra-Dai in modern terminology?)
| Proto-Austro-Tai (Proto-Austro-Kra-Dai) | |
| Austronesian | Kra-Dai (including Tai) |
Sagart: Kra-Dai as (Sino-)Austronesian subgroup
| Proto-Sino-Austronesian | |||
| Sino-Tibetan | Proto-Austronesian | ||
| Non-Muic subgroups of Austronesian | Muic | ||
| Non-Kra-Dai subgroups of Muic | Kra-Dai | ||
I used to be highly skeptical of a connection between Kra-Dai and Austronesian. As Pittayaporn wrote,
Benedict’s [Austro-Tai] work has been rightly criticized for its methodology and the quality of its evidence.
However, I now see
Undeniable evidence (Benedict 1942, Sagart 2004, and Ostapirat 2005) for some kind of relationship
But I have no opinion about which kind of relationship exists between Kra-Dai and Austronesian. For now, I can only recommend Pittayaporn (2009) as an overview of compression phenomena which may be relevant to compression in the histories of Chinese and Tangut.
*Not all Zhuang letters are used as in Pinyin. Exceptions:
Zh c is [ɕ] like Pinyin x, not Pinyin c [tsh]. Zhuang has no aspirates.
Zh j, x, q, z are tonal letters (see above), not consonants as in Pinyin.
Zh s is [θ], not [s] as in Pinyin.
Zhuang spelling indicates short vowels with an added -e- in closed syllables, whereas Pinyin has no devices for vowel length:
| Short | ae [a] | oe [o] |
| Long | a [aa] | o [oo] |
There is no length distinction in open syllables.
**12.28.00:11: A presyllable initial *q- is reconstructible in Proto-Kra-Dai on the basis of Buyang qaɗip 'raw' (Li Jinfang 1999 as cited in Sagart 2004: 50).
11.12.26.23:59: SAWGUN STRATOGRAPHY?
I was surprised to see this in Wikipedia's "Sawndip" entry (emphasis mine):
The Zhuang word for Chinese characters used in the Chinese language is sawgun (Sawndip: (史+書)倱; lit. "original writing system") (saw meaning character or book, and gun meaning the Han Chinese ethnicity, cognate to 漢)
According to Sawndip sawdenj (Sawndip Dictionary), gun [kun1] means 汉 = 漢 'Chinese', not 'original'.
The Zhuang were in contact with Cantonese speakers. In standard Cantonese, 漢 is [hɔn5] from Middle Chinese *xanh which in turn may be from Old Chinese *hnars. If gun were "cognate to" 漢, I would expect it to be hoenq [hon5], hanq [haan5], or nanq [naan5]. No Chinese language known to me has
- initial k-
- the vowel -u-
- tone 1
in 漢 'Chinese'.
Sawndip sawdenj lists three sawndip spellings for gun on p. 208:
倱 = 亻 'person'* + phonetic 昆 Ct [kwan1] < MC *kon < OC *kun
軍 Ct [kwan1] < MC/late OC *kun < OC *kur 'army'
倌 = 亻 'person' + phonetic 官 Ct [kuun1] < MC/late OC *kwan < OC *kwan 'government official'
At first I thought that the third spelling of gun might be the key to its etymology. The Zhuang may have heard Cantonese speakers refer to themselves as 官 [kuun1] 'officials' and come to call the Chinese gun 'officials'. The problem is that Ct [kuun1] has a long vowel, whereas Zh gun [kun1] has a short vowel. Vowel length is phonemic in Zhuang, so Ct [kuun1] should correspond to a Zh guen [kuun1]. Moreover, the shift of MC *wa to Ct [uu] was sometime within the last millennium. (Sino-Vietnamese from the late Tang Dynasty still has [waa] corresponding to MC *wa.) The Zhuang have been in contact with the Chinese for much longer than that, so their name for the Chinese would probably be older than a millennium.
MC (and late OC) 軍 *kun 'army' is a perfect phonetic match for Zh gun [kun1] 'Chinese', though the semantic match is loose. Did the Zhuang hear Chinese soldiers speaking about a *kun and adopt that term as the name for their occupiers and even the civilians associated with them?
The "strato" in the title is from Greek στρατός 'army' and refers to Zh gun. Of course, "graphy" is also from Greek and refers to Zh saw [θaɰ1]. I also considered the title "Stratobiblion" because I think saw is borrowed from Chinese 書 'book, script' which is also the phonetic in three of its sawndip spellings from p. 451 of Sawndip sawdenj:
史 'history' + 書 'script'
書 'script' + 青 'green' (implying 'not ripe'; cf. ndip [ɗip7] 'immature' in sawndip 'Zhuang writing', lit. 'immature writing'.)
字 'character' + 書 'script'
字 'character' by itself can also represent Zh saw. Zh sawdenj 'dictionary' is a calque of Chinese 字典 'dictionary' = 'character book'. Zh denj [teen3] sounds like a borrowing from 典 MC *t(i)enʔ 'reference book'. (Oddly, denj has no entry in Sawndip sawdenj. I presume it was written as 典 without modification.)
A fourth spelling (土 atop 卜) might be a simplification of 書 'script'. Compare 土 atop 卜 to these cursive forms of 書.
Zh saw [θaɰ1] could be an attempt to imitate Cantonese 書 [søɥ1] which has a vowel and glide absent from Zhuang. However, I doubt the word is a recent loan. Its rhyme also matches the *-aɰ that Pulleyblank reconstructed for the Old Chinese rhyme category of 書, though most would reconstruct that category as *-a.
12.27.15:09: Sino-Vietnamese has two layers of correspondences for that Chinese rhyme category:
-ư [ɨɨ] (newer layer from Late Middle Chinese: e.g., 書 thư)
-ưa [ɨə] (older layer from Early Middle Chinese; no known loan of 書 from this layer)
I am surprised Zh saw isn't sw [θɯ1]: cf. Proto-Tai *sɯ A 'writing' from Chn 書.
One might think that PT *-ɯ became Zh -aw [aɰ], but no such change is reflected in
PT *mɯ A > Zh mwz [mɯ2] 'hand'
and Zh -aw [aɰ] corresponds to PT *-aɰ:
PT *ɓaɰ A > Zh mbaw [ɓaɰ1] 'leaf'
This does not necessarily mean that all Zh -aw [aɰ] are from PT *-aɰ. Nonetheless, I suspect that Zh saw is a very old loan preserving the rhyme *-aɰ without the nonlow vowels reflected in the later Vietnamese borrowings from Middle Chinese.
One might also propose a connection between Zh saw and PT *dʑɯ B 'name', possibly from Early Middle Chinese 字 *dzɨh* 'character, name'. (PT had no *dz-.) However, Chinese *dz- corresponds to Zh c-, not Zh s-. The Sino-Zhuang version of 字 is cih [ɕi6], written as 字+之. The front vowel [i] indicates that the borrowing occurred
- after the fronting of EMC *ɨ to *i (Sino-Vietnamese chữ 'character' predates this change)
- before the shift of *dzi to *dzzˌ (Sino-Vietnamese tự 'character' reflects the latter)
Next: The Roots of Rawness
*12.27.1:05: By coincidence, the native Zhuang word goenz [kon2] 'person' vaguely resembles gun 'Chinese', but its second tone derives from an earlier voiced initial: [kon2] < *gon. It is cognate to Thai คน khon < Proto-Tai *ɣon 'person'.
11.12.25.23:59: TANGUT THROUGH TIBETAN (PART 4: CONCLUSION)
This concludes my comments on Andrew West's observations on Tangut in Tibetan transcription:
Although most Tibetan glosses do approximately correspond to the modern phonetic reconstructions of the corresponding Tangut characters, the correspondence is disappointingly poor, with only a very few characters showing an exact correspondence between Tangut reconstruction and Tibetan transcription (e.g.
L[i Fanwen 2008 #] 2098 "I, me"
which is reconstructed *ŋa and glossed ŋa ... which also happens to be the Tibetan word for "I, me").
This correspondence, though vague at best, was sufficient to identify most of the consonant classes in the monolingual Tangut Homophones dictionary (see part 3). However, it is certainly not sufficient to identify the 105 rhymes of the Tangraphic Sea.
In most cases the Tibetan glosses miss out what should be essential phonetic features, for example transcribing *mja as ma, *ŋwu as ŋu, *ɣjɨ̣ as rgi, *war as wa, *lew as li, and *lhjwịj as lhi.
These nonmatches belong to at least four categories:
1. Random errors (slips of the brush) and pseudoerrors caused by damage to the manuscripts preventing us from seeing vowel symbols that were once there: e.g.,
- a hole above the base consonant of a gloss could cause us to read it as <Ca> with the default vowel <a> instead of the ི <i>, ེ <e>, or ོ <o> that was once above it
- a hole below the base consonant of a gloss could cause us to read it as <Ca> with the default vowel <a> instead of the ུ <u> that was once below it
2. Nonmatches involving Tangut features that did not exist in Tibetan: e.g., there were no consonant clusters ŋw- and lhjw- in Classical Tibetan, so it's not surprising that such Tangut clusters were glossed as <ng> and <l> even though <ngw> and <lhyw> would have been ideal.
3. Nonmatches involving Tangut features that did exist in Tibetan: e.g., Tangut mja could easily have been glossed as Tibetan <mya> instead of <ma>. Why wasn't it? None of the 87 complete glosses in Tai (2008: 209-210) for syllables reconstructed with -ja by Gong have <y>, so the eight instances of <ma> for expected <mya> cannot be disregarded as a random error.
4. Nonmatches involving Tibetan letters corresponding to nothing in standard Tangut: e.g., the <r> of <rgi> for standard Tangut ɣjɨ̣.
The third category makes me think that, in Andrew's words,
the modern reconstructions of Tangut are seriously flawed (a possibility I can't reject)
At present I do not think any reconstruction of Tangut - not even any of my own - is anywhere near accurate. I expect a major overhaul of my reconstruction once I analyze the Tangut rhyme tables. I will do that after I finish the translation of the Golden Guide.
The fourth category makes me think that the glosses reflect a nonstandard variety of Tangut: e.g., <rgi> reflected nonstandard ɣjɨ̣̣ʳ with a retroflex vowel rather than standard ɣjɨ̣ with a nonretroflex vowel. Even the most accurate reconstruction of the standard dialect may not match the dialect(s) reflected in the glosses.
It does not help that I don't know what dialect(s) of Tibetan underlie the glosses. That problem would have to be solved by a Tibetologist like Nathan Hill with a background in Tangutology. Converting the glosses into any Classical Tibetan romanization system does not necessarily generate a result resembling their intended pronunciations.
What happens when we look at language A transcribed by speakers of language B and assume that we are looking at language A' transcribed by speakers of language B'?
Suppose we know that the Russian word for 'place' was место [ˈmʲɛstə] and we know how to pronounce Chinese characters in modern standard Mandarin. What if we find the transcription
滅陀
Md mietuo [mjɛthwɔ]
for what we assume to be место? We might wonder why there was no attempt to transcribe the [s] and why Rus [tə] didn't correspond to Md 特 te [thə]. (Let's assume Md [t] was already used as a transcription of Rus [d], so Md [th] was chosen as a transcription of Rus [t].)
But we didn't know that the transcription was made by a Cantonese speaker who heard Ukrainian місто [mistɔ] and intended 滅陀 to be pronounced [mitthɔ]. Cantonese [tt] corresponds to Ukrainian [st].* Md [jɛ] does happen to correspond to Rus [ʲɛ], but the scribe really intended Cantonese [i] to correspond to Ukrainian [i]!
| Later misinterpretation | Russian | m | ʲɛ | s | t | ə |
| | Mandarin | m | jɛ | - | th | wɔ |
| Transcription | | 滅 | | | 陀 | |
| Original intent | Cantonese | m | i | t | th | ɔ |
| | Ukrainian | m | i | s | t | ɔ |
Is it too much to expect accuracy? Were
the Tibetan scribes were content to provide a very approximate representation of Tangut, so approximate that it is hard to imagine that a Tangut speaker could have understood much that a Tibetan reading the Tibetan transcriptions of Tangut was saying[?]
On the one hand, I do find Andrew's theory attractive:
So what was the purpose of the Tibetan transcriptions? My theory is that they were intended for Tibetan monks to be able to chant in unison with their Tangut colleagues, not knowing what they were chanting or needing to chant perfectly, but just vaguely correct enough to be able to chant along without sticking out like a sore thumb. Maybe the Tibetan monks who made the transcriptions did not speak a word of Tangut, and they just wrote down what they thought they heard, which would explain why the transcriptions are so imprecise.
On the other hand (emphasis mine),
Thirdly, the Tibetan glosses utilise prefix letters (g, d, b, m and ') and superfixed letters (s, r and l) in a way that suggests they might have been intended to indicate a particular pronunciation of the corresponding Tangut character, but it is not immediately obvious what this might have been (it has been suggested that these nominally silent letters may have been intended to represent tone in Tangut, but I am not convinced), and they are used inconsistently (e.g. L1245 ·jij is glossed as either ye or g.ye). Likewise, the glosses frequently use a final letter -'a, seemingly to indicate a long vowel, but again it is used inconsistently (e.g. L1278 ·jɨ is glossed as either g.yi or g.yi'). Perhaps the oddest feature of the Tibetan transcriptions is the use of prefix letters in front of letters that do not allow prefix letters in standard Tibetan orthography, for example d.wi དཝི and g.ru' གརུའ. This feature occurs across different manuscripts, and could suggest that the scribes were actually using a formally defined orthography for transcribing Tangut, and not just putting down what they could hear, as I suggested above.
Some hypotheses about these letters:
- I long assumed that the preinitial letters represented real consonants not preserved in standard Tangut, but am inclined to view many of them as attempts at tonal spelling. An exception is preinitial <b> which may indicate a Tangut medial -w- (Nie 1986). This usage may help us identify the underlying Tibetan dialect(s) of the glosses.
Apparent exceptions to Arakawa's (1999) tonal spelling interpretation (preinitial = level tone, no preinitial = rising tone)
1542 1ku (Gong), glossed as <gu> as well as <b.ku> and <H.ku>
2999 2swu (Gong), glossed as <b.zu> instead of <s(w)u>
(1:23: Were <b.z> and <s> pronounced with different tones in the Tibetan dialect underlying this transcription?)
may be due to tone sandhi in compounds or within phrases. All studies I have seen of the Tibetan glosses take the glosses out of context, so my hypothesis has yet to be tested.
- Preinitial <m> and <H> (Andrew's <'>) could represent prenasalization.
- Final <H> may represent a final consonant that only appears in certain phonological environments: e.g.,
*CVC >
CV if the next syllable begins with a (certain type of?) consonant
CVH if the next syllable begins with a vowel (and/or a certain type of consonant?)
- Preinitial <r> and final <r> could represent vowel retroflexion, which may also be present in words lacking retroflex vowels in standard Tangut (see example above).
Future studies of the glosses may lead to the birth of Tangut dialectology.
*12.26.2:36: I am assuming that the transcriber does not know Cyrillic and is not influenced by spelling.
Cantonese still preserves the final stops of Middle Chinese. Neither Cantonese nor MC had syllables with final *-s or initial *st-. I found two instances of Indic st-type sequences transcribed as MC *-tt(h?)- in Soothill (1937): e.g.,
Prakrit Kustana as 屈丹 MC *khuttan
Sanskrit Veṣṭana (or an unattested Pali *Veṭṭhana?) as 別他那 MC *bɨetthana
Unfortunately, I haven't been able to find any examples of Cantonese [VttV] corresponding to English [VstV] in the pages of Bauer and Benedict (1997) visible at Google Books. I suspect no such cases exist because borrowings from English are likely to be influenced by English spelling.
I wonder if any Cantonese speaker has ever pronounced English [VstV] as [VttV]. If not, then my место example is invalid.
Perhaps this would be a better example of the same kind of problem. Suppose we are puzzled by Russian от [ɔt] 'from' transcribed as what we think is supposed to be Mandarin 威地 weidi [wejti] ... which in fact was meant to be a Cantonese [wajtej] transcribing Ukrainian від [ʋid]. (There is no Cantonese [ʋi(t)], [vi(t)], or [wi(t)]. Cantonese syllables cannot end in voiced stops.)
| Later misinterpretation | Russian | - | ɔ | t | - |
| | Mandarin | w | ej | t | i |
| Transcription | | 威 | | 地 | |
| Original intent | Cantonese | w | aj | t | ej |
| | Ukrainian | ʋ | i | d | - |
(Tables added 12.26.13:46.)
11.12.24.23:59: TANGUT THROUGH TIBETAN (PART 3)
Now I can finally start my comments on Andrew West's observations on Tangut in Tibetan transcription:
The Tibetan glosses are particularly difficult (for me at least) to read as they are generally written in an untidy, cursive, headless script in which many letterforms are very similar to other letterforms (e.g. the letters ng ང, d ད and ra ར all look almost identical in some hands), and without context it can be difficult to be sure exactly what letters are intended.
When I first started learning Tibetan 18 years ago, I had difficulty distinguishing ང <ng> and ད <d> in a clear computer font. Looking back, I'm surprised I didn't confuse ར <r> with them. The Tibetan glosses of Tangut are indeed hard to read. Until a few years ago, I had never seen the actual glosses themselves and relied on Nevsky's 1926 handwritten copies of them. I acquainted myself with their script using Nevsky's romanization.
For this reason, in many cases the identification of the Tibetan gloss can only be determined with certainty by reference to the reconstructed reading of the corresponding Tangut character.
This sounds more circular and dangerous than it actually is. Andrew mentions a case in which a gloss looks like it could be <ngu>, <du>, or <ru>. Let's pretend we don't know anything about Tibetan. Using Chinese glosses alone (risky, I know) to avoid circularity, we can determine that the entries in the Homophones dictionary were organized into nine chapters by initial class:
I. labials
II. labiodentals (this category is controversial)
III. dentals
IV. 'retroflexes' (this category is even more mysterious than II, partly due to a paucity of transcriptive evidence)
V. velars
VI. alveolars
VII. alveopalatals
VIII. glottals
IX. liquids
I would determine the interpretation of a <ng>/<d>/<r> gloss on the basis of a tangraph's location in Homophones: e.g., a <ngu>-graph would be in chapter V, etc.
Secondly, Tibetan is a writing system that is particularly well-equipped to represent a wide range of phonetic values
The Tibetan alphabet is an Indic alphabet, so it is suited to write the rich consonantal inventory of Sanskrit: e.g., three kinds of /s/-sounds. However, Sanskrit did not have a rich vowel system compared to, say, Khmer with its roughly 30 vowels*. Tangut may have had an even richer vowel system than Khmer because the Tangraphic Sea rhyme dictionary has 105 rhymes disregarding tones, and all 105 may have lacked final consonants. This poses a huge problem for Tangutologists: how can a language have 105 different ways of ending open syllables?
Gong's solution was to posit seven basic vowels with or without
- glides** (medial -i-, -y-, -w-, -iw-, -yw-, final -y, -w)
(I am writing Gong's j as y to simplify comparison with Tibetan y = IPA [j].)
- length
- nasalization
- tenseness
- retroflexion
There are
(±-i/y/w/iw/yw-) x (a/e/i/o/u/ə/ɨ) x (±length) x (±nasal) x (±tense) x (±retro) x (±-y/w)
6 x 7 x 2 x 2 x 2 x 2 x 3 = 2016
possible combinations of these elements, but only 105 are actually needed. In any case, no language has 105 simple vowels not otherwise distinguished by length, nasalization, etc.
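The product above can be checked by brute-force enumeration. This is only a counting sketch: the variable names are mine, and the feature lists simply restate the options in the formula (with "" standing for "no medial" or "no final glide"). It says nothing about which 105 combinations Gong actually used.

```python
from itertools import product

# Options restated from the formula above; "" means the element is absent.
medials = ["", "i", "y", "w", "iw", "yw"]      # 6 options
vowels = ["a", "e", "i", "o", "u", "ə", "ɨ"]   # 7 basic vowels
binary = [False, True]                         # length, nasal, tense, retroflex
finals = ["", "y", "w"]                        # 3 options

combinations = list(
    product(medials, vowels, binary, binary, binary, binary, finals)
)
total = len(combinations)

ATTESTED_RHYMES = 105  # rhymes in the Tangraphic Sea
print(total)                               # 2016
print(f"{ATTESTED_RHYMES / total:.1%}")    # share of the space actually needed
```

Only about one combination in twenty would be needed to cover the 105 rhymes.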
The Tibetan alphabet already has symbols for medial <y> and <w>. Although Tibetan words do not end in -y or -w, Chinese syllables ending in those glides were transcribed in Tibetan as <Hi> and <Hu>. (འ <H> is a Tibetan letter whose phonetic value is controversial. Nathan Hill has convinced me that <H> was a voiced fricative in an earlier period, but I do not know what its phonetic value was in the dialect[s] of the authors of the Tibetan glosses of Tangut.)
Indic long vowels can be indicated with an <H>-like symbol.
So in theory, Tibetan should support the transcription of
<±y/w><±H for vowel length><a/e/i/o/u><±Hi/Hu>
3 x 2 x 5 x 3 = 90
rhymes which is close to 105. The number increases to 120 if the un-Tibetan sequence <yw> or <wy> is allowed.
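The same kind of count can be sketched for the Tibetan side. The function name `count_rhymes` is hypothetical; the slots are the ones in the formula above, and the second call adds the un-Tibetan medial <yw> as a fourth medial option.

```python
from itertools import product

def count_rhymes(medials):
    """Count the rhyme spellings available for a given set of medials."""
    lengths = ["", "H"]                    # <H> marking vowel length, or nothing
    vowels = ["a", "e", "i", "o", "u"]     # the five Tibetan vowels
    finals = ["", "Hi", "Hu"]              # final glide spellings, or nothing
    return len(list(product(medials, lengths, vowels, finals)))

standard = count_rhymes(["", "y", "w"])          # 3 x 2 x 5 x 3
extended = count_rhymes(["", "y", "w", "yw"])    # 4 x 2 x 5 x 3

print(standard)  # 90
print(extended)  # 120
```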
However, in reality
- Gong's medials often correspond to no medial in the Tibetan glosses
- I have never seen a Tibetan gloss ending in <Hi> or <Hu>, suggesting that the dialect(s) of Tangut that were transcribed lacked final glides
But wait. Notice that I referred to "dialect(s) of Tangut". There is no guarantee that transcribed Tangut dialect(s) were the same as the presumably standard dialect recorded in Tangut dictionaries. So maybe the glosses never reflected a dialect with 105 different rhymes.
Similarly, the Chinese glosses in the late 12th century Pearl in the Palm (dating long after the compilation of Tangut monolingual dictionaries in the 11th century) may reflect a (nonstandard?) dialect with fewer than 105 rhymes.
If these glosses and the Tangut dictionary tradition reflect different dialects, trying to reconstruct the 105 rhymes of the latter on the basis of the former is like trying to reconstruct the rhymes of Cantonese on the basis of romanizations of Mandarin. Looking at Peking and Beijing for 北京, how could anyone guess that the first character 北 was read as pak in Cantonese? (Cantonese retained final stops lost in Mandarin.)
Perhaps the reconstruction of Tangut requires two phases:
Phase 1: Reconstruction of the glossed dialect(s) with smaller rhyme inventories
Phase 2: Expansion of the rhyme inventory of the standard dictionary dialect using
- internal evidence
- external evidence:
- Chinese loanwords
- formats of Chinese dictionaries and rhyme tables which influenced their Tangut counterparts
The preceding procedure assumes that the standard dialect has more rhymes than the nonstandard glossed dialects. However, it is also possible that the nonstandard dialects preserve distinctions lost in the standard dialect. Until a few years ago, I assumed that the consonant clusters in the Tibetan glosses represented actual consonant clusters in Tangut that were not reflected in the Tangut dictionary tradition which preserved a 'sinified' dialect of Tangut. I am now more sympathetic to a tonal interpretation of those clusters, but that approach also has problems. Other candidates for nonstandard conservative features might be
- prenasalization (glossed with preinitial <H->)
- voiced aspirates (glossed as <dh>, <bh>, etc.)
- final consonants (glossed with final <-H>)
A third phase of Tangut reconstruction could result in a pre-Tangut reflecting features from both the nonstandard and standard dialects:
| Standard Tangut: fewer consonants? more rhymes |
| Pre-Tangut: lots of consonants and rhymes |
| Glossed Tangut: more consonants? fewer rhymes |
Next: The Conclusion (I hope!)
*12.25.00:51: There is no agreement on the number of vowels in standard Khmer. Khmer vowels are written in the Khmer alphabet - another Indic script - with a combination of consonants and vowels, reflecting how Khmer's complex vowel inventory arose from lost consonantal distinctions: e.g.,
កា <kaa> [kaa] < *kaa
គា <gaa> [kiə] < *gaa
កី <kii> [kəy] < *kii
គី <gii> [kii] < *gii
Khmer vowels after *voiceless consonants lowered if possible: e.g., *ii > [əy], still written as <voiceless consonant> + <ii>. (*aa was already low, so it had nowhere to go.)
Khmer vowels after *voiced consonants raised if possible: e.g., *aa > [iə], still written as <voiced consonant> + <aa>. (*ii was already high, so it had nowhere to go.)
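The two rules can be sketched as a lookup. This is a toy covering only the four syllables listed above, not a model of Khmer phonology: the helper name and tables are hypothetical, and the devoicing of *g to [k] is read off the examples.

```python
# Toy tables covering only the four example syllables above.
DEVOICED = {"g": "k"}    # earlier voiced stop, now pronounced voiceless
LOWERED = {"ii": "əy"}   # vowel shift after earlier voiceless consonants
RAISED = {"aa": "iə"}    # vowel shift after earlier voiced consonants

def modern_khmer(onset, vowel):
    """Map an earlier onset + vowel (as still spelled) to its modern reading."""
    if onset in DEVOICED:
        # Earlier voiced onset: raise the vowel if it can rise, then devoice.
        return DEVOICED[onset] + RAISED.get(vowel, vowel)
    # Earlier voiceless onset: lower the vowel if it can fall.
    return onset + LOWERED.get(vowel, vowel)

for onset, vowel in [("k", "aa"), ("g", "aa"), ("k", "ii"), ("g", "ii")]:
    print(f"*{onset}{vowel} > [{modern_khmer(onset, vowel)}]")
```

The spelling keeps the old consonant contrast, so the reader recovers the vowel from the consonant letter.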
**12.25.2:01: Gong reconstructed a distinction between -i(w)- and -y(w)- in Tangut. He may have been influenced by the same distinction in Li Fang-Kuei's Middle Chinese reconstruction which in turn inherited it from Karlgren's Middle Chinese reconstruction. Does any language have such a distinction? Vietnamese comes close: e.g.,
kia [kiə] 'that'
giơ [zəə] < ?*kyəə or *C-cəə 'to extend, raise'
(A loan from Middle Chinese 舉 *kɨəʔ 'to raise'? But MC *-ʔ should correspond to a sắc tone, not a ngang tone. Also, the MC word does not mean 'extend', though it's possible that 'extend' was a later semantic, um, extension within Vietnamese.)
gió [zɔɔ] < *kyɔʔ 'wind'
However, medial -i- and *-y- are in complementary distribution:
-i- before short ə
*-y- before all other vowels including long əə
There were/are no pairs in Vietnamese like Gong's reconstructions of
3459 1kio 'to drive'
2264 1kyo 'mother's brother'
which have medial -i- and -y- before the same vowel. My Tangut reconstruction only has -i- (which could be rewritten as -y-). I normally reconstruct a zero medial corresponding to Gong's -i- and an -i- or -y- corresponding to Gong's -y-, though I have considered reconstructing no -i- or -y- at all:
| Tangraph | Gong | This site | Tibetan glosses for rhymes |
| 3459 | 1kio | 1kɔ (or 1kœ?) | <uH>, <a> (<a> is the default vowel for a Tibetan consonant letter without a vowel symbol, so a vowel symbol may have been lost or accidentally omitted) |
| 2264 | 1kyo | 1kio (or 1kyo; perhaps even 1kø?) | <o>, <oH>, <ooH>, <uH> (no medial <y>!) |
I have not seen any Tibetan glosses for these two tangraphs, but I extracted glosses from tangraphs sharing their rhymes.
11.12.23.23:59: TANGUT THROUGH TIBETAN (PART 2: VLADIVOSTOK DETOUR)
In part 1, I tried to exemplify the problems of transcribing non-Chinese words in Chinese characters mostly using abstract formulae. Here's a concrete example. Suppose that in the future, we only have Chinese transcriptions of Russian. How well could we reconstruct Russian phonetics even if we had a perfect knowledge of the Mandarin underlying those transcriptions? For example, Владивосток [vɫədʲɪvɐˈstok] with four syllables would be transcribed as a seven-syllable sequence
符拉迪沃斯托克
fu la di wo si tuo ke [fulatiwɔsz̩thwɔkhɤ]
which is far from the original:
| Sinograph | Mandarin in IPA | Russian in IPA | Match? |
| 符 | f | v | No; voicing mismatch (inevitable because Mandarin has no [v]) |
| | u | (none) | Corresponds to nothing in Russian (inevitable because Mandarin does not allow the clusters [fl] or [vl]) |
| 拉 | l | ɫ /l/ | More or less: Mandarin /l/ corresponds to Russian /l/, though the former is not velarized (inevitable because Mandarin has no [ɫ]) |
| | a | ə | No; Mandarin is influenced by Russian spelling а <a> rather than Russian phonetics |
| 迪 | t | dʲ | No; voicing and palatalization mismatch (inevitable because Mandarin has no [dʲ]) |
| | i | ɪ | No (inevitable because Mandarin has no [ɪ]) |
| 沃 | w | v | No (inevitable because Mandarin has no [v]) |
| | ɔ | ɐ | No; Mandarin is influenced by Russian spelling о <o> rather than Russian phonetics |
| 斯 | s | s | Yes |
| | z̩ | (none) | Corresponds to nothing in Russian (inevitable because Mandarin does not allow the cluster [st]) |
| 托 | th | t | No; aspiration mismatch |
| | wɔ | o | No (inevitable because Mandarin has no [tɔ] or [to]) |
| 克 | kh | k | No; aspiration mismatch |
| | ɤ | (none) | Corresponds to nothing in Russian (inevitable because Mandarin does not allow syllable-final [k]) |
Only one segment matched: [s]! (12.24.14:02: Or two if one ignores phonetic details and is satisfied with Md /l/ : Rus /l/.) The Russian stress accent could not be reconstructed from the Mandarin transcription which contains tones (omitted here) of no relevance to the original.
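The tally can be reproduced mechanically. In this sketch the alignment is copied by hand from the table (nothing is computed from the words themselves), `None` marks a Mandarin segment with no Russian counterpart, and only exact identity of the IPA symbols counts as a match.

```python
# (Mandarin segment, Russian segment) pairs copied from the table above.
ALIGNMENT = [
    ("f", "v"), ("u", None),     # 符
    ("l", "ɫ"), ("a", "ə"),      # 拉
    ("t", "dʲ"), ("i", "ɪ"),     # 迪
    ("w", "v"), ("ɔ", "ɐ"),      # 沃
    ("s", "s"), ("z̩", None),     # 斯
    ("th", "t"), ("wɔ", "o"),    # 托
    ("kh", "k"), ("ɤ", None),    # 克
]

exact_matches = [pair for pair in ALIGNMENT if pair[0] == pair[1]]
unmatched = [md for md, rus in ALIGNMENT if rus is None]

print(exact_matches)   # segments identical in both languages
print(len(unmatched))  # Mandarin segments with no Russian source
```

Counting Md /l/ : Rus /l/ as a match would need a looser, phonemic comparison than the strict equality used here.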
Now I'll finish commenting on Andrew West's first paragraph:
For these reasons, phonetic glosses in Chinese characters are inferior to phonetic glosses given in phonetic scripts such as Tibetan or Phags-pa.
Neither Tibetan nor Phags-pa is IPA. As we will see in part 3, Tibetan probably did not have enough letters for an accurate segmental transcription of Tangut. Nonetheless, both of those scripts are alphabets which are more flexible than a huge syllabary like sinography: e.g., I could transcribe 'Vladivostok' in Tibetan as
ཝླ་དི་ཝ་སྟོཀ་
<wla.di.wa.stok.>
(with an un-Tibetan sequence <wl> and final <k>)
and in Phags-pa (displayed here on its side rather than vertically) as
ꡓꡙ ꡊꡞ ꡓ ꡛꡈꡡꡀ
<wla di wa stok> (<w> may have been [v]; see Andrew West's section on letter 20)
which are not perfect, yet still a vast improvement over 符拉迪沃斯托克 Fuladiwosituoke.
Luckily for us, a number of Tangut Buddhist manuscripts with phonetic transcriptions of Tangut characters in the Tibetan script are known, and have been the subject of considerable interest to Tangutologists ever since the existence of such manuscripts was first reported by Nevsky in 1926.
I got ahold of Nevsky's book on Tibetan transcriptions of Tangut 70 years later and have been struggling to understand them for the last 15 years.
Next: Tibetan at Last!
11.12.22.23:59: TANGUT THROUGH TIBETAN (PART 1: TANGUT THROUGH CHINESE)
Here is my first round of comments on Andrew West's post on the Tibetan transcriptions of Tangut.
Perhaps the core problem of Tangutology, which has directly and indirectly involved most of the effort of most Tangutologists most of the time, has been the reconstruction of the pronunciation of the extinct Tangut language.
I've been wrestling with that problem off and on since 1996. The core of the reconstruction I currently use on this site dates from 2008, though the details have fluctuated ever since.
However, it is necessary to first reconstruct the pronunciation of 11th century Chinese before the Chinese glosses can be used to try to reconstruct the pronunciation of the corresponding Tangut characters
Although the phonology of Middle Chinese and Old Mandarin is well documented, the phonology of the northwestern Chinese dialect underlying the Chinese transcriptions of Tangut is poorly understood. It does not help that those same transcriptions are the only direct evidence of that dialect. The problem is seemingly circular but not entirely hopeless. Coblin's (1991, 1994) studies of the Tibetan transcriptions of northwestern Chinese predating the rise of the Tangut Empire give us an idea of the ancestor of that dialect. Unfortunately, modern northwestern Chinese dialects may not be descended from that dialect (which may have been their substratum rather than their ancestor). So we cannot necessarily assume that the unknown dialect was partway between two knowns (Tang Dynasty NW Chinese and current NW Chinese).
as Chinese characters are notoriously incapable of accurately representing the phonetic systems of other languages
Although it is often difficult to represent one language in a script designed for another, Chinese characters are particularly problematic because they are syllabic symbols with rare exceptions*. Here are examples of distortions found in Chinese character transcriptions:
- Consonant clusters absent from Chinese are either broken up or simplified:
non-Chn C1C2 > Chn C1VC2 or C1
or even C3 (a completely different consonant if neither C1 nor C2 exist in Chinese)
- Even simple CV syllables may be absent from Chinese, forcing writers to compromise:
non-Chn C1V1 >
Chn C1V2 (accurate consonant, inaccurate vowel)
or C2V1 (inaccurate consonant, accurate vowel)
or C2V2 (completely inaccurate; only vaguely like original)
Chn V2 could even be a diphthong corresponding to a foreign monophthong.
- Near-impossibility of indicating foreign tones
Tone systems of different languages almost never match. (One can't indicate the six-plus tones of Cantonese with a transcription based on Mandarin with four different tones.)
And even if the tones in one Chinese variety perfectly matched those of a non-Chinese language, there is no guarantee that a non-Chinese CV + tone combination also exists in Chinese.
- Derography, the use of derogatory characters with little regard for phonetic accuracy:
The name of a 3rd century queen in Japan was transcribed as 卑彌呼 *pie mie xo 'humble full call'. Although the second and third characters are not derogatory and hence may be reliable, it is uncertain whether 卑 *pie 'humble' represented a non-Chinese *pie, *pe, *pi, etc.
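The first distortion in the list (clusters broken up as C1VC2) can be sketched as a toy procedure. Everything here is an assumption made for illustration: the target language is taken to allow only V and CV syllables, the input is a plain Latin romanization, and "V" is a placeholder for whatever filler vowel a real transcriber would pick.

```python
VOWELS = set("aeiou")

def to_cv_syllables(word, filler="V"):
    """Split a romanized word into V/CV syllables, padding stray consonants."""
    syllables = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in VOWELS:
            # A vowel with no preceding consonant stands alone.
            syllables.append(ch)
            i += 1
        elif i + 1 < len(word) and word[i + 1] in VOWELS:
            # Consonant + vowel: already a legal syllable.
            syllables.append(ch + word[i + 1])
            i += 2
        else:
            # Consonant before another consonant or at word end: add a filler.
            syllables.append(ch + filler)
            i += 1
    return syllables

print(to_cv_syllables("vladivostok"))
# ['vV', 'la', 'di', 'vo', 'sV', 'to', 'kV']
```

A four-syllable word comes out as seven syllables, three of them carrying a vowel that corresponds to nothing in the original.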
On the other hand, letters of an alphabet can be assembled into novel combinations to represent a foreign language: e.g., the initial cluster zn- does not occur in English, yet I can romanize Russian знать as znat'.
Next: My Final Comments on Andrew's First Paragraph (and more, I hope!)
*E.g., -儿 -r and graphs with polysyllabic readings like 圕 tushuguan, an abbreviation of 圖書館 tushuguan 'library'.
11.12.21.23:59: THE GOLDEN GUIDE: LINE 93: TANGRAPHS 461-465
My last entry on the Golden Guide inspired me to pick up where I left off over a year ago.
93. This line is in the middle of the list of Chinese surnames. All five tangraphs normally represent other words but double as phonetic transcription characters.
| Tangraph number | 461 | 462 | 463 | 464 | 465 |
| Tangraph | ![]() | ![]() | ![]() | ![]() | ![]() |
| Li Fanwen number | 2012 | 3630 | 5910 | 2527 | 5630 |
| My reconstructed pronunciation | 1təũ | 1vieʳ | 1kæ | 2nee | 1dʒɛw |
| Tangraph gloss | winter | divination | price | imperial court | to be worried about |
| Word | the surname 董 Dong (*təũ) | the surname 衛 Wei (*wɨẽ)? | the surname 賈 Jia (*kæ) | the surnames 佴 or 能 Nai (*ne)? | the surname 卓 Zhuo (*tʃæw) |
| Translation | 董 Tun, 衛 Wei, 賈 Jia, 佴/能 Nai, 卓 Zhuos | ||||
461: 2012 is one of only three rhyme 104 (-əũ) syllables in Tangut. All three are borrowings from Chinese:
1402 1xəũ 'red' < Chn 紅 *xəũ
2012 1təũ 'winter' < Chn 冬 *təũ
4305 1tsəũ (a Chinese surname) < Chn 宗 *tsəũ
(12.22.22:44: Native -ũ denasalized at some point shortly before Tangut was first written. The loanword
5625 1thwəəu < Chn 同 *thəũ 'same'
has an initial th- reflecting the *d- > *th- shift that occurred not long before the birth of tangraphy. The word may have been borrowed as *1P-thəəũ with a native *P-prefix added. That prefix is the source of the medial -w- absent from the Chinese original:
1. Borrowing and partial nativization: 同 *thəũ > *1P-thəəũ
The long vowel is unexpected. I once thought vowel length might be a trace of a lost final consonant: e.g., Chn *thəuŋ > T thəəu, but Chn syllables ending in oral vowels also correspond to Tangut syllables with long vowels: e.g.,
2138 2bəəu < Chn 墓 *mbəu 'grave'
Is it a coincidence that all three examples of such Chinese open syllables had prenasalized initials?
2. Prefix loss: *1P-thəəũ > *1thwəəũ
3. Denasalization: *1thwəəũ > 1thwəəu)
Why would winter be written as a collection of water rather than ice or snow?
2012 1təũ 'winter' =
left ('water') of 3058 2ʒɨəəʳ 'water' +
all of 0269 1khiəə 'to collect'
462: Nie and Shi (1995) identified 3630 as the surname 隋 Sui (*swi in the Chinese dialect known to the Tangut), but 1vieʳ doesn't sound anything like it.
3630 was analyzed as the 'meaning of the four brights':
3630 1vieʳ 'divination' =
left and central line of 2205 1lɨəəʳ 'four' +
top left of 5120 1swew 'bright' +
left of 0797 1phi 'meaning; idea'
Is this a mnemonic? Were there 'four brights' in Tangut divination?
463: 5910 is a loanword from Chinese 價 *kæ 'price' which was homophonous with the Chinese surname 賈 *kæ. I'm surprised there's no 'money' in its analysis:
5910 1kæ 'price' =
left of 5875 2ʒɨị 'to sell and buy' (< Early Middle Chinese 市 *ʑɨəʔ 'market') +
center and right of 3934 2kwɛ 'true; precious'
464: The only phonetic matches for 2nee are the rare Chinese surnames 佴 *ne and 能 *ne. The reading *ne is based on the modern Mandarin reading nai in the list of surnames in Giles (1892: 1358).
I wonder if 2527 2nee was borrowed from Chinese 內 *ne 'inner' as in 內宮 'inner court'.
Its analysis is unknown, but 'sage' is on its right side. Perhaps 'sage' is taken from one of the tangraphs for the imperial surname Ngwimi:
2nee 'imperial court' = ? + left of 2339 2ŋwəi or bottom center of 1903 1mi
465: The Tangut initial dʒ- normally corresponds to Chinese *ndʒ- - e.g., in
4706 1733 2dʒɨəəu 2tʃɨi < Chn 女直 *ndʒɨu tʃɨi 'Jurchen'
- but I can't think of any Chinese surname like *ndʒɛw. Nie and Shi (1995) identified 5630 as the surname 卓 Zhuo *tʃæw in spite of the voiceless initial.
ソ is short for 'many' in 5630:
5630 1dʒɛw 'to be worried about' =
top left of 5414 2reʳ 'many' +
all of 1262 1ʒɨị 'vexed, worried'
(12.22.22:49: The entire left side of 5414 has a similar function in
5076 1dʒʃɨəəʳ 'feast' =
top left of 5414 2reʳ 'many' +
4508 1tị 'to eat')
11.12.20.23:59: PRACTICING THE GOLDEN GUIDE
Thanks to Andrew West for leading me to this fragment of Tangut handwriting practice with five tangraphs (Tangut characters) on it. I hoped that the student was copying a five-tangraph line from the Golden Guide, and I was right! I wrote about that very line back in August 2010. (Note that I presented the tangraphs of line 75 from left to right, whereas they were written from right to left on this fragment.)
1xiõ, 0ʔa, 1de, 1lwo, 2so (from right to left)
'Solwo, De, Ahon' (from left to right; a list of Tangut surnames; A and Hon might be two separate surnames)
I should complete my translation of the Guide. I haven't translated a single line of it this year. I hope to do something about that soon.
12.21.0:25: The next fragment seems to continue from the last. It contains the last tangraph of line 75 of the Golden Guide and the two surnames at the beginning of line 76: 'Babi, Dew ...'
1dew, 2bəi, 2ba, 1xiõ (from right to left)
'...hon, Babi, Dew ...' (from left to right)
12.21.2:07: But alas, I have no idea where the third fragment is from:
2lhị 2dʒɨa 2siẹ 1ʃɨẽ 2ziuʳ (from right to left)
'broom, accomplish, wisdom, sharp, moon'
'broom accomplish' could be a Tangut object-verb sequence ('become a broom'?) and 'wisdom sharp' could be a Tangut noun-adjective sequence ('sharp wisdom'), but neither makes sense together or with 'moon'. Maybe this is an excerpt from some primer other than the Golden Guide. 'Broom accomplish' might be the end of one line and 'wisdom sharp' might be the start of another.
11.12.19.16:21: UP ON THESE MOUNTAINS
In "Feminine Lines", I mentioned that Jin (1984: 263) derived Jurchen
~
~
<ali> 'mountain'
from a Khitan large script (KLS) character
which in turn resembles a KLS character that has a Chinese 山 *shan 'mountain'-like shape on the left:
Could this be the KLS graph for the Khitan word for 'mountain'? Then I could posit the following chain of derivation:
Chn *shan 'mountain' > KLS 'mountain' (reading unknown) > Jurchen <ali> 'mountain'
No, that would be too easy. The KLS 山-like graph and its variants, including one that is identical in shape to Chn 此 *tsï 'this'
are phonograms for Liao Chinese 上 'up' and 尚 'still'* *shang (not 山 *shan!). Did these graphs originate from 上 plus added strokes?
Up north
Kane (2009: 182) listed a lookalike of Chn 北 'north'
as another phonogram for 尚 *shang. Presumably its variants could also be read *shang:
These graphs also mean 'north' as in Chinese. Did the Khitan borrow Chinese 上 *shang 'up' as 'north'? Should the Khitan small script graph for 'north'
also be read <shang>**? Could 一 be an abbreviation of 上?
<as> and <or>
There is another set of KLS allographs including 山-like shapes:
But none of these mean 'mountain' or even Chn 正 'correct'. 正 is a phonogram for <as>. It and one of its allographs combine with <ar> to form
~
both <as.ar> (are other allographs attested as the first character?)
the KLS equivalents of
<as.ar> 'quietness, peace(ful), clear'
at the end of line 1 of the poem in the eulogy for 宣懿皇后 Empress Xuanyi.
Perhaps that set of allographs should be split into two, as three represent <or> of <po.or> 'become':
~
~
also spelled
~
equivalent to Khitan small script
<p.o.or>
Summing up how I split the 山-type KLS graphs:
| Reading | Allographs |
| <as> | ![]() |
| <or> | ![]() ![]() ![]() ![]() ![]() |
Maybe these are all allographs after all, and each can be read as either <as> or <or> depending on context.
After all that, I still have no idea what the Khitan word(s) or large script graph(s) for 'mountain' were. I conclude with two final mysteries.
<po> 'mountain'?
KLS 山 is not only a phonogram for *shan (Kane 2009: 181) but also appears in a ligature for 'monkey' implying another reading <po>:
=
![]()
<poo>? = <po.o>
Subtracting the <o> graph results in
=
![]()
<po> = <po>
Should these two comprise an allograph set? Was <po> the Khitan word for 'mountain'?
<u> in 'mountain'?
KLS 山 appears in line 26 of the epitaph of the 北大王 Great Prince of the North:
The first two graphs at first seem to be <ten.shan> (cf. Chn 天山 *tienshan 'heaven mountain'). They should be followed by the genitive suffix 至 <an>, but are actually followed by the genitive suffix <un> which implies that the preceding noun had an <u>. Did KLS 山 have a third reading with <u>? Did 天 have a second native reading? Did 天山 represent a native Khitan word rather than Chn 天山, or was it a combination of a Chinese loanword *ten 'heaven' plus a native word with <u>?
The fourth and fifth graphs are <NORTH> and <EAST> in Khitan order rather than Chinese order (東北 'east-north').
The sixth and seventh graphs (readings/meanings unknown) may represent a noun modified by 'northeast'.
So the whole phrase may mean 'The northeastern ... of Tenshan'.
*尚 is also 'upwards' and probably shares an Old Chinese root *daŋ with 上. Schuessler (2009: 81-82) reconstructed a medial *-j- in both words on the basis of proposed Tibetan cognates, but there is no unambiguous Chinese evidence for *-j-. The nonemphatic *d- of both words palatalized to *dʑ-. This would have happened with or without a medial *-j-.
**Kane (2009: 35) listed other "[s]uggested readings based on Mo[ngolian ... which] lack evidence":
<umar-a>: cf. Written Mongolian umar-a 'north'
<xoina>, <xoi>: cf. Written Mongolian qoyna 'in the rear / north'
<aru>: cf. Written Mongolian aru 'back, north'