
ON LANGUAGES

ASHOK R. KELKAR

A NOTE ON STANDARD CHINESE NUMERALS

The purpose of this note* is to focus attention on the structure of complex numerals in Standard Chinese. To this end a set of rewrite rules is presented that would generate the infinitely denumerable set of cardinal numerals in Standard Chinese as used, say, in naming abstract integers in mathematical discourse. It will be interesting to see how these rules would fit into a fuller grammar of noun phrases: how, to be more specific, the cardinal numerals generated here would subsequently behave in relation to the so-called measures or classifiers that would ordinarily follow them. ("Subsequently" here of course refers to the descriptive order.) But this has not been attempted in this note.

 The starting point of this fragment of a generative grammar may be designated as N, which represents the set of cardinal numeral expressions as named in the abstract, including the expression # líng # 'zero'. All Chinese forms are cited in the Yale romanization.¹ There are as many forms generated after applying the lexical rule as there are integers plus, of course, the zero. The remaining rules represent merely non-distinctive morphophonemic readjustment that is either obligatory (conditioned changes) or optional (free alternations). The ordering of the rules is indicated by the prefixed numerals; small letters are used merely to indicate unordered subdivisions of a complex rule. Thus, (11a) and (11b) may be applied in either order after applying (10). The expression opt. prefixed to a rule instructs that the application of the rule is optional. The symbol n refers to any of the inferior exponential numerals suffixed to the constituent-position symbols H and L (referring to the higher and the lower order constituents respectively) and to the constituent class symbols h and l (higher and lower order group numerals) and u (unit numerals). As the glosses inserted in the lexical rules will bring out, this is essentially a decimal system reaching up to 1,0000,0000,0000 (the placing of the commas is Chinese) and using the operations of multiplication and addition.

 Start with: # N #

Constituent structure rules:

 (1) N → H′₄ + H₃ + H₂ + H₁

 This is the basic division into four higher order constituents designating respectively the number of 10¹²s, 10⁸s, 10⁴s, and units. So the highest numeral that can be named without using the next, recursive rule comes out as 9999 × 10¹² + 9999 × 10⁸ + 9999 × 10⁴ + 9999 (= 10¹⁶ − 1).

 (2) H′₄ → N + h₄ / H₄

 This is the recursive deus ex machina that makes the set N an infinitely denumerable set. So the next numeral after the one just named comes out as (1 × 10⁴) × 10¹² + 0 × 10⁸ + 0 × 10⁴ + 0 (= 10¹⁶).

 (3) Hₙ → L₄ + L₃ + L₂ + L₁ + hₙ

 Each higher order constituent is broken down into four lower order constituents designating the thousands, the hundreds, the tens, and the units in it, followed by the higher order group numeral (10¹², 10⁸, 10⁴, or 10⁰ as the case may be).

 (4) Lₙ → u + lₙ

 Finally the lower order constituent is broken into two: the unit numeral (one to nine or zero) followed by the lower order group numeral (thousand, hundred, ten, or unit as the case may be). Note how the inferior n in Rules 3 and 4 serves to impart numerical exponents to h and l matching those of H and L respectively. The terminal string of class symbols (an assortment of h, l, u) is now ready for the lexicon.

Lexicon:

 Only u remains without a numerical exponent.

(5) u → u₉/u₈/u₇/u₆/u₅/u₄/u₃/u₂/u₁/u₀.

 For the convenience of the reader the spellings and glosses of the ultimate constituents may be presented here. The spellings are actually not needed till after Rule 14 below.

Class h:  h₄  jao    × 10¹² +
          h₃  yi     × 10⁸ +
          h₂  wan    × 10⁴ +
          h₁         × 10⁰ +

Class l:  l₄  chyan  × 10³ +
          l₃  bei    × 10² +
          l₂  shr    × 10¹ +
          l₁         × 10⁰ +

Class u:  u₉  jyou   9
          u₈  ba     8
          u₇  chi    7
          u₆  lyu    6
          u₅  wu     5
          u₄  sz     4
          u₃  san    3
          u₂  er     2
          u₁  yi     1
          u₀  ling   0

 Note the multiplication and addition symbols that go with the group numerals glossed as powers of ten. The string generated after the application of (5) is subject to a certain amount of simplification and free alternation, as seen in the remaining rules.
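The string that Rules 1 to 5 deliver can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names `expand` and `gloss_value` are hypothetical, symbols are written as plain tokens such as `u5`, `l2`, `h1`, and only the non-recursive branch of Rule 2 (H₄ rather than N + h₄) is covered, so the input must lie between 0 and 10¹⁶ − 1.

```python
# Exponents taken from the glosses of the lexicon (l4..l1 and h4..h1).
L_EXP = {4: 3, 3: 2, 2: 1, 1: 0}
H_EXP = {4: 12, 3: 8, 2: 4, 1: 0}

def expand(n):
    """Token list after Rules 1-5, choosing H4 at Rule 2 (0 <= n < 10**16)."""
    if not 0 <= n < 10**16:
        raise ValueError("this sketch omits the recursive branch of Rule 2")
    digits = f"{n:016d}"                      # sixteen unit positions
    tokens = []
    for i, h in enumerate((4, 3, 2, 1)):      # Rule 1: four higher order constituents
        block = digits[4 * i: 4 * i + 4]
        for d, l in zip(block, (4, 3, 2, 1)): # Rule 3: L4 L3 L2 L1
            tokens += [f"u{d}", f"l{l}"]      # Rules 4 and 5: unit + group numeral
        tokens.append(f"h{h}")                # Rule 3: closing higher group numeral
    return tokens

def gloss_value(tokens):
    """Evaluate the multiplication-and-addition gloss of a token list."""
    total = group = unit = 0
    for t in tokens:
        kind, idx = t[0], int(t[1])
        if kind == "u":
            unit = idx
        elif kind == "l":
            group += unit * 10 ** L_EXP[idx]
        else:                                 # an h token closes its constituent
            total += group * 10 ** H_EXP[idx]
            group = 0
    return total
```

Every numeral in this range comes out as 36 tokens (sixteen u, sixteen l, four h), which is the excess that the readjustment rules then trim.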

Non-distinctive readjustment:

 (6) u₀ + lₙ → u₀

 (7) u₀ + u₀ → u₀

 (8) lₙ + u₀ + hₙ → lₙ + hₙ

 (9) u₀ + hₙ → u₀

 (10) u₀ + u₀ → u₀

 (11a) hₙ + u₀ # → hₙ #

 (11b) # u₀ + uₙ → # uₙ

 These are all essentially zeroing rules that cut out the excess fat left by the earlier rules, and they all involve u₀, the all-important líng 'zero'. Even a short numeral expression like # wǔ shŕ # 50 will appear in a form glossable in some such fashion after applying (5): (0 × 10³ + 0 × 10² + 0 × 10¹ + 0 × 10⁰) × 10¹² + (0 × 10³ + 0 × 10² + 0 × 10¹ + 0 × 10⁰) × 10⁸ + (0 × 10³ + 0 × 10² + 0 × 10¹ + 0 × 10⁰) × 10⁴ + (0 × 10³ + 0 × 10² + 5 × 10¹ + 0 × 10⁰) × 10⁰. These six rules will reduce this to u₅ + l₂ + h₁ (that is, (5 × 10¹) × 10⁰). Of course líng does survive in some expressions, in # wǔ chyān líng wǔ # 5005 for example. Indeed the handling of this fact precisely constitutes the principal methodological interest of this note.² Note the reinsertion of the same rule at two points, (7) and (10); this rule is of course recursive in its effect of reducing a string of successive u₀s to a single u₀. Rules 6 to 8 simplify lower order constituents within a single higher order constituent. Rules 9 to 11 simplify higher order constituents within the numeral expression as a whole.
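The effect of the zeroing rules can be traced mechanically. The following Python sketch is one possible reading of Rules 6 to 11 as ordered string rewrites; it is not part of the original note. The names `expand`, `collapse_zeros`, and `reduce_zeros` are hypothetical, symbols are written as space-separated tokens, Rule 11a is read as deleting a final u₀ after a higher group numeral, Rule 11b as deleting an initial u₀ before another unit numeral, and only numerals below 10¹⁶ are handled.

```python
import re

def expand(n):
    """Token string after Rules 1-5 (H4 chosen at Rule 2), with # boundaries."""
    if not 0 <= n < 10**16:
        raise ValueError("outside the non-recursive range of this sketch")
    digits = f"{n:016d}"
    out = []
    for i, h in enumerate((4, 3, 2, 1)):
        for d, l in zip(digits[4 * i: 4 * i + 4], (4, 3, 2, 1)):
            out += [f"u{d}", f"l{l}"]
        out.append(f"h{h}")
    return "# " + " ".join(out) + " #"

def collapse_zeros(s):
    """Rules 7 and 10: u0 + u0 -> u0, reapplied until no pair is left."""
    while "u0 u0" in s:
        s = s.replace("u0 u0", "u0")
    return s

def reduce_zeros(n):
    """Apply Rules 6 to 11 in order to the expanded string for n."""
    s = expand(n)
    s = re.sub(r"u0 l\d", "u0", s)               # Rule 6
    s = collapse_zeros(s)                        # Rule 7
    s = re.sub(r"(l\d) u0 (h\d)", r"\1 \2", s)   # Rule 8
    s = re.sub(r"u0 h\d", "u0", s)               # Rule 9
    s = collapse_zeros(s)                        # Rule 10
    s = re.sub(r"(h\d) u0 #$", r"\1 #", s)       # Rule 11a
    s = re.sub(r"^# u0 (u\d)", r"# \1", s)       # Rule 11b
    return s
```

Under these assumptions `reduce_zeros(50)` yields `# u5 l2 h1 #`, and `reduce_zeros(5005)` yields `# u5 l4 u0 u5 l1 h1 #`, where the single medial u₀ is the líng that survives.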

 (12a) h₁ → Ø

 (12b) l₁ → Ø

 Now we can remove the two dummy group numerals h₁ and l₁, since they have served their function. They enabled us to simplify Rules 1 to 4 since the last higher order constituent did not have to be treated differently from the remaining constituents of the same rank and also to simplify Rules 8 and 9 for similar reasons. They have also the added merit of providing a convenient base for the glossing rules that we may want to apply after the lexicon has been worked through.

 (13a opt.) lₙ + uₙ + lₙ₋₁ # → lₙ + uₙ #

 (13b opt.) hₙ + uₙ + lₙ₋₁ # → hₙ + uₙ #

 (14 opt.) u₁ + l₂ → l₂

These are a couple of idiomatic omissions, illustrated by # yī wàn wǔ chyān # 15000 also appearing as # yī wàn wǔ #, or by # yī shŕ wǔ # 15 much more commonly appearing as # shŕ wǔ # ('ten five' rather than 'one ten five').
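A small Python sketch may make the interplay of Rules 12 to 14 concrete. It starts from strings already reduced by Rules 6 to 11 and is an illustration only. The function names are hypothetical, the spellings are the toneless forms of the lexicon, Rule 13a is read as dropping a final lower group numeral that stands one order below the preceding one, and Rule 13b is implemented only for the case attested in the example (a final l₄ after a higher group numeral), which is an assumption about the rule's intended scope.

```python
import re

# Toneless spellings of the lexicon; h1 and l1 have no spelling.
SPELL = {"h4": "jao", "h3": "yi", "h2": "wan",
         "l4": "chyan", "l3": "bei", "l2": "shr",
         "u0": "ling", "u1": "yi", "u2": "er", "u3": "san", "u4": "sz",
         "u5": "wu", "u6": "lyu", "u7": "chi", "u8": "ba", "u9": "jyou"}

def rule12(s):
    """Rules 12a and 12b: delete the dummy group numerals h1 and l1."""
    return " ".join(t for t in s.split() if t not in ("h1", "l1"))

def rule13(s):
    """Optional Rule 13: drop a final lower group numeral (assumed reading)."""
    m = re.search(r"l(\d) (u\d) l(\d) #$", s)            # 13a
    if m and int(m.group(3)) == int(m.group(1)) - 1:
        return s[:m.start()] + f"l{m.group(1)} {m.group(2)} #"
    m = re.search(r"(h\d) (u\d) l4 #$", s)               # 13b, l4 case only
    if m:
        return s[:m.start()] + f"{m.group(1)} {m.group(2)} #"
    return s

def rule14(s):
    """Optional Rule 14: u1 + l2 -> l2, applied as the rule is stated."""
    return re.sub(r"\bu1 l2\b", "l2", s)

def spell(s):
    """Spell out a token string, ignoring the # boundaries."""
    return " ".join(SPELL[t] for t in s.split() if t != "#")
```

For instance, `spell(rule13(rule12("# u1 l1 h2 u5 l4 h1 #")))` gives `yi wan wu` for 15000, and `spell(rule14(rule12("# u1 l2 u5 l1 h1 #")))` gives `shr wu` for 15.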

 For numeral expressions in the telephonic style used, for example, in naming calendar years (as # yī jyǒu sz̀ yī nyán # 'the year 1941') we need a set of alternative rules that will replace Rules 6 to 14.

 (6*a) hₙ → Ø

 (6*b) lₙ → Ø

 (7*) # u₀ → #

Note that the year 5005 will come out as #3 wu ling ling wǔ nyn # while the regular expression will be # wǔ chayān ling wǔ # with only one lng. Rule (7*) of course will be recursive in its effect of wiping out strings of zeros at the beginning.

 Finally, we have a couple of rules about the phonemic shape of the unit numerals that would, in a fuller grammar, be couched in slightly different terms to cover additional facts not within the scope of this fragment. The position of these rules will also have to be reconsidered.

 (15 opt.) u₂ (èr) (except in the following environments: before # and before l₂) → lyǎng

 (16a) u₁ (yī) (before ˉ, ˊ, ˇ) → yì

 (16b) u₁ (yī) (before ˋ) → yí

 (17a opt.) u₇ (chī) (before ˋ) → chí

 (17b opt.) u₈ (bā) (before ˋ) → bá

Rules for juncture and the placement of the culminative accent have not been considered here. Probably these can be built into the constituent structure.

 Some sample generative histories may now be presented for some cardinal numeral expressions. For all numerals choosing H₄ rather than N + h₄ under Rule (2), that is, all numerals from zero to (10¹⁶ − 1), the string is going to look like this after applying Rule (4) (omitting # and + signs):

 ul₄ ul₃ ul₂ ul₁ h₄  ul₄ ul₃ ul₂ ul₁ h₃  ul₄ ul₃ ul₂ ul₁ h₂  ul₄ ul₃ ul₂ ul₁ h₁

where u may be replaced by anything from u₀ to u₉ by virtue of Rule (5). The generative histories from that point on for 232, 1560 and 400,0506 are as follows (omitting # and + signs):

 First example: # èr běi sān shŕ èr # or # lyǎng běi sān shŕ èr # 232.

 After Rule 6 (applicable at 13 points):

 u₀u₀u₀u₀h₄ u₀u₀u₀u₀h₃ u₀u₀u₀u₀h₂ u₀u₂l₃u₃l₂u₂l₁h₁

After Rule 7 (applicable at 6 points and again at 3 points):

 u₀h₄u₀h₃u₀h₂u₀u₂l₃u₃l₂u₂l₁h₁

 Rule 8 is inapplicable.

 After Rule 9 (applicable at 3 points):

 u₀u₀u₀u₀u₂l₃u₃l₂u₂l₁h₁

After Rule 10 (applicable at 2 points and again at 1 point):

 u₀u₂l₃u₃l₂u₂l₁h₁

After Rule 11 (11a is inapplicable):

 u₂l₃u₃l₂u₂l₁h₁

After Rule 12:

 u₂l₃u₃l₂u₂ # èr běi sān shŕ èr #

Rules 13 and 14 are inapplicable.

After Rule 15 (optional):

# lyǎng běi sān shŕ èr #

Rules 16 and 17 are inapplicable.

After applying the general rule governing sequences of Tone ˇ:

 # lyáng běi sān shŕ èr #

Second example: # yī chyān wǔ běi lyù shŕ # or # yī chyān wǔ běi lyù # 1560.

After Rules 6 to 11 (11a is inapplicable):

 u₁l₄u₅l₃u₆l₂h₁

After Rule 12 (12b is inapplicable):

 u₁l₄u₅l₃u₆l₂ # yī chyān wǔ běi lyù shŕ #

After Rule 13 (optional; 13b is inapplicable):

 # yī chyān wǔ běi lyù #

Rules 14 and 15 are inapplicable.

After making the tonal readjustments, we get the two expected forms.

Third example: # sz̀ běi wàn líng wǔ běi líng lyù # 400,0506.

After Rule 6:

 u₀u₀u₀u₀h₄ u₀u₀u₀u₀h₃ u₀u₄l₃u₀u₀h₂ u₀u₅l₃u₀u₆l₁h₁

After Rule 7:

 u₀h₄u₀h₃u₀u₄l₃u₀h₂u₀u₅l₃u₀u₆l₁h₁

After Rule 8:

 u₀h₄u₀h₃u₀u₄l₃h₂u₀u₅l₃u₀u₆l₁h₁

After Rule 9:

 u₀u₀u₀u₄l₃h₂u₀u₅l₃u₀u₆l₁h₁

After Rule 10:

 u₀u₄l₃h₂u₀u₅l₃u₀u₆l₁h₁

After Rule 11 (11a is inapplicable):

 u₄l₃h₂u₀u₅l₃u₀u₆l₁h₁

After Rule 12:

 u₄l₃h₂u₀u₅l₃u₀u₆ # sz̀ běi wàn líng wǔ běi líng lyù #

Rules 13 to 17 are inapplicable.

After making the tonal readjustments, we get the required form.

 To conclude, I should like to highlight two methodological choices we have made that have a bearing on evaluation procedures. First, I have handled the patterning entirely through constituent structure rules rather than having both these and transformation rules. (I do not regard the so-called obligatory transformations and the optional non-distinctive transformations like our Rules 15 and 17 as transformation rules properly belonging to grammar.) I have been guided by two considerations. If a complexity can be handled at a lower level, then one should not postpone it to a higher, costlier level: I have distributed the complexity over constituent structure rules and non-distinctive readjustment rules and eliminated PENG's optional distinctive transformations (see footnotes * and 2). Further, I would rather have it that the contrast between a kernel and its transform be what Martin Joos once called a negligible one than a major one. Secondly, in making this choice, we have also made a choice at another point: we have accepted simple generative rules and complicated generative histories rather than complicated generative rules and simpler generative histories (as PENG appears to have done). I think we are justified in doing this as a matter of principle. Moreover, the fact that the generative histories of all lower numerals (choosing H₄ at Rule 2) run on parallel lines over the grammatical stretch is certainly a relieving feature of our choice. That this enables us to handle telephonic style numerals with great ease (Rules 6* and 7*) is of course an added bonus.

------------------------------------------------------------------------------------------------------------

* The stimulus of this note was the paper on Chinese numerals read by Dr. Fred C.C. PENG (Bunker-Ramo Corporation) at the Summer 1965 meeting of the Linguistic Society of America at Ann Arbor, Michigan. I am thankful to him for this and to Mr. James LIANG (Pennsylvania), Dr. William S.Y. WANG (California at Berkeley), Dr. H.S. BILIGIRI and Dr. D.N. Shankara BHATT (both Deccan College) for useful discussions.

1 The symbols ˉ ˊ ˇ ˋ over the nucleus represent respectively the first, the second, the third, and the fourth tone.

2 Dr. PENG generates only the expressions lacking líng by his constituent structure rules and then goes on to generate the expressions with líng by a set of optional distinctive transformations of the substitution type. In both sets of rules he has to offer a battery of alternatives with no apparent principle holding them together.

COLOPHON:

 This was published in Indian Linguistics 16: 196-02, 1965 (published 1966) (Sukumar Sen Felicitation volume).

 Dr. Fred C.C. Peng's presentation (referred to in footnote *) is available as "The Numeric System of Standard Chinese", Appendix III to Fulcrum Technique for Chinese-English Machine Translation, Bunker-Ramo Corporation Report, July 1963.

 It is interesting to note that another study was also stimulated by Dr. Peng's study and its many imperfections (no generation of numerals higher than 9999,9999,9999,9999; the ad hoc nature of some of his rules): see Barron Brainerd, "Two grammars for Chinese number names", Canadian Journal of Linguistics 12:1, 83-51, 1966. The first of Brainerd's two grammars, presented in section 4, "A detection grammar of Chinese number names", is very similar to the one presented here.