The purpose of this note* is to focus attention on the structure of complex numerals in Standard Chinese. To this end a set of rewrite rules is presented that would generate the infinitely denumerable set of cardinal numerals in Standard Chinese as used, say, in naming abstract integers in mathematical discourse. It will be interesting to see how these rules would fit into a fuller grammar of noun phrases: how, to be more specific, the cardinal numerals generated here would “subsequently” behave in relation to the so-called measures or classifiers that would ordinarily follow them. (“Subsequently” here of course refers to the descriptive order.) But this has not been attempted in this note.

            The starting point of this fragment of a generative grammar may be designated as N, which represents the set of cardinal numeral expressions as named in the abstract, including the expression # líng # ‘zero’. All Chinese forms are cited in the Yale romanization.1 There are as many forms generated after applying the lexical rule as there are integers plus, of course, the zero. The remaining rules represent merely non-distinctive morphophonemic readjustment that is either obligatory (conditioned ‘changes’) or optional (free alternations). The ordering of the rules is indicated by the prefixed numerals; small letters are used merely to indicate unordered subdivisions of a complex rule. Thus, (11a) and (11b) may be applied in either order after applying (10). The expression ‘opt.’ prefixed to a rule indicates that the application of the rule is optional. The symbol n refers to any of the inferior exponential numerals suffixed to the constituent-position symbols H and L (referring to the higher and the lower order constituents respectively) and to the constituent class symbols h and l (higher and lower order group numerals); the class symbol u (unit numerals) carries no such exponent. As the glosses inserted in the lexical rules will bring out, this is essentially a decimal system reaching up to ‘1,0000,0000,0000’ (the placing of the commas is Chinese) and using the operations of multiplication and addition.

            Start with:  # N #

Constituent structure rules:

            (1) N → H4 + H3 + H2 + H1

            This is the basic division into four higher order constituents designating respectively the number of 10^12s, 10^8s, 10^4s, and units. So the highest numeral that can be named without using the next, recursive rule comes out as ‘9999 × 10^12 + 9999 × 10^8 + 9999 × 10^4 + 9999’ (= ‘10^16 − 1’).

            (2) H4 → N + h4 / H4


            This is the recursive deus ex machina that makes the set N an infinitely denumerable set, so that the next numeral after the one just named comes out as ‘(1 × 10^4) × 10^12 + 0 × 10^8 + 0 × 10^4 + 0’ (= ‘10^16’).
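The two limits just quoted can be checked by plain arithmetic. The lines below are only an illustrative check; the names GROUP_EXPONENTS, largest and next_value are introduced here for the purpose and are not part of the rules.

```python
# Exponents glossed for the higher order group numerals h4, h3, h2, h1.
GROUP_EXPONENTS = (12, 8, 4, 0)

# Rule 1 alone: each of the four higher order constituents holds at most 9999.
largest = sum(9999 * 10**e for e in GROUP_EXPONENTS)
print(largest == 10**16 - 1)      # True

# Rule 2: H4 is rewritten as N + h4, and the smallest new N is 'one wan',
# giving (1 x 10^4) x 10^12 with the three remaining constituents at zero.
next_value = (1 * 10**4) * 10**12 + 0 * 10**8 + 0 * 10**4 + 0
print(next_value == largest + 1)  # True
```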

            (3) Hn → L4 + L3 + L2 + L1 + hn

            Each higher order constituent is broken down into four lower order constituents designating the thousands, the hundreds, the tens, and the units in it, followed by the higher order group numeral (‘10^12’, ‘10^8’, ‘10^4’, ‘10^0’ as the case may be).

            (4) Ln → u + ln

            Finally the lower order constituent is broken into two: the unit numeral (one to nine or zero) followed by the lower order group numeral (thousand, hundred, ten, or unit as the case may be). Note how the inferior n in Rules 3 and 4 serves to impart numerical exponents to h and l matching those of H and L respectively. The terminal string of class symbols (an assortment of h, l, u) is now ready for the lexicon.


            Only u remains without a numerical exponent.

            (5) u → u9 / u8 / u7 / u6 / u5 / u4 / u3 / u2 / u1 / u0
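Rules 1, 3, 4 and 5 can be sketched as a small Python function for integers below 10^16, that is, for the case where H4 rather than N + h4 is chosen at Rule 2. The function name expand and the representation of the symbols as strings such as "u5" or "l2" are assumptions made for this illustration only.

```python
def expand(value):
    """Return the terminal string for an integer below 10**16 (H4 chosen at Rule 2)."""
    if not 0 <= value < 10**16:
        raise ValueError("this sketch covers only the non-recursive case")
    tokens = []
    for h in (4, 3, 2, 1):                       # Rule 1: H4 + H3 + H2 + H1
        group = value // 10000 ** (h - 1) % 10000
        for l in (4, 3, 2, 1):                   # Rule 3: L4 + L3 + L2 + L1 + hn
            digit = group // 10 ** (l - 1) % 10
            tokens += [f"u{digit}", f"l{l}"]     # Rules 4 and 5: u + ln, u -> u0..u9
        tokens.append(f"h{h}")
    return tokens


string_232 = expand(232)
print(len(string_232))               # 36
print(" ".join(string_232[-9:]))     # u0 l4 u2 l3 u3 l2 u2 l1 h1
```

Every numeral in this range yields a string of the same length, 36 symbols; only the choice of unit numerals differs.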

            For the convenience of the reader the spellings and glosses of the ultimate constituents may be presented here. The spellings are actually not needed till after Rule 14 below.

Class h:    h4    jao       ‘× 10^12 +’
            h3    yi        ‘× 10^8 +’
            h2    wan       ‘× 10^4 +’
            h1    (none)    ‘× 10^0 +’

Class l:    l4    chyan     ‘× 10^3 +’
            l3    bei       ‘× 10^2 +’
            l2    shi       ‘× 10^1 +’
            l1    (none)    ‘× 10^0 +’

Class u:    u9    jyou      ‘9’
            u8    ba        ‘8’
            u7    chi       ‘7’
            u6    lyu       ‘6’
            u5    wu        ‘5’
            u4    sz        ‘4’
            u3    san       ‘3’
            u2    er        ‘2’
            u1    yi        ‘1’
            u0    ling      ‘0’

            Note the multiplication and addition symbols that go with the group numerals glossed as powers of ten. The string generated after the application of (5) is subject to a certain amount of simplification and free alternation, as seen in the remaining rules.
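The way the glosses combine multiplication and addition can be illustrated with a short Python sketch that reads a full terminal string (as it stands after Rule 5, before any readjustment) and computes its numerical value. The function name gloss_value and the string encoding of the symbols are assumptions of this sketch, not part of the grammar.

```python
def gloss_value(tokens):
    """Evaluate an unsimplified terminal string by its multiplication/addition glosses."""
    total = 0   # sum over the higher order constituents
    group = 0   # sum inside the current higher order constituent
    digit = 0   # the unit numeral read most recently
    for token in tokens:
        kind, n = token[0], int(token[1:])
        if kind == "u":
            digit = n
        elif kind == "l":
            group += digit * 10 ** (n - 1)        # l4..l1 gloss as x 10^3 + .. x 10^0 +
        elif kind == "h":
            total += group * 10 ** (4 * (n - 1))  # h4..h1 gloss as x 10^12 + .. x 10^0 +
            group = 0
    return total


ZERO_GROUP = "u0 l4 u0 l3 u0 l2 u0 l1"
fifty = f"{ZERO_GROUP} h4 {ZERO_GROUP} h3 {ZERO_GROUP} h2 u0 l4 u0 l3 u5 l2 u0 l1 h1".split()
print(gloss_value(fifty))  # 50
```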

Non-distinctive readjustment:

            (6)   u0 + ln → u0
            (7)   u0 + u0 → u0
            (8)   ln + u0 + hn → ln + hn
            (9)   u0 + hn → u0
            (10)  u0 + u0 → u0
            (11a) hn + u0 # → hn #
            (11b) # u0 + un → # un

            These are all essentially zeroing rules that cut out the excess fat left by the earlier rules, and they all involve u0, the all-important líng ‘zero’. Even a short numeral expression like # wǔ shí # ‘50’ will appear in a form glossable in some such fashion after applying (5): (0 × 10^3 + 0 × 10^2 + 0 × 10^1 + 0 × 10^0) × 10^12 + (0 × 10^3 + 0 × 10^2 + 0 × 10^1 + 0 × 10^0) × 10^8 + (0 × 10^3 + 0 × 10^2 + 0 × 10^1 + 0 × 10^0) × 10^4 + (0 × 10^3 + 0 × 10^2 + 5 × 10^1 + 0 × 10^0) × 10^0. These six rules will reduce this to u5 + l2 + h1 (that is, (5 × 10^1) × 10^0). Of course líng does survive in some expressions, in # wǔ chyān líng wǔ # ‘5005’ for example. Indeed the handling of this fact precisely constitutes the principal methodological interest of this note.2 Note the reinsertion of the same rule at two points, (7) and (10); this rule is of course recursive in its effect of reducing a string of successive u0s to a single u0. Rules 6 to 8 simplify lower order constituents within a single higher order constituent. Rules 9 to 11 simplify higher order constituents within the numeral expression as a whole.
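A minimal sketch of the zeroing rules as ordered string rewrites may make their interplay easier to follow. It assumes that a terminal string is written with single spaces between symbols and # at both ends, that 11a deletes a final zero after a higher order group numeral, and that each rule is reapplied until it no longer changes the string; the names RULES and readjust are hypothetical.

```python
import re

# (rule number, pattern, replacement), in the order in which the rules apply.
RULES = [
    ("6",   r"u0 l\d",         "u0"),
    ("7",   r"u0 u0",          "u0"),
    ("8",   r"(l\d) u0 (h\d)", r"\1 \2"),
    ("9",   r"u0 h\d",         "u0"),
    ("10",  r"u0 u0",          "u0"),
    ("11a", r"(h\d) u0 #",     r"\1 #"),
    ("11b", r"# u0 (u\d)",     r"# \1"),
]


def readjust(string):
    """Apply Rules 6 to 11 in order, repeating each until it has no further effect."""
    for _number, pattern, replacement in RULES:
        while True:
            rewritten = re.sub(pattern, replacement, string)
            if rewritten == string:
                break
            string = rewritten
    return string


ZERO_GROUP = "u0 l4 u0 l3 u0 l2 u0 l1"
HEAD = f"# {ZERO_GROUP} h4 {ZERO_GROUP} h3 {ZERO_GROUP} h2"
fifty = f"{HEAD} u0 l4 u0 l3 u5 l2 u0 l1 h1 #"
five_thousand_five = f"{HEAD} u5 l4 u0 l3 u0 l2 u5 l1 h1 #"

print(readjust(fifty))               # # u5 l2 h1 #
print(readjust(five_thousand_five))  # # u5 l4 u0 u5 l1 h1 #
```

In the second result a single u0 survives between l4 and u5, which corresponds to the one líng of ‘5005’.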

            (12a) h1 → ∅
            (12b) l1 → ∅

            Now we can remove the two dummy group numerals h1 and l1, since they have served their function.  They enabled us to simplify Rules 1 to 4 since the last higher order constituent did not have to be treated differently from the remaining constituents of the same rank and also to simplify Rules 8 and 9 for similar reasons.  They have also the added merit of providing a convenient   base for the glossing rules that we may want to apply after the lexicon has been worked through.

            (13a opt.) ln + un + l(n-1) # → ln + un
            (13b opt.) hn + un + l(n-1) # → hn + un
            (14 opt.)  u1 + l2 → l2

These are a couple of idiomatic omissions, illustrated by # yí wàn wǔ chyān # ‘15000’ also appearing as # yí wàn wǔ #, or by # yì shí wǔ # ‘15’ much more commonly appearing as # shí wǔ # (‘ten-five’ rather than ‘one-ten-five’).
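Rules 12 to 14 can be sketched in the same style. The function names are hypothetical, and the optional rules are written as separate functions so that the caller decides whether to apply them. For 13b the sketch assumes that the dropped group numeral is l4 (chyān), as in the ‘15000’ example; that reading is an assumption and is not confirmed by the rule as stated.

```python
import re


def drop_dummies(string):
    """Rules 12a and 12b: delete the dummy group numerals h1 and l1."""
    return re.sub(r" (?:h1|l1)(?= )", "", string)


def rule_13a(string):
    """Optional: drop a final lower order group numeral one rank below the preceding one."""
    def shorten(match):
        higher, unit, lower = match.group(1), match.group(2), match.group(3)
        if int(lower[1]) == int(higher[1]) - 1:
            return f"{higher} {unit} #"
        return match.group(0)
    return re.sub(r"(l\d) (u\d) (l\d) #", shorten, string)


def rule_13b(string):
    """Optional: drop a final l4 after a higher order group numeral (assumed reading)."""
    return re.sub(r"(h\d) (u\d) l4 #", r"\1 \2 #", string)


def rule_14(string):
    """Optional: drop u1 before l2."""
    return re.sub(r"u1 l2", "l2", string)


after_12 = drop_dummies("# u1 l4 u5 l3 u6 l2 h1 #")
print(after_12)                        # # u1 l4 u5 l3 u6 l2 #
print(rule_13a(after_12))              # # u1 l4 u5 l3 u6 #
print(rule_13b("# u1 h2 u5 l4 #"))     # # u1 h2 u5 #
print(rule_14("# u1 l2 u5 #"))         # # l2 u5 #
```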

            For numeral expressions in the telephonic style used, for example, in naming calendar years (as # yī jyǒu sz̀ yī nyán # ‘the year 1941’) we need a set of alternative rules that will replace Rules 6 to 14.

            (6*a) hn → ∅
            (6*b) ln → ∅
            (7*)  # u0 → ∅

Note that ‘the year 5005’ will come out as # wǔ líng líng wǔ nyán # while the regular expression will be # wǔ chyān líng wǔ # with only one líng. Rule (7*) of course will be recursive in its effect of wiping out strings of zeros at the beginning.
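The telephonic rules are simple enough to sketch in a few lines. The sketch assumes the same space-separated encoding of a terminal string as it stands after Rule 5, reads Rule 7* as deleting a zero that stands next to the initial boundary, and uses toneless spellings copied from the lexical table; the names telephonic, spell and SPELLING are hypothetical.

```python
import re

# Toneless spellings of the unit numerals, copied from the lexical table.
SPELLING = {"u0": "ling", "u1": "yi", "u2": "er", "u3": "san", "u4": "sz",
            "u5": "wu", "u6": "lyu", "u7": "chi", "u8": "ba", "u9": "jyou"}


def telephonic(string):
    """Apply Rules 6*a, 6*b and 7* to a full terminal string."""
    string = re.sub(r" [hl]\d", "", string)   # 6*a and 6*b: delete every group numeral
    while string.startswith("# u0 "):          # 7*: delete initial zeros, one at a time
        string = "# " + string[len("# u0 "):]
    return string


def spell(string):
    """Replace each unit numeral by its spelling and drop the boundary signs."""
    return " ".join(SPELLING[token] for token in string.split() if token != "#")


ZERO_GROUP = "u0 l4 u0 l3 u0 l2 u0 l1"
HEAD = f"# {ZERO_GROUP} h4 {ZERO_GROUP} h3 {ZERO_GROUP} h2"
year_5005 = f"{HEAD} u5 l4 u0 l3 u0 l2 u5 l1 h1 #"
year_1941 = f"{HEAD} u1 l4 u9 l3 u4 l2 u1 l1 h1 #"

print(spell(telephonic(year_5005)))  # wu ling ling wu
print(spell(telephonic(year_1941)))  # yi jyou sz yi
```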

            Finally, we have a couple of rules about the phonemic shape of the unit numerals that would, in a fuller grammar, be couched in slightly different terms to cover additional facts not within the scope of this fragment.  The position of these rules will also have to be reconsidered.

            (15 opt.)  u2 (èr) → lyǎng (except in the following environments: before # and before l2)
            (16a)      u1 (yī) → yí (before ˋ)
            (16b)      u1 (yī) → yì (before ˉ, ˊ, ˇ)
            (17a opt.) u7 (chī) → chí (before ˋ)
            (17b opt.) u8 (bā) → bá (before ˋ)

Rules for juncture and the placement of the culminative accent have not been considered here. Probably these can be built into the constituent structure.

            Some sample generative histories may now be presented for some cardinal numeral expressions. For all numerals choosing H4 rather than N + h4 under Rule (2), that is, all numerals from zero to (10^16 − 1), the string is going to look like this after applying Rule (4) (omitting # and + signs):

            u l4 u l3 u l2 u l1 h4   u l4 u l3 u l2 u l1 h3   u l4 u l3 u l2 u l1 h2   u l4 u l3 u l2 u l1 h1

where u may be replaced by anything from u0 to u9 by virtue of Rule (5). The generative histories from that point on for ‘232’, ‘1560’ and ‘400,0506’ are as follows (omitting # and + signs):

            First example: # èr běi sān shí èr # or # lyáng běi sān shí èr # ‘232’.

After Rule 6 (applicable at 13 points):

            u0 u0 u0 u0 h4 u0 u0 u0 u0 h3 u0 u0 u0 u0 h2 u0 u2 l3 u3 l2 u2 l1 h1

After Rule 7 (applicable at 6 points and again at 3 points):

            u0 h4 u0 h3 u0 h2 u0 u2 l3 u3 l2 u2 l1 h1

Rule 8 is inapplicable.

After Rule 9 (applicable at 3 points):

            u0 u0 u0 u0 u2 l3 u3 l2 u2 l1 h1

After Rule 10 (applicable at 2 points and again at 1 point):

            u0 u2 l3 u3 l2 u2 l1 h1

After Rule 11 (11a is inapplicable):

            u2 l3 u3 l2 u2 l1 h1

After Rule 12:

            u2 l3 u3 l2 u2   # èr běi sān shí èr #

Rules 13 and 14 are inapplicable.

After Rule 15 (optional):

            # lyǎng běi sān shí èr #

Rules 16 and 17 are inapplicable.

After applying the general rule governing sequences of Tone ˇ:

            # lyáng běi sān shí èr #

Second example: # yì chyān wú běi lyù shí # or # yì chyān wú běi lyù # ‘1560’.

After Rules 6 to 11 (11a is inapplicable):

            u1 l4 u5 l3 u6 l2 h1

After Rule 12 (12b is inapplicable):

            u1 l4 u5 l3 u6 l2   # yī chyān wǔ běi lyù shí #

After Rule 13 (optional; 13b is inapplicable):

            u1 l4 u5 l3 u6   # yī chyān wǔ běi lyù #

Rules 14 and 15 are inapplicable.

After making the tonal readjustments, we get the two expected forms.

Third example: # sz̀ běi wàn líng wú běi líng lyù # ‘400,0506’.

After Rule 6:

            u0 u0 u0 u0 h4 u0 u0 u0 u0 h3 u0 u4 l3 u0 u0 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 7:

            u0 h4 u0 h3 u0 u4 l3 u0 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 8:

            u0 h4 u0 h3 u0 u4 l3 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 9:

            u0 u0 u0 u4 l3 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 10:

            u0 u4 l3 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 11 (11a is inapplicable):

            u4 l3 h2 u0 u5 l3 u0 u6 l1 h1

After Rule 12:

            u4 l3 h2 u0 u5 l3 u0 u6   # sz̀ běi wàn líng wǔ běi líng lyù #

Rules 13 to 17 are inapplicable.

After making the tonal readjustments, we get the required form.

            To conclude, I should like to highlight two methodological choices made here that have a bearing on evaluation procedures. First, I have handled the patterning entirely through constituent structure rules rather than having both these and transformation rules. (I do not regard the so-called obligatory transformations and the optional non-distinctive transformations like our Rules 15 and 17 as transformation rules properly belonging to grammar.) I have been guided by two considerations. If a complexity can be handled at a “lower” level, then one should not postpone it to a “higher”, costlier level: I have distributed the complexity over constituent structure rules and non-distinctive readjustment rules and eliminated PENG’s optional distinctive transformations (see footnotes * and 2). Further, I would rather have it that the contrast between a kernel and its transform be what Martin Joos once called a “major” one than a “negligible” one. Secondly, in making this choice, we have also made a choice at another point: we have accepted simple generative rules and complicated generative histories rather than complicated generative rules and simpler generative histories (as PENG appears to have done). I think we are justified in doing this as a matter of principle. Moreover the fact that the generative histories of all lower numerals (choosing H4 at Rule 2) run on parallel lines over the greater part of the grammatical stretch is certainly a relieving feature of our choice. That this enables us to handle telephonic style numerals with great ease (Rules 6* and 7*) is of course an added bonus.


*          The stimulus for this note was the paper on Chinese numerals read by Dr. Fred C.C. PENG (Bunker-Ramo Corporation) at the Summer 1965 meeting of the Linguistic Society of America at Ann Arbor, Michigan. I am thankful to him for this and to Mr. James LIANG (Pennsylvania), Dr. William S.-Y. WANG (California at Berkeley), Dr. H.S. BILIGIRI and Dr. D.N. Shankara BHATT (both Deccan College) for useful discussions.

1          The symbols ˉ ˊ ˇ ˋ over the nucleus represent respectively the first, the second, the third, and the fourth tone.

2          Dr. PENG generates only the expressions lacking líng by his constituent structure rules and then goes on to generate the expressions with líng by a set of optional distinctive transformations of the substitution type. In both sets of rules he has to offer a battery of alternatives with no apparent principle holding them together.


            This was published in Indian Linguistics 16: 196-202, 1965 (published 1966) (Sukumar Sen Felicitation Volume).

            Dr. Fred C.C. Peng’s presentation (referred to in footnote *) is available as ‘The Numeric System of Standard Chinese’, Appendix III to ‘Fulcrum Technique for Chinese-English Machine Translation’, Bunker-Ramo Corporation Report, July 1963.

            It is interesting to note that another study was also stimulated by Dr. Peng’s study and its many imperfections (no generation of numerals higher than 9999,9999,9999,9999; the ad hoc nature of some of his rules): see Barron Brainerd, ‘Two grammars for Chinese number names’, Canadian Journal of Linguistics 12:1, 33-51, 1966. The first of the two grammars Brainerd presents, in section 4, ‘A detection grammar of Chinese number names’, is very similar to the one presented here.