**A NOTE ON STANDARD CHINESE NUMERALS**

The purpose of this note* is to focus attention on the structure of complex numerals in Standard Chinese. To this end a set of rewrite rules is presented that would generate the infinitely denumerable set of cardinal numerals in Standard Chinese as used, say, in naming abstract integers in mathematical discourse. It will be interesting to see how these rules would fit into a fuller grammar of noun phrases: how, to be more specific, the cardinal numerals generated here would “subsequently” behave in relation to the so-called measures or classifiers that would ordinarily follow them. (“Subsequently” here of course refers to the descriptive order.) But this has not been attempted in this note.

The starting point of this fragment of a generative grammar may be designated as N, which represents the set of cardinal numeral expressions as named in the abstract, including the expression # líng # ‘zero’. All Chinese forms are cited in the Yale romanization.^{1} There are as many forms generated after applying the lexical rule as there are integers plus, of course, the zero. The remaining rules represent merely non-distinctive morphophonemic readjustment that is either obligatory (conditioned ‘changes’) or optional (free alternations). The ordering of the rules is indicated by the prefixed numerals; small letters are used merely to indicate unordered subdivisions of a complex rule. Thus, (11a) and (11b) may be applied in either order after applying (10). The expression ‘opt.’ prefixed to a rule instructs that the application of the rule is optional. n refers to any of the inferior exponential numerals suffixed to the constituent-position symbols H and L (referring to the higher and the lower order constituents respectively) and to the constituent class symbols h and l (higher and lower order group numerals) and u (unit numerals). As the glosses inserted in the lexical rules will bring out, this is essentially a decimal system reaching up to ‘1,0000,0000,0000’ (the placing of the commas is Chinese) and using the operations of multiplication and addition.

Start with: # N #

Constituent structure rules:

(1) N → H_{4} + H_{3} + H_{2} + H_{1}

This is the basic division into four higher order constituents designating respectively the number of 10^{12}s, 10^{8}s, 10^{4}s, and units. So the highest numeral that can be named without using the next, recursive rule comes out as ‘9999 × 10^{12} + 9999 × 10^{8} + 9999 × 10^{4} + 9999’ (= ‘10^{16} − 1’).

(2) H_{4} → N + h_{4} / H_{4}

This is the recursive deus ex machina that makes the set N an infinitely denumerable set. Thus the next numeral after the one just named comes out as ‘(1 × 10^{4}) × 10^{12} + 0 × 10^{8} + 0 × 10^{4} + 0’ (= ‘10^{16}’).
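The arithmetic behind Rules (1) and (2) can be checked with a small sketch. It works on integer values only, not on the symbol strings of the grammar; the function names `decompose` and `value` are hypothetical and not part of the note. Rule (2) is read here as re-entering N whenever the count of 10^{12}s exceeds 9999.

```python
def decompose(n: int) -> tuple:
    """Split n into the four higher order constituents of Rule (1).

    When the count of 10**12s does not fit in four digits, Rule (2) is
    assumed to apply: H4 is itself expanded as a full numeral N.
    """
    h4, rest = divmod(n, 10**12)
    h3, rest = divmod(rest, 10**8)
    h2, h1 = divmod(rest, 10**4)
    if h4 >= 10**4:              # Rule (2): H4 -> N + h4
        h4 = decompose(h4)
    return (h4, h3, h2, h1)


def value(parts: tuple) -> int:
    """Inverse of decompose: multiply and add as the glosses indicate."""
    h4, h3, h2, h1 = parts
    if isinstance(h4, tuple):
        h4 = value(h4)
    return h4 * 10**12 + h3 * 10**8 + h2 * 10**4 + h1


print(decompose(10**16 - 1))   # (9999, 9999, 9999, 9999)
print(decompose(10**16))       # ((0, 0, 1, 0), 0, 0, 0)
```

The second call shows the recursion: the embedded N is ‘1 × 10^{4}’, which Rule (2) then multiplies by 10^{12}.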

(3) H_{n} → L_{4} + L_{3} + L_{2} + L_{1} + h_{n}

Each higher order constituent is broken down into four lower order constituents designating the thousands, the hundreds, the tens, and the units in it, followed by the higher order group numeral (‘10^{12}’, ‘10^{8}’, ‘10^{4}’, ‘10^{0}’ as the case may be).

(4) L_{n} → u + l_{n}

Finally the lower order constituent is broken into two: the unit numeral (one to nine or zero) followed by the lower order group numeral (thousand, hundred, ten, or unit as the case may be). Note how the inferior n in Rules 3 and 4 serves to impart numerical exponents to h and l matching those of H and L respectively. The terminal string of class symbols (an assortment of h, l, u) is now ready for the lexicon.
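As a minimal sketch of what Rules (1), (3) and (4) produce, the following builds the string of class symbols for an integer below 10^{16}, with the unit numerals already filled in as the lexical rule would do. The function name `expand` and the encoding of symbols as short strings such as `u5`, `l2`, `h1` are assumptions made for illustration; H_{4} rather than N + h_{4} is taken at Rule (2).

```python
def expand(n: int) -> list[str]:
    """Class-symbol string for 0 <= n < 10**16, unit numerals filled in.

    Assumes H4 (not N + h4) is chosen at Rule (2).
    """
    if not 0 <= n < 10**16:
        raise ValueError("this sketch only covers 0 .. 10**16 - 1")
    digits = f"{n:016d}"                       # four groups of four digits
    symbols: list[str] = []
    for i, h in enumerate("4321"):             # Rule (1): H4 + H3 + H2 + H1
        group = digits[4 * i : 4 * i + 4]
        for digit, l in zip(group, "4321"):    # Rule (3): L4 + L3 + L2 + L1 ...
            symbols += [f"u{digit}", f"l{l}"]  # Rule (4): u + l_n, u spelled out
        symbols.append(f"h{h}")                # ... + h_n closes Rule (3)
    return symbols


# Last higher order constituent of '50':
print(" ".join(expand(50)[-9:]))   # u0 l4 u0 l3 u5 l2 u0 l1 h1
```

Every numeral in this range yields a string of the same length (36 symbols), which is the uniformity the later zeroing rules rely on.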

Lexicon:

Only u remains without a numerical exponent.

(5) u → u_{9}/u_{8}/u_{7}/u_{6}/u_{5}/u_{4}/u_{3}/u_{2}/u_{1}/u_{0}.

For the convenience of the reader the spellings and glosses of the ultimate constituents may be presented here. The spellings are actually not needed till after Rule 14 below.

Class h: h_{4} jao ‘× 10^{12} +’

h_{3} yi ‘× 10^{8} +’

h_{2} wan ‘× 10^{4} +’

h_{1} ‘× 10^{0}’

Class l: l_{4} chyan ‘× 10^{3} +’

l_{3} bei ‘× 10^{2} +’

l_{2} shi ‘× 10^{1} +’

l_{1} ‘× 10^{0}’

Class u: u_{9} jyou ‘9’

u_{8} ba ‘8’

u_{7} chi ‘7’

u_{6} lyu ‘6’

u_{5} wu ‘5’

u_{4} sz ‘4’

u_{3} san ‘3’

u_{2} er ‘2’

u_{1} yi ‘1’

u_{0} ling ‘0’

Note the multiplication and addition symbols that go with the group numerals glossed as powers of ten. The string generated after the application of (5) is subject to a certain amount of simplification and free alternation, as seen in the remaining rules.

Non-distinctive readjustment:

(6) u_{0} + l_{n} → u_{0}

(7) u_{0} + u_{0} → u_{0}

(8) l_{n} + u_{0} + h_{n} → l_{n} + h_{n}

(9) u_{0} + h_{n} → u_{0}

(10) u_{0} + u_{0} → u_{0}

(11a) h_{n} + u_{0} # → h_{n} #

(11b) # u_{0} + u_{n} → # u_{n}

These are all essentially zeroing rules that cut out the excess fat left by the earlier rules, and they all involve u_{0}, the all-important líng ‘zero’. Even a short numeral expression like # wǔ shí # ‘50’ will appear in a form glossable in some such fashion after applying (5): (0 × 10^{3} + 0 × 10^{2} + 0 × 10^{1} + 0 × 10^{0}) × 10^{12} + (0 × 10^{3} + 0 × 10^{2} + 0 × 10^{1} + 0 × 10^{0}) × 10^{8} + (0 × 10^{3} + 0 × 10^{2} + 0 × 10^{1} + 0 × 10^{0}) × 10^{4} + (0 × 10^{3} + 0 × 10^{2} + 5 × 10^{1} + 0 × 10^{0}) × 10^{0}. These six rules will reduce this to u_{5} + l_{2} + h_{1} (that is, (5 × 10^{1}) × 10^{0}). Of course líng does survive in some expressions, in # wǔ chyān líng wǔ # ‘5005’ for example. Indeed the handling of this fact precisely constitutes the principal methodological interest of this note.^{2} Note the reinsertion of the same rule at two points, (7) and (10); this rule is of course recursive in its effect of reducing a string of successive u_{0}s to a single u_{0}. Rules 6 to 8 simplify lower order constituents within a single higher order constituent. Rules 9 to 11 simplify higher order constituents within the numeral expression as a whole.
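The zeroing rules can be tried out mechanically. The sketch below encodes the string as space-separated symbols (`u0`, `l4`, `h1`, ...), a convention assumed here for illustration, and applies Rules (6) to (11) in order, each one repeatedly until it no longer changes the string. Rule (11a) is read as deleting a string-final u_{0} that follows a higher order group numeral, and Rule (11b) as deleting a string-initial u_{0} before another unit numeral; both readings are assumptions, as is the name `reduce_zeros`.

```python
import re

# (pattern, replacement) pairs in rule order; ^ and $ stand for the boundary #.
RULES_6_TO_11 = [
    (r"u0 l\d", "u0"),               # (6)   u0 + l_n       -> u0
    (r"u0 u0", "u0"),                # (7)   u0 + u0        -> u0
    (r"(l\d) u0 (h\d)", r"\1 \2"),   # (8)   l_n + u0 + h_n -> l_n + h_n
    (r"u0 h\d", "u0"),               # (9)   u0 + h_n       -> u0
    (r"u0 u0", "u0"),                # (10)  same rule as (7), reinserted
    (r"(h\d) u0$", r"\1"),           # (11a) assumed reading: final u0 dropped
    (r"^u0 (u\d)", r"\1"),           # (11b) assumed reading: initial u0 dropped
]


def reduce_zeros(s: str) -> str:
    """Apply Rules (6)-(11) in order, each to exhaustion."""
    for pattern, repl in RULES_6_TO_11:
        while True:
            new = re.sub(pattern, repl, s)
            if new == s:
                break
            s = new
    return s


ZERO = "u0 l4 u0 l3 u0 l2 u0 l1"     # an all-zero higher order constituent
FIFTY = f"{ZERO} h4 {ZERO} h3 {ZERO} h2 u0 l4 u0 l3 u5 l2 u0 l1 h1"
FIVE_THOUSAND_FIVE = f"{ZERO} h4 {ZERO} h3 {ZERO} h2 u5 l4 u0 l3 u0 l2 u5 l1 h1"

print(reduce_zeros(FIFTY))               # u5 l2 h1
print(reduce_zeros(FIVE_THOUSAND_FIVE))  # u5 l4 u0 u5 l1 h1
```

In the second result a single u_{0} survives between l_{4} and u_{5}, which is the líng of ‘5005’.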

(12a) h_{1} → ∅

(12b) l_{1} → ∅

Now we can remove the two dummy group numerals h_{1}
and l_{1}, since they have served their function.
They enabled us to simplify Rules 1 to 4 since the last higher
order constituent did not have to be treated differently from the
remaining constituents of the same rank and also to simplify Rules
8 and 9 for similar reasons. They
have also the added merit of providing a convenient
base for the glossing rules that we may want to apply after
the lexicon has been worked through.

(13a opt.) l_{n} + u_{n} + l_{n-1} # → l_{n} + u_{n}

(13b opt.) h_{n} + u_{n} + l_{4} # → h_{n} + u_{n}

(14 opt.) u_{1} + l_{2} → l_{2}

These are a couple of idiomatic omissions illustrated by # yí wàn wǔ chyān # ‘15000’, also appearing as # yí wàn wǔ #, or by # yì shí wǔ # ‘15’, much more commonly appearing as # shí wǔ # (‘ten-five’ rather than ‘one-ten-five’).
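A minimal sketch of Rules (12) to (14), starting from strings as they stand after Rules (6) to (11). Symbols are encoded as space-separated strings (an illustrative convention), spellings are the toneless forms of the lexicon list, and the function names are hypothetical. Rule (13b) is read as dropping a final l_{4} after a higher order group numeral and a unit numeral, which is what the ‘15000’ example requires; that reading is an assumption.

```python
import re

SPELLING = {
    "h4": "jao", "h3": "yi", "h2": "wan",
    "l4": "chyan", "l3": "bei", "l2": "shi",
    "u9": "jyou", "u8": "ba", "u7": "chi", "u6": "lyu", "u5": "wu",
    "u4": "sz", "u3": "san", "u2": "er", "u1": "yi", "u0": "ling",
}


def rule_12(s: str) -> str:
    """(12a), (12b): delete the dummy group numerals h1 and l1."""
    return " ".join(t for t in s.split() if t not in ("h1", "l1"))


def rule_13(s: str) -> str:
    """(13a), (13b), optional: drop the last group numeral when it is implied."""
    m = re.search(r"l(\d) u\d l(\d)$", s)
    if m and int(m.group(2)) == int(m.group(1)) - 1:   # (13a)
        return s.rsplit(" ", 1)[0]
    if re.search(r"h\d u\d l4$", s):                    # (13b), assumed reading
        return s.rsplit(" ", 1)[0]
    return s


def rule_14(s: str) -> str:
    """(14), optional: u1 + l2 -> l2."""
    return re.sub(r"u1 l2", "l2", s)


def spell(s: str) -> str:
    return " ".join(SPELLING[t] for t in s.split())


print(spell(rule_12("u1 l4 u5 l3 u6 l2 h1")))           # yi chyan wu bei lyu shi
print(spell(rule_13(rule_12("u1 l4 u5 l3 u6 l2 h1"))))  # yi chyan wu bei lyu
print(spell(rule_13(rule_12("u1 l1 h2 u5 l4 h1"))))     # yi wan wu
print(spell(rule_14(rule_12("u1 l2 u5 l1 h1"))))        # shi wu
```

The tonal readjustments are left out, so the output shows the segmental shape of the forms only.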

For numeral expressions in the telephonic style used, for example, in naming calendar years (as # yī jyǒu sz̀ yī nyán # ‘the year 1941’) we need a set of alternative rules that will replace Rules 6 to 14.

(6*a) h_{n} → ∅

(6*b) l_{n} → ∅

(7*) # u_{0} → #

Note that ‘the year 5005’ will come out as # wǔ líng líng wǔ nyán # while the regular expression will be # wǔ chyān líng wǔ # with only one líng. Rule (7*) of course will be recursive in its effect of wiping out strings of zeros at the beginning.
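The telephonic rules are simple enough to sketch in a few lines. The code below rebuilds the 36-symbol string for an integer below 10^{16}, deletes all group numerals as in (6*a) and (6*b), and then strips initial zeros as in (7*). The function name `telephonic` and the string encoding of the symbols are assumptions; spellings are the toneless forms of the lexicon list, and the word for ‘year’ is not added.

```python
UNITS = {
    "u9": "jyou", "u8": "ba", "u7": "chi", "u6": "lyu", "u5": "wu",
    "u4": "sz", "u3": "san", "u2": "er", "u1": "yi", "u0": "ling",
}


def telephonic(n: int) -> str:
    """Digit-by-digit reading of 0 <= n < 10**16 via Rules (6*) and (7*)."""
    digits = f"{n:016d}"
    symbols = []
    for i, h in enumerate("4321"):                      # Rules (1), (3), (4)
        for digit, l in zip(digits[4 * i : 4 * i + 4], "4321"):
            symbols += [f"u{digit}", f"l{l}"]
        symbols.append(f"h{h}")
    symbols = [s for s in symbols if s[0] not in "hl"]  # (6*a), (6*b)
    while symbols and symbols[0] == "u0":               # (7*), recursive
        symbols.pop(0)
    return " ".join(UNITS[s] for s in symbols)


print(telephonic(1941))   # yi jyou sz yi
print(telephonic(5005))   # wu ling ling wu
```

Both interior zeros of ‘5005’ survive here, in contrast with the single líng of the regular expression.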

Finally, we have a couple of rules about the phonemic shape
of the unit numerals that would, in a fuller grammar, be couched in
slightly different terms to cover additional facts not within the
scope of this fragment. The position of these rules will also have
to be reconsidered.

(15 opt.) u_{2} (èr) (except in the following environments: before # and before l_{2}) → lyǎng

(16a) u_{1} (yī) (before ˋ) → yí

(16b) u_{1} (yī) (before ˉ, ˊ, ˇ) → yì

(17a opt.) u_{7} (chī) (before ˋ) → chí

(17b opt.) u_{8} (bā) (before ˋ) → bá

Rules for juncture and the placement of the culminative accent have not been considered here. Probably these can be built into the constituent structure.

Some sample generative histories may now be presented for some cardinal numeral expressions. For all numerals choosing H_{4} rather than N + h_{4} under Rule (2), that is, all numerals from zero to (10^{16} − 1), the string is going to look like this after applying Rule (4) (omitting # and + signs):

u l_{4} u l_{3} u l_{2} u l_{1} h_{4} u l_{4} u l_{3} u l_{2} u l_{1} h_{3} u l_{4} u l_{3} u l_{2} u l_{1} h_{2} u l_{4} u l_{3} u l_{2} u l_{1} h_{1}

where u may be replaced by anything from u_{0} to u_{9} by virtue of Rule (5). The generative histories from that point on for ‘232’, ‘1560’ and ‘400,0506’ are as follows (omitting # and + signs):

First example: # èr běi sān shí èr # or # lyáng běi sān shí èr # ‘232’.

After Rule 6 (applicable at 13 points):

u_{0} u_{0} u_{0} u_{0} h_{4} u_{0} u_{0} u_{0} u_{0} h_{3} u_{0} u_{0} u_{0} u_{0} h_{2} u_{0} u_{2} l_{3} u_{3} l_{2} u_{2} l_{1} h_{1}

After Rule 7 (applicable at 6 points and again at 3 points):

u_{0} h_{4} u_{0} h_{3} u_{0} h_{2} u_{0} u_{2} l_{3} u_{3} l_{2} u_{2} l_{1} h_{1}

Rule 8 is inapplicable.

After Rule 9 (applicable at 3 points):

u_{0} u_{0} u_{0} u_{0} u_{2} l_{3} u_{3} l_{2} u_{2} l_{1} h_{1}

After Rule 10 (applicable at 2 points and again at 1 point):

u_{0} u_{2} l_{3} u_{3} l_{2} u_{2} l_{1} h_{1}

After Rule 11 (11a is inapplicable):

u_{2} l_{3} u_{3} l_{2} u_{2} l_{1} h_{1}

After Rule 12:

u_{2} l_{3} u_{3} l_{2} u_{2}

# èr běi sān shí èr #

Rules 13 and 14 are inapplicable.

After Rule 15 (optional):

# lyǎng běi sān shí èr #

Rules 16 and 17 are inapplicable.

After applying the general rule governing sequences of Tone ˇ:

# lyáng běi sān shí èr #

Second example: # yì chyān wú běi lyù shí # or # yì chyān wú běi lyù # ‘1560’.

After Rules 6 to 11 (11a is inapplicable):

u_{1} l_{4} u_{5} l_{3} u_{6} l_{2} h_{1}

After Rule 12 (12b is inapplicable):

u_{1} l_{4} u_{5} l_{3} u_{6} l_{2}

# yī chyān wǔ běi lyù shí #

After Rule 13 (optional; 13b is inapplicable):

# yī chyān wǔ běi lyù #

After making the tonal readjustments, we get the two expected forms.

Third example: # sz̀ běi wàn líng wú běi líng lyù # ‘400,0506’.

After Rule 6:

u_{0} u_{0} u_{0} u_{0} h_{4} u_{0} u_{0} u_{0} u_{0} h_{3} u_{0} u_{4} l_{3} u_{0} u_{0} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 7:

u_{0} h_{4} u_{0} h_{3} u_{0} u_{4} l_{3} u_{0} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 8:

u_{0} h_{4} u_{0} h_{3} u_{0} u_{4} l_{3} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 9:

u_{0} u_{0} u_{0} u_{4} l_{3} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 10:

u_{0} u_{4} l_{3} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 11 (11a is inapplicable):

u_{4} l_{3} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6} l_{1} h_{1}

After Rule 12:

u_{4} l_{3} h_{2} u_{0} u_{5} l_{3} u_{0} u_{6}

# sz̀ běi wàn líng wǔ běi líng lyù #

Rules 13 to 17 are inapplicable.

After making the tonal readjustments, we get the required form.
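Histories of this kind can be regenerated step by step with a short sketch. It builds the class-symbol string for an integer below 10^{16} and records the string after each of Rules (6) to (12). The space-separated symbol encoding and the names `expand` and `history` are assumptions; Rules (11a) and (11b) are read as deleting a final u_{0} after a higher order group numeral and an initial u_{0} before a unit numeral respectively, which is also an assumption.

```python
import re


def expand(n: int) -> str:
    """Class-symbol string for 0 <= n < 10**16 (H4 chosen at Rule 2)."""
    digits = f"{n:016d}"
    out = []
    for i, h in enumerate("4321"):
        for digit, l in zip(digits[4 * i : 4 * i + 4], "4321"):
            out += [f"u{digit}", f"l{l}"]
        out.append(f"h{h}")
    return " ".join(out)


RULES = [
    ("6", r"u0 l\d", "u0"),
    ("7", r"u0 u0", "u0"),
    ("8", r"(l\d) u0 (h\d)", r"\1 \2"),
    ("9", r"u0 h\d", "u0"),
    ("10", r"u0 u0", "u0"),
    ("11", r"(h\d) u0$", r"\1"),     # (11a), assumed reading
    ("11", r"^u0 (u\d)", r"\1"),     # (11b), assumed reading
]


def history(n: int) -> dict[str, str]:
    """Map each rule number from 6 to 12 to the string left after that rule."""
    s = expand(n)
    steps = {}
    for label, pattern, repl in RULES:
        while True:                  # each rule is applied to exhaustion
            new = re.sub(pattern, repl, s)
            if new == s:
                break
            s = new
        steps[label] = s
    # (12a), (12b): drop the dummy group numerals h1 and l1
    steps["12"] = " ".join(t for t in s.split() if t not in ("h1", "l1"))
    return steps


for rule, string in history(4000506).items():
    print(f"After Rule {rule}: {string}")
```

For ‘400,0506’ the last line printed is `After Rule 12: u4 l3 h2 u0 u5 l3 u0 u6`, the same string as in the third example.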

To conclude, I should like to highlight two methodological choices we have made that have a bearing on evaluation procedures. First, I have handled the patterning entirely through constituent structure rules rather than having both these and transformation rules. (I do not regard the so-called obligatory transformations and the optional non-distinctive transformations like our Rules 15 and 17 as transformation rules properly belonging to grammar.) I have been guided by two considerations. If a complexity can be handled at a “lower” level, then one should not postpone it to a “higher”, costlier level. I have distributed the complexity over constituent structure rules and non-distinctive readjustment rules and eliminated PENG’s optional distinctive transformations (see footnotes * and 2). Further, I would rather have it that the contrast between a kernel and its transform be what Martin Joos once called a “negligible” one than a “major” one. Secondly, in making this choice, we have also made a choice at another point: we have accepted simple generative rules and complicated generative histories rather than complicated generative rules and simpler generative histories (as PENG appears to have done). I think we are justified in doing this as a matter of principle. Moreover, the fact that the generative histories of all lower numerals (choosing H_{4} at Rule 2) run on parallel lines over the grammatical stretch is certainly a relieving feature of our choice. That this enables us to handle telephonic style numerals with great ease (Rules 6* and 7*) is of course an added bonus.

------------------------------------------------------------------------------------------------------------

* The stimulus of this note was the paper on Chinese numerals read by Dr. Fred C.C. PENG (Bunker-Ramo Corporation) at the Summer 1965 meeting of the Linguistic Society of America at Ann Arbor, Michigan. I am thankful to him for this and to Mr. James LIANG (Pennsylvania), Dr. William S.-Y. WANG (California at Berkeley), Dr. H.S. BILIGIRI and Dr. D.N. Shankara BHATT (both Deccan College) for useful discussions.

1 The symbols ˉ ˊ ˇ ˋ over the nucleus represent respectively the first, the second, the third, and the fourth tone.

2 Dr. PENG generates only the expressions lacking líng by his constituent structure rules and then goes on to generate the expressions with líng by a set of optional distinctive transformations of the substitution type. In both sets of rules he has to offer a battery of alternatives with no apparent principle holding them together.

COLOPHON:

This was published in *Indian Linguistics* 16: 196-02,
1965 (published 1966) (Sukumar Sen Felicitation volume).

Dr. Fred C.C. Peng’s presentation (referred to in footnote *) is available as ‘The Numeric System of Standard Chinese’, Appendix III to ‘Fulcrum Technique for Chinese-English Machine Translation’, Bunker-Ramo Corporation Report, July 1963.

It is interesting to note that another study was also stimulated by Dr. Peng’s study and its many imperfections (no generation of numerals higher than 9999,9999,9999,9999; the ad hoc nature of some of his rules): see Barron Brainerd, ‘Two grammars for Chinese number names’, *Canadian Journal of Linguistics* 12:1, 83-51, 1966. The first of the two grammars Brainerd presents, in section 4, ‘A detection grammar of Chinese number names’, is very similar to the one presented here.