FREQUENCY COUNTS: THE HOW AND THE WHY
What follows is the ‘front matter’ of the Oriya language frequency count that I undertook at Deccan College and completed in June 1966. The reasons for presenting the front matter (Table of Contents, Preface, Introduction, and Appendixes 1-3 to the Introduction) in this fashion are twofold:
(1) The body of the text, consisting of the frequency counts in the form of Lists 1-8 as mentioned in the ‘Contents’, has already been published, but without the front matter, thanks to a gross oversight on the publisher’s part. Those who use the book will naturally have looked for some such introduction. They will find this presentation indispensable for the proper use of the book.
(2) Other scholars engaged in frequency counts of this kind, or even merely planning such projects, will probably find this presentation useful as a model worth considering. They will find the sections of the Introduction entitled ‘Procedural Steps’ (steps 1-14, giving the ‘how’) and ‘Possible Applications’ (technical, educational, and scholarly, giving the ‘why’) suggestive even if they have little or no interest in Oriya.
I trust that this presentation will be of some use.
C O N T E N T S
Preface
Introduction
Appendix 1 Oriya alphabet in its graphic and phonic aspects
Appendix 2 Grammatically important endings
Appendix 3 The source material
List 1 Words in alphabetical order with their frequencies
List 2 Words in rank order of descending frequency (of 10 or above)
List 3 Syllables in alphabetical order with their frequencies
List 4 Syllables in rank order of descending frequency
List 5 Letter symbols in alphabetical order with their frequencies
List 5A Graphemes in alphabetical order with their frequencies
List 5B Phonemes in alphabetical order with their frequencies
List 6 Letter symbols in rank order of descending frequency
List 7 Grammatically important endings in alphabetical order with their frequencies
List 8 Grammatically important endings in rank order of descending frequency
*************
P R E F A C E
This study of the phonemic and morphemic frequencies in Oriya is part of a project sponsored by the Ministry of Education, Government of India, for ‘the evolution of a system of shorthand suited to the genius of Hindi language in particular and other regional languages in general’.
Dr. S. V. Bhagwat’s doctoral dissertation on the Marathi frequencies antedates this project, under which studies in Hindi, Gujarati, Kannada, and Malayalam have been completed. Work on Oriya was carried out at Deccan College. It was started early in 1963 and completed in a little over three years’ time. Benefiting from the other studies in this series, this study suffered on fewer occasions from false starts and retracing of steps.
Dr. A. M. Ghatage gave generously of his time and advice and provided general supervision. Dr. D. P. Pattanayak (of the American Institute of Indian Studies) was consulted on the phonological system and the grammatical endings of Oriya. Dr. S. G. Prabhu Ajgaonkar, Research Associate in Statistics, gave valuable technical advice and took over immediate supervision during my leave of absence from April to October 1965. Mr. Mohan Charan Senapati, B.A., a native speaker of Oriya, was appointed a Research Scholar and carried out the actual counting, the processing and arranging of the slips, the machine-sorting of the punched cards, and the preparation of the lists; he also helped with the proofreading of the typescript. Mrs. B. V. Guhagarkar did the punching of the cards and helped with their sorting. Miss S. M. Kulakarni did the mechanical verification of the punched cards. Mr. D. D. Phadke was entrusted with the complicated typing.
My thanks are due to all of these as also to
the office and the library of Deccan College.
It
is hoped that the results presented here will be of use not only for
the immediate purpose of devising methods of speedwriting but also
for various other technical, educational, and scholarly purposes.
30 June 1966.
A. R. Kelkar.
I N T R O D U C T I O N
ORIENTATION
The fact that this study of phonological, lexical, and grammatical frequencies in Oriya is one of a series has imparted to it a set shape, so that the results obtained for the various Indian languages covered remain comparable. Though the devising of methods of speedwriting was the end immediately in view, the work was designed in such a way that other purposes might also be served.
The immediate goal of this study was the preparation of four pairs of lists:
(i) The words with the number of times each occurred in the corpus.
(ii) The syllable shapes with their frequencies.
(iii) The letters in their graphic and phonic aspects with their frequencies.
(iv) Some grammatically important endings (whether consisting of single morphs or not) with their frequencies.
The lists are presented first in the Oriya alphabetical order and then in the rank order of descending frequency.
In counting words the various grammatical forms of what would be a single entry-heading in a dictionary were counted separately; it was as if take, takes, took, taken, taking were all treated as different word-types. In deciding where one word ends and another begins, conventional spacing was the criterion used, as if full stop, full-time, and fulfill were counted as two, one, and one word-tokens respectively. All this sounds (and is) a little arbitrary, but unavoidably so in handling a large corpus of about 1,00,000 word-tokens.
The corpus consisted of word-tokens sampled out of a larger body of printed matter in the form of books published in contemporary standard Oriya and of periodicals covering a year’s duration. It was decided that obtaining and using radio and movie scripts, as was done in some of the other studies in this series, was not feasible.
A word about the slightly technical but very important distinction between type and token used earlier may not be out of place. When we are counting units of any kind (whether letters or syllables or words or suffixes) we can do so in either of two ways: either count the occurrences of tokens or count the recurrent types. For example, in the text it is raining, the letter-type i has 4 letter-tokens, the letter-type n has 2 letter-tokens, etc., and in all there are 7 letter-types and 11 letter-tokens.
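The type/token distinction can be illustrated with a short Python sketch. The variable names are illustrative only and are not part of the original study; the text counted is the example it is raining given above.

```python
from collections import Counter

text = "it is raining"

# Letter-tokens: every single occurrence of a letter (spaces are not letters).
letter_tokens = [ch for ch in text if ch.isalpha()]

# Letter-types: the distinct letters, each mapped to its number of tokens.
letter_types = Counter(letter_tokens)

print(len(letter_tokens))                    # number of letter-tokens
print(len(letter_types))                     # number of letter-types
print(letter_types["i"], letter_types["n"])  # tokens of the types i and n
```

The same two-way count applies unchanged to syllables, words, or endings once the text has been cut into the relevant units.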
Before looking at some of the findings and possible applications, we should be clearly aware of the procedure used in arriving at the lists. The procedural statement will enable the reader to understand and evaluate the results properly; it may also help some future worker undertaking a similar frequency count.
PROCEDURAL STEPS
For the sake of convenience of exposition, a set of distinct steps is described below. In actual practice, of course, there was a good deal of overlap and adjustment to the availability of manpower, machines, and material.
Steps 1 to 3 below constitute the stage of the collection of data. The next stage, namely collation and analysis of the data, comprises steps 4 to 8. The remaining steps, 9 to 14, cover the final stage leading to the presentation of the results in the form of 10 lists, numbered 1, 2, 3, 4, 5, 5A, 5B, 6, 7, and 8. The three stages took roughly a year each.
Step 1 Selecting and obtaining the source material:
The body of source material had to be large enough so that a sample of about one lakh word-tokens could be obtained at the rate of 2 word-tokens per page of a book or per column of a periodical. It also had to be diversified enough and distributed properly so that the whole represented a fair selection of contemporary standard Oriya as used in printed matter. The following table will give a general idea of the distribution of the material.
Source | Number of titles | Matter | Year of publication | Word-tokens
Books | 85 | 16,532 pages | 1896 to 1963 (about 75 from 1951 to 1953) | 33,033
Monthlies | 3 | 6,383 columns | 1963-4 | 12,760
Dailies | 2 | 30,634 columns | 1963-4 | 54,271
Total: | | | | 1,00,064
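As a quick arithmetic check on the table, the following Python sketch adds up the word-tokens drawn from each kind of source and works out each kind's share of the corpus. The dictionary name and the rounding to one decimal place are choices made here for illustration; the figures are those of the table.

```python
# Word-tokens contributed by each kind of source (figures from the table).
word_tokens = {"Books": 33033, "Monthlies": 12760, "Dailies": 54271}

total = sum(word_tokens.values())

# Share of each source in the whole corpus, in percent.
shares = {source: round(100 * count / total, 1)
          for source, count in word_tokens.items()}

print(total)   # 100064, written 1,00,064 in the lakh grouping used in the text
print(shares)
```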
Geographical distribution was not a serious problem, as most of the publishing activity in Oriya is centred in Cuttack, Orissa, and, to a lesser extent, in Calcutta, W. Bengal. In securing diversity of contents and media, due weightage was given to the effective readership.
The 85 book titles were distributed as under:
Novels                                 13
Collections of short stories           15
Humour                                  2
Stage plays                             8
Biography and autobiography             8
Essays and literary criticism           8
                                    -----
Belles-lettres total                   53
                                    -----
Science and science fiction             6
Social and political writings           7
History                                 2
Non-scientific technical                1
Travel                                  1
                                    -----
Adult non-literary total               17
                                    -----
School texts                            9
For children                            4
General knowledge for children          2
                                    -----
Non-adult total                        15
                                    -----
Total                                  85
Of the 3 monthlies one was primarily devoted to political satire, the other two being literary. The 2 dailies were the ones with the highest circulation. See Appendix 3 of this Introduction for a complete list of the source material with the breakdown of the corpus.
Only 2 word-tokens were randomly pulled out of each page or column. Thus less than 2 percent of the running text was utilized for the count. This ensured that the frequency of a given item would not be exaggerated through its constant recurrence in a short span owing to the subject matter of the text.
Step 2 Sampling of the source material in order to obtain the corpus:
Each item (book or periodical) was ordinarily excerpted from twice. In the first count were underlined the first word of each page of a book, the first word of each column of each page of a monthly (2-3 columns to a page), and the last word of the seventh line of each column of each page of a daily (8 columns to a page). In the second count were underlined the last word of the first line of the last paragraph of each page of a book, the last word of the first line of the last paragraph of each column of each page of a monthly, and the last word of the seventh line of each column of each page of a daily. (The procedure for dailies was designed to avoid the date lines and the news agency names.) A count was maintained of the word-tokens extracted. As soon as the figure 1,00,000 was exceeded, collection was suspended (the actual count being 1,00,064).
The way a word was defined for this operation has already been indicated. An incomplete word marked with a hyphen at the end of a line was completed from the next line. In applying the rules of random choice above the following items were ignored: proper names, abbreviations, figures (but not spelt-out numerals), the running heading of a page, news headings in a daily, verse citations, Sanskrit citations, matter not in the Oriya script, and display or fancy printing in the advertisements (but not Sanskrit or English loanwords in Oriya script nor the ordinary letterpress in an advertisement). No page that had anything on it was left out.
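The two counts for a book can be sketched in Python as follows. This is only an illustration under simplifying assumptions: a page is modelled as a list of paragraphs, a paragraph as a list of lines, and a line as a list of words; the function names and the placeholder words are invented here; and the rules for dailies and for ignored items (proper names, figures, headings, and so on) are not modelled.

```python
# A page is a list of paragraphs; a paragraph is a list of lines;
# a line is a list of words. (Modelling assumption for this sketch.)

def first_count_word(page):
    """First count: the first word of the page."""
    return page[0][0][0]

def second_count_word(page):
    """Second count: the last word of the first line of the last paragraph."""
    return page[-1][0][-1]

def sample_item(pages):
    """Excerpt an item twice: first count over all pages, then second count."""
    first = [first_count_word(page) for page in pages]
    second = [second_count_word(page) for page in pages]
    return first + second

# One made-up page with two paragraphs of two lines each.
page = [
    [["alpha", "beta", "gamma"], ["delta", "epsilon"]],
    [["zeta", "eta", "theta"], ["iota"]],
]
corpus = sample_item([page])
print(corpus)
```

Each page thus contributes exactly two word-tokens, which matches the sampling rate stated under step 1.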
Step 3 Preparation of the word-token slips:
For each word-token so underlined, a slip (3” X b” size used
throughout) was written out with the word in Oriya script at the center. No attempt to normalize orthography was made
at this stage beyond correcting obvious misprints.
Step 4 Arranging the slips alphabetically:
All slips were arranged in a single alphabetical file. The Oriya order was used (see Appendix 1 of this Introduction). This was done by hand with the help of a pigeon-holed cabinet, taking one letter at a time from left to right.
Step 5 Removal of duplicate slips:
For each word-type only one slip was retained, the duplicates being discarded. In doing so the total number of tokens appearing in the corpus for the word-type (the frequency of that word) was written in the top right corner of the slip. When a word-type had alternate spellings, a slip with the preferred spelling was retained with the other spellings noted in the bottom left corner.
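In present-day terms, step 5 amounts to collapsing the token slips into a frequency table keyed by the preferred spelling. The Python sketch below is a hedged illustration, not the procedure actually used (which was done by hand); the function name, the sample slips, and the alternate spelling "ilakā" are all invented for the example.

```python
from collections import Counter

def merge_duplicates(token_slips, preferred_spelling):
    """Collapse token slips into word-type frequencies.

    token_slips: list of spellings, one per word-token slip.
    preferred_spelling: dict mapping an alternate spelling to the preferred one.
    Returns (frequency per word-type, alternate spellings per word-type).
    """
    frequency = Counter()
    alternates = {}
    for spelling in token_slips:
        head = preferred_spelling.get(spelling, spelling)
        frequency[head] += 1           # tokens in alternate spellings count too
        if head != spelling:
            alternates.setdefault(head, set()).add(spelling)
    return frequency, alternates

slips = ["kəṇə", "ilaka", "kəṇə", "ilakā", "ilaka"]  # made-up token slips
preferred = {"ilakā": "ilaka"}                        # hypothetical alternate
frequency, alternates = merge_duplicates(slips, preferred)
print(frequency, alternates)
```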
Step 6 Preparation of the word-type slips:
The 20968 slips so retained were then processed as shown in the diagram to yield word-type slips. Each slip carried the following entries; the numbers refer to the notes below.
(1) Spelling in Oriya script
(2) Spelling in Roman with syllable divisions
(3) Word frequency
(4) Serial number
(5) Alternate spellings in Roman
(6) Number of letters in each syllable
(7) Grammatical type
(8) Endings symbolized
Notes:
1. This is the preferred spelling in the judgement of the Research Scholar who prepared all the slips. He consulted Pramodacandara Deb’s Pramoda-abhidhāna (2 vols, Cuttack, 1942).
2. A Roman writing system was devised consisting of 51 letter-symbols in a certain order (analogous to the Oriya alphabetical order). The symbols, together with their equivalence to the letters of the Oriya script (the graphemes of Oriya) and to the units of the Oriya sound system (the phonemes of Oriya), are set out in Appendix 1. This enables us to deduce the frequencies of the graphemes (List 5A) and of the phonemes (List 5B) from those of the letter symbols (List 5). The slips were arranged in the order of the letter symbols. Syllabic divisions were shown with hyphens. (For the rules for inserting hyphens see Appendix 1.)
3. This is carried over from step 5. The number includes the occurrences of the word-type in alternate spellings.
4. This number, from 1 to 20968, shows the place of the slip in the alphabetical order and links it to the corresponding punched card.
5. These are recorded later in List 1.
6. Thus for a word like i-la-ka the notation at this point will be 122, that is, 1 letter in the first syllable, 2 in the second, and 2 in the third.
7. All words were classified into 5 types as follows:
Type 0 without any of the listed endings.
Type 1 with one nominal ending.
Type 2 with two nominal endings.
Type x with one verbal ending.
Type y with two verbal endings.
8. A list of the principal grammatical endings was first prepared and a code number was assigned to each (see Appendix 2 of this Introduction). The endings occurring in the word were noted here in the form of code numbers. Thus 22-07 means that the word contains ending 22 followed by ending 07.
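The notation of note 6 can be sketched in Python. The sketch assumes that a hyphenated Roman spelling is given and that multi-character letter symbols such as kh count as single letters (Appendix 1); the function names and the small symbol inventory are illustrative only and cover just the examples shown.

```python
def split_letters(syllable, symbols):
    """Cut a syllable into letter symbols, preferring the longest match
    so that a digraph such as 'kh' is taken as one letter."""
    ordered = sorted(symbols, key=len, reverse=True)
    letters, i = [], 0
    while i < len(syllable):
        for symbol in ordered:
            if syllable.startswith(symbol, i):
                letters.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError(f"no letter symbol matches at {syllable[i:]!r}")
    return letters

def letters_per_syllable(word, symbols):
    """Return the digit string of note 6 for a hyphenated word."""
    return "".join(str(len(split_letters(s, symbols))) for s in word.split("-"))

# A small, incomplete subset of the 51 letter symbols (assumption).
SYMBOLS = {"i", "a", "ə", "l", "k", "kh", "ṇ"}

print(letters_per_syllable("i-la-ka", SYMBOLS))
print(letters_per_syllable("kha-i", SYMBOLS))
```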
Step 7 Preparation of a punched card for each word-type slip:
ICT cards of type 4-354 with columns 1 to 80 and twelve rows (x, y, 0, and 1 to 9) were used. The columns were utilized for punching as follows:
1 to 5 Coding the serial number (the first digit in column 1, blanks to the right), tying the card to the slip.
6 to 9 Coding the word frequency number followed by x (thus, frequency 13 was punched as follows: 1 under 6, 3 under 7, x under 8, nothing under 9).
10 to 14 Coding the spelling of the first syllable with blanks to the left. For the code number that is assigned to each letter symbol and is indicated by single or double punching, see under that column in the table included in Appendix 1.
15 to 54 Coding the spellings of the second to the ninth syllables in a similar fashion with 5 columns to a syllable. (No word in the corpus exceeded 9 syllables.)
55 to 63 Coding the number of letters in each syllable followed by 0 with blanks to the right (see step 6, note 6).
64 Coding the grammatical type (0, 1, 2, x, or y, as the case may be; see step 6, note 7).
65 to 68 Coding the grammatical endings: the first in 65-66, the second if any in 67-68. See Appendix 2 for the code.
69 to 80 Not utilized and left blank.
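The column layout can be pictured as a fixed-width record. The Python sketch below is an illustration only: it writes one character per column, leaves the syllable-spelling columns 10 to 54 blank, and does not model the single or double punching of letter codes. The function name and argument names are invented, and the sketch simply rejects any value too wide for its field.

```python
def punch_card(serial, frequency, letters_per_syllable, gram_type, endings=()):
    """Lay out one word-type record on an 80-column card (as a string)."""
    if len(endings) > 2:
        raise ValueError("at most two endings fit in columns 65-68")
    card = [" "] * 80

    def put(first_col, last_col, text):
        # Columns are numbered from 1, as on the card.
        if len(text) > last_col - first_col + 1:
            raise ValueError(f"{text!r} does not fit columns {first_col}-{last_col}")
        for offset, ch in enumerate(text):
            card[first_col - 1 + offset] = ch

    put(1, 5, str(serial))                   # serial number, blanks to the right
    put(6, 9, f"{frequency}x")               # frequency followed by x
    put(55, 63, letters_per_syllable + "0")  # letters per syllable, then 0
    put(64, 64, gram_type)                   # grammatical type: 0, 1, 2, x, y
    for slot, code in enumerate(endings):    # endings in 65-66 and 67-68
        put(65 + 2 * slot, 66 + 2 * slot, code)
    return "".join(card)

card = punch_card(1234, 13, "122", "2", ("22", "07"))
print(repr(card))
```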
Step 8 Verification of punched cards:
As soon as the puncher transferred the data from the slip to the punched card with the help of a punching machine, the verifier verified its correctness against the slip with the help of a verifying machine. If necessary a new card was punched and the wrongly punched one discarded.
Step 9 Preparation of List 1:
The file of word slips at the end of step 6 was utilized for writing out List 1: word-types spelt in Roman letter symbols, arranged alphabetically, with the alternate spellings (if any) in Roman and with the frequency (out of 1,00,064 tokens) noted against each.
Step 10 Preparation of List 2:
The slips were hand-sorted according to the frequency number (slips with frequency numbers below ten were not sorted) and List 2 was written out: words arranged in order of descending frequency from the maximum down to frequency 10. Under each frequency number the order is alphabetical. Words with a frequency of 1 to 9 are too many to be listed profitably; such lists can always be compiled by scanning List 1.
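The ordering of List 2 (descending frequency, alphabetical within each frequency, cut off at 10) can be sketched in Python. The function name and the toy frequencies are invented for illustration, and plain string order stands in for the Oriya alphabetical order unless a different key is supplied.

```python
def list_2(frequency, minimum=10, alphabetical_key=None):
    """Rank word-types by descending frequency, alphabetically within a rank.

    alphabetical_key: optional function giving the sort key of a word
    (plain string order is used here as a stand-in for the Oriya order).
    """
    key = alphabetical_key or (lambda word: word)
    kept = [(word, f) for word, f in frequency.items() if f >= minimum]
    return sorted(kept, key=lambda item: (-item[1], key(item[0])))

frequency = {"ka": 12, "ba": 40, "aba": 12, "la": 9}  # made-up word-types
ranked = list_2(frequency)
print(ranked)
```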
Step 11 Machine-sorting for the preparation of lists of syllable-types, letter-types, and ending-types:
The punched cards were fed into a tabulator for the purpose of sorting. First, they were sorted according to the frequency number. Subsequently, each stack was handled separately. Note the use of the X suffixed to the frequency number: thus, all cards bearing X in column 8 have frequencies from 10 to 99 and could conveniently be separated first, before they were sorted further into the stacks for 10, for 11, etc.
Using the 0 in columns 55-63 (analogously to the use of X in columns 6-9), the cards were then sorted according to the number of syllables in the word. Using columns 10 to 68, the frequency for each syllable-type and each letter-type was added up. The appearance of a given letter 30 times in the stack of frequency 3 accounts for 30 × 3 tokens of that letter-type, and so on.
The stack for a given frequency was put together again and then re-sorted using columns 64 to 68. The frequency for each ending-type was added up.
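The adding-up in step 11 is a weighted count: each occurrence of a unit in a word-type contributes as many tokens as the word-type's frequency. A minimal Python sketch follows; the function name and the sample data are invented, and the units may equally be syllables, letter symbols, or endings.

```python
from collections import Counter

def unit_token_counts(word_types):
    """Sum unit frequencies over word-types.

    word_types: iterable of (units, frequency) pairs, where units is the
    list of syllables (or letters, or endings) in one word-type.
    """
    totals = Counter()
    for units, frequency in word_types:
        for unit in units:
            totals[unit] += frequency  # one occurrence counts frequency times
    return totals

# Made-up word-types: syllables and word frequency.
word_types = [(["i", "la", "ka"], 3), (["ka", "la"], 2)]
totals = unit_token_counts(word_types)
print(totals)

# The case in the text: a letter found 30 times in the stack of frequency 3.
stack = [(["k"], 3)] * 30
print(unit_token_counts(stack)["k"])
```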
Step 12 Preparation of Lists 3, 4, 5, 6, 7, 8:
Slips were then prepared for each syllable-type, each letter-type, and each ending-type with the frequency number (the total number of tokens encountered in the corpus of 1,00,064 word-tokens) in the top right corner of the slip.
The three files were each arranged alphabetically and Lists 3, 5, and 7 were prepared respectively for syllable-types, letter-types, and ending-types.
The files were then manually rearranged in rank order of descending frequency and Lists 4, 6, and 8 were prepared respectively for syllable-types, letter-types, and ending-types.
Step 13 Preparation of Lists 5A and 5B:
Given the frequencies of all the letter symbols, those of all the graphemes and the phonemes can be computed by using the equivalences. For details see Appendix 1.
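The computation can be sketched in Python for a handful of letter symbols. The equivalences used (x as kh followed by y, ñ as g followed by y) are taken from the table in Appendix 1; the function name and the letter frequencies are invented for illustration and are not the figures of List 5.

```python
from collections import Counter

# Letter symbol -> phonemes it stands for (a few rows of Appendix 1).
LETTER_TO_PHONEMES = {
    "kh": ["kh"],
    "x": ["kh", "y"],   # letter 50: one letter, two phonemes
    "y": ["y"],
    "g": ["g"],
    "ñ": ["g", "y"],    # letter 51: one letter, two phonemes
}

def phoneme_frequencies(letter_frequency, mapping):
    """Distribute each letter's frequency over the phonemes it represents."""
    totals = Counter()
    for letter, count in letter_frequency.items():
        for phoneme in mapping[letter]:
            totals[phoneme] += count
    return totals

# Hypothetical letter-symbol frequencies, for illustration only.
letter_frequency = {"kh": 100, "x": 10, "y": 50, "g": 80, "ñ": 5}
phonemes = phoneme_frequencies(letter_frequency, LETTER_TO_PHONEMES)
print(phonemes)
```

Because some letters stand for two phonemes, the phoneme-token total is not in general equal to the letter-token total.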
Step 14 Typing and proofreading:
A typescript of the 10 lists constituting the results of this study was prepared and proofread.
SOME FINDINGS:
The term ‘morphemic frequency’ in the title of this study is, one must admit, slightly misleading. With the help of List 1 and List 7, somebody determined enough and knowing the language well enough could compile a list of morph-types. The situation is similar if what one is looking for is the frequency of entry-words in an Oriya dictionary. (Since Oriya inflections are suffixal in character, all the grammatical forms of, say, a given verb will be brought together in the alphabetical word list, List 1.)
The type-token ratio for the different kinds of units can be seen in the following table:
Unit | Total number of types | Total number of tokens
Words (List 1) | 20,968 | 1,00,064
Syllables (List 3) | 2,128 | 3,11,318
Letter symbols (List 5) | 51 | 6,39,613
Graphemes (List 5A) | 54 | 5,16,097
Phonemes (List 5B) | 38 | 5,89,518
Vowels (List 5B) | 6 | 3,11,318
Consonants (List 5B) | 31 | 2,75,021
Ovowel (List 5B) | 1 | 3,179
Endings (List 7) | 98 | 31,212
Other interesting ratios that could be worked out from this table are: syllable-token/word-token, phoneme-token/word-token, ending-token/word-token, vowel-token/consonant-token, phoneme-token/syllable-token, etc.
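These ratios follow directly from the token totals in the table. The Python sketch below works them out; the dictionary keys are labels chosen here, and only the token figures themselves come from the table.

```python
# Token totals from the type-token table.
tokens = {
    "word": 100064,
    "syllable": 311318,
    "phoneme": 589518,
    "vowel": 311318,
    "consonant": 275021,
    "ending": 31212,
}

ratios = {
    "syllables per word": tokens["syllable"] / tokens["word"],
    "phonemes per word": tokens["phoneme"] / tokens["word"],
    "endings per word": tokens["ending"] / tokens["word"],
    "vowels per consonant": tokens["vowel"] / tokens["consonant"],
    "phonemes per syllable": tokens["phoneme"] / tokens["syllable"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Note that the vowel-token total equals the syllable-token total, as one would expect when every syllable contains exactly one vowel.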
The highest and lowest frequencies for the different kinds of units are shown by the following types:
Unit | Highest frequency: type | Frequency | Lowest frequency: type | Frequency
Word | o | 1137 | several | 1
Syllable | r | 11951 | 480 types | 1
Letter symbol | ə | 125314 | h | 237
Grapheme | | 58587 | | 18
Phoneme | ə | 126416 | | 278
Vowel phoneme | ə | 126416 | | 8424
Consonant phoneme | r | 47282 | | 278
Endings | rə, ri | 4154 | four | 1
 | | | two | 0
It will be interesting to see how far high-frequency words and endings contribute to the high frequency of certain syllables and phonemes.
Word-types may be classified according to the frequency range (from so many percent to so many percent), and the total number of types and the token percentage (the token total for the range divided by 1,00,064 and multiplied by 100) may be noted for each class.
Frequency range | Number of types | Percent of total population
Above 1000 (1 % or more) | 1 | 1.1 %
From 200 to 999 (0.2 % +) | 28 | 9.6 %
From 100 to 199 (0.1 % +) | 45 | 9.4 %
From 50 to 99 (0.05 % +) | 176 | 5.9 %
From 10 to 49 (0.01 % +) | 1548 | 28.4 %
From 1 to 9 (below 0.01 %) | |
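The classification by frequency range can be sketched in Python as follows. The ranges are those of the table; the function name and the toy word frequencies are invented, and the token percentage is computed as described above (token total for the range divided by the corpus total, times 100).

```python
# (label, lowest frequency, highest frequency or None for open-ended)
RANGES = [
    ("Above 1000", 1000, None),
    ("From 200 to 999", 200, 999),
    ("From 100 to 199", 100, 199),
    ("From 50 to 99", 50, 99),
    ("From 10 to 49", 10, 49),
    ("From 1 to 9", 1, 9),
]

def classify(frequency, ranges=RANGES):
    """Return (label, number of types, token percentage) for each range."""
    total = sum(frequency.values())
    rows = []
    for label, low, high in ranges:
        members = [f for f in frequency.values()
                   if f >= low and (high is None or f <= high)]
        rows.append((label, len(members), round(100 * sum(members) / total, 1)))
    return rows

# Made-up word-type frequencies, for illustration only.
frequency = {"a": 1200, "b": 300, "c": 150, "d": 60, "e": 20, "f": 5, "g": 5}
rows = classify(frequency)
for row in rows:
    print(row)
```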
Syllable-types
may be classified according to the structure of each (V, CV, VC, CVC,
etc.) and the total number of types and the token percentage may be
noted for each class.
POSSIBLE APPLICATIONS:
The results of the study presented in the form of the lists can be put to various uses when properly interpreted and manipulated.
Technical uses: (1) Designing of methods of speedwriting (shorthand systems): all the lists, especially the syllable and phoneme lists. (2) Designing of typewriter keyboards: the grapheme list especially. (3) Typing fonts: the grapheme list especially. (4) Telecommunication engineering. (5) Script and spelling reform.
Educational uses: designing curricula, reading or recorded material, proficiency tests, etc. for language learners and users, whether young or old, whether native speakers, dialect speakers, or foreigners.
Scholarly uses: (1) Phonological and grammatical analysis, lexical study. (2) Study of statistical and cybernetic properties and comparing these with those of other languages. (3) Comparison between different languages from the point of view of historical study, linguistic typology, and linguistic anthropology. (4) Statistical study of literary style.
APPENDIX-1
ORIYA ALPHABET IN ITS GRAPHIC AND PHONIC ASPECTS
Serial No. | Letter symbol | Grapheme | Phoneme | Code | Remarks
1 | ə | | ə | 0 | The phoneme ə also appears in 9, 11; phonetically it is slightly rounded.
2 | a | | a | 1 |
3 | i | | i | 2 | Cp. 3 and 4; length is not phonemic.
4 | ī | | i | 3 |
5 | u | | u | 4 | Cp. 5 and 6; length is not phonemic.
6 | ū | | u | 5 | The phoneme u also appears in 7.
7 | r | | ru | 6 | One grapheme but two phonemes.
8 | e | | e | 7 |
9 | əy | | əy | 8 | Like 7.
10 | o | | o | 9 |
11 | əv | | əv | 0,1 | Like 7.
12 | m̄ | o | ṅ | 0,2 | Cf. 19; this grapheme corresponds to phoneme ṅ except before letters 20-40 (see below).
13 | m͂ | | ̃ | 0,3 |
14 | h | : | h | 0,4 | Cf. 49.
15 | k | | k | 0,5 |
16 | kh | | kh | 0,6 | The phoneme kh also appears in 50.
17 | g | | g | 0,7 | The phoneme g also appears in 51.
18 | gh | | gh | 0,8 |
19 | ṅ | | ṅ | 0,9 | The phoneme ṅ also appears in 12.
20 | c | | č | 1,2 |
21 | ch | | čh | 1,3 |
22 | j | | | 1,4 | The phoneme also appears in 40.
23 | jh | | | 1,5 |
24 | n̅ | | n | 1,6 | Cf. 34.
25 | ṭ | | ṭ | 1,7 |
26 | ṭh | o | ṭh | 1,8 |
27 | ḍ | | ḍ | 1,9 | The 2 graphs correspond to the stop and the flap allophones respectively.
28 | ḍh | | ḍh | 2,3 | Like 27.
29 | ṇ | | ṇ | 2,4 |
30 | t | | t | 2,5 |
31 | th | | th | 2,6 |
32 | d | | d | 2,7 |
33 | dh | | dh | 2,8 |
34 | n | | n | 2,9 | The phoneme n also appears in 24.
35 | p | | p | 3,4 |
36 | ph | | ph | 3,5 |
37 | b | | b | 3,6 | The grapheme also appears in 45.
38 | bh | | bh | 3,7 |
39 | m | | m | 3,8 |
40 | ȳ | | | 4,5 | Cf. 22.
41 | y | | y | 4,6 | The phoneme y also appears in 9, 50, 51.
42 | r | | r | 4,7 | The phoneme r also appears in 7.
43 | ḷ | | ḷ | 4,8 | The corresponding Devanagari letter is placed after the correlate of 49.
44 | l | | l | 4,9 |
45 | b | | v | 3,9 | Cf. 37; the grapheme has this value in some clusters and in some loan words; the phoneme v also appears in 11; for the three graphs, see below.
46 | ś | | s | 5,6 | Cf. 48.
47 | ṣ | | s | 5,7 | Cf. 48.
48 | s | | s | 5,8 | The phoneme s also appears in 46, 47.
49 | h | | h | 5,9 | The phoneme h also appears in 14.
50 | x | | khy | 6,7 | The graph is traditionally a conjunct of 15+47; phonemically 16 is followed by 41.
51 | ñ | | gy | 6,8 | The graph is traditionally a conjunct of 22+24; phonemically 17 is followed by 41.
The grapheme ***** corresponding to Devanagari
»Ö is not found in the corpus. The diacritics
12, 13, 14 are romanized as letters.
The virāma symbolizing the absence of the so-called inherent ə is ignored in the count. Note that the inherent ə of Oriya is regularly pronounced; unlike many other modern Indo-Aryan languages, Oriya has very few word-types ending in a consonant phoneme. Lists 1 and 2 retain hyphens only when they are a part of Oriya orthography.
Syllabic hyphens were inserted at step 6 (see Note 2 there) according to the following principles:
1. There are as many syllables as there are vowel letters (1 to 11). So a hyphen intervenes between two successive vowel letters, as in o-ḍi-a, kə-ṇə. Initial and final sequences of non-vowels, if any, go with the adjacent vowel.
2. The following are treated as single letters: 9, 11, 16, 18, 21, 23, 26, 28, 31, 33, 36, 38, 50, 51.
3. The diacritic-letters 12, 13, 14 always follow a vowel letter and go with it. Any other non-vowel when followed by a vowel goes with it. A geminate consonant (as kk) is never split up.
4. Out of a sequence of non-vowels (other than geminates) between two vowels, the first goes with the preceding vowel, the rest go with the following vowel.
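The four principles can be sketched as a small Python function working on a word already cut into letter symbols (so that principle 2 is taken care of by the input). This is one reading of the rules, not the procedure actually followed, which was manual. The function name is invented; the vowel and diacritic inventories are passed in by the caller; and the last two test words are made-up strings chosen to exercise principles 3 and 4.

```python
def syllabify(letters, vowels, diacritics=frozenset()):
    """Insert syllabic hyphens into a list of letter symbols."""
    nuclei = [i for i, letter in enumerate(letters) if letter in vowels]
    if not nuclei:
        return "".join(letters)
    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(nuclei, nuclei[1:]):
        j = prev + 1
        # Principle 3: diacritic-letters stay with the preceding vowel.
        while j < nxt and letters[j] in diacritics:
            j += 1
        cluster = letters[j:nxt]
        # Principle 4: of several non-vowels, the first closes the preceding
        # syllable, unless the cluster opens with a geminate (never split).
        if len(cluster) >= 2 and cluster[0] != cluster[1]:
            j += 1
        starts.append(j)
    ends = starts[1:] + [len(letters)]
    return "-".join("".join(letters[a:b]) for a, b in zip(starts, ends))

VOWELS = {"ə", "a", "i", "u", "e", "o"}  # a subset of letters 1 to 11

print(syllabify(["o", "ḍ", "i", "a"], VOWELS))
print(syllabify(["k", "ə", "ṇ", "ə"], VOWELS))
print(syllabify(["s", "ə", "n", "t", "ə"], VOWELS))  # made-up cluster
print(syllabify(["p", "ə", "k", "k", "a"], VOWELS))  # made-up geminate
```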
In the grapheme list (List 5A) observe that the post-consonantal forms of vowel-letters and the conjunct shapes of consonant letters naturally went with the respective letters; that the frequency for the grapheme ******** can be calculated by adding up the frequencies of all syllable-types beginning with it (this grapheme has no post-consonantal form); that the two variants of 27, 28 remain unsorted; that the virāma at the end of the few consonant-ending words remains ignored in the count; and that the frequencies of 37 and 45 are added up for the grapheme *****, which is shaped like ob or oy when it corresponds to the phoneme /v/ in initial positions or after a vowel (see 45 in the table).
In the phoneme list (List 5B) observe that the following
phonemes add up the frequencies of the letter symbols shown against
them: /ə/- ə, əy, v; /i/-i, ī
, /u/ -u, ū, r, /kh/-kh, x; /g/-g, ñ;/
/- j, ȳ; /n/ - ñ, n; /y/ -əy, y, x, ñ; /r/-r, r; /v/ -v,
b; /s/ - ś, ṣ, s; /h, -h, h.
The frequencies of letter symbol have to be distributed among
the phonemes /n/ (42 tokens all in words beginning with gə before
letters 20-24, 30-34 and 41), /m/ (81 tokens, all in words beginning
with before letters 35-39),/ / (1534 tokens, elsewhere).
The phonemes of Oriya
can be listed group-wise as follows: p, t, ṭ, k; ph, th,
ṭh,
čh, kh; b, d, ḍ, g; bh, dh,
ḍh
,h, gh; m, n, ṇ, s, l, ḷ,
r; h; y, v; I, e; u, o; ə,a; (to go with I, e, u, ə, a).
The phonemes ḍ, ḍh have flap variants between vowels, finally, before a consonant, or after a non-nasal consonant, but stop variants initially, in gemination, or after a nasal consonant. (The dotted variants of graphemes 27 and 28 represent flaps; the others stops.)
APPENDIX-2
GRAMMATICALLY
IMPORTANT ENDINGS
The number against each ending (or group of endings in a few cases) refers to the coding used on slips (step 6) and cards (step 7). Some endings are composed of more than one morph. Structurally they may be derivational, inflectional, or phrasal. The addition of ‘etc.’ means that other endings of a similar shape and function are also coded alike and thus counted along with the ending that precedes. The hyphen after an ending shows that it may be followed by another ending.
NOMINAL
ENDING | CODE | ENDING | CODE
aḍə, etc. | 43 | |
Uprə, etc. | 46 | ṭe | 19
Ka | 08 | ṭharu, ṭhum͂, ṭhəum͂ | 33
Ke, ṅke | 10 | ṭhare | 34
guḍaku, guḍaeȳake | 24 | ṭhi | 35
guḍaku, ȳake | 28 | ṭhu | 32
guḍaṅka | 26 | te, tə | 16
guḍaṅku | 25 | digə, etc. | 44
Guḍik, təkə | 23 | Pəchə, etc. | 45
guḍiku | 27 | pari | 39
ṅkə- | 12 | pakhə, etc. | 42
ṅku | 11 | bhəḷi, rupe, bhabe | 07
chəḍa, etc. | 47 | bhadere | 50
jəṅe- | 20 | bhitəre |
Dhayəere | 40 | mane | 15
manə- | 96 | saṅge, etc. | 41
manəṅka- | 14 | gathe, səhita, səhə, səhəkare, etc. | 48
maṅku | 13 | |
VERBAL
ENDING | CODE | ENDING | CODE
ə, əm͂ | 54 | ibe, im͂bə | 61
ənti, nti | 55 | ib, ib- | 63
ntu, ntu, nta | 56 | ibar | 64
anti, nti | 57 | ibi | 59
antu, anta, nta, ante | 58 | ibu, ibum͂ | 60
ante, nte | 00 | ibe | 62
i, im͂ | 51 | ilə | 90
ichə, icə, iəchi | 73 | ilā, im̃la | 98
ichəti, icənti, Icnti, ichəti | 74 | ili, im̃li | 88
 | | ilu, ilum̃ | 89
ichi, ici, iəchi, im̃ci, im̃chi | 71 | ie, im̃le | 91
ichu, icu, iəchu | 72 | ucə hnti, ucəh, um̃cnti, uəc, ucənti | 70
itha | 09 | uchi, uəchi, uci | 68
ithiba, ithib, etc. | 97 | um̃ci, um̃chi |
ithibarə | 65 | uchu, ucu, um̃cu | 69
ithilə | 85 | uthibā, uthibe, uthibə | 66
ithilā, im̃thila | 87 | thibarə | 67
thili, im̃thili | 83 | uthilə | 80
thili, ithilim̃ | 84 | uthila | 82
ithile, im̃thile | 86 | uthili | 78
Uthilu | 79 | Le | 94
Uthile | 81 | La | 99
Chə, cə | 77 | Li | 92
Chi, ci, echi | 75 | Lu, lum̃ | 93
Chu, cu | 76 | Le | 95
**** | 05 | |
NOMINAL AND VERBAL
ENDING | CODE | ENDING | CODE
agə | 49 | na, naim̃, nə, ni, | 02
u, um̃ | 53 | Nuhəm̃nti, nei, |
e, em̃ | 52 | Paim̃, nimtaə, |
ei, | 01 | səkaśeə- | 38
ku, haku | 29 | R, ri |
ṭā, ṭha- | 17 | Ru | 36
ṭi | 18 | ru | 31
tə, təh | | Re | 37
dbara, oge | 04 | Lagi, hetu, etc. | 06
 | 30 | Him̃, hi, hem̃ | 03
APPENDIX-3
THE SOURCE
MATERIAL
The names of the authors and the titles of books and periodicals are spelt with the letter symbols tabulated in Appendix 1.
The names of persons are given with the surname transposed to the beginning: thus senapati mohənə cərəṇ instead of the usual Mohan Charan Senapati.
The place of publication is Cuttack, Orissa unless otherwise mentioned.
The year of publication of a book is that of the first edition unless otherwise mentioned. The actual copy used for this count may of course be of a later year than the one mentioned.
The number of slips refers to the word-token slips (step 3). The first and the second count refer to the two underlinings (step 2).
*****