What follows is the ‘front matter’ of the Oriya language frequency count that was undertaken by me at Deccan College and completed in June 1966.
The reasons for presenting the front matter (consisting of the table of Contents, Preface, Introduction, and Appendixes 1-3 to the Introduction) in this fashion are twofold:
(1) The body of the text, consisting of the frequency counts in the form of Lists 1-8 as mentioned in the ‘Contents’, has already been published, but without the front matter, thanks to a gross oversight on the publisher’s part. Those who use this book will naturally have looked for some such introduction. They will find this presentation indispensable for the proper use of the book.
(2) Other scholars engaged in frequency counts of this kind, or even merely planning such projects, will probably find this presentation useful as a model worth considering, and will find the sections entitled ‘Procedural Steps’ (steps 1-14, giving the ‘how’) and ‘Possible Applications’ (technical, educational, and scholarly, giving the ‘why’) from the Introduction suggestive even if they have little or no interest in Oriya.
I trust that this presentation will be of some use.
CONTENTS
Preface
Introduction
Appendix 1 Oriya alphabet in its graphic and phonic aspects
Appendix 2 Grammatically important endings
Appendix 3 The source material
List 1 Words in alphabetical order with their frequencies
List 2 Words in rank order of descending frequency (of 10 or above)
List 3 Syllables in alphabetical order with their frequencies
List 4 Syllables in rank order of descending frequency
List 5 Letter symbols in alphabetical order with their frequencies
List 5A Graphemes in alphabetical order with their frequencies
List 5B Phonemes in alphabetical order with their frequencies
List 6 Letter symbols in rank order of descending frequency
List 7 Grammatically important endings in alphabetical order with their frequencies
List 8 Grammatically important endings in rank order of descending frequency
*************
PREFACE
This study of the phonemic and morphemic frequencies in Oriya is part of a project sponsored by the Ministry of Education, Government of India, for ‘the evolution of a system of shorthand suited to the genius of Hindi language in particular and other regional languages in general’. Dr. S. V. Bhagwat’s doctoral dissertation on the Marathi frequencies antedates this project, under which studies in Hindi, Gujarati, Kannada, and Malayalam have been completed.
Work on Oriya was carried out at Deccan College. It was started early in 1963 and completed in a little over three years’ time. Benefiting from the other studies in this series, this study suffered on fewer occasions from false starts and retracing of steps.
Dr. A. M. Ghatage gave generously of his time and advice and provided general supervision. Dr. D. P. Pattanyak (of the American Institute of Indian Studies) was consulted on the phonological system and the grammatical endings of Oriya. Dr. S. G. Prabhu Ajgaonkar, Research Associate in Statistics, gave valuable technical advice and took over immediate supervision during my leave of absence from April to October 1965. Mr. Mohan Charan Senapati, B. A., a native speaker of Oriya, was appointed a Research Scholar and carried out the actual counting, the processing and arranging of the slips, the machine-sorting of the punched cards, and the preparation of the lists, and also helped with the proofreading of the typescript. Mrs. B. V. Guhagarkar did the punching of the cards and helped with their sorting. Miss S. M. Kulakarni did the mechanical verification of the punched cards. Mr. D. D. Phadke was entrusted with the complicated typing.
My thanks are due to all of these as also to
the office and the library of Deccan College.
It is hoped that the results presented here will be of use not only for
the immediate purpose of devising methods of speedwriting but also for various
other technical, educational, and scholarly purposes.
30 June 1966.
A. R. Kelkar.
INTRODUCTION
ORIENTATION
The fact that this study of phonological, lexical, and grammatical frequencies
in Oriya is one of a series has imparted to it a set shape, so that the results
obtained for the various Indian languages covered remain comparable. Though the devising of methods of speedwriting
was the end immediately in view, the work was designed in such a way that other
purposes may also be served.
The immediate goal of this study was the preparation of four pairs of lists:
(i) The words with the number of times each occurred in the corpus.
(ii) The syllable shapes with their frequencies.
(iii) The letters in their graphic and phonic aspects with their frequencies.
(iv) Some grammatically important endings (whether consisting of single morphs or not) with their frequencies.
The lists are presented first in the Oriya alphabetical order and then in the rank order of descending frequency.
In counting words the various grammatical forms of what would be a single entry-heading in a dictionary were counted separately; it was as if take, takes, took, taken, taking were all treated as different word-types. In deciding where one word ends and another begins, conventional spacing was the criterion used, as if full stop, full-time, and fulfill were counted as two, one, and one word-token respectively. All this sounds (and is) a little arbitrary, but unavoidably so in handling a large corpus of about 1,00,000 word-tokens.
The corpus consisted of word-tokens sampled out of a larger body of printed matter in the form of books published in contemporary standard Oriya and of periodicals covering a year’s duration. It was decided that obtaining and using radio and movie scripts, as was done with some of the other studies in this series, was not feasible.
A word about the slightly technical but very important distinction between type and token used earlier may not be out of place. When we are counting units of any kind (whether letters or syllables or words or suffixes) we can do so in either of two ways: either count the occurrences of tokens or count the recurrent types. For example, in the text it is raining, the letter-type i has 4 letter-tokens, the letter-type n has 2 letter-tokens, etc., and in all there are 7 letter-types and 11 letter-tokens.
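The type/token distinction can be illustrated with a minimal sketch that reproduces the "it is raining" example. The function name `letter_counts` is an illustrative assumption, not part of the original study, which worked by hand and with punched cards.

```python
from collections import Counter

def letter_counts(text):
    """Map each letter-type to its number of letter-tokens (spaces ignored)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

counts = letter_counts("it is raining")
n_types = len(counts)             # number of distinct letter-types
n_tokens = sum(counts.values())   # total number of letter-tokens
print(counts["i"], counts["n"], n_types, n_tokens)  # 4 2 7 11
```

The same counting applies unchanged to syllables, words, or endings once the text has been cut into the relevant units.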
Before looking at some of the findings and possible applications, we should be clearly aware of the procedure used in arriving at the lists. The procedural statement will enable the reader to understand and evaluate the results properly; it may also help some future worker undertaking a similar frequency count.
PROCEDURAL STEPS
For the sake of convenience of exposition, a set of distinct steps is described below. In actual practice, of course, there was a good deal of overlap and adjustment to the availability of man-power, machines, and material.
Steps 1 to 3 below constitute the stage of the collection of data. The next stage, namely collation and analysis of the data, comprises steps 4 to 8. The remaining steps, 9 to 14, cover the final stage leading to the presentation of the results in the form of 10 lists, numbered 1, 2, 3, 4, 5, 5A, 5B, 6, 7, and 8. The three stages took roughly a year each.
Step 1 Selecting and obtaining the source material:
The body of source material had to be large enough so that a sample of
about one lac word-tokens could be obtained at the rate of 2 word-tokens per page
of a book or per column of a periodical. It
also had to be diversified enough and distributed properly so that the whole represented
a fair selection of contemporary standard Oriya as used in printed matter.
The following table will give a general idea of the distribution of the
material.
Source | Number of titles | Matter | Year of publication | Word-tokens |
Books | 85 | 16532 pages | 1896 to 1963 (about 75 from 1951 to 1953) | 33,033 |
Monthlies | 3 | 6383 columns | 1963-4 | 12,760 |
Dailies | 2 | 30634 columns | 1963-4 | 54,271 |
| | | Total: | 1,00,064 |
Geographical distribution was not a serious problem as most of the publishing activity in Oriya is centred in Cuttack, Orissa and, to a lesser extent, in Calcutta, W. Bengal. In securing diversity of contents and media, due weightage was given to the effective readership. The 85 book titles were distributed as under:
Novels 13
Collections of short stories 15
Humour 2
Stage plays 8
Biography and autobiography 8
Essays and literary criticism 8
-----
Belles-lettres total 53
-----
Science and science fiction 6
Social and political writings 7
History 2
Non-scientific technical 1
Travel 1
-----
Adult non-literary total 17
-----
School texts 9
For children 4
General knowledge for children 2
-----
Non-adult total 15
-----
Total 85
Of the 3 monthlies, one was primarily devoted to political satire, the other two being literary. The 2 dailies were the ones with the highest circulation.
See Appendix 3 of this Introduction for a complete list of the source material with the breakdown of the corpus.
Only 2 word-tokens were randomly pulled out of each page or column. Thus less than 2 percent of the running text was utilized for the count. This ensured that the frequency of a given item would not be exaggerated through its constant recurrence in a short span because of the subject matter of the text.
Step 2 Sampling of the source material in order to obtain the corpus:
Each item (book or periodical) was ordinarily excerpted from twice. In the first count were underlined the first word of each page of a book, the first word of each column of each page of a monthly (2-3 columns to a page), and the last word of the seventh line of each column of each page of a daily (8 columns to a page). In the second count were underlined the last word of the first line of the last paragraph of each page of a book, the last word of the first line of the last paragraph of each column of each page of a monthly, and the last word of the seventh line of each column of each page of a daily. (The procedure for dailies was designed to avoid the date lines and the news agency names.) A count was maintained of the word-tokens extracted. As soon as the figure 1,00,000 was exceeded, collection was suspended (the actual count being 1,00,064).
The way a word was defined for this operation has already been indicated. An incomplete word marked with a hyphen at the end of a line was completed from the next line. In applying the rules of random choice above, the following items were ignored: proper names, abbreviations, figures (but not spelt-out numerals), the running heading of a page, news headings in a daily, verse citations, Sanskrit citations, matter not in the Oriya script, and display or fancy printing in advertisements (but not Sanskrit or English loanwords in Oriya script nor the ordinary letterpress in an advertisement). No page that had anything on it was left out.
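The two underlining rules for a book page can be sketched as follows. This is a simplified illustration only: the page model (a list of paragraphs, each a list of text lines) and the name `sample_book_page` are assumptions, and the sketch does not handle hyphenated line-ends or the ignored items listed above.

```python
def sample_book_page(page):
    """page: list of paragraphs, each a list of text lines (assumed model).

    Returns (first-count word, second-count word) for one book page:
    the first word of the page, and the last word of the first line
    of the last paragraph.
    """
    first_word = page[0][0].split()[0]
    second_word = page[-1][0].split()[-1]
    return first_word, second_word

# Placeholder words stand in for Oriya text.
page = [
    ["alpha beta gamma", "delta epsilon"],
    ["zeta eta theta", "iota kappa"],
]
print(sample_book_page(page))  # ('alpha', 'theta')
```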
Step 3 Preparation of the word-token slips:
For each word-token so underlined, a slip (3” X b” size used throughout)
was written out with the word in Oriya script at the center. No attempt to normalize orthography was made
at this stage beyond correcting obvious misprints.
Step 4 Arranging the slips alphabetically:
All slips were arranged in a single alphabetical file. The Oriya order was used (see Appendix 1 of this Introduction). This was done by hand with the help of a pigeon-holed cabinet, taking one letter at a time from left to right.
Step 5 Removal of duplicate slips:
For each word-type only one slip was retained, the duplicates being discarded. In doing so the total number of tokens appearing in the corpus for the word-type (the frequency of that word) was written in the top right corner of the slip.
When a word-type had alternate spellings, a slip with the preferred spelling was retained with the other spellings noted in the bottom left corner.
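Step 5 amounts to collapsing token slips into one record per word-type. The sketch below assumes a hypothetical `preferred` table mapping each alternate spelling to its preferred spelling; in the study this judgement was made by the Research Scholar, not by a lookup table, and the English words are stand-ins.

```python
from collections import Counter

def collapse_slips(tokens, preferred):
    """Collapse word-token slips into word-type slips.

    tokens:    list of spellings, one per word-token slip
    preferred: dict mapping an alternate spelling to the preferred one
               (hypothetical; stands in for the editor's judgement)
    Returns (frequency per word-type, alternate spellings per word-type).
    """
    freq = Counter()
    alternates = {}
    for tok in tokens:
        head = preferred.get(tok, tok)
        freq[head] += 1                      # frequency includes alternate spellings
        if tok != head:
            alternates.setdefault(head, set()).add(tok)
    return freq, alternates

freq, alternates = collapse_slips(
    ["colour", "color", "colour", "tone"], {"color": "colour"}
)
print(freq["colour"], freq["tone"], alternates)  # 3 1 {'colour': {'color'}}
```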
Step 6 Preparation of the word-type slips:
The 20968 slips so retained were then processed as shown in the diagram
to yield word-type slips.
Entries on the word-type slip (the numbers refer to the notes below):
(1) Spelling in Oriya script
(2) Spelling in Roman with syllable divisions
(3) Word frequency
(4) Serial number
(5) Alternate spellings in Roman
(6) Number of letters in each syllable
(7) Grammatical type
(8) Endings symbolized
Notes:
1. This is the preferred spelling in the judgement of the Research Scholar who prepared all the slips. He consulted Pramodacandara Deb’s Pramoda-abhidhāna (2 vols, Cuttack, 1942).
2. A Roman writing system was devised consisting of 51 letter-symbols in a certain order (analogous to the Oriya alphabetical order). The symbols, together with their equivalence to the letters of the Oriya script (the graphemes of Oriya) and to the units of the Oriya sound system (the phonemes of Oriya), are set out in Appendix 1. This enables us to deduce the frequencies of the graphemes (List 5A) and of the phonemes (List 5B) from those of the letter symbols (List 5). The slips were arranged in the order of the letter symbols. Syllabic divisions were shown with hyphens. (For the rules for inserting hyphens see Appendix 1.)
3. This is carried over from step 5. The number includes the occurrence of the word-type in alternate
spellings.
4. This number (from 1 to 20968) shows the place of the slip in the alphabetical order and links it to the corresponding punched card.
5. These are recorded later in List 1.
6. Thus for a word like I-la-ka the notation at this point will be 122, that is, 1 letter in the first syllable, 2 in the second, and 2 in the third.
7. All words were classified into 5 types as follows:
Type 0 without any of the listed endings.
Type 1 with one nominal ending.
Type 2 with two nominal endings.
Type x with one verbal ending.
Type y with two verbal endings.
8. A list of the principal grammatical endings was first prepared and a code number was assigned to each (see Appendix 2 of this Introduction). The endings occurring in the word were noted here in the form of code numbers. Thus 22-07 means that the word contains ending 22 followed by ending 07.
Step 7 Preparation of a punched card for each word-type slip:
ICT cards of type 4-354 with columns 1 to 80 and twelve rows (x, y, 0, and 1 to 9) were used. The columns were utilized in punching as follows:
1 to 5 Coding the serial number (the first digit in column 1, blanks to the right), tying the card to the slip.
6 to 9 Coding the word frequency number followed by x (thus, frequency 13 was punched as follows: 1 under 6, 3 under 7, x under 8, nothing under 9).
10 to 14 Coding the spelling of the first syllable with blanks to the left. For the code number that is assigned to each letter symbol and is indicated by single or double punching, see under that column in the table included in Appendix 1.
15 to 54 Coding the spellings of the second to the ninth syllables in a similar fashion with 5 columns to a syllable. (No word in the corpus exceeded 9 syllables.)
55 to 63 Coding the number of letters in each syllable followed by 0 with blanks to the right (see Step 6, note 6).
64 Coding the grammatical type (0, 1, 2, x, y, as the case may be; see Step 6, note 7).
65 to 68 Coding the grammatical endings-the first in 65-66, the second
if any in 67-68. See Appendix 2 for the code.
69 to 80 Not utilized and left blank.
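The column layout above can be sketched as a function that fills an 80-cell record. This is an illustrative model, not the original procedure: a card is represented as a list of 80 strings (an empty string for a blank column, a string such as "0,5" for a double punch), the name `punch_card` is an assumption, and only a handful of letter codes from the Appendix 1 table are included. The sketch also assumes a frequency of at most three digits and a word of at most eight syllables, so that every field fits its columns.

```python
# Letter codes copied from a few rows of the Appendix 1 table (subset only).
LETTER_CODE = {"ə": "0", "a": "1", "i": "2", "k": "0,5", "l": "4,9"}

def punch_card(serial, freq, syllables, gram_type="0", endings=()):
    """Return a list of 80 cells; card[0] corresponds to column 1."""
    card = [""] * 80

    def put(start_col, cells):
        for offset, cell in enumerate(cells):
            card[start_col - 1 + offset] = cell

    put(1, list(str(serial)))                  # columns 1-5, blanks to the right
    put(6, list(str(freq)) + ["x"])            # columns 6-9, frequency then x
    for n, syl in enumerate(syllables):        # columns 10-54, 5 per syllable
        codes = [LETTER_CODE[ch] for ch in syl]
        put(10 + 5 * n + (5 - len(codes)), codes)   # blanks to the left
    put(55, [str(len(s)) for s in syllables] + ["0"])  # columns 55-63
    put(64, [gram_type])                       # column 64
    for n, code in enumerate(endings[:2]):     # columns 65-66 and 67-68
        put(65 + 2 * n, list(code))
    return card

card = punch_card(123, 13, [["i"], ["l", "a"], ["k", "a"]], "1", ["22", "07"])
print(card[5:9])    # ['1', '3', 'x', '']
print(card[54:58])  # ['1', '2', '2', '0']
```

Columns 69 to 80 are simply never written, which matches their being left blank.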
Step 8 Verification of punched cards:
As soon as the puncher transferred the data from the slip to the punched card with the help of a punching machine, the verifier verified its correctness against the slip with the help of a verifying machine. If necessary a new card was punched and the wrongly punched one discarded.
Step 9 Preparation of List 1:
The file of word slips at the end of step 6 was utilized for writing out List 1: word-types spelt in Roman letter symbols arranged alphabetically with the alternate spellings (if any) in Roman and with the frequency (out of 1,00,064 tokens) noted against each.
Step 10 Preparation of List 2:
The slips were hand-sorted according to the frequency number (slips with frequency numbers below ten were not sorted) and List 2 was written out: words arranged in an order of descending frequency from the maximum to frequency 10. Under each frequency number the order is alphabetical. Words with a frequency of 1 to 9 are too many to be listed profitably; such lists can always be compiled visually by scanning List 1.
Step 11 Machine-sorting for the preparation of lists of syllable-types, letter-types, and ending-types:
The punched cards were fed into a tabulator for the purpose of sorting. First, they were sorted according to the frequency number. Subsequently, each stack was handled separately. Note the use of the x suffixed to the frequency number: thus, all cards bearing x in column 8 have frequencies from 10 to 99 and could conveniently be separated first before being sorted further into the stack for 10, for 11, etc.
Using the 0 in columns 55-63 (analogously to the use of x in columns 6-9) the cards were then sorted according to the number of syllables in the word. Using columns 10 to 68 the frequency for each syllable-type and each letter-type was added up. The appearance of a given letter 30 times in the stack of frequency 3 accounts for 30 x 3 tokens of that letter-type, and so on.
The stack for a given frequency was put together again and then re-sorted using columns 64 to 68. The frequency for each ending-type was added up.
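The weighting described in this step (each appearance of a unit on a card counts as many times as the card's word frequency) can be sketched as follows. The function name `unit_token_totals` and the toy data are illustrative assumptions.

```python
from collections import Counter

def unit_token_totals(word_types):
    """word_types: iterable of (units, frequency) pairs, where units is the
    list of letter symbols (or syllables, or endings) on one word-type card.
    Each appearance of a unit contributes the word frequency to its total."""
    totals = Counter()
    for units, freq in word_types:
        for unit in units:
            totals[unit] += freq
    return totals

# A letter appearing 30 times in the stack of frequency 3 gives 30 x 3 tokens.
stack_of_three = [(["a"], 3)] * 30
print(unit_token_totals(stack_of_three)["a"])  # 90

mixed = unit_token_totals([(["k", "a"], 3), (["a"], 3), (["k", "ə"], 10)])
print(mixed["k"], mixed["a"], mixed["ə"])  # 13 6 10
```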
Step 12 Preparation of Lists 3, 4, 5, 6, 7, 8:
Slips were then prepared for each syllable-type, each letter-type, and each ending-type with the frequency number (the total number of tokens encountered in the corpus of 1,00,064 word-tokens) in the top right corner of the slip.
The three files were each arranged alphabetically and Lists 3, 5, and 7 were prepared respectively for syllable-types, letter-types, and ending-types.
The files were then manually rearranged in rank order of descending frequency and Lists 4, 6, and 8 were prepared respectively for syllable-types, letter-types, and ending-types.
Step 13 Preparation of Lists 5A and 5B:
Given the frequencies of all the letter symbols, those of all the graphemes and the phonemes can be computed by using the equivalences. For details see Appendix 1.
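The computation from equivalences can be sketched as below for the phoneme side. The mapping covers only a few rows of the Appendix 1 table (for example, letter symbol x corresponds to the phonemes kh and y), and the frequencies are made-up numbers for illustration, not figures from the study. The name `phoneme_frequencies` is an assumption.

```python
from collections import Counter

# Letter symbol -> phonemes, following a few rows of Appendix 1 (subset only).
PHONEMES = {
    "ə": ["ə"],
    "əy": ["ə", "y"],
    "y": ["y"],
    "kh": ["kh"],
    "x": ["kh", "y"],
}

def phoneme_frequencies(letter_freq, mapping=PHONEMES):
    """Distribute each letter-symbol frequency over the phonemes it stands for."""
    totals = Counter()
    for symbol, freq in letter_freq.items():
        for phoneme in mapping[symbol]:
            totals[phoneme] += freq
    return totals

# Made-up frequencies, for illustration only.
toy = {"ə": 100, "əy": 5, "y": 20, "kh": 8, "x": 2}
result = phoneme_frequencies(toy)
print(result["ə"], result["y"], result["kh"])  # 105 27 10
```

Grapheme frequencies would follow the same pattern with a letter-symbol-to-grapheme table in place of `PHONEMES`.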
Step 14 Typing and proofreading:
A typescript of the 10 lists constituting the results of this study was prepared and proofread.
SOME FINDINGS:
The term ‘morphemic frequency’ in the title of this study is, one must
admit, slightly misleading. With the help
of List 1 and List 7, somebody determined enough and knowing the language well
enough could compile a list of morph-types.
The situation is similar if what one is looking for is the frequency of
entry-words in an Oriya dictionary. (Since
Oriya inflections are suffixal in character, all the grammatical forms of, say,
a given verb will be brought together in the alphabetical word list-List 1.)
The type-token ratio for the different kinds of units can be seen in the
following table:
Unit | Total number of types | Total number of tokens |
Words (List 1) | 20,968 | 1,00,064 |
Syllables (List 3) | 2,128 | 3,11,318 |
Letter symbols (List 5) | 51 | 6,39,613 |
Graphemes (List 5A) | 54 | 5,16,097 |
Phonemes (List 5B) | 38 | 5,89,518 |
Vowels (List 5B) | 6 | 3,11,318 |
Consonants (List 5B) | 31 | 2,75,021 |
Ovowel (List 5B) | 1 | 3,179 |
Endings (List 7) | 98 | 31,212 |
Other interesting ratios that could be worked out from this table are: syllable-token/word-token, phoneme-token/word-token, ending-token/word-token, vowel-token/consonant-token, phoneme-token/syllable-token, etc.
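The ratios mentioned can be worked out directly from the token totals in the table. The sketch below is only a convenience; the dictionary keys and the name `ratio` are illustrative assumptions, while the numbers are the token totals given in the table.

```python
# Token totals taken from the type-token table above.
TOKENS = {
    "word": 100064,
    "syllable": 311318,
    "phoneme": 589518,
    "vowel": 311318,
    "consonant": 275021,
    "ending": 31212,
}

def ratio(numerator, denominator):
    """Token-per-token ratio between two kinds of units."""
    return TOKENS[numerator] / TOKENS[denominator]

for pair in [("syllable", "word"), ("phoneme", "word"), ("ending", "word"),
             ("vowel", "consonant"), ("phoneme", "syllable")]:
    print(f"{pair[0]}-token/{pair[1]}-token = {ratio(*pair):.2f}")
```

For instance, the first line of output gives the average number of syllables per word-token in the corpus, a little over 3.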
The highest and lowest frequencies for the different kinds of units are shown by the following types:
Unit | Highest-frequency type | Frequency | Lowest-frequency type | Frequency |
Word | 0 | 1137 | several | 1 |
Syllable | r | 11951 | 480 types | 1 |
Letter symbol | ə | 125314 | h | 237 |
Grapheme | | 58587 | | 18 |
Phoneme | ə | 126416 | | 278 |
Vowel phoneme | ə | 126416 | | 8424 |
Consonant phoneme | r | 47282 | | 278 |
Endings | rə, ri | 4154 | four types | 1 |
| | | two types | 0 |
It will be interesting to see how far high-frequency words and endings contribute to the high frequency of certain syllables and phonemes.
Word-types may be classified according to the frequency-range (from so many percent to so many percent), and the total number of types and the token percentage (token total for the range divided by 1,00,064 multiplied by 100) may be noted for each class.
Frequency range | Number of types | Percent of total population |
Above 1000 (1 % or more) | 1 | 1.1 % |
From 200 to 999 (0.2 % +) | 28 | 9.6 % |
From 100 to 199 (0.1 % +) | 45 | 9.4 % |
From 50 to 99 (0.05 % +) | 176 | 5.9 % |
From 10 to 49 (0.01 % +) | 1548 | 28.4 % |
From 1 to 9 (below 0.01 %) | | |
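The token percentage defined above (token total for the range divided by 1,00,064, multiplied by 100) can be sketched as follows. The names `token_percentage` and `frequency_range` are illustrative assumptions; the range boundaries are those of the table, and the worked figure uses the single word of frequency 1137 reported earlier as the highest-frequency word.

```python
CORPUS_TOKENS = 100064

# Lower bounds of the frequency ranges used in the table, highest first.
RANGE_FLOORS = [1000, 200, 100, 50, 10, 1]

def token_percentage(range_token_total):
    """Share of the corpus accounted for by the tokens of one frequency range."""
    return range_token_total / CORPUS_TOKENS * 100

def frequency_range(freq):
    """Return the lower bound of the range a word-type frequency falls into."""
    for floor in RANGE_FLOORS:
        if freq >= floor:
            return floor
    raise ValueError("frequency must be at least 1")

# The top range holds one word-type, of frequency 1137.
print(frequency_range(1137), round(token_percentage(1137), 1))  # 1000 1.1
```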
Syllable-types may be classified according to the structure of each (V,
CV, VC, CVC, etc.) and the total number of types and the token percentage may
be noted for each class.
POSSIBLE APPLICATIONS:
The results of the study presented in the form of the lists can be put to various uses when properly interpreted and manipulated.
Technical uses: (1) Designing of methods of speedwriting (shorthand systems): all the lists, especially the syllable and phoneme lists. (2) Designing of typewriter keyboards: the grapheme list especially. (3) Typing fonts: the grapheme list especially. (4) Telecommunication engineering. (5) Script and spelling reform.
Educational uses: designing curricula, reading or recorded material, proficiency tests, etc. for language learners and users, whether young or old, whether native speakers, dialect speakers, or foreigners.
Scholarly uses: (1) Phonological and grammatical analysis, lexical study. (2) Study of statistical and cybernetic properties and comparing these with those of other languages. (3) Comparison between different languages from the point of view of historical study, linguistic typology, and linguistic anthropology. (4) Statistical study of literary style.
APPENDIX-1
ORIYA ALPHABET IN ITS GRAPHIC AND PHONIC ASPECTS
Serial No. | Letter symbol | Grapheme | Phoneme | Code | Remarks |
1 | ə | | ə | 0 | The phoneme ə also appears in 9, 11; phonetically it is slightly rounded. |
2 | a | | a | 1 | |
3 | i | | i | 2 | Cp. 3 and 4; length is not phonemic. |
4 | ī | | i | 3 | |
5 | u | | u | 4 | Cp. 5 and 6; length is not phonemic. |
6 | ū | | u | 5 | The phoneme u also appears in 7. |
7 | r | | ru | 6 | One grapheme but two phonemes. |
8 | e | | e | 7 | |
9 | əy | | əy | 8 | Like 7. |
10 | o | | o | 9 | |
11 | əv | | əv | 0,1 | Like 7. |
12 | m̄ | o | ṅ | 0,2 | Cf. 19; this grapheme corresponds to phoneme ṅ except before letters 20-40 (see below). |
13 | m͂ | | ̃ | 0,3 | |
14 | h | : | h | 0,4 | Cf. 49. |
15 | k | | k | 0,5 | |
16 | kh | | kh | 0,6 | The phoneme kh also appears in 50. |
17 | g | | g | 0,7 | The phoneme g also appears in 51. |
18 | gh | | gh | 0,8 | |
19 | ṅ | | | 0,9 | The phoneme ṅ also appears in 12. |
20 | c | | č | 1,2 | |
21 | ch | | čh | 1,3 | |
22 | j | | | 1,4 | The phoneme also appears in 40. |
23 | jh | | | 1,5 | |
24 | n̅ | | n | 1,6 | Cf. 34. |
25 | ṭ | | ṭ | 1,7 | |
26 | ṭh | o | ṭh | 1,8 | |
27 | ḍ | | ḍ | 1,9 | The 2 graphs correspond to the stop and the flap allophones respectively. |
28 | ḍh | | ḍh | 2,3 | Like 27. |
29 | ṇ | | ṇ | 2,4 | |
30 | t | | t | 2,5 | |
31 | th | | th | 2,6 | |
32 | d | | d | 2,7 | |
33 | dh | | dh | 2,8 | |
34 | n | | n | 2,9 | The phoneme n also appears in 24. |
35 | p | | p | 3,4 | |
36 | ph | | ph | 3,5 | |
37 | b | | b | 3,6 | The grapheme also appears in 45. |
38 | bh | | bh | 3,7 | |
39 | m | | m | 3,8 | |
40 | ȳ | | | 4,5 | Cf. 22. |
41 | y | | y | 4,6 | The phoneme y also appears in 9, 50, 51. |
42 | r | | r | 4,7 | The phoneme r also appears in 7. |
43 | ḷ | | ḷ | 4,8 | The corresponding Devanagari letter is placed after the correlate of 49. |
44 | l | | l | 4,9 | |
45 | b | | v | 3,9 | Cf. 37; the grapheme has this value in some clusters and in some loan words; the phoneme v also appears in 11; for the three graphs, see below. |
46 | ś | | s | 5,6 | Cf. 48. |
47 | ṣ | | s | 5,7 | Cf. 48. |
48 | s | | s | 5,8 | The phoneme s also appears in 46, 47. |
49 | h | | h | 5,9 | The phoneme h also appears in 14. |
50 | x | | khy | 6,7 | The graph is traditionally a conjunct of 15+47; phonemically 16 is followed by 41. |
51 | ñ | | gy | 6,8 | The graph is traditionally a conjunct of 22+24; phonemically 17 is followed by 41. |
The grapheme ***** corresponding to Devanagari »Ö is not found in the corpus. The diacritics 12, 13, 14 are romanized as letters. The virāma symbolizing the absence of the so-called inherent ə is ignored in the count. Note that the inherent ə of Oriya is regularly pronounced; unlike many other modern Indo-Aryan languages, Oriya has very few word-types ending in a consonant phoneme. Lists 1 and 2 retain hyphens only when they are a part of Oriya orthography.
Syllabic hyphens were inserted at step 6 (see Note 2 there) according to the following principles:
1. There are as many syllables as there are vowel letters (1 to 11). So a hyphen intervenes between two successive vowel letters, as in o-ḍi-a, kə-ṇə. Initial and final sequences of non-vowels, if any, go with the adjacent vowel.
2. The following are treated as single letters: 9, 11, 16, 18, 21, 23, 26, 28, 31, 33, 36, 38, 50, 51.
3. The diacritic-letters 12, 13, 14 always follow a vowel letter and go with it. Any other non-vowel when followed by a vowel goes with it. A geminate consonant (as kk) is never split up.
4. Out of a sequence of non-vowels (other than geminates) between two vowels, the first goes with the preceding vowel, the rest go with the following vowel.
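The four principles can be sketched as a small syllabifier over a list of letter symbols (digraphs such as kh already counting as single symbols, as in principle 2). This is an illustrative reading of the rules, not the procedure actually followed, which was manual. The vowel set below is a reduced stand-in for letters 1 to 11, "M" is a hypothetical stand-in for the diacritic-letters 12 to 14, and a cluster that begins with a geminate is assumed to go whole with the following vowel.

```python
VOWELS = {"ə", "a", "i", "u", "e", "o"}   # reduced stand-in for letters 1-11
DIACRITICS = {"M"}                        # hypothetical stand-in for letters 12-14

def syllabify(tokens, vowels=VOWELS, diacritics=DIACRITICS):
    """Split a list of letter symbols into syllables (lists of symbols)."""
    vowel_pos = [i for i, t in enumerate(tokens) if t in vowels]
    if not vowel_pos:
        return [list(tokens)]
    starts = [0]                                   # initial non-vowels join the first vowel
    for prev, nxt in zip(vowel_pos, vowel_pos[1:]):
        k = prev + 1
        while k < nxt and tokens[k] in diacritics:  # principle 3: diacritics stay left
            k += 1
        cluster_len = nxt - k
        geminate = cluster_len >= 2 and tokens[k] == tokens[k + 1]
        if cluster_len >= 2 and not geminate:       # principle 4: first consonant stays left
            k += 1
        starts.append(k)
    bounds = starts + [len(tokens)]                 # final non-vowels join the last vowel
    return [list(tokens[a:b]) for a, b in zip(bounds, bounds[1:])]

def hyphenate(tokens):
    return "-".join("".join(s) for s in syllabify(tokens))

print(hyphenate(["o", "ḍ", "i", "a"]))   # o-ḍi-a
print(hyphenate(["k", "ə", "ṇ", "ə"]))   # kə-ṇə
print(hyphenate(["a", "n", "t", "a"]))   # an-ta
print(hyphenate(["a", "k", "k", "a"]))   # a-kka
```

The last two lines use made-up symbol sequences to show the cluster and geminate rules; they are not words from the corpus.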
In the grapheme list (List 5A) observe that the post-consonantal forms of vowel-letters and the conjunct shapes of consonant letters naturally went with the respective letters; that the frequency for the grapheme ******** can be calculated by adding up the frequencies of all syllable-types beginning with it (this grapheme has no post-consonantal form); that the two variants of 27, 28 remain unsorted; that the virāma at the end of the few consonant-ending words remains ignored in the count; and that the frequencies of 37 and 45 are added up for the grapheme *****, which is shaped like ob or oy when it corresponds to the phoneme /v/ in initial positions or after a vowel (see 45 in the table).
In the phoneme list (List 5B) observe that the following phonemes add up the frequencies of the letter symbols shown against them: /ə/ - ə, əy, əv; /i/ - i, ī; /u/ - u, ū, r; /kh/ - kh, x; /g/ - g, ñ; / / - j, ȳ; /n/ - ñ, n; /y/ - əy, y, x, ñ; /r/ - r, r; /v/ - əv, b; /s/ - ś, ṣ, s; /h/ - h, h.
The frequencies of letter symbol m̄ have to be distributed among the phonemes /n/ (42 tokens, all in words beginning with gə before letters 20-24, 30-34 and 41), /m/ (81 tokens, all in words beginning with before letters 35-39), and /ṅ/ (1534 tokens, elsewhere).
The phonemes of Oriya can be listed group-wise as follows: p, t, ṭ, k; ph, th, ṭh, čh, kh; b, d, ḍ, g; bh, dh, ḍh ,h, gh; m, n, ṇ, s, l, ḷ, r; h; y, v; i, e; u, o; ə, a; (to go with i, e, u, ə, a).
The phonemes ḍ, ḍh have flap variants between vowels, finally, before a consonant, or after a non-nasal consonant, but stop variants initially, in gemination, or after a nasal consonant. (The dotted variants of graphemes 27 and 28 represent flaps; the others stops.)
APPENDIX-2
GRAMMATICALLY IMPORTANT ENDINGS
The number against each ending (or group of endings in a few cases) refers to the coding used on slips (step 6) and cards (step 7). Some endings are composed of more than one morph. Structurally they may be derivational, inflectional, or phrasal. The addition of ‘etc.’ means that other endings of a similar shape and function are also coded alike and thus counted along with the ending that precedes. The hyphen after the ending shows that it may be followed by another ending.
NOMINAL
ENDING | CODE | ENDING | CODE |
aḍə etc. | 43 | | |
Uprə, etc. | 46 | ṭe | 19 |
Ka | 08 | ṭharu, ṭhum͂, ṭhəum͂ | 33 |
Ke, ṅke | 10 | ṭhare | 34 |
guḍaku, guḍaeȳake | 24 | ṭhi | 35 |
guḍaku, ȳake | 28 | ṭhu | 32 |
guḍaṅka | 26 | te, tə | 16 |
guḍaṅku | 25 | digə, etc. | 44 |
Guḍik, təkə | 23 | Pəchə, etc. | 45 |
guḍiku | 27 | pari | 39 |
ṅkə- | 12 | pakhə, etc. | 42 |
ṅku | 11 | bhəḷi, rupe, bhabe | 07 |
chəḍa, etc. | 47 | bhadere | 50 |
jəṅe- | 20 | bhitəre | |
Dhayəere | 40 | mane | 15 |
manə- | 96 | saṅge, etc. | 41 |
manəṅka- maṅku | 14 13 | gathe, səhita, səhə səhəkare, etc. | 48 |
VERBAL
ENDING | CODE | ENDING | CODE |
ə, ə m͂ | 54 | ibe, im͂bə | 61 |
ənti nti | 55 | ib, ib- | 63 |
ntu, ntu, nta | 56 | ibar | 64 |
anti, nti | 57 | ibi | 59 |
antu, anta, nta, ante | 58 | ibu, ibum͂ | 60 |
ante, nte | 00 | ibe | 62 |
i,im͂ | 51 | ilə | 90 |
ichə, icə, iəchi, | 73 | ilā, im̃la | 98 |
ichəti, icənti, Icnti, ichəti | 74 | ili, im̃li , ilu, ilum̃ | 88 89 |
ichi, ici, iəchi, im̃ci, im̃chi | 71 | ie, im̃le | 91 |
ichu, icu, iəchu | 72 | ucə hnti, ucəh, um̃cnti uəc, ucənti | 70 |
itha | 09 | uchi, uəchi, uci | 68 |
ithiba, ithib, etc. | 97 | um̃ci, um̃chi | |
ithibarə | 65 | uchu, ucu, um̃cu | 69 |
ithilə | 85 | uthibā, uthibe, uthibə | 66 |
ithilā, im̃thila | 87 | thibarə | 67 |
thili, im̃thili | 83 | uthilə | 80 |
thili, ithilim̃ | 84 | uthila | 82 |
ithile, im̃thile | 86 | uthili | 78 |
Uthilu | 79 | Le | 94 |
Uthile | 81 | La | 99 |
Chə, cə | 77 | Li | 92 |
Chi, ci, echi | 75 | Lu, lum̃ | 93 |
Chu, cu | 76 | Le | 95 |
**** | 05 | | |
NOMINAL AND VERBAL
ENDING | CODE | ENDING | CODE |
agə | 49 | na, naim̃, nə, ni, | 02 |
u, um̃ | 53 | Nuhəm̃nti, nei, | |
e, em̃ | 52 | Paim̃, nimtaə, | |
ei, | 01 | səkaśeə- | 38 |
ku, haku | 29 | R, ri | |
ṭā, ṭha- | 17 | Ru | 36 |
ṭi | 18 | ru | 31 |
tə, təh | | Re | 37 |
dbara, oge | 04 | Lagi, hetu, etc. | 06 |
| 30 | Him̃, hi, hem̃ | 03 |
APPENDIX-3
THE SOURCE MATERIAL
The names of the authors and the titles of books and periodicals are spelt with the letter symbols tabulated in Appendix 1. The names of persons are given with the surname transposed to the beginning, thus senapati mohənə cərəṇ instead of the usual Mohan Charan Senapati.
The place of publication is Cuttack, Orissa unless otherwise mentioned.
The year of publication of the book is for the first edition unless otherwise mentioned. The actual copy used for this count may of course be of a later year than the one mentioned.
The number of slips refers to the word-token slips (step 3). The first and the second count refer to the two underlinings (step 2).
*****