Language and Linguistics

FREQUENCY COUNTS: THE HOW AND THE WHY

ASHOK R. KELKAR



What follows is the ‘front matter’ of the Oriya language frequency count that I undertook at Deccan College and completed in June 1966.

 

            The reasons for presenting the front matter (consisting of the table of Contents, Preface, Introduction, and Appendixes 1-3 to the Introduction) in this fashion are twofold:

 

(1)        The body of the text, consisting of the frequency counts in the form of Lists 1-8 as mentioned in the ‘Contents’, has already been published, but without the front matter, thanks to a gross oversight on the publisher’s part.  Those who use the book will naturally have looked for some such introduction.  They will find this presentation indispensable for the proper use of the book.

(2)        Other scholars engaged in frequency counts of this kind, or even merely planning such projects, will probably find this presentation useful as a model worth considering, and will find the sections of the Introduction entitled ‘Procedural Steps’ (steps 1-14, giving the ‘how’) and ‘Possible Applications’ (technical, educational, and scholarly, giving the ‘why’) suggestive even if they have little or no interest in Oriya.

 

I trust that this presentation will be of some use.

 

 

C O N T E N T S

 

 

 

Preface

 

Introduction

 

Appendix 1 Oriya alphabet in its graphic and phonic aspects.

Appendix 2  Grammatically important endings.

Appendix 3  The source material.

 

 

List 1     Words in alphabetical order with their frequencies

List 2     Words in rank order of descending frequency (of 10 or above)

List 3     Syllables in alphabetical order with their frequencies

List 4     Syllables in rank order of descending frequency

List 5     Letter symbols in alphabetical order with their frequencies

List 5A    Graphemes in alphabetical order with their frequencies

List 5B    Phonemes in alphabetical order with their frequencies

List 6     Letter symbols in rank order of descending frequency

List 7     Grammatically important endings in alphabetical order with their frequencies

List 8     Grammatically important endings in rank order of descending frequency

 

 

*************

 

 

 

 

P R E F A C E

 

            This study of the phonemic and morphemic frequencies in Oriya is part of a project sponsored by the Ministry of Education, Government of India, for ‘the evolution of a system of shorthand suited to the genius of Hindi Language in particular and other regional languages in general’.  Dr. S. V. Bhagwat’s doctoral dissertation on the Marathi frequencies antedates this project, under which studies in Hindi, Gujarati, Kannada, and Malayalam have been completed.

 

            Work on Oriya was carried out at Deccan College.  It was started early in 1963 and completed in a little over three years’ time.  Benefiting from the other studies in this series, this study suffered on fewer occasions from false starts and retracing of steps.

 

            Dr. A. M. Ghatage gave generously of his time and advice and provided general supervision.  Dr. D. P. Pattanayak (of the American Institute of Indian Studies) was consulted on the phonological system and the grammatical endings of Oriya.  Dr. S. G. Prabhu Ajgaonkar, Research Associate in Statistics, gave valuable technical advice and took over immediate supervision during my leave of absence from April to October 1965.  Mr. Mohan Charan Senapati, B. A., a native speaker of Oriya, was appointed a Research Scholar and carried out the actual counting, the processing and arranging of the slips, the machine-sorting of the punched cards, and the preparation of the lists, and also helped with the proofreading of the typescript.  Mrs. B. V. Guhagarkar did the punching of the cards and helped with their sorting.  Miss S. M. Kulakarni did the mechanical verification of the punched cards.  Mr. D. D. Phadke was entrusted with the complicated typing.

 

             My thanks are due to all of these as also to the office and the library of Deccan College.

 

            It is hoped that the results presented here will be of use not only for the immediate purpose of devising methods of speedwriting but also for various other technical, educational, and scholarly purposes.

 

30 June 1966.                                                                                                   A. R. Kelkar.

 

 

 

 

I N T R O D U C T I O N

 

 

ORIENTATION

 

            The fact that this study of phonological, lexical, and grammatical frequencies in Oriya is one of a series has imparted to it a set shape, so that the results obtained for the various Indian languages covered remain comparable.  Though the devising of methods of speedwriting was the end immediately in view, the work was designed in such a way that other purposes may also be served.

 

            The immediate goal of this study was the preparation of four pairs of lists.

 

(i)                 The words with the number of times each occurred in the corpus.

(ii)                The syllable shapes with their frequencies.

(iii)              The letters in their graphic and phonic aspects with their frequencies.

(iv)              Some grammatically important endings (whether consisting of single morphs or not) with their frequencies.

 

The lists are presented first in the Oriya alphabetical order and then in the rank order of descending frequency.

 

            In counting words, the various grammatical forms of what would be a single entry-heading in a dictionary were counted separately; it was as if take, takes, took, taken, taking were all treated as different word-types.  In deciding where one word ends and another begins, conventional spacing was the criterion used, as if full stop, full-time, and fulfill were counted as two, one, and one word-tokens respectively.  All this sounds (and is) a little arbitrary, but unavoidably so in handling a large corpus of about 1,00,000 word-tokens.

 

            The corpus consisted of word-tokens sampled out of a larger body of printed matter in the form of books published in contemporary standard Oriya and of periodicals covering a year’s duration.  It was decided that obtaining and using radio and movie scripts, as was done in some of the other studies in this series, was not feasible.

 

            A word about the slightly technical but very important distinction between type and token used earlier may not be out of place.  When we are counting units of any kind (whether letters or syllables or words or suffixes) we can do so in either of two ways: either count the occurrences of tokens or count the recurrent types.  For example, in the text it is raining, the letter-type i has 4 letter-tokens, the letter-type n has 2 letter-tokens, etc., and in all there are 7 letter-types and 11 letter-tokens.
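
The type/token distinction, together with the spacing criterion for delimiting words described above, can be illustrated with a minimal Python sketch.  The function names are illustrative assumptions and not part of the original study, which worked with hand-written slips.

```python
from collections import Counter

def word_tokens(text):
    """Split running text into word-tokens using conventional spacing only."""
    return text.split()

def type_token_counts(units):
    """Return (frequency of each type, number of types, number of tokens)."""
    counts = Counter(units)
    return counts, len(counts), sum(counts.values())

# Spacing alone decides the word count: two, one, and one word-tokens.
for sample in ("full stop", "full-time", "fulfill"):
    print(sample, len(word_tokens(sample)))

# Letter-types and letter-tokens in the text "it is raining".
letters = [ch for ch in "it is raining" if ch.isalpha()]
counts, n_types, n_tokens = type_token_counts(letters)
print(counts["i"], counts["n"], n_types, n_tokens)  # 4 2 7 11
```

The same counting function applies unchanged to syllables, words, or endings once the text has been cut into the relevant units.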

 

            Before looking at some of the findings and possible applications, we should be clearly aware of the procedure used in arriving at the lists.  The procedural statement will enable the reader to understand and evaluate the results properly; it may also help some future worker undertaking a similar frequency count.

 

PROCEDURAL STEPS

 

            For the sake of convenience of exposition, a set of distinct steps is described below.  In actual practice, of course, there was a good deal of overlap and adjustment to the availability of man-power, machines, and material.

            Steps 1 to 3 below constitute the stage of the collection of data.  The next stage, namely collation and analysis of the data, comprises steps 4 to 8.  The remaining steps, 9 to 14, cover the final stage leading to the presentation of the results in the form of 10 lists, numbered 1, 2, 3, 4, 5, 5A, 5B, 6, 7, and 8.  The three stages took roughly a year each.

 

            Step 1 Selecting and obtaining the source material:

 

            The body of source material had to be large enough so that a sample of about one lac word-tokens could be obtained at the rate of 2 word-tokens per page of a book or per column of a periodical.  It also had to be diversified enough and distributed properly so that the whole represented a fair selection of contemporary standard Oriya as used in printed matter.  The following table will give a general idea of the distribution of the material.

 

 

               Number of titles    Matter            Year of publication                           Word-tokens

Books          85                  16532 pages       1896 to 1963 (about 75 from 1951 to 1953)     33,033
Monthlies      3                   6383 columns      1963-4                                        12,760
Dailies        2                   30634 columns     1963-4                                        54,271

                                                                                       Total:      1,00,064

         

            Geographical distribution was not a serious problem, as most of the publishing activity in Oriya is centred in Cuttack, Orissa, and, to a lesser extent, in Calcutta, W. Bengal.  In securing diversity of contents and media, due weightage was given to the effective readership.  The 85 book titles were distributed as under:

 

Novels                                 13
Collections of short stories           15
Humour                                  2
Stage plays                             8
Biography and autobiography             8
Essays and literary criticism           8
                                    -----
Belles-lettres total                   53
                                    -----
Science and science fiction             6
Social and political writings           7
History                                 2
Non-scientific technical                1
Travel                                  1
                                    -----
Adult non-literary total               17

School texts                            9
For children                            4
General knowledge for children          2
                                    -----
Non-adult total                        15
                                    -----
                                       85
                                    -----

            Of the 3 monthlies one was primarily devoted to political satire, the other two being literary.  The 2 dailies were the ones with the highest circulation.

 

            See Appendix 3 of this Introduction for a complete list of the source material with the breakdown of the corpus.

 

            Only 2 word-tokens were randomly pulled out of each page or column.  Thus less than 2 percent of the running text was utilized for the count.  This ensured that the frequency of a given item would not be exaggerated by its constant recurrence within a short span owing to the subject matter of the text.

 

            Step 2 Sampling of the source material in order to obtain the corpus:

 

            Each item (book or periodical) was ordinarily excerpted from twice.  In the first count were underlined the first word of each page of a book, the first word of each column of each page of a monthly (2-3 columns to a page), and the last word of the seventh line of each column of each page of a daily (8 columns to a page).  In the second count were underlined the last word of the first line of the last paragraph of each page of a book, the last word of the first line of the last paragraph of each column of each page of a monthly, and the last word of the seventh line of each column of each page of a daily.  (The procedure for dailies was designed to avoid the date lines and the news agency names.)  A count was maintained of the word-tokens extracted.  As soon as the figure 1,00,000 was exceeded, collection was suspended (the actual count being 1,00,064).
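
The two underlinings for a book can be sketched in Python as follows.  This is only an illustration under stated assumptions: a page is modelled as a list of paragraphs, each paragraph as a list of printed lines, and the exclusion rules for proper names, figures, citations, and the like are not modelled.  The function names are hypothetical.

```python
def first_count(page):
    """First underlining for a book: the first word of the page."""
    return page[0][0].split()[0]

def second_count(page):
    """Second underlining: last word of the first line of the last paragraph."""
    return page[-1][0].split()[-1]

def sample_book(pages):
    """Collect two word-tokens per page, as in the two counts described above."""
    corpus = []
    for page in pages:
        corpus.append(first_count(page))
        corpus.append(second_count(page))
    return corpus

# A placeholder page with two paragraphs of two lines each (not Oriya text).
page = [["alpha beta gamma", "delta"], ["epsilon zeta", "eta theta"]]
print(sample_book([page]))  # ['alpha', 'zeta']
```

Because the positions are fixed in advance, the choice of word does not depend on the judgement of the person doing the underlining.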

 

            The way a word was defined for this operation has already been indicated.  An incomplete word marked with a hyphen at the end of a line was completed from the next line.  In applying the rules of random choice above, the following items were ignored: proper names, abbreviations, figures (but not spelt-out numerals), the running heading of a page, news headings in a daily, verse citations, Sanskrit citations, matter not in the Oriya script, and display or fancy printing in an advertisement (but not Sanskrit or English loanwords in Oriya script nor the ordinary letterpress in an advertisement).  No page that had anything on it was left out.

 

Step 3 Preparation of the word-token slips:

 

            For each word-token so underlined, a slip (3” X b” size used throughout) was written out with the word in Oriya script at the center.  No attempt to normalize orthography was made at this stage beyond correcting obvious misprints.

 

Step 4 Arranging the slips alphabetically:

 

            All slips were arranged in a single alphabetical file.  The Oriya order was used (see Appendix 1 of this Introduction).  This was done by hand with the help of a pigeon-holed cabinet, taking one letter at a time from left to right.

 

Step 5 Removal of duplicate slips:

 

            For each word-type only one slip was retained, the duplicates being discarded.  In doing so the total number of tokens appearing in the corpus for the word-type (the frequency of that word) was written in the top right corner of the slip.  When a word-type had alternate spellings, a slip with the preferred spelling was retained, with the other spellings noted in the bottom left corner.
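
The effect of this step can be sketched in Python.  The sketch is an illustration only: the table of preferred spellings and the sample spellings are hypothetical placeholders, whereas in the study the preferred spelling was a matter of the Research Scholar's judgement.

```python
from collections import Counter

def collapse_tokens(token_slips, preferred=None):
    """Merge word-token slips into word-type slips.

    preferred maps an alternate spelling to the preferred one (hypothetical data).
    Returns {preferred spelling: (frequency, sorted alternate spellings)}.
    """
    preferred = preferred or {}
    freq = Counter()
    alternates = {}
    for spelling in token_slips:
        head = preferred.get(spelling, spelling)
        freq[head] += 1  # the frequency includes occurrences in alternate spellings
        if head != spelling:
            alternates.setdefault(head, set()).add(spelling)
    return {w: (n, sorted(alternates.get(w, ()))) for w, n in freq.items()}

slips = ["kari", "kori", "ghara", "kari"]  # placeholder spellings
print(collapse_tokens(slips, preferred={"kori": "kari"}))
```

Each entry of the result corresponds to one retained slip: the frequency for the top right corner and the alternate spellings for the bottom left corner.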

 

Step 6 Preparation of the word-type slips:

 

            The 20968 slips so retained were then processed as shown in the diagram to yield word-type slips.

 

 

(4)  Serial number                          (3)  Word frequency
                                            (6)  Number of letters in each syllable


               (1)  Spelling in Oriya script
               (2)  Spelling in Roman with syllable divisions


(5)  Alternate spellings in Roman           (7)  Grammatical type
                                            (8)  Endings symbolized

 

 

 

 

 

 

 

Notes:

 

1.         This is the preferred spelling in the judgement of the Research Scholar who prepared all the slips.  He consulted Pramodacandara Deb’s Pramoda-abhidhāna (2 vols, Cuttack, 1942).

 

2.         A Roman writing system was devised consisting of 51 letter-symbols in a certain order (analogous to the Oriya alphabetical order).  The symbols, together with their equivalence to the letters of the Oriya script (the graphemes of Oriya) and to the units of the Oriya sound system (the phonemes of Oriya), are set out in Appendix 1.  This enables us to deduce the frequencies of the graphemes (List 5A) and of the phonemes (List 5B) from those of the letter symbols (List 5).  The slips were arranged in the order of the letter symbols.  Syllabic divisions were shown with hyphens.  (For the rules for inserting hyphens see Appendix 1.)

 

3.         This is carried over from step 5.  The number includes the occurrence of the word-type in alternate spellings.

           

4.         This number, from 1 to 20968, shows the place of the slip in the alphabetical order and links it to the corresponding punched card.

 

5.         These are recorded later in list 1.

 

6.         Thus for a word like i-la-ka the notation at this point will be 122, that is, 1 letter in the first syllable, 2 in the second, and 2 in the third.

 

7.         All words were classified into 5 types as follows:

 

            Type 0             without any of the listed endings.

            Type 1             with one nominal ending.

            Type 2             with two nominal endings.

            Type x             with one verbal ending.

            Type y             with two verbal endings.

 

8.         A list of the principal grammatical endings was first prepared and a code number was assigned to each (see Appendix 2 of this Introduction).  The endings occurring in the word were noted here in the form of code numbers.  Thus 22-07 means that the word contains ending 22 followed by ending 07.

 

 

Step 7 Preparation of a punched card for each word-type slip:

 

            ICT cards of type 4-354 with columns 1 to 80 and twelve rows (x, y, 0, and 1 to 9) were used.  The columns were utilized for punching as follows:

            1 to 5 Coding the serial number (the first digit in column 1, blanks to the right), tying the card to the slip.

            6 to 9 Coding the word frequency number followed by x (thus, frequency 13 was punched as follows: 1 under 6, 3 under 7, x under 8, nothing under 9).

            10 to 14 Coding the spelling of the first syllable with blanks to the left.  For the code number that is assigned to each letter symbol and is indicated by single or double punching, see under that column in the table included in Appendix 1.

            15 to 54 Coding the spellings of the second to the ninth syllables in a similar fashion with 5 columns to a syllable.  (No word in the corpus exceeded 9 syllables.)

            55 to 63 Coding the number of letters in each syllable followed by 0 with blanks to the right (see step 6, note 6).

            64 Coding the grammatical type (0, 1, 2, x, or y, as the case may be; see step 6, note 7).

            65 to 68 Coding the grammatical endings: the first in 65-66, the second if any in 67-68.  See Appendix 2 for the code.

 

            69 to 80 Not utilized and left blank.
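
The column layout described above can be sketched in Python as a fixed-width record.  This is a simplified illustration, not the actual punching scheme: each column holds a letter symbol or digit as a string instead of the single or double punch codes of Appendix 1, syllables are assumed to have at most 5 letters, and the frequency field is simply cut to its four columns.  The function name and arguments are hypothetical.

```python
def make_card(serial, frequency, syllables, gram_type="0", endings=()):
    """Lay out one word-type record on an 80-column card (index 0 = column 1)."""
    card = [""] * 80
    # Columns 1-5: serial number, first digit in column 1, blanks to the right.
    for i, digit in enumerate(str(serial)[:5]):
        card[i] = digit
    # Columns 6-9: frequency digits followed by the marker x (cut to 4 columns).
    for i, ch in enumerate((str(frequency) + "x")[:4]):
        card[5 + i] = ch
    # Columns 10-54: nine 5-column fields, one per syllable, blanks to the left.
    for s, syllable in enumerate(syllables):
        start = 9 + 5 * s + (5 - len(syllable))
        for i, letter in enumerate(syllable):
            card[start + i] = letter
    # Columns 55-63: letters per syllable, then 0 when there is room for it.
    lengths = "".join(str(len(syllable)) for syllable in syllables)
    if len(lengths) < 9:
        lengths += "0"
    for i, digit in enumerate(lengths):
        card[54 + i] = digit
    # Column 64: grammatical type (0, 1, 2, x, or y).
    card[63] = gram_type
    # Columns 65-68: up to two two-digit ending codes.
    for e, code in enumerate(list(endings)[:2]):
        card[64 + 2 * e], card[65 + 2 * e] = code[0], code[1]
    return card

card = make_card(13, 13, [["i"], ["l", "a"], ["k", "a"]], "2", ["22", "07"])
print(card[5:9], "".join(card[54:63]))  # ['1', '3', 'x', ''] 1220
```

Columns 69 to 80 are never written, so they stay blank as in the description.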

 

Step 8 Verification of punched cards:

 

            As soon as the puncher had transferred the data from the slip to the punched card with the help of a punching machine, the verifier verified its correctness against the slip with the help of a verifying machine.  If necessary a new card was punched and the wrongly punched one discarded.

 

 

Step 9 Preparation of List 1:

 

            The file of word slips at the end of step 6 was utilized for writing out List 1: word-types spelt in Roman letter symbols, arranged alphabetically, with the alternate spellings (if any) in Roman and with the frequency (out of 1,00,064 tokens) noted against each.

 

 

Step 10 Preparation of  List 2:

 

             The slips were hand-sorted according to the frequency number (slips with frequency numbers below ten were set aside unsorted) and List 2 was written out: words arranged in an order of descending frequency from the maximum down to frequency 10.  Under each frequency number the order is alphabetical.  Words with a frequency of 1 to 9 are too many to be listed profitably; such lists can always be compiled visually by scanning List 1.
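
The ordering used for List 2 can be sketched in Python as a sort on two keys.  This is only an illustration: words are assumed to be given as tuples of letter symbols, and the short alphabet and the frequencies in the example are hypothetical placeholders for the 51 letter symbols of Appendix 1 and the real counts.

```python
def rank_order(frequencies, alphabet, minimum=10):
    """Sort word-types by descending frequency, alphabetically within a frequency."""
    position = {letter: i for i, letter in enumerate(alphabet)}

    def alpha_key(word):
        # Compare words letter by letter in the order of the given alphabet.
        return [position[letter] for letter in word]

    kept = [(word, n) for word, n in frequencies.items() if n >= minimum]
    return sorted(kept, key=lambda item: (-item[1], alpha_key(item[0])))

alphabet = ["ə", "a", "i", "k", "r"]  # placeholder for the full order
frequencies = {("k", "ə"): 12, ("a",): 12, ("r", "i"): 40, ("k", "i"): 3}
for word, n in rank_order(frequencies, alphabet):
    print("".join(word), n)
```

Calling the same function with `minimum=1` and sorting on the alphabetical key alone would correspond to the arrangement of List 1.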

 

Step 11 Machine-sorting for the preparation of lists of syllable-types, letter-types, and ending types:

 

            The punched cards were fed into a tabulator for the purpose of sorting.  First, they were sorted according to the frequency number.  Subsequently, each stack was handled separately.  Note the use of the X suffixed to the frequency number: all cards bearing X in column 8 have frequencies from 10 to 99 and could thus be conveniently separated first, before being sorted further into the stack for 10, for 11, etc.

 

            Using the 0 in columns 55-63 (analogously to the use of X in columns 6-9) the cards were then sorted according to the number of syllables in the word.  Using columns 10 to 68 the frequency for each syllable-type and each letter-type was added up.  The appearance of a given letter 30 times in the stack of frequency 3 accounts for 30 x 3 tokens of that letter-type, and so on.
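
The adding-up described above amounts to weighting every occurrence of a syllable or letter in a word-type by the frequency of that word-type.  A minimal Python sketch follows; the data layout (a word-type as a list of syllables, each a list of letter symbols) and the sample figures are assumptions made for the illustration.

```python
from collections import Counter

def unit_frequencies(word_types):
    """word_types: iterable of (syllables, word frequency) pairs.

    Returns token frequencies for syllable-types and for letter-types.
    """
    syllable_freq = Counter()
    letter_freq = Counter()
    for syllables, frequency in word_types:
        for syllable in syllables:
            syllable_freq["".join(syllable)] += frequency
            for letter in syllable:
                letter_freq[letter] += frequency
    return syllable_freq, letter_freq

# Hypothetical word-types: one of frequency 3, one of frequency 10.
words = [([["k", "ə"], ["r", "i"]], 3), ([["r", "i"]], 10)]
syllable_freq, letter_freq = unit_frequencies(words)
print(syllable_freq["ri"], letter_freq["k"])  # 13 3
```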

 

            The stack for a given frequency was put together again and then re-sorted using columns 64 to 68.  The frequency for each ending-type was added up.

 

Step 12 Preparation of Lists 3, 4, 5, 6, 7, 8:

 

            Slips were then prepared for each syllable-type, each letter-type, and each ending-type with the frequency number (the total number of tokens encountered in the corpus of 1,00,064 word-tokens) in the top right corner of the slip.

 

            The three files were each arranged alphabetically and Lists 3, 5, and 7 were prepared respectively for syllable-types, letter-types, and ending-types.

            The files were then manually rearranged in rank order of descending frequency and Lists 4, 6, and 8 were prepared respectively for syllable-types, letter-types, and ending-types.

 

Step 13 Preparation of Lists 5A and 5B:

 

            Given the frequencies of all the letter symbols, those of all the graphemes and the phonemes can be computed by using the equivalences.  For details see Appendix 1.
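
The computation can be sketched in Python for the phonemes.  The equivalences shown are a small excerpt based on the remarks in Appendix 1 (letter symbol x stands for the phoneme kh followed by y); the letter-symbol frequencies are hypothetical, and the function name is an assumption of this sketch.

```python
from collections import Counter

# Partial equivalences: each letter symbol contributes to one or more phonemes.
PHONEMES_OF = {
    "kh": ["kh"],
    "x": ["kh", "y"],  # one letter symbol, two phonemes
    "y": ["y"],
}

def phoneme_frequencies(letter_freq, phonemes_of):
    """Distribute letter-symbol frequencies over the phonemes they represent."""
    totals = Counter()
    for letter, n in letter_freq.items():
        for phoneme in phonemes_of[letter]:
            totals[phoneme] += n
    return totals

letter_freq = {"kh": 100, "x": 20, "y": 50}  # hypothetical frequencies
print(phoneme_frequencies(letter_freq, PHONEMES_OF))
```

With these figures /kh/ receives 120 tokens and /y/ receives 70.  The grapheme frequencies can be obtained in the same way from a table of letter-symbol-to-grapheme equivalences.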

 

Step 14 Typing and proofreading:

 

            A typescript of the 10 lists constituting the results of this study was prepared and proofread.

 

SOME FINDINGS:

 

            The term ‘morphemic frequency’ in the title of this study is, one must admit, slightly misleading.  With the help of List 1 and List 7, somebody determined enough and knowing the language well enough could compile a list of morph-types.   The situation is similar if what one is looking for is the frequency of entry-words in an Oriya dictionary.  (Since Oriya inflections are suffixal in character, all the grammatical forms of, say, a given verb will be brought together in the alphabetical  word list-List 1.)

 

            The type-token ratio for the different kinds of units can be seen in the following table:

 

                                                                       

                              Total number of types     Total number of tokens

Words (List 1)                20,968                    1,00,064
Syllables (List 3)            2,128                     3,11,318
Letter symbols (List 5)       51                        6,39,613
Graphemes (List 5A)           54                        5,16,097
Phonemes (List 5B)            38                        5,89,518
Vowels (List 5B)              6                         3,11,318
Consonants (List 5B)          31                        2,75,021
Ovowel (List 5B)              1                         3,179
Endings (List 7)              98                        31,212

 

            Other interesting ratios that could be worked out from this table are: syllable-token/word-token, phoneme-token/word-token, ending-token/word-token, vowel-token/consonant-token, phoneme-token/syllable-token, etc.
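
These ratios follow directly from the totals in the table.  A short Python sketch is given below as an illustration; the dictionary layout and the helper name are assumptions, while the figures are those of the table (written without the Indian digit grouping, so that 1,00,064 appears as 100064).

```python
# Token and type totals from the table above.
TOKENS = {
    "word": 100064,
    "syllable": 311318,
    "phoneme": 589518,
    "vowel": 311318,
    "consonant": 275021,
    "ending": 31212,
}
TYPES = {"word": 20968, "syllable": 2128, "phoneme": 38, "ending": 98}

def ratio(numerator, denominator):
    """Token-to-token ratio between two kinds of units."""
    return TOKENS[numerator] / TOKENS[denominator]

for pair in [("syllable", "word"), ("phoneme", "word"), ("ending", "word"),
             ("vowel", "consonant"), ("phoneme", "syllable")]:
    print(pair[0] + "/" + pair[1], round(ratio(*pair), 2))

# Type-token ratio for words.
print(round(TYPES["word"] / TOKENS["word"], 3))
```

The vowel total equals the syllable total, as expected when every syllable contains exactly one vowel.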

 

 

 

            The highest and lowest frequencies for the different kinds of units are shown by the following types:

 

           

 

 

                        Highest frequency            Lowest frequency

Word                    o            1137            several        1
Syllable                r            11951           480 types      1
Letter symbol           ə            125314          h              237
Grapheme                             58587                          18
Phoneme                 ə            126416                         278
Vowel phoneme           ə            126416                         8424
Consonant phoneme       r            47282                          278
Endings                 rə, ri       4154            four           1
                                                     two            0

 

            It will be interesting to see how far high-frequency words and endings contribute to the high frequency of certain syllables and phonemes.

 

            Word-types may be classified according to the frequency range (from so many percent to so many percent), and the total number of types and the token percentage (token total for the range divided by 1,00,064, multiplied by 100) may be noted for each class.

 

Frequency range                          Number of types    Percent of total population

Above 1000 (1 % or more)                 1                  1.1 %
From 200 to 999 (0.2 % +)                28                 9.6 %
From 100 to 199 (0.1 % +)                45                 9.4 %
From 50 to 99 (0.05 % +)                 176                5.9 %
From 10 to 49 (0.01 % +)                 1548               28.4 %
From 1 to 9 (below 0.01 %)
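
The classification by frequency range can be sketched in Python as follows.  The range boundaries follow the table above; the list of word frequencies in the example is a hypothetical miniature corpus, and the function name is an assumption of this sketch.

```python
# (lowest frequency, highest frequency) for each class; None means no upper limit.
RANGES = [(1000, None), (200, 999), (100, 199), (50, 99), (10, 49), (1, 9)]

def classify(frequencies, total_tokens):
    """Return (low, high, number of types, token percentage) for each range."""
    rows = []
    for low, high in RANGES:
        in_range = [n for n in frequencies
                    if n >= low and (high is None or n <= high)]
        percentage = 100 * sum(in_range) / total_tokens
        rows.append((low, high, len(in_range), percentage))
    return rows

frequencies = [1137, 250, 250, 60, 12, 1, 1, 1]  # hypothetical word frequencies
for low, high, n_types, percentage in classify(frequencies, sum(frequencies)):
    print(low, high, n_types, round(percentage, 1))
```

Since every word-type falls into exactly one range, the percentages add up to 100 when all ranges are included.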

 

 

 

            Syllable-types may be classified according to the structure of each (V, CV, VC, CVC, etc.) and the total number of types and the token percentage may be noted for each class.

 

POSSIBLE APPLICATIONS:

 

            The results of the study presented in the form of the lists can be put to various uses when properly interpreted and manipulated.

 

            Technical uses: (1) Designing of methods of speedwriting (shorthand systems): all the lists, especially the syllable and phoneme lists.  (2) Designing of typewriter keyboards: the grapheme list especially.  (3) Typing fonts: the grapheme list especially.  (4) Telecommunication engineering.  (5) Script and spelling reform.

 

            Educational uses: designing curricula, reading or recorded material, proficiency tests, etc. for language learners and users, whether young or old, whether native speakers, dialect speakers, or foreigners.

 

            Scholarly uses: (1) Phonological and grammatical analysis; lexical study.  (2) Study of statistical and cybernetic properties and comparing these with those of other languages.  (3) Comparison between different languages from the point of view of historical study, linguistic typology, and linguistic anthropology.  (4) Statistical study of literary style.

 

 

 

 

APPENDIX-1

 

ORIYA ALPHABET IN ITS GRAPHIC AND PHONIC ASPECTS

 

Serial No.   Letter symbol   Grapheme   Phoneme   Code   Remarks

1            ə                          ə         0      The phoneme ə also appears in 9, 11; phonetically it is slightly rounded.
2            a                          a         1
3            i                          i         2      Cp. 3 and 4; length is not phonemic.
4            ī                          i         3
5            u                          u         4      Cp. 5 and 6; length is not phonemic.
6            ū                          u         5      The phoneme u also appears in 7.
7            r                          ru        6      One grapheme but two phonemes.
8            e                          e         7
9            əy                         əy        8      Like 7.
10           o                          o         9
11           əv                         əv        0,1    Like 7.
12           m̄               o                    0,2    Cf. 19; this grapheme corresponds to the phoneme / / except before letters 20-40 (see below).
13           m͂                          ~         0,3
14           h               :          h         0,4    Cf. 49.
15           k                          k         0,5
16           kh                         kh        0,6    The phoneme kh also appears in 50.
17           g                          g         0,7    The phoneme g also appears in 51.
18           gh                         gh        0,8
19                                                0,9    The phoneme also appears in 12.
20           c                          č         1,2
21           ch                         čh        1,3
22           j                                    1,4    The phoneme also appears in 40.
23           jh                                   1,5
24           n̅                          n         1,6    Cf. 34.
25                                                1,7
26           h               o          h         1,8
27                                                1,9    The 2 graphs correspond to the stop and the flap allophones respectively.
28           h                          h         2,3    Like 27.
29                                                2,4
30           t                          t         2,5
31           th                         th        2,6
32           d                          d         2,7
33           dh                         dh        2,8
34           n                          n         2,9    The phoneme n also appears in 24.
35           p                          p         3,4
36           ph                         ph        3,5
37           b                          b         3,6    The grapheme also appears in 45.
38           bh                         bh        3,7
39           m                          m         3,8
40                                                4,5    Cf. 22.
41           y                          y         4,6    The phoneme y also appears in 9, 50, 51.
42           r                          r         4,7    The phoneme r also appears in 7.
43                                                4,8    The corresponding Devanagari letter is placed after the correlate of 49.
44           l                          l         4,9
45           b                          v         3,9    Cf. 37; the grapheme has this value in some clusters and in some loan words; the phoneme v also appears in 11; for the three graphs, see below.
46           ś                          s         5,6    Cf. 48.
47                                      s         5,7    Cf. 48.
48           s                          s         5,8    The phoneme s also appears in 46, 47.
49           h                          h         5,9    The phoneme h also appears in 14.
50           x                          khy       6,7    The graph is traditionally a conjunct of 15+47; phonemically 16 is followed by 41.
51           ñ                          gy        6,8    The graph is traditionally a conjunct of 22+24; phonemically 17 is followed by 41.

 

           

            The grapheme ***** corresponding to Devanagari ***** is not found in the corpus.  The diacritics 12, 13, 14 are romanized as letters.  The virāma symbolizing the absence of the so-called inherent ə is ignored in the count.  Note that the inherent ə of Oriya is regularly pronounced; unlike many other modern Indo-Aryan languages, Oriya has very few word-types ending in a consonant phoneme.  Lists 1 and 2 retain hyphens only when they are a part of Oriya orthography.

 

            Syllabic hyphens were inserted at step 6 (see Note 2 there) according to the following principles:

 

            1.  There are as many syllables as there are vowel letters (1 to 11).  So a hyphen intervenes between two successive vowel letters, as in o-i-a, kə-ə.  Initial and final sequences of non-vowels, if any, go with the adjacent vowel.

 

            2.  The following are treated as single letters: 9, 11, 16, 18, 21, 23, 26, 28, 31, 33, 36, 38, 50, 51.

 

            3.  The diacritic-letters 12, 13, 14 always follow a vowel letter and go with it.  Any other non-vowel when followed by a vowel goes with it.  A geminate consonant (as, kk) is never split up.

 

            4.  Out of a sequence of non-vowels (other than a geminate) between two vowels, the first goes with the preceding vowel, the rest go with the following vowel.
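
The four principles above can be sketched in Python.  This is an illustration under stated assumptions: the word is supplied as a list of letter symbols already grouped according to principle 2, the symbols "ṛ", "M", "~", and "H" are stand-ins chosen here for letter 7 and the diacritic-letters 12, 13, 14, and a cluster that begins with a geminate is kept wholly with the following vowel.  The sample words other than i-la-ka and kə-ə are made up for the illustration.

```python
VOWELS = {"ə", "a", "i", "ī", "u", "ū", "ṛ", "e", "əy", "o", "əv"}  # letters 1 to 11
DIACRITICS = {"M", "~", "H"}  # stand-ins for letters 12, 13, 14

def syllabify(letters):
    """Split a list of letter symbols into syllables, one per vowel letter."""
    vowel_positions = [i for i, s in enumerate(letters) if s in VOWELS]
    if not vowel_positions:
        return [list(letters)]
    starts = [0]  # initial non-vowels go with the first vowel
    for prev, nxt in zip(vowel_positions, vowel_positions[1:]):
        pos = prev + 1
        # Principle 3: diacritic-letters stay with the vowel they follow.
        while pos < nxt and letters[pos] in DIACRITICS:
            pos += 1
        cluster = letters[pos:nxt]
        # Principle 4: in a non-geminate cluster the first consonant goes with
        # the preceding vowel; single consonants and geminates are not split.
        if len(cluster) >= 2 and cluster[0] != cluster[1]:
            pos += 1
        starts.append(pos)
    ends = starts[1:] + [len(letters)]
    return [list(letters[s:e]) for s, e in zip(starts, ends)]

def hyphenate(letters):
    return "-".join("".join(syllable) for syllable in syllabify(letters))

print(hyphenate(["i", "l", "a", "k", "a"]))  # i-la-ka
print(hyphenate(["k", "ə", "ə"]))            # kə-ə
print(hyphenate(["s", "ə", "n", "t", "ə"]))  # sən-tə
```

The lengths of the syllables returned by `syllabify` give the letters-per-syllable notation of step 6, note 6.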

 

            In the grapheme list (List 5A) observe that the post-consonantal forms of vowel letters and the conjunct shapes of consonant letters naturally went with the respective letters; that the frequency for the grapheme ******** can be calculated by adding up the frequencies of all syllable-types beginning with ə (this grapheme has no post-consonantal form); that the two variants of 27, 28 remain unsorted; that the virāma at the end of the few consonant-ending words remains ignored in the count; and that the frequencies of 37 and 45 are added up for the grapheme *****, which is shaped like ob or oy when it corresponds to the phoneme /v/ in initial position or after a vowel (see 45 in the table).

 

            In the phoneme list (List 5B) observe that the following phonemes add up the frequencies of the letter symbols shown against them: /ə/ - ə, əy, əv; /i/ - i, ī; /u/ - u, ū, r; /kh/ - kh, x; /g/ - g, ñ; / / - j, ȳ; /n/ - n̅, n; /y/ - əy, y, x, ñ; /r/ - r, r; /v/ - əv, b; /s/ - ś, , s; /h/ - h, h.

 

            The frequency of letter symbol 12 has to be distributed among the phonemes /n/ (42 tokens, all in words beginning with gə before letters 20-24, 30-34 and 41), /m/ (81 tokens, all in words beginning with before letters 35-39), and / / (1534 tokens, elsewhere).

 

           

 

The phonemes of Oriya can be listed group-wise as follows: p, t, , k; ph, th, h, čh, kh; b, d, , g; bh, dh, h, h, gh; m, n, , s, l, , r; h; y, v; i, e; u, o; ə, a; ~ (to go with i, e, u, ə, a).  The phonemes , h have flap variants between vowels, finally, before a consonant, or after a non-nasal consonant, but stop variants initially, in gemination, or after a nasal consonant.  (The dotted variants of graphemes 27 and 28 represent flaps; the others, stops.)

 

 

APPENDIX-2

 

GRAMMATICALLY IMPORTANT ENDINGS

 

            The number against each ending (or group of endings in a few cases) refers to the coding used on slips (step 6) and cards (step 7).  Some endings are composed of more than one morph.  Structurally they may be derivational, inflectional, or phrasal.  The addition of ‘etc.’ means that other endings of a similar shape and function are also coded alike and thus counted along with the ending that precedes.  The hyphen after an ending shows that it may be followed by another ending.

 

ENDING                                        CODE

NOMINAL

aə etc.                                       43
Uprə, etc.                                    46
e                                             19
Ka                                            08
haru, hum͂, həum͂                               33
Ke, ke                                        10
hare                                          34
guaku, guaeȳake                               24
hi                                            35
guaku, ȳake                                   28
hu                                            32
guaka                                         26
te, tə                                        16
guaku                                         25
digə, etc.                                    44
Guik, təkə                                    23
Pəchə, etc.                                   45
guiku                                         27
pari                                          39
kə-                                           12
pakhə, etc.                                   42
ku                                            11
bhəi, rupe, bhabe                             07
chəa, etc.                                    47
bhadere                                       50
e-                                            20
bhitəre, Dhayəere                             40
mane                                          15
manə-                                         96
sage, etc.                                    41
manəka-                                       14
maku                                          13
gathe, səhita, səhə, səhəkare, etc.           48

VERBAL

ə, ə m͂                                        54
ibe, im͂                                       61
ənti nti                                      55
ib, ib-                                       63
ntu, ntu, nta                                 56
ibar                                          64
anti, nti                                     57
ibi                                           59
antu, anta, nta, ante                         58
ibu, ibum͂                                     60
ante, nte                                     00
ibe                                           62
i, im͂                                         51
ilə                                           90
ichə, icə, iəchi                              73
ilā, im̃la                                     98
ichəti, icənti, Icnti, ichəti                 74
ili, ili                                      88
ilu, ilum̃                                     89
ichi, ici, iəchi, ici, ichi                   71
ie, im̃le                                      91
ichu, icu, iəchu                              72
ucə   hnti, ucəh, ucnti, uəc, ucənti          70
itha                                          09
uchi, uəchi, uci                              68
ithiba, ithib, etc.                           97
um̃ci, um̃chi
ithibarə                                      65
uchu, ucu, um̃cu                               69
ithilə                                        85
uthibā, uthibe, uthibə                        66
ithilā, im̃thila                               87
thibarə                                       67
thili, im̃thili                                83
uthilə                                        80
thili, ithilim̃                                84
uthila                                        82
ithile, im̃thile                               86
uthili                                        78
Uthilu                                        79
Le                                            94
Uthile                                        81
La                                            99
Chə, cə                                       77
Li                                            92
Chi, ci, echi                                 75
Lu, lu                                        93
Chu, cu                                       76
Le                                            95
****                                          05

NOMINAL AND VERBAL

agə                                           49
na, nai, nə, ni, Nuhənti, nei                 02
u, u                                          53
e, e                                          52
Paim̃, nimtaə, səkaśeə-                        38
ei                                            01
ku, haku                                      29
R, ri
ā, ha-                                        17
Ru                                            36
i                                             18
ru                                            31
tə, təh
Re                                            37
dbara, oge                                    04
Lagi, hetu, etc.                              06
                                              30
Him̃, hi, hem̃                                  03

 

 

APPENDIX-3

 

THE SOURCE MATERIAL

 

            The names of the authors and the titles of books and periodicals are spelt with the letter symbols tabulated in Appendix 1.

 

            The names of persons are given with the surname transposed to the beginning, thus senapati mohənə cərə instead of the usual Mohan Charan Senapati.

 

            The place of publication is Cuttack, Orissa unless otherwise mentioned.

 

            The year of publication of the book is for the first edition unless otherwise mentioned.  The actual copy used for this count may of course be of a later year than the one mentioned.

 

            The number of slips refers to the word-token slips (Step 3).  The first and the second count refer to the two underlinings (Step 2).

 

 

*****