Literacy Methodology
Reading & Spelling

Symbolisation of Alphabet for an Unwritten Language


The alphabet (the inventory of letters) devised for an unwritten language must equate with the inventory of phonemes set up for that language. The earlier assumption that the taxonomic phonemes set up in structural phonology form the basis for the alphabet has been discussed recently (Chomsky and Halle 1968 : 49-50, Halle 1969). The role of non-linguistic, psycho-cultural factors in devising an alphabet has also been discussed (Nida 1954). This paper does not discuss these questions, which arise in devising an alphabet, but restricts itself to the question of giving suitable symbols or shapes to the phonemes decided upon for an unwritten language. It must, however, be pointed out that the model of phonemic analysis is also relevant for the symbolization of the alphabet in some cases. In Irula, a Dravidian language of the Nilgiris, voiceless and voiced stops contrast in the word initial position but only the voiced stops occur in the intervocal and post nasal positions. (There are be handled differently.) In generative phonology, the underlying representations can have voiceless or voiced stops in the initial position and voiceless stops in the other two positions. If the Irula alphabet is drawn from the Tamil script, which has only one series of stops, the allophonic distribution of voiceless stops will be identical in both languages and it will help the transfer of learning from Irula to Tamil. Moreover, Irula will have limited instances of modification for voiced stops (Perialwar forthcoming).

There is no purpose in inventing a new script to write an unwritten language and it is wise to use one of the existing scripts for this purpose. It has been recommended by one commission (GOI 1966 : 141) and more than one seminar on tribal education (NCERT 1967 : 193, CIIL 1971) that, for socio-economic and educational reasons, the alphabet for the unwritten tribal languages (and other minority languages) must be drawn from the script of the majority or the official language of the State in which they are spoken (called the State language hereafter). When the symbols in the State language have the same sound value as the phonemes of the unwritten language, the same symbols will be used in the unwritten language also, irrespective of the fact that, by strict structural principles, the phonemes in the two languages do not have the same value because of the different structural relationships between them. For devising an alphabet, the sameness of the substance of the phonemes in the two languages is sufficient. When the State language has symbols for phonemes which are not found in the unwritten language, those symbols must be left out. They must be left out even if the sounds they represent are available allophonically in the unwritten language, because for the speaker of the unwritten language only the phonemes are psychologically real and having symbols for allophones will add unnecessary complication.

When the unwritten language has phonemes which are not found in the State language, new symbols must be devised for those phonemes. It is possible in such cases to give symbols which are totally unrelated to the script of the State language. But they will stand out visually as foreign elements and may go contrary to the direction of writing the symbols in the State language, thus interfering with hand movement and lowering the speed. Therefore, it is not normally advisable to mix foreign symbols with the symbols of the State language. It is not completely ruled out, however. It has been suggested that in Gojri the phonetic symbol e may be used for the sound it represents (Sharma forthcoming). It blends with the calligraphy of the Perso-Arabic script in which Gojri is written.1
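The selection rules described so far (keep the State symbols for shared sounds, leave out symbols for sounds that are not phonemes of the unwritten language, and devise symbols for the remaining phonemes) can be sketched as simple set operations. This is only an illustration: the inventories below are hypothetical ASCII labels, not the actual inventory of any language discussed here, and the function name `plan_alphabet` is an assumption introduced for the example.

```python
def plan_alphabet(state_script, phonemes):
    """Sort the work of devising an alphabet into three groups.

    state_script: dict mapping a sound label to its State-language symbol.
    phonemes: set of sound labels that are phonemes of the unwritten language.
    """
    # Sounds found in both languages keep the State-language symbol.
    shared = {s: state_script[s] for s in phonemes if s in state_script}
    # State symbols whose sounds are not phonemes of the unwritten language.
    unused = {s: sym for s, sym in state_script.items() if s not in phonemes}
    # Phonemes with no State symbol; these need a new or modified symbol.
    needs_new = sorted(p for p in phonemes if p not in state_script)
    return shared, unused, needs_new


# Hypothetical inventories, for illustration only.
state_script = {"p": "P", "t": "T", "k": "K", "a": "A", "u": "U", "uu": "UU"}
phonemes = {"p", "t", "k", "a", "u", "w"}

shared, unused, needs_new = plan_alphabet(state_script, phonemes)
print(sorted(shared))  # sounds written with the State symbol unchanged
print(unused)          # symbols left out of the new alphabet
print(needs_new)       # phonemes still lacking a symbol
```

The `unused` group is the pool from which a spare symbol could be reassigned; the `needs_new` group is what remains to be symbolized by reassignment or by diacritic modification.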

One is normally left with two choices to represent new phonemes. The unused symbols in the State language, i.e., the symbols whose sounds are not phonemes of the unwritten language, may be used to represent the new phonemes. Or diacritic modifications may be made on the symbols whose sound value is closer to the new phoneme.2 This may be explained with two illustrations from the symbolization of the alphabet of Kok Borok, a Tibeto-Burman language spoken in Tripura, using the symbols drawn from the Bengali script. Kok Borok has / w /, for which there is no symbol in the Bengali script. To represent this phoneme, the Bengali symbol for the long vowel / u /, which is not a phoneme in Kok Borok, may be used, or a diacritic modification of the symbol for / u / may be used. Kok Borok does not have long vowels but has two tones, level and high. To symbolize the vowels with high tone, the symbols for long vowels in the Bengali script may be used, or a small circle after the symbol for the corresponding short vowel may be used (Chatterji 1972).

The choice between the possible symbolizations can only rarely be made on purely linguistic grounds. In the above example, the symbol for Bengali long / u / cannot be used to represent Kok Borok / w / if it is also used for / ú /, since this will violate the linguistic principle that the relation between phonemes and alphabet symbols must be unique. Another linguistic consideration may be the phonetic relationship between the sound and the symbol. Even though the relationship between the sound and the symbol is arbitrary, it is preferable to choose the symbol from the State language which is phonetically closer to the sound of the unwritten language. In Thadou (Thirumalai 1972), for example, phonetic closeness has decided the choice of aw to represent /  /. Extra-linguistic factors must often be taken into consideration to make the choice. An important consideration will be whether a particular choice will hinder or help the learning process of (1) the mother tongue and (2) the State language, since a primary aim of devising a script for an unwritten language is to help the education of the children of that language. Other considerations such as distinctiveness of symbols for efficient reading, speed of writing and printing-typing facility also relate in some way to the learning process. However, no study has been made on the relative efficiency of scripts from the learning point of view to tell whether using the existing symbols of the script of the State language with new sound values or using modified symbols for the new sounds will give the least problem in learning the State language. Controlled experiments on this question are urgently needed.
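The uniqueness principle mentioned above can be checked mechanically for any proposed assignment of symbols. The following is a minimal sketch, assuming the proposal is written as a mapping from phoneme labels to symbols; the labels `w`, `u` and `u_high` and the function name `find_clashes` are hypothetical names introduced for the example, and the proposal shown is the rejected one from the Kok Borok illustration, not a recommended alphabet.

```python
from collections import defaultdict


def find_clashes(assignment):
    """Return every symbol that has been assigned to more than one phoneme."""
    by_symbol = defaultdict(list)
    for phoneme, symbol in assignment.items():
        by_symbol[symbol].append(phoneme)
    return {sym: sorted(ph) for sym, ph in by_symbol.items() if len(ph) > 1}


SHORT_U = "\u0989"  # BENGALI LETTER U
LONG_U = "\u098A"   # BENGALI LETTER UU

# Hypothetical proposal: the long-u symbol is used both for / w /
# and for the high-tone / u /, which breaks uniqueness.
proposal = {"w": LONG_U, "u_high": LONG_U, "u": SHORT_U}
print(find_clashes(proposal))
```

An empty result means that each symbol stands for exactly one phoneme. The check says nothing about the extra-linguistic factors, which still have to be weighed separately.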

When long vowel symbols are not available in a script, the tones may be represented linearly by utilising unused symbols or non-occurring sequences of symbols, or concurrently by using a diacritic mark below, above or by the side of the vowel as mentioned above. For example, it was originally suggested for Ao Naga, which draws its alphabet from the Roman script, that Q and q after vowels may be used for high and low tones respectively (Gowda 1975).

The native literates objected to this since for them it makes the spelling of words cumbersome and mars the visual beauty of the words. It is suggested for Tangkhul Naga (Arokianathan forthcoming) that repetition of the vowel as in aa and addition of h as in ah may be used for high tone and low tone respectively. It is suggested for Bodo, which uses the Devanagari script, that the visarga (ः) may be used for high tone and consequently o for low tone. There are other ways of representing the tone linearly, as in Punjabi, where the symbol for a voiced aspirated stop indicates the tone of the following vowel. In the present writing practice of Ao Naga, voiced stops, which are not contrastive at the segmental level, seem to indicate the high tone of the following vowel.

When a diacritic mark is used to mark tones, acute and grave accent marks above the vowel as in á and à, or a line above and below the vowel as in ā and a̱, are some possibilities. When there are more than three tones, one is compelled to use a combination of the different modes of representation. The use of diacritic marks is also necessitated for writing segmental phonemes in some cases. For example, to use the Tamil script, which has no symbols for voiced stops and central vowels, for writing Irula, which has these phonemes, diacritic marks like a colon ( : ) before stops and an umlaut ( .. ) above vowels, or combinatory consonant-vowel symbols, are necessary.
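In present-day text encoding, the concurrent marks mentioned above correspond to Unicode combining characters placed after the base vowel. The following sketch shows one way to attach them using Python's standard `unicodedata` module. The dictionary keys and the function name `mark_vowel` are assumptions introduced for the example, and which tone each mark stands for is left open, since that is a decision for the alphabet designer.

```python
import unicodedata

# Unicode combining characters for the four marks discussed in the text.
MARKS = {
    "acute": "\u0301",       # COMBINING ACUTE ACCENT
    "grave": "\u0300",       # COMBINING GRAVE ACCENT
    "line_above": "\u0304",  # COMBINING MACRON
    "line_below": "\u0331",  # COMBINING MACRON BELOW
}


def mark_vowel(vowel, mark):
    """Attach a mark to a vowel and normalize to the composed form (NFC)."""
    return unicodedata.normalize("NFC", vowel + MARKS[mark])


for name in MARKS:
    marked = mark_vowel("a", name)
    # The number of code points shows whether a precomposed letter exists.
    print(name, marked, len(marked))
```

For the letter a, the first three marks have precomposed letters in Unicode, so the result is a single code point, while the line below remains a sequence of base letter plus combining mark. That difference matters for the printing and typing facility mentioned earlier, since fonts and keyboards handle the two cases differently.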

The choice between the various representations described above must be made from the point of view of simplicity and of the reading and writing difficulties each avoids. For example, the line used above and below the vowel for high and low tones respectively will be ambiguous when marked vowels occur in consecutive lines, since it will not be clear whether the line goes below the vowel in the first line or above the vowel in the second line, and this will create reading difficulties.

It is argued by some that suprasegmental features like tone need not be symbolised since the native speakers can identify a word with its correct tone in the given context. This is not, however, a correct solution to the difficulties of representing tone. There are situations where non-native speakers have to learn the unwritten language which is reduced to writing, and they cannot predict the tone. The native speakers themselves will need the tone marking in ambiguous contexts where more than one word is possible. When the native children are taught reading in school, tone marking will be required to help them perceive the unique symbol-sound correlations, which is an important step in the process of learning initial reading. Tones often have dialect variation. Since a function of writing is standardization of the language and the children must learn the standard forms in school, it will be necessary to symbolize tones in words as in the standard dialect.

It must be clear from the earlier discussions that it is not necessary to have a single unitary symbol for a phoneme. This is true of segmental phonemes also. For example, a language like Lushai, which uses the Roman script, may use ng for the velar nasal, hm for the prereleased bilabial nasal, tl for the laterally released dental stop and aw for the lower-mid back unrounded vowel, as long as these sequences of letters do not also stand for sequences of phonemes. If they do, the invariant relationship between phonemes and the symbols of the alphabet will be violated. It can be done, however, if these sequences of phonemes are infrequent, by using a hyphen between the letters to indicate that they are two segments. This will be a natural solution if the morpheme boundary coincides with the hyphen. For example, in Thadou (Thirumalai 1972) th stands for the aspirated dental stop and t-h for the cluster of /t/ and /h/.
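The hyphen convention just described can be illustrated with a small reading procedure: a two-letter sequence counts as one unit unless a hyphen separates its letters. This is a sketch under stated assumptions. The list of combined letters is taken from the examples in the text, the function name `segment` is hypothetical, and the test strings are made-up letter sequences, not attested Thadou or Lushai words.

```python
def segment(word, combined=("th", "ng", "hm", "tl", "aw")):
    """Split a written word into alphabet units.

    A listed two-letter sequence is read as a single unit. A hyphen
    breaks such a sequence, so its letters are read as separate units.
    """
    units = []
    for chunk in word.split("-"):
        i = 0
        while i < len(chunk):
            pair = chunk[i:i + 2]
            if pair in combined:
                units.append(pair)
                i += 2
            else:
                units.append(chunk[i])
                i += 1
    return units


print(segment("tha"))   # th read as one unit
print(segment("t-ha"))  # t and h read as two units
```

The sketch handles only two-letter sequences and always prefers the combined reading when no hyphen is present, which is the same default the convention gives to the reader.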

One of the allophones is chosen as the basic allophone to represent the phoneme on the basis of certain phonetic and phonemic factors of the language under analysis, and the symbols of the alphabet match these basic allophones. However, in the case of unwritten languages, symbols reflecting the sound of a non-basic allophone may be chosen if the alphabet of the State language does not have a symbol for the basic allophone but has a symbol for a non-basic allophone.

When multiple analyses are possible which are equally valid on linguistic grounds, the choice between them may be made on the basis of the phonemic system of the State language and the alphabet which reflects it. In Kok Borok, the non-syllabic vowels may be analysed phonemically either as syllabic vowels or as semivowels. Since in Bengali they are analysed as syllabic vowels and the Bengali alphabet uses the symbols for pure vowels and not semivowels, in Kok Borok also the symbols for pure vowels may represent the phonetically non-syllabic vowels, from the point of view of making the learning of Bengali easy.

It is not sufficient to consider the phonetic inventories alone when choosing symbols for writing. The combinatorial conventions of the symbols will also be an important consideration, particularly when the script is not alphabetic. This becomes a prime consideration when the Perso-Arabic script is used, since this script uses different symbolizations in different combinations. This consideration also applies to the writing of consonant clusters. The Brahmi derived scripts (except the Tamil script) use combinatory symbols (conjunct letters) to represent consonant clusters. There is also another convention of using a diacritic (the halant) in the word final position or in unusual clusters. When we devise a writing system drawn from the script of the State language, the question to be decided is whether the conjunct letters or the halant should be used. The reasons in favour of the halant are as follows. First, if there is a contrast in the word final position between a pure consonant and a consonant plus the inherent vowel in the unwritten language, the halant will be necessary anyway and it could be given wider functional value by use in clusters also. Secondly, if there is a considerable number of unusual clusters in the unwritten language, the halant will be necessary and it could be given universal value by use in all clusters. Thirdly, it is easier and faster to learn reading and writing of consonants with the halant than of conjoined consonants. Fourthly, the use of mechanical devices such as the typewriter is facilitated when the halant is used. The reason which weighs in favour of conjunct letters is that if a different practice is established in the case of the unwritten language, the learner, who will have to learn the reading and writing of the State language, will have difficulty in learning a new practice when the facts remain the same, and the interference of the old practice will slow down the learning process. It must be mentioned, however, that this is only an assumption; there is no empirical evidence, and research is needed in this area.
The choice between the two becomes more difficult when the State language uses conjunct letters for some clusters and the halant for others. This is because the question of whether one should go in for internal regularity or external commensurability will be answered differently depending on whether one has a purely linguistic consideration or an educational consideration.

Another example of such a situation is the writing of diphthongs. There are cases where the number of diphthongs in the State language and the unwritten language may not be the same. The State language may also use unitary symbols for certain diphthongs, a sequence of vowel and semi-vowel symbols for certain others and a sequence of two vowel symbols for still others. If external commensurability is desired, the irregular system in the State language must be followed to represent the identical diphthongs in the unwritten language, and the new diphthongs may follow any one of the ways. If internal regularity is desired, one of the ways must be chosen and used uniformly for all the diphthongs of the unwritten language. A similar question arises when using the Oriya script for Kuvi. Oriya has no long vowel symbols for / e / and / o / but Kuvi needs them. Doubling of the short vowel symbols may represent these long vowels. To keep this pattern consistent, it has been suggested (Reddy et al. 1975) to use double short vowel symbols for all the long vowels of Kuvi. There is no research on the educational implications of this.

Another question regarding the writing convention is the value of the inherent vowel. The same value as in the State language, whether / ə / or / ɔ /, may be given in the unwritten language also. The problem comes up when the unwritten language does not have a phoneme with the value of the inherent vowel in the State language or has more than one phoneme contending for this position. It is true that an unwritten language which derives its alphabet from the script of the State language need not give the symbols the same values as the State language, even though it uses the same script. Since the learning of the State language is an important consideration in devising an alphabet for the unwritten language, however, it is preferable to have the sound values in both alphabets similar wherever possible. The problem mentioned above can be illustrated with Bodo. Bodo has / ə / and / o / with an allophone [ɔ] as distinct phonemes. If Bodo derives its alphabet from the Bengali-Assamese script and the learning of Assamese is the consideration, the phonetic value of the inherent vowel must be / o /, which is closer to the Bengali-Assamese / ɔ /. The Bengali-Assamese symbol for / a / must be used for the Bodo phoneme / ə /. Consequently, some modification of it will represent the Bodo / a /. If Bodo derives its alphabet from the Devanagari script and learning Hindi is the consideration, then the phonemic value of the inherent vowel must be / ə /, which is its value in Hindi.

Another convention in the writing system of many Indian languages is the use of the anusvar to represent the nasal preceding the homorganic stop. It is perfectly possible in this situation also to use the full nasal symbol with the halant, as, for example, the Tamil writing system does, or to use a conjunct letter. In the case of an unwritten language, therefore, there are three options. It is preferable here again to follow the convention in the State language.

A question related to the discussion of the symbolization of the alphabet is the spelling of words borrowed from the State language. The question of their representation arises when the pronunciation of the loan words has been assimilated to the phonemic system of the unwritten language as well as when the loan words consist of phonemes which are not found in the native phonemic system of the unwritten language. Let us take the second situation first. The choice is between writing the foreign phonemes with available native letters or writing them with the same letters used in the State language. Many Tibeto-Burman languages do not have the phoneme / j / but they frequently use English words like jeep, Jesus, etc., which have this phoneme. In such situations, following the setting up of marginal phonemes, marginal symbols of the alphabet may be set up. This will be particularly helpful when the donor language is to be learnt.

In the first situation, the linguistic consideration and the educational consideration are in conflict and therefore the choice is difficult. From the linguistic point of view, it is preferable to write the loan words as they are pronounced in the unwritten language so that exceptions to general rules do not interfere with the learning of the systematic relationship between sounds and letters in initial reading and writing. The Bengali word / ghn?t? / 'bell' is pronounced as /gnta/ in Kok Borok, which does not have voiced aspirated and retroflex consonants in its phonemic system. If this word is written as / ghn?t? / in Kok Borok, it is not only against the linguistic principle of phoneme-grapheme match, but the deviation from the pronunciation will also cause reading and writing problems. This problem will be acute when the writing system of the State language from which the words have been borrowed is not phonemic, as in English. If words like bus, church, rough, Christ, etc., are written in the unwritten language as they are spelled in English, they will be difficult to read, since the letters have different phonetic values in the native language. If, on the other hand, the loan words are written as they are pronounced, it may create problems when the spelling of the major language is learnt, as the spelling learnt in the native language is likely to be transferred to the second language. Besides this learning problem, the speakers of the unwritten language feel that spelling the loan words, particularly proper nouns and religious words like Christ, church, etc., differently from the donor language destroys the isomorphic identity. Such identity tokens may be written as they are spelled in the donor language. This will also avoid complications in legal documents where the names have so far been written in this fashion. Regarding common nouns, the solution may be to write the loan words, particularly in text books, as they are pronounced by the speakers of the recipient language but to give the spelling of the donor language within parentheses or at the bottom of the page.

Another question to be dealt with is about the writing conventions followed in the State language, such as starting every sentence and some words with a capital letter as in English, not starting a line with a pure consonant as in Tamil, etc. Having more than one type of letters (e.g., print, cursive, lower case and upper case letters in English) also comes under this. There is no logical need to follow such conventions in the newly written languages also. As a matter of fact, having more than one type of letters complicates the learning of initial reading and writing. Nevertheless, psycho-cultural factors play a role in accepting or rejecting the conventions of the State language.

The final question is about the organization and presentation of the alphabet chart. It was noted above that a sequence of symbols (i.e. combined letters) can be used for a single sound consisting of a simultaneously occurring bundle of phonetic features. The question is whether the sequences should be listed in the alphabet chart. Should, for example, the Mizo symbols aw, ng, hm, tl be listed? In the current practice the first two are listed but not the rest. There is no logical reason for this discrimination but it is not clear whether there is any psychological basis. This kind of discrepancy is found also in the alphabets of languages with a long tradition of writing. In English, for example, th, though phonetically unitary, is not listed as a unit in the alphabet. If a part of a combined letter is not identical with any other unitary letter, it may be abstracted and listed separately in order to reduce the number of units in the inventory, as visarga, anusvar and chandrabindu are given in the Devanagari script. If h stands only for the low tone with many vowels, it may be listed separately and each combination of vowel and h need not be listed.

This will not, however, be possible when a phonetic feature is represented by a process and not by a unique symbol, as in representing a high tone vowel (v) by the repetition of that vowel (vv).

Another question regarding presentation is the order in which the symbols are listed. From the educational point of view discussed above, the order of the alphabet in the newly written language must follow the order in the State language, with the new symbols added after the symbols which are phonetically close to them. For languages which use the Roman script, however, one may argue, from the point of view of the national pattern, that the articulation based order of the Brahmi derived alphabets of India may be followed rather than the arbitrary order of the Greek based Roman alphabets of Europe.
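The first ordering rule (keep the order of the State language and place each new symbol after a symbol that is phonetically close to it) amounts to a series of insertions into an existing list. A minimal sketch follows. The function name `extend_order` is hypothetical, and the choice of a and n as the anchors for aw and ng is an assumption made only for illustration, not a description of the order actually used for any language.

```python
def extend_order(base_order, additions):
    """Insert each new symbol immediately after its anchor symbol.

    base_order: the symbols of the State-language alphabet, in order.
    additions: (new_symbol, anchor_symbol) pairs; every anchor must
    already be present in the order when its pair is processed.
    """
    order = list(base_order)
    for new_symbol, anchor in additions:
        order.insert(order.index(anchor) + 1, new_symbol)
    return order


roman = list("abcdefghijklmnopqrstuvwxyz")
# Hypothetical placement of two combined letters.
order = extend_order(roman, [("aw", "a"), ("ng", "n")])
print(order[:3])
print(order[order.index("n"):order.index("n") + 3])
```

The resulting list can serve both as the alphabet chart and as the sorting order for word lists and dictionaries, so that the two stay consistent.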

F O O T N O T E S

1. Examples may be found in written languages also. Five of the Brahmi derived grantha letters added to the Tamil alphabet have continued to have marginal existence in spite of their non-acceptance by great literary authors like Kamban and opposition by purists. But the use of the Roman F for the voiceless labio-dental fricative / f /, started by a Tamil magazine called Thuglak, did not find any acceptance.

2. Either one of these two is followed by a written language also when it develops new phonemes. To represent fricatives, Tamil uses one of its rarely used letters, called aytam (ஃ), before the stops and Hindi uses a diacritic mark (dot) below the stops.

R E F E R E N C E S

Arokianathan, S. (Forthcoming). Tangkhul Naga Phonetic Reader, Mysore : CIIL.

Chatterji, Suhas, 1972. Tripurar Kok Bokar bhashar likhita rupe uttaran (Introduction to
the writing system of Kok Borok of Tripura). Calcutta : Institute of Languages and Applied Linguistics.

Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English, New York :
Harper and Row.

CIIL. 1971. Conferences of Heads of Tribal Research Bureaus/Institutes, Etc. Mysore.
(cyclostyled.)

Govt. of India, 1966. Report of the Education Commission (1964-66) Education and
National Development. New Delhi : Ministry of Education.

Gowda, K.S. G. 1975. Ao Naga Phonetic Reader, Mysore : CIIL

Halle, Morris. 1969. "Some thoughts on spelling", In Kenneth S. Goodman and James T.
Fleming (Eds.) Psycholinguistics and the Teaching of Reading, Newark, Delaware : International Reading Association.

Karapurkar, P. 1972. Tripuri Phonetic Reader, Mysore: CIIL

NCERT, 1967. Tribal Education in India : Report of the National Seminar on Tribal
Education in India.

Nida, Eugene, 1954. Practical Limitations to a Phonemic Alphabet. The Bible Translator
vol. 5 No. 1.

Perialwar, R. (Forthcoming). Irula Phonetic Reader, Mysore: CIIL

Reddy, B. R. K., Upadhyaya, S., Reddy, J. 1975. Kuvi Phonetic Reader, Mysore : CIIL.

Sharma, J. C. (Forthcoming). Gojri Phonetic Reader, Mysore : CIIL.

Thirumalai, M. S. 1972. Thadou Phonetic Reader. Mysore : CIIL.