Language and Linguistics
ASHOK R. KELKAR

 

SOUND QUALITY IN HUMAN SPEECH

A CONCEPTUAL FRAMEWORK

 

ABSTRACT: Sound quality is defined; segmental and overall articulatory qualities are distinguished from each other and from overall voice qualities. A classificatory grid for overall sound features is proposed; where possible, auditory distinctions are correlated with physiological and acoustic distinctions. Uses of the conceptual framework in future work on speech and music are suggested.

 

 

            The term ‘sound quality’ denotes the field of those auditory distinctions (a) that cannot be fitted onto the pitch and loudness scales; (b) that are needed, along with the pitch and loudness scales, to enable us to say that a sound of a specified pitch, loudness, and quality lasts for so long or shifts through time to another sound; (c) that broadly correlate with the acoustic information on the energy distribution profile of the wave along the frequency scale; and (d) that correlate with the sound-production information on the complexity of the vibrations of the sound sources and, more importantly, on the damping and resonance properties of the channel as shaped by the speech gestures.

 

            Quality distinctions in human speech are broadly of two sorts:

 

            (i) Those subject to relatively greater control, through modifying the controlled and ballistic gestures of the mobile portions of the speech tract and, through these gestures, modifying the ‘shape’ (complex or otherwise) of the vibrations in the sound sources. The glottal tone (the tone produced by the larynx itself before being modified) is believed to approximate a rectangular wave (cf. Joos 1948: § 2.1); the whistle tone, on the other hand, approximates a sine wave (and hence a pure tone). The shape of the oral cavity and the activation or otherwise of the nasal cavity are largely responsible for the vowel-like qualities superimposed on the glottal tone. Acoustically, these distinctions correlate with the presence of formants (energy concentration zones along the component frequency scale) and with the presence or absence of an identifiable fundamental frequency. Auditorily, these consist in the vowel and consonant qualities (a) identified as segments and syllables or (b) identified as overall qualities. We shall not be concerned with the segmental qualities in this note. To the latter sort, the overall qualities, we shall give the name overall articulatory qualities.

 

            (ii) Those much less subject to control, which physiologically correlate with the degree of vocal-fold tension, the presence or absence of moisture, the shape and size of the laryngeal-pharyngeal cavities (partially controlled by the local musculature and by the placement of the root and the far back of the tongue), and the like. Acoustically, these correlate with the ‘fillings’ in the formant zones (Joos 1948). To these we shall give the name overall voice qualities.

      So we have the following speech qualities:

            (i)  (a) Segmental articulatory qualities
                 (b) Overall articulatory qualities
            (ii) Overall voice qualities.
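The contrast drawn under (i) between a glottal tone approximating a rectangular wave and a whistle tone approximating a sine wave can be made concrete by comparing their harmonic content. The following is a minimal Python sketch, assuming idealized one-period waveforms analysed with a plain discrete Fourier sum; the function names are hypothetical, and the idealization ignores everything the vocal tract does to the glottal tone.

```python
import math


def harmonic_amplitudes(wave, n_samples=1024, n_harmonics=5):
    """Amplitudes of harmonics 1..n_harmonics of a periodic waveform.

    `wave` maps a phase in [0, 2*pi) to a sample value; one period is
    sampled `n_samples` times and analysed with a direct Fourier sum.
    """
    samples = [wave(2 * math.pi * k / n_samples) for k in range(n_samples)]
    amplitudes = []
    for h in range(1, n_harmonics + 1):
        re = sum(s * math.cos(2 * math.pi * h * k / n_samples) for k, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * h * k / n_samples) for k, s in enumerate(samples))
        # Scale so that a unit-amplitude sinusoid at harmonic h reports 1.0.
        amplitudes.append(2 * math.hypot(re, im) / n_samples)
    return amplitudes


def whistle_like(phase):
    """Idealized whistle tone: a pure sine wave."""
    return math.sin(phase)


def glottal_like(phase):
    """Idealized glottal tone: a rectangular (square) wave, +1 then -1."""
    return 1.0 if phase < math.pi else -1.0


if __name__ == "__main__":
    for name, wave in [("sine", whistle_like), ("rectangular", glottal_like)]:
        print(name, [round(a, 3) for a in harmonic_amplitudes(wave)])
```

The sine wave shows energy only at its fundamental, whereas the rectangular wave spreads energy over the odd harmonics with amplitudes falling off roughly as 1/n; this richer source spectrum is what the oral and nasal cavities then have to shape.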

 

 

To the extent that any of these are beyond conscious control, they serve to identify the speaker (or the singer), to give a clue to the state of the body (sunken, drunken, choked voice, bad cold), or to characterize the language (or mode of singing). Now we shall set up a grid for classifying (i) (b) and (ii) in terms of independently variable features.

 

            Overall Articulatory Qualities

 

            (1)             Overfronted/Overretracted – with articulations relatively more fronted/more retracted in the mouth. Thus, Hindi and Urdu are respectively overfronted and overretracted.

 

            (2)            Palatalized/Labiovelarized – with an accompanying y- or i-quality/w- or u-quality, i.e. respectively with the front of the tongue raised towards the hard palate/with the back of the tongue raised towards the soft palate and the lip corners brought closer to each other. Thus, aggressive vevṇe and regressive ‘pouting’ in Marathi are respectively palatalized and labiovelarized (see note 1).

 

            (3)             Nasalized/Denasalized – with slight nasalization of the non-nasal sounds/with slight denasalization of the nasal sounds. Thus, whining is nasalized, and a bad cold denasalizes speech.

 

            (4)             Breathy – with glottal friction superimposed on normal voice or breath. Thus, in Gujarati or Hindi an h consonant is often dissolved into breathy voice (shown here by underlining), as when Gujarati gāndhi is pronounced gāndhī and Hindi tarah is pronounced tara. Drunken speech often has over-voicing, which is probably the opposite of breathiness.

 

            (5)                    Creaky – with glottal trill superimposed on normal voice or breath. Thus, a very low voice in singing or speech often goes with a creaky quality (as of an unoiled door hinge).

 

            (6)            Whispery – with whisper-glottis replacing voicing in the voiced sounds. Thus, extra-soft speech is often whispered.

 

            (7)            Falsetto – with falsetto phonation (the vocal folds stretched thin and vibrating mainly along their edges) replacing normal voicing in the voiced sounds. Thus, falsetto may be used by a male mimicking a female voice and in certain types of singing.

 

 

 

 

 

Overall Voice Qualities

 

            (8-9)            I propose that overall voice qualities can be placed along two independent scales, as shown below:

                                                     Lightly damped     Highly damped

Acute (energy concentration in higher overtones)     Light voice        Strident voice

Grave (energy concentration in lower tones)          Mellow voice       Heavy voice

 

            The term ‘strident’, with its connotations of unpleasantness, is clearly unsatisfactory. The reader is invited to suggest a better substitute.
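The two-scale proposal can be read as a lookup over two independent measurements. The sketch below is one hypothetical way to operationalize it in Python. It assumes that acuteness is measured by the energy-weighted mean frequency (spectral centroid) of the energy concentration zones, and that damping is indicated by their bandwidth, since a more heavily damped resonance decays faster and so has a broader bandwidth. The class, the function names, and both cut-off values are illustrative assumptions, not measured norms.

```python
from dataclasses import dataclass

# The four cells of the proposed grid, keyed by (acuteness, damping).
VOICE_GRID = {
    ("acute", "lightly damped"): "light voice",
    ("acute", "highly damped"): "strident voice",
    ("grave", "lightly damped"): "mellow voice",
    ("grave", "highly damped"): "heavy voice",
}


@dataclass
class EnergyZone:
    """One energy concentration zone in the spectrum (hypothetical measurement)."""
    freq_hz: float       # centre frequency of the zone
    energy: float        # relative energy in the zone
    bandwidth_hz: float  # broader zone taken here as a sign of heavier damping


def spectral_centroid(zones):
    """Energy-weighted mean frequency: higher means energy sits in higher overtones."""
    total = sum(z.energy for z in zones)
    return sum(z.freq_hz * z.energy for z in zones) / total


def mean_bandwidth(zones):
    """Energy-weighted mean bandwidth of the zones."""
    total = sum(z.energy for z in zones)
    return sum(z.bandwidth_hz * z.energy for z in zones) / total


def classify_voice(zones, centroid_split_hz=1500.0, bandwidth_split_hz=200.0):
    """Place a voice in one cell of the grid.

    Both split values are hypothetical placeholders, not measured norms.
    """
    acuteness = "acute" if spectral_centroid(zones) >= centroid_split_hz else "grave"
    damping = "highly damped" if mean_bandwidth(zones) >= bandwidth_split_hz else "lightly damped"
    return VOICE_GRID[(acuteness, damping)]
```

Keeping the two measurements in separate functions mirrors the claim that the scales are independent; whether they really are is what an experimental test of the two-factor hypothesis would have to establish.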

 

Overall Non-qualitative properties

 

            Although this note is not concerned with pitch, loudness, and concatenation, it will be useful to mention the overall properties in respect of these, since they are often confused or associated with articulatory qualities or with voice qualities. In the list below, the first three are pitch features, the next two loudness features, and the last two concatenation features.

 

(10)           Overhigh/Overlow – with pitch movements in the relatively higher/lower portions of the pitch scale. Thus, women and children tend to speak overhigh.

(11)            Overstretched/Oversqueezed – with pitch movements spread over an extensive/restricted portion of the pitch scale. Thus, speech and certain modes of singing may be monotonous (i.e. oversqueezed).

(12)      Pitch Quaver (vibrato) – with extra-rapid up-and-down pitch movements superimposed on the normal pitch movements of speech or singing.

(13)            Overloud (forte)/Oversoft (piano) – with loudness movements in the relatively louder/softer portions of the loudness scale. Thus, shouting is both overhigh and overloud.

(14)            Loudness Quaver (tremoloso) – with rapidly alternating increases and decreases in loudness. Thus, laughter is often characterized by loudness quaver (effected in this case by spasmodic breath quaver).

(15)      Rapid (allegro)/Medium (moderato)/Slow (lento) Tempo – with relatively rapid/medium/slow changes in segmental articulatory qualities and in normal pitch movements, and fewer and shorter/medium/more and longer breaks and pauses. Tempo may also change: accelerando/rallentando.

(16)            Overabrupt (staccato) / Oversmooth (legato) – with relatively more abrupt / smoother transitions in respect of segmental articulatory qualities and normal pitch and loudness movements.
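Items (12) and (14) differ in what oscillates: in a pitch quaver the frequency moves up and down, in a loudness quaver the amplitude does. A minimal Python sketch, assuming a pure sine carrier and a sinusoidal quaver; the function name, its parameters, and any example values passed to it (such as a 6 Hz quaver rate) are illustrative assumptions rather than values from the text.

```python
import math


def quavered_tone(duration_s, sample_rate, base_hz, quaver_hz,
                  pitch_depth=0.0, loudness_depth=0.0):
    """Synthesize a sine tone with optional pitch quaver and loudness quaver.

    pitch_depth: peak frequency deviation as a fraction of base_hz (vibrato).
    loudness_depth: peak amplitude deviation as a fraction of 1.0 (tremoloso).
    """
    samples = []
    phase = 0.0
    for n in range(int(duration_s * sample_rate)):
        t = n / sample_rate
        quaver = math.sin(2 * math.pi * quaver_hz * t)   # slow up-and-down cycle
        inst_freq = base_hz * (1 + pitch_depth * quaver)  # pitch quaver: frequency moves
        amplitude = 1 + loudness_depth * quaver           # loudness quaver: amplitude moves
        samples.append(amplitude * math.sin(phase))
        # Accumulate phase from the instantaneous frequency so pitch changes smoothly.
        phase += 2 * math.pi * inst_freq / sample_rate
    return samples
```

With only `pitch_depth` set, the waveform never exceeds unit amplitude but its cycles lengthen and shorten; with only `loudness_depth` set, the cycles stay evenly spaced while the peaks swell and shrink.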

           

            A loudness feature corresponding to (11) is possible, but appears to be of little practical significance.
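Item (11), and the loudness analogue just mentioned, concern the width of the range used rather than its position. One conventional way to express a pitch range is in semitones, 12·log2(highest/lowest), and an amplitude range in decibels, 20·log10(largest/smallest). The Python sketch below assumes a frame-by-frame f0 track in Hz with 0 marking unvoiced frames; the cut-offs for ‘oversqueezed’ and ‘overstretched’ are hypothetical placeholders, and a fixed cut-off is itself a simplification, since the text defines these features relative to the scale available to the speaker.

```python
import math


def pitch_span_semitones(f0_track_hz):
    """Width of the pitch range used, in semitones: 12 * log2(highest / lowest).

    Frames with f0 <= 0 are treated as unvoiced and skipped (an assumption
    about how the track is encoded).
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return 12 * math.log2(max(voiced) / min(voiced))


def loudness_span_db(amplitudes):
    """Width of the amplitude range used, in decibels: 20 * log10(largest / smallest)."""
    positive = [a for a in amplitudes if a > 0]
    if len(positive) < 2:
        raise ValueError("need at least two non-silent frames")
    return 20 * math.log10(max(positive) / min(positive))


def classify_pitch_span(f0_track_hz, squeezed_below=4.0, stretched_above=12.0):
    """Label a pitch track per item (11); both cut-offs are hypothetical."""
    span = pitch_span_semitones(f0_track_hz)
    if span < squeezed_below:
        return "oversqueezed"
    if span > stretched_above:
        return "overstretched"
    return "neither"
```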

 

Scope for future work

 

(a)            Determining more reliably the physiological and acoustic correlates of these various distinctions. Thus, overhigh and overlow, when distinguished in the same person’s voice, are thought to be correlated with a certain positioning of bones, cartilages, and musculature (voice set or register). A beginning towards a physiological characterization can be seen in Catford 1964.

 

(b)            Verifying the hypothesis about the two-factor characterization of voice qualities proposed here, perhaps by a factor analysis of responses to stimuli in an experimental set-up. If one sought to extend it to non-human sound sources (such as musical instruments), an attempt would have to be made to arrive at distinctions analogous to those between (i)(a), (i)(b), and (ii). The ‘strokes’ of a tabla or sitar, for instance, would seem to be analogous to speech syllables.

 

(c)            Carefully defining non-technical descriptions. Thus, we have earlier proposed to define shouting as overhigh, overloud speech. This will be closely linked with the immediately preceding line of investigation. Some typical English terms that lend themselves to this sort of conceptual analysis are:

 

wheezing, crying, pouting, breaking, choking, groaning, moaning, whimpering, yodeling; ventriloquist’s voice; vibrant, full, strong, rasping, dropped, thin, covered, closed, faint, suppressed, muffled, clear, sharp, flat, dark, deep, rich, shrill, hard, guttural, harsh, dry, jarring, husky, thick, throaty, hollow, booming, sepulchral, ringing, soaring, spooky, sultry, gentle, syrupy, velvety (all adjectives of voice); stammer, stutter, sotto voce, chanting, singsong, shouting, yelling, squealing, slurring, groan, moan.

 

Note also the German terms Schonstimme and Kraftstimme.

 

(d)        Last but not least, an interpretation (i.e. an identification of the functions) of all these distinctions in modes of speaking and singing. Such a study, of course, presupposes some success in the other three lines of investigation.

 

The investigations will have to be undertaken jointly and severally by phoneticians, linguists, elocutionists, dramaturgists, and musicologists with the aid of physicists, physiologists, neurologists, and experimental psychologists.

 

REFERENCES

 

Catford, J. C. 1964. Phonation types: The classification of some laryngeal components of speech production. In: Jones, Daniel (dedic.) 1964.

Jones, Daniel (dedic.) 1964. In honour of Daniel Jones. London: Longman.

Joos, Martin 1948. Acoustic phonetics. Language Monographs. Baltimore, Md.: Linguistic Society of America at the Waverly Press.

 

 

COLOPHON

 

            Those interested in evolving a modern Indian Sanskrit-based terminology may consider the following suggestions:

 

Speech sound features: svana-lakṣaṇa
Pitch scale: sura-śreṇī/sāranī
Loudness scale: bala-śreṇī/sāranī
Speech quality: svana-guṇa
Articulatory quality: prayatna-guṇa
Voice quality: ghoṣa-guṇa
Segmental features: varṇa-lakṣaṇa
Overall features: adhivyāpī-lakṣaṇa
Concatenation features: samhita-lakṣaṇa

Overfronted/Overretracted: adhi-purogata/adhi-parāgata
Palatalized/Labiovelarized (clear/dark): adhi-tālu-rañjita/adhi-oṣṭha-mṛdutālu-rañjita
Nasalized/Denasalized: adhi-sānunāsika/adhi-niranunāsika
Breathy: mahāprāṇa-rañjita
Creaky: spandana-rañjita
Whispery: yupāśu-rañjita
Falsetto (voice): bhraṣṭa-(ghoṣa)

Voice/Breath: ghoṣa/śvāsā

Acute: tivra
Grave: komala
Lightly damped: ninādī
Highly damped: ruddhanādī
Overhigh/Overlow: tāra/mandra
Overstretched/Oversqueezed: caplasura/mitasura
Pitch quaver (vibrato): dolitasura
Overloud (forte)/Oversoft (piano): adhi-prabala/adhi-durbala
Loudness quaver (tremoloso): dolitabala
Rapid (allegro)/Medium (moderato)/Slow (lento) tempo: druta-/madhya-/vilambita-laya
Overabrupt (staccato)/Oversmooth (legato): adhi-khaṇita/adhi-dravita

 

Notes: (1) The prefix over- does not mean here ‘excessive’ (ati-), but ‘overall, not localized’. (2) Sanskrit svara- means both ‘vowel’ and ‘tone, note’; we propose here to make use of the modern differentiation between svara ‘vowel’ and sura ‘tone, note’. (3) Sanskrit dhvani- and svana- both mean ‘sound’; śabda- means ‘sound, speech sound, word’. We propose to use svana for ‘speech sound’, dhvani for ‘sound (in general)’, and śabda for ‘meaningful speech sound’. Thus dhvani-vijñāna will be ‘acoustics’ and not ‘phonetics’, and in talking about the sound quality of a musical instrument we use dhvani-guṇa and not svana-guṇa. I am indebted to Professor S. N. Salgarkar (Deccan College) for some useful suggestions.

 

                This was published in Indian Linguistics 35: 222–6, 1974. In talking about sound features in general and not about speech features, we shall use dhvani-language and not svana-language.

 

1 The aggressive palatalization, which occasionally breaks into [yæ yæ yæ], conveys impudent challenge, exasperation, and annoyance, and often accompanies jeering repetition of what the other person has just said. This is chiefly used between children and between women who have set dignity aside. The regressive ‘pouting’, often accompanied by nasalization, conveys the desire to be petted or treated indulgently after an error or a misdeed, to pet or console, or to whimper. This is chiefly used between a child and an adult in a close relationship and between lovers (often as a part of baby talk). The latter must be distinguished from another kind of labialization, which is accompanied by overlow pitch, lowered-jaw articulation, and heavy voice. This combination (we can call it ‘booming’) is used iconically in describing something as large or forceful. This is chiefly used by or to a child in telling a story or narrating an adventure. Dr. P. Bhaskar Rao and Dr. Amar Bahadur Singh inform me that all three are found in Telugu and Hindi. It will be interesting to examine India as a paraphonological area with subareas.