QUALITY IN HUMAN SPEECH
Sound quality is defined; segmental
and overall articulatory qualities are distinguished from each other
and from overall voice qualities. A classificatory grid for overall
sound features is proposed; where possible, auditory distinctions
are correlated with physiological and acoustic distinctions. Uses
for the conceptual framework in future work in speech and music are suggested.
The term ‘sound quality’ denotes the
field of those auditory distinctions (a) that do not fit
on the pitch and loudness scales; (b) that are needed along with the
pitch and loudness scales to enable us to say that a sound of a specified
pitch, loudness, and quality lasts for so long or shifts through time
to another sound; (c) that broadly correlate with the acoustic information
on the energy distribution profile in the wave along the frequency
scale; and (d) that correlate with the sound production information
on the complexity of the vibrations of the sound sources and more
importantly on the damping and resonance channel properties of the
Quality distinctions in human speech
are broadly of two sorts:
(i) Those subject to relatively greater
control through modifying the controlled and ballistic gestures of
the mobile portions of the speech tract and through these gestures
modifying the ‘shape’ – complex or otherwise – of the vibrations in
the sound sources and the larynx itself before being modified)
is believed to be a tone approximating a rectangular wave (cf. Joos
1948: § 2.1); the whistle tone on the other hand approximates a sine
wave (and hence a pure tone). The shape of the oral cavity and the
activation or otherwise of the nasal cavity are largely responsible
for the vowel-like qualities superimposed on the glottal tone. Acoustically,
these distinctions correlate with the presence of formants (energy
concentration zones along the component frequency scale) and the presence
or absence of an identifiable fundamental frequency. Auditorily, these
consist in the vowel and consonant qualities (a) identified as segments
and syllables or (b) identified as overall qualities. We shall not
be concerned with the segmental qualities in this note. To the latter
sort of overall qualities, we shall give the name overall articulatory
qualities.
(ii) Those much less subject
to control which physiologically correlate with the degree of vocal-fold
tension, presence or absence of moisture, shape and size of the laryngeal-pharyngal
cavities (partially controlled by the local musculature and by the
placement of the root and the far back of the tongue), and the like.
Acoustically, these correlate with the ‘fillings’ in the formant zones
(Joos 1948). To these we shall give the name overall voice qualities.
So we have the following speech qualities:
(i) (a) Segmental articulatory qualities
(i) (b) Overall articulatory qualities
(ii) Overall voice qualities.
To the extent that any of these are beyond conscious control, they serve
to identify the speaker (or the singer), to give a clue to the state
of the body (sunken, drunken, choked voice, bad cold), or to characterize
the language (or mode of singing). Now we shall set up a grid for
classifying (i) (b) and (ii) in terms of independently variable features.
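The contrast drawn above between a glottal tone approximating a rectangular wave and a whistle tone approximating a sine wave can be illustrated numerically. The following Python sketch is an illustration only and not part of the original analysis: it assumes an idealized symmetric square wave (equal time at +1 and -1) as a stand-in for the rectangular glottal tone, and it computes the amplitudes of the first few harmonics with a plain discrete Fourier transform. The names harmonic_amplitudes, sine and square are ours.

```python
import cmath
import math


def harmonic_amplitudes(samples, max_harmonic):
    """Return the amplitudes of harmonics 1..max_harmonic.

    `samples` must hold exactly one period of the wave.
    """
    n = len(samples)
    amplitudes = []
    for k in range(1, max_harmonic + 1):
        # Plain DFT coefficient at the k-th multiple of the fundamental.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        amplitudes.append(2 * abs(coeff) / n)
    return amplitudes


N = 256  # samples per period (an arbitrary choice for this sketch)

# Whistle-like tone: a single sine component.
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]

# Idealized rectangular tone: symmetric square wave (assumption).
square = [1.0 if i < N // 2 else -1.0 for i in range(N)]

sine_amps = harmonic_amplitudes(sine, 7)
square_amps = harmonic_amplitudes(square, 7)

for k, (s, q) in enumerate(zip(sine_amps, square_amps), start=1):
    print(f"harmonic {k}: sine {s:.3f}  square {q:.3f}")
```

Under these assumptions the sine shows energy only at the fundamental, while the square wave spreads its energy over the odd harmonics, with amplitudes falling off roughly as 1/k. This is one concrete sense in which the energy distribution profile along the frequency scale differs between the two tones.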
Overall Articulatory Qualities
(1) Overfronted/Overretracted – with articulations relatively
further fronted/retracted in the mouth. Thus, Hindi and Urdu are
respectively overfronted/overretracted.
(2) Palatalized/Labiovelarized – with an accompanying y- or i-
quality/w- or u- quality, i.e. respectively with the front of the
tongue raised towards the hard palate/with the back of the tongue
raised towards the soft palate and lip corners brought closer to each
other. Thus, aggressive veḍvṇe/regressive ‘pouting’
in Marathi is respectively palatalized/labiovelarized. 1
(3) Nasalized/Denasalized – with slight nasalization of the non-nasal
sounds / with slight denasalization of the nasal sounds. Thus, whining
is nasalized, a bad cold denasalizes speech.
(4) Breathy – with glottal friction superimposed on normal voice
or breath. Thus, in Gujarati or Hindi an h consonant is often
dissolved into a breathy voice (shown here by underlining), as in Gujarati
gāndhi as gāndhī and Hindi tarah as tara.
Drunken speech often has over-voicing which is probably the opposite
(5) Creaky – with glottal trill superimposed on normal voice
or breath. Thus, very low voice in singing or speech often goes with
a creaky quality (as of an unoiled door hinge).
(6) Whispery – with whisper-glottis replacing voicing in the voiced
sounds. Thus, extra soft speech is often whispered.
(7) Falsetto – with whisper-glottis replacing
voicing in the voiced sounds. Thus falsetto may be used by a male
mimicking a female voice and in certain types of singing.
I propose that overall voice qualities can be placed along
two independent scales shown below:
                              Lightly Damped    Highly Damped
(energy concentration
in higher overtones)          Light voice       Strident voice
(energy concentration
in lower tones)               Mellow
The term ‘strident’, with its connotations of unpleasantness, is clearly
unsatisfactory. A better substitute is invited from the reader.
Though this Note is not concerned with
Pitch, Loudness, and Concatenation, it will be useful to mention the
overall properties in respect of these since they are often confused
or associated with Articulatory Qualities or with Voice Qualities.
The first three are pitch features, the next two loudness features,
and the last two concatenation features.
(10) Overhigh/Overlow – with pitch movements in the relatively higher/lower
portions of the pitch scale. Thus, females and children tend to speak
(11) Overstretched/Oversqueezed – with pitch movements spread over
an extensive/restricted portion of the pitch scale. Thus, speech and
certain modes of singing may be monotonous (i.e. oversqueezed).
(12) Pitch Quaver (vibrato) – with extra rapid up and down pitch
movement superimposed over the normal pitch movements of speech or
(13) Overloud (forte)/Oversoft (piano) – with loudness movement in
the relatively louder/softer portions of
the loudness scale. Thus, shouting is both overhigh and overloud.
(14) Loudness Quaver (tremoloso) – with rapid increase and decrease
alternating in loudness. Thus, laughter is often characterized by
loudness quaver (effected in this case by spasmodic breath quaver).
(15) Rapid (allegro)/Medium (moderato)/Slow (lento) Tempo – with
relatively rapid/medium/slow changes in segmental articulatory qualities
and in normal pitch movements, and fewer and shorter/medium/more
and longer breaks and pauses. Also tempo changes: accelerando/rallentando.
(16) Overabrupt (staccato)/Oversmooth (legato) – with relatively
more abrupt / smoother transitions in respect of segmental articulatory
qualities and normal pitch and loudness movements.
A loudness feature corresponding to (11) is possible, but appears
to be of little practical significance.
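The two pitch features Overhigh/Overlow and Overstretched/Oversqueezed lend themselves to simple measurement once a pitch contour is available. The following Python sketch is an assumption-laden illustration, not a procedure given in this note: it takes a list of fundamental-frequency values in Hz (one per voiced frame), uses the median as a measure of pitch level, and uses the distance between the highest and lowest value, in semitones, as a measure of pitch spread. The function names and the two sample contours are hypothetical, and no thresholds for ‘over-’ are implied.

```python
import math
import statistics


def pitch_level_hz(f0_contour):
    """Median fundamental frequency: a rough indicator of overhigh/overlow."""
    return statistics.median(f0_contour)


def pitch_range_semitones(f0_contour):
    """Span between lowest and highest F0 in semitones:
    a rough indicator of overstretched/oversqueezed."""
    return 12 * math.log2(max(f0_contour) / min(f0_contour))


# Invented contours for illustration only (Hz, voiced frames).
monotone = [118.0, 120.0, 122.0, 120.0, 119.0]
lively = [110.0, 150.0, 220.0, 180.0, 120.0]

for name, contour in (("monotone", monotone), ("lively", lively)):
    level = pitch_level_hz(contour)
    spread = pitch_range_semitones(contour)
    print(f"{name}: level {level:.1f} Hz, range {spread:.2f} semitones")
```

The semitone scale is chosen here because it treats equal pitch ratios as equal distances, so that the same measure can be compared across speakers with different pitch levels. Any decision as to where ‘oversqueezed’ begins would have to come from the kind of experimental work proposed in this note.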
Scope for future work
Determining more reliably the physiological and acoustic correlates
of these various distinctions. Thus, overhigh and overlow when distinguished
in the same person’s voice are thought to be correlated with a certain
positioning of bones and cartilages and musculature (voice set or
register). Consider the beginning towards a physiological characterization
seen in Catford 1964.
Verifying the hypothesis about the two-factor characterization
of voice qualities proposed here, perhaps in terms of factor analysis
of responses to stimuli in an experimental set-up. If one seeks
to extend it to non-human sound sources (such as musical instruments),
an attempt will have to be made to arrive at distinctions analogous
to the one between (i) (a), (i) (b), and (ii). The ‘strokes’ of a tabla
or sitar would thus seem to be analogous to speech syllables.
Carefully defining non-technical descriptions. Thus we have
earlier proposed to define shouting as overhigh, overloud speech.
This will be closely linked with the immediately preceding line of
investigation. Some typical English terms that lend themselves to
this sort of conceptual analysis are:
wheezing, crying, pouting, breaking, choking, groaning, moaning, whimpering,
yodeling; ventriloquist’s voice; vibrant, full, strong, rasping, dropped,
thin, covered, closed, faint, suppressed, muffled, clear, sharp, flat,
dark, deep, rich, shrill, hard, guttural, harsh, dry, jarring, husky,
thick, throaty, hollow, booming, sepulchral, ringing, soaring, spooky,
sultry, gentle, syrupy, velvety (all adjectives of voice); stammer,
stutter, sotto voce, chanting, singsong, shouting, yelling, squealing,
slurring, groan, moan.
Note also the German terms Schonstimme and Kraftstimme.
Last but not least, an interpretation (i.e. an identification
of the functions) of all these distinctions in modes of speaking and
singing (such a study of course presupposes some success in the other
three lines of investigation).
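The third line of investigation, defining non-technical descriptions in terms of the overall features, can be pictured as a small lookup structure. The following Python sketch is only one possible encoding and is ours, not part of the proposal itself: FEATURES lists the overall features named in this note (the voice-quality grid is left out), DESCRIPTIONS records only those analyses already stated in the text, and the names category_of and analyse are hypothetical.

```python
# Overall features named in this note, grouped by kind.
# The voice-quality grid is deliberately left out of this sketch.
FEATURES = {
    "articulatory": {
        "overfronted", "overretracted", "palatalized", "labiovelarized",
        "nasalized", "denasalized", "breathy", "creaky", "whispery",
        "falsetto",
    },
    "pitch": {
        "overhigh", "overlow", "overstretched", "oversqueezed",
        "pitch quaver",
    },
    "loudness": {"overloud", "oversoft", "loudness quaver"},
    "concatenation": {"rapid", "medium", "slow", "overabrupt", "oversmooth"},
}

# Only analyses stated in the text; further entries would be hypotheses.
DESCRIPTIONS = {
    "shouting": {"overhigh", "overloud"},
    "whining": {"nasalized"},
    "monotonous speech": {"oversqueezed"},
    "laughter": {"loudness quaver"},
}


def category_of(feature):
    """Return the kind of feature, or raise KeyError if it is not listed."""
    for category, members in FEATURES.items():
        if feature in members:
            return category
    raise KeyError(f"unknown feature: {feature}")


def analyse(term):
    """Return the features of a non-technical term, grouped by kind."""
    grouped = {}
    for feature in sorted(DESCRIPTIONS[term]):
        grouped.setdefault(category_of(feature), []).append(feature)
    return grouped


print(analyse("shouting"))
```

Terms such as husky or sepulchral would enter DESCRIPTIONS only after the careful definition called for above; the sketch merely shows that each such definition reduces to a bundle of independently variable features.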
The investigations will have to be undertaken jointly and severally
by phoneticians, linguists, elocutionists, dramaturgists, and musicologists
with the aid of physicists, physiologists, neurologists, and experimental
Catford, J. C. 1964. Phonation types: The classification
of some laryngeal components of speech production. In: Jones, Daniel (dedic.) 1964.
Jones, Daniel (dedic.) 1964. In memory of Daniel Jones.
Joos, Martin 1948. Acoustic phonetics. Language
monographs. Baltimore, Md.: Ling. Soc. of America at the Waverly Press.
Those interested in evolving
a modern Indian Sanskrit-based terminology may consider the following
Sound features                                        svana-lakṣaṇa
Pitch scale/Loudness scale                            sura-/bala-śreṇī/sāraṇī
Speech quality                                        svana-guṇa
Articulatory quality                                  prayatna-guṇa
Voice quality                                         ghoṣa-guṇa
Segmental features                                    varṇa-lakṣaṇa
Overall features                                      adhivyāpī-lakṣaṇa
Concatenation features                                samhita-lakṣaṇa
Overfronted/Overretracted                             adhi-purogata/adhi-parāgata
Palatalized/Labiovelarized (clear/dark)               adhi-tālu-rañjita/adhi-oṣṭha-mṛdutālu-rañjita
Breathy                                               mahāprāṇa-rañjita
Falsetto (voice)                                      bhraṣṭa-(ghoṣa)
Overhigh/Overlow                                      tāra/mandra
Overstretched/Oversqueezed                            caplasura/mitasura
Pitch quaver = Vibrato                                dolitasura
Overloud = Forte/Oversoft = Piano                     adhi-prabala/adhi-durbala
Loudness quaver = Tremoloso                           dolitabala
Rapid = Allegro/Medium = Moderato/Slow = Lento Tempo
Overabrupt = Staccato/Oversmooth = Legato             adhi-khaṇḍita/adhi-dravita
Notes: (1) The prefix over- does not mean
here ‘excessive’ (ati-), but ‘overall, not localized’. (2)
Sanskrit svara- means both ‘vowel’ and ‘tone, note’; we propose
here to make use of the modern differentiation between svara
‘vowel’ and sura ‘tone, note’. (3) Sanskrit dhvani-,
svana- both mean ‘sound’; śabda- means ‘sound, speech sound, word’.
We propose to use svana for ‘speech sound’; dhvani for
‘sound (in general)’; śabda for ‘meaningful speech sound’. Thus
dhvani-vijñāna will be ‘acoustics’ and not ‘phonetics’;
in talking about the sound quality of a musical instrument, we use
dhvani-guṇa and not svana-guṇa; in talking about sound features
in general and not about speech features, we shall use dhvani-lakṣaṇa
and not svana-lakṣaṇa. I am indebted to Professor S. N. Salgarkar
(Deccan College) for some useful suggestions.
This was published in Indian Linguistics 35:222-6, 1974.
1 The aggressive palatalization, which occasionally breaks
into [yæ yæ yæ], conveys impudent challenge, exasperation and annoyance,
and often accompanies jeering repetition of what the other person has
just said. This is chiefly used between children and between women
who have set dignity aside. The regressive ‘pouting’, often accompanied
by nasalization, conveys the desire to be petted or treated indulgently
after an error or a misdeed, to pet or console, to whimper. This is
chiefly used between a child and an adult in close relationship and
between lovers (often as a part of baby talk). The latter must be
distinguished from another kind of labialization which is accompanied
by overlow pitch, lowered-jaw articulation, and heavy voice. This
combination (we can call it ‘booming’) is used iconically in describing
something as large or forceful. This is chiefly used by or to a child in telling a story or narrating an
adventure. Dr. P. Bhaskar Rao and Dr. Amar Bahadur Singh inform me
that all three are found in Telugu and Hindi. It will be interesting
to examine India as a paraphonological area with subareas.