Learning Objectives
• Describe how speakers control frequency
and amplitude of vocal fold vibration
• Describe psychophysical attributes of pitch,
loudness and quality in physiological and
acoustic terms
• Define terms such as speaking fundamental
frequency, speaking fundamental frequency
variability, harmonics (or signal) to noise ratio,
jitter, shimmer, cepstrum, quefrency, and
rahmonic amplitude
What is the difference
between pitch and frequency?
Quantifying frequency
• Hertz: cycles per second (Hz)
Non-linear scales
• Octave scale
– 1/3 octave bands
– Semitones
– Cents
• Other “auditory scales”: e.g. mel, phon
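The octave, semitone and cent scales above are all logarithms of a frequency ratio. A minimal sketch of the conversions, assuming the usual musical convention of 12 semitones per octave and 100 cents per semitone (the function names are illustrative, not from the slides):

```python
import math

def octaves_between(f_ref_hz, f_hz):
    """Signed distance from f_ref_hz up to f_hz, in octaves (a doubling = 1 octave)."""
    return math.log2(f_hz / f_ref_hz)

def semitones_between(f_ref_hz, f_hz):
    """12 semitones per octave."""
    return 12.0 * octaves_between(f_ref_hz, f_hz)

def cents_between(f_ref_hz, f_hz):
    """100 cents per semitone, so 1200 cents per octave."""
    return 1200.0 * octaves_between(f_ref_hz, f_hz)

# The same 100 Hz step is a different distance on the non-linear scales.
print(semitones_between(100.0, 200.0))  # 12.0 (one octave)
print(semitones_between(200.0, 300.0))  # about 7.02
```

This is one way to see why Hz alone is a poor stand-in for pitch: equal steps in Hz are not equal steps on a ratio-based scale.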
Fundamental Frequency (F0)
Control
What factors dictate the vibratory frequency of the vocal folds?
• Anatomical factors
Males ↑ VF mass and length = ↓ F0
Females ↓ VF mass and length = ↑ F0
• Subglottal pressure adjustment – show example
↑ Psg = ↑ F0
• Laryngeal and vocal fold adjustments
↑ CT activity = ↑ F0
TA activity = ↑ F0 or ↓ F0
• Extralaryngeal adjustments
↑ height of larynx = ↑ F0
Characterizing
Fundamental Frequency (F0)
Average F0
• speaking fundamental frequency (SFF)
• Correlate of pitch
• Infants
– ~350-500 Hz
• Boys & girls (3-10)
– ~ 270-300 Hz
• Young adult females
– ~ 220 Hz
• Young adult males
– ~ 120 Hz
Older females: F0 ↓
Older males: F0 ↑
F0 variability
• F0 varies due to
– Syllabic & emphatic stress
– Syntactic and semantic factors
– Phonetic factors (in some
languages)
• Provides a melody (prosody)
• Measures
– F0 Standard deviation
• ~2-4 semitones for normal
speakers
– F0 Range
• maximum F0 – minimum F0 within a
speaking task
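The average and variability measures above can be sketched from a list of F0 values. This is a minimal illustration, assuming F0 has already been extracted for the voiced frames of a speaking task, and assuming the standard deviation is taken in semitones relative to the speaker's own mean F0 (one common choice, not the only one). The example values are hypothetical.

```python
import math
import statistics

def hz_to_semitones(f_hz, ref_hz):
    """Distance of f_hz from ref_hz in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)

def summarize_f0(f0_track_hz):
    """Return SFF, F0 standard deviation and F0 range for one speaking task."""
    sff_hz = statistics.mean(f0_track_hz)
    semitones = [hz_to_semitones(f, sff_hz) for f in f0_track_hz]
    lowest, highest = min(f0_track_hz), max(f0_track_hz)
    return {
        "sff_hz": sff_hz,                              # average F0
        "sd_semitones": statistics.stdev(semitones),   # F0 variability
        "range_hz": highest - lowest,                  # max F0 minus min F0
        "range_semitones": hz_to_semitones(highest, lowest),
    }

# Hypothetical F0 values (Hz), for illustration only.
example_track = [190.0, 205.0, 220.0, 240.0, 215.0, 200.0, 230.0]
summary = summarize_f0(example_track)
print(summary)
```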
Estimating the limits of vocal fold
vibration
Maximum Phonational Frequency Range
• highest possible F0 - lowest possible F0
• Not a speech measure
• measured in Hz, semitones or octaves
• Males ~ 80-700 Hz¹
• Females ~ 135-1000 Hz¹
• A range of around 3 octaves is often considered
“normal”
¹ Baken (1987)
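As a quick check on the "around 3 octaves" figure, the Hz limits quoted above can be converted to octaves and semitones. A small sketch (the function name is illustrative):

```python
import math

def range_in_octaves(lowest_hz, highest_hz):
    """Each octave is a doubling of frequency, so use a base-2 logarithm."""
    return math.log2(highest_hz / lowest_hz)

# Limits as quoted on the slide (Baken, 1987).
for label, lowest_hz, highest_hz in [("males", 80.0, 700.0), ("females", 135.0, 1000.0)]:
    octaves = range_in_octaves(lowest_hz, highest_hz)
    print(f"{label}: {octaves:.2f} octaves, {12.0 * octaves:.1f} semitones")
```

Both ranges come out at roughly 3 octaves (a little over 3 for the male limits, a little under 3 for the female limits).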
Approaches to Measuring
Fundamental Frequency (F0)
• Time domain vs. frequency domain
• Manual vs. automated measurement
• Specific Approaches
– Peak picking
– Zero crossing
– Autocorrelation
– The cepstrum & cepstral analysis
Autocorrelation
[Figure: autocorrelation example with panels labeled Data and Correlation; correlation values shown: +1.0, +0.1, -0.82, +0.92]
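Autocorrelation compares a signal with delayed copies of itself; the delay (lag) giving the highest correlation is taken as the period. The following is a minimal pure-Python sketch, not a production pitch tracker. The search limits of 50 to 500 Hz, the normalization by total signal energy, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def autocorr_f0(signal, fs, f0_min=50.0, f0_max=500.0):
    """Estimate F0 as fs / (lag with the highest normalized autocorrelation)."""
    min_lag = int(fs / f0_max)   # shortest period considered
    max_lag = int(fs / f0_min)   # longest period considered
    energy = sum(s * s for s in signal)
    best_lag, best_r = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        r = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag, best_r

# Synthetic "voice": a 200 Hz fundamental plus a weaker second harmonic.
fs = 8000
signal = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
          for n in range(400)]
f0_hz, peak_r = autocorr_f0(signal, fs)
print(f0_hz)  # 200.0 for this synthetic signal
```

The height of the winning correlation is also informative: values near +1.0 indicate a strongly periodic signal.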
What is a cepstrum?
• A cepstrum involves performing a spectral
analysis of an amplitude spectrum
• Returns the sound representation to a “time-like” domain: the quefrency domain
• Location of the dominant energy in the
cepstrum is typically associated with the
fundamental frequency of the signal
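The description above can be sketched in code: take the amplitude spectrum, take its logarithm, then run a second Fourier transform and look for the dominant peak. This is a rough sketch under several assumptions that are not from the slides: a naive O(N^2) DFT is used to stay dependency-free, a Hann window is applied, a small constant (`floor`) is added before the log to avoid log(0), and the peak search is limited to quefrencies corresponding to 50 to 500 Hz.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform (naive O(N^2) version)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def cepstrum(signal, floor=1e-6):
    """Second transform of the log amplitude spectrum, scaled by 1/N."""
    n = len(signal)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / n))  # Hann window
                for t, s in enumerate(signal)]
    log_spectrum = [math.log(m + floor) for m in dft_magnitude(windowed)]
    return [m / n for m in dft_magnitude(log_spectrum)]

def cepstral_f0(signal, fs, f0_min=50.0, f0_max=500.0):
    """Return (F0 in Hz, quefrency in msec) of the dominant cepstral peak."""
    c = cepstrum(signal)
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak_q = max(range(lo, hi + 1), key=lambda q: c[q])
    return fs / peak_q, 1000.0 * peak_q / fs

# Synthetic pulse train: one decaying pulse every 40 samples (200 Hz at fs = 8000).
fs = 8000
pulses = [0.9 ** (n % 40) for n in range(400)]
f0_hz, quefrency_ms = cepstral_f0(pulses, fs)
print(f0_hz, quefrency_ms)  # 200.0 Hz at a quefrency of 5.0 msec
```

The quefrency of the peak is the fundamental period (5 msec here), so F0 is its reciprocal.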
What is a cepstrum?
[Figure: time domain (waveform; sound pressure vs. time) → Fourier transform → frequency domain (amplitude spectrum; amplitude vs. frequency)]
[Figure: frequency domain (amplitude spectrum) → Fourier transform (number 2) → cepstrum; horizontal axis: quefrency (msec)]
Dominant rahmonic
– quefrency location: fundamental period
– height: degree of periodicity
Learning Objectives
• Describe how speakers control
frequency and amplitude of vocal fold
vibration
• Describe psychophysical attributes of
pitch, loudness and quality in
physiological and acoustic terms
• Explain what the decibel is and why it is a
preferred way to quantify amplitude
What is the difference between
amplitude and loudness?
Quantifying amplitude
Sound pressure level
• Pressure = force/area
• Units: micropascals
Sound intensity
• Intensity = Power/area where
– power=work/time
– work=force*distance
• Units: watts/m$^2$
Intensity is proportional to pressure squared: $I \propto P^2$
What is the decibel scale?
• We prefer to use the decibel scale to
represent signal amplitude
• We are used to using measurement scales
that are absolute and linear
• The decibel scale is relative and
logarithmic
Linear vs. logarithmic
• Linear scale: 1,2,3…
• For example, the difference between 2
and 4 is the same as the difference
between 8 and 10.
• We say these are additive
Linear vs. logarithmic
• Logarithmic scales are multiplicative
• Recall from high school math and hearing
science
$10 = 10^{1} = 10 \times 1$
$100 = 10^{2} = 10 \times 10$
$1000 = 10^{3} = 10 \times 10 \times 10$
$0.1 = 10^{-1} = 1/10 \times 1$
Logarithmic scales use the exponents for the
number scale
$\log_{10} 10 = 1$
$\log_{10} 100 = 2$
$\log_{10} 1000 = 3$
$\log_{10} 0.1 = -1$
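The additive versus multiplicative distinction can be checked numerically. A short sketch using Python's standard `math` module (the helper name `log_step` is illustrative):

```python
import math

# The base-10 logarithm returns the exponent.
for value in (10, 100, 1000, 0.1):
    print(value, "->", math.log10(value))

def log_step(a, b):
    """Size of the step from a to b on a base-10 logarithmic scale."""
    return math.log10(b / a)

# Linear scale: equal differences are equal steps.
linear_steps_match = (4 - 2) == (10 - 8)

# Logarithmic scale: equal ratios are equal steps, whatever the starting point.
doubling_low = log_step(100, 200)
doubling_high = log_step(1000, 2000)
print(linear_steps_match, doubling_low, doubling_high)
```

Both doublings give the same step of about 0.301 log units, even though one spans 100 units and the other 1000.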
Logarithmic Scale
• The base doesn’t have to be 10
• In computer science, the base is often 2
• In the natural sciences, the base is often
e (2.7…)
Logarithmic Scale
• Why use such a complicated scale?
– logarithmic scale squeezes a very wide range
of magnitudes into a relatively compact scale
– this is roughly how our hearing works, in that a
logarithmic scale matches our perception of
loudness change
Absolute vs. relative measurement
• Relative measures are a ratio of a measure
to some reference
• Relative scales can be referenced to
anything you want.
• The decibel scale doesn’t measure amplitude
(intensity or pressure) absolutely, but as a
ratio to some reference value.
Typical reference values
• Intensity
– $10^{-12}$ watts/m$^2$
• Sound Pressure Level (SPL)
– 20 micropascals
Why do we use these particular values?
However…
• You can reference intensity/pressure to
anything you want
For example,
• Post therapy to pre therapy
• Sick people to healthy people
• Sound A to sound B
Now, let us combine the idea of
logarithmic and relative…
bel $= \log_{10}(I_m / I_r)$
$I_m$: measured intensity
$I_r$: reference intensity
A bel is pretty big, so we tend to use the decibel,
where deci means 1/10. So 10 decibels make
one bel
dB IL $= 10 \log_{10}(I_m / I_r)$
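A minimal sketch of the intensity-level formula, using the 10^-12 watts/m^2 reference given earlier as the default. The function name and the example values are illustrative.

```python
import math

I_REF = 1e-12  # reference intensity in watts/m^2

def db_il(i_measured, i_ref=I_REF):
    """Intensity level in decibels: 10 * log10(measured / reference)."""
    return 10.0 * math.log10(i_measured / i_ref)

print(db_il(1e-12))  # the reference itself sits at 0 dB
print(db_il(1e-6))   # a million times the reference: about 60 dB

# The reference can be anything, e.g. comparing sound A to sound B.
# Doubling the intensity adds about 3 dB.
doubling_db = db_il(2.0, i_ref=1.0)
print(doubling_db)
```

Because the scale is relative, a negative dB value simply means the measured intensity is below the chosen reference.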
Intensity vs. Pressure
• Intensity is trickier to measure.
• Pressure is easy to measure – a
microphone is a pressure measuring
device.
• Intensity is proportional to pressure squared: $I \propto P^2$
Extending the formula to
pressure
Using some logarithmic tricks, this translates
our equation for the decibel to
dB SPL $= (2)(10)\log_{10}(P_m / P_r) = 20 \log_{10}(P_m / P_r)$
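The "trick" is that the logarithm of a squared ratio is twice the logarithm of the ratio. A small sketch that computes dB SPL both ways and shows they agree, using the 20 micropascal reference (the function names are illustrative):

```python
import math

P_REF_PA = 20e-6  # 20 micropascals, expressed in pascals

def db_spl(p_measured_pa, p_ref_pa=P_REF_PA):
    """Sound pressure level: 20 * log10(measured / reference)."""
    return 20.0 * math.log10(p_measured_pa / p_ref_pa)

def db_from_squared_pressure(p_measured_pa, p_ref_pa=P_REF_PA):
    """Same quantity via the intensity form, since intensity goes as pressure squared."""
    return 10.0 * math.log10((p_measured_pa / p_ref_pa) ** 2)

pressure_pa = 0.02  # 1000 times the reference pressure
print(db_spl(pressure_pa))                    # about 60 dB SPL
print(db_from_squared_pressure(pressure_pa))  # same value, up to rounding
```

A tenfold increase in pressure therefore adds 20 dB, whereas a tenfold increase in intensity adds 10 dB.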
Amplitude control during speech
• Subglottal pressure adjustment
↑ Psg = ↑ sound pressure
• Laryngeal and vocal fold adjustments
↑ medial compression = ↑ sound pressure
• Supralaryngeal adjustments
– Optimizing sound radiation from vocal tract
Sound Pressure Level (SPL)
Average SPL
• Correlate of loudness
• conversation:
• ~ 65-80 dBSPL
SPL Variability
• ↑ SPL to mark stress
• Contributes to prosody
• Measure
– Standard deviation for
neutral reading material:
• ~ 10 dBSPL
Estimating the limits of sound pressure
generation
Dynamic Range
• Amplitude analogue to maximum
phonational frequency range
• ~50 – 115 dB SPL
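To get a feel for how wide this range is in absolute terms, the dB SPL limits can be converted back to pressures. A sketch assuming the 20 micropascal reference (the function name is illustrative):

```python
P_REF_PA = 20e-6  # 20 micropascals, expressed in pascals

def spl_to_pressure_pa(level_db_spl):
    """Invert dB SPL = 20 * log10(P / P_ref) to recover the pressure in pascals."""
    return P_REF_PA * 10.0 ** (level_db_spl / 20.0)

quietest_pa = spl_to_pressure_pa(50.0)
loudest_pa = spl_to_pressure_pa(115.0)
pressure_ratio = loudest_pa / quietest_pa
print(quietest_pa, loudest_pa, pressure_ratio)
```

The 65 dB span corresponds to a pressure ratio of roughly 1800 to 1, which the decibel scale compresses into a compact range of numbers.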
Learning Objectives
• Describe psychophysical attributes of
pitch, loudness and quality in
physiological and acoustic terms
• Define terms such as speaking
fundamental frequency, speaking
fundamental frequency variability,
harmonics (or signal) to noise ratio,
jitter, shimmer, cepstrum, quefrency,
and rahmonic amplitude
Vocal Quality
• no clear acoustic
correlates like pitch
and loudness
• However, terms have
invaded our
vocabulary that
suggest distinct
categories of voice
quality
Common Terms
• Breathy
• Tense/strained
• Rough
• Hoarse
Are there features in the acoustic
signal that correlate with these
quality descriptors?
Breathiness
Perceptual Description
• Audible air escape in the voice
Physiologic Factors
• Diminished or absent closed phase
• Increased airflow
Potential Acoustic Consequences
• Change in harmonic (periodic) energy
– Sharper harmonic roll off
• Change in aperiodic energy
– Increased level of aperiodic energy (i.e. noise), particularly in the
high frequencies
harmonics (signal)-to-noise-ratio
(SNR/HNR)
• harmonic/noise amplitude
• ↑ HNR
– Relatively more signal
– Indicative of normality
• ↓ HNR
– Relatively more noise
– Indicative of disorder
• Normative values depend on method of
calculation
• “normal” HNR ~ 15
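Since normative values depend on the method of calculation, the sketch below shows two possible definitions rather than a single standard one. Both express HNR in decibels, which is an assumption here (the slide gives "~ 15" without a unit). The first takes harmonic and noise power directly; the second uses the height of the normalized autocorrelation peak at the fundamental period, an approach found in some analysis software. Function names are illustrative.

```python
import math

def hnr_db_from_power(harmonic_power, noise_power):
    """HNR as a power ratio in dB (use 20 * log10 instead for an amplitude ratio)."""
    return 10.0 * math.log10(harmonic_power / noise_power)

def hnr_db_from_autocorr_peak(r_peak):
    """HNR from a normalized autocorrelation peak r (0 < r < 1).

    r is treated as the periodic share of the signal power and 1 - r as the noise share.
    """
    return 10.0 * math.log10(r_peak / (1.0 - r_peak))

print(hnr_db_from_power(100.0, 1.0))    # 20.0 dB: harmonics carry 100x the noise power
print(hnr_db_from_autocorr_peak(0.5))   # 0.0 dB: equal parts signal and noise
print(hnr_db_from_autocorr_peak(0.99))  # close to 20 dB
```

Under the power definition, an HNR of 15 dB would mean the harmonic power is about 32 times the noise power.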
[Figure: amplitude spectrum (amplitude vs. frequency) marking the harmonic peaks and the noise “floor”]
[Figures: First harmonic amplitude; Prominent Cepstral Peak; Spectral Tilt: Voice Source; Spectral Tilt: Radiated Sound; Peak/average amplitude ratio. From Hillenbrand et al. (1996)]
[Figure: scatterplot of breathiness rating vs. cepstral peak (dB), WMU graduate students, r = 0.88]
Tense/Pressed/Effortful/Strained
Voice
Perceptual Description
• Sense of effort in production
Physiologic Factors
• Longer closed phase
• Reduced airflow
Potential Acoustic consequences
• Change in harmonic (periodic) energy
– Flatter harmonic roll off
Spectral Tilt
[Figure: spectral tilt, pressed vs. breathy]
Acoustic Basis of Vocal Effort
[Figure: scatterplot of perception of effort (dependent variable: effort) vs. regression adjusted (Press) predicted value; predictors: F0 + RMS + Open Quotient. Tasko, Parker & Hillenbrand (2008)]
Roughness
• Perceptual Description
– Perceived cycle-to-cycle variability in voice
• Physiologic Factors
– Vocal folds vibrate, but in an irregular way
• Potential Acoustic Consequences
– Cycle-to-cycle variations in F0 and amplitude
– Elevated jitter
– Elevated shimmer
Period/frequency & amplitude variability
• Jitter: variability in the period of each
successive cycle of vibration
• Shimmer: variability in the amplitude of each
successive cycle of vibration
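One common way to turn these definitions into numbers is the mean absolute difference between successive cycles, expressed relative to the mean. The sketch below follows that convention, which is only one of several in use; the function names and example values are hypothetical.

```python
import math

def mean_abs_successive_diff(values):
    """Average absolute change from one cycle to the next."""
    pairs = zip(values, values[1:])
    return sum(abs(b - a) for a, b in pairs) / (len(values) - 1)

def jitter_percent(periods):
    """Cycle-to-cycle period variability as a percentage of the mean period."""
    return 100.0 * mean_abs_successive_diff(periods) / (sum(periods) / len(periods))

def shimmer_percent(amplitudes):
    """Cycle-to-cycle amplitude variability as a percentage of the mean amplitude."""
    return 100.0 * mean_abs_successive_diff(amplitudes) / (sum(amplitudes) / len(amplitudes))

def shimmer_db(amplitudes):
    """Mean absolute amplitude ratio between successive cycles, in dB."""
    pairs = zip(amplitudes, amplitudes[1:])
    return sum(abs(20.0 * math.log10(b / a)) for a, b in pairs) / (len(amplitudes) - 1)

# Hypothetical cycle periods (msec) and peak amplitudes (arbitrary units).
periods_ms = [5.0, 5.1, 5.0, 5.1]
amplitudes = [1.0, 0.9, 1.0, 0.9]
print(jitter_percent(periods_ms), shimmer_percent(amplitudes), shimmer_db(amplitudes))
```

These example values are deliberately exaggerated so the variability is easy to see.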
…
Jitter and Shimmer
Sources of jitter and shimmer
• Small structural asymmetries
of vocal folds
• “material” on the vocal folds
(e.g. mucus)
• Biomechanical events, such as
raising/lowering the larynx in
the neck
• Small variations in tracheal
pressures
• “Bodily” events – system noise
Measuring jitter and shimmer
• Variability in measurement
approaches
• Variability in how measures are
reported
• Jitter
– Typically reported as % or msec
– Normal ~ 0.2 - 1%
• Shimmer
– Can be % or dB
– Norms not well established