An investigation of cross-language differences in
pitch range for speakers of English and German
Ineke Mennen*, Felix Schaeffler#, & Gerard Docherty^
*ESRC Centre for Bilingualism, Bangor University, UK; #Speech Science Research Centre, Queen Margaret University, UK; ^School of Education, Communication & Language
Sciences, Newcastle University, UK.
[email protected]; [email protected]; [email protected]
Background
• There has been very little focus on pitch range as a source of cross- language differences
• Yet, there is some evidence that when groups of speakers of different languages are compared
there can be a significant difference in aspects of pitch range - notwithstanding some degree of
overlap
• Two languages where this may well be the case are Southern Standard British English (SSBE)
and Northern Standard German (NSG). Anecdotal evidence suggests that people perceive
differences between the two languages, with SSBE sounding higher and having more variation
How to measure pitch range?
•
Pitch range is notoriously difficult to quantify, and many different measures have
been reported
Pitch range can vary along two dimensions, level and span (cf. [1]):
•Level = the typical pitch height (register) of a speaker’s voice
•Span = the range of frequencies covered by a speaker
Commonly used measures are long-term distributional (LTD), such as mean or
median f0 for level, and maximum minus minimum F0, 90% range, 80% range, or 4
standard deviations around the mean for span
Patterson [1] suggests that there are some problems with LTD measures (e.g. pitch
tracking errors, non-normal distribution of F0, less perceptual validity)
Alternative to LTD measures are ‘linguistic’ measures, linked to specific linguisticallydefined landmarks in the F0 contour ([1], based on [2, 3])
•
•
•
Aims
•
There are two primary dimensions to our investigation:
1. To evaluate which measures are best suited to capture any cross-language differences in
pitch range
2. To use a range of these measures to attempt to identify the nature of the cross-language
differences in performance which underpin the differences that people perceive
Method and Participants
Production
Material and participants:
• First sentence of a short story read by 60 (30
per language) female speakers, between 20
and 40 years of age
• As this material contained too few instances of
linguistic tones, these were also derived from a
larger part of the story (5 sentences), for a
subset of 50 speakers
Procedure:
• LTD measures were derived from raw F0 time
series, corrected for tracking errors:
•Level: mean, median, maximum, minimum,
quantile (95%, 90%, 75%, 25%, 10%, 5%)
•Span: 100% span (max-min), 90% span,
80% span, 50% span, standard deviation
(SD), 4 SD around the mean
• Linguistic measures were derived from
linguistically relevant F0 landmarks:
•Level: H*i, Hi, H*, H, L*, L, I, FL
•Span: H*i-FL, H*i-L, H*-FL, H*-L
• All span measures were measured in Hz and
semitones (ST), except SD (only Hz)
• Mann-Whitney U tests were used for statistical
testing, due to indications for non-normal
distribution of some variables
• Tool: Praat
• Original pitch contour was approximated by
resynthesized contour, based on pitch targets,
changes in slope and interpolation between
targets
• Pitch targets were local maxima and minima of
the contour
• Changes in slope were also marked where
necessary, but not included in present analysis
• Local maxima and minima were labelled H* and
L*, if aligned with prominent syllable, and H and L
otherwise
• Initial and final targets were labelled separately.
Final lows as FL, and the first peak of a phrase
was separately marked as H*i or Hi
Production Study
Production
Perception
Material, participants and procedure:
•
On-going study, aiming at a number of 60
participants (30 per language)
•
Currently analysed: 23 German
participants
•
Participants listen to a delexicalized
version of the sentence from the
production study (60 stimuli)
•
The sentences were taken from the story
readings from the production study
•
Participants decided whether the stimulus
was of English or German origin (binary
forced choice), and how confident they
were about their judgement (5 pt scale)
•
Spearman’s rho was used for individual
correlations, due to non-normal
distribution of some variables
•
Analysed Variable: PEJ
Perception Study
PEJ
Percentage of English Judgments per
stimulus: How often was a certain stimulus
judged as being English?
Perception
Results
Measure type Variable
LTD Measures
Level (Figure 1):
• Similar values for mean and median
across languages
• Maximum higher for SSBE (but ns)
• 95%, 90% and 75% quantiles were
significantly higher in SSBE
• Lower quantiles (25%, 10%, 5%) were
not significant.
Span (examples in Figure 2):
• Generally wider span for SSBE (all
measured spans except max – min
were significant in Hz and ST, max –
min was only significant in ST)
Linguistic Measures
Level (larger data-set, Figure 3):
• Significantly higher values in SSBE for H*i
and phrase-initial tones (I).
• Final lows (FL) were virtually equal across
languages.
• Interestingly, non-initial H* are higher for
NSG.
Fig. 3
Sig level
p<
N
95% quantile
0.669
0.001
60
90% quantile
0.644
0.001
60
75% quantile
0.584
0.001
60
Maximum
0.561
0.001
60
Mean
0.529
0.001
60
4 SD around mean
0.548
0.001
60
SD
0.548
0.001
60
90% range
0.532
0.001
60
80% range
0.469
0.001
60
Range (max – min)
0.455
0.001
60
0.658
0.001
60
L
0.203
n.s.
56
FL
0.187
n.s.
50
Linguistic span First peak – FL (Hz)
0.606
0.001
50
First peak – L (Hz)
0.519
0.001
56
First peak – FL (ST)
0.468
0.001
50
STfirst_peakLdiff
0.408
0.002
56
LTD span
Span (larger data-set)
• Significantly wider for SSBE in the case of
H*i – FL and H*i – L, and significantly wider
for NSG in the case of H*-FL and H*-L
Fig. 2
Rho
PEJ
LTD level
Linguistic level First peak
•Overall, SSBE and NSG show quite different distributions of tonal categories (see Figure 4)
• This distributional difference had consequences for the linguistic measures in the first sentence:
-only two categories (FL and L) showed sufficient numbers for cross-language comparison.
Neither was significantly different across languages
-In order to compare initial peaks, H*i and Hi had to be combined into a single category (P1). P1
was significantly higher for SSBE
-Span measures could only be calculated for P1-FL and P1-L. Both differences were significantly
wider in SSBE (when measured in Hz and ST)
Fig. 1
• Stimuli were resynthesised with the ’humming
function’ of Praat
• This maintains the pitch contour and the
voiced/voiceless distinction, but removes all
lexical information
• In order to maintain intensity relationships of the
original utterances, the intensity contour of the
resynthesised stimuli was multiplied with the
original contour of the utterance
Fig. 4
Acknowledgments
This study was funded by the
UK
Economic
&
Social
Research Council (RES-000-221858). We thank Frank Kügler
and his colleagues for the
collection of the German data.
• A range of acoustic variables show
high correlations with PEJ (see
Table 1 opposite)
• Span measures do not exceed the
correlations for level measures
• Linguistic measures do not exceed
the correlations for LTD measures
References
[1] Patterson, D., 2000. A linguistic approach to pitch range
modelling. PhD dissertation, University of Edinburgh.
[2] Ladd, D.R.; Terken, J., 1995; Modelling intra- and interspeaker pitch range variation. Proceedings of ICPhS.
Stockholm, 386-389.
[3] Shriberg, E.; Ladd, D.R.; Terken, J.; Stolcke, A., 1996.
Modeling pitch range variation within and across speakers:
predicting f0 targets when ”speaking up”. In Proceedings
ICSLP, 1–4, Philadelphia, PA, USA.
Discussion & Conclusion
• There is clear evidence for pitch range differences between SSBE and NSG.
• The linguistic measures indicate that this difference is mainly a consequence of higher H*i in
SSBE
• The more frequent use of H* accents in SSBE might also contribute to the impression of a
wider pitch range in SSBE (despite the fact that non-initial H* are lower in SSBE)
• Measures taken from the lower end of the F0 distribution (minimum, lower quantiles, FL) do
not differ greatly across the languages
• The perception study indicates that people are generally sensitive to the aspects of pitch
range which differ across the languages, and that variables that are significant in production
also play a role in perception in the context of this task
• Overall, there is no indication that linguistic measures are more perceptually valid than LTD
measures (contra [1]). Further work is under way to investigate the extent to which the
perceived pitch range difference between SSBE and NSG is a consequence of across-theboard higher amounts of pitch variation in SSBE, or is linked to only a few and rather local
events in the F0 contour
Descargar

Slide 1