Using an enhanced MDA model
in study of World Englishes
Richard Xiao
University of Central Lancashire
Overview of the talk
• Biber’s (1988) MF/MD analytical
framework
• The enhanced multidimensional analysis
(MDA) model
• An MDA analysis of five varieties of
English in the ICE
Factor analysis
• The key to the multidimensional analysis
approach
• A common data reduction method available in
many standard statistics packages such as
SPSS
• Reducing a large number of variables to a
manageable set of underlying factors or
dimensions
• Extensively used in social sciences to identify
clusters of variables
Biber’s MF/MD approach
• Established in Biber (1988): Variation
across Speech and Writing (CUP)
– Factor analysis of 67 functionally related
linguistic features
– 481 text samples, amounting to 960,000
running words
• LOB
• London-Lund
• Brown corpus
• A collection of professional and personal letters
Biber’s MF/MD approach
• Biber’s seven factors / dimensions
– Informational vs. involved production
– Narrative vs. non-narrative concerns
– Explicit vs. situation-dependent reference
– Overt expression of persuasion
– Abstract vs. non-abstract information
– Online informational elaboration
– Academic hedging
Biber’s MF/MD approach
• Influential and widely used
– Synchronic analysis of specific registers / genres and
author styles
– Diachronic studies describing the evolution of
registers
– Register studies of non-Western languages and
contrastive analyses
– Research on university English and materials
development
– Move analysis and study of discourse structure
• …largely confined to grammatical categories
The enhanced MDA model
• Enhancing Biber’s MDA by incorporating
semantic components with grammatical
categories
– Wmatrix = CLAWS + USAS
– A total of 141 linguistic features investigated
• 109 features retained in the final model
– Five million words in 2,500 text samples, with one
million for each of the 5 varieties of English
• ICE – GB, HK, India, Singapore, the Philippines
• 300 spoken + 200 written samples
• 12 registers ranging from private conversation to academic
writing
ICE registers and proportions
S1A (20%): Spoken – Private
S1B (16%): Spoken – Public
S2A (14%): Spoken – Monologue – Unscripted
S2B (10%): Spoken – Monologue – Scripted
W1A (4%): Written – Non-printed – Non-professional writing
W1B (6%): Written – Non-printed – Correspondence
W2A (8%): Written – Printed – Academic writing
W2B (8%): Written – Printed – Non-academic writing
W2C (4%): Written – Printed – Reportage
W2D (4%): Written – Printed – Instructional writing
W2E (2%): Written – Printed – Persuasive writing
W2F (4%): Written – Printed – Creative writing
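As a quick consistency check, the register proportions can be turned into sample counts per variety. This is a minimal sketch, assuming the percentages apply to the 500 samples of each variety (300 spoken + 200 written, as stated earlier in the talk); the variable names are illustrative only.

```python
# Register proportions (percent of samples) as listed on the slide.
ICE_PROPORTIONS = {
    "S1A": 20, "S1B": 16, "S2A": 14, "S2B": 10,
    "W1A": 4, "W1B": 6, "W2A": 8, "W2B": 8,
    "W2C": 4, "W2D": 4, "W2E": 2, "W2F": 4,
}

# Assumption: 500 text samples per variety (2,500 samples / 5 varieties).
SAMPLES_PER_VARIETY = 500

# Number of samples per register for one variety.
counts = {code: SAMPLES_PER_VARIETY * pct // 100
          for code, pct in ICE_PROPORTIONS.items()}

# Spoken registers start with "S", written registers with "W".
spoken = sum(n for code, n in counts.items() if code.startswith("S"))
written = sum(n for code, n in counts.items() if code.startswith("W"))

print(counts)
print("spoken:", spoken, "written:", written)
```

Under this assumption the spoken and written totals come out at 300 and 200, matching the design described for each variety.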
141 linguistic features covered
• A) Nouns: 21 categories, e.g.
– nominalisation, other nouns; 19 semantic classes of
nouns (e.g. evaluations, speech acts)
• B) Verbs: 28 categories, e.g.
– Do as pro-verb, be as main verb, tense and aspect
markers, modals, passives, 16 semantic categories of
verbs
• C) Pronouns: 10 categories, e.g.
– Person, case, demonstrative
• D) Adjectives: 11 categories, e.g.
– Attributive vs. predicative use, 9 semantic categories
141 linguistic features covered
• E) Adverbs: 7 categories
• F) Prepositions (2 categories)
• G) Subordination (3 categories)
• H) Coordination (2 categories)
• I) WH-questions / clauses (2 categories)
• J) Nominal post-modifying clauses (5 categories)
• K) THAT-complement clauses (3 categories)
• L) Infinitive clauses (3 categories)
• M) Participle clauses (2 categories)
• N) Reduced forms and dispreferred structures (4
categories)
• O) Lexical and structural complexity (3 categories)
141 linguistic features covered
• P) Quantifiers (4 categories)
• Q) Time expressions (11 categories)
• R) Degree expressions (8 categories)
• S) Negation (2 categories)
• T) Power relationship (4 categories)
• U) Definiteness (2 categories)
• V) Helping/hindrance (2 categories)
• X) Linear order (1 category)
• Y) Seem / Appear (1 category)
• Z) Discourse bin (1 category)
Procedure of data analysis
• 1) Data clean-up
• 2) Grammatical and semantic tagging with Wmatrix
• 3) Extracting the frequencies of 141 linguistic features
from 2,500 corpus files
• 4) Building a profile of normalised frequencies (per 1,000
words) for each linguistic feature
• 5) Factor analysis
– Factor extraction (Principal Factor Analysis)
– Factor rotation (Promax)
– Optimum structure: 9 factors
• 6) Interpreting extracted factors
• 7) Computing factor scores
• 8) Using the enhanced MDA model in exploration of
variation across registers and language varieties
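Step 4 above (normalised frequencies per 1,000 words) can be sketched as follows. This is only an illustration: the function name, the feature labels used as dictionary keys, and the counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def normalise_per_thousand(raw_counts: Dict[str, int],
                           total_words: int) -> Dict[str, float]:
    """Convert raw feature counts for one text into rates per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return {feature: count * 1000.0 / total_words
            for feature, count in raw_counts.items()}


# Hypothetical raw counts for one 2,000-word sample (illustrative values only).
sample_counts = {"nominalisation": 46, "modals": 12, "passives": 30}

profile = normalise_per_thousand(sample_counts, total_words=2000)
print(profile)
```

Normalising in this way makes feature frequencies comparable across samples of different lengths before they enter the factor analysis.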
The enhanced MDA model
• Nine factors established in the new model
– 1) Interactive casual discourse vs. informative
elaborate discourse
– 2) Elaborative online evaluation
– 3) Narrative concern
– 4) Human vs. object description
– 5) Future projection
– 6) Personal impression and judgement
– 7) Lack of temporal / locative focus
– 8) Concern with degree and quantity
– 9) Concern with reported speech
• Robustness of the model in register analysis
5 English varieties across 9 factors
[Figure: factor scores of the five varieties (GB, HK, IN, PH, SG) on Factors 1–9]
• Both differences and similarities
• This general picture may blur many register-based subtleties
– Language can vary across registers even more substantially than
across language varieties (cf. Biber 1995)
1) Interactive casual discourse vs.
informative elaborate discourse
[Figure: Factor 1 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 9.04, 4 d.f., p < 0.001]
• Indian English displays the lowest score in nearly all registers: it is less
interactive but more elaborate
– Sanyal (2007): “clumsy Victorian English [that] hangs like a dead Albatross
around each educated Indian’s neck”
• Modern BrE appears to be most interactive and least elaborate (e.g. S1A,
S1B, W2D)
• The three varieties of English used in East and Southeast Asia are very similar
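The F values reported with 4 d.f. on this and the following slides are consistent with a one-way comparison across the five varieties, although the slides do not name the test. Assuming a one-way ANOVA, the statistic can be sketched as below; the function name is hypothetical and the scores are toy values, not data from the study.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, between-groups d.f., within-groups d.f.) for a one-way ANOVA."""
    samples = list(groups.values())
    k = len(samples)                              # number of groups
    n_total = sum(len(g) for g in samples)        # total observations
    grand_mean = sum(sum(g) for g in samples) / n_total

    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in samples)
    # Within-groups sum of squares: deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in samples for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Toy factor scores for five varieties (illustrative values only).
toy_scores = {
    "GB": [1.0, 2.0, 3.0],
    "HK": [2.0, 3.0, 4.0],
    "IN": [3.0, 4.0, 5.0],
    "PH": [4.0, 5.0, 6.0],
    "SG": [5.0, 6.0, 7.0],
}

f_value, df_between, df_within = one_way_anova(toy_scores)
print(f"F = {f_value:.2f}, d.f. = ({df_between}, {df_within})")
```

With five groups the between-groups degrees of freedom are 5 - 1 = 4, which is the "4 d.f." shown next to each F value.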
2) Elaborative online evaluation
[Figure: Factor 2 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 14.13, 4 d.f., p < 0.001]
• BrE generally shows a higher score than non-native varieties of
English (e.g. W2A, W1B, S2B)
• Non-native English varieties tend to be very similar in most registers
3) Narrative concern
[Figure: Factor 3 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 7.97, 4 d.f., p < 0.001]
• BrE demonstrates a greater propensity for narrative concern
– Most noticeably in news reportage (W2C) and instructional writing (W2D)
• Indian English is least concerned with narrative
– Esp. in registers like correspondence (W1B), instructional writing (W2D),
and unscripted monologue (S2A)
4) Human vs. object description
[Figure: Factor 4 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 5.92, 4 d.f., p < 0.001]
• Very close in a number of registers
• Indian English and BrE show similarity in a greater range of registers
• HK and Singapore Englishes display great similarity
5) Future projection
[Figure: Factor 5 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 47.63, 4 d.f., p < 0.001]
• BrE has the highest score in all printed written registers (W2A–W2F)
• Indian English shows the lowest score in nearly all registers
6) Personal impression / judgement
[Figure: Factor 6 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 12.25, 4 d.f., p < 0.001]
• Very similar in many registers…with most noticeable differences in
non-printed written registers (W1A, W1B), non-academic writing
(W2B), and news reportage (W2C)
• HK English displays a distribution pattern similar to Singapore
English in spoken registers (S1A–S2B) and unpublished written
registers (W1A, W1B), but it is very close to Philippine English in
printed writing (W2A–W2F)
7) Lack of temporal / locative focus
[Figure: Factor 7 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 2.28, 4 d.f., p = 0.058]
• Overall difference is not significant statistically
– …but there are noticeable differences in some registers (e.g. W1B,
W2D)
• Indian English demonstrates a consistently higher score in spoken
registers (S1A–S2B)
– …but a lower score in unpublished writing (e.g. W1B)
8) Concern with degree / quantity
[Figure: Factor 8 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 24.32, 4 d.f., p < 0.001]
• BrE displays a higher score in nearly all registers
• HK English does not appear to be concerned with degree and quantity (e.g.
W2D)
• Similarly, Indian English lacks a focus on degree and quantity (e.g.
W1B)
9) Concern with reported speech
[Figure: Factor 9 scores by register (S1A–W2F) for GB, HK, IN, PH and SG; F = 1.51, 4 d.f., p = 0.196]
• Overall difference is not significant
• Noticeable difference in news reportage (W2C)
– East and Southeast Asian English varieties show a greater propensity for
concern with reported speech than BrE and Indian English
Summary and future research
• Summary
– Seeking to enhance Biber’s MDA model with
semantic components
– Introducing the new model into research on World
Englishes
• Directions for future research
– More native English varieties from the Inner Circle
– A wider and more balanced coverage of geographical
regions
– Including socio-culturally relevant semantic categories
– Combining corpora and more traditional resources in
socio-cultural studies and historical research
• …adequately descriptive + sufficiently explanatory…
Thank you!