What cross-linguistic variation
tells us about information
density in on-line processing
John A. Hawkins
UC Davis & University of
Cambridge
Patterns of variation across languages
provide relevant evidence for current issues
in psychology on information density in on-line processing.
2
Some background, first of all.
I have argued (Hawkins 1994, 2004, 2009, to
appear) for a ‘Performance-Grammar
Correspondence Hypothesis’:
3
Performance-Grammar Correspondence
Hypothesis (PGCH)
Languages have conventionalized grammatical
properties in proportion to their degree of
preference in performance, as evidenced by
patterns of selection in corpora and by ease of
processing in psycholinguistic experiments.
4
I.e. languages have conventionalized or ‘fixed’ in
their grammars the same kinds of preferences and
principles that we see in performance,
esp. in those languages in which speakers have
alternatives to choose from in language use
5
E.g. between:
alternative word orders
relative clauses with or without a relativizer,
with a gap or a resumptive pronoun
extraposed vs non-extraposed phrases
‘Heavy’ NP Shift or no shift
alternative ditransitive constructions
zero vs non-zero case markers
and so on
6
The patterns and principles found in these
selections are, according to the PGCH, the same
patterns and principles that we see in grammars in
languages with fewer conventionalized options
(more fixed orderings, gaps only in certain
relativization environments, etc).
7
If so, linguists developing theories of grammar and
of typological variation need to look seriously at
theories of processing, in order to understand which
structures are selected in performance, when, and
why, with the result that grammars come to
conventionalize these, and not other, patterns.
See Hawkins (2004, 2009, to appear)
8
Conversely, psychologists need to look at
grammars and at cross-linguistic variation in order
to see what they tell us about processing, since
grammars are conventionalized processing
preferences.
9
Alternative variants across grammars are also, by
hypothesis, alternatives for efficient processing.
And the frequency with which these alternatives are
conventionalized is, again by hypothesis, correlated with
their degree of preference and efficiency in processing.
10
Looking at grammatical variation from a processing
perspective can be revealing, therefore.
11
E.g. Japanese, Korean and the Dravidian languages do not move
heavy and complex phrases to the end of their clauses, as
English does; they move them to the beginning, in
proportion to their (relative) complexity.
If your psychological model predicts that all languages
should be like English, then you need to go back to the
drawing board and look at these different grammars, and at
their performance, before you define and test your model
further.
12
Which brings me to today’s topic:
What do grammars and typological variation tell us
about information density in on-line processing?
13
Let us define Information as:
the set of linguistic forms {F} (phonemes,
morphemes, words, etc) and the set of
properties {P} (ultimately semantic properties
in a semantic representation) that are
assigned to them by linguistic convention
and in processing.
14
Let us define Density as:
the number of these forms and properties
that are assigned at a particular point in
processing, i.e. the size of a given {Fi}-{Pi}
pairing at point … i … in on-line
comprehension or production.
15
I see evidence for two very general and
complementary principles of information density in
cross-linguistic patterns.
16
First, minimize {Fi}
minimize the set {Fi} required for the
assignment of a particular Pi or {Pi}
I.e. minimize the number of linguistic forms that
need to be processed at each point in order to
assign a given morphological, syntactic or
semantic property or set of properties to these
forms on-line.
17
The conditions that determine the degree of
permissible minimization can be inferred from the
patterns themselves and essentially involve
efficiency and ease of processing in the
assignment of {Pi} to {Fi}.
18
Examples will be given from morphological
hierarchies and from syntactic patterns such as
word order and filler-gap dependencies.
19
Second, maximize {Pi}
maximize the set {Pi} that can be
assigned to a particular Fi or {Fi}.
I.e. select and arrange linguistic forms so that as many as
possible of their (correct) syntactic and semantic properties
can be assigned to them at each point in on-line
processing.
20
A set of linear ordering universals will be
presented in which category A is systematically
preferred before B regardless of language type,
i.e. A + B. Positioning B first would always result in
incomplete or incorrect assignments of properties
to B on-line, whereas positioning it after A permits
the full assignment of properties to B at the time it
is processed.
These universals provide systematic evidence for
maximize {Pi}.
21
Consider first some grammatical patterns from morphology
that support the minimize {Fi} principle
minimize the set {Fi} required for the
assignment of a particular Pi or {Pi}
22
In Hawkins (2004) I formulated the following principle of
form minimization based on parallel data from cross-linguistic
variation and language-internal selection patterns.
23
Minimize Forms (MiF)
The human processor prefers to minimize the formal
complexity of each linguistic form F (its phoneme,
morpheme, word or phrasal units) and the number of
forms with unique conventionalized property
assignments, thereby assigning more properties to fewer
forms. These minimizations apply in proportion to the
ease with which a given property P can be assigned in
processing to a given F.
24
The basic premise of MiF is that the processing of linguistic
forms and their conventionalized property assignments
requires effort. Minimizing the forms required for property
assignments is efficient since it reduces that effort by
fine-tuning it to information that is already active in processing
through accessibility, high frequency, and inferencing
strategies of various kinds.
25
MiF is visible in two sets of variation data across and within
languages.
The first involves complexity differences between surface
forms (morphology and syntax), with preferences for
minimal expression (e.g. zero morphemes) in proportion to
their frequency of occurrence and hence ease of processing
through degree of expectedness (cf. Levy 2008, Jaeger
2006).
26
E.g. singular number for nouns is much more frequent than
plural, and absolutive case is more frequent than ergative.
Correspondingly singularity on nouns is expressed by
shorter or equal morphemes, often zero (cf. English cat vs.
cat-s), almost never by more. Similarly for absolutive and
ergative case marking.
27
A second data pattern captured in MiF involves the number
and nature of lexical and grammatical distinctions that
languages conventionalize.
The preferences are again in proportion to their efficiency,
including frequency of use.
28
There are preferred lexicalization patterns across
languages.
Certain grammatical distinctions are cross-linguistically
preferred:
certain numbers on nouns
certain tenses
aspects
causativity
some basic speech act types
thematic roles like Agent, Patient
etc
29
The result is numerous ‘hierarchies’ of lexical and
grammatical patterns
E.g. the famous color term hierarchy of Berlin & Kay
(1969), and the Greenbergian morphological hierarchies
30
Where we have comparative performance and grammatical
data for these hierarchies it is very clear that the
grammatical rankings (e.g. Singular > Plural) correspond to
a frequency/ease of processing ranking, with higher
positions receiving less or equal formal marking and more
or equal unique forms for the expression of that category
alone.
31
Form Minimization Prediction 1
The formal complexity of each F is reduced in proportion to
the frequency of that F and/or the processing ease of
assigning a given P to a reduced F (e.g. to zero).
32
The cross-linguistic effects of this can be seen in the
following Greenbergian (1966) morphological hierarchies
(with reformulations and revisions by the authors shown):
33
Sing > Plur > Dual > Trial/Paucal (for number)
[Greenberg 1966, Croft 2003]
Nom/Abs > Acc/Erg > Dat > Other (for case marking)
[Primus 1999]
Masc,Fem > Neut (for gender) [Hawkins 2004]
Positive > Comparative > Superlative [Greenberg 1966]
34
Greenberg pointed out that these grammatical hierarchies
define performance frequency rankings for the relevant
properties in each domain.
The frequencies of number inflections on nouns in a corpus
of Sanskrit, for example, were:
Singular = 70.3%; Plural = 25.1%; Dual = 4.6%
35
By MiF Prediction 1 we therefore expect:
For each hierarchy H the amount of formal marking (i.e.
phonological and morphological complexity) will be greater
or equal down each hierarchy position.
36
E.g. in (Austronesian) Manam:
3rd Singular suffix on nouns = 0
3rd Plural suffix = -di,
3rd Dual suffix = -di-a-ru
3rd Paucal = -di-a-to
(Lichtenberk 1983)
The amount of formal marking increases from singular to
plural, and from plural to dual, and is equal from dual to
paucal, in accordance with the hierarchy prediction.
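The Manam pattern can be checked mechanically. The following is a minimal sketch, assuming that "amount of formal marking" can be approximated by counting hyphen-separated morphemes in each suffix; that measure and the helper names are illustrative assumptions, not part of the original analysis.

```python
# Manam 3rd-person number suffixes as cited above (Lichtenberk 1983).
# An empty string stands for zero marking.
MANAM_SUFFIXES = [
    ("singular", ""),
    ("plural", "di"),
    ("dual", "di-a-ru"),
    ("paucal", "di-a-to"),
]


def morpheme_count(suffix: str) -> int:
    """Count hyphen-separated morphemes (an assumed proxy for formal marking)."""
    return len(suffix.split("-")) if suffix else 0


def marking_never_decreases(forms) -> bool:
    """True if marking is greater or equal at each lower hierarchy position."""
    sizes = [morpheme_count(suffix) for _, suffix in forms]
    return all(higher <= lower for higher, lower in zip(sizes, sizes[1:]))


MANAM_SIZES = [morpheme_count(suffix) for _, suffix in MANAM_SUFFIXES]
print(MANAM_SIZES)                              # [0, 1, 3, 3]
print(marking_never_decreases(MANAM_SUFFIXES))  # True
```

A phoneme or syllable count could be substituted for the morpheme count; the prediction only requires that the chosen measure never decreases down the hierarchy.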
37
Form Minimization Prediction 2
The number of unique F:P pairings in a language is
reduced by grammaticalizing or lexicalizing a given F:P in
proportion to the frequency and preferred expressiveness
of that P in performance.
38
In the lexicon the property associated with teacher is
frequently used in performance, that of teacher who is late
for class much less so. The event of X hitting Y is
frequently selected, that of X hitting Y with X’s right hand
less so.
The more frequently selected properties are
conventionalized in single lexemes or unique categories
and constructions. Less frequently used properties must
then be expressed through word and phrase combinations
and their meanings must be derived by semantic
composition.
39
This makes the expression of more frequently used
meanings shorter, that of less frequently used meanings
longer, and this pattern matches the first pattern of less
versus more complexity in the surface forms themselves
correlating with relative frequency.
Both patterns make utterances shorter and the
communication of meanings more efficient overall, which is
why I have collapsed them both into one common Minimize
Forms principle.
40
By MiF Prediction 2 we expect:
For each hierarchy H (A > B > C) if a language assigns at
least one morpheme uniquely to C, then it assigns at least
one uniquely to B; if it assigns at least one uniquely to B, it
does so to A.
41
E.g. a distinct Dual implies a distinct Plural and Singular in
the grammar of Sanskrit.
A distinct Dative implies a distinct Accusative and
Nominative in the case grammar of Latin and German
(or a distinct Ergative and Absolutive in Basque, cf. Primus
1999).
42
A unique number or case assignment low in the hierarchy
implies unique and differentiated numbers and cases in all
higher positions.
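The implicational reading of Prediction 2 can be stated as a simple test: the uniquely marked categories of a language must form an unbroken segment from the top of the hierarchy. A minimal sketch follows; the function name and the counterexample inventory are hypothetical and serve only as illustration.

```python
NUMBER_HIERARCHY = ["singular", "plural", "dual", "trial/paucal"]


def respects_hierarchy(distinct, hierarchy):
    """True if unique marking at any position implies it at all higher ones."""
    flags = [category in distinct for category in hierarchy]
    # For each adjacent pair, unique marking on the lower position
    # requires unique marking on the higher position.
    return all(higher or not lower for higher, lower in zip(flags, flags[1:]))


# Sanskrit, as described above: distinct singular, plural and dual.
SANSKRIT = {"singular", "plural", "dual"}
# Hypothetical inventory (not an attested language): a dual without a plural.
HYPOTHETICAL = {"singular", "dual"}

print(respects_hierarchy(SANSKRIT, NUMBER_HIERARCHY))      # True
print(respects_hierarchy(HYPOTHETICAL, NUMBER_HIERARCHY))  # False
```

The same function applies unchanged to the case hierarchy by passing a different list.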
43
I.e. grammars prioritize categories for unique formal
expression in each of these areas in proportion to their
relative frequency and preferred expressiveness.
This results in these hierarchies for conventionalized
categories whereby languages with fewer categories match
the performance frequency rankings of languages with
many.
44
By MiF Prediction 2 we also expect:
For each hierarchy H any combinatorial features that
partition references to a given position on H will result in
fewer or equal morphological distinctions down each lower
position of H.
45
E.g. when gender features combine with and partition
number, unique gender-distinctive pronouns often exist for
the singular and not for the plural
English he/she/it vs they
the reverse uniqueness is not found (i.e. with a gender-distinctive
plural, but gender-neutral singular).
46
More generally MiF Prediction 2 leads to a general principle
of cross-linguistic morphology:
Morphologization
A morphological distinction will be grammaticalized
in proportion to the performance frequency with
which it can uniquely identify a given subset of
entities {E} in a grammatical and/or semantic domain
D.
47
This enables us to make sense of ‘markedness reversals’.
E.g. in certain nouns in Welsh whose referents are much
more frequently plural than singular, like ‘leaves’ and
‘beans’, it is the singular form that is morphologically more
complex than the plural:
deilen ("leaf") vs. dail ("leaves")
ffäen ("bean") vs. ffa ("beans")
Cf. Haspelmath (2002:244).
48
All of these data provide support for our minimize {Fi}
principle:
minimize the set {Fi} required for the
assignment of a particular Pi or {Pi}
I.e. minimize the number of linguistic forms that need to be
processed at each point in order to assign a given
morphological, syntactic or semantic property or set of
properties to these forms on-line.
49
Either the surface forms of the morphology are reduced,
in proportion to frequency and/or ease of processing.
Or lexical and grammatical categories are given priority for
unique formal expression, in proportion to frequency
and/or preferred expression, resulting in reduced
morpheme and word combinations for their expression.
50
The result of both is more minimal forms in proportion to
frequency/ease of processing/preferred expressiveness, i.e.
fewer and shorter forms for the expression of the speakers’
preferred meanings in performance.
51
Consider now some patterns from syntax that support the
minimize {Fi} principle
minimize the set {Fi} required for the
assignment of a particular Pi or {Pi}
52
In Hawkins (2004) I formulated a second minimization
principle for the combination of forms and dependencies
between them based on parallel data from cross-linguistic
variation and language-internal selection patterns:
Minimize Domains (MiD).
53
Minimize Domains (MiD)
The human processor prefers to minimize the connected
sequences of linguistic forms and their conventionally
associated syntactic and semantic properties in which
relations of combination and/or dependency are
processed.
54
E.g. in order to recognize how the words of a sentence are
grouped together into phrases and into a hierarchical tree
structure the human parser prefers to access the smallest
possible linear string of words that enable it to make each
phrase structure decision:
the principle of Early Immediate Constituents (EIC)
(Hawkins 1994).
55
more generally the processing of all syntactic and semantic
relations prefers minimal domains (Hawkins 2004).
56
Minimize Domains predicts that each Phrasal Combination
Domain (PCD) should be as short as possible.
A PCD consists of the smallest amount of surface
structure on the basis of which the human processor can
recognize (and produce) a mother node M and assign
the correct daughter ICs to it, i.e. on the basis of which
phrase structure can be processed.
57
Some linear orderings reduce the number of words and
their associated properties that need to be accessed for this
purpose.
The degree of this preference is proportional to the
minimization difference for the same PCDs in competing
orderings.
58
I.e. linear orderings should be preferred that minimize
PCDs by maximizing their “IC-to-word” ratios.
The result will be a preference for short before long
phrases in head-initial languages like English.
59
(1) a. The man vp[waited pp1[for his son] pp2[in the cold but not unpleasant wind]]
       VP PCD: waited(1) for(2) his(3) son(4) in(5)
    b. The man vp[waited pp2[in the cold but not unpleasant wind] pp1[for his son]]
       VP PCD: waited(1) in(2) the(3) cold(4) but(5) not(6) unpleasant(7) wind(8) for(9)
The three items, V, PP1, PP2 can be recognized and constructed on
the basis of five words in (1a), compared with nine in (1b), assuming
that (head) categories such as P immediately project to mother
nodes such as PP, enabling the parser to construct them on-line.
(1a) VP PCD: IC-to-word ratio of 3/5 = 60%
(1b) VP PCD: IC-to-word ratio of 3/9 = 33%
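The IC-to-word ratios for (1a) and (1b) can be reproduced with a few lines of code. This is a minimal sketch for a head-initial phrase, assuming (as stated above) that each IC is constructed by its first word, so that the PCD runs from the first word of the first IC to the first word of the last IC. The function names are illustrative.

```python
def pcd_length_head_initial(ic_lengths):
    """Words in the PCD when every IC is constructed by its first word."""
    *earlier, _last = ic_lengths
    # All words of the non-final ICs, plus the constructing word of the last IC.
    return sum(earlier) + 1


def ic_to_word_ratio(ic_lengths):
    """Number of ICs divided by the number of words in the PCD."""
    return len(ic_lengths) / pcd_length_head_initial(ic_lengths)


# Word counts of V, PP1, PP2 in the two orders of example (1).
ORDER_1A = [1, 3, 7]  # waited | for his son | in the cold but not unpleasant wind
ORDER_1B = [1, 7, 3]  # waited | in the cold but not unpleasant wind | for his son

for label, order in (("1a", ORDER_1A), ("1b", ORDER_1B)):
    words = pcd_length_head_initial(order)
    print(label, words, f"{ic_to_word_ratio(order):.0%}")
# 1a 5 60%
# 1b 9 33%
```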
60
For experimental support (in production and
comprehension) for short before long effects in English,
see e.g. Stallings (1998), Gibson (1998), Wasow (2002).
61
A Corpus Study Testing MiD in English
Structures like (1ab) with vp{V, PP1, PP2} were examined
(Hawkins 2000) in which the two PPs were permutable
with truth-conditional equivalence (i.e. the speaker had
a choice).
Only 15% (58/394) had long before short. Among those
with at least a one-word weight difference, 82% had
short before long, and there was a gradual reduction in
the long before short orders the bigger the weight
difference (PPS = shorter PP, PPL = longer PP):
62
(2)                        [V PPS PPL]    [V PPL PPS]
    PPL > PPS by 1 word    60% (58)       40% (38)
              by 2-4       86% (108)      14% (17)
              by 5-6       94% (31)        6% (2)
              by 7+        99% (68)        1% (1)
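The summary figures quoted for this corpus study follow directly from the cell counts in (2). The sketch below recomputes them; the variable names are illustrative, and the reading that the sequences not listed in (2) are equal-length PP pairs is an inference from the text, not something the table states.

```python
# (weight difference, short-before-long count, long-before-short count) from (2)
PP_ORDER_COUNTS = [
    ("1 word", 58, 38),
    ("2-4 words", 108, 17),
    ("5-6 words", 31, 2),
    ("7+ words", 68, 1),
]
TOTAL_SEQUENCES = 394  # all permutable sequences, per the text above


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


SHORT_FIRST = sum(short for _, short, _ in PP_ORDER_COUNTS)
LONG_FIRST = sum(long_ for _, _, long_ in PP_ORDER_COUNTS)
ROW_PCTS = [pct(short, short + long_) for _, short, long_ in PP_ORDER_COUNTS]

print(ROW_PCTS)                                     # [60, 86, 94, 99]
print(pct(LONG_FIRST, TOTAL_SEQUENCES))             # 15
print(pct(SHORT_FIRST, SHORT_FIRST + LONG_FIRST))   # 82
```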
63
For head-final languages long before short orders provide minimal
domains for processing phrase structure:
(3) a. Mary ga [[kinoo     John ga kekkonsi-ta to]s   it-ta]vp
       Mary SU   yesterday John SU married     that   said
       ‘Mary said that John got married yesterday’
    b. [kinoo John ga kekkonsi-ta to]s Mary ga [it-ta]vp
64
Why?
Because placing longer before shorter phrases in
Japanese positions constructing categories or heads (V,
P, Comp, etc) close, or as close as possible, to each
other, each being on the right of their respective phrasal
sisters.
Result: PCDs are smaller
65
(4) Some basic word orders of Japanese grammar
    a. Taroo ga vp[tegami o  kaita]             NP-V
       T.    SU    letter DO wrote
       'Taroo wrote a letter'
    b. Taroo ga pp[Tokyo kara] ryokoosita       NP-P
       T.    SU    Tokyo from  travelled
       'Taroo travelled from Tokyo'
    c. np[[Taroo no] ie]                        Gen-N
           Taroo 's  house
The heavier phrasal categories, e.g. NPs, occur to the left
of their single-word (shorter) heads in Japanese, e.g.
before V and P, and P and V are adjacent on the right of
their respective sisters
66
For experimental and corpus support for long before short
phrases in Japanese and Korean when there is a plurality of
phrases before V, see Hawkins (1994, 2004), Yamashita &
Chang (2001, 2006), Choi (2007)
67
An early corpus study testing long before short in Japanese
(Hawkins 1994):
[{NPo, PPm} V]
(5) a. (Tanaka ga) [[Hanako kara]pp [sono hon o]np katta]vp
        Tanaka SU    Hanako from     that book DO  bought
       'Tanaka bought that book from Hanako'
    b. (Tanaka ga) [[sono hon o]np [Hanako kara]pp katta]vp
68
ICS = shorter Immediate Constituent; ICL = longer Immediate
Constituent; regardless of NP or PP status
(6)                          [ICS ICL V]    [ICL ICS V]
    ICL > ICS by 1-2 words   34% (30)       66% (59)
              by 3-4         28% (8)        72% (21)
              by 5-8         17% (4)        83% (20)
              by 9+           9% (1)        91% (10)
Data from Hawkins (1994:152), collected by Kaoru Horie.
I.e. the bigger the weight difference, the more the
heavy phrase occurs to the left; the mirror-image
of English
69
Given these data from performance, we can now better
understand:
(a) the Greenbergian word order correlations
(b) why there are two, and only two, productive word order
types cross-linguistically, head-initial and head-final
(c) why and when there are “exceptional” departures from
the expected head-initial and head-final orders
70
The "Greenbergian" word order correlations (Greenberg
1963, Dryer 1992)
(7) vp{V, pp{P, NP}}
    a. vp[travels pp[to the city]]      PCD: travels to
    b. [[the city to]pp travels]vp      PCD: to travels
    c. vp[travels [the city to]pp]      PCD: travels the city to
    d. [pp[to the city] travels]vp      PCD: to the city travels
The adjacency of V and P in (a) and (b) guarantees the smallest possible string of
words for the recognition and construction of VP and its two
constituents (V and PP); the PCD beside each order marks that string.
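The contrast between the four orders in (7) can be made explicit by measuring the VP's PCD directly. The sketch below assumes, as a simplification, that V constructs VP and P constructs PP, so the PCD is the span of words from the first of these two heads to the last. The function name is illustrative.

```python
def vp_pcd(words, verb, adposition):
    """Words from the first to the last constructing head, inclusive."""
    first, last = sorted((words.index(verb), words.index(adposition)))
    return words[first:last + 1]


# The four logically possible orders of example (7).
ORDERS = {
    "a": ["travels", "to", "the", "city"],
    "b": ["the", "city", "to", "travels"],
    "c": ["travels", "the", "city", "to"],
    "d": ["to", "the", "city", "travels"],
}

PCD_SIZES = {label: len(vp_pcd(words, "travels", "to"))
             for label, words in ORDERS.items()}
print(PCD_SIZES)  # {'a': 2, 'b': 2, 'c': 4, 'd': 4}
```

The consistently head-initial (a) and head-final (b) orders both yield two-word domains, while the mixed orders (c) and (d) double the domain even with this short NP; a longer NP would widen the gap further.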
71
Language Quantities in Matthew Dryer's (1992) Cross-linguistic Sample
(8)
a. vp[V pp[P NP]] = 161 (41%)
b. [[NP P]pp V]vp = 204 (52%)
c. vp[V [NP P]pp] = 18 (5%)
d. [pp[P NP] V]vp = 6 (2%)
Preferred (a)+(b) with consistent ‘head’ ordering = 365/389 (94%)
72
Both head-initial (English) and head-final (Japanese)
orders can be equally efficient for processing: whether
heads are adjacent to one another on the left of their
respective sisters (English), or on the right (Japanese),
hence two and only two highly productive word order
types, as predicted by MiD
73
MiD helps us to understand these cross-linguistic patterns
and their frequencies. It also enables us to explain some
systematic grammatical exceptions to these head-ordering
universals.
74
Dryer (1992): there are exceptions to the preferred
consistent head ordering when the category that modifies a
head is a single-word item, e.g. an adjective modifying a
noun (yellow book).
75
Many otherwise head-initial languages have non-initial
heads with the adjective preceding the noun here (e.g.
English), and many otherwise head-final languages have
noun before adjective (e.g. Basque).
BUT when the non-head is a branching phrasal category
(e.g. adjective phrase, cf. English books yellow with age)
there are good correlations with the predominant head
ordering.
Why?
76
When heads are separated by a non-branching single
word, then the difference between, say,
vp[V [Adj N]np] and vp[V np[N Adj]]
[read [yellow book]]
[read [book yellow]]
is short, only one word. Hence the MiD preference for
noun initiality (and for noun-finality in postpositional
languages) is significantly less than it is for intervening
branching phrases, and either less head ordering
consistency or no consistency is predicted
77
English [yellow book] but [book [yellow with age]]
Romance languages have both prenominal and
postnominal adjectives
French grand homme / homme grand
but postnominal adjective phrases, as in English
78
Similarly, when there is just a one-word difference between
competing domains in performance, e.g. in the corpus data
of English and Japanese above, both ordering options are
generally productive, and so too in grammars.
79
Center embedding hierarchies and EIC
The more complex a center-embedded constituent and the longer the
PCD for its containing phrase, the fewer the languages that have it.
E.g. in the environment pp[P np[__ N]] we have a center-embedding
hierarchy, cf. Hawkins (1983).
(9) Prep lgs:   AdjN    32%    NAdj    68%
                PosspN  12%    NPossp  88%
                RelN     1%    NRel    99%
    Mary traveled pp[to np[interesting cities]]           AdjN
                        np[[this country’s] cities]]      PosspN
                        np[[I already visited] cities]]   RelN
80
I.e. The Greenbergian word order universals support
domain minimization and locality (Hawkins 2004, Gibson
1998).
There are minor and predicted departures from consistent
ordering and head adjacency, as we have seen.
There are also certain conflicts between MiD and other
ease of processing principles, e.g. Fillers before Gaps,
which result in e.g. NRel in certain (non-rigid) OV languages
(Hawkins 2004, to appear).
81
Apart from these, I see no evidence in grammars for any
preference for “non-locality” of the kind that certain
psycholinguists have argued for based on experimental
evidence with head-final languages (e.g. Konieczny 2000,
Vasishth & Lewis 2006).
E.g. Konieczny showed in a self-paced reading experiment
in German that the verb is read systematically faster when a
NRel precedes it, in proportion to the length of Rel.
82
This finding makes sense in terms of expectedness and
predictability (Levy 2008, Jaeger 2006): the longer you
have to wait for a verb in a verb-final structure, the more
you expect to find one, making verb recognition easier.
However, Konieczny found no evidence for this facilitation
at the verb in his German corpus data (Uszkoreit et al.
1998). Instead the predictions made for the relevant
structures by MiD and locality were strongly confirmed.
83
In fact, corpus studies quite generally do not support
non-locality: none of the data from numerous typologically
diverse language corpora reported in Hawkins (1994, 2004)
support it.
84
Nor do word order universals support it. The Greenbergian
correlations strongly support locality, and the exceptions to
Greenberg involve either small single-word non-localities or
competitions with independently motivated preferences
that do produce some non-localities in certain language
types – but not because non-locality is a good thing!
85
The experimental evidence for greater ease of processing
at the verb appears to be evidence, therefore, for a certain
facilitation (arguably through predictability) at a single
temporal point in sentence processing: it tells us nothing
about processing load for the structure as a whole, and it
does not implicate any preference for non-locality as such.
86
Corpus data appear to reflect these overall processing
advantages for alternative structures within which the verb
may appear early or late. The predictions for these
alternations are based squarely on the preferred locality of
phrasal daughters and these predictions are empirically
correct (Konieczny 2000, Uszkoreit et al. 1998).
Non-locality arises only when the locality demands of two
phrases are in conflict and cannot be satisfied at the same
time.
E.g. if N is adjacent to its Rel in German, then N is
separated from a final V.
87
Grammars also support locality in word order universals
and provide no evidence for non-locality as an independent
factor.
Let us turn now to relative clauses and look at the
cross-linguistic evidence for form and domain minimization in this
area.
88
Relative clauses in many languages (e.g. Hebrew) exhibit both a 'gap'
and a 'resumptive pronoun' structure:
(10) a. the studentsi [that I teach Oi]       Gap
     b. the studentsi [that I teach themi]    Resumptive Pronoun
In English we find relative clauses with and without a relative pronoun:
(11) a. the studentsi [whomi I teach Oi]      Relative Pronoun
     b. the studentsi [Oi I teach Oi]         Zero Relative
89
Patterns in Performance
The retention of the relative pronoun in English is correlated, inter alia,
with the degree of separation of the relative clause from its head noun:
the bigger the separation, the more the rel pros are retained (Quirk
1957, Hawkins 2004:153).
90
(12) a. [the studentsi [whomi I teach Oi]] visited me                    Rel Pro = 60%
     b. [the studentsi [Oi I teach Oi]] visited me                       Zero Rel = 40%
(13) a. [the studentsi (from Denmark) [whomi I teach Oi]] visited me     Rel Pro = 94%
     b. [the studentsi (from Denmark) [Oi I teach Oi]] visited me        Zero Rel = 6%
(14) a. [the studentsi (from Denmark)] visited me [whomi I teach Oi]     Rel Pro = 99%
     b. [the studentsi (from Denmark)] visited me [Oi I teach Oi]        Zero Rel = 1%
91
The Hebrew gap is favored when the distance between head
and gap is small, cf. Ariel (1999):
(15) a. Shoshana hi [ha-ishai [she-nili  ohevet Oi]]      Gap
        Shoshana is the-woman  that-Nili loves
     b. Shoshana hi [ha-ishai [she-nili  ohevet otai]]    Res Pro
                               that-Nili loves  her
(15a) Gap = 91% (15b) Res Pro = 9%
92
Resumptive pronouns in Hebrew become more frequent in more
complex relatives with bigger distances between the head and the
position relativized on, as in (16b):
(16) a. Shoshana hi ha-ishai [she-dani siper [she-moshe rixel [she-nili ohevet Oi]]]
b. Shoshana hi ha-ishai [she-dani siper [she-moshe rixel [she-nili ohevet otai]]]
Shoshana is the-woman that-Danny said that-Moshe gossiped that-Nili loves (her)
With just 3+ words separating the head and the position relativized on (i.e. gap
or resumptive pronoun), there are many more pronouns (Ariel 1999):
(16a) Gap = 58%
(16b) Res Pro = 42%
93
Relative clauses with larger domains are more complex and
harder to process. The harder to process relatives have the
less minimal and more explicit form, in accordance with our
minimize {Fi} principle above.
94
Specifically, the explicit resumptive pronoun makes the relative
easier to process because the position relativized on is now
explicitly signaled and flagged, in contrast to the zero gap, and
because the explicit pronoun shortens various domains for
processing combinatorial and dependency relations within the
relative clause (these processes must otherwise access the head
noun itself), cf. Hawkins (2004)
95
A Cross-linguistic Universal: the Accessibility
Hierarchy
Keenan & Comrie (1977) proposed an Accessibility
Hierarchy (AH) for universal rules of relativization on
different structural positions within a clause:
Subjects > Direct Objects > Indirect Objects/Obliques > Genitives
(17) a. the professori [that Oi/hei wrote the letter]                     SU
     b. the professori [that the student knows Oi/himi]                   DO
     c. the professori [that the student showed the book to Oi/himi]      IO/OBL
     d. the professori [that the student knows Oi/hisi son]               GEN
96
Relative clauses "cut off" (may cease to apply) down AH, cf. (18): if a
language can form a relative clause on any low position, it can
(generally) relativize on all higher positions.
(18) SU only:                   Malagasy, Maori
     SU & DO only:              Kinyarwanda, Indonesian
     SU & DO & IO/OBL only:     Basque, Catalan
     SU & DO & IO/OBL & GEN:    English, Hausa
(19) ny mpianatrai [izay nahita ny vehivavy Oi]    (Malagasy)
     the student    that saw    the woman
     'the student that saw the woman' (NOT the student that the woman saw)
97
Distribution of gaps to resumptive pronouns across
languages also follows the AH with gaps higher and
pronouns lower:
If a gap occurs low on the hierarchy, it occurs all the way up;
if a pronoun occurs high, it occurs all the way down.
98
Languages Combining Gaps with Resumptive Pronouns
(data from Keenan-Comrie 1977)
                    SU          DO          IO/OBL      GEN
Aoban               gap         pro         pro         pro
Arabic              gap         pro         pro         pro
Gilbertese          gap         pro         pro         pro
Kera                gap         pro         pro         pro
Chinese (Peking)    gap         gap/pro     pro         pro
Genoese             gap         gap/pro     pro         pro
Hebrew              gap         gap/pro     pro         pro
Persian             gap         gap/pro     pro         pro
Tongan              gap         gap/pro     pro         pro
Fulani              gap         gap         pro         pro
Greek               gap         gap         pro         pro
Welsh               gap         gap         pro         pro
Zurich German       gap         gap         pro         pro
Toba Batak          gap         *           pro         pro
Hausa               gap         gap         gap/pro     pro
Shona               gap         gap         gap/pro     pro
Minang-Kabau        gap         *           */pro       pro
Korean              gap         gap         gap         pro
Roviana             gap         gap         gap         pro
Turkish             gap         gap         gap         pro
Yoruba              gap         gap         0           pro
Malay               gap         gap         RP          pro
Javanese            gap         *           *           pro
Japanese            gap         gap         gap         gap/pro

Gaps     =          24 [100%]   17 [65%]    6 [26%]     1 [4%]
Res Pros =          0 [0%]      9 [35%]     17 [74%]    24 [96%]
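The two implicational statements about gaps and pronouns can be tested row by row. Below is a minimal sketch using five rows from the table above (languages without "*" cells, to keep the check simple); the function name and the violating pattern are hypothetical and included only for illustration.

```python
AH = ["SU", "DO", "IO/OBL", "GEN"]

# Five rows copied from the table above, in AH order.
SAMPLE = {
    "Aoban":    ["gap", "pro", "pro", "pro"],
    "Hebrew":   ["gap", "gap/pro", "pro", "pro"],
    "Greek":    ["gap", "gap", "pro", "pro"],
    "Korean":   ["gap", "gap", "gap", "pro"],
    "Japanese": ["gap", "gap", "gap", "gap/pro"],
}


def follows_ah(strategies):
    """Gaps must extend all the way up the AH; pronouns all the way down."""
    cells = [set(cell.split("/")) for cell in strategies]
    gaps = ["gap" in cell for cell in cells]
    pros = ["pro" in cell for cell in cells]
    # A gap at a lower position requires a gap at the next higher position.
    gaps_ok = all(higher or not lower for higher, lower in zip(gaps, gaps[1:]))
    # A pronoun at a higher position requires one at the next lower position.
    pros_ok = all(lower or not higher for higher, lower in zip(pros, pros[1:]))
    return gaps_ok and pros_ok


RESULTS = {language: follows_ah(row) for language, row in SAMPLE.items()}
print(all(RESULTS.values()))                      # True
# Hypothetical pattern (not in the table): pronoun on SU, gaps below it.
print(follows_ah(["pro", "gap", "gap", "gap"]))   # False
```

Extending the check to the full table would require a decision on how to treat "*" cells (positions that cannot be relativized), which this sketch deliberately leaves open.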
99
Keenan-Comrie argued that these grammatical patterns
were ultimately explainable by declining ease of processing
down the AH
They hypothesized that the AH was a complexity ranking
Cf. Hawkins 1999, 2004:177-190, to appear for elaboration in terms of Minimize
Forms and Minimize Domains
100
Keenan (1987) gave data from English corpora showing
declining frequencies of relative clause usage correlating
with the AH positions relativized on
101
Experimental evidence for SU > (easier than) DO
relativization (English)
Wanner & Maratsos (1978): first pointed to greater processing load for
DO rels
Ford (1983): longer lexical decision times in DO rels
King & Just (1991): lower comprehension accuracy and longer lexical
decision times in self-paced reading experiments
Pickering & Shillcock (1992): significant reaction time differences in
self-paced reading experiments, both within and across clause
boundaries (i.e. for embedded and non-embedded gap positions)
King & Kutas (1992, 1993): neurolinguistic support using ERPs
Traxler et al (2002): eye movement study controlling also for agency
and animacy
Frauenfelder et al (1980) and Holmes & O'Regan (1981): similar (SU >
DO) results for French
Kwon et al (2010): for an eye-tracking study of Korean and a recent
literature review of the SU/DO asymmetry in English and other lgs
102
Let us take stock
We see in these studies a clear correlation between performance data
measuring preferred selections in corpora and ease of processing in
experiments, on the one hand, and the fixed conventions of grammars in
languages with fewer options:
● SU relatives have been shown to be easier to process than DO in English
and certain other lgs - correspondingly lgs like Malagasy only have the SU
option
● the distribution of resumptive pronouns to gaps across grammars follows
the AH ranking, with pronouns in the more difficult environments, and gaps
in the easier ones: this reverse implicational hierarchy appears to be
structured by ease of processing
103
All of these data, morphological and
syntactic, support minimize {Fi}, in
proportion to the ease with which a given
property Pi can be assigned in processing
to a given Fi.
104
Let us turn now to our second principle of Information
Density, maximize {Pi}.
maximize the set {Pi} that can be
assigned to a particular Fi or {Fi}.
105
In Hawkins (2004) I argued for a further very general
principle of efficiency, in addition to Minimize Forms and
Minimize Domains: Maximize On-line Processing.
There is a clear preference for selecting and arranging
linguistic forms so as to provide the earliest possible access
to as much of the ultimate syntactic and semantic
representation as possible.
106
This principle also results in a preference for error-free on-line processing, since errors delay the assignment of
intended properties and increase processing effort.
107
Maximize On-line Processing (MaOP)
The human processor prefers to maximize the set of properties
that are assignable to each item X as X is processed, thereby
increasing O(n-line) P(roperty) to U(ltimate) P(roperty) ratios. The
maximization difference between competing orders and structures
will be a function of the number of properties that are unassigned
or misassigned to X in a structure/sequence S, compared with the
number in an alternative.
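The comparison that MaOP describes can be made concrete with a small calculation. This is only a sketch under stated assumptions: the ratio at each item is taken to be the cumulative number of properties assigned so far divided by the total assigned by the end, the mean of these ratios is used as a summary, and the property counts for the two orderings are hypothetical.

```python
from itertools import accumulate


def op_to_up_ratios(assigned_per_item):
    """Cumulative on-line property (OP) to ultimate property (UP) ratios.

    `assigned_per_item[i]` is the number of properties that can be assigned
    when item i is processed. UP is the total assigned by the final item.
    """
    total = sum(assigned_per_item)
    return [running / total for running in accumulate(assigned_per_item)]


def mean_ratio(assigned_per_item):
    """Average OP-to-UP ratio across all items (a summary chosen for this sketch)."""
    ratios = op_to_up_ratios(assigned_per_item)
    return sum(ratios) / len(ratios)


# Hypothetical counts for two orderings of the same four items.
# In `early`, properties are assignable as soon as each item arrives;
# in `late`, several remain unassigned until the final item.
early = [2, 2, 2, 2]
late = [1, 1, 1, 5]

print(op_to_up_ratios(early))
print(op_to_up_ratios(late))
print(mean_ratio(early) > mean_ratio(late))
```

On these invented counts the ordering that delivers properties earlier has the higher average ratio, which is the sense in which it would be preferred.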
108
Clear examples can be seen across languages when
certain common categories {A, B} are ordered
asymmetrically A + B, regardless of the language type, in
contrast to symmetries in which both orders are
productive [A+B/B+A], e.g. Verb+Object [VO] and
Object+Verb [OV].
Some examples of asymmetries are summarized below:
109
Some Asymmetries (Hawkins 2002, 2004)
(i) Displaced WH preposed to the left of its (gap-containing) clause
[almost exceptionless]
Who_i [did you say O_i came to the party]
(ii) Head Noun (Filler) to the left of its (gap-containing) Relative Clause
E.g. the students_i [that I teach O_i]
If a lg has basic VO, then NRel [exceptions = rare] (Hawkins 1983)
    VO                 OV
    NRel (English)     NRel (Persian)
    *RelN              RelN (Japanese)
110
(iii) Antecedent precedes Anaphor [highly preferred cross-linguistically]
E.g. John washed himself (SVO), Washed John himself (VSO),
John himself washed (SOV) = highly preferred over e.g. Washed
himself John (VOS)
(iv) Wide Scope Quantifier/Operator precedes Narrow Scope Q/O [preferred]
E.g. Every student a book read (SOV lgs): wide scope for "every student" preferred
A book every student read (SOV lgs): wide scope for "a book" preferred
111
In these examples there is an asymmetric dependency of B on A: the
gap is dependent on the head-noun filler in (ii) (for gap-filling), the
anaphor on its antecedent in (iii) (for co-indexation), the narrow
scope quantifier on the wide scope quantifier in (iv) (the number of
books read depends on the quantifier in the subject NP in Every
student read a book/Many students read a book/Three students read
a book, etc).
112
The assignment of dependent properties to B is more efficient when A
precedes, since these properties can be assigned to B immediately in
on-line processing. In the reverse B + A there will be delays in property
assignments on-line ("unassignments") or misanalyses
("misassignments").
If the relative clause precedes the head noun the gap is not immediately
recognized and there are delays in argument structure assignment
within the relative clause; if a narrow scope quantifier precedes a wide
scope quantifier, a wide scope interpretation will generally be
(mis)assigned on-line to the narrow scope quantifier; and so on.
113
I have argued that MaOP (in the form of Fillers before Gaps)
competes with Minimize Domains to give asymmetries in relative
clause ordering:
a head before relative clause preference is visible in both VO and
OV languages, with only rigid V-final languages resisting this
preference to any degree (Hawkins 2004:203-10).
114
               MiD    MaOP
VO & NRel:      +      +
VO & RelN:      -      -
OV & RelN:      +      -
OV & NRel:      -      +
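The scoring of the four combinations above can be derived mechanically from the two principles. The sketch below assumes that MiD is satisfied when the two orders are consistently head-initial (VO with NRel) or consistently head-final (OV with RelN), and that MaOP, in the form of Fillers before Gaps, is satisfied whenever the head noun precedes the relative clause. The function names are hypothetical.

```python
def mid_satisfied(verb_object, noun_rel):
    """MiD is treated here as satisfied when head ordering is consistent:
    VO with NRel (both head-initial) or OV with RelN (both head-final)."""
    return (verb_object == "VO") == (noun_rel == "NRel")


def maop_satisfied(noun_rel):
    """MaOP (Fillers before Gaps) is satisfied when the head noun
    precedes the relative clause."""
    return noun_rel == "NRel"


def score(verb_object, noun_rel):
    """Return the (MiD, MaOP) pair as '+' or '-' for one combination."""
    mark = lambda ok: "+" if ok else "-"
    return mark(mid_satisfied(verb_object, noun_rel)), mark(maop_satisfied(noun_rel))


for vo, nrel in [("VO", "NRel"), ("VO", "RelN"), ("OV", "RelN"), ("OV", "NRel")]:
    mid, maop = score(vo, nrel)
    print(f"{vo} & {nrel}: MiD {mid}  MaOP {maop}")
```

Under these assumptions only VO & NRel satisfies both principles and only VO & RelN satisfies neither, while the two OV combinations each satisfy one.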
115
WALS data (Dryer 2005ab):
                  Rel-Noun     Noun-Rel or Mixed/Other
Rigid SOV         50% (17)     50% (17)
Non-rigid SOV      0% (0)      100% (17)
VO                 3% (3)      97% (116)
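The percentages can be checked against the language counts given in parentheses. This short sketch recomputes each row's shares from the counts, on the assumption that the values pair with the rows in the order Rigid SOV, Non-rigid SOV, VO.

```python
# (Rel-Noun count, Noun-Rel or Mixed/Other count) per language type,
# taken from the counts in parentheses in the WALS figures above.
counts = {
    "Rigid SOV": (17, 17),
    "Non-rigid SOV": (0, 17),
    "VO": (3, 116),
}


def shares(rel_noun, other):
    """Return both counts as whole-number percentages of the row total."""
    total = rel_noun + other
    return round(100 * rel_noun / total), round(100 * other / total)


for language_type, (rel_noun, other) in counts.items():
    rel_pct, other_pct = shares(rel_noun, other)
    print(f"{language_type}: Rel-Noun {rel_pct}%, Noun-Rel or Mixed/Other {other_pct}%")
```

The recomputed shares agree with the reported percentages once rounded to whole numbers.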
116
Language Variation in Psycholinguistics
What this all means for psycholinguistics is that
grammatical patterns and rules provide data that can
inform language processing theories (Hawkins 2007, Jaeger &
Norcliffe 2009).
Conversely, processing can help us understand
grammars better.
117
We can now give an explanation for what has been simply
observed and stipulated so far in grammatical models,
e.g. the existence of a head ordering parameter, with
head-initial (VO) and head-final (OV) lgs being roughly
equally productive:
they are equally efficient for processing whether adjacent
heads occur on the left of their sisters (English), or on the
right (Japanese).
118
Performance data motivate the Accessibility Hierarchy for
relative clause formation, the cut-offs for relativization, the
reverse implicational patterns for gaps and resumptive
pronouns, and numerous other regularities and language-particular subtleties (Hawkins 1999, 2004, to appear).
119
This approach helps us understand exceptions to proposed
universals (involving e.g. differential ordering for single-word versus phrasal modifiers of heads).
I.e. linguists can benefit from the inclusion of processing
ideas in their theories and descriptions.
120
The leftward versus rightward movement of heavy phrases
in different language types is directly relevant for
processing theories, on the other hand (cf. the theory of de
Smedt 1994 which predicts only rightward movements).
As is the absence of any independent evidence for “anti-locality” in any word order universals.
121
For theories of information density we have seen lots of
cross-linguistic patterns and hierarchies in morphology and
syntax that support two complementary principles:
minimize {Fi} and maximize {Pi}
122
Minimize {Fi}
minimize the set {Fi} required for the
assignment of a particular Pi or {Pi}
in proportion to the processing ease with which each Pi can
be assigned.
123
Maximize {Pi}
Maximize the set {Pi} that can be
assigned to a particular Fi or {Fi}
at each point in on-line processing.
124
References
Ariel, M. (1999) 'Cognitive universals and linguistic conventions: The case of resumptive
pronouns', Studies in Language 23:217-269.
Choi, H.W. (2007) ‘Length and order: A corpus study of Korean dative-accusative
construction’, Discourse and Cognition 14: 207-27.
Croft, W. (1990) Typology and Universals, CUP, Cambridge.
de Smedt, K.J.M.J. (1994) 'Parallelism in incremental sentence generation', in G.
Adriaens & U. Hahn, eds., Parallelism in Natural Language Processing, Ablex,
Norwood, NJ.
Dryer, M.S. (1992) 'The Greenbergian word order correlations', Language 68: 81-138.
Dryer, M.S. (2005a) ‘Order of relative clause and noun’, in M. Haspelmath, M.S. Dryer,
D. Gil & B. Comrie, eds., The World Atlas of Language Structures, OUP, Oxford.
Dryer, M.S. (2005b) ‘Relationship between the order of object and verb and the order of
relative clause and noun’, in M. Haspelmath, M.S. Dryer, D. Gil & B. Comrie, eds.,
The World Atlas of Language Structures, OUP, Oxford.
Ford, M. (1983) 'A method of obtaining measures of local parsing complexity throughout
sentences', Journal of Verbal Learning and Verbal Behavior 22: 203-218.
Gibson, E. (1998) 'Linguistic complexity: Locality of syntactic dependencies', Cognition
68: 1-76.
Greenberg, J.H. (1963) 'Some universals of grammar with particular reference to the
order of meaningful elements', in J.H. Greenberg, ed., Universals of Language, MIT
Press, Cambridge, Mass.
Greenberg, J.H. (1966) Language Universals with Special Reference to Feature
Hierarchies, Mouton, The Hague.
Haspelmath, M. (2002) Morphology, Arnold, London.
Hawkins, J.A. (1983) Word Order Universals, Academic Press, New York.
125
Hawkins, J.A. (1994) A Performance Theory of Order and Constituency, CUP,
Cambridge.
Hawkins, J.A. (1999) 'Processing complexity and filler-gap dependencies', Language 75:
244-285.
Hawkins, J.A. (2000) 'The relative ordering of prepositional phrases in English: Going
beyond manner-place-time', Language Variation and Change 11: 231-266.
Hawkins, J.A. (2004) Efficiency and Complexity in Grammars, OUP, Oxford.
Hawkins, J.A. (2007) ‘Processing typology and why psychologists need to know about it’,
New Ideas in Psychology 25: 87-107.
Hawkins, J.A. (2009) ‘Language universals and the performance-grammar
correspondence hypothesis’, in M.H. Christiansen, C. Collins & S. Edelman, eds.,
Language Universals, OUP, Oxford, 54-78.
Hawkins, J.A. (to appear) Cross-linguistic Variation and Efficiency, OUP, Oxford.
Holmes, V.M. & O'Regan, J.K. (1981) 'Eye fixation patterns during the reading of relative
clause sentences', Journal of Verbal Learning and Verbal Behavior 20: 417-430.
Jaeger, T.F. (2006) ‘Redundancy and syntactic reduction in spontaneous speech’,
Unpublished PhD dissertation, Stanford University, Stanford, CA.
Jaeger, T.F. & Norcliffe, E. (2009) ‘The cross-linguistic study of sentence production:
State of the art and a call for action’, Language and Linguistics Compass, Blackwell.
Just, M.A. & Carpenter, P.A. (1992) 'A capacity theory of comprehension: Individual
differences in working memory', Psychological Review 99:122-49.
Keenan, E.L. (1987) ‘Variation in Universal Grammar’, in E.L. Keenan Universal
Grammar: 15 Essays, Croom Helm, London, 46-59.
126
Keenan, E.L. & Hawkins, S. (1987) 'The psychological validity of the Accessibility
Hierarchy', in E.L. Keenan, Universal Grammar: 15 Essays, Croom Helm, London.
King, J. & Just, M.A. (1991) 'Individual differences in syntactic processing: The role of
working memory', Journal of Memory and Language 30: 580-602.
King, J. & Kutas, M. (1992) 'ERP responses to sentences that vary in syntactic
complexity: Differences between good and poor comprehenders', Poster, Annual
Conference of the Society for Psychophysiological Research, San Diego, CA.
King, J. & Kutas, M. (1993) 'Bridging gaps with longer spans: Enhancing ERP studies of
parsing', Poster presented at the Sixth Annual CUNY Sentence Processing
Conference, University of Massachusetts, Amherst.
Konieczny, L. (2000) ‘Locality and parsing complexity’, Journal of Psycholinguistic
Research 29(6): 627-645.
Kwon, N., Gordon, P.C., Lee, Y., Kluender, R. & Polinsky, M. (2010) ‘Cognitive and
linguistic factors affecting subject/object asymmetry: An eye-tracking study of
prenominal relative clauses in Korean’, Language 86: 546-82.
Levy, R. (2008) ‘Expectation-based syntactic comprehension’, Cognition 106: 1126-1177.
Lichtenberk, F. (1983) A Grammar of Manam, University of Hawaii Press, Honolulu.
Primus, B. (1999) Cases and Thematic Roles, Max Niemeyer Verlag, Tuebingen.
Quirk, R. (1957) 'Relative clauses in educated spoken English', English Studies 38: 97-109.
Keenan, E.L. & Comrie, B. (1977) 'Noun phrase accessibility and Universal Grammar',
Linguistic Inquiry 8: 63-99.
Stallings, L. M. (1998) 'Evaluating Heaviness: Relative Weight in the Spoken Production
of Heavy-NP Shift', Ph.D. dissertation, University of Southern California.
Traxler, M.J., Morris, R.K. & Seeley, R.E. (2002) ‘Processing subject and object relative
clauses: Evidence from eye movements’, Journal of Memory and Language 47: 69-90.
127
Uszkoreit, H., Brants, T., Duchier, D., Krenn, B., Konieczny, L., Oepen, S. and Skut, W.
(1998) ‘Studien zur performanzorientierten Linguistik: Aspekte der
Relativsatzextraposition im Deutschen’, Kognitionswissenschaft 7: 129-133.
Vasishth, S & Lewis, R. (2006) ‘Argument-head distance and processing complexity:
Explaining both locality and anti-locality effects’, Language 82: 767-794.
Wanner, E. & Maratsos, M. (1978) 'An ATN approach to comprehension', in M. Halle, J.
Bresnan & G.A. Miller, eds., Linguistic Theory and Psychological Reality, MIT Press,
Cambridge, Mass., 119-161.
Wasow, T. (2002) Postverbal Behavior, CSLI Publications, Stanford University, Stanford.
Yamashita, H. & Chang, F. (2001) '"Long before short" preference in the production of a
head-final language', Cognition, 81: B45-B55.
Yamashita, H. & Chang, F. (2006) ‘Sentence production in Japanese’, in M. Nakayama, R.
Mazuka & Y. Shirai, eds., Handbook of East Asian Psycholinguistics, Vol.2, CUP,
Cambridge.
128
Acknowledgements
Special thanks to the many collaborators and contributors to
this research program as presented here, especially:
Gontzal Aldai
Bernard Comrie
Gisbert Fanselow
Luna Filipovic
Kaoru Horie
Ed Keenan
Lewis Lawyer
Barbara Jansing
Stephen Matthews
Fritz Newmeyer
Beatrice Primus
Anna Siewierska
Lynne Stallings
Tom Wasow
129
Financial Support
has been received from the following sources for the
research reported here and is gratefully acknowledged:
German National Science Foundation fellowship (DFG grant INK 12/A1)
European Science Foundation small grant
Max Planck Institute for Evolutionary Anthropology (Leipzig) research
fellowships 2000-04
University of California Davis research funds
University of Cambridge Research Centre for English and Applied
Linguistics research funds and UCD teaching buy-outs 2007-10
130