Rules or Connections in Past Tense Inflections:
What does the Evidence Rule Out?
Stanford Cognitive Core Class
May 9, 2007
Is there a past tense rule?
• Early on, children often produce exceptional past tenses
correctly (went, took, etc).
• But at some point, they also produce ‘regularizations’.
• Also, children (and adults) usually produce ‘regular’
inflections for novel items when prompted, as in:
this man is ricking… yesterday he ____.
• [Some responses are, however, similar to some
• Regularization errors and regular responses to
nonwords were once taken as suggesting that young
children discover ‘the past tense rule’.
• The fact that children learn exceptions was explained by
‘memorization’ or ‘lexical lookup’.
An Alternative to Assuming that Children
‘Acquire’ the Past-Tense Rule
• Rumelhart and McClelland proposed that the past tense
reflects regularities captured in the connections among units
in a connectionist system that learns from examples.
• We demonstrated this by implementing and running a simple
model of how people perform the task.
• The resulting debate has been fierce… and there are many
who still think our approach is misguided.
• But, as I’ll try to show, the evidence appears to be consistent
with our perspective.
• The work illustrates how models influence empirical research;
how easy it is to get misleading results in experiments; and
how important it is for researchers on each side of an issue to
follow up on each other’s findings.
• The RM model, introducing the connectionist alternative
• A key concept arising from the connectionist program:
– Quasi-regularity: The tendency for exceptions to
partially conform to the regular pattern
• Early critiques and responses lead to:
– The Pinker symbolic, dual-mechanism account
– Accumulation of arguments and evidence that eventually
support the connectionist, single-mechanism account
The Rumelhart-McClelland 1986 past-tense model
Training and Testing Procedure
• Training:
– Present WF pattern representing present tense of verb.
– Compute WF pattern representing past tense of verb using
stochastic sigmoid activation function.
– Compare computed past-tense pattern to correct past tense
– Adjust connections using Perceptron Convergence Procedure:
• Increase strength of connection from active input units to
output units that should be active but are not.
• Decrease strength of connection from active input units to
output units that should not be active but are.
• Testing:
– Present WF pattern of present tense of verb.
– Compute WF pattern.
– Compare to various alternatives on various measures.
– OR: Generate output using fixed decoding net.
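The training steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original RM implementation: input and output patterns are plain lists of 0/1 values standing in for WF patterns, and the names `fire_probability`, `compute_output`, `train_step`, `thresholds`, `temperature` and `step` are hypothetical. Each output unit turns on with a probability given by a logistic (sigmoid) function of its net input, and only connections from active input units to incorrect output units are changed.

```python
import math
import random


def fire_probability(net, temperature=1.0):
    """Logistic (sigmoid) probability that an output unit turns on."""
    z = max(-60.0, min(60.0, net / temperature))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def compute_output(weights, thresholds, inputs, rng, temperature=1.0):
    """Stochastically activate each output unit from a binary input pattern."""
    outputs = []
    for j, row in enumerate(weights):
        net = sum(w * x for w, x in zip(row, inputs)) - thresholds[j]
        p_on = fire_probability(net, temperature)
        outputs.append(1 if rng.random() < p_on else 0)
    return outputs


def train_step(weights, thresholds, inputs, target, rng, step=1.0):
    """Apply one perceptron-style correction and return the computed output."""
    outputs = compute_output(weights, thresholds, inputs, rng)
    for j, (out, want) in enumerate(zip(outputs, target)):
        error = want - out  # +1: should be on but is off; -1: on but should be off
        if error == 0:
            continue  # units that are already correct are left alone
        thresholds[j] -= step * error
        for i, x in enumerate(inputs):
            if x:  # only connections from active input units change
                weights[j][i] += step * error
    return outputs


# Toy usage: 3 input units, 2 output units, one made-up pattern pair.
rng = random.Random(0)
weights = [[0.0] * 3 for _ in range(2)]
thresholds = [0.0, 0.0]
for _ in range(200):
    train_step(weights, thresholds, [1, 0, 1], [1, 0], rng)
```

In this toy run the weights from the inactive middle input never change, because the update touches only connections from active input units.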
Sigmoid Activation Function
Training Regime
• First ten epochs use 10 most frequent words
– Feel, have, make, get, give, take, come, go, look, need
• Remainder of training uses 10 most frequent
plus 400 words of ‘middle frequency’
• Each word is presented once per epoch
• An additional 84 lower-frequency words are
held out for generalization testing
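The two-phase schedule above can be written down as a small function. This is only a sketch of the regime as described on the slide: the function name `items_for_epoch` and the placeholder middle-frequency verbs are assumptions made for illustration, and the held-out generalization set is simply never passed in.

```python
def items_for_epoch(epoch, high_freq, mid_freq, n_initial_epochs=10):
    """Return the verbs presented in a given epoch (epochs are counted from 1).

    Each verb appears once per epoch. The first `n_initial_epochs` epochs use
    only the high-frequency verbs; later epochs add the middle-frequency set.
    """
    if epoch <= n_initial_epochs:
        return list(high_freq)
    return list(high_freq) + list(mid_freq)


# The ten high-frequency verbs listed on the slide.
HIGH_FREQ = ["feel", "have", "make", "get", "give",
             "take", "come", "go", "look", "need"]

# Hypothetical stand-ins for the 400 middle-frequency verbs.
MID_FREQ = [f"mid_verb_{i}" for i in range(400)]

early = items_for_epoch(1, HIGH_FREQ, MID_FREQ)   # 10 verbs
later = items_for_epoch(11, HIGH_FREQ, MID_FREQ)  # 410 verbs
```

The abrupt jump from 10 to 410 items at epoch 11 is the property of the regime that the Pinker and Prince critique later targets as unrealistic.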
Recapitulation of U-shaped learning
Responses to t/d and other verbs
The Tendency of Irregular Forms to
Conform Partially to the Regular Pattern
Most exceptions are quasi-regular:
• Some add d or t but delete a consonant or change the vowel
– Had, made; Did, said; sold, told; kept, slept, crept…
• Many verbs that end in /d/ or /t/ either have no change or
just a reduced vowel:
– Cut, hit, bid, beat; read, lead, fight…
• Some devoice the inflection after a liquid or a nasal,
sometimes with a vowel reduction:
– Spelt, spilt, burnt; dealt, felt, meant…
• Some change d to t, or have a consonant and/or vowel
change ending up with a final t:
– Built, bent, sent; taught, thought, bought, caught…
• 59% of the irregular verbs end in d or t.
• All but two of the rest preserve all the stem-initial and any
stem-final consonants and make the past by a change of vowel:
– Sang, rang, dug, swung; flew, blew, knew…
The connectionist approach can
capture this!
• The regular correspondence is captured in the
weights that map, for example, seep to
seeped and seem to seemed.
• The same connection weights come into play
in mapping creep to crept and dream to dreamt.
• Correct performance with irregulars amounts
to slight modulation of the regular pattern,
rather than doing something completely
unrelated as in the words-or-rules approach.
with Novel
48/72 items activated only correct responses; 6 activated no response;
these are the remaining 18 items.
• Model can learn regulars and exceptions.
• Correctly inflects most unfamiliar regular verbs.
– The ability of a single system to do all these things remains
counter-intuitive to many people.
• Also captures children’s tendency to produce occasional
‘irregularization’ responses and other signs of sensitivity
to sub-regularities.
• Uses same knowledge that applies to regulars in
processing quasi-regular items like wept and kept.
• Produces U-shaped developmental curve.
• All in a single system without explicit rules, rule-acquisition mechanisms, or a separate lexicon of exceptions.
Critique (Pinker and Prince, 1988)
• Training regime unrealistic
– Child’s experience is relatively constant over time.
• Performance on regulars not good enough
– Makes quite a few errors, some quite strange
• Model can’t produce different past tenses for homophones
– ring the bell, ring the city, wring the clothes
• Wickelfeature representation has problems
Reply: Conceptualizations are not
implementations (MacWhinney and Leinbach)
• Included semantic as well as phonological
• Used a different input representation that led
to better performance on regulars
• Did not address U-shaped curve
Plunkett and Marchman
• Used simplified corpus and network (all present tense
forms reduced to three slots, like ‘run’ or ‘put’)
• Found ‘micro-U’ shaped learning
– Performance on a given item can vacillate so that correct
responses precede incorrect responses.
– This is consistent with the actual pattern of over-regularizations
seen in most children’s production, where over-regularizations are
generally occasional.
• Noted special difficulty learning ‘arbitrary suppletions’
like go-went and pointed out that they are also very
rare in English and other languages, consistent with the
properties of connectionist networks.
• Suggested that properties of networks actually offer an
explanatory basis for understanding the nature of U-shaped
development and for understanding the distribution of
word forms in the language.
Pinker (1991, and elsewhere)
• Noted that performance on exceptions does show some
signs of exhibiting features like those seen in the RM
model. E.g., there is some similarity-based
generalization, so that forms occasionally ‘join’ irregular
clusters (e.g., kneel-knelt, gling-glang).
• Proposed a dual mechanism account in which there is
one system that uses categorical, ‘algebraic’ rules
insensitive to item properties, and another that uses an
‘associative memory mechanism’ much like the RM model.
• With Marcus, developed the notion that the rule is
completely insensitive to semantic and phonological
factors, depending only on the form-class of the stem.
• Has waffled extensively on the question of whether the
rule is acquired ‘suddenly’.
Is the onset of the regular past tense sudden?
• According to Marcus et al., it is
“Adam’s first over-regularization
occurred during a three-month
period in which regular marking
increased from 0 to 100%”
Let’s see the rest of the picture
• Hoeffner notes one could
just as easily say:
“Adam’s first over-regularization
occurred during a 6-month
period in which regular
marking went from 24% to
Two analyses of Adam’s use of the regular past
tense in obligatory contexts
[Figure panels: the picture from Marcus et al.; the picture from Hoeffner’s dissertation]
According to Roger Brown…
• All aspects of inflection and grammar exhibit
gradual acquisition.
• This doesn’t look like what you would expect if
the use of regular inflections were actually
based on a ‘rule’.
• However, there are those who have proposed
that rules gradually accumulate strength.
• To capture the data, such rules would also need to
gradually extend their generality
– Initial use of forms always appears to be restricted to a
few high-frequency cases, then gradually spreads to
similar forms.
Other Empirical Claims in
Pinker 1991
• Claimed to demonstrate strong dissociations
between regulars and exceptions:
– Performance on exceptions but not regulars is frequency sensitive.
– Performance on exceptions but not regulars depends on
phonological similarity to known exceptions.
– Also claimed that syntactic but not semantic variables
affect choice of regular vs. exception.
• Denominal status: ‘Why no mere mortal ever flew out
to left field’ (to ‘fly’ said to be derived from a noun).
– Brain damage and developmental disorders can selectively
impair performance on regulars and irregulars.
Performance on exceptions but not
regulars is frequency sensitive
Empirically, frequency effects are
weaker in regulars than exceptions.
Connectionist models show this
effect, as illustrated in the SM
model of single word reading.
This arises from the fact that
regulars benefit from help from
what is learned about other words;
this is less true of exceptions.
There is ongoing debate about
whether a small effect of frequency
actually exists among regulars, once
‘special factors’ have been
Thus, the evidence here offers no
special support for Pinker’s theory,
and may even weigh against it.
Dependence on phonological
similarity to known regulars
• Prasada and Pinker compared judgments and
generation of inflected forms such as plipped (near
known regulars) and ploamphed (far).
• Ploamphed was judged less acceptable and generation
slower than plipped, but P&P claimed this was due to an
influence from phonological features of the stem; when
they subtracted stem acceptability/reading time, no
difference remained.
• Albright and Hayes pointed out that this did not provide
unambiguous support for P&P’s hypothesis.
• They found strings that were very high in phonological
acceptability but differed in whether they had regular or
exceptional neighbors.
• Number of regular and exception neighbors both made
independent contributions to ratings and to past tense
generation time, inconsistent with the categorical nature
of the past tense rule as proposed by Pinker.
Semantic but not derivational
factors affect choice of regular vs.
irregular past tense
[Figure legend: beige = irregular; blue = regular]
Dissociation in a developmental disorder
(the case of the ‘grammar gene’)
• Gopnik & Crago reported a selective
impairment in regular but not exception
inflection in the KE family, a large family with
a genetically transmitted speech and language
disorder.
• Vargha-Khadem et al. performed a more
detailed investigation of the KE family and
found:
– General deficits including nearly all aspects of verbal and
non-verbal abilities.
– Severe orofacial apraxia.
– Equivalent deficits in regular and exception past-tense
inflection.
KE Family Performance on Regular
and Exception Verbs
• Both affected and
unaffected members of the
KE family were tested using
a version of Berko’s
sentence completion test,
with a set of 20 items
provided by K. Patterson
• Affected individuals were
impaired on both types of
verbs.
• 41% of the exception errors
of affected individuals were
regularizations,
demonstrating sensitivity to
the regular past tense.
What about effects of brain damage?
Ullman et al. considered effects of anterior vs. posterior lesions in the Berko sentence completion task.
The effect of posterior lesions is also observed in patients with semantic dementia.
A single-mechanism account
• Joanisse and Seidenberg suggest that computation of
inflections involves both semantic and phonological information.
• A deficit in semantics will influence exceptions and lead to
regularization errors because semantics provides a source of
differentiating information that helps overcome the tendency of
the speech input->output pathway to regularize.
• J&S were able to simulate the effect of semantic lesions
(although they used ‘localist’ semantic representations).
• A good model using distributed semantics remains to be developed.
What about the deficit in regular
inflection seen in anterior aphasia?
• Lesions to phonology in the J&S model
produce a disadvantage for novel verbs, but
do not produce an advantage for exceptions
over regulars.
• In Bird et al (JML, 2003) we have argued that
the apparent advantage for exceptions reflects
phonological differences between regular and
exceptional past tenses.
Phonological Complexity Differences
between Regular and Exceptional
Past Tenses
• The regular past tense always increases the complexity
of the word.
– like -> liked, love -> loved, hate -> hated
• Some forms so created violate phonotactic constraints
on mono-morphemic English word forms (Burzio, 1998).
– Voiced stop-stop pairs (lobbed) never occur
– Unvoiced stop-stop pairs (as in liked) never occur after
diphthongs (fact, but not *faict)
• Exceptional past tenses are generally no more complex
than their stems, which are often very simple.
– Eat -> ate, take -> took
– Weep -> wept reduces stem to compensate for added ‘t’.
[Figure panels: Reg/Irreg not CV-matched; Reg/Irreg CV-matched]
Empirical Claims in Pinker 1991 –
None of which hold up
• Claimed to demonstrate strong dissociations
between regulars and exceptions:
– Performance on exceptions but not regulars is frequency sensitive.
– Performance on exceptions but not regulars depends on
semantic and phonological similarity to known exceptions.
– Brain damage and developmental disorders can selectively
impair performance on regulars and irregulars.
• Also claimed that syntactic but not semantic
variables affect choice of regular vs. exception.
– Denominal status: ‘Why no mere mortal ever flew out to
left field’ (to ‘fly’ said to be derived from a noun).
A model of language change that
produces quasi-regular past tenses (with
Gary Lupyan)
• Our initial interest focused on quasi-regular exceptions:
– Items that add /d/ or /t/ and reduce the vowel:
• Did, made, had, said, kept, heard, fled…
– Items already ending in /d/ or /t/ that change (usually reduce)
the vowel:
• hid, slid, sat, read, bled, fought..
• We suggest these items reflect historical change
sensitive to:
– Pressure to be brief contingent on comprehension
– Consistency in mapping between sound and meaning
Bibliography [available online at]
• Bybee, J. and McClelland, J. L. (2005). Alternatives to
the combinatorial paradigm of linguistic theory based on
domain general principles of human cognition. The
Linguistic Review, 22(2-4), 381-410.
• Lupyan, G. and McClelland, J. L. (2003). Did, made,
had, said: Capturing quasi-regularity in exceptions.
Proceedings of the Annual Meeting of the Cognitive
Science Society, 2003.
• McClelland, J. L., Patterson, K., Pinker, S. and Ullman,
M. (2002). The Past Tense Debate: Papers and replies
by S. Pinker and M. Ullman and by J. McClelland and K.
Patterson. Trends in Cognitive Sciences, 6, 456-474.
The End
Applying the Idea to Language Communication
(Lupyan and McClelland, 2003)
• The spoken form I produce is constrained:
– To allow you to understand
– To be as short as possible given that it is understood.
• Model focuses on sound to meaning mapping (box),
with an intervening hidden layer.
• Above constraints are allowed to influence the
model’s spoken input
[Figure labels: My Intended Meaning; Your understanding of what I said]
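Back Propagation of Error ($\delta$)
$\delta_j \sim$
$\delta_i \sim$
$\delta_k \sim (t_k - a_k)$
Error-correcting learning:
Weights to output layer:
$\Delta w_{ki} = \epsilon_w \delta_k a_i$
Weights to a hidden layer:
$\Delta w_{ij} = \epsilon_w \delta_i a_j$
Activations in input representation: $\Delta a_j = \epsilon_r \delta_j - \epsilon_c a_j e^{-E}$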
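The update to the input representation can be illustrated with a short sketch. It assumes the last formula on this slide reads as: change in input activation a_j = eps_r * delta_j - eps_c * a_j * exp(-E), where delta_j is the back-propagated error at input unit j and E is the current comprehension error. That reading, the function name `update_input_activation`, the default rates, and the clamping of activations to the range 0 to 1 are all assumptions made for illustration, not details confirmed by the source.

```python
import math


def update_input_activation(a_j, delta_j, error, eps_r=0.1, eps_c=0.05):
    """Return the new activation of one unit in the spoken input form.

    The first term moves the unit in the direction that reduces the listener's
    error. The second term shrinks the unit toward zero (pressure to be brief),
    and is scaled by exp(-error) so that it only bites once the form is
    already well understood.
    """
    change = eps_r * delta_j - eps_c * a_j * math.exp(-error)
    new_a = a_j + change
    return min(1.0, max(0.0, new_a))  # assumed bounds on activation


# Well-understood form (error near 0): the unit is reduced.
understood = update_input_activation(1.0, 0.0, error=0.0)

# Poorly understood form (large error): the brevity pressure is switched off.
misunderstood = update_input_activation(1.0, 0.0, error=50.0)
```

Under this reading, brevity pressure is contingent on comprehension: units are only worn down in forms the listener already understands.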
Simulation of Reductive
Irregularization Effects
In English, frequent items are less likely to be regular.
Also, d/t items are less likely to be regular.
The same effects emerge in the simulation.
While the past tense is usually one phoneme longer than the
present, this is less true for the high-frequency past-tense items.
Reduction of high-frequency past tenses is to a phoneme other
than the word-final /d/ or /t/.
Regularity and role in mapping to meaning protect the inflection.
Ongoing work is exploring regularization of low-frequency items.
Not just the past tense, but all of
language and many other domains exhibit quasi-regularity
• Spelling-sound correspondence
– Date; mint; bread; pint
• In derivational morphology
– Prefabricate
– Predict
– Prefer
• In idioms, collocations, and ordinary sentences
John loves Mary. Everyone loves ice cream. The pope loves sinners.
She felt the baby kick.
He hit the nail on the head.
He kicked the bucket.
• And the same applies to categories and subcategories.
– Sparrow
– Turkey
– Penguin
• An important part of the appeal of connectionist models is
that they provide a natural framework for capturing regularity,
not only in what is fully regular but also, as is the case most of the
time, when regularities coexist with idiosyncrasies.
• Fundamentally, language is not the fully regular system
that linguists want us to think of it as.
• In reality, language is quasi-regular (as are many other
domains, such as the domain of living things).
• An important part of the appeal of connectionist models
is that they easily capture this quasi-regularity.
• They also provide natural vehicles for capturing graded
constraints on language structure and gradual processes
of language change.

Approaches to the Analysis of the Brain Organization of