Rules or Connections in Past Tense Inflections: What does the Evidence Rule Out?
Stanford Cognitive Core Class
May 9, 2007

Is there a past tense rule?
• Early on, children often produce exceptional past tenses correctly (went, took, etc.).
• But at some point, they also produce ‘regularizations’.
• Also, children (and adults) usually produce ‘regular’ inflections for novel items when prompted, as in: this man is ricking… yesterday he ____.
• [Some responses are, however, similar to some exceptions…]
• Regularization errors and regular responses to nonwords were once taken as suggesting that young children discover ‘the past tense rule’.
• The fact that children learn exceptions was explained by ‘memorization’ or ‘lexical lookup’.

An Alternative to Assuming that Children ‘Acquire’ the Past-Tense Rule
• Rumelhart and McClelland proposed that the past tense reflects regularities captured in the connections among units in a connectionist system that learns from examples.
• We demonstrated this by implementing and running a simple model of how people perform the task.
• The resulting debate has been fierce… and there are many who still think our approach is misguided.
• But as I’ll try to show, the evidence appears to be consistent with our perspective.
• The work illustrates how models influence empirical research; how easy it is to get misleading results in experiments; and how important it is for researchers on each side of an issue to follow up on each other’s findings.

Overview
• The RM model, introducing the connectionist alternative
• A key concept arising from the connectionist program:
– Quasi-regularity: The tendency for exceptions to partially conform to the regular pattern
• Early critiques and responses lead to:
– The Pinker symbolic, dual mechanism account
– An accumulation of arguments and evidence that eventually supports the connectionist, single-mechanism account.
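The first Overview item, the RM model, is a pattern associator trained by error correction. As a rough preview of the training step spelled out under Training and Testing Procedure, here is a minimal Python sketch. It is not the original RM code: the function names, the learning rate, the temperature parameter, and the four-bit toy patterns (standing in for Wickelfeature patterns) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(net, temperature=1.0):
    """Logistic function: probability that a unit with this net input turns on."""
    z = net / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative input
    return e / (1.0 + e)


def stochastic_output(weights, x, rng, temperature=1.0):
    """Sample a binary output pattern; weights[j][i] links input i to output j."""
    out = []
    for row in weights:
        net = sum(w * xi for w, xi in zip(row, x))
        out.append(1 if rng.random() < sigmoid(net, temperature) else 0)
    return out


def perceptron_update(weights, x, out, target, lr=1.0):
    """Perceptron-style correction, applied in place to wrong output units only."""
    for j, (o, t) in enumerate(zip(out, target)):
        if o == t:
            continue  # correct units leave their weights untouched
        # Should be on but is off: strengthen. Should be off but is on: weaken.
        delta = lr if t == 1 else -lr
        for i, xi in enumerate(x):
            if xi:  # only connections from active input units change
                weights[j][i] += delta


def train(weights, pairs, epochs, seed=0):
    """Present every (present, past) pair once per epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        for x, target in pairs:
            out = stochastic_output(weights, x, rng)
            perceptron_update(weights, x, out, target)
    return weights


if __name__ == "__main__":
    # Hypothetical toy corpus: three 'regulars' copy the stem and switch on a
    # fifth 'suffix' unit; one 'exception' changes a stem bit and has no suffix.
    pairs = [
        ([1, 1, 0, 0], [1, 1, 0, 0, 1]),
        ([0, 1, 1, 0], [0, 1, 1, 0, 1]),
        ([0, 0, 1, 1], [0, 0, 1, 1, 1]),
        ([1, 0, 0, 1], [1, 0, 1, 1, 0]),
    ]
    weights = [[0.0] * 4 for _ in range(5)]
    train(weights, pairs, epochs=50)
    rng = random.Random(1)
    for x, target in pairs:
        print(x, "target", target, "sampled", stochastic_output(weights, x, rng))
```

Because the output is sampled, repeated runs with different seeds can give different responses to the same item, which is one reason the sketch prints sampled outputs instead of asserting a fixed result.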
The Rumelhart-McClelland 1986 past-tense model

Training and Testing Procedure
• Training:
– Present the Wickelfeature (WF) pattern representing the present tense of the verb.
– Compute the WF pattern representing the past tense of the verb using a stochastic sigmoid activation function.
– Compare the computed past-tense pattern to the correct past-tense pattern.
– Adjust connections using the Perceptron Convergence Procedure:
• Increase strength of connection from active input units to output units that should be active but are not.
• Decrease strength of connection from active input units to output units that should not be active but are.
• Testing:
– Present WF pattern of present tense of verb.
– Compute WF pattern.
– Compare to various alternatives on various measures.
– OR: Generate output using fixed decoding net.

[Figure: Sigmoid Activation Function]

Training Regime
• First ten epochs use 10 most frequent words only
– Feel, have, make, get, give, take, come, go, look, need
• Remainder of training uses 10 most frequent plus 400 words of ‘middle frequency’
• Each word is presented once per epoch
• An additional 84 lower-frequency words are saved for generalization testing

[Figure: Recapitulation of U-shaped learning]
[Figure: Responses to t/d and other verbs]

Quasi-Regularity: The Tendency of Irregular Forms to Conform Partially to the Regular Pattern
Most exceptions are quasi-regular:
• Some add d or t but delete a consonant or change the vowel
– Had, made; Did, said; sold, told; kept, slept, crept…
• Many verbs that end in /d/ or /t/ either have no change or just a reduced vowel:
– Cut, hit, bid, beat; read, lead, fight…
• Some devoice the inflection after a liquid or a nasal, sometimes with a vowel reduction:
– Spelt, spilt, burnt; dealt, felt, meant…
• Some change d to t, or have a consonant and/or vowel change ending up with a final t:
– Built, bent, sent; taught, thought, bought, caught…
• 59% of the irregular verbs end in d or t.
• All but two of the rest preserve all the stem-initial and any stem-final consonants and make the past by a change of vowel:
– Sang, rang, dug, swung; flew, blew, knew…

The connectionist approach can capture this!
• The regular correspondence is captured in the weights that map, for example, seep to seeped and seem to seemed.
• The same connection weights come into play in mapping creep to crept and dream to dreamt.
• Correct performance with irregulars amounts to slight modulation of the regular pattern, rather than doing something completely unrelated as in the words-or-rules approach.

[Figure: Performance with Novel Irregulars]

Novel Regulars
• 48/72 activated only correct responses; 6 activated no response; these are the remaining 18 items.
[Table: responses to the remaining 18 items]

Summary
• Model can learn regulars and exceptions.
• Correctly inflects most unfamiliar regular verbs.
– The ability of a single system to do all these things remains counter-intuitive to many people.
• Also captures children’s tendency to produce occasional ‘irregularization’ responses and other signs of sensitivity to sub-regularities.
• Uses same knowledge that applies to regulars in processing quasi-regular items like wept and kept.
• Produces U-shaped developmental curve.
• All in a single system without explicit rules, rule-acquisition mechanisms, or a separate lexicon of exceptions.

Critique (Pinker and Prince, 1988)
• Training regime unrealistic
– Child’s experience is relatively constant over time.
• Performance on regulars not good enough
– Makes quite a few errors, some quite strange
• Model can’t produce different past tenses for homophones
– ring the bell, ring the city, wring the clothes
• Wickelfeature representation has problems

Reply: Implementations are not conceptualizations (MacWhinney and Leinbach)
• Included semantic as well as phonological input
• Used a different input representation that led to better performance on regulars
• Did not address U-shaped curve

Plunkett and Marchman
• Used simplified corpus and network (all present tense forms reduced to three slots, like ‘run’ or ‘put’)
• Found ‘micro-U’ shaped learning
– Performance on a given item can vacillate so that correct responses precede incorrect responses.
– This is consistent with the actual pattern of over-regularizations seen in most children’s production, where over-regularizations are generally occasional.
• Noted special difficulty learning ‘arbitrary suppletions’ like go-went and pointed out that they are also very rare in English and other languages, consistent with the properties of connectionist networks.
• Suggested that properties of networks actually offer an explanatory basis for understanding the nature of U-shaped development and for understanding the distribution of word forms in the language.

Pinker (1991, and elsewhere)
• Noted that performance on exceptions does show some signs of exhibiting features like those seen in the RM model. E.g., there is some similarity-based generalization, so that forms occasionally ‘join’ irregular clusters (e.g., kneel-knelt, gling-glang).
• Proposed a dual mechanism account in which there is one system that uses categorical, ‘algebraic’ rules insensitive to item properties, and another that uses an ‘associative memory mechanism’ much like the RM model.
• With Marcus, developed the notion that the rule is completely insensitive to semantic and phonological factors, depending only on the form-class of the stem.
• Has waffled extensively on the question of whether the rule is acquired ‘suddenly’.

Is the onset of the regular past tense sudden?
• According to Marcus et al., it is sudden: “Adam’s first over-regularization occurred during a three-month period in which regular marking increased from 0 to 100%”

Let’s see the rest of the picture
• Hoeffner notes one could just as easily say: “Adam’s first over-regularization occurred during a 6-month period in which regular marking went from 24% to 44%”.

Two analyses of Adam’s use of the regular past tense in obligatory contexts
[Figure: The picture from Marcus et al.]
[Figure: The picture from Hoeffner’s dissertation]

According to Roger Brown…
• All aspects of inflection and grammar exhibit gradual acquisition.
• This doesn’t look like what you would expect if the use of regular inflections were actually based on a ‘rule’.
• However, there are those who have proposed that rules gradually accumulate strength.
• To capture the data, such rules also need to extend their generality gradually
– Initial use of forms always appears to be restricted to a few high-frequency cases, then gradually spreads to similar forms.

Other Empirical Claims in Pinker 1991
• Claimed to demonstrate strong dissociations between regulars and exceptions:
– Performance on exceptions but not regulars is frequency sensitive.
– Performance on exceptions but not regulars depends on phonological similarity to known exceptions.
• Also claimed that syntactic but not semantic variables affect choice of regular vs. exception.
– Denominal status: ‘Why no mere mortal ever flew out to left field’ (to ‘fly’ said to be derived from a noun).
• Brain damage and developmental disorders can selectively impair performance on regulars and irregulars.

Performance on exceptions but not regulars is frequency sensitive
• Empirically, frequency effects are weaker in regulars than exceptions.
• Connectionist models show this effect, as illustrated in the Seidenberg and McClelland (SM) model of single word reading.
• This arises from the fact that regulars benefit from help from what is learned about other words; this is less true of exceptions.
• There is ongoing debate about whether a small effect of frequency actually exists among regulars, once ‘special factors’ have been controlled.
• Thus, the evidence here offers no special support for Pinker’s theory, and may even weigh against it.

Dependence on phonological similarity to known regulars
• Prasada and Pinker compared judgments and generation of inflected forms such as plipped (near known regulars) and ploamphed (far).
• Ploamphed was judged less acceptable, and was generated more slowly, than plipped, but P&P claimed this was due to an influence from phonological features of the stem; when they subtracted stem acceptability/reading time, no difference remained.
• Albright and Hayes pointed out that this did not provide unambiguous support for P&P’s hypothesis.
• They found strings that were very high in phonological acceptability but differed in whether they had regular or exceptional neighbors.
• Number of regular and exception neighbors both made independent contributions to ratings and to past tense generation time, inconsistent with the categorical nature of the past tense rule as proposed by Pinker.

Semantic but not derivational Factors affect choice of regular vs. irregular past tense
[Figure legend: Beige: irregular; Blue: regular]

Dissociation in a developmental disorder (the case of the ‘grammar gene’)
• Gopnik & Crago reported a selective impairment in regular but not exception inflection in the KE family, a large family with a genetically transmitted speech and language disorder.
• Vargha-Khadem et al. performed a more detailed investigation of the KE family and found:
– General deficits including nearly all aspects of verbal and non-verbal abilities.
– Severe orofacial apraxia.
– Equivalent deficits in regular and exception past-tense formation.
KE Family Performance on Regular and Exception Verbs
• Both affected and unaffected members of the KE family were tested using a version of Berko’s sentence completion test, with a set of 20 items provided by K. Patterson.
• Affected individuals were impaired on both types of items.
• 41% of the exception errors of affected individuals were regularizations, demonstrating sensitivity to the regular past tense.
[Figure: bar chart, vertical axis 0 to 100, comparing Affected and Unaffected family members on Reg and Exc items]

What about effects of brain damage?
• Ullman et al. considered effects of anterior vs. posterior lesions in the Berko sentence completion task.
• The effect of posterior lesions is also observed in patients with semantic dementia.

A single-mechanism account
• Joanisse and Seidenberg suggest that computation of inflections involves both semantic and phonological representations.
• A deficit in semantics will influence exceptions and lead to regularization errors because semantics provides a source of differentiating information that helps overcome the tendency of the speech input->output pathway to regularize.
• J&S were able to simulate the effect of semantic lesions (although they used ‘localist’ semantics).
• A good model using distributed semantics remains to be implemented.

What about the deficit in regular inflection seen in anterior aphasia?
• Lesions to phonology in the J&S model produce a disadvantage for novel verbs, but do not produce an advantage for exceptions over regulars.
• In Bird et al. (JML, 2003) we have argued that the apparent advantage for exceptions reflects phonological differences between regular and exceptional past tenses.

Phonological Complexity Differences between Regular and Exceptional Past Tenses
• The regular past tense always increases the complexity of the word.
– like -> liked, love -> loved, hate -> hated
• Some forms so created violate phonotactic constraints on mono-morphemic English word forms (Burzio, 1998).
– Voiced stop-stop pairs (lobbed) never occur
– Unvoiced stop-stop pairs (as in liked) never occur after diphthongs (fact, but not *faict)
• Exceptional past tenses are generally no more complex than their stems, which are often very simple.
– Eat -> ate, take -> took
– Weep -> wept reduces stem to compensate for added ‘t’.

[Figure panels: Reg/Irreg not CV-matched; Reg/Irreg CV-matched]

Empirical Claims in Pinker 1991 – None of which hold up
• Claimed to demonstrate strong dissociations between regulars and exceptions:
– Performance on exceptions but not regulars is frequency sensitive.
– Performance on exceptions but not regulars depends on semantic and phonological similarity to known exceptions.
– Brain damage and developmental disorders can selectively impair performance on regulars and irregulars.
• Also claimed that syntactic but not semantic variables affect choice of regular vs. exception.
– Denominal status: ‘Why no mere mortal ever flew out to left field’ (to ‘fly’ said to be derived from a noun).

A model of language change that produces quasi-regular past tenses (with Gary Lupyan)
• Our initial interest focused on quasi-regular exceptions:
– Items that add /d/ or /t/ and reduce the vowel:
• Did, made, had, said, kept, heard, fled…
– Items already ending in /d/ or /t/ that change (usually reduce) the vowel:
• hid, slid, sat, read, bled, fought…
• We suggest these items reflect historical change sensitive to:
– Pressure to be brief contingent on comprehension
– Consistency in mapping between sound and meaning

Bibliography [available online at psychology.stanford.edu/~jlm/papers]
• Bybee, J. and McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review, 22(2-4), 381-410.
• Lupyan, G. and McClelland, J. L. (2003). Did, made, had, said: Capturing quasi-regularity in exceptions. Proceedings of the Annual Meeting of the Cognitive Science Society, 2003.
• McClelland, J.
L., Patterson, K., Pinker, S. and Ullman, M. (2002). The Past Tense Debate: Papers and replies by S. Pinker and M. Ullman and by J. McClelland and K. Patterson. Trends in Cognitive Sciences, 6, 456-474.

The End

Applying the Idea to Language Communication (Lupyan and McClelland, 2003)
• The spoken form I produce is constrained:
– To allow you to understand
– To be as short as possible given that it is understood.
• Model focuses on sound to meaning mapping (box), with an intervening hidden layer.
• Above constraints are allowed to influence the model’s spoken input representations.
[Figure: network diagram linking Speech, My Intended Meaning, and Your understanding of what I said]

Back Propagation of Error (δ)
[Figure: three-layer network with input activations a_j, hidden activations a_i, output activations a_k, and weights w_ij and w_ki]
• Error signals, propagated from output back to input:
\[ \delta_k \sim (t_k - a_k), \qquad \delta_i \sim \sum_k \delta_k w_{ki}, \qquad \delta_j \sim \sum_i \delta_i w_{ij} \]
• Error-correcting learning:
– Weights to output layer: \( \Delta w_{ki} = \epsilon_w \delta_k a_i \)
– Weights to a hidden layer: \( \Delta w_{ij} = \epsilon_w \delta_i a_j \)
– Activations in input representation: \( \Delta a_j = \epsilon_r \delta_j - \epsilon_c a_j e^{-E} \)

Simulation of Reductive Irregularization Effects
• In English, frequent items are less likely to be regular.
• Also, d/t items are less likely to be regular.
• The same effects emerge in the simulation.
• While the past tense is usually one phoneme longer than present, this is less true for the high frequency past tense items.
• Reduction of high frequency past tenses is to a phoneme other than the word final /d/ or /t/.
– Regularity and role in mapping to meaning protects inflection.
• Ongoing work is exploring regularization of low frequency exceptions.

Not just the past tense, but all of language and many other domains exhibit quasi-regularity
• Spelling-sound correspondence
– Date; mint; bread; pint
• In inflectional morphology
– Prefabricate
– Predict
– Prefer
• In idioms, collocations, and ordinary sentences
– John loves Mary.
– Everyone loves ice cream.
– The pope loves sinners.
– She felt the baby kick.
– He hit the nail on the head.
– He kicked the bucket.
• And the same applies to categories and subcategories.
– Sparrow
– Turkey
– Penguin
• An important part of the appeal of connectionist models is that they provide a natural framework for capturing regularity, not only in what is fully regular, but also (which is most of the time) when regularities coexist with idiosyncrasies.

Conclusion
• Fundamentally, language is not the fully regular system that linguists want us to think it is.
• In reality, language is quasi-regular (as are many other domains, such as the domain of living things).
• An important part of the appeal of connectionist models is that they easily capture this quasi-regularity.
• They also provide natural vehicles for capturing graded constraints on language structure and gradual processes of language change.
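
Supplementary sketch for the Back Propagation of Error slide: one update step in which error is propagated back through the weights and on into the spoken input representation, while a brevity pressure shrinks input activations once the message is understood. This is a minimal illustration, not the Lupyan and McClelland simulation code. The function names, the logistic units, the squared-error definition of E, the clipping of input activations to [0, 1], and all parameter values are assumptions. The slide gives the error signals only up to proportionality, so the hidden-unit derivative term used here is also an assumption.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def forward(a_j, w_ij, w_ki):
    """w_ij[i][j]: input j -> hidden i; w_ki[k][i]: hidden i -> output k."""
    a_i = [sigmoid(sum(w * a for w, a in zip(row, a_j))) for row in w_ij]
    a_k = [sigmoid(sum(w * a for w, a in zip(row, a_i))) for row in w_ki]
    return a_i, a_k


def step(a_j, target, w_ij, w_ki, eps_w=0.5, eps_r=0.1, eps_c=0.01):
    """One error-correcting update; returns new inputs, new weights, and E."""
    a_i, a_k = forward(a_j, w_ij, w_ki)
    n_in, n_hid, n_out = len(a_j), len(a_i), len(a_k)

    # Assumed error measure: E = 1/2 * sum_k (t_k - a_k)^2
    error = 0.5 * sum((t - a) ** 2 for t, a in zip(target, a_k))

    # Error signals, following the slide: delta_k ~ (t_k - a_k), then back.
    d_k = [t - a for t, a in zip(target, a_k)]
    d_i = [a_i[i] * (1.0 - a_i[i]) * sum(d_k[k] * w_ki[k][i] for k in range(n_out))
           for i in range(n_hid)]
    d_j = [sum(d_i[i] * w_ij[i][j] for i in range(n_hid)) for j in range(n_in)]

    # Weight changes: dw_ki = eps_w * delta_k * a_i, dw_ij = eps_w * delta_i * a_j
    new_w_ki = [[w_ki[k][i] + eps_w * d_k[k] * a_i[i] for i in range(n_hid)]
                for k in range(n_out)]
    new_w_ij = [[w_ij[i][j] + eps_w * d_i[i] * a_j[j] for j in range(n_in)]
                for i in range(n_hid)]

    # Input change: da_j = eps_r * delta_j - eps_c * a_j * exp(-E).
    # The second term is the pressure to be brief; it is strongest when E is small.
    brevity = math.exp(-error)
    new_a_j = [min(1.0, max(0.0, a + eps_r * d - eps_c * a * brevity))
               for a, d in zip(a_j, d_j)]
    return new_a_j, new_w_ij, new_w_ki, error


if __name__ == "__main__":
    # Hypothetical three-unit 'spoken form' mapped to a two-unit 'meaning'.
    a_j = [1.0, 1.0, 1.0]
    target = [1.0, 0.0]
    w_ij = [[0.1, -0.2, 0.3], [-0.3, 0.2, 0.1]]
    w_ki = [[0.2, -0.1], [-0.2, 0.1]]
    for _ in range(200):
        a_j, w_ij, w_ki, error = step(a_j, target, w_ij, w_ki)
    print(round(error, 4), [round(a, 3) for a in a_j])
```

Gating the brevity term by exp(-E) is one way to read ‘as short as possible given that it is understood’: the reduction pressure stays weak while the error is high and grows as comprehension improves.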