Becoming Recursive
or, Recursion as an Epiphenomenon of Distributed Role/Filler Serialization
or, How I Learned to Stop Recurring and Love the Brain

Simon D. Levy
Computer Science Department
Washington & Lee University

Recursion in Human Languages Conference
Illinois State University
27 April 2007

Part I: Background

Two Views on Recursion
1. “Essentialist”: Recursion is a fundamental property of the Faculty of Language in the Narrow Sense (FLN) / UG (Hauser, Chomsky & Fitch 2002)
2. “Nominalist”: Recursion is one of several strategies for “the transmission of propositional structures through a serial interface”¹
¹ Pinker & Bloom (1990)

cf. Power Laws (Physics)
Now, just because these simple mechanisms exist, doesn't mean they explain any particular case.... You need to do "differential diagnosis", by identifying other, non-power-law consequences of your mechanism, which other possible explanations don't share. This, we hardly ever do.
- C. Shalizi (2007)
M. E. J. Newman. Power laws, Pareto distributions, and Zipf's law. Contemporary Physics, 46, 323-351 (2005).

Critique of Pure Recursion
If we want to imitate human memory with models, we must take account of the weaknesses of the nervous system as well as its powers.
- D. Gabor (1968)
Once again, however, my claim is not that the Pirahã cannot think recursively, but that their syntax is not recursive.
- D. Everett (2007)

Part II: Model

Role/Filler Serialization
• Propositional representations are built by composing role/filler bindings (Fillmore 1968; Schank 1972)
• Syntax / grammar is replaced by a neurally plausible mechanism that serializes recursively structured propositional representations through role prediction (Chang et al. 2006)
• Syntactic recursion becomes possible when, e.g., noun roles (agent, patient) are generalized to intentional predicates (knows, wants)
[Figure: role/filler diagram over MARY, LOVES, KNOWS, JOHN, BILL]

Neurally Plausible Role/Filler Models
• Distributed representations: massively parallel, gracefully degrading, non-local storage (McClelland et al. 1986)
• Vector Symbolic Architectures (Plate 2003; Kanerva 1994): roles and fillers are represented as high-dimensional, low-precision vectors of fixed size
• Efficient (parallel) binding, unbinding, and composition through vector arithmetic
• Psychologically realistic model of analogy through a vector distance metric

Vector Symbolic Architectures: Binding, Composition

Vector Symbolic Architectures: Unbinding

Vector Symbolic Architectures: Recursion

Serializing VSA Representations
• A sequence-processing network (Elman 1990; Dominey et al. 2006) can be trained to predict role-vector sequences for a given language (e.g., AGENT-PRED-PATIENT for English)
• Role vectors unbind fillers
• An associative network maps fillers to words
• A neurally plausible “soft stack” network (Levy 2007) supports fillers requiring further decomposition

Advantages of the Model
• Predicts the observed progression from simple, idiosyncratic constructions to complex, recursive ones in language acquisition (Tomasello 2003)
• “Soft-wired”, learnable, mutable role inventory (Blank & Gasser 1992), generalizable to social & other networks
• Supports influence in both directions between language and culture
  – Sapir-Whorf
  – Immediacy of Experience (Everett 2005)

Advantages of the Model (continued)
• Predicts soft limits on depth of embedding in memory and speech (Rohde 2002)
• Neurally plausible implementation (Eliasmith 2004; Dominey et al. 2006)
• Concept / sequence processing distinction supported by neuroscience (Crow 1997)

Part III: Conclusions

Current Work
• Role Production by Analogy in Vector Symbolic Architectures
• Iterated Learning Model (Kirby & Hurford 2002)

References & Related Work
• Blank, D. and M. Gasser (1992) Grounding via Scanning: Cooking up Roles from Scratch. Proceedings of the 1992 Midwest Artificial Intelligence and Cognitive Science Society Conference.
• Chang, F., G.S. Dell, and K. Bock (2006) Becoming Syntactic. Psychological Review, 113(2), 234-272.
• Crow, T.J. (1997) Is Schizophrenia the Price that Homo Sapiens Pays for Language? Schizophrenia Research, 28, 127-141.
• Dominey, P.F., M. Hoen, and T. Inui (2006) A Neurolinguistic Model of Grammatical Construction Processing. Journal of Cognitive Neuroscience, 18, 2088-2107.
• Eliasmith, C. (2004) Learning context sensitive logical inference in a neurobiological simulation. In S. Levy and R. Gayler, eds., Compositional Connectionism in Cognitive Science. AAAI Fall Symposium. AAAI Press, 17-20.
• Elman, J. (1990) Finding structure in time. Cognitive Science, 14, 179-211.
• Everett, D.L. (2005) Cultural Constraints on Grammar and Cognition in Pirahã: Another Look at the Design Features of Human Language. Current Anthropology, August-October 2005.
• Everett, D.L. (2007) Cultural Constraints on Grammar in PIRAHÃ: A Reply to Nevins, Pesetsky, and Rodrigues (2007). lingBuzz/000427.
• Fillmore, C.J. (1968) The Case for Case. In Bach and Harms, eds., Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston, 1-88.
• Gabor, D. (1968) Improved holographic model of temporal recall. Nature, 217, 1288-1289.
• Hauser, M.D., N. Chomsky, and W.T. Fitch (2002) The Faculty of Language: What Is It, Who Has It, and How Did It Evolve? Science, 298(5598), 1569-1579.
• Kanerva, P. (1994) The Spatter Code for Encoding Concepts at Many Levels. In M. Marinaro and P.G. Morasso, eds., ICANN '94: Proceedings of the International Conference on Artificial Neural Networks (Sorrento, Italy), vol. 1, 226-229. London: Springer-Verlag.
• Kirby, S. and J. Hurford (2002) The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi and D. Parisi, eds., Simulating the Evolution of Language. London: Springer-Verlag, 121-148.
• Levy, S.D. (2007) Continuous States and Distributed Symbols: Toward a Biological Theory of Computation (Poster). Proceedings of Unconventional Computation: Quo Vadis?, Santa Fe, NM.
• McClelland, J.L., D.E. Rumelhart, and G.E. Hinton (1986) The Appeal of Parallel Distributed Processing. In D.E. Rumelhart and J.L. McClelland, eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
• Miikkulainen, R. (1996) Subsymbolic Case-Role Analysis of Sentences with Embedded Clauses. Cognitive Science, 20, 47-73.
• Newman, M.E.J. (2005) Power laws, Pareto distributions, and Zipf's law. Contemporary Physics, 46, 323-351.
• Pinker, S. and P. Bloom (1990) Natural language and natural selection. Behavioral and Brain Sciences, 13(4), 707-784.
• Plate, T. (2003) Holographic Reduced Representations. CSLI Lecture Notes Number 150. Stanford, CA: CSLI Publications.
• Rohde, D.L.T. (2002) A Connectionist Model of Sentence Comprehension and Production. PhD thesis, School of Computer Science, Carnegie Mellon University.
• Schank, R.C. (1972) Conceptual Dependency: A Theory of Natural Language Understanding. Cognitive Psychology, 3(4), 532-631.
• Shalizi, C.R. (2007) Power Law Distributions, 1/f Noise, Long-Memory Time Series. http://cscs.umich.edu/~crshalizi/notebooks/powerlaws.html
• Tomasello, M. (2003) Constructing a Language: A Usage-Based Theory of Language Acquisition. Harvard University Press.
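The binding, composition, unbinding, and recursion operations named in Part II, together with role-driven serialization, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the implementation behind this talk. It assumes Holographic Reduced Representations (Plate 2003): circular convolution for binding, superposition for composition, and circular correlation for approximate unbinding. Three further substitutions are assumptions. A fixed AGENT-PRED-PATIENT role order stands in for the trained sequence predictor. A nearest-neighbour cleanup over named vectors stands in for the associative filler-to-word network. An ordinary Python list stands in for the soft stack. DIM, the random seed, and every function and variable name are illustrative only.

```python
import math
import random

# Hypothetical sketch of HRR-style role/filler vectors (Plate 2003), pure Python.
# DIM, the seed, and all names are illustrative assumptions.
DIM = 512
rng = random.Random(2007)


def rand_vec(n=DIM):
    """Random HRR vector with i.i.d. N(0, 1/n) elements (expected norm about 1)."""
    sd = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def bind(a, b):
    """Binding by circular convolution: c[i] = sum_k a[k] * b[(i - k) mod n]."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def unbind(role, trace):
    """Approximate unbinding: convolve with the involution role*[i] = role[-i mod n],
    which is the same as circularly correlating role with trace."""
    n = len(role)
    inverse = [role[-i % n] for i in range(n)]
    return bind(inverse, trace)


def compose(*vectors):
    """Composition by superposition (element-wise sum)."""
    return [sum(xs) for xs in zip(*vectors)]


def cosine(a, b):
    """Vector similarity, used here both for cleanup and as a distance metric."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def cleanup(noisy, memory):
    """Return the name of the stored vector most similar to a noisy one."""
    return max(memory, key=lambda name: cosine(noisy, memory[name]))


def proposition(roles, **fillers):
    """Superpose role/filler bindings, e.g. AGENT*MARY + PRED*KNOWS + PATIENT*X."""
    return compose(*(bind(roles[r], f) for r, f in fillers.items()))


# Hypothetical role and filler inventory.
roles = {r: rand_vec() for r in ("AGENT", "PRED", "PATIENT")}
items = {w: rand_vec() for w in ("MARY", "JOHN", "BILL", "LOVES", "KNOWS")}

# Recursion: a whole proposition fills the PATIENT role of the predicate KNOWS.
inner = proposition(roles, AGENT=items["JOHN"], PRED=items["LOVES"], PATIENT=items["BILL"])
items["JOHN_LOVES_BILL"] = inner      # stored so that cleanup can recover it
outer = proposition(roles, AGENT=items["MARY"], PRED=items["KNOWS"], PATIENT=inner)
propositions = {"JOHN_LOVES_BILL"}    # fillers that need further decomposition


def serialize(trace, role_order=("AGENT", "PRED", "PATIENT")):
    """Emit words by unbinding each role in a fixed order (stand-in for role
    prediction); a plain list serves as the stack (stand-in for the soft stack)."""
    words = []
    stack = [(trace, list(role_order))]
    while stack:
        prop, pending = stack.pop()
        if not pending:
            continue                          # this proposition is finished
        role = pending.pop(0)
        stack.append((prop, pending))         # resume its remaining roles later
        name = cleanup(unbind(roles[role], prop), items)
        if name in propositions:
            stack.append((items[name], list(role_order)))  # descend into the clause
        else:
            words.append(name)
    return words


words = serialize(outer)
print(" ".join(words))
```

With DIM = 512 the cleanup margins are wide, so the run is expected to recover MARY KNOWS JOHN LOVES BILL. The embedded clause comes back from the PATIENT role of KNOWS as a filler that needs further decomposition, so it is pushed and serialized with the same role order. Retrieval is approximate, and at much smaller dimensions unbinding noise makes cleanup errors more likely.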