Becoming Recursive
or, Recursion as an Epiphenomenon of
Distributed Role/Filler Serialization
or, How I Learned to Stop Recurring and
Love the Brain
Simon D. Levy
Computer Science Department
Washington & Lee University
Recursion in Human Languages Conference
Illinois State University
27 April 2007
Part I
Two Views on Recursion
1. “Essentialist”: Recursion is a
fundamental property of the Faculty of
Language in the Narrow Sense / FLN /
UG (Hauser, Chomsky, Fitch 2002)
2. “Nominalist”: Recursion is one of
several strategies for “the
transmission of propositional
structures through a serial interface”
(Pinker & Bloom 1990)
cf. Power Laws (Physics)
Now, just because these
simple mechanisms exist,
doesn't mean they explain
any particular case.... You
need to do "differential
diagnosis", by identifying
other, non-power-law
consequences of your
mechanism, which other
possible explanations don't
share. This, we hardly ever do.
- C. Shalizi (2007)
M. E. J. Newman. Power laws, Pareto distributions, and Zipf's law.
Contemporary Physics, 46, 323-351 (2005).
Critique of Pure Recursion
If we want to imitate human memory with
models, we must take account of the
weaknesses of the nervous system as well
as its powers. D. Gabor (1968)
Once again, however, my claim is
not that the Pirahã cannot think
recursively, but that their syntax
is not recursive. D. Everett (2007)
Part II
Role/Filler Serialization
• Propositional representations built from composing
role/filler bindings (Fillmore 1968; Schank 1972)
• Syntax / grammar replaced by a neurally plausible
mechanism for serializing recursively-structured
propositional representations through role prediction
(Chang et al. 2006)
• Syntactic recursion becomes possible when, e.g., noun
roles (agent, patient) are generalized to intentional
predicates (knows, wants)
Neurally Plausible Role/Filler Models
• Distributed Representations: massively parallel,
gracefully degrading, non-local storage (McClelland
et al. 1986)
• Vector Symbolic Architectures (Plate 2003; Kanerva
1994): roles, fillers represented as high-dimensional,
low precision vectors of fixed size
• Efficient (parallel) binding, unbinding, composition
through vector arithmetic
• Psychologically realistic model of analogy through
vector distance metric
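The binding, unbinding, and composition operations listed above can be sketched with a binary spatter code in the style of Kanerva (1994). This is a minimal illustration under stated assumptions, not the implementation behind these slides: the dimensionality `DIM`, the helper names, and the example roles and fillers are all hypothetical. Binding is bitwise XOR, composition is a bitwise majority vote, and similarity is normalized Hamming distance.

```python
import random

DIM = 10_000            # assumed dimensionality, chosen for illustration
rng = random.Random(0)  # fixed seed so the sketch is repeatable

def random_vector():
    """Dense random bit vector, stored as a Python int."""
    return rng.getrandbits(DIM)

def bind(a, b):
    """Bind a role to a filler with bitwise XOR (XOR is its own inverse)."""
    return a ^ b

def bundle3(a, b, c):
    """Compose three bindings by bitwise majority vote."""
    return (a & b) | (a & c) | (b & c)

def distance(a, b):
    """Normalized Hamming distance: 0.0 if identical, about 0.5 if unrelated."""
    return bin(a ^ b).count("1") / DIM

def cleanup(noisy, memory):
    """Return the name of the stored vector nearest to `noisy`."""
    return min(memory, key=lambda name: distance(noisy, memory[name]))

# Hypothetical role and filler inventories.
roles = {r: random_vector() for r in ("AGENT", "PRED", "PATIENT")}
fillers = {f: random_vector() for f in ("mary", "loves", "john")}

# One fixed-size vector holding all three role/filler bindings.
proposition = bundle3(
    bind(roles["AGENT"], fillers["mary"]),
    bind(roles["PRED"], fillers["loves"]),
    bind(roles["PATIENT"], fillers["john"]),
)

# Unbinding is XOR with the role again; the result is a noisy filler.
noisy_agent = bind(roles["AGENT"], proposition)
print(cleanup(noisy_agent, fillers))  # mary
```

In this sketch the unbound vector lies at a distance of about 0.25 from the correct filler and about 0.5 from every unrelated vector, so the nearest-neighbour clean-up recovers the filler even though the composite has the same size as any single role or filler.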
Vector Symbolic Architectures:
Binding, Composition
Serializing VSA Representations
• Sequence-processing network (Elman 1990;
Dominey et al. 2006) can be trained to
predict role-vector sequences for a given
language (e.g., AGENT-PRED-PATIENT for
• Role vectors unbind fillers
• Associative network maps fillers to words
• Neurally plausible “soft stack” network (Levy
2007) supports fillers requiring further
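The serialization steps above can be sketched as follows. This is a toy under stated assumptions: a fixed lookup table (`ROLE_ORDER`, hypothetical) stands in for the trained sequence-processing network, a nearest-neighbour search over `LEXICON` (also hypothetical) stands in for the associative network, and the vectors are binary spatter codes with XOR binding. The soft stack is not modelled.

```python
import random

DIM = 10_000            # assumed dimensionality
rng = random.Random(1)  # fixed seed for repeatability

def random_vector():
    return rng.getrandbits(DIM)

def distance(a, b):
    """Normalized Hamming distance between two bit vectors."""
    return bin(a ^ b).count("1") / DIM

def nearest(noisy, memory):
    """Stand-in for the associative network: closest stored item wins."""
    return min(memory, key=lambda name: distance(noisy, memory[name]))

ROLES = {r: random_vector() for r in ("AGENT", "PRED", "PATIENT")}
LEXICON = {w: random_vector() for w in ("mary", "john", "loves", "knows", "wants")}

def encode(agent, pred, patient):
    """Compose three XOR bindings by bitwise majority vote."""
    a = ROLES["AGENT"] ^ LEXICON[agent]
    p = ROLES["PRED"] ^ LEXICON[pred]
    t = ROLES["PATIENT"] ^ LEXICON[patient]
    return (a & p) | (a & t) | (p & t)

# Hypothetical stand-in for what the sequence network would learn:
# one role order per language type.
ROLE_ORDER = {
    "agent-pred-patient": ["AGENT", "PRED", "PATIENT"],
    "agent-patient-pred": ["AGENT", "PATIENT", "PRED"],
}

def serialize(proposition, order_name):
    """Emit one word per predicted role."""
    words = []
    for role in ROLE_ORDER[order_name]:
        noisy_filler = ROLES[role] ^ proposition      # role vector unbinds filler
        words.append(nearest(noisy_filler, LEXICON))  # filler mapped to a word
    return " ".join(words)

meaning = encode("mary", "loves", "john")
print(serialize(meaning, "agent-pred-patient"))  # mary loves john
print(serialize(meaning, "agent-patient-pred"))  # mary john loves
```

The same proposition vector yields different word orders depending only on the role sequence, which is the sense in which the role predictor takes over the work of a grammar in this sketch.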
Advantages of the Model
• Predicts observed progression from simple,
idiosyncratic to complex, recursive
constructions in language acquisition
(Tomasello 2003)
• “Soft-wired”, learnable, mutable role
inventory (Blank & Gasser 1992), generalizable
to social & other networks
• Supports both directions of language /
culture influence
– Sapir-Whorf
– Immediacy of Experience (Everett 2005)
Advantages of the Model
• Predicts soft limits on depth of embedding in
memory, speech (Rohde 2002)
• Neurally plausible implementation (Eliasmith
2004; Dominey et al. 2006)
• Concept / sequence processing distinction
supported by neuroscience (Crow 1997)
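One way to see how a soft limit on embedding depth could arise is to nest a proposition as the PATIENT of another and measure how noisy the innermost filler becomes when it is dug back out. The sketch below is an illustration with binary spatter codes and no clean-up between levels, which is an assumption made here and not necessarily the mechanism of the cited models. All names and the dimensionality are hypothetical.

```python
import random

DIM = 10_000            # assumed dimensionality
rng = random.Random(2)  # fixed seed for repeatability

def vec():
    return rng.getrandbits(DIM)

def distance(a, b):
    """Normalized Hamming distance; about 0.5 means chance level."""
    return bin(a ^ b).count("1") / DIM

AGENT, PRED, PATIENT = vec(), vec(), vec()

def proposition(agent, pred, patient):
    """Bitwise majority of three XOR role/filler bindings."""
    a, p, t = AGENT ^ agent, PRED ^ pred, PATIENT ^ patient
    return (a & p) | (a & t) | (p & t)

def innermost_agent_distance(depth):
    """Embed a base proposition `depth` times as a PATIENT filler, then
    unbind back down without clean-up and report how far the recovered
    agent is from the original one."""
    mary = vec()
    outer = proposition(mary, vec(), vec())
    for _ in range(depth):
        # e.g. "X knows [ ... ]": the whole proposition becomes a filler
        outer = proposition(vec(), vec(), outer)
    probe = outer
    for _ in range(depth):
        probe = PATIENT ^ probe  # noisy copy of the embedded proposition
    return distance(AGENT ^ probe, mary)

for depth in range(4):
    print(depth, round(innermost_agent_distance(depth), 3))
```

Under these assumptions each unbinding step flips about a quarter of the bits independently, so the expected distance at depth k is 0.5 - 0.5^(k+2): it rises smoothly toward chance (0.5) with no hard cutoff, and the innermost filler gradually becomes unrecoverable.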
Part III
Current Work
• Role Production by Analogy in Vector
Symbolic Architectures
• Iterated Learning Model (Kirby &
Hurford 2002)
References & Related Work
Blank, D. and M. Gasser (1992) Grounding via Scanning: Cooking up Roles
from Scratch. Proceedings of the 1992 Midwest Artificial Intelligence
and Cognitive Science Society Conference.
Crow, T.J. (1997) Is Schizophrenia the Price that Homo Sapiens Pays
for Language? Schizophrenia Research, 28: 127-141.
Chang, F., G.S. Dell, and K. Bock (2006) Becoming
Syntactic. Psychological Review, 113, 2, 234-272.
Dominey, P.F., M. Hoen, and T. Inui (2006) A Neurolinguistic Model of
Grammatical Construction Processing. Journal of Cognitive
Neuroscience 18: 2088-2107.
Eliasmith, C. (2004). Learning context sensitive logical inference in a
neurobiological simulation. In S. Levy and R. Gayler, eds.,
Compositional Connectionism in Cognitive Science. AAAI Fall Symposium.
AAAI Press. p. 17-20.
Elman, J. (1990) Finding Structure in Time. Cognitive Science 14: 179–
Everett, D.L. (2007) Cultural Constraints on Grammar in Pirahã: A
Reply to Nevins, Pesetsky, and Rodrigues (2007). lingBuzz/000427.
Everett, D.L. (2005) Cultural Constraints on Grammar and Cognition in
Pirahã: Another Look at the Design Features of Human Language.
Current Anthropology, August-October 2005.
Fillmore, C. J. (1968) The Case for Case. In Bach and Harms, eds.,
Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston,
Gabor, D. (1968) Improved Holographic Model of Temporal Recall. Nature
217: 1288-1289.
Hauser, M.D., N. Chomsky, and W. T. Fitch (2002) The Faculty of
Language: What Is It, Who Has It, and How Did It Evolve? Science
298 (5598): 1569-1579.
Kanerva, P. (1994) The Spatter Code for Encoding Concepts at Many
Levels. In M. Marinaro and P.G. Morasso (eds.), ICANN '94: Proceedings
International Conference on Artificial Neural Networks (Sorrento,
Italy), vol. 1; 226--229. London: Springer-Verlag.
Kirby, S. and J. Hurford (2002) The emergence of linguistic structure:
An overview of the iterated learning model. In A. Cangelosi and D.
Parisi, eds., Simulating the Evolution of Language. London: Springer
Verlag, 121–148.
Levy, S.D. (2007). Continuous States and Distributed Symbols: Toward
a Biological Theory of Computation (Poster). Proceedings of
Unconventional Computation: Quo Vadis?, Santa Fe, NM
McClelland, J.L., D. E. Rumelhart and G. E. Hinton (1986) The Appeal of
Parallel Distributed Processing. In D. E. Rumelhart and J. L. McClelland,
eds., Parallel Distributed Processing: Explorations in the Microstructure
of Cognition. Cambridge, Massachusetts: MIT Press.
Miikkulainen, R. (1996) Subsymbolic Case-Role Analysis of Sentences
with Embedded Clauses. Cognitive Science 20 : 47-73.
Newman, M. E. J. (2005) Power Laws, Pareto Distributions, and Zipf's
Law. Contemporary Physics 46: 323-351.
Pinker, S. & P. Bloom (1990). Natural language and natural selection.
Behavioral and Brain Sciences 13 (4): 707-784.
Plate, T. (2003) Holographic Reduced Representations. CSLI Lecture
Notes Number 150. Stanford, California: CSLI Publications.
Rohde, D.L.T. (2002) A Connectionist Model of Sentence Comprehension
and Production. PhD thesis, School of Computer Science, Carnegie
Mellon University.
Schank, R.C. (1972) Conceptual Dependency: A Theory of Natural
Language Understanding. Cognitive Psychology 3(4): 552-631.
Shalizi, C. R. (2007) Power Law Distributions, 1/f Noise, Long-Memory
Time Series.
Tomasello, M. (2003) Constructing a Language: A Usage-Based
Theory of Language Acquisition. Harvard University Press.

Language in Space and Time