A new corpus for Spanish Second Language Acquisition Research
L. Domínguez, R. Mitchell, M. J. Arche (U. of Southampton), E. Marsden (U. of York), F. Myles (Newcastle U.)
A corpus for L2 Acquisition
• SLA theory aims to understand the complex mechanisms and conditions behind learner grammars
• Access to good quality data is crucial: learner production data + focused comprehension tasks
• Increasing interest in the creation of electronic learner corpora:
– sharing data more easily
– automatising some aspects of data analysis through the use of software such as concordancers, part-of-speech taggers, etc.
Some Existing Learner Corpora
– CHILDES: http://childes.psy.cmu.edu/
– TALKBANK: http://talkbank.org/
– English Corpus Linguistics: http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm
– L2 FRENCH
  • FLLOC: www.flloc.soton.ac.uk/
– L2 (Written) SPANISH
  • CEDEL 2: www.ugr.es/~cristoballozano/cedel2.htm
SPLLOC
“Spanish Language Learner Oral Corpus”
2-year ESRC-funded corpus project investigating the development of L2 Spanish
• Aims:
– a small-scale, high-quality cross-sectional database of spoken learner Spanish
– topics being investigated lie at the syntax/discourse interface
• Data:
– Collected:
  - c. 40 hours of audio recordings (native/non-native)
  - 80 written focused tests on word order
  - 60 computer-based tests on clitic comprehension
– 95% transcribed to date!
Immediate Research Agenda
• Syntax/discourse interface as conceptualised in generative linguistics, including:
– The acquisition of Spanish word order
– Clitic pronouns
• Verbal morphology
• Development of the L2 lexicon
Corpus Design
• Balance of spontaneous and focused data (semi-spontaneous oral tasks are complemented by focused judgement and production tasks)
• Balance of genres (semi-spontaneous oral tasks include interview, narrative and discussion)
• Balance of participants (20 L2 speakers from each of beginner, intermediate and advanced levels + native speakers)
• Flexibility of computer-aided analysis (use of the CHILDES system, plus an XML version)
• Free web access to all materials (anonymised sound files, transcripts, analysis files) for all bona fide research users.
Summary of tasks by type, elicitation method and genre
[Table; cell check marks not recoverable]
• Task type: semi-spontaneous; focused
• Elicitation: oral; computer based; paper (written)
• Genre: narrative; interview; discussion; production; comprehension
• Tasks: Modern Times, Loch Ness, Photos, Paired Discussion, Clitic Production, Clitic Comprehension, Word Order
Some task samples
Loch Ness
Illustrations by Alex Brychta for “A Monster Mistake” by Roderick Hunt (Oxford Reading Tree, 2003)
used by permission of Oxford University Press.
Modern Times
Photos task
• Description of states
• Description of events
Clitic Comprehension (computer based)
The learner hears a sentence with a clitic pronoun and has to click on the object it refers to.
• 32 screens
• Combination of number and gender (canonical and non-canonical) plus syntactic collocation:
– Canonical feminine: -a ending (e.g. calculadora ‘calculator’)
– Canonical masculine: -o ending (e.g. teléfono ‘phone’)
– Non-canonical: no -a/-o ending (e.g. lápiz ‘pencil’)
– Collocation: proclitic (as with conjugated verbs) vs. enclitic (as with infinitives).
Clitic Production (computer based)
• The learner is asked a question referring to an object based on the sequence of pictures shown.
• 32 slides; combination of number and gender (canonical and non-canonical) plus syntactic collocation.
Word Order Task (paper & pencil)
Context-dependent word order preference test
• The learner is presented with 28 situations, each followed by a question
• Two types of questions: What happened? (broad focus); Who did x? (narrow focus)
• 4 items by 7 syntactic contexts: 4xSVO, 4xVOS, 4xCLLD, 4xUnerg/Narrow, 4xUnerg/Broad, 4xUnacc/Narrow and 4xUnacc/Broad
• Three options: inverted (VS), non-inverted (SV) and both.
1. You get home and your brother tells you that he has just got an email from your friend Sue and that he has very good news to tell you. You ask your brother “¿Qué ha pasado?” (What happened?)
What could he say?
a. Se ha comprado un coche Sue (Sue has bought a car)
b. Sue se ha comprado un coche (Sue has bought a car)
c. Both sentences
2. Your brother is having some friends over for a get-together at home. When your mother comes, she sees some smoke coming out of the bathroom and asks your brother: “¿Quién está fumando?” (Who’s smoking?)
What could your brother say?
a. Oscar está fumando (Oscar is smoking)
b. Está fumando Oscar (Oscar is smoking)
c. Both sentences
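The 4 items by 7 contexts design of the word order task can be laid out as a small sketch, which may help when coding responses. The context labels and the three response options come from the slide; the item IDs, function names and the tallying step are hypothetical, introduced only for illustration, and do not describe the project's actual coding scheme.

```python
from collections import Counter
from itertools import product

# Context labels and options are copied from the slide. Everything else
# (item IDs, function names) is a hypothetical illustration.
CONTEXTS = ("SVO", "VOS", "CLLD", "Unerg/Narrow", "Unerg/Broad",
            "Unacc/Narrow", "Unacc/Broad")
ITEMS_PER_CONTEXT = 4
OPTIONS = ("VS", "SV", "both")


def build_items():
    """Return one record per test item: 4 items for each of the 7 contexts."""
    return [
        {"id": f"{context}#{n}", "context": context}
        for context, n in product(CONTEXTS, range(1, ITEMS_PER_CONTEXT + 1))
    ]


def tally(responses):
    """Count (context, option) pairs from an iterable of (item_id, option)."""
    context_of = {item["id"]: item["context"] for item in build_items()}
    counts = Counter()
    for item_id, option in responses:
        if option not in OPTIONS:
            raise ValueError(f"unknown option: {option!r}")
        counts[(context_of[item_id], option)] += 1
    return counts


items = build_items()
print(len(items))  # 28 situations, as stated on the slide
```

Tallying by context rather than by item keeps the broad-focus and narrow-focus conditions separate, which is the contrast the task is built around.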
Summary of subjects by task (to date)
Participant groups: University (Final Year); Sixth Form College (Year 13); Lower Secondary School (Year 9); Natives (all ages)
Number of subjects per task (empty cells omitted):
Open-ended tasks
– Modern Times: 20, 5
– Loch Ness: 20, 20, 20, 15
– Photos: 20, 20, 20, 15
– Paired Discussion: 20, 20
Focused tasks
– Clitic Comprehension: 20, 20, 20, 3
– Picture Sequence: 20, 20, 20, 10
– Word Order: 20, 20, 19, 20, 5
Tools for Data Analysis
• CHILDES (The Child Language Data Exchange System)
– CLAN = Computerised Language Analysis
  • Computer program suite for transcribing, searching and analysing language data
– CHAT = Codes for the Human Analysis of Transcripts
  • A format for notation and transcription
• Types of analyses:
– FREQ, MLU, COMBO, KWAL
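To give a feel for what three of these analyses compute, here is a minimal Python sketch run on a CHAT-style fragment. It is not CLAN and does not reproduce CLAN's options or output. The transcript is invented for illustration, the function names are hypothetical, continuation lines are ignored, and mean length of utterance is counted in words, which is a simplification.

```python
from collections import Counter

# Invented CHAT-style fragment: '@' lines are headers, '*' lines are
# speaker (main) tiers. Not taken from the SPLLOC corpus.
SAMPLE = """\
@Begin
*INV:\tqué pasa en la foto ?
*LRN:\tel niño come una manzana .
*LRN:\tla come en la cocina .
@End
"""

PUNCT = {".", "?", "!"}


def utterances(chat_text):
    """Yield (speaker, words) for each main-tier line (lines starting with '*')."""
    for line in chat_text.splitlines():
        if not line.startswith("*"):
            continue  # skip '@' headers and '%' dependent tiers
        speaker, _, text = line[1:].partition(":")
        words = [w.lower() for w in text.split() if w not in PUNCT]
        yield speaker, words


def freq(chat_text, speaker):
    """Word frequency counts for one speaker (the idea behind FREQ)."""
    counts = Counter()
    for spk, words in utterances(chat_text):
        if spk == speaker:
            counts.update(words)
    return counts


def mlu_words(chat_text, speaker):
    """Mean length of utterance in words for one speaker (simplified MLU)."""
    lengths = [len(w) for spk, w in utterances(chat_text) if spk == speaker]
    return sum(lengths) / len(lengths) if lengths else 0.0


def kwal(chat_text, keyword):
    """Utterances containing a keyword, with speaker (the idea behind KWAL)."""
    return [(spk, " ".join(words))
            for spk, words in utterances(chat_text) if keyword in words]


print(freq(SAMPLE, "LRN").most_common(2))
print(mlu_words(SAMPLE, "LRN"))
print(kwal(SAMPLE, "come"))
```

Because CHAT marks each speaker tier explicitly, learner and interviewer speech can be separated before any counting, which is what makes per-speaker measures like these straightforward.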
Next Steps
• Database will be available for use by the research community via www.splloc.soton.ac.uk (in spring 2008)
• Articles & conference papers (in 2007):
– BAAL LLT SIG
– GALA
– BUCLD
– HLS
– SLRF
• CHILDES training workshop:
– 25 January 2008, University of Southampton.
Acknowledgments
The SPLLOC project is supported by an ESRC research grant (RES 000231609).
We would like to thank all the participants in the project, including subjects, transcribers and fieldworkers.
References
Domínguez, L., Arche, M.J. 2007a. “Deviant optional forms in L2 Spanish: the case of word order variation”. Poster presentation at GALA, Barcelona, 6-8 September.
Domínguez, L., Arche, M.J. 2007b. “Optionality in L2 grammars: the acquisition of SV/VS contrast in Spanish”. To be presented at BUCLD 32, Boston, 1-4 November.
Domínguez, L., Arche, M.J. 2007c. “The L2 Acquisition of SV/VS contrast in Spanish”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Domínguez, L., Arche, M.J., Mitchell, R., Marsden, E. and Myles, F. 2007. “Innovations in Spanish SLA research methodology: introducing the ‘Spanish Learner Language Oral Corpus’”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Granger, S., Hung, J. and Petch-Tyson, S. (eds.). 2002. Computer Learner Corpora, second language acquisition and foreign language teaching. Amsterdam: John Benjamins.
Lozano, C. and Mendikoetxea, A. (in press). Verb-Subject order in L2 English: new evidence from the ICLE corpus. In: Actas del XXV Congreso Internacional de AESLA. Universidad de Murcia.
Lozano, C. and Mendikoetxea, A. (forthcoming 2007). Postverbal subjects at the interfaces in Spanish and Italian learners of L2 English: a corpus analysis. In: Papp, S., Díez, B. and Gilquin, G. (eds). Linking up contrastive and corpus learner research. Rodopi.
Mitchell, R., Marsden, E., Domínguez, L., Arche, M.J. and Myles, F. 2007. “Creation and analysis of a Spanish language learner oral corpus (SPLLOC)”. Poster presentation at BAAL LLT SIG Conference “Towards a Researched Pedagogy”, University of Lancaster, 2-3 July.
Mitchell, R., Domínguez, L., Arche, M.J., Myles, F. and Marsden, E. 2007. “Developing a CHILDES-based corpus of L2 oral Spanish”. To be presented at Second Language Research Forum, Urbana-Champaign, 11-14 October.
Myles, F. 2002. Linguistic development in classroom learners of French: a cross-sectional study (No. End of ESRC award report R000223421). Southampton: University of Southampton.
Myles, F. 2005. Interlanguage corpora and second language acquisition research. Second Language Research, 21,4: 373-391.



