A new corpus for Spanish Second Language Acquisition Research

L. Domínguez, R. Mitchell, M. J. Arche (U. of Southampton), E. Marsden (U. of York), F. Myles (Newcastle U.)

A corpus for L2 Acquisition

SLA theory aims to understand the complex mechanisms and conditions behind learner grammars.

Access to good quality data is crucial: learner production data + focused comprehension tasks.

Increasing interest in the creation of electronic learner corpora:
– sharing data more easily
– automatising some aspects of data analysis through the use of software such as concordancers, part of speech taggers, etc.

Some Existing Learner Corpora

– CHILDES: http://childes.psy.cmu.edu/
– TALKBANK: http://talkbank.org/
– English Corpus Linguistics: http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm
– L2 FRENCH FLLOC: www.flloc.soton.ac.uk/
– L2 (Written) SPANISH CEDEL 2: www.ugr.es/~cristoballozano/cedel2.htm

SPLLOC “Spanish Language Learner Oral Corpus”

2 year ESRC funded corpus project investigating the development of L2 Spanish

Aims:
– a small scale, high quality cross-sectional database of spoken learner Spanish
– topics being investigated lie at the syntax/discourse interface

Data collected:
– c. 40 hours of audio recordings (native/non-native)
– 80 written focused tests on word order
– 60 computer based tests on clitic comprehension

95% transcribed to date!

Immediate Research Agenda

Syntax/discourse interface as conceptualised in generative linguistics, including:
– The acquisition of Spanish word order
– Clitic pronouns

Verbal morphology

Development of the L2 lexicon

Corpus Design

– Balance of spontaneous and focused data (semi-spontaneous oral tasks are complemented by focused judgement and production tasks)
– Balance of genres (semi-spontaneous oral tasks include interview, narrative and discussion)
– Balance of participants (20 L2 speakers from each of beginner, intermediate and advanced levels + NS speakers)
– Flexibility of computer-aided analysis (use of the CHILDES system, plus an XML version)
– Free web access to all materials (anonymised sound files, transcripts, analysis files) for all bona fide research users

Summary of tasks by type, elicitation method and genre

Semi-spontaneous tasks (oral elicitation): Modern Times, Loch Ness, Photos, Paired Discussion
Genres: Narrative, Interview, Discussion

Focused tasks:
– Clitic Production (production; computer based)
– Clitic Comprehension (comprehension; computer based)
– Word Order (written; paper)

Some task samples

Loch Ness
Illustrations by Alex Brychta for “A Monster Mistake” by Roderick Hunt (Oxford Reading Tree, 2003) used by permission of Oxford University Press.

Modern Times

Photos task
Description of states and description of events

Clitic Comprehension (computer based)
The learner hears a sentence with a clitic pronoun and has to click on the object it refers to.
32 screens: combination of number and gender (canonical and non-canonical) plus syntactic collocation.
• Canonical feminine: -a ending (e.g. calculadora ‘calculator’)
• Canonical masculine: -o ending (e.g. teléfono ‘phone’)
• Non-canonical: no -a/-o ending (e.g. lápiz ‘pencil’)
• Collocation: proclitic (as in conjugated verbs) vs. enclitic (as in infinitives)

Clitic Production (computer based)
The learner is asked a question referring to an object based on the sequence of pictures shown.
32 slides; combination of number and gender (canonical and non-canonical) plus syntactic collocation.

Word Order Task (paper & pencil)

Context-dependent word order preference test
• The learner is presented with 28 situations, each followed by a question
• Two types of questions: What happened? (Broad focus); Who did x? (Narrow focus)
• 4 items by 7 syntactic contexts: 4xSVO, 4xVOS, 4xCLLD, 4xUnerg/Narrow, 4xUnerg/Broad, 4xUnacc/Narrow and 4xUnacc/Broad
• Three options: inverted (VS), non-inverted (SV) and both

1. You get home and your brother tells you that he has just got an email from your friend Sue and that he has very good news to tell you. You ask your brother “¿Qué ha pasado?” (What happened?) What could he say?
   a. Se ha comprado un coche Sue (Sue has bought a car)
   b. Sue se ha comprado un coche (Sue has bought a car)
   c. Both sentences

2. Your brother is having some friends over for a get-together at home. When your mother comes in, she sees some smoke coming out of the bathroom and asks your brother: “¿Quién está fumando?” (Who’s smoking?) What could your brother say?
   a. Oscar está fumando (Oscar is smoking)
   b. Está fumando Oscar (Oscar is smoking)
   c. Both sentences

Summary of subjects by task (to date)

Participant groups: University (Final Year); Sixth Form College (Year 13); Lower Secondary School (Year 9); Natives (all ages)

Open-ended tasks
– Modern Times: 20, 5
– Loch Ness: 20, 20, 20, 15
– Photos: 20, 20, 20, 15
– Paired Discussion: 20, 20

Focused tasks
– Clitic Comprehension: 20, 20, 20, 3
– Picture Sequence: 20, 20, 20, 10
– Word Order: 20, 20, 19, 20, 5

Tools for Data Analysis

CHILDES (The Child Language Data Exchange System)
– CLAN = Computerised Language Analysis: computer program suite for transcribing, searching and analysing language data
– CHAT = Codes for the Human Analysis of Transcripts: a format for notation and transcription

Types of analyses:
– FREQ, MLU, COMBO, KWAL

Next Steps

Database will be available for use by the research community via www.splloc.soton.ac.uk (in spring 2008)

Articles & conference papers (in 2007):
– BAAL LLT SIG
– GALA
– BUCLD
– HLS
– SLRF

CHILDES training workshop:
– 25 January 2008, University of Southampton

Acknowledgments

The SPLLOC project is supported by an ESRC research grant (RES 000231609). We would like to thank all the participants in the project, including subjects, transcribers and fieldworkers.

References

Domínguez, L., Arche, M.J. 2007a. “Deviant optional forms in L2 Spanish: the case of word order variation”. Poster presentation at GALA, Barcelona, 6-8 September.
Domínguez, L., Arche, M.J. 2007b. “Optionality in L2 grammars: the acquisition of SV/VS contrast in Spanish”. To be presented at BUCLD 32, Boston, 1-4 November.
Domínguez, L., Arche, M.J. 2007c. “The L2 Acquisition of SV/VS contrast in Spanish”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Domínguez, L., Arche, M.J., Mitchell, R., Marsden, E. and Myles, F. 2007. “Innovations in Spanish SLA research methodology: introducing the ‘Spanish Learner Language Oral Corpus’”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Granger, S., J. Hung and S. Petch-Tyson (eds.). 2002.
Computer Learner Corpora, second language acquisition and foreign language teaching. Amsterdam: John Benjamins.
Lozano, C. & Mendikoetxea, A. (in press). Verb-Subject order in L2 English: new evidence from the ICLE corpus. In: Actas del XXV Congreso Internacional de AESLA. Universidad de Murcia.
Lozano, C. & Mendikoetxea, A. (forthcoming 2007). Postverbal subjects at the interfaces in Spanish and Italian learners of L2 English: a corpus analysis. In: Papp, S., Díez, B. and Gilquin, G. (eds). Linking up contrastive and corpus learner research. Rodopi.
Mitchell, R., Marsden, E., Domínguez, L., Arche, M.J. and Myles, F. 2007. “Creation and analysis of a Spanish language learner oral corpus (SPLLOC)”. Poster presentation at BAAL LLT SIG Conference “Towards a Researched Pedagogy”, University of Lancaster, 2-3 July.
Mitchell, R., Domínguez, L., Arche, M.J., Myles, F. and Marsden, E. “Developing a CHILDES-based corpus of L2 oral Spanish”. To be presented at Second Language Research Forum, Urbana-Champaign, 11-14 October.
Myles, F. 2002. Linguistic development in classroom learners of French: a cross-sectional study (No. End of ESRC award report R000223421). Southampton: University of Southampton.
Myles, F. 2005. Interlanguage corpora and second language acquisition research. Second Language Research, 21, 4: 373-391.
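
Illustrative sketch: FREQ- and MLU-style counts

The FREQ and MLU analyses named under Tools for Data Analysis are run with the CLAN programs themselves. Purely as an illustration of what such counts involve, the Python sketch below reads a tiny CHAT-style excerpt and computes a word frequency list and a mean length of utterance for one speaker. It is not the CLAN implementation: the excerpt, the speaker codes LRN and INV, and the function names are all invented for this example, utterance length is counted in words rather than morphemes, and continuation lines and CHAT annotation codes are ignored.

```python
import re
from collections import Counter

# Hypothetical CHAT-style excerpt. Speaker codes and utterances are invented
# for illustration and are not taken from the SPLLOC corpus.
SAMPLE_CHAT = """@Begin
@Participants:\tLRN Learner, INV Investigator
*INV:\t¿qué ha pasado ?
*LRN:\tel monstruo come el bocadillo .
%com:\tlearner points at the picture
*LRN:\tla niña lo ve .
@End
"""


def utterances(chat_text, speaker):
    """Return one list of lowercase word tokens per main-tier line of `speaker`.

    Main-tier lines start with '*CODE:'. Header lines ('@') and dependent
    tiers ('%') are skipped because they do not match that prefix.
    """
    prefix = f"*{speaker}:"
    result = []
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            # Keep runs of letters only, which drops punctuation and digits.
            words = re.findall(r"[^\W\d_]+", line[len(prefix):].lower())
            if words:
                result.append(words)
    return result


def freq(utts):
    """Word frequency count over all utterances (a simplified FREQ-style count)."""
    return Counter(word for utt in utts for word in utt)


def mlu_words(utts):
    """Mean length of utterance in words; 0.0 if there are no utterances."""
    return sum(len(utt) for utt in utts) / len(utts) if utts else 0.0


if __name__ == "__main__":
    learner = utterances(SAMPLE_CHAT, "LRN")
    print(freq(learner).most_common(3))
    print(mlu_words(learner))
```

For the invented excerpt, the learner produces two utterances of five and four words, so the word-based MLU is 4.5 and "el" is the only word that occurs twice. A morpheme-based MLU would need a morphological tier, which this sketch does not model.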