An IPO Task Difficulty Matrix for Prototypical
Tasks for Task-based Assessment
Sheila Shaoqian Luo
School of Foreign Languages
Beijing Normal University
Sept 22 2007
The presentation structure…
introduction
literature
the rationale of the research
research questions and research methods
studies: the evolution of the IPO TD matrix
findings
issues and suggestion for future research
implications
I Introduction:
The Chinese National English Curriculum (CNEC, 2001)
Characteristics:
 Multidimensional curriculum + Humanistic
approach
 Focus on ability to use the language
 Nine levels + Competence-based: Can-do-statements
 Promoting Task-Based Language Teaching (TBLT)
 Lists of themes, functions, grammar and vocabulary
The CNEC Goals
Integrated ability for use:
 Affect and attitudes
 Learning strategies
 Cultural awareness
 Language skills
 Linguistic knowledge
II Literature:
Language competence models
 Canale and Swain’s Model: linguistic competence;
sociolinguistic competence; discourse competence (Canale,
1983); strategic competence
 Bachman’s communicative competence model:
(1) Organization: - Grammar; Text;
(2) Pragmatics: - illocution; sociolinguistics
 Skehan’s TBLA model:
(1) to inform task selection (to predict the relative
difficulty of each task); (2) to ensure the full range of
candidates’ ability will be tapped; (3) to assist test
developers in structuring test tasks and the conditions
under which these tasks are performed in appropriate ways;
(4) to inform development of rating scale descriptors; (5)
to facilitate interpretation of test scores (which may differ
according to tasks)
Task-based Performance and Language Testing:
The Skehan Model V (2006)
[Diagram components: Rater; Scale/Criteria; Score; Performance; Interlocutors; Context of testing; Task (Task characteristics; Task conditions); Ability for Use; Underlying Competences; Consequences]
Language competence models: Assessment
 Canale and Swain’s model: components of the framework play
compensatory roles
 Bachman’s model: strategic competence plays
the central role, which orchestrates knowledge,
language, context, assessment, planning, and
execution; emphasizes the search for an
underlying “structure-of-abilities”
 Skehan’s model: Task is the center in
generalizing learners’ ability for use; goes beyond
the role of strategic competence and draws into
play “generalized processing capacities and the
need to engage worthwhile language use”
(Skehan, 1998a, p. 171).
Issues in language testing
 What we give test takers to do
 Unless tasks have known properties, we will not know if
performance is the result of the candidate or the
particular task
 Without a knowledge of how tasks vary, we cannot
know how broadly we have sampled someone’s
language ability, cf. narrowness and therefore
ungeneralisability
 How we decide about candidate ability
 Obviously underlying competences are important
 We also need to probe how people activate these
competences, through ability for use
 Knowledge of this area will enable us to make more
effective context-to-context generalisations and avoid
the narrowness of context-bound performance testing
(Skehan, Dec. 2006)
If tasks are a relevant unit for testing, the
research problem is to try to systematically
“develop more refined measures of task difficulty”.
(Skehan, 1998:80)
The Problem of Difficulty
 Traditional approaches
 Give a series of test items
 Calculate the pass proportion
 Rank the items in difficulty (classical, IRT)
 Blue Skies Solutions
 Effects of different tasks on performance areas
 Do construct validation research
 Use a range of tasks when testing
(Skehan, Dec. 2006)
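The traditional approach listed above (give a series of items, calculate the pass proportion, rank the items) can be sketched in a few lines. This is a minimal classical-test-theory illustration only, not the presenter's procedure; the response data and the function name `item_facility` are hypothetical.

```python
from typing import Dict, List


def item_facility(responses: Dict[str, List[int]]) -> Dict[str, float]:
    """Return the proportion of candidates passing each item (1 = pass, 0 = fail)."""
    return {item: sum(scores) / len(scores) for item, scores in responses.items()}


# Hypothetical pass/fail data: three items, five candidates each.
responses = {
    "item_A": [1, 1, 1, 1, 0],
    "item_B": [1, 0, 0, 1, 0],
    "item_C": [1, 1, 0, 1, 0],
}

facility = item_facility(responses)

# A lower pass proportion means a harder item, so an ascending sort
# ranks the items from hardest to easiest.
ranked_hardest_first = sorted(facility, key=facility.get)

print(facility)
print(ranked_hardest_first)
```

A ranking like this describes how hard each item was for one sample of candidates; it does not say which properties of the item made it hard, which is the gap an analytic scheme is meant to address.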
A more realistic solution:
The present research
 Use an analytic scheme to make
estimates of task difficulty
 Explore whether this analytic scheme
can generate agreement between
different raters
 Explore whether this analytic scheme
has a meaningful relationship to (a)
performance ratings, and (b)
discourse analysis measures
III Research Rationale: Defining the problem
 Identification of valid, user-friendly sequencing
criteria for tasks and test tasks is a pressing but old
problem
 Grading task difficulty and sequencing tasks both
appear to be arbitrary processes not based on
empirical evidence (Long & Crookes, 1992)
 The Norris-Brown et al. matrix (1998; 2002;
influenced by Skehan, 1996) offers one way of
characterising test task difficulty, but lacks obvious
connection to a Chinese secondary context.
Weaknesses in previous findings on task difficulty: they offered only
moderate support for the proposed relationships between
combinations of cognitive factors and particular task types…
(Elder et al., 2002)
This research…
 investigates the development and use of a
prototype task difficulty scheme based on
current frameworks for assessing task
characteristics and difficulty, e.g. Skehan
(1998), Norris et al. (1998), and Brown et al.
(2002).
Hypothesis:
 There is a systematic relationship between task
difficulty and hypothesized task complexity (see
also Elder et al., 2002)
IV Research questions
How can language ability in TBLT in mainland
Chinese middle schools best be assessed?
1. Is the Brown et al. task difficulty framework appropriate
to the mainland Chinese school context? If it is not, then
what is an alternative framework?
2. Is it possible to have a task difficulty framework that can
be generalized from context to context?
3. What are the teachers’ perceptions of task difficulty in a
Chinese context?
4. What are the factors that are considered to affect task
difficulty in this context?
 Underlying abilities:
(1) competence-oriented underlying abilities;
(2) a structure made up of different interactive and interrelated components (Canale & Swain, 1980; Bachman, 1990);
(3) different performances drawing upon these underlying
abilities (Bachman, 1990);
(4) sampling such underlying abilities in a comprehensive
and systematic manner so as to provide the basis for
generalizing to non-testing situations.
 Predicting performance: the way abilities are actually used
through tasks (factors may affect performance)
 Generalizing from context to context: to characterize features
of context in order to identify what different
contexts have in common and how knowledge of performance in one area
could be the basis for predicting a learner’s performance in
another area.
 A processing approach: to establish a sampling frame for the
range of performance conditions which operate so that
generalizations can be made, in a principled way, to a range
of processing conditions. (Table (1).doc)
Research Design and Methodology
 A hybrid method of quantitative analysis and qualitative
analysis in both deductive and inductive ways:
[Diagram: matrix → studies → matrix; labels: deductive, inductive]
(1) A correlational analysis to explore the relationship between
tasks and task difficulty components; and
(2) a qualitative analysis of verbal self-reports and focus group
interviews on the factors that affect task difficulty
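As an illustration of the correlational step in (1), the sketch below computes a Pearson correlation between two raters' difficulty ratings of the same tasks. The slides do not state which coefficient or software was used, so Pearson's r is an assumption here; the ratings and the function name `pearson_r` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length rating lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rating lists of equal length (at least 2 tasks)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical difficulty ratings (1 = easy ... 4 = difficult) for six tasks.
rater_1 = [1, 2, 2, 3, 4, 4]
rater_2 = [1, 1, 2, 3, 3, 4]

r = pearson_r(rater_1, rater_2)
print(round(r, 2))
```

A high value indicates that the two raters order the tasks similarly, which is the sense of rater agreement the scheme is being tested for.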
Two research phases:
(1) Phase one: Study One~Study Four (March~May 2004)
Application of the Norris-Brown et al. task difficulty matrix
(2) Phase two: Study Five~Study Ten (Oct 2004~2005)
Establishment and evolution of the IPO task difficulty matrix
Summary of research participants
Participants | Number | Experience | Fields
Raters | 69 | 5~25 years | Teachers; TEFL material writers; test writers; curriculum developers
Students | 60 | Grade 8 | 2–3/w 40-minute lessons (Grade 3~Grade 6); 5–6/w 45-minute lessons (Grade 7~Grade 9)
Summary of research instruments
Instruments | Data type | Research Question (RQ) to be addressed
Task difficulty matrix; Holistic vertical line | Quantitative data | Constructing IPO task difficulty matrix; Validating IPO task difficulty matrix (RQ 1 and RQ 4)
Introspective reports; Focus-group interview; Documents: CNEC | Qualitative data | Teachers’ and students’ perceptions of tasks and task difficulty (RQ 3)
All instruments | Quantitative + Qualitative data | Constructing, validating, & generalizing the IPO task difficulty matrix; Teachers’ and students’ perceptions of tasks and task difficulty (RQ 1, RQ 2, RQ 3, and RQ 4)
Research studies
1. Phase one: Applying Norris et al.’s task difficulty matrix
Study One~Four (March~May 2004)
 Applying modified Norris et al. (1998)’s task difficulty matrix
(Table 1) to 28 professional and experienced English teachers to
investigate its transferability in mainland China
 Results:
(1) Impossible to rate task difficulty with pluses and minuses
(2) Among the fourteen tasks, ratings agreed on only three: Planning the
weekend, Shopping in supermarket, Radio weather information
(common general topics in daily life).
(3) Tremendous disagreement between the Chinese teachers’ ratings
and Norris et al.’s predicted difficulty level (Table 2).
Task difficulty matrix for prototypical tasks: ALP
(Norris et al., 1998, p. 84)
Rows: characteristic tasks (by theme), e.g. Planning the weekend; Highlighting the main idea
Columns (by component):
code: range; # input sources
cognitive complexity: in/out organization; input availability
communicative demand: mode; response level
diff. index
Modified task difficulty matrix
Task | Code C | Cognit C | Comm S | Task C
1 … 6
Code C (complexity): linguistic complexity; linguistic input
Cognit C (Cognitive complexity): cognitive familiarity; cognitive processing; amount of input
Comm S (Stress): time; interaction; context
Task C (conditions): language proficiency; language abilities; language skills; culture & other
Phase one: Conclusions
 The Norris et al. (1998) and Brown et al. (2002) matrix
could not be reliably employed
 There was a discrepancy in the difficulty levels of
tasks between Norris et al. and the Chinese teachers
 Agreement on tasks with general topics, yet much disagreement
on more cognitively demanding tasks
 The Norris et al. tasks might not be appropriate, and an
alternative framework for predicting task difficulty
might be needed
Phase Two: Establishing IPO task difficulty matrix
(Studies Five~Ten; 2004~2005)
1 The IPO-CFS task difficulty scheme
2 CNEC-theme related tasks (Table 3)
 24 CNEC (2001) themes
Personal information; Family, friends and people around;
Personal environments; Daily routines; School life; Interests
and hobbies; Emotions; Interpersonal relationships; Plans
and intentions; Festivals, holidays and celebrations; Shopping;
Food and drink; Health and fitness; Weather; Entertainment
and sports; Travel and transport; Language learning; Nature;
The world and the environment; Popular science and modern
technology; Topical issues; History and geography; Society;
Literature and art
Input | Processing | Output
Content | Content | Content
Form | Form | Form
Modality | Modality | Modality
Support (making input clearer) | Support (making processing more efficient) | Support (making oral/written expression more accurate and fluent)
Findings
 Study 5: Correlation for the means of both teachers: .65
 However, the 2 sets of tasks generated variations in difficulty within one
theme, leading to further research into task characteristics and
requirements, and task analysis (Table 4)
 Study 6 and 7: 24 CNEC tasks (Table 5) that vary in difficulty
 IPO x extended CFMS (Table 6)
 2 self-reporters + rater comments: detailed verbal self-report data to
examine mental processes during rating of the tasks and help refine the
matrix.
 Findings:
(1) Encouraging correlations: all but one range from .52 to .83. The exceptional
pair of .34 led to further data collection from both raters and students on
the matrix’s reliability and validity.
(2) The matrix is improving, but needs input from actual raters; Input,
Processing, and Output are inseparable
Pairwise correlations (S, D, L, P, X, Y)
     S     D     L     P     X     Y
S    1    .53   .61   .83   .82   .72
D          1    .52   .70   .54   .34
L                1    .79   .64   .76
P                      1    .81   .74
X                            1    .70
Y                                  1
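The summary reported in the findings (all but one correlation between .52 and .83, with one pair at .34) can be checked directly against the matrix values. The sketch below is only an illustrative check; the coefficients are copied from the matrix above, while the variable names are hypothetical.

```python
from itertools import combinations

labels = ["S", "D", "L", "P", "X", "Y"]

# Upper triangle of the correlation matrix, keyed by pair of labels.
upper = {
    ("S", "D"): 0.53, ("S", "L"): 0.61, ("S", "P"): 0.83, ("S", "X"): 0.82, ("S", "Y"): 0.72,
    ("D", "L"): 0.52, ("D", "P"): 0.70, ("D", "X"): 0.54, ("D", "Y"): 0.34,
    ("L", "P"): 0.79, ("L", "X"): 0.64, ("L", "Y"): 0.76,
    ("P", "X"): 0.81, ("P", "Y"): 0.74,
    ("X", "Y"): 0.70,
}

# Sanity check: every one of the 15 possible pairs is present exactly once.
if set(upper) != set(combinations(labels, 2)):
    raise ValueError("the matrix is missing a pair or contains an unexpected one")

# Find the weakest pair, then the range of the remaining coefficients.
lowest_pair = min(upper, key=upper.get)
others = [r for pair, r in upper.items() if pair != lowest_pair]

print(lowest_pair, upper[lowest_pair])
print(min(others), max(others))
```

The weakest pair is D and Y, and the remaining fourteen coefficients span .52 to .83, which matches the summary in the findings.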
Refining the IPO task difficulty matrix
Studies Eight~Ten
 Raters: Professionals (10 + 5 + 9)
 CNEC-theme related tasks (15 + 9)
 IPO x Information, Language, Performance
conditions, Support (ILPS) (Table 8)
 Inter-rater correlations:
(1) Study Eight correlation range: .69 to .92
(2) Study Nine correlation range: .62 to .91
(3) Study Ten correlation range: .75 to .87.
Fifteen prototypical tasks
Themes sampled: 1: Personal information; 11: Shopping; 12: Food and drink; 18: Nature; 20: Popular science
Tasks are graded easy / medium / difficult within a theme, e.g. theme 1: Where does Linda live? (easy); Applying for a summer club (medium); Li Pei’s bedroom (difficult)
Other task titles: What a nice bike; Put the vegetables in order; A quiz: What am I?; A plan of the shops; Customer Satisfaction Form; Classifying pets; What are they?; Natural disasters; Keep safe from sharks; Plastics; The Vux; Shopping list
IPO task difficulty matrix for
task-based assessment (Table9.doc)
Component: INPUT; PROCESSING; OUTPUT
Dimensions:
I Information:
 Amount
 Type: Familiar-unfamiliar; Personal-impersonal; Concrete-abstract; Unauthentic-authentic
 Organization: Structured-unstructured
 Operations: Retrieval vs. transformation; Reasoning
II Language: Level of syntax; Level of vocabulary
III Performance Conditions: Modality; Time pressure
IV Support
Structured-Unstructured
1. Input information or task has a clear and tight
organizational structure, e.g. clear narrative with
beginning, middle, end. All or most elements of task are
clearly connected to one another.
2. Input information or task has organizational structure, but
this is fairly loose, so that some connections need to be
made by the test-taker.
3. Input information or task is partly organized, with some
sections which are structured and organized, but with
other areas which need more active integration by the
test-taker.
4. Information or task requires test-taker to bring
organization to material which isn’t organized. Test taker
has to make the links which are necessary for the task to
be done, or to organize the material which is involved.
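One way to hold such a four-point scale in code, and to combine several raters' judgements of a task on it, is sketched below. The slides do not say how ratings were aggregated, so the simple mean is an assumption made purely for illustration; the descriptor strings are shortened paraphrases of the four levels above, and the task labels, ratings, and function name `mean_rating` are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Short paraphrases of the four Structured-Unstructured descriptors.
ORGANIZATION_SCALE: Dict[int, str] = {
    1: "clear and tight organizational structure",
    2: "loose structure; some connections left to the test-taker",
    3: "partly organized; other areas need active integration",
    4: "unorganized; test-taker must bring organization to the material",
}


def mean_rating(ratings: List[int]) -> float:
    """Average several raters' scale points for one task (assumed aggregation)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for point in ratings:
        if point not in ORGANIZATION_SCALE:
            raise ValueError(f"rating {point} is not on the 1-4 scale")
    return mean(ratings)


# Hypothetical ratings from three raters for two invented task labels.
ratings_by_task = {
    "task_easy_example": [1, 1, 2],
    "task_hard_example": [3, 4, 4],
}

summary = {task: mean_rating(points) for task, points in ratings_by_task.items()}
for task, value in summary.items():
    print(task, round(value, 2))
```

Keeping the descriptors alongside the numeric points makes it easy to show raters the wording they are judging against, and the validation step rejects any rating that falls outside the scale.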
A comparison between Brown et al.’s matrix and
the IPO task difficulty matrix
 Similarities (5):
Primary research question; Similar purposes; similar design
of matrix; an example of an assessment alternative;
Sources
 Differences (10):
Test Objects; Task Themes; Task Focus; +(-)related to
curriculum; Task Selection; Definitions/Labels;
Characteristics; Layout; Rating System; Raters
Focus group interview summary
Features: Sample tasks
Familiar: Pets (Task 7); Feelings (Task 12); Writing an e-mail (Task 6)
Unfamiliar: Education policy and compulsory education (Task 3)
Authentic: Describing feelings (Task 12); Replying emails (Task 6); Pets (Task 7)
Length (Task 3)
Difficult
Easy
Amount
Replying emails (Task 6); Pets (Task 7)
Task 7; Task 12
VI Implications: IPO-ILPS task difficulty matrix
 Tasks and Task-based Assessment
(1) Estimating task difficulty: to use learner performances on sampled
tasks to predict future performances on tasks that are constituted
by related difficulty components. (Norris et al., 1998:58)
(2) Students with greater levels of underlying ability will be able to
successfully complete tasks which come higher on such a scale of
difficulty. (Skehan, 1998:184)
 Language Teaching and Learning
(1) may be useful for syllabus designers to develop and sequence
pedagogic tasks in order of increasing task difficulty: to promote
language proficiency and facilitate L2 development, the
acquisition of new L2 knowledge, and restructuring of existing
L2 representations (Robinson, 2001, p. 34).
(2) may help language teachers and testers when they make decisions
regarding classroom teaching and design, and regarding the task-based
assessments appropriate for the testing inferences they
must make in their own education settings.
VII Limitations
 language assessment does not necessarily need to
“subscribe to a single model of language test
development and use”: teachers and students may be
interested more “in specific aspects of performance
more appropriately conceived of as task- or text-related
competence” (Brown et al., 2002, p. 116).
 the matrix and procedures developed and
investigated here take a cognitive perspective;
factors visible from other perspectives are not explored.
 the nature of the target language tasks that serve as
the basis of the assessment instruments and
procedures: task appropriateness in particular
learning contexts + locally defined assessment needs.

VIII Issues and suggestions for future research
 the IPO task difficulty matrix for TBA:
-- to promote the generalizability: more research needed in
different regions in EFL contexts
 Tasks: Both carefully sampled spoken and written tasks +
calibrated test items for reading and listening.
 the social practice (McNamara & Roever, 2006) of the task
difficulty matrix
 A more qualitative dimension in judging the difficulty level of a
task would add a qualitative profile, mainly of task features,
to the main outcome.
 Role of strategies in determining the difficulty levels
 To what extent does the IPO task difficulty matrix provide a
basis for the assessment of various language activities and
competences?
IX Conclusions
 Tasks are an interesting basis for exploring language
teaching (Skehan, 2006a) and language testing.
 “We need to find more and find out how to make tasks work
more effectively. We don’t know yet how this can be done,
but we will never know if we don’t do research” (Skehan,
2006a).
 Hopefully, Norris and Brown et al.’s (1998; 2002) studies
and the studies reported in this thesis have provided useful
information and instruments that will profitably contribute
to this research area of task-based teaching and learning,
and assessment.
Acknowledgments
This is a presentation based on my Ph.D. research under
the supervision of Professor Peter Skehan. My great gratitude
goes to my supervisor, Professor Skehan. I also thank my
committee members, Professor Jane Jackson and Professor
David Coniam at the Chinese University of Hong Kong, who
have contributed thoughtful and helpful suggestions to this
study. My thanks go to the participants in the research.
Selected References
Bachman, L. F. (2002). Some reflections on task-based language performance
assessment. Language Testing, 19(4), 453-476.
Brown, J. D., Hudson, T., Norris, J. & Bonk, W. J. (2002). An investigation of second
language task-based performance assessments. Second Language Teaching &
Curriculum Center, University of Hawai’i at Manoa.
Coniam, D., & Falvey, P. (1999). Assessor training in a high-stakes test of speaking:
the Hong Kong English language benchmarking initiative. Melbourne Papers in
Language Testing, 8 (2), 1–19.
García Mayo, M. P. (Ed.). (2007). Investigating tasks in formal language learning.
Clevedon: Multilingual Matters.
Van den Branden, K. (Ed.). (2006). Task-based language education: From theory to
practice (pp. 1-16). Cambridge: Cambridge University Press.
Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral
proficiency tasks: What does the test-taker have to offer? Language Testing,
19(4), 343-368.
Elder, C., Knoch, U., Barkhuizen, G., & von Randow, J. (2005). Individual feedback to
enhance rater training: Does it work? Language Assessment Quarterly, 2(3), 175-196.
Ellis, R. (2003). Task-based language learning and teaching. Oxford: Oxford University
Press.
Ellis, R., & Barkhuizen, G. (2005). Analyzing learner language. Oxford: Oxford
University Press.
Iwashita, N., Elder, C., & McNamara, T. (2001). Can we predict task difficulty in an oral
proficiency test? Exploring the potential of an information-processing approach to
task design. Language Learning, 51(3), 401-436.
Knoch, U., Read, J., & Von Randow, J. (2006, June). Re-training writing raters online:
How does it compare with face-to-face training? Paper presented at the 28th
Annual Language Testing Research Colloquium of the International Language
Testing Association, University of Melbourne, Australia (June 29, 2006).
Ministry of Education, China (2001). A pilot paper: The national English curriculum
standards. Beijing: Beijing Normal University Press.
Norris, J. M., Brown, J. D., Hudson, T. D., & Bonk, W. (2002). Examinee abilities and
task difficulty in task-based second language performance assessment.
Language Testing, 19(4), 395-418.
Nunan, D. (1993). Task-based syllabus design: Selecting, grading, and sequencing
tasks. In G. V. Crookes & S. M. Gass (Eds.), Tasks in a pedagogical context:
Integrating theory and practice (pp. 55-68). Clevedon, Avon: Multilingual
Matters.
Nunan, D. (2004). Task-Based Language Teaching. Cambridge: Cambridge
University Press.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring
interactions in a componential framework. Applied Linguistics, 22(1), 27-57.
Skehan, P. (1996). A framework for the implementation of task-based instruction.
Applied Linguistics, 17 (1), 38-62.
Skehan, P. (1998). A Cognitive approach to language learning. Oxford: Oxford
University Press.
Skehan, P. (1999). The influence of task structure and processing conditions on
narrative retellings. Language Learning, 49(1), 93–120.
Skehan, P. (2001). Tasks and language performance assessment. In M. Bygate, P.
Skehan & M. Swain (Eds.), Researching pedagogic tasks: Second language
learning, teaching and testing (pp. 167-185). London: Longman.
Skehan, P. (2003). Task-based instruction. Language Teaching, 36(1), 1-14.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as
influences on foreign language performance. Language Teaching Research,
1(3), 185–211.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. (1998). Second language development in
writing: Measures of fluency, accuracy & complexity. Second Language
Teaching & Curriculum Center, Honolulu: University of Hawai‘i Press.
The Great Wall starts from where we stand: A long way to go…