Principles in language testing
What is a good test?
What is the purpose of testing?
• The purpose of testing is to obtain information
on language skills of the learners.
• Information is costly. The more specific it is,
the more it costs to obtain.
– Is language testing targeting specific information?
– Costs here involve human and material resources
and TIME.
– Once an institution or teacher has decided that the
information is needed, they should be ready to meet
the costs.
Types of tests
Achievement tests (final or progress)
Proficiency tests
Pro-achievement tests
Diagnostic tests
Placement tests
Test marking
• Assessment scale (also: rating scale)
– criteria by which performances at a given
level will be recognized
– levels of performance:
10 (excellent), 9 (very good), 8 (good)
bands 0-9 in IELTS
1-100 pts in the national English examination
level descriptors – verbal descriptions of
performances that illustrate each level of
competence on the scale
Communicative language competences
• Linguistic competences
– lexical, grammatical, semantic, phonological,
orthographic, orthoepic
• Sociolinguistic competences
– markers of social relations, politeness conventions,
expressions of folk wisdom, register differences,
dialect and accent
• Pragmatic competences
– discourse competence (ability to arrange sentences in
proper sequence), functional competence (requests, invitations)
(adapted from CEFR 2001)
Competences vs. skills
• Competences are tested through skills
• The four major skills are subdivided into
minor subskills:
– reading comprehension:
reading for general orientation
reading for information
reading for main ideas
reading for specific information
reading for implications etc.
(CEFR 2001)
What is good testing?
It is valid
It is reliable
It is practical
It has positive impact
on the teaching
Test validity
• It is the appropriateness of the test; OR
• It shows that a test tests what it is
supposed to test; OR
• A test is valid if it measures accurately
what it is intended to measure.
• To establish that a test is valid, empirical
evidence is needed. The evidence comes
from different sources…
Types of validity
• Construct validity:
– the extent to which a test measures the
underlying psychological construct (“ability, …”)
– the extent to which a test reflects the essential
aspects of the theory on which that test is based
– an overarching notion of validity reflected in
many subordinate forms of validity
In a more complicated way…
• If a test does not have construct validity,
test scores will show CONSTRUCT-IRRELEVANT VARIANCE.
– E. g. in an advanced speaking test candidates
may be asked to speak on an abstract topic.
Personal engagement in the topic, however,
may weaken or improve the performance.
BUT: having previous knowledge about the
abstract topic should not be assessed.
Types of validity
• Content validity:
– the extent to which a test adequately and sufficiently measures
the particular skills it sets out to measure (cf. test specifications)
• Response validity:
– … test takers respond in the way expected by the test developers
• Predictive validity:
– … a test accurately predicts future performance
• Concurrent validity:
– … scores on a test relate to scores on another, external measure
• Face validity:
– … test appears to measure whatever it claims to measure
(Hughes 2003: 26-35)
Types of validity
• Nearly 40 different types have been
collected on a language testers’ forum…
• The more types of validity are
established for a test, the more valid that
test is considered to be.
Test reliability
• Quality of test scores resulting from test
– accuracy of marking and fairness of scores
– consistency of marking:
• similar scores on different days
• similar scores from different markers
– inter-rater reliability
– intra-rater reliability
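Consistency between markers can be checked numerically. The sketch below is one possible illustration, not a prescribed procedure: it assumes two markers have scored the same scripts and uses the Pearson correlation as a rough index of inter-rater reliability. The function name and the band scores are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one marker gives identical scores")
    return cov / sqrt(var_x * var_y)

# Hypothetical band scores given by two markers to the same six scripts
rater_a = [6, 7, 5, 8, 6, 9]
rater_b = [6, 6, 5, 8, 7, 9]
inter_rater = pearson(rater_a, rater_b)
print(round(inter_rater, 2))
```

The same function could be applied to one marker's scores for the same scripts on two different days to get a rough index of intra-rater reliability. A value close to 1 suggests consistent marking; what counts as acceptable depends on the stakes of the test.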
Factors influencing reliability
1. The performance of test takers
a sufficient number of items
restricted freedom of test behaviour
unambiguous items, clear instructions and rubrics
layout, good copies, familiar format
proper administration
2. The reliability of scorers
1. objective scoring vs. subjective scoring
2. restricting freedom of response
3. a detailed scoring/marking key
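Why "a sufficient number of items" matters can be illustrated with the Spearman-Brown formula, which estimates how reliability changes when a test is lengthened with comparable items. This is a minimal sketch under that assumption; the function name and the figures are illustrative only.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability of a test lengthened by `length_factor`.

    reliability   -- reliability coefficient of the current test (0..1)
    length_factor -- new length divided by current length (2 = twice as many items)
    Assumes the added items are comparable to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical case: a test with reliability 0.60 is doubled in length
doubled = spearman_brown(0.60, 2)
print(round(doubled, 2))
```

The gain from adding items shrinks as the test grows, so extra length has to be weighed against practicality.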
Test feasibility/practicality
• It is the ease with which the items/tasks
can be replicated in terms of resources
needed, e. g. time, materials, people
Washback effect
(sometimes ‘backwash’)
• It is a type of impact of examinations/tests
on the classroom situation.
• Washback may be positive or negative.
How to achieve positive washback?
1. Test the abilities/skills whose
development you want to encourage.
2. Sample widely and unpredictably.
3. Use direct testing.
4. Make testing criterion-referenced.
5. Base achievement tests on objectives.
6. Make sure that the test is known and
understood by students and other
References and additional reading
1. Alderson, J. C., C. Clapham and D. Wall.
1995. Language Test Construction and
Evaluation. Cambridge: CUP.
2. Hughes, A. 2003. Testing for Language
Teachers. 2nd ed. Cambridge: CUP.
3. Council of Europe. 2001. Common European
Framework of Reference for Languages.
Cambridge: CUP.
