Assessing the Fit of IRT Models in Language Testing
Muhammad Naveed Khalid
Ardeshir Geranpayeh
© UCLES 2013
Outline
• Item Response Theory (IRT)
• Importance of Model Fit within IRT
• Fit Procedures
• Issues and Limitations
• Lagrange Multiplier (LM) Test
• An empirical study using LM Fit statistics
• Sharing Results
• Conclusions
Item Response Theory (IRT)
• A family of mathematical models that provide a common framework for describing people and items
• Examinee performance can be predicted in terms of the underlying trait
• Provides a means for estimating abilities of people and characteristics of items
IRT Models
Dichotomous or Discrete
1 Parameter Logistic Model / Rasch (1PL)
2 Parameter Logistic Model (2PL)
3 Parameter Logistic Model (3PL)
Polytomous or Scalar
Partial Credit Model (PCM)
Generalized Partial Credit Model (GPCM)
Graded Response Model (GRM)
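A minimal sketch of how the three dichotomous models relate, assuming the conventional logistic parameterization. The function and parameter names (`irt_prob`, `theta` for ability, `b` for difficulty, `a` for discrimination, `c` for the lower asymptote) are illustrative choices and do not come from the slides.

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    With c = 0 this is the 2PL model; with a = 1 and c = 0 it is the
    1PL/Rasch model. Parameter names are conventional, not from the slides.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic

# Ability equal to difficulty: the Rasch probability is one half.
print(irt_prob(0.0, 0.0))                  # 0.5
# A lower asymptote of 0.25 lifts the same point to 0.25 + 0.75 * 0.5.
print(irt_prob(0.0, 0.0, a=1.5, c=0.25))   # 0.625
```

Writing the 1PL and 2PL as special cases of one function mirrors the nesting of the models: each simpler model is obtained by fixing a parameter of the more general one.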
Shape of Item Response Function
Model for Item with 5 response categories
[Figure: category response curves; labels: Probability, Response Category]
IRT Applications
In language testing, IRT is mainly used for:
• Test development
• Item banking
• Differential item functioning (DIF)
• Computerized adaptive testing (CAT)
• Test equating, linking and scaling
• Standard setting
The utility of an IRT model depends on the extent to which the model accurately reflects the data.
Model Fit from Item Perspective
Measurement Invariance (MI): Item responses can be
described by the same parameters in all subpopulations.
Item Characteristic Curve (ICC): Describes the relation
between the latent variable and the observable
responses to items.
Local Independence (LI): Responses to different items
are independent given the latent trait variable value.
Unidimensionality
Speededness
Global
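Local independence can be illustrated with a small sketch: given the latent trait value, the probability of a whole response pattern is the product of the item probabilities. The Rasch form and the names `rasch_prob` and `pattern_likelihood` are assumptions made for illustration only.

```python
import math

def rasch_prob(theta, beta):
    """Rasch probability of a correct response (ability theta, difficulty beta)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def pattern_likelihood(theta, betas, responses):
    """P(response pattern | theta), computed as a product over items.

    The product form is exactly what local independence asserts.
    """
    likelihood = 1.0
    for beta, x in zip(betas, responses):
        p = rasch_prob(theta, beta)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood

# Two items with difficulty equal to the ability: 0.5 * 0.5
print(pattern_likelihood(0.0, [0.0, 0.0], [1, 0]))  # 0.25
```

If responses to two items remain associated after conditioning on the trait, this product no longer describes the data, which is what an LI fit statistic is meant to detect.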
Consequences of Misfit
Yen (2000) and Wainer & Thissen (2003) have shown that inadequate model-data fit has adverse consequences, including:
• Biased ability estimates
• Unfair ranks
• Wrongly equated scores
• Student misclassifications
• Reduced score precision
• Weakened validity
Existing Item Fit Procedures
Chi-Square Statistics
Tests of the discrepancy between the observed and
expected frequencies.
Pearson-Type Item-Fit Indices (Yen, 1984; Bock, 1972).
Likelihood Ratio Based Item-Fit Indices (McKinley & Mills,
1985).
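A minimal sketch of a Pearson-type item-fit statistic, in the style of Yen's (1984) Q1: examinees are sorted into ability groups, and the observed proportion correct in each group is compared with the proportion the model predicts. The function name, the inputs, and the example numbers are hypothetical; how the groups are formed is left to the caller.

```python
def pearson_item_fit(group_sizes, observed, expected):
    """Sum over groups of N_k * (O_k - E_k)^2 / (E_k * (1 - E_k)).

    group_sizes: number of examinees in each ability group
    observed:    observed proportion correct per group
    expected:    model-predicted proportion correct per group
    """
    if not (len(group_sizes) == len(observed) == len(expected)):
        raise ValueError("all inputs must have one entry per group")
    stat = 0.0
    for n_k, o_k, e_k in zip(group_sizes, observed, expected):
        stat += n_k * (o_k - e_k) ** 2 / (e_k * (1.0 - e_k))
    return stat

# Hypothetical three-group example (numbers invented for illustration).
print(pearson_item_fit([100, 100, 100], [0.35, 0.60, 0.80], [0.30, 0.60, 0.85]))
```

The statistic is zero when observed and expected proportions agree in every group and grows with both the size of the discrepancies and the group sizes.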
Issues in Existing Fit Procedures
• The standard theory for chi-square statistics does not hold.
• The stochastic nature of the item parameter estimates is not taken into account.
• Subgroups for the test are formed from model-dependent trait estimates.
• The appropriate number of degrees of freedom is unclear.
• The statistics are sensitive to test length and sample size.
Lagrange Multiplier (LM) Test
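Glas (1999) proposed the LM test for evaluating model fit.
LM tests are used for testing a restricted model against a more general alternative.
Consider a null hypothesis about a model with parameters $\phi_0$.
This model is a special case of a general model with parameters $\phi$:
$$\phi_0' = (\phi_{01}', c)$$
$$LM(c) = h(c)' \, W^{-1} \, h(c)$$
Here $h(c)$ is the vector of first-order derivatives of the log-likelihood with respect to the parameters fixed at $c$, and $W$ is its covariance matrix. Under the null hypothesis, $LM(c)$ is asymptotically chi-square distributed with as many degrees of freedom as there are parameters in $c$.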
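LM Item Fit Statistics

MI / DIF
$$P_i(\theta_n) = \frac{\exp(\alpha_i(\theta_n - \beta_i) + y_n \delta_i)}{1 + \exp(\alpha_i(\theta_n - \beta_i) + y_n \delta_i)}$$
Null Model: $\delta_i = 0$
Alternative Model: $\delta_i \neq 0$

LI
$$P(X_{ni} = 1, X_{nl} = 1 \mid \theta_n, \delta_{il}) = \frac{\exp(\alpha_i(\theta_n - \beta_i + \theta_n - \beta_l + \delta_{il}))}{1 + \exp(\alpha_i(\theta_n - \beta_i + \theta_n - \beta_l + \delta_{il}))}$$
Null Model: $\delta_{il} = 0$
Alternative Model: $\delta_{il} \neq 0$

ICC
$$P(X_{ni} = 1 \mid \theta_n, \delta_{ig}) = \frac{\exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}{1 + \exp(\alpha_i(\theta_n + \delta_{ig} - \beta_i))}$$
Null Model: $\delta_{ig} = 0$
Alternative Model: $\delta_{ig} \neq 0$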
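A minimal sketch of the MI/DIF and ICC alternative models, assuming the usual reading of the symbols: `theta` is ability, `alpha` discrimination, `beta` difficulty, and `delta` the extra parameter whose value is tested against zero. The function names are hypothetical. Setting the extra parameter to zero recovers the null model in both cases.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def dif_prob(theta, alpha, beta, delta, y):
    """MI/DIF alternative: group indicator y (0 or 1) shifts the logit by delta."""
    return logistic(alpha * (theta - beta) + y * delta)

def icc_prob(theta, alpha, beta, delta_g):
    """ICC alternative: membership of a score group shifts the trait by delta_g."""
    return logistic(alpha * (theta + delta_g - beta))

# Under the null (delta = 0) both expressions reduce to the same 2PL probability.
print(dif_prob(0.5, 1.2, 0.1, 0.0, 1) == icc_prob(0.5, 1.2, 0.1, 0.0))  # True
# A positive delta raises the success probability for the group with y = 1.
print(dif_prob(0.5, 1.2, 0.1, 0.4, 1) > dif_prob(0.5, 1.2, 0.1, 0.4, 0))  # True
```

The LM test only requires estimates under the null model, so the alternative models above serve to define which departure is being tested rather than something that has to be fitted.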
Empirical Example
Data from Cambridge English First (FCE)
– Reading: 3 parts / 30 questions
– Listening: 4 parts / 30 questions
Sample size over 35000
The approach can be applied to any other language exam
MI
Lagrange tests for Rasch MODEL
-----------------------------------------------------------
                            Focal-Group    Reference   Abs.
    Item       LM  df  Prob   Obs   Exp    Obs   Exp   Dif.
-----------------------------------------------------------
  1 Item1    0.60   1  0.44  0.74  0.72   0.75  0.76   0.01
  2 Item2    0.34   1  0.56  0.94  0.94   0.96  0.95   0.00
  3 Item3    0.04   1  0.84  0.70  0.71   0.75  0.75   0.00
  4 Item4    2.10   1  0.15  0.78  0.75   0.78  0.79   0.02
  5 Item5    1.77   1  0.18  0.82  0.80   0.81  0.82   0.02
  6 Item6    0.15   1  0.69  0.70  0.71   0.75  0.75   0.01
  7 Item7    1.43   1  0.23  0.71  0.68   0.70  0.71   0.02
  8 Item8    0.40   1  0.53  0.87  0.87   0.89  0.90   0.01
  9 Item9    0.17   1  0.68  0.89  0.88   0.90  0.90   0.00
 10 Item10   0.85   1  0.36  0.77  0.78   0.83  0.82   0.01
 11 Item11   0.97   1  0.32  0.87  0.85   0.87  0.87   0.01
 12 Item12   0.09   1  0.76  0.87  0.87   0.89  0.89   0.00
 13 Item13   7.10   1  0.01  0.45  0.50   0.59  0.56   0.04
 14 Item14   2.04   1  0.15  0.51  0.55   0.61  0.60   0.02
 15 Item15   0.00   1  0.97  0.72  0.72   0.75  0.75   0.00
 16 Item16   0.03   1  0.85  0.62  0.62   0.68  0.68   0.00
 17 Item17   2.63   1  0.10  0.48  0.52   0.60  0.59   0.03
 18 Item18   0.01   1  0.91  0.44  0.44   0.49  0.49   0.00
 19 Item19   0.36   1  0.55  0.78  0.79   0.83  0.83   0.01
 20 Item20   1.05   1  0.31  0.66  0.69   0.73  0.72   0.02
 21 Item21   2.77   1  0.10  0.80  0.83   0.88  0.87   0.02
 22 Item22   4.17   1  0.04  0.71  0.75   0.81  0.80   0.02
 23 Item23   0.58   1  0.44  0.87  0.85   0.87  0.87   0.01
 24 Item24   0.13   1  0.71  0.83  0.83   0.87  0.87   0.00
 25 Item25   0.94   1  0.33  0.92  0.93   0.95  0.95   0.01
 26 Item26   5.05   1  0.02  0.60  0.55   0.59  0.61   0.03
 27 Item27   4.55   1  0.03  0.64  0.60   0.64  0.65   0.03
 28 Item28   2.76   1  0.10  0.49  0.45   0.49  0.50   0.03
 29 Item29   0.26   1  0.61  0.62  0.61   0.66  0.67   0.01
 30 Item30   3.07   1  0.08  0.70  0.66   0.69  0.71   0.03
-----------------------------------------------------------
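The Prob column of the MI table can be checked from the LM and df columns, since the LM statistic is referred to a chi-square distribution. The sketch below uses closed-form upper-tail probabilities for df = 1 and df = 2 only, and flags items at a 0.05 level; that level is an assumption, as the slides do not state the threshold used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")

# LM values for Item1..Item30, copied from the MI table (df = 1 for every item).
mi_lm = [0.60, 0.34, 0.04, 2.10, 1.77, 0.15, 1.43, 0.40, 0.17, 0.85,
         0.97, 0.09, 7.10, 2.04, 0.00, 0.03, 2.63, 0.01, 0.36, 1.05,
         2.77, 4.17, 0.58, 0.13, 0.94, 5.05, 4.55, 2.76, 0.26, 3.07]

# Items whose p-value falls below the assumed 0.05 level.
flagged = [item for item, lm in enumerate(mi_lm, start=1) if chi2_sf(lm, 1) < 0.05]
print(flagged)  # [13, 22, 26, 27]
```

Four items are flagged, which agrees with the count of MI violations reported in the Conclusions.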
ICC
Lagrange multipliers for Rasch MODEL
-------------------------------------------------------------------------
                    Groups:        1             2             3     Abs.
    Item        LM  df  Prob  Obs.  Exp.   Obs.  Exp.   Obs.  Exp.   Dif.
-------------------------------------------------------------------------
  1 Item1     3.56   2  0.17  0.56  0.55   0.72  0.71   0.82  0.83   0.01
  2 Item2     1.98   2  0.37  0.60  0.59   0.79  0.78   0.89  0.90   0.01
  3 Item3     1.25   2  0.54  0.54  0.56   0.76  0.74   0.86  0.87   0.01
  4 Item4     1.23   2  0.54  0.67  0.66   0.83  0.83   0.91  0.92   0.01
  5 Item5     2.81   2  0.24  0.71  0.71   0.86  0.84   0.91  0.92   0.01
  6 Item6     2.96   2  0.23  0.58  0.57   0.68  0.71   0.84  0.83   0.02
  7 Item7     2.65   2  0.27  0.17  0.19   0.33  0.31   0.49  0.49   0.01
  8 Item8     4.82   2  0.09  0.65  0.66   0.76  0.77   0.87  0.86   0.01
  9 Item9     4.40   2  0.11  0.20  0.20   0.33  0.36   0.60  0.58   0.02
 10 Item10    3.89   2  0.14  0.24  0.23   0.51  0.54   0.84  0.82   0.02
 11 Item11    1.62   2  0.44  0.73  0.72   0.86  0.88   0.95  0.95   0.01
 12 Item12   19.55   2  0.00  0.42  0.37   0.50  0.57   0.77  0.76   0.04
 13 Item13    0.94   2  0.63  0.43  0.44   0.76  0.75   0.91  0.92   0.01
 14 Item14    2.82   2  0.24  0.64  0.63   0.89  0.88   0.96  0.97   0.01
 15 Item15   11.03   2  0.00  0.36  0.36   0.65  0.63   0.81  0.84   0.02
 16 Item16    3.88   2  0.14  0.52  0.51   0.83  0.83   0.95  0.96   0.01
 17 Item17    0.84   2  0.66  0.51  0.51   0.77  0.77   0.92  0.92   0.01
 18 Item18    0.85   2  0.65  0.25  0.25   0.41  0.41   0.59  0.60   0.01
 19 Item19    0.99   2  0.61  0.49  0.50   0.70  0.70   0.86  0.85   0.01
 20 Item20    0.90   2  0.64  0.34  0.33   0.59  0.59   0.81  0.81   0.00
 21 Item21    1.02   2  0.60  0.18  0.17   0.27  0.28   0.44  0.43   0.01
 22 Item22    2.92   2  0.23  0.43  0.44   0.72  0.72   0.90  0.89   0.01
 23 Item23    0.26   2  0.88  0.73  0.73   0.93  0.93   0.98  0.98   0.00
 24 Item24    1.47   2  0.48  0.69  0.70   0.91  0.90   0.97  0.97   0.01
 25 Item25    0.61   2  0.74  0.45  0.46   0.61  0.59   0.71  0.72   0.01
 26 Item26    8.56   2  0.01  0.53  0.56   0.74  0.71   0.81  0.82   0.02
 27 Item27    2.76   2  0.25  0.36  0.36   0.56  0.58   0.79  0.78   0.01
 28 Item28    1.64   2  0.44  0.38  0.36   0.53  0.56   0.76  0.75   0.02
 29 Item29    0.31   2  0.86  0.55  0.55   0.78  0.79   0.92  0.92   0.00
 30 Item30    2.21   2  0.33  0.37  0.39   0.53  0.50   0.62  0.63   0.02
-------------------------------------------------------------------------
LI
Lagrange multipliers for Rasch MODEL
---------------------------------------------------------
Itm Itm     LM  df  Prob     Observed Expected    Abs.Dif
---------------------------------------------------------
  2   1   0.15   1  0.70  0.55  0.55  0.62  0.63     0.01
  3   2   6.31   1  0.04  0.57  0.59  0.71  0.69     0.01
  4   3   1.79   1  0.18  0.62  0.64  0.72  0.71     0.02
  5   4   0.26   1  0.61  0.72  0.73  0.77  0.77     0.01
  6   5   0.07   1  0.79  0.75  0.75  0.82  0.82     0.01
  7   6   0.02   1  0.88  0.51  0.52  0.62  0.61     0.03
  8   7  23.95   1  0.00  0.53  0.59  0.70  0.66     0.03
  9   8   0.27   1  0.61  0.61  0.61  0.76  0.76     0.01
 10   9   1.97   1  0.16  0.40  0.42  0.68  0.67     0.01
 11  10   1.20   1  0.27  0.61  0.60  0.78  0.79     0.01
 12  11  24.08   1  0.00  0.72  0.77  0.93  0.91     0.05
 13  12   2.11   1  0.15  0.53  0.56  0.81  0.80     0.01
 14  13   4.24   1  0.06  0.68  0.71  0.91  0.90     0.01
 15  14  41.66   1  0.00  0.14  0.25  0.62  0.60     0.05
 16  15   4.02   1  0.07  0.70  0.69  0.84  0.85     0.02
 17  16   7.04   1  0.01  0.66  0.70  0.87  0.86     0.01
 18  17   4.37   1  0.08  0.51  0.55  0.80  0.79     0.01
 19  18  13.69   1  0.00  0.52  0.57  0.84  0.82     0.04
 20  19   2.04   1  0.12  0.69  0.70  0.93  0.91     0.02
 21  20   3.85   1  0.05  0.41  0.46  0.67  0.66     0.01
 22  21   1.71   1  0.11  0.80  0.82  0.92  0.91     0.01
 23  22   2.01   1  0.16  0.79  0.82  0.94  0.94     0.01
 24  23  10.60   1  0.00  0.62  0.72  0.93  0.92     0.03
 25  24   1.02   1  0.31  0.61  0.58  0.84  0.84     0.02
 26  25   2.34   1  0.13  0.58  0.60  0.82  0.82     0.01
 27  26   2.10   1  0.09  0.41  0.45  0.67  0.65     0.02
 28  27   1.62   1  0.92  0.86  0.85  0.89  0.91     0.02
 29  28   0.17   1  0.68  0.48  0.47  0.63  0.63     0.01
 30  29   0.47   1  0.49  0.77  0.77  0.86  0.86     0.01
---------------------------------------------------------
Conclusions
• LM statistics overcome the issues with existing fit procedures
• Less computationally intensive
• The size of the residuals, reported as Abs.Dif, is highly valuable
• The IRT model fits the FCE data reasonably well
• Items flagged for violations: MI (4); ICC (3); LI (7)
• The magnitude of the violations is not severe
Thank you!
&
Questions