Complexities of measuring change in psychotherapy
Chris Evans
Acknowledgements
Phil Richardson, Kevin Jones & others
Jan Lees, Mark Freestone, Nick Manning
& others
Michael Barkham and many others
Mark Ashworth, Mel Shepherd, Susan
Robinson, Maria Kordowicz & others
Susan McPherson
Jo-anne Carlyle
Classical psychometric model
We all have a position on an
unmeasurable (“latent”) dimension of
interest and of change (“true” value).
Quality of measurement a function of two
issues:
Reliability
Validity
“No validity without reliability”
Reliability
Extent to which the measure is uncontaminated by
random noise
Example 1 (“hard measurement”): working with
obesity and using poor scales to measure
people’s weight, the readings may fluctuate a lot,
entirely randomly.
Example 2 (“our measurement”): measuring
depression using a visual analogue rating
scale, the measurement may be contaminated
by imprecision in where the person places
their mark (and potentially by much else).
Validity
Extent to which the measure measures what it is supposed
to and is uncontaminated by systematic (non-random)
influences from other things it also picks up
Example 1: obesity – measuring people’s weight is pretty
useless unless you also measure height, as obesity is
(largely) a function of weight and height, so
measuring the one without the other leaves your
measure systematically biased: invalid.
Example 2: in a multi-item measure of depression, an item
asking about weight loss will be systematically
affected by recent deliberate dieting, by drinking alcohol
rather than eating wisely, or by serious physical illness
causing weight loss (or by famine, though rarely in the
western world).
Reliability: a graphical model
circles are “latent”,
unmeasurable, variables
squares are measurables
straight arrows show
directional influence
everything is nomothetic,...
...i.e. something on which
each person has a value
Psychometrics: classical model
assume one source of
common variance...
... the latent trait to measure
… only source of
covariation between items
each item is also affected by
“error”
errors are independent, and
... uncorrelated with the
latent trait of interest
Cronbach’s alpha
Reliability: proportion of
the measured variation
(sum of the boxes)
… that comes from the latent trait
... and not from the sources of
error
estimated as coefficient
alpha, based on the ratio of
covariance between items to
variance of the total score
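A minimal sketch of how coefficient alpha can be computed, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). Because the total score variance is the item variances plus all the inter-item covariances, alpha rises as the covariance share rises. The function name and the small score matrix are illustrative assumptions, not data from the talk.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]                          # number of items
    item_vars = x.var(axis=0, ddof=1)       # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores: 5 respondents on 3 items (made-up numbers)
scores = [[0, 1, 1],
          [1, 2, 2],
          [2, 2, 3],
          [3, 4, 3],
          [4, 4, 4]]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

If the items were perfectly parallel (every item giving the same score for a person), the function would return 1; as item scores covary less, it falls.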
Challenges of “our measurement”
We have little time or money for
measuring
What we measure is often either complex
(“quality of life”) or
idiosyncratic (“recovering from death of
partner bringing back abuse in childhood
and early death of abusing parent”).
Recent measures
• Format:
  • Short(ish),
  • multi-item,
  • self-report measures
• Intention
  • Not so much designed to provide strong measurement of a
    unidimensional latent variable but
  • … to provide rapid coverage of a broad range of issues, so as
    to capture the change most clients are likely to show.
• Typical e.g.s:
  • Brief Symptom Inventory
  • CORE-OM
  • OQ-45.
Typical measures
Multiple items, e.g.
“I have felt terribly alone and isolated”
Time focus
“Over the last week”
Use rating anchors by frequency:
“Not at all”, “Only occasionally”, “Sometimes”,
“Often”, “Most or all the time”
Or intensity
“Not at all” to “Extremely”
Issues about items
• I have felt I have someone to turn to for support when needed
  • What does “turning to” involve? What is “support”? How much
    does “when needed” limit applicability?
• I have felt O.K. about myself
  • How OK is OK?!
• I have felt able to cope when things go wrong
  • How wrong is wrong? What is coping? (Quite a few European
    languages don’t have a verb “to cope”)
• Tension and anxiety have prevented me doing important things
  • What if it was only tension? Or only anxiety? How important do
    things have to be to be important?
Issues about time frame
“Over the last week”
Do people really anchor to that?
Could it mean:
since Sunday?
since Monday?
the last seven days?
Issues about anchors
Not at all
Only occasionally
Sometimes
Often
Most or all the time
Is my “Only occasionally” your
“Sometimes”?
“Panel” change model
[Diagram: model at two occasions, t1 and t2]
Simple change variance model
Instead of modelling each occasion separately
… look at the variance of the differences between
… observed scores
… for each individual
This gives the internal reliability of item change
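The idea can be sketched as follows: difference each item between the two occasions for each person, then compute coefficient alpha on those item change scores as if they were items. The function names and the t1/t2 matrices are hypothetical illustrations, not the trial data.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def change_alpha(t1, t2):
    """Alpha of item change: per-person, per-item differences t2 - t1."""
    diffs = np.asarray(t2, dtype=float) - np.asarray(t1, dtype=float)
    return cronbach_alpha(diffs)

# Hypothetical data: 5 people, 3 items, two occasions (made-up numbers)
t1 = [[2, 3, 2], [3, 3, 4], [1, 2, 2], [4, 4, 3], [2, 2, 1]]
t2 = [[1, 2, 1], [1, 1, 2], [1, 2, 1], [2, 3, 2], [2, 1, 1]]
print(round(change_alpha(t1, t2), 2))
```

When item changes do not move together across people, the summed item change variances can exceed the variance of the total change score and this estimate goes negative.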
Item change
A binary (Y/N) item now has three possible
change scores:
-1, 0, +1
Three level item: five scores:
-2, -1, 0, +1, +2
Four level item: seven scores:
-3, -2, -1, 0, +1, +2, +3
n-level item: always 2n – 1 possible differences
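A short check of the 2n – 1 rule, enumerating every possible difference for an item scored 0 to n-1. The function name and the 0-based scoring are assumptions made for illustration.

```python
def possible_change_scores(n_levels):
    """All distinct differences t2 - t1 for an item scored 0..n_levels-1."""
    levels = range(n_levels)
    return sorted({t2 - t1 for t1 in levels for t2 in levels})

for n in (2, 3, 4, 5):
    diffs = possible_change_scores(n)
    assert len(diffs) == 2 * n - 1   # the 2n - 1 rule
    print(n, diffs)
```

For an item with five response anchors this gives nine possible change scores, from -4 to +4.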
Real data for the simple model
Exploratory, pragmatic RCT
“Slim” paradigm RCT:
Twelve weeks of
group based art therapy (AT) cf.
treatment as usual
Design was N = 120 (60 per arm)
Randomisation by minimisation
Richardson, Jones, Evans, Stevens & Rowe
(2007) An exploratory randomised trial of
group based art therapy as an adjunctive
treatment in severe mental illness. Journal of
Mental Health 16(4): 483-491.
Test: BrSI (k=53)
         n    Low      α     Up  |    n    Low      α     Up
T1      43    .97    .98    .99  |   38    .92    .95    .97
T2      36    .96    .97    .98  |   34    .89    .93    .96
T3      22    .93    .96    .98  |   17    .90    .95    .98
1-2     34    .90    .94    .96  |   31    .87    .92    .95
1-3     22    .83    .90    .95  |   15    .76    .87    .95
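The slides do not say how the Low and Up limits around each α were obtained. As one possible approach only, the sketch below puts a percentile bootstrap interval around coefficient alpha by resampling respondents. The function names, the simulated data and the choice of bootstrap are all assumptions, not the method used for the tables.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a respondents-by-items score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def bootstrap_alpha_ci(items, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for alpha (an assumed method)."""
    x = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    boots = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)    # resample people with replacement
        boots[b] = cronbach_alpha(x[rows])
    tail = (1 - level) / 2
    low, up = np.quantile(boots, [tail, 1 - tail])
    return low, cronbach_alpha(x), up

# Simulated data: one latent trait plus independent error on 10 items
rng = np.random.default_rng(1)
trait = rng.normal(size=(40, 1))
data = trait + rng.normal(size=(40, 10))
low, alpha, up = bootstrap_alpha_ci(data, n_boot=500)
print(round(low, 2), round(alpha, 2), round(up, 2))
```

With small n the interval widens markedly, which is the pattern visible in the rows with the fewest participants.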
Test2: SANS (k=24)
         n    Low      α     Up  |    n    Low      α     Up
1       46    .90    .93    .96  |   42    .81    .87    .92
2       38    .89    .93    .96  |   35    .81    .88    .93
3       22    .93    .96    .98  |   18    .48    .71    .87
1-2     38    .81    .88    .93  |   35    .78    .86    .92
1-3     22    .86    .92    .96  |   18    .62    .79    .91
But … IIP (k=32)
         n    Low      α     Up  |    n    Low      α     Up
T1      44    .86    .90    .94  |   42    .80    .87    .92
T2      37    .87    .90    .95  |   35    .84    .90    .94
T3      22    .78    .87    .94  |   18    .82    .90    .96
1-2     36    .64    .76    .86  |   34    .29    .54    .74
1-3     21    .47    .69    .85  |   17    .60    .79    .91
Rating? BPRS (k=19)
         n    Low      α     Up  |    n    Low      α     Up
1       46    .64    .75    .85  |   43    .53    .68    .81
2       38    .62    .75    .85  |   35    .40    .62    .78
3       22    .62    .78    .89  |   18    .47    .70    .87
1-2     38    .60    .75    .85  |   35    .20    .48    .70
1-3     22    .66    .80    .90  |   18   -.10    .39    .73
Ratings: HoNOS (k=12)
         n    Low      α     Up  |    n    Low      α     Up
1       46    .58    .72    .83  |   43    .45    .64    .78
2       38    .46    .65    .80  |   35    .45    .65    .80
3       22    .44    .68    .85  |   18    .23    .58    .82
1-2     38   -.20    .22    .54  |   35  -1.24   -.43    .18
1-3     22    .07    .47    .74  |   18  -1.15   -.16    .49
Routine test-retest (CORE, k=34, students)
         n    Low      α     Up
1       53    .92    .94    .96
2       41    .94    .96    .98
1-2     40    .65    .77    .86
Diversity & complexity of change
• Naturalistic study of Therapeutic Communities in the
  UK
• Borderline Syndrome Index (BoSI)
• Lees, Evans, et al. (2006) Who comes into therapeutic
  communities? A description of the characteristics of a
  sequential sample of client members admitted to 17
  therapeutic communities. Therapeutic Communities
  27(3): 411-433.
• Lees, Evans, et al. (2005) A cross-sectional snapshot of
  therapeutic community client members. Therapeutic
  Communities 26(3): 295-314.
Change boxplots: men
[Figure: BoSI change to 90 days: men, sequential. x-axis: TC group (Drug TC, Prison TC, Residential TC, Day TC), n = 34, 9, 13, 7. y-axis: Change, -50 to 10.]
Change boxplots: women
[Figure: BoSI change to 90 days: women, sequential. x-axis: TC group (Private sector, Drug TC, Residential TC, Day TC), n = 10, 1, 32, 19. y-axis: Change, -30 to 10.]
Jacobson plot: men
[Figure: BoSI score at 90 days (y-axis, 0 to 50) against initial BoSI score (x-axis, 0 to 50). Groups: Drug TC, Prison TC, Residential TC, Day TC.]
Jacobson, women
[Figure: BoSI score at 90 days (y-axis, 0 to 50) against initial BoSI score (x-axis, 0 to 50). Groups: Private sector, Drug TC, Residential TC, Day TC.]
Cat’s cradle plot: men
[Figure: BoSI scores: male, sequential data, drug TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle: men
[Figure: BoSI scores: male, sequential data, residential TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle, men
[Figure: BoSI scores: male, sequential data, day TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle, women
[Figure: BoSI scores: female, sequential data, private sector TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle, women
[Figure: BoSI scores: female, sequential data, drug TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle, women
[Figure: BoSI scores: female, sequential data, residential TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Cat’s cradle, women
[Figure: BoSI scores: female, sequential data, day TCs only. x-axis: Days, 0 to 500. y-axis: BoSI score, 0 to 50.]
Idiographic & hybrid measures
• “Patient generated” measures:
  • Problem rating & target rating
  • Personal questionnaire
  • PSYCHLOPS (from MYMOPS)
    • www.psychlops.org
    • Ashworth, Robinson, et al. (2005) Measuring mental health
      outcomes in primary care: the psychometric properties of a
      new patient-generated outcome measure, 'Psychlops'
      ('Psychological Outcome Profiles'). Primary Care Mental Health
      3: 261-270.
    • Ashworth, Evans, et al. (2009) Measuring psychological
      outcomes after cognitive behaviour therapy in primary care: a
      comparison between a new patient-generated measure
      ‘PSYCHLOPS’ (Psychological Outcome Profiles) and ‘HADS’
      (Hospital Anxiety and Depression Scale). Journal of Mental
      Health 18(2): 169-177.
Conventional psychometrics
110 pre and post PSYCHLOPS from
primary care, largely CBT interventions
Cronbach’s alpha: t1 .79 and t2 .87 (cf.
usual .94/.95 for CORE-OM)
Change effect size large: 1.53 cf. 1.06 for
CORE-OM (p < .001)
Correlations with CORE-OM: .48 to .61
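The slide does not define the change effect size. One common convention, assumed here, divides the mean pre-to-post improvement by the standard deviation of the baseline scores, with higher scores meaning more distress. The function name and the scores are made up for illustration.

```python
import numpy as np

def pre_post_effect_size(pre, post):
    """Mean improvement over baseline SD (an assumed convention)."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    return (pre.mean() - post.mean()) / pre.std(ddof=1)

# Hypothetical scores for five clients (higher = worse)
pre = [14, 16, 12, 18, 15]
post = [9, 10, 8, 13, 10]
es = pre_post_effect_size(pre, post)
print(round(es, 2))
```

Other definitions divide by the pooled SD or by the SD of the change scores, so effect sizes from different reports are only comparable when the denominator is the same.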
Conclusions
• Applying cross-sectional psychometric models (same
  for IRT/Rasch) is hiding complexity in our change data
• Group summaries are hiding non-linearity and diversity
  in change profiles
• Nomothetic questionnaires should be complemented
  with patient generated measures (PSYCHLOPS/PQ)
• We need to stop hiding the complexity of our therapies!
  • … but we need a paradigm shift if we’re to manage the
    organisational anxieties that provokes
  • … and we need money and time to explore complexity
  • … and we won’t get money/time without a paradigm
    shift that answers questions
Thanks!
[email protected]