```Issues in Assessment
Mathematics Assessment and Intervention
Statistical Issues Related to Assessment
• Reliability
– Consistency
– The degree to which students’ results
remain consistent over replications of an
assessment procedure.
– Measurement example
Popham, J. W. (2011). Classroom assessment: What teachers need to know.
Boston: Pearson.
Nitko, A. J. & Brookhart, S. M. (2007). Educational assessment of students.
Reliability Evidence
• Stability Reliability – test/retest
– The stability of results over time when
no significant change has occurred
between administrations
– Typically calculated using a correlation
coefficient
– Can also use classification consistency
• Does the student receive the same
classification, such as proficient/not
proficient?
– Note – there is always some instability
Reliability Evidence
• Alternative Form Reliability
– This is when you have multiple test forms
– Typically calculated using a correlation
coefficient.
Reliability Evidence
• Internal Consistency Reliability
– Do the items in an assessment function in
a consistent fashion?
– Do the items on the assessment measure
a single variable, such as fraction
computation?
– More items make an assessment more
reliable
– There are specific statistical tests that
are used to determine internal
consistency.
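One such statistic is Cronbach's alpha, which compares the variance of each item to the variance of the total scores. The sketch below uses a tiny made-up data set (four students, three items) chosen so the items move perfectly together, which drives alpha to 1.0; the function and data are illustrative, not from the source.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-student item-score lists."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across students
    item_vars = [pvariance([s[i] for s in item_scores]) for i in range(k)]
    # Variance of the students' total scores
    total_var = pvariance([sum(s) for s in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item quiz for four students; every item rises and falls
# with the others, so internal consistency is perfect.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(cronbach_alpha(scores))  # 1.0
```

Real assessments yield alphas below 1.0; values around 0.8 or higher are commonly treated as acceptable for classroom use, and adding well-functioning items tends to raise alpha, consistent with the slide's point that more items make an assessment more reliable.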
Standard Error of Measurement
– Any assessment result is an estimate of a
student’s “true” score.
– The standard error of measurement is
the “band” wherein the student’s true
score likely lies.
– It is found by multiplying the standard
deviation of the assessment by the
square root of (1 − the reliability
coefficient): SEM = SD × √(1 − r)
– See graph on Nitko p. 77
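As a worked example of the formula SEM = SD × √(1 − r), using made-up numbers (an assessment with SD = 10 and reliability = 0.91):

```python
import math

# Hypothetical assessment: standard deviation 10 points,
# reliability coefficient 0.91 (illustrative numbers).
sd = 10.0
reliability = 0.91

sem = sd * math.sqrt(1 - reliability)  # SEM = SD * sqrt(1 - r)
print(f"SEM = {sem:.1f} points")       # 3.0 points

# A student with an observed score of 72 likely has a true score
# within one SEM of that result.
observed = 72
print(f"Band: {observed - sem:.0f} to {observed + sem:.0f}")  # 69 to 75
```

Note how higher reliability shrinks the band: with r = 0.99 the SEM would be only 1 point, while with r = 0.75 it would be 5 points.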
Example
– Parent Talk example from Popham (p. 70)
Validity
• Validity refers to the inferences from
or use of assessment results.
– There is no such thing as a valid test;
only the inferences based on the results
can be valid or invalid.
Curricular Aim:
Example – multiplication fact fluency
Assessment:
Example – 2-minute timed multiplication test
The assessment results support an inference
about the student’s status relative to the
curricular aim.
Validity Evidence
• Content Validity
• Does the content of the test represent
the content of the curricular aim?
– Ex: What if the multiplication test used only the 1
and 5 facts?
• Does the curricular aim involve
different processes? If so, the
assessment must use those different
processes.
• Look at categorical concurrence, depth and
range of knowledge, and balance.
Validity Evidence
• Construct Validity
– Does the assessment measure the stated
construct?
• Make a hypothesis about how the construct
works and how student results should
illustrate that construct
• Gather data from the assessment
Validity Evidence
• Criterion Validity
– This is primarily related to tests that
purport to predict some result such as
SAT tests predicting college GPA.
Statistics related to assessment: Percentile
• The raw scores of the norming
population are put in order from
lowest to highest. They are then split
into 100 equal groups, called
PERCENTILES. Each student’s score
is then compared to the norming
scores to see where it falls.
Percentiles can only be used on a
norm-referenced test. Why?
Stanines: The percentile scale is divided into nine
segments, each of which represents a “standard nine.”
The segments are not equal in width: stanine 5 spans
roughly the 40th–60th percentiles, and the segments
narrow toward the extremes.
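A minimal sketch of both ideas, using a made-up norming population of 20 scores. The stanine cutoffs below follow the standard 4–7–12–17–20–17–12–7–4 percent split; the exact treatment of boundary cases varies by publisher, so this mapping is one reasonable convention, not a definitive one.

```python
from bisect import bisect_right

# Hypothetical raw scores from a small norming population (illustrative only).
norm_scores = [55, 60, 62, 65, 68, 70, 72, 75, 78, 80,
               82, 84, 85, 87, 88, 90, 91, 93, 95, 98]

def percentile_rank(score, norms):
    """Percent of the norming population scoring below this score."""
    below = sum(1 for s in norms if s < score)
    return 100 * below / len(norms)

def stanine(pr):
    """Map a percentile rank to a stanine (1-9) using the standard
    4-7-12-17-20-17-12-7-4 percent split."""
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative percent boundaries
    return 1 + bisect_right(cutoffs, pr)

# A new student's raw score of 84 is compared to the norming scores.
pr = percentile_rank(84, norm_scores)
print(f"Percentile rank: {pr}, stanine: {stanine(pr)}")  # 55.0, stanine 5
```

This also illustrates why percentiles require a norm-referenced test: without a norming population to compare against, there is nothing to rank the student's score within.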
Statistics Related to Assessment Results
– Measures of Central Tendency
• Mean: the average (x̄)
• Mode: the most common
• Median: the middle number when the data is put
in order from least to greatest
– When should you use which measure?
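The three measures can be computed directly with Python's `statistics` module. The quiz scores below are made up to include one outlier, which shows why the choice of measure matters:

```python
from statistics import mean, median, mode

# Hypothetical quiz scores; the 0 is an outlier (e.g., a missed quiz).
scores = [0, 7, 8, 8, 9, 10, 10, 10]

print(mean(scores))    # 7.75 - pulled down by the outlier
print(median(scores))  # 8.5  - resistant to the outlier
print(mode(scores))    # 10   - the most common score
```

With the outlier present, the median (8.5) describes the typical student better than the mean (7.75); the mode is most useful for categorical results or spotting the most frequent score.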
Box and Whisker Plot
• A box and whisker plot uses medians and
percentiles to describe data.
• Create a box and whisker plot with the
following data
– Data set 1: 7, 8, 2, 10, 9, 9, 3, 5, 7, 8, 8, 10, 10, 6
– Data set 2: 45, 60, 85, 95, 100, 50, 80, 90, 100,
95, 60, 25
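The five-number summary needed for the plot can be sketched for data set 1 using the split-half quartile method described in this deck (median of each half); note that spreadsheet and statistics packages sometimes use slightly different quartile conventions.

```python
from statistics import median

# Data set 1 from the slide, sorted.
data1 = sorted([7, 8, 2, 10, 9, 9, 3, 5, 7, 8, 8, 10, 10, 6])

# Split-half quartiles: median of the lower half and of the upper half.
half = len(data1) // 2
q1 = median(data1[:half])   # lower half
q2 = median(data1)          # overall median
q3 = median(data1[-half:])  # upper half

print(f"min={data1[0]}, Q1={q1}, median={q2}, Q3={q3}, max={data1[-1]}")
```

The five numbers printed (minimum, Q1, median, Q3, maximum) are exactly the values marked by the whiskers, the box edges, and the line inside the box.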
More Descriptive Statistics
• Measures of Variability
– Standard Deviation (SD): a measure of
how spread out the data are; roughly, the
average of how far each data point is from
the mean
– Range: difference between the lowest
data point and the highest data point
– Interquartile Range: rank order the data,
split it in half and in half again, subtract the
median of the bottom half from the median
of the top half
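These measures can be computed for data set 1 from the box-and-whisker exercise above; `pstdev` is the population standard deviation, which treats the class as the whole population rather than a sample.

```python
from statistics import pstdev

# Data set 1 from the box-and-whisker exercise.
data = [7, 8, 2, 10, 9, 9, 3, 5, 7, 8, 8, 10, 10, 6]

sd = pstdev(data)            # population standard deviation
rng = max(data) - min(data)  # range

print(f"SD = {sd:.2f}, range = {rng}")  # SD = 2.43, range = 8
```

The range (8) depends only on the two extreme scores, while the standard deviation (about 2.43) reflects how far the typical score sits from the mean, which is why SD is the more informative spread measure for most score distributions.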
Norm-referenced assessments
• Norm-referenced tests compare a student’s
assessment results to those of other students (the
norm group) who have taken the same test. The
raw score is converted to a percentile rank.
– Examples
• Iowa Test of Basic Skills
• SAT 9
Criterion-referenced assessments
• Criterion-referenced assessments compare a
student’s assessment results with pre-established
criteria, such as the core curriculum. Results can be
reported as raw scores, percentages, or other
conversions of the score.
– Examples
• End of level tests
Bias
• Assessment bias refers to “qualities
of an assessment instrument that offend
or unfairly penalize a group of
students because of students’
gender, race, ethnicity,
socioeconomic status, religion, or
other such group-defining
characteristics” (Popham, p.111).
Bias
– Offensiveness
• Negative stereotypes presented
• Slurs
• Distress may influence test results
– Unfair penalization
• Content that, while not offensive,
disadvantages a student because of group
membership.
• Think about experiences that some students
may have had while others may not have the
same types of opportunities.
• What about assessments in other languages?
Bias
• Does the fact that students of
different races perform differently
indicate bias?
• Disparate impact
Bias Detection in the Classroom
• Think seriously about the impact that
differing experiential backgrounds
will have on the way students respond
Assessing Students with Disabilities and ELLs
• Provide the accommodations specified on the IEP
• Accurately assess ELLs
Self Check p. 133
• In a small group, look at your own
assessments, evaluating them for
– Reliability
– Validity (the use of the assessments)
– Bias
Teacher responsibilities
When creating assessments
• Apply sound principles of assessment
planning
• Craft assessment procedures that are free
from characteristics irrelevant to curricular
aim
• Accommodate in appropriate ways
• Present results in ways that encourage
students
• Ensure that assessment materials do not
contain errors
Teacher responsibilities
When choosing assessments
• Use quality assessment materials
• Publication does not equal quality
• Conduct the assessment professionally
• Accommodate students with disabilities on
standardized tests
Teacher responsibilities
When scoring assessments
• Score responses accurately and fairly
• Provide feedback for learning
• Explain rubrics
• Review evaluations individually
• Score and return results as quickly as
possible.
Teacher responsibilities: Do No Harm?
• Mr. Allen is having his students score
each other's quizzes and then call out
the scores so he can plot them on the
board.
Do No Harm?
Students in Miss Ela's class are
discussing samples of anonymous
science lab notes to decide which
are great examples, which have
some good points, and which don't
tell the story of the lab at all well.
They will use these samples to develop
criteria for their own lab "learning
logs."
Do No Harm?
Pat's latest story is being read
aloud for the class to critique.
Like each of her classmates,
she's been asked to take notes
during this "peer assessment" so
that she can revise her work
later.
Do No Harm?
Students in Henry's basic writing class
are there because they have failed to
meet the state's writing proficiency
requirements. Henry tells students
that the year will consist of teaching
them to write. Competence at the end
will be all that matters.
Do No Harm?
Jeremy's teacher tells him that his
test scores have been so dismal so
far that no matter what he does
from then on he will fail the class.
Assessment Ethics: Confidentiality
• Confidentiality of assessment results
• Student aides recording scores
• Public displays of student progress, e.g.,
charts, graphs, etc.
Assessment Type Presentation
Work with your group. The presentation should take
approximately one hour and 15 minutes of class
time.
```