Issues in Assessment
Mathematics Assessment and Intervention

Statistical Issues Related to Assessment
• Reliability
  – Consistency
  – The degree to which students’ results remain consistent over replications of an assessment procedure.
  – Measurement example
Popham, W. J. (2011). Classroom assessment: What teachers need to know. Boston: Pearson.
Nitko, A. J., & Brookhart, S. M. (2007). Educational assessment of students. Upper Saddle River, NJ: Pearson.

Reliability Evidence
• Stability Reliability
  – Test/retest
  – The stability of results if no significant event occurred between administrations
  – Typically calculated using a correlation coefficient
  – Can also use classification consistency
    • Does the student receive the same classification, such as proficient/not proficient?
  – Note: there is always some instability between administrations

Reliability Evidence
• Alternative Form Reliability
  – This applies when you have multiple forms of a test
  – Typically calculated using a correlation coefficient

Reliability Evidence
• Internal Consistency Reliability
  – Do the items in an assessment function in a consistent fashion?
  – Do the items on the assessment measure a single variable, such as Fraction Addition or Shape Identification?
  – In general, more items make an assessment more reliable
  – Specific statistical procedures are used to estimate internal consistency

Standard Error of Measurement
  – Any assessment result is an estimate of a student’s “true” score.
  – The standard error of measurement defines the “band” around the observed score within which the student’s true score likely lies.
  – It is found by multiplying the standard deviation of the assessment by the square root of (1 minus the reliability coefficient): SEM = SD × √(1 − reliability).
  – See graph on Nitko p. 77

Example
  – Parent Talk example from Popham (p. 70)

Validity
• Validity refers to the inferences from or use of assessment results.
  – There is no such thing as a valid test; only the inferences based on the results can be valid or invalid.
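Worked example: reliability coefficient and standard error of measurement

Returning to the reliability and standard error of measurement slides above, the following is a minimal sketch of the arithmetic, not a prescribed procedure. The scores are hypothetical, the function names are made up for illustration, the sample standard deviation is used (an assumption; the population SD is also common), and the band of plus or minus one SEM is likewise an assumption.

```python
import math
import statistics


def pearson_r(first, second):
    """Correlation between two administrations (test/retest or alternate forms)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_1 = statistics.fmean(first)
    mean_2 = statistics.fmean(second)
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / math.sqrt(ss_1 * ss_2)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the formula stated on the slide."""
    if reliability > 1:
        raise ValueError("a reliability coefficient cannot exceed 1")
    return sd * math.sqrt(1 - reliability)


def score_band(observed, sem, width=1.0):
    """Band of +/- width * SEM around an observed score (width is an assumption)."""
    return (observed - width * sem, observed + width * sem)


# Hypothetical scores for the same eight students on two administrations.
first_try = [12, 15, 9, 18, 14, 11, 16, 13]
second_try = [13, 14, 10, 17, 15, 10, 17, 12]

reliability = pearson_r(first_try, second_try)
sd = statistics.stdev(first_try)  # sample SD; an assumption for this sketch
sem = standard_error_of_measurement(sd, reliability)
low, high = score_band(14, sem)

print(f"reliability ~ {reliability:.2f}, SD ~ {sd:.2f}, SEM ~ {sem:.2f}")
print(f"observed score 14: true score likely between {low:.1f} and {high:.1f}")
```

As a quick check on the formula: with SD = 10 and reliability = 0.91, SEM = 10 × √0.09 = 3, so an observed score of 75 gives a band of 72 to 78. Lower reliability widens the band.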
[Diagram: Curricular Aim (example: multiplication fact fluency); Student’s inferred status; Assessment (example: 2-minute timed multiplication test)]

Validity Evidence
• Content Validity
  – Does the content of the test represent the content of the curricular aim?
    • Ex: What if the multiplication test used only the 1 and 5 facts?
  – Does the curricular aim involve different processes? If so, the assessment must use those different processes.
  – Look at categorical concurrence, depth and range of knowledge, and balance.

Validity Evidence
• Construct Validity
  – Does the assessment measure the stated construct?
    • Make a hypothesis about how the construct works and how student results should illustrate that construct
    • Gather data from the assessment to test that hypothesis

Validity Evidence
• Criterion Validity
  – This is primarily related to tests that purport to predict some result, such as SAT scores predicting college GPA.

Statistics Related to Assessment: Percentile
• The raw scores of the norming population are put in order from lowest to highest. They are then split into 100 equal groups, called PERCENTILES. Each student’s score is then compared to the norming scores to see where it falls.
• Percentiles can only be used on a norm-referenced test. Why?

Stanines: The percentile scale is divided into nine segments; the name is short for “standard nine.”

Statistics Related to Assessment Results
  – Measures of Central Tendency
    • Mean: the average (x̄)
    • Mode: the most common value
    • Median: the middle number when the data are put in order from least to greatest
  – When should you use which measure?

Box and Whisker Plot
• A box and whisker plot uses medians and percentiles to describe data.
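Worked example: central tendency and the numbers behind a box and whisker plot

As a rough sketch of the measures above, the code below computes the mean, median, and mode, plus the five numbers a box and whisker plot is drawn from. The quiz scores and function names are hypothetical. Quartiles are taken as the medians of the lower and upper halves of the ordered data; this is one common classroom convention, assumed here, and plotting software may use a slightly different rule.

```python
import statistics


def central_tendency(scores):
    """Mean, median, and mode(s) of a list of scores."""
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),  # a list, since ties are possible
    }


def five_number_summary(scores):
    """Minimum, lower quartile, median, upper quartile, and maximum."""
    if len(scores) < 2:
        raise ValueError("need at least 2 scores")
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return {
        "min": ordered[0],
        "q1": statistics.median(lower),
        "median": statistics.median(ordered),
        "q3": statistics.median(upper),
        "max": ordered[-1],
    }


# Hypothetical quiz scores for seven students.
quiz = [7, 4, 9, 7, 10, 6, 8]

print(central_tendency(quiz))
print(five_number_summary(quiz))
```

For these seven scores the box runs from 6 to 9 with the median line at 7, and the whiskers reach to 4 and 10. The mean (about 7.3) sits slightly above the median, which is one reason to ask which measure to report.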
• Create a box and whisker plot with the following data
  – Data set 1: 7, 8, 2, 10, 9, 9, 3, 5, 7, 8, 8, 10, 10, 6
  – Data set 2: 45, 60, 85, 95, 100, 50, 80, 90, 100, 95, 60, 25

More Descriptive Statistics
• Measures of Variability
  – Standard Deviation (SD): a measure of how spread out the data are; roughly, the average of how far each data point is from the mean
  – Range: difference between the lowest data point and the highest data point
  – Interquartile Range: rank order the data, split it in half and in half again, then subtract the median of the bottom half from the median of the top half

Norm-referenced assessments
• Norm-referenced tests compare a student’s assessment results to those of other students (the norm group) who have taken the same test. Only the raw score is used, and it is then converted to a percentile.
  – Examples
    • Iowa Test of Basic Skills
    • SAT 9

Criterion-referenced assessments
• Criterion-referenced assessments compare a student’s assessment results with pre-established criteria, such as the core curriculum. Results can be reported as raw scores, percentages, or other conversions of the score.
  – Examples
    • End of level tests

Bias
• Assessment bias refers to “qualities of an assessment instrument that offend or unfairly penalize a group of students because of students’ gender, race, ethnicity, socioeconomic status, religion, or other such group-defining characteristics” (Popham, p. 111).

Bias
  – Offensiveness
    • Negative stereotypes presented
    • Slurs
    • Distress may influence test results
  – Unfair penalization
    • Content that, while not offensive, disadvantages a student because of group membership.
    • Think about experiences that some students may have had while others have not had the same opportunities.
    • What about assessments in other languages?

Bias
• Does the fact that students of different races perform differently indicate bias?
• Disparate impact

Bias Detection in the Classroom
• Think seriously about the impact that differing experiential backgrounds will have on the way students respond to your assessments.

Assessing Students with Disabilities and ELLs
• Must follow modifications and accommodations on the IEP
• Accurately assess ELLs

Self Check p. 133

Evaluate your assessments
• In a small group, look at your own assessments, evaluating them for
  – Reliability
  – Validity (the use of the assessments)
  – Bias

Teacher responsibilities
When creating assessments
• Apply sound principles of assessment planning
• Craft assessment procedures that are free from characteristics irrelevant to the curricular aim
• Accommodate in appropriate ways
• Present results in ways that encourage students
• Ensure that assessment materials do not contain errors

Teacher responsibilities
When choosing assessments
• Use quality assessment materials
• Publication does not equal quality
When administering assessments
• Conduct the assessment professionally
• Accommodate students with disabilities
• Follow rules when administering standardized tests

Teacher responsibilities
When scoring assessments
• Score responses accurately and fairly
• Provide feedback for learning
• Explain rubrics
• Review evaluations individually
• Correct your errors quickly
• Score and return results as quickly as possible

Teacher responsibilities: Do No Harm?
• Mr. Allen is having his students score each other's quizzes and then call out the scores so he can plot them on the board.

Do No Harm?
Students in Miss Ela's class are discussing samples of anonymous science lab notes to decide which are great examples, which have some good points, and which don't tell the story of the lab at all well. They are gradually developing criteria for their own lab "learning logs."

Do No Harm?
Pat's latest story is being read aloud for the class to critique.
Like each of her classmates, she's been asked to take notes during this "peer assessment" so that she can revise her work later.

Do No Harm?
Students in Henry's basic writing class are there because they have failed to meet the state's writing proficiency requirements. Henry tells students that the year will consist of teaching them to write. Competence at the end will be all that matters.

Do No Harm?
Jeremy's teacher tells him that his test scores have been so dismal so far that no matter what he does from then on he will fail the class.

Assessment Ethics: Confidentiality
• Who should have access to student assessment results?
  – Discuss: What about
    • Students grading each other’s work
    • Parents grading student work
    • Student aides recording scores
    • Faculty room discussions about students
    • Public displays of student progress, such as charts and graphs

Assessment Type Presentation
Sign up for Presentation
Work with your group. The presentation should take approximately one hour and 15 minutes of class time.