MEASURING VARIABLES
Topic #4

Empirical Research Propositions / Expectations / Hypotheses
• Consider propositions #1 and #16 in Problem Set #3A, which we can state more formally as follows:
  – #1 DEGREE OF SENIORITY is associated with DEGREE OF PRAGMATISM [among individuals, in particular members of Congress].
  – #16 DEGREE OF PROPORTIONALITY is associated with NUMBER OF POLITICAL PARTIES [among the electoral systems of the world]. [“Duverger’s Law”]
• In order to test such hypotheses, we need to
  – identify the relevant unit of analysis and population,
  – identify the variables of interest pertaining to these cases, and
  – collect empirical data concerning such cases.

Empirical Research (cont.)
• But before we can start collecting data, we need to devise some way of actually measuring the variables that we have conceptually identified in the cases we are using.
• This requires us to establish some kind of linkage or correspondence between the data that we will collect and the “conceptual” (or “abstract”) variables in the hypothesis.
• For some variables, the measurement problem is relatively straightforward, but for other variables it can be very difficult.
• In political science especially, appropriate data about individuals often (but certainly not always) comes from surveys.
  – In doing Problem Set #4, students too often assume that data used to measure variables must always come from surveys.

Coding vs. Operationalization
• Measuring variables entails two distinct though interrelated problems.
• The coding problem is relatively straightforward:
  – It entails determining what the variable will “look like” in a codebook, i.e., its name, description, and (in particular) its range of possible values (in the manner of Problem Set #3A).
  – Will it be dichotomous or more “refined”?
  – Will the values be nominal, ordinal, interval, or ratio?
• The operationalization problem is often much more difficult.
  – It entails specifying the practical operations that will be used to “observe” or “measure” the actual value of the variable in each case, so that data is appropriately collected and (if necessary) coded.
• In sentences #2 and #11 of Problem Set #3A, LEVEL OF EDUCATION (pertaining to individuals) is one of the variables that has been identified. Let us consider how this variable might be measured, i.e., first coded and then operationalized.

Coding LEVEL OF EDUCATION
We must decide what the range of possible values will look like (and whether these possible values will result in a nominal, ordinal, interval, or ratio variable). Here are some possibilities:

(1) LEVEL OF EDUCATION (dichotomous):
    1  Low
    2  High

(2) LEVEL OF EDUCATION (qualitative/ordinal but quite “imprecise”):
    1  Low
    2  Medium
    3  High

Coding LEVEL OF EDUCATION (cont.)
(3) LEVEL OF EDUCATION: HIGHEST LEVEL ATTAINED (qualitative/ordinal and more “precise”):
    1  Grade school
    2  Middle school
    3  Some high school
    4  High school graduate
    5  Some college
    6  College graduate
    7  Some graduate/professional school
    8  Graduate/professional degree

(4) LEVEL OF EDUCATION: NUMBER OF YEARS OF FORMAL EDUCATION (quantitative/interval):
    ACTUAL NUMBER OF YEARS OF FORMAL EDUCATION [uncoded]

(5) LEVEL OF EDUCATION (quantitative/interval):
    SOME KIND OF NUMERICAL SCALE DERIVED FROM A TEST (perhaps 0-100) [uncoded]

Operationalizing LEVEL OF EDUCATION
• We must decide what practical operations we will use to assign a particular value of LEVEL OF EDUCATION to a particular case (individual).
• If we are doing survey research, we presumably will determine a respondent’s level of education simply by asking an appropriately phrased question and recording the response.
  – See variable V62 in the SETUPS/NES Codebook.
    • “What is the highest grade of school that you have completed?” (with responses recoded into coding categories)
  – (The full-scale ANES uses this basic question plus several follow-up questions.)
  – Validation?
• In other circumstances, for example if we were looking at student testing data, we might use documentary records.

Operationalizing other Variables [from PS #3A]: Class Discussion
• LEVEL OF SENIORITY (#1)
• DEGREE OF PRAGMATISM (#1)
• DEGREE OF RELIGIOSITY (#2)
• DIRECTION OF IDEOLOGY or DEGREE OF LIBERALISM/CONSERVATISM (sentence #14 or Congressional study)
• DEGREE OF PROPORTIONALITY (#16)
• NUMBER OF POLITICAL PARTIES (#16)

Party ID – Seven-Point Scale (Q1 + Q2)
Party ID – Seven-Point Scale (n = 48)
    n   Code  Value
   15   (1)   Strong Democrat
   10   (2)   Weak Democrat
    3   (3)   Democratic Leaner
    1   (4)   Pure Independent
    5   (5)   Republican Leaner
    3   (6)   Weak Republican
    8   (7)   Strong Republican
    3   (9)   Other, DK, or inconsistent

Recoding/Transforming Variables
• Remember that the 7-Category PARTY IDENTIFICATION variable [V09] is formed out of responses to two questions (Q1 and Q2 in Student Survey).
• V09 in turn can be recoded into two other variables.

Party ID – Seven-Point Scale Recoded: DIRECTION OF PARTY ID
Party ID – Seven-Point Scale Recoded (n = 48)
    n   Code  Value
   28   (1)   Democrat
    1   (2)   Independent
   16   (3)   Republican
    3   (9)   Other, DK, or inconsistent

Party ID – Seven-Point Scale Recoded: STRENGTH OF PARTY ID
Party ID – Seven-Point Scale Recoded (n = 48)
    n   Code  Value
   23   (1)   Strong Partisan
   13   (2)   Weak Partisan
    8   (3)   Leaner
    1   (4)   Independent
    3   (9)   Other, DK, or inconsistent

Recode into Different Variable

Recode: Old and New Values

Empirical Research Propositions / Expectations / Hypotheses
• From PS #3A: #8 When times are bad, incumbent candidates are punished in elections.
• [PS #3A] In order to test such hypotheses, we need to
  – identify the relevant unit of analysis (and population) and variables:
      DEGREE OF BADNESS OF TIMES    DEGREE INC PUNISHED    [elections]
• It may be convenient to reverse the “polarity” of the variables:
      DEGREE OF GOODNESS OF TIMES    DEGREE INC REWARDED    [elections]
• [PS #4] Then we need to operationalize (devise measures for) each variable:
      RATE OF EC GROWTH (Low% - High%)    INC VOTE PERCENT (0% - 100%)    [Presidential elections]
• We then need to collect empirical data pertaining to these cases that will be used to actually measure the variables of interest.

Variables and Empirical Data: PS #11
• But there are other possible measures of GOODNESS/BADNESS OF THE TIMES, e.g.,
  – unemployment rate (from CPS),
  – inflation rate (CPI), or
  – an index formed out of several economic indicators.
• Or maybe GOODNESS/BADNESS OF THE TIMES shouldn’t refer (only) to economic matters.

Data into SPSS Spreadsheet
OBSERVED VALUES
  Election   GDP   INC2PC
  1948       6.6   52
  1952       1.7   45
  1956       1.2   56
  1960       2.9   50
  1964       5.7   62
  1968       6.2   49
  1972       6.1   62
  1976       5.9   49
  1980       1.8   45
  1984       7.9   59
  1988       4.8   54
  1992       3.3   47
  1996       4.2   54
  2000       4.9   50
  2004       3.8   51

A “Scattergram”
[Figure: scattergram]

Criteria for Measuring Variables
• In general, there is no “best” way to measure a variable in social science research.
• Usually there are several reasonably good ways and many very bad ways.
• Here are some considerations.

Conceptual Clarity
• We need to be conceptually clear about what it is that we are trying to measure.
• For example, we may be doing research about rank-and-file members of the voting-age population (VAP) and be interested in aspects of their partisanship (or independence of party).
  We need to be clear about what it is that we have in mind:
    • how people ordinarily vote;
    • how people voted in the most recent election (and for what offices);
    • how people intend to vote in the upcoming election;
    • how people are registered to vote (in states that have registration by party);
    • whether people actually belong to party organizations, clubs, etc.; and
    • how people think about themselves in party terms.
  – PARTY ID is intended to measure this last aspect of partisanship.

Dimensionality
• Thinking about [non-dichotomous] ordinal, interval, or ratio level variables encourages us to think in dimensional terms, e.g.,
  – low to high
  – small to big
  – left to right
  – cool to warm (thermometer scales)
  – degree of closeness (of elections)
• Our measures of such variables should have similar dimensionality (vs. dichotomous or nominal).
  – For example, in the previous extended example, we used INC CANDIDATE VOTE %, not WHETHER OR NOT INC CANDIDATE WON.

Dimensionality (cont.)
• Multidimensional concepts should usually be broken up into several one-dimensional variables and measures, e.g.,
  – IDEOLOGY (Liberal to Conservative) =>
    • ECONOMIC LIBERALISM/CONSERVATISM vs.
    • SOCIAL LIBERALISM/CONSERVATISM
  – POLITICAL EFFICACY =>
    • “INTERNAL” POLITICAL EFFICACY based on sense of “self” ("Sometimes politics and government seem so complicated that a person like me can't really understand what's going on"?) vs.
    • “EXTERNAL” POLITICAL EFFICACY based on sense of “the system” ("I don't think public officials care much what people like me think"?).
• But this may not always be true:
  – PARTY IDENTIFICATION (STRENGTH and DIRECTION)
  – (STRONG/WEAK) AGREE/DISAGREE questions (ditto)

Standard or Conventional Measures
• Use standard or conventional measures when they are available.
• For example, if you want to do a small local survey inquiring about respondents’ party identification (among other things), it makes sense to use the standard ANES measure of party identification because:
  – this will generally be easier than devising your own measure;
  – you know the ANES survey question has been extensively pretested (and is known to work well);
  – you will not need to describe the measure in detail when you report the results of your research (you can simply cite ANES); and
  – your results will then be directly comparable to ANES (and many other) studies.

Public and Reproducible Measures
• Measures should be public and reproducible (by others).
• Every term paper, journal article, book, etc., reporting the results of empirical research should describe clearly how the variables were operationalized and measured:
  – in a special methodology section or chapter, or
  – in an appendix, or (if you used standard measures)
  – by appropriate citation.
• Unless standard questions were used, a research report based on survey data should include a verbatim transcript of the relevant questions in the survey questionnaire.

Indirect Measures (Indicators or Proxies)
• Often we must use an indirect measure (an indicator or proxy) of a variable that has not been (and perhaps cannot be) measured directly.
• For example, it may be difficult to directly measure the DEGREE OF RELIGIOSITY of individuals.
  – We can measure their FREQUENCY OF CHURCH (etc.) ATTENDANCE more readily (SETUPS/ANES V68).
  – Since it is plausible that these two variables are closely related, we may let the second (easier-to-measure) variable “stand in” as an indicator of or proxy for the first (harder-to-measure) variable.
• In a sense, all survey variables are merely indicators of the variables we are truly interested in, because all survey research takes responses to questions to be more or less accurate indicators of
  – unobserved behavior (e.g., whether and how respondents voted) or
  – unobservable attitudes and opinions.
• Readily available proxy for AGE (of faculty members)??

Avoid Tautologies
• As suggested by PS #3A, empirical research typically involves studying relationships between two (or more) variables.
• This requires that each variable be independently measured;
  – otherwise, a hypothesis becomes a tautology (a statement that is true by definition) masquerading as an empirical proposition.
  – For example, “More religious people attend church (etc.) more frequently than less religious people” is a tautology if DEGREE OF RELIGIOSITY is itself measured by FREQUENCY OF CHURCH ATTENDANCE.

Composite Measures (or Indices)
• Suppose we want to measure LEVEL OF POLITICAL TRUST/CYNICISM in survey respondents.
• A single question is unlikely to be helpful. Certainly a question like “How cynical are you about the political process in this country?” is unlikely to produce useful data.
• A single question like Q18 (Govt Waste $$) in the Student Survey is likely to be both imprecise and unreliable [defined later], since people may respond on the basis of idiosyncratic circumstances of the moment, rather than on the basis of their more general and enduring dispositions.
• If respondents answer in consistently cynical or trusting ways over a variety of related questions (like Student Survey Q18-20), we are likely to be more confident that the overall pattern of responses indicates something meaningful about their LEVEL OF POLITICAL CYNICISM.
• If you measure a variable by means of an index, it is necessary not only to describe how the individual measures are constructed but also to specify the rule of composition by which they are combined into a single overall measure.
• Because it is a combination of several or many measures, an index is often not expressed in any particular unit of measurement but in terms of some arbitrary index number or score, e.g., 1-5 [that is at least ordinal and is usually deemed to be an interval measure].
  – The Consumer Price Index (CPI) is not expressed in $$ but rather as an index number.

Accuracy in Measurement
• We want our measures of variables to be as accurate as possible; that is, we want a close correspondence between our measures and the concepts of interest.
• Accuracy in measurement is usefully subdivided into a number of logically distinct components:
  – precision,
  – reliability,
  – [lack of] bias, and
  – validity.

Precision (Discrete Variables)
With respect to discrete variables, precision refers to how “refined” the (coded) categories are; for example:

RELIGIOUS AFFILIATION [U.S. Population]

Least Precise [dichotomous]:
    1  Christian
    2  Non-Christian

More Precise:
    1  Protestant
    2  Catholic
    3  Non-Christian

Still More Precise:
    1  Mainline Protestant
    2  Evangelical Protestant
    3  Catholic
    4  Jewish, etc.

Most Precise:
    1  Episcopalian
    2  Lutheran
    3  Baptist
    4  Presbyterian, etc.

Precision: Continuous Variables
• With respect to continuous variables, how precisely do we try to measure and record their values?
  – LEVEL OF TURNOUT in 1996: 49%, 49.4%, or 49.3946%?
    • Note: SPSS by default rounds percentages to the nearest 0.1%;
    • in handouts and A&Ds, I normally round to the nearest whole %.
  – Individual SAT scores used to be rounded to the nearest whole point but are now rounded to the nearest 10 points.
  – Recoding a variable (e.g., AGE, PARTY ID) reduces the precision of the measure for analytical purposes (usually to increase the number of cases with each value).
• Precision relates only to the coding problem; other components of accuracy pertain to the more important and difficult problem of operationalization.
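
As an illustration of how recoding trades precision for more cases per value, here is a minimal Python sketch (not the SPSS procedure used in class). It collapses the Student Survey counts for the seven-point PARTY ID variable (n = 48) into DIRECTION and STRENGTH. The dictionary and function names are hypothetical, and code 7 for Strong Republican is an assumption based on the standard seven-point scale.

```python
from collections import Counter

# Counts by code for seven-point PARTY ID (n = 48), taken from the slides.
# Assumption: Strong Republican is code 7; code 9 is Other/DK/inconsistent.
V09_COUNTS = {1: 15, 2: 10, 3: 3, 4: 1, 5: 5, 6: 3, 7: 8, 9: 3}

# Hypothetical recode maps (old value -> new value).
# DIRECTION: 1 Democrat, 2 Independent, 3 Republican, 9 Other.
DIRECTION = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 3, 7: 3, 9: 9}
# STRENGTH: 1 Strong Partisan, 2 Weak Partisan, 3 Leaner, 4 Independent, 9 Other.
STRENGTH = {1: 1, 7: 1, 2: 2, 6: 2, 3: 3, 5: 3, 4: 4, 9: 9}

def recode(counts, mapping):
    """Collapse counts of old values into counts of new (less precise) values."""
    out = Counter()
    for old_value, n in counts.items():
        out[mapping[old_value]] += n
    return dict(sorted(out.items()))

direction_counts = recode(V09_COUNTS, DIRECTION)
strength_counts = recode(V09_COUNTS, STRENGTH)

print(direction_counts)  # {1: 28, 2: 1, 3: 16, 9: 3}
print(strength_counts)   # {1: 23, 2: 13, 3: 8, 4: 1, 9: 3}
```

Each recoded variable has fewer categories than the original, so each value has more cases, but the recode cannot be undone: DIRECTION no longer distinguishes a Strong Democrat from a Democratic Leaner.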
Reliability
• A measure is unreliable to the extent that, when it is applied repeatedly to the same case (whose true value remains constant), it gives (somewhat) different observed (measured) values.
  – All measures (in particular of continuous variables with an infinite number of possible values) are unreliable to some degree.
  – If you take the SAT repeatedly, you certainly don’t get the same score each time (even if scores are rounded to the nearest 10 points).
• When we attempt to measure a population parameter by using a sample statistic, the measurement is somewhat unreliable because of sampling error.
  – Perhaps SAT scores should not be rounded off but reported with a “margin of error” that reflects the degree of their inherent unreliability.
• An index for measuring a variable is typically more reliable than its individual components.
• A measure that is merely unreliable gives the right value on average, in contrast to a measure that is biased or invalid.

Bias
• A measure is biased to the extent that it tends consistently to produce measured values that are too high or too low relative to some independently determined “true value.”
  – Random samples produce sample statistics (for percentages and averages) that are unbiased (though unreliable).
  – Once recognized, bias is often easy to correct for.
  – Some social science measures are known (or suspected) to be biased:
    • ANES measures of VOTING TURNOUT [biased upwards]
      – biased responses (despite question stem)
      – sampled vs. target population
      – downward bias in “true turnout”
      – obtrusive measurement
    • Consumer Price Index (CPI) [biased upwards?]
    • Unemployment Index [biased downwards?]

Bias and Reliability (in Measuring a Continuous Variable)
• Notice that an unreliable biased measure is more likely to produce the correct value than a reliable biased measure.
• SAT is sometimes said to be biased, but the complaints here actually pertain to its validity.
  – SAT, and many other measures, cannot be said to be biased in the sense defined here, because there is no independent way of determining the “true value” of the variable in each case.
  – Your “true” SAT score can only be defined as your average score if you took the test many times,
    • relative to which individual scores cannot be biased.

Validity
• A measure is invalid to the extent it is actually measuring variables other than the variable it is intended to measure.
  – Some people say SAT really measures “test-taking ability” rather than “true” scholastic aptitude.
  – Others say that SAT (in part) measures “middleclassness” or socially privileged status, rather than (or, more plausibly, in addition to) scholastic aptitude.
    • That is, SAT is “biased against” students from poorer, lower status, minority, or immigrant backgrounds.
  – But this is not a problem of “bias” as defined above but of validity. The claim is not that
    • SAT is giving everyone scores that are too low (or too high) but
    • that SAT is measuring students’ test-taking ability or social status instead of, or as well as, their scholastic aptitude.

Validity: Examples
• A clear-cut example of an invalid test of scholastic aptitude would be one administered in English to a group of children some of whom are “native Americans” and others of whom are (children of) recent immigrants to the U.S. who speak a language other than English at home.
  – The test probably measures scholastic aptitude in part, but clearly it also measures English language fluency.
• Variables that pertain to characteristics of aggregates such as states, nations, or other jurisdictions (e.g., LEVEL OF CRIME, LEVEL OF TRAFFIC DEATHS) should be based on rates, and are invalid if based on totals (since totals mostly reflect SIZE OF POPULATION), e.g.,
  – murders per 100,000 per year

Question Design and Validity
• Response bias
• Political [Internal and External] Efficacy Index
  Index of respondent's trust in government, built from responses to the following questions:
  (1) Do you agree or disagree with the statement, "People like me don't have any say about what the government does"?
  (2) Do you agree or disagree with the statement, "I don't think public officials care much what people like me think"?
  (3) Do you agree or disagree with the statement, "Sometimes politics and government seem so complicated that a person like me can't really understand what's going on"?
  – Do (some) people who “agree” with all these statements
    • really have a low sense of political efficacy, or
    • are they just “agreeable” (i.e., tend to say they agree with any statement read to them)?

Validity (cont.)
• There is no simple or straightforward way of assessing the validity of a measure.
• Several considerations:
  – DEGREE OF RACIAL PREJUDICE [e.g., among whites in the US].
    • It would be very difficult or impossible to construct a single measure of racial prejudice across racial groups in the US, let alone across cultures.
  – Use an index of multiple items:
    • General validity: phrase questions carefully.
    • Face validity: the items pertain directly to race relations.
      – But “symbolic racism.”
    • External validity: apply the measure to “known groups.”

Valid Rates: What Should the Denominator Be?
• US traffic deaths: ~35,000 per year (third highest number worldwide [after India and China])
  – How dangerous is it to travel by car in different countries?
  – Traffic fatality rate: traffic deaths/???? [per capita, per vehicle, per passenger-mile?]
• Administrative costs (coding problem: what is administrative?) of Medicare vs. private insurance:
  – Administrative cost rate = administrative costs/total $$
    • Medicare = ~3% vs. private = ~15-20%
  – Administrative costs per patient
    • Medicare = $509 vs. private = $453
  – Administrative costs as a percent of $$ per patient
• Prostate cancer death rates: Many more men die of prostate cancer in the US than in the UK.
  – Deaths within 5 years of prostate cancer diagnosis/all men diagnosed with prostate cancer
    • U.S. = ~10% vs. U.K. [with its “socialized medicine”] = ~50%
  – Deaths from prostate cancer/(age-adjusted) population
    • US ≈ UK [US has much higher incidence of diagnosed prostate cancer; PSA test]

7-Category Party ID in F10 Student Survey

Discussion: Class Tests as Measures of Student Mastery of Course Material
• What are we trying to measure?
• Relative accuracy of grades for multiple-choice tests vs. written (“blue-book”) tests?
  – Precision?
  – Reliability?
  – Bias?
  – Validity?

Discussion: Accuracy of the Criminal Justice System
• Criminal defendants have “true values” on the [dichotomous] variable GUILTY OR NOT GUILTY?
• The criminal justice system attempts to “measure” this variable in each case.
• How accurate is this measure with respect to
  – precision?
  – reliability?
  – bias?
  – validity?
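
Returning to the administrative-cost example under “Valid Rates,” the short Python sketch below shows why the choice of denominator can reverse a comparison. It uses only the approximate figures quoted on that slide and assumes, which the slides do not confirm, that the percentage and per-patient figures describe the same patient populations. The function name is hypothetical.

```python
def implied_total_per_patient(admin_per_patient, admin_rate):
    """Total $$ per patient implied by: admin rate = admin costs / total $$."""
    return admin_per_patient / admin_rate

# Approximate figures quoted on the slide.
medicare_total = implied_total_per_patient(509, 0.03)      # admin rate ~3%
private_total_low = implied_total_per_patient(453, 0.20)   # admin rate ~20%
private_total_high = implied_total_per_patient(453, 0.15)  # admin rate ~15%

print(round(medicare_total))                                 # 16967
print(round(private_total_low), round(private_total_high))  # 2265 3020
```

Under that assumption, Medicare looks cheaper to administer as a share of total spending but more expensive per patient, because implied total spending per Medicare patient is several times higher. Which rate is the more valid measure depends on what we are conceptually trying to measure.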