```
MEASURING VARIABLES
Topic #4
Empirical Research Propositions /
Expectations / Hypotheses
• Consider propositions #1 and #16 in Problem Set #3A,
which we can state more formally as follows:
– #1 DEGREE OF SENIORITY is associated with DEGREE OF
PRAGMATISM [among individuals, in particular members of
Congress].
– #16 DEGREE OF PROPORTIONALITY is associated with
NUMBER OF POLITICAL PARTIES [among the electoral systems
of the world]. [“Duverger’s Law”]
• In order to test such hypotheses, we need to
– identify the relevant unit of analysis and population,
– identify the variables of interest pertaining to these units, and
– collect empirical data concerning such cases.
Empirical Research (cont.)
• But before we can start collecting data, we need to
devise some way of actually measuring the variables
that we have conceptually identified in the cases we are
using.
• This requires us to establish some kind of linkage or
correspondence between the data that we will collect
and the “conceptual” (or “abstract”) variables in the
hypothesis.
• For some variables, the measurement problem is
relatively straightforward, but for other variables it can be
very difficult.
• In political science especially, appropriate data about
individuals often (but certainly not always) comes from
surveys.
– In doing Problem Set #4, students too often assume that data
used to measure variables must always come from surveys.
Coding vs. Operationalization
• Measuring variables entails two distinct though interrelated problems.
• The coding problem is relatively straightforward:
– It entails determining what the variable will “look like” in a
codebook, i.e., its name, description, and (in particular) its range
of possible values (in the manner of Problem Set #3A).
– Will it be dichotomous or more “refined”?
– Will the values be nominal, ordinal, interval or ratio?
• The operationalization problem is often much more
difficult.
– It entails specifying the practical operations that will be used to
“observe” or “measure” the actual value of the variable in each
case, so that data is appropriately collected and (if necessary)
coded.
• In sentences #2 and #11 of Problem Set #3A, LEVEL OF
EDUCATION (pertaining to individuals) is one of the
variables that has been identified. Let us consider how
this variable might be measured, i.e., first coded and
then operationalized.
Coding LEVEL OF EDUCATION
We must decide what the range of possible values will look
like (and whether these possible values will result in a
nominal, ordinal, interval, or ratio variable). Here are
some possibilities:
(1) LEVEL OF EDUCATION (dichotomous):
1 Low
2 High
(2) LEVEL OF EDUCATION (qualitative/ordinal but quite “imprecise”):
1 Low
2 Medium
3 High
Coding LEVEL OF EDUCATION (cont.)
(3) LEVEL OF EDUCATION: HIGHEST LEVEL ATTAINED
(qualitative/ordinal and more “precise”):
1-8 [eight ordered categories, e.g., Middle school, Some high school, Some college]
(4) LEVEL OF EDUCATION: NUMBER OF YEARS OF FORMAL
EDUCATION (quantitative / interval):
ACTUAL NUMBER OF YEARS OF FORMAL EDUCATION [uncoded]
(5) LEVEL OF EDUCATION (quantitative / interval):
SOME KIND OF NUMERICAL SCALE DERIVED FROM A TEST
(perhaps 0-100) [uncoded]
Operationalizing LEVEL OF EDUCATION
• We must decide what practical operations we will use to
assign a particular value of LEVEL OF EDUCATION to a
particular case (individual).
• If we are doing survey research, we presumably will
determine a respondent’s level of education simply by
asking an appropriately phrased question and recording
the response.
– See variable V62 in the SETUPS/NES Codebook.
• “What is the highest grade of school that you have
completed?” (with responses recoded in coding categories)
– (The full-scale ANES uses this basic question plus
several follow-up questions.)
– Validation?
• In other circumstances — for example if we were looking
at student testing data — we might use documentary
records.
Operationalizing other Variables [from
PS#3A]: Class Discussion
• LEVEL OF SENIORITY (#1)
• DEGREE OF PRAGMATISM (#1)
• DEGREE OF RELIGIOSITY (#2)
• DIRECTION OF IDEOLOGY or DEGREE OF
LIBERALISM/CONSERVATISM (sentence #14 or
Congressional study)
• DEGREE OF PROPORTIONALITY (#16)
• NUMBER OF POLITICAL PARTIES (#16)
Party ID – Seven-Point Scale (Q1 + Q2)
Party ID – Seven-Point Scale (n = 48)
Code  Category                     Count
(1)   Strong Democrat              15
(2)   Weak Democrat                10
(3)   Democratic Leaner            3
(4)   Pure Independent             1
(5)   Republican Leaner            5
(6)   Weak Republican              3
(7)   Strong Republican            8
(9)   Other, DK, or inconsistent   3
Recoding/Transforming Variables
• Remember that the 7-Category PARTY
IDENTIFICATION variable [V09] is formed out of
responses to two questions (Q1 and Q2 in Student
Survey).
• V09 in turn can be recoded into two other variables:
DIRECTION OF PARTY ID and STRENGTH OF PARTY ID.
Party ID – Seven-Point Scale Recoded
DIRECTION OF PARTY ID
Party ID – Seven-Point Scale Recoded (n = 48)
Code  Category                     Count
(1)   Democrat                     28
(2)   Independent                  1
(3)   Republican                   16
(9)   Other, DK, or inconsistent   3
Party ID – Seven-Point Scale Recoded
STRENGTH OF PARTY ID
Party ID – Seven-Point Scale Recoded (n = 48)
Code  Category                     Count
(1)   Strong Partisan              23
(2)   Weak Partisan                13
(3)   Leaner                       8
(4)   Independent                  1
(9)   Other, DK, or inconsistent   3
[Screenshots: “Recode into Different Variable” and “Recode: Old and New Values”]
Empirical Research Propositions /
Expectations / Hypotheses
• From PS #3A: #8 When times are bad, incumbent candidates are
punished in elections.
• [PS #3A] In order to test such hypotheses, we need to
– identify the relevant unit of analysis (and population) and variables:
DEGREE OF BADNESS OF TIMES => DEGREE INC PUNISHED [elections]
• It may be convenient to reverse the “polarity” of the variables:
DEGREE OF GOODNESS OF TIMES => DEGREE INC REWARDED [elections]
• [PS #4] Then we need to operationalize (devise measures for) each
variable:
RATE OF EC GROWTH (Low % - High %) => INC VOTE PERCENT (0% - 100%)
[Presidential elections]
• We then need to collect empirical data pertaining to these cases that will
be used to actually measure the variables of interest.
Variables and Empirical Data: PS #11
• But there are other possible measures of GOODNESS/BADNESS OF
THE TIMES, e.g.,
– unemployment rate (from CPS),
– inflation rate (CPI), or
– an index formed out of several economic indicators.
• Or maybe GOODNESS/BADNESS OF THE TIMES shouldn’t refer
(only) to economic matters.
Data into SPSS
OBSERVED VALUES
Election  GDP  INC2PC
1948      6.6  52
1952      1.7  45
1956      1.2  56
1960      2.9  50
1964      5.7  62
1968      6.2  49
1972      6.1  62
1976      5.9  49
1980      1.8  45
1984      7.9  59
1988      4.8  54
1992      3.3  47
1996      4.2  54
2000      4.9  50
2004      3.8  51
A “Scattergram”
Criteria for Measuring Variables
• In general, there is no “best” way to measure a
variable in social science research.
• Usually there are several reasonably good ways.
• Here are some considerations.
Conceptual Clarity
• We need to be conceptually clear about what it is that we
are trying to measure.
• For example, we may be doing research about rank-and-file
members of the voting-age population (VAP) and be interested in
aspects of their partisanship (or independence of party). We need
to be clear about what it is that we have in mind:
• how people ordinarily vote;
• how people voted in the most recent election (and for what
offices);
• how people intend to vote in the upcoming election;
• how people are registered to vote (in states that have
registration by party);
• whether people actually belong to party organizations, clubs,
etc.; and
• how people think about themselves in party terms.
– PARTY ID is intended to measure this last aspect of
partisanship.
Dimensionality
• Thinking about [non-dichotomous] ordinal, interval, or
ratio level variables encourages us to think in
dimensional terms, e.g.,
– low to high
– small to big
– left to right
– cool to warm (thermometer scales)
– degree of closeness (of elections)
• Our measures of such variables should have similar
dimensionality (vs. dichotomous or nominal).
– For example, in the previous extended example, we used INC
CANDIDATE VOTE %, not WHETHER OR NOT INC
CANDIDATE WON.
Dimensionality (cont.)
• Multidimensional concepts should usually be broken up
into several one-dimensional variables and measures,
e.g.,
– IDEOLOGY (Liberal to Conservative) =>
• ECONOMIC LIBERALISM/CONSERVATISM vs.
• SOCIAL LIBERALISM /CONSERVATISM
– POLITICAL EFFICACY =>
• “INTERNAL” POLITICAL EFFICACY based on sense of “self”
("Sometimes politics and government seem so complicated that a
person like me can't really understand what's going on"?) vs.
• EXTERNAL POLITICAL EFFICACY based on sense of “the system”
("I don't think public officials care much what people like me think"?).
• But this may not always be true:
– PARTY IDENTIFICATION (STRENGTH and DIRECTION)
– (STRONG/WEAK) AGREE/DISAGREE questions (ditto)
Standard or Conventional Measures
• Use standard or conventional measures when they are
available.
• For example, if you want to do a small local survey
inquiring about respondents’ party identification (among
other things), it makes sense to use the standard ANES
measure of party identification because:
– this will be generally easier than devising your own measure;
– you know the ANES survey question has been extensively
pretested (and is known to work well);
– you will not need to describe the measure in detail when you
report the results of your research (you can simply cite ANES);
and
– your results will then be directly comparable to ANES (and many
other) studies.
Public and Reproducible Measures
• Measures should be public and reproducible (by others).
• Every term paper, journal article, book, etc., reporting the
results of empirical research should describe clearly how
the variables were operationalized and measured:
– in a special methodology section or chapter, or
– in an appendix, or (if you used standard measures)
– by appropriate citation.
• Unless standard questions were used, a research report
based on survey data should include a verbatim
transcript of the relevant questions in the survey
questionnaire.
Indirect Measures (Indicators or Proxies)
• Often we must use an indirect measure (an indicator or
proxy) of a variable that has not been (and perhaps
cannot be) measured directly.
• For example, it may be difficult to directly measure the
DEGREE OF RELIGIOSITY of individuals.
– We can measure their FREQUENCY OF CHURCH (etc.) ATTENDANCE.
– Since it is plausible that these two variables are closely related, we may
let the second (easier-to-measure) variable “stand in” as an indicator of
or proxy for the first (harder-to-measure) variable.
• In a sense, all survey variables are merely indicators of
the variables we are truly interested in, because all
survey research takes responses to questions to be
more or less accurate indicators of
– unobserved behavior (e.g., whether and how respondents voted) or
– unobservable attitudes and opinions.
• Readily available proxy for AGE (of faculty members)??
Avoid Tautologies
• As suggested by PS #3A, empirical research typically
involves studying relationships between two (or more)
variables.
• This requires that each variable be independently
measured;
– otherwise, a hypothesis becomes a tautology (a
statement that is true by definition) masquerading as
an empirical proposition.
– For example, “More religious people attend church
(etc.) more frequently than less religious people.”
Composite Measures (or Indices)
• Suppose we want to measure LEVEL OF POLITICAL TRUST/CYNICISM in
survey respondents.
• A single question is unlikely to be helpful. Certainly a question like “How
cynical are you about the political process in this country?” is unlikely to
produce useful data.
• A single question like Q18 (Govt Waste $$) in the Student Survey is likely to
be both imprecise and unreliable [defined later], since people may respond
on the basis of idiosyncratic circumstances of the moment, rather than on
the basis of their more general and enduring dispositions.
• If respondents answer in consistently cynical or trusting ways over a variety
of related questions (like Student Survey Q18-20), we are likely to be more
confident that the overall pattern of responses indicates something
meaningful about their LEVEL OF POLITICAL CYNICISM.
• If you measure a variable by means of an index, it is necessary not only to
describe how the individual measures are constructed but also to specify
the rule of composition by which they are combined into a single overall
measure.
• Because it is a combination of several or many measures, an index is often
not expressed in any particular unit of measurement but in terms of some
arbitrary index number or score, e.g., 1-5 [that is at least ordinal and is
usually deemed to be an interval measure].
– Consumer Price Index (CPI) is not expressed in $$ but rather as an Index Number.
Accuracy in Measurement
• We want our measures of variables to be as accurate as
possible — that is, we want a close correspondence
between our measures and the concepts of interest.
• Accuracy in measurement is usefully subdivided into a
number of logically distinct components:
– precision,
– reliability,
– [lack of] bias, and
– validity.
Precision (Discrete Variables)
With respect to discrete variables, precision refers to how “refined” the
(coded) categories are; for example:
RELIGIOUS AFFILIATION [U.S. Population]
Least Precise [dichotomous]:
1 Christian
2 Non-Christian
More Precise:
1 Protestant
2 Catholic
3 Non-Christian
Still More Precise:
1 Mainline Protestant
2 Evangelical Protestant
3 Catholic
4 Jewish
etc.
Most Precise:
1 Episcopalian
2 Lutheran
3 Baptist
4 Presbyterian, etc.
Precision: Continuous Variables
• With respect to continuous variables, how precisely do
we try to measure and record their values?
– LEVEL OF TURNOUT in 1996: 49%, 49.4%, or 49.3946%?
• Note: SPSS by default rounds percentages to the nearest
0.1%;
• in handouts and A&Ds, I normally round to the nearest whole %.
– Individual SAT scores used to be rounded to the nearest whole
point but are now rounded to the nearest 10 points.
– Recoding a variable (e.g., AGE, PARTY ID) reduces the
precision of the measure for analytical purposes (usually to
increase the number of cases with each value).
• Precision relates only to the coding problem; other
components of accuracy pertain to the more important
and difficult problem of operationalization.
Reliability
• A measure is unreliable to the extent that, when it is applied
repeatedly to the same case (whose true value remains
constant), it gives (somewhat) different observed
(measured) values.
– All measures (in particular of continuous variables with an infinite
number of possible values) are unreliable to some degree.
– If you take the SAT repeatedly, you certainly don’t get the same
score each time (even if scores are rounded to nearest 10 points).
• When we attempt to measure a population parameter by
using a sample statistic, the measurement is somewhat
unreliable because of sampling error.
– Perhaps SAT scores should not be rounded off but reported with a
“margin of error” that reflects the degree of their inherent unreliability.
• An index for measuring a variable is typically more reliable
than its individual components.
• A measure that is merely unreliable gives the right value on
average, in contrast to a measure that is biased or invalid.
Bias
• A measure is biased to the extent that it tends
consistently to produce measured values that are too
high or too low relative to some independently
determined “true value.”
– Random samples produce sample statistics (for
percentages and averages) that are unbiased (though
unreliable).
– Once recognized, bias is often easy to correct for.
– Some social science measures are known (or
suspected) to be biased:
• ANES measures of VOTING TURNOUT [biased upwards]
– biased responses (despite question stem)
– sampled vs. target population
– downward bias in “true turnout”
– obtrusive measurement
• Consumer Price Index (CPI) [biased upwards?]
• Unemployment Index [biased downwards?]
Bias and Reliability (in Measuring a
Continuous Variable)
• Notice that an unreliable biased measure is more likely to produce
the correct value than a reliable biased measure.
• SAT is sometimes said to be biased, but the complaints here
actually pertain to its validity.
– SAT, and many other measures, cannot be said to be biased in
the sense defined here, because there is no independent way of
determining the “true value” of the variable in each case.
– Your “true” SAT score can only be defined as your average
score if you took the test many times,
• relative to which individual scores cannot be biased.
Validity
• A measure is invalid to the extent it is actually measuring
variables other than the variable it is intended to
measure.
– Some people say SAT really measures “test-taking
ability” rather than “true” scholastic aptitude.
– Others say that SAT (in part) measures “middle-classness” or socially privileged status, rather than
(or, more plausibly, in addition to) scholastic aptitude.
• That is, SAT is “biased against” students from poorer, lower
status, minority, or immigrant backgrounds.
– But this is not a problem of “bias” as defined above
but of validity. The claim is not that
• SAT is giving everyone scores that are too low (or too high)
but
• that SAT is measuring students’ test-taking ability or social
status instead of, or as well as, their scholastic aptitude.
Validity: Examples
• A clear-cut example of an invalid test of scholastic
aptitude would be one administered in English to a group
of children some of whom are “native Americans” and
others of whom are (children of) recent immigrants to the
U.S. who speak a language other than English at home.
– The test probably measures scholastic aptitude in part, but
clearly it also measures English language fluency.
• Variables that pertain to characteristics of aggregates
such as states, nations, or other jurisdictions (e.g.,
LEVEL OF CRIME, LEVEL OF TRAFFIC DEATHS)
should be based on rates, and are invalid if based on
totals (since totals mostly reflect SIZE OF POPULATION), e.g.,
– murders per 100,000 per year
Question Design and Validity
• Response bias
• Political [Internal and External] Efficacy Index
Index of respondent's trust in government, built from responses
to the following questions:
(1) Do you agree or disagree with the statement, "People like
me don't have any say about what the government does"?
(2) Do you agree or disagree with the statement, "I don't think
public officials care much what people like me think"?
(3) Do you agree or disagree with the statement, "Sometimes
politics and government seem so complicated that a person
like me can't really understand what's going on"?
– Do (some) people who “agree” with all these
statements
• really have a low sense of political efficacy, or
• are they just “agreeable” (i.e., tend to say they agree with
Validity (cont.)
• There is no simple or straightforward way for assessing
the validity of a measure.
• Several considerations:
– DEGREE OF RACIAL PREJUDICE [e.g., among
whites in the US].
• It would be very difficult or impossible to construct a single
measure of racial prejudice across racial groups in the US, let
alone across cultures.
– Use an index of multiple items:
• General validity: phrase questions carefully.
• Face validity: the items pertain directly to race relations.
– But “symbolic racism.”
• External validity: apply the measure to “known groups.”
Valid Rates:
What Should the Denominator Be?
• US traffic deaths: ~35,000 per year (third highest number world-wide
[after India and China])
– How dangerous is it to travel by car in different countries?
– Traffic fatality rate: traffic deaths/???? [per capita, per vehicle, per passenger-mile?]
• Medicare vs. private insurance: administrative costs
– as a percent: Medicare = ~3% vs. private = ~15-20%
– in $$ per patient: Medicare = $509 vs. private = $453
• Prostate cancer death rates: Many more men die of prostate cancer in
US than UK.
– Deaths within 5 years of prostate cancer diagnosis/ all men diagnosed with prostate
cancer
• U.S. = ~10% vs. U.K. [with its “socialized medicine”] = ~50%
– Deaths from prostate cancer/(age-adjusted) population
• US ≈ UK [US has much higher incidence of diagnosed prostate cancer; PSA test]
7-Category Party ID in F10 Student
Survey
Discussion: Class Tests as Measures
of Student Mastery of Course Material
• What are we trying to measure?
• Relative accuracy of grades for multiple-choice
tests vs. written (“blue-book”) tests?
– Precision?
– Reliability?
– Bias?
– Validity?
Discussion: Accuracy of the Criminal
Justice System
• Criminal defendants have “true values” on the
[dichotomous] variable GUILTY OR NOT GUILTY?
• The criminal justice system attempts to “measure” this
variable in each case.
• How accurate is this measure with respect to
– precision?
– reliability?
– bias?
– validity?
```
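
The recoding of the seven-category PARTY ID variable (V09) into DIRECTION OF PARTY ID and STRENGTH OF PARTY ID, described in the slides above, can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS procedure used in the course: the names `V09_COUNTS`, `DIRECTION_MAP`, `STRENGTH_MAP`, `expand`, and `recode` are hypothetical, the frequencies are the n = 48 counts reported on the slide, and code 7 for Strong Republican is an assumption (the extracted slide prints "(4)" twice).

```python
from collections import Counter

# Frequencies of the 7-category PARTY ID variable (V09) from the slide (n = 48).
# Assumption: Strong Republican is coded 7; 9 = Other, DK, or inconsistent.
V09_COUNTS = {1: 15, 2: 10, 3: 3, 4: 1, 5: 5, 6: 3, 7: 8, 9: 3}

# Hypothetical "old value -> new value" recode maps.
# DIRECTION: 1 = Democrat, 2 = Independent, 3 = Republican (leaners count as partisans).
DIRECTION_MAP = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 3, 7: 3, 9: 9}
# STRENGTH: 1 = Strong Partisan, 2 = Weak Partisan, 3 = Leaner, 4 = Independent.
STRENGTH_MAP = {1: 1, 7: 1, 2: 2, 6: 2, 3: 3, 5: 3, 4: 4, 9: 9}


def expand(counts):
    """Turn a frequency table into one coded value per case."""
    return [code for code, n in counts.items() for _ in range(n)]


def recode(values, mapping):
    """Recode each old value into its new value; unmapped codes raise KeyError."""
    return [mapping[v] for v in values]


v09 = expand(V09_COUNTS)
direction = Counter(recode(v09, DIRECTION_MAP))
strength = Counter(recode(v09, STRENGTH_MAP))

print("DIRECTION:", dict(sorted(direction.items())))
print("STRENGTH: ", dict(sorted(strength.items())))
```

Both recoded variables have fewer categories than V09, which illustrates the point made under Precision: recoding trades precision for more cases per value. Writing the rule of recoding as an explicit map also keeps the measure public and reproducible.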