Chap Le
Xianghua Luo
David Vock
William Thomas
Science is built upon rigorous observation and
experimentation. Biostatistics, the application of
statistics to understanding health and biology, provides powerful tools for developing questions,
designing studies, refining measurements, and
analyzing data. A biostatistician’s unique
contribution to a research team is founded on
quantifying uncertainty in and generating sound
inferences from data. Because of the increasing
complexity and quantity of health-related data, the
need for biostatistics expertise and the need for
biostatisticians are expanding and evolving.
Biostatistics contributions take one of two
forms: consultation and collaboration.
Statistical consultation is often unplanned, less
organized, and aimed at smaller projects. Groups
that focus on consultation provide a valuable
service but fail to maximize the contributions
biostatisticians can make to research. In those
organizations, biostatistics is sometimes
regarded as an ancillary service rather than an
academic discipline; investigators or clinical
departments expect biostatisticians to fill a
perceived service role.
In more modern medical centers, especially
academic medical centers, biostatistics support
is organized so that the field has a strong
identity as an academic discipline, one that
spurs intellectual growth and values
methodological contributions to health-related
research. Contributions are made through
collaborations in which biostatisticians are involved
early and continuously in every project, from
developing questions, designing studies, and
refining measurements to analyzing data and
publishing results.
All of us in an applied environment still provide
some statistical consultation, because not
all investigators are experienced; but even
that is gradually becoming more like
“mentoring” than consulting. Those
who have been around for a while are often
involved in more meaningful, more
rewarding collaborations. So, where do we
contribute? The following few slides provide
a simple picture of the makeup of a research
project.
[Figure: three stages of research. Research Question (truth in the universe); Study Plan (truth in the study); Study Data (findings in the study).]
The common thread, and the most
important component, in research is
the concept of “validity”. It involves
assessment against accepted
standards; we have to be sure that
the evaluation covers its intended
target or targets.
Two major levels of inference are involved in
interpreting the results/findings of a study:
The first level concerns internal validity: the
degree to which the investigator draws the
correct conclusions about what actually
happened in the study.
The second level concerns external validity
(also referred to as generalizability or
inference): the degree to which these
conclusions could be appropriately applied to
people and events outside the study.
[Figure: the same diagram, with External Validity and Internal Validity marked. Stages: Research Question (truth in the universe); Study Plan (truth in the study); Study Data (findings in the study).]
Biostatistics contributes to both internal validity (dealing with missing
data, refining measurements, analyzing data) and external validity (helping
to develop the research question, designing the study, estimating sample size).
Clinical Research
Studies can be grouped into three areas: Population,
Laboratory, and Clinical; plus Translational Research,
the component of basic science that interacts with
clinical (T1) or with population research (T2).
We form or evaluate a research project
from two different angles: the anatomy
and the physiology of research, much
like the hardware and software needed
to run a computer operation.
• From the anatomy of the research, one
can describe/see what it’s made of; this
includes the tangible elements of the
study plan: research question, design,
subjects, measurements, sample size
calculation, etc.
• The goal is to create these elements in
a form that will make the project
feasible, efficient, and cost-effective.
• From the physiology of the research,
one can describe/see how it works: first
what happened in the study sample, and
then how study findings generalize to
people outside the study.
• The goal is to minimize the errors that
threaten conclusions based on these
inferences.
• The structure of a research project,
both its anatomy and its physiology,
is described in its protocol, the written
part of the study.
• The protocol has a vital scientific
function: to help the investigator
organize his/her research in a logical,
focused, and efficient way.
• Research Question: What is the objective of the study,
the uncertainty the investigator wants to resolve?
• Background and Significance: Why are these questions
important?
• Design: How is the study structured?
• Subjects: Who are the subjects, and how will they be
selected and recruited?
• Variables: What measurements will be made: predictors,
confounders, and outcomes?
• Statistical Considerations: How large is the study, and
how will data be analyzed? (“Design” is an important
statistical component but is listed in the Design section.)
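As an illustration of the "How large is the study?" element, here is a minimal sketch of a textbook sample-size calculation for comparing two group means. It is only an example under stated assumptions: a two-sided z-test, equal group sizes, and a common standard deviation. The function name `n_per_group` and the numbers passed to it are hypothetical and do not come from any study mentioned in these slides.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference.

    delta: smallest difference in means worth detecting (assumed value)
    sigma: common standard deviation of the outcome (assumed value)
    Uses the normal approximation n = 2 * ((z_a + z_b) * sigma / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                            # round up to whole subjects


# Hypothetical inputs: detect a 5-unit difference when the SD is 10.
print(n_per_group(delta=5.0, sigma=10.0))
print(n_per_group(delta=5.0, sigma=10.0, power=0.90))
```

Halving the detectable difference roughly quadruples the required sample size, which is one reason the research question and the sample size have to be negotiated together.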
You can see “Statistical Fingerprints”
everywhere! But productive contributions
require some understanding of the
“Content Sciences”. That’s why many
statisticians gradually specialize in
only a few areas of biomedical research.
SOME PROJECTS in Chap Le’s Portfolio:
(1) P01: Biology and Transplantation of the Human Stem Cell
Director: Phil McGlave; NCI: 7/1/10-6/30/15
This program project has three projects, all in Minnesota. They focus
on three important issues in the UCB transplant setting: 1) graft versus host
disease (GVHD); 2) delayed immune reconstitution with resultant late
infection; and 3) refractory or relapsed leukemia. Statisticians: Chap Le
(Core Director), Qing Cao, Todd DeFor, Bruce Lindgren, Xianghua Luo, and
Ryan Shanley.
(2) P01: NK Cells, Their Receptors and Unrelated Donor Transplant
Director: Jeff Miller; NCI: 9/1/10-7/31/15
This Program includes a group of international experts in NK cell biology
and bone marrow transplantation collaborating to investigate the relevance of
NK alloreactivity in URD HCT, a setting where KIR repertoires differ in nearly
all donor-recipient pairs. This program project has three projects; one is here
in Minnesota, one at Stanford University, and the third one is a multi-center
randomized Clinical Trial with a PI here. Statisticians: Chap Le (Core
Director), Todd DeFor, Xianghua Luo, and Yan Zhang.
(3) P30: Cancer Center Support Grant (CCSG)
Director: Doug Yee; NCI: 6/1/98-1/31/14
The Masonic Cancer Center has 8 research programs (Cancer Outcomes
and Survivorship, Carcinogenesis and Chemoprevention, Genetic
Mechanisms of Cancers, Immunology, Prevention and Etiology, Transplant
Biology and Therapy, Tumor Microenvironment, and Cell Signaling);
Biostatistics and Bioinformatics is one of its 13 Shared Resources.
Statisticians: Chap Le (Core Director), Haitao Chu, Yen-Yi Ho, Robin Bliss,
Todd DeFor, Bruce Lindgren, and Yan Zhang.
(4) P30: Minnesota Obesity Center
Director: Allen Levine; NIDDK: 9/30/1995-3/31/2016
The Center has 73 active investigators with 137 funded projects related to
obesity, energy metabolism and eating disorders and is one of 12 funded
Nutrition Obesity Research Centers; statisticians are part of the Biostatistics
and Epidemiology Core led by Dr. Robert Jeffery of Epidemiology/SPH (basic
rationale is the relationship between obesity and cancers). Statisticians:
Chap Le, Robin Bliss, and Yan Zhang.
(5) P50 (SPORE): UAB/UMN SPORE in Pancreatic Cancer
Directors: Donald Buchsbaum (Alabama) and Selwyn Vickers
(Minnesota); NCI: 8/15/2010-6/30/2015
This SPORE has four projects: two are in Birmingham, one here in
Minnesota, and one with Co-PIs on both campuses. Statisticians: Chap Le
(Core Co-Director), Yen-Yi Ho, and Bruce Lindgren; and (for pilot projects)
Xianghua Luo, Aaron Sarver, and Ryan Shanley.
(6) U54: Evaluating New Nicotine Standards for Cigarettes
Directors: Eric Donny (Pittsburgh) and Dorothy Hatsukami
(Minnesota); NIDA-FDA: 9/15/11-6/30/16
This specialized research center has four projects; one is a multi-center
clinical trial headquartered here in Minnesota, two at Pittsburgh and one at
Brown. Statisticians: Chap Le (Core Director), Qing Cao, Bruce Lindgren,
Xianghua Luo, and Joseph Koopmeiners.
(7) U19: Models for Tobacco Products Evaluation
Director: Dorothy Hatsukami; NCI: 9/20/12-9/19/17
The overall goal of this Program Project is to provide scientists and
regulatory agencies scientifically-based guidelines with methods and
measures for the evaluation of tobacco products. The new program includes
four projects – all are here in Minnesota. Statisticians: Chap Le (Core
Director), Robin Bliss, Yen-Yi Ho, Bruce Lindgren, and Yan Zhang.
(8) R01: Randomized Trial of PEITC as a Modifier of NNK Metabolism
Principal Investigator: J. Yuan; NCI: 4/1/08-1/31/13
The primary aim is to assess, via a cross-over design Clinical Trial, the
effect of PEITC supplementation (at 40 mg per day) as a modifier of NNK
metabolism in smokers. Statistician: Chap Le
(9) R01: Green Tea and Reduction of Breast Cancer Risk.
Principal Investigator: M. Kurzer; NCI: 9/11/08-7/31/13
The primary aim is to gain full understanding of the mechanisms by which
tea catechins inhibit breast carcinogenesis in humans. It’s a controlled
randomized Clinical Trial. Statistician: Chap Le
(10) P50: Tobacco Center of Regulatory Science (TCORS)
Program Director: Anne Joseph & Sharon Murphy
We are in the process of applying (deadline: November 14) to establish a
Tobacco Center of Regulatory Science (TCORS). There will be 4
projects to start: three here and one at Boston University.
Statisticians: Chap Le (Core Director), Dipankar Bandyopadhyay, Haitao
Chu, Yen-Yi Ho, Bruce Lindgren, Robin Bliss, and Ryan Shanley.
• What kinds of skills are needed?
• Most important: “People Skills” (to weather the
conflict of two different cultures) – will elaborate a bit on
this hidden barrier in class.
• Years 1-3 are especially hard & important; but
“networking” is always important, even for veterans.
-- By Xianghua Luo
• Why do you still need to do methodological research?
– Help you do your consulting work better
– Give yourself a closure on a study you’ve
been involved in or a new method you have
just learnt
– One way to drive yourself to learn new stuff
• How to find topics?
– Research how others have analyzed the
same type of data. This needs a lot of reading.
– What can be improved?
– How to convince people to use the new method
you proposed? Publish!
– Write for both statistical journals and scientific
journals. (This is the proof that you care about
their scientific problems, you understand their
problems, you know their languages, etc…)
• Do you need to have your own funding
to support your research?
– Depends.
– If you need to, you will find it is not that difficult
to find an existing data set or ongoing study that
you are involved in. So, an R03/R21 on
secondary analysis of an existing data set/project
might be a good starting point for you.
– Being a PhD means you will be a PI one way
or another someday. Better to practice early.
Analysis of Cigarette Purchase Task
Instrument Data with a Left-Censored
Mixed Effects Model
- A joint work with Liao W, Le C, Epstein LH, Yu J, Ahluwalia JS,
and Thomas J.
Cigarette Purchase Task Survey
Imagine a TYPICAL DAY during which you smoke. The following
questions ask how many cigarettes you would consume if they
cost various amounts of money. Assume the following:
• Available cigarettes are your favorite brand
• You have the same income/savings that you have now
• You have NO ACCESS to any cigarettes or nicotine products
other than those offered at these prices
• You consume the cigarettes you request on that day (in other
words, no stockpiling)
Participants were then asked to respond to the following set of
questions: How many cigarettes would you smoke if they
were_____ each?: 0¢ (free), 1¢, 5¢, 13¢, 25¢, 50¢, $1, $2, $3,
$4, $5, $6, $11, $35, $70, $140, $280, $560, $1,120.
Figure. A typical cigarette demand curve for a smoker, derived
from cigarette purchase task survey data (log-log coordinates used)
• Existing statistical methods:
– Individual-specific ordinary least squares
– Mixed effects model.
• How are the extra zeros/missing values
handled in existing methods?
– Ignore all zeros or missing values;
– Impute the first zero with an arbitrary small
number ω, e.g. 0.1, but ignore further zeros;
– Impute all zeros/missing values with ω.
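The three ways of handling zeros listed above can be sketched for a single smoker as follows. This is an illustration only, not the code used in the study: the helper names `handle_zeros` and `ols_loglog`, the strategy labels, and the responses are all made up. It also assumes that the free (0¢) price is left out because it has no logarithm, and that a missing response (`None`) is treated the same way as a zero.

```python
import math


def handle_zeros(prices, quantities, strategy, omega=0.1):
    """Return (price, quantity) pairs usable on a log-log scale.

    strategy: "drop"  -> ignore all zeros/missing values
              "first" -> impute only the first zero with omega, ignore later ones
              "all"   -> impute every zero/missing value with omega
    """
    kept, seen_zero = [], False
    for price, q in zip(prices, quantities):
        if price <= 0:                 # assumption: skip the free price
            continue
        if q is None or q == 0:
            if strategy == "drop":
                continue
            if strategy == "first" and seen_zero:
                continue
            seen_zero = True
            kept.append((price, omega))
        else:
            kept.append((price, q))
    return kept


def ols_loglog(pairs):
    """Individual-specific least squares fit of log(q) = a + b * log(price)."""
    xs = [math.log(p) for p, _ in pairs]
    ys = [math.log(q) for _, q in pairs]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical responses from one smoker (prices in dollars).
prices = [0.01, 0.05, 0.25, 1.0, 5.0, 35.0, 140.0]
cigarettes = [20, 20, 18, 10, 2, 0, 0]

for strategy in ("drop", "first", "all"):
    pairs = handle_zeros(prices, cigarettes, strategy)
    intercept, slope = ols_loglog(pairs)
    print(strategy, len(pairs), round(slope, 3))
```

Running the loop shows that the fitted slope changes with the rule chosen and with the arbitrary value of ω, even though the underlying responses are identical.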
• Any problems in the existing methods?
– Could the zeros be small values that are not
observable because they are lower than a
certain threshold (limit of detection, LOD)?
• Left-censored mixed effects model
– What if some zeros are real zero
consumptions (complete cessation of
smoking)?
• A joint modeling approach with a logistic
regression component for the cessation status and
a left-censored model for those who have not
achieved complete cessation.
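To make the idea concrete, the sketch below writes out a log-likelihood in which each zero is either a value censored below the LOD or, optionally, a true cessation. It is a simplified illustration, not the model from the joint work cited above: it has fixed effects only (no subject-level random effects), assumes log consumption is normal around a line in log price, and uses a logistic function of log price for the cessation probability. The function name and all parameter names (`a`, `b`, `sigma`, `log_lod`, `gamma`) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for density and CDF


def censored_loglik(obs, a, b, sigma, log_lod, gamma=None):
    """Log-likelihood for (price, quantity) pairs with zeros as censored values.

    Positive quantities contribute a normal density on the log scale.
    Zeros contribute P(log q < log_lod), i.e. left censoring at the LOD.
    If gamma=(g0, g1) is given, a zero may also be a true cessation with
    probability logistic(g0 + g1 * log(price)) (the joint-model idea).
    """
    total = 0.0
    for price, q in obs:
        log_price = math.log(price)
        mu = a + b * log_price
        p_quit = 0.0
        if gamma is not None:
            p_quit = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * log_price)))
        if q > 0:
            z = (math.log(q) - mu) / sigma
            density = STD_NORMAL.pdf(z) / sigma
            total += math.log((1.0 - p_quit) * density)
        else:
            below_lod = STD_NORMAL.cdf((log_lod - mu) / sigma)
            # floor guards against log(0) when the probability underflows
            total += math.log(max(p_quit + (1.0 - p_quit) * below_lod, 1e-300))
    return total


# Hypothetical data and parameter values, for illustration only.
data = [(0.25, 18), (1.0, 10), (5.0, 2), (35.0, 0), (140.0, 0)]
print(censored_loglik(data, a=2.3, b=-0.6, sigma=0.8, log_lod=math.log(0.5)))
print(censored_loglik(data, a=2.3, b=-0.6, sigma=0.8, log_lod=math.log(0.5),
                      gamma=(-3.0, 0.8)))
```

In practice such a function would be maximized over its parameters, and a mixed effects version would additionally integrate over the random effects; neither step is shown here.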
What else can you do to improve your
consulting work?
– Serve as a referee for scientific journals
– Serve in protocol/proposal review committees
– Go to scientific seminars
– Be approachable, be responsible, be
professional always!
Understanding the Science in
Collaborative Research
David M. Vock, Ph.D.
My Background
• First year at University of Minnesota
• Graduate school at North Carolina State
• Interned at Duke Clinical Research Institute
• Worked on secondary manuscripts mostly in
hepatitis C, lung transplantation, and
What Does “Understanding the
Science” Entail?
• Should be able to give an “elevator talk” to
another subject area expert
• Know major objectives
• Understand protocol for data collection
• Read the major recent papers
• Comprehend how study fits within the larger
research agenda of discipline
Not a Revolutionary Idea, But . . .
• Academic departments teach a certain set of
skills amenable to solving varied problems
• “Real-world” problems usually require lots of
tools to solve them → interdisciplinary teams
• Too often statisticians think of themselves as
separate from the team
Why is Understanding Science
Important?
• Builds credibility with investigators
• Improve the research agenda
• Guide appropriate analysis
• Strengthen manuscript for publication and
anticipate problems with review
• Troubleshoot problems
Builds Credibility
• Statisticians are too often viewed as another
hoop in the research process
• To be part of an interdisciplinary team, you have to be
able to speak a common language
• Stats is not universally known: must learn
scientific language and thought process
• Forthcoming: value to the team is increased
by understanding science
• Think of yourself as scientist with purview
over entire research process
Improve Research Agenda
• If you know the science . . .
• Focus research question – no fishing
• Help prioritize scientific hypotheses
• Ensure that the question can be answered
from the data collected
Guide appropriate analysis
• Anticipate appropriate confounders to
account for
• Prediction versus estimation problems
• Avoid analyses not scientifically interesting
• Move from associational analyses to causal
treatment analyses
• Not going to “win” every disagreement, want
to fight hardest for those points that will
affect scientific conclusions
Anticipate Problems in Review
• Extreme resistance to “different” analytical
• Must be able to justify departures from
standard analysis
• Statistical articles written in medical
journals are immensely valuable
• Want to ensure that subject-area
conclusions match analysis performed
(cannot be too speculative, either)
Troubleshoot Problems
• Example: quality of life (QOL) study part of
VALGAN trial
• Pre-specified secondary analysis of a
randomized trial of CMV prophylaxis for lung
transplant recipients
• Goal was to characterize QOL changes over
first year post-transplant using SF-36
• Preliminary analyses showed extremely
small gain in QOL even in physical domains

Le_etal_PubH8400 - Biostatistics