Politecnico di Milano
Delta Debugging
An advanced debugging technique
Authors:
Carlo Curino, Alessandro Giusti
AAIS 05
Curino, Giusti
Delta Debugging
Motivations
• Reducing faults:
• 50%-80% of total cost
• Debugging:
• One of the hardest, yet least systematic activities of
software engineering
• most time-consuming
• Locating faults:
• most difficult
[Figure: software maintenance cost as a share of total software cost; y-axis from 50% to 100%, x-axis years 1979, 1984, 1990, 2000]
Overview
• Which problems are solved by Delta Debugging
• Four solutions: a common approach
1. Simplifying failure-inducing input
2. Isolating failure-inducing thread schedule
3. Identifying failure-inducing changes in the code
4. Isolating Cause-Effect Chains
Failure-inducing input
• This HTML input makes Mozilla crash (segmentation fault). Which
portion is the failure-inducing one?
Thread scheduling
• The result of a multithreaded program seems nondeterministic.
Why does this happen?
Code changes
• The old version of GDB works with DDD, the new one doesn’t!
• 178,000 lines of code have been modified between the two
versions: where’s the bug?
Cause-effect chain
• Which part of the program state is involved in the failure?
Four solutions: a single approach
• The underlying problem is:
• Find which part of something determines the failure
So a common strategy can be applied:
• Divide et impera applied to deltas between:
• Working and failing Inputs
• Working and failing code versions
• Working and failing thread schedules
• Working and failing program states
This allows:
• Efficient and automatic debugging procedure
Common terminology
• A test case can either:
• Fail
• (The failure shows up)
• Pass
• (program runs properly)
• Be Unresolved
• (different problems arise)
• Delta Debugging algorithms iteratively:
• Apply changes (to input, code, schedule or state)
• Run tests
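The three outcomes above can be sketched as a small test wrapper. This is only an illustration: `Outcome`, `run_test` and `toy_program` are hypothetical names, and treating one chosen exception type as "the failure" is an assumption made for the example.

```python
import enum
from typing import Callable


class Outcome(enum.Enum):
    PASS = "pass"              # the program runs properly
    FAIL = "fail"              # the failure under investigation shows up
    UNRESOLVED = "unresolved"  # a different problem arises


def run_test(program: Callable[[str], object], test_input: str,
             failure: type = ZeroDivisionError) -> Outcome:
    """Run the program on one input and classify the result."""
    try:
        program(test_input)
    except failure:
        return Outcome.FAIL
    except Exception:
        return Outcome.UNRESOLVED
    return Outcome.PASS


def toy_program(text: str) -> float:
    """Hypothetical program under test: crashes on inputs of length 3."""
    if not text:
        raise ValueError("empty input")  # an unrelated problem
    return 1 / (len(text) - 3)           # the failure we are hunting
```

Every Delta Debugging variant only needs such a three-valued test function; what changes is what gets modified before each run (input, code, schedule or state).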
Common terminology (2)
• Concept of difference:
• A very general notion: any delta between two test cases
• Examples:
• Difference in the input: different character (or bit) in the input
stream
• Difference in thread schedule: difference in the time a given
thread switch is performed
• Difference in the code: a different statement in two versions of a
program
• Difference in the program state: different values of the internal
variables of a program
Simplifying Failure-inducing input
Minimizing vs Isolating
• Minimizing (ddmin algorithm):
• Slower
• More human friendly
• Isolating (dd algorithm):
• Generalization of the ddmin algorithm
• Faster
• Well suited to generating the input for cause-effect chain DD
Minimizing: Mozilla bug
• Minimizing:
• 57 tests to simplify the 896-line
HTML input to the “<SELECT>”
tag that causes the crash
• Each character is relevant (as
shown from line 20 to 26)
• Only removes deltas from the
failing test
• Returns a 1-minimal input that
causes the failure (a global minimum
is not guaranteed: finding it is intractable in general)
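A minimal sketch of a ddmin-style minimizer, assuming the test only returns pass or fail outcomes as plain strings. The names `ddmin`, `split` and `select_test`, and the regular expression standing in for the Mozilla crash, are assumptions made for this illustration and are not the original tool.

```python
import re
from typing import Callable, List

PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"


def split(items: list, n: int) -> List[list]:
    """Split items into n contiguous, nearly equal, non-empty parts (n <= len)."""
    size, rem = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        parts.append(items[start:end])
        start = end
    return parts


def ddmin(failing_input: list, test: Callable[[list], str]) -> list:
    """Shrink a failing input until removing any single element stops the failure."""
    current, n = list(failing_input), 2
    while len(current) >= 2:
        subsets = split(current, n)
        reduced = False
        for subset in subsets:                 # reduce to a failing subset
            if test(subset) == FAIL:
                current, n, reduced = subset, 2, True
                break
        if not reduced and n > 2:              # reduce to a failing complement
            for i in range(n):
                rest = [x for j, s in enumerate(subsets) if j != i for x in s]
                if test(rest) == FAIL:
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):              # finest granularity already tried
                break
            n = min(len(current), 2 * n)       # increase granularity
    return current


def select_test(chars: list) -> str:
    """Stand-in for the browser crash: fails whenever a <SELECT...> tag is present."""
    return FAIL if re.search(r"<SELECT.*>", "".join(chars)) else PASS


minimal = ddmin(list('<SELECT NAME="priority" MULTIPLE SIZE=7>'), select_test)
print("".join(minimal))
```

Only deltas are removed from the failing input; the passing input is never touched, which is the difference from the isolating variant.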
Minimizing: didactic example
Isolating: Mozilla bug
• Isolating:
• Only 7 tests (instead of 26)
• Removes deltas from the failing test and adds deltas to the passing test
• Isolates a single delta “<” whose removal makes the failure go away
• Returns the two closest inputs, one failing and the other passing
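A minimal sketch of the isolating idea, under the assumption that a configuration is a set of delta identifiers (here: character positions of the failing input) and that the empty input passes. The function names and the stand-in test are hypothetical.

```python
import re
from typing import Callable, List, Set, Tuple

PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"


def split(items: list, n: int) -> List[list]:
    """Split items into n contiguous, nearly equal, non-empty parts (n <= len)."""
    size, rem = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        parts.append(items[start:end])
        start = end
    return parts


def dd(c_pass: Set[int], c_fail: Set[int],
       test: Callable[[Set[int]], str]) -> Tuple[Set[int], Set[int]]:
    """Narrow the difference between a passing and a failing configuration."""
    cache = {}

    def run(config: Set[int]) -> str:
        key = frozenset(config)
        if key not in cache:
            cache[key] = test(config)
        return cache[key]

    def first(candidates, wanted):
        return next((c for c in candidates if run(c) == wanted), None)

    n = 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) < 2:
            return c_pass, c_fail
        n = min(n, len(delta))
        parts = [set(p) for p in split(delta, n)]
        up = [c_pass | p for p in parts]      # add deltas to the passing test
        down = [c_fail - p for p in parts]    # remove deltas from the failing test
        if (c := first(up, FAIL)) is not None:
            c_fail, n = c, 2
        elif (c := first(down, PASS)) is not None:
            c_pass, n = c, 2
        elif (c := first(up, PASS)) is not None:
            c_pass, n = c, max(n - 1, 2)
        elif (c := first(down, FAIL)) is not None:
            c_fail, n = c, max(n - 1, 2)
        elif n < len(delta):
            n = min(2 * n, len(delta))        # unresolved everywhere: refine
        else:
            return c_pass, c_fail


failing_text = '<SELECT NAME="priority" MULTIPLE SIZE=7>'


def select_test(indices: Set[int]) -> str:
    """Stand-in for the browser crash, applied to the chosen characters."""
    text = "".join(failing_text[i] for i in sorted(indices))
    return FAIL if re.search(r"<SELECT.*>", text) else PASS


near_pass, near_fail = dd(set(), set(range(len(failing_text))), select_test)
```

The result is a pair of configurations, one passing and one failing, rather than a single minimized input.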
General DD Algorithm
[Figure sequence: the differences between the initial passing test and the initial failing test, narrowed step by step]
• Start from the Initial Fail, the Initial Pass, and the differences between them
• What if we remove these diffs from the current failing test?
• Failure disappears: “Move up”
• What if we remove these diffs?
• UNRESOLVED TEST: “Increase Granularity”
• What if we remove these diffs from the current failing test?
• Still fails: “Move Down”
Formally: the Algorithm
Efficiency considerations
• The worst case: |k|^2 + 3|k| tests (|k| = cardinality of the
change set k)
• all test cases are unresolved except the last one
• very unlikely
• The best case: 2·log2|k| tests
• Try to avoid unresolved test outcomes
• Exploit lexical and syntactic knowledge about the input
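To get a feeling for the gap between the two bounds, they can be evaluated for a concrete change set size. The function names are hypothetical, and the base-2 logarithm is an assumption about the best-case formula.

```python
import math


def worst_case_tests(k: int) -> int:
    """Upper bound on the number of tests: k^2 + 3k."""
    return k * k + 3 * k


def best_case_tests(k: int) -> float:
    """Lower bound on the number of tests: 2 * log2(k)."""
    return 2 * math.log2(k)


# Example: a change set of 896 deltas, e.g. one delta per line of an 896-line input.
print(worst_case_tests(896))         # 805504
print(round(best_case_tests(896)))   # 20
```

The worst case grows quadratically and the best case logarithmically, which is why reducing unresolved outcomes matters so much in practice.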
DEMO
Eclipse Plugin Live Demo
Thread Scheduling
• The behavior of a multithreaded program may depend on the
schedule.
DD applied to Thread Scheduling
• Debugging is even harder here:
• Thread switches and schedules are nondeterministic
• It is difficult to reproduce and isolate failures
• Goal:
• Relate the failure to a small set of relevant differences between a passing
and a failing schedule
• Again a “purely experimental approach”, no need to
understand the program
Purely experimental: Pros and Cons
• Pros:
• The program is treated as a black box: it only needs to be
executed
• Failure: an arbitrary behaviour of the program. It is only necessary to
distinguish failure from success.
• Cons:
• (w.r.t. static analysis) Test-based: cannot determine properties
that hold for all runs of a program, like the general absence of deadlocks
• Requires an observable failure
Dejavu tool
• Tool: Dejavu (DEterministic JAVa replay Utility) by IBM
• Reproduces schedules and the failures they induce
• Exploiting Dejavu
• the Thread Schedule becomes an input
• We can generate schedules by mixing one passing schedule and one
failing schedule
Differences in thread scheduling
• Starting point:
• Passing run
• Failing run
• Differences (for t1):
• t1 occurs in one run
at time 254
• t1 occurs in the other run
at time 278
• ∆1 = |278 − 254| induces a
statement interval: the code
executed between time 254
and 278
Differences in thread scheduling
• We can build further test cases by mixing the two schedules to
isolate the relevant differences
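A minimal sketch of how two schedules could be mixed, assuming a schedule is simply the list of times at which each thread switch occurs and that both runs have the same number of switches. The representation and the names `schedule_deltas` and `apply_deltas` are assumptions made for illustration; they are not the Dejavu format.

```python
from typing import List, Tuple

Delta = Tuple[int, int]  # (thread switch index, step of +1 or -1)


def schedule_deltas(passing: List[int], failing: List[int]) -> List[Delta]:
    """List the unit shifts that turn the passing schedule into the failing one."""
    deltas: List[Delta] = []
    for index, (p, f) in enumerate(zip(passing, failing)):
        step = 1 if f > p else -1
        deltas.extend([(index, step)] * abs(f - p))
    return deltas


def apply_deltas(passing: List[int], deltas: List[Delta]) -> List[int]:
    """Build a mixed schedule by applying a subset of the deltas."""
    mixed = list(passing)
    for index, step in deltas:
        mixed[index] += step
    return mixed


passing_schedule = [254, 400]   # switch times in the passing run (made-up second value)
failing_schedule = [278, 398]   # switch times in the failing run (made-up second value)
all_deltas = schedule_deltas(passing_schedule, failing_schedule)
halfway = apply_deltas(passing_schedule, all_deltas[:12])
```

Applying no deltas reproduces the passing schedule and applying all of them reproduces the failing one; every subset in between is a new test case. For very large schedules the deltas would have to be represented implicitly rather than as an explicit list.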
Real life test: setting
• Test #205 of the SPEC JVM98 Java test suite
• Modification of the raytracer program to a multi-threaded version
• Introduction of a simple race condition
• Implementation of an automated test that checks failure/passing
• Generation of random schedules to find a passing schedule and a
failing schedule
• Differences between the passing and failing schedule:
• 3,842,577,240 differences
• Each difference moves a thread switch time by +1 or −1
Real life test: results
• DD isolates a single difference after 50 tests (about 28 minutes)
Real life test: pin-point the failure
• The failure occurs if and only if thread switch #33 occurs at
yield point 59,772,127 instead of 59,772,126
(a yield point is a safe point, such as a function invocation)
• At 59,772,127, line 91 is the first yield point after the initialization
of OldScenesLoaded
• At 59,772,126, line 82 is the yield point just before the initialization
of OldScenesLoaded
Real life test: conclusion
• Delta Debugging is efficient
• even when applied to very large thread schedules
(>3,000,000,000 diff)
• No analysis is required as Delta Debugging relies on
experiments alone
• only the schedule was observed and altered
• failure-inducing thread switch is easily associated with code
• Alternate runs are obtained automatically
• by generating random schedules
• only one initial run (pass or fail) is required
Code changes
• A given revision of a program behaves correctly. The next one
does not.
• Find which of the changes in the code causes the problem.
• Inconvenient when the difference amounts to thousands of lines of code
The manual solution
• Binary search through the revision history
→ Regression containment
• Does not always work:
• Multiple changes that cause the failure only when combined
(interference)
• A single change can amount to many code lines (granularity)
• Mixing parallel development branches originates inconsistency
problems
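The manual binary search over the revision history can be sketched as follows, assuming a linear history whose first revision passes and whose last revision fails. `first_failing_revision` and the `passes` callback are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

Revision = TypeVar("Revision")


def first_failing_revision(revisions: Sequence[Revision],
                           passes: Callable[[Revision], bool]) -> Revision:
    """Return the first failing revision of a linear history.

    Assumes revisions[0] passes, revisions[-1] fails, and that the history
    switches from passing to failing exactly once.
    """
    lo, hi = 0, len(revisions) - 1   # lo is known to pass, hi is known to fail
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Toy history: revisions 1..100, where the regression was introduced in revision 73.
tested = []


def toy_passes(revision: int) -> bool:
    tested.append(revision)
    return revision < 73


culprit = first_failing_revision(list(range(1, 101)), toy_passes)
```

The single-switch assumption is exactly what interference, coarse granularity and merged branches violate, which is why this approach does not always work.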
Procedure
• Developed in 1999: some differences with current general DD
algorithms.
• Consider the differences between the working and failing
revisions.
• Ignore any knowledge about the temporal ordering of the
changes.
• Goal: find a minimal failure-inducing change set.
Inconsistencies
• Mixing code changes regardless of their ordering originates
lots of tests with “Unresolved” outcome:
• Integration failure
• Construction failure
• Execution failure
• They increase complexity of the DD algorithm!
Future work
• Group related changes (partly done) → fewer inconsistent
trials.
• Common change dates/sources
• Location criteria
• Lexical criteria
• Syntactic criteria (common functions/modules)
• Semantic criteria
Cause-Effect Background
• A bit of background:
• A program state is represented by variable values, and
references.
Background (2)
• While the program runs, the state evolves.
• We assume the program is
• Deterministic
• Not interactive
→ identical states at identical times have identical evolutions.
Idea: apply DD to program states.
• We need two distinct runs:
• one failing
• one passing
• We want the two runs to be (initially) as similar as
possible.
• If we let the two runs evolve in parallel, their initial state will be
similar.
• Isolating failure-inducing input can help.
• Apply DD to different "slices" of the program evolution. (A
sort of CT scan for computer programs).
Procedure
• Iteratively:
• Build a new state mixing the passing and failing state.
• Let the program evolve and see if it passes, fails, or does unrelated
weird things (unresolved outcome).
• Isolate the smallest subset of the state relevant for the failure.
• No news so far. But:
• this happens at a specific moment of the program evolution. It will be
repeated (e.g. at important functions' entry points).
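Mixing two states can be sketched with plain dictionaries, assuming both states contain the same variables and that a delta is "take this variable's value from the failing state". The variable names and both function names are made up for illustration; a real implementation would read and write the state through a debugger.

```python
from typing import Dict, Iterable, List

State = Dict[str, int]


def state_deltas(passing: State, failing: State) -> List[str]:
    """Names of the variables whose values differ between the two states."""
    return sorted(name for name in passing if passing[name] != failing[name])


def mix_state(passing: State, failing: State, chosen: Iterable[str]) -> State:
    """Start from the passing state and copy the chosen variables from the failing one."""
    mixed = dict(passing)
    for name in chosen:
        mixed[name] = failing[name]
    return mixed


# Hypothetical states captured at the same program location in the two runs.
passing_state = {"size": 3, "a0": 7, "a1": 8, "a2": 9}
failing_state = {"size": 2, "a0": 11, "a1": 14, "a2": 9}

deltas = state_deltas(passing_state, failing_state)
mixed_state = mix_state(passing_state, failing_state, ["size"])
```

Each mixed state is then injected into the running program, the program is resumed, and the outcome decides which subset of the deltas to try next.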
The result
• A cause-effect chain that leads to a failure.
The cause-effect chain
• The initial states are absolutely legitimate: for example, direct
consequence of a specific input that the program should handle.
→ intended program states.
• The final effects are the failure.
→ faulty program states.
• The error lies somewhere in the middle, when an intended
program state evolves into a faulty one.
Fascinating terminology
• A defect in the code originates an infection in the state.
• The infection usually propagates as the program evolves.
Limits
• No automatic discrimination of intended and faulty (infected)
states!
• The human user can increase resolution of slices, and
pinpoint the code that evolves an INTENDED state to a
FAULTY one.
→ Correct the error (== defect in the code) and break the
cause-effect chain that leads to the failure.
Cause Transitions
• Sometimes, when an instruction is executed:
• a given variable ceases to be failure-inducing
• others begin to be
→ the failure-inducing subset of the state changes (cause
transition)
• An algorithm can efficiently find cause transitions in cause-effect chains, by means of binary search (again).
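The binary search can be sketched as follows, assuming a function `cause_at(t)` that returns the failure-inducing set of variables at execution moment `t` (in practice, the result of running DD on the states at that moment). All names and the toy variable names are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

Cause = FrozenSet[str]


def find_cause_transition(cause_at: Callable[[int], Cause],
                          lo: int, hi: int) -> Tuple[int, int]:
    """Narrow [lo, hi] to two adjacent moments with different causes.

    Assumes cause_at(lo) != cause_at(hi); returns one such transition.
    """
    cause_lo = cause_at(lo)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cause_at(mid) == cause_lo:
            lo = mid      # same cause as at lo: the transition is later
        else:
            hi = mid      # cause already changed: the transition is earlier
    return lo, hi


def toy_cause_at(moment: int) -> Cause:
    """Toy chain: 'argc' is the cause until moment 36, then 'a[2]' takes over."""
    return frozenset({"argc"}) if moment < 37 else frozenset({"a[2]"})


transition = find_cause_transition(toy_cause_at, 0, 100)
```

Each call to `cause_at` is expensive, so the logarithmic number of probes is what makes the search practical.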
Cause Transitions (2)
Cause Transitions (3)
Why do we bother looking for cause transitions?
• A variable begins to cause a failure:
• Good location for a fix
• More important:
• “cause transitions are significantly better locators of defects than any
other methods previously known”
• Result: valuable help in the search for the defect: only a bunch of
cause transitions, and nearby code locations need to be analyzed
as the source of the infection.
Other approaches to defect localization
• Coverage
• Slicing
• Dynamic invariants
• no success with the Siemens test suite
• Explicit specification
• good results, but needs a specification of the desired internal
behavior
• Nearest neighbor (using coverage)
• best results, albeit quite naive
Evaluation setup
• Siemens suite
• 7 C sample programs (hundreds of lines of code each).
• 132 variations with one realistic defect each.
• A test suite for each program.
• Apply the different defect locators, and compare their
performance (only comparison to NN is presented).
Evaluation results
Clarification
• Two small improvements:
• relevance of code locations (automatic)
• sources of infection (programmer-driven): Unfair!
Zoom on the representation of the state
We said:
“A program state is represented by variable values, and
references”
In general, representing and manipulating the state is not
trivial
• One of the problems: C pointers
copying their value does not make sense
Solution: Memory graphs.
Memory graphs
• Systematically unfold all data structures, starting from base
variables.
Memory graphs (2)
• Nodes: all values and all variables of a program
• Edges: operations like
• variable access
• pointer dereferencing
• struct member access
• array element access
→ Abstract from memory addresses.
→ Compare and alter pointers.
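The unfolding can be illustrated on Python objects instead of C memory; this is only an analogy. Here attribute, index and key accesses play the role of the edges, and sequential node numbers replace memory addresses. The `memory_graph` function and the `Cell` class are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[int, str, int]  # (source node, access operation, target node)


def memory_graph(base_vars: Dict[str, Any]) -> Tuple[Dict[int, str], List[Edge]]:
    """Unfold all data reachable from the base variables into nodes and edges."""
    nodes: Dict[int, str] = {0: "<root>"}
    edges: List[Edge] = []
    seen: Dict[int, int] = {}  # id of a visited structure -> its node number

    def visit(obj: Any) -> int:
        is_structure = isinstance(obj, (dict, list)) or hasattr(obj, "__dict__")
        if is_structure and id(obj) in seen:
            return seen[id(obj)]           # shared or cyclic reference
        node = len(nodes)
        nodes[node] = type(obj).__name__ if is_structure else repr(obj)
        if not is_structure:
            return node
        seen[id(obj)] = node
        if isinstance(obj, dict):
            children = [(f"[{key!r}]", value) for key, value in obj.items()]
        elif isinstance(obj, list):
            children = [(f"[{i}]", value) for i, value in enumerate(obj)]
        else:
            children = [(f".{name}", value) for name, value in vars(obj).items()]
        for label, child in children:
            edges.append((node, label, visit(child)))
        return node

    for name, value in base_vars.items():
        edges.append((0, name, visit(value)))
    return nodes, edges


class Cell:
    """Toy linked-list cell used to show a cyclic structure."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.next = None


first, second = Cell(1), Cell(2)
first.next, second.next = second, first    # two cells pointing at each other
graph_nodes, graph_edges = memory_graph({"head": first})
```

Because nodes are numbered in traversal order, two runs with different addresses but the same structure produce the same graph, which is what makes states comparable.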
Memory graphs (3)
• What if the set of variables differs in the two states we are
mixing?
• Just compute the largest common subgraph.
The deltas we apply to a state:
• Change variable values.
• Alter data structures.
Implementation considerations
• All we need is a way to access and modify program state.
• GDB is the solution for C programs, but has performance
problems (5000% overhead).
• DD applied to states is still a black box approach (sort of)
• Easily extended to other languages as soon as something provides GDB-like functionality.
Conclusions
Delta Debugging:
• is an extremely interesting technique
• works pretty well, at least in theory
• there are no usable tools
• can be usefully integrated in various IDEs
• the algorithm is now patent-free (expired patent)
SO :
LET’S MAKE SOME MONEY ON IT!
Acknowledgements
• Some slides and images adapted from Dr. Andreas Zeller’s
presentations and papers
• (http://www.st.cs.uni-sb.de/~zeller/)
References
• Andreas Zeller. Yesterday, My Program Worked. Today, It Does Not. Why? FSE 1999.
• Holger Cleve, Andreas Zeller. Finding Failure Causes through Automated Testing. 4th International Workshop on Automated Debugging, 2000.
• Ralf Hildebrandt, Andreas Zeller. Simplifying Failure-Inducing Input. ISSTA 2000.
• Andreas Zeller. Automated Debugging: Are We Close? IEEE Computer, November 2001.
• Jong-Deok Choi, Andreas Zeller. Isolating Failure-Inducing Thread Schedules. ISSTA 2002.
• Andreas Zeller. Isolating Cause-Effect Chains from Computer Programs. FSE 2002.
• Holger Cleve, Andreas Zeller. Locating Causes of Program Failures. ICSE 2005.