Data Mining:
Concepts and Techniques
— Chapter 11 —
— Additional Theme: Software Bug Mining —
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.
Acknowledgement: Chao Liu
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Motivation

- Software is “full of bugs”
  - Windows 2000: 35 million lines of code, 63,000 known bugs at the time of release (about 2 per 1,000 lines)
- Software failure costs
  - Ariane 5 explosion due to “errors in the software of the inertial reference system” (Ariane 5 flight 501 inquiry board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf); image courtesy of CNN.com
  - A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
- Testing and debugging are laborious and expensive
  - “50% of my company employees are testers, and the rest spends 50% of their time testing!” —Bill Gates, in 1995
A Glimpse on Software Bugs

- Crashing bugs
  - Symptoms: segmentation faults
  - Reasons: memory access violations
  - Tools: Valgrind, CCured
- Noncrashing bugs
  - Symptoms: unexpected outputs
  - Reasons: logic or semantic errors
    - if ((m >= 0)) vs. if ((m >= 0) && (m != lastm))
    - < vs. <=, > vs. >=, etc.
    - j = i vs. j = i + 1
  - Tools: no sound tools
Example of Noncrashing Bugs

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0)) {          /* buggy; should be: if ((m >= 0) && (lastm != m)) */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}
Debugging Crashes

Crashing Bugs
[Figure: the debugger reports a stack backtrace at the point of the crash]
Bug Localization via Backtrace

- Can we circle out the backtrace for noncrashing bugs?
- Major challenges
  - We do not know where the abnormality happens
- Observations
  - Classification depends on discriminative features, which can be regarded as a kind of abnormality
  - Can we extract a backtrace from classification results?
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Related Work

- Crashing bugs
  - Memory access monitoring
    - Purify [HJ92], Valgrind [SN00], …
- Noncrashing bugs
  - Static program analysis
  - Traditional model checking
  - Model checking source code
Static Program Analysis

- Methodology
  - Examine source code directly
  - Enumerate all the possible execution paths without running the program
  - Check user-specified properties, e.g.,
    - free(p) …… (*p)   ×
    - lock(res) …… unlock(res)
    - receive_ack() …… send_data()
- Strengths
  - Checks all possible execution paths
- Problems
  - Shallow semantics
  - Properties must be directly mappable to source code structure
- Tools
  - ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00], …
Traditional Model Checking

- Methodology
  - Formally model the system under check in a particular description language
  - Exhaustively explore the reachable states to check desired or undesired properties
- Strengths
  - Models deep semantics
  - Naturally fits checking event-driven systems, like protocols
- Problems
  - Significant amount of manual effort in modeling
  - State space explosion
- Tools
  - SMV [M93], SPIN [H97], Murphi [DDH+92], …
Model Checking Source Code

- Methodology
  - Run the real program in a sandbox
  - Manipulate event happenings, e.g.,
    - Incoming messages
    - The outcomes of memory allocation
- Strengths
  - Less manual specification
- Problems
  - Application restrictions, e.g.,
    - (Still) event-driven programs
    - A clear mapping between source code and logic events is required
- Tools
  - CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00], …
Summary of Related Work

- In common,
  - Semantic inputs are necessary
    - Program model
    - Properties to check
  - Restricted application scenarios
    - Shallow semantics
    - Event-driven systems
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Example Revisited

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0)) {          /* buggy; should be: if ((m >= 0) && (lastm != m)) */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

- No memory violations
- Not an event-driven program
- No explicit error properties
Identification of Incorrect Executions

- A two-class classification problem
  - How to abstract program executions: program behavior graph
  - Feature selection: edges + closed frequent subgraphs
- Program behavior graphs
  - Function-level abstraction of program behaviors

    int main(){          int A(){ ... }
        ...
        A();             int B(){ ... C() ... }
        ...
        B();             int C(){ ... }
    }
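Below is a minimal sketch of how one run could be reduced to such a behavior-graph edge set; the record_call probe, the fixed-size edge table, and the replayed call trace are hypothetical illustration only, not the instrumentation used in this work.

#include <stdio.h>
#include <string.h>

#define MAX_EDGES 1024

typedef struct { const char *caller, *callee; int count; } Edge;
static Edge edges[MAX_EDGES];
static int n_edges = 0;

/* hypothetical probe, invoked at every call site of the monitored program */
void record_call(const char *caller, const char *callee)
{
    for (int k = 0; k < n_edges; k++)
        if (strcmp(edges[k].caller, caller) == 0 &&
            strcmp(edges[k].callee, callee) == 0) { edges[k].count++; return; }
    if (n_edges < MAX_EDGES)
        edges[n_edges++] = (Edge){ caller, callee, 1 };
}

int main(void)
{
    /* replay the call trace of the toy program above: main -> A, main -> B, B -> C */
    record_call("main", "A");
    record_call("main", "B");
    record_call("B", "C");
    for (int k = 0; k < n_edges; k++)      /* one classifier feature per call edge */
        printf("%s->%s : %d\n", edges[k].caller, edges[k].callee, edges[k].count);
    return 0;
}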
Values of Classification

- A graph classification problem
  - Every execution gives one behavior graph
  - Two sets of instances: correct and incorrect
- Values of classification
  - Classification itself does not readily work for bug localization
    - The classifier only labels each run as either correct or incorrect as a whole
    - It does not tell when the abnormality happens
  - Successful classification relies on discriminative features
    - Can discriminative features be treated as a kind of abnormality?
    - When does the abnormality happen?
  - Incremental classification?
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Incremental Classification

- Classification works only when instances of the two classes are different, so classification accuracy can be used as a measure of difference
- Relate classification dynamics to bug-relevant functions
Illustration: Precision Boost

[Figure: the behavior graph of one correct execution and of one incorrect execution over functions main and A–H]
Bug Relevance

- Precision boost
  - For each function F: precision boost = exit precision - entrance precision
- Intuition
  - Differences take place within the execution of F
  - Abnormalities happen while F is on the stack
  - The larger this precision boost, the more likely F is part of the backtrace, i.e., a bug-relevant function
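As a hedged illustration of this ranking step (the function names and precision numbers below are made up, not measured results), precision boost can be computed and used to flag bug-relevant functions like this:

#include <stdio.h>

#define N_FUNCS 4

int main(void)
{
    /* entrance/exit precisions: classification accuracies measured when F is
     * entered and when F exits; values here are purely illustrative */
    const char *name[N_FUNCS]     = { "main", "A", "B", "C" };
    double entrance_prec[N_FUNCS] = { 0.50, 0.55, 0.60, 0.58 };
    double exit_prec[N_FUNCS]     = { 0.95, 0.60, 0.92, 0.61 };

    for (int f = 0; f < N_FUNCS; f++) {
        double boost = exit_prec[f] - entrance_prec[f];
        printf("%-5s precision boost = %.2f%s\n", name[f], boost,
               boost > 0.3 ? "  <-- likely bug-relevant" : "");
    }
    return 0;
}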
Outline

- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Case Study
- Conclusions
Case Study

void subline(char *lin, char *pat, char *sub)
{
    int i, lastm, m;
    lastm = -1;
    i = 0;
    while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if ((m >= 0)) {          /* buggy; should be: if ((m >= 0) && (lastm != m)) */
            putsub(lin, i, m, sub);
            lastm = m;
        }
        if ((m == -1) || (m == i)) {
            fputc(lin[i], stdout);
            i = i + 1;
        } else
            i = m;
    }
}

- Subject program
  - replace: performs regular expression matching and substitutions
  - 563 lines of C code
  - 17 functions are involved
- Execution behaviors
  - 130 out of 5542 test cases fail to give correct outputs
  - No incorrect executions incur segmentation faults
- Logic bug
  - Can we circle out the backtrace for this bug?
Precision Pairs

[Figure: entrance and exit classification precisions (precision pairs) for each function]
Precision Boost Analysis

- Objective judgment of bug-relevant functions
- The main function is always bug-relevant
- Stepwise precision boost
- Line-up property
Backtrace for Noncrashing Bugs

[Figure: the extracted “backtrace” for the noncrashing bug]
Method Summary

- Identify incorrect executions from program runtime behaviors
- Classification dynamics can give away the “backtrace” for noncrashing bugs without any semantic inputs
- Data mining can contribute to software engineering and systems research in general
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
An Example

void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
{
    while (…) {
        …
        if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
            for (k = src[*i-1]+1; k <= src[*i+1]; k++)
                junk = addst(k, dest, j, maxset);
            *i = *i + 1;
        }
        *i = *i + 1;
    }
}

- Replace program: 563 lines of C code, 20 functions
- Symptom: 30 out of 5542 test cases fail to give correct outputs, and there are no crashes
- Goal: localize the bug and prioritize manual examination
Difficulty & Expectation

- Difficulty
  - Statically, even small programs are complex due to dependencies
  - Dynamically, execution paths can vary significantly across all possible inputs
  - Logic errors have no apparent symptoms
- Expectations
  - Unrealistic to fully unload developers
  - Localize the buggy region
  - Prioritize manual examination
Execution Profiling

- Full execution trace
  - Control flow + value tags
  - Too expensive to record at runtime
  - Unwieldy to process
- Summarized control flow for conditionals (if, while, for)
  - Branch evaluation counts
  - Lightweight to collect at runtime
  - Easy to process and effective
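A minimal sketch of such lightweight profiling, assuming a hypothetical probe() wrapper placed around each monitored conditional; this is an illustration, not the instrumentation tool used in the original work.

#include <stdio.h>

#define N_CONDS 128
static long n_true[N_CONDS], n_false[N_CONDS];   /* per-conditional counters */

/* wrap every monitored conditional: if (PROBE(id, cond)) ... */
static int probe(int id, int cond)
{
    if (cond) n_true[id]++; else n_false[id]++;
    return cond;
}
#define PROBE(id, cond) probe((id), (cond))

int main(void)
{
    /* toy program: conditional 0 guards "x is even" over x = 0..9 */
    for (int x = 0; x < 10; x++)
        if (PROBE(0, x % 2 == 0))
            ;   /* branch body is irrelevant to the profile */
    printf("cond 0: true=%ld false=%ld\n", n_true[0], n_false[0]);
    return 0;
}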
Analysis of the Example

if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
    for (k = src[*i-1]+1; k <= src[*i+1]; k++)
        junk = addst(k, dest, j, maxset);
    *i = *i + 1;
}

A = isalnum(src[*i+1])
B = src[*i-1] <= src[*i+1]

- An execution is logically correct until (A ∧ ¬B) is evaluated as true when the evaluation reaches this condition
- If we monitor program conditionals like A here, their evaluations will shed light on the hidden error and can be exploited for error isolation
Analysis of Branching Actions

- Correct vs. incorrect runs of program P

         Correct run                 Incorrect run
            B        ¬B                 B        ¬B
     A    n_AB    n_A¬B = 0      A    n_AB    n_A¬B ≥ 1
    ¬A    n_¬AB   n_¬A¬B        ¬A    n_¬AB   n_¬A¬B

- As tested over the 5542 test cases, the true evaluation probability for (A ∧ ¬B) is 0.727 in a correct execution and 0.896 in an incorrect execution on average
- The error location does exhibit detectable abnormal behaviors in incorrect executions
Conditional Test Works for Nonbranching Errors

void makepat(char *arg, int start, char delim, char *pat)
{
    …
    if (!junk)
        result = 0;
    else
        result = i + 1;   /* off-by-one error */
                          /* should be: result = i */
    return result;
}

- The off-by-one error can still be detected using the conditional tests
Ranking Based on Boolean Bias

- Let input d_i have a desired output o_i. We execute P; P passes the test iff o_i' = P(d_i) is identical to o_i
  - T_p = { t_i | o_i' = P(d_i) matches o_i }
  - T_f = { t_i | o_i' = P(d_i) does not match o_i }
- Boolean bias
  - n_t: number of times a boolean feature B evaluates true; n_f: number of times it evaluates false
  - Boolean bias: π(B) = (n_t - n_f) / (n_t + n_f)
  - It encodes the distribution of B's value: 1 if B always evaluates true, -1 if it always evaluates false, and in between for all other mixtures
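A small sketch of the boolean bias computation from the true/false counters collected during profiling; treating a never-reached branch as bias 0 is an assumption of this sketch, not something the slides specify.

#include <stdio.h>

/* pi(B) = (n_t - n_f) / (n_t + n_f) */
double boolean_bias(long n_true, long n_false)
{
    long n = n_true + n_false;
    return n == 0 ? 0.0 : (double)(n_true - n_false) / (double)n;
}

int main(void)
{
    printf("%.3f\n", boolean_bias(9, 1));   /* mostly true  ->  0.800 */
    printf("%.3f\n", boolean_bias(0, 5));   /* always false -> -1.000 */
    return 0;
}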
Evaluation Abnormality

- Boolean bias for branch P
  - The probability of being evaluated as true within one execution
- Suppose we have n correct and m incorrect executions; for any predicate P, we end up with
  - An observation sequence for correct runs: S_p = (X'_1, X'_2, …, X'_n)
  - An observation sequence for incorrect runs: S_f = (X_1, X_2, …, X_m)
- Can we infer whether P is suspicious based on S_p and S_f?
Underlying Populations

- Imagine that the underlying distributions of boolean bias for correct and incorrect executions are f(X|θ_p) and f(X|θ_f)
- S_p and S_f can be viewed as random samples from the respective underlying populations
- Major heuristic: the larger the divergence between f(X|θ_p) and f(X|θ_f), the more relevant the branch P is to the bug

[Figure: two probability densities of evaluation bias over [0, 1], one for correct and one for incorrect executions]
Major Challenges

[Figure: the two evaluation-bias densities over [0, 1], whose shapes are unknown]

- No knowledge of the closed forms of either distribution
- Usually, we do not have sufficient incorrect executions to estimate f(X|θ_f) reliably
Our Approach: Hypothesis Testing
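The figure on this slide is not reproduced here. As one hedged illustration of the idea only, and not necessarily the test statistic used in the original work, a branch can be scored by how far the mean boolean bias of the failing runs deviates from that of the passing runs; all data below is illustrative.

#include <math.h>
#include <stdio.h>

static double mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i];
    return s / n;
}

static double variance(const double *x, int n, double m)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
    return s / (n - 1);
}

/* z-like abnormality score for one predicate: larger means more bug-relevant */
double abnormality_score(const double *bias_pass, int n,
                         const double *bias_fail, int m)
{
    double mp = mean(bias_pass, n), mf = mean(bias_fail, m);
    double vp = variance(bias_pass, n, mp);
    return fabs(mf - mp) / sqrt(vp / n + 1e-12);
}

int main(void)
{
    double pass_bias[] = { 0.70, 0.73, 0.72, 0.74, 0.71 };   /* S_p (illustrative) */
    double fail_bias[] = { 0.90, 0.88, 0.91 };               /* S_f (illustrative) */
    printf("score = %.2f\n", abnormality_score(pass_bias, 5, fail_bias, 3));
    return 0;
}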
Faulty Functions

- Motivation
  - Bugs are not necessarily on branches
  - Higher confidence in function rankings than in branch rankings
- Abnormality score for functions
  - Calculate the abnormality score for each branch within each function
  - Aggregate them
Two Evaluation Measures

- CombineRank
  - Combine the branch scores by summation
  - Intuition: when a function contains many abnormal branches, it is likely bug-relevant
- UpperRank
  - Choose the largest branch score as the representative
  - Intuition: when a function has one extremely abnormal branch, it is likely bug-relevant
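A minimal sketch of the two aggregation measures; the per-branch scores for dodash and omatch below are illustrative numbers, not measured results.

#include <stdio.h>

double combine_rank(const double *branch_score, int n_branches)
{
    double sum = 0.0;
    for (int b = 0; b < n_branches; b++) sum += branch_score[b];
    return sum;                                   /* rewards many abnormal branches */
}

double upper_rank(const double *branch_score, int n_branches)
{
    double max = branch_score[0];
    for (int b = 1; b < n_branches; b++)
        if (branch_score[b] > max) max = branch_score[b];
    return max;                                   /* rewards one extreme branch */
}

int main(void)
{
    double dodash_branches[] = { 0.4, 3.1, 0.2 };       /* illustrative scores */
    double omatch_branches[] = { 1.0, 1.1, 0.9, 1.0 };
    printf("dodash: combine=%.1f upper=%.1f\n",
           combine_rank(dodash_branches, 3), upper_rank(dodash_branches, 3));
    printf("omatch: combine=%.1f upper=%.1f\n",
           combine_rank(omatch_branches, 4), upper_rank(omatch_branches, 4));
    return 0;
}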
Dodash vs. Omatch: Which Function Is Likely Buggy? And Which Measure Is More Effective?
Bug Benchmark

- Siemens Program Suite
  - 89 variants of 6 subject programs, each of 200-600 LOC
  - 89 known bugs in total
  - Mainly logic (or semantic) bugs
  - Widely used in software engineering research
Results on Program “replace”
Comparison between CombineRank and UpperRank

- Buggy function ranked within top-k
Results on Other Programs
More Questions to Be Answered

- What will happen (i.e., how should it be handled) if multiple errors exist in one program?
- How to detect bugs if only very few failing test cases are available?
- Is it really more effective if we have more execution traces?
- How to integrate program semantics into this statistics-based testing algorithm?
- How to integrate program semantic analysis with statistics-based analysis?
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Mining Copy-Paste Bugs

- Copy-pasting is common
  - 12% in Linux file system [Kasper2003]
  - 19% in X Window system [Baker1995]
- Copy-pasted code is error prone
  - Among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001]

    void __init prom_meminit(void)
    {
        ……
        for (i=0; i<n; i++) {
            total[i].adr = list[i].addr;
            total[i].bytes = list[i].size;
            total[i].more = &total[i+1];
        }
        ……
        for (i=0; i<n; i++) {
            taken[i].adr = list[i].addr;
            taken[i].bytes = list[i].size;
            taken[i].more = &total[i+1];    /* forgot to change! */
        }
    }

    (Simplified example from linux-2.6.6/arch/sparc/prom/memory.c)
An Overview of Copy-Paste Bug Detection

1. Parse source code & build a sequence database
2. Mine for basic copy-pasted segments
3. Compose larger copy-pasted segments
4. Prune false positives
Parsing Source Code

- Purpose: building a sequence database
- Idea: statement → number
  - Tokenize each component
  - Different operators/constants/keywords → different tokens
  - Handle identifier renaming: same type of identifier → same token

    old = 3;   --tokenize-->  5 61 20  --hash-->  16
    new = 3;   --tokenize-->  5 61 20  --hash-->  16
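A minimal sketch of the statement → token sequence → hash idea; the token codes (chosen to match the slide's example) and the rolling hash are made up for illustration and are not CP-Miner's actual encoding.

#include <stdio.h>
#include <ctype.h>

enum { TOK_ID = 5, TOK_ASSIGN = 61, TOK_CONST = 20 };   /* illustrative codes */

/* tokenize simple "x = 3;"-style statements: identifiers -> TOK_ID, numbers -> TOK_CONST */
static int tokenize(const char *stmt, int *toks)
{
    int n = 0;
    for (const char *p = stmt; *p; ) {
        if (isalpha((unsigned char)*p)) { while (isalnum((unsigned char)*p)) p++; toks[n++] = TOK_ID; }
        else if (isdigit((unsigned char)*p)) { while (isdigit((unsigned char)*p)) p++; toks[n++] = TOK_CONST; }
        else if (*p == '=') { toks[n++] = TOK_ASSIGN; p++; }
        else p++;                                        /* skip spaces, ';', ... */
    }
    return n;
}

static unsigned hash_stmt(const int *toks, int n)        /* simple rolling hash */
{
    unsigned h = 0;
    for (int k = 0; k < n; k++) h = h * 31 + (unsigned)toks[k];
    return h;
}

int main(void)
{
    int t1[16], t2[16];
    int n1 = tokenize("old = 3;", t1), n2 = tokenize("new = 3;", t2);
    printf("%u %u\n", hash_stmt(t1, n1), hash_stmt(t2, n2));   /* identical hashes */
    return 0;
}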
Building Sequence Database

- Program → a long sequence of hash values
  - Need a sequence database
- Cut the long sequence
  - Naïve method: fixed length
  - Our method: basic block

    65    for (i=0; i<n; i++) {
    16        total[i].adr = list[i].addr;
    16        total[i].bytes = list[i].size;
    71        total[i].more = &total[i+1];
          }
    ……
    65    for (i=0; i<n; i++) {
    16        taken[i].adr = list[i].addr;
    16        taken[i].bytes = list[i].size;
    71        taken[i].more = &total[i+1];
          }

- Final sequence DB: (65), (16, 16, 71), …, (65), (16, 16, 71)
Mining for Basic Copy-pasted Segments

- Apply a frequent sequence mining algorithm to the sequence database
- Modification: constrain the max gap

  Frequent subsequence (16, 16, 71):

    total[i].adr = list[i].addr;            taken[i].adr = list[i].addr;
    total[i].bytes = list[i].size;          taken[i].bytes = list[i].size;
    total[i].more = &total[i+1];            taken[i].more = &total[i+1];

  A copy in which one statement has been inserted (gap = 1) still matches the
  pattern as long as the gap stays within the constraint.
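As a hedged sketch of the gap-constrained matching (a simplification, not the frequent sequence miner actually used by CP-Miner), the following checks whether a basic block's hash sequence contains a mined pattern while tolerating at most MAX_GAP inserted statements between matched elements.

#include <stdio.h>

#define MAX_GAP 1   /* maximum number of inserted statements tolerated */

static int matches_with_gap(const unsigned *block, int blen,
                            const unsigned *pat, int plen)
{
    for (int start = 0; start < blen; start++) {
        if (block[start] != pat[0]) continue;
        int b = start + 1, ok = 1;
        for (int p = 1; p < plen && ok; p++) {
            int gap = 0;
            while (b < blen && block[b] != pat[p] && gap <= MAX_GAP) { b++; gap++; }
            if (b < blen && block[b] == pat[p] && gap <= MAX_GAP)
                b++;                /* matched pat[p] after skipping at most MAX_GAP hashes */
            else
                ok = 0;
        }
        if (ok) return 1;
    }
    return 0;
}

int main(void)
{
    unsigned pattern[] = { 16, 16, 71 };
    unsigned copy_a[]  = { 16, 16, 71 };        /* exact copy             */
    unsigned copy_b[]  = { 16, 10, 16, 71 };    /* one statement inserted */
    printf("%d %d\n",
           matches_with_gap(copy_a, 3, pattern, 3),
           matches_with_gap(copy_b, 4, pattern, 3));   /* prints: 1 1 */
    return 0;
}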
Composing Larger Copy-Pasted Segments

- Combine the neighboring copy-pasted segments repeatedly

  Hash values   Copy 1                               Copy 2
     65         for (i=0; i<n; i++) {                for (i=0; i<n; i++) {
     16             total[i].adr = list[i].addr;         taken[i].adr = list[i].addr;
     16             total[i].bytes = list[i].size;       taken[i].bytes = list[i].size;
     71             total[i].more = &total[i+1];         taken[i].more = &total[i+1];
                }                                    }

- The basic copy-pasted segments (16, 16, 71) are combined with the neighboring copy-pasted segments (65) into larger copy-pasted regions.
Pruning False Positives

- Unmappable segments
  - Identifier names cannot be mapped to corresponding ones

      f (a1);      f1 (b1);
      f (a2);      f1 (b2);
      f (a3);      f2 (b3);    ← conflict: f maps to both f1 and f2

- Tiny segments
- For more detail, see
  - Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
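A minimal sketch of the unmappable-segment check: verify that each identifier in one copy maps consistently to a single identifier in the other copy. The function name and the example data are illustrative only, not CP-Miner's actual pruning code.

#include <stdio.h>
#include <string.h>

#define MAX_IDS 32

/* returns 1 if every identifier in a[] maps to exactly one identifier in b[] */
int consistent_mapping(const char *a[], const char *b[], int n)
{
    const char *from[MAX_IDS], *to[MAX_IDS];
    int n_map = 0;
    for (int i = 0; i < n; i++) {
        int k;
        for (k = 0; k < n_map; k++)
            if (strcmp(from[k], a[i]) == 0) {
                if (strcmp(to[k], b[i]) != 0) return 0;   /* conflicting mapping */
                break;
            }
        if (k == n_map && n_map < MAX_IDS) { from[n_map] = a[i]; to[n_map] = b[i]; n_map++; }
    }
    return 1;
}

int main(void)
{
    const char *copy_a[] = { "f", "f", "f" };
    const char *copy_b[] = { "f1", "f1", "f2" };   /* f -> f1 and f -> f2: conflict */
    printf("%s\n", consistent_mapping(copy_a, copy_b, 3) ? "mappable" : "conflict");
    return 0;
}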
Some Test Results of C-P Bug Detection

  Software      Verified Bugs   Potential Bugs (careless programming)
  Linux         28              21
  FreeBSD       23              8
  Apache        5               0
  PostgreSQL    2               0

  Software      # LOC     Time      Space (MB)
  Linux         4.4 M     20 mins   527
  FreeBSD       3.3 M     20 mins   459
  Apache        224 K     15 secs   30
  PostgreSQL    458 K     38 secs   57
Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract “Backtrace” from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions
Conclusions

- Data mining into software and computer systems
- Identify incorrect executions from program runtime behaviors
- Classification dynamics can give away the “backtrace” for noncrashing bugs without any semantic inputs
- A hypothesis-testing-like approach is developed to localize logic bugs in software
- No prior knowledge about the program semantics is assumed
- Lots of other software bug mining methods remain to be explored
References

- [DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended Static Checking. 1998.
- [EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A Tool for Using Specifications to Check Code. In Proc. ACM SIGSOFT '94 Symposium on the Foundations of Software Engineering, pages 87-96, 1994.
- [DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-Sensitive Program Verification in Polynomial Time. In Conference on Programming Language Design and Implementation, 2002.
- [ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000.
- [M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
- [H97] Gerard J. Holzmann. The Model Checker SPIN. Software Engineering, 23(5):279-295, 1997.
- [DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol Verification as a Hardware Design Aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.
- [MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002.
References (cont’d)

- [G97] P. Godefroid. Model Checking for Programming Languages Using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997.
- [BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model Checking Programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000.
- [HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.
- Chao Liu, Xifeng Yan, and Jiawei Han. Mining Control Flow Abnormality for Logic Error Isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
- C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical Model-Based Bug Localization. In Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.
- [SN00] Julian Seward and Nick Nethercote. Valgrind, an Open-Source Memory Debugger for x86-GNU/Linux. http://valgrind.org/
- [LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
- [LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proc. 3rd USENIX Conf. on File and Storage Technologies, 2004.