Data Mining: Concepts and Techniques
Chapter 11, Additional Theme: Software Bug Mining
Jiawei Han and Micheline Kamber
Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.
Acknowledgement: Chao Liu

Outline
- Motivation
- Related Work
- Classification of Program Executions
- Extract "Backtrace" from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Motivation
- Software is "full of bugs": Windows 2000, with 35 million lines of code, had 63,000 known bugs at the time of release, about 2 per 1,000 lines.
- Software failures are costly: the Ariane 5 explosion was due to "errors in the software of the inertial reference system" (Ariane 5 Flight 501 inquiry board report, http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf).
- A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf).
- Testing and debugging are laborious and expensive: "50% of my company employees are testers, and the rest spend 50% of their time testing!" (Bill Gates, 1995)

A Glimpse of Software Bugs
- Crashing bugs
  - Symptoms: segmentation faults
  - Reasons: memory access violations
  - Tools: Valgrind, CCured
- Noncrashing bugs
  - Symptoms: unexpected outputs
  - Reasons: logic or semantic errors, e.g., if (m >= 0) vs. if ((m >= 0) && (m != lastm)); < vs. <= and > vs. >=; j = i vs. j = i + 1
  - Tools: no sound tools

Example of a Noncrashing Bug

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while (lin[i] != ENDSTR) {
            m = amatch(lin, i, pat, 0);
            /* buggy guard: if (m >= 0);
               the correct guard also requires (lastm != m) */
            if ((m >= 0) && (lastm != m)) {
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Debugging Crashes
(Figure: a crashing bug comes with a crash point and a stack backtrace that circle out where to look.)

Bug Localization via Backtrace
- Can we circle out the backtrace for noncrashing bugs?
- Major challenge: we do not know where the abnormality happens.
- Observation: classification depends on discriminative features, which can be regarded as a kind of abnormality.
- So, can we extract a backtrace from classification results?
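To make the "noncrashing" character of such bugs concrete, here is a self-contained toy modeled on subline. It is a minimal sketch, not the real replace program: match_star is a hypothetical stand-in for amatch (it matches zero or more occurrences of one character, possibly an empty match), and the fixed flag switches between the buggy guard (m >= 0) and the corrected guard that also checks (lastm != m).

    #include <stdio.h>

    /* Hypothetical stand-in for amatch(): match zero or more of pat
       starting at i; returns the index just past the match, which may
       equal i (an empty match). */
    static int match_star(const char *lin, int i, char pat) {
        while (lin[i] == pat)
            i++;
        return i;
    }

    /* Mirrors subline() above: fixed == 0 uses the buggy guard (m >= 0);
       fixed == 1 adds the missing (lastm != m) test. */
    static void substitute(const char *lin, char pat, const char *sub,
                           int fixed) {
        int i = 0, lastm = -1;
        while (lin[i] != '\0') {
            int m = match_star(lin, i, pat);
            if (m >= 0 && (!fixed || lastm != m)) {
                fputs(sub, stdout);
                lastm = m;
            }
            if (m == -1 || m == i)      /* no progress: copy one char */
                fputc(lin[i++], stdout);
            else
                i = m;                  /* skip the matched span */
        }
        putchar('\n');
    }

    int main(void) {
        substitute("ab", 'a', "X", 0);  /* buggy: prints XXb */
        substitute("ab", 'a', "X", 1);  /* fixed: prints Xb  */
        return 0;                       /* neither run crashes */
    }

Both runs terminate normally and touch no invalid memory; only the output differs (a doubled substitution at the same match point), which is exactly why memory checkers such as Valgrind have nothing to report here.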
Related Work
- Crashing bugs: memory access monitoring, e.g., Purify [HJ92] and Valgrind [SN00].
- Noncrashing bugs: static program analysis, traditional model checking, and model checking of source code.

Static Program Analysis
- Methodology: examine the source code directly, enumerate all possible execution paths without running the program, and check user-specified properties, e.g., no (*p) after free(p); unlock(res) after lock(res); send_data() only after receive_ack().
- Strengths: checks all possible execution paths.
- Problems: shallow semantics; the properties must map directly onto source code structure.
- Tools: ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00].

Traditional Model Checking
- Methodology: formally model the system under check in a particular description language, then exhaustively explore the reachable states to check desired or undesired properties.
- Strengths: models deep semantics; naturally fits event-driven systems such as protocols.
- Problems: a significant amount of manual modeling effort; state space explosion.
- Tools: SMV [M93], SPIN [H97], Murphi [DDH+92].

Model Checking Source Code
- Methodology: run the real program in a sandbox and manipulate event happenings, e.g., message arrivals and the outcomes of memory allocation.
- Strengths: much less manual specification.
- Problems: application restrictions; it (still) targets event-driven programs with a clear mapping between source code and logical events.
- Tools: CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00].

Summary of Related Work
- In common: semantic inputs are necessary, either a program model or properties to check.
- Application scenarios: shallow semantics or event-driven systems.

Example Revisited
- The subline bug fits none of these scenarios: there are no memory violations, replace is not an event-driven program, and there are no explicit error properties to check.

Identification of Incorrect Executions
- A two-class classification problem: how do we abstract program executions?
- Feature selection: represent each execution as a program behavior graph, and use its edges plus closed frequent subgraphs as features.
- A program behavior graph is a function-level abstraction of program behavior. For example,

    int main() { ... A(); ... B(); }
    int A()    { ... }
    int B()    { ... C(); ... }
    int C()    { ... }

  yields the call edges main -> A, main -> B, and B -> C.
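As a sketch of how such graphs might be collected at runtime, the toy instrumentation below records caller-to-callee edge counts for the four-function example above. The CALL macro and the fixed-size tables are illustrative assumptions, not the instrumentation actually used in this work.

    #include <stdio.h>

    #define MAXF 8
    static const char *names[MAXF] = {"main", "A", "B", "C"};
    static int edge[MAXF][MAXF];   /* edge[i][j]: # of calls i -> j */
    static int cur = 0;            /* id of the currently running function */

    /* Wrap each call site: bump the edge count, run the callee with the
       current-function id updated, then restore it. */
    #define CALL(callee_id, callee)      \
        do {                             \
            int saved = cur;             \
            edge[saved][callee_id]++;    \
            cur = callee_id;             \
            callee;                      \
            cur = saved;                 \
        } while (0)

    static void C_(void) { /* ... */ }
    static void B_(void) { CALL(3, C_()); }
    static void A_(void) { /* ... */ }

    int main(void) {
        CALL(1, A_());
        CALL(2, B_());
        /* Each run dumps one behavior graph; its edges (and frequent
           subgraphs mined across many runs) become classifier features. */
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (edge[i][j])
                    printf("%s -> %s : %d\n", names[i], names[j], edge[i][j]);
        return 0;
    }

Running it prints the three edges main -> A, main -> B, and B -> C, i.e., the behavior graph of this particular execution.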
Values of Classification
- Every execution yields one behavior graph, so we have a graph classification problem with two sets of instances: correct and incorrect runs.
- Successful classification relies on discriminative features, and discriminative features can be treated as a kind of abnormality.
- However, classification by itself does not readily localize bugs: a classifier only labels each run as correct or incorrect as a whole; it does not tell when the abnormality happens.
- Idea: incremental classification.

Incremental Classification
- Classification works only when instances of the two classes actually differ, so classification accuracy can serve as a measure of the difference between the classes.
- We can therefore relate classification dynamics to bug-relevant functions.

Illustration: Precision Boost
(Figure: the behavior graph of one correct execution side by side with that of one incorrect execution.)

Bug Relevance
- For each function F, define its precision boost as exit precision minus entrance precision (a toy computation of the resulting ranking follows the case-study setup below).
- Intuition: the differences between correct and incorrect runs take place within the execution of F, i.e., the abnormality happens while F is on the stack.
- The larger the precision boost, the more likely F is part of the backtrace, i.e., bug-relevant.

Case Study
- Subject program: replace, which performs regular-expression matching and substitution; 563 lines of C code; 17 functions involved.
- Execution behaviors: 130 out of 5,542 test cases fail to give correct outputs, and no incorrect execution incurs a segmentation fault; this is a logic bug.
- Can we circle out the backtrace for this bug?
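Before looking at the measured precision pairs, here is a minimal sketch of the ranking computation, assuming we already have a classifier accuracy ("precision") measured at each function's entrance and exit. All numbers are made-up placeholders, not measurements from the replace study.

    #include <stdio.h>

    /* Entrance/exit precision of an incremental classifier for one
       function; placeholder values for illustration only. */
    struct fn { const char *name; double p_enter, p_exit; };

    int main(void) {
        struct fn fns[] = {
            {"main",    0.50, 0.95},  /* main spans the whole run */
            {"subline", 0.55, 0.90},
            {"amatch",  0.60, 0.65},
        };
        int n = sizeof fns / sizeof fns[0];
        for (int i = 0; i < n; i++) {
            double boost = fns[i].p_exit - fns[i].p_enter;
            printf("%-8s precision boost = %.2f%s\n", fns[i].name, boost,
                   boost > 0.30 ? "  <- likely on the backtrace" : "");
        }
        return 0;
    }

Functions with a large boost are the ones inside whose execution the classifier gains most of its ability to tell correct from incorrect runs, which is what puts them on the candidate backtrace.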
Precision Pairs
(Figure: the entrance/exit precision pairs measured for the functions of replace.)

Precision Boost Analysis
- Precision boost gives an objective judgment of bug-relevant functions.
- The main function is always bug-relevant.
- The precision boost accumulates stepwise, and the functions line up accordingly (the line-up property).

Backtrace for Noncrashing Bugs
(Figure: the backtrace extracted for the replace bug.)

Method Summary
- Incorrect executions can be identified from program runtime behaviors.
- Classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic input.
- Data mining can contribute to software engineering and systems research in general.

An Example

    void dodash(char delim, char *src, int *i,
                char *dest, int *j, int maxset)
    {
        while (...) {
            ...
            if (isalnum(isalnum(src[*i+1])) && src[*i-1] <= src[*i+1]) {
                for (k = src[*i-1] + 1; k <= src[*i+1]; k++)
                    junk = addstr(k, dest, j, maxset);
                *i = *i + 1;
            }
            *i = *i + 1;
        }
    }

- Replace program: 563 lines of C code, 20 functions.
- Symptom: 30 out of 5,542 test cases fail to give correct outputs, and there are no crashes.
- Goal: localize the bug and prioritize manual examination.

Difficulty and Expectation
- Difficulty: statically, even small programs are complex due to dependencies; dynamically, execution paths can vary significantly across all possible inputs; and logic errors have no apparent symptoms.
- Expectation: it is unrealistic to take the debugging burden off developers entirely; instead, localize the buggy region and prioritize manual examination.

Execution Profiling
- A full execution trace (control flow plus value tags) is too expensive to record at runtime and unwieldy to process.
- Instead, summarize control flow at conditionals (if, while, for) as branch evaluation counts: lightweight to collect at runtime, easy to process, and effective.

Analysis of the Example
- Let A = isalnum(isalnum(src[*i+1])) and B = (src[*i-1] <= src[*i+1]).
- An execution is logically correct until (A ∧ ¬B) is evaluated as true when the evaluation reaches this condition.
- If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation.

Analysis of Branching Actions
- Compare correct and incorrect runs of program P through the branch evaluation counts of A and B:

    Correct runs:                  Incorrect runs:
           B       ¬B                    B       ¬B
    A      n_AB    n_A¬B = 0      A      n_AB    n_A¬B >= 1
    ¬A     n_¬AB   n_¬A¬B         ¬A     n_¬AB   n_¬A¬B

- As we tested through the 5,542 test cases, the probability of (A ∧ ¬B) evaluating true at this condition is 0.727 in a correct execution and 0.896 in an incorrect execution, on average.
- The error location therefore does exhibit detectable abnormal behavior in incorrect executions.

Conditional Tests Work for Nonbranching Errors

    int makepat(char *arg, int start, char delim, char *pat)
    {
        ...
        if (!junk)
            result = 0;
        else
            result = i + 1;  /* off-by-one error; should be: result = i */
        return result;
    }

- Even this off-by-one error can still be detected using the conditional tests.
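The sketch below makes this profiling concrete: it wraps one conditional so that every evaluation is tallied, then reports the fraction of true evaluations within the run. The observe() wrapper, the input string, and the simplified predicate (A is reduced to a plain dash test rather than the doubled isalnum call) are illustrative assumptions.

    #include <stdio.h>

    static long nt = 0, nf = 0;        /* true/false evaluation counts */

    /* Tally one evaluation of the wrapped predicate. */
    static int observe(int b) {
        if (b) nt++; else nf++;
        return b;
    }

    int main(void) {
        const char src[] = "a-e9-0xz";     /* stands in for a pattern */
        for (int i = 1; src[i + 1] != '\0'; i++) {
            int A = (src[i] == '-');                /* range marker      */
            int B = (src[i - 1] <= src[i + 1]);     /* well-formed range */
            if (observe(A && !B)) {
                /* ...the error-prone branch body... */
            }
        }
        /* One per-run number like this, gathered over many correct and
           incorrect runs, yields comparisons like 0.727 vs. 0.896 above. */
        printf("nt=%ld nf=%ld  true evaluation prob=%.3f\n",
               nt, nf, (double)nt / (double)(nt + nf));
        return 0;
    }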
Ranking Based on Boolean Bias
- Let input d_i have a desired output o_i. We execute P; P passes the test iff the actual output o_i' = P(d_i) is identical to o_i.
- T_p = {t_i | o_i' = P(d_i) matches o_i}; T_f = {t_i | o_i' = P(d_i) does not match o_i}.
- Let n_t be the number of times a boolean feature B evaluates true within a run, and n_f the number of times it evaluates false.
- Boolean bias: π(B) = (n_t - n_f) / (n_t + n_f). It encodes the distribution of B's value: 1 if B always evaluates true, -1 if B always evaluates false, and in between for all other mixtures.

Evaluation Abnormality
- The boolean bias of a branch P reflects the probability of its being evaluated true within one execution.
- Suppose we have n correct and m incorrect executions. For any predicate P we end up with an observation sequence over the correct runs, S_p = (X'_1, X'_2, ..., X'_n), and one over the incorrect runs, S_f = (X_1, X_2, ..., X_m).
- Can we infer whether P is suspicious from S_p and S_f?

Underlying Populations
- Imagine the underlying distributions of boolean bias for correct and incorrect executions are f(X|θ_p) and f(X|θ_f); S_p and S_f can then be viewed as random samples from these two populations.
- Major heuristic: the larger the divergence between f(X|θ_p) and f(X|θ_f), the more relevant the branch P is to the bug.
(Figure: two probability densities over evaluation bias on [0, 1], one for correct and one for incorrect runs.)

Major Challenges
- We have no knowledge of the closed forms of either distribution.
- We usually do not have enough incorrect executions to estimate f(X|θ_f) reliably.

Our Approach: Hypothesis Testing
(Figure slide: the hypothesis-testing procedure.)

Faulty Functions
- Motivation: bugs are not necessarily on branches, and function rankings carry higher confidence than branch rankings.
- Abnormality score for a function: calculate the abnormality score of each branch within the function, then aggregate them.

Two Evaluation Measures
- CombineRank: combine the branch scores by summation. Intuition: a function containing many abnormal branches is likely bug-relevant.
- UpperRank: take the largest branch score as the representative. Intuition: a function with one extremely abnormal branch is likely bug-relevant.
- (A small sketch contrasting the two measures appears at the end of this subsection.)

Dodash vs. Omatch: Which Function Is Likely Buggy, and Which Measure Is More Effective?
(Figure: branch abnormality scores of dodash and omatch under the two measures.)

Bug Benchmark
- The Siemens program suite: 89 variants of 6 subject programs, each of 200-600 LOC, with 89 known bugs in total.
- Mainly logic (or semantic) bugs; widely used in software engineering research.

Results on Program "replace"
(Figure: localization results on replace.)

Comparison between CombineRank and UpperRank
(Figure: how often each measure ranks the buggy function within the top k.)

Results on Other Programs
(Figure: results on the remaining Siemens programs.)

More Questions to Be Answered
- How should multiple errors in one program be handled?
- How can bugs be detected when only very few failing test cases are available?
- Is the method really more effective when more execution traces are available?
- How can program semantics analysis be integrated with this statistics-based analysis?
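Here is the sketch promised above, contrasting the two aggregation measures on made-up branch scores for dodash and omatch. The scores are placeholders standing in for the output of the per-branch hypothesis test, not real measurements.

    #include <stdio.h>

    /* Per-branch abnormality scores for one function (placeholders). */
    struct fn { const char *name; double s[3]; int k; };

    int main(void) {
        struct fn fns[] = {
            {"dodash", {0.9, 0.8, 0.7}, 3},  /* several abnormal branches */
            {"omatch", {2.1, 0.1, 0.0}, 3},  /* one extreme branch        */
        };
        for (int i = 0; i < 2; i++) {
            double combine = 0.0, upper = 0.0;
            for (int j = 0; j < fns[i].k; j++) {
                combine += fns[i].s[j];          /* CombineRank: sum */
                if (fns[i].s[j] > upper)         /* UpperRank: max   */
                    upper = fns[i].s[j];
            }
            printf("%s: CombineRank=%.2f  UpperRank=%.2f\n",
                   fns[i].name, combine, upper);
        }
        return 0;
    }

With these numbers, CombineRank favors dodash (2.4 vs. 2.2) while UpperRank favors omatch (2.1 vs. 0.9): the two measures can disagree, which is exactly the dodash-vs-omatch question raised above.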
Mining Copy-Paste Bugs
- Copy-pasting is common: 12% of the Linux file system [Kasper2003] and 19% of the X Window system [Baker1995] are copy-pasted code.
- Copy-pasted code is error-prone: among 35 errors in Linux drivers/i2o, 34 were caused by copy-paste [Chou2001].
- Example (simplified from linux-2.6.6/arch/sparc/prom/memory.c):

    void __init prom_meminit(void)
    {
        ...
        for (i = 0; i < n; i++) {
            total[i].adr   = list[i].addr;
            total[i].bytes = list[i].size;
            total[i].more  = &total[i+1];
        }
        ...
        for (i = 0; i < n; i++) {
            taken[i].adr   = list[i].addr;
            taken[i].bytes = list[i].size;
            taken[i].more  = &total[i+1];   /* forgot to change! */
        }
    }

An Overview of Copy-Paste Bug Detection
- Parse the source code and build a sequence database.
- Mine for basic copy-pasted segments.
- Compose larger copy-pasted segments.
- Prune false positives.

Parsing Source Code
- Purpose: build a sequence database by mapping each statement to a number.
- Tokenize each component: different operators, constants, and keywords map to different tokens; to handle identifier renaming, identifiers of the same type map to the same token.
- Example: "old = 3;" and "new = 3;" both tokenize to (5, 61, 20) and hash to the same value, 16. (A toy version of this hashing appears at the end of this subsection.)

Building the Sequence Database
- The whole program becomes one long sequence of statement hash values, which must be cut into a sequence database.
- Naive method: cut into fixed-length windows. Our method: cut at basic-block boundaries.
- For the example above, each loop contributes (65) for the for statement and (16, 16, 71) for its three assignments, so the final sequence database contains (65) (16, 16, 71) ... (65) (16, 16, 71).

Mining for Basic Copy-Pasted Segments
- Apply a frequent sequence mining algorithm to the sequence database.
- Modification: constrain the maximum gap, so that copies that differ by small edits, e.g., (16, 16, 71) against a copy with one statement inserted (gap = 1), are still detected.

Composing Larger Copy-Pasted Segments
- Combine neighboring copy-pasted segments repeatedly: in the example, the copy-pasted sequences (65) and (16, 16, 71) of the two loops combine into the larger copy-pasted segment (65, 16, 16, 71).

Pruning False Positives
- Unmappable segments: the identifier names of one segment cannot be mapped one-to-one onto those of the other, e.g., f(a1); f(a2); f(a3); against f1(b1); f1(b2); f2(b3); is a conflict.
- Tiny segments are also pruned.
- For more detail, see: Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
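The toy below sketches the statement hashing just described, assuming a djb2-style hash and a tokenizer that collapses every identifier into one generic token; CP-Miner's real parser and hash function are more elaborate. Renamed copy-pastes ("total" vs. "taken") then hash identically, while a genuine difference does not.

    #include <stdio.h>
    #include <ctype.h>

    /* Hash one statement, mapping every identifier to the single generic
       token 'I' so that renamed copies collide on purpose. */
    static unsigned hash_stmt(const char *s) {
        unsigned h = 5381;                        /* djb2 seed */
        for (size_t i = 0; s[i]; ) {
            if (isalpha((unsigned char)s[i]) || s[i] == '_') {
                while (isalnum((unsigned char)s[i]) || s[i] == '_')
                    i++;                          /* consume identifier */
                h = h * 33 + 'I';
            } else if (!isspace((unsigned char)s[i])) {
                h = h * 33 + (unsigned char)s[i]; /* operator, digit, ... */
                i++;
            } else {
                i++;                              /* skip whitespace */
            }
        }
        return h;
    }

    int main(void) {
        printf("%u\n", hash_stmt("total[i].adr = list[i].addr;"));
        printf("%u\n", hash_stmt("taken[i].adr = list[i].addr;"));  /* same */
        printf("%u\n", hash_stmt("taken[i].more = &total[i+1];"));  /* differs */
        return 0;
    }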
Some Test Results of Copy-Paste Bug Detection

    Software     Verified Bugs   Potential Bugs (careless programming)
    Linux        28              21
    FreeBSD      23              8
    Apache       5               0
    PostgreSQL   2               0

    Software     # LOC   Time      Space (MB)
    Linux        4.4 M   20 mins   527
    FreeBSD      3.3 M   20 mins   459
    Apache       224 K   15 secs   30
    PostgreSQL   458 K   38 secs   57

Conclusions
- Data mining applies to software and computer systems: incorrect executions can be identified from program runtime behaviors, and classification dynamics can give away a "backtrace" for noncrashing bugs without any semantic inputs.
- A hypothesis-testing-like approach localizes logic bugs in software; no prior knowledge about the program semantics is assumed.
- Many other software bug-mining methods remain to be explored.

References
[DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended Static Checking. 1998.
[EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A Tool for Using Specifications to Check Code. In Proc. ACM SIGSOFT '94 Symp. Foundations of Software Engineering, pages 87-96, 1994.
[DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-Sensitive Program Verification in Polynomial Time. In Conf. Programming Language Design and Implementation, 2002.
[ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions. In Proc. 4th Symp. Operating Systems Design and Implementation, October 2000.
[M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[H97] Gerard J. Holzmann. The Model Checker SPIN. IEEE Trans. Software Engineering, 23(5):279-295, 1997.
[DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol Verification as a Hardware Design Aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.
[MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proc. 5th Symp. Operating Systems Design and Implementation, 2002.

References (cont'd)
[G97] P. Godefroid. Model Checking for Programming Languages Using VeriSoft. In Proc. 24th ACM Symp. Principles of Programming Languages, 1997.
[BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model Checking Programs. In IEEE Int'l Conf. Automated Software Engineering (ASE), 2000.
[HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, CA.
Chao Liu, Xifeng Yan, and Jiawei Han. Mining Control Flow Abnormality for Logic Error Isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical Model-Based Bug Localization. In Proc. 2005 ACM SIGSOFT Symp. Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, September 2005.
C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining Behavior Graphs for Backtrace of Noncrashing Bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.
[SN00] Julian Seward and Nick Nethercote. Valgrind, an Open-Source Memory Debugger for x86-GNU/Linux. http://valgrind.org/
[LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. Operating Systems Design and Implementation, 2004.
[LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proc. 3rd USENIX Conf. on File and Storage Technologies, 2004.
