Bell Laboratories
Intrinsic Complexity of Classification Problems
Tin Kam Ho
With contributions from Mitra Basu, Ester Bernado-Mansilla, Richard Baumgartner, Martin Law, Erinija Pranckeviciene, Albert Orriols-Puig, Nuria Macia

Supervised Learning: Many Methods, Data-Dependent Performances
Bayesian classifiers, logistic regression, linear & polynomial discriminators, nearest neighbors, decision trees & forests, neural networks, support vector machines, ensemble methods, …
• No clear winner is good for all problems
• Often, accuracy reaches a limit for a practical problem, even with the best known method

Classification accuracy (%) on UCI benchmark data sets ("-" = not available; the final column's header is missing in the source):

Data  ZeroR   NN1    NNK     NB   C4.5   PART    SMO      ?
aud    25.3   76.0   68.4   69.6   79.0   81.2      -   57.7
aus    55.5   81.9   85.4   77.5   85.2   83.3   84.9   85.7
bal    45.0   76.2   87.2   90.4   78.5   81.9      -   79.8
bpa    58.0   63.5   60.6   54.3   65.8   65.8   58.0   68.2
bps    51.6   83.2   82.8   78.6   80.1   79.0   86.4   83.3
bre    65.5   96.0   96.7   96.0   95.4   95.3   96.7   96.0
cmc    42.7   44.4   46.8   50.6   52.1   49.8      -   52.3
gls    34.6   66.3   66.4   47.6   65.8   69.0      -   72.6
h-c    54.5   77.4   83.2   83.6   73.6   77.9      -   79.9
hep    79.3   79.9   80.8   83.2   78.9   80.0   83.9   83.2
irs    33.3   95.3   95.3   94.7   95.3   95.3      -   94.7
krk    52.2   89.4   94.9   87.0   98.3   98.4   96.1   98.6
lab    65.4   81.1   92.1   95.2   73.3   73.9   93.2   75.4
led    10.5   62.4   75.0   74.9   74.9   75.1      -   74.8
lym    55.0   83.3   83.6   85.6   77.0   71.5      -   79.0
mmg    56.0   63.0   65.3   64.7   64.8   61.9   67.0   63.4
mus    51.8  100.0  100.0   96.4  100.0  100.0  100.0   99.8
mux    49.9   78.6   99.8   61.9   99.9  100.0   61.6  100.0
pmi    65.1   70.3   73.9   75.4   73.1   72.6   76.7   76.0
prt    24.9   34.5   42.5   50.8   41.6   39.8      -   43.7
seg    14.3   97.4   96.1   80.1   97.2   96.8      -   96.1
sick   93.8   96.1   96.3   93.3   98.4   97.0   93.8   96.7
soyb   13.5   89.5   90.3   92.8   91.4   90.3      -   76.2
tao    49.8   96.1   96.0   80.8   95.1   93.6   83.6   88.4
thy    19.5   68.1   65.1   80.6   92.1   92.1      -   86.3
veh    25.1   69.4   69.7   46.2   73.6   72.6      -   72.2
vote   61.4   92.4   92.6   90.1   96.3   96.5   95.6   95.4
vow     9.1   99.1   96.6   65.3   80.7   78.3      -   87.6
wne    39.8   95.6   96.8   97.8   94.6   92.9      -   96.3
zoo    41.7   94.6   92.5   95.4   91.6   92.5      -   92.6
Avg    44.8   80.0   82.4   78.0   82.1   81.8   84.1   81.7

All Rights Reserved © Alcatel-Lucent 2008

XCS Accuracy Depends on the Goodness of Match Between Classifiers and Problems
Problem A: XCS error = 1.9%, NN error = 0.06% (NN is better)
Problem B: XCS error = 0.6%, NN error = 0.7% (XCS is better)

Measuring Geometrical Complexity of Classification Problems
Our goal: tools and languages for studying
• Characteristics of the geometry & topology of high-dimensional data sets
• How they change with feature transformations and sampling
• How they interact with classifier geometry
We want to know:
• What are real-world problems like?
• What is my problem like?
• What can be expected of a method on a specific problem?
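The data dependence noted above (e.g., NN beating XCS on one problem and losing on another) can be illustrated with a small, hypothetical sketch: a 1-nearest-neighbor rule and a nearest-centroid (linear) rule trade places on two synthetic problems. All data sets and function names below are illustrative stand-ins, not the deck's actual experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_predict(Xtr, ytr, Xte):
    """1-nearest-neighbor: label of the closest training point."""
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[d2.argmin(axis=1)]

def centroid_predict(Xtr, ytr, Xte):
    """Nearest class mean: a simple linear decision rule."""
    m0, m1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return (((Xte - m1) ** 2).sum(1) < ((Xte - m0) ** 2).sum(1)).astype(int)

def accuracy(pred_fn, Xtr, ytr, Xte, yte):
    return float((pred_fn(Xtr, ytr, Xte) == yte).mean())

# Problem A: two well-separated Gaussian blobs (linearly separable).
def blobs(n):
    X = np.vstack([rng.normal((-2, 0), 1, (n, 2)), rng.normal((2, 0), 1, (n, 2))])
    return X, np.r_[np.zeros(n, int), np.ones(n, int)]

# Problem B: XOR labeling (class means coincide, so a linear rule fails).
def xor(n):
    X = rng.uniform(-1, 1, (n, 2))
    return X, ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

Xa, ya = blobs(200); Xa_te, ya_te = blobs(200)
Xb, yb = xor(400);   Xb_te, yb_te = xor(400)
for name, fn in [("1-NN", nn_predict), ("centroid", centroid_predict)]:
    print(name,
          accuracy(fn, Xa, ya, Xa_te, ya_te),
          accuracy(fn, Xb, yb, Xb_te, yb_te))
```

On the blob problem both rules score well; on the XOR problem the linear rule stays near chance while 1-NN recovers the boundary. Neither method dominates, which is the point of the table above.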
Parameterization of Data Complexity

Some Useful Measures of Geometric Complexity
• Degree of Linear Separability: find a separating hyperplane by linear programming; error counts and distances to the plane measure separability
• Length of Class Boundary: compute the minimum spanning tree of the data; count class-crossing edges
• Fisher's Discriminant Ratio: a classical measure of class separability, maximized over all features to find the most discriminating one:
  f = (μ1 − μ2)² / (σ1² + σ2²)
• Shapes of Class Manifolds: cover same-class points with maximal balls; the ball counts describe the shape of the class manifolds

Using Complexity Measures to Study Problem Distributions
Real-world data sets: benchmarking data from the UC-Irvine archive; 844 two-class problems, of which 452 are linearly separable and 392 are not
Synthetic data sets: random labeling of randomly located points; 100 problems in 1-100 dimensions
[Scatter plot in a two-metric complexity space: linearly separable real-world data, linearly non-separable real-world data, and randomly labeled data]

Measures of Geometrical Complexity

Distribution of Problems in Complexity Space
[Pairwise scatter plots of the complexity measures, marking linearly separable, linearly non-separable, and randomly labeled problems]

The First 6 Principal Components

Interpretation of the First 4 PCs
• PC 1 (50% of variance): linearity of the boundary and proximity of the opposite-class neighbor
• PC 2 (12% of variance): balance between within-class scatter and between-class distance
• PC 3 (11% of variance): concentration & orientation of intrusion into the opposite class
• PC 4 (9% of variance): within-class scatter

Problem Distribution in the 1st & 2nd Principal Components
• Continuous distribution
• Known easy & difficult problems occupy opposite ends (linearly separable problems at one end, randomly labeled problems at the other)
• Few outliers
• Empty regions

Relating Classifier Behavior to Data Complexity

Class Boundaries Inferred by Different Classifiers
[Decision boundaries learned by XCS (a genetic algorithm), a nearest-neighbor classifier, and a linear classifier]

Domains of Competence of Classifiers
• Which classifier works best for a given classification problem?
• Can data complexity give us a hint?
[Schematic: regions of a two-metric complexity space occupied by XCS, LC, decision forest, and NN]

Domain of Competence Experiment
Use a set of 9 complexity measures: Boundary, Pretop, IntraInter, NonLinNN, NonLinLP, Fisher, MaxEff, VolumeOverlap, Npts/Ndim
Characterize 392 two-class problems from UCI data, all shown to be linearly non-separable
Evaluate 6 classifiers:
• NN (1-nearest neighbor)
• LP (linear classifier by linear programming)
• Odt (oblique decision tree)
• Pdfc (random-subspace decision forest; ensemble method)
• Bdfc (bagging-based decision forest; ensemble method)
• XCS (a genetic-algorithm-based classifier)

Identifiable Domains of Competence by NN and LP
[Best classifier for the benchmarking data, plotted in complexity space]

Less Identifiable Domains of Competence
Regions in complexity space where the best classifier is one of (nn, lp, odt) versus an ensemble technique
[Scatter plots: Boundary vs. NonLinNN, IntraInter vs. Pretop, MaxEff vs. VolumeOverlap; ensemble winners marked against nn/lp/odt winners]

Difficulties in Estimating Data Complexity

Apparent vs. True Complexity: Uncertainty in Measures due to Sampling Density
A problem may appear deceptively simple or complex with small samples
[Samples of the same problem at 2, 10, 100, 500, and 1000 points]

Uncertainty of Estimates at Two Levels
• Sparse training data in each problem, together with complex geometry, makes class boundaries ill-posed (uncertainty in feature space)
• A sparse sample of problems makes regions of dominant competence hard to identify (uncertainty in complexity space)

Complexity Estimates and Dimensionality Reduction
Feature selection/transformation may change the difficulty of a classification problem by:
• Widening the gap between classes
• Compressing the discriminatory information
• Removing irrelevant dimensions
It is often unclear to what extent these happen; we seek a quantitative description of such changes.

Discrimination
Spread of classification accuracy and geometrical complexity due to forward feature selection
[Scatter plot: 1-NN classification error (0 to 0.7) versus the Boundary measure (10 to 90) for forward-feature-selection subsets of the spectra1, spectra2, spectra3, colon, and ovarian data sets]

Conclusions

Summary: Early Discoveries
• Problems distribute in a continuum in complexity space
• Several key measures provide independent characterization
• There exist identifiable domains of a classifier's dominant competence
• Sparse sampling, feature selection, and feature transformation induce variability in complexity estimates

For the Future
Further progress in statistical learning will need systematic, scientific evaluation of the algorithms with problems that are difficult for different reasons.
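The principal-component summary above (PC 1 explaining about 50% of the variance, and so on) can be reproduced in miniature. The sketch below is illustrative only: it builds a random, hypothetical problems-by-measures matrix, not the deck's actual data, and reads off explained-variance fractions via an SVD.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: 100 problems characterized by 9 complexity measures.
# Correlate a few columns so that a small number of components dominate,
# mimicking the structure reported in the deck.
M = rng.normal(size=(100, 9))
M[:, 1] = 0.9 * M[:, 0] + 0.1 * M[:, 1]
M[:, 2] = 0.8 * M[:, 0] + 0.2 * M[:, 2]

# Standardize each measure, then take the SVD of the centered matrix.
Z = (M - M.mean(axis=0)) / M.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

explained = s ** 2 / (s ** 2).sum()  # variance fraction per component
loadings = Vt                        # row i: how PC i weights the 9 measures
print(np.round(explained, 3))
```

Inspecting the rows of `loadings` is how interpretations like "PC 1 reflects boundary linearity" are obtained: each row shows which measures a component weights most heavily.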
A “problem synthesizer” will be useful to provide a complete evaluation platform and to reveal the “blind spots” of current learning algorithms.
Rigorous statistical characterization of complexity estimates from limited training data will help gauge their uncertainty and determine where data complexity methods are applicable.
Ongoing work:
• DCoL: Data Complexity Library
• ICPR 2010 Contest on Domain of Dominant Competence
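Two of the geometric measures used throughout the deck, Fisher's discriminant ratio and the MST-based class-boundary length, are simple enough to sketch directly. This is an illustrative implementation under one reading of the definitions (maximum per-feature Fisher ratio; fraction of minimum-spanning-tree edges joining opposite classes), not the DCoL library's actual code.

```python
import numpy as np

def fisher_ratio(X, y):
    """Max over features of (mu1 - mu2)^2 / (s1^2 + s2^2)."""
    A, B = X[y == 0], X[y == 1]
    num = (A.mean(0) - B.mean(0)) ** 2
    den = A.var(0) + B.var(0)
    return float((num / den).max())

def mst_boundary_fraction(X, y):
    """Fraction of minimum-spanning-tree edges that join opposite classes
    (Prim's algorithm on squared Euclidean distances)."""
    n = len(X)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    in_tree = np.zeros(n, bool)
    in_tree[0] = True
    best, parent = D[0].copy(), np.zeros(n, int)
    crossings = 0
    for _ in range(n - 1):
        best[in_tree] = np.inf          # only grow toward points outside the tree
        j = int(best.argmin())
        crossings += int(y[parent[j]] != y[j])
        in_tree[j] = True
        closer = D[j] < best
        parent[closer], best[closer] = j, D[j][closer]
    return crossings / (n - 1)

# Illustrative check on two well-separated Gaussian blobs: a high Fisher
# ratio and a small boundary fraction both indicate an easy problem.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal((-2, 0), 1, (60, 2)), rng.normal((2, 0), 1, (60, 2))])
y = np.r_[np.zeros(60, int), np.ones(60, int)]
print(fisher_ratio(X, y), mst_boundary_fraction(X, y))
```

For an overlapping or randomly labeled problem the same two numbers would move in the opposite directions: the Fisher ratio toward 0 and the boundary fraction toward the chance level.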