Data Mining
on ICDM Submission Data
Shusaku Tsumoto
Ning Zhong and Xindong Wu
ICDM 2004 Business Meeting 11/4/2004
Data Mining
on ICDM Submission Data

38 countries, 445 Submissions
Regular Papers: 39 (9%)
Short Papers: 66 (14.8%)

High Acceptance Ratio (Regular)
– Germany: 4/15 (26.7%)
– Finland: 2/9 (22.2%)
– USA: 20/109 (18.3%)
Country

Country     Regular  Short  Submissions  Acceptance Ratio
USA              20     28          109             44.0%
China             3      4           55             12.7%
UK                1      6           39             17.9%
Japan             0      5           28             17.9%
Canada            3      3           25             24.0%
Taiwan            0      1           18              5.6%
Australia         2      1           17             17.6%
Germany           4      5           15             60.0%
France            0      2           14             14.3%
India             1      0           14              7.1%
Singapore         0      3           12             25.0%
Brazil            0      1           12              8.3%
Italy             2      1           10             30.0%
Finland           2      1            9             33.3%
Spain             0      1            7             14.3%
Hong Kong         1      1            6             33.3%
Top 15           39     63          390             26.2%
Total            39     66          445             23.8%
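The ratio column in the country table is consistent with (Regular + Short) divided by the number of submissions for each country row, while the "High Acceptance Ratio (Regular)" figures count regular papers only. The following is a minimal Python sketch of both calculations for three of the countries; the function and variable names are illustrative and do not come from the slides.

```python
# Counts copied from the country table: (regular, short, submissions).
COUNTRY_COUNTS = {
    "USA": (20, 28, 109),
    "Germany": (4, 5, 15),
    "Finland": (2, 1, 9),
}


def acceptance_ratio(regular, short, total):
    """Percentage of submissions accepted as regular or short papers."""
    return 100.0 * (regular + short) / total


def regular_ratio(regular, total):
    """Percentage of submissions accepted as regular papers only."""
    return 100.0 * regular / total


for country, (regular, short, total) in COUNTRY_COUNTS.items():
    print(
        f"{country}: overall {acceptance_ratio(regular, short, total):.1f}%, "
        f"regular only {regular_ratio(regular, total):.1f}%"
    )
```

Keeping the two ratios as separate functions makes it explicit which definition a given slide is using.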
Data Mining
on ICDM Submission Data

Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining

High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)
Topics

Regular  Short  Submissions  Acceptance Ratio   Topic
      4     10           84             16.7%   Data mining applications
      9     20           81             35.8%   Data mining and machine learning algorithms and methods
      3      8           44             25.0%   Mining text and semi-structured data, and mining temporal, spatial and multimedia data
      7      7           35             40.0%   Data pre-processing, data reduction, feature selection and feature transformation
        3*               34              8.8%   Soft computing and uncertainty management for data mining
      2      1           26             11.5%   Foundations of data mining
      3      4           25             28.0%   Mining data streams
        1*               16              6.3%   Human-machine interaction and visual data mining
      2      1           15             20.0%   Security, privacy and social impact of data mining
      1      1           12             16.7%   Data and knowledge representation for data mining
        1*               11              9.1%   Pattern recognition and trend analysis
      2      2           11             36.4%   Complexity, efficiency, and scalability issues in data mining
      2      3           10             50.0%   Quality assessment and interestingness metrics of data mining results
        1*                9             11.1%   Statistics and probability in large-scale data mining
        1*                9             11.1%   Integration of data warehousing, OLAP and data mining
        2*                7             28.6%   Collaborative filtering/personalization
      1      1            7             28.6%   Post-processing of data mining results
        2*                6             33.3%   Others
        1*                2             50.0%   High performance and parallel/distributed data mining
      0      0            1              0.0%   Query languages and user interfaces for mining
     39     66          445             23.8%   Total

* The slide text gives a single accepted-paper count for this topic without indicating whether it is Regular or Short.
Correspondence Analysis
(Country vs Final Decision)
[Figure: correspondence analysis map of countries and final decisions; axes labeled r1=0.378 and r2=0.177. Plotted labels: Regular, Short, Reject; Slovenia, Finland, Hong Kong, Germany, USA, Italy, Canada, Australia, India, UK, France, Japan.]
Correspondence Analysis
(Topics vs Final Decision)
[Figure: correspondence analysis map of topics and final decisions; axes labeled r1=0.280 and r2=0.184. Plotted labels: Regular, Short, Reject; Applications, Collaborative Filtering, DM Methods, Quality-assessment, Soft-computing, Preprocessing/Feature Selection, Security/privacy, Statistics and probability, High-performance, Post-processing.]
Correspondence Analysis

Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.

Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications
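As an illustration of how a country-by-decision map of this kind can be computed, here is a minimal sketch of correspondence analysis via the singular value decomposition. It uses a textbook formulation, which is not necessarily what the tool behind the slides did. It takes only six rows of the country table, with Reject derived as submissions minus Regular minus Short, so it does not reproduce the r1 and r2 values on the slides. The function name `correspondence_analysis` is illustrative.

```python
import numpy as np

# Six rows from the country table; Reject = submissions - Regular - Short.
countries = ["USA", "China", "UK", "Japan", "Canada", "Germany"]
regular = np.array([20, 3, 1, 0, 3, 4])
short = np.array([28, 4, 6, 5, 3, 5])
total = np.array([109, 55, 39, 28, 25, 15])
counts = np.column_stack([regular, short, total - regular - short]).astype(float)


def correspondence_analysis(table):
    """Return row coordinates, column coordinates, and singular values."""
    P = table / table.sum()      # correspondence matrix
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows (countries) and columns (decisions).
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sing


row_coords, col_coords, sing = correspondence_analysis(counts)
for name, (x, y) in zip(countries, row_coords[:, :2]):
    print(f"{name:8s} {x:+.3f} {y:+.3f}")
```

Countries whose points lie near a decision's point in the first two dimensions have a decision profile tilted toward that decision, which is how lines such as "Regular: Germany, USA" are read off the map.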
Rule Mining
on ICDM Submission Data

Datasets
– Sample Size: 445
– Attributes: 5
• Paper No. : ordered by submission date
• # of Authors
• # of Characters in Title
• Country
• Category
– Analyzed by Clementine 7.1 (and SPSS12.0J)
Rule Mining (C5.0)
on ICDM Submission Data

C5.0
– [Topic=Mining semi-structured data,…] & [129 < Paper No. <= 369]
  => Reject (Confidence 0.87, Support 10)
– [Country=USA] & [Topic=Mining semi-structured data,…] & [Paper No. > 369] & [# of Authors <= 3]
  => Accept (Confidence 0.667, Support 3)
– [Topic=Preprocessing/Feature Selection] & [# of Authors > 4]
  => Accept (Confidence 1.0, Support 3)
– Topic, Paper No, # of Authors : Important Features
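To make the confidence and support figures concrete, here is a minimal sketch of how a rule of this shape can be scored against a set of records. It assumes that support is the number of records matching the rule's conditions and that confidence is the fraction of those records with the stated decision. The slides do not define either term, and the tools used may count them differently. The records below are hypothetical and are not the ICDM submission data.

```python
# Hypothetical records for illustration only (NOT the ICDM submission data).
records = [
    {"topic": "Preprocessing/Feature Selection", "authors": 5, "decision": "Accept"},
    {"topic": "Preprocessing/Feature Selection", "authors": 6, "decision": "Accept"},
    {"topic": "Preprocessing/Feature Selection", "authors": 2, "decision": "Reject"},
    {"topic": "Data mining applications", "authors": 5, "decision": "Reject"},
]


def evaluate_rule(records, condition, decision):
    """Return (support, confidence) of the rule 'condition => decision'.

    Assumption: support is the number of records satisfying the condition,
    and confidence is the fraction of those records with the given decision.
    """
    matched = [rec for rec in records if condition(rec)]
    if not matched:
        return 0, 0.0
    hits = sum(1 for rec in matched if rec["decision"] == decision)
    return len(matched), hits / len(matched)


def many_author_preprocessing(rec):
    # Same shape as [Topic=Preprocessing/Feature Selection] & [# of Authors > 4].
    return rec["topic"] == "Preprocessing/Feature Selection" and rec["authors"] > 4


support, confidence = evaluate_rule(records, many_author_preprocessing, "Accept")
print(f"support={support}, confidence={confidence:.3f}")
```

With a support of only a few records, as in two of the three rules above, a single additional paper can change the confidence substantially, so such rules are better read as descriptive than predictive.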
Rule Mining (GRI)
on ICDM Submission Data

Generalized Rule Induction
– [# of Authors < 2] & [Paper No. < 120.5]
  => Rejected (Confidence 96.0%, Support 24)
– [# of Chars in Title < 27] & [Paper No. > 212]
  => Accepted (Confidence 100%, Support 5)

Paper No., # of Chars in Title, # of Authors: Important Features
Multidimensional Scaling
(2004)
[Figure: two-dimensional multidimensional scaling map of the variables Country, Decision, Paper No., Review Score, Topics, # of Authors, and # of Chars in Title.]
Summary (2004) of Mining
on ICDM Submission Data

Do not submit a paper too hastily!
– Reflect not only on the contents but also on the title.
Mining Text/Web/Semi-structured Data is very popular.
The number of application papers is growing (but many were rejected).
Strong Topics
– Preprocessing/Feature-Selection
– Postprocessing
– Security and Privacy

Several topics are emerging in ICDM2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment
Comparison between 02-04
Review Scores: Box-plot
[Figure: box plots of review score (vertical axis, 0.00 to 5.00) by year (2002, 2003, 2004); two individual points are labeled 1,176 and 1,169.]
Comparison between 02-04
Countries

Country    Ratio (2002)   Country    Ratio (2003)   Country    Ratio (2004)
Hong Kong  64.7%          Israel     55.0%          Germany    60.0%
USA        47.9%          Hong Kong  50.0%          USA        44.0%
Canada     45.5%          Japan      37.0%          Finland    33.0%
Finland    33.3%          USA        33.0%          Hong Kong  33.0%
France     33.3%          Germany    32.0%          Italy      30.0%

(Ratio = Acceptance Ratio)
Comparison between 02-04
Topics

Top 5 in 2002  Ratio   Top 5 in 2003               Ratio   Top 5 in 2004                     Ratio
Graph Mining   75.0%   Process-centric DM          80.0%   Quality Assessment                50.0%
Temporal Data  52.6%   Security, privacy           57.0%   Preprocessing, Feature Selection  40.0%
Theory         42.9%   Statistics and Probability  47.0%   Complexity/Scalability            36.4%
Text Mining    42.1%   Visual Data Mining          38.0%   DM and ML Methods                 35.8%
Rule           41.7%   Postprocessing              41.7%   Collaborative Filtering           28.6%
                                                           Post-processing                   28.6%

(Ratio = Acceptance Ratio)
Multidimensional Scaling
(2003 and 2004)
The topological structure w.r.t. similarities does not seem to have changed between 2003 and 2004.
[Figure: two multidimensional scaling maps, one for 2004 and one for 2003, each showing the variables Country, Decision, Paper No., Review Score, Topics, # of Authors, and # of Chars in Title.]
Data Mining
on ICDM Submission Data

Acknowledgements
– Many thanks to
• PC chairs, Vice Chairs and PC members
• All the authors
• All the contributors to ICDM2004
– See you again at ICDM2005!
Multidimensional Scaling
(2003)
[Figure: two-dimensional multidimensional scaling map of the variables Country, Decision, Paper No., Review Score, Topics, # of Authors, and # of Chars in Title.]