Introduction to Data Mining
Dr. Hany Saleeb
Why Data Mining? Potential Applications
• Direct Marketing
– Identify which prospects should be included in a mailing list
• Market Segmentation
– Identify common characteristics of customers who buy the same products
• Market Basket Analysis
– Identify which products are likely to be bought together
• Insurance Claims Analysis
– Discover patterns of fraudulent transactions
– Compare current transactions against those patterns
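Market basket analysis starts from counting how often items occur together. The following is a minimal sketch, assuming each basket is a list of item names; the function name `count_pairs` and the toy baskets are hypothetical and for illustration only.

```python
from collections import Counter
from itertools import combinations

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of distinct items."""
    counts = Counter()
    for basket in baskets:
        # set() drops repeated items; sorted() gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical transactions
baskets = [
    ["pizza", "beer", "chips"],
    ["pizza", "beer"],
    ["bread", "milk"],
    ["pizza", "soda"],
]
pairs = count_pairs(baskets)
print(pairs.most_common(1))  # [(('beer', 'pizza'), 2)]
```

Counting every pair is fine for small catalogs; real association miners prune pairs whose items are individually rare before counting.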
What Is Data Mining?
• A combination of AI and statistical analysis to discover information that is “hidden” in the data:
– Associations (e.g., linking the purchase of pizza with beer)
– Sequences (e.g., tying events together: marriage and purchase of furniture)
– Classifications (e.g., recognizing patterns such as the attributes of employees who are most likely to quit)
– Forecasting (e.g., predicting the buying habits of customers based on past patterns)
Expert systems or small ML/statistical programs
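A sequence pattern such as "marriage, then furniture" can be measured by counting the customers whose event history contains the first event followed later by the second. This is a simplified sketch, assuming each history is already sorted by time; `precedes_count` and the histories are hypothetical.

```python
def precedes_count(histories, first, then):
    """Count histories in which `first` occurs and `then` occurs at some later position."""
    count = 0
    for events in histories:
        # Look for `then` only after the first occurrence of `first`
        if first in events and then in events[events.index(first) + 1:]:
            count += 1
    return count

# Hypothetical event histories, each in chronological order
histories = {
    "c1": ["marriage", "furniture"],
    "c2": ["furniture", "marriage"],
    "c3": ["marriage", "car", "furniture"],
    "c4": ["car"],
}
support = precedes_count(histories.values(), "marriage", "furniture")
print(support, "of", len(histories))  # 2 of 4
```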
What can data mining do?
• Classification
– Classify credit applicants as low, medium, or high risk
– Classify insurance claims as normal or suspicious
• Estimation
– Estimate the probability of a direct mailing response
– Estimate the lifetime value of a customer
• Prediction
– Predict which customers will leave within six months
– Predict the size of the balance that will be transferred by a credit card prospect
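As one illustration of classification, the credit-risk example can be sketched with k-nearest neighbours, which is only one of many possible classifiers; the slides do not name a method. The features (debt-to-income ratio, missed payments) and the labelled applicants are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled applicants: (debt-to-income ratio, missed payments last year) -> risk
training = [
    ((0.10, 0), "low"),
    ((0.15, 1), "low"),
    ((0.35, 2), "medium"),
    ((0.40, 3), "medium"),
    ((0.70, 6), "high"),
    ((0.80, 8), "high"),
]

def classify(applicant, k=3):
    """Label an applicant by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((0.12, 0)))  # low
print(classify((0.75, 7)))  # high
```

With unscaled features the missed-payments count dominates the distance; in practice features are usually normalized first.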
What can data mining do? (cont’d)
• Association
– Find out which items customers are likely to buy together
– Find out which books to recommend to Amazon.com users
• Clustering
– Difference from classification: the classes are not known in advance
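Because clustering has no predefined classes, the algorithm must find the groups itself. The following is a minimal sketch using k-means, one common clustering algorithm (the slides do not prescribe one), on hypothetical two-dimensional customer data.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers described by two numeric attributes
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The number of clusters `k` must be chosen by the analyst; no labels are used at any point.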
Market Analysis and Management
• Where are the data sources for analysis?
– Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
– Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
• Determine customer purchasing patterns over time
– Conversion of a single to a joint bank account: marriage, etc.
• Cross-market analysis
– Associations/correlations between product sales
– Prediction based on the association information
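Prediction from association information is commonly expressed with support, confidence, and lift for a rule such as pizza → beer. This sketch assumes a single-item antecedent and consequent; `rule_stats` and the baskets are hypothetical.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    sets = [set(b) for b in baskets]
    n_a = sum(antecedent in s for s in sets)
    n_c = sum(consequent in s for s in sets)
    n_ac = sum(antecedent in s and consequent in s for s in sets)
    support = n_ac / n                              # share of baskets containing both
    confidence = n_ac / n_a if n_a else 0.0         # estimate of P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0   # > 1 means positive association
    return support, confidence, lift

# Hypothetical transactions
baskets = [
    ["pizza", "beer"],
    ["pizza", "beer", "chips"],
    ["pizza", "soda"],
    ["beer", "chips"],
    ["bread", "milk"],
]
support, confidence, lift = rule_stats(baskets, "pizza", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# support=0.40 confidence=0.67 lift=1.11
```

The confidence value is what drives the prediction: a customer buying pizza is estimated to buy beer about two times in three on this toy data.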
Data Mining: Confluence of Multiple Disciplines
Figure: Data Mining at the center, drawing on Database Technology, Machine Learning, Information Science, Statistics, Visualization, and Other Disciplines
Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW
Data Mining Process
Figure: process diagram with the stages Learning, Collecting relevant data, Model building, Understanding of business, Problem identification, Business strategy and evaluation, and Action
Requirements/challenges in Data Mining
• User interface
• Mining methodology
• Performance
• Data source
• Social and security
Requirements/challenges in Data Mining (2)
User interface
• Data visualization
– Understandability and interpretation of results
– Information representation and rendering
– Screen real estate
• Interactivity
– Manipulation of mined knowledge
– Focus and refine mining tasks
– Focus and refine mining results
Requirements/challenges in Data Mining (3)
Mining methodology
• Mining different kinds of knowledge in databases
• Interactive mining of knowledge at multiple levels of abstraction
• Incorporation of background knowledge
• Query languages
• Expression and visualization of results
• Handling noise and incomplete data
• Pattern evaluation
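One simple way to handle incomplete data is to fill missing values before mining. This sketch uses median imputation, chosen here because the median is less sensitive than the mean to a noisy extreme value; `impute_median` and the records are hypothetical.

```python
from statistics import median

def impute_median(rows, column):
    """Return copies of `rows` with None in `column` replaced by the median of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical customer records with one missing and one suspicious value
customers = [
    {"id": 1, "income": 40},
    {"id": 2, "income": None},
    {"id": 3, "income": 55},
    {"id": 4, "income": 1000},  # possibly noise
]
cleaned = impute_median(customers, "income")
print(cleaned[1])  # {'id': 2, 'income': 55}
```

The mean of the observed values would be 365, pulled up by the suspicious record; the median stays at 55.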
Requirements/challenges in Data Mining (4)
Performance
• Efficiency and scalability of data mining algorithms
• Algorithms that scale linearly with data size are needed
• Parallel and distributed methods
• Incremental methods
• Divide and conquer?
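Item counting illustrates how divide and conquer and incremental methods fit together: each partition is counted independently, the partial counts are merged, and new transactions update the totals without rescanning old data. This is a sequential sketch; `count_items` and `partitioned_counts` are hypothetical names.

```python
from collections import Counter

def count_items(partition):
    """Count how many transactions in the partition contain each item."""
    counts = Counter()
    for basket in partition:
        counts.update(set(basket))  # each item counted at most once per basket
    return counts

def partitioned_counts(baskets, n_parts):
    """Split the data, count each part independently, then merge the partial counts."""
    parts = [baskets[i::n_parts] for i in range(n_parts)]
    total = Counter()
    for partial in map(count_items, parts):
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

# Hypothetical transactions
baskets = [["pizza", "beer"], ["pizza"], ["beer", "chips"], ["pizza", "chips"]]
totals = partitioned_counts(baskets, n_parts=2)

# Incremental step: fold in a new batch without recounting the old data
totals.update(count_items([["beer", "soda"]]))
print(totals)
```

Since the partitions are independent, the built-in `map` could be replaced with `concurrent.futures.ProcessPoolExecutor().map` to count them in parallel.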
Requirements/challenges in Data Mining (5)
Data source
• Diversity of data types
– Handling complex types of data
– Mining information from heterogeneous databases or information repositories
– Can we expect a DM algorithm to do well on all types of data?
• Data glut
– Are we collecting the right data for the right answer?
– Distinguish between important and unimportant data
Requirements/challenges in Data Mining (6)
Social and security
• Social impact
– Private and sensitive data is gathered and mined without individuals’ knowledge and/or consent
– Appropriate use and distribution of discovered knowledge
• Regulations
– Need for privacy and DM policies
Data Mining Tools
Summary
The benefits of knowing one’s business are critical; technologies are coming together to support data mining.
Data mining is the process and result of knowledge production, knowledge discovery, and knowledge management.