Introduction to
Data Mining
Dr. Hany Saleeb
Why Data Mining? —
Potential Applications
 Direct Marketing
 identify which prospects should be included in a mailing list
 Market segmentation
 identify common characteristics of customers who buy same products
 Market Basket Analysis
 identify what products are likely to be bought together
 Insurance Claims Analysis
 discover patterns of fraudulent transactions
 compare current transactions against those patterns
What Is Data Mining?
 Combination of AI and statistical analysis to discover
information that is “hidden” in the data
 associations (e.g. linking purchase of pizza with beer)
 sequences (e.g. tying events together: marriage and purchase of
 classifications (e.g. recognizing patterns such as the attributes of
employees that are most likely to quit)
 forecasting (e.g. predicting buying habits of customers based on
past patterns)
Expert systems or small ML/statistical programs
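The association idea above (pizza bought with beer) can be made concrete with two standard measures, support and confidence. The sketch below is only an illustration: the transactions are invented toy data, and the helper names `pair_support` and `confidence` are hypothetical, not taken from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets (invented for illustration).
transactions = [
    {"pizza", "beer", "chips"},
    {"pizza", "beer"},
    {"pizza", "salad"},
    {"beer", "chips"},
    {"pizza", "beer", "salad"},
]

def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

support = pair_support(transactions)
print(support[("beer", "pizza")])                 # support of {beer, pizza}
print(confidence(transactions, "pizza", "beer"))  # confidence of pizza -> beer
```

In this toy data, pizza and beer appear together in 3 of 5 baskets, and 3 of the 4 pizza baskets also contain beer.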
What can data mining do?
 Classification
– Classify credit applicants as low, medium, high risk
– Classify insurance claims as normal, suspicious
 Estimation
– Estimate the probability of a direct mailing response
– Estimate the lifetime value of a customer
 Prediction
– Predict which customers will leave within six months
– Predict the size of the balance that will be transferred by a
credit card prospect
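As a rough illustration of the classification task above (credit applicants as low, medium, or high risk), here is a minimal k-nearest-neighbour sketch. The slides do not name a method; k-NN is chosen only because it is short. The two features (income in $1000s, debt-to-income ratio), the training records, and the helper names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled applicants: (income in $1000s, debt-to-income ratio) -> risk class.
training = [
    ((90, 0.10), "low"),
    ((85, 0.15), "low"),
    ((80, 0.20), "low"),
    ((55, 0.35), "medium"),
    ((50, 0.40), "medium"),
    ((60, 0.30), "medium"),
    ((25, 0.70), "high"),
    ((30, 0.65), "high"),
    ((20, 0.80), "high"),
]

def scale(point):
    """Put income on roughly the same 0..1 scale as the ratio."""
    income, ratio = point
    return (income / 100.0, ratio)

def classify(point, training, k=3):
    """Majority vote among the k training records closest to `point`."""
    query = scale(point)
    neighbors = sorted(
        training, key=lambda item: math.dist(query, scale(item[0]))
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(classify((88, 0.12), training))
print(classify((28, 0.75), training))
```

The key point is that the class labels are known in advance and the model learns to assign new records to them.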
What can data mining do?
 Association
– Find out which items customers are likely to buy together
– Find out what books to recommend to users
 Clustering
– Difference from classification: classes are unknown!
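To show what "classes are unknown" means in practice, the following is a minimal k-means sketch. k-means is one common clustering method, not one the slides prescribe. The customer points (visits per month, average spend) and the starting centroids are invented, and `assign` and `kmeans` are hypothetical helper names.

```python
import math

def assign(points, centroids):
    """Group each point under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical customers: (visits per month, average spend). No labels are given.
points = [(1, 10), (2, 12), (1, 14), (9, 80), (10, 85), (8, 90)]
centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge from the data, and it is left to the analyst to interpret them (for example as occasional and frequent shoppers).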
Market Analysis and
 Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount coupons,
customer complaint calls, plus (public) lifestyle studies
 Target marketing
Find clusters of “model” customers who share the same
characteristics: interest, income level, spending habits, etc.
 Determine customer purchasing patterns over time
Conversion of a single to a joint bank account: marriage, etc.
 Cross-market analysis
Associations/correlations between product sales
Prediction based on the association information
Data Mining: Confluence of
Multiple Disciplines
[Figure: Data Mining at the confluence of multiple disciplines]
Data Mining: On What
Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
Data Mining Process
Collecting relevant data
Model building
Understanding of business
Problem identification
Business strategy
and evaluation
in Data Mining
User interface
Mining methodology
Data source
Social and Security
in Data Mining (2)
User interface
- Data Visualization
Understandability and interpretation of results
Information representation and rendering
Screen real-estate
- Interactivity
Manipulation of mined knowledge
Focus and refine mining tasks
Focus and refine mining results
in Data Mining (3)
Mining Methodology
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels
of abstraction
Incorporation of background knowledge
Query languages
Expression and visualization of results
Handling noise and incomplete data
Pattern evaluation
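One item in the list above, handling noise and incomplete data, can be illustrated with a very small cleaning step. This is only a sketch of one simple strategy (median fill plus clipping to a plausible range). The `clean` helper, the age values, and the 0 to 120 bounds are assumptions made for the example.

```python
from statistics import median

def clean(values, lower, upper):
    """Fill missing entries (None) with the median of the observed values,
    then clip every value into the range [lower, upper]."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [min(max(fill if v is None else v, lower), upper) for v in values]

# Hypothetical customer ages with two missing entries and one obvious typo (410).
ages = [34, None, 29, 410, 41, None, 38]
print(clean(ages, lower=0, upper=120))
```

The median is used rather than the mean so that the noisy value 410 does not distort the fill value.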
in Data Mining (4)
Efficiency and scalability of data mining algorithms
Linear algorithms needed
Parallel and distributed methods
Incremental methods
Divide and conquer?
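As one small illustration of an incremental method, the sketch below keeps a running mean and variance that are updated one record at a time (Welford's update), so earlier records never have to be re-read. The class name `RunningStats` and the purchase amounts are hypothetical. Real incremental mining algorithms update richer models, but the principle is the same.

```python
class RunningStats:
    """Running mean and population variance, updated in O(1) per new value."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

# Hypothetical stream of purchase amounts arriving one at a time.
stats = RunningStats()
for amount in [20.0, 35.0, 50.0, 15.0]:
    stats.update(amount)

print(stats.mean, stats.variance)
```

Each update costs constant time and memory, which is the property the slide asks for when data volumes grow.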
in Data Mining (5)
Data Source
Diversity of data types
Handling complex types of data
Mining information from heterogeneous databases
or information repositories
Can we expect a DM algorithm to do well on all
types of data?
Data glut
Are we collecting the right data for the right answer?
Distinguish between important and unimportant data
in Data Mining (6)
Social and Security
- Social Impact
Private and sensitive data is gathered and mined
without the individual’s knowledge and/or consent
Appropriate use and distribution of discovered
- Regulations
Need for privacy and DM policies
Data Mining Tools
Knowing one’s business is critical;
technologies are coming together
to support data mining.
Data mining is the process and result of
knowledge production, knowledge
discovery and knowledge management.

Mining Frequent Patterns Without Candidate Generation