```Personalizing Information
Retrieval in CRISs with
Fuzzy Sets and Rough Sets
Chris Cornelis (2)
Helga Naessens (1)
1. University College Ghent, 2. Ghent University (Belgium)
05/06/2007
CRIS 2008
Overview
- Problems in CRISs
- Fuzzy sets and rough sets
- PAS project
Problems in CRISs
Fuzzy
Term = Term
Rough
Fuzzy sets and rough sets
approach: crisp sets
Young people = {x ∈ People | 0 < age(x) < 27}
Fuzzy sets and rough sets
- Fuzzy approach: fuzzy sets
Young(x) = 1                      if age(x) ≤ 20
           0                      if age(x) ≥ 30
           (30 – age(x)) / 10     otherwise
Fuzzy sets and rough sets

Rough approach: rough sets

Upper approximation (R↑A)
A = {Numerical Analysis}
R↑A = {Num. Analysis, Ex. Sciences, Statistics, ... , Coding Theory}
B = {Compilers}
R↑B = {Compilers, Programming, GCC, YACC}
Fuzzy rough sets

Fuzzy approach on rough sets
- Fuzzy set A
- Fuzzy relation R
- R(x,y)
- Upper approximation
- (R↑A)(y) = sup_{x∈X} min(R(x,y), A(x))
Fuzzy rough sets: application

Query expansion
- Allows more results by using R↑A

R            Programming  Hardware  C++   Java  Laptop  Algorithm
Programming  1.0                    0.8   0.8           0.6
Hardware                  1.0                   0.4
C++          0.8                    1.0   0.7           0.2
Java         0.8                    0.7   1.0           0.2
Laptop                    0.4                   1.0
Algorithm    0.6                    0.2   0.2           1.0

- Query: “Programming”
- Expanded query: {(“Programming”,1.0), (“C++”,0.8), (“Java”,0.8),
(“Algorithm”,0.6)}
PAS-project

What is the PAS-project?
- Goal: to draw the researcher’s attention to funding possibilities that match his/her profile
- Information about researchers, projects, and funding possibilities (grants etc.) → matching/collaboration
- Automation and intelligence
PAS – How does it work?
Profile fields:
-Name
-Staff number
-Department(s)
-Group
-Last update of the profile
-Percentage research time
-Skills description
-Diplomas
-Publications
-IWETO-keywords
-Free keywords
[Figure: profile diagram; labels: Fill in, User, IWETO Thesaurus, HoGent Thesaurus]
PAS – How does it work?
Message fields:
-Reference
-Title
-Content
-Attachment(s)
-Level
-Duration
-Institution
-Contact person
-IWETO-keywords
-Free keywords
[Figure: message diagram; labels: Messages, IWETO Thesaurus, HoGent Thesaurus]
PAS – How does it work?
[Figure: IWETO classification, levels 1, 2, 3]
The IWETO-classification has 641 research fields:
5 at the 1st level, 31 at the 2nd level, 605 at the 3rd level
PAS – How does it work?
[Figure: classification levels 1, 2, 3 with the values 0.6, 0.7, 0.8]
By adding “free keywords” we can refine the classification
PAS – How does it work?
Query:
A = {k3}
Expanded query:
R↑A = {(k1,0.8), (k3,1.0), …}
M1 → R2
PAS – How does it work?
[Figure: diagram with the values 0.6, 0.7, 0.7, 0.8]
PAS – Current implementation
- Prototype that will be used as a skeleton for the final system
- Basic algorithm using weights and their products, and basic fuzzy rough query expansion [1]
- Basic profiles and messages
- Manual processing of feedback and manual data extraction from text files

[1] P. Srinivasan, M. E. Ruiz, D. H. Kraft, J. Chen: Vocabulary mining for information retrieval: rough sets and fuzzy sets, Information Processing and Management, 37(1) (2001) 15-38
PAS – Future work
- Richer representation of profiles and messages
- Automation of the feedback mechanism
- Dealing with imprecision and words from different thesauri
- Dealing with ambiguity and incomplete profiles
- Tracking research activities for collaboration
- Automatic extraction of information from text files
- Search engine
Thank you
```
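
The fuzzy membership function for "Young" and the fuzzy rough query expansion from the slides can be sketched in a few lines of Python. This is only an illustration, not the PAS implementation: the function and variable names (`young`, `relation`, `upper_approximation`) are hypothetical, the relation values are read from the slide's table, empty table cells are assumed to mean a degree of 0, and the upper approximation is taken as the supremum over x of min(R(x,y), A(x)).

```python
from typing import Dict

FuzzySet = Dict[str, float]  # term -> membership degree in [0, 1]


def young(age: float) -> float:
    """Membership degree of a person of the given age in the fuzzy set Young."""
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    return (30 - age) / 10


TERMS = ["Programming", "Hardware", "C++", "Java", "Laptop", "Algorithm"]

# Fuzzy relation R from the query expansion slide.
# Assumption: pairs missing here (empty cells on the slide) have degree 0.
R: Dict[str, Dict[str, float]] = {
    "Programming": {"Programming": 1.0, "C++": 0.8, "Java": 0.8, "Algorithm": 0.6},
    "Hardware": {"Hardware": 1.0, "Laptop": 0.4},
    "C++": {"Programming": 0.8, "C++": 1.0, "Java": 0.7, "Algorithm": 0.2},
    "Java": {"Programming": 0.8, "C++": 0.7, "Java": 1.0, "Algorithm": 0.2},
    "Laptop": {"Hardware": 0.4, "Laptop": 1.0},
    "Algorithm": {"Programming": 0.6, "C++": 0.2, "Java": 0.2, "Algorithm": 1.0},
}


def relation(x: str, y: str) -> float:
    """Degree to which terms x and y are related (0 if not listed)."""
    return R.get(x, {}).get(y, 0.0)


def upper_approximation(a: FuzzySet) -> FuzzySet:
    """Compute (R↑A)(y) = max over x of min(R(x, y), A(x)) for every term y.

    Terms with degree 0 are left out of the result.
    """
    expanded: FuzzySet = {}
    for y in TERMS:
        degree = max(min(relation(x, y), a.get(x, 0.0)) for x in TERMS)
        if degree > 0:
            expanded[y] = degree
    return expanded


query = {"Programming": 1.0}
print(upper_approximation(query))
# {'Programming': 1.0, 'C++': 0.8, 'Java': 0.8, 'Algorithm': 0.6}
```

With the query "Programming", the sketch reproduces the expanded query shown on the slide; "Hardware" and "Laptop" drop out because they are unrelated to "Programming" under the assumed zero entries.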