Linguistic enrichment of ontologies: a glance at
the role of previously existing linguistic
resources
Maria Teresa Pazienza, Armando Stellato
{pazienza,[email protected]
ART group, Dept. of Computer Science, Systems and Production
Motivation
Ontologies provide vocabularies through which agents in
the Semantic Web will be able to communicate
– Every specific ontology bears its semantics, which is specified
by:
• the interpretation given by people using the ontology inside a given
framework
• the consistent use that applications make of ontological knowledge
How can we recognize if and when these constraints are
considered? Or, at least…
03/10/2015
Role of Natural Language
What information can both humans and machines rely on? …natural language
• Natural Language is the last exploitable resource
– …to convey data semantics
• It helps humans in understanding how formal objects relate to their world knowledge
• It may help machines in harmonizing different conceptualizations
– Pros and cons:
• Pros: it offers a rich and universally accepted means of expressing meaning
• Cons: it is ambiguous; phenomena like synonymy and homonymy must be taken into
consideration
• Possible exploitations of a linguistically motivated approach to ontology
development:
– Provides useful linguistic anchors for improving knowledge sharing efforts
– Strengthens relationships between ontology and raw textual information (for tasks
like information extraction, ontology population etc…)
– Enhances knowledge understanding and reuse even for humans
Enriching ontologies with
lexical information
• Possible scenarios for linguistic enrichment:
– Explicit Linguistic Enrichment
[Diagram: an Ontology linked to a Linguistic Resource]
– Producing Multilingual Ontologies
[Diagram: an Ontology linked to a Bilingual Linguistic Resource]
– LexicoSemantic Enrichment of Ontologies
[Diagram: an Ontology linked to a Linguistic Resource with a semantic structure (e.g. WordNet); node labels shown: Worker, Craftsman, event, Academic, Employee, Researcher, Technician, Professor, Administrative, …]
Exploiting Linguistic Resources
• Different Linguistic Resources (LRs) are available on the
Web
• These resources differ in:
– Trustworthiness: from free initiatives to coordinated research
projects
– Complexity: quantity and quality of detailed information, adopted
model, morphology…
– Representation: there is no standard for the representation of linguistic
resources
– Implementation: available as databases, huge XML repositories,
proprietary text formats, etc.
Tools for Linguistic Enrichment:
Requirements
• (possibly) embedded in ontology editing applications
• Browsing different linguistic resources
• Providing functionalities for:
– Querying LRs with terms from ontology
– Enriching ontology concepts with linguistic information:
• Synonyms
• Rich textual descriptions
• Translations in different languages
• Semantic indexes from the LR
– Supporting ontology development by reusing semantic
information from linguistic resources (when available)
Infrastructure
The Linguistic Watermark
– Offers a classification of different LRs
– Provides an API for accessing their content
Examples of LRs shown with the Linguistic Watermark:
– WordNet ( http://wordnet.princeton.edu/ )
– Freelang ( http://www.freelang.net/ )
– Dict ( http://www.dict.org/bin/Dict )
OntoLing: a tool for semi-automatic
linguistic enrichment of ontologies
• Deployed as a plug-in for the popular ontology editing tool Protégé
( http://protege.stanford.edu/ then go to plugins -> OntoLing )
• Exploits the Linguistic Watermark API for accessing LRs
• Supports linguistic enrichment of ontologies and ontology development
Architecture (diagram):
– OntoLing is built on top of the Protégé API
– Components: Linguistic Browser, Ontology Browser, GUI Facade, Ontoling Core
– Linguistic Interface <<interface>>, with implementations:
• Wordnet Interface <<Implementation>>: Wordnet 1.7, Wordnet 2.0, Wordnet 2.1, …
• FreeDict Interface <<Implementation>>: Italian, Hungarian, English, Danish, …
• …
– Different resources may be plugged in and recognized at run time, by inspection of their
Linguistic Watermark
– Search linguistic expressions inside the LR
– Explore semantic relationships which characterize the LR, and linguistic relationships
– Integration between ontology and linguistic resource: search ontology terms inside the
linguistic resource
– Linguistic Enrichment of the Ontology: ontology concepts bear a greater linguistic
expressivity; this helps in identifying similarities with other conceptualizations
– Linguistic metadata for concepts: synonyms, documentation, semantic pointers to the LR
– Assist ontology creation by extracting portions of knowledge from the LR
Adaptive behaviour and
Graphic User Interface
• Linguistic Resources may be loaded into OntoLing at run time
• Upon initialization they declare themselves and their specific Linguistic
Watermark
• OntoLing understands their capabilities and rearranges its Linguistic
Browser according to properties and characteristics exhibited by the LR
• Different functionalities for enriching ontologies with content from the loaded
LR are also activated depending on its watermark
• Support to semiautomatic enrichment also takes into consideration which ki
Dynamic Functionalities
• The Linguistic Watermark provides a generic interface
which embraces typical LR configurations and structures
• Three methods act as service providers, in that they
allow the definition of functionalities dedicated to the
exploration of particular aspects of a given LR
– exploitSearchMethod
– exploreSemanticRelation
– exploreLexicalRelation
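As a rough illustration of this idea, the sketch below models a linguistic interface in Python whose three service methods carry the names listed above. The signatures, the `ToyThesaurus` class and its data are assumptions made for the example; they do not reproduce the actual Linguistic Watermark API.

```python
from abc import ABC, abstractmethod
from typing import List


class LinguisticInterface(ABC):
    """Hypothetical generic interface; only the three method names come from the slide."""

    @abstractmethod
    def exploitSearchMethod(self, method: str, term: str) -> List[str]:
        """Run the named search strategy and return matching entry ids."""

    @abstractmethod
    def exploreSemanticRelation(self, relation: str, entry_id: str) -> List[str]:
        """Follow a semantic relation (e.g. hypernymy) from an entry."""

    @abstractmethod
    def exploreLexicalRelation(self, relation: str, word: str) -> List[str]:
        """Follow a lexical relation (e.g. synonymy) from a word."""


class ToyThesaurus(LinguisticInterface):
    """Tiny in-memory resource, invented for illustration."""

    def __init__(self) -> None:
        self.entries = {"n1": ["professor", "prof"], "n2": ["academic", "academician"]}
        self.hypernyms = {"n1": ["n2"]}

    def exploitSearchMethod(self, method, term):
        if method == "exact":
            return [eid for eid, words in self.entries.items() if term in words]
        if method == "prefix":
            return [eid for eid, words in self.entries.items()
                    if any(w.startswith(term) for w in words)]
        raise ValueError(f"unsupported search method: {method}")

    def exploreSemanticRelation(self, relation, entry_id):
        if relation == "hypernym":
            return self.hypernyms.get(entry_id, [])
        return []

    def exploreLexicalRelation(self, relation, word):
        if relation == "synonym":
            return [w for words in self.entries.values() if word in words
                    for w in words if w != word]
        return []


lr = ToyThesaurus()
hits = lr.exploitSearchMethod("exact", "professor")
```

A client such as a browser panel could call the three methods with whatever method and relation names a loaded resource declares, without knowing the concrete class behind the interface.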
Representing Linguistic Information
inside Ontologies
• Standard Protégé Model
– Use of meta-classes
• Linguistic-class
• Linguistic-slot
– A terminology slot (one for each language) for indicating synonyms
– Frame Documentation Slot
• Protégé-OWL
– Use of standard rdfs properties:
• rdfs:label to indicate synonyms (also specifying the language)
• rdfs:comment to provide documentation about ontology objects
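A minimal sketch of the Protégé-OWL option, assuming a hypothetical concept IRI and invented terms: it only builds the triples as text (valid as N-Triples or Turtle lines) and does not use any Protégé or OntoLing API.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def turtle_literal(text, lang=None):
    """Render a string literal, optionally with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def linguistic_annotations(subject_iri, labels, comment=None, comment_lang=None):
    """Return triples attaching rdfs:label / rdfs:comment to subject_iri.

    labels maps a language code to the list of synonyms in that language.
    """
    lines = []
    for lang in sorted(labels):
        for term in labels[lang]:
            lines.append(f"<{subject_iri}> <{RDFS}label> {turtle_literal(term, lang)} .")
    if comment is not None:
        lines.append(
            f"<{subject_iri}> <{RDFS}comment> {turtle_literal(comment, comment_lang)} ."
        )
    return lines


# Hypothetical concept IRI and terms, invented for this example.
triples = linguistic_annotations(
    "http://example.org/univ#Professor",
    {"en": ["professor"], "it": ["professore", "docente"]},
    comment="A member of the teaching staff.",
    comment_lang="en",
)
```

Each synonym becomes one language-tagged rdfs:label, so several labels in the same language can coexist on one concept.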
Summarizing
• Attention paid to formal conceptual representation in the Semantic Web is not being
matched by an equivalent interest in how this information will be made easily
accessible to humans, and to machines not sharing any form of semantic
commitment.
• A wider and more deeply aware adoption of Natural Language in representing knowledge
could fill this gap
• We developed infrastructures and a tool for:
– describing different kinds of LRs through a general framework
– providing functionalities for accessing their content
– enriching ontologies with information from LRs
– supporting a “linguistically aware ontology development”
• Future work:
– Integrate as many lexical resources as possible!
– Include interfaces for accessing and exploiting other kinds of linguistic resources (e.g.
Framenet)
– Establish more complex connections between lexical resources and ontologies
Automatic Lexico-Semantic
Enrichment (LSE) of Ontologies
• Objective:
– identify pointers (lexico-semantic anchors) from ontological objects to
semantic entities (e.g. synsets, for WordNet) of a linguistic resource
• Through:
– Observed linguistic/semantic similarities between the ontology and the
Linguistic Resource (LR) exploited for enrichment
• Exploitable Linguistic Watermarks:
– ConceptualizedLR
– At least one from:
• TaxonomicalLR
• LRWithGlosses
Intuition behind the strategy:
if a semantic pointer links a frame-synset pair <F,S>,
then other frame-synset pairs (where the frame is more specific/more
generic than F and the synset is narrower/broader than S) have a good
probability of being linked through a semantic pointer
Automatic LSE of Ontologies:
the Framework
• O: space of ontological objects, called Frames (classes, properties,
individuals)
• L: space of semantic indexes (semex) in the LR
• Plausibility Matrix MP (defined over an O×L space)
– MP(i,j) represents the plausibility that the ontological object i be matched with
the semantic index j
• Evidence Matrix ME (defined over an O×L space)
– each element ME(i,j) contains the set of evidences which contribute to the
computation of element MP(i,j) in the Plausibility Matrix.
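The two matrices can be pictured as sparse maps keyed by <frame, semex> pairs. The sketch below is only an illustration: the frame names, semex ids, evidence kinds and numbers are invented, and a dictionary is an assumed storage choice rather than the one used by the authors.

```python
from collections import defaultdict

# Hypothetical frames (O) and semantic indexes (L), invented for the example.
frames = ["Team", "League"]
semexes = ["noun.1", "noun.2", "noun.3"]

# MP: plausibility of each <frame, semex> pair, kept sparse (missing pairs read as 0.0).
MP = defaultdict(float)
# ME: for each pair, the evidences that feed the computation of MP.
ME = defaultdict(list)


def add_evidence(frame, semex, kind, weight):
    """Record one evidence for the pair <frame, semex>."""
    ME[(frame, semex)].append((kind, weight))


MP[("Team", "noun.1")] = 0.5
add_evidence("Team", "noun.1", "taxonomy", 0.3)
add_evidence("Team", "noun.1", "gloss", 0.1)
```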
• Discovery Phase
– Objective: reduce the dimension of the L space
– Process: find candidate (lexical) anchors between elements in O
and elements in L, through:
• Search filtered by string similarity measures
• Exploitation of translation and/or synonym vocabularies (possibly
the LR itself)
– Output:
• $L_A \subseteq L$ (all synsets bound by candidate anchors)
– Notes:
• Maximize recall
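A possible shape for the discovery phase, sketched under assumptions: the similarity measure (Python's `difflib` ratio), the threshold value and the toy lexicon are all choices made for the example, not the measures used in the original work.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1] (difflib ratio, chosen for the example)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_anchors(frame_labels, lexicon, threshold=0.8):
    """Return candidate anchors {frame: set of semex ids} and the reduced space L_A.

    frame_labels: {frame name: [labels]}
    lexicon: {semex id: [words]}
    """
    anchors = {}
    for frame, labels in frame_labels.items():
        hits = {
            semex
            for semex, words in lexicon.items()
            if any(similarity(l, w) >= threshold for l in labels for w in words)
        }
        if hits:
            anchors[frame] = hits
    reduced = set().union(*anchors.values()) if anchors else set()
    return anchors, reduced


# Toy data, invented for illustration.
lexicon = {
    "n.1": ["inning", "frame"],
    "n.2": ["run", "tally"],
    "v.1": ["run"],
    "n.3": ["umbrella"],
}
frame_labels = {"Inning": ["inning"], "Run": ["run", "runs"]}
anchors, L_A = discover_anchors(frame_labels, lexicon)
```

Since this phase aims at recall, a permissive threshold is the safer choice: wrong candidates can still be discarded later, while a missed synset never re-enters L_A.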
Semantic Enrichment function:
$$f_{se} : O \times L_A \rightarrow [0..1]$$
Implemented through:
– Extraction of semantic/linguistic similarity evidences → ME
– Computation of MP
Due to mutual dependencies between evidences for different candidate anchors:
$$f_{se} = f_{se}(t)$$
and:
$$M_P(t) = f\left(M_E,\ M_P(t-1),\ M_P(0)\right)$$
Legend:
– candidate pair: <f, s> (<frame, semex>)
with: $f \in O$; $s \in L_A$
where: $p(f,s,0) \neq 0$
– Smarter notation for plausibility:
$$p(f,s,t) \stackrel{\text{def}}{=} M_P(f,s) \quad \text{with } M_P = M_P(t)$$
Implementing fse
• Guidelines
1. reward candidate pairs characterized by positive
evidences
2. penalize candidate pairs characterized by negative
evidences
3. evaluate quantitative factors associated with different
kinds of evidence (representing the strength, or
presence, of the evidence)
4. take into account the inherent ambiguity (polysemy) of
every label associated with ontology concepts
Plausibility at time t:
$$p(t) = \frac{p_0 + \left[1 - \prod_{i=1}^{n}\bigl(1 - \sigma(i,t)\bigr)\right]\,(1 - p_0)}{1 + \left[1 - \prod_{i=1}^{m}\bigl(1 - \sigma(i,t)\bigr)\right]\left(\frac{1}{p_0} - 1\right)}$$
Legend:
– $p(t)$: plausibility at time t
– $p_0$: plausibility at time = 0
– $\sigma(i,t)$: weight related to the single evidence i at time t
– Numerator: positive evidences contribution (product over the n positive evidences)
– Denominator: negative evidences contribution (product over the m negative evidences)
– Normalization factor
Other quantities annotated on the slides:
– Plausibility threshold for an anchor to be confirmed
– Plausibility threshold for an anchor to be discarded
– Ambiguity (polysemy) of the term binding synset to frame
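The following sketch computes one reading of the plausibility formula on this slide. It assumes that the n positive evidences are combined in the numerator and the m negative ones in the denominator, each through a product of the form 1 − ∏(1 − σ); the function names and the sample weights are invented for the example.

```python
from math import prod


def noisy_or(weights):
    """1 - prod(1 - w): grows towards 1 as evidences accumulate."""
    return 1.0 - prod(1.0 - w for w in weights)


def plausibility(p0, positive, negative):
    """Plausibility at time t from p0 and the evidence weights at time t.

    Assumed reading of the slide's formula:
      numerator   = p0 + noisy_or(positive) * (1 - p0)
      denominator = 1 + noisy_or(negative) * (1/p0 - 1)
    """
    if not 0.0 < p0 <= 1.0:
        raise ValueError("p0 must be in (0, 1]")
    numerator = p0 + noisy_or(positive) * (1.0 - p0)
    denominator = 1.0 + noisy_or(negative) * (1.0 / p0 - 1.0)
    return numerator / denominator


no_evidence = plausibility(0.25, [], [])
one_positive = plausibility(0.25, [0.5], [])
one_negative = plausibility(0.5, [], [1.0])
```

Under this reading, a pair with no evidence keeps its initial plausibility, positive evidences can push it up to 1, and fully weighted positive and negative evidences cancel out and return p0.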
Extracting evidences (1)
Establishing proper context for each type of frame and for each type of evidence
computeConceptualSphere(Frame frm, int DepthRange) SET OF Frame
input
  frm: the class, property or individual which has been selected for linguistic enrichment
  DepthRange: the number of allowed hops along the IS-A relation for retrieving super concepts of frm
output
  ConceptualSphere: the conceptual sphere surrounding frm
begin
  FrameType type ← getOntoType(frm)
  SET OF Frame ConceptualSphere ← {}
  if (type = class or type = property)
    ConceptualSphere ← ConceptualSphere ∪ getSuperConcepts(frm, DepthRange)
  else //frm is an instance
    Classes ← getClasses(frm)
    for each class ∈ Classes do
      ConceptualSphere ← ConceptualSphere ∪ {class} ∪ getSuperConcepts(class, DepthRange)
    end for
  end if
  if (type = class)
    for each property p, class c | frm.hasRestriction(p,c) or c.hasRestriction(p,frm) do
      ConceptualSphere ← ConceptualSphere ∪ { c } ∪ { p }
    end for
  end if
  if (type = instance)
    for each property p ∈ ( frm.getOwnRelationalProperties() ) do
      ConceptualSphere ← ConceptualSphere ∪ { p } ∪ frm.getOwnPropertyValues(p)
    end for
  end if
  if (type = property)
    for each class c ∈ ( domain(frm) ∪ range(frm) ) do
      ConceptualSphere ← ConceptualSphere ∪ { c }
    end for
  end if
  return ConceptualSphere
end
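The pseudocode above can be tried out on a toy model. The Python sketch below follows its steps, but the `Ontology` class, its field names and the sample data (including the `Organization` class) are assumptions introduced for the example; no real ontology API is used.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Minimal in-memory model; all field names are assumptions for the sketch."""
    kinds: dict                                        # frame -> "class" | "property" | "instance"
    parents: dict = field(default_factory=dict)        # frame -> direct super concepts
    types: dict = field(default_factory=dict)          # instance -> its classes
    restrictions: set = field(default_factory=set)     # (class, property, class) triples
    domains: dict = field(default_factory=dict)        # property -> domain classes
    ranges: dict = field(default_factory=dict)         # property -> range classes
    values: dict = field(default_factory=dict)         # instance -> {property: [values]}

    def super_concepts(self, frame, depth):
        """Super concepts reachable in at most `depth` IS-A hops."""
        found, frontier = set(), {frame}
        for _ in range(depth):
            frontier = {p for f in frontier for p in self.parents.get(f, [])} - found
            found |= frontier
        return found


def compute_conceptual_sphere(onto, frm, depth_range):
    """Collect the frames surrounding frm, depending on its type."""
    kind = onto.kinds[frm]
    sphere = set()
    if kind in ("class", "property"):
        sphere |= onto.super_concepts(frm, depth_range)
    else:  # frm is an instance
        for cls in onto.types.get(frm, []):
            sphere |= {cls} | onto.super_concepts(cls, depth_range)
    if kind == "class":
        # (a, p, b) stands for a.hasRestriction(p, b)
        for a, p, b in onto.restrictions:
            if a == frm:
                sphere |= {b, p}
            elif b == frm:
                sphere |= {a, p}
    elif kind == "instance":
        for p, vals in onto.values.get(frm, {}).items():
            sphere |= {p} | set(vals)
    else:  # property
        sphere |= set(onto.domains.get(frm, [])) | set(onto.ranges.get(frm, []))
    return sphere


# Toy fragment, loosely inspired by a baseball domain; invented for illustration.
onto = Ontology(
    kinds={"League": "class", "Division": "class", "Organization": "class",
           "division": "property"},
    parents={"League": ["Organization"]},
    restrictions={("League", "division", "Division")},
    domains={"division": ["League"]},
    ranges={"division": ["Division"]},
)
sphere = compute_conceptual_sphere(onto, "Division", 1)
```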
Extracting evidences
(2)
Examined evidences
– Analysis of Taxonomical alignment
• ConceptualSphere (context) := the transitive closure of the IS-A
relationship in the ontology (and hyponymy relation for LRs)
• Requirements: TaxonomicalLR compliant Linguistic Resource
– Analysis of glosses from the LR
• ConceptualSphere := depends on frame type (see example in
previous slide)
• Requirements: LRWithGlosses compliant Linguistic Resource
Extracting evidences
(3)
Evidences based on Taxonomical Alignment
Reflect alignment between the respective structures of the ontology and the
linguistic resource exploited for enrichment
Captured taxonomy patterns may have positive as well as negative influence
over the plausibility of a given < frame, semex > pair
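[Figure: two taxonomy-alignment patterns between the ontology (ONT) and the linguistic resource (LR). Each pattern involves frames FH and FL and semexes SH and SL connected by IS-A links, an existing semantic pointer and a pair that is candidate for a semantic pointer. One pattern is labeled Positive Evidence, the other Negative Evidence.]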
Weight of an evidence based on Taxonomical Alignment:
$$\sigma(i,t) = \sigma_{TA} \cdot \operatorname{sgn}\bigl(p(\mathit{frame}, \mathit{semex}, t-1)\bigr)$$
– $\sigma_{TA}$: weighting coefficient for Taxonomy Alignment
– sgn: sign
– $p(\mathit{frame}, \mathit{semex}, t-1)$: plausibility at step t-1 of the frame/semex pair closing the alignment square
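A small sketch of how such weights could be collected for a candidate pair, assuming that an alignment square is closed by a pair made of a direct super concept of the frame and a direct hypernym of the semex (the pattern labeled Positive Evidence). The value of `sigma_ta`, the function names and the data are invented for the example.

```python
def sgn(x):
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)


def alignment_weights(frame, semex, supers, hypernyms, p_prev, sigma_ta=0.3):
    """Weights sigma(i, t) for the alignment squares closed above <frame, semex>.

    supers / hypernyms: direct IS-A parents in the ontology / in the LR.
    p_prev: {(frame, semex): plausibility at step t-1}; missing pairs count as 0.
    """
    weights = []
    for fh in supers.get(frame, []):
        for sh in hypernyms.get(semex, []):
            weights.append(sigma_ta * sgn(p_prev.get((fh, sh), 0.0)))
    return weights


# Toy data, invented for illustration.
supers = {"Professor": ["Academic"]}
hypernyms = {"s.prof": ["s.acad", "s.other"]}
p_prev = {("Academic", "s.acad"): 0.7}
weights = alignment_weights("Professor", "s.prof", supers, hypernyms, p_prev)
```

With plausibilities in [0, 1] the sign is either 0 or 1, so in this sketch a square contributes only while its closing pair still has non-zero plausibility.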
Extracting evidences
(4)
Evidences extracted through Analysis of Glosses
Glosses bear a lot of semantic information; it is not formally explicit but,
once unveiled, it can provide useful hints on how to properly match ontology
concepts and linguistic expressions
Gloss Analysis generates three kinds of evidence, provided by:
• glosses which contain linguistic references to concepts expressed in the ontology and
which are semantically related to the concept being enriched
• glosses which contain linguistic references to concepts which at least exist in the
ontology
• linguistic overlap between glosses of synsets which are candidates to enrich related
concepts
Next slides: examples for enrichment of baseball ontology from:
http://www.daml.org/2001/08/baseball/baseball-ont
Glosses containing linguistic reference to
semantically related concepts
for each Frame rc ∈ ConceptualSphere do
  MtchLvl ← match(rc, gloss)
  if MtchLvl ≠ 0
    Evidences ← Evidences ∪ evd(GR, rc, MtchLvl)
  end if
end for
Example:
– Ontology: frame Division; rdf triple: League division Division; related frame: League
– Linguistic Resource: synset Noun.7741947; Gloss: A league ranked by quality; ”he played baseball in class D…
– Evidence: GlossRelateds,League,prop(class,domain),1
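The loop for this kind of evidence can be sketched in a few lines of Python. The `match` function below is a deliberately naive stand-in (exact word lookup returning 0 or 1); how the original `match` grades its levels is not stated on the slide, and the label table is invented.

```python
import re


def match(frame_labels, gloss):
    """Toy match level: 1 if any label occurs as a word in the gloss, else 0."""
    words = set(re.findall(r"[a-z]+", gloss.lower()))
    return 1 if any(label.lower() in words for label in frame_labels) else 0


def gloss_related_evidences(sphere, labels, gloss):
    """One ('GR', frame, level) evidence per related frame mentioned in the gloss."""
    evidences = []
    for rc in sphere:
        level = match(labels[rc], gloss)
        if level != 0:
            evidences.append(("GR", rc, level))
    return evidences


# Gloss as shown on the slide (truncated there as well).
gloss = 'A league ranked by quality; "he played baseball in class D…'
sphere = ["League", "division"]                               # conceptual sphere of Division
labels = {"League": ["league"], "division": ["division"]}     # hypothetical label table
evidences = gloss_related_evidences(sphere, labels, gloss)
```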
Glosses containing linguistic reference to
concepts which exist in the ontology
for each term t ∈ gloss do
  Frame rc ← find(Ontology, t, MtchLvl)
  if rc ≠ null
    Evidences ← Evidences ∪ evd(GG, rc, MtchLvl)
  end if
end for
Example:
– Ontology: frame Run; Inning ∈ O
– Linguistic Resource: synset Noun.179011; Gloss: A score in baseball made by a runner touching all four bases safely; "the Yankees scored 3 runs in the bottom of the 9th"; "their first tally came in the 3rd inning"
– Evidence: GlossGeneral,Inning,1
Overlap between glosses of synsets which
are candidate to enrich related concepts
for each Frame rf_i ∈ ConceptualSphere do
  for each synset s_ij ∈ candidateSynsets(rf_i) do
    let rfgloss[i,j] ← s_ij.getGloss()
    for each term t, t ∈ gloss and t ∈ rfgloss[i,j]
      let freq = LR.getGlossFrequency(t)
      if !filter(freq)
        Evidences ← Evidences ∪ evd(GO, rf_i, s_ij, freq)
      end if
    end for
  end for
end for
Example:
– Ontology: frame WorldSeries; rdf triple: WorldSeries home Team; related frame: home
– Linguistic Resource:
• Noun.7009602: series that constitutes the playoff for the baseball championship
• Noun.3399133: (baseball) base consisting of a rubber slab where the batter stands
– Evidence: GlossOverlap,baseball,home-noun.3399133,1
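A sketch of the overlap computation on the two glosses of this example. The stopword list, the frequency table, the `max_freq` cut-off and the extra `term` field in the evidence tuple are assumptions made for illustration; the slide only says that frequent terms are filtered.

```python
import re

STOPWORDS = {"a", "an", "the", "for", "of", "that", "where", "in"}  # toy list


def terms(gloss):
    """Lower-cased content words of a gloss."""
    return set(re.findall(r"[a-z]+", gloss.lower())) - STOPWORDS


def gloss_overlap_evidences(gloss, related, gloss_frequency, max_freq=100):
    """Evidence tuples ('GO', frame, synset, term, freq) for shared gloss terms.

    related: {related frame: {candidate synset id: gloss}}
    gloss_frequency: {term: number of LR glosses containing it}
    Terms more frequent than max_freq are filtered out as uninformative.
    """
    own = terms(gloss)
    evidences = []
    for frame, candidates in related.items():
        for synset, other_gloss in candidates.items():
            for t in sorted(own & terms(other_gloss)):
                freq = gloss_frequency.get(t, 0)
                if freq <= max_freq:
                    evidences.append(("GO", frame, synset, t, freq))
    return evidences


world_series_gloss = "series that constitutes the playoff for the baseball championship"
related = {
    "home": {
        "noun.3399133": "(baseball) base consisting of a rubber slab where the batter stands"
    }
}
gloss_frequency = {"baseball": 1}  # invented count
evidences = gloss_overlap_evidences(world_series_gloss, related, gloss_frequency)
```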
Testing our framework
Experimental setup:
Fine tuning of the evidence-typed σ-parameters was performed over a
collection of several small ontologies and/or portions of them
Two ontologies were used for testing, with WordNet used for enrichment in both cases:
1. BASEBALL ontology ( http://www.daml.org/2001/08/baseball/baseball-ont )
– Original version in DAML+OIL, converted to OWL
– 78 classes, 26 properties and 13 individuals
– 75.3% of ambiguous concepts, average ambiguity ~9.16
– Inter-annotator agreement = 98.76% (one contrasting decision out of the whole oracle)
2. MOSES Ontology about university ( http://www.mondeca.com/owl/moses/ita.owl )
– developed in the context of the EU funded project MOSES (IST-2001-37244)
– built, in OWL language, over a pre-existing DAML ontology, and finalized for representing the
Italian university domain
– 192 classes, 122 properties
– 73.1% of ambiguous concepts, average ambiguity ~5.23
Experimental results
Ontology        Precision   Recall
Baseball Ont    80%         39.5%
Moses Italian   81.48%      42.72%
Detailed analysis of the test data on the first experiment revealed that,
though only 40% of the original corpus (ontology) has been correctly
enriched, another 50% contains the right choice as first (but still under
acceptance threshold), second or third in order of plausibility
Conclusions
• Attention paid to formal conceptual representation in the Semantic Web is not being
matched by an equivalent interest in how this information will be made easily
accessible to humans, and to machines not sharing any form of semantic
commitment.
• A wider and more deeply aware adoption of Natural Language in representing knowledge
(or, at least, in supporting knowledge representation) could fill this gap
• We defined a first framework for:
– describing LRs (from an “operational point of view”) and enriching ontologies with their
content
– (semi)automatically enriching the content of ontologies with information from linguistic
resources
• Future work:
– Large scale (ontologies) testing!
– Improving gloss processing (POS tagging, shallow parsing…)
– Development of new techniques for multilingual ontology enrichment (possibly exploiting
more than one LR at a time)
– Embedding all these techniques inside existing frameworks for ontology editing
Thanks for your attention
….
see you in Roma for
Aiia07 congress
http://aiia.info.uniroma2.it