Balancing Lexicographic and Ontological Considerations in Ontology Development
First International Workshop on Ontological Analysis
Trento, IT
16-20 July, 2012
Amanda Hicks, University at Buffalo
Ontologies vs. Wordnets
Wordnets represent
• how we use language
– the word ‘cat’ in context
Ontologies represent
• what it is to be a cat
– e.g., whether being a cat is a rigid property
Overview of Some Ontologies
3 Layers of Ontologies
• Upper: most abstract
• Middle: intermediately abstract
• Domain: specific to a domain or application
Domain Ontologies
• are often developed by domain experts.
• model highly specific, technical information.
• are often intended for use by a particular community of researchers, technicians, etc.
• Examples:
– Gene Ontology
– KYOTO domain ontology
– Protein Ontology
Middle Ontologies
• are developed by ontologists or other
information technologists.
• model concepts of an intermediate level of abstraction that are often part of everyday spoken and written vocabulary.
• connect upper-level ontologies with domain ontologies.
• Examples:
– KYOTO Middle
– Information Artifact Ontology
Upper Ontologies
• are developed by ontologists
• model highly abstract concepts
– endurant vs. perdurant
– quality vs. substance
• Because the axioms at this level will be
inherited all the way down, we need to
be really careful here!
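The warning above can be made concrete with a minimal sketch, assuming a toy single-inheritance hierarchy and representing axioms as plain strings. All class names and axioms here are hypothetical, not taken from BFO, DOLCE, or KYOTO. Whatever is asserted on the upper-level class reaches every class beneath it, so a mistake there propagates into the whole domain layer.

```python
# Hypothetical three-layer hierarchy: class -> parent (None marks the root).
PARENT = {
    "endurant": None,               # upper layer
    "physical-object": "endurant",  # middle layer
    "organism": "physical-object",  # domain layer
    "frog": "organism",             # domain layer
}

# Hypothetical axioms asserted locally on each class.
LOCAL_AXIOMS = {
    "endurant": {"wholly present at any time it exists"},
    "physical-object": {"has a spatial location"},
    "organism": {"is alive at some time"},
    "frog": set(),
}

def inherited_axioms(cls):
    """Collect the axioms asserted on cls and on all of its ancestors."""
    axioms = set()
    while cls is not None:
        axioms |= LOCAL_AXIOMS.get(cls, set())
        cls = PARENT[cls]
    return axioms

# The upper-layer axiom reaches the most specific domain class.
print(sorted(inherited_axioms("frog")))
```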
Upper Ontologies: Some Examples
BFO - http://www.ifomis.org/bfo
SUMO - http://www.ontologyportal.org
DOLCE - http://www.loa.istc.cnr.it/DOLCE.html
BFO
• is a relatively shallow top ontology
– 36 classes
– 6 layers deep
• The BFO consortium coordinates many
biomedical domain ontologies, users,
and developers.
DOLCE
• DOLCE-Lite
– 37 classes
– depth of 6
• DOLCE-Lite Plus
– 208 classes
– depth of 13
The KYOTO Project
• EU Seventh Framework Programme (FP7) project, 2007-2010
• facilitates data mining and sharing from
texts in the domain of ecology across
seven languages
• WWF & ECNC are domain users
• www.kyoto-project.eu
The KYOTO Ontology
• Three layers: Top, Middle, Domain
• Seven wordnets mapped to the KYOTO Ontology to facilitate data extraction and management
– English
– Spanish
– Basque
– Italian
– Dutch
– Japanese
– Chinese
The KYOTO Ontology
• KYOTO 3: three layers, Top, Middle, Domain
• Wordnets mapped to the KYOTO Ontology to facilitate data extraction and sound inference
– English
– Spanish
– Basque
– Italian
– Dutch
– Japanese
– Chinese
Use Protégé 4.0 or earlier: KYOTO is not written in OWL 2.
KYOTO Top
Based on DOLCE-Lite Plus
• In DOLCE-Lite Plus (DLP), qualities are modeled according to the kinds of entities that bear them.
– e.g., size is a physical quality since it inheres in a physical
object
• KYOTO Top extends the physical-quality hierarchy
– amount-of-matter-quality
– feature-quality
– physical-object-quality
• Added quality types
– dispositional
– relational
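Placing a quality by the kind of entity that bears it can be sketched as a walk up the bearer's category hierarchy until a matching quality branch is found. This assumes a simplified fragment of the DOLCE categories; the dictionaries and the function `quality_parent` are hypothetical illustrations, not KYOTO's actual encoding.

```python
# Simplified, hypothetical fragment of the bearer categories: child -> parent.
BEARER_PARENT = {
    "organism": "physical-object",
    "physical-object": "physical-endurant",
    "amount-of-matter": "physical-endurant",
    "feature": "physical-endurant",
    "physical-endurant": None,
}

# Quality branch associated with each bearer category (hypothetical mapping).
QUALITY_BRANCH = {
    "amount-of-matter": "amount-of-matter-quality",
    "feature": "feature-quality",
    "physical-object": "physical-object-quality",
    "physical-endurant": "physical-quality",
}

def quality_parent(bearer_category):
    """Return the most specific quality branch for a given kind of bearer."""
    node = bearer_category
    while node is not None:
        if node in QUALITY_BRANCH:
            return QUALITY_BRANCH[node]
        node = BEARER_PARENT[node]
    raise ValueError(f"no quality branch for {bearer_category!r}")

print(quality_parent("organism"))          # physical-object-quality
print(quality_parent("amount-of-matter"))  # amount-of-matter-quality
```

A quality of an organism thus lands under physical-object-quality, because an organism is itself a kind of physical object.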
KYOTO Top
KYOTO Top extends the role hierarchy.
Roles are arranged according to the kind of
entity that bears that role.
– A physical-object-role is played by a physical
object.
– In the Domain layer, offspring is a subclass of organism-role, since organisms are the kinds of things that play the role of offspring.
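Arranging roles by the kind of entity that plays them suggests a simple consistency check: a role placed under a parent role should only be played by entities of a kind the parent role admits. The sketch below assumes hypothetical dictionaries for the player categories and the role hierarchy; `misplaced_roles` is an illustrative name, not part of KYOTO.

```python
# Hypothetical category hierarchy for role players: child -> parent.
CATEGORY_PARENT = {
    "organism": "physical-object",
    "physical-object": "endurant",
    "amount-of-matter": "endurant",
    "endurant": None,
}

# Hypothetical role hierarchy: role -> (parent role, category of its players).
ROLES = {
    "physical-object-role": (None, "physical-object"),
    "organism-role": ("physical-object-role", "organism"),
    "offspring": ("organism-role", "organism"),
}

def is_subcategory(sub, sup):
    """True if category sub equals sup or lies below it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = CATEGORY_PARENT[sub]
    return False

def misplaced_roles(roles):
    """Roles whose players are not covered by the parent role's players."""
    bad = []
    for role, (parent, player) in roles.items():
        if parent is not None and not is_subcategory(player, roles[parent][1]):
            bad.append(role)
    return bad

print(misplaced_roles(ROLES))  # []
```

Adding, say, a role played by an amount of matter under organism-role would be reported as misplaced.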
KYOTO Middle
Includes:
• Base Concepts (BCs) from WordNet
– nouns
• Units of measurement (e.g., for length) and other qualities
• 72 new perdurants (processes and states)
• 123 new endurant terms (objects and substances)
• qualities that model adjectives
Base Concepts in KYOTO
Synsets from WordNet 3.0 (Fellbaum, 1998)
– for each path from leaf to root: the first node with at least 50 hyponyms
– roughly: a cheap (and inadequate?) computational model of basic level concepts
– CAREFUL: the set depends on the structure and coverage of WordNet, which is idiosyncratic
• cake
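The selection heuristic can be sketched as follows, assuming a tree-shaped toy hierarchy and counting all (direct and indirect) hyponyms. The toy synsets and the lowered threshold of 3 are hypothetical, and this is a simplification, not the exact procedure of Izquierdo et al. (2007).

```python
from collections import defaultdict

# Hypothetical toy hypernym tree: synset -> hypernym (the root has no entry).
PARENT = {
    "food": "entity",
    "baked_goods": "food",
    "cake": "baked_goods",
    "bread": "baked_goods",
    "cookie": "baked_goods",
    "fruit": "food",
    "apple": "fruit",
    "animal": "entity",
    "cat": "animal",
}

def hyponym_counts(parent):
    """Number of direct and indirect hyponyms for every synset."""
    children = defaultdict(list)
    for child, hyper in parent.items():
        children[hyper].append(child)
    counts = {}

    def count(node):
        if node not in counts:
            counts[node] = sum(1 + count(c) for c in children[node])
        return counts[node]

    for node in set(parent) | set(parent.values()):
        count(node)
    return counts, children

def base_concepts(parent, threshold=50):
    """For each leaf-to-root path, keep the first synset with >= threshold hyponyms."""
    counts, children = hyponym_counts(parent)
    leaves = [n for n in parent if not children[n]]
    selected = set()
    for node in leaves:
        while node is not None:
            if counts[node] >= threshold:
                selected.add(node)
                break
            node = parent.get(node)
    return selected

print(sorted(base_concepts(PARENT, threshold=3)))  # ['baked_goods', 'entity', 'food']
```

With only one animal in the toy tree, the path from cat climbs all the way to entity; a differently shaped hierarchy would yield a different set, which is exactly the caution above.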
Base Concepts in KYOTO
BCs facilitate mapping wordnets onto the
ontology in KYOTO.
• WordNet is mapped onto the ontology
via BCs.
• BC equivalents in other languages are
indirectly mapped onto the ontology via
mappings to WordNet’s BCs.
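The two-step mapping can be sketched as a composition of lookups: English BCs map directly to ontology classes, and synsets in other wordnets reach the ontology through their English equivalents. All identifiers below are hypothetical placeholders, not real KYOTO or wordnet IDs.

```python
# Hypothetical identifiers; real KYOTO synset and class IDs differ.
# Step 1: English WordNet Base Concepts mapped directly onto ontology classes.
BC_TO_CLASS = {
    "eng:food.n.01": "kyoto:food",
    "eng:water.n.01": "kyoto:water",
}

# Step 2: synsets in other wordnets linked to their English equivalents.
EQUIVALENT_ENGLISH = {
    "spa:alimento.n.01": "eng:food.n.01",
    "ita:cibo.n.01": "eng:food.n.01",
    "spa:agua.n.01": "eng:water.n.01",
}

def map_to_ontology(synset_id):
    """Map a synset onto the ontology, directly or via its English BC."""
    if synset_id in BC_TO_CLASS:
        return BC_TO_CLASS[synset_id]
    english = EQUIVALENT_ENGLISH.get(synset_id)
    return BC_TO_CLASS.get(english) if english else None

print(map_to_ontology("ita:cibo.n.01"))  # kyoto:food
```

A synset with no English equivalent simply stays unmapped (`None`), which makes gaps in the cross-lingual links easy to spot.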
Base Concepts in KYOTO
• 297 BCs from the noun hierarchy and
• 578 BCs from the verb hierarchy
– these still need work and are placed in the Domain layer
– group names such as verb_change still appear, though they are not ontological (Izquierdo et al., 2007).
Sample BCs in KYOTO’s Middle Layer
• unit-of-measurement
• number
• color
• change
• book
• message
• food
BCs and KYOTO
In this case, the lexicon in conjunction
with considerations of the application
informed the population of the Middle
and Domain layers of KYOTO.
KYOTO Domain
KYOTO Domain
Sample concepts from user scenarios
– fish family
– coast
– soil
– water
– breed
– biodiversity
The Lexicon & The Ontology
is-a
“is a”
The Problem
“Is-a” is ambiguous between instance-of (relating an individual to a class) and subclass-of (relating a class to a class). This can lead to confusion.
For example, species terms can be confused.
Kermit is-a_i Leptopelis vermiculatus.
Leptopelis vermiculatus is-a_c species.
Therefore, Kermit is a species.
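The fallacy can be reproduced in a few lines, assuming the species-level statement is read as instantiation. Treating "is-a" as one transitive relation licenses "Kermit is a species"; keeping instance_of (not transitive) apart from subclass_of (transitive) blocks it. The extra classes frog and organism are illustrative additions.

```python
# One undifferentiated "is-a" relation (the ambiguous reading).
NAIVE_IS_A = {
    ("Kermit", "Leptopelis vermiculatus"),
    ("Leptopelis vermiculatus", "species"),
}

def transitive_closure(pairs):
    """Close a binary relation under transitivity."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

print(("Kermit", "species") in transitive_closure(NAIVE_IS_A))  # True

# Separate relations: instance_of is not transitive, subclass_of is.
INSTANCE_OF = {
    ("Kermit", "Leptopelis vermiculatus"),
    ("Leptopelis vermiculatus", "species"),
}
SUBCLASS_OF = {
    ("Leptopelis vermiculatus", "frog"),
    ("frog", "organism"),
}

def types_of(individual):
    """Direct classes of an individual plus all their superclasses."""
    direct = {c for (i, c) in INSTANCE_OF if i == individual}
    supers = {d for (c, d) in transitive_closure(SUBCLASS_OF) if c in direct}
    return direct | supers

print(sorted(types_of("Kermit")))
```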
“is-a”
The Rule
The Rule: Every property of a class belongs to
every instance of that class.
Check that every inherited property actually holds.
A species is composed of many organisms that can successfully produce fertile offspring.
Is Kermit composed of many organisms that can successfully produce fertile offspring?
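The rule amounts to a subset test: a candidate can be an instance of a class only if it has every property of that class. This minimal sketch assumes properties are recorded as strings and treats an unrecorded property as failing (a closed-world assumption); the facts and the function `failed_properties` are hypothetical.

```python
# Hypothetical properties that every instance of "species" would have to have.
SPECIES_PROPERTIES = {
    "is composed of many organisms",
    "members can produce fertile offspring",
}

# Hypothetical facts recorded about two candidate instances.
FACTS = {
    "Leptopelis vermiculatus (group)": {
        "is composed of many organisms",
        "members can produce fertile offspring",
    },
    "Kermit": {"is a single organism"},
}

def failed_properties(candidate, class_properties):
    """Class properties the candidate lacks; any failure rules out instance-of."""
    return class_properties - FACTS.get(candidate, set())

print(sorted(failed_properties("Kermit", SPECIES_PROPERTIES)))
print(failed_properties("Leptopelis vermiculatus (group)", SPECIES_PROPERTIES))  # set()
```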
“is a”
KYOTO’s Solution
Model species terms twice!
• Species in the sense of a group are modeled as physical pluralities. In this sense, Leptopelis vermiculatus is an instance, NOT a subclass.
• ‘Leptopelis vermiculatus’ can also refer to a class, namely a type of organism.
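A minimal sketch of the two senses, assuming separate identifiers for the group and for the organism class; the identifiers and the `MEMBER_OF` relation linking Kermit to the group are hypothetical, not KYOTO's vocabulary. Kermit instantiates the class sense and belongs to the group sense, so he never inherits physical-plurality.

```python
# Hypothetical identifiers for the two senses of 'Leptopelis vermiculatus'.
GROUP = "lv-group"                   # sense 1: the group, an individual
ORGANISM_CLASS = "lv-organism-type"  # sense 2: a class of organisms

# instance_of relates individuals to classes; it is not transitive.
INSTANCE_OF = {
    (GROUP, "physical-plurality"),
    ("Kermit", ORGANISM_CLASS),
}
# subclass_of relates classes to classes; it is transitive.
SUBCLASS_OF = {(ORGANISM_CLASS, "organism")}
# Hypothetical relation linking an organism to the group it belongs to.
MEMBER_OF = {("Kermit", GROUP)}

def superclasses(cls):
    """All ancestors of cls under the transitive subclass_of relation."""
    found, frontier = set(), {cls}
    while frontier:
        step = {sup for (sub, sup) in SUBCLASS_OF if sub in frontier} - found
        found |= step
        frontier = step
    return found

def classes_of(individual):
    """Classes the individual instantiates, directly or by inheritance."""
    direct = {c for (i, c) in INSTANCE_OF if i == individual}
    return direct.union(*(superclasses(c) for c in direct))

print(sorted(classes_of("Kermit")))  # ['lv-organism-type', 'organism']
print(sorted(classes_of(GROUP)))     # ['physical-plurality']
```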
Rigid & Non-Rigid Terms
Rigidity
The Problem
In ontologies and WordNet the subsumption relations
are determined according to different criteria.
• WordNet
– Hypernymy
– Based on psycholinguistic data: native speakers agree on word use.
• Ontology
– Subclass
– Based on the extension of a term: every x is a y.
Transitivity of Subsumption
BE CAREFUL! WordNet’s hypernymy can lead to unsound inferences.
Conclusion (with cat subsumed by pet): if every pet has an owner, then every cat has an owner, which does not hold for stray cats.
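The unsound step is ordinary inheritance along subsumption. In this minimal sketch the parent links and the axiom are hypothetical: placing cat under pet makes every cat inherit "has an owner", while placing cat under animal (with pet handled as a role elsewhere) does not.

```python
# Lexicon-driven placement: cat under pet (hypothetical links).
LEXICAL_PARENT = {"cat": "pet", "pet": "animal"}
# Cleaner placement: cat is a type under animal; pet is modeled as a role (not shown).
CLEAN_PARENT = {"cat": "animal"}

# Hypothetical axiom attached to pet.
AXIOMS = {"pet": {"has an owner"}}

def inherited(cls, parent):
    """Axioms a class inherits by walking up its subsumption chain."""
    result = set()
    while cls is not None:
        result |= AXIOMS.get(cls, set())
        cls = parent.get(cls)
    return result

print(inherited("cat", LEXICAL_PARENT))  # {'has an owner'}  (unsound for stray cats)
print(inherited("cat", CLEAN_PARENT))    # set()
```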
Rigidity
KYOTO’s Solution
• Distinguish rigid and non-rigid terms in
the wordnet.
– This distinction comes from OntoClean
(Guarino and Welty, 2004)
• Distinguish between roles and types in
the ontology.
• Map synsets to the ontology using
different mapping relations.
Rigidity
• “Cat” is a rigid concept.
• “Pet” is a non-rigid concept.
• A concept is rigid if it is essential to all of
its instances.
– Permanence: Fluffy is always a cat, not
always a pet
– Necessity: Fluffy cannot stop being a cat,
Fluffy can stop being a pet.
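Rigidity is a modal notion, so data cannot establish it; still, the permanence half of the test can be sketched as a heuristic over recorded snapshots, where a single counterexample shows a concept is not rigid. The snapshot history and the function `looks_rigid` are hypothetical.

```python
# Hypothetical snapshots of which concepts apply to which individuals over time.
HISTORY = {
    "Fluffy": [
        {"cat"},         # kitten at the shelter
        {"cat", "pet"},  # adopted
        {"cat"},         # runs away and becomes a stray
    ],
    "Tom": [{"cat", "pet"}, {"cat", "pet"}],
}

def looks_rigid(concept, history):
    """False if some individual has the concept at one time but not at another.

    Only a permanence heuristic over observed snapshots: it can refute
    rigidity, but it cannot establish necessity.
    """
    for snapshots in history.values():
        present = [concept in snap for snap in snapshots]
        if any(present) and not all(present):
            return False
    return True

print(looks_rigid("cat", HISTORY))  # True
print(looks_rigid("pet", HISTORY))  # False
```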
The Rule of Thumb
(See Giancarlo’s slides for a more nuanced view.)
Non-rigid terms should not
subsume rigid terms.
or
Roles should not subsume types.
A Jumbled Hierarchy
amount of matter
-R drug
+R antibiotic
+R chemical compound
+R oil
-R nutriment (a source of material to nourish the body)
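The rule of thumb can be checked mechanically once each node carries a rigidity marker. The flattened slide no longer shows the exact nesting, so this sketch assumes, for illustration, that drug subsumes antibiotic and that amount of matter (unmarked on the slide) is rigid; `rule_of_thumb_violations` is a hypothetical name.

```python
# Rigidity markers from the slide: True for +R (rigid), False for -R (non-rigid).
# "amount of matter" is unmarked on the slide and assumed rigid here.
RIGID = {
    "amount of matter": True,
    "drug": False,
    "antibiotic": True,
    "chemical compound": True,
    "oil": True,
    "nutriment": False,
}

# Assumed parent links for illustration (the original nesting is not recoverable).
PARENT = {
    "drug": "amount of matter",
    "antibiotic": "drug",
    "chemical compound": "amount of matter",
    "oil": "amount of matter",
    "nutriment": "amount of matter",
}

def rule_of_thumb_violations(parent, rigid):
    """(child, ancestor) pairs where a non-rigid ancestor subsumes a rigid child."""
    violations = []
    for child in parent:
        if not rigid[child]:
            continue
        ancestor = parent.get(child)
        while ancestor is not None:
            if not rigid[ancestor]:
                violations.append((child, ancestor))
            ancestor = parent.get(ancestor)
    return violations

print(rule_of_thumb_violations(PARENT, RIGID))  # [('antibiotic', 'drug')]
```

Each reported pair is a candidate for the kind of split into a type hierarchy and a role hierarchy that KYOTO applies.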
Clean Hierarchies
amount-of-matter
+R antibiotic
+R chemical compound
+R oil
substance-role (role played-by some amount-of-matter)
-R drug
-R nutriment
Mapping Synsets
Adjectives
Adjectives
General Strategy in KYOTO
Qualities are easily modeled according to the kinds of
entities in which they inhere.
For example, amounts of matter are the kinds of things
that have pH levels.
Adjectives
General Strategy in KYOTO
The values for specific qualities like pH levels are
located in regions.
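A minimal sketch of a pH quality inhering in an amount of matter, with its value located in a qualitative region. It assumes the conventional boundary of pH 7 as neutral (at 25 °C); the region names, the `PhQuality` class, and `q_located_in` are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhQuality:
    """A pH quality inhering in an amount of matter (the bearer)."""
    bearer: str   # hypothetical identifier for an amount of matter
    value: float  # measured pH

def q_located_in(quality: PhQuality) -> str:
    """Return the (hypothetical) region in which a pH value is located."""
    if quality.value < 7:
        return "acidic-ph-region"
    if quality.value > 7:
        return "basic-ph-region"
    return "neutral-ph-region"

sample = PhQuality(bearer="lake-water-sample-1", value=5.6)
print(q_located_in(sample))  # acidic-ph-region
```

An adjective such as "acidic" can then be read as picking out a region rather than a separate kind of quality.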
Adjectives
The Problem
pH-levels are easy because
• they are measurable, i.e., they have objective criteria.
• they are confined to one kind of entity,
namely, amounts of matter.
Adjectives
The Problem
How should we model concepts like
“beneficial” or “important”?
• Subjective component
• Not necessarily “out there” in the world
• Not typically quantifiable
• Criteria are context-dependent
• Many kinds of entities can be beneficial
or important.
Adjectives
KYOTO’s Solution
The Middle layer has a region, evaluative-region, to accommodate adjectives like ‘beneficial’ or ‘worthless’.
Adjectives
KYOTO’s Solution
Concepts like “beneficial” and “important”
are
• not in the domain specific layer since
they are general concepts.
• not in the upper layer since they are
“subjective”.
• not in a strictly realist ontology like BFO.
• modeled orthogonally to “real” qualities
Adjectives
KYOTO’s solution
What kind of restriction can you write for length?
A length can be long (an indefinite quality) or 2 m (a definite quality):
length q-located-in (length-measurement-unit or indefinite-quality-region)
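The disjunctive restriction can be sketched as a validator that accepts either a definite measurement with a length unit or a label from an indefinite region. The unit list, the region labels, the `Measurement` class, and `satisfies_length_restriction` are all hypothetical, not KYOTO's actual vocabulary.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Measurement:
    """A definite quality value, e.g. 2 m."""
    amount: float
    unit: str

# Hypothetical accepted length units and indefinite length regions.
LENGTH_UNITS = {"mm", "cm", "m", "km"}
INDEFINITE_LENGTH_REGIONS = {"long", "short"}

def satisfies_length_restriction(value: Union[Measurement, str]) -> bool:
    """Accept a definite length measurement or an indefinite length region."""
    if isinstance(value, Measurement):
        return value.unit in LENGTH_UNITS and value.amount >= 0
    return value in INDEFINITE_LENGTH_REGIONS

print(satisfies_length_restriction(Measurement(2.0, "m")))   # True
print(satisfies_length_restriction("long"))                  # True
print(satisfies_length_restriction(Measurement(2.0, "kg")))  # False
```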
In Conclusion
• Procurement - BCs influenced the concepts
included in the KYOTO ontology.
• Hierarchy - subsumption relations must be carefully distinguished to avoid influence from the lexicon that might lead to unsound inferences.
• Qualities - lexicalized adjectives that may not have a realist counterpart need to be modeled in an orthogonal way.
Bibliography
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press.
Guarino, N., and Welty, C. (2004). An Overview of OntoClean. In S. Staab and R. Studer (eds.), Handbook on Ontologies, pp. 151-172.
Herold, A., Hicks, A., Rigau, G., and Laparra, E. (2009). KYOTO Deliverable D6.2: Central Ontology Version 1. www.kyoto-project.eu.
Hicks, A., and Rigau, G. (2010). KYOTO Deliverable D8.3: Domain Extension of the Central Ontology. www.kyoto-project.eu.
Izquierdo, R., Suárez, A., and Rigau, G. (2007). Exploring the automatic selection of basic level concepts. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP'07), Borovetz, Bulgaria.
Masolo, C., Borgo, S., Gangemi, A., Guarino, N., Oltramari, A., and Schneider, L. (2002). WonderWeb Deliverable D17: The WonderWeb Library of Foundational Ontologies and the DOLCE Ontology.
Smith, B. (2004). Beyond Concepts: Ontology as Reality Representation. In Proceedings of FOIS 2004, International Conference on Formal Ontology and Information Systems.
Vossen, P., et al. (2008). KYOTO: A System for Mining, Structuring and Distributing Knowledge Across Languages and Cultures. In Proceedings of LREC 2008, Marrakech, Morocco, May 28-30, 2008.