The Harmony of Music and Computing
Expanding a Domain-Specific Database
Jantine Trapman
Overview
• Components
– LT4eL
– Cornetto
• Creation / expansion of Music Ontology
– Automatic Creation
– Watson
– Prompt
• Mapping
– Music Ontology
– Cornetto
Components
• LT4eL
• Cornetto
Components: LT4eL
Language Technology for eLearning
www.lt4el.eu
• Development of search and management
facilities in the LMS:
– Keyword Extractor
– Glossary Candidate Finder
– Semantic Search
Semantic Search
• Based on:
– (multilingual) documents (LOs) for eight
languages
– semantic annotation of LOs
– ontology
– lexicon for each language involved
• Corpus and ontology are restricted to the
Computing domain
Computing Ontology (1)
• Creation:
– Manually annotated keywords in eight languages
extracted from LOs
– Translated into (English) concepts
– Definitions collected on the WWW and added to
concepts
• Extension with additional concepts from:
– Restrictions on existing concepts
– Superconcepts of existing concepts
– Missing subconcepts
– Annotation of LOs
Annotation of LOs
Computing Ontology (2)
[Figure: ontology layers, with DOLCE on top, WordNet (OntoWordNet) in between and the Computing domain ontology below]
• Domain ontology:
– Domain: Computing
– Manually created
– 1406 concepts
• 50 concepts from DOLCE
• 250 intermediate concepts from OntoWordNet
• Use:
– Lexicon development for 8 languages
– Semantic annotation of LOs
– LO indexing
LT4eL lexicons
[Figure: lexicons for German, Polish, Romanian, Maltese, Portuguese, English, Bulgarian, Czech and Dutch, each linked to the Computing ontology part]
Computing Lexicon
• Concepts were translated into all languages
• Each entry contains three types of
information:
– Concept (and superconcept):
CDDrive (is-a Drive)
– Definition:
a drive that reads a compact disc and that
is connected to an audio system
– Set of terms in a given language:
CD-speler, CD drive
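A lexicon entry of this kind maps naturally onto a small record type. The sketch below is a hypothetical Python rendering: the class name, the field names and the `build_term_index` helper are illustrative assumptions, not the LT4eL file format, and it also assumes that both example terms belong to the Dutch lexicon. The reverse index shows how a term from a search query could lead back to its concept.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """Hypothetical record for one LT4eL-style lexicon entry."""
    concept: str        # ontology concept, e.g. "CDDrive"
    superconcept: str   # its is-a parent, e.g. "Drive"
    definition: str
    # language code -> terms that lexicalize the concept in that language
    terms: dict = field(default_factory=dict)


def build_term_index(entries):
    """Map (language, lowercased term) to concept, e.g. for semantic search lookups."""
    index = {}
    for entry in entries:
        for lang, terms in entry.terms.items():
            for term in terms:
                index[(lang, term.lower())] = entry.concept
    return index


cd_drive = LexiconEntry(
    concept="CDDrive",
    superconcept="Drive",
    definition="a drive that reads a compact disc and that is connected to an audio system",
    terms={"nl": ["CD-speler", "CD drive"]},  # assumption: both terms are Dutch
)

index = build_term_index([cd_drive])
print(index[("nl", "cd-speler")])  # CDDrive
```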
Expansion of the LT4eL KB
• Future: more domains needed
• Task:
– Expansion of the ontology and lexicons
– Preferably semi-automatic
• Three options:
– Top-down
– Bottom-up
– Both, ingredients:
• Cornetto, WordNet
• Music ontology
• Watson, Prompt
Cornetto
• Combinatorial and Relational Network as Toolkit for Dutch Language Technology
• Referentie Bestand Nederlands (RBN)
→ lexical units
• Dutch part of EuroWordNet: Dutch WordNet (DWN)
→ synsets
• SUMO/MILO plus extensions
→ terms and axioms
• Core: table of Cornetto Identifiers (CIDs)
[Figure: the Cornetto Database combining Referentie Bestand Nederlands (RBN), Dutch WordNet (DWN), WordNet and SUMO/MILO]
http://www.let.vu.nl/onderzoek/projectsites/cornetto/index.html
Example Lexical Entry Cornetto (1)
[noun] zanger
Sense                                          CID
Iemand die zingt (someone who sings)           c_n-42316
Vogel die zingt (bird that sings)              c_n-42317
(Poëtisch voor) dichter ((poetic for) poet)    c_n-42318
…                                              …
[noun] zanger:1 c_n-42316
• Morphology:
type:derivation; structure:zingen[*er]; plurforms:zangers
• Syntax:
gender:m/f; article:de
• Semantics:
reference:common; countability:count; type:human; subclass:beroepsnaam/beoefenaar; resume:iemand die zingt
• Pragmatics:
domain:muz
Example Lexical Entry Cornetto (2)
• Combinatorics zanger:1:
– De redacteur van het woordenboek was ook een zanger (The editor of the dictionary was also a singer)
– De zanger van de band (The singer of the band)
• SUMO: (+, , hasSkill)
• Synonyms: zanger, zangeres
HAS_HYPERONYM musicus, musicienne, muzikant
HAS_HYPONYM baszanger, sopraan, blueszanger, charmezanger, ...
• Equivalence relations:
EQ_SYNONYM singer, vocalist, vocalizer, vocaliser /ENG20-09908715-n → link with WordNet 2.0
• WordNet Domains: music
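The relational part of such an entry can be held as a list of named relations. In this hedged sketch the dictionary layout, the `Relation` type and the `related` helper are assumptions for illustration, not Cornetto's actual XML schema; the hyponym list is truncated as on the slide.

```python
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    name: str                 # e.g. "HAS_HYPERONYM", "EQ_SYNONYM"
    targets: Tuple[str, ...]  # lemmas or synset IDs


# Hypothetical in-memory rendering of zanger:1 (not Cornetto's file format).
zanger_1 = {
    "lemma": "zanger",
    "sense": 1,
    "synonyms": ("zanger", "zangeres"),
    "relations": [
        Relation("HAS_HYPERONYM", ("musicus", "musicienne", "muzikant")),
        # truncated, as on the slide
        Relation("HAS_HYPONYM", ("baszanger", "sopraan", "blueszanger", "charmezanger")),
        Relation("EQ_SYNONYM", ("ENG20-09908715-n",)),
    ],
}


def related(entry, prefix):
    """Collect the targets of every relation whose name starts with `prefix`."""
    found = []
    for rel in entry["relations"]:
        if rel.name.startswith(prefix):
            found.extend(rel.targets)
    return found


print(related(zanger_1, "EQ_"))            # ['ENG20-09908715-n']
print(related(zanger_1, "HAS_HYPERONYM"))  # ['musicus', 'musicienne', 'muzikant']
```

Using a name prefix such as `EQ_` collects both EQ_SYNONYM and EQ_NEAR_SYNONYM links in one call.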
Goal:
Tasks
– Extract music-related terms from Cornetto
– Create a domain ontology for Music
– Map between terms from lexicon and
concepts in ontology
– Map music ontology to OntoWN and DOLCE
– Adjust Cornetto data to LT4eL format
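The first task could start from the domain labels already present in the entries: the Cornetto pragmatics field (domain:muz) and the WordNet Domains label (music). A minimal sketch, assuming simplified records whose field names are hypothetical; the second zanger sense and its domain value are illustrative, not Cornetto data.

```python
# Hypothetical, simplified sense records; field names are assumptions.
entries = [
    {"form": "zanger", "sense": 1, "domain": "muz", "wn_domains": ["music"]},
    {"form": "zanger", "sense": 2, "domain": None, "wn_domains": ["zoology"]},
    {"form": "strijkkwartet", "sense": 1, "domain": "muz", "wn_domains": []},
]


def music_senses(entries, domain_label="muz", wn_domain="music"):
    """Keep senses tagged with the Cornetto domain label or the WordNet Domains label."""
    return [
        f"{e['form']}:{e['sense']}"
        for e in entries
        if e["domain"] == domain_label or wn_domain in e["wn_domains"]
    ]


print(music_senses(entries))  # ['zanger:1', 'strijkkwartet:1']
```

Filtering on senses rather than word forms keeps zanger:1 (singer) while dropping the hypothetical bird sense.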
Questions (1)
1. How can we automate the process of
ontology building, and to what extent?
2. How can we profit from existing
resources on the Semantic Web to
enrich ontologies?
3. To what extent do Watson and PROMPT
support the reuse of existing resources?
Music Ontology
• Automatic Creation
• Expansion with:
• Watson
• Prompt
Automatic Creation (1)
• (Basili et al. 2007): automatic ontology
extraction from an open-domain corpus (the BNC)
• Designed for three tasks:
1. lexical ambiguity resolution within a specific domain
2. restricting a set of terms to a subset relevant for an
ontology to be constructed
3. expanding this new ontology with other, novel and
relevant concepts, relations and instances.
Automatic Creation (2)
• Preprocessing:
– Corpus split into 40-sentence text segments
– PoS tagging
– Filtering of noun phrases
• General steps:
– Term extraction through Latent Semantic
Analysis (Deerwester et al. 1990)
– Ontology extraction from WordNet based on
Conceptual Density (Agirre and Rigau 1996)
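LSA starts from a term-by-document matrix, here with the 40-sentence segments as documents. The sketch below covers only that preprocessing step, assuming sentences are already tokenized and filtered down to nouns; the singular value decomposition that LSA then applies (e.g. with a numerical library) is left out, and the function names are hypothetical.

```python
from collections import Counter


def split_segments(sentences, size=40):
    """Split a list of sentences into consecutive segments of `size` sentences."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def term_segment_matrix(segments, terms):
    """Rows are terms, columns are segments, cells are raw occurrence counts."""
    seg_counts = [Counter(tok for sent in seg for tok in sent) for seg in segments]
    return [[counts[term] for counts in seg_counts] for term in terms]


# Toy corpus: 41 sentences mentioning "guitar", then one mentioning "drum".
sentences = [["guitar", "string"]] * 41 + [["drum"]]
segments = split_segments(sentences)
matrix = term_segment_matrix(segments, ["guitar", "drum"])
print(len(segments), matrix)  # 2 [[40, 1], [0, 1]]
```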
[Figure: Music ontology part]
Music Ontology (Basili et al. ’07)
• 46 primitive classes
• Leaf concepts have a synset ID from
WordNet
• No properties, only the super-/subconcept
relation
• So: a rather small and shallow ontology
→ expansion by exploiting Semantic Web
techniques
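With only the super-/subconcept relation, an ontology reduces to a child-to-parent map, and "shallow" can be made concrete as a small maximum depth. A hedged sketch over a hypothetical fragment (the class names are illustrative, not the actual 46 classes of Basili et al.), assuming the hierarchy is a tree without cycles:

```python
def depth(concept, parent):
    """Count is-a steps from `concept` up to the root (assumes no cycles)."""
    steps = 0
    while concept in parent:
        concept = parent[concept]
        steps += 1
    return steps


def leaves(parent):
    """Concepts that never occur as a superconcept."""
    supers = set(parent.values())
    return sorted(c for c in parent if c not in supers)


# Hypothetical fragment: child -> superconcept.
parent = {
    "Musical_Instrument": "Music",
    "Brass_Instrument": "Musical_Instrument",
    "Trumpet": "Brass_Instrument",
    "Musician": "Music",
    "Singer": "Musician",
}

print(max(depth(c, parent) for c in parent))  # 3
print(leaves(parent))                          # ['Singer', 'Trumpet']
```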
Watson (1)
http://watson.kmi.open.ac.uk/WatsonWUI/
• Every URI is clickable: all resources are
available
• Information about:
– Size
– Representation language
– Number of classes, properties, individuals etc.
– Review rating
• Interface for SPARQL queries
• Possibility of (upwards) navigation
Watson (2)
• Also available as:
– Protégé plug-in (under development)
– API
• New concepts can be added:
– manually
– one by one
– with much human action required
• Faster than creation from scratch, but still
a tedious exercise
Watson (3)
• Watson provides:
– a list of URIs of available semantic databases
– a list of candidate concepts
• What is still lacking:
– a (semi-)automatic way to merge or align new
concepts or ontologies to an existing one.
• Possible solution: Prompt
PROMPT (1)
http://protege.stanford.edu/plugins/prompt/prompt.html
• Protégé plug-in
• Functionalities:
– Comparison
– Inclusion
– Merging
– Alignment
• Requirement: ontologies to be merged, aligned etc. must be
available offline
• Prompt goes beyond purely syntactic matching
• Evaluation shows that experts followed 90% of
Prompt’s suggestions
Prompt (2)
• Saves time and effort:
– linguistically similar classes are found quickly
– inherited properties and subclasses can be added
automatically
– similar structures are automatically detected
– automatic consistency check
• Resources must use exactly the same markup
language
• Merging:
– faster but more complex
– requires good insight into the resources
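Prompt's own matching algorithm is not reproduced here. As a much simpler stand-in for "linguistically similar classes are found quickly", the sketch normalizes class names and scores them with Python's `difflib.SequenceMatcher`; the 0.8 threshold and the function names are assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """'Brass_Instrument' or 'BrassInstrument' -> 'brass instrument'."""
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # split camelCase
    return " ".join(name.replace("_", " ").lower().split())


def similar_classes(source, target, threshold=0.8):
    """Return (source, target, score) pairs whose normalized names are similar."""
    pairs = []
    for s in source:
        for t in target:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score >= threshold:
                pairs.append((s, t, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


print(similar_classes(["Brass_Instrument", "Singer"], ["BrassInstrument", "Vocalist"]))
# [('Brass_Instrument', 'BrassInstrument', 1.0)]
```

Purely string-based scores miss synonyms such as Singer/Vocalist, which is exactly the gap that tools going beyond syntactic matching aim to close.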
Mapping
• Music Ontology
• Cornetto
Resources
• Music Ontology:
– Some nodes have a WordNet ID (from the automatic
process)
– Many do not, especially those added with Watson
• Cornetto entries:
– have a synset ID from Dutch WN
– have a mapping to a WordNet entry through equivalence
or near-equivalence relations, e.g. EQ_SYNONYM, EQ_NEAR_SYNONYM
Questions (2)
4. To which extent does WordNet support a
mapping between:
a) The Cornetto lexicon and a newly created
ontology partly based on WordNet;
b) The existing ontology and lexicon from
LT4eL, and Cornetto + ontology
Procedures
• A concept either has a WN synset ID or does not
• Mapping via WordNet synset ID:
– Look up the synset ID in Cornetto
– Establish the related DWN synset(s)
– Results: no problems so far, although near-equivalence relations are expected to produce
mismatches
• Mapping without synset ID:
– Syntactic matching of the concept name with terms from
WordNet synsets
– Compare definitions and glosses
Examples “easy match”
• zanger:1 d_n-20810 (iemand die zingt, "someone who sings")
is [EQ_SYNONYM] of:
singer, vocalist, vocalizer, vocaliser /ENG20-09908715-n
(a person who sings)
• strijkkwartet:1 d_n-14287
(ensemble van vier strijkers, "ensemble of four string players")
and
strijkkwartet:2 d_n-19905
(ensemble voor vier strijkers, "ensemble for four string players")
are [EQ_NEAR_SYNONYM] of:
soloist:1 /ENG20-09931035
• Note: Cornetto contains mismatches between WN and
DWN (a string quartet is not a soloist)
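For concepts that do carry a WordNet synset ID, the lookup amounts to inverting Cornetto's equivalence links. A minimal sketch using the IDs from these examples; the in-memory layout and the function names are assumptions, and separating near-equivalences for manual review is one possible way to catch the expected mismatches.

```python
from collections import defaultdict

# Equivalence links from the examples above: DWN synset -> [(relation, PWN synset)].
equivalences = {
    "d_n-20810": [("EQ_SYNONYM", "ENG20-09908715-n")],
    "d_n-14287": [("EQ_NEAR_SYNONYM", "ENG20-09931035")],
    "d_n-19905": [("EQ_NEAR_SYNONYM", "ENG20-09931035")],
}


def invert(equivalences):
    """Index Cornetto's equivalence links by WordNet synset ID."""
    index = defaultdict(list)
    for dwn_id, links in equivalences.items():
        for relation, pwn_id in links:
            index[pwn_id].append((dwn_id, relation))
    return index


def map_concept(wn_id, index):
    """Split the DWN synsets linked to `wn_id` into exact and near (to be checked) matches."""
    result = {"exact": [], "near": []}
    for dwn_id, relation in index.get(wn_id, []):
        key = "near" if "NEAR" in relation else "exact"
        result[key].append(dwn_id)
    return result


index = invert(equivalences)
print(map_concept("ENG20-09908715-n", index))  # {'exact': ['d_n-20810'], 'near': []}
print(map_concept("ENG20-09931035", index))    # {'exact': [], 'near': ['d_n-14287', 'd_n-19905']}
```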
Matching without ID (1)
• For each owl:Class in the Music ontology:
– try to match it with the target attribute in the relation element of the Cornetto XML structure,
– where the attribute relation_name is (EQ_)NEAR_SYNONYM
– add the synset ID to the concept (for mapping to OntoWordNet)
• Example:
<owl:Class rdf:about="http:///myOntos/music.owl#orchestra"/>
<relation relation_name="EQ_NEAR_SYNONYM" target20previewtext="symphony orchestra:1, symphony:2"
version="pwn_1_6" target20="ENG20-07750308-n"
target="ENG16-06123240-n">
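The matching step can be sketched with Python's standard `xml.etree.ElementTree`. Assumptions: the Cornetto relation elements are wrapped in a hypothetical `<entries>` root, the class's local name (after `#`) is compared with the lemmas in `target20previewtext`, and a class name also counts as a match when it is one word of a multi-word lemma, which is a deliberately naive rule.

```python
import xml.etree.ElementTree as ET

# Stand-in document; only the attributes shown on the slide are used.
CORNETTO_XML = """
<entries>
  <relation relation_name="EQ_NEAR_SYNONYM"
            target20previewtext="symphony orchestra:1, symphony:2"
            version="pwn_1_6" target20="ENG20-07750308-n"
            target="ENG16-06123240-n"/>
</entries>
"""


def preview_lemmas(text):
    """'symphony orchestra:1, symphony:2' -> ['symphony orchestra', 'symphony']"""
    return [item.rsplit(":", 1)[0].strip().lower() for item in text.split(",") if item.strip()]


def match_class(class_uri, root, relations=("NEAR_SYNONYM", "EQ_NEAR_SYNONYM")):
    """Return WordNet 2.0 synset IDs (target20) whose preview lemmas match the class name."""
    name = class_uri.rsplit("#", 1)[-1].replace("_", " ").lower()
    hits = []
    for rel in root.iter("relation"):
        if rel.get("relation_name") not in relations:
            continue
        lemmas = preview_lemmas(rel.get("target20previewtext", ""))
        if any(name == lemma or name in lemma.split() for lemma in lemmas):
            hits.append(rel.get("target20"))
    return hits


root = ET.fromstring(CORNETTO_XML.strip())
print(match_class("http:///myOntos/music.owl#orchestra", root))  # ['ENG20-07750308-n']
```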
Matching without ID (2)
• Compare definitions and glosses:
– many ontology classes have a definition
– each WN synset has a gloss
– preprocess: stemming and filtering nouns
– Compute the percentage of nouns in the concept
definition that match a given gloss
– Evaluate results
• Note: some definitions are equal to WN
glosses
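The overlap score can be sketched as follows, assuming the nouns have already been extracted by a PoS tagger. The `crude_stem` function is a deliberately naive stand-in for a real stemmer, and the synset IDs and noun lists are hypothetical placeholders. Computing the percentage relative to the definition (rather than the gloss) follows the wording of the slide.

```python
def crude_stem(word):
    """Rough stand-in for a real stemmer: lowercase and strip a plural -s (but not -ss)."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and not w.endswith("ss") else w


def overlap(definition_nouns, gloss_nouns):
    """Percentage of (stemmed) definition nouns that also occur in the gloss."""
    d = {crude_stem(w) for w in definition_nouns}
    g = {crude_stem(w) for w in gloss_nouns}
    return 100.0 * len(d & g) / len(d) if d else 0.0


def best_gloss(definition_nouns, glosses):
    """glosses: synset ID -> gloss nouns. Returns the best (synset ID, percentage)."""
    return max(
        ((sid, overlap(definition_nouns, nouns)) for sid, nouns in glosses.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical nouns, assumed already PoS-filtered; synset IDs are placeholders.
definition = ["instrument", "tube", "mouthpiece"]
glosses = {
    "syn-A": ["alloy", "copper", "zinc"],
    "syn-B": ["instruments", "mouthpiece", "metal"],
}
print(best_gloss(definition, glosses))
```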
Current work
• Matching without ID on class names and
definitions/glosses
• Manual check of results for precision and recall
• Problem: MWEs, e.g. class Brass_Instrument:
– has no precise WN counterpart;
– Brass does exist, but
– it has multiple senses → how can we disambiguate?
• Question: an ID allows an easy and reliable match, but
can we do the task without one?
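One way to approach the Brass_Instrument case is a simplified Lesk-style heuristic: look up the modifier (brass) and choose the sense whose gloss shares the most words with the rest of the class name. This is an illustration under assumptions, not the method used in this work; the sense IDs and glosses below are hypothetical paraphrases, not WordNet data.

```python
def disambiguate(class_name, senses):
    """Pick the sense of the first word of `class_name` whose gloss overlaps most
    with the remaining words; return None when nothing overlaps."""
    words = [w.lower() for w in class_name.split("_")]
    target, context = words[0], set(words[1:])
    best_id, best_score = None, 0
    for sense_id, gloss in senses.get(target, {}).items():
        score = len(context & set(gloss.lower().split()))
        if score > best_score:
            best_id, best_score = sense_id, score
    return best_id


# Hypothetical sense inventory (IDs and glosses are illustrative paraphrases).
senses = {
    "brass": {
        "brass#1": "an alloy of copper and zinc",
        "brass#2": "a wind instrument with a metal tube and a cup-shaped mouthpiece",
        "brass#3": "the senior officials of an organization",
    }
}

print(disambiguate("Brass_Instrument", senses))  # brass#2
print(disambiguate("Brass_Section", senses))     # None
```

As the second call shows, the heuristic gives no answer when the remaining words do not appear in any gloss, which is why a manual check of precision and recall remains necessary.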
Remaining and Future work
• Adapting the lexicon format to the LT4eL format
• Mapping to OntoWordNet (semi-automatic)
• Mapping to DOLCE (manual task)
• Ontology evaluation
• Experiments with WordNets from different languages
• Using additional lexical information to improve the
LT4eL search engine, e.g. morphological
information about plural forms
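Cornetto's morphology field (e.g. plurforms:zangers for zanger) could feed the search engine directly: map each plural back to its lemma before looking terms up in the lexicon. A minimal sketch; the function names are hypothetical, and the strijkkwartetten entry is added for illustration.

```python
def build_plural_index(plurforms):
    """Map every listed plural form back to its lemma."""
    return {plural: lemma for lemma, plurals in plurforms.items() for plural in plurals}


def normalize_query(query, plural_index):
    """Lowercase the query and replace known plural forms by their lemma."""
    return [plural_index.get(tok, tok) for tok in query.lower().split()]


plural_index = build_plural_index({
    "zanger": ["zangers"],                  # from the Cornetto entry zanger:1
    "strijkkwartet": ["strijkkwartetten"],  # illustrative
})
print(normalize_query("Zangers en strijkkwartetten", plural_index))
# ['zanger', 'en', 'strijkkwartet']
```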