Lexicons …
and Complex Expressions:
towards Multilingual Linking
Nicoletta Calzolari
Copenhagen, Oct. 2001
What is SIMPLE?
A set of 12 harmonised computational
lexicons for HLT applications,
geared for multilingual links
A common
rich model
representation language
methodology of building the lexicon
common Template Types, with default obligatory info (Type
defining), and indication of optional info
First time: on a large scale, for so many languages
 Lexical meaning represented in terms of integrated combinations of
different sorts of information (semantic type, argument structure, relations,
features, etc. )
 Ontology-based information comes together with predicative representation and
syntactic linking
 A shared set of SemUs (from EWN) (about 700) of the 12 Lexicons crosslingually related
PAROLE/SIMPLE Architecture +
CLIPS Italian National Project
[Diagram: MuS, SynU and SemU units and their links; each SemU carries Sem Info (Template, Sem. Rel, Sem. Feat, Lexical Rel). Sizes shown: 60,000 lemmas; 55,000 lemmas; 55,000]
Semantic information in SIMPLE
Word senses encoded as Semantic Units (SemUs),
containing the following info:
• Semantic type *
• Domain *
• Lexicographic gloss *
• Extended Qualia structure
• Reg. Polysemy altern.
• Event type
• Derivation relations
• Argument structure for
predicative SemUs *
• Selection restrictions on the
arguments *
• Link of the arguments to the
syntactic subcategorization
frames (represented in the
PAROLE lexicons) *
• Synonymy
• Collocations
Semantic Multidimensionality and NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to
access multidimensional aspects of word meaning,
represented in SIMPLE with the
Extended Qualia Relations
Is_a_part_of: la pagina del libro (the page of the book)
Member_of: il difensore della Juventus (Juventus fullback)
Telic: il suonatore di liuto (the lute player)
Made_of: il tavolo di legno (the wooden table)
Overall Organization
[Diagram: Type Ontology (150 types) → Template → Instantiation in the Italian, Greek, Danish, Catalan, ... lexicons → SemU. SemU components: Qualia; Derivation; Polysemy; Pred. Layer (predicate, arguments, selection restrictions); Event Type; …]
The Core Ontology represents a first level of organization of the
semantic type system
Each type is associated with a Template consisting of a cluster of
information (relations, features, argument structure, event type, etc.) that
defines the type
The information characterizing a Semantic Unit includes:
a. The type-defining information (associated with the template the SemU
instantiates)
b. Additional information (other relations or features, selectional
restrictions, terminology, cross-part-of-speech relations, polysemy,
etc.)
Template
SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Semantic type of the SemU
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: Unification history of a template (only for unified top-types)
Domain: Domain information from ERLI's domain list
Semantic Class: One of WordNet Classes used by ERLI
Glossa: Lexicographic definition
Event Type: Event Sort
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Formal relation between SemUs
Agentive: Agentive relations between SemUs
Constitutive: Constitutive relations between SemUs; Constitutive semantic features
Telic: Telic relations between SemUs
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU
Slide labels: Type System Coordinates; “redundancy”; Predicative Layer; Qualia Structure; Contextual/Polysemy Information
Template for “Perception”
Comments: Processes involving an experiencing relation, ….
Verb Examples: hear, smell, etc.
Noun Examples: sight, look, etc.
Linguistic Tests: ….
Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
<guardare_2> (look)
SemU:
1
Usyn:
BC Number:
105
Template_Type:
[Perception]
Template_Supertype: [Psychological_event]
Domain:
General
Semantic Class:
Perception
Gloss:
//free//
osservare con attenzione
Event type:
process
Pred _Rep.:
Lex_Pred (<arg0>,<arg1>)
Derivation:
<Derivational relation>
Selectional Restr.:
arg0 = Animate //concept// arg1:default = [Entity]
Formal:
isa (1,<SemU>:[Perception]>)
<percepire>:[Psych_ev]
Agentive:
<Nil>
Constitutive:
instrument (1, <SemU>:[Body_part]) <occhio>
intentionality ={yes,no} //optional//
={yes}
Telic:
<Nil>
Collocates:
Collocates (<SemU1>,...<SemUn>)
Complex:
<Nil>
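The instantiated template above can be read as a structured record. The following is a minimal sketch, not the PAROLE/SIMPLE DTD: the class name `SemU`, its field names and the helper `filled_roles` are assumptions made for illustration, and only the values visible on the slide for <guardare_2> are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class SemU:
    """Toy record for one Semantic Unit; field names are illustrative only."""
    name: str
    template_type: str
    template_supertype: str
    domain: str
    gloss: str
    event_type: str
    arguments: dict  # argument name -> selectional restriction
    qualia: dict = field(default_factory=dict)  # role -> list of (relation, target)
    features: dict = field(default_factory=dict)


# Values copied from the <guardare_2> slide.
guardare_2 = SemU(
    name="guardare_2",
    template_type="Perception",
    template_supertype="Psychological_event",
    domain="General",
    gloss="osservare con attenzione",
    event_type="process",
    arguments={"arg0": "Animate", "arg1": "Entity"},
    qualia={
        "formal": [("isa", "percepire")],
        "agentive": [],  # <Nil> on the slide
        "constitutive": [("instrument", "occhio")],
        "telic": [],  # <Nil> on the slide
    },
    features={"intentionality": "yes"},
)


def filled_roles(semu: SemU) -> list:
    """Return the qualia roles that carry at least one relation."""
    return [role for role, rels in semu.qualia.items() if rels]


print(filled_roles(guardare_2))
```

Keeping empty roles in the record (rather than dropping them) mirrors the template idea: the slots come from the type, and the SemU only fills some of them.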
Modular Representation of a SemU
Flexibility: an extendable framework to allow
coherent future extensions & tuning for specific applications/text types
Modules of a SemU:
• Pred. Layer: predicate, arguments, selection restrictions, ..
• Rel. Layer (Semantic Relations): relations betw. SemUs
• Qualia: multiple meaning dimensions in a sense
• Features
• Derivation: cross-PoS relations
• Polysemy: regular polysemous classes
• Collocation: collocational information
[Diagram: hierarchy of relations (100 Rels.) under Top, with the four qualia roles Formal, Constitutive, Telic and Agentive; relations shown include Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Purpose, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...]
The targets of relations identify:
 prototypical semantic information associated with a SemU
 elements of dictionary definitions of SemUs
 typical corpus collocates of the SemU
Ala (wing)
Agentive
SemU: 3232
Type: [Part]
Part of an airplane
<fabbricare>
make
Used_for
Is_a_part_of
Isa
SemU: 3268
Type: [Part]
Part of a building
SemU: D358
Type: [Body_part]
Organ of birds for flying
Isa
<parte>
part
<volare>
fly
<aeroplano>
airplane
Used_for
Isa
<edificio>
building
Is_a_part_of
Is_a_part_of
SemU: 3467
Type: [Role]
Role in football
Isa
<giocatore>
player
<uccello>
bird
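The four senses of "ala" differ mainly in the targets of their relations, which is what makes the relations usable for sense selection. The sketch below is an illustration under assumptions: the attachment of each relation to a SemU is a reconstruction from the flattened slide (the Isa target of D358 is left out because it is not recoverable), and the function `senses_linked_to` is hypothetical.

```python
# SemU id -> (semantic type, gloss, relations).
# Relation attachments are a reconstruction of the slide, not verified data.
ALA_SENSES = {
    "3232": ("Part", "part of an airplane",
             {"Isa": "parte", "Is_a_part_of": "aeroplano",
              "Used_for": "volare", "Agentive": "fabbricare"}),
    "3268": ("Part", "part of a building",
             {"Isa": "parte", "Is_a_part_of": "edificio"}),
    "D358": ("Body_part", "organ of birds for flying",
             {"Is_a_part_of": "uccello", "Used_for": "volare"}),
    "3467": ("Role", "role in football", {"Isa": "giocatore"}),
}


def senses_linked_to(word):
    """Return the ids of the 'ala' SemUs that have `word` as a relation target."""
    return sorted(
        sid for sid, (_type, _gloss, rels) in ALA_SENSES.items()
        if word in rels.values()
    )


print(senses_linked_to("edificio"))
print(senses_linked_to("volare"))
```

A context word such as "edificio" singles out one sense, while "volare" leaves two candidates, so relation targets narrow the choice without always settling it.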
Relations and Predicates
Pred_SELL <ARG0>, <ARG1>,
<ARG2>, <ARG3>
SemU
Sell V
Is_the_agent_of
SemU
SemU
Seller N
Sale N
Event_noun
Comprendere V
Comprensione N
SemU: 61725
SemU: 61726
Type: [Cognitive_event]
Type: [Cognitive_event]
To understand
Understanding
master
SemU: 6962
Type: [Constitutive_state]
To include
problems
with
selection
restrictions
!!!
verb_nominalization
Comprendere#1
<Arg1 [Human]>, <Arg2 [ Semiotic]>
master
Comprendere#2
<Arg1 [Group]>, <Arg2>
SIMPLE/CLIPS figures (now)
(11,000 Lex. Units)
16,903 SemUs
Nouns: 12161
Verbs: 3476
Adjectives: 1266
Predicates: 4368
• Templates
Instrument: 734
Human: 712
PsychologicalProperty: 586
Profession: 541
Purpose_Act: 535
Part: 503
Human_Group: 502
Relational_Act: 521
AgentTemporaryActivity: 320
Domain: 303
• Features & Relations
Agentive: 1846
EventTypeProcess: 1945
EventTypeTransition: 1463
AgentiveCause: 1175
Usedfor: 1488
Synonym: 1258
ResultingState: 1197
Isapartof: 909
Hasaspart: 800
Istheactivityof: 611
Objectoftheactivity: 598
AntonymGrad: 575
Createdby: 525
Agentverb: 454
Concerns: 421
Core Lexicons enlarged in
National Projects
PAROLE/SIMPLE/EWN start providing the common platform
Following the subsidiarity principle, the process started at the EU level is
continued at the national level:
extended in (at least) 9 National Projects
(Danish, Greek, Italian, Portuguese, Swedish, ...)
(to be) used in applications
True Infrastructure of harmonised LRs in EU
Basis for Multilingual LR
ENABLER (coord. A. Zampolli)
Harmonisation:
Need for a Global View
 Interaction/sharing of data & software/tools
 Need of compatibility among various components
 An “exemplary cycle”:
Representation
Lexicon
Formalisms
Grammars
Software: Taggers,
Chunkers, Parsers
Software:
Acquisition Systems
I/O Interfaces
Languages
Annotation
Corpora
SIMPLE wrt EAGLES/ISLE
Standards for
Multilingual Lexical resources
EAGLES guidelines for
syntactic and semantic
lexicons
PAROLE/SIMPLE
Lexicons
MT systems
ISLE recommendations
for multilingual
lexicons
Multilingual
Lexicons
Mission
(http://lingue.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm)
• MT and multilingual HLT need to enhance production,
maintenance & extension of computational lexical resources
• ISLE goals
– provide a common environment for the development, integration, interchange &
sharing of lexical resources with various types of linguistic information
– establish a virtuous circle betw. research, applications, & standardization
process: lay down a bridge betw. the worlds of research and application
– mark the boundary between well-consolidated practice and theoretical
achievements in multilingual HLT, and areas still open to research but critical for
future technological improvements
• Crucial role of intercontinental cooperation for preparing
ISLE recommendations and for their validation
ISLE and MT
• Academic and industrial members of the MT community actively involved in
the ISLE group
– Microsoft, NMSU, Sail Labs, Systran, UMIACS, UPenn, ISI, etc.
• Survey phase:
– a number of lexical resources for MT systems surveyed by ISLE
• MT systems requirements provide the main reference points for
ISLE work, to determine:
– types of lexical information critical to SL → TL mapping
– criteria to create bilingual resources from existing monolingual ones
– common data structures to develop reusable multilingual resources
– critical areas of the lexicon: MWEs, complex transfer cases,
collocational/example-based information, etc.
MWE
parenthesis
MWE in ISLE & XMELLT - 2 types of MWE:
1st: (Deverbal) nominalisations + support (light) verbs
• make an acquisition1 (noun.act; verb.possession)
• complete an acquisition1
• undertake an acquisition1
• make an application1 (noun/verb.communication)
• have an application1 in
• decide on an application1 (consider, hear)
• get an application1 (receive, take)
• submit an application1 (file)
2nd: Noun(/Adj/Poss)+Noun MW (Ital.: N+PP/N+Adj/N+Vinf/...)
• air pollution
• job application
• murder suspect
• police action; police scandal
No equivalent structures:
• coltello da macellaio: butcher's knife
• carta di credito: credit card
• carta telefonica (adj): phone card
• agenzia di viaggi: travel agency
• film per adulti: adult movie (adj)
• macchina da scrivere: typewriter (comp.)
1st
The Boundaries:
·Support Verbs: more than Light Verbs?
· Nominalisations: …. to a broader set
Both verbs, combined with an event noun, whose subjects are :
 participants in the event identified by the noun
 related to some scenario associated with the event
 Type 1: take an exam, give an exam
 Type 2: pass an exam, fail an exam, grade (evaluate) an exam
 Type 1: perform an operation, undergo an operation
 Type 2: survive an operation
But also … enlarge the concept of nominalisation to
 event/result/abstract nouns not morphologically derived
dare un ceffone (to slap)
 provare rancore (to bear sb. a grudge)
 fare una festa (to have a party)
 fare festa (to have a holiday)
 fare festa a qno (to give sb. a warm welcome)
 prestare attenzione (to pay attention)
 fare la guerra (to wage war)
No verb
(for diachronic reason)
fare una cessione (cedere) vs. make? a cession (…)
avere una cessazione (cessare) delle ostilità vs. have? a cessation of hostilities (…)
1st
Hypothesis for encoding:
“Mel’cuk type” Lexical Functions (LF)
 to record semantic contribution and/or aspectual properties
conveyed by the V
 to express argument-sharing betw 2 arg structures
Oper1: perform an operation; make an apology
Oper2: undergo an operation; merit discussion; have a visit
Func0: silence reigns
Laborij: take into consideration
Incep: start the attack
Cont: maintain influence
Fin: complete the acquisition
Liqu: eradicate the disease
Real: keep the promise, approve the application
AntiReal: turn down, withdraw the application
….
1st
Nominalisations:
examples from Corpus
accusa
(supp-v: formulare, lanciare, muovere, rivolgere,... (Oper1)
subire[default], beccarsi, attirarsi, rischiare,... (Oper2)
mettere, porre,... sotto a. (Laborij)
rintuzzare, rigettare, smontare, … (Liqu)
Problematic?:
ritorcere, rovesciare… (...)
sostenere,… (...)
ripetere, … (...)
…..
____________________________________________________________
acquisizione
(supp-v: (fare)[default], condurre, curare,effettuare,... (Oper1)
varare,... (Incep)
perfezionare, completare, concludere, … (Fin)
evitare, compromettere, … (Liqu)
sfumare, … (LiquFunc0)
Problematic?:
annunciare, dichiarare, … (say)
decidere, proporre, promuovere, stimolare, … (...)
consentire, permettere, proporre, garantire, … (...)
…..
Automatic
acquisition
1st
Support Verbs: what to list
for multilingual lexicons?
 Decide whether to include/list, for a noun,
all the verbs usable for a Mel’čukian LF, or
INCEP: cominciare [default] vs. varare, intraprendere, …
INCEP: begin [default] vs. open (an investigation), …
OPER1:say a prayer (not make, like with other speech act nouns)
OPER1:pay attention
only those lexically dedicated to that noun (needed for generation)
(not the general ones available by default for an LF)
 begin an exam/operation or finish an exam/operation
 similar words preferentially select different verbs to express
similar meanings (same lexical functions): lexical preference
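The second option (list only the lexically dedicated verbs and fall back on a default per lexical function) can be sketched as a two-level lookup. This is a toy illustration, not an ISLE specification: the table names, the function `support_verb` and the table contents beyond the slide's English examples are assumptions.

```python
# Default verb per lexical function (only the default given on the slide).
DEFAULT_VERB = {"Incep": "begin"}

# Verbs lexically dedicated to a noun for a given lexical function.
DEDICATED_VERB = {
    ("investigation", "Incep"): "open",
    ("prayer", "Oper1"): "say",
    ("attention", "Oper1"): "pay",
}


def support_verb(noun, lf):
    """Pick the dedicated verb if the noun lists one, else the LF default (or None)."""
    return DEDICATED_VERB.get((noun, lf), DEFAULT_VERB.get(lf))


print(support_verb("investigation", "Incep"))
print(support_verb("exam", "Incep"))
```

With this layout the entry for "exam" needs no Incep verb at all, since the default applies, while "investigation" records only its preference.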
Complex nominals
2nd
in a multilingual framework
 Different syntactic patterns in L1 & L2
N+Nh (= head noun) in English is usually Nh+PP in Italian
tooth brush
spazzolino da denti
& the syntactic pattern is not predictable
hair/clothes brush
spazzola per capelli/abiti
nail brush
spazzola per le unghie
Fillmore
• travel agency
agenzia di viaggi
• real estate agency
• marriage bureau
agenzia immobiliare
agenzia matrimoniale
 A MWE in L1 corresponding to a fully compositional phrase
cucchiaino da caffè
coffee spoon???
 For MT implies some conceptual (interlingual?) representation
 but the “encoding” process must find an appropriate MWE if it is called for
 analogous to blocking/pre-emption: a regular/compositional process is not
carried out (dispreferred) because the semantic space occupied by the concept
associated with that formation is already claimed by some ready-made expression
2nd
Broader scope :
extension to non MWE?
If we look at the devices in grammar that allow new MWEs to be produced:
a continuum:
N+PP>collocation>multi-word>idiom
Fillmore
productive mechanisms in the language
but idiosyncratic
information at the borderline betw. grammar & lexicon
Amounts to:
 describe productive modification relation of N in general:
 in particular those lexically selected/preferred by a N (its semantic paradigm)
MWE are a subset of these
(give good hints to discover most prominent relations??)
 look at the semantic structure of Nouns: i.e. at the variety of
modifiers they can select by virtue of their meaning
2nd
Noun Compounds/Complex Nominals
…are pervasive
 There is a motivation in most N+N construction:
the context provides it
Fillmore
Busa
 The FrameNet (SIMPLE) way
appeal to specific frame structures (qualia structures)
associated with the head noun,
determine from corpus attestations which frame elements
(qualia) can get instantiated as a modifier word
 “container”: complex nominals can specify:
material (aluminium c., glass c., …)
contents (food c., trash c., …)
size (3 quart c., …)
function (shipping c., storage c., …)
...
2nd
Noun Compounds/Complex Nominals
& multidimensional semantic approaches
a. FrameNet
Container Frame: Frame Elements: Material,Contents,Size,Function
• Material: aluminum container, glass c., metal c., tin c.
• Contents: food container, beverage c., trash c., water c., milk c., fuel c.
• Size:
3 quart container
• Function: shipping container, storage c.
b. SIMPLE
Qualia Relations of "container" used in compounds:
• Constitutive: made_of [MATERIAL]
aluminum container, glass c., metal c., tin c.
• Telic: contains [ENTITY]
food container, beverage c., trash c., water c., milk c., fuel c.
• Constitutive:size [QUANTITY]
3 quart container
• Telic:is_used_for [EVENT]
shipping container, storage c.
2nd
Complex Nominals/Lexical Constructions
in a multilingual context…
describe vs. list?
if a compound noun is clearly lexicalized, it's simply one of the words in L1
but if it is an instance of some productive word-formation rule, we should
describe it
both describe & list:
list explicitly in the lexical entry
what is idiomatic/idiosyncratic wrt generation for
 lexical selection
 mucca pazza vs. matta
 prestare attenzione vs. pay attention
 structural pattern
 travel agency
agenzia di viaggi
 marriage bureau
agenzia matrimoniale (*di matrimonio)
 real estate agency
agenzia immobiliare
but also, an apparatus to describe how word semantics of Ns
interact when they co-occur (co-selection, co-composition, ...)
2nd
In a multilingual context…
...regularities in each language, but they don’t match
 Both for decoding & encoding, we need both:
 a linguistic apparatus for interpretation
(e.g. to go to a language where it is not a MWE:
cucchiaino da caffè
for a Japanese useful to know … “used for”)
 lists for idioms…, for unpredictable/idiosyncratic
 Same apparatus to interpret both MWE & regular N
constructions (similar power of expressiveness): general principles of semantic
constitution of lex. items & their combinatorics in terms e.g. of frames/qualia/…:
 basic sem. notions &
 a general schema to characterise the problem, e.g.
frame (qualia) structure of the headN
semantic Type of the modifier N
allow the headN to impose its interpretation on the modification rel.
...
2nd
Complex nominals, e.g.
knife (coltello) triggers
a “cutting frame” (FrameNet)
specific SIMPLE dimensions of meaning
extensively evaluate whether qualia roles (already) encoded in SIMPLE
correspond to what is necessary to interpret N-N modification relations
SIMPLE Extended Qualia structure
for the interpretation of the semantic relation betw. Ns
(internal relational structure of MWE)
butcher’s knife (coltello da macellaio) → TELIC (used_by) Y [Human] → PPda
plastic knife (coltello di plastica) → CONST (made_of) X [Material] → PPdi
table knife (coltello da tavola) → TELIC (used_in) Z [Location] → PPda
hunting knife (coltello da caccia) → TELIC (used_in_activity) E [Activity] → PPda
piatto di legno → CONST (made_of) X [Material] → PPdi
piatto di pasta → CONST (contains) X [Food] → PPdi
PP
disambig.
2nd
In SIMPLE:
possible extension
 Deverbal nominalisation:
 noun murder (uccisione, delitto, omicidio (different sem. pref.)
PPdi
PPda_parte_di, di
 verb murder (uccidere)
 subj:NP
 obj:NP
PRED:MURDER(uccidere)
ARG1:agent[Hum/Anim?]
ARG2:patient[Hum/Anim?]
MOD1:instr[Weapon]
MOD2:means[Action]
MOD3:...[...]
:instr: PPcon [Weapon] (knife m., con coltello)
:means: PPper [Action] (strangulation m., per strangolamento)
:loc: PPloc|di [Location] (Kent State murders, nel ...)
:time: PPtime|di [Time] (1983 murders, del 1983)
As if it were
a Situation
… Monolingual Linguistic
Representation
Strategy:
 consider as the starting point for MILE the edited union of the
basic notions represented in the existing syntactic/semantic lexicons
(their models)
 evaluate their notions wrt EAGLES recommendations for syntax
and semantics
 evaluate their usefulness & adequacy for multilingual tasks
 evaluate integrability of their notions in a unitary MILE
 look for deficient areas, e.g. MWE
 ...
To be decided: should ISLE reach a consensus at the level
of the “types” of information only, or also at the level of
their “token” values? …. different answers for diff. notions
… the Multilingual ISLE
Lexical Entry (MILE)
 General methodological principles (from EAGLES):
 Basic requirements for the MILE:
 Discover and list the (maximal) set of basic notions needed to
describe the MILE (up to which level standardisation is feasible?)
 Granularity
 The leading principle for the design of the MILE: the edited union
of existing lexicons/models (redundancy is not a problem)
 Modular and layered: various degrees of specification possible
 Allow for underspecification (& hierarchical structure)
The MILE
• Main features
– factor out primitive units of lexical information
– explicit representation of information to be targeted
by multilingual NLP tools
– rely on lexical analyses with the highest degree of
inter-theoretical agreement
– avoid framework-specific representational solutions
– open to different paradigms of multilinguality
– oriented to the creation of large-scale lexical
databases
MILE
Objective: definition of the MILE
as a meta-entry
to act as a common format for resource sharing and
integration/architecture for lexical data encoding
 its basic notions
 general architecture
 formalized as an entity-rel.
model (XML, RDF, etc.)
 with a tool to support it
open to task- & system-dependent parameterisation
Agreed Principles
 MILE builds on the monolingual entry & expands it
 MILE incorporates previous EAGLES recommendations
 is the “complete” entry
adopt as starting point the PAROLE/SIMPLE DTD
 to be revised, augmented, ...
We consider 2 broad categories of applications :
 MT
 CLIR (linking module may be simpler/ontology based)
 (label info types wrt application)
Modularity in MILE
 Advantages:
 Flexibility of representation
 Easy to customise and update
 Easy integration of existing resources
 High versatility towards different applications
Modularity in at least three respects:
 in the macrostructure and general architecture of the MILE
 in the microstructure of the MILE
• monolingual linguistic representation (previous EAGLES revised/updated)
• collocational/corpus-driven information (new)
• multilingual apparatus (e.g. transfer conditions and actions; interlingua)
(new)
 in the specific microstructure of the MILE word-sense
Modularity in MILE
Meta-information
MILE Architecture
A. MILE Macrostructure
B. MILE Microstructure
1. Monolingual
2. Collocational
3. Multilingual
C. Word-Sense Microstructure
1. Coarse-grained
2. Fine-grained
The MILE Architecture
Monolingual Lexical Description
– three independent and yet linked layers characterising the MILE in a
source language
– possibly corresponds to the typology of information contained in
major existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet,
COMLEX, FrameNet, etc.
– simple and complex lexical unit (to account for MWEs)
– various degrees of granularity of lexical units representation
semantic layer
correspondence
conditions
syntactic layer
morphological layer
The MILE Architecture
Multilingual Layer
– acts as an (independent) interface layer between
monolingual lexicons
multilingual layer
semantic layer
correspondence
conditions
syntactic layer
Lexicon 1
morphological layer
Lexicon 2
The MILE Multilingual Layer
….(NEW)
• Correspondences can be established between different
types of linguistic objects (strings, syntactic descriptions,
semantic elements, predicates, etc.)
• Transfer tests and actions to target various types of
lexical information in the monolingual layers
– constrain syntactic positions and their fillers
– lexicalize syntactic positions
– add positions or arguments
– add new features to define more fine-grained sense distinctions
relevant at the multilingual level
– restructure argument configurations
– collocational information
– ...
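A transfer test that constrains the filler of an argument position can be pictured as a predicate attached to a correspondence. The sketch below is purely illustrative and is not MILE notation: the class `Correspondence`, the function `transfer` and the rule format are assumptions, and the example reuses the selection restrictions shown earlier for Comprendere#1 (to understand) and Comprendere#2 (to include).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Correspondence:
    """Hypothetical source-target link guarded by a transfer test."""
    source: str
    target: str
    test: Callable[[dict], bool]  # receives the semantic types of the argument fillers


RULES = [
    Correspondence(
        "comprendere", "understand",
        lambda ctx: ctx.get("Arg1") == "Human" and ctx.get("Arg2") == "Semiotic",
    ),
    Correspondence(
        "comprendere", "include",
        lambda ctx: ctx.get("Arg1") == "Group",
    ),
]


def transfer(lemma: str, ctx: dict) -> Optional[str]:
    """Return the target of the first correspondence whose test succeeds."""
    for rule in RULES:
        if rule.source == lemma and rule.test(ctx):
            return rule.target
    return None


print(transfer("comprendere", {"Arg1": "Human", "Arg2": "Semiotic"}))
print(transfer("comprendere", {"Arg1": "Group"}))
```

Keeping the tests in a separate layer leaves both monolingual lexicons untouched, in line with the idea of the multilingual layer as an independent interface.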
Paths to Discover the
Basic Notions of MILE
• clues in dictionaries to decide on target equivalent
• guidelines for lexicographers
• clues (to disambiguate/translate) in corpus
concordances
• lexical requirements from various types of transfer
conditions and actions in MT systems
• lexical requirements from interlingua-based systems
• …
a list of critical information types that will
compose each module of the MILE
Organisational Proposal:
division of labour
Highlighted some hot issues & assigned tasks:
• sense indicators (EU)
• selection preferences (EU)
• lexicographic relevance (EU)
• argument structure (US)
• MWE (EU & US)
• collocations & parallel corpora (US)
• modifiers (EU)
• semantic relations (EU)
• transfer conditions (EU & US)
• collocational patterns (US)
• ontology (US)
• metaphors (EU)
• interlingua requirements (US)
• spoken lexicon (EU)
• meta-representation (US & EU)
• ...
Organisational Proposal
The tasks will lead to:
 an in-depth analysis of each area aiming at identifying:
 the most stable solutions adopted in the community
 linguistic specifications and criteria
 possible representational solutions, their compatibility, etc.
 evaluation of their respective weight/importance in a multilingual lexicon (towards
a layered approach to recommendations)
 open issues and current boundaries of the state-of-the-art (which cannot be
standardised yet)
 model limitations through creation of a sample dictionary
 …
 see how the various pieces fit together & can be merged in a unified
proposal
 evaluate if we can combine in a “hybrid super-model” the transfer &
interlingua approaches
Information Types:
examples
Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources
Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria
Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions
CLWG Ongoing Activities
… to prepare a preliminary proposal of the MILE:
• existing models for lexical representation and data
interchange (Genelex, Olif, etc.) are explored
• model limitations and expressive power are tested through
creation of sample entries in a few languages
• groups at work
• lexical description and information: types of relevant info
• lexicographic exploration: systematic summary & classification of
types of transfer tests (also extracted from MRDs)
• multilingual correspondences
• lexical data modeling: format & representation issues
• tool development
Representation issues
• Working with GENELEX, lexicon development work is
(can be) affected by:
– impossibility (or difficulty) of defining abstract and
general classes or types of objects
– lack of inheritance mechanisms
– lack of default expression and default rewriting
mechanisms
Cf. Lexical templates in SIMPLE:
• not included in the GENELEX data-structure
• implemented in the editing sw. tool
• very useful to capture relevant lexical generalizations, enhance
consistency in encoding, speed-up lexicographers’ work, etc.
CLWG Ongoing Activity
MILE Lexical Objects
Formal Specifications
MILE Lexical Entry
Formal Specifications
MILE
Shared Lexical
Objects
User Defined
Lexical Objects
Monolingual & Multilingual
Lexicons
MILE Shared Lexical Objects
MILE Repository of Shared Lexical Objects:
• Basic syntactic constructions (e.g. transitive, etc.)
• (Micro-)semantic objects (e.g. features, relations)
• (Macro-)semantic objects (e.g. lexical templates)
• Multilingual constructions (e.g. basic transfer conditions and actions)
Simplify using MILE
User-Defined Lexical Objects
- New lexical objects defined by the user according to the common MILE formal
data-structure specification.
- Sub-types of the Shared MILE Objects
- Possibly enriched with metadata defining their “semantics” and “usage”
Monolingual & Multilingual Lexicons
- Lexical entries obtained by referring to various lexical objects (both Shared and
User-defined)
- The MILE lexical entry model specifies how lexical objects can be combined to
achieve the proper lexical representation
Involvement
of Asian Languages
 participation in last meetings
 some input from Asia
 formal cooperation EU-ASIA: steps to put
in motion
Impact & synergies
 real impact… to be evaluated later
through the use in applications
 already its being a US/EU project &
 the Asian interest
 synergies now, e.g.:
 PAROLE/SIMPLE (also instantiated in 9 national projects): main input
 EuroWordNet: provides input
 XMELLT (NSF): provides input
 OLIF: expects (& provides) input
 SALT: complementary
 ENABLER: validation (& expects input)
 ELSNET: validation
 SENSEVAL: validation
 NIMM WG for Metadata for CL(also with the US OLAC)
 ...
Target: ….
Multilingual Content Management
the Resources viewpoint
The relevance/impact of (good vs. less good) LRs for high-quality
Cross/Multilingual systems is high, even if not easily measurable.
Different applications, component technologies - & approaches
within - need different info types (e.g. CLIR or content access systems
wrt MT)
For each, need to specify (not an easy task):
clear lexical/linguistic/conceptual requirements
priority info types (which, how encoded, etc.)
the respective role of e.g. annotated corpora, mono-, bi- and multilingual lexicons (with different info types), ontologies, KBs
Economic Feasibility:
for which (Multilingual) Resources
to invest?
Wrt short- vs. medium-term impact:
Basic, general purpose bi-/multilingual lexicons, but to be tuned,
adapted to different applications
need of robust systems able to acquire/tune
(multilingual) lexical/linguistic/conceptual knowledge,
to accompany static basic resources
We shouldn’t rely only on parallel corpora. More advisable to aim at
reliable methods for acquisition & use of ‘comparable corpora’,
accompanied by
robust technologies for annotation (at different levels: morphosyntactic,
syntactic/functional, semantic, …), and by
a shared set of (text) annotation
schemata
Target…..
Multilingual Knowledge Management
Technical Feasibility:
 Prerequisite: is a commonly agreed text/lexicon annotation protocol
an achievable goal, also for the semantic/conceptual level
(to be able to automatically establish links among different languages)?
Yes, at the lexical level
EAGLES/ISLE
More complex, for corpus annotation?
?
Content for practical use:
Gap betw. Resources and Systems?
 If we had real-size lexicons with very fine-grained
semantic/conceptual info, would there be systems
(non ad-hoc toy systems) able to use them?
 A vicious circle between
 i) lack of suitable, large-size and knowledge intensive,
resources (lexicons and corpora, with many different types of syntactic
and semantic information encoded), and
 ii) systems’ ability to use them effectively
 The two targets should be pursued in parallel,
 should closely interact with each other, and
 be gradually integrated