Language Resources
(lexicons, corpora, ontologies, …)
Standards and language technologies
Nicoletta Calzolari
Istituto di Linguistica Computazionale - CNR - Pisa
[email protected]
With many others at ILC
N. Calzolari
Doctoral course (Dottorato), Pisa, May 2009
Old slide with Antonio Zampolli (’80s/early ‘90s)
Why were such badly needed LRs still lacking
after 30 years of R&D in the field?
1) Because the main trend until the mid-’80s was to privilege the processing of “critical” phenomena, studied by the dominating linguistic theories, rather than focusing on the deep analysis of the real uses of a language
As a result CL was focusing on:
• few examples, often artificially built
• lexicons made of few entries (toy lexicons)
• grammars with poor coverage
2) Because large-scale LRs are costly & their production requires a big organizing effort
Why do we still lack them?
Historical notes
The beginnings…

After many years of complete disregard, or even disdain and contempt, for LRs, due mainly to the prevalence and influence of the generativist school
Early interest, pioneering research: work on Machine Readable Dictionaries
• To make them machine-tractable
• To extract info from them, with much less powerful tools than now
• Precursor of the trend of automatic acquisition from corpora
Acquilex (Pisa et al.)
Work on/with Longman dictionary (Las Cruces)
NSF & EC International Cooperation grant, promoted by Wilks, Zampolli, Calzolari (Las Cruces & Pisa)
Don Walker & Antonio Zampolli
… back from the ’70s/‘80s
Automatic acquisition of lexical information from MRDs
Was at the centre of activities in Pisa group, Amsler, Briscoe, Boguraev, Wilks’ group,
IBM, then Japanese groups, …
The trend was: “large-scale computational methods for the transformation of
machine readable dictionaries (MRDs) into machine tractable dictionaries”
It became evident that:
Part of the results of meaning extraction, e.g. many meaning distinctions, which
could be generalised over lexicographic definitions and automatically captured,
were unmanageable at the formal representation level, and had to be blurred
into unique features and values.
Unfortunately, it is still today difficult to constrain word-meanings within a
rigorously defined organization: by their very nature they tend to evade any strict
boundaries
After that pioneering era,
production & use of adequate LRs
strongly increased
The lexicon has become ever more relevant

Both international and national authorities started investing in the field
as never before, interested in technologies & systems which are really
working and are economically interesting

The need for empirical methods, based on the analysis of large amounts
of data, has been recognized

LRs must be robust enough for analysing the concrete uses of a
language, either theoretically “interesting” or not
Data-driven
approaches
Since then …

LRs have acquired larger resonance in the last 2 decades, when many activities, in Europe and world-wide, have contributed to substantial advances in the knowledge and capability of how to represent, create, acquire, access, exploit, harmonise, tune, maintain, distribute, etc. large lexical and textual repositories
In Europe an essential role was played by the EC, through initiatives (NERC, PAROLE, SIMPLE, EuroWordNet, EAGLES, ISLE, ELSNET, RELATOR, …) that saw the participation of many EU groups, linked over the years by sharing common approaches and visions
… back from the late ‘80s
After acquisition from MRDs,
Automatic acquisition of info from texts:
This trend has become today a consolidated fact, and we have
moved
from focusing on acquisition of “linguistic information” (as
at the beginning)
to broader acquisition of “general knowledge”, with more
data intensive, robust, reliable methods
We started building:
LRs as necessary infrastructure (Lexicons/Corpora)
both for research & applications:
LRs give NLP systems the knowledge needed for the various kinds of linguistic processing
Realising that most of the needed information
• escapes individual “introspection”
• can only be acquired by analysing large textual corpora attesting language use in different fields/communicative contexts
By-product?: Importance of statistical methods
BUT need for adequate models to handle actual usage of language
Lesson:

Going from core sets to large coverage has implications not just in
quantitative terms, but more interestingly in terms of changes to the
models and the strategies of processes
What have we (LT & LR) been assembling for many years?
• Lexicons & their Ontologies
  • Written, Spoken, ItalWordNets, PAROLE/SIMPLE, FrameNets, …
• Annotated corpora/Treebanks
• Basic Tools
• Integrated Architecture for
  • Annotation at various levels (from morph. to conceptual)
  • Acquisition/learning
  • Classification
  • Ontology creation
  • …
• Methodologies
• Know-how & expertise
• Infrastructural bodies (on which to build)
• Standards
… components of a very large infrastructure of LRs & LT
History: Some international LRs initiatives
ACQUILEX [since ’88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet
MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, LIRICS, …, Senseval/Semeval, WRITE, Forum TAL (Italian), …, ISO, ELRA, LREC, LRE Journal, NEDO, Language Grid, BootStrep, KYOTO, …
Essential role of EC to start a basic Infrastructure
Established a model
EU at the forefront in the areas of LRs and standards in the ’90s
Today: a broad “potential” Infrastructure
Vitality & Success signs… for LRs
[Diagram: initiatives at EU, National and International level]
RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, LIRICS, …
ELRA, BLARK, Unified Lexicon (W/S), LREC
LDC & others, ISO, COCOSDA/WRITE, US Cyberinfrastructure, Japan COE21, NEDO, Language Grid, …
LRE journal, …
ERANET-LangNet, …
FLaReNet (ICT), CLARIN (ESFRI)
WordNets
Synsets linked by semantic relations
Synset: {casa, abitazione, dimora}; English: {home, domicile, ..}, {house}
TOP Concepts: Object, Artifact, Building
Hyperonym: {edificio, ..}
Hyponym: {villetta}, {catapecchia, bicocca, ..}, {cottage}, {bungalow}
Role_location: {stare, abitare, ...}
Role_target_direction: {rincasare}
Role_patient: {affitto, locazione}
Mero_part: {vestibolo}, {stanza}
Holo_part: {casale}, {frazione}, {caseggiato}
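A minimal sketch of how a synset and its labelled relations, like the "casa" example on this slide, could be held in memory. The class name `Synset` and its methods are assumptions made for illustration; they are not the EuroWordNet or ItalWordNet data format.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous lemmas plus labelled links to other synsets."""
    lemmas: tuple
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def link(self, relation, target):
        # Record one directed, labelled relation to another synset.
        self.relations.setdefault(relation, []).append(target)

    def related(self, relation):
        # Return the lemma tuples of all synsets reached through `relation`.
        return [t.lemmas for t in self.relations.get(relation, [])]


casa = Synset(("casa", "abitazione", "dimora"))
casa.link("hyperonym", Synset(("edificio",)))
for lemmas in (("villetta",), ("catapecchia", "bicocca"), ("cottage",), ("bungalow",)):
    casa.link("hyponym", Synset(lemmas))
casa.link("mero_part", Synset(("vestibolo",)))
casa.link("mero_part", Synset(("stanza",)))
casa.link("role_location", Synset(("stare", "abitare")))

print(casa.related("hyponym"))
```

Keeping relations as a name-to-targets mapping makes it cheap to add further relation types (Role_patient, Holo_part, ...) without changing the class.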
ItalWordNet
Semantic Network
[Italian module of EuroWordNet]
~ 55,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130,000 semantic relations
• ~ 55,000 hyperonymy/hyponymy relations
• ~ 16,000 relations among different POS (role, cause, derivation, etc.)
• ~ 2,000 part-whole relations
• ~ 1,500 antonymy relations, … etc.
Synsets linked to the InterLingual Index (ILI = Princeton WordNet); through the ILI, linked to all the European WordNets (de-facto standard) & to the common Top Ontology
→ Possibility of plug-in with domain terminological lexicons (legal, maritime, … linguistic)
Usable in IR, CLIR, IE, QA, ...
ItalWordNet: Clusters of “Base Concepts” = words classified according to Ontology Top Concepts
Lexicon or ontology???
Top Ontology concepts (= features):
• Top
• 1stOrderEntity: Origin, Form, Composition, Function, etc.
• 2ndOrderEntity: SituationType, SituationComponent, etc.
Feature labels on the clusters: Covering, Part, Group, Natural, Object, Static, Dynamic, Physical, Location, Experience, Mental, Living, Human
Clusters of Base Concepts:
• skin, body, hair, part, bodycell, covering, muscle, organ
• church, company, institute, organization, party, union
• human, adult, adult female, adult male, child, native, offspring
• direction, distance, spatial property, spatial relation, course, path
• change of position, divide, locomotion, motion
• feel, desire, disturbance, emotion, feeling, humor, pleasance
EWN TopOntology (ItalWordNet)
1stOrderEntity
• Origin: Natural (Living: Plant, Human, Creature, Animal), Artifact
• Form: Substance (Solid, Liquid, Gas), Object
• Composition: Part, Group
• Function: Vehicle, Representation (MoneyRepresentation, LanguageRepresentation, ImageRepresentation), Software, Place, Occupation, Instrument, Garment, Furniture, Covering, Container, Comestible, Building
2ndOrderEntity
• SituationType: Dynamic (BoundedEvent, UnboundedEvent), Static (Property, Relation)
• SituationComponent: Cause (Agentive, Phenomenal, Stimulating), Communication, Condition, Existence, Experience, Location, Manner, Mental, Modal, Physical, Possession, Purpose, Quantity, Social, Time, Usage
3rdOrderEntity
EuroWordNet Multilingual Data Structure
[Figure: the language-specific wordnets (Italian WN, Spanish WN, French WN, German WN, Estonian WN, Dutch WN, English WN, Czech WN, …) are each linked to the ILI; e.g. Dutch “hond”, Italian “cane”, Spanish “perro” and English “dog” all point to the ILI record “dog” (English), which is in turn linked to the TOP ONTOLOGY (LIVING, ANIMAL, HUMAN, …)]
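The role of the ILI as a pivot can be sketched as a two-step lookup: from a lemma to its ILI record, then from that record to the lemmas of another wordnet or to Top Ontology concepts. The record identifier `ili:dog`, the language codes and the function names below are invented for this sketch; the real EuroWordNet database uses its own identifiers and equivalence relations.

```python
# language -> lemma -> ILI record id (ids are hypothetical)
WORDNETS = {
    "nl": {"hond": "ili:dog"},
    "it": {"cane": "ili:dog"},
    "es": {"perro": "ili:dog"},
    "en": {"dog": "ili:dog"},
}

# ILI record id -> Top Ontology concepts, as in the figure
TOP_ONTOLOGY = {"ili:dog": ["LIVING", "ANIMAL"]}


def translate(lemma, source, target):
    """Lemmas of the target wordnet that share an ILI record with `lemma`."""
    ili = WORDNETS[source].get(lemma)
    if ili is None:
        return []
    return sorted(w for w, rec in WORDNETS[target].items() if rec == ili)


def top_concepts(lemma, lang):
    """Top Ontology concepts inherited through the ILI record of `lemma`."""
    return TOP_ONTOLOGY.get(WORDNETS[lang].get(lemma), [])


print(translate("cane", "it", "es"))
print(top_concepts("hond", "nl"))
```

Because every wordnet links only to the ILI, adding a new language needs one mapping rather than one per language pair.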
Terminological Wordnets:
e.g. Jur-WordNet

Jur-WordNet → Extension of ItalWordNet for the juridical domain
(With ITTIG-CNR - Istituto di Teoria e Tecniche dell’Informazione Giuridica)
• Knowledge base for multilingual access to sources of legal information
• Source of metadata for semantic markup of legal texts
• To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.
Terminological Lexicon of Navigation
 Nolo
Synsets → 1,614
Lemmas → 2,116
Senses → 2,232
Nouns → 1,621
Verbs → 205
Adjectives → 35
Proper Nouns → 236
SIMPLE Lexicon & Ontology
Multidimensional Type Hierarchy
http://www.ilc.cnr.it/clips/CLIPS_ENGLISH.htm

• Shared by 12 European languages
• Theoretical background: Generative Lexicon (Pustejovsky)
• 157 language independent SIMPLE semantic types:
  • Based on hierarchical & non-hierarch. conceptual relations
  • Difference of internal complexity:
    • Simple types (one-dimensional) characterised in terms of hyperonymic relations
    • Unified types (multi-dimensional) only definable through the combination of:
      • the relation to their supertype +
      • the reference to orthogonal dimensions of meanings (through the Qualia Structure)
PAROLE-SIMPLE-CLIPS Lexicon:
…harmonised model for 12 European languages
Overall Organization
[Figure: a common Type Ontology (150 types) defines Templates, which are instantiated in the language-specific lexicons (Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, ...)]
Information attached to a SemU:
• Qualia
• Derivation
• Polysemy
• Event Type
• Pred. Layer: Predicate, arguments, Selection restrictions
• …
Model Architecture
The first three levels: Information content
• Phonological Unit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• Morphological Unit: PoS (& PoS subcategory), inflectional paradigm
• Corresp. MrphU-SynU
• Syntactic Unit: syntactic behaviour, described by a Frameset of syntactic structures (Synt. Struct 1, Synt. Struct 2)
  • each structure has a. head properties and b. subcat. frame
  • each subcat. frame lists syntactic arguments, with position list and position restr.
The semantic level:
Information types
Semantic Unit
• Ontological type
• Extended Qualia Structure
• Domain
• Event Type
• Synonymy
• Derivation
• Semantic properties
• Regular Polysemy alt.
• Predicative Representation
  • lexical predicate
  • arguments: sem. role; sem. restr.
• Link to syntactic unit
SEMANTIC ENTRY CONTENT
Aumento (Increase): L’aumento dei prezzi di un venti% (the twenty-percent increase in prices)
ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Eventype: transition
• Domain: general, economics
• Gloss: accrescimento in dimensione o quantità (growth in size or quantity)
EXTENDED QUALIA INFO.
• aumento Isa cambiamento
• aumento resulting_state maggiore
• Agentivecause: yes
• Direction: up
• Morphological derivation: Eventverb aumentare
PREDICATIVE REPRESENTATION
• Semantic predicate: PRED_aumentare; 3 arguments
• Type of link: event nominalization
• Arguments description: range, semantic role & selectional restriction:
  • Arg0: Protoagent; Human / Institution
  • Arg1: ProtoPatient; Entity
  • Arg2: Quantifier; Amount
Semantic entry
USem3527vaporizzatore
ontological type
  semantic type: Instrument
  unification_path: [Concrete_entity | ArtifactAgentive | Telic]
free definition
  apparecchio usato per vaporizzare (device used for vaporizing)
example
  un vaporizzatore per piante (a vaporizer for plants)
event type
  eventype: =====
domain information
  cleaning, gardening, cosmetics
semantic relations
  USem3527vaporizzatore synonymy USem72288nebulizzatore
  USem3527vaporizzatore instrumentverb Usem5239vaporizzare
qualia features
  =====
Extended Qualia Structure
  USem3527vaporizzatore isa Usem3479apparecchio
  USem3527vaporizzatore has_as_part Usem61633pulsante
  USem3527vaporizzatore created_by UsemD387fabbricare
  USem3527vaporizzatore used_for UsemD66019nebulizzare
regular polysemy
  regular polysemy: =====
predicative representation
  semantic predicate: PRED_vaporizzare-1
  type of link: instrument nominalization
  arguments description (range; semantic role; select. restrictions):
  • arg0_vaporizzare_1; Protoagent; Human/Instrument
  • arg1_vaporizzare_1; Protopatient; +liquid
  • arg2_vaporizzare_1; Location; Concrete_entity
from Nilda Ruimy
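To make the shape of such an entry concrete, the sketch below stores the "vaporizzatore" entry as a small record and groups its Extended Qualia relations by qualia role. The class `SemU`, its field names and the `QUALIA_ROLE` table are assumptions for illustration; the identifiers and relation names are copied from the slide, but this is not the actual SIMPLE/CLIPS storage format.

```python
from dataclasses import dataclass, field

# relation name -> qualia role it belongs to (only the relations used here)
QUALIA_ROLE = {
    "isa": "formal",
    "has_as_part": "constitutive",
    "created_by": "agentive",
    "used_for": "telic",
}


@dataclass
class SemU:
    ident: str
    semantic_type: str
    gloss: str
    domains: tuple = ()
    qualia: list = field(default_factory=list)  # (relation, target SemU id)

    def by_role(self, role):
        """Qualia relations of this unit that fall under the given role."""
        return [(rel, tgt) for rel, tgt in self.qualia if QUALIA_ROLE.get(rel) == role]


vaporizzatore = SemU(
    ident="USem3527vaporizzatore",
    semantic_type="Instrument",
    gloss="apparecchio usato per vaporizzare",
    domains=("cleaning", "gardening", "cosmetics"),
    qualia=[
        ("isa", "Usem3479apparecchio"),
        ("has_as_part", "Usem61633pulsante"),
        ("created_by", "UsemD387fabbricare"),
        ("used_for", "UsemD66019nebulizzare"),
    ],
)

print(vaporizzatore.by_role("telic"))
```

Storing relations as (relation, target id) pairs keeps the entry close to the triples shown on the slide, so a whole lexicon can be read as a graph.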
Semantic entry
USem79678regulate
ontological type
  semantic type: Cause_change_of_state
  supertype: Cause_relational_change
free definition
  regulation of a function or a physiological process
example
  IL2 negatively regulates IL7
event type
  eventype: transition
domain information
  biomedicine
semantic relations
  synonymy: =====
  morpho. derivation: =====
qualia features
  agentive_cause: yes
  resulting_state: yes
Extended Qualia Structure
  formal: Usem79678regulate isa Usem64875process
  constitutive: =====
  agentive: =====
  telic: =====
regular polysemy
  regular polysemy: =====
predicative representation
  semantic predicate: PRED_regulate-1
  type of link: master
  arguments description (range; semantic role; select. restrictions):
  • arg0_regulate_1; Protoagent; Natural_Substance
  • arg1_regulate_1; Protopatient; Natural_Substance
from Nilda Ruimy
Semantic entry
UsemTH31676parotite
ontological type
  semantic type: Disease
  unification_path: [Phenomenon | Agentive]
free definition
  Infiammazione delle ghiandole parotidi (inflammation of the parotid glands)
example
  il bambino ha una parotite (the child has parotitis)
event type
  eventype: =====
domain information
  Ear-Nose-Throat
semantic relations
  USemTH31676parotite synonymy USem79528orecchione
qualia features
  agentive_cause: yes
Extended Qualia Structure
  USemTH31676parotite isa USem3868malattia
  USemTH31676parotite affects USem1788ghiandola
  USemTH31676parotite causes Usem72131gonfiore
  USemTH31676parotite caused_by USem1971virus
  USemTH31676parotite typical_of USem3593bambino
regular polysemy
  regular polysemy: =====
predicative representation
  =====
from Nilda Ruimy
Syntactic entry
NF-AT positively regulates IL2, which negatively regulates IL7
SYNU_regulateV
head properties
  verb
  auxiliary: have
  passivization: +
subcategorization frame (syntactic arguments)
  P0: subject; mandatory; NP
  P1: object; mandatory; NP
link to Semantic Unit
  USem79678regulate
from Nilda Ruimy
Syntax-semantics mapping (1)
Syntactic Unit
• Frameset: syntactic structure 1, syntactic structure 2
  • each with a. head properties and b. subcat. frame (position, synt. restr.)
Corresp. Syntax-Semantics (Corresp. SynU-SemU)
Semantic Unit
• ontological type, semant. class, domain, event type
• semant. features
• semant. relations: derivation, synonymy
• Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
• regular polysemy
• predicative represent.: predicate, type of link, arguments (sem. restr.)
from Nilda Ruimy
Regulate:
Syntax-Semantics mapping
SEMANTICS
predicative representation
  semantic predicate: PRED_regulate-1
  type of link: master
  semantic arguments description (range; semantic role; select. restrictions):
  • arg0_regulate_1; Protoagent; Natural_Substance
  • arg1_regulate_1; Protopatient; Natural_Substance
SYNTAX
subcategorization frame
  id: np-v-np
  syntactic arguments:
  • P0: subject; mandatory; NP
  • P1: object; mandatory; NP
synsem correspondence
<Correspondence
id="ISObivalent"
correspargposl="ARG0-P0 ARG1-P1 ">
</Correspondence>
from Nilda Ruimy
SYNTAX-SEMANTIC MAPPING
SYNTACTIC LEVEL
SynU_aumentare_V ‘to increase’
Frameset:
• Transitive structure: P0, P1, P2
• Intransitive structure: P0, P1
SEMANTIC LEVEL
Semantic units: SemU1_aumentare, SemU2_aumentare
Sem.Types: CAUSE_CHANGE_OF_VALUE, CHANGE_OF_VALUE
LINK PREDICATE-SEMANTIC UNIT
SEMANTIC PREDICATE
PRED_aumentare_1
• ARG0: Agent; Entity
• ARG1: Patient; Entity
• ARG2: Undersc.; Amount
from N. Ruimy
SYNTAX-SEMANTIC MAPPING
SynU_aumentare_V
Frameset:
• Transitive structure: P0, P1, P2
• Intransitive structure: P0, P1
CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME
Semantic units: SemU1_aumentare, SemU2_aumentare
CAUSE_CHANGE_OF_VALUE (transitive structure): isomorphic correspondence
<Correspondence
id="ISOtrivalent"
correspargposl="ARG0-P0 ARG1-P1 ARG2-P2">
</Correspondence>
CHANGE_OF_VALUE (intransitive structure): non-isomorphic corresp.
<Correspondence
id="AUG2to3erg9"
comment=" Augmented mapping from TWO Position
description to THREE argument description.
ARG0 not represented in syntax"
correspargposl="ARG1-P0 ARG2-P1">
</Correspondence>
PRED_aumentare
• ARG0: Agent
• ARG1: Patient
• ARG2: Undersc.
from N. Ruimy
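The two `Correspondence` elements can be read mechanically. The sketch below parses the `correspargposl` attribute into an argument-to-position map and reports which semantic arguments have no syntactic position, which is what distinguishes the isomorphic from the augmented mapping. It assumes every pair is written as `ARGn-Pm` with a hyphen and that the element carries only the attributes shown on the slide; the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def parse_correspondence(xml_text):
    """Return (id, {argument: position}) for one Correspondence element."""
    elem = ET.fromstring(xml_text)
    pairs = {}
    for item in elem.get("correspargposl", "").split():
        arg, _, pos = item.partition("-")  # "ARG0-P0" -> ("ARG0", "P0")
        pairs[arg] = pos
    return elem.get("id"), pairs


def unmapped_arguments(pairs, arguments=("ARG0", "ARG1", "ARG2")):
    """Semantic arguments that are not realised by any syntactic position."""
    return [a for a in arguments if a not in pairs]


ISO = '<Correspondence id="ISOtrivalent" correspargposl="ARG0-P0 ARG1-P1 ARG2-P2"></Correspondence>'
AUG = '<Correspondence id="AUG2to3erg9" correspargposl="ARG1-P0 ARG2-P1"></Correspondence>'

for xml_text in (ISO, AUG):
    ident, pairs = parse_correspondence(xml_text)
    print(ident, pairs, unmapped_arguments(pairs))
```

For the intransitive reading the unmapped list contains ARG0, matching the comment "ARG0 not represented in syntax".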
Relations and Predicates
Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>
• SemU Sell V
• SemU Seller N (relation: Is_the_agent_of)
• SemU Sale N (relation: Event_noun)
“Predicate - semantic unit(s)” link
& Relations
PRED_ACCUSARE <ARG0>, <ARG1>, <ARG2>
• accusare (to accuse): master
• accusa (accusation): process nominalisation; Event_noun
• accusatore (accuser): agent nominalisation; Is_the_agent_of
• accusato (accused): patient nominalisation
from Nilda Ruimy
The SIMPLE ontology
Simple Ontology:
multidimensional type hierarchy based on both
hierarchical and non-hierarchical conceptual relations

In the SIMPLE ontology, types are not mere
labels but the repository of a specific set of
structured semantic information
from Nilda Ruimy
The SIMPLE ontology
TOP
• CONSTITUTIVE: Part, Group, Amount
• AGENTIVE: Cause
• TELIC
• ENTITY
  • CONCRETE_ENTITY: Location, Material, Artifact Material, Artifact, Furniture, Food, Clothing, Physical Object, Container, Organic Object, Artwork, Instrument, Money, Living Entity, Human, Substance, Animal, Vegetal Entity, Vehicle, Semiotic Artifact
  • PROPERTY: Quality, Psych Property, Physical Property, Social Property
  • ABSTRACT_ENTITY: Domain, Time, Moral Standards, Cognitive Fact, Mvmt of Thought, Institution, Convention, Abstract Location
  • REPRESENTATION: Language, Sign, Information, Number, Unit of measure, Metalanguage
  • EVENT: Phenomenon, Aspectual, State, Act, Psychological_event, Change, Cause_change
    • subtypes shown: Weather verbs, Disease, Stimuli, Cause Aspect., Exist, Rel. State, Cognitive Event, Experience Event, Rel. Change, Cause Rel. Change, Change Possession, Cause Change Location, Move, Change Location, Cause Natural Transition, Cause Act, Natural Transition, Creation, Speech Act, Acquire Knowledge, Give Knowledge, Non Rel. Act, Relational Act
from Nilda Ruimy
Ontology of Structured Semantic Types:
a Template
• Schema providing a set of structured information crucial to the definition of a semantic type
• Interface between ontology & lexicon
• Guide for the lexicographer
SemU: Identifier of the Semantic Unit
Related SynU: Identifier of the Syntactic Unit the SemU is related to
IWN Base Concept: Number of the corresponding ItalWordNet base concept
Template_Type: [Container]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: General
Semantic Class: Link to the LexiQuest (or any other ontology)
Gloss: Lexicographic gloss
Predicative Representation: Predicate associated to the SemU and its argument structure [container_pred (arg0)]
Arg. Selectional Restrictions: Selectional restrictions (Arg0-HeadQuantified-Substance)
Derivation: Derivational relations between SemUs
Qualia_Formal: isa (1, <container> or <hyperonym>)
Qualia_Agentive: created_by (1, <Usem>: [CREATION]) //definitorial//
Qualia_Constitutive: made_of (1, <Usem>) //optional//
                     has_as_part (1, <Usem>) //optional//
                     contains (1, <Usem>)
Qualia_Telic: used_for (1, <contain>) //definitorial//
              used_for (1, <measure>) //optional//
Synonymy: Synonyms of the SemU //optional//
Regular Polysemy: [Amount] [Container]
Semantic type in the SIMPLE Ontology
Not just a label but rather a classificatory device consisting of a cluster of structured
semantic information
Type assignment means endowing a word-sense with a structured set of semantic features and relations with a view to:
• distinguishing it from other senses of the same word
• expressing its similarity with other words
• expressing its relationships to other words
• drawing inferences from this information
Each semantic type is associated with a template, i.e. a schematic structure that contains a cluster of type-defining properties and imposes constraints on lexical items for type membership
Templates: interface between Ontology and Lexicon
Template-driven encoding methodology ensures internal and cross-lexicon consistency
from Nilda Ruimy
Template for the sem. type ‘Instrument’
SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
ontological information
  Template_Type: Instrument
  Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
  Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
  Domain: Domain information
  Semantic Class: One of WordNet Classes
  Gloss: Lexicographic definition
  Event Type: Type of event (state, process, transition)
predicative representation
  Predicative Representation: Predicate associated with the SemU, and its argument structure
  Selectional Restr.: Selectional restrictions on the arguments
  Derivation: Derivational relations between SemUs
extended qualia structure
  Formal: Usem_1 isa Usem_2 [Artifact]
  Agentive: Usem_1 created_by Usem_2 [Creation]
  Constitutive:
  • Usem_1 made_of Usem_2 [Substance] OPTIONAL
  • Usem_1 has_as_part Usem_2 [Artifact] OPTIONAL
  Telic: Usem_1 used_for Usem_2 [Event]
  Synonymy: Synonyms of the SemU
  Collocates: Collocate information
  Complex: Polysemous class of the SemU
from Nilda Ruimy
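Since a template imposes constraints on type membership, an encoded entry can be checked against it. The sketch below encodes the qualia part of the ‘Instrument’ template and reports missing mandatory relations or targets of the wrong semantic type. The function name, the table layout and the semantic types given to the target units are assumptions for illustration; a real checker would also test type subsumption in the ontology rather than plain equality.

```python
# relation -> (semantic type required of the target, mandatory?)
INSTRUMENT_TEMPLATE = {
    "isa": ("Artifact", True),
    "created_by": ("Creation", True),
    "made_of": ("Substance", False),     # OPTIONAL in the template
    "has_as_part": ("Artifact", False),  # OPTIONAL in the template
    "used_for": ("Event", True),
}


def check_entry(relations, target_types, template=INSTRUMENT_TEMPLATE):
    """relations: relation -> target SemU id; target_types: SemU id -> semantic type."""
    problems = []
    for rel, (required_type, mandatory) in template.items():
        target = relations.get(rel)
        if target is None:
            if mandatory:
                problems.append(f"missing {rel}")
        elif target_types.get(target) != required_type:
            problems.append(f"{rel}: {target} is not typed {required_type}")
    return problems


# Hypothetical semantic types for the targets of the 'vaporizzatore' entry
TYPES = {
    "Usem3479apparecchio": "Artifact",
    "Usem61633pulsante": "Artifact",
    "UsemD387fabbricare": "Creation",
    "UsemD66019nebulizzare": "Event",
}
ENTRY = {
    "isa": "Usem3479apparecchio",
    "has_as_part": "Usem61633pulsante",
    "created_by": "UsemD387fabbricare",
    "used_for": "UsemD66019nebulizzare",
}

print(check_entry(ENTRY, TYPES))
```

An empty result means the entry satisfies the template; dropping `used_for` from `ENTRY` would yield a "missing used_for" message.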
Top
• Formal: Is_a, ...
• Constitutive: Is_a_part_of, .., Property (Contains, ...)
• Agentive: Created_by, Agentive_cause, ...
• Telic: Indirect_telic, Purpose, Activity (Is_the_habit_of, ..), Instrumental (Used_for, Used_as, ..)
100 Rels.
The targets of relations identify:
• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU
Qualia Structure
• One of the four levels of semantic representation in the theory of Generative Lexicon
• Consists of four qualia roles encoding orthogonal dimensions of meaning:
  • formal role (general identification)
  • constitutive role (composition)
  • agentive role (origin)
  • telic role (function)
Extended Qualia Structure
Formal: isa, antonym_comp, antonym_grad, mult_opposition
Agentive (subgroups: AGENTIVE, ARTIFACTUAL AGENTIVE): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source, created_by, derived_from
Constitutive (subgroups: CONSTITUTIVE, PROPERTY, LOCATION): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, regulates, is_regulated_by, feeling, ….., is_in, lives_in, typical_location
Telic (subgroups: INSTRUMENTAL, TELIC, ACTIVITY, DIRECT TELIC): used_for, used_as, used_by, used_against, indirect_telic, purpose, is_the_activity_of, is_the_ability_of, is_the_habit_of, object_of_activity
Example pairs:
• disgusto, provare (disgust, feel)
• casa, costruire (house, build)
• mohair, capra (mohair, goat)
• proiettile, colpire (projectile, hit)
• bisturi, chirurgo (lancet, surgeon)
• medico, curare (doctor, cure)
• pane, farina (bread, flour)
• senatore, senato (senator, senate)
• manubrio, bicicletta (handlebar, bicycle)
“Extended” Qualia Structure
Formal: is_a, antonym_comp, antonym_grad, mult_opposition
Constitutive (subgroups: CONSTITUTIVE, PROPERTY, LOCATION): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling, is_in, lives_in, typical_location
Agentive (subgroups: AGENTIVE, ARTIFACTUAL AGENTIVE): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source, created_by, derived_from
Telic (subgroups: INSTRUMENTAL, TELIC, ACTIVITY, DIRECT TELIC): used_for, used_as, used_by, used_against, indirect_telic, purpose, is_the_activity_of, is_the_ability_of, is_the_habit_of, object_of_activity
NEW!: regulates, is_regulated_by, …..
Example pairs:
• T-cell, Blood Stem Cell
• Ribose, Nucleotide
• Catalyze, Enzyme
Meaning dimensions expressed by
Qualia relations
botte (barrel): traditional dictionary definition
• recipiente (container) → Formal: isa
• di legno (of wood) → Constitutive: made_of
• fatto (made) → Agentive: created_by
• di doghe arcuate tenute unite da cerchi di ferro (of curved staves held together by iron hoops) → Constitutive: made_of
• che serve per la conservazione e il trasporto (used for storage and transport) → Telic: used_for
• di liquidi, specialmente vino (of liquids, especially wine) → Constitutive: contains
from Nilda Ruimy
…by using Lexical Resources
→ Multidimensional Knowledge Bases
Ala (wing): four semantic units
• SemU: 3232, Type: [Part], Parte di aeroplano (part of an aeroplane): part_of aeroplano; used_for volare; agentive fabbricare
• SemU: 3268, Type: [Part], Parte di edificio (part of a building): part_of edificio; agentive fabbricare
• SemU: D358, Type: [Body_part], Organo degli uccelli (organ of birds): part_of uccello; used_for volare
• SemU: 3467, Type: [Role], Ruolo nel gioco del calcio (role in the game of football): isa giocatore; member_of squadra
Semantic Multidimensionality
& NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to
access multidimensional aspects of word meaning:
Extended Qualia Relations
• Is_a_part_of: la pagina del libro (the page of the book)
• Member_of: il difensore della Juventus (Juventus fullback)
• Telic: il suonatore di liuto (the lute player)
• Made_of: il tavolo di legno (the wooden table)
Disambiguation = Interpretation of conceptual relations in context
• duna di sabbia (sand dune) → made_of
• bicchiere di birra (glass of beer) → contains (liquid)
• fetta di pane (slice of bread) → is_a_part_of
ONTOLOGY: …, SUBSTANCE, ARTIFACTUAL_DRINK, …
from Nilda Ruimy
Domain - Semantic class
[Figure: words of the food and cooking domain grouped by semantic class and linked by qualia relations (TELIC: Used_for, Object_of_the_activity; AGENTIVE: Created_by)]
Semantic classes: NATURAL_SUBSTANCE, FLAVOURING, VEGETAL_ENTITY, FOOD, FRUIT, VEGETABLES, SUBSTANCE_FOOD, ARTIFACT_FOOD, INSTRUMENT, CONTAINER, FURNITURE, BUILDING, PROFESSION; feature: +edible
Words: zucchero, alloro, tartufo, carne, mela, carota, coniglio, arrosto, mestolo, pentola, friggitrice, bollitore, pesciera, forchetta, posata, tavola, ristorante, cuoco, mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, ……
from Nilda Ruimy
Noun Compounds/Complex Nominals …are pervasive
• There is a motivation in most N+N constructions:
  • the context provides it
• The FrameNet (SIMPLE) way
  • appeal to specific frame structures (qualia structures) associated with the head noun,
  • determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word
• “container”: complex nominals can specify:
  • material (aluminium c., glass c., …)
  • contents (food c., trash c., …)
  • size (3 quart c., …)
  • function (shipping c., storage c., …)
  • ...
Noun Compounds/Complex Nominals
& multidimensional semantic approaches
a. FrameNet
“Container” Frame Structure: Frame Elements:
• Material: aluminum container, glass c., metal c., tin c.
• Contents: food container, beverage c., trash c., water c., milk c., fuel c.
• Size: 3 quart container
• Function: shipping container, storage c.
b. SIMPLE
Qualia Relations of "container" as used in compounds:
• Constitutive: made_of [MATERIAL]: aluminum container, glass c., metal c., tin c.
• Telic: contains [ENTITY]: food container, beverage c., trash c., water c., milk c., fuel c.
• Constitutive: size [QUANTITY]: 3 quart container
• Telic: is_used_for [EVENT]: shipping container, storage c.
Complex Nominals
E.g. knife (coltello) triggers:
• a “cutting frame” (FrameNet)
• specific (SIMPLE) dimensions of meaning
SIMPLE Extended Qualia structure for the interpretation of the semantic relation betw. Ns (internal relational structure of MWE)
butcher’s knife (coltello da macellaio) → TELIC (used_by) Y [Human] → PPda
plastic knife (coltello di plastica) → CONST (made_of) X [Material] → PPdi
table knife (coltello da tavola) → TELIC (used_in) Z [Location] → PPda
hunting knife (coltello da caccia) → TELIC (used_in_activity) E [Activity] → PPda
piatto di legno (wooden plate) → CONST (made_of) X [Material] → PPdi
piatto di pasta (plate of pasta) → CONST (contains) X [Food] → PPdi
→ PP disambig.
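The interpretation step on this slide can be sketched as a table lookup: the preposition plus the semantic type of the modifier noun selects a qualia relation. The rule table reproduces the six examples of the slide; the mini-lexicon `TYPES` and the function `interpret` are hypothetical stand-ins for a lookup in the SIMPLE lexicon, not an existing API.

```python
# (preposition, semantic type of the modifier) -> (qualia role, relation)
RULES = {
    ("da", "Human"): ("TELIC", "used_by"),
    ("da", "Location"): ("TELIC", "used_in"),
    ("da", "Activity"): ("TELIC", "used_in_activity"),
    ("di", "Material"): ("CONST", "made_of"),
    ("di", "Food"): ("CONST", "contains"),
}

# Hypothetical mini-lexicon: modifier noun -> semantic type
TYPES = {
    "macellaio": "Human",
    "plastica": "Material",
    "tavola": "Location",
    "caccia": "Activity",
    "legno": "Material",
    "pasta": "Food",
}


def interpret(preposition, modifier):
    """Relation between head and modifier, or None if no rule applies."""
    return RULES.get((preposition, TYPES.get(modifier)))


for head, prep, mod in [("coltello", "da", "macellaio"),
                        ("piatto", "di", "legno"),
                        ("piatto", "di", "pasta")]:
    print(head, prep, mod, "->", interpret(prep, mod))
```

The same preposition "di" yields made_of or contains depending only on the modifier's semantic type, which is the PP disambiguation the slide points at.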
SIMPLE:
possible extension
Deverbal nominalisation:
o noun murder (uccisione, delitto, omicidio; different sem. pref.)
  → PPdi
  → PPda_parte_di, di
  PRED: MURDER (uccidere)
  ARG1: agent [Hum/Anim?]
  ARG2: patient [Hum/Anim?]
  MOD1: instr [Weapon]
  MOD2: means [Action]
  MOD3: ... […]
o verb murder (uccidere)
  → subj:NP
  → obj:NP
As if it were a Situation:
  :instr: PPcon [Weapon] (knife m., con coltello)
  :means: PPper [Action] (strangulation m., per strangolamento)
  :loc: Ppploc|di [Location] (Kent State murders, nel ...)
  :time: Ppptime|di [Time] (1983 murders, del 1983)
Ontologisation of SIMPLE

Automatically converting and enriching a computational lexicon
into a formal Ontology
For NLP semantic tasks
Potential of ontologies in NLP as Backbone in LKBs
Pivot in multilingual architectures (e.g. KYOTO)
Reasoning capabilities

Ontologisation of SIMPLE into OWL
Conversion of the SIMPLE ontology
Bottom-up enrichment: promoting lexicon knowledge to the ontology level
Language independent knowledge from Italian lexico-semantic information
from Antonio Toral
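As a rough picture of the "conversion of the SIMPLE ontology" step, the sketch below writes a type hierarchy as OWL classes in Turtle syntax. It is not the conversion procedure used in the work cited on this slide: the namespace `http://example.org/simple#`, the function name and the three-type hierarchy are assumptions for illustration, and only the `owl:` and `rdfs:` namespace IRIs are the standard W3C ones.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix simple: <http://example.org/simple#> .\n"  # hypothetical namespace
)


def to_turtle(supertypes):
    """supertypes: semantic type -> its supertype (None for a root type)."""
    lines = [PREFIXES]
    for sem_type in sorted(supertypes):
        parent = supertypes[sem_type]
        stmt = f"simple:{sem_type} a owl:Class"
        if parent is not None:
            # Each semantic type becomes a subclass of its supertype.
            stmt += f" ;\n    rdfs:subClassOf simple:{parent}"
        lines.append(stmt + " .")
    return "\n".join(lines)


# A three-type fragment of the hierarchy, chosen for illustration
HIERARCHY = {
    "Entity": None,
    "Concrete_entity": "Entity",
    "Abstract_entity": "Entity",
}

print(to_turtle(HIERARCHY))
```

The bottom-up enrichment mentioned on the slide would add further axioms derived from lexicon entries; that part is not sketched here.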
Named Entity Repository

Automatically build LRs from existing LRs and Web 2.0 semi-
structured resources. Combine:
Authoritative lexicographic experience → precision
Collaborative “wisdom of the crowds” → recall

Case study: Multilingual NE repository from LRs (en WN, es
WN, it SIMPLE) & Wikipedia
NEs linked to three LRs and two ontologies (SUMO, SIMPLE)
Interoperable resource: LMF compliant
Applied to cross-lingual QA (validate answers): prec. +16.3%
from Antonio Toral
Use of SIMPLE Lexicon & Ontology
for Time and Event detection/annotation
Different PoS may realise an event: verbs, nouns, adjectives, prepositional phrases
The SIMPLE Lexicon helps in identifying & classifying Events (eventive nouns & adjectives) → in a 10K-word annotation experiment:
  each event is associated with an Ontological Type
  the Event-Type from the SIMPLE Ontology can be used as a default value to provide event composition, and consequently to instantiate a temporal representation for each Event
  improvement in both identification & classification of Events by annotators: 81.17% accuracy (vs. 72.35%) and K-coefficient = 0.84 (vs. 0.7)
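The slide reports a K-coefficient without saying which agreement statistic was used. Purely as an illustration, the sketch below computes Cohen's kappa for two annotators; the choice of statistic, the function name `cohen_kappa` and the toy labels are assumptions and do not reproduce the data of the experiment.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy annotation of six tokens as EVENT / NONE (made-up labels).
ann1 = ["EVENT", "EVENT", "NONE", "NONE", "EVENT", "NONE"]
ann2 = ["EVENT", "NONE", "NONE", "NONE", "EVENT", "NONE"]
print(round(cohen_kappa(ann1, ann2), 3))
```

Kappa corrects raw accuracy for agreement expected by chance, which is why the slide quotes both figures.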
[Figure: Morpho-Syntactic Analysis; SIMPLE Lexicon; Event Detection & Classification]
from Tommaso Caselli
Mapping SIMPLE Semantic Types to TimeML Classes
from Tommaso Caselli
GLML – Generative Lexicon Markup Language
with James Pustejovsky, Olga Batiukova, Anna Rumshisky, Marc Verhagen

Annotating texts with Argument Selection, Argument Coercion, &
Qualia Roles
The corpus brings reality to the model and provides statistical cues to improve language models
Lexical semantic info, such as type coercion/selection, is required for applications such as WSD, categorisation, IR (query reformulation, filtering…), IE (coreference resolution, relation extraction…), entailment, …
Predicate – Argument constructions
• Predicate Sense Disambiguation
• Argument selection: type selection/coercion
• Qualia role/relation selection
Modification constructions
• Noun Sense Disambiguation
• Qualia role/relation selection in Adjectival Modification
• Qualia role/relation selection in Nominal Modification
Complex Types
• Type selection in modification of Dot Objects
from Valeria Quochi
Using Existing Resources for Italian
SIMPLE Lexicon & Ontology / ItalWordNet



Sense Disambiguation
Type selection /coercion
Type selection in Dot Objects
SIMPLE Extended Qualia Structure
Selection of Qualia roles/relations, e.g.
Constitutive Relations
▪ e.g. Is_a_part_of, Is_a_member_of
Telic Relations
▪ e.g. Purpose, Object_of_the_activity
Agentive Relations
▪ e.g. Source, Result_of
from Valeria Quochi
Ontology & Lexicon
Today we can easily say that ontology learning, i.e. the practical feasibility of
supporting knowledge acquisition in a domain, depends on developing automatic
methods for acquiring conceptual representations from natural language text
Semantic Web initiatives are also focussing on the building of ontological
representations from texts, and in this respect show a large amount of conceptual
overlap with the notion of a dynamic lexicon
Lexicon & Corpus
Based on various experiences, and as a work strategy for lexical/textual resources:
We should push towards innovative types of lexicons: a sort of ‘example-based living lexicons’ that share properties of both lexicons and corpora
In such a lexicon redundancy is not a problem, but rather a benefit
BUT… Mismatch between LRs and LT

Often a gap between advancement in LRs and LT

Either adequate LRs are missing … or there are no systems
able to use “knowledge intensive” LRs effectively
Shortcomings:
  lack of usable implementations fully exploiting new types of LRs
  LR claims are not empirically evaluated
A parallel evolution of R&D for both LRs and LT is needed
Phenomena to be represented/What is missing??
from Ed Hovy
1. Bracketing / grouping of predications around entities (basic frame structure)  done
2. Concepts:  done??
   Choice of meaning/sense, with frames in some cases
   Definition and nature of concept repository / ontology
   Major high-level concept groupings and classes
3. Labels on (dependency) arcs (thematic roles, types of attributes, modifiers, etc.)  done
4. Coreference (explicit and indirect):
   intra-sentential
   intersentential and cross-documents
5. Information Structure and Discourse structure:
   theme-rheme and topic-focus
   salience
   coordination
   nonsemantic inter-clausal relations (RST’s interpersonal ones)
   etc.
done??
Phenomena to be represented/What is missing??
from Ed Hovy
6. Pragmatics:
   Speech Acts
   Participants and audience modeling
   Modality:
      Epistemic modalities
      Deontic modalities
      Personal attitudes
   done??
   Deixis / reference to external world (or databases)
   Social register, genre, and style
   Time (Reichenbach)
   Space (OWL upper ontology of space, etc.)  done??
   Cardinality
   Quantification
   Manner
   Degree and comparison
   Possession
   Existentials
   Copular constructions
   Conditionals
   Consequences and inference
   Co-text and intertextuality (including formatting and other media)
   Meaning of prosody and other speech-related effects
7. Polarity (including scoping)
8. Microtheories (many of them to be incorporated elsewhere)
Towards a common encoding policy???
Lexicon and Corpus:
a multi-faceted interaction
L→C  tagging
C→L  frequencies (of different linguistic “objects”)
C→L  proper nouns, acronyms, …
L→C  parsing, chunking, …
C→L  training of parsers
C→L  lexicon updating
C→L  “collocational” data (MWE, idioms, gram. patterns ...)
C→L  “nuances” of meanings & semantic clustering
C→L  acquisition of lexical (syntactic/semantic) knowledge
L→C  semantic tagging/word-sense disambiguation (e.g. in Senseval)
C→L  more semantic information on LE
C→L  corpus based computational lexicography
C→L  validation of lexical models
C→L  …
L→C  ...
BUT … Dynamic lexicons
▪ Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries
▪ Towards a flexible model of dynamic lexicon
   extending the expressiveness of a core static lexicon
   adapting to the requirements of language in use as attested in corpora
   with semantic clustering techniques, etc.
Convert the extreme flexibility & multidimensionality of meaning into large-scale and exploitable (VIRTUAL?) resources
a “Lexicon & Corpus” together
Sort of Example-based Lexicon
Verb/Arguments Interaction
at the Lexical-Semantic Level
Verb meaning → determines/selects the ‘sense’ of its subject and/or direct object
e.g. arrestare, both ‘to arrest’ & ‘to stop’, selects direct objects which have themselves, or receive from the verb, a negative connotation

Dobj            Sem.type          Conn.Feat.
ladro1          agent_temp_act    neg
spacciatore1    agent_temp_act    neg
trafficante1    agent_temp_act    neg
traffico2       act               neg
invasione1      cause_act         neg
massacro1       cause_nat_trans   neg
inflazione1     event             neg
pregiudicato1   human             neg
balordo1        human             neg
maniaco1        human             neg
strozzino1      agent_temp_act    neg
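The table can be read as a selection restriction on the direct object: the negative connotation is either already in the noun's entry or is supplied by the verb. A minimal sketch of that reading, assuming a toy lexicon that copies a few rows of the table; the dictionary layout, the function name `object_connotation` and the neutral entry `passante1` are hypothetical and serve only the illustration:

```python
# Rows copied from the table: sense id -> (semantic type, connotation feature).
LEXICON = {
    "ladro1": ("agent_temp_act", "neg"),
    "traffico2": ("act", "neg"),
    "invasione1": ("cause_act", "neg"),
    "inflazione1": ("event", "neg"),
    "pregiudicato1": ("human", "neg"),
    # Hypothetical neutral entry, added only to show the other branch.
    "passante1": ("human", None),
}

# Verbs assumed to require a negatively connoted direct object.
NEG_SELECTING_VERBS = {"arrestare"}


def object_connotation(verb, dobj_sense):
    """Return (connotation, source) for a direct object of `verb`."""
    _sem_type, conn = LEXICON[dobj_sense]
    if conn == "neg":
        return "neg", "inherent"
    if verb in NEG_SELECTING_VERBS:
        # No negative feature of its own: the object receives it from the verb.
        return "neg", "from_verb"
    return conn, "inherent"


print(object_connotation("arrestare", "ladro1"))
print(object_connotation("arrestare", "passante1"))
```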
Complexity of Word Sense in context:
many potential clues
A particular meaning (of a verb) may be selected by:
A specific syntactic pattern
  comprendere + that-clause = ‘to understand’ [not = ‘to include’]
  aprire + PP introduced by a (preferably with “human” head) = ‘to be ready, open, well disposed towards someone’ (e.g. Cossiga apre a La Malfa)
The semantic type of subjects, dir. objects, ind. objects
  human subject (if not collective type) always selects the meaning ‘to understand’ of the verb comprendere
The domain of use
  perseguire un reato ‘to prosecute a crime’ (domain=law)
A specific modifier
  perseguire penalmente ‘to prosecute at the penal level’, not ‘to pursue (a goal)’
  comprendere benissimo ‘to understand very well’, not ‘to include’
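The clues listed above can be read as a set of tests over a parsed context. The sketch below is a hypothetical encoding for comprendere only; the `Context` fields and the function `select_sense` are assumptions made for illustration, and the fallback to `None` reflects that none of these tests is guaranteed to fire.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical summary of the parsed context of one verb occurrence."""
    complement: Optional[str] = None    # e.g. "that-clause", "NP"
    subject_type: Optional[str] = None  # e.g. "human", "collective", "artifact"
    modifier: Optional[str] = None      # e.g. "benissimo"


def select_sense(ctx: Context) -> Optional[str]:
    """Pick a sense of 'comprendere' from the clues on the slide, else None."""
    if ctx.complement == "that-clause":  # specific syntactic pattern
        return "to understand"
    if ctx.subject_type == "human":      # semantic type of the subject (not collective)
        return "to understand"
    if ctx.modifier == "benissimo":      # specific modifier
        return "to understand"
    return None                          # no clue fired: the sense stays open


print(select_sense(Context(complement="that-clause")))
print(select_sense(Context(subject_type="collective", complement="NP")))
```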
Two different senses of a lemma cannot be selected simultaneously in the same context
BUT…
Complexity of Word Sense identification
The problem:
  no sure tests
  only partial validity & not completely discriminating
  Moreover, it’s not easy to predict when to apply which test
Word Sense Disambiguation (WSD)
▪ in different contexts is better achieved using info types at different levels of linguistic description:
  morphosyntactic/syntactic/semantic/pragmatic…, even multilingual
  BUT → it is a priori unpredictable where the “clue” is
Complexity of Word Sense & use of Corpora

The availability of large quantities of semantically tagged corpora helps to
  analyse the impact of different “clues” to perform WSD in different contexts
  study the interaction of clues belonging to different levels of linguistic description, to improve WSD strategies
⇒ not just statistics!!
Automatically acquire syntactic, semantic, collocational (lexical) ‘indicators’
  which can help in the identification of a word-sense
  ‘List’ them in the lexicon??
BUT…
Problem of regular polysemy
… and more
▪ actual occurrence of “two senses” in the same context…
▪ e.g. both act & result (for deverbal nouns, etc.)
   In una comunicazione al Parlamento la Commissione ha illustrato le sue riflessioni su …
   (‘In a communication to Parliament the Commission presented its reflections on …’)
   Berlusconi dovrà scegliere se fare l’uomo di governo o mantenere il controllo delle sue tv
   (‘Berlusconi will have to choose whether to be a man of government or to keep control of his TV channels’)
Underspecified meanings?
▪ maybe subsuming more granular distinctions, to be used only when disambiguation is feasible/useful in a context
Theoretical language, “invented” by lexicographers/linguists who have/want to classify in disjoint classes, vs.
▪ actual usage → a “continuum”
▪ resistant to clear-cut disjunctions
▪ by necessity ambiguous wrt imposed classifications
In a “Senseval” framework …
… what cannot be easily encoded at the Lexical-Semantic Level
e.g.
When sense interpretation requires appeal to extra-linguistic knowledge (not to be captured at the lexical-semantic level of description)
When corpus annotation either diverges from the lexical resource or further specifies it
  words acquiring a specific sense, strictly dependent on the context
    la donna Pauline Collins, che ha già visto arrestare il marito dai tedeschi,…
    (‘the woman, Pauline Collins, who has already seen her husband arrested by the Germans, …’)
  variety of nuances of a verb, e.g. according to co-occurring dir.obj. sem-type
  metaphors extended to an entire sentence
    l’auto verde arriva sul tavolo del governo
    (lit. the green car arrives on the table of the government)
  ...
Not all these “shifts of meanings” can/must be captured through lexical-semantic annotation
Wrt Senseval
jargon, neologisms, evaluative suffixation, ‘titles’, …

  vetturetta
  minitaxi
  fumantino (adj., una persona fumantina ‘a hot-tempered person’)
  komeinista
  …
  Primula rossa (= mafia boss)
  Scarpa d'oro (= a good player)
  …
Not in any lexicon…
a semantic type is easier to assign than a word-sense in a lexicon
Compounds and idioms
uscire di scena
farla franca
fare fuoco
andare in onda
…
fare [in tempo]
andare [a piedi]
essere [in testa] (= to be first)
vincere [per un soffio]
partire [a razzo]
Croce Rossa
Caschi Blu
conflitto a fuoco
atletica leggera
famiglia bene
un bagno di folla
…
Where is the boundary of the MWE?
"andare_a_piedi" vs. andare (Pos V) a_piedi (Pos Adv.loc)?
Locutions and Figurative usages
per carità
in questione
per caso
in lizza
a volontà
a buon mercato
…
ci mancherebbe!
c'è mancato poco
…
due lavoratori su tre sono a casa (= to be unemployed)
[the collocation with ‘lavoratori’ disambiguates the expression]
uomo [di polso]
zona medaglia d'oro (= among the first)
a cielo aperto (discarica a ..)
la bella vita (fare …)
…
If the individual components are annotated, the semantic contribution of the MWE is lost
acquistare un oggetto a buon (Pos A) mercato (Pos S) !!
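One common way to avoid tagging the components separately is to merge known multiword expressions into single tokens before annotation. A minimal sketch using greedy longest-match lookup; the MWE list is a small sample taken from these slides and the function name `merge_mwes` is illustrative. It does not settle the boundary question (e.g. whether andare belongs to the unit in andare a piedi).

```python
# Sample MWEs from the slides, stored as tuples of lower-case tokens.
MWES = {
    ("a", "buon", "mercato"),
    ("a", "piedi"),
    ("per", "caso"),
    ("in", "questione"),
}
MAX_LEN = max(len(m) for m in MWES)


def merge_mwes(tokens):
    """Join known MWEs into single underscore-linked tokens (greedy, longest match first)."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in MWES:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No MWE starts at this position: keep the single token.
            out.append(tokens[i])
            i += 1
    return out


print(merge_mwes("acquistare un oggetto a buon mercato".split()))
```

After merging, a tagger sees `a_buon_mercato` as one unit instead of assigning Pos A to buon and Pos S to mercato.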
Usual issues:
“Is there a fixed set of senses?”
or “Do senses exist as separate objects?”
▪ Criteria for sense distinction are very application-dependent
▪ greater vs. lesser granularity depends on the task/domain/situation/etc.
▪ i.e. the communication purpose
& there is no inherently “true” (upper or lower) limit to the granularity ...
▪ A “checklist theory of meaning” is impossible: meaning as a “piece of information” with an autonomous status independent of its use
Computational resources should provide
▪ multi-dimensional information
▪ the highest expressiveness in terms of sense-discriminating power
▪ contextual information
Are we dealing with semantic annotation in the right way??
Divergences between
Lexicon encoding & Corpus annotation
In the lexicon
senses are “de-contextualized” (a necessity to capture generalizations)
sense discrimination must be kept “under control” → clustering (manually or automatically)
In the corpus sense annotation task
contextualization plays a predominant role
a range of pragmatic issues comes into play
corpus analysis per se would lead to excessive granularity of sense distinctions
Capture just the core basic distinctions in a core lexicon &
Acquire additional, more granular info (usu. of collocational nature) from corpora
to be encoded within the broader senses, e.g. to help translation
Between LRs and Linguistics:
A consequence of the corpus-based approach is that
▪ it compels us to break hypotheses too easily taken for granted in mainstream linguistics
▪ In actual usage, a characteristic of language is to display many properties which behave as a continuum, not as “yes/no” properties
▪ The same holds true for so-called “rules”: we more frequently find “tendencies” towards a rule than precise rules
▪ Many of the theoretical rules appear to be simplifications or idealisations, in fact dispelled by real usage
▪ A number of dichotomies must then be reconciled
Lesson learned: [IN-]Adequacy of Lexical resources
There is a long way to go before we can recognise & integrate the many dimensions relevant to content interpretation
A number of “dichotomies” not as opposite views,
but as complementary perspectives
▪ Language as a continuum:
rules                    vs.  tendencies
absolute constraints     vs.  preferences
discreteness             vs.  continuum/gradedness
theoretical/potential    vs.  actual
intuition/introspection  vs.  empirical evidence
theory-driven            vs.  data-driven
symbolic                 vs.  statistical
the right-hand side must be highlighted, and then the two combined
Choices on the syntagmatic axis are pervasive
Lexicon & Corpus must converge