Putting Meaning Into Your Trees
Martha Palmer
University of Pennsylvania
Columbia University
New York City
January 29, 2004
Outline
 Introduction
 Background: WordNet, Levin classes, VerbNet
 Proposition Bank – capturing shallow
semantics
 Mapping PropBank to VerbNet
 Mapping PropBank to WordNet
Ask Jeeves – A Q/A, IR ex.
What do you call a successful movie? Blockbuster
 Tips on Being a Successful Movie Vampire ... I shall call
the police.
 Successful Casting Call & Shoot for ``Clash of Empires''
... thank everyone for their participation in the making of
yesterday's movie.
 Demme's casting is also highly entertaining, although I
wouldn't go so far as to call it successful. This movie's
resemblance to its predecessor is pretty vague...
 VHS Movies: Successful Cold Call Selling: Over 100
New Ideas, Scripts, and Examples from the Nation's
Foremost Sales Trainer.
Ask Jeeves – filtering w/ POS tag
What do you call a successful movie?
 Tips on Being a Successful Movie Vampire ... I shall call
the police.
 Successful Casting Call & Shoot for ``Clash of Empires''
... thank everyone for their participation in the making of
yesterday's movie.
 Demme's casting is also highly entertaining, although I
wouldn't go so far as to call it successful. This movie's
resemblance to its predecessor is pretty vague...
 VHS Movies: Successful Cold Call Selling: Over 100
New Ideas, Scripts, and Examples from the Nation's
Foremost Sales Trainer.
Filtering out “call the police”
Syntax: call(you, movie, what) ≠ call(you, police)
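To make the filtering idea concrete, here is a minimal sketch, not the Ask Jeeves system, of comparing predicate-argument tuples so that a query like call(you, movie, what) accepts a "blockbuster" answer but rejects call(you, police). All names and the matching rule are illustrative assumptions.

```python
# Illustrative only: shallow-semantics filtering via predicate-argument tuples.

def predicate(verb, *args):
    """Build a simple (verb, args) representation."""
    return (verb, tuple(args))

def compatible(query, candidate):
    """A candidate matches only if the predicate and its arguments line up."""
    q_verb, q_args = query
    c_verb, c_args = candidate
    if q_verb != c_verb or len(q_args) != len(c_args):
        return False
    # Treat 'what' as a wh-placeholder that matches anything in that slot.
    return all(q == "what" or q == c for q, c in zip(q_args, c_args))

query = predicate("call", "you", "movie", "what")
good = predicate("call", "you", "movie", "blockbuster")
bad = predicate("call", "you", "police")

print(compatible(query, good))  # True  – same frame, arguments align
print(compatible(query, bad))   # False – call(you, police) is a different frame
```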
English lexical resource is required
 That provides sets of possible syntactic
frames for verbs.
 And provides clear, replicable sense
distinctions.
AskJeeves: Who do you call for a good
electronic lexical database for English?
WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
WordNet – Princeton (Miller 1985, Fellbaum 1998)
 On-line lexical reference (dictionary)
 Nouns, verbs, adjectives, and adverbs grouped into
synonym sets
 Other relations include hypernyms (ISA), antonyms,
meronyms
 Limitations as a computational lexicon
 Contains little syntactic information
 No explicit predicate argument structures
 No systematic extension of basic senses
 Sense distinctions are very fine-grained (inter-tagger agreement, ITA, only 73%)
 No hierarchical entries
Levin classes (Levin, 1993)
 3100 verbs, 47 top level classes, 193 second and third level
 Each class has a syntactic signature based on alternations.
John broke the jar. / The jar broke. / Jars break easily.
John cut the bread. / *The bread cut. / Bread cuts easily.
John hit the wall. / *The wall hit. / *Walls hit easily.
Levin classes (Levin, 1993)
 Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes
 Each class has a syntactic signature based on alternations.
John broke the jar. / The jar broke. / Jars break easily.
   (change-of-state)
John cut the bread. / *The bread cut. / Bread cuts easily.
   (change-of-state, recognizable action, sharp instrument)
John hit the wall. / *The wall hit. / *Walls hit easily.
   (contact, exertion of force)
Confusions in Levin classes?
 Not semantically homogeneous
{braid, clip, file, powder, pluck, etc...}
 Multiple class listings
homonymy or polysemy?
 Conflicting alternations?
   Carry verbs disallow the conative (*She carried at the ball), yet the class includes {push, pull, shove, kick, draw, yank, tug}, which are also in the push/pull class and do take the conative (She kicked at the ball).
Intersective Levin Classes
[Venn diagram: intersecting Levin classes distinguished by alternations – “apart” (CH-STATE), “across the room” (CH-LOC), “at” (¬CH-LOC)]
Dang, Kipper & Palmer, ACL98
Intersective Levin Classes
 More syntactically and semantically coherent:
   sets of syntactic patterns
   explicit semantic components
   relations between senses
 VerbNet: www.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI00, Coling00
VerbNet – Karin Kipper
 Class entries:
Capture generalizations about verb behavior
Organized hierarchically
Members have common semantic elements,
semantic roles and syntactic frames
 Verb entries:
Refer to a set of classes (different senses)
each class member linked to WN synset(s)
(not all WN senses are covered)
Semantic role labels:
Julia broke the LCD projector.
  break(agent(Julia), patient(LCD-projector))
  cause(agent(Julia), broken(LCD-projector))
  agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
  patient(P) -> affected(P), change(P), …
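As a rough illustration of how these role labels and the cause/broken decomposition could be held in code, here is a minimal sketch; the class and field names below are my own, not a VerbNet format.

```python
# Illustrative representation of thematic roles and event decomposition
# for "Julia broke the LCD projector".

from dataclasses import dataclass

@dataclass
class Predicate:
    name: str      # e.g. "cause", "broken"
    args: tuple    # argument fillers or nested predicates

# Thematic role assignment for the sentence.
roles = {"agent": "Julia", "patient": "LCD-projector"}

# Event decomposition: cause(agent, broken(patient)).
semantics = Predicate("cause", (roles["agent"],
                                Predicate("broken", (roles["patient"],))))

# Entailments attached to the role labels themselves.
role_entailments = {
    "agent": ["intentional", "sentient", "causer", "affector"],
    "patient": ["affected", "change"],
}

print(semantics)
print(role_entailments["agent"])
```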
Hand built resources vs. Real data
 VerbNet is based on linguistic theory –
how useful is it?
 How well does it correspond to syntactic
variations found in naturally occurring text?
PropBank
Proposition Bank: From Sentences to Propositions
Powell met Zhu Rongji
Powell and Zhu Rongji met
Powell met with Zhu Rongji
Powell and Zhu Rongji had a meeting
  (related verbs: consult, join, wrestle, battle, debate)

Proposition: meet(Powell, Zhu Rongji)
             meet(Somebody1, Somebody2)
             ...

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
  meet(Powell, Zhu)
  discuss([Powell, Zhu], return(X, plane))
Capturing semantic roles*
 [SUBJ Owen] broke [ARG1 the laser pointer].
 [SUBJ, ARG1 The windows] were broken by the hurricane.
 [SUBJ, ARG1 The vase] broke into pieces when it toppled over.
*See also Framenet, http://www.icsi.berkeley.edu/~framenet/
English lexical resource is required
 That provides sets of possible syntactic
frames for verbs with semantic role labels.
 And provides clear, replicable sense
distinctions.
A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
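As a sketch of how the labeled constituents above yield flat propositions, here is a minimal illustration; the data structure is hand-filled for this sentence and is not PropBank's release format.

```python
# Illustrative only: turning PropBank-style argument labels into flat
# propositions like expect(Analysts, a GM-Jaguar pact).

annotations = [
    {"rel": "expect",
     "args": {"Arg0": "Analysts",
              "Arg1": "a GM-Jaguar pact that would give ... the British company"}},
    {"rel": "give",
     "args": {"Arg0": "a GM-Jaguar pact",      # via the trace *T*-1
              "Arg2": "the U.S. car maker",
              "Arg1": "an eventual 30% stake in the British company"}},
]

def proposition(ann):
    """Render rel(Arg0, Arg1, Arg2, ...) with arguments in numeric order."""
    ordered = [ann["args"][k] for k in sorted(ann["args"])]
    return f'{ann["rel"]}({", ".join(ordered)})'

for ann in annotations:
    print(proposition(ann))
```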
Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected

Example (transitive, active):
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL:  expect
  Arg1: further declines in interest rates
Frames File example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to

Example (double object):
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL:  gave
  Arg2: the chefs
  Arg1: a standing ovation
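PropBank frames files are distributed as XML; purely as an illustration, the information in the give entry above can be pictured as a small in-memory structure. The field names, and the frameset id give.01, are assumptions of this sketch.

```python
# Illustrative in-memory picture of a frames-file entry (not the official schema).

give_frame = {
    "lemma": "give",
    "framesets": {
        "give.01": {                       # frameset id assumed for illustration
            "roles": {
                "Arg0": "giver",
                "Arg1": "thing given",
                "Arg2": "entity given to",
            },
            "examples": [{
                "name": "double object",
                "text": "The executives gave the chefs a standing ovation.",
                "labels": {
                    "Arg0": "The executives",
                    "REL": "gave",
                    "Arg2": "the chefs",
                    "Arg1": "a standing ovation",
                },
            }],
        },
    },
}

# Look up what Arg2 means for this predicate.
print(give_frame["framesets"]["give.01"]["roles"]["Arg2"])   # entity given to
```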
Word Senses in PropBank
 Instructions to ignore word sense were not feasible for 700+ verbs
 Mary left the room
 Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
Arg0: entity leaving
Arg1: place left
Frameset leave.02 "give":
Arg0: giver
Arg1: thing given
Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Annotation procedure
 PTB II – extraction of all sentences with a given verb
 Create Frame File for that verb (Paul Kingsbury)
   3100+ lemmas, 4400 framesets, 118K predicates
   Over 300 created automatically via VerbNet
 First pass: automatic tagging (Joseph Rosenzweig)
   http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
 Second pass: double-blind hand correction (Paul Kingsbury)
   Tagging tool highlights discrepancies (Scott Cotton)
 Third pass: Solomonization (adjudication) (Betsy Klipple, Olga Babko-Malaya)
Trends in Argument Numbering
 Arg0 = agent
 Arg1 = direct object / theme / patient
 Arg2 = indirect object / benefactive /
instrument / attribute / end state
 Arg3 = start point / benefactive / instrument /
attribute
 Arg4 = end point
 Per word vs frame level – more general?
Additional tags (arguments or adjuncts?)
 Variety of ArgM’s (Arg# > 4):
   TMP – when?
   LOC – where at?
   DIR – where to?
   MNR – how?
   PRP – why?
   REC – himself, themselves, each other
   PRD – this argument refers to or modifies another
   ADV – others
Inflection
 Verbs also marked for tense/aspect:
   Passive/Active
   Perfect/Progressive
   Third singular (is, has, does, was)
   Present/Past/Future
   Infinitives/Participles/Gerunds/Finites
 Modals and negations marked as ArgMs
Frames: Multiple Framesets
 Out of the 787 most frequent verbs:
 1 frameset – 521
 2 framesets – 169
 3+ framesets – 97 (includes light verbs)
 94% ITA
 Framesets are not necessarily consistent between
different senses of the same verb
 Framesets are consistent between different verbs
that share similar argument structures,
(like FrameNet)
Ergative/Unaccusative Verbs
Roles (no ARG0 for unaccusative verbs)
Arg1 = Logical subject, patient, thing rising
Arg2 = EXT, amount risen
Arg3* = start point
Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16
billion.
The Nasdaq composite index added 1.01
to 456.6 on paltry volume.
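A hand-labeled sketch of the roles above applied to the first example sentence; this is my own illustration of the slide's role definitions, not official annotation.

```python
# Role assignment for "Sales rose 4% to $3.28 billion from $3.16 billion."
# (illustrative labeling following the definitions above; no Arg0 for this verb)

rise_instance = {
    "rel": "rose",
    "Arg1": "Sales",                    # logical subject / thing rising
    "Arg2-EXT": "4%",                   # amount risen
    "Arg4": "to $3.28 billion",         # end point
    "Arg3": "from $3.16 billion",       # start point
}

for label, span in rise_instance.items():
    print(f"{label:8s} {span}")
```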
Actual data for leave
 http://www.cs.rochester.edu/~gildea/PropBank/Sort/
leave.01 "move away from": Arg0 rel Arg1 Arg3
leave.02 "give":           Arg0 rel Arg1 Arg2

sub-ARG0  obj-ARG1                        44
sub-ARG0                                  20
sub-ARG0  NP-ARG1-with  obj-ARG2          17
sub-ARG0  sub-ARG2  ADJP-ARG3-PRD         10
sub-ARG0  sub-ARG1  ADJP-ARG3-PRD          6
sub-ARG0  sub-ARG1  VP-ARG3-PRD            5
NP-ARG1-with  obj-ARG2                     4
obj-ARG1                                   3
sub-ARG0  sub-ARG2  VP-ARG3-PRD            3
PropBank/FrameNet
Buy                 Sell
Arg0: buyer         Arg0: seller
Arg1: goods         Arg1: goods
Arg2: seller        Arg2: buyer
Arg3: rate          Arg3: rate
Arg4: payment       Arg4: payment

Broader, more neutral, more syntactic – maps readily to VN, TR, FN.
Rambow et al., PMLB03
Annotator accuracy – ITA 84%
[Chart: per-annotator accuracy on primary labels, roughly 0.86–0.96, plotted against number of annotations per annotator (log scale, 1,000 to 1,000,000)]
English lexical resource is required
 That provides sets of possible syntactic
frames for verbs with semantic role labels?
 And provides clear, replicable sense
distinctions.
English lexical resource is required
 That provides sets of possible syntactic
frames for verbs with semantic role labels
that can be automatically
assigned accurately to new text?
 And provides clear, replicable sense
distinctions.
Automatic Labelling of Semantic Relations
• Stochastic Model
• Features:
Predicate
Phrase Type
Parse Tree Path
Position (Before/after predicate)
Voice (active/passive)
Head Word
Gildea & Jurafsky, CL02, Gildea & Palmer, ACL02
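A toy illustration of this feature set for one candidate constituent, with the parse-tree path computed over a hand-built parse. This is not the authors' code; the toy parse, helper names, and feature values are invented.

```python
# Illustrative feature extraction in the Gildea & Jurafsky style.

toy_parse = ["S",
             ["NP", "Analysts"],                       # candidate argument
             ["VP", ["VBP", "expect"],
                    ["NP", "further", "declines"]]]

def tree_path(root, source, target):
    """Category path from source up to the lowest common ancestor and down
    to target, e.g. 'NP^SvVPvVBP' (^ = up, v = down)."""
    def spine(node, goal, path):
        if not isinstance(node, list):
            return None
        path = path + [node[0]]
        if node is goal:
            return path
        for child in node[1:]:
            found = spine(child, goal, path)
            if found:
                return found
        return None
    up = spine(root, source, [])       # labels from the root down to source
    down = spine(root, target, [])     # labels from the root down to target
    j = 0
    while j + 1 < min(len(up), len(down)) and up[j + 1] == down[j + 1]:
        j += 1                         # j = lowest common ancestor
    return "^".join(reversed(up[j:])) + "v" + "v".join(down[j + 1:])

candidate = toy_parse[1]               # the NP "Analysts"
verb_node = toy_parse[2][1]            # the VBP node for "expect"

features = {
    "predicate": "expect",
    "phrase_type": candidate[0],                          # NP
    "path": tree_path(toy_parse, candidate, verb_node),   # NP^SvVPvVBP
    "position": "before",              # constituent precedes the predicate
    "voice": "active",
    "head_word": "Analysts",
}
print(features)
```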
Semantic Role Labelling Accuracy – Known Boundaries

                     FrameNet (≥ 10 inst.)   PropBank   PropBank (≥ 10 instances)
Gold St. parses              –                 77.0              83.1
Automatic parses           82.0                73.6              79.6

• Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify.
• FrameNet examples (training/test) are handpicked to be unambiguous.
• Lower performance with unknown boundaries.
• Higher performance with traces.
• Almost evens out.
Additional Automatic Role Labelers
 Performance improved from 77% to 88% (Colorado)
 (Gold Standard parses, < 10 instances)
 Same features plus:
    Named Entity tags
    Head word POS
    For unseen verbs – backoff to automatic verb clusters
 SVMs
    Role or not role
    For each likely role, for each Arg#: Arg# or not
    No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03;
Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03
Additional Automatic Role Labelers
 Performance improved from 77% to 88% (Colorado)
 New results, original features and labels: 88%, 93% (Penn)
 (Gold Standard parses, < 10 instances)
 Same features plus:
    Named Entity tags
    Head word POS
    For unseen verbs – backoff to automatic verb clusters
 SVMs
    Role or not role
    For each likely role, for each Arg#: Arg# or not
    No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03;
Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03
Word Senses in PropBank
 Instructions to ignore word sense were not feasible for 700+ verbs
 Mary left the room
 Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
Arg0: entity leaving
Arg1: place left
Frameset leave.02 "give":
Arg0: giver
Arg1: thing given
Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Mapping from PropBank to VerbNet
Frameset id = leave.02    Sense = "give"    VerbNet class = future-having 13.3

PropBank arg   Description    VerbNet role
Arg0           giver          Agent
Arg1           thing given    Theme
Arg2           benefactive    Recipient
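As an illustration of how such a mapping entry might be stored and applied, here is a minimal sketch; the dictionary layout is mine, not the released mapping format. The usage example reuses the earlier "Mary left her daughter-in-law her pearls" sentence.

```python
# Illustrative PropBank-to-VerbNet mapping lookup.

PB_TO_VN = {
    ("leave", "leave.02"): {              # frameset "give", class future-having 13.3
        "verbnet_class": "future-having-13.3",
        "roles": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"},
    },
}

def to_verbnet(lemma, frameset, labels):
    """Translate PropBank Arg labels into VerbNet thematic roles."""
    entry = PB_TO_VN[(lemma, frameset)]
    return {entry["roles"].get(arg, arg): span for arg, span in labels.items()}

propbank_labels = {"Arg0": "Mary", "Arg1": "her pearls", "Arg2": "her daughter-in-law"}
print(to_verbnet("leave", "leave.02", propbank_labels))
# {'Agent': 'Mary', 'Theme': 'her pearls', 'Recipient': 'her daughter-in-law'}
```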
Mapping from PropBank to VerbNet
 Overlap with PropBank framesets
 50,000 PropBank instances
 < 50% VN entries, > 85% VN classes
 Results
 MATCH - 78.63%. (80.90% relaxed)
 (VerbNet isn’t just linguistic theory!)
 Benefits
 Thematic role labels and semantic predicates
 Can extend PropBank coverage with VerbNet classes
 WordNet sense tags
Kingsbury & Kipper, NAACL03, Text Meaning Workshop
http://www.cs.rochester.edu/~gildea/VerbNet/
WordNet as a WSD sense inventory
 Senses unnecessarily fine-grained?
 Word Sense Disambiguation bakeoffs
Senseval1 – Hector, ITA = 95.5%
Senseval2 – WordNet 1.7, ITA verbs = 71%
Groupings of Senseval2 verbs, ITA =82%
 Used syntactic and semantic criteria
Groupings Methodology (w/ Dang and Fellbaum)
 Double blind groupings, adjudication
 Syntactic Criteria (VerbNet was useful)
Distinct subcategorization frames
 call him a bastard
 call him a taxi
Recognizable alternations – regular sense
extensions:
 play an instrument
 play a song
 play a melody on an instrument
SIGLEX01, SIGLEX02, JNLE04
Groupings Methodology (cont.)
 Semantic Criteria
Differences in semantic classes of arguments
 Abstract/concrete, human/animal,
animate/inanimate, different instrument types,…
Differences in entailments
 Change of prior entity or creation of a new entity?
Differences in types of events
 Abstract/concrete/mental/emotional/….
Specialized subject domains
Results – averaged over 28 verbs
Dang and Palmer, Siglex02; Dang et al., Coling02
                 Total
WN polysemy      16.28
Group polysemy    8.07
ITA-fine           71%
ITA-group          82%
MX-fine          60.2%
MX-group           69%

MX – Maximum Entropy WSD, p(sense|context)
Features: topic, syntactic constituents, semantic classes (+2.5%, +1.5 to +5%, +6%)
Grouping improved ITA and Maxent WSD
 Call: 31% of errors due to confusion between senses within the same group 1:
   name, call -- (assign a specified, proper name to; "They named their son David")
   call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard")
   call -- (consider or regard as being; "I would not call her beautiful")
 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
WordNet: call, 28 senses, grouped

[Diagram: the 28 WordNet senses of call partitioned into groups –
  Label:              WN1, WN3, WN15, WN19, WN22, WN26
  Loud cry:           WN5, WN12, WN16
  Bird or animal cry: WN4, WN7, WN8, WN9
  Request:            WN20
  Challenge:          WN18, WN27
  Phone/radio:        WN2, WN13, WN28
  Call a loan/bond:   WN11, WN17, WN25
  Visit:              WN23
  Bid:                WN10, WN14, WN21, WN24
  (WN6 also shown)]
Overlap between Groups and Framesets – 95%
[Diagram for develop: Frameset1 covers senses WN1–WN14; Frameset2 covers WN19 and WN20]
Palmer, Dang & Fellbaum, NLE 2004
Sense Hierarchy
 PropBank Framesets – coarse-grained distinctions
   20 Senseval-2 verbs w/ > 1 frameset
   Maxent WSD system: 73.5% baseline, 90% accuracy
 Sense Groups (Senseval-2) – intermediate level (includes Levin classes): 95% overlap, 69%
 WordNet – fine-grained distinctions: 60.2%
English lexical resource is available
That provides sets of possible syntactic
frames for verbs with semantic role labels
that can be automatically assigned
accurately to new text.
And provides clear, replicable sense
distinctions.
A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
“The Congress passed the banking law recently.”
(IP (NP-SBJ (NN 国会/Congress))
(VP (ADVP (ADV 最近/recently))
(VP (VV 通过/pass)
(AS 了/ASP)
(NP-OBJ (NN 银行法/banking law)))))
The Same Sentence, PropBanked
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))

通过.f2 (pass): arg0 国会 (Congress), argM 最近 (recently), arg1 银行法 (banking law)
Chinese PropBank Status (w/ Bert Xue and Scott Cotton)
 Create Frame File for each verb
   Similar alternations – causative/inchoative, unexpressed object
   5000 lemmas, 3000 DONE (hired Jiang)
 First pass: automatic tagging – 2500 DONE
   Subcat frame matcher (Xue & Kulick, MT03)
 Second pass: double-blind hand correction
   In progress (includes frameset tagging), 1000 DONE
   Ported RATS to CATS, in use since May
 Third pass: Solomonization (adjudication)
A Korean Treebank Sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
He added that Renault has a deadline until the end of March for a merger
proposal.
(S (NP-SBJ 그/NPN+은/PAU)
(VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
(VP (VP (NP-ADV 3/NNU
월/NNX+말/NNX+까지/PAU)
(VP (NP-OBJ 인수/NNC+제의/NNC
시한/NNC+을/PCA)
갖/VV+고/ECS))
있/VX+다/EFN+고/PAD)
덧붙이/VV+었/EPF+다/EFN)
./SFN)
The same sentence, PropBanked
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP (Arg0 NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP (ArgM NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP (Arg1 NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD)
                    덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

덧붙이다 (add): (그는, 르노가 3 월말까지 인수제의 시한을 갖고 있다)
   (he, Renault has a deadline until the end of March for a merger proposal)
갖다 (has): (르노가, 3 월말까지, 인수제의 시한을)
   (Renault, until the end of March, a deadline for a merger proposal)
PropBank II
 Nominalizations (NYU)
   Lexical frames DONE
 Event variables (including temporals and locatives)
 More fine-grained sense tagging
   Tagging nominalizations w/ WordNet senses
   Selected verbs and nouns
 Nominal coreference (not names)
 Clausal discourse connectives – selected subset
PropBank II: event variables; sense tags; nominal reference; discourse connectives
{Also}, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

ID    REL              Arg0                     Arg1                                   Arg3-PRD   ArgM-ADV
h23   help (help2,5)   tax rates (tax rate1)    the company keep its tax outlay flat
k16   keep (keep1)     the company (company1)   its tax outlay                         flat       relative to earnings growth
Summary
 Shallow semantic annotation that captures critical
dependencies and semantic role labels
 Supports training of supervised automatic
taggers
 Methodology ports readily to other languages
English PropBank release – spring 2004
Chinese PropBank release – fall 2004
Korean PropBank release – summer 2005
Word sense in Machine Translation
 Different syntactic frames
John left the room
Juan saiu do quarto. (Portuguese)
John left the book on the table.
Juan deixou o livro na mesa.
 Same syntactic frame?
John left a fortune.
Juan deixou uma fortuna.
Summary of Multilingual TreeBanks, PropBanks
                   Parallel corpora (text)   Treebank        PropBank I      PropBank II
Chinese Treebank   Chinese 500K              Chinese 500K    Chinese 500K    Ch 100K
                   English 400K              English 100K    English 350K    En 100K
Arabic Treebank    Arabic 500K               Arabic 500K     ?               ?
                   English 500K              English ?
Korean Treebank    Korean 180K               Korean 180K     Korean 180K
                   English 50K               English 50K     English 50K
Levin class: escape-51.1-1
 WordNet senses: WN 1, 5, 8
 Thematic roles: Location[+concrete], Theme[+concrete]
 Frames with semantics:
   Basic intransitive: "The convict escaped"
     motion(during(E), Theme)  direction(during(E), Prep, Theme, ~Location)
   Intransitive (+ path PP): "The convict escaped from the prison"
   Locative preposition drop: "The convict escaped the prison"
Levin class: future_having-13.3
 WordNet senses: WN 2, 10, 13
 Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
 Frames with semantics:
   Dative: "I promised somebody my time"
     Agent V Recipient Theme
     has_possession(start(E), Agent, Theme)  future_possession(end(E), Recipient, Theme)  cause(Agent, E)
   Transitive (+ Recipient PP): "We offered our paycheck to her"
     Agent V Theme Prep(to) Recipient
   Transitive (Theme object): "I promised my house (to somebody)"
     Agent V Theme
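Purely as an illustration, the class entry above could be held in memory like this; VerbNet itself is distributed as XML, and the field names here are assumptions of this sketch.

```python
# Illustrative in-memory picture of a VerbNet class entry.

future_having_13_3 = {
    "class": "future_having-13.3",
    "wordnet_senses": ["WN2", "WN10", "WN13"],
    "thematic_roles": {
        "Agent": "[+animate OR +organization]",
        "Recipient": "[+animate OR +organization]",
        "Theme": "[]",
    },
    "frames": [
        {"description": "Dative",
         "example": "I promised somebody my time",
         "syntax": ["Agent", "V", "Recipient", "Theme"],
         "semantics": ["has_possession(start(E), Agent, Theme)",
                       "future_possession(end(E), Recipient, Theme)",
                       "cause(Agent, E)"]},
        {"description": "Transitive (+ Recipient PP)",
         "example": "We offered our paycheck to her",
         "syntax": ["Agent", "V", "Theme", "Prep(to)", "Recipient"]},
        {"description": "Transitive (Theme object)",
         "example": "I promised my house (to somebody)",
         "syntax": ["Agent", "V", "Theme"]},
    ],
}

# List the syntactic frames licensed by this class.
for frame in future_having_13_3["frames"]:
    print(frame["description"], "->", " ".join(frame["syntax"]))
```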
Automatic classification
 Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy
   Classes: 1. unergative, 2. unaccusative, 3. object-drop
   100M words automatically parsed
   C5.0, using features: transitivity, causativity, animacy, voice, POS
 EM clustering – 61%, 2669 instances, 1M words
   Using gold-standard semantic role labels
 Example members:
   1. float, hop/hope, jump, march, leap
   2. change, clear, collapse, cool, crack, open, flood
   3. borrow, clean, inherit, reap, organize, study
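A toy sketch of this style of classification: verbs described by aggregate syntactic features and assigned to the three classes. The feature values are invented, and scikit-learn's DecisionTreeClassifier stands in for C5.0.

```python
# Illustrative verb-class prediction from corpus-level syntactic statistics.

from sklearn.tree import DecisionTreeClassifier

# Features per verb (invented): [transitive rate, causative rate,
#                                animate-subject rate, passive rate]
train_X = [
    [0.05, 0.02, 0.95, 0.01],   # jump   (unergative)
    [0.10, 0.05, 0.90, 0.02],   # march  (unergative)
    [0.45, 0.60, 0.20, 0.15],   # open   (unaccusative)
    [0.40, 0.55, 0.15, 0.10],   # crack  (unaccusative)
    [0.70, 0.05, 0.85, 0.30],   # study  (object-drop)
    [0.65, 0.04, 0.80, 0.25],   # clean  (object-drop)
]
train_y = ["unergative", "unergative", "unaccusative",
           "unaccusative", "object-drop", "object-drop"]

clf = DecisionTreeClassifier(max_depth=3).fit(train_X, train_y)

# Classify an unseen verb from its (invented) corpus statistics.
print(clf.predict([[0.08, 0.03, 0.92, 0.02]]))   # expected: ['unergative']
```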
SENSEVAL – Word Sense Disambiguation Evaluation
DARPA style bakeoff: training data, testing data, scoring algorithm.
                         SENSEVAL-1 (1998)   SENSEVAL-2 (2001)
Languages                3                    12
Systems                  24                   90
Eng. lexical sample      Yes                  Yes
Verbs/Poly/Instances     13/12/215            29/16/110
Sense inventory          Hector, 95.5%        WordNet, 73+%

NLE99, CHUM01, NLE02, NLE03
Maximum Entropy WSD – Hoa Dang, best performer on verbs
 Maximum entropy framework, p(sense|context)
 Contextual Linguistic Features
Topical feature for W:
 keywords (determined automatically)
Local syntactic features for W:
 presence of subject, complements, passive?
 words in subject, complement positions, particles,
preps, etc.
Local semantic features for W:
 Semantic class info from WordNet (synsets, etc.)
 Named Entity tag (PERSON, LOCATION,..) for
proper Ns
 words within +/- 2 word window
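A toy sketch of the p(sense | context) setup described above, with scikit-learn's logistic regression (a maximum-entropy model) standing in for the actual system; the feature strings and senses below are invented for illustration.

```python
# Illustrative maximum-entropy WSD: p(sense | context features).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training instance: feature strings for one occurrence of "call"
# (topical keywords, local syntactic and semantic features), plus its sense.
train_contexts = [
    "subj=police obj=none particle=none topic=emergency ne_obj=NONE",
    "subj=he obj=me compl=adj topic=insult ne_obj=PERSON",
    "subj=she obj=him pp=on_phone topic=telephone ne_obj=PERSON",
    "subj=they obj=meeting particle=off topic=schedule ne_obj=NONE",
]
train_senses = ["summon", "label", "telephone", "cancel"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_contexts, train_senses)

test = "subj=I obj=you pp=on_phone topic=telephone ne_obj=PERSON"
print(model.predict([test])[0])                 # most probable sense
print(dict(zip(model.classes_, model.predict_proba([test])[0].round(2))))
```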
Best Verb Performance – Maxent-WSD (Hoa Dang)
28 verbs – average
               Total
WN polysemy    16.28
ITA              71%
MX-WSD         60.2%

MX – Maximum Entropy WSD, p(sense|context)
Features: topic, syntactic constituents, semantic classes (+2.5%, +1.5 to +5%, +6%)
Dang and Palmer, Siglex02,Dang et al,Coling02
Role Labels & Framesets as features for WSD
 Preliminary results (Jinying Chen)
   Gold-standard PropBank annotation
   Decision tree (C5.0), sense groups, 5 verbs
   Features: frameset tags, Arg labels
 Comparable results to Maxent with PropBank features
Syntactic frames and sense distinctions are inseparable
Lexical resources provide concrete criteria for sense distinctions
 PropBank – coarse grained sense
distinctions determined by different
subcategorization frames (Framesets)
 Intersective Levin classes – regular sense
extensions through differing syntactic
constructions
 VerbNet – distinct semantic predicates for
each sense (verb class)
Are these the right distinctions?
Results – averaged over 28 verbs
                 Total
WN polysemy      16.28
Group polysemy    8.07
ITA-fine           71%
ITA-group          82%
MX-fine          60.2%
JHU – MLultra    56.6%, 58.7%
MX-group           69%