Formal Semantics
Slides by Julia Hockenmaier, Laura
McGarrity, Bill McCartney, Chris
Manning, and Dan Klein
Formal Semantics
It comes in two flavors:
• Lexical Semantics: The meaning of words
• Compositional semantics: How the meaning
of individual units combine to form the
meaning of larger units
What is meaning?
• Meaning ≠ Dictionary entries
Dictionaries define words using words.
• Referent: the thing/idea in the world that a
word refers to
• Reference: the relationship between a word
and its referent
The president is the commander-in-chief.
= Barack Obama is the commander-in-chief.
I want to be the president.
≠ I want to be Barack Obama.
• Tooth fairy?
• Phoenix?
• Winner of the 2016 presidential election?
What is meaning?
• Meaning ≠ Dictionary entries
• Meaning ≠ Reference
• Sense: The mental representation of a word
or phrase, independent of its referent.
Sense ≠ Mental Image
• A word may have different mental images for
different people.
– E.g., “mother”
• A word may conjure a typical mental image (a
prototype), but can signify atypical examples as well.
Sense v. Reference
• A word/phrase may have sense, but no reference:
– King of the world
– The camel in CIS 8538
– The greatest integer
– The
• A word may have reference, but no sense:
– Proper names: Dan McCloy, Kristi Krein
(who are they?!)
Sense v. Reference
• A word may have the same referent, but more
than one sense:
– The morning star / the evening star (Venus)
• A word may have one sense, but multiple referents:
– Dog, bird
Some semantic relations
between words
• Hyponymy: subclass
Poodle < dog
Crimson < red
Red < color
Dance < move
• Hypernymy: superclass
• Synonymy:
– Couch/sofa
– Manatee / sea cow
• Antonymy:
– Dead/alive
– Married/single
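As a minimal sketch of the hyponymy relation above, the subclass examples can be stored as a small table and queried transitively. The table layout and the names HYPERNYM and is_hyponym are illustrative assumptions, not part of the slides.

```python
# Toy hypernym table: each word maps to its immediate superclass (hypernym).
# The entries mirror the slide's examples; the table itself is hypothetical.
HYPERNYM = {
    "poodle": "dog",
    "crimson": "red",
    "red": "color",
    "dance": "move",
}

def is_hyponym(word, ancestor):
    """Return True if `word` is a (possibly indirect) subclass of `ancestor`."""
    current = HYPERNYM.get(word)
    while current is not None:
        if current == ancestor:
            return True
        current = HYPERNYM.get(current)
    return False

print(is_hyponym("crimson", "color"))  # True: crimson < red < color
print(is_hyponym("color", "crimson"))  # False: that direction is hypernymy
```

Following the chain upward treats hyponymy as transitive, which is why crimson counts as a kind of color even though only crimson < red and red < color are listed.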
Lexical Decomposition
• Word sense can be represented with
semantic features:
Compositional Semantics
• The study of how meanings of small units
combine to form the meaning of larger units
The dog chased the cat ≠ The cat chased the dog.
i.e., the whole does not equal the sum of the parts.
The dog chased the cat = The cat was chased by the dog
i.e., syntax matters in determining meaning.
Principle of Compositionality
The meaning of a sentence is determined by
the meaning of its words in conjunction with
the way they are syntactically combined.
Exceptions to Compositionality
• Anomaly: when phrases are well-formed
syntactically, but not semantically
– Colorless green ideas sleep furiously. (Chomsky)
– That bachelor is pregnant.
Exceptions to Compositionality
• Metaphor: the use of an expression to refer
to something that it does not literally denote
in order to suggest a similarity
– Time is money.
– The walls have ears.
Exceptions to Compositionality
• Idioms: Phrases with fixed meanings not
composed of literal meanings of the words
– Kick the bucket = die
(*The bucket was kicked by John.)
– When pigs fly = ‘it will never happen’
(*She suspected pigs might fly tomorrow.)
– Bite off more than you can chew
= ‘to take on too much’
(*He chewed just as much as he bit off.)
Idioms in other languages
Logical Foundations
for Compositional Semantics
• We need a language for expressing the
meaning of words, phrases, and sentences
• Many possible choices; we will focus on
– First-order predicate logic (FOPL) with types
– Lambda calculus
Truth-conditional Semantics
• Linguistic expressions
– “Bob sings.”
• Logical translations
– sings(Bob)
– but could be p_5789023(a_257890)
• Denotation:
– [[bob]] = some specific person (in some context)
– [[sings(bob)]] = true, in situations where Bob is singing; false, otherwise
• Types on translations:
– bob: e(ntity)
– sings(bob): t(rue or false, a boolean type)
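The idea of a denotation can be sketched with a toy "situation": a set of entities plus the set of entities that are currently singing. Everything named here (ENTITIES, SINGS, denote_constant, denote_sings) is a hypothetical illustration, not notation from the slides.

```python
# A toy situation: which entities exist and which of them are singing.
# All names below are assumptions made for illustration.
ENTITIES = {"bob", "amy"}
SINGS = {"bob"}  # in this situation only Bob is singing

def denote_constant(name):
    """[[name]]: an entity (type e) in the situation."""
    if name not in ENTITIES:
        raise KeyError(f"no referent for {name!r}")
    return name

def denote_sings(name):
    """[[sings(name)]]: a truth value (type t)."""
    return denote_constant(name) in SINGS

print(denote_sings("bob"))  # True
print(denote_sings("amy"))  # False
```

Changing SINGS changes the truth value of sings(bob) without changing the logical translation, which is the sense in which the sentence's meaning is its truth conditions.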
Truth-conditional Semantics
Some more complicated logical descriptions of language:
– “All girls like a video game.”
– ∀x:e . ∃y:e . girl(x) → [video-game(y) ∧ likes(x,y)]
– “Alice is a former teacher.”
– (former(teacher))(Alice)
– “Alice saw the cat before Bob did.”
– ∃x:e, y:e, z:e, t1:e, t2:e .
cat(x) ∧ see(y) ∧ see(z) ∧
agent(y, Alice) ∧ patient(y, x) ∧
agent(z, Bob) ∧ patient(z, x) ∧
time(y, t1) ∧ time(z, t2) ∧ <(t1, t2)
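To make the quantified translations concrete, here is a small sketch that evaluates "All girls like a video game", read as ∀x . ∃y . girl(x) → [video-game(y) ∧ likes(x,y)], over a finite domain. The domain, the individuals, and the helper names are invented for illustration.

```python
# A hypothetical finite model; none of these individuals come from the slides.
DOMAIN = {"alice", "beth", "bob", "tetris", "pong"}
GIRL = {"alice", "beth"}
VIDEO_GAME = {"tetris", "pong"}

def implies(p, q):
    """Material implication p → q."""
    return (not p) or q

def all_girls_like_a_video_game(likes):
    """∀x . ∃y . girl(x) → [video-game(y) ∧ likes(x,y)] over DOMAIN."""
    return all(
        any(implies(x in GIRL, y in VIDEO_GAME and (x, y) in likes)
            for y in DOMAIN)
        for x in DOMAIN
    )

print(all_girls_like_a_video_game({("alice", "tetris"), ("beth", "pong")}))  # True
print(all_girls_like_a_video_game({("alice", "tetris")}))                    # False
```

Over a finite domain, ∀ becomes all(...) and ∃ becomes any(...), so checking a formula against a model is just nested iteration.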
FOPL Syntax Summary
• A set of types T = {t1, … }
• A set of constants C = {c1, …}, each associated
with a type from T
• A set of relations R = {r1, …}, where each ri is a
subset of C^n for some n.
• A set of variables X = {x1, …}
• ∀, ∃, ∧, ∨, ¬, →, ., :
Truth-conditional semantics
• Proper names:
– Refer directly to some entity in the world
– Bob: bob
• Sentences:
– Are either t or f
– Bob sings: sings(bob)
• So what about verbs and VPs?
sings must combine with bob to produce sings(bob)
The λ-calculus is a notation for functions whose arguments are not yet filled.
sings: λx.sings(x)
This is a predicate, a function that returns a truth value. In this case, it takes a
single entity as an argument, so we can write its type as e → t
• Adjectives?
Lambda calculus
• FOPL + λ (new quantifier) will be our lambda calculus
• Intuitively, λ is just a way of creating a function
– E.g., girl() is a relation symbol; but
λx . girl(x) is a function that takes one argument.
• New inference rule: function application
(λx . L1(x)) (L2)
→ L1(L2)
E.g., (λx . x²) (3) → 3² = 9
E.g., (λx . sings(x)) (Bob) → sings(Bob)
• Lambda calculus lets us describe the meaning of words individually.
– Function application (and a few other rules) then lets us combine those
meanings to come up with the meaning of larger phrases or sentences.
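Function application maps directly onto ordinary lambdas in a programming language. The sketch below mirrors the two examples above; the SINGERS set is an assumed toy model that gives the predicate a truth value.

```python
# λx . x²  as a Python lambda
square = lambda x: x ** 2

# λx . sings(x), evaluated against a hypothetical set of singers
SINGERS = {"Bob"}
sings = lambda x: x in SINGERS

# Function application: (λx . L1(x)) (L2) → L1(L2)
print(square(3))     # 9
print(sings("Bob"))  # True
print(sings("Amy"))  # False
```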
Compositional Semantics
with the λ-calculus
• So now we have meanings for the words
• How do we know how to combine the words?
• Associate a combination rule with each grammar rule:
– S : β(α) → NP : α VP : β
(function application)
– VP : λx. α(x) ∧ β(x) → VP : α and : ∅ VP : β (intersection)
• Example:
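As a minimal sketch of how the function-application rule and the intersection rule could be applied to "Bob sings and dances", word meanings can be written as Python functions and each grammar rule as a combiner. The toy sets and the names combine_s and combine_vp_and are assumptions made for illustration.

```python
# Hypothetical model: who sings and who dances.
SINGERS = {"Bob", "Amy"}
DANCERS = {"Bob"}

sings = lambda x: x in SINGERS    # sings  : λx.sings(x)
dances = lambda x: x in DANCERS   # dances : λx.dances(x)

def combine_s(np_sem, vp_sem):
    """S : β(α) → NP : α  VP : β   (function application)."""
    return vp_sem(np_sem)

def combine_vp_and(vp1_sem, vp2_sem):
    """VP : λx. α(x) ∧ β(x) → VP : α  and : ∅  VP : β   (intersection)."""
    return lambda x: vp1_sem(x) and vp2_sem(x)

sings_and_dances = combine_vp_and(sings, dances)
print(combine_s("Bob", sings_and_dances))  # True
print(combine_s("Amy", sings_and_dances))  # False
```

The word "and" contributes no meaning of its own here (∅); the conjunction comes entirely from the rule attached to the VP coordination.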
Composition: Some more examples
• Transitive verbs:
– likes : λx.λy.likes(y,x)
– Two-place predicates, type e → (e → t)
– VP “likes Amy” : λy.likes(y,Amy) is just a one-place predicate
• Quantifiers:
– What does “everyone” mean?
– Everyone : λf.∀x.f(x)
– Some problems:
• Have to change our NP/VP rule
• Won’t work for “Amy likes everyone”
– What about “Everyone likes someone”?
– Gets tricky quickly!
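The curried transitive verb and the quantifier "everyone" (λf.∀x.f(x)) can be sketched with nested lambdas. The domain and the LIKES relation below are invented for illustration.

```python
# Hypothetical model: pairs are (liker, liked).
DOMAIN = {"Amy", "Bob", "Carl"}
LIKES = {("Bob", "Amy"), ("Carl", "Amy"), ("Amy", "Amy")}

# likes : λx.λy.likes(y,x)  (takes the object first, then the subject)
likes = lambda x: lambda y: (y, x) in LIKES

# VP "likes Amy" : λy.likes(y,Amy) is a one-place predicate
likes_amy = likes("Amy")
print(likes_amy("Bob"))  # True

# everyone : λf.∀x.f(x)  (takes a predicate rather than being taken by one)
everyone = lambda f: all(f(x) for x in DOMAIN)
print(everyone(likes_amy))     # True
print(everyone(likes("Bob")))  # False
```

Note that in "Everyone likes Amy" the subject is applied to the VP, the reverse of S : β(α). That is the first problem listed above; the object-position case ("Amy likes everyone") is not handled by this sketch.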
Composition: Some more examples
• Indefinites
– The wrong way:
• “Bob ate a waffle” : ate(bob,waffle)
• “Amy ate a waffle” : ate(amy,waffle)
– Better translation:
∃x.waffle(x) ∧ ate(bob, x)
What does the translation of “a” have to be?
What about “the”?
What about “every”?
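One common textbook answer, offered here as an assumption rather than as the slides' own solution, treats determiners as functions over two predicates: "a" as λP.λQ.∃x.P(x) ∧ Q(x) and "every" as λP.λQ.∀x.P(x) → Q(x). The sketch below evaluates both over an invented finite model; "the" is left out because it needs a uniqueness condition.

```python
# Hypothetical model: two waffles, each eaten by a different person.
DOMAIN = {"bob", "amy", "w1", "w2"}
WAFFLE = {"w1", "w2"}
ATE = {("bob", "w1"), ("amy", "w2")}  # (eater, eaten)

waffle = lambda x: x in WAFFLE
ate = lambda x: lambda y: (y, x) in ATE  # λx.λy.ate(y,x)

# a     : λP.λQ.∃x.P(x) ∧ Q(x)   (assumed analysis)
a = lambda p: lambda q: any(p(x) and q(x) for x in DOMAIN)
# every : λP.λQ.∀x.P(x) → Q(x)   (assumed analysis)
every = lambda p: lambda q: all((not p(x)) or q(x) for x in DOMAIN)

bob_ate_a_waffle = a(waffle)(lambda x: ate(x)("bob"))
bob_ate_every_waffle = every(waffle)(lambda x: ate(x)("bob"))
print(bob_ate_a_waffle)      # True
print(bob_ate_every_waffle)  # False
```

Unlike ate(bob, waffle), this translation does not force Bob and Amy to have eaten the same waffle.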
• What do we do with the logical form?
– It has fewer (no?) ambiguities
– Can check the truth-value against a database
– More usefully: can add new facts, expressed in
language, to an existing relational database
– Question-answering: can check whether a statement
in a corpus entails a question-answer pair:
“Bob sings and dances” →
Q:“Who sings?” has answer A:“Bob”
– Can chain together facts for story comprehension
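A rough sketch of these uses: logical forms are stored as tuples in a fact set, new facts are added from language, and a "who" question is answered by lookup. The function names tell, ask, and who are hypothetical and stand in for a real relational database.

```python
# A hypothetical in-memory fact store; each fact is (predicate, arg1, ...).
facts = set()

def tell(predicate, *args):
    """Add a new fact expressed as a logical form."""
    facts.add((predicate, *args))

def ask(predicate, *args):
    """Check the truth value of a ground statement against the store."""
    return (predicate, *args) in facts

def who(predicate):
    """Answer 'Who <predicate>s?' for one-place predicates."""
    return {f[1] for f in facts if f[0] == predicate and len(f) == 2}

# "Bob sings and dances" : sings(bob) ∧ dances(bob)
tell("sings", "bob")
tell("dances", "bob")

print(ask("sings", "bob"))  # True
print(who("sings"))         # {'bob'}
```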
• What does the translation likes : λx. λy. likes(y,x) have
to do with actual liking?
• Nothing! (unless the denotation model says it does)
• Grounding: relating linguistic symbols to perceptual referents
– Sometimes a connection to a database entry is enough
– Other times, you might insist on connecting “blue” to the
appropriate portion of the visual EM spectrum
– Or connect “likes” to an emotional sensation
• Alternative to grounding: meaning postulates
– You could insist, e.g., that likes(y,x) => knows(y,x)
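A meaning postulate such as likes(y,x) => knows(y,x) can be sketched as a rule that is forward-chained over a fact set. The representation of postulates as (premise, conclusion) predicate pairs is an assumption chosen to keep the example short.

```python
def apply_postulates(facts, postulates):
    """Forward-chain simple postulates until no new facts are derived.

    Each postulate is a (premise, conclusion) pair of predicate names that
    share the same arguments, e.g. ("likes", "knows") for
    likes(y,x) => knows(y,x).
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in postulates:
            for fact in list(derived):
                if fact[0] == premise:
                    new_fact = (conclusion,) + fact[1:]
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

POSTULATES = [("likes", "knows")]
closed = apply_postulates({("likes", "bob", "amy")}, POSTULATES)
print(("knows", "bob", "amy") in closed)  # True
```

The symbol likes still has no perceptual grounding here; the postulate only constrains how it relates to the symbol knows.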
More representation issues
• Tense and events
– In general, you don’t get far with verbs as predicates
– Better to have event variables e
• “Alice danced” : danced(Alice) vs.
• “Alice danced” : ∃e.dance(e) ∧ agent(e, Alice) ∧ (time(e) < now)
– Event variables let you talk about non-trivial
tense/aspect structures:
“Alice had been dancing when Bob sneezed”
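The event-variable analysis can be sketched by storing events as records and checking the existential claim ∃e.dance(e) ∧ agent(e, Alice) ∧ (time(e) < now). The event list, the integer timestamps, and the function name danced are all illustrative assumptions.

```python
# Hypothetical event records; times are arbitrary integers.
NOW = 10
EVENTS = [
    {"type": "dance", "agent": "Alice", "time": 3},
    {"type": "sneeze", "agent": "Bob", "time": 4},
]

def danced(agent, events=EVENTS, now=NOW):
    """∃e . dance(e) ∧ agent(e, agent) ∧ time(e) < now"""
    return any(
        e["type"] == "dance" and e["agent"] == agent and e["time"] < now
        for e in events
    )

print(danced("Alice"))  # True
print(danced("Bob"))    # False
```

Because each event is an object in its own right, further conjuncts (patient, location, temporal relations between two events) can be added without changing the arity of any predicate.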
More representation issues
• Propositional attitudes (modal logic)
– “Bob thinks that I am a gummi bear”
• thinks(bob, gummi(me))?
• thinks(bob, “He is a gummi bear”)?
– Usually, the solution involves intensions (^p) which are,
roughly, the set of possible worlds in which predicate p is true
• thinks(bob, ^gummi(me))
– Computationally challenging
• Each agent has to model every other agent’s mental state
• This comes up all the time in language
– E.g., if you want to talk about what your bill claims that you bought, vs.
what you think you bought, vs. what you actually bought.
More representation issues
• Multiple quantifiers:
“In this country, a woman gives birth every 15 minutes.
Our job is to find her, and stop her.”
-- Groucho Marx
• Deciding between readings
– “Bob bought a pumpkin every Halloween.”
– “Bob put a warning in every window.”
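The two readings of "Bob put a warning in every window" differ only in quantifier order, which a small model check makes visible. The windows, warnings, and the PUT_IN relation are invented for illustration.

```python
# Hypothetical model: a different warning in each window.
WINDOWS = {"win1", "win2"}
WARNINGS = {"warn1", "warn2"}
PUT_IN = {("warn1", "win1"), ("warn2", "win2")}  # (warning, window)

def every_wide(put_in):
    """∀w . window(w) → ∃x . warning(x) ∧ put(bob, x, w)"""
    return all(any((x, w) in put_in for x in WARNINGS) for w in WINDOWS)

def a_wide(put_in):
    """∃x . warning(x) ∧ ∀w . window(w) → put(bob, x, w)"""
    return any(all((x, w) in put_in for w in WINDOWS) for x in WARNINGS)

print(every_wide(PUT_IN))  # True: each window has some warning
print(a_wide(PUT_IN))      # False: no single warning is in every window
```

World knowledge favors the first reading for the warning sentence, just as it favors "a different pumpkin each year" for the Halloween sentence; the logic alone does not decide.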
More representation issues
• Other tricky stuff
– Non-intersective adjectives
– Generalized quantifiers
• “Cats like naps.”
• “The players scored a goal.”
– Pronouns and anaphora
• “If you have a dime, put it in the meter.”
– … etc., etc.
Mapping Sentences
to Logical Forms
CCG Parsing
• Combinatory Categorial Grammar (CCG)
– Lexicalized PCFG
– Categories encode
argument sequences
• A/B means a category that
can combine with a B to
the right to form an A
• A \ B means a category
that can combine with a B
to the left to form an A
– A syntactic parallel to the
lambda calculus
Learning to map sentences
to logical form
• Zettlemoyer and Collins (IJCAI 05, EMNLP 07)
Some Training Examples
CCG Lexicon
Parsing Rules (Combinators)
Right: X : f(a) → X/Y : f Y : a
Left: X : f(a) → Y : a X\Y : f
Additional rules:
• Composition
• Type-raising
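The two application combinators can be sketched as functions over (category, semantics) pairs. The encoding of a complex category as a (result, slash, argument) tuple and the names Item, forward, and backward are assumptions for this illustration; composition and type-raising are not covered.

```python
from typing import Any, NamedTuple

class Item(NamedTuple):
    cat: Any  # atomic category as str, or (result, slash, argument) tuple
    sem: Any  # semantics: a constant or a Python function

def forward(left, right):
    """Right application: X/Y : f  combined with  Y : a  gives  X : f(a)."""
    if isinstance(left.cat, tuple):
        result, slash, arg = left.cat
        if slash == "/" and arg == right.cat:
            return Item(result, left.sem(right.sem))
    return None

def backward(left, right):
    """Left application: Y : a  combined with  X\\Y : f  gives  X : f(a)."""
    if isinstance(right.cat, tuple):
        result, slash, arg = right.cat
        if slash == "\\" and arg == left.cat:
            return Item(result, right.sem(left.sem))
    return None

texas = Item("NP", "texas")
kansas = Item("NP", "kansas")
# borders := (S\NP)/NP : λx.λy.borders(y,x)
borders = Item((("S", "\\", "NP"), "/", "NP"),
               lambda x: lambda y: ("borders", y, x))

vp = forward(borders, kansas)  # S\NP : λy.borders(y, kansas)
s = backward(texas, vp)        # S : borders(texas, kansas)
print(s.cat, s.sem)            # S ('borders', 'texas', 'kansas')
```

Each syntactic combination step performs exactly one function application on the semantics, which is the sense in which CCG is a syntactic parallel to the lambda calculus.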
CCG Parsing Example
Parsing a Question
Lexical Generation
Input Training Example
Texas borders Kansas.
Logical form:
borders(Texas, Kansas)
• Input: a training example (Si, Li)
• Computation:
– Create all substrings of consecutive words in Si
– Create categories from Li
– Create lexical entries that are the cross products
of these two sets
• Output: Lexicon Λ
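The computation above can be sketched as follows. The step that creates categories from the logical form is not specified here, so the sketch takes the categories as a hand-written input; that choice and the function names substrings and genlex are assumptions.

```python
from itertools import product

def substrings(sentence):
    """All substrings of consecutive words in the sentence."""
    words = sentence.rstrip(".").split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

def genlex(sentence, categories):
    """Cross product of word substrings and candidate categories."""
    return set(product(substrings(sentence), categories))

# Categories written by hand for the logical form borders(Texas, Kansas).
CATEGORIES = [
    "NP : texas",
    "NP : kansas",
    "(S\\NP)/NP : λx.λy.borders(y,x)",
]

lexicon = genlex("Texas borders Kansas.", CATEGORIES)
print(substrings("Texas borders Kansas."))
print(len(lexicon))  # 18 = 6 substrings × 3 categories
```

Most of the generated entries are wrong (for example "Texas" paired with NP : kansas); the cross product deliberately overgenerates and leaves it to parameter estimation to keep only the useful entries.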
GENLEX Cross Product
Input Training Example
Texas borders Kansas.
Logical form:
borders(Texas, Kansas)
Output Substrings
Texas
borders
Kansas
Texas borders
borders Kansas
Texas borders Kansas
Output Categories
NP : texas
NP : kansas
(S\NP)/NP : λx.λy.borders(y,x)
Output Lexicon = Output Substrings × Output Categories (cross product)
GENLEX Output Lexicon
Words                   Category
Texas                   NP : texas
Texas                   NP : kansas
Texas                   (S\NP)/NP : λx.λy.borders(y,x)
borders                 NP : texas
borders                 NP : kansas
borders                 (S\NP)/NP : λx.λy.borders(y,x)
…                       …
Texas borders Kansas    NP : texas
Texas borders Kansas    NP : kansas
Texas borders Kansas    (S\NP)/NP : λx.λy.borders(y,x)
Weighted CCG
Given a log-linear model with a CCG lexicon Λ, a
feature vector f, and weights w:
The best parse is: y* = argmax_y w ∙ f(x,y)
where we consider all possible parses y for the
sentence x given the lexicon Λ.
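A minimal sketch of the argmax, assuming the candidate parses have already been enumerated and each comes with a sparse feature dictionary. The feature names and the candidate list are invented; a real parser would search the space of parses with dynamic programming rather than list them.

```python
def score(weights, features):
    """Linear score w ∙ f(x, y) for sparse feature dictionaries."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

def best_parse(weights, candidates):
    """Return the (parse, features) pair with the highest score."""
    return max(candidates, key=lambda c: score(weights, c[1]))

# Hypothetical lexical features: one per lexical entry used in the parse.
weights = {"borders:=(S\\NP)/NP": 1.5, "borders:=NP": -0.5}
candidates = [
    ("borders(texas, kansas)", {"borders:=(S\\NP)/NP": 1.0}),
    ("some other analysis", {"borders:=NP": 1.0}),
]

parse, feats = best_parse(weights, candidates)
print(parse)  # borders(texas, kansas)
```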
Parameter Estimation
for Weighted CCG Parsing
Inputs: Training set {(Si, Li) | i = 1, …, n} of sentences Si with logical forms Li;
initial lexicon Λ, initial weights w, number of iterations T
Computation: For t = 1 … T, i = 1 … n:
Step 1: Check correctness
Let y* = argmax_y w ∙ f(Si, y)
If L(y*) = Li, skip to next i
Step 2: Lexical generation
Set λ = Λ ∪ GENLEX(Si, Li)
Let y’ = argmax_{y s.t. L(y) = Li} w ∙ f(Si, y), parsing with lexicon λ
Define λi to be the lexical entries in y’
Set Λ = Λ ∪ λi
Step 3: Update parameters
Let y’’ = argmax_y w ∙ f(Si, y)
If L(y’’) ≠ Li
Set w = w + f(Si, y’) - f(Si, y’’)
Output: Lexicon Λ and parameters w
(Here L(y) denotes the logical form produced by parse y.)
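The parameter update in Step 3 is a perceptron-style update, which can be sketched in isolation. This sketch assumes the candidate parses for a sentence are given as (logical_form, features) pairs and leaves out lexical generation entirely; perceptron_step and dot are hypothetical names.

```python
def dot(w, f):
    """Sparse dot product w ∙ f."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def perceptron_step(w, candidates, gold_lf):
    """One update: w = w + f(best correct parse) - f(best parse overall).

    candidates: list of (logical_form, feature_dict) pairs (assumed given).
    Returns the weights unchanged if the best parse is already correct or
    if no candidate yields the gold logical form.
    """
    predicted = max(candidates, key=lambda c: dot(w, c[1]))
    if predicted[0] == gold_lf:
        return w
    correct = [c for c in candidates if c[0] == gold_lf]
    if not correct:
        return w
    best_correct = max(correct, key=lambda c: dot(w, c[1]))
    new_w = dict(w)
    for k, v in best_correct[1].items():
        new_w[k] = new_w.get(k, 0.0) + v
    for k, v in predicted[1].items():
        new_w[k] = new_w.get(k, 0.0) - v
    return new_w

candidates = [("wrong", {"a": 1.0}), ("gold", {"b": 1.0})]
w1 = perceptron_step({}, candidates, "gold")
print(w1)  # {'b': 1.0, 'a': -1.0}
w2 = perceptron_step(w1, candidates, "gold")
print(w2 == w1)  # True: the gold parse now scores highest
```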
Example Learned Lexical Entries
Challenge Revisited
Disharmonic Application
Missing Content Words
Missing content-free words
A complete parse
Geo880 Test Set
Zettlemoyer & Collins 2007
Zettlemoyer & Collins 2005
Wong & Mooney 2007
Summing Up
• Hypothesis: Principle of Compositionality
– Semantics of NL sentences and phrases can be composed
from the semantics of their subparts
• Rules can be derived which map syntactic analysis to semantic
representation (Rule-to-Rule Hypothesis)
– Lambda notation provides a way to extend FOPL for this purpose
– But coming up with rule-to-rule mappings is hard
• Idioms, metaphors, and other non-compositional aspects of
language make things tricky (e.g. fake gun)
