The Global Wordnet Grid:
anchoring languages
to universal meaning
Piek Vossen
Irion Technologies/Vrije Universiteit Amsterdam
6th International Plain Language Conference,
October 11-14th, 2007, Amsterdam
Overview:



Problem: effective language and communication
• From human to human
• From human to machine
• From machine to machine
• From human to machine and back to human, maybe via other machines...
Solution: anchoring language to universal meaning
• Wordnets: networks of words related through meaning
• The Global Wordnet Grid: wordnets for languages connected to each other through an ontology
Future:
• Equal access to the knowledge and information on the Internet for all people, regardless of language and background
• Systems that start to understand language
6th International PLAIN language Conference, 11-14th October, Amsterdam
Problem
Language is inherently vague and
ambiguous

Communication through language:


mediates between the expectation of the Speaker
and the Hearer => half a word is enough
Language is not fully descriptive but
minimally sufficient:


Do not bother the Hearer with information that is
already known => rely on background knowledge
Use a minimal set of words and expressions to
avoid memory overload => words and
expressions have multiple meanings
Understanding is fundamentally impossible
Concept in our head: sweet pet; rabbit; wanna hug; with carrots and rosemary; divine appearance announcing spring
"gavagai"
Plato with beard
W.V.O.Quine (1964): inscrutability of reference
Full understanding is fundamentally
impossible BUT?



People do communicate...
People even communicate with computers...
As long as language is effective:


meaning= to have the desired effect!
Link language to useful content!
What is effective computer-mediated
language?

Computers store information and knowledge in textual form:
• People search information and knowledge by 'querying' computers
• Effective Computer Mediated Communication (CMC) = find what you need and nothing else
Computers analyze information and knowledge:
• Collect data and send alerts, reports and facts
Computers connect people:
• Support communication across people by analyzing communication or translating languages
[Figure: an Information Seeker turns a concept into an expression in language ("Words….") and sends its strings as a query to an index of strings (ape, …, energy, …, mass, …, zebra). An Information Provider likewise turns a concept into an expression in language ("….Words"), and its strings are stored as information in that index.]
Conceptual match, linguistic mismatch
[Figure: the Information Seeker's expression is "my cell phone…." and the Information Provider's expression is "….mobile". The index of strings (ape, …, mobile, …, zebra) contains "mobile", so the query strings do not match although the concepts do.]
Conceptual mismatch, linguistic match
[Figure: the Information Seeker's expression is "my cell phone…." and the Information Provider's expression is "….nerve cells". The index of strings (ape, …, cell, …, zebra) contains "cell", so the strings match although the concepts differ.]
Conceptual mismatch, linguistic match
[Figure: the Information Seeker's expression is "police cell …." and the Information Provider's expression is "…. nerve cells". The index of strings (ape, …, cell, …, zebra) contains "cell", so the strings match although the concepts differ.]
Conceptual match, linguistic mismatch
[Figure: the Information Seeker's expression is "neuron …." and the Information Provider's expression is "….nerve cells". The index of strings (ape, …, cell, …, zebra) contains "cell" but not "neuron", so the strings do not match although the concepts do.]
Recall & Precision
[Figure: Venn diagram over a search engine's database with all documents. Labels: query “cell”; found: “nerve cell”, “police cell”; intersection of found and relevant: “cell phone”; relevant: “mobile phones”.]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
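The two ratios on this slide can be computed directly from sets of documents. The following is a minimal sketch; the document ids and the choice of which documents count as relevant are made up for illustration and do not come from the Blair & Maron study.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for two collections of document ids."""
    found, relevant = set(found), set(relevant)
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision

# Hypothetical example: the user wants documents about cell phones.
relevant = {"d1", "d2", "d3", "d4", "d5"}  # cell phone / mobile phone documents
found = {"d1", "d6", "d7"}                 # documents containing the string "cell"

recall, precision = recall_precision(found, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this made-up case only one of five relevant documents is found, and two of the three hits are about other kinds of cells.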
Useless dialogues with Alice-bot
It is useful to anchor meaning!
Anchoring already takes place all over the
world through standardization:




• measures and units: meter, liter, kilo
• terminological databases, legal definitions, contracts
• international cooperation
• ontologies: definitions of the meaning of concepts in a formal knowledge representation system (first-order logic), so that a computer can reason with them
Solution
How can we anchor the meaning of
words?

We can anchor words to each other:


semantic network or wordnet
We can anchor words to logical implications:

a formal ontology
Relational model of meaning
[Figure: network relating the words animal, man, woman, boy, girl, cat, dog, kitten and puppy to one another.]
Princeton WordNet




Developed by George Miller and his team at
Princeton University, as the implementation
of a mental model of the lexicon
Organized around the notion of a synset: a
set of synonyms in a language that represent
a single concept
Semantic relations between concepts
Covers over 100,000 concepts and over
120,000 English words
Wordnet: a network of semantically
related words
{conveyance;transport}
{vehicle}
{motor vehicle; automotive vehicle}
{car mirror}
{armrest}
{car door}
{doorlock}
{car; auto; automobile; machine; motorcar}
{bumper}
{car window}
{cruiser; squad car; patrol car;
police car; prowl car}
{cab; taxi; hack; taxicab}
{hinge;
flexible joint}
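A synset network like the one on this slide can be stored as two small tables: one for the synonym sets and one for the links to the more general synset. The sketch below is an illustration only. The synset ids are hypothetical, and the hypernym links are an assumption about the arrows that were lost from the figure; it is not the Princeton WordNet database format.

```python
# Each synset is a set of synonyms that represent a single concept.
SYNSETS = {
    "conveyance": {"conveyance", "transport"},
    "vehicle": {"vehicle"},
    "motor_vehicle": {"motor vehicle", "automotive vehicle"},
    "car": {"car", "auto", "automobile", "machine", "motorcar"},
    "taxi": {"cab", "taxi", "hack", "taxicab"},
    "police_car": {"cruiser", "squad car", "patrol car", "police car", "prowl car"},
}

# Assumed hypernym links: specific synset -> more general synset.
HYPERNYM = {
    "vehicle": "conveyance",
    "motor_vehicle": "vehicle",
    "car": "motor_vehicle",
    "taxi": "car",
    "police_car": "car",
}

def synset_of(word):
    """Return the id of the first synset containing the word, or None."""
    for synset_id, words in SYNSETS.items():
        if word in words:
            return synset_id
    return None

def hypernym_chain(synset_id):
    """Walk from a synset up to the most general synset."""
    chain = [synset_id]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(hypernym_chain(synset_of("taxicab")))
```

A real wordnet also allows one word to appear in several synsets; this sketch returns only the first match to stay short.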
Wordnet family
Princeton WordNet (Fellbaum 1998): 115,000 concepts
EuroWordNet (Vossen 1998): 8 languages
BalkaNet (Tufis 2004): 6 languages
Global Wordnet Association: all languages
[Figure: EuroWordNet architecture. The words of each language are linked to the Inter-Lingual-Index (entries shown: Vehicle, Car, Train, …), which is linked to ontologies (SUMO, DOLCE; nodes shown: Object, Device, TransportDevice) and to Domains (shown: Transport, Road, Air, Water). Words shown per language:
English Words: vehicle, car, train
Dutch Words: voertuig, auto, trein
German Words: Fahrzeug, Auto, Zug
Spanish Words: vehículo, auto, tren
Italian Words: veicolo, auto, treno
French Words: véhicule, voiture, train
Czech Words: dopravní prostředník, auto, vlak
Estonian Words: liiklusvahend, auto, killavoor]
Wordnets as autonomous language-specific structures
[Figure: two hierarchies shown side by side.
Wordnet1.5 nodes: object; artifact, artefact (a man-made object); natural object (an object occurring naturally); block; body; instrumentality; implement; device; container; tool; instrument; box; spoon; bag.
Dutch Wordnet nodes: voorwerp {object}; blok {block}; lichaam {body}; werktuig {tool}; bak {box}; lepel {spoon}; tas {bag}.]
Complex equivalence relations
1. Multiple Targets (1:many)
Dutch wordnet: schoonmaken (to clean) matches with 4
senses of clean in WordNet1.5:
• make clean by removing dirt, filth, or unwanted substances from
• remove unwanted substances from, such as feathers or pits, as of chickens or fruit
• remove in making clean; "Clean the spots off the rug"
• remove unwanted substances from - (as in chemistry)
2. Multiple Sources (many:1)
Dutch wordnet: versiersel near_synonym versiering
Target record: decoration.
3. Multiple Targets and Sources (many:many)
Dutch wordnet: toestel near_synonym apparaat
Target records: machine; device; apparatus; tool
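The three cases above can be told apart mechanically once the equivalence links are listed as pairs. The following is a rough sketch, not the EuroWordNet tooling: the sense labels such as clean#1 are hypothetical placeholders for the four WordNet1.5 senses, and the near_synonym relation is only implied by two Dutch entries sharing a target.

```python
from collections import defaultdict

# (Dutch entry, English target record) pairs from the slide.
LINKS = [
    ("schoonmaken", "clean#1"), ("schoonmaken", "clean#2"),
    ("schoonmaken", "clean#3"), ("schoonmaken", "clean#4"),
    ("versiersel", "decoration"), ("versiering", "decoration"),
    ("toestel", "machine"), ("toestel", "device"),
    ("toestel", "apparatus"), ("toestel", "tool"),
    ("apparaat", "machine"), ("apparaat", "device"),
    ("apparaat", "apparatus"), ("apparaat", "tool"),
]

def classify(links):
    """Label each source entry as 1:1, 1:many, many:1 or many:many."""
    targets = defaultdict(set)   # source -> its targets
    sources = defaultdict(set)   # target -> its sources
    for source, target in links:
        targets[source].add(target)
        sources[target].add(source)
    labels = {}
    for source, its_targets in targets.items():
        many_targets = len(its_targets) > 1
        many_sources = any(len(sources[t]) > 1 for t in its_targets)
        labels[source] = ("many" if many_sources else "1") + ":" + \
                         ("many" if many_targets else "1")
    return labels

labels = classify(LINKS)
for word, label in labels.items():
    print(word, label)
```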
Complex equivalence relations
Gaps in the English WordNet:

genuine, cultural gaps: unknown in English culture:


Dutch: klunen, to walk on skates over land from one
frozen body of water to the next
pragmatic gaps: the concept is known but is not
expressed by a single lexicalized form in English:

Dutch: kunstproduct = artifact substance <=> artifact
object
From EuroWordNet to Global WordNet



Global Wordnet Association:
http://www.globalwordnet.org
Biennial conference: India (2002), Czech Republic (2004),
Korea (2006), Hungary (2008), ....
Currently, wordnets exist for more than 40
languages, including:
Arabic, Bantu, Basque, ...., Chinese, Bulgarian, Estonian,
Hebrew, ...., Icelandic, Japanese, Kannada, Korean,
Latvian, Latin, ....Nepali, Persian, Romanian, Sanskrit,
Tamil, Thai, Turkish, .... Zulu

Many languages are genetically and typologically
unrelated
Some downsides



Construction is not done uniformly
Coverage differs
Not all Wordnets can communicate with one another:






not linked
linked to different versions: 1.5, 1.6, 1.7, 2.0 and now 3.0, 3.1
linked with different relations
Proprietary rights restrict free access and usage
A lot of the semantics is duplicated
Complex and obscure equivalence relations due to
linguistic differences between English and other
languages
Next step: Global WordNet Grid
[Figure: the Global WordNet Grid. The words of each language are linked to a shared Inter-Lingual Ontology (nodes shown: Object, Device, TransportDevice). Words shown per language:
English Words: vehicle, car, train
Dutch Words: voertuig, auto, trein
German Words: Fahrzeug, Auto, Zug
Spanish Words: vehículo, auto, tren
Italian Words: veicolo, auto, treno
French Words: véhicule, voiture, train
Czech Words: dopravní prostředník, auto, vlak
Estonian Words: liiklusvahend, auto, killavoor]
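Linking every wordnet to one shared ontology means that two languages can be connected without a direct link between them. The sketch below illustrates that idea for four of the languages in the figure. Only TransportDevice appears in the figure; the concept ids Car and Train, and the choice to attach the "vehicle" words to TransportDevice, are assumptions made for the example.

```python
# language -> word -> ontology concept (concept ids partly hypothetical)
GRID = {
    "en": {"vehicle": "TransportDevice", "car": "Car", "train": "Train"},
    "nl": {"voertuig": "TransportDevice", "auto": "Car", "trein": "Train"},
    "de": {"Fahrzeug": "TransportDevice", "Auto": "Car", "Zug": "Train"},
    "fr": {"véhicule": "TransportDevice", "voiture": "Car", "train": "Train"},
}

def translate(word, source_lang, target_lang):
    """Find target-language words anchored to the same ontology concept."""
    concept = GRID[source_lang].get(word)
    if concept is None:
        return []
    return [w for w, c in GRID[target_lang].items() if c == concept]

print(translate("trein", "nl", "fr"))
print(translate("car", "en", "de"))
```

Adding a language only requires linking its words to the ontology; no pairwise tables between languages are needed.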
The Ontology: main features


Formal, artificial ontology serves as universal index of
concepts
List of concepts is not just based on the lexicon of a
particular language (unlike in EuroWordNet) but uses
ontological observations:





Lexicalization in a language is not sufficient to warrant inclusion in
the ontology
Lexicalization in all or many languages may be sufficient
Ontological observations will be used to define the concepts in the
ontology
Concepts are related in a type hierarchy
Concepts are defined with axioms in the Knowledge
Interchange Format (KIF), based on first-order predicate
calculus and atomic elements
Concepts by ontological observations

Types and Roles among the hyponyms of dog in Wordnet:



husky, lapdog; toy dog; hunting dog; working dog; dalmatian,
coach dog, carriage dog; basenji; pug, pug-dog; Leonberg;
Newfoundland; Great Pyrenees; spitz; griffon, Brussels griffon,
Belgian griffon; corgi, Welsh corgi; poodle, poodle dog; Mexican
hairless; pooch, doggie, doggy, barker, bow-wow; cur, mongrel,
mutt
Current WordNet treatment:
(1) a husky is a kind of dog
(2) a husky is a kind of working dog
What’s wrong?
(2) is defeasible, (1) is not:
*This husky is not a dog => RIGID TYPE
This husky is not a working dog => ROLE, NON-RIGID
Ontology versus wordnet

Hierarchy of disjoint types:
Canine → PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky

Wordnet:

NAMES for TYPES:
{poodle}EN, {poedel}NL, {pudoru}JP
→ (instance x Poodle)

LABELS for ROLES:
{watchdog}EN, {waakhond}NL, {banken}JP
→ ((instance x Canine) and (role x GuardingProcess))
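The split between names for types and labels for roles can be made explicit in a lexicon record. The following is a minimal sketch under stated assumptions: the Definition class and the helper functions are hypothetical, and the output is written in prefix form, "(and ...)", rather than the infix "and" used on the slide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Definition:
    type_: str                  # rigid type in the ontology
    role: Optional[str] = None  # process in which the instance plays a role

# (word, language) -> definition, following the slide's examples.
LEXICON = {
    ("poodle", "EN"): Definition("Poodle"),
    ("poedel", "NL"): Definition("Poodle"),
    ("pudoru", "JP"): Definition("Poodle"),
    ("watchdog", "EN"): Definition("Canine", "GuardingProcess"),
    ("waakhond", "NL"): Definition("Canine", "GuardingProcess"),
    ("banken", "JP"): Definition("Canine", "GuardingProcess"),
}

def is_rigid(word, lang):
    """A word names a rigid type when its definition carries no role."""
    return LEXICON[(word, lang)].role is None

def to_kif(word, lang):
    """Render the definition as a KIF-style expression (prefix notation)."""
    d = LEXICON[(word, lang)]
    if d.role is None:
        return f"(instance x {d.type_})"
    return f"(and (instance x {d.type_}) (role x {d.role}))"

print(to_kif("poedel", "NL"))
print(to_kif("watchdog", "EN"))
```

Words with the same definition in different languages end up as translation equivalents without any direct link between them.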
Properties of the Ontology




Minimal: terms are distinguished by essential
properties only
Comprehensive: includes all distinct
concept types of all Grid languages
Allows definitions via KIF of all words that
express non-rigid, non-essential properties of
types
Logically valid, allows inferencing
Ontology versus Wordnet

Not added to the type hierarchy:
{straathond}NL (a dog that lives in the streets)
→ ((instance x Canine) and (habitat x Street))

Added to the type hierarchy:
{klunen}NL (to walk on skates over land from one frozen
body of water to the next)
KluunProcess => WalkProcess
Axioms:
(and (instance x Human) (instance y Walk) (instance z
Skates) (wear x z) (instance s1 Skate) (instance s2
Skate) (before s1 y) (before y s2) etc…

National dishes, customs, games,....
Ontology versus Wordnet

Refer to sets of types in specific circumstances or to
concepts that are dependent on these types; next to
{rivierwater}NL (river water) there are many others:
{theewater}NL (water used for making tea)
{koffiewater}NL (water used for making coffee)
{bluswater}NL (water used for extinguishing fire)

Relate to linguistic phenomena:

gender, perspective, aspect, diminutives, politeness,
pejoratives, part-of-speech constraints
KIF expression for gender marking
{teacher}EN
((instance x Human) and (agent x
TeachingProcess))



{Lehrer}DE ((instance x Man) and (agent x
TeachingProcess))
{Lehrerin}DE ((instance x Woman) and
(agent x TeachingProcess))
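With definitions like these, a system can work out that the English word covers both German words, but not the other way round. The sketch below shows one way to check this; the SUPERTYPE table and the covers function are assumptions for illustration, not part of KIF or of any existing wordnet.

```python
# Type hierarchy fragment: Man and Woman are subtypes of Human.
SUPERTYPE = {"Man": "Human", "Woman": "Human"}

# (word, language) -> (type of x, process in which x is the agent)
DEFS = {
    ("teacher", "EN"): ("Human", "TeachingProcess"),
    ("Lehrer", "DE"): ("Man", "TeachingProcess"),
    ("Lehrerin", "DE"): ("Woman", "TeachingProcess"),
}

def is_a(type_name, ancestor):
    """True if type_name equals ancestor or lies below it in the hierarchy."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE.get(type_name)
    return False

def covers(general, specific):
    """True if every referent of `specific` is also a referent of `general`."""
    g_type, g_process = DEFS[general]
    s_type, s_process = DEFS[specific]
    return g_process == s_process and is_a(s_type, g_type)

print(covers(("teacher", "EN"), ("Lehrerin", "DE")))
print(covers(("Lehrer", "DE"), ("teacher", "EN")))
```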
KIF expression for perspective
sell: subj(x), direct obj(z),indirect obj(y)
buy: subj(y), direct obj(z),indirect obj(x)
FinancialTransaction
(and (instance x Human)(instance y Human)
(instance z Entity) (instance e FinancialTransaction)
(source x e) (destination y e) (patient e)
The same process, but a different perspective through
subject and object realization. Other examples: Russian has
two verbs for marry; apprendre in French can mean both teach and learn
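The sell and buy frames above can both be mapped onto the same event description. The sketch below is an illustration only; the function name and the dictionary layout are hypothetical, while the role names source, destination and patient follow the slide.

```python
def financial_transaction(verb, subj, direct_obj, indirect_obj):
    """Map a 'sell' or 'buy' clause onto one FinancialTransaction event."""
    if verb == "sell":
        # sell: subj(x), direct obj(z), indirect obj(y)
        source, destination = subj, indirect_obj
    elif verb == "buy":
        # buy: subj(y), direct obj(z), indirect obj(x)
        source, destination = indirect_obj, subj
    else:
        raise ValueError(f"unsupported verb: {verb}")
    return {
        "event": "FinancialTransaction",
        "source": source,
        "destination": destination,
        "patient": direct_obj,
    }

# "Ann sells a book to Bob" and "Bob buys a book from Ann"
sold = financial_transaction("sell", "Ann", "book", "Bob")
bought = financial_transaction("buy", "Bob", "book", "Ann")
print(sold == bought)
```

Both sentences yield the same event, so a question phrased with one verb can be answered from a text that uses the other.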
Advantages of the Global Wordnet Grid

Shared and uniform world knowledge:




universal inferencing
uniform text analysis and interpretation
More compact and less redundant databases
A clearer notion of how languages map to the
knowledge


better criteria for expressing knowledge
better criteria for understanding variation
Future
Language technology: a hole in one!
golf
club(s)
thesaurus
Linguistic
analysis
golf
clubs
Synonyms,
Semantic network
Index concepts rather than words

Meaning of a word in context:
• Domain of the document: Juventus => football
• Topic of the paragraph: transfer scandal => business, crime
• Phrase: linguistically-motivated combination of words: [wing player]football player in [police cell]jail
• Topic of the query: Can I order chicken wings? => food
• Phrase: [chicken wings]dish
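Indexing concepts instead of strings can be illustrated with a toy disambiguator that picks a meaning of "cell" from the surrounding words. This is a rough sketch, assuming a hand-made table of cue words; the concept names, the cue words and the documents are all invented for the example, and real systems use far richer context (domain, topic, phrases).

```python
from collections import defaultdict

# word -> candidate concept -> cue words that signal it (all hypothetical)
SENSES = {
    "cell": {
        "CellPhone": {"phone", "mobile", "nokia"},
        "Jail": {"police", "prison", "jail"},
        "Neuron": {"nerve", "neuron", "brain"},
    },
}

def concept_of(word, context_words):
    """Pick the concept whose cue words overlap most with the context."""
    context = {w.lower() for w in context_words}
    best, best_score = None, 0
    for concept, cues in SENSES.get(word, {}).items():
        score = len(cues & context)
        if score > best_score:
            best, best_score = concept, score
    return best

def build_index(docs):
    """Index documents by concept rather than by string."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for word in words:
            concept = concept_of(word, words)
            if concept is not None:
                index[concept].add(doc_id)
    return index

docs = {"d1": "the police cell was empty", "d2": "nerve cell research"}
index = build_index(docs)
query_concept = concept_of("cell", "my cell phone is broken".split())
print(query_concept, index.get(query_concept, set()))
```

A string index would return both documents for this query; the concept index returns none, because neither document is about cell phones.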
Expansion with clear hyponymy
dog
hunting dog
puppy
dachshund
lapdog
street dog
poodle
bitch
watchdog
short hair
dachshund
long hair
dachshund
Expansion from a type to roles
Expansion with clear hyponymy
dog
hunting dog
puppy
dachshund
lapdog
street dog
poodle
bitch
watchdog
short hair
dachshund
long hair
dachshund
Expansion from a role to types and other roles
Thought
Objects
in reality
Ontology
携帯電話
(keitaidenwa )
Texts
Knowledge &
information
Expression
Useful and effective behavior:
-reason over knowledge
-collect information and data
-deliver services and be helpful
Automotive ontology: (http://www.ontoprise.de)
Dialogue system
[Figure: components shown: Question Analysis; Dialogue Manager; Search Engine; Topic detection (Word, Concept); User Model (Intention, Satisfaction, Emotion); Information State (Positive, Negative, Relations); Text Analysis; Website. Concepts shown: information, products, mobile, accessories, head phone, repair.]
Example dialogue:
• Can I help you?
• My head phone is broke.
• Would you like repair or products?
• I want to buy a new one.
• Can you say more about products?
• It is for my cell phone.
• Can you give more details?
• It is a Nokia 6110
• I got the following accessories for you. Please have a look.
• That is not what I want!
Communicative dialog system

Prevent deadlocks:




• Detects vagueness and ambiguity (which meaning of cell?)
• Detects topic changes
• Uses negative feedback: “No jails, I want cell phones!”
• Can handle out-of-domain questions (users do not know what the system knows):
"We do not have hotel rooms but we do have electronic equipment".
"No, we do not have portophones but we do have other electronic equipment such as cell phones"
[Figure: concept hierarchy with the nodes space, object, equipment, room, hotel room, cell phone and portophone.]
THANK YOU FOR YOUR ATTENTION