Semantic Web:
Asking the Right Questions
Włodzisław Duch
Department of Informatics,
Nicolaus Copernicus University, Poland
Google: W Duch
Julian Szymański
Dept. of Electronic, Telecommunication & Informatics,
Gdańsk University of Technology, Poland
Goal: Reaching human-level competence in all aspects of NLP.
Semantic Web – what and how?
Semantic memories.
The human way, or neurocognitive inspirations.
Word games and dialogues for knowledge acquisition
Search guided by semantic knowledge.
Example: psychiatric diagnoses.
Words in the brain.
Priming, brains and creativity.
Taking heads and words games.
Enhancing document representations.
Medical applications.
Semantic Web: idea
Words are ambiguous => Some information is difficult to find.
Duch – a name? A nikname? A spirit? A ghost?
Software agents can’t understand.
I have a dream for the Web [in which computers] become capable of
analyzing all the data on the Web – the content, links, and transactions
between people and computers. A ‘Semantic Web’, which should make this
possible, has yet to emerge, but when it does, the day-to-day mechanisms
of trade, bureaucracy and our daily lives will be handled by machines talking
to machines. The ‘intelligent agents’ people have touted for ages will finally
Tim Berners-Lee, 1999
“The Semantic Web is an extension of the current web in which
information is given well-defined meaning, better enabling computers
and people to work in cooperation.”
Tim Berners-Lee, James Hendler, Ora Lassila,
The Semantic Web, Scientific American, May 2001
Semantic Web – how?
Several roads leading to the Semantic Web:
• Use Natural Language Processing to understand the meaning.
• Use automatic clusterization techniques.
• Use NLP to annotate the web.
• Annotate the web in semi-automatic way.
• Help to ask the right questions.
First, create standards for metadata, define how to express knowledge.
The Semantic Web is a major research initiative of the World Wide
Web Consortium (W3C) to create a metadata-rich Web of resources
that can describe themselves not only by how they should be
displayed (HTML) or syntactically (XML), but also by the meaning of the
W3C Semantic Web Activity Page
Semantic Web: standards
Extend HTML => XML;
B = Ontology languages built on top of RDF and RDF Schema
OWL Full
OWL Lite
RDF Schema
XML, XML Schema
• A = Ontology languages based on XML syntax
• B = Ontology languages built on top of RDF and RDF Schema
• Semantic Web: not much more than standards
SM Techno
XML: customizable eXtensible Markup Language, additional tags
XHTML: Extensible HTML
XML Schema: is a language for providing and restricting the structure and
content of elements contained within XML documents.
RDF: Resource Description Framework, language for expressing data
models, which refer to objects ("resources") and their relationships.
An RDF-based model can be represented in XML syntax.
RDF Schema: a vocabulary for describing properties and classes of RDFbased resources, with semantics for generalized-hierarchies of such
properties and classes.
DAML: DARPA Agent Markup Language
OIL: Ontology Inference Layer
OWL: more sophisticated vocabulary for describing properties and classes:
relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"),
equality, richer typing of properties, characteristics of properties (e.g.
symmetry), and enumerated classes.
SPARQL: is a protocol and query language for semantic web data sources.
Collect info from distributed sources, linking RDFs.
Knowledge: how?
CYC, Douglas Lenat, started in 1984.
Developed by CyCorp, with 2.5 millions of assertions linking over
300.000 concepts and using thousands of micro-theories (2007).
Cyc-NL is still a “potential application”, knowledge representation in
frames is quite complicated and thus difficult to use.
Open Mind Common Sense Project (MIT):
a WWW collaboration with over 14,000 authors, who contributed 710,000
sentences; used to generate ConceptNet, very large semantic network.
Some interesting projects are being developed now around this network but
no systematic knowledge has been collected.
Other such projects: HowNet (Chinese Academy of Science),
FrameNet (Berkley), various large-scale ontologies.
DBPedia - 1.6 million concepts, 91 million RDF triples from Wikipedia
Semantic Memory Models
Endel Tulving „Episodic and Semantic Memory” 1972.
Semantic memory refers to the memory of meanings and understandings.
It stores concept-based, generic, context-free knowledge.
Permanent container for general knowledge (facts, ideas, words etc).
Hierarchical Model
Collins & Quillian, 1969
Semantic network
Collins & Loftus, 1975
Semantic memory
Hierarchical model of semantic memory (Collins and Quillian, 1969),
followed by most ontologies.
Connectionist spreading activation model (Collins and Loftus, 1975), with
mostly lateral connections.
Our implementation is based on connectionist model, uses relational
database and object access layer API.
The database stores three types of data:
• concepts, or objects being described;
• keywords (features of concepts extracted from data sources);
• relations between them.
IS-A relation us used to build ontology tree, serving for activation
spreading, i.e. features inheritance down the ontology tree.
Types of relations (like “x IS y”, or “x CAN DO y” etc.) may be defined
when input data is read from dictionaries and ontologies.
Creating SM
The API serves as a data access layer providing logical
operations between raw data and higher application layers.
Data stored in the database is mapped into application
objects and the API allows for retrieving specific concepts/keywords.
Two major types of data sources for semantic memory:
1. machine-readable structured dictionaries directly convertible into
semantic memory data structures;
2. blocks of text, definitions of concepts from dictionaries/encyclopedias.
3 machine-readable data sources are used:
The Suggested Upper Merged Ontology (SUMO) and the the MIdLevel Ontology (MILO), over 20,000 terms and 60,000 axioms.
WordNet lexicon, more than 200,000 words-sense pairs.
ConceptNet, concise knowledgebase with 200,000 assertions.
Creating SM – free text
WordNet hypernymic (a kind of … ) IS-A relation + Hyponym
and meronym relations between synsets (converted into
concept/concept relations), combined with ConceptNet relation
such as: CapableOf, PropertyOf, PartOf, MadeOf ...
Relations added only if in both Wordnet and Conceptnet.
Free-text data: Merriam-Webster, WordNet and Tiscali.
Whole word definitions are stored in SM linked to concepts.
A set of most characteristic words from definitions of a given concept.
For each concept definition, one set of words for each source dictionary is
used, replaced with synset words, subset common to all 3 mapped back to
synsets – these are most likely related to the initial concept.
They were stored as a separate relation type.
Articles and prepositions: removed using manually created stop-word list.
Phrases were extracted using ApplePieParser + concept-phrase relations
compared with concept-keyword, only phrases that matched keywords
were used.
Semantic => vector reps
Word w in the context: (w,Cont), distribution of brain activations.
States (w,Cont)  lexicographical meanings: clusterize (w,Cont) for all
contexts, define prototypes (wk,Cont) for different meanings wk.
Simplification: use spreading activation in semantic networks to define .
How does the activation flow? Try this algorithm on collection of texts:
Perform text pre-processing steps: stemming, stop-list, spell-checking ...
Use MetaMap with a very restrictive settings to discover concepts,
avoiding highly ambiguous results when mapping text to UMLS ontology.
Use UMLS relations to create first-order cosets (terms + all new terms
from included relations); add only those types of relations that lead to
improvement of classification results.
Reduce dimensionality of the first-order coset space, leave all original
features; use feature ranking method for this reduction.
Repeat last two steps iteratively to create second- and higher-order
enhanced spaces, first expanding, then shrinking the space.
Create X vectors representing concepts.
RDF and semantic knowledge
vwCRK: certainty – truth –
Concept Relation Keyword
Similar to RDF in semantic web.
Simplest rep. for massive evaluation/association:
CDV – Concept Description Vectors, forming
Semantic Matrix
body part
Visual representation
Visualization of Wordnet, Wikipedia, ULMS and CiteSeer, help to edit
knowledge; available at
Visual representation
Visualization of relations between Wordnet and Wikipedia concepts:
IS_A: specific features from more general objects.
Inherited features with w from superior relations;
v decreased by 10% + corrected during interaction with user.
Similar: defines objects which share features with each other;
acquire new knowledge from similar objects through swapping of
unknown features with given certainty factors.
Excludes: exchange some unknown features, but reverse the sign of
w weights.
Entail: analogical to the logical implication, one feature automatically
entails a few more features (connected via the entail relation).
Atom of knowledge contains strength and the direction of relations
between concepts and keywords coming from 3 components:
directly entered into the knowledge base;
deduced using predefined relation types from stored information;
obtained during system's interaction with the human user.
The goal of the 20 question game is to guess a concept that
the opponent has in mind by asking appropriate questions. has a version that is now implemented in some toys!
Based on concepts x question table T(C,Q) = usefulness of Q for C.
Learns T(C,Q) values, increasing after successful games, decreasing
after lost games. Guess: distance-based.
SM does not assume fixed questions.
Use of CDV admits only simplest form “Is it related to X?”, or “Can it be
associated with X?”, where X = concept stored in the SM.
Needs only to select a concept, not to build the whole question.
Once the keyword has been selected it is possible to use the full power of
semantic memory to analyze the type of relations and ask more
sophisticated questions.
How is the concept selected?
20q for semantic data acquisition
Play 20 questions with Avatar!
Think about animal – system tries to
guess it, asking no more than 20
questions that should be answered
only with Yes or No.
Given answers narrows the subspace
of the most probable objects.
System learns from the games –
obtains new knowledge from
interaction with the human users.
Is it vertebrate? Y
Is it mammal? Y
Does it have hoof? Y
Is it equine? N
Is it bovine? N
Does it have horn? N
Does it have long neck? Y
I guess it is giraffe.
20 Q web
Algorithm for 20 questions game
I ( keyw ord )    p  keyw ord  v i  log p  keyw ord  v i 
p(keyword=vi) is fraction of concepts for which the keyword has value vi
Subspace of candidate concepts O(A) is selected:
O (A) = {i; d=|CDVi-ANS | is minimal }
d (C D V , A N S ) 
 1  dist ( C D V
 ANSn ) / | ANS |
n 1
dist ( x , y )  
 x
 y
 y
y  NU LL 
x  NU LL 
where CDVi is a vector for i-concept, ANS is a partial vector of retrieved
answers we can deal with user mistakes choosing d > minimal
Automatic data acquisition
Basic semantic data obtained from aggregation of machine readable
dictionaries: Wordnet, ConceptNet, Sumo Ontology
– Used relations for semantic category: animal
– Semantic space truncated using word popularity rank:
Rank ( w ord ) 
IC  G R  BN C
m ax( Rank )
IC – information content is an amount of appearances of the particular
word in WordNet descriptions.
GR - GoogleRank is an amount of web pages returned by Google
search engine for a given word .
BNC - are the words statistics taken from British National Corpus.
Initial semantic space reduced to 94 objects and 72 features
Human interaction & knowledge acquisition
• Data obtained from machine readable dictionaries:
Not complete
Not Common Sense
Sometimes specialized concepts
Some errors
• Knowledge correction in the
semantic space:
w0 *  
w 
W0 – initial weight, initial knowledge (from dictionaries)
ANS – answer given by user
N – number of answers
β - parameter indicating importance of initial knowledge
N 
Active Dialogues
Dialogues with the user for obtaining new knowledge/features:
While system fails guess the object:
I give up. Tell me what did you think of?
The concepts used in the game corrects the semantic space
While two concepts has the same CDV
Tell me what is characteristic for <concept1/2> ?
The new keywords for specified concepts are stored in semantic memory
While system needs more knowledge for same concept:
I don’t have any particular knowledge about <concept>.
Tell me more about <concept>.
System obtains new keywords for a given concept.
Experiments in animal domain
WordNet, ConceptNet, SumoMilo ontology + MindNet project as
knowledge sources; added to SM only if it appears in at least 2 sources.
Basic space: 172 objects, 475 features, 5031 relations.
# features/concept = CDV density.
Initial CDV density = 29,
adding IS_A relations =41,
adding similar, entails, excludes=46.
Quality Q = NS/N =
#searches with success/# all searches.
Error E = 1-Q = 1-NS/N.
For 10 concepts selected with #features
close to the average.
Q~0.8, after 5 repetition E ~ 18%, so some learning is needed.
Quality measures
Initial semantic space: average # of games for correct recognition ~2.8.
This depends on the number of semantic neighbors close to this concept.
Completeness of concept representation:
• is CDV description sufficient to win the game?
• how far is it from the golden standard (manually created)?
4 measures of the concept description quality:
Sd = Nf(GS)–Nf(O) = #Golden Standard features - #features in O.
how many features are still missing compared to the golden standard.
SGS=Si [1–d(CDVi(GS),CDVi(O))] similarity based on co-occurrence.
SNO = #features in O but not found in GS (reverse of SGS).
Difw= Si [|CDVi(O)–CDVi(GS)|/m, average difference of O and GS.
Learning from games
Select O randomly with preference for larger # features, p~exp(-N(O)/N))
N(O) = #features in O, and N = total number of features,
Learning procedure:
CDV(O) representation of the chosen concept O is inspected, and if
necessary corrected.
CDV(O) is removed from the memory.
Try to learn the concept O by playing the 20 questions game.
Average results for 5 test objects as a function of # games shown.
NOr = SNO + SGS graph showing the average growth of the number of
features as a function of the number of games played.
Randomization of questions helps to find different features in each game.
Average number of games to learn selected concepts Nf=2.7.
After the first successful game when a particular concept has been
correctly recognized it was always found properly.
After 4 games only a few new features are added.
DSM IV experiment
Asks relevant questions, formulate the query precisely.
DSM IV: Diagnostic and Statistical Manual of Mental Disorders, used by
all psychiatrist as a reference text for diagnosis of mental disorders.
Lists different categories of mental disorders and diagnostic criteria.
6 decision trees for diagnosis in specialized psychiatric domains.
A search for mental problem is based on symptoms: ask only important
questions, depending on what you already know!
Compare results with diagnosis based on decision trees in DSM IV.
Algorithm formulates questions using our 20Q approach.
Result: questions asked to increase information gain lead nearly 45%
faster to the diagnosis then the sequential tests in most decision trees.
DSM IV experiment results
Even in mistakes in answers for some classes of problems less questions
were asked than with the standard decision tree approach.
Identify objects
What is it? How to identify animals, flowers, cars and other natural
objects using images? Let the search engine help you!
Few conclusions
Complex knowledge in frames is not too useful for large-scale search.
Semantic search requires extensive knowledge.
We do not have even the simplest common sense knowledge
description, in many applications such representations are sufficient.
It should be easier to generate this knowledge rather than wait for
embodied systems.
Semantic memory built from parsing dictionaries, encyclopedias,
ontologies, results of collaborative projects.
Active search is used to assign features found for concepts that are
not far in ontology (for example, have same parents).
Large-scale effort to create a numerical version of Wordnet for general
applications is necessary, specialized knowledge is also important.
Word games may help to create and correct some knowledge.
20Q is easier than Turing text, good intermediate step.
Time for word games Olympics!
Neurocognitive approach
Why is NLP so hard? Only human brains are adapted to it.
Ambitious approach: make an artificial brain!
Computational cognitive neuroscience: aims at rather detailed neural
models of cognitive functions, first annual CNN conf. Nov. 2005.
Brain simulation with ~1010 neurons and ~1015 synapses (NSI San Diego),
1 sec = 50 days on a 27 processor Beowulf cluster.
Neurocognitive informatics: focus on simplified models of higher
cognitive functions: in case of NLP various types of associative
memory: recognition, semantic and episodic.
Many speculations, because we do not know the underlying brain
processes, but models explaining most neuropsychological syndromes
exist; computational psychiatry is rapidly developing since ~ 1995.
“Roadmap to human level intelligence” – workshops ICANN’05, WCCI’06
Words in the brain
Psycholinguistic experiments show that most likely categorical,
phonological representations are used, not the acoustic input.
Acoustic signal => phoneme => words => semantic concepts.
Phonological processing precedes semantic by 90 ms (from N200 ERPs).
F. Pulvermuller (2003) The Neuroscience of Language. On Brain Circuits of
Words and Serial Order. Cambridge University Press.
networks inferred
from ERP and fMRI
Phonological neighborhood density = the number of words that are similar
in sound to a target word. Similar = similar pattern of brain activations.
Semantic neighborhood density = the number of words that are similar in
meaning to a target word.
Brain areas involved
Organization of the word recognition circuits in the left temporal lobe
has been elucidated using fMRI experiments (Cohen et al. 2004).
How do words that we hear, see and thinking of activate the brain?
Seeing words: orthography, phonology, articulation, semantics.
Visual word form area (VWFA) in the left occipitotemporal sulcus is strictly
unimodal visual area.
Adjacent lateral inferotemporal multimodal area (LIMA) reacts to both
auditory & visual stimulation, has cross-modal phonemic and lexical links.
Likely: homolog of the VWFA in the auditory stream, the auditory word form
area, located in the left anterior superior temporal sulcus; this area shows
reduced activity in developmental dyslexics.
Large variability in location of these regions in individual brains.
Insights and brains
Activity of the brain while solving problems that required insight and that
could be solved in schematic, sequential way has been investigated.
E.M. Bowden, M. Jung-Beeman, J. Fleck, J. Kounios, „New approaches to
demystifying insight”. Trends in Cognitive Science 2005.
After solving a problem presented in a verbal way subjects indicated
themselves whether they had an insight or not.
An increased activity of the right hemisphere anterior superior temporal
gyrus (RH-aSTG) was observed during initial solving efforts and insights.
About 300 ms before insight a burst of gamma activity was observed,
interpreted by the authors as „making connections across distantly related
information during comprehension ... that allow them to see connections
that previously eluded them”.
Insight interpreted
What really happens? My interpretation:
LH-STG represents concepts, S=Start, F=final
understanding, solving = transition, step by step, from S to F
if no connection (transition) is found this leads to an impasse;
RH-STG ‘sees’ LH activity on meta-level, clustering concepts into
abstract categories (cosets, or constrained sets);
connection between S to F is found in RH, leading to a feeling of vague
gamma burst increases the activity of LH representations for S, F and
intermediate configurations;
stepwise transition between S and F is found;
finding solution is rewarded by emotions during Aha! experience;
they are necessary to increase plasticity and create permanent links.
What features of our brain/minds are most mysterious?
Consciousness? Imagination? Intuition? Emotions, feelings?
Higher mental functions?
Masao Ito (director of RIKEN, neuroscientist) answered: creativity.
Still domain of philosophers, educators and a few psychologists, for ex.
Eysenck, Weisberg, or Sternberg (1999), who defined creativity as:
“the capacity to create a solution that is both novel and appropriate”.
MIT Encyclopedia of Cognitive Sciences has 1100 pages, 6 chapters
about logics & over 100 references to logics in the index.
Creativity: 1 page (+1 page about „creative person”).
Intuition: 0, not even mentioned in the index.
In everyday life we use intuition more often than logics.
Unrestricted fantasy? Creativity may arise from higher-order schemes!
Use templates for analytical thinking, J. Goldenberg & D. Mazursky,
Creativity in Product Innovation, CUP 2002
Memory & creativity
Creative brains accept more incoming stimuli from the surrounding
environment (Carson 2003), with low levels of latent inhibition responsible
for filtering stimuli that were irrelevant in the past.
“Zen mind, beginners mind” (S. Suzuki) – learn to avoid habituation!
Creative mind maintains complex representation of objects and situations.
Pair-wise word association technique may be used to probe if a connection
between different configurations representing concepts in the brain exists.
A. Gruszka, E. Nęcka, Creativity Research Journal, 2002.
Word 1
Priming 0,2 s
Word 2
Words may be close (easy) or distant (difficult) to connect;
priming words may be helpful or neutral;
helpful words are related semantically or phonologically (hogse for horse);
neutral words may be nonsensical or just not related to the presented pair.
Results for groups of people of low/high creativity are surprising …
Creativity & associations
Hypothesis: creativity depends on the associative memory,
ability to connect distant concepts together.
Results: creativity is correlated with greater ability to associate words &
susceptibility to priming, distal associations show longer latencies before
decision is made.
Neutral priming is strange!
• for close words and nonsensical priming words creative people do worse
than less creative; in all other cases they do better.
• for distant words priming always increases the ability to find association,
the effect is strongest for creative people.
Latency times follow this strange patterns.
Conclusions of the authors:
More synaptic connections => better associations => higher creativity.
But results for neutral priming are puzzling!
Paired associations
So why neutral priming for close associations and
nonsensical priming words degrades results of creative people?
High creativity = many connections between microcircuits; nonsensical
words add noise, increasing activity between many circuits; in a densely
connected network adding noise creates confusion, the time need for
decision is increased because the system has to settle in specific attractor.
If creativity is low and associations distant, noise does not help, because
there are no connections, and priming words contribute only to chaos.
Nonsensical words increase overall activity in the intermediate configurations. For creative people resonance between distant microcircuits is
possible: this is called stochastic resonance, observed in perception.
For priming words with similar spelling and close words the activity of the
second word representation is higher, always increasing the chance of
connections and decreasing latency. For distant words it will not help, as
intermediate configurations are not activated.
Words: simple model
• make the simplest testable model of creativity;
• create interesting novel words that capture some features of products;
• understand new words that cannot be found in the dictionary.
Model inspired by the putative brain processes when new words are being
invented starting from some keywords priming auditory cortex.
Phonemes (allophones) are resonances, ordered activation of phonemes
will activate both known words as well as their combinations; context +
inhibition in the winner-takes-most leaves only a few candidate words.
Creativity = network+imagination (fluctuations)+filtering (competition)
Imagination: chains of phonemes activate both word and non-word
representations, depending on the strength of the synaptic connections.
Filtering: based on associations, emotions, phonological/semantic density.
Generating novel words
Approximations: associative neural networks, self-organizing
networks, or statistical models capturing phono/morphology.
• create probability models for linking phonemes and syllables;
• create semantic and phonological distance measures for words.
Statistical algorithm to find novel words:
• Read initial pool of keywords.
• Find phonological and semantic associations to increase the pool.
• Break all words into chains of phonemes and chains of morphemes.
• Find all combinations of fragments forming longer chunks ranked
according to their phonological probability (estimating ngram plausibility).
• For final ranking use estimation of semantic density around morphemes
in the newly created words.
Words: experiments
A real letter from a friend:
I am looking for a word that would capture the following qualities: portal to
new worlds of imagination and creativity, a place where visitors embark on
a journey discovering their inner selves, awakening the Peter Pan within.
A place where we can travel through time and space (from the origin to the
future and back), so, its about time, about space, infinite possibilities.
FAST!!! I need it sooooooooooooooooooooooon.
creativital, creatival (creativity, portal), used in
creativery (creativity, discovery), (strategy+creativity)
discoverity = {disc, disco, discover, verity} (discovery, creativity, verity)
digventure ={dig, digital, venture, adventure, venue, nature} still new!
imativity (imagination, creativity); infinitime (infinitive, time)
infinition (infinitive, imagination), already a company name
learnativity (taken, see
portravel (portal, travel); sportal (space, sport, portal), taken
quelion – lion of query systems! Web site
timagination (time, imagination); timativity (time, creativity)
tivery (time, discovery); trime (travel, time)
Word games
Word games were popular before computer games.
They are essential to the development of analytical thinking.
Until recently computers could not play such games.
The 20 question game may be the next great challenge for AI,
because it is more realistic than the unrestricted Turing test;
a World Championship could involve human and software players.
Finding most informative questions requires knowledge and creativity.
Performance of various models of semantic memory and episodic
memory may be tested in this game in a realistic, difficult application.
Asking questions to understand precisely what the user has in mind is
critical for search engines and many other applications.
Creating large-scale semantic memory is a great challenge:
ontologies, dictionaries (Wordnet), encyclopedias, MindNet
(Microsoft), collaborative projects like Concept Net (MIT) …
Realistic goals?
Different applications may require different knowledge representation.
Start from the simplest knowledge representation for semantic memory.
Find where such representation is sufficient, understand limitations.
Drawing on such semantic memory an avatar may formulate and may
answer many questions that would require exponentially large number of
templates in AIML or other such language.
Adding intelligence to avatars involves two major tasks:
• building semantic memory model;
• provide interface for natural communication.
create 3D human head model, with speech synthesis & recognition, use it to
interact with Web pages & local programs: a Humanized InTerface (HIT).
Control HIT actions using the knowledge from its semantic memory.
Types of memory
Neurocognitive approach to NLP: at least 4 types of memories.
Long term (LTM): recognition, semantic, episodic + working memory.
Input (text, speech) pre-processed using recognition memory model to
correct spelling errors, expand acronyms etc.
For dialogue/text understanding episodic memory models are needed.
Working memory: an active subset of semantic/episodic memory.
All 3 LTM are coupled mutually providing context for recogniton.
Semantic memory is a permanent storage of conceptual data.
• “Permanent”: data is collected throughout the whole lifetime of the
system, old information is overridden/corrected by newer input.
• “Conceptual”: contains semantic relations between words and uses them
to create concept definitions.
SM & neural distances
Activations of groups of neurons presented in activation
space define similarity relations in geometrical model.
Similarity between concepts
Left: MDS on vectors from neural network.
Right: MDS on data from psychological experiments with perceived
similarity between animals.
Vector and probabilistic models are approximations to this process.
Concept Description Vectors
Drastic simplification: for some applications SM is used in a more efficient
way using vector-based knowledge representation.
Merging all types of relations => the most general one:
“x IS RELATED TO y”, defining vector (semantic) space.
{Concept, relations} => Concept Description Vector, CDV.
Binary vector, shows which properties are related or have sense for a
given concept (not the same as context vector).
Semantic memory => CDV matrix, very sparse, easy storage of large
amounts of semantic data.
Search engines: {keywords} => concept descriptions (Web pages).
CDV enable efficient implementation of reversed queries:
find a unique subsets of properties for a given concept
or a class of concepts = concept higher in ontology.
What are the unique features of a sparrow? Proteoglycan? Neutrino?
HIT the Web
Haptek avatar as a plug-in in WWW browser.
Connect to web pages, read their contents,
send queries and read answers from specific fields in web forms.
Access Q/A pages, like MIT Start, or Brainboost that answer reasonably to
many questions.
“The HAL Nursery”, “the world's first Child-Machine Nursery”,
Ai Research, is hosting a collection of “Virtual Children”, or
HAL personalities developed by many users through conversation.
HAL is using reinforcement learning techniques to acquire language,
through trial and error process similar to that infants are using.
A child head with child voice makes it much more interesting to play with.
Haptek heads may work with many chatterbots, we focus on use of SM.
Several word games with our head are here:
Talking Head
SM is the brain, HIT needs a talking head and voice interface.
Haptek’s PeoplePutty tools have been used (inexpensive) to create
a 3-D talking head; only the simplest version is used.
Haptek player is a plugin for Windows browsers, or embedded
component in custom programs; both versions were used.
High-fidelity natural voice synthesis with lips synchronization may be
added to Haptek characters.
Free MS Speech Engine, i.e. MS Speech API (SAPI 5) has been used to
add text to speech synthesis and speech to text voice recognition.
OGG prerecorded audio files may be played.
Haptek movements, gestures, face expressions and animation sequences
may be programmed and coordinated with speech using JavaScript,
Visual Basic, Active-X Controls, C++, or ToolBook.
Result: HIT that can interact with web pages, listen and talk, sending
information both ways, hiding the text pages from the user.
Interaction with Web pages is based on Microsoft .NET framework.
HIT – larger view …
T-T-S synthesis
Brain models
Speech recognition
HIT projects
Talking heads
Cognitive Architectures
VR avatars
DREAM architecture
databases interface
Natural input
Text to
Control of
DREAM is concentrated on the cognitive functions + real time control, we plan to
adopt software from the HIT project for perception, NLP, and other functions.
Distance calculation
Euclidean distance used for binary Yes/No answer,
otherwise the distance ||K–A|| is:
K  A 
K i  Ai
where |Ki–Ai| depends on the type of relation Ki and answer Ai:
- if either Ki or Ai is Unknown then |Ki–Ai|=0.5
- if either Ki or Ai is Not Applicable then |Ki–Ai|=1
-otherwise Ki and Ai are assigned numerical values:
-Yes=1, Sometimes = 2/3, Seldom = 1/3, No = 0
CDV matrix for a single ontology reduced to animal kingdom was initially
used to avoid storage size problems.
The first few steps find keywords with IG≈1.
CDV vectors are too sparse, with 5-20, average 8, out of ~5000 keywords.
In later stages IG is small, very few concepts eliminated.
More information is needed in the semantic memory! Active dialogs.
Semantic memory
Applications, eg.
20 questions game
Humanized interface
Part of speech tagger
& phrase extractor
On line dictionaries
Active search and
dialogues with users
Puzzle generator
Semantic memory may be used to invent automatically a large number of
word puzzles that the avatar presents.
This application selects a random concept from all concepts in the
memory and searches for a minimal set of features necessary to uniquely
define it; if many subsets are sufficient for unique definition one of them is
selected randomly.
It is an Amphibian, it is orange and has black spots.
How do you call this animal?
A Salamander.
It has charm, it has spin, and it has charge.
What is it?
If you do not know, ask Google!
Quark page comes at the top …
Medical applications: goals & questions
• Can we capture expert’s intuition evaluating document’s similarity,
finding its category?
• How to include a priori knowledge in document categorization –
important especially for rare disease.
• Provide unambiguous annotation of all concepts.
• Acronyms/abbreviations expansion and disambiguation.
• How to make inferences from the information in the text, assign values
to concepts (true, possible, unlikely, false).
• How to deal with the negative knowledge (not been found, not
consistent with ...).
• Automatic creation of medical billing codes from text.
• Semantic search support, better specification of queries.
• Question/answer system.
• Integration of text analysis with molecular medicine.
Provide support for billing, knowledge discovery, dialog systems.
Example of clinical summary discharges
Jane is a 13yo WF who presented with CF bronchopneumonia.
She has noticed increasing cough, greenish sputum production,
and fatique since prior to 12/8/03.
She had 2 febrile epsiodes, but denied any nausea, vomiting,
diarrhea, or change in appetite.
Upon admission she had no history of diabetic or liver
Her FEV1 was 73% 12/8 and she was treated with 2 z-paks, and
on 12/29 FEV1 was 72% at which time she was started on Cipro.
She noted no clinical improvement and was admitted for a 2 week
IV treatment of Tobramycin and Meropenem.
Unified Medical Language System (UMLS)
semantic types
“Virus” causes “Disease or Syndrome”
semantic relation
 Other relations: “interacts with”, “contains”, “consists
of” , “result of”, “related to”, …
 Other types: “Body location or region”, “Injury or
Poisoning”, “Diagnostic procedure”, …
UMLS – Example (keyword: “virus”)
 Metathesaurus:
Concept: Virus,
CUI: C0042776,
Semantic Type: Virus
Definition (1 of 3):
Group of minute infectious agents characterized by a
lack of independent metabolism and by the ability to
replicate only within living host cells; have capsid, may
have DNA or RNA (not both). (CRISP Thesaurus)
Synonyms: Virus, Vira Viridae
 Semantic Network:
"Virus" causes "Disease or Syndrome"
Summary discharge test data
Clinical Data
Disease name No. of records
Average size
Reference Data
size [bytes]
Cystic fibrosis
Cerebral palsy
Otitis media
JRA - Juvenile Rheumatoid Arthritis
UTI - Urinary tract infection
Data processing/preparation
MMTx – discovers UMLS concepts in text
Reference Texts
ULMS concepts /feature prototypes/
Filtering - focus on
26 semantic types.
Features - UMLS concept IDs
Clinical Documents
UMLS concepts
Final data
Filtering using
existing space
Values indicate the
actual numbers of
concepts found in:
I – clinical texts
II – reference texts
Data statistics
• 10 classes
• 4534 vectors
• 807 features (out of 1097 found in reference texts)
• Majority: 19.1% (asthma class)
• Content based: 34.6% (frequency of class name in text)
• Very sparse vectors
• Feature values represent term frequency (tf) i.e. the number of
occurrences of a particular concept in text
Model of similarity I
Try to capture some intuitions combining evidence while scanning the
1. Initial distance between document D and the reference vectors
Rk should be proportional to d0k = ||D – Rk||  1/p(Ck) – 1
2. If a term i appears in Rk with frequency Rik > 0 but does not
appear in D the distance d(D,Rk) should increase by ik = a1Rik
3. If a term i does not appear in Rk but it has non-zero frequency Di
the distance d(D,Rk) should increase by ik = a2Di
4. If a term i appears with frequency Rik > Di > 0 in both vectors the
distance d(D,Rk) should decrease by ik = a3Di
5. If a term i appears with frequency 0 < Rik ≤ Di in both vectors the
distance d(D,Rk) should decrease by ik = a4Rik
Model of Similarity II
Given the document D, a reference vector Rk and probability p(i|Ck)
probability that the class of D is Ci should be proportional to:
S (Ck | D; Rk )  1    
 d 0 k   p(i | Ck )  ik  
where ik depends on adaptive parameters a1,…,a4 which may
be specific for each class.
Linear programming technique can be used to estimate ai by
maximizing similarity between documents and reference vectors:
d 0 k   p(i | Ck ) ik  min
with the constrains:
 p(i | C
) ij   p(i | Ck ) ik  d 0k  d 0 j ;
where k indicates the correct class.
k  j 1
SSV dec. tree
MLP (300
(Optimal C)
(0.01) (0.01)
10 Ref. vectors
10-fold crossvalidation accuracies in % for different feature
weightings. M0: tf frequencies; M1: binary data;
M 3 : 1  log(tf )
sij  1  log tf ij log N / df i
M 5:
1  log tf ij
N 
sij  round  10 
1  log l j
df i 
Enhancing representations
A priori knowledge is form of reference prototypes is not sufficient.
Experts reading the text activate their semantic memory and add a lot of
knowledge that is not explicitly present in the text.
Semantic memory is difficult to create: co-occurrence statistics does not
capture structural relations of real objects and features.
Better approximation (not as good as SM): use ontologies adding parent
concepts to those discovered in the text.
IBD => [C0021390] Inflammatory Bowel Diseases =>
-> [C0341268] Disorder of small intestine
-> [C0012242] Digestive System Disorders
-> [C1290888] Inflammatory disorder of digestive tract
-> [C1334233] Intestinal Precancerous Condition
-> [C0851956] Gastrointestinal inflammatory disorders NEC
-> [C1285331] Inflammation of specific body organs
-> [C0021831] Intestinal Diseases
-> [C0178283] [X]Non-infective enteritis and colitis
[C0025677] Methotrexate (Pharmacologic Substance) =>
-> [C0003191] Antirheumatic Agents
-> [C1534649] Analgesic/antipyretic/antirheumatic
Enhancing representations
Experts reading the text activate their semantic memory and add a lot of
knowledge that is not explicitly present in the text.
Co-occurrence statistics does not capture structural relations of real
objects and features, systematic knowledge is needed.
An approximation (not as good as SM): use ontologies adding related
concepts (use parent & other relations) to those discovered in the text.
IBD => [C0021390] Inflammatory Bowel Diseases =>
-> [C0341268] Disorder of small intestine
-> [C0012242] Digestive System Disorders
-> [C1290888] Inflammatory disorder of digestive tract
-> [C1334233] Intestinal Precancerous Condition
-> [C0851956] Gastrointestinal inflammatory disorders NEC
-> [C1285331] Inflammation of specific body organs
-> [C0021831] Intestinal Diseases
[C0025677] Methotrexate (Pharmacologic Substance) =>
-> [C0003191] Antirheumatic Agents
-> [C1534649] Analgesic/antipyretic/antirheumatic
Enhancing representations
MDS for original data
MDS on medical discharge summaries
after two enhancement steps.
. MDS on medical discharge summaries after two enhancement steps
Clusterization on enhanced data
MDS mapping of 4534 documents divided in 10 classes, using cosine
Direct, binarized vectors.
Enhanced by all semantic types, one step (parents only).
Enhanced by selected semantic types, one step.
Enhanced by selected semantic types, two steps.
Clusterization on enhanced data
MDS mapping of 4534 documents divided in 10 classes, using cosine
1. Initial representation, 807 features.
2. Enhanced by 26 selected semantic types, two steps, 2237 concepts
with CC >0.02 for at least one class.
Two steps create feedback loops A  B between concepts.
Structure appears ... is it interesting to experts?
Are these specific subtypes (clinotypes)?
Searching for topics
Discover topics, subclusters, more focused than general categories.
Map text on the 2007 MeSH (Medical Subject Headings) ontology,
more precise than ULMS.
Filter rare concepts (appearing in <1% docs) and very common
concepts (>99% docs); remove documents with too few concepts
(<1% of all) => smaller but better defined clusters.
Leave only 26 semantic types.
Ward’s clustering used, with silhouette measure of clustering quality.
Only 3 classes: two classes that mix most strongly (Pneumonia and
Otitis media), add the smallest class JRA.
Initial filtering: 570 concepts with 1%<tf<99%,1002 documents.
Semantic (26 types): 224 concepts, 908 docs with >1% concepts.
These 224 concepts have about 70.000 ULMS relations, only 500
belong to the 26 semantic types.
Enhancement: very restrictive, only ~25 most correlated added.
Start, iterations 2, 3 and 4 shown, 5 clinotypes may be distinguished.
More semantic relations
Neurocognitive approach to language understanding: use recognition,
semantic and episodic memory models, create graphs of consistent
concepts for interpretation, use spreading activation and inhibition to
simulate effect of semantic priming, annotate and disambiguate text.
For medical texts ULMS has >2M concepts, 15M relations … developing
a System for Unambiguous Concept Mapping in Medical Domain (with
Matykiewicz, Pestian), and ontology for common reason (with Szymanski)
Graphs of consistent concepts
General idea: when the text is read and analyzed activation of semantic
subnetwork is spread; new words automatically assume meanings that
increases overall activation, or the consistency of interpretation.
Many variants, all depend on quality of semantic network, some include
explicit competition among network nodes.
1. Recognition of concepts associated with a given concept:
1.1 look at collocations, and close co-occurrences, sort using average
distance and # occurrences;
1.2 accept if this is a ULMS concept; manually verify if not;
1.3 determine fine semantic types, what states/adjectives can be applied.
2. Create semantic network:
2.1 link all concepts, determine initial connection weights (non-symmetric);
2.2 add states/possible adjectives to each node (yes/no/confirmed …).
GCC analysis
After recognition of concepts and creation of semantic network:
3. Analyze text, create active subnetwork (episodic working memory) to
make inferences, disambiguate, and interpret the text.
3.1 find main unambiguous concepts, activate and spread their activations
within semantic network; all linked concepts become partially active,
depending on connection weights.
3.2 Polysemous words, acronyms/abbreviations in expanded form, add to
the overall activation; active subnetwork activates appropriate meanings
stronger than other meaning, inhibition between competing
interpretations decreases alternative meanings.
3.3 Use grammatical parsing and hierarchical semantic types constraints
(Optimality Theory) to infer the state of the concepts.
3.4 Leave only nodes with activity above some threshold (activity decay).
4. Associate combinations of particular activations with billing codes etc.
More conclusions
Neurocognitive NLP leads to interesting inspirations
(Sydney Lamb, Rice Univ, quite general book).
Creation of novel interesting words is possible
at the human competence level, opening a new
vista in creativity research and suggesting new experiments.
Specific (drastically simplified) representation of semantic knowledge is
sufficient in word games and query precisiation applications.
Various approximations to knowledge representation in brain networks
should be studied: from the use of a priori knowledge based on reference
vectors, through ontology-based enhancements, to graphs of consistent
concepts in spreading activation networks.
More work on semantic memory for common sense and
specialized applications is needed.
Sessions on Medical Text Analysis and billing annotation challenge,
April 1-5, 2007, IEEE CIDM, Honolulu, Hilton Hawaiian Village Hotel.
Google: Duch => Papers

neurocognitive inspiraitons in Natural Language Processing