Cognitive corpus-based LSP
lexicography – research and
implementation issues –
a case study on the Multilingual
Glossary on Risk Management
Gerhard Budin
University of Vienna
Austrian Academy of Sciences
8th of April, 2011
Our empirical research landscape
The Making of…
a Multilingual Glossary on
Risk Management
Motivations and Methods:
Terminologies for Risk Communication
• The Role of LSP Lexicography in domain communication
– Increasing the “transparency” of terms
– Help negotiate a common understanding of terms in intra-,
inter- and trans-disciplinary and transcultural discourse
– Help increase the consistency of risk discourse (written
and spoken) and increase understanding in target
– Reduce unnecessary synonyms, disambiguate polysemous terms,
help separate homonyms
– Help create risk terminologies in many languages
– Support knowledge sharing and knowledge transfer in
cooperative work environments
– Support cross-cultural discourse (e.g. translation and
parallel texts)
The Domains of Risk Management
• Multidisciplinary, diverse, and fragmented - or
• Transdisciplinary, overlapping, converging,
integrated, and complementary
• The need for mediating between different
approaches, cultures, and discourses:
Technological, engineering, research, science
Administration, legislation, monitoring
Social, sociological, political, cultural
Domain approaches (financial, ecological, chemical,
safety, geographical, planning and forecast, health, etc.)
WIN Project (FP6 2004-2009):
WP “Human Language Interoperability”
• Objectives
– WP 2200 is designed to support international risk
management and risk communication processes (within
the WIN project and beyond)
• Achieved results (with ongoing work)
– Large parallel corpora collection with risk-related texts and
lexical resources (fr, en, de, es, ro, fi, hu, ru)
– Multilingual index with conceptual structure
– Bibliography and codes of sources
– Risk Ontology
– Multilingual online terminology database
Integrative R&D Approach
• A combination of theoretical approaches and their methods
in order to achieve a result that is targeted towards the needs
of the project consortium and the cooperation partners
– Quantitative (computational) and qualitative (intellectual)
methods of corpus analysis
– Lexicographical and terminographical (word/text-oriented
and concept/knowledge-oriented)
– Text linguistics and translation studies
– Cross-cultural comparative approach and knowledge
system approach, multi-domain communication
– Knowledge engineering, computational semantics/Web 2.0
(ontologies, frame semantics, etc.)
– Cognitive Science approach (media pedagogy – eLearning,
specific learner support, interactive approach (mental
lexicon), usability engineering)
Motivation and Convergence of
Research Interests and Contexts
• Interest in cognitive science research applied to
terminology management, ontology engineering,
translation technologies, E-Learning systems design and
• Research Cluster 1 “Translation – Cognition –
Technologies” at the Center for Translation Studies,
University of Vienna
• Interdisciplinary Research Platform on Cognitive Science
– Cluster on Cognitive Linguistics
• Research Priority 1 Lexicology, Terminology, and Parallel
Corpora at the Institute for Corpus Linguistics and Text
Technology at the Austrian Academy of Sciences
Research contexts in several projects
• Previous and ongoing projects
– Dynamont
• Methodology for Creating Dynamic Ontologies, BMVIT, national research
programme “Semantic Systems” – multi-dimensional ontology modelling
– WIN (Wide Area Information Network on Risk Management): MGRM, the Multilingual
Glossary on Risk Management
• IP (Integrated Project) in FP6, 2004-2008, focus on creating a multilingual
terminology and ontology of risk management – risk ontology for natural hazards
– Montific - Multilingual ONTology for Internal Financial Control, an LLP project (Leonardo
da Vinci II)
• Building a “learning ontology” for an eLearning environment
– PERSPECTIVE - European Science Foundation: COST A 31 project
• cognitive linguistics – how “classifiers” are embodied in language incl. ontologies
– TES4IP - Terminology Services for the Intellectual Property Domain (Bridge project
funded by FFG, the Austrian Research Promotion Agency)
• Term extraction, multi-word term recognition, named entity recognition, legal
vocabularies and legal ontologies
• -> Ongoing study
– Cognitive Ontologies
• Designing, Generating and Using Domain Ontologies
Ontology Engineering
and Cognitive Science
• Cognitive Aspects have been of interest in a variety of
ontology engineering approaches
– Barry Smith
• Epistemological focus combined with work on domain ontologies (mainly
• Criticizing the epistemological foundations of terminology theory in
elaborating his foundational theory of ontology
– Aldo Gangemi
• DOLCE: Descriptive Ontology for Linguistic and Cognitive Engineering
• Foundational theory of ontology
• Many projects, also on tools and on domain ontologies
– But also many others (Guarino, Sheth, Obrst, Noy, et al.) have
done research on these aspects
– Some criticism that ontology evaluation focuses (only) on
syntactic evaluation for computational uses, i.e. the
classical scenario
“Cognitive Ontologies”
• Conceptual clarification:
– Ontologies of cognitive processes
• In neuroscience research, similar to other bio-medical
ontologies (cognitive atlas, neuropsychiatric
phenomena, ontology of cognitive objects, etc.)
– Ontologies with a focus on their cognitive aspects
• DOLCE and other cognitive-oriented approaches
• Constructivist epistemology for ontology building,
concerning the relation to “reality”
• Increasing convergence of these two concepts
Our own research
• Our previous and ongoing projects have been focusing
on cognitive adequacy of domain ontologies and their
use in knowledge acquisition in learning situations
– Terminology studies as a contribution from this
perspective (related research by Nistrup Madsen/Erdman
Thomsen 2005, 2009, etc.)
• Using DOLCE design patterns for multi-dimensional
conceptual modeling for ontology building
– the DYNAMONT project
• From domain corpora to terminologies and from there
to domain ontologies
– for eLearning scenarios – the MONTIFIC project
– for domain experts – the WIN/MULTH/MGRM project
Moving up (and down) the Ontology Spectrum
• The challenge: from linguistic-cultural diversity of discourse and free-form
lexical structures to a unified, formalized, axiomatized ontology – and back,
to support human understanding and social processes such as collaborative
• The method: an integrative, multi-level modelling approach specifying the
steps in a process-oriented workflow framework (with variable, combinable
steps depending on concrete needs) for
Gradual semantic enrichment
Gradual semantic formalization
Multi- and cross-lingual referencing/alignment for text management
Constant interaction between full texts and lex-term resources
• The technology: a multi-component workbench (i.e. Dynamont-WB incl.
ProTerm as a central element), using XML, RDF, OWL, SKOS, WordNet +
GlobalWordnet, MLIF (containing TBX, TMX, XLIFF, LMF, TMF, etc.),
FrameNet, etc.
• The advantage: full exploitation of all types of language resources (LR) and
knowledge organization systems (KOS), providing a framework not only for
their semantic enrichment and formalization as ontologies but also for
ontology-based multilingual authoring, text generation and translation
The global risk communication scenario
• Several projects since 1994 covering the following activities:
Thesaurus building
Creating multilingual terminology databases
Creating multilingual text corpora
Lexicographical glossary
Semantic enrichment (e.g. conceptual links, frame semantics)
Collection and analysis of relevant knowledge organization systems
Annotation of resources
Mark-up of resources (TBX, etc.)
Ontology building
Communication design
From texts and terminologies to ontologies
- and back to texts
• Using the Risk scenario
– Termbase
• Export XML
• Domain Models – meta-models -> patterns
– Text corpus
• Term extraction – comparative testing ProTerm, MultiTerm Extract,
• Aligning with termbase
• Convert to RDF
– Ontology import -> editor
– Mappings (GMT, XML, RDF, OWL, UML, comma delimited, RDB, for
different kinds of lex-term resources, FN->OWL, etc.)
• The MULTH-WIN Project as an example of methods
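The steps listed on this slide (term extraction, aligning with the termbase, converting to RDF) can be illustrated with a minimal sketch. It is not the ProTerm or Dynamont implementation: the function names, the `http://example.org/risk/` namespace, the concept identifier and the rule that the first term per language becomes the preferred label are all assumptions made for illustration. Only the SKOS and RDF property URIs are standard.

```python
# Sketch: align extracted term candidates with a termbase, then emit
# SKOS statements as N-Triples lines (standard library only).
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/risk/"  # hypothetical namespace


def align(candidates, termbase):
    """Split candidates into terms already in the termbase and new ones."""
    known = {
        term.lower(): concept_id
        for concept_id, langs in termbase.items()
        for terms in langs.values()
        for term in terms
    }
    matched = {c: known[c.lower()] for c in candidates if c.lower() in known}
    unmatched = [c for c in candidates if c.lower() not in known]
    return matched, unmatched


def literal(text, lang):
    """Format a language-tagged N-Triples literal, escaping \\ and "."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(termbase):
    """One skos:Concept per entry; first term per language is preferred
    (an assumption of this sketch), the others become altLabels."""
    lines = []
    for concept_id, langs in termbase.items():
        subject = f"<{BASE}{concept_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        for lang, terms in langs.items():
            for i, term in enumerate(terms):
                prop = "prefLabel" if i == 0 else "altLabel"
                lines.append(f"{subject} <{SKOS}{prop}> {literal(term, lang)} .")
    return lines


# Hypothetical termbase with one concept; the identifier is invented.
termbase = {"C2-17": {"en": ["afflux"], "de": ["Hochwasser durch Aufstau"]}}
matched, unmatched = align(["Afflux", "storm surge"], termbase)
triples = to_ntriples(termbase)
```

Here `matched` links the corpus candidate "Afflux" to the existing concept, while `unmatched` collects candidates that a terminologist would still have to review before they enter the termbase.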
Terminological frame semantics
R-PERCEPTION (X is risk)
EXPERIENCE (statistics, case studies)
OBSERVATION (monitoring)
SITUATION/CONTEXT (danger/hazard)
SIMULATION (course of events)
SUSCEPTIBILITY (capacity/people)
Terminological frame semantics
I. Pre-event B. Public awareness and planning, II. In-event: C. Events and
afflux/Hochwasser durch Aufstau
BE [[TYPE=flood], [PLACE=], [TIME=]],
Ordnance Survey
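The bracket notation on this slide, a predicate followed by named slots that may be empty, can be read into a simple data structure. The following is a minimal sketch under assumptions: the grammar is inferred from the single example shown here, and `parse_frame` and `open_slots` are hypothetical helper names, not part of any project tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches one slot such as "[TYPE=flood]" or the empty "[PLACE=]".
SLOT = re.compile(r"\[(\w+)=([^\[\]]*)\]")


def parse_frame(notation: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Split 'PRED [[SLOT=value], ...]' into a predicate and a slot dict.

    Empty slots are stored as None so that unfilled frame elements
    remain visible.
    """
    predicate, _, rest = notation.strip().partition(" ")
    slots = {name: (value.strip() or None) for name, value in SLOT.findall(rest)}
    return predicate, slots


def open_slots(slots: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of slots that have no filler yet."""
    return [name for name, value in slots.items() if value is None]


predicate, slots = parse_frame("BE [[TYPE=flood], [PLACE=], [TIME=]],")
```

For the example frame, `slots` holds a filler only for TYPE, and `open_slots(slots)` lists PLACE and TIME as still to be specified by context.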
Dynamont architecture, tools and workflows
Ontology Creation
Phase 1: Identify the Problem
Phase 2: Structure the Problem
Phase 3: Identify Purpose and
Phase 4: Identify concepts of
domain / subject matter
Phase 5: Create Knowledge Model
Phase 6: Create Application Profile
Phase 7: Create Acceptance
Phase 8: Create System
Phase 9: Implement System
The Glossary
• The paper version of the glossary is used by risk managers, civil
engineers, but also teachers, students, translators, journalists, etc.
• Generally, the purpose of such multilingual conceptual glossaries is to
improve domain communication and to facilitate mutual understanding
across linguistic boundaries.
• The concepts of risk management and their definitions presented in
this glossary were carefully selected from a large body of technical
literature and authentic text corpora in the respective languages.
• These sources are referenced in the bibliography.
• The multilingual glossary presented here includes 8 languages: English
and French as main pivot languages, as well as German, Spanish,
Romanian, Finnish, Hungarian, and Russian.
• It comprises about 230 central concepts of risk management with
about 400 definitions and about 1400 terms representing these
concepts in each language (including synonyms and hyperonyms),
indicating the conceptual relations between the entries.
The Glossary
• The following themes are used as the macro-structure of the glossary:
A. Risk assessment and technology assessment
B. Public perception of risk, planning, preparation and alarm,
C0. Risk events, equipment and operations, general terms
C1. Fire - events, equipment and operations
C2. Floods - events, equipment and operations
C3. Oil spills - events, equipment and operations.
• Each glossary entry follows the same micro-structure with the
following information elements:
– A conceptual number combined with a theme from the macro-structure
– The equivalent terms in the 8 languages, accompanied by grammatical
– The definitions of the concept in each language, including multiple
definitions that may differ from each other, accompanied by the textual
source of the definition, also including structural semantic information on
the concept
– Related terms and expressions.
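The macro- and micro-structure described above can be sketched as a small data model. This is an illustration under assumptions, not the schema of the actual database: the class and field names, the `entry_id` format, the concept number 17 and its assignment to theme C2 are hypothetical, and grammatical information is omitted. The term pair "afflux" / "Hochwasser durch Aufstau" comes from the frame semantics slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Theme codes from the macro-structure of the glossary.
THEMES = {"A", "B", "C0", "C1", "C2", "C3"}
# ISO 639-1 codes for the 8 glossary languages.
LANGUAGES = {"en", "fr", "de", "es", "ro", "fi", "hu", "ru"}


@dataclass
class Definition:
    text: str
    source: str  # textual source of the definition


@dataclass
class GlossaryEntry:
    theme: str   # macro-structure theme, e.g. "C2"
    number: int  # conceptual number
    terms: Dict[str, List[str]] = field(default_factory=dict)
    definitions: Dict[str, List[Definition]] = field(default_factory=dict)
    related: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject themes and language codes outside the glossary's scope.
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme: {self.theme}")
        unknown = (set(self.terms) | set(self.definitions)) - LANGUAGES
        if unknown:
            raise ValueError(f"unknown language codes: {sorted(unknown)}")

    @property
    def entry_id(self) -> str:
        """Conceptual number combined with the theme (format assumed)."""
        return f"{self.theme}-{self.number}"

    def missing_languages(self) -> List[str]:
        """Languages for which no equivalent term has been recorded."""
        return sorted(LANGUAGES - set(self.terms))


entry = GlossaryEntry(
    theme="C2",
    number=17,  # hypothetical concept number
    terms={"en": ["afflux"], "de": ["Hochwasser durch Aufstau"]},
)
```

A check such as `entry.missing_languages()` is one way to track which equivalents are still outstanding while a database is being extended to further languages.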
Research issues
Experimental settings
User studies, user modelling
Data modelling
Multilingual – multi-domain – cross-cultural
Knowledge dynamics - Dynamic knowledge representations
Cognitive studies
Conclusions and Outlook I
• Online terminology database is continuously used
• 8-language Glossary Version produced in February 2011
• Next steps in 2011:
– Work in progress!
– Database to be extended from 5 to 8 languages
– Full text corpora to be extended
– Promotion of the glossary in different user communities
– Term extraction, research
– Extension into more languages
– More scientific publications
Conclusions and outlook II
• Research perspectives
– Further research in
• Cognitive ontologies
• User modelling, usability of terminological databases
and LSP dictionaries
• Corpus-linguistic research – semantic annotation
• Multilingual, multi-domain, cross-cultural issues
Selected References
Budin, G. Socio-terminology and computational terminology – toward an
integrated, corpus-based research approach. In: de Cillia, Rudolf et al. (eds.).
Discourse, Politics, Identity. Tübingen: Stauffenburg Verlag, 2010, 21-31
Budin, G. Semantic Systems supporting Cross-Disciplinary Environmental
Communication. In: Hryniewicz, O.; Studzinski, J.; Szediw, A. (eds.). Environmental
Informatics and Systems Research. Vol 2 Workshop and application papers.
EnviroInfo 2007. Aachen 2007, 23-30
CEDIM, Center for Disaster Management and Risk Reduction Technology c/o
University of Karlsruhe (2005). Glossar: Begriffe und Definitionen aus den
Gangemi DOLCE
Greciano, G. (2001). L'harmonisation de la terminologie en Sciences du Risque. In
Proceedings of Security Conference, Montpellier XII. Council of Europe-FER.
Strasbourg, France.
Greciano, G. (2001). Les sciences du risque: convergences interculturelles. In
Proceedings of Risk Conference, Strasbourg X. Council of Europe-FER. Strasbourg,
Greciano, G. (2001). Pour un glossaire combinatoire plurilingue du Risque.
Proceedings of Risk-Conference, Mèze V. Council of Europe-FER. Strasbourg,
Massué, J.P. (2001). "Mobilisation de la Communauté scientifique au service de
l'amélioration de la gestion des risques". Mèze, FER-EUR-OPA. Strasbourg
Nistrup Madsen/Erdman Thomsen 2005, 2009
French / German / English / Spanish / Romanian / Finnish / Hungarian /
edited by Gertrud Gréciano, Gerhard Budin, Danielle Candel, John Humbley
with the support of the Commission of the European Union, the Universities of Strasbourg, Vienna,
and Helsinki, the Région Alsace, the Délégation générale à la langue française et aux langues de
France, and the Austrian Academy of Sciences.
Authors: Gertrud Gréciano (Strasbourg), Gerhard Budin (Vienna),
Annely Rothkegel (Chemnitz), Ulrike Hass (Essen)
Translators: Cornelia Cujba (Iasi), Attila Frigyer (Budapest), Luis Gonzalez (Caracas-Paris),
Csilla Höfler-Bornemisza (Vienna), Annikii Liimatainen (Helsinki),
Alexei Milko (Strasbourg-Moscow)
Scientific and technical cooperation: Steffi Baumann (Chemnitz), Aban Budin (Vienna),
Christian Burghard (Chemnitz), Dimitrij Dobrovolskij (Moscow-Vienna), Eva Haas
(Munich-Ispra), Natalia Jonkova (Moscow), Andra Moga (Iasi-Vienna), Maren Runte
(Essen), Julia Steuber (Essen), Virginie Tombeux (Paris), Elena Volgina (Moscow)
Thank you for your attention
Gerhard Budin
Center for Translation Studies
University of Vienna
Institute of Corpus Linguistics and Text Technology
Austrian Academy of Sciences

Talk, WU Symposium