The role of standards in the LT
community – an ISO perspective
Should we/you be afraid of standards?
Laurent Romary
Overview
• General background on standardization
• Standardization at work
– Basic standards - Languages and characters
• Providing references and definitions
– Terminologies, lexica, even more semantics…
• TMF, LMF, Dacts…
Quiz
• Standards being mentioned in this
presentation
– ISO 639, ISO 646, ISO 10646, ISO 12620, ISO
16642, ISO 3166, ISO 8859, ISO 15924, ISO
24613, ISO 24617(-2), ISO 8601
• Give yourself a mark:
– I know…. of these standards (out of 11)
Standardization
• Defining methods or models to facilitate:
– Exchange of data
– Interoperability between software components
– Comparability of results
• Involves
– From a scientific and technological point of view
• Stabilizing existing practices, knowledge
• Looking ahead for potential roadblocks (generalizations)
– From an organizational point of view
• International consensus, long term availability and maintenance
• Vertical vs. horizontal standardization
An example scenario: information extraction
[Figure: annotation layers stacked from bottom to top: primary data; part-of-speech tagging (produced by POS tagging); syntactic structures (produced by chunk parsing); semantic content (produced by content analysis)]
Horizontal view (W3C perspective)
[Figure: the same layers, with generic W3C technologies (XML, RDF, OWL, SOAP) applying across all of them]
Vertical view (ISO/TC 37/SC 4 perspective)
[Figure: the same layers, complemented by components specific to language resources that span them: linguistic models and descriptors (Data Categories), lexica, and evaluation]
Standards: a complex picture
• Official standardization bodies
– National: AFNOR, ANSI, DIN, BSI, MSA
– International: ISO, IEC, CEN, W3C, OASIS
• Specific fora
– Many! e.g.
• TEI (Text Encoding Initiative)
• LISA (Localization Industry Standards Association)
• Projects with a pre-normative purpose
– e.g. in Europe:
• EAGLES, Multext, MATE, ISLE, Lirics, Kyoto
ISO in short
• International Organization for
Standardization (http://www.iso.org)
– Administrative view
• Federation of national standardization bodies
– Technical view
• Organized in technical committees and subcommittees
– ISO technical committees
ISO process
CD = Committee Draft
DIS = Draft International Standard
DPAS = Draft Publicly Available Specification
DTR = Draft Technical Report
DTS = Draft Technical Specification
FDIS = Final Draft International Standard
IS = International Standard
NP = New Work Item Proposal
PAS = Publicly Available Specification
TR = Technical Report
TS = Technical Specification
WD = Working Draft
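The abbreviations above name successive stages of an ISO project. A minimal Python sketch of the usual order for an International Standard, assuming the simplified linear path NP → WD → CD → DIS → FDIS → IS (in practice some stages can be skipped; the names `IS_PATH` and `next_stage` are hypothetical):

```python
from typing import Optional

# Simplified, linear path of an ISO project towards an International Standard.
# Real projects may skip some of these stages.
IS_PATH = ["NP", "WD", "CD", "DIS", "FDIS", "IS"]


def next_stage(stage: str) -> Optional[str]:
    """Return the stage following `stage`, or None once the IS is published."""
    i = IS_PATH.index(stage)  # raises ValueError for an unknown abbreviation
    return IS_PATH[i + 1] if i + 1 < len(IS_PATH) else None


print(next_stage("CD"))  # DIS
print(next_stage("IS"))  # None
```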
Components of an ISO standard
1. Scope
2. Normative references
3. Terms and definitions
4. Requirements to be met in order to comply with the standard
5. …
• An ISO standard is not a tutorial…
Example - scope
• ISO 8601:2004 Data elements and interchange formats -- Information
interchange -- Representation of dates and times
• Scope
ISO 8601:2004 is applicable whenever representation of dates in the
Gregorian calendar, times in the 24-hour timekeeping system, time
intervals and recurring time intervals or of the formats of these
representations are included in information interchange.
Some elementary standards
Illustrating the role of ISO
• Providing unique references
– Language, locale and script coding
• Providing definitions and principles
– Character encoding
• Standard as an evolving material
Languages
•
ISO 639:1988, Code for the representation of names of languages. Part 1:
Alpha-2 codes
– Two-letter language symbols
•
ISO 639-2: Code for the representation of names of languages. Part 2: Alpha-3 codes
– Three-letter language symbols (lower case)
en/eng = English
fr/fra = French (français)
es/spa = Spanish (español)
de/deu = German (Deutsch)
• A limited language repertoire
– A lot of “peripheral” languages are not registered
• Cf. Ethnologue http://www.sil.org
– Now in ISO 639-3 and ISO 639-5
– ISO 639-4: description of languages
Complementary standards
• Country codes
– ISO 3166: Code for the representation of names of
countries
• Two-letter country symbols (uppercase)
GB = United Kingdom, US = United States, FR = France, RO = Romania
• Scripts
– ISO 15924 - Codes for the representation of names of
scripts
• Script: set of graphic characters used for the written form of
one or more languages
• See ISO 15924 codes
Examples
• Simple language codes:
– de (German)
– fr (French)
– ja (Japanese)
• Language-Region:
– de-DE (German for Germany)
– zh-SG (Chinese for Singapore)
– cs-CS (Czech for Czechoslovakia)
• Language code plus Script code:
– zh-Hant (Traditional Chinese)
– en-Latn (English written in Latin script)
– sr-Cyrl (Serbian written with Cyrillic script)
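The combined codes above can be split back into their ISO 639, ISO 15924 and ISO 3166 parts by the length of each subtag. A minimal Python sketch, assuming the language-script-region order used by IETF BCP 47 language tags and ignoring variants and extensions; `parse_tag` and `LanguageTag` are hypothetical names:

```python
import re
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str          # ISO 639 code, lower case (e.g. "zh")
    script: Optional[str]  # ISO 15924 code, title case (e.g. "Hant")
    region: Optional[str]  # ISO 3166 code, upper case (e.g. "SG")


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'zh-Hant' or 'de-DE' into its three parts."""
    parts = tag.split("-")
    language = parts[0].lower()
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"not a 2- or 3-letter language code: {parts[0]!r}")
    script = region = None
    for sub in parts[1:]:
        # A 4-letter script subtag must come before any region subtag
        if re.fullmatch(r"[A-Za-z]{4}", sub) and script is None and region is None:
            script = sub.title()
        elif re.fullmatch(r"[A-Za-z]{2}", sub) and region is None:
            region = sub.upper()
        else:
            raise ValueError(f"unexpected subtag: {sub!r}")
    return LanguageTag(language, script, region)


print(parse_tag("zh-SG"))    # LanguageTag(language='zh', script=None, region='SG')
print(parse_tag("sr-Cyrl"))  # LanguageTag(language='sr', script='Cyrl', region=None)
```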
Representing (written) languages: diving into character issues
Basic definitions
• Character repertoire
– Set of distinct characters, defined independently of any coding or ordering
rule/procedure
• Each character is defined by a name and a reference shape
– A: Latin capital A, Cyrillic capital A, Greek capital A
• Character code
– One-to-one association between a character repertoire and a set of non-negative integers
• Notion of position
• Character encoding
– Method (algorithm) for representing a character code in electronic form, as a sequence of bytes
• Simple case (when the codes fall within [0, 255])
– Each integer code is represented by its standard one-byte form
Example
• Character repertoire
• “a”, “!”, “ä”, “‰”
• Character codes
– ISO 10646
• 97, 33, 228, 8240
• Encoding
– As two bytes
• 0 97, 0 33, 0 228, 32 48
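The distinction between code and encoding can be checked in Python: `ord` returns the ISO 10646 (Unicode) code of a character, and encoding with the UTF-16 big-endian codec gives the two-byte form shown above (for characters of the Basic Multilingual Plane this coincides with the UCS-2 two-byte form):

```python
# Repertoire from the example above
repertoire = ["a", "!", "ä", "‰"]

# Character codes: ISO 10646 / Unicode code points
codes = [ord(c) for c in repertoire]

# Encoding: each code as two bytes, most significant byte first
two_bytes = [list(c.encode("utf-16-be")) for c in repertoire]

print(codes)      # [97, 33, 228, 8240]
print(two_bytes)  # [[0, 97], [0, 33], [0, 228], [32, 48]]
```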
Some archaeology…
• ASCII - American Standard Code for Information
Interchange
– Combines repertoire, codes and encoding
– The ASCII code also contains control characters
• E.g. CR, LF, ESC, TAB
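– Repertoire (printable characters, codes 32 to 126; read down each column, the columns start at codes 32, 48, 64, 80, 96 and 112; SP = space)
SP 0 @ P ` p
! 1 A Q a q
" 2 B R b r
# 3 C S c s
$ 4 D T d t
% 5 E U e u
& 6 F V f v
' 7 G W g w
( 8 H X h x
) 9 I Y i y
* : J Z j z
+ ; K [ k {
, < L \ l |
- = M ] m }
. > N ^ n ~
/ ? O _ o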
ASCII : overview
• Character codes
– One to one association of a number from 32 (“ ”) to 126
(“~”) following the order in the preceding table
– Positions 0 to 31, as well as 127, are reserved for “standardized” control characters
• Character encoding
– Codes are represented by their standard byte
representation
– ASCII is a 7-bit code: codes 128 to 255 are not used (the eighth bit was often used as a parity bit)
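The layout described above can be reproduced in Python, which also confirms that 0 to 31 and 127 are the non-printable control positions. This is an illustrative check, not part of any standard:

```python
# Printable ASCII: codes 32 (" ") to 126 ("~"); 0-31 and 127 are control codes
printable = [chr(c) for c in range(32, 127)]
controls = [chr(c) for c in list(range(0, 32)) + [127]]

# Rebuild the 16-row table: row r holds codes 32+r, 48+r, ..., 112+r (minus 127)
rows = [[chr(c) for c in range(32 + r, 128, 16) if c != 127] for r in range(16)]
for row in rows:
    print(" ".join(row))

assert all(ch.isprintable() for ch in printable)
assert not any(ch.isprintable() for ch in controls)
assert all(len(ch.encode("ascii")) == 1 for ch in printable)  # one byte each
```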
From a standardization point of view
• United states (US-ASCII)
– ANSI X3.4-1986
• International (ISO/IEC JTC1/SC2/WG3)
– ISO 646
• Introduces flexibility for some positions in the code
– # $ ^ ` ~
• Some positions are kept for “national usage”
– @ [ \ ] { | }
• IRV (1991 edition): International Reference Version = US-ASCII
Next step…
• ISO Latin 1, alias ISO 8859-1
– One member in a family of standards (ISO 8859)
– Defines:
• A character repertoire
– Latin alphabet No. 1 (ISO Latin 1)
• The corresponding codes
– ASCII is a subset
• Encoding
– Same as ASCII (byte encoding of integers from 0 to 255)
ISO 8859-1
• Additional characters
– Codes from 160 to 255
NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
(NBSP = no-break space, SHY = soft hyphen)
– Rem.:
• Positions from 128 to 159 are kept for control characters
– Some vendor encodings reuse them for printable characters, e.g. Windows code page 1252 (windows-1252)
• Code 160: no-break space
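These properties can be observed with Python's built-in codecs (`latin-1`, `cp1252` and `iso8859_15` are standard Python codec names). The sketch contrasts ISO 8859-1 with windows-1252 and with ISO 8859-15, where the euro sign takes position 164:

```python
# ISO 8859-1 ("latin-1"): one byte per character, with ASCII as a subset
assert "A".encode("latin-1") == "A".encode("ascii") == b"A"
print(list("é ÿ".encode("latin-1")))  # [233, 32, 255]

# Code 160 is the no-break space
assert bytes([160]).decode("latin-1") == "\u00a0"

# In ISO 8859-1, positions 128-159 decode to C1 control characters ...
print(repr(bytes([0x80]).decode("latin-1")))  # '\x80'
# ... whereas windows-1252 uses them for printable characters
print(bytes([0x80, 0x89]).decode("cp1252"))   # €‰

# The euro sign is missing from ISO 8859-1 but sits at 164 in ISO 8859-15
assert "€".encode("iso8859_15") == bytes([164])
```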
The rest of the family
• ISO 8859 from a wider perspective
– The same principles as those of ISO 8859-1 are
used to describe other repertoires
– ISO 8859-2 (ISO Latin 2)
• Languages of Central and Eastern Europe (mostly Slavic)
– ISO 8859-15 (ISO Latin 9)
• €!
– Etc.
The whole family…
ISO 8859-1, Latin alphabet No. 1, "Western", "West European"
ISO 8859-2, Latin alphabet No. 2, "Central European", "East European"
ISO 8859-3, Latin alphabet No. 3, "South European", "Maltese & Esperanto"
ISO 8859-4, Latin alphabet No. 4, "North European"
ISO 8859-5, Latin/Cyrillic alphabet (for Slavic languages)
ISO 8859-6, Latin/Arabic alphabet (for the Arabic language)
ISO 8859-7, Latin/Greek alphabet (for modern Greek)
ISO 8859-8, Latin/Hebrew alphabet (for Hebrew and Yiddish)
ISO 8859-9, Latin alphabet No. 5, "Turkish"
ISO 8859-10, Latin alphabet No. 6, "Nordic" (Sámi, Inuit, Icelandic)
ISO 8859-11, Latin/Thai alphabet (for the Thai language)
(Part 12 has not been defined.)
ISO 8859-13, Latin alphabet No. 7, Baltic Rim
ISO 8859-14, Latin alphabet No. 8, Celtic
ISO 8859-15, Latin alphabet No. 9, "euro"
ISO 8859-16, Latin alphabet No. 10, for a collection of languages
ISO 8859 tables
• ISO 8859-5 (Cyrillic)
– Bulgarian (bg), Byelorussian (be), Macedonian (mk), Russian (ru), Serbian (sr)
Towards a universal representation of characters
• ISO/IEC 10646 (UCS)
– An international standard
– UCS: Universal Character Set
– An extensible character repertoire associated with a code
– Underlying abstract model
• Unicode
– An industry consortium standard
– Defines a character repertoire and a code made compatible with
that of ISO 10646
• Provides additional constraints on character usage
Structure of ISO/IEC 10646
• 4-byte encoding (UCS-4)
– e.g. A = 00 00 00 41 (hexadecimal)
Structure of ISO/IEC 10646 (cont.)
– The character code is identified by:
• Group - Plane - Row - Cell
– BMP - Basic Multilingual Plane
• Group = 0, Plane = 0
• Corresponds to a two-byte code space divided into four zones
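The Group-Plane-Row-Cell structure is simply the four bytes of the 4-byte code, most significant first. A minimal Python sketch (the helper names are hypothetical); Python's `utf-32-be` codec serves here as a stand-in for the 4-byte form, which it matches for every code up to 0x10FFFF:

```python
from typing import Tuple


def group_plane_row_cell(code: int) -> Tuple[int, int, int, int]:
    """Split a 31-bit UCS code into its four bytes: group, plane, row, cell."""
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("outside the UCS code space")
    return (code >> 24, (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF)


def in_bmp(code: int) -> bool:
    """The Basic Multilingual Plane is group 0, plane 0."""
    group, plane, _, _ = group_plane_row_cell(code)
    return group == 0 and plane == 0


print(group_plane_row_cell(ord("A")))    # (0, 0, 0, 65)
print("A".encode("utf-32-be").hex(" "))  # 00 00 00 41
```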
– A-zone: 0000 to 4DFF (19903 positions): alphabets, symbols, phonetic section of CJK, Hangul...
– I-zone: 4E00 to 9FFF (20992 positions): unified representations of ideograms (CJK)
– O-zone: A000 to DFFF (16384 positions): reserved for future use
– R-zone: E000 to FFFD (8190 positions): restricted use section (private use, compatibility zone, Arabic special forms)
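The zone sizes follow from the hexadecimal bounds. A minimal Python check, taking the I-zone upper bound as 9FFF (the value consistent with its stated 20992 positions); the explanation of the A-zone figure (19903 rather than the raw 19968) by the exclusion of the 65 control positions is an assumption that happens to match exactly:

```python
# Hexadecimal bounds of the four BMP zones (inclusive)
zones = {"A": (0x0000, 0x4DFF), "I": (0x4E00, 0x9FFF),
         "O": (0xA000, 0xDFFF), "R": (0xE000, 0xFFFD)}
sizes = {name: high - low + 1 for name, (low, high) in zones.items()}
print(sizes)  # {'A': 19968, 'I': 20992, 'O': 16384, 'R': 8190}

# Assumption: the A-zone count leaves out the control positions
# C0 (0x00-0x1F), DEL (0x7F) and C1 (0x80-0x9F)
control_positions = 0x20 + 1 + 0x20
print(sizes["A"] - control_positions)  # 19903
```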
Example: IPA (International Phonetic Alphabet)
– U+0250..U+02AF
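In Unicode this range is the "IPA Extensions" block. Python's standard `unicodedata` module can list its characters with their character names, for example:

```python
import unicodedata

# First characters of the IPA block (U+0250..U+02AF) with their names
for code in range(0x0250, 0x0255):
    ch = chr(code)
    print(f"U+{code:04X} {ch} {unicodedata.name(ch)}")
```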
Encodings (for BMP/Unicode)
• Reference encoding
– UCS-2
• Representation of characters as a sequence of two bytes
• Alternative
– UTF-8
• Codes below 128 are represented as one byte (7 bits, cf. ASCII
codes)
• Other codes are represented as a sequence of 2 to 6 bytes, each belonging to [128, 255] (at most 3 bytes for BMP codes; RFC 3629 later limited UTF-8 to 4 bytes)
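The UTF-8 byte layout can be written out by hand for BMP codes. A minimal Python sketch (the function name is hypothetical, and unlike a production encoder it does not reject surrogate codes), checked against Python's built-in codec:

```python
def utf8_encode_bmp(code: int) -> bytes:
    """Encode one BMP code (0x0000-0xFFFF) as UTF-8."""
    if code < 0x80:   # 7 bits -> 0xxxxxxx (identical to ASCII)
        return bytes([code])
    if code < 0x800:  # 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | code >> 6,
                      0x80 | code & 0x3F])
    if code <= 0xFFFF:  # 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | code >> 12,
                      0x80 | (code >> 6) & 0x3F,
                      0x80 | code & 0x3F])
    raise ValueError("outside the BMP")


for ch in "a!ä‰":
    encoded = utf8_encode_bmp(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(ch, encoded.hex(" "))
```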
Summary
• We are close to a stable picture for
character representation
– 30 years to achieve this!
• General idea of the standardisation process
– Combines:
• Identification of existing practices
• Abstraction to cope with additional needs
Linguistic information sources …and initiatives
• Primary resources (text, dialogues): structural mark-up, basic annotations [TEI, MPEG7, TMX, XLIFF, XHTML, etc.]
• Access protocols [WSDL, SOAP]
• Links
• Knowledge structures: hierarchies of types, relations between concepts (subjects/topics etc.), links to primary resources [Topic Maps, OWL, RDF]
• NLP structures (annotations): POS tagging, chunks (cf. Named Entities), deep syntactic structures, co-references etc. [Eagles/ISLE, CES, MATE,…]
• Lexical structures (language models): terminologies, transfer lexica, LTAG/HPSG/LFG lexica [TBX, OLIF, Eagles/ISLE (Genelex)]
• Meta-data [Dublin core, OLAC, ISLE, MPEG7, RDF]
ISO committee on language
resources
• ISO TC37 - Terminology and other language resources
– SC 2 - Terminographical and lexicographical working methods
• ISO 639 series (Codes for the representation of names of languages)
– SC 3 - Systems to manage terminology, knowledge and content
• ISO 12200 - Martif
– Latest version of TEI Terminology chapter
• ISO 12620 - Data categories (under revision)
• ISO 16642 - TMF (Terminological Markup Framework)
– SC 4 - Language Resource Management (May 2002)
• Secretary: K.-S. Choi; Chair: L. Romary
• http://www.tc37sc4.org
ISO/TC 37/SC 4 projects
ISO 24610-1:2006 Feature structures -- Part 1: Feature
structure representation
ISO/CD 24612 Linguistic annotation framework (LAF)
ISO/NP 24619 Citation of Electronic Resources (CitER)
ISO/DIS 24610-2 Feature structures -- Part 2: Feature
system declaration
ISO/WD 24616 Multilingual information framework
(MLIF)
ISO 24613:2008 Lexical markup framework (LMF)
ISO/DIS 24611 Morpho-syntactic annotation
framework (MAF)
ISO/CD 24615 Syntactic annotation framework
(SynAF)
ISO/DIS 24617-1 Semantic annotation framework
(SemAF) -- Part 1: Time and events
ISO/CD 24617-2 Semantic annotation framework
(SemAF) -- Part 2: Dialogue acts
ISO/CD 24614-1 Word segmentation of written texts for
mono-lingual and multi-lingual information processing - Part 1: General principles and methods
ISO/WD 24614-2 Word segmentation of written texts
for mono-lingual and multi-lingual information
processing -- Part 2: Word segmentation for Chinese,
Japanese and Korean
General modeling framework
• Meta-model
– General, underlying model that informs current
practice
• Data-categories
– Provides the precise semantics of the format
– Obtained:
• By sub-setting a Data Category Registry (cf. ISOcat)
• By providing application specific categories
Data Category
• Definition
– Elementary descriptor used in a linguistic description or annotation
scheme
• Examples
– Fields: /part of speech/, /grammatical gender/
– Values: /feminine/, /plural/, /dual/, /ablative case/
• Role
– Specification
– Documentation
• A reference space for schema designers
– Towards an international registry for language resources
• Data Category Registry (DCR); cf. ISO 12620
Relation to ISO 11179 (not in the quiz)
• Data element concept: complex datcat /gender/
– Conceptual domain: set of simple datcats /masculine/, /feminine/, /neuter/
• Data element: XML object, implemented as an XML attribute named ‘gen’ (XML schema declaration)
– Value domain: list of values m, f, n
• Example: <w lemma="vert" gen="f">verte</w>
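The link between the two levels can be made explicit in code: reading the XML object, then mapping the attribute and its value back to data categories. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the two mapping tables and the function `datcats` are hypothetical, written only for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the implementation level (attribute "gen" with the
# value domain m/f/n) back to the data categories it stands for
ATTRIBUTE_TO_DATCAT = {"gen": "/gender/"}
VALUE_TO_DATCAT = {"gen": {"m": "/masculine/", "f": "/feminine/", "n": "/neuter/"}}


def datcats(element: ET.Element) -> dict:
    """Translate the annotated attributes of an element into data categories."""
    result = {}
    for attr, value in element.attrib.items():
        if attr in ATTRIBUTE_TO_DATCAT:  # attributes without a mapping are skipped
            result[ATTRIBUTE_TO_DATCAT[attr]] = VALUE_TO_DATCAT[attr][value]
    return result


w = ET.fromstring('<w lemma="vert" gen="f">verte</w>')
print(w.text, datcats(w))  # verte {'/gender/': '/feminine/'}
```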
The TC 37 model — ISO 12620
Entry Identifier: grammatical gender
Profile: morpho-syntax
Definition (fr): Catégorie grammaticale reposant, selon les langues et systèmes, sur la distinction naturelle entre les sexes ou sur des critères formels (Source: TLFi) [grammatical category based, depending on the language and system, either on the natural distinction between the sexes or on formal criteria]
Definition (en): Grammatical category… (Source: TLFi (Trad.))
• Object Language: fr
– Name: genre
– Conceptual Domain: {/feminine/, /masculine/}
• Object Language: en
– Name: gender; grammatical gender
– Conceptual Domain: {/feminine/, /masculine/, /neuter/}
• Object Language: de
– Name: Geschlecht; Genus
– Conceptual Domain: {/feminine/, /masculine/, /neuter/}
03.10.2015
Application to lexical structures
Episode 1: TMF, Terminological Markup Framework (ISO 16642)
ISO 16642: A family of formats
• TMF: the meta-model
• Instantiated as a family of terminological markup languages: TML1 (Geneter), TML2 (TBX), TML3 (GMT), …, TMLi
TMF: example (TE = terminological entry, LS = language section, TS = term section)
• TE: id='ID67', subjectField='manufacturing', definition='A value…'
– LS: lang='en'
   TS: term='alpha smoothing factor', termType='fullForm'
– LS: lang='hu'
   TS: term='…'
Implementation in TBX
(cf. www.lisa.org)
<termEntry id='ID67'>
<descrip type='subjectField'>manufacturing</descrip>
<descrip type='definition'>A value between 0 and 1 used in
...</descrip>
<langSet lang='en'>
<tig>
<term>alpha smoothing factor</term>
<termNote type='termType'>fullForm</termNote>
</tig>
</langSet>
<langSet lang='hu'>
<tig>
<term>Alfa ...</term>
</tig>
</langSet>
</termEntry>
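Going back from the TBX serialization to the TMF levels shows how each TBX element instantiates the meta-model (termEntry → TE, langSet → LS, tig → TS). A minimal Python sketch with `xml.etree.ElementTree`; the dataclass names are hypothetical, the attribute names follow the example above (published TBX files use `xml:lang` rather than `lang`), and the elided texts are kept as "...":

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List

TBX = """<termEntry id='ID67'>
  <descrip type='subjectField'>manufacturing</descrip>
  <descrip type='definition'>A value between 0 and 1 used in ...</descrip>
  <langSet lang='en'>
    <tig>
      <term>alpha smoothing factor</term>
      <termNote type='termType'>fullForm</termNote>
    </tig>
  </langSet>
  <langSet lang='hu'>
    <tig><term>Alfa ...</term></tig>
  </langSet>
</termEntry>"""


@dataclass
class TermSection:  # TS
    term: str
    notes: Dict[str, str] = field(default_factory=dict)


@dataclass
class LanguageSection:  # LS
    lang: str
    terms: List[TermSection] = field(default_factory=list)


@dataclass
class TerminologicalEntry:  # TE
    id: str
    descrips: Dict[str, str] = field(default_factory=dict)
    languages: List[LanguageSection] = field(default_factory=list)


def tbx_to_tmf(xml_text: str) -> TerminologicalEntry:
    """Map a TBX termEntry onto the TE/LS/TS levels of the TMF meta-model."""
    root = ET.fromstring(xml_text)
    te = TerminologicalEntry(id=root.get("id"))
    for d in root.findall("descrip"):
        te.descrips[d.get("type")] = d.text
    for ls_el in root.findall("langSet"):
        ls = LanguageSection(lang=ls_el.get("lang"))
        for tig in ls_el.findall("tig"):
            ts = TermSection(term=tig.findtext("term"))
            for note in tig.findall("termNote"):
                ts.notes[note.get("type")] = note.text
            ls.terms.append(ts)
        te.languages.append(ls)
    return te


entry = tbx_to_tmf(TBX)
print(entry.id, [ls.lang for ls in entry.languages])  # ID67 ['en', 'hu']
```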
Application to lexical structures
Episode 2: LMF — Lexical Markup
Framework (ISO 24613)
LMF as an ISO project
• Summer 2003: new work item proposal by the US delegation
• Fall 2003: technical proposal by the French delegation (FR) for a data model dedicated to NLP lexica
• ISO 24613
– Convenor:
• Nicoletta Calzolari (IT)
– Editors:
• Gil Francopoulo (FR), Monte George (US)
– 13 versions written, sent to the experts nominated by the national delegations, commented on and discussed in various ISO technical meetings
– IS (= published standard) in October 2008
Tübingen 2007
Lex-Sem & Onto-Resources
LMF structural data model
• One core package and 8 extensions
Core Package
Constraint Expression
Morphology
NLP Syntax
NLP Semantic
NLP Paradigm class
MRD
NLP Multilingual notations
NLP MWE pattern
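data model: core package
[Figure: UML class diagram of the LMF core package. A Lexical Resource has one Global Information and one or more (1..*) Lexicons; a Lexicon contains one or more Lexical Entries; a Lexical Entry has one or more Forms and zero or more Senses; a Form has Form Representations]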
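The core package translates naturally into nested record types. A minimal Python sketch, assuming the cardinalities as read from the diagram above (one Global Information, 1..* Lexicons, 1..* Lexical Entries, 1..* Forms, 0..* Senses); the Python class and field names, and the feature "writtenForm" in the usage example, are hypothetical and not taken from the ISO 24613 text:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormRepresentation:
    features: Dict[str, str]  # attribute/value pairs


@dataclass
class Form:
    representations: List[FormRepresentation] = field(default_factory=list)


@dataclass
class Sense:
    sense_id: str


@dataclass
class LexicalEntry:
    forms: List[Form]                                    # 1..*
    senses: List[Sense] = field(default_factory=list)    # 0..*

    def __post_init__(self) -> None:
        if not self.forms:
            raise ValueError("a Lexical Entry needs at least one Form")


@dataclass
class Lexicon:
    entries: List[LexicalEntry]                          # 1..*

    def __post_init__(self) -> None:
        if not self.entries:
            raise ValueError("a Lexicon needs at least one Lexical Entry")


@dataclass
class GlobalInformation:
    features: Dict[str, str]


@dataclass
class LexicalResource:
    global_information: GlobalInformation                # exactly one
    lexicons: List[Lexicon]                              # 1..*

    def __post_init__(self) -> None:
        if not self.lexicons:
            raise ValueError("a Lexical Resource needs at least one Lexicon")


# Hypothetical usage: a one-entry lexicon for the French adjective "vert"
vert = LexicalEntry(forms=[Form([FormRepresentation({"writtenForm": "vert"})])],
                    senses=[Sense("vert_1")])
resource = LexicalResource(GlobalInformation({"label": "demo"}), [Lexicon([vert])])
```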
data model: Morphology
[Figure: UML class diagram of the LMF morphology extension, relating Lexicon, Lexical Entry, Form, Form Representation, Lemma, Word Form, Related Form, Stem Or Root, Derived Form, Referred Root, Morphological Features, Sense, and List Of Components (an ordered list of 2..* Components)]
data model: Syntax
[Figure: UML class diagram of the LMF syntax extension, relating Lexicon, Lexical Entry, Lexeme Property, Sense, Syntactic Behaviour, Subcategorization Frame, Subcategorization Frame Set, Syntactic Argument ({ordered}), SynArgMap and SynSemArgMap (described in the Semantic package)]
Representing complex semantic content
ISO WD 24617-2 Semantic
annotation framework — Part 2:
Dialogue acts
Example dialogue
I: Trains Information Service, good morning.
C: Hello, John Brown, I would like to know what time the next train to Tilburg leaves.
I: Just a moment please.
C: OK.
I: Hello?
C: Yes.
I: The next train will be at 10.32
C: 10.32
I: That’s right.
C: OK, thanks very much.
I: You’re welcome.
C: Bye.
I: Bye.
Language as action
I: Trains Information Service, good morning. [self-introduction; greeting]
C: Hello, John Brown, I would like to know what time the next train to Tilburg leaves. [return greeting; self-introduction; indirect question]
I: Just a moment please. [stalling]
C: OK. [positive feedback]
I: Hello? [contact check]
C: Yes. [contact confirmation]
I: The next train will be at 10.32 [answer]
C: 10.32 [check]
I: That’s right. [confirmation]
C: OK, thanks very much. [positive feedback; thanking]
I: You’re welcome. [downplayer]
C: Bye. [initial farewell greeting]
I: Bye. [return farewell greeting]
Dialogue acts - context
• Dialogue acts are widely used
– in studies of dialogue phenomena
– in dialogue annotation efforts
– in the design of dialogue systems
• Objective: interpretation of dialogue behaviour
– statements, questions, promises, requests, etc.
– cf. speech act theory: Austin, 1962; Searle, 1969
– dialogue act theory is a data-driven approach to the
computational modeling of language use in dialogue.
Annotation schemes
• Specific projects
– TRAINS, Map task, Verbmobil
– overlapping sets of communicative functions, with overlapping and often mutually inconsistent terminology
• DAMSL: Dialogue Act Markup using Several
Layers
– (Allen and Core, 1995; Core et al., 1998)
– focus on multi-dimensionality and domain independence
Project status
- Launched as ISO project 24617-2 at the SC 4 meeting in Marrakech, 25 May 2008
- Editorial group:
– Jan Alexandersson (Germany)
– Harry Bunt (Netherlands/Belgium) (project leader, PL)
– Jean Carletta (UK)
– Alex Chengyu Fang (China/HK)
– Jae-Woong Choe (Korea)
– Koiti Hasida (Japan)
– Olga Petukhova (Netherlands)
– Andrei Popescu-Belis (Switzerland)
– Claudia Soria (Italy)
– David Traum (USA)
Getting started
• Example: “Do you know what time it is?”
– Changes in the addressee’s information state:
• If: question about the time
– the speaker does not know what time it is
– the speaker would like to know that
• If: reproach to the addressee for being late
– the speaker does know what time it is
• Basic concepts
– Distinctions such as that between a question and a reproach refer to
the communicative function of a dialogue act;
– The entities, their properties and relations that are referred to,
constitute its semantic content.
Dacts – Terms and Definitions
• turn
– Stretch of communicative behaviour produced by one speaker,
bounded by periods of inactivity of that speaker or by activity of
another speaker.
– NOTE After Allwood (2000).
• dialogue act
– Semantic unit in the description of dialogue behaviour, characterizing how the information state(s) of the agent(s) at whom the behaviour is directed are changed when they understand the behaviour.
Dacts – Terms and Definitions (cont.)
• speaker
– Property of a dialogue act, indicating the dialogue participant who
produces the communicative behaviour that expresses the dialogue
act.
• addressee
– Property of a dialogue act, indicating a dialogue participant at
whom the communicative behaviour that expresses the dialogue
act is directed.
• overhearer
– Participant in a dialogue who witnesses a dialogue act and whose
information state may be affected by it, without being an addressee
of the dialogue act.
Multifunctionality
1. U: Can you tell me what time is the first train
to the airport on Sunday?
2. S: The first train to the airport on Sunday
morning is at 5.32
3. U: Thank you.
- expression of thanks
- positive feedback
(about understanding and acceptance)
- indication of dialogue closure
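Meta-model
[Figure: meta-model for dialogue act annotation. A dialogue has 1..N participants; each dialogue act has exactly one speaker, 1..N addressees and 0..N overhearers among the participants; it is anchored to 1..N markables, may be linked to other dialogue acts by dependency relations, and has exactly one communicative function and exactly one type of semantic content (“dimension”)]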
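The meta-model can be mirrored by a small data structure. A minimal Python sketch, assuming one record per (dialogue act, dimension) pair; the class, the field names, the dimension spellings and the function label "Thanking" are hypothetical, and the three acts encode the multifunctional “Thank you.” from the earlier example (speaker U, addressee S):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical spellings of the task dimension plus the ten dialogue control dimensions
DIMENSIONS = {
    "Task", "Auto-Feedback", "Allo-Feedback", "Turn Management",
    "Time Management", "Contact Management", "Own Communication Management",
    "Partner Communication Management", "Topic Management",
    "Dialogue Structuring", "Social Obligations Management",
}


@dataclass
class DialogueAct:
    speaker: str                   # exactly one speaker
    addressees: List[str]          # 1..N addressees
    dimension: str                 # type of semantic content
    communicative_function: str    # exactly one communicative function
    markable: str                  # stretch of behaviour the act is anchored to
    overhearers: List[str] = field(default_factory=list)  # 0..N

    def __post_init__(self) -> None:
        if not self.addressees:
            raise ValueError("a dialogue act needs at least one addressee")
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")


# Multifunctionality: one utterance, several dialogue acts in different dimensions
utterance = "Thank you."
acts = [
    DialogueAct("U", ["S"], "Social Obligations Management", "Thanking", utterance),
    DialogueAct("U", ["S"], "Auto-Feedback", "OverallPositiveAutoFB", utterance),
    DialogueAct("U", ["S"], "Dialogue Structuring", "Closing", utterance),
]
for act in acts:
    print(f"{act.dimension}: {act.communicative_function}")
```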
Dimensions of communication
• Performing a certain task or activity (Task-driven dialogue acts) through or
with support from the communication
• Monitoring the interaction (“Dialogue Control”):
- Feedback:
- giving auto-feedback
- giving or eliciting allo-feedback
- Interaction Management:
managing contact and attention, the turn-taking,
use of time, the structuring of the discourse;
the use of topics; the editing of one’s own and
one’s partner’s speech
- Managing social obligations - greeting, introducing oneself,
thanking, apologising, saying goodbye.
Dimensions for dialogue acts
Acts for monitoring the interaction:
(Dialogue Control: Feedback, “Interaction Management” and Social
Obligations Management)
1. Auto-Feedback
2. Allo-Feedback
3. Turn management
4. Time management
5. Contact management
6. Own communication management
7. Partner communication management
8. Topic management
9. Dialogue structuring
10. Social obligations management
Dimension-specific functions
Dimension | Comm. function | Example
1. Auto-feedback | OverallPositiveAutoFB | Okay.
2. Allo-feedback | EvaluationFBElicitation | Okay?
3. Turn management | TurnGiving | Yes
4. Time management | Stalling | Well, you know,..
5. Contact management | ContactChecking | Hello?
6. Own communication management | Self-correction | I mean...
7. Partner communication management | Completion | [... Completion]
8. Topic management | TopicShiftAnnounc. | Something else.
9. Dialogue structuring | DA-announcement | Question:
10. Social obligations management | Valediction | Bye
11. Task/domain | OpenMeeting | I open this meeting
General-purpose functions
Example: Informs in various dimensions:
Utterance | Dimension
The KL204 leaves at 12.30. | Task/domain
I see what you mean. | Auto-feedback
You misunderstood me. | Allo-feedback
I would like to hear Peter’s opinion. | Turn management
I’m listening. | Contact management
... I mean Toronto. | Own communication management
We should also discuss the agenda. | Topic management
I would like to ask you something. | Discourse structuring
I’m very grateful for your help. | Social obligations management
General-purpose functions
• Information-seeking functions
– WH-question, YN-question, Alternatives-question, Check,..
• Information-providing functions
– Inform, WH-Answer, YN-Answer, Confirmation, Disconfirmation, Agreement, Correction,..
• Commissive functions
– Offer, Promise, AcceptRequest,..
• Directive functions
– Instruct, Request, Suggest,..
Communicative functions expanded
[Figure: taxonomy of communicative functions (CF). General Purpose functions divide into Information Transfer functions (Information Seeking, e.g. QUESTION and CHECK; Information Providing, e.g. INFORM) and Action Discussion functions (Commissives, Directives); Dimension-Specific functions comprise Task-specific functions and Dialogue Control functions (Feedback, Interaction Management, Social Obligations Management); Performatives also appear in the diagram]
Interaction Management functions
• Turn Management: Giving, Keeping, Grabbing
• Time Management: Stalling, Pausing
• Own Comm. Management: Retraction, Self-correction
• Contact Management: Contact Check, Contact Indication
• Dialogue Structuring: Opening, Closing, Dial. Act Announcement
• Topic Management: Shift, Shift Announcement, Introduction
Where to go from here
• Provide precise definitions of dimensions
and communicative functions
– Data Category Registry (ISOcat)
• Provide a default annotation syntax
– Use TEI/ODD as a specification language
Conclusion
• Hope you will remember something…
– Should I dare to put back the Quiz?
• Importance of dissemination of existing standards
(in academia, across EU projects…)
– Standards as the identification of stable concepts in a
field
• Importance of wide involvement of experts
(academia, organizations and industry)
– Defining priorities
– Contribution to technical work — bottom up
– Maybe an item on the Kyoto agenda
Should we/you be afraid of standards?
<cit>
<quote>Yes you should be afraid, but you should be
more afraid of not having them</quote>
<author>Wendell Piez</author>
</cit>
Many thanks to the EU: eContent-22236 LIRICS project
Quiz
• Standards being mentioned in this
presentation
– ISO 639, ISO 646, ISO 10646, ISO 12620, ISO
16642, ISO 3166, ISO 8859, ISO 15924, ISO
24613, ISO 24617(-2), ISO 8601
• Give yourself a mark:
– I know…. of these standards (out of 11)