Discourse Level Software
Current Status
and Future Directions
Nov. 16, 2004
Lars Huttar ([email protected])
Knowledge Management Services
Abstract (I)
• Discourse analysis (DA, a.k.a.
textlinguistics) is a task frequently cited
as needing computer-assisted tools.
• Some tools are currently available for
certain tasks, but as yet there are no
user-ready applications specifically for the
discourse charting commonly used on the field.
Abstract (II)
• This presentation will review a few of the
existing tools most pertinent to DA on the
field, and software that is planned or
under development.
• I will also mention the conceptual model
for constituent charting described in my
thesis, which uses XML encoding of text
and analysis, from which a chart is
rendered via XSL.
• The need for discourse analysis
• What’s already out there?
• What’s coming down the pike?
Need for Discourse Software
The task:
• Help the user produce charts,
diagrams, and summaries of texts in
such a way as to facilitate discovery of
discourse patterns and to expedite
testing of hypotheses.
Major features desired
• Import (interlinear) text
• Segment and move
pieces into chart
• Mark genre(s)
• Configurable autohighlighting, e.g. color
by POS.
• Toggle highlighting of
certain features
• Manual annotation of
features incl. coherence
and prominence
• Search text, IT, and
• Chart/summary of
results, hyperlinked to
• Accessible to
» Geoffrey Hunt
» Kent Spielmann
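The configurable autohighlighting and toggling items could work roughly as in the sketch below. It is only an illustration: the POS tags, colors, and the `highlight` function are invented for this example and do not describe any existing tool.

```python
from html import escape

# Hypothetical user configuration: POS tag -> highlight color.
HIGHLIGHT = {"n": "lightblue", "v": "lightgreen", "conn": "yellow"}

def highlight(tagged_words, config, enabled=None):
    """Render (word, pos) pairs as HTML, coloring words whose POS is configured.

    `enabled` optionally restricts highlighting to a subset of POS tags,
    which models toggling individual features on and off.
    """
    parts = []
    for word, pos in tagged_words:
        color = config.get(pos)
        if color and (enabled is None or pos in enabled):
            parts.append(f'<span style="background:{color}">{escape(word)}</span>')
        else:
            parts.append(escape(word))
    return " ".join(parts)

# Illustrative tagged clause (assumed data).
words = [("then", "conn"), ("he", "pro"), ("followed", "v"), ("it", "pro")]

if __name__ == "__main__":
    print(highlight(words, HIGHLIGHT))                  # all configured POS
    print(highlight(words, HIGHLIGHT, enabled={"v"}))   # verbs only
```

Keeping the color map separate from the text means the same text can be re-rendered with different highlighting without touching the annotation itself.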
Example constituent chart
Current Practice
Pencil & paper
MS Word
MS Excel
A few brave
souls use
other tools
The Right Tools?
Specialized tools could make
it quicker and easier!
How to Address the Need?
• Use existing software
• SIL FieldWorks DA tool(s)
• Extend existing tools?
What’s already here?
Multilinear Discourse
• Generate statistics and diagrams relating to span
analysis, topic continuity statistics, and other
• Input is an SFM marked up text (e.g. from Shoebox)
• In Beta 2
• More info:
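For readers unfamiliar with the input format, here is a minimal sketch of reading SFM (standard format marker) text of the kind Shoebox exports. It assumes that each field starts with a backslash marker at the beginning of a line and that `\ref` opens a new record; the marker names and sample data are illustrative, and this is not how the tool above reads its input.

```python
# Sample data is invented; \ref, \tx, and \ft are assumed marker names.
SAMPLE = r"""
\ref 001
\tx Long ago a hunter saw a deer.
\ft Long ago a hunter saw a deer.

\ref 002
\tx He followed it
\tx into the forest.
\ft He followed it into the forest.
"""

def _append(record, marker, value):
    """Add `value` to the field `marker`, joining repeats with a space."""
    record[marker] = (record.get(marker, "") + " " + value).strip()

def parse_sfm(text, record_marker="ref"):
    """Split SFM text into records: a list of dicts mapping marker -> value."""
    records = []
    current = None
    last_marker = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            # Start a new record at the record marker (or at the first field).
            if marker == record_marker or current is None:
                current = {}
                records.append(current)
            _append(current, marker, value)
            last_marker = marker
        elif current is not None:
            # A line without a marker continues the previous field.
            _append(current, last_marker, line)
    return records

if __name__ == "__main__":
    for rec in parse_sfm(SAMPLE):
        print(rec["ref"], "|", rec["tx"])
```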
Biblical Analysis Research Tool
• BART – has features supporting discourse analysis
of biblical texts
• Comes with extensive built-in morphosyntax
markup; supports customizable tagging and
complex queries.
• Only for biblical texts; can’t enter vernacular texts.
• Part of TW, or
available from
• www.sil.org/transl
RSTTool
• Lets user diagram relations between text “chunks.”
• Free download from http://www.wagsoft.com/RSTTOOL
• User can define own set of relations, schemas, etc.
such as SSA or Longacre’s propositional relations.
• Can generate
statistics based on
the tree structures
built by the user.
• File format is
• Text can be edited
even after structuring has begun.
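To make the statistics bullet concrete, the sketch below counts how often each relation occurs in a small hand-built tree and measures its depth. The `Node` class, the relation labels, and the sample tree are assumptions made for illustration; they do not reflect the tool's file format or the statistics it actually reports.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A text chunk or a span; `relation` names its link to its parent."""
    text: str = ""
    relation: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

def relation_counts(node):
    """Count every relation label found in the tree rooted at `node`."""
    counts = Counter()
    if node.relation:
        counts[node.relation] += 1
    for child in node.children:
        counts.update(relation_counts(child))
    return counts

def depth(node):
    """Number of levels in the tree; a lone chunk has depth 1."""
    return 1 + max((depth(c) for c in node.children), default=0)

# Illustrative tree (assumed data and relation names).
tree = Node(children=[
    Node("The hunter was hungry,", relation="reason"),
    Node(children=[
        Node("so he went to the forest"),
        Node("and followed a deer.", relation="sequence"),
    ]),
    Node("But he caught nothing.", relation="concession"),
])

if __name__ == "__main__":
    print(relation_counts(tree))
    print(depth(tree))
```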
MATE Workbench
• Tool “to aid in the display, editing and
querying of annotated speech corpora”
• Encodes data in XML and displays via XSL-like
stylesheets; could be programmed to
produce various displays.
• In “early demo” version (2001). Looks like it
has potential, but I can’t get it to run
on my machine.
• http://mate.nis.sdu.dk/
• Produce fairly feature-complete constituent
charts from XML data using XSLT
• Encode text, column assignments, and
chart configuration in XML; chart is
produced automatically.
• Open standards promote modification
/ reuse of data.
• There is no “application;” no user-friendly
way to enter the XML data.
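As a rough illustration of the approach on this slide, the sketch below encodes a two-clause text, its column assignments, and a column configuration in XML, then renders a plain-text chart. The element and attribute names (`chart`, `columns`, `col`, `clause`, `seg`) are invented for this example and are not the schema from the thesis, and Python's standard library stands in here for the XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: element and attribute names are illustrative only.
CHART_XML = """
<chart>
  <columns>
    <col id="pre" label="Pre-nuclear"/>
    <col id="subj" label="Subject"/>
    <col id="verb" label="Verb"/>
    <col id="obj" label="Object"/>
  </columns>
  <clause ref="1a">
    <seg col="pre">Long ago</seg>
    <seg col="subj">a hunter</seg>
    <seg col="verb">saw</seg>
    <seg col="obj">a deer</seg>
  </clause>
  <clause ref="1b">
    <seg col="subj">he</seg>
    <seg col="verb">followed</seg>
    <seg col="obj">it</seg>
  </clause>
</chart>
"""

def render_chart(xml_text):
    """Return the chart as a list of rows; each row is a list of cell strings."""
    root = ET.fromstring(xml_text)
    cols = [(c.get("id"), c.get("label")) for c in root.find("columns")]
    rows = [["Ref"] + [label for _, label in cols]]
    for clause in root.findall("clause"):
        # Collect this clause's segments by the column they are assigned to.
        cells = {}
        for seg in clause.findall("seg"):
            cells.setdefault(seg.get("col"), []).append((seg.text or "").strip())
        rows.append([clause.get("ref")] + [" ".join(cells.get(cid, [])) for cid, _ in cols])
    return rows

def format_rows(rows):
    """Pad every column to its widest cell and join cells with ' | '."""
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    )

if __name__ == "__main__":
    print(format_rows(render_chart(CHART_XML)))
```

Because the column list lives in the data rather than in the rendering code, adding or reordering chart columns only requires editing the `columns` element.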
Helps available
• LinguaLinks Library has several
items, including:
• Analyzing Discourse: a Manual of
Basic Concepts – Dooley & Levinsohn
(avail. on the web as well as in LLL).
Very practical.
Do you know of others?
• Please let me know if you are
aware of other useful discourse-level
software tools!
What’s coming?
• FieldWorks
DA tools
• “A tool for drawing syntax
trees” – could also be used for
discourse “chunking” and
• Looks very easy to use.
Collapsible tree makes it easy
to browse large text structures.
• Supports Latin-1 charset.
• Author taking feedback to
make TCC more useful for
SIL’s work.
• Still in beta. No release sched.
• Info: http://ulrikp.org/
Annotation Graph ToolKit
• AGTK is a toolkit for annotating texts
• TreeTrans – edit syntactic trees; charting &
chunking possible
• InterTrans – interlinearize text (very beta)
• Saves in an
abstract XML
format; potentially
a good basis for a
“Lego” solution
• Not ready for end
SIL FieldWorks DA Tool(s)
• FW DA software is still on the drawing board
but is a high priority.
• Would leverage the huge benefits of all the
work that has gone into FieldWorks!
• FW tools already support interlinear text,
text annotations/tagging and highlighting.
• Preliminary work has begun on design of
constituent charting features.
• Wish list for DA features exists but
requirements not yet prioritized.
Guidance team has not yet been
• There are some good tools already out
there for certain tasks related to DA.
Unfortunately they don’t interoperate
much, and there are no domain-aware
applications for constituent charting.
• SIL FieldWorks tools, as they become
available, should cover certain DA
tasks well, such as constituent
