Discourse Level Software
Current Status and Future Directions
Nov. 16, 2004
Lars Huttar ([email protected])
Knowledge Management Services

Abstract (I)
• Discourse analysis (DA, a.k.a. textlinguistics) is a task frequently cited as needing computer-assisted tools.
• Some tools are currently available for certain tasks, but as yet, no user-ready applications specifically for the discourse charting commonly used on the field.

Abstract (II)
• This presentation will review a few of the existing tools most pertinent to DA on the field, and software that is planned or under development.
• I will also mention the conceptual model for constituent charting described in my thesis, which uses XML encoding of text and analysis, from which a chart is rendered via XSL.

Overview
• The need for discourse analysis software
• What’s already out there?
• What’s coming down the pike?

Need for Discourse Software
The task:
• Help the user produce charts, diagrams, and summaries of texts in such a way as to facilitate discovery of discourse patterns and to expedite testing of hypotheses.

Major features desired
• Import (interlinear) text
• Segment and move pieces into chart columns
• Mark genre(s)
• Configurable autohighlighting, e.g. color by POS
• Toggle highlighting of certain features
• Manual annotation of features, incl. coherence and prominence
• Search text, IT, and annotations
• Chart/summary of results, hyperlinked to data
• Accessible to MTTs/OTTs
» Geoffrey Hunt
» Kent Spielmann

Example constituent chart

Current Practice
• Pencil & paper
• MS Word
• MS Excel
• A few brave souls use other tools

The Right Tools?
Specialized tools could make it quicker and easier!

How to Address the Need?
• Use existing software
• SIL FieldWorks DA tool(s)
• Extend existing tools?

What’s already here?
• MDA
• BART
• RSTTool
• MATE
• CiCaDA

Multilinear Discourse Analysis
• Generate statistics and diagrams relating to span analysis, topic continuity statistics, and other issues
• Input is an SFM marked up text (e.g. from Shoebox)
• In Beta 2
• More info: phil.quick@sil.org

Biblical Analysis Research Tool
• BART – has features supporting discourse analysis of biblical texts
• Comes with extensive built-in morphosyntax markup; supports customizable tagging and complex queries.
• Only for biblical texts; can’t enter vernacular texts.
• Part of TW, or available from WordSearch Corp.
• www.sil.org/translation/bart.htm

RSTTool
• Lets user diagram relations between text “chunks.”
• Free download from http://www.wagsoft.com/RSTTOOL
• User can define own set of relations, schemas, etc. such as SSA or Longacre’s propositional relations.
• Can generate statistics based on the tree structures built by the user.
• File format is XML-based.
• Text can be edited even after structuring has begun.

MATE Workbench
• Tool “to aid in the display, editing and querying of annotated speech corpora”
• Encodes data in XML and displays via XSL-like stylesheets; could be programmed to produce various displays.
• In “early demo” version (2001). Looks like it has potential, but I can’t get it to run on my machine.
• http://mate.nis.sdu.dk/

CiCaDA
• Produce fairly feature-complete constituent charts from XML data using XSLT stylesheets.
• Encode text, column assignments, and chart configuration in XML; chart is produced automatically.
• Open standards promote modification / reuse of data.
• There is no “application;” no user-friendly way to enter the XML data.

Helps available
• LinguaLinks Library has several items, including:
• Analyzing Discourse: a Manual of Basic Concepts – Dooley & Levinsohn (avail. on the web as well as in LLL). Very practical.

Do you know of others?
• Please let me know if you are aware of other useful discourse-level software tools!

What’s coming?
• TCC
• AGTK
• FieldWorks DA tools

TCC
• “A tool for drawing syntax trees” – could also be used for discourse “chunking” and highlighting
• Looks very easy to use. Collapsible tree makes it easy to browse large text structures.
• Supports Latin-1 charset.
• Author taking feedback to make TCC more useful for SIL’s work.
• Still in beta. No release sched.
• Info: http://ulrikp.org/

Annotation Graph ToolKit
• AGTK is a toolkit for annotating texts
• TreeTrans – edit syntactic trees; charting & chunking possible
• InterTrans – interlinearize text (very beta)
• Saves in an abstract XML format; potential good basis for “Lego” solution
• Not ready for end users.

SIL FieldWorks DA Tool(s)
• FW DA software is still on the drawing board but is a high priority.
• Would leverage the huge benefits of all the work that has gone into FieldWorks!
• FW tools already support interlinear text, text annotations/tagging and highlighting.
• Preliminary work has begun on design of constituent charting features.
• Wish list for DA features exists but requirements not yet prioritized. Guidance team has not yet been formed.

Conclusion
• There are some good tools already out there for certain tasks related to DA. Unfortunately they don’t interoperate much, and there are no domain-aware applications for constituent charting.
• SIL FieldWorks tools, as they become available, should cover certain DA tasks well, such as constituent charting.

Questions? Comments?
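
Illustration of the XML-to-chart idea mentioned under CiCaDA (text, column assignments, and chart configuration encoded in XML, with the chart produced automatically). This is only a minimal sketch under stated assumptions, not CiCaDA's actual schema or stylesheets: the element names (chart, columns, column, clause, seg), the sample clauses, and the function render_chart are all hypothetical, and Python's standard xml.etree.ElementTree stands in for XSLT so the example runs without extra tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding (NOT the real CiCaDA format): <columns> holds the chart
# configuration, and each <seg> assigns a piece of a clause to one column.
SAMPLE_XML = """
<chart>
  <columns>
    <column id="pre">Pre-nuclear</column>
    <column id="S">Subject</column>
    <column id="V">Verb</column>
    <column id="O">Object</column>
  </columns>
  <clause n="1">
    <seg col="S">the man</seg>
    <seg col="V">saw</seg>
    <seg col="O">a dog</seg>
  </clause>
  <clause n="2">
    <seg col="pre">then</seg>
    <seg col="V">ran</seg>
  </clause>
</chart>
"""


def render_chart(xml_text):
    """Render the hypothetical chart XML as a plain-text table, one row per clause."""
    root = ET.fromstring(xml_text.strip())

    # Chart configuration: ordered (column id, column label) pairs.
    columns = [(c.get("id"), (c.text or "").strip()) for c in root.find("columns")]

    # Header row, then one row per clause with each segment in its assigned column.
    rows = [["#"] + [label for _, label in columns]]
    for clause in root.findall("clause"):
        # Simplification: a second <seg> in the same column would overwrite the first.
        cells = {seg.get("col"): (seg.text or "").strip() for seg in clause.findall("seg")}
        rows.append([clause.get("n", "")] + [cells.get(col_id, "") for col_id, _ in columns])

    # Pad every cell to the width of the widest entry in its column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    print(render_chart(SAMPLE_XML))
```

Because the column list lives in the data rather than in the rendering code, changing the chart layout in this sketch means editing only the columns element, which is the kind of modification and reuse the open-standards bullet points to.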