OSIS – A Closer Look
Steven J. DeRose, Ph.D.
Chair, Bible Technologies Group
http://www.bibletechnologies.net
[email protected]
November 22, 2002
Why have a standard?
(first, for publishers)
• Can reduce the costs of:
  – Editing and publication process
  – Software purchase, training, maintenance
  – Rekeying, scanning, and conversion
• Lets texts survive when your word processor (WP) or typesetting program goes obsolete
• Facilitates multi-format, multi-platform delivery and distribution
• Enables use of generic tools
OSIS Tutorial - 2
Why have a standard?
(next, for users)
• Lets you obtain the same texts regardless of what reading and other tools you use
  – Because the publisher does no more work to support 10 than to support 1
• Helps texts survive when your book-reading software goes obsolete
• Reduced costs
• Better, more reliable resources
• Enables communities of interest
  – Shared notes, collaborative study, …

The medium picture
[Diagram: four source formats (OCR, HTML, WPs, Typeset) feed a central XML/OSIS text, which feeds seven delivery formats (XHTML, Braille, PDF, Open eBook, Other XML, Palmtops, Cell delivery). Callout: "Cost savings usually start here".]
4 + 7 convertors instead of 4 × 7
(and reality is bigger)

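The converter arithmetic on this slide can be checked with a few lines of Python. This is only an illustration: the function name `converters_needed` is made up here, and the counts 4 and 7 are the ones quoted on the slide.

```python
def converters_needed(sources, targets):
    """Return (via_hub, pairwise) converter counts for the given format counts."""
    # With a central format, each source converts in once and each target converts out once.
    via_hub = sources + targets
    # Without one, every source needs its own converter to every target.
    pairwise = sources * targets
    return via_hub, pairwise


print(converters_needed(4, 7))
```

The gap widens as formats are added, which is the point of "and reality is bigger".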
The basic principle:
“Descriptive markup”
• WPs only see “huge, bold, space before”
  – Now find/reformat all chapter headings
  – Expensive to apply a house style or look/feel
  – Hard to create diverse forms:
    • Web, paper, and braille publication
• A perfect user could use stylesheets
  – But interfaces make inconsistent work easier
• Instead: say what kind of portion each is
  – A formatter applies rules by kind

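"A formatter applies rules by kind" can be sketched in a few lines. The two rule tables below are hypothetical house styles invented for this example (they are not part of OSIS), and `render` is an illustrative helper built on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical house styles: one formatting rule per kind of element.
PRINT_RULES = {"title": str.upper, "p": lambda s: "    " + s}
WEB_RULES = {"title": lambda s: "<h1>" + s + "</h1>",
             "p": lambda s: "<p>" + s + "</p>"}


def render(xml_text, rules):
    """Format each child of the root by looking up a rule for its kind."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        rule = rules.get(element.tag, lambda s: s)  # unknown kinds pass through
        lines.append(rule(element.text or ""))
    return lines


SOURCE = "<div><title>Genesis</title><p>In the beginning...</p></div>"
print(render(SOURCE, PRINT_RULES))
print(render(SOURCE, WEB_RULES))
```

The source text is never touched; changing the look means swapping the rule table.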
Why should I separate out
the formatting?
• It speeds your work
  – You can use a stylesheet from someone else, and not have to do any manual formatting
  – Typesetter can enhance formatting without risking corrupting your content
    • Therefore, less time wasted reviewing galleys
• Multiple formats from the same source
  – Print, braille, Web, etc.
  – House styles for different journals
• Last-minute changes are safer, cheaper
  – Especially crucial for Bible publishing

Why not just use HTML?
• HTML is nice but lacks
  – Units like poem, chapter, verse, inscription
  – Ways to annotate for meaning, grammar, etc.
  – Support for reference systems: "Matt 1:1"
• Multi-purpose tags like <b>, <i>, etc.
  – Are hard to tease apart when you need to
• HTML limitations encourage using tables to force layout, making re-use infeasible
• And…

Compare
• <item>
    <desc>Cashmere sweater</desc>
    <price unit='yen'>120000</price></item>
  <item>
    <desc>Socks</desc>
    <price unit='yen'>1000</price></item>
• versus:
  <br>Cashmere sweater, ¥120000
  <br>Socks, ¥1000

Why is the markup better?
• When relations are marked, an indexer can match price with item
• If not, there is no reliable way
  – (there are lots of ways one might guess…)
  – A search for “Cashmere and ¥1000” hits
    • Needlessly annoying the searcher
    • How many false hits have you had like this?
• Markup is not just about formatting

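The false hit is easy to reproduce. In this sketch the wrapping <catalog> element and the helper names `items_matching` and `flat_hit` are assumptions added so the fragment parses and the comparison can run; the item data is the sweater-and-socks example from the previous slide.

```python
import xml.etree.ElementTree as ET

# The marked-up items, wrapped in an assumed <catalog> root so they parse.
MARKED = """<catalog>
<item><desc>Cashmere sweater</desc><price unit='yen'>120000</price></item>
<item><desc>Socks</desc><price unit='yen'>1000</price></item>
</catalog>"""

# The same data with line breaks only.
FLAT = "<br>Cashmere sweater, ¥120000\n<br>Socks, ¥1000"


def items_matching(xml_text, word, price):
    """Return descriptions of items whose own price equals `price`."""
    hits = []
    for item in ET.fromstring(xml_text).iter("item"):
        desc = item.findtext("desc", default="")
        cost = item.findtext("price", default="")
        if word in desc and cost == price:
            hits.append(desc)
    return hits


def flat_hit(text, word, price):
    """Naive search: both strings occur somewhere in the text."""
    return word in text and ("¥" + price) in text


print(items_matching(MARKED, "Cashmere", "1000"))
print(flat_hit(FLAT, "Cashmere", "1000"))
```

The structured search finds no ¥1000 cashmere item, while the flat search reports a hit because the word and the price merely share a page.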
How do you spell XML?
• The Extensible Markup Language
• HTML on steroids (sort of)
• Key features:
  – Intrinsic support for Unicode
  – Ability to create your own units
  – Ability to validate how they are used
    • (no chapters inside footnotes, etc.)
  – Very easy for computers to process
  – Separates formatting (remember earlier)

OSIS and XML
• OSIS is an application of XML
  – XML specifies the syntax
  – OSIS specifies a lexicon for our genre
• Life would be easy if natural languages were that simple!
• There are many other lexica for XML
  – Humanities: Text Encoding Initiative
    • Closely related to OSIS

What is OSIS, really?
• OSIS defines:
  – A set of XML element types
    • p, verse, inscription, note, …
  – Certain attributes for those types
    • type=“devotional”
  – A standard form for Biblical references
    • A consistent way to write them down
    • A way to specify within-verse locations
    • A way to refer to editions and translations, or to refer to a passage generically

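Concept: a hierarchy
[Tree diagram: osis at the root, containing osisText. Below it sit a header (with work osisWork=‘KJV’, holding title, language, identifier) and nested div type=‘book’ and div type=‘chapter’ elements. The chapter contains p and verse elements (for example verse osisID=‘Gen.1.3’), whose leaves are text content, note, and inscription.]
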
What's under the covers?
• All of this is represented by inserting markers ("tags") into the text
  – Like HTML but more consistent
  – All starts and ends are explicit
• Three kinds:
  – Start tags:  <p>
  – End tags:    </p>
  – Empty tags:  <milestone/>
• A start tag, its content, and its end tag together, as in <p>Jesus wept.</p>, is an element.

What else is there?
• Elements can contain other elements
  – <div type="chapter">
      <verse>In the beginning...</verse>
      <verse>And the Word...</verse>
      ...</div>
• Many elements can also contain text
• Some elements require or prohibit others
  – No <div> inside <abbr>
• An empty tag just marks a point
  – <milestone type="pb"/>

Attributes
• Usually modify a whole element
• Appear only inside start tags
  <name type="nonhuman">Baal</name>
  <div type="chapter">…</div>
  <verse osisID="Rev.22.21">
  <q who="God">
  <transChange type="added">

The full set of (68) tags
a, abbr, actor, caption, castGroup, castItem, castList, catchWord, cell, closer,
contributor, coverage, creator, date, description, div, divineName, figure, foreign, format,
head, header, hi, identifier, index, inscription, item, l, label, language,
lg, list, mentioned, milestone, milestoneEnd, milestoneStart, name, note, osis, osisCorpus,
osisText, p, publisher, q, rdg, reference, refSystem, relation, revisionDesc, rights,
role, roleDesc, row, salute, seg, signed, source, speaker, speech, subject,
table, teiHeader, title, transChange, type, verse, w, work

• Don't panic
• A lot of these get used once each, in the header, almost as a ritual
  – You can paste a sample header and fill it in
• About a dozen form the Dublin Core set for cataloging and identification info
• Most of the rest fall into nice groups
• The hard parts (later) include
  – Milestones
  – Quotes when they cross verses/paragraphs

Three major pieces to OSIS
• The markup elements and their attributes
  – Defined by a schema
• The standardized reference system
  – Partly defined in the schema
  – Partly defined in grammar and prose
• The authority system
  – A way to declare formal/normalized names
  – Declaration portion still in process

Basic OSIS markup
(What's in a name?)

Sample markup
<div type="testament">
  <div type="book" osisID="Gen">
    <div type="chapter" osisID="Gen.1">
      <verse osisID="Gen.1.1">In the beginning God created the
        heaven and the earth.</verse>
      <verse osisID="Gen.1.2">And the earth was without form, and
        void; and darkness was upon the face of the deep. And the
        Spirit of God moved upon the face of the waters.</verse>
      <verse osisID="Gen.1.3">And God said, Let there be light: and
        there was light.</verse>
      <verse osisID="Gen.1.31">And God saw every thing that he had
        made, and, behold, it was very good. And the evening and
        the morning were the sixth day. <note type="x-StudyNote">
        And the evening...: Heb. And the evening was, and the
        morning was etc.</note></verse>
    </div>
  </div>
</div>
</osisText></osis>
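Markup like this is straightforward to process with generic tools. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the sample (two verses, no note); the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample: the three nested divs and two verses.
SAMPLE = """<div type="testament">
<div type="book" osisID="Gen">
<div type="chapter" osisID="Gen.1">
<verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
<verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse>
</div></div></div>"""

root = ET.fromstring(SAMPLE)

# Every division, outermost first, as (type, osisID) pairs.
divisions = [(d.get("type"), d.get("osisID")) for d in root.iter("div")]

# Verse text keyed by osisID.
verses = {v.get("osisID"): v.text for v in root.iter("verse")}

print(divisions)
print(verses["Gen.1.3"])
```

Because testament, book, and chapter are all div elements, one loop over div with a look at the type attribute covers every level.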
Big generic elements
• div         Testament, book, chap, section
• type        the type of division, as above
• divTitle    optional display title
  – title     Title of any div
• list        Genealogies and other lists
  – label
  – item
• table       Mainly for appendixes, etc.
  – row
  – cell

Book/chapter/verse
• Large units all use the <div> element
• It has a type attribute, with values
  – appendix
  – book
  – chapter
  – concordance
  – glossary
  Note: There are no separate tags for testament, book, or chapter
• As with most attributes you can add new values if they start with "x-"
  – <div type='x-toronto-thing'>
• We expect to add more div types in time
• <verse osisID="Rev.3.20">

Small items
• abbr          <abbr expansion="">…
• divineName    <divineName>The Lord…
• foreign       <foreign lang="">Talitha…
• hi            Emphasis in notes/comm
• inscription   Mene, mene, tekel, parsin
• mentioned     The name <mentioned>Peter
• name          Destroyed the <name type="nonhuman">Baals</name>
• p             The ubiquitous paragraph
• q             Quotations (more later)

Genre-specific elements
• Epistolary: salute, closer
  – <closer>I, Paul, sign this with my own hand.</closer>
• Illustrations: figure
  – May contain caption, note, index
• Poetry: lg, l
  – Also used for other line-oriented text
  – lg (line group) can be nested
• Drama: speech, speaker
  – speaker ok in: speech cell closer div inscription l p q salute verse
  – who attribute can point to a castItem in the header

Inscription
<verse osisID="Dan.5.25">
  This is the inscription that was
  written: <inscription>Mene, Mene,
  Tekel, Parsin<note type="">Aramaic
  UPARSIN (that is, AND PARSIN)</note>
  </inscription>
</verse>
• How many inscriptions can you think of?

About the source/target layout
• <milestone>
  – Use to mark point events
    • Page and column breaks of a source manuscript
    • Intended screen breaks for display
  – Types: column footer header line page screen
• Note: Do not confuse with milestoneStart and milestoneEnd, which stand in for several other elements when they must cross verse/p boundaries in certain ways.

About the text itself
• transChange: Changed in translation
  – Types: added amplified changed deleted moved
• rdg: Variant readings
  – Used only within notes (for now)
  – <note>Some ancient mss <rdg>kiss the Son</rdg></note>
• seg: (extensions)
• w: word-level linguistics
  – Attributes: POS, morph, lemma, gloss, src, xlit

Attributes of all elements
(all are optional)
Name            Type
osisRef         osisRefType
annotateWork    anything
annotateType    osisAnnotation
ews             anything
ID              xs:ID
lang            languageType
osisID          osisIDType
resp            anything
splitID         anything
type            anything
subType         anything
n               anything
Meaning
• I am about W
• My relation to W
• For Web to link to
• language, wr sys
• reference to here
• responsible person
• (later)
• name/num of unit

The reference system
(I am named, therefore I am)

Header overview
• Purpose
  – Identify the file as an XML file
  – Identify the file as using the OSIS schema
  – Say whether it's one text or a collection
  – Identify and declare names for:
    • The work itself (title, author, etc.)
    • Other works referenced
    • Verse reference systems used
    • Characters in the text <castList>

Header sample
<?xml version="1.0" encoding="UTF-8" ?>
<osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
  <osisText osisIDWork="KJV"
            osisRefWork="defaultReferenceScheme">
    <header>
      <work osisWork="KJV">
        <title>King James Version of 1769</title>
        <identifier type="OSIS">KJV</identifier>
        <language>en</language>
        <refSystem>Bible.KJV</refSystem>
      </work>
      <work osisWork="defaultReferenceScheme">
        <refSystem>Bible.KJV</refSystem>
      </work>
    </header>

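A program can read these declarations back out of the header. The sketch below assumes the sample header, closed off with </osisText></osis> so that it parses on its own; `declared_works` is an illustrative helper, not part of any OSIS toolkit.

```python
import xml.etree.ElementTree as ET

# The sample header, with closing tags added so it stands alone.
HEADER_DOC = """<osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd">
<osisText osisIDWork="KJV" osisRefWork="defaultReferenceScheme">
<header>
<work osisWork="KJV">
<title>King James Version of 1769</title>
<identifier type="OSIS">KJV</identifier>
<language>en</language>
<refSystem>Bible.KJV</refSystem></work>
<work osisWork="defaultReferenceScheme">
<refSystem>Bible.KJV</refSystem></work>
</header>
</osisText></osis>"""


def declared_works(xml_text):
    """Map each declared work name to its catalog fields."""
    works = {}
    for work in ET.fromstring(xml_text).iter("work"):
        works[work.get("osisWork")] = {field.tag: field.text for field in work}
    return works


works = declared_works(HEADER_DOC)
text = ET.fromstring(HEADER_DOC).find("osisText")
# Both work names used on osisText should have a matching declaration.
undeclared = [name for name in (text.get("osisIDWork"), text.get("osisRefWork"))
              if name not in works]
print(works["KJV"]["title"], undeclared)
```

An empty `undeclared` list means every work name used on osisText was declared in the header.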
Other header elements
• osisCorpus
  – Use inside <osis> when there will be several texts in one document, as for a polyglot
  – osisCorpus can have its own header
  – osisCorpus then contains osisText elements
• teiHeader
  – Allows including a fuller TEI-style header
• work uses the standard "Dublin Core" tags to give catalog/bibliography info

Dublin Core
– title         The title of the work or collection
– creator       The primary author
– contributor   Other contributors (set 'role')
– identifier    ISBN or similar unique ID of work
– date          Publication date
– language      Primary language of the work
– rights        Statement of permissions/rights
– publisher     Name of the publisher
– description   An abstract or precis of the work
– format        What representation (=OSIS)
– coverage      Intended audience and scope
– relation
– source        If derived from another work
– subject       LCSH or similar subject descr
– type
– refSystem     (OSIS only, not in D.C.)

Identifying parts of the work
• osisID must be specified on any element that has a canonical reference:
  – <verse osisID="Luk.3.10">
  – <p osisID="Rev.3.20">
  – <div type="chapter" osisID="Luk.3">
• 3-letter book names, periods to separate
• HTML <a name="…"> available as well
  – More useful in notes/commentary, not Bible
• Back-of-book index entries
  – <index level1="Idols" level2="burning of" level3="by Hezekiah">
  – <index level1="False gods" see="Idols">

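Because the parts of an osisID are separated by periods, splitting one up takes very little code. The sketch below handles only the plain book, book.chapter, and book.chapter.verse shapes shown on this slide; it does not attempt work prefixes, grains, or ranges, and the names `OsisID` and `parse_osis_id` are made up for this example.

```python
from typing import NamedTuple, Optional


class OsisID(NamedTuple):
    book: str
    chapter: Optional[int]
    verse: Optional[int]


def parse_osis_id(osis_id):
    """Split 'Luk.3.10' into book, chapter, and verse (the latter two optional)."""
    parts = osis_id.split(".")
    if not 1 <= len(parts) <= 3 or not all(parts):
        raise ValueError("not a book[.chapter[.verse]] identifier: %r" % osis_id)
    chapter = int(parts[1]) if len(parts) > 1 else None
    verse = int(parts[2]) if len(parts) > 2 else None
    return OsisID(parts[0], chapter, verse)


print(parse_osis_id("Luk.3.10"))
print(parse_osis_id("Luk.3"))
```

Chapter and verse are converted to integers here for sorting convenience; that is a simplifying assumption and would reject non-numeric parts.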
When it won't come out even
• If several verses are translated as (say) a p
  – Put all the appropriate osisIDs on the p
  – <p osisID="Matt.1.1 Matt.1.2">
• If a verse is split across paragraphs
  – Tag each part; use splitID to number them
  – <p>…<verse osisID="1Pe.1.3" splitID="1">…</verse></p>
    <p>…<verse osisID="1Pe.1.3" splitID="2">…</verse>…</p>
• milestoneStart … milestoneEnd
  – Used to mark units that cross boundaries
  – abbr closer div foreign l lg q salute seg signed speech verse

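A reader program can put a split verse back together by grouping on osisID and ordering by splitID. In this sketch the wrapping <div>, the placeholder verse text, and the helper name `reassemble` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder text; the wrapping <div> is added so the fragment parses.
DOC = """<div>
<p>...<verse osisID="1Pe.1.3" splitID="1">first part</verse></p>
<p><verse osisID="1Pe.1.3" splitID="2">second part</verse>...</p>
</div>"""


def reassemble(xml_text):
    """Join the pieces of each verse, ordered by their splitID."""
    pieces = defaultdict(list)
    for verse in ET.fromstring(xml_text).iter("verse"):
        order = int(verse.get("splitID", "1"))  # an unsplit verse counts as piece 1
        pieces[verse.get("osisID")].append((order, verse.text or ""))
    return {osis_id: " ".join(text for _, text in sorted(chunks))
            for osis_id, chunks in pieces.items()}


print(reassemble(DOC))
```

The same grouping idea applies to a p that carries several osisIDs: `element.get("osisID").split()` yields each ID in turn.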
References
• Reference to other places/works
  – <note>See also <reference osisRef="Mat.1.1">Matthew</reference> for a similar theme.</note>
• div, figure, note, and reference can also directly refer:
  – <div type="commentary" osisRef="Luk.3.10">
  – This identifies the passage this commentary div is about.
• HTML <a href="…"> also available
  – (more useful in notes/commentary, not Bible)

Reference syntax
[Diagram: the example reference NIV.Heb:[email protected][12] with its parts labeled: work ref (edition, refsystem), canonical ref (book, chapter, verse), grain ref (grain type, where 'code point' ~= character, and grain value), finegrain ref, and range ref.]

Notes
• Notes are placed right where they are referenced in the text.
• Notes have several types
  – allusion alternative background citation devotional exegesis explanation study translation enumeration variant
  – Additional types must start with "x-"
• catchWord -- marks referenced text cited within a note
  – <note><catchWord>hello</catchWord> may also be translated "goodbye" here.</note>
• rdg -- marks alternate readings

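The "listed types plus x- extensions" rule is simple to check in code. The set below copies the types listed on this slide; treating a bare "x-" with nothing after it as invalid is an assumption of this sketch, and `is_valid_note_type` is an illustrative name.

```python
# The note types listed on the slide.
NOTE_TYPES = {
    "allusion", "alternative", "background", "citation", "devotional",
    "exegesis", "explanation", "study", "translation", "enumeration", "variant",
}


def is_valid_note_type(value):
    """Accept a listed type, or any extension type that starts with 'x-'."""
    if value in NOTE_TYPES:
        return True
    # Assumption: an extension needs at least one character after the prefix.
    return value.startswith("x-") and len(value) > 2


for candidate in ("study", "x-StudyNote", "StudyNote"):
    print(candidate, is_valid_note_type(candidate))
```

A schema-aware validator performs this kind of check automatically; the function only shows what the rule amounts to.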
On to the authority system
The name is the thing, and the
true name is the true thing. To
know the name is to control the
thing.
-- Ursula Le Guin

Cast-lists
• To declare cast of characters
  – Provides a formal ID for each
  – Can refer to ID from <speaker>, <q>, etc.
  – castList
    – castGroup
      – castItem
        – actor
        – role
        – roleDesc

The authority system
• Only supported for castList at present
• We intend to provide
  – A schema for declaring sets of formal names
  – A way to invoke such lists in documents
  – Standard name sets for
    • Bible versions
    • Versification schemes
    • People, places, etc. in the Bible
    • Journals, classical literature, and other works commonly cited in Biblical studies

OSIS in practice
Tourist to police officer:
Can you tell me how to
get to Carnegie Hall?
Officer to tourist:
Practice, practice, practice.

How do I know if the
markup is correct?
• 5 levels of 'correct':
  – SLipshod
  – Only well-formed
  – Valid
  – Accurate
  – Complete
• SL: no check required
• O: Load in IE 5+
• V: xp, xmetal, and other true validators
• A: requires human proofreading and interpretation
• C: there is always more that could be marked up

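The "only well-formed" level can also be checked without a browser. This sketch uses the parser in Python's standard library, which reports well-formedness errors but does not validate against the OSIS schema; the helper name `is_well_formed` and the two test strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


GOOD = '<div type="chapter"><verse>In the beginning...</verse></div>'
BAD = '<div type="chapter"><verse>In the beginning...</verse></chapter>'

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

Passing this check says nothing about validity: a well-formed file may still use elements in places the schema forbids, which is what a true validator catches.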
Tools vs. today
• Today we will use the raw form
  – Experts will need to know this
  – Users should have protective software
• Some XML editing programs:
  – SoftQuad XMetal -- $300
  – Open Office -- free, very promising
• Some generic-enough HTML editors:
  – BBEdit, emacs, Netscape Communicator

Getting to OSIS
• The cleaner your data, the easier it is
  – Data is seldom as clean as you think it is
• Structured formats (USFM, XSEM, LGM, ThML) are the easiest sources
• Tools:
  – Perl/awk/sed/cc and the like
  – XSLT if coming from XML
  – BTG has sponsored development of several convertors.
  – BTG will maintain a repository of utilities

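To give a feel for what a converter does, here is a deliberately tiny sketch that turns USFM-style chapter (\c) and verse (\v) lines into nested OSIS div and verse elements. It is not one of the BTG-sponsored convertors: the function name `to_osis_book` is made up, the book code is passed in by hand, and every other USFM marker is ignored.

```python
import re
import xml.etree.ElementTree as ET

CHAPTER = re.compile(r"\\c\s+(\d+)")        # e.g. \c 1
VERSE = re.compile(r"\\v\s+(\d+)\s+(.*)")   # e.g. \v 1 In the beginning...


def to_osis_book(book, lines):
    """Build <div type="book"> holding chapter divs and verse elements."""
    book_div = ET.Element("div", {"type": "book", "osisID": book})
    chapter_div = None
    for line in lines:
        line = line.strip()
        match = CHAPTER.match(line)
        if match:
            chapter_id = "%s.%s" % (book, match.group(1))
            chapter_div = ET.SubElement(
                book_div, "div", {"type": "chapter", "osisID": chapter_id})
            continue
        match = VERSE.match(line)
        if match and chapter_div is not None:
            verse_id = "%s.%s" % (chapter_div.get("osisID"), match.group(1))
            verse = ET.SubElement(chapter_div, "verse", {"osisID": verse_id})
            verse.text = match.group(2)
    return book_div


usfm_lines = [
    r"\c 1",
    r"\v 1 In the beginning God created the heaven and the earth.",
    r"\v 3 And God said, Let there be light: and there was light.",
]
genesis = to_osis_book("Gen", usfm_lines)
print(ET.tostring(genesis, encoding="unicode"))
```

Real data is seldom this clean, so a production converter spends most of its code on the cases this sketch skips: notes, headings, poetry, and verses that cross paragraph boundaries.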
Getting your OSIS XML to
display in IE
• Make sure the document is at least WF (well-formed)
• Name it filename.xml
• Refer to a stylesheet if you want formatting instead of just an outline view
<?xml version="1.0"?>
<!DOCTYPE osis []>
<?xml-stylesheet href="mystyle.css"
  type="text/css"?>
<osis
  xmlns="http://www.bibletechnologies.org/namespaces/OSIS-1.1">
<header>…

Getting your OSIS printed
• Most typesetting programs now import XML
• OSIS converts easily to most relevant XML schemas, using XSLT
• Word processors are also gaining ability to import arbitrary XML
• Typesetting firms, esp. for journals, are starting to accept XML as well.

Near-term concerns of OSIS
• Linguistic annotation
• Formal name lists for people, places, translations, etc.
• Connecting text to multimedia
• Greater support for secondary genres
• Tool development and conformance

How you can help
• Find the best place to apply OSIS in your organization, and do it.
• Join a Working Group
• Send feedback, feature requests, etc.
• Join a Working Group
• Convert or create OSIS texts
• Join a Working Group
• Create a converter for your current format
• Join a Working Group
• Tell your friends and colleagues
• Join a Working Group

For more information
• Web:
  – http://www.bibletechnologies.org
  – http://www.bibletechnologieswg.org
• Some contacts:
  – Steve DeRose      [email protected]
  – Kees de Blois     [email protected]
  – Patrick Durusau   [email protected]
  – Kirk Lowery       [email protected]
  – Mike Perez        [email protected]