Making Silent Voices Heard
Stephen Rhind-Tutt, President
Charting Vanishing Voices Workshop
June 29, 2012
Agenda
1. About Alexander Street
2. The Challenge
3. The Nature of Virtual Space
4. Examples from Alexander Street
5. Partnerships and Collaboration
1. About ASP
About Alexander Street Press
• Founded in 2000 by executives who used to work for
Chadwyck-Healey, SilverPlatter, Wolters Kluwer, Gale
and Wilson.
• Headquartered just outside Washington DC, USA
• Offices in Stevenage, England; Shanghai, China;
Kuala Lumpur, Malaysia; Sydney, Australia; Brazil;
New Zealand
• 3,000 customers
• 2,500 licensors
Making silent voices heard…
Collaboration
More examples
2. The Challenge
The Challenge
By 2020 the web will have
• > 5 Bn users (currently 2.3 Bn, 37% of the world)
• > 90% of published works prior to 1923
• Most works published up to 2020
• > 4 Billion websites (currently 555m, 71% growth p.a.)
• > 1 Trillion photographs (Facebook adds 300m daily)
• > 100 Million pages of facsimiles of manuscripts
• > 100 Million audio files
• > 1 Billion video files (YouTube adds 72 hrs every minute)
• More than 6,500 endangered languages
• Countless cultural artifacts, audio, video, texts
• Hidden collections
• (Personal) archives
• Field Notes
• Little or no cataloging
• Mostly undigitized
• Decaying film and audio formats
• Increasing opportunities to embellish (HD-video, 3-D models, social annotation, etc.)
• Data sets
Preservation and Access
How are we going to do all of this?
3. The Nature of Virtual Space
The nature of virtual space…
“You must consult the laws of nature… you say ‘What do
you want, brick?’ and the brick says to you ‘I like an arch’
and you say to brick ‘Look, I want one too, but arches are
expensive…’ Brick says ‘I like an arch’…
Honor the material you use”
Louis Kahn (1979)
Understanding the medium
• Steel – High cost to create, strong, easy to
stamp shapes, medium weight…
• Wood – Low cost to create, moderately strong,
needs to be crafted, light weight…
• Glass – Medium cost to create, weak, easy to
craft, transparent
• The Web - ?
Nature of electronic publications
[Figure: nine boxes, each labeled "Page"]
• Pliable
• Atomic
• Evolving quickly
• Interconnected
• Unlimited in size
• Interdependent
• The link matters more than the object
Understanding the medium
Programming languages – C++, PERL, VB, etc…
Assembly Code
Machine Code
Binary
0111010011010000101101101000101110100010001110
1010101010101010111110101010101011111010111001
00011101
Understanding the medium
Font Standards – Postscript
Display Standards – Super VGA
Browser Standards – IE 7.0
Document formats – PDF
Mark-up Standards – SGML, XML, HTML
Communications Protocols – TCP-IP, Modems
Plug-in standards – Java
Image Standards – JPG, TIFF, etc.
Understanding the medium
Twitter – local, custom, news
Foursquare
Map Standard – Google Maps, Open Map
iOS, Android
Devices – Nook, Kindle, iPad
Phone standards – 3G, 4G, 5G
Network protocols – 802.11
Video Standards – H.264, Silverlight, Flash
Evolving quickly
On current trends…
• Processing speed – by 2015 machines 4 times more
powerful than today’s.
• Storage space – by 2015 20 Terabytes of storage (8 Bn
pages) will cost under $100
• More than 90% of the developed world will have Web access
• Significant improvements in the developing world
• Phone Bandwidth > 1.5 Mb/s
Evolving quickly
On current trends…
Year    Hard Disk Size (MB)
1988    20
1990    40
1991    80
1993    160
1994    320
1996    640
1997    1,280
1999    2,560
2000    5,120
2002    10,240
2003    20,480

Year    Hard Disk Size (MB)
2000    20,000
2002    40,000
2003    80,000
2005    160,000
2006    320,000
2008    640,000
2009    1,280,000
2011    2,560,000
2012    5,120,000
2014    10,240,000
2015    20,480,000
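The pattern in the two tables above can be checked with a little arithmetic. The sketch below takes the first and last rows of each table as given and computes the implied doubling time; the function name `doubling_time` is an illustrative choice, not something from the slides.

```python
import math


def doubling_time(year_start, size_start, year_end, size_end):
    """Years needed for capacity to double, assuming steady exponential growth."""
    doublings = math.log2(size_end / size_start)
    return (year_end - year_start) / doublings


# First and last rows of each table (sizes in MB), taken as given.
first_table = doubling_time(1988, 20, 2003, 20_480)
second_table = doubling_time(2000, 20_000, 2015, 20_480_000)

print(f"First table:  one doubling every {first_table:.1f} years")
print(f"Second table: one doubling every {second_table:.1f} years")
```

Both tables grow by a factor of 1,024 (ten doublings) over fifteen years, so each implies a doubling roughly every year and a half.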
Where we’re headed…
Why?
Therefore
Who, What, When, Where?
After Data, Information, Knowledge, and Wisdom, Gene Bellinger, Durval
Castro, Anthony Mills. http://www.systems-thinking.org/
Understanding electronic products
Value in the electronic world is about...
“The manner in which or the efficiency with which
something reacts or fulfills its intended purpose”
Webster’s Unabridged
What do we need to do?
• Comprehensive - everything on the network
• Everyone on the network
• Local and personal (unique verified identity)
• Ubiquitous access (everywhere, all devices)
• High quality (peer review)
• Workflow integration and analysis (deep links to relevant
content and tools)
• Maximize efficiencies (easy ingestion and dissemination)
• Real time currency
Producing
Ingestion
Indexing
Filming
Recording
Licensing
Writing
Scanning
Uploading
Data
Crosswalking
MARC
Semantic
Controlled
vocabularies
Commissioning
Inbound
Discovery
API
Quality
Community
Permissions
Bandwidth
Encodes
# of pixels
Sampling
Peer Review
Crowdsource
Annotation
Playlists
Privacy
Permissions
Anonymity
Shibboleth
Tools
Promotion
Transcripts
Subtitles
Chaptering
Translation
Usage Stats
Conferences
Adsense
E-mail
Mailings
Devices
Outbound
Discovery
Harvesting
Evolution of tasks
Process integration
Workflow tools & apps
Community Building
Outbound discovery
Inbound discovery
Permissions
Automated ingestion and tagging
Rare and unpublished material
Human tagging
Republishing public domain
Simple, One database Search
Warehousing
Compiling Directories
Printing
Typesetting
Growing
Fading
Evolution of tasks
Process integration
Workflow tools & apps
Community Building
Commissioning?
Outbound discovery
Inbound discovery
Permissions
Editorial?
Automated ingestion and tagging
Rare and unpublished material
Human tagging
Licensing?
Republishing public domain
Simple, One database Search
Quality?
Warehousing
Compiling Directories
Selection?
Printing
Marketing?
Typesetting
Growing
Fading
4. Examples
Searchability
Make video searchable…
30 minutes of news = 12 double-spaced pages
5 minutes to read in depth
2 minutes to scan
Great functionality
Let it be embedded in courses
Annotation
Studio
Inbound discovery
Be of the web
Websites
Newspapers
Monographs
Music
Primary Works
Journals
Major Collections
Individual Titles
Library Branded Interface
Federated Search Engines
Embeddable Search Box
Make it accessible widely…
Indexing, discovery and analysis
The strain on keyword search…
Questions
• Google: Martin Luther King – 8.3m hits (2005), 32.5m (2012)
• Google Scholar: 202k hits, options to restrict:
• Article
• Legal document
• Date range (year published)
• Patent or Citation
‘Semantic’ Indexing
Word
Page
Chapter
Book or Volume
Series
Collection
‘Semantic’ indexing >
Who ?
What ?
When ?
Where ?
Increases in Utility
• Access – “Do you have the book titled…”
• Keyword Search – All mentions of ‘Star Wars’
• Fielded Search – All mentions of ‘Star Wars’ in texts about Reagan published in 1985
• Semantic Search – All mentions of ‘Star Wars’ by Reagan in speeches he delivered in 1985
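The difference between the search levels above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Passage` fields, the function names, and the four placeholder records are invented, and the passage texts are stand-ins rather than real quotations.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str        # placeholder wording, not a real quotation
    doc_type: str    # e.g. "speech" or "article"
    speaker: str     # who wrote or delivered the words
    subject: str     # who or what the text is about
    year: int


# Invented sample records, for illustration only.
PASSAGES = [
    Passage("placeholder review mentioning Star Wars", "article", "Critic A", "Cinema", 1985),
    Passage("placeholder report mentioning Star Wars", "article", "Journalist B", "Ronald Reagan", 1985),
    Passage("placeholder address mentioning Star Wars", "speech", "Ronald Reagan", "Defense policy", 1985),
    Passage("placeholder address mentioning Star Wars", "speech", "Ronald Reagan", "Defense policy", 1986),
]


def keyword_search(passages, phrase):
    """Every passage whose text contains the phrase."""
    return [p for p in passages if phrase.lower() in p.text.lower()]


def fielded_search(passages, phrase, subject, year):
    """Keyword hits restricted by document-level fields (subject, year)."""
    return [p for p in keyword_search(passages, phrase)
            if p.subject == subject and p.year == year]


def semantic_search(passages, phrase, speaker, doc_type, year):
    """Keyword hits restricted by who said the words, in what kind of text, and when."""
    return [p for p in keyword_search(passages, phrase)
            if p.speaker == speaker and p.doc_type == doc_type and p.year == year]


keyword_hits = keyword_search(PASSAGES, "Star Wars")
fielded_hits = fielded_search(PASSAGES, "Star Wars", "Ronald Reagan", 1985)
semantic_hits = semantic_search(PASSAGES, "Star Wars", "Ronald Reagan", "speech", 1985)
print(len(keyword_hits), len(fielded_hits), len(semantic_hits))
```

The keyword search returns every record, while the fielded and semantic searches each isolate a different single record: one is a text about Reagan, the other a speech by him.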
What is Semantic Indexing?
• Identify and divide texts into content elements
(e.g. letter, diary entry…)
• Identify key concepts for these elements
(e.g. authors, sources, battles, encounters…)
• Index both elements and associated concepts
• Integrate to form a cohesive whole
• Unique ways of browsing through concepts
• Unique ways to ask questions
Semantic Indexing…
• Document – Text; Author ID; Encounter ID; Source ID; Date; Subject; Age writing; etc.
• Encounter – Encounter Name; Cultural Groups; Estimated # of people; Start year, month, day; Location; Expedition; Encounter Type; Fatalities; etc.
• Author – Name; Date of birth; Place of birth; Date of death; Place of death; Nationality; Religion; Sexual Orientation; Occupation; etc.
• Source – Source; Editor/Translator; Original Language; Publisher; Publication Date; Publication Place; Subject of Work; etc.
“Show me writings by Jesuits, originally written in French, that discuss trade involving the Huron.”
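A minimal sketch of how linked records like these could answer the sample question. It assumes a small subset of the fields listed above, invented placeholder records, and a hypothetical `find_documents` helper; it is not a description of the actual Alexander Street database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Author:
    author_id: int
    name: str
    occupation: str


@dataclass
class Source:
    source_id: int
    title: str
    original_language: str


@dataclass
class Encounter:
    encounter_id: int
    name: str
    cultural_groups: List[str]


@dataclass
class Document:
    text: str
    author_id: int
    encounter_id: int
    source_id: int
    subjects: List[str]


def find_documents(documents, authors, sources, encounters,
                   occupation, language, subject, group):
    """Follow each document's IDs to its linked records and filter on all four."""
    authors_by_id = {a.author_id: a for a in authors}
    sources_by_id = {s.source_id: s for s in sources}
    encounters_by_id = {e.encounter_id: e for e in encounters}
    hits = []
    for doc in documents:
        author = authors_by_id[doc.author_id]
        source = sources_by_id[doc.source_id]
        encounter = encounters_by_id[doc.encounter_id]
        if (occupation in author.occupation
                and source.original_language == language
                and subject in doc.subjects
                and group in encounter.cultural_groups):
            hits.append(doc)
    return hits


# Invented placeholder records, for illustration only.
authors = [Author(1, "Author A", "Jesuit missionary"), Author(2, "Author B", "Fur trader")]
sources = [Source(10, "Source X", "French"), Source(11, "Source Y", "English")]
encounters = [Encounter(100, "Encounter P", ["Huron", "French"]),
              Encounter(101, "Encounter Q", ["Iroquois", "Dutch"])]
documents = [
    Document("placeholder text 1", 1, 100, 10, ["trade", "religion"]),
    Document("placeholder text 2", 1, 101, 10, ["trade"]),
    Document("placeholder text 3", 2, 100, 11, ["trade"]),
    Document("placeholder text 4", 1, 100, 10, ["religion"]),
]

results = find_documents(documents, authors, sources, encounters,
                         occupation="Jesuit", language="French",
                         subject="trade", group="Huron")
print([d.text for d in results])
```

The question spans four record types at once, which is why a keyword search over the document text alone could not answer it.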
Early Encounters in North America
Fauna and Flora
Geophysical, Natural Phenomena
Peoples
Personal & Cultural Events
Encounter database
Specific entry points
for American Indian
Studies…
Encounter database
Early Encounters in North America
Early Encounters in North America
Semantic Indexing…
• More than a way to answer questions
• A framework by which users can be guided to
understand, explore, discover and learn.
• A route-map to guide users through data - saving time and effort.
• The intellectual fabric by which information should be
organized…
• Delivers answers that cannot be asked elsewhere
• Discipline specific
• Oriented towards the user and the content
• At the ‘right’ level
• Thoroughly controlled
• Metadata should be open
Outbound discovery
Higher value linkages…
[Figure: quadrant chart. Axes: Loosely integrated to Tightly integrated; Tightly Held to Loosely Held. Labels: Free Websites; Refuse to License; License widely and be a Licensor; License widely]
Higher value linkages…
• Higher value links
• Semantic indexing and keyword
searching of more than 3,000 oral
history collections.
• Represents the personal histories of
some 300,000 people.
• Value:
– Context
– Selection
– Search Power
– Licensed material
– Integration
Context and Selection
Search Power
Organized Results
Building the network…
Unhelpful
• Legal warnings not to link
• Changing links constantly
• Disabling links
• No permanent URLs
• No crawling
• Randomly changing URLs
• Insisting on one interface and one access point
• Unattached pages
Helpful
• Visibility
• Permanent URLs
• RSS feeds
• OpenURL, Open Metadata
• Design for multiple interfaces
• Open to crawling
• Published open APIs
• Ask others to do the same
• Welcome linking
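One item in the helpful column, being open to crawling, can be checked mechanically. A minimal sketch using Python's standard `urllib.robotparser`; the two robots.txt bodies, the example URL and the `is_open_to_crawling` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_open_to_crawling(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt files: one welcomes crawlers, one blocks them all.
OPEN_SITE = "User-agent: *\nDisallow:\n"
CLOSED_SITE = "User-agent: *\nDisallow: /\n"
URL = "https://example.org/collection/item/1"

open_result = is_open_to_crawling(OPEN_SITE, URL)
closed_result = is_open_to_crawling(CLOSED_SITE, URL)
print(open_result, closed_result)
```

An empty `Disallow:` line permits everything, while `Disallow: /` is the "No crawling" case from the unhelpful column.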
5. Partnerships & Collaboration
Where will the £££ come from?
JSTOR – $52m revenues in 2010
American Memory
Women and Social Movements
• Collaboration with the Center for the
Historical Study of Women and Gender
at SUNY Binghamton and ASP
• Original site is free; new content is for fee.
• Usage across the free site dipped only
slightly – more usage following
commercial launch.
• Added video, audio, > 200k pages, new
functionality.
Summary
We’re engaged in a leviathan task
Money is needed
For-fee content can sit alongside open content
Publishers can help
Need for collaboration and openness
Where we’re headed…
• It will all be available in digital form
• It will not cost too much
• Many more people will use it
• It will be enriched through better display, better integration, better links, better context, etc.
Good for publishers
Good for academics
Good for “society”
www.alexanderstreet.com