Digital Video Library
NoD and Multilingual Status Report
April 1998
Carnegie Mellon University
Howard D. Wactlar
Carnegie
Mellon
MLI and NoD Tasks
• Data collection & preparation - English, Serbo-Croatian, and German
• Multilingual speech recognition enhancements
• Video and audio segmentation
• Multilingual indexing, retrieval, search
• Summarization-on-demand
• Annotations
• User studies
• Additional languages and functionalities
• Demonstration as a network-based service
Accomplishments to Apr 98
We are achieving what we proposed and beyond
• Advances in capability (research => integrated function)
• Infrastructure evolution & growth
• Testbed activity and extension
• Related research and outreach
Accomplishments to Apr 98 (cont’d)
• Serbo-Croatian demonstration system
• Automated and dynamic abstraction and summarization for improved navigation
• Topic detection and assignment for subject browsing
• Dynamically improved speech recognition for index generation
• Coherent story segmentation through corpus-specific, rule-based analysis
more ...
Accomplishments to Apr 98 (cont’d)
• Video-OCR for improved name/face identification
• Multi-level annotations to mark and share commentary
• Web interface enabling “slide show” viewing over slow links
• Database restructuring to enable size growth and function evolution
• Remote testbeds with access to daily updated news
Automated Abstraction and Summarization
• Critical to efficient navigation of video
• Improved automatic title generation
• Dynamic “poster frame” icons - query based
• Skims smoothed through enhanced language models and rule-based scene selection
“Naïve” Poster Frame Result List
(Uses First Shot Image)
Query-based Poster Frame Result List
Query-based Poster Frame Selection Process
1. Decompose video segment into shots.
2. Compute representative frame for each shot.
3. Locate query scoring words (shown by arrows).
4. Use frame from highest scoring shot.
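The four steps above can be sketched roughly as follows. This is a minimal illustration, not the Informedia implementation: the `Shot` record, the `pick_poster_frame` name, and the scoring rule (a plain count of query words spoken during the shot) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical shot record: one entry per shot in a video segment."""
    start: float        # shot start time in seconds
    end: float          # shot end time in seconds
    key_frame: str      # identifier of the shot's representative frame
    words: list         # transcript words spoken during the shot


def pick_poster_frame(shots, query_terms):
    """Return the key frame of the shot whose words best match the query."""
    terms = {t.lower() for t in query_terms}

    def score(shot):
        # Assumed scoring rule: count transcript words that are query terms
        return sum(1 for w in shot.words if w.lower() in terms)

    best = max(shots, key=score)
    # With no matching words, fall back to the first shot (the "naive" choice)
    return best.key_frame if score(best) > 0 else shots[0].key_frame
```

A segment whose second shot mentions "election" twice would return that shot's key frame for the query "election", and the first shot's key frame for a query that matches nothing.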
Topic Detection and Tracking
Enhances browsing and discovery over directed search
Methods from several areas are being evaluated:
• Information retrieval
  - vector space methods
  - relevance feedback
• Speech recognition
  - hidden Markov models
• Statistics
  - k-nearest neighbors
  - exponential models
KNN-based Topic Detection
• Build training index with pre-labeled topics
  - 45,000 Broadcast News stories from 1995 and 1996
  - 3,178 different news topics occurring > 10 times
• Search for top 10 related stories in training index
• Look up topics for related stories
• Re-weight topics by story relevance (select top 5)
• At 5 topics: Recall = 0.491, Relevance = 0.482
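The KNN procedure above might look like the sketch below. It is illustrative only: the slide does not say how story similarity is computed, so bag-of-words cosine similarity is an assumption, as are the function names and the choice to weight each topic by the summed similarity of the neighbours carrying it.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_topics(story_text, training, k=10, n_topics=5):
    """training: list of (text, [topic, ...]) pairs with pre-labeled topics."""
    query = Counter(story_text.lower().split())
    # Score every training story against the new story
    scored = [(cosine(query, Counter(text.lower().split())), topics)
              for text, topics in training]
    # Keep the k most similar stories
    neighbours = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # Re-weight each topic by the similarity of the stories that carry it
    weights = defaultdict(float)
    for sim, topics in neighbours:
        for topic in topics:
            weights[topic] += sim
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, w in ranked[:n_topics] if w > 0]
```

The defaults `k=10` and `n_topics=5` mirror the "top 10 related stories" and "select top 5" figures on the slide.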
Speech Recognition for Index Generation
• Integrate closed captioning with speech recognition generated transcription
• Improve accuracy by automatic daily expansion of language model from closed captioning, e.g. “Dodi Fayed”
• Participated (with Claritech) in TREC Spoken Document track
  – large text retrieval evaluation benchmarks (NIST/DARPA)
  – scored second due to out-of-vocabulary (OOV) words (CIA, well-known, torched)
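One simple reading of the daily expansion from closed captioning is that words seen in the captions but missing from the recognizer vocabulary are added to it. The sketch below shows only that vocabulary step under that assumption; the function name is hypothetical, and re-estimating language model probabilities and generating pronunciations are not covered.

```python
def expand_vocabulary(vocabulary, caption_text):
    """Return (updated vocabulary, newly added words) for one day's captions.

    vocabulary is a set of lowercase words; the input set is not modified.
    """
    # Crude tokenization: split on whitespace and trim common punctuation
    tokens = {w.strip('.,!?"').lower() for w in caption_text.split()}
    new_words = tokens - vocabulary - {""}
    return vocabulary | new_words, new_words
```

With a vocabulary containing "princess", "diana", and "and", the caption "Princess Diana and Dodi Fayed." would contribute the two new words "dodi" and "fayed".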
Segmentation - Creating the Video Paragraph
Break up a video stream into semantically coherent pieces
• corpus-specific analysis
• language model approaches
• video structure analysis
Segmentation - Commercial Detection
Look for several potential indicators in multiple passes
• lapses in closed-caption (cc) capture longer than some threshold
• occurrence of black frames
• rate of scene change and motion
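Two of the indicators above, caption lapses and black frames, can be combined as in the sketch below. It is a simplified illustration under stated assumptions: frames are reduced to a mean luminance value each, the thresholds (`min_gap`, `threshold`) are arbitrary placeholders, and the scene-change and motion rate indicator is left out.

```python
def black_frame_times(mean_luma, fps, threshold=16.0):
    """Times (s) of frames whose mean luminance falls below threshold."""
    return [i / fps for i, y in enumerate(mean_luma) if y < threshold]


def caption_gaps(caption_times, min_gap=10.0):
    """Intervals between consecutive caption timestamps longer than min_gap."""
    return [(a, b) for a, b in zip(caption_times, caption_times[1:])
            if b - a > min_gap]


def candidate_commercials(mean_luma, fps, caption_times,
                          min_gap=10.0, threshold=16.0):
    """Caption gaps that also contain at least one black frame."""
    blacks = black_frame_times(mean_luma, fps, threshold)
    return [(a, b) for a, b in caption_gaps(caption_times, min_gap)
            if any(a <= t <= b for t in blacks)]
```

Requiring both signals is one possible way to combine passes; a real system could just as well weight them or use them in sequence.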
Ad Removal based on Black Frame and Scene Change Detection
[Figure: timeline comparing truth and hypothesis ad segments, with scene-change and black-frame tracks]
Segmentation - Language Models
Novel application to find shift in topic within a document
• Adaptive exponential language models improve as they see more material from the current topic, e.g., probable distance of “managed care” to “physicians”
• Static language models are pre-computed likelihoods of short-range adjacency (e.g. trigrams)
• Compare the predictive performance of the two models, i.e., the probability each assigns to the next observed words
• A segment boundary is likely to exist when the adaptive model shows a dip in performance relative to the short-range model
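The comparison above can be illustrated with a toy stand-in. This is not the adaptive exponential model from the slide: here a cache-interpolated unigram plays the role of the adaptive model and an add-one-smoothed unigram plays the role of the static short-range model. The function name and the `window` and `lam` parameters are assumptions for the example.

```python
import math
from collections import Counter, deque


def log_ratio_trace(words, background, window=50, lam=0.5):
    """Per-word log(P_adaptive / P_static); requires 0 <= lam < 1."""
    total = sum(background.values())
    vocab = len(background) + 1  # one extra slot for unseen words
    cache = deque()              # most recent `window` words
    counts = Counter()           # word counts inside the cache
    trace = []
    for w in words:
        # Static model: add-one-smoothed unigram from background counts
        p_static = (background.get(w, 0) + 1) / (total + vocab)
        # Adaptive model: static model mixed with recent-history frequency
        p_cache = counts[w] / len(cache) if cache else 0.0
        p_adapt = (1 - lam) * p_static + lam * p_cache
        trace.append(math.log(p_adapt / p_static))
        cache.append(w)
        counts[w] += 1
        if len(cache) > window:
            counts[cache.popleft()] -= 1
    return trace
```

Within a topic the trace stays positive because recent words recur; at a topic shift the cache stops helping and the trace drops below zero, which is the dip the last bullet describes.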
[Figure: plot of the ratio of the two language models as a function of the relative position in a segment. Horizontal axis: relative position, -500 to 500; vertical axis: -0.05 to 0.25.]
Video OCR
Image component crucial to news corpus
Capture of text overlaid on the video image
Detected, filtered, OCR’d, incorporated into content and indexed
Video OCR Block Diagram
Video → Text Area Detection → Text Area Preprocessing → Commercial OCR → ASCII Text
[Figure panels: Video Frames; Filtered Frames; AND-ed Frames (1/2 s intervals)]
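The AND-ing of filtered frames can be sketched as below. The slide does not specify the filter, so a plain brightness threshold is used here as an assumption, on the premise that overlaid caption text is bright and stays put while the background changes between frames sampled 1/2 s apart. Frames are nested lists of grayscale values and the function names are hypothetical.

```python
def binarize(frame, threshold=200):
    """Mark bright pixels (candidate caption text) as 1, all others as 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in frame]


def and_frames(frames, threshold=200):
    """Pixel-wise AND of binarized frames.

    Only pixels that are bright in every frame survive, which suppresses
    moving background while keeping static overlaid text.
    """
    masks = [binarize(f, threshold) for f in frames]
    result = masks[0]
    for mask in masks[1:]:
        result = [[a & b for a, b in zip(r1, r2)]
                  for r1, r2 in zip(result, mask)]
    return result
```

A bright pixel that appears in only one of the sampled frames is dropped, which is also why text that moves or fades quickly can be missed.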
Text Detection False Alarms
[Figure panels: Video Frame; Filtered and AND-ed Frame]
Text Detection Misses
[Figure panels: Video Frame; Filtered and AND-ed Frame]
Challenges for VOCR Preprocessing
• The resolution of video text is very low (<10×10 ppc).
• Text detection and extraction are complicated by complex backgrounds.
VOCR Preprocessing Problems
Video OCR - Results
Character recognition - 83%
Word recognition - 70%
Language model post-processing will improve word recognition rate, but new names and places will not be in the language model
Important adjunct to Name-It: name/face correlation through co-occurrence matrices
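A name/face co-occurrence matrix can be illustrated with a small counting sketch. This is not the Name-It algorithm, whose details are not given here; it only shows the general idea of counting how often a name (from transcript or Video OCR text) and a face appear in the same segment. The function names and the segment representation are assumptions.

```python
from collections import Counter
from itertools import product


def cooccurrence(segments):
    """segments: list of (names_in_text, face_ids_in_video), one per segment.

    Returns a Counter keyed by (name, face_id) pairs.
    """
    counts = Counter()
    for names, faces in segments:
        # Count each name/face pair at most once per segment
        for pair in product(set(names), set(faces)):
            counts[pair] += 1
    return counts


def best_face_for(name, counts):
    """Face that co-occurs most often with the name, or None if unseen."""
    candidates = {face: c for (n, face), c in counts.items() if n == name}
    return max(candidates, key=candidates.get) if candidates else None
```

Names recovered by Video OCR would simply be added to `names_in_text` for the segment in which the overlay appears.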
Annotations
Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)
Annotations are included in the index (searchable separately or combined with transcript)
Personal annotations are typed or spoken comments that are established on a per-user basis
• bookmarking or commentary
• fully indexed and searchable with other data
Web Interface
Long-standing concern about video fidelity over the Internet
Compromise is a slide show of high-quality JPEG images and continuous audio
Not all navigation tools translate directly
Required substantive change in interface specification
Browsing improved over full video interface
User effectiveness versus full video to be explored
Infrastructure Evolution and Growth
Conversion of underlying database architecture (ONGOING)
• extends functionality
  - e.g. date filtering => “What’s new?” query
• improved interoperability
  - fully distributed, replicated function
• increased scale
• negative impact on query performance (improving)
Summer-long ruggedization program for reliable processing and quality control
900 hours on-line, terabyte data store
12 Alphas for parallel processing (and experiments)
Testbeds
Corpus
• CNN data: 620 hours + 12 hrs/wk
  Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health
Distant high speed network access
• Informedia-Net attached to both vBNS and AAI nets
• enables attachment of clients to CMU servers from selected locations
• clients at DARPA, SPAWAR (forthcoming), NSA
Serbo-Croatian LVCSR in the Dictation and Broadcast News Domains
• Informedia (English)
  – CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)
  – CMU Sphinx Group
• Multilingual Speech Recognition
  – CMU/UKA Interactive Systems Labs - JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)
• Translation/Cross Language Retrieval
  – CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)
Serbo-Croatian Broadcast News Recognition
• Initial database: GlobalPhone Serbo-Croatian (UKA)
• Broadcast news: Collected by satellite from Germany (UKA)
• 15 hours transcribed
• Janus recognition toolkit: 15 languages
• Janus applied to Serbo-Croatian broadcast news
• Problem: Morphology, large number of inflections
• Competitive performance already: 26% WER
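For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is the standard textbook computation, not the scoring tool used for the 26% figure; the function name is an assumption and the reference must be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Highly inflected languages tend to raise WER under this measure, since a wrong inflection counts as a full word substitution.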
Broadcast News System
Vocabulary Growth Per Broadcast
[Chart: vocabulary size in words (axis 0 to 25,000) against news broadcasts 1 to 21]
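A vocabulary growth curve of this kind can be computed by counting distinct word forms cumulatively over successive broadcasts. The sketch below assumes whitespace-tokenized, lowercased transcripts and a hypothetical function name; it says nothing about how the chart's actual data were prepared.

```python
def vocabulary_growth(broadcasts):
    """Cumulative number of distinct word forms after each transcript."""
    seen = set()
    growth = []
    for transcript in broadcasts:
        seen.update(transcript.lower().split())
        growth.append(len(seen))
    return growth
```

Because every inflected form counts as its own entry, a morphologically rich language keeps adding new forms with each broadcast, which is the problem noted for Serbo-Croatian.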
Broadcast News System
Serbo-Croatian BN Speech Performance
[Chart: WER [%] (axis 0 to 80) by month: August, September, October, December, January. Plotted values: 73.6, 43.6, 36.0, 29.5, 26.0. Chart annotations: Language Normalization; Hypothesis Driven Lexicon Adaptation.]
Proposed National Research Data Testbed
Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents
Potential for establishing on-line public domain video archive
• e.g. all government produced video for training and public information
• fully indexed and searchable
Project Genoa Contributions
• Code to extract video to place in a CIP
• Processing changes to index I-frames
• Code to run Web browser to play the MPEG segment
• Working towards a generic Web-based interface
• Other CMU: Meeting browser
• Full access to client but not full source code
Data Source Picture
[Diagram: data sources, servers, and clients with classification labels (Unclassified, Secret, Pseudo-TS/SCI). Labels in the diagram: DARPA TIE (Arlington, VA); JTF Planner; CIP Server; BWD; OSIS (U); DIA (Wash, DC); SAIC (San Diego, CA); CrisisBrowse Client; CMU Informedia Server (Pittsburgh, PA), dial-in; CMU Network Neighborhood; Informedia CrisisBrowse Client; Server (NOD) with mass storage; Internet/WWW (U) over http; World Energy Database (U), Access; SpIKE/Visage/NOD?; Netscape; JEDS; Starlight; HPKB (U); MDITDS (S) and MIDB (S), Sybase; CIA Factbook (U); JANES (U); CIA Intelink-S (Langley, VA); SIPRNET; DISN LES; several databases marked “DB?”. Formats shown: mpeg, jpeg, txt, html.]
Future Plans - Near Term
• Complete full-function Web interface
• Foreign language system unification
• S-C language models for improved query and selection
• S-C segmentation
• System completeness, robustness
• Should we pursue?
  – Regular capture & processing
  – Delivery to testbeds
Future Plans - Long Term
• NSA’s formal evaluation will help guide modifications and new features
• Other languages - Korean? Chinese?
• Translation? Translation tools?
• Named entity extraction: people, places, faces
• Geospatial correlation and visualization
• More content and multiple sources
• Multidocument summarization