Concepts of Multimedia
Processing and Transmission
IT 481, Lecture #1
Dennis McCaughey, Ph.D.
22 January, 2007
Outline
 Course Description
 Instructor
 Exams, Homework and Project
 Grading
 General Policies
 Lecture Schedule
IT 481, Spring
2
01/22/2007
Course Description

Topics
– The fundamentals of signal and image
processing, including algorithms for signal
processing that have applications to multimedia
– Techniques for voice coding and recognition, CD
and DVD technology, streaming video, WANs
and LANs, and videoconferencing technology

Text: Multimedia Communications: Applications, Networks,
Protocols and Standards, Fred Halsall, Addison-Wesley, 1st
edition (2002), ISBN: 0-201-39818-4.
Instructor

Dennis McCaughey
– Contact Information




703-263-7425 (Office)
703-624-6830 (Cell)
[email protected] (e-mail)
Office Hours: one hour before class
– Background
PhD in EE, University of Southern California, 1977
– Thesis: Degrees of Freedom for Projection Imaging
Exams, Homework and Project

Mid-Term: 1 hour, closed book
– Covers the key topics from class and
homework



Final: Format “To Be Determined”
Homework: 1) Reading assignments, 2)
Written answers to selected questions
based on reading assignments, 3) Some
limited math problems
Project: Format (Preliminary): MATLAB
implementations of selected multimedia
processing applications.
More on the Project





A course project will explore aspects of multimedia signal processing and will be computer based, using MATLAB.
Project topics will consist of a set of MATLAB implementations addressing multimedia concepts, assigned on a running basis over the semester.
Each student will be required to submit the project in the format of a final report.
The projects will be graded on the effort applied, not on MATLAB programming skills.
Details regarding topics, content, and format will be provided during the course.
Grading

The final grade will be determined by a weighted
average of the homework assignments, a midterm exam, a final exam and a project:
Homework   10%
Mid-Term   20%
Project    30%
Final      40%
General Policies

Collaboration
– Students are permitted and encouraged to collaborate on homework
assignments.
– All graded work, however, must be the original effort of the student
submitting the paper.

Homework
– Homework will be collected at the beginning of each class
period. Note: Late homework will be accepted provided the reason for
the delay is coordinated with the instructor within 2 days of its
assignment. Homework solutions will be discussed in class.

Make-up Exams
– Make-up exams will not be given unless a detailed written explanation,
accompanied by documentation for the absence, is provided. If this
information is not provided, an F grade will be given for the exam. The
location and time for a make-up exam will be decided by the instructor.
Also, students are expected to be in class and on time for every class.
Lecture Schedule (Preliminary)
Week  Date  Chapter  Topic
1     1/22  1        Lecture #1: Introduction to Multimedia Communications
2     1/29  None     Lecture #2: Signal Processing Fundamentals and Intro to Matlab
3     2/5   2        Lecture #3: Multimedia Information Representation
4     2/12  3        Lecture #4: Text Compression
5     2/19  3        Lecture #5: Image Compression
7     2/26  4        Lecture #6: Audio Compression
8     3/5   1-4      Mid-Term Exam & Project Review
9     3/12  None     Spring Break
10    3/19  4        Lecture #7: Video Compression
11    3/26  5        Lecture #8: Standards for Multimedia Communications
12    4/2   6        Lecture #9: Digital Communication Basics
13    4/9   11       Lecture #10: Entertainment Networks and High Speed Modems
14    4/16  TBS      Lecture #11: Data Privacy
15    4/23  TBS      Special Topics
16    4/30  1-6,11   Final Exam Review
      5/14           Final Exam 7:30pm
Reading Assignment / Homework: 1,2; 3; 3; 4; 4; 5; 6; 11; TBD; TBD; 1-6,11
Multimedia Communications
What is Multimedia?

Multimedia is a combination of text, art,
sound, animation, and video.
Slide: Courtesy, Hung Nguyen
Multimedia Components Simplified

Multimedia can be viewed as the combination of audio,
video, and data and how they interact with the user (more than the
sum of the individual components)
[Diagram: Audio, Video, and Data combined into Multimedia]
Background




Fast-paced emergence of applications in medicine, education, travel, etc.
Characterized by large documents that must be communicated with short delays
Glamorous applications such as distance learning and video teleconferencing
Applications that are enhanced by video are often seen as the driver for development of multimedia networks
Forces Driving Communications That
Facilitate Multimedia Communications






Evolution of communications and data
networks
Increasing availability of almost unlimited
bandwidth on demand
Availability of ubiquitous access to the
network
Ever increasing amount of memory and
computational power
Sophisticated terminals
Digitization of virtually everything
New Information System Paradigm
[Diagram labels: Broadband Link, Multimedia Integrated Communication, Integration, Workstation/PC, Multimedia Processing]
Slide: Courtesy, Hung Nguyen
Elements of Multimedia Systems

Two key communication modes
– Person-to-person
– Person-to-machine
[Diagram] Person-to-person: User Interface, Transport, User Interface
[Diagram] Person-to-machine: Processing, Storage and Retrieval; Transport; User Interface
Slide: Courtesy, Hung Nguyen
Multimedia Networks




The world has been wrapped in copper and
glass fiber and can be viewed as a “hair
ball” with physical, wireless and satellite
entry/exit points.
Physical: LAN-WAN connections
Wireless: Cellular telephony, wireless PC
connectivity
Satellite: INMARSAT, Thuraya, ACeS, etc.
Multimedia Communication Model





Partitioning of information objects into
distinct types, e.g., text, audio, video
Standardization of service components per
information type
Creation of platforms at two levels – network
service and multimedia communication
Define general applications for multiple use
in various multimedia environments
Define specific applications, e.g. e-commerce, tele-training, … using building
blocks from platform and general
applications
Requirements

User Requirements
– Fast preparation and presentation
– Dynamic control of multimedia applications
– Intelligent support to users
– Standardization
Network Requirements
– High speed and variable bit rates
– Multiple virtual connections using the same
access
– Synchronization of different information types
– Suitable standardized services along with
support
Network Requirements


ATM-BISDN and SS7 have enabled the switching-based communications capabilities over the PSTN that support the necessary services
ATM-BISDN-SS7 will evolve to all-optical “switchless” networks based on packet transfer
Packet Transfer Concept




Allows voice, video and data to be dealt with in a common format
More flexible than circuit switching, which it can emulate while allowing the multiplexing of streams with varied bit rates
Dynamic allocation of bandwidth
Handles Variable Bit Rate (VBR) traffic directly
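As a rough illustration of the "common format" idea, the Python sketch below splits voice, video and data byte streams into packets that share one header layout and interleaves them on a single link. The 4-byte header, the stream IDs and the payload size are assumptions made for this example; they are not taken from ATM or any other standard.

```python
import struct
from itertools import zip_longest

# Hypothetical header: stream id (1 byte), sequence number (2 bytes), payload length (1 byte)
HEADER = struct.Struct("!BHB")
PAYLOAD_SIZE = 8  # bytes per packet, chosen arbitrarily for the example


def packetize(stream_id, data):
    """Split one media stream into packets that all use the same header format."""
    packets = []
    for seq, start in enumerate(range(0, len(data), PAYLOAD_SIZE)):
        chunk = data[start:start + PAYLOAD_SIZE]
        packets.append(HEADER.pack(stream_id, seq, len(chunk)) + chunk)
    return packets


def multiplex(*packet_lists):
    """Interleave packets round-robin; a stream with more data contributes more packets."""
    link = []
    for group in zip_longest(*packet_lists):
        link.extend(p for p in group if p is not None)
    return link


def demultiplex(link):
    """Rebuild each stream from the shared link using only the packet headers."""
    streams = {}
    for packet in link:
        stream_id, seq, length = HEADER.unpack(packet[:HEADER.size])
        payload = packet[HEADER.size:HEADER.size + length]
        streams.setdefault(stream_id, []).append((seq, payload))
    return {sid: b"".join(chunk for _, chunk in sorted(parts))
            for sid, parts in streams.items()}


voice = b"v" * 10   # low-rate stream
video = b"V" * 30   # higher-rate stream
data = b"d" * 5     # short burst
link = multiplex(packetize(1, voice), packetize(2, video), packetize(3, data))
recovered = demultiplex(link)
```

Because every packet carries its own stream ID, the higher-rate video stream simply occupies more packets on the link; no fixed share of the link is reserved per stream, which is the sense in which bandwidth is allocated dynamically.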
Considerations


Buffering required for constant bit rate data such as audio
Re-sequencing and recovery capabilities must be provided over networks where packets may arrive in an order different from that transmitted, or be dropped
– In an ATM network some packets can be dropped while others may not (e.g. voice vs. bank transfer data packets)
– Optimum packet lengths for voice, video and data differ in an ATM network
– IP packets over the internet may arrive in a different order or be dropped.
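A minimal sketch of re-sequencing and loss detection, assuming each packet carries a sequence number (the tuple layout and the function names are hypothetical, chosen only for this example). Received packets are put back into transmission order, gaps are reported, and the recovery action depends on whether the traffic tolerates loss.

```python
def resequence(received, first_seq, last_seq):
    """Return payloads in transmission order plus the sequence numbers that never arrived."""
    by_seq = {}
    for seq, payload in received:
        by_seq.setdefault(seq, payload)  # ignore duplicates, keep the first copy
    ordered, missing = [], []
    for seq in range(first_seq, last_seq + 1):
        if seq in by_seq:
            ordered.append(by_seq[seq])
        else:
            missing.append(seq)
    return ordered, missing


def recovery_action(missing, loss_tolerant):
    """Assumed policy: conceal gaps in loss-tolerant media, retransmit everything else."""
    if not missing:
        return "none"
    return "conceal" if loss_tolerant else "retransmit"


arrivals = [(2, "c"), (0, "a"), (4, "e"), (1, "b")]  # packet 3 was dropped in transit
ordered, missing = resequence(arrivals, 0, 4)
voice_action = recovery_action(missing, loss_tolerant=True)
bank_action = recovery_action(missing, loss_tolerant=False)
```

The two calls at the end mirror the voice versus bank transfer contrast: a lost voice packet is usually hidden from the listener, while a lost data packet has to be sent again.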
Digital Video Signal Transport
The following figure will be examined over the course of the semester
[Figure: Video → Encoder → Application → Network → Application → Decoder → Users]
Encoder: Transformation, Quantization, Entropy Coding, Bit-Rate Control
Application (sending side): Data Structuring, Re-Synch, Overhead (FEC), Re-Trans
Network: Multiplexing/Routing
Application (receiving side): Error detection, Loss detection, Error correction, Erasure correction
Decoder: De-quantization, Entropy decode, Inv Trans, Loss conceal, Post process
Quality of Service (QoS)


The set of parameters that defines the
properties of media streams
Can define four QoS layers:
1. User QoS: Perception of the multimedia data at
the user interface (“qualitative”)
2. Application QoS: Parameters such as end-to-end delay (“quantitative”)
3. System QoS: Requirements on the
communications services derived from the
application QoS
4. Network QoS: Parameters such as network
load and performance
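To make the "quantitative" parameters concrete, here is a small sketch that derives two of them, mean end-to-end delay and delay variation (jitter), from per-packet timestamps. It assumes sender and receiver clocks are synchronized and that times are in milliseconds; the 150 ms bound is only an example threshold chosen for this sketch.

```python
MAX_DELAY_MS = 150  # example application-level bound, assumed for illustration


def delay_stats(send_times, recv_times):
    """Return (mean end-to-end delay, peak-to-peak jitter) for matched packets."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    mean_delay = sum(delays) / len(delays)
    jitter = max(delays) - min(delays)  # peak-to-peak delay variation
    return mean_delay, jitter


send = [0, 20, 40, 60]      # packets sent every 20 ms
recv = [50, 75, 85, 120]    # arrival times at the receiver
mean_delay, jitter = delay_stats(send, recv)
meets_bound = mean_delay <= MAX_DELAY_MS
```

Measurements like these sit at the application and network layers; whether the result is acceptable to a viewer or listener is the separate, qualitative user-level question.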
Applications of Multimedia

Business - Business applications for
multimedia include presentations, training,
marketing, advertising, product demos,
databases, catalogues, instant messaging,
and networked communication.

Schools - Educational software can be
developed to enrich the learning process.
Slide: Courtesy, Hung Nguyen
Applications of Multimedia

Home - Most multimedia projects reach the
homes via television sets or monitors with
built-in user inputs.

Public places - Multimedia will become
available at stand-alone terminals or kiosks
to provide information and help.
Slide: Courtesy, Hung Nguyen
Compact Disc Read-Only Memory (CD-ROM)



CD-ROM is the most cost-effective
distribution medium for multimedia projects.
It can contain up to 80 minutes of full-screen
video or sound.
CD burners are used for reading discs and
converting the discs to audio, video, and
data formats.
Slide: Courtesy, Hung Nguyen
Digital Versatile Disc (DVD)



Multilayered DVD technology increases the
capacity of current optical technology to 18
GB.
DVD authoring and integration software is
used to create interactive front-end menus
for films and games.
DVD burners are used for reading discs and
converting the disc to audio, video, and data
formats.
Slide: Courtesy, Hung Nguyen
Multimedia Communications

Multimedia communications is the delivery
of multimedia to the user by electronic or
digitally manipulated means.
[Diagram: three areas converge on Multimedia Communications]
Audio Communications (Telephony, Sound, Broadcast)
Data, Text, Image Communications (Data Transfer, Fax…)
Video Communications (Video Telephony, TV/HDTV)
Slide: Courtesy, Hung Nguyen
Multimedia Terms
Alternative Types of Media used in
Multimedia Applications
Multimedia Communications Networks
Multimedia Networks and Their Services
Multimedia Networks and Their Services
Audio-Visual Integration
Application in Biometrics – Bimodal
Person Verification

Existing methods for person verification are mainly based on a single modality, which limits security and robustness

Audio-visual integration using a camera and microphone makes person verification more reliable
Slide: Courtesy, Hung Nguyen
Joint Audio-Video Coding

Correlation between audio and video can be
used to achieve more efficient coding
– Predictive coding of audio and video information
used to construct estimate of current frame
(cross-modal redundancy)
– Difference between original and estimated signal
can be transmitted as parameters
– Decision on what and how to send is based on
Rate Distortion (R-D) criteria

Reconstruction done at receiver according
to agreed-upon decoding rules
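The decision step can be sketched as a Lagrangian rate-distortion choice: for each option the encoder estimates the distortion D and the bit cost R, then picks the option with the smallest J = D + λR. The sketch below works on a single scalar parameter; the three options (send nothing, send the quantized difference, send the quantized parameter itself), the step size, the bit budgets and λ are all assumptions made for illustration, not values from the slides.

```python
LAMBDA = 0.01        # Lagrange multiplier trading distortion against bits (assumed)
STEP = 0.5           # quantizer step shared by both coded options (assumed)
DIFFERENCE_BITS = 3  # small code: covers differences from -2.0 to 1.5
PARAMETER_BITS = 8   # larger code: covers the full parameter range


def quantize(value, step, bits):
    """Uniform quantizer with clipping to the index range a `bits`-bit code can hold."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    index = max(lo, min(hi, round(value / step)))
    return index * step


def choose_mode(x, x_hat, lam=LAMBDA):
    """Pick the option with the lowest cost J = D + lam * R; return (mode, reconstruction)."""
    candidates = {
        # decoder reuses the cross-modal estimate, nothing is transmitted
        "nothing": (x_hat, 0),
        # decoder adds the transmitted difference to its own estimate
        "difference": (x_hat + quantize(x - x_hat, STEP, DIFFERENCE_BITS), DIFFERENCE_BITS),
        # decoder ignores the estimate and uses the transmitted parameter
        "parameter": (quantize(x, STEP, PARAMETER_BITS), PARAMETER_BITS),
    }
    costs = {mode: (x - recon) ** 2 + lam * bits
             for mode, (recon, bits) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, candidates[best][0]


good_estimate = choose_mode(10.0, 10.1)   # estimate is nearly right
fair_estimate = choose_mode(10.0, 11.0)   # estimate is off by a small amount
poor_estimate = choose_mode(10.0, 20.0)   # estimate is far off
```

With these settings a good estimate costs no bits at all, a moderately wrong one is repaired with a cheap difference, and only a badly wrong one forces the parameter to be sent in full. The decoder can reproduce each reconstruction because it applies the same agreed-upon rules.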
Slide: Courtesy, Hung Nguyen
Cross-Modal Predictive Coding
[Diagram: Visual Analysis yields Parameter X; an A-to-V Mapping yields the estimate X̂; the Decision Module (R-D) chooses among sending Nothing, the difference between X and X̂, or Parameter X]
Slide: Courtesy, Hung Nguyen
Importance of Interaction
Multimedia is more than the combination of text, audio, video and data
Interaction among media is important
Consider a poorly dubbed movie
– Audio not synchronized with video
– Lip movements inconsistent with language
– Audio dynamic range inconsistent with the scene
Slide: Courtesy, Hung Nguyen
Media Interaction

[Diagram: interactions among Audio, Image/Video, and Text within Multimedia]
Labels: Process and Model; Compression, Synthesis, 3D Sound; Lip synch, Face Animation, Joint A/V Coding; Speech Recognition, Text-to-Speech; Translation, Natural language; Sign language, Lip reading; Compression, Graphics, Database indexing/retrieval
Slide: Courtesy, Hung Nguyen
Bimodality of Human Speech

Human speech is produced by vibration of the vocal cords and configuration of the vocal tract, using muscles that also generate facial expressions
Audio  +  Visual  →  Perceived
ba        ga         da
pa        ga         ta
ma        ga         na
Slide: Courtesy, Hung Nguyen
Basic Definitions


The basic unit of acoustic speech is called a
phoneme
In the visual domain, the basic unit of mouth movement is called a viseme
– A viseme is the smallest visibly distinguishable unit of speech
– One viseme can correspond to several phonemes, which thus form one viseme group
– The mapping between phonemes and visemes is many-to-one
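The many-to-one relationship can be pictured as a lookup table. The sketch below uses two small, illustrative viseme groups (sounds made with closed lips, and sounds made with the lower lip against the upper teeth); the group names and the table itself are assumptions for this example, and a real system would use a much fuller table.

```python
# Hypothetical viseme groups, for illustration only
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],   # lips pressed together
    "labiodental": ["f", "v"],     # lower lip against upper teeth
}

# Invert the groups: every phoneme points to exactly one viseme (many-to-one)
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a viewer would see."""
    return [PHONEME_TO_VISEME.get(p, "unknown") for p in phonemes]


seen = to_visemes(["b", "m", "f"])
```

Because several phonemes collapse onto one viseme, the mapping cannot be inverted: seeing a bilabial closure does not tell a lip reader whether "p", "b" or "m" was spoken.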
Slide: Courtesy, Hung Nguyen
Lip Reading System



Application to support hearing-impaired people
People learn to understand spoken language by combining visual content with lexical, syntactic, semantic and pragmatic information
Automated lip reading systems
– Speech recognition possible using only visual
information
– Integrated with speech recognition systems to
improve accuracy
Slide: Courtesy, Hung Nguyen
Lip Synchronization

Applications
– In VTC (video teleconferencing), where video frames are dropped (low-bandwidth requirement) but audio must still be continuous
– In non-real-time use, such as dubbing in a studio, where the recorded voice is full of background noise

Time-warping commonly used in both audio
and video modes
– Time-frequency analysis
– Video time-warping could be used for VTC
– Audio time-warping could be used for dubbing
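One common form of time-warping is dynamic time warping (DTW), which finds the alignment between two sequences that minimizes the accumulated mismatch. The slides do not say which variant is intended, so the sketch below is only an assumption: it aligns two one-dimensional feature tracks (for example, mouth opening per video frame against a loudness measure per audio frame), with made-up values.

```python
def dtw(a, b):
    """Return (total cost, alignment path) between sequences a and b using classic DTW."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local mismatch between the two frames
            cost[i][j] = d + min(cost[i - 1][j],       # a advances, b repeats
                                 cost[i][j - 1],       # b advances, a repeats
                                 cost[i - 1][j - 1])   # both advance
    # Walk back from the end to recover which frames were matched
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


reference = [0, 1, 2, 3, 2, 1]        # hypothetical feature track
slower = [0, 0, 1, 2, 3, 3, 2, 1]     # same contour, stretched in time
total, path = dtw(reference, slower)
```

The returned path lists which frame of one track lines up with which frame of the other, so either track can be stretched or compressed to follow it: the video in the VTC case, the audio in the dubbing case.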
Slide: Courtesy, Hung Nguyen
Lip Tracking




To prevent too much jerkiness in the motion rendering and too much loss in lip synchronization
Involves real-time analysis of the video signal in three dimensions plus one temporal dimension
Produces meaningful parameters
– Classification of mouth images into visemes
– Measures of dimension, e.g. mouth widths and heights
Analysis tools – Fourier Transform, Karhunen-Loeve Transform (KLT), Probability Density Function (pdf) Estimation
Slide: Courtesy, Hung Nguyen
Audio-to-Visual Mapping for Lip
Tracking



Conversion of acoustic speech to mouth shape parameters
A mapping of phonemes to visemes
Could be most precisely implemented with a complete speech recognizer followed by a look-up table
– High computational overhead plus table look-up complexity
– Recognizing the spoken words is not actually needed to achieve audio-to-visual mapping

Physical relationships exist between vocal tract shape and sound produced ⇒ functional relationships exist between speech and visual parameters
Slide: Courtesy, Hung Nguyen
Classification-Based Conversion
Approaches for Lip Tracking

Two-step process
– Classification of the acoustic signal using VQ (vector quantization), HMM (hidden Markov model) and NN (neural network)
– Mapping of the acoustic classes into corresponding visual outputs, which are averaged to get a centroid

Shortcomings
– Error resulting from averaging visual vectors to get the visual centroid
– Not a continuous mapping – finite output levels
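The two steps and both shortcomings can be seen in a small sketch that uses vector quantization for the classification step. The codebook, the training pairs and the choice of two-dimensional features (with mouth width and height as the visual parameters) are hypothetical values invented for this example.

```python
import math

# Hypothetical acoustic codebook: class label -> codeword
ACOUSTIC_CODEBOOK = {"A": (1.0, 1.0), "B": (5.0, 5.0)}

# Hypothetical training pairs: (acoustic feature vector, (mouth width, mouth height))
TRAINING = [
    ((0.9, 1.2), (4.0, 1.0)),
    ((1.1, 0.8), (4.4, 1.4)),
    ((5.2, 4.9), (2.0, 3.0)),
    ((4.7, 5.1), (2.4, 3.4)),
]


def classify(acoustic):
    """Step 1: vector quantization, i.e. pick the label of the nearest codeword."""
    return min(ACOUSTIC_CODEBOOK,
               key=lambda label: math.dist(acoustic, ACOUSTIC_CODEBOOK[label]))


def build_visual_centroids(training):
    """Step 2: average the visual vectors of all training pairs that fall in each class."""
    sums = {}
    for acoustic, visual in training:
        label = classify(acoustic)
        total, count = sums.get(label, ([0.0] * len(visual), 0))
        sums[label] = ([t + v for t, v in zip(total, visual)], count + 1)
    return {label: tuple(t / count for t in total)
            for label, (total, count) in sums.items()}


CENTROIDS = build_visual_centroids(TRAINING)


def audio_to_visual(acoustic):
    """Map an acoustic vector to the visual centroid of its class."""
    return CENTROIDS[classify(acoustic)]


first = audio_to_visual((0.5, 0.5))
second = audio_to_visual((1.5, 1.4))
```

Both inputs land in class "A" and therefore produce exactly the same mouth shape, which is the "finite output levels" limitation; and neither training mouth shape in that class equals the averaged centroid, which is the averaging error.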
Slide: Courtesy, Hung Nguyen
Classification-Based Conversion
[Diagram: regions of the Phoneme Space mapped to Centroids in the Viseme Space]
Slide: Courtesy, Hung Nguyen
Audio and Visual Integration for Lip
Reading Applications

Three major steps
– Audio-visual pre-processing – Principal Component Analysis (PCA) has been used for feature extraction
– Pattern recognition strategy (HMM, NN, time-warping…)
– Integration strategy (decision making)
Heuristic rules to incorporate knowledge of phonemes about the two modalities
Combination of independent evaluation scores for each modality
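The second integration option, combining independent scores, can be sketched as a weighted sum of per-class scores from an audio recognizer and a visual recognizer. The scores below are made-up log-likelihood-style numbers and the 0.7 audio weight is an arbitrary assumption; in practice the weight would be tuned, for example according to the acoustic noise level.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Weighted sum of per-class scores from two independently evaluated modalities."""
    fused = {
        label: audio_weight * audio_scores[label]
        + (1.0 - audio_weight) * visual_scores[label]
        for label in audio_scores
    }
    return max(fused, key=fused.get), fused


# Hypothetical scores (higher is better)
audio = {"ba": -1.0, "da": -1.1, "ga": -3.0}    # "ba" and "da" sound nearly alike
visual = {"ba": -4.0, "da": -1.0, "ga": -1.2}   # no lip closure was seen, so "ba" is unlikely

audio_only = max(audio, key=audio.get)
decision, fused = fuse_scores(audio, visual)
```

Here the audio score alone narrowly favours "ba", but the visual evidence rules it out, so the fused decision switches to "da". This is the kind of accuracy gain that motivates integrating the two modalities.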
Slide: Courtesy, Hung Nguyen