Concepts of Multimedia
Processing and Transmission
IT 481, Lecture #1
Dennis McCaughey, Ph.D.
28 August, 2006
Outline







Course Description
Instructor
Student Survey
Exams, Homework and Project
Grading
General Policies
Lecture Schedule
IT 481, Fall 2006
2
08/28/2006
Course Description

Topics
– The fundamentals of signal and image
processing, including algorithms for signal
processing that have applications to multimedia
– Techniques for voice coding and recognition, CD
and DVD technology, streaming video, WANs
and LANs, and videoconferencing technology

Text: Multimedia Communication Systems: Techniques,
Standards, and Networks, K. R. Rao, Zoran S. Bojkovic,
Dragorad A. Milovanovic, Prentice Hall PTR; 1st edition (April
26, 2002), ISBN: 013031398X.
Instructor

Dennis McCaughey
– Contact Information




703-263-7425 (Office)
703-624-6830 (Cell)
[email protected] (e-mail)
Office Hours: one hour before class
– Background
PhD in EE, University of Southern California, 1977
– Thesis: Degrees of Freedom for Projection Imaging
Student Survey



Name
Contact Information
Last degree, along with current degree
objective, i.e.,
– Undergrad seeking Bachelor’s, Grad seeking
MS/PhD, Other

Mathematical Background
– Calculus?
– Differential Equations?
– Linear Algebra?
– Probability, Statistics, Random Processes?
Student Survey Cont’d

Systems Background
– Linear Systems?
– Signal Processing
– Image processing

Programming Languages
– C or C++?
– MATLAB?
Exams, Homework and Project

Mid-Term: 1 Hour Closed Book
– Covers the key topics from class and
homework



Final: Format “To Be Determined”
Homework: 1) Reading assignments, 2)
Written answers to selected questions
based on reading assignments, 3) Some
limited math problems
Project: Format (Preliminary): MATLAB
implementation of a multimedia processing
application.
More on the Project





A course project will be required exploring
aspects of multimedia signal processing,
which may be computer-based using MATLAB.
Project topics will be of the student’s choice
subject to review by the instructor.
Each student will also be required to present
a short briefing on the results.
Projects will be evaluated on the content of
the presentation and not on the briefing
itself.
Details regarding topics, content, and format
will be provided during the course.
Grading

The final grade will be determined by a weighted
average of the homework assignments, a mid-term exam, a final exam and a project
Homework   10%
Mid-Term   20%
Project    30%
Final      40%
General Policies

Collaboration
– Students are permitted and encouraged to collaborate on homework
assignments.
– All graded work, however, must be the original effort of the student
submitting the paper.

Homework
– Homework will be collected at the beginning of each class
period. Note: Late homework will be accepted provided the reason for
the delay is coordinated with the instructor within 2 days of its
assignment. Homework solutions will be discussed in class.

Make-up Exams
– Make-up exams will not be given unless detailed written clarification
accompanied by documentation for the absence is provided. If this
information is not provided an F grade will be given for the exam. The
location and time for a make-up exam will be decided by the instructor.
Also, students are expected to be in class and on-time for every class.
Lecture Schedule (Preliminary)
Week  Date   Topic
1     8/28   Lecture #1: Introduction to Multimedia Communications
2     9/11   Lecture #2: Networks and Multimedia Applications
3     9/18   Lecture #3: Signal Processing Fundamentals
4     9/25   Lecture #4: Audio Coding – MATLAB Tutorial
5     10/2   Lecture #5: Video Coding 1
6     10/9   Lecture #6: Video Coding 2 – Review
7     10/17  Mid-Term Exam & Project Review
8     10/30  Lecture #7: MPEG-1
9     11/6   Lecture #8: MPEG-2
10    11/13  Lecture #9: MPEG-4
11    11/20  Lecture #10: MPEG-4, MPEG-7, MPEG-21
12    11/27  Lecture #11: Audio and video streaming
13    12/4   Lecture #12:
14    12/11  Final Exam Review
15    12/18  Final Exam
Chapter (values as listed): 1, 2; 4; 3; 3; 3; 3; 1-4; 5; 5; 5; 6
Reading Homework (values as listed): 4; 3; 3; 3; 3; 5; 5; 5; 6; 6; 5-6
Multimedia Communications
What is Multimedia?

Multimedia is a combination of text, art,
sound, animation, and video.
Slide: Courtesy, Hung Nguyen
Multimedia Components Simplified

Multimedia can be viewed as the combination of audio,
video and data, and how they interact with the user (more than the
sum of the individual components)
[Figure: Audio, Video and Data combining into Multimedia]
Background




Fast-paced emergence of applications in
medicine, education, travel, etc.
Characterized by large documents that must
be communicated with short delays
Glamorous applications such as distance
learning and video teleconferencing
Applications that are enhanced by video are
often seen as the driver for development of
multimedia networks
Forces Driving Communications That
Facilitate Multimedia Communications






Evolution of communications and data
networks
Increasing availability of almost unlimited
bandwidth on demand
Availability of ubiquitous access to the
network
Ever increasing amount of memory and
computational power
Sophisticated terminals
Digitization of virtually everything
New Information System Paradigm
[Figure: a broadband link (multimedia integrated communication) integrated with a workstation or PC (multimedia processing)]
Slide: Courtesy, Hung Nguyen
Elements of Multimedia Systems

Two key communication modes
– Person-to-person: User Interface, Transport, User Interface
– Person-to-machine: Processing, Storage and Retrieval; Transport; User Interface
Slide: Courtesy, Hung Nguyen
Multimedia Networks




The world has been wrapped in copper and
glass fiber and can be viewed as a “hair
ball” with physical, wireless and satellite
entry/exit points.
Physical: LAN-WAN connections
Wireless: Cellular telephony, wireless PC
connectivity
Satellite: INMARSAT, Thuraya, ACeS, etc.
Multimedia Communication Model





Partitioning of information objects into
distinct types, e.g., text, audio, video
Standardization of service components per
information type
Creation of platforms at two levels – network
service and multimedia communication
Define general applications for multiple use
in various multimedia environments
Define specific applications, e.g. e-commerce, tele-training, … using building
blocks from platform and general
applications
Requirements

User Requirements
–
–
–
–

Fast preparation and presentation
Dynamic control of multimedia applications
Intelligent support to users
Standardization
Network Requirements
– High speed and variable bit rates
– Multiple virtual connections using the same
access
– Synchronization of different information types
– Suitable standardized services along with
support
Network Requirements


ATM-BISDN and SS7 have enabled the
switching-based communications
capabilities over the PSTN that support the
necessary services
ATM-BISDN-SS7 will evolve to all-optical
“switchless” networks based on packet
transfer
Packet Transfer Concept




Allows voice, video and data to be dealt with
in a common format
More flexible than circuit switching, which it
can emulate, while allowing the multiplexing
of variable-bit-rate data streams
Dynamic allocation of bandwidth
Handle Variable Bit Rate (VBR) directly
Considerations


Buffering required for constant bit rate data
such as audio
Re-sequencing and recovery capabilities
must be provided over networks where
packets may be received out of order
or dropped
– In an ATM network some packets can be
dropped while others cannot (e.g., voice vs. bank
transfer data packets)
– Optimum packet lengths for voice video and data
differ in an ATM network
– IP packets over the internet may arrive in a
different order or be dropped.
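The re-sequencing and loss handling described on this slide can be sketched as follows. This is a minimal illustration, not an ATM or IP implementation: the `resequence` function, the per-packet sequence number and the fixed `depth` limit are all assumptions made for the example.

```python
def resequence(packets, depth=3):
    """Reorder (seq, payload) packets; report sequence numbers given up as lost.

    `depth` is a hypothetical limit on how many out-of-order packets
    are held before the receiver stops waiting for a missing one.
    """
    buffer = {}          # seq -> payload, packets waiting for their turn
    expected = 0         # next sequence number to deliver
    delivered, lost = [], []

    for seq, payload in packets:
        if seq < expected:
            continue     # late or duplicate packet: already delivered or skipped
        buffer[seq] = payload
        # Deliver everything that is now in order.
        while expected in buffer:
            delivered.append(buffer.pop(expected))
            expected += 1
        # Too many packets waiting: declare the missing one lost and move on.
        while len(buffer) > depth:
            lost.append(expected)
            expected += 1
            while expected in buffer:
                delivered.append(buffer.pop(expected))
                expected += 1

    # End of stream: flush what remains, marking gaps as lost.
    for seq in sorted(buffer):
        lost.extend(range(expected, seq))
        delivered.append(buffer[seq])
        expected = seq + 1
    return delivered, lost
```

A larger `depth` tolerates more reordering at the price of more delay, which is the buffering trade-off for constant bit rate media such as audio.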
Digital Video Signal Transport
[Figure: Video → Encoder → Application → Network → Application → Decoder → Users]
Encoder: transformation, quantization, entropy coding, bit-rate control
Application and network: data structuring, re-synchronization, overhead, multiplexing/routing, error detection, loss detection, error correction (FEC), re-transmission, erasure correction
Decoder: de-quantization, entropy decoding, inverse transform, loss concealment, post-processing
Quality of Service (QoS)


The set of parameters that defines the
properties of media streams
Can define four QoS layers:
1. User QoS: Perception of the multimedia data at
the user interface (“qualitative”)
2. Application QoS: Parameters such as end-to-end delay (“quantitative”)
3. System QoS: Requirements on the
communications services derived from the
application QoS
4. Network QoS: Parameters such as network
load and performance
Audio-Visual Integration
Importance of Interaction



Multimedia is more than the combination of text,
audio, video and data
Interaction among media is important
Consider a poorly dubbed movie
– Audio not synchronized with video
– Lip movements inconsistent with language
– Audio dynamic range inconsistent with the scene
Slide: Courtesy, Hung Nguyen
Media Interaction

Process and Model
[Figure: Multimedia at the center, linked to Audio, Image, Video and Text. Labels on the diagram: compression, synthesis, 3D sound; lip synch, face animation, joint A/V coding; speech recognition, text-to-speech; translation, natural language; sign language, lip reading; compression, graphics, database indexing/retrieval]
Slide: Courtesy, Hung Nguyen
Bimodality of Human Speech

Human speech is produced by vibration of
the vocal cords and configuration of the vocal
tract, using muscles that also generate facial
expressions
Audio   Visual   Perceived
ba      ga       da
pa      ga       ta
ma      ga       na
Slide: Courtesy, Hung Nguyen
Basic Definitions


The basic unit of acoustic speech is called a
phoneme
In the visual domain, the basic unit of mouth
movement is called a viseme
– A viseme is the smallest visibly distinguishable
unit of speech
– Several phonemes can share one viseme, forming
a viseme group
– The mapping from phonemes to visemes is therefore
many-to-one
Slide: Courtesy, Hung Nguyen
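The many-to-one relationship between phonemes and visemes can be sketched as a lookup table. The two groups below are illustrative assumptions (bilabial and labiodental consonants look alike on the lips), not a table taken from the lecture or from any standard; `to_visemes` is a hypothetical helper.

```python
# Illustrative viseme groups (an assumption for this sketch, not a standard table):
# phonemes that look alike on the lips share one viseme label.
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
}

# Invert the groups into a many-to-one phoneme -> viseme lookup.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}

def to_visemes(phonemes):
    """Map a phoneme sequence to visemes; unknown phonemes map to 'other'."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]
```

Because the mapping is many-to-one it cannot be inverted: a viseme alone does not identify which phoneme was spoken, which is why lip reading needs additional context.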
Lip Reading System



Application to support hearing-impaired
persons
People learn to understand spoken
language by combining visual content with
lexical, syntactic, semantic and
pragmatic information
Automated lip reading systems
– Speech recognition possible using only visual
information
– Integrated with speech recognition systems to
improve accuracy
Slide: Courtesy, Hung Nguyen
Lip Synchronization

Applications
– In VTC (video teleconferencing), where video
frames are dropped (low bandwidth requirement)
but audio must still be continuous
– In non-real-time use such as studio dubbing,
where the recorded voice is full of background noise

Time-warping commonly used in both audio
and video modes
– Time-frequency analysis
– Video time-warping could be used for VTC
– Audio time-warping could be used for dubbing
Slide: Courtesy, Hung Nguyen
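The slide does not say which time-warping algorithm is meant. Assuming it refers to classic dynamic time warping (DTW), a minimal sketch of the alignment cost between two one-dimensional feature tracks could look like this; `dtw_distance` and the absolute-difference local cost are choices made for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # b's sample is held (b stretched)
                                 cost[i][j - 1],      # a's sample is held (a stretched)
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]
```

A zero cost means one track is a purely time-stretched copy of the other, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`.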
Lip Tracking




Used to prevent excessive jerkiness in motion
rendering and excessive loss of lip synchronization
Involves real-time analysis of the video signal in
three dimensions plus one temporal dimension
Produces meaningful parameters
– Classification of mouth images into visemes
– Measures of dimension, e.g. mouth widths and
heights
Analysis tools – Fourier Transform, Karhunen-Loève Transform (KLT), Probability Density
Function (pdf) Estimation
Slide: Courtesy, Hung Nguyen
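As an illustration of the KLT as an analysis tool, the sketch below finds the first principal direction of a set of mouth measurements and projects a sample onto it. It is a simplified, assumption-laden example: the (width, height) samples are hypothetical, power iteration stands in for a full eigen-decomposition, and the function names are invented for this sketch.

```python
def first_principal_component(samples, iterations=100):
    """Return (mean, unit direction) of largest variance via power iteration."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    centered = [[s[d] - mean[d] for d in range(dim)] for s in samples]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Fixed start vector; one exactly orthogonal to the principal direction
    # would not converge, which is acceptable for a sketch.
    v = [1.0 / (d + 1) for d in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    return mean, v

def project(sample, mean, direction):
    """Scalar KLT feature: coordinate of the sample along the direction."""
    return sum((s - m) * d for s, m, d in zip(sample, mean, direction))
```

For example, with hypothetical (width, height) samples `[(1, 2), (2, 4), (3, 6)]`, the two measurements are perfectly correlated, so a single projected value describes each mouth shape without loss.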
Audio-to-Visual Mapping for Lip
Tracking



Conversion of acoustic speech to mouth shape
parameters
A mapping of phonemes to visemes
Could be most precisely implemented with a
complete speech recognizer followed by a look-up
table
– High computational overhead plus table look-up complexity
– The spoken word does not need to be recognized to achieve audio-to-visual mapping

Physical relationships exist between vocal tract
shape and the sound produced, so functional
relationships exist between speech and visual
parameters
Slide: Courtesy, Hung Nguyen
Classification-Based Conversion
Approaches for Lip Tracking

Two-step process
– Classification of the acoustic signal using VQ
(vector quantization), HMM (hidden Markov
model) or NN (neural network)
– Mapping of the acoustic classes into
corresponding visual outputs, which are averaged to
obtain a centroid per class

Shortcomings
– Error resulting from averaging visual vector to
get visual centroid
– Not a continuous mapping – finite output levels
Slide: Courtesy, Hung Nguyen
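The two-step process can be sketched with vector quantization as the classifier. This is only an illustration under stated assumptions: the codebook is taken as given (training it is not shown), the feature vectors are hypothetical, and the function names are invented for the example.

```python
def nearest(codebook, vector):
    """Index of the codeword closest (squared Euclidean) to the vector."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, vector))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def train_visual_centroids(codebook, acoustic, visual):
    """Average the visual vectors of all training frames in each acoustic class."""
    sums = [[0.0] * len(visual[0]) for _ in codebook]
    counts = [0] * len(codebook)
    for a, v in zip(acoustic, visual):
        k = nearest(codebook, a)
        counts[k] += 1
        sums[k] = [s + x for s, x in zip(sums[k], v)]
    # Classes with no training frames keep a None centroid.
    return [[s / n for s in row] if n else None for row, n in zip(sums, counts)]

def audio_to_visual(codebook, centroids, acoustic_vector):
    """Step 1: classify the acoustic vector. Step 2: output the class centroid."""
    return centroids[nearest(codebook, acoustic_vector)]
```

Both shortcomings listed on the slide are visible here: every acoustic vector in a class produces the same averaged output, and the number of distinct outputs is limited to the number of codewords.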
Classification-Based Conversion
[Figure: classes in phoneme space mapped to centroids in viseme space]
Slide: Courtesy, Hung Nguyen
Audio and Visual Integration for Lip
Reading Applications

Three major steps
– Audio-visual pre-processing – Principal
Component Analysis (PCA) has been used for
feature extraction
– Pattern recognition strategy (HMM, NN, time-warping, …)
– Integration strategy (decision making)


Heuristic rules to incorporate knowledge of phonemes
about the two modalities
Combination of independent evaluation scores for each
modality
Slide: Courtesy, Hung Nguyen
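One simple way to combine independent per-modality scores is a weighted sum. The sketch below assumes each recognizer returns a score per candidate label; the `fuse_scores` name, the score dictionaries and the `audio_weight` parameter are hypothetical, and other integration rules are equally possible.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Weighted sum of per-class scores from two independent recognizers.

    `audio_weight` is a hypothetical reliability weight in [0, 1]; it would
    typically be lowered when the acoustic channel is noisy.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    fused = {}
    # Only labels scored by both modalities can be fused.
    for label in audio_scores.keys() & visual_scores.keys():
        fused[label] = (audio_weight * audio_scores[label]
                        + (1.0 - audio_weight) * visual_scores[label])
    best = max(fused, key=fused.get)
    return best, fused
```

With audio scores `{"ba": 0.6, "da": 0.4}` and visual scores `{"ba": 0.1, "da": 0.9}`, the decision shifts from "ba" to "da" as the audio weight is reduced, which mirrors relying more on lip reading in noise.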
Application in Biometrics – Bimodal
Person Verification

Existing methods for person verification are
mainly based on a single modality, which
limits their security and
robustness

Audio-visual integration using a camera and
microphone makes person verification
more reliable
Slide: Courtesy, Hung Nguyen
Joint Audio-Video Coding

Correlation between audio and video can be
used to achieve more efficient coding
– Predictive coding of audio and video information
used to construct estimate of current frame
(cross-modal redundancy)
– Difference between original and estimated signal
can be transmitted as parameters
– Decision on what and how to send is based on
Rate Distortion (R-D) criteria

Reconstruction done at receiver according
to agreed-upon decoding rules
Slide: Courtesy, Hung Nguyen
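The rate-distortion decision can be sketched for a single visual parameter. Everything specific here is an assumption for illustration: the uniform quantizer, the toy bit-count rate model, the squared-error distortion, the Lagrangian cost J = D + lam * R and the default values of `step` and `lam` are not taken from the lecture.

```python
def quantize(value, step):
    """Uniform quantizer: returns (integer index, reconstructed value)."""
    index = round(value / step)
    return index, index * step

def bits_for(index):
    """Toy rate model (an assumption): one sign bit plus magnitude bits."""
    return 1 + abs(index).bit_length()

def choose_mode(x, x_hat, step=0.5, lam=0.1):
    """Pick the option minimizing J = distortion + lam * rate.

    x is the measured visual parameter; x_hat is its estimate from audio.
    """
    # Option 1: send nothing; the decoder uses the audio-based estimate.
    options = {"nothing": ((x - x_hat) ** 2, 0, x_hat)}
    # Option 2: send the quantized prediction error x - x_hat.
    idx, err_q = quantize(x - x_hat, step)
    rec = x_hat + err_q
    options["residual"] = ((x - rec) ** 2, bits_for(idx), rec)
    # Option 3: send the quantized parameter itself.
    idx, rec = quantize(x, step)
    options["parameter"] = ((x - rec) ** 2, bits_for(idx), rec)

    def cost(mode):
        distortion, rate, _ = options[mode]
        return distortion + lam * rate

    best = min(options, key=cost)
    return best, options[best][2]
```

When the audio-based estimate is good, nothing is sent; when it is moderately off, the small residual is cheapest; when it is badly wrong, sending the parameter directly wins.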
Cross-Modal Predictive Coding
[Figure: Visual Analysis produces parameter X; A-to-V Mapping produces the estimate X̂; the Decision Module (R-D) sends one of: nothing, X − X̂, or parameter X]
Slide: Courtesy, Hung Nguyen
Applications of Multimedia

Business - Business applications for
multimedia include presentations, training,
marketing, advertising, product demos,
databases, catalogues, instant messaging,
and networked communication.

Schools - Educational software can be
developed to enrich the learning process.
Slide: Courtesy, Hung Nguyen
Applications of Multimedia

Home - Most multimedia projects reach the
homes via television sets or monitors with
built-in user inputs.

Public places - Multimedia will become
available at stand-alone terminals or kiosks
to provide information and help.
Slide: Courtesy, Hung Nguyen
Compact Disc Read-Only Memory (CD-ROM)



CD-ROM is the most cost-effective
distribution medium for multimedia projects.
It can contain up to 80 minutes of full-screen
video or sound.
CD burners are used for reading discs and
converting the discs to audio, video, and
data formats.
Slide: Courtesy, Hung Nguyen
Digital Versatile Disc (DVD)



Multilayered DVD technology increases the
capacity of current optical technology to about 17
GB (double-sided, dual-layer discs).
DVD authoring and integration software is
used to create interactive front-end menus
for films and games.
DVD burners are used for reading discs and
converting the disc to audio, video, and data
formats.
Slide: Courtesy, Hung Nguyen
Multimedia Communications

Multimedia communications is the delivery
of multimedia to the user by electronic or
digitally manipulated means.
[Figure: Multimedia Communications combines:]
– Audio Communications (telephony, sound, broadcast)
– Data, text, image Communications (data transfer, fax…)
– Video Communications (video telephony, TV/HDTV)
Slide: Courtesy, Hung Nguyen