ENEE631 Spring’09
Lecture 17 (4/6/2009)
MPEG Video Coding and Beyond
Spring ’09 Instructor: Min Wu
Electrical and Computer Engineering Department,
University of Maryland, College Park
 bb.eng.umd.edu (select ENEE631 S’09)
 [email protected]
Overview and Logistics

Last Time:
– Block-matching and application to hybrid video coding


Exploit spatial redundancy via transform coding: e.g. block DCT coding
Exploit temporal redundancy via predictive coding: ME/MC
– MPEG-1 video coding standard

Today:
– Finish MPEG-1 Discussion
– Other coding considerations/standards: H.26x, MPEG-2, MPEG-4, etc.
– Geometric transform of images

Assign#4 on video and motion estimation – posted online
Review: DCT + ME/MC for Hybrid Video Coding


“Hybrid” ~ combined transform coding & predictive coding
Spatial redundancy removal
– Use DCT-based transform coding for reference frame

Temporal redundancy removal
– Use motion-based predictive coding for next frames


estimate motion and use reference frame to predict
only encode MV & prediction residue (“motion compensation residue”)
(From Princeton EE330 S’01 by B.Liu)
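To make the block-matching step concrete, here is a minimal full-search sketch (assuming 16x16 macroblocks, a SAD cost, and a small ±7-pel search range; the function names and array layout are illustrative, not taken from any standard):

```python
import numpy as np

def block_match(ref, cur, top, left, block=16, search=7):
    """Full-search block matching: find the motion vector (dy, dx) that
    minimizes the sum of absolute differences (SAD) between the current
    block and candidate blocks in the reference frame."""
    H, W = ref.shape
    cur_blk = cur[top:top+block, left:left+block].astype(np.int32)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + block > H or c + block > W:
                continue
            cand = ref[r:r+block, c:c+block].astype(np.int32)
            sad = np.abs(cur_blk - cand).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

def mc_residue(ref, cur, block=16, search=7):
    """Motion-compensate the frame block by block; return the motion field
    and the prediction residue (what the DCT coder would then encode)."""
    H, W = cur.shape
    residue = np.zeros_like(cur, dtype=np.int32)
    mvs = {}
    for top in range(0, H - block + 1, block):
        for left in range(0, W - block + 1, block):
            (dy, dx), _ = block_match(ref, cur, top, left, block, search)
            pred = ref[top+dy:top+dy+block, left+dx:left+dx+block].astype(np.int32)
            residue[top:top+block, left:left+block] = \
                cur[top:top+block, left:left+block] - pred
            mvs[(top, left)] = (dy, dx)
    return mvs, residue
```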
(From R.Liu’s Handbook Fig.2.18)
Review: Hybrid MC-DCT Video Encoder & Decoder
• Intra-frame: encoded without prediction
• Inter-frame: predictively encoded => use quantized frames as reference for residue
Review: Additional Issues in Hybrid Video Coding

Not all regions are easily inferable from previous frame
– Occlusion ~ solvable by backward prediction using future frames as ref.
– Adaptively decide using prediction or not

Drifting and error propagation
Solution: Encode reference regions or frames from time to time (“intra coding”)

Random access: e.g. want to get 95th frame
Solution: Encode frame without prediction from time to time

How to allocate bits?
– Based on visual model and statistics: JPEG-like quantiz.steps; entropy coding
– Consider constant or variable bit-rate requirement

Constant-bit-rate (CBR) vs. Variable-bit-rate (VBR)
=> Wrap up all solutions ~ MPEG-like codec
Review: MPEG-1 Video Coding Standard

Standard only specifies decoders’ capabilities
– Prefer simple decoding and not limit encoder’s complexity
– Leave flexibility and competition in implementing encoder

Block-based hybrid coding (DCT + M.C.)
– 8x8 block size as basic coding unit
– 16x16 “macroblock” size for motion estimation/compensation

Group-of-Picture (GOP) structure with 3 types of frames
– Intra coded
– Forward-predictively coded
– Bidirectional-predictively coded
MPEG-1 Picture Types and Group-of-Pictures

A Group-of-Picture (GOP) contains 3 types of frames (I/P/B)

Frame order
I1 BBB P1 BBB P2 BBB I2 …

Coding order
I1 P1 BBB P2 BBB I2 BBB …
(From R.Liu Handbook Fig.3.13)
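A small sketch of the display-to-coding reordering implied by the two orders above (the GOP pattern string and function name are illustrative): each I/P anchor must be coded before the B frames that sit between it and the previous anchor.

```python
def coding_order(display_order):
    """Reorder a GOP given in display order (e.g. 'IBBBPBBBPBBBI')
    so that every B frame follows both of its I/P references."""
    out, pending_b = [], []
    for i, f in enumerate(display_order):
        if f in ('I', 'P'):
            out.append((f, i))       # anchor frame is coded first
            out.extend(pending_b)    # then the B frames that reference it
            pending_b = []
        else:                        # 'B'
            pending_b.append((f, i))
    out.extend(pending_b)            # trailing Bs would need the next anchor
    return out

# Display order I1 B B B P1 B B B P2 ...  ->  coding order I1 P1 B B B P2 B B B ...
print(coding_order("IBBBPBBBPBBBI"))
```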
“Adaptive” Predictive Coding in MPEG-1

Half-pel M.V. search within +/-64 pel range
– Use spatial differential coding on M.V. to remove M.V. spatial redundancy

Coding each block in P-frame
– Predictive block: using previous I/P frame as reference
– Intra-block ~ encode without prediction
    use this if prediction costs more bits than non-prediction
    good for occluded areas
    can also avoid error propagation

Coding each block in B-frame
– Intra-block ~ encode without prediction
– Predictive block
    use previous I/P frame as reference (forward prediction),
    or use future I/P frame as reference (backward prediction),
    or use both for prediction and take the average
(Fig. from Ken Lam – HK Poly Univ. short course in summer 2001)
Coding of B-frame (cont’d)
[Figure: block A in the previous frame, block B in the current frame, block C in the future frame]

B = A  → forward prediction (one motion vector)
B = C  → backward prediction (one motion vector)
or B = (A + C)/2  → interpolation (two motion vectors)
Revised from R.Liu Seminar Course ’00 @ UMD
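A minimal sketch of the per-block B-frame mode decision (choosing among forward, backward, and interpolated prediction by smallest SAD; the helper name and the assumption that the motion-compensated blocks A and C are already available are mine, not from the standard):

```python
import numpy as np

def best_b_prediction(cur_blk, fwd_blk, bwd_blk):
    """Pick the B-frame prediction mode with the smallest SAD.
    fwd_blk: motion-compensated block A from the previous I/P frame,
    bwd_blk: motion-compensated block C from the future I/P frame."""
    A = fwd_blk.astype(np.int32)
    C = bwd_blk.astype(np.int32)
    B = cur_blk.astype(np.int32)
    candidates = {
        "forward": A,                 # B ~ A, one motion vector
        "backward": C,                # B ~ C, one motion vector
        "interpolated": (A + C) // 2, # B ~ (A + C)/2, two motion vectors
    }
    sads = {m: np.abs(B - p).sum() for m, p in candidates.items()}
    mode = min(sads, key=sads.get)
    return mode, B - candidates[mode]   # chosen mode plus residue to encode
```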
Quantization for I-frame (I-block) & M.C. Residues

Quantizer for I-frame (I-block)
– Different step size for different freq. band (similar to JPEG)
– Default quantization table
– Scale the table for different compression-quality

Quantizer for residues in predictive block
– Noise-like residue
– Similar variance in different frequency band
=> Assign same quantization step size for each frequency band
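A small sketch contrasting the two quantizer designs (the intra table values, the flat step of 16, and the function names are placeholders, not the MPEG-1 defaults):

```python
import numpy as np

def quantize_intra(dct_block, table, qscale=1.0):
    """I-block: frequency-dependent step sizes (JPEG-like), scaled for quality."""
    return np.round(dct_block / (table * qscale)).astype(np.int32)

def quantize_residue(dct_block, step=16):
    """Predicted block: the MC residue is noise-like with similar variance in
    every band, so a single (flat) step size is used for all coefficients."""
    return np.round(dct_block / step).astype(np.int32)

# Placeholder intra table: step size grows with frequency (not the standard table)
intra_table = 8 + 2 * np.add.outer(np.arange(8), np.arange(8))
```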
Adjusting Quantizer

For smoothing out bit rate
– Some applications prefer an approximately constant-bit-rate (CBR) video stream,
  e.g., a prescribed number of bits per second
    very-short-term bit-rate variations can be smoothed by a buffer
    variations can't be too large over the longer term,
    otherwise buffer overflow, delay and jitter in playback
– Need to assign larger step sizes for complex and high-motion frames

For reducing bit rate by exploiting HVS temporal properties
– Noise/distortion in a video frame is much less visible right after a sharp temporal transition (scene change)
    can compress a few frames right after a scene change with fewer bits

Alternative bit-rate adjustment tool ~ frame type
– IIIIII…          lowest compression ratio (like motion-JPEG)
– IPP…P IPP…       moderate compression ratio
– IBBPBBPBBI…      highest compression ratio
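A toy sketch of buffer-driven quantizer adjustment for a roughly constant bit rate (the buffer thresholds, step updates, and qscale range are illustrative; actual MPEG rate control, e.g. Test Model 5, is considerably more elaborate):

```python
def update_qscale(buffer_fullness, buffer_size, qscale, q_min=1, q_max=31):
    """Raise the quantizer scale when the encoder buffer fills up (frames are
    costing too many bits) and lower it when the buffer drains."""
    occupancy = buffer_fullness / buffer_size
    if occupancy > 0.8:        # close to overflow -> coarser quantization
        qscale = min(q_max, qscale + 2)
    elif occupancy < 0.2:      # buffer nearly empty -> can spend more bits
        qscale = max(q_min, qscale - 1)
    return qscale

def simulate_cbr(frame_bits, bits_per_frame, buffer_size):
    """Drain the buffer at the channel rate each frame and adapt qscale."""
    fullness, qscale, trace = buffer_size // 2, 8, []
    for bits in frame_bits:                       # bits produced per coded frame
        fullness = max(0, fullness + bits - bits_per_frame)
        qscale = update_qscale(fullness, buffer_size, qscale)
        trace.append((fullness, qscale))
    return trace
```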
Color Transformation

RGB → YUV color coordinates:

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.2990 & 0.5870 & 0.1140 \\ -0.1687 & -0.3313 & 0.5000 \\ 0.5000 & -0.4187 & -0.0813 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
U/V chrominance components are downsampled in coding
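A sketch of the conversion plus 4:2:0 chroma downsampling (plain 2x2 averaging stands in for the downsampling filter; standards permit other filters):

```python
import numpy as np

RGB2YUV = np.array([[ 0.2990,  0.5870,  0.1140],
                    [-0.1687, -0.3313,  0.5000],
                    [ 0.5000, -0.4187, -0.0813]])

def rgb_to_yuv420(rgb):
    """rgb: H x W x 3 array. Returns full-resolution Y and 2x-downsampled U, V."""
    yuv = rgb.astype(np.float64) @ RGB2YUV.T
    y, u, v = yuv[..., 0], yuv[..., 1], yuv[..., 2]

    def down2(c):
        # 4:2:0 downsampling: average each 2x2 neighborhood of the chroma plane
        H, W = c.shape
        return c[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).mean(axis=(1, 3))

    return y, down2(u), down2(v)
```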
Video Coding Summary: Performance Tradeoff
From R.Liu’s Handbook Fig.1.2: “mos” ~ 5-pt mean opinion scale of bad, poor, fair, good, excellent
About Compression Ratio

Raw video
– 24 bits/pixel x (720 x 480 pixels) x 30 fps = 249 Mbps

Potential “cheating” points => contributing ~ 4:1 inflation
– Color components are actually downsampled
– 30 fps may refer to field rate in MPEG-2 ~ equiv. to 15 fps
– ( 8 x 720 x 480 + 16 x 720 x 480 / 4 ) x 15 fps = 62 Mbps
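The arithmetic on this slide, spelled out:

```python
# Raw 4:4:4 video at the full frame rate
raw_bps = 24 * 720 * 480 * 30                 # = 248,832,000 ~ 249 Mbps

# After the "cheating" points: 4:2:0 chroma (8 bits luma + 2 x 8 bits chroma
# at quarter resolution per pixel) and 15 fps instead of 30 fps
effective_bps = (8 * 720 * 480 + 16 * 720 * 480 // 4) * 15   # = 62,208,000 ~ 62 Mbps

print(raw_bps / 1e6, effective_bps / 1e6, raw_bps / effective_bps)  # ratio ~ 4:1
```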
Other Standards and Considerations for Digital Video Coding
H.26x for Video Telephony

Remote face-to-face communication: A dream for years
– H.26x series ~ video coding targeting low bit rates
    through ISDN or a regular analog telephone line ~ on the order of 64 kbps
    need roughly symmetric complexity at encoder and decoder

H.261 (early 1990s)
– Similar to a simplified MPEG-1 ~ block-based DCT/MC hybrid coder
– Integer-pel motion compensation with I/P frames only ~ no B frames
– Restricted picture size/fps formats and M.V. range

H.263 (mid 1990s) and H.263+/H.263++ (late 1990s)
– Support half-pel motion compensation & many options for improvement

H.264 (latest, 2001-): also known as H.26L / JVT / MPEG4 part10
– Hybrid coding framework with many advanced techniques
– Focusing on greatly improving compression ratio at a cost of complexity

allow smaller block size; more choices on ref; advanced entropy coding, etc.
(Table from Gonzalez-Woods 3/e, Table 8.11)
MPEG-2

Extend from MPEG-1

Target at high-resolution high-bit-rate applications
– Digital video broadcasting, HDTV, …; also used for DVD

Support interlaced video
– Frame pictures vs. field pictures
– New prediction modes for motion compensation of interlaced video
    use previously encoded fields to do M.E./M.C.

Support scalability
From Wang’s book preprint Fig. 13.17
Scalability in Video Codecs

Scalability: provide different quality in a single stream
– Stack up more bits on base layer to provide improved quality

Possible ways for achieving scalabilities
– SNR Scalability ~ Multiple–quality video services

Basic vs. premium quality
– Spatial Scalability ~ Multiple-dimension displays

Display on PDA vs. PC vs. Super-resolution display
– Temporal Scalability ~ Multiple frame rates
– Frequency Scalability ~ Blurred version to sharp, detailed version

Layered coding concept facilitates:
– Unequal error protection
– Different needs from customers
– Efficient use of resources
– Multiple services
SNR Scalability
Two layers with the same spatio-temporal resolution but different qualities

[Block diagram: the input video is coded by the base-layer encoder into the base-layer bitstream; the base-layer decoder's reconstruction is subtracted from the input, and the difference is coded by the enhancement-layer encoder into the enhancement-layer bitstream; a multiplexer combines both layers into the output bitstream]
From R.Liu Seminar Course @ UMCP
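A minimal transform-domain sketch of the two-layer idea (coarse and fine uniform quantizers stand in for the base- and enhancement-layer encoders; the step sizes are illustrative):

```python
import numpy as np

def snr_scalable_encode(coeffs, base_step=32, enh_step=4):
    """Base layer: coarsely quantized coefficients. Enhancement layer: the
    re-quantized difference between the original and the base reconstruction."""
    base = np.round(coeffs / base_step).astype(np.int32)
    base_rec = base * base_step
    enh = np.round((coeffs - base_rec) / enh_step).astype(np.int32)
    return base, enh

def snr_scalable_decode(base, enh=None, base_step=32, enh_step=4):
    """Decode base quality alone, or add the enhancement layer for higher SNR."""
    rec = base * base_step
    if enh is not None:
        rec = rec + enh * enh_step
    return rec
```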
Spatial Scalability
Two layers with different spatial resolution
[Block diagram: the input video is down-sampled and coded by the base-layer encoder into the base-layer bitstream; the decoded base layer is up-sampled and subtracted from the full-resolution input, and the difference is coded by the enhancement-layer encoder into the enhancement-layer bitstream; a multiplexer combines both layers into the output bitstream]
From R.Liu Seminar Course @ UMCP
MPEG-4

Many functionalities targeting a variety of applications

Introduced object-based coding strategy
– For better support of interactive applications & graphics/animation video
– Require encoder to perform object segmentation


difficult for general applications
Introduced error resilient coding techniques
– “Streaming video profile” for wireless multimedia applications

Part 10 converged into H.264/AVC (Advanced Video Coding)
– Focused on improving compression ratio and error resilience
– Stick with Hybrid Coding framework
Revised from R.Liu Seminar Course @ UMCP
Object-based Coding in MPEG-4

Interactive functionalities

Higher compression efficiency by separately handling:
– Moving objects
– Unchanged background
– New regions
– M.C.-failure regions
=> “Sprite” encoding

Object segmentation needed (not easy)
– Based on color, motion, edge, texture, etc.
– Possible for targeted applications
(Table from Wang’s book preprint, Table 1.3)
MPEG-7

(Figure from MPEG-7 Document N4031, March 2001)
“Multimedia Content Description Interface”
– Not a video coding/compression standard like previous MPEG
– Emphasis on how to describe the video content for efficient indexing, search, and retrieval

Standardize the description mechanism of content
– Descriptor, Description Scheme, Description Definition Languages

Employ XML type of description language
– Example of MPEG-7 visual descriptors: Color, Texture, Shape, …
Summary of Today’s Lecture

MPEG-1 video coding standard

Other coding considerations and standards
– H.26x, MPEG-2, MPEG-4, MPEG-7, etc.

Geometric transform of images ~ more in next lecture

Readings:
– Gonzalez’s 3/e book 8.2.9, 8.1.7; 2.6.5 (geometric transform)
– Liu’s book on video coding (see course website)


Chapter 2 “Motion-Compensated DCT Video Coding”
Chapter 3 “Video Coding Standards”
– Other reference: Wang’s textbook

Chapter 13 (video standards); Chapter 1 (video basics)
Geometric Relations and Manipulations of Images
Useful to characterize:
- global camera motion in video;
- the relation between two images of similar scenes taken at different times or from slightly different viewpoints
=> “image registration”
Rotation, Translation, and Scaling

R.S.T. of an image object
– Original pixel location (x, y) → new location (x', y')

Translation by (t_x, t_y):
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

Rotation by θ around the origin, counter-clockwise (preserves length and angle):
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Scaling by s_x and s_y:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
Uniform scaling (s_x = s_y): preserves angle and shape
Differential scaling (s_x ≠ s_y)
Rotation, Translation, and Scaling (cont’d)

Rotation and translation of image coordinates
– Note the relations with rotation and translation of image objects
Translate the origin to (t_x, t_y):
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

Rotate the axes by θ, counter-clockwise:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
Implementation Issues of Geometric Transform

Forward transform
– Index mapping from the input image to the output image
    What if most values obtained for the output image fall at fractional coordinate indices?

Reverse transform
– Map integer indices of the output image back to the input image
    Get values of the input image at fractional indices through interpolation

[Figure: a mapped location (p', q') falls between the four input pixels (p, q), (p, q+1), (p+1, q), (p+1, q+1); the fractional offsets a and b are used for (bilinear) interpolation]
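A sketch of the reverse-mapping approach with bilinear interpolation (assuming the geometric transform is given as a 3x3 homogeneous matrix, as introduced below; out-of-range output pixels are simply left at zero):

```python
import numpy as np

def warp_inverse(img, M, out_shape):
    """For each integer output pixel (x', y'), map back through M^{-1} to a
    (generally fractional) input location and bilinearly interpolate."""
    H, W = img.shape
    Minv = np.linalg.inv(M)
    out = np.zeros(out_shape, dtype=np.float64)
    for yo in range(out_shape[0]):
        for xo in range(out_shape[1]):
            x, y, w = Minv @ np.array([xo, yo, 1.0])
            x, y = x / w, y / w
            p, q = int(np.floor(y)), int(np.floor(x))   # top-left corner (p, q)
            a, b = y - p, x - q                          # fractional offsets
            if 0 <= p < H - 1 and 0 <= q < W - 1:
                out[yo, xo] = ((1-a)*(1-b)*img[p, q]   + (1-a)*b*img[p, q+1] +
                               a*(1-b)*img[p+1, q]     + a*b*img[p+1, q+1])
    return out
```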
Exercise: express R.S.T. and reflection in homogeneous coordinates
2-D Homogeneous Coordinate

Describe R.S.T. transform by P' = M P + T
– Need to calculate intermediate coordinate values for successive transforms

Homogeneous coordinates
– Allow R.S.T. to be represented by matrix multiplication operations
    successive transforms can be calculated by combining the transform matrices
– Cartesian point (x, y) ↔ homogeneous representation (sx, sy, s)
    represents the same pixel location for every nonzero parameter s; often use s = 1

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim \begin{bmatrix} sx' \\ sy' \\ s \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

The name: an equation f(x, y) = 0 becomes a homogeneous equation in (sx', sy', s), in the sense that the common factor s in the three parameters can be factored out of the equation.
R.S.T. in Homogeneous Coordinates
Translation:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Rotation (by θ, counter-clockwise):
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Scaling:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Successive R.S.T.
– Left multiply the basic transform matrices
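A small sketch of composing R.S.T. in homogeneous coordinates (the particular transform order and test point are arbitrary):

```python
import numpy as np

def T(tx, ty):  # translation
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(theta):   # counter-clockwise rotation about the origin
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def S(sx, sy):  # scaling
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Composite transform: translate, then rotate, then scale.
# Each new step is left-multiplied, so the rightmost matrix acts first.
M = S(2, 2) @ R(np.pi / 4) @ T(1, 0)

p = np.array([1.0, 1.0, 1.0])   # the point (1, 1) in homogeneous form
print(M @ p)                    # transformed point (x', y', 1)
```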
Reflection

Reflect about the x-axis, the y-axis, and the origin:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Reflect about y = x and y = -x:
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Reflect about a general line y=ax+b
Combination of translate-rotate => reflect => inverse rotate-translate
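A sketch of that composition (the helper names are mine; the standard reflect-about-the-x-axis matrix is used in the middle):

```python
import numpy as np

def T(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

REFLECT_X = np.diag([1.0, -1.0, 1.0])   # reflection about the x-axis

def reflect_about_line(a, b):
    """Reflection about y = a*x + b: translate the line's intercept (0, b) to the
    origin, rotate the line onto the x-axis, reflect about the x-axis, then
    undo the rotation and translation."""
    theta = np.arctan(a)                # slope angle of the line
    return T(0, b) @ R(theta) @ REFLECT_X @ R(-theta) @ T(0, -b)

# Sanity check: a point on the line is its own reflection
p = np.array([1.0, 2.0 * 1.0 + 3.0, 1.0])   # point (1, 5) on y = 2x + 3
print(reflect_about_line(2, 3) @ p)          # ~ (1, 5, 1)
```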
Shear

Shear ~ a transformation that distorts the shape
– Causes opposite layers of the object to slide over each other

Shear relative to the x-axis:
$$\begin{bmatrix} 1 & sh_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

[Figure: with sh_x = 2, the unit square is sheared so that (0, 1) maps to (2, 1) and (1, 1) maps to (3, 1), while points on the x-axis such as (1, 0) stay fixed]

Extend to shears relative to other reference lines
General Composite Transforms

Combined R.S.T.
– {a_ij} is determined by the R.S.T. parameters

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Rigid-body transform
– Only involve translations and rotations
– 2x2 rotation submatrix is orthogonal


row vectors are orthonormal
Extension to 3-D homogeneous coordinate
– ( sX, sY, sZ, s ) with 4x4 transformation matrices
General Composite Transforms (cont’d)

Affine transforms ~ 6 parameters
– Can be expressed as a composition of R.S.T., reflection, and shear

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
– Parallel lines are transformed as parallel lines

Projective transforms ~ 8 parameters
– Cover more general geometric transformations between 2 planes
    widely used in computer vision (e.g., image mosaicing, synthesized views)
– Two unique phenomena:
    Chirping: increase in perceived spatial frequency as distance to the camera increases
    Converging/keystone effect: parallel lines appear closer and merge in the distance
$$\begin{bmatrix} sx' \\ sy' \\ s \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ c_1 & c_2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad w = [x, y]^T, \quad w_{new} = [x', y']^T = \frac{A\,w + b}{c^T w + 1}$$
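A small sketch of applying a projective transform to a point via the homogeneous form above (the 8 parameter values below are placeholders):

```python
import numpy as np

def apply_projective(H, x, y):
    """Map (x, y) through the 3x3 projective matrix H and de-homogenize,
    i.e. w_new = (A w + b) / (c^T w + 1)."""
    sx, sy, s = H @ np.array([x, y, 1.0])
    return sx / s, sy / s

# Example: an affine part plus a small perspective term (c1, c2)
H = np.array([[1.0,  0.1, 5.0],
              [0.0,  1.2, 2.0],
              [1e-3, 2e-3, 1.0]])
print(apply_projective(H, 100.0, 50.0))
```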
Effects of Various Geometric Mappings
From Wang’s Book Preprint Fig.5.18
Higher-order Nonlinear Spatial Warping

Analogous to “rubber sheet stretching”
– Forward and reverse mapping of pixels’ coordinate indices: (x, y) ↔ (x', y')

Polynomial warping
– Extend affine transform to higher-order polynomial mapping
– 2nd-order warping
$$x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$$
$$y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2$$

Spatial distortion in imaging system (lens)
– Pincushion and Barrel distortion
Example of 2nd-order Polynomial Spatial Warping
From P. Ramadge’s PU EE488 F’00
Illustration of Geometric Distortion
From P. Ramadge’s PU EE488 F’00
Compensating Spatial Distortion in Imaging

Control points – establishing correspondence
– Coordinates before and after distortion are known

Fit into the polynomial warping model (x, y) => (x', y'):
$$x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$$
$$y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2$$
– Minimize the sum of squared errors between the set of warped control points and the polynomial estimates:
$$\mathbf{x}' = [x'_1, x'_2, \ldots, x'_M]^T, \qquad Z = \begin{bmatrix} 1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\ 1 & x_2 & y_2 & \ldots & & \end{bmatrix}$$
$$E = (\mathbf{x}' - Z\mathbf{a})^T(\mathbf{x}' - Z\mathbf{a}) + (\mathbf{y}' - Z\mathbf{b})^T(\mathbf{y}' - Z\mathbf{b})$$
$$\partial E / \partial \mathbf{a} = 0 \;\Rightarrow\; \mathbf{x}' = Z\mathbf{a}$$
– Least-squares estimates: solution expressed by the generalized inverse of Z
$$\mathbf{a} = Z^{\dagger}\mathbf{x}' = (Z^T Z)^{-1} Z^T \mathbf{x}', \qquad \mathbf{b} = Z^{\dagger}\mathbf{y}'$$

Higher-order approximation
– A 2nd-order polynomial usually suffices for many applications
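A sketch of the least-squares fit from control-point pairs (np.linalg.lstsq stands in for the generalized inverse; the control-point arrays are assumed given):

```python
import numpy as np

def fit_poly_warp(src, dst):
    """src, dst: M x 2 arrays of control points (x, y) and (x', y').
    Returns the coefficient vectors a, b of the 2nd-order polynomial warp."""
    x, y = src[:, 0], src[:, 1]
    Z = np.column_stack([np.ones_like(x), x, y, x**2, x*y, y**2])  # M x 6 design matrix
    a, *_ = np.linalg.lstsq(Z, dst[:, 0], rcond=None)   # a = Z^+ x'
    b, *_ = np.linalg.lstsq(Z, dst[:, 1], rcond=None)   # b = Z^+ y'
    return a, b

def apply_poly_warp(a, b, x, y):
    """Warp a single coordinate (x, y) with the fitted polynomial model."""
    z = np.array([1.0, x, y, x*x, x*y, y*y])
    return z @ a, z @ b
```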
Example of Image Registration
(Figure from Gonzalez-Woods 3/e online book resource)
From R.Liu Seminar Course ’00 @ UMCP
Generations of Video Coding