ENEE631 Spring’09 Lecture 17 (4/6/2009)
MPEG Video Coding and Beyond
Instructor: Min Wu, Electrical and Computer Engineering Department, University of Maryland, College Park
bb.eng.umd.edu (select ENEE631 S’09), minwu@eng.umd.edu
M. Wu: ENEE631 Digital Image Processing (Spring'09)

Overview and Logistics
UMCP ENEE631 Slides (created by M.Wu © 2004)
Last time:
  – Block matching and its application to hybrid video coding
      exploit spatial redundancy via transform coding, e.g. block DCT coding
      exploit temporal redundancy via predictive coding: ME/MC
  – MPEG-1 video coding standard
Today:
  – Finish MPEG-1 discussion
  – Other coding considerations/standards: H.26x, MPEG-2, MPEG-4, etc.
  – Geometric transform of images
Assignment #4 on video and motion estimation is posted online

Review: DCT + ME/MC for Hybrid Video Coding
UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
(From Princeton EE330 S’01 by B.Liu)
“Hybrid” ~ combined transform coding & predictive coding
Spatial redundancy removal
  – Use DCT-based transform coding for the reference frame
Temporal redundancy removal
  – Use motion-based predictive coding for the next frames
      estimate motion and use the reference frame to predict
      only encode the M.V. & prediction residue (“motion compensation residue”)

Review: Hybrid MC-DCT Video Encoder & Decoder
(From R.Liu’s Handbook Fig.2.18)
  – Intra-frame: encoded without prediction
  – Inter-frame: predictively encoded => use quantized frames as reference for the residue

Review: Additional Issues in Hybrid Video Coding
Not all regions are easily inferable from the previous frame
  – Occlusion ~ solvable by backward prediction using future frames as reference
  – Adaptively decide whether or not to use prediction
Drifting and error propagation
  – Solution: encode reference regions or frames from time to time (“intra coding”)
Random access, e.g. want to get the 95th frame
  – Solution: encode a frame without prediction from time to time
How to allocate bits?
  – Based on visual model and statistics: JPEG-like quantization steps; entropy coding
  – Consider constant or variable bit-rate requirement: constant bit rate (CBR) vs. variable bit rate (VBR)
Wrap up all solutions ~ MPEG-like codec

Review: MPEG-1 Video Coding Standard
Standard only specifies decoders’ capabilities
  – Prefer simple decoding and do not limit the encoder’s complexity
  – Leave flexibility and competition in implementing the encoder
Block-based hybrid coding (DCT + M.C.)
  – 8x8 block size as basic coding unit
  – 16x16 “macroblock” size for motion estimation/compensation
Group-of-Pictures (GOP) structure with 3 types of frames
  – Intra coded (I)
  – Forward-predictively coded (P)
  – Bidirectional-predictively coded (B)

MPEG-1 Picture Types and Group-of-Pictures
(From R.Liu Handbook Fig.3.13)
A Group-of-Pictures (GOP) contains 3 types of frames (I/P/B)
  Frame order:  I1 BBB P1 BBB P2 BBB I2 …
  Coding order: I1 P1 BBB P2 BBB I2 BBB …

“Adaptive” Predictive Coding in MPEG-1
Half-pel M.V. search within +/-64 pel range
  – Use spatial differential coding on M.V. to remove M.V.
    spatial redundancy
Coding each block in P-frame
  – Predictive block: use previous I/P frame as reference
  – Intra-block ~ encode without prediction
      use this if prediction costs more bits than non-prediction
      good for occluded areas
      can also avoid error propagation
Coding each block in B-frame
  – Intra-block ~ encode without prediction
  – Predictive block
      use previous I/P frame as reference (forward prediction),
      or use future I/P frame as reference (backward prediction),
      or use both for prediction and take the average

Coding of B-frame (cont’d)
(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)
[Figure: block B in the current frame is predicted from block A in the previous frame and/or block C in the future frame]
  – B = A: forward prediction (one motion vector)
  – B = C: backward prediction (one motion vector)
  – B = (A+C)/2: interpolation (two motion vectors)

Quantization for I-frame (I-block) & M.C. Residues
(Revised from R.Liu Seminar Course ’00 @ UMD)
Quantizer for I-frame (I-block)
  – Different step size for different frequency bands (similar to JPEG)
  – Default quantization table
  – Scale the table for different compression-quality tradeoffs
Quantizer for residues in predictive block
  – Noise-like residue
  – Similar variance in different frequency bands
    => assign the same quantization step size to each frequency band

Adjusting Quantizer
UMCP ENEE631 Slides (created by M.Wu © 2001)
For smoothing out bit rate
  – Some applications prefer an approximately constant-bit-rate (CBR) video stream
      e.g., prescribe # bits per second
      very-short-term bit-rate variations can be smoothed by a buffer
      variations can’t be too large over the longer term, otherwise
      buffer overflow, delay, and jitter in playback
  – Need to assign large step size for complex and high-motion frames
For reducing bit rate by exploiting HVS temporal properties
  – Noise/distortion in a video frame is not very visible when there is a sharp temporal transition (scene change)
      can compress a few frames right after a scene change with fewer bits
Alternative bit-rate adjustment tool ~ frame type
  – IIIIII…        lowest compression ratio (like motion-JPEG)
  – IPP…PIPP…      moderate compression ratio
  – IBBPBBPBBI…    highest compression ratio

Color Transformation
RGB => YUV color coordinates
$$
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
 0.2990 &  0.5870 &  0.1140 \\
-0.1687 & -0.3313 &  0.5000 \\
 0.5000 & -0.4187 & -0.0813
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
$$
U/V chrominance components are downsampled in coding

Video Coding Summary: Performance Tradeoff
From R.Liu’s Handbook Fig.1.2: “mos” ~ 5-pt mean opinion scale of bad, poor, fair, good, excellent

About Compression Ratio
Raw video
  – 24 bits/pixel x (720 x 480 pixels) x 30 fps = 249 Mbps
Potential “cheating” points => contributing ~ 4:1 inflation
  – Color components are actually downsampled
  – 30 fps may refer to field rate in MPEG-2 ~ equivalent to 15 fps
  – ( 8 x 720 x 480 + 16 x 720 x 480 / 4 ) x 15 fps = 62 Mbps

Other Standards and Considerations for Digital Video Coding
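
Going back to the “About Compression Ratio” slide, the two raw-rate figures can be checked with a few lines of arithmetic. This is only a sketch: the function name and its parameters are made up for illustration, and the numbers (720 x 480, 8-bit luma, two 8-bit chroma planes, chroma pixel count reduced by 4) are the ones quoted on the slide.

```python
def raw_bitrate_bps(width, height, fps, luma_bits=8, chroma_bits=16, chroma_reduction=1):
    """Uncompressed bit rate in bits per second.

    chroma_bits is the total for both chroma planes at full resolution;
    chroma_reduction is the factor by which the chroma pixel count is reduced
    (hypothetical parameter names, chosen for this sketch).
    """
    pixels = width * height
    bits_per_frame = luma_bits * pixels + chroma_bits * pixels / chroma_reduction
    return bits_per_frame * fps

# Nominal figure: 24 bits/pixel at 30 frames per second
nominal = raw_bitrate_bps(720, 480, 30)
# Figure after accounting for chroma downsampling and 30 fields/s ~ 15 frames/s
actual = raw_bitrate_bps(720, 480, 15, chroma_reduction=4)

print(f"nominal raw rate: {nominal / 1e6:.0f} Mbps")
print(f"actual raw rate:  {actual / 1e6:.0f} Mbps")
print(f"inflation:        {nominal / actual:.1f}:1")
```

A compression ratio quoted against the nominal figure is therefore about four times larger than one quoted against the data the encoder actually receives.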
H.26x for Video Telephony
Remote face-to-face communication: a dream for years
  – H.26x series: video coding targeted at low bit rates through ISDN or regular analog telephone lines ~ on the order of 64 kbps
      need roughly symmetric complexity on encoder and decoder
H.261 (early 1990s)
  – Similar to a simplified MPEG-1 ~ block-based DCT/MC hybrid coder
  – Integer-pel motion compensation with I/P frames only ~ no B frames
  – Restricted picture size/fps format and M.V. range
H.263 (mid 1990s) and H.263+/H.263++ (late 1990s)
  – Support half-pel motion compensation & many options for improvement
H.264 (latest, 2001-): also known as H.26L / JVT / MPEG-4 Part 10
  – Hybrid coding framework with many advanced techniques
  – Focuses on greatly improving compression ratio at a cost of complexity
      allows smaller block sizes; more choices of reference; advanced entropy coding, etc.

[Table from Gonzalez-Woods 3/e Table 8.11]

MPEG-2
Extended from MPEG-1
Targets high-resolution, high-bit-rate applications
  – Digital video broadcasting, HDTV, …; also used for DVD
Supports interlaced video
  – Frame pictures vs. field pictures
  – New prediction modes for motion compensation of interlaced video
      use previously encoded fields to do M.E.-M.C.
Supports scalability

[Figure from Wang’s book preprint Fig. 13.17]
Scalability in Video Codecs
Scalability: provide different quality in a single stream
  – Stack up more bits on the base layer to provide improved quality
Possible ways of achieving scalability
  – SNR scalability ~ multiple-quality video services: basic vs. premium quality
  – Spatial scalability ~ multiple-dimension displays: display on PDA vs. PC vs. super-resolution display
  – Temporal scalability ~ multiple frame rates
  – Frequency scalability ~ blurred version to sharp, detailed version
Layered coding concept facilitates:
  – Unequal error protection
  – Different needs from customers
  – Efficient use of resources
  – Multiple services

SNR Scalability
(From R.Liu Seminar Course @ UMCP)
Two layers with the same spatio-temporal resolution but different qualities
[Block diagram: the input video feeds the base-layer encoder, which produces the base-layer bitstream; the base-layer decoder output is subtracted from the input video; the enhancement-layer encoder codes the difference into the enhancement-layer bitstream; a multiplexer combines both into the output bitstream]

Spatial Scalability
(From R.Liu Seminar Course @ UMCP)
Two layers with different spatial resolutions
[Block diagram: the input video is down-sampled and coded by the base-layer encoder into the base-layer bitstream; the base-layer decoder output is up-sampled and subtracted from the input video; the enhancement-layer encoder codes the difference into the enhancement-layer bitstream; a multiplexer combines both into the output bitstream]
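
The layered idea behind the SNR scalability diagram can be illustrated on a handful of sample values. The sketch below is a toy, not an MPEG-2 implementation: a uniform quantizer stands in for each layer's encoder, and the function names, step sizes, and sample values are all assumptions made for illustration.

```python
def quantize(values, step):
    """Uniform quantizer: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]

def snr_scalable_encode(samples, base_step=16, enh_step=4):
    """Base layer = coarse quantization; enhancement layer = finer
    quantization of what the base layer failed to represent."""
    base = quantize(samples, base_step)
    residual = [s - b for s, b in zip(samples, base)]
    enhancement = quantize(residual, enh_step)
    return base, enhancement

def decode(base, enhancement=None):
    """Decode the base layer alone, or base plus enhancement if available."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

samples = [12, 37, 101, 58, 200, 143, 9, 77]
base, enh = snr_scalable_encode(samples)
mse_base = mse(samples, decode(base))        # basic-quality receiver
mse_full = mse(samples, decode(base, enh))   # premium-quality receiver
print("base-layer MSE:", mse_base)
print("base + enhancement MSE:", mse_full)
```

A receiver that only gets the base layer still decodes a usable signal, which is what makes unequal error protection of the two layers attractive.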
MPEG-4
Many functionalities targeting a variety of applications
Introduced object-based coding strategy
  – For better support of interactive applications & graphics/animation video
  – Requires the encoder to perform object segmentation, which is difficult for general applications
Introduced error-resilient coding techniques
  – “Streaming video profile” for wireless multimedia applications
Part 10 converged with H.264/AVC (Advanced Video Coding)
  – Focused on improving compression ratio and error resilience
  – Sticks with the hybrid coding framework

Object-based Coding in MPEG-4
(Revised from R.Liu Seminar Course @ UMCP)
Interactive functionalities
Higher compression efficiency by separately handling
  – Moving objects
  – Unchanged background
  – New regions
  – M.C.-failure regions
=> “Sprite” encoding
Object segmentation needed (not easy)
  – Based on color, motion, edge, texture, etc.
  – Possible for targeted applications

[Table from Wang’s Preprint Table 1.3]

MPEG-7
(Figure from MPEG-7 Document N4031 (March 2001))
“Multimedia Content Description Interface”
  – Not a video coding/compression standard like the previous MPEG standards
  – Emphasizes how to describe the video content for efficient indexing, search, and retrieval
Standardizes the description mechanism of content
  – Descriptor, Description Scheme, Description Definition Language
Employs an XML type of description language
  – Examples of MPEG-7 visual descriptors: Color, Texture, Shape, …
Summary of Today’s Lecture
MPEG-1 video coding standard
Other coding considerations and standards
  – H.26x, MPEG-2, MPEG-4, MPEG-7, etc.
Geometric transform of images ~ more in next lecture
Readings:
  – Gonzalez’s 3/e book 8.2.9, 8.1.7; 2.6.5 (geometric transform)
  – Liu’s book on video coding (see course website): Chapter 2 “Motion-Compensated DCT Video Coding”, Chapter 3 “Video Coding Standards”
  – Other reference: Wang’s textbook Chapter 13 (video standards); Chapter 1 (video basics)

Geometric Relations and Manipulations of Images
Useful to characterize:
  – global camera motion in video
  – the relation between two images of similar scenes taken at different times or from slightly different viewpoints => “image registration”

Rotation, Translation, and Scaling
R.S.T. of an image object
  – Original pixel location (x, y) => new location (x’, y’)
  – Translation by (t_x, t_y):
$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}
$$
  – Rotation by θ around the origin, counter-clockwise:
$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
$$
  – Scaling by s_x and s_y:
$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
$$
  – Translation and rotation preserve length & angle
  – Uniform scaling s_x = s_y (preserves angle and shape); differential scaling s_x ≠ s_y

Rotation, Translation, and Scaling (cont’d)
Rotation and translation of image coordinates
  – Note the relations with rotation and translation of image objects
  – Translate the origin to (t_x, t_y):
$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} t_x \\ t_y \end{bmatrix}
$$
  – Rotate the axes by θ counter-clockwise:
$$
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
$$
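
The object transforms and their coordinate-system counterparts on the two R.S.T. slides can be tried out on single points. A minimal sketch follows; the function names are illustrative, and `rotate_axes` encodes the usual convention that rotating the axes by θ is the inverse of rotating the object by θ.

```python
import math

def translate(p, tx, ty):
    """Translate point p = (x, y) by (tx, ty)."""
    x, y = p
    return (x + tx, y + ty)

def rotate(p, theta):
    """Rotate point p counter-clockwise by theta radians around the origin."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def scale(p, sx, sy):
    """Scale point p by sx along x and sy along y."""
    x, y = p
    return (sx * x, sy * y)

def rotate_axes(p, theta):
    """Coordinates of p after the axes (not the object) rotate
    counter-clockwise by theta: the inverse of rotating the object."""
    return rotate(p, -theta)

p = (1.0, 0.0)
q = rotate(p, math.pi / 2)        # the object moves onto the y-axis
r = rotate_axes(q, math.pi / 2)   # same angle applied to the axes undoes it
print("rotated object:", q)
print("seen from rotated axes:", r)
print("length before/after:", math.hypot(*p), math.hypot(*q))
```

The last line illustrates the length-preserving property noted on the slide; replacing `rotate` with a differential `scale` breaks it.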
Implementation Issues of Geometric Transform
Forward transform
  – Index mapping from input to output image
  – What if most values obtained for an output image are at fractional coordinate indices?
Reverse transform
  – Map integer indices of the output image to the input image
  – Get values of the input image at fractional indices through interpolation
[Figure: fractional location (p’, q’) inside the cell with corners (p, q), (p, q+1), (p+1, q), (p+1, q+1), at offsets a and b]

Exercise: express RST and reflection in homogeneous coordinates

2-D Homogeneous Coordinate
Describe R.S.T. transform by P’ = M P + T
  – Need to calculate intermediate coordinate values for successive transforms
Homogeneous coordinate
  – Allows R.S.T. to be represented by matrix multiplication operations
      successive transforms can be calculated by combining the transform matrices
  – Cartesian point (x’, y’) => homogeneous representation (s x’, s y’, s)
      represents the same pixel location for all nonzero parameter s; often use s = 1
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim
\begin{bmatrix} s x' \\ s y' \\ s \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
  – The name: an equation f(x, y) = 0 becomes a homogeneous equation in the three parameters (s x’, s y’, s), i.e. a common factor of the three parameters can be factored out of the equation

R.S.T. in Homogeneous Coordinates
Translation:
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
Rotation:
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
Scaling:
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
Successive R.S.T.
  – Left-multiply the basic transform matrices
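
The “left-multiply the basic transform matrices” rule can be seen in a small example: rotating about an arbitrary point is a translate, rotate, translate-back chain collapsed into one 3x3 matrix. The sketch below uses plain nested lists; the helper names (`matmul`, `rotate_about`, and so on) are chosen for illustration only.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def apply(m, x, y):
    """Map (x, y) through m using the homogeneous point (x, y, 1)."""
    hx, hy, s = (row[0] * x + row[1] * y + row[2] for row in m)
    return (hx / s, hy / s)

def rotate_about(theta, cx, cy):
    """Rotate about (cx, cy): the transform applied first sits rightmost."""
    return matmul(translation(cx, cy),
                  matmul(rotation(theta), translation(-cx, -cy)))

m = rotate_about(math.pi / 2, 1.0, 1.0)
pt = apply(m, 2.0, 1.0)
print("rotated point:", pt)
```

Because the chain is a single matrix, every pixel is mapped with one matrix-vector product and no intermediate coordinates need to be stored.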
Reflection
Reflect about x-axis, y-axis, and origin:
$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
Reflect about y = x and y = -x:
$$
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
Reflect about a general line y = ax + b
  – Combination of translate-rotate => reflect => inverse rotate-translate

Shear
Shear ~ a transformation that distorts the shape
  – Causes opposite layers of the object to slide over each other
Shear relative to x-axis:
$$
\begin{bmatrix} 1 & sh_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
[Figure: with sh_x = 2, the top edge of the unit square ending at (1, 1) slides to (2, 1)–(3, 1), while the bottom corner (1, 0) stays in place]
Extend to shears relative to other reference lines

General Composite Transforms
Combined R.S.T.
  – {a_ij} is determined by R.S.T. parameters
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
Rigid-body transform
  – Only involves translations and rotations
  – 2x2 rotation submatrix is orthogonal: row vectors are orthonormal
Extension to 3-D homogeneous coordinate
  – (sX, sY, sZ, s) with 4x4 transformation matrices

General Composite Transforms (cont’d)
Affine transforms ~ 6 parameters
  – Can be expressed as a composition of RST, reflection, and shear
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
  – Parallel lines are transformed into parallel lines
Projective transforms ~ 8 parameters
  – Cover more general geometric transformations between 2 planes
      Widely used in computer vision (e.g.
      image mosaicing, synthesized views)
  – Two unique phenomena:
      Chirping: increase in perceived spatial frequency as distance to camera increases
      Converging/keystone effects: parallel lines appear closer and merge in the distance
$$
\begin{bmatrix} s x' \\ s y' \\ s \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ c_1 & c_2 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
\mathbf{w}_{new} = [x', y']^T = \frac{A\mathbf{w} + \mathbf{b}}{\mathbf{c}^T \mathbf{w} + 1},
\quad \mathbf{w} = [x, y]^T
$$

Effects of Various Geometric Mappings
[Figure from Wang’s Book Preprint Fig.5.18]

Higher-order Nonlinear Spatial Warping
Analogous to “rubber sheet stretching”
  – Forward and reverse mapping of pixels’ coordinate indices (x, y) <=> (x’, y’)
Polynomial warping
  – Extend affine transform to higher-order polynomial mapping
  – 2nd-order warping:
$$
x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2, \qquad
y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2
$$
Spatial distortion in imaging system (lens)
  – Pincushion and barrel distortion

Example of 2nd-order Polynomial Spatial Warping
[Figure from P. Ramadge’s PU EE488 F’00]

Illustration of Geometric Distortion
[Figure from P. Ramadge’s PU EE488 F’00]
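
The converging/keystone effect of the projective mapping described earlier can be reproduced numerically with the (A w + b) / (cᵀw + 1) form. The sketch below is illustrative only: the parameter values are arbitrary choices, and `projective_map` and `gap` are hypothetical helper names.

```python
def projective_map(A, b, c, x, y):
    """Map w = (x, y) to (A w + b) / (c^T w + 1)."""
    den = c[0] * x + c[1] * y + 1.0
    if abs(den) < 1e-12:
        raise ValueError("point maps to infinity")
    xn = (A[0][0] * x + A[0][1] * y + b[0]) / den
    yn = (A[1][0] * x + A[1][1] * y + b[1]) / den
    return xn, yn

A = [[1.0, 0.0], [0.0, 1.0]]   # identity: isolate the effect of c
b = [0.0, 0.0]
c = [0.0, 0.5]                 # nonzero c makes the mapping truly projective

def gap(y, c_vec):
    """Horizontal distance between the mapped lines x = -1 and x = +1 at height y."""
    left = projective_map(A, b, c_vec, -1.0, y)
    right = projective_map(A, b, c_vec, 1.0, y)
    return right[0] - left[0]

gap_near, gap_far = gap(0.0, c), gap(2.0, c)
affine_near, affine_far = gap(0.0, [0.0, 0.0]), gap(2.0, [0.0, 0.0])
print("projective gap near/far:", gap_near, gap_far)
print("affine gap near/far:    ", affine_near, affine_far)
```

With c = 0 the mapping reduces to an affine transform and the two lines stay a constant distance apart; with nonzero c the gap shrinks with y, which is the keystone effect.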
Compensating Spatial Distortion in Imaging
Control points – establishing correspondence
  – Coordinates before and after distortion are known
Fit into polynomial warping model: (x, y) => (x’, y’)
$$
x' = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2, \qquad
y' = b_0 + b_1 x + b_2 y + b_3 x^2 + b_4 xy + b_5 y^2
$$
  – Minimize the sum of squared errors between a set of warped control points and the polynomial estimates
$$
\mathbf{x}' = [x'_1, x'_2, \ldots, x'_M]^T, \qquad
Z = \begin{bmatrix} 1 & x_1 & y_1 & x_1^2 & x_1 y_1 & y_1^2 \\ 1 & x_2 & y_2 & \cdots & & \\ \vdots & & & & & \end{bmatrix}
$$
$$
E = (\mathbf{x}' - Z\mathbf{a})^T (\mathbf{x}' - Z\mathbf{a}) + (\mathbf{y}' - Z\mathbf{b})^T (\mathbf{y}' - Z\mathbf{b})
$$
$$
\partial E / \partial \mathbf{a} = 0 \;\Rightarrow\; Z^T(\mathbf{x}' - Z\mathbf{a}) = 0, \text{ i.e. } \mathbf{x}' \approx Z\mathbf{a} \text{ in the least-squares sense}
$$
  – Least-squares estimates: solution expressed by the generalized inverse of Z
$$
\mathbf{a} = Z^{\dagger}\mathbf{x}' = (Z^T Z)^{-1} Z^T \mathbf{x}', \qquad \mathbf{b} = Z^{\dagger}\mathbf{y}'
$$
Higher-order approximation
  – 2nd-order polynomial usually suffices for many applications

Example of Image Registration
[Figure from Gonzalez-Woods 3/e online book resource]

Generations of Video Coding
[Figure from R.Liu Seminar Course ’00 @ UMCP]
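
Returning to the control-point fit on the “Compensating Spatial Distortion in Imaging” slide: the least-squares estimate a = (ZᵀZ)⁻¹ Zᵀ x’ can be computed by solving the normal equations. The sketch below does this in plain Python with Gaussian elimination; the control points and “true” coefficients are synthetic values invented for the demonstration, and all function names are illustrative.

```python
def design_row(x, y):
    """One row of Z for the 2nd-order model: [1, x, y, x^2, xy, y^2]."""
    return [1.0, x, y, x * x, x * y, y * y]

def solve(M, v):
    """Solve M u = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more or better control points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        u[i] = (aug[i][n] - sum(aug[i][j] * u[j] for j in range(i + 1, n))) / aug[i][i]
    return u

def fit_warp(src, dst):
    """Least-squares coefficients (a, b) from control points src -> dst."""
    Z = [design_row(x, y) for x, y in src]
    m = len(Z)
    ZtZ = [[sum(Z[k][i] * Z[k][j] for k in range(m)) for j in range(6)] for i in range(6)]
    def coeffs(target):
        Ztt = [sum(Z[k][i] * target[k] for k in range(m)) for i in range(6)]
        return solve(ZtZ, Ztt)   # normal equations: (Z^T Z) a = Z^T x'
    return coeffs([p[0] for p in dst]), coeffs([p[1] for p in dst])

# Synthetic control points on a 3x3 grid, warped by known coefficients
a_true = [0.5, 1.0, 0.1, 0.02, -0.03, 0.01]
b_true = [-0.2, 0.05, 0.9, 0.0, 0.04, -0.02]
src = [(float(x), float(y)) for x in (-1, 0, 1) for y in (-1, 0, 1)]
dst = [(sum(c * z for c, z in zip(a_true, design_row(x, y))),
        sum(c * z for c, z in zip(b_true, design_row(x, y)))) for x, y in src]

a_hat, b_hat = fit_warp(src, dst)
print("a estimate:", [round(v, 6) for v in a_hat])
print("b estimate:", [round(v, 6) for v in b_hat])
```

At least six well-spread control points are needed for the 6-coefficient model; with noisy correspondences, using more points averages out the errors.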
