Sections 14.1 - 14.4
Streaming Media on Demand and
Live Broadcast
Multimedia over IP and wireless networks: compression, networking, and systems
Mihaela van der Schaar & Philip A. Chou
Presented by
H. Mark Okada
CMPT 820
February 18, 2009
Streaming Media
 Media on demand: a user scenario characterised by
local audio or video playback, as from a CD or DVD
 interactive controls: fast forward, pause, seek, etc.
 Live broadcast: a user scenario characterised by
tuning into a radio or television program
 the user can only join or leave a session
 Both are prevalent on the Internet today
e.g.
 interactive music and video playback
 internet radio
 Chapter 14 looks at how these services are provided
 Sections 14.2-14.4 will only cover media on demand
Overview
Section 14.2
 Overview of
 Architectures
 Protocols
 Format issues
Section 14.3
 Buffering and timing fundamentals
Section 14.4
 How media data is communicated for streaming on
demand
NOT COVERED - Section 14.5
 Live broadcast
Architectures - 14.2.1
 Streaming media on demand and live
broadcast require different architectures
Figure 14.1
Streaming media on demand
 the media source is encoded off line into a media file
 streamed using different protocols (Section 14.2.2)
 the media file may be specialised to support various modes of
streaming (discussed in Section 14.2.3)
 the client temporarily buffers encoded media in a decoder
buffer
 and temporarily buffers decoded media in a render
buffer
 fairly short (a frame or two), since decoded frames are large
 the user controls the experience through playback commands
 play, fast forward, stop, seek
Communication between server & client is tailored to
 the client's resources
 the network connection
Figure 14.1a
Progressive downloading
 a type of streaming in which media can be streamed faster
than it is played back, i.e. the entire file is downloaded
network bandwidth > media content bit rate
(the source coding rate)
 If the client can decode the file sequentially as it arrives
 progressive downloading can be done through simple file
transfer protocols
 e.g. FTP or HTTP, both over TCP/IP (i.e. over FTP or through a
web server)
 If the client buffer is limited
 progressive downloading can be done using simple TCP flow
control
 the client accepts data from TCP only when there is space
in the media buffer
 popularised by SHOUTcast, an early music streaming service
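Example (Python sketch, hypothetical URL): a minimal illustration of progressive downloading with a bounded playback buffer. In a real player the throttling happens implicitly, because TCP flow control slows the sender whenever the client stops reading the socket.

import collections
import time
import urllib.request

# Hypothetical content URL; any HTTP-served media file would do.
URL = "http://example.com/media/contentname.mp3"

CHUNK = 16 * 1024      # bytes read from the network per iteration
BUFFER_LIMIT = 64      # max chunks held before we stop reading (flow control)

buffer = collections.deque()

with urllib.request.urlopen(URL) as resp:
    while True:
        if len(buffer) < BUFFER_LIMIT:
            chunk = resp.read(CHUNK)   # not reading would let TCP flow control throttle the sender
            if not chunk:
                break                  # end of file: the entire content has been downloaded
            buffer.append(chunk)
        else:
            time.sleep(0.01)           # buffer full: wait for the "player" to drain it
        if buffer:
            _ = buffer.popleft()       # stand-in for the decoder consuming one chunk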
Progressive downloading
 need to account for network jitter and temporary
interference
 want the highest possible source coding rate that is no
greater than the worst-case network bandwidth
 These issues shape much of the design of media on demand
and of the communication protocol between the client
and server
Live broadcast
 encoder may be directly connected to the server
through an encoder buffer
 encoder buffer contains limited data to maintain
fixed and short end-to-end delay
 the server accesses data at the current playback point, not
arbitrary data in a file
 this restricts adaptivity, which matters when there are multiple receivers
 not possible to have interactive access to the media
 difficult to adapt the transmission rate to varying
clients**
 difficult for the server to use retransmission-based
error control
 due to the negative acknowledgement
(NAK) implosion problem
 error control becomes a delicate issue for live
broadcast
**receiver-driven layered multicast (RLM)
allows adaptation of transmission rate
Also see:
S. R. McCanne, "Scalable Compression and Transmission of Internet
Multicast Video," Ph.D. thesis, University of California, Berkeley,
CA, December 1996.
S. R. McCanne, V. Jacobson, and M. Vetterli, "Receiver-Driven
Layered Multicast," in Proc. ACM SIGCOMM, pp. 117-130, Stanford,
CA, August 1996.
Protocols - 14.2.2
 streaming on demand requires many
protocols at different levels
This section covers a subset of the protocols
described in week 2 of this class
 RTP: Real-time Transport Protocol
 RTSP: Real-Time Streaming Protocol
 RTCP: RTP Control Protocol
 SIP: Session Initiation Protocol
Real-time streaming protocol (RTSP)
 RFC 2326
At the topmost level:
 application level protocol
 protocols for content discovery
 connection to specific streaming media server
Content discovery is done “out of band”
eg.
http://www.microsoft.com/directory/contentname.asx
http://www.realnetworks.com/directory/contentname.ram
http://www.apple.com/directory/contentname.mov
 a URL pointing to a metadata file on a web server that in turn
references the media
 a different format for each system: asx, ram, mov
Client contacts server using URL for the content.
eg.
rtsp://wms.microsoft.com/directory/contentname.wmv
rtsp://helixserver.example.com/audio1.rm?start=55&end=1:25
rtsp://qtserver.apple.com/directory/contentname.mov
 Prefix: indicates the streaming protocol used
Example of auxiliary file
Microsoft ASX file
<ASX Version="3.0">
<ENTRY>
<REF HREF="mms://streamingmedia/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
</ENTRY>
<ENTRY>
<REF HREF="mms://winmedianw/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
</ENTRY>
</ASX>
RealNetworks RAM file
# First URL that opens a related info pane.
rtsp://helixserver.example.com/video3.rm?rpcontextheight=350
&rpcontextwidth=300&rpcontexturl="http://www.example.com/relatedinfo2.html"
&rpcontexttime=5.5&rpvideofillcolor=rgb(30,60,200)
#
# Second URL that keeps the same related info pane,
# but changes the media playback pane’s background color.
rtsp://helixserver.example.com/video4.rm?rpcontexturl=_keep
&rpvideofillcolor=red
Figure 14.2
Streaming protocol
 commands are typically sent reliably over a TCP
connection (many forms exist)
 the Real Time Streaming Protocol (RTSP)
is widely adopted (RFC 2326)
 the idea is simple, but SET_PARAMETER can be
complicated
 a media file may have multiple streams for audio and video
for different languages, subtitles, source coding rates, etc.
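Example (Python sketch): a rough illustration of the RTSP command channel, sending a single DESCRIBE request over TCP to a hypothetical server; a real session would continue with SETUP, PLAY, PAUSE, and TEARDOWN (plus SET_PARAMETER for stream selection).

import socket

HOST = "rtsp.example.com"                  # hypothetical RTSP server
URL = f"rtsp://{HOST}/directory/contentname"

request = (
    f"DESCRIBE {URL} RTSP/1.0\r\n"
    "CSeq: 1\r\n"
    "Accept: application/sdp\r\n"
    "\r\n"
)

with socket.create_connection((HOST, 554)) as sock:   # 554 is the default RTSP port
    sock.sendall(request.encode("ascii"))
    reply = sock.recv(4096).decode("ascii", errors="replace")   # one recv suffices for a sketch

print(reply)   # e.g. "RTSP/1.0 200 OK" followed by headers and an SDP body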
Real-time protocol (RTP)
 Client is able to specify which lower level
data transport protocol to use
 data transport is usually either
 RTP over UDP, or
 RTP over TCP
 Both are preferred for bandwidth efficiency
 With RTP over UDP, there must be some means of
transmission rate and error control
 RTP itself defines no standard means of transmission rate or
error control
 HTTP over TCP may be used to avoid
firewall issues
Real time control protocol (RTCP)
 RFC 3550
 often used with RTP
 receivers often provide statistical feedback
to the sender (receiver reports)
 a mix of interoperable and proprietary features
limits its use as a standard feedback mechanism
Windows Media system
 RTP over UDP
 transmission rate control is normally based on the
source coding rate of the content
 the client can detect congestion
 and signal the server to lower or raise the source coding
rate
Alternative methods of transmission
rate control
1) TFRC: TCP-friendly rate control
2) TCP-like congestion control algorithm
 Both are being standardised as profiles of the Datagram
Congestion Control Protocol (DCCP)
 Must be paired with a source coding algorithm so
that the coding rate matches the transmission rate…
 a source coding rate control algorithm
 e.g. the rate-distortion optimised (RaDiO) scheduling algorithm
 error control in Windows Media uses selective
retransmission
 a gap in the received packets causes the client to send a NAK
(negative acknowledgement) to the server, triggering retransmission
 audio has higher priority than video
 the Windows Media player stalls if audio packets are missing and
waits for their arrival
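Example (Python sketch): the gap-detection idea behind NAK-based selective retransmission, assuming packets carry consecutive sequence numbers; send_nak is a placeholder callback, not part of any real player API.

def detect_gaps(received_seq, expected_next, send_nak):
    """Report missing sequence numbers between expected_next and received_seq."""
    if received_seq > expected_next:
        for missing in range(expected_next, received_seq):
            send_nak(missing)                          # ask the server to retransmit this packet
    return max(expected_next, received_seq + 1)        # next sequence number we expect

# Usage: feed arriving sequence numbers in order of arrival.
expected = 0
for seq in [0, 1, 2, 5, 6]:                            # packets 3 and 4 were lost
    expected = detect_gaps(seq, expected, lambda m: print("NAK", m))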
File formats - 14.2.3
Challenging to adapt a fixed media file to
various network and client conditions
 encoding must be done before streaming (no
knowledge of the streaming context)
 so flexibility must be built into the media file
Unrealistic to:
 compress or transcode to needs of every
client
 best way is to allow server to select which
parts of the file to stream
Some streaming formats
The major players
 MPEG-4 format
 QuickTime format (on which the MPEG-4 file format is based)
 RealMedia format
 Microsoft Advanced Streaming Format (ASF)
All have the ability to contain/multiplex multiple
media and multiple versions of each medium
 each is recorded into a track (MPEG-4/QT) or stream
(ASF)
 data units are organised into chunks (MPEG-4/QT) or
packets (ASF)
Streaming formats
 Each has a header containing metadata relating to
overall file and specific tracks or streams
 title, author, date, encryption, rights management, table
of contents, track/stream enumeration & their descriptions
 Information on individual track/stream properties
 start time, duration, bit rate, buffer size, sampling rate,
picture size, scalability capabilities
 Time-varying metadata can be associated with each
track/stream
 network packetisation, decoding and presentation time
stamps, SMPTE time codes, key frame, switch frame
 Two types of metadata
 static metadata: size independent of length of data,
inexpensive to transmit over the network
 time-varying metadata: size grows with data, expensive to
transmit
Streaming formats
 …
 the format provides structure that lets the server
select which parts of the data to transmit
Either
 coarse grained: the server streams only a
particular subset of streams to the client
 fine grained: in addition allows a fraction of
the data within a stream to be chosen
 Can set a Lagrange multiplier parameter which
determines which data units are not transmitted
Encoding media into a stream
Two methods
1) Multibit rate (MBR)
 multiple independent encodings (each with
varying coding rates) are stored in separate
streams (in same file)
 choice in which streams to play
2) scalable coding
 later on section 14.3.3
Data units
 data units are carried in packets
 e.g. H.264/AVC uses the Network Abstraction Layer (NAL)
 In general, formats designed for local playback/storage are not
suitable for streaming
 hard for server to choose the right portions of
the file to stream
 difficult to randomly access (seek) arbitrary
points in the stream
Overview
Section 14.2
 Overview of
 Architectures
 Protocols
 Format issues
Section 14.3
 Buffering and timing fundamentals
Section 14.4
 How media data is communicated for streaming on demand
NOT COVERED - Section 14.5
 Live broadcast
Fundamental abstractions - 14.3
Fundamental abstractions of streaming media on
demand (Section 14.3)
 Section covers
 leaky bucket models of bit streams
 constant bit rate (CBR) vs. variable bit rate (VBR)
 compound (multiple media) streams
 preroll delay
 playback speed
 timing
 clocks
 decoder and presentation timestamps
 The goal is to know when it is safe for the client to begin
playback
Buffering and leaky bucket models
Scenario 1 - constant bit rate (CBR)
 isochronous** noiseless communication
channel
**isochronous - equal amounts of data are communicated in equal amounts of time
Figure 14.3
 encoder buffer in between encoder and channel
 decoder buffer in between channel and decoder
 schedule – the sequence of times at which successive
bits in an encoded bit stream pass a given
point in the pipeline
Figure 14.4: B bits = encoder buffer + decoder buffer
Buffer tube
 Can view previous as a buffer tube
 Characterised with 3 parameters
 R - slope
 B - height in bits
 Fe - offset/fullness from bottom of tube
 Or by Fd - offset from top of tube
 Fd = B - Fe
 From a buffer point of view
 overflow of the encoder buffer => decoder buffer underflow
 underflow of the encoder buffer => decoder buffer overflow
 B = encoder buffer + decoder buffer
 Fe - initial fullness of the encoder buffer
 managed by a rate control algorithm
 assigns a number of bits b(n) to each frame n
Buffer tube
 De = Fe/R: initial delay before data enters the channel
 Dd = Fd/R: delay before data is extracted from the
channel by the decoder
(R, B, F) tube
 Aim to keep the decoder buffer delay Dd = Fd/R low
Figure 14.5
Variable bit rate stream (VBR)
Scenario 2 - variable bit rate stream (VBR)
 Unlike CBR, VBR has a variable amount of
data per time segment
 higher bitrate for complex segments
 lower bitrate for less complex segments
 VBR streams tend to need wider buffer tubes
=> larger start-up delay
 part of an overall problem: it is difficult to
determine the average bit rate of the stream
Variable bit rate stream (VBR)
 Recall the (R,B,F) tube
 each parameter is not unique
for a given bit stream
Defining the average rate is non-trivial
 fit the closest slope to the staircase schedule, or
 take (number of bits in the stream) / (duration of the
stream)
Variable bit rate
 encoder does not use channel continuously
 channel has peak transmission rate R higher than average
stream bit rate
 when needed, sends packets at rate R
 otherwise at 0
 typical of packet network and shared channels
 best modelled by leaky bucket
Defined by (R, B, Fe)
 n: frame number
 b(n): number of bits placed in the leaky bucket for frame n
 τ(n): time at which frame n is processed
 R: rate at which bits leak out of the bucket
 Fe(n): fullness of the encoder buffer before frame n is added
 Be(n): fullness of the encoder buffer after frame n is added
 together these define the stream's schedule
Leaky bucket
 Be(n): fullness of the encoder buffer after frame
n is added to the bucket
Be(n) = Fe(n) + b(n)
 Fe(n): fullness of the encoder buffer before frame
n is added to the bucket
Fe(n+1) = max{0, Be(n) - R[τ(n+1) - τ(n)]}
 Be(n) ≤ B for all n = 0, 1, …, N
 Aim is to find smallest decoder buffer size
and smallest decoder buffer delay
Leaky bucket
For a given stream, define:
 Minimum bucket capacity with leak rate R and given
initial fullness Fe
Bmin(R, Fe) = maxn Be(n)
 Initial decoder buffer fullness
Fdmin(R, Fe) = Bmin(R, Fe) - Fe
 It follows that there is a minimum capacity B as well as a
minimum decoder buffer delay Dd = Fd/R, provided the bucket
starts with initial fullness Fe = Femin(R)
 Source coding rate (Rc): the maximum leak rate R such that a
leaky bucket (R, B, Fe) does not underflow with initial
fullness Fe = Femin(R)
 larger leak rates R => smaller required capacity
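Example (Python sketch): the update equations above, turned into a small simulation that reports Bmin(R, Fe) and Fdmin(R, Fe) for given frame sizes b(n) and timestamps τ(n); the frame sizes used here are made up.

def leaky_bucket(frame_bits, timestamps, R, Fe=0.0):
    """Simulate a leaky bucket with leak rate R (bits/s) and initial fullness Fe (bits).

    frame_bits[n] = b(n), timestamps[n] = tau(n) in seconds.
    Returns (Bmin, Fdmin) = (max_n Be(n), Bmin - Fe).
    """
    F = Fe                         # Fe(n): fullness before frame n is added
    Bmin = 0.0
    for n, b in enumerate(frame_bits):
        Be = F + b                 # Be(n) = Fe(n) + b(n)
        Bmin = max(Bmin, Be)
        if n + 1 < len(frame_bits):
            gap = timestamps[n + 1] - timestamps[n]
            F = max(0.0, Be - R * gap)   # Fe(n+1) = max{0, Be(n) - R[tau(n+1) - tau(n)]}
    return Bmin, Bmin - Fe

# Example: 30 fps video, four frames of varying size, leak rate 1 Mbit/s.
bits = [120_000, 30_000, 35_000, 90_000]
taus = [n / 30.0 for n in range(4)]
print(leaky_bucket(bits, taus, R=1_000_000))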
Leaky bucket
 If transmission rate R > source coding rate Rc
 Decoder buffer reduced
 Decoder buffer delay
also reduced
 client can determine required
buffer size and preroll delay
 use functions Bmin(R) and Fdmin(R)
 computed off line at set of transmission rates
R, R1 < R2 < · · · < RL
Figure 14.7
 stored in the bit stream header as a set of leaky
bucket parameters (Ri , Bi , Fi )
 where Bi = Bmin(Ri) and Fi = Fdmin(Ri)
 each i = 1, …, L is a breakpoint of the piecewise-linear
functions Bmin(R) and Fdmin(R)
 Bmin(R) and Fdmin(R) can then be estimated at any rate R by
linear interpolation (and extrapolation at the ends)
Leaky bucket
Linear interpolation of Bmin(R) and Fdmin(R)
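Example (Python sketch): estimating (Bmin(R), Fdmin(R)) at an arbitrary rate R from stored leaky bucket triples (Ri, Bi, Fi), with linear interpolation between breakpoints and extrapolation at the ends; the header values are made up.

def interpolate(buckets, R):
    """buckets: list of (Ri, Bi, Fi) sorted by increasing Ri. Returns (B, F) at rate R."""
    # Find the segment that brackets R, or fall back to the end segment for extrapolation.
    for (R0, B0, F0), (R1, B1, F1) in zip(buckets, buckets[1:]):
        if R <= R1 or (R1, B1, F1) == buckets[-1]:
            t = (R - R0) / (R1 - R0)
            return B0 + t * (B1 - B0), F0 + t * (F1 - F0)
    R0, B0, F0 = buckets[0]
    return B0, F0                 # single breakpoint: nothing to interpolate

# Hypothetical header with three leaky buckets (rates in bit/s, sizes in bits).
header = [(250_000, 400_000, 300_000),
          (500_000, 250_000, 180_000),
          (1_000_000, 150_000, 100_000)]
print(interpolate(header, 750_000))   # estimate (Bmin, Fdmin) at 750 kbit/s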
Compound streams (section 14.3.2)
 Compound streams encapsulate many streams
meant to be played and streamed concurrently
 view them as a single compound stream with its own set of leaky
buckets
 a compound leaky bucket (R, B, F) is the sum of its component
leaky buckets
 e.g. if audio has bucket (Ra, Ba, Fa) and video has
bucket (Rv, Bv, Fv), then the parameters sum:
 R = Ra + Rv
 B = Ba + Bv
 F = Fa + Fv
 Find a combination of each leaky bucket s.t. the
combined leaky bucket won’t overflow
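Example (Python sketch): combining component buckets is just element-wise addition of the (R, B, F) triples; the numbers are illustrative only.

def combine(*buckets):
    """Sum component leaky buckets, e.g. audio (Ra, Ba, Fa) and video (Rv, Bv, Fv)."""
    return tuple(sum(values) for values in zip(*buckets))

audio = (64_000, 40_000, 20_000)       # (Ra, Ba, Fa)
video = (500_000, 250_000, 180_000)    # (Rv, Bv, Fv)
print(combine(audio, video))           # -> (564000, 290000, 200000)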
Compound streams
 Find a combination of each leaky bucket s.t.
the combined leaky bucket won’t overflow
 i.e. choose a combination of audio bucket i ∈ {1, …, La}
and video bucket j ∈ {1, …, Lv}
 minimising with a Lagrange multiplier shows that at most
La + Lv index pairs are useful, namely those lying on the
lower convex hull of the candidate set
 can extend this into M concurrent media
streams
Multibit rate (MBR)
 multiple independent encodings (each with
varying coding rates) are stored in separate
streams (in same file)
 choice in which streams to play
 mutually independent, each at different
source coding rates
 combining all possible mutually exclusive
streams (e.g. Na audio and Nv video streams), each
with a different leaky bucket
 most of the Na × Nv combinations are not useful; typically
only about Na + Nv are
 use a distortion-rate approach
Distortion-rate approach
Decide which streams to pair
 assign a distortion Dia and source coding rate Ria to each
audio stream in i = 0… Na
 assign a distortion Djv and source coding rate Rjv to each
video stream in j = 0… Nv
 For each (i, j) combined stream, define a combined distortion
(weighting the audio distortion by an arbitrary factor α relative
to the video distortion) and a combined source coding rate
R(i,j) = Ria + Rjv
 using Lagrangian again, can find the lowest total
distortion among all combinations with same or
lower total bit rate
 can extend this to other sets of media
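Example (Python sketch): an exhaustive version of the selection step, assuming each stream is described by a (distortion, source coding rate) pair and that α weights audio distortion relative to video distortion, as described above; a Lagrangian sweep would trace the same frontier without the double loop.

def best_pair(audio, video, rate_budget, alpha=1.0):
    """audio/video: lists of (distortion, source_coding_rate). Returns (i, j) or None."""
    best = None
    for i, (Da, Ra) in enumerate(audio):
        for j, (Dv, Rv) in enumerate(video):
            D, R = alpha * Da + Dv, Ra + Rv              # combined distortion and rate
            if R <= rate_budget and (best is None or D < best[0]):
                best = (D, i, j)
    return None if best is None else (best[1], best[2])

# Hypothetical MBR streams: (distortion, rate in bit/s).
audio = [(10.0, 32_000), (6.0, 64_000), (4.0, 128_000)]
video = [(50.0, 250_000), (30.0, 500_000), (18.0, 1_000_000)]
print(best_pair(audio, video, rate_budget=600_000))      # -> (1, 1) under this budget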
Temporal coordinate systems and timestamps
(section 14.3.4)
 Each frame has a decoder timestamp (DTS, in
MPEG terminology)
 instructs the client when to decode it
 also acts as a decoding deadline
 the presentation buffer holds decoded frames before
the renderer
 each is assigned a presentation timestamp (PTS), which instructs
the client when to play it
 critical in synchronising different streams
 PTS are a layer above the DTS
 Note that presentation order ≠ decoding order
 Eg. I0, B1, B2, P3, B4, B5, P6, ... (presentation order)
I0, P3, B1, B2, P6, B4, B5, ... (decoding order)
 assumed that frames are time stamped with DTS
and PTS
 book will only use DTS
clocks (temporal coordinate system)
 media time τ: clock for device used to capture and
timestamp original content (real time)
 client time t: clock for device playing content
eg.
 τDTS(0), τDTS(1), etc.
 tDTS(0), tDTS(1), etc.
Conversion is done by
t = t0 + (τ - τ0) / v
 Where
 v is the playback rate (v = 2
=> playing at 2x speed)
 t0 and τ0 correspond to a common initial event (the first frame after
seeking/rebuffering)
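Example (Python sketch): the media-time to client-time conversion, together with the observation made on the next slide that a higher playback speed shrinks the preroll delay Fdmin(R)/R′; the numeric values are made up.

def media_to_client(tau, tau0, t0, v):
    """Map a media timestamp tau to client time: t = t0 + (tau - tau0) / v."""
    return t0 + (tau - tau0) / v

def preroll_delay(Fd_min, R, v):
    """Preroll in client time: Fdmin(R) / R' with arrival rate R' = R * v."""
    return Fd_min / (R * v)

print(media_to_client(12.0, tau0=10.0, t0=0.0, v=2.0))    # 2 s of media arrive in 1 s at 2x speed
print(preroll_delay(Fd_min=300_000, R=1_000_000, v=2.0))  # 0.15 s instead of 0.3 s at 1x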
Leaky bucket update
 Expressed in client time, the leaky bucket update becomes
Fe(n+1) = max{0, Be(n) - R′[tDTS(n+1) - tDTS(n)]}
where
 R′ = Rv is the arrival rate of bits at the client (in
bits per unit of client time)
 R = R′/v is the rate that must be used to compute the
required buffer size Bmin(R) and the initial decoder
buffer fullness Fdmin(R)
 the preroll delay is Fdmin(R)/R′ = Fdmin(R)/(Rv)
 larger playback speed => smaller preroll delay
Overview
Section 14.2
 Overview of
 Architectures
 Protocols
 Format issues
Section 14.3
 Buffering and timing fundamentals
Section 14.4
 How media data is communicated for streaming on demand
NOT COVERED - Section 14.5
 Live broadcast
Packet networks - 14.4
 RC: source coding rate
 RS: sending rate - the rate at which data is
injected into the transport layer
 measured in bits/s of client time
 RX: transmission rate - the rate at which data is
injected into the network (via TCP or UDP)
 RX - RS = error control overhead
 RS / RX = channel coding rate
 Ra: arrival rate
 assumed to be RS
 usually set to Ra = vRc
Decoupling Rc and Ra has advantages
Figure 14.8a
Decoupling Ra = vRc
 Adjusting the source coding rate is the
source coding rate control problem
 Choose Rc as a function of Ra
 and of the client buffer duration and its history
 Have variety of average bit rates R(1), R(2), …
 Each with tight buffer tube (R(i),B(i),Fe(i))
 Can delay playback to ensure guaranteed
continuous playback
Control theoretic model - 14.4.2.1
Figure 14.9
 Client buffer duration: the gap between a frame's arrival
time ta(n) and its playback deadline td(n)
 Overflow when the gap is too large
 Underflow when the gap is too small
 If the gap shrinks, Rc must be reduced to restore tb(n)
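Example (Python sketch): a heavily simplified version of the feedback idea (not the chapter's actual controller), in which the source coding rate is nudged down when the buffer duration tb(n) = td(n) - ta(n) falls below a target and up when it exceeds it; the gain and the arrival/deadline values are made up.

def buffer_duration(t_arrival, t_deadline):
    """tb(n) = td(n) - ta(n): how far ahead of its deadline frame n arrived."""
    return t_deadline - t_arrival

def adjust_coding_rate(Rc, tb, tb_target, gain=0.1):
    """Proportional nudge of the source coding rate toward keeping tb near its target."""
    return max(0.0, Rc * (1.0 + gain * (tb - tb_target) / tb_target))

Rc = 500_000.0                                         # current source coding rate, bits/s
for ta, td in [(0.0, 4.0), (1.1, 5.0), (2.4, 6.0)]:    # hypothetical arrival/deadline pairs
    tb = buffer_duration(ta, td)
    Rc = adjust_coding_rate(Rc, tb, tb_target=4.0)
    print(round(tb, 2), round(Rc))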
Control Objective - 14.4.2.2
 Underflow is prevented as in the previous section
 Quality fluctuates with the complexity of the content
 The target schedule provides a margin of safety
 Two penalties are introduced into the cost function
 deviation of the buffer tube from the target schedule
 coding rate differences between successive frames
Target schedule design - 14.4.2.3
 Want smallest client buffer duration
 Start with small delay, and increase gap
 The slope is the ratio of the average source coding rate to
the average arrival rate
s(n) = [tT(n+1) - tT(n)] / [τ(n+1) - τ(n)]
 If the upper bound of the buffer tube aligns
with the target schedule
 tb(n) = tT(n)
s(n) = Rc(n+1) / Ra
 Eventually want logarithmic growth of the buffer
Figure 14.10
Controller design - 14.4.2.4
 Adjust the source coding rate
 The controller chooses the coding rate of frame n+2 at time n
 It uses the notion of an error e(n) and a vector
feedback gain G
 The optimal gain G* is solved for
Controller interpretation - 14.4.2.6
 A virtual frame rate is used to reduce the
feedback rate, and because it is difficult to specify
a frame rate for merged streams
Figure 14.11a
 Start with source coding rate 1/2 of arrival
rate to build up the client buffer duration