A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University 1 Outline of Talk Introduction Algorithm Analysis 2 Time C 1 Data stream: t1 v1 t2 v2 t3 v3 t4 v4 t5 v5 For simplicity assume unit valued elements 3 Most recent time window of duration W C 1 Data stream: t1 v1 t2 v2 t3 v3 t4 v4 t5 Current time v5 Goal: Compute the sum of elements with time stamps in time window [C W , C ] vi C W t i C 4 Example I: All packets on a network link, maintain the number of different ip sources in the last one hour Example II: Large database, continuously maintain averages and frequency moments 5 Data stream: t1 v1 t2 v2 t3 v3 t4 v4 t5 v5 Synchronous stream ti: In ascending order Asynchronous stream ti: No order guaranteed 6 Why Asynchronous Data Streams? Synchronous stream Network Asynchronous stream Network delay & multi-path routing Synchronous Asynchronous Synchronous Merge w/o control 7 Processing Requirements: •One pass processing •Small workspace: poly-logarithmic in the size of data •Fast processing time per element •Approximate answers are ok 8 Our results: A deterministic data aggregation algorithm Time: log W O log B log W log B Space: O log B log W Relative Error: |X S | S 9 Previous Work: [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002] Deterministic, Synchronous Merging buckets [Tirthapura, Xu, Busch, PODC, 2006] Randomized, Asynchronous Random sampling 10 Outline of Talk Introduction Algorithm Analysis 11 Time C 1 Data stream: t1 t2 t3 t 4 t 5 t 6 Current time For simplicity assume unit valued elements 12 Most recent time window of duration W C 1 Data stream: t1 t2 t3 t 4 t 5 t 6 Current time Goal: Compute the sum of elements with time stamps in time window [C W , C ] 13 W 1 W W W W 2W 3W W 4W Divide time into periods of duration W 14 sliding window W 1 W T 2W C 3W 4W The sliding window may span at most two time periods 15 sliding window W S left 1 W T S right C 2W 3W 4W S S1 S2 Sum can be written as two sub-sums In two time periods 16 sliding window W S left 1 W T Dleft S right C 2W Dright 3W 4W Data structure that maintains an estimate ofS left In left time period 17 1 S left T W Dleft Without loss of Generality, Consider data structure D left in time period [1,W ] 18 Data structure consists of various levels D1 Dleft D2 DL 2 is an upper bound of the sum in a period L 19 Consider level Di Bucket at Level i 1 1 0 Time period W Counts up to 2 i 1 elements 20 Stream: 1 t1 1 t1 W 1 W Increase counter value 21 Stream: 1 t1 t2 1 t2 W 2 W Increase counter value 22 Stream: 1 t1 t 2 t3 1 t3 W 3 W Increase counter value 23 Stream: 1 t1 t 2 t3 ...... t 2 i 1 2 1 t 2i 1 1 W 1 i 1 1 W Increase counter value 24 Stream: t1 t 2 t3 ...... t 2 1 2 1 2 1 i 1 t 2i 1 t 2i 1 1 W 1 i 1 W i 2 W W 2 2 i W 1 Split bucket Counter threshold of 2 i 1 reached 25 Stream: t1 t 2 t3 2 1 ...... t 2 i 1 1 t 2i 1 t 2i 1 1 W 1 i 2 W W 2 2 i 1 W New buckets have threshold also 2 i 1 26 t1 t 2 t3 Stream: ...... t 2 i 1 1 t 2i t 2i 1 1 1 i 2 1 1 1 t 2i 1 1 2 W W 2 2 W 2 i 1 W Increase appropriate bucket 27 t1 t 2 t3 Stream: ...... t 2 i 1 1 t 2i t 2i 1 1 1 t 2i 1 2 W 2 i 2 1 1 t 2i 1 2 W i 2 1 W W 2 2 1 W Increase appropriate bucket 28 t1 t 2 t3 Stream: ...... t 2 i 1 1 t 2i t 2i 1 1 1 t 2i i 2 2 1 1 2 t 2i 1 3 1 t 2i 1 3 W 2 i 2 1 W W 2 2 1 W Increase appropriate bucket 29 Stream: t1 ...... t W m 2 x1 1 Split bucket 2 W W 2 2 W 2 2 i 1 W 1 2 1 1 tm W i 3W 3W 4 4 2 1 i W 30 Stream: t1 ...... t m x1 1 W 2 W 2 2 1 i 3W 3W 4 4 2 1 i W 31 Stream: ......t t1 m t m 1 W 2 1 t m 1 3W 4 x1 1 W 2 i W 2 2 1 1 3W 3W 4 4 2 1 i W Increase appropriate bucket 32 Stream: t1 ...... t m t m 1 t ...... m x1 W 1 2 W Split bucket 2 W 2 2 1 2 1 x4 i 1 i 5W 5W 8 8 3W 3W 4 4 2 1 1 W i 3W 4 33 Stream: t1 ...... t m t m 1 t ...... m x1 1 W 2 x4 3W 4 W 2 2 1 i 5W 5W 8 8 2 1 1 W i 3W 4 34 Splitting Tree 2 1 i 1 W x1 1 i 2 xk 2 2 W W 2 2 i 1 W 2 W 2 W 1 2 x4 i 1 1 x2 1 i 1 3W 3W 4 4 1 W x3 5W 5W 8 8 1 3W 4 35 2 1 i 1 W Max depth = Leaf buckets of duration 1 are not split any further t1 t1 1 t2 t2 1 log W 36 1 2 i 1 W Leaf buckets The initial bucket may be split into many buckets 37 1 2 i 1 W Leaf buckets Due to space limitations 2 log W we only keep the last a buckets 38 1 S T W Suppose we want to find the sum S of elements in time period [T ,W ] 39 S 1 W T 2 Consider various levels of splitting threshold 2 2 1 a 2 2 a k a k 1 a 40 S 1 W T 2 First level with a leaf bucket 2 2 that intersects timeline 2 2 1 a a k a k 1 a 41 S 1 W T Estimate of S: X x1 x2 xz 2 k x1 x2 a Consider buckets on right of timeline xz z a 42 S 1 W T OR 2 First level with a leaf bucket 2 2 On right timeline 2 2 1 a a k a k 1 a 43 Outline of Talk Introduction Algorithm Analysis 44 1 S W T i 1 Suppose that we use level 2 in order to compute the estimate 45 tk Stream: xb xb 1 tl tr Consider splitting threshold level 2 i 1 A data element is counted in the appropriate bucket 46 tk Stream: tl t k t r tk tl tr We can assume that the element is placed in the respective bucket 47 tk Stream: 2 tl tl 2 i i 1 tr tk tl t r 2 tl t r 2 1 2 i tr We can assume that when bucket splits the element is placed in an arbitrary child bucket 48 tk Stream: 2 tl tl 2 i i 1 tr tk If: t l t k 2 tl t r tl t r 2 2 tl t r 2 i 1 tr GOOD! Element counted in correct bucket 49 tk Stream: 2 tl tl If: tl t r 2 2 i i 1 tr tk tl t r 2 2 tl t r 2 1 tk tr i 1 tr BAD! Element counted in wrong bucket 50 S 1 Consider Leaf Buckets tk W 1 If W T T tk W GOOD! 51 S 1 Consider Leaf Buckets tk W 1 If W T tk T BAD! Element counted in wrong bucket 52 S 1 W T Consider Leaf Buckets tk W 1 X S | Z1 | | Z2 | Z 1 :elements of left part counted on right Z 2 :elements of right part counted on left 53 T W 1 tk Z1 elements of left part counted on right tk 1 Must have been initially inserted in one of these buckets W 54 Since tree depth log W i | Z 1 | O ( 2 log W ) 55 Since tree depth log W i | Z 1 | O ( 2 log W ) Similarly, we can prove i | Z 2 | O ( 2 log W ) Therefore: i | X S ||| Z 1 | | Z 2 || O ( 2 log W ) 56 Since a 2 log W i It can be proven S ( 2 log W ) 57 Since a 2 log W It can be proven S ( 2 log W ) i Combined with | X S | O ( 2 i log W ) We obtain relative error : |X S | S 58

Descargar
# Languages and Finite Automata