A Deterministic Algorithm for
Summarizing Asynchronous Streams
over a Sliding Window
Costas Busch
Rensselaer Polytechnic Institute
Srikanta Tirthapura
Iowa State University
1
Outline of Talk
Introduction
Algorithm
Analysis
2
Time
C
1
Data stream:
t1
v1
t2
v2
t3
v3
t4
v4
t5
v5
For simplicity assume
unit valued elements
3
Most recent time window of duration W
C
1
Data stream:
t1
v1
t2
v2
t3
v3
t4
v4
t5
Current
time
v5
Goal: Compute the sum of elements with
time stamps in time window
[C  W , C ]
 vi
C W t i  C
4
Example I: All packets on a network link, maintain
the number of different ip sources in the last
one hour
Example II: Large database, continuously
maintain averages and frequency moments
5
Data stream:
t1
v1
t2
v2
t3
v3
t4
v4
t5
v5
Synchronous stream
ti: In ascending order
Asynchronous stream
ti: No order guaranteed
6
Why Asynchronous Data Streams?
Synchronous stream
Network
Asynchronous stream
Network delay & multi-path routing
Synchronous
Asynchronous
Synchronous
Merge w/o control
7
Processing Requirements:
•One pass processing
•Small workspace: poly-logarithmic in
the size of data
•Fast processing time per element
•Approximate answers are ok
8
Our results:
A deterministic data aggregation algorithm
Time:
log W 

O  log B 

 

log W  log B 

Space: O  log B log W




Relative Error:
 
|X S |
S
9
Previous Work:
[Datar, Gionis, Indyk, Motwani.
SIAM Journal on Computing, 2002]
Deterministic, Synchronous
Merging buckets
[Tirthapura, Xu, Busch, PODC, 2006]
Randomized, Asynchronous
Random sampling
10
Outline of Talk
Introduction
Algorithm
Analysis
11
Time
C
1
Data stream:
t1 t2 t3 t 4 t 5 t 6
Current
time
For simplicity assume
unit valued elements
12
Most recent time window of duration W
C
1
Data stream:
t1 t2 t3 t 4 t 5 t 6
Current
time
Goal: Compute the sum of elements with
time stamps in time window
[C  W , C ]
13
W
1
W
W
W
W
2W
3W
W
4W
Divide time into periods of duration W
14
sliding window W
1
W
T
2W
C
3W
4W
The sliding window may span at most
two time periods
15
sliding window W
S left
1
W
T
S right
C
2W
3W
4W
S  S1  S2
Sum can be written as two sub-sums
In two time periods
16
sliding window W
S left
1
W
T
Dleft
S right
C
2W
Dright
3W
4W
Data structure that
maintains an estimate ofS
left
In left time period
17
1
S left
T
W
Dleft
Without loss of Generality,
Consider data structure D
left
in time period [1,W ]
18
Data structure consists of various levels
D1
Dleft
D2
DL
2 is an upper bound of the sum in a period
L
19
Consider level Di
Bucket at Level i  1
1
0
Time period
W
Counts up to 2 i  1 elements
20
Stream:
1
t1
1  t1  W
1
W
Increase counter value
21
Stream:
1
t1 t2
1  t2  W
2
W
Increase counter value
22
Stream:
1
t1 t 2 t3
1  t3  W
3
W
Increase counter value
23
Stream:
1
t1 t 2 t3
...... t
2
i 1
2
1  t 2i  1 1  W
1
i 1
1
W
Increase counter value
24
Stream:
t1 t 2 t3
...... t
2
1
2
1
2
1
i 1
t 2i
1  t 2i  1 1  W
1
i 1
W
i
2
W
W
2
2
i
W
1
Split bucket
Counter threshold of 2
i 1
reached
25
Stream:
t1 t 2 t3
2
1
...... t
2
i 1
1
t 2i
1  t 2i  1 1  W
1
i
2
W
W
2
2
i
1
W
New buckets have threshold also 2 i  1
26
t1 t 2 t3
Stream:
...... t
2
i 1
1
t 2i t 2i
1
1
1
i
2 1
1
1  t 2i  1  1 
2
W
W
2
2
W
2
i
1
W
Increase appropriate bucket
27
t1 t 2 t3
Stream:
...... t
2
i 1
1
t 2i t 2i
1
1
1
t 2i
1
2
W
2
i
2 1
1
 t 2i  1  2  W
i
2 1
W
W
2
2
1
W
Increase appropriate bucket
28
t1 t 2 t3
Stream:
...... t
2
i 1
1
t 2i t 2i
1
1
1
t 2i
i
2 2
1
1
2
t 2i
1
3
1  t 2i  1  3 
W
2
i
2 1
W
W
2
2
1
W
Increase appropriate bucket
29
Stream:
t1
...... t
W
m
2
x1
1
Split bucket
2
W
W
2
2
W
2
2
i 1
W
1
2
1
 1  tm 
W
i
3W
3W
4
4
2
1
i
W
30
Stream:
t1
...... t
m
x1
1
W
2
W
2
2
1
i
3W
3W
4
4
2
1
i
W
31
Stream:
......t
t1
m
t m 1
W
2
 1  t m 1 
3W
4
x1
1
W
2
i
W
2
2 1
1
3W
3W
4
4
2
1
i
W
Increase appropriate bucket
32
Stream:
t1
...... t
m
t m 1
t
......
m
x1
W
1
2
W
Split bucket
2
W
2
2
1
2
1
x4
i 1
i
5W
5W
8
8
3W
3W
4
4
2
1
1
W
i
3W
4
33
Stream:
t1
...... t
m
t m 1
t
......
m
x1
1
W
2
x4
3W
4
W
2
2
1
i
5W
5W
8
8
2
1
1
W
i
3W
4
34
Splitting Tree
2
1
i 1
W
x1
1
i
2  xk  2
2
W
W
2
2
i 1
W
2
W
2
W
1
2
x4
i 1
1
x2
1
i 1
3W
3W
4
4
1
W
x3
5W
5W
8
8
1
3W
4
35
2
1
i 1
W
Max depth =
Leaf buckets of duration 1
are not split any further
t1
t1  1
t2
t2  1
log W
36
1
2
i 1
W
Leaf buckets
The initial bucket may be split into
many buckets
37
1
2
i 1
W
Leaf buckets
Due to space limitations
2
log W
we only keep the last a 

buckets
38
1
S
T
W
Suppose we want to find the sum S
of elements in time period [T ,W ]
39
S
1
W
T
2
Consider various levels
of splitting threshold
2
2
1
a
2
2
a
k
a
k 1
a
40
S
1
W
T
2
First level with a leaf bucket
2
2
that intersects timeline
2
2
1
a
a
k
a
k 1
a
41
S
1
W
T
Estimate of S:
X  x1  x2    xz
2
k
x1
x2
a
Consider buckets on right of timeline
xz
z a
42
S
1
W
T
OR
2
First level with a leaf bucket
2
2
On right timeline
2
2
1
a
a
k
a
k 1
a
43
Outline of Talk
Introduction
Algorithm
Analysis
44
1
S
W
T
i 1
Suppose that we use level 2
in order to compute the estimate
45
tk
Stream:
xb  xb  1
tl
tr
Consider splitting threshold level
2
i 1
A data element is counted in the
appropriate bucket
46
tk
Stream:
tl  t k  t r
tk
tl
tr
We can assume that the element is
placed in the respective bucket
47
tk
Stream:
2
tl
tl
2
i
i 1
tr
tk
tl  t r
2
tl  t r
2
1
2
i
tr
We can assume that when bucket
splits the element is placed in an
arbitrary child bucket
48
tk
Stream:
2
tl
tl
2
i
i 1
tr
tk
If: t l  t k 
2
tl  t r
tl  t r
2
2
tl  t r
2
i
1
tr
GOOD!
Element counted in correct bucket
49
tk
Stream:
2
tl
tl
If:
tl  t r
2
2
i
i 1
tr
tk
tl  t r
2
2
tl  t r
2
 1  tk  tr
i
1
tr
BAD!
Element counted in wrong bucket
50
S
1
Consider Leaf Buckets
tk
W
1
If
W
T
T  tk  W
GOOD!
51
S
1
Consider Leaf Buckets
tk
W
1
If
W
T
tk  T
BAD!
Element counted in wrong bucket
52
S
1
W
T
Consider Leaf Buckets
tk
W
1
X  S  | Z1 |  | Z2 |
Z 1 :elements of left part counted on right
Z 2 :elements of right part counted on left
53
T
W
1
tk
 Z1
elements of left part
counted on right
tk
1
Must have been initially inserted
in one of these buckets
W
54
Since tree depth  log W
i
| Z 1 | O ( 2 log W )
55
Since tree depth  log W
i
| Z 1 | O ( 2 log W )
Similarly, we can prove
i
| Z 2 | O ( 2 log W )
Therefore:
i
| X  S ||| Z 1 |  | Z 2 || O ( 2 log W )
56
Since a 
2

log W
i
It can be proven   S   ( 2 log W )
57
Since a 
2

log W
It can be proven   S   ( 2 log W )
i
Combined with | X  S | O ( 2 i log W )
We obtain relative error :
|X S |
S
 
58
Descargar

Languages and Finite Automata