Data Storage Selection in Sensor Networks
Report of the Advanced Database Topics Project
Instructor: Dr. Rahgozar
Euhanna Ghadimi, Ali Abbasi, Kave Pashaii
1
Outline
1. Introduction
Definition, Applications, Differences, Storage
2. Queries
2.1. Querying in Cougar
2.2. Querying in TinyDB
2.3. In-network Aggregation
3. Other Issues
2
Introduction
From a data storage point of view, a sensor network database is:
“a distributed database that collects
physical measurements about the
environment, indexes them, and serves
queries from users and other applications
external to or from within the network”
Research in sensor network databases:
• relatively new
• can benefit from current efforts in data
streams and P2P networks
3
The long term goal
Embed numerous distributed devices to monitor and interact with the physical world: in workspaces, hospitals, homes, vehicles, and “the environment” (water, soil, air…)
Examples: circulatory net, disaster response
Network these devices so
that they can coordinate to
perform higher-level tasks.
Requires robust distributed
systems of tens of thousands
of devices.
4
Sensor Net Sample Apps
Habitat Monitoring: Storm
petrels on Great Duck Island,
microclimates on James
Reserve.
Vehicle detection: sensors along a
road, collect data about passing
vehicles.
Earthquake monitoring in shake-test sites.
Traditional monitoring
apparatus.
5
Overview of research
• Sensor network challenges
• One approach: Directed diffusion
• Basic algorithm
• Initial simulation results (Intanagonwiwat)
• Other interesting localized algorithms in progress:
• Aggregation (Kumar)
• Adaptive fidelity (Xu)
• Address-free architecture, time synch (Elson)
• Localization (Bulusu, Girod)
• Self-configuration using robotic nodes (Bulusu, Cerpa)
• Instrumentation and debugging (Jerry Zhao)
6
The Challenge is Dynamics!
The physical world is dynamic
• Dynamic operating conditions
• Dynamic availability of resources
• … particularly energy!
• Dynamic tasks
Devices must adapt automatically to the
environment
• Too many devices for manual configuration
• Environmental conditions are unpredictable
Unattended and un-tethered operation is
key to many applications
7
Approach
Energy is the bottleneck resource
• And communication is a major consumer--avoid
communication over long distances
Pre-configuration and global knowledge are
not applicable
• Achieve desired global behavior through localized
interactions
• Empirically adapt to observed environment
Leverage points
• Small-form-factor nodes, densely distributed to achieve physical locality to sensed phenomena
• Application-specific, data-centric networks
• Data processing/aggregation inside the network
8
Directed Diffusion Concepts
Application-aware communication primitives
• expressed in terms of named data (not in terms of the
nodes generating or requesting data)
Consumer of data initiates interest in data with
certain attributes
Nodes diffuse the interest towards producers via
a sequence of local interactions
This process sets up gradients in the network
which channel the delivery of data
Reinforcement and negative reinforcement used
to converge to efficient distribution
Intermediate nodes opportunistically fuse
interests, aggregate, correlate or cache data
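To make these local interactions concrete, here is a minimal Python sketch; the node graph, interest format, and gradient bookkeeping are hypothetical illustrations, and reinforcement, caching, and in-network aggregation are omitted.

# A minimal sketch (hypothetical data structures, no radio or timing details)
# of the directed-diffusion idea: a sink expresses an interest in named data,
# neighbors diffuse it and record gradients (which neighbor the interest came
# from), and data later flows back along those gradients toward the sink.
from collections import defaultdict

# undirected neighbor graph, assumed for illustration
NEIGHBORS = {
    "sink": ["a"], "a": ["sink", "b"], "b": ["a", "source"], "source": ["b"],
}

gradients = defaultdict(list)   # node -> neighbors pointing back toward the sink

def diffuse_interest(start, interest):
    """Flood the named interest; each node remembers who sent it (a gradient)."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nbr in NEIGHBORS[node]:
            if nbr not in seen:
                gradients[nbr].append(node)    # gradient toward the sink
                seen.add(nbr)
                frontier.append(nbr)

def deliver(data_node, data):
    """Send data along gradients until it reaches the sink."""
    node = data_node
    while gradients[node]:
        node = gradients[node][0]              # follow the (first) gradient
    return node, data

diffuse_interest("sink", {"type": "temperature", "interval": "1s"})
print(deliver("source", {"type": "temperature", "value": 21.5}))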
9
Illustrating Directed Diffusion
[Figure: four panels, each showing a source and a sink: setting up gradients, sending data, reinforcing a stable path, and recovering from node failure]
10
Sensor Network Tomography: Key
Ideas and Challenges
Kinds of tomograms
• network health
• resource-level indicators
• responses to external stimuli
Can exchange resource health
• during low-level housekeeping functions
• … such as radio synchronization
Key challenge: energy efficiency
• need to aggregate local representations
• algorithms must auto-scale
• outlier indicators are different
11
Self configuring networks using and
supporting robotic nodes
(Bulusu, Cerpa, Estrin, Heidemann, Mataric, Sukhatme)
Robotics introduces self-mobile nodes and adaptively placed nodes
Self-configuring ad hoc networks in the context of an unpredictable RF environment
Place nodes for network augmentation or formation
Place beacons for localization granularity
12
Programming Sensor Nets Is Hard
• Months of lifetime required from small batteries
  • 3-5 days naively; can't recharge often
  • Interleave sleep with processing
– Lossy, low-bandwidth, short range communication
  » Nodes coming and going
  » ~20% loss @ 5m
  » Multi-hop
  » 200-800 instructions per bit transmitted!
– Remote, zero administration deployments
– Highly distributed environment
– Limited development tools
  » Embedded, LEDs for debugging!
High-level abstractions are needed!
[Chart: current draw (mA, 0-20) by phase: processing, processing & listening, processing & transmitting, idle]
13
A Solution: Declarative Queries
Users specify the data they want
• Simple, SQL-like queries
• Using predicates, not specific addresses
• Same spirit as Cougar – Our system: TinyDB
Challenge is to provide:
• Expressive & easy-to-use interface
• High-level operators
• Well-defined interactions
• “Transparent Optimizations” that many programmers would miss
• Sensor-net specific techniques
• Power efficient execution framework
Question: do sensor networks change query
processing? Yes!
14
Overview
TinyDB: Queries for Sensor Nets
Processing Aggregate Queries (TAG)
Taxonomy & Experiments
Acquisitional Query Processing
Other Research
Future Directions
15
TinyDB Demo
17
TinyDB Architecture
[Diagram: queries (e.g. SELECT AVG(temp) WHERE light > 400) enter over the multihop network and results (T:1, AVG: 225; T:2, AVG: 250) flow back; the query processor chains an aggregation operator (avg(temp)), a filter (light > 400), sampling (get('temp')), tables, and the schema]
Schema: "catalog" of commands & attributes, e.g. for temp:
• Name: temp
• Time to sample: 50 uS
• Cost to sample: 90 uJ
• Calibration table: 3
• Units: Deg. F
• Error: ± 5 Deg. F
• Get function: getTempFunc()…
Footprint:
• ~10,000 lines embedded C code
• ~5,000 lines Java (PC-side)
• ~3,200 bytes RAM (w/ 768-byte heap)
• ~58 kB compiled TinyOS code (3x larger than the 2nd largest TinyOS program)
18
Declarative Queries for Sensor
Networks
“Find the sensors in bright nests.”
Example 1:
SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors:
Epoch | Nodeid | nestNo | Light
  0   |   1    |   17   |  455
  0   |   2    |   25   |  389
  1   |   1    |   17   |  422
  1   |   2    |   25   |  405
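As an illustration of the epoch-based semantics of Example 1, the sketch below shows how a single node might evaluate the query each epoch; the node identifiers and sensor-reading helper are hypothetical stand-ins, not TinyDB's actual embedded implementation.

# A minimal sketch of per-epoch evaluation of the selection query above.
import time

NODE_ID, NEST_NO = 1, 17            # assumed identifiers for illustration

def read_light():                    # hypothetical ADC read
    return 455

def run_selection_query(epoch_duration_s=1.0, num_epochs=3):
    for epoch in range(num_epochs):
        light = read_light()                        # acquire a sample this epoch
        if light > 400:                             # WHERE light > 400
            yield (epoch, NODE_ID, NEST_NO, light)  # SELECT nodeid, nestNo, light
        time.sleep(epoch_duration_s)                # EPOCH DURATION 1s

for row in run_selection_query():
    print(row)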
19
Aggregation Queries
Example 2:
SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

Example 3: “Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Results (regions w/ AVG(sound) > 200):
Epoch | region | CNT(…) | AVG(…)
  0   | North  |   3    |  360
  0   | South  |   3    |  520
  1   | North  |   3    |  370
  1   | South  |   3    |  520
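A small sketch of the GROUP BY / HAVING semantics of Example 3, over assumed in-memory readings for one epoch (this shows the query semantics only, not TinyDB's in-network execution).

# Per epoch: group readings by region, count occupied nests, average sound,
# and keep only regions whose average sound exceeds 200.
from collections import defaultdict

# hypothetical readings for one epoch: (region, occupied, sound)
readings = [("North", True, 360), ("North", True, 340), ("South", False, 520)]

groups = defaultdict(lambda: {"cnt": 0, "sum": 0, "n": 0})
for region, occupied, sound in readings:
    g = groups[region]
    g["cnt"] += int(occupied)        # CNT(occupied), counting occupied nests
    g["sum"] += sound                # running sum for AVG(sound)
    g["n"] += 1

for region, g in groups.items():
    avg_sound = g["sum"] / g["n"]
    if avg_sound > 200:              # HAVING AVG(sound) > 200
        print(region, g["cnt"], avg_sound)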
20
Overview
TinyDB: Queries for Sensor Nets
Processing Aggregate Queries (TAG)
Taxonomy & Experiments
Acquisitional Query Processing
Other Research
Future Directions
21
Tiny Aggregation (TAG)
In-network processing of aggregates
• Common data analysis operation
• A.k.a. the gather operation or reduction in parallel programming
• Communication reducing
• Operator dependent benefit
• Across nodes during same epoch
Exploit query semantics to improve
efficiency!
22
Query Propagation Via Tree-Based Routing
Tree-based routing
• Used in:
• Query delivery
• Data collection
• Topology selection is
important; e.g.
• Krishnamachari, DEBS
2002, Intanagonwiwat,
ICDCS 2002, Heidemann,
SOSP 2001
• LEACH/SPIN,
Heinzelman et al.
MOBICOM 99
• SIGMOD 2003
• Continuous process
• Mitigates failures
[Figure: the query Q (SELECT …) is disseminated down a routing tree rooted at node A, and partial results R:{…} flow back up from nodes B, C, D, E, and F]
23
Basic Aggregation
In each epoch:
• Each node samples local sensors once
• Generates a partial state record (PSR) from
  • local readings
  • readings from children
• Outputs its PSR during its assigned comm. interval
At the end of the epoch, the PSR for the whole network is output at the root
New result on each successive epoch
Extras:
• Predicate-based partitioning via GROUP BY
[Figure: example routing tree over nodes 1-5]
24
Illustration: Aggregation
SELECT COUNT(*) FROM sensors
[Figure sequence (slides 25-29): a grid of sensor # vs. communication interval # fills in as an epoch proceeds through intervals 4, 3, 2, 1. In each interval, the nodes assigned to it transmit their partial COUNT, parents add the received values to their own, and by interval 1 the root holds the COUNT for the whole network; the pattern then repeats in the next epoch.]
Interval Assignment: An Approach
4 intervals / epoch
Interval # = Level
SELECT COUNT(*)…
• CSMA for collision avoidance
• Time intervals for power conservation
• Many variations (e.g. Yao & Gehrke, CIDR 2003)
• Time sync (e.g. Elson & Estrin, OSDI 2002)
• Pipelining: increase throughput by delaying result arrival until a later epoch
[Figure: per-node schedule across the epoch's communication intervals; each node sleeps (Z) except to listen (L) while its children transmit and to transmit (T) its own partial result during the interval matching its level, with Level = 1 nodes reporting to the root in interval 1]
Madden, Szewczyk, Franklin, Culler. Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks. WMCSA 2002.
30
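The interval-by-level schedule can be sketched as follows; the tree level, the radio primitives, and the assumption of synchronized epochs are illustrative stand-ins, not TinyDB's actual code.

# One epoch at one node: listen while children (level + 1) transmit, fold
# their partial counts into the local one, transmit in the interval equal to
# the node's own level, and sleep otherwise to conserve power.
NUM_INTERVALS = 4          # intervals per epoch, as on the slide

def schedule_epoch(level, my_count, listen, transmit, sleep):
    """level: depth in the routing tree (root = 0). listen/transmit/sleep are
    hypothetical radio primitives; listen() returns PSRs heard that interval."""
    total = my_count
    for interval in range(NUM_INTERVALS, 0, -1):   # intervals count down 4..1
        if interval == level + 1:
            for child_psr in listen():             # children transmit now
                total += child_psr
        elif interval == level:
            transmit(total)                        # our assigned comm. interval
        else:
            sleep()                                # power conservation
    return total                                   # at the root: network-wide COUNT

# trivial single-node demo with stub primitives
psr = schedule_epoch(level=1, my_count=1,
                     listen=lambda: [2, 3],        # pretend two children reported
                     transmit=lambda total: print("send", total),
                     sleep=lambda: None)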
Aggregation Framework
• As in extensible databases, we support any aggregation function conforming to:
  Agg_n = { f_init, f_merge, f_evaluate }
  f_init{a0} -> <a0>   (a partial state record, PSR)
  f_merge{<a1>, <a2>} -> <a12>
  f_evaluate{<a1>} -> aggregate value
Example: AVERAGE
  AVG_init{v} -> <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} -> S/C
Restriction: merge must be associative and commutative
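A minimal sketch of the {f_init, f_merge, f_evaluate} interface, instantiated for AVERAGE exactly as above; the class and method names are assumptions for illustration, not TinyDB's nesC API.

# The PSR for AVERAGE is a (sum, count) pair, merged pairwise up the tree.
class AvgAggregate:
    @staticmethod
    def init(v):                 # AVG_init{v} -> <v, 1>
        return (v, 1)

    @staticmethod
    def merge(a, b):             # AVG_merge{<S1,C1>,<S2,C2>} -> <S1+S2, C1+C2>
        return (a[0] + b[0], a[1] + b[1])

    @staticmethod
    def evaluate(psr):           # AVG_evaluate{<S,C>} -> S/C
        s, c = psr
        return s / c

# merging readings 10, 20, 30 from three nodes yields AVG = 20
psrs = [AvgAggregate.init(v) for v in (10, 20, 30)]
combined = psrs[0]
for p in psrs[1:]:
    combined = AvgAggregate.merge(combined, p)
print(AvgAggregate.evaluate(combined))   # 20.0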
31
Types of Aggregates
SQL supports MIN, MAX, SUM, COUNT,
AVERAGE
Any function over a set can be computed via
TAG
In-network benefit for many operations
• E.g. Standard deviation, top/bottom N, spatial
union/intersection, histograms, etc.
• Compactness of PSR
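For example, a standard-deviation aggregate (one of the operations listed above) also fits the same {init, merge, evaluate} interface with a compact PSR; the (count, sum, sum-of-squares) representation below is an assumed illustration.

import math

def std_init(v):
    return (1, v, v * v)                      # <count, sum, sum of squares>

def std_merge(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def std_evaluate(psr):
    n, s, sq = psr
    return math.sqrt(sq / n - (s / n) ** 2)   # population standard deviation

psr = std_init(10)
for v in (20, 30):
    psr = std_merge(psr, std_init(v))
print(std_evaluate(psr))                      # ~8.165 for readings 10, 20, 30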
32
Overview
TinyDB: Queries for Sensor Nets
Processing Aggregate Queries (TAG)
Taxonomy & Experiments
Acquisitional Query Processing
Other Research
Future Directions
33
Simulation Environment
Evaluated TAG via simulation
Coarse-grained, event-based simulator
• Sensors arranged on a grid
• Two communication models
• Lossless: All neighbors hear all messages
• Lossy: Messages lost with probability that increases
with distance
Communication (message counts) as
performance metric
34
Taxonomy of Aggregates
TAG insight: classify aggregates according to
various functional properties
• Yields a general set of optimizations that can
automatically be applied
Properties (these drive an API!):
• Partial State
• Monotonicity
• Exemplary vs. Summary
• Duplicate Sensitivity
35
Partial State
Growth of PSR vs. number of aggregated values (n):
• Distributive: |PSR| = 1 (e.g. MIN)
• Algebraic: |PSR| = c (e.g. AVG)
• Holistic: |PSR| = n (e.g. MEDIAN)
• Unique: |PSR| = d (e.g. COUNT DISTINCT), where d = # of distinct values
• Content-sensitive: |PSR| < n (e.g. HISTOGRAM)
(Classification after "Data Cube", Gray et al.)

Property      | Examples                         | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
36
Benefit of In-Network Processing
Simulation results: 2,500 nodes, 50x50 grid, ~20 neighbors, depth ~10, uniform value distribution.
[Chart: total bytes transmitted vs. aggregation function (EXTERNAL, MAX, AVERAGE, DISTINCT, MEDIAN); distributive and algebraic aggregates transmit far less than unique and holistic ones]
• Aggregate- & depth-dependent benefit!
37
Monotonicity & Exemplary vs. Summary
Property              | Examples                              | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record      | Effectiveness of TAG
Monotonicity          | COUNT: monotonic, AVG: non-monotonic  | Hypothesis testing, snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary        | Applicability of sampling, effect of loss
38
Channel Sharing (“Snooping”)
Insight: Shared channel can reduce communication
Suppress messages that won’t affect aggregate
• E.g., MAX
• Applies to all exemplary, monotonic aggregates
Only snoop in listen/transmit slots
• Future work: explore snooping/listening tradeoffs
39
Hypothesis Testing
Insight: Guess from root can be used for
suppression
• E.g. ‘MIN < 50’
• Works for monotonic & exemplary aggregates
• Also summary, if imprecision allowed
How is hypothesis computed?
• Blind or statistically informed guess
• Observation over network subset
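A sketch of how a node might apply a hypothesis such as 'MIN < 50' for suppression; the helper names and radio call are hypothetical, and the statistics behind the guess are not modeled.

# The root disseminates a guessed bound; a node only reports values that
# could still affect the answer under that guess, saving messages.
HYPOTHESIS = 50           # root's guess, e.g. 'MIN < 50' as on the slide

def maybe_report(local_min, children_mins, transmit):
    """transmit is a hypothetical radio send; suppression saves messages."""
    candidate = min([local_min] + list(children_mins)) if children_mins else local_min
    if candidate < HYPOTHESIS:        # could beat the hypothesized bound
        transmit(candidate)
    # otherwise stay silent: the value cannot change MIN under the guess

maybe_report(local_min=42, children_mins=[55, 61], transmit=print)   # prints 42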
40
Experiment: Snooping vs.
Hypothesis Testing
Setup: uniform value distribution, dense packing; query SELECT MAX(attr), R(attr) = [0,100].
[Chart: messages per epoch vs. network diameter (10 to 50) for No Guess, Guess = 50, Guess = 90 (ideal pruning at the leaves), and Snooping (pruning in the network); both guessing and snooping cut communication relative to no guess]
41
Duplicate Sensitivity
Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Monotonicity          | COUNT: monotonic, AVG: non-monotonic        | Hypothesis testing, snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of sampling, effect of loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive  | Routing redundancy
42
Use Multiple Parents
Use graph structure
• Increase delivery probability with no communication overhead
For duplicate-insensitive aggregates, or aggregates expressible as a sum of parts (e.g. SELECT COUNT(*)):
• Send (part of) the aggregate to all parents
• In just one message, via multicast
• Assuming independence, this decreases variance

With a single parent (P(link xmit successful) = p, so P(success from A -> R) = p^2, local count c):
  E(cnt) = c * p^2
  Var(cnt) = c^2 * p^2 * (1 - p^2) = V
With n parents, sending c/n to each:
  E(cnt) = n * (c/n) * p^2 = c * p^2
  Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n

[Figure: node A splits its count c into c/n shares (n = 2 shown) sent to parents B and C, which forward them to the root R]
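A quick Monte Carlo sketch (with assumed values of c, p, and trial count) that checks the analysis above: splitting across n parents keeps the expected count but cuts the variance by roughly a factor of n, assuming independent links.

import random

def simulate(c=100, p=0.8, n=1, trials=100_000):
    got = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n):                       # one share of c/n per parent
            # a share arrives only if both hops (child->parent, parent->root) succeed
            if random.random() < p and random.random() < p:
                total += c / n
        got.append(total)
    mean = sum(got) / trials
    var = sum((x - mean) ** 2 for x in got) / trials
    return mean, var

print(simulate(n=1))   # mean ~ c*p^2 = 64, variance ~ c^2*p^2*(1-p^2) = 2304
print(simulate(n=2))   # same mean, variance roughly halved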
43
Multiple Parents Results
Better than the previous analysis expected: losses aren't independent, and splitting spreads data over many links instead of relying on one critical link.
[Chart: benefit of result splitting for a COUNT query; average reported COUNT with splitting vs. no splitting (2,500 nodes, lossy radio model, 6 parents per node), with splitting giving consistently higher counts]
44
Taxonomy Related Insights
Communication Reducing
• In-network Aggregation (Partial State)
• Hypothesis Testing (Exemplary & Monotonic)
• Snooping (Exemplary & Monotonic)
• Sampling
Quality Increasing
• Multiple Parents (Duplicate Insensitive)
• Child Cache
45
TAG Contributions
Simple but powerful data collection language
• Vehicle tracking:
SELECT ONEMAX(mag,nodeid)
EPOCH DURATION 50ms
Distributed algorithm for in-network aggregation
• Communication Reducing
• Power Aware
• Integration of sleeping, computation
• Predicate-based grouping
Taxonomy driven API
• Enables transparent application of techniques to
• Improve quality (parent splitting)
• Reduce communication (snooping, hypo. testing)
46
Overview
TinyDB: Queries for Sensor Nets
Processing Aggregate Queries (TAG)
Taxonomy & Experiments
Acquisitional Query Processing
Other Research
Future Directions
47
Acquisitional Query Processing
(ACQP)
Closed world assumption does not hold
• Could generate an infinite number of samples
An acquisitional query processor controls
• when,
• where,
• and with what frequency data is collected!
Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD 2003.
48
ACQP: What’s Different?
How should the query be processed?
• Sampling as a first class operation
• Event – join duality
How does the user control acquisition?
• Rates or lifetimes
• Event-based triggers
Which nodes have relevant data?
• Index-like data structures
Which samples should be transmitted?
• Prioritization, summary, and rate control
49
Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag) AND pred2(light)
EPOCH DURATION 1s

E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ.
At 1 sample / sec, total power savings could be as much as 3.5 mW, comparable to the processor!

Traditional DBMS: treats both samples as given, then applies pred1 and pred2.
ACQP, correct ordering (unless pred1 is very selective and pred2 is not): sample the cheap attribute (light) and apply pred2 first; only if it passes, sample the costly attribute (mag) and apply pred1.
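A sketch of this ordering using the per-sample costs quoted on this slide; the sampling helpers and predicate bodies are hypothetical stand-ins.

# Acquire the cheap attribute and test its predicate before paying for the
# expensive one; the expensive mag sample is skipped whenever pred2 fails.
SAMPLE_COST_UJ = {"light": 90, "mag": 1500}

def sample(attr):              # hypothetical stand-in for an ADC read
    return {"light": 420, "mag": 3.2}[attr]

def pred2(light):              # predicate on the cheap attribute
    return light > 400

def pred1(mag):                # predicate on the expensive attribute
    return mag > 3.0

def acqp_epoch():
    spent = SAMPLE_COST_UJ["light"]
    light = sample("light")
    if not pred2(light):       # cheap test first: often avoids sampling mag
        return None, spent
    spent += SAMPLE_COST_UJ["mag"]
    mag = sample("mag")
    if pred1(mag):
        return (light, mag), spent
    return None, spent

print(acqp_epoch())            # ((420, 3.2), 1590) in this illustrative run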
50
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
Traditional DBMS: samples mag and light, applies (mag > x), then computes WINMAX.
ACQP: pushes the exemplary aggregate down as a selection (light > current window MAX), so mag is only sampled when the light reading could actually change the window maximum.
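A sketch of the pushdown idea: the running window MAX of light acts as a filter in front of the expensive mag sample. The helper functions, threshold, and omitted window bookkeeping are assumptions for illustration.

window_max_light = float("-inf")
X = 5.0                                   # threshold from WHERE mag > x

def sample_light():                       # cheap, hypothetical
    return 430

def sample_mag():                         # expensive, hypothetical
    return 6.1

def process_epoch():
    global window_max_light
    light = sample_light()
    if light <= window_max_light:         # pushed-down filter: light > MAX
        return                            # cannot affect WINMAX, skip mag
    mag = sample_mag()                    # pay for mag only when necessary
    if mag > X:                           # WHERE mag > x
        window_max_light = light          # update the window maximum

process_epoch()
print(window_max_light)                   # 430 in this illustrative run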
51
Lifetime Queries
Lifetime vs. sample rate
SELECT …
EPOCH DURATION 10 s
SELECT …
LIFETIME 30 days
Extra: Allow a MAX SAMPLE PERIOD
• Discard some samples
• Sampling cheaper than transmitting
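A back-of-the-envelope sketch of how a LIFETIME clause could be translated into a sample period; all energy numbers below are assumptions for illustration, not TinyDB's actual lifetime model.

# Spread the remaining energy budget evenly over the requested lifetime.
BATTERY_J = 10_000.0          # assumed remaining battery energy (joules)
COST_PER_EPOCH_J = 0.03       # assumed energy per sample + transmission (joules)
IDLE_POWER_W = 0.0002         # assumed baseline sleep power (watts)

def epoch_period_for_lifetime(lifetime_days):
    lifetime_s = lifetime_days * 24 * 3600
    budget = BATTERY_J - IDLE_POWER_W * lifetime_s   # energy left for epochs
    if budget <= 0:
        raise ValueError("lifetime not achievable even without sampling")
    n_epochs = budget / COST_PER_EPOCH_J             # affordable epochs
    return lifetime_s / n_epochs                     # seconds between samples

print(round(epoch_period_for_lifetime(30), 2))       # ~8.2 s with these numbers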
52
(Single Node) Lifetime Prediction
[Chart: voltage (raw units, 300-1000) vs. time (0-4000 hours), measured vs. expected, for a lifetime goal of 24 weeks (4032 hours, 15 s / sample); a linear fit (R^2 = 0.8455) tracks the measured voltage down to the insufficient-to-operate level (V = 350)]
53
Overview
TinyDB: Queries for Sensor Nets
Processing Aggregate Queries (TAG)
Taxonomy & Experiments
Acquisitional Query Processing
Other Research
Future Directions
54
Sensor Network Challenge
Problems
Temporal aggregates
Sophisticated, sensor
network specific aggregates
• Isobar Finding
• Vehicle Tracking
• Lossy compression
• Wavelets
“Isobar Finding”
Hellerstein, Hong, Madden, and Stanek. Beyond Average. IPSN 2003
55
TinyDB Deployments
Initial efforts:
• Network monitoring
• Vehicle tracking
Ongoing deployments:
• Environmental monitoring
• Generic Sensor Kit
• Building Monitoring
• Golden Gate Bridge
56
Data Storage
Recently introduced: a node with larger storage capacity and larger battery power.
Ordinary sensors send their data to it, and it answers queries.
(Sheng et al., ACM MobiHoc 2006)
57
Problems
Data Storage Placement
• (the Sheng et al. paper)
Data Storage Selection
• Our method: an adaptive and decentralized method
58
Costs in the system
59
Overall cost
60
Our method
61
Our method (Cont.)
62
Our results
Very good!
[Chart: cost vs. node id (ids 1 through ~248), cost axis 0 to 2,500]
63
References
Book:
• Wireless Sensor Networks: An Information
Processing Approach, by F. Zhao and L. Guibas,
Elsevier, 2004.
Papers:
• [1] Bo Sheng, Qun Li, and Weizhen Mao. Data Storage Placement in Sensor Networks. ACM MobiHoc 2006, Florence, Italy, May 22-25, 2006.
• [2] B. Bonfils and P. Bonnet. Adaptive and Decentralized Operator Placement for In-Network Query Processing. 2003, Springer-Verlag.
64
References(Cont.)
• [3]S. Bhattacharya, H. Kim, S. Prabh, and T. Abdelzaher.
Energy-conserving data placement and asynchronous
multicast in wireless sensor networks. In Proceedings of
the 1st international conference on Mobile systems,
applications and services, pages 173–185, New York, NY,
USA, 2003. ACM Press.
• [4] H. S. Kim, T. F. Abdelzaher, and W. H. Kwon. Minimum-energy asynchronous dissemination to mobile sinks in
wireless sensor networks. In Proceedings of the 1st
international conference on Embedded networked sensor
systems, pages 193–204, New York, NY, USA, 2003. ACM
Press.
• [5] A. Trigoni, Y. Yao, A. Demers, J. Gehrke and R.
Rajaraman. Multi-Query Optimization for Sensor
Networks. in the International Conference on Distributed
Processing on Sensor Systems (DCOSS), 2005.
65
References(Cont.)
• [6]Madden S., Franklin M.J., Hellerstein
J.M., Hong W., The Design of an
Acquisitional Query Processor For
Sensor Networks, Proc. Int. Conf. on
Management of Data (SIGMOD), San
Diego (USA), 2003.
• [7] P. Bonnet, J. Gehrke, P. Seshadri,
Towards Sensor Database Systems,
Lecture Notes in Computer Science,
2001, Springer Verlag
66