Indexing and Searching in Wireless
Sensor Networks
Demetris Zeinalipour
[ [email protected] ]
School of Pure and Applied Sciences
Open University of Cyprus
HPCL, University of Cyprus, February 14th, 2008
http://is.ouc.ac.cy/~zeinalipour/
Presentation Goals
•
To provide an overview of recent
developments in Wireless Sensor
Network Technology
•
To highlight some important
indexing and searching
challenges that arise in this context
2
Acknowledgements
• This is a joint work with my collaborators at the
University of California – Riverside.
• Our results were presented in the following
papers:
– "MicroHash: An Efficient Index Structure for FlashBased Sensor Devices",
D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos and W.
Najjar, The 4th USENIX Conference on File and Storage
Technologies (FAST’05), San Francisco, USA, December, 2005.
– " Efficient Indexing Data Structures for Flash-Based
Sensor Devices",
S. Lin, D. Zeinalipour-Yazti, V. Kalogeraki, D. Gunopulos, W.
Najjar, ACM Transactions on Storage (TOS), ACM Press, Vol.2,
No. 4, pp. 468-503, November 2006.
3
Main Citations
•
Capsule: an energy-optimized object storage system for memory-constrained
sensor devices, ACM Sensys'06 (UMass, Amherst)
Ultra-low power data storage for sensor networks, IEEE IPSN'06 (UMass,
Amherst)
TINX: a tiny index design for flash memory on wireless sensor devices, ACM
Sensys'06 (Stanford)
Re-thinking data management for storage-centric sensor networks, IEEE
CIDR'07 (UMass, Amherst)
FlashDB: dynamic self-tuning database for NAND flash, IEEE IPSN'07,
(Microsoft Research, Redmond)
MStore: Enabling Storage-Centric Sensornet Research, IEEE IPSN'07
(Stanford)
EnviroMic: Towards Cooperative Storage and Retrieval in Audio Sensor
Networks, IEEE ICDCS'07 (Univ. of Illinois at Urbana-Champaign)
•
•
•
•
•
•
•
DALi: A Communication-Centric Data Abstraction Layer for Energy-Constrained
Devices in Mobile Sensor Networks, ACM Mobisys'07 (Princeton University)
4
Presentation Outline
1.
Overview of Wireless Sensor Networks
2.
Overview of Data Acquisition Models
3.
The MicroHash Index Structure.
4.
MicroHash Experimental Evaluation
5.
Conclusions and Future Work
5
Wireless Sensor Networks
•
6
Resource constrained devices utilized for
monitoring and studying the physical
world at a high fidelity.
Wireless Sensor Device
7
Wireless Sensor Network
8
Wireless Sensor Networks
•
Applications have already emerged in:
– Environmental and habitant monitoring
– Seismic and Structural monitoring
– Understanding Animal Migrations &
Interactions between species
– Automation, Tracking, Hazard Monitoring
Scenarios, Urban Monitoring etc
9
Great Duck Island – Maine
(Temperature, Humidity etc).
Golden Gate – SF, Vibration
and Displacement of the
bridge structure
Zebranet (Kenya)
GPS trajectory
Wireless Sensor Networks
The Great Duck Island Study (Maine, USA)
Large-Scale deployment by Intel Research,
Berkeley in 2002-2003 (Maine USA).
Focuses on monitoring microclimate in and
around the nests of endangered species
which are sensitive to disturbance.
•
•
•
They deployed more than 166 motes
installed in remote locations (such as 1000
feets in the forest)
10
Wireless Sensor Networks
WebServer
11
Wireless Sensor Networks
The James Reserve Project, CA, USA
12
Available at: http://dms.jamesreserve.edu/
The Anatomy of a Sensor Device
• Processor, in various (sleep, idle, active) modes
• Power source AA or Coin batteries, Solar Panels
• SRAM used for the program code and for inmemory buffering.
• LEDs used for debugging
• Radio, used for
transmitting the acquired
Storage
data to some storage site
(SINK) (9.6Kbps-250Kbps)
• Sensors: Numeric readings in a limited range
(e.g. temperature -40F..+250F with one decimal
13
point precision) at a high frequency (2-2000Hz)
Sensor Devices & Capabilities
Sensing Capabilities
• Light
• Temperature
• Humidity
• Pressure
• Detect Tone
• Wind Speed
• Soil Moisture
• Location (GPS)
• etc…
14
UCRiverside
RISE
Xbow’s
Mica
UC-Berkeley
mica2dot
Xbow’s
Telos
Xbow’s
i-mote2
Characteristics
1. The Energy Source is limited.
Energy source: AA batteries, Solar Panels
2. Local Processing is cheaper than transmitting
over the radio.
Transmitting 1 Byte over the Radio consumes as
much energy as ~1200 CPU instructions.
3. Local Storage is cheaper than transmitting
over the radio.
Transmitting 512B over a single-hop 9.6Kbps
(915MHz) radio requires 82,000μJ, while writing
to local flash only 760μJ.
15
The Programming Cycle
•
The Operating System
TinyOS (UC-Berkeley): Component-based architecture that
allows programmers to wire together the minimum required
components in order to minimize code size and energy
consumption
(The operating system is really a number of libraries that can be
statically linked to the sensor binary at compile time)
•
The Programming Language
nesC (Intel Research, Berkeley): an event-based C-variant
optimized for programming sensor devices
event result_t Clock.fire() {
state = !state;
if (state)
call Leds.redOn();
else
call Leds.redOff();
}
“Hello World”: Blinking the red LED!
The Programming Cycle
The Testing Environment
•
•
Debugging code directly on a sensor device is a tedious
procedure
nesC allows programmers to compile their code to
•
•
•
•
A Binary File that is installed on the sensor
A Binary File that runs on a PC
TOSSIM (TinyOS Simulation) is the environment which allows
programmers to simulate the mote binary directly on a PC.
This enables accurate simulations, fine grained energy
modeling (with PowerTOSSIM) and visualization (TinyViz)
The Programming Cycle
The Pre-deployment Environment
•
•
•
•
Once you have created and debugged you code you can perform a
deployment in a laboratory environment.
Harvard’s MoteLab uses 190 sensors, powered from wall power
interconnected with an Ethernet connection.
The Ethernet is just for debugging and reprogramming, while the
Radio for actual communication between motes
Motes can be reprogrammed through a web interface.
Available at: http://motelab.eecs.harvard.edu/
Presentation Outline
1.
Overview of Wireless Sensor Networks (WSN)
2.
Overview of Data Acquisition Models
3.
The MicroHash Index Structure
4.
MicroHash Experimental Evaluation
5.
Conclusions and Future Work
19
The Centralized Storage Model
• A Database that collects readings from many
Sensors.
• Centralized: Storage, Indexing, Query
Processing, Triggers, etc.
20
Centralized Storage I
Crossbow’s MoteView software
• No in-network Aggregation
• No in-Network Filtering
21
Available at: http://www.xbow.com/
Centralized Storage II
TinyDB - A Declarative Interface for Data
Acquisition in Sensor Networks.
• In-Network Aggregation
• In-Network Filtering (i.e., WHERE clause)
v1
75
90
85
90
v3
v2
70
v4
90
v5
22
MAX
90
65
70
70
e.g., SELECT MAX(temp)
FROM sensors
Available at: http://telegraph.cs.berkeley.edu/tinydb/
Centralized Storage: Conclusions
•
Frameworks such as TinyDB:
- Are suitable for continuous queries.
- Push aggregation in the network but keep
much of the processing at the sink.
•
New Challenges:
- Many applications do not require the query to
be evaluated continuously (e.g., Average
temperature in the last 6 months?)
- In many applications there is no sink (e.g.,
remote deployments and mobile sensor nets)
- Local Storage on sensors keeps increasing
(e.g., RISE and more recently imote2)
Our Model: In-Situ Data Storage
1. The data remains In-situ (at the generating site) in
a sliding window fashion.
2. When required, users conduct on-demand
queries to retrieve information of interest.
The Sink
Programming board
24
A Network of Sensor Databases
In-Situ Data Storage: Motivation
Soil-Organism Monitoring
(Center for Conservation Biology, UCR)
– A set of sensors monitor the CO2 levels in the soil over
a large window of time.
– Not a real-time application.
– Many values may not be very interesting.
25
D. Zeinalipour-Yazti, S. Neema, D. Gunopulos, V. Kalogeraki and W. Najjar,
"Data Acquision in Sensor Networks with Large Memories", IEEE Intl. Workshop on
Networking Meets Databases NetDB (ICDE'2005), Tokyo, Japan, 2005.
Presentation Outline
1.
Overview of Wireless Sensor Networks
2.
Overview of Data Acquisition Models
3.
The MicroHash Index Structure
4.
MicroHash Experimental Evaluation
5.
Conclusions and Future Work
26
Flash Memory at a Glance
•
The most prevalent storage medium used for Sensor
Devices is Flash Memory (NAND Flash)
Fastest growing memory market (‘05 $8.7B, ‘06:$11B)
•
(NAND) Flash Advantages
• Simple Cell Architecture (high
capacity in a small surface)
• Fast Random Access (50-80 μs)
compared to 10-20ms in Disks
• Economical Reproduction
• Shock Resistant
• Power Efficient
27
Surface mount
NAND flash
Removable NAND Devices
Flash Memory at a Glance
1.
2.
3.
Write-Constraint: Writing can only be performed at
a page granularity (256B~512B), after the respective
page (and its respective 8KB~64KB block!) has been
deleted
Delete-Constraint: Erasure of a page can only be
performed at a block granularity (i.e. 8KB~64KB)
Wear-Constraint: Each page can only be written a
limited number of times (typically 10,000-100,000)
Flash Media
Block 1
Measurements using RISE
Energy (Page Size = 512 B)
Block 2
Read = 24 μJ
Write =763μJ
Block n
Occupied Page
28
Empty Page
Block Erase =425μJ
Asymmetric Read/Write Energy Cost :
MicroHash Objectives
General Objectives
•
Provide efficient access to any record stored
on flash by timestamp or value
Execute a wide spectrum of queries based on
our index, similarly to generic DB indexes.
•
Design Objectives:
•
•
•
Avoid wearing out specific pages.
Minimize random access deletions of pages.
Minimize SRAM structures
•
•
29
SRAM is extremely limited (8KB-64KB).
Small memory-footprint => quick initialization.
Main Structures
• 4 Page Types: a) Root Page, b) Directory Page,
c) Index Page and d) Data Page
• 4 Phases of Operation: a) Initialization, b)
30 Growing, c) Repartition and d) Garbage Collect.
Growing the MicroHash Index
• Collect data in an SRAM buffer page Pwrite
• When Pwrite is full flush it out to flash media
• Next create index records for each data record
in Pwrite
• If SRAM gets full, Index pages are forced out to
flash media by an LRU policy.
Buffer
Pwrite
(ts, 74F)
40
50
60
Index
70
80
31
x
90
Directory
Index Pages
Growing the MicroHash Index
A populated Flash Media
Flash Media
idx: next empty page
Garbage Collection in MicroHash
• When the media gets full some pages need to
be deleted => delete the oldest pages.
• Oldest Block? The next block following the idx
pointer.
Note:
• This might create invalid
index records.
• This will be handled by
our search algorithm
33
Directory Repartition in MicroHash
•
MicroHash starts out with a directory that is
segmented into equiwidth buckets
–
•
Not efficient as certain buckets will not be
utilized
–
34
e.g., divide the temperature range [-40,250] into c
buckets)
Consider the first few or last few buckets below.
Directory Repartition in MicroHash
•
•
If bucket A links to more than τ index pages,
evict the least used bucket B and segment
bucket A into A and A’
We want to avoid bucket reassignments of old
records as this would be very expensive
Example: τ=2
C: #entries since last split
S: timestamp of last addition
35
Searching in MicroHash
• Searching by value
“Find the timestamp (s) on which the temperature was
100F”
– Simple operation in MicroHash
– We simply find the right Directory Bucket, from there the
respective index page and then data record (page-by-page)
• Searching by timestamp
“Find the temperature of some sensor on a given
timestamp tq”
– Problem: Index pages are mixed together with data pages.
– Solutions:
36
1. Binary Search (O(logn), 18 pages for a 128MB media)
2. LBSearch (less than 10 pages for a 128MB media)
3. ScaleSearch (better than LBSearch, ~4.5 pages for a 128MB
media)
LBSearch and ScaleSearch
Solutions to the Search-by-timestamp problem:
A) LBSearch: We recursively create a lower
bound on the position of tq until the given
timestamp is located.
B) ScaleSearch: Quite similar to LBSearch,
however in the first step we proceed more
aggressively (by exploiting data distribution)
Query
tq=500
37
tq=300
tq=350
tq=420
tq=490
tq=500
Presentation Outline
1.
Overview of Wireless Sensor Networks
2.
Overview of Data Acquisition Models
3.
The MicroHash Index Structure
4.
MicroHash Experimental Evaluation
5.
Conclusions and Future Work
38
Experimental Evaluation
• Implemented MicroHash in nesC.
• We tested it using TinyOS along with a
trace-driven experimental methodology.
• Datasets:
– Washington State Climate
• 268MB dataset contains readings in 2000-2005.
– Great Duck Island
• 97,000 readings between October and November
2002.
• Evaluation Parameters: i) Space
Overhead, ii) Energy Overhead, iii) Search
Performance
39
Space Overhead of Index
• Index page overhead Φ = IndexPages/(DataPages+IndexPages)
• Two Index page layouts
– Offset, an index record has the following form {datapageid,offset}
– NoOffset, in which an index record has the form {datapageid}
• 128 MB flash media (256,000 pages)
40
Space Overhead of Index
Increasing the Buffer Decreases the Index Overhead
Black
denotes the
index pages
41
Search Performance
• 128 MB flash media (256,000 pages), varied SRAM (buffer) size
• 2 Index page layouts
– Anchor: Index Pages store the last known timestamp
– No Anchor: Timestamp is only stored in Data Pages
42
Indexing the Great Duck Island Trace
• Used 3KB index buffer and a 4MB flash card to store all
the 97,000 20-byte data readings.
– The index pages never require more than 28% additional space
– Indexing the records has only a small increase in energy
demand: the energy cost of storing the records on flash
without an index is 3042mJ
– We were able to find any record by its timestamp with 4.75 page
reads on average
43
Presentation Outline
1.
Overview of Wireless Sensor Networks
2.
Overview of Data Acquisition Models
3.
The MicroHash Index Structure
4.
MicroHash Experimental Evaluation
5.
Conclusions and Future Work
44
Conclusions & Future Work
• We proposed the MicroHash index, which is an
efficient external memory hash index that
addresses the distinct characteristics of flash
memory
• Our experimental evaluation shows that the
structure we propose is both efficient and
practical
• Future work:
– Develop a complete library of indexes and
data structures (stacks, queues, b+trees, etc.)
– Buffer optimizations and Online Compression
– Support Range Queries
45
Indexing and Searching in Wireless
Sensor Networks
Demetris Zeinalipour
Thank you!
Questions?
Related Publications
•"MicroHash: An Efficient Index Structure for Flash-Based Sensor Devices", D.
Zeinalipour,S. Lin, V. Kalogeraki, D. Gunopulos, W. Najjar, In USENIX FAST’05.
• " Efficient Indexing Data Structures for Flash-Based Sensor Devices",
ACM Transactions on Storage (TOS), November 2006.
Presentation and publications available at:
http://is.ouc.ac.cy/~zeinalipour/
Searching Bottlenecks
• Index Pages written on flash might not be fully
occupied
• When we access these pages we transfer a lot
of empty bytes (padding) between the flash
media and SRAM.
• Proposed Solutions:
– Solution 1: Two-Phase Page Reads
– Solution 2: ELF-like Chaining of Index Pages
48
Improving Search Performance
• Solution 1: Utilize Two-Phase Page Reads.
– Reads the 8B header from the flash media.
– Then read the correct payload in the next
phase.
49
Improving Search Performance
• Solution 2: Avoid non-full index pages using ELF*.
– ELF: a linked list in which each page, other than the
last page, is completely full.
– keeps copying the last non-full page into a newer
page, when new records are requested to be added.
*Dai et. al., Efficient Log Structured Flash File System, SenSys 2004
50
Search Performance
•
•
51
We compared MicroHash vs. ELF Index Page
Chaining by searching all values in the range [20,100]
Keeping full index pages increases search
performance but decreases insertion performance.
Increasing search
performance using ELF
(10% less reads)
Decreasing indexing
performance using ELF
(15% more writes)
Descargar

MicroHash: An efficient Index Structure for Wireless