IBM InfoSphere Streams
Enabling a smarter planet
Roger Rea
InfoSphere Streams Product Manager
[email protected]
Sept 15, 2010
© 2010 IBM Corporation
Moore’s Law drives new waves of technology …
[Chart: billions of units shipped (scale from 1 to 2,000) by year, 1960 to 2015. Technology waves labeled: S/360, IBM PC, World Wide Web, the “Internet of Things”. Chip types labeled: multicore chips, embedded chips.]
Source: IDC, SSR and IBM Market Insight
Welcome to the Decade of Smart
Time is ripe for a new era of computing in support of Big Data
• Emerging trends create need for new languages
• Scientific programming → Fortran
• Business programming → Cobol
• Systems programming at higher level → C
• Increased productivity → C++
• Web programming → Java
• Streaming data sources and multicore architectures → Streams Processing Language
IBM InfoSphere Streams
• Streaming analytic applications
• Multiple input streams
• Advanced streaming analytics
• Eclipse based IDE
• Define sources, apply operators, define intermediary and final output sinks
• User defined operators in Java or C++
• Optimizing compiler automates deployment and connections
[Diagram labels: Source Adapters; Operator Repository; Sink Adapters; InfoSphere Streams Studio (IDE for Streams Processing Language); Automated, Optimized Deploy and Management (Scheduler)]
• Extremely low latency
• Cluster of up to 125 nodes
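
The source, operator, sink flow listed on this slide can be illustrated outside of Streams. The sketch below is plain Python, not Streams Processing Language, and every name in it (`source`, `filter_op`, `running_mean_op`, `sink`, `run_pipeline`) is a hypothetical stand-in chosen for this illustration. It only shows the shape of the idea: tuples flow one at a time from a source, through chained operators, into a sink.

```python
from collections import deque
from typing import Iterable, Iterator, List


def source(readings: Iterable[float]) -> Iterator[float]:
    """Stand-in for a source adapter: emits one tuple at a time."""
    for value in readings:
        yield value


def filter_op(stream: Iterator[float], threshold: float) -> Iterator[float]:
    """Operator: pass through only tuples at or above the threshold."""
    for value in stream:
        if value >= threshold:
            yield value


def running_mean_op(stream: Iterator[float], window: int) -> Iterator[float]:
    """Operator: emit the mean of the last `window` tuples seen so far."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


def sink(stream: Iterator[float]) -> List[float]:
    """Stand-in for a sink adapter: collects the final output stream."""
    return list(stream)


def run_pipeline(readings: Iterable[float],
                 threshold: float = 10.0,
                 window: int = 2) -> List[float]:
    """Wire source -> filter -> running mean -> sink."""
    filtered = filter_op(source(readings), threshold)
    return sink(running_mean_op(filtered, window))


if __name__ == "__main__":
    print(run_pipeline([5.0, 12.0, 20.0, 3.0, 16.0]))
```

Because each stage is a generator, no stage waits for the whole input before producing output, which is the continuous processing style the slide describes.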
Scalable stream processing
InfoSphere Streams provides:
• A programming model and IDE for defining data sources and software analytic modules, called operators, that are fused into processing elements (PEs)
• Infrastructure to support the composition of scalable stream processing applications from these components
• Deployment and operation of these applications across distributed x86 processing nodes, when scaled processing is required
• Stream connectivity between data sources and the PEs of a stream processing application
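
As a rough illustration of operators being fused into PEs, with stream connectivity between PEs, the Python sketch below models a PE as a group of operators that hand tuples to each other by direct function call, and models connections between PEs as queues. This is an assumption-laden toy, not how Streams is implemented: `fuse`, `run_pe`, and `run_application` are hypothetical names, and threads stand in for separate processes.

```python
import queue
import threading
from typing import Callable, List

Operator = Callable[[int], int]

_DONE = object()  # end-of-stream marker passed between PEs


def fuse(operators: List[Operator]) -> Operator:
    """Fuse operators into one PE: tuples move by direct function call."""
    def pe(value: int) -> int:
        for op in operators:
            value = op(value)
        return value
    return pe


def run_pe(pe: Operator, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run one PE: read tuples from its input stream, write to its output."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(pe(item))


def run_application(pes: List[Operator], inputs: List[int]) -> List[int]:
    """Connect PEs in a chain with queues and push the inputs through."""
    queues = [queue.Queue() for _ in range(len(pes) + 1)]
    threads = [
        threading.Thread(target=run_pe, args=(pe, queues[i], queues[i + 1]))
        for i, pe in enumerate(pes)
    ]
    for t in threads:
        t.start()
    for value in inputs:
        queues[0].put(value)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def double(x: int) -> int:
    return x * 2


def add_one(x: int) -> int:
    return x + 1


def square(x: int) -> int:
    return x * x


# Same three operators, grouped two different ways.
one_pe = run_application([fuse([double, add_one, square])], [1, 2, 3])
two_pes = run_application([fuse([double, add_one]), fuse([square])], [1, 2, 3])
```

The two groupings compute the same results; only the boundary where tuples cross a queue changes, which is the trade-off that fusion controls.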
Streams offers tremendous deployment flexibility
With only a simple recompile of the application:
• All on one machine, fused into one multi-threaded process
• All on one machine, each operator in its own process
• Each operator in its own process, each process on its own machine
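
The three deployment options above amount to three different mappings of the same operators onto hosts and processes. The Python sketch below makes that mapping explicit. It is purely illustrative: `plan_placement` and the mode names are hypothetical, and in Streams the choice is made at compile time rather than by calling a function like this.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[str, int]]  # operator -> (host, process index)


def plan_placement(operators: List[str], hosts: List[str], mode: str) -> Placement:
    """Map each operator to a (host, process) pair for one deployment mode."""
    if not hosts:
        raise ValueError("at least one host is required")

    if mode == "fused":
        # All operators share one multi-threaded process on one machine.
        return {op: (hosts[0], 0) for op in operators}

    if mode == "process_per_operator":
        # One machine, but each operator gets its own process.
        return {op: (hosts[0], i) for i, op in enumerate(operators)}

    if mode == "host_per_operator":
        # Each operator gets its own process on its own machine.
        if len(hosts) < len(operators):
            raise ValueError("need one host per operator")
        return {op: (hosts[i], 0) for i, op in enumerate(operators)}

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    ops = ["source", "filter", "sink"]
    machines = ["hostA", "hostB", "hostC"]
    for mode in ("fused", "process_per_operator", "host_per_operator"):
        print(mode, plan_placement(ops, machines, mode))
```

The operator list never changes between modes; only the placement does, which is why the application logic itself does not need to be rewritten.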
ANISE: Active Network for Information from Synchrotron Experiments
High-speed network to process data from synchrotrons in Canada and the US over the CANARIE network
[Architecture diagram, labels as extracted]
• Canadian Light Source, Canada: Science Studio; Laboratory Control Module; Client Services Layer; Browser (two); Business Model Layer; Service Proxies; Persistence Layer; Device Proxies; Beamline IOCs
• Argonne Lab, US: Laboratory Control Module; XRF Processing; XRD Processing; ANISE; Data Service (two); Processing Service (two); Beamline IOCs; Stream Computing
• Legend: Science Studio specific component; general, common component
TerraEchos Adelos™: Covert Intrusion Detection
• State-of-the-art covert surveillance based on the Streams platform
• Acoustic signals from buried fiber optic cables are monitored, analyzed and reported in real time to locate intruders
• Currently designed to scale up to 1,600 streams of raw binary data
Forecasting Space Weather at LOFAR Outrigger in Scandinavia (LOIS)
[Diagram labels: Solar flares; Triaxial antenna; InfoSphere Streams]
• Radio signal input and data preparation
+ Signal detection and noise filtering
+ Strength and 3D directional analysis
= Space weather prediction regarding impact on satellites and electric grids
Swedish Institute of Space Physics
Real Time Marine Mammal Position and Behavior Modeling
[Diagram: steps joined by “+” and “=” signs; labels as extracted]
• Filter wind & wave noise
• Model marine mammal environment
• Correlate to Galway Bay ecosystem
• Analytics & sensors
• InfoSphere Streams
• Advanced acoustical analytics
What are key advantages of Streams?
Language built for streaming applications:
• Reusable operators
• Rapid application development
• Continuous “pipeline” processing
Compiling groups of operators into single processes enables:
• Efficient use of cores
• Distributed execution
• Very fast data exchange
• Can be automatic or tuned
• Can be scaled with the push of a button
Use the data that gives you a competitive advantage:
• Can handle virtually any data type
• Use data that is too expensive and time sensitive for other approaches
Easy to extend:
• Built-in adapters
• Extend with C++ and Java
• Extend running applications
Extremely flexible and high performance transport:
• Very low latency
• High data rates
Questions?