Grid Computing
Sudhindra Rao
Outline
• History of Distributed Computing
• Grid – Definition, Architecture details
• P2P versus Grid
• Web services
• Java – anywhere computing paradigm
• Middleware
• Grid models and recent research
• Research directions
• Tools and grids available
• References
History
• Shift from Centralized Computing to
Distributed Computing – powerful
processors, faster networks
• Parallel computing based on MPI and
PVM models
• Cluster Computing
• Peer-to-peer computing
• Grid computing
Application and Infrastructure technology trends
[Figure: application and infrastructure technology trends plotted against time.]
• Application stages: Serial applications, Parallel applications, Client Server, P2P, Service virtualization
• Application technologies shown: Multi-threaded, MPI/PVM, OpenMP, CORBA, COM/DCOM, .NET, J2EE, Custom distributed systems, App Integration, Reliable messaging, Reliable execution, Web services, Service Registration, Service discovery, Location independent service invocation, Lifting apps off the servers
• Infrastructure stages: Monolithic Mainframes, Open Distributed Open Systems, Clusters, Infrastructure Virtualization
• Infrastructure technologies shown: Unix, Linux, Windows, DRM, Grid, OGSA, Data Grid, Service provisioning
• Storage: Direct attached Storage in the earlier stages, Virtualized Storage in the last
• Other labels: NETWORKING, COMPUTING
Technology Evolution: Cluster, Grid, P2P
[Figure: timeline of computing and networking milestones, with axis marks at 1960, 1970, 1975, 1980, 1985, 1990, 1995 and 2000.]
• Milestones shown: Sputnik, ARPANET, Email, Mainframes, Minicomputers, Crays, XEROX PARC worm, PCs, TCP/IP, Ethernet, Workstations, MPPs, WS Clusters, PC Clusters, HTC, IETF, Internet Era, HTML, Mosaic, W3C, WWW Era, XML, Web Service, PDAs, P2P, Grids
What is Cluster/Grid?
• A type of parallel and distributed system that
enables the sharing, selection, and aggregation
of resources distributed across administrative domains
depending on their availability, capability,
performance, cost, and users' quality-of-service
requirements.
[Figure: a single cluster compared with a Grid that spans several clusters.]
Approaches for Parallel
Programming
• Implicit Parallelism
– Supported by parallel languages and parallelizing
compilers that take care of identifying parallelism, the
scheduling of calculations and the placement of data.
• Explicit Parallelism
– In this approach, the programmer is responsible for most
of the parallelization effort such as task decomposition,
mapping task to processors, the communication structure.
– This approach is based on the assumption that the user is
often the best judge of how parallelism can be exploited for
a particular application.
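Sketch: explicit parallelism in Python
• A minimal illustration of the explicit approach, not taken from the slides: the programmer does the task decomposition, the mapping of tasks to workers, and the collection of results.
• The function names and the sum-of-squares workload are hypothetical. Threads are used only to keep the sketch self-contained; for CPU-bound work in CPython a process pool would likely be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, n_tasks):
    """Task decomposition: split data into n_tasks contiguous chunks."""
    size, extra = divmod(len(data), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """The work done by one task."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Map one chunk to each worker, then combine the partial results."""
    chunks = decompose(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Communication structure: scatter chunks, gather partial sums.
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)
```

• With implicit parallelism, the decompose and map steps above would be left to the compiler or language runtime.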
Parallel Programming Models and
Tools
• Shared Memory Model
– DSM
– Threads/OpenMP (enabled for clusters)
– Java threads (HKU JESSICA, IBM cJVM)
• Message Passing Model
– PVM
– MPI
• Hybrid Model
– Mixing shared and distributed memory model
– Using OpenMP and MPI together
• Object and Service Oriented Models
– Wide area distributed computing technologies
• OO: CORBA, DCOM, etc.
• Services: Web Services-based service composition
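Sketch: shared memory versus message passing
• A small Python comparison of the first two models, written for illustration only. It uses threads and a standard-library queue as a stand-in for a mailbox; it is not PVM or MPI, and the function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(values, n_threads=4):
    """Shared-memory model: all threads update one variable under a lock."""
    total = {"value": 0}          # state visible to every thread
    lock = threading.Lock()       # protects the shared total

    def worker(part):
        local = sum(part)         # compute privately first
        with lock:                # then update shared memory safely
            total["value"] += local

    threads = [threading.Thread(target=worker, args=(values[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, n_workers=4):
    """Message-passing model: workers share nothing and send their results."""
    inbox = queue.Queue()         # the coordinator's mailbox

    def worker(rank, part):
        inbox.put((rank, sum(part)))   # "send" the partial result

    threads = [threading.Thread(target=worker, args=(rank, values[rank::n_workers]))
               for rank in range(n_workers)]
    for t in threads:
        t.start()
    # "Receive" exactly one message per worker.
    partials = [inbox.get() for _ in range(n_workers)]
    for t in threads:
        t.join()
    return sum(value for _rank, value in partials)
```

• A hybrid program would combine both: message passing between nodes, shared memory within each node.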
Levels of Parallelism
Code-Granularity | Code Item | Exploited by
Large grain (task level) | Program | PVM/MPI
Medium grain (control level) | Function (thread) | Threads
Fine grain (data level) | Loop (Compiler) | Compilers
Very fine grain (multiple issue) | With hardware | CPU
[Figure: tasks i-1, i and i+1 broken down into functions func1(), func2(), func3(), then into array statements a(0)=.., b(0)=.., a(1)=.., b(1)=.., a(2)=.., b(2)=.., and finally into single operations (+, x, Load).]
Cluster Architecture
[Figure: layered cluster architecture, top to bottom.]
• Sequential Applications and Parallel Applications
• Parallel Programming Environment
• Cluster Middleware (Single System Image and Availability Infrastructure)
• Nodes: PC/Workstation, each with Communications Software and Network Interface Hardware
• Cluster Interconnection Network/Switch
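A Typical P2P Computing Environment
[Figure: an Application and peers P1, P2, P3, p4, p5, ..., pM, pN, each with a Peer Agent, located through a Peer Discovery Service. The application sends a Request to peers; one peer answers "Sorry, I am busy." and another returns a Response.]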
CPM: DC Economy-based P2P Computing (Jxta based Implementation)
[Figure: components shown are the Market Server (Discovery, Membership), Market Repository, User (Consumer), Resources (Provider), and the CPM Agent with Trader, Bill, Job Management and Accounting.]
Definition of a Grid
• A Grid is a type of parallel and distributed system that enables
the sharing, selection, and aggregation of geographically
distributed "autonomous" resources dynamically at runtime
depending on their availability, capability, performance, cost,
and users' quality-of-service requirements.
• Coordinated resource sharing and problem solving in dynamic,
multi-institutional Virtual Organizations (VOs)
• Most current distributed technologies facilitate this in a local
environment
• J2EE, CORBA, VPN are a few examples
• Nomadic users and applications provide new avenues for providing
such a service
• Mechanisms are required to coordinate trusted and untrusted access to
resources
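Sketch: selecting a resource at runtime
• A minimal Python illustration of the definition above: pick a resource by availability, capability, cost and the user's quality-of-service requirements (here a deadline and a budget).
• All names, fields and units (Resource, Job, mips, cost_per_sec) are assumptions made for this sketch; they do not come from any particular Grid toolkit.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    available: bool       # availability at selection time
    mips: float           # capability, in million instructions per second
    cost_per_sec: float   # price set by the resource owner


@dataclass
class Job:
    length_mi: float      # work to do, in million instructions
    deadline_s: float     # QoS requirement: finish within this many seconds
    budget: float         # QoS requirement: spend at most this much


def select_resource(job, resources):
    """Return the name of the cheapest feasible resource, or None."""
    feasible = []
    for r in resources:
        if not r.available:
            continue
        runtime = job.length_mi / r.mips
        cost = runtime * r.cost_per_sec
        if runtime <= job.deadline_s and cost <= job.budget:
            feasible.append((cost, runtime, r.name))
    # Cheapest first; ties are broken by shorter runtime.
    return min(feasible)[2] if feasible else None
```

• Because availability and prices change, a broker would re-run a selection like this each time a job is dispatched.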
Grid Architecture
A Typical Grid Computing Environment
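[Figure: an Application submits work to a Grid Resource Broker, which consults a Grid Information Service and reaches resources R1, R2, R3, R4, R5, R6, ..., RN, including a database. More than one Grid Resource Broker and Grid Information Service appear in the figure.]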
Virtual Drug Design
A Virtual Lab for “Molecular Modeling for Drug Design” on P2P Grid
[Figure: a Resource Broker (RB) works with the Data Replica Catalogue, Grid Market Directory, Grid Info. Service and Grid Trade Servers (GTS). The RB maps suitable Grid nodes and Protein DataBank sources (PDB1, PDB2).]
Requests shown in the figure:
• “Give me list PDBs sources of type aldrich_300?”
• “Screen 2K molecules in 30min. for $10”
• “mol.5 please?”
Scalable Seamless Computing: Breaking Administrative Barriers
[Figure: PERFORMANCE plotted against Administrative Barriers.]
• Administrative Barriers: Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Galaxy
• Platforms: Desktop; SMPs or SuperComputers; Local Cluster; Enterprise Cluster/Grid; Global Cluster/Grid; Inter Planetary Grid!
Basic Elements
Security
Computational Economy
Uniform Access
Resource Discovery
Resource Allocation & Scheduling
System Management
Data locality
Application Development Tools
Network Management
Cluster, Grid, P2P: Characteristics
Characteristic | Cluster | Grid | P2P
Population | Commodity Computers | High-end computers | Edge of network (desktop PC)
Ownership | Single | Multiple | Multiple
Discovery | Membership Services | Centralised Index & Decentralised Info | Decentralised
User Management | Centralised | Decentralised | Decentralised
Resource mgmt | Centralised | Distributed | Distributed
Allocation/Scheduling | Centralised | Decentralised | Decentralised
Inter-Operability | VIA based? | No standards yet | No standards
Single System Image | Yes | No | No
Scalability | 100s | 1000? | Millions? [@Home]
Capacity | Guaranteed | Varies, but high | Varies
Throughput | Medium | High | Very High
Speed (Latency, Bandwidth) | Low, High | High, Low | High, Low
Issues in Grid computing
• Protocols required for interoperability
• Define standard services – for access of
computation, data, resource discovery etc.
• APIs and SDKs to assist such protocol and
service deployment
• Current Distributed Computing – Resource
sharing in single organization – limited to
sharing certain resource types only
• Need of services to support a common set of
applications – Middleware
Projects
• Globus – A toolkit for grid computing
infrastructure development
• Gridbus
• Legion
• OGSA – Standard for developing Grid
application infrastructure (derived from
Globus)
[Figure labels, approaches taken by Grid projects: mix-and-match; Object-oriented; Internet/partial-P2P; Network enabled Solvers; Economic-based Utility / Service-Oriented Computing (Nimrod-G)]
Some Global Initiatives
• Australia
– Nimrod-G
– Gridbus
– GridSim
– Virtual Lab
– DISCWorld
– GrangeNet.
– ..etc
• Europe
– UK eScience
– EU Data Grid
– Cactus
– XtremeWeb
– ..etc.
• India
– I-Grid
• Japan
– Ninf
– DataFarm
• Korea
– N*Grid
• Singapore
– NGP
• USA
– AppLeS
– Globus
– Legion
– Sun Grid Engine
– NASA IPG
– Condor-G
– Jxta
– NetSolve
– AccessGrid
– and many more...
• Cycle Stealing & .com Initiatives
– Distributed.net
– SETI@Home, ….
– Entropia, UD, SCS,….
• Public Forums
– Global Grid Forum
– Australian Grid Forum
– IEEE TFCC
– CCGrid conference
– P2P conference
Globus Approach
• A toolkit and collection of services addressing key
technical problems
– Modular “bag of services” model
– Not a vertically integrated solution
– General infrastructure tools (aka middleware) that can
be applied to many application domains
• Inter-domain issues, rather than clustering
– Integration of intra-domain solutions
• Distinguish between local and global services
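Grid computing – SuperScalar model
• Ease the programming of GRID applications
• Basic idea: the superscalar processor model carried over to the Grid, with time scales going from ns to seconds/minutes/hours
[Figure: block diagram of an IBM superscalar processor (FXU, ISU, FPU, IDU, LSU, IFU and BXU units, L2 caches, L3 Directory/Control) shown next to a Grid.]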
Automatic code generation
[Figure: gsstubgen processes app.idl and produces the client and server files.]
• Input: app.idl
• client: app.c, app-stubs.c, app.h
• server: app-worker.c, app-functions.c
[Figure: at run time the client (app.c, app-stubs.c, GRID superscalar runtime) reaches each server i (app-worker.c, app-functions.c) through GT2.]
Production Grids & Testbeds
NASA’s Information Power Grid
The Alliance National Technology Grid
GUSTO Testbed
Testbed Statistics
• Grid Nodes: 218 distributed across 62 sites in 21 countries.
– Laptops, desktop PCs, WS, SMPs, Clusters, supercomputers
– Total CPUs: 3000+ (~3 TeraFlops)
• CPU Architecture:
– Intel x86, IA64, AMD, PowerPC, Alpha, MIPS
• Operating Systems:
– Windows or Unix-variants – Linux, Solaris, AIX, OSF, Irix, HP-UX
• Intranode Network:
– Ethernet, Fast Ethernet, Gigabit, Myrinet, QsNet, PARAMNet
• Internet/Wide Area Networks:
– GrangeNet, AARNet, ERNet, APAN, TransPAC, & so on.
Grid Technologies and Applications
[Figure: layered Grid software stack, top to bottom.]
• Grid Apps.: Natural Language Engineering, High Energy Physics, Molecular Docking, Portfolio Analysis, Brain Activity Analysis, GAMESS Chemistry, …
• User-Level Middleware (Grid Tools): High-level Services and Tools (Programming Framework, G-Monitor, Gridscape); Grid Brokers & Schedulers (Nimrod-G, Gridbus Data Broker)
• Core Grid Middleware: Globus (MDS, GRAM, GASS), Alchemi (.NET Grid Services + Clustering of desktop PCs), Data Management Services, Grid Bank, GMD, PKI-based Grid Security Interface (GSI)
• Grid Fabric: JVM, Tomcat, .NET, Condor, PBS, SGE, LSF; Windows, Solaris, Linux, AIX, IRIX, OSF1, HP UX
Classes of Applications that can be
powered by Grids
• Distributed HPC (Supercomputing):
– Computational science.
• High-Capacity/Throughput Computing:
– Large scale simulation/chip design & parameter studies.
• Content Sharing (free or paid):
– Sharing digital contents among peers (e.g., Napster)
• Remote software access/renting services:
– Application service providers (ASPs) & Web services.
• Data-intensive computing:
– Drug Design, Particle Physics, Stock Prediction...
• On-demand, realtime computing:
– Medical instrumentation & Mission Critical.
• Collaborative Computing:
– Collaborative design, Data exploration, education.
• Service Oriented Computing (SOC):
– Towards economic-based Utility Computing: New paradigm, new applications, new
industries, and new business.
Analysis Summary
Application | Data Size | Processing Time | Nodes
Belle Analysis (HEP) | 300 MB input (100 jobs – 3MB each) | 30 min. | Australia, Japan
Financial Portfolio Analysis | 50 MB output (50 jobs – 1MB each) | 20 min. | Global
Newswire Indexing | 80 MB input (12 jobs – 7MB each job) | 20 min. | GrangeNet, Australia
GAMESS | 4KB for each job. Total output: 860MB compressed | Each job took 5-78 minutes. Total 15 hours | Global (130 nodes, 15 sites)
What is Grid computing?
• Grid is the next-generation internet
• Grid requires a distributed operating
system
• Grid requires new programming models
• Grid does not need high performance
computers
Research directions
• Publisher/Subscriber systems on the Grid – How
can the grid be used to manage such
applications, and what are the issues?
• What levels of selectivity and regionalism are
expected from VOs?
• How to handle the dynamics of the topology and
nodes?
• Addressing QoS on Grid – best effort?
• Efficient Discovery and Retrieval
• Replication techniques
References
• List of available resources on grid computing: http://www.gridcomputing.com
• Foster, I., Kesselman, C., and Tuecke, S., “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Intl J. Supercomputer Applications, 2001
• Casanova, H., “Distributed Computing Research Issues in Grid Computing”, ACM SIGACT News Distributed Computing Column 8, July 2002
• Lau, F., Ho, R., and Wang, C., “Grid Computing: Challenges and Design Approaches”
• Foster, I., and Kesselman, C. (Editors), “The Grid: Blueprint for a New Computing Infrastructure”, Elsevier, 2004