Holding slide prior to starting
show
(Some) Key Issues in Grid Computing
David Walker
School of Computer Science
Cardiff University
http://www.cs.cf.ac.uk/user/David.W.Walker
Main Thesis of Talk
• At a surface level many aspects of Grid
Computing appear to be straightforward, and
reduce to simple programming tasks and the
use of existing tools.
• This talk aims to show that for domain
scientists to effectively use the Grid many
challenging CS issues need to be addressed.
A Typical Scientific Process
Key Elements of the Grid
• The specification of problems – how do you
program the Grid?
• The dynamic discovery of Grid resources.
• Provenance support for Grid applications.
• The interoperability and federation of different
Grid middleware stacks.
• Grid access to legacy applications.
• Support for remote collaboration over the
Grid.
A Simple Example
• A simple use of the Grid involves the use of a
PSE or portal to do a set of pre-determined
tasks.
• This corresponds to the “utility computing”
mode of use.
• No support for building new applications or
services.
• No support for dynamic discovery of
resources.
• No support for collaboration.
Programming the Grid
• Problem specification could involve
– Use of high-level domain-specific
programming/scripting language.
– Representing coordinated tasks with a
workflow graph assembled in a visual
programming environment.
– Use of recommender systems to assist
users in formulating and solving problems.
Workflow
• Commonly used to represent
applications composed of interacting
services.
• Services may be hierarchical –
composed of other services.
• Easy to represent graphically, but not
scalable with number of services or
number of inputs/outputs.
Problems in Workflow
Composition
• How do you know that the input port of
one service is compatible with the
output port of another service?
• Given that the services may have been
created by different
people/organisations?
• Type signatures must match, but
semantics must also match.
Annotating Services
• To support “plug-and-play” between services
in a workflow requires the use of ontologies.
• Need to give semantic content (meaning) to
service inputs and outputs.
• This allows composition hints in the form of
“semantic suggestions”. For example, for a
given service port we could find all services
that could be connected to it.
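As a minimal sketch of how such semantic suggestions might be computed, the Python below uses a toy ontology (a child-to-parent concept map) and suggests every service whose input concept subsumes the concept produced by a given output port. The concept names, service names and port annotations are illustrative assumptions, not taken from a real ontology or from the framework described in this talk.

```python
# Toy ontology: each concept maps to its parent concept (None marks a root).
# All concept and service names here are hypothetical.
ONTOLOGY = {
    "Data": None,
    "Vector": "Data",
    "TimeSeries": "Vector",
    "ComplexSpectrum": "Data",
}

# Each service is annotated with the concept of its input and output port.
SERVICES = {
    "FFT": {"input": "Vector", "output": "ComplexSpectrum"},
    "Plot": {"input": "Data", "output": "Data"},
    "PowerSpectrum": {"input": "ComplexSpectrum", "output": "Vector"},
}


def subsumes(general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = ONTOLOGY[specific]
    return False


def suggest_consumers(output_concept):
    """List services whose input port can accept data of `output_concept`."""
    return sorted(
        name
        for name, ports in SERVICES.items()
        if subsumes(ports["input"], output_concept)
    )
```

Here a port producing a TimeSeries can feed any service that accepts a Vector or generic Data, so matching goes beyond comparing type signatures.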
Types of Workflow Composition
• Manual: user generates workflow graphically or through a text editor.
– Examples: Triana; BPWS4J; Self-Serve.
• Semi-automated (“semantic suggestions”): user still has to select the service required from a shortlist.
– Examples: Cardoso & Sheth; GEODISE; myGRID.
• Automated: the entire composition is automated using AI technologies.
– Examples: Sirin, Hendler et al., SHOP2; Pegasus – ISI; McIlraith; IRS-II.
Workflow Composition in
Semantic Grids
• Semantic Web technologies enable automation at
several levels – automated resource discovery,
selection, management, service composition, execution.
• Promises automated seamless interoperation of
autonomous, heterogeneous distributed applications.
• Our focus is on the use of Semantic Web technologies to
automate service composition in Grid environments.
• See S Majithia, DW Walker, and WA Gray “Automatic
Composition of Web Services,” in Proceedings of the UK
e-Science Programme All-Hands Meeting 2004.
Available online at
http://www.allhands.org.uk/proceedings/papers/148.pdf
• Main developer is Shalil Majithia.
Framework - Overview
[Figure: framework architecture. A high-level objective is submitted to the WFMS; the components shown are listed below.]
WFMS – Workflow Manager Service
AWFC – Abstract Workflow Composition Service
CWFC – Concrete Workflow Composition Service
RS – Reasoning Service
MMS – Matchmaking Service
AWFR – Abstract Workflow Repository
CWFR – Concrete Workflow Repository
RB – Rulebase
Framework - Interactions
[Figure: numbered message exchange between Client, WFMS, AWFC, CWFC and WFEE.]
1. High Level Request
2. Request for Abstract WF
3. Composed Abstract WF
4. Request for Concrete WF
5. Composed Concrete WF
6. Request for Execution
7. Results or Request for Alternatives
8. Final Results
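The eight messages can be read as a simple control loop. The sketch below is one possible reading, in which step 7 ("Results or Request for Alternatives") means that a failed execution triggers a request for a different concrete workflow. That interpretation, the function name `run_request`, its callable parameters and the `max_attempts` limit are all assumptions made for illustration.

```python
def run_request(objective, compose_abstract, compose_concrete, execute,
                max_attempts=3):
    """Drive one high-level request through composition and execution.

    The three callables stand in for the AWFC, CWFC and WFEE services.
    `execute` must return a pair (succeeded, result).
    """
    abstract_wf = compose_abstract(objective)            # steps 2-3
    failed = []
    for _ in range(max_attempts):
        # Steps 4-5: ask for a concrete workflow, avoiding earlier failures.
        concrete_wf = compose_concrete(abstract_wf, exclude=failed)
        succeeded, result = execute(concrete_wf)          # steps 6-7
        if succeeded:
            return result                                 # step 8
        failed.append(concrete_wf)                        # ask for an alternative
    raise RuntimeError("no concrete workflow executed successfully")
```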
Abstract Workflow Composer
• An abstract workflow specifies a workflow
without referring to a specific service
implementation.
• The Abstract Composer tries to generate an
abstract workflow by using:
– AWF Repository: stores semantically annotated
descriptions of services and workflows. Uses an
ontology to match services.
– Rulebase: a rulebase specifies the “recipe” to
achieve an objective
– Chaining services: try and chain services by
matching service outputs and inputs.
Concrete Workflow Composer
• A concrete workflow specifies an executable
workflow by referring to specific service
implementations.
• The Concrete Composer tries to generate an
executable workflow by using:
– Matchmaking: match abstract workflow with
service implementations available at that time.
– Chaining services: try and chain services by
matching service outputs and inputs.
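A minimal sketch of the matchmaking step, assuming an abstract workflow is simply an ordered list of service types and that a registry snapshot lists the implementations advertised at that moment. The registry layout, the `available` flag and the example.org-style endpoints are hypothetical and do not reflect the actual Matchmaking Service interface.

```python
# Abstract workflow: an ordered list of abstract service types (hypothetical).
ABSTRACT_WORKFLOW = ["Parents", "Children"]

# Registry snapshot: concrete implementations currently advertised.
REGISTRY = [
    {"type": "Parents", "endpoint": "http://host-a.example/parents", "available": False},
    {"type": "Parents", "endpoint": "http://host-b.example/parents", "available": True},
    {"type": "Children", "endpoint": "http://host-a.example/children", "available": True},
]


def make_concrete(abstract_workflow, registry):
    """Bind each abstract step to the first available implementation.

    Raises LookupError if a step has no available implementation, which is
    the point at which a composer could fall back to chaining services.
    """
    concrete = []
    for step in abstract_workflow:
        match = next(
            (s for s in registry if s["type"] == step and s["available"]),
            None,
        )
        if match is None:
            raise LookupError(f"no available implementation for {step!r}")
        concrete.append((step, match["endpoint"]))
    return concrete
```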
Other Components
• Matchmaker service (based on that of
Paolucci et al.) adapted for dynamic
substitution.
• Chaining service: backward chaining
service based on domain ontologies.
• Repositories: store semantically
annotated abstract and concrete
workflows.
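To illustrate the idea of backward chaining, the sketch below starts from the required output concept and works back through service descriptions until it reaches the input that is available. It uses exact concept matching for brevity, whereas the chaining service described above relies on domain ontologies. The service table and concept names are hypothetical.

```python
# Hypothetical service descriptions: name -> (input concept, output concept).
SERVICES = {
    "ReadFile": ("FileName", "TimeSeries"),
    "FFT": ("TimeSeries", "Spectrum"),
    "PeakFinder": ("Spectrum", "PeakList"),
}


def backward_chain(goal, available, services=SERVICES, _seen=frozenset()):
    """Return an ordered list of services turning `available` into `goal`.

    Finds a service that produces the goal, then recursively tries to
    satisfy that service's input. Returns None if no chain exists.
    """
    if goal == available:
        return []
    if goal in _seen:  # guard against cycles in the service graph
        return None
    for name, (needs, produces) in services.items():
        if produces == goal:
            prefix = backward_chain(needs, available, services, _seen | {goal})
            if prefix is not None:
                return prefix + [name]
    return None
```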
Implementation
• All components implemented as Web services using Axis server.
• Services and workflows described using OWL-S.
• DQL/JTP server used for subsumption reasoning.
• Rulebase implemented in RuleML.
• Plug-in module enables generation of concrete workflows in BPEL4WS.
Snippet of OWL-S Profile for FFT
<profileHierarchy:SignalProcessing rdf:ID="FFT">
  <profile:input>
    <profile:ParameterDescription rdf:ID="FFTInput">
      <profile:restrictedTo rdf:resource="Concepts.owl#VectorType"/>
    </profile:ParameterDescription>
  </profile:input>
  <profile:output>
    <profile:ParameterDescription rdf:ID="FFTOutput">
      <profile:restrictedTo rdf:resource="Concepts.owl#ComplexSpectrum"/>
    </profile:ParameterDescription>
  </profile:output>
</profileHierarchy:SignalProcessing>
Family Tree Example
• Family trees have 3 basic
relationships
– Spouse_of
– Child_of
– Parent_of
• Other relationships (aunt, grandparent,
cousin, etc.) can be expressed in terms of
these relationships through an ontology.
Cousins Example
• Suppose we want to create a workflow
to find the cousins of a given person, X.
• Query is submitted to WFMS which
checks the AWF repository (i.e., checks
annotated name of workflows)
• If no match then check rule base
Rulebase
Grandparents(X)=Parents[Parents[X]]
Cousins(X)=Exclude[Grandchildren[Grandparents(X)], Children[Parents[X]]]
Note: There is no rule for Grandchildren[X].
The Chaining Service would deduce how to
do this from the ontology.
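The rules can be tried out on a small data set. The sketch below reads the Cousins rule as "the grandchildren of X's grandparents, excluding the children of X's parents", and assumes that the chaining step resolves Grandchildren as Children applied twice. The family data and function names are invented for illustration.

```python
# Hypothetical family data: child -> set of parents.
PARENTS_OF = {
    "X": {"Mum", "Dad"},
    "Sib": {"Mum", "Dad"},
    "Mum": {"Gran", "Grandad"},
    "Aunt": {"Gran", "Grandad"},
    "Cousin1": {"Aunt"},
    "Cousin2": {"Aunt"},
}


def parents(people):
    """Atomic service: the parents of everyone in `people`."""
    return set().union(*(PARENTS_OF.get(p, set()) for p in people))


def children(people):
    """Atomic service: everyone with at least one parent in `people`."""
    return {c for c, ps in PARENTS_OF.items() if ps & set(people)}


def exclude(candidates, unwanted):
    """Atomic service: remove `unwanted` from `candidates`."""
    return set(candidates) - set(unwanted)


def grandparents(x):
    """Rule: Grandparents(X) = Parents[Parents[X]]."""
    return parents(parents({x}))


def grandchildren(people):
    """No rule exists; assumed here to be Children[Children[...]]."""
    return children(children(people))


def cousins(x):
    """Rule: grandchildren of X's grandparents, minus X and X's siblings."""
    return exclude(grandchildren(grandparents(x)), children(parents({x})))
```

With this data, `cousins("X")` evaluates to the set containing Cousin1 and Cousin2, because X and Sib are removed by the Exclude step.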
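Abstract Workflow From Rulebase
[Figure: workflow graph from input X to output Cousins, built from the services Grandparents, Grandchildren, Parents, Children and Exclude. The legend distinguishes atomic services from composite services.]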
WF after Recursive Application of Rulebase
[Figure: workflow graph from input X to output Cousins, with nodes Parents (three times), Children, Grandchildren and Exclude.]
WF after Application of Chaining Service
[Figure: workflow graph from input X to output Cousins, with nodes Parents (three times), Children (three times) and Exclude.]
Note opportunity for optimization and parallelism.
Dynamic Resource Discovery and
Scheduling
• Assume that semantically annotated services can be
found through a registry or repository service.
• Scheduling of workflow nodes on distributed
resources.
– Early binding model: bind to specific service/platform at
composition time (“validation”).
– Intermediate binding model: bind at “compile” time (when
converting from XML to executable form).
– Late binding model: bind dynamically at runtime.
• Later binding allows the use of more up-to-date
information to make scheduling decisions.
• In our framework binding is done by the Matchmaker
Service, and can follow any of the above binding
models.
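The difference between early and late binding can be shown with a small sketch. It assumes the scheduler simply picks the least-loaded host from a snapshot of load figures. The host names, load values and function names are hypothetical.

```python
def choose_endpoint(loads):
    """Pick the least-loaded endpoint from a {endpoint: load} snapshot."""
    return min(loads, key=loads.get)


def early_bound(loads_at_composition):
    """Early binding: fix the endpoint once, when the workflow is composed."""
    endpoint = choose_endpoint(loads_at_composition)
    return lambda current_loads: endpoint  # later load changes are ignored


def late_bound():
    """Late binding: choose the endpoint afresh at each invocation."""
    return lambda current_loads: choose_endpoint(current_loads)


composition_time = {"host-a": 0.2, "host-b": 0.7}
run_time = {"host-a": 0.9, "host-b": 0.1}

early = early_bound(composition_time)
late = late_bound()
```

Calling `early(run_time)` still returns host-a, which has since become busy, while `late(run_time)` returns host-b, reflecting the more up-to-date information.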
Provenance Support in Service-Oriented Grids
• A workflow may produce many intermediate
and final data products that may need to be
later reviewed and analysed.
• A person, project, or organisation may need
to archive many such workflows and their
results.
• Want to store the provenance of data
products: how they were produced and why.
• Main developer is Shrija Rajbhandari.
Provenance
• Provenance can be regarded as historical
metadata that provides an explanation of
how a particular data product has been
generated.
• Uniquely defines the derived data.
• Identifies what data is passed between
services.
• Provides a traceable path to the origin of
the data.
Provenance Importance and
Problem
• No known standards to support
archiving provenance in a service-oriented Grid environment.
• Requires recording the provenance:
– The transformation of data that occurred during
the invocation of services in a workflow.
– Complex service executed via a workflow
engine.
Original Motivation
• Would like to be able to view an electronic
publication, and click on tables and figures of
results to:
– See how they were generated: requires
provenance browser.
– Re-run the workflows that generated the results to
verify them, or to perform “what-if” study by
changing the workflow inputs.
– See the results of any re-run workflows in the
same format as the original data (table or graph).
Provenance Model
[Figure: components shown are the workflow engine (BPWS4J) and the Provenance Server, comprising an interface, the PCS and PQS, Jena, an RDF schema, and a MySQL provenance database.]
PCS = Provenance Collection Service
PQS = Provenance Query Service
Jena is a Java framework for building Semantic Web applications.
http://jena.sourceforge.net/
Prototype Provenance System
• Provenance Schema
– Resource Description Framework (RDF).
– Provenance of workflow execution.
• Provenance Collection Service (PCS)
– Provenance is represented in RDF statements.
– Database storage.
• Provenance Query Service (PQS)
– Client interface to browse provenance.
– Allows re-execution of workflows from retrieved provenance for
“what-if” style of analysis.
Prototype Dataflow
[Figure: dataflow between the PCS Client Interface, PCS, BPWS4J Engine, Web Services, Provenance Database and PQS Client Interface. PCS uses the Provenance RDF schema.]
1) User Client Interface sends the workflow invocation parameters to PCS.
2) PCS sends the invocation initiation of a workflow to BPWS4J.
3) BPWS4J invokes the partner services.
4) BPWS4J sends message about invoked services, and the input and output parameters to PCS.
5) PCS creates RDF representation of the collected provenance data of the workflow execution.
6) PCS stores the RDF graph in the database server using Jena tools.
7) PQS Client passes query to the database server which returns the provenance data using Jena tools to access RDF data.
8) PQS allows re-execution of the workflow from the provenance data retrieved. Also allows parameter changes during re-execution of such workflow.
Services Composition and
Invocation
• Compose Web services using
BPEL4WS
• Execute with BPEL4WS compliant
engine: IBM’s BPWS4J
• Dynamically invoke Web services using
Web Service Invocation Framework
(WSIF).
Provenance Recording
Example: Adding two numbers and multiplying the result by
a third number
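A minimal sketch of what recording and re-execution could look like for this example. The prototype stores provenance as RDF in a database; this illustration uses plain Python dictionaries instead, and the two "services" are local functions standing in for Web services. All names are hypothetical.

```python
import operator

# Hypothetical stand-ins for the partner Web services in the workflow.
SERVICES = {"add": operator.add, "multiply": operator.mul}


def invoke(service, args, provenance):
    """Call a service and record what was invoked, with inputs and output."""
    output = SERVICES[service](*args)
    provenance.setdefault("invocations", []).append(
        {"service": service, "inputs": args, "output": output}
    )
    return output


def run_workflow(inputs, provenance):
    """Compute (a + b) * c, recording provenance for each invocation."""
    provenance["inputs"] = dict(inputs)
    total = invoke("add", (inputs["a"], inputs["b"]), provenance)
    return invoke("multiply", (total, inputs["c"]), provenance)


def rerun(provenance, **changes):
    """Re-execute from recorded inputs, optionally changing some ("what-if")."""
    new_inputs = {**provenance["inputs"], **changes}
    new_provenance = {}
    return run_workflow(new_inputs, new_provenance), new_provenance
```

The recorded invocations give the traceable path from the final result back to the original inputs, and `rerun(prov, c=10)` repeats the same workflow with one input changed.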
Provenance Query
Re-execution for “what-if”
analysis
Support for Collaboration in Grid
Environments
• Collaboration can take various forms.
• Making services available to others.
• Making workflows available to others.
• Making results available to others.
• Collaboratively steering an
application.
• Collaborative visualisation of results.
Resource-Aware Visualisation
Environment (RAVE)
• Aims to develop a collaborative visualization
environment that scales across a wide range
of network-enabled devices.
• Will respond to changes in network
bandwidth and capabilities of the target
display device.
• Will start by examining VizServer and
COVISE systems.
• RAVE postdoc is Dr Ian Grimstead.
RAVE Overview
RAVE Motivation
• Current systems make assumptions
about available resources.
• RAVE makes use of local and/or remote
resources, and can react dynamically to
changes in these resources and the
network connecting them
RAVE Infrastructure
• The RAVE infrastructure is based on
Web services.
• Services are published and discovered
through a UDDI server.
• Main services are
– Data Service.
– Render Service.
Data Service
• Imports data from a file, web resource,
or external application.
• Acts as a central distribution point for
scene graph.
• Bridging services link to external
applications.
Render Service
• Render services connect to the Data
Service which accepts and broadcasts
changes in the scene graph.
• Render services contain complete
scene graph.
• View may be rendered in mono or
stereo mode.
• Multiple render sessions supported.
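The relationship between the two services resembles a publish/subscribe arrangement. The sketch below assumes the scene graph can be modelled as a flat dictionary of named nodes and that a newly connected render service first receives a full copy. Class and method names are invented and are not the RAVE API.

```python
class RenderService:
    """Holds a complete local copy of the scene graph (here a flat dict)."""

    def __init__(self):
        self.scene = {}

    def apply(self, node, value):
        """Apply one change broadcast by the data service."""
        self.scene[node] = value


class DataService:
    """Central distribution point: accepts changes and rebroadcasts them."""

    def __init__(self):
        self.scene = {}
        self.renderers = []

    def connect(self, renderer):
        """Register a renderer and bring it up to date with the full scene."""
        renderer.scene = dict(self.scene)
        self.renderers.append(renderer)

    def update(self, node, value):
        """Accept a change and broadcast it to every connected renderer."""
        self.scene[node] = value
        for renderer in self.renderers:
            renderer.apply(node, value)
```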
Thin Client
• A thin client is a client with modest
rendering capabilities, e.g., a PDA.
• It can connect to a remote render
service and make requests for off-screen rendered copies of the data.
• Local user can still manipulate camera
and underlying data.
RAVE on Zaurus PDA
Connecting to an Application
• Data Service can
receive live updates
from an external
application via a
bridging service.
• Future work will extend
this to allow
computational
steering.
Other Grid Projects
• Quality of Service:
http://www.cs.cf.ac.uk/user/Rashid/
• Grid-Enabled Computational
Electromagnetics (GECEM):
http://www.wesc.ac.uk/projects/gecem/
• Workflow Optimization Services for e-Science (WOSE):
http://www.wesc.ac.uk/projects/wose/
Summary
• Semantic Web technologies play a key role in
enabling:
– “plug-and-play” in the composition of services to
create workflows.
– dynamic discovery of resources.
– support for provenance.
• The above, together with collaborative
visualisation, are important in convincing
scientists (and others) to use the Grid.