Pattern Programming
Barry Wilkinson
University of North Carolina Charlotte
CCI Friday Seminar Series
April 13th, 2012
Acknowledgment
This work was initiated by Jeremy Villalobos
and described in his PhD thesis:
“RUNNING PARALLEL APPLICATIONS ON A
HETEROGENEOUS ENVIRONMENT WITH ACCESSIBLE
DEVELOPMENT PRACTICES AND AUTOMATIC
SCALABILITY,” UNC-Charlotte, 2011.
2
Problem Addressed
• To make parallel programming more useable and scalable.
• Parallel programming is writing programs for solving problems
using multiple computers, processors and cores, including with
physically distributed computers.
• A very long history but still a challenge.
• Traditional approaches involve explicitly
specifying message-passing with lowlevel tools such as MPI and thread
parallelism with OpenMP and OpenCL.
• Still not mainstream as it should be with
the introduction of multicore processors
3
Pattern Programming Concept
Programmer constructs his application using established
computational or algorithmic “patterns” that provide a structure.
Patterns are widely applicable.
What patterns are we talking about?
• Low-level algorithmic patterns that might be embedded into a
program such as fork-join, broadcast/scatter/gather.
• Higher level algorithm patterns for forming a complete
program such as workpool, pipeline, stencil, map-reduce.
We concentrate upon higher-level “computational/algorithm ”
level patterns rather than lower level patterns.
4
Some Patterns
Workpool
Workers
Master
Two-way
connection
Compute node
Source/sink
Derived from Jeremy Villalobos’s PhD thesis defense
5
Pipeline
Stage 1
Stage 2
Stage 3
Workers
Master
One-way
connection
Two-way
connection
Compute node
Source/sink
6
Divide and Conquer
Divide
Merge
Two-way
connection
Compute node
Source/sink
7
Stencil
Synchronous
Two-way
connection
Compute node
Source/sink
8
All-to-All
Two-way
connection
Compute node
Source/sink
9
Note on Terminology
“Skeletons”
Sometimes term “skeleton” used to describe
“patterns”, especially directed acyclic graphs with a
source, a computation, and a sink.
We do not make that distinction and use the term
“pattern” whether directed or undirected and whether
acyclic or cyclic.
This is done elsewhere.
10
Skeletons/Patterns
• Advantages
• Implicit parallelization
• Avoid deadlocks
• Avoid race conditions
• Reduction in source code size (lines of code)
• Abstracts the Grid/Cloud environment
• Disadvantages
• Takes away some of the freedom from the user
programmer
• New approach to learn
• Performance reduced (5% on top of MPI)
Derived from Jeremy Villalobos’s PhD thesis defense
11
More Advantages/Notes
• “Design patterns” have been part of software engineering for
many years ... .
–
–
–
–
Reusable solutions to commonly occurring problems *
Patterns provide guide to “best practices”, not a final implementation
Provides good scalable design structure to parallel programs
Can reason more easier about programs
• Hierarchical designs with patterns embedded into patterns,
and pattern operators to combine patterns.
• Leads to an automated conversion into parallel programs
without need to write with low level message-passing
routines such as MPI.
* http://en.wikipedia.org/wiki/Design_pattern_(computer_science)
12
Previous/Existing Work
• Patterns/skeletons explored in several projects.
• Universities:
– University of Illinois at Urbana-Champaign and University
of California, Berkeley
– University of Torino/Università di Pisa Italy
– ...
• Industrial efforts
– Intel
– Microsoft
–…
13
Universal Parallel Computing Research
Centers (UPCRC)
University of Illinois at Urbana-Champaign and University of
California, Berkeley with Microsoft and Intel in 2008 (with
combined funding of at least $35 million).
– Co-developed OPL (Our Pattern Language), a pattern
language for parallel programming
– Promoted patterns in Workshop on Parallel Programming
Patterns, ParaPLoP 2009, 2010, and 2011, and Pattern
Languages for Programs conference PloP’2009
14
UPCRC Patterns
Group of twelve computational patterns identified:
•
•
•
•
•
•
Finite State Machines
Circuits
Graph Algorithms
Structured Grid
Dense Matrix
Sparse Matrix
•
•
•
•
•
•
Spectral (FFT)
Dynamic Programming
Particle Methods
Backtrack
Graphical Models
Unstructured Grid
in seven general application areas
15
Closest
to our
work
http://calvado
s.di.unipi.it/do
kuwiki/doku.p
hp?id=ffname
space:about
University of
Torino, Italy
/Università di
Pisa
16
Intel
Focused on very low level patterns such as fork-join,
and provides constructs for them in:
• Intel Threading Building Blocks (TBB)
– Template library for C++ to support parallelism
• Intel Cilk plus
– Compiler extensions for C/C++ to support parallelism
• Intel Array Building Blocks (ArBB)
– Pure C++ library-based solution for vector parallelism
Above are somewhat competing tools obtained through
takeovers of small companies. Each implemented differently.
17
New book due out 2012 from
Intel authors
“Structured Parallel
Programming: Patterns for
Efficient Computation,”
Michael McCool, James
Reinders, Arch Robison,
Morgan Kaufmann, 2012
Focuses on Intel tools
B. Wilkinson was a reviewer for this book.
15.18
Using patterns with Microsoft C#
http://www.microsoft.com/download/en/det
ails.aspx?displaylang=en&id=19222
Again very low-level with patterns such as parallel for loops.
19
Our approach
(Jeremy Villalobos’ UNC-C PhD thesis)
Focuses on a few patterns of wide applicability
(workpool, pipelined, stencil, and dense matrix
patterns) but Jeremy took it much further than UPCRC
and Intel.
He developed a higher-level framework called “Seeds”
Uses pattern approach to automatically distribute code
across geographical sites and execute the parallel
code.
20
“Seeds” Parallel Grid Application
Framework
Some Key Features
• Pattern-programming
(Java) user interface
• Self-deploys on computers,
clusters, and geographically
distributed computers
• Load balances
• Three levels of user
interface
http://coit-grid01.uncc.edu/seeds/21
Seeds Development Layers

Basic



Advanced: Used to add or extend functionality such as:




Intended for programmers that have basic parallel
computing background
Based on skeletons and patterns
Create new patterns
Optimize existing patterns or
Adapt existing pattern to non-functional requirements
specific to the application
Expert: Used to provide basic services:




Deployment
Security
Communication/Connectivity
Changes in the environment
22
Derived from Jeremy Villalobos’s PhD thesis defense
Deployment
• Deployment with
Globus
– Globus GSIFTP to
transfer “seeds” folder
– GRAM to submit job to
run seed nodes
• Deployment with SSH
– now preferred
– Globus will be
depreciated in Seeds
23
Basic User Programmer Interface
To create and execute parallel programs,
programmer selects a pattern and implements
three principal Java methods:
• Diffuse method – to distribute pieces of data.
• Compute method – the actual computation
• Gather method – used to gather the results
Programmer also has to fill in details in a
“bootstrap” class to deploy and start the
framework.
Diffuse
Compute
Gather
Bootstrap class
The framework self-deploys on a geographically distributed
platform and executes pattern.
24
Example: Deploy a workpool pattern to
compute p using Monte Carlo method
Monte Carlo p calculation
• Basis on Monte Carlo
calculations is use of
random selections
• In this case, circle formed
with a square
• Points within square
chosen randomly
• Fraction of points within
circle = p/4
• Only one quadrant used in
code
25
Complete code
for computation
package edu.uncc.grid.example.workpool;
import java.util.Random;
import java.util.logging.Level;
import edu.uncc.grid.pgaf.datamodules.Data;
import edu.uncc.grid.pgaf.datamodules.DataMap;
import edu.uncc.grid.pgaf.interfaces.basic.Workpool;
import edu.uncc.grid.pgaf.p2p.Node;
public class MonteCarloPiModule extends Workpool {
private static final long serialVersionUID = 1L;
private static final int DoubleDataSize = 1000;
double total;
int random_samples;
Random R;
public MonteCarloPiModule() {
R = new Random();
}
@Override
public void initializeModule(String[] args) {
total = 0;
Node.getLog().setLevel(Level.WARNING); //
reduce verbosity for logging
random_samples = 3000; // set number of
random samples
}
public Data Compute (Data data) {
// input gets the data produced by DiffuseData()
DataMap<String, Object> input = (DataMap<String,Object>)data;
// output will emit the partial answers done by this method
DataMap<String, Object> output = new DataMap<String, Object>();
Long seed = (Long) input.get("seed"); // get random seed
Random r = new Random();
r.setSeed(seed);
Long inside = 0L;
for (int i = 0; i < DoubleDataSize ; i++) {
double x = r.nextDouble();
double y = r.nextDouble();
double dist = x * x + y * y;
if (dist <= 1.0) {
++inside;
}
}
output.put("inside", inside);// store partial answer to return to GatherData()
return output;
}
public Data DiffuseData (int segment) {
DataMap<String, Object> d =new DataMap<String, Object>();
d.put("seed", R.nextLong());
return d; // returns a random seed for each job unit
}
public void GatherData (int segment, Data dat) {
DataMap<String,Object> out = (DataMap<String,Object>) dat;
Long inside = (Long) out.get("inside");
total += inside; // aggregate answer from all the worker nodes.
}
public double getPi() {
// returns value of pi based on the job done by all the workers
double pi = (total / (random_samples * DoubleDataSize)) * 4;
return pi;
}
public int getDataCount() {
return random_samples;
Note: No
message
passing
(MPI etc)
26
package edu.uncc.grid.example.workpool;
import java.io.IOException;
import net.jxta.pipe.PipeID;
import edu.uncc.grid.pgaf.Anchor;
import edu.uncc.grid.pgaf.Operand;
import edu.uncc.grid.pgaf.Seeds;
import edu.uncc.grid.pgaf.p2p.Types;
public class RunMonteCarloPiModule {
public static void main(String[] args) {
try {
MonteCarloPiModule pi = new MonteCarloPiModule();
Seeds.start( "/path/to/seeds/seed/folder" , false);
PipeID id = Seeds.startPattern(new Operand( (String[])null, new Anchor( "hostname"
, Types.DataFlowRoll.SINK_SOURCE), pi ) );
System.out.println(id.toString() );
Seeds.waitOnPattern(id);
System.out.println( "The result is: " + pi.getPi() ) ;
Seeds.stop();
} catch (SecurityException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
Bootstrap class
This code deploys
framework and
starts execution of
pattern
Different patterns
have similar code
27
Compiling/executing
• Can be done on the command line (ant
script provided) or through an IDE (Eclipse)
28
Another example
Bubble sort with the pipeline pattern
From Jeremy
Villalobos’ PhD
thesis. See thesis
for more details and
results
Static test - Program executed using a fixed number of cores as given.
Dynamic test - Number of cores dynamically changed during execution to
improve performance. Platform: Dell 900 server with four quad-core
processors and 64GB shared memory.
29
Pattern operators
Can combine patterns.
Example: Adding Stencil and All-to-All synchronous pattern
Example use: Heat distribution simulation (Laplace’s eq.)
• Multiple cells on a stencil pattern work in a loop parallel
fashion, computing and synchronizing on each iteration.
• However, every x iterations, they must implement an all-to-all
communication pattern to run an algorithm to detect
termination.
Directly from Jeremy Villalobos’s PhD thesis
30
Tutorial page
15.31
15.32
Download
page
33
Open Source Apache License in progress
http://code.google.com/p/seeds-parallel-pattern-framework/
34
Work in progress
• Tutorial on using all-to-all pattern to solve the n-body
problem, and other documentation
• Documentation on advanced layer to enable programmers to
develop their own patterns
• Exploring various ways to enhance and expand the work,
using for example Aparapi
http://code.google.com/p/aparapi/
35
ITCS 4145/5145 Parallel Programming
• Pattern programming to be introduced into ITCS 4145/5145
Parallel Programming in Fall 2012.
To be taught on NCREN
in same way as ITCS
4/5146 Grid Computing
with instructors at UNCCharlotte and UNCWilmington but in Fall
2012 just two sites.
Subsequently will be
offered across NC.
Regional/national
workshops planned
External funding for this
work pending.
36
Pattern Programming Research Group
• 2011
– Jeremy Villalobos (PhD awarded, continuing involvement)
– Saurav Bhattara (MS thesis, graduated)
• Spring 2012
– Yawo Adibolo (ITCS 6880 Individual Study)
– Ayay Ramesh (ITCS 6880 Individual Study)
• Fall 2012
– Haoqi Zhao (MS thesis)
Openings!
Loosely related Spring 2012: Tim Lukacik and Phil Chung (UG senior projects
on Eclipse PTP)
37
Some publications
• Jeremy F. Villalobos and Barry Wilkinson, “Skeleton/Pattern
Programming with an Adder Operator for Grid and Cloud Platforms,”
The 2010 International Conference on Grid Computing and Applications
(GCA’10), July 12-15, 2010, Las Vegas, Nevada, USA.
• Jeremy F. Villalobos and Barry Wilkinson, “Using Hierarchical
Dependency Data Flows to Enable Dynamic Scalability on Parallel
Patterns,” High-Performance Grid and Cloud Computing Workshop, 25th
IEEE International Parallel & Distributed Processing Symposium,
Anchorage (Alaska) USA, May 16-20, 2011.
I regard this conference as
a top conference in the field
Also presented by B. Wilkinson as Session 4 in “Short Course on Grid
Computing” Jornadas Chilenas de Computación, INFONOR-CHILE 2010,
Nov. 18th - 19th, 2010, Antofagasta, Chile.
http://coitweb.uncc.edu/~abw/Infornor-Chile2010/GridWorkshop/index.html
38
Questions
39
Descargar

No Slide Title