Bulk Synchronous Parallel Processing Model
Jamie Perkins

Overview
- Four W's: who, what, when, and why
- Goals for BSP
- BSP design and program
- Cost function
- Languages and machines

A Bridge for Parallel Computation
- The von Neumann model was designed to insulate software from hardware
- The BSP (Bulk Synchronous Parallel) model:
  - Proposed by Leslie Valiant of Harvard University in 1990
  - Developed by W.F. McColl of Oxford
  - Designed to be a "bridge" for parallel computation

Goals for BSP
- Scalability: performance of hardware and software must scale from a single processor to thousands of processors
- Portability: software must run unchanged, with high performance, on any general-purpose parallel architecture
- Predictability: performance of software on different architectures must be predictable in a straightforward way

BSP Design
Three components:
- Node: a processor and its local memory
- Router or communication network: message-passing (point-to-point) communication
- Barrier synchronization mechanism: implemented in hardware

BSP Design
- Fixed memory architecture
- Hashing allocates memory in a "random" fashion
- Fast access to local memory
- Uniformly slow access to remote memory

Illustration of BSP Computer
[Figure: three nodes, each a processor (P) with local memory (M), connected by a communication network with a barrier]
(Source: http://peace.snu.ac.kr/courses/parallelprocessing/)

BSP Program
- Composed of S supersteps
- A superstep consists of:
  - A computation in which each processor uses only locally held values
  - A global message transmission from each processor to any subset of the others
  - A barrier synchronization
- Strategies for programming on BSP:
  - Balance the computation between processes
  - Balance the communication between processes
  - Minimize the number of supersteps

BSP Program
[Figure: two supersteps across processors P1-P4; each superstep performs local computation, then communication, then ends at a barrier]
(Source: http://peace.snu.ac.kr/courses/parallelprocessing/)

Advantages of BSP
- Eliminates the need for programmers to manage memory, assign communication, and perform low-level synchronization (with sufficient parallel slackness)
- Synchronization allows automatic optimization of the communication pattern
- The BSP model provides a simple cost function for analyzing the complexity of algorithms

Cost Function
- g: "gap", or bandwidth inefficiency
- L: "latency", the minimum time needed for one superstep
- w: largest amount of work performed by any processor
- h: largest number of packets sent or received by any processor
- Execution time for superstep i: w_i + g * h_i + L

Languages & Machines
- Languages and libraries: BSP++, C, C++, Fortran, JBSP, Opal
- Machines: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP

Thank You
Any Questions?

References
- http://peace.snu.ac.kr/courses/parallelprocessing/
- http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
- http://www.cs.mu.oz.au/677/notes/node41.html
- McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994.
- United States Patent 5083265.
- Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103-111.
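
Appendix: Worked Cost Example
As a concrete illustration of the cost function defined earlier (execution time of superstep i is w_i + g * h_i + L, summed over all S supersteps), here is a minimal Python sketch. It is not from the original deck; the function names, the machine parameters (g = 2, L = 50), and the per-processor numbers are illustrative assumptions.

```python
# Sketch of the BSP cost model (illustrative, not a real BSP library).
# Each superstep i costs w_i + g*h_i + L, where w_i is the largest local
# computation and h_i the largest number of packets sent or received
# by any single processor in that superstep.

def superstep_cost(work, sent, received, g, L):
    """Cost of one superstep from per-processor work and packet counts."""
    w = max(work)                       # largest local computation
    h = max(max(sent), max(received))   # largest communication load
    return w + g * h + L

def program_cost(supersteps, g, L):
    """Total cost of a BSP program: the sum of its S superstep costs."""
    return sum(superstep_cost(w, s, r, g, L) for (w, s, r) in supersteps)

# Hypothetical run: 4 processors, 2 supersteps, machine parameters g=2, L=50.
steps = [
    ([100, 90, 80, 100], [4, 4, 4, 4], [4, 4, 4, 4]),  # balanced superstep
    ([60, 200, 60, 60],  [1, 8, 1, 1], [3, 3, 3, 1]),  # imbalanced superstep
]
print(program_cost(steps, g=2, L=50))  # (100 + 2*4 + 50) + (200 + 2*8 + 50) = 424
```

Note how the imbalanced second superstep is charged for its slowest processor (w = 200, h = 8) regardless of how little the others did, which is exactly why the programming strategies above stress balancing computation and communication between processes.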