Performance Engineering
Large Scale Computing Systems
SC07-APART Workshop on:
Performance Analysis and Optimization
of High-End Computing Systems
Dr. Frederica Darema
CISE/NSF
Outline
• The BIG PICTURE
• Applications Directions
• Computing Platforms Directions
• Research and Technology Directions
• Examples of some advances
• Future Challenges and Opportunities
Science, Engineering, and “Commercial”
Applications Environments:
How are they shaping up for the future?
What does this entail for
Large-Scale Computing,
and for Large-Scale High-End Computing?
Small-Scale and Large-Scale Systems:
Increasing complexity of systems and applications …
• Processing at multiple levels
• Computation and data processing, both at the application
side and at the instruments/sensors side
• New Computational Units
– Beyond commodity microprocessors / superscalar / (D)MT:
MC-Ps (multi-core processors), MT, FPGAs, GPUs/GPGPUs, …
– Populating high-end platforms, workstations,
visualization servers, data servers, etc.
• Potentially:
– MC-Ps, FPGAs, GPUs at the application side
– MC-Ps, FPGAs, GPUs at the data acquisition side
• One kind of processor EVERYWHERE?
• Or a mix of MC-Ps, FPGAs, GPUs?
• Pros and deficiencies in each; advances close the gaps
• Complexity persists and increases
Platforms Directions
– Vector Processors
– SIMD MPPs
– Distributed Memory MPs
– Shared Memory MPs
– Distributed Platforms,
Heterogeneous Computers and Networks
• Heterogeneity
– architecture (computer & network)
– node power (supernodes, MCP)
• Latencies
– variable (internode, intranode)
• Bandwidths
– different for different links
– different based on traffic
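The heterogeneity in latencies and bandwidths listed above is often first approximated with a latency-bandwidth (alpha-beta) model of message time. The Python sketch below is an illustration added here, not part of the original material: the link classes, their parameter values, and the linear `load` adjustment for competing traffic are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One class of link; parameter values used below are illustrative only."""
    latency_s: float       # start-up latency (alpha), in seconds
    bandwidth_Bps: float   # peak bandwidth (1/beta), in bytes per second


def transfer_time(link: Link, nbytes: int, load: float = 0.0) -> float:
    """Alpha-beta estimate of message time. `load` in [0, 1) is the fraction
    of the link's bandwidth taken by competing traffic (a simplifying assumption)."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return link.latency_s + nbytes / (link.bandwidth_Bps * (1.0 - load))


# Hypothetical link classes on a heterogeneous platform (assumed numbers).
LINKS = {
    "intranode": Link(latency_s=1e-7, bandwidth_Bps=1e10),
    "internode": Link(latency_s=5e-6, bandwidth_Bps=1e9),
    "wide_area": Link(latency_s=2e-2, bandwidth_Bps=5e8),
}

if __name__ == "__main__":
    for name, link in LINKS.items():
        idle = transfer_time(link, 1_000_000)
        busy = transfer_time(link, 1_000_000, load=0.5)
        print(f"{name:10s} idle={idle:.6f} s  busy={busy:.6f} s")
```

The same message thus costs very different amounts depending on which link it crosses and how busy that link is, which is why mapping decisions on such platforms need per-link models rather than a single network constant.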
(Figure: Petaflops Platform (Grid-in-a-Box) and a Distributed Platform linking an MPP, a NOW, an SP, an algorithm accelerator, tac-com, SAR, fire-control, and database nodes)
Applications Directions
Traditional:
– Mostly Monolithic
– Mostly One Programming Language
– Computation Intensive
– Batch
– Hours/Days
Emerging:
– Multi-Modular
– Multi-Language
– Multi-Developers
– Multi-Source Data
– Computation Intensive
– Data Intensive
– Real Time
– Few Minutes/Hours
– Visualization
– Interactive Steering
– Integrated Simulations & Experiments
– Dynamic Data Driven Applications Systems
Example of new application and system directions
Dynamic Data Driven Application Systems (DDDAS)
(www.cise.nsf.gov/dddas & www.dddas.org)
DDDAS: the ability to dynamically incorporate additional data into an executing
application and, in reverse, the ability of an application to dynamically steer the
measurement process
Dynamic Integration of
Computation & Measurements/Data
(from the Real-Time to the High-End)
Unification of
Computing Platforms & Sensors/Instruments
DDDAS guides sensor systems architectures
(Figure: a dynamic feedback & control loop linking Application Simulations, Experiment Measurements, Field-Data (on-line/archival), and the User)
Challenges:
• Application Simulation Methods
• Algorithmic Stability
• Measurement/Instrumentation Methods
• Computing Systems Software Support
• Software Architecture Frameworks
• Synergistic, Multidisciplinary Research
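As a toy illustration of the two directions in the DDDAS definition (data flowing into a running application, and the application steering the measurement process), the Python sketch below couples an imperfect decay model to a noisy sensor. Everything here is hypothetical: the scalar state, the nudging `gain`, the mismatch `threshold`, and the rule "sample faster when the model drifts" are assumptions for illustration, not a DDDAS method from the source.

```python
import random


def run_dddas(steps: int, true_decay: float = 0.95, model_decay: float = 0.90,
              gain: float = 0.5, threshold: float = 0.5, seed: int = 0):
    """Toy DDDAS loop (all parameters hypothetical).

    - data -> application: each sensor reading nudges the running model state
    - application -> measurement: the model chooses when the next reading occurs
    Returns (final model state, final true state, log of readings)."""
    rng = random.Random(seed)
    truth = model = 10.0
    interval = 4                    # steps between sensor readings
    next_reading = interval
    log = []                        # (step, mismatch, new interval)
    for t in range(1, steps + 1):
        truth *= true_decay         # the physical system being observed
        model *= model_decay        # the imperfect simulation
        if t == next_reading:
            measurement = truth + rng.gauss(0.0, 0.05)   # noisy sensor
            mismatch = measurement - model
            model += gain * mismatch                     # assimilate the data
            # Steer the sensor: sample every step while the model drifts,
            # back off (up to every 8 steps) while it tracks well.
            interval = 1 if abs(mismatch) > threshold else min(interval * 2, 8)
            next_reading = t + interval
            log.append((t, round(mismatch, 3), interval))
    return model, truth, log


if __name__ == "__main__":
    model, truth, log = run_dddas(40)
    print(f"model={model:.3f} truth={truth:.3f} readings={len(log)}")
```

Calling `run_dddas(40)` returns the final model state, the true state, and a log of (step, mismatch, new sampling interval) tuples; the log shows the application tightening and relaxing the measurement schedule in response to the data it has absorbed.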
TeraGrid
• A distributed system of
unprecedented scale
• 30+ TF, 1+ PB, 40 Gb/s net
• Unified user environment
across resources
• User software environment; user
support resources
• Integrated new partners to
introduce new capabilities
• Additional computing,
visualization capabilities
• New types of resources: data
collections, instruments
• Created an initial community of
over 500 users, 80 PIs
• Created User Portal in
collaboration with NMI
courtesy Charlie Catlett
DDDAS: Beyond Grid Computing
“Extended Grid” – “SuperGRID”:
the Application Platform is
the computational & measurement system
(Figure: Applications coupled to Measurement Grids and Computational Grids)
SuperGrids:
Dynamically Coupled Networks of Data and Computations
Examples of TeraGrid Applications
Aquaporin Mechanism
Animation referenced in the 2003 Nobel Prize in Chemistry
announcement.
Schulten, UIUC
Atmospheric Modeling
Droegemeier, OU
Reservoir Modeling
Wheeler/UT Austin, Saltz/OSU, Parashar/Rutgers
Groundwater/Flood Modeling
Maidment, Wells, UT
Lattice-Boltzmann Simulations
Coveney, UCL
Bruce Boghosian, Tufts
Advanced Support for
TeraGrid Applications:
TeraGrid staff are “embedded”
with applications to create
- Functionally distributed workflows
- Remote data access, storage
and visualization
- Distributed data mining
- Ensemble and parameter sweep
run and data management
courtesy Charlie Catlett
To address the complexity of today’s and
future systems, applications, and their
environments,
we need systematic modeling and analysis
approaches for the design, runtime support,
and management of such systems:
Systems Performance Engineering
Background
• Systems Modeling and Analysis increasingly important:
– systems design cycle and runtime
– measurements (static and runtime)
– functional correctness of hw, hw and sw performance, dependability,
reliability, power management, security, debugging, …
• Traditionally (for example):
– modeling of specific aspects or components, rather than the full system
– architectural simulators trade speed for accuracy; full-system simulators
trade accuracy for speed
• We want modeling/simulation capabilities that provide
– accuracy: cycle-level resolution
– complete modeling of the entire system
– simulated execution of real workloads (full applications or realistic
benchmarks) on top of real operating systems
– the ability for users to probe features of the system (hardware, systems
software, application)
• A number of research efforts are addressing these challenges, and
more …
System Modeling and Analysis
develop methods and tools for modeling, measuring, analyzing,
evaluating, and predicting the performance, dependability, reliability,
runtime management, debugging, security, etc.,
for design & runtime support of complex computing and communications
systems
• Hardware and Software modeling
– methods, tools, and measurements providing multimodal, hierarchical,
or multilevel modeling and analysis capabilities for such systems;
– methods that describe not only the components of the system but also the
system as a whole, and enable assessment of the effects of individual
hardware and software layers and components of these systems;
– ability to describe the system at multiple levels of detail
(characteristics and time-scales);
– combining different (hybrid) methods of describing components and
layers, from analytical and statistical to simulation, emulation, etc.;
– performance specification languages and compilers
– testing & validation of developed methods and tools
System Modeling and Analysis
• Modeling and measurement approaches
– capabilities to describe, analyze, and predict the behavior of the
components as well as of the systems;
– analysis and prediction of the effects of characteristics of, or changes in,
the application, system software, and hardware;
– multilevel and multi-modal approaches
• Performance Frameworks
– combine tools in “plug-and-play” fashion
– multiple views of the system
• Use of systems modeling and analysis methods and tools
beyond the design cycle,
that is, to support optimized application composition,
mapping, and runtime with performance, dependability, and fault tolerance
Systems Modeling and Analysis
(Figure: Performance Frameworks spanning a layered system: Distributed Applications; Prog. Models, Compilers, Libraries, Tools; Scalable I/O, Data Management, Archiving/Retrieval Services, Visualization, Collaboration Environments, Authentication/Authorization, Fault Recovery Services; Distributed Systems Management; Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks; Memory, CPU, and Device Technology. The layers are paired with Application, File/IO, OS/Scheduler, Architecture/Network, and Memory Models.)
Multiple views of the system
The Operating System’s view
(Figure: the same layered system, from Distributed Applications; Languages, Compilers, Libraries, Tools; Visualization, Collaboration Environments, Scalable I/O, Data Management, Archiving/Retrieval, Authentication/Authorization, Dependability, and Other Services; Distributed Systems Management; down to the Distributed, Heterogeneous, Dynamic, Adaptive Computing Platforms and Networks and Memory, CPU, and Device Technology, viewed through Application, IO/File, OS/Scheduler, Architecture/Network, and Memory Models)
Technology for integrated feedback & control
Runtime Compiling System (RCS) and Dynamic Application Composition
(Figure: an Application Model and a Distributed Programming Model drive the Application Program through the Compiler Front-End, Application Intermediate Representation, and Compiler Back-End; the application(s) are launched, then dynamically linked and executed on Distributed Computing Resources (a Distributed Platform with MPP, NOW, SP, algorithm accelerator, tac-com, SAR, fire-control, and database nodes), with Dynamic Analysis of the Situation, Performance Measurements & Models, and Application Components & Frameworks closing the loop)
A large set of efforts is developing
systems modeling methods
along these directions,
leading to performance frameworks
Emphasis on Multidisciplinary Research
(across sub-areas of CS)
Application driven validation
of research and technology advances
Collaborations with industry are fruitful
Projects can be found in the proceedings of the
Next Generation Software Workshop Series
organized every year in conjunction with IPDPS
GrADS & VGrADS Projects, PI: Ken Kennedy (& Dan Reed, Andrew
Chien, Fran Berman, Dennis Gannon, Ian Foster, Jack Dongarra, et al.)
Project Goals: to develop program preparation system support for computational Grid applications, and technologies to support efficient run-time
management of computational Grid resources and achieve reliable performance under varying load.
GrADSoft Architecture
(Figure: the Program Preparation System (Source Application, Software Components, Libraries, Whole-Program Compiler, Configurable Object Program) and the Program Execution System (Problem Service Negotiator, Scheduler, Negotiation, Real-time Performance Monitor, Dynamic Optimizer), connected by Performance Feedback and Performance Problem paths)
Performance Contracts: At the Heart of the GrADS Model
• Fundamental mechanism for managing mapping and execution
What are they?
• Mappings from resources to performance
• Mechanisms for determining when to interrupt and reschedule
Abstract Definition
• Random variable r(A,I,C,t0) with a probability distribution
• A = application, I = input, C = configuration, t0 = time of initiation
• Important statistics: lower and upper bounds (95% confidence)
Challenge
• When should a contract be considered violated?
• Strict adherence balanced against the cost of reconfiguration
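A minimal Python sketch of how a contract of this kind might be checked, under stated assumptions: the metric is a progress rate (work units per second), the 95% bounds are taken as empirical percentiles of past runs with the same A, I, and C, and the violation rule (reschedule only if the rate falls below the lower bound and moving is expected to finish sooner after paying the reconfiguration cost) is one hypothetical way to balance adherence against reconfiguration cost, not the GrADS implementation. All function names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Contract:
    """Acceptable range for a performance metric (here: work units per second)."""
    lower: float
    upper: float


def percentile(values: list[float], q: float) -> float:
    """Linearly interpolated percentile of `values`, with q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def make_contract(history: list[float], confidence: float = 0.95) -> Contract:
    """Empirical central interval of r(A, I, C, t0), estimated from past runs
    of the same application, input, and configuration (an assumption)."""
    tail = (1.0 - confidence) / 2.0
    return Contract(percentile(history, tail), percentile(history, 1.0 - tail))


def should_reschedule(contract: Contract, recent_rates: list[float],
                      remaining_work: float, alt_rate: float,
                      reconfig_cost_s: float) -> bool:
    """Interrupt and reschedule only if the contract is violated AND moving
    is expected to finish sooner once the reconfiguration cost is paid."""
    observed = mean(recent_rates)
    if observed >= contract.lower:
        return False                    # within the contract
    if observed <= 0:
        return True                     # no progress at all: staying never finishes
    time_if_stay = remaining_work / observed
    time_if_move = reconfig_cost_s + remaining_work / alt_rate
    return time_if_move < time_if_stay
```

With a contract built from past rates 1 to 100, a run observed at rate 1.0 with 1000 units left is moved to a resource expected to deliver 50 units/s if reconfiguration costs 60 s, but not if it costs 2000 s: the same violation triggers different decisions depending on the cost of acting on it.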
(Figure: Grid Runtime System)
Dynamic Adaptive Systems Software
for Robust and Dependable Large-Scale Systems
{Adve & Sanders}
Montage: An Integrated End-to-End Design
and Development Framework for Wireless Networks
PI: Rappaport (& Browne, Shakkottai, Ramakrishnan, Varadarajan) {UT Austin, Virginia Tech}
The project advanced the state of the art in fast and efficient methods for simulating large-scale networks.
Deliverables:
– Generated a wide range of analytical and simulation-based modeling methods
– Developed a wireless channel simulator (the Site Specific Software Simulator for
Wireless, S^4W)
• S^4W was used by the PIs to develop more powerful and efficient techniques for
improving end-to-end network performance for users of both wired and wireless
networks
• S^4W has been used by several universities (in the US and Canada), industry
(Boeing), NASA, and commercial business (Schlotzsky’s Deli)
• Developed fast simulation capabilities for networks
• Fast hybrid network simulation using spatiotemporal dilations
• FluNet: hybrid simulation-emulation environment, based on combined fluid models
• Developed a scalable parallel discrete event simulator (Shakkottai, Ramakrishnan)
• Open Network Emulator
– Highly scalable distributed direct code execution environment; supports both simulation
and emulation in a single tool; novel method using the notion of Relativistic Time, in which
the global virtual time is derived by dilating the real (wall-clock) time
– Productivity with Performance through Components & Composition (Browne)
• P-COM^2 environment: automated compile-time/runtime composition of
parallel programs, applied here to performance modeling
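The Open Network Emulator item above derives global virtual time by dilating wall-clock time. A minimal Python sketch of that general idea follows; the `DilatedClock` class, its `dilation` factor k, and `perceived_bandwidth` are hypothetical names, not the emulator's actual interface. With k > 1, one virtual second spans k wall-clock seconds, so host CPUs and links appear k times faster to the emulated system.

```python
import time
from typing import Callable


class DilatedClock:
    """Virtual clock derived from a wall clock by a dilation factor k:
    virtual_time = virtual_anchor + (wall_time - wall_anchor) / k.
    Hypothetical sketch; not the Open Network Emulator's interface."""

    def __init__(self, dilation: float,
                 wall_clock: Callable[[], float] = time.monotonic):
        if dilation <= 0:
            raise ValueError("dilation must be positive")
        self.dilation = dilation
        self._wall = wall_clock
        self._wall0 = wall_clock()   # wall-clock anchor
        self._virt0 = 0.0            # virtual time at the anchor

    def now(self) -> float:
        """Current global virtual time."""
        return self._virt0 + (self._wall() - self._wall0) / self.dilation

    def set_dilation(self, dilation: float) -> None:
        """Change k without making virtual time jump (re-anchor first)."""
        if dilation <= 0:
            raise ValueError("dilation must be positive")
        w = self._wall()
        self._virt0 += (w - self._wall0) / self.dilation
        self._wall0 = w
        self.dilation = dilation


def perceived_bandwidth(physical_Bps: float, dilation: float) -> float:
    """Bytes per *virtual* second that a link of the given physical rate delivers."""
    return physical_Bps * dilation
```

For example, with k = 10 a 1 Gb/s physical link behaves like a 10 Gb/s link in virtual time, at the cost of the experiment taking ten times longer in wall-clock time. Passing a fake `wall_clock` makes the clock easy to test deterministically.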
A Fast, Cycle-Accurate Computer
System Technology
Fast and Accurate Simulation of Scalable Computer Systems
{Falsafi & Hoe}
ProtoFlex addresses full-system and scaling complexity for FPGA-based simulation in two ways.
Hybrid emulation (a) avoids reconstruction of the entire system on FPGAs.
Interleaved emulation (b) lets us decouple the size and complexity of the simulated system from that of the
underlying FPGA host.
(Figure: (a) Hybrid Emulation; (b) Multiple-context Interleaved Emulation)
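Interleaved emulation decouples the number of simulated processors from the host resources by time-multiplexing many processor contexts on one engine. The Python sketch below shows only that scheduling idea in software, assuming a hypothetical three-instruction toy ISA (`addi`, `bnez`, `halt`); it is not how ProtoFlex implements interleaving in FPGA hardware.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Architectural state of one simulated CPU (hypothetical toy ISA)."""
    program: list            # list of (opcode, operand) tuples
    pc: int = 0
    acc: int = 0
    halted: bool = False


def step(ctx: Context) -> None:
    """Execute one instruction of the toy ISA on the given context."""
    op, arg = ctx.program[ctx.pc]
    if op == "addi":             # acc += immediate
        ctx.acc += arg
        ctx.pc += 1
    elif op == "bnez":           # branch to `arg` if acc != 0
        ctx.pc = arg if ctx.acc != 0 else ctx.pc + 1
    elif op == "halt":
        ctx.halted = True
    else:
        raise ValueError(f"unknown opcode {op!r}")


def interleaved_run(contexts: list[Context], max_cycles: int = 10_000) -> int:
    """One host 'engine' multiplexed round-robin over many contexts: each host
    cycle advances exactly one non-halted context by one instruction."""
    cycles = 0
    i = 0
    while cycles < max_cycles and not all(c.halted for c in contexts):
        ctx = contexts[i % len(contexts)]
        if not ctx.halted:
            step(ctx)
            cycles += 1
        i += 1
    return cycles
```

Adding more `Context` objects grows the simulated system without adding engines; the host simply spends more cycles per simulated cycle of each processor, which is the trade-off interleaving makes.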
Examples of Modeling & Analysis Efforts
(Performance Modeling Frameworks)
• FPGA Accelerated Simulation Technologies: functional simulator + timing model
(implemented in FPGAs) for the fastest cycle-accurate, full-system simulator (within
1-3 orders of magnitude of real hardware)
• Fast and accurate simulation through sampling: checkpointing to capture the
microarchitectural state, and performing cycle-accurate simulation only in the
selected sampled regions, to simulate full (unmodified) applications
• Structural and composable performance simulation of complex systems: the
effort constructs simulators from system descriptions and component libraries
(e.g., an Itanium 2 simulator produced in 11 weeks, accurate to within 3% of actual hardware)
• Real-time large-scale network simulation environment, through a hybrid of
continuous and event-driven simulation paradigms: a fluid-model
representation of the mean traffic combined with a packet-oriented simulation. The hybrid
testbed will combine the advantages of analytical models, simulation and emulation,
and physical network testbeds.
• Component-based software environment for simulation, emulation, and synthesis
of network protocols, integrating model checking with event-driven simulations
to allow performance evaluation and protocol validation in a unified way
• End-to-end design and development framework for large-scale wireless
networks, composed through capabilities developed under: problem solving
environments; application compile-time and runtime composition methods to
compose the simulation and emulation systems for setting up experimental
testbeds; performance engineering methods (of the POEMS project); the
Weaves runtime and P-COM for parallel/distributed execution of discrete
event simulations; and the relativistic time temporal model developed under the
collaboration. The framework integrates low-level channel models with higher-level
protocol layers.
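The sampling approach above (cycle-accurate simulation only in selected regions, with fast simulation elsewhere) rests on a statistical estimate. A minimal Python sketch of systematic sampling, under stated assumptions: the per-interval CPI list stands in for what a cycle-accurate simulator would report for each interval, `period` and `offset` are hypothetical parameters, the synthetic trace is made up, and the normal-approximation confidence interval is one common choice, not necessarily the one the cited efforts use. Checkpointing of microarchitectural state is not modeled.

```python
import math
import random
from statistics import mean, stdev


def sampled_cpi(interval_cpi: list[float], period: int, offset: int = 0,
                z: float = 1.96) -> tuple[float, float, int]:
    """Systematic sampling estimate of whole-program CPI.

    Only every `period`-th interval (starting at `offset`) is "simulated in
    detail"; here that just means reading its entry in `interval_cpi`, which
    stands in for cycle-accurate results. Returns (estimate, half-width of an
    approximate 95% confidence interval, number of sampled intervals)."""
    sample = interval_cpi[offset::period]
    if len(sample) < 2:
        raise ValueError("need at least two sampled intervals")
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    return mean(sample), half_width, len(sample)


if __name__ == "__main__":
    # Synthetic per-interval CPI trace (hypothetical): phase behavior + noise.
    rng = random.Random(1)
    trace = [1.0 + 0.5 * math.sin(i / 50) + rng.gauss(0, 0.1)
             for i in range(100_000)]
    est, hw, n = sampled_cpi(trace, period=1000)
    print(f"estimated CPI {est:.3f} +/- {hw:.3f} from {n} intervals; "
          f"true mean {mean(trace):.3f}")
```

Here only 1 interval in 1000 is measured in detail, yet the estimate carries its own error bound; that bound is what lets a sampling simulator trade detailed-simulation time against a stated accuracy target.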
Examples of Modeling & Analysis Efforts
(Application modeling, resource management, …)
• Modeling system for enabling algorithm designers and programmers to develop,
evaluate, and compare application algorithms for CMP/CMT systems
• Software tools to enable access to coordinated information collected through
hardware-based profiling of local and remote memory accesses of application
computation and communication patterns
• Dynamic profiling of application phases for optimizing power consumption under set
performance constraints for reconfigurable multi-core environments and data servers
• Cross-platform performance estimation by partial execution of applications, capturing
computation and communication parameters, and generalizing prediction to problem-scaling scenarios, in parallel and distributed platforms
• Language support for continuous monitoring of distributed systems, grids, and other data-centric and network systems
• Adaptive resource sharing mechanisms autonomically matching resources to
dynamically changing needs via statistical and stochastic approaches
• Data-driven resource allocation in complex systems, through workload
characterization, analytical models, and policy development
• Compiler-enabled model- and measurement-driven adaptation environment for
dependability and performance (performability)
• Engineering reliability at software design time by coupling software component
architectural models with statistical methods to address uncertainties in the design stage
• Tools for proactive runtime system health monitoring and enhancement for large-scale parallel systems: collecting data over extended periods of time and in real time, analyzing it through on-line models, filtering and correlating
evolving failure data with respect to factors such as workload and operating
temperature, and using this information to schedule or checkpoint jobs
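The partial-execution idea in the fourth item can be illustrated by fitting simple scaling models to a few small runs and extrapolating. In this hypothetical Python sketch, computation and communication times are each assumed to follow a power law t = c * n^p in problem size n, fitted by least squares in log-log space; the function names and the power-law form are assumptions, not the cited project's method.

```python
import math


def fit_power_law(sizes: list[float], times: list[float]) -> tuple[float, float]:
    """Least-squares fit of log t = log c + p * log n; returns (c, p)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx
    return math.exp(my - p * mx), p


def predict_runtime(comp_fit: tuple[float, float],
                    comm_fit: tuple[float, float], n: float) -> float:
    """Predicted time at problem size n = computation part + communication part,
    each extrapolated from its own fit (assumes the two do not overlap)."""
    (cc, pc), (cm, pm) = comp_fit, comm_fit
    return cc * n ** pc + cm * n ** pm
```

Fitting the two components separately matters because they usually scale differently; a single fit to total time would blur a compute-bound regime into a communication-bound one. Real codes with cache or network regime changes would need richer models than a single power law.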
Summary Thoughts
• Large-scale high-end systems cannot be treated as isolated
platforms
• Such systems demand enhanced and optimized computation,
communication, and data management capabilities in the
presence of resource heterogeneity, dynamicity, and adaptivity
• We need to advance the technologies that will automate the
mapping of complex and dynamic applications onto complex
platforms with multiple and heterogeneous levels of
processors, memory, and networks
• Modeling and analysis methods (performance engineering
of systems) are crucial to enabling optimized design, runtime,
and management of such systems
Dynamic Adaptive Systems Software
for Robust and Dependable Large-Scale Systems
Award 0406351: A Compiler-Enabled Model- and
Measurement-Driven Adaptation Environment for
Dependability and Performance
William Sanders and Vikram Adve
Develops compiler-controlled performance data monitoring together
with performance models for adaptive and optimized runtime
support, in environments where the underlying computational,
communication, and storage resources may be changing, as well as
environments where the application requirements may also be
changing
Combines, and advances in novel directions, work on dynamic runtime
compilation methods (LLVM) developed by Adve in 0093426 (CAREER) NGS: Techniques and Applications of Dynamic Compilation,
and system-level integrated performance methods developed by
Sanders in 0228762 - Next Generation Software: An Integrated
Framework for Performance Engineering and Resource-Aware
Compilation
In addition to the multidisciplinary work across two sub-areas of
computer science (compilers, and performance modeling and analysis),
the project includes collaboration with industry, specifically with
two senior researchers from AT&T Labs-Research, which provides
resources such as production-level software to drive and validate the
research methods, and also provides opportunities for student
internships at the AT&T Research Lab.
Other Technical impacts of the individual projects:
Möbius is a performance engineering framework and tool for the evaluation of
distributed and parallel computing systems, accounting for system components
including the application software itself, the operating system, and the underlying
computing and communication hardware. The framework provides a means by which
multiple, heterogeneous models can be composed together, each representing a
different module (software or hardware), component, or view of the system.
Möbius has made a significant worldwide impact in the research area of stochastic model
analysis. The impact spans both academic and commercial domains. In addition to being the
principal tool used in the graduate-level system reliability courses at the University of Illinois,
USA and the Univ. of Florence, Italy, Möbius has been licensed to over 150 university sites
throughout the world for teaching and research purposes. Through international partnerships,
research groups from the Univ. of Twente, Dortmund University, the University of the Federal
Armed Forces München, and Saarland University are working with the Möbius team to
develop plug-in modules for the Möbius framework. The first International Möbius
Developer’s Working Group meeting was held in Sept. 2004, further increasing the number of
groups that use Möbius in their research.
Möbius has also been licensed for commercial use to many companies, including Motorola,
Iridium, Pioneer Hi-Bred, Windber Research Institute, General Dynamics, and Boeing. For
example, Möbius has been used for numerous telecommunications and computer system
applications at Motorola and was designated one of three company-wide system availability
modeling packages.
Recently, researchers have begun to use Möbius for biological applications; over 25 universities,
as well as Pioneer Hi-Bred (the world's largest seed producer) and Windber Research Institute
(a non-profit research organization with projects studying the disease progression of breast
cancer), have licensed it for use with biological systems.
Other Technical impacts of the individual projects:
The LLVM compiler infrastructure has been publicly distributed since
October 2003 and has been downloaded well over 2000 times since then.
It has attracted at least 40 serious users in academia (instructors and
researchers) and industry (startups and established companies).
Apple Computer has not only adopted LLVM but has also set up an active
group of developers working on incorporating LLVM into Apple’s products,
such as the next release of Mac OS, due in Spring 2007.
A paper on novel methods developed under the project and incorporated in LLVM,
“Automatic Pool Allocation,” won a Best Paper award at the
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI), the premier conference in the area of compilers.