TAU Performance Profiling of the VTF Code
Julian C. Cummings (Caltech/CACR) and Sameer Shende (University of Oregon)
TAU Performance System Framework
Overview of VTF Code Profile
Program Database Toolkit
Application / Library






Tuning and Analysis Utilities
Performance system framework for scalable parallel and distributed
high-performance computing
Targets a general complex system computation model
➢ nodes / contexts / threads
➢ Multi-level: system / software / parallelism
➢ Measurement and analysis abstraction
Integrated toolkit for performance instrumentation, measurement,
analysis, and visualization
➢ Portable, configurable performance profiling/tracing facility
➢ Open software approach
University of Oregon, LANL, FZJ Germany
http://www.cs.uoregon.edu/research/paracomp/tau
[Figure: PDT components. C / C++ parser and Fortran 77/90 parser → IL → C / C++ IL analyzer and Fortran 77/90 IL analyzer → Program Database Files → DUCTAPE]

VTF code run with 2 solid nodes and 8
fluid nodes (Nodes 0 and 2 are solid
and fluid server nodes)
Solid solver adlib computes Ta wall
response to pressure bubble with FEM
Fluid solver arm3d evolves bubble
with Perfect Gas EOS using Godunov
scheme on uniform Cartesian grid
Try to balance solid & fluid workload,
reduce waiting at end of each timestep
Use of highly refined solid mesh leads
to expensive broadcast of solid
boundary location data from fluid
server to other nodes (long green bars)
PDBhtml: program documentation
SILOON: application component glue
CHASM: C++ / F90 interoperability
TAU_instr: automatic source instrumentation

Program Database Toolkit (PDT)

Program code analysis framework for developing source-based tools for C99,
C++ and F90

High-level interface to source code information

Widely portable: IBM, SGI, Compaq, HP, Sun, Linux clusters, Windows, Apple, Hitachi,
Cray T3E ...

Use TAU profile group labels
to produce text or graphical
profiles of a specific component

Compare function call and
timing data for adlib and arm3d
Integrated toolkit for source code parsing, database creation, and database query

Analysis gives simple view of
where each solver spends time

commercial grade front end parsers (EDG for C99/C++, Mutek for F90)

Intel/KAI C++ headers for std. C++ library distributed with PDT

portable IL analyzer, database format, and access API

open software approach for tool development

Target and integrate multiple source languages

Used in CCA for automated generation of SIDL

Used in TAU to build automated performance instrumentation tools
(tau_instrumentor)

Focusing on Code Component Performance



VTF code components and Python bindings are built
using a set of scripts and definition files (the build procedure)
➢ Added new tau build target for code instrumentation
Focus code tuning efforts

Sum time spent in main solver
functions in each time step to
check workload balance of
solid and fluid solvers

Scan across nodes in solver
group to assess load balance of
each individual solver package
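
The two checks above (balance between the solid and fluid solvers, and balance across nodes within each solver group) can be sketched in Python. This is a minimal illustration and not the analysis used for the poster: the node-to-group assignment, the timings, and the names `imbalance` and `solver_balance` are all assumptions made for the example.

```python
from statistics import mean

def imbalance(times):
    """Ratio of slowest node to group average; 1.0 means perfectly balanced."""
    avg = mean(times)
    return max(times) / avg if avg > 0 else 1.0

def solver_balance(per_node_seconds, groups):
    """Summarize time in the main solver function for each solver group.

    per_node_seconds: {node_id: seconds spent in the main solver function}
    groups: {group_name: [node_id, ...]}
    """
    summary = {}
    for name, nodes in groups.items():
        times = [per_node_seconds[n] for n in nodes]
        summary[name] = {
            "mean": mean(times),
            "max": max(times),
            "imbalance": imbalance(times),
        }
    return summary

# Hypothetical per-node timings (seconds) for one time step, not measured data.
per_node = {0: 4.0, 1: 4.0, 2: 3.0, 3: 3.0, 4: 3.0,
            5: 3.0, 6: 3.0, 7: 3.0, 8: 3.0, 9: 6.0}
# Assumed layout: 2 solid nodes and 8 fluid nodes.
groups = {"solid": [0, 1], "fluid": [2, 3, 4, 5, 6, 7, 8, 9]}

report = solver_balance(per_node, groups)
# The group with the largest "max" is the one the other solver waits for
# at the end of the time step.
slowest_group = max(report, key=lambda g: report[g]["max"])
```

Comparing the `max` entries of the two groups addresses the solid versus fluid question, while each group's `imbalance` value addresses load balance within one solver package.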
Can be used to generate code for performance ports in CCA
TAU Instrumentation of the VTF
TAU Performance System Architecture
Colored bars indicate portion of total execution
time spent by each node in various functions




Code parser → .pdb file with description of functions
Tau instrumentor → auto-instrumented source code
Instrumented code compiled with build procedure
Component-specific definition file sets TAU profile
group name, for easy tracking of component performance
➢ Added pytau Python bindings to allow run-time enabling
and disabling of TAU profile objects or groups


pytau bindings are now included in TAU package
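
The idea of run-time enabling and disabling of profile groups can be illustrated with a small stand-alone Python sketch. It deliberately does not use pytau: the class `GroupProfiler` and its methods are hypothetical names invented for this example, and only mimic the behavior described above (timers tagged with a group name that measure only while their group is enabled).

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class GroupProfiler:
    """Toy profiler whose timers can be switched on or off by group name."""

    def __init__(self):
        self.enabled = set()
        self.totals = defaultdict(float)  # (group, name) -> seconds
        self.calls = defaultdict(int)     # (group, name) -> call count

    def enable_group(self, group):
        self.enabled.add(group)

    def disable_group(self, group):
        self.enabled.discard(group)

    @contextmanager
    def timer(self, group, name):
        if group not in self.enabled:
            yield  # group disabled: run the body without measuring
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[(group, name)] += time.perf_counter() - start
            self.calls[(group, name)] += 1

prof = GroupProfiler()
prof.enable_group("fluid")

with prof.timer("fluid", "advance"):  # recorded
    sum(range(1000))
with prof.timer("solid", "advance"):  # skipped: group never enabled
    sum(range(1000))

prof.disable_group("fluid")
with prof.timer("fluid", "advance"):  # skipped after run-time disable
    sum(range(1000))
```

Tagging every timer with a component group is what makes it cheap to restrict measurement to one solver, or to a key period of the run, without rebuilding the code.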




TAU Status






Instrumentation supported:
➢ Source, preprocessor, compiler, MPI, runtime, virtual machine
Languages supported:
➢ C++, C, F90, Java, Python
➢ HPF, ZPL, HPC++, pC++ ...
Packages supported:
➢ PAPI [UTK], PCL [FZJ] (hardware performance counter access),
➢ Opari, PDT [UO, LANL, FZJ], DyninstAPI [U. Maryland] (instrumentation),
➢ EXPERT, EPILOG [FZJ], Vampir [Pallas], Paraver [CEPBA] (visualization)
Platforms supported:
➢ IBM SP, SGI Origin, Sun, HP Superdome, Compaq ES,
➢ Linux clusters (IA-32, IA-64, PowerPC, Alpha), Apple, Windows,
➢ Hitachi SR8000, NEC SX, Cray T3E ...
Compiler suites supported:
➢ GNU, Intel KAI (KCC, KAP/Pro), Intel, SGI, IBM, Compaq, HP, Fujitsu,
Hitachi, Sun, Apple, Microsoft, NEC, Cray, PGI, Absoft, …
Thread libraries supported:
➢ Pthreads, SGI sproc, OpenMP, Windows, Java, SMARTS
Reductions in TAU Profiling Overhead

Auto-instrumentation of VTF via build procedure is very
convenient, but initially produced unacceptable overhead


Must avoid profiling functions with short execution times
and/or large number of calls
Removing functions that account for small portion of
overall time profile can help clarify performance analysis
Can disable profiling by group or selectively enable
profiling only during key periods of code execution
➢ Our approach: use selection files to list signatures of
functions to be included in or excluded from profiling




Very tedious to generate appropriate lists manually
With rule-based selection, profiling data is examined and
filtered to produce selection files, refine instrumentation
Using these techniques, profiling overhead is minimal
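
Rule-based selection of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the rules actually applied to the VTF: the thresholds, the profile records, and the function names `select_excluded` and `format_selection_file` are invented for the example. The `BEGIN_EXCLUDE_LIST` / `END_EXCLUDE_LIST` delimiters follow the selective instrumentation file format described in the TAU documentation; check them against the TAU version in use.

```python
def select_excluded(profile, max_calls=100_000,
                    min_usec_per_call=2.0, min_percent=0.5):
    """Return signatures of functions to exclude from instrumentation.

    A function is excluded if it is called very often and is cheap per call,
    or if it accounts for a negligible share of the total exclusive time.
    """
    total = sum(rec["exclusive_usec"] for rec in profile)
    excluded = []
    for rec in profile:
        per_call = rec["exclusive_usec"] / rec["calls"] if rec["calls"] else 0.0
        share = 100.0 * rec["exclusive_usec"] / total if total else 0.0
        hot_and_cheap = rec["calls"] > max_calls and per_call < min_usec_per_call
        if hot_and_cheap or share < min_percent:
            excluded.append(rec["signature"])
    return excluded

def format_selection_file(excluded):
    """Render the exclude list as the text of a selection file."""
    lines = ["BEGIN_EXCLUDE_LIST", *excluded, "END_EXCLUDE_LIST"]
    return "\n".join(lines) + "\n"

# Hypothetical profile records (signatures and numbers are made up).
profile = [
    {"signature": "void advance(double)",
     "calls": 100, "exclusive_usec": 9_000_000.0},
    {"signature": "double eos_pressure(double, double)",
     "calls": 5_000_000, "exclusive_usec": 5_000_000.0},
    {"signature": "void log_step(int)",
     "calls": 100, "exclusive_usec": 1_000.0},
]

excluded = select_excluded(profile)
selection_text = format_selection_file(excluded)
```

Here the frequently called, cheap `eos_pressure` and the negligible `log_step` are excluded, while `advance` stays instrumented. Re-running the profile after each filtering pass lets the selection file be refined iteratively.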
Summary – TAU Profiling of VTF Code
TAU is a C++ toolkit for code profiling/tracing
➢ Highly portable, automatic code instrumentation
➢ Support for MPI or multithreaded programs
➢ Graphical tools for analysis of code profiling data
➢ TAU instrumentation now fully integrated with our VTF
code build procedure using tau build target
➢ Many bug fixes and new features based on our early use
  ➢ Automated labeling of profiled function by group name
  ➢ Run-time access to code profiling statistics
  ➢ Selection of instrumented functions based on user criteria

TAU tools critical to enhancement of VTF performance

Tau Site Visit 2003 - California Institute of Technology