Design-Driven Compilation
Radu Rugina and Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Overview
Computation
+
Goal:
Parallelization
Analysis Problems:
Points-to Analysis, Region Analysis
Two Potential
Solutions
Fully
Automatic
Design
Driven
Evaluation
Example - Divide and Conquer Sort
7 4 6 1 3 5 8 2
Example - Divide and Conquer Sort
7 4 6 1 3 5 8 2
7 4
6 1
3 5
8 2
Divide
Example - Divide and Conquer Sort
7 4 6 1 3 5 8 2
7 4
6 1
3 5
8 2
Divide
4 7
1 6
3 5
2 8
Conquer
Example - Divide and Conquer Sort
7 4 6 1 3 5 8 2
7 4
6 1
3 5
8 2
Divide
4 7
1 6
3 5
2 8
Conquer
1 4 6 7
2 3 5 8
Combine
Example - Divide and Conquer Sort
7 4 6 1 3 5 8 2
7 4
6 1
3 5
8 2
Divide
4 7
1 6
3 5
2 8
Conquer
1 4 6 7
2 3 5 8
1 2 3 4 5 6 7 8
Combine
Divide and Conquer Algorithms
• Lots of Generated Concurrency
• Solve Subproblems in Parallel
Divide and Conquer Algorithms
• Lots of Recursively Generated Concurrency
• Recursively Solve Subproblems in Parallel
Divide and Conquer Algorithms
• Lots of Recursively Generated Concurrency
• Recursively Solve Subproblems in Parallel
• Combine Results in Parallel
“Sort n Items in d, Using t as Temporary Storage”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
“Recursively Sort Four Quarters of d”
void sort(int *d, int *t, int n)
Divide array into
if (n > CUTOFF) {
subarrays and
sort(d,t,n/4);
recursively sort
sort(d+n/4,t+n/4,n/4);
subarrays
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
“Recursively Sort Four Quarters of d”
void sort(int *d, int *t, int n)
Subproblems Identified
if (n > CUTOFF) {
Using Pointers Into
sort(d,t,n/4);
Middle of Array
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
7 4 6 1 3 5 8 2
d
d+n/4
d+n/2
d+3*(n/4)
“Recursively Sort Four Quarters of d”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
7 4 6 1 3 5 8 2
d
d+n/4
d+n/2
d+3*(n/4)
“Recursively Sort Four Quarters of d”
void sort(int *d, int *t, int n)
Sorted Results
if (n > CUTOFF) {
Written Back Into
sort(d,t,n/4);
Input Array
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
4 7 1 6 3 5 2 8
d
d+n/4
d+n/2
d+3*(n/4)
“Merge Sorted Quarters of d Into Halves of t”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
4 7 1 6 3 5 2 8
d
1 4 6 7 2 3 5 8
t
t+n/2
“Merge Sorted Halves of t Back Into d”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
1 2 3 4 5 6 7 8
d
1 4 6 7 2 3 5 8
t
t+n/2
“Use a Simple Sort for Small Problem Sizes”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
7 4 6 1 3 5 8 2
d
d+n
“Use a Simple Sort for Small Problem Sizes”
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+2*(n/2),t+2*(n/2),n/4);
sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4),d+n,t+n/2);
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
7 4 1 6 3 5 8 2
d
d+n
Parallel Sort
void sort(int *d, int *t, int n)
if (n > CUTOFF) {
spawn sort(d,t,n/4);
spawn sort(d+n/4,t+n/4,n/4);
spawn sort(d+2*(n/2),t+2*(n/2),n/4);
spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
sync;
spawn merge(d,d+n/4,d+n/2,t);
spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2);
sync;
merge(t,t+n/2,t+n,d);
} else insertionSort(d,d+n);
What Do You Need To Know To
Exploit This Form of Parallelism?
Points-to Information
(data blocks that pointers point to)
Region Information
(accessed regions within data blocks)
Information Needed To Exploit Parallelism
d and t point to different memory blocks
Calls to sort access disjoint parts of d and t
Together, calls access [d,d+n-1] and [t,t+n-1]
sort(d,t,n/4);
d
t
d+n-1
t+n-1
sort(d+n/4,t+n/4,n/4);
d
t
d+n-1
t+n-1
sort(d+n/2,t+n/2,n/4);
d
t
d+n-1
t+n-1
d
t
d+n-1
t+n-1
sort(d+3*(n/4),t+3*(n/4),
n-3*(n/4));
Information Needed To Exploit Parallelism
d and t point to different memory blocks
First two calls to merge access disjoint parts of d,t
Together, calls access [d,d+n-1] and [t,t+n-1]
merge(d,d+n/4,d+n/2,t);
d
t
d+n-1
t+n-1
merge(d+n/2,d+3*(n/4),
d+n,t+n/2);
d
t
d+n-1
t+n-1
merge(t,t+n/2,t+n,d);
d
t
d+n-1
t+n-1
Information Needed To Exploit Parallelism
Calls to insertionSort access [d,d+n-1]
insertionSort(d,d+n);
d
d+n-1
What Do You Need To Know To
Exploit This Form of Parallelism?
Points-to Information
(d and t point to different data blocks)
Symbolic Region Information
(accessed regions within d and t blocks)
How Hard Is It To Figure These
Things Out?
How Hard Is It To Figure These
Things Out?
Challenging
How Hard Is It To Figure These
Things Out?
void insertionSort(int *l, int *h) {
int *p, *q, k;
for (p = l+1; p < h; p++) {
for (k = *p, q = p-1; l <= q && k < *q; q--)
*(q+1) = *q;
*(q+1) = k;
}
}
Not immediately obvious that
insertionSort(l,h) accesses [l,h-1]
How Hard Is It To Figure These
Things Out?
void merge(int *l1, int*m, int *h2, int *d) {
int *h1 = m; int *l2 = m;
while ((l1 < h1) && (l2 < h2))
if (*l1 < *l2) *d++ = *l1++;
else
*d++ = *l2++;
while (l1 < h1) *d++ = *l1++;
while (l2 < h2) *d++ = *l2++;
}
Not immediately obvious that merge(l,m,h,d)
accesses [l,h-1] and [d,d+(h-l)-1]
Issues
• Heavy Use of Pointers
• Pointers into Middle of Arrays
• Pointer Arithmetic
• Pointer Comparison
• Multiple Procedures
• sort(int *d, int *t, n)
• insertionSort(int *l, int *h)
• merge(int *l, int *m, int *h, int *t)
• Recursion
Fully Automatic Solution
• Whole-program pointer analysis
• Context-sensitive, flow-sensitive
• Rugina and Rinard, PLDI 1999
• Whole-program region analysis
• Symbolic constraint systems
• Solve by reducing to linear programs
• Rugina and Rinard, PLDI 2000
Key Complication
Need for sophisticated interprocedural analyses
• Pointer analysis
• Propagate analysis results through call graph
• Fixed-point algorithm for recursive programs
• Region analysis
• Formulation avoids fixed-point algorithms
• Single constraint system for each strongly
connected component
• Need to have whole program in analyzable form
Bigger Picture
• Points-to and region information is (implicitly)
part of the interface of each procedure
• Programmer understands procedure interfaces
• Programmer knows
• Points-to relationships on entry
• Effect of procedure on points-to relationships
• Regions of memory blocks that procedure
accesses
Idea
Enhance procedure interface to make points-to
and region information explicit
• Points-to language
• Points-to graphs at entry and exit
• Effect on points-to relationships
• Region language
• Symbolic specification of accessed regions
• Programmer provides information
• Analysis verifies that it is correct
Points-to Language
f(p, q, n) {
context {
entry: p->_a, q->_b;
exit: p->_a, _a->_c,
q->_b, _b->_d;
}
context {
entry: p->_a, q->_a;
exit: p->_a, _a->_c,
q->_a;
}
}
Points-to Language
f(p, q, n) {
context {
entry: p->_a, q->_b;
exit: p->_a, _a->_c,
q->_b, _b->_d;
}
context {
entry: p->_a, q->_a;
exit: p->_a, _a->_c,
q->_a;
}
}
Contexts for f(p,q,n)
p
p
q
q
p
p
q
q
entry
exit
Verifying Points-to Information
One (flow sensitive) analysis per context
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
p
p
q
q
entry
exit
Verifying Points-to Information
Start with entry points-to graph
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
p
q
q
q
p
p
q
q
entry
exit
Verifying Points-to Information
Analyze procedure
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
p
q
q
entry
q
p
p
q
q
exit
Verifying Points-to Information
Analyze procedure
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
p
p
p
q
q
q
entry
exit
Verifying Points-to Information
Check result against exit points-to graph
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
p
p
p
q
q
q
entry
exit
Verifying Points-to Information
Similarly for other context
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
p
p
q
q
entry
exit
Verifying Points-to Information
Start with entry points-to graph
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
p
q
q
q
p
p
q
q
entry
exit
Verifying Points-to Information
Analyze procedure
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
entry
p
q
p
p
q
q
exit
Verifying Points-to Information
Check result against exit points-to graph
Contexts for f(p,q,n)
f(p,q,n) {
.
.
.
}
p
p
q
q
p
p
p
q
q
q
entry
exit
Analysis of Call Statements
g(r,n) {
.
.
f(r,s,n);
.
.
}
Analysis of Call Statements
Analysis produces points-graph before call
g(r,n) {
.
r
.
s
f(r,s,n);
.
.
}
Analysis of Call Statements
Retrieve declared contexts from callee
g(r,n) {
.
r
.
s
f(r,s,n);
.
.
}
Contexts for f(p,q,n)
p
p
q
q
p
p
q
q
entry
exit
Analysis of Call Statements
Find context with matching entry graph
g(r,n) {
.
r
.
s
f(r,s,n);
.
.
}
Contexts for f(p,q,n)
p
p
q
q
p
p
q
q
entry
exit
Analysis of Call Statements
Find context with matching entry graph
g(r,n) {
.
r
.
s
f(r,s,n);
.
.
}
Contexts for f(p,q,n)
p
p
q
q
p
p
q
q
entry
exit
Analysis of Call Statements
Apply corresponding exit points-to graph
g(r,n) {
.
r
.
s
f(r,s,n);
r
.
s
.
}
Contexts for f(p,q,n)
p
p
q
q
p
p
q
q
entry
exit
Analysis of Call Statements
Continue analysis after call
g(r,n) {
.
.
f(r,s,n);
r
.
s
.
}
Analysis of Call Statements
g(r,n) {
.
.
f(r,s,n);
r
.
s
.
}
Result
• Points-to declarations
separate analysis of multiple
procedures
• Transformed
• global, whole-program
analysis into
• local analysis that
operates on each
procedure independently
Region Language
h(p,n) {
reads [p,p+n-1];
writes [p,p+n-1];
}
Region Language
h(p,n) {
reads [p,p+n-1];
writes [p,p+n-1];
}
reads p
p+n-1
writes p
p+n-1
Verifying Region Information
• Two region containment requirements
• Direct Accesses:
• Locations directly accessed by procedure
must be contained in declared regions
• Callees:
• Regions accessed by callees must be
contained in declared regions of caller
Verifying Region Information
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
Verifying Region Information
Extract directly accessed regions
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
reads p
p+n-1
writes p
p+n-1
Directly
Accessed
Regions
Verifying Region Information
Check inclusion within declared regions
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
reads p
p+n-1
writes p
p+n-1
reads p
p+n-1
writes p
p+n-1
Declared
Regions
for h(p,n)
Directly
Accessed
Regions
Verifying Region Information
Check inclusion for accesses of callees
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
Callees
Verifying Region Information
Start with call to h(p,n/2)
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
Verifying Region Information
Extract and translate regions for h(p,n/2)
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
reads p
p+n-1
writes p
p+n-1
Translated
Regions from
h(p,n/2);
Verifying Region Information
Check inclusion in declared regions of caller
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
h(p+n/2,n-n/2);
}
}
reads p
p+n-1
writes p
p+n-1
reads p
p+n-1
writes p
p+n-1
Declared
Regions
for h(p,n)
Translated
Regions from
h(p,n/2);
Verifying Region Information
Similarly for call h(p+n/2,n-n/2)
h(p,n) {
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
reads p
h(p+n/2,n-n/2);
writes p
}
}
p+n-1
p+n-1
Translated
Regions from
h(p+n/2,n-n/2);
Verifying Region Information
Check inclusion in declared regions of caller
reads p
h(p,n) {
writes p
if (n < k)
for (i=0;i<n;i++)
p[i] = p[i]-1;
else {
h(p,n/2);
reads p
h(p+n/2,n-n/2);
writes p
}
}
p+n-1
p+n-1
p+n-1
p+n-1
Declared
Regions
for h(p,n)
Translated
Regions from
h(p+n/2,n-n/2);
Verifying Region Information
Result
• Region declarations
separate analysis of
h(p,n) {
multiple procedures
if (n < k)
for (i=0;i<n;i++) • Transformed
p[i] = p[i]-1;
• global, whole-program
analysis into
else {
h(p,n/2);
• local analysis that
operates on each
h(p+n/2,n-n/2);
procedure independently
}
}
Experience
Experience
• Implemented points-to and region languages
• Integrated with points-to and region analyses
• Obtained Divide and Conquer Benchmarks
• Quicksort
(QS)
Sorting Programs
• Mergesort
(MS)
• Matrix multiply
(MM)
Dense Matrix
Computations
• LU decomposition (LU)
Scientific
• Heat
(H)
Computation
• Written in C
• We added points-to and region information
Results
• With points-to and region information, could
parallelize all benchmarks
• Points-to information speeds up points-to
analysis significantly (up to factor of two)
• Region information has no significant effect
on how fast region analysis runs
Programming Overhead
Proportion of C Code, Region Declarations, and
Points-to Declarations
1.00
0.75
C Code
0.50
Region
Declarations
0.25
Points-to
Declarations
0.00
QS
MS
MM
LU
H
Evaluation
How difficult is it to provide declarations?
Not that difficult.
• Have to write comparatively little code
• Must know information anyway
How much benefit does compiler obtain?
Substantial benefit.
• Simpler analysis software
(no complex interprocedural analysis)
• More scalable, precise analysis
Evaluation
Software Engineering Benefits of Points-to and
Region Declarations
• Analysis reflects programmers intention
• Enhanced code reliability
• Enhanced interface information
• Analyze incomplete programs
• Programs that use libraries
• Programs under development
Evaluation
Drawbacks of Points-to and Region Declarations
• Have to learn new language
• Have to integrate into development process
• Legacy software issues
(programmer may not know points-to and
region information)
Related Work
• Extended Type Systems
• FX/87 [GJLS87]
• Dependent Types [XF99]
• Issue: where put extended type information?
• Integrated with rest of program
• Separated from rest of program
• Program Verification
• ESC [DLNS98]
• PVS [ORRSS96]
Related Work
• Pointer Analysis
• Landi, Ryder, Zhang – PLDI93
• Emami, Ghiya, Hendren – PLDI94
• Wilson, Lam – PLDI96
• Rugina, Rinard – PLDI99
• Rountev, Ryder – CC01
• Region Analysis
• Triolet, Irigoin, Feautrier- PLDI86
• Havlak, Kennedy – IEEE TPDS91
• Rugina, Rinard – PLDI00
• Pointer Specifications
• Hendren, Hummel, Nicolau – PLDI92
• Guyer, Lin – LCPC00
Conclusion
• Basic idea:
• Programmer provides
• Points-to information
• Region information
• Analysis
• Verifies correctness
• Uses information to enable further
analyses and transformations
• Lots of benefits to compiler and programmer
Descargar

Design-Driven Compilation