Are New Languages Necessary for Manycore?
David I. August
Department of Computer Science
Princeton University
[Figure: SPEC CPU integer performance over time. The curve is marked at 2004 and followed by a "?"; caption: "THIS is the Problem!"]
Why New Multicore Languages Will Fail
1. Money is earned by relieving customer pain
2. The Market
3. Legacy, Legacy, Legacy
4. Programmers adopt new programming models
5. Parallel programming is more difficult
6. Parallel programming models have longevity issues
7. Automatic Thread Extraction (ATE)
Automatic Thread Extraction
“That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!” – Richard Lethin
“Compiler can’t determine a tree from a graph…” – Burton Smith
“Compiler can’t determine dependences without type information. Even then…” – Burton Smith
“Decades of automatic parallelization work has been a failure…” – James Larus
“All that icky pointer chasing code...” – Tim Mattson
How To Get Parallelism For Multicore?
• Nine months ago, with an open mind…
• A priori select ALL C programs from SPEC CINT 2000
• Our objective function (in priority order):
1. Extract meaningful parallelism
2. Prefer automatic over manual
3. Minimize impact to the programmer when manual
Our Results
Benchmark      Threads at Peak   Speedup   LOCs Changed
164.gzip       32+               29.91     26
175.vpr        15                3.59      1
176.gcc        16                5.06      17
181.mcf        32+               2.84      0
186.crafty     32+               25.18     9
197.parser     32+               24.50     2
253.perlbmk    5                 1.21      0
254.gap        10                1.94      1
255.vortex     32+               4.92      0
256.bzip2      12                6.72      0
300.twolf      8                 2.06      1
GEOMEAN        17                5.54
ARITHMEAN      20                9.81
M.L.O.P.: 5 Generations, 32 Cores, 5.3x Speedup
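The two summary rows for the Speedup column can be checked directly from the per-benchmark numbers. The following is a small sketch in Python (not part of the original talk); the dictionary simply copies the Speedup column, and the helper names are illustrative.

```python
import math

# Speedups copied from the Speedup column of the results table.
speedups = {
    "164.gzip": 29.91,
    "175.vpr": 3.59,
    "176.gcc": 5.06,
    "181.mcf": 2.84,
    "186.crafty": 25.18,
    "197.parser": 24.50,
    "253.perlbmk": 1.21,
    "254.gap": 1.94,
    "255.vortex": 4.92,
    "256.bzip2": 6.72,
    "300.twolf": 2.06,
}

def arithmetic_mean(values):
    """Plain average: sum divided by count."""
    values = list(values)
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

geo = geometric_mean(speedups.values())
arith = arithmetic_mean(speedups.values())
print(f"GEOMEAN   {geo:.2f}")
print(f"ARITHMEAN {arith:.2f}")
```

The arithmetic mean is pulled up by the three benchmarks with speedups near 25x to 30x, which is why the geometric mean is the more conservative summary of the eleven programs.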
Our Recipe
Recent Compiler Technology:
• Decoupled Software Pipelining (DSWP) [MICRO 05]
• Parallel-Stage DSWP (PS-DSWP)
• Speculative DSWP (Spec-DSWP) [PACT 07]
• Existing Technology: Speculative DOALL, TLS
• Targeted Memory Profiling
• Procedure Boundary Elimination [PLDI 06]
Hardware Support:
• Compiler-Controlled Speculation
• Streaming Communication [MICRO 06]
Typical Example: 197.parser
Threads run on multicore model with Itanium 2 cores.
[Figure: pipeline stages: Find English Sentences -> Parse Sentences (95%) -> Emit Results. Labels: DSWP; PS-DSWP (Spec DOALL Middle Stage)]
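The shape of this pipeline can be imitated by hand to show what the compiler is aiming for: a sequential first stage, a replicated middle stage, and a sequential last stage connected by queues. The sketch below is a hand-written Python analogy, not the output of the DSWP compiler; the function names, the sentinel protocol, the in-order emit stage, and the trivial "parser" are all assumptions made for illustration. CPython threads show the structure only and are not expected to reproduce the reported speedups.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker

def find_sentences(text, out_q, n_workers):
    """Stage 1 (sequential): split the input into numbered sentences."""
    sentences = [part.strip() for part in text.split(".") if part.strip()]
    for index, sentence in enumerate(sentences):
        out_q.put((index, sentence))
    for _ in range(n_workers):  # one end marker per middle-stage worker
        out_q.put(SENTINEL)

def parse_worker(in_q, out_q):
    """Stage 2 (replicated): stand-in for parsing; here it only tokenizes."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        index, sentence = item
        out_q.put((index, sentence.split()))

def emit_results(in_q, n_workers, results):
    """Stage 3 (sequential): restore the original order before emitting."""
    pending, next_index, finished = {}, 0, 0
    while finished < n_workers:
        item = in_q.get()
        if item is SENTINEL:
            finished += 1
            continue
        pending[item[0]] = item[1]
        while next_index in pending:
            results.append(pending.pop(next_index))
            next_index += 1

def run_pipeline(text, n_workers=4):
    q1, q2, results = queue.Queue(), queue.Queue(), []
    threads = [threading.Thread(target=find_sentences, args=(text, q1, n_workers))]
    threads += [threading.Thread(target=parse_worker, args=(q1, q2))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=emit_results,
                                    args=(q2, n_workers, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline("the cat sat. dogs bark loudly. birds fly"))
```

Because data flows in one direction through the queues, the first and last stages never wait on each other directly, and the middle stage can be widened by raising `n_workers` without touching the other two.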
What We Learned
1. A new way of thinking about dependences: Go With the Flow
2. TLP is easier to extract than ILP
3. A holistic approach is better
4. A limitation exists in the sequential model: Determinism
Determinism: A Double Edged Sword
while(<cond>):
    <work>
    x = Rand()
    <work>

int Rand():
    state = f2(state)
    return f1(state)
[Figure: execution schedules of iterations 1-4 under DOALL and under SEQUENTIAL execution]
56 LOCs in 11 programs: 22 annotations
Only 2 programs needed more
Most common culprit: Custom Allocators
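The Rand() loop shows why determinism cuts both ways: the update of `state` chains every iteration to the previous one, so a strictly sequential reading forbids DOALL even though the program rarely cares which iteration receives which random number. The sketch below illustrates the relaxed reading in Python. It assumes a hypothetical linear congruential generator in place of the slide's f1 and f2, and it uses a lock to stand in for whatever mechanism keeps each call atomic; neither detail comes from the talk.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Rand:
    """Stand-in for the slide's Rand(); the LCG constants are assumptions."""

    def __init__(self, seed=1):
        self.state = seed
        self.lock = threading.Lock()

    def __call__(self):
        # Each call is atomic, but the order of calls is left unspecified.
        with self.lock:
            self.state = (1103515245 * self.state + 12345) % 2**31  # f2
            return self.state >> 16                                 # f1

def sequential(n, seed=1):
    """Iterations run in program order; iteration i gets the i-th number."""
    rand = Rand(seed)
    return [rand() for _ in range(n)]

def doall(n, seed=1, workers=4):
    """Iterations run in any order; each still gets exactly one number."""
    rand = Rand(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: rand(), range(n)))

if __name__ == "__main__":
    seq, par = sequential(8), doall(8)
    print(sorted(seq) == sorted(par))  # same numbers, possibly reassigned
```

The generator still steps through the same sequence of states in both versions; only the assignment of numbers to iterations may differ, which is the point of non-determinism the programmer would have to accept.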
What about Manycore?
Multicore
• New languages aren’t necessary
• Legacy code easily adjusted
Manycore
• Implicitly Parallel Sequential Programming
• No optimization for sequential (custom allocators)
• Points of non-determinism specified
• Parallel algorithms in sequential codes
• Debuggability, Understandability, Sanity
The Answer Originates with ATE
The Old Way:
PL folks would write languages,
Architecture folks would make HW, and
Compiler folks would dutifully connect the two.
This will fail for Manycore:
• The programmer will be unduly burdened
• Performance will suffer
There’s a New Way…
How Code Was Transformed
Benchmark      LOC (All)   LOC (Model)   Model Techniques   Compiler Techniques Applied
164.gzip       26          2             Y-Branch           TLS Memory, DSWP
175.vpr        1           1             PURE               Alias, Value, & Control Spec, TLS Mem, DSWP
176.gcc        17          7             PURE               Alias & Control Spec, TLS Mem, DSWP
181.mcf        0           0
186.crafty     9           9             PURE               TLS Mem, DSWP, Nested
197.parser     2           2             PURE               TLS Mem, DSWP
253.perlbmk    0           0
254.gap        1           1
255.vortex     0           0                                Alias & Value Spec, TLS Mem, DSWP
256.bzip2      0           0                                TLS Memory, DSWP
300.twolf      1           1                                Alias, Silent Store, & Control Spec, TLS Mem, DSWP, Nested
Remaining cells (row not indicated):
Model Techniques: PURE; PURE
Compiler Techniques Applied: Alias, Control, & Value Spec, DSWP; TLS Memory, DSWP, Alias Spec; Alias & Control Spec, TLS Mem, DSWP
PURE
Y-Branch
SPEC 2006: 403.gcc
Threads run on multicore model with Itanium 2 cores.