MemScale: Active Low-Power Modes
for Main Memory
Qingyuan Deng, David Meisner*, Luiz Ramos,
Thomas F. Wenisch*, and Ricardo Bianchini
Rutgers University
*University of Michigan
Server memory power challenges
[Figure: Power consumption (% of peak) of a Google server vs. compute load (%) — Barroso & Hoelzle '07]
• DRAM power varies little with load
• Memory power represents 30-40% of total power for typical loads
• The fraction would be even larger if memory controller power were included
Improving memory energy efficiency
• Observation: Memory bandwidth is rarely fully utilized [Meisner’11];
we can save energy during periods of light and moderate load
• Previous approaches
• Leveraging DRAM idle low-power state
[Lebeck’00][Delaluz’01][Li’04][Diniz’07]…
• Rank sub-setting and DRAM reorganization
[Ahn’09][Udipi’10][Zheng’10]…
• Memory controller power is typically not considered
• Need active low-power modes to save energy when underutilized
• Frequency has greater impact on bandwidth than latency
MemScale: Active low-power modes for memory
• Goal: Dynamically scale memory frequency to conserve energy
• Hardware mechanism:
• Frequency scaling (DFS) of the channels, DIMMs, DRAM devices
• Voltage & frequency scaling (DVFS) of the memory controller
• Key challenge:
• Conserving significant energy while meeting performance constraints
• Approach:
• Online profiling to estimate performance and bandwidth demand
• Epoch-based modeling and control to meet performance constraints
• Main result:
• System energy savings of 18% with average performance loss of 4%
Outline
• Motivation and overview
• Background on memory systems
• MemScale: DVFS for the memory system
• Results
• Conclusions
Impact of frequency scaling on memory latency
[Figure: Timing of a memory request (MC queuing, PRE, ACT, CL, burst, reply) at 800 MHz vs. 400 MHz]
• For DDR3 DRAM, scaling the frequency from 800 MHz to 400 MHz cuts peak bandwidth by 50% but raises latency by only ~10% (worked example below)
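Why latency barely changes: only the burst-transfer time scales with the bus frequency, while the DRAM core operations (ACT, CL, PRE) take a fixed number of nanoseconds. A minimal Python sketch with illustrative timing constants (assumptions, not the paper's exact DDR3 parameters):

   # Illustrative timing split of one DRAM access; the constants are assumptions.
   CORE_NS = 45.0      # ACT + CL + PRE: DRAM core latency, fixed in nanoseconds
   BURST_BEATS = 8     # beats per cache-line burst (DDR: 2 beats per bus cycle)

   def access_latency_ns(bus_mhz):
       burst_ns = BURST_BEATS / (2.0 * bus_mhz) * 1000.0  # transfer time scales as 1/f
       return CORE_NS + burst_ns

   def peak_bandwidth_gbs(bus_mhz, bus_bytes=8):
       return 2.0 * bus_mhz * 1e6 * bus_bytes / 1e9       # DDR: 2 transfers per cycle

   for f_mhz in (800, 400):
       print(f_mhz, access_latency_ns(f_mhz), peak_bandwidth_gbs(f_mhz))
   # 800 MHz: ~50 ns, 12.8 GB/s;  400 MHz: ~55 ns (+10%), 6.4 GB/s (-50%)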
Opportunity for MemScale
[Figure: Normalized memory-system power breakdown (Background, Dynamic, MC) for memory-intensive, intermediate, and compute-intensive workloads]
Background: clock tree, I/O driver, register, PLL, DLL, refresh, others
Dynamic: read, write, termination
MC: memory controller
• Effects of a lower frequency on power (derivation sketched below):
   • Lowers background power linearly (∝ f)
   • Lowers MC power by a cubic factor (∝ f³) under DVFS
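A rough derivation of these trends (a sketch, assuming the MC voltage scales with its frequency under DVFS, while the DIMM/DRAM voltage stays fixed under DFS):

   P_{background} \propto C \, V_{DRAM}^{2} \, f \propto f      (V_{DRAM} fixed, so power tracks f)
   P_{MC} \propto C \, V_{MC}^{2} \, f \propto f^{3}            (DVFS: V_{MC} \propto f, so power tracks f^{3})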
Outline
• Motivation and overview
• Background on memory systems
• MemScale: DVFS for the memory system
• Results
• Conclusions
MemScale design
• Goal: Minimize energy under user-specified slowdown bound
• Approach: OS-managed, epoch-based memory frequency tuning
• Each epoch (e.g., an OS quantum) — sketched in pseudocode below:
1. Profile performance & bandwidth demand
   • New performance counters track memory latency and queue occupancies
2. Estimate performance & energy at each frequency
   • Models estimate queuing delays & system energy
3. Re-lock to the best frequency; continue tracking performance
   • Slack: delta between estimated & observed performance
4. Carry slack forward into the performance target for the next epoch
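A minimal sketch of this per-epoch loop (Python-style pseudocode; the counter-reading, model, and frequency-setting helpers are hypothetical placeholders, not the paper's exact interface):

   def memscale_epoch(freqs, slowdown_bound, state):
       prof = read_perf_counters()                  # 1. memory latency, queue occupancies, BW demand
       target_cpi = state.baseline_cpi * (1.0 + slowdown_bound) + state.slack
       best_f, best_energy = max(freqs), float('inf')
       for f in freqs:                              # 2. estimate performance & energy per frequency
           cpi = estimate_cpi(prof, f)              #    queuing-delay model
           energy = estimate_system_energy(prof, f)
           if cpi <= target_cpi and energy < best_energy:
               best_f, best_energy = f, energy
       set_memory_frequency(best_f)                 # 3. re-lock MC/bus/DRAM to the chosen frequency
       actual_cpi = run_epoch_and_measure()         #    keep tracking actual performance
       state.slack = target_cpi - actual_cpi        # 4. carry slack into the next epoch's target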
Frequency and slack management
[Figure: Timeline over epochs 1-4 — each epoch the CPU is profiled, models estimate performance/energy and calculate slack vs. the target, and the MC, bus, and DRAM re-lock between high and low frequencies; actual performance runs ahead of (positive slack) or behind (negative slack) the target]
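The slack bookkeeping above can be written as (an illustrative formulation, not the paper's exact equations):

   slack_i = T^{target}_i - T^{actual}_i, \qquad T^{target}_{i+1} = T^{bound}_{i+1} + slack_i

Positive slack lets a later epoch run the memory system at a lower frequency; negative slack forces a higher frequency so the workload catches back up to its target.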
Modeling of performance and energy
• New performance counters enable estimates of:
   • Level of contention (bank and bus)
   • Energy consumption
   • CPI of each application
   • Average memory latency
   • Performance slack
• Models combine these to estimate full-system energy (illustrated below)
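As an illustration of how these estimates could combine (a simplified stand-in for the paper's full queuing model, with hypothetical symbols):

   CPI_{est}(f) \approx CPI_{compute} + \frac{mem\ accesses}{instruction} \cdot L_{mem}(f)
   E_{sys}(f) \approx \left(P_{rest} + P_{mem}(f)\right) \cdot \frac{N_{instr} \cdot CPI_{est}(f)}{f_{core}}

where L_{mem}(f) is the predicted average memory latency (service plus queuing delay) at memory frequency f.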
MemScale adjusts frequency dynamically
[Figure: Memory frequency chosen by MemScale over time for workload mix MID3]
Outline
• Motivation and overview
• Background on memory systems
• MemScale: DVFS for the memory system
• Results
• Conclusions
Methodology
• Detailed simulation
• 16 cores, 16MB LLC, 4 DDR3 channels, 8 DIMMs
• Multi-programmed workloads from SPEC suites
• Power modes
• 10 frequencies between 200 and 800 MHz
• Power consumption
• Micron’s DRAM power model
• Memory system power = 40% of total server power
Results – energy savings and performance
[Figure: Average energy savings (full-system and memory-system) and performance overhead (CPI increase for the multiprogram average and the worst program in each mix, vs. the CPI degradation bound) for the ILP, MID, MEM, and AVG workload mixes]
• Memory system energy savings of 44%
• Full-system energy savings of 18%, always within the performance bound
Alternative approaches
• Fast power-down
• Transition ranks into fast power-down mode when idle
• Decoupled-DIMM [Zheng’09]
• Low frequency DRAM + high frequency DIMMs & channels
• Static
• Pre-selected active low-power mode w/o dynamic scaling
• Unrealistic: needs a priori knowledge of workload behavior
Results – comparison to alternative approaches
[Figure: Full-system energy savings and performance overhead (CPI increase, multiprogram average and worst program in mix) on the MID workloads, comparing Fast-PD, Decoupled-DIMM [Zheng '09], Static, MemScale, and MemScale + Fast-PD]
Conclusions
• MemScale contributions:
• Active low-power modes for the memory subsystem
• New perf. counters to capture energy and contention
• OS policy to choose best power mode dynamically
• Avg 18% system energy savings, avg 4% performance loss
• In the paper
• Performance and energy models
• Sensitivity analyses (including lower performance bounds)
• Energy break-down comparison
THANKS!
SPONSORS: