Department of Computer Sciences
Dynamic Shape Analysis
via Degree Metrics
Maria Jump & Kathryn S. McKinley
Department of Computer Sciences
The University of Texas at Austin
{mjump,[email protected]
Software is Dynamic
 Software is always changing
 New expectations
 New algorithms
 New applications
 New users
 New notions of what is cool
 Limited only by human ambition
First Law of Software: Software is a gas!
It expands to fit the container it is in!
[Nathan Myhrvold, Former CTO, Microsoft]
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Software Complexity
Property of a system that is directly proportional to the
difficulty one has in comprehending the system
at a level and detail necessary to make changes
without introducing instability or functional regressions
[Peter Rosser, Microsoft Software Engineer]
 Developed by large teams
 Few, if any, understand all parts
 Need tools to help
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Program Analysis
Program =
Code
+
Data
Software complexity leads to heap complexity
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Heap Complexity
 Exacerbated by modern languages
 Objects are smaller and more numerous
 Most objects allocated on the heap
 Objects encode program state
 Heap contains semantic, memory, and
concurrency bugs
Program analysis is incomplete
without heap analysis
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
An Opportunity
 Automatic memory management
examines live data on a regular basis
 Understand heap usage
 Detect heap-based bugs
 Optimize program based on heap usage
Opportunity to perform
heap analysis dynamically
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Shape Analysis
Characterize the “shape” of heap-allocated
pointer-based data structures and
verify shape-preserving properties
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Shape Analysis
Static
Dynamic
 Conservatively characterizes
 Discover shape of current
all “possible” shapes at every
program point
 Only works on small
programs
Program =
Code
20-Jun-2009
Department of Computer Sciences
+
data structure
 Generate assertions
for verification
 Monitor shape during
entire execution
Data
Jump & McKinley
Shape Analysis
HomeBrewed
Custom
Library
Library
% of Heap
100
LinkedHashMap: 99.3%
80
60
BinaryTree: 35.8%
HashMap: 59.5%
40
20
OctTree: 98%
20-Jun-2009
Department of Computer Sciences
Custom: 33%
Library: 58%
averag e
x a la n
pm d
lus e a r ch
luin d e x
jy th o n
fo p
e clip s e
ch a r t
b lo a t
a n tlr
ja c k
m p e g a u d io
ja v a c
r a y tr a ce
je s s
co m p r e ss
0
Jump & McKinley
ShapeUp
 Desired goals
 Discover shape of current data structure
 Discover dynamic invariants
 Monitor expected shape of current data
structure during execution
 Heap summarization graph [POPL ‘07]
 Class field-wise graph (CFWG)
 Identifies recursive data structures (RDS)
 Summarize dynamic degree invariants
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Class Field-Wise Graph (CFWG)
Heap
CFWG
Home-brewed data structure from SPECjbb2000 (simplified)
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Class Field-Wise Graph (CFWG)
Heap
CFWG
Home-brewed data structure from SPECjbb2000 (simplified)
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Class Field-Wise Graph (CFWG)
Heap
CFWG
Degree Metrics
Home-brewed data structure from SPECjbb2000 (simplified)
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Degree Metrics
 Degree metrics per instance:
 Out-degree: # of outgoing references
 In-degree: # of incoming references
 Only count edges between objects
of same class
 Use word in header to track in-degree of
backbone objects
 Summarize by class per data structure
instance in a degree profile
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=1
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=1
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=1
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=2
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=2
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=2
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=3
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=3
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=3
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=4
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=5
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=6
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=6
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=6
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=7
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=8
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=9
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=9
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=9
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=10
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=10
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=10
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=11
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=12
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=13
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=13
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=13
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=14
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating Degree Profile
n=15
CFWG
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Calculating
Degree Profile
constant
n
CFWG
range
20-Jun-2009
Complete Binary Tree
Department of Computer Sciences
Jump & McKinley
Calculating
Degree Profile
constant
CFWG
range
left
20-Jun-2009
Complete Binary Tree
Department of Computer Sciences
right
Jump & McKinley
Interpreting Degree Profile
 Monitoring degree profile shows errors:
 Error: constant violation
 Warning: range violation [min-,max+]
 Indicate problem with RDS during
development and testing
 Monitor shape after deployment for
dynamically introduced errors
What kind of errors can be detected?
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Implementation and Methodology
 Jikes RVM
 SPECjvm98, DaCapo, SPECjbb2000
 Heap composition
 Showed degree metric are stable by class
 Overheads
 <8% total time
 <1% space overhead [used bits in header]
 Microbenchmarks
 Single RDS in isolation
 Random error injection
20-Jun-2009
Department of Computer Sciences
Singly-linked list
Doubly-linked list
Binary Tree
Binary Tree w/ PP
Jump & McKinley
Looking for errors
TRAINING
 Ran microbenchmark 100 times
 Merged together dynamic invariants
ERROR DETECTION
 Created RDS with 100,000 nodes
 Errors: 1, 2, 3, 4, 5, 10, 50, and 100
 Detected dynamic invariant violations
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Doubly-linked List
=0
=1
=2
in
0
2
n-2
out
0
2
n-2
circle
disconnect
20-Jun-2009
Department of Computer Sciences
cyclic
skip
Jump & McKinley
Doubly-linked List
1
2
=0
=1
=2
in
0
2
n-2
out
0
2
n-2
3
4
5
10
50
100
100
% D e te ct io n
90
80
70
60
50
40
30
20
10
20-Jun-2009
Department of Computer Sciences
Ra nd o m
S kip
C y c lic
D is co n ne ct
C irc le
0
Jump & McKinley
Binary Tree
=0
=1
=2
Complete Binary Tree
in
out
1
n-1
0
[50.0,50.2]
0
[49.8,50.0]
Full Binary Tree
in
out
1
n-1
0
[50.0,50.1]
0
[49.9,50.0]
Random Binary Tree
in
out
1
n-1
0
[33.6,35.7]
[28.7,35.1]
[32.3,35.6]
link
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Binary Tree
=0
=1
=2
Complete Binary Tree
in
out
1
n-1
0
[50.0,50.2]
0
[49.8,50.0]
Full Binary Tree
in
1
2
3
4
5
10
50
out
100
100
n-1
0
[50.0,50.1]
0
[49.9,50.0]
Random Binary Tree
90
80
in
70
60
out
50
1
n-1
0
[33.6,35.7]
[28.7,35.1]
[32.3,35.6]
40
30
20
10
B TR a n dom
B T F u ll
0
B T C o m p le te
% D e te ct ion
1
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Binary Tree w/ PP
=1
=2
=3
Complete Binary Tree w/ PP
in
[50.0,50.0]
2
[49.9,50.0]
out
[50.0,50.0]
2
[49.9,50.0]
link
Full Binary Tree w/ PP
in
[50.0,50.1]
1
[49.9,50.0]
out
[50.0,50.1]
1
[49.9,50.0]
Random Binary Tree w/ PP
in
[33.8,35.2]
[30.0,32.6]
[33.6,34.8]
out
[33.8,35.2]
[30.0,32.6]
[33.6,34.8]
disconnect
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
20-Jun-2009
Department of Computer Sciences
R an d o m
B TP R a n d o m
R an d o m
50
B TP F u ll
R an d o m
10
B TP C o m p le te
D isc on n e ct
5
B TP R a n d o m
D isc on n e ct
4
B TP F u ll
3
D isc on n e ct
2
B TP C o m p le te
L ink
1
B TP R a n d o m
B TP F u ll L in k
L ink
B TP C o m p le te
% D e te c tio n
Microbenchmark: Binary Tree w/ PP
100
100
90
80
70
60
50
40
30
20
10
0
Jump & McKinley
ShapeUp’s Contributions




Performs dynamic heap analysis to determine shape
of current data structure
Summarizes degree metrics using a class field-wise
graph with low overheads
Shows whole-heap degree metrics are not stable, but
class degree metrics are
Introduces degree profiles for RDS


Regular RDS have more invariants than random RDS
Degree profiles can detect some shape errors
Thank You!
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Thank You!
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Space Overhead
jess
# of
types
bm+VM
Eclipse
Geomean
1744
3365
avg
318
667
max
319
775
# of
avg
edges max
844
4090
861
7585
1142
0.094%
0.167%
0.233%
0.233%
Increased Alloc %
20-Jun-2009
Department of Computer Sciences
19%
2.7X
1747
334
346
904
Jump & McKinley
Normalized Total Time
Time Overhead
Heap Size Relative to Minimum
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Singly-linked List
=0
=1
in
1
n-1
out
1
n-1
1 Error
head to tail
 cyclic := creates a
cycle from tail to
random node
90
80
Detection %
 circle := attaches
100
70
60
50
40
30
20
10
20-Jun-2009
Department of Computer Sciences
SL L C y c lic
SL L C ir c le
0
Jump & McKinley
Microbenchmark: HashMap
=0
=1
=2
in
[31.9,51.6]
[48.4,68.1]
{0}
out
[31.9,51.6]
[48.4,68.1]
{0}
1
2
3
4
5
10
50
100
100
90

link := creates a connection
between buckets from a null
ptr to a random entry
80
70
60
50
40
30
20
10
H M Lin k
0
20-Jun-2009
Department of Computer Sciences
Jump & McKinley
Data Structures in Benchmarks
Benchmark
RDS Node
Raytrace
OctNode
H
78.0
Jack
LinkedHashMap$LinkedHashEntry
L
44.2
RuntimeNfaState
H
9.4
Antlr
Object[]
H
76.2
Bloat
HashMap$HashEntry
L
56.6
CallMethodExpr
H
3.7
LinkedHashmap$LinkedHashEntry
L
59.0
AND_AND_Expression
H
0.4
HashMap$HashEntry
L
51.4
PropertyList
H
4.4
Jython
Pyframe
H
94.6
Luindex
LinkedHashMap$LinkedHashEntry
L
99.3
Lusearch
WeakHashMap$WeakBucket
L
47.5
HitDoc
H
2.0
ChildIterator
H
34.6
Eclipse
Fop
Xalan
20-Jun-2009
Department of Computer Sciences
L or H?
%
Jump & McKinley
Descargar

Cork: Dynamic Memory Leak Detection with Garbage …