From OODL Interpreters
to VM’s and Beyond
Jonathan Bachrach
MIT AI Lab
Outline
•
•
•
•
•
•
•
•
•
Material mostly from “Lisp In Small Pieces”
Metacircular Interpreters
CPS Interpreters
Fast Interpreters
VM’s and Compilation
Speeding up VM’s
VVM’s
In Language VM’s
Assignment #2
Lisp In Small Pieces
• Written by Christian Queinnec
• Highly recommend looking at this book
• Basis for a fair amount of today’s lecture
Q: Why Interpreters?
A: Why Interpreters?
• Semantics of language
– is manifest
– is measurable
• Allows more
– Dynamism
– Extensibility
Metacircular Interpreters
• Write interpreter in same language using
own special forms
• Complexity of resulting interpreter is a
measure of expressivity and simplicity of
language
• Dream is to get a compiler for free from
this!
Simple Environment
(dv <frame> <assocs>)
*env*
(dv <env> (isa <any>))
(slot <env> (env-next <env>))
(slot <env> (env-frame <env>) (isa <frame>))
next
next
frame
frame
keytext val
keytext val
keytextval
keytextval
(dv $empty-env (isa <env>))
(dm env-define-binding!
((env <env>) (name <sym>) (value <any>))
(set (elt (env-frame frame) name) value))
(dm env-extend!
((env <env>) (names <lst>) (values <lst>)
=> <frame>)
(let ((new-env (isa <env> (set env-next env))))
(do2 (fun (name value)
(set (elt (env-frame new-env) name) value))
names values)
new-env))
(dm env-binding-value
((env <env>) (name <sym>) => <any>)
(if (== env $empty-env)
nul
(let ((value (elt (env-frame env) name)))
(if (== value nul)
(env-binding-value (env-next env) name)
value))))
keytextval
Proto in Proto
(dm evol ((e <any>) (env <env>)) e)
(dm evol ((e ‘()) (env <env>)) e)
(dm evol ((e <sym>) (env <env>))
(env-binding-value env e))
(dm evol ((e <lst>) (env <env>))
(evol-list (head e) e env))
(dm evol-list ((_ 'quote) (e <lst>) (env <env>))
(sexpr-text-of-quotation e))
Q: What’s the Problem?
• Why might a metacircular interpreter not be
acceptable?
A: What’s the Problem?
• Creates bootstrapping issues
• Very inefficient
Tail Position / Calls
• If an expression is in tail position then it is
not necessary to restore the environment
– (fun (x) (f x))
– (fun (x) (if (f x) 5 6))
• Tail calls are calls that are in tail position
and don’t save/restore environment
– Bind arguments
– Goto function
Continuation Passing Style (CPS)
• CPS = Continuation Passing Style
– Explicitly telling a computation where to send
the result of that computation
– Once computation is complete, executor applies
receiver to result rather than returning it as a
normal value
• Example
– (fun (x) (f x)) => (fun (x k) (f’ x k))
CPS Interpreter
• Control is more explicit by passing
continuations through interpreter
• Allows one to code complicated control
forms without built-in support.
– E.g., call/cc
CPS Interpreter Example
(dm evol-list ((_ ‘quote) (e <lst>) (env <env>) (k <fun>))
(k (sexpr-text-of-quotation e)))
(dm evol-list ((_ 'if) (e <lst>) (env <env>) (k <fun>))
(evol (sexpr-if-test e) env
(fun (e)
(if e
(evol (sexpr-if-then e) env k)
(evol (sexpr-if-else e) env k)))))
(dm evol-seq+ ((e+ <lst>) (env <env>) (k <fun>))
(if (empty? (tail e*))
(evol (head e*) env k)
(evol (head e*) env
(fun (e) (evol-seq+ (tail e*) env k)))))
(dm evol-seq ((e* <lst>) (env <env>) (k <env>))
(if (empty? e*)
(k #f)
(evol-seq+ e* env k)))
(dm evol-list ((_ 'seq) (e <lst>) (env <env>) (k <fun>))
(evol-seq (sexpr-begin-actions e) env k))
CPS Wins
(dm evol-list ((_ 'lab) (e <lst>) (env <env>) (k <fun>))
(let ((exit (make-function '(x) `(,k x) env)))
(evol-seq (sexpr-block-body e)
(env-extend! env (lst (sexpr-block-name e)) (lst exit))
k)))
CPS Loses
• Problem is that some special forms like fin
(cf. Lisp’s unwind-protect) are difficult* to
implement without resorting to built-in
support
– *It appears that they are possible with enough
continuations
• Conses (allocates) lots of closures for
continuations during interpretation
Interpreter Pretreatment
• Lighten up environment
• Invent beginnings of instruction set
– Implemented in terms of thunks (zero
parameter functions)
• Threaded as tree of calls to thunks
• Reject environment (rep’d as global *env*)
• Keep track of tail position
Fast Environment
• Globals stored in vector
• Locals are in list of inlined
frames
• Model environment structure in
static environment
• Resolve environment accesses
to fixed or relative offsets
– Globals are fixed offset
– Locals are two relative offsets
• N.B., Can do even better by
flattening ribcage environment
into a stack
*env*
*globals*
next:
next:
Pretreatment
• Pretreatment of programs is handled by the
function meaning (compile)
• Converts into thunks that will serve as
threading mechanism
• Each thunk can be thought of as an address
to which we simply jump to execute
Inventing Instructions
• Use combinators which are like instructions
for an abstract machine:
(dm CONSTANT (value)
(fun () value) )
(dm meaning-list ((_ 'quote) (e <lst>) r t?)
(fun ()
(CONSTANT (sexpr-text-of-quotation e)) ))
Interpreter Pretreatment Output
(fun (n f k)
(if (= n 0)
(k 1)
(f (- n 1) f
(fun (r) (k (* n r))))))
(FIX-CLOSURE
(ALTERNATIVE
(CALL2 #<=>
(SHALLOW-ARGUMENT-REF 0)
(CONSTANT 0))
(TR-REGULAR-CALL
(SHALLOW-ARGUMENT-REF 2)
(STORE-ARGUMENT (CONSTANT 1)
(ALLOCATE-FRAME 1) 0)
(TR-REGULAR-CALL
(SHALLOW-ARGUMENT-REF 1)
(STORE-ARGUMENT
(CALL2 #<-> (SHALLOW-ARGUMENT-REF 0)
(CONSTANT 1))
(STORE-ARGUMENT (SHALLOW-ARGUMENT-REF 1)
(STORE-ARGUMENT
(FIX-CLOSURE
(TR-REGULAR-CALL
(DEEP-ARGUMENT-REF 1 2)
(STORE-ARGUMENT
(CALL2 #<*> (DEEP-ARGUMENT REF 1 0)
(SHALLOW-ARGUMENT-REF 0))
(ALLOCATE-FRAME 1) 0))
1)
(ALLOCATE-FRAME 2) 2)
1)
0)))
3)
Control Forms
(dm ALTERNATIVE (m1 m2 m3)
(fun () (if (m1) (m2) (m3))) )
(dm meaning-list ((_ 'if) (e <lst>)
(let ((m1 (meaning (sexpr-if-test
(m2 (meaning (sexpr-if-then
(m3 (meaning (sexpr-if-else
(ALTERNATIVE m1 m2 m3) ) )
r t?)
e) r #f))
e) r t?))
e) r t?)) )
Sequence
(dm SEQUENCE (m m+)
(fun () (m) (m+)) )
(dm meaning-sequence (e+ r t?)
(if (empty? e+)
(static-wrong "Illegal syntax: (seq)")
(if (empty? (tail e+))
(meaning (head e+) r t?)
(meaning*-sequence (head e+) (tail e+) r t?))))
(dm meaning*-sequence (e e+ r t?)
(let ((m1 (meaning e r #f))
(m+ (meaning-sequence e+ r t?)) )
(SEQUENCE m1 m+) ) )
Closures
(dm FIX-CLOSURE (m+ arity)
(let ((arity+1 (+ arity 1)))
(fun ()
(loc ((the-function (v* sr)
(if (= (frame-length v*) arity+1)
(seq (set *env* (sr-extend* sr v*))
(m+) )
(wrong "Incorrect arity") ) )
(make-closure the-function *env*) ) ) )
(dm meaning-fix-abstraction (n* e+ r t?)
(let ((arity (len n*))
(r2 (r-extend* r n*))
(m+ (meaning-sequence e+ r2 #t)) )
(FIX-CLOSURE m+ arity) ) )
Calls
(dm TR-REGULAR-CALL (m m*)
(fun ()
(let ((f (m))) (invoke f (m*)) ) ) )
(dm REGULAR-CALL (m m*)
(fun ()
(let ((f (m)) (v* (m*)) (sr *env*)
(result (invoke f v*)) )
(set *env* sr)
result ) ) )
(dm meaning-application (e e* r t?)
(let ((m (meaning e r #f))
(m* (meaning* e* r (len e*) #f)) )
(if t? (TR-REGULAR-CALL m m*) (REGULAR-CALL m m*)) ) )
Argument Processing
(dm meaning* (e* r size t?)
(if (empty? e*)
(meaning-no-argument r size t?)
(meaning-some-arguments (head e*) (tail e*) r size t?)))
(dm meaning-no-argument (r size t?)
(ALLOCATE-FRAME size))
(dm meaning-some-arguments (e e* r size t?)
(let ((m (meaning e r #f))
(m* (meaning* e* r size t?))
(rank (- size (+ (len e*) 1))))
(STORE-ARGUMENT m m* rank)))
Q: Where Do We Go From Here?
VM Introduction
• Portable
• Small
• Factors Implementation
VM Design Parameters
• Space
– Runtime
– Delivery
• Efficiency
– Runtime
– Compilation
• Analyzability
– Inferred types
– Control flow info
From Interpreter to VM
•
•
•
•
•
Linearize
Invent registers
Invent stack
Code generation through concatenation
Byte code instructions
Byte Code Machine
next:
*env*
next:
code
*fun*
*stack*
used
*stack-index*
*pc*
free
... 34 250 41 4 34 ...
Bytecode
env
*constants*
“Instructions”
(SHALLOW-ARGUMENT-REF j)
(DEEP-ARGUMENT-REF I j)
(DEEP-ARGUMENT-SET! I j m)
(CHECKED-GLOBAL-REF I)
(CONSTANT V)
(SEQUENCE M M+)
(FIX-LET M* M+)
(CALL1 ADDRESS M1)
(CALL3 ADDRESS M1 M2 M3)
(REGULAR-CALL M M*)
(ALLOCATE-FRAME SIZE)
M,m1,m2,m3,m+,v are values
Rank,arity,size,I,j are ints
Address is predefined funct
(PREDEFINED I)
(SHALLOW-ARGUMENT-SET J M)
(GLOBAL-REF I)
(GLOBAL-SET! I M)
(ALTERNATIVE M1 M2 M3)
(TR-FIX-LET M* M+)
(CALL0 ADDRESS)
(CALL2 ADDRESS M1 M2)
(FIX-CLOSURE M+ ARITY)
(TR-REGULAR-CALL M M*)
(STORE-ARGUMENT M M* RANK)
Introduce *val*
• Break implicit communication between
instructions by introducing result register
(dm CONSTANT (value)
(fun () (set *val* value)))
Inventing Stack
(dm STORE-ARGUMENT (m m* rank)
(fun () (m)
(let ((v *val*)) (m*)
(set (frame-argument *val* rank) v))))
=>
(dm STORE-ARGUMENT (m m* rank)
(fun () (m) (stack-push *val*)
(m*)
(let ((v (stack-pop)))
(set (frame-argument *val* rank) v))))
Linearization
(dm SHALLOW-ARGUMENT-SET! (j m)
(fun () (m)
(set (frame-argument! *env* j) *val*)))
=>
(dm SHALLOW-ARGUMENT-SET! (j m)
(cat m (SET-SHALLOW-ARGUMENT! J)))
(dm SET-SHALLOW-ARGUMENT! (j)
(lst (fun () (set (frame-argument! *env* j) *val*)))))
Inventing PC
(dm REGULAR-CALL (m m*)
(cat m (PUSH-VALUE)
m* (POP-FUNCTION) (PRESERVE-ENV)
(FUNCTION-INVOKE)
(RESTORE-ENV)))
(dm PUSH-VALUE ()
(lst (fun () (stack-push *val*))))
(dm POP-FUNCTION ()
(lst (fun ()
(set *fun* (stack-pop))))
(dm PRESERVE-ENV ()
(lst (fun () (stack-push *env*))))
(dm FUNCTION-INVOKE ()
(lst (fun () (invoke *fun*))))
(dm RESTORE-ENV ()
(lst (fun ()
(set *env* (stack-pop)))))
(dm run ()
(let ((inst (head *pc*)))
(set *pc* (tail *pc*))
(inst)
(run)))
VM Dispatch Loop
(dm run ()
(let ((instruction (head *pc*)))
(set *pc* (tail *pc*))
(instruction)
(run)))
• Already removed decode overhead by translating
incoming bytecodes to direct calls
• Still extra overhead
– Return from instruction
– Loop
Reducing Dispatch Overhead
• Convert back to threaded code
• Each instruction is responsible for calling next one:
(dm CONSTANT (value)
(fun () (set *val* value)
(let ((instruction (head *pc*)))
(set *pc* (tail *pc*))
(instruction))
• Example Push3 on PowerPC takes
– 2 instrs to do real push3
– 11 total instrs for pc loop version
– 5 total instrs for threaded version
• Can do same with fast interpreter
– Use CPS threading
– Consing not an issue if closures created during pretreatment
Hand Selected Macro
Instructions
• Still have overhead especially for simple
instructions
• Can do better by combining oft-occurring
sequences of instructions
– amortizing the decode/dispatch overhead
– customizing resulting instruction to remove any
other interpretive overhead
– shrinks byte-code size
Automatic Macro
Instruction Detection
• Can run statistics off-line to determine best
macros
• Can also run statistics on-line
VVM’s
•
•
•
•
Tower of VM’s
Record expansions
Concatenate
Peephole optimize at each level
– Specialization of instructions
• Now that actual arguments are available
– Between instruction optimizations
Boxing/Unboxing
• Objects all the way down
• Integers are objects and must be self-identifying
• One way to encode them is to wrap integer data in
a “box” which also points to integer prototype
• Integer ops must then first unbox before running
machine level integer op and finally box result
• We’ll be talking about alternative representations
later
VVM Example
(fun (x y) (* (+ x 1) y))
=>
(SHALLOW-ARGUMENT-REF 0 R0)
(CONSTANT 1 R1)
(I+ R0 R1 R2)
(SHALLOW-ARGUMENT-REF 1 R3)
(I* R2 R3 R4)
(RETURN R4)
(SHALLOW-ARGUMENT-REF 0 R0)
(CONSTANT 1 R1)
(UNBOX-INT R0 R2)
(UNBOX-INT R1 R3)
(PI+ R2 R3 R4)
(BOX-INT R4 R5)
(SHALLOW-ARGUMENT-REF 1 R6)
(UNBOX-INT R5 R7)
(UNBOX-INT R6 R8)
(PI* R7 R8 R9)
(BOX-INT R9 R10)
(RETURN R9)
R/VVM’s
• Provides extensible VM infrastructure
• Permits the running of many different
instruction sets customized to different
languages
• Examples
– Greg Sullivan’s DVM
– Ian Piumarta’s VVM
Bytecodes can be Faster
than Direct Execution
• When
– There is a high price for cache misses
– VM can fit inside cache
– The interpretive overhead of the bytecodes is small
(i.e., they are high-level)
• But
– There will be cases where interpretive overhead is high
• Suggests hybrid approach
– Demand driven translation
– Analysis determines interpretive overhead
In C VM’s
+
-
Portable
Hard to interoperate with native code
Needs own versions of special forms
Needs its own FFI to C
In Language VM’s
+ Write VM in Same Language
- Challenging to get high performance
type system can be overly restrictive
- Tough Bootstrap
IDVM
• In Dylan Virtual Machine was developed at
Harlequin by Tony Mann and Eliot Miranda
• Consider IPVM
Example
(dm is-numeric? ((x <int>)) #t)
=>
(dm IPVM-assign-constant-to-result
((vm-code <vec>) (ip <int>) result (vars …))
(let ((new-result (elt vm-code ip))
(new-ip
(+ ip 1))
(next-inst (elt vm-code new-ip)))
(apply next-inst vm-code (+ new-ip 1) new-result vars)))
(dm IPVM-return ((vm-code <vec>) (ip <int>) result (vars …))
result)
(fun ((x <int>))
(let ((ip 0)
(vm-code (vec IDVM-assign-constant-to-result #f IDVM-return))
(first-inst (elt vm-code ip)))
(first-inst vm-code (+ ip 1) #f a)))
The Trick
• Considerable effort was expending optimizing the
following idiom:
(dm foo (a b c (thingies …))
…
(apply somefunction a b c thingies))
• Converted into a tail call with thingies being stack
allocated
• Each IPVM instruction tail calls the next with
code, pc, and locals as arguments.
Open Problems
• Compiler for Free with VVM’s
• Hybrid VM’s
• Type system’s powerful enough to do in
language VM’s
• Analyzable VM
Readings
•
•
•
•
•
•
•
•
•
•
•
Queinnec: Lisp In Small Pieces
Deutsch, Schiffman: Efficient Implementation of the Smalltalk-80 System
Piumarta: Optimizing threaded code by selected inlining
Doyle, Moss: When are bytecodes faster than direct execution?
Withington, McKay, Palter, Robertson: The symbolics virtual lisp machine or
using the DEC alpha as a programmable micro-engine
Mann, Miranda: A Dylan virtual machine in Dylan or son of Dylan Compiler
in Dylan
Folliot, Piumarta, Riccardi: A dynamically configurable multi-language
execution platform
Miranda: BrouHaHa - a portable smalltalk interpreter
Ford et al: microkernels meet recursive virtual machines
Chambers, Unger: Making pure object-oriented languages practical
Ingalls: The Evolution of the Smalltalk Virtual Machine
Assignment #2
• Write a meta circular interpreter for proto
• Implement all the special forms
• Assume (i.e., use)
– the object system
– the reader and parsing functions
– the special forms themselves
Descargar

Interpreters and VM’s for OODL’s