Programming Languages: The
Essence of Computer Science
Robert Harper
Carnegie Mellon University
October, 2002
CS Is About Programming
• How to build systems.
– Better, faster, cheaper.
– Reliable, maintainable, extensible.
• Evaluating and comparing systems.
– Performance, behavior, security.
– Compliance, usability.
• What can and can’t be done.
– Algorithms and complexity.
Programming Is Linguistic
• Programming is an explanatory activity.
– To yourself, now and in the future.
– To whoever has to read your code.
– To the compiler, which has to make it run.
• Explanations require language.
– To state clearly what is going on.
– To communicate ideas over space and
time.
Programming Languages Are
Essential
• Therefore languages are of the essence
in computer science.
– Programming languages, in the familiar
sense.
– Description languages, design patterns,
architecture languages, specification
languages.
Some Conventional Wisdom
• PL research is over.
– The language of the future is X … for various
values of X.
– It’s not about the one, true language!
• Anyone can design a PL.
– Clearly, anyone does.
– But look what you get! TCL, Perl, TeX, C++.
• PL research is irrelevant to practice.
– Seen as a purely academic pursuit.
– But the tide is turning as successes accumulate.
Some Accomplishments
• High-level languages.
• Static type disciplines.
• Automatic storage management.
• Objects, classes, ADTs.
• Sophisticated compiler technology.
• Specification and verification.
Why People Don’t Notice
• It takes decades to go from research to
practice.
– Similar to other areas, such as algorithms, AI.
– Large installed base of programmers.
• Ideas become “common sense” long before
they are widely used.
– Even very simple things, such as lexical scope,
were once controversial!
– Not to mention data abstraction, OOP, ….
Some Important Trends
• Safety with performance.
– High-level languages with good compilers.
– Low-level languages with good type
systems.
• Languages, not tools.
– The design should live with the code.
– Conformance should be checkable
throughout the evolution of the code.
Some Important Trends
• Interplay between theory and practice.
– “Prove theorems and report numbers.”
– Build systems to assess practicality and to
discover new problems.
• Full-spectrum coverage.
– Functional and imperative.
– High and low-level languages.
– Design and implementation.
Some Important Trends
• Type theory as the grand unified theory (GUT) of PLs.
– Provides a precise criterion for safety and
sanity of a design.
– “Features” correspond to types.
– Close connections with logics and
semantics.
What Is A Type System?
• Static semantics: the well-formed
programs.
• Dynamic semantics: the execution
model.
What is a Type System?
• Safety theorem: types predict behavior.
– Types describe the states of an abstract
machine model.
– Execution behavior must cohere with these
descriptions.
• Thus a type is a specification and a type
checker is a theorem prover.
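• A standard way to make this precise (assuming a small-step dynamic semantics):
– Preservation: if e : τ and e ↦ e’, then e’ : τ.
– Progress: if e : τ, then either e is a value or e ↦ e’ for some e’.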
Types Are Specifications
• Examples:
– e : float means that e evaluates to a value
in, say, floating point register 0.
– e : float → int means that e is a procedure
that is called with arg in FR0 and returns
with result in GR0.
– e : queue means that e behaves like a
queue.
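• For example, “behaves like a queue” can be pinned down by an abstract interface; a minimal Standard ML sketch (the signature and operation names are illustrative):

  (* Because 'a queue is abstract, a value e : 'a queue can be used
     only through the operations below -- "e behaves like a queue". *)
  signature QUEUE =
  sig
    type 'a queue
    val empty   : 'a queue
    val enqueue : 'a * 'a queue -> 'a queue
    val dequeue : 'a queue -> ('a * 'a queue) option  (* NONE if empty *)
  end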
Types Are Formal Methods
• Type checking is the most successful formal
method!
– In principle there are no limits.
– In practice there is no end in sight.
• Examples:
– Using types for low-level languages, say inside a
compiler.
– Extending the expressiveness of type systems for
high-level languages.
Types in Compilation
• Conventional compilers:
Source = L1 → L2 → … → Ln = Target
          :
          T1
• Types apply only to the source code.
– Type check, then discard types.
– If compiler is correct, target code is safe.
Typed Intermediate
Languages
• Generalize syntax-directed translation
to type-directed translation.
– Intermediate languages come equipped
with a type system.
– Compiler transformations translate both a
program and its type.
– Translation preserves typing: if e:T then
e*:T* after translation
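• A toy rendering of the idea in Standard ML (hypothetical source and target types; the term-level translation is elided):

  (* Type-directed translation: the compiler carries a type translation
     alongside the term translation, so that if e : T in the source,
     the translated term e* has type transTy T in the target. *)
  datatype srcTy = SBool | SInt | SArrow of srcTy * srcTy
  datatype tgtTy = TInt  | TArrow of tgtTy * tgtTy

  fun transTy SBool           = TInt   (* booleans compiled to 0/1 integers *)
    | transTy SInt            = TInt
    | transTy (SArrow (a, b)) = TArrow (transTy a, transTy b)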
Typed Intermediate
Languages
• Type-directed translation:
Source = L1 → L2 → … → Ln = Target
          :     :           :
          T1 →  T2 → … →   Tn
• Transfers typing properties from source
code to object code.
– Check integrity of compiler.
– Exploit types during code generation.
Certifying Compilers
• Types on object code certify its safety.
– Type checking ensures safety.
– Type information ensures verifiability.
• Examples of certified object code:
– TAL = typed assembly language.
– PCC = bare code + proof of safety.
– Many variations are being explored.
TAL Example
fact: 8 .{r1:int, sp:{r1:int, }::}
jgz r1, positive
mov r1,1
ret
positive:
push r1
; sp:int::{t1:int,sp:}::
sub r1,r1,1
call fact[int::{r1:int,sp:}::]
imul r1,r1,r2
pop r2
; sp:{r1:int,sp:}::ret
Types for Low-Level Languages
• What is a good type system for a low-level
language?
– Should expose data representations.
– Should allow for low-level “hacks”.
– Should be verifiably safe.
– Should not compromise efficiency.
• Current systems make serious compromises.
– Very weak safety properties.
– Force atomicity of complex operations.
Example: Memory Allocation
• Most type systems take an atomic view
of constructors.
– Allocate and initialize in “one step”.
– Even HLL’s like Java impose restrictions.
• We’d like to expose the “fine structure”.
– Support code motion such as coalescing.
– Allow incremental initialization.
– But nevertheless ensure safety!
Example: Memory Allocation
• An allocation protocol (used in TILT):
– Reserve: obtain raw, un-initialized space.
– Initialize: assign values to the parts.
– Allocate: baptize as a valid object.
• Current type systems cannot handle this.
– Partially initialized objects.
– In-place modification of parts.
– Interaction with collected heap.
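• As an interface, the protocol might look like the sketch below (names are illustrative; ordinary ML types cannot enforce “initialize each part exactly once, then allocate,” which is exactly what the ordered type system that follows adds):

  (* Hypothetical interface for the reserve/initialize/allocate protocol. *)
  signature ALLOC =
  sig
    type ('a, 'b) reserved   (* raw, uninitialized space for a pair    *)
    type ('a, 'b) obj        (* a fully initialized, valid heap object *)

    val reserve  : unit -> ('a, 'b) reserved            (* obtain raw space     *)
    val init1    : ('a, 'b) reserved * 'a -> ('a, 'b) reserved
    val init2    : ('a, 'b) reserved * 'b -> ('a, 'b) reserved
    val allocate : ('a, 'b) reserved -> ('a, 'b) obj    (* baptize as an object *)
  end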
Example: Memory Allocation
[Diagram: heap snapshot during the allocation protocol, showing a partially initialized object (some fields still “?”) delimited by the heap pointer (HP), allocation pointer (AP), and limit pointer (LP).]
A Low-Level Type System
• Borrow two ideas from linear logic.
– Restricted and unrestricted variables.
– A modality to mediate between them.
• Restricted variables are resources.
– Think: pointer into middle of an object.
• Unrestricted variables are standard.
– Think: heap pointer.
A Low-Level Type System
• Variables are bound to valid objects.
– Can be used freely.
– Garbage collected when inaccessible.
• Resources are bound to parts of
objects-in-creation.
– Cannot be passed around freely.
– Explicitly allocated and disposed.
Restrictions on Resources
• Linearity: use resources exactly once.
– Admits re-typing after initialization.
– Ensures allocation before general usage.
• Ordering: resource adjacency matters.
– Admits “pointers into the middle” of objects.
– Supports in-place, piecemeal initialization.
Variables and Resources
• Typing judgments: two contexts, one of variables x and one of ordered resources a.
• Ordering of x’s does not matter.
– Abstract “mobile” locations.
• Ordering and usage of a’s does matter.
– Abstract “pinned” locations, with (a form of)
pointer arithmetic.
Low-Level Type Constructors
• Contiguous data: τ1 ⊗ τ2.
– Two contiguous values.
– Two adjacent words: int ⊗ int.
• Mobile data object: !τ.
– A fully initialized object of type τ.
• Example: τ1 × τ2 := !(τ1 ⊗ τ2).
– A pointer to an adjacent pair of values.
Allocating a Pair
• Allocate (1, 2):
– Reserve space for it.
– Create a resource a naming its parts.
– Initialize the parts using a.
– Fuse and allocate: the resource a is used up.
– The resource context must be empty on return.
What Have We Gained?
• The ordered type system ensures:
– All reserved data is eventually allocated.
– Initialization can happen in any order.
– Cannot read un-initialized memory.
• May also be used for inter-operability.
– Adjacency requirements.
– Marshalling and un-marshalling.
– Precise control over representations.
Types for High-Level Languages
• Recall: types express invariants.
– Calling sequence, data representation.
– Abstraction boundaries (polymorphism).
• Theme: capture more invariants.
– Representation invariants.
– Protocol compliance.
• Trade-off expressiveness for simplicity
and convenience.
Data Structure Invariants
• Example: bit strings.
– nat: no leading 0’s.
– pos : a non-zero nat.
– bits : any old thing.
• Goal: check such properties of code.
– Incr takes nat to pos, preserves pos.
– Left shift preserves nat, pos.
Data Structure Invariants
• Properties of interest:
– pos ≤ nat ≤ bits
• Operations:
– ε : bits
– add0 : bits → bits ∧ nat → pos
– add1 : bits → bits ∧ nat → pos
Data Structure Invariants
• Logical consequences:
– add0 : nat → pos ∧ nat → nat
– add1 : pos → pos
– ε : bits
• Type check code to check invariants!
– Simple bi-directional algorithm suffices.
– Example: increment.
Data Structure Invariants
• Increment:
inc : bits → bits ∧ nat → pos
fun inc (ε)  => 1
  | inc (b0) => b1
  | inc (b1) => (inc b)0
Data Structure Invariants
• Fully mechanically checkable!
– Type check inc twice, once for each conjunct.
– First pass: assume argument is just bits, derive
that result is therefore bits.
– Second pass: assume argument is nat, derive that
result is pos.
• Requires checking entailment of properties.
– Decidable for subtype-like behavior.
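• The same example rendered as executable Standard ML, with the refinements recorded only as comments (plain ML types cannot express nat or pos):

  (* Bit strings, least significant bit outermost: B0 b is "b0", B1 b is "b1". *)
  datatype bits = E | B0 of bits | B1 of bits

  (* inc : bits -> bits; refinement: nat -> pos
     (checked by the refinement type checker, not by ML). *)
  fun inc E      = B1 E          (* inc(epsilon) = 1        *)
    | inc (B0 b) = B1 b          (* inc(b0)      = b1       *)
    | inc (B1 b) = B0 (inc b)    (* inc(b1)      = (inc b)0 *)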
Value Range Invariants
• Array bounds checking (a la Pascal):
– [0..size(A)]
• Null and non-null objects:
– null, really(C)
• Aliasing:
– its(c): the object c itself
Watch Out!
• Such types involve dynamic entities.
– [0..size(A)] : A is an array.
– its(o) : o is a run-time object.
• But types are static!
– What is an expression of type
[0..size(if … then A else B)]???
• How to get static checking?
Dependent Types
• Solution: compile-time proxies.
– Track values insofar as possible.
– Existentially quantify when it’s not.
• Examples:
– 0 : its(0)
– + : its(1) × its(2) → its(3)
– if … then 1 else 2 : ∃n. its(n)
Types for the World
• What about the state of the world?
– The lock l is held.
– The file f is open.
– The contents of cell c is positive.
• But such properties change as
execution proceeds!
– Here I’m holding the lock, there I’m not.
– The file is open now, not later.
Types for the World
• Want a simple, checkable type system
for the “world”.
– Types characterize the current state.
– Types of the state change over time.
• But what kind of type system?
– Properties of the world are ephemeral.
– Facts are no longer persistent.
• Need: a logic of ephemera.
Types for the World
• Ephemera behave strangely:
– Cannot be replicated: holding twice is not
the same as holding once.
– Cannot be ignored: must not lose track of
whether you hold a lock.
• Once again linear logic does the trick!
– Model parts of the world as resources.
– Type changes track state changes.
Types for the World
• A very simple locking protocol:
– acquire : ∀l. its(l) / free(l) → unit / held(l)
– release : ∀l. its(l) / held(l) → unit / free(l)
– new : unit → ∃l. unit / free(l)
• Read each type as value / world: the world component (free(l), held(l)) is a “linear” type that gets “used up” by the call, and the result’s world component “replaces” it.
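• An interface sketch in Standard ML (names illustrative; ML cannot enforce the linear discipline, so the “used up” capabilities are tracked only by convention):

  (* free and held are capabilities standing for free(l) and held(l);
     each should be used exactly once, which ML does not check. *)
  signature LOCK =
  sig
    type lock
    type free                           (* the lock is free -- linear *)
    type held                           (* the lock is held -- linear *)

    val new     : unit -> lock * free   (* new     : unit -> exists l. ... / free(l)      *)
    val acquire : lock * free -> held   (* acquire : its(l) / free(l) -> unit / held(l)   *)
    val release : lock * held -> free   (* release : its(l) / held(l) -> unit / free(l)   *)
  end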
Types for the World
• What does this buy you?
– Stating and enforcing simple protocol
properties.
– Locality of reasoning: focus only on what
changes, not what stays the same.
• It’s much harder than it sounds!
– Separation logic: Reynolds and O’Hearn.
– But it can be done: Vault Project at MSR.
Summary
• PL research is providing creative, useful
solutions to practical problems.
– Building on decades of fundamental
research, esp. in logic and type theory.
• There is much, much more coming!
– Many ripe fruit, lots of new developments.
– Many good problems yet to be addressed.
Acknowledgements
• Much of the work discussed here was
done by or in collaboration with
Karl Crary (CMU), Frank Pfenning
(CMU), David Walker (Princeton),
Greg Morrisett (Cornell).
• Plus dozens of our current and former
students at Carnegie Mellon.