A Survey of Rewriting
Strategies in Program
Transformation Systems
Part 1
Author: Eelco Visser
Speaker: Wei Zhao
 Section 1: introduction
 Section 2: A Taxonomy of program
 Section 3: Program Representation
 Section 4: Implementation of Program
 Section 5: Program Transformation Paradigms
 Section 6: conclusions
Section 1: Introduction
Definition of program transformation (PT)
Syntax allow us to transform a program
Semantics (intentional and extensional) to
reason the validity of transformation
PT: is the act of changing one program
into another
Source & target languages
Section 1: Introduction
 Definition of program transformation (PT)
 The aim of PT: programmer productivity,
maintainability, re-usability
 The aim of PT research:
The exist PT systems are specialized
Programming languages have structural/semantic
To capture generic, language independent schemas of
transformation; to develop efficient implementation
that scales up to large programs
 The aim of the paper: understand the
similarities/differences among PT systems;
concentrate on the transformation strategies
Section 2: Taxonomy of PT
Two main scenarios:
Translation: source and target languages are
Rephrasing: source and target languages are
the same
Sub-scenarios based on the level of
abstraction of a program and to what
extent they preserve the semantics of a
2.1 Translation scenarios
Sub-scenarios are based on the level of
abstraction of a program
Aims at preserving the extensional
semantics, but usually not possible
Sub-scenarios: Synthesis, migration,
reverse engineering, analysis
 Translate from the high level to lower level
 Design is traded for increased efficiency
 Examples:
Refinement: deriving implementation from high level
Parser/pretty-printer generation from context-free
Translate a program into another language
at the same level of abstraction
Examples: transformations between
dialects (from Fortran 77 to Fortran 90,
from Pascal to C)
Reverse Engineering
 To extract the program from a lower-level to a
high-level program or some higher-level aspects
 The dual of synthesis
 Examples:
Decompilation of an object program into a high-level
Architecture (design) extraction
Documentation generation
Software visualization (some aspects are depicted in
an abstract way)
Reduce a program to one aspect such as
its control- or data- flow.
A transformation to a sub-language to an
aspect language
2.2 Rephrasing scenarios
Rephrasing says the same thing in
different words
Improving some aspect of the program
The semantics of the program are
Sub-scenarios: normalization,
optimization, refactoring, and renovation
 Reduces a program to a program in a sublanguage decreasing its syntactic complexity
 Examples
Desugaring: syntactic sugar of a language is
transformed into more fundamental constructs
 Desugar Haskell to its kernel language
Simplification: a program is reduced to a normal
(standard form)
 Canonical form of intermediate representation
 Algebraic simplification of expressions
Improves the run-time/space performance
Constant propagation
Constant folding
Common-subexpression elimination
Dead code elimination
Refactoring improves the design of a
program by restructuring, preserving
Obfuscation makes program harder to
understand by renaming variables,
inserting dead code preventing reverse
Renovation repairs an error or bring the
program up to date with changed
The extensional semantics is changed
Example: Y2K
Section3: program representation
 Some PT systems directly work on text
 Most systems use structured representation
 so parser and unparser are needed for
conversion between text and structure
 In some programming framework (IP),
programs are stored, edited and processed as
source graphs.
 What representation form should be used?
3.1 parse tree or AST
 Internal representation of a program to be
 Parser tree contain syntactic information such as layout
(white space and comments), parentheses, and extra
nodes disambiguating grammar transformation
 Normally AST is used since this info is irrelevant for the
 Layout
 Some applications (renovation, refactoring) need restore the
layout through the transformation.
 parse tree is used
 Where to insert the comments in a generic manner? (origin
 Types
 Optimization and compilation might need type info to be stored
in an extension of a tree format
3.2 concrete or abstract syntax
 The representation of program fragments in the
specification of transformation rules
 ASTs are usually represented using data structure:
records, objects, algebraic data types, terms
 When the ASTs (in a data structure facility) are easy to
be composed and decomposed
 Such as in compiler, small fragments are manipulated each time
 AST is used directly in the specification
 So, the AST can be used as an intermediate language such as
multiple language can be expressed in.
 Otherwise (when the conceptual distance between
concrete program and the data structure access
operations used in AST is too large)
 Transformation languages support concrete object syntax for the
programmer to define the transformations while internally use
the AST.
3.3 Trees or Graphs
 Program structures can be represented by trees, DAGs, or full
fledged graphs with cycles
 Copy
 Tree requires a complete copy
 For DAG, only a pointer to the tree gets copied
 sub-tree shared by multiple context;
 reduced memory usage;
 testing of equality is cheap
 But the context need to be rebuilt when a tree is transformed because two
occurrences of a shared tree that are syntactically the same can have a
different meaning depending on their context
 Annotation
 DAG: annotations associated with each occurrences of the tree, Not
 Full fledged graph for back links (loops, link to declarations)
 More problematic: sub-graph can have links to the entire graph, it may
be required to reconstruct the entire graph after a transformation if it is
necessary to keep the original graph.
3.4 Variable binding
 Transformation system has no special knowledge for the variables in
the syntax tree
 Transformations need to be aware of variables by means of extra
conditions to avoid the problem of free variable capture and lifting
the variable occurrences out of binding
 Higher-order abstract syntax (HOAS) : transparent handling of
variable binding by encoding variable binding as lambda
 FreshML has a weaker mechanism by transparently refreshes
variable names
 All approaches that rename variables are in conflict with
requirements of preserving the original name (refactoring,
 Associating declaration info (symbol table, annotating the usage
occurrences of a symbol with the declaration)
3.5 exchange format
Program representation should be
exchangeable among the transformation
XML: supports exchange of tree shaped data
Annotated Term Format: supports exchange
of DAGs maintaining maximal sharing
Section4 implementation of PT
 A complex program transformation is achieved through a
number of consecutive modifications of a program
 A Rule: defines a basic step in the transformation of a
 A Strategy: is a plan for achieving a complex
transformation using a set of rules
 Strategy for the reproducible and automated
transformations versus the interactive transformation
 Example
let x=3 in x+y let x=3 in 3+y  3+y (inline, dead variable)
Let x=3 in x+y
let x=3 in let newfun(z)=z+y in newfun(x)
let newfun(z)=z+y in let x=3 in newfun(x)
(extract function, hoist)
4.1. Transformation rules
 Rules preserves the extensional semantics
 Some other aspects change (might be
intentional semantics): time/space resource
 A rule:
Syntactic recognition
Some semantic condition verification
The replacement contains
 A term pattern
 A function constructing a new tree or graph
 A semantic action with arbitrary side-effects
4.2 transformation strategies
 A set of transformation rules for a programming
language induces a rewrite relation on programs
 If the relation is confluent and terminating,
there is a unique normal form for every
program; the matter is applying the rules most
efficiently to reach the normal form
 In PT, it is usually not the case:
A set of transformation rules can give rise to infinite
branches (by inlining recursive function)
Inverses to undone a transformation(by distribution
or commutatively rules)
Non-confluence in which a program can be
transformed into different programs
4.2 cont’
 For a specific program it is always possible to find the
shortest path to the optimal solution for a specific
transformation task
 A strategy is a algorithm for choosing a path in the
rewrite relation
 Given one set of rules, there can be many strategies,
each achieving a different goal
 A strategy can be provided by the engine or the user:
 Fixed application order: the engine applies rules exhaustively
according to a built-in strategy
 Automatic dependency analysis: based on the analysis of rules
 Goal driven: to apply rules to achieve a user defined goal
 Strategy menu
 Completely programmable in a strategy language
 The strategy needs to be expressed and implemented
4.2 cont’ strategies languages
 Sequential composition
Consecutively, conditionally, iteratively or recursively
compose the rules
 Non-deterministic programming
Speculatively explore paths until an acceptable
solution is found
Explore all paths in parallel and choose the best (cost
function to compare the solution)
Goal based exploration: discard the path inconsistent
with a set of constraints
4.2 cont’ strategies languages
 Structural traversal:
A rewrite relation includes application of rules in any
The strategy should determine the location where the
rule is applied
find the location in the program representation
structure, apply the rule, rebuild the context
It is inefficient to directly use the tree paths (context)
Some mechanism is needed to traverse syntax trees
and to apply transformation rules at subtree
Language specific/generic transformation mechanism
Top-down / bottom-up traversal
Primitive and composite traversal
4.2 cont’ strategies languages
Information carrying strategies: strategies
may carry information that can be used in
making decisions about paths to take an
in passing context-sensitive information to
An important aspect: the scope in which the
information is valid
4.2 cont’ strategies languages
 Separation of rules and strategies
 In the implementation, the rule can be hardwired in the
definition of the strategies
 Strategies are parameterized with a set of rules
 Clearer specification that allow reasoning about smaller entities
(rules, strategies) separately
 Separation definition enables the reuse of rules, strategies, and
the generic implementation of aspects of transformation system
for a class of languages
 Intertwining may sometimes be required for efficiency reason,
but should be done by a compiler rather than the specifier
 Abstraction
 The strategy language should have proper abstraction over the
rules and strategy, I.e. it should be possible to name and
parameterize the rules and strategies
Section 5: PT paradigms

A Language Based Formalism Toward Domain Driven …