• An Application-Specific Language (ASL) is
a programming language that supports:
– a particular class of applications
– in a highly idiosyncratic domain
• In order to understand a particular ASL, it is
usually necessary to understand the domain
• An ASL can be extremely effective when
used for its intended purpose
• Bard is designed to perform pattern
matching and manipulation of Abstract
Syntax Trees (ASTs) in the reengineering
• In order to understand Bard, it is necessary
to understand a little about ASTs and
Outline of the talk
• This talk will cover:
What reengineering is all about
What an Abstract Syntax Tree (AST) is
What an "idiom" is
What the Idiom Tool does
How Bard and the Idiom Tool work together
The design decisions behind Bard
Legacy code
• The legacy code problem:
– About 80% of programmer time goes to
– New software methodologies are the preferred
solution (that is, do it over from scratch!)
– 40 years of working systems can't be
Software reengineering
• Software reengineering is any activity that
improves one's understanding of software
improves the software documentation
improves the software itself
brings legacy code up to more current standards
aids in a new implementation of the software
Why reengineer?
• We reengineer code for many reasons:
add functionality
ease maintenance
move to new platforms and/or new languages
facilitate reuse
improve performance
decrease cost of keeping obsolete systems
combat software rot
The reengineering process
• Reverse engineering moves from "software
assets" (code, documentation, etc.) to a
higher level of abstraction
• Transformations modify the system at some
level of abstraction
• Forward engineering moves from a higher
level of abstraction back to code
Reengineering levels
• Reengineering can be carried out at many
Idiom tool
• Idiom Tool is
the part of the
suite that
on the AST
Creating the Abstract Syntax Tree
• Parsers convert programs written in various
languages into their AST representations
• This AST can be manipulated by Idiom Tool,
or used as input for various other tools (to
create flowcharts, data flow diagrams,
concordances, etc.)
• Idiom Tool is a programmable subsystem
• The programming language used is Bard
Example of an AST (simplified)
• In C: if (n < 0) return -n;
• The ASTs can expressed in a language
called Gray
• Gray is itself an ASL (Application-Specific
• Since ASTs are trees, Gray borrows the
syntax (but not the operators) of LISP
Example Gray code
• (If_Op (Less Identifier:n Number_Literal:0)
(Return_Value (Negate Identifier:n))
Epsilon) )
• An idiom is a conventional way of
expressing an idea
• Programming languages, like natural
languages, have idioms
• Examples:
– for (i = 0; i < n; i++) A[i] = 0;
– temp = x; x = y; y = temp;
Idioms change
• Idioms vary over time and across languages
– Arrays may be zero-based or one-based
– FORTRAN used parallel arrays to simulate structs
– Languages may use array indices to simulate
– C lacks multiply-dimensioned arrays
– No early languages were object-oriented
– COBOL typically represented years as two digits
• To summarize,
– ReEngineer is a tool that uses
– parsers to translate programs in various
languages into
– Abstract Syntax Trees, which can be
represented in a linear fashion by
– Gray code, and which can be manipulated by
– Idiom Tool, which uses programs written in
– the Bard language.
The reengineering flow
Regular expressions
• Recognizing idioms in a program is a matter
of recognizing certain kinds of patterns
• Regular expressions (regexps) are a
standard, well-developed mechanism for
doing pattern recognition on strings
• Regular expressions are used by grep, Perl,
Tcl, awk, Python, vi, emacs, and a number
of other languages and tools
Common regexp operators
E xp ression
[^ x]
M ean in g
T he literal c haracter s a b c
A ny single c haracter
N egation: a ny c haracter b ut x
O ptional x
Z ero or more x 's
More regexp operators
E xp ression
(x| y )
M ean in g
O ne or m ore x 's
E ither x or y
x follow ed b y y
x at beginning o f string
V ariable
Regular expressions in Bard
• Regular expressions in Bard are modelled
after regular expressions in grep
• Because we are doing pattern matching on
ASTs rather than strings,
– Literals are represented by Gray expressions,
– Regular expression syntax must be modified to
accomodate arbitrary Gray expressions, and
– Regular expressions may occur within Gray
expressions, and vice versa
Some regexps in Bard
Ex p re ss ion
(assig nm e n t
id e ntifie r:x
num b e r_ lite ra l:0)
{no t x}
{o p t x}
{* x}
M e aning
T he lite ra l a ss ignm e nt s tate me nt
A ny node or s ub tree
N egatio n: a ny node o r s ubtree b ut
O ptio na l x
Zero o r more x' s
More regexps in Bard
Ex p re ss ion
M e aning
{+ x }
{x | y}
x y
O n e o r mo re x 's
E ith e r x o r y
x fo llo w e d b y y
V a r ia b le x
V a r ia b le s e q u e n c e x
Inner and outer languages
• Regular expressions are not enough -- we
need some way to control pattern matching
• We also need to manipulate the parts of the
AST found by pattern matching
• Solution: wrap a more-or-less conventional
"outer language" around the "inner
language" of regular expressions
• This is also the approach taken in Perl
Conventional statements in Bard
• Conventional statements in Bard include:
assignment statements
if statements
while loops
print statements
function and procedure calls
calls to native (Ada) code
tracing and debugging facilities
Special-purpose statements in Bard
• Examining and manipulating the AST:
– match pattern [at position]
– insert subtree as relative-position
• for example, insert identifier:x as first child;
delete position
replace position with subtree
find pattern [up to position]
go to position
• ReEngineer has been used to process
hundreds of thousands of lines of code
• Pattern-matching languages are not efficient
enough for this purpose
• Bard procedures are inefficient
• Bard procedures are called only when there
is a high likelihood that pattern matching
will succeed
Fast pre-screening of nodes
• Each Bard routine specifies the types of
nodes at which it might be applicable
• Idiom Tool walks the AST and calls Bard
when it finds such a node
• Since Idiom Tool is written in Ada, it can
screen nodes very, very fast
The structure of Bard procedures
• A "procedure" in Bard, like a "rule" in an
expert system, consists of two parts:
– The test part determines whether this particular
procedure is applicable at this point in the tree
• Tests must not have have side effects
– The action part consists of arbitrary Bard code
• Actions may modify the AST and/or may collect and
store information about the AST
Skeleton of a Bard procedure
• procedure name (pass, node_types)
-- may not have side effects
-- separates tests from actions
-- manipulate the AST
end procedure;
Bard example
idiom remove_null_statements;
-- Remove null statements, except empty loop bodies.
procedure remove_null_statements (1, Null_Statement);
go to first child of parent;
not match While_Op;
not match Do_While;
not match C_For_Loop;
go to @trigger;
end procedure;
end idiom;
Diffuse patterns
• A pattern is compact if it is a single
recognizable unit
• A pattern is diffuse if it consists of multiple
pieces scattered throughout the AST
• Diffuse patterns may overlap in complex ways
• The problem is to collect the various pieces
• This is not easily solved with only simple data
structures such as arrays
Dealing with diffuse patterns
• Idiom Tool makes multiple passes over the data
– Each Bard procedure indicates the pass for which it
is active
– This allows some procedures to collect information
and other procedures to process the information.
• Idiom tool uses a fact base to store information
between passes
Modifying the fact base
• Bard's fact base is modeled after that of Prolog
• Bard allows negative assertions as well as
positive assertions
• assert fact;
• deny fact;
• retract fact;
Interrogating the fact base
• Because Bard does not used the Closed
World Assumption, facts may be "true",
"false", or (unlike Prolog) "unknown"
• true fact;
• false fact;
• known fact;
• unknown fact;
Multiple occurrences
• It is sometimes necessary to recognize
multiple occurrances of the same subtree.
– For example, to recognize Pascal statements of
the form x:=x+1, you have to realize that the
identifier x is repeated
• (Assignment Identifier:?id
Unification in Bard
• When pattern matching variable ?id against
value v,
– If ?id has no prior value, it is unified with v
– If id? has a prior value, then the match succeeds
if and only if id? == v
– If a test succeeds, all unifications performed
during the test are retained
– If a test fails, all unifications performed during
the test are discarded
Summary I
• Idiom tool does multiple preorder traversals
of the AST; at each node it may trigger one
or more Bard procedures
• A Bard procedure consists of
a header that specifies appropriate node types
a test part that performs any additional tests
the keyword commit, and
an action part
Summary II
• Special features of Bard include
an "inner language" for pattern matching
procedures modeled after rules in expert systems
powerful tests that can comprise arbitrary actions
unification and unification unwinding
an integrated fact base
• These features make Bard far better for its
purpose than a general language such as Ada
Summary III
• Bard has been used
– in a Pascal to C translation system
– to assist in converting CMS-2 to C
– in a prototype Y2K system
Current status of Bard
• Currently owned by either Lockheed or
• No longer under active development
• Lost and presumed dead as a result of
corporate mergers and reorganizations
The End

Bard - Villanova University