CSC 415: Translators and Compilers
Dr. Chuck Lillie
Course Outline

Translators and Compilers
– Language Processors
– Compilation
– Syntactic Analysis
– Contextual Analysis
– Run-Time Organization
– Code Generation
– Interpretation
Chart 1

Major Programming Project
– Project Definition and Planning
– Implementation
– Weekly Status Reports
– Project Presentation

Project

• Implement a Compiler for the Programming Language Triangle
– Appendix B: Informal Specification of the Programming Language Triangle
– Appendix D: Class Diagrams for the Triangle Compiler
• Present Project Plan
– What and How
• Weekly Status Reports
– Work accomplished during the reporting period
– Deliverable progress, as a percentage of completion
– Problem areas
– Planned activities for the next reporting period
Chart 2
Chapter 1: Introduction to Programming Languages

• Programming Language: a formal notation for expressing algorithms.
• Programming Language Processors: tools to enter, edit, translate, and interpret programs on machines.
• Machine Code: basic machine instructions
– Keep track of the exact address of each data item and each instruction
– Encode each instruction as a bit string
• Assembly Language: symbolic names for operations, registers, and addresses.
Chart 3

Programming Languages

• High-Level Languages: notation similar to familiar mathematical notation
– Expressions: +, -, *, /
– Data Types: truth values, characters, integers, records, arrays
– Control Structures: if, case, while, for
– Declarations: constant values, variables, procedures, functions, types
– Abstraction: separates what is to be performed from how it is to be performed
– Encapsulation (or data abstraction): group together related declarations and selectively hide some of them
Chart 4
Programming Languages

• Language processor: any system that manipulates programs expressed in some particular programming language
– Editors: enter, modify, and save program text
– Translators and Compilers: translate text from one language to another. A compiler translates a program from a high-level language to a low-level language, preparing it to be run on a machine
  Checks the program for syntactic and contextual errors
– Interpreters: run a program without compilation
  Command languages
  Database query languages
Chart 5

Programming Language Specifications

• Syntax
– Form of the program
– Defines symbols
– How phrases are composed
• Contextual constraints
– Scope: determine the scope of each declaration
– Type: rules for inferring and checking the type of each expression
• Semantics
– Meaning of the program
Chart 6
Representation

• Syntax
– Backus-Naur Form (BNF): a context-free grammar
  Terminal symbols (>=, while, ;)
  Non-terminal symbols (Program, Command, Expression, Declaration)
  Start symbol (Program)
  Production rules (define how phrases are composed from terminals and sub-phrases)
    N ::= a | b | …
  Used to define the language in terms of strings of terminal symbols
– Syntax Tree
Chart 7

Representation

• Semantics
– Abstract Syntax
  Concentrates on phrase structure alone
– Abstract Syntax Tree
Chart 8
Contextual Constraints

• Scope
– Binding
  Static: determined by the language processor
  Dynamic: determined at run-time
• Type
– Statically typed: the language processor can detect all type errors
– Dynamically typed: type errors cannot be detected until run-time
• We will assume static binding and static typing
Chart 9

Semantics

• Concerned with the meaning of the program
– Behavior when run
• Usually specified informally
– Declarative sentences
– Could include side effects
– Correspond to production rules
Chart 10
Chapter 2: Language Processors

• Translators and Compilers
• Interpreters
• Real and Abstract Machines
• Interpretive Compilers
• Portable Compilers
• Bootstrapping
• Case Study: The Triangle Language Processor
Chart 11
Translators & Compilers

• Translator: a program that accepts any text expressed in one language (the translator’s source language) and generates a semantically equivalent text expressed in another language (its target language)
– Chinese-into-English
– Java-into-C
– Java-into-x86
– x86 assembler
Chart 12

Translators & Compilers

• Assembler: translates from an assembly language into the corresponding machine code
– Generates one machine-code instruction per source instruction
• Compiler: translates from a high-level language into a low-level language
– Generates several machine-code instructions per source command
Chart 13

Translators & Compilers

• Disassembler: translates machine code into the corresponding assembly language
• Decompiler: translates a low-level language into a high-level language

Question: Why would you want a disassembler or decompiler?
Chart 14
Translators & Compilers

• Source Program: the source language text
• Object Program: the target language text

[Diagram, Chart 15: the Compiler takes the Source Program through Syntax Check, Context Constraints (semantic analysis), and Generate Object Code to produce the Object Program]

• The object program is semantically equivalent to the source program
  – if the source program is well-formed
Chart 15

Translators & Compilers

• Why would you want to build:
– a Java-into-C translator?
– a C-into-Java translator?
– an Assembly-language-into-Pascal decompiler?
Chart 16
Translators & Compilers

Tombstone diagrams:
• P over L: a program named P, expressed in implementation language L
• M: a machine that executes machine code M
  – For P to run on machine M, L must equal M; that is, the implementation language must be the same as the machine language
• An S-into-T translator expressed in L is itself a program that runs on machine L
  – S = Source Language, T = Target Language, L = Translator’s Implementation Language
Chart 17

Translators & Compilers

[Tombstone diagram, Chart 18: a program P expressed in language S is put through an S-into-T translator running on machine M, yielding P expressed in language T]
• Translating a source program P
• expressed in language S into language T,
• using an S-into-T translator
• running on machine M
Chart 18
Translators & Compilers

[Tombstone diagram, Chart 19: sort expressed in Java, compiled by a Java-into-x86 translator expressed in x86, yields sort expressed in x86, which runs on an x86 machine]
• Translating a source program sort
• expressed in language Java,
• using a Java-into-x86 translator
• running on an x86 machine
• The object program runs on the same machine as the compiler
Chart 19

Translators & Compilers

[Tombstone diagram, Chart 20: sort expressed in Java, compiled by a Java-into-PPC translator running on an x86 machine; the PPC object program is downloaded to a PPC machine]
• Translating a source program sort
• expressed in language Java,
• using a Java-into-PPC translator
• running on an x86 machine
• downloaded to a PPC machine
• Cross Compiler: the object program runs on a different machine than the compiler
Chart 20
Translators & Compilers

[Tombstone diagram, Chart 21: sort expressed in Java is translated to C by a Java-into-C translator on an x86 machine; the C version is then translated to x86 code by a C-into-x86 compiler on the same machine]
• Translating a source program sort
• expressed in language Java,
• using a Java-into-C translator
• running on an x86 machine
• Then translating the C program
• using a C-into-x86 compiler
• running on an x86 machine
• into an x86 object program
• Two-stage Compiler: the source program is translated to another language before being translated into the object program
Chart 21

Translators & Compilers

• Translator Rules
– A translator can run on machine M only if it is expressed in machine code M
– The source program must be expressed in the translator’s source language S
– The object program is expressed in the translator’s target language T
– The object program is semantically equivalent to the source program
Chart 22
Interpreters

• Accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
– Does not translate the source program into object code prior to execution
Chart 23

Interpreters

[Diagram, Chart 24: the Interpreter repeatedly fetches, analyzes, and executes instructions from the Source Program until the program is complete]

• The source program starts to run as soon as the first instruction is analyzed
Chart 24
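To make the fetch–analyze–execute loop concrete, here is a minimal sketch in Java for a hypothetical three-instruction toy language (this is an illustration, not the Triangle interpreter):

// Toy interpreter: fetch-analyze-execute over instructions of the form
// "PUSH n", "ADD", or "PRINT". Execution starts as soon as the first
// instruction is analyzed; nothing is translated ahead of time.
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyInterpreter {
    public static void run(String[] program) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                                   // program counter
        while (pc < program.length) {                 // fetch the next instruction
            String[] instr = program[pc].split(" ");  // analyze it
            switch (instr[0]) {                       // execute it
                case "PUSH"  -> stack.push(Integer.parseInt(instr[1]));
                case "ADD"   -> stack.push(stack.pop() + stack.pop());
                case "PRINT" -> System.out.println(stack.peek());
                default      -> throw new IllegalStateException("bad instruction: " + program[pc]);
            }
            pc++;
        }
    }

    public static void main(String[] args) {
        run(new String[] { "PUSH 2", "PUSH 3", "ADD", "PRINT" });   // prints 5
    }
}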
Interpreters

• When to Use Interpretation
– Interactive mode – want to see the results of an instruction before entering the next instruction
– The program will only be used once
– Each instruction is expected to be executed only once
– Instructions have simple formats
• Disadvantages
– Slow: up to 100 times slower than machine code
Chart 25

Interpreters

• Examples
– Basic
– Lisp
– Unix Command Language (shell)
– SQL
Chart 26
Interpreters

[Tombstone diagrams, Chart 27: an S interpreter expressed in language L; a program P expressed in language S, running on an S interpreter, running on machine M; the program graph, written in Basic, running on a Basic interpreter executing on an x86 machine]
Chart 27

Real and Abstract Machines

• Hardware emulation: using software to execute one machine’s code on another machine
– Abstract machine: the emulator
– Real machine: the actual hardware
– An abstract machine is functionally equivalent to a real machine if they both implement the same language L
• Can measure everything about the new machine except its speed
Chart 28

Real and Abstract Machines

[Tombstone diagrams, Chart 29: a New Machine Instruction (nmi) interpreter written in C; using a compiler that translates C programs into M machine code, the nmi interpreter is translated into machine code M; a program P expressed in nmi then runs on the nmi interpreter on machine M]
Chart 29
Interpretive Compilers

• Combination of compiler and interpreter
– Translate the source program into an intermediate language
– The intermediate language is intermediate in level between the source language and ordinary machine code
– Its instructions have simple formats, and therefore can be analyzed easily and quickly
– Translation from the source language into the intermediate language is easy and fast
• An interpretive compiler combines fast compilation with tolerable running speed
Chart 30

Interpretive Compilers

[Tombstone diagrams, Chart 31: a Java-into-JVM translator running on machine M, and a JVM-code interpreter running on machine M. A Java program P is first translated into JVM code, and then the JVM-code object program is interpreted]
Chart 31
Portable Compilers

• A program is portable if it can be compiled and run on any machine, without change
– A portable program is more valuable than an unportable one, because its development cost can be spread over more copies
– Portability is measured by the proportion of code that remains unchanged when it is moved to a dissimilar machine
• Language affects portability
– Assembly language: 0% portable
– High-level language: approaches 100% portability
Chart 32

Portable Compilers

• Language Processors
– Valuable and widely used programs
– Typically written in a high-level language
  Pascal, C, Java
– Part of a language processor is machine dependent
  The code generation part
  A language processor is only about 50% portable
– A compiler that generates intermediate code is more portable than a compiler that generates machine code
Chart 33
Portable Compilers

[Tombstone diagrams, Chart 34: a Java-into-JVM compiler and a JVM interpreter, both written in Java. To port, rewrite the JVM interpreter in C and compile it with the existing C-into-M compiler (note: a C-into-M compiler exists), giving a JVM interpreter in machine code M; a program P compiled to JVM code can then be interpreted on machine M]
Chart 34
Bootstrapping

• The language processor is used to process itself
– The implementation language is the source language
• Bootstrapping a portable compiler
– A portable compiler can be bootstrapped to make a true compiler – one that generates machine code – by writing an intermediate-language-into-machine-code translator
• Half bootstrap
– Compiler expressed in itself but targeted for another machine
• Full bootstrap
– Writing the compiler in itself
– Using the latest version to upgrade the next version
• Bootstrapping to improve efficiency
– Upgrade the compiler to optimize code generation as well as to improve compile efficiency
Chart 35
Bootstrapping

Bootstrap an interpretive compiler to generate machine code:

[Tombstone diagrams, Chart 36]
• First, write a JVM-code-into-M translator in Java
• Next, compile the translator using the existing Java-into-JVM compiler and JVM interpreter
• Use the translator to translate itself, giving a JVM-code-into-M translator in machine code M
• Translate the Java-into-JVM-code translator into machine code
• The result is a two-stage Java-into-M compiler: a program P is compiled to JVM code, then translated to M machine code
Chart 36
Bootstrapping

Full bootstrap:

[Tombstone diagrams, Chart 37]
• v1: write an Ada-S compiler in C and compile it with the existing C compiler for machine M
• v2: convert the C version of the Ada-S compiler into an Ada-S version and compile it with v1
• v3: extend the Ada-S compiler to a (full) Ada compiler, written in Ada-S, and compile it with v2
Chart 37
Bootstrapping

Half bootstrap:

[Tombstone diagrams, Chart 38: an Ada compiler written in Ada exists on a host machine HM. An Ada-into-TM compiler, also written in Ada, is first compiled on HM, then compiled again with the resulting compiler so that it runs on the target machine TM; a program P can then be compiled to TM code]
Chart 38

Bootstrapping

Bootstrap to improve efficiency:

[Tombstone diagrams, Chart 39: v1 of an Ada compiler generates slow code (Ms); v2, written in Ada, generates fast code (Mf). v2 is first compiled with v1, giving a slow compiler that generates fast code, and then compiled with itself, giving a fast compiler that generates fast code; a program P compiled with it runs fast]
Chart 39
Chapter 3: Compilation

• Phases
– Syntactic Analysis
– Contextual Analysis
– Code Generation
• Passes
– Multi-pass Compilation
– One-pass Compilation
– Compiler Design Issues
• Case Study: The Triangle Compiler
Chart 40

Phases

• Syntactic Analysis
– The source program is parsed to check whether it conforms to the source language’s syntax, and to determine its phrase structure
• Contextual Analysis
– The parsed program is analyzed to check whether it conforms to the source language’s contextual constraints
• Code Generation
– The checked program is translated to an object program, in accordance with the semantics of the source and target languages
Chart 41

Phases

[Diagram, Chart 42: Source Program → Syntactic Analysis (error report) → AST → Contextual Analysis (error report) → Decorated AST → Code Generation → Object Program]
Chart 42
Syntactic Analysis

• To determine the source program’s phrase structure
– Parsing
– Contextual analysis and code generation must know how the program is composed
• Parsing
– Check for conformance to the source language’s syntax
– Construct a suitable representation of the phrase structure (AST)
• AST
– Commands, expressions, declarations, …
– Terminal nodes correspond to identifiers, literals, and operators
– Subtrees represent the phrases of the source program
– Blanks and comments are not in the AST (no meaning)
– Punctuation and brackets are not in the AST (they only separate and enclose)
Chart 43

Contextual Analysis

• Analyzes the parsed program
– Scope rules
– Type rules
• Produces a decorated AST
– The AST with information gathered during contextual analysis
– Each applied occurrence of an identifier is linked to the corresponding declaration
– Each expression is decorated with its type T
Chart 44

Code Generation

• The final translation of the checked program to an object program
– After syntactic and contextual analysis are completed
• Treatment of identifiers
– Constants
  Bind the identifier to a value
  Replace each occurrence of the identifier with the value
– Variables
  Bind the identifier to some memory address
  Replace each occurrence of the identifier by the address
• Target language
– Assembly language
– Machine code
Chart 45
Passes

• Multi-pass compilation
– Traverses the program or AST several times
[Diagram: the Compiler Driver invokes the Syntactic Analyzer, then the Contextual Analyzer, then the Code Generator]
• One-pass compilation
– Single traversal of the program
– Contextual analysis and code generation are performed ‘on the fly’ during syntactic analysis
[Diagram: the Compiler Driver invokes the Syntactic Analyzer, which calls the Contextual Analyzer and Code Generator as it parses]
Chart 46

Compiler Design Issues

• Speed
– Compiler run time
• Space
– Storage: size of the compiler + files generated
• Modularity
– A multi-pass compiler is more modular than a one-pass compiler
• Flexibility
– A multi-pass compiler is more flexible because it generates an AST that can be traversed in any order by the other phases
• Semantics-preserving transformations
– To optimize code you must have a multi-pass compiler
• Source language properties
– May restrict compiler choice – some language constructs may require multi-pass compilation
Chart 47
Chapter 4: Syntactic Analysis

• Sub-phases of Syntactic Analysis
• Grammars Revisited
• Parsing
• Abstract Syntax Trees
• Scanning
• Case Study: Syntactic Analysis in the Triangle Compiler
Chart 48

Structure of a Compiler

[Diagram, Chart 49: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code; a Symbol Table is shared by the phases]
Chart 49
Syntactic Analysis

• Main functions
– Parse the source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens
Chart 50

Sub-phases of Syntactic Analysis

• Scanning (or lexical analysis)
– The source program is transformed into a stream of tokens
  Identifiers
  Literals
  Operators
  Keywords
  Punctuation
– Comments and blank spaces are discarded
• Parsing
– Determines the source program’s phrase structure
– The source program is input as a stream of tokens (from the scanner)
– Treats each token as a terminal symbol
• Representation of phrase structure
– AST
Chart 51
Lexical Analysis – A Simple Example

• Scan the file character by character and group characters into words and punctuation (tokens), removing white space and comments
• Some tokens for this example: main ( ) { int a , b , c ;

Example source:

main() {
  int a, b, c;
  char number[5];
  /* get user inputs */
  a = atoi(gets(number));
  b = atoi(gets(number));
  /* calculate value for c */
  c = 2*(a+b) + a*(a+b);
  /* print results */
  printf("%d", c);
}
Chart 52

Creating Tokens – Mini-Triangle Example

Source:
  let var y: Integer
  in !new year
    y := y+1

[Diagram, Chart 53: the Input/Buffer/Converter produces the character string (S = space)
  l e t S v a r S y : S I n t e g e r S i n ....
The Scanner then groups the characters into tokens (kind, spelling):
  let  var  Ident. y  colon :  Ident. Integer  in  Ident. y  becomes :=  Ident. y  op. +  Intlit. 1  eot]
Chart 53
Tokens in Triangle

// literals, identifiers, operators...
INTLITERAL  = 0,   "<int>",
CHARLITERAL = 1,   "<char>",
IDENTIFIER  = 2,   "<identifier>",
OPERATOR    = 3,   "<operator>",

// reserved words - must be in alphabetical order...
ARRAY   = 4,   "array",
BEGIN   = 5,   "begin",
CONST   = 6,   "const",
DO      = 7,   "do",
ELSE    = 8,   "else",
END     = 9,   "end",
FUNC    = 10,  "func",
IF      = 11,  "if",
IN      = 12,  "in",
LET     = 13,  "let",
OF      = 14,  "of",
PROC    = 15,  "proc",
RECORD  = 16,  "record",
THEN    = 17,  "then",
TYPE    = 18,  "type",
VAR     = 19,  "var",
WHILE   = 20,  "while",

// punctuation...
DOT       = 21,  ".",
COLON     = 22,  ":",
SEMICOLON = 23,  ";",
COMMA     = 24,  ",",
BECOMES   = 25,  ":=",
IS        = 26,  "~",

// brackets...
LPAREN   = 27,  "(",
RPAREN   = 28,  ")",
LBRACKET = 29,  "[",
RBRACKET = 30,  "]",
LCURLY   = 31,  "{",
RCURLY   = 32,  "}",

// special tokens...
EOT   = 33,  "",
ERROR = 34,  "<error>"
Chart 54
Grammars Revisited

• Context-free grammars
– Generate a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure, embodied in its syntax tree
• Develop parsers from context-free grammars
Chart 55

Regular Expressions

• A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols
• Main features
– ‘|’ separates alternatives
– ‘*’ indicates that the previous item may be repeated zero or more times
– ‘(’ and ‘)’ are grouping parentheses
Chart 56

Regular Expression Basics

• ε: the empty string, a special string of length 0
• Regular expression operations
– | separates alternatives
– * indicates that the previous item may be repeated zero or more times (repetition)
– ( and ) are grouping parentheses
Chart 57
Regular Expression Basics

• Algebraic Properties
– | is commutative and associative
  r|s = s|r
  r|(s|t) = (r|s)|t
– Concatenation is associative
  (rs)t = r(st)
– Concatenation distributes over |
  r(s|t) = rs|rt
  (s|t)r = sr|tr
– ε is the identity for concatenation
  εr = r
  rε = r
– * is idempotent
  r** = r*
  r* = (r|ε)*
Chart 58

Regular Expression Basics

• Common Extensions
– r+   one or more of expression r, same as rr*
– r^k  k repetitions of r
  r^3 = rrr
– ~r   the characters not in the expression r
  ~[\t\n]
– r-z  range of characters
  [0-9a-z]
– r?   zero or one copy of expression r (used for fields of an expression that are optional)
Chart 59
Regular Expression Example

• Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– First Try: (0|1|ε)[0-9]
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? Yes
    0, 00, 18
Chart 60

Regular Expression Example

• Regular Expression for Representing Months
– Examples of legal inputs
  January represented as 1 or 01
  October represented as 10
– Second Try: [1-9]|(0[1-9])|(1[0-2])
  Matches all legal inputs? Yes
    1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? No
Chart 61
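A quick way to convince yourself of the second try is to run the pattern, copied directly from the slide, through java.util.regex; a minimal sketch:

// Testing the month regular expression from Chart 61.
import java.util.regex.Pattern;

public class MonthRegexTest {
    public static void main(String[] args) {
        Pattern month = Pattern.compile("[1-9]|(0[1-9])|(1[0-2])");
        String[] legal   = { "1", "9", "01", "09", "10", "12" };
        String[] illegal = { "0", "00", "13", "18" };

        for (String s : legal)
            System.out.println(s + " -> " + month.matcher(s).matches());   // all true
        for (String s : illegal)
            System.out.println(s + " -> " + month.matcher(s).matches());   // all false
    }
}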
Regular Expression Example

• Regular Expression for Floating Point Numbers
– Examples of legal inputs
  1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6
  Assume that a 0 is required before numbers less than 1, and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
  Assume digit → 0|1|2|3|4|5|6|7|8|9
  Handle simple decimals such as 1.0, 0.2, 3.14159
    digit+.digit+
  Add an optional sign (only minus, no plus)
    (-|ε)digit+.digit+   or   -?digit+.digit+
Chart 62

Regular Expression Example

• Regular Expression for Floating Point Numbers (cont.)
– Building the regular expression (cont.)
  Format for the exponent
    (E|e)(+|-)?(digit+)
  Adding it as an optional part to the decimal part
    (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
Chart 63
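The finished expression can be tried out by translating it into java.util.regex syntax (digit becomes [0-9] and the literal decimal point must be escaped); a sketch:

// The floating-point expression from Chart 63, in java.util.regex syntax.
import java.util.regex.Pattern;

public class FloatRegexTest {
    public static void main(String[] args) {
        Pattern fp = Pattern.compile("-?[0-9]+\\.[0-9]+((E|e)(\\+|-)?[0-9]+)?");
        String[] samples = { "1.0", "0.2", "3.14159", "-1.0", "2.7e8", "1.0E-6",
                             "0003.14159", ".5", "1." };
        for (String s : samples)
            System.out.println(s + " -> " + fp.matcher(s).matches());
        // The slide's legal examples print true; ".5" and "1." print false,
        // since a digit is required on both sides of the decimal point.
    }
}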
Extended BNF

• Extended BNF (EBNF)
– Combination of BNF and RE
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
  The right-hand side may use |, *, (, )
  The right-hand side may contain both terminal and nonterminal symbols
Chart 64

Example EBNF

Expression          ::= primary-Expression (Operator primary-Expression)*
primary-Expression  ::= Identifier
                      | ( Expression )
Identifier          ::= a | b | c | d | e
Operator            ::= + | - | * | /

Generates:
  e
  a + b
  a – b – c
  a + (b * c)
  a + (b + c) / d
  a – (b – (c – (d – e)))
Chart 65
Grammar Transformations

• Left Factorization
  XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
                 | if Expression then single-Command
                 | if Expression then single-Command else single-Command

single-Command ::= V-name := Expression
                 | if Expression then single-Command (ε | else single-Command)
Chart 66

Grammar Transformations

• Elimination of left recursion
  N ::= X | NY is equivalent to N ::= X(Y)*

Identifier ::= Letter
             | Identifier Letter
             | Identifier Digit

Identifier ::= Letter
             | Identifier (Letter | Digit)

Identifier ::= Letter (Letter | Digit)*
Chart 67
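The transformed rule maps directly onto a loop in a recursive-descent recognizer. A minimal, self-contained sketch (a toy character-level recognizer, not the Triangle parser):

// Identifier ::= Letter (Letter | Digit)*  after eliminating left recursion:
// one mandatory letter followed by a loop over letters and digits.
public class IdentifierRecognizer {
    private final String input;
    private int pos = 0;

    IdentifierRecognizer(String input) { this.input = input; }

    private boolean currentIsLetter() {
        return pos < input.length() && Character.isLetter(input.charAt(pos));
    }
    private boolean currentIsDigit() {
        return pos < input.length() && Character.isDigit(input.charAt(pos));
    }

    boolean parseIdentifier() {
        if (!currentIsLetter()) return false;           // Letter
        pos++;
        while (currentIsLetter() || currentIsDigit())   // (Letter | Digit)*
            pos++;
        return pos == input.length();                   // consumed the whole input
    }

    public static void main(String[] args) {
        System.out.println(new IdentifierRecognizer("y2k").parseIdentifier());  // true
        System.out.println(new IdentifierRecognizer("2yk").parseIdentifier());  // false
    }
}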
Grammar Transformations

• Substitution of nonterminal symbols
  Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

single-Command   ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
                   | …
Control-Variable ::= Identifier
To-or-Downto     ::= to
                   | downto

single-Command   ::= for Identifier := Expression (to | downto) Expression do single-Command
                   | …
Chart 68
Scanning (Lexical Analysis)

• The purpose of scanning is to recognize tokens in the source program, or, to group input characters (the source program text) into tokens.
• Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens
Chart 69

Structure of a Compiler

[Diagram repeated from Chart 49: Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → Optimization → Assembly Code Generation → Assembly code, with a shared Symbol Table]
Chart 70
Creating Tokens – Mini-Triangle Example

[Repeated from Chart 53: the source "let var y: Integer in !new year  y := y+1" is converted to a character stream and then scanned into the token stream let, var, Ident. y, colon :, Ident. Integer, in, Ident. y, becomes :=, Ident. y, op. +, Intlit. 1, eot]
Chart 71
What Does a Scanner Do?

• Handle keywords (reserved words)
– Recognizes identifiers and keywords
– Match explicitly
  Write a regular expression for each keyword
  An identifier is any alphanumeric string which is not a keyword
– Match as an identifier, perform lookup
  No special regular expressions for keywords
  When an identifier is found, perform a lookup into a preloaded keyword table

How does Triangle handle keywords?
Discuss in terms of efficiency and ease of coding.
Chart 72
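A minimal sketch of the second approach (match everything as an identifier, then look it up in a preloaded keyword table), with names and token codes taken from Chart 54; this is illustrative, not the Triangle scanner's actual code. The first approach could instead use one regular expression per keyword, or a binary search over the alphabetically ordered reserved-word list.

// Preloaded keyword table: a scanned word is a keyword if it is in the
// table; otherwise it is an IDENTIFIER.
import java.util.HashMap;
import java.util.Map;

public class KeywordTable {
    static final int IDENTIFIER = 2;
    private static final Map<String, Integer> KEYWORDS = new HashMap<>();
    static {
        String[] words = { "array", "begin", "const", "do", "else", "end",
                           "func", "if", "in", "let", "of", "proc",
                           "record", "then", "type", "var", "while" };
        for (int i = 0; i < words.length; i++) {
            KEYWORDS.put(words[i], 4 + i);   // token codes 4..20, as in Chart 54
        }
    }

    static int kindOf(String spelling) {
        return KEYWORDS.getOrDefault(spelling, IDENTIFIER);
    }

    public static void main(String[] args) {
        System.out.println(kindOf("while"));   // 20 (WHILE)
        System.out.println(kindOf("y"));       // 2  (IDENTIFIER)
    }
}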
What Does a Scanner Do?

• Remove white space
– Tabs, spaces, new lines
• Remove comments
– Single line
  -- Ada comment
– Multi-line, start and end delimiters
  { Pascal comment }
  /* C comment */
– Nested comments
– Runaway comments
  Nonterminated comments can’t be detected till end of file
Chart 73

What Does a Scanner Do?

• Perform look ahead
– Multi-character tokens
  1..10 vs. 1.10
  &, &&
  <, <=
  etc.
• Challenging input languages
– FORTRAN
  Keywords not reserved
  Blanks are not a delimiter
  Example (comma vs. decimal point)
    DO10I=1,5  start of a DO loop (equivalent to a C for loop)
    DO10I=1.5  an assignment statement, assigning to the variable DO10I
Chart 74

What Does a Scanner Do?

• Challenging input languages (cont.)
– PL/I, keywords not reserved
  IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;
Chart 75
What Does a Scanner Do?

• Error Handling
– An error token is passed to the parser, which reports the error
– Recovery
  Delete the characters of the current token which have been read so far, and restart scanning at the next unread character
  Delete the first character of the current lexeme and resume scanning from the next character
– Examples of lexical errors:
  3.25e   bad format for a constant
  Var#1   illegal character
– Some errors that are not lexical errors
  Mistyped keywords
    Begim
  Mismatched parentheses
  Undeclared variables
Chart 76
Scanner Implementation

• Issues
– Simpler design – the parser doesn’t have to worry about white space, etc.
– Improved compiler efficiency – allows the construction of a specialized and potentially more efficient processor
– Enhanced compiler portability – input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner
Chart 77

Scanner Implementation

• What are the keywords in Triangle?
• How are keywords and identifiers implemented in Triangle?
• Is look ahead implemented in Triangle?
– If so, how?
Chart 78
Structure of a Compiler

[Diagram, Chart 79: Source code → Lexical Analyzer → tokens → Parser → Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code, with a shared Symbol Table]
Chart 79
Parsing

• Given an unambiguous, context-free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.
• Unambiguity is necessary so that every sentence of the grammar forms exactly one syntax tree.
Chart 80

Parsing

• The syntax of programming language constructs is described by context-free grammars.
• Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars we can automatically construct an efficient parser that determines whether a source program is syntactically well formed
– They impart a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors
– It is easier to add new constructs to the language if the implementation is based on a grammatical description of the language
Chart 81
Parsing

[Diagram: sequence of tokens → parser → syntax tree]

• Check the syntax (structure) of a program and create a tree representation of the program
• Programming languages have non-regular constructs
– Nesting
– Recursion
• Context-free grammars are used to express the syntax of programming languages
Chart 82

Context-Free Grammars

• Comprised of
– A set of tokens or terminal symbols
– A set of non-terminal symbols
– A set of rules or productions which express the legal relationships between symbols
– A start or goal symbol
• Example:
  1. expr  → expr – digit
  2. expr  → expr + digit
  3. expr  → digit
  4. digit → 0|1|2|…|9

  Tokens: -, +, 0, 1, 2, …, 9
  Non-terminals: expr, digit
  Start symbol: expr
Chart 83
Context-Free Grammars

1. expr  → expr – digit
2. expr  → expr + digit
3. expr  → digit
4. digit → 0|1|2|…|9

Example input: 3+8-2

[Parse tree, Chart 84: expr at the root, built from expr – digit; the left expr derives expr + digit, and the digits derive 3, 8, and 2]
Chart 84

Checking for Correct Syntax

• Given a grammar for a language and a program, how do you know if the syntax of the program is legal?
• A legal program can be derived from the start symbol of the grammar
– The grammar must be unambiguous and context-free
Chart 85
Deriving a String

• The derivation begins with the start symbol
• At each step of a derivation, the right-hand side of a grammar rule is used to replace a non-terminal symbol
• Continue replacing non-terminals until only terminal symbols remain

1. expr  → expr – digit
2. expr  → expr + digit
3. expr  → digit
4. digit → 0|1|2|…|9

Example input: 3+8-2

expr ⇒ expr – digit ⇒ expr – 2 ⇒ expr + digit – 2          (Rules 1, 4, 2)
     ⇒ expr + 8 – 2 ⇒ digit + 8 – 2 ⇒ 3 + 8 – 2            (Rules 4, 3, 4)
Chart 86

Rightmost Derivation

• The rightmost non-terminal is replaced in each step

(Grammar and input as on Chart 86.)

Rule 1: expr                  ⇒ expr – digit
Rule 4: expr – digit          ⇒ expr – 2
Rule 2: expr – 2              ⇒ expr + digit – 2
Rule 4: expr + digit – 2      ⇒ expr + 8 – 2
Rule 3: expr + 8 – 2          ⇒ digit + 8 – 2
Rule 4: digit + 8 – 2         ⇒ 3 + 8 – 2
Chart 87

Leftmost Derivation

• The leftmost non-terminal is replaced in each step

(Grammar and input as on Chart 86.)

Rule 1: expr                  ⇒ expr – digit
Rule 2: expr – digit          ⇒ expr + digit – digit
Rule 3: expr + digit – digit  ⇒ digit + digit – digit
Rule 4: digit + digit – digit ⇒ 3 + digit – digit
Rule 4: 3 + digit – digit     ⇒ 3 + 8 – digit
Rule 4: 3 + 8 – digit         ⇒ 3 + 8 – 2
Chart 88
Leftmost Derivation

• The leftmost non-terminal is replaced in each step

[Parse tree, Chart 89: the tree for 3+8-2 with its nodes numbered 1–6 in the order they are expanded by the leftmost derivation]

Rule 1: expr                  ⇒ expr – digit
Rule 2: expr – digit          ⇒ expr + digit – digit
Rule 3: expr + digit – digit  ⇒ digit + digit – digit
Rule 4: digit + digit – digit ⇒ 3 + digit – digit
Rule 4: 3 + digit – digit     ⇒ 3 + 8 – digit
Rule 4: 3 + 8 – digit         ⇒ 3 + 8 – 2
Chart 89
Bottom-Up Parsing

• The parser examines the terminal symbols of the input string, in order from left to right
• It reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)
• Bottom-up parsing reduces a string w to the start symbol of the grammar
– At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production; if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse
Chart 90

Bottom-Up Parsing

• Types of bottom-up parsing algorithms
– Shift-reduce parsing
  At each reduction step a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production; if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse
– LR(k) parsing
  The L is for left-to-right scanning of the input, the R is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of look-ahead used in making parsing decisions
Chart 91
Bottom-Up Parsing Example: 3+8-2

1. expr  → expr – digit
2. expr  → expr + digit
3. expr  → digit
4. digit → 0|1|2|…|9

Example input: 3+8-2

[Charts 92–93 show the reductions as a growing forest of subtrees:
3 + 8 - 2 ⇒ digit + 8 - 2 ⇒ expr + 8 - 2 ⇒ expr + digit - 2 ⇒ expr - 2 ⇒ expr - digit ⇒ expr]
Charts 92–93
Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

[Charts 94–97 show the reduction steps as growing parse trees:
abbcde ⇒ aAbcde   (A → b)
aAbcde ⇒ aAde     (A → Abc)
aAde   ⇒ aABe     (B → d)
aABe   ⇒ S        (S → aABe), completing the parse]
Charts 94–97
Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence → Subject Verb Object .
2. Subject  → I | a Noun | the Noun
3. Object   → me | a Noun | the Noun
4. Noun     → cat | mat | rat
5. Verb     → like | is | see | sees

Example input: the cat sees a rat.

[Charts 98–103 show the reduction steps as growing parse trees:
the cat sees a rat.   ⇒ the Noun sees a rat.   (Noun → cat)
the Noun sees a rat.  ⇒ Subject sees a rat.    (Subject → the Noun)
Subject sees a rat.   ⇒ Subject Verb a rat.    (Verb → sees)
Subject Verb a rat.   ⇒ Subject Verb a Noun.   (Noun → rat)
Subject Verb a Noun.  ⇒ Subject Verb Object.   (Object → a Noun)
Subject Verb Object.  ⇒ Sentence               (Sentence → Subject Verb Object .), completing the parse]

Question (Chart 102): What would happen if we chose ‘Subject → a Noun’ instead of ‘Object → a Noun’?
Charts 98–103
Top-Down Parsing

• The parser examines the terminal symbols of the input string, in order from left to right.
• The parser reconstructs the syntax tree from the top (root node) down (towards the terminal nodes).
• It is an attempt to find the leftmost derivation for an input string.
Chart 104

Top-Down Parsers

• General rules for top-down parsers
– Start with just a stub for the root node
– At each step the parser takes the leftmost stub
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
– Parsing succeeds when and if the whole input string is connected up to the syntax tree.
Chart 105

Top-Down Parsing

• Two forms
– Backtracking parsers
  Guess which rule to apply, back up, and change choices if the parse cannot proceed
– Predictive parsers
  Predict which rule to apply by using look-ahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.
Chart 106
Predictive Parsers

• Many types
– LL(1) parsing
  The first L is for scanning the input from left to right; the second L is for producing a leftmost derivation; the 1 is for using one input symbol of look-ahead
  Table driven, with an explicit stack to maintain the parse tree
– Recursive descent parsing
  Uses recursive subroutines to traverse the parse tree
Chart 107
Predictive Parsers (Lookahead)

• Lookahead in predictive parsing
– The lookahead token (the next token in the input) is used to determine which rule should be used next
– For example:
  1. term  → num term'
  2. term' → ‘+’ num term' | ‘-’ num term' | ε
     num   → ‘0’|’1’|’2’|…|’9’

Example input: 7+3-2

[Charts 108–110 grow the parse tree left to right: term expands to num term'; num matches 7; the lookahead ‘+’ selects the ‘+’ num term' alternative, whose num matches 3; the lookahead ‘-’ selects the ‘-’ num term' alternative, whose num matches 2; at the end of the input, term' expands to ε]
Charts 108–110
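A minimal recursive-descent sketch of this grammar, with one character of lookahead deciding which alternative of term' to use; tokens are single characters for simplicity, and the class is illustrative:

// Predictive parser for:
//   term  ::= num term'
//   term' ::= '+' num term' | '-' num term' | ε
//   num   ::= '0'..'9'
public class TermParser {
    private final String input;
    private int pos = 0;

    TermParser(String input) { this.input = input; }

    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '$'; }

    void parseTerm() {                      // term ::= num term'
        parseNum();
        parseTermPrime();
    }

    private void parseTermPrime() {         // term' ::= '+' num term' | '-' num term' | ε
        if (lookahead() == '+' || lookahead() == '-') {
            pos++;                          // accept the operator
            parseNum();
            parseTermPrime();
        }                                   // otherwise choose ε: consume nothing
    }

    private void parseNum() {               // num ::= '0'..'9'
        if (!Character.isDigit(lookahead()))
            throw new RuntimeException("digit expected at position " + pos);
        pos++;
    }

    public static void main(String[] args) {
        new TermParser("7+3-2").parseTerm();
        System.out.println("7+3-2 parsed successfully");
    }
}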
Recursive-Descent Parsing

• Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar
– The task of each method parseN is to parse a single N-phrase
– These parsing methods cooperate to parse complete sentences
Chart 111

Recursive-Descent Parsing

1. Sentence → Subject Verb Object .
2. Subject  → I | a Noun | the Noun
3. Object   → me | a Noun | the Noun
4. Noun     → cat | mat | rat
5. Verb     → like | is | see | sees

Example input: the cat sees a rat.

[Parse tree under construction, Chart 112: Sentence at the root with stubs Subject, Verb, Object, and ‘.’]

a. Decide which production rule to apply. There is only one, rule 1. This step creates four stubs.
Chart 112
Recursive-Descent Parsing

(Grammar and input as on Chart 112.)

[Charts 113–118 continue the same top-down parse step by step, always handling the leftmost stub next: Subject is expanded to ‘the Noun’, and ‘the’ and then Noun → cat are matched against the input; Verb is matched against ‘sees’; Object is expanded to ‘a Noun’, and ‘a’ and Noun → rat are matched; finally the ‘.’ stub is matched, completing the syntax tree]
Charts 113–118
Recursive-Descent Parser for Micro-English

1. Sentence → Subject Verb Object .
2. Subject  → I | a Noun | the Noun
3. Object   → me | a Noun | the Noun
4. Noun     → cat | mat | rat
5. Verb     → like | is | see | sees

Parsing methods: parseSentence, parseSubject, parseObject, parseVerb, parseNoun
Chart 119
Recursive-Descent Parser for Micro-English

(Grammar as on Chart 119.)

Sentence → Subject Verb Object .
parseSentence:
  parseSubject
  parseVerb
  parseObject
  parseEnd
Chart 120

Subject → I | a Noun | the Noun
parseSubject:
  if input = "I"
    accept
  else if input = "a"
    accept
    parseNoun
  else if input = "the"
    accept
    parseNoun
  else error
Chart 121

Noun → cat | mat | rat
parseNoun:
  if input = "cat"
    accept
  else if input = "mat"
    accept
  else if input = "rat"
    accept
  else error
Chart 122

Object → me | a Noun | the Noun
parseObject:
  if input = "me"
    accept
  else if input = "a"
    accept
    parseNoun
  else if input = "the"
    accept
    parseNoun
  else error
Chart 123

Verb → like | is | see | sees
parseVerb:
  if input = "like"
    accept
  else if input = "is"
    accept
  else if input = "see"
    accept
  else if input = "sees"
    accept
  else error
Chart 124

parseEnd:
  if input = "."
    accept
  else error
Chart 125
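Putting Charts 119–125 together, a runnable Java sketch of the Micro-English parser; the whitespace tokenizer and class name are illustrative, not from the slides:

// Recursive-descent parser for Micro-English. Tokens are produced by
// splitting on whitespace, with the final "." treated as its own token.
public class MicroEnglishParser {
    private final String[] tokens;
    private int pos = 0;

    MicroEnglishParser(String sentence) {
        this.tokens = sentence.replace(".", " .").trim().split("\\s+");
    }

    private String currentToken() { return pos < tokens.length ? tokens[pos] : "<eot>"; }
    private void acceptIt() { pos++; }
    private void error(String expected) {
        throw new RuntimeException("syntax error: expected " + expected + ", saw " + currentToken());
    }

    void parseSentence() { parseSubject(); parseVerb(); parseObject(); parseEnd(); }

    private void parseSubject() {                       // Subject -> I | a Noun | the Noun
        switch (currentToken()) {
            case "I"        -> acceptIt();
            case "a", "the" -> { acceptIt(); parseNoun(); }
            default         -> error("subject");
        }
    }

    private void parseObject() {                        // Object -> me | a Noun | the Noun
        switch (currentToken()) {
            case "me"       -> acceptIt();
            case "a", "the" -> { acceptIt(); parseNoun(); }
            default         -> error("object");
        }
    }

    private void parseNoun() {                          // Noun -> cat | mat | rat
        switch (currentToken()) {
            case "cat", "mat", "rat" -> acceptIt();
            default -> error("noun");
        }
    }

    private void parseVerb() {                          // Verb -> like | is | see | sees
        switch (currentToken()) {
            case "like", "is", "see", "sees" -> acceptIt();
            default -> error("verb");
        }
    }

    private void parseEnd() {                           // the final '.'
        if (currentToken().equals(".")) acceptIt(); else error("'.'");
    }

    public static void main(String[] args) {
        new MicroEnglishParser("the cat sees a rat.").parseSentence();
        System.out.println("the cat sees a rat. -- parsed successfully");
    }
}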
Systematic Development of a Recursive-Descent Parser

• Given a (suitable) context-free grammar
– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
  Always eliminate left recursion
  Always left-factorize whenever possible
– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
– Make the parser consist of:
  A private variable currentToken
  The private parsing methods developed in the previous step
  Private auxiliary methods accept and acceptIt, both of which call the scanner
  A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken
Chart 126
Quote of the Week

“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”
– Bjarne Stroustrup
Chart 127

Quote of the Week

Did you really say that?

Dr. Bjarne Stroustrup:
Yes, I did say something along the lines of: C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.

I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.
Chart 128
Converting EBNF Production Rules to Parsing Methods

• For a production rule N ::= X
– Convert the production rule to a parsing method named parseN:
    private void parseN() {
      parse X
    }
– Refine parse ε to a dummy statement
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
– Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method, parseN()
– Refine parse X Y to
    {
      parse X
      parse Y
    }
– Refine parse X | Y to
    switch (currentToken.kind) {
      cases in starters[[X]]:
        parse X
        break;
      cases in starters[[Y]]:
        parse Y
        break;
      default:
        report a syntax error
    }
Chart 129
Converting EBNF Production Rules to Parsing Methods

• For X | Y
– Choose parse X only if the current token is one that can start an X-phrase
– Choose parse Y only if the current token is one that can start a Y-phrase
  starters[[X]] and starters[[Y]] must be disjoint
• For X*
– Choose
    while (currentToken.kind is in starters[[X]])
      parse X
  starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
Chart 130
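Applying these rules to the EBNF grammar of Chart 65 gives parsing methods of the following shape; a self-contained sketch in which tokens are single characters and the class name is illustrative:

// Transcribing the grammar of Chart 65:
//   Expression         ::= primary-Expression (Operator primary-Expression)*
//   primary-Expression ::= Identifier | ( Expression )
//   Identifier         ::= a | b | c | d | e
//   Operator           ::= + | - | * | /
public class ExpressionParser {
    private final String tokens;      // one character per token; blanks removed
    private int pos = 0;

    ExpressionParser(String input) { this.tokens = input.replace(" ", ""); }

    private char currentToken() { return pos < tokens.length() ? tokens.charAt(pos) : '$'; }
    private void acceptIt() { pos++; }
    private void accept(char expected) {
        if (currentToken() != expected) throw new RuntimeException("expected " + expected);
        acceptIt();
    }

    void parseExpression() {                            // X* becomes a while loop
        parsePrimaryExpression();
        while ("+-*/".indexOf(currentToken()) >= 0) {   // starters of (Operator primary-Expression)
            acceptIt();                                 // parse Operator
            parsePrimaryExpression();
        }
    }

    private void parsePrimaryExpression() {             // X | Y becomes a choice on the current token
        if (currentToken() == '(') {
            acceptIt();
            parseExpression();
            accept(')');
        } else {
            parseIdentifier();
        }
    }

    private void parseIdentifier() {                    // Identifier ::= a | b | c | d | e
        if ("abcde".indexOf(currentToken()) < 0)
            throw new RuntimeException("identifier expected, saw " + currentToken());
        acceptIt();
    }

    public static void main(String[] args) {
        new ExpressionParser("a + (b + c) / d").parseExpression();   // one of Chart 65's sentences
        System.out.println("parsed successfully");
    }
}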
Converting EBNF Production Rules to Parsing Methods

• A grammar that satisfies both these conditions is called an LL(1) grammar
• Recursive-descent parsing is suitable only for LL(1) grammars
Chart 131

Error Repair

• Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.
• Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.
Chart 132

Error Repair

• Repair actions can be divided into insertions and deletions. Typically the compiler will use some look-ahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are:
– No input should cause the compiler to collapse
– Illegal constructs are flagged
– Frequently occurring errors are repaired gracefully
– Minimal stuttering or cascading of errors
• LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input
Chart 133
Mini-Triangle Production Rules

Program      ::= Command                                     Program               (1.14)

Command      ::= V-name := Expression                        AssignCommand         (1.15a)
               | Identifier ( Expression )                   CallCommand           (1.15b)
               | Command ; Command                           SequentialCommand     (1.15c)
               | if Expression then Command else Command     IfCommand             (1.15d)
               | while Expression do Command                 WhileCommand          (1.15e)
               | let Declaration in Command                  LetCommand            (1.15f)

Expression   ::= Integer-Literal                             IntegerExpression     (1.16a)
               | V-name                                      VnameExpression       (1.16b)
               | Operator Expression                         UnaryExpression       (1.16c)
               | Expression Operator Expression              BinaryExpression      (1.16d)

V-name       ::= Identifier                                  SimpleVname           (1.17)

Declaration  ::= const Identifier ~ Expression               ConstDeclaration      (1.18a)
               | var Identifier : Type-denoter               VarDeclaration        (1.18b)
               | Declaration ; Declaration                   SequentialDeclaration (1.18c)

Type-denoter ::= Identifier                                  SimpleTypeDenoter     (1.19)
Chart 134
Abstract Syntax Trees

• An explicit representation of the source program’s phrase structure
• AST for Mini-Triangle
Chart 135

Abstract Syntax Trees

• Program ASTs (P):
  Program ::= Command                                Program            (1.14)
  [AST: a Program node with one Command subtree C]

• Command ASTs (C):
  Command ::= V-name := Expression                   AssignCommand      (1.15a)
            | Identifier ( Expression )              CallCommand        (1.15b)
            | Command ; Command                      SequentialCommand  (1.15c)
  [ASTs: AssignCommand with subtrees V and E; CallCommand with an Identifier (spelling) and subtree E; SequentialCommand with subtrees C1 and C2]
Chart 136

Abstract Syntax Trees

• Command ASTs (C), continued:
  Command ::= if Expression then Command else Command   IfCommand      (1.15d)
            | while Expression do Command               WhileCommand   (1.15e)
            | let Declaration in Command                LetCommand     (1.15f)
  [ASTs: IfCommand with subtrees E, C1, C2; WhileCommand with subtrees E and C; LetCommand with subtrees D and C]
Chart 137
Midterm Review: Chapter 1

• Context-free Grammar
– A finite set of terminal symbols
– A finite set of non-terminal symbols
– A start symbol
– A finite set of production rules
• Aspects of a programming language that need to be specified
– Syntax: form of programs
– Contextual constraints: scope rules and type rules
– Semantics: meaning of programs
Chart 138

Midterm Review: Chapter 1

• Language specification
– Informal: written in English
– Formal: precise notation (BNF, EBNF)
  Unambiguous
  Consistent
  Complete
• Context-free language
– Syntax tree
– Phrase
– Sentence
Chart 139

Midterm Review: Chapter 1

• Syntax tree
– Terminal nodes labeled by terminal symbols
– Non-terminal nodes labeled by non-terminal symbols
• Abstract Syntax Tree (AST)
– Each non-terminal node is labeled by a production rule
– Each non-terminal node has exactly one subtree for each subphrase
– Does not generate sentences
Chart 140
Midterm Review: Chapter 2

• Translator
– Accepts any text expressed in one language (source language) and generates a semantically equivalent text expressed in another language (target language)
• Compiler
– Translates from a high-level language into a low-level language
• Interpreter
– A program that accepts any program (source program) expressed in a particular language (source language) and runs that source program immediately
Chart 141

Midterm Review: Chapter 2

• Interpretive compiler
– Combination of compiler and interpreter
– Some of the advantages of each
• Portable compiler
– Compiled and run on any machine, without change
– Portability measured by the proportion of code that remains unchanged
– Portability is an economic issue
• Bootstrapping
– Using the language processor to process itself
• Tombstone diagrams
Chart 142

Midterm Review: Chapter 3

• Three phases of compilation
– Syntactic analysis
– Contextual analysis
– Code generation
• Single-pass compilers
• Multi-pass compilers
• Compiler design issues
– Speed
– Space
– Modularity
– Flexibility
– Semantics-preserving transformations
– Source language properties
Chart 143
Midterm Review: Chapter 4

• Sub-phases of syntactic analysis
– Scanning (lexical analysis)
  Source program transformed to a stream of tokens
  Comments and blank spaces between tokens are discarded
– Parsing
  Source program, in the form of a stream of tokens, is parsed to determine its phrase structure
  Parser treats each token as a terminal symbol
– Representation of the phrase structure
  A data structure representing the source program’s phrase structure
  Typically an abstract syntax tree (AST)
Chart 144

Midterm Review: Chapter 4

• Tokens
– An atomic symbol of the source program
– May consist of several characters
– Classified according to kind
  All tokens of the same kind can be freely interchanged without affecting the program’s phrase structure
– Each token completely described by its kind and spelling
  Token represented by a tuple
– Only the kind of each token is examined by the parser
  Spelling examined by the contextual analyzer and/or code generator
Chart 145
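As a concrete picture of the (kind, spelling) tuple, a minimal sketch in Java; the kind codes follow Chart 54, and the record name is illustrative:

// A token as a (kind, spelling) tuple.
public record Token(int kind, String spelling) {
    public static final int INTLITERAL = 0;
    public static final int IDENTIFIER = 2;
    public static final int OPERATOR   = 3;

    public static void main(String[] args) {
        // The assignment y := y+1 from Chart 53 yields tokens such as:
        Token y    = new Token(IDENTIFIER, "y");
        Token plus = new Token(OPERATOR, "+");
        Token one  = new Token(INTLITERAL, "1");
        System.out.println(y + " " + plus + " " + one);
    }
}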
Midterm Review: Chapter 4

• Grammars
– Regular expressions
  “|” separates alternatives
  “*” indicates that the previous item may be repeated zero or more times
  “(” and “)” are grouping parentheses
  ε is the empty string
    a special string of length 0
  Algebraic properties
  Common extensions
– Grammar transformations
  Left factorization
  Elimination of left recursion
  Substitution of non-terminal symbols
Chart 146
Midterm Review: Chapter 4

• Structure of a compiler
– Source code
– Lexical analyzer
– Parser & semantic analyzer
– Intermediate code generation
– Optimization
– Assembly code generation
– Assembly code
Chart 147

Midterm Review: Chapter 4

• Scanning (lexical analysis)
– What does it do?
  Handles keywords (reserved words)
  Removes white space (tabs, spaces, new lines)
  Removes comments
  Performs look ahead
  Error handling
– Issues
  Simpler design
  Improved compiler efficiency
  Enhanced compiler portability
Chart 148

Midterm Review: Chapter 4

• Parsing
– Given an unambiguous, context-free grammar
  Recognition of an input string – is it a sentence of the grammar?
  Parsing an input string – determines its phrase structure
– Why is unambiguity important?
– Advantages of unambiguous, context-free grammars (see Chart 81)
– How do you know the syntax of a program is legal?
  A legal program can be derived from the start symbol of the grammar
Chart 149
Midterm Review: Chapter 4

• Parsing
– Rightmost derivation (replace the rightmost non-terminal in each step) and leftmost derivation (replace the leftmost non-terminal in each step)
– Bottom-up parsing (reconstructs the syntax tree from the terminal nodes up toward the root node) and top-down parsing (reconstructs the syntax tree from the root node down towards the terminal nodes)
– Predictive parsers
  LL(1)
  Recursive descent
Chart 150

Midterm Review: Chapter 4

• Parsing
– Converting EBNF production rules to parsing methods
– Error repair
Chart 151

Chapter 5: Contextual Analysis

• Identification
– Monolithic Block Structure
– Flat Block Structure
– Nested Block Structure
– Attributes
– Standard Environment
• Type Checking
• A Contextual Analysis Algorithm
• Case Study: Contextual Analysis in the Triangle Compiler
Chart 152
Contextual Analysis

• Given a parsed program, the purpose of contextual analysis is to check that the program conforms to the source language’s contextual constraints.
– Scope rules: rules governing declarations and applied occurrences of identifiers
– Type rules: rules that allow us to infer the types of expressions, and to decide whether each expression has a valid type
• Analysis of the program to determine correctness with respect to the language definition (beyond structure)
Chart 153

Contextual Analysis

• Contextual analysis consists of two sub-phases:
– Identification: applying the source language’s scope rules to relate each applied occurrence of an identifier to its declaration (if any)
– Type checking: applying the source language’s type rules to infer the type of each expression, and compare that type with the expected type
Chart 154

Structure of a Compiler

[Diagram, Chart 155: as on Chart 79, with the Semantic Analyzer expanded into its two sub-phases, Identification and Type checking]
Chart 155
Identification

• Relate each applied occurrence of an identifier in the source program to the corresponding declaration
– The program is ill-formed if there is no corresponding declaration – generate an error
• Identification could cause compiler efficiency problems
• It is inefficient to use the AST directly for identification
Chart 156

Identification Table

• Also known as a symbol table
• Associates identifiers with their attributes
• Basic operations
– Make the identification table empty
– Add an entry associating a given identifier with a given attribute
– Retrieve the attribute (if any) associated with a given identifier
• Attribute
– Consists of information relevant to contextual analysis
– Obtained from the identifier’s declaration
Chart 157
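A minimal sketch of the three basic operations, using a HashMap and suitable for monolithic block structure; Attribute is just a type parameter here, and the class is illustrative rather than the Triangle checker's actual IdentificationTable:

// Flat identification table: identifier -> attribute.
import java.util.HashMap;
import java.util.Map;

public class SimpleIdentificationTable<Attribute> {
    private final Map<String, Attribute> entries = new HashMap<>();

    // Make the table empty.
    public void clear() { entries.clear(); }

    // Add an entry; returns false if the identifier was already declared.
    public boolean enter(String identifier, Attribute attribute) {
        return entries.putIfAbsent(identifier, attribute) == null;
    }

    // Retrieve the attribute associated with an identifier, or null if undeclared.
    public Attribute retrieve(String identifier) {
        return entries.get(identifier);
    }
}

In a checker, enter would be called at each declaration and retrieve at each applied occurrence of an identifier.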
Identification Table

• Each declaration in a program has a defined scope
– The portion of the program over which the declaration takes effect
• Block: any program phrase that delimits the scope of the declarations within it
• Example Triangle block command
– let D in C
  The scope of each declaration in D extends over the subcommand C
Chart 158

Identification Table: Structure/Implementation

• Maintain scope
– An identifier should be found in the table only when valid
– If an identifier is defined in multiple scopes, then a lookup in the table must provide the appropriate meaning for the use
• Efficiency
– How fast is lookup?
– How fast is it to enter/exit a scope?
– What is the overall table size?
Chart 159

Identification Table: Structure/Implementation

• Different implementations
– Organized for efficient retrieval
– Binary search tree
– Hash table
Chart 160
Identification Table: Functionality

• A mapping of identifiers to their meanings
• Information
– Name
– Type
– Location
• Operations
– Create
– Insert
– Lookup
– Delete
– Update entry
– Entering a new scope
– Leaving a scope
Chart 161

Block Structures

• Monolithic block structure
– Basic and Cobol
• Flat block structure
– Fortran
• Nested block structure
– Pascal, Ada, C, and Java
Chart 162
Monolithic Block Structure



The only block is the entire program
All declarations are global
Simple rules
–
–
No identifier may be declared more than once
For every applied occurrence of an identifier I, there must be a
corresponding declaration of I


The identification table should contain entries for all
declarations in the source program
–
–
Chart 163
No identifier may be used unless declared
At most, one entry for each identifier
The table contains an identifier I and the associated attribute A
Monolithic Block Structure
Program
(1) integer b = 10
(2) integer n
(3) char c
begin
…
n=n*b
…
Write c
…
end
Chart 164
Identification   Attribute
b                (1)
n                (2)
c                (3)
• Create new table: create command
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command
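The monolithic example can be replayed with a single table for the whole program; the following Java sketch is illustrative only, and abbreviates the attributes to the declaration labels (1)–(3).

import java.util.HashMap;
import java.util.Map;

// Replaying the monolithic example: one global table, never any scopes.
class MonolithicDemo {
    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();   // create command
        table.put("b", "(1)");                         // insert: integer b = 10
        table.put("n", "(2)");                         // insert: integer n
        table.put("c", "(3)");                         // insert: char c
        // applied occurrences: n := n * b and Write c
        System.out.println(table.get("n"));            // lookup: prints (2)
        System.out.println(table.get("b"));            // lookup: prints (1)
        System.out.println(table.get("c"));            // lookup: prints (3)
    }
}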
Flat Block Structure
 Program partitioned into several disjoint blocks
 Two scope levels
– Some declarations are local in scope
 Identifiers restricted to a particular block
 Same identifier may be declared locally in several different blocks
– Other declarations are global in scope
 Identifiers allowed anywhere in the program – the program as a whole is a block
 But the same identifier may also be declared locally
 Less simple rules
– No globally declared identifier may be redeclared globally
– No locally declared identifier may be redeclared in the same block
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
 Either a global declaration of I or a declaration of I local to B
 Minor complication is to distinguish global and local declaration entries
Chart 165
Flat Block Structure
Program:
   (1) procedure Q
   (2)    real r
   (3)    real pi = 3.14
          begin … end
   (4) procedure R
   (5)    integer c
          begin … end
       program
   (6)    integer i
   (7)    boolean b
   (8)    char c
          begin … call R … end

Identification table while compiling Q's body:
   Level    Identification   Attribute
   global   Q                (1)
   local    r                (2)
   local    pi               (3)

Identification table while compiling R's body:
   Level    Identification   Attribute
   global   Q                (1)
   global   R                (4)
   local    c                (5)

Identification table while compiling the main program:
   Level    Identification   Attribute
   global   Q                (1)
   global   R                (4)
   local    i                (6)
   local    b                (7)
   local    c                (8)

Identification table containing only globals (after leaving a local scope):
   Level    Identification   Attribute
   global   Q                (1)
   global   R                (4)

• Create new table: create command
• At start of a block: enter new scope command
• At end of a block: leave scope command (delete command)
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table: lookup command
Chart 166
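One possible way to realize a flat block structure is a table with exactly two levels, where leaving a block discards only the local entries; the following Java sketch is illustrative, not the Triangle compiler's implementation.

import java.util.HashMap;
import java.util.Map;

// Two-level identification table for a flat block structure.
class FlatIdentificationTable<A> {
    private final Map<String, A> globals = new HashMap<>();
    private final Map<String, A> locals  = new HashMap<>();
    private boolean inLocalScope = false;

    void enterScope() { inLocalScope = true; }                   // at start of a block
    void leaveScope() { locals.clear(); inLocalScope = false; }  // at end of a block

    // At a declaration: reject a redeclaration at the same level.
    boolean insert(String id, A attribute) {
        Map<String, A> level = inLocalScope ? locals : globals;
        if (level.containsKey(id)) return false;                 // error: redeclared
        level.put(id, attribute);
        return true;
    }

    // At an applied occurrence: a local declaration hides a global one.
    A lookup(String id) {
        A attr = locals.get(id);
        return (attr != null) ? attr : globals.get(id);
    }
}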
Nested Block Structure
 Blocks may be nested one within another
– Every inner block is completely enclosed by another block
 Many scope levels
– Declarations in the outermost block are global in scope
 The outermost block is at scope level 1
– Declarations inside an inner block are local to that block
 The next-to-outermost block is at scope level 2
 In general, a block enclosed by a level-n block is at scope level n+1
Chart 167
Nested Block Structure
 More complex rules
– No identifier may be declared more than once in the same block
 But the same identifier may be declared in different blocks, even if they are nested
– For every applied occurrence of an identifier I in a block B, there must be a corresponding declaration of I
 In B itself, or in the block B’ immediately enclosing B, or in B’’ immediately enclosing B’, etc.
 I.e., in the smallest enclosing block that contains any declaration of I
Chart 168
Nested Block Structure
Let
(1) var a: Integer;
(2) var b: Boolean
In
begin
…;
let
(3) var b: Integer;
(4) var c: Boolean
In
begin
…;
let
(5) var d: Integer;
In
…;
…
end;
…
let
(6) var d: Boolean;
(7) Var e: Integer
in
…;
…
end
Chart 169

Identification table after elaborating declarations (1) and (2) (level 1):
   Level   Identification   Attribute
   1       a                (1)
   1       b                (2)

Inside the first inner let, after declarations (3) and (4) (level 2):
   Level   Identification   Attribute
   1       a                (1)
   1       b                (2)
   2       b                (3)
   2       c                (4)

Inside the innermost let, after declaration (5) (level 3):
   Level   Identification   Attribute
   1       a                (1)
   1       b                (2)
   2       b                (3)
   2       c                (4)
   3       d                (5)

Inside the second inner let, after declarations (6) and (7) (level 2):
   Level   Identification   Attribute
   1       a                (1)
   1       b                (2)
   2       d                (6)
   2       e                (7)

• Create new table: create command
• At start of a block: enter new scope command (the level number is determined by the number of calls to enter new scope)
• At end of a block: leave scope command (delete command)
• At declaration for identifier I, make table entry: insert command
• At applied occurrence of identifier I, retrieve information from table using the highest level for I: lookup command
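For nested block structure, one common implementation keeps a stack of scopes, pushing a scope on enter and popping it on leave, and searching from the innermost scope outwards on lookup; the following Java sketch is illustrative only.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Identification table for nested block structure: a stack of scopes.
class NestedIdentificationTable<A> {
    private final Deque<Map<String, A>> scopes = new ArrayDeque<>();

    NestedIdentificationTable() { openScope(); }           // level 1 = outermost block

    int level() { return scopes.size(); }                  // number of enter-scope calls
    void openScope() { scopes.push(new HashMap<>()); }     // at start of a block
    void closeScope() { scopes.pop(); }                    // at end of a block: drop its entries

    // At a declaration: reject a duplicate in the current (innermost) block only.
    boolean insert(String id, A attribute) {
        if (scopes.peek().containsKey(id)) return false;   // redeclared in same block
        scopes.peek().put(id, attribute);
        return true;
    }

    // At an applied occurrence: search from the innermost scope outwards,
    // i.e. the entry with the highest level wins.
    A lookup(String id) {
        for (Map<String, A> scope : scopes) {              // ArrayDeque iterates innermost first
            A attr = scope.get(id);
            if (attr != null) return attr;
        }
        return null;                                       // error: undeclared
    }
}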
Attributes
Examples
 Kind
– constant
– variable
– procedure
– function
– type
 Type
– boolean
– character
– integer
– record
– array
Chart 170
Attributes
 Information to be extracted from a declaration
– Constant, variable, procedure, function, type
– A procedure or function declaration includes a list of formal parameters, each of which may be a constant, variable, procedural, or functional parameter
– The language provides whole families of record and array types
 How to manage attribute information
– Extract type information from declarations and store it in the information table
 Could be complex for a realistic programming language
 Could require tedious programming
– Use the AST
 Pointers in the information table point to the location in the AST of that identifier's declaration
Chart 171
Attributes
[Diagram: AST of a program whose outer LetCommand declares (1) var a: int and (2) var b: bool, and whose inner LetCommand declares (6) var d: bool and (7) var e: int. The identification table holds level-1 entries for a and b and level-2 entries for d and e, with each attribute field pointing to the corresponding VarDeclaration node in the AST.]
Chart 172
Standard Environment
 Predefined constants, variables, types, procedures, and functions
 These are loaded into the identification table
 Scope rules for the standard environment
– A scope enclosing the entire program
 Level 0
– Or the same scope level as global declarations
 Example is C
Chart 173
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (Identification and Type checking, using the Symbol Table) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code]
Chart 174
Type Checking
 The second task of the contextual analyzer is to ensure that the source program contains no type errors
 Once an applied occurrence of an identifier has been identified, the contextual analyzer checks that the identifier is used in a way consistent with its declaration
Chart 175
Type Checking
 A statically typed language lets the compiler detect any type errors without actually running the program
– For every expression E in the language, the compiler can infer either that E has some type T or that E is ill-typed
 If E does have type T, then E will always yield a value of type T
 If a value of type T’ is expected, then the compiler checks that T’ is equivalent to T
Chart 176
Type Checking

Infers the type of each expression bottom-up
–
–
–
–
Starting with literals and identifiers, and working up through larger
and larger subexpressions
Literal: The type of a literal is immediately known
Identifier: The type of an applied occurrence of identifier I is
obtained from the corresponding declaration of I
Unary operator application:




–
Binary operator application:




Chart 177
Consider “O E” where O is a unary operator of type T1 → T2
Type checker ensures that E’s type is equivalent to T1
Infers that the type of “O E” is T2
Otherwise a type error

Consider “E1 O E2” where O is a binary operator of type T1 × T2 → T3
Type checker ensures that E1’s type is equivalent to T1 and E2’s type is equivalent to T2
Infers that “E1 O E2” is of type T3
Otherwise a type error
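The binary-operator rule can be expressed directly in code; the sketch below is illustrative (types are modelled as plain strings) and does not follow the Triangle compiler's type representation.

// Sketch of the type rule for "E1 O E2" where O : T1 x T2 -> T3.
class BinaryOpRule {
    final String operator;
    final String operandType1, operandType2, resultType;   // e.g. "<" : int x int -> bool

    BinaryOpRule(String op, String t1, String t2, String t3) {
        operator = op; operandType1 = t1; operandType2 = t2; resultType = t3;
    }

    // Returns the inferred type of the whole expression, or null on a type error.
    String apply(String typeOfE1, String typeOfE2) {
        if (!typeOfE1.equals(operandType1)) return null;    // E1's type must be equivalent to T1
        if (!typeOfE2.equals(operandType2)) return null;    // E2's type must be equivalent to T2
        return resultType;                                  // then "E1 O E2" has type T3
    }
}
// Example: new BinaryOpRule("<", "int", "int", "bool").apply("int", "int") yields "bool".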
Type Checking


Chart 178
Type of a nontrivial expression is inferred from
the types of its sub-expressions, using the
appropriate type rules
Must be able to test if two given types T and T’
are equivalent
Type Checking – Constant or
Variable Identifier
[Diagram: in the ConstDeclaration for x, the declared expression has inferred type :T; after identification, a SimpleVname applied occurrence of x is decorated with that type :T.]
Chart 179
Type Checking – Variable
Declaration
[Diagram: the VarDeclaration binds identifier x to type T; a SimpleVname applied occurrence of x is then decorated with type :T.]
Chart 180
Type Checking – Binary
Operator
[Diagram: in the BinaryExpression “E1 < E2”, both operand subtrees are decorated :int; since < is of type Int × Int → Bool, the BinaryExpression node is decorated :bool.]
Chart 181
Type Checking

Chart 182
Each applied occurrence of an identifier must be
identified before type checking can proceed
Chapter 6: Run-time Organization

Chart 183
Marshal the resources of the target machine
(instructions, storage, and system software) in
order to implement the source language
Chapter 6: Run-time Organization
 Data Representation
– How should we represent the values of each source-language type in the target machine?
 Expression Evaluation
– How should we organize the evaluation of expressions, taking care of intermediate results?
 Static Storage Allocation, Stack Storage Allocation, Heap Storage Allocation
– How should we organize storage for variables, taking into account the different lifetimes of global, local, and heap variables?
 Routines
– How should we implement procedures, functions, and parameters, in terms of low-level routines?
 Run-time Organization for Object-oriented Languages
– How should we represent objects and methods?
 Case Study: The Abstract Machine TAM
Chart 184
Data Representation
How should we represent the values of each source-language type in the target machine?
High-Level Data Types: truth values, integers, characters, records, arrays, and operations over these types
Machine Data Types: bits, bytes, words, double-words, and low-level arithmetic and logical operations
Need to bridge the semantic gap between high-level types and machine-level types
Chart 185
Data Representation – Fundamental Principles
 Non-confusion
– Different values of a given type should have different representations
– If two different values are confused, i.e., have the same representation, then comparison of these values will incorrectly treat the values as equal
– Example: approximate representation of real numbers
 Real numbers that are slightly different mathematically might have the same approximate representation
 Difficult to avoid – need to take care during compiler design
– Must avoid confusion in the representations of discrete types such as truth values, characters, and integers
– For statically typed languages we need only be concerned with values of the same type
 00…00₂ may represent false, the integer 0, or the real number 0.0
 Compile-time type checks will distinguish the values of different types
Chart 186
Data Representation – Fundamental Principles
 Uniqueness
– Each value should always have the same representation
– Example of non-uniqueness
 Ones-complement representation of integers, in which zero is represented both by 00…00₂ and 11…11₂ (+0 and −0)
 A simple bit-string comparison would incorrectly treat these values as unequal
 A more specialized integer comparison must be used
 The alternative twos-complement representation gives us unique representations of integers
Chart 187
Data Representation – Pragmatic
Issues

Constant-size representation
–
–
–
Chart 188
The representations of all values of a given type
should occupy the same amount of space
Make possible for compiler to plan the allocation of
storage
Knowing the type of variable but not the actual value,
the compiler will know exactly how much storage
space the variable will occupy
Data Representation – Pragmatic
Issues

Direct representation vs. indirect representation
–
–
Should the values of a given type be represented
directly, or indirectly through pointers?
Direct representation

–
Just the binary representation of the value consisting of one or
more bits, bytes, words
Indirect representation


A handle that points to the storage area which has the binary
representation of the value
Essential for types whose values vary greatly in size
–
Chart 189
List or dynamic array
Direct representation vs. indirect
representation
[Diagram: with direct representation, x’s value is stored in place; with indirect representation, x and y hold handles pointing to separately stored values, so a value of the same type as x but requiring more space can still be represented behind a fixed-size handle.]
Chart 190
Notation

#T: cardinality of type T
– Number of distinct values of type T
– #[[Boolean]] = 2
 size T: amount of space (in bits, bytes, or words) occupied by each value of type T
– For indirect representation, only the handle is counted
 For direct representation of type T
– size T ≥ log2(#T), or equivalently 2^(size T) ≥ #T, where size T is measured in bits
– In n bits we can represent at most 2^n distinct values if we are to avoid confusion (the non-confusion requirement)
Chart 191
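A small illustrative Java helper that computes this bound, i.e. the least number of bits k with 2^k ≥ #T (names are assumptions, not from the text):

// Smallest number of bits needed to represent a type of the given cardinality
// without confusion: the least k with 2^k >= cardinality.
class MinSize {
    static int bitsFor(long cardinality) {
        int bits = 0;
        long distinct = 1;                  // 2^0
        while (distinct < cardinality) {
            bits++;
            distinct *= 2;                  // 2^bits
        }
        return bits;
    }
    public static void main(String[] args) {
        System.out.println(bitsFor(2));     // Boolean: 1 bit
        System.out.println(bitsFor(256));   // 8-bit character set: 8 bits
    }
}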
Primitive Types


Cannot be decomposed into simpler values
Most programming languages provide these
primitive types
–
–

Chart 192
Boolean, Char, Integer
Also provide elementary logical and arithmetic
operations
Machines typically support the above primitive
types, so choice of representation is
straightforward
Primitive Types Representation

Boolean
– true and false
– Since #[[Boolean]] = 2, size[[Boolean]] ≥ 1 bit
– Can represent a Boolean with one bit, one byte, or one word
 For a single bit: 0 for false and 1 for true
 For a byte or word: 00…00₂ for false and either 00…01₂ or 11…11₂ for true
 Negation, conjunction, disjunction → machine NOT, AND, OR
Chart 193
Primitive Types Representation

Char
– Source language can specify the character set
 Ada: ISO Latin-1 character set (2^8 distinct characters)
 Java: Unicode character set (2^16 distinct characters)
– Most do not
 Allows compiler writers to choose the machine’s native character set (2^7 or 2^8 distinct characters)
– ISO defines the character representation of ‘A’ to be 01000001₂
– Can represent a character by one byte or one word
Chart 194
Primitive Types Representation

Integer
– Denotes an implementation-defined bounded range of integers
 maxint is defined by the individual language processor
– Binary representation is determined by the target machine’s arithmetic unit and almost always occupies one word
 Can implement the language’s integer operations with the machine’s integer operations
– Pascal and Triangle
 −maxint, …, −1, 0, +1, …, +maxint
 #[[Integer]] = 2 × maxint + 1
 2^size[[Integer]] ≥ 2 × maxint + 1
 For a word size of w bits, size[[Integer]] = w and maxint = 2^(w−1) − 1
– Java
 int denotes −2^31, …, −1, 0, +1, …, +2^31 − 1
 #[[int]] = 2^32
Chart 195
Record Type

 Consists of several fields, each of which has an identifier
 The fundamental operation on records is field selection
– Use a field identifier to access the corresponding field
 Simple representation
– All records of a particular type have fields with the same identifiers and types
– Juxtapose the fields so that they occupy consecutive positions in storage
– Allows us to predict the total size of each record and the position of each field relative to the base of the record
Chart 196
Record Type

 Consider the following
   type T = record I1: T1, …, In: Tn end;
   var r: T
– size T = size T1 + … + size Tn
– If size T1, …, and size Tn are all constant, then size T is also constant
– Implementation of field selection:
   address[[r.Ii]] = address r + (size T1 + … + size Ti−1)
[Diagram: the fields r.I1 (value of type T1), r.I2 (value of type T2), …, r.In (value of type Tn) occupy consecutive positions in storage.]
– Some machines have alignment restrictions, which force unused space to be left between record fields; these equations then cannot be used directly
Chart 197
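The offset formula can be computed mechanically; the following sketch (illustrative names, no alignment handling) derives each field's offset and, as a by-product, the record's total size.

// Field addresses for a record laid out by juxtaposition:
// address(r.Ii) = address(r) + size(T1) + ... + size(Ti-1).
class RecordLayout {
    static int[] fieldOffsets(int[] fieldSizes) {
        int[] offsets = new int[fieldSizes.length];
        int offset = 0;
        for (int i = 0; i < fieldSizes.length; i++) {
            offsets[i] = offset;            // offset of field i from the record's base
            offset += fieldSizes[i];        // the next field starts right after this one
        }
        return offsets;                     // the final value of offset is the record's size
    }
    public static void main(String[] args) {
        // e.g. record Date: y, m, d, each one word -> offsets 0, 1, 2
        int[] offs = fieldOffsets(new int[] {1, 1, 1});
        System.out.println(offs[0] + " " + offs[1] + " " + offs[2]);
    }
}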
Disjoint Unions


Tag and a variant part
Value of tag determines type of variant part
–
T = T1 + … + Tn

In each value of type T, the variant part is a value chosen from one of
the types T1, …, or Tn; the tag indicates which one
Size T = size Ttag + max(sizeT1, …, size Tn)
– Address[[u.Itag]] = address u + 0
– Address[[u.Ii]] = address u + size Ttag
– Will have wasted space
[Diagram: a value u of type T is laid out as the tag field u.Itag (of size Ttag) followed by space of size max(size T1, …, size Tn) for the variant part; when the variant in use (u.I1, u.I2, …, or u.In) is smaller than the maximum, the remainder is wasted space.]
Chart 198
Static Arrays

 Consists of several elements, all of the same type
– Bounded range of indices – usually integers
– Each index has exactly one element
– The fundamental operation on arrays is indexing
 Access an individual element by giving its index
 The index is evaluated at run-time
 Static array
– Index bounds are known at compile-time
– Direct representation is to juxtapose the array elements, in order of increasing indices
– Indexing is implemented by a run-time address computation
Chart 199
Static Arrays
(lower index bound is 0)

Consider the following example
Type T = array n of Telem;
Var a: T
– size T = n × size Telem
– The number of elements n and size Telem are constant, so size T is also constant
– address[[a[i]]] = address a + (i × size Telem)
– Since i is known only at run-time, array indexing implies a run-time address computation
[Diagram: elements a[0], a[1], a[2], …, a[n−1], each a value of type Telem, stored consecutively.]
Chart 200
Static Arrays
(programmer chooses lower and upper array bounds)

Consider the following example
Type T = array [l..u] of Telem;
Var a: T
– size T = (u − l + 1) × size Telem
– The number of elements (u − l + 1) and size Telem are constant, so size T is also constant
– address[[a[i]]] = address a + ((i − l) × size Telem) = address a − (l × size Telem) + (i × size Telem)
– address[[a[0]]] = address a − (l × size Telem)
– address[[a[i]]] = address[[a[0]]] + (i × size Telem)
– Since i is known only at run-time, array indexing implies a run-time address computation
– An index check must ensure that l ≤ i ≤ u
[Diagram: elements a[l], a[l+1], a[l+2], …, a[u], each a value of type Telem, stored consecutively.]
Chart 201
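A small illustrative sketch of the address computation and index check for a static array with programmer-chosen bounds l..u (addresses are modelled as plain integers; names are assumptions):

// Address of a[i] for a static array with bounds l..u:
// address(a[i]) = address(a) + (i - l) * size(Telem), with index check l <= i <= u.
class StaticArrayAddress {
    static int elementAddress(int baseAddress, int l, int u, int elemSize, int i) {
        if (i < l || i > u) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + l + ".." + u);
        }
        return baseAddress + (i - l) * elemSize;
    }
    public static void main(String[] args) {
        // array [3..7] of 1-word elements, stored at address 100: a[5] is at 102
        System.out.println(elementAddress(100, 3, 7, 1, 5));
    }
}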
Dynamic Arrays

An array whose index bounds are not known until run-time
–
–
–
Different dynamic arrays of the same type may have
different index bounds, and therefore different numbers
of elements
Need to satisfy constant-size requirement
Create array descriptor or handle



Chart 202
Pointer to the array’s elements
Index bounds
Handle has constant size
Dynamic Arrays

Ada example
type T is array (Integer range <>) of Telem;
a: T(E1 .. E2);
– size T = address:size + 2 × size[[Integer]]
 address:size is the amount of space required to store an address – usually one word
 Satisfies the constant-size requirement
– Declaration of the array variable a:
 E1 and E2 are evaluated to yield a’s index bounds (say l and u)
 Space is allocated for (u − l + 1) elements, juxtaposed and separate from a’s handle
 address[[a(0)]] = address[[a(l)]] − (l × size Telem)
 The values of address[[a(0)]], l, and u are stored in a’s handle
– The element with index i is then addressed as follows:
 address[[a(i)]] = address[[a(0)]] + (i × size Telem) = content(address[[a]]) + (i × size Telem)
 Index check: l ≤ i ≤ u, where l = content(address[[a]] + address:size) and u = content(address[[a]] + address:size + size[[Integer]])
Chart 203
Dynamic Arrays
[Diagram: a’s handle holds three words: origin (pointing at the notional a[0]), lower bound l, and upper bound u; the elements a[l], a[l+1], a[l+2], …, a[u], of type Telem, are stored separately from the handle.]
Chart 204
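An illustrative sketch of such a handle holding origin, lower bound, and upper bound; the field and method names are assumptions for illustration, and addresses are modelled as plain integers.

// Handle for a dynamic array: constant size, regardless of the element count.
class DynamicArrayHandle {
    final int origin;       // address of the notional a[0] = address(a[l]) - l * elemSize
    final int lower, upper; // index bounds l and u, known only at run-time
    final int elemSize;

    DynamicArrayHandle(int elementsAddress, int lower, int upper, int elemSize) {
        this.origin = elementsAddress - lower * elemSize;
        this.lower = lower;
        this.upper = upper;
        this.elemSize = elemSize;
    }

    // address(a(i)) = origin + i * elemSize, after the index check l <= i <= u
    int addressOf(int i) {
        if (i < lower || i > upper) {
            throw new IndexOutOfBoundsException("index " + i + " not in " + lower + ".." + upper);
        }
        return origin + i * elemSize;
    }
}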
Status

Chapter 6: Run-time Organization
– Data Representations
 Primitive types
 Record types
 Disjoint unions
 Static arrays
 Dynamic arrays
 Recursive types
– Expression Evaluation
 Register machine
 Stack machine
– Static Storage Allocation
 Global variables
– Stack Storage Allocation
 Local variables
Chart 205
Recursive Types

Defined in terms of itself
–
–
Values of recursive type T have components that are
themselves of type T
Examples


Chart 206
List with tail being itself a list
Tree with the sub-trees themselves being trees
Recursive Types

Consider the Pascal declaration
type IntList = ^IntNode;
IntNode = record
head: Integer;
tail: IntList
end;
– var primes: IntList
– size[[IntList]] = address:size (usually 1 word)
– Always use pointers (handles) to represent values of a recursive type
[Diagram: primes holds a handle pointing to the list’s nodes in storage.]
Chart 207
Expression Evaluation
Register Machine



How should we organize the evaluation of
expressions
The problem is the need to keep intermediate
results somewhere
Consider the expression
–
–
a * b + (1 – (c * 2))
Will have intermediate results for
a * b, c * 2, and 1 – (c * 2)
For a register based machine (non-stack machine)


Chart 208
Use the registers to store intermediate results
Problem arises when there are not enough registers for all
intermediate results
Expression Evaluation
Example a * b + (1 – (c * 2))
LOAD  R1 a
MULT  R1 b
LOAD  R2 #1
LOAD  R3 c
MULT  R3 #2
SUB   R2 R3
ADD   R1 R2
a, b, c are memory addresses for the values of a, b, c
Chart 209
Expression Evaluation
Stack Machine
The machine provides a stack for holding intermediate
results
 For the expression a * b + (1 – (c * 2))
LOAD a
LOAD b
MULT
LOADL 1
LOAD c
LOADL 2
MULT
SUB
ADD

Chart 210
Expression Evaluation
Stack Machine Example
Expression: a * b + (1 – (c * 2)); stack shown bottom to top, remaining space unused.
(1) after LOAD a:   [ a ]
(2) after LOAD b:   [ a, b ]
(3) after MULT:     [ a*b ]
(4) after LOADL 1:  [ a*b, 1 ]
(5) after LOAD c:   [ a*b, 1, c ]
(6) after LOADL 2:  [ a*b, 1, c, 2 ]
(7) after MULT:     [ a*b, 1, c*2 ]
(8) after SUB:      [ a*b, 1–(c*2) ]
(9) after ADD:      [ (a*b)+(1–(c*2)) ]
Chart 211
Operands of different types (and therefore different sizes) can
be evaluated in just the same way. E.g., AND, OR, function,
etc. Each operation takes values from top of stack and places
results onto top of stack
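To make the evaluation order tangible, here is a toy Java evaluator for the same expression; it is not TAM, just an illustration of how each operation pops its operands from the stack top and pushes its result back.

import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack evaluator for a * b + (1 - (c * 2)), with a = 3, b = 4, c = 5.
class StackEval {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void load(int value)  { stack.push(value); }           // LOAD / LOADL: push a value
    void mult() { int y = stack.pop(), x = stack.pop(); stack.push(x * y); }
    void sub()  { int y = stack.pop(), x = stack.pop(); stack.push(x - y); }
    void add()  { int y = stack.pop(), x = stack.pop(); stack.push(x + y); }

    public static void main(String[] args) {
        int a = 3, b = 4, c = 5;
        StackEval m = new StackEval();
        m.load(a); m.load(b); m.mult();     // a * b
        m.load(1);
        m.load(c); m.load(2); m.mult();     // c * 2
        m.sub();                            // 1 - (c * 2)
        m.add();                            // (a * b) + (1 - (c * 2))
        System.out.println(m.stack.peek()); // prints 3
    }
}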
Static Storage Allocation
Global Variables



Each variable in source program requires enough storage
to contain any value that might be assigned to it
As a consequence of constant-size representation, the
compiler knows how much storage needs to be allocated
to variable, based on type of variable (size T)
Global variables
–
–
Chart 212
Variables that exist and take up storage throughout the program’s
run-time.
Static storage allocation: Compiler locates these variables at
some fixed positions in storage (decides each global variable’s
address relative to the base of the storage region in which global
variables are located)
Static Storage Allocation
Global Variables: Example
let
type Date = record
y: Integer,
m: Integer;
d: Integer
end;
var a: array 3 of Integer;
var b: Boolean;
var c: Char;
var t: Date
in
...
Chart 213
[Storage layout: a[0], a[1], a[2] (array a), then b, c, then t.y, t.m, t.d (record t), followed by unused space.]
Stack Storage Allocation
Local Variables



A local variable v is one that is declared inside a
procedure (or function).
Lifetime of v: the variable v exists (occupies
storage) only during an activation of that
procedure
If same procedure is activated several times
–
–
Chart 214
v will have several lifetimes
Each activation creates a distinct variable
Stack Storage Allocation
Local Variables: An Example
let
var a: array 3 of Integer;
var b: Boolean;
var c: Char;
proc Y () ~
let
var d: Integer;
var e: record c: Char, n:
Integer end
in
...
proc Z () ~
let
var f: Integer
in
begin …; Y(); … end
in
begin …; Y(); …; Z(); … end
Chart 215
Stack Storage Allocation
Local Variables: An Example
[Timeline: the program calls Y, returns from Y, calls Z, Z calls Y, returns from Y, returns from Z, and then the program stops. Global variables live for the whole run; each activation of Y and Z gives its local variables a separate lifetime, and these lifetimes are properly nested.]
Observations:
• Global variables are the only ones that exist throughout the program’s run-time
• Use static allocation for global variables
• Lifetimes of local variables are properly nested
• Use a stack for local variables
Chart 216
Stack Storage Allocation
Stack Frames: An Example
[Diagram: stack snapshots, with SB marking the base of the globals, LB the base of the topmost frame, and ST the stack top:
(1) after the program starts: globals only
(2) after the program calls Y: globals, frame for Y
(3) after return from Y: globals only
(4) after the program calls Z: globals, frame for Z
(5) after Z calls Y: globals, frame for Z, frame for Y (dynamic links chain each frame to the one below)
(6) after return from Y: globals, frame for Z
(7) after return from Z: globals only]
Chart 217
Registers
SB: Stack Base – Location of global
variables
LB: Local Base – Local variables of
currently running procedure
ST: Stack Top – Very top of stack
Stack Storage Allocation

The stack varies in size
–
–
–

Registers (find address of variables relative to these
registers)
–
–
–
Chart 218
For example, the frames for each of Y’s activation are at two
different locations
The position of a frame within a stack cannot be predicted in
advance
Need registers dedicated to point to the frames
SB: stack base – is fixed, pointing to the base of the stack. This is
where the global variables are located.
LB: local base – points to the base of the topmost frame in the
stack. This frame always contains the variables of the currently
running procedure.
ST: stack top – points to the very top of the stack. ST keeps track
of the frame boundary as expressions are evaluated and the top
of the stack expands and contracts.
Stack Storage Allocation

Frame contents
–
–
Space for local variables
Link data


Return address – code address to which control will be
returned at the end of the procedure activation. It is the
address of the instruction following the call instruction that
activated the procedure in the first place.
Dynamic link – the pointer to the base of the underlying frame in the stack. It is the old content of LB and will be restored at the end of the procedure activation.
[Diagram: each frame consists of link data (dynamic link, return address) followed by local data.]
Since there are two words of link data, local variable addresses are offset by 2.
This only considers access to local or global variables, not variables of enclosing (nested) routines.
Chart 219
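A small illustrative calculation of run-time addresses under this layout; the helper names and the use of plain integers for SB and LB are assumptions made for the sketch.

// Run-time address of a variable, given its displacement within its storage area.
// Globals live at SB; locals of the currently running routine live at LB + 2,
// because two words of link data (dynamic link, return address) precede the local data.
class AddressCalc {
    static final int LINK_DATA_SIZE = 2;

    static int globalAddress(int SB, int displacement) {
        return SB + displacement;
    }
    static int localAddress(int LB, int displacement) {
        return LB + LINK_DATA_SIZE + displacement;
    }
    public static void main(String[] args) {
        // first local variable of the running routine, with LB = 40: address 42
        System.out.println(localAddress(40, 0));
    }
}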
Chapter 7: Code Generation





Chart 220
Code Selection
A Code Generation Algorithm
Constants and Variables
Procedures and Functions
Case Study: Code Generation in the Triangle
Compiler
Code Generation

Translation of the source program to object code
–

Target Machines
–
–
–
Chart 221
Dependent on source language and target machine
Registers, or stack, or both for intermediate results
Instructions with zero, one, two, or three operands, or a
mixture
Single addressing mode, or many
Code Generation
Major Subproblems
 Code selection: deciding which sequence of target machine instructions will be the object code for each phrase
– Write code templates: a general rule specifying the object code for all phrases of a particular form (e.g., all assignment commands)
– But there are usually lots of special cases
 Storage allocation: deciding the storage address of each variable in the source program
– Exact for global variables, but only relative for local variables
 Register allocation: deciding which registers should be used to hold intermediate results during expression evaluation
– Complex expressions may need more registers than are available
 Since code generation for a stack machine is much simpler than for a register machine, we will only generate code for a stack machine
Chart 222
Code Generation
Code Selection




Chart 223
Deciding which sequence of instructions to generate for each case
Code template: specifies the object code to which a phrase is translated, in terms of the object code to which its subphrases are translated
Object code: the sequence of instructions to which the source-language phrase will be translated
Code specification: a collection of code functions and code templates; it must cover the entire source language
Abstract Machine TAM




Suitable for executing programs compiled from a
block-structured language such as Triangle
All evaluation takes place on a stack
Primitive arithmetic, logical, and other operations
are treated uniformly with programmed functions
and procedures
Two separate stores
–
–
Chart 224
Code Store: 32-bit instruction words (read only)
Data Store: 16-bit data words (read-write)
Abstract Machine TAM
Code and Data Stores

Code Store
–
–
Fixed while program is running
Code segment: contains the program’s instructions



CB  points to base of code segment
CT  points to top of code segment
CP  points to next instruction to be executed
–
–
Primitive segment: contains ‘microcode’ for
elementary arithmetic, logical, input-output, heap, and
general-purpose operations


Chart 225
Initialized to CB (programs first instruction is at base of code
segment)
PB  points to base of primitive segment
PT  points to top of primitive segment
Abstract Machine TAM
Code and Data Stores

Data Store
–
–
While program is running segments of data store may
vary
Stack grows from low-address end of Data Store


SB  points to base of the stack
ST  points to top of the stack
–
–
Heap grows from the high-address end of Data Store


HB  points to base of heap
HT  points to top of heap
–
Chart 226
Initialized to SB
Initialized to HB
Abstract Machine TAM
Code and Data Stores
[Diagram: the Code Store holds the code segment (from CB to CT, with CP pointing at the next instruction to execute), unused space, and the primitive segment (from PB to PT). The Data Store holds the stack growing from SB: the global segment at the base, then frames up to ST (LB points to the base of the topmost frame), then unused space, and the heap segment growing from HB down toward HT.]
Chart 227
• Stack and heap can expand
and contract
• Global segment is always
at base of stack
• Stack can contain any
number of other segments
known as frames
containing data local to an
activation of some routine
• LB  points to base of
topmost frame
Code Functions
run P – Run the program P and then halt, starting and finishing with an empty stack
execute C – Execute the command C, possibly updating variables, but neither expanding nor contracting the stack
evaluate E – Evaluate the expression E, pushing its result on to the stack top, but having no other effect
fetch V – Push the value of the constant or variable named V on to the stack top
assign V – Pop a value from the stack top, and store it in the variable named V
elaborate D – Elaborate the declaration D, expanding the stack to make space for any constants and variables declared therein
Chart 228
Abstract Machine TAM
Instructions
LOAD(n) d[r] – Fetch an n-word object from the data address (d + register r), and push it on to the stack
LOADA d[r] – Push the data address (d + register r) on to the stack
LOADI(n) – Pop a data address from the stack, fetch an n-word object from that address, and push it on to the stack
LOADL d – Push the 1-word literal value d on to the stack
STORE(n) d[r] – Pop an n-word object from the stack, and store it at the data address (d + register r)
STOREI(n) – Pop an address from the stack, then pop an n-word object from the stack and store it at that address
CALL(n) d[r] – Call the routine at code address (d + register r), using the address in register n as the static link
CALLI – Pop a closure (static link and code address) from the stack, then call the routine at that code address
RETURN(n) d – Return from the current routine: pop an n-word result from the stack, then pop the topmost frame, then pop d words of arguments, then push the result back on to the stack
PUSH d – Push d words (uninitialized) on to the stack
POP(n) d – Pop an n-word result from the stack, then pop d more words, then push the result back on to the stack
JUMP d[r] – Jump to code address (d + register r)
JUMPI – Pop a code address from the stack, then jump to that address
JUMPIF(n) d[r] – Pop a 1-word value from the stack, then jump to code address (d + register r) if and only if that value equals n
HALT – Stop execution of the program
Chart 229
While Command

execute [[while E do C]] =
        JUMP h
   g:   execute C
   h:   evaluate E
        JUMPIF(1) g
Chart 230
While Command

Chart 231
execute [[while i > 0 do i := i – 2]]
       30: JUMP 35        // JUMP h
   g:  31: LOAD i
       32: LOADL 2
       33: CALL sub
       34: STORE i        – execute [[i := i – 2]] (instructions 31-34)
   h:  35: LOAD i
       36: LOADL 0
       37: CALL gt        – evaluate [[i > 0]] (instructions 35-37)
       38: JUMPIF(1) 31   // JUMPIF(1) g
While Command
public Object visitWhileCommand(WhileCommand ast, Object o) {
  Frame frame = (Frame) o;
  int jumpAddr, loopAddr;

  jumpAddr = nextInstrAddr;
  // remember the address of the JUMP instruction so it can be patched later
  emit(Machine.JUMPop, 0, Machine.CBr, 0);
  // emit "JUMP h" with a dummy destination (h is not yet known)
  loopAddr = nextInstrAddr;
  // this is address g:
  ast.C.visit(this, frame);
  // generates code for C
  patch(jumpAddr, nextInstrAddr);
  // back-patch the JUMP with address h: (the start of the test)
  ast.E.visit(this, frame);
  // generates code for E
  emit(Machine.JUMPIFop, Machine.trueRep, Machine.CBr, loopAddr);
  // emit "JUMPIF(1) g": jump back to g if E evaluated to true
  return null;
}
Chart 232
While Command

Chart 233
execute [[while E do C]] =
g: execute C
evaluate E
JUMPIF(1) g
Repeat Command

Chart 234
execute [[repeat i := i – 2 until i < 0]]
– execute [[i := i – 2]]: instructions 31-34
– evaluate [[i < 0]]: instructions 35-37
   g:  31: LOAD i
       32: LOADL 2
       33: CALL sub
       34: STORE i
       35: LOAD i
       36: LOADL 0
       37: CALL lt
       38: JUMPIF(0) 31   // JUMPIF(0) g
Repeat Command
public Object visitRepeatCommand(RepeatCommand ast, Object o) {
  Frame frame = (Frame) o;
  int loopAddr;

  loopAddr = nextInstrAddr;
  // this is address g: (no forward JUMP is needed, since C is executed first)
  ast.C.visit(this, frame);
  // generates code for C
  ast.E.visit(this, frame);
  // generates code for E
  emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, loopAddr);
  // emit "JUMPIF(0) g": jump back to g if E evaluated to false
  return null;
}
Chart 235
Abstract Machine TAM
Routines
Chart 236
Abstract Machine TAM
Primitive Routines
Chart 237
Extend Mini-Triangle
V1 , V2 := E1 , E2

This is a simultaneous assignment: both E1 and
E2 are to be evaluated, and then their values
assigned to the variables V1 and V2, respectively
evaluate E1
evaluate E2
assign V2
assign V1
Results of E1 and then E2 are pushed on to the stack top; assign V2 pops the result of E2 into variable V2, then assign V1 pops the result of E1 into variable V1.
[Diagram: stack snapshots after each step, with ST marking the stack top.]
Chart 238
Extend Mini-Triangle
C1 , C 2

This is a collateral command: the subcommands
C1 and C2 are to be executed in any order
chosen by the implementer
execute C1
execute C2
Chart 239
Top of stack unchanged
Top of stack unchanged
Extend Mini-Triangle
if E then C

This is a conditional command: if E evaluates to
true, C is executed, otherwise nothing
evaluate E
JUMPIF (0) g
execute C
g:
Chart 240
Results pushed to top of stack
Jump to g if E evaluates to false
Top of stack unchanged
Jump location
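A possible code-generation visitor for this extension, written in the style of visitWhileCommand shown earlier; it assumes the same emit, patch, and nextInstrAddr helpers and a hypothetical IfCommand AST class with fields E and C.

public Object visitIfCommand(IfCommand ast, Object o) {
  Frame frame = (Frame) o;

  ast.E.visit(this, frame);
  // generates code for E, leaving its value on the stack top
  int jumpAddr = nextInstrAddr;
  emit(Machine.JUMPIFop, Machine.falseRep, Machine.CBr, 0);
  // emit "JUMPIF(0) g" with a dummy destination (g is not yet known)
  ast.C.visit(this, frame);
  // generates code for C
  patch(jumpAddr, nextInstrAddr);
  // back-patch g: to the instruction following C
  return null;
}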
Extend Mini-Triangle
repeat C until E

This is a loop command: E is evaluated at the
end of each iteration (after executing C), and the
loop terminates if its value is true
g: execute C
evaluate E
JUMPIF (0) g
Chart 241
Top of stack unchanged
Results pushed to top of stack
Jump to g if E evaluates to false
Extend Mini-Triangle
repeat C1 while E do C2

This is a loop command: E is evaluated in the
middle of each iteration (after executing C1 but
before executing C2), and the loop terminates if
its value is false
JUMP h
g: execute C2
h: execute C1
evaluate E
JUMPIF (1) g
Chart 242
Top of stack unchanged
Top of stack unchanged
Results pushed to top of stack
Jump to g if E evaluates to true
Extend Mini-Triangle
if E1 then E2 else E3

This is a conditional expression: if E1 evaluates to
true, E2 is evaluated, otherwise E3 is evaluated
(E2 and E3 must be of the same type)
evaluate E1
JUMPIF (0) g
evaluate E2
JUMP h
g: evaluate E3
h:
Chart 243
Results pushed to top of stack
Jump to g if E1 evaluates to false
Results pushed to top of stack
Jump location
Results pushed to top of stack
Extend Mini-Triangle
let D in E

This is a block expression: the declaration D is
elaborated, and the resultant bindings are used in the
evaluation of E
elaborate D
evaluate E
POP (n) s
Chart 244
Expand stack for variables or constants
Results pushed to top of stack
Pop the n-word result from the stack, pop s more words, then push the result back on to the stack
If s>0
where s = amount of storage allocated by D
n = size (type of E)
Extend Mini-Triangle
begin C; yield E end

Here the command C is executed (making
side effects), and then E is evaluated
execute C
evaluate E
Chart 245
Top of stack unchanged
Results pushed to top of stack
Extend Mini-Triangle
for I from E1 to E2 do C

Chart 246
First the expressions E1 and E2 are evaluated,
yielding the integer m and n, respectively. Then the
subcommand C is executed repeatedly, with I bound
to integers m, m+1, …, n in successive iterations. If
m > n, C is not executed at all. The scope of I is C,
which may fetch I but may not assign to it.
Extend Mini-Triangle
for I from E1 to E2 do C
evaluate E2
evaluate E1
JUMP h
g: execute C
CALL succ
h: LOAD –1 [ST]
LOAD –3 [ST]
CALL le
JUMPIF(1) g
POP(0) 2
Compute final value
Compute initial value of I
Top of stack unchanged
Increment current value of I
Fetch current value of I
Fetch final value
Test current value <= final value
If so, repeat
Discard current and final values
At g and at h, the current value of I is at the stack top (at address –1[ST]), and the final value is immediately under it (at address –2[ST]).
Chart 247
Chapter 8: Interpretation

Interactive Interpretation
–
–
–
Interactive Interpretations of Machine Code
Interactive Interpretation of Command
Languages
Interactive Interpretation of Simple
Programming Languages
Recursive Interpretation
 Case Study: The TAM Interpreter

Chart 248
Chapter 9: Conclusion
 The Programming Language Life Cycle
– Design
– Specification
– Prototype
– Compilers
 Error Reporting
– Compile-time Error Reporting
– Run-time Error Reporting
 Efficiency
– Compile-time Efficiency
– Run-time Efficiency
Chart 249
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (Identification and Type checking, using the Symbol Table) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code]
Chart 250
Programming Language Lifecycle:
Concepts









Chart 251
Values
Types
Storage
Bindings
Abstractions
Encapsulation
Polymorphism
Exceptions
Concurrency
Concepts
Advanced Concepts
Programming Language Lifecycle:
Simplicity & Regularity

Strive for simplicity and regularity
–
–
Chart 252
Simplicity: support only the concepts essential
to the applications for which language is
intended
Regularity: should combine those concepts in
a systematic way, avoiding restrictions that
might surprise programmers or make their task
more difficult
Design Principles
 Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Operations like assignment and parameter passing should, ideally, apply to all types
 Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– It should be possible to abstract any expression and make it a function
 Correspondence: for each form of declaration there should be a corresponding parameter mechanism
– Take a block with a constant definition and transform it into a procedure (or function) with a constant parameter
Chart 253
Programming Language Lifecycle
Design
Specification
Prototype
Compilers
Manuals,
textbooks
Chart 254
Specification

Precise specification for language’s syntax and semantics
must be written
–
Informal or formal or hybrid
             Informal           Formal
Syntax       English phrases    BNF, EBNF
Semantics    English phrases    Axiomatic method (based on mathematical logic)
Chart 255
Prototypes



Cheap, low quality implementation
Highlights features of language that are hard to
implement
Try out language
–
–
An interpreter might be a good prototype
Interpretive compiler

Chart 256
From source to abstract machine code
Compile-Time Error Reporting
 Rejecting ill-formed programs
 Report the location of each error with some explanation
 Distinguish between the major categories of compile-time errors:
– Syntactic error: missing or unexpected characters or tokens
 Indicate what characters or tokens were expected
– Scope error: a violation of the language’s scope rules
 Indicate which identifier was declared twice, or used without a declaration
– Type error: a violation of the language’s type rules
 Indicate which type rule was violated and/or what type was expected
Chart 257
Run-Time Error Reporting

Common run-time errors
–
–
–

Chart 258
Arithmetic overflow
Division by zero
Out-of-range array indexing
Can be detected only at run-time, because they
depend on values computed at run-time
Final Exam Review

Final Exam is comprehensive in that:
–
–

Essay questions will cover Chapters 5, 6, 7, 9
Problem oriented questions require knowledge from
the entire semester
Exam Structure
–
Four questions

Two essay questions
–
Discuss
– Describe

– Compare & contrast
– Evaluate
Two problems
–
Develop code template for new language construct
– Determine identification table for given program
– Calculate size and address for given type(s)
Chart 259
Final Exam Review
Chapter 5 – Contextual Analysis
 Contextual analysis checks that the program conforms to the source language’s contextual constraints
– Scope rules
– Type rules
 Block Structure
– Monolithic
– Flat
– Nested
 Type Checking
– Literal
– Identifier
– Unary operator application
– Binary operator application
 Standard Environment
Chart 260
Final Exam Review
Chapter 6 – Run-Time Organization
 Key Issues
– Data representation
– Expression evaluation
– Storage allocation
– Routines
 Fundamental Principles of Data Representation
– Non-confusion: different values of a given type should have different representations
– Uniqueness: each value should always have the same representation
Chart 261
Final Exam Review
Chapter 6 – Run-Time Organization
 Types
– Primitive types: cannot be decomposed
 Boolean
 Character
 Integer
– Records
– Disjoint unions
– Static arrays
– Dynamic arrays
– Recursive types
 For the various types, be able to determine size (storage required) and address (how to locate)
Chart 262
Final Exam Review
Chapter 6 – Run-Time Organization
 Expression Evaluation
– Stack machine
– Register machine
 Static storage allocation
– Global variables
 Stack storage allocation
– Local variables
Chart 263
Final Exam Review
Chapter 7 – Code Generation

Translation of the source program to object code
–

Target Machines
–
–
–
Chart 264
Dependent on source language and target machine
Registers, or stack, or both for intermediate results
Instructions with zero, one, two, or three operands, or a
mixture
Single addressing mode, or many
Final Exam Review
Chapter 7 – Code Generation



Chart 265
Code selection: which sequence of target
machine instructions will be the object code for
each phrase
Storage allocation: deciding the storage address
of each variable in source program
Register allocation: should be used to hold
intermediate results during expression evaluation
Final Exam Review
Chapter 9 – Programming Language Life-Cycle
Design
Specification
Prototype
Compilers
Manuals,
textbooks
Chart 266
Final Exam Review
Chapter 9 – Programming Language Life-Cycle
 Strive for simplicity and regularity
 Design Principles
– Type completeness: no operation should be arbitrarily restricted in the types of its operands
– Abstraction: for each phrase that specifies some kind of computation, there should be a way to abstract that phrase and parameterize it
– Correspondence: for each form of declaration there should be a corresponding parameter mechanism
 Specifications
 Prototype
 Error Reporting
– Compile-time
– Run-time
Chart 267
Final Exam Review
Structure of a Compiler
[Diagram: Source code → Lexical Analyzer → tokens → Parser → parse tree → Semantic Analyzer (Identification and Type checking, using the Symbol Table) → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code]
Chart 268