Compilers and Language Translation
Gordon College
What’s a compiler?

A computer understands only machine language
This is
a program
10000010010110100100101……

Therefore, high-level language instructions must be
translated into machine language prior to execution

Compiler
A piece of system software that translates high-level
languages into machine language
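#include <stdio.h>

int main(void)
{
    int c;
    /* Read characters until 'x' or end of input */
    while ((c = getchar()) != EOF && c != 'x')
    {
        if (c == 'a' || c == 'e' || c == 'i')
            printf("Congrats!");
        else if (c != '\n')
            printf("You Loser!");
    }
    return 0;
}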
Source file: program.c
Compile: gcc -o prog program.c
Executable: prog (10000010010110100100101……)
Sample output: Congrats!
Assembler (a kind of compiler)
Assembly: LOAD X
LOAD -> 0101 (opcode table)
X -> 0000 0000 1001 (symbol table)
Machine language: 0101 0000 0000 1001
One-to-one translation
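The two table lookups can be sketched in C as follows. The LOAD code and the address of X come from the figure above; the other opcodes are the ones used on the compiler slide that follows. The names `entry`, `lookup`, and `assemble` are hypothetical, and a real assembler would also build the symbol table itself rather than start from a fixed one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One table entry: a name and the bit pattern it maps to. */
struct entry {
    const char *name;
    const char *bits;
};

/* Opcode table, using the 4-bit codes shown on these slides. */
static const struct entry opcode_table[] = {
    {"LOAD", "0101"}, {"ADD", "0111"}, {"SUBTRACT", "0110"}, {"STORE", "0100"},
};

/* Symbol table: a fixed table holding only X (assumption for this sketch). */
static const struct entry symbol_table[] = {
    {"X", "000000001001"},
};

/* Linear search; returns NULL when the name is not in the table. */
static const char *lookup(const struct entry *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].bits;
    return NULL;
}

/* Translate one assembly instruction into exactly one machine word.
   Returns 1 on success, 0 if the opcode or the operand is unknown. */
int assemble(const char *op, const char *operand, char *out, size_t out_size)
{
    const char *op_bits;
    const char *addr_bits;
    assert(op != NULL && operand != NULL && out != NULL);
    op_bits = lookup(opcode_table,
                     sizeof opcode_table / sizeof opcode_table[0], op);
    addr_bits = lookup(symbol_table,
                       sizeof symbol_table / sizeof symbol_table[0], operand);
    if (op_bits == NULL || addr_bits == NULL)
        return 0;
    snprintf(out, out_size, "%s %s", op_bits, addr_bits);
    return 1;
}
```

Calling `assemble("LOAD", "X", word, sizeof word)` fills `word` with the opcode bits followed by the address bits, one machine word per assembly instruction.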
Compiler (high-level language translator)
a = b + c - d;
LOAD B -> 0101 00001110001
ADD C -> 0111 00001110010
SUBTRACT D -> 0110 00001110011
STORE A -> 0100 00001110100
0101 00001110001 0111 00001110010…….
One-to-many translation
Goals of a compiler

Code produced must be correct
A = (B+C)-(D+E);
Possible translation:
LOAD B
ADD C
STORE B
LOAD D
ADD E
STORE D
LOAD B
SUBTRACT D
STORE A
Is this correct?
No. STORE B and STORE D change the values of
variables B and D, which the high-level
language statement does not intend
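The problem can be checked by simulating both instruction sequences. This is a sketch in C of a hypothetical one-accumulator machine; the `memory` struct, the function names, and the temporaries T1 and T2 are assumptions for illustration and do not appear on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Memory cells of a hypothetical one-accumulator machine. */
struct memory {
    int a, b, c, d, e;
    int t1, t2;          /* compiler temporaries (assumed) */
};

/* The translation from the slide: intermediate sums go into B and D. */
void run_incorrect(struct memory *m)
{
    int acc;
    assert(m != NULL);
    acc = m->b;          /* LOAD B                    */
    acc += m->c;         /* ADD C                     */
    m->b = acc;          /* STORE B  (overwrites B)   */
    acc = m->d;          /* LOAD D                    */
    acc += m->e;         /* ADD E                     */
    m->d = acc;          /* STORE D  (overwrites D)   */
    acc = m->b;          /* LOAD B                    */
    acc -= m->d;         /* SUBTRACT D                */
    m->a = acc;          /* STORE A                   */
}

/* One possible fix: keep intermediate sums in temporaries. */
void run_correct(struct memory *m)
{
    int acc;
    assert(m != NULL);
    acc = m->b;          /* LOAD B       */
    acc += m->c;         /* ADD C        */
    m->t1 = acc;         /* STORE T1     */
    acc = m->d;          /* LOAD D       */
    acc += m->e;         /* ADD E        */
    m->t2 = acc;         /* STORE T2     */
    acc = m->t1;         /* LOAD T1      */
    acc -= m->t2;        /* SUBTRACT T2  */
    m->a = acc;          /* STORE A      */
}
```

Both versions leave the same value in A, but only the second leaves B and D as the programmer wrote them.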

Code produced should be reasonably efficient
and concise
Compute the sum 2x1 + 2x2 + 2x3 + 2x4 + … + 2x50000
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + (2.0 * x[i]);
}
Optimizing compiler:
sum = 0.0;
for (i = 0; i < 50000; i++) {
    sum = sum + x[i];
}
sum = sum * 2.0;
49,999 fewer multiplications
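The two loops can be compared directly. Below is a minimal sketch in C; the function names are hypothetical and the array length is passed as a parameter rather than fixed at 50000.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: one multiplication per element. */
double sum_scaled_naive(const double *x, size_t n)
{
    double sum = 0.0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum = sum + (2.0 * x[i]);
    return sum;
}

/* Factored form: a single multiplication after the loop. */
double sum_scaled_factored(const double *x, size_t n)
{
    double sum = 0.0;
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum = sum + x[i];
    return sum * 2.0;
}
```

Because the factor here is 2.0, the two versions should return the same value. With other floating-point factors the results may differ in the last bits, which is one reason a compiler has to be careful before applying this kind of rewrite.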
General Structure of a Compiler
The Compilation Process

Phase I: Lexical analysis

Compiler examines the individual characters in the
source program and groups them into syntactical
units called tokens
Source code -> Scanner -> Groups of tokens
Phase II: Parsing

The sequence of tokens formed by the scanner is
checked to see whether it is syntactically correct
Groups of tokens -> Parser -> correct / not correct

Phase III: Semantic analysis and code
generation

The compiler analyzes the meaning of the
high-level language statement and generates
the machine language instructions to carry
out these actions
Groups of tokens -> Code Generator -> Machine language

Phase IV: Code optimization

The compiler takes the generated code and
sees whether it can be made more efficient
Machine language -> Code Optimizer -> Machine language
Overall Execution Sequence on a High-Level Language Program
The Compilation Process

Source program


Original high-level language program
Object program

Machine language translation of the source
program
Phase I: Lexical Analysis


Lexical analyzer

The program that performs lexical analysis

More commonly called a scanner
Job of lexical analyzer

Group input characters into tokens
• Tokens: Syntactical units that are treated as single,
indivisible entities for the purposes of translation

Classify tokens according to their type
Program statement
sum = sum + a[i];
Digital perspective:
tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;
Tokenized:
sum, =, sum, +, a, [, i, ], ;
Typical Token Classifications
TOKEN TYPE    CLASSIFICATION NUMBER
Symbol        1
Number        2
=             3
+             4
-             5
;             6
==            7
if            8
else          9
(             10
)             11
[             12
]             13
…

Lexical Analysis Process
1. Discard blanks, tabs, etc. - look for beginning of token.
2. Put characters together
3. Repeat step 2 until end of token
4. Classify and save token
5. Repeat steps 1-4 until end of statement
6. Repeat steps 1-5 until end of source code
Scanner input: sum=sum+a[i];
Scanner output (token, classification number):
sum   1
=     3
sum   1
+     4
a     1
[     12
i     1
]     13
;     6
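The steps listed above can be sketched as a small scanner in C. The classification numbers are the ones shown in the scanner figure (symbol 1, = 3, + 4, ; 6, [ 12, ] 13), plus 2 for numbers; the `token` struct, `classify_char`, and `scan` are hypothetical names, and only this subset of token types is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 31

struct token {
    char text[MAX_TOKEN_LEN + 1];
    int class_number;   /* classification number of the token */
};

/* Classify a one-character token; returns 0 if it is not recognized. */
static int classify_char(char ch)
{
    switch (ch) {
    case '=': return 3;
    case '+': return 4;
    case ';': return 6;
    case '[': return 12;
    case ']': return 13;
    default:  return 0;
    }
}

/* Scan one statement into tokens[]. Returns the number of tokens found,
   or -1 on an unrecognized character or when tokens[] is too small.
   Simplification: names longer than MAX_TOKEN_LEN are split in two. */
int scan(const char *src, struct token *tokens, int max_tokens)
{
    int count = 0;
    size_t i = 0;
    assert(src != NULL && tokens != NULL);
    while (src[i] != '\0') {
        /* Step 1: discard blanks, tabs, etc. */
        if (isspace((unsigned char)src[i])) {
            i++;
            continue;
        }
        if (count == max_tokens)
            return -1;
        struct token *t = &tokens[count];
        size_t len = 0;
        if (isalpha((unsigned char)src[i])) {
            /* Steps 2-3: gather letters and digits into one symbol. */
            while (isalnum((unsigned char)src[i]) && len < MAX_TOKEN_LEN)
                t->text[len++] = src[i++];
            t->class_number = 1;
        } else if (isdigit((unsigned char)src[i])) {
            while (isdigit((unsigned char)src[i]) && len < MAX_TOKEN_LEN)
                t->text[len++] = src[i++];
            t->class_number = 2;
        } else {
            t->class_number = classify_char(src[i]);
            if (t->class_number == 0)
                return -1;
            t->text[len++] = src[i++];
        }
        /* Step 4: terminate the text and save the classified token. */
        t->text[len] = '\0';
        count++;
    }
    return count;
}
```

For the statement `sum=sum+a[i];` this sketch produces nine tokens, with `sum` appearing twice as a symbol.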

Input to a scanner
- A high-level language statement from the source
program

Scanner’s output
- A list of all the tokens in that statement
- The classification number of each token found
Phase II: Parsing

Parsing phase

A compiler determines whether the tokens
recognized by the scanner are a syntactically
legal statement

Performed by a parser


Output of a parser

A parse tree, if such a tree exists

An error message, if a parse tree cannot be
constructed
Successful construction of a parse tree is proof that
the statement is correctly formed

Example

High-level language statement: a = b + c
Grammars, Languages, and BNF

Syntax

The grammatical structure of the language

The parser must be given the syntax of the
language

BNF (Backus-Naur Form)
Most widely used notation for representing the syntax of a programming
language
literal_expression ::= integer_literal | float_literal
| string | character

In BNF

The syntax of a language is specified as a set of
rules (also called productions)

A grammar
• The entire collection of rules for a language

Structure of an individual BNF rule
left-hand side ::= “definition”

BNF rules use two types of objects on the right-hand side of a production

Terminals
• The actual tokens of the language
• Never appear on the left-hand side of a BNF rule

Nonterminals
• Intermediate grammatical categories used to help
explain and organize the language
• Must appear on the left-hand side of one or more rules


Goal symbol

The highest-level nonterminal

The nonterminal object that the parser is
trying to produce as it builds the parse tree
All nonterminals are written inside angle
brackets
Java BNF
BNF Example
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL>
| <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
Identify the following:
Goal symbol, terminals, nonterminals, an individual rule
Is this a legal postal address?
Steve Moses Sr.
215 Rose Ave.
Everywhere, NC 43563
Parsing Concepts and Techniques

Fundamental rule of parsing:

If, by repeated applications of the rules of the
grammar, the parser can convert the sequence of input
tokens into the goal symbol
    the sequence of tokens is a syntactically valid
    statement of the language
else
    the sequence of tokens is not a syntactically
    valid statement of the language
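The yes/no decision above can be sketched in C for a toy grammar. The grammar in the comment is an assumption made for this example and does not come from the slides. The sketch is a top-down (recursive-descent) recognizer: it starts from the goal symbol and works toward the tokens, but it reaches the same valid/invalid answer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed toy grammar (not from the slides):
     <assignment> ::= <variable> = <expression>
     <expression> ::= <variable> | <variable> + <expression>
     <variable>   ::= any single lowercase letter
   Each character of the input is one token; blanks and tabs are skipped. */

static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Each parse_* function returns the position just after the construct,
   or NULL if the tokens at p do not match the rule. */
static const char *parse_variable(const char *p)
{
    p = skip_blanks(p);
    return islower((unsigned char)*p) ? p + 1 : NULL;
}

static const char *parse_expression(const char *p)
{
    p = parse_variable(p);
    if (p == NULL)
        return NULL;
    p = skip_blanks(p);
    if (*p == '+')
        return parse_expression(p + 1);
    return p;
}

/* Returns 1 if the whole input can be derived from the goal symbol
   <assignment>, and 0 otherwise. */
int is_valid_assignment(const char *input)
{
    const char *p;
    assert(input != NULL);
    p = parse_variable(input);
    if (p == NULL)
        return 0;
    p = skip_blanks(p);
    if (*p != '=')
        return 0;
    p = parse_expression(p + 1);
    if (p == NULL)
        return 0;
    return *skip_blanks(p) == '\0';
}
```

Under this assumed grammar, `a = b + c` is accepted, while `a = b +` and `a = + b` are rejected.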
Parsing Example
Is the following http address legal?
http://www.csm.astate.edu/~rossa/cs3543/bnf.html
<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]
<hostport> ::= <host> [ : <port> ]
<host> ::= <hostname> | <hostnumber>
<hostname> ::= <ialpha> [ . <hostname> ]
<hostnumber> ::= <digits> . <digits> . <digits> . <digits>
<port> ::= <digits>
<path> ::= <void> | <xpalphas> [ / <path> ]
<search> ::= <xalphas> [ + <search> ]
<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape>
<xalphas> ::= <xalpha> [ <xalphas> ]
<xpalpha> ::= <xalpha> | +
<xpalphas> ::= <xpalpha> [ <xpalphas> ]
<ialpha> ::= <alpha> [ <xalphas> ]
<alpha> ::= a | b | … | z | A | B | … | Z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<safe> ::= $ | - | _ | @ | . | & | ~
<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space>
<escape> ::= % <hex> <hex>
<hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F
<digits> ::= <digit> [ <digits> ]
<void> ::=

Look-ahead parsing algorithms - intelligent parsers

One of the biggest problems in building a compiler
is designing a grammar that:

Includes every valid statement that we want to be in
the language

Excludes every invalid statement that we do not want
to be in the language

Another problem in constructing a compiler:
Designing a grammar that is not ambiguous

An ambiguous grammar allows the
construction of two or more distinct parse
trees for the same statement
NOT GOOD - multiple interpretations
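As an illustration, suppose a grammar contained the rule <expression> ::= <expression> - <expression> | <number> (an assumed example, not taken from the slides). The statement 8 - 3 - 2 then has two parse trees, and the sketch below in C evaluates each grouping; the function names are hypothetical.

```c
#include <assert.h>

/* Parse tree 1 groups to the left: (a - b) - c. */
int eval_left_grouping(int a, int b, int c)
{
    return (a - b) - c;
}

/* Parse tree 2 groups to the right: a - (b - c). */
int eval_right_grouping(int a, int b, int c)
{
    return a - (b - c);
}

/* Returns 1 when the two parse trees give different results. */
int interpretations_differ(int a, int b, int c)
{
    return eval_left_grouping(a, b, c) != eval_right_grouping(a, b, c);
}
```

For 8 - 3 - 2 the left grouping gives 3 and the right grouping gives 7, so the same statement would have two different meanings.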
Phase III: Semantics and Code Generation

Semantic analysis

The compiler makes a first pass over the parse tree
to determine whether all branches of the tree are
semantically valid
• If they are valid
the compiler can generate machine language
instructions
else
there is a semantic error; machine language
instructions are not generated

Semantic analysis

Syntactically correct, but semantically incorrect
example:
int a;
double sum;
char b;
sum = a + b;
Semantic records:
a: integer
sum: double
b: char
Data type mismatch

Semantic analysis
Parse tree: <expression> + <expression> reduces to <expression>
Semantic record for a: integer
Semantic record for b: char
Semantic record for temp: ?

Semantic analysis
Parse tree: <expression> + <expression> reduces to <expression>
Semantic record for a: integer
Semantic record for b: integer
Semantic record for temp: integer
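The check on the two semantic records can be sketched in C. The `semantic_record` struct, the `data_type` enum, and `check_addition` are hypothetical names, and the rule used (both operands must have exactly the same type) is the simplified one these slides imply.

```c
#include <assert.h>
#include <stddef.h>

enum data_type { TYPE_INTEGER, TYPE_DOUBLE, TYPE_CHAR, TYPE_ERROR };

/* A semantic record: what the compiler knows about one name. */
struct semantic_record {
    const char *name;
    enum data_type type;
};

/* Build the record for the temporary that holds left + right.
   Assumed rule for this sketch: both operands must have the same type;
   otherwise the result is marked TYPE_ERROR (a semantic error). */
struct semantic_record check_addition(const struct semantic_record *left,
                                      const struct semantic_record *right)
{
    struct semantic_record temp = { "temp", TYPE_ERROR };
    assert(left != NULL && right != NULL);
    if (left->type == right->type)
        temp.type = left->type;
    return temp;
}
```

With a as integer and b as char the temporary is marked as an error; with both as integer the temporary is an integer. A language such as C would instead promote the char operand to int, so the strict rule here is only a teaching simplification.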

Code generation

Compiler makes a second pass over the
parse tree to produce the translated code
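A sketch of such a pass in C, producing the LOAD / ADD / SUBTRACT / STORE instructions used earlier in these slides. The `node` struct and the function names are hypothetical, and the tree is a simplified expression tree rather than a full parse tree. To keep the example short, the right operand of every operator must be a plain variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A node of a simplified expression tree. */
struct node {
    char op;                  /* '+', '-', or 0 for a leaf */
    const char *name;         /* variable name when the node is a leaf */
    const struct node *left;
    const struct node *right;
};

/* Append one instruction line to out; returns 0 if it does not fit. */
static int emit(char *out, size_t out_size,
                const char *opcode, const char *operand)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, out_size - used, "%s %s\n", opcode, operand);
    return n >= 0 && (size_t)n < out_size - used;
}

/* Walk the tree, leaving the value of e in the accumulator.
   Simplification: the right operand of every operator must be a leaf. */
static int gen_expression(const struct node *e, char *out, size_t out_size)
{
    if (e == NULL)
        return 0;
    if (e->op == 0)
        return emit(out, out_size, "LOAD", e->name);
    if (e->left == NULL || e->right == NULL || e->right->op != 0)
        return 0;
    if (!gen_expression(e->left, out, out_size))
        return 0;
    return emit(out, out_size,
                e->op == '+' ? "ADD" : "SUBTRACT", e->right->name);
}

/* Generate code for target = expr. Returns 1 on success, 0 on failure. */
int gen_assignment(const char *target, const struct node *expr,
                   char *out, size_t out_size)
{
    assert(target != NULL && expr != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    return gen_expression(expr, out, out_size)
        && emit(out, out_size, "STORE", target);
}
```

For the tree of A = B + C - D, the generated text is LOAD B, ADD C, SUBTRACT D, STORE A, one instruction per line.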
Phase IV: Code Optimization


Two types of optimization

Local

Global
Local optimization

The compiler looks at a very small block of
instructions and tries to determine how it can
improve the efficiency of this local code block

Relatively easy; included as part of most compilers

Examples of possible local optimizations

Constant evaluation
x = 1 + 1 ---> x = 2

Strength reduction
x = x * 2 ---> x = x + x

Eliminating unnecessary operations
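The first two optimizations can be sketched in C on a single statement of the form dest = left op right. The `operand` and `statement` structs and both function names are assumptions for illustration; a real compiler works on its own intermediate representation and also has to consider overflow and data types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An operand is either a constant or a named variable. */
struct operand {
    int is_constant;
    int value;           /* used when is_constant is nonzero */
    const char *name;    /* used when is_constant is zero */
};

/* A statement dest = left op right.
   op is '+', '*', or 0 when the statement is just dest = left. */
struct statement {
    const char *dest;
    struct operand left;
    char op;
    struct operand right;
};

/* Constant evaluation: x = 1 + 1 becomes x = 2. Returns 1 if applied. */
int fold_constants(struct statement *s)
{
    assert(s != NULL);
    if (!s->left.is_constant || !s->right.is_constant)
        return 0;
    if (s->op == '+')
        s->left.value = s->left.value + s->right.value;
    else if (s->op == '*')
        s->left.value = s->left.value * s->right.value;
    else
        return 0;
    s->op = 0;           /* the statement is now dest = constant */
    return 1;
}

/* Strength reduction: x = v * 2 becomes x = v + v. Returns 1 if applied. */
int reduce_strength(struct statement *s)
{
    assert(s != NULL);
    if (s->op != '*' || s->left.is_constant)
        return 0;
    if (!s->right.is_constant || s->right.value != 2)
        return 0;
    s->op = '+';
    s->right = s->left;  /* replace the constant 2 with a copy of v */
    return 1;
}
```

Each function rewrites the statement in place and reports whether it changed anything, so an optimizer could apply them repeatedly over a small block of statements.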


Global optimization

The compiler looks at large segments of the program
to decide how to improve performance

Much more difficult; usually omitted from all but the
most sophisticated and expensive production-level
“optimizing compilers”
Optimization cannot make an inefficient algorithm
efficient - “only makes an efficient algorithm more
efficient”
Summary

A compiler is a piece of system software that
translates high-level languages into machine
language

Goals of a compiler: Correctness and the production
of efficient and concise code

Source program: High-level language program

Object program: The machine language translation
of the source program

Phases of the compilation process

Phase I: Lexical analysis

Phase II: Parsing

Phase III: Semantic analysis and code generation

Phase IV: Code optimization

Chapter 10: Compilers and Language Translation