SCANNERS AND NAMES
Outline
2

Names in programming Languages
 Identifiers
& variables
 Keywords and reserved words

Scanner
 Tokens
and Lexemes
 Regular expressions and finite automata
 Lookahead and backtracking
 Implementing a scanner
 Symbol table
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Names in programming Languages
3

Identifiers & variables: Names
 Patterns
 Case-sensitivity
 Length

Keywords and reserved words
 Can/cannot
be used as identifiers
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Attributes of Variables
4


Name
Address: L-value

 Indicates
range of values,
and operators
 Type declaration
 One
variable – different
address
 Explicit
 At
different time
 At different place in program
 Two
variables – same
address: Aliase
 Caused

by pointers
Value: R-value
 Content
stored in memory
Type:
 Implicit

Lifetime
 Determined

by binding
Scope
 Where
the variable is
visible
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Variable Bindings
5

Binding = association between a variable and its
attributes
 Static
binding: occurs before execution and stays the same.
 Dynamic binding: occurs during execution and can be
changed.

Type binding
 Static
type binding: C, Java, Pascal
 Dynamic type binding: PHP, JavaScript

Storage binding determines lifetime of variables.
 Static,
Stack-dynamic, heap-dynamic variables
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Scanners
6
Sometimes called a lexical analyzer
 What a scanner does:

 Get
a stream of characters (source program)
 Divide it into tokens
 Tokens are units that are meaningful in the source
language.
 Lexemes are strings which match the patterns of
tokens.
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Lexemes and Tokens
7
Tokens
Lexemes
identifier
Age, grade,Temp, zone, q1
number
3.1416, -498127,987.76412097
string
“A cat sat on a mat.”, “90183654”
open parentheses
(
close parentheses
)
Semicolon
;
reserved word if
IF, if, If, iF
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
What a scanner does
8

When a token is found:
 It
is passed to the next phase of compiler.
 Sometimes attributes associated to the token need to
be calculated.
 Variables and constants, together with their attributes,
must be stored in the symbol/literal table.
 it

is necessary to check if the token is already in the table.
Symbol and literal tables
 Accessed
in many phases of compilers.
 Efficient data structure is needed (usually use hash table).
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
How to construct a scanner
9
Define tokens in the source language.
 Describe the patterns allowed for tokens.
 Write regular expressions describing the patterns.
 Construct an FA for each pattern.
 Combine all FA’s which results in an NFA.
 Convert NFA into DFA
 Write a program simulating the DFA.

2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Regular Expression
10
a character or symbol in the alphabet
  : an empty string
  : an empty set
 if r and s are regular expressions, the followings
are regular expressions.

r
|s
r s
r *
 (r )
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Extension of regular expr.
11

[a-z]
 any

.
 any

character
r+
 one

character in a range from a to z
or more repetition
r?
 optional

subexpression
~(a | b | c), [^abc]
 any
single character NOT in the set
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Examples of Patterns
12

(a | A) = the set {a, A}

[0-9]+ = (0 |1 |...| 9) (0 |1 |...| 9)*

(0-9)? = (0 | 1 |...| 9 |  )

[A-Za-z] = (A |B |...| Z |a |b |...| z)

A . = the string with A following by any one symbol

~[0-9] = [^0123456789] = any character which is not 0,
1, ..., 9
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Describing Patterns of Tokens
13






reservedIF = {IF, if, If, iF}
letter = [a-zA-Z]
digit =[0-9]
identifier = letter (letter|digit)*
numeric = (+|-)? digit+ (. digit+)? (E (+|-)? digit+)?
Comments
{
(~})* }
 /* ([^*]*[^/]*)* */
 ;(~newline)* newline
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Disambiguating Rules
14

IF is an identifier or a reserved word?
A
reserved word cannot be used as identifier.
 A keyword can also be identifier.

<= is < and = or <=?
 Principle
of longest substring
 When
a string can be either a single token or a
sequence of tokens, single-token interpretation is
preferred.
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Identifiers
15
letter
letter,digit
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Numbers
16
E
+,-,e
digit
digit
.
digit
E
+,-,e
digit
2301380 Chapter 3 Scanners and Names in Programming Languages
digit
digit
03/10/58
Comments
17
~/
/
*
*
/
~*
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Combining Finite Automata
18
E,e
L, l
I,i
E,e
I, i
E, e
S,s
F,f
L,l
S,s
E,e
F, f
other letter
letter,digit
letter
letter,digit
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Lookahead and backtracking
19
I,i
F,f
letter,
digit
[other]
2301380 Chapter 3 Scanners and Names in Programming Languages
Return ID
Return IF
03/10/58
DFA with lookahead
20
I,i
E,e
F,f
L,l
[other]
S,s
E,e
[other]
Return IF
[other]
Return ID
letter,digit
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Return ELSE
Implementing a Scanner with Nested If
21
switch (state)
{ case 0:
{ if isletter(nxt)
state=1;
elseif isdigit(nxt)
state=2;
else state=3;
break;
}
case 1:
{ if isletVdig(nxt)
state=1;
else state=4;
break;
}
…
}
letter, digit
1 other
0
digit 2
3
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
…
…
4
Implementing a Scanner with Transition Table
22
state
0 1 2 3 …
initialize currentState=start
letter
1 1 .. ..
while(not final(currentState))
digit
2 1 .. ..
{ next_state =
…
..
dfa(current_state, next)
3 4
0 1 2 3 …
current_state = next_state
}
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Symbol Table
23


Stores attributes of identifiers and literals.
In symbol tables, associated to variables are:
Name
 Address
 Type

For arrays, dimension and size are also needed.
 For structures or records, positions are needed.


In symbol tables, associated to literals are:
Address
 Type

For arrays, dimension and size are also needed.
 For structures or records, positions are needed.

2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Buffering
24




A compiler reads a block of characters from an input
file, and stores in a buffer.
It is inefficient to read from the input file, one character
at a time.
Managing the buffer is also important.
Buffer size reflects some restrictions in programming
languages.
 Name
length
 String length
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Data Structure for Symbol Tables
25




Symbol tables are used in most parts of compilers.
An efficient data structure is required.
Mostly, hash table are used.
Other data structures can also be used.
2301380 Chapter 3 Scanners and Names in Programming Languages
03/10/58
Descargar

introduction