Scope binding and Data
types
Kun-Yuan Hsieh
[email protected]
Programming Language Lab., NTHU
PLLab, NTHU,Cs2403 Programming Languages
Names
• Identify labels, variables, subprograms, parameters,
etc.
• Name length
–
–
–
–
–
Earliest PL: 1
FORTRAN I: max 6
CORBOL: max 30
FORTRAN 90 & ANSI C: max 31
Java, C++: no limit, implementers often impose one
• Case sensitive (e.g. StrLeng != strleng)
– C, C++, and Java are
– Problems: poor readability & writability
PLLab, NTHU,Cs2403 Programming Languages
Names (con’t)
• Keyword:
– word that is special only in certain contexts
(in FORTRAN)
REAL APPLE
REAL = 3.4
(in FORTRAN)
REAL INTEGER
INTEGER REAL
• Reserved word:
– word that cannot be used as a name
(in C) for, while, if, else
• Predefined words:
– words that have been defined by public libraries
(in C) printf, scan, createfile
PLLab, NTHU,Cs2403 Programming Languages
Variable (var)
• A var is an abstraction of a memory cell
• A var can be viewed as below:
(Name, Address, Value, Type, Lifetime, Scope)
• Name
– Referred to as identifier (id)
• Address
– The memory address with which it is associated
– May have different address at different times during
execution (e.g. local vars in a recursive subprogram)
– May have different address at different places in a
program (e.g. count defined in sub1 and sub2
PLLab, NTHU,Cs2403 Programming Languages
Variable
• Alias
– More than one variables are used to access the same
memory location
– (in C++)
&p = d ; (p and d are aliases)
• Type
– Determines the range of its values, and the set of ops
defined
– (in C) unsigned int count ; (range: 0 – 65535)
• Value
– The contents of the memory cell associated with the
name
PLLab, NTHU,Cs2403 Programming Languages
Binding
• Binding is an association between two attributes
– Ex:
•
•
•
•
int count ;
count = count + 5;
Type of count: bound at compile time
Value range of count: bound at compiler design time
Value of count: bound at execution time
Definition of + op: bound at language design time
PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic binding
• Static binding: it first occurs before run time
and remains unchanged in execution
• Dynamic binding: it first occurs during run
time or can change in execution
• Discussed issues: type & storage
– Type binding
• How is type specified?
• When does the binding take place?
– Storage binding
• When does the allocation take place?
PLLab, NTHU,Cs2403 Programming Languages
Static type binding
• Static type binding: explicit & implicit declaration
– Explicit declaration
• (in C) int A, B ; double C ;
– Implicit declaration
• (in FORTRAN) name begins with I, J, K, L, M, N are INTEGER,
others are REAL
• (C/C++) declaration & definition
–
–
–
–
declaration specifies types but do not cause allocation
external declaration is used for type checking
definition specifies types also causes allocation
only declares in header files but defines in C files
PLLab, NTHU,Cs2403 Programming Languages
Dynamic type binding
• Variable is bound to a type when assigned a
value in an assignment statement
– (in JavaScript) list = [10.2, 3.5]
list = count
• Advantage: generic programming
– process a list of data with any type
• Disadvantage: poor error detection
– i = x (int) and i = y (double)!!!
• Disadvantage: large cost
– storage of a variable must be of varying size
PLLab, NTHU,Cs2403 Programming Languages
Storage binding
• The lifetime of a variable is the time during
which it is bound to a specific memory
location
– static variables
– stack-dynamic variables
– explicit heap-dynamic variables
– implicit heap-dynamic variables
PLLab, NTHU,Cs2403 Programming Languages
Static variables
• Vars that bound to memory cells before
execution begins and unbound when execution
terminates
– static storage binding
– e.g. global variables
– (in C) static int count ; // history sensitive
• Advantages:
– code efficiency by direct addressing
– no allocation/deallocation during runtime
• Disadvantages:
– less flexibility : cannot support recursive
– no storage sharing
PLLab, NTHU,Cs2403 Programming Languages
Stack-dynamic variables
• Vars whose storage bindings are created
when declarations are executed
– dynamic storage binding
– static type binding
– allocated from the run-time stack
• Advantages:
– storage sharing, support recursive
• Disadvantages:
– overhead of allocation / deallocation
PLLab, NTHU,Cs2403 Programming Languages
Example I
virtual address space
#include <stdio.h>
int count;
main( ) {
int i ;
for (i=0; i<=10; i++)
{ test( ) ; }
}
test( ) {
int i ;
static int count = 0;
count = count + 1 ;
}
test::i
main::i
run-time
stack ptr
test::count
static var
ptr
count
PLLab, NTHU,Cs2403 Programming Languages
Example I
#include <stdio.h>
int count;
main( ) {
int i ;
for (i=0; i<=10; i++)
{ test( ) ; }
}
test( ) {
int i ;
static int count = 0;
count = count + 1 ;
}
Typebinding
Storagebinding
count
static
static
main::i
static
stackdynamic
test::i
static
stackdynamic
test::count
Static
static
PLLab, NTHU,Cs2403 Programming Languages
Explicit heap-dynamic
variables
• Vars that are allocated and deallocated by explicit
statements during run-time
–
–
–
–
Dynamic storage binding
Referenced through pointers or references
(in C) malloc( ) and free( ) (in C++) new and delete
Memory leak is a major cause of SW bugs
• Heap: a section of virtual address space reserved for
dynamic memory allocation
– heap fragmentation: a major cause of SW perf bugs
• Java objects are explicit heap-dynamic vars
– implicit garbage collection (no free or delete)
PLLab, NTHU,Cs2403 Programming Languages
Example II
virtual address space
#include <stdio.h>
main( ) {
int i ;
test( ) ;
}
test( ) {
int i ;
char *space;
static int count = 0;
space = (char *) malloc(64);
// memory leak
}
test::space
heap
test::i
main::i
run-time
stack ptr
test::count
static var
ptr
PLLab, NTHU,Cs2403 Programming Languages
Example II
virtual address space
#include <stdio.h>
main( ) {
int i ;
test( ) ;
}
test( ) {
int i ;
char *space;
static int count = 0;
space = (char *) malloc(64);
free(space);
}
heap
test::i
main::i
run-time
stack ptr
test::count
static var
ptr
PLLab, NTHU,Cs2403 Programming Languages
Implicit heap-dynamic variables
• Vars that are bound to heap storage only when
they are assigned values
– dynamic storage binding
– Strings and arrays in Perl and JavaScript
@names = { “door”, “windows”, “table”};
$names[3] = “bed”;
• Advantages:
– highest degree of flexibility
• Disadvantages:
– huge run-time overhead
– some loss of error detection by compiler
PLLab, NTHU,Cs2403 Programming Languages
Type checking
• The activity of ensuring that the operands of
an operator are of compatible types
– (int) A = (int) B + (real) C
– int + real  okay for the + operator
– (int + real) converts to (real + real)  type
coercion
– int = real  type error
• Static type checking: done in compile time
• Dynamic type checking: done in run time
PLLab, NTHU,Cs2403 Programming Languages
Strong typing
• Strong-typed: if type errors are always detected
– Yes (Ada, Java), No (Pascal, C, C++)
• Example: union in C: type error cannot be detected
typedef union { float A; int B; } Sample;
Sample data;
main( ) {
float R1 = 3.231;
int N1;
data.A = R1;
…..
N1 = data.B; // N1 = 1078905012
}
PLLab, NTHU,Cs2403 Programming Languages
Type compatibility
• Name type compatibility: two variables
declared with the same type name
– Easy to implement but less flexibility
type days = integer ; type months= integer ;
days N1; months N2;
– N1 and N2 are not compatible
• Structured type compatibility: two variables
have identical structures
– More flexible, but harder to implement
– N1 and N2 above are compatible
PLLab, NTHU,Cs2403 Programming Languages
Scope
• The range of statements over which it
is visible (it can be referenced)
• Static scope: determined in compile
time
– Search: based on declaration sequence
• Dynamic scope: determined in run-time
– Search: based on the call stack sequence
PLLab, NTHU,Cs2403 Programming Languages
Scope example
proc main
var X1, X2;
proc A
var X1, X2;
end
proc B
var X1, X2;
call A;
end
call B;
end
Proc A reference
environment
Static scope:
A  main
Dynamic scope:
A  B  main
PLLab, NTHU,Cs2403 Programming Languages
Static & dynamic scope
Procedure big;
var x : integer;
procedure sub1;
begin
…x…  (1)
end;
procedure sub2;
var x : integer;
begin
…x…  (2)
sub1;
end;
begin
sub1;  (3)
…x…  (4)
sub2;  (5)
end
Static scope:
(31) x = big’s x
(4) x = big’s x
(2) x = sub2’s x
(51) x = big’s x
Dynamic scope
(31) x = big’s x
(4) x = big’s x
(2) x = sub2’s x
(51) x = sub2’s x
PLLab, NTHU,Cs2403 Programming Languages
Static vs. dynamic scope
• Static scope
– easier to read
– reference statically resolved
– fast execution
• Dynamic scope
– harder to read
– reference dynamically resolved
– slow execution
PLLab, NTHU,Cs2403 Programming Languages
Block
• Block: a method of creating static scopes inside
program unit
• Global reference uses :: operator
(in C/C++)
for (…) {
int index;
index = temp;
….
temp = ::index;
}
PLLab, NTHU,Cs2403 Programming Languages
Lifetime & Scope
• Scope and lifetime are closely related but
different concepts
• Example in C
– static scope
– count scope only in
home( )
– count lifetime ends at
the return of office( )
void office( ) {
…
} /* end of office */
void home( ) {
int count;
…
office( );
} /* end of home */
PLLab, NTHU,Cs2403 Programming Languages
Named constants
• Named constant: variable that is bound to a
value only when it is bound to storage
–
–
–
–
–
value cannot be changed later
e.g. final in Java, const in C or C++
readability & modifiability
different from #define (no storage allocated)
Ex:
(in C++)
void students( ) {
const int count = 100;
for (i = 0; i < count; i++) total += scores[i];
}
PLLab, NTHU,Cs2403 Programming Languages
Data types
PLLab, NTHU,Cs2403 Programming Languages
Primitive data types
• Those are not defined in terms of other
data types
• Numeric types
– Integer
– Floating-point
– Decimal
• Boolean types
• Character types
PLLab, NTHU,Cs2403 Programming Languages
Numeric Types
• Integer
– Several sizes
– Byte, short, int, long…
– Range limited
• Floating point
– Approximation to model real numbers
IEEE float: exp fraction (23)
fraction (52)
IEEE double: exp (11)
PLLab, NTHU,Cs2403 Programming Languages
Numeric Types
• Decimal
–
–
–
–
–
For business applications
Store a fixed number of decimal digits (coded)
Advantage: accuracy
Disadvantages: limited range, wastes memory
e.g. 924.1 in an 6-byte data w/ a 4-byte integer
part
00 00 09 24 10 00
PLLab, NTHU,Cs2403 Programming Languages
Boolean & character types
•
Boolean
– Could be implemented as bits, but often
as bytes
– Advantage: readability
•
Character
– ASCII: one byte per char (0 – 127)
– Unicode: 2 bytes per char (multilanguage support)
PLLab, NTHU,Cs2403 Programming Languages
Character String Types
• Values are sequences of characters
• C and C++
– Not primitive
– Use char arrays and a library of functions
that provide operations
PLLab, NTHU,Cs2403 Programming Languages
User-defined ordinal types
• Ordinal type: the range of possible values can be
easily associated with positive integers
– enumeration & subrange
• Enumeration type:
– (C++)
enum colors {red, blue, yellow};
colors mycolor = blue;
– C and C++: are implicitly converted to integer
mycolor++ assigns green to mycolor
– Readability: code is more meaningful
– Reliability: compiler can check operations and ranges
PLLab, NTHU,Cs2403 Programming Languages
Subrange types
• A contiguous subsequence of an ordinal type
– (Pascal) type uppercase = ‘A’ .. ‘Z’;
index = 1..100;
• Subrange types are different from derived types
(Ada)
type Cash is new INTEGER range 1..100;
subtype Score is INTEGER range 1..100;
– type Cash (derived) is not compatible with any
INTEGER type
– type Score (subrange) is compatible with any
INTEGER type
PLLab, NTHU,Cs2403 Programming Languages
Array types
• An aggregate of homogeneous data elements
• Indexing: mapping to an element
– Subscript range checking:
• C, C++, FORTRAN: no range checking
• Pascal, Ada, and Java: yes
– Subscript type:
• FORTRAN, C, Java: integer only
• Pascal: any ordinal type (int, char, boolean, enum)
– # of subscripts:
• FORTRAN I: 3 (max)
• FORTRAN 77: 7 (max)
• Others: no limit
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4)
• Static array
– storage: statically bound
– subscript range: statically
bound
– e.g. global arrays in C/C++
• Fixed stack-dynamic
array
– storage: dynamically bound
– subscript range: statically
bound
– e.g. local arrays in C/C++
#include <stdio.h>
int data[10];
main( ) {
…
}
#include <stdio.h>
int data[10];
main( ) {
int bounds[20];
…
}
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t)
• Stack-dynamic array
– storage: dynamic bound
– subscript range: dynamic
– only bound once
(Ada) GET(LEN)
declare
LIST : array (1..LEN) of
integer
begin
…
end;
• Heap-dynamic array
– both: dynamic bound
– bound multiple times
– e.g. (FORTRAN 90)
INTEGER, ALLOCTABLE, ARRAY
(:,:) :: MAT
(MAT is 2-dim array)
ALLOCATE(MAT(10, 5))
(MAT has 10 rows, 5 cols)
DEALLOCATE MAT
(deallocate MAT’s storage)
PLLab, NTHU,Cs2403 Programming Languages
Array categories (4) (con’t)
• Heap-dynamic array in • Heap-dynamic array in
C/C++
Perl/JavaScript
sample( ) {
int bounds*, A;
bound = (int *) malloc(40);
A = bound[3];
free(bound);
bound = (int *) malloc(80);
A = bound[10];
free(bound);
}
@home = {“couch”, “chair”, “table”,
“stove” };
// $home[0] = “couch”
// $home[1] = “chair”
// $home[2] = “table”
// $home[3] = “stove”
$home[4] = “bed”;
PLLab, NTHU,Cs2403 Programming Languages
Array implementation
• One-dim array address calculation
– addr(a[j]) = addr(a[0]) + j * elementSize;
– Ex:
int bounds[10]; // one int = 4 bytes
&bounds[0] = 1200  &bounds[4] = 1216
• Row-major allocation
– addr(a[i,j]) = addr(a[0,0]) + ((i * n) + j) * elementSize;
– ex:
a[0, ] = 3 4 7
a[1, ] = 6 2 5 
3, 4, 7, 6, 2, 5, 1, 3, 8
a[2, ] = 1 3 8
&a[0,0] = 1200  &a[1,2] = 1220
PLLab, NTHU,Cs2403 Programming Languages
Column-major allocation
• Column-major allocation
– addr(a[i, j]) = addr(a[0,0]) + ((j * n) + i) * elementSize;
– Ex:
a[0, ] = 3
a[1, ] = 6
a[2, ] = 1
4
2
3
7
5
8
 3, 6, 1, 4, 2, 3, 7, 5, 8
&a[0,0] = 1200  &a[1,2] = 1228
PLLab, NTHU,Cs2403 Programming Languages
Associative arrays
• an unordered collection of data elements
that are indexed by an equal # of keys
• example in Perl
%salaries = { “John” => 5000, “Tom” => 3500, “Mary” => 6000};
$salaries{“Perry”} = 6500; { add an entry into this array }
tax = $salaries{“John”};
{tax is assigned with 5000}
delete $salaries{“Tom”}; {remove the entry with key “Tom”}
@salaries = ();
{remove all entries in this array}
PLLab, NTHU,Cs2403 Programming Languages
Record types
• A heterogeneous aggregate of data elements
identified by names
• Ex:
(Ada)
EMPLOYEE_RECORD :
record
EMPLOYEE_NAME :
record
FIRST : STRING (1..20);
LAST : STRING (1..20);
end record;
SALARY : FLOAT;
end record
PLLab, NTHU,Cs2403 Programming Languages
Record types (con’t)
• C/C++: struct
• Field reference
– COBOL: field OF Name1 OF … OF NameN
– Others: Name1.Name2. … .NameN.field
– Pascal and Module-2 provide with clause
with employee do
begin
name := ‘Bob’;
age := 42;
end
• Implementation: field are stored in adjacent memory, and
accessed through offset address
PLLab, NTHU,Cs2403 Programming Languages
Array vs. Record
•
•
•
•
Array: same type of data
int data[100];
A = data[35];
&data[35] = 1340
• Record: diff type of data
• struct {
char name[32];
int age;
float salary; } dd;
• A = dd.age;
• &dd.age = 1632
data
size
lower
upper
dd
name
age
salary
PLLab, NTHU,Cs2403 Programming Languages
1200
4
0
0
1600
0
32
36
Union types
• Whose variables are
allowed to store diff type
values at diff time
– FORTRAN:
EQUIVALENCE
– C and C++: union (free
union: free from type
checking)
data
B
typedef union
{
float A; short B;
} Sample;
Sample data;
main( ) {
float R1 = 3.231;
short N1;
data.A = R1;
N1 = data.B;
// N1 = error number
}
A
PLLab, NTHU,Cs2403 Programming Languages
Pointer type
• Usage: indirect addressing & dynamic storage
management
• Operator: * (C, C++), access (Ada), ^ (Pascal)
• Problems:
– Dangling pointer: points to a heap-dynamic var that has
been deallocated
– Memory leakage (aka lost heap-dynamic var): an allocated
var that is no longer accessible
– These 2 problems cause a large part of bugs in software
int *p1, *p2;
p1 = (int) malloc(sizeof(int));
p2 = p1;
delete p1;
value = *p2;
int *p1, *p2;
p1 = (int) malloc(sizeof(int));
p2 = (int) malloc(sizeof(int));
p1 = (int) malloc(sizeof(int));
delete p1;
PLLab, NTHU,Cs2403 Programming Languages
Solution to pointer problems
• Tombstone approach
– Each dynamic storage has a special cell, called
tombstone, that points to this storage
– Pointer(s) point to the tombstone
– Deallocate: set tombstone to NULL
– Disads: cost in time & space as tombstone is never
deallocated and reused
Heap
Heap
dynamic
variable
0
tombstone
dangling pointers
PLLab, NTHU,Cs2403 Programming Languages
dynamic
variable
Solution to pointer problems
• Locks-and-keys approach
– Pointer value represented by (key, address)
– Dynamic storage represented by (lock,
storage)
– Each access will first check if (key == lock),
unmatched pair causes run-time error.
– Deallocation: lock  invalid value
PLLab, NTHU,Cs2403 Programming Languages
Heap management for
reclaiming garbage
• Reference counters:
–
–
–
–
maintaining a counter in every cell
reclaim if (counter == 0)
Space required is significant
Execution time overhead for mantaining the counter
• Garbage collection:
– Run-time system allocated storage
– Garbage collection if no available cells
– 3 phases:
• Invalidate all cell
• Trace pointer to mark valid ones
• Sweep the invalid to available list
PLLab, NTHU,Cs2403 Programming Languages
Descargar

Document