Kernfach
System Software
WS04/05
P. Reali
M. Corti
1
Introduction
Admin

Lecture
–
–

Mo 13-14
We 10-12
IFW A 36
IFW A 36
Exercises
–
Always on Thursday
14-15
14-15
15-16
15-16
16-17
16-17
IFW A34
IFW C42
IFW A32.1
RZ F21
IFW A34
IFW A32.1
C. Tuduce
V. Naoumov
I. Chihaia
C. Tuduce
T. Frey
K. Skoupý
(E)
(E)
(E)
(E)
(E)
(E)
2
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Additional Info

Internet
–
–

Homepage http://www.cs.inf.ethz.ch/ssw/
Inforum
vis site
Textbooks & Co.
–
–
–
–
Lecture Slides
A. Tanenbaum, Modern Operating Systems
Silberschatz / Gavin, Operating Systems Concepts
Selected articles and book chapters
3
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Exercises

Exercises are optional
(feel free to shoot yourself in the foot)
–
Weekly paper exercises
test the knowledge acquired in the lecture
identify troubles early
exercise questions are similar to the exam ones
–
Monthly programming assignment
feel the gap between theory and practice
4
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Exam



Sometimes in March 2005
Written, 3 hours
Allowed help
–
–

2 A4 page summary
calculator
Official Q&A session 2 weeks before the exam
5
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Lecture Goals

Operating System Concepts
–
–
–
–
bottom-up approach
no operating system course
learn most important concepts
feel the complexity of operating systems


Basic knowledge for other lectures / term assignments
–
–
–
6
there‘s no silver-bullet!
–
Compilerbau
Component Software
....
OS-related assignments
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
What is an operating system?
An operating system has two goals:
 Provide an abstraction of the hardware
–
–
–

ABI (application binary interface)
API (application programming interface)
hide details
Manage resources
–
–
time and space multiplexing
resource protection
7
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Operating system target machines
Targets
 mainframes
 servers
 multiprocessors
 desktops
 real-time systems
 embedded systems
Different goals and
requirements!
 memory efficiency
 reaction time
 abstraction level
 resources
 security
 ...
8
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Memory vs. Speed Tradeoff
Example: retrieve a list of names
memory time
1. Array
Nn
N
2. List
N(n+4) N/2
3. Bin. Tree
N(n+8) log(N)
4. Hash Table 3Nn
1
N = # names
n = name length
9
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Operating System as resource manager
... in the beginning
was the hardware!
Most relevant
resources:
 CPU
 Memory
 Storage
 Network
10
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Abstraction level
Lecture Topics
Virtual Machine
Process
Distributed
Object-System
Thread
Coroutine
Scheduling
Concurrency
Support
Object-Oriented
Runtime Support
Distributed
File-System
Garbage Collection
Memory Management
Demand Paging
Virtual Memory
11
File System
Runtime support
CPU
© P. Reali / M. Corti
Memory
Disk
Network
System-Software WS 04/05
Introduction
A word of warning....
Most of the topics may seem simple.....
.... and in fact they are!
Problems are mostly due to:
 complexity when integrating system
 low-level („bit fiddling“) details
 bootstrapping (X needs Y, Y needs X)
12
© P. Reali / M. Corti
System-Software WS 04/05
Introduction
Bootstrapping (Aos)
Timers
SMP
Active
Traps
Interrupts
Modules
Storage
Level
Memory
13
© P. Reali / M. Corti
Locks
Processor
System-Software WS 04/05
Introduction
Lecture Topics
Runtime Support

Virtual Addressing

Memory Management

Distributed Obj. System

Concurrency
Concurrency

Disc / Filesystem

Case Study: JVM
Jan‘05


Feb‘05
Overview
Dec‘04 Nov‘04 Oct‘04

14
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Overview

Support for programming abstractions
–
Procedures


–
Object-Oriented Model


–
–
calling conventions
parameters
objects
methods (dynamic dispatching)
Exceptions Handling
... more ...
15
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Application Binary Interface (ABI)
Object a, b, c, … with methods P, Q, R, … and
internal procedures p, q, r, …
Stack
Call Sequence
16
Call a.P
Stack
Call b.Q
Pointer (SP)
Call b.q 1
Call b.q 2
Return b.q
Return b.q 3
Procedure
Call c.R
Return c.R 4 Activation
Frame (PAF)
Return b.Q
a.P
© P. Reali /Return
M. Corti
b.q
c.R
b.q
b.q
b.Q
b.Q
b.Q
b.Q
a.P
a.P
a.P
a.P
1
2
3System-Software4WS 04/05
Run-time Support
Procedure Activation Frame
Frame
Pointer (FP)
© P. Reali / M. Corti
Return
Remove Locals
Restore FP
Restore PC
Remove Parameters
Restore Registers
Caller
17
FP‘
PC
params
Callee
Caller
Frame
locals
Caller
Stack
Pointer (SP)
Save Registers
Call Push Parameters
Save PC
Branch
Dynamic
Save FP
Link
FP := SP
Allocate Locals
System-Software WS 04/05
Run-time Support
Procedure Activation Frame, Optimizations

Many optimizations are possible
–
–
–
–
use registers instead of stack
register windows
procedure inlining
use SP instead of FP addressing
18
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Procedure Activation Frame (Oberon / x86)
Caller
Callee
push params
call P
push pc
pc := P
...
push fp
mov fp, sp
sub sp, size(locals)
mov
pop
ret
sp, fp
fp
size(params)
pop pc
add
sp,size(params)
19
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Calling Convention

Convention between caller and callee
–
how are parameters passed




–
stack layout


–
data layout
left-to-right, right-to-left
registers
register window
dynamic link
static link
register saving

reserved registers
20
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Calling Convention (Oberon)

Parameter passing:
–
–
–
–

Stack
–
–

21
on stack (exception: Oberon/PPC uses registers)
left-to-right
self (methods only) as last parameter
structs and arrays passed as reference, value-parameters
copied by the callee
dynamic link
static link as last parameter (for local procedures)
Registers
–
saved by caller
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Calling Convention (C)

Parameter passing:
–
–
–

Stack
–

on stack
right-to-left
arrays passed as reference (arrays are pointers!)
dynamic link
Registers
–
some saved by caller
22
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Calling Convention (Java)

Parameter passing
–
–
–
–
–
left-to-right
self as first parameter
parameters pushed as operands
parameters accessed as locals
access through symbolic, type-safe operations
23
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Object Oriented Support, Definitions
Class Hierarchy
ObjA
ObjB
Obj x = new ObjA();
• static type of x is Obj
• dynamic type of x is ObjA
x compiled as being compatible with
Obj, but executes as ObjA.
Obj
Obj0
24
© P. Reali / M. Corti
Polymorphism
static and dynamic type can be different
 the system must keep track of the
dynamic type with an hidden
„type descriptor“
System-Software WS 04/05
Run-Time Support
Polymorphism
Type is
statically
known!
Type is
discovered
at runtime!
25
© P. Reali / M. Corti
VAR
t: Triangle;
s: Square;
o: Figure;
BEGIN
t.Draw();
s.Draw();
o.Draw();
END;
WHILE p # NIL DO
p.Draw(); p := p.next
END;
System-Software WS 04/05
Run-time Support
Object Oriented Support, Definitions
Class Hierarchy
ObjA
ObjB
Obj
Obj0
Obj x = new ObjA();
if (x IS ObjA) { ... }
// type test
ObjA y = (ObjA)x
// type cast
x = y;
// type coercion
// (automatic convertion)
26
© P. Reali / M. Corti
System-Software WS 04/05
Run-time Support
Object Oriented Support (High-level Java)
Type Test Implementation
.... a IS T ....
27
© P. Reali / M. Corti
if (a != null) {
Class c = a.getClass();
while ((c != null) && (c != T)) {
c = c.getSuperclass();
}
return c == T;
} else {
return false;
}
System-Software WS 04/05
Run-Time Support
Type Descriptors
struct TypeDescriptor {
int
level;
type[]
extensions;
method[] methods;
}
class Object {
TypeDescriptor
}


many type-descriptor
layouts are possible
layout depends on the
optimizations choosen
type;
28
© P. Reali / M. Corti
System-Software WS 04/05
“extension level”
Run-Time Support
Type Tests and Casts
TD(ObjA)
3: NIL
2: ObjA
1: Obj
0: Obj0
TD(Obj)
3: NIL
2: NIL
1: Obj
0: Obj0
ObjA
TD(Obj0)
3: NIL
2: NIL
1: NIL
0: Obj0
ObjB
Obj
1
Obj0
0
(obj IS T)
obj.type.extension[ T.level ] = T
29
© P. Reali / M. Corti
2
mov EAX, obj
mov EAX, -4[EAX]
cmp T, -4 * T.level - 8[EAX]
bne ....
System-Software WS 04/05
Run-time Support
Object Oriented Support (High-level Java)
Method Call Implementation
.... a.M(.....) ....
30
Class[] parTypes = new Class[params.Length()];
for (int i=0; i< params.Length(); i++) {
parTypes[i] = params[i].getClass();
}
Class c = a.getClass();
Method m = c.getDeclaredMethod(“M”, parTypes);
res = m.invoke(self, parValues);
© P. Reali / M. Corti
Use method
implementation
for the actual
class
(dynamic type)
System-Software WS 04/05
Run-Time Support
Handlers / Function Pointers
TYPE
SomeType = POINTER TO SomeTypeDesc;
Handler = PROCEDURE (self: SomeType; param: Par);
SomeTypeDesc = RECORD
handler: Handler;
next: SomeType;
END
root
31
Disadvantages:
• memory usage
• bad integration (explicit self) handler
next
• non constant
Advantages:
• instance bound
• can be changed at run-time
© P. Reali / M. Corti
PROC R
handler
next
PROC Q
handler
next
System-Software WS 04/05
Run-Time Support
Idea:
have a per-type table
of function pointers.
Method tables (vtables)
TYPE
A = OBJECT
PROCEDURE M0;
PROCEDURE M1;
END A;
B = OBJECT (A)
PROCEDURE M0;
PROCEDURE M2;
END B;
32
B.M0 overrides
A.M0
B.M2
is new
A.MethodTable
0: A.M0
1: A.M1
B.MethodTable
0: A.M0
1: A.M1
2: B.M2
• New methods add a new entry in the method table
• Overrides replace an entry in the method table
• Each method has an unique entry number
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Method tables
TYPE
A = OBJECT
PROCEDURE M0;
PROCEDURE M1;
END A;
A.MethodTable
0: A.M0
1: A.M1
B.MethodTable
0: A.M0
B.M0
B = OBJECT (A)
PROCEDURE M0; 1: A.M1
PROCEDURE M2; 2: B.M2
END B;
o
33
Type
Virtual Dispatch
o.M0;
call o.Type.Methods[0]
mov eax, VALUE(o)
mov eax, type[eax]
mov eax, off + 4*mno[eax]
call eax
Fields
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Oberon Type Descriptors
type desc
td size
type name
• method table
• superclass table
• pointers in object for GC
mth table
for method invocation
ext table
type descriptor is
also an object!
for type checks
type desc
type desc
obj size
obj fields
ptr offsets
for object allocation
for garbage collection
34
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interfaces, itables
interface A {
void m();
}
interface B {
void p();
}
does x
implement A?
Object x;
A y = (A)x;
y.m();
x has an method table
(itable) for each
implemented interface
multiple itables:
how is the right itable
discovered?
35
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interface support
How to retrieve the right method table (if any)?
 Global table indexed by [class, interface]
 Local (per type) table / list indexed by
[interface]

Many optimizations are available
use the usual trick:
enumerate
interfaces
36
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interface support (I)
Call is expensive
because requires
traversing a list:
O(N) complexity
Type Descriptor
interfaces
Intf0
Intf7
method
table
(vtable)
method
table
(itable)
method
table
(itable)
Intf0 y = (Intf0)x;
y.M();
37
© P. Reali / M. Corti
interface i = x.type.interfaces;
while ((i != null) && (i != Intf0) {
i = i.next;
}
if (i != null) i.method[mth_nr]();
System-Software WS 04/05
Run-Time Support
Interface support (II)
Lookup is fast
(O(1)), but wastes
memory
Type Descriptor
interfaces
vtable
0 1 2 3 4 5 6 7
sparse array!
Intf0 y = (Intf0)x;
y.M();
itable2
interface i = x.type.interfaces[Intf0];
itable7
if (i != null) i.method[mth_nr]();
38
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interface Implementation (III)
overlap
interface table
index
Type Descriptor t
interfaces
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
Type Descriptor u
interfaces
vtablet
vtablet
itableu,2
itablet,2
itableu,0
39
itablet,7
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interface Implementation (III)
Type Descriptor
interfaces
vtable
itable
itable
overlapped
interface table index
Type Descriptor
interfaces
vtable
itable
itable
40
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Interface Implementation (III)
Type Descriptor
interfaces
overlapped
interface
tables
Intf0 y = (Intf0)x;
y.M();
vtable
itable i = x.type.interfaces[Intf0];
itable
if ((i != null) && (i in x.type))
i.method[mth_nr]();
itable
itable
41
itable
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Exceptions
void catchOne() {
try {
tryItOut();
} catch (TestExc e) {
handleExc(e);
}
}
void catchOne()
0 aload_0
1 invokevirtual tryItOut();
4 return
5 astore_1
6 aload_0
7 aload_1
8 invokevirtual handleExc
11 return
ExceptionTable
From To
Target Type
0
4
5
TestExc
42
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Exception Handling / Zero Overhead
try {
.....
} catch (Exp1 e) {
.....
} catch (Exp2 e) {
.....
}
43
pcstart
pcend
void ExceptionHandler(state)
{
pc = state.pc, exc = state.exception;
while (!Match(table[i], pc, exc))
{
i++;
if (i == TableLength) {
PopActivationFrame(state);
pc = state.pc; i = 0;
}
}
state.pc = table[i].pchandler;
ResumeExecution(state)
pchandler1
pchandler2
Global Exception Table
start end exception handler
pcstart pcend Exp1 pchandler1
pcstart pcend Exp2 pchandler2
}
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Exception Handling / Zero Overhead



exception table filled by the loader / linker
traverse whole table for each stack frame
system has default handler for uncatched
exceptions
system optimized
for normal case


no exceptions => no overhead
exception case is expensive
44
© P. Reali / M. Corti
System-Software WS 04/05
push catch
descriptors on
the stack
Run-Time Support
Exception Handling / Fast Handling
try {
.....
} catch (Exp1 e) {
.....
} catch (Exp2 e) {
.....
}
add code
instrumentation
pchandler1
pchandler2
use an
exception stack to
keep track of
the handlers
45
© P. Reali / M. Corti
try {
save (FP, SP, Exp1, pchandler1)
save (FP, SP, Exp2, pchandler2)
.....
remove catch descr.
jump end
} catch (Exp1 e) {
.....
remove catch descr.
jump end
} catch (Exp2 e) {
.....
remove catch descr.
jump end
}
System-Software WS 04/05
end:
Run-Time Support
Exception Handling / Fast Handling
void ExceptionHandler(ThreadState state)
{
int FP, SP, handler;
Exception e;
pop next exception
descriptor from
exception stack
do{
retrieve(FP, SP, e, handler);
} while (!Match(state.exp, e));
state.fp = FP;
// set frame to the one
state.sp = SP;
// containing the handler
state.pc = handler; // resume with the handler
ResumeExecution(state)
can resume in a
different activation
frame
}
46
© P. Reali / M. Corti
System-Software WS 04/05
Run-Time Support
Exception Handling / Fast Handling



code instrumentation
insert exception descriptor at try
remove descriptor before catch
system optimized
for exception case


fast exception handling
overhead even when no exceptions
47
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Overview



Virtual Addressing: abstraction of the MMU
(Memory Management Unit)
Work with virtual addresses, where
addressreal = f(addressvirtual)
Provides decoupling from real memory
–
–
–
virtual memory
demand paging
separated address spaces
48
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Pages
programs use and
run in this
address spaces
Memory as array of pages
page
frame
7
6
5
4
3
2
1
0
unmapped
(invalid) page
virtual address-space 2
unmapped
range
5
3
0
page
1
2
0
real memory:
pool of page frames
memory address
virtual address-space 1
49
mapping
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Page mapping
Virtual Address  Real Address
Virtual Address
page-no
off
TLB
(PT, VA, RA)
(PT, VA, RA)
(PT, VA, RA)
MMU
frame
page-no
Page Table Ptr
50
Register
© P. Reali / M. Corti
Page Table
Real Address
frame
off
Page Frame
Translation
Lookaside Buffer
Associative Cache
off
frame
Real Memory
System-Software WS 04/05
Virtual Addressing
Definitions
page
page frame
page table
page fault
working set
smallest unit in the virtual address space
unit in the physical memory
table mapping pages into page frames
access to a non-mapped page
pages a process is currently using
51
© P. Reali / M. Corti
System-Software WS 04/05
64 bit
Address
Space
Virtual Addressing
Alternate Page Mapping
1. Level Table
2. Level Table
Multilevel page tables
 Multipart Virtual Address
 Page table as (B*-)Tree
pno1 pno2
Inverted Page-Table
Process
N
pr, vp
Hashtable
© P. Reali / M. Corti
vp
Next
probe
pf
pr
vp
pf
unassigned
pr
pf
vp
Hash
52
pr
1
0
off
pf
pr
vp
pf
pr
unassigned
vp
pf
System-Software WS 04/05
Virtual Addressing
What for?

Decoupling from real memory
–
–
–
virtual memory (cheat: use more virtual memory than the
available real memory)
dynamically allocated contiguous memory blocks (for
multiple stacks in multitasking systems)
some optimizations



null reference checks
garbage collection (using dirty flag)
Virtual Addressing is not for free!
–
–
address mapping may require additional memory accesses
page table takes space
53
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Virtual Memory
 Use secondary storage (disc) to keep currently unused
pages (swapping)
 Page table usually keeps some per-page flag
 invalid
 referenced
 dirty
page not mapped
page has been referenced
page has been modified
 Accessing an invalid page causes a page-fault
interrupt
54
 select page frame to be swapped out (victim or candidate)
 swap-in requested page frame
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Virtual Memory / Demand Paging
requested
page
Real Memory
“Page-out”
Disc
“Page-in”
victim
set to invalid
55
Page Table
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Demand Paging Sequence
TLB
IF VA IN TLB THEN RETURN RA
ELSE Access Page Table;
IF Page invalid THEN Page-Fault
ELSE RETURN RA
END
END
56
Expected time to translate
VA into RA
E[t] = PTLB * tTLB +
PPT * tPT +
Pdisc * tdisc
© P. Reali / M. Corti
MMU
OS
PageFault
Handler
IF Free Page Frame exists THEN
Assign frame to VA
ELSE Search victim page;
IF victim page modified THEN
page-out to secondary storage
END;
Invalidate victim page;
Assign frame to VA
END;
Page-in from secondary storage;
Reset invalid flag
System-Software WS 04/05
Virtual Addressing
Example
Page size
Address size
4 KB
32 Bits
page offset:
12 Bits (4KB = 212)
page number: 20 Bits (32 - 12)
addressable memory: 232 = 4GB
page table size: 220 * 32 Bits = 4 MB
Real Memory 128 MB
page table overhead: ca. 3%
57
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Example
PTLB
TLB
mov EAX, @Addr
1-Ppage
1-PTLB
Page
Table
Memory
fault
Ppag
Disc
1 disc read
1 disc write
1 memory read
e
fault
E[t] = PTLB tTLB + (1- PTLB)(tPT + PPF tdisc + (1-PPF)tmem)
58
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Demand Paging: Page Replacement
Optimal Strategy (Longest Unused)
 Take the page, that will remain unused for the
longest time
 Requires oracle
NRU: ”Not Recently Used”
59
 Reset the referenced flag at each
tick
 Create page categories (good
candidate to bad candidate)
 choose best candidate
© P. Reali / M. Corti
Pref ref mod
3
0
0
2
0
1
1
1
0
0
1
1
System-Software WS 04/05
Virtual Addressing
Demand Paging: Page Replacement (2)
LRU: “Least Recently Used”
 Assumption:
not used in past ==> not used in the future
 Hardware implementation
 64-Bit time-stamp for each page
 Software implementation
0 1 1 1 t(i)
 “Aging”-Algorithm
 Choose page with lowest value
t
1 0 1 1 0 0 1 1 1 0
60
© P. Reali / M. Corti
Reference
Flag
1 1 1 0 t(i+1)
set if page accessed
System-Software WS 04/05
Virtual Addressing
Demand Paging: Page Replacement (3)
“Least Recently Created” LRC (FIFO)
 Page Lifespan as metric (old are swapped out)
 Chain sorted by creation time
 Bad handling for often-used pages
 Fix: “second chance” when accessed (ref flag set) during
the last tick
cur := earliest;
WHILE cur.ref DO
cur.ref := FALSE;
cur := cur.next
END
61
© P. Reali / M. Corti
next
earliest
Ref-Flag
System-Software WS 04/05
Virtual Addressing
Demand Paging: Page Replacement (4)

Strategies:
–
–

optimal
LRU / NRU / LRC
Exceptions:
–
“page pinning”: page cannot be swapped out

kernel code
62
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Example
working set
{1,2,3,4}
Accessed Pages: 1, 2, 1, 3, 4, 1, 2, 3, 4
Available Page Frames: 3
Page
1
Access
2
Ideal
1, 2 1, 2 1, 2, 3
1
PF!
FIFO
1
PF!
LRU
1
PF!
PF!
1
3
PF!
1, 2 1, 2 1, 2, 3
PF!
PF!
1, 2 1, 2 1, 2, 3
PF!
PF!
4
1
2
1, 2, 4
1, 2, 4 1, 2, 4 2, 3, 4 2, 3 ,4
PF!
2, 3, 4
PF!
1, 3, 4
PF!
3
4
PF!
3, 4, 1 4, 1, 2 1, 2, 3 2, 3, 4
PF!
PF!
PF!
PF!
1, 3, 4 1, 4, 2 4, 2, 3 4, 2, 3
PF!
PF!
63
© P. Reali / M. Corti
System-Software WS 04/05
Demand Paging
Belady’s Anomaly
LRC Strategie
• 3 Page Frames
9 Page Faults
• 4 Page Frames
10 Page Faults
64
Belady’s Anomaly:
More page frames cause
more page faults
© P. Reali / M. Corti
Page access
sequence
012301401234
012301444233
01230111422
0123000144
Victim
0123
01
xxxxxxx
xx
012333401234
01222340123
0111234012
000123401
Victim
012340
xxxx
xxxxxx
System-Software WS 04/05
Demand Paging
How many page frames per process?
Even Distribution
 Every process has the same amount of memory
 Thrashing
 every memory access causes a page-fault
 not enough page-frames for the current working-set
CPU-Load
System is swapping
instead of running
100 %
65
1
© P. Reali / M. Corti
2
n
n+1
Process Count
System-Software WS 04/05
Demand Paging
How many page frames per process? (2)
 Depending on the process needs (1)
 use Working-Set
Page Access
132233122333422111213331312341
Sliding
Window
66
{ 2, 3, 4 }
{ 1, 2, 3, 4 }
Working
Set
 Page Frames assigned according to the process’ working-set
size. Swap-out a process when not enough memory available.
© P. Reali / M. Corti
System-Software WS 04/05
Demand Paging
How many page frames per process? (3)
Depending on the process needs (2)
 use Page-Fault Rate
Page-Fault Rate
HIGH
LOW
Time
Swap out one
process
67
© P. Reali / M. Corti
Swap in
System-Software WS 04/05
Virtual Addressing
Aos/Bluebottle, Memory Layout Example
4 GB
Stacks
2 GB
Heap
68
Kernel
© P. Reali / M. Corti
• 128 KB per stack
• max. 32768 active objects
• first stack page allocated on process creation
PROCEDURE PageFault;
BEGIN
IF adr > 2GB THEN
add page to stack
ELSE
Exception(NilTrap)
END
END PageFault;
System-Software WS 04/05
Virtual Addressing
Example: UNIX, Fork
a UNIX Program consists of.....
Page Table
read-only
code
read-only
Process A
read-only
text
Fork()
data
read-only
read-only
data’
69
© P. Reali / M. Corti
Process B
read-only
“copy on write” read-write
System-Software WS 04/05
Virtual Addressing
OS Control

Oberon
–

Windows
–
–

no virtual memory
Virtual Memory configuration
Task Manager
Linux
–
–
Swap partition / Swap files
ps / top
70
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Segmentation

real memory

e.g. Intel x86
Problem
–
–
code
segment

data
segment
Solution
–
segment limit
640KB Max Memory
16bit addresses (i.e. 64KB)
–
–
work in a segment
code / data segments
check segment boundaries
segment base  Addrreal = Segbase+Offset
71
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Addressing
Summary

virtual addresses,
addressreal = f(addressvirtual)

Keywords
–
–

Decoupling from real memory
–
virtual memory
demand paging
separate address spaces
–
–
–
–
–
page
page frame
page table
page fault
page flags

–
page replacement strategy

–
–
dirty, used, unmapped
LRC, LRU, ideal, ...
swapping
thrashing, belady’s anomaly
72
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Overview

Abstractions for
applications
–
–

heap
memory blocks
( << memory pages)
Operations:
–
–
Allocate
Deallocate

Topics:
–
–
–
–
–
memory organization
free lists
allocation strategies
deallocation explicit
garbage collection





type-aware
conservative
copying / moving
incremental
generational
73
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Objects on the heap
Object Instances: a, b, c, d, …
Sequence:
dynamic
NEW(a)
allocation
NEW(b)
NEW(c)
DISPOSE(b)
NEW(d)
explicit
NEW(e)
disposal
74
d
c
b
e
!
a
„Heap“
e
not
enough
space
c
a
© P. Reali / M. Corti
e
e
Case 1 Case 2
System-Software WS 04/05
Memory Management
Problem overview

Problems




Solutions



75
Heap size limitation ( e, case 1)
External Fragmentation ( e, case 2)
Dangling Pointers (a points to b)
System-managed list of free blocks
(„free list“)
Vector of blocks with fixed size
(Bitmap, with 0=free, 1=used)
Automated detection and reclamation of unused blocks
(„garbage collection“)
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Theory: 50% rule



Assumption: stable state
M free blocks, N block allocated
50%-Rule: M = 1/2 N
N=A+B +C
M = 1/2 (2A + B + e)
ΔM = (C - A) / N
block disposal:
ΔM = 1 - p
block allocation:
(splitting likelihood)
2M = 2A + B + e
2M = 2A + N - A - C + e
2M = N + A - C + e
76
© P. Reali / M. Corti
A
BC B
B B
BC
e = 0,1, or 2
(C - A) / N = 1 - p
C - A - N + pN = 0
2M +e = pN
System-Software WS 04/05
Memory Management
Theory: Memory Fragmentation
QuickTim e™ and a
TIFF (Uncom pressed) decompressor
are needed to see thi s picture.
77
 { 50%-Rule }
(b/2)*F = H - b*B, /2*b*B = H - b*B

H/(b*B) = 1 + /2,  = 2/ - 2
© P. Reali / M. Corti
Critical
point
System-Software WS 04/05
Memory Management
Free-list management with a Bitmap

Idea
–
–

partition heap in blocks of size s
use bitmap to track allocated blocks
bitmap[i] = true  blocki allocated
loss due to
internal
fragmentation
Problems
–
–
internal fragmentation
round up block size to next multiple of s
map size
size is (heap_size / s) bits
78
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Free-list management with a list

List organization
–
sorted / non-sorted

–
one list / many lists (per size)

–
search is simpler, merging is more difficult
management data stored in the free block


merging of empty blocks is simpler with sorted list
size, next pointer
Operations
–
–
Allocation
Disposal with merge

find free blocks next to current block, merge into bigger free block
79
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Memory allocation strategies

block splitting:
–
if a free-block is bigger than the requested block, then it is split
used

80
used
free
used
used
used
used
take smallest fitting block
internal
fragmentation
 causes a lot of fragmentation
worst-fit
–

used
use first free block which is big enough
best-fit
–

used
first-fit
–

free
take biggest available block
quick-fit
–
best-fit but multiple free-lists (one per block size)
© P. Reali / M. Corti
 fast allocation!
System-Software WS 04/05
Memory Management
Buddy System (for fast block merging)


Blocks have size 2k
Block with size 2i has address
j*2i (last i bits are 0)

64
32
16
16
81
16
8
8
32
32
32

2k+1
Split
2k
2k-1
Merge
© P. Reali / M. Corti
Blocks with address
x=j*2i and (j XOR 1)*2i
are buddies (can be
merged into a block of
size 2i+1)
buddy = x XOR 2i
b1 xxxx 0 0000
b2 xxxx 1 0000
System-Software WS 04/05
Memory Management
Buddy System (for fast block merging)

Problem: only buddies can be merged
buddies

16
16
16
16
16
16
8
8
8
8
32
32
32
32
8
32
no buddies
Cascading merge
16
8
16
82
16
32
© P. Reali / M. Corti
32
32
System-Software WS 04/05
Memory Management
Buddy System (for fast block merging)

Allocation
–
allocate(8)
32
split
32
16
split
16
32
8 8
quick
fit
16
32
16
32
8
8
83
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Example: Oberon / Aos
Block size = k*32
free-lists for k = 1..9, one list for blocks > 9*32
Allocate quick-fit, splitting may be required
Free-list management and block-merging done
by the Garbage Collector
84
k * 32
96
64
32
© P. Reali / M. Corti
k * 32
96
initial
64
state
32
ALLOCATE(50)
Allocated Block
System-Software WS 04/05
Memory Management
Garbage Collection
Two steps:
1.
Free block detection
–
–
collector is aware of the
types traversed, i.e. know
which values are pointers
collector doesn’t know
which values are pointers
Block Disposal

–
–
conservative

2.
GC Characteristics
type-aware

–

return unused blocks to the
free-lists
85
© P. Reali / M. Corti

incremental
gc is performed in small steps to
minimize program interruption
moving / copying / compacting
blocks are moved around
generational
blocks are grouped in
generations; different treatment
or collection priority
Barriers
–
–
read
intercept and check every pointer
read operation
write
intercept and check every pointer
write operation
System-Software WS 04/05
Memory Management
Garbage Collection: Reference Counting
 Every object has a Reference counter rc
 rc = 0  Object is „Garbage“ write barrier
 Problems
 Overhead
p, q Pointers to Object
q := p
 no support for
circular structures
 Useful for...
86
 Module hierarchies
 DAG-Structures (z. B. LISP)
© P. Reali / M. Corti
p
INC p.rc
DEC q.rc
IF q.rc = 0 THEN
Collect q^
END;
q := p
rc >= 1
rc >= 1
q
rc
rc
M
A
B
C
D
System-Software WS 04/05
Memory Management
Garbage Collection: Mark & Sweep
 Mark-Phase (Garbage Detection)
 Compute the Root-set consisting of
 global pointers (statics) in each module
 local pointers on the stack in each PAF
 temporary pointers in the CPU’s registers
 Traverse the graph of the live objects starting from the root-set
with depth-first strategy; mark all reached objects.
 Sweep-Phase (Garbage Collection)
87
 Linear heap traversal. Non-marked blocks are inserted into
free-lists.
 Optimization: lazy sweeping (sweep during allocation,
allocation gets slower)
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: root-set
 Run-time support from object-system. Hidden
data structures with (compiler generated)
information about pointers (metadata). instanc
off2
global
pointer
off
Module
Descriptor
88
e
pointer
off1
off2
off1
off
Module
Data
Object
Instance
Type Tag
Typ
Descriptor
 Conservative approach. Guess which values
could be pointers and threat them as such
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Mark with Pointer Rotation/1
Problem:
Garbage collection called when free memory is low, but mark may
require a lot of memory
Solution:
Pointer rotation algorithm (Deutsch, Schorre , Waite)
+
+
–
–
89
–
Memory efficient
iterative
structures are temporarily inconsistent
non-concurrent
non-incremental
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Mark with Pointer Rotation/2
Simple case: list traversal
q
q
90
p
p
p.link
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Mark with Pointer Rotation/3
Generic case: structure traversal
q
p
91
q
p
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
MS .NET
Garbage Collection: Memory Compaction



nextavail Pointer: partition heap between allocated
and free space
Allocate: increment nextavail
Garbace Collector performs memory compaction
nextavail
ALLOC
GC
92
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Stop & Copy


Partition heap in from and to regions
Collection:
–
–
–
–

traverse objects in from, copy to to
leave forwarding pointer behind
requires read barrier
swap from and to
Characteristics
–
–
–
copying
incremental
(generational)
93
© P. Reali / M. Corti
access p
instrument code
with read barrier
IF p is moved THEN
replace p with forwarding pointer
END;
access p
System-Software WS 04/05
Memory Management
Garbage Collection: Stop & Copy
1
2
from
to
3
from
to
to
from
4
from
to
94
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Concurrent GC
„Stop-and-Go“ Approach
Mutator
GC Mutator
User
Process
GC
Mutator
„Incremental“ Approach
Mutator
95
Real-Time
Constraint
© P. Reali / M. Corti
Mutator
Mutator
Mutator
GC
System-Software WS 04/05
Memory Management
Garbage Collection: Tricolor marking
„Wave-front“ Model
State
Color
already traversed,
behind wave
being traversed,
on the wave
black
not reached yet,
in front of the wave
white
grey
96
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Tricolor marking / Isolation
Mutator can change pointers at any time
 Critical case: black  white
Remedy
 Write-Barrier
 color B gray
 color W gray
Write
Barrier
B
W
97
unreachable
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Backer‘s Treadmill
Free-Space
Heap: doublelinked chain of
objects
98
curscan
To-Space
© P. Reali / M. Corti
From-Space
System-Software WS 04/05
Memory Management
Garbage Collection: Backer‘s Treadmill
Free-Space
conservative
allocation
99
curscan
To-Space
© P. Reali / M. Corti
progressive
allocation
From-Space
System-Software WS 04/05
Memory Management
Garbage Collection: Backer‘s Treadmill
Free-Space
collect
reference
curscan
curscan
curscan
To-Space
From-Space
100
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Backer‘s Treadmill
State transitions after GC is complete
 From-Space + Free-Space  Free-Space
 ToSpace  FromSpace
NEW(x
Fragmentation
)x
 External: not removed
 Internal: depends on
y
supported block sizes curscan
Allocation
101
 conservative: black
 progressive: white
© P. Reali / M. Corti
Root Set
NEW(y)
System-Software WS 04/05
Memory Management
Generational Garbage Collection
collect where it is
garbage is most
likely to be found
Generations
 Expected object life
young  short life (temp data)
old  long life
 Generations G0, G1, G2
© P. Reali / M. Corti
GC
frequency
G0
high
G1
medium
G2
low
J
E
102
Gen
D
G
I
C
F
H
B
D
G
A
A
A
G0
G1
G2
special handling
for pointers
across different
generations
required
System-Software WS 04/05
Memory Management
Garbage Collection: Finalization
Finalization (after-use cleanup)
 User-defined routine when object is collected
 Establish Consistency
 save buffers
 flush caches
 Release Resources
 close connections
 release file descriptors
 Dangers:
103
 Resurrection of objects: objects added to live structures
 Finalization sequence is undefined
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: .NET Finalization Example
Rules:
garbage
objects with finalizer belong to
older generation
finalizer only called once
(ReRegisterForFinalize)
FinalizationQueue: live object
with finalizer
FreachableQueue: collected
objects to be finalized
Finalization executed by
different process for security
reasons
104
E
D
C
B
A
E
B
A
Finalization
Queue
A
Finalization
Queue
E
B
Freachable
Queue
GC
E
D
C
B
A
thread
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Weak Pointers
weak
pointer
 „Weak“ Pointers
 Objects referenced only
through a weak pointer can
be collected by the GC in
case of need
 Used for Caches and
Buffers

garbage
in use
garbage
Implementation
1. Weak Pointers are not
registered to the GC
2. Use a weak reference table
(indirect access)
weak reference
105
weak reference table
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Garbage Collection: Weak Pointers Example

Oberon: internal file list
–
–
–
system must keep track of open files to avoid buffer
duplication
file descriptor must be collected once user has no
more reference to it
use weak pointer in the system (otherwise would
keep file alive!)
106
© P. Reali / M. Corti
System-Software WS 04/05
Memory Management
Object Pools
 Application keeps a pool of preallocated object
instances; handles allocation and disposal
 Simulation discrete events
 Buffers in a file system
 Provide dynamic allocation in real-time system
107
PROCEDURE NewT (VAR p: ObjectT);
BEGIN
IF freeT = NIL THEN NEW(p)
ELSE p := freeT; freeT := freeT.next
END
PROCEDURE DisposeT (p: ObjectT);
END NewT;
BEGIN p.next := freeT; freeT := p
END DisposeT;
© P. Reali / M. Corti
System-Software WS 04/05
Garbage Collection, Recap
108
GC kinds:
 compacting
 copying
 incremental
 generational
Helpers:
 write barrier
 read barrier
 forwarding pointer
 pointer rotation
© P. Reali / M. Corti
Algorithms:
 Ref-Count
 Mark & Sweep
 Stop & Copy
 Mark & Copy (.NET)
 Baker’s Threadmill
–
–
Dijkstra / Lamport
Steele
System-Software WS 04/05
Distributed Object Systems
Overview

Goals
–
–

object-based approach
hide communication
details
Advantages
–
–
–
–
more space
more CPU
redundancy
locality
Problems
 Coherency
–

Interoperability
–
–
–

ensure that same object
definition is used
serialization
type consistency
type mapping
Object life-time
–
distributed garbage collection
109
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Architecture
Naming
Service
Client
Application
Server
Proxy
Call
Contex
t
Stub
Impl.
Message
Object
Broker
110
IDL-Compiler
© P. Reali / M. Corti
Object
Broker
IDL
Impl.
Skeleton
IDL-Compiler
System-Software WS 04/05
Remote Procedure Invocation
Overview

network byteordering
Problem
–
–
–
send structured
information from A to B
A and B may have
different memory layouts
“endianness”
Big-Endian: MSB before LSB
• IBM, Motorola, Sparc
12
1
0
34
–
How is 0x1234 (2 bytes)
representend in memory?
34
12
Little-Endian: LSB before MSB
• VAX, Intel
little end
first
111
© P. Reali / M. Corti
System-Software WS 04/05
Definitions

Serialization
–

Deserialization
–

conversion of an object‘s instance into a byte stream
conversion of a stream of bytes into an object‘s instance
Marshaling
–
gathering and conversion (may require serialization) to an
appropriate format of all relevant data, e.g in a remote method
call; includes details like name representation.
112
© P. Reali / M. Corti
System-Software WS 04/05
Remote Procedure Invocation
big-endian
representation
Protocol Overview

Protocols
–
RPC + XDR (Sun)


–
–
–
–
–
V2.0, February 1997
V3.0, August 2002
–

SOAP / XML (W3C)

–
–
IIOP / CORBA (OMG)

–
RFC 1014, June 1987
RFC 1057, June 1988
 XDR
...
V1.1, May 2000
–
–
–
–
–
Type System
[unsigned] Integer (32-bit)
[unsigned] Hyper-Integer (64-bit)
Enumeration (unsigned int)
Boolean (Enum)
Float / Double (IEEE 32/64-bit)
Opaque
String
Array (fix + variable size)
Structure
Union
Void
113
© P. Reali / M. Corti
System-Software WS 04/05
Remote Procedure Invocation
RPC Protocol


Remote Procedure Call
Marshalling of procedure
parameters
Client
114
PROCEDURE P(a, b, c)
• pack parameters
• send message to
server
• await response
• unpack response
© P. Reali / M. Corti



Message Format
Authentication
Naming
Server
Server
• unpack parameters
• find procedure
• invoke
• pack response
• send response
P(a, b, c)
System-Software WS 04/05
Distributed Object Systems
Details

References vs. Values
–
–
–
client receives reference to
remote object
data values are copied to
client for efficiency reasons
decide whether an object is
sent as reference or a value


serializable (Java, .NET),
valuetype (CORBA)
MarshalByRefObject (.NET),
java/RMI/Remote (Java),
default (CORBA)

object creation
–
–
–

object instances
–
–
–

server creates objects
client creates objects
server can return references
one object for all requests
one object for each requests
one object per proxy
conversation state
–
–
stateless
stateful
115
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Distr. Object Systems vs. Service Architecture

Dist. Object System
–
–
–
–
116
object oriented model
object references
stateful / stateless
tight coupling
internal communication
between application’s
tiers
© P. Reali / M. Corti

Service Architecture
–
–
–
–
OO-model / RPC
service references
stateless
loose coupling
external
communication
between applications
System-Software WS 04/05
Distributed Object Systems
Distr. Object Systems vs. Service Architecture
Remoting
RMI
© P. Reali / M. Corti
coupling
117
loose
• services
remote procedure calls
• stateless conversation
(session?)
• message
tight
• components / objects
(distributed object system)
• stateful and stateless
conversation
• transactions
CORBA
Web Services
environment
homogeneous
heterogeneous
System-Software WS 04/05
Distributed Object Systems
Type Mapping
Type System 1
Interoperability
Type System
Type System 2
Possible Types
Possible Types
Possible Types
Mappable
Types
Interop
Subset
Mappable
Types
118
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Type Mapping, Example
Java
Type System
CORBA
Type System
CLS
Type System
char
double
char
union
119
enum
enum
double
double
wchar
char
union
union
custom implementation
© P. Reali / M. Corti
custom implementation
System-Software WS 04/05
Distributed Object Systems
Examples

Standards
–
OMG CORBA

–
IIOP
Web Services

SOAP

Frameworks
–
–
–
Java RMI (Sun)
DCOM (Microsoft)
.NET Remoting (Microsoft)

IIOP.NET
120
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
CORBA
Common Object Request Broker Architecture
Client Application
Object
Remote Architecture
Client Stub
CORBA
Runtime
Interface
Repository
Client
Implementation
Repository
„Object-Bus“
ORB
Server
Object Skeleton
Object Adaptor
CORBA
Runtime
ORB
GIOP/IIOP
TCP/IP Socket
121
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
CORBA
–
CORBA is a standard from
OMG


–
Object Management Group
Common Object Request
Broker Architecture
–
CORBA defines...




CORBA is useful for...



building distributed object
systems
heterogeneous
environments
tight integration


an object-oriented type system
an interface definition language
(IDL)
an object request broker (ORB)
an inter-orb protocol (IIOP) to
serialize data and marshall method
invocations
language mappings from Java, C++,
Ada, COBOL, Smalltalk, Lisp,
Phyton
... and many additional standards
and interfaces for distributed
security, transactions, ...
122
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
CORBA

Basic Types
–
integers

–
–
–
–
–
16-, 32-, 64bit integers (signed
and unsigned)
–
–
struct
union
sequence (variable-length array)
array (fixed-length)
interface


–
8bit and wide
boolean
opaque (8bit), any
enumerations
–
–
32-, 64-bit and extendedprecision numbers
fixed point
char, string

Compound Types
–
IEEE floating point

–

value type




© P. Reali / M. Corti

pass-by-value
abstract (no state)
Operations

123
concrete (pass-by-reference)
abstract (pure definition)
in / out / inout parameters
raises
Attributes
System-Software WS 04/05
Distributed Object Systems
CORBA / General Inter-ORB Protocol (GIOP)

CDR (Common Data
Representation)
–
–
–

Variable byte ordering
Aligned primitive types
All CORBA Types supported

Message Format
–
–


IIOP (Internet IOP)
–
–

GIOP over TCP/IP
Defines Interoperable Object
Reference (IOR)



host
post
key
Defined in IDL
Messages



–
–
Byte ordering flag
Connection Management

124

© P. Reali / M. Corti
Request, Reply
CancelRequest,
CancelReply
LocateRequest,
LocateReply
CloseConnection
MessageError
Fragment
request multiplexing
asymmetrical / bidirectional
connectionsSystem-Software WS 04/05
Distributed Object Systems
CORBA / GIOP Message in IDL
module GIOP {
struct Version {
octet major;
octet minor;
}
enum MsgType_1_0 {
Request, Reply,
CancelRequest,
CancelReply,
LocateRequest,
LocateReply,
CloseConnection,
Error
}
struct MessageHeader {
char
Magic[4];
Version
GIOP_Version;
boolean
byte_order;
octet
message_size;
unsigned long
message_type;
}
} // module end GIOP
125
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
CORBA Services

CORBA Services
–
–

System-level services
defined in IDL
Provide functionality required
by most applications
–
–
–
Allows local or remote
objects to be located by
name
Given a name, returns an
object reference
Hierarchical directory-like
naming tree
Allows getting initial
reference of object
© P. Reali / M. Corti
Event Service
–
–
Naming Service
–
126

–

Allows objects to
dynamically register
interest in an event
Object will be notified
when event occurs
Push and pull models
... and more
–
Trader, LifeCycle,
Persistence, Transaction,
Security
System-Software WS 04/05
Distributed Object Systems
WebServices


Service-oriented architecture
Rely on existing protocols
–
SOAP

–
–
HTTP
service description protocol
UDDI

SOAP
messaging protocol
WSDL

Web Services
TCP/IP
service location protocol
127
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
SOAP





Simple Object Access
Protocol
communication protocol
XML-based
describes object values
XML Schemas as interface
description language
–
basic types

128
–
string, boolean, decimal,
float, double, duration,
datetime, time, date,
hexBinary, base64Binary,
URI, Qname, NOTATION

SOAP Message
–
–
–

SOAP Envelope
SOAP Header
SOAP Body
Method Call
–
–
–
packed as structure
messages are selfcontained
no external object
references
structured types

© P. Reali / M. Corti
list, union
System-Software WS 04/05
Distributed Object Systems
SOAP Message

SOAP Message
–

Example
SOAP Envelope


SOAP Header
SOAP Body
float Multiply(float a, float b);
129
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
SOAP Example (Request)
POST /quickstart/aspplus/samples/services/MathService/CS/MathService.asmx
HTTP/1.1
Host: samples.gotdotnet.com
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://tempuri.org/Multiply"
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<Multiply xmlns="http://tempuri.org/"> <a>float</a> <b>float</b> </Multiply>
</soap:Body>
</soap:Envelope>
130
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
SOAP Example (Answer)
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<MultiplyResponse xmlns="http://tempuri.org/">
<MultiplyResult>float</MultiplyResult>
</MultiplyResponse>
</soap:Body>
</soap:Envelope>
131
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
SOAP Example (Service Description-1)
132
<?xml version="1.0" encoding="utf-8"?>
<definitions ....>
<types>
<s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
<s:element name="Multiply">
<s:complexType><s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="a" type="s:float" />
<s:element minOccurs="1" maxOccurs="1" name="b" type="s:float" />
</s:sequence></s:complexType>
</s:element>
</s:schema>
</types>
<message name="MultiplySoapIn">
<part name="parameters" element="s0:Multiply" />
</message>
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
SOAP Example (Service Description-2)
133
<binding name="MathServiceSoap" type="s0:MathServiceSoap">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document"
/>
<operation name="Multiply">
<soap:operation soapAction="http://tempuri.org/Multiply" style="document" />
<input><soap:body use="literal" /></input>
<output><soap:body use="literal" /></output>
</operation>
</binding>
<service name="MathService">
<port name="MathServiceSoap" binding="s0:MathServiceSoap">
<soap:address
location="http://samples.gotdotnet.com/quickstart/aspplus/samples/services/MathS
ervice/CS/MathService.asmx" />
</port>
</service>
System-Software WS 04/05
</definitions>
© P.
Reali / M. Corti
Distributed Object Systems
WebServices

Comments
–
–
–
–
–
–
–
XML (easily readable)
system independent
standard
stateless (encouraged design
pattern)

Constraints
–
Services


–
no object references
server-activated servant
Goes over HTTP

requires web server
bloated
big messages (but easily
compressed)
requires expensive parsing
134
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
WebService Future


Use SOAP-Header to store additional information
about message or context
Many standards to come...
–
–
–
–
–
–
WS-Security
WS-Policy
WS-SecurityPolicy
WS-Trust
WS-SecureConversation
WS-Addressing
135
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Java RMI
Java Remote Method Invocation
Client Application
Object
Lookup
Register
Lookup
Register
Object Stub
Remote Architecture
Object Stub
Remote
References
Remote
References
Transport
Layer
Transport
Layer
136
Client
Network
Server
TCP/IP Socket
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Java RMI Details

Framework
–
supports various implementations

e.g. RMI/IIOP
–
–
mapping limited to the Java type system, workarounds
needed
uses reflection to inspect objects
137
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object-Systems
Low-Level Details: Java RMI/IIOP

Common Type-System
–


restricted CORBA
–
Marshalling
–
–
name mapping
remote objects

only references
Interface Description
Language (IDL)


java to IDL mapping
Message representation
Underlying protocol
–
IIOP (CORBA)
138
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Microsoft DCOM
Distributed Common Object Model
Client Application
Object Proxy
COM
Runtime
Client
Object
Remote Architecture
Object Stub
COM
Runtime
SCMs and Registration
SCM
SCM
Registry
Registry
Server
OXID Resolver
RPC Channel
139
Ping Server
© P. Reali / M. Corti
Network
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting
new Instace()
or
Activator.GetObject(...)
Channel
ObjRef
IChannelInfo
IEnvoyInfo
IRemotingTypeInfo
string
140
ChannelInfo;
EnvoyInfo;
TypeInfo;
URI;
Application Domain Boundary
Transparent
Proxy
Client
Instance
Channel
Network
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting
Client
Dispatcher
Message
Chan.Sink(s)
Formatter
Stream
Chan.Sink(s)
Transport
Sink
141
Instance
Message custom operations
Chan.Sink(s)
channel
channel
Instance s = new Instance();
s.DoSomething();
Proxy
Formatter
serialize object
Stream custom operations
Chan.Sink(s)
Transport handle communication
Sink
Network
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting
Activation
 client
–

–
one instance per activation
one instance of object
–
–
one instance per call

Leases (Object Lifetimes)
–
–
renew lease on call
set maximal object lifetime
SOAP

server / SingleCall
–

Serialization
server / Singleton
–


Warning: non-standard
types, only for .NET use
binary
user defined
Transport
–
–
–
TCP
HTTP
user defined
142
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting (Object Marshalling)

MarshalByRefObjects


remoted by reference
client receives an ObjRef
object, which is a“pointer“
to the original object

[Serializable]



ISerializable

AppDomain 1
Obj
AppDomain 2
Proxy
Serialized
ObjRef
143
© P. Reali / M. Corti
all fields of instance are
cloned to the client
[NonSerialized] fields are
ignored
AppDomain 1
Obj
object has method to define
own serialization
AppDomain 2
Obj‘
Serialized
fld1... fldn
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting, Activation

Server-Side Activation
(Well-Known Objects)
–
Singleton Objects

–
only one instance is allocated to process all
requests
SingleCall Objects


one instance per call is allocated
Client-Side Activation
–
“stateless”
Client Activated Objects

144
© P. Reali / M. Corti
“stateful”
the client allocates and controls the object on
the server
System-Software WS 04/05
Distributed Object Systems
Microsoft .NET Remoting, Limitations
–Server-Activated

object configuration limited to the default constructor
–Client-Activated



Objects
Objects
class must be instantiated, no access over interface
class hierarchy limitations
use Factory Pattern
–
to get interface reference
– to allow parametrization of the constructor
–Furthermore...


145
interface information is lost when passing an object reference to
another machine
no control over the channel
–
which channel is used
– which peer is allowed to connect
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET

Opensource project based on ETH-Diploma thesis
–

IIOP.NET (marketing)
–

http://iiop-net.sourceforge.net/
„Provide seamless interoperability between .NET and CORBAbased peers (including J2EE)“
IIOP.NET (technical)


146

© P. Reali / M. Corti
.NET remoting channel implementing the CORBA IIOP protocol
Compiler to make .NET stubs from IDL definitions
IDL definition generator from .NET metadata
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET
 IIOP rather than SOAP
 transparent reuse of
existing servers
 tight coupling
 object-level granularity
 efficiency
 Runtime: standard .NET
remoting channel for IIOP
 transport sink
 formatter
 type-mapper
147
 Build tools
 IDL  CLS compiler
 CLS  IDL generator
© P. Reali / M. Corti
.NET
client
binary
.NET
server
Java
client
IIOPIIOP
J2EE
server
CORBA
objects
Java Type System
IDL Type System
CLS Type System
Possible Types
Possible Types
Possible Types
IDL
Mappable
Types
Interop
Subset
IDL
Mappable
Types
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET, Interoperability
148
Application
This is what we want
Services
Distributed Transaction Coordinator, Active Directory, …
Conversation
Activation model (EJB, MBR), global naming,
distributed garbage collection, conversational state,…
Contextual Data
Interception Layer
SessionID, TransactionID, cultureID, logical threadID …
Message Format
RPC, IIOP, HTTP, SOAP, proprietary binary format,
messages, unknown data (exceptions), encryption
Data Model
Type system, mapping and conversion issues
Communication
Protocols
TCP/UDP, Byte stream, point-to-point communication
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET, Granularity
System
Service
Component
Component
Object Object
Object Object
Service
Component
Object Object
Granularity
149
Service
Message-based
Coupling,
Interface,
Interaction
Stateless
© P. Reali / M. Corti
Component
Object
Strongly-typed
Interface,
Stateless or
Stateful
Implementation
Dependency,
Stateful
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET
2nd Article
1st Article
1.6
1.5
1.4
1.3
1.2
1.1
1.0
150
© P. Reali / M. Corti
System-Software WS 04/05
Distributed Object Systems
Case Study: IIOP.NET, Performance
 Test
Case:
–WebSphere
–Clients



5.0.1 as server
–Response

IBM SOAP-RPC Web Services
IBM Java RMI/IIOP
IIOP.NET
time
receiving 100 beans from
server
–
WS:
– IIOP.NET:

4.0 seconds
0.5 seconds
when sending many more
beans, WS are then 200%
slower than IIOP.NET
Source:
posted on IIOP.NET forum
151
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Introduction
 CPU
as resource, provide
abstraction to it
 Allow multiprogramming
–
–
pseudo-parallelism
(single-processors)
real parallelism
(multi-processors)
 Required
–
–
–

Topics
–
–
–
–

abstractions
multiple activities -- execution
of instructions
protection of resources
synchronization of activities
coroutines
processes
threads
scheduling

–
fairness
starvation
synchronization

deadlocks
152
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Multithreading
Stack
2
Stack
1
Call a.run
Call b.Q
Call b.q
Call b.q
Threa
Return b.q
d1
Return b.q
Call c.R
Return c.R
Return b.Q
Return a.run
153
2
1
b.q
b.Q
a.run
2
1
time
© P. Reali / M. Corti
b.q
Call c.run
Call d.Q
Call d.q
Call d.q
Threa
Return d.q
d2
Return d.q
Call e.R
Return e.R
Return d.Q
Return c.run
e.R
d.Q
c.run
2
1
time
time
System-Software WS 04/05
Processes and Threads
Coroutines (1)

Coroutines
–
–
–
each activity has its own stack, address-space is
shared
explicit context switch (stack only) under
programmer‘s control
uses Transfer call switch to another coroutine
154
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Coroutines (2)
Subroutines
Call
Call
Return
Return
Coroutinen
Transfer
Start
Start
155
Transfer
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Coroutines (3)
TYPE Coroutine = POINTER TO RECORD
FP: LONGINT;
stack: POINTER TO ARRAY OF SYSTEM.BYTE;
END;
VAR cur: Coroutine; (* Current Coroutine *)
PROCEDURE Transfer*(to: Coroutine);
BEGIN
SYSTEM.GETREG(SYSTEM.EBP, cur.FP);
cur := to;
SYSTEM.PUTREG(SYSTEM.EBP, cur.FP);
END Transfer;
PUSH EBP
SUB ESP, 4
save FP
restore FP
MOV ESP, EBP
POP EBP
RET 4
156
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Transfer(Q)
Coroutines (4)
locals
FP’
PC’
to’
SP
FP
stackP
SP
FP
stack
locals
FP”
pcx
Q
locals
FP’
PC’
to’
stackP
stack
Q
FP := Q.FP
locals
FP”
pcx
Q
157
stackP
© P. Reali / M. Corti
SP
FP
Q
locals
FP’
PC’
to’
locals
FP”
pcx
Q
stack
stackP
Q
return
jump at PC’
SP
FP
stack
System-Software WS 04/05
Q
Processes and Threads
Coroutines (5)



158
Current stack: current execution state
All other stacks: top PAF (proc activation frame) contains last
Transfer call
Start: create stack with fake Transfer-like PAF
PROCEDURE Start(C: Coroutine; size: LONGINT);
BEGIN
NEW(C.stack, size);
tos := SYSTEM.ADR(C.stack[0])+LEN(C.stack);
SYSTEM.PUT(tos-4, 0); (* par = null *)
SYSTEM.PUT(tos-8, 0); (* PC’ = null, not allowed to return *)
SYSTEM.PUT(tos-12, 0); (* FP’ *)
cur.FP := tos-12;
END;
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Problems caused by multitasking

Concurrent access to
resources
–
–

protection
limit access to a resource
synchronization
synchronize task with
resource state or other task
Concurrent access to CPU
–
–

One problem’s solution
is another problem’s
cause....
–
–
–
deadlocks
fairness
deadlines / periodicity
constraints
task priorities
scheduling
159
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Protection: Mutual Exclusion
Mutual Exclusion
only one activity is allowed to access one
resource at a time
disable interrupts (single CPU only, avoid switches)
locks
flag: lock taken / lock free
spin lock (uses busy waiting)
exclusive lock
read-write lock (multiple reader, one writers)
160
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Protection: Monitor
Shared resources as Monitor
 resources are passive objects
 execution of critical sections inside monitor is
mutually exclusive
 Global Monitor Lock
 Shared Monitor Lock for read-access (optional) Resource
 monitor as a special module
[original version (Hoare, Brinch Hansen)]
 object instance as monitor
161
 method and code block granularity
 Java, C#, Active Oberon, ...
© P. Reali / M. Corti
task P
task Q
acquire acquire
release
release
System-Software WS 04/05
Processes and Threads
Protection

162
one waiting queue
per resource is
required
Simplistic implementation with coroutines
Non-reentrant lock (no recursion allowed)
PROCEDURE Acquire(r: Resource);
BEGIN
IF r.taken THEN
InsertList(r.waiting, cur);
SwitchToNextRoutine()
ELSE
r.taken := TRUE
END
END Acquire;
© P. Reali / M. Corti
PROCEDURE Release(r: Resource);
BEGIN
next := GetFromList(r.waiting);
IF next # NIL THEN
InsertList(ready , next);
Transfer(GetNextTask());
ELSE
r.taken := FALSE
END
END Release;
System-Software WS 04/05
Processes and Threads
Protection
Shared resource as Process
synchronization during communication
Communicating Sequential Processes (CSP)
C.A.R. Hoare (1978)
Model of communication
„Rendez-vous“ between two processes
P!x (send x to process P)
Q?y (ask y from process Q) task P task Q
Used in Ada, Occam
163
© P. Reali / M. Corti
P?x
Q!z
task P
task Q
Q!z
P?x
System-Software WS 04/05
Processes and Threads
Protection

Some variations on the theme....
–
–
Reentrant Locks
Readers / Writers

–
Binary Semaphores

–
one writer or multiple readers allowed
one activity can get the resource
Generic Semaphores

N activities are allowed to get the resource
164
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Synchronization
 Wait on a condition / state
 Signals with Send/Wait Methods
 Require cooperation from all processes
 Example: Producer/Consumer with conditions
nonempty/nonfull
 Semantic of Send
 Send-and-Pass vs. Send-and-Continue
 Generic system-handled conditions (Active Oberon)
 AWAIT(x > y);
 Wait on partner process
165
 CSP
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Synchronization: Implementation Example
Process list
 double-chained list of all coroutines
 cur points to current (running) coroutine
 each signal has a LIFO list
ready
C1
C4
C2
link
s
Signal
166
© P. Reali / M. Corti
C3
C5
cur
System-Software WS 04/05
Processes and Threads
Synchronization: Implementation Example
Terminate
cur.next.prev := cur.prev;
cur.prev.next := cur.next;
Schedule
Schedule
prev := cur;
WHILE ~cur.ready & cur.next # prev DO
cur := cur.next
END;
167 IF cur.ready THEN Transfer(cur) ELSE (*deadlock*) END
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Synchronization: Implementation Example
Init(s)
s := NIL
Wait(s)
cur.link := s; s := cur; cur.ready := FALSE;
Schedule (*to next ready from cur*)
Send(s)
IF s # NIL THEN (*send-and-pass*)
cur := s; s.ready := TRUE; s := s.link
END;
Schedule (*to next ready from cur*)
168
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Active Oberon: Bounded Buffer
Buffer* = OBJECT
VAR
data: ARRAY BufLen OF INTEGER;
in, out: LONGINT;
(* Put - insert element into the buffer *)
PROCEDURE Put* (i: INTEGER);
BEGIN {EXCLUSIVE}
(*AWAIT ~full *)
AWAIT ((in + 1) MOD BufLen # out);
data[in] := i; in := (in + 1) MOD BufLen
END Put;
169
© P. Reali / M. Corti
(* Get - get element from the buffer *)
PROCEDURE Get* (VAR i: INTEGER);
BEGIN {EXCLUSIVE}
(*AWAIT ~empty *)
AWAIT (in # out);
i := data[out]; out := (out + 1) MOD BufLen
END Get;
PROCEDURE & Init;
BEGIN
in := 0; out := 0;
END Init;
END Buffer;
System-Software WS 04/05
Processes and Threads
CSP: Bounded Buffer (I)
[bounded_buffer || producer || consumer]
producer ::
*[<produce item>
bounded_buffer ! item;
]
consumer ::
*[bounded_buffer ? item;
<consume item>
]
170
Geoff Coulson
Lancaster University
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
CSP: Bounded Buffer (II)
bounded_buffer ::
buffer: (0..9) item;
in, out: integer;
in := 0; out := 0;
*[
in < out+10; producer ? buffer(in mod 10)
-> in := in + 1;
||
out < in; consumer ! buffer(out mod 10)
-> out := out + 1;
]
171
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Process State

Process states
1.
Running
3
1
2.
2
3.
–
Blocked
Ready
Process state transitions
1.
2.
4
Running: actually using the
CPU
Ready: waiting for a CPU
Blocked: unable to run,
waiting for external event
3.
4.
wait for external event
system scheduler
system scheduler
external event happens
172
© P. Reali / M. Corti
System-Software WS 04/05
Processes and Threads
Process State (Active Oberon)

Active Oberon provides
–
Running
–
Awaiting
Object
Awaiting
Condition


Ready
monitor-like object
protection
conditions
Condition are checked
by the system.
No explicit help or
knowledge from user is
required (no x.Signal)
173
© P. Reali / M. Corti
System-Software WS 04/05
Activities


Program (static concept) ≠ Process (dynamic)
Processes, jobs, tasks, threads (differences later)
–
–
program code
context:
 program counter (PC) and registers
 stack pointer
 state
–
–
–
–
–
–
–
174
[new]
running
waiting
ready
[terminated]
stack
data section (heap)
© P. Reali / M. Corti
System-Software WS 04/05
Processes vs. Threads

Process or job
(heavyweight)
–
–
–
–
–
code
address space
processor state
private data
(stack+registers)

Thread (lightweight)
–
–
–
–
shared code
shared address space
processor state
private data
(stack+registers)
can have multiple
threads
Kernel
CPU
175
© P. Reali / M. Corti
System-Software WS 04/05
Processes vs. Threads: Example
176
HEAP 1
HEAP 2
HEAP
STACK 1
STACK 2
STACK 1
PROC 1
PROC 2
PROC
instr
instr
instr
instr
instr
instr
…
…
…
instr
instr
instr
© P. Reali / M. Corti
STACK 2
System-Software WS 04/05
Multitasking

Programmed events that can cause a task
switch
synchronous
–
protection (locks)


–
synchronization

asynchronous

177

acquire
release
wait on a condition
send a signal (send-and-pass)
System events that can cause a task switch
–
–
–
voluntary switch (“yield”, task termination)
process with higher priority becomes available
consumption of the allowed time quantum
task
preemption
© P. Reali / M. Corti
System-Software WS 04/05
Preemption


Assign each process a time-quantum (normally in
the order of tens of ms)
Asynchronous task switches can happen at any
time!
–
–

task can be in the middle of a computation
save whole CPU state (registers, flags, ...)
Perform switch
–
–
–
178
on resource conflict
on synchronization request
on timer-interrupt (time-quantum is over)
© P. Reali / M. Corti
System-Software WS 04/05
Context switch

Scheduler invocation:
–
–

Operations:
–
–
–
–
–

preemption  interrupt
cooperation  explicit call
store the process state (PC, regs, …)
choose the next process (strategy)
[accounting]
restore the state of the next process (regs, SP, PC, …)
jump to the restored PC
A context switch is usually expensive: 1–1000s
depending on the system and number of processes
–
179
hardware optimizations (e.g., multiple sets of registers –
SPARC, DECSYSTEM-20)
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling algorithms
Three categories of environments:
 batch systems (e.g., VPP, DOS)
–

interactive systems (UNIX, Windows, Mac OS)
–
–

usually non-preemptive (i.e., task is not stopped by
scheduler, only synchronous switches)
cooperative or preemptive
no task allowed to have the CPU forever
real-time systems (PathWorks, RT Linux)
–
180
timing constraints (deadlines, periodicity)
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling Performance


CPU utilization
Throughput
–
–

Turnaround time
–
–


= exit time - arrival time
execution, wait, I/O
Response time
–

number of jobs per time unit
minimize context switch penalty
= start time - request time
Waiting time (I/O, waiting, …)
Fairness
181
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling algorithm goals

All systems
–
–

give every task a chance
–
Response time

–
–
respond quickly
meet user’s expectations
© P. Reali / M. Corti
minimize time in system
CPU utilization


maximize number of jobs
Turnaround time

keep CPU busy
Real-time systems
–
Meet deadlines

–
Proportionality

182
–
keep all subsystems busy
Interactive systems
Throughput

Policy enforcement
Balance

Batch systems
–
Fairness

–

Predictability

–
avoid losing data
avoid degradation
Hard- vs. soft-real-time
systems
System-Software WS 04/05
Batch Scheduling Algorithms
Choose task to run (task is usually not preempted)
 First Come First Serve (FCFS)
–

Shortest Job First (SJF)
–

–
response ratio = (time in the system / CPU time)
depends on the waiting time
Highest Priority First
–

requires knowledge about job length
Longest Response Ratio
–

fair, may cause long waiting times
ETH-VPP is a
batch system!
Which algorithm
does it use?
with or without preemption
Mixed
–
183
the priority is adjusted dynamically (time in queue, length, priority, …)
© P. Reali / M. Corti
System-Software WS 04/05
Preemptive Scheduling Algorithms

Time sharing
–
–
Each task has a predefined time quantum
Round-Robin
 Schedule
next task on the ready list
next
P4
P1
–
Quantum choice:
next
P2
P3
 small:
may cause frequent switches
 big: may cause slow response
–
184
Implicit assumption: all task have same
importance
© P. Reali / M. Corti
System-Software WS 04/05
Preemptive Scheduling Algorithms

Priority scheduling
–

process with highest priority is scheduled first
Variants
–
–
multilevel queue scheduling
one list per priority, use round-robin on list
dynamic priorities


–
185
proportional to time in system
inversely proportional to part of quantum used
make time quantum proportional to priority
© P. Reali / M. Corti
System-Software WS 04/05
Real-Time Scheduling Algorithms



Task needs to meet the
deadline!
Task cost is known
(should)
Two task kind:
–
–

aperiodic
periodic
Reservation
–
186

Algorithms:
–
–
Rate Monotonic
Scheduling
assign static priorities
(priority proportional to
frequency)
Earliest Deadline First
task with closest
deadline is chosen
scheduler decides if
system has enough
resources for the task
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling Algorithm Example

Situation:
–
Tasks P1, P2, P3, P4



187
Arrive at time t = 0
Priority: P1 highest, P4 lowest
Time to process: 10, 2, 5, 3
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling Algorithm Example

Highest Priority First
P1
P2
P3
P4
0
188
© P. Reali / M. Corti
10
12
17
20
System-Software WS 04/05
Scheduling Algorithm Example

Shortest Job First
P1
P2
P3
P4
189
0
© P. Reali / M. Corti
2
5
10
20
System-Software WS 04/05
Scheduling Algorithm Example

Timesharing with quantum = 2
P1
P2
P3
P4
0
190
2
© P. Reali / M. Corti
4
6
8 10 12 14 16 18 20
13
System-Software WS 04/05
Scheduling Algorithm Example

Timesharing with quantum  0
running
at 1/4
running
at 1/3
running
at 1/2
P1
P2
P3
P4
0
191
© P. Reali / M. Corti
8
11
15
20
System-Software WS 04/05
Scheduling Algorithm Example: Results

Situation:
–
Tasks P1, P2, P3, P4



–
Results




192
Arrive at time t = 0
Priority: P1 highest, P4 lowest
Time to process: 10, 2, 5, 3
turnaround
Highest Priority First:
14.75
Shortest Job First:
9.25
Timesharing with Quantum = 2:
12.75
Timesharing with Quantum  0:
13.5
© P. Reali / M. Corti
response time
9.75
4.25
3.0
0
System-Software WS 04/05
Scheduling Examples

UNIX
–
–
–

BSD similar
–

preemption
32 priority levels (round robin)
each second the priorities are recomputed (CPU usage,
nice level, last run)
every 4th tick priorities are recomputed (usage
estimation)
Windows NT
–
–
–
193
“real time” priorities: fixed, may run forever
variable: dynamic priorities, preemption
idle: last choice (swap manager)
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling Examples: Quantum & Priorities

Win2K:
–
–

BSD:
–
–

quantum = 20ms (professional) 120ms (user),
configurable
depending on type (I/O bound)
quantum = 100ms
priority = f(load,nice,timelast)
Linux:
–
–
194
quantum = quantum / 2 + priority
f(quantum, nice)
© P. Reali / M. Corti
System-Software WS 04/05
Scheduling Problems

Starvation
A task is never scheduled (although ready)
 “fairness”

Deadlock
No task is ready (nor it will ever become ready)
 detection+recovery or avoidance
195
© P. Reali / M. Corti
System-Software WS 04/05
Deadlock Conditions
A holds R
T
R
Thread
Resource
B wants R
R1
T
1
T
R2
2
A wants S
B holds S
Coffman conditions for a deadlock (1971):
 Mutual exclusion
 Hold and wait
 No resource preemption
 Circular wait (cycle)
196
© P. Reali / M. Corti
System-Software WS 04/05
Deadlock Remedies
Coarser lock granularity:
 use a single lock for all resources (e.g., Linux 2.0-2.4 “Big
Kernel Lock”)
Locking order:
 resources are ordered
 resource locking according to the resource order (ticketing)
Two-phase-locking:
 try to acquire all the resources
 if successful, lock them; otherwise free them and try again
197
© P. Reali / M. Corti
System-Software WS 04/05
Deadlock Detection, Prevention & Recovery

Deadlock detection: the system keeps a graph of
locks and tries to detect cycles.
–
–


time consuming
the graph has to be kept consistent with the actual state
Deadlock prevention (avoidance): remove one of
the four Coffman conditions  cycles
Recovery:
–
–
198
kill processes and reclaim the resources
rollback: requires to save the states of the processes
regularly
© P. Reali / M. Corti
System-Software WS 04/05
Simple Deadlock Scenario

Example
–
–

Resources R, S, T
Tasks A, B, C require { R, S }, { S, T }, { T, R } respectively
Case 1: Sequential execution, no deadlock
A
B
C
199
+R +S -R -S
© P. Reali / M. Corti
+S
+T
-S
-T
+T
+R
-T -R
System-Software WS 04/05
Simple Deadlock Scenario

Case 2: Interleaving, deadlock
A
B
C
+R
+S
+S
+T
+T
+R
C
T
R
A
S
B
200
© P. Reali / M. Corti
System-Software WS 04/05
Complex Deadlock Scenario

Case with 6 resources and 7 tasks
graphical
representation
R
A
C
S
D
F
U
W
G
is this a case of deadlock?
201
© P. Reali / M. Corti
B
T
E
V
System-Software WS 04/05
Deadlock Avoidance Strategy in Bluebottle
Timers
Processors
Module Hierarchy
Threads
202
Traps
Interrupts
Modules
Blocks
Memory
Locks
Each Kernel Module
has a lock to protect
its data
When multiple locks
are
needed, acquire
them
according to the
module hierarchy
Configuration
© P. Reali / M. Corti
Module
Lock
System-Software WS 04/05
Priority Inversion


A high-priority task can be blocked by a lower
priority one.
Example:
High
Medium
Low
waiting
running
ready
203
© P. Reali / M. Corti
System-Software WS 04/05
Priority Inversion


Big problem for RTOS
Solutions
–
–
priority inheritance
low-priority task holding resource inherits priority of highpriority task wanting the resource
priority ceilings



204
each resource has a priority corresponding to the highest priority
of the users +1
the priority of the resource is transferred to the locking process
can be used instead of semaphores
© P. Reali / M. Corti
System-Software WS 04/05
Example: Mars Pathfinder (1996–1998)







VxWorks real-time system: preemptive, priorities
Communication bus: shared resource (mutexes)
Low priority task (short): meteorological data
gathering
Medium priority task (long): communication
High priority: bus manager
Detection: watchdog on bus activity  system reset
Fix: activate priority inheritance via an uploaded onthe-fly patch (no memory protection).
205
© P. Reali / M. Corti
System-Software WS 04/05
Locking on Multiprocessor Machines



Real parallelism!
Cannot “disable interrupts” like on single processor
machines (could stop every task, but not efficient)
Software solutions
–

Peterson, Dekker, ...
Hardware support
–
–
206
bus locking
atomic instructions
(Test And Set, Compare And Swap)
© P. Reali / M. Corti
System-Software WS 04/05
Locking on multiprocessor machines

Test And Set

Compare and Swap (Intel)
TAS s:
CAS R1, R2, A:
IF s = 0 THEN
s := 1
ELSE
CC := TRUE
END
R1: expected value
R2: new value
A: address
IF R1 = M[A] THEN
M[A] := R2; CC := TRUE
ELSE
R1 := M[A]; CC := FALSE
END
These instructions are atomic even on multiprocessors!
The usually do so by locking the data bus
207
© P. Reali / M. Corti
System-Software WS 04/05
Example: Semaphores on SMP


Counter s: available resources
Binary Semaphores with TAS
Spinning
(busy wait)
208
© P. Reali / M. Corti
Try TAS s
JMP Try
CS
TAS s
JMP Queuing
CS
Blocking
System-Software WS 04/05
Example: Semaphores on SMP


Counter s: available resources
Generic Semaphores with CAS
P(s)
Enter CS
Exit CS
V(s)
209
© P. Reali / M. Corti
P(S): { S := S - 1}
IF S < 0 THEN
jump queuing
END
Load R1s
TryP MOVE R1R2
DEC R2
CAS R1, R2, s
BNE TryP
CMP R2, 0
BN Queuing
[CS]
V(S): { S := S + 1}
IF S <= 0 THEN
jump dequeuing
END
[CS]
Load R1s
TryV MOVE R1R2
INC R2
CAS R1, R2, s
BNE TryV
CMP R2, 0
BNP Dequeuing
System-Software WS 04/05
Spin-Locks: the Bluebottle/i386 way
PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
MOV EBX, locked[EBP] ; EBX := ADR(locked)
MOV AL, 1
; AL := 1
CLI
; switch interrupts off before
; acquiring lock
test:
XCHG [EBX], AL
CMP AL, 1
JE test
;
;
;
;
;
set and read the lock
atomically.
LOCK prefix implicit.
was locked?
retry
..
END AcquireSpinTimeout;
210
© P. Reali / M. Corti
simplified
version
System-Software WS 04/05
Active Objects in Active Oberon
State
Z = OBJECT
VAR myT: T; I: INTEGER;
Initialize
r
PROCEDURE & NEW (t: T);
BEGIN myT := t
END NEW;
Method
Object
Activity
PROCEDURE P (u: U; VAR v: V);
BEGIN { EXCLUSIVE } i := 1
END P;
BEGIN { ACTIVE }
BEGIN { EXCLUSIVE }
AWAIT (i > 0);
END
END Z;
Mutual
Exclusion
Condition
211
© P. Reali / M. Corti
System-Software WS 04/05
Active Oberon Runtime Structures
NIL
CPUs
1
Running
Wait Queue
Lock Queue
2
Ready
Awaiting
Object
Awaiting
Assertion
Ready
Ready Queue
212
© P. Reali / M. Corti
System-Software WS 04/05
Active Oberon Implementation
NIL
7
NEW
Create object;
Create process; 0
Set to ready
Preempt
6
Set to ready;
Run next ready
1
7
END
Run next ready
1
2
Awaiting
Object
4
Running
6
1
Ready
0
213
© P. Reali / M. Corti
3
Awaiting
Assertion
5
NIL
System-Software WS 04/05
Active Oberon Implementation
Enter Monitor
IF monitor lock set THEN
Put me in monitor obj wait list; 2
Run next ready
1
ELSE set monitor lock
END
Exit Monitor
Find first asserted x in wait list;
IF x found THEN set x to ready 5
ELSE Find first x in obj wait list;
IF x found THEN set x to ready 4
ELSE clear monitor lock
END
END
1
Run next ready
214
© P. Reali / M. Corti
NIL
7
2
Awaiting
Object
4
Running
6
1
Ready
0
3
Awaiting
Assertion
5
NIL
System-Software WS 04/05
Active Oberon Implementation
NIL
7
2
3
AWAIT
Put me in monitor assn wait list;
Call Exit monitor
Awaiting
Object
4
Running
6
1
Ready
3
Awaiting
Assertion
5
0
215
© P. Reali / M. Corti
NIL System-Software WS 04/05
Case Study: Windows CE 3.0

Real-time constraints
–
–

Reaction time on events
Execution time
Threads with priorities and time quanta
–
–
Priorities: 0 (high), …, 255 (low)
Time quanta in ms



Default 100 ms
0  no quantum
Single processor
end of
quantum
p
q<p
p
216
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Windows CE 3.0

Interrupt Handling
–
ISR (Interrupt Service Routine)





–
1st level handling
Kernel mode, uses kernel stack
Installed at boot-time
Creates event on-demand
Preempted by ISR with higher priority
IST (Interrupt Service Thread)



2nd level handling
User mode
Awaits events
User Modus
IST
Event
IRQ
Event
NK.EXE
ISR
Kernel Modus
217
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Windows CE 3.0

Synchronization on common resources:
–
–
Critical sections: enter, leave operations
Semaphores and mutexes (binary semaphores)
[


CS
]
[
]
[
]
Synchronization is performed with system/library
calls (they are not part of a language).
Priority inversion avoidance
–
218
priority inheritance (thread inherits priority of task wanting
the resource)
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Java


Activities are mapped to threads (no processes)
Synchronization in the language
–
–


locks
signals
Threads provided by the library
Scheduling depends on the JVM
219
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Java
public class MyThread() extends Thread {
public void run() {
System.out.println("Running");
}
public static void main(String [] arguments) {
MyThread t = (new MyStread()).start();
}
}
220
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Java
public class MyThread() implements Runnable {
public void run() {
System.out.println("Running");
}
public static void main(String [] arguments) {
Thread t = (new Thread(this)).start();
}
}
221
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Java

Protection with monitor-like objects
–
with method granularity

–
with statement granularity


public synchronized void someMethod()
synchronized(anObject) { ... }
Synchronization with signals
222
–
wait() (with optional time-out)
–
notify() / notifyAll() (“send and continue” pattern)
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Java
private Object o;
public synchronized consume() {
while (o == null) {
try { wait(); } catch (InterruptedException e) {}
}
use(o);
o = null;
notifyAll();
}
public synchronized void produce(Object p) {
while (o != null) {
try { wait(); } catch (InterruptedException e) {}
}
o = p;
notifyAll();
}
223
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: POSIX Threads





Standard interface for threads in C
Mostly UNIX, possible on Windows
Provided by a library (libpthread) and not part of the
language.
IEEE POSIX 1003.1c standard (1995)
Various implementations (both user and kernel
level)
224
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: POSIX Threads
#include <pthread.h>
pthread_mutex_t m;
void *run(){
pthread_mutex_lock(&m);
// critical section
pthread_mutex_unlock(&m);
pthread_exit(NULL);
}
int main (int argc, char *argv[]){
pthread_t t;
pthread_create(&t, NULL, run,NULL);
pthread_exit(NULL);
}
225
© P. Reali / M. Corti
System-Software WS 04/05
File Systems
File Systems - Overview




Hardware
File abstraction
File organization
File systems
–
–
–
227
Oberon
Unix
FAT
© P. Reali / M. Corti

Distributed file systems
–
–

NFS
AFS
Special topics
–
–
–
Error recovery
ISAM
B* Trees
System-Software WS 04/05
Hardware: the ATA Bus

ATA / IDE (1986)
–
–


–
–

–
UDMA 100
SATA
head / cylinder / sector
support for LBA
(logical block addressing)
PIO mode
–

master / slave
low-level interface
UDMA 66
ATA-6
–


ATA-5
–

ATA Packet Interface
(SCSI command set)
bus with 2 devices
–
ATA-2 / EIDE
ATA-4 / ATAPI
–

Advanced Technology
Attachment
Integrated Drive Electronics

read byte by byte through
hardware port
DMA mode
–
use DMA transfer
ATA-7
–
228
UDMA 133
© P. Reali / M. Corti
System-Software WS 04/05
Hardware: the SCSI Bus


SCSI: Small Computer
Systems Interface
SCSI-2
–
–

Fast SCSI
Wide SCSI

Bus with 8 devices
–
–
–

Device kinds
–
SCSI-3
–
–

–
© P. Reali / M. Corti
read-block, write-block
Transfer mode selection
–
229
direct access
CD-ROM
...
Block-oriented access
–

wide: 16 / 32 devices
bus arbitration
disconnected mode
asynchronous (hand-shake)
synchronous (period / offset)
System-Software WS 04/05
Hardware: Hard Disk
Organization
–
–
–

cylinder (c)
head (h)
sector (s)
Addressing
–
–
sector (c, h, s)
block (LBA)
sector
track (cylinder)
surface (head)

rotation
axis
230
© P. Reali / M. Corti
System-Software WS 04/05
Hardware: Example
Current disk example:
 ATA-100
 250GB
 512 bytes per sector (488·106 sectors)
 8MB cache
 8.9ms average seek time
 7200 rpm
231
© P. Reali / M. Corti
System-Software WS 04/05
Hardware: Hard Disk Improvements

Interleaving
optimize sequential sector
access



Read-ahead
Caching
Sector defect management
cylinder
1
5
4
2
7
232
© P. Reali / M. Corti
6
3
System-Software WS 04/05
Hardware: Disk Scheduling

Disk controllers have a queue of pending requests:
–
–
–
–
233
type: read or write
block number: translated into the (h,c,s)-tuple
memory address (where to copy from and to)
amount to be transferred (byte or block count)
© P. Reali / M. Corti
System-Software WS 04/05
Hardware: Disk Scheduling




Performance: minimize head movements, maximize
throughput
Scheduling is now in the hardware
First-come, first-served
(FCFS)
Shortest-seek-time-first
(SSTF)
234
© P. Reali / M. Corti


SCAN (elevator) &
C-SCAN
LOOK &
C-LOOK
System-Software WS 04/05
Hardware: Disk Scheduling

Example (head position, track number):
queue = 31, 72, 4, 18, 147, 193, 199, 153, 114, 72
235
© P. Reali / M. Corti
System-Software WS 04/05
Hardware: Disk Scheduling
236
© P. Reali / M. Corti
System-Software WS 04/05
Abstractions
Block: array of sectors
 some systems call
them “clusters”
 user configured
 reduces address space
 increases access
speed
 causes internal
fragmentation
237
© P. Reali / M. Corti
Disk: array of sectors
File: stream of bytes
 sequential access
 random access
 stored on disk
–
–
mapping byte to block
block allocation
management
System-Software WS 04/05
Abstraction Layers
Abstractions
File System
OpenFile, WriteFile, ReadFile,
SeekFile, CloseFile
Volume
ReadBlock, WriteBlock
AllocateBlock, FreeBlock
Disk
ReadSector, WriteSector
238
© P. Reali / M. Corti
Implementations
FAT
Oberon
ISO 9660
ext3
NTFS
ATA driver
SCSI driver
System-Software WS 04/05
File Organization

How can we map groups of blocks into files?
How do we manage free space?
How can I jump to a certain location?

Operation: read n bytes at position p.


239
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Contiguous Allocation




File is a group of contiguous blocks
Simple management
Fast transfers
IBM MVS (mainframe)
start
240
© P. Reali / M. Corti
length
System-Software WS 04/05
File Organization: Contiguous Allocation


external fragmentation
allocation
–
–


how much space does a file need?
first fit, best fit, …?
file growth (error? move? extensions?)
preallocation: internal fragmentation
start
241
© P. Reali / M. Corti
length
System-Software WS 04/05
File Organization: Linked Allocation

File is a linked list of blocks
–
–

no external fragmentation
no growth problems
Problems
–
–
–
sequential files only (positioning requires traversal)
space for pointers (1TB, 5B addr., 1% with 512B blocks)
reliability (lost pointers)
start
242
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Linked Allocation

Clusters: series of contiguous blocks
–
–
–
faster (less jumps)
less space wasted for pointers
internal fragmentation
start
243
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Linked Allocation

Pointer tables
–
–
–
–
the list of pointers is stored in a separate table
can be cached
usually is stored twice (reliability)
FAT (MS-DOS, OS/2, Windows, solid-state memory)
start
244
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Indexed Allocation




Index with block addresses
Fast access for random-access files
No external fragmentation
Problems
–
–
–
high management overhead
limited file size (depending on the index structure)
pointer overhead
file
245
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Indexed Allocation

Variation:
–

Advantage:
–

linked list of indexes
no file size limitation
Disadvantage:
–
Index lookup requires sequential traversal of index list
file
246
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Indexed Allocation


multi-level indexes
(index of indexes)
UNIX

Advantage:
–

fast index lookup
Disadvantage:
–
limited file size
file
247
© P. Reali / M. Corti
System-Software WS 04/05
File Organization: Indexed Allocation
Example:
 blocks 2KB
 address 4B


First level index blocks:
512 entries · 2KB = 1MB
Second level index block:
512 entries · 2KB = 0.5GB
file
248
© P. Reali / M. Corti
System-Software WS 04/05
Free Space Management

Bitmap (e.g., HFS)
–
–
–

Linked lists
–

list of free blocks (similar to linked allocation)
Grouping
–

bit vector to mark free blocks
simple
needs caching
free blocks contain n address of free blocks (similar to
multilevel indexing)
Counting
–
249
list of 2-tuples of series of free blocks (start, length)
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon File System

Disk module: controller driver
–

block management
FileDir module:
–
–
–
maps files to locations
implemented with B-trees
garbage collection (files)



–
250
Files
FileDir
the directory is the root set
anonymous (nonregistered) files are collected
Files module:
–

Disk
allows user operations (read, create, write,
…)
access is performed through riders
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon File System
Characteristics
 Block size = 1KB
 File organization
–
multilevel index:


–

64 direct
12 1st level indirect
designed to
optimize small
files
672 data bytes in file header
Block allocation
–
–
251
allocation table created at boot-time (partition GC)
no collection at run-time (partition fills up!)
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon File System

Block = 1KB
dd
dd
64 blocks
i1
0
1
63
dd
dd
dd
dd
i2
75
d
dd
di
1
dd
dd
(672B)
12 index blocks with 256
data blocks each
252
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon File System
Free block management: bitmap
Garbage collection at startup
11111111111111111111111111111111
0
8
16
24
startup / GC
11010010011110111101110100011100
0
8
16
24
allocate 16,17
11010010011110110001110100011100
0
8
16
24
allocate 19
11010010011110110000110100011100
0
253
8
© P. Reali / M. Corti
16
24
System-Software WS 04/05
Case Study: Oberon File System
Internals
File Handle
 “Rider”: current read
or write position
 Buffer (cache) for
f
consistency (each file
sees the write operations
on it)
Rider
R
R
R
“Hint”
Buffer
f
254
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon RAM Disk
File = POINTER TO Header;
Index = POINTER TO Sector;
header
primary
sector
table
ext
table
255
points to
sectors 0 - 63
index
sector 0
index
sector 1
points to
sectors
64 - 319
points to
sectors
320 - 575
© P. Reali / M. Corti
Rider = RECORD
eof: BOOLEAN;
file: File;
pos: LONGINT;
adr: LONGINT;
END;
Header = RECORD
mark: LONGINT;
name: FileDir.Name;
len, time, date: LONGINT
ext: ARRAY 12 OF Index;
sec: ARRAY 64 OF SectorTable;
END;
System-Software WS 04/05
Case Study: Oberon RAM Disk
PROCEDURE Read(VAR r: Rider; VAR x: SYSTEM.BYTE);
VAR m: INTEGER;
BEGIN
IF r.pos < r.file.len THEN
SYSTEM.GET(r.adr, x); INC(r.adr); INC(r.pos);
IF r.adr MOD SS = 0 THEN (*end of sector *)
m := SHORT(r.pos DIV SS);
IF m < STS THEN
r.adr := r.file.sec[m]
ELSE
r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
END
END
SS = Sector Size
STS = Sector Table Size
ELSE x := 0X; r.eof := TRUE
XS = Index Size
END
END Read;
256
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: Oberon RAM Disk
PROCEDURE Write(VAR r: Rider; x: SYSTEM.BYTE);
VAR k, m, n: INTEGER; ix: LONGINT;
BEGIN
IF r.pos < r.file.len THEN
m := SHORT(r.pos DIV SS); INC(r.pos);
IF m < STS THEN
r.adr := r.file.sec[m]
ELSE
r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
END
ELSE
....
END;
SYSTEM.PUT(r.adr, x); INC(r.adr);
END Write;
257
© P. Reali / M. Corti
overwrite
System-Software WS 04/05
Case Study: Oberon RAM Disk
IF r.pos < r.file.len THEN
....
ELSE
IF r.adr MOD SS = 0 THEN
m := SHORT(r.pos DIV SS);
IF m < STS THEN Kernel.AllocSector(0, r.adr);
r.file.sec[m] := r.adr
ELSE n := (m-STS) DIV XS; k := (m-STS) MOD XS;
IF k = 0 THEN
Kernel.AllocSector(0, ix);
r.file.ext[n] := SYSTEM.VAL(Index, ix)
END;
Kernel.AllocSector(0, r.adr);
r.file.ext[n].x[k] := r.adr
END;
INC(r.pos); r.file.len := r.pos
END;
SYSTEM.PUT(r.adr, x); INC(r.adr);
258
© P. Reali / M. Corti
expand
System-Software WS 04/05
Case Study: UNIX, inodes


File system: files and directories (files with a special content)
A file is represented by an inode
Inode:
 file owner
 file type
–





regular / directory / special
access permissions
access time
reference count (links)
table of contents
file size
259
© P. Reali / M. Corti
Inode table of contents
 10 (12) direct blocks
 1 indirect block
 1 double indirect block
 1 triple indirect block
System-Software WS 04/05
Case Study: UNIX, inodes
d
d
d
type
access
refc
info
0
1
i1
i
i 1
d
d
d
i2
i
i 2
i1
i
i 1
d
d
d
ii 2i2
2
i1
i
i 1
d
d
d
1
10
11
12
2
inode
i3
i
i 3
3
260
© P. Reali / M. Corti
1
1
System-Software WS 04/05
Case Study: UNIX, directories


Directories are normal files with a special content.
The data part contains a list with
–
–

inode
name
Every directory has two special entries
–
–
261
.
..
© P. Reali / M. Corti
the directory itself
the parent directory
System-Software WS 04/05
Case Study: UNIX, inodes
inode 2
type: dir
blocks: 132
owner: root
ref count: 1
inode 3
type: dir
blocks: 406
owner: root
ref count: 1
inode 6
type: file
blocks: 42, 103
owner: root
ref count: 1
block 132
/
2 .
2 ..
4 bin
3 root
block 406
/root/
3 .
2 ..
5 .tcshrc
6 mbox
block 42
data
inodes
disk block
block 103
data
inode #
262
name
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: UNIX, soft and hard links

Hard links:
–
–
two directories entries with the same inode number
each file has a reference counter
42
42

file
hardlink
Soft links
–
the directory entry points to a special file with the path of
the linked file
42
43
file
softlink
(inode 43 points to a special file with the path of file)
263
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: UNIX, hard links
inode 2
type: dir
blocks: 132
owner: root
ref count: 1
inode 3
type: dir
blocks: 406
owner: root
ref count: 1
inode 5
type: file
blocks: 42, 103
owner: root
ref count: 2
block 132
/
2 .
2 ..
4 bin
3 root
block 406
/root/
3 .
2 ..
5 mails
5 mbox
block 42
data
inodes
disk block
block 103
data
264
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: UNIX, soft links
inode 2
type: dir
blocks: 132
owner: root
ref count: 1
inode 3
type: dir
blocks: 406
owner: root
ref count: 1
block 132
/
2 .
2 ..
4 bin
3 root
block 406
/root/
3 .
2 ..
5 mbox
6 mails
265
© P. Reali / M. Corti
inode 5
type: file
blocks: 42
owner: root
ref count: 1
block 42
data
inode 6
type: file
blocks: 43
owner: root
ref count: 1
block 43
/root/mbox
System-Software WS 04/05
Case Study: UNIX, Volume Layout
A volume (partition) contains
 boot block
–

super block
–
–
–
–


bootstrap code
size
max file
free space
…
inodes
data blocks
boot
block
266
super
block
© P. Reali / M. Corti
inode list
data blocks
System-Software WS 04/05
Case Study: UNIX, Functions
Core functions
 bread
read block
 bwrite
write block




iget
iput
bmap
namei
267
© P. Reali / M. Corti
get inode from disk
put inode to disk
map (inode, offset) to disk block
convert path name to inode
System-Software WS 04/05
Case Study: UNIX, namei
namei (path)
if (absolute path)
inode = root;
else
inode = current directory inode;
while (more path to process) {
read directory (inode);
if match(directory, name component) {
inode = directory[name component];
iget(inode);
} else {
return no inode;
}
}
return inode;
268
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: FAT



FATnn: nn corresponds to the FAT size in bits
FAT12, FAT16, FAT32 used by MS-DOS and
Windows for disks and floppies
Volume Layout
boot FAT1
block
269
© P. Reali / M. Corti
FAT2
root
directory
data
System-Software WS 04/05
Case Study: FAT, Example
disk
size
0
1
2
EOF
3
EOF
4
12
5
FREE
6
9
7
BAD
8
3
9
11
10
EOF
11
10
12
EOF
13
FREE
File 1:
6
9
File 2:
4
12
File 3:
8
3
11
10
…
270
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: FAT, Directory

Information about files is kept in the directory
File name (8)
Extension (3)
A
D
V
S
H
R
Reserved (10)
Time (2)
Date (2)
First block (2)
File size (4)
271
© P. Reali / M. Corti
System-Software WS 04/05
Case Study: FAT, Max. Partition Size
Block size
FAT-12
0.5 KB
2 MB
1 KB
4 MB
2 KB
8 MB
128 MB
4 KB
16 MB
256 MB
1 TB
8 KB
512 MB
2 TB
16 KB
1024 MB
2 TB
32 KB
2048 MB
2 TB
272
© P. Reali / M. Corti
FAT-16
FAT-32
System-Software WS 04/05
File System Mounting

More than one volume mounted in the same
directory tree.
/
afs
ethz.ch
home
corti
usr
bin
floppy
mnt
dos
cd
273
© P. Reali / M. Corti
System-Software WS 04/05
Virtual File System

Support for several file systems
–
–
–


disk based
network
special
VFS: unifies the system calls
Mirrors the traditional UNIX file system model
Applications
274
ext3
FAT
NFSVFS AFS
proc
pts
ext3
FAT
NFS
proc
pts
© P. Reali / M. Corti
AFS
System-Software WS 04/05
File System Mounting




Each file system type has a method table
System calls are indirect function calls through the method
table
Common interface (open, write, readdir, lock, …)
Each file is associated with a the method table
275
© P. Reali / M. Corti
System-Software WS 04/05
File System Mounting: Special Files

Devices
–
–
–
–
–



disks
memory
USB devices
serial ports
…
Kernel communication (e.g., proc)
Uniform interface (open, close, read, write)
Uniform protection (user, groups)
276
© P. Reali / M. Corti
System-Software WS 04/05
File Systems: Protection

Restrict: access (who), operations (what),
management
–
FAT: flags in the directory


–
UNIX: restrictions in inodes




–
based on users and groups
operations: read, write, execute
directories: manage files
not so flexible
VMS: access lists

277
e.g., read only
execution based on name
list of users and rights per file
© P. Reali / M. Corti
System-Software WS 04/05
Distributed File Systems
Distributed File Systems (DFS)

Clients, servers and storage are dispersed among
machines in a distributed system.
279
Client
Client
Client
Server
Server
Client
Client
© P. Reali / M. Corti
Server
Client
Server
System-Software WS 04/05
Overview
Naming (dynamic):
 location
transparency: file
name does not reveal
the file location
 location
independence: file
name does not change
when storage is moved
280
© P. Reali / M. Corti
Caching (efficiency)
 write-through
 delayed-write
 write-on-close
Consistency
 client-initiated: poll
server for changes
 server-initiated: notify
clients
System-Software WS 04/05
Naming

Simple approaches:
–
–
–

Transparent
–
–
–
–

file is identified by a host, path pair
Ibis (host:path)
SMB (\\host\path)
remote directory are mounted in the local file system
not uniform (the mount point is not defined)
NFS (/mnt/home, /home/)
SMB (\\host\path mounted on Z:)
Global name structure
–
–
281
uniform and transparent naming
AFS (/afs/cell/path)
© P. Reali / M. Corti
System-Software WS 04/05
Caching



Reduces network and disk load
Consistency problems
Granularity:
–
–

Location:
–
–
–

How much? Big/small chunks of data? Entire files?
Big: +hit ratio, +hit penalty, +consistency problems
memory: +diskless stations, +speed
disk: +cheaper, +persistent
hybrid
Space consumption on the clients
282
© P. Reali / M. Corti
System-Software WS 04/05
Caching
Policies:
 write-through: +reliability, -performance (cache is
effective only for read operations)
 delayed-write: +write speed, +unnecessary writes
eliminated, -reliability
–
–

write when the cache is full (+performance, -long time in
the cache)
regular intervals
write-on-close
283
© P. Reali / M. Corti
System-Software WS 04/05
Consistency


Is my cached copy up-to-date?
Client-initiated approach:
–
–

the client performs validity checks
when? open/fixed intervals/every access
Server-initiated approach:
–
–
–
284
the server keeps track of cached files (parts)
notifies the clients when conflicts are detected
should the server allow conflicts?
© P. Reali / M. Corti
System-Software WS 04/05
Stateless and Stateful Servers
Stateful: the server keeps track of each accessed file
 session IDs (e.g., identifying an inode on the server)
 fast
–
–
–
–

simple requests
caches
fewer disk accesses
read ahead
volatile
–
–
285
server crash: rebuild structures (recovery protocol)
client crash: orphan detection and elimination
© P. Reali / M. Corti
System-Software WS 04/05
Stateless and Stateful Servers
Stateless: each request is self-contained
 request: file and position
 complex requests
 need for uniform low-level naming scheme (to avoid
name translations)
 need idempotent operations (same results if
repeated)
–

absolute byte counts
No locking possible
286
© P. Reali / M. Corti
System-Software WS 04/05
File Replication


A file can be present on failure independent
machines
Naming scheme manages the mapping
–
–


same high-level name
different low-level names
Transparency
Consistency
287
© P. Reali / M. Corti
System-Software WS 04/05
Distributed File-Systems (mainstream)




NFS: Network File System (Sun)
AFS: Andrew File System (CMU)
SMB: Server Message Block (Microsoft)
NCFS: Network Computer FS (Oberon)
288
© P. Reali / M. Corti
System-Software WS 04/05
Network File System (NFS)




UNIX - based (Sun)
mount file system from
another machine into
local directory
stateless (no
open/close)
uses UDP to
communicate

–

© P. Reali / M. Corti
every operation is a
remote procedure call
known problems:
–
–
–

289
based on RPC and
XDR (External Data
Representation)
no caching
no disconnected mode
efficiency
security: IP based
System-Software WS 04/05
NFS: Example
exports
etc
/home/ client(rw)
reali
/
home
corti
server
client
mount -t nfs server:/home /home
etc
etc
reali
/
/
home
home
corti
290
© P. Reali / M. Corti
System-Software WS 04/05
NFS


No special servers (each machine can act as a
server and as a client)
Cascading mounts are allowed
–
–

mount -t nfs server1:/home /home
mount -t nfs server2:/projects/corti /home/corti/projects
Limited scalability (limited number of exports)
291
© P. Reali / M. Corti
System-Software WS 04/05
NFS: Stateless Protocol



Each request contains a unique file identifier and an
absolute offset
No concurrency control (locking has to be
performed by the applications)
Committed information is assumed to be on disk
(the server cannot cache writes)
292
© P. Reali / M. Corti
System-Software WS 04/05
Network File System (NFS)
System call layer
Virtual file system layer
Local file
system
Virtual file system layer
NFS client
NFS server
RPC / XDR
RPC / XDR
Local file
system
network (UDP)
293
© P. Reali / M. Corti
System-Software WS 04/05
Remote Procedure Invocation: Overview

Problem
–
–
–
–
send structured information
from A to B
A and B may have different
memory layouts
byte order problems
network
byteordering
Big-endian: MSB before LSB
• IBM, Motorola, SPARC
How is 0x1234 (2 bytes)
represented in memory?
12
34
1
0
34
12
Little-endian: LSB before MSB
• VAX, Intel
little end
first
294
© P. Reali / M. Corti
System-Software WS 04/05
Marshalling / Serialization
Marshalling: packing one or
more data items into a
buffer using a standard
representation

Presentation layer (OSI)

RPC + XDR (Sun)
–
–

IIOP / CORBA (OMG)
–
–

RFC 1014, June 1987
RFC 1057, June 1988
V2.0, February 1997
V3.0, August 2002
SOAP / XML (W3C)
–
295
V1.1, May 2000
© P. Reali / M. Corti
XDR Type System
 [unsigned] integer (32-bit)
 [unsigned] hyper-integer
(64-bit)
 enumeration (unsigned int)
 boolean (enum)
 float / double (IEEE 32/64bit)
 opaque
 string
 array (fix + variable size)
 structure
 union
 void
System-Software WS 04/05
RPC Protocol


Remote procedure call
Marshalling of procedure
parameters
Client


Message format
Authentication
Naming
Server
procedure P(a, b, c)
• pack parameters
• send message to
server
• await response
• unpack response
296

© P. Reali / M. Corti
Server
• unpack parameters
• find procedure
• invoke
P(a, b, c)
• pack response
• send response
System-Software WS 04/05
NFS
Client
297
© P. Reali / M. Corti
RPC - protocol
Server
lookup
lookup
read
read
write
write
System-Software WS 04/05
NFS Efficiency

Stateless protocols are inherently slow
–

Caching:
–
–
file blocks (data)
file attributes (inodes)
–
read-ahead
delayed write
–
tradeoff between speed and consistency
–

e.g., directory lookup
It is possible that two machines see different data
298
© P. Reali / M. Corti
System-Software WS 04/05
NFS: Security

Exports based on IP addresses
–
–


low security
low granularity
Data is not encrypted
Permissions based on user and group ID
–
299
uniform naming needed (e.g., NIS)
© P. Reali / M. Corti
System-Software WS 04/05
Andrew File System (AFS)


1983 CMU (later IBM, now open source)
Scalable (>5000 workstations):
–




network divided in clusters (cells)
Client/user mobility (files are accessible from
everywhere)
Security: encrypted communication (Kerberos)
Protection: control access lists
Heterogeneity: clear interface to the server
300
© P. Reali / M. Corti
System-Software WS 04/05
Andrew File System (AFS)




server provides a cell
world-wide addressing
scheme (name  cell)
client caches a whole
file
server-synchronization
on file open and close





301
© P. Reali / M. Corti
AFS is efficient
low network overhead
stateful: consistency is
implemented with
callbacks
callback = client is in
synch with server
on store, server
changes the callbacks
System-Software WS 04/05
AFS: Logical View
Private
Space
usr
/
Shared Space
bin
afs
dir
dir Mount Point
vol Volume
bin
302
© P. Reali / M. Corti
f
System-Software WS 04/05
AFS: Physical View
network
client
sever
ethz.ch
cell
epfl.ch
cmu.edu
303
© P. Reali / M. Corti
System-Software WS 04/05
AFS
Client
open
RPC - protocol
Server
open
Cache
read
write
close
304
© P. Reali / M. Corti
close
System-Software WS 04/05
AFS: Consistency




Interaction only when opening and closing files.
Writes are not visible on other machines before a
close.
Clients assume that cached files are up-to-date.
Servers keep track of caching by the clients
(callbacks)
–
305
clients are notified in case of changes
© P. Reali / M. Corti
System-Software WS 04/05
AFS: Kerberos

Kerberos (Cerberos: three-headed dog guarding the
Hades)
–
–
–



authentication
accounting
audit
Needham-Schroeder shared key protocol
Distributed
AFS: communication is encrypted
306
© P. Reali / M. Corti
System-Software WS 04/05
AFS: Protection
Access lists:
%> fs listacl thesis
Access list for thesis is
Normal rights:
system:anyuser l
trg rlidwk
corti rlidwka



It’s possible to allow (or deny) access to users or
customized groups
Restriction on: read, write, lookup, insert,
administer, lock and delete.
Supports UNIX control bits.
307
© P. Reali / M. Corti
System-Software WS 04/05
Network Fallacies
The Eight Fallacies of Distributed Computing (Peter Deutsch)




The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure




308
© P. Reali / M. Corti
The network topology
doesn’t change
There is one administrator
Transport cost is zero
The network is
homogeneous
System-Software WS 04/05
General Principles (Satyanarayan)
From DFSs we learned the following lessons:
 we should try to move computations to the clients
 use caching whenever possible
 special files (e.g., temporary) can be specially
treated.
 make scalable systems.
 trust the fewest possible entities
 batch work if possible
309
© P. Reali / M. Corti
System-Software WS 04/05
Kernel Structure
Introduction

Kernel performs “dangerous” operations
–
–

Kernel must be protected against malign user code
–
–


page table mapping
scheduling
access to other processes’ data
increasing own processes’ priority
Kernel must have more rights than user code
Solution:
–
–
–
311
distinguish between kernel mode and user mode
access kernel through system calls
the system calls define the interface to the kernel
© P. Reali / M. Corti
System-Software WS 04/05
Kernel Protection
application
application
application
application
system
application
calls
application
drivers
312
© P. Reali / M. Corti
memory
manager
file
systems
System-Software WS 04/05
Kernel Protection
Means:
 hardware support
–
–

separate address spaces
–

privileged instructions
supervisor mode
user process has no access to kernel structures
access memory / functions through symbolic names
–
313
user has no access to hardware
© P. Reali / M. Corti
System-Software WS 04/05
Kernel Protection


Privileged instructions in user mode generate a trap
Mode switch:
–
–

Parameters:
–
–

interrupts
gated calls (user generated sw interrupt calls)
stack
registers
Examples:
–
–
314
Intel x86: 4 protection levels (code/segment attribute),
interrupt
PowerPC: 2 levels (CPU attribute), special instruction
© P. Reali / M. Corti
System-Software WS 04/05
Linux System Calls (Intel)


System calls are wrapped in libraries (e.g., libc)
The library function
–
–
–
–

writes the parameters in registers (5)
writes the parameters on the stack (>5)
writes the system call number in EAX
calls int 0x80
The kernel
–
315
jumps to the corresponding function in sys_call_table
© P. Reali / M. Corti
System-Software WS 04/05
Linux System Calls
Examples:
 pid_t fork(void): creates a child process
 ssize_t write(int fd, const void *buf, size_t
count): writes count bytes from buf to fd
 int kill(pid_t pid, int sig): send signal to a
process
 int gettimeofday(struct timeval *tv, struct
timezone *tz): gets the current time
 int open(const char *pathname, int flags):
opens a file
 int ioctl(int d, int request, ...): manipulates
special devices

…
316
© P. Reali / M. Corti
System-Software WS 04/05
Windows System Calls


Layered system: system
call must be performed
by a wrapper
(NTDLL.DLL).
The system call position
in the
KiSystemServiceTable is
not known (depends on
the build)
application
call WriteFile()
KERNEL32.DLL NtWriteFile()
NTDLL.DLL
…
int 0x2e
KiSystem
Service
Table
317
© P. Reali / M. Corti
System-Software WS 04/05
Kernel Design: API vs. System Calls
Linux
 system-calls are clearly
specified (POSIX standard)
 system-calls do not change
 about 100 calls
Windows
 system-calls are hidden
 only Win32 API is published
 Win32 is standard
 “thousands” of API calls,
still growing
 some API calls are handled
in user space
 More than one API:
–
–
318
© P. Reali / M. Corti
POSIX
OS/2
System-Software WS 04/05
Protection and SMP

What happens when two process (on two CPUs)
enter in kernel mode?
–
–
319
Big kernel lock: not allowed (OpenBSD, NetBSD)
Fine grained locks in the kernel (FreeBSD 5, Linux 2.6)
© P. Reali / M. Corti
proc1:
proc1:
int 0x80
int 0x80
CPU 1
CPU 2
System-Software WS 04/05
Kernel Structure

monolithic kernel
–
–
–

layered system
–
–

layern uses functions from layern-1
OS/2 (some degree of layering)
virtual machine
–

big mess, no structure, one big block, fast
MS-DOS (no protection), original UNIX
micro-kernel (AIX, OS X)
define artificial environment for programs
client-server
–
320
tiny communication microkernel to access various
services
© P. Reali / M. Corti
System-Software WS 04/05
Monolithic Kernels

Monolithic
Micro-kernel
user-level
applications
user-level
applications
scheduler
signal handling
file system
swapping
virtual memory
scheduler
signal handling
file system
swapping
virtual memory
terminal controllers
device drivers
memory controllers
321

© P. Reali / M. Corti
terminal controllers
device drivers
memory controllers
System-Software WS 04/05
Layered Systems




THE operating system
A layer uses only functions from
below
What goes where?
Less efficient
user programs
buffering I/O
console drivers
memory management
CPU scheduling
hardware
322
© P. Reali / M. Corti
System-Software WS 04/05
Virtual Machines






VM operating system (IBM)
slow and difficult to implement
complete protection
no sharing of resources
useful for development and
research
compatibility
procs
procs
procs
virtual machine
hardware
323
© P. Reali / M. Corti
System-Software WS 04/05
Design: Kernel or User Space?
Big monolithic kernel:
 fast (less switches)
 less protection
Examples:
 HTTP server in the Linux
kernel.
 graphic routines in
Windows
324
© P. Reali / M. Corti
Modular and micro-kernels:
 structured
 more separation
 move code to user space
 less efficient
 more secure
Example:
 user level drivers
System-Software WS 04/05
Virtual Machines

Machine specification in
software
–
–
–
–
instruction set
memory layout
virtual devices
....



–
–

325
© P. Reali / M. Corti
JVM (Java Virtual Machine)
.NET / Mono
VMWare
specified machine is a whole
PC
allows multiple PC
environments on same
machine
IBM VM/370
System-Software WS 04/05
Case Study: JVM
Virtual Machines
What is a machine?
 does something (...useful)
 programmable
 concrete (hardware)
What is a virtual machine?
 a machine that is not
concrete
 a software emulation of a
physical computing
environment
327
© P. Reali / M. Corti
Reality is somewhat fuzzy!
Is a Pentium-II a machine?
instructions
RISC Op
Core Op1
2
Op3
decoder
Hardware and software are
logically equivalent
(A. Tanenbaum)
System-Software WS 04/05
Virtual Machine, Intermediate Language

Pascal P-Code (1975)
–
–
–
–

Modula M-Code (1980)
–
–

high code density
Lilith as microprogrammed virtual processor
JVM – Java Virtual Machine (1995)
–
–

stack-based processor
strong type machine language
compiler: one front end, many back ends
UCSD Apple][ implementation, PDP 11, Z80
Write Once – Run Everywhere
interpreters, JIT compilers, Hot Spot Compiler
Microsoft .NET (2000)
–
328
language interoperability
© P. Reali / M. Corti
System-Software WS 04/05
JVM Case Study





compiler (Java to bytecode)
interpreter, ahead-of-time
compiler, JIT
dynamic loading and linking
exception Handling
memory management,
garbage collection
329
© P. Reali / M. Corti


OO model with single
inheritance and interfaces
system classes to provide
OS-like implementation
–
–
–
–
compiler
class loader
runtime
system
System-Software WS 04/05
JVM: Type System

Primitive types
–
–
–
–
–
–
–
–
–
330
byte
short
int
long
float
double
char
reference
boolean mapped to int
© P. Reali / M. Corti

Object types
–
–
–



classes
interfaces
arrays
Single class inheritance
Multiple interface
implementation
Arrays
–
–
anonymous types
subclasses of
java.lang.Object
System-Software WS 04/05
JVM: Java Byte-Code
Memory access
 tload / tstore
 ttload / ttstore
 tconst
 getfield / putfield
 getstatic / putstatic
Operations
 tadd / tsub / tmul / tdiv
 tshifts
Conversions
 f2i / i2f / i2l / ....
 dup / dup2 / dup_x1 / ...
331
© P. Reali / M. Corti
Control
 ifeq / ifne / iflt / ....
 if_icmpeq / if_acmpeq
 invokestatic
 invokevirtual
 invokeinterface
 athrow
 treturn
Allocation
 new / newarray
Casting
 checkcast / instanceof
System-Software WS 04/05
JVM: Java Byte-Code Example
bipush
Operation
Push byte
Format
Forms
bipush
byte
bipush = 16 (0x10)
Operand Stack ...
Description
332
© P. Reali / M. Corti
=>
..., value
The immediate byte is sign-extended to an int
value. That value is pushed onto the operand
stack.
System-Software WS 04/05
JVM: Machine Organization
Virtual Processor
 stack machine
 no registers
 typed instructions
 no memory addresses, only
symbolic names
Runtime Data Areas
 pc register
 stack
–
–
–


heap
method area
–


333
© P. Reali / M. Corti
locals
parameters
return values
code
runtime constant pool
native method stack
System-Software WS 04/05
program
JVM: Execution Example
iload 5
iload 6
iadd
istore 4
locals
v4
v5
v6
iload 5
iload 6
istore 4
v6
v5
iadd
v5+v6
operand stack
Time
334
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Reflection
Load and manipulate unknown classes at runtime.

java.lang.Class
–
–
–

getFields
getMethods
getConstructors
–
–


java.lang.reflect.Method
getModifiers
invoke
java.lang.reflectConstructor
java.lang.reflect.Field
–
–
–
–
335
setObjectgetObject
setInt
getInt
setFloat getFloat
.....
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Reflection – Example
import java.lang.reflect.*;
public class ReflectionExample {
public static void main(String args[]) {
try {
Class c = Class.forName(args[0]);
Method m[] = c.getDeclaredMethods();
for (int i = 0; i < m.length; i++) {
System.out.println(m[i].toString());
}
} catch (Throwable e) {
System.err.println(e);
}
}
}
336
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Java Weaknesses
Transitive closure of java.lang.Object contains






1.1
1.2
1.3
1.4
5 (1.5)
classpath 0.03
47
178
180
248
280
299
class Object {
public String toString();
....
class String {
}
public String toUpperCase(Locale loc);
....
public final class Locale implements
}
Serializable, Cloneable {
....
}
337
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Java Weaknesses
Class static initialization

T is a class and an instance of T
is created
Problem

circular dependencies in static
initialization code
T tmp = new T();

T is a class and a static method
of T is invoked
T.staticMethod();

A nonconstant static field of T is
used or assigned
(field is not static, not final, and
not initialized with compile-time
constant)
A
B
static {
x = B.f();
}
static {
y = A.f();
}
T.someField = 42;
338
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Java Weaknesses
interface Example {
final static String labels[] = {“A”, “B”, “C”}
}
hidden static initializer:
labels = new String[3];
labels[0] = “A”; labels[1] = “B”; labels[2] = “C”;
Warning:
 in Java final means write-once!
 interfaces may contain code
339
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Memory Model

The JVM specs define a memory model:
–
–

defines the relationship between variables and the
underlying memory
meant to guarantee the same behavior on every JVM
The compiler is allowed to reorder operations
unless synchronized or volatile is specified.
340
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Reordering

read and writes to ordinary variables can be
reordered.
public class Reordering {
int x = 0, y = 0;
public void writer() {
x = 1;
y = 2;
}
public void reader() {
int r1 = y;
int r2 = x;
}
}
341
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Memory Model

synchronized: in addition to specify a monitor it
defines a memory barrier:
–
–


acquiring the lock implies an invalidation of the caches
releasing the lock implies a write back of the caches
synchronized blocks on the same object are
ordered.
order among accesses to volatile variables is
guaranteed (but not among volatile and other
variables).
342
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Double Checked Lock
Singleton
public class SomeClass {
private static Resource resource = null;
public Resource synchronized getResource() {
if (resource == null) {
resource = new Resource();
}
return resource;
}
}
343
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Double Checked Lock
Double checked locking
public class SomeClass {
private static Resource resource = null;
public Resource
getResource() {
if (resource == null) {
synchronized (this) {
if (resource == null) {
resource = new Resource();
}
}
}
return resource;
}
}
344
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Double Checked Lock
Thread 1
Thread 2
public class SomeClass {
public class SomeClass {
}
private Resource resource
= null;
private Resource resource
= null;
public Resource getResource() {
if (resource == null) {
synchronized {
if (resource == null) {
resource =
new Resource();
}
}
}
return resource;
}
The object is
public Resource getResource() {
if (resource == null) {
synchronized {
if (resource == null) {
resource =
new Resource();
}
}
}
return resource;
}
345
instantiated }
but not yet initialized!
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Immutable Objects are not Immutable

Immutable objects:
–
–

all types are primitives or references to immutable objects
all fieds are final
Example (simplified): java.lang.String
–
contains



–
an array of characters
the length
an offset
example: s = “abcd”, length = 2, offset = 2, string = “cd”
String s1 = “/usr/tmp”
String s2 = s1.substring(4); //should contain “/tmp”


Sequence: s2 is instantiated, the fields are initialized (to 0),
the array is copied, the fields are written by the constructor.
What happens if instructions are reordered?
346
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Reordering Volatile and Nonvolatile Stores



volatile reads and writes are totally ordered among
threads
but not among normal variables
example volatile boolean initialized = false;
SomeObject o = null;
Thread 1
Thread 2
o = new SomeObject;
initialized = true;
while (!initialized) {
sleep();
}
o.field = 42;
347
© P. Reali / M. Corti
?
System-Software WS 04/05
JVM: JSR 133




Java Community Process
Java memory model revision
Final means final
Volatile fields cannot be reordered
348
© P. Reali / M. Corti
System-Software WS 04/05
Java JVM: Execution

Interpreted (e.g., Sun JVM)
–
–
–
–

Just-in-time compilers (e.g., Sun JVM, IBM JikesVM)
–
–
–
–

bytecode instructions are interpreted sequentially
the VM emulates the Java Virtual Machine
slower
quick startup
bytecode is compiled to native code at load time (or later)
code can be optimized (at compile time or later)
quicker
slow startup
Ahead-of time compilers (e.g., GCJ)
–
–
–
–
349
bytecode is compiled to native code offline
quick startup
quick execution
static compilation
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Loader – The Classfile Format
ClassFile {
version
constant pool
flags
super class
interfaces
fields
methods
attributes
}
350
© P. Reali / M. Corti
Constants:
 Values
String / Integer / Float / ...
 References
Field / Method / Class / ...
Attributes:
 ConstantValue
 Code
 Exceptions
System-Software WS 04/05
JVM: Class File Format
class HelloWorld {
public static void printHello() {
System.out.println("hello, world");
}
public static void main (String[] args) {
HelloWorld myHello = new HelloWorld();
myHello.printHello();
}
}
351
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Class File (Constant Pool)
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
String hello, world
Class HelloWorld
Class java/io/PrintStream
Class java/lang/Object
Class java/lang/System
Methodref HelloWorld.<init>()
Methodref java/lang/Object.<init>()
Fieldref java/io/PrintStream
java/lang/System.out
Methodref HelloWorld.printHello()
Methodref
java/io/PrintStream.println(java/lang/Strin
g )
NameAndType <init> ()V
NameAndType out Ljava/io/PrintStream;
NameAndType printHello ()V
NameAndType println
(Ljava/lang/String;)V
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
352
© P. Reali / M. Corti
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
Unicode
()V
(Ljava/lang/String;)V
([Ljava/lang/String;)V
<init>
Code
ConstantValue
Exceptions
HelloWorld
HelloWorld.java
LineNumberTable
Ljava/io/PrintStream;
LocalVariables
SourceFile
hello, world
java/io/PrintStream
java/lang/Object
java/lang/System
main
out
printHello
System-Software WS 04/05
JVM: Class File (Code)
Methods
0 <init>()
0 ALOAD0
1 INVOKESPECIAL [7] java/lang/Object.<init>()
4 RETURN
1 PUBLIC STATIC main(java/lang/String [])
0 NEW [2] HelloWorld
3 DUP
4 INVOKESPECIAL [6] HelloWorld.<init>()
7 ASTORE1
8 INVOKESTATIC [9] HelloWorld.printHello()
11 RETURN
2 PUBLIC STATIC printHello()
0 GETSTATIC [8] java/io/PrintStream java/lang/System.out
3 LDC1 hello, world
5 INVOKEVIRTUAL [10] java/io/PrintStream.println(java/lang/String )
8 RETURN
353
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Compilation – Pattern Expansion

Each byte code is translated according to fix
patterns
+
-

easy
limited knowledge
Example (pseudocode)
switch (o) {
case ICONST<n>: generate(“push n”); PC++; break;
case ILOAD<n>: generate(“push off_n[FP]”); PC++; break;
case IADD: generate(“pop -> R1”);
generate(“pop -> R2”);
generate(“add R1, R2 -> R1”);
generate(“push R1”);
PC++;
break;
…
System-Software WS 04/05
354 © P. Reali / M. Corti
JVM: Optimizing Pattern Expansion
Main Idea:
 use internal virtual stack
 stack values are consts / fields / locals / array
fields / registers / ...
 flush stack as late as possible
emitted
code
virtual
stack
MOV EAX, off4[FP]
local4
iload4
355
© P. Reali / M. Corti
iload5
ADD EAX, off5[FP]
local5
local5
local4
EAX
iadd
iload 4
iload 5
iadd
istore 6
MOV off6[FP], EAX
EAX
istore6
System-Software WS 04/05
JVM: Compiler Comparison
5 instructions
9 memory accesses
pattern expansion
push
push
pop
add
pop
356
off4[FP]
off5[FP]
EAX
0[SP], EAX
off6[FP]
© P. Reali / M. Corti
iload_4
iload_5
iadd
istore_6
3 instructions
3 memory accesses
optimized
mov EAX, off4[FP]
add EAX, off5[FP]
mov off6[FP], EAX
System-Software WS 04/05
Linking (General)


A compiled program contains references to
external code (libraries)
After loading the code the system need to link the
code to the library
–
–
–

identify the calls to external code
locate the callees (and load them if necessary)
patch the loaded code
Two options:
–
–
357
the code contains a list of sites for each callee
the calls to external code are jumps to a procedure
linkage table which is then patched (double indirection)
© P. Reali / M. Corti
System-Software WS 04/05
Linking (General)
358
0
instr
0
instr
1
instr
1
instr
2
jump
2
jump
3
instr
3
instr
4
instr
4
instr
5
jump
5
jump
6
instr
6
instr
7
jump
7
jump
9
instr
9
instr
10
instr
10
instr
-
2
proc 0
5
100
jump
proc 1
7
101
jump
© P. Reali / M. Corti
101
100
101
System-Software WS 04/05
Linking (General)
359
0
instr
0
instr
1
instr
1
instr
2
jump
2
jump
3
instr
3
instr
4
instr
4
instr
5
jump
5
jump
6
instr
6
instr
7
jump
7
jump
9
instr
9
instr
10
instr
10
instr
&p1
&p0
&p1
101
100
101
proc 0
5
100
jump
&p0
proc 1
7
101
jump
&p1
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Linking

Bytecode interpreter
–

Native code (ahead of time compiler)
–
–

references to other objects are made through the JVM
(e.g., invokevirtual, getfield, …)
static linking
classic native linking
JIT compiler
only some classes are compiled
– calls could reference classes that are not yet loaded or
compiled (delayed compilation)
 code instrumentation
–
360
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Methods and Fields Resolution




method and fields are accessed through special VM
functions (e.g., invokevirtual, getfield, …)
the parameters of the special call defines the target
the parameters are indexes in the constant pool
the VM checks id the call is legal and if the target is
presentl
361
© P. Reali / M. Corti
System-Software WS 04/05
JVM: JIT – Linking and Instrumentation

Use code instrumentation to detect first access of static
fields and methods
class A {
....
...B.x
class B {
int x;
}
}
B.x
362
© P. Reali / M. Corti
CheckClass(B);
B.x
IF ~B.initialized
THEN
Initialize(B)
END;
System-Software WS 04/05
Compilation and Linking Overview
CC
header
header
C source
Compiler
Object
Object
File
Object
File
File
Object
file
Linker
Loader
Loaded
Code
363
© P. Reali / M. Corti
System-Software WS 04/05
Compilation and Linking Overview
Oberon
source
Compiler
Object
Object
File
Object
File
File
Object &
Symbol
Loader
Linker
Loaded
Loaded
Module
Loaded
Module
Module
364
© P. Reali / M. Corti
Loaded
Module
System-Software WS 04/05
Compilation and Linking Overview
Java
source
Compiler
Class
File
JIT
Compiler
Class
Loader
Class
365
© P. Reali / M. Corti
Reflection
API
Loader
Linker
Class
Class
Class
Class
System-Software WS 04/05
Jaos


Jaos (Java on Active Object System) is a Java
virtual machine for the Bluebottle system
goals:
–
–
–
–
366
implement a JVM for the Bluebottle system
show that the Bluebottle kernel is generic enough to
support more than one system
interoperability between the Active Oberon and Java
languages
interoperability between the Oberon System and the Java
APIs
© P. Reali / M. Corti
System-Software WS 04/05
Jaos (Interoperability Framework)
Oberon
source
Oberon
Loader
Linker
Loaded
Module
367
© P. Reali / M. Corti
Class
File
Metadata
Compiler
Object &
Symbol
Java
Reflection
API
Oberon
Browser
Java
Metadata
Loader
Oberon
Metadata
Loader
Loader
JIT
Compiler
Linker
Loader
Linker
Loaded
Loaded
Module
Module
Loaded
Class
System-Software WS 04/05
JVM: Verification
Compiler generates
“good” code....
 .... that could be
changed before
reaching the JVM
 need for verification

Verification makes the VM
simpler (less run-time
checks):
–
–
–
–
–
–
–
–
368
© P. Reali / M. Corti
no operand stack overflow
load / stores are valid
VM types are correct
no pointer forging
no violation of access
restrictions
access objects as they are
(type)
local variable initialized before
load
…
System-Software WS 04/05
JVM: Verification
Pass1 (Loading):
 class file version check
 class file format check
 class file complete
369
© P. Reali / M. Corti
Pass 2 (Linking):
 final classes are not
subclassed
 every class has a
superclass (but Object)
 constant pool references
 constant pool names
System-Software WS 04/05
JVM: Verification
Delayed for
performance
reasons
Pass 3 (Linking):
Pass 4 (RunTime):
For each operation in code
First time a type is referenced:
(independent of the path):
 load types when referenced
 operation stack size is the
 check access visibility
same
 class initialization
 accessed variable types
First member access:
are correct
 member exists
 method parameters are
 member type same as
appropriate
declared
 field assignment with
 current method has right to
correct types
access member
 opcode arguments are
Byte-Code
appropriate
Verification
370
© P. Reali / M. Corti
System-Software WS 04/05
JVM: Byte-Code Verification
Verification:
 branch destination must
exists
 opcodes must be legal
 access only existing locals
 code does not end in the
middle of an instruction
371
© P. Reali / M. Corti



types in byte-code must be
respected
execution cannot fall of the
end of the code
exception handler begin
and end are sound
System-Software WS 04/05
Addendum: Security
Security

internal protection
–
–

external protection
–

memory protection
file system accesses
accessibility
problems:
–
373
program threats
© P. Reali / M. Corti
System-Software WS 04/05
Security: Program Threats

Trojan horses: a code segment
that misuses its environment
–
–
–
–

mail attachments
web downloads (e.g., SEXY.EXE
which formats your hard disk)
programs with the same name as
common utilities
misleading names (e.g.,
README.TXT.EXE)
Trap door (in programs or
compilers): an intentional hole in
the software
374
© P. Reali / M. Corti
System-Software WS 04/05
Security: System Threats

worms: a standalone program that spawns other
processes (copies of itself) to reduce system
performance
–
example: Morris worm (1988)


–

exploited holes in rsh, finger and sendmail to gain
access to other machines
once on the other machine it was able to replicate itself
used by spammers to spread and distribute spamming
applications
viruses: similar to worms but embedded in other
programs
–
375
they usually infect other programs and
the boot sector
© P. Reali / M. Corti
System-Software WS 04/05
Security: System Threats

Denial of service
–
–

Example: SYN flooding attacks
–
–
–

perform many requests to steal all the available resources
often distributed (using worms)
the attacker tries to connect
the victim answers with a synchronize and acknowledge
packet
and waits for acknowledgment
Countermeasures
–
–
–
–
376
active filtering
request dropping
cookie based protocols (requests must be authenticated)
stateless protocols
© P. Reali / M. Corti
System-Software WS 04/05
Security: System Threats

badly implemented and designed software:
–
–
–
–
–
–
–
–
377
lpr (setuid) with an option to delete the printed file
mkdir (first create the inode then change the owner)
it was possible to change the inode before the chown …
buffer overflows
password in memory or swap files
insecure protocols (FTP, SMTP)
missing sanity checks (syscalls, command in input, …)
short keys and passwords
proprietary protocols
© P. Reali / M. Corti
System-Software WS 04/05
Bad design: A very recent example





Texas Instruments produces RFID tags offering
cryptographic functionalities.
used for cars and electronic payments
40 bit keys
proprietary protocol
Attack from Johns Hopkins University and RSA
Labs
–
–
378
less than 2 hours for 5 keys
less than 3500$
© P. Reali / M. Corti
System-Software WS 04/05
Security: Buffer Overflows

Overwrite a function’s return
address
function foo(int p1, int p2) {
char array[10];
strcpy(array, someinput);
}
p1 & p2
RET
FP
array
array

Avoid strcpy and check the length,
e.g., strncpy
379
© P. Reali / M. Corti
System-Software WS 04/05
Security: Monitoring

check for suspicious patterns
–


audit logs
periodic scans for security holes (bad passwords,
set-uid programs, changes to system programs)
–

login times
system integrity checks (checksums for executable files)
[tripwire]
network services
–
380
monitor network activity
© P. Reali / M. Corti
System-Software WS 04/05
Example: Firewalling



Many applications use network sockets to
communicate (even on a single machine)
Many applications are not protected
Solution: filter all the incoming connections by
default and allow only the trusted ones
381
© P. Reali / M. Corti
System-Software WS 04/05
Security: (some) Design Principles






Open systems (programs and protocols)
Default is deny access
Check for current authority (timeouts, …)
Give the least privilege possible
Simple protection mechanisms
Do not ask to much to the users (or they will avoid
to protect themselves)
382
© P. Reali / M. Corti
System-Software WS 04/05
Security and Systems: Some Examples
Enhancements to memory management:
 Intel XD bit, AMD NX bit
 mark pages according to the content (data or code)
 an exception is generated if the PC is moved to a
data address
 prevents some buffer overflow attacks
 dynamically generated code has to be generated
through special system calls
 Windows XP SP2, Linux, BSD …
383
© P. Reali / M. Corti
System-Software WS 04/05
Security and Systems: Some Examples
SELinux
 National Security Agency (USA)
 patches to the Linux kernel to enforce mandory
access control
 open source
 independent from the traditional UNIX roles (users
and groups)
 configurable policies restricting what a program is
able to do
384
© P. Reali / M. Corti
System-Software WS 04/05
Security and Systems: Some Examples
OpenBSD
 audit process (proactive bug search)
 random gaps in the stack
 ProPolice: gcc puts a random integer on the
stack in a call prologue and checks it when
returning
 W^X: pages are writable xor executable
385
© P. Reali / M. Corti
System-Software WS 04/05
Security and Systems: Some Examples
OpenBSD
 randomized shared library order and
addresses
 mmap() and malloc() return randomized
addresses
 guard pages between objects
 privilege separation and revocation
386
© P. Reali / M. Corti
System-Software WS 04/05
Privilege Separation

unprivileged child process to contain and restrict
the effects of programming errors
e.g., openssh
time

network connection
listen *22
fork unprivileged child
monitor
request auth
auth result
network
processing
key exchange
authentication
state export
fork user child
monitor
387
© P. Reali / M. Corti
request PTY
pass PTY
user request
processing
user network data
System-Software WS 04/05
Descargar

Uebungsbetrieb - Computer Systems Institute