COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)
Models of Distributed Computing
Noah Mendelsohn
Tufts University
Email: [email protected]
Web: http://www.cs.tufts.edu/~noah
Architecting a universal Web
Identification: URIs
Interaction: HTTP
Data formats: HTML, JPEG, GIF, etc.
© 2010 Noah Mendelsohn
Goals
• Introduce basics of distributed system design
• Explore some traditional models of distributed computing
• Prepare for discussion of REST: the Web’s model
Communicating systems
[Figure: two machines, each with its own CPU, memory, and storage]
We have multiple programs, running asynchronously, sending messages
Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)
Communicating Sequential Processes
We’ve got pretty clean higher-level abstractions for use on a single machine
[Figure: two machines, each with its own CPU, memory, and storage]
Communicating systems
How can we get a clean model of two communicating machines?
[Figure: two machines, each with its own CPU, memory, and storage]
Large scale systems
How can we get a clean model of a worldwide network of communicating machines?
[Figure: the Internet]
What are the clean abstractions on this scale?
WARNING!!
• This is a very big topic…
• …many important approaches have been studied and used…
• …there is lots of operational experience, and also formalisms…
This presentation does not attempt to be either comprehensive or balanced… the goal is to introduce some key concepts
Traditional Models of Distributed Computing
Message Passing
Message passing
[Figure: two machines, each with its own CPU, memory, and storage]
Programs send messages to and from each other’s memories
Half duplex: one way at a time
[Figure: two machines exchanging messages in one direction at a time]
Full duplex: both ways at the same time
[Figure: two machines exchanging messages in both directions at once]
Message passing
• Data abstraction:
– Low level: bytes (octets)
– Sometimes: agreed metaformat (XML, C struct, etc.)
• Synchronization:
– Wait for message
– Timeout
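The two synchronization primitives above (wait for a message, or give up after a timeout) can be sketched with Python's standard `socket` module. This is only an illustration: the helper names `send_message` and `wait_for_message` are made up for this example, and a local socket pair stands in for two machines.

```python
import socket


def send_message(sock, payload: bytes) -> None:
    """Send raw bytes (octets); the receiver must know how to interpret them."""
    sock.sendall(payload)


def wait_for_message(sock, timeout_seconds: float):
    """Block until bytes arrive, or return None if the timeout expires first."""
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


# A connected pair of local sockets stands in for two machines (assumption).
left, right = socket.socketpair()

send_message(left, b"hello")
first = wait_for_message(right, 0.5)    # a message is waiting: b"hello"
second = wait_for_message(right, 0.1)   # nothing was sent: times out, None

left.close()
right.close()
```

Note that the receiver only sees bytes. Any structure (XML, a C struct layout, and so on) has to be agreed on separately by both programs.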
Interaction Patterns
Between pairs of machines
[Figure: two machines; a Request message flows one way and a Response flows back]
• Message passing: no constraints
• Common pattern: request/response
Traditional Models of Distributed Computing
Client Server
Client / server
[Figure: two machines; the client sends “Request service” and the server returns “Response”]
• Request / response is a traffic pattern
• Client / server describes the roles of the nodes
• Server provides service for client
Client / server
• Probably the most common distributed system architecture
• Simple and well understood
• Doesn’t explain:
– How to exploit more than 2 machines
– How to make programming easier
– How to prove correctness (though the simple model helps)
• Most client/server systems are request/response
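As a rough sketch of the distinction drawn above, the following Python example separates the traffic pattern (each request gets exactly one response) from the roles (one side only serves, the other only asks). The service itself (upper-casing text) and the names `serve` and `call` are invented for illustration, and the code assumes each small request arrives in a single `recv`.

```python
import socket
import threading


def serve(conn):
    """Server role: wait for a request, send one response, repeat."""
    with conn:
        while True:
            request = conn.recv(1024)
            if not request:          # client closed the connection
                break
            conn.sendall(request.upper())


def call(conn, text: str) -> str:
    """Client role: send one request and block until its response arrives."""
    conn.sendall(text.encode("utf-8"))
    return conn.recv(1024).decode("utf-8")


# A local socket pair stands in for the network between two machines.
client_sock, server_sock = socket.socketpair()
server_thread = threading.Thread(target=serve, args=(server_sock,))
server_thread.start()

replies = [call(client_sock, "hello"), call(client_sock, "flight 22")]

client_sock.close()
server_thread.join()
```

Because the client blocks in `call` until the reply arrives, requests and responses strictly alternate, which is the request/response pattern most client/server systems use.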
Traditional Models of Distributed Computing
N-Tier
N-tier – also called Multilevel Client/Server
[Figure: three machines in a chain; each Request is passed on to the next tier and each Response is passed back]
• Layered
• Each tier provides services for next higher level
• Reasons:
– Information hiding
– Management
– Scalability
Typical N-tier system: airline reservation
[Figure: a chain of tiers labeled “Browser or Phone App” (iPhone or Android Reservation Application), “Flight Reservation Logic”, and “Reservation Records”; the label “Application - logic” appears twice]
Many commercial applications work this way
The Web itself is a 2 or 3 Tier system
[Figure: Browser (e.g. Firefox), Proxy Cache (optional!, e.g. Squid), Web Server (e.g. Apache)]
Web Reservation System
[Figure labels: Browser or Phone App; HTTP; Proxy Cache (optional!, e.g. Squid); HTTP; Web-Based Reservation Application; RPC? ODBC? Proprietary?; Flight Reservation Logic; Reservation Records; “Application - logic” appears twice]
Web Publishing System
[Figure labels: Browser or Phone App; Content Distribution Network (e.g. Akamai); Content Web Site (e.g. cnn.com); Web-Based Reservation Application; Content Management System; Database or CMS]
Advantages of n-tier system
• Separation of concerns: each layer has its own role
• Parallelism and performance?
– If done right: multiple mid-tier servers work in parallel
– Back end systems centralize mainly data requiring sharing & synchronization
– Mid tier can provide shared, scalable caching
• Information hiding
– Mid-tier apps shielded from data layout
• Security
– Credit card numbers etc. not stored at mid-tier
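To make the layering concrete, here is a small in-process Python sketch of three tiers, loosely following the airline example. All class names, flight numbers, and seat counts are hypothetical, and the tiers are plain objects rather than separate machines. The mid tier hides the data layout from the front end and caches back-end lookups; cache invalidation is deliberately left out.

```python
class ReservationRecords:
    """Back-end tier: owns the shared data and its layout."""

    def __init__(self):
        self._seats = {"TU100": 12, "TU200": 0}   # made-up sample data
        self.lookups = 0                          # counts back-end requests

    def seats_left(self, flight: str) -> int:
        self.lookups += 1
        return self._seats[flight]


class ReservationLogic:
    """Mid tier: application logic plus a simple cache (never invalidated here)."""

    def __init__(self, records: ReservationRecords):
        self._records = records
        self._cache = {}

    def is_available(self, flight: str) -> bool:
        if flight not in self._cache:
            self._cache[flight] = self._records.seats_left(flight)
        return self._cache[flight] > 0


class PhoneApp:
    """Front tier: talks only to the mid tier, never to the records."""

    def __init__(self, logic: ReservationLogic):
        self._logic = logic

    def show(self, flight: str) -> str:
        status = "available" if self._logic.is_available(flight) else "sold out"
        return f"{flight}: {status}"


records = ReservationRecords()
app = PhoneApp(ReservationLogic(records))
lines = [app.show("TU100"), app.show("TU100"), app.show("TU200")]
```

The second request for the same flight is answered from the mid-tier cache, so the back end sees only two lookups for three front-end requests. A real system would also need a rule for when cached seat counts become stale.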
Other patterns
• Spanning tree
• Broadcast (send to many nodes at once)
• Flood
• Various P2P
• Etc.
Traditional Models of Distributed Computing
Remote Procedure Call
Remote Procedure Call
• The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis
• Key idea: an ordinary function call executes remotely
• The trick: the language runtime or helper code must automatically generate code to send parameters and results
• For languages like C: proxies and stubs are generated
– Not needed in dynamic languages like Ruby, JavaScript, etc.
• RPC is often (erroneously, in my opinion) used to describe any request / response system
RPC: Call remote functions automatically
Client program:
    x = sqrt(4)
Proxy (on the client):
    float sqrt(float n) {
        send n;
        read s;
        return s;
    }
Request: invoke sqrt(4)
Stub (on the server):
    void doMsg(Msg m) {
        s = sqrt(m.s);
        send s;
    }
Server function:
    float sqrt(float n) {
        …compute sqrt…
        return result;
    }
Response: result=2 (no exception thrown)
• Interface definition: float sqrt(float n);
• Proxies and stubs generated automatically
• RPC provides transparent remote invocation
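The proxy and stub roles in the diagram can be sketched in Python as follows. This is a hand-written illustration, not generated code: the JSON message layout, the names `SqrtProxy` and `stub_handle`, and the use of a direct function call as the "network" are all assumptions made for the example.

```python
import json
import math


def sqrt_impl(n: float) -> float:
    """The real function, which lives on the server."""
    return math.sqrt(n)


def stub_handle(request_bytes: bytes) -> bytes:
    """Server-side stub: unpack the request, call the function, pack the reply."""
    message = json.loads(request_bytes.decode("utf-8"))
    try:
        if message["function"] != "sqrt":
            raise ValueError("unknown function")
        reply = {"result": sqrt_impl(*message["args"])}
    except ValueError as err:
        reply = {"error": str(err)}      # exceptions must be marshalled too
    return json.dumps(reply).encode("utf-8")


class SqrtProxy:
    """Client-side proxy: looks like a local sqrt, but sends a message instead."""

    def __init__(self, transport):
        self._transport = transport      # callable: request bytes -> reply bytes

    def sqrt(self, n: float) -> float:
        request = json.dumps({"function": "sqrt", "args": [n]}).encode("utf-8")
        reply = json.loads(self._transport(request).decode("utf-8"))
        if "error" in reply:
            raise ValueError(reply["error"])
        return reply["result"]


# In this sketch the "network" is just a direct call into the stub.
proxy = SqrtProxy(stub_handle)
x = proxy.sqrt(4)   # 2.0
```

The caller writes `proxy.sqrt(4)` as if it were local. Everything between the proxy and the stub (encoding, transport, error reporting) is what an RPC system would normally generate from the interface definition.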
RPC: Pros and Cons
• Pros:
– Transparency is very appealing
– Simple programming model
– Useful as organizing principle even when not fully automated
• Cons:
– Getting language details right is tricky (e.g. exceptions)
– No client/server overlap: doesn’t work well for long-running operations
– May not optimize large transfers well
– Not all APIs make sense to remote: e.g. answer = search(tree)
– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)
Traditional Models of Distributed Computing
Distributed Object Systems
How do you build an RPC for this?
class Point {
    int x, y;
    int getx() {return x;}
    int gety() {return y;}
}
class Rectangle {
    …members and constructors not shown…
    Point getUpperLeft() {…};
    Point getLowerRight() {…};
}
Call method on remoted object:
int area(Rectangle r) {
    int width = r.getLowerRight().getx() - r.getUpperLeft().getx();
    int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
    return width * height;
}
Pass object to remote method:
myRect = new Rectangle;
…assume position set here…
int a = area(myRect); // REMOTE THIS CALL!
Distributed Object systems make this work!
Distributed object systems
• In the 1990s, seemed like a great idea
• Advantages of OO encapsulation & inheritance + RPC
• Examples
– CORBA (Industry standard)
– DCOM (Microsoft)
• Still quite widely used within enterprises
• Complicated
– Marshalling object references
– Distributed object lifetime management
– Brokering: which object provides the service today
– Remote “new”: creating objects on remote systems
– All the pros & cons of RPC, plus the above
• Generally not appropriate at Internet scale
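One way to picture "marshalling object references" is an object table: the server keeps the real objects and hands out small reference tokens, and every method call on a token becomes a message. The Python sketch below reuses the Point and Rectangle example; `ObjectTable` and `RemoteRef` are hypothetical names, and the "remote" side is simulated in the same process. It is not how CORBA or DCOM are actually implemented.

```python
import itertools


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getx(self):
        return self.x

    def gety(self):
        return self.y


class Rectangle:
    def __init__(self, upper_left, lower_right):
        self._ul, self._lr = upper_left, lower_right

    def getUpperLeft(self):
        return self._ul

    def getLowerRight(self):
        return self._lr


class ObjectTable:
    """Hypothetical owner-side table mapping reference ids to live objects."""

    def __init__(self):
        self.objects = {}
        self._ids = itertools.count(1)
        self.calls = 0                      # counts simulated round trips

    def export(self, obj):
        oid = next(self._ids)
        self.objects[oid] = obj
        return {"ref": oid}                 # this token is what crosses the wire

    def invoke(self, ref, method, *args):
        self.calls += 1
        result = getattr(self.objects[ref["ref"]], method)(*args)
        if isinstance(result, (int, float, str, bool, type(None))):
            return result                   # plain values are sent by value
        return self.export(result)          # objects are sent by reference


class RemoteRef:
    """Holder-side handle: forwards method calls to the owning table."""

    def __init__(self, table, ref):
        self._table, self._ref = table, ref

    def call(self, method, *args):
        result = self._table.invoke(self._ref, method, *args)
        if isinstance(result, dict) and "ref" in result:
            return RemoteRef(self._table, result)
        return result


def area(r):
    width = r.call("getLowerRight").call("getx") - r.call("getUpperLeft").call("getx")
    height = r.call("getLowerRight").call("gety") - r.call("getUpperLeft").call("gety")
    return width * height


table = ObjectTable()
rect = RemoteRef(table, table.export(Rectangle(Point(1, 2), Point(4, 6))))
a = area(rect)
```

Computing one area costs eight round trips, and each returned Point leaves an entry in the table that nothing ever releases. Those two effects are small-scale versions of the performance and lifetime-management problems listed above.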
Traditional Models of Distributed Computing
Some Other Options
Special Purpose Models
• Remote File System
– Network provides transparent access to remote files
– Examples: NFS, CIFS
• Remote Database
– Examples: ODBC, JDBC
• Remote Device
– Remote printing, disk drive etc.
• Virtual terminal
– One computer simulates an interactive terminal to another
Some other interesting models
• Broadcast / multicast
– Send messages to everyone (broadcast) / named group (multicast)
• Publish / subscribe (pub/sub)
– Subscribe to named events or based on query filter
– Example: “Call me whenever Pepsi’s stock price changes”
– Implements a distributed associative memory
• Reliable queuing
– Examples: IBM MQSeries, Java Message Service (JMS)
– Model: queued messages, preserved across hardware crashes
– Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions
– Depends on disk-based transaction systems at each node to keep queues
• Tuple spaces
– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM)
– Network-scale shared variable space, with synchronization
– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
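Of the models above, publish / subscribe is easy to sketch in a few lines. The Python example below is a single-process illustration only: the `Broker` class, the topic string `"stock/PEP"`, and the event fields are invented for the example, and a real pub/sub system would deliver events across the network.

```python
class Broker:
    """Minimal in-process broker: named topics plus an optional query filter."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, topic, callback, predicate=None):
        """Register interest in a topic; predicate (if given) filters events."""
        self._subscriptions.append((topic, predicate, callback))

    def publish(self, topic, event) -> int:
        """Deliver the event to every matching subscriber; return the count."""
        delivered = 0
        for sub_topic, predicate, callback in self._subscriptions:
            if sub_topic == topic and (predicate is None or predicate(event)):
                callback(event)
                delivered += 1
        return delivered


broker = Broker()
alerts = []

# "Call me whenever the stock price changes": a topic plus a filter.
broker.subscribe(
    "stock/PEP",
    alerts.append,
    predicate=lambda event: event["price"] != event["previous"],
)

changed = broker.publish("stock/PEP", {"price": 101.0, "previous": 100.0})
unchanged = broker.publish("stock/PEP", {"price": 101.0, "previous": 101.0})
other = broker.publish("stock/KO", {"price": 60.0, "previous": 59.0})
```

The publisher never names its receivers. It only names the topic, which is what lets subscribers come and go without the publisher changing.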
Stateful and Stateless Protocols
• Stateful: server knows which step (state) has been reached
• Stateless:
– Client remembers the state, sends to server each time
– Server processes each request independently
• Can vary with level
– Many systems like the Web run stateless protocols (e.g. HTTP) over streams… at the packet level, TCP streams are stateful
– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
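The difference can be illustrated with a toy "next page of results" service in Python. Both versions below are sketches with invented names (`StatefulServer`, `stateless_next_page`) and made-up data; the point is only where the cursor lives.

```python
ITEMS = ["a", "b", "c", "d", "e"]   # made-up data served by both versions


class StatefulServer:
    """Stateful: the server remembers how far each session has read."""

    def __init__(self):
        self._cursor = {}            # session id -> next offset

    def next_page(self, session_id, size=2):
        start = self._cursor.get(session_id, 0)
        self._cursor[session_id] = start + size
        return ITEMS[start:start + size]


def stateless_next_page(offset, size=2):
    """Stateless: the client sends its position with every request."""
    page = ITEMS[offset:offset + size]
    return page, offset + len(page)  # the client keeps the new offset


# Stateful: short requests, but the position is lost if the server restarts.
server = StatefulServer()
stateful_pages = [server.next_page("s1"), server.next_page("s1")]
after_restart = StatefulServer().next_page("s1")

# Stateless: each request is self-describing, so any server can answer it.
page1, offset = stateless_next_page(0)
page2, offset = stateless_next_page(offset)
```

After the simulated restart the stateful server starts the session over from the first page, while the stateless client simply carries on from the offset it holds.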
Advantages of stateless protocols
• Protocol usually simpler
• Server processes each request independently
• Load balancing and restart easier
• Typically easier to scale and make fault-tolerant
• Visibility: individual requests more self-describing
Advantages of stateful protocols
• Individual messages carry less data
• Server does not have to re-establish context each time
• There’s usually some changing state at the server at some level, except for completely static publishing systems
Text vs. Binary Protocols
Protocols can be text or binary on the wire
• Text: messages are encoded characters
• Binary: any bit patterns
• Pros and cons quite similar to those for text vs. binary file formats
• When sending between compatible machines, binary can be much faster because no conversion is needed
• Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
• HTTP 2.0 moving to binary (for message size and parsing speed)
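As a small illustration of the two encodings, the Python sketch below sends the same made-up message (an integer id and a floating-point value) both ways. The message layout and function names are assumptions for this example. The binary form uses the standard `struct` module with network byte order, so it is portable rather than a raw memory copy.

```python
import struct

# Binary layout (an assumption for this sketch): network byte order,
# one unsigned 32-bit integer followed by one 64-bit float.
WIRE_FORMAT = "!Id"


def encode_text(msg_id: int, value: float) -> bytes:
    """Text: human-readable characters, parsed on receipt."""
    return f"{msg_id} {value!r}\n".encode("ascii")


def decode_text(data: bytes):
    id_field, value_field = data.decode("ascii").split()
    return int(id_field), float(value_field)


def encode_binary(msg_id: int, value: float) -> bytes:
    """Binary: fixed-size bit patterns, no character conversion."""
    return struct.pack(WIRE_FORMAT, msg_id, value)


def decode_binary(data: bytes):
    return struct.unpack(WIRE_FORMAT, data)


text_msg = encode_text(7, 2.0)        # b"7 2.0\n"
binary_msg = encode_binary(7, 2.0)    # always 12 bytes
```

The text form can be read and debugged by eye and varies in length with the values. The binary form has a fixed size and needs no number parsing, but both sides must agree exactly on the layout.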
Summary
• The machine-level model is complex: multiple CPUs, memories
• A number of abstractions are widely used for limited-scale distribution
• RPC is among the most interesting and successful
• Statefulness / statelessness is a key design tradeoff
• We’ll see next time why a new model was needed for the Web