The World Wide Web
Modified by Linda Kenney
2/4/08
10/3/2015
CS403 Introduction
Using the Web, it’s possible for anyone to
publish their own Web pages on a host
running a Web server and have those pages
available to any Internet user with a Web
browser.
CS403 The World Wide Web
Hypertext
The Web was invented in 1990.


But it was based on the concept of hypertext,
which had been around for decades.
The basic idea of hypertext is to take the
passive cross-references that are common
in printed text and make them active.
 When reading a book, a cross-reference passively
informs the reader where to turn for additional info and
the reader must manually perform the actions necessary
to obtain that additional info if it is desired.
 Examples?
Hypertext



On a computer, it’s easy to make cross-references
active. You notify the reader that additional info is
available, but let the computer take the actions
necessary to obtain that info if the reader desires
it.
Such an active cross-reference is called a
hyperlink (or just “link”) and text that contains
such links is called hypertext.
This concept is fundamental to the Web.
Web presentations
Most Web pages do not exist in isolation.

The vast majority of them are grouped
together into collections of pages with a
common purpose or theme.
 Such a collection of Web pages is called a Web
presentation or Web site.
 Typically, all the pages within a given
presentation are under the editorial control of a
single individual or organization.
Web presentations (cont.)

A given Web page is likely to contain
several links to other pages.
 Often, those links will lead to other resources
within the same presentation. These links are
called “local links” or “links to local
resources”.
 Some of those links may lead to other
resources which are part of a different
presentation. These links are called “remote
links” or “links to remote resources”.
Clients and servers on the Web
Like most Internet services, the Web is based
on the client/server model.

A Web browser is just a specific example of a
client program.
Clients and servers on the Web (cont.)

The browser can’t accomplish much
without the cooperation of a server.
 A Web server is a program that makes files
available to Web browsers upon request.


In general, the files a Web server makes available
contain Web pages and the images, sounds, videos
and other media that supplement them.
And all the files a Web server has access to are
generally stored in the secondary storage of the host
on which the server runs.
Hypertext Transfer Protocol
Hypertext Transfer Protocol (HTTP) is the
protocol that Web browsers and Web
servers use to communicate with one
another.


As a protocol, it carefully defines the range of
possibilities, determining precisely what a browser
may say to a server and when.
It also dictates what servers can say to browsers
and when.
Hypertext Transfer Protocol
[Figure: the browser asks the server, “I need the file page.html”; the server answers, “Here is the file page.html”.]
HTTP requests and responses
When “speaking” HTTP, a Web browser generally sends an HTTP GET
request to the Web server on a specific host requesting a specific
resource.

When it receives an HTTP GET request from a browser, a Web server, in
turn, sends some sort of HTTP response back to the browser.

Note that HTTP requests and responses rely on TCP (Transmission Control
Protocol) and IP to get across the Internet (see pp. 72-74).

In other words, HTTP is layered on top of TCP and IP.
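To make the request/response exchange concrete, here is a minimal Python sketch that splits a raw HTTP/1.1 response into its status code, reason phrase, headers and body. It assumes the complete response has already been read from the TCP connection; `parse_response` is a hypothetical helper, not a standard library function, and it skips details a real client must handle (chunked bodies, repeated headers).

```python
def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    # A blank line (CRLF CRLF) separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK" or "HTTP/1.1 404 Not Found".
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names ignore case
    return int(code), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>Hello</p>\n")
status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"], len(body))  # prints 200 OK text/html 13
```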
[Figure: the browser sends an HTTP GET request for /page.html to the server. The server replies with an HTTP response: either status code 200, headers such as Content-type: text/html and Content-length: 1634, and the contents of /page.html; or status code 404 Not Found and the contents of an error page.]
The server’s responsibilities
When it receives an HTTP GET request, a Web server
must prepare an appropriate HTTP response
message.

The request will specify the file it is requesting.
 The server must first locate the requested file within the
file system of its host.
 If the file cannot be located, the server sends back a
‘404 File not found’ response message.
The server’s responsibilities (cont.)

Having found the file, however, the server must
also verify that the file permissions allow it to
access the file.
 If the server is not able to access the file, it will typically
return a ‘403 Forbidden’ response message.

If the requested file is located and accessible, the
server generates a ‘200 OK’ response message
that includes the contents of the file as well as a
variety of headers that provide information about
the file, such as its type, size and last modified
date.
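The three outcomes above (404, 403, 200) can be sketched as a single function. This is an illustration under stated assumptions: `respond` and `doc_root` (the directory the server publishes) are hypothetical names, permission is tested with `os.access` for the server's own process, and the sketch does not guard against `..` in the pathname, so it is not safe as a real server.

```python
import mimetypes
import os
from email.utils import formatdate


def respond(doc_root: str, pathname: str) -> tuple[int, str, dict[str, str], bytes]:
    """Return (status code, reason, headers, body) for a GET of pathname."""
    # Map the requested pathname onto the host's file system.
    # NOTE: no ".." filtering; a real server must stop requests escaping doc_root.
    fs_path = os.path.join(doc_root, pathname.lstrip("/"))
    # First, locate the file.
    if not os.path.isfile(fs_path):
        return 404, "Not Found", {}, b"<p>Not Found</p>"
    # Next, verify that the server's process is allowed to read it.
    if not os.access(fs_path, os.R_OK):
        return 403, "Forbidden", {}, b"<p>Forbidden</p>"
    # Success: send the file's contents plus headers describing it.
    with open(fs_path, "rb") as f:
        body = f.read()
    mime_type, _ = mimetypes.guess_type(fs_path)
    headers = {
        "Content-Type": mime_type or "application/octet-stream",
        "Content-Length": str(len(body)),
        "Last-Modified": formatdate(os.path.getmtime(fs_path), usegmt=True),
    }
    return 200, "OK", headers, body
```

The 200 response carries the type, size and last-modified date mentioned on the slide; the error responses carry only a tiny stand-in page.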
Locating files
A typical host stores thousands of files, all of
which must be uniquely identified.


It’s impractical to give 100,000 files unique
names.
Instead, a host uses a file system consisting of a
hierarchy of directories to create uniquely
identified locations in which files may be stored.
Locating files (cont.)
Each location can be uniquely identified by
the sequence of steps necessary to reach it
from the top of the hierarchy.

The list of steps needed to reach a location from
the top of the hierarchy is called the absolute
path to that location, and every location has a
unique absolute path.
Locating files (cont.)

All items in a given location must have unique
names.
 So each item in the hierarchy can be uniquely identified
by combining its absolute path with its filename to form
an absolute pathname.
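The idea that the same filename can appear in many locations, yet every absolute pathname is unique, can be sketched with a toy file system. The nested dictionary `FILE_SYSTEM` and the helpers `absolute_pathname` and `lookup` are hypothetical; `PurePosixPath` from Python's standard library does the path handling.

```python
from pathlib import PurePosixPath

# A toy, hypothetical hierarchy: directories are dicts, files are strings.
FILE_SYSTEM = {
    "index.html": "<p>Home</p>",
    "products": {
        "index.html": "<p>Products</p>",
        "catalog": {"prod1.html": "<p>Product 1</p>"},
    },
}


def absolute_pathname(steps: list[str], filename: str) -> str:
    """Combine an absolute path (steps from the top) with a filename."""
    return str(PurePosixPath("/", *steps, filename))


def lookup(pathname: str) -> str:
    """Follow the steps of an absolute pathname down from the top."""
    *steps, filename = PurePosixPath(pathname).parts[1:]  # parts[0] is "/"
    directory = FILE_SYSTEM
    for step in steps:
        directory = directory[step]
    return directory[filename]
```

Both files named index.html coexist because /index.html and /products/index.html are different absolute pathnames.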
Uniform Resource Locators
Before a browser can request a resource, it needs to
know where it can find that resource and what type
of server will be providing it.

To find a specific resource, the browser must be told not
only the name of the file containing that resource, but also
what host it is on and where it is in the file system of that
host.
All the information needed to find a specific resource,
out of the billions available on the Web, is contained
in that resource’s Uniform Resource Locator (URL).
Every resource available on the Web is identified by a
unique URL that contains all the information
necessary for a browser to retrieve that resource.
Uniform Resource Locators (cont.)

The browser always does the same thing with the
URL: it requests the resource and renders it on
the screen.
 In computer science, we use the term render to refer to
the process of producing an image by interpreting some
data.
 A browser renders a Web resource by determining what
to display on the screen based upon what it finds in the
HTTP response that contains the contents of that
resource.
The anatomy of a URL
Consider a typical URL
http://www.sample.com/products/catalog/prod1.html


A URL typically begins with the protocol to use
when accessing the resource.
The remainder of the URL is the identifier that
tells the browser how to locate the resource.
 The identifier starts with a hostname that uniquely
identifies the host on which the resource is stored.
 The rest of the identifier is the pathname that uniquely
locates the resource in that host’s file system.

The pathname consists of a path and a file name.
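Python's standard `urllib.parse.urlsplit` can pull the sample URL apart into the pieces described above. The `anatomy` helper and its key names are illustrative, chosen to match the terms used on this slide.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def anatomy(url: str) -> dict[str, str]:
    """Break a URL into protocol, hostname, pathname, path and file name."""
    parts = urlsplit(url)
    pathname = PurePosixPath(parts.path)
    return {
        "protocol": parts.scheme,           # how to talk to the server
        "hostname": parts.hostname or "",   # which host stores the resource
        "pathname": parts.path,             # where it is in that host's file system
        "path": str(pathname.parent),       # the directories leading to it...
        "file name": pathname.name,         # ...and the file itself
    }


print(anatomy("http://www.sample.com/products/catalog/prod1.html"))
```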
The Web step-by-step – step 1
The process of displaying a Web resource begins
when the browser is given the URL of that resource
by the user.

The browser examines that URL to find out what it needs to
do next.
 The first part (e.g., http://) tells the browser what protocol to
use, and indirectly what type of server to contact.
 The identifier tells the browser where the resource is located.



The hostname in the identifier tells the browser which host is running the
server responsible for the resource.
The pathname in the identifier tells the browser precisely where the desired
resource is stored in that host’s file system.
Using this information, the browser composes an HTTP GET
request message.
 The GET request contains the pathname of the desired
resource as well as the hostname of the server’s host and
various other information.
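A minimal sketch of how a browser might compose the GET request message from a URL. It assumes an `http://` URL and HTTP/1.1 (which requires a Host header); `compose_get_request` is a hypothetical name, and query strings and other headers real browsers send (such as User-Agent) are left out.

```python
from urllib.parse import urlsplit


def compose_get_request(url: str) -> bytes:
    """Build the bytes of an HTTP/1.1 GET request for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    lines = [
        f"GET {parts.path or '/'} HTTP/1.1",  # method, pathname, protocol version
        f"Host: {parts.netloc}",              # hostname of the server's host
        "Connection: close",                  # ask the server to close when done
    ]
    # Every line ends in CRLF; an empty line marks the end of the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```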
The Web step-by-step – step 2
The HTTP GET request must be sent to the
appropriate server.

Since it must arrive in its entirety at a specific
host, the request gets sent over the Internet using
TCP and IP.
 To establish a TCP connection with the server, the
browser needs to know the IP address of the host
running the server.
 To get the IP address of the server’s host, the browser
resolves the hostname in the URL’s identifier using DNS.

Using the IP address of the server’s host, the
browser establishes a TCP connection with the server.
 The HTTP GET request message is sent to the server
over this connection. Since the request message is small,
it takes little time to send.
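Step 2 can be sketched with Python's `socket` module: resolve the hostname, open a TCP connection to the resulting IP address, send the small request and read the reply until the server closes the connection. `fetch` is a hypothetical helper; `socket.gethostbyname` asks the operating system's resolver (which normally consults DNS) and returns only an IPv4 address, and the sketch ignores HTTPS, redirects and IPv6.

```python
import socket
from urllib.parse import urlsplit


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Resolve, connect, send a GET request and return the raw response."""
    parts = urlsplit(url)
    port = parts.port or 80  # HTTP's default port
    # Resolve the hostname to an IPv4 address.
    ip_address = socket.gethostbyname(parts.hostname)
    request = (
        f"GET {parts.path or '/'} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    # Establish a TCP connection to the server's host and send the request.
    with socket.create_connection((ip_address, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        # Read until the server closes the connection.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```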
The Web step-by-step – step 3
When a Web server receives an HTTP GET
request, it composes an HTTP response.


Using the pathname specified in the request, the
server attempts to locate the file containing the
resource within the file system of its host.
Once the resource’s file has been located, the
server verifies that it has permission to access that
file.
The Web step-by-step – step 3 (cont.)
If the server is able to locate and access the file, the
HTTP response will indicate success.


The response will also indicate the date and time at which
the file was last modified, the type of resource the file
contains and how big it is.
And the server will include the contents of the resource’s file
in the response message.
 Note that this means the size of the response message is
primarily determined by the size of the resource being
requested.
If the server is unable to locate or access the file, the
HTTP response will indicate the nature of the
problem.

The response may also contain some content for the
browser to use in lieu of the requested resource.
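The claim that the response size is dominated by the resource can be checked with a small sketch that assembles a response message. `serialize_response` is a hypothetical helper; it computes Content-Length from the body so that header always matches, and the 50 KB body is made-up filler standing in for a real file.

```python
def serialize_response(status: int, reason: str, headers: dict[str, str], body: bytes) -> bytes:
    """Assemble status line, headers, blank line and body into one message."""
    # Content-Length is computed from the body so it always matches.
    all_headers = {**headers, "Content-Length": str(len(body))}
    head = f"HTTP/1.1 {status} {reason}\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in all_headers.items())
    return head.encode("iso-8859-1") + b"\r\n" + body


body = b"x" * 50_000  # made-up filler standing in for a 50 KB image file
message = serialize_response(200, "OK", {"Content-Type": "image/jpeg"}, body)
print(len(message) - len(body))  # prints 68 (bytes of status line and headers)
```

Only 68 of the message's 50,068 bytes are protocol overhead; the rest is the resource itself.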
The Web step-by-step – step 4
The server must now send the response back to the
requesting browser.


It gets the IP address for the browser from the packet
that carried the HTTP request.
Because they typically contain the contents of the requested
resource, HTTP response messages tend to be significantly
larger than HTTP request messages.
 To minimize the time a user must wait to receive a requested
resource, it’s up to the creator of that resource to minimize the
size of the file(s) containing the resource(s).
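A back-of-the-envelope calculation shows why the size of the response matters. Assuming, for illustration only, a 1 Mbit/s link and ignoring latency and protocol overhead, transfer time is simply the number of bits divided by the link speed; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealized transfer time: 8 bits per byte, no latency or overhead."""
    return size_bytes * 8 / bits_per_second


LINK_SPEED = 1_000_000  # assumed 1 Mbit/s connection

request_time = transfer_seconds(500, LINK_SPEED)       # a ~500-byte GET request
response_time = transfer_seconds(500_000, LINK_SPEED)  # a 500 KB image
print(request_time, response_time)  # prints 0.004 4.0
```

Halving the image's size halves the response time, which is why trimming files is left to the resource's creator.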
The Web step-by-step – step 5
Upon receiving an HTTP response message,
the browser is responsible for rendering the
resource it contains.

Many resources will be Web pages, which are
written in Extensible Hypertext Markup Language
(XHTML).
 Rendering a Web page involves interpreting the XHTML
to determine what the page should look like.

Other resources, however, will be other forms of
media such as images, sounds and video.
 Rendering multimedia resources involves interpreting the
data those resources contain and producing the image,
sound or video that data represents.

Browsers therefore need to understand a range of
resource types.
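One way to picture step 5 is a dispatch table that picks a rendering routine based on the resource type reported in the response (written as a MIME type such as text/html). The renderer functions and the `RENDERERS` table are hypothetical stand-ins that only describe what a real browser would do.

```python
from typing import Callable


def render_page(data: bytes) -> str:
    return f"laid out a Web page from {len(data)} bytes of markup"


def render_image(data: bytes) -> str:
    return f"decoded an image from {len(data)} bytes"


def render_media(data: bytes) -> str:
    return f"started playback of {len(data)} bytes of audio/video"


# Hypothetical dispatch table: exact MIME types first, then general types.
RENDERERS: dict[str, Callable[[bytes], str]] = {
    "text/html": render_page,
    "image": render_image,
    "audio": render_media,
    "video": render_media,
}


def render(mime_type: str, data: bytes) -> str:
    """Choose a renderer from the response's resource type."""
    general = mime_type.split("/", 1)[0]
    handler = RENDERERS.get(mime_type) or RENDERERS.get(general)
    if handler is None:
        return f"cannot render {mime_type}"  # a job for a helper or plug-in
    return handler(data)
```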
The Web step-by-step – step 5 (cont.)
It’s also useful to note at this stage that even
though a Web page may appear to contain
images, sounds and videos, each of those
resources must be stored separately in
its own file.


And each of those resources must therefore be
retrieved from a server with a separate HTTP
transaction.
So, the time it takes to retrieve a Web page depends on
the time it takes to retrieve all of its parts, not just the XHTML file itself.
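Because every embedded image, stylesheet or media file needs its own HTTP transaction, a browser has to scan a page for them. Below is a minimal sketch using Python's standard `html.parser.HTMLParser`; `ResourceFinder` is a hypothetical class, it only looks at a few common tags, and it treats `rel` as a single value for simplicity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect URLs of embedded resources, each needing its own GET request."""

    # Tags whose src attribute points at a separately stored resource.
    SRC_TAGS = {"img", "script", "audio", "video", "source", "embed"}

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in self.SRC_TAGS:
            reference = attributes.get("src")
        elif tag == "link" and attributes.get("rel") == "stylesheet":
            reference = attributes.get("href")
        else:
            reference = None
        if reference:
            # Relative references are resolved against the page's own URL.
            self.urls.append(urljoin(self.page_url, reference))


page = """<html><head><link rel="stylesheet" href="style.css" /></head>
<body><img src="images/logo.gif" alt="logo" />
<img src="/photos/team.jpg" alt="team" /></body></html>"""

finder = ResourceFinder("http://www.sample.com/products/catalog/prod1.html")
finder.feed(page)
print(len(finder.urls) + 1, "HTTP transactions in total")
```

For this sample page the browser needs four separate GET requests: one for the page and three for the resources it embeds.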
The browser lends a hand
Browsers can play a role in minimizing the time the
user must wait for a page to load.


A user often revisits the same resources
repeatedly.
So, what you want is for the browser to save a copy of the
resource so that you can return to it without having to
request it from the server again.
The browser cache
As a browser receives each requested
resource, it stores a copy of that resource in
a special place called the browser cache.

Along with the contents of the resource, it stores
the current date and time and the URL used to
retrieve the resource.
Each time a resource is requested, the
browser checks to see if that resource is
already stored in its cache.

If it’s not, then the browser goes about retrieving
the resource as we’ve already described.
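The cache check can be sketched as a dictionary keyed by URL. `BrowserCache` and the `retrieve` callback (standing in for a full HTTP transaction) are hypothetical; real browsers also decide when a cached copy is too old and revalidate it with the server, which this sketch omits even though it records when each copy was stored.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CacheEntry:
    content: bytes
    stored_at: float  # when the copy was saved (seconds since the epoch)


class BrowserCache:
    def __init__(self, retrieve: Callable[[str], bytes]):
        # retrieve(url) stands in for the full HTTP transaction.
        self._retrieve = retrieve
        self._entries: dict[str, CacheEntry] = {}

    def get(self, url: str) -> bytes:
        entry = self._entries.get(url)
        if entry is not None:
            return entry.content          # cache hit: no server contact
        content = self._retrieve(url)     # cache miss: go to the server
        self._entries[url] = CacheEntry(content, time.time())
        return content
```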
When things go wrong…
Although it often goes off without a hitch,
there are places in an HTTP transaction
where problems can occur.


Knowing what might go wrong can help us make
sense of otherwise cryptic or confusing error
messages we may get from our browser.
Of course, different browsers and servers are free
to use different error messages as they see fit, so
the wording may differ.
When things go wrong… (cont.)
If the hostname in the URL cannot be resolved
to an IP address using DNS, there’s no way
to establish the necessary TCP connection to
the server.
In this case, we’ll get an error to the effect of
“Unable to locate server”.
When things go wrong… (cont.)
The hostname may resolve, but the TCP
connection may still fail to be established
for a variety of other reasons.
In this case, we’ll get an error to the effect of
“No response”.
When things go wrong… (cont.)
If we’re able to get a TCP connection and send
an HTTP request to the server, there’s no
guarantee it will be successful.


If the server is unable to locate the requested file,
we’ll get an error to the effect of
“Not found”.
If the server locates the file but does not have
permission to access it, we’ll get an error to the
effect of
“Forbidden” or “Access denied”.
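The mapping from where a transaction fails to the kind of message you see can be sketched as follows. The message strings mirror the wording on these slides, not any particular browser's, and `describe_failure` is a hypothetical helper; it relies on the fact that Python raises `socket.gaierror` when a hostname cannot be resolved and other `OSError` subclasses (such as `ConnectionRefusedError` or a timeout) when a connection cannot be made.

```python
import socket
from typing import Optional

# Slide wording for outcomes reported by the server itself.
STATUS_MESSAGES = {200: "OK", 403: "Forbidden", 404: "Not found"}


def describe_failure(error: Optional[Exception] = None, status: Optional[int] = None) -> str:
    # Hostname could not be resolved: no TCP connection is possible.
    if isinstance(error, socket.gaierror):
        return "Unable to locate server"
    # Hostname resolved, but the TCP connection failed (refused, timed out, ...).
    if isinstance(error, OSError):
        return "No response"
    # The connection worked; the server's status code tells the rest.
    return STATUS_MESSAGES.get(status, f"Unexpected status {status}")
```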
…And how to fix it
Understanding the root cause of an error can
often help you devise a solution to the
problem.
…And how to fix it (cont.)
If you get an “Unable to locate server”
error, you know there’s a problem with the
hostname in the URL.



Double-check your typing of the hostname.
Make sure your network connection is still
working.
Ensure that your DNS server is functioning in
general.
…And how to fix it (cont.)
If you get a “No response” error, you know
the hostname is okay but the server is not
able to respond.


Often, there’s nothing you can do about this
yourself.
However, since this is often a temporary problem,
try again a little later.
…And how to fix it (cont.)
If you get a “Not found” error, you know
there’s a problem with the pathname in the
URL.


Again, double-check your typing, paying attention
to case.
Try eliminating steps from the pathname one at a
time, moving from right to left.
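The right-to-left trimming strategy can be generated mechanically. `trimmed_urls` is a hypothetical helper built on `urllib.parse`; each URL it returns is one step shorter than the last, ending at the top of the site.

```python
from urllib.parse import urlsplit, urlunsplit


def trimmed_urls(url: str) -> list[str]:
    """Return the URL with one more pathname step removed each time, right to left."""
    parts = urlsplit(url)
    steps = [step for step in parts.path.split("/") if step]
    candidates = []
    for keep in range(len(steps) - 1, -1, -1):
        path = "/" + "".join(step + "/" for step in steps[:keep])
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


for candidate in trimmed_urls("http://www.sample.com/products/catalog/prod1.html"):
    print(candidate)
```

The first candidate that loads shows how much of the pathname was correct.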
…And how to fix it (cont.)
If you get a “Forbidden” error, the problem
is with the permissions on the file containing
the requested resource.


If the file belongs to you, simply adjust the
permissions.
Otherwise, there’s little you can do about this
problem yourself except contact the owner of the
resource.
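On a Unix-like host, adjusting the permissions usually means granting read access to the account the Web server runs as; making the file readable by everyone (the equivalent of `chmod a+r`) is a common approach. Below is a minimal sketch with `os.chmod`, where `make_world_readable` is a hypothetical helper; it assumes a POSIX system, and the directories along the path also need execute (search) permission for the server.

```python
import os
import stat


def make_world_readable(path: str) -> int:
    """Add read permission for user, group and others; return the new mode bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = current | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    os.chmod(path, new_mode)
    return new_mode
```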
Resource types
As we’ve seen, the Web consists of a variety
of resource types.


In each HTTP response, the server includes an
indicator of the resource’s type so the browser
knows how to render it.
Since servers and browsers must agree on the
meaning of this type info, it needs to be
standardized.
Resource types (cont.)

The standard used for this purpose is called
Multipurpose Internet Mail Extensions (MIME).
 As you can tell from its name, MIME was originally
designed for use with e-mail.
 A MIME type consists of an indicator of the general
resource type (text, image, audio, etc.) followed by a /
followed by an indicator of the specific resource type
(html, jpeg, mpeg, etc.).

For example, XHTML files are assigned a MIME type of text/html.

JPEG image files are assigned a MIME type of image/jpeg.

MP3 sound files are assigned a MIME type of audio/mpeg.
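Because a MIME type is just a general type, a slash and a specific type, splitting one apart is straightforward. `split_mime_type` is a hypothetical helper; it also drops optional parameters such as `; charset=utf-8`, which can follow the type in an HTTP Content-Type header.

```python
def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Return (general type, specific type), e.g. ("image", "jpeg")."""
    # Drop optional parameters such as "; charset=utf-8"; types ignore case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    general, specific = essence.split("/", 1)
    return general, specific
```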
Filename extensions
The server needs to know the type of each resource
for which it is responsible.


Otherwise, it wouldn’t know what MIME type to list in the
HTTP response message.
Servers are set up to use the extension of the resource’s
filename to determine its type.
 A filename extension is part of the actual filename, but it
comes at the end and starts with a dot.
 Examples?

The server is configured to associate certain filename
extensions with specific MIME types.
Filename extensions (cont.)
For this reason, it’s important to name all of
the files containing your Web resources with
appropriate filename extensions.

We’ll generally use only a small number of
resource types in this course.
 XHTML files are given .html (or .htm) extensions.
 JPEG images are given .jpg (or .jpeg) extensions.
 GIF images are given .gif extensions.
 CSS files are given .css extensions.
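The server's extension-to-MIME-type configuration can be pictured as a lookup table covering the extensions listed above. `EXTENSION_TO_MIME` and `mime_type_for` are hypothetical; real servers ship much larger tables, and Python's own `mimetypes` module provides a similar lookup.

```python
from pathlib import PurePosixPath

# Hypothetical server configuration for the types used in this course.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".css": "text/css",
}


def mime_type_for(filename: str) -> str:
    """Look up a file's MIME type from its filename extension."""
    extension = PurePosixPath(filename).suffix.lower()
    # Fall back to a generic binary type when the extension is unknown.
    return EXTENSION_TO_MIME.get(extension, "application/octet-stream")
```

A file saved without a proper extension falls through to the generic type, which is why naming files carefully matters.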
What Browsers Understand
A browser understands the HTTP protocol for
retrieving Web pages.
 Most browsers also understand protocols for other Web
services like file transfer, instant messaging, e-mail and
network news.
A browser understands XHTML and HTML and
can interpret it in order to render Web pages.
 Many also understand other popular languages like CSS,
JavaScript and XML.
What Browsers Understand (cont.)
Most browsers understand common image
file formats like JPEG and GIF and can render
images stored in these formats.
 Some also understand image file formats like BMP and
PNG.
Many browsers understand other forms of
media as well.
 Flash presentations are used for interactive animations.
 MP3 is a file format commonly used for storing sounds
and music.
 MPEG and AVI are common file formats for storing video.
What Browsers Understand (cont.)
A good browser is designed to provide the
functionality most Web users are likely to
need.

Since people use the Web in many different ways,
most browsers are designed to accept two
different types of add-ons that extend their
capabilities.
Add-Ons: Helpers and Plug-Ins
(pp. 76-83)
An application is a program you run on your
computer to accomplish specific tasks.
You can obtain applications from retail
software stores or the Internet.
A browser often uses other applications to
display Web content it cannot handle itself.
You can customize what applications your
browser uses.
Helpers
A helper application is an application a
browser can launch. It can be any application
on your computer.

Examples?
When your browser encounters a file that
requires special handling, it looks for an
appropriate helper application and opens the
file in that application.
Plug-Ins
A browser plug-in is an application that
expands the capabilities of a web browser.
When you install a plug-in, you extend the
capabilities of your browser to handle a file
type that it wasn’t originally designed to
handle.
Any file requiring that plug-in will be
displayed inside the browser window, with
the plug-in working as if it were a part of
your browser.
Plug-Ins (cont.)
Plug-ins support everything from audio to
animation to documents.
Plug-ins increase your browser’s memory
requirements and launch time.
You can find Web pages to help you locate
plug-ins for your browser.
Common plug-ins and helper applications:
Key terms
Absolute path
Absolute pathname
Browser cache
Browsing
Conceptual network
File system
Filename extension
Helper app
Hostname
HTTP
HTTP GET request
HTTP HEAD request
HTTP response
Hyperlink
Hypermedia
Hypertext
Identifier
Link
Local link
MIME
MIME type
Pathname
Permissions
Plug-in
Remote link
Render
Scheme
URL
Web browser
Web presentation
Web server
Web site
World Wide Web
XHTML
Some information used from:

Web 101 by Lehnert and Kopec

CS403: Online Network Exploration