Search Engine
A Web search engine is a search engine designed to
search for information on the World Wide Web.
Information may consist of web pages, images and
other types of files.
E.g., Yahoo!
Search Engine
Basically, a search engine is a software
program that searches for sites based on
the words that you designate as search terms.
 Search engines look through their own
databases of information in order to find
what it is that you are looking for.
Web Search Engine
Web search engine is a tool designed to
search for information on the World Wide Web.
 The search results are usually presented
in a list and are commonly called hits.
 The information may consist of web
pages, images, information and other
types of files.
Web Search Engine
Some search engines also mine data
available in databases or open directories.
 Unlike Web directories, which are
maintained by human editors, search
engines operate algorithmically or are a
mixture of algorithmic and human input.
Web Search Engine
It is a program that searches
documents for specified keywords and returns a
list of the documents where the keywords were found.
Although "search engine" is really a general class
of programs, the term is often used to
specifically describe systems like Google, AltaVista
and Excite that enable users to search for
documents on the World Wide Web and
USENET newsgroups.
Web Search Engine
Search engines and directories are not the
same thing, although the term "search
engine" is often used interchangeably to describe both.
Search engines automatically create web
site listings by using spiders that "crawl"
web pages, index their information, and
optimally follow that site's links to other pages.
Web Search Engine
Spiders return to already-crawled sites on
a pretty regular basis in order to check for
updates or changes, and everything that
these spiders find goes into the search
engine database.
How it works?
Typically, a search engine works by sending out
a spider to fetch as many documents as possible.
Another program, called an indexer, then reads
these documents and creates an index based on
the words contained in each document.
Each search engine uses a proprietary algorithm
to create its indices such that, ideally, only
meaningful results are returned for each query.
How it works?
These algorithms include incredibly detailed processes and
methodologies, and are updated all the time.
This is a bare bones look at how search engines
work to retrieve your search results. All search
engines go by this basic process when
conducting search processes, but because there
are differences in search engines, there are
bound to be different results depending on which
engine you use.
How it works?
The searcher types a query into a search engine.
 Search engine software quickly sorts
through literally millions of pages in its
database to find matches to this query.
 The search engine's results are ranked in
order of relevancy.
Example of Search Engines
Google is always a safe bet for most
search queries, and most of the time your
search will be successful on the very first
page of search results.
 Yahoo is also a great choice, and finds a
lot of stuff that Google does not
necessarily pick up.
Example of Search Engines
There are some search engines out there that
are able to answer factual questions; among
these are BrainBoost,
Factbites, and Ask Jeeves.
There are quite a few search engines that will
help you narrow a search with clustered results or search
suggestions. Some of these include Clusty,
WiseNut, AOL Search, and Teoma, in addition
to Gigablast, AllTheWeb, and SurfWax.
Example of Search Engines
There are lots of great search engines that
deal primarily in academic and research
oriented results. Included among these are
Scirus, Yahoo Reference, National
Geographic Map Machine, MagPortal,
CompletePlanet, and FirstGov.
Example of Search Engines
Images on the Web are easy to find,
especially with targeted image search
engines such as Picsearch, Ditto, and of
course, Google has some fantastic image
search capabilities. You can also check
out lists of Image Search Engines, Directories and
Collections, or Clip Art, Buttons, Graphics, Icons and Images on the Web.
Example of Search Engines
There's so much multimedia on the Web
that your main problem will be finding
enough time to look at it all. Here are a
few places you can use to search for
sounds, movies, and music on the Web:
Loomia, Torrent Typhoon, The Internet
Movie Database, SingingFish, and
Podscope, among many other multimedia
search engines and sites.
Example of Search Engines
Finding someone with similar interests on
the Web via a blog or online community is
simple. Use LjSeek, Technorati, and
Daypop to search for blogs; find people
with ZoomInfo, Pretrieve, or
Zabasearch, and search for discussion
groups and message boards with dedicated board search tools.
Operation of Search Engine
A search engine operates in the following order:
 Web crawling
 Indexing
 Searching
Web Crawling
A Web crawler is a computer program that browses
the World Wide Web in a methodical, automated
manner. Other terms for Web crawlers are ants,
automatic indexers, bots, and worms or Web spider,
Web robot, or—especially in the FOAF (an acronym of
Friend of a friend) community—Web scutter.
This process is called Web crawling or spidering. Many
sites, in particular search engines, use spidering as a
means of providing up-to-date data. Web crawlers are
mainly used to create a copy of all the visited pages for
later processing by a search engine that will index the
downloaded pages to provide fast searches.
Web Crawling
Crawlers can also be used for automating maintenance
tasks on a Web site, such as checking links or validating
HTML code. Also, crawlers can be used to gather specific
types of information from Web pages, such as harvesting
e-mail addresses (usually for spam).
A Web crawler is one type of bot, or software agent. In
general, it starts with a list of URLs to visit, called the
seeds. As the crawler visits these URLs, it identifies all
the hyperlinks in the page and adds them to the list of
URLs to visit, called the crawl frontier. URLs from the
frontier are recursively visited according to a set of policies.
Internet bots, also known as web robots, WWW robots or simply
bots, are software applications that run automated tasks over the Internet.
There are important characteristics of the Web that make crawling it very difficult:
 its large volume,
 its fast rate of change, and
 dynamic page generation.
The behavior of a Web crawler is the outcome of a combination of
 a selection policy that states which pages to download,
 a re-visit policy that states when to check for changes to the pages,
 a politeness policy that states how to avoid overloading Web sites,
a parallelization policy that states how to coordinate distributed Web crawlers.
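To make the crawl loop concrete, here is a minimal Python sketch of the seed/frontier process described above. The seed URLs, page limit, and delay are illustrative assumptions, not values from any real crawler.

```python
# Minimal sketch of the seed/crawl-frontier loop described above.
# The page limit and politeness delay are illustrative assumptions.
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=10, delay=1.0):
    frontier = list(seeds)          # URLs still to visit (the "crawl frontier")
    visited = set()
    pages = {}                      # url -> HTML, for later indexing
    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue                # skip unreachable pages
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        # Selection policy (trivial here): enqueue every discovered link.
        frontier.extend(urljoin(url, link) for link in parser.links)
        time.sleep(delay)           # crude politeness policy
    return pages
```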
Search engine indexing collects, parses, and
stores data to facilitate fast and accurate
information retrieval. Index design incorporates
interdisciplinary concepts from linguistics,
cognitive psychology, mathematics, informatics,
physics and computer science. An alternate
name for the process in the context of search
engines designed to find web pages on the
Internet is Web indexing.
Popular engines focus on the full-text indexing
of online, natural language documents.
Media types such as video and audio and
graphics are also searchable.
Meta search engines reuse the indices of
other services and do not store a local index,
whereas cache-based search engines
permanently store the index along with the corpus.
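As a toy illustration of full-text indexing, the sketch below builds an inverted index mapping each word to the documents and positions where it occurs; the sample documents and tokenizer are invented for the example.

```python
# Toy full-text indexing: build an inverted index mapping each word
# to (doc_id, position) pairs for fast lookup at query time.
import re
from collections import defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """docs: dict mapping doc_id -> text. Returns word -> [(doc_id, pos), ...]."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index

docs = {1: "web search engines index web pages",
        2: "a crawler fetches pages for the indexer"}
index = build_index(docs)
print(index["pages"])   # [(1, 5), (2, 3)] -> found without rescanning documents
```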
A web search query is a query that a user
enters into a web search engine to satisfy
his or her information needs. Web search
queries are distinctive in that they are
unstructured and often ambiguous; they
differ greatly from standard query
languages, which are governed by strict
syntax rules.
"Googol" is the mathematical term for a 1
followed by 100 zeros. The term was coined
by Milton Sirotta, nephew of American
mathematician Edward Kasner, and was
popularized in the book, "Mathematics and
the Imagination" by Kasner and James
Google's mission is to
organize the world's information and
make it universally accessible and useful.
Type: Public
Founded: Menlo Park, California (September 7, 1998)
Headquarters: Mountain View, California, USA
Key people: Eric E. Schmidt, CEO/Director;
Sergey Brin, Co-Founder, Technology President;
Larry Page, Co-Founder, Products President;
George Reyes, CFO
Industry: Internet, Computer software
Revenue: US$16.593 billion ▲56% (2007)
Net income: US$4.203 billion ▲25% (2007)
Employees: 16,805 (December 31, 2007)
Slogan: Don't be evil
Founded by Larry Page and Sergey Brin
September 7, 1998
Begun as a research project
They hypothesized that a search engine that
analyzed the relationships between websites
would produce better ranking of results than
existing techniques, which ranked results
according to the number of times the search
term appeared on a page
Nicknamed – ‘Backrub’
Originally, the search engine used the
Stanford University website, with the domain
google.stanford.edu. The domain google.com was registered
on September 15, 1997 and the
company was incorporated as Google
Inc. on September 7, 1998 at a friend's
garage in Menlo Park, California.
Originally the name was to be "googol", but that
name had already been registered to another party.
Google's founders Larry Page and Sergey Brin
developed a new approach to online search that
took root in a Stanford University dorm room and
quickly spread to information seekers around the
globe. Named for the mathematical term "googol,"
Google operates websites at many international
domains, with the most trafficked being
www.google.com. Google is widely recognized as the
world's best search engine because it is fast,
accurate and easy to use. The company also
serves corporate clients, including advertisers,
content publishers and site managers, with cost-effective
advertising and a wide range of revenue-generating search services.
Google History
According to Google lore, company founders Larry Page and Sergey Brin
were not terribly fond of each other when they first met as Stanford
University graduate students in computer science in 1995. Larry was a 24-year-old University of Michigan alumnus on a weekend visit; Sergey, 23,
was among a group of students assigned to show him around. By January
of 1996, Larry and Sergey had begun collaboration on a search engine
called BackRub, named for its unique ability to analyze the "back links"
pointing to a given website. A year later, their unique approach to link
analysis was earning BackRub a growing reputation among those who had
seen it. In September 1998, Google Inc. opened its doors in Menlo Park,
California. Already, google.com, still in beta, was answering 10,000 search
queries each day. The press began to take notice of the upstart website with
the relevant search results, and articles extolling Google appeared in USA
TODAY and Le Monde. That December, PC Magazine named Google one
of its Top 100 Web Sites and Search Engines for 1998. Google was moving
up in the world.
Objectives and Goals
To push more development and understanding
into the academic realm.
To build system that reasonable numbers of
people can actually use. Usage data is important to
Google because they think some of the most
interesting research will involve leveraging the
vast amount of usage data that is available from
modern web systems.
To build an architecture that can support novel
research activities on large-scale web data.
To set up an environment where other researchers
can come in quickly, process large chunks of the
web, and produce interesting results that have
been very difficult to produce otherwise.
To set up a Space-lab like environment where
researchers or even students can propose and do
interesting experiments on our large-scale web data.
Features Overview
 The Google Toolbar enables you to conduct
a Google search from anywhere on the web
 Advertisers use the Google AdWords program to promote their
products and services on the web with
targeted advertising; Google believes
AdWords is the largest program of its kind.
 Site owners use the Google AdSense program to deliver ads
relevant to the content on their sites,
improving their ability to generate revenue and
enhancing the experience for their users.
Technology Overview
PageRank Technology: PageRank reflects
Google's view of the importance of web pages by
considering more than 500 million variables and 2
billion terms. Pages that Google believes are
important pages receive a higher PageRank and are
more likely to appear at the top of the search results.
PageRank also considers the importance of each
page that casts a vote, as votes from some pages
are considered to have greater value, thus giving the
linked page greater value. Important pages receive a
higher PageRank and appear at the top of the
search results. Google's technology uses the
collective intelligence of the web to determine a
page's importance. There is no human involvement
or manipulation of results, which is why users have
come to trust Google as a source of objective
information untainted by paid placement.
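The PageRank computation itself can be sketched as a simple power iteration. The tiny link graph and iteration count below are illustrative assumptions; the 0.85 damping factor is the commonly cited value from the original paper.

```python
# Compact sketch of PageRank by power iteration. The link graph is
# illustrative; real systems operate on billions of pages.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue            # dangling pages cast no votes (simplified)
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share   # each link "votes" for its target
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))  # C accumulates the most link "votes" here
```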
Hypertext-Matching Analysis: Google's
search engine also analyzes page content.
However, instead of simply scanning for
page-based text (which can be manipulated
by site publishers through meta-tags),
Google's technology analyzes the full
content of a page and factors in fonts,
subdivisions and the precise location of
each word. Google also analyzes the
content of neighboring web pages to ensure
the results returned are the most relevant to
a user's query.
Anchor Text
The text of links is treated in a special way in Google
search engine. Most search engines associate the
text of a link with the page that the link is on. In
addition, they associate it with the page the link
points to. This has several advantages. First, anchors
often provide more accurate descriptions of web
pages than the pages themselves. Second, anchors
may exist for documents which cannot be indexed by
a text-based search engine, such as images,
programs, and databases. This makes it possible to
return web pages which have not actually been crawled.
System Anatomy
 BigFiles
BigFiles are virtual files spanning multiple
file systems and are addressable by 64 bit
integers. The allocation among multiple file
systems is handled automatically. The
BigFiles package also handles allocation
and deallocation of file descriptors, since
the operating systems do not provide
enough for our needs. BigFiles also
support rudimentary compression options.
 Repository
The repository contains the full HTML of every
web page. Each page is compressed using zlib
(see RFC1950). The choice of compression
technique is a tradeoff between speed and
compression ratio. They chose zlib’s speed
over a significant improvement in compression
offered by bzip. The compression rate of bzip
was approximately 4 to 1 on the repository as
compared to zlib’s 3 to 1 compression. The
repository requires no other data structures to
be used in order to access it. This helps with
data consistency and makes development
much easier; they can rebuild all the other data
structures from only the repository and a file
which lists crawler errors.
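A hedged sketch of such a repository follows: each record stores a docID, the URL, and the zlib-compressed page. The record layout used here (docID plus two length fields) is an assumption for illustration, not the actual Google format.

```python
# Sketch of a repository storing each page zlib-compressed, prefixed
# by a small fixed header. The exact layout is an illustrative assumption.
import struct
import zlib

def write_record(f, doc_id, url, html):
    url_b, page_b = url.encode(), zlib.compress(html.encode())
    f.write(struct.pack("<QII", doc_id, len(url_b), len(page_b)))
    f.write(url_b)
    f.write(page_b)

def read_records(f):
    header = struct.Struct("<QII")
    while True:
        raw = f.read(header.size)
        if not raw:
            break
        doc_id, url_len, page_len = header.unpack(raw)
        url = f.read(url_len).decode()
        html = zlib.decompress(f.read(page_len)).decode()
        yield doc_id, url, html   # everything needed to rebuild other structures
```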
Document Index
The document index keeps information about
each document. It is a fixed width ISAM (Index
sequential access mode) index, ordered by
docID. The information stored in each entry
includes the current document status, a pointer
into the repository, a document checksum, and
various statistics. If the document has been
crawled, it also contains a pointer into a variable
width file called docinfo which contains its URL
and title. Otherwise the pointer points into the
URLlist which contains just the URL. This design
decision was driven by the desire to have a
reasonably compact data structure, and the ability
to fetch a record in one disk seek during a search.
 Lexicon
The lexicon has several different forms. One
important change from earlier systems is that
the lexicon can fit in memory for a reasonable
price. In the current implementation Google
can keep the lexicon in memory on a
machine with 256 MB of main memory. The
current lexicon contains 14 million words
(though some rare words were not added to
the lexicon). It is implemented in two parts:
a list of the words (concatenated together but
separated by nulls) and a hash table of pointers.
Hit List
A hit list corresponds to a list of
occurrences of a particular word in a
particular document including position, font,
and capitalization information. Hit lists
account for most of the space used in both
the forward and the inverted indices.
Because of this, it is important to represent
them as efficiently as possible.They chose
a hand optimized compact encoding since it
required far less space than the simple
encoding and far less bit manipulation than
Huffman coding
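As a hedged illustration of such a hand-optimized compact encoding, the sketch below packs one hit into two bytes: a capitalization bit, three bits of font size, and twelve bits of word position. The exact field widths are assumptions for the example.

```python
# Sketch of a compact 2-byte hit encoding in the spirit described
# above: 1 capitalization bit | 3 bits font size | 12 bits position.
def encode_hit(capitalized, font_size, position):
    assert 0 <= font_size < 8 and 0 <= position < 4096
    return (int(capitalized) << 15) | (font_size << 12) | position

def decode_hit(hit):
    return bool(hit >> 15), (hit >> 12) & 0x7, hit & 0xFFF

hit = encode_hit(True, 3, 1042)
print(hit.to_bytes(2, "big"), decode_hit(hit))  # 2 bytes per occurrence
```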
 Forward Index
The forward index is actually already partially
sorted. It is stored in a number of barrels
(we used 64). Each barrel holds a range of
wordID’s. If a document contains words that
fall into a particular barrel, the docID is
recorded into the barrel, followed by a list of
wordID’s with hitlists which correspond to
those words. This scheme requires slightly
more storage because of duplicated docIDs
but the difference is very small for a
reasonable number of buckets and saves
considerable time and coding complexity in
the final indexing phase done by the sorter.
Inverted Index
The inverted index consists of the same barrels
as the forward index, except that they have been
processed by the sorter. For every valid wordID,
the lexicon contains a pointer into the barrel that
wordID falls into. It points to a doclist of docID’s
together with their corresponding hit lists. This
doclist represents all the occurrences of that word
in all documents.
 Crawling the Web
In order to scale to hundreds of millions of
web pages, Google has a fast distributed
crawling system. A single URLserver serves
lists of URLs to a number of crawlers (we
typically ran about 3). Both the URLserver
and the crawlers are implemented in Python.
Each crawler keeps roughly 300 connections
open at once. This is necessary to retrieve
web pages at a fast enough pace. At peak
speeds, the system can crawl over 100 web
pages per second using four crawlers.
Each crawler maintains its own DNS
cache so it does not need to do a
DNS lookup before crawling each
document. Each of the hundreds of
connections can be in a number of
different states: looking up DNS,
connecting to host, sending request,
and receiving response. These factors
make the crawler a complex
component of the system. It uses
asynchronous IO to manage events,
and a number of queues to move
page fetches from state to state.
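Those connection states and queues can be mimicked with asynchronous IO. Below is a small illustrative asyncio sketch; the worker count, raw HTTP/1.0 request format, and host list are assumptions for the example, not Google's implementation.

```python
# Sketch of asynchronous IO with a queue moving fetches from state
# to state, loosely mirroring the description above.
import asyncio

async def fetch(host, path="/"):
    # states: looking up DNS / connecting to host
    reader, writer = await asyncio.open_connection(host, 80)
    # state: sending request
    writer.write(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    await writer.drain()
    # state: receiving response
    body = await reader.read()
    writer.close()
    await writer.wait_closed()
    return body

async def worker(queue, results):
    while True:
        host = await queue.get()
        try:
            results[host] = await fetch(host)
        except OSError:
            results[host] = None    # unreachable host
        queue.task_done()

async def crawl(hosts, n_workers=3):
    queue, results = asyncio.Queue(), {}
    for host in hosts:
        queue.put_nowait(host)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()              # wait until every fetch is done
    for w in workers:
        w.cancel()
    return results

# asyncio.run(crawl(["example.com", "example.org"]))
```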
 Indexing the Web
- Parsing
Any parser which is designed to run on the entire Web must
handle a huge array of possible errors. These range from
typos in HTML tags to kilobytes of zeros in the middle of a
tag, non-ASCII characters, HTML tags nested hundreds
deep, and a great variety of other errors that challenge
anyone’s imagination to come up with equally creative ones.
Indexing Documents into Barrels
After each document is parsed, it is encoded into a number
of barrels. Every word is converted into a wordID by using
an in-memory hash table -- the lexicon. New additions to the
lexicon hash table are logged to a file. Once the words are
converted into wordID’s, their occurrences in the current
document are translated into hit lists and are written into the
forward barrels.
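A minimal sketch of this step might look as follows: an in-memory lexicon assigns wordIDs, and hits go into range-partitioned barrels. The barrel count of 64 comes from the text, while the wordID space and simplified hit format are assumptions for the example.

```python
# Sketch of indexing parsed documents into barrels: each word gets a
# wordID from the in-memory lexicon; hits land in the barrel whose
# wordID range covers them.
from collections import defaultdict

lexicon = {}                        # word -> wordID (in-memory hash table)
N_BARRELS, MAX_WORDS = 64, 1 << 24  # assumed wordID space for the example

def word_id(word):
    if word not in lexicon:
        lexicon[word] = len(lexicon)  # new additions would also be logged to a file
    return lexicon[word]

barrels = [defaultdict(dict) for _ in range(N_BARRELS)]  # barrel: docID -> {wordID: hits}

def index_document(doc_id, words):
    hits = defaultdict(list)
    for pos, word in enumerate(words):
        hits[word_id(word)].append(pos)          # hit list: positions (simplified)
    for wid, hit_list in hits.items():
        barrel = barrels[wid * N_BARRELS // MAX_WORDS]  # range partition by wordID
        barrel[doc_id][wid] = hit_list
```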
- Sorting
In order to generate the inverted
index, the sorter takes each of the
forward barrels and sorts it by wordID
to produce an inverted barrel for title
and anchor hits and a full text inverted
barrel. This process happens one
barrel at a time, thus requiring little
temporary storage.
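A sorter for one barrel can be sketched in a few lines; here in-memory dictionaries stand in for the on-disk barrel files.

```python
# Sketch of the sorter: invert one forward barrel (docID -> {wordID:
# hits}) into an inverted barrel (wordID -> [(docID, hits), ...]).
from collections import defaultdict

def invert_barrel(forward_barrel):
    inverted = defaultdict(list)
    for doc_id, word_hits in forward_barrel.items():
        for wid, hits in word_hits.items():
            inverted[wid].append((doc_id, hits))
    return dict(sorted(inverted.items()))    # ordered by wordID
```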
• Searching
The goal of searching is to provide quality search
results efficiently. Commercial engines had already
made great progress in terms of efficiency, so Google focused more on the
quality of search in their research, although they believe
their solutions are scalable to commercial volumes with
a bit more effort. The Google query evaluation process
is shown in Figure 4.
To put a limit on response time, once a certain number
(currently 40,000) of matching documents are found,
the searcher automatically goes to step 8 in Figure 4.
This means that it is possible that sub-optimal results
would be returned. They are currently investigating
other ways to solve this problem. In the past, they
sorted the hits according to PageRank, which seemed
to improve the situation.
• Searching
• The Ranking System
 Google designed their ranking
function so that no particular factor
can have too much influence. Consider, for
example, a single-word query. In
order to rank a document for a
single-word query, Google looks at
that document's hit list for that word.
Google considers each hit to be one
of several different types (title,
anchor, URL, plain text large font,
plain text small font, ...) each of which
has its own type-weight.
The type-weights make up a vector indexed
by type. Google counts the number of hits
of each type in the hit list. Then every count
is converted into a count-weight. Count-weights
increase linearly with counts at first
but quickly taper off so that more than a
certain count will not help. They take the
dot product of the vector of count-weights
with the vector of type-weights to compute
an IR score for the document. Finally, the
IR score is combined with PageRank to
give a final rank to the document.
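A worked toy example of this scoring step is sketched below; all the weights, the taper cap, and the way the IR score is combined with PageRank are invented for illustration, since the actual values are not published.

```python
# Worked sketch of the ranking step above: count-weights dotted with
# type-weights give an IR score, then combined with PageRank.
def count_weight(count, cap=8.0):
    return min(count, cap)            # grows linearly, then tapers off

def ir_score(hit_counts, type_weights):
    """hit_counts / type_weights: dicts keyed by hit type."""
    return sum(count_weight(hit_counts.get(t, 0)) * w
               for t, w in type_weights.items())

type_weights = {"title": 10.0, "anchor": 8.0, "url": 6.0,
                "large_font": 3.0, "small_font": 1.0}   # assumed values
hit_counts = {"title": 1, "anchor": 4, "small_font": 20}

score = ir_score(hit_counts, type_weights)   # 10 + 32 + 8 = 50.0
pagerank = 0.002                             # hypothetical PageRank value
final_rank = score * pagerank                # one simple way to combine them
print(score, final_rank)
```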
• Result and Performance
The results are clustered by server. This helps
considerably when sifting through result sets.
Google relied on anchor text to determine this
was a good answer to the query.
All of the results are reasonably high quality
pages and, at last check, none were broken links.
This is largely because they all have high PageRank.
• Google strengths
 The interface is clear and simple.
 Pages load instantly.
 Placement in search results is never sold
to anyone.
- No other search engine accesses more of
the Internet or delivers more useful
information than Google. Google Search is
fast, with most results coming back to the
user in less than one second.
• Google Weakness
Some people love the results they get at
Google, others are often disappointed. To a
large extent, both the pluses and the minuses
derive from Google's ranking system, which (as
the folks at Google explain) depends
largely on the number of links to a particular
page and the relevance of the content on those
linking pages to the content on the target page,
and the quality of the pages doing the linking.
Suppose we want to know what Web pages
outside of our own site have links to our
pages. At Google, we can do a search using the
link: operator, or get the same results by going to their
"Advanced" search and using their "page
specific search" to find pages that link to a
particular page. But the results include
information that we don't want and don't
need.
• Google Security and Product Safety
As a provider of software, services and monetization
for users, advertisers and publishers on the Internet,
Google feel a responsibility to protect your privacy
and security. They recognize that secure products
are instrumental in maintaining the trust you place in
them and strive to create innovative products that
both serve your needs and operate in your best interest.
Google takes security issues very seriously and will
respond swiftly to fix verifiable security issues. Some
of their products are complex and take time to
update. When properly notified of legitimate issues,
they will do their best to acknowledge your emailed
report, assign resources to investigate the issue, and
fix potential problems as quickly as possible.
Google is designed to crawl and index the
Web efficiently and produce much more
satisfying search result than existing
systems. Google's technology uses the
collective intelligence of the web to
determine a page's importance. There is no
human involvement or manipulation of
results, which is why users have come to
trust Google as a source of objective
information untainted by paid placement.
Search - 4 elements - speed, accuracy, objectivity
and ease of use
Google examines billions of web pages to find the
most relevant pages for any query and typically
returns those results in less than half a second.
Google Gadgets - Sidebar plugins
Toolbar - to use Google search without visiting
the Google homepage.
Pagerank - PageRank reflects Google's view of
the importance of web pages
Hypertext Matching Analysis - analyzes page
content to ensure the results returned are the most
relevant to a user's query.
Google also pioneered the first wireless search
technology for on-the-fly translation of HTML to
formats optimized for WAP, i-mode, J-SKY, and EZWeb.
Google Desktop - made it easier for people to find and
share information on their own computers
Google Chat - connected people through Gmail and Talk, the first service to integrate email and instant messaging
within a web browser
Google Page Creator -made it even easier for anybody
to design and create web pages quickly and simply
Google Earth - This technology enables users to fly
through space, zooming into specific locations they
choose, and seeing the real world in sharp focus
The life span of a Google query
normally lasts less than half a second,
yet involves a number of different
steps that must be completed before
results can be delivered to a person
seeking information.
1. The web server sends the query to
the index servers. The content inside
the index servers is similar to the
index in the back of a book - it tells
which pages contain the words that
match the query.
2. The query travels to the doc
servers, which actually retrieve
the stored documents. Snippets
are generated to describe each
search result.
3. The search results are returned to
the user in a fraction of a second.
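Those three steps can be mimicked in a few lines; the word-to-docID tables, documents, and snippet logic below are invented for illustration.

```python
# Hedged sketch of the three-step query lifecycle above: index
# servers say WHICH documents match, doc servers return snippets.
index_servers = {"crawler": [1], "search": [1, 2], "engine": [2]}   # word -> docIDs
doc_servers = {1: "A crawler feeds the search index.",
               2: "A search engine answers queries."}

def handle_query(query):
    words = query.lower().split()
    # Step 1: index servers find documents matching every query word.
    matches = set(index_servers.get(words[0], []))
    for w in words[1:]:
        matches &= set(index_servers.get(w, []))
    # Step 2: doc servers retrieve documents and build snippets.
    snippets = {d: doc_servers[d][:40] + "..." for d in matches}
    # Step 3: results go back to the user.
    return snippets

print(handle_query("search engine"))   # {2: 'A search engine answers queries....'}
```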
Google is the BIGGEST search engine database in
the world
PageRank™ often finds useful pages. It is one of
the defaults that cannot be turned off in Google
and is not for sale. It works on a unique
combination of factors, some of which are:
Popularity - based on the number of links to a page
and the importance of the pages that link
Importance - traffic, quality of links
Word proximity and occurrence in results
Google has many useful ways to limit searches
Google offers special "fuzzy" searches that are
useful to search synonyms, find definitions, find
similar/related pages, and more
The shortcuts & special Google databases can
enhance certain types of research
Google Books and Google Scholar have great
potential for university-level research using the Web.
Lots of "stop words" which you have to precede
with a + to search or search in quotes.
Full Boolean logic is not supported: only OR, "-" for
"not," and AND (implied as the default).
Despite its default AND, Google sometimes returns
pages that do not contain all of your terms. Google
shows you these results because they are
"important" (rank high) in Google. The only way to
know whether your terms are in a page or why the
page was provided is to look at Google's cached copy.
Google Search Tips:
You can search for a phrase by using quotations ["like this"] or with a
minus sign between words [like-this].
You can search by a date range by using two dots between the years
When searching with a question mark [?] at the end of your phrase,
you will see sponsored Google Answer links, as well as definitions if available.
Google searches are not case sensitive.
By default Google will return results which include all of your search terms.
Google automatically searches for variations of your term, with
variants of the term shown in yellow highlight.
Google lets you enter up to 32 words per search query.
Headquarters in Sunnyvale, California
One of the leading Internet service providers in the Internet
business around the world.
Currently 500 million users globally visit the site.
More than 20 branches around the world and more
than 20 different languages worldwide.
40 types of popular awards since incorporation.
Founder David Filo & Jerry
Yang PhD candidates in
Electrical Engineering at
Stanford University’s.
Hobby tracking of personal
interest turn to biz when
100,000 user access it.
Incorporated 1995.
Publicly owned company
1st went public on
NASDAQ in April of 1996.
Feb 2000: Yahoo!
announces its fourth stock split.
April 2003: New Yahoo!
Search introduced.
Today it has become a
competitive Internet business.
Organized into categories and sub-categories.
Categories identified, from Arts and Humanities to
Society and Culture (and everything in between).
Focusing on classifying the data.
Inspiring people to make a positive impact on their
communities.
Investing in human effort:
human editors provide the brain power and intuition
needed to classify the web's many offerings.
Development cont.
Through this human effort, Yahoo has become the
de-facto Dewey Decimal System for categorizing
web sites.
it probably adds more sites to the guide than it has
in the past.
Yahoo automatically sends queries to its partner
AltaVista, should it fail to find a match within its own listings.
As Yahoo is not a search engine, it cannot offer the
same instant indexing service. But it competes with
search engines.
To maintain market share!
“The users are finding what they need,"
said Srinija Srinivasan, Yahoo's
Ontological Yahoo, or Director of Surfing.
"Our primary goal is to satisfy the users,
not the listers," Srinivasan said.
Buying Binge
Excellent user data: over 133 million users (2004).
Deep Relations
Knows searchers better; can personalize results by their profiles.
Advertisers prefer splashy, animated ads ~ relationships
bolstered with major auto makers & entertainment giants.
Commercial skew
65% of the results page is commerce-related vs 27% for Google;
no clear attempt to either target those users with
advertising or extend any real value-added services.
They have made acquisitions such as MyBlogLog and Flickr. For what purpose?
Publisher’s do not have a chance to leverage
Yahoo’s size and revenue potential the way that
Google has empowered their users, through the
contextual ad system.
Feb, 2004 launch by Reiterating it’s supports for biz
model at “period inclusion” stirred up controversy
What This Privacy Policy Covers
 How Yahoo!
treats personal information that
Yahoo! collects and receives.
 Yahoo! participates in the Safe Harbor
program developed by the U.S. Department of
Commerce and the European Union
In general...
 Yahoo! uses information to
customize the advertising and content you
see, fulfill your requests for products and
services, improve our services, contact you,
conduct research, and provide anonymous
reporting for internal and external clients.
 If a child
under age 13 attempts to register with Yahoo!, the child is asked
to create a Yahoo! Family Account to obtain parental permission.
 Yahoo! does not ask a child under age 13 for more
personal information, as a condition of participation,
than is reasonably necessary to participate in a given
activity or promotion.
Information Sharing and Disclosure
Yahoo! does not rent, sell, or share personal information with
other people or non-affiliated companies except to provide
products or services you've requested, when we have your
permission, or under the following circumstances:
Yahoo Search Tips:
By default Yahoo returns results that include all of your
search terms
To exclude words use a minus sign [cat -tabby] would
show all results about cats with no mention of tabby.
Yahoo search results also shows related searches,
which are based on other searches by users with similar interests.
To search for a map, use map [location]
To search for dictionary definitions use "define" [define
To search a single domain use site [
DVD] would search Webopedia for the term DVD.
Bing is a new search engine from Microsoft that was
launched on May 28, 2009.
Microsoft calls it a "Decision Engine," because it's
designed to return search results in a format that
organizes answers to address your needs.
When you search on Bing, in addition to providing
relevant search results, the search engine also
shows a list of related searches on the left-hand
side of the search engine results page (SERP).
You can also access a quick link to see recent
search history. Bing uses technology from a
company called Powerset, which Microsoft acquired in 2008.
Bing launched with several features that are unique in
the search market.
For example, when you mouse-over a Bing result a
small pop-up provides additional information for that
result, including a contact e-mail address if available.
The main search box features suggestions as you type,
and Bing's travel search is touted as being the best on
the net. Bing is expected to replace Microsoft Live Search.
Bing Search Tips:
You can search for feeds by using feeds: before the search term.
You can search Bing without a background image, or turn the
background image back on, using dedicated Bing URLs.
To change the number of search results returned
per page, click "Extras" (on top-right of page) and
select "Preferences". Under Web Settings / Results
you can choose 10, 15, 30 or 50 results
Semantic Web
The Semantic Web is a web of data. There is
lots of data we all use every day, and it is not
part of the web. I can see my bank statements
on the web, and my photographs, and I can see
my appointments in a calendar. But can I see my
photos in a calendar to see what I was doing
when I took them? Can I see bank statement
lines in a calendar?
Why not? Because we don't have a web of data.
Because data is controlled by applications, and
each application keeps it to itself.
Semantic Web
The Semantic Web is about two things. It is
about common formats for integration and
combination of data drawn from diverse sources,
whereas the original Web mainly concentrated
on the interchange of documents. It is also about
language for recording how the data relates to
real world objects. That allows a person, or a
machine, to start off in one database, and then
move through an unending set of databases
which are connected not by wires but by being
about the same thing.
Semantic Web
The word semantic stands for "the meaning of."
 The semantics of something is the meaning
of something.
 The Semantic Web = a Web with a meaning.
Semantic Web
The Semantic Web is a web that is able to
describe things in a way that computers can
understand:
 The Beatles was a popular band from Liverpool.
 John Lennon was a member of the Beatles.
 "Hey Jude" was recorded by the Beatles.
Sentences like the ones above can be
understood by people. But how can they be
understood by computers?
Semantic Web
Statements are built with syntax rules. The
syntax of a language defines the rules for
building the language statements. But how
can syntax become semantic?
 This is what the Semantic Web is all
about: describing things in a way that
computer applications can understand.
 The Semantic Web is not about links
between web pages.
Semantic Web
The Semantic Web describes the relationships
between things (like A is a part of B and Y is a
member of Z) and the properties of things
(like size, weight, age, and price).
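The Beatles statements from the earlier slide can be written as machine-readable (subject, predicate, object) triples, the data model the Semantic Web builds on; the predicate names below are illustrative assumptions.

```python
# Hedged sketch: the Beatles statements expressed as (subject,
# predicate, object) triples, the Semantic Web's (RDF) data model.
triples = [
    ("The Beatles", "is_a",        "popular band"),
    ("The Beatles", "from",        "Liverpool"),
    ("John Lennon", "member_of",   "The Beatles"),
    ("Hey Jude",    "recorded_by", "The Beatles"),
]

def objects(subject, predicate):
    """A machine can now answer: who recorded 'Hey Jude'?"""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("Hey Jude", "recorded_by"))   # ['The Beatles']
```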
•Hypermedia is a term that has been around
since the 1940's.
•It refers to information linked together in an
easily accessible way.
•The Internet thrives on hypermedia and allows
videos to be linked to graphic buttons or text
and other content found to be accessible simply
with a mouse click.
•Hypermedia is more a method for accessing
available information, with the information itself being the end result.
An example of hypermedia is hypertext links. When an
Internet user enters a search term in Google or Yahoo
and clicks the search button to find results, the
information is presented as hypertext links with a bit of
text describing the link. This helps the web surfer decide
if these links are relevant to them and if they are worth
viewing. If the first link is something that would be useful
based on the blurb provided, clicking on the hypermedia
— in this case a hypertext link — will take the web surfer
to relevant information regarding their search.
A blurb is a brief piece of writing used in the
advertising of a creative work. The classic
example of a blurb is the quote smeared across
the cover of a bestselling novel which reads
something like “absolutely thrilling.” Blurbs are
designed to drum up interest in the creative
work, hopefully thereby increasing sales, and
the hunt for blurbs is a perennial quest for many
artists, especially for people who are just starting
out in their field.
Hypermedia is used as a logical extension of
the term hypertext in which graphics, audio,
video, plain text and hyperlinks intertwine to
create a generally non-linear medium of
information. This contrasts with the broader term
multimedia, which may be used to describe non-interactive linear presentations as well as
hypermedia. It is also related to the field of
Electronic literature. The term "hypermedia" was first used in a 1965
article by Ted Nelson.
The World Wide Web is a classic example of
hypermedia, whereas a non-interactive cinema
presentation is an example of standard
multimedia due to the absence of hyperlinks.
The first hypermedia work was, arguably, the
Aspen Movie Map. Atkinson's HyperCard
popularized hypermedia writing, while a variety
of literary hypertext and hypertext works, fiction
and nonfiction, demonstrated the promise of hypermedia.
Most modern hypermedia is delivered via
electronic pages from a variety of systems
including Media players, web browsers,
and stand-alone applications. Audio
hypermedia is emerging with voice
command devices and voice browsing.
Hypermedia may be developed a number of ways.
Any programming tool can be used to
write programs that link data from internal variables
and nodes for external data files. Multimedia
development software such as Adobe Flash, Adobe
Director, Macromedia Authorware, and MatchWare
Mediator may be used to create stand-alone
hypermedia applications, with emphasis on
entertainment content. Some database software such
as Visual FoxPro and FileMaker Developer may be
used to develop stand-alone hypermedia
applications, with emphasis on educational and
business content management.
[Figure: the process of writing and reading using traditional linear media, contrasted with the process of writing and reading using non-linear hypermedia.]
Hypermedia and Human Memory
Human memory is associative. We associate pieces
of information with other information and create
complex knowledge structures. We often remember
information via association. That is, a person starts
with an idea which reminds them of a related idea or
a concept which triggers another idea. The order in
which a human associates an idea with another idea
depends on the context in which the person
wants information. That is, a person can start with a
common idea and can end up associating it with
completely different sequences of ideas on different occasions.
Hypermedia and Human Memory
When writing, an author converts his knowledge
which exists as a complex knowledge structure into
an external representation. Physical media such as
printed material and video tapes only allow us to
represent information in an essentially linear
manner. Thus the author has to go through a
linearisation process to convert his knowledge to a
linear representation. This is not natural. So the
author will provide additional information, such as a
table of contents and an index, to help the reader
understand the overall information organisation.
Hypermedia and Human Memory
Hypermedia, using computer supported links,
allows us to partially mimic writing and reading
processes as they take place inside our brain.
We can create non linear information structures
by associating chunks of information in different
ways using links. Further we can use a
combination of media consisting of text, images,
video, sound and animation to enrich the
representation of information.
Hypermedia and Human Memory
It is not necessary for an author to go through a
linearisation process of the author’s knowledge
when writing. Also the reader can have access
to some of the information structures the author
had when writing the information. This will help
the readers to create their own representation of
knowledge and to integrate it into existing
knowledge structures.
Hypermedia and Human Memory
In addition to being able to access information
through association, hypermedia applications
are strengthened by a number of additional
aspects. These include an ability to incorporate
various media, interactivity, vast data sources,
distributed data sources, and powerful search
engines. These make hypermedia a very
powerful tool to create, store, access and
manipulate information.
Hypermedia Linking
Hypermedia systems - indeed information in
general - contain various types of relationships
between elements of information. Examples of
typical relationships include similarity in meaning
or context (Vannevar Bush relates to
Hypermedia), similarity in logical sequence
(Chapter 3 follows Chapter 2) or temporal
sequence (Video 4 starts 5 seconds after Video
3), and containment (Chapter 4 contains Section 4.1).
Hypermedia Systems
Hypermedia allows these relationships to be instantiated
as links which connect the various information elements,
so that these links can be used to navigate within the
information space. We can develop different taxonomies
of links, in order to discuss and analyse how they are
best utilised.
One possible taxonomy is based on the mechanics of
the links. We can look at the number of sources and
destinations for links (single-source single-destination,
multiple-source single-destination, etc.), the directionality
of links (unidirectional, bidirectional), and the anchoring
mechanism (generic links, dynamic links, etc.).
Hypermedia Systems
A more useful link taxonomy is based on
the type of information relationships being
represented. In particular we can divide
relationships (and hence links) into those
based on the organisation of the
information space (structural links) and
those related to the content of the
information space (associative and
referential links).
Structural Links:
The information contained within the hypermedia
application is typically organised in some suitable fashion.
This organisation is typically represented using structural
links. We can group structural links together to create
different types of application structures. If we look, for
example, at a typical book, then this has both a linear
structure (from the beginning of the book linearly to the
end of the book) and usually a hierarchical structure (the
book contains chapters, the chapters contain sections, the
sections contain …). Typically in a hypermedia application
we try to create and utilise appropriate structures.
Structural Links:
These structures are important in that they provide a form
for the information space, and hence allow the user to
develop an understanding of the scale of the
information space, and their location within this space.
This is very important in helping the user navigate within
the information space. Structural relationships do not
however imply any semantic relationship between linked
information. For example, a chapter in a book which
follows another is structurally related, but may not contain
any directly related information. This is the role of
associative links.
Associative Links:
An associative link is an instantiation of a semantic
relationship between information elements. In other
words, completely independently of the specific structure
of the information, we have links based on the meaning
of different information components. The most common
example which most people would be familiar with is
cross-referencing within books ("for more information on
X refer to Y"). It is these relationships - or rather the links
which are a representation of the relationships - which
provide the essence of hypermedia, and in many
respects can be considered to be the defining characteristic of hypermedia.
Referential Links:
A third type of link which is often defined (and is related to
associative links) is a referential link. Rather than
representing an association between two related
concepts, a referential link provides a link between an
item of information and an elaboration or explanation of
that information. A simple example would be a link from a
word to a definition of that word. One simple way of
conceptualizing the difference between associative and
referential links is that the items linked by an associative
link can exist independently, but are conceptually related.
However the item at one end of a referential link exists
because of the existence of the other item.
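One way to make this taxonomy concrete is as a small data structure covering link type, multiple sources and destinations, and directionality; the field names below are illustrative assumptions.

```python
# Hedged sketch of the link taxonomy described above as a data
# structure: structural, associative, and referential links.
from dataclasses import dataclass
from enum import Enum

class LinkType(Enum):
    STRUCTURAL = "structural"      # organisation of the information space
    ASSOCIATIVE = "associative"    # semantic relation between concepts
    REFERENTIAL = "referential"    # item -> its elaboration/definition

@dataclass
class Link:
    sources: tuple[str, ...]       # supports multiple-source links
    destinations: tuple[str, ...]  # and multiple destinations
    kind: LinkType
    bidirectional: bool = False

links = [
    Link(("Chapter 2",), ("Chapter 3",), LinkType.STRUCTURAL),
    Link(("Vannevar Bush",), ("Hypermedia",), LinkType.ASSOCIATIVE, True),
    Link(("the word 'anchor'",), ("definition of 'anchor'",), LinkType.REFERENTIAL),
]
```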
Hypermedia Model
Run-time layer: presentation of the hypertext, user
interaction, dynamics
(Presentation specifications)
Storage layer: database containing a network of nodes and links.
Within-component layer: the contents/structure of nodes.
Designing a Hypermedia
Important questions in designing the
hypermedia are:
 Converting linear text to hypertext
 Text format conversions
 Dividing the text into nodes
 Link structures, automatic generation of links
 Are nodes in a database or are they separate
files on the file system?
 Client-server or standalone?
Designing a Hypermedia
Text indexing is a well-known problem area and results
from there can be used to study automatic generation of
links. In principle, a document can be analysed
semantically (with the help of AI), statistically or lexically
(by computing the occurrences of words). The problem in
semantic analysis is that natural language is not easy
for the computer to understand. In lexical analysis the
problems are, for example, the conflation of words and
recognition of phrases (e.g., in Finnish, matching matriisi, matriisin
and matriisilla but not jälki, jälkeen). Solutions:
Conflation algorithm
Stemming algorithm
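A crude suffix-stripping stemmer of the kind such solutions use can be sketched as follows; the suffix list is an invented toy, whereas real stemmers such as Porter's use ordered rule sets.

```python
# Hedged sketch of a suffix-stripping stemming/conflation algorithm.
# The suffix list is an invented toy example.
SUFFIXES = ["ation", "ing", "ers", "er", "es", "s"]

def stem(word):
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Conflation: different surface forms map to one index term.
print({w: stem(w) for w in ["index", "indexes", "indexing", "indexers"]})
```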
Hypermedia Applications
 The Crossroads: the ACM Student Magazine.
 SIGLINK Home Page: the ACM Special Interest Group on Hypermedia WWW server.
 Apple Computer: The Virtual Campus: the Apple Computer Higher Education Home Page and the Virtual Campus project.
 Amsterdam Tourist Guide: an example of WWW tourism; a similar presentation can be found at least for Paris.
 WWW Virtual Library: mathematical index search (CSC).
 HyperKalevala project, 1993-1995.
Hypermedia Systems
A well known hypermedia system is Intermedia
developed at Brown University's Institute for Research in
Information and Scholarship (IRIS) between 1985 and
1990 (see for example [Haan, ACM Comm. Jan 1992]).
Intermedia is a multiuser hypermedia framework where
hypermedia functionality is handled at system level.
Intermedia presents the user with a graphical file system
browser and a set of applications that can handle text,
graphics, timelines, animations and videodisc data.
There is also a browser for link
information, a set of linguistic tools and the
ability to create and traverse links. Link
information is isolated from the documents
and saved into a separate database.
The start and end position of the link are
called anchors.
World Wide Web
World Wide Web (WWW) is a global
hypermedia system on Internet. It can be
described as wide-area hypermedia
information retrieval initiative aiming to
give universal access to a large universe
of documents [Hug93]. It was originally
developed at CERN for transferring
research and ideas effectively throughout
the organization [Hug93].
World Wide Web
Through WWW it is possible to deliver
hypertext, graphics, animation and sound
between different computer environments.
To use WWW the user needs a browser,
for example NCSA Mosaic and a set of
viewers, that are used to display complex
graphics, animation and sound. NCSA
Mosaic is currently available on X Windows, Windows and Macintosh.
NCSA Mosaic and Netscape
The browser itself can read hypertext documents
that are marked with HyperText Markup Language
(HTML). HTML is based on Standard Generalized
Markup Language (SGML), and contains all
formatting and link information as ASCII text.
HTML documents can reside on different
computers on Internet, and a document is
referenced by a URL (Universal Resource Locator).
A URL is of the form http://host.name/path/doc.html, where host.name is the name of the computer
and doc.html is the path to the document.
NCSA Mosaic and Netscape
In order to create a node for the WWW, an
HTTP (Hypertext Transfer Protocol) server
application is needed. A link in a WWW
document is always expressed as a URL.
Links can be references to files on ftp servers,
Gophers, HTTP servers or Usenet newsgroups.
NCSA Mosaic and Netscape
Netscape is a popular WWW browser
developed by Netscape Communications
Corp. Netscape 1.1 supports some HTML
3.0 features (tables) and has an interesting
API that makes it possible to develop extensions.
Arena is an experimental WWW browser
developed at CERN. It supports HTML 3.0
and thus is able to display mathematical
formulas and tables.
Recently, The Mathsoft company has
announced MathBrowser, a WWW-browser
that can display HTML and MathCAD
documents. MathBrowser has a
computational engine and an interface similar
to MathCAD's, allowing the student to edit
MathCAD documents through the Internet.
MathBrowser is used to distribute a collection
of Schaum's Outline Series in electronic form.
HyperCard is hypermedia authoring
software for Macintosh computers. It is
based on a card metaphor. A HyperCard
application is called a stack or a collection
of stacks. Each stack consists of cards,
and only one card of a stack is visible at a
time. A card is displayed in a fixed-size window.
Hypertext links can be programmed by
creating buttons and writing a HyperTalk
script for the button.
LinksWare is a commercial hypermedia
authoring software for Macintosh that can
create hypertext links between text files
created with different word processors.
LinksWare uses a set of translators to
convert files to its own format (Claris
XTND system). This can make the
opening of a file very slow.
LinksWare can open files that contain
mathematical text, but files may be formatted
differently than in the original document;
in particular, formulae may not have
proper line heights. In addition, it cannot
create links to other applications. However, it
can create links to Apple script command files
that can open an application and execute
commands for that application.
Hyper-G is the name of a hypermedia project
currently under development at the IICM. Like
other hypermedia undertakings, Hyper-G will
offer facilities to access a diversity of databases
with very heterogeneous information (from
textual data, to vector graphics and digitized
pictures, courseware and software, digitized
speech and sound, synthesized music and
speech, and digitized movie-clips). Like other
hypermedia-systems it will allow browsing,
searching, hyperlinking, and annotation.
Future Directions of Hypermedia
There is a trend that hypertext features
start to appear in ordinary applications like
word processors, spreadsheets etc. This is
called hypertext functionality within an
application. Good examples of this are
Microsoft Internet Assistant, MathBrowser
and MatSyma. Eventually, this will lead to
system software containing support for
hypertext features: nodes, links and
anchors.