The Invisible Web
Gary Price, MLIS
George Washington University
Chris Sherman
Associate Editor
Search Engine Watch
How Search Engines Work
[Diagram: a crawler fetches pages (URL1 to URL4) from the Web and hands them to an indexer, which builds the search engine database. A sample query, "Eggs?", returns results ranked by relevance score (90%, 81%, 40%, 10%).]
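The crawler, indexer, and database pipeline on this slide can be sketched in a few lines. This is a minimal illustration, not how any real engine is built: the in-memory `TOY_WEB`, its page text, and the function names are all hypothetical, and ranking is left out entirely.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "URL1": ("all about eggs", ["URL2", "URL3"]),
    "URL2": ("eggo waffles", ["URL4"]),
    "URL3": ("your eggs", []),
    "URL4": ("egobrowser download", ["URL1"]),
}

def crawl(seed, web):
    """Breadth-first crawl: follow links from the seed, visiting each URL once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index: word -> set of URLs whose text contains that word."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, term):
    """Look up a single term; unranked, results sorted by URL."""
    return sorted(index.get(term.lower(), set()))

index = build_index(crawl("URL1", TOY_WEB))
```

A page that no crawled page links to never enters `pages`, so it can never be returned by `search`. That is the simplest way for content to end up invisible.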
What is the Invisible Web?
• “Stuff” that search engine crawlers
(spiders) cannot, or will not, add to their databases
• 2 to 50 times larger than the
visible Web
• Resources often much higher
quality than the visible Web
What is the Invisible Web?
• Certain file formats (PDF, Flash,
Office files, streaming media)
– Why? They aren’t HTML text
• Most real-time data (stock quotes,
weather, airline flight info)
– Why? Ephemeral & storage intensive
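The file-format point above can be pictured as a filter on the Content-Type a server reports. This is a rough sketch only, assuming a hypothetical crawler that indexes nothing but HTML; the names `INDEXABLE_TYPES` and `is_indexable` are invented for illustration.

```python
# Assumption: this hypothetical crawler indexes HTML text only.
INDEXABLE_TYPES = {"text/html"}

def is_indexable(content_type):
    """Return True if a Content-Type header value names an indexable format."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in INDEXABLE_TYPES
```

Under this rule a PDF (`application/pdf`) or a Flash file is fetched, rejected, and never reaches the index.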
What is the Invisible Web?
• Dynamically generated pages
(CGI, JavaScript, ASP, or most pages
with “?” in the URL)
– Why? Spider traps
• Web accessible databases
– Why? Spiders can’t type
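The "spider trap" rule above might look something like the following heuristic. It is a sketch under stated assumptions: the suffix list and the function names are hypothetical, and real crawlers apply their own, more nuanced rules.

```python
from urllib.parse import urlsplit

# Assumed markers of dynamically generated pages; illustrative only.
DYNAMIC_SUFFIXES = (".cgi", ".asp")

def looks_dynamic(url):
    """Return True if a cautious crawler might skip this URL as a possible spider trap."""
    if "?" in url:
        return True
    path = urlsplit(url).path.lower()
    return "/cgi-bin/" in path or path.endswith(DYNAMIC_SUFFIXES)

def crawlable(urls):
    """Keep only the URLs the heuristic does not flag as dynamic."""
    return [u for u in urls if not looks_dynamic(u)]
```

A database result page such as `http://example.com/search.asp?q=eggs` is flagged and skipped. Even without this rule the crawler would not reach it, because it cannot fill in the search form that generates the page.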
Invisible Web Gateways
• Intelliseek
– http://www.invisibleweb.com
– http://beta.profusion.com
• Complete Planet
– http://www.completeplanet.com/
• Librarians’ Index to the Internet
– http://www.lii.org
The Invisible Web
& The Librarian
The Need For Knowledge!
• Awareness that the IW Exists
Maybe the IW Holds the Content Your Users Can’t
Find! What is the cost in both wasted time/effort and
total frustration?
• Let Others Know About the IW
• Awareness of The Synonyms
– Invisible Web
– Deep Web
– Hidden Web
• Let the Content be Your Calling Card
Focus Less on the Amount of IW Data
The Invisible Web
& The Librarian
Why is the IW Useful to the Librarian
and the End User?
• Quality of Content (Authority)
• Deep Content on Subject Area (Comprehensiveness)
• Focused Databases (Limited Scope)
Smaller Universe of Documents to Search (Maximize
Precision/Recall)
The Invisible Web
& The Librarian
Why is the IW Useful to the
Librarian & the End User?
• Material Unavailable Elsewhere on the Web
(Uniqueness)
• Many Options to Limit, Sort, Interact with the Data
(Maximize Precision)
• Timeliness vs. Time Lag of General Search Tools
(Currency)
The Invisible Web
& The Librarian
The IW, The Librarian, The Future
• What Happens If/When the General Search Tools
Crawl IW Material? Good News? Bad News?
• General Search Tools May NOT:
– Offer Many Interactive/Limiting Tools
– Be Updated/Refreshed as Frequently (Time Lag)
Timeliness: Making Current Info Available Is One of
the Things the Net Does Well.
The Invisible Web
& The Librarian
The IW, The Librarian, The Future
• The Search Engine Business: Will IW Material Be a
Priority?
• Just One Dialog or SilverPlatter Database?
NO, in Terms of Content!!!
• Yes, Common Interface, Syntax
Perhaps XML will Assist
The Invisible Web
& The Librarian
Challenges
• It’s Not The Magic Bullet. It’s a Tool
• We Still Need Traditional Online Databases
• Learning Curve, Sorry!
• Database Selection, When To Use the IW?
• Numerous Interfaces, Syntax
• A Non-Stop Flow of New Material
The Invisible Web
& The Librarian
Things To Do!
• Build Your Own Collections
Internet Resource Collection Development
• Mine Entire Sites, Often the IW Material Gets Little or
No Notice In Reviews
• Create Links When Possible DIRECT to the Interface.
• “Save the Time of the Web Researcher”
• Keep Current
The Invisible Web
& The Librarian
Types of IW Content in Librarian Terms
• Bibliographic
- OPACs
- Subject Bibs
• Non-Bibliographic
- Full-Text
- Numeric
- Graphic
- Directory
- Real-Time
Future Trends
• Killer apps will lead the way
– Research Index (CiteSeer)
• Search engines will work harder to
“find” Invisible Web content
– Inktomi (Index Connect, Ultraseek)
– WhizBang (“wrappers”)
• No matter what, there will always
be a problem!
Coming Soon
Available: July 2001
CyberAge Books 0-910965-51-X
http://www.invisible-web.net
Invisible Web:
Computer Science
• McAfee World Virus Map
– http://www.mcafee.com
• ResearchIndex
– http://www.researchindex.com
Invisible Web:
Company Research
• European High-Tech Industry
Database
– http://www.tornadoinsider.com/radar/
• Kompass
– http://www.kompass.com
Invisible Web:
Intellectual Property
• Delphion Intellectual Property
Network
– http://www.delphion.com/
• esp@cenet (European Patent
Office) Patent Database
– http://ep.espacenet.com/
Invisible Web:
Dictionaries & Languages
• EuroDicAutom
– http://eurodic.ip.lu
• Verbix
– http://www.verbix.com/index.html
Invisible Web:
Art & Artists
• ADAM (Art, Design, Architecture &
Media Information Gateway)
– http://adam.ac.uk/
• Artcyclopedia
– http://www.artcyclopedia.com/
Invisible Web:
Real-Time Information
• Flight Tracker
– http://www.trip.com/ft/home/0,2096,1-1,00.shtml
• J-Track 3-D Satellite Locator
– http://liftoff.msfc.nasa.gov/realtime/JTrack/Spacecraft.html
Invisible Web:
Maps and Driving Directions
• MapBlast
– http://www.mapblast.com
• Streetmap.co.uk
– http://www.streetmap.co.uk/
Invisible Web:
Government Info
• Parline Database
– http://www.ipu.org
• United Nations Daily Press
Briefings
– http://www.un.org/News/
Invisible Web:
Health & Medicine
• Economics of Tobacco Control
Database
– http://www1.worldbank.org/tobacco/database.asp
• International Digest of Health
Legislation
– http://www.who.int
Invisible Web:
News & Current Events
• Cold North Wind Newspaper
Archive Project
– http://www.coldnorthwind.com
• Financial Times Global Archive
– http://www.globalarchive.ft.com
Invisible Web:
Science
• Great Barrier Reef Online Image
Catalogue
– http://www.gbrmpa.gov.au/corp_site/info_services/library/index.html
• Nuclear Explosions Database
– http://www.ausseis.gov.au/databases
Invisible Web:
Transportation
• Equasis (Merchant Ships)
– http://www.equasis.org/
• World Aircraft Accident Summary
(WAAS) Fatal Airline Accident
Subset
– http://www.waasinfo.net/
The Invisible Web - Information Today, Inc.