Taxonomy Strategies LLC
Benchmarking Taxonomies
Joseph A. Busch
30 November 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Who I am: Joseph Busch
 Over 25 years in the business of organized information.
– Founder, Taxonomy Strategies LLC
– Director, Solutions Architecture, Interwoven
– VP, Infoware, Metacode Technologies
– Program Manager, Getty Foundation
– Manager, Pricewaterhouse
 Metadata and taxonomies community leadership.
– President, American Society for Information Science & Technology
– Director, Dublin Core Metadata Initiative
– Adviser, National Research Council Computer Science and Telecommunications Board
– Reviewer, National Science Foundation Division of Information and Intelligent Systems
– Founder, Networked Knowledge Organization Systems/Services
Taxonomy Strategies LLC The business of organized information
Recent & current projects
Agenda
 Qualitative methods
– Walk-throughs
– Usability testing
– User satisfaction surveys
– Tagging samples
 Quantitative methods
Qualitative taxonomy benchmarking methods
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist; SME; Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough Taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Walk-through method—
Show & explain
ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Non-Dell Brands
Audience: All; Business; Dell Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region/Country: All; Asia-Pacific; Canada; Dell EMEA; Japan; Latin America & Caribbean; United States
Walk-through method—
Editorial rules consistency check
 Abbreviations
 Ampersands
 Capitalization
 General…, More…, Other…
 Languages & character sets
 Length limits
 Multiple parents
 Plural vs. singular form
 Scope notes
 Serial comma
 Sources of terms
 Spaces
 Synonyms & acronyms
 Term order (Alphabetic or …)
 Term label order (Direct vs. inverted)

Rule Name: Abbreviations
Editorial Rule: Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels.
Example: Public Information
NOT: Public Info.

Rule Name: Ampersands
Editorial Rule: The ampersand [&] character shall be used instead of the word ‘and’.
Example: Licensing & Compliance
NOT: Licensing and Compliance

Rule Name: Capitalization
Editorial Rule: Title case capitalization shall be used.
Example: Customer Service
NOT: CUSTOMER SERVICE
NOT: Customer service
NOT: customer service

Rule Name: General…, More…, Other…
Editorial Rule: The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable.
Example: “Other Property”, “Other Services”, “General Information”, “General Audience”

…
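Several of these editorial rules are mechanical enough to check automatically before a walk-through. The following is a minimal sketch only, covering the three rules quoted above (Abbreviations, Ampersands, Capitalization). The function name `check_label`, the `SMALL_WORDS` list, and the regular expressions are illustrative assumptions, not part of the original rule set, and a real check would need exceptions for acronyms and colloquial terms.

```python
import re

# Assumption: short words allowed to stay lowercase in title case.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to", "vs"}

def check_label(label):
    """Return the names of the editorial rules a term label appears to break."""
    problems = []
    # Abbreviations: flag a letter followed by a period, e.g. "Info."
    if re.search(r"[A-Za-z]\.(?:\s|$)", label):
        problems.append("Abbreviations")
    # Ampersands: the word "and" should be written "&".
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")
    # Capitalization: heuristic title-case test on the significant words.
    words = [w for w in re.findall(r"[A-Za-z]+", label)
             if w.lower() not in SMALL_WORDS]
    starts_lower = any(w[0].islower() for w in words)
    shouting = len(words) >= 2 and all(w.isupper() for w in words)
    if starts_lower or shouting:
        problems.append("Capitalization")
    return problems

for label in ["Public Info.", "Licensing and Compliance", "Customer service"]:
    print(label, "->", check_label(label))
```

Running the check over every label in a draft taxonomy gives the taxonomist a short list to review by hand; it does not replace the editorial judgment the walk-through is meant to apply.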
Usability method—
Task-based card sorting (1)
 15 representative questions were selected
– Perspective of various organizational units
– Most frequent website searches
– Most frequently accessed website content
 Correct answers to the questions were agreed in advance by the team.
 15 users were tested
– Did not work for the organization
– Represented target audiences
 Testers were asked “where would you look for …”
– “under which facet… Topic, Commodity, or Geography?”
– Then, “… under which category?”
– Then, “… under which sub-category?”
 Tester choices were recorded
 Testers were asked to “think aloud”
– Notes were taken on what they said
 Pre- and post-test questions were asked
– Tester answers were recorded
Usability method—
Task-based card sorting (2)
Question 3. What is the average farm income level in your state?

Facets:
1. Topics
2. Commodities
3. Geographic Coverage

1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets

1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes
Analysis of task-based card sorting (1)
Find-it Tasks | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton | Cotton | Cotton | Asia | Cotton | Cotton
2. Mad cow | Cattle | Food Safety | Cattle | Cattle | Cattle
3. Farm income | Farm Income | Farm Income | US States | Farm Income | Farm Income
4. Fast food | Food Consumption | Diet Quality & Nutrition | Food Expenditures | Diet Quality & Nutrition | Diet Quality & Nutrition
5. WIC | WIC Program | WIC Program | WIC Program | WIC Program | WIC Program
6. GE Corn | Corn | Corn | Corn | Corn | Corn
7. Foodborne illness | Foodborne Disease | Foodborne Disease | Consumer Food Safety | Foodborne Disease | Foodborne Disease
8. Food costs | Retailing & Wholesaling | Food Prices | Market Structure | Market Analysis | Food Expenditures
9. Tobacco | Tobacco | Tobacco | Tobacco | Tobacco | Tobacco
10. Small Farms | Farm Structure | Farm Structure | Farm Structure | Farm Structure | Farm Structure
11. Traceability | Food System | Labeling Policy | Food Safety Innovations | Food Safety Policy | Food Prices
12. Hunger | Food Security | Food Security | Food Security | Food Security | Food Security
13. Trade balance | Commodity Trade | Trade & Intl Markets | Commodity Trade | Market Analysis | Commodity Trade
14. Conservation | Cropping Practices | Conservation Policy | Conservation Policy | Conservation Policy | Conservation Policy
15. Trade restrictions | Trade Policy | Food Safety & Trade | Market Analysis | Commodity Trade | WTO
Analysis of task-based card sorting (2)
 In 80% of the trials users looked for information under the
categories that we expected them to look for it.
 Breaking-up topics into facets makes it easier to find
information, especially information related to
commodities.
Analysis of task-based card sorting (3)
Test Questions | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100% | 100%
6. GE corn | 100% | 100%
7. Foodborne illness | 82% | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100% | 100%
10. Small farms | 91% | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91% | 91%
15. Trade restrictions | 55% | 36%

Legend: Possible change required. Change required.
 11. Traceability: Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
 8. Food costs and 15. Trade restrictions: On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
 13. Trade balance: Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”
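The two columns in this analysis can be computed directly from the recorded tester choices. The sketch below assumes that "% Correct" is the share of testers who chose a category the team agreed on in advance, and that "% Agree" is the share who chose the single most popular category; the slides do not spell out either definition. The function name `score_task` and the "correct" sets in the usage lines are hypothetical.

```python
from collections import Counter

def score_task(choices, correct):
    """Score one find-it task.

    choices: the category each tester picked.
    correct: set of categories agreed in advance as right answers.
    Returns (% correct, % agree), rounded to whole percentages.
    """
    n = len(choices)
    pct_correct = 100 * sum(c in correct for c in choices) / n
    # Agreement (assumed definition): size of the largest group of testers.
    top_count = Counter(choices).most_common(1)[0][1]
    pct_agree = 100 * top_count / n
    return round(pct_correct), round(pct_agree)

# Choices taken from the five-user table; the "correct" sets are illustrative.
mad_cow = ["Cattle", "Food Safety", "Cattle", "Cattle", "Cattle"]
trade_balance = ["Commodity Trade", "Trade & Intl Markets", "Commodity Trade",
                 "Market Analysis", "Commodity Trade"]
print(score_task(mad_cow, {"Cattle"}))
print(score_task(trade_balance, {"Trade & Intl Markets"}))
```

A task where agreement is much higher than correctness, as in the second call, suggests the testers may be right and the pre-agreed answer wrong.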
User satisfaction method—
Card Sort Questionnaire (1)
 Was it easy, medium or difficult to choose the appropriate
Topic?
– Easy
– Medium
– Difficult
 Was it easy, medium or difficult to choose the appropriate
Commodity?
– Easy
– Medium
– Difficult
 Was it easy, medium or difficult to choose the appropriate
Geographic Coverage?
– Easy
– Medium
– Difficult
User satisfaction method—
Card Sort Questionnaire (2)
[Chart: average difficulty rating by facet (Topic, Commodity, Geography), on a scale running from Easy to Difficult with axis ticks from 0.50 to 2.00.]
User interface survey—
Which search UI is ‘better’?
 Criteria
– User satisfaction
– Success completing tasks
– Confidence in results
– Fewer dead ends
 Methodology
– Design tasks from specific to general
– Time performance
– Calculate success rates
– Survey subjective criteria
– Pay attention to survey hygiene: participant selection, counterbalancing, T-scores
Source: Yee, Swearingen, Li, & Hearst
User interface survey — Results (1)
Which interface would you rather use for these tasks? | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…

Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst
User interface survey — Results (2)
[Bar chart: ratings on a 0 to 9 scale for the Google-like Baseline and Faceted Category interfaces on eight attributes: Easy to Use, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, and Overwhelming.]
Source: Yee, Swearingen, Li, & Hearst
Tagging samples—
How many items?
Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies * | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Preexisting metadata is the most meaningful.
* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different from how “real” users would categorize content.
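The "3-10X number of categories" guideline for quantitative tests is simple arithmetic, sketched here only to make the scale concrete. The function name `quantitative_sample_range` and its parameter names are illustrative; the multipliers 3 and 10 come from the guideline above, and 179 is the vocabulary size quoted later in this deck.

```python
def quantitative_sample_range(n_categories, low=3, high=10):
    """Return the (minimum, maximum) number of tagged items suggested
    by the 3-10X rule of thumb for a vocabulary of n_categories."""
    return n_categories * low, n_categories * high

# A 179-term vocabulary would call for roughly this many tagged items.
print(quantitative_sample_range(179))
```

Even at the low end this is well beyond what a team would tag by hand in a spreadsheet, which is why the guideline points to computer-assisted methods or preexisting metadata.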
Tagging samples—
Manually tagged metadata sample
Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999
Tagging samples—
Spreadsheet for tagging 10s-100s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions
Rough bulk tagging—
Facet demo (1)
 Collections: 4 content sources
 NTRS, SIRTF, Webb, Lessons Learned
 Taxonomy
 Converted MultiTes format into RDF for Seamark
 Metadata
 Converted from existing metadata on web pages, or
 Created using simple automatic classifier (string matching with
terms & synonyms)
 250k items, ~12 metadata fields, 1.5 weeks effort
 OOTB Seamark user interface, plus logo
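The "simple automatic classifier (string matching with terms & synonyms)" mentioned above can be approximated in a few lines. This is a sketch under stated assumptions, not the classifier used in the demo: the function names `build_matcher` and `tag`, the whole-word, case-insensitive matching, and the sample synonyms are all hypothetical.

```python
import re

def build_matcher(taxonomy):
    """Compile one pattern per preferred term.

    taxonomy maps a preferred term to a list of synonyms
    (the entries used below are illustrative only).
    """
    patterns = {}
    for term, synonyms in taxonomy.items():
        # Try longer variants first so multi-word labels win over fragments.
        variants = sorted([term, *synonyms], key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in variants)
        patterns[term] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def tag(text, patterns):
    """Return the preferred terms whose label or synonyms occur in text."""
    return sorted(term for term, pat in patterns.items() if pat.search(text))

taxonomy = {
    "Hubble Space Telescope": ["HST"],
    "Jupiter": ["Jovian"],
}
patterns = build_matcher(taxonomy)
print(tag("Images of the Jovian rings taken by HST", patterns))
```

Tagging of this kind is rough by design: it is fast enough to cover a large collection, and its output is good enough to populate a facet demo, but it will miss items that discuss a topic without naming it.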
Rough bulk tagging—
Facet demo (2)
Agenda
 Qualitative methods
 Quantitative methods
– Distribution
– Usability testing
– Query log & click trail examination
Document distribution—
How evenly does it divide the content?
 Documents do not distribute uniformly across categories
 Zipf (1/x) distribution is expected behavior
 80/20 rule in action (actually 70/20 rule)
Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database
[Chart: number of records (0 to 350,000) for the top 10 content types, measured against the expected distribution. Legible category labels include Statistics, Bibliography, Juvenile literature, Exhibitions, Fiction, Maps, Periodicals, Biography, and Congresses. Annotations mark the leading candidate for splitting and the leading candidates for merging.]
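A measured-versus-expected comparison of this kind can be sketched as follows, assuming the expected curve is a plain 1/x (Zipf) distribution scaled so that it sums to the measured total. The function names and the sample counts are hypothetical; the slides do not say how the expected curve was fitted.

```python
def zipf_expected(total, n_categories):
    """Expected count at rank r is proportional to 1/r, scaled to sum to total."""
    harmonic = sum(1 / r for r in range(1, n_categories + 1))
    return [total / (r * harmonic) for r in range(1, n_categories + 1)]

def compare_to_zipf(counts):
    """counts: {category: measured count}.

    Returns (category, measured, expected, measured/expected) rows,
    ordered from the largest category to the smallest.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    return [(name, measured, round(exp, 1), round(measured / exp, 2))
            for (name, measured), exp in zip(ranked, expected)]

# Illustrative counts only.
for row in compare_to_zipf({"A": 40, "B": 20, "C": 5}):
    print(row)
```

Categories with a ratio well above 1 hold more content than the curve predicts and are natural candidates for splitting; those well below 1 are candidates for merging.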
Document distribution—
How evenly does it divide the content?
 Methodology: 115 randomly selected URLs from corporate intranet
search index were manually categorized. Inaccessible files and ‘junk’
were removed.
 Results: Slightly more uniform than Zipf distribution. Above the curve
is better than expected.
Measured v Expected Intranet Content Type Distribution
[Chart: number of documents (0 to 25) per content type, measured against the expected distribution. Content types: Programs, Proposals, Plans & Schedules; Other & Unclassified; Papers & Presentations; Regulations, Policies, Procedures & Templates; Marketing & Sales; Operations & Internal Communications; Manuals & Learning Materials; News & Events; People, Groups & Places.]
Document distribution— How does taxonomy
“shape” match that of content?
Background:
 Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
 25,380 resources tagged with taxonomy of 179 terms (avg. of 2 terms per resource)
 Counts of terms and documents summed within taxonomy hierarchy
Results:
 Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
 Mismatches between term % and document % flagged

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
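Flagging mismatches between term share and document share can be sketched as a simple ratio test. The figures below are taken from the table; the function name `flag_mismatches` and the factor-of-two threshold are assumptions, since the slides do not state how large a gap counted as a mismatch.

```python
def flag_mismatches(rows, ratio=2.0):
    """rows: {term group: (% of terms, % of documents)}.

    Flags groups whose document share is at least `ratio` times their
    term share, or at most 1/ratio of it. The threshold is an assumption.
    """
    flagged = {}
    for group, (pct_terms, pct_docs) in rows.items():
        r = pct_docs / pct_terms
        if r >= ratio:
            flagged[group] = "more content than terms"
        elif r <= 1 / ratio:
            flagged[group] = "more terms than content"
    return flagged

rows = {
    "Administrators": (7.8, 15.8),
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Researchers": (2.2, 3.6),
    "Students": (27.4, 7.0),
}
for group, reason in flag_mismatches(rows).items():
    print(group, "->", reason)
```

A group with more content than terms may need finer subdivision, while a group with more terms than content may be over-built relative to what has actually been published.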
Usability testing—
How intuitive (repeatable) are the categorizations?
 Methodology: Closed Card Sort
– For alpha test of a grocery site
– 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
– Categories where fewer than 14 of 15 testers put the product into the same category were flagged
Usability testing—
How intuitive (repeatable) are the categorizations?
% of Testers | Cumulative % of Products | With Poly-Hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%
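The cumulative figures in a table like this can be derived from raw closed card sort results. The sketch below assumes each product's agreement level is the size of the largest group of testers who chose the same category; the function names and the three-tester sample data are hypothetical.

```python
from collections import Counter

def agreement_levels(sorts):
    """sorts: {product: [category chosen by each tester]}.

    Returns {product: number of testers in the largest agreeing group}.
    """
    return {product: Counter(choices).most_common(1)[0][1]
            for product, choices in sorts.items()}

def cumulative_share(levels, thresholds):
    """Percentage of products on which at least t testers agreed, for each t."""
    total = len(levels)
    return {t: 100 * sum(v >= t for v in levels.values()) / total
            for t in thresholds}

# Illustrative data: two products sorted by three testers.
sorts = {
    "milk": ["Dairy", "Dairy", "Dairy"],
    "salsa": ["Snacks", "Condiments", "Snacks"],
}
levels = agreement_levels(sorts)
print(cumulative_share(levels, thresholds=[3, 2]))
```

Products that fall below the chosen threshold are the ones to review, either by renaming a category or by allowing the product to appear in more than one place.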
Pop Quiz
What is the #1 underused source of quantitative
information on how to improve your taxonomy?
Query Logs & Click Trails
Query log & click trail examination—
Who are the users & what are they looking for?
 Only 30-40% of organizations regularly examine their
logs*.
 Sophisticated software available, but don’t wait.
 80% of value comes from basic reports
Query log & click trail examination–
Query log
UltraSeek Reporting
 Top queries
 Queries with no results
 Queries with no click-through
 Most requested documents
 Query trend analysis
 Complete server usage
summary
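Three of these basic reports (top queries, queries with no results, queries with no click-through) need nothing more than a counter over the log. This is a minimal sketch that assumes each log entry has already been parsed into a (query, result count, clicked) tuple; that layout and the function name `basic_reports` are hypothetical and do not reflect the UltraSeek log format.

```python
from collections import Counter

def basic_reports(log, top_n=10):
    """log: iterable of (query, result_count, clicked) tuples.

    Queries are trimmed and lowercased so that trivial variants
    are counted together.
    """
    normalized = [(q.strip().lower(), n, clicked) for q, n, clicked in log]
    return {
        "top_queries": Counter(q for q, _, _ in normalized).most_common(top_n),
        "no_results": Counter(
            q for q, n, _ in normalized if n == 0).most_common(top_n),
        "no_click_through": Counter(
            q for q, n, c in normalized if n > 0 and not c).most_common(top_n),
    }

# Illustrative log entries only.
log = [
    ("Farm income", 12, True),
    ("farm income ", 12, False),
    ("mad cow", 0, False),
]
for name, rows in basic_reports(log).items():
    print(name, rows)
```

Frequent queries with no results are candidate synonyms or missing terms, and frequent queries with no click-through point to content that is tagged or titled in a way users do not recognize.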
Query log & click trail examination—
Click trail packages
 iWebTrack
 NetTracker
 OptimalIQ
 SiteCatalyst
 Visitorville
 WebTrends
In Summary:
Start a “Measure & Improve” mindset
 Taxonomy changes do not stand alone
 Search system improvements
 Navigation improvements
 Content improvements
 Process improvements
Taxonomy Strategies LLC
Questions
Joseph A. Busch
[email protected]
http://www.taxonomystrategies.com
30 November 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Bibliography
K. Yee, K. Swearingen, K. Li, M. Hearst. "Searching and organizing: Faceted metadata for image search and browsing." Proceedings of the Conference on Human Factors in Computing Systems (April 2003). http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
R. Daniel and J. Busch. "Benchmarking Your Search Function: A Maturity Model." http://www.taxonomystrategies.com/presentations/maturity-200505-17%28as-presented%29.ppt