Taxonomy Strategies LLC
Taxonomy & metadata strategies for effective content management
Melbourne, Sydney, Canberra
Masterclass
6-15 June 2007
Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Today’s agenda
9:00-9:10 (10 min) Introduction
9:10-9:15 (5 min) Warm-up exercise
9:15-9:45 (30 min) Taxonomy fundamentals: Building taxonomies
9:45-10:00 (15 min) Taxonomy exercise
10:00-10:30 (30 min) Taxonomy fundamentals: Taxonomy business case
10:30-11:00 (30 min) Tea Break
11:00-12:00 (60 min) Taxonomy governance
12:00-12:30 (30 min) Capabilities self-assessment
12:30-13:30 (60 min) Lunch
13:30-14:30 (60 min) Taxonomy benchmarking
14:30-14:45 (15 min) Benchmarking exercise
14:45-15:15 (30 min) Tea Break
15:15-16:15 (60 min) Content tagging
16:15-16:30 (15 min) Tagging exercise
16:30-17:00 (30 min) Q&A
Taxonomy Strategies LLC The business of organized information
Who I am: Joseph Busch
• Over 25 years in the business of organized information.
  – Founder, Taxonomy Strategies LLC
  – Director, Solutions Architecture, Interwoven
  – VP, Infoware, Metacode Technologies (acquired by Interwoven, November 2000)
  – Program Manager, Getty Foundation
  – Manager, Pricewaterhouse
• Metadata and taxonomies community leadership.
  – President, American Society for Information Science & Technology
  – Director, Dublin Core Metadata Initiative
  – Adviser, National Research Council Computer Science and Telecommunications Board
  – Reviewer, National Science Foundation Division of Information and Intelligent Systems
  – Founder, Networked Knowledge Organization Systems/Services
What we do
Organize Stuff
For us, taxonomy work includes:
• Metadata specification defines the properties needed to describe content so that it can be found & used.
• Vocabularies are collections of terms that are used to specify some of the metadata properties.
  – Some vocabularies are big and hierarchical, some are small and flat.
• An application profile specifies what metadata & vocabularies are required, and then represents them formally.
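A minimal sketch of what "represents them formally" can look like, assuming a profile is just a list of fields, each marked required or optional and optionally bound to a controlled vocabulary. The field names and terms below are hypothetical, not taken from any published profile.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class FieldSpec:
    """One metadata property in the profile."""
    name: str
    required: bool
    vocabulary: Optional[FrozenSet[str]] = None  # None means free text


@dataclass
class ApplicationProfile:
    fields: List[FieldSpec] = field(default_factory=list)

    def validate(self, record: dict) -> List[str]:
        """Return problems found in a record; an empty list means it conforms."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
                continue
            if spec.vocabulary is not None and value not in spec.vocabulary:
                problems.append(f"{spec.name}: {value!r} is not in the vocabulary")
        return problems


# Hypothetical profile: field names and terms are illustrative only.
profile = ApplicationProfile([
    FieldSpec("title", required=True),
    FieldSpec("content_type", required=True,
              vocabulary=frozenset({"Policy", "Report", "Press Release"})),
    FieldSpec("audience", required=False,
              vocabulary=frozenset({"General Public", "Researchers", "Students"})),
])

print(profile.validate({"title": "Annual review", "content_type": "Report"}))  # []
print(profile.validate({"content_type": "Memo"}))
```

The small, flat vocabularies here are plain sets; a big hierarchical vocabulary would need a tree or broader/narrower links instead.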
Recent & current projects:
http://www.taxonomystrategies.com/html/clients.htm
Government
Commercial
Not-for-Profit
Who are you? What sectors do you work in?
Your Role
• Administrator
• Records Manager
• Content Manager
• Communications
• Editor
• Information Architect
• Usability Expert
• Librarian
• Knowledge Engineer
• Ontologist
• Chief Information Officer
Industrial Sector
• Agriculture & Processing
  – Food, Lumber, Pulp & Paper
• Financial Services
  – Banking & Insurance
• Government
  – Public administration
  – Public safety
• High Tech
  – Computers, Software & Telecommunications
• Heavy Manufacturing
  – Steel, Automobiles & Aircraft
• Manufacturing
  – Consumer Products
• Medical & Health Care
• Mining & Refining
• Petrochemicals, Oil & Gas
• Pharmaceuticals
Why are you here?
• What are the key questions that you want answered in today’s workshop?
• Please rank the questions from the most important (5) to the least important (1).
• Please provide your job title, organization and department; your name is optional.
Priority (1-5) | Questions
Your title or role:
Your org or industry:
Your dept:
Your name (optional):
The taxonomy problem: How to pick from > 5,000 faucets?
By:
• Category
• Price
• Brand
• Color/Finish
• # Handles
• Series Name
• Water Filter?
• Faucet Spray
• Handle Shape
• Soap Dispenser?
The main issue: What goes here?
• When do the things in the list change?
• How do we maintain the list?
• What rules do we follow?
Seven phases of taxonomy development
[Chart: timeline spanning weeks 1-12]
1 Identify Objectives: Conduct interviews
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Test & Train: Manually tag small sample
Taxonomy design phases need to be iterated
Phases: 1 Identify Objectives, 2 Inventory Resources, 3 Specify Metadata, 4 Model Content, 5 Specify Vocabularies, 6 Specify Procedures, 7 Test & Train
Plan & Prototype
• Interview core team and stakeholders
• Identify, gather & review resources
• Define fields & purpose
• Define content chunks & XML DTDs
• Compile controlled vocabularies
• Develop workflow rules & procedures
• Manually tag small sample
Alpha Dev & Test
• Interview alpha users
• Gather additional resources, if any
• Revise if needed, bake into alpha CMS (metadata and content model)
• Revise vocabularies, use in alpha CMS
• Alpha workflows in CMS
• Use alpha CMS to tag larger sample
Beta D&T
• Interview beta users
• Gather additional sources, if any
• Modify CMS for beta (metadata and content model)
• Revise vocabularies, use in beta CMS
• Modify & extend workflows
• Use beta CMS to tag larger sample
Final D&T
• Review tagged samples, default procedures
• Modify for 1.0 (metadata and content model)
• Revise using team procedure
• Finalize procedure materials
• Finalize training materials & train staff
Licensing an existing taxonomy
See Factiva’s taxonomy: www.taxonomywarehouse.com
• There are usually license fees, but these will be less than the cost of developing an equivalent taxonomy.
• But pre-existing taxonomies rarely fit an organization’s needs and may require extensive customization.
Recommendation
• Adopt a faceted approach.
• Reuse existing (especially internal) vocabularies for as many of the facets as possible.
• Plan on developing fully custom “Content Type” and “Topic” taxonomies.
Free sources for 8 common taxonomies
Taxonomy | Definition | Potential sources
Organization | Organizational structure. | SP 800-87, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | Dublin Core Type Vocabulary, AGLS Document Type, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Business Activity | Business activities or functions performed to accomplish mission and goals. | Federal Enterprise Architecture Business Reference Model, Enterprise ontology, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed, or by whom it is intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psychographics or personas, etc.
Products & Services | Names of products/programs and services. | ERP system, Your products and services, etc.
Typical product catalog:
A-Z, then idiosyncratic categories
How to analyze existing product catalog categories: Principles and priorities
Preparing a product catalog for facet browsing (aka Guided Navigation) requires a category hierarchy and additional attributes.
Principles
1. Categories and subcategories that could be swapped are candidates for conversion to attributes.
2. Repeated lists of subcategories signal a possible need for an attribute.
3. The number of attributes should not exceed six or seven, so not all attribute candidates should be used.
   • Avoid selecting strongly correlated attributes, such as “Weight” and “Shipping Weight”.
Priorities
1. Choose categories that apply to many products over those that apply to few products.
2. Choose attributes that apply to many categories over those that apply to very few categories.
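Principle 2 can be checked mechanically. The sketch below, assuming the catalog is available as a simple category-to-subcategories mapping (the categories and brands are hypothetical), groups categories that repeat the exact same subcategory list; each repeated list is a candidate attribute, and the number of categories sharing it speaks to priority 2.

```python
from collections import defaultdict

# Hypothetical catalog fragment: category -> subcategory list.
catalog = {
    "Digital Cameras": ["Canon", "Kodak", "Nikon"],
    "Camcorders": ["Canon", "Kodak", "Nikon"],
    "Memory Cards": ["Canon", "Kodak", "Nikon"],
    "Batteries": ["AA", "AAA", "Lithium-ion"],
}


def attribute_candidates(catalog, min_repeats=2):
    """Map each subcategory list repeated under >= min_repeats categories
    to the (sorted) categories that repeat it."""
    groups = defaultdict(list)
    for category, subcats in catalog.items():
        groups[frozenset(subcats)].append(category)
    return {
        tuple(sorted(values)): sorted(categories)
        for values, categories in groups.items()
        if len(categories) >= min_repeats
    }


for values, categories in attribute_candidates(catalog).items():
    print(values, "->", categories)
```

This only finds identical lists; overlapping but not identical lists (say, one category missing a brand) would need a similarity measure, and picking among candidates still requires judgment about correlation (principle 3).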
Product categories example: Wireless carrier
Products
• Accessories
  – Batteries
  – Cases
  – Chargers
  – Data
  – Hands-Free
  – Headsets
  – Miscellaneous
• Content
  – Purchased
  – Subscription
• Phones
  – Versatile Phones
  – Smart Devices
  – Basic Phones
  – Prepaid Phones
  – International Only Phones
  – Mobile Broadband Cards
• Services
  – Conferencing
  – Internet / Data
  – Landline Phone
  – Network & Roaming
  – Relay Services
  – Solutions
  – Wireless Data
Product attributes example: Digital cameras in an electronics catalog
Types of attributes
• Generic attributes
  – Brand/Product Family/Model
  – Price Range
  – Usually Ships
• Merchandising attributes
  – Usage (E-mail, Internet Browsing, Programming, …)
  – Segment (Home, Business, Education, Government, …)
  – Region & Country
  – Most Popular
  – New
  – Related Products
• Specialized attributes
  – Capacity (Battery; Memory; MB; GB; BPS, …)
  – Resolution (DPI; Megapixels; XGA, UXGA, …)
  – Size (Display; Screen; …)
  – Standard (a, b, g, n, …; scsi, ata, sata, eide, …; dimm, simm, …)
  – Type (Camera; Battery; Display; Printer; Server; Storage; Switch; …)
Example facet panel (number of matching products in parentheses):
Resolution: 3 Megapixels (4); 4 Megapixels (5); 5 Megapixels (27); 6-8 Megapixels (21)
Brand: Canon (15); Fuji (10); Kodak (17); Nikon (8); Olympus (9)
Type: Point & Shoot (25); Digital SLR (10); Packages (5)
Price Range: $100-250 (5); $250-500 (16); $500-1000 (19); More than $1000 (3)
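The counts in a facet panel like the one above come from counting attribute values across the products that match the user's current selections. A minimal sketch, assuming each product is a dict of attribute values; the four-product catalog is hypothetical, so its counts do not match the slide.

```python
from collections import Counter

# Hypothetical mini-catalog; values are illustrative only.
cameras = [
    {"brand": "Canon", "type": "Point & Shoot", "megapixels": "5"},
    {"brand": "Canon", "type": "Digital SLR", "megapixels": "6-8"},
    {"brand": "Nikon", "type": "Digital SLR", "megapixels": "6-8"},
    {"brand": "Kodak", "type": "Point & Shoot", "megapixels": "4"},
]


def facet_counts(items, facets, selected=None):
    """Count each facet's values among items matching every current selection."""
    selected = selected or {}
    matching = [item for item in items
                if all(item.get(f) == v for f, v in selected.items())]
    return {f: Counter(item[f] for item in matching if f in item) for f in facets}


# User has clicked Type = Digital SLR; the panel refreshes the other counts.
counts = facet_counts(cameras, ["brand", "megapixels"], {"type": "Digital SLR"})
print(counts["brand"])       # Counter({'Canon': 1, 'Nikon': 1})
print(counts["megapixels"])  # Counter({'6-8': 2})
```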
Faceted taxonomy theory & practice
• How many terms are needed to provide sufficient granularity? Not as many as you think!
• Post-coordinate indexing allows several simple controlled vocabularies to be combined, rather than using a single large pre-coordinated vocabulary.
The power of faceted taxonomy
• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Easier to tag by content authors
• Can be easier to navigate
Example facets, 10 terms each:
Audience: Advocacy; Contractors & Grantees; Environmental Professionals; Federal Facilities; General Public; Industry; Kids; Researchers & Scientists; Small Business; Students
Health: Advisory; Exposure; Food Safety; Health Assessment; Health Effect; Health Risk; Occupational Health; Pesticide Effects; Sun Protection; Toxicity
Industry: Agriculture & Cattle; Automobile Repair; Chemical; Dry Cleaning; Electronics & Computer; Energy; Extractive Industries; Food Processing; Leather Tanning & Finishing; Metal Finishing
Substance: Allergen; Biological Contaminant; Carcinogen; Chemical; Explosive; Liquid Waste; Microorganism; Ozone; Pesticide; Radioactive Waste
• It’s more effective to increase the number of facets than to increase the number of terms per facet.
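The arithmetic behind these claims: with one value chosen per facet, the number of distinguishable combinations is the product of the facet sizes, while the number of terms to maintain is only their sum. A short sketch comparing adding a fifth 10-term facet with growing one existing facet to 20 terms (both add 10 terms to maintain):

```python
from math import prod


def distinct_combinations(facet_sizes):
    """Combinations available when one value is chosen from every facet."""
    return prod(facet_sizes)


def terms_to_maintain(facet_sizes):
    return sum(facet_sizes)


four_facets = [10, 10, 10, 10]
print(distinct_combinations(four_facets), terms_to_maintain(four_facets))  # 10000 40

# Both options below add 10 terms (50 in total to maintain).
print(distinct_combinations(four_facets + [10]))  # 100000: add a fifth facet
print(distinct_combinations([20, 10, 10, 10]))    # 20000: grow one facet
```

This idealizes facets as fully independent; in practice some combinations are empty, but the gap between 40 maintained terms and 10,000 combinations is the point of the slide.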
Automatically created taxonomies
• Documents can be ‘clustered’ based on similarities and differences.
• Problems:
  – Typically only a single hierarchy
  – No overall plan
  – Results hard for people to navigate
What does “North” mean on this map?
Automatic taxonomy construction software
• Software can scan large quantities of content and extract statistically significant words and phrases.
• Example:
  – Archive of 10 publications analyzed for topics related to “copyright.”
• Software does a poor job of
  – De-duplication.
  – Turning significant words and phrases into a larger structure.
  – Discriminating between “gold” and “garbage.”
• Software is good for
  – Getting an understanding of the key noun phrases in a large collection.
  – Providing test cases for evaluating a taxonomy.
Source: Sample data courtesy of nStein.
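To give a feel for "statistically significant words," here is a sketch using TF-IDF, a standard term-weighting scheme. It is a stand-in chosen for illustration, not the method nStein or any other product uses; it assumes plain-text documents and scores single words only, whereas commercial tools also detect multi-word noun phrases. The three documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split into alphabetic words (no phrase detection)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(docs, k=3):
    """Rank each document's words by TF-IDF and return the top k per document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(docs)
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        results.append(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return results


# Hypothetical three-document collection.
docs = [
    "copyright infringement copyright lawsuit",
    "copyright fair use exceptions",
    "music licensing music royalties",
]
print(top_terms(docs, k=2))
# [['infringement', 'lawsuit'], ['exceptions', 'fair'], ['music', 'licensing']]
```

Note that "copyright" drops below "lawsuit" in the first document because it also appears elsewhere, and "fair use" is split into two words. Both show why the output is raw material for a taxonomist, not a finished structure.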
Most popular flickr tags on 20 Feb 2007
http://www.flickr.com/photos/tags/
Sort flickr categories into 5 or fewer groups. Then label each group.
Taxonomy exercise: Facet grouping
• Universal taxonomy facets
  – By location (spatially)
  – By time (chronologically)
  – By type (genre)
  – By physical properties (size, color, shape, etc.)
  – By subject (topic)
Richard Saul Wurman, Information Architects (1996)
Taxonomy exercise: Facet grouping
Sort flickr categories into 5 or fewer groups. Then label each group.
Business case and motivations for taxonomies
• How are we going to use content, metadata, and taxonomies in applications to obtain business benefits?
What technology analysts have said: Add metadata to search on!
• “Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”
• “Enriching content with structured metadata is critical for supporting search and personalized content delivery.”
• “Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”
• “Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”
Fundamentals of taxonomy ROI
• Tagging content using a taxonomy is a cost, not a benefit.
• There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
• Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
• You need to determine those changes, and their costs, as part of the ROI.
Product utilization: Taxonomy compared to search
• Conversion rate increases.
  – HomeDepot.com: double-digit increase.
  – 1-800-Flowers.com: more than a 10% increase.
  – Otto Group (Kaleidoscope, Freemans, Grattan, and lookagain catalogs): 130% increase.
• Lift in average order size.
Product catalog: Taxonomy compared to search
Benefit: Increased conversion rate & revenue lift
Web sales net income | | $80,000,000
Increased conversion rate | 30% | $24,000,000
Order size lift | 10% | $8,000,000
Potential revenue increase per year | | $32,000,000
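The table's arithmetic as a small function, useful for plugging in your own figures. It follows the slide in applying both percentages to the same baseline and adding them; integer dollars and whole-number percentages are assumed to avoid floating-point rounding.

```python
def revenue_lift(web_sales, conversion_pct, order_size_pct):
    """Apply each percentage to the same baseline and add the results,
    as the table does. Returns (conversion gain, order-size gain, total)."""
    from_conversion = web_sales * conversion_pct // 100
    from_order_size = web_sales * order_size_pct // 100
    return from_conversion, from_order_size, from_conversion + from_order_size


print(revenue_lift(80_000_000, 30, 10))  # (24000000, 8000000, 32000000)
```

If the two effects compound instead (more orders, each larger), the estimate rises slightly: 80,000,000 × 1.30 × 1.10 = 114,400,000, an increase of $34,400,000.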
Usability research: Taxonomy compared to search
• “We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.”
  Marti Hearst (and others)
• “The category interface is superior to the list interface in both subjective and objective measures.”
  Hao Chen & Susan Dumais
Usability research: Taxonomy compared to search
[Chart: Median search time in seconds, Category interface vs. List interface. Category is 36% faster when the target is in the top 20 results, and 48% faster when it is not in the top 20 results.]
Source: Chen & Dumais
Time saved: Taxonomy compared to search
1 hour per day searching × 36% faster = about 22 minutes each day
22 minutes × 250 working days per year = 5,500 minutes, or about 92 hours per year
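The same calculation with the rounding made explicit (60 × 36% is 21.6 minutes, rounded to 22; 5,500 / 60 is about 91.7 hours, rounded to 92). The 250 working days default comes from the slide; the other inputs can be replaced with your own figures.

```python
def hours_saved_per_year(search_minutes_per_day, speedup_pct, working_days=250):
    """Return (minutes saved per day, minutes per year, hours per year), rounding as the slide does."""
    minutes_per_day = round(search_minutes_per_day * speedup_pct / 100)  # 21.6 -> 22
    minutes_per_year = minutes_per_day * working_days
    return minutes_per_day, minutes_per_year, round(minutes_per_year / 60)


print(hours_saved_per_year(60, 36))  # (22, 5500, 92)
```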
Time saved: Taxonomy compared to search
Benefit: Increase service efficiency
Number of call center calls per month | 50,000
Average cost per call | $20
Call response costs per month | $1,000,000
Total call response costs per year | $12,000,000
Percentage of self-serviced calls due to improved information browsing | 30%
Service costs savings per year | $3,600,000
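A sketch of the call center calculation, assuming (as the table does) that every self-serviced call saves the full average cost per call. Inputs are whole dollars and whole-number percentages.

```python
def self_service_savings(calls_per_month, cost_per_call, self_service_pct):
    """Return (monthly call cost, annual call cost, annual savings)."""
    monthly_cost = calls_per_month * cost_per_call
    annual_cost = monthly_cost * 12
    return monthly_cost, annual_cost, annual_cost * self_service_pct // 100


print(self_service_savings(50_000, 20, 30))  # (1000000, 12000000, 3600000)
```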
Trusted advisers: Taxonomy avoids costs
• “The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
  Sue Feldman
• Sun’s usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10M per year, about $500 per employee per year.
  Jakob Nielsen, useit.com
Knowledge workers spend up to 2.5 hours each day looking for information …
[Chart: knowledge-worker time spent communicating, searching, and creating]
… but find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
Knowledge workers spend more time re-creating existing content than creating new content
[Chart: knowledge-worker time spent communicating, searching, re-creating existing content (25%), and creating new content (8%)]
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees | 100
Average employee salary | $80,000
Employee costs per year | $8,000,000
Increase in productivity from not recreating content | 25%
Employee cost savings per year | $2,000,000
Business case summary
1. Classifications and classification-like schemes are being used to facilitate information seeking in the workplace and on the web.
2. Users take advantage of (and prefer) this type of scheme (faceted navigation) when it is made available in the user interface.
3. Hierarchical or facet navigation can be guided by the user interface.
4. Facet navigation is best combined with keyword searching, e.g., keyword search followed by faceted navigation of results.
Taxonomy requires business processes
• Taxonomies must change, gradually, over time if they are to remain relevant.
• Maintenance processes need to be specified so that changes are based on rational cost/benefit decisions.
Taxonomy governance can be viewed as a standards process
• Taxonomy must evolve, but in a predictable way.
• Team structure, with an appeals process
  – Taxonomy stewardship is a part-time role in most organizations.
  – Team needs to make decisions based on costs and benefits.
• Documentation and educational materials.
• Comment-handling responsibilities (part of error-correction process)
  – Issue logs.
  – Release schedule.
Taxonomy governance: Change process overview
CV sources
• External standard vocabularies
• Internally created CVs (in the NASA example: NASA expertise competencies, portal subject codes, and CVs from other internal NASA sources)
Taxonomy governance environment (NASA Taxonomy Team)
• Taxonomy Tool, holding working copies of CVs (the taxonomy facets)
• Working papers and project archives
CV consumers
• Portal, Site Search Tool, Web CMS, DMS, DAM, Metatagging Tool, Search UI
Change process
1: External controlled vocabularies (CVs) change on their own schedule.
2: The Taxonomy Team decides when to update its snapshots of external CVs. (The Site Search Tool decides when to update the snapshots of external CVs it uses.)
3: The team adds value to the snapshots through definitions, synonyms, classification rules, training materials, etc.
4: Updated versions of CVs are published to consumers.
CV = Controlled Vocabulary
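Steps 2 to 4 amount to versioning: the team works on its own snapshot of each external CV, and consumers see only what has been explicitly published. A minimal sketch under that reading; the class and method names are hypothetical, and synonyms stand in for all the value the team adds in step 3.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class CVSnapshot:
    """The team's working copy of one external controlled vocabulary (CV)."""
    source: str
    taken_on: date
    terms: Dict[str, List[str]]  # preferred term -> synonyms added by the team
    version: int


@dataclass
class TaxonomyRepository:
    snapshots: Dict[str, CVSnapshot] = field(default_factory=dict)
    published: Dict[str, int] = field(default_factory=dict)  # source -> version consumers see

    def take_snapshot(self, source: str, external_terms: List[str], taken_on: date) -> None:
        """Step 2: copy the external CV when the team decides to update."""
        previous = self.snapshots.get(source)
        version = previous.version + 1 if previous else 1
        old_terms = previous.terms if previous else {}
        # Carry forward team-added synonyms for terms that still exist.
        terms = {t: list(old_terms.get(t, [])) for t in external_terms}
        self.snapshots[source] = CVSnapshot(source, taken_on, terms, version)

    def add_synonym(self, source: str, term: str, synonym: str) -> None:
        """Step 3: the team adds value (here, only synonyms)."""
        self.snapshots[source].terms[term].append(synonym)

    def publish(self, source: str) -> int:
        """Step 4: release the current snapshot version to consumers."""
        self.published[source] = self.snapshots[source].version
        return self.published[source]


repo = TaxonomyRepository()
repo.take_snapshot("ISO 3166", ["AU", "US"], date(2007, 6, 1))
repo.add_synonym("ISO 3166", "AU", "Australia")
repo.publish("ISO 3166")
# Step 1: the external CV changed; the team takes a new snapshot but has not published it.
repo.take_snapshot("ISO 3166", ["AU", "NZ", "US"], date(2007, 9, 1))
print(repo.snapshots["ISO 3166"].terms["AU"], repo.published["ISO 3166"])  # ['Australia'] 1
```

Separating "snapshot taken" from "published" is what lets the team review an external change before any consumer (portal, search, CMS) sees it.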
Who should build the taxonomy?
• The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
• The team should plan on maintaining the taxonomy as well as building it.
  – Maintenance will not (usually) be anyone’s full-time job.
  – Exact mix of people on team will change.
• It should be built in an iterative fashion, with more content and broader review for each iteration.
Taxonomy governance: Generic team charter
• Taxonomy Team is responsible for maintaining:
  – The Taxonomy, a multi-faceted classification scheme.
  – Associated taxonomy materials, such as:
      Editorial Style Guides.
      Taxonomy Training Materials.
      Metadata Standard.
  – Team rules and procedures for change management.
• Taxonomy Team will consider costs and benefits of suggested changes.
• Taxonomy Team will:
  – Manage relationship between providers of source vocabularies and consumers of the Taxonomy.
  – Identify new opportunities for use of the Taxonomy across the enterprise to improve information management practices.
  – Promote awareness and use of the Taxonomy.
Taxonomy governance team: Generic roles
Business Lead
• Keeps committee on track with larger business objectives.
• Balances cost/benefit issues to decide appropriate levels of effort.
• Obtains needed resources if those on committee can’t accomplish a particular task.
Technical Specialist
• Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
• Helps obtain data from various systems.
Taxonomy Specialist
• Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
• Makes edits to taxonomy, installs into system with aid of IT specialist.
Content Specialist
• Committee’s liaison to content creators.
• Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Content Owners
• Reality check on process change suggestions.
Where taxonomy changes come from
[Diagram: behind the firewall, end users work through the application UI and application logic, and tagging staff through the tagging UI and tagging logic; both draw on the taxonomy and content.]
Sources of change
• Tagging staff notes ‘missing’ concepts.
• Query log analysis (end user searches).
• Requests from other parts of the organization (other parts of NASA, in the example).
Recommendations by the Taxonomy Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content.
Taxonomy Team considerations
1. Business goals.
2. Changes in user experience.
3. Retagging cost.
Taxonomy maintenance processes
• Different organizations will need to consider their own change processes.
  – Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
  – Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
  – Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
• Change process MUST also consider cost of implementing the change:
  – Retagging data.
  – Reconfiguring auto-classifier.
  – Retraining staff.
  – Changes in user expectations.
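Retagging cost depends on the kind of change. A rough sketch, assuming documents store stable term IDs (so relabeling a term or adding a synonym touches no documents), while splitting a term into narrower terms forces a decision for every document tagged with it. The tag data, change types, and function name are hypothetical.

```python
# Hypothetical tag store: document id -> set of term IDs applied to it.
tagged = {
    "doc1": {"Pesticide", "Food Safety"},
    "doc2": {"Pesticide"},
    "doc3": {"Ozone"},
}


def change_impact(tagged, change, term):
    """Rough retagging estimate for one proposed change to `term`.

    'relabel': new display label or synonym; tags keep pointing at the same ID.
    'split': the concept is divided, so each tagged document must be reviewed.
    """
    affected = sorted(doc for doc, terms in tagged.items() if term in terms)
    if change == "relabel":
        return {"affected": affected, "needs_retagging": []}
    if change == "split":
        return {"affected": affected, "needs_retagging": affected}
    raise ValueError(f"unknown change type: {change!r}")


print(change_impact(tagged, "split", "Pesticide"))
# {'affected': ['doc1', 'doc2'], 'needs_retagging': ['doc1', 'doc2']}
```

Multiplying the size of `needs_retagging` by a per-document tagging cost gives a first estimate for the "Retagging data" line; auto-classifier reconfiguration and retraining are separate costs.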
Taxonomy maintenance workflow
[Swimlane diagram: Analyst, Editor, Copywriter, Sys Admin]
1. Analyst: Suggest new name/category.
2. Editor: Review new name. Problem? Yes: send back. No: continue.
3. Copywriter: Copy edit new name. Problem? Yes: send back. No: continue.
4. Sys Admin: Add to enterprise taxonomy in the Taxonomy Tool.
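The workflow can be encoded as a small state machine, which is roughly what a workflow-enabled taxonomy tool enforces. State and event names here are hypothetical, and the diagram does not show where a "Problem? Yes" goes; this sketch assumes both review steps send the suggestion back to the analyst.

```python
# (current state, event) -> next state. Returning to the analyst on a
# problem is an assumption; the diagram only shows "Problem? Yes".
TRANSITIONS = {
    ("suggested", "editor_ok"): "reviewed",
    ("suggested", "editor_problem"): "returned_to_analyst",
    ("reviewed", "copyedit_ok"): "copyedited",
    ("reviewed", "copyedit_problem"): "returned_to_analyst",
    ("copyedited", "sysadmin_added"): "in_taxonomy",
    ("returned_to_analyst", "resubmitted"): "suggested",
}


def advance(state, event):
    """Apply one event, rejecting anything the workflow does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state!r}") from None


def run(events, state="suggested"):
    for event in events:
        state = advance(state, event)
    return state


print(run(["editor_ok", "copyedit_ok", "sysadmin_added"]))  # in_taxonomy
```

Rejecting out-of-order events (for example, a sys admin adding a term nobody reviewed) is the main value of making the workflow explicit.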
Sample taxonomy editor: Data Harmony
[Screenshot: Data Harmony editor with Hierarchy Browser and Standard Term Info panes]
Taxonomy editing tools vendors
[Quadrant chart: Ability to Execute (low to high) vs. Completeness of Vision; quadrants include Niche Players and Visionaries]
• Most popular taxonomy editor is MS Excel.
• An immature area: no vendors are in the upper-right quadrant!
• High functionality/high cost products ($100K+).
• MultiTes is widely used, cheap, and functional.
Taxonomy maturity model
• Taxonomy governance processes must fit the organization.
  – As consultants, we notice different levels of maturity in the business processes around content management, taxonomy, and metadata.
  – Honestly assess your organization’s metadata maturity in order to design appropriate governance processes.
• We are starting to define a maturity model, similar to the Software Capability Maturity Model (CMM):
  – Initial: Ad hoc; each project begins from scratch.
  – Repeatable: Procedures defined and used, but not standardized across organization or are misapplied to projects.
  – Defined: Standard processes are tailored for project needs. Strategic training for long-range goals is in place.
  – Managed: Projects managed using quantitative quality measures. Process itself is measured and controlled.
  – Optimizing: Continual process improvement. Extremely accurate project estimation.
Purpose of maturity model
• Estimating the maturity of an organization’s information management processes tells us:
  – How involved the taxonomy development and maintenance process should be
      Overly sophisticated processes will fail.
  – What to recommend as first steps.
• Maturity is not a goal; it is a characterization of an organization’s methods for achieving particular goals.
  – Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
  – IT maturity may not be core to your business.
Taxonomy maturity scorecard
Maturity levels: Initial | Repeatable | Defined | Managed | Optimizing
[Scorecard: each practice below is marked (*) at one maturity level]
Organizational Structure
• Executive Sponsorship
• Budgeting
• Hiring & Training
Quality Assurance
• Manual Processes
• Automated Processes (note 1)
Project Management
• Estimating & Scheduling
• Cost Control
• Project Methodology (note 2)
Design and Execution
• Planning
• Design Excellence
• Development Maturity
1 – X is starting to examine search query logs, which is an important first step in improving search. But this is only an isolated example.
2 – IT has a project methodology they are trying to use across all projects. But not all business units have project methodologies.
Taxonomy governance self-assessment
Background
1. Rate your organization’s overall taxonomy maturity from 1 to 10.
   Immature 1 2 3 4 5 6 7 8 9 10 Mature
2. Does the search engine index more than 4 repositories around the organization?
3. Are system features and metadata fields added based on cost/benefit analysis, or because they are easy to do with the current applications and tools? Cost/Benefit / Easy
4. Are applications and tools acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money? Requirements / Year-End
5. Are there hiring and training practices for metadata and taxonomy positions? Yes / No
   If there is training, describe it briefly.
6. What type of change was most recently made to your organization’s taxonomy management environment? Functionality / Standards / Tools / People / Data Quality
7. Which area of your organization’s taxonomy management environment most needs improvement? Functionality / Standards / Tools / People / Data Quality
Basic
1. Is there a process in place to examine search query logs? Yes / No
2. Is there an organization-wide metadata standard, such as the “Dublin Core”, for use by search tools? Yes / No
Intermediate
1. Is there an ongoing data cleansing procedure to look for any redundant, obsolete or trivial content (ROT)? Yes / No
   If there is a process, describe it briefly.
2. Can the CEO explain the return on investment (ROI) for content management, search and metadata? Yes / No
Advanced
1. Are there established qualitative and quantitative measures of metadata quality? Yes / No
   If there are measures, describe them briefly.
2005 Maturity survey: Search practices (n=87)
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. | 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites. | 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Spell Checking. | 31% (19) | 18% (11) | 38% (23) | 0% (0) | 13% (8)
Synonym Searching. | 41% (25) | 23% (14) | 30% (18) | 0% (0) | 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. | 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined. | 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. (Best Bets) | 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. | 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. | 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. | 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)
2005 Maturity survey: Metadata practices (n=87)
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. | 22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. | 37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. | 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
Multiple repositories comply with metadata standard. | 52% (31) | 20% (12) | 17% (10) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. | 48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically. | 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. | 57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
Metadata is manually entered into web forms. | 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software. | 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction. | 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)
Taxonomy Strategies LLC The business of organized information
59
2005 Maturity survey: Taxonomy practices (n=87)
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Org chart taxonomy: one based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
Products taxonomy: one based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
Content types taxonomy: one based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
Topical taxonomy: one based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
Faceted taxonomy: one which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A roadmap for the future evolution of the taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)
Taxonomy testing methods
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist; SME; Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial rules | Consistent look and feel
Usability testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & answers | Tasks are completed successfully; Time to complete task is reduced
User satisfaction | Survey | Users | Rough taxonomy; UI mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Taxonomist; Team; Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms
Walk-through method: Show & explain
ABC Computers.com
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Types
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petro-chemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands
Audience: All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region / Country: All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States
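A walk-through is easier when the facets can also be exercised against real items. A minimal Python sketch, assuming a simple in-memory model (the `TAXONOMY` dictionary is abbreviated from the facets above, and `validate_tags` is a hypothetical helper), that checks whether an item's tags use only controlled facet values:

```python
# Abbreviated, hypothetical model of some ABC Computers.com facets:
# facet name -> set of allowed (controlled) values.
TAXONOMY: dict[str, set[str]] = {
    "Content Type": {"Case Study", "Demo", "Product Information",
                     "Technical Note", "White Paper", "Other Content Types"},
    "Product Family": {"Desktops", "Notebooks", "Printers", "Servers",
                       "Storage", "Other Brands"},
    "Audience": {"All", "Business", "Home", "Partner", "Shopper"},
    "Region/Country": {"All", "Asia-Pacific", "Canada", "EMEA", "Japan",
                       "Latin America & Caribbean", "United States"},
}


def validate_tags(tags: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means every tag conforms."""
    problems = []
    for facet, values in tags.items():
        allowed = TAXONOMY.get(facet)
        if allowed is None:
            problems.append(f"unknown facet: {facet}")
            continue
        for value in values:
            if value not in allowed:
                problems.append(f"{facet}: {value!r} is not a controlled value")
    return problems


item = {"Content Type": ["White Paper"],
        "Product Family": ["Servers", "Storage"],
        "Region/Country": ["EMEA"]}
print(validate_tags(item))                          # []
print(validate_tags({"Audience": ["Developers"]}))  # one problem reported
```

Facets are optional per item here (an item simply omits a facet that does not apply), while any value that is supplied must come from the controlled list.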
Walk-through method: Editorial rules consistency check
• Abbreviations
• Ampersands
• Capitalization
• General…, More…, Other…
• Languages & character sets
• Length limits
• Multiple parents
• Plural vs. singular form
• Scope notes
• Serial comma
• Sources of terms
• Spaces
• Synonyms & acronyms
• Term order (Alphabetic or …)
• Term label order (Direct vs. inverted)
Rule name | Editorial rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
Ampersands | The ampersand [&] character shall be used instead of the word ‘and’. Example: Licensing & Compliance. NOT: Licensing and Compliance
Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service
General…, More…, Other… | The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable. Examples: “Other Property”, “Other Services”, “General Information”, “General Audience”
… | …
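Several of the editorial rules above are mechanical enough to check automatically before a human walk-through. A rough Python sketch covering only the Abbreviations, Ampersands, and Capitalization rules; the regular expressions, the minor-word list, and the tolerance for short all-caps acronyms are assumptions, so it will flag some labels incorrectly:

```python
import re

# Minor words that may stay lowercase inside a title-case label (assumption).
MINOR_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "or"}


def check_label(label: str) -> list[str]:
    """Return the editorial rules that a term label appears to break."""
    issues = []
    # Abbreviations: a word ending in a period, e.g. "Info."
    if re.search(r"\b[A-Za-z]+\.(?:\s|$)", label):
        issues.append("Abbreviations")
    # Ampersands: the word "and" should be written as "&"
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        issues.append("Ampersands")
    # Capitalization: title case; all-caps words of 4 letters or fewer are
    # tolerated as acronyms (e.g. "IT", "EMEA")
    for word in label.split():
        if not word[0].isalpha():
            continue  # skip "&" and other non-word tokens
        shouting = word.isupper() and len(word) > 4
        lowercase = word[0].islower() and word.lower() not in MINOR_WORDS
        if shouting or lowercase:
            issues.append("Capitalization")
            break
    return issues


for label in ["Public Info.", "Licensing and Compliance", "CUSTOMER SERVICE",
              "Customer service", "Customer Service"]:
    print(label, check_label(label))
```

Each example label from the rule table is flagged under the matching rule, and the compliant “Customer Service” returns an empty list.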
Task-based testing*
• 15 representative questions were selected
  – Perspective of various organizational units
  – Most frequent website searches
  – Most frequently accessed website content
  – Correct answers to the questions were agreed in advance by the team.
• 15 users were tested
  – Did not work for the organization
  – Represented target audiences
• Testers were asked “where would you look for …”
  – “under which facet… Topic, Commodity, or Geography?”
  – Then, “… under which category?”
  – Then, “… under which sub-category?”
  – Tester choices were recorded
• Testers were asked to “think aloud”
  – Notes were taken on what they said
• Pre- and post-questions were asked
  – Tester answers were recorded
* Based on Donna Maurer’s usability work with the Australian government
Task-based testing: Representative questions
1. How much cotton is imported from China?
2. What are the impacts of “mad cow” disease on U.S. meat production and sales?
3. What is the average farm income level in your state?
4. How much of our diet comes from fast food?
5. How many people receive WIC benefits (Special Supplemental Nutrition Program for Women, Infants, and Children)?
6. How much acreage is planted to genetically engineered corn?
7. What is the cost of foodborne illness in the United States?
8. What part of food costs goes to farmers and retailers?
9. Which States produce the most tobacco?
10. What percentage of farms in the United States are small farms?
11. What are the costs and benefits associated with providing more traceability in the U.S. food supply?
12. How many people in America don’t get enough to eat?
13. What is behind the trade balance (surplus or deficit) in agricultural goods?
14. What is the extent of conservation compliance? How does that impact farmers' decisions?
15. What are the impacts of foreign trade restrictions on U.S. farmers and U.S. food prices?
Task-based testing: Closed card sorting
3. What is the average farm income level in your state?
Facets:
1. Topics
2. Commodities
3. Geographic Coverage
1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets
1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes
Task-based testing: Card sort analysis
Find-it task | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton | Cotton | Cotton | Asia | Cotton | Cotton
2. Mad cow | Cattle | Food Safety | Cattle | Cattle | Cattle
3. Farm income | Farm Income | Farm Income | US States | Farm Income | Farm Income
4. Fast food | Food Consumption | Diet Quality & Nutrition | Food Expenditures | Diet Quality & Nutrition | Diet Quality & Nutrition
5. WIC | WIC Program | WIC Program | WIC Program | WIC Program | WIC Program
6. GE Corn | Corn | Corn | Corn | Corn | Corn
7. Foodborne illness | Foodborne Disease | Foodborne Disease | Consumer Food Safety | Foodborne Disease | Foodborne Disease
8. Food costs | Retailing & Wholesaling | Food Prices | Market Structure | Market Analysis | Food Expenditures
9. Tobacco | Tobacco | Tobacco | Tobacco | Tobacco | Tobacco
10. Small Farms | Farm Structure | Farm Structure | Farm Structure | Farm Structure | Farm Structure
11. Traceability | Food Safety Policy | Food Prices | Food System | Labeling Policy | Food Safety Innovations
12. Hunger | Food Security | Food Security | Food Security | Food Security | Food Security
13. Trade balance | Commodity Trade | Trade & Intl Markets | Commodity Trade | Market Analysis | Commodity Trade
14. Conservation | Cropping Practices | Conservation Policy | Conservation Policy | Conservation Policy | Conservation Policy
15. Trade restrictions | Trade Policy | Food Safety & Trade | WTO | Market Analysis | Commodity Trade
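One way to turn the raw card-sort choices into per-task scores is sketched below in Python. The definitions are assumptions: “% correct” is the share of testers who chose the category agreed in advance, and “% agree” is taken here to be the share who chose the single most common category. With only the five users in this table the figures will not reproduce the results table, whose percentages appear consistent with 11 trials per question.

```python
from collections import Counter


def score_task(responses: list[str], expected: str) -> tuple[int, int]:
    """Return (% correct, % agree) for one find-it task.

    % correct: share of testers who chose the agreed (expected) category.
    % agree:   share of testers who chose the single most common category
               (an assumed definition).
    """
    n = len(responses)
    correct = sum(r == expected for r in responses)
    most_common_count = Counter(responses).most_common(1)[0][1]
    return round(100 * correct / n), round(100 * most_common_count / n)


# Users 1-5 for task 2 ("Mad cow"), assuming "Cattle" was the agreed answer
mad_cow = ["Cattle", "Food Safety", "Cattle", "Cattle", "Cattle"]
print(score_task(mad_cow, expected="Cattle"))  # (80, 80)

# Task 4 ("Fast food"), assuming "Diet Quality & Nutrition" was the agreed answer
fast_food = ["Food Consumption", "Diet Quality & Nutrition", "Food Expenditures",
             "Diet Quality & Nutrition", "Diet Quality & Nutrition"]
print(score_task(fast_food, expected="Diet Quality & Nutrition"))  # (60, 60)
```

The two numbers diverge when testers agree with each other on a category other than the expected one, which is the pattern flagged for the trade balance question.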
Task-based testing: Card sort results
• In 80% of the trials, users looked for information under the categories where we expected them to look for it.
• Breaking up topics into facets makes it easier to find information, especially information related to commodities.
Task-based testing: Card sort results
Test question | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100% | 100%
6. GE corn | 100% | 100%
7. Foodborne illness | 82% | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100% | 100%
10. Small farms | 91% | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91% | 91%
15. Trade restrictions | 55% | 36%
Notes:
• 11. Traceability: Policy of “Traceability” needs to be clarified.
• 13. Trade balance: Possible error in categorization of this question, because 64% thought the answer should be “Commodity Trade.”
• On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
• Use quasi-synonyms.
• Change required.
• Possible change required.
Task-based testing: User satisfaction survey
• Was it easy, medium or difficult to choose the appropriate Topic?
  – Easy
  – Medium
  – Difficult
• Was it easy, medium or difficult to choose the appropriate Commodity?
  – Easy
  – Medium
  – Difficult
• Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
  – Easy
  – Medium
  – Difficult
User satisfaction survey: Results
[Chart: average difficulty of choosing a value in each facet (Topic, Commodity, Geography), on a scale running from Easy to Difficult]
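The results chart plots an average difficulty per facet. A minimal Python sketch of that calculation, assuming a numeric scale of Easy = 1, Medium = 2, Difficult = 3 (the slide does not state the scale) and using made-up responses:

```python
# Assumed numeric scale for the survey answers (not stated on the slide).
SCALE = {"Easy": 1, "Medium": 2, "Difficult": 3}


def mean_difficulty(answers: list[str]) -> float:
    """Average difficulty for one facet; lower means easier."""
    return sum(SCALE[a] for a in answers) / len(answers)


# Made-up responses from four testers, for illustration only.
responses = {
    "Topic": ["Easy", "Medium", "Medium", "Difficult"],
    "Commodity": ["Easy", "Easy", "Easy", "Medium"],
    "Geography": ["Easy", "Easy", "Medium", "Easy"],
}
# Print facets from easiest to hardest.
for facet, answers in sorted(responses.items(), key=lambda kv: mean_difficulty(kv[1])):
    print(f"{facet}: {mean_difficulty(answers):.2f}")
```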
User interface survey: Which search UI is ‘better’?
• Criteria
  – User satisfaction
  – Success completing tasks
  – Confidence in results
  – Fewer dead ends
• Methodology
  – Design tasks from specific to general
  – Time performance
  – Calculate success rates
  – Survey subjective criteria
  – Pay attention to survey hygiene:
    – Participant selection
    – Counterbalancing
    – T-scores
Source: Yee, Swearingen, Li, & Hearst
User interface survey: Results (1)
Which interface would you rather use for these tasks? | Google-like baseline | Faceted category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…
Overall assessment | Google-like baseline | Faceted category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst
User interface survey: Results (2)
[Chart: average ratings (axis 0 to 9) of the Google-like baseline and faceted category interfaces on Easy to Use, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, and Overwhelming]
Source: Yee, Swearingen, Li, & Hearst
Tagging samples: How many items?
Goal | Number of items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies* | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Preexisting metadata is the most meaningful.
* Quantitative methods require large amounts of tagged content. This requires specialists, or software, to do tagging. Results may be very different from how “real” users would categorize content.
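The rules of thumb in this table can be expressed as a small lookup. The goal keys and the `sample_size` function below are hypothetical; the ranges are the ones in the table:

```python
def sample_size(goal: str, categories: int = 0) -> tuple[int, int]:
    """Return the (min, max) number of items to tag for a testing goal.

    Goal names are hypothetical labels for the rows of the table.
    """
    if goal == "illustrate schema":
        return (1, 3)
    if goal == "training documentation":
        return (10, 20)
    if goal == "qualitative":
        if categories >= 100:
            raise ValueError("qualitative test assumes a small vocabulary (<100 categories)")
        return (25, 50)
    if goal == "quantitative":
        # 3-10X the number of categories
        return (3 * categories, 10 * categories)
    raise ValueError(f"unknown goal: {goal}")


print(sample_size("quantitative", categories=50))  # (150, 500)
```

Even a modest 50-category vocabulary calls for hundreds of tagged items in a quantitative test, which is why the table suggests computer-assisted tagging at that scale.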
Tagging samples: Manually tagged metadata sample
Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999
Tagging samples: Spreadsheet for tagging 10’s-100’s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions
Rough bulk tagging: Facet demo (1)
• Collections: 4 content sources
  – NTRS, SIRTF, Webb, Lessons Learned
• Taxonomy
  – Converted MultiTes format into RDF for Seamark
• Metadata
  – Converted from existing metadata on web pages, or
  – Created using simple automatic classifier (string matching with terms & synonyms)
  – 250k items, ~12 metadata fields, 1.5 weeks effort
• OOTB (out-of-the-box) Seamark user interface, plus logo
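The demo's classifier is described only as string matching with terms & synonyms. A generic Python sketch of that idea follows; it is not Seamark's implementation, and the vocabulary shown is a hypothetical fragment. Each preferred term is assigned when the term or one of its synonyms occurs as a whole word in the text:

```python
import re

# Hypothetical vocabulary fragment: preferred term -> lowercase match strings
# (the term itself plus its synonyms).
VOCABULARY = {
    "Jupiter": ["jupiter", "jovian"],
    "Hubble Space Telescope": ["hubble space telescope", "hubble", "hst"],
    "Galileo": ["galileo"],
}


def classify(text: str) -> list[str]:
    """Return preferred terms whose term or synonym appears as a whole word."""
    lowered = text.lower()
    matches = []
    for term, synonyms in VOCABULARY.items():
        for synonym in synonyms:
            if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
                matches.append(term)
                break  # one hit is enough to assign the term
    return matches


print(classify("Jovian ring images from Galileo and the HST"))
# ['Jupiter', 'Hubble Space Telescope', 'Galileo']
```

Whole-word matching (`\b`) keeps short synonyms such as “hst” from matching inside longer words; a production classifier might also need stemming, weighting, and a review step for false hits.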
Rough bulk tagging: Facet demo (2)
Document distribution: How evenly does it divide the content?
• Documents do not distribute uniformly across categories
• Zipf (1/x) distribution is expected behavior
• 80/20 rule in action (actually 70/20 rule)
[Chart: Measured v. expected distribution of top 10 content types in the Library of Congress database; y-axis: number of records (0 to 350,000); x-axis: top 10 content types; annotations mark the leading candidate for splitting and the leading candidates for merging]
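Under a Zipf (1/x) distribution, the category at rank r is expected to hold a share proportional to 1/r. A minimal Python sketch that computes the expected counts for a given total and number of categories, and pairs them with measured counts sorted by rank (the function names and sample counts are illustrative):

```python
def zipf_expected(total: int, n_categories: int) -> list[float]:
    """Expected counts for ranks 1..n when shares are proportional to 1/rank."""
    weights = [1 / rank for rank in range(1, n_categories + 1)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def compare_to_zipf(measured: list[int]) -> list[tuple[int, int, float]]:
    """Return (rank, measured, expected) rows, with measured counts sorted by rank."""
    ranked = sorted(measured, reverse=True)
    expected = zipf_expected(sum(ranked), len(ranked))
    return [(rank, m, round(e, 1))
            for rank, (m, e) in enumerate(zip(ranked, expected), start=1)]


# Made-up counts for four categories (100 documents in total)
for row in compare_to_zipf([10, 50, 25, 15]):
    print(row)
```

Lower-ranked categories whose measured count sits above the expected value indicate a spread that is more uniform than Zipf.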
Document distribution: How evenly does it divide the content?
• Methodology: 115 randomly selected URLs from the corporate intranet search index were manually categorized. Inaccessible files and ‘junk’ were removed.
• Results: Slightly more uniform than a Zipf distribution. Above the curve is better than expected.
[Chart: Measured v. expected intranet content type distribution; y-axis: # documents (0 to 25); x-axis: content type (Programs, Proposals, Plans & Schedules; Other & Unclassified; Papers & Presentations; Regulations, Policies, Procedures & Templates; Marketing & Sales; Operations & Internal Communications; Manuals & Learning Materials; News & Events; People, Groups & Places)]
Document distribution: How does taxonomy “shape” match that of content?
Background:
• Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
• 25,380 resources tagged with a taxonomy of 179 terms (avg. of 2 terms per resource)
• Counts of terms and documents summed within taxonomy hierarchy
Results:
• Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
• Mismatches between term % and document % flagged
Term group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
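The flagging of mismatches between term % and document % can be sketched as a ratio test. The 2× threshold below is an assumption, not the method used for this taxonomy. Groups with many documents per term are candidates for more (or narrower) terms, and groups with few documents per term are candidates for merging:

```python
# (term group, % of terms, % of documents), from the table above
SHAPE = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, ratio: float = 2.0) -> tuple[list[str], list[str]]:
    """Split groups by docs%/terms% ratio into (dense, sparse) lists.

    dense:  at least `ratio` times more document share than term share
    sparse: at most 1/`ratio` of the term share
    The default 2x threshold is an illustrative assumption.
    """
    dense, sparse = [], []
    for group, term_pct, doc_pct in rows:
        r = doc_pct / term_pct
        if r >= ratio:
            dense.append(group)
        elif r <= 1 / ratio:
            sparse.append(group)
    return dense, sparse


dense, sparse = flag_mismatches(SHAPE)
print("Many documents per term:", dense)
print("Few documents per term:", sparse)
```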
Usability testing: How intuitive (repeatable) are the categorizations? (1)
• Methodology: Closed card sort
  – For alpha test of a grocery site
  – 15 testers put each of 71 best-selling product types into one of 10 pre-defined categories
  – Categories where fewer than 14 of 15 testers put the product into the same category were flagged
Usability testing: How intuitive (repeatable) are the categorizations? (2)
Usability testing: How intuitive (repeatable) are the categorizations?
Testers agreeing | Cumulative % of products | With poly-hierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%
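A cumulative table like this one can be derived from, for each product, the largest number of testers who put it into the same category. A Python sketch with made-up products (the actual study had 71 product types and 15 testers); the 14-of-15 flagging threshold comes from the methodology above:

```python
def cumulative_agreement(top_votes: dict[str, int], testers: int = 15,
                         lowest: int = 11) -> dict[str, int]:
    """Cumulative % of products whose most popular category got >= k votes."""
    n = len(top_votes)
    table = {}
    for k in range(testers, lowest - 1, -1):
        share = sum(v >= k for v in top_votes.values()) / n
        table[f"{k}/{testers}"] = round(100 * share)
    table[f"<{lowest}/{testers}"] = 100  # every product is counted by this row
    return table


def flag_products(top_votes: dict[str, int], threshold: int = 14) -> list[str]:
    """Products that fewer than `threshold` testers put into the same category."""
    return sorted(p for p, v in top_votes.items() if v < threshold)


# Made-up results: product -> largest number of testers agreeing on one category
votes = {"Bananas": 15, "Milk": 15, "Frozen pizza": 12, "Hummus": 9}
print(cumulative_agreement(votes))
print(flag_products(votes))  # ['Frozen pizza', 'Hummus']
```

The “with poly-hierarchy” column would be computed the same way after counting a tester's choice as agreeing whenever it matches any of a product's allowed parent categories.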
The #1 underused source of quantitative information on how to improve your taxonomy?
Query Logs & Click Trails
Query log & click trail examination: Who are the users & what are they looking for?
• Only 30-40% of organizations regularly examine their logs*.
• Sophisticated software is available, but don’t wait.
  – 80% of value comes from basic reports
Query log & click trail examination: Query log
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
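Most of these basic reports can be produced from a raw query log with a few lines of code. A Python sketch, assuming a hypothetical log format of (query, number of results, whether a result was clicked); this is not UltraSeek's report format:

```python
from collections import Counter

# Hypothetical log records: (query text, number of results, was a result clicked?)
LOG = [
    ("benefits", 120, True),
    ("Benefits ", 120, False),
    ("expense form", 4, True),
    ("timesheet", 0, False),
    ("holiday calendar", 15, False),
    ("timesheet", 0, False),
]


def basic_reports(log, top_n: int = 3):
    """Top queries, queries with no results, and queries never clicked through."""
    # Normalize case and surrounding spaces so variants count as one query.
    records = [(q.strip().lower(), hits, clicked) for q, hits, clicked in log]
    top_queries = Counter(q for q, _, _ in records).most_common(top_n)
    no_results = sorted({q for q, hits, _ in records if hits == 0})
    had_results = {q for q, hits, _ in records if hits > 0}
    clicked = {q for q, _, c in records if c}
    no_click_through = sorted(had_results - clicked)
    return top_queries, no_results, no_click_through


top, empty, unclicked = basic_reports(LOG)
print(top)        # [('benefits', 2), ('timesheet', 2), ('expense form', 1)]
print(empty)      # ['timesheet']
print(unclicked)  # ['holiday calendar']
```

Queries with no results point to missing synonyms or content, and queries with results but no clicks point to poor titles or ranking.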
Query log & click trail examination: Click trail packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends
Summary: Start a “Measure & Improve” mindset
• Taxonomy changes do not stand alone
  – Search system improvements
  – Navigation improvements
  – Content improvements
  – Process improvements
Benchmarking exercise
• What are 5 representative questions that your users ask, or tasks that your users do, when using your application?
• Is it currently easy, medium or difficult to answer these questions or accomplish these tasks?
Questions or Tasks | Rating (Easy/Medium/Difficult)
Conclusion: What is a good taxonomy?
• Incremental, extensible process that identifies and enables owners, and engages stakeholders.
• Quick implementation that provides measurable results as quickly as possible.
• A means to an end, and not the end in itself.
• Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
• Improved over time, and maintained.
Tagging Overview
• Tagging is better than the words that happen to occur in a piece of content.
• All tagging is useful
  – End user tagging
  – Tagging by librarians
  – Automated tagging by OS and algorithms
• Content should be tagged throughout its lifecycle, each time the content is handled and used, so that it accrues value or its significance is diminished.
MS Office: File → Properties
Organize
What is social tagging?
• End user tagging
  – Easy, intuitive tagging interfaces
  – Almost instantaneous feedback
  – Enables people to tag & re-tag content in response to seeing their tags in context with other tags.
• Emergent categories
  – Resembles an open card sort process in which patterns emerge, rather than validating categories using closed card sorts.
Social tagging innovators
• flickr founders
  – Caterina Fake
  – Stewart Butterfield
• del.icio.us founder
  – Joshua Schachter
• del.icio.us & flickr are now both part of Yahoo!
• As of April 2006, flickr had 130 million photos posted by 3 million registered users.
Four tagging rules for end users
Rule | Description
Use specific terms | Apply the most specific terms when tagging content. But do not tag every possible topic, just the ones that are most important or best characterize the content as a whole.
Use multiple terms | Use as many terms as necessary to describe overall what the content is about & why it is important. Do not over-tag.
Use appropriate terms | Only fill in the facets & values that make sense. Not all facets apply to all content.
Consider how content will be used | Anticipate how the content will be searched for in the future, & how to make it easy to find. Remember that search engines can only operate on explicit information.
Agenda
• Content Tagging
• Tagging Interface
Requirements for a tagging interface
• Automated form fill-in (automatically fills in known data)
• Tagging precedents (see tags already assigned by others)
• Controlled vocabularies, e.g., with pull-down list
• Multi-valued tags
• Geo-tagging
• Group tagging
• Clean-up tag tools, e.g., alpha list
• Batch editing
• Share/Don’t share (Public/Private)
• Identified owner (who can be emailed)
• Almost immediate feedback, e.g., tag cloud
Form fill-in: Automatically filled-in known data
Form fill-in: Automatically filled-in known data
Manual form fill-in w/ check boxes, pull-down lists, etc.
Auto keyword & summarization
Form fill-in: Automatically filled-in known data
Auto-categorization
Rules & pattern matching
Parse & lookup (recognize names)
Tagging precedents: See tags assigned by others
Multi-valued group tagging
Group geo-tagging
Group geo-tagging
Clean-up tag tools: Alpha list
Batch edit
Share or don’t share tagging
Bulk tagging
• Identify a collection of related content items by pattern or context
• Then, apply the same attributes to all content items
Tag a folder
• Drag & drop content items into a folder
• Then, content items inherit the properties of the folder
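Folder tagging amounts to property inheritance when an item is dropped. A minimal Python sketch with hypothetical `Folder` and `Item` classes; letting an item keep its own values rather than overwriting them with the folder's is a design assumption:

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    title: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Folder:
    name: str
    properties: dict[str, str]
    items: list[Item] = field(default_factory=list)

    def drop(self, item: Item) -> None:
        """Add an item; it inherits any folder property it does not already have."""
        for key, value in self.properties.items():
            item.properties.setdefault(key, value)  # item's own values win (assumption)
        self.items.append(item)


reports = Folder("Quarterly reports", {"Content Type": "Report", "Audience": "Employee"})
q2 = Item("Q2 results", {"Audience": "Executive"})
reports.drop(q2)
print(q2.properties)  # {'Audience': 'Executive', 'Content Type': 'Report'}
```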
Workflow
• Approve & improve mindset
Create Content → Add Metadata → Review & Improve → Publish → Review & Improve
Interactive rewards
• Almost instantaneous exposure of tags in simple user interfaces on the web provides positive reinforcement for user tagging that simply did not exist before.
• For example,
  – Most popular
  – Tag clouds
  – Alerts
Most popular
• Another example is the “most emailed” list, e.g., from the NY Times.
Tag cloud
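A tag cloud maps tag frequency to font size. A common approach, assumed here rather than taken from any particular site, is to scale on the logarithm of the count so that a few very popular tags do not dwarf everything else (the pixel range is illustrative):

```python
import math


def tag_cloud(counts: dict[str, int], min_px: int = 12, max_px: int = 36) -> dict[str, int]:
    """Map tag frequencies (all >= 1) to font sizes on a logarithmic scale."""
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {tag: round(min_px + (max_px - min_px) * (math.log(c) - lo) / span)
            for tag, c in sorted(counts.items())}


print(tag_cloud({"taxonomy": 100, "metadata": 10, "folksonomy": 1}))
# {'folksonomy': 12, 'metadata': 24, 'taxonomy': 36}
```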
Alerts
• New (content selected by date)
• Subscriptions (content selected by tags)
• Interest (content selected by other people)
• Individual (content selected for you by other people)
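Subscription alerts combine two of the selections listed above: by date (new content) and by tags. A minimal Python sketch over a hypothetical in-memory list of items; a real system would run this against its repository's metadata:

```python
from datetime import date

# Hypothetical content items with tags and the date each was added.
ITEMS = [
    {"title": "Q2 sales report", "tags": {"Sales", "Report"}, "added": date(2007, 6, 1)},
    {"title": "Taxonomy style guide", "tags": {"Taxonomy", "Memo"}, "added": date(2007, 5, 20)},
    {"title": "Metadata roadmap", "tags": {"Metadata", "Taxonomy"}, "added": date(2007, 6, 5)},
]


def subscription_alerts(items, subscribed: set[str], since: date) -> list[str]:
    """Titles of items added on or after `since` that carry any subscribed tag."""
    return [i["title"] for i in items
            if i["added"] >= since and i["tags"] & subscribed]


print(subscription_alerts(ITEMS, {"Taxonomy"}, since=date(2007, 6, 1)))
# ['Metadata roadmap']
```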
Is faceted indexing the future of social tagging?
Tagging exercise: Blog tagging (a)
ALA Tech Source. http://www.techsource.ala.org/blog/2007/04/google-buys-oclc-announces-new-products.html
Tagging exercise: Blog tagging (b)
HBSP. http://discussionleader.hbsp.com/davenport/2007/04/cause_and_effect_reporting_raw.html#comments
Tagging exercise: Taxonomy facets (definitions)
Taxonomy facet | Description
Business activity | Use for common business function or activity such as finance, marketing and sales.
Industry / Product | Use for content that is about or related to an industrial sector or product such as construction equipment.
Geography | Use for content that is about a region, country or city.
Organization | Use for named organizations, brands and business entities.
Person / Role | Use for named people and the roles people have in organizations.
Content Type | Use for content genres such as letters, memos and reports.
Audience | Use to indicate the intended audience.
Topic | Use for other business and associated topics that the content is about or related to.
Tagging exercise: Taxonomy facets (values)
Business activity: Accounting; Auditing; Finance; HR management; IT; Marketing; Operations management; Sales
Industry / Product: Agriculture …; Mining; Utilities; Construction; Manufacturing; Wholesale trade; Retail trade; Transportation & warehousing; Information; Finance & insurance; Real estate; Professional; Management; Administrative support; Education; Health care; Arts, entertainment & recreation; Accommodation & food; Other services; Public administration
Geography: Africa; Americas; Antarctica; Asia; Europe; Oceania; Global; Historical geography; Oceans & seas; Regions
Organization / Entity: Business entities; Companies & brands; Government agencies; International NGOs; Organization types
People / Role: Business Leaders; Thought Leaders; Political Leaders; Roles
Content Type: Basic facts & information; Blog; Brochure; Database; E-mail; Letter; Memo; Multimedia; Report; Newsletter; Podcast; Press Release; Research & Analysis; RSS Feed
Audience: Consumer; Employee; Manager; Executive
Taxonomy facets | Tags
Business activity |
Industry / Product |
Geography |
Organization |
Person / Role |
Content Type |
Audience |
Topic |
Summary
• There are lessons to be learned from web tagging about how to get good metadata in document and content management applications.
• Document and content management system tagging must be simple, and it must make it almost instantly easier to find relevant work products.
Questions?
Joseph A. Busch
+ 415-377-7912
http://www.taxonomystrategies.com