Taxonomies and Meta Data for
Business Impact
April 13, 2005
Theresa Regli, Molecular, Inc.
Ron Daniel, Jr., Taxonomy Strategies LLC
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 |
2
Who we are: Ron Daniel, Jr.
•
Over 15 years in the business of metadata & automatic
classification
• Principal, Taxonomy Strategies
• Standards Architect, Interwoven
• Senior Information Scientist, Metacode Technologies (acquired
by Interwoven, November 2000)
•
•
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership
• Chair, PRISM (Publishers Requirements for Industry Standard
Metadata) working group
• Acting chair, XML Linking working group
• Member, RDF working groups
• Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 &
2 reports.
|
Copyright © 2005 |
3
Recent & current projects
•
Government
•
• Commodity Futures Trading
Commission
• Defense Intelligence Agency
• ERIC
• Federal Aviation Administration
• Federal Reserve Bank Atlanta
• Forest Service
• Goddard Space Flight Center
• Head Start
• Infocomm Development Authority of
•
Singapore
• NASA (nasataxonomy.jpl.nasa.gov)
• Small Business Administration
• Social Security Administration
• U.S.D.A. Economic Research Service •
• U.S.D.A. e-Government Program
(www.usda.gov)
• U.S.G.S.A. Office of Citizen Services
(www.firstgov.gov)
|
Commercial
• Allstate Insurance
• Blue Shield of California
• Halliburton
• Hewlett Packard
• Motorola
• PeopleSoft
• Pricewaterhouse Coopers
• Sprint
• Time Inc.
Commercial subcontracts
• Critical Mass - Fortune 50 retailer
• Deloitte Consulting - Top credit card
issuer
• Gistics – Direct selling giant
NGO’s
• CEN
• IDEAlliance
• OCLC
Copyright © 2005 |
4
Who we are: Theresa Regli
•
Over a decade of experience in cross-media publishing and
content management
•
7 years of consulting
•
4 years in “traditional” media: newspapers, publishing
•
•
•
•
•
•
Brought many New England newspapers online in the mid-90s
Principal Consultant, CM and User Experience, Molecular
Focus on users / customers and how they interact with and
use information, industry education and conferences
Background in linguistics
Named as “one to watch” in 2005 by CMS Watch
Passion for how people, cultures – and businesses – use
words and language
|
Copyright © 2005 |
5
About Molecular
•
Offerings designed to help
organizations leverage
technology to increase
revenues and decrease
costs
•
10+ years of Internet
professional services
expertise
•
120+ consultant
professionals
•
Integrated service offerings
- Digital strategy
- User experience
design/redesign
- Development &
implementation
- Multi-site integration
- Multi-channel
integration
|
Copyright © 2005 |
6
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 |
7
Setting the stage: some definitions…
What is Knowledge Management?
•
The process through which firms generate value from
their intellectual assets
•
The efficient sharing of knowledge across the enterprise:
not focused on presentation
•
Often incorrectly used synonymously with CM
What is Document Management?
•
The effective storage and retrieval of documents
•
Traditionally not about the creation aspect of new
content/documents
•
Often incorrectly used synonymously with CM – some of
the tools have evolved towards CM
|
Copyright © 2005 |
8
Some more definitions…
What is Content Management?
•
The integration of various technologies and processes to
manage content - conception thru deployment
•
The management of content lifecycle: create, approve,
tag, publish
What is Enterprise Content Management?
•
Vendor/analyst term to include all content across the firm
(web, catalog, digital, etc.)
•
Integration of various systems to create one unified,
“virtualized” system (CRM, financial, marketing, etc.)
•
Typically thought of as a strategy and not an
implementation
|
Copyright © 2005 |
9
Putting it all together: golf anyone?
 Caddy provides advice
 Caddy tells other caddies
 Other caddies provide advice
Knowledge
Sharing
Caddy master collects advice
and creates tip booklet for all
caddies
Knowledge
Management
Owner implements at 10
courses: ‘Online Caddy’ system
and Personal Cart system
• Course Tip Sheet
• Golfdigest.com
• Course Yardage Books
|
KM to ECM
Content Management
Document Management
Copyright © 2005 | 10
What makes DM, CM and ECM possible?
Taxonomy
•
Framework for organizing information based on user needs
•
Law for categorizing information
Meta Data
•
Information about content: "data about the data"
•
The categories, sub-categories and terms that make up a
taxonomy are often employed as meta data
•
Meta data is leveraged by a CMS to find and display content
easily and consistently
•
Enables more precise search results and personalization
|
Copyright © 2005 | 11
Foundations for ECM Success: Key Terms
•
Facets: Allow for a more complex classification structure,
where the categories are applied to the information like
keywords. Thus, information about a subject can be
“approached” and found in different ways. For example…
•
•
Hypertension
•
Publications / Medical / Journal of Hypertension
•
Diseases / Cardiovascular / Hypertension
•
Associations / Medical / American Society of Hypertension
Red Rock Crab
•
Animals / Invertebrates / Crustaceans
•
World / Seas / Pacific
•
World / Land / Australasia
|
Copyright © 2005 | 12
Foundations for ECM Success: Key Terms
•
•
•
Synonym Ring: A set of words/phrases that can be used
interchangeably for searching. (Hypertension, high blood
pressure)
Thesaurus: A tool that controls synonyms and identifies the
relationships among terms
Controlled Vocabulary: A list of preferred and variant terms,
with relationships (hierarchical and associative) defined. A
taxonomy is a type of controlled vocabulary.
|
Copyright © 2005 | 13
Sample Taxonomies
|
Copyright © 2005 | 14
The Library of Congress
A) General Works
B) Philosophy, Psychology, Religion
C) History: Auxiliary Sciences
D) History: General and Old World
E) History: United States
F) History: Western Hemisphere
G) Geography, Anthropology, Recreation
H) Social Science
J) Political Science
K) Law
L) Education
M) Music
N) Fine Arts
P) Literature & Languages
Q) Science
R) Medicine
S) Agriculture
T) Technology
U) Military Science
V) Naval Science
Z) Bibliography & Library Science
While both taxonomies are used in libraries,
note how the differences in classification
are specifically accommodating:
• Audience
• Subject matter
|
Copyright © 2005 | 15
Category
Facets
Meta data
(rheumatoid
is a type of
arthritis)
Enables userintuitive
presentation of
information
|
Copyright © 2005 | 16
|
Copyright © 2005 | 17
Taxonomy as Multi-Faceted Browsing Tool
|
Copyright © 2005 | 18
Epicurious, First Facet
Browse > Picnics
|
Copyright © 2005 | 19
Epicurious, Second Facet
Browse > Picnics
> Poultry
|
Copyright © 2005 | 20
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 21
Business Case and Motivations for Taxonomies
•
We divide taxonomy projects into three
problems: the ROI Problem, the Tagging
Problem, and the Taxonomy Problem
•
The ROI Problem: How are we going to use
content, metadata, and taxonomies in
applications to obtain business benefits?
|
Copyright © 2005 | 22
What technology analysts have said
“Adding metadata to unstructured content allows it to be managed
like structured content. Applications that use structured content
work better.”
“Enriching content with structured metadata is critical for
supporting search and personalized content delivery.”
“Content that has been adequately tagged with metadata can be
leveraged in usage tracking, personalization and improved
searching.”
“Better structure equals better access: Taxonomy serves as a
framework for organizing the ever-growing and changing
information within a company. The many dimensions of taxonomy
can greatly facilitate Web site design, content management, and
search engineering. If well done, taxonomy will allow for structured
Web content, leading to improved information access.”
|
Copyright © 2005 | 23
Metadata specification – a recipe example
Data
Type
Element
Length
Req. /
Repeat
Source
Purpose
Asset Metadata
Unique ID
Integer
Fixed
1
System supplied
Basic accountability
Recipe Title
String
Variable
1
Licensed Content
Text search & results display
Recipe summary
String
Variable
1
Licensed Content
Content
?
Main Ingredients
vocabulary
Key index to retrieve & aggregate
recipes, & generate shopping list
Main Ingredients
List
Variable
Subject Metadata
Meal Types
List
Variable
*
Meal Types vocab
Cuisines
List
Variable
*
Cuisines
Courses
List
Variable
*
Courses vocab
Coking Method
Flag
Fixed
*
Cooking vocab
Browse or group recipes & filter
search results
Link Metadata
Recipe Image
Pointer
Variable
?
Product Group
Merchandize products
Use Metadata
Rating
String
Variable
1
Licensed Content
Filter, rank, & evaluate recipes
Release Date
Date
Fixed
1
Product Group
Publish & feature new recipes
Legend:
|
? – 1 or more
* - 0 or more
Copyright © 2005 | 24
Fundamentals of taxonomy ROI
•
Building and maintaining a taxonomy, and tagging content
with it, are costs. They are not benefits
•
There is no benefit without exposing the tagged content to
users in some way that cuts costs or improves revenues
Putting a new taxonomy into operation requires UI changes
and/or backend system changes, as well as data changes
•
•
•
Every metadata field costs money, time, and goodwill
You need to determine those changes, and their costs, as
part of the taxonomy ROI
|
Copyright © 2005 | 25
Common taxonomy ROI scenarios
•
Catalog site - ROI based on increased sales through improved:
Product findability
• Product cross-sells and up-sells
• Customer loyalty
•
•
Call center - ROI based on cutting costs through:
Fewer customer calls due to improved website self-service
• Faster, more accurate CSR responses through better information access
•
•
Compliance – ROI based on:
Avoiding penalties for breaching regulations
• Following required procedures (e.g. Medical claims)
•
•
Knowledge worker productivity - ROI based on cutting costs through:
Less time searching for things
• Less time recreating existing materials, with knock-on benefits of less
confusion and reduced storage and backup costs
•
•
Executive mandate
•
No ROI at the start, just someone with a vision and the budget to make it
happen
|
Copyright © 2005 | 26
Taxonomy Justification | Knowledge Worker Productivity
Huge cost to the user & organization
• Finding information (time, frustration, precision)
• “15%-30% of an employee’s time is spent looking for information, and they
find it only 50% of the time”
• IDC Research, on the business drivers for building a taxonomy
• Sun’s usability experts calculated that 21,000 employees were wasting an
average of six minutes per day due to inconsistent intranet navigation
structures. When lost time was multiplied by staff salaries, the estimated
productivity loss exceeded $10M per year
• Web Design and Development, Jakob Nielsen
• Managers spend 17% of their time (6 weeks a year) searching for
information
• Information Ecology, Thomas Davenport & Lawrence Prusack
Lost Learning Value
• Related products, services, projects, people
|
Copyright © 2005 | 27
Challenges of organizing content on enterprise portals (1)
•
Multiple subject domains across the enterprise
•
•
•
•
Vocabularies vary
Granularity varies
Unstructured information represents about 80%
Information is stored in complex ways
•
•
Multiple physical locations
Many different formats
Tagging is time-consuming and requires SME involvement
• Portal doesn’t solve content access problem
•
•
•
•
•
Knowledge is power syndrome
Incentives to share knowledge don’t exist
Free flow of information TO the portal might be inhibited
Content silo mentality changes slowly
•
•
•
•
What content has changed?
What exists?
What has been discontinued?
Lack of awareness of other initiatives
|
Copyright © 2005 | 28
Challenges of organizing content on enterprise portals (2)
•
•
•
Lack of content standardization and consistency
•
Content messages vary among departments
•
How do users know which message is correct?
Re-usability low to non-existent
Costs of content creation, management and delivery may not change
when portal is implemented:
•
Similar subjects, BUT
•
Diverse media
Diverse tools
•
Different users
•
•
•
•
How will personalization be implemented?
How will existing site taxonomies be leveraged?
Taxonomy creation may surface “holes” in content
|
Copyright © 2005 | 29
FAQ – How do you sell it?
•
Don’t sell the taxonomy, sell the vision of what you want to
be able to do
•
Clearly understanding what the problem is and what the
opportunities are
•
Do the calculus (costs and benefits)
•
Design the taxonomy (in terms of LOE) in relation to the
value at hand
|
Copyright © 2005 | 30
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 31
NASA Taxonomy Project Goal: Enable Knowledge Discovery
• Make it easy for various audiences to find relevant
information from NASA programs quickly
•
•
•
Provide easy access for NASA resources found on the Web
•
Share knowledge by enabling users to easily find links to
databases and tools
•
Provide search results targeted to user interests
•
Enable the ability to move content through the enterprise to
where it is needed most
Comply with E-Government Act of 2002
Be a leading participant in Federal XML projects
|
Copyright © 2005 | 32
NASA Taxonomy Project Goal: Develop Best Practices
• Design process that:
•
•
Incorporates existing federal and industry terminology
standards like NASA AFS, NASA CMS, FEA BRM, NAICS,
and IEEE LOM
•
Provides a product for the NASA XML namespace registry
•
Complies with metadata standards like Z39.19, ISO 2709, and
Dublin Core
Practices believed to increase interoperability and
extensibility
|
Copyright © 2005 | 33
Development Process: Interviews
> 70 Interviews conducted across
NASA complex.
Projects
13%
Public
18%
Administrators
26%
Funders
4%
Engineers
22%
Researchers
13%
Scientists
4%
Categorized by type -
52%–Projects, Engineering & Science
|
Copyright © 2005 | 34
Scale of NASA Taxonomy
Facet
# Terms
Source
Audiences
62
Custom
Business Purpose
96
Existing
Competencies
169
Existing
Content Types
96
Custom
Industries
22
Existing
Instruments
56
Semi
Locations
106
Custom
Missions/Projects
648
Semi
Organizations
323
Existing
Subject Categories
78
Existing
Total
1656
|
Facets combine, so millions of
documents can be finely
categorized with a relatively
small number ofCopyright
values.
© 2005 | 35
NASA Taxonomy Web Site
Link to Metadata
Specification
Link to XML DTDs
and Schema
Background and
training materials
Links to
Controlled
Vocabularies
http://nasataxonomy.jpl.nasa.gov
|
Copyright © 2005 | 36
Benefits of Approach
•
Facets and Use of Standards made it possible to respond to
three unexpected needs during and after the project:
•
Search demo
•
Semantic search demo
•
Integration with detailed vocabularies
|
Copyright © 2005 | 37
Example | NASA Taxonomy Search Prototype
Current Search
State
Facets, Values,
and Counts
|
Copyright © 2005 | 38
Example 1 | NASA Taxonomy Search Prototype
• ¾ of the way through the project, request was
made to see a demo of the taxonomy in action
• Taxonomy was represented in RDF
• Metadata was scraped from a few repositories
around NASA (~220k records), converted to
RDF
• Some metadata automatically created with
simple keyword matches
• RDF loaded into Seamark search tool
•
•
•
Time: approx 2 man-weeks
Additional cost: $0
Result: Useful demo that illustrated new facts
|
Copyright © 2005 | 39
Example 2 | Semantic Search
• After project was over, another
project was doing ‘semantic
search’
• They heard about NASA Taxonomy
• They downloaded the RDF file for
the Missions & Projects
vocabulary, mapped to their
RDF/OWL tool, and used it to
answer questions about different
types of missions
• They did not have to ask any
questions or request any data
changes
|
Courtesy Dean Allemang, Top Quadrant,
Robert Brummett, NASA HORM
Copyright © 2005 | 40
Example 3 | Local Extension
•
After project, JPL wanted to incorporate content from
additional repositories
•
Existing metadata was easily mapped as extensions to
NASA taxonomy
RDF mapping allowed Search tool to make immediate use
of the metadata.
•
|
Copyright © 2005 | 41
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 42
The Tagging Problem
• How are we going to populate metadata elements with
complete and consistent values?
• What can we expect to get from automatic classifiers?
|
Copyright © 2005 | 43
Tagging
• Province of authors (SMEs) or editors?
• Taxonomy often highly granular to meet task and re-use
needs
• Vocabulary dependent on originating department
• The more tags there are (and the more values for each
tag), the more hooks to the content
• If there are too many, authors will resist and use “general”
tags (if available).
• Automatic classification tools exist, and are valuable, but
results are not as good as humans can do.
• “Semi-automated” is best
• Degree of human involvement is a cost/benefit tradeoff
|
Copyright © 2005 | 44
low
Content Volumes
high
Automatic categorization vendors | Analyst viewpoint
low
|
Accuracy Level
high
Copyright © 2005 | 45
Considerations in Automatic Classifier Performance
•
•
•
•
Classification Performance is
measured by “Inter-cataloger
agreement”
• Trained librarians agree less
than 80% of the time
• Errors are subtle differences
in judgment, or big goofs
Automatic classification struggles
to match human performance
• Exception: Entity recognition
can exceed human
performance
Classifier performance limited by
algorithms available, which is
limited by development effort
Very wide variance in one
vendor’s performance depending
on who does the implementation,
and how much time they have to
do it
|
Accuracy
Trained Librarians
potential
performance
gain
Regexps
Development Effort/
Licensing Expense
1) 80/20 tradeoff where 20% of effort
gives 80% of performance.
2) Smart implementation of
inexpensive tools will outperform
naive implementations of worldclass tools.
Copyright © 2005 | 46
Tagging tool example | Interwoven MetaTagger
Manual form fill-in w/ check
boxes, pull-down lists, etc.
Auto keyword &
summarization
|
Copyright © 2005 | 47
Tagging tool example | Interwoven MetaTagger
Auto-categorization
Rules & pattern
matching
Parse & lookup
(recognize names)
|
Copyright © 2005 | 48
Metadata tagging workflows
• Even ‘purely’ automatic
meta-tagging systems
need a manual error
correction procedure.
Compose in
Template
Automatically fillin metadata
•
•
Submit to CMS
Y
Should add a QA
sampling mechanism
Hard
Copy
Problem?
Approve/Edit
metadata
Tagging models:
Review content
N
Copy Edit
content
Web
site
Problem?
N
•
Author-generated
•
Central librarians
•
Hybrid – central autotagging service,
distributed manual
review and correction
|
Y
Tagging Tool
Analyst
Editor
Copywriter
Sys Admin
Sample of ‘author-generated’ metadata
workflow
Copyright © 2005 | 49
low
Content Volumes
high
Automatic categorization vendors | Pragmatic viewpoint
low
|
Accuracy Level
high
Copyright © 2005 | 50
Seven practical rules for taxonomies
1. Incremental, extensible process that identifies and
enables users, and engages stakeholders
2. Quick implementation that provides measurable results
as quickly as possible
3. Not monolithic—has separately maintainable facets
4. Re-uses existing IP as much as possible
5. A means to an end, and not the end in itself
6. Not perfect, but it does the job it is supposed to do—such
as improving search and navigation
7. Improved over time, and maintained
|
Copyright © 2005 | 51
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 52
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 53
Typical Project Timeline*
Weeks 1-2
Weeks 3-4
Weeks 5-6
Weeks 7-8
Weeks 9-10
Weeks 11-12
Kick-off
and Prep
Interviews
Content Analysis
Document Requirements
Develop & Validate Taxonomy
Implementation
Caveat: this will vary greatly based on the complexity of the content
and the organization
|
Copyright © 2005 | 54
Seven phases of taxonomy and metadata design
1 Identify
Objectives
Conduct interviews
ID sources, spider
assets & extract
metadata
2 Inventory
Content
3 Specify
Metadata
Define fields &
purpose
Define content
chunks & XML
DTDs
4 Model
Content
5 Specify
Vocabularies
6 Specify
Procedures
7 Train Staff
|
Compile controlled
vocabularies
Develop workflow,
rules & procedures
Develop
materials &
train staff
Copyright © 2005 | 55
Seven phases of taxonomy and metadata design
1 Identify
Objectives
Interview core team
and stakeholders
2 Inventory
Content
ID sources, spider
assets & extract
metadata
3 Specify
Metadata
Define fields &
purpose
Start with UI
sketches,
off-the-shelf
rules.
Revise if needed,
bake into alpha
CMS
Tailor the
default
materials
Manually
tag small
sample
7 Train Staff
Stage
Participants
Plan & Prototype
Project Team
|
Interview
beta users
Gather
additional
sources, if
any
Revise if
needed, bake
into alpha
CMS
Compile
controlled
vocabularies
5 Specify
Vocabularies
Interview
alpha
users
Gather
additional
sources, if
any
Define
content
chunks &
XML DTDs
4 Model
Content
6 Specify
Procedures
Review
tagged
samples,
default
procedures
Revise, use in
alpha CMS
alpha workflows
in CMS
Use alpha
CMS to tag
larger sample
Alpha Dev & Test
Stakeholders and SMEs
Modify
CMS for
beta
Modify
for 1.0
Modify CMS
for beta
Modify
for 1.0
Revise
using
team
procedu
re
Revise,
use in
beta
CMS
Modify &
extend
workflows
Finalize
procedure
materials
Use beta CMS
to tag larger
sample
Beta D&T
Friendly Users
Finalize training
materials & train
staff
Final D&T
Audiences
Copyright © 2005 | 56
Project Prep | Key Considerations
• What is the level of knowledge about taxonomy in the
company as a whole?
• What are the most important priorities for the taxonomy?
• How much do I know about the subject matter? How much
ramp up do I need?
• How many types of content will I need to consider?
• How much content is there (quantity-wise)?
• How many stakeholders and subject matter experts (SMEs)
are there? How are they organized? (e.g. one “owner/SME”
per product line?)
• What types of politics or challenges exist today between
groups of owners/subject matter experts? Will they debate
and/or argue over terminology or what should be classified
where?
|
Copyright © 2005 | 57
Project Prep | Key Considerations
• Does any of the terminology need to be created from scratch
or re-written?
• What kind of data store will the taxonomy be used in?
(Database? XML repository?)
• Has any user feedback been received so far (internal or
external, formal or informal), as to what they like and don’t
like about finding the company’s information?
• Is there a product database of any sort in existence today?
What product characteristics are accounted for? (name,
description, number, etc.)
• If there is a web site, how is it organized today? (e.g.
products, solutions, roles, etc.)
• How will users tag content using this taxonomy? Do they
have that software/interface in place today?
• Will we need to train users to tag content?
|
Copyright © 2005 | 58
Content Analysis | Steps and Approaches
• Conduct stakeholder interviews to determine project goals
and success metrics
• Be sure to be prepared with your own!
• Conduct industry competitive analysis if appropriate
• Review content and create a high-level inventory
• Determine the terms the business uses to categorize
information (top-down approach)
• Determine the term the employees use when seeking
information (bottom-up approach)
• Gather all terms / categories / content types
• Check vis-à-vis original content inventory to ensure
everything is accounted for
|
Copyright © 2005 | 59
Example | Document Topic Inventory
|
Copyright © 2005 | 60
Example | Product Topic Inventory
|
Copyright © 2005 | 61
Taxonomy creation process | Steps and Approaches
• SME analysis of content to determine categories and/or
tags
• Workshops with SME and stakeholders to gain additional
understanding of content
• Card sorting exercises with business users or end
customers to determine intuitive clustering and category
names
• Auto-generation of “rough” taxonomy via software tool
• Refine with SMEs and taxonomy experts
• Iterative taxonomy creation over a period of several weeks
depending on size and scope of the effort
• Validate taxonomy via user testing
|
Copyright © 2005 | 62
Taxonomy creation process | Best practices
• Be aware of the competition: how they name and
categorize products
• Involve engineers early: ensure that the taxonomy
you’re creating can be used with the technology
• Be aware of key parties’ viewpoints
• After determining the high-level categories, have a
midpoint check in with stakeholders to ensure you’re on
the right track and build ongoing consensus
• For the purposes of web design, leverage sample page
layouts to show how categorization and tagging will
affect page layout and content
• Remember taxonomies must evolve and progress as
your business changes
|
Copyright © 2005 | 63
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 64
Taxonomy Business Processes
Taxonomies must change, gradually, over time if they are to
remain relevant
• Maintenance processes need to be specified so that the
changes are based on rational cost/benefit decisions
• A team will need to maintain the taxonomy on a part-time
basis
•
•
Taxonomy team reports into CM governance or steering
committee
|
Copyright © 2005 | 65
Taxonomy governance | Change process overview
2: Taxonomy Team decides
when to update CV
2: NASA Taxonomy Team
snapshots
Taxonomy
Facets
Site Search
Tool
Site Search Tool
decides when to
update snapshots of
external CVs
CV Sources
CV
Consumers
Portal
Portal
Subject
Codes
Codes
Taxonomy
Working
Copies
of CVs,Tool
maintain in
Taxonomy Tool
Working
Papers
Project Archives
NASA
Expertise
Competencies
CVsOther
from other
NASA
Sources
Internal
3: 3:
Team
adds
value
via
Team
adds
value
to
definitions,
snapshotssynonyms,
through
definitions,
synonyms,
classification
rules,
classification
rules,
training materials,
etc.
training materials, etc.
External
External
Standard
Vocabularies
Standard
Internally
Internally
Created
CVs
1: External controlled
vocabularies (CVs) change
on their own schedule
|
’
Web CMS
4: Updated
versions
of CVsof
4: Updated
versions
to
CVspublished
to Consumers
consumers
DMS’
’
DAM
Tagging Tool
Metatagging
Tool
Created
Taxonomy
NASA
Taxonomy
Governance
Governance
Environment
Search
Search UI UI
Environment
Copyright © 2005 | 66
Taxonomy governance | Generic team charter
•
Taxonomy Team is responsible for maintaining:
•
•
The Taxonomy, a multi-faceted classification scheme
Associated taxonomy materials, such as:
Editorial Style Guide
• Taxonomy Training Materials
• Metadata Standard
•
•
Team rules and procedures (subject to CIO review)
Committee will consider costs and benefits of suggested change
• Taxonomy Team will:
•
•
•
•
Manage relationship between providers of source vocabularies and
consumers of the Taxonomy
Identify new opportunities for use of the Taxonomy across the
Enterprise to improve information management practices
Promote awareness and use of the Taxonomy
|
Copyright © 2005 | 67
Taxonomy governance team | Generic roles
•
Executive Sponsor
•
•
Advocate for the taxonomy team
Business Lead
•
•
Keeps committee on track with larger business objectives
Balances cost/benefit issues to decide appropriate levels of effort
•
•
•
•
•
Committee’s liaison to content creators
Estimates costs of proposed changes in terms of editorial process changes, additional or reduced
workload, etc.
Taxonomy Specialist
•
•
•
Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and
processing burden, software changes, etc.
Helps obtain data from various systems
Content Specialist
•
•
Obtains needed resources if those in committee can’t accomplish a particular task
Technical Specialist
•
•
Specialists help in estimating costs
Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
•
Reality check on process change suggestions
|
Copyright © 2005 | 68
Taxonomy governance | Where changes come from
Firewall
Application
UI
Tagging
UI
Application
Logic
Content
Tagging
Logic
Taxonomy
Staff
notes
‘missing’
concepts
Query log
analysis
End User
Recommendations by Editor
1. Small taxonomy changes
(labels, synonyms)
2. Large taxonomy changes
(retagging, application
changes)
3. New “best bets” content
|
Tagging Staff
Taxonomy Editor
Steering Committee
Committee considerations
1. Business goals
2.
experience
Changes in user
experience
3. Retagging cost
Requests from other
Requests
from
other
parts of
NASA
parts of the organization
Copyright © 2005 | 69
Taxonomy governance | Taxonomy maintenance workflow
Problem?
Yes
Suggest new
name/category
Review new
name
Problem?
No
Copy edit new
name
Add to
enterprise
Taxonomy
Taxonomy
No
Yes
Analyst
Taxonomy Tool
|
Editor
Copywriter
Sys Admin
Copyright © 2005 | 70
Sample Taxonomy Editor: Data Harmony
Hierarchy
Browser
Standard
Term
Info
|
Copyright © 2005 | 71
Taxonomy editing tools vendors
Immature industry
– no vendors in
upper-right
quadrant!
Ability to Execute
high
Most popular taxonomy
editor? MS Excel
low
High functionality, high
cost ($100k!)
Widely used, cheap,
single-user
Niche Players
|
Completeness of Vision
Visionaries
Copyright © 2005 | 72
Measuring Metadata and Taxonomy Quality
•
Taxonomy development is an iterative process
•
Elicit feedback via walk-throughs, tagging samples, and
card sorting exercises
•
Use both qualitative and quantitative methods, and remain
flexible throughout
|
Copyright © 2005 | 73
Taxonomy testing | Qualitative methods
Method
Walk-throughs
Process
Show and explain
Validation
 Approach
 Consistency to rules
 Appropriateness to task
Usability Testing
Contextual analysis
 Tasks are completed successfully
 Time to complete task is reduced
User Satisfaction
Survey
 Reaction to new interface
 Reaction to search results
Tagging samples
Tag sample content
with taxonomy
 Content ‘fit’
 Fills out content inventory
 Training materials for people &
algorithms
 Basis for quantitative methods
|
Copyright © 2005 | 74
Quantitative Method | How evenly does it divide the content?
Background:
•
200,000
Series2
150,000
Series1
100,000
50,000
bl
io
gr
ap
hy
St
at
is
tic
s
er
at
ur
e
Bi
Ju
ve
ni
le
lit
itio
ns
ct
io
n
Ex
hi
b
Fi
Co
ng
re
ss
es
Bi
og
ra
ph
y
Pe
rio
di
ca
ls
Top 10 Content Types
Measured and Expected Distribution of Content Types in an
Intranet
Results:
25
20
Results were slightly more uniform than
the Zipf distribution, which is better than
expected
# Documents
15
Measured
Expected
10
5
Programs,
Proposals, Plans
& Schedules
Other &
Unclassified
Papers &
Presentations
Regulations,
Policies,
Procedures &
Marketing &
Sales
Manuals &
Learning
Materials
Operations &
Internal
Communications
0
People, Groups
& Places
•
250,000
0
Part of alpha test of ‘content type’ for
corporate intranet
• 115 URLs selected at random from
search index were manually categorized.
Inaccessible files and ‘junk’ were
removed
•
•
300,000
ap
s
Methodology:
Number of Records
•
350,000
M
Documents do not distribute uniformly
across categories
• Zipf (1/x) distribution is expected behavior
• 80/20 rule in action (actually 70/20 rule)
Measured and Expected Distribution of Top 10 Content Types
in Library of Congress Database
News & Events
•
Content Type
|
Copyright © 2005 | 75
Quantitative Method | How intuitive (repeatable) are the categorizations?
•
Methodology: Closed Card
Sort
For alpha test of a grocery site
• 15 Testers put each of 100
best-selling products into one
of 10 pre-defined categories
• Categories where fewer than
14 of 15 testers put product
into same category were
flagged
•
•
“Cocoa Drinks – Powder” is best
categorized in both “Beverages”
and “Grocery”.
Results:
% of Testers
Cumulative % of
Products
15/15
54%
14/15
70%
13/15
77%
12/15
83%
11/15
85%
<11/15
100%
|
In the trade, “Corn Tortillas” are
a Dairy item!
Copyright © 2005 | 76
Quantitative Method | How does taxonomy “shape” match that of content?
• Background:
Term Group
%
%
Term
Docs
• Hierarchical taxonomies allow
s
comparison of “fit” between content
Administrators
7.8
15.8
and taxonomy areas
Community Groups
2.8
1.8
• Methodology:
Counselors
3.4
1.4
Federal Funds
9.5
34.4
• 25,380 resources tagged with
Recipients and
taxonomy of 179 terms. (Avg. of 2
Applicants
terms per resource)
Librarians
2.8
1.1
• Counts of terms and documents
News Media
0.6
3.1
summed within taxonomy hierarchy
Other
7.3
2.0
Parents and Families
2.8
6.0
• Results:
Policymakers
4.5
11.5
• Roughly Zipf distributed (top 20
Researchers
2.2
3.6
terms: 79%; top 30 terms: 87%)
School Support Staff
2.2
0.2
• Mismatches between term% and
Student Financial
1.7
0.7
Aid Providers
document% flagged
Students
27.4
7.0
Teachers
25.1
11.4
Source: Courtesy Keith Stubbs, US. Dept. of Education
|
Copyright © 2005 | 77
Metadata Maturity Model
•
•
•
•
Taxonomy governance processes must fit the organization
As consultants, we notice different levels of maturity in the business
processes around Content Management, Taxonomy, and Metadata
Honestly assess your organization’s metadata maturity in order to
design appropriate governance processes
We are starting to define a maturity model, similar to the SCCM model
in the software world:
•
Initial - ad hoc, each project begins from scratch.
Repeatable - Procedures defined and used, but not standardized across
organization or are misapplied to projects.
• Defined – Standard processes are tailored for project needs. Strategic
training for long-range goals is in place.
• Managed – Projects managed using quantitative quality measures. Process
itself is measured and controlled.
• Optimizing – Continual process improvement. Extremely accurate project
estimation.
•
|
Copyright © 2005 | 78
Purpose of Maturity Model
• Estimating the maturity of an organization’s information
management processes tells us:
•
How involved the taxonomy development and maintenance
process should be
•
•
Overly sophisticated processes will fail
What to recommend as first steps
Maturity is not a goal, it is a characterization of an
organization’s methods for achieving particular goals
• Mature processes have expenses which must be
justified by consequent cost savings or revenue gains
• IT Maturity may not be core to your business
•
|
Copyright © 2005 | 79
Metadata Maturity Scorecard
Initial
Repeatable
Defined
Managed
Optimizing
Organizational Structure
Executive Sponsorship
*
Budgeting
*
Hiring & Training
*
Quality Assurance
Manual Processes
*
Automated Processes
*
1
Project Management
Estimating & Scheduling
*
Cost Control
*
Project Methodology
*
2
Design and Execution
Planning
*
Design Excellence
*
Development Maturity
*
1 – X is starting to examine search query logs, which is an important first step in improving search. But this is only
an isolated example.
2 – IT has a project methodology they are trying to use across all projects. But not all business units have project
methodologies.
|
Copyright © 2005 | 80
Metadata Maturity Quick Quiz
1)
2)
3)
4)
5)
6)
7)
8)
9)
10)
What process is in place to examine query logs?
Is there a process for adding directories and content to the repository, or do people just
do what they want?
Is there an organization-wide metadata standard, such as an extension of the Dublin
Core, used by search tools, multiple repositories, etc.?
Are system features and metadata fields added based on cost/benefit analysis, rather
than things that are easy to do with the current tools?
Who is breathing down my neck to improve search on our intranet?
Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete,
Trivial content).
Is there an established QA procedure for ensuring metadata accuracy and
conformance? Are there established qualitative and quantitative measures of metadata
quality?
Is there a centralized metadata group with tools and services offered around the
organization?
Are there hiring and training practices especially for metadata and taxonomy positions?
Have features been removed from the metadata standard?
|
Copyright © 2005 | 81
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study: NASA
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study: PC Connection
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 82
Example: PC Connection
|
Copyright © 2005 | 83
Leveraging Technology: Endeca
•
•
|
Drop-down menus
•
More visible
•
More traditional to
match user
expectations
Challenge: Give the
“results” more real
estate but keep the
filters prominent
Copyright © 2005 | 84
PC Connection: The Solution
•
•
DHTML “slider” applied
for second-level
navigation, exposing all
product attributes for
easy filtering
Validated solution
•
|
Powerful comparisons
between first and
second usability tests
(e.g., 5 out of 8
participants used
filters on first test,
10 out of 10 used
filters on second test)
Copyright © 2005 | 85
PC Connection: Results
•
All product categories consistently accessible
•
Drop-down menus with product attributes facilitate ease of
filtering
Easier to use different facets of taxonomy to find desired
products
•
•
Customers use, rather than struggle with, navigation
|
Copyright © 2005 | 86
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 87
Lessons Learned: Taxonomies for Business Impact
•
•
•
•
•
•
Content is no longer king: the user is
Understand how your users/customers want to interact
with information before designing your taxonomy and the
user interface
Carry those user needs through to the back-end data
structure and front-end user interface
Empower the user with the categories and content
attributes they need to filter and find what they want
Leverage UE design best practices like usability testing to
determine needs and validate taxonomy and interface
design
Remember that taxonomy is a “snapshot in time”: keep it
up to date, let it evolve
|
Copyright © 2005 | 88
Summary
• What is the problem you are trying to solve?
•
•
•
•
•
Improve search (or findability)
Browse for content on an enterprise-wide portal
Enable business users to syndicate content
Otherwise provide the basis for content re-use
Comply with regulations
•
What data and metadata do you need to solve it?
• Where will you get the data and metadata?
• How will you control the cost of creating and maintaining the
data and metadata needed to solve these problems?
•
•
•
•
CMS with a metadata tagging products
Semi-automated classification
Taxonomy editing tools
Appropriate governance process
|
Copyright © 2005 | 89
Agenda
1:30
Welcome and Introductions
1:40
Taxonomy Definitions and Examples
2:10
Business Case and Motivations
2:30
Case Study
2:45
Tagging and Tools
3:00
Break
3:15
Running a Taxonomy Project
3:45
Taxonomy Maintenance and Governance
4:15
Case Study
4:30
Summary and Discussion
5:00
Adjourn
|
Copyright © 2005 | 90
Thank you!
Theresa Regli
Ron Daniel
[email protected]
[email protected]
|
Copyright © 2005 | 91
Descargar

Molecular PowerPoint Design Template