Managing Libraries with Creative Data Mining
Learning to Use Your Library’s Data Warehouse to Understand and Improve the Services You Provide
Ted Koppel
The Library Corporation
Computers in Libraries 2005
Session B203, March 17, 2005
The Plan
• What is data mining and why is it useful?
• Who else does it?
• Does it make sense for libraries?
• Are libraries already doing data mining?
• What data can libraries mine?
• How much sophistication do I need?
What is Data Mining?
• Collection and Analysis of one’s own data in order to
make better business decisions.
• More than simple data storage
• Business intelligence technology for discerning unknown
patterns from large databases
• Uses statistics, artificial intelligence,
various modeling techniques
• Related to, but different from,
bibliomining
Value and Importance
• By identifying patterns and predicting future
trends …
– Make decisions based on facts, not
guesswork
– Develop sensible processes
– Reduce costs or increase services by efficient
use of resources
• Serve the customer better
‘High Level’ planning
• Define the data mining goals
• Data collection
• Data organization and normalization
• Analysis
• Analysis
• Analysis
• Reiteration
Remember: GIGO (garbage in, garbage out).
Who is Data Mining now?
• Manufacturing – process control
• Banks and financial
institutions – “full service”
• Government and law –
fraud, abuse
• Sports – RHP versus LHB?
Sucker for a curve ball?
• Service industries – almost
all CRM systems
• Retail: product stock and
placement
• Travel: airline overbooking
• Las Vegas: guest tracking for
comps and benefits
• Groceries: affinity cards
• Internet: GoogleAds
Nuggets Found by Mining
• Chase Bank: minimum balance versus other
bank business
• Home Depot hurricane planning
• WalMart (UK) diapers and beer (actually a hoax,
but an informative one)
• Casino security in Las Vegas
- fraud
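The "diapers and beer" story, hoax or not, is the textbook illustration of affinity (market-basket) analysis. A minimal sketch in Python, using made-up checkout baskets (the data and the `pair_stats` helper are illustrative assumptions, not drawn from any of the cases above), shows how support, confidence, and lift for an item pair might be computed:

```python
from collections import Counter
from itertools import combinations

def pair_stats(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for each co-occurring pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    stats = {}
    for (a, b), both in pair_counts.items():
        support = both / n                   # share of baskets containing both items
        confidence = both / item_counts[a]   # of baskets with a, share that also has b
        lift = confidence / (item_counts[b] / n)  # > 1 means bought together more than chance
        stats[(a, b)] = (support, confidence, lift)
    return stats

# Hypothetical baskets, for illustration only
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
]
stats = pair_stats(baskets)
support, confidence, lift = stats[("beer", "diapers")]
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

The same counting could be pointed at titles checked out together instead of grocery items.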
Implementer Level Tools
• Oracle® Data Mining Suite
• Microsoft SQL Server 2000
• SPSS and similar
• STATISTICA (StatSoft)
• Open Source:
– Cornell Univ. Himalaya Data
Mining Tools
– WEKA Waikato Environment for
Knowledge Analysis (Univ. of Waikato, NZ)
Looking for the Dog that Doesn’t
Bark
• NORA – Non Obvious Relationship
Awareness
– Examines third-level and deeper relationships between
datasets
• ANNA – Anonymized Data
– Double-blind application/offshoot of NORA
that deals with personal attributes
anonymously
Vocabulary Lesson
• Bagging (averaging)
• Boosting (calculating predictive data)
• Drilling down
• Stacking (combining predictions from different models)
• Predictive mining (using X to predict Y)
• Data Models:
– CRISP = Cross Industry Standard Process for DM
– SEMMA = Sample, Explore, Modify, Model, Assess
Value to Libraries: a Tool
• Citizens demand more/better service at a time of
reduced funding.
• Anticipate USER behavior
• Anticipate STAFF behavior
• Service hours and staffing needs,
facilities planning
• Collection development –
anticipating customer needs
Do Libraries Use DM?
• Association of Research Libraries (ARL)
SPEC Kit 274 (2003), Mento and Rapple
– 124 surveys, 65 responses
– 40% already doing some data mining
– 90% had plans
• Major areas of activity
– Research and Collection Support
– Administration
– Repository management (future)
ARL Member Benefits Seen
• Serials cancellation projects
• Collection Development tuning
• Budget allocation by material use
• Workflow analysis
• Weeding
• OPAC and Web presence usability and redesign
• Hacking and break-in analysis
(defensive data mining)
Other Library Data Mining
• Kun Shan University of Technology (Taiwan)
– ABAMDM Model = Acquisition Budget Allocation
Model based on Data Mining
– More material use → more money
– Compared:
• Circulation
• Collection size
• Department size
• # of courses
• # students/faculty per department
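The "more use, more money" idea can be sketched as a proportional split. This is not the ABAMDM model itself, whose details are not given here; it is a minimal illustration assuming a single usage measure (circulation) per department, with made-up counts and a hypothetical `allocate_budget` helper:

```python
def allocate_budget(total, usage_by_dept):
    """Split `total` across departments in proportion to their usage counts."""
    total_usage = sum(usage_by_dept.values())
    if total_usage == 0:
        # No recorded use anywhere: fall back to an even split.
        share = total / len(usage_by_dept)
        return {dept: share for dept in usage_by_dept}
    return {dept: total * use / total_usage for dept, use in usage_by_dept.items()}

# Hypothetical circulation counts per department
circulation = {"Engineering": 6000, "Business": 3000, "Design": 1000}
budget = allocate_budget(100_000, circulation)
for dept, amount in budget.items():
    print(f"{dept}: ${amount:,.0f}")
```

A fuller model would presumably weigh the other compared factors (collection size, courses, enrollment) as well; the weights would be a local policy decision.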
Other Library Data Mining (2)
• OCLC’s ACAS (Automated Collection
Analysis System) (recently upgraded!)
– Analyzes bibliographic records by call number
ranges (LC 4-digit, Dewey tens for example)
– Subdivides by years and aggregated years
– Subdivides by branch / collection
– “Collection conspectus” as a way to:
• Compare library collections
• Identify collection deficiencies
Other Library Data Mining (3)
• Univ. of Florida with FCLA
– Decision Support System for acquisitions activities
– Extracted from NOTIS bib files; saved to DB2
– Screen scraped Acq files
– Created large database of bib and in-process records,
which allowed querying:
• Circ history of approval versus firm orders?
• $ spent on titles that never circulate
• Do originally-cataloged items circulate? More or less than
copy cataloged items?
• How many items circulate more than “n” times?
– Assesses collection development and tech service
activity
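Once bib and in-process records sit in one table, questions like those above reduce to short aggregate queries. The sketch below uses Python's built-in `sqlite3` rather than DB2, and the `items` table, its columns, and its rows are hypothetical stand-ins, since the actual schema is not described:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (title TEXT, order_type TEXT, price REAL, circ_count INTEGER)"
)
# Made-up records, for illustration only
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("Title A", "approval", 40.0, 0),
        ("Title B", "approval", 55.0, 7),
        ("Title C", "firm", 30.0, 3),
        ("Title D", "firm", 25.0, 0),
        ("Title E", "firm", 60.0, 12),
    ],
)

# $ spent on titles that never circulate
never_circ_cost = conn.execute(
    "SELECT COALESCE(SUM(price), 0) FROM items WHERE circ_count = 0"
).fetchone()[0]

# How many items circulate more than n times?
n = 5
heavy_use = conn.execute(
    "SELECT COUNT(*) FROM items WHERE circ_count > ?", (n,)
).fetchone()[0]

# Circ history of approval versus firm orders (average circulations per item)
avg_by_type = dict(
    conn.execute(
        "SELECT order_type, AVG(circ_count) FROM items GROUP BY order_type"
    ).fetchall()
)

print(never_circ_cost, heavy_use, avg_by_type)
```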
Libraries are
fountains of data
Everything is countable
(example: Circulation transaction)
• Book:
– branch
– location
– Media type
– pubdate
– size
– color
– thickness
– #circs
– cost
– vendor
– holds
• User / Extractable:
– age
– Census Tract
– Location
– Curriculum
– Language
– Holds
– Sex
– Circ History
– Zipcode
– Repairs
– phone#
– School
– Loan history
– delinquencies
Multiply this by 10 million times a year!
Expand to:
• Acquisitions information (book attributes, vendor
history and performance, fund history, requester and
department, etc.)
• OPAC searching and navigation (databases,
searches, not founds)
• Metasearch usage (databases, usage)
• Reference desk interactions (who, what, how long?).
VRD by extension
• Resource sharing (NCIP, ILL)
• In-house usage transactions
• Physical plant: elevator, restroom, copier use
Crunch (Data) Creatively
• Unlikely variables give interesting data
• Ideas:
– Sex of user versus color of book
– Call # range vs. age of item vs. circulation ratio
by avg. $ paid per item
– Story hour attendance vs. Adult circ vs. Fines
collected
– Best sellers cost vs. Trade books by cost per circ
– Etc.
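As one worked instance of these ideas, cost per circulation by category (best sellers versus trade books) could be computed along the following lines. The records and the `cost_per_circ` helper are assumptions made for illustration; real figures would come from acquisitions and circulation data:

```python
from collections import defaultdict

def cost_per_circ(items):
    """Return {category: total cost / total circulations}.

    A category with no circulations maps to None rather than dividing by zero.
    """
    cost = defaultdict(float)
    circs = defaultdict(int)
    for category, price, circ_count in items:
        cost[category] += price
        circs[category] += circ_count
    return {c: (cost[c] / circs[c] if circs[c] else None) for c in cost}

# Hypothetical (category, price paid, circulations) records
items = [
    ("best seller", 28.0, 40),
    ("best seller", 26.0, 14),
    ("trade", 18.0, 6),
    ("trade", 22.0, 4),
]
result = cost_per_circ(items)
for category, value in result.items():
    print(f"{category}: ${value:.2f} per circ")
```

Swapping the category column for call number range, branch, or vendor gives the other comparisons with no change to the function.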
If you can count it, you can
analyze it
But remember -
QUALITY and
CONSISTENCY
• Library Automation vendor for over 30 years
• Family-owned, customer focused
• Library•Solution®
• Library•Solution™ for Schools
• CARL•Solution®
• CARL•X ™
Library•Solution Reports
• Utilizes ReportNet software
• Drag and Drop Report Design
• Completely Web-based
• Fitted to Library•Solution data framework
• Zero footprint on workstations
• Central reporting with enhanced distribution
• Multiple export formats
• Charts, tables, etc.
• Powerful
Using Library Data Outside the
Library
• City, County, RCOG, State Planning and
Development Authorities
– Require solid statistics about population,
educational level, etc.
– Quality of Life and capital budget services
planning
• Preserve user anonymity but share trends
• Input to GIS systems for real time
projection of future library needs
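One way to share trends while preserving anonymity is to release only aggregate counts and to suppress small groups. The sketch below assumes transactions arrive as (area, patron ID) pairs and uses an arbitrary threshold of 5; the data layout, the `trend_by_area` helper, and the threshold are all assumptions, and a real policy would be set locally:

```python
from collections import Counter

def trend_by_area(transactions, min_count=5):
    """Count checkouts per area, dropping areas with fewer than `min_count`.

    Patron IDs are never included in the result; small cells are suppressed
    so that a handful of checkouts cannot be traced back to individuals.
    """
    counts = Counter(area for area, _patron_id in transactions)
    return {area: n for area, n in counts.items() if n >= min_count}

# Hypothetical (zipcode, patron ID) checkout records
transactions = [("30301", f"p{i}") for i in range(6)] + [("30302", "p90"), ("30302", "p91")]
shared = trend_by_area(transactions)
print(shared)
```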
Applying GIS in the Library
Market
• Library.Decision product
• Works with ILS vendors including TLC
• Focus collections development
• Strengthen advocacy planning; undertake cardholder development campaigns
• Support grant applications
• Site new facilities
• Calculate service indicators
• Evaluate service delivery in relation to the unique needs of your community
In closing …
• Libraries are producing data every minute of
every day
• You need:
– Some tools
– Some creativity
– Some analytical ability
Knowledge is Power !
Acknowledgements
• Nicholson and Stanton, Gaining strategic advantage
through bibliomining. At www.bibliomining.com
• Banerjee, Is Data Mining Right for Your Library?
Computers in Libraries, Nov. 1998
• Kao, Chang, and Lin. Decision Support for the
Academic Library…, Information Processing and
Management 39(2003)
• Fabris. Advanced Navigation. CIO May 1998
• Library Administration and Management (journal) Winter
1996, section on Data Mining
Thank You
• Contact information
Ted Koppel
The Library Corporation
(800)624-0559