Introduction to Clinical Terminology and Classification
AL Rector
OpenGALEN
CO-ODE
The Medical Informatics Group, U of Manchester
www.cs.man.ac.uk/mig/galen
www.opengalen.org
www.co-ode.org
oiled.man.ac.uk
Slide No.: 1
OpenGALEN
Where we come from
[Diagram: GALEN clinical terminology at the centre, linking data entry, the clinical record (electronic health records), decision support and aggregated data, and best practice]
Slide No.: 2
OpenGALEN: Philosophy
- Terminology is software
  - Re-use is the key
  - Terminology is the interface between people and machines
  - Patient-centred information
- Terminology must have a purpose
  - Always ask: “What’s it for?”
    - Not art for art’s sake
      - Terminology supports clinical applications, not vice versa
        - Applications for someone to do something for somebody
        - Keep the ‘Horse before the Cart’
  - Always ask: “How will we know if it works?” “How will we know if it fails?”
Slide No.: 3
OpenGALEN: Key ideas
- Separation of kinds of knowledge
  - Terminology, medical record and information system schemas
  - Models of meaning; Models of Use
  - Concepts, language, Coding, Indexing, Pragmatics
  - Machine level, User level
- Knowledge is fractal!
  - There will always be more detail to be added
    - Therefore terminologies must be extensible
- Formal logical Support
  - Too big and complicated to maintain by hand
    - Extensibility requires rules
    - Software needs logical rigour
Slide No.: 4
Axes for kinds of Knowledge
[Diagram: three axes]
- Terminology / Medical Records and Information systems / Decision Support rules
- Concepts / Language / Coding / Indexing / Pragmatics & User Interface
- Machine level / Human Level
Slide No.: 5
9) Interface of EHR, Messaging & Decision Support
- Information Model (Patient Data Model): Patient Specific Records
- Inference Model (Guideline Model): Dynamic Guideline Knowledge
- Concept Model (Ontology): Static Domain Knowledge
Significant Research Topic Now
Slide No.: 6
Uses of Terminology
- Clinical
  - Epidemiology and quality assurance
  - Reproducibility / Comparability
  - Indexing
- Software
  - Re-use!
  - Integration and Messaging between systems
  - Authoring and configuring systems
  - Data capture and presentation (user interface)
  - Indexing information and knowledge (meta-data, the Web)
Slide No.: 7
An Old Problem
“On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance"
From The Celestial Emporium of Benevolent Knowledge, Borges
Slide No.: 8
History: Origins of existing terminologies
- Epidemiology
  - ICD - Farr in 1860s to ICD9 in 1979
    - International reporting of morbidity/mortality
  - ICPC - 1980s
    - Clinically validated epidemiology in primary care
      - Now expanded for use in Dutch GP software
- Librarianship
  - MeSH - NLM from around 1900 - Index Medicus & Medline
  - EMTree - from Elsevier in 1950s - EMBase
- Remuneration
  - ICD9-CM (Clinical Modification) 1980
    - 10 x larger than ICD; aimed at US insurance reimbursement
Slide No.: 9
Traditional Systems
- Built by people for interpretation by people (Coding clerks)
  - Most knowledge implicit in rubrics
    - Must understand medicine to use intelligently
      - Not built for software
- On paper for use on paper
  - Enumerated - top down, all possibilities listed
    - Serial - Single use - Single View
- Hierarchical Thesauri
  - Traditional terminological techniques from librarianship
    - ‘Broader than’ / ‘Narrower than’ (ISO 1087)
      - no logical foundation
- Focused on ‘terms’
  - Language and concepts mixed
    - Synonyms, preferred terms, etc. caused confusion
Slide No.: 10
History (2)
- Pathology indexing
  - SNOMED 1970s to 1990 (SNOMED International)
    - First faceted or combinatorial system
      - Topography, morphology, aetiology, function
      - Plus diseases cross referenced to ICD9
- Specialty Systems
  - Mostly similar hierarchical systems
    - ACRNEMA/SDM - Radiology
    - NANDA, ICNP… - Nursing
    - …
Slide No.: 11
History (3)
- Early computer systems
  - Read I (4 digit Read)
    - Aimed at saving space on early computers
      - 1-5 Mbyte / 10,000 patients
    - Hierarchical, modelled on ICD9
      - Detailed signs and symptoms for primary care
      - Purchased by UK government in 1990
    - Single use
      - Morbidity indexing
  - Medical Entities Dictionary (MED)
    - Jim Cimino
Slide No.: 12
History (4)
- Aspirations for electronic patient records (EPRs)
  - Weed’s Problem Oriented Medical Record
    - Direct entry by health care professionals
- Aspirations for decision support
  - Ted Shortliffe (MYCIN), Clem McDonald (Computer based reminders), Perry Miller (Critiquing), ...
- Aspirations for re-use
  - Patient centred information
- Needed common multi-use multi-purpose terminology
  - None worked
Slide No.: 13
Motivations and Business Models
- Remuneration
  - ICD9/10-CM in US for insurance and Medicare for diseases
  - Current Procedural Terminology (CPT) for surgical procedures
- Public Health Reporting
  - ICD9/10
- Indexing publications
  - MeSH – Medical Subject Headings - Basis of indexing MedLine/PubMed
  - EMTree – basis of indexing EMBASE
- Clinical Recording
  - Read 1-3, SNOMED-RT/CT
  - ICPC – International Classification of Primary Care
- Support for applications and decision support
  - GALEN
Slide No.: 14
Summary of Changes at end of 1st Generation
- From terminologies for people to terminologies for machines
- From paper to software
- From single use to multiple re-use for patient centred systems
- From entry by coding clerks to direct entry by health care professionals
- From pre-defined reporting for statistics to reliable indexing for decision support
Slide No.: 15
Changes at end of first generation
- From models of USE to models of MEANING
  - But tended to lose the model of use
    - The goal of “useful and usable systems” lost
Slide No.: 16
Problems with ‘First Generation’ Enumerated Systems in coping with these changes
Slide No.: 17
Problems (1)
- Scaling!!!
  - More detail and more specialities required scaling up, but...
- The combinatorial explosion
  - Example: Burns:
    - 100 sites x 3 depths → 404 codes
      - 5 subsites/site x chemical or thermal → 7272
        - x 3 extents x 3 durations → 116,352
  - ‘The Persian chessboard’
    - 2^64 ≈ 10^19
      - 10^19 grains of rice ≈ 100 billion tonnes of rice
      - 10^19 nanoseconds ≈ 300 years
  - Read II grew from 20,000 to 250,000 terms in ~100 staff-years
    - still too small to be useful
      - but too big to use
Slide No.: 18
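The burns figures can be reproduced with a few lines of arithmetic. The sketch below assumes (an inference from the numbers, not something the slide states) that every axis may also be left unspecified, so an enumerated scheme needs the product of (n + 1) over all axes. The function names are illustrative only.

```python
from math import prod

def enumerated_codes(axis_sizes):
    """Codes needed when every combination is listed in advance.

    Assumption: each axis may also be left unspecified, hence n + 1 per axis.
    """
    return prod(n + 1 for n in axis_sizes)

def compositional_primitives(axis_sizes):
    """Pieces needed when codes are composed on demand: one per axis value."""
    return sum(axis_sizes)

# Axis sizes from the burns example: sites, depths, subsites per site,
# chemical/thermal, extents, durations.
burn_axes = [100, 3, 5, 2, 3, 3]

print(enumerated_codes(burn_axes[:2]))      # sites x depths
print(enumerated_codes(burn_axes[:4]))      # ... x subsites x chemical/thermal
print(enumerated_codes(burn_axes))          # ... x extents x durations
print(compositional_primitives(burn_axes))  # pieces a compositional scheme needs
print(2 ** 64)                              # the Persian chessboard, about 1.8e19
```

Under that assumption the three products come out as 404, 7272 and 116,352, while a "dictionary + grammar" scheme over the same axes needs only 116 primitives.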
Benefits
Avoid the “Exploding Bicycle”
From “phrase book” to “dictionary + grammar”
Tame combinatorial explosions
- 1980 - ICD-9 (E826): 8 codes
- 1990 - READ-2 (T30..): 81 codes
- 1995 - READ-3: 87 codes
- 1996 - ICD-10 (V10-19): 587 codes
- and meanwhile elsewhere in ICD-10
  - V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
  - W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
  - X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Slide No.: 19
Problems (2)
- Information implicit in the rubrics
  - “Hypertension excluding pregnancy”
    - Computers can’t read!
      - Invisible to software
  - No explicit information except the hierarchy
    - Minimal support for software
    - No opportunity to use software to help
- Language and concepts confused
  - Synonyms
  - Preferred terms
  - Homonyms
  - Only simple look up and spelling correction
Slide No.: 20
Problems (3)
- Mixed Organisation
  - ‘Heart diseases’ in 13 of 19 chapters of ICD
    - Tumours, infections, congenital abnormalities, toxic, …
  - ‘Steroids’ in five chapters of standard drug classifications
    - Anti-inflammatories, anti-asthmatics, …
  - Unreliable for indexing or Abstractions
    - How to say something about ‘all heart diseases’?
- Fixed organisation
  - Single hierarchy - Single use
    - Where to put ‘gout’ - arthritis or metabolic disease?
      - Back and forth in each edition of ICD
    - No re-use
Slide No.: 21
Problems 3b
Thesauri rather than Classifications
A Mixed Hierarchy
organ
  heart                      } kind
    heart valve              } part
      aortic valve           } kind
        aortic valve cusp    } part
A correct kind-of (subsumption) hierarchy
disorder of organ
  disorder of heart
    disorder of valve in heart
      disorder of aortic valve in heart
        disorder of cusp in aortic valve in heart
Slide No.: 22
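The difference between the two hierarchies can be made concrete by labelling each link. This is a minimal sketch, assuming the links are labelled as on the slide and assuming that "disorder of X" generalises along both kind and part links (which is what the second hierarchy illustrates). The function names are hypothetical.

```python
KIND, PART = "kind", "part"

# child -> (parent, link type), copied from the mixed hierarchy on the slide
MIXED = {
    "heart": ("organ", KIND),
    "heart valve": ("heart", PART),
    "aortic valve": ("heart valve", KIND),
    "aortic valve cusp": ("aortic valve", PART),
}

def ancestors(term, link_types):
    """Walk upwards, following only links whose type is in link_types."""
    found = []
    while term in MIXED and MIXED[term][1] in link_types:
        term = MIXED[term][0]
        found.append(term)
    return found

def disorder_is_kind_of(site, broader_site):
    """True if 'disorder of site' is a kind of 'disorder of broader_site'.

    Assumption: disorders generalise along both kind and part links.
    """
    return site == broader_site or broader_site in ancestors(site, {KIND, PART})

print(ancestors("aortic valve cusp", {KIND, PART}))  # thesaurus-style 'broader than'
print(ancestors("aortic valve", {KIND}))             # strict kind-of only
print(disorder_is_kind_of("aortic valve cusp", "organ"))
```

Read as a thesaurus, the cusp ends up "narrower than" organ. Read strictly, an aortic valve is a kind of heart valve and nothing more, yet a disorder of the cusp is still correctly a kind of disorder of an organ.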
Problems (4)
- ‘Semantic identifiers’
  - Codes really paths - moving a concept meant changing its code
      3 Cardiovascular disorders
      …
      3.4 Disorders of Artery
      ...
      3.4.2 Disorders of coronary artery
      ...
      3.4.2.3 Coronary thrombosis
      …
- Easy to process but...
  - Reorganisation requires changing codes
  - Codes cannot be permanent
Slide No.: 23
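A small sketch of why path-style codes are easy to process yet cannot be permanent. The codes 3, 3.4, 3.4.2 and 3.4.2.3 come from the slide; the heading "3.9" used for the reorganisation is hypothetical, as are the function names.

```python
def path_code(parent_code, position):
    """A 'semantic' identifier: the parent's code plus the position under it."""
    return f"{parent_code}.{position}"

def parent_of(code):
    """The hierarchy can be read straight off the code, hence 'easy to process'."""
    return code.rsplit(".", 1)[0] if "." in code else None

coronary_artery = path_code(path_code("3", 4), 2)  # disorders of coronary artery
thrombosis = path_code(coronary_artery, 3)         # coronary thrombosis

# Reorganise: file coronary thrombosis under a hypothetical new heading "3.9".
moved = path_code("3.9", 1)

print(thrombosis, parent_of(thrombosis))
print(moved != thrombosis)  # same concept, different code after the move
```

Any record that stored the old code no longer matches the concept once it has been moved, which is the maintenance problem the slide describes.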
Problems (5)
- Maintenance
  - 20 Years from ICD9 to ICD10
  - ~100 person-years from Read 1 to Read 3
  - Mega francs/guilders/crowns/marks on European coding schemes
  - Thousands of unpaid hours of committee time
    - Impossible / meaningless decisions take longest
      - You can search forever for something that is not there
    - Multiple uses compete → Must choose one use
      - Most successful were clear about their purpose - ICD, ICPC, MeSH
- Codes change meaning with version changes
  - Old data misleading!
Slide No.: 24
Problems (6)
- Version specific artefacts
  - “Not otherwise specified” (NOS)
    - Used to move a general concept ‘down’
  - “Not elsewhere classified” (NEC)
    - Catch all - Nowhere else in coding system, e.g. ‘Tumour not elsewhere classified’
      - dependent on version
  - “Other”
    - Catch all - Not listed below, e.g. “Other diseases of the cardiovascular system”
      - dependent on version
  - Not used consistently
Slide No.: 25
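Why a residual rubric is "dependent on version" can be shown with plain sets. This is an illustrative sketch: the two editions and their contents are hypothetical, not taken from any real coding scheme.

```python
def other_category(all_conditions, listed):
    """What a residual 'Other ...' rubric covers: everything not listed explicitly."""
    return set(all_conditions) - set(listed)

conditions = {"angina", "myocarditis", "pericarditis", "endocarditis"}
edition_1 = {"angina"}                 # hypothetical first edition
edition_2 = {"angina", "myocarditis"}  # hypothetical second edition adds a rubric

other_v1 = other_category(conditions, edition_1)
other_v2 = other_category(conditions, edition_2)

print("myocarditis" in other_v1)  # filed under 'Other' in edition 1
print("myocarditis" in other_v2)  # no longer under 'Other' in edition 2
```

The code for "Other" is unchanged between editions, but the set of patients it describes is not, so data recorded under it cannot be compared across versions without knowing which edition was in force.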
Problem (7): Language is slippery: Two hands or Four?
Slide No.: 26
Language/Concepts are slippery
- Human cognition makes it look easy
  - Logic fails to capture it
    - Classification is easy until you try to do it
      - Trying since Aristotle in the West and the ancient Chinese in the East
- Words/Concepts mean what a community decides they mean
  - Does a chimpanzee have four hands?
  - Is a prion alive?
  - Is surgery on the ovary a kind of ‘Endocrine surgery’?
- Easier to agree on the concrete than the abstract
  - Easy to agree on useful abstractions and generalisations
    - Harder to agree on how to name them
Slide No.: 27
Problems (8)
- There is no re-use - there is no standard
  - The ‘grand challenge’: A common controlled vocabulary for medicine
    - But re-use requires multiple different views
  - People’s needs differ / People do and find different things
    - By profession
      - Doctors and specialties, nurses, physiotherapists, dentists…
    - By situation
      - Inpatient, outpatient, primary care, community…
    - By task
      - Diagnosis, management, prescribing, patient care, public health, quality assurance, management, planning
    - By country and community
      - US, UK, France, Germany, Japan, Korea, ...
Slide No.: 28
Summary of Problems: 1st Generation Enumerated Systems
- Enumerated Single Hierarchies
  - List all possibilities in advance
    - Cannot cope with fractal knowledge
  - Most knowledge implicit
    - Invisible to software
  - Unreliable for indexing
- Difficult to use for healthcare professionals
  - No support for user interface
- Can’t agree on common concepts and classification
  - Language and concepts don’t translate easily to logic and software
- Can’t build and maintain big classifications
Slide No.: 29
Cimino’s Desiderata (1)
- Concept orientation
  - Separate language (terms) and concepts (codes)
- Concept permanence
  - Never re-use a code (‘retire’ it)
- Nonsemantic concept identifiers
  - Separate the code from the path
- Polyhierarchy
  - Allow one concept to be classified in multiple ways
    - Gout can be both a metabolic disease and an arthritis
Slide No.: 30
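These four desiderata map naturally onto a small data structure. The following is a sketch only: the class, field names and identifier scheme are hypothetical and not taken from any particular terminology.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1000)  # arbitrary start; identifiers carry no meaning or path

@dataclass
class Concept:
    terms: list                                  # language, kept apart from the concept
    parents: list = field(default_factory=list)  # polyhierarchy: any number of parents
    retired: bool = False                        # permanence: retire, never delete or re-use
    cid: int = field(default_factory=lambda: next(_next_id))

def ancestors(concept):
    """All concepts reachable through any parent link."""
    seen = {}
    stack = list(concept.parents)
    while stack:
        current = stack.pop()
        if current.cid not in seen:
            seen[current.cid] = current
            stack.extend(current.parents)
    return list(seen.values())

disease = Concept(["disease"])
arthritis = Concept(["arthritis", "joint inflammation"], [disease])
metabolic = Concept(["metabolic disease"], [disease])
gout = Concept(["gout"], [arthritis, metabolic])

print(sorted(c.terms[0] for c in ancestors(gout)))

# Reclassifying changes the parents, never the identifier.
gout_id = gout.cid
gout.parents.remove(metabolic)
print(gout.cid == gout_id)
```

Because the identifier is independent of position, the "where to put gout" argument stops mattering: the concept can sit under both parents, or be moved between editions, without invalidating stored data.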
Cimino’s Desiderata (2)
- Formal Definitions
  - i.e. ‘Be compositional’
- Reject ‘Not elsewhere classified’
  - concept permanence and NEC
- Multiple granularities
  - Organ, tissue, cellular, molecular
  - Grades, types, classes of diseases
  - Special clinical criteria
- Multiple consistent views
  - Allow different organisations
    - e.g. functional, anatomical, pathological
Slide No.: 31
Cimino’s Desiderata (3)
- Represent context
  - Family history, risk, source of information
- Evolve gracefully
  - Allow controlled changes
- Recognise redundancy (equivalence)
  - ‘Carcinoma’ + ‘Lung’ ?=? ‘Carcinoma of the lung’
    - How would we know?
      - How could a machine know?
Slide No.: 32
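One way a machine could know is by comparing formal definitions rather than names. The sketch below assumes that the pre-coordinated term carries a definition and that the relation between 'Carcinoma' and 'Lung' is something like `site`; both are assumptions, since the slide's '+' says nothing about the relation.

```python
def expression(base, **criteria):
    """A composed concept: a base plus attribute-value pairs, order-independent."""
    return (base, frozenset(criteria.items()))

# Hypothetical formal definition attached to a pre-coordinated term.
DEFINITIONS = {
    "Carcinoma of the lung": expression("Carcinoma", site="Lung"),
}

def normalise(concept):
    """Replace a named concept by its definition; leave compositions as they are."""
    if isinstance(concept, str):
        return DEFINITIONS.get(concept, expression(concept))
    return concept

def equivalent(a, b):
    """Two concepts are equivalent if their normalised definitions are identical."""
    return normalise(a) == normalise(b)

print(equivalent("Carcinoma of the lung", expression("Carcinoma", site="Lung")))
print(equivalent("Carcinoma of the lung", expression("Carcinoma", site="Liver")))
```

Exact matching of definitions is the simplest possible test. It catches the redundancy in the slide's example but not cases that need inference over the hierarchy.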
Solution 0: You are worrying about the wrong problem
- International Classification of Primary Care (ICPC)
  - Focus on repeatability and quality across languages for a small (<2000) number of codes
Slide No.: 33
Solution Generation 1
Megaterm + Crossmapping = UMLS
[Diagram: UMLS as a "megaterm" of cross mapped and typed terminologies & vocabularies (MeSH, ACRNEMA, SNOMED axes, ICPC, READ, OPCS, ICD-9, ICD-10, OpenGALEN) serving clinical applications: coding & classification, decision support, medical records, data entry]
Slide No.: 35
The UMLS Knowledge Sources
- Metathesaurus
  - Cross mappings
- Language resources
  - NORM – stemming and term recognition
- UMLS Semantic Net
  - 170 types attached to categorise concepts
    - Disease, anatomical part, micro-organism, etc.
Slide No.: 36
Solution 1 Cross-mapping & UMLS
- Unified Medical Language System (UMLS) from US National Library of Medicine
  - De facto common registry for vocabularies
    - Concept Unique Identifiers (CUIs) and Lexical Unique Identifiers (LUIs) are de facto the common nomenclature
      - NB must use a CUI + LUI to get unique identification
  - Licence terms
    - Class I – free for use
    - Class III – heavily restricted
    - (Class II – almost nonexistent)
Slide No.: 38
Solution 1 Cross-mapping & UMLS
- An invaluable resource, but...
  - No better than the vocabularies which are mapped
    - Limited detail for patient care
    - Unreliable for indexing or abstraction of knowledge
    - Best for relating everything to MeSH for indexing literature
  - Still limited by combinatorial explosion
    - Still can’t cope with fractal knowledge
      - Not extensible - no help in building or extending terminologies
      - No help in reorganising existing terminologies to re-use for new purposes
      - Top down
  - Information still implicit
    - Minimal help with software
  - No help with data capture, user interfaces
Slide No.: 39
Solution IIa: Build what you need as you need it
- LOINC – dominant coding system for laboratory systems (“Logical Observation Identifiers Names and Codes”) http://www.loinc.org/
- Clinical LOINC contains increasing amounts of clinical references
  - Fully Class I included in UMLS
  - Closely linked to HL7 and HL7 vocabulary committee
Slide No.: 40
Build and Control what you need only
- HL7 Messaging standard
  - Controls the codes that hold messages together
  - Uses codes from elsewhere as ‘payload’
  - See www.hl7.org
    - (Possibly the world’s worst web site)
    - Some material members only
Slide No.: 42
Solutions Generations 2-3: Compositional Systems
- Beat the combinatorial explosion
  - Build concepts out of pieces - Lego
    - Dictionary and grammar rather than phrasebook
      - But hard
Slide No.: 43
Solution Generation 1.5: Faceted
- Faceted systems: SNOMED International
  - Inflammation + Lung + Infection + Pneumococcus → Pneumococcal pneumonia
- Limit combinatorial explosion, but…
  - Rigid - a limited number of axes / facets / chapters
    - Each facet has the problems of a first generation enumerated system
  - Much knowledge still implicit
  - No way to know how identifiers relate
    - No explicit relations, only ‘+’
    - No way to recognise redundancy / equivalence
    - No help with data capture or user interface / No way to recognise nonsense
      - Carcinoma + Hair + Donkey + Emotional → ????
  - Still can’t cope with fractal knowledge
    - Limited extensibility: limited help with building, extending or reorganising
      - Still Top Down
Slide No.: 44
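The "no way to recognise nonsense" point follows directly from how a faceted code is checked. This is a toy sketch: the facet names follow the axes listed earlier in the deck, but the assignment of values to facets is purely illustrative.

```python
# Hypothetical facet contents; each facet is itself just an enumerated list.
FACETS = {
    "morphology": {"Inflammation", "Carcinoma"},
    "topography": {"Lung", "Hair"},
    "aetiology": {"Pneumococcus", "Donkey"},
    "function": {"Infection", "Emotional"},
}

def accepts(*values):
    """A '+' combination is accepted if every part exists in some facet.

    Nothing checks how the parts relate to one another.
    """
    known = set().union(*FACETS.values())
    return all(value in known for value in values)

print(accepts("Inflammation", "Lung", "Infection", "Pneumococcus"))
print(accepts("Carcinoma", "Hair", "Donkey", "Emotional"))
```

Both combinations pass, because the only available test is membership. Rejecting the second one requires explicit relations and constraints between the parts, which is what the later generations add.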
Generation 2: Enumerated Compositional
- Read III with qualifiers
  - Inflammation: site: lung, cause: pneumococcus → Pneumococcal Pneumonia
- More semantics but…
  - Limited qualifiers - limited views - limited re-use
  - Limited help with data capture - User interface difficult
  - Much information still implicit - limited software support
    - No way to recognise redundancy / equivalence / errors
    - Organisation still mixed - indexing better but still unreliable
  - Limited separation of language and concepts
  - Still can’t cope with fractal knowledge
    - Limited extensibility; limited help with building and reorganising terminologies
      - Top down
Slide No.: 45
Logic Based Ontologies: The basics
[Diagram with five panels: Primitive skeleton, Descriptions, Definitions, Reasoning, Validating]
- Primitives: Thing, Feature, Structure, pathological, red, Heart, MitralValve, Encrustation
- MitralValve * ALWAYS partOf: Heart
- Encrustation * ALWAYS feature: pathological
- Thing + feature: pathological
- Structure + feature: pathological + involves: Heart
- Encrustation + involves: MitralValve + (feature: pathological)
- red + partOf: Heart
Slide No.: 46
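The pieces of this slide can be wired together in a toy classifier. It is a sketch under stated assumptions, not GALEN's actual algorithm: the names come from the slide, but the rule that involving a part counts as involving the whole, the `DOMAIN` constraints used for validation, and the labels `pathological_thing` and `heart_disorder` are all assumptions made for illustration.

```python
# Primitive skeleton: child -> parent (kind-of).
PARENT = {
    "Structure": "Thing", "Feature": "Thing",
    "Heart": "Structure", "MitralValve": "Structure", "Encrustation": "Structure",
    "pathological": "Feature", "red": "Feature",
}
# Descriptions: facts that ALWAYS hold of a primitive.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}
# Assumption: the kind of thing each relation may be asserted of.
DOMAIN = {"partOf": "Structure", "involves": "Structure", "feature": "Thing"}

def is_a(x, y):
    """True if x is y or a descendant of y in the primitive skeleton."""
    while x is not None:
        if x == y:
            return True
        x = PARENT.get(x)
    return False

def expand(base, criteria):
    """Stated criteria plus everything that ALWAYS holds of the base and its ancestors."""
    facts = set(criteria)
    while base is not None:
        facts |= ALWAYS.get(base, set())
        base = PARENT.get(base)
    return facts

def matches(relation, value, wanted):
    if is_a(value, wanted):
        return True
    # Assumption: involving a part counts as involving the whole.
    return relation == "involves" and any(
        r == "partOf" and is_a(whole, wanted) for r, whole in expand(value, ()))

def subsumes(definition, expr):
    """True if expr is a kind of the defined concept."""
    (d_base, d_criteria), (e_base, e_criteria) = definition, expr
    facts = expand(e_base, e_criteria)
    return is_a(e_base, d_base) and all(
        any(r == dr and matches(dr, v, dv) for r, v in facts)
        for dr, dv in d_criteria)

def valid(expr):
    """Reject expressions that apply a relation to the wrong kind of thing."""
    base, criteria = expr
    return all(is_a(base, DOMAIN[r]) for r, _ in criteria)

pathological_thing = ("Thing", {("feature", "pathological")})
heart_disorder = ("Structure", {("feature", "pathological"), ("involves", "Heart")})
finding = ("Encrustation", {("involves", "MitralValve")})

print(subsumes(pathological_thing, finding), subsumes(heart_disorder, finding))
print(valid(finding), valid(("red", {("partOf", "Heart")})))
```

With these assumptions the encrustation of the mitral valve is classified under both definitions without anyone having listed it there, and `red + partOf: Heart` is rejected because a feature cannot be part of a structure.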
CT Vocabulary
- “Reference Terminology” vs “Interface Terminologies”
  - Reference terminology = enumerated hierarchy of formally defined terms
  - Interface terminology = navigation structure for user interface
    - Explicitly excluded from SNOMED-RT
- “Terming”, “Coding”, and “Grouping”
  - Terming - finding the lexical string
  - Coding - finding the correct unique code (concept)
  - Grouping - putting codes into groupers for epidemiological or other purposes
Slide No.: 47
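The three steps can be read as a small pipeline. The sketch below uses a made-up lexicon, made-up codes and a made-up grouper purely for illustration; none of them come from SNOMED or any other real terminology.

```python
# Hypothetical lexicon and groupers; the codes are invented for this example.
LEXICON = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "angina": "C002",
}
GROUPERS = {"ischaemic heart disease": {"C001", "C002"}}

def terming(text):
    """Find the lexical string: normalise case and spacing, then look it up."""
    candidate = " ".join(text.lower().split())
    return candidate if candidate in LEXICON else None

def coding(lexical_string):
    """Find the unique code (concept) behind a lexical string."""
    return LEXICON[lexical_string]

def grouping(code):
    """Find every grouper that contains the code."""
    return sorted(name for name, members in GROUPERS.items() if code in members)

term = terming("  Heart   Attack ")
code = coding(term)
print(term, code, grouping(code))
```

Keeping the steps separate means two different strings can resolve to one concept, and one concept can belong to several groupers, without any of the three tables needing to know about the others.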
Generation 2.5: Pre-coordinated Formal Compositions
- SNOMED-CT
  - Formal collaboration between College of American Pathologists (CAP/SNOMED) and NHS
    - Formal logical model for classifying a fixed list of definitions
    - Simple fixed ontology (7 links)
  - Now officially adopted and probably available for both NHS and related academic uses
- GALEN derived terminologies
  - UK Drug Ontology
  - Procedure classifications
Slide No.: 48
Generation III
- Fully compositional post coordinated
  - Not yet in use or fully available
    - GALEN-like
      - Will probably arrive with Semantic Web
Slide No.: 49
Other Key Resources
- Anatomy
  - Digital Anatomist Foundational Model of Anatomy
    - University of Washington (http://sig.biostr.washington.edu/projects/da/)
      - Comprehensive model of STRUCTURAL anatomy
      - Transformed into formal representation in Freiburg
        - Feasibility rather than production
  - Mouse
    - The Edinburgh Mouse Atlas Project (http://genex.hgu.mrc.ac.uk/)
- Bioinformatics
  - GO - The Gene Ontology
  - MGED – Microarray Gene Expression Data
  - OMIM – Online Mendelian Inheritance in Man
- Drugs
  - Proprietary databases – First Databank, Micromed
  - UK Drug Dictionary (UKCPRS)
- National Cancer Institute CaCore Ontologies
Slide No.: 50
Current Status (1)
- UMLS is the central coordinating force
  - Any terminology needs links to CUIs and LUIs
    - Many people using Class I licensed terms only
  - Links to MeSH and PubMed
- ICD9/10-CM used for reporting of diseases for insurance and Medicare in the US
- ICD-10 used for official reporting in UK
- CPT and OPCS used for reporting of procedures in US and UK respectively
- SNOMED-CT purchased by US and mandated in UK
  - As yet few convincing
Slide No.: 51
Current Status (2)
- ICPC widely used in primary care on the continent, especially in the Netherlands
- LOINC used for lab systems; HL7 for messaging
- Variants of SNOMED used for pathology in many places
- Many specialist systems
  - SNOMED-DICOM-Microglossary (SDM) for imaging
    - Unrelated to SNOMED
  - Several nursing systems
- A variety of open source resources appearing
Slide No.: 52
Current Status (3)
- Commercial world dominated by proprietary systems
  - MedCin
  - All based on “Model of Use”
Slide No.: 53
The Semantic Web and OWL
- “Ontologies” – fancy word for terminologies
  - Means many things to many people
  - W3C has produced a standard language for compositional “logic based” ontologies, OWL
  - “OIL” + “DAML” → “DAML+OIL” → “OWL”
    - See oiled.man.ac.uk
    - See www.co-ode.org
    - See http://www.w3.org/2001/sw/WebOnt/
- Rapid proliferation of open source tools and resources
  - No longer a biomedical problem only
  - Serious computer scientists finally involved
Slide No.: 54