i2b2
Clinical Research Chart
and Hive Architecture
Henry Chueh
Shawn Murphy
Isaac Kohane, PI
i2b2 National Center for Biomedical Computing
Summary
• Background
• Intro to the Clinical Research Chart (CRC)
• Hive / Cell Software Architecture
• More details on establishing and using the CRC
Background
• Clinical documentation is…clinical
• Lack of systematic approach for
organizing clinical data for research
• Ownership issues are unique
• Consent issues are a challenge
Driving Biological Projects
• Asthma
• Hypertension
• Huntington’s Disease
• Diabetes
Clinical Research Chart (CRC)
• Organize and transform clinical data to
maximize its utility for research
• Develop an Application and Database
framework to serve this goal
• Establish an architecture that allows data
from different studies done on this platform
to be integrated
Design of Clinical Research Chart
[Figure: design of the Clinical Research Chart. Labels: input sources (clinical trials; HL7 "MSH|^/&|736401….. PID|102|3231285.…."; text files; XML "<Patient1> <image>.…."; database); data pipeline/workflow application; Pheno/Genotype Database (CRC DB); visualization and analysis of database contents. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: SOAP/HTTP interfaces, data flowing, custom interfaces, a program.]
i2b2 Skeletal Data Flow
[Figure: i2b2 skeletal data flow. Labels: Enterprise Systems (Registration, ADT, Labs, Reports, Clinical Notes, etc.); Local Systems (systems not gathered into enterprise data warehouses); Enterprise data source (RPDR); EDC applications; EDC Service; i2b2 ETL workflow; Annotation Service; Annotation UI; Clinical Research Chart (shared data, study-specific data); Analytic workflow.]
Overall Themes
• Framework to allow development of
application services in a maximally
decoupled fashion.
• Linux and Windows OS support
• Java and C++ programming languages
• Use cases for construction of the CRC come
from the Driving Biological Projects and
experience with clients of the Partners
Research Patient Data Registry
Focus on Workflow
• Necessary for both pre-CRC and post-CRC processes
• Needed for scientific flexibility
• Implies a consistent environment for data
pipelining and flow control
i2b2 Hive
• Formed as a collection of interoperable
Cells, or services
• Loosely coupled
• Makes no assumptions about proximity
• Connected by Web services
• Activated by a workflow engine that forms
basis of choreography among Cells for
complex interactions
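The Hive idea above can be illustrated with a minimal sketch. It assumes each Cell can be modeled as a function from one message to another and that the workflow engine simply calls Cells in a planned order. The Cell names, message fields, and keyword list are hypothetical, and plain function calls stand in for the Web-service requests a real Hive would use.

```python
from typing import Callable, Dict, List

# A Cell is modeled as a function from one message (a dict) to another.
# In the Hive each call would be a Web-service request; plain function
# calls stand in for that here.
Cell = Callable[[dict], dict]


def extract_concepts(message: dict) -> dict:
    """Hypothetical Cell: tag a note with the keywords it mentions."""
    keywords = ["asthma", "smoker"]
    text = message["text"].lower()
    found = [k for k in keywords if k in text]
    return {**message, "concepts": found}


def count_concepts(message: dict) -> dict:
    """Hypothetical Cell: add a count of the extracted concepts."""
    return {**message, "concept_count": len(message["concepts"])}


def run_workflow(registry: Dict[str, Cell], plan: List[str], message: dict) -> dict:
    """Toy workflow engine: call the named Cells in the order given by the plan."""
    for name in plan:
        message = registry[name](message)
    return message


registry = {"extract": extract_concepts, "count": count_concepts}
result = run_workflow(
    registry,
    ["extract", "count"],
    {"text": "Patient with asthma, former smoker."},
)
print(result["concepts"], result["concept_count"])
```

Because the plan refers to Cells only by name, a Cell can be replaced or moved without touching the others, which is the loose coupling the slide describes.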
Complex choreography
i2b2 Cell
• Behaves as a functional service
• Separates interactions conceptually into
transactions and semantics
• Focuses on facilitating transactions with
simple semantics (e.g., datatype)
• Leaves deep semantics to be defined by
the services provided by a Cell
• Does not restrict language implementation
Target layer for i2b2
Semantic Objects
i2b2 platform
Web Services
TCP/IP
Cell examples
• Concept extraction from clinical narratives
• Simple transformations; e.g., basic text
format conversion
• Complex encoding; e.g., encoding MIAME
in MAGE
• Microarray data normalization
• …
Exposing Cells
• Protocols layered on top of SOAP
• At the WSDL level for integrators; i.e.,
bioinformaticians & software engineers
• At a functional level for investigators
• i2b2 toolkits to allow integrators to expose
controlled functionality to investigators
(Automator)
Automator Approach
[Figure: Automator approach. Labels: informaticians (extend Kepler workflow engine); investigators (i2b2 Automator).]
Bird’s eye view
[Figure: bird’s eye view. Labels: Investigator Portal; Workflow engine; CRC Repository.]
Current Implementation
• Extending Kepler workflow engine for i2b2
• Data model for CRC repository
• Defining protocols necessary for
interaction (in addition to SOAP)
• Created Cell for concept extraction from
narratives
• Early designs for Automator toolkit
i2b2 Architecture Key Points
• Leverage existing workflow standards and
software
• Use Web services as basic form of
interaction
• Assume unlimited choreography, but…
• Provide tools to distill complexity into basic
automation for clinical investigators
SW Licensing and Distribution
• Commit to Open Source software
• Use GNU Lesser General Public License
• Establish local i2b2 repository exposed
through i2b2 website
• Contribute to a more global NCBC
SourceForge-style repository if it emerges
(NIH Forge?)
• Keep i2b2 protocols fully open
Interoperability across NCBC
• Strongly consider Web services as basic
protocol for generic shared interactions
• Consider sharing datasets
• Promote diversity of approach and use of
shared software (don’t impose uniformity)
• Facilitate/promote NCBC Open Source
project teams
Pre-CRC Data
Pipeline/Workflow
Populating the Clinical Research
Chart (CRC)
Pre-CRC Data Pipeline/Workflow
• Use workflow framework to choreograph
application services in specific
sequences
• Used to extract, transform, conform, and
load data and metadata into the CRC
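The four stages named above (extract, transform, conform, load) can be sketched as a chain of small functions. This is an illustration under stated assumptions, not the i2b2 pipeline: the local codes "SMK" and "NSK", the row layout, and the use of Python lists as stand-ins for the source system and the CRC are all hypothetical. The CRC concept codes are borrowed from the smoking use case later in the deck.

```python
from datetime import datetime

# Hypothetical mapping from local source codes to CRC concept codes.
LOCAL_TO_CRC = {"SMK": "CT-A-SMK", "NSK": "CT-A-NSK"}


def extract(source):
    """Extract: copy raw rows out of a source system (a list stands in here)."""
    return [dict(row) for row in source]


def transform(rows):
    """Transform: rewrite M/D/YYYY dates as ISO 8601 strings."""
    for row in rows:
        parsed = datetime.strptime(row["date"], "%m/%d/%Y")
        row["date"] = parsed.date().isoformat()
    return rows


def conform(rows):
    """Conform: map local codes to CRC concept codes, dropping unmapped rows."""
    conformed = []
    for row in rows:
        code = LOCAL_TO_CRC.get(row["code"])
        if code is not None:
            conformed.append(
                {"patient": row["patient"], "concept_cd": code, "start_date": row["date"]}
            )
    return conformed


def load(rows, crc):
    """Load: append conformed rows to the CRC (a list stands in here)."""
    crc.extend(rows)
    return len(rows)


source = [
    {"patient": "Z234", "code": "SMK", "date": "1/1/1997"},
    {"patient": "Z234", "code": "???", "date": "1/1/1999"},  # unmapped, so dropped
]
crc = []
loaded = load(conform(transform(extract(source))), crc)
print(loaded, crc)
```

In the architecture described here, each stage would be a separate application service sequenced by the workflow framework rather than a local function call.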
Pre-CRC Data Pipeline/Workflow
[Figure: pre-CRC data pipeline/workflow. Labels: input; output; local or through SOAP service; increasingly useful. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: SOAP/HTTP interfaces, data flowing, custom interfaces, a program.]
Ontology Service
• Manages mappings of terms to common vocabularies
• Provides lists of acceptable (enumerated) values for
various attribute and value slots.
• Allows for management of hierarchies, groupings, and
relationships between terms
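The three responsibilities listed above can be sketched with an in-memory class. This is an illustration only: the class name, method names, the "current smoker" synonym, and the enumerated values are hypothetical, and the actual service would sit behind a Web-service interface. The concept code and hierarchy names are borrowed from the smoking use case later in the deck.

```python
class OntologySketch:
    """Toy in-memory stand-in for the three responsibilities listed above."""

    def __init__(self):
        self._codes = {}     # normalized term -> common vocabulary code
        self._allowed = {}   # slot name -> set of acceptable values
        self._parent = {}    # child node -> parent node

    def map_term(self, term, code):
        """Map a free-text term to a code in the common vocabulary."""
        self._codes[term.strip().lower()] = code

    def code_for(self, term):
        """Return the mapped code, or None if the term is unknown."""
        return self._codes.get(term.strip().lower())

    def allow_values(self, slot, values):
        """Declare the enumerated values that a slot accepts."""
        self._allowed[slot] = set(values)

    def is_allowed(self, slot, value):
        return value in self._allowed.get(slot, set())

    def add_child(self, parent, child):
        """Record a parent/child relationship in the hierarchy."""
        self._parent[child] = parent

    def ancestors(self, node):
        """Walk up the hierarchy from a node to the root."""
        path = []
        while node in self._parent:
            node = self._parent[node]
            path.append(node)
        return path


onto = OntologySketch()
onto.map_term("smoker", "CT-A-SMK")
onto.map_term("current smoker", "CT-A-SMK")      # hypothetical synonym
onto.allow_values("Sex_Cd", ["Female", "Male"])  # hypothetical enumerated list
onto.add_child("AsthV1", "DRptNLP")
onto.add_child("DRptNLP", "Tobacco Use")
onto.add_child("Tobacco Use", "Smoker")
print(onto.code_for("Current Smoker"), onto.ancestors("Smoker"))
```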
Person Consent/Tracking Service
• Provides mappings between patient/subject identifiers
• Tracks patient/subject consent information
• Allows identification of the patient/subject based upon
fuzzy demographic matches
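A minimal sketch of the three functions above, assuming an in-memory store. The class and method names, the subject name, the local identifier, and the matching rule (exact birth date plus name similarity from the standard-library `difflib.SequenceMatcher`) are all assumptions made for illustration; the slides do not say how fuzzy matching is done.

```python
from difflib import SequenceMatcher


class ConsentTrackerSketch:
    """Toy in-memory stand-in for identifier mapping, consent, and matching."""

    def __init__(self):
        self._subjects = {}   # research id -> demographics
        self._local_ids = {}  # (source system, local id) -> research id
        self._consent = {}    # (research id, study) -> bool

    def register(self, research_id, name, birth_date):
        self._subjects[research_id] = {"name": name, "birth_date": birth_date}

    def link(self, system, local_id, research_id):
        """Map a source-system identifier to the research identifier."""
        self._local_ids[(system, local_id)] = research_id

    def resolve(self, system, local_id):
        return self._local_ids.get((system, local_id))

    def set_consent(self, research_id, study, granted):
        self._consent[(research_id, study)] = granted

    def has_consent(self, research_id, study):
        # Absence of a record is treated as no consent.
        return self._consent.get((research_id, study), False)

    def fuzzy_match(self, name, birth_date, threshold=0.8):
        """Return research ids whose birth date matches and whose name is similar."""
        matches = []
        for research_id, demo in self._subjects.items():
            if demo["birth_date"] != birth_date:
                continue
            score = SequenceMatcher(None, name.lower(), demo["name"].lower()).ratio()
            if score >= threshold:
                matches.append(research_id)
        return matches


tracker = ConsentTrackerSketch()
tracker.register("Z234", "Jane Doe", "1924-03-04")  # hypothetical name
tracker.link("clinic-a", "0000004", "Z234")         # hypothetical local id
tracker.set_consent("Z234", "asthma", True)
print(tracker.fuzzy_match("Jane Do", "1924-03-04"))
```

Treating a missing consent record as "no consent" is a conservative default chosen for the sketch, not a rule stated in the slides.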
Application Pool (CVS) Service
• Stores programs/scripts used in pipeline
• Provides applications to be downloaded when needed
• Manages versioning of software
• Provides documentation
Management Service
• Stores workflow execution plan
• Starts and controls workflow execution
• Schedules workflow execution
• Monitors workflow execution and data locations
• Controls permissions associated with workflow execution
Data Pipeline/Workflow Application
Use Case for Asthma Data
[Figure: asthma data pipeline. Labels: RPDR; input; output; AsthmaMart (CRC DB). Programs shown: data retrieval, language processing, data de-identification, vocabulary matching, load data into mart. Services: Ontology, Consent/Tracking, Application Pool, Management. Legend: SOAP/HTTP interfaces, data flowing, custom interfaces, a program.]
Data Pipeline/Workflow
Implementation
• Define standard XML representation for workflow - MoML
• Define standards for SOAP services and resource discovery
• Adopt and extend open source workflow package (Kepler)
• Prototypes by July timeframe
• BIRN -> NAMIC and LONI collaboration
• Can follow construction details at
http://diagon/i2b2
Phenotype/Genotype
Database
Phenotype/Genotype Database
Principles
• Analytical database schema that does not
need to change with new data types and
concepts
• Defined fundamental unit of data (atomic
fact) = observation
• Defined metadata strategy
• Various levels of de-identification
(reviewed and approved by IRB)
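The first two principles can be illustrated with a short sketch: if every piece of data is stored as one observation with a fixed set of fields, a new data type becomes a new concept code rather than a new column. The field subset below is a simplification, and the codes "LAB-X" and "NOTE-X" and their values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One atomic fact: who, what, when, plus an optional text or numeric value."""
    patient_id: str
    concept_cd: str
    start_date: str
    tval_char: Optional[str] = None   # text value, if any
    nval_num: Optional[float] = None  # numeric value, if any


# Three kinds of data share one structure; only the concept code differs.
facts = [
    Observation("Z234", "CT-A-SMK", "1997-01-01"),
    Observation("Z234", "LAB-X", "1997-01-01", nval_num=2.5),       # hypothetical numeric result
    Observation("Z234", "NOTE-X", "1997-01-02", tval_char="text"),  # hypothetical text result
]


def concepts_for(patient_id: str, observations: List[Observation]) -> List[str]:
    """List the distinct concept codes recorded for one patient."""
    return sorted({o.concept_cd for o in observations if o.patient_id == patient_id})


print(concepts_for("Z234", facts))
```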
Phenotype/Genotype Database
Architecture
(see preprint)

observation_fact
  Keys: Encounter_Id_e (PK, FK2), Patient_Id_e (PK, FK1, FK2), Concept_Cd (PK), Provider_Id (PK), Start_Date (PK), ValType_Cd (PK), TVal_Char (PK), NVal_Num (PK)
  Columns: ValueFlag_Cd, Quantity_Num, Units_Cd, End_Date, Confidence_Num, Observation_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem

visit_dimension
  Keys: Encounter_Id_e (PK), Patient_Id_e (PK)
  Columns: InOutpt_Cd, Location_Cd, Start_Date, End_Date, Visit_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd

concept_dimension
  Keys: Concept_Path (PK)
  Columns: Concept_Cd, Name_Char, Concept_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd

patient_dimension
  Keys: Patient_Id_e (PK)
  Columns: Vital_Status_Cd, Birth_Date, Death_Date, Sex_Cd, Age_In_Years_Num, Language_Cd, Race_Cd, Marital_Status_Cd, Religion_Cd, Zip_Cd, StateCityZip_Path, Patient_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem_Cd

provider_dimension
  Keys: Provider_Path (PK)
  Columns: Provider_Id, Name_Char, Provider_Blob, Update_Date, Download_Date, Import_Date, Sourcesystem
Phenotype/Genotype Database
Use Case
• Smoking observations represented in
database

provider_dimension
Provider_id | Provider_path          | Name_char
M0022303    | MGH\Neurology\M0022303 | M0022303

concept_dimension
Concept_cd | Concept_path | Name_char
CT-A-SMK   | AsthV1\DRptNLP\Tobacco Use\Smoker | Smoking
IC9-3051   | V2\Diagnosis\Mental Disorders (290-319)\Nonpsychotic disorders (300-316)\(305) Nondependent abuse of drugs\(305-1) Tobacco use disorder\(30511) Tobacco use disorder, co~ | Tobacco Use Disorder, continuous use
CT-A-NSK   | AsthV1\DRptNLP\Tobacco Use\Non smoker | Never smoked

observation_fact
Patient_id_e | Concept_cd | Start_date | Provider_id | Confidence_num
Z234         | CT-A-SMK   | 1/1/1997   | M0022303    | 3
Z234         | CT-A-SMK   | 1/1/1998   | M0034125    | 9
Z234         | IC9-3051   | 1/1/2001   | M0022303    | 3
Z234         | CT-A-NSK   | 1/1/2002   | M0034125    | 9

patient_dimension
Patient_id_e | Birth_date | Sex_cd | Race_cd | Death_date
Z234         | 3/4/1924   | Female | Black   | 4/5/2003
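The smoking observations in this use case can be loaded into a small in-memory database to show how facts join to concepts. The sketch uses Python's standard `sqlite3` module purely for illustration; the slides do not name a database engine. Only a subset of columns is created, and the dates are rewritten in ISO form (an assumption made so that text ordering matches date ordering).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept_dimension (
        concept_cd TEXT PRIMARY KEY,
        name_char  TEXT
    );
    CREATE TABLE observation_fact (
        patient_id_e   TEXT,
        concept_cd     TEXT,
        start_date     TEXT,
        provider_id    TEXT,
        confidence_num INTEGER
    );
    """
)

# Rows taken from the use case on the slide.
conn.executemany(
    "INSERT INTO concept_dimension VALUES (?, ?)",
    [
        ("CT-A-SMK", "Smoking"),
        ("IC9-3051", "Tobacco Use Disorder, continuous use"),
        ("CT-A-NSK", "Never smoked"),
    ],
)
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("Z234", "CT-A-SMK", "1997-01-01", "M0022303", 3),
        ("Z234", "CT-A-SMK", "1998-01-01", "M0034125", 9),
        ("Z234", "IC9-3051", "2001-01-01", "M0022303", 3),
        ("Z234", "CT-A-NSK", "2002-01-01", "M0034125", 9),
    ],
)

# Timeline of smoking-related facts for one patient, with readable names.
rows = conn.execute(
    """
    SELECT f.start_date, c.name_char, f.confidence_num
    FROM observation_fact AS f
    JOIN concept_dimension AS c ON c.concept_cd = f.concept_cd
    WHERE f.patient_id_e = ?
    ORDER BY f.start_date
    """,
    ("Z234",),
).fetchall()

for row in rows:
    print(row)
```

The timeline shows conflicting observations for the same patient (smoker in 1998, never smoked in 2002), which is the kind of case the Confidence_num column lets an analyst weigh.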
Phenotype/Genotype Database
Implementation
• Asthma CRC DB “primed” with data from 90,000
patients from Research Patient Data Registry
• Serves as fundamental data structure for the
i2b2-supported data Querying and Visualization
Application Suite
• CRC DBs can be fused together seamlessly
• Various levels of de-identification to be
supported for data sharing and publication
Visualization and Analysis
of CRC database
Post-CRC workflow
Visualization and Analysis
Principles
• Supported application suite to query and
view CRC database contents
• Outside applications for analysis and
viewing able to plug in to application suite
• Pipeline/Workflow framework may be used
for analysis and re-entry of derived data
into CRC database
Visualization and Analysis
Architecture
• Supported Applications, Querying and
Visualization
– Standard querying
– Data exploration
Visualization and Analysis
Architecture
• Supported Applications, ontology
management
– Ontology Management
[Screenshot: i2b2 ontology management tool. Menus: File, Edit. Tabs: mapping, transform, explain, provenance. Counts mapping: Total 10,124; 2004: 5,066; 2005: 5,058. Message: You have picked “seizure disorder”.]
• Integrate (outside?) population analysis
applications
Visualization and Analysis
Architecture
• Supported applications have plug-in
architecture for outside analytic tools:
– Standard web-link support with GET and
POST oriented data transfer
– Support transfer of specifically transformed
data to outside applications
– Complex analysis supported with workflow
application
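The GET and POST oriented transfer mentioned above can be sketched with the standard library. The base URL and the parameter names `concept_cd` and `patient_ids` are hypothetical; the slides do not define a parameter scheme for outside tools.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_query(concept_cd, patient_ids):
    """Encode a selection as URL query parameters (hypothetical names)."""
    return urlencode({"concept_cd": concept_cd, "patient_ids": ",".join(patient_ids)})


# Hypothetical address of an outside analysis tool.
base_url = "http://example.org/analysis"
query = build_query("CT-A-SMK", ["Z234"])

# GET: the selection travels in the link itself.
link = f"{base_url}?{query}"

# POST: the same encoding is sent as the request body instead.
post_body = query.encode("ascii")

# The receiving tool can decode the parameters again.
received = parse_qs(urlparse(link).query)
print(link)
print(received)
```

GET links suit small selections that users may bookmark or share; POST avoids URL length limits when a larger transformed data set is sent.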
Visualization and Analysis
Architecture - Query
Visualization and Analysis
Architecture - Exploration
Visualization and Analysis
Architecture – Ontology mgmt
[Screenshot: i2b2 ontology management tool. Menus: File, Edit. Tabs: mapping, transform, explain, provenance. Counts mapping: Total 10,124; 2004: 5,066; 2005: 5,058. Message: You have picked “seizure disorder”.]
Visualization and Analysis
Use Case
Visualization and Analysis
Implementation of analysis tools
• Workflow framework to accommodate
external analytic applications
[Figure: external analytic programs around the CRC DB. Program labels: ProgID CA2.3, CX2.3, AA3.3, CN2.3, SN5.4, PN5.1, TH3.0, XN0.9. Sample identifiers shown: patient id 0000004; subject id 4; account # 347; SNOMED codes SN8745, PA5683.]
Final Assembly
[Figure: final assembly. Component labels: statistics application server; population registry; database; ownership manager; encryption.]

person | concept     | date
Z5937X | Surgery     | 3/4
Z5937X | ER visit    | 3/4
Z5937X | Trauma      | 3/4
Z5937X | Gene-Chips  | 3/4
Z5937X | Seizure     | 4/6
Z5956X | Gene-Chips  | 5/2
Z5956X | Seizure     | 5/2
Z5956X | Alzheimer’s | 5/2
Z5956X | Diabetes    | 5/2
Z5956X | CT Scan     | 3/9
Z5956X | Hemorrhage  | 3/9
Z5956X | Trauma      | 3/9
Z5956X | Thalamus    | 3/9

raw value: microarray (encrypted)

Gene expression in APOE e4 Allele
Outcomes calculated every week: Alzheimer's, Seizures, ER visits, Clinic visits, Trauma, Surgery, Multiple sclerosis