Dracones: Web-Based Mapping and
Spatial Analysis for Public Health
Surveillance
Christian Jauvin
David Buckeridge
McGill University
Summary
 Dracones:
 Built with MapServer/PostGIS
 We'll be covering:
 Public Health context
 Software architecture
 Some specific problems
Public Health - Two Perspectives
 Case management
 Individual cases of notifiable diseases
 Relationship networks
 Population surveillance
 Larger risk patterns
Case Management
 Questions/problems:
 Is a case due to recent transmission?
 If so, does the case share any feature with
other, recent cases?
 Ways it's being done:
 Investigations/interviews
 Meeting with other investigators
Population Surveillance
 Questions/problems:
 Are more cases happening than expected?
 Does an excess suggest ongoing
transmission in a specific region?
 Way it's being done:
 Semi-automated routine temporal and space-time statistical analysis (SaTScan)
Montreal DSP
 Département de santé publique de Montréal
(Public Health Agency)
 Need: incorporate spatial data + analysis
capabilities within workflow
 One reason: research shows that spatial
information helps
 Answer: Dracones project
 Funded in part by GeoConnections
 Led by David Buckeridge, MD, PhD
 15-month contract
Case Management at the DSP
 Current Situation
 Information on paper
entered into system (Oracle
DB + Forms)
 System contains sensitive
data (names, addresses)
 Limited tools for analyzing
case data
 Project Goal
 Capture spatial data
 Visualize and analyze
spatial distribution of cases
Population Surveillance at the DSP
 Current Situation
 Routine temporal and
space-time statistical
analysis
 Capacity to visualize
time-series but not
maps
 Project Goal
 Add mapping capacity
 Extend range of
analytic methods
Why Location Matters - Case
Management
 If you are studying a newly reported case of a
certain disease
 It is harder to picture the situation by
looking at something like this...
 ...than by looking at this
Why Location Matters - Population
Surveillance
 If you are studying the spatial distribution
of a set of disease clusters
 This would seem more difficult...
 ...than this
Development Process
 Management Team
 Led by public health MD with informatics
training
 Members from each area of DSP involved
 User Involvement
 Users on management team
 Input throughout requirements, design,
development
Software Required
and Our Choices
Software Type Required            Our Choice
~GIS                              MapServer
General + Spatial DB              PostgreSQL + PostGIS
Cartography-enabled client        HTML/JavaScript
Analytical / statistical tools    SaTScan, R, Python
Web Architecture Benefits
 Usually lighter/simpler technologies
 Cross-platform
 Ease of deployment and integration
 Builds on existing set of conventions and
behaviours
System Architecture
[Diagram: the Dracones side comprises a web client, Apache with PHP and MapServer + MapScript, Python driving R and SaTScan, and a PostgreSQL/PostGIS DB; the current case management system comprises Oracle Forms and an Oracle DB; a bridge connects the two]
Client Side - UI
 UI is 100% JavaScript (ExtJS library)
 Future project: extract the map-manipulation parts:
 Tile-based panning
 Zooming
 Layer activation
and release them under an open-source license
Client Side - Functions
 From the results of a query performed in
the Oracle client, launch the application to
visualize the results
 Inspect those results by varying certain
parameters
 Launch external analysis tools
Server Side - MapServer
 MapServer: open-source tool that adds geospatial
content to web applications
 Can be used as a CGI
 Interfaces with many programming
languages
 Works very closely with PostGIS
Server Side - MapServer
 MapServer with Apache 2.2, using PHP5
 Linux and Windows
 Since it's stateless, each interaction:
 Build a map object from a base mapfile
 Modify the map object (according to client
parameters)
 Return rendered map as a file to the client
(that will display it)
MapServer - Layers
 A map object is made of layers
 A layer can be loaded from a shapefile
(ESRI open format), which specifies its
geometry
 Or it can be loaded directly from a
PostGIS table
PostGIS
 PostGIS: spatial extension for
PostgreSQL
 Adds geometry types (points, lines,
polygons, etc)
 Spatial functions and operators (distance,
convex hull, intersection, etc)
 Spatial indexes
PostGIS
 Queries that mix spatial and non-spatial
aspects of the data
 If you have a case table:
case_id   condition   region_id
1         TB          10
2         Gastro      20
PostGIS
And a region table:
region_id   name         geom
10          Centre-Sud   POLYGON(…)
20          Hochelaga    POLYGON(…)
PostGIS
You can then build a query like this:
SELECT * FROM "case", region
WHERE "case".condition = 'TB'
AND "case".region_id = region.region_id
AND ST_Within(region.geom,
ST_GeomFromText('POLYGON(…)'));
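The same idea of mixing spatial and non-spatial conditions can be tried without a PostGIS server. The sketch below is only an analogy: it uses Python's built-in sqlite3 module, stores each region as a made-up bounding box string instead of a real polygon, and registers a hypothetical helper, bbox_within, as a crude stand-in for the PostGIS containment test. The coordinates are invented for illustration.

```python
import sqlite3

def bbox_within(inner, outer):
    """Return 1 if box `inner` lies inside box `outer`.

    Boxes are 'minx,miny,maxx,maxy' strings; this is a rough stand-in
    for a real polygon containment test such as the one PostGIS provides.
    """
    ix0, iy0, ix1, iy1 = map(float, inner.split(","))
    ox0, oy0, ox1, oy1 = map(float, outer.split(","))
    return int(ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1)

conn = sqlite3.connect(":memory:")
conn.create_function("bbox_within", 2, bbox_within)
conn.executescript("""
    CREATE TABLE "case" (case_id INTEGER, condition TEXT, region_id INTEGER);
    CREATE TABLE region (region_id INTEGER, name TEXT, bbox TEXT);
    INSERT INTO "case" VALUES (1, 'TB', 10), (2, 'Gastro', 20);
    -- Bounding boxes are invented for this example
    INSERT INTO region VALUES (10, 'Centre-Sud', '0,0,2,2'),
                              (20, 'Hochelaga', '3,0,5,2');
""")

def cases_in_area(condition, area_bbox):
    """Cases of one condition whose region lies inside the given area."""
    rows = conn.execute(
        'SELECT c.case_id, r.name FROM "case" AS c '
        "JOIN region AS r ON c.region_id = r.region_id "
        "WHERE c.condition = ? AND bbox_within(r.bbox, ?)",
        (condition, area_bbox),
    )
    return rows.fetchall()

print(cases_in_area("TB", "-1,-1,3,3"))
```

The non-spatial filter (the condition) and the spatial filter (the containment test) sit side by side in one WHERE clause, which is the point the slide makes about PostGIS.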
PostGIS
 A MapServer layer can be built simply by adding a
connection attribute, pointing to the PostgreSQL
table (two lines really!)
 Shapefile and table sources can be mixed
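As an illustration of how little is needed, here is a small Python helper that prints a mapfile LAYER block for a PostGIS source. The CONNECTIONTYPE, CONNECTION and DATA keywords are standard mapfile syntax; the helper itself (postgis_layer), the layer name and the connection string are hypothetical and not taken from the Dracones code.

```python
def postgis_layer(name, geometry_type, connection, table, geom_column="geom"):
    """Build the text of a mapfile LAYER block that reads from a PostGIS table."""
    lines = [
        "LAYER",
        f'  NAME "{name}"',
        f"  TYPE {geometry_type}",
        "  STATUS ON",
        "  CONNECTIONTYPE POSTGIS",
        # The two PostGIS-specific lines: where the database is, and what to read
        f'  CONNECTION "{connection}"',
        f'  DATA "{geom_column} FROM {table}"',
        "END",
    ]
    return "\n".join(lines)

# Hypothetical connection parameters, for illustration only
print(postgis_layer("regions", "POLYGON",
                    "host=localhost dbname=dracones user=web", "region"))
```

A shapefile-backed layer would simply use a DATA line pointing at the file and no connection lines, so both kinds of layer can coexist in one mapfile.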
Analysis Tools - SaTScan
 Requirement: interfacing with analysis
tools
 SaTScan: detection of space-time clusters
 Scans for areas where the rate of cases is
significantly higher inside the area than
outside it
Analysis Tools
 Since SaTScan is a command-line tool without an
open API, we use Python to run it, parse
the results and plot them using MapServer
 We do the same for some external R
routines
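The run-then-parse pattern can be sketched as follows. This is not the Dracones code: the output format (tab-separated cluster id, region id and p-value) is invented for the example, and a tiny Python one-liner stands in for the external tool so the sketch runs anywhere. In practice the command list would name the real executable and its parameter file.

```python
import subprocess
import sys

def run_tool(command):
    """Run an external command-line tool and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

def parse_clusters(text):
    """Parse hypothetical 'cluster_id<TAB>region_id<TAB>p_value' lines into dicts."""
    clusters = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a data row
        try:
            clusters.append({
                "cluster": int(fields[0]),
                "region": int(fields[1]),
                "p": float(fields[2]),
            })
        except ValueError:
            continue  # skip header rows or malformed values
    return clusters

# Stand-in for the real analysis tool: prints two made-up result rows
fake_tool = [sys.executable, "-c",
             "print('1\\t10\\t0.003'); print('2\\t20\\t0.21')"]

significant = [c for c in parse_clusters(run_tool(fake_tool)) if c["p"] < 0.05]
print(significant)
```

The parsed clusters carry a region id, so they can be joined back to the region geometries and drawn as a map layer.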
System Data Sources
 Health data
 Reportable disease database
 Ancillary data on contacts
 Geographical data
 Street networks and postal code file
 Health regions, census, postal boundaries
Using Address Data from a Public
Health Database
 Problem: addresses are stored as
character fields:
Address: 1500-a Sherbroooke St. Ouest
 No validation at the entry point
 Data quality is compromised
Two Problems with Address
Processing
 The addresses need to be parsed, and the
numerous transcription errors and ambiguities
must be resolved
 Addresses that refer to the same place
must be identified and treated as a single
object
Possible Solutions
 These could be solved in a more SQL-integrated manner: edit distance module
for PG (?)
 We decided however to go the procedural
way (using Python)
Address Validation Algorithm Requirements
 A database with (1) the street network
geometry
 (2) the street segment address ranges
 And (3) the postal code geometry and
street range association
Address Validation Algorithm
So you will know for instance that:
[Diagram: street segments labelled with postal codes H2X2T1 and H2X2T2, with address-range endpoints 1001/998, 2001/1998 and 3001/2998]
Address Validation Algorithm Steps
 Parse the text addresses into 3 tokens:
 {S#, SN, PC} (street number, street name, postal code)
 For each triplet:
 Try to find an exact match, while being tolerant
on SN (maximum coverage, edit distance...)
 Still tolerant on SN, try varying PC
 Still tolerant on SN, fix PC and vary S#
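The steps above can be sketched in Python roughly as follows. Several things here are assumptions rather than the project's actual method: the reference data is invented, difflib's similarity ratio stands in for a true edit distance, the list of filler words is a guess, and "vary S#" is interpreted as snapping to the nearest number in the postal code's range.

```python
import re
from difflib import get_close_matches

# Hypothetical reference data: street -> {postal code: (lowest, highest number)}
STREETS = {
    "SHERBROOKE": {"H2X2T1": (998, 1999), "H2X2T2": (2000, 3001)},
    "ONTARIO": {"H2X1A1": (1000, 1998)},
}
POSTAL_RE = re.compile(r"([A-Za-z]\d[A-Za-z])\s?(\d[A-Za-z]\d)\s*$")
NOISE = {"ST", "ST.", "RUE", "EST", "OUEST", "E", "O", "W"}  # assumed filler words

def parse_address(text):
    """Split free text into (street number, street name, postal code or None)."""
    postal = None
    found = POSTAL_RE.search(text)
    if found:
        postal = (found.group(1) + found.group(2)).upper()
        text = text[:found.start()]
    number = re.match(r"\s*(\d+)\S*", text)  # also swallows suffixes like "-a"
    if not number:
        return None
    words = re.split(r"[\s,]+", text[number.end():].upper())
    name = " ".join(w for w in words if w and w not in NOISE)
    return int(number.group(1)), name, postal

def match_street(name, cutoff=0.8):
    """Tolerant street-name lookup (similarity ratio, not a true edit distance)."""
    hits = get_close_matches(name, list(STREETS), n=1, cutoff=cutoff)
    return hits[0] if hits else None

def validate(text):
    """Return (status, (number, street, postal)) for one raw address."""
    parsed = parse_address(text)
    street = match_street(parsed[1]) if parsed else None
    if street is None:
        return ("unrecoverable", None)
    number, name, postal = parsed
    ranges = STREETS[street]
    # Step 1: all tokens agree, allowing a tolerant match on the street name
    if postal in ranges and ranges[postal][0] <= number <= ranges[postal][1]:
        status = "exact" if name == street else "recovered"
        return (status, (number, street, postal))
    # Step 2: keep the street number, vary the postal code
    for code, (low, high) in ranges.items():
        if low <= number <= high:
            return ("recovered", (number, street, code))
    # Step 3: keep the postal code, vary the street number (nearest in range)
    if postal in ranges:
        low, high = ranges[postal]
        return ("recovered", (min(max(number, low), high), street, postal))
    return ("unrecoverable", None)

print(validate("1500-a Sherbroooke St. Ouest H2X 2T1"))
```

The three statuses mirror the categories used when reporting batch results: exact, recoverable and non-recoverable.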
Address Validation Algorithm Batch Results
 By doing a batch analysis of the DSP data
(105K records), we found that:
 84% of the address records were "exact"
 14.5% were recoverable errors
 1.5% were non-recoverable errors
Last Address Processing Step:
Geocoding
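Geocoding by interpolation:
[Diagram: the address 1500 Sherbrooke placed by interpolation on the street segments for postal codes H2X2T1 and H2X2T2, with address-range endpoints 1001/998, 2001/1998 and 3001/2998]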
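A minimal sketch of interpolation along a single straight segment, assuming the segment's address range and its two end coordinates are known. The function name and the planar coordinates are made up for the example; real street segments are polylines and would need interpolation along their length.

```python
def interpolate_address(number, low, high, start, end):
    """Place a street number on a straight segment by linear interpolation.

    `low` and `high` are the address-range endpoints of the segment,
    `start` and `end` are the (x, y) coordinates of its two ends.
    """
    if high == low:
        fraction = 0.5  # degenerate range: fall back to the midpoint
    else:
        fraction = (number - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp numbers outside the range
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return (x, y)

# Range 1001..2001 as on the slide; the coordinates are invented
print(interpolate_address(1500, 1001, 2001, (0.0, 0.0), (100.0, 50.0)))
```

Number 1500 falls just short of the midpoint of the 1001 to 2001 range, so the point lands just short of the middle of the segment.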
A Last Problem
 DSP management system is read-only (for
us)
 Not spatially enabled
 Must not affect performance
And its Solution
 Create a mirror of the DSP data model,
using PG
 Augmented with spatial aspects (and
better-suited address handling)
 Refreshed periodically
 Reprocessing of the content that has
changed
 Extraction of the new content
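One way a periodic refresh might tell new records from changed ones is to keep a fingerprint of each mirrored record and compare on every pass. This is an assumption made for illustration; the slides do not say how change detection is done, and the function names and record fields below are hypothetical.

```python
import hashlib

def fingerprint(record):
    """Stable hash of a record's fields, used to detect changes between refreshes."""
    payload = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def plan_refresh(source_rows, mirror_fingerprints):
    """Split source rows into new ids (to extract) and changed ids (to reprocess).

    `source_rows` maps case id -> record dict read from the source system;
    `mirror_fingerprints` maps case id -> fingerprint stored at the last refresh.
    """
    new, changed = [], []
    for case_id, record in source_rows.items():
        if case_id not in mirror_fingerprints:
            new.append(case_id)
        elif mirror_fingerprints[case_id] != fingerprint(record):
            changed.append(case_id)
    return new, changed

# Made-up records for illustration
source = {
    1: {"address": "1500 Sherbrooke", "condition": "TB"},
    2: {"address": "20 Ontario", "condition": "Gastro"},
    3: {"address": "300 Ontario", "condition": "TB"},
}
mirror = {
    1: fingerprint(source[1]),
    2: fingerprint({"address": "2 Ontario", "condition": "Gastro"}),
}
print(plan_refresh(source, mirror))
```

Only the changed and new records need their addresses validated and geocoded again, and the source system is only ever read, which fits the read-only and performance constraints.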
A Challenge
 Interface with and extend the existing:
 System
 Environment (including an important
community of users and developers)
Lessons Learned
 Very strong interest in using spatial information at the
DSP but infrastructure, skills and data quality are limiting
 Large effort to validate and correct all addresses
 The science of spatial analysis in public health often
lags the technology
 How to analyze multiple locations for each individual?
 How important is spatial location in an urban area?
 Open-source, web-based mapping software and spatial
databases (MapServer, PostGIS) are robust and easy to
work with for skilled developers
Acknowledgements
 GeoConnections, CIHR
 McGill University
 Aman Verma, Sherry Olsen, Andrew Carter
 Montreal DSP
 Louise Marcotte
 Robert Allard, Lucie Bedard, André Bilodeau
 Montreal Chest Institute
 Kevin Schwartzman, Jonathan Richard
 Alice Zwerling, Marie-Josee Dion