Statistics Canada’s
Real Time Remote Access
Solution
2011 MSIS Meeting –
Karen Doherty
May 2011
Background
▪ Access to, and analysis of, StatCan data is
fundamental to the fulfilment of our mandate.
▪ Access has traditionally been provided through:
• Aggregate data posted on the Agency’s website;
• Public use microdata files (PUMFs); and
• Special requests and customizations of aggregate data.
▪ Currently, 20 Research Data Centres (located in
universities) provide access to confidential
microdata files to researchers across the country.
Statistics Canada • Statistique Canada
2011-05-23
Background
▪ StatCan is facing increasing demands for greater
access to detailed microdata
▪ Advances in IT offer opportunities for producing,
disseminating, mining and analysing data
▪ Researchers are frustrated with the impediments
to data access imposed by StatCan
RTRA – The Business Solution
▪ An on-line remote access facility that
allows researchers to run data analyses on
microdata sets
▪ Data sets are stored in a central and
secure location under the control and care
of StatCan
Data Access Strategy
Type of User                             Service & Cost
Researcher                               RDC, Remote Access, Research Contracts
Data User                                Custom data products, PUMF
General user (student, reporter, etc.)   Pre-packaged tables
Development of a Working System
▪ Phase 1 – completed 2009
• Identification of business requirements focusing on components
such as security, legal, and functionality
▪ Phase 2 – completed 2010
• Pilot version – limited number of researchers and restrictions on
types of requests allowed and level of detail provided
▪ Phase 3 – first production version – 2011
• Functionality will be expanded incrementally in order to evaluate
security measures and mitigate risks
Solution Approach
▪ Examined lessons learned from other NSIs
▪ Determined key requirements of the model
▪ Adopted a model similar to the Australian Bureau of Statistics (ABS) model
▪ Built on the existing e-File Transfer (e-FT) facility to securely
transfer files across the “air gap”
▪ Security issues addressed via four control points:
• Secure dataset housing
• Secure transit of datasets
• Validation of registered users
• Confidentiality rules for output
▪ Right balance of risk versus security
How RTRA Works
• Researcher submits SAS
program
• Request passes through
firewalls to secure server
• Upon vetting, tables are
returned to researcher in
specified format
• If a request does not comply,
the submission is not run and
the log is returned for
adjustment
• All submissions are monitored
and logged, and logs are kept
for auditing purposes
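The request flow above can be sketched roughly as follows. This is only an illustration in Python, not the SAS-based RTRA code: the allow-list of dataset names, the `data=` pattern check, and the names `pre_scan`, `handle_submission` and `AUDIT_LOG` are all hypothetical, since the slides do not describe the actual rules.

```python
import re
from dataclasses import dataclass, field

# Hypothetical allow-list; the real RTRA dataset names are not given in the slides.
ALLOWED_DATASETS = {"survey_a", "survey_b"}

# Every submission is recorded here, whether or not it is accepted.
AUDIT_LOG = []


@dataclass
class Result:
    accepted: bool
    log: list = field(default_factory=list)


def pre_scan(program: str) -> list:
    """Return a list of problems found in a submitted program (empty if none)."""
    problems = []
    # Find simple `data=<name>` references and compare them with the allow-list.
    # Library-qualified names (lib.name) are not handled in this sketch.
    for name in re.findall(r"data\s*=\s*(\w+)", program, flags=re.IGNORECASE):
        if name.lower() not in ALLOWED_DATASETS:
            problems.append(f"dataset not permitted: {name}")
    return problems


def handle_submission(user: str, program: str) -> Result:
    """Accept a compliant program; otherwise return the log for adjustment."""
    problems = pre_scan(program)
    result = Result(accepted=not problems, log=problems)
    AUDIT_LOG.append((user, result.accepted))  # kept for auditing
    return result
```

A non-compliant program is never run in this sketch; the caller only gets back the list of problems, mirroring the "log returned for adjustment" step.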
How RTRA Works
▪ Pre-Scan of requests:
• Limits access to data files
• Ensures that the programming guidelines have been
followed
• Uses an automated SAS process to control output
▪ Post-Scan of outputs:
• Applies a controlled rounding algorithm to output tables
• Limits each submission to 10 tables
• Limits each researcher to 10 successful program
submissions per day
• Supports two formats for output (.sas7bdat and HTML)
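As an illustration of the post-scan step, the sketch below applies a simple one-dimensional controlled rounding to each table and enforces the 10-table limit. It is not the StatCan rounding tool: the rounding base of 5, the function names, and the largest-remainder rule for choosing which cells round up are assumptions made for this example, and counts are assumed to be non-negative integers.

```python
MAX_TABLES = 10  # per-submission limit stated on the slide


def controlled_round(counts, base=5):
    """Round each count to an adjacent multiple of `base` so that the
    rounded cells add up exactly to the rounded total (one-dimensional case)."""
    floors = [base * (c // base) for c in counts]
    remainders = [c - f for c, f in zip(counts, floors)]
    total = sum(counts)
    # Round the total to the nearest multiple of base (ties round up).
    rounded_total = base * ((total + base // 2) // base)
    # Number of cells that must round up for the cells to sum to the rounded total.
    n_up = (rounded_total - sum(floors)) // base
    # Round up the cells with the largest remainders; the rest round down.
    order = sorted(range(len(counts)), key=lambda i: -remainders[i])
    result = list(floors)
    for i in order[:n_up]:
        result[i] += base
    return result


def post_scan(tables, base=5):
    """Reject oversized submissions, then round every table."""
    if len(tables) > MAX_TABLES:
        raise ValueError(f"submission has {len(tables)} tables; limit is {MAX_TABLES}")
    return [controlled_round(t, base) for t in tables]
```

For example, `controlled_round([3, 4, 8])` returns `[5, 5, 5]`: every cell moves to a neighbouring multiple of 5 and the cells still sum to the total of 15. Controlled rounding of multi-way tables with several margins is considerably harder and is not attempted here.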
Methodological Challenge
▪ There is no absolute criterion for defining confidential data;
for disclosure control, StatCan applies
risk management practices to safeguard the
confidentiality of microdata
▪ Developed specific rules:
• Slightly masked microdata files
• Automatic disclosure rules for tabular outputs
• Pre-scan for inputs
• Post-scan for the outputs
▪ The strategy involves trade-offs among these four
methods; any decision involves managing risk
and considering levels of security
Architecture of RTRA – Design
Technologies
• File Transfer – e-FT Services (COTS)
• Workflow Components – SAS
• User Authentication – SAS and StatCan
Customer Relations Management
System (CRMS)
• Archive – Folder
• Data Views – SAS
• Automated Workflow – SAS Sniffer
• Post-Scan – StatCan rounding tool
RNDII.exe
User Interface
User creates a request
User Interface
User logs onto RTRA from StatCan website
User Interface
User submits the request
Resulting data are delivered to an external FTP server via the StatCan e-FT system
Future Direction
▪ Adjust the service based on client feedback on
requirements, and reach a wider audience of
academics and the private sector
▪ Bring the solution in sync with the new WAN
infrastructure used by Research Data Centres
▪ Increase availability of additional cross-sectional
surveys to researchers
▪ Develop vetting procedures for longitudinal surveys
and administrative data
2011 Work Plan
▪ Quality indicators for frequency indicators – June
2011
▪ Means, medians, percentiles, ratios and proportions
– August 2011
▪ Investigate support for other programming languages
such as SPSS – ongoing
▪ Add Census information – November 2011
▪ Work with the Generalized Tabulation System (G-Tab)
development team to see if G-Tab can automate
confidentiality by type of output – beginning in 2011
Conclusion
▪ Starting to gain traction among Government of
Canada researchers.
▪ As the system evolves, Statistics Canada
believes this tool will become a key component
of the toolset available to researchers such as:
• policy researchers in government departments and
agencies (federal, provincial, or municipal)
• academic researchers in Canadian universities
• any other researcher who agrees to the RTRA terms
and conditions of use