Databases: Population and Maintenance
Geog 176B Lecture 8
Chapters 9 and 10, Longley et al.

Data Collection
• One of the most expensive GIS activities
• Many diverse sources (source integration, data fusion, interoperability)
• Two broad types of collection:
  • Data capture (direct collection)
  • Data transfer
• Two broad capture methods:
  • Primary (direct measurement)
  • Secondary (indirect derivation)

Stages in Data Collection Projects
• Planning
• Preparation
• Digitizing / Transfer
• Editing / Improvement
• Evaluation

Data Collection Techniques
             Raster                           Vector
Primary      Digital remote sensing images    GPS measurements
             Digital aerial photographs       Survey measurements
Secondary    Scanned maps                     Topographic surveys
             DEMs from maps                   Toponymy data sets from atlases

Primary Data Capture
• Capture specifically for GIS use
• Raster – remote sensing, e.g., SPOT and IKONOS satellites and aerial photography
  • Passive and active sensors
  • Resolution is a key consideration:
    • Spatial
    • Spectral
    • Temporal
www.spot.ucsb.edu

[Figure: Imagery for GIS]

Vector Primary Data Capture
• Surveying
  • Locations of objects are determined by angle and distance measurements from known locations
  • Uses expensive field equipment and crews
  • Most accurate method for large-scale data over small areas
• GPS
  • Collection of satellites used to fix locations on the Earth's surface
  • Differential GPS is used to improve accuracy
[Figures: Total Station; Pen/Portable PC and GPS]

Secondary Geographic Data Capture
• Data collected for other purposes can be converted for use in GIS
• Raster conversion
  • Scanning of maps, aerial photographs, documents, etc.
  • Important scanning parameters are spatial resolution and spectral resolution (bit depth)
  [Figure: Scanner]
• Raster-to-vector conversion

Vector Secondary Data Capture
• Collection of vector objects from maps, photographs, plans, etc.
• Digitizing
  • Manual (table)
  • Heads-up and vectorization
• Photogrammetry – the science and technology of making measurements from photographs, etc.
[Figure: Digitizer]

Data Transfer
• Buy vs. build is an important question
• Many widely distributed sources of GI
• Includes geocoding
• Key catalogs include:
  • Geodata.gov
  • Geography Network
• Access technologies:
  • Translation
  • Direct read

Managing Data Capture Projects
• Key principles: a clear plan, adequate resources, appropriate funding, and sufficient time
• Fundamental tradeoff among quality, accuracy, speed, and price
• Two strategies:
  • Incremental
  • 'Blitzkrieg'
• Alternative resource options:
  • In house
  • Specialist external agency

Positional Accuracy
A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the map's scale denominator gives the corresponding distance on the ground (e.g., 0.5 mm × 24,000 = 12 m).

Map scale       Ground distance corresponding to 0.5 mm map distance
1:1250          62.5 cm
1:2500          1.25 m
1:5000          2.5 m
1:10,000        5 m
1:24,000        12 m
1:50,000        25 m
1:100,000       50 m
1:250,000       125 m
1:1,000,000     500 m
1:10,000,000    5 km

Positional Accuracy (cont.)
Within a database, a typical UTM coordinate pair might be:
  Easting    579124.349 m
  Northing   5194732.247 m
If the database was digitized from a 1:24,000 map sheet, the last four digits in each coordinate (units, tenths, hundredths, thousandths) would be questionable, since positions taken from a map at that scale are only accurate to about 12 m.

Testing Positional Accuracy
• Use an independent source of higher accuracy:
  • find a larger-scale map
  • use precision GPS
• Use internal evidence:
  • digitized polygons that are unclosed, lines that overshoot or undershoot nodes, etc. are indications of error
  • the sizes of gaps, overshoots, etc. may be a measure of positional accuracy

Testing Accuracy (cont.)
• Compute accuracy from knowledge of the errors introduced by different sources, e.g.:
  • 1 mm in the source document
  • 0.5 mm in map registration for digitizing
  • 0.2 mm in digitizing
• If the sources combine independently, the overall error can be estimated as the square root of the sum of the squared component errors:
$$(1^2 + 0.5^2 + 0.2^2)^{0.5} = \sqrt{1.29} \approx 1.14\ \text{mm}$$

Definitions
• Database – an integrated set of data (attributes) on a particular subject
• Geographic (= spatial) database – a database containing geographic data on a particular subject for a particular area
• Database Management System (DBMS) – software to create, maintain, and access databases

A GIS links attribute and spatial data
Attribute Data
  • Flat File
  • Relations
Map Data
  • Point File
  • Line File
  • Area File
  • Topology
  • Theme

Advantages of Databases over Files
• Avoid redundancy and duplication
• Reduce data maintenance costs
• Faster for large datasets
• Applications are separated from the data
• Applications persist over time
• Support multiple concurrent applications
• Better data sharing
• Security and standards can be defined and enforced

Disadvantages of Databases over Files
• Expense
• Complexity
• Performance – especially for complex data types
• Integration with other systems can be difficult

Types of DBMS Model
• Hierarchical
• Network
• Relational – RDBMS
• Object-oriented – OODBMS
• Object-relational – ORDBMS
Relational databases dominate today

Characteristics of DBMS (1)
• Data model support for multiple data types, e.g., MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object (Microsoft Object Linking and Embedding), Hyperlink, Lookup Wizard
• Load data from files, databases, and other applications
• Indexes for rapid retrieval

Characteristics of DBMS (2)
• Query language – SQL
• Security – controlled access to data
  • Multi-level groups (e.g., census, NGA)
• Controlled update using a transaction manager
• Versioning
• Backup and recovery

Characteristics of DBMS (3)
• Applications
  • Forms builder
  • Report writer
  • Internet application server
  • CASE tools
  • Programmable API (application programming interface)

Role of DBMS
System                          Task
Geographic Information System   Data load, Editing, Visualization, Mapping, Analysis
Database Management System      Storage, Indexing, Security, Query

Relational DBMS (1)
• Data stored as tuples (pronounced "tup-el"), conceptualized as tables
• Table – data about a class of objects
  • Two-dimensional list (array)
  • Rows = objects
  • Columns = object states (properties, attributes)
[Figure: a table in which each row is an object (a vector feature) and each column is an attribute]

Relational DBMS (2)
• Most popular type of DBMS
  • Over 95% of data held in DBMSs is in RDBMSs
• Commercial systems:
  • IBM DB2
  • Informix
  • Microsoft Access
  • Microsoft SQL Server
  • Oracle
  • Sybase

SQL
• Structured (Standard) Query Language, pronounced "sequel"
• Developed by IBM in the 1970s
• Now the de facto and de jure standard for accessing relational databases
• Three types of usage:
  • Stand-alone queries
  • High-level programming
  • Embedded in other applications

Types of SQL Statements
• Data Definition Language (DDL)
  • Create, alter, and delete database structures such as tables and indexes
  • CREATE TABLE, CREATE INDEX
• Data Manipulation Language (DML)
  • Retrieve and manipulate data
  • SELECT, UPDATE, DELETE, INSERT
• Data Control Language (DCL)
  • Control security of data
  • GRANT, CREATE USER, DROP USER

Relational Join
• Fundamental query operation
• Needed because data are created and maintained by different users, but queries require them to be integrated
• Table joins use common keys (column values)
• The table (attribute) join concept has been extended to the geographic case

Join
Table A
Record ID   Address           #cars
1241        123 State St.     3
1242        1801 Main St.     1
1243        2106 Elm St.      2
1244        7262 Pine Drive   1

Table B (rows for record 1241 shown)
Record ID   Vehicle   Year
1241        Ford      2003
1241        Subaru    2000
1241        Honda     1999

Result of joining A and B on Record ID
Record ID   Address          Vehicle
1241        123 State St.    Ford
1241        123 State St.    Subaru
1241        123 State St.    Honda
1242        1801 Main St.    Kia

Spatial Indexing
• Many maps are tiled
• B-tree (balanced tree)
• Grid indexing
• Quadtree: points/regions
• R-tree (based on minimum bounding rectangles, MBRs)
• New global/spatial grids:
  • QTM (Quaternary Triangular Mesh)
  • Go2 Grids
    38:53:22.08N 077:02:06.86W
    US.DC.WAS.188.8.131.52.11
    US.CA.SBA.UCSB.UCEN

Spatial Search: Gateway to Spatial Analysis
• Overlay is a spatial retrieval operation that is the spatial counterpart of an attribute join: features are matched by location rather than by a shared key value
• Buffering is a spatial retrieval of features around points, lines, or areas based on distance
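Grid indexing and distance-based buffering are listed above without showing how they work together. The following is a minimal sketch in Python, assuming point features in a planar coordinate system (e.g., UTM metres) and an arbitrarily chosen cell size. The uniform grid narrows a buffer query to the cells that overlap the buffer's bounding box, and an exact distance test then keeps only the points inside the circle. The class `GridIndex`, its methods, and the sample feature ids are hypothetical illustrations, not the API of any GIS or DBMS.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform-grid index for point features in planar (x, y) coordinates.

    Illustration only: not the API of any particular GIS or DBMS.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # Maps (column, row) cell keys to lists of (feature_id, x, y) tuples.
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Integer column/row of the grid cell containing (x, y).
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, feature_id, x, y):
        self.cells[self._cell(x, y)].append((feature_id, x, y))

    def buffer_query(self, qx, qy, radius):
        """Return the sorted ids of points within `radius` of (qx, qy)."""
        col_min, row_min = self._cell(qx - radius, qy - radius)
        col_max, row_max = self._cell(qx + radius, qy + radius)
        hits = []
        # Step 1 (index filter): visit only cells overlapping the buffer's bounding box.
        for col in range(col_min, col_max + 1):
            for row in range(row_min, row_max + 1):
                # .get() avoids creating empty entries in the defaultdict.
                for feature_id, x, y in self.cells.get((col, row), []):
                    # Step 2 (exact test): keep only points inside the circular buffer.
                    if math.hypot(x - qx, y - qy) <= radius:
                        hits.append(feature_id)
        return sorted(hits)


# Usage: three hypothetical point features with coordinates in metres.
index = GridIndex(cell_size=100.0)
for fid, x, y in [("well_1", 10.0, 10.0), ("well_2", 150.0, 40.0), ("well_3", 480.0, 300.0)]:
    index.insert(fid, x, y)

print(index.buffer_query(50.0, 50.0, 120.0))  # → ['well_1', 'well_2']
```

Cell size is the main tuning choice in this sketch: cells much smaller than typical buffer radii mean many cells are visited per query, while very large cells leave most of the work to the exact distance test. Quadtrees and R-trees, also listed above, adapt their partitioning to the data instead of using one fixed cell size.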