Unidata 2008: Shaping the Future of Data Use in the Geosciences

Expanding Horizons: Using Environmental Data for Education, Research, and Decision Making
23 June 2003, Boulder, CO

Mohan Ramamurthy
Unidata Program Center
UCAR Office of Programs
Boulder, CO

Thank you, Dave
I wish to take this opportunity to extend my sincerest gratitude to Dave Fulker, the founding director of Unidata, for his distinguished service to the Unidata community for nearly 20 years. Unidata would not be what it is today without his vision, leadership, energy, and many extraordinary qualities.
And thank you to Ben Domenico for his excellent stewardship during the transition.

The Word of the Day for Jun 23 is:
bloviate \BLOH-vee-ayt\ verb: to speak or write verbosely and windily
[Courtesy: Jo Hansen, Unidata Program Center]
Example sentence: Mohan can bloviate on a par with the windiest of professors, but he's also capable of being concise and getting right to the point. (yeah, right)

Expanding Horizons
New Strategic Plan
New Director
New 5-year proposal
Many new and exciting initiatives
New logo!

Unidata Mission Statement
Provide data, tools, and community leadership for enhanced Earth-system education and research.
At the Unidata Program Center, we
• Facilitate Data Access
• Provide Tools
• Support Faculty and Staff
• Build and Advocate for a Community
  (user workshops come under this activity)

Technology Portfolio
1) McIDAS: A client/server analysis and display package, originally developed by U. Wisconsin/SSEC, that emphasizes image processing of data from satellite-borne sensors
2) GEMPAK: An analysis, display, and product generation package for meteorological data
3) Local Data Manager (LDM): Software for capturing, disseminating, and organizing data in near-real time; it is the heart of the Internet Data Distribution (IDD) system
4) NetCDF: A software interface for platform-independent access to self-describing datasets
5) Integrated Data Viewer (IDV): Java-based, platform-independent data analysis and 3D visualization tools
6) THREDDS: A project to facilitate remote access to thematic, distributed, interdisciplinary data servers

Unidata as a Diverse Community
More than 150 sites participate in the Unidata Internet Data Distribution (IDD) system
• About 120 of those sites are in academia; the rest are in government and research labs
The user community is interdisciplinary: two-thirds of sites have users outside the atmospheric sciences

Internet Data Distribution
Approximately 2 GB of data are injected per hour from distributed sources.
Unidata IDD/LDM uses more of Internet2 than any other advanced application.
Approximately 5 terabytes of data are transmitted each week (the amount varies with the weather).
By design, the system has no data center.
[Diagram: model, satellite, and radar sources feed a network of LDM nodes connected over the Internet]

Proposed WSR-88D Data Flow (NWS Plans)

Education Drivers (a.k.a. A Community-Articulated Need)
Active, student-centered learning
Earth-system science or “holistic” approach to education
Learning science by doing science
• Observations (data)
• Tools (models, visualization)
• Discovery

Science Drivers
Grand Challenges in Environmental Sciences (National Research Council)
NSF Director Rita Colwell, 1998: "Interdisciplinary connections are absolutely fundamental. They are synapses in this new capability to look over and beyond the horizon. Interfaces of the sciences are where the excitement will be the most intense..."
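As a rough sanity check on the Internet Data Distribution figures quoted above (about 2 GB injected per hour, about 5 terabytes transmitted per week), the short Python sketch below works out the implied fan-out, that is, how many times on average each injected byte is transmitted. The function names and the decimal convention of 1 TB = 1000 GB are assumptions made for illustration; they do not come from the talk, and the result is only as precise as the rounded slide numbers.

```python
"""Back-of-the-envelope check of the IDD volume figures quoted in the talk."""

HOURS_PER_WEEK = 7 * 24  # 168 hours


def weekly_injected_gb(gb_per_hour: float) -> float:
    """Data injected into the IDD in one week, in GB."""
    return gb_per_hour * HOURS_PER_WEEK


def fan_out(transmitted_tb_per_week: float,
            injected_gb_per_hour: float,
            gb_per_tb: float = 1000.0) -> float:
    """Average number of times each injected byte is transmitted.

    gb_per_tb=1000 is an assumed (decimal) convention, not from the slides.
    """
    injected_tb = weekly_injected_gb(injected_gb_per_hour) / gb_per_tb
    return transmitted_tb_per_week / injected_tb


if __name__ == "__main__":
    # Slide values: ~2 GB/hour injected, ~5 TB/week transmitted.
    print(f"Injected per week: {weekly_injected_gb(2.0):.0f} GB")
    print(f"Implied fan-out:   {fan_out(5.0, 2.0):.1f}x")
```

With the slide values this gives 336 GB injected per week and a fan-out of roughly 15, meaning each product is on average delivered about 15 times across the relay network.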

Multidisciplinary Problems
Fire danger determination requires taking into account past, present, and future weather, fuel types, and the state of both live and dead fuel moisture.
• Dead fuel moisture
• Live fuel moisture (NDVI)
• Drought conditions
• Atmospheric stability
• Lightning maps
• Lightning ignition efficiency
• Airflow
• Recent rainfall
• Rainfall forecast

Dual-Polarization Radar Use in Fire Weather Management
The differential reflectivity (ZDR) values are noteworthy in the smoke signal.
Many regions show ZDR > +6 dB, which suggests flattened ash particles (like corn flakes).
Source: CHILL Radar Group, CSU

Flooding due to Tropical Storms
Tropical Storm Allison
Research studies and emergency management of hurricane-induced flooding involve integrating data from atmospheric sciences, oceanography, hydrology, geology, geography, and social sciences.

Multidisciplinary Synthesis
Requires integration of disparate datasets and databases from diverse sources that are distributed geographically and across disciplines
Needs integration of Scientific Information Systems with Geographic Information Systems
The integration poses numerous challenges; however, such integration is critical to solving societal problems and advancing science.
Metadata is crucial to achieving integration.

Remote Sensing & Data Explosion
In the next 10 years, about 100 new satellite instruments will be launched to monitor the environment.
A five-order-of-magnitude increase in satellite data is expected during that period.
• GIFTS (Geostationary Imaging Fourier Transform Spectrometer) will have about 1700 channels and a resolution of 4 km
• Each NPOESS satellite will generate one terabyte of data each day
Advances in radar technology
• 28-fold increase in WSR-88D data volume in 5 years
• Phased-array radars will generate a 100-fold increase in data
By 2004, NOAA will ingest more data in one year than was contained in the total archive in 1998.
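To put the growth figures above on a common footing, the following Python sketch converts a total increase over a period into an equivalent per-year growth factor and a doubling time. It assumes steady compound growth, which is a simplification introduced here and not a claim made in the talk; the function names are likewise hypothetical.

```python
"""Convert 'N-fold in Y years' figures into annual rates (illustrative only)."""
import math


def annual_growth_factor(total_factor: float, years: float) -> float:
    """Per-year multiplier that compounds to total_factor over the period.

    Assumes steady compound growth (an assumption, not from the slides).
    """
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor: float, years: float) -> float:
    """Time for the volume to double under the same steady-growth assumption."""
    if total_factor <= 1 or years <= 0:
        raise ValueError("need total_factor > 1 and years > 0")
    return years * math.log(2.0) / math.log(total_factor)


if __name__ == "__main__":
    # WSR-88D: 28-fold increase in 5 years (slide value).
    print(f"WSR-88D:   x{annual_growth_factor(28, 5):.2f} per year, "
          f"doubling every {doubling_time_years(28, 5):.2f} years")
    # Satellite data: five orders of magnitude in 10 years (slide value).
    print(f"Satellite: x{annual_growth_factor(1e5, 10):.2f} per year, "
          f"doubling every {doubling_time_years(1e5, 10):.2f} years")
    # NPOESS: one terabyte per day per satellite (slide value).
    print(f"NPOESS:    {1 * 365} TB per satellite per year")
```

Under this assumption, the WSR-88D figure corresponds to data volume roughly doubling every year, and the satellite figure to roughly a threefold increase every year.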

Advances in Modeling
Shift from a purely deterministic to a more probabilistic approach, requiring the use of ensemble modeling techniques.
Growing emphasis on multidisciplinary studies, requiring coupled models:
• e.g., the hurricane landfall flooding problem: atmospheric model (WRF/MM5), ocean model (ROMS), hydrologic model (HMS)

Local Modeling: A Notable Trend
Over 30 universities are now running mesoscale models locally.
One can think of this aggregation as a national forecasting instrument.
However, only one or two groups are initializing their model runs with local observations.
As the scale of these local model runs becomes finer, there is a natural desire to integrate their output with information from other sources (e.g., hydrology, infrastructure, societal datasets in GIS form).
[Figure: Iowa St. Linux Cluster]

Technology Drivers
Object-oriented programming
Open standards, interoperability, and the open source movement
• Metcalfe's Law: the usefulness, or utility, of a network increases as the square of the number of users.
Web services (HTTP, Java, XML, SOAP, UDDI, …)
Digital libraries (metadata, discovery, information services, …)
Grid environments and distributed computing
Commodity microprocessors
Cluster computing
High-bandwidth networks: 10GigE, Fast IP, …
Broadband access
Wireless networks: 802.11 networks, GPRS, 3G
IPv6: Next-generation Internet protocol
Collaborative computing
Scientific data mining and knowledge discovery

Web Services and the Wild and Wooly World of Markup Languages
Web services is a technology and process for discovery and connection.
[Diagram labels: Users; Services; Metadata repository; Collections (data, tools, educational materials)]
The Extensible Markup Language (XML) is accepted as THE emerging standard for data interchange on the Web.
XML allows authors to create their own markup, which has led to the proliferation of “MyOwn Markup Language”.

Five-year Core Funding
NSF Proposal Title: Unidata 2008: Shaping the Future of Data Use in the Geosciences
We are moving from an era of data provision to one in which data services and related web services are emphasized.
Six endeavors are proposed, focusing on Community and Support Services and on Data Services, Systems, and Tools.
The proposed endeavors will enable the community to advance scientific exploration, education, and decision making.
“The unanimous finding of the panel is that the Unidata Program Center program be supported as fully as possible by NSF for the years 2003-2008.”

Proposed Endeavors
Endeavor 1: Responding to a broader and more diverse community
• Respond to increased emphasis on Earth-system science (e.g., bring new data sets to the community)
• Establish new partnerships with related communities (e.g., with hydrology via CUAHSI)
• Support new tools in technically less-sophisticated institutions (e.g., community colleges)

Endeavor 2: Comprehensive support services
• Deploy web-based training modules
• Simplify installation and maintenance for all supported packages
• Explore new technologies (e.g., Access Grid) to facilitate remote collaboration

Endeavor 3: Real-time, self-managing data flows
More flexibility and control
• Many more feed types for finer control over routing and subsetting
• Configurable product priorities
Self-managing data flows (automatic dynamic routing)
• Application-level multicast looks promising for hundreds of sites (IP multicast is not suitable due to its limitations)
• NLDM: data flooding via Usenet protocols may provide a practical routing solution (needs more testing)
Support for new standards
• Use of IP version 6 protocols
• Internet2, Grid, and e-services standards (authentication, resource use, ...)
• Location transparency for data

LDM-5 vs. LDM-6 Latencies
CONDUIT experience
Average delivery time: ~20 seconds to top-tier sites

Endeavor 4: Software to analyze and visualize geoscience data
Integrate diverse datasets
Support analysis and visualization of local and climate modeling efforts
Develop collaborative tools to make effective use of shared visualizations
Allow customized user experiences
Adapt to GIS frameworks
[Figure: Cloud water isosurface from COMMAS storm model data (courtesy Adam Houston and Dan Bramer, NCSA/UIUC)]

[Diagram labels: People; Documents; Data; Discovery and Publication Tools; Analysis and Visualization Tools; Discovery and Publication Services; THREDDS Middleware; Data Services]

THREDDS, GIS, DL Interoperability
[Diagram labels]
• THREDDS client applications; GIS client applications
• Protocols: OGC or proprietary GIS protocols; OGC or OPeNDAP, ADDE, FTP, … protocols; OpenGIS protocols: WMS, WFS, WCS
• GIS servers: demographic, infrastructure, societal impacts, … datasets
• THREDDS servers: satellite, radar, forecast model output, … datasets
• Metadata crosswalk; Open Archives Initiative (OAI) metadata harvesting; digital library discovery systems

Endeavor 6: Improved data access infrastructure
NetCDF-HDF integration
Extend netCDF to high-performance computing environments
Implement parallel I/O, large grids, etc.
Work will directly benefit the WRF and CCSM communities.
[Diagram: current vs. proposed implementation]
• Current implementation: Application; netCDF; POSIX I/O; File
• Proposed implementation: Application; netCDF; HDF5 (serial and/or parallel); then one of the following I/O paths:
  • POSIX I/O: File
  • Split files: File (metadata), File (raw data)
  • MPI I/O: Parallel file system
  • Custom: User-defined device
  • Stream: Network, or to/from another application

The Visual Geophysical Exploration Environment (VGEE)
The VGEE is an integrated framework in which students use visualization tools, data, and curricular materials to learn basic physical principles of atmospheric science.
It includes:
• A learner interface to the IDV
• Java-based concept models to support physical insight
• A curriculum to guide inquiry
• A catalog of data (THREDDS)

VGEE: An Integrated Framework
Concept models are used to explore relations in an idealized context.
Students notice that the Western Pacific is considerably warmer than the East.
Identify, Relate, Explain, Integrate

Concluding Remarks
We live in an exciting moment in the history of the Earth sciences.
Workshops like this and the diversity of representation from academia are testimony to the vibrancy of the community and the program.
The portfolio of tools and technologies within Unidata, coupled with the energies of a creative and collaborative community, puts us in an ideal position to meet the important challenges facing the education and research communities in the atmospheric and related sciences.