NSF/DHS FODAVA-LEAD:
Missions and Plans
Haesun Park
Computational Science and Engineering Division
Georgia Institute of Technology
FODAVA Kick-off Meeting, September 2008
Data and Visual Analytics (DAVA)
Analytical
Reasoning
Visual Representation
and Interaction
Data Representation
and Transformation
Production, Presentation,
Dissemination
Data and Visual Analytics (DAVA)
Analytical Reasoning
• Apply human judgment to
reach conclusions
• Methods to maximally utilize
human capacity to derive
deep understanding and
insight into complex
situations in a minimum
amount of time
Data Representation and Transformation
• Representing dynamic, incomplete, conflicting data
to convey important content in a form and level of
abstraction appropriate to the analytical task to
enable understanding
• Transforming data among possible representations
to support analysis and discovery
Visual Representation and
Interaction
• Visual presentation of information
in ways that instantly convey
important content taking
advantage of human vision
• Interaction techniques (e.g.,
search) between the analyst and
data to facilitate the analytical
reasoning process
Production, Presentation, Dissemination
• Seamless integration of data acquisition,
analysis, decision making, and action
A Discipline in Data & Visual Analytics
I think,
therefore
I am.
Data
Representation
and Transformation
Analytical
Reasoning
“Solving a problem simply
means representing it so that
the solution is obvious.”
Herbert Simon, 96
Foundations
Visual
Representation
and Interaction
Production
Presentation and
Dissemination
FODAVA is concerned with defining the mathematical and computational
foundations for the Data and Visual Analytics Discipline
Applications
Epidemiology
Medical Informatics: theory and practice of knowledge integration, management and use in healthcare delivery, med, public health
Bioinformatics
Astrophysics
Homeland Security
Text Analysis
Biometric Recognition
Social Networks
• FODAVA team will perform foundational research that can be
applied to many different fields
– Common end objective is to apply knowledge in the decision-making process, at the time and
place that a decision is needed.
– Common challenges across applications as well as application specific challenges
VISION: Establishing DAVA as a Distinct
Discipline
Data Analytics
Visualization
Production,
Presentation,
Dissemination
Analytic
Reasoning
Data and Visual Analytics
Mathematical and Computational Foundations
• Establish Body of Knowledge
– Foundations, subareas, applications
– Curriculum
– Education programs
• Develop FODAVA community, engage larger DAVA field
– Researchers
– Educators
– Practitioners
Data and Visual Analytics
Communities
National
Visualization and
Analytics Center
(NVAC)/VAC
Consortium
FODAVA
FODAVA lead
FODAVA partners (08, 09, …)
RVAC/
DHS Science &
Technology Center
of Excellence
“This partnership with NSF is the most
important event since the creation of NVAC
in March 2004. It brings to the front stage
efforts by folks within DHS, NVAC and NSF
to jointly fund the development of basic
research in visual analytics supporting
DHS applied mission needs.”
~Jim Thomas, NVAC Director
FODAVA will interact
with several
communities of
researchers &
practitioners
FODAVA-Lead Mission
• Research and Education: Serve as a central facility
that will involve all FODAVA awardees in a common
effort to develop the scientific foundations for data and
visual analytics
• Effective Liaison between FODAVA Researchers
and NVAC: Interface with DHS NVAC/RVAC and
DHS S&T Center of Excellence in research and
educational opportunities
• Community Building: Integrate diverse DAVA
communities and reach out for broader participation
FODAVA-Lead Challenges
Research and Collaboration
• Creation of the Mathematical and Computational
Sciences Foundations required to represent and
transform all types of digital data in ways to enable
efficient and effective Visualization and Analytic
Reasoning
• Intrinsic Challenges: Data sets massive,
heterogeneous, multi-dimensional, dirty, incomplete,
time-varying; solutions must be produced with time
and space constraints, ….
• Understanding Fundamental issues/needs in VA
and Communicating results
– Isolated theoretical research is not enough
– Problem driven foundational research is needed
FODAVA-Lead Challenges (cont’d)
• Education and Research
– Defining Foundations of Data and Visual Analytics
– Undergraduate and Graduate Curriculum (core
body of knowledge) for Data and Visual Analytics
• Community Building/Integration
– A community of researchers who claim DAVA as
their own discipline and FODAVA an essential part
– Conferences, journals, books, professional
society engagement,
– Industry, tech transfer, …
FODAVA-Lead PIs at GAtech
Alex Gray
CSE
Machine Learning
Fast Algorithms for Massive DA
Vladimir Koltchinskii
Mathematics
Machine Learning Theory
Computational Statistics
Haesun Park
Director
CSE, Associate Chair
Numerical Computing
Data Analysis
Research, FODAVA Community Building
Renato Monteiro
ISyE
Continuous Optimization
Statistical Computing
John Stasko
Associate Director
IC, Associate Chair
SRVAC Co-Director
Information Vis.
Collaboration with NVAC and RVACs
Liaison with Vis. community
FODAVA-Lead Senior Personnel
James Foley
Associate Dean CoC
Graphics and Visualization, HCI
Visual Analytics Digital Library
Alexander Shapiro
ISyE
Stochastic Programming
Optimization
Multivariate Stat. Analysis
Richard Fujimoto
Associate Director
CSE, Chair
Modeling and Simulation
Education and Outreach
Santosh Vempala
CS
Theory of Computing
Director of ARC
Guy Lebanon
CSE
Machine Learning
Computational Statistics
Arkadi Nemirovski
ISyE
Optimization
Non-parametric Stat.
Hongyuan Zha
CSE
Numerical Computing
Data Analysis
Director of Graduate Studies
Hao-Min Zhou
Mathematics
Wavelet and PDE
Image Processing
2008 FODAVA Partners
• Global Structure Discovery on Sampled Spaces
Leonidas Guibas and Gunnar Carlsson (Stanford University)
• Visualizing Audio for Anomaly Detection
Mark Hasegawa-Johnson, Thomas Huang, Hank Kaczmarski, Camille Goudeseune (University of Illinois Urbana-Champaign)
• Principles for Scalable Dynamic Visual Analytics
H. Jagadish and George Michailidis (University of Michigan)
• Efficient Data Reduction and Summarization
Ping Li (Cornell University)
• Uncertainty-Aware Data Transformations for Collaborative Reasoning
Kwan-Liu Ma (UC Davis)
• Mathematical Foundations of Multiscale Graph Representations and Interactive Learning
Mauro Maggioni, Rachael Brady, Eric Monson (Duke University)
• Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces
Leland Wilkinson and Robert Grossman (University of Illinois Chicago), Adilson Motter (Northwestern University)
Expertise of FODAVA team
Computational
Math&Statistics
Human
Computer
Interaction
Information
Visualization
Database
Real-time
Systems
Machine
Learning
Numeric &
Geometric
Computing
Optimization
Simulation
Gaming
Graphics
and Vis.
Information
Retrieval
High
Performance
Computing
Discrete/Graph
Algorithms
Speech
Recognition
FODAVA Activities
• Body of Knowledge
– Curriculum development
– Repository for education materials
– Distinguished lecture series
– Outreach to underrepresented groups
• Community Development
– Communications: project description and results
– FODAVA web site
• Repository of FODAVA data sets and results
– Conferences and meetings
• Annual FODAVA Workshop
• NVAC Consortium meetings
• Activities at established meetings
• Meetings to establish new research directions
Curriculum Development
• Goals
– Identify and catalog curriculum development efforts in
Data and Visual Analytics
• Individual courses, minors, degree programs
• Undergraduate and graduate level
– Leverage existing efforts (e.g., RVAC)
– Share experiences, develop best practices
– Develop curriculum recommendations
• Curriculum workshop
– POCs: Cook (NVAC), Fujimoto (FODAVA), Stasko
(RVAC and FODAVA)
– December 2008, Atlanta, Georgia
Visual Analytics Digital Library
(http://vadl.cc.gatech.edu)
• Developed by Georgia Tech
(Foley et al.) in Southeast
Regional Visual Analytics
Center
• Repository for curriculum and
education materials
– Lecture notes
– Homeworks, projects
– Reference materials, videos, etc.
• Includes evolving taxonomy for Data and Visual Analytics
• FODAVA will build upon this resource to
– Provide a library and web portal of FODAVA educational materials
– Expand support to DAVA community to include FODAVA areas
– Document curriculum development efforts
Distinguished Lecture Series
• Goal: Provide forum for
leaders in DAVA community
to articulate vision and
DAVA-related research and
education activities and
applications
• Plans (2009)
– Lecture series featuring leaders in the data and visual
analytics community
– Develop in collaboration with FODAVA partners, NVAC,
RVAC, DHS/S&T CoE
– Webcast
Photo: Joe Kielman, VAC Consortium meeting, 2008
Outreach to Underrepresented Groups
Example: GT CRUISE Program
• CRUISE: CSE Research Undergraduate Intern
Summer Experience
• Encourage students to consider PhD studies
• Diverse student participation
– Multicultural, emphasizing minorities, women
– U.S. and international students
• Ten week summer research projects in areas
such as data and visual analytics, high
performance computing, modeling & simulation
• Interdisciplinary individual and group projects
– Year-long collaboration with North Carolina
A&T University
• CRUISE-wide events
– Weekly seminars (technical, grad studies)
– Social events
– Symposium: conference-style presentations
FODAVA Website
http://fodava.gatech.edu
• Forum for FODAVA
Community
• Maintain close
collaboration with
NVAC
• Functionality
– Dissemination of results to user communities
– DAVA community events and meeting information
depot
– Repository of data sets for FODAVA community
FODAVA Annual Workshop
(from Fall 2009)
• Annual Theme
– Initially more mathematically/computationally oriented
– Increasing emphasis over time on visualization, human-computer interaction, cognitive science, …
• Organizers
– Co-organized in collaboration among FODAVA-Lead,
FODAVA-Partners, NVAC, and DHS S&T Center of
Excellence
• Time
– Co-locate with NVAC Fall Consortium meeting
• Location
– PNNL/NVAC, Richland, WA
FODAVA Annual Workshop 2009
• Theme: Machine Learning & Geometric Computing
in Visual Analytics
• Organizers: Vladimir Koltchinskii (GATech)
and Mauro Maggioni (Duke)
• Time: November, 2009
• Location: PNNL/NVAC, Richland, WA
VAC Consortium Meetings
• Provide broader exposure of work to
DHS and NVAC communities
• Semi-annual:
Next Meeting: Nov 11-13, 2008, PNNL
– Nov. 11: University Technical Exchange Day
– FODAVA Panel session
– FODAVA Demo/Poster session
• Please participate!
Additional Workshops
• FODAVA workshops at major conferences and meetings
• IEEE VAST Conference
– Birds of a Feather session at VAST Oct., 2008
• Workshop on Temporal Analytics
• Other Potential venues
– International Conference on Machine Learning
– Neural Information Processing Systems (NIPS)
– SIAM CSE / SIAM Optimization / SIAM ALA Conferences
– ACM Knowledge Discovery and Data Mining (KDD)
– AAAS meeting
– Others?
Calendar of Events
• Sept 2008: FODAVA Kick-Off Meeting
• Oct 2008: VAST 2008 BoF Session
• Nov 2008: VAC Consortium meeting, FODAVA
Panel and Poster/Demo Session
• Dec 2008: DAVA Curriculum Workshop
• May 2009: VAC Consortium Meeting
• Oct 2009: VAST Conference
• Nov 2009: VAC Consortium and FODAVA Annual
Workshop
• Temporal Analytics Workshop under consideration
Project Materials
• Goal: Articulate contributions being made by
the FODAVA community
• Benefits
– Potential collaborators
– Foster technology transition opportunities
– Broader exposure to potential sponsors
• Materials requested
– Project brochures and other collateral material
– Videos especially welcome
• Tell us what you’re doing!
• POC: Richard Fujimoto
Concluding Remarks
• DAVA represents a new, exciting discipline that
brings together diverse communities
• Research is motivated and driven by real-world
problems
• FODAVA will play a key role in developing and
defining the foundations for DAVA
• Communication and collaboration with other
elements of DAVA (e.g., NVAC, RVAC, DHS/S&T
CoE) is essential
– We need to educate ourselves!
Thank you!
Extra slides
Student Interns
• Support deep research collaboration
between FODAVA lead, FODAVA partners,
and PNNL / NVAC
– Fundamental research driven by real-world
applications
• Leverage existing intern programs at PNNL
– Summer interns
• Leverage GT distance learning capability for
academic year interns
• Details to be determined
Undergraduate Education
• Georgia Tech Threads curriculum
– Undergraduate program defined as a set of 8 threads
– Thread is a body of coursework targeting a certain career
path, e.g., modeling and simulation, human computer
interaction, embedded systems, etc.
– Students take two threads to complete BS in CS degree
• Existing threads
– Modeling and Simulation: representing processes/systems
– Devices: embedded computing
– Theory: theoretical foundations of computing
– Information Networks: information communication
– Intelligence: human-level intelligence
– Media: systems for creative expression
– People: human-centric computing
– Platforms: computing systems, architecture, languages
Modeling & Simulation Thread
• Many students come to Georgia Tech with an inherent love
for math and science
• Computation provides a framework to view, understand,
analyze, and design systems
• Computational modeling involves developing mathematical / conceptual
abstractions of systems that can be represented by efficient software
[Figure: example models: fluid flow model, cellular automata, queueing model]
A Data and Visual Analytics Thread?
Aero
Civil, Elect.
EAS, Biology
Chemistry, Math
Physics, Industrial Eng.
Application Discipline
(pick one)
Computational Methods
for Data Analysis
And Visualization
?
Math
Discrete Math
Continuous Math
Computing
Theory
Software
Hardware
Algorithms
Science
Physics
Biology
Chemistry
Foundations
• Curriculum
– Foundational mathematics, computing, science
– Data analytics, information visualization
– Application-oriented specialization
• Integrated approach with capstone design project
• Natural complement to modeling and simulation thread
Application Domains
• DHS: Intelligence analysis, Law Enforcement,
Emergency response, Intrusion and fraud
detection, ….
• BioMedical Informatics
• Bioinformatics/Systems Biology
• Astronomy
• Text Analysis: Documents, e-mails, …
• Cybersecurity
• Transportation
• …
Vladimir Koltchinskii, School of Mathematics
• Machine Learning
- Learning Theory
- Feature Selection
- Theory of Sparse Recovery
- Empirical Risk Minimization
• Computational Statistics

Sparse Recovery: for automatic determination of relevant features
(basis pursuit, soft thresholding, LASSO, …)
Comprehensive theory is only starting to be developed
Penalized Empirical Risk Minimization: Basis for many solutions in basic
problems of learning theory, e.g. regression, classification, density estimation
Challenge: extend the theory of sparse recovery to broader framework of learning
theory, e.g. infinite classes of functions
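To make the link between soft thresholding, LASSO and penalized empirical risk minimization concrete, here is a minimal sketch in plain Python. It is an illustration only, not the speaker's method: it assumes a squared-error risk with an L1 penalty and solves it with the textbook iterative soft-thresholding scheme. The function names `soft_threshold` and `lasso_ista` and all parameters are hypothetical, chosen for this example.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_ista(A, y, lam, step, n_iter=500):
    """Minimize 0.5 * ||A w - y||^2 + lam * ||w||_1 by iterative soft thresholding.

    A is a list of rows and y a list of targets (illustrative data layout).
    step should be at most 1 / (largest eigenvalue of A^T A) for convergence.
    """
    n_features = len(A[0])
    w = [0.0] * n_features
    for _ in range(n_iter):
        # Residual r = A w - y
        r = [sum(a * wj for a, wj in zip(row, w)) - yi for row, yi in zip(A, y)]
        # Gradient of the smooth (squared-error) part: A^T r
        grad = [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(n_features)]
        # Gradient step, then soft thresholding to handle the L1 penalty
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# With an identity design matrix the minimizer is soft_threshold(y_j, lam):
# the small middle coefficient is set exactly to zero (feature deselected).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = lasso_ista(identity, [3.0, 0.1, -2.0], lam=0.5, step=1.0, n_iter=50)
```

The penalty weight `lam` controls how many coefficients are driven to zero, which is the sense in which sparse recovery performs automatic feature selection.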
Renato Monteiro, School of Industrial & Sys. Eng.
• Continuous Optimization
- Interior-point methods
- Semidefinite programming
- Cone programming
- Algorithms for large-scale optimization
• Computational Statistics and Graph Theory
Dimension Reduction and Semi-definite Programming
• Higher level of reduction with more difficult objective function
• Learning manifolds which preserve ordering of distances
• Off-the-shelf SDP software does not scale
• Design of efficient algorithms based on the first-order method,
convex-concave saddle point problem
Alexander Gray, Computational Sci. & Eng.
Goal: make machine learning efficient
– For massive datasets, e.g. for astronomy,
Large Hadron Collider, network traffic
– For fast visualization, e.g. our new
manifold learning methods
• Developed fastest practical
algorithms for many learning methods
• Coming in Dec 2008: MLPACK library
John Stasko, School of Interactive Computing
and GVU Center
Information Visualization
Human Computer Interaction
Visualization for Investigative Analysis: Putting the Pieces Together with Jigsaw
Help investigative analysts discover plans, plots and
threats embedded across large document collections
Multiple visualizations (views) of the documents, entities, & their connections
Views are highly interactive and coordinated
Analysts explore the documents and entities through the views
Building a collaborative version
Representing reliability and uncertainty
Entity aliasing and hierarchy support
Visualizing the investigative process
Haesun Park, Computational Sci. & Eng.
• Numerical Computing
• Algorithms for Massive Data Analysis
- Dimension Reduction
- Clustering and Classification
• Bioinformatics
- Microarray analysis
- Protein structure prediction
Effective Dimension Reduction with Prior Knowledge
• Dimension Reduction for Clustered Data:
Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD),
Orthogonal Centroid Method (OCM)
• Dimension Reduction for Nonnegative Data: Nonnegative Matrix
Factorization (NMF)
• Applications: Text Classification, Face Recognition, Fingerprint
Classification, Gene Clustering in Microarray Analysis …
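As a rough illustration of dimension reduction for nonnegative data, the sketch below factors a small nonnegative matrix V into nonnegative factors W and H. It uses the classic multiplicative update rules of Lee and Seung, chosen here only because they are short to write down; the slides do not say which NMF algorithm is used, so treat the algorithm choice, the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the toy data as assumptions.

```python
import random


def matmul(X, Y):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (m x n) by W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start so the updates are well defined
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5


# Toy data: four items described by three nonnegative features
V = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 1.0]]
W, H = nmf(V, rank=2)
error = frobenius_error(V, W, H)
```

Each row of W gives a reduced, nonnegative representation of the corresponding row of V, which is what makes the factors easy to interpret in applications such as text or gene clustering. For real data sizes a numerical library would replace the list-based helpers.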
Education and Outreach Goals
FODAVA lead will
• Encourage and coordinate development of
FODAVA Curriculum
• Encourage and coordinate knowledge exchange
toward creating a workforce pipeline
– Undergraduate education
– Graduate education
– Lifelong learning
• Facilitate research collaboration
• Facilitate outreach to underrepresented groups
Engaging FODAVA Community
• FODAVA program provides a platform to
bring together community of researchers,
educators and practitioners
• Activities might include
– Education workshops to share experiences,
develop best practices
– Curriculum development
– Repository of information and teaching
materials (e.g., SRVAC, VADL)