Collaboration and Multimedia Group
Jonathan Grudin
Microsoft Research
[email protected]
http://research.microsoft.com/~jgrudin
Our Group
 About 2 years old


9 people (4 Researchers, 3 RSDEs, 1 Usability, 1 Design)
Diverse: Systems, Cognitive Science, Sociology, Vision
Anoop Gupta
Gavin Jancke
Li Wei He
Dave Bargeron
Dan Venolia
Jonathan Grudin
Marc Smith
Yong Rui
JJ Cadiz
 Focus:

Make audio-video information a first-class citizen

Support for remote participation and awareness

Frameworks for enhanced online communities
=>Technologies, Applications, and Social Factors
 Research model
Build Prototype
Evaluation / Publication
Refine Prototype
Product Impact
Technology and Education
 Two broad facets:

Technology for improved content
 deep models of subject matter and student
 active exploration of subject (simulations)
 relate to student's context/environment (situated learning)
MOSTLY DOMAIN DEPENDENT

Technology infrastructure for:
 course and student management
 content creation
 delivery / distribution
 collaboration
MOSTLY DOMAIN INDEPENDENT
 Both aspects are important and complementary
Project Areas
 Low-cost Capture of Video
 Browsing Audio-Video
 Multimedia Annotations
 Remote Synchronous Collaboration
 Enhanced Online Communities
Multimedia is in routine use now where networking is in place…
Studies of MS Technical Education talks
MSTE Presentations
 Logs of ~30,000 sessions by over 5000 users
 Some results:
On-demand audience larger than live audience
 60% of sessions are under 5 minutes
 Viewers jump around video
 Initial portions much more likely to be watched

 Presentations will be designed differently in future
Present key messages early in talk and in each slide
 Use meaningful slide titles
 Reveal talk structure in slide titles
 Consider post-processing talk for on-line viewers
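The log results above (share of short sessions, viewers per minute of the talk) could be computed along these lines. This is a minimal sketch, not the study's actual analysis: the `Session` record, its field names, and the assumption that each session is one contiguous span of the talk are all hypothetical, and real logs with viewers jumping around would hold several spans per session.

```python
import math
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    start_min: float  # talk position where viewing started (assumed field)
    end_min: float    # talk position where viewing stopped (assumed field)


def share_of_short_sessions(sessions, limit_min=5.0):
    """Fraction of sessions whose viewed span is shorter than limit_min."""
    if not sessions:
        return 0.0
    short = sum(1 for s in sessions if (s.end_min - s.start_min) < limit_min)
    return short / len(sessions)


def viewers_per_minute(sessions, talk_minutes):
    """Number of sessions that cover each whole minute of the talk."""
    counts = [0] * talk_minutes
    for s in sessions:
        first = max(0, int(s.start_min))
        last = min(talk_minutes, math.ceil(s.end_min))
        for minute in range(first, last):
            counts[minute] += 1
    return counts
```

Plotting the list returned by `viewers_per_minute` against the minute index gives a viewers-over-time curve for one talk.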

Viewers Over Time for One Talk
 Viewers decrease overall and within each slide
[Figure: user count (0 to 70) against Nth minute into the talk (0 to 90), with points A and B marked]
Low-cost Capture of Video
 Cost of capturing content is high today
Large human cost
 Disk cost only $3/hour

[Figure: end-user value and production cost plotted over time]
 Automated capture of talks/meetings with high quality
Use cinematography idioms
 Combined microphone-array/vision algorithms
 Room set-up and control framework
 Initial prototype, ongoing work

 Starting meeting capture project

Jointly with Vision group

Parabolic mirror with camera
 Capture all local participants
 1000x1300 image
 De-warp, analyze, compress, …

Software enables remote participants to interact

Capture all and make it browsable
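The de-warp step for a mirror image can be pictured as a polar-to-rectangular unwrap. The sketch below is an illustration only: it assumes the mirror appears as a ring centred at (`cx`, `cy`) between radii `r_inner` and `r_outer`, maps radius linearly to panorama rows, and uses nearest-neighbour sampling. A real parabolic mirror needs a mapping derived from its profile; all function and parameter names here are hypothetical.

```python
import math


def dewarp_source(col, row, out_width, out_height, cx, cy, r_inner, r_outer):
    """Map a panorama pixel (col, row) to (x, y) in the circular source image."""
    angle = 2.0 * math.pi * col / out_width
    # Assumption: radius grows linearly with the panorama row.
    radius = r_inner + (r_outer - r_inner) * row / out_height
    x = cx + radius * math.cos(angle)
    y = cy + radius * math.sin(angle)
    return x, y


def dewarp(image, out_width, out_height, cx, cy, r_inner, r_outer):
    """Unwrap a ring-shaped image (list of rows) into a rectangular panorama."""
    rows, cols = len(image), len(image[0])
    panorama = []
    for row in range(out_height):
        line = []
        for col in range(out_width):
            x, y = dewarp_source(col, row, out_width, out_height,
                                 cx, cy, r_inner, r_outer)
            # Nearest-neighbour lookup, clamped to the source image bounds.
            xi = min(max(int(round(x)), 0), cols - 1)
            yi = min(max(int(round(y)), 0), rows - 1)
            line.append(image[yi][xi])
        panorama.append(line)
    return panorama
```

Each panorama column corresponds to one viewing direction around the table, so the output width sets the angular resolution.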
Browsing Audio-Video
 People are good at skimming text; not true for audio-video
 As A-V content becomes pervasive, ability to browse will be critical
 Solution components:
Time-compression: up to ~2-fold speedup
 Highlights: > 2-fold (some content omitted)
 Indexes: navigable structure and search
 Role of people
 User interface
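To make the time-compression component concrete, here is a minimal sketch of sampling-style linear compression: keep one short frame of audio out of every `frame * speed` samples and drop the rest. It is an illustration under stated assumptions, not the group's algorithm; the function names and the default frame size are hypothetical, and practical systems typically smooth or overlap frame boundaries to avoid audible clicks.

```python
def compress(samples, speed, frame=480):
    """Keep `frame` samples out of every `frame * speed` and discard the rest."""
    if speed < 1.0:
        raise ValueError("speed must be >= 1.0")
    hop = frame * speed  # distance between the starts of kept frames
    out = []
    pos = 0.0
    while int(pos) < len(samples):
        start = int(pos)
        out.extend(samples[start:start + frame])
        pos += hop
    return out


def playback_minutes(duration_min, speed):
    """Viewing time for content of duration_min played at the given speed-up."""
    return duration_min / speed
```

At a 2-fold speedup, `playback_minutes(60, 2.0)` gives 30 minutes of viewing for a one-hour talk, with no content omitted; highlights go beyond that by leaving material out.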

Browsing Audio-Video
Studies and Sub-projects
 Time compression



Algorithms for linear TC well understood
New issues: client-server; file formats; UI/UE
Study: Discrete vs. Continuous; Latency; … (CHI 99)
 Highlight extraction

Presentation highlights (ACM MM’99, CHI 2000)
 Metrics: Coverage; Coherence; Comprehension

Baseball highlights (ACM-MM’00 submission)
 Audio features only: generic and baseball specific
 Visual action highlights (CVPR’00, two papers)
 Prototype video browser study (CHI 2000)




Six video categories: lectures, news, soaps, sports, travel…
Standard VCR and speed-up controls
Textual & visual indices: TOC, Notes, timelines, shot boundaries
Jump controls: jump-back-X, jump-forward-X
 Behaviours varied but participants liked new controls
Issues Being Explored
 Adaptive time-compression; client-server systems issues; user perception; …
 Automated highlight generation; combining multiple information sources; user perception; …
 Automated index generation; shot boundaries; speaker transitions; hierarchical ToC
 Role of people: viewers; speakers; middle men
 User interface: user behavior and models; human-in-the-loop; PC vs. WebTV; …
Multimedia Annotations
 Ability to mark-up, take notes, collaborate around multimedia content can add significant value
University and corporate training models
 Exploring other uses

 Various indices, highlights, … are also annotations

E.g. table of contents, slide-flips, speech-to-text, …
 Multimedia annotations:

Annotations are linked to the media time-line

Annotations stored separately from the media files
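The two properties above (annotations tied to the media time-line, stored apart from the media file) suggest a data structure like the following. This is a hedged sketch, not the design of the actual system: the class names, fields, and the example URL used in any call are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass(order=True)
class Annotation:
    time_sec: float                                  # position on the media time-line
    author: str = field(compare=False)
    text: str = field(compare=False)
    annotation_set: str = field(compare=False, default="personal")


class AnnotationStore:
    """Annotations keyed by media URL, kept separately from the media file."""

    def __init__(self):
        self._by_media = {}

    def add(self, media_url, annotation):
        notes = self._by_media.setdefault(media_url, [])
        notes.append(annotation)
        notes.sort()  # ordered by time_sec only, so playback lookups stay simple

    def near(self, media_url, position_sec, window_sec=10.0):
        """Annotations within window_sec of the current playback position."""
        notes = self._by_media.get(media_url, [])
        times = [n.time_sec for n in notes]
        lo = bisect_left(times, position_sec - window_sec)
        hi = bisect_right(times, position_sec + window_sec)
        return notes[lo:hi]
```

Because the store holds only a URL and a time offset, the same video can carry several annotation sets (personal, shared, instructor) without the media file being modified.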
Some Unique Aspects
 Annotation sets and sharing
 Displaying Annotations: time or annotation-centric
 Integration with email
 Multiple annotation types
 Collection of flexible and embeddable objects
Annotations in Technical Education
Multimedia Report Scenario
Study Results
 Initial System Design and Use (WWW’99)


Personal note-taking study
Shared note-taking study
 Text preferred over audio
 Exact positioning not critical
 Auto-tracking particularly useful
 MRAS-MSTE Study (Tech Report)

58 students involved in two instances of “C” course
 ~ 20% lower attrition rates (although self selected)
 Class participation levels were same or better
 Overall, students were pleased with experience

Students took advantage of on-demand format
 Saved 28-35% time by skipping unimportant parts
 Log-ins were well-spread over duration of course

Instructors saved 50% on time but felt underutilized
 Usage study of Office-2000 annotations

Office-2000 web discussions used for Office redesign
 Used primarily for spec development
 ~10K annots, by ~450 people, ~1250 docs, over 9 months
 Interviewed 10 users

Top 33% people made ~80% of the annotations
 Approx 30-50 annotations per person

Key benefits:
 Great for asynchronous collaboration
 “In-context” better than chained email threads
 Greater awareness of document state

Key problems:
 Orphaning; notification; prioritization and resolution
 Ongoing work on common framework
 Cooperating on use at MIT and University of Washington
Synchronous “Real-Time” Collaboration
 Core activity for people
 Source of on-demand content

Captured presentations and meetings
 Our work in this area:

Flatland: Desktop-to-desktop tele-presentations

TELEP: Mixed Live+Remote tele-presentations

CVV (NetShow + NetMeeting): Collaborative Video Viewing

Studies of informal awareness in work settings
Flatland and TELEP
 Attending seminars on the web is a passive experience today

almost no interactivity

almost no sense of presence
 Presenter’s view of remote audience
 Remote audience view of other audience members
 Flatland and TELEP try to rectify these weaknesses

Flatland: desktop to desktop
 Several studies with MSTE courses

TELEP: live audience + remote-desktop audience
 Used for talks at MSR
Prototype Flatland Interface
Flatland Experience (CHI’99, HICSS’00)
 Initial use in 3 multi-session MSTE classes

Presentations from desktop to remote audience

Students:
 Liked the convenience
 Liked ability to multitask
 Did not think learning suffered

Instructors:
 Missed familiar sources of feedback
 Comfort level rose over time for 2 of 3
 Overall: Lack of awareness of others a key problem
TELEP Prototype
 Targets mixed live+remote audiences
 Two displays

Large side-wall display for lecture room audience
 Hands-free speaker interface; voice channel

Small side-frame display for remote audience
 Video vs Image vs Generic; Name vs Anonymous
 Q&A; shared chat; private chat; remote view; …
 Light weight and self-updating in browser frame
 Architecture similar to Flatland

Home-brewed light-weight video multicast system
TELEP Interface (Lecture Room View)
TELEP Experience
 Used for MSR lectures for 3 months

Speaker’s awareness of remote audience is UP

Remote audience representation
 Video not used much; Personal image or generic
 In v2, only a small fraction are choosing anonymity

For Q&A interaction
 Forward video latency is very disruptive
 Several changes in interface to deal with that

Many suggestions, but overall feedback quite positive

Presented results at CHI 2000
Collaborative Video Viewing
 Example scenarios:
Online presentation with demo videos
 Distributed tutored video instruction (D-TVI)

 NetMeeting doesn’t support these out-of-box
 Built a simple solution (CVV) on top of NetMeeting
 Study: Impact of communication channels on interactivity

Chat; phone; phone+video; same room
Stanford TVI Experiments: 10/73 - 3/74
[Figure: grade point average (axis 2.4 to 3.9) for four groups: Campus (302), Live Video (55), Tape: No Tutor (6), Tape: With Tutor (27)]
 remote TVI students with tutor do best
 it helped “at-risk” students even more
Source: J.F. Gibbons, et al. Science, Vol. 195, No. 4283, 18 March 1977
Enhanced Online Communities
 Theme: Use data mining and sociological principles for better online communities
 Two projects:

Netscan (newsgroups, web-boards, …)
 Social Context
 History
 Reputation
 Neighborhood

Threaded Text Chat (or “Synchronous Newsgroups”)
 Turn Taking
 Conversational Structure
 Group Awareness
Activity Surrounding Teaching/Learning
 Pre-authoring

Slides, web notes, reference material, exercises, …
 Content delivery


Synchronous delivery to local/remote audience
Archived for on-demand audience and review
 On-demand access by students

Watch content; personal notes; TOC; index; …
 Discussion around content


Synchronous: small group; one-on-one
Asynchronous
 Post-lecture work by instructor / tutor


…
Answer questions; discussions; feedback & redesign; …
Student evaluation
Concluding Remarks
 Key drivers of change
market needs
 technology

 Key new directions
learner-centric
 asynchronous; small-group synchronous

 Key challenges
concrete studies to indicate effectiveness
 technology/products taking value beyond cost
 business model and bootstrapping issues

For More Information:
http://www.research.microsoft.com
Netscan (http://netscan.research.microsoft.com)
 Automatically characterizes groups and posters:





Activity: Growing, shrinking, peak days?
Style: Q&A, Announcement, Flames, Binaries…
Community: Is there a stable core group?
Quality: Are questions asked ever answered?
Participants: How has this person acted before?
 Improve:





Discovery: Where are the “good” groups?
Navigation: Where should I go from here?
Activity Monitoring: Where is the action?
Visualization: How does this all fit together?
Accountability: How have other people reacted to you?
 Closely working with MSDN and Microsoft.com
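Two of the characterizations above (Quality: are questions answered? Community: is there a core group?) can be approximated from post metadata alone. The sketch below is an assumption-laden illustration, not Netscan's method: it treats every thread-starting post as a question and any direct reply as an answer, and the `Post` record and the `min_posts` threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    post_id: str
    author: str
    parent_id: Optional[str] = None  # None marks a thread-starting post


def answered_fraction(posts):
    """Share of thread-starting posts that received at least one direct reply."""
    roots = {p.post_id for p in posts if p.parent_id is None}
    if not roots:
        return 0.0
    answered = {p.parent_id for p in posts if p.parent_id in roots}
    return len(answered) / len(roots)


def core_posters(posts, min_posts=3):
    """Authors with at least min_posts posts, as a rough stand-in for a core group."""
    counts = Counter(p.author for p in posts)
    return sorted(author for author, n in counts.items() if n >= min_posts)
```

Running both functions per newsgroup and per time window would give simple trend lines for group quality and stability.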
Netscan Interfaces
Microsoft.com: Newsgroup Reports
Microsoft.com: Newsgroup Topic Tracker
Threaded Text Chat
 Resolve the major source of ambiguity in text chat
User 1: Anyone from LA?
User 2: Anyone from St. Louis?
User 3: I am!
 Chat ruptures “Adjacency Pairs”

Recent research (Garcia and Jacobs, Qualitative Sociology, Vol. 21, no. 3, 1998) shows that a significant number of turns in chat (as much as 40%) are repairs for misunderstood prior turns
 Threaded Text Chat reconnects turns and responses



Reduces repair overhead
Structuring mechanism for knowledge capture
Just completed user study
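The idea of reconnecting turns and responses can be sketched as a tree in which each reply is stored under the turn it targets. This is a minimal illustration, not the prototype's implementation; the `Turn` class and helper names are hypothetical, and which question User 3 meant to answer is chosen arbitrarily for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str
    text: str
    replies: list = field(default_factory=list)


def reply(target, speaker, text):
    """Attach a new turn directly to the turn it responds to."""
    turn = Turn(speaker, text)
    target.replies.append(turn)
    return turn


def render(turns, depth=0):
    """Return display lines with each reply indented under its target."""
    lines = []
    for turn in turns:
        lines.append("  " * depth + f"{turn.speaker}: {turn.text}")
        lines.extend(render(turn.replies, depth + 1))
    return lines


room = [Turn("User 1", "Anyone from LA?"), Turn("User 2", "Anyone from St. Louis?")]
reply(room[1], "User 3", "I am!")  # assumed target, for illustration only
print("\n".join(render(room)))
```

With the reply stored under User 2's turn, the rendered transcript no longer leaves readers guessing which question "I am!" answers.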
Threaded Text Chat
 Replies always follow the turns they target
 Social accounting tracks room and individual activity