A Study of Social Influence in Diffusion of Innovation over Facebook
Shaomei Wu
Information Science
Cornell University
Information Science Breakfast, Dec 5, 2008
Diffusion of Innovation
“Diffusion is the process in which an innovation is
communicated through certain channels over time
among the members of a social system.”
–––– Everett M. Rogers *




“innovation”: Friendship Quiz – a Facebook application
“communicated”: Invitations among Facebook friends
“time”: September 25, 2008 – Now
“social system”: Facebook
* Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6
Basic Diffusion Models
Threshold Model ⇔ Cascade Model
Statistically Equivalent *
*David Kempe, Jon Kleinberg, Eva Tardos. Maximizing the Spread of Influence through a Social Network. KDD, 2003
Cascade Model

Each recommendation will succeed with certain probability.
[Figure: example network with nodes a through l. Edges carry success probabilities such as pab, pac, pad, pae, paf, pag, pgk, pgl, pdi, pdj. Legend: adopter, non-adopter, social link, recommendation.]
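The cascade process described on this slide can be sketched in a few lines of Python. This is a minimal illustration of an independent cascade, not the code used in the study: the function name `simulate_cascade`, the edge dictionary layout, and the probability values are all assumptions made for the example.

```python
import random

def simulate_cascade(probs, seeds, rng):
    """Run one independent-cascade simulation.

    probs maps a directed pair (u, v) to puv, the chance that a
    recommendation from u to v succeeds. Each new adopter gets exactly
    one chance to recommend to each neighbor that has not adopted yet.
    """
    out = {}
    for (u, v), p in probs.items():
        out.setdefault(u, []).append((v, p))
    adopters = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out.get(u, []):
                if v not in adopters and rng.random() < p:
                    adopters.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return adopters

# Hypothetical probabilities on a few of the edges named in the figure.
probs = {("a", "b"): 0.5, ("a", "g"): 0.3, ("g", "k"): 0.2,
         ("a", "d"): 0.4, ("d", "i"): 0.1}
result = simulate_cascade(probs, ["a"], random.Random(0))
```

Repeating the simulation many times and averaging the number of adopters gives an estimate of expected spread for a given set of puv values.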
Question: how to estimate puv?

Current practice


Constant [1]
Based on ONLY network structure (e.g., in/out-degree) [2]
Do individuals and the social relationships among them matter?
[1] Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs. SDM, 2007.
[2] Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC), 2006.
Theories from Empirical Diffusion Research:


Opinion leaders, who have “greater exposure to
mass media than their followers”, “are more
cosmopolite”, “have greater social participation”,
“have higher socioeconomic status”, and “are
more innovative” [Rogers 2003, pp. 316-318].
The importance of heterophily between
participants on certain attributes (i.e., education
and socioeconomic status) in determining the
efficiency of diffusion, despite the fact that “more
effective communication occurs when two or more
individuals are homophilous” [Rogers 2003, p. 19].
This project aims to…
Model the puv values for the cascade model
  Identify the most influential factors in determining puv
  Predict the success of contagion
Exploit Facebook data



A real-world, ongoing diffusion instance;
Rich and (most of the time) trustworthy profile information of
individuals and their social connections/activities;
Precisely timestamped diffusion process, a complete log of
events;
Status



Launched: Sep 25, 2008.
Data currently in use runs through Nov 25, 2008:
  216 adopters,
  375 individuals,
  737 edges between 266 pairs of people,
  90 successful infections,
  178 failed infections.
Network Evolution (in the first month after release)
Political view distribution
[Figure: bar chart of # of people by political view, adopters vs. non-adopters. Categories recovered from the axis: liberal, moderate, conservative, Democratic Party, Republican Party, Libertarian, apathetic, other.]
Gender distribution
[Figure: bar chart of # of people by gender (female, male), adopters vs. non-adopters.]
Religious View Distribution
[Figure: bar chart of count by religion, adopters vs. non-adopters. Categories recovered from the axis: Christian, Muslim, Other.]
Age distribution
[Figure: people count by age, adopter vs. non-adopter. Age axis from 15 to 55 or more.]
Predict the success of invitation with SVM

A binary classifier:
each invitation either succeeds or fails.
Features


Individual features
Pair features (homophily/heterophily)
Individual Features
Social Activeness
Innovativeness
Socioeconomics
Education
# of events attended/invited
# of photos tagged
# of wall posts
# of networks
# of groups participated
# of notes
Religion
Political View
Gender
Age
Culture Background
Relationship Status
Work Info
Education Info
Pair-wise Features
Biological traits
Belief
Socioeconomics
Proximity
Age difference
Same gender?
Same political view?
Same religion?
Same culture background?
# of same networks
# of photos both tagged
# of groups both participated
# of events both attended
Same education level?
Same high school?
Same college?
Same workplace?
Same current city?
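As a rough illustration of how pair-wise features like these could be derived from two profiles, here is a minimal Python sketch. The profile field names (`gender`, `political_view`, `groups`, and so on) and the function `pair_features` are hypothetical; the slides do not show how the profile data was actually stored or encoded.

```python
def pair_features(sender, receiver):
    """Build homophily/heterophily features for one sender-receiver pair."""
    def same(key):
        a, b = sender.get(key), receiver.get(key)
        # Assumption: a missing value on either side counts as "not the same".
        return int(a is not None and a == b)

    def overlap(key):
        # Number of items (networks, groups, ...) the two profiles share.
        return len(set(sender.get(key, ())) & set(receiver.get(key, ())))

    features = {
        "same_gender": same("gender"),
        "same_political_view": same("political_view"),
        "same_religion": same("religion"),
        "same_college": same("college"),
        "same_current_city": same("current_city"),
        "common_network_count": overlap("networks"),
        "common_group_count": overlap("groups"),
    }
    if sender.get("age") is not None and receiver.get("age") is not None:
        features["age_difference"] = abs(sender["age"] - receiver["age"])
    return features

# Hypothetical profiles, for illustration only.
alice = {"gender": "female", "age": 25, "religion": "Christian",
         "networks": ["Cornell"], "groups": ["g1", "g2"]}
bob = {"gender": "male", "age": 31, "religion": "Christian",
       "networks": ["Cornell"], "groups": ["g2", "g3"]}
example = pair_features(alice, bob)
```

The yes/no questions in the list map to 0/1 indicators and the "# of" items map to counts, which matches the index:value style of the training examples.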
Each invitation is one training example for machine learning.
time                | sender    | receiver  | class | sender features              | receiver features                | pair features
2008-09-25 18:25:41 | 589483260 | 3621185   | 1     | 1:22 2:1 3:0 4:0 5:0 6:1 … … | 35:1 47:0 48:0 49:0 50:0 51:0 …… | 68:0 69:0 70:0 74:1 76:1 ……
2008-09-25 18:25:49 | 3621185   | 571023231 | -1    | …                            | …                                | …
…                   | …         | …         | …     | …                            | …                                | …
2008-11-24 02:40:34 | 768059413 | 81405257  | -1    | …                            | …                                | …
Training data
* All numerical features are normalized across examples.
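The class label followed by index:value pairs is the sparse input format that SVM-light reads. The sketch below parses one such line and rescales features across examples. The helper names are hypothetical, and min-max scaling is an assumption: the slide says the features are normalized but does not say how.

```python
def parse_example(line):
    """Parse one '<class> <index>:<value> ...' line into (label, features)."""
    tokens = line.split("#", 1)[0].split()  # drop any trailing comment
    label = int(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

def min_max_normalize(examples):
    """Scale each feature index to [0, 1] across all examples.

    Assumption: every example lists every feature explicitly, so the
    minimum and maximum are taken over the listed values only.
    """
    lo, hi = {}, {}
    for _, feats in examples:
        for i, v in feats.items():
            lo[i] = min(lo.get(i, v), v)
            hi[i] = max(hi.get(i, v), v)
    scaled = []
    for label, feats in examples:
        row = {}
        for i, v in feats.items():
            span = hi[i] - lo[i]
            row[i] = (v - lo[i]) / span if span else 0.0
        scaled.append((label, row))
    return scaled

# First tokens of the first row above; the second line is made up.
rows = [parse_example("1 1:22 2:1 3:0 4:0"), parse_example("-1 1:2 2:0 3:0 4:1")]
normalized = min_max_normalize(rows)
```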
AdaBoost (with DecisionStump)
A popular way to do feature selection.

Selected Features







sender wall post count
sender group count
sender network count
receiver age
receiver group count
sender & receiver common group count
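To make the idea of boosting-based feature selection concrete, here is a small pure-Python sketch of AdaBoost over decision stumps. It is an illustration under stated assumptions, not the tool used for these results (the slides do not name the implementation): each round picks the single-feature threshold rule with the lowest weighted error, and the features used by the chosen stumps are the "selected" ones. All function names and the toy data are hypothetical.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """Predict +polarity when the feature exceeds the threshold, else -polarity."""
    return polarity if row[feature] > threshold else -polarity

def best_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Train `rounds` stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def selected_features(model):
    """Feature indices used by at least one stump."""
    return sorted({j for _, j, _, _ in model})

# Toy data: column 1 separates the classes, column 0 does not.
X = [[3, 1], [3, 5], [7, 2], [7, 6]]
y = [-1, 1, -1, 1]
chosen = selected_features(adaboost(X, y, rounds=3))
```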
Performance (10-fold cross validation)

Accuracy: 83.6%
Class | Precision | Recall
-1    | 83.5%     | 93.8%
1     | 83.8%     | 63.3%
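For reference, accuracy, precision, and recall for one class can be computed from true and predicted labels as sketched below. The function name and the example labels are hypothetical. Returning 0.0 when a denominator is zero is a convention chosen here so that a run with no predicted positives still yields a number.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels: 1 = successful invitation, -1 = failed invitation.
truth = [1, 1, 1, -1, -1]
guess = [1, 1, -1, -1, 1]
acc, prec, rec = binary_metrics(truth, guess)
```

Calling the function with `positive=-1` gives the row for the failed class.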
SVM performance

SVM-light (10-fold cross-validation), values in %
fold    | accuracy | precision | recall
1       | 80.77    | 100       | 58.33
2       | 80.77    | 100       | 44.44
3       | 88.46    | 100       | 62.5
4       | 76.92    | 50        | 33.33
5       | 73.08    | 100       | 30
6       | 84.62    | 100       | 50
7       | 69.23    | 50        | 50
8       | 76.92    | 100       | 53.85
9       | 88.46    | 100       | 66.67
10      | 88.24    | 80        | 57.14
average | 80.747   | 88        | 50.626
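The cross-validation procedure behind a table like this can be sketched as follows. The contiguous fold assignment in `k_fold_indices` is an assumption for illustration (the slides do not say how examples were assigned to folds, and in practice the examples are usually shuffled first). The per-fold accuracies are the ones reported above, and averaging them reproduces the reported 80.747.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def mean(values):
    return sum(values) / len(values)

# 90 successful + 178 failed invitations from the Status slide.
folds = list(k_fold_indices(268, 10))

# Per-fold accuracy (%) as reported in the table.
fold_accuracy = [80.77, 80.77, 88.46, 76.92, 73.08,
                 84.62, 69.23, 76.92, 88.46, 88.24]
average_accuracy = mean(fold_accuracy)
```

Each example lands in exactly one test fold, so every invitation is used for testing once and for training in the remaining folds.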
Weights from SVM
[Figure: feature weight distribution. Y-axis: weight, from -0.4 to 1.8. X-axis: feature. Features shown: receiver_age, receiver_groupCount, receiver_isChristian, sender_isWorking, sender_isModerate, sender_isCollege, sender_isInARelationship, sender_isChristian, sender_isMarried, sender_isOther, sender_age, receiver_photoTagged, receiver_isMiddleEastern, receiver_isMuslim, sameReligion, sameCollege, sender_networkCount, receiver_isAtheist/Agnostic, receiver_eventCount, sameWorkPlace, receiver_isWorking, receiver_noteCount, sender_wallCount, receiver_isRepublic.]
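For a linear SVM, each feature has one weight, and the sign and size of that weight show how strongly the feature pushes an example toward the successful class. A minimal sketch of reading such weights follows; the weight values, the bias, and the function names are invented for illustration and are not taken from the chart.

```python
def rank_by_weight(weights):
    """Sort (feature, weight) pairs by absolute weight, largest first."""
    return sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)

def decision_value(weights, bias, features):
    """Linear score w.x + bias; a positive score predicts class 1."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items()) + bias

# Hypothetical weights using feature names that appear on the slide.
weights = {"receiver_age": 1.6, "sender_wallCount": -0.3, "sameCollege": 0.4}
ranking = rank_by_weight(weights)
score = decision_value(weights, -0.5, {"receiver_age": 0.5, "sameCollege": 1.0})
```

Weights are only comparable across features when the inputs share a scale, which is one reason to normalize numerical features before training.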
Result

SVM-light performance
209 records split into 5 folds: 4 for training, 1 for testing.
Performance on the testing set:
  Accuracy: 71.43% (30 correct, 12 incorrect, 42 total)
  Precision/recall: 55.56%/38.46%
Top weighted features:
  8, sender_events_invited
  4, sender_friend_count
  11, sender_gender
  35, receiver_is_It's Complicated
  5, sender_wall_post_count
  9, sender_note_count
  27, sender_is_In a Relationship
[Figure: feature weights distribution. Y-axis: weight, from -0.6 to 1.4. X-axis: feature index, from 0 to 40, with points labeled by feature index.]
So, the story can be: when a sender who has been invited to a greater number of events on Facebook, has more friends, wrote more Facebook notes (blog entries), has fewer wall posts, is female, and is in a relationship tries to infect a person whose relationship status is “it’s complicated”, the infection is more likely to happen than in other cases.
SVM with features selected by AdaBoost
Values in %
fold    | accuracy | precision | recall
1       | 80.77    | 100       | 58.33
2       | 80.77    | 83.33     | 55.56
3       | 88.46    | 100       | 62.5
4       | 73.08    | 0         | 0
5       | 76.92    | 100       | 40
6       | 84.62    | 83.33     | 62.5
7       | 76.92    | 66.67     | 50
8       | 80.77    | 100       | 61.54
9       | 96.15    | 100       | 88.89
10      | 91.18    | 83.33     | 71.43
average | 82.96    | 81.67     | 55.075
Background

Diffusion of Innovation

Question:



How does it work in large online social networks?
What are the key factors in determining the
success of infection?
Can we predict the propagation path?

Hypothesis
Social influence depends on 5 dimensions of similarity:
  geographical distance:
  current location (country/state/city), current school, current major, year of class,
  current workplace, current courses enrolled;

  background similarity:
  sex, sexual preference, dating interest, relationship interest, relationship status,
  birthday, political view, religious view, hometown address, previous school,
  previous workplace;

  social similarity:
  number of mutual networks they belong to, number of mutual friends;

  interest similarity:
  activities, favorite books, favorite music, favorite movies, favorite TV shows,
  favorite quotes;

  social status distance:
  difference in number of friends, difference in wall post counts, difference in counts
  of messages sent and received, difference in note counts.
Project Description

Objectives



Identify the key factors for social influence;
Predict occurrence of adoption based on the key
factors.
Friendship Quiz



A Facebook application we developed;
Enables users to make quizzes and send them to their
friends (take a peek!);
We track the spread of the application.
Highlights




A real-world diffusion of innovation;
Rich and (most of the time) trustworthy profile
information of individuals and their social
connections/activities;
Precisely timestamped diffusion process, a
complete log of events;
Ongoing diffusion process
Backup: Threshold Model
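As a reminder of how a threshold model works, here is a minimal Python sketch of a linear threshold process: a node adopts once the total influence weight of its adopting neighbors reaches its personal threshold. The function name, the edge weights, and the thresholds are hypothetical. In the formulation by Kempe, Kleinberg, and Tardos the thresholds are drawn at random, whereas here they are passed in explicitly to keep the example deterministic.

```python
def simulate_threshold(weights, thresholds, seeds):
    """Run a linear threshold process until no more nodes adopt.

    weights[(u, v)] is the influence of u on v. Node v adopts once the
    summed weight of its adopting in-neighbors reaches thresholds[v].
    """
    incoming = {}
    for (u, v), w in weights.items():
        incoming.setdefault(v, []).append((u, w))
    adopters = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, edges in incoming.items():
            if v in adopters:
                continue
            pressure = sum(w for u, w in edges if u in adopters)
            if pressure >= thresholds[v]:
                adopters.add(v)
                changed = True
    return adopters

# Hypothetical influence weights and thresholds.
weights = {("a", "c"): 0.4, ("b", "c"): 0.4, ("c", "d"): 0.9}
thresholds = {"c": 0.5, "d": 0.8}
one_seed = simulate_threshold(weights, thresholds, ["a"])
two_seeds = simulate_threshold(weights, thresholds, ["a", "b"])
```

With one seed, node c stays below its threshold and nothing spreads. With two seeds, c adopts and then tips d.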