Practical Applications of
Complex Network Theory
Niloy Ganguly (IIT Kharagpur)
Complex Network
 Society and nature of human interaction is a
Complex System
 Complex Network: A generic tool to model
complex systems
 There is a growing body of work on CNT Theory
 Applied to a variety of fields – Social, Biological,
Physical & Cognitive sciences, Engineering &
Technology
Objective of this Tutorial
 Familiarize with tools and techniques of complex
Network
 Present practical examples where complex
network concepts are used
Peer to peer networks
Language modeling
Outline of the Tutorial
 Part I: Introduction
 Network Analysis Techniques
 Network Synthesis Techniques
 Part II: Case Studies
 Topology management of peer to peer networks
 Self-organization of Sound Systems
 Conclusion and Discussions
Complex System
 Non-trivial properties and patterns emerging
from the interaction of a large number of simple
entities
 Self-organization: The process through which
these patterns evolve without any external
intervention or central control
 Emergent Property or Emergent Behavior: The
pattern that emerges due to self-organization
The best example from nature
A termite "cathedral" mound
produced by a termite colony
Emergence of a networked life
Communities
Atom
Organisms
Molecule
Tissue
Cell
Organs
Complex Network Theory
 Handy toolbox for modeling mesoscopy
 Marriage of Graph theory, social networks and
Statistics
 Complex because:
 Non-trivial topology
 Difficult to specify completely
 Usually large (in terms of nodes and edges)
 Dynamicity
 Provides insight into the nature and evolution of
the system being modeled
Internet and www
Genetic interaction network
9-11 Terrorist Network
Social Network Analysis is a
mathematical methodology for
connecting the dots -- using
science to fight terrorism.
Connecting multiple pairs of
dots soon reveals an emergent
network of organization.
CNT Examples:
Road and Airlines Network
What Questions can be asked
 Does these networks display some symmetry?
 Are these networks creation of intelligent objects
or they have emerged?
 How have these networks emerged?
 What are the underlying simple rules leading to
their complex formation?
Bi-directional Approach
Analysis of the real-world networks
Global topological properties
Community structure
Node-level properties
Synthesis of the network by means of
some simple rules
Preferential attachment models
Small-world models
Topological
Characterization
of Networks
Types Of Networks and Representation
Unipartite
Binary/
Weighted
Undirected/
Directed
Bipartite
Binary/
Weighted
Undirected/
Directed
Representation
a
b
c
1. Adjacency Matrix
a
0
1
1
2. Adjacency List
b
1
0
1
c
1
1
0
a
{b,c}
b
{a,c}
c
{a,b}
Properties of Adjacency Matrix
A={aij}, where i and j are nodes and aij=1 if
there is an edge between i an j.
A2 = A*A; Entries denote number paths of
length 2 between any two node (Σa
*a
)
ik
kj
k
In general, An denotes number of paths of
length n
Trace(A) = Σaii
How is the trace of A3 related to the
number of triangles in the network?
Characterization of Complex Networks
 They have a non-trivial topological structure
 Properties:
Heavy tail in the degree distribution (non-negligible
probability mass towards the tail; more than in the case of
an exp. distribution)
High clustering coefficient
Centrality Properties
Social Roles & Equivalence
Assortativity
Community Structure
Random Graphs & Small avg. path length
Preferential attachment
Small World Properties
Degree Distribution (DD)
 Let pk be the fraction of vertices in the network that has a
degree k.
 The k versus pk plot is defined as the degree distribution
of a network
 For most of the real world networks these distributions
are right skewed with a long right tail showing up values
far above the mean – pk varies as k-α
 Due to noisy and insufficient data sometimes the
definition is slightly modified
Cumulative degree distribution is plotted
 Probability that the degree of a node is greater than or
equal to k
A Few Examples
Power law: Pk ~ k-α
Friend of Friends
 Consider the following scenario
 Sourish and Ravi are friends
 Sourish and Shaunak are friends
 Are Shaunak and Ravi friends?
 If so then …
Ravi
Saurish
Saunak
 This property is known as transitivity
Measuring Transitivity: Clustering Coefficient
 The clustering coefficient for a vertex ‘v’ in a network
is defined as the ratio between the total number of
connections among the neighbors of ‘v’ to the total
number of possible connections between the
neighbors
 High clustering coefficient means my friends know
each other with high probability – a typical property
of social networks
Mathematically…
 The clustering coefficient of a vertex i is
# of links between ‘n’ neighbors
Ci =
n(n-1)/2
 The clustering coefficient of the whole network is
the average
C=
1
N
∑Ci
 Alternatively,
C=
# triangles in the n/w
# triples in the n/w
N etw o rk
C
C ran d
L
N
WWW
0 .1 078
0 .0 002 3
3 .1
1 531 27
In tern et
0 .1 8-0.3
0 .0 01
3 .7 -3 .7 6
3 015 6 209
A cto r
0 .7 9
0 .0 002 7
3 .6 5
2 252 26
C o au th orsh ip
0 .4 3
0 .0 001 8
5 .9
5 290 9
M etab o lic
0 .3 2
0 .0 26
2 .9
2 82
F oo dw eb
0 .2 2
0 .0 6
2 .4 3
1 34
C . eleg an ce
0 .2 8
0 .0 5
2 .6 5
2 82
Centrality
 Centrality measures are commonly described as
indices of 4 Ps -- prestige, prominence,
importance, and power
Degree – Count of immediate neighbors
Betweenness – Nodes that form a bridge between two
regions of the n/w
 Where σst is total number of shortest paths between s and
t and σst (v) is the total number of shortest paths from s to t
via v
Eigenvector centrality – Bonacich (1972)
It is not just how many people knows me
counts to my popularity (or power) but how
many people knows people who knows
me – this is recursive!
In context of HIV transmission – A person
x with one sex partner is less prone to the
disease than a person y with multiple
partners
But imagine what happens if the partner of x
has multiple partners
The basic idea of eigenvector centrality
Definition
 Eigenvector centrality is defined as the principal
eigenvector of the adjacency matrix
 Eigenvector of any symmetric matrix A = {aij} is
any vector e such that
Where λ is a constant and ei is the centrality of the node i
 What does it imply – centrality of a node is
proportional to the centrality of the nodes it is
connected to (recursively)…
 Practical Example: Google PageRank
Node Equivalence
Social Roles – Nodes (actors) in a social
n/w who have similar patterns of relations
(ties) with other nodes.
Three Different ways to find equivalence
classes:
Structural Equivalence
Automorphic Equivalence
Regular Equivalence
Structural Equivalence
Two nodes are said to be exactly
structurally equivalent if they have the
same relationships to all other nodes.
Computation:
Let A be the adjacency matrix.
Compute the Euclidean Distance
/Pearson Correlation between a
pair or rows/columns representing
the neighbor profile of two nodes
(say i and j). This value shows how
much structurally similar i and j are.
Automorphic Equivalence
 The idea of automorphic equivalence is that sets
of actors can be equivalent by being embedded
in local structures that have the same patterns of
ties -- "parallel" structures.
Swap(B,D) with all their
neighbors:The distances among
all the actors in the graph would
be exactly identical
Path vectors of i: how many
nodes are at distance 1, 2, …
from node i.
Amount of Equivalence:
Distance between path vectors
Regular Equivalence
Two nodes are said to be regularly
equivalent if they have the same profile of
ties with members of other sets of actors
that are also regularly equivalent.
1 tie with Class II
Class I
Class II
No tie with Class III
1 tie with Class I
1/2 tie(s) with Class III
Class III
No tie with Class I
1 tie with Class II
Assortativity (homophily)
Rich goes with the rich (selective linking)
A famous actor (e.g., Shah Rukh Khan) would
prefer to pair up with some other famous actor
(e.g., Rani Mukherjee) in a movie rather than
a new comer in the film industry.
Assortative
Scale-free network
Disassortative
Scale-free network
Measures of Assortativity
ANND (Average nearest neighbor degree)
Find the average degree of the neighbors of each node
i with degree k
Find the Pearson correlation (r) between the degree of i
and the average degree of its neighbors
For further reference see the supplementary material
Community structure
Community structure: a group of vertices that
have a high density of edges within them and a
low density of edges in between groups
Example:
•Friendship n/w of children
•Citation n/ws: research interest
•World Wide Web: subject matter
of pages
•Metabolic networks: Functional
units
•Linguistic n/ws: similar linguistic
categories
Some Examples
Community Structure in
Political Books
Community structure in a Social n/w of
Students (American High School)
Community Identification Algorithms
 Hierarchical
 Girvan-Newman
 Radicchi et al.
 Chinese Whispers
 Spectral Bisection
See (Newman 2004) for a comprehensive
survey (you will find the ref. in the
supplementary material)
Girvan-Newman Algorithm
Bisection Method
Calculate the betweenness for all edges in the
network.
Remove the edge with the highest
betweenness.
Recalculate betweennesses for all edges
affected by the removal.
Repeat from step 2 until no edges remain.
Evolution of Networks
Processes on Networks
Random Graphs & Small
Average Path Length
Q: What do we mean by a ‘random graph’?
A: Erdos-Renyi random graph model:
For every pair of nodes, draw an edge
between them with equal probability p.
Degrees of Separation in a Random Graph
Poisson distribution
• N nodes
• z neighbors per node, on average, z =<k>
• D degrees of separation
z
D
 N
D 
log N
log z
P(k)~ e-<k> <k>k/k!
Degree Distributions
A irlines
P oisson distribution
Exp o n e n tia l N e tw o rk
P ow er-law distribution
S cale -fre e N e tw o rk
Degree distributions for various networks
(a) World-Wide Web
(b) Coauthorship
networks: computer
science, high energy
physics, condensed
matter physics,
astrophysics
(c) Power grid of the
western United
States and Canada
(d) Social network of 43
Mormons in Utah
How do Power law DDs arise?
Barabási-Albert Model of Preferential Attachment
(Rich gets Richer)
(1) GROWTH : Starting with a small number of nodes (m0) at
every timestep we add a new node with m (<=m0) edges
(connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT : The probability Π that
a new node will be connected to node i depends on the
connectivity ki of that node
 (ki ) 
ki
 jk j
A.-L.Barabási, R. Albert, Science 286, 509 (1999)
Growth analysis
Markov chain representation
Probability that the new edge is attached to any of the vertices of degree k
where total number of edges
Growth analysis
Markov chain representation
Growth dynamics at time (t+1)
Number of
nodes of
degree (k-1) at
t
Number of nodes
of degree k at t
Number of
nodes of
degree k at
t+1
Growth analysis
Markov chain representation
The net change in npk per vertex added
In the stationary solution, we find
Which results
The World is Small!
 “Registration fee for ICDCN 2008 are being
waived for all participants – get it collected from
the registration counter”
 How long do you think the above information will
take to spread among yourselves
 Experiments say it will spread very fast – within 6
hops from the initiator it would reach all
 This is the famous Milgram’s six degrees of
separation
The Small World Effect
Even in very large social networks, the average distance
between nodes is usually quite short.
Milgram’s small world experiment:
 Target individual in Boston
 Initial senders in Omaha, Nebraska
 Each sender was asked to forward a packet to a friend
who was closer to the target
 Friends asked to do the same
Result: Average of ‘six degrees’ of separation.
S. Milgram, The small world problem, Psych. Today, 2 (1967), pp. 60-67.
Measure of Small-Worldness
 Low average geodesic path length
 High clustering coefficient
 Geodesic path – Shortest path through the
network from one vertex to another
 Mean path length
 ℓ = 2∑i≥jdij/n(n+1) where dij is the geodesic distance from
vertex i to vertex j
Most of the networks observed in real world have ℓ ≤ 6
 Film actors
 Company Directors
 Emails
 Internet
 Electronic circuits
3.48
4.60
4.95
3.33
4.34
Clustering
C = Probability that two of a node’s neighbors
are themselves connected
In a random graph: Crand ~ 1/N (if the average
degree is held constant)
Watts-Strogatz ‘Small World’ Model
Watts and Strogatz introduced this simple model to
show how networks can have both short path lengths
and high clustering.
D. J. Watts and S. H. Strogatz, Collective dynamics of “small-world”
networks, Nature, 393 (1998), pp. 440–442.
Small-world model
 Used for modeling network transitivity
 Many networks assume some kind of
geographical proximity
 Small-world model:
Start with a low-dimensional regular lattice
Rewire:
Add/remove edges to create shortcuts to join
remote parts of the lattice
For each edge with prob p move the other end to a
random vertex
 Rewiring allows to interpolate between regular
lattice and random graph
Small-world model
 Regular lattice (p=0):
Clustering coefficient C=(3k3)/(4k-2)=3/4
Mean distance L/4k
 Almost random graph
(p=1):
Rewiring probability p
Clustering coefficient C=2k/L
Mean distance log L / log k
 No power-law degree
distribution
Degree distribution
Case Study I
Analyzing the Vulnerability of Superpeer
Networks Against Churn and Attack
 Poster - Developing Analytical Framework to Measure
Stability of P2P Networks, ACM Sigcomm 2006 Pisa, Italy
 Brief Communication - Measuring Robustness of Superpeer
Topologies, PODC 2007, Portland, USA
 How stable are large superpeer networks against attack?
The Seventh IEEE Conference on Peer-to-Peer Computing,
Galway, Ireland, 2007
 Full paper - Analyzing the Vulnerability of the Superpeer
Networks Against Attack, ACM CCS, 14th ACM Conference
on Computer and Communications Security, Alexandria,
USA, 29 October - 2 Nov, 2007.
Client/Server architecture
Server
Client
Client
Internet
Client
Client
 Servers: Provide services.
 Clients : Request services from servers
 Very successful architecture
 WWW (HTTP), FTP, Web services, etc.
Client/Server architecture
Limitations
 Scalability : Hard to achieve
 Poor fault tolerance : Single point of failure
 Administration : Highly required
Peer to Peer architecture
Node
Node
Node
Internet
Node
Node
 All peers act as both clients and servers i.e. Servent (SERVer+cliENT)
 Provide and consume data
 Any node can initiate a connection
 No centralized data source
 “The ultimate form of democracy on the Internet”
 File sharing and other applications like IP telephony, distributed
storage, publish subscribe system etc
Peer to peer and overlay network
 An overlay network is built on
top of physical network
 Nodes are connected by virtual
or logical links
Underlying physical network
becomes unimportant
Interested in the complex graph
structure of overlay
Dynamicity of overlay networks
 Peers in the p2p system leave network
randomly without any central coordination (user
churn)
 Important peers are targeted for attack
 DoS attack drown important nodes in fastidious
computation
 Fail to provide services to other peers
 Importance of a node is defined by centrality measures
 Like degree centrality, betweenness centraliy etc
Dynamicity of overlay networks
 Peers in the p2p system leave network
randomly without any central coordination
(user churn)
 Important peers are targeted for attack
 Makes overlay structures highly dynamic in
nature
 Frequently it partitions the network into smaller
fragments
 Communication between peers become
impossible
Problem definition
 Investigating stability of the networks against the churn
and attack
Network Topology+ Dynamicity
= How (long) stable
 Developing an analytical framework
 Examining the impact of different structural parameters
upon stability
 Peer contribution
 degree of peers, superpeers
 their individual fractions
Steps followed to analyze
 Modeling of
 Overlay topologies
 pure p2p networks, superpeer networks, hybrid networks
 Various kinds of failures and attacks
 Defining stability metric
 Developing the analytical framework
 Validation through simulation
 Understanding impact of structural parameters
Modeling overlay topologies
 Topologies are modeled by various random
graphs characterized by degree distribution
pk = Fraction of nodes having degree k
Examples:
 Erdos-Renyi graph
 Scale free network
 Superpeer networks
Modeling overlay topologies:
E-R graph, scale free networks
 Erdos-Renyi graph
Degree distribution follows
Poisson distribution.
k
Average
degree
z
k!
 Scale free network
Degree distribution follows
power law distribution
p k  ck

0.7
0.6
Probability distribution (pk)
pk 
z e
0.5
0.4
0.3
0.2
0.1
0
0
5
10
15
20
25
30
Node degree (k)
35
40
45
50
Modeling overlay topologies:
Superpeer networks
 Superpeer networks emerge as most widely used
network




Small fraction of nodes are superpeers and rest are peers
KaZaA adopted this kind of topology
Can be modeled using bimodal degree distribution
Mathematically p k  0 if k  k l , k m
otherwise p k  0
Superpeer Node
Peer node
Modeling peer dynamics
 We propose a generalized model for peer dynamics
 Probability of removal of a node having degree k is
 fk  k,  models peer dynamics
 By changing the value of , we can obtain various peer
dynamics like
 random failure, degree dependent failure
 deterministic and degree dependent attack
 qk
event

models the probability of survival of a node of degree k after the disrupting
 qk=1-fk
Generalized model for peer dynamics
  = 0 (degree independent failure)
Probability of removal of a node (fk) is constant & degree
independent i.e. qk=q
  < 0 (degree dependent failure)
Probability of removal of a node (fk) is inversely
proportional to the degree of that node (1/k)
Peers having lower connectivity or bandwidth are less
stable because they enter and leave network frequently
  > 0 (Attack)
Peers with high degrees are targeted.
Modeling: Attack
Deterministic attack
Nodes having high degrees are progressively removed
 qk=0 when k>kmax
 0< qk< 1 when k=kmax
 qk=1 when k<kmax
Degree dependent attack
Nodes having high degrees are likely to be removed
Probability of removal of node having degree k
fk  1  qk  k

Stability Metric:
Percolation Threshold
Initially all the nodes in the
network are connected
Forms a single component
Nodes in the network
are connected and
form a single giant
component
Stability Metric:
Percolation Threshold
f fraction of
nodes
removed
Initial single
connected
component
Initially all the nodes in the
network are connected
Forms a single component
Size of the giant component is the
order of the network size
Giant component carries the
structural properties of the entire
network
Giant component
still exists
Stability Metric:
Percolation Threshold
f fraction of
nodes
removed
Initial single
connected
component
Giant component
still exists
fc fraction
of nodes
removed
The entire graph
breaks into smaller
fragments
Therefore fc =1-qc becomes the percolation threshold
Ordinary Percolation on Lattices
Fill in each link (bond percolation) or site (site percolation)
with probability p and ask questions about the sizes of
connected components.
Development of the analytical
framework
 Generating function:
 Formal power series whose coefficients encode information
P ( x )  a  a x  a x  a x  .........   a x

2
0
1
2
3
k
3
k
k 0
Here ( a 0 , a1 , a 2 ,.....)
encode information about a sequence
 Used to understand different properties of the graph


G0 ( x) 

pk x
k
k 0
generates probability distribution of the
vertex degrees.
 Average degree z   k   G 0 ' (1)
Development of the analytical
framework
 p k .q k
specifies the probability of a node having degree k to be present
in the network after fk = (1-qk) fraction of nodes removed.
k


F0 ( x ) 

k 0
k
.
p k q k x becomes the corresponding generating function
pk
(1-qk)
fraction of nodes
removed
p k .q k
Development of the analytical
framework

p k .q k
specifies the probability of a node having degree k to be
k
present in the network after (1-qk) fraction of nodes removed.


F0 ( x ) 

k
p k q k x becomes the corresponding generating function.
k 0
 Distribution of the outgoing edges of first neighbor of a randomly chosen
node
F1 ( x ) 
 kp
k
qk x

k
 kp
k
k 1
k

F0 ( x )
z
p k .q k
Random
node
First
neighbor
Development of the analytical
framework
 H1(x) generates the distribution of the size of the components
that are reached through random edge
 H1(x) satisfies the following condition
H 1 ( x )  1  F1 (1)  xF1 ( H 1 ( x ))
Development of the analytical
framework
 H 0 ( x ) generates distribution for the component size to which a
randomly selected node belongs to
H 0 ( x )  1  F0 (1)  xF0 ( H 1 ( x ))
 Average size of the components


F0 (1) F1 (1)
H 0 (1)  F0 (1) 

1  F1 (1)
 Average component size becomes infinity when
1  F1(1)  0
Development of the analytical
framework
1  F1(1)  0

Average component size becomes infinity when

With the help of generating function, we derive the following critical
condition for the stability of giant component


kp k ( kq k  q k  1)  0
k 0
Degree distribution

Peer dynamics
The critical condition is applicable
 For any kind of topology (modeled by pk)
 Undergoing any kind of dynamics (modeled by 1-qk)
Stability metric: simulation
 The theory is developed based on the concept
of infinite graph
 At percolation point
theoretically ‘infinite’ size graph reduces to the ‘finite’
size components
 In practice we work on finite graph
cannot simulate the phenomenon directly
 We approximate the percolation phenomenon
on finite graph with the help of condensation
theory
How to determine percolation point
during simulation?
 Let s denotes the size of a component and ns determines the
number of components of size s at time t
 At each timestep t a fraction of nodes is removed from the
network
sn s
CS
(
s
)

 Calculate component size distribution
t
 sn
s
s
 If CS t ( s ) becomes monotonically decreasing function at the
time t
 t becomes percolation point
Initial condition (t=1)
Intermediate condition (t=5)
Percolation point (t=10)
Outline of the results
Networks under
consideration
Superpeer networks
(Characterized by
bimodal degree
distribution )
Disrupting events
Degree independent
failure or random failure
Degree dependent failure
Degree dependent attack
Deterministic attack
(special case of degree
dependent attack ??)
Outline of the results
Networks under
consideration
Superpeer networks
(Characterized by
bimodal degree
distribution )
Disrupting events
Degree independent
failure or random failure
Degree dependent failure
Degree dependent attack
Deterministic attack
(special case of degree
dependent attack ??)
Stability against various failures
Degree independent random failure :
Percolation threshold
fc  1 
1
k 
2
k 
1
For superpeer networks
fc  1 
k r
 k   2  k  k m  2 rk m  k   r  k   k m  rk m
2
Average degree of the
network
2
Superpeer degree
2
Fraction of
peers
Stability against random failure
(superpeer networks)
 Comparative study between theoretical and
experimental results
 We keep average degree  k   5 fixed
0.95
fr (Percolation threshold)
fr (Percolation threshold)
0.95
0.9
0.85
0.85
0.8
Theoretical Km=30
Experimental Km=30
0.75
0.8
0.75
0.7
0.65
0.9
0.7
0.65
0.92
0.9
0.95
r (Fraction of peers)
1
Theoretical Km=50
Experimental Km=50
0.94
0.96
0.98
r (Fraction of peers)
1
Stability against random failure
(superpeer networks)
 Comparative study between theoretical and experimental
results
0.95
fr (Percolation threshold)
fr (Percolation threshold)
0.95
0.9
0.85
0.85
0.8
Theoretical Km=30
Experimental Km=30
0.75
0.9
0.95
r (Fraction of peers)
0.8
0.75
0.7
0.65
0.9
1
Theoretical Km=50
Experimental Km=50
0.7
0.65
0.92
0.94
0.96
0.98
r (Fraction of peers)
Increase of the fraction of superpeers (specially
above 15% to 20%) increases stability of the network

1
Stability against random failure
(superpeer networks)
 Comparative study between theoretical and experimental
results
0.95
fr (Percolation threshold)
fr (Percolation threshold)
0.95
0.9
0.85
0.85
0.8
Theoretical Km=30
Experimental Km=30
0.75
0.9
0.95
r (Fraction of peers)
0.8
0.75
0.7
0.65
0.9
1
Theoretical Km=50
Experimental Km=50
0.7
0.65
0.92
0.94
0.96
0.98
r (Fraction of peers)
1
There is a sharp fall of fc when fraction of superpeers
is less than 5%

Stability of superpeer networks
against deterministic attack
 Removal of a fraction of high
degree nodes are sufficient to
breakdown the network
 Case 2:
 Removal of all the high
degree nodes are not
sufficient to breakdown the
network
 Have to remove a fraction of
low degree nodes
1
Theoretical model (Case 1)
Theoretical model (Case 2)
Simulation results
Average degree k=10
Superpeer degree k =50
0.9
ft (Percolation threshold)
Two different cases may arise
 Case 1:
0.8
0.7
m
0.6
0.5
0.4
0.3
0.2
0.1
0
0
2
4
6
kl (Peer degree)
8
10
Stability of superpeer networks
against deterministic attack
 Removal of a fraction of high
degree nodes are sufficient to
breakdown the network
 Case 2:
1
0.9
ft (Percolation threshold)
Two different cases may arise
 Case 1:
0.8
0.7
Theoretical model (Case 1)
Theoretical model (Case 2)
Simulation results
Average degree k=10
Superpeer degree k =50
m
0.6
0.5
0.4
0.3
0.2
0.1
 Removal of all the high
0
0
2
4
6
degree nodes are not
kl (Peer degree)
sufficient to breakdown the
 Interesting
observation in case 1
network

Have todecreases
remove awith
fraction
of
 Stability
increasing
value of peers –
low degree nodes
counterintuitive
8
10
Peer contribution
 Controls the total bandwidth contributed by the
peers
Determines the amount of influence superpeer nodes
exert on the network
 Peer contribution
where
is the average degree
 We investigate the impact of peer contribution
upon the stability of the network
Impact of peer contribution for
deterministic attack
• The influence of high degree peers increases with
the increase of peer contribution
• This becomes more eminent as peer contribution
Impact of peer contribution for
deterministic attack
• Stability of the networks (
) having peer contribution
primarily depends upon the stability of peer ( )
Impact of peer contribution for
deterministic attack
 Stability of the network increases with peer contribution
for peer degree kl=3,5
 Gradually reduces with peer contribution for peer degree
kl=1
Stability of superpeer networks against
degree dependent attack
 Probability of removal of a node is directly
f k
proportional to its degree k
Hence
C  km

 Calculation of normalizing constant C
k
f

 Minimum value
C
 This yields an inequality

k
rk l
 1
( k l  1)  (1  r ) k m
 1

( k m  1)  k m (  k  ( k m  k l )  k m  2  k  )
Stability of superpeer networks against
degree dependent attack
 Probability of removal of a node is directly
proportional to its degree f k  k 
Hence
C  km

 Calculation of normalizing constant C
k
f

 Minimum value
C
 The solution set of the above inequality can be

k
either bounded ( 0   c   c )
or unbounded ( 0    )
c
bd
Degree dependent attack:
Impact of solution set
Three situations may arise
Removal of all the superpeers along with a
fraction of peers – Case 2 of deterministic
attack
Removal of only a fraction of superpeer – Case
1 of deterministic attack
Removal of some fraction of peers and
superpeers
Degree dependent attack:
Impact of solution set
Three situations may arise
Case 2 of deterministic attack
 Networks having bounded solution set

bd
k


 If  c   c , f  1
f

(0   c   c
bd
)
c
c
c
p
sp
l
C c
Case 1 of deterministic attack
 Networks having unbounded solution set ( 0   c  )


 If  c  ,
fp  0
0  f sp  1
c
c
Degree Dependent attack is a generalized case of
deterministic attack
Impact of critical exponent c Validation through
simulation
Bounded solution set with
 c  1 . 17
bd

Removal of any combination of
where
disintegrates the network

At   1 .17, all superpeer need to be removed
along with a fraction of peers
bd
c
Performed simulation
on graphs with N=5000
and 500 cases
 Good agreement between theoretical and
simulation results

Case Study :
Superpeer network with
kl=3, km=25, k=5
Summarization of the results
 Random failure
 Stability increases with superpeer degree and its fraction
 Drastic fall of the stability when fraction of superpeers is less
than 5%
 In deterministic attack, networks having small peer degrees
are very much vulnerable
 Increase in peer degree improves stability
 Superpeer degree is less important here!
 In degree dependent attack,
 Stability condition provides the critical exponent
 Amount of peers and superpeers required to be removed is
dependent upon
 More general kind of attack
Limitations
We have not considered the change in the
degree distribution in the network due to
disrupting events
Assumed that nodes are turned OFF
during disrupting events
Topological change in the network should
be included in the theory
Deformation of the network due to node
removal
 Removal of a node along with its adjacent links
changes the degrees of its neighbors
Hence changes the topology of the network
 Let initial degree distribution of the network be pk
 Probability of removal of a node having degree k is fk
 We represent the new degree distribution pk’ as a
function of pk and fk
Deformation of the network due to node
removal
 In this diagram, left node set denotes the
survived nodes and right node set denotes the
removed nodes
 The change in the degree distribution is
due to the edges removed from the left set
 We calculate the number of edges
connecting left and right set (E)
Deformation of the network due to node
removal
 The total number of tips in the surviving
node set is
 The probability of finding a random
tip that is going to be removed is
 The total number of edges running
between two subset
Deformation of the network due to node
removal
Probability of finding an edge in the surviving
(left) subset that is connected to a node of
removed (right) subset
Deformation of the network due to node
removal
• Removal of a node reduces the degree of
the survived nodes
• node having degree > k becomes a node
having degree k by losing one or more edges
• Probability that a survived node will lose
one edge becomes
Deformation of the network due to node
removal
Probability of finding a node having degree k (pk’) after removal of nodes following fk, depends upon
1. Probability that nodes having degree k, k+1, k+2 … will lose 0, 1, 2 etc edges respectively to become a
node having degree k
2. Probability that nodes having degree k, k+1, k+2 … will sustain k number of edges with them
Hence
Where
q
denotes the fraction of nodes in the survived (left) node set having degree
Deformation of the network due to node
removal
For random failure
,
Hence the new degree distribution after random failure becomes
Deformation of the network due to node
removal
Degree distribution of the Poisson and power law networks after the attack of the form
Percolation Threshold
Initially all the nodes in the
network are connected
Forms a single giant component
Size of the giant component is the
order of the network size
Nodes in the network
are connected and
form a single giant
component
Giant component carries the
structural properties of the entire
network
Percolation Threshold
f fraction of
nodes
removed
Initial single
connected
component
Giant component
still exists
Percolation Threshold
f fraction of
nodes
removed
Initial single
connected
component
Giant component
still exists
fc fraction
of nodes
removed
The entire graph
breaks into smaller
fragments
Therefore fc =1-qc becomes the percolation threshold
Percolation threshold
 Percolation condition of a network having
degree distribution pk can be given as
 After removal of fk fraction of nodes, if the
degree distribution of the network becomes pk’,
then the condition for percolation becomes
 Which leads to the following critical condition
for percolation
Percolation threshold
 From this equation, the percolation threshold fq
can be obtained numerically for any kind of
network characterized by degree distribution pk
under any kind of disrupting event checterized
by fk
 For random failure
, percolation
threshold becomes
fc  1 
1
k 
2
k 
1
Conclusion
Contribution of the work
Development of general framework to analyze the stability of
superpeer networks
Modeling the dynamic behavior of the peers using degree
independent failure as well as attack.
Comparative study between theoretical and simulation results to
show the effectiveness of our theoretical model.
CASE STUDY II: Self-Organization
of the Sound Inventories
 Rediscovering the Co-occurrence Principles of the Vowel
Inventories: A Complex Network Approach, Advances in
Complex System
 Modeling the Co-occurrence Principles of the Consonant
Inventories: A Complex Network Approach, Int. Jour. of Modern
Phy. C
 Redundancy Ratio: An Invariant Property of the Consonant
Inventories of the World's Languages, ACL(07), Prague, Czech
Republic.
 Analysis and Synthesis of the Distribution of Consonants over
Languages: A Complex Network Approach, COLING-ACL(06),
Sydney, Australia.
Human Communication
Human beings and many other living
organisms produce sound signals
Unlike other organisms, they can
concatenate these sounds to produce new
messages – Language
Language is one of the primary
cause/effect of human intelligence
Human Speech Sounds

Human speech sounds are called
phonemes – the smallest unit of a
language
Phonemes are characterized by certain
distinctive features like

I.
Place of articulation
II. Manner of articulation
III. Phonation
Mermelstein’s Model
Types of Phonemes
Consonants
Vowels
/i/
/a/
L
/u/
/t/
/p/
Diphthongs
/ai/
/k/
Choice of Phonemes
How a language chooses a set of
phonemes in order to build its sound
inventory?
Is the process arbitrary?
Certainly Not!
What are the forces affecting this choice?
Forces of Choice
A Linguistic System – How does it look?
/a/
Speaker
Desires “ease of articulation”
/a/
Listener / Learner
Desires “perceptual contrast” / “ease of learnability”
The forces shaping the choice are opposing – Hence there has to be a non-trivial solution
Vowels: A (Partially) Solved Mystery
Languages choose vowels based on
maximal perceptual contrast.
For instance if a language has three
vowels then in more than 95% of the
cases they are /a/,/i/, and /u/.
/a/
/i/
/u/
Maximally Distinct
Consonants: A J
i
g
s
a
w
puzzle
Research: From 1929 – Date
No single satisfactory explanation of the
organization of the consonant inventories
The set of features that characterize
consonants is much larger than that of vowels
No single force is sufficient to explain this
organization
Rather a complex interplay of forces goes on in
shaping these inventories
The Approach & Objective
We adopt a Complex Network Approach to
attack the problem of consonant
inventories
We try to figure out the principle of the
distribution of the occurrence of
consonants over languages
We also attempt to figure out the cooccurrence patterns (if any) that are found
across the consonant inventories
Principle of Occurrence
 PlaNet – The “Phoneme-Language Network”
A bipartite network N=(VL,VC,E)
VL : Nodes representing languages of the world
VC : Nodes representing consonants
E : Set of edges which run between VL and VC
Languages
L1
/ŋ/
L2
/m/
L3
/d/
Consonants
 There is an edge e Є E between two nodes
 vl Є VL and vc Є VC if the consonant c occurs
 in the language l.
/θ/
/s/
L4
/p/
The Structure of PlaNet
Construction of PlaNet
Data Source : UCLA Phonological
Inventory Database (UPSID)
Number of nodes in VL is 317
Number of nodes in VC is 541
Number of edges in E is 7022
Degree Distribution of PlaNet
0.08
DD of the language nodes follows a βdistribution
pk = beta(k) with α = 7.06,
and β = 47.64
0.06
pk
0.04
Γ(54.7) k6.06(1-k)46.64
pk =
Γ(7.06) Γ(47.64)
0.02
DD of the consonant nodes follows a
power-law with an exponential cut-off
kmin= 5, kmax= 173, kavg= 21
0
50
100
150
11
200
0.1
0.1
Language inventory size (degree k)
Pk
0.01
0.01
Distribution of Consonants over
Languages follow a power-law
Pk = k -0.71
Exponential Cut-off
0.001
0.001
11
10
100
10
100
Degree of a consonant, k
1000
1000
Synthesis of PlaNet
Given: VL = {L1, L2, ..., L317} sorted in the ascending order of their degrees and
541 unlabeled nodes in VC .
Step 0: All nodes in VC have degree 0.
Step t+1:
Choose a language node Lj (in order) with cardinality kj (inventory size)
for c running from 1 to kj do
Connect Lj preferentially with a consonant node Ci VC, to
which it is already not connected, with a probability
Pr(Ci) =
diα+ ε
∑xV* (dxα + ε)
where, di = degree of node Ci at step t and V* = subset of VC not
connected to Lj at t and ε is the smoothing parameter.
The Preferential Mechanism of Synthesis
L1
L2
L3
L4
After step 3
L1
L2
L3
L4
After step 4
Simulation Result
1
PlaNet
PlaNetsyn
PlaNetrand
Pk
.1
.01
.001
1
10
100
1000
Degree (k)
The parameters α and ε are 1.44 and 0.5 respectively.
The results are averaged over 100 runs
Principle of Co-occurrence
 Consonants tend to co-occur in groups or
communities
 These groups tend to be organized around a few
distinctive features (based on: manner of
articulation, place of articulation & phonation) –
Principle of feature economy
plosive
voiced
bilabial
/b/
dental
/d/
then it will also tend to have
voiceless
/p/
/t/
If a language has
in its inventory
How to Capture these Co-occurrences?
 PhoNet – “Phoneme Phoneme Network”
A weighted network N=(VC,E)
VC : Nodes representing consonants
E : Set of edges which run between the nodes in VC
 There is an edge e Є E between two nodes vc1 ,vc2 Є VC
if the consonant c1 and c2 co-occur in a language. The
number of languages in which c1 and c2 co-occurs defines
the edge-weight of e. The number of languages in which c1
occurs defines the node-weight of vc1.
/k′/
50
14
42
/kw/
39
38
/k/
283
13
17
/d′/
Construction of PhoNet
 Data Source : UPSID
 Number of nodes in VC is 541
 Number of edges is 34012
PhoNet
Community Structures in PhoNet
 Radicchi et al. algorithm (for unweighted networks) –
Counts number of triangles that an edge is a part of.
Inter-community edges will have low count so remove
them.
 Modification for a weighted network like PhoNet
 Look for triangles, where the weights on the edges are
comparable.
 If they are comparable, then the group of consonants cooccur highly else it is not so.
 Measure strength S for each edge (u,v) in PhoNet where S
is,
wuv
if √Σi Є Vc-{u,v}(wui – wvi)2>0 else S = ∞
S=
2
√Σi Є Vc-{u,v}(wui – wvi)
 Remove edges with S less than a threshold η
Community Formation
1
52
110
100
3
101
2
10
5
5.17
10.94
4
45
46
1
S
11.11
6
3
0.06
4
7.14
2
5
7.5
3.77
6
η>1
1
For different values of η we get
different sets of communities
2
4
3
5
6
Consonant Societies!
η=0.35
η=0.60
η=0.72
η=1.25
The Binding Force of the Communities:
Feature Economy
 Feature Entropy: The idea is borrowed from information
theory
 For a community C of size N, let there be pf consonants
for which a particular feature f is present and qf other
consonants for which f is absent – probability that a
consonant chosen from C has f is pf /N and that it does
have f is qf /N or (1- pf /N)
 Feature entropy can be therefore defined as
FE = ΣFЄf(-(pf /N)log(pf /N) – (qf /N)log(qf /N))
 where F is the set of all features present in the consonants in C
 Essentially the number of bits needed to transmit the
entire information about C through a channel.
Computing Feature Entropy
Lower FE -> C1 economizes on
the number of features
Higher FE -> C2 does not economize
on the number of features
If the Inventories had Evolved by Chance!
 Construction of PhoNetrand
 For each consonant c let the frequency of occurrence in UPSID
be denoted by fc.
 Let there be 317 bins each corresponding to a language in
UPSID.
 fc bins are then chosen uniformly at random and the consonant
c is packed into these bins without repetition.
 Thus the consonant inventories of the 317 languages
corresponding to the bins are generated.
 PhoNetrand can be constructed from these new consonant
inventories similarly as PhoNet.
 Cluster PhoNetrand by the method proposed earlier
Comparison between PhoNet and PhoNetrand
Average Feature Entropy
10
5
PhoNet
PhoNetrand
0
0
5
10
15
20
Community Size
The curve shows the average feature entropy of the
communities of a particular size versus the community size
Conclusion
Covered
Understanding topological properties of
complex networks
Case Study – peer to peer networks and
language modeling
Not covered
Processes on networks
Search, Epidemics etc.
Contact
 Niloy Ganguly
[email protected]
http://www.facweb.iitkgp.ernet.in/~niloy/
 Workshop
Dynamics on and of complex network
 http://www.cel.iitkgp.ernet.in/~eccs07/
Sequel : Book Volume to be published in May 2008
from Birkhauser Boston.
Thank you!!
Descargar

Emergent Systems