Practical Applications of Complex Network Theory Niloy Ganguly (IIT Kharagpur) Complex Network Society and nature of human interaction is a Complex System Complex Network: A generic tool to model complex systems There is a growing body of work on CNT Theory Applied to a variety of fields – Social, Biological, Physical & Cognitive sciences, Engineering & Technology Objective of this Tutorial Familiarize with tools and techniques of complex Network Present practical examples where complex network concepts are used Peer to peer networks Language modeling Outline of the Tutorial Part I: Introduction Network Analysis Techniques Network Synthesis Techniques Part II: Case Studies Topology management of peer to peer networks Self-organization of Sound Systems Conclusion and Discussions Complex System Non-trivial properties and patterns emerging from the interaction of a large number of simple entities Self-organization: The process through which these patterns evolve without any external intervention or central control Emergent Property or Emergent Behavior: The pattern that emerges due to self-organization The best example from nature A termite "cathedral" mound produced by a termite colony Emergence of a networked life Communities Atom Organisms Molecule Tissue Cell Organs Complex Network Theory Handy toolbox for modeling mesoscopy Marriage of Graph theory, social networks and Statistics Complex because: Non-trivial topology Difficult to specify completely Usually large (in terms of nodes and edges) Dynamicity Provides insight into the nature and evolution of the system being modeled Internet and www Genetic interaction network 9-11 Terrorist Network Social Network Analysis is a mathematical methodology for connecting the dots -- using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization. CNT Examples: Road and Airlines Network What Questions can be asked Does these networks display some symmetry? Are these networks creation of intelligent objects or they have emerged? How have these networks emerged? What are the underlying simple rules leading to their complex formation? Bi-directional Approach Analysis of the real-world networks Global topological properties Community structure Node-level properties Synthesis of the network by means of some simple rules Preferential attachment models Small-world models Topological Characterization of Networks Types Of Networks and Representation Unipartite Binary/ Weighted Undirected/ Directed Bipartite Binary/ Weighted Undirected/ Directed Representation a b c 1. Adjacency Matrix a 0 1 1 2. Adjacency List b 1 0 1 c 1 1 0 a {b,c} b {a,c} c {a,b} Properties of Adjacency Matrix A={aij}, where i and j are nodes and aij=1 if there is an edge between i an j. A2 = A*A; Entries denote number paths of length 2 between any two node (Σa *a ) ik kj k In general, An denotes number of paths of length n Trace(A) = Σaii How is the trace of A3 related to the number of triangles in the network? Characterization of Complex Networks They have a non-trivial topological structure Properties: Heavy tail in the degree distribution (non-negligible probability mass towards the tail; more than in the case of an exp. distribution) High clustering coefficient Centrality Properties Social Roles & Equivalence Assortativity Community Structure Random Graphs & Small avg. path length Preferential attachment Small World Properties Degree Distribution (DD) Let pk be the fraction of vertices in the network that has a degree k. The k versus pk plot is defined as the degree distribution of a network For most of the real world networks these distributions are right skewed with a long right tail showing up values far above the mean – pk varies as k-α Due to noisy and insufficient data sometimes the definition is slightly modified Cumulative degree distribution is plotted Probability that the degree of a node is greater than or equal to k A Few Examples Power law: Pk ~ k-α Friend of Friends Consider the following scenario Sourish and Ravi are friends Sourish and Shaunak are friends Are Shaunak and Ravi friends? If so then … Ravi Saurish Saunak This property is known as transitivity Measuring Transitivity: Clustering Coefficient The clustering coefficient for a vertex ‘v’ in a network is defined as the ratio between the total number of connections among the neighbors of ‘v’ to the total number of possible connections between the neighbors High clustering coefficient means my friends know each other with high probability – a typical property of social networks Mathematically… The clustering coefficient of a vertex i is # of links between ‘n’ neighbors Ci = n(n-1)/2 The clustering coefficient of the whole network is the average C= 1 N ∑Ci Alternatively, C= # triangles in the n/w # triples in the n/w N etw o rk C C ran d L N WWW 0 .1 078 0 .0 002 3 3 .1 1 531 27 In tern et 0 .1 8-0.3 0 .0 01 3 .7 -3 .7 6 3 015 6 209 A cto r 0 .7 9 0 .0 002 7 3 .6 5 2 252 26 C o au th orsh ip 0 .4 3 0 .0 001 8 5 .9 5 290 9 M etab o lic 0 .3 2 0 .0 26 2 .9 2 82 F oo dw eb 0 .2 2 0 .0 6 2 .4 3 1 34 C . eleg an ce 0 .2 8 0 .0 5 2 .6 5 2 82 Centrality Centrality measures are commonly described as indices of 4 Ps -- prestige, prominence, importance, and power Degree – Count of immediate neighbors Betweenness – Nodes that form a bridge between two regions of the n/w Where σst is total number of shortest paths between s and t and σst (v) is the total number of shortest paths from s to t via v Eigenvector centrality – Bonacich (1972) It is not just how many people knows me counts to my popularity (or power) but how many people knows people who knows me – this is recursive! In context of HIV transmission – A person x with one sex partner is less prone to the disease than a person y with multiple partners But imagine what happens if the partner of x has multiple partners The basic idea of eigenvector centrality Definition Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix Eigenvector of any symmetric matrix A = {aij} is any vector e such that Where λ is a constant and ei is the centrality of the node i What does it imply – centrality of a node is proportional to the centrality of the nodes it is connected to (recursively)… Practical Example: Google PageRank Node Equivalence Social Roles – Nodes (actors) in a social n/w who have similar patterns of relations (ties) with other nodes. Three Different ways to find equivalence classes: Structural Equivalence Automorphic Equivalence Regular Equivalence Structural Equivalence Two nodes are said to be exactly structurally equivalent if they have the same relationships to all other nodes. Computation: Let A be the adjacency matrix. Compute the Euclidean Distance /Pearson Correlation between a pair or rows/columns representing the neighbor profile of two nodes (say i and j). This value shows how much structurally similar i and j are. Automorphic Equivalence The idea of automorphic equivalence is that sets of actors can be equivalent by being embedded in local structures that have the same patterns of ties -- "parallel" structures. Swap(B,D) with all their neighbors:The distances among all the actors in the graph would be exactly identical Path vectors of i: how many nodes are at distance 1, 2, … from node i. Amount of Equivalence: Distance between path vectors Regular Equivalence Two nodes are said to be regularly equivalent if they have the same profile of ties with members of other sets of actors that are also regularly equivalent. 1 tie with Class II Class I Class II No tie with Class III 1 tie with Class I 1/2 tie(s) with Class III Class III No tie with Class I 1 tie with Class II Assortativity (homophily) Rich goes with the rich (selective linking) A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with some other famous actor (e.g., Rani Mukherjee) in a movie rather than a new comer in the film industry. Assortative Scale-free network Disassortative Scale-free network Measures of Assortativity ANND (Average nearest neighbor degree) Find the average degree of the neighbors of each node i with degree k Find the Pearson correlation (r) between the degree of i and the average degree of its neighbors For further reference see the supplementary material Community structure Community structure: a group of vertices that have a high density of edges within them and a low density of edges in between groups Example: •Friendship n/w of children •Citation n/ws: research interest •World Wide Web: subject matter of pages •Metabolic networks: Functional units •Linguistic n/ws: similar linguistic categories Some Examples Community Structure in Political Books Community structure in a Social n/w of Students (American High School) Community Identification Algorithms Hierarchical Girvan-Newman Radicchi et al. Chinese Whispers Spectral Bisection See (Newman 2004) for a comprehensive survey (you will find the ref. in the supplementary material) Girvan-Newman Algorithm Bisection Method Calculate the betweenness for all edges in the network. Remove the edge with the highest betweenness. Recalculate betweennesses for all edges affected by the removal. Repeat from step 2 until no edges remain. Evolution of Networks Processes on Networks Random Graphs & Small Average Path Length Q: What do we mean by a ‘random graph’? A: Erdos-Renyi random graph model: For every pair of nodes, draw an edge between them with equal probability p. Degrees of Separation in a Random Graph Poisson distribution • N nodes • z neighbors per node, on average, z =<k> • D degrees of separation z D N D log N log z P(k)~ e-<k> <k>k/k! Degree Distributions A irlines P oisson distribution Exp o n e n tia l N e tw o rk P ow er-law distribution S cale -fre e N e tw o rk Degree distributions for various networks (a) World-Wide Web (b) Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics (c) Power grid of the western United States and Canada (d) Social network of 43 Mormons in Utah How do Power law DDs arise? Barabási-Albert Model of Preferential Attachment (Rich gets Richer) (1) GROWTH : Starting with a small number of nodes (m0) at every timestep we add a new node with m (<=m0) edges (connected to the nodes already present in the system). (2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node (ki ) ki jk j A.-L.Barabási, R. Albert, Science 286, 509 (1999) Growth analysis Markov chain representation Probability that the new edge is attached to any of the vertices of degree k where total number of edges Growth analysis Markov chain representation Growth dynamics at time (t+1) Number of nodes of degree (k-1) at t Number of nodes of degree k at t Number of nodes of degree k at t+1 Growth analysis Markov chain representation The net change in npk per vertex added In the stationary solution, we find Which results The World is Small! “Registration fee for ICDCN 2008 are being waived for all participants – get it collected from the registration counter” How long do you think the above information will take to spread among yourselves Experiments say it will spread very fast – within 6 hops from the initiator it would reach all This is the famous Milgram’s six degrees of separation The Small World Effect Even in very large social networks, the average distance between nodes is usually quite short. Milgram’s small world experiment: Target individual in Boston Initial senders in Omaha, Nebraska Each sender was asked to forward a packet to a friend who was closer to the target Friends asked to do the same Result: Average of ‘six degrees’ of separation. S. Milgram, The small world problem, Psych. Today, 2 (1967), pp. 60-67. Measure of Small-Worldness Low average geodesic path length High clustering coefficient Geodesic path – Shortest path through the network from one vertex to another Mean path length ℓ = 2∑i≥jdij/n(n+1) where dij is the geodesic distance from vertex i to vertex j Most of the networks observed in real world have ℓ ≤ 6 Film actors Company Directors Emails Internet Electronic circuits 3.48 4.60 4.95 3.33 4.34 Clustering C = Probability that two of a node’s neighbors are themselves connected In a random graph: Crand ~ 1/N (if the average degree is held constant) Watts-Strogatz ‘Small World’ Model Watts and Strogatz introduced this simple model to show how networks can have both short path lengths and high clustering. D. J. Watts and S. H. Strogatz, Collective dynamics of “small-world” networks, Nature, 393 (1998), pp. 440–442. Small-world model Used for modeling network transitivity Many networks assume some kind of geographical proximity Small-world model: Start with a low-dimensional regular lattice Rewire: Add/remove edges to create shortcuts to join remote parts of the lattice For each edge with prob p move the other end to a random vertex Rewiring allows to interpolate between regular lattice and random graph Small-world model Regular lattice (p=0): Clustering coefficient C=(3k3)/(4k-2)=3/4 Mean distance L/4k Almost random graph (p=1): Rewiring probability p Clustering coefficient C=2k/L Mean distance log L / log k No power-law degree distribution Degree distribution Case Study I Analyzing the Vulnerability of Superpeer Networks Against Churn and Attack Poster - Developing Analytical Framework to Measure Stability of P2P Networks, ACM Sigcomm 2006 Pisa, Italy Brief Communication - Measuring Robustness of Superpeer Topologies, PODC 2007, Portland, USA How stable are large superpeer networks against attack? The Seventh IEEE Conference on Peer-to-Peer Computing, Galway, Ireland, 2007 Full paper - Analyzing the Vulnerability of the Superpeer Networks Against Attack, ACM CCS, 14th ACM Conference on Computer and Communications Security, Alexandria, USA, 29 October - 2 Nov, 2007. Client/Server architecture Server Client Client Internet Client Client Servers: Provide services. Clients : Request services from servers Very successful architecture WWW (HTTP), FTP, Web services, etc. Client/Server architecture Limitations Scalability : Hard to achieve Poor fault tolerance : Single point of failure Administration : Highly required Peer to Peer architecture Node Node Node Internet Node Node All peers act as both clients and servers i.e. Servent (SERVer+cliENT) Provide and consume data Any node can initiate a connection No centralized data source “The ultimate form of democracy on the Internet” File sharing and other applications like IP telephony, distributed storage, publish subscribe system etc Peer to peer and overlay network An overlay network is built on top of physical network Nodes are connected by virtual or logical links Underlying physical network becomes unimportant Interested in the complex graph structure of overlay Dynamicity of overlay networks Peers in the p2p system leave network randomly without any central coordination (user churn) Important peers are targeted for attack DoS attack drown important nodes in fastidious computation Fail to provide services to other peers Importance of a node is defined by centrality measures Like degree centrality, betweenness centraliy etc Dynamicity of overlay networks Peers in the p2p system leave network randomly without any central coordination (user churn) Important peers are targeted for attack Makes overlay structures highly dynamic in nature Frequently it partitions the network into smaller fragments Communication between peers become impossible Problem definition Investigating stability of the networks against the churn and attack Network Topology+ Dynamicity = How (long) stable Developing an analytical framework Examining the impact of different structural parameters upon stability Peer contribution degree of peers, superpeers their individual fractions Steps followed to analyze Modeling of Overlay topologies pure p2p networks, superpeer networks, hybrid networks Various kinds of failures and attacks Defining stability metric Developing the analytical framework Validation through simulation Understanding impact of structural parameters Modeling overlay topologies Topologies are modeled by various random graphs characterized by degree distribution pk = Fraction of nodes having degree k Examples: Erdos-Renyi graph Scale free network Superpeer networks Modeling overlay topologies: E-R graph, scale free networks Erdos-Renyi graph Degree distribution follows Poisson distribution. k Average degree z k! Scale free network Degree distribution follows power law distribution p k ck 0.7 0.6 Probability distribution (pk) pk z e 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 Node degree (k) 35 40 45 50 Modeling overlay topologies: Superpeer networks Superpeer networks emerge as most widely used network Small fraction of nodes are superpeers and rest are peers KaZaA adopted this kind of topology Can be modeled using bimodal degree distribution Mathematically p k 0 if k k l , k m otherwise p k 0 Superpeer Node Peer node Modeling peer dynamics We propose a generalized model for peer dynamics Probability of removal of a node having degree k is fk k, models peer dynamics By changing the value of , we can obtain various peer dynamics like random failure, degree dependent failure deterministic and degree dependent attack qk event models the probability of survival of a node of degree k after the disrupting qk=1-fk Generalized model for peer dynamics = 0 (degree independent failure) Probability of removal of a node (fk) is constant & degree independent i.e. qk=q < 0 (degree dependent failure) Probability of removal of a node (fk) is inversely proportional to the degree of that node (1/k) Peers having lower connectivity or bandwidth are less stable because they enter and leave network frequently > 0 (Attack) Peers with high degrees are targeted. Modeling: Attack Deterministic attack Nodes having high degrees are progressively removed qk=0 when k>kmax 0< qk< 1 when k=kmax qk=1 when k<kmax Degree dependent attack Nodes having high degrees are likely to be removed Probability of removal of node having degree k fk 1 qk k Stability Metric: Percolation Threshold Initially all the nodes in the network are connected Forms a single component Nodes in the network are connected and form a single giant component Stability Metric: Percolation Threshold f fraction of nodes removed Initial single connected component Initially all the nodes in the network are connected Forms a single component Size of the giant component is the order of the network size Giant component carries the structural properties of the entire network Giant component still exists Stability Metric: Percolation Threshold f fraction of nodes removed Initial single connected component Giant component still exists fc fraction of nodes removed The entire graph breaks into smaller fragments Therefore fc =1-qc becomes the percolation threshold Ordinary Percolation on Lattices Fill in each link (bond percolation) or site (site percolation) with probability p and ask questions about the sizes of connected components. Development of the analytical framework Generating function: Formal power series whose coefficients encode information P ( x ) a a x a x a x ......... a x 2 0 1 2 3 k 3 k k 0 Here ( a 0 , a1 , a 2 ,.....) encode information about a sequence Used to understand different properties of the graph G0 ( x) pk x k k 0 generates probability distribution of the vertex degrees. Average degree z k G 0 ' (1) Development of the analytical framework p k .q k specifies the probability of a node having degree k to be present in the network after fk = (1-qk) fraction of nodes removed. k F0 ( x ) k 0 k . p k q k x becomes the corresponding generating function pk (1-qk) fraction of nodes removed p k .q k Development of the analytical framework p k .q k specifies the probability of a node having degree k to be k present in the network after (1-qk) fraction of nodes removed. F0 ( x ) k p k q k x becomes the corresponding generating function. k 0 Distribution of the outgoing edges of first neighbor of a randomly chosen node F1 ( x ) kp k qk x k kp k k 1 k F0 ( x ) z p k .q k Random node First neighbor Development of the analytical framework H1(x) generates the distribution of the size of the components that are reached through random edge H1(x) satisfies the following condition H 1 ( x ) 1 F1 (1) xF1 ( H 1 ( x )) Development of the analytical framework H 0 ( x ) generates distribution for the component size to which a randomly selected node belongs to H 0 ( x ) 1 F0 (1) xF0 ( H 1 ( x )) Average size of the components F0 (1) F1 (1) H 0 (1) F0 (1) 1 F1 (1) Average component size becomes infinity when 1 F1(1) 0 Development of the analytical framework 1 F1(1) 0 Average component size becomes infinity when With the help of generating function, we derive the following critical condition for the stability of giant component kp k ( kq k q k 1) 0 k 0 Degree distribution Peer dynamics The critical condition is applicable For any kind of topology (modeled by pk) Undergoing any kind of dynamics (modeled by 1-qk) Stability metric: simulation The theory is developed based on the concept of infinite graph At percolation point theoretically ‘infinite’ size graph reduces to the ‘finite’ size components In practice we work on finite graph cannot simulate the phenomenon directly We approximate the percolation phenomenon on finite graph with the help of condensation theory How to determine percolation point during simulation? Let s denotes the size of a component and ns determines the number of components of size s at time t At each timestep t a fraction of nodes is removed from the network sn s CS ( s ) Calculate component size distribution t sn s s If CS t ( s ) becomes monotonically decreasing function at the time t t becomes percolation point Initial condition (t=1) Intermediate condition (t=5) Percolation point (t=10) Outline of the results Networks under consideration Superpeer networks (Characterized by bimodal degree distribution ) Disrupting events Degree independent failure or random failure Degree dependent failure Degree dependent attack Deterministic attack (special case of degree dependent attack ??) Outline of the results Networks under consideration Superpeer networks (Characterized by bimodal degree distribution ) Disrupting events Degree independent failure or random failure Degree dependent failure Degree dependent attack Deterministic attack (special case of degree dependent attack ??) Stability against various failures Degree independent random failure : Percolation threshold fc 1 1 k 2 k 1 For superpeer networks fc 1 k r k 2 k k m 2 rk m k r k k m rk m 2 Average degree of the network 2 Superpeer degree 2 Fraction of peers Stability against random failure (superpeer networks) Comparative study between theoretical and experimental results We keep average degree k 5 fixed 0.95 fr (Percolation threshold) fr (Percolation threshold) 0.95 0.9 0.85 0.85 0.8 Theoretical Km=30 Experimental Km=30 0.75 0.8 0.75 0.7 0.65 0.9 0.7 0.65 0.92 0.9 0.95 r (Fraction of peers) 1 Theoretical Km=50 Experimental Km=50 0.94 0.96 0.98 r (Fraction of peers) 1 Stability against random failure (superpeer networks) Comparative study between theoretical and experimental results 0.95 fr (Percolation threshold) fr (Percolation threshold) 0.95 0.9 0.85 0.85 0.8 Theoretical Km=30 Experimental Km=30 0.75 0.9 0.95 r (Fraction of peers) 0.8 0.75 0.7 0.65 0.9 1 Theoretical Km=50 Experimental Km=50 0.7 0.65 0.92 0.94 0.96 0.98 r (Fraction of peers) Increase of the fraction of superpeers (specially above 15% to 20%) increases stability of the network 1 Stability against random failure (superpeer networks) Comparative study between theoretical and experimental results 0.95 fr (Percolation threshold) fr (Percolation threshold) 0.95 0.9 0.85 0.85 0.8 Theoretical Km=30 Experimental Km=30 0.75 0.9 0.95 r (Fraction of peers) 0.8 0.75 0.7 0.65 0.9 1 Theoretical Km=50 Experimental Km=50 0.7 0.65 0.92 0.94 0.96 0.98 r (Fraction of peers) 1 There is a sharp fall of fc when fraction of superpeers is less than 5% Stability of superpeer networks against deterministic attack Removal of a fraction of high degree nodes are sufficient to breakdown the network Case 2: Removal of all the high degree nodes are not sufficient to breakdown the network Have to remove a fraction of low degree nodes 1 Theoretical model (Case 1) Theoretical model (Case 2) Simulation results Average degree k=10 Superpeer degree k =50 0.9 ft (Percolation threshold) Two different cases may arise Case 1: 0.8 0.7 m 0.6 0.5 0.4 0.3 0.2 0.1 0 0 2 4 6 kl (Peer degree) 8 10 Stability of superpeer networks against deterministic attack Removal of a fraction of high degree nodes are sufficient to breakdown the network Case 2: 1 0.9 ft (Percolation threshold) Two different cases may arise Case 1: 0.8 0.7 Theoretical model (Case 1) Theoretical model (Case 2) Simulation results Average degree k=10 Superpeer degree k =50 m 0.6 0.5 0.4 0.3 0.2 0.1 Removal of all the high 0 0 2 4 6 degree nodes are not kl (Peer degree) sufficient to breakdown the Interesting observation in case 1 network Have todecreases remove awith fraction of Stability increasing value of peers – low degree nodes counterintuitive 8 10 Peer contribution Controls the total bandwidth contributed by the peers Determines the amount of influence superpeer nodes exert on the network Peer contribution where is the average degree We investigate the impact of peer contribution upon the stability of the network Impact of peer contribution for deterministic attack • The influence of high degree peers increases with the increase of peer contribution • This becomes more eminent as peer contribution Impact of peer contribution for deterministic attack • Stability of the networks ( ) having peer contribution primarily depends upon the stability of peer ( ) Impact of peer contribution for deterministic attack Stability of the network increases with peer contribution for peer degree kl=3,5 Gradually reduces with peer contribution for peer degree kl=1 Stability of superpeer networks against degree dependent attack Probability of removal of a node is directly f k proportional to its degree k Hence C km Calculation of normalizing constant C k f Minimum value C This yields an inequality k rk l 1 ( k l 1) (1 r ) k m 1 ( k m 1) k m ( k ( k m k l ) k m 2 k ) Stability of superpeer networks against degree dependent attack Probability of removal of a node is directly proportional to its degree f k k Hence C km Calculation of normalizing constant C k f Minimum value C The solution set of the above inequality can be k either bounded ( 0 c c ) or unbounded ( 0 ) c bd Degree dependent attack: Impact of solution set Three situations may arise Removal of all the superpeers along with a fraction of peers – Case 2 of deterministic attack Removal of only a fraction of superpeer – Case 1 of deterministic attack Removal of some fraction of peers and superpeers Degree dependent attack: Impact of solution set Three situations may arise Case 2 of deterministic attack Networks having bounded solution set bd k If c c , f 1 f (0 c c bd ) c c c p sp l C c Case 1 of deterministic attack Networks having unbounded solution set ( 0 c ) If c , fp 0 0 f sp 1 c c Degree Dependent attack is a generalized case of deterministic attack Impact of critical exponent c Validation through simulation Bounded solution set with c 1 . 17 bd Removal of any combination of where disintegrates the network At 1 .17, all superpeer need to be removed along with a fraction of peers bd c Performed simulation on graphs with N=5000 and 500 cases Good agreement between theoretical and simulation results Case Study : Superpeer network with kl=3, km=25, k=5 Summarization of the results Random failure Stability increases with superpeer degree and its fraction Drastic fall of the stability when fraction of superpeers is less than 5% In deterministic attack, networks having small peer degrees are very much vulnerable Increase in peer degree improves stability Superpeer degree is less important here! In degree dependent attack, Stability condition provides the critical exponent Amount of peers and superpeers required to be removed is dependent upon More general kind of attack Limitations We have not considered the change in the degree distribution in the network due to disrupting events Assumed that nodes are turned OFF during disrupting events Topological change in the network should be included in the theory Deformation of the network due to node removal Removal of a node along with its adjacent links changes the degrees of its neighbors Hence changes the topology of the network Let initial degree distribution of the network be pk Probability of removal of a node having degree k is fk We represent the new degree distribution pk’ as a function of pk and fk Deformation of the network due to node removal In this diagram, left node set denotes the survived nodes and right node set denotes the removed nodes The change in the degree distribution is due to the edges removed from the left set We calculate the number of edges connecting left and right set (E) Deformation of the network due to node removal The total number of tips in the surviving node set is The probability of finding a random tip that is going to be removed is The total number of edges running between two subset Deformation of the network due to node removal Probability of finding an edge in the surviving (left) subset that is connected to a node of removed (right) subset Deformation of the network due to node removal • Removal of a node reduces the degree of the survived nodes • node having degree > k becomes a node having degree k by losing one or more edges • Probability that a survived node will lose one edge becomes Deformation of the network due to node removal Probability of finding a node having degree k (pk’) after removal of nodes following fk, depends upon 1. Probability that nodes having degree k, k+1, k+2 … will lose 0, 1, 2 etc edges respectively to become a node having degree k 2. Probability that nodes having degree k, k+1, k+2 … will sustain k number of edges with them Hence Where q denotes the fraction of nodes in the survived (left) node set having degree Deformation of the network due to node removal For random failure , Hence the new degree distribution after random failure becomes Deformation of the network due to node removal Degree distribution of the Poisson and power law networks after the attack of the form Percolation Threshold Initially all the nodes in the network are connected Forms a single giant component Size of the giant component is the order of the network size Nodes in the network are connected and form a single giant component Giant component carries the structural properties of the entire network Percolation Threshold f fraction of nodes removed Initial single connected component Giant component still exists Percolation Threshold f fraction of nodes removed Initial single connected component Giant component still exists fc fraction of nodes removed The entire graph breaks into smaller fragments Therefore fc =1-qc becomes the percolation threshold Percolation threshold Percolation condition of a network having degree distribution pk can be given as After removal of fk fraction of nodes, if the degree distribution of the network becomes pk’, then the condition for percolation becomes Which leads to the following critical condition for percolation Percolation threshold From this equation, the percolation threshold fq can be obtained numerically for any kind of network characterized by degree distribution pk under any kind of disrupting event checterized by fk For random failure , percolation threshold becomes fc 1 1 k 2 k 1 Conclusion Contribution of the work Development of general framework to analyze the stability of superpeer networks Modeling the dynamic behavior of the peers using degree independent failure as well as attack. Comparative study between theoretical and simulation results to show the effectiveness of our theoretical model. CASE STUDY II: Self-Organization of the Sound Inventories Rediscovering the Co-occurrence Principles of the Vowel Inventories: A Complex Network Approach, Advances in Complex System Modeling the Co-occurrence Principles of the Consonant Inventories: A Complex Network Approach, Int. Jour. of Modern Phy. C Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World's Languages, ACL(07), Prague, Czech Republic. Analysis and Synthesis of the Distribution of Consonants over Languages: A Complex Network Approach, COLING-ACL(06), Sydney, Australia. Human Communication Human beings and many other living organisms produce sound signals Unlike other organisms, they can concatenate these sounds to produce new messages – Language Language is one of the primary cause/effect of human intelligence Human Speech Sounds Human speech sounds are called phonemes – the smallest unit of a language Phonemes are characterized by certain distinctive features like I. Place of articulation II. Manner of articulation III. Phonation Mermelstein’s Model Types of Phonemes Consonants Vowels /i/ /a/ L /u/ /t/ /p/ Diphthongs /ai/ /k/ Choice of Phonemes How a language chooses a set of phonemes in order to build its sound inventory? Is the process arbitrary? Certainly Not! What are the forces affecting this choice? Forces of Choice A Linguistic System – How does it look? /a/ Speaker Desires “ease of articulation” /a/ Listener / Learner Desires “perceptual contrast” / “ease of learnability” The forces shaping the choice are opposing – Hence there has to be a non-trivial solution Vowels: A (Partially) Solved Mystery Languages choose vowels based on maximal perceptual contrast. For instance if a language has three vowels then in more than 95% of the cases they are /a/,/i/, and /u/. /a/ /i/ /u/ Maximally Distinct Consonants: A J i g s a w puzzle Research: From 1929 – Date No single satisfactory explanation of the organization of the consonant inventories The set of features that characterize consonants is much larger than that of vowels No single force is sufficient to explain this organization Rather a complex interplay of forces goes on in shaping these inventories The Approach & Objective We adopt a Complex Network Approach to attack the problem of consonant inventories We try to figure out the principle of the distribution of the occurrence of consonants over languages We also attempt to figure out the cooccurrence patterns (if any) that are found across the consonant inventories Principle of Occurrence PlaNet – The “Phoneme-Language Network” A bipartite network N=(VL,VC,E) VL : Nodes representing languages of the world VC : Nodes representing consonants E : Set of edges which run between VL and VC Languages L1 /ŋ/ L2 /m/ L3 /d/ Consonants There is an edge e Є E between two nodes vl Є VL and vc Є VC if the consonant c occurs in the language l. /θ/ /s/ L4 /p/ The Structure of PlaNet Construction of PlaNet Data Source : UCLA Phonological Inventory Database (UPSID) Number of nodes in VL is 317 Number of nodes in VC is 541 Number of edges in E is 7022 Degree Distribution of PlaNet 0.08 DD of the language nodes follows a βdistribution pk = beta(k) with α = 7.06, and β = 47.64 0.06 pk 0.04 Γ(54.7) k6.06(1-k)46.64 pk = Γ(7.06) Γ(47.64) 0.02 DD of the consonant nodes follows a power-law with an exponential cut-off kmin= 5, kmax= 173, kavg= 21 0 50 100 150 11 200 0.1 0.1 Language inventory size (degree k) Pk 0.01 0.01 Distribution of Consonants over Languages follow a power-law Pk = k -0.71 Exponential Cut-off 0.001 0.001 11 10 100 10 100 Degree of a consonant, k 1000 1000 Synthesis of PlaNet Given: VL = {L1, L2, ..., L317} sorted in the ascending order of their degrees and 541 unlabeled nodes in VC . Step 0: All nodes in VC have degree 0. Step t+1: Choose a language node Lj (in order) with cardinality kj (inventory size) for c running from 1 to kj do Connect Lj preferentially with a consonant node Ci VC, to which it is already not connected, with a probability Pr(Ci) = diα+ ε ∑xV* (dxα + ε) where, di = degree of node Ci at step t and V* = subset of VC not connected to Lj at t and ε is the smoothing parameter. The Preferential Mechanism of Synthesis L1 L2 L3 L4 After step 3 L1 L2 L3 L4 After step 4 Simulation Result 1 PlaNet PlaNetsyn PlaNetrand Pk .1 .01 .001 1 10 100 1000 Degree (k) The parameters α and ε are 1.44 and 0.5 respectively. The results are averaged over 100 runs Principle of Co-occurrence Consonants tend to co-occur in groups or communities These groups tend to be organized around a few distinctive features (based on: manner of articulation, place of articulation & phonation) – Principle of feature economy plosive voiced bilabial /b/ dental /d/ then it will also tend to have voiceless /p/ /t/ If a language has in its inventory How to Capture these Co-occurrences? PhoNet – “Phoneme Phoneme Network” A weighted network N=(VC,E) VC : Nodes representing consonants E : Set of edges which run between the nodes in VC There is an edge e Є E between two nodes vc1 ,vc2 Є VC if the consonant c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occurs defines the edge-weight of e. The number of languages in which c1 occurs defines the node-weight of vc1. /k′/ 50 14 42 /kw/ 39 38 /k/ 283 13 17 /d′/ Construction of PhoNet Data Source : UPSID Number of nodes in VC is 541 Number of edges is 34012 PhoNet Community Structures in PhoNet Radicchi et al. algorithm (for unweighted networks) – Counts number of triangles that an edge is a part of. Inter-community edges will have low count so remove them. Modification for a weighted network like PhoNet Look for triangles, where the weights on the edges are comparable. If they are comparable, then the group of consonants cooccur highly else it is not so. Measure strength S for each edge (u,v) in PhoNet where S is, wuv if √Σi Є Vc-{u,v}(wui – wvi)2>0 else S = ∞ S= 2 √Σi Є Vc-{u,v}(wui – wvi) Remove edges with S less than a threshold η Community Formation 1 52 110 100 3 101 2 10 5 5.17 10.94 4 45 46 1 S 11.11 6 3 0.06 4 7.14 2 5 7.5 3.77 6 η>1 1 For different values of η we get different sets of communities 2 4 3 5 6 Consonant Societies! η=0.35 η=0.60 η=0.72 η=1.25 The Binding Force of the Communities: Feature Economy Feature Entropy: The idea is borrowed from information theory For a community C of size N, let there be pf consonants for which a particular feature f is present and qf other consonants for which f is absent – probability that a consonant chosen from C has f is pf /N and that it does have f is qf /N or (1- pf /N) Feature entropy can be therefore defined as FE = ΣFЄf(-(pf /N)log(pf /N) – (qf /N)log(qf /N)) where F is the set of all features present in the consonants in C Essentially the number of bits needed to transmit the entire information about C through a channel. Computing Feature Entropy Lower FE -> C1 economizes on the number of features Higher FE -> C2 does not economize on the number of features If the Inventories had Evolved by Chance! Construction of PhoNetrand For each consonant c let the frequency of occurrence in UPSID be denoted by fc. Let there be 317 bins each corresponding to a language in UPSID. fc bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition. Thus the consonant inventories of the 317 languages corresponding to the bins are generated. PhoNetrand can be constructed from these new consonant inventories similarly as PhoNet. Cluster PhoNetrand by the method proposed earlier Comparison between PhoNet and PhoNetrand Average Feature Entropy 10 5 PhoNet PhoNetrand 0 0 5 10 15 20 Community Size The curve shows the average feature entropy of the communities of a particular size versus the community size Conclusion Covered Understanding topological properties of complex networks Case Study – peer to peer networks and language modeling Not covered Processes on networks Search, Epidemics etc. Contact Niloy Ganguly [email protected] http://www.facweb.iitkgp.ernet.in/~niloy/ Workshop Dynamics on and of complex network http://www.cel.iitkgp.ernet.in/~eccs07/ Sequel : Book Volume to be published in May 2008 from Birkhauser Boston. Thank you!!

Descargar
# Emergent Systems