Game Theory and Cognitive Radio Networks Basic Game Models Equilibria Concepts Optimality Concepts Convergence and Noise 1 © Cognitive Radio Technologies, 2007 Game Models Models of interactive decision processes 2 © Cognitive Radio Technologies, 2007 Game 1. 2. 3. A (well-defined) set of 2 or more players A set of actions for each player. A set of preference relationships for each player for each possible action tuple. • More elaborate games exist with more components but these three must always be there. • Some also introduce an outcome function which maps action tuples to outcomes which are then valued by the preference relations. • Games with just these three components (or a variation on the preference relationships) are said to be in Normal form or Strategic Form 3 © Cognitive Radio Technologies, 2007 Set of Players (decision makers) N – set of n players consisting of players “named” {1, 2, 3,…,i, j,…,n} Note the n does not mean that there are 14 players in every game. Other components of the game that “belong” to a particular player are normally indicated by a subscript. Generic players are most commonly written as i or j. Usage: N is the SET of players, n is the number of players. N \ i = {1,2,…,i-1, i+1 ,…, n} All players in N except for i 4 © Cognitive Radio Technologies, 2007 Actions Ai – Set of available actions for player i Example Two Player ai – A particular action chosen by i, ai Ai Action Space A – Action Space, Cartesian product of all Ai A1 = A2 = [0 ) A=A1 A2 A=A1 A2· · · An a – Action tuple – a point in the Action Space A2 = A-1 A-i – Another action space A formed from A-i =A1 A2· · · Ai-1 Ai+1 · · · An a-i – A point from the space A-i a2 = a-1 a b b2 = b-1 A = Ai A-i b1 = b-2 a1 = a-2 5 © Cognitive Radio Technologies, 2007 A1= A-2 Preference Relations (1/2) Preference Relation expresses an individual player’s desirability of one outcome over another (A binary relationship) i Preference Relationship (prefers at least as much as) o i ~i i o * o is preferred at least as much as o* by player i Strict Preference Relationship (prefers strictly more than) * o i o iff o i o * but not o * o i “Indifference” Relationship (prefers equally) o ~i o * iff o i o * * and o i 6 © Cognitive Radio Technologies, 2007 o Preference Relationship (2/2) Games generally assume the relationship between actions and outcomes is invertible so preferences can be expressed over action vectors. Preferences are really an ordinal relationship – Know that player prefers one outcome to another, but quantifying by how much introduces difficulties 7 © Cognitive Radio Technologies, 2007 Utility Functions (1/2) (Objective Fcns, Payoff Fcns) A mathematical description of preference relationships. Maps action space to set of real numbers. ui : A R Preference Relation then defined as a a * i a * i a * iff u i a u i a iff ui a ui a * * u a u a iff a ~i a i i * 8 © Cognitive Radio Technologies, 2007 Utility Functions (2/2) By quantifying preference relationships all sorts of valuable mathematical operations can be introduced. Also note that the quantification operation is not unique as long as relationships are preserved. Many map preference relationships to [0,1]. Example Jack prefers Apples to Oranges Apples Jack Oranges u Jack A p p les u Jack O ra n g es a) uJack(Apples) = 1, uJack(Oranges) = 0 9 b) uJack(Apples) = -1, uJack(Oranges) = -7.5 © Cognitive Radio Technologies, 2007 Variety of game models Normal Form Game <N,A,{ui}> – – – Synchronous play T is a singleton Perfect knowledge of action space, other players’ goals (called utility functions) Repeated Game <N,A,{ui},{di}> – – – – Repeated synchronous play of a normal form game T may be finite or infinite Perfect knowledge of action space, other players’ goals (called utility functions) Players may consider actions in future stages and current stages Asynchronous myopic repeated game <N,A,{ui},{di},T> – – Strategies (modified di) Repeated play of a normal form game under various timings Radios react to most recent stage, decision rule is “intelligent” Many others in the literature and in the dissertation 10 © Cognitive Radio Technologies, 2007 Cognitive radios are naturally modeled as players in a game Infer from Context Utility function Arguments Infer from Radio Model Establish Priority Normal Immediate Observe Outcome Space Outside World 11 Utility Function Orient Autonomous Goal Plan Normal Urgent Learn New States Decide States Act \ Decision Rules Allocate Resources Initiate Processes Action Negotiate Sets Adapted From Mitola, “Cognitive Radio for Flexible Mobile ©Multimedia Cognitive RadioCommunications Technologies, 2007 ”, IEEE Mobile Multimedia Conference, 1999, pp 3-10. Interaction is naturally modeled as a game Radio 1 Radio 2 Actions Decision Rules u1 Actions Decision Rules Action Space Informed by Communications Theory u 1 ˆ1 f :AO Outcome Space ˆ1 , ˆ 2 12 © Cognitive Radio Technologies, 2007 u 2 ˆ 2 u2 Some differences between game models and cognitive radio network model Assuming numerous iterations, normal form game only has a single stage. – – Repeated games are explicitly used as the basis for cognitive radio algorithm design (e.g., Srivastava, MacKenzie) – – 13 Useful for compactly capturing modeling components at a single stage Normal form game properties will be exploited in the analysis of other games Not however, focus of dissertation Not the most commonly encountered implementation Player Cognitive Radio Knowledge Knows A Can learn O (may know or learn A) f : A O Invertible Constant Known Not invertible (noise) May change over time (though relatively fixed for short periods) Has to learn Preferences Ordinal Cardinal (goals) © Cognitive Radio Technologies, 2007 Summary The interactions in a cognitive radio network (levels 1-3) can be represented by the tuple <N, A, {ui}, {di},T> A dynamical system model adequately represents innerloop procedural radios A myopic asynchronous repeated game adequately represents ontological radios and random procedural radios – – 14 Suitable for outer-loop processes Not shown here, but can also handle inner-loop Some differences in models – – Most analysis carries over Some differences © Cognitive Radio Technologies, 2007 Dynamical System Game Model Equilibrium Concepts Nash Equilibria, Mixed Strategy Equilbria, Coalitional Games, the Core, Shapley Value, Nash Bargaining 15 © Cognitive Radio Technologies, 2007 Steady-states Recall model of <N,A,{di},T> which we characterize with the evolution function d Steady-state is a point where a*= d(a*) for all t t * Obvious solution: solve for fixed points of d. For non-cooperative radios, if a* is a fixed point under synchronous timing, then it is under the other three timings. Works well for convex action spaces – – Not always guaranteed to exist Value of fixed point theorems Not so well for finite spaces – Generally requires exhaustive search 16 © Cognitive Radio Technologies, 2007 Nash Equilibrium “A steady-state where each player holds a correct expectation of the other players’ behavior and acts rationally.” - Osborne An action vector from which no player can profitably unilaterally deviate. Definition An action tuple a is a NE if for every i N u i a i , a i u i bi , a i for all bi Ai. 17 Note showing that a point is a NE says nothing about the process by which the steady state is reached. Nor anything about its uniqueness. Also note that we are implicitly assuming that only pure strategies are possible in this case. © Cognitive Radio Technologies, 2007 Examples Cognitive Radios’ Dilemma – – – Two radios have two signals to choose between {n,w} and {N,W} n and N do not overlap Higher throughput from operating as a high power wideband signal when other is narrowband Jamming Avoidance – – Two channels No NE Jammer Transmitter 18 © Cognitive Radio Technologies, 2007 0 1 0 (-1,1) (1,-1) 1 (1,-1) (-1,1) How do the players find the Nash Equilibrium? Preplay Communication – Rational Introspection – Based on what each player knows about the other players, reason what the other players would do in its own best interest. (Best Response - tomorrow) Points where everyone would be playing “correctly” are the NE. Focal Point – Before the game, discuss their options. Note only NE are suitable candidates for coordination as one player could profitably violate any agreement. Some distinguishing characteristic of the tuple causes it to stand out. The NE stands out because it’s every player’s best response. Trial and Error – Starting on some tuple which is not a NE a player “discovers” that deviating improves its payoff. This continues until no player can improve by deviating. Only guaranteed to work for Potential Games (couple weeks) 19 © Cognitive Radio Technologies, 2007 Iterative Elimination of Strongly Dominated Strategies (IESDS) Motivation Sometimes a player’s actions are not preferable, no matter what the other players do. These actions would thus never rationally be played and can be eliminated from consideration in any NE action vector. Definition An action of player i, ai, is said to be dominated, if there exists some other action bi such that u i a i , a i u i bi , a i a i A i Note this is strictly the definition for strong domination and the technique being discussed is actually Strong IEDS (or IESDS). Normal (weak) IEDS and sometimes results in incorrect elimination of NE. 20 © Cognitive Radio Technologies, 2007 IEDS Algorithm 1. Remove any readily apparent dominated actions from the analysis. 2. Check to see if this reveals new dominated actions. 3. If there are any dominated actions, repeat from 1, else continue to 4. 4. Apply NE definition on any remaining action tuples. 21 © Cognitive Radio Technologies, 2007 Example IEDS Iteration 1. Note the following D C D -1, -1 -10, 0 C 0, -10 u1 C , D u1 D , D u1 C , C u1 D , C So D is dominated by C for player 1. So we remove D for player 1 from the game. Iteration 2. Note the following u2 C , C u2 C , D 22 -5, -5 As there is only one action tuple left (thus no deviation is possible, nor is a profitable deviation), it must be a NE So in the remaining game D is dominated by C for player 2. So we remove D for player 2 from the game. © Cognitive Radio Technologies, 2007 Comments on IESDS IESDS is not useful for all games as many games have no dominated strategies. While each iterative elimination is rational, Dutta states that rationality is more difficult to assume when a large number of iterative eliminations are required. (I disagree for repetitive play) IESDS can also augment other solution techniques. If a game is IESDS solvable 23 © Cognitive Radio Technologies, 2007 Nash Equilibrium as a Fixed Point Individual Best Response Bˆ i a bi Ai : u i bi , a i u i a i , a i a i Ai Synchronous Best Response Bˆ a Bˆ i a i N Nash Equilibrium as a fixed point Fixed point theorems can be used to establish existence of NE (see dissertation) NE can be solved by implied system of equations * * a Bˆ a 24 © Cognitive Radio Technologies, 2007 Example solution for Fixed Point by Solving for Best Response Fixed Point Bandwidth Allocation Game – – – – Five cognitive radios with each radio, i, free to determine the number of simultaneous frequency hopping channels the radio implements, ci [0,). Goal u i c P c c i C i c i P(c) fraction of symbols that are not interfered with (making P(c)ci the goodput for radio i) Ci(ci) is radio i’s cost for supporting ci simultaneous channels. ui c B c k ci K ci kN 25 © Cognitive Radio Technologies, 2007 Best Response Analysis Goal ui c B c k ci K ci kN Best Response ˆ ci Bi c B K ck / 2 kN \i Simultaneous System of Equations Solution cˆi B K / 6 i N 26 Generalization cˆi B K / N 1 i N © Cognitive Radio Technologies, 2007 Significance of NE for CRNs Autonomously Rational Decision Rule Why not “if and only if”? – – Consider a self-motivated game with a local maximum and a hill-climbing algorithm. For many decision rules, NE do capture all fixed points (see dissertation) Identifies steady-states for all “intelligent” decision rules with the same goal. Implies a mechanism for policy design while accommodating differing implementations – – Verify goals result in desired performance Verify radios act intelligently 27 © Cognitive Radio Technologies, 2007 Key Theorem for NE Existence Kakutani’s Fixed Point Theorem – Let f :X X be an upper semi-continuous convex valued correspondence from a non-empty compact convex set X n, then there is some x*X such that x* f(x*) f(x) x 28 © Cognitive Radio Technologies, 2007 Nash Equilibrium Existence Visualizable Definition of Quasi-Concavity All upper-level sets are convex U a * a A : f a a * f (a ) U (a 1 ) 29 © Cognitive Radio Technologies, 2007 a0 a1 a2 a My Favorite Mixed Strategy Story Pure Strategies in an Extended Game Consider an extensive form game where each stage is a strategic form game and the action space remains the same at each stage. Before play begins, each player chooses a probabilistic strategy that assigns a probability to each action in his action set. At each stage, the player chooses an action from his action set according to the probabilities he assigned before play began. Example Consider a video football game which will be simulated. Before the game begins two players assign probabilities of calling running plays or passing plays for both offense or defense. In the simulation, for each down the kind of play chosen by each team is based on the initial probabilities assigned to kinds of plays. 30 © Cognitive Radio Technologies, 2007 Example Mixed Strategy Game Jamming game p a1 q (1-q) a2 b2 1,-1 -1, 1 Action Tuples Probabilities pq (a1,a2) p(1-q) (a1,b2) (1-p)q (b1,a2) (1-p)(1-q) (b1,b2) Expected Utilities (1-p) b1 -1, 1 1, -1 U 1 p , q pq 1 p 1 q 1 1 p q 1 1 p 1 q 1 Sets of probability distributions U 2 p , q pq 1 p 1 q 1 (A1)={p,(1-p): p[0,1]} (A2)={q,(1-q): q[0,1]} 31 1 p q 1 1 p 1 q 1 © Cognitive Radio Technologies, 2007 Nash Equilibrium in a Mixed Strategy Game Definition Mixed Strategy Nash Equilibrium A mixed strategy profile * is a NE iff iN U i i , i U i i , i i Ai * * * Best Response Correspondence B R i i arg m ax U i i , i i Ai Alternate NE Definition Consider B i N B R i A mixed strategy profile * is a NE iff B * 32 * © Cognitive Radio Technologies, 2007 Nash Equilibrium U 1 p , q 4 pq 2 p 2 q 1 Best Response Correspondences 1 U 2 p , q 4 pq 2 p 2 q 1 q Best Response u1 p u 2 q q 4q 2 p 4 p 2 0 B R1 q [0,1] 1 B R2 33 1 p [0,1] 0 p 1-q p(a1) BR2(p) 0.5 1-p BR1(q) q 1/ 2 q 1/ 2 q 1/ 2 p 1/ 2 0 1 Note: NE in mixed extension which did not exist in original p 1/ 2 p 1/ 2 p(a2) 0.5 © Cognitive Radio Technologies, 2007 Interesting Properties of Mixed Strategy Games 1. 2. 3. Every Mixed Extension of a Strategic Game has an NE. A mixed strategy i is a best response to -i iff every action in the support of i is itself a best response to -i. Every action in the support of any player’s equilibrium mixed strategy yields the same payoff to that player. 34 © Cognitive Radio Technologies, 2007 Coalitional Game (with transferable payoff) Concept: groups of players (called coalitions) conspire together to implement actions which yields a result for the coalition. The value received by the coalition is then distributed among the coalition members. Where do radios collaborate and distribute value? – – – v:2 \ Transferable utility refers to existence of some commodity for which a player’s utility increases by one unit for every unit of the commodity it receives Game Components, N,v N – – – N set of players Characteristic function Coalition, SN How is this value distributed? – 802.16h interference groups – allocation of bandwidth Distribution of frequencies/spreading codes among cells File sharing in P2P network Payoff vector, (xi)iS Payoff vector is said to be S-feasible if x(S) v(S) x S x i i S 35 © Cognitive Radio Technologies, 2007 The Core (Transferrable) The Core – For N,v, the set of feasible payoff profiles, (xi)iS for which there is no coalition S and S-feasible payoff vector (yi)iS for which yi > xi for all iS. General principles of the NE also apply to the Core: – – Number of solutions for a game may be anywhere from 0 to May be stable or unstable. 36 © Cognitive Radio Technologies, 2007 Example Suppose three radios, N = {1,2,3}, can choose to participate in a peer-to-peer network. Characteristic Function – – – v(N) = 1 v({1,2})= v({1,3})= v({2,3})=[0,1] v(1)= v(2)= v(3) = 0 Loosely, indicates # of duplicated files If >2/3, Core is empty Example adaptations for =4/5 x = (2/5, 2/5,0) x = (0, 3/5, 1/5) x = (2/5, 0, 2/5) x = (1/3, 1/3, 1/3) x = (2/5, 2/5,0) 37 © Cognitive Radio Technologies, 2007 Comments on the Core Possibility of empty core implies that even when radios can freely negotiate and form arbitrary coalitions, no steady-state may exist Frequently very large (infinite) number of steadystates, e.g., <2/3 – Makes it impossible to predict exact behavior Existence conditions for the Core, but would need to cover some linear programming concepts Related (but not addressed today) concepts: – Bargaining Sets, Kernel, Nucleolus 38 © Cognitive Radio Technologies, 2007 Strong NE Concept: Assume radios are able to collaborate, but utilities aren’t necessarily transferrable An action tuple a* such that ui a * u i a S , a S S N , a S Ai * i S No Strong NE Unique Strong NE n w 39 © Cognitive Radio Technologies, 2007 N W (9.6,9.6) (9.6, 21) (21, 9.6) (22, 22) Motivation for Shapley value Core was generally either empty or very large. Want a “good” single solution. Terminology – Marginal Contribution of i – Interchangeability of i, j – i S v S i v S i S j S S N \ {i , j } Dummy player (no synergy) i S v i S N \ i 40 © Cognitive Radio Technologies, 2007 Axioms for Shapley Value Let be some distribution of value for a TU coalition game Symmetry: – Dummy: – If i is a dummy, then i(v) = v({i}) Additivity: – If i and j are interchangeable, then i(v)=j(v) Given N,v and N,w, i(v + w)= i(v)+ i(w) for all iN, where v+w = v(S) + w(S) Balanced Contributions – Given N,v, i N , v i N \ j .v N\j N , v N \ i.v N \i j 41 © Cognitive Radio Technologies, 2007 j Shapley Value i S S N \i S ! N S 1 ! N ! v S i v S Marginal Value Contributed by i Probability that i will be next one invited to the grand coalition given that coalition S is already part of the coalition assuming random ordering. Only assignment (value) that satisfies balanced contributions; only assignment that simultaneously satisfies symmetry, dummy, and additivity axioms 42 © Cognitive Radio Technologies, 2007 Implications of Shapley Value One form of a fair allocation – – – What you receive is based on the value you add Independent of order of arrival I liken it to setting salaries according to the Value Over Replacement Player concept in Baseball Statistics “Better” solution concept than the core as it’s a single payoff as opposed to a potentially infinite number Allows for analysis of relative “power” of different players in the system 43 © Cognitive Radio Technologies, 2007 Bargaining Problem Components: F, v – – Feasible payoffs F, closed convex subset of n Disagreement Point v = (v1, v2) Example: – – What 1 or 2 could achieve without bargaining Even if system is jammed, still gets some throughput Member of 802.16h interference group and try its luck F is said to be essential if there is some yF such that y1>v1 and y2>v2 If contracts are “binding” then F could be the payoffs corresponding to entire original action space Otherwise, F may need to be drawn from the set of NE or from enforceable set (see punishment in repeated games) A particular solution is referred to by (F, v) n 44 © Cognitive Radio Technologies, 2007 Desirable Bargaining Axioms about a Solution Strong Efficiency – (F, v) is Pareto Efficient Independence of Irrelevant Alternatives – Individually Rational – (F, v) v Scale Covariance – For any 1, 2, 1, 2, 1, 2 >0, if If G F and G is closed and convex and (F, v)G, then (G, v)=(F, v) Symmetry – then If v1=v2 and {(x1,x2)|(x2,x1)F}=F, then 1(F, v)= 2(F, v) G 1 x1 1 , 2 x 2 2 | x1 , x 2 F G , w 11 F , v 1 , 2 2 F , v 2 45 w 1 v1 1 , 2 v 2 2 © Cognitive Radio Technologies, 2007 Nash Bargaining Solution NBS F , v arg max x i v i x F , x v i N Interestingly, this is the only bargaining solution which simultaneously satisfies the preceding 5 axioms 46 © Cognitive Radio Technologies, 2007 GT framework for BW allocation [Yaiche]: System Model N users L links Users compete for the total link capacity Each user has a minimum rate MRi and peak rate PRi Admissible rate vector is given by, X 0 x ¡ N | x M R , x P R , and A x C C : vector of link capacities A L*N: alp = 1 if link belongs to path p, else 0. 47 Scenario given in H. Yaiche, R. Mazumdar, C. Rosenberg, “A game theoretic framework for bandwidth allocation and pricing in broadband networks”, IEEE/ACM Transactions on Networking, Volume: 8 , Issue: 5 , Oct. 2000, pp. 667-678. © Cognitive Radio Technologies, 2007 Centralized Optimization Problem N M ax x x i M R i v i 1 st : xi M Ri i 1 K N xi P Ri i 1 K N A x l C l l 1 K L NBS exists 48 © Cognitive Radio Technologies, 2007 F Steady-State Summary Not every game has a steady-state NE are analogous to fixed points of self-interested decision processes NE can be applied to procedural and ontological radios – A game (network) may have 0, 1, or many steady-states All finite normal form games have an NE in its mixed extension – Don’t need to know decision rule, only goals, actions, and assumption that radios act in their own interest Over multiple iterations, implies constant adaptation More complex game models yield more complex steady-state concepts Can define steady-states concepts for coalitional games – Frequently so broad that specific solutions are used 49 © Cognitive Radio Technologies, 2007 Evaluating Equilibria Objective Function Maximization, Pareto Efficiency, Notions of Fairness 50 © Cognitive Radio Technologies, 2007 Optimality In general we assume the existence of some design objective function J:A The desirableness of a network state, a, is the value of J(a). In general maximizers of J are unrelated to fixed points of d. Figure from Fig 2.6 in I. Akbar, “Statistical Analysis of Wireless Systems Using Markov Models,” PhD Dissertation, Virginia Tech, January 2007 51 © Cognitive Radio Technologies, 2007 Example Functions Utilitarian – – Sum of all players’ utilities Product of all players’ utilities Practical – – – – Utilitarian Maximizers Total system throughput Average SINR Maximum End-to-End Latency Minimal sum system interference System Throughput Maximizers Objective can be unrelated to utilities Interference Minimization 52 © Cognitive Radio Technologies, 2007 Price of Anarchy (Factor) Performance of Centralized Algorithm Solution Performance of Distributed Algorithm Solution 1 Centralized solution always at least as good as distributed solution – Like ASIC is always at least as good as DSP Ignores costs of implementing algorithms – – Sometimes centralized is infeasible (e.g., routing the Internet) Distributed can sometimes (but not generally) be more costly than centralized 9.6 7 53 © Cognitive Radio Technologies, 2007 Implications Best of All Possible Worlds – Low complexity distributed algorithms with low anarchy factors Reality implies mix of methods – Hodgepodge of mixed solutions – Policy – bounds the price of anarchy Utility adjustments – align distributed solution with centralized solution Market methods – sometimes distributed, sometimes centralized Punishment – sometimes centralized, sometimes distributed, sometimes both Radio environment maps –”centralized” information for distributed decision processes Fully distributed Potential game design – really, the panglossian solution, but only applies to particular problems 54 © Cognitive Radio Technologies, 2007 Pareto efficiency (optimality) Formal definition: An action vector a* is Pareto efficient if there exists no other action vector a, such that every radio’s valuation of the network is at least as good and at least one radio assigns a higher valuation Informal definition: An action tuple is Pareto efficient if some radios must be hurt in order to improve the payoff of other radios. Important note – – Like design objective function, unrelated to fixed points (NE) Generally less specific than evaluating design objective function 55 © Cognitive Radio Technologies, 2007 Example Games Legend Pareto Efficient NE NE + PE a2 b2 a1 1,1 -5,5 b1 5,-5 -1,-1 a2 b2 a1 1,1 -5,5 b1 5,-5 3, 3 56 © Cognitive Radio Technologies, 2007 Notions of Fairness What is “Fair”? – – Abstractly “fair” means different things to different analysts In every day life, really just short hand for “I deserve more than I got” Nonetheless is used to evaluate how equitably radio resources are distributed 57 © Cognitive Radio Technologies, 2007 Basic concept: – – – – 58 Order players by utility. Form CDF for sorted utility distribution (Lorenz curve) Integrate (sum) the difference between perfect equality (of outcome) and CDF Divide result by sum of all players’ utilities Formula n 1 i ui a 1 G a n 1 2 i N n ui a i N Used in a lot of macro-economic comparisons of income distributions Relatively simple, independent of scale, independent of size of N, anonymity Radically different outcomes can give the same result Aggregate Utility Gini Coefficient Lorenz curve Player # G N W n 0 0.37 w 0.37 0 © Cognitive Radio Technologies, 2007 Other Metrics of Fairness Theill Index ui a ui a T a ln n i N u u 1 u a 1 u a n i i N Atkinson Index, is income inequality aversion 11 T a 1 u n 11 T a 1 u n i N u a i i N ui a 1 1 1 1/ n , 1 59 © Cognitive Radio Technologies, 2007 , 0,1 Summary of Equilibria Evaluation Lots of different ways which a point can be evaluated Many are contradictory Loosely, any point could be said to be optimal given the right objective function Insufficient to say that a point is optimal Must describe the metric in use Suggestion: use whatever metric makes sense to you as a network designer 60 © Cognitive Radio Technologies, 2007 The Notion of Time and Imperfections in Games and Networks Extensive Form Games, Repeated Games, Convergence Concepts in Normal Form Games, Trembling Hand Games, Noisy Observations 61 © Cognitive Radio Technologies, 2007 Model Timing Review When decisions are made also matters and different radios will likely make decisions at different time Tj – when radio j makes its adaptations – – Generally assumed to be an infinite set Assumed to occur at discrete time Consistent with DSP implementation T=T1T2Tn tT Decision timing classes Synchronous – Round-robin – – One at a time in order Used in a lot of analysis Random – All at once One at a time in no order Asynchronous – – Random subset at a time Least overhead for a network 62 © Cognitive Radio Technologies, 2007 Extensive Form Game Components Components A set of players. The actions available to each player at each decision moment (state). A way of deciding who is the current decision maker. Outcomes on the sequence of actions. Preferences over all outcomes. 1. 2. 3. 4. 5. A Silly Jammer Avoidance Game Game Tree Representation B 1 A 1 2 63 2 B 1’ 2’ Strategic Form Equivalence 1,-1 Strategies for A 1,1’ 1,2’ 2,1’ 2,2’ {1,2} -1,1 Strategies for B 1 1,-1 1,-1 -1,1 -1,1 -1,1 {(1,1’),(1,2’), (2,1’),(2,2’)} 2 -1,1 1,-1 -1,1 1,-1 1,-1 © Cognitive Radio Technologies, 2007 Backwards Induction Concept – – – – Reason backwards based on what each player would rationally play Predicated on Sequential Rationality Sequential Rationality – if starting at any decision point for a player in the game, his strategy from that point on represents a best response to the strategies of the other players Subgame Perfect Nash Equilibrium is a key concept (not formally discussed today). Alternating Packet Forwarding Game 2 1 1 C 2 C 1 C 2 C C 4,6 C 5,3 0,2 3,1 2,4 S S S S S S 64 1,0 0,2 3,1 2,4 5,3 © Cognitive Radio Technologies, 2007 4,6 7,5 Comments on Extensive Form Games Actions will generally not be directly observable However, likely that cognitive radios will build up histories Ability to apply backwards induction is predicated on knowing other radio’s objectives, actions, observations and what they know they know… – Likely not practical Really the best choice for modeling notion of time when actions available to radios change with history 65 © Cognitive Radio Technologies, 2007 Repeated Games Same game is repeated – – Indefinitely Finitely Players consider discounted payoffs across multiple stages – S ta g e 1 S ta g e 2 Stage k ui a ui a – Expected value over all future stages k k k ui a k k ui a k S ta g e k k 0 66 © Cognitive Radio Technologies, 2007 Lesser Rationality: Myopic Processes Players have no knowledge about utility functions, or expectations about future play, typically can observe or infer current actions Best response dynamic – maximize individual performance presuming other players’ actions are fixed Better response dynamic – improve individual performance presuming other players’ actions are fixed Interesting convergence results can be established 67 © Cognitive Radio Technologies, 2007 Paths and Convergence Path [Monderer_96] – – – – – A path in is a sequence = (a0, a1,…) such that for every k 1 there exists a unique player such that the strategy combinations (ak-1, ak) differs in exactly one coordinate. Equivalently, a path is a sequence of unilateral deviations. When discussing paths, we make use of the following conventions. Each element of is called a step. a0 is referred to as the initial or starting point of . Assuming is finite with m steps, am is called the terminal point or ending point of and say that has length m. Cycle [Voorneveld_96] – A finite path = (a0, a1,…,ak) where ak = a0 68 © Cognitive Radio Technologies, 2007 Improvement Paths Improvement Path – A path = (a0, a1,…) where for all k1, ui(ak)>ui(ak-1) where i is the unique deviator at k Improvement Cycle – – An improvement path that is also a cycle See the DFS example 1 2 5 6 3 69 © Cognitive Radio Technologies, 2007 4 Convergence Properties Finite Improvement Property (FIP) – Weak Finite Improvement Property (weak FIP) – All improvement paths in a game are finite From every action tuple, there exists an improvement path that terminates in an NE. FIP implies weak FIP FIP implies lack of improvement cycles Weak FIP implies existence of an NE 70 © Cognitive Radio Technologies, 2007 Examples Game with FIP A B a 1,-1 0,2 b -1,1 2,2 Weak FIP but not FIP 71 A B C a 1,-1 -1,1 0,2 b -1,1 1,-1 1,2 c 2,0 2,1 2,2 © Cognitive Radio Technologies, 2007 Implications of FIP and weak FIP Assumes radios are incapable of reasoning ahead and must react to internal states and current observations Unless the game model of a CRN has weak FIP, then no autonomously rational decision rule can be guaranteed to converge from all initial states under random and round-robin timing (Theorem 4.10 in dissertation). If the game model of a CRN has FIP, then ALL autonomously rational decision rules are guaranteed to converge from all initial states under random and round-robin timing. – And asynchronous timings, but not immediate from definition More insights possible by considering more refined classes of decision rules and timings 72 © Cognitive Radio Technologies, 2007 Decision Rules 73 © Cognitive Radio Technologies, 2007 Absorbing Markov Chains and Improvement Paths Sources of randomness – – – Timing (Random, Asynchronous) Decision rule (random decision rule) Corrupted observations An NE is an absorbing state for autonomously rational decision rules. Weak FIP implies that the game is an absorbing Markov chain as long as the NE terminating improvement path always has a nonzero probability of being implemented. This then allows us to characterize – – – convergence rate, probability of ending up in a particular NE, expected number of times a particular transient state will be visited 74 © Cognitive Radio Technologies, 2007 Connecting Markov models, improvement paths, and decision rules Suppose we need the path = (a0, a1,…am) for convergence by weak FIP. Must get right sequence of players and right sequence of adaptations. Friedman Random Better Response – Random or Asynchronous – Every sequence of players have a chance to occur Random decision rule means that all improvements have a chance to be chosen Synchronous not guaranteed Alternate random better response (chance of choosing same action) – – Because of chance to choose same action, every sequence of players can result from every decision timing. Because of random choice, every improvement path has a chance of occurring 75 © Cognitive Radio Technologies, 2007 Convergence Results (Finite Games) If a decision rule converges under round-robin, random, or synchronous timing, then it also converges under asynchronous timing. Random better responses converge for the most decision timings and the most surveyed game conditions. – Implies that non-deterministic procedural cognitive radio implementations are a good approach if you don’t know much about the network. 76 © Cognitive Radio Technologies, 2007 Impact of Noise Noise impacts the mapping from actions to outcomes, f :AO Same action tuple can lead to different outcomes Most noise encountered in wireless systems is theoretically unbounded. Implies that every outcome has a nonzero chance of being observed for a particular action tuple. Some outcomes are more likely to be observed than others (and some outcomes may have a very small chance of occurring) 77 © Cognitive Radio Technologies, 2007 DFS Example Consider a radio observing the spectral energy across the bands defined by the set C where each radio k is choosing its band of operation fk. Noiseless observation of channel ck oi c k g ki p k c k , f k k N Noisy observation o i c k g ki p k c k , f k n i c k , t k N If radio is attempting to minimize inband interference, then noise can lead a radio to believe that a band has lower or higher interference than it does 78 © Cognitive Radio Technologies, 2007 Trembling Hand (“Noise” in Games) Assumes players have a nonzero chance of making an error implementing their action. – – Who has not accidentally handed over the wrong amount of cash at a restaurant? Who has not accidentally written a “tpyo”? Related to errors in observation as erroneous observations cause errors in implementation (from an outside observer’s perspective). 79 © Cognitive Radio Technologies, 2007 Noisy decision rules Noisy utility u i a , t u i a n i a , t Trembling Hand Observation Errors 80 © Cognitive Radio Technologies, 2007 Implications of noise For random timing, [Friedman] shows game with noisy random better response is an ergodic Markov chain. Likewise other observation based noisy decision rules are ergodic Markov chains – – – Unbounded noise implies chance of adapting (or not adapting) to any action If coupled with random, synchronous, or asynchronous timings, then CRNs with corrupted observation can be modeled as ergodic Makov chains. Not so for round-robin (violates aperiodicity) Somewhat disappointing – No real steady-state (though unique limiting stationary distribution) 81 © Cognitive Radio Technologies, 2007 DFS Example with three access points 3 access nodes, 3 channels, attempting to operate in band with least spectral energy. Constant power Link gain matrix 3 1 Noiseless observations Random timing 82 © Cognitive Radio Technologies, 2007 2 Trembling Hand Transition Matrix, p=0.1 Limiting distribution 83 © Cognitive Radio Technologies, 2007 Noisy Best Response Transition Matrix, (0,1) Gaussian Noise Limiting stationary distributions 84 © Cognitive Radio Technologies, 2007 Comment on Noise and Observations Cardinality of goals makes a difference for cognitive radios – – Probability of making an error is a function of the difference in utilities With ordinal preferences, utility functions are just useful fictions Might as well assume a trembling hand Unboundedness of noise implies that no state can be absorbing for most decision rules NE retains significant predictive power – – – – While CRN is an ergodic Markov chain, NE (and the adjacent states) remain most likely states to visit Stronger prediction with less noise Also stronger when network has a Lyapunov function Exception - elusive equilibria ([Hicks_04]) 85 © Cognitive Radio Technologies, 2007 Summary Given a set of goals, an NE is a fixed point for all radios with those goals for all autonomously rational decision processes Traditional engineering analysis techniques can be applied in a game theoretic setting – Network must have weak FIP for autonomously rational radios to converge – Weak FIP implies existence of absorbing Markov chain for many decision rules/timings In practical system, network has a theoretically nonzero chance of visiting every possible state (ergodicity), but does have unique limiting stationary distribution – – Markov chains to improvement paths Specific distribution function of decision rules, goals Will be important to show Lyapunov stability Shortly, we’ll cover potential games and supermodular games which can be shown to have FIP or weak FIP. Further potential games have a Lyapunov function. 86 © Cognitive Radio Technologies, 2007 Designing Cognitive Radio Networks to Yield Desired Behavior Policy, Cost Functions, Global Altruism, Supermodular Games, Potential Games 87 © Cognitive Radio Technologies, 2007 Policy Concept: Constrain the available actions so the worst cases of distributed decision making can be avoided Not a new concept – – Policy has been used since there’s been an FCC What’s new is assuming decision makers are the radios instead of the people controlling the radios 88 © Cognitive Radio Technologies, 2007 Policy applied to radios instead of humans mask Need a language to convey policy – – Might need to tie in to humans Need a source for policy – – Policy engine? Need an enforcement mechanism – frequency Policies How do radios interpret policy – Learn what it is Expand upon policy later Who sets it? Who resolves disputes? Logical extreme can be quite complex, but logical extreme may not be necessary. 89 © Cognitive Radio Technologies, 2007 802.22 Example Policies Detection – – – Digital TV: -116 dBm over a 6 MHz channel Analog TV: -94 dBm at the peak of the NTSC (National Television System Committee) picture carrier Wireless microphone: -107 dBm in a 200 kHz bandwidth. Transmitted Signal – – – 4 W Effective Isotropic Radiated Power (EIRP) Specific spectral masks Channel vacation times L. Challapali, D. Birru, S. Shankar, “IEEE 802.22: The First Worldwide Wireless Standard based on Cognitive Radios,” 90C.IEEECordeiro, DySPAN2005, Nov 8-11, 2005 Baltimore, MD. © Cognitive Radio Technologies, 2007 Cost Adjustments Concept: Centralized unit dynamically adjusts costs in radios’ objective functions to ensure radios operate on desired point u i a u i a ci a Example: Add -12 to use of wideband waveform 91 © Cognitive Radio Technologies, 2007 Comments on Cost Adjustments Permits more flexibility than policy – If a radio really needs to deviate, then it can Easy to turn off and on as a policy tool – – Example: protected user shows up in a channel, cost to use that channel goes up Example: prioritized user requests channel, other users’ cost to use prioritized user’s channel goes up (down if when done) 92 © Cognitive Radio Technologies, 2007 Global Altruism: distributed, but more costly Concept: All radios distributed all relevant information to all other radios and then each independently computes jointly optimal solution – – C = cost of computation I = cost of information transfer from node to node n = number of nodes Distributed – nC + n(n-1)I/2 Centralized (election) – Proposed for spreading code allocation in Popescu04, Sung03 Used in xG Program (Comments of G. Denker, SDR Forum Panel Session on “A Policy Engine Framework”) Overhead ranges from 5%-27% C + 2(n-1)I Price of anarchy = 1 May differ if I is asymmetric 93 © Cognitive Radio Technologies, 2007 Improving Global Altruism Global altruism is clearly inferior to a centralized solution for a single problem. However, suppose radios reported information to and used information from a common database – And suppose different radios are concerned with different problems with costs C1,…,Cn Centralized – – n(n-1)I/2 => 2nI Resources = 2(n-1)I + sum(C1,…,Cn) Time = 2(n-1)I + sum(C1,…,Cn) Distributed – – Resources = 2nI + sum(C1,…,Cn) Time = 2I + max (C1,…,Cn) 94 © Cognitive Radio Technologies, 2007 Example Application: Overlay network of secondary users (SU) free to adapt power, transmit time, and channel Without REM: – Decisions solely based on link SINR With REM – Radios effectively know everything Upshot: A little gain for the secondary users; big gain for primary users 95 From: Y. Zhao, J. Gaeddert, K. Bae, J. Reed, “Radio Environment Map Enabled SituationAlgorithms,” SDR Forum Technical Conference 2006. Radio Learning © Aware CognitiveCognitive Radio Technologies, 2007 Comments on Radio Environment Map Local altruism also possible – Less information transfer Like policy, effectively needs a common language Nominally could be centralized or distributed database Read more: – Y. Zhao, B. Le, J. Reed, “Infrastructure Support – The Radio Environment MAP,” in Cognitive Radio Technology, B. Fette, ed., Elsevier 2006. 96 © Cognitive Radio Technologies, 2007 Repeated Games Same game is repeated – – Indefinitely Finitely Players consider discounted payoffs across multiple stages – k ui a k k Expected value over all future stages 97 S ta g e 2 Stage k ui a – S ta g e 1 ui a k k 0 k ui a k S ta g e k © Cognitive Radio Technologies, 2007 Impact of Strategies Rather than merely reacting to the state of the network, radios can choose their actions to influence the actions of other radios Threaten to act in a way that minimizes another radio’s performance unless it implements the desired actions Common strategies – – – Tit-for-tat Grim trigger Generous tit-for-tat Play can be forced to any “feasible” payoff vector with proper selection of punishment strategy. 98 © Cognitive Radio Technologies, 2007 Impact of Communication on Strategies Players agree to play in a certain manner Threats can force play to almost any state – 99 Breaks down for finite number of stages Nada C N nada 0,0 -5,5 -100,0 c 5,-5 -1,1 -100,-1 n 0,-100 -1,-100 -100,-100 © Cognitive Radio Technologies, 2007 Improvement from Punishment 100 Throughput/unit power gains be enforcing a common received power level at a base station Punishment by jamming Without benefit to deviating, players can operate at lower power level and achieve same throughput A. MacKenzie and S. Wicker, “Game Theory in Communications: Motivation, Explanation, and Application to Power Control,” Globecom2001, pp. 821-825. © Cognitive Radio Technologies, 2007 Instability in Punishment Issues arise when radios aren’t directly observing actions and are punishing with their actions without announcing punishment Eventually, a deviation will be falsely detected, punished and without signaling, this leads to a cascade of problems V. Srivastava, L. DaSilva, “Equilibria for Node Participation in Ad Hoc Networks – An Imperfect Monitoring Approach,” ICC 06, June 2006, vol 8, pp. 3850-3855 101 © Cognitive Radio Technologies, 2007 Comments on Punishment Works best with a common controller to announce Problems in fully distributed system – – Problems when actions cannot be directly observed – Leads to Byzantine problem No single best strategy exists – – Need to elect a controller Otherwise competing punishments, without knowing other players’ utilities can spiral out of control Strategy flexibility is important Significant problems with jammers (they nominally receive higher utility when “punished” Generally better to implement centralized controller – Operating point has to be announced anyways 102 © Cognitive Radio Technologies, 2007 What does game theory bring to the design of cognitive radio networks? (1/2) A natural “language” for modeling cognitive radio networks Permits analysis of ontological radios – Only know goals and that radios will adapt towards its goal Simplifies analysis of random procedural radios Permits simultaneous analysis of multiple decision rules – only need goal Provides condition to be assured of possibility of convergence for all autonomously myopic cognitive radios (weak FIP) 103 © Cognitive Radio Technologies, 2007 What does game theory bring to the design of cognitive radio networks? (2/2) Provides condition to be assured of convergence for all autonomously myopic cognitive radios (FIP, not synchronous timing) Rapid analysis – Verify goals and actions satisfy a single model, and steadystates, convergence, and stability An intuition as to what conditions will be needed to field successful cognitive radio decision rules. A natural understanding of distributed interactive behavior which simplifies the design of low complexity distributed algorithms 104 © Cognitive Radio Technologies, 2007 Game Models of Cognitive Radio Networks Almost as many models as there are algorithms Normal form game excellent for capturing single iteration of a complex system Most other models add features to this model – Time, decision rules, noisy observations, Natural states Some can be recast as a normal form game – Normal Form Game – Repeated Game – N, A, {ui}, {di}, T TU Game – N, A, {ui}, {di}, T Extensive Form Game – N, A, {ui}, {di} Asynchronous Game – N, A, {ui} N, v Bargaining Game – N, v Extensive form game 105 © Cognitive Radio Technologies, 2007 Steady-states Different game models have different steady-state concepts Games can have many, one, or no steady-states Nash equilibrium (and its variants) is most commonly applied concept – Excellent for distributed noncollaborative algorithms Nash Equilibrium Strong Nash Equilibrium Core Shapley Value Nash Bargaining Solution Games with punishment and Coalitional games tend to have a very large number of equilibria Game theory permits analysis of steady-states without knowing specific decision rules 106 © Cognitive Radio Technologies, 2007 Optimality Numerous different notions of optimality Many are contradictory Use whatever metric makes sense Pareto Efficiency Objective Maximization Gini Index Shapley Value Nash Bargaining Solution 107 © Cognitive Radio Technologies, 2007 Convergence Showing existence of steady-states is insufficient; need to know if radios can reach those states FIP (potential games) gives the broadest convergence conditions Random timing actually helps convergence 108 © Cognitive Radio Technologies, 2007 Noise Unbounded noise causes all networks to theoretically behave as ergodic Markov chains Important to show Lyapunov stabiltiy Noisy observations cause noisy implementation to an outside observer – Trembling hand 109 © Cognitive Radio Technologies, 2007 Game Theory and Design Numerous techniques for improving the behavior of cognitive radio networks Techniques can be combined Potential games yield lowest complexity implementations – – Observing actions Likely best when a referee exists Policy can limit the worst effects, doesn’t really address optimality or convergence issues Punishment – – Limits worst case performance Cost function – – Can enforce any action tuple Can be brittle when distributed Policy – Practical limitations limit effectiveness of punishment – Judicious design of goals, actions, Reshapes preferences Could damage underlying structure if not a selfinterested cost Centralized – – – Can theoretically realize any result Consumes overhead Slower reactions 110 © Cognitive Radio Technologies, 2007 Future Directions in General Game Theory Research and Cognitive Radio Design Integrate policy and potential games Integration of coalitional and distributed forms Increasing dimensionality of action sets – Cross-layer Integration of dynamic and hierarchical policies and games 111 © Cognitive Radio Technologies, 2007 Future Direction in Regulation Can incorporate optimization into policy by specifying goals In theory, correctly implementing goals, correctly implementing actions, and exhaustive self-interested adaptation is enough to predict behavior (at least for potential games) – Simpler policy certification Provable network behavior 112 © Cognitive Radio Technologies, 2007 Avenues for Future Research on Game Theory and CRNs Integration of bargaining, centralized, and distributed algorithms into a common framework Cross-layer algorithms Better incorporating performance of classification techniques into behavior Asymmetric potential games Bargaining algorithms for cognitive radio Improving the brittleness of punishment in distributed implementations with imperfect observations Imperfection in observations in general Time varying game models while inferring convergence, stability… Combination of policy, potential games, coalition formation, and token economies Can be modeled as a game with to types of players – – 113 © Cognitive Radio Technologies, 2007 Distributed cognitive radios Dynamic policy provider

Descargar
# Cognitive Radio and Wireless Trends