```Game Theory and Cognitive Radio
Networks
Basic Game Models
Equilibria Concepts
Optimality Concepts
Convergence and Noise
1
Game Models
Models of interactive
decision processes
2
Game
1.
2.
3.
A (well-defined) set of 2 or more players
A set of actions for each player.
A set of preference relationships for each player for
each possible action tuple.
• More elaborate games exist with more components but these
three must always be there.
• Some also introduce an outcome function which maps action
tuples to outcomes which are then valued by the preference
relations.
• Games with just these three components (or a variation on
the preference relationships) are said to be in Normal form
or Strategic Form
3
Set of Players (decision makers)






N – set of n players consisting of players “named” {1,
2, 3,…,i, j,…,n}
Note the n does not mean that there are 14 players in
every game.
Other components of the game that “belong” to a
particular player are normally indicated by a
subscript.
Generic players are most commonly written as i or j.
Usage: N is the SET of players, n is the number of
players.
N \ i = {1,2,…,i-1, i+1 ,…, n} All players in N except
for i
4
Actions
Ai – Set of available actions for player i
Example Two Player
ai – A particular action chosen by i, ai  Ai Action Space
A – Action Space, Cartesian product of all Ai A1 = A2 = [0 )
A=A1 A2
A=A1 A2· · ·  An
a – Action tuple – a point in the Action
Space
A2 = A-1
A-i – Another action space A formed from
A-i =A1 A2· · · Ai-1  Ai+1  · · ·  An
a-i – A point from the space A-i
a2 = a-1
a
b
b2 = b-1
A = Ai  A-i
b1 = b-2
a1 = a-2
5
A1= A-2
Preference Relations (1/2)
Preference Relation expresses an individual player’s desirability of
one outcome over another (A binary relationship)
i
Preference Relationship (prefers at least as much as)
o
i
~i
i
o
*
o is preferred at least as much as o* by player i
Strict Preference Relationship (prefers strictly more than)
*
o i o iff o i o * but not o *
o
i
“Indifference” Relationship (prefers equally)
o ~i o
*
iff o
i
o
*
*
and o
i
6
o
Preference Relationship (2/2)


Games generally assume the relationship
between actions and outcomes is invertible
so preferences can be expressed over action
vectors.
Preferences are really an ordinal relationship
–
Know that player prefers one outcome to another,
but quantifying by how much introduces
difficulties
7
Utility Functions (1/2)
(Objective Fcns, Payoff Fcns)
A mathematical description of preference relationships.
Maps action space to set of real numbers.
ui : A  R
Preference Relation then defined as
a
a
*
i
a
*
i
a
*
iff u i  a   u i  a 
iff
ui  a   ui  a
*

*
u
a

u
a
iff
a ~i a

i  
i 
*
8
Utility Functions (2/2)
By quantifying preference relationships all sorts of valuable
mathematical operations can be introduced.
Also note that the quantification operation is not unique as
long as relationships are preserved. Many map preference
relationships to [0,1].
Example
Jack prefers Apples to Oranges
Apples
Jack
Oranges
u Jack  A p p les   u Jack  O ra n g es 
a) uJack(Apples) = 1, uJack(Oranges) = 0
9
b) uJack(Apples) = -1, uJack(Oranges) = -7.5
Variety of game models

Normal Form Game <N,A,{ui}>
–
–
–

Synchronous play
T is a singleton
Perfect knowledge of action space, other players’ goals (called
utility functions)
Repeated Game <N,A,{ui},{di}>
–
–
–
–
Repeated synchronous play of a normal form game
T may be finite or infinite
Perfect knowledge of action space, other players’ goals (called
utility functions)
Players may consider actions in future stages and current stages


Asynchronous myopic repeated game <N,A,{ui},{di},T>
–
–

Strategies (modified di)
Repeated play of a normal form game under various timings
Radios react to most recent stage, decision rule is “intelligent”
Many others in the literature and in the dissertation
10
as players in a game
Infer from Context
Utility function
Arguments
Establish Priority
Normal
Immediate
Observe
Outcome Space
Outside
World
11
Utility Function
Orient
Autonomous
Goal
Plan
Normal
Urgent
Learn
New
States
Decide
States
Act
\
Decision
Rules
Allocate Resources
Initiate Processes
Action
Negotiate
Sets
Technologies, 2007 ”, IEEE Mobile Multimedia Conference, 1999, pp 3-10.
Interaction is naturally modeled as a
game
Actions
Decision
Rules
u1
Actions
Decision
Rules
Action Space
Informed by
Communications
Theory
u 1  ˆ1 
f :AO
Outcome Space
 ˆ1 , ˆ 2 
12
u 2  ˆ 2 
u2
Some differences between game models

Assuming numerous iterations, normal form game only has a
single stage.
–
–

Repeated games are explicitly used as the basis for cognitive
radio algorithm design (e.g., Srivastava, MacKenzie)
–
–
13
Useful for compactly capturing modeling components at a single
stage
Normal form game properties will be exploited in the analysis of
other games
Not however, focus of dissertation
Not the most commonly encountered implementation
Player
Knowledge
Knows A
Can learn O (may know or learn A)
f : A O
Invertible
Constant
Known
Not invertible (noise)
May change over time (though
relatively fixed for short periods)
Has to learn
Preferences Ordinal
Cardinal (goals)
Summary



The interactions in a cognitive
be represented by the tuple
<N, A, {ui}, {di},T>
A dynamical system model
A myopic asynchronous
–
–

14
Suitable for outer-loop
processes
Not shown here, but can also
handle inner-loop
Some differences in models
–
–
Most analysis carries over
Some differences
Dynamical System
Game Model
Equilibrium Concepts
Nash Equilibria, Mixed
Strategy Equilbria,
Coalitional Games, the
Core, Shapley Value,
Nash Bargaining
15





Recall model of <N,A,{di},T> which we characterize with the
evolution function d
Steady-state is a point where a*= d(a*) for all t t *
Obvious solution: solve for fixed points of d.
For non-cooperative radios, if a* is a fixed point under
synchronous timing, then it is under the other three timings.
Works well for convex action spaces
–
–

Not always guaranteed to exist
Value of fixed point theorems
Not so well for finite spaces
–
Generally requires exhaustive search
16
Nash Equilibrium
“A steady-state where each player holds a correct
expectation of the other players’ behavior and acts
rationally.” - Osborne
An action vector from which no player can profitably
unilaterally deviate.
Definition
An action tuple a is a NE if for every i  N u i  a i , a  i   u i  bi , a  i 
for all bi Ai.
17
Note showing that a point is a NE says nothing about the
process by which the steady state is reached. Nor
Also note that we are implicitly assuming that only pure
strategies are possible in this case.
Examples

–
–
–

signals to choose between
{n,w} and {N,W}
n and N do not overlap
Higher throughput from
operating as a high power
wideband signal when other
is narrowband
Jamming Avoidance
–
–
Two channels
No NE
Jammer
Transmitter
18
0
1
0
(-1,1)
(1,-1)
1
(1,-1)
(-1,1)
How do the players find the Nash
Equilibrium?

Preplay Communication
–

Rational Introspection
–

Based on what each player knows about the other players, reason
what the other players would do in its own best interest. (Best
Response - tomorrow) Points where everyone would be playing
“correctly” are the NE.
Focal Point
–

Before the game, discuss their options. Note only NE are suitable
candidates for coordination as one player could profitably violate any
agreement.
Some distinguishing characteristic of the tuple causes it to stand out.
The NE stands out because it’s every player’s best response.
Trial and Error
–
Starting on some tuple which is not a NE a player “discovers” that
deviating improves its payoff. This continues until no player can
improve by deviating. Only guaranteed to work for Potential Games
(couple weeks)
19
Iterative Elimination of Strongly
Dominated Strategies (IESDS)
Motivation
Sometimes a player’s actions are not preferable, no matter
what the other players do.
These actions would thus never rationally be played and can
be eliminated from consideration in any NE action vector.
Definition
An action of player i, ai, is said to be dominated, if there exists
some other action bi such that
u i  a i , a  i   u i  bi , a  i   a  i  A  i
Note this is strictly the definition for strong domination and the
technique being discussed is actually Strong IEDS (or IESDS).
Normal (weak) IEDS and sometimes results in incorrect elimination
of NE.
20
IEDS Algorithm
1. Remove any readily apparent dominated
actions from the analysis.
2. Check to see if this reveals new dominated
actions.
3. If there are any dominated actions, repeat
from 1, else continue to 4.
4. Apply NE definition on any remaining action
tuples.
21
Example IEDS
Iteration 1.
Note the following
D
C
D
-1, -1
-10, 0
C
0, -10
u1  C , D   u1  D , D 
u1  C , C   u1  D , C 
So D is dominated by C for player
1. So we remove D for player 1
from the game.
Iteration 2.
Note the following
u2 C , C   u2 C , D 
22
-5, -5
As there is only one action tuple
left (thus no deviation is possible,
nor is a profitable deviation),
it must be a NE
So in the remaining game D is
dominated by C for player 2. So we
remove D for player 2 from the game.




IESDS is not useful for all games as many
games have no dominated strategies.
While each iterative elimination is rational,
Dutta states that rationality is more difficult to
assume when a large number of iterative
eliminations are required. (I disagree for
repetitive play)
IESDS can also augment other solution
techniques.
If a game is IESDS solvable
23
Nash Equilibrium as a Fixed Point

Individual Best Response
Bˆ i  a    bi  Ai : u i  bi , a  i   u i  a i , a  i   a i  Ai 

Synchronous Best Response
Bˆ  a    Bˆ i  a 
i N

Nash Equilibrium as a fixed point

Fixed point theorems can be used to establish
existence of NE (see dissertation)
NE can be solved by implied system of equations

*
*
a  Bˆ  a 
24
Example solution for Fixed Point by Solving for Best
Response Fixed Point

Bandwidth Allocation Game
–
–
–
–
the number of simultaneous frequency hopping channels
Goal u i  c   P  c  c i  C i  c i 
P(c) fraction of symbols that are not interfered with (making
P(c)ci the goodput for radio i)
Ci(ci) is radio i’s cost for supporting ci simultaneous
channels.

ui  c    B 


 c k  ci  K ci
kN

25
Best Response Analysis
Goal

ui  c    B 


 c k  ci  K ci
kN

Best Response

ˆ
ci  Bi  c    B  K 


 ck  / 2
kN \i

Simultaneous
System of
Equations
Solution
cˆi   B  K  / 6  i  N
26
Generalization
cˆi   B  K  /  N  1   i  N
Significance of NE for CRNs
Autonomously Rational Decision Rule

Why not “if and only if”?
–
–


Consider a self-motivated game with a local maximum and a hill-climbing algorithm.
For many decision rules, NE do capture all fixed points (see dissertation)
Identifies steady-states for all “intelligent” decision rules with the same goal.
Implies a mechanism for policy design while accommodating differing
implementations
–
–
Verify goals result in desired performance
27
Key Theorem for NE Existence

Kakutani’s Fixed Point
Theorem
–
Let f :X X be an upper
semi-continuous convex
valued correspondence
from a non-empty
compact convex set X 
n, then there is some
x*X such that x*  f(x*)
f(x)
x
28
Nash Equilibrium Existence
Visualizable Definition of Quasi-Concavity
All upper-level sets are convex
U a
*
  a  A : f  a   a 
*
f (a )
U (a 1 )
29
a0
a1
a2
a
My Favorite Mixed Strategy Story
Pure Strategies in an Extended Game
Consider an extensive form game where each stage is a strategic form game
and the action space remains the same at each stage.
Before play begins, each player chooses a probabilistic strategy that assigns
a probability to each action in his action set. At each stage, the player
chooses an action from his action set according to the probabilities he
assigned before play began.
Example
Consider a video football game which will be simulated. Before the game
begins two players assign probabilities of calling running plays or passing
plays for both offense or defense. In the simulation, for each down the kind of
play chosen by each team is based on the initial probabilities assigned to kinds
of plays.
30
Example Mixed Strategy Game
Jamming game
p
a1
q
(1-q)
a2
b2
1,-1
-1, 1
Action Tuples Probabilities
pq
(a1,a2)
p(1-q)
(a1,b2)
(1-p)q
(b1,a2)
(1-p)(1-q)
(b1,b2)
Expected Utilities
(1-p) b1
-1, 1
1, -1
U 1  p , q   pq 1   p 1  q    1  
 1  p  q   1   1  p  1  q  1 
Sets of probability distributions
U 2  p , q   pq   1   p 1  q  1  
(A1)={p,(1-p): p[0,1]}
(A2)={q,(1-q): q[0,1]}
31
 1  p  q 1   1  p  1  q    1 
Nash Equilibrium in a Mixed Strategy
Game
Definition Mixed Strategy Nash Equilibrium
A mixed strategy profile * is a NE iff iN
U i   i ,   i   U i   i ,   i    i    Ai 
*
*
*
Best Response Correspondence
B R i    i   arg m ax U i   i ,   i 
 i   Ai 
Alternate NE Definition
Consider B      i N B R i   
A mixed strategy profile * is a NE iff
  B 
*
32
*

Nash Equilibrium
U 1  p , q   4 pq  2 p  2 q  1
Best Response Correspondences
1
U 2  p , q     4 pq  2 p  2 q  1 
q
Best Response
 u1
p
u 2
q
 q   4q  2
 p   4 p  2
0

B R1  q    [0,1]
1

B R2 
33
1

p    [0,1]
0

p
1-q
p(a1)
BR2(p)
0.5
1-p
BR1(q)
q  1/ 2
q  1/ 2
q  1/ 2
p  1/ 2
0
1
Note: NE in mixed extension which
did not exist in original
p  1/ 2
p  1/ 2
p(a2) 0.5
Interesting Properties of Mixed Strategy
Games
1.
2.
3.
Every Mixed Extension of a Strategic Game
has an NE.
A mixed strategy i is a best response to -i
iff every action in the support of i is itself a
best response to -i.
Every action in the support of any player’s
equilibrium mixed strategy yields the same
payoff to that player.
34
Coalitional Game (with transferable
payoff)


Concept: groups of players (called coalitions) conspire together to
implement actions which yields a result for the coalition. The value
received by the coalition is then distributed among the coalition
members.
Where do radios collaborate and distribute value?
–
–
–


v:2 \ 
Transferable utility refers to existence of some commodity
for which a
player’s utility increases by one unit for every unit of the commodity it
Game Components, N,v
N
–
–
–

N set of players
Characteristic function
Coalition, SN
How is this value distributed?
–

802.16h interference groups – allocation of bandwidth
Distribution of frequencies/spreading codes among cells
File sharing in P2P network
Payoff vector, (xi)iS
Payoff vector is said to be S-feasible if x(S)  v(S) x  S    x i
i S
35
The Core (Transferrable)

The Core
–

For N,v, the set of feasible payoff profiles, (xi)iS
for which there is no coalition S and S-feasible
payoff vector (yi)iS for which yi > xi for all iS.
General principles of the NE also apply to the
Core:
–
–
Number of solutions for a game may be anywhere
from 0 to 
May be stable or unstable.
36
Example


Suppose three radios, N = {1,2,3}, can choose to
participate in a peer-to-peer network.
Characteristic Function
–
–
–


v(N) = 1
v({1,2})= v({1,3})= v({2,3})=[0,1]
v(1)= v(2)= v(3) = 0
Loosely,  indicates # of duplicated files
If >2/3, Core is empty
=4/5
x = (2/5, 2/5,0) x = (0, 3/5, 1/5) x = (2/5, 0, 2/5)
x = (1/3, 1/3, 1/3) x = (2/5, 2/5,0)
37


Possibility of empty core implies that even when
radios can freely negotiate and form arbitrary
Frequently very large (infinite) number of steadystates, e.g., <2/3
–


Makes it impossible to predict exact behavior
Existence conditions for the Core, but would need to
cover some linear programming concepts
Related (but not addressed today) concepts:
–
Bargaining Sets, Kernel, Nucleolus
38
Strong NE


Concept: Assume radios are able to collaborate, but
utilities aren’t necessarily transferrable
An action tuple a* such that
 
ui a
*


 u i a S , a  S  S  N , a S   Ai
*
i S
No Strong NE
Unique Strong NE
n
w
39
N
W
(9.6,9.6)
(9.6, 21)
(21, 9.6)
(22, 22)
Motivation for Shapley value


Core was generally either empty or very large. Want a
“good” single solution.
Terminology
–
Marginal Contribution of i
–
Interchangeability of i, j
–
i S   v S  i  v S 
i S   
j
 S  S
 N \ {i , j }
Dummy player (no synergy)
 i  S   v  i   S  N \ i
40
Axioms for Shapley Value


Let  be some distribution of value for a TU coalition
game
Symmetry:
–

Dummy:
–

If i is a dummy, then i(v) = v({i})
–

If i and j are interchangeable, then i(v)=j(v)
Given N,v and N,w, i(v + w)= i(v)+ i(w) for all iN,
where v+w = v(S) + w(S)
Balanced Contributions
–
Given N,v,
 i  N , v    i  N \ j .v
N\j
    N , v     N \ i.v 
N \i
j
41
j
Shapley Value
 i S  

S  N \i
S ! N  S  1 !
N !
v S
i   v  S 
Marginal Value Contributed by i
Probability that i will be next one invited to the grand coalition
given that coalition S is already part of the coalition assuming
random ordering.
Only assignment (value) that satisfies balanced contributions; only
assignment that simultaneously satisfies symmetry, dummy, and
42
Implications of Shapley Value

One form of a fair allocation
–
–
–


Independent of order of arrival
I liken it to setting salaries according to the Value Over
Replacement Player concept in Baseball Statistics
“Better” solution concept than the core as it’s a single
payoff as opposed to a potentially infinite number
Allows for analysis of relative “power” of different
players in the system
43
Bargaining Problem

Components: F, v
–
–
Feasible payoffs F, closed convex subset of n
Disagreement Point v = (v1, v2)


Example:
–
–




What 1 or 2 could achieve without bargaining
Even if system is jammed, still gets some throughput
Member of 802.16h interference group and try its luck
F is said to be essential if there is some yF such that y1>v1 and
y2>v2
If contracts are “binding” then F could be the payoffs
corresponding to entire original action space
Otherwise, F may need to be drawn from the set of NE or from
enforceable set (see punishment in repeated games)
A particular solution is referred to by (F, v) n
44
Solution

Strong Efficiency
–

(F, v) is Pareto Efficient
Independence of
Irrelevant Alternatives
–
Individually Rational
–


(F, v) v
Scale Covariance
–
For any 1, 2, 1, 2,
1, 2 >0, if

If G F and G is closed
and convex and (F,
v)G, then (G, v)=(F,
v)
Symmetry
–
then
If v1=v2 and
{(x1,x2)|(x2,x1)F}=F, then
1(F, v)= 2(F, v)
G    1 x1   1 ,  2 x 2   2  |  x1 , x 2   F 
  G , w    11  F , v    1 ,  2 2  F , v    2 
45
w    1 v1   1 ,  2 v 2   2 
Nash Bargaining Solution

NBS
  F , v   arg max   x i  v i 
x  F , x  v i N

Interestingly, this is the only bargaining
solution which simultaneously satisfies the
preceding 5 axioms
46
GT framework for BW allocation [Yaiche]:
System Model





N users
Users compete for the total link capacity
Each user has a minimum rate MRi and peak rate
PRi
Admissible rate vector is given by,
X 0  x  ¡
N
| x  M R , x  P R , and A x  C 
C : vector of link capacities
A L*N: alp = 1 if link belongs to path p, else 0.
47
Scenario given in H. Yaiche, R. Mazumdar, C. Rosenberg, “A game theoretic framework for bandwidth allocation and pricing in broadband networks”,
IEEE/ACM Transactions on Networking, Volume: 8 , Issue: 5 , Oct. 2000, pp. 667-678.
Centralized Optimization Problem
N
M ax x   x i  M R i 
v
i 1
st :
xi  M Ri
i  1 K N 
xi  P Ri
i  1 K N 
 A x l

  C l
l  1 K L 
NBS exists
48
F



Not every game has a steady-state
NE are analogous to fixed points of self-interested decision
processes
NE can be applied to procedural and ontological radios
–


A game (network) may have 0, 1, or many steady-states
All finite normal form games have an NE in its mixed extension
–


Don’t need to know decision rule, only goals, actions, and
assumption that radios act in their own interest
Over multiple iterations, implies constant adaptation
More complex game models yield more complex steady-state
concepts
Can define steady-states concepts for coalitional games
–
Frequently so broad that specific solutions are used
49
Evaluating Equilibria
Objective Function
Maximization, Pareto
Efficiency, Notions of
Fairness
50
Optimality



In general we assume the
existence of some design
objective function J:A
The desirableness of a
network state, a, is the
value of J(a).
In general maximizers of
J are unrelated to fixed
points of d.
Figure from Fig 2.6 in I. Akbar, “Statistical Analysis of
Wireless Systems Using Markov Models,” PhD
Dissertation, Virginia Tech, January 2007
51
Example Functions

Utilitarian
–
–

Sum of all players’ utilities
Product of all players’
utilities
Practical
–
–
–
–

Utilitarian Maximizers
Total system throughput
Average SINR
Maximum End-to-End
Latency
Minimal sum system
interference
System Throughput Maximizers
Objective can be unrelated
to utilities
Interference Minimization
52
Price of Anarchy (Factor)
Performance of Centralized Algorithm Solution
Performance of Distributed Algorithm Solution
1

Centralized solution always at least as good
as distributed solution
–

Like ASIC is always at least as good as DSP
Ignores costs of implementing algorithms
–
–
Sometimes centralized is infeasible (e.g.,
routing the Internet)
Distributed can sometimes (but not generally)
be more costly than centralized
9.6
7
53
Implications

Best of All Possible Worlds
–

Low complexity distributed algorithms with low anarchy factors
Reality implies mix of methods
–
Hodgepodge of mixed solutions





–
Policy – bounds the price of anarchy
Utility adjustments – align distributed solution with centralized solution
Market methods – sometimes distributed, sometimes centralized
Punishment – sometimes centralized, sometimes distributed,
sometimes both
Radio environment maps –”centralized” information for distributed
decision processes
Fully distributed

Potential game design – really, the panglossian solution, but only
applies to particular problems
54
Pareto efficiency (optimality)



Formal definition: An action vector a* is Pareto
efficient if there exists no other action vector a, such
that every radio’s valuation of the network is at least
as good and at least one radio assigns a higher
valuation
Informal definition: An action tuple is Pareto
efficient if some radios must be hurt in order to
improve the payoff of other radios.
Important note
–
–
Like design objective function, unrelated to fixed points (NE)
Generally less specific than evaluating design objective
function
55
Example Games
Legend
Pareto Efficient
NE
NE + PE
a2
b2
a1
1,1
-5,5
b1
5,-5
-1,-1
a2
b2
a1
1,1
-5,5
b1
5,-5
3, 3
56
Notions of Fairness

What is “Fair”?
–
–

Abstractly “fair” means different things to different analysts
In every day life, really just short hand for “I deserve more
than I got”
Nonetheless is used to evaluate how equitably
57

Basic concept:
–
–
–
–




58
Order players by utility.
Form CDF for sorted utility distribution
(Lorenz curve)
Integrate (sum) the difference between
perfect equality (of outcome) and CDF
Divide result by sum of all players’ utilities
Formula

 n  1  i  ui  a  

1

G a 
n  1  2 i N

n 
 ui  a 

i N


Used in a lot of macro-economic
comparisons of income distributions
Relatively simple, independent of scale,
independent of size of N, anonymity
Radically different outcomes can give the
same result
Aggregate Utility
Gini Coefficient
Lorenz curve
Player #
G
N
W
n
0
0.37
w
0.37
0
Other Metrics of Fairness

Theill Index
 ui  a  ui  a  
T a   
ln

n i N  u
u 
1

u a 
1
u a

n
i
i N
Atkinson Index,  is income inequality aversion
11
T a  1 
u n
11
T a  1 
u n

i N
 u a
i
i N

ui  a  

1 



1 1   
1/ n
,  1
59
,    0,1 
Summary of Equilibria Evaluation






Lots of different ways which a point can be
evaluated
Loosely, any point could be said to be
optimal given the right objective function
Insufficient to say that a point is optimal
Must describe the metric in use
Suggestion: use whatever metric makes
sense to you as a network designer
60
The Notion of Time and Imperfections
in Games and Networks
Extensive Form Games,
Repeated Games, Convergence
Concepts in Normal Form
Games, Trembling Hand
Games, Noisy Observations
61
Model Timing Review


also matters and different
decisions at different time
Tj – when radio j makes its
–
–
Generally assumed to be
an infinite set
Assumed to occur at
discrete time



Consistent with DSP
implementation
T=T1T2Tn
tT

Decision timing classes
Synchronous
–

Round-robin
–
–

One at a time in order
Used in a lot of analysis
Random
–

All at once
One at a time in no order
Asynchronous
–
–
Random subset at a time
62
Extensive Form Game Components
Components
A set of players.
The actions available to each player at each decision moment (state).
A way of deciding who is the current decision maker.
Outcomes on the sequence of actions.
Preferences over all outcomes.
1.
2.
3.
4.
5.
A Silly Jammer Avoidance Game
Game Tree Representation
B 1
A
1
2
63
2
B 1’
2’
Strategic Form Equivalence
1,-1 Strategies for A
1,1’ 1,2’ 2,1’ 2,2’
{1,2}
-1,1
Strategies for B 1 1,-1 1,-1 -1,1 -1,1
-1,1 {(1,1’),(1,2’),
(2,1’),(2,2’)} 2 -1,1 1,-1 -1,1 1,-1
1,-1
Backwards Induction

Concept
–
–
–
–
Reason backwards based on what each player would rationally
play
Predicated on Sequential Rationality
Sequential Rationality – if starting at any decision point for a player
in the game, his strategy from that point on represents a best
response to the strategies of the other players
Subgame Perfect Nash Equilibrium is a key concept (not formally
discussed today).
Alternating Packet Forwarding Game
2
1
1 C 2 C 1 C 2
C
C 4,6
C 5,3
0,2 3,1 2,4
S
S
S
S
S
S
64
1,0
0,2
3,1
2,4
5,3
4,6
7,5



Actions will generally not be directly observable
However, likely that cognitive radios will build up
histories
Ability to apply backwards induction is predicated on
observations and what they know they know…
–

Likely not practical
Really the best choice for modeling notion of time
when actions available to radios change with history
65
Repeated Games

Same game is repeated
–
–

Indefinitely
Finitely
Players consider
discounted payoffs across
multiple stages
–
S ta g e 1
S ta g e 2
Stage k
ui  a    ui  a 
– Expected value over all
future stages
k
k
k

ui
 a    
k
k
ui  a
k

S ta g e k
k 0
66
Lesser Rationality: Myopic Processes




Players have no knowledge about utility functions, or
expectations about future play, typically can observe
or infer current actions
Best response dynamic – maximize individual
performance presuming other players’ actions are
fixed
Better response dynamic – improve individual
performance presuming other players’ actions are
fixed
Interesting convergence results can be established
67
Paths and Convergence

Path [Monderer_96]
–
–
–
–
–

A path in  is a sequence  = (a0, a1,…) such that for every k  1
there exists a unique player such that the strategy combinations
(ak-1, ak) differs in exactly one coordinate.
Equivalently, a path is a sequence of unilateral deviations. When
discussing paths, we make use of the following conventions.
Each element of  is called a step.
a0 is referred to as the initial or starting point of .
Assuming  is finite with m steps, am is called the terminal point or
ending point of  and say that  has length m.
Cycle [Voorneveld_96]
–
A finite path  = (a0, a1,…,ak) where ak = a0
68
Improvement Paths

Improvement Path
–

A path  = (a0, a1,…) where for all k1, ui(ak)>ui(ak-1)
where i is the unique deviator at k
Improvement Cycle
–
–
An improvement path that is also a cycle
See the DFS example
1
2
5
6
3
69
4
Convergence Properties

Finite Improvement Property (FIP)
–

Weak Finite Improvement Property (weak
FIP)
–



All improvement paths in a game are finite
From every action tuple, there exists an
improvement path that terminates in an NE.
FIP implies weak FIP
FIP implies lack of improvement cycles
Weak FIP implies existence of an NE
70
Examples
Game with FIP
A
B
a
1,-1
0,2
b
-1,1
2,2
Weak FIP but not FIP
71
A
B
C
a
1,-1
-1,1
0,2
b
-1,1
1,-1
1,2
c
2,0
2,1
2,2
Implications of FIP and weak FIP



internal states and current observations
Unless the game model of a CRN has weak FIP, then no
autonomously rational decision rule can be guaranteed to converge
from all initial states under random and round-robin timing (Theorem
4.10 in dissertation).
If the game model of a CRN has FIP, then ALL autonomously
rational decision rules are guaranteed to converge from all initial
states under random and round-robin timing.
–

And asynchronous timings, but not immediate from definition
More insights possible by considering more refined classes of
decision rules and timings
72
Decision Rules
73
Absorbing Markov Chains and
Improvement Paths

Sources of randomness
–
–
–



Timing (Random, Asynchronous)
Decision rule (random decision rule)
Corrupted observations
An NE is an absorbing state for autonomously rational decision
rules.
Weak FIP implies that the game is an absorbing Markov chain
as long as the NE terminating improvement path always has a
nonzero probability of being implemented.
This then allows us to characterize
–
–
–
convergence rate,
probability of ending up in a particular NE,
expected number of times a particular transient state will be visited
74
Connecting Markov models, improvement paths, and
decision rules



Suppose we need the path  = (a0, a1,…am) for convergence by
weak FIP.
Must get right sequence of players and right sequence of
Friedman Random Better Response
–
Random or Asynchronous


–

Every sequence of players have a chance to occur
Random decision rule means that all improvements have a chance to
be chosen
Synchronous not guaranteed
Alternate random better response (chance of choosing same
action)
–
–
Because of chance to choose same action, every sequence of
players can result from every decision timing.
Because of random choice, every improvement path has a chance
of occurring
75
Convergence Results (Finite Games)


If a decision rule converges under round-robin, random, or
synchronous timing, then it also converges under asynchronous
timing.
Random better responses converge for the most decision timings
and the most surveyed game conditions.
–
Implies that non-deterministic procedural cognitive radio
implementations are a good approach if you don’t know much about the
network.
76
Impact of Noise





Noise impacts the mapping from actions to outcomes, f
:AO
Same action tuple can lead to different outcomes
Most noise encountered in wireless systems is
theoretically unbounded.
Implies that every outcome has a nonzero chance of being
observed for a particular action tuple.
Some outcomes are more likely to be observed than others
(and some outcomes may have a very small chance of
occurring)
77
DFS Example


Consider a radio observing the spectral
energy across the bands defined by the set
C where each radio k is choosing its band of
operation fk.
Noiseless observation of channel ck
oi  c k  


g
ki
p k  c k , f k

k N
Noisy observation o i  c k    g ki p k  c k , f k   n i  c k , t 
k N
If radio is attempting to minimize inband
believe that a band has lower or higher
interference than it does
78
Trembling Hand (“Noise” in Games)

Assumes players have a nonzero chance of making
an error implementing their action.
–
–

Who has not accidentally handed over the wrong amount of
cash at a restaurant?
Who has not accidentally written a “tpyo”?
Related to errors in observation as erroneous
observations cause errors in implementation (from
an outside observer’s perspective).
79
Noisy decision rules

Noisy utility u i  a , t   u i  a   n i  a , t 
Trembling
Hand
Observation
Errors
80
Implications of noise


For random timing, [Friedman] shows game with noisy random
better response is an ergodic Markov chain.
Likewise other observation based noisy decision rules are
ergodic Markov chains
–
–
–

any action
If coupled with random, synchronous, or asynchronous timings,
then CRNs with corrupted observation can be modeled as ergodic
Makov chains.
Not so for round-robin (violates aperiodicity)
Somewhat disappointing
–
No real steady-state (though unique limiting stationary distribution)
81
DFS Example with three access points



3 access nodes, 3 channels, attempting to
operate in band with least spectral energy.
Constant power
3
1

Noiseless observations

Random timing
82
2
Trembling Hand

Transition Matrix, p=0.1

Limiting distribution
83
Noisy Best Response

Transition Matrix, (0,1) Gaussian Noise

Limiting stationary distributions
84
Comment on Noise and Observations

Cardinality of goals makes a difference for cognitive radios
–
–
Probability of making an error is a function of the difference in
utilities
With ordinal preferences, utility functions are just useful fictions



Might as well assume a trembling hand
Unboundedness of noise implies that no state can be absorbing
for most decision rules
NE retains significant predictive power
–
–
–
–
While CRN is an ergodic Markov chain, NE (and the adjacent
states) remain most likely states to visit
Stronger prediction with less noise
Also stronger when network has a Lyapunov function
Exception - elusive equilibria ([Hicks_04])
85
Summary


Given a set of goals, an NE is a fixed point for all radios with those goals
for all autonomously rational decision processes
Traditional engineering analysis techniques can be applied in a game
theoretic setting
–

Network must have weak FIP for autonomously rational radios to
converge
–

Weak FIP implies existence of absorbing Markov chain for many decision
rules/timings
In practical system, network has a theoretically nonzero chance of visiting
every possible state (ergodicity), but does have unique limiting stationary
distribution
–
–

Markov chains to improvement paths
Specific distribution function of decision rules, goals
Will be important to show Lyapunov stability
Shortly, we’ll cover potential games and supermodular games which can
be shown to have FIP or weak FIP. Further potential games have a
Lyapunov function.
86
to Yield Desired Behavior
Policy, Cost Functions,
Global Altruism,
Supermodular Games,
Potential Games
87
Policy


Concept: Constrain the
available actions so the
worst cases of distributed
decision making can be
avoided
Not a new concept –
–

Policy has been used since
there’s been an FCC
What’s new is assuming
decision makers are the
88
humans

Need a language to convey policy
–
–


Might need to tie in to humans
Need a source for policy
–
–

Policy engine?
Need an enforcement mechanism
–
frequency
Policies
–

Learn what it is
Expand upon policy later
Who sets it?
Who resolves disputes?
Logical extreme can be quite complex,
but logical extreme may not be
necessary.
89
802.22 Example Policies

Detection
–
–
–

Digital TV: -116 dBm over a 6 MHz channel
Analog TV: -94 dBm at the peak of the NTSC (National
Television System Committee) picture carrier
Wireless microphone: -107 dBm in a 200 kHz bandwidth.
Transmitted Signal
–
–
–
4 W Effective Isotropic Radiated Power (EIRP)
Channel vacation times
L. Challapali, D. Birru, S. Shankar, “IEEE 802.22: The First Worldwide Wireless Standard based on Cognitive Radios,”
90C.IEEECordeiro,
DySPAN2005, Nov 8-11, 2005 Baltimore, MD.

Concept: Centralized unit dynamically adjusts costs
operate on desired point
u i  a   u i  a   ci  a 

Example: Add -12 to use of wideband waveform
91

Permits more flexibility than policy
–

If a radio really needs to deviate, then it can
Easy to turn off and on as a policy tool
–
–
Example: protected user shows up in a channel,
cost to use that channel goes up
Example: prioritized user requests channel, other
users’ cost to use prioritized user’s channel goes
up (down if when done)
92
Global Altruism:
distributed, but more costly

and then each independently computes jointly optimal solution
–
–




C = cost of computation
I = cost of information transfer from node to node
n = number of nodes
Distributed
–


nC + n(n-1)I/2
Centralized (election)
–

Proposed for spreading code allocation in Popescu04, Sung03
Used in xG Program (Comments of G. Denker, SDR Forum Panel Session on
“A Policy Engine Framework”) Overhead ranges from 5%-27%
C + 2(n-1)I
Price of anarchy = 1
May differ if I is asymmetric
93
Improving Global Altruism


Global altruism is clearly inferior to a centralized solution for a
single problem.
However, suppose radios reported information to and used
information from a common database
–


And suppose different radios are concerned with different
problems with costs C1,…,Cn
Centralized
–
–

n(n-1)I/2 => 2nI
Resources = 2(n-1)I + sum(C1,…,Cn)
Time = 2(n-1)I + sum(C1,…,Cn)
Distributed
–
–
Resources = 2nI + sum(C1,…,Cn)
Time = 2I + max (C1,…,Cn)
94
Example Application:

Overlay network of secondary
users (SU) free to adapt power,
transmit time, and channel

Without REM:
–

Decisions solely based on link SINR
With REM
–
Upshot: A little gain for the secondary users;
big gain for primary users
95
From: Y. Zhao, J. Gaeddert, K. Bae, J. Reed, “Radio Environment Map Enabled SituationAlgorithms,” SDR Forum Technical Conference 2006.
CognitiveCognitive
2007

Local altruism also possible
–



Less information transfer
Like policy, effectively needs a common language
Nominally could be centralized or distributed
database
–
Y. Zhao, B. Le, J. Reed, “Infrastructure Support – The Radio
Environment MAP,” in Cognitive Radio Technology, B. Fette,
ed., Elsevier 2006.
96
Repeated Games

Same game is repeated
–
–

Indefinitely
Finitely
Players consider
discounted payoffs
across multiple stages
–
k

  ui  a
k
k

Expected value over all
future stages

97
S ta g e 2
Stage k
ui  a
–
S ta g e 1
ui
 a    
k
k 0
k
ui  a
k
S ta g e k

Impact of Strategies



Rather than merely reacting to the state of the network, radios
can choose their actions to influence the actions of other radios
Threaten to act in a way that minimizes another radio’s
performance unless it implements the desired actions
Common strategies
–
–
–

Tit-for-tat
Grim trigger
Generous tit-for-tat
Play can be forced to any “feasible” payoff vector with proper
selection of punishment strategy.
98
Impact of Communication on
Strategies


Players agree to play in a certain manner
Threats can force play to almost any state
–
99
Breaks down for finite number of stages
C
N
0,0
-5,5
-100,0
c
5,-5
-1,1
-100,-1
n
0,-100
-1,-100 -100,-100
Improvement from Punishment



100
Throughput/unit
power gains be
enforcing a
power level at a
base station
Punishment by
jamming
Without benefit to
deviating, players
can operate at
lower power level
and achieve same
throughput
A. MacKenzie and S. Wicker, “Game Theory in Communications:
Motivation, Explanation, and Application to Power Control,” Globecom2001,
pp.
821-825.
Cognitive
Instability in Punishment


Issues arise when
observing actions and
are punishing with their
actions without
announcing punishment
Eventually, a deviation
will be falsely detected,
punished and without
V. Srivastava, L. DaSilva, “Equilibria for Node Participation in Ad Hoc Networks –
An Imperfect Monitoring Approach,” ICC 06, June 2006, vol 8, pp. 3850-3855
101


Works best with a common controller to announce
Problems in fully distributed system
–
–

Problems when actions cannot be directly observed
–

No single best strategy exists
–
–

Need to elect a controller
Otherwise competing punishments, without knowing other players’
utilities can spiral out of control
Strategy flexibility is important
Significant problems with jammers (they nominally receive higher
utility when “punished”
Generally better to implement centralized controller
–
Operating point has to be announced anyways
102
What does game theory bring to the design of


A natural “language” for modeling cognitive radio
networks
–



goal
Simplifies analysis of random procedural radios
Permits simultaneous analysis of multiple
decision rules – only need goal
Provides condition to be assured of possibility of
convergence for all autonomously myopic
103
What does game theory bring to the design of


Provides condition to be assured of convergence for
all autonomously myopic cognitive radios (FIP, not
synchronous timing)
Rapid analysis
–


Verify goals and actions satisfy a single model, and steadystates, convergence, and stability
An intuition as to what conditions will be needed to
field successful cognitive radio decision rules.
A natural understanding of distributed interactive
behavior which simplifies the design of low
complexity distributed algorithms
104
Networks



Almost as many models as
there are algorithms
Normal form game excellent
for capturing single iteration
of a complex system
features to this model
–

Time, decision rules, noisy
observations, Natural
states
Some can be recast as a
normal form game
–

Normal Form Game
–

Repeated Game
–

N, A, {ui}, {di}, T
TU Game
–

N, A, {ui}, {di}, T
Extensive Form Game
–

N, A, {ui}, {di}
Asynchronous Game
–

N, A, {ui}
N, v
Bargaining Game
–
N, v
Extensive form game
105



Different game models have
Games can have many, one, or
Nash equilibrium (and its
variants) is most commonly
applied concept
–


Excellent for distributed
noncollaborative algorithms





Nash Equilibrium
Strong Nash Equilibrium
Core
Shapley Value
Nash Bargaining Solution
Games with punishment and
Coalitional games tend to have a
very large number of equilibria
Game theory permits analysis of
specific decision rules
106
Optimality



Numerous different
notions of optimality
Use whatever metric
makes sense





Pareto Efficiency
Objective Maximization
Gini Index
Shapley Value
Nash Bargaining
Solution
107
Convergence



Showing existence
insufficient; need to
reach those states
FIP (potential
games) gives the
convergence
conditions
Random timing
actually helps
convergence
108
Noise



Unbounded noise causes
all networks to
theoretically behave as
ergodic Markov chains
Important to show
Lyapunov stabiltiy
Noisy observations
cause noisy
implementation to an
outside observer
–
Trembling hand
109
Game Theory and Design



Numerous techniques for
improving the behavior of
Techniques can be combined
Potential games yield lowest
complexity implementations
–

–
Observing actions
Likely best when a referee
exists
Policy can limit the worst
optimality or convergence
issues
Punishment
–
–


Limits worst case performance
Cost function
–
–

Can enforce any action tuple
Can be brittle when distributed
Policy
–
Practical limitations limit
effectiveness of punishment
–

Judicious design of goals,
actions,

Reshapes preferences
Could damage underlying
structure if not a selfinterested cost
Centralized
–
–
–
Can theoretically realize any
result
Slower reactions
110
Future Directions in General Game Theory



Integrate policy and potential games
Integration of coalitional and distributed
forms
Increasing dimensionality of action sets
–

Cross-layer
Integration of dynamic and hierarchical
policies and games
111
Future Direction in Regulation


Can incorporate optimization into policy by
specifying goals
In theory, correctly implementing goals,
correctly implementing actions, and
enough to predict behavior (at least for
potential games)
–

Simpler policy certification
Provable network behavior
112
Avenues for Future Research on Game
Theory and CRNs






Integration of bargaining,
centralized, and distributed
algorithms into a common
framework
Cross-layer algorithms
Better incorporating
performance of classification
techniques into behavior
Asymmetric potential games
Bargaining algorithms for
Improving the brittleness of
punishment in distributed
implementations with imperfect
observations




Imperfection in observations in
general
Time varying game models
while inferring convergence,
stability…
Combination of policy, potential
games, coalition formation, and
token economies
Can be modeled as a game
with to types of players
–
–
113
Dynamic policy provider
```