WIRELESS COMMUNICATIONS
From Systems to Silicon
Raghu Rao
Wireless Systems Group,
Xilinx Inc.
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
• Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
2
R. M. Rao, 2008
Communications Roadmap
• Key markets
• Core DSP technologies
– OFDM
– MIMO
• IP Network is key
• Enables new
approaches to
– QoS management
– Robustness
– Capacity
3
R. M. Rao, 2008
Wireless Environment
• Multipaths caused by reflections from various
objects.
4
R. M. Rao, 2008
Modeling the Channel
• As the mobile moves through the environment, the
field strength varies due to :
– Free space path loss
– Long term (slow) fading
– Short term (fast) fading
Signal Level (dB)
short term fading
path loss
long term fading
log(distance)
5
R. M. Rao, 2008
Doppler
• Changes in the received carrier frequency due to the
relative motion of the mobile to the base station
q
D=v. t
• f= fd = (v/l) cos(q)
– for f=900 MHz, v = 70 MPH (112 km/h)
– fD-max = v/l = 93.3 Hz
6
R. M. Rao, 2008
Delay Spread
• Measure of the time distribution of power in the channel
impulse response
– Typical office 25 ns to 60 ns
– Large Lobbies and atria: 100 ns
– Warehouse and factory floors: 100 ns to 200 ns
– Delay spreads are up to 10 microseconds in cellular environments
• Greater than 3 msec in urban areas
• 0.5 ms in suburban and open areas
7
R. M. Rao, 2008
Exponential Power Delay Profile
• If the delay spread of the channel is larger than the symbol interval
we will see multiple paths in our channel.
• Leads to inter-symbol interference (ISI).
• Leads to a frequency selective channel.
• Average energy of the channel impulse response follows an
exponential power-delay profile.
8
R. M. Rao, 2008
Coherence Bandwidth
• Maximum frequency bandwidth for which the signals
are still considered to be correlated.
• Bc in Hz = 1/(2ptrms) when considering amplitude
correlation (correlation coefficient = 0.5)
• trms is the rms-delay-spread of the channel
9
R. M. Rao, 2008
Coherence Time
• Maximum time period for which the signals are still
considered to be correlated.
• It is used to characterize the time varying nature of
the channel.
• Rule of Thumb 9/(16pfm)<Tc<0.423/(fm)
– fm is the maximum Doppler frequency
– Correlation coefficient = 0.5
10
R. M. Rao, 2008
Link Budget
• A link budget is used to compute the range, transmit
power, receiver sensitivity and other requirements of the
communication system.
• In free space the path loss is given by the Friis equation :
Pr 
Pt G t G r l
(4 p ) d
2
2
2
• Gt , Gr represent transmit and receive antenna gains. Pt ,
Pr represent the transmit power and receive power.
l
is the wavelength, d is the distance.
11
R. M. Rao, 2008
Link Budget
• Expressing path loss in dB :
Pr ( dB )  Pt ( dB )  G t ( dB )  G r ( dB )  20 log(
l
4p
)  ( ).10 log( d )
• Note:  is the path loss exponent depending on the
environment (2 in free space).
• To compute the SNR at the baseband we need to include
thermal noise in the signal bandwidth B, and noise figure
of the system NF.
Pr ( dB )   174 dBm / H z  N F  10 log( B )  SN R
12
R. M. Rao, 2008
Link Budget
• Margin for desired outage taking into account receiver
structure and antenna diversity.
– Standards specify outage probabilities
– WiMax – 90% in the cell, 75% at the boundary of the cell.
• Compensation factors for other impairments
– Interference from neighbouring cell
– Shadow fading, etc.
• Diversity helps achieve the outage probability (or reduces
the margin for outage) without increase in transmit power.
13
R. M. Rao, 2008
Diversity
• Diversity provides the receiver with multiple looks at the transmitted
signal.
• Prob(all channels in a fade) << Prob(any 1 channel in a fade)
• Diversity improves link reliability.
Combined
channel
Channel 1
10
Channel 2
Signal Level (dB)
5
0
-5
-10
-15
-20
0
20
40
60
80
100
120
140
160
180
200
Time
14
R. M. Rao, 2008
Diversity Techniques
• Spatial Diversity
– Antennas “sufficiently spaced” apart (> ½ wavelength).
– Will result in an independent channel response and provide another look at the
transmitted signal.
• Frequency Diversity
– Transmit over multiple carrier frequencies.
– If the frequencies are “sufficiently far” (coherence bandwidth) apart the channel
response will be different on the different frequencies.
• Time Diversity
– Channel is continuously changing.
– Transmit signals “sufficiently spaced” (coherence time) apart in time so the 2nd
transmission “sees” a different channel compared to the first one.
• Polarization Diversity
– Signals transmitted on two orthogonal polarizations exhibit uncorrelated fading
statistics.
15
R. M. Rao, 2008
MIMO Systems
•
•
•
MIMO systems:
• Multiple Antennas at the transmitter and
receiver.
3 types of MIMO Systems:
• STBC MIMO systems
• Diversity gain.
• Spatial Multiplexing MIMO systems
• Capacity/throughput gain.
• Feedback MIMO systems
• Higher performance thru interference
reduction.
MISO (multiple input single output) Systems:
• STBC can be used with just 1 receive antenna.
• Provides diversity gain.
• To achieve array gain, need knowledge of
channel at the transmitter (feedback).
Rx Antenna 1
Tx Antenna 1
Tx Antenna 2
Tx Antenna M
H
Rx Antenna 2
Rx Antenna N
16
R. M. Rao, 2008
Spatial Multiplexing
• A spatial multiplexing MIMO system transmits different data symbols from each
transmitter.
• The signals from each transmitter combine over the air and are received by multiple
receive antennas.
• SM systems have a rate=M (num transmit antennas). The diversity order depends on
the type of encoding and receiver (uncoded SM with ML decoding has diversity
order=N (num receive antennas)).
r1(t) = a11x(t)+a12y(t)+a13z(t)
x(n)
y(n)
MODULATOR
x(t)
MODULATOR
y(t)
z(n)
MODULATOR
z(t)
MIMO
MIMO
Receiver
Receiver
x(n)
y(n)
z(n)
r3(t) = a31x(t)+a32y(t)+a33z(t)
17
R. M. Rao, 2008
Spatial Multiplexing Receivers
Zero Forcing receiver:
y1  h11 x1  h12 x 2  n1
y 2  h 21 x1  h 22 x 2  n 2
h11
Rx Antenna 1
Tx Antenna 1
h12
h21
Tx Antenna 2
h22
Rx Antenna 2
For ZF receivers W  H
1
 y1   h11
  
 y 2   h 21
 xˆ1  
  
 xˆ 2  
h12   x1   n1 
  
h 22   x 2   n 2 
W
  y1 
y 
 2
 xˆ1   h11
  
 xˆ 2   h 21
h12 

h 22 
1
 y1 
 
 y2 
Significant increase in noise when the channel is in a deep fade.
18
R. M. Rao, 2008
Spatial Multiplexing Receivers
• MMSE MIMO Decoders:
– Cancels interference and minimizes noise.
– Minimizes the over all error (mean squared error).
E [( xˆ  x ) ]
2
W M M SE 
M 
H
Es 
H
H 

IM 
SN R

M
1
H
H
19
R. M. Rao, 2008
Spatial Multiplexing Receivers
•
•
•
•
Zero-Forcing
MMSE
Successive Interference cancellation receivers
Sphere detectors (sub-optimal Maximum
Likelihood)
20
R. M. Rao, 2008
Transmit Diversity
• Space Time Block Code (STBC)
– 2 Antenna STBC also known as “Alamouti Code”.
– Improves BER/SER performance.
h1
Information
Source
Constellation
Mapper
Alamouti ST
block code
h2
Soft decision for
c1
ML Decision
r2 
h1 (  c 2* ) 
Symbol
Period 2
h2 ( c1* )
r1  h1c1  h2 c2
STBC
Decoder
ML Decision
Symbol
Period 1
Soft decision for
c2
21
R. M. Rao, 2008
STBC Decoder
In matrix form the received signal is:
 r1   h1
 *   *
 r2   h 2
h 2   c1   n 1 
   *
* 
 h1   c 2   n 2 
r  Hc  n
Decoder:
cˆ  H
H
cˆ  ( h1
r  H
H
2
2
 h2
(Hc  n)
 c1
)
0
0   n1 
   *
c2   n2 
Low complexity decoder.
Just 2 complex mults
per symbol for a 2
antenna system (and
grows linearly with block
length/num antennas).
22
R. M. Rao, 2008
Other MIMO schemes
• Achieving high rate high diversity MIMO systems
is an area of active research.
• There are many suboptimal STBC schemes that
improve the rate but reduce the diversity order.
• There are also combinations of spatial
multiplexing and STBC schemes.
• One such scheme is 2 (or more) Alamouti’s in
parallel.
23
R. M. Rao, 2008
Stacked Alamouti
•
•
•
•
Interference Cancelling STBC
2 Alamouti’s in parallel
Rate 2 system
Diversity order =
N*(M-K+1)
– K : co-channel users
– N : transmit antennas per user.
– M : receive antennas
• Requires N*(K-1)+1 antennas
at the receiver to suppress K-1
interferers.
Constellation
Mapper
Alamouti ST
block code
Data Stream 1
Information
Source
Constellation
Mapper
Alamouti ST
block code
Data Stream 2
Transmitter for Interference Cancelling STBC
r1
Interference Cancellation and ML
Decision
Data Stream 1
C1
r2
C2
Data Stream 2
Receiver for Interference Cancelling STBC
24
R. M. Rao, 2008
Magnitude
Orthogonal Frequency Division
Multiplexing (OFDM)
Frequency
OFDM divides a frequency selective channel into a number
of flat fading channels
25
R. M. Rao, 2008
OFDM Modulation
• A QAM symbol is modulated onto each subcarrier
• IFFT/FFT are used for efficient modulation and demodulation
Frequency Domain
QAM
Mapping
S/P
Strip
cyclic
prefix
Cyclic
Prefix
IFFT
Time Domain
RF
and
A/D
Time Domain
S/P
P/S
D/A
and
RF
(a)
Frequency Domain
FFT
FEQ
P/S
QAM
decoding
(b)
26
R. M. Rao, 2008
Combating Multipath
OFDM Symbol
CP
Constructing the cyclic prefix (CP)
Multipath components
tmax
Sampling Instant
Ts
• Sampling at instant Ts all channels experience
the same channel and there is no ICI
27
R. M. Rao, 2008
MIMO and OFDM
• MIMO – Multiple Input Multiple Output
Communication System. Employs multiple
antennas at both transmitter and receiver.
• OFDM – Orthogonal Frequency Division
Multiplexing. Breaks up a broadband channel into
many parallel narrowband channels (subcarriers).
• MIMO-OFDM – A Combination of MIMO and
OFDM. Appears like many parallel MIMO systems
on orthogonal subcarriers.
28
R. M. Rao, 2008
MIMO-OFDM System
OFDM
DEMODULATOR 1
MIMO DECODER
OFDM
TRANSMITTER N
RICH SCATTERING
ENVIRONMENT
OFDM
TRANSMITTER 1
OFDM
DEMODULATOR N
Each transmitter is an independent OFDM modulator.
The source symbols could be space-time block coded or just QAM modulated
for spatial multiplexing.
Each receiver is an OFDM demodulator combined with a MIMO decoder to
invert the channel on each subcarrier and extract the source symbols.
29
R. M. Rao, 2008
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
• Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
30
R. M. Rao, 2008
802.16/802.16e
• The 802.16 WirelessMAN standard includes
requirements for operation in :
– Line Of Sight (LOS), 10-66 GHz for fixed wireless systems.
– Non Line Of Sight (NLOS), <11 GHz for fixed wireless
systems.
• 802.16e (Mobile WiWax) adds enhancements for mobility
in the <11 GHz licensed and unlicensed bands.
– Operation in mobile mode is limited to licensed bands between
2 GHz and 6 GHz.
31
R. M. Rao, 2008
Scalable OFDMA parameters
Parameters
Values
System bandwidth (MHz)
1.25
5
10
20
FFT size (NFFT)
128
512
1024
2048
Sampling Frequency (Fs, MHz)
1.4
5.6
11.2
22.4
Sample Time (1/Fs ns)
714.28
178.57
89.28
44.64
Subcarrier spacing
10.94 KHz
Useful Symbol time
91.4 us
Guard interval
11.4 us
OFDMA symbol time
102.9 us
32
R. M. Rao, 2008
Link Budget
Downlink
Uplink
Transmit Power
10 Watts = 40dBm
(max=20 Watts)
200 mW = 23dBm
(max=200 mW)
Antenna Height
32 meters
1.5 meters
Antenna Gain
15 dBi (BS)
-1 dBi (mobile)
EIRP
55 dBm (approx)
22 dBm
# occupied subcarriers
840 out of 1024
840 out of 1024
Power/subcarrier
28 dBm
3.44 dBm
Noise Figure
9 dB (at mobile)
4 dB (at BS)
Total margin for interference, shadow fading, ..
(75% coverage at cell edge, 90% overall)
20 dB
20 dB
BS to BS distance
2.8 kms
2.8 kms
SNR Required (Modulation – QPSK 1/8,
(repetition code = 4)) (BER=10^-6 after FEC)
-3.31 dB
-2.5 dB
Rx sensitivity
-100.7 dB
-111.1 dB
Max allowable path loss
136.4 dB
133 dB
33
R. M. Rao, 2008
Time Division Duplexing
•
•
•
•
802.16e can be deployed in TDD and FDD environments.
Initial certification profiles are only for TDD.
The DL subframe and UL subframe lengths are adjustable.
TDD assures channel reciprocity.
Downlink subframe
Uplink subframe
Adaptive
RTG : ReceiveTransmit transition gap
TTG : TransmitReceive transition gap
Frame (j-2)
Frame (j-1)
Frame (j)
Frame (j+1)
Frame (j+2)
34
R. M. Rao, 2008
OFDMA Frame Structure
OFDMA Symbol Number
FC
H
FC
H
UL Burst SS1
DL Burst
SS4
UL Burst SS2
DL Burst
Broadcast
DL Burst
Multicast
DL Burst
SS1
(From BS2)
DL-MAP
Preamble
DL-MAP
DL Burst SS2
Preamble
Subchannel logical number
UL-MAP
DL Burst SS1
UL Burst SS3
DL Burst
SS3
UL Burst SS4
Ranging subchannel
Downlink (DL) Subframe
Uplink (UL) Subframe
TTG
RTG
DL-MAP – Downlink MAP : downlink allocations
UL-MAP – Uplink MAP
: uplink allocations
FCH – Frame control header : contains information about the DL-MAP
35
R. M. Rao, 2008
Data rates for SIMO/MIMO
configurations
64 QAM with 5/6 CTC
Source: WiMax Forum
36
R. M. Rao, 2008
Baseband Transmission Model
OFDM
Transmitter
ai,k
s(t)
Channel
r(t)
s(t)
r(n)
Inner
Receiver
Timing Delay
W0(t)
Noise
n(t)
hn,i(t)
Resulting
Channel
hi(t)
ADC
Timing Delay
d(t-eT')
Outer
Receiver
T'
r(n)
• OFDM receiver provides estimates of
–
–
–
–
Channel hn,i(t)
Frequency offset W0
Sample timing T'
OFDM symbol timing
37
R. M. Rao, 2008
Generic OFDM Transmitter
Source
Coding
e.g.
LDPC
MAC
IFFT
Insert
Pilots
Append
CP
CFR
DUC DPD DAC
RF
PA
IFFT
Insert
Pilots
Append
CP
CFR
DUC DPD DAC
RF
PA
Space-Time
Encoder
Beamforming
• Figure shows a generic MIMO OFDM Tx
– MIMO not an element of 802.11a, but it is in 802.11n,
3GPP-LTE and 802.16e
38
R. M. Rao, 2008
OFDM Receiver Architecture
Extract
Pilots
Symbol Timing
ADC
DDC
Course Freq.
Offset
Correction
DAC
Power
Est.
Extract Preamble
CP
Removal
Sample
Clock Adj.
FFT
Fine
Sample
Clock Adj
Fine Freq.
Offset Adj.
Freq. Domain
Equalizer
Channel
Estimation
Channel
Decoding, e.g.
LDPC
To/From
Network
Medium
Access
Controller
• Figure illustrates architecture for generic OFDM Rx
• Details will vary as a function of
– Packet-based versus broadcast transmission
– Existance of a preamble (or not) in the waveform
39
R. M. Rao, 2008
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
– Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
40
R. M. Rao, 2008
Digital Receiver Architecture:
Abstracted Architecture
• Common model of abstraction for digital receiver is inner/outer receiver
Receiver Abstraction
Digital IF Processing
Ø
Ø
Ø
Ø
Up-Conversion
Down-Conversion
Channelizer
Fast AGC
Inner Receiver
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Ø
Outer Receiver
Frequency Offset Estimation/Correction
Sample Clock Offset Correction
Channel Estimation/Equalization
Frame detection
AGC
Successive Interference Cancellation
Space-Time-Coding
IFFT/FFT
Per sub-carrier processing
q Beamforming
q QRD-RLS
Ø
Channel Coding
q LDPC
q TPC
q CTC
q Viterbi
q (De-) Interleave
Control, Protocol and Link Layer processing
Ø
Ø
Ø
Ø
Ø
Ø
Medium Access Control (MAC)
Link Layer Processing
CPRI
OBSAI
Ø
System Initialization, Control and Monitoring
Application
Ethernet
Ø
Ø
PCI Express
SRIO
41
R. M. Rao, 2008
Receiver Abstraction and
Projection on to Platform FPGA
Receiver
Characteristics
Function
Digital IF
Ø MAC Intensive
Processing
Inner Receiver Ø MAC intensive
Ø Some functions LUT
intensive
CORDIC in QRD-RLS
Ø FFT processing for OFDM
Ø Correlation processing for
timing
Ø Per-carrier complexity
processing (MIMO-OFDM)
FPGA
Platform
SX
SX/LX
Comments
Ø DSP48 main
requirement
Ø DSP48 leveraged
FFT
Ø FPGA fabric for
CORDIC
FFT
N TX  N RX  Num. Sub-carriers
Outer
Receiver
Ø Symbol rate tasks
Ø Channel coding
LX
Control/
Protocol
Ø
Ø
Ø
Ø
FX
Gigabit connectivity
Linux
OS “heavy” tasks
TCP/IP
Ø ACS/ACSO dominated
by low bit precision
add/multiplexors
Good match for
fabric
Lots of memory
required
Ø Embedded PPC used
Ø Rocket IO for
PCI Express
SRIO
Receiver Abstraction
SX
SX/LX
LX
FX
FPGA product portfolio
Tailored for various
processing Tasks in
communications
receiver
42
R. M. Rao, 2008
Digital Frontend
Digital upconversion (downconversion)
Crest factor reduction
Digital pre-distortion
43
R. M. Rao, 2008
High Performance
Processing
Embedded Software
Serial Gigabit
OBSAI/CPRI
Proprietary serial
backplane
Inter-chip connectivity
The Platform
DUC,DDC
CFR,DPD
RACH
Searcher
OFDM PHY
TCC
MIMO
Connectivity
MAC (Media Access)
Decision oriented
tasks
CORBA
RTOS
NBAP
SCA (JTRS radios)
High MIPs tasks
Radio PHY
Supported by embedded
DSP tiles, distributed
memory, block memory and
logic fabric
DAC
DAC
ADC
ADC
EMIF
SRIO
Logic & IO
OBSAI/CPRI
SRIO
AD/DA interface
EMIF
44
R. M. Rao, 2008
Virtex-4/5 FPGA Arhitecture
High-Level View
• FPGA family with 3 members
tailored for specific classes of
processing
– SX: DSP
– LX: Logic centric
– FX: Full featured
• Embedded PowerPC hard IP
• Giga-bit serial connectivity
• DSP processing tiles “DSP48”
45
R. M. Rao, 2008
Virtex-5 FPGA Platform
Can be configured as a
shift register
• 2 slices per CLB, 4 LUTs per CLB
• Can be configured as a shift
register
• Can be configured as distributed
memory
Can be configured as RAM
46
R. M. Rao, 2008
Virtex-4 DSP48 Slice
Scalable 500MHz Performance Not Possible Using
Standard Cell Libraries and Standard Cell Design Flow
Pipeline Registers
Enable 500Mhz
Performance
Integrated Cascade
Routing Enables
Scalable Performance
Arithmatica Parallel Counter
20% Faster Performance and
Uses Less Area
Arithmatica A+Adder
20% Faster Than
Other Implementations
47
R. M. Rao, 2008
Pipelined Multiplier
C
To Adjacent DSP48 Tile
48
BCOUT
18
3 delay latency
PCOUT
18
48
18
MS Word
LS Word
A
B
18
X
36
72
36
Y
48
48
A
B
ZERO 48
18
z-3
18
48
48
36
18
18
48
CIN

48
SUB
Z
48
P (PCOUT)
48
36b product sign extended to 48b
Register
18
Wire Shift Right By 17b
48
BCIN
PCIN
48
R. M. Rao, 2008
P
Pipelined Complex 18x18 MPY
Ai
S4
18
Bi
Ar
18
18
Pr
48
18
‘0’
S2
18
Bi
Ar
48
S3
Br
Ar
-
48
48
18
48
Pi
S1
Bi
sn = Slice n
18
18
Register
36
48
‘0’
Sign Extension
49
R. M. Rao, 2008
Wide Filters At Full Speed
Within the Virtex-4 DSP Slice Column
• Systolic N-tap FIR
– Scalable N-levels deep implementation
– N-levels deep at 500MHz performance
• Uses Integrated Pipeline Registers to
Synchronize Filter Inputs
• Utilizes Input and Output Cascade Routing
Build Massively Parallel 512-TAP FIR Filter
In a Single Device Achieving
256 GMACCs/s Performance
Equivalent Implementation Would Consume
444 Embedded Multipliers and 77,008 LCs
And Would Only Achieve ½ The Performance
50
R. M. Rao, 2008
Xilinx FFT IP (4)
• FFT fully utilizes FPGA arithmetic hardware resources
• FFT viewed as a recursion using a butterfly kernel

(  b)
CADD1
CADD2
b
(  b) e-j2pk/N
CMPY
Phase factors: e-j2pk/N
•
•
CADD{1|2}: complex adder
CMPY: complex multiplier
51
R. M. Rao, 2008
Virtex-4 DSP Slice
• DSP slice key for
implementing highperformance arithmetic
• Embedded 18x18 MPY
and 48b adder
– Butterfly phase rotator
– Cross-addition
52
R. M. Rao, 2008
Butterfly CMPLX MPY
Pr + jPi = (Ar+jAi) x (Br + jBi)
• Complex MPY used in FFT
butterfly
• Optimized to employ Virtex-4
DSP Slice
Pr
DSP Slice 2
DSP Slice 1
Ar
– 4 and 3 MPY option
•
Br
Complex MPY available as IP Bi
module†
DSP Slice 4
Ai
DSP Slice 3
Pi
† Available: 6.2i IP Update 2
53
R. M. Rao, 2008
Performance/Parallelism/Area
• FPGA: highly parallel computing machine
• Achieve performance using functional unit parallelism
• Butterfly array to produce highperformance FFT processor
• High computation rate using (possibly)
hundreds of DSP slices
– Allocate resources as appropriate to meet
system requirements
• Large memory bandwidth using multiport memory constructed from BRAMs
Mem read BW: 320 x 36 x 500e6 = 5.76 Tera-bps
• Area/throughput tradeoff delivered via Xilinx IP library
54
R. M. Rao, 2008
FFT Architecture
• For small number of carriers and modest data rates single
butterfly (I)FFT is probably suitable - Small FPGA footprint
Input Data
Data
Ram 0
switch
Iteration Engine
switch
Phase
Factor ROM
Data
Ram 1
Output Data
55
R. M. Rao, 2008
Block boundary detection/Fine
timing acquisition
1 OFDM block of
repeated data
Half an OFDM block
SAMPLES
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
||2
KNOWN
SEQUENCE
ave
Timing Est
()*
Freq Est
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
Z-1
arg
F. Tufvesson, O. Edfors, M. Faulkner, “Time and Frequency Synchronization for OFDM
using PN-Sequence Preambles”, VTC-1999/Fall, vol 4, pp.2203-7, New Jersey, 1999.
56
R. M. Rao, 2008
Fine-timing acquisition using a
clipped correlator
10 time multiplexed
correlators
in0
out0
Full precision correlators :
32 embedded multipliers
896 flipflops
in1
Register1
2
xnz
1
1
d
xn
2
addr
sysgenq
DAddr
coef f
yn
R
-1
addr
sysgen
z
bsysgen
ab
a
2
a
sysgen
cast
coeff
bc2
1
3
sysgen
cast
yn
ld
bc3
en
3
a
-1
sysgen
z
Delay2
Data Addr
BaudClkCoef Addr
BaudClk
AddSub
1
yn
en
1-bit correlator
MAC
ROM1
LD
2
-1
sysgen
z q
ld
CAddr
4
d
sub
-1
sysgen
z
-1
sysgen
z
Delay2
Delay4
-2
sysgen
z
Delay
load
FSM
1
xn
x
DAddr
DAddr
DAddr
DAddr
DAddr
DAddr
DAddr
CAddr
CAddr
CAddr
CAddr
CAddr
CAddr
CAddr
xn
xnz
LD
xnz
yn
xnz
LD
C4
C6
C7
Delay5
a
-1
sysgen
z
AddSub1
-1
sysgen
z
-8
sysgen
zen
a+b b
en
a
a+b b
en
a+b b
en
-1
sysgen
zen
AddSub12
a
xnz
LD
C5
-8
sysgen
zen
Delay1
AddSub
yn
xnz
LD
C3
Each 1-bit correlator :
10 slices
Total for clipped correlator :
589 slices
xn
yn
xnz
LD
C2
xn
yn
LD
C1
xn
yn
-1
sysgen
z
-1
sysgen
z
en
Delay3
a
AddSub13
a+b b
en
Delay7
a
xnz
LD
xn
yn
a+b b
en
xn
yn
-1
AddSub2 sysgen
z
-1
sysgen
z
-7
sysgen
zen
a
Bank of correlators
AddSub4
a+b b
en
Delay6
-1
sysgen
z
1
y
57
R. M. Rao, 2008
QRD
• One of the popular methods of matrix inversion is
based on QRD.
H  QR
H
1
1
R Q
H
• Q is Unitary and R is upper triangular
• A Unitary matrix has a trival inverse, Q  Q
• An upper triangular matrix can be inverted by
back-substitution
1
H
58
R. M. Rao, 2008
Givens Rotations
• For a 2x1 vector of real numbers
s  a   a 2  b2
   
c   b  
0
 c

s
c
a
a b
2
,s 
2



b
a b
2
2
• For a NxM matrix, repeat the process 2 cells at a time.
 a11

a
 21
 a 31
a12
a 22
a 32
 a11
a13 


a 23   a 21


a 33 
 0
a12
a 22
a 32
 a11
a13 


a 23    0


a 33 
 0

a12
a 22
a 32
a
a13 
 11

a 23    0


 0
a 33 


a12
a 22
0
a13 

a 23 

a 33 

59
R. M. Rao, 2008
Systolic Arrays
• Structured arrays with identical cells. Usually a
“boundary” cell and an “internal” cell for the QRD
process.
Internal cell
Boundary cell
1. The boundary cell generates the
rotations.
2. Internal cell applies the rotations to all
the cells in the row.
3. The systolic array in this figure can
handle any matrix below 3x3.
60
R. M. Rao, 2008
Triangularization mode
a 33
a 32
a 23
a 31
a 22
a1 3
a 21
a1 2
a1 1
The rotation factors for
zeroing out cell A(2,1)
are stored in cell
A(1,2), etc.
 a11

a
 21
 a 31
a12
a 22
a 32
 a11
a13 


a 23   0


a 33 
a
 31
a12
a 22
a 32
a
a13 
 11

a 23    0


a 33
 0


a12
a 22
a 32
a
a13 
 11

a 23    0


 0
a 33 


a12
a 22
0
a13 

a 23 

a 33 

• For QRD of upto a 3x3
matrix we need 3 boundary
cells and 3 internal cells.
• Boundary cells calculate
rotation vectors and
internal cells store them.
• Data is fed column-wise
into the systolic array.
• This may have to be
staggered depending on
the pipelining delays thru
the boundary cell and
internal cell.
61
R. M. Rao, 2008
Q-matrix computation mode
1
0
0
1
0
0
s  x * I .s  s * I .c
0
z  x * I .c  s * I .s
*
0
*
c  c;
Q A R
H
1
Q I Q
H
H
first column of Q matrix
second column of Q matrix
third column of Q matrix
1

0

 0
0
c 32
 s 32
0   c 21

s 32  s 21

c 32   0
s 21
c 21
0
Q
H
0   c 31

0
0

1    s 31
0
1
0
s 31   a11

0
a
  21
c 31   a 31
a12
a 22
a 32
A
a
a13 
 11
 
a 23  0
 
a 33   0

a12
a 22
0
a13 

a 23 

a 33 

R
62
R. M. Rao, 2008
Agenda
• Introduction to Wireless communications
– Systems design and considerations
• The wireless environment
• Link budget
• MIMO and OFDM Systems
– High level view of wireless communication systems
• Mobile WiMax, an example of wireless comm system,
• Hardware/software partitioning
• PHY/MAC etc.
• The Platform FPGA
– Overview of FPGAs and FPGA tools
– Building DSP sub-systems on FPGAs
– Digital baseband
• FPGA tools and design methodology
63
R. M. Rao, 2008
FPGA Tools for DSP Systems
Design
• Higher level tools are raising the level of
abstraction.
• Allows non-hardware engineers (algorithm
designers) to get a first look at hardware.
• System Generator
– Simulink to Hardware
• C-to-Gates tools
– C or “higher” level languages to gates
64
R. M. Rao, 2008
System Generator
System Level Modeling & Simulation Framework
HDL
C
Work in the language of your problem
65
R. M. Rao, 2008
System Generator Flow
DSP Development Flow
1. Develop Algorithm &
System Model
HDL Simulation Flow
Simulink MDL
2. Automatic Code
Generation
RTL VHDL & Cores
HDL Test Bench
Test Vectors
3. Xilinx Implementation
Flow
Bitstream
Download to FPGA
FPGA
66
R. M. Rao, 2008
Configurable MIMO-OFDM
Transmitter
Add Cyclic
Extension/Block
Shaping
Time shared
FFT across
antennas
Pilot insertion and
data loading
1
RealOut1
double
1
RealIn
DataIn
DataIn
SampleClk
3
RealOut
double
double
DataDone
Bool
2
sysgen
not
Inverter
Bool
RealOut
double
SampleClk
and
sysgen
-0
z
Bdata
double
Start
rfd
Logical2
Zeroblks
double
DataSubc
ImagOut
Preamble
Bool
double
xk_im
xn_im
double
SampleClk
Bdata
double
Enable
xk_index
FFT
double
rfd
double
RealIn
RealOut
DataRequest
Bool
vout
RealIn
double
RealOut1
ImagOut1
ImagIn
UFix_6_0
double
ImagOut
Fix_16_10
RealOut2
ImagIn
ImagOut2
Addr
RealOut3
start
Preamble
Fix_16_10
double
WriteFIFO
ReadFIFO
double
WriteFIFO
ImagOut3
BFrame
DataEnable
and
sysgen
-0
z
double
Preamble
Bdata
DataEnable
ImagOut
Zeroblks
xk_re
xn_re
ImagIn
Packetization
and Encoding
DataSubcarrier
FFTbusy
Pilot Insertion
and Data loading
double
enable
Busy
FFT
double
RealOut4
Add Cyclic Extension
BaudClk
ImagOut4
double
double
double
double
4
double
ImagOut2
double
double
Clock Generator
BaudClk
double
6
ImagOut3
RealOut4
BFrame
SampleClk
5
RealOut3
7
Logical
double
3
RealOut2
double
Spatial Demultiplexing
double
2
ImagOut1
Packet Controller
Packet
Controller
double
8
Packetization and
configurable STBC
encoding
ImagOut4
Spatial
Demultiplexing
and Interpolation
Clock
Generator
Resource sharing (folding factor)
Ratio of System clock rate to symbol rate > 8 needed for a 4 transmit antenna system
67
R. M. Rao, 2008
MIMO Receiver Architecture
Combine
PD
Block
Boundary
Detection
Packet
Detection
Block
Boundary
Rx 1
Strip
CP
Input
FIFO
FFT
Output
FIFO
Strip
CP
Input
FIFO
FFT
Output
FIFO
Rx 2
Packet
Detection
Coarse CFO
estimate
Rx 3
Packet
Detection
CFO
estimator
CFO Compensator
Packet
Detection
MIMO
Decode
Strip
CP
Input
FIFO
FFT
Output
FIFO
Strip
CP
Input
FIFO
FFT
Output
FIFO
Soft
Decisions
Coarse CFO
estimate
Rx 4
Pilot based CFO
estimator
Preamble
Packet
Controller
Payload
Samples processed at sample clock rate
Channel
Estimator
Samples processed
at system clock rate
MIMO
Decoder
FIFO
MIMO
Decoder
Matrix
(MMSE, etc)
68
R. M. Rao, 2008
Fine-timing acquisition using a
clipped correlator
10 time multiplexed
correlators
in0
out0
Full precision correlators :
32 embedded multipliers
896 flipflops
in1
Register1
2
xnz
1
1
d
xn
2
addr
sysgenq
DAddr
coef f
yn
R
-1
addr
sysgen
z
bsysgen
ab
a
2
a
sysgen
cast
coeff
bc2
1
3
sysgen
cast
yn
ld
bc3
en
3
a
-1
sysgen
z
Delay2
Data Addr
BaudClkCoef Addr
BaudClk
AddSub
1
yn
en
1-bit correlator
MAC
ROM1
LD
2
-1
sysgen
z q
ld
CAddr
4
d
sub
-1
sysgen
z
-1
sysgen
z
Delay2
Delay4
-2
sysgen
z
Delay
load
FSM
1
xn
x
DAddr
DAddr
DAddr
DAddr
DAddr
DAddr
DAddr
CAddr
CAddr
CAddr
CAddr
CAddr
CAddr
CAddr
xn
xnz
LD
xnz
yn
xnz
LD
C4
C6
C7
Delay5
a
-1
sysgen
z
AddSub1
-1
sysgen
z
-8
sysgen
zen
a+b b
en
a
a+b b
en
a+b b
en
-1
sysgen
zen
AddSub12
a
xnz
LD
C5
-8
sysgen
zen
Delay1
AddSub
yn
xnz
LD
C3
Each 1-bit correlator :
10 slices
Total for clipped correlator :
589 slices
xn
yn
xnz
LD
C2
xn
yn
LD
C1
xn
yn
-1
sysgen
z
-1
sysgen
z
en
Delay3
a
AddSub13
a+b b
en
Delay7
a
xnz
LD
xn
yn
a+b b
en
xn
yn
-1
AddSub2 sysgen
z
-1
sysgen
z
-7
sysgen
zen
a
Bank of correlators
AddSub4
a+b b
en
Delay6
-1
sysgen
z
1
y
69
R. M. Rao, 2008
MIMO-OFDM Receiver
Packet Detection
1
-1
en
RealIn1
z
RealIn1
9
PacketDetect
Delay
-1
z
2
PacketDetect
en
ImagIn1
Delay1
-1
z
3
ImagIn1
en
RealIn2
Delay2
-1
z
en
4
ImagIn2
CFO_Est
Delay4
-1
z
en
Carrier Frequency
Offset Correction
Display2
en
6
0
Fine Timing Acq
RealIn2
Delay3
-1
z
5
RealIn3
ImagIn3
ImagIn2
Delay5
-1
z
7
en
RealIn4
PktDetPulse
Delay6
-1
z
8
Baud_clk
en
ImagIn4
Delay7
MIMO Packet Detect1
9
Reset
a
b
z
-1
Delay8
a -b
Output FIFO
AddSub
RealIn
Out2
Cyclic prefix
removal
0
FFT
Display1
ImagIn
SampleClk
Clock Generator
BBDValid
BaudClk
BaudClk
Clock
Generator
Fine T iming Acquisition
BlkBounDetect
Rxreal1
ReadEnable
Out_real1
WriteFIFO
RealIn1
RxReal1
RxOut1
RxStream1
ImagIn1
Rxreal1
Rximag1
RxStream1
Out_imag1
Rximag1
RxImag1
RealIn2
RxStream1
RxStream2
Rxreal2
Out_real2
Rxreal2
RxOut2
ImagIn2
RealIn3
RxStream2
RxReal2
Rximag2
RxStream2
ImagIn3
RxOut3
RxStream3
Rxreal3
Enable
Rximag3
ReadFIFO
RxStream4
Rxreal3
Rximag3
Out_imag3
RxImag3
RxOut4
Rxreal4
Rxreal4
PacketDetect
Out_real4
Out_imag1
CFO_est
RxStream4
BaudClk
RxReal4
Rximag4
FIFO_status_f lag
FFT_Start
1
SoftDecReal1
Rximag2
Out_real3
RxReal3
RxStream3
ImagIn4
Out_imag2
RxImag2
RxStream4
RealIn4
Out_real1
RxStream3
Rximag4
Out_imag4
ValidData
ReadFIFO
FIFO_status_f lag
Cyclic Prefix Removal
Input Buffer
2
SoftDecImag1
ReadFIFO
RxImag4
CFO_Valid
Enable
Addr
Valid out
Addr
Addr
CFO_Valid
AddrOut
wreal_1_1
v alid_out
Output FIFO
wimag_1_1
10
ValidOut
FFT_RFD
Channel
Estimation
wreal_1_2
Reset
FFT_Start
wimag_1_2
FFT
wreal_1_3
wimag_1_3
ReadWeightMatrix
wreal_1_4
wimag_1_4
wreal_2_1
Rxreal1
Ch_1_1
Ch_tx1rx1
Ch_1_2
Ch_tx1rx2
wreal_1_1
Ch_1_3
Ch_tx1rx3
Ch_1_4
Ch_tx1rx4
Ch_2_1
Ch_tx2rx1
Ch_2_2
Ch_tx2rx2
Ch_2_3
Ch_tx2rx3
Ch_2_4
Ch_tx2rx4
Ch_3_1
Ch_tx3rx1
Ch_3_2
Ch_tx3rx2
Ch_3_3
Ch_tx3rx3
Ch_3_4
Ch_tx3rx4
wreal_1_4
wreal_2_4
Ch_tx4rx3
Ch_4_4
Ch_tx4rx4
wreal_4_1
Out_imag3
wreal_3_4
MIMO Decoder
6
SoftDecImag3
wimag_4_2
wreal_4_3
wimag_3_4
wimag_4_3
wreal_4_4
En
wimag_3_3
wimag_4_1
wreal_4_2
CFO_Est
wreal_3_3
wimag_3_4
ValidData
Addr
5
SoftDecReal3
wimag_3_2
wimag_3_3
wreal_3_4
Ch_tx4rx2
Out_real3
wreal_3_2
wimag_3_2
wreal_3_3
Ch_tx4rx1
wimag_3_1
wimag_3_1
wreal_3_2
Ch_4_3
wreal_3_1
wimag_2_4
wreal_3_1
Ch_4_2
4
SoftDecImag2
wimag_2_4
wimag_2_3
Rximag3
Ch_4_1
Out_imag2
wreal_2_4
wimag_2_2
wreal_2_3
Rximag4
wimag_2_3
wimag_2_1
wreal_2_2
Rxreal4
wreal_2_3
wimag_1_4
wreal_2_1
Rxreal3
3
SoftDecReal2
wimag_2_2
wimag_1_3
Rxreal2
Rximag2
Out_real2
wreal_2_2
wimag_1_2
wreal_1_3
Weight Matrix
Computation
wimag_2_1
wimag_1_1
wreal_1_2
Rximag1
wreal_4_1
wimag_4_4
wimag_4_1
ReadAddr
CFO_Est_Valid
Addr
Weight Matrix Computation
Out_real4
wreal_4_2
7
SoftDecReal4
Channel Estimation
wimag_4_2
wreal_4_3
wimag_4_3
Out_imag4
wreal_4_4
8
SoftDecImag4
wimag_4_4
BaudClk
MIMO Decoder
70
R. M. Rao, 2008
Channel Estimation
Channel Estimation Pilots
for Tx1
rst
UFix_6_0
sysgen
-2 UFix_6_0
out
sysgen
z
en
Enable
Pilots
double
Counter1
Channel Estimation Pilots
for Tx4
Enable
Enable
Delay2
Pilot_real
double
Enable
Pilot_real
Fix_2_0
Fix_2_0
Pilot_real
sel
rst
UFix_6_0
sysgen
out
-1
sysgen
Delay6en
z
Bool
Counter2
UFix_6_0
10
-2
sysgen
z
sysgen
d0
UFix_6_0
Reset
sel
Reset
Addr
d1
sysgen
d0
Mux1
Training Symbols
Tx1
UFix_6_0
Reset
Reset
double
Training Symbols
Tx3
Training Symbols
Tx2
d1
Training Symbols
Tx4
Mux
7
Addr
Delay3
5
addr
addr
Fix_16_10 -2 Fix_16_10
sysgen
1
-2
sysgen
z
Delay4
Fix_16_10
2
Fix_16_10
Fix_32_20
x sysgen
0.3535
imag_out
Real
Imag
Fix_16_10
CMult
Imag
WE
Delay5
Rximag1
real_out
Real
z
Rxreal1
real_out
Pilots1
imag_out
Fix_16_10
Real_in
double
addr
double
Imag
Imag_in
Fix_32_20
VDATA
ChEst Tx1-Rx1
CMult1
Real_in
double
1
Single Port RAM
Imag_in
addr
Real
Fix_32_20
Fix_32_20
Imag
imag_out
Real_in
Fix_16_10
Fix_16_10
Chreal3
6
addr
Chimag3
Real
Fix_32_20
VDATA
real_out
Pilots1
Imag
WE
3
Chreal2
4
ChEst Tx2-Rx1
real_out
Pilots1
double
WE
VDATA
EN
imag_out
Real
Fix_32_20
WE
Fix_32_20
x sysgen
0.3535
real_out
Pilots1
imag_out
Fix_16_10
Chreal4
8
Fix_16_10
Chimag4
Real_in
WE
Imag_in
Fix_32_20
VDATA
ChEst Tx3-Rx1
Imag_in
ChEst Tx4-Rx1
Chimag2
Chreal1
2
Chimag1
addr
addr
Fix_16_10 -2 Fix_16_10
sysgen
3
Rxreal2
Delay7
-2
sysgen
z
Fix_16_10
double
x sysgen
0.3535
imag_out
Real
Imag
Fix_16_10
CMult2
Imag
Fix_16_10
4
real_out
Pilots1
real_out
Real
z
WE
Delay8
imag_out
Rximag2
Fix_16_10
Real_in
double
double
double
double (8)
9
Chreal5
10
addr
Chimag5
Real
Imag
WE
Fix_32_20
VDATA
Imag_in
VDATA
ChEst Tx1-Rx2
CMult3
imag_out
Real_in
double
Imag_in
addr
Real
double
Fix_32_20
Imag
11
Chreal6
ChEst Tx2-Rx2
12
Single Port RAM1
real_out
Pilots1
double
WE
Fix_32_20
x sysgen
0.3535
EN
real_out
Pilots1
imag_out
Real_in
Chreal7
double
addr
real_out
Pilots1
Fix_16_10
14
Chimag7
WE
VDATA
simout11
13
Fix_16_10
Real
Imag
imag_out
Fix_16_10
15
To Workspace2
Chreal8
Fix_16_10
16
Real_in
Chimag8
WE
Imag_in
Fix_32_20
VDATA
ChEst Tx3-Rx2
Imag_in
ChEst Tx4-Rx2
Chimag6
19
17
addr
addr
5
Fix_16_10-2
sysgen
z
Fix_16_10
Rxreal3
Fix_16_10
Delay9
6
-2
sysgen
z
Rximag3
Real
Fix_16_10
Fix_32_20
x sysgen
0.3535
Real
Imag
Fix_16_10
CMult4
Imag
WE
Delay10
real_out
imag_out
Fix_16_10
Fix_16_10
Chreal9
18
Pilots1
real_out
imag_out
Real_in
Fix_16_10
x sysgen
0.3535
EN
VDATA
Imag_in
real_out
imag_out
Real_in
Fix_16_10
Chreal10
20
Fix_16_10
Chimag10
VDATA
addr
real_out
Pilots1
Real
Fix_32_20
Imag
WE
Fix_32_20
imag_out
Real_in
Fix_16_10
21
Chreal11
Fix_16_10
Fix_32_20
VDATA
addr
real_out
Pilots1
22
Fix_32_20 Chimag11
WE
Imag_in
ChEst Tx2-Rx3
ChEst Tx1-Rx3
CMult5
Real
Imag
Fix_32_20
WE
Fix_32_20
addr
Pilots1
Chimag9
Real
Imag
imag_out
Fix_16_10
Fix_16_10
23
Chreal12
24
Chimag12
Real_in
WE
Imag_in
Fix_32_20
VDATA
ChEst Tx3-Rx3
Imag_in
ChEst Tx4-Rx3
Single Port RAM2
25
27
Chreal13
addr
addr
7
Fix_16_10
-2
sysgen
z
Fix_16_10
Rxreal4
Delay11
8
-2
sysgen
z
Real
Fix_16_10
Fix_32_20
x sysgen
0.3535
Real
Imag
Fix_16_10
CMult6
Imag
Fix_16_10
WE
Delay12
imag_out
Rximag4
real_out
Pilots1
real_out
Fix_16_10
imag_out
Real_in
Fix_16_10
VDATA
addr
Fix_32_20
real_out
Pilots1
Real
WE
Fix_32_20
Chreal14
26
Chimag13
Fix_16_10
Imag
imag_out
Real_in
Fix_16_10
Fix_16_10
Fix_32_20
VDATA
28
Chimag14
Chreal15
addr
Fix_32_20
real_out
Pilots1
Real
WE
Imag_in
29
Imag
imag_out
Real_in
Fix_16_10
Fix_16_10
Fix_32_20
VDATA
addr
real_out
Pilots1
Real
Fix_32_20
WE
Imag_in
30
Chimag15
Imag
imag_out
Fix_16_10
31
Chreal16
Fix_16_10
32
Chimag16
Real_in
WE
Imag_in
Fix_32_20
VDATA
Imag_in
x sysgen
0.3535
EN
CMult7
ChEst Tx1-Rx4
ChEst Tx2-Rx4
ChEst Tx3-Rx4
ChEst Tx4-Rx4
Single Port RAM3
9
double
ValidData
11
double
and
sysgen
-2
z
Bool
Logical
ChEstPilots
-3
sysgen
z
ValidData
ChEstRst
En
Rst
ChEstPilots
En2
ChEstPilots1
Bool
Delay1
ChEstEn
double
double
double
double
double
ControlSignals
Input FIFO
12
4x4 Channel Estimation
Memory
double
ReadAddr
Control Signals
71
R. M. Rao, 2008
Packet Detection
Identical halves of 1 OFDM symbol
Schmidl and Cox algorithm for Packet
Detection and coarse carrier frequency
offset estimation.
T. M. Schmidl, D. C. Cox, “Low Overhead Low
Complexity Synchronization for OFDM”,
ICC 1996, Vol 3, pp 1301-1306.
r(n)
*
Z-D
*
C
c(n)
P
p(n)( )

2
m(n)
2
2
RealIn1
double
1
RealIn
Fix_16_8
sysgen
force
sysgen
cast
Reinterpret1
Fix_8_0
sysgen
z-32
en
Convert
RealOut
ImagIn1
Fix_8_0
RealIn2
ImagOut
ImagIn
sysgen
force
Fix_16_8
Reinterpret
sysgen
cast
BaudClk
Convert1
Fix_8_0
sysgen
z-32
en
Delay1
3
Bool
Fix_16_0 RealIn1
ImagIn
Correlate
double
RealIn
ImagIn1
Fix_8_0
BaudClk
2
CorrMetric
RealOut
Complex
Multiply
ImagIn2
Delay
Fix_8_0
Fix_8_0
ImagOut
Complex
double
RealOut
Multiply
a
Fix_16_0
a>b
sysgen
z-1
En
BaudClk
1
PacketDetect
Complex Sliding
Window Averager
Power Calculator1
b
Relational
RealIn1
ImagIn1
double
Complex
Fix_8_0
PwrOut
Multiply
DecisionMetric = (CorrelationPeak >= 0.5(AveragePower))
In
In1
Out
Fix_16_0
Out1
BaudClk
BaudClk
Power Calculator
Fix_32_0 X >> 1
sysgen
z-0
double
Shift
En
3
Sliding Window
Averager
Squarer
AvePower
72
R. M. Rao, 2008
Two Branch CFO estimation using
Schmidl and Cox algo
Ye jq
Y
Carrier Frequency Offset causes a linearly increasing rotation in the time domain
1
RealIn 1
[a:b]
Slice 2
reinterpret
RealIn 1
a
RealOut
ImagIn 1
Reinterpret 4
z
en
-32
Delay
RealIn 2
RealIn
Complex
Multiply
ImagIn 2
RealOut
ImagIn
ImagOut
BaudClk
BaudClk
Rst
ImagOut
Complex Multiply 2
2
ImagIn 1
[a:b]
Slice 1
reinterpret
b
Reinterpret 1
z
en
Complex Sliding
Window Averager
-32
Delay 1
a
-1
za + b
b
RealIn 1
AddSub 1
ImagIn 1
RealIn 2
1
CorrMetric _real
Squarer
RealOut
In
ImagIn 2
BaudClk
BaudClk
Out
en
Magnitude -Squared 1
Rst
z
-2
Delay 2
a
Sliding Window
Averager
-1
za + b
b
2
CorrMetric _imag
AddSub 2
3
RealIn 2
[a:b]
Slice 5
RealIn 1
reinterpret
RealOut
ImagIn 1
Reinterpret 2
z
en
-32
Delay 3
RealIn 2
ImagIn 2
RealIn
RealOut
Complex
Multiply
ImagIn
BaudClk
ImagOut
BaudClk
Rst
ImagOut
3
4
ImagIn 2
[a:b]
Slice 3
reinterpret
Reinterpret 3
z
en
-32
Delay 4
5
BaudClk
Complex Multiply 3
Complex Sliding
Window Averager 1
AvePwr
Combine the metric
from both Antennas
6
Rst
73
R. M. Rao, 2008
Carrier Frequency Offset
Estimation
• Pre-FFT
– Uses a dedicated preamble or symbol for CFO estimation
• Post-FFT using channel estimation pilots
– Uses channel estimation training symbols
• Post-FFT CFO Tracking
– Needs continuous pilots during payload symbols
• CFO Estimation using Cyclic Prefix
– Works well when you have a lengthy cyclic prefix
– Examples: WiMax, 3GPP-LTE, DVB-T/H
– Does not need preamble or pilot support
74
R. M. Rao, 2008
Pre-FFT Carrier Frequency Offset
Estimation
z-14
en
1
RealIn 1
qˆ
Delay5
2
ImagIn1
N 
2p  s 
 2 
RealIn 1
ImagIn 1
3
RealIn 2
CorrMetric _ real
In 1
Out 1
CorrMetric _ imag
In 2
Out 2
In 3
Out 3
x
ImagIn 2
BaudClk
5
Baud_clk
Rising edge
detector
z -17
RealIn 2
4
ImagIn2
mag
Out 1In 1
7
BBD
AvePwr
Rst
Packet Detection 3
z-24
en
y
atan
x 0 . 003906
-2
z
cast
Convert
CORDIC ATAN
Truncate
CMult8
d
rstz - 1 q
en
1
CFO_Est
Register1
Delay6
6
Rst
The angle of the correlation metric is
proportional to the Carrier frequency offset.
Right size the number of bits before the
CORDIC operation.
CORDIC ATAN from the Xilinx Math library
calculates the angle.
75
R. M. Rao, 2008
Post-FFT CFO Estimation and
tracking
Location of channel estimation
training symbols for Antenna 1
for a 2 antenna MIMO system
A subset of channel estimation
training symbols is used for CFO
estimation
Angular rotation
on symbol 1
q (k )
Proportional to
CFO
Angular rotation
on symbol 2
mean(q (k ))
eˆ 
c
N
2p (1  CP )
N
s
CFO causes a linear
rotation every sample in
the time domain.
CFO causes a constant
rotation on all subcarriers
in the frequency domain.
This rotation increases
from OFDM symbol to
symbol and can be used
to estimate CFO.
76
R. M. Rao, 2008
Carrier Frequency Offset
Correction
9
Direct digital synthesizer (DDS) from
the Xilinx DSP SysGen library.
Fix_16_12
Fix_16_16
x 0 . 01563
CFO_Est
freq_off
cos_out
11
CMult
Fix_16_15
Enable
CFO_Est_valid
12
double
sin_out
Reset
Fix_16_15 Fix_17_15
x(- 1 )
Reset
DDS
1
Fix_16_10
z
Fix_16_10
-1
RealIn 1
2
z
Fix_16_10
Negate 1
RealIn 1
Fix_16_10
-1
RealIn 2
Delay 1
ImagIn 1
RealOut
ImagIn 1
Delay
1
RealOut 1
ImagIn 2
BaudClk
Fix_16_10
Complex
Multiply
ImagOut
Fix_16_10
2
ImagOut 1
Complex Multiply
3
Fix_16_10
z
Fix_16_10
-1
RealIn 2
4
z
Fix_16_10
RealIn 1
Fix_16_10
-1
RealIn 2
Delay 3
ImagIn 2
RealOut
ImagIn 1
Delay 2
3
RealOut 2
ImagIn 2
BaudClk
Fix_16_10
Complex
Multiply
ImagOut
Fix_16_10
4
ImagOut 2
Complex Multiply 1
5
Fix_16_10
z
Fix_16_10
-1
RealIn 3
6
z
RealIn 1
Fix_16_10
-1
Fix_16_10
RealIn 2
Delay 5
ImagIn 3
RealOut
ImagIn 1
Delay 4
5
RealOut 3
ImagIn 2
BaudClk
Fix_16_10
Complex
Multiply
ImagOut
Fix_16_10
6
ImagOut 3
Complex Multiply 2
7
Fix_16_10
z
Fix_16_10
-1
RealIn 4
8
z
Fix_16_10
-1
RealIn 1
Fix_16_10
RealIn 2
ImagIn 4
RealOut
ImagIn 1
Delay 6
Delay 7
ImagIn 2
1
Constant 3
Bool
BaudClk
Fix_16_10
ImagOut
7
RealOut 4
Complex
Multiply
Fix_16_10
8
ImagOut 4
Complex Multiply 3
Bool
or
z -0
Logical 1
0
UFix_16_0
a a <b
Constant 1
bz
10
Bool
In1
Out1
Bool
rst
out
UFix_16_0
Bool
-0
Relational
and
-0
z
Bool
FFT_Start
Rising edge
detector
aa < = b
-0
bz
Counter
78
UFix_16_0
Bool
Logical
Relational 1
Constant 2
77
R. M. Rao, 2008
Design methodology issues
• FPGA tools
– Where to from here?
• C-to-gates
– Higher level design languages to gates
– Raising the level of abstraction
78
R. M. Rao, 2008
End of Roadmap for the
Von Neumann Model
Clock L2 $
frequency
scaling L1 $
CPU
Source: Ronen [2001]
Multi-core Arrays
2005 - ????
Concurrent
programming
Spot the CPU!
CPUs are as smart as they can be!
SPECInt92/MHz
1945-2005
Sequential
programming
MHz
Divide and
conquer
Source: Agarwala [2002]
With Moore’s law you
also get leakage!
Absolute
power limits
Source: Zu & Baas [2006]
6x6 GALS Processor Array
TI 6416
Source: Borkar [1999]
79
R. M. Rao, 2008
Merging Mindsets:
Software Design vs. Hardware Design
class A
class C
 Encapsulation
 Abstraction
 Portability
 Re-use
start()
class D
class B
resourceA
resourceB
resourceC
Implementation Detail 
Control Logic 
Interface Glue 
Concurrency 
Communication 
Architecture 
 Events
 Protocols
 Ordering
Clocks 
 Sequential
Signals 
execution
Timing 
Combining the strengths of both paradigms can bring about a radical
improvement in hardware/software system design productivity.
80
R. M. Rao, 2008
Objective for a New Methodology:
reduce design cost (by a lot)
• Quality of result (QoR) is not a design goal!
Ø Performance, power, BOM cost budgets make QoR a design constraint
• The real objective is to meet the QoR target and minimize:
Ø Non-recurring engineering costs (NRE)
Ø Time-to-market (TTM)
• The new methodology should save on design cost by enabling
Ø Design of portable, retargetable, composable IP blocks
Ø Rapid design space exploration and system composition
QoR
performance/$
performance/W
Abstraction
Profit
Traditional HDL Flow
New methodology
abstraction
cost
Total Design Cost
NRE $, TTM
81
R. M. Rao, 2008
‘C’ or higher level language to
Gates
• There is interest in higher level design
methodologies, such as C-to-Gates from the
design community.
• ESL (Electronic system level) tools/design
methodologies are being explored.
• But, extracting all the concurrency from a
sequential description is not an easy problem.
82
R. M. Rao, 2008
Actor/Dataflow Programming Model
actors
guarded atomic actions
point-to-point, buffered
token-passing connections
Actions
State
encapsulated
state
• A well-known and researched model for concurrent systems
– Edward Lee et. al. (UC Berkeley)
– Arvind et. al. (MIT)
• Broadly applicable to heterogeneous HW/SW systems
• Actors are described in the CAL language (UC Berkeley)
– Open source simulator available from SourceForge
– Under consideration as reference model for MPEG
83
R. M. Rao, 2008
Conclusion
• FPGAs are finding wide use in infrastructure
communication systems and signal processing
systems.
• FPGA are an efficient choice for exploring VLSI
architectures.
• FPGA tools are raising the level of abstraction to
allow algorithm designers the ability to explore
h/w architectures without learning “h/w design
tools/languages”.
84
R. M. Rao, 2008
Questions?
85
R. M. Rao, 2008
Descargar

Introduction to OFDM and MIMO with emphasis on …