Rapid Prototyping Using Field
Programmable Devices
Allen C.-H. Wu
Department of Computer Science
Tsing Hua University
Hsinchu, Taiwan 30043, ROC
email: [email protected]
1
Outline
Introduction to programmable logic devices
and rapid prototyping.
 FPGA design technologies and applications.
 Logic emulation.
 Reconfigurable computing and systems.

2
Part I
Introduction to Programmable Logic
Devices and Rapid Prototyping
3
Programmable Logic Devices
SPLDs (simple PLDs).
 CPLDs (complex PLDs).
 FPGAs (field programmable gate arrays).
 SPGAs (system-programmable gate arrays).

4
Programmable Interconnect
Components
FPID: I-Cube.
- Dynamic switching.
- Communication switches, network routes.
- 32-320 programmable I/O ports.
- Up to 150 MHz clock frequency.
 FPIC: Aptix.
- 1024 programmable I/O ports.

5
SPLD
Universal designs.
 Useable gates < 1,500 gates.
 Speed is the main advantage.
 0.5um CMOS -> 3.5ns logic delays
-> 200 MHz.
 Market is shrinking 5-7% per year.

6
CPLD
Rising densities/performance and declining
prices => become a good choice for many
applications.
 100K gates today, 250K gates by 1998.
 Low-density CPLD (32 macrocells/44 pins)
-> 5ns logic delays,
high-density CPLD (128 macrocells/100
pins) -> 7.5ns.

7
FPGA
FPGA
- Antifuse-programmed: Actel ACT1 & 2, Quicklogic’s pASIC, Crosspoint’s CP20K.
- SRAM-programmed:
  - Island: Xilinx LCA, AT&T Orca, Altera Flex.
  - Cellular: Toshiba, Plessey’s ERA, Atmel’s CLi.
- EPROM-programmed: Altera’s MAX, AMD’s Mach, Xilinx’s EPLD.
8
Categories of FPGA’s
• Block organized, SRAM based.
• Channel organized, antifuse based.
• SOP organized (each logic cell is like a PAL device), various programming techniques.

9
Block organized, SRAM based
[Figure: two-dimensional array of logic blocks (L) interleaved with switch blocks (S).]
10
SRAM Programming Technology
• Pass transistor controlled by an SRAM cell: “1” -> “on”, “0” -> “off”.
• Mux with inputs i1 and i2 and output o, controlled by an SRAM cell: “1” -> o = i1, “0” -> o = i2.
11
SRAM Programming Technology
• Advantages:
- Reprogrammability.
- Quality -> parts are fully tested at the factory.
- Standard process technology.
• Disadvantages:
- Volatile -> the FPGA must be reprogrammed each time power is applied.
- Needs an external memory to store the program.
- Large area (6 transistors for 1 cell + 1 switch).

12
Cell Organized and Antifuse
Based
[Figure: logic cells (L) and switches (S).]
13
Antifuse Programming
Technology
[Figure: antifuse cross-section with poly, dielectric, diffusion, and substrate layers. Small antifuse area!]
- Normally in a high-Z state.
- Can be fused to low impedance.
- A high programming voltage melts the dielectric, which links the poly and diffusion.
14
EPROM/EEPROM Technology
• EPROM can be reprogrammed and needs no external storage.
• EPROM cannot be reprogrammed in circuit.
• EEPROM can be reprogrammed in circuit.
• EEPROM consumes 2X the area of EPROM.

15
Erasable PLD (EPLD)
• SOP-based PAL logic array.
• I/Os: input, output, or bidirectional.
• Registers: configurable as D, T, JK, or SR FFs.
• Programmable clock to each FF.
16
Programming the FPGA
Configuration.
 Readback - design verification and
debugging.
 Security - a security-bit to prevent readback.

17
Advantages and Disadvantages of
FPGA
• Advantages:
- Fast turnaround.
- Low NRE (non-recurring engineering) charges.
- Low risk.
- Effective design verification.
- Low testing cost.
• Disadvantages:
- Chip size & cost.
- Slow speed.

18
CPLD Vs. FPGA
                          CPLD                                 FPGA
Interconnect style        Continuous                           Segmented
Architecture and timing   Predictable                          Unpredictable
Software compile times    Short                                Long
In-system performance     Fast                                 Moderate
Power consumption         High                                 Moderate
Applications addressed    Combinational and registered logic   Registered logic only
Source: Altera
19
FPGA Selection Criteria
Density.
 Speed.
 Price.
 Flexibility.

20
SPGA
Allow multiple building blocks.
 Logic.
 Memory.
 Data path.

21
Applications Using SPGAs
Intellectual property (IP).
 Communication & networking.
 Graphical processing.
 Embedded processing.

22
Designing with SPGAs
A team-based approach.
 Understanding how to use SPGA system
features will be the key to pulling the entire
design into a single device.

23
CMOS PLD Market Share
[Pie chart: CMOS PLD market share by vendor (Other, Cypress, AT&T, Actel, Lattice, AMD, Altera, Xilinx); slices of 31%, 5%, 5%, 6%, 24%, 3%, 11%, and 15%.]
Source: Dataquest
24
CMOS Logic Market
[Pie chart: CMOS logic market by segment (std logic, programmable, GA, std cell, custom, chipset); slices of 8%, 14%, 10%, 30%, 9%, and 29%.]
Source: Dataquest
25
FPGAs Growth
[Bar chart: FPGA market in M USD (axis 0 to 2500) by year, 1996 to 2000.]
Source: Integrated Circuit Engineering
26
CMOS Programmable-logic
Market
[Bar chart: CMOS programmable-logic market in B USD (axis 0 to 5) by year, 1997 to 2000.]
Source: Dataquest
27
Rapid Prototyping
What?
 Why?
 How?

28
What is prototyping?
Basic components: FPGAs and FPICs.
 Hardware : boards, boxes, and cabinets.
 Software: methodologies and CAD tools.

29
Product Development Cycle
Market survey
Customer
acceptance
Product development
Production
30
Pressures on Today’s Product
Development
Time-to-market!
 Design complexity!

31
Why Is Prototyping Needed?
Design verification.
 Limited production.
 Concurrent engineering.

32
Design Verification
Specification (functionality & requirements) <-?-> Final product (final functionality & performance)
33
Design Process
Specification
System-level design
RTL design
Logic-level design
Physical-level design
Simulation
Fast prototyping
Formal verification
Logic emulation
Final chips
34
Verification Alternatives
                           Modeling   System        Prepare   Speed
                           accuracy   integration   time
Event Driven Simulation    High       No            Short     Slow
Cycle-Based Simulation     Med.       No            Short     Med.
Behavioral Simulation      Low        No            Short     Med.
Hardware Accelerated Sim   Varies     No            Med.      Med. Fast
Breadboarding              Med.       Yes           Long      Very Fast
Emulation or Prototyping   Med.       Yes           Med.      Very Fast
35
A Minute in the Life of a 100K
Gates Design
Elapsed time   Slowdown    Platform
               1           Actual hardware at 50 MHz
               10          Logic emulator or prototype at 5 MHz
               100 to 2K   HW accelerator at 250M evals/sec
1 month        50K         Cycle-based simulator at 1K insts/sec
3 months       120K        Compiled-code logic simulator at 125 MIPS
1.5 years      800K        Event-driven logic simulator at 125 MIPS
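The elapsed times on this slide can be checked with simple arithmetic, assuming the "K" figures are slowdown factors relative to real hardware (an interpretation, not stated on the slide). A minimal Python sketch; the function name `elapsed_days` is illustrative:

```python
# Wall-clock time needed to reproduce one minute of real operation
# at a given slowdown factor (assumed meaning of the slide's "K" figures).
MINUTES_PER_DAY = 24 * 60

def elapsed_days(slowdown: float, minutes_of_operation: float = 1.0) -> float:
    """Return the elapsed wall-clock days for the given slowdown factor."""
    return slowdown * minutes_of_operation / MINUTES_PER_DAY

for label, factor in [("cycle-based simulator", 50_000),
                      ("compiled-code simulator", 120_000),
                      ("event-driven simulator", 800_000)]:
    print(f"{label}: about {elapsed_days(factor):.0f} days")
```

Under that assumption the three simulators land at roughly one month, three months, and a year and a half, which matches the slide.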
36
Development with Prototyping
SW
Design
Code
HW
Design
Build
CHIP
Design
Fab
Integration
Integration
Debug
Debug
Debug
37
Development with Prototyping
SW
HW
CHIP
Design
Design
Design
Integration
Code System
& SW Debug
Build
HW Integration
& Debug
Chip debug
Final
Integration
Fab
38
How to Develop a Prototype
Using FPDs
Custom-designed prototyping board.
 Logic-emulation systems.
 Field-programmable printed-circuit-boards.

39
Part II
FPGA Design Technologies and
Applications
40
FPGAs
What? - Programmable logic +
programmable routing = FPGAs.
 Why? - Zero NREs, easy bug fixes, and
short time-to-market.
 How?

41
Comparison of Different Design
Technologies
               Custom   Std Cells   Gate Arrays   FPGAs
Design time    Long     Short       Short         Short
Fabrication    Long     Long        Short         None
Chip area      Small    Med.        Large         Very large
Design cost    High     Med.        Low           Very low
Unit cost      Low      Low         Med.          High
Design cycle   Long     Med.        Short         Very short
42
Emerging FPGA-based
Applications
Low-volume production.
 Urgent time-to-market competition.
 Rapid prototyping.
 Logic emulation.
 Custom-computing hardware.
 Reconfigurable computing.

43
Design Considerations
Target architecture.
 Fixed logic and routing resources.
 Fixed I/O pins.
 Slow signal delays.

44
An HDL-based Design Flow
HDL design specification
RTL synthesis
Verification
(Simulation)
Logic synthesis
Physical synthesis
FPGAs
45
Design Specification
• HDLs - VHDL and Verilog.
• Why is an HDL-based design methodology needed?
• Target applications.
• Coding styles.
• Design representation.
• Design entry.

46
Why an HDL-based Design
Methodology Is Needed
As design complexity grows:
• Then: schematic capture -> component mapping & maybe some logic optimization -> place & route -> layouts.
• Now: HDL design specification -> synthesis -> place & route -> layouts.
SW: assembly language => high-level language
47
Target Applications and Layout
Architectures
Target applications:
• Datapath-dominated designs: DSPs and processors.
• Control-dominated designs: controllers and communication chips.
• Mixed types of designs.
Layout architectures:
• Bit-sliced stacks.
• Standard cells.
• Macro-cell-based.
• FPGAs.
48
HDL Coding Styles Vs. Design
Quality
Ideas?
HDL
spec1
HDL
spec2
HDL
spec3
Synthesis system
Design1
Design2
Design3
49
Coding Styles and Design
Representation
• Hierarchical style
• Structural style
• Random style
• FSMD
// Behavioral description of a 4-bit 2-to-1 mux.
module MUX2(o,i1,i2,sel);
  output[1:4] o; input[1:4] i1,i2;
  input sel; reg[1:4] o;
  // Re-evaluate whenever any input changes.
  always @(sel or i1 or i2)
    case(sel)
      1'b1: o = i1;  // sel = 1 selects i1, as in the gate-level version
      1'b0: o = i2;
    endcase
endmodule
• Behavioral level
• Logic level
• Gate level
// Gate-level description of the same mux, one equation per bit.
module MUX2(o,i1,i2,sel);
  output[1:4] o; input[1:4] i1,i2;
  input sel;
  assign o[1] = ((sel&i1[1])|(~sel&i2[1]));
  assign o[2] = ((sel&i1[2])|(~sel&i2[2]));
  assign o[3] = ((sel&i1[3])|(~sel&i2[3]));
  assign o[4] = ((sel&i1[4])|(~sel&i2[4]));
endmodule
50
RTL Synthesis
HDL compilation.
 Design representation.
 Component selection.
 Component generation.
 Resource sharing.

51
Logic Synthesis
Logic minimization.
 Technology dependent/independent
minimization.
 Technology mapping.

52
Physical Synthesis
Placement.
 Routing.

53
Logic Synthesis Problems for
FPGAs
How to synthesize a logic network to realize
a given function.
 How to realize a logic network using
FPGAs.
 How to optimize a given network for area
and timing.
 How to synthesize routable circuits.
 How to solve these problems efficiently.

54
Representation of Boolean
Functions
Truth tables.
 Factored forms: SOP and POS.
 BDD.
 Boolean networks.

55
Synthesis with Multiplexers
[Figure: Boolean equations mapped (how?) onto an 8-to-1 multiplexer with data inputs d0 to d7, select inputs s1, s2, s3, and output y.]
56
Synthesis with Look-Up-Table
(LUT)
[Figure: Boolean equations mapped (how?) onto a LUT drawn with d0 to d7 and output y.]
57
An Example
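XOR(a,b) = a’b + ab’
[Figure: XOR realized two ways: a 4-to-1 MUX (data inputs d0 to d3, select inputs s0 and s1, output y) and a LUT built from a decoder and a RAM storing 0, 1, 1, 0, addressed by a and b.]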
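The XOR example can be mimicked in software: a LUT is just a stored truth table indexed by its inputs. A minimal Python sketch; the helper `make_lut` and the bit ordering (first argument is the most significant address bit) are illustrative assumptions, not taken from the slides:

```python
def make_lut(truth_table):
    """Return a function that looks up its inputs in a stored truth table.

    truth_table[i] is the output for the input bits whose binary value is i.
    """
    def lut(*bits):
        index = 0
        for b in bits:              # first argument is the most significant bit
            index = (index << 1) | (b & 1)
        return truth_table[index]
    return lut

xor_lut = make_lut([0, 1, 1, 0])    # RAM contents for XOR(a, b) = a'b + ab'

# Compare the LUT against the sum-of-products form for every input pair.
for a in (0, 1):
    for b in (0, 1):
        assert xor_lut(a, b) == ((1 - a) & b) | (a & (1 - b))
```

Changing only the stored list turns the same structure into any other two-input function, which is the point of LUT-based logic.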
58
Multilevel Logic Minimization
MIS and SIS by UC Berkeley.
 Optimization for timing, area, and power.
 Technology independent.

59
Technology Mapping for FPGAs
• Technology mapping is the process of binding a technology-independent circuit to technology-dependent cells.
• Technology mapping for FPGAs consists of two steps: (1) decomposition and (2) covering.
• The technology mapper optimizes the final circuit by selecting the sub-networks that are covered by LUTs.

60
Technology Mapping for FPGAs
• A k-input LUT has a fixed number of inputs and can implement any logic function of up to k variables.
• Nodes and sub-networks of a Boolean network with at most k inputs are called feasible; all others are infeasible.
• Infeasible nodes need to be decomposed into a set of feasible nodes so that a circuit covering the network exists.
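The feasibility test described above reduces to counting the inputs of each node. A minimal Python sketch; the network, the node names, and the function `classify_nodes` are hypothetical:

```python
def classify_nodes(fanins, k):
    """Split nodes into feasible (at most k inputs) and infeasible (more than k)."""
    feasible = sorted(n for n, ins in fanins.items() if len(ins) <= k)
    infeasible = sorted(n for n, ins in fanins.items() if len(ins) > k)
    return feasible, infeasible

# Hypothetical Boolean network: node name -> list of input signals.
network = {
    "g1": ["a", "b", "c"],
    "g2": ["a", "b", "c", "d", "e", "f"],
    "g3": ["g1", "g2"],
}

feasible, infeasible = classify_nodes(network, k=4)
print("feasible:", feasible)      # nodes that fit in one 4-input LUT
print("infeasible:", infeasible)  # nodes that must be decomposed first
```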
61

Technology Mapping for FPGAs

An FPGA-based technology mapper
performs three tasks:
1. Decomposition - It decomposes infeasible
expressions into feasible ones.
2. Reduction - It groups small expressions
into CLBs to promote sharing of resources.
3. Packing - It allocates CLBs to expressions
that cannot be shared.
62
Technology Mapping for FPGAs

The optimization goals for FPGA-based
technology mapping include:
1. The number of CLBs,
2. The number of levels of CLB circuits, and
3. Routable designs.
63
Decomposition

Decomposition consists of three steps:
1. Identify divisors which are common to
many functions.
2. Introduce the divisor as a new node.
3. Re-express existing nodes using the new
nodes.
64
An Example

Given the expression
f = ab’+ac’+ad’+a’b+bc’+bd’+a’c+b’c+cd’+b’d+c’d

Suppose a factor found is
p = a+b+c+d

f can be re-expressed based on p:
f = p(a’+b’+c’+d’)
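The re-expression can be checked exhaustively, since four variables give only 16 input combinations. A minimal Python sketch; the function names are illustrative:

```python
from itertools import product

def f(a, b, c, d):
    """Original sum-of-products form from the slide (11 product terms)."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    terms = [a & nb, a & nc, a & nd, na & b, b & nc, b & nd,
             na & c, nb & c, c & nd, nb & d, nc & d]
    return int(any(terms))

def f_factored(a, b, c, d):
    """Re-expressed form f = p(a'+b'+c'+d') with p = a+b+c+d."""
    p = a | b | c | d
    return p & ((1 - a) | (1 - b) | (1 - c) | (1 - d))

# Compare both forms on all 16 input combinations.
assert all(f(*v) == f_factored(*v) for v in product((0, 1), repeat=4))
```

Both forms evaluate to 1 exactly when the four inputs are not all equal.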
65
Decomposition Techniques
Disjoint decomposition.
 Shannon cofactoring.
 Roth-Karp decomposition.
 Algebraic decomposition.
 AND-OR decomposition.

66
Disjoint Decomposition
Disjoint decomposition can be found by
searching through all possible partitions of
inputs to the infeasible nodes, and using well
known methods, such as residues, to
determine if each partition leads to a disjoint
decomposition.
 Disadvantage: the number of partitions
grows exponentially with number of inputs
to the infeasible nodes.

67
Shannon Cofactoring
• The residue of a function f(x1,x2,..,xn) with respect to a variable xj is the value of the function for a specific value of xj. It is denoted f(xj) for xj=1 and f(xj’) for xj=0.
• Ex. The residues, wrt a, of f(a,b,c,d)=ab+bc+bd’+a’cd are f(a’)=bc+bd’+cd and f(a)=b, so f(a,b,c,d)=a’f(a’)+af(a).
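The cofactoring example can be verified by fixing the variable a and comparing truth tables. A minimal Python sketch; the helper `cofactor` is illustrative:

```python
from itertools import product

def f(a, b, c, d):
    """f = ab + bc + bd' + a'cd"""
    return (a & b) | (b & c) | (b & (1 - d)) | ((1 - a) & c & d)

def cofactor(func, index, value):
    """Return func with the argument at position `index` fixed to `value`."""
    def g(*args):
        full = list(args)
        full.insert(index, value)
        return func(*full)
    return g

f_a1 = cofactor(f, 0, 1)   # f(a):  residue for a = 1
f_a0 = cofactor(f, 0, 0)   # f(a'): residue for a = 0

for a, b, c, d in product((0, 1), repeat=4):
    assert f_a1(b, c, d) == b
    assert f_a0(b, c, d) == (b & c) | (b & (1 - d)) | (c & d)
    # Shannon expansion: f = a' f(a') + a f(a)
    assert f(a, b, c, d) == ((1 - a) & f_a0(b, c, d)) | (a & f_a1(b, c, d))
```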

68
Roth-Karp Decomposition
Try to decompose a function into the form:
f(x,y) = g(z1(x), z2(x),..,zt(x), y)
x: the bound set
y: free set
 Based on the concept of compatible classes.
 The xl_k_decomp operation in SIS for
decomposition of k-input LUTs.
 Computationally expensive. It is useful for
small designs with high degree of symmetry.

69
Algebraic Decomposition
• Based on factored-form representation and algebraic operations.
• Manipulates algebraic expressions as polynomials; i.e., xi and xi’ are treated as different variables.
• To reduce the search, only common cube factors and kernels are used.
• Ex. x = ac+bc+bd+ce; with y = a+b+e, x = cy + bd.
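Algebraic (weak) division makes the example concrete: dividing x = ac+bc+bd+ce by the divisor a+b+e yields quotient c and remainder bd. A minimal Python sketch, assuming single-letter uncomplemented literals; `cubes` and `weak_divide` are illustrative names:

```python
def cubes(expr):
    """Parse an expression such as 'ac+bc+bd+ce' into a set of cubes.

    Each cube is a frozenset of single-letter literals.
    """
    return {frozenset(term) for term in expr.split("+")}

def weak_divide(f, d):
    """Algebraic division: return (quotient, remainder) with f = quotient*d + remainder."""
    quotient = None
    for dc in d:
        # Cubes of f that contain the divisor cube dc, with dc removed.
        partial = {fc - dc for fc in f if dc <= fc}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    covered = {q | dc for q in quotient for dc in d}
    return quotient, f - covered

x = cubes("ac+bc+bd+ce")
y = cubes("a+b+e")
q, r = weak_divide(x, y)
print(q, r)   # quotient holds the cube c, remainder holds the cube bd
```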

70
AND-OR Decomposition
• Ensures that any infeasible node is decomposed into a set of feasible nodes.
• Can be used to decompose large infeasible nodes into infeasible nodes that are small enough to make an exhaustive search for disjoint decompositions practical.
• Ex. F = ab+ac+bc can be decomposed into v=ab, w=ac, x=bc, y=v+w and z=y+x.
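The OR stage of the example (y=v+w, z=y+x) generalizes to splitting any wide AND or OR node into a tree of nodes with at most k inputs. A minimal Python sketch; the function `decompose` and the generated node names are hypothetical:

```python
def decompose(op, inputs, k, prefix="n"):
    """Split one wide AND/OR node into a tree of nodes with at most k inputs.

    Returns a list of (name, op, inputs) tuples; the last entry is the root.
    """
    assert k >= 2, "each node needs at least two inputs to make progress"
    nodes, level, counter = [], list(inputs), 0
    while True:
        groups = [level[i:i + k] for i in range(0, len(level), k)]
        next_level = []
        for g in groups:
            if len(g) == 1 and len(groups) > 1:
                next_level.append(g[0])   # pass a lone signal up unchanged
                continue
            name = f"{prefix}{counter}"
            counter += 1
            nodes.append((name, op, g))
            next_level.append(name)
        if len(next_level) == 1:
            return nodes
        level = next_level

# OR of the three product terms v, w, x using 2-input nodes.
tree = decompose("OR", ["v", "w", "x"], k=2)
print(tree)
```

With k=2 the result has the same shape as the slide: one node ORs v and w, and the root ORs that result with x.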

71
Covering
• Graph covering - for each node, find all the matches that cover that node, then formulate a covering problem.
• Tree covering - an approximation to graph covering. Since the average tree size is small, an optimal tree covering can be obtained using dynamic programming.

72
Covering Techniques
Decomposition-based covering using bin
packing.
 Covering reconvergent paths.
 Replication of logic at fanout nodes.
 Covering using edge visibility.

73
Tree-based Technology Mapping
Methods
Chortle, Chortle-crf, and Chortle-d.
 Hydra.
 TM-based on edge visibility.
 mis-PGA.

74
Graph-Based Technology
Mapping Methods
DAG-Map.
 Flow-Map.
 Area/depth trade-off.

75
Layout-Driven FPGA Synthesis
Mapping directed synthesis.
 Mapping with resynthesis.
 Combining technology mapping and
placement.
 Routability-driven technology mapping.

76
Performance-Driven Methods
• mis-pga (xln_p) - mapping with synthesis: logic synthesis during timing-driven placement.
• M.map - intertwined mapping and placement procedures that take wiring delays into account.

77
Routability-Driven Methods
Alternative wires - attempt to identify
alternative wires and alternative functions
for wires that cannot be routed due to the
limited routing resources.
 Balanced routing resources and cell
resources by trading off the routability with
the compactness of a design. Try to deliver
routable designs by controlling directly the
pins-per-cell ratio of the design.

78
Sequential Synthesis for FPGAs
• Each CLB has two FFs.
• Not much work has been done in this area. WHY?
• Two attempts were made by the UCB group: mapping combinational and sequential circuits simultaneously, and mapping them separately.
• How does Xilinx’s APR handle sequential circuits?

79
Placement
CLB netlist -> assign logic to cells.
[Figure: CLB netlist placed onto an array of logic blocks (L) and switch blocks (S).]
80
Routing
[Figure: array of logic blocks (L) and switch blocks (S).]
Interconnections are realized by turning on switches of routing resources.
81
Placement & Routing Methods
• Placement - simulated annealing is the commonly used method.
• Routing - routability-driven and timing-driven.
• Time-consuming design tasks.
• Architecture dependent.

82
HDL-based Design Flow for
Multi-FPGA Designs
HDL description
HDL synthesis
Netlists
Partitioning
Partitioned netlists
83
Basic Partitioning Techniques
The min-cut partitioning:
. The Kernighan-Lin algorithm.
. The Fiduccia and Mattheyses algorithm.
. The Krishnamurthy algorithm.
 The ratio-cut algorithm.
 A variety of clustering algorithms.

84
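The min-cut idea behind the partitioning algorithms listed above can be illustrated with a simplified greedy pairwise-swap refinement. This is a sketch in the spirit of Kernighan-Lin, without its gain bookkeeping or tentative move sequences, and is not any of the named algorithms; the graph and function names are hypothetical:

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])

def improve_by_swaps(edges, side):
    """Repeatedly apply the swap (one node from each side) that reduces the
    cut the most, until no swap helps. Swapping keeps both sides the same size."""
    side = dict(side)
    while True:
        best, best_cut = None, cut_size(edges, side)
        left = [n for n in side if side[n] == 0]
        right = [n for n in side if side[n] == 1]
        for a in left:
            for b in right:
                side[a], side[b] = 1, 0          # try the swap
                c = cut_size(edges, side)
                side[a], side[b] = 0, 1          # undo it
                if c < best_cut:
                    best, best_cut = (a, b), c
        if best is None:
            return side
        a, b = best
        side[a], side[b] = 1, 0

# Two triangles joined by a single edge, with a poor initial partition.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
result = improve_by_swaps(edges, start)
print(cut_size(edges, start), "->", cut_size(edges, result))
```

The exhaustive pair search is quadratic per pass, which is acceptable for a toy graph but is exactly the cost that the gain-bucket structures of the Fiduccia-Mattheyses approach are designed to avoid.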
Multi-FPGA Partitioning
Constraints:
1. Fixed number of I/O pins.
2. Fixed number of CLBs.
3. Utilization.
 Objectives:
1. Cost minimization.
2. Delay minimization.

85
Circuit-Level Partitioning
Methods
• Multiway partitioning methods based on the min-cut algorithm.
• Interconnect minimization by cell replication.
• Clustering-based partitioning methods: cone.
• Combining top-down partitioning and bottom-up clustering methods.

86
Considerations for Multi-FPGA
Partitioning
• Limited I/O-pin and logic resources.
• Logic utilization is dominated by the I/O-pin limitation.
• Alleviating the I/O-limitation problem is the key to improving the logic utilization of FPGA chips.

87
Combining HDL Synthesis and
Partitioning
HDL description
HDL synthesis
Netlists
Bridging HDL
synthesis and
partitioning?
Partitioning
Partitioned netlists
88
Design Considerations
Datapath-dominated
Control-dominated
HDL Spec.
Varying coding
styles
Application-Oriented Synthesis
Module-based
Fine-grained
Bit-sliced
Function-based
89
Coding Styles
Top
Top
Mod1
Mod2
Mod1_1
Mod2_1
Mod1_2
Mod2_2
M11
Top
M1
M1
M2
M12
M21
M22
Top
M2
M11
M12
M21
M22
90
The FSMD Coding Style
Top
Top
CU
DP
CU1
DP1
CU2
DP2
CU1
CU
DP
CU2
DP1
DP2
91
Integrated HDL-Synthesis and
Partitioning Methodology
HDL descriptions
Module-based
HDL synthesis
Fine-grained
HDL synthesis
Bit-sliced-based
HDL synthesis
Circuit-level
partitioning
Covering-based
partitioning
Bit-sliced-based
partitioning
P&R
FPGAs
92
Module-based HDL Synthesis
Top
M1
M2
Mn
93
Fine-Grained HDL Synthesis
Top
M1
M2
P1
F1
Mn
Pm
F2
Clusters
94
A Process Example
Process{P1}
input[0:3] i1,i2;
input i3;
output[0:3] o1;
output o2;
o1 = i1 + i2;
o2 = i1[0] & i3;
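[Figure: process P1 with 4-bit inputs i1 and i2, 1-bit input i3, 4-bit output o1, and 1-bit output o2; the adder is split into bit functions f1.0 to f1.3 producing o1[0] to o1[3], and the AND becomes function f2 producing o2.]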
95
Functional-based Clustering
Design
Design
Module{M1}
M1
Process{P1}
Process{P2}
P1
f1
M2
P2
f2
Module{M2}
96
Bit-Sliced-Based Synthesis
[0]
Mux[0:7]
Mux[0:5]
[5]
[7]
Mux
Mux
Adder[0:7]
Adder
97
Functional Clustering
DP
Mux
DP[0]
[0]
[7]
Mux
Adder
Mux[0]
DP
Mux[0]
[0]
Adder[0]
[5]
Mux[7]
[0]
[7]
DP[7]
Adder[7]
98
Part III
Logic Emulation
99
What Is a Logic Emulation
System?
• Programmable hardware built from programmable logic and programmable interconnect devices.
• Software that automatically programs the hardware according to the circuit under design.
• Control HW/SW that supports operation of the emulated design as a hardware component operating in real time.

100
Typical Logic Emulation
Environment
Compiler, runtime software
Workstation
Logic Emulator
Logic Module
Target System
In-circuit
Interface
Probe Module
Stimulus generator, logic analyzer
101
Why Is Logic Emulation Needed?
Design verification issues.
 Real-time operation.
 System-level testing.
 Rapid prototyping.

102
Design Verification Issues
• Simulation-based verification methods run out of steam as chip complexity grows.
• Emulation is a verification technology that grows along with design size.

103
Real-Time Operation
Simulation requires test vector development
which is costly and difficult. Verification
depends on test vector correctness.
 Certain applications must be verified in real
time - human perception: audio and video.
 Emulation connected to actual hardware can
run: real diagnostic code, operating systems,
and applications.

104
System-Level Testing
Often the chip meets spec but fails in the
system.
 System-level interactions between the chip
and other components.
 Internal probing is impossible when the chip
is fabbed and placed in a system, but it is
possible using emulation.

105
Rapid Prototyping
Once emulated design is debugged it is
available for immediate use by software
developers for software debugging.
 Emulated design is available for demo and
experiments with architecture on real
applications and data.

106
Programmable Hardware
Logic
element
Logic
element
Programmable
interconnect
Memory
element
Interface
VLSI
core
107
Considerations
The capacity of logic and interconnection
depends on package constraints. This forces
a hierarchical system.
Chips => boards => boxes => system
 The interconnect structure must:
1. Provide successful connectivity,
2. Maximize FPGA utilization, and
3. Minimize delay and skew.
 Rent’s rule applies to predict interconnect
needs.
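Rent's rule estimates the number of external terminals T of a block of G gates as T = t * G^p, where t is the average number of terminals per gate and p is the Rent exponent. A minimal Python sketch; the values t = 2.5 and p = 0.6 are illustrative assumptions, not figures from the slides:

```python
def rent_terminals(gates: int, t: float, p: float) -> float:
    """Rent's rule: T = t * G**p."""
    return t * gates ** p

# Estimated pin demand for partitions of increasing size (assumed t and p).
for gates in (100, 1_000, 10_000):
    pins = rent_terminals(gates, t=2.5, p=0.6)
    print(f"{gates:>6} gates -> about {pins:.0f} pins")
```

Because p is below 1, pin demand grows more slowly than gate count, but package pin counts grow more slowly still, which is why I/O pins rather than logic usually limit how much of each FPGA can be used.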

108
Multi-FPGA Systems
Topologies:
- Mesh - nearest neighboring.
- Crossbar - full and partial.
 Interconnect scheme:
- Circuit switched.
- Time multiplexed.

109
Nearest Neighbor Interconnection
[Figure: 3 x 3 array of FPGAs, each wired to its nearest neighbors.]
110
Advantages and Disadvantages
Advantages:
- Uniform: all chips the same.
- Easy to lay out on PCB.
 Disadvantages:
- Routing is easily blocked.
- Through pins limit logic utilization of
FPGAs.
- Long and unpredictable delays.
- No natural hierarchical extension.

111
Nearest Neighbor Extensions
[Figure: 3 x 3 array of FPGAs with nearest-neighbor wiring extended by diagonal and skip lines.]
112
Advantages and Disadvantages
Advantages:
- More choices for router by adding diagonal
lines & skip lines.
 Disadvantages:
- More complex PCB.
- More complex routing software.

113
Partial Crossbar Interconnect
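[Figure: logic blocks, each with pin subsets A, B, C, and D; the A pins of all logic blocks connect to one crossbar, the B pins to another, and likewise for C and D. Second-level crossbars extend the hierarchy.]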
114
Partial Crossbar Interconnect
Partial crossbar consists of a set of small full
crossbars, connected to logic blocks but not
to each other.
 I/O pins of each FPGA are divided into
subsets. Each subset is connected by a full
crossbar circuit switch.
 Partial crossbar is a potentially blocking
network.

115
Partial Crossbar Characteristics
Partial crossbar’s size is proportional to the
number of FPGA pins.
 All interconnections go through one/three
crossbar chips for a one-level/two-level
partial crossbar interconnect - delays are
uniform and bounded.

116
Mixed Full and Partial Crossbar
[Figure: pairs of FPGAs attached to local FPICs (full crossbar); the local FPICs connect to global FPICs (partial crossbar), which also carry the external connections.]
117
Circuit Switched Vs. Time
Multiplexed
Trade off operating speed and hardware cost.
 Time-multiplexing method:
- can greatly expand available interconnect.
- allows lower cost IC package and PCB.
- makes partitioning easier.
BUT
- System power increases due to frequent
signal switching (higher hardware cost).
- Complex scheduling software.
- Slow operating speed.
118

Virtual Wires
[Figure: logical outputs of one FPGA are multiplexed onto shared physical wires and delivered as logical inputs to the receiving FPGA.]
119
Logic Emulation Systems
System with mesh topology - Quickturn’s
RPM and Virtual Machine Works (IKOS).
 System with partial crossbar - Quickturn’s
Enterprise, Mars, and System Realizer.
 System with mixed full and partial crossbar
- Aptix Prototyping System.
 System using time-multiplexed interconnect
- Virtual Machine Works (IKOS) , CoBALT
and Arkos (Quickturn).

120
Memory Solutions
• Goal: programmable memories with different width/depth/port combinations.
• FPGA-based memories:
- Inefficient use of logic resources.
- Timing correctness is difficult to ensure.
- Large or highly multi-ported memories must be partitioned across several FPGAs.
• SRAMs with dedicated or programmable controllers.

121
Logic Emulation Design Flow
HDL synthesis
Synthesis
Pre-configuration
preparation
Partitioning
System mapping
P&R
Full-chip
configuration
Design downloading
Emulators
In-circuit
emulation
122
Logic Emulation Design
Compiler

Logic emulation design compiler is a large
and complex EDA tool which includes:
- Front-end design importer.
- HDL-based synthesizer.
- Clock and timing analyzer.
- Partitioner.
- System-level placer and router.
- FPGA-based placer and router.
123
Objectives
Fast compilation time.
 Fast emulation clock.
 Timing correctness.
 Easy ECO.
 Minimize circuit size.

124
Design Considerations
• HDL synthesis:
- Trade-off run-time and quality.
- CLB-based Vs. gate-based designs.
• Clock and timing analysis:
- Timing correctness, hold-time violation free.
- Clock skew minimization.
• Partitioning:
- Run time.
- Timing and area.

125
Design Considerations
System placement and routing:
- Timing.
- Completeness of routing.
 FPGA-based placement and routing:
- Fast run time.
- Parallel compilation.

126
Hold-Time Violation
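Clock distribution problem (skew)!!!
[Figure: two flip-flops (D, Q, CK); the first drives the second through a LUT inside a CLB, while the clock reaches the second flip-flop through a routing delay.]
A hold-time violation occurs when routing delay > LUT delay!!!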
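The condition on this slide can be written as a simple check. A minimal Python sketch; the optional `clk_to_q` and `hold_time` terms are additions beyond the slide's simplified rule (both default to zero so the check reduces to the slide's condition), and all delay values are made up:

```python
def hold_violation(clock_routing_delay, lut_delay, clk_to_q=0.0, hold_time=0.0):
    """True when new data reaches the second flip-flop before its delayed
    clock edge (plus hold time) has passed."""
    data_arrival = clk_to_q + lut_delay
    return clock_routing_delay + hold_time > data_arrival

# Clock skew of 5 ns against a 3 ns LUT delay: the data arrives too early.
print(hold_violation(clock_routing_delay=5.0, lut_delay=3.0))        # True

# Inserting a 4 ns delay element in the data path removes the violation.
print(hold_violation(clock_routing_delay=5.0, lut_delay=3.0 + 4.0))  # False
```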
127
Timing Correctness
Delay insertion
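[Figure: the same two-flip-flop circuit with a delay element inserted in the data path between the first flip-flop and the LUT, so that the data path is slower than the clock routing delay.]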
128
Timing Correctness
Use clock enables for gated clocks
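[Figure: the gating logic (LUT) drives the clock-enable (CE) input of the flip-flop in the CLB, while CK is fed directly by the primary clock over a low-skew net.]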
129
Methodology
• Pre-configuration preparation - prepare netlists and control files for configuration.
• Testbed preparation - prepare the emulation-based operating environment.
• Full-chip configuration - download the design to the emulator.
• In-circuit emulation - test the design.

130
Pre-Configuration
Translate the leaf-cell libraries into
emulation primitives.
 Translated libraries must be verified for
functional equivalence to original.
 Modify and redesign some components to
attain compatibility with emulation
techniques, such as precharge logic circuits.
 Assemble all the gate-level netlists for the
entire design.

131
Testbed
• Design and implement the target ICE board combining the emulated design with real hardware.
• Slow down the testbed to emulation speed.
• Assemble the testbed and emulation equipment.

132
Full-Chip Configuration & In-Circuit Emulation
Full-chip configuration:
- Prepare control files.
- Partition the design to fit into the emulation
system.
- Download design into the system.
- Verify that emulation model faithfully
implements the design as specified by RTL.
 In-circuit emulation

133
Part IV
Reconfigurable Computing and
Systems
134
General-Purpose Computing Vs.
Custom Computing
• General-purpose computing - running applications on a general-purpose computer.
• Custom computing - running applications on custom-made, application-specific hardware.
• Field-programmable devices make this a reality.

135
Goals of Reconfigurable
Computing
Tailor the architecture to the application.
 Minimize or eliminate instruction
interpretation.
 Exploit fine grained parallelism.
 Map software to hardware.

136
Applications
Database search and analysis.
 Image processing and machine vision.
 Data compression.
 Signal processing.
 Neural networks.
 Biology computing.
 Medical computing.
 Many more.

137
Multi-Mode Systems
ROM
Application 1
Reconfigurable
system
Application 2
- Different configurations for read & write
operations of a tape driver (Honeywell).
- Different configurations for different
printer controllers (Tektronix).
138
Run-Time Reconfiguration
Image data
Truck?
Jeep?
I/O
?
Tank?
- Break single computation into multiple pieces.
- Page in components as needed (virtual hardware),
ex., automatic target recognition.
139
Custom Computing
Application-specific systems.
 Numerous applications for similar
reconfigurable systems.
 Offers hardware performance, flexibility to
handle numerous algorithms.
 Multi-FPGA systems can be viewed as
hardware supercomputers.

140
Reconfigurable Coprocessors
Program 1
Processor
Inst1
Coprocessor
Program 2
Inst2
- Provide custom instructions
on a per-application basis.
141
Types of Reprogrammable
Systems
Coprocessor
CPU
Attached
processing
unit
Memory
caches
Standalone
PU
I/O
interface
142
Types of Reprogrammable
Systems
Attached and standalone processing units
are reprogrammable systems on computer
add-on cards and separate reprogrammable
cabinets.
Considerations: large communication
overhead may over-shadow the speed gain.
 Application-specific coprocessors can
achieve significant improvement over a
wide range of applications.

143
Types of Reprogrammable
Systems

Integrate the reprogrammable logic into the
processor itself.
- A reprogrammable functional unit can be
configured on a per-algorithm basis.
- Providing some special-purpose
instructions tailored to the needs of a given
application.
144
Architectures of Multi-FPGA
Systems

The most commonly used topologies:
- Mesh: 1D (linear array), 2D, and 3D.
- Crossbar: full, partial, mixed, and
hierarchical.
- Hybrid between mesh and crossbar.
- Application-specific architecture.
145
Hybrid Topology
Ext. Interface
FPGA
FPGA
FPGA
Ext. Interface
FPGA
FPGA
RAM
RAM
16 FPGAs
RAM
RAM
Splash 2: augments a linear array of FPGAs with
a crossbar switch.
Goal: Supporting systolic circuits.
146
Hybrid Topology
FPGA
FPGA
FPGA
FPGA
Host
interface
RAM
RAM
RAM
Anyboard: A linear array of FPGAs augmented
by global buses.
147
Hybrid Topology
RAM
Host
interface
RAM
4 X 4 mesh
of FPGAs
RAM
RAM
DECPeRLe-1: a 4 X 4 mesh of FPGAs augmented
with shared global buses.
148
Application-Specific Topology
4
1
5
2
3
1
FPGA
FPGA
4
Memory
FPGA
FPGA
3
5 2
FPGA
4
FPGA
3
5 2
FPU
FPGA
FPGA
The Marc-1: subsystem 1.
1
1
FPGA
149
Application-Specific Topology
The Marc-1
Targets circuit simulation, where the program to be executed can be optimized on a per-run basis for values that are constant within that run but may vary from dataset to dataset.
[Figure: two copies of Subsystem 1 connected through ports 1 to 5.]
150
Application-Specific Topology
RAM
RAM
RAM
RAM
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
FPGA
RAM
RAM
RAM
RAM
RAM
The RM-nc system: neural network.
151
Architecture for Computer
Prototyping
VME bus
FPGA
Cache memory
FPGA
FPGA
FPGA
Register file
FPGA
FPGA
ALU
FPU
FPGA
The Mushroom processor
prototyping system.
152
Expandable Topology
Hierarchical crossbar topology: by adding
extra level.
- Quickturn systems.
 Expandable mesh topology: by connecting
individual board to form a large mesh.
- The Virtual Wires Emulation System
(IKOS).

153
Topology for Adapting Other
Components
• Many multi-FPGA systems include non-FPGA resources to provide more general-purpose solutions.
• The MORRPH system - sockets next to the FPGAs allow arbitrary devices to be added to the array.
• The G800 board - contains two FPGAs and four sockets.

154
Topology for Adapting Other
Components
• The COBRA system - contains base modules (expandable to a 2D mesh), RAM modules, I/O modules, and bus modules.
• The Springbok system - pre-made daughter boards, each able to hold an arbitrary device (on the top) and an FPGA (on the bottom). The daughter boards are mounted on a baseplate.

155
Topology for Adapting Other
Components
The Quickturn systems - external
component adapters.
 The Aptix FPCB - a reprogrammable PCB.

156
Design Methodology
Applications
Mapping
Host
computer
Reprogrammable
system
157
Typical Software Methodology
Application
spec.
Analysis
System-level
synthesis
Software
spec.
Code
generation
Object code
Hardware
spec.
Hardware
synthesis
158
Typical Software Methodology
Hardware spec.
Synthesis
Partitioning & placement
Pin assignment & routing
FPGA P & R
Bit-stream files
159
Considerations
Architectural-specific design tasks.
 Design automation process.
 The mapping time dominates the setup time
for operating the system.
 Run-time reconfigurability.

160
Design Specification and
Languages
• Standard software programming languages, e.g., C, C++, FORTRAN, and assembly language, Vs. HDLs.
• Standard software programming languages - a sequential execution model.
• HDLs - a parallel execution model.
• Who will use it, and which one is more suitable for system description?

161
Compilation Issues
Translate code from software languages into
hardware without losing the inherent
concurrency of hardware.
 Compiler techniques for parallelizing code.
 Straight-line code, control flow, and loops.
 Transmogrifier C compiler.

162
System-level and High-level
Synthesis
System-level design evaluation and analysis.
 Design estimation.
 Hardware-software partitioning.
 Interface synthesis.
 RTL synthesis.
 Logic synthesis and technology mapping.

163
Partitioning and Placement
Topology-aware partitioning methods.
 Partitioning onto a multi-FPGA system is
equivalent to a placement problem.
 Logic utilization and timing.

164
Pin Assignment and Routing
• Pin assignment - the process of determining which I/O pins are used for each inter-FPGA signal.
• Pin assignment for a pre-fabricated multi-FPGA system is equivalent to the global routing problem.
• Pin assignment greatly affects an FPGA’s logic utilization and routability.

165
Run-Time Reconfigurability
• Virtual hardware <=> virtual memory.
• Hardware on demand.
• Unconfigured and reconfiguring methods.
• Software supporting time-varying mapping.
• Many open problems need to be solved in the forthcoming years.

166
Applications: Splash 2
Stream oriented systolic and SIMD
applications.
 Scalable linear array of 16 to 256 processing
elements (1 XC4010 with 1/2 Mbyte).
 VHDL based.
 Sequence comparison - 2300M:0.75M cell
updates/sec (Splash 2:Sparc 10).
 Edge detection - 10M:242K pixels/sec
(Splash 2:Sparc 10).
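The throughput pairs quoted on this slide translate directly into speedup factors. A minimal Python sketch of the arithmetic; the function name `speedup` is illustrative:

```python
def speedup(fast_rate: float, slow_rate: float) -> float:
    """Ratio of two throughputs measured in the same unit."""
    return fast_rate / slow_rate

# Splash 2 versus Sparc 10, using the rates quoted on the slide.
print(f"sequence comparison: {speedup(2300e6, 0.75e6):.0f}x")
print(f"edge detection: {speedup(10e6, 242e3):.0f}x")
```

The two workloads differ by almost two orders of magnitude in speedup, a reminder that the gain from reconfigurable hardware depends heavily on how well the application maps onto a systolic array.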

167
Applications: PAM (DEC)
Programmable Active Memory (PAM).
 C++ based and mesh arrays of XC3090
(DECPeRLe-1).
 Applications:
- Multiple precision arithmetic.
- RSA encryption.
- Video compression (JPEG, MPEG, DCT).
- High energy physics.
- Telecommunications.

168