Kris Gaj
Research and teaching interests:
• cryptography
• computer arithmetic
• VLSI design and testing
Contact:
Science & Technology II, room 223
[email protected]
(703) 993-1575
Office hours: Monday, 6:00-7:00 PM
Tuesday, Thursday, 7:30-8:30 PM,
and by appointment
ECE 645
Part of:
MS in CpE
Digital Systems Design – pre-approved course
Other concentration areas – elective course
MS in EE
Certificate in VLSI Design/Manufacturing
PhD in ECE
PhD in IT
Spring 2007 Enrollment as of January 22, 2006
PhD in CS
1
Non-Degree
4
MS in EE
4
MS in CpE
7
My general area
of interest is…
I want to specialize
primarily in…
CAD tools & Design Automation
VLSI
Hardware description languages
Recommended
degree &
concentration
MS CpE
Digital Systems Design
Digital Systems Design FPGAs & Reconfigurable computing
ASICs & FPGAs
Computer arithmetic
VHDL/Verilog
CAD Tools
Front-end ASIC
design (algorithmic downto
gate level)
Reconfigurable
Computing
Back-end ASIC
design (transistor downto device
level)
Microelectronics
Analog & Mixed Circuit Design
VLSI Fabrication
VLSI Fabrication
Nanoelectronics
Micro- and Nanoelectronics
Semiconductor Devices
MS EE
Microelectronics
Introduction
to VHDL
Computer
Arithmetic
VLSI Design VLSI Test
Automation Concepts
algorithmic
register-transfer
gate
transistor
layout
devices
ECE
645
ECE
545
CpE
core
ECE
586
Digital
Integrated
Circuits
ECE
699
Mixed
Signals VLSI
ECE 584
EE
core
Semiconductor
Device Fundamentals
ECE684
ECE
681
ECE
682
ECE
587
ECE
745
Analog
Integrated
Circuits
ECE 699
MOS Device
ULSI
NanoElectronics Microelectronics electronics
MS CpE: DIGITAL SYSTEMS DESIGN
Concentration advisors: Kris Gaj, Ken Hintz, David Hwang
1. ECE 545 Introduction to VHDL
– K. Gaj, D. Hwang, project, VHDL, Aldec/Synplicity/Xilinx and
Synopsys Design Analyzer/PrimeTime
2. ECE 645 Computer Arithmetic: HW and SW Implementation
– K. Gaj, project, VHDL, Aldec/Synplicity/Xilinx and
Synopsys Design Analyzer/PrimeTime
3. ECE 681 VLSI Design Automation
– T. Storey, project/lab, back-end design with Synopsys tools
4. ECE 586 Digital Integrated Circuits
– D. Ioannou
Prerequisites
ECE 545 Introduction to VHDL
or
Permission of the instructor,
granted assuming that you know
VHDL or Verilog,
High level
programming
language
(preferably C)
Course web page
ECE web page  Courses  Course web pages
 ECE 645
http://teal.gmu.edu/courses/ECE645/index.htm
Computer Arithmetic
Lecture
Homework
10 %
Midterm exam 1 (in class)
20 %
Midterm exam 2 (take-home)
20 %
Project
Project 1
20 %
Project 2
30 %
Advanced digital circuit design course covering
Efficient
• addition and subtraction
• multiplication
• division and modular reduction
• exponentiation
Integers
unsigned and signed
Real numbers
Elements
of the Galois
field GF(2n)
• fixed point
• single and double precision
floating point
• polynomial base
Lecture topics (1)
INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation
• Unsigned Integers
• Signed Integers
• Fixed-point real numbers
• Floating-point real numbers
• Elements of the Galois Field GF(2n)
ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on Parallel Prefix Networks
MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda Trees
3. Adding multiple signed numbers
MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring
DIVISION
1. Basic restoring and non-restoring
sequential dividers
2. SRT and high-radix dividers
3. Array dividers
LONG INTEGER ARITHMETIC
1. Modular Exponentiation
2. Multi-Precision Arithmetic in Software
FLOATING POINT
AND
GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois Field GF(2n) units
Similar courses at other universities
• University of California, Santa Barbara, Behrooz Parhami,
ECE252B: Computer Arithmetic.
• University of Massachusetts, Amherst, Israel Koren,
ECE666: Digital Computer Arithmetic
• Lehigh University, Michael Schulte,
ECE496: High-Speed Computer Arithmetic.
• Worcester Polytechnic Institute, Berk Sunar,
EE-579 V Computer Arithmetic Circuits.
• Stanford University, Michael Flynn,
EE486: Advanced Computer Arithmetic.
• University of California, Davies, Vojin Oklobdzija,
ECE278: Computer Arithmetic for Digital Implementation.
New in this course
• real-life project based on VHDL or Verilog HDL
• operations in the Galois Field
(with application in cryptography
and communications)
Possible topics for a
Scholarly Paper or Research Project
for the CpE & EE students
Advanced Computer Arithmetic
Square root
Exponential and logarithmic functions
Trigonometric functions
Hyperbolic functions
Fault-Tolerant Arithmetic
Low-Power Arithmetic
High-Throughput Arithmetic
Literature (1)
Required textbook:
Behrooz Parhami,
Computer Arithmetic: Algorithms and Hardware Design,
Oxford University Press, 2000.
Recommended textbooks:
Milos D. Ercegovac and Tomas Lang
Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
Isreal Koren, Computer Arithmetic Algorithms, 2nd edition,
A. K. Peters, Natick, MA, 2002.
Literature (2)
VHDL books
(used in ECE 545 in Fall 2005)
1. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right,
S & G Publishing, 1998.
2. Volnei A. Pedroni, Circuit Design with VHDL,
The MIT Press, 2004.
Literature (3)
Supplementary books:
1. E. E. Swartzlander, Jr., Computer Arithmetic,
vols. I and II, IEEE Computer Society Press, 1990.
2. Alfred J. Menezes, Paul C. van Oorschot,
and Scott A. Vanstone,
Handbook of Applied Cryptology,
Chapter 14, Efficient Implementation,
CRC Press, Inc., 1998.
3. Christof Paar, Efficient VLSI Architectures for Bit
Parallel Computation in Galois Fields,
VDI Verlag, 1994.
Literature (3)
Proceedings of conferences
ARITH - International Symposium on Computer Arithmetic
ASIL - Asilomar Conference on Signals, Systems, and Computers
ICCD - International Conference on Computer Design
CHES - Workshop on Cryptographic Hardware and
Embedded Systems
Journals and periodicals
IEEE Transactions on Computers,
in particular special issues on computer arithmetic:
8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05.
IEEE Transactions on Circuits and Systems
IEEE Transactions on Very Large Scale Integration
IEE Proceedings: Computer and Digital Techniques
Journal of VLSI Signal Processing
Homework
• reading assignments (main textbook + articles)
• analysis of hardware and software algorithms
and implementations
• design of small hardware units using VHDL or Verilog
Optional assignments
Possibility of trading
analysis vs. design vs. coding
Midterm exams
Exam 1 - 2 hrs 30 minutes, in class
multiple choice + short problems
Exam 2 – 48 hrs, take-home
conceptual questions,
analysis and design of arithmetic units
using VHDL or Verilog HDL
Practice exams on the web
Tentative days of exams:
Exam 1
Exam 2
- Monday, March 26
- Saturday-Sunday, May 5-6
Project (1)
Project I (20% of grade)
Design and comparative analysis of fast adders
(several hundred bits long)
Optimization criteria:
• minimum latency
• maximum throughput
• minimum area
• minimum product latency · area
• maximum ratio throughput/area
• scalability
Similar for all students
Done individually
Final report due
Monday, March 19
Project (2)
Project II (30% of grade)
Long unsigned or signed integers
or
Fast
•
•
•
•
•
multiplication
squaring
division
modular reduction, or
modular exponentiation
Floating-point numbers
Fast
• addition or
• multiplication
Project II (rules)
• Real life application
• Requirements derived from the analysis of the application
• Typically both hardware and software design
• Several project topics proposed on the web
• You can choose project topic by yourself
• Can be done in a group of 1-3 students
Written report & oral presentation
Monday, May 14
Project II (rules)
• Every team works on a slightly different problem
• Project topics should be more complex for larger teams
• Cooperation (but not exchange of codes)
between teams is encouraged
Project
Hardware
Software
VHDL (or Verilog) code
High level language
(C preferred)
Latency and/or throughput
Execution time
Area
Memory requirements
Scalability
Scalability
Degrees of freedom and possible trade-offs
speed
area
ECE 645
power
ECE 586, 681
testability
ECE 682
Degrees of freedom and possible trade-offs
speed
latency
area
throughput
Timing parameters
definition
units
delay
time pointpoint
ns
latency
time inputoutput
ns
throughput
#output bits/time unit
clock period
clock frequency
Mbits/s
pipelining
bad
good
rising edge rising edge
of clock
ns
good
1
clock period
MHz
good
Project technologies
semi-custom Application Specific Integrated Circuits
and
Field Programmable Gate Arrays
Levels of design description
Algorithmic level
Register Transfer Level
Logic (gate) level
Circuit (transistor) level
Physical (layout) level
Level of description
most suitable for synthesis
Register Transfer Logic (RTL)
Design Description
Registers
Combinational
Logic
Clock
Combinational
Logic
…
CAD software available at GMU (1)
VHDL simulators
• Aldec Active-HDL (under Windows)
• available in the FPGA Lab, S&T II, room 203
• student edition can be purchased on an individual
basis ($59.95 + S&H)
• ModelSim Xilinx Edition III (under Windows)
• available in the FPGA Lab, S&T II, room 203
• limited version available for free for individual use
at home as a part of Xilinx WebPACK
CAD software available at GMU (2)
Tools used for logic synthesis
FPGA synthesis
• Synplicity Synplify Pro (under Windows)
• Xilinx XST (under Windows)
• available in the FPGA Lab, S&T II, room 203
• available for free as a part of WebPACK
ASIC synthesis
• Synopsys Design Compiler and PrimeTime (under Unix)
• available from all PCs in the ECE educational labs
using an X-terminal emulator
• available remotely from home using a fast Internet
connection
CAD software available at GMU (3)
Tools used for implementation (mapping, placing
& routing) in the FPGA technology
• Xilinx ISE (under Windows)
• available in the FPGA Lab, S&T II, room 203
• Xilinx WebPACK (under Windows)
• limited version available for free for individual use
at home as a part of Xilinx WebPACK
How to learn VHDL for synthesis
by yourself?
• Lecture slides for ECE 545 from Fall 2005
• Sundar Rajan, Essential VHDL: RTL Synthesis Done Right,
S & G Publishing, 1998.
• Volnei A. Pedroni, Circuit Design with VHDL,
The MIT Press, 2004.
• Individual or small-group hands-on sessions with the TA
• Practice, Practice, Practice!!!
Testbench
Non-synthesizable
testbench
Synthesizable
design entity
Architecture 1
Architecture 2
....
Architecture N
Hardware Design Verification
Testbench
HDL Design
(VHDL or Verilog)
actual results
= ?
Representative
Inputs
Reference Model
(C or MAGMA )
expected results
Primary applications (1)
Execution units of general purpose microprocessors
Integer units
Floating point units
Integers
(8, 16, 32, 64 bits)
Real numbers
(32, 64 bits)
Primary applications (2)
Digital signal and digital image processing
e.g., digital filters
Discrete Fourier Transform
Discrete Hilbert Transform
General purpose
DSP processors
Specialized circuits
Real or complex numbers
(fixed-point or floating point)
Primary applications (3)
Coding
Error detection codes
Error correcting codes
Elements of
the Galois fields GF(2n)
(4-64 bits)
Secret-key (Symmetric) Cryptosystems
key of Alice and Bob - KAB
key of Alice and Bob - KAB
Network
Encryption
Alice
Decryption
Bob
Primary applications (4)
Cryptography
Secret key cryptography
IDEA, RC6, Mars
Twofish, Rijndael
Integers
(16, 32 bits)
Elements of
the Galois field GF(2n)
(4, 8 bits)
Main
operations
RC6
2 x SQR32,
2 x ROL32
MARS
MUL32,
2 x ROL32,
S-box 9x32
Twofish
96 S-box 4x4,
24 MUL GF(28)
Auxiliary
operations
XOR,
ADD/SUB32
XOR,
ADD/SUB32
XOR
ADD32
Rijndael
16 S-box 8x8
24 MUL GF(28)
XOR
Serpent
8 x 32
S-box 4x4
XOR
Public Key (Asymmetric) Cryptosystems
Private key of Bob - kB
Public key of Bob - KB
Network
Encryption
Alice
Decryption
Bob
RSA as a trap-door one-way function
PUBLIC KEY
M
C = f(M) = Me mod N
C
M = f-1(C) = Cd mod N
PRIVATE KEY
N=PQ
P, Q - large prime numbers
e  d  1 mod ((P-1)(Q-1))
RSA keys
PUBLIC KEY
PRIVATE KEY
{ e, N }
{ d, P, Q }
N=PQ
P, Q - large prime numbers
e  d  1 mod ((P-1)(Q-1))
Primary applications (5)
Cryptography
Public key cryptography
RSA, DSA,
Diffie-Hellman
Long integers
(1000-16,000 bits)
Elliptic Curve Cryptosystems
Elements of
the Galois field GF(2n)
(150-500 bits)
Primary applications (5)
Cipher Breaking
Public key cryptography
RSA PUBLIC KEY
RSA PRIVATE KEY
{ e, N }
{ d, P, Q }
N=PQ
P, Q
e  d  1 mod ((P-1)(Q-1))
Estimation of RSA Security Inc. regarding
the number and memory of PCs
necessary to break RSA-1024
Attack time:
1 year
Single machine:
PC, 500 MHz, 170 GB RAM
Number of machines: 342,000,000
Factoring 1024-bit RSA keys
using Number Field Sieve (NFS)
Polynomial Selection
Relation Collection
Sieving
200 bit & 350 bit
numbers
Linear Algebra
Square Root
Norm Factoring/Cofactoring
ECM
p-1 method
Pollard rho
Sashisu
Bajracharya
Ramakrishna
Bachimanchi
Comparison among technologies
Microprocessors
SRC
FPGAs
COPACOBANA
ASICs
FPGAs vs. Microprocessors
Spartan3s5000
Virtex2v6000
Pentium4 2.8GHz
10.8x
11.3x
857
869
7.9x
8.4x
635
637
7.8x
315
80
rho
10.8x
435
76
p-1
40
ECM
Rho in an ASIC 130 nm
Global Memory
Local
Memory
ASIC 130 nm vs. Virtex II 6000 – rho (24 units)
19.68 mm
19.80 mm
51x
Area of Virtex II 6000
(estimation by R.J. Lim Fong,
MS Thesis, VPI, 2004)
2.7 mm
2.82 mm
Area of an ASIC with equivalent functionality
Number of computations per second using the same
chip area
130 nm ASIC library
101x
Virtex2v6000 FPGA
88,405
50x
21,739
869
rho
435
ECM
Cofactorization Unit
interesting
Computer
Arithmetic
project
Famous computer arithmetic
bugs and flaws
Learn to deal with approximations
• In digital arithmetic one has to come to grips
with approximation and questions like:
– When is approximation good enough
– What margin of error is acceptable
• Be aware of the applications you are designing
the arithmetic circuit or program for
• Analyze the implications of your
approximation
Calculators
u=
..... 2 = 1.000 677 131
v = 21/1024 = 1.000 677 131
10 times
x = (((u2)2)…)2 = 1.999 999 963
y = (((v2)2)…)2 = 1.999 999 983
10 times
10 times
x’ =
u1024
= 1.999 999 973
y’ =
v1024
= 1.999 999 994
Hidden digits in the internal representation of numbers
Different algorithms give slightly different results
Very good accuracy
Consequences of bad approximations
Example: Failure of Patriot Missile (1991 Feb. 25)
Source http://www.math.psu.edu/dna/455.f96/disasters.html
American Patriot Missile battery in Dharan, Saudi Arabia, failed to intercept
incoming Iraqi Scud missile The Scud struck an American Army barracks, killing 28
Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of
the time since boot)
Specifics of the problem: time in tenths of second as measured by the system’s
internal clock was multiplied by 1/10 to get the time in seconds. Internal registers
were 24 bits wide 1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
Error @ 0.1100 1100  2–23 @ 9.5  10–8
Error in 100-hr operation period
@ 9.5  10–8  100  60  60  10 = 0.34 s
Distance traveled by Scud = (0.34 s)  (1676 m/s) @ 570 m
This put the Scud outside the Patriot’s “range gate” Ironically, the fact that the bad
time calculation had been improved in some (but not all) code parts contributed to
the problem, since it meant that inaccuracies did not cancel out
Consequences of bad approximations
Example: Explosion of Ariane Rocket (1996 June 4)
Source http://www.math.psu.edu/dna/455.f96/disasters.html
Unmanned Ariane 5 rocket launched by the European Space Agency veered off its
flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its 1st voyage after a decade of
development costing $7 billion
Cause: “software error in the inertial reference system”
Specifics of the problem: a 64 bit floating point number relating to the horizontal
velocity of the rocket was being converted to a 16 bit signed integer
An SRI* software exception arose during conversion because the 64-bit floating
point number had a value greater than what could be represented by a 16-bit signed
integer (max 32 767)
Pentium bug (1)
October 1994
Thomas Nicely, Lynchburg Collage, Virginia
finds an error in his computer calculations, and traces
it back to the Pentium processor
November 7, 1994
First press announcement, Electronic Engineering Times
Late 1994
Tim Coe, Vitesse Semiconductor
presents an example with the worst-case error
c = 4 195 835/3 145 727
Pentium
= 1.333 739 06...
Correct result = 1.333 820 44...
Pentium bug (2)
Intel admits “subtle flaw”
November 30, 1994
Intel’s white paper about the bug and its possible consequences
Intel - average spreadsheet user affected once in 27,000 years
IBM - average spreadsheet user affected once every 24 days
Replacements based on customer needs
December 20, 1994
Announcement of no-question-asked replacements
Pentium bug (3)
Error traced back to the look-up table used by
the radix-4 SRT division algorithm
2048 cells, 1066 non-zero values {-2, -1, 1, 2}
5 non-zero values not downloaded correctly
to the lookup table due to an error in the C script
Descargar

No Slide Title