Kris Gaj Research and teaching interests: • cryptography • computer arithmetic • VLSI design and testing Contact: Science & Technology II, room 223 kgaj@gmu.edu (703) 993-1575 Office hours: Monday, 6:00-7:00 PM Tuesday, Thursday, 7:30-8:30 PM, and by appointment ECE 645 Part of: MS in CpE Digital Systems Design – pre-approved course Other concentration areas – elective course MS in EE Certificate in VLSI Design/Manufacturing PhD in ECE PhD in IT Spring 2007 Enrollment as of January 22, 2006 PhD in CS 1 Non-Degree 4 MS in EE 4 MS in CpE 7 My general area of interest is… I want to specialize primarily in… CAD tools & Design Automation VLSI Hardware description languages Recommended degree & concentration MS CpE Digital Systems Design Digital Systems Design FPGAs & Reconfigurable computing ASICs & FPGAs Computer arithmetic VHDL/Verilog CAD Tools Front-end ASIC design (algorithmic downto gate level) Reconfigurable Computing Back-end ASIC design (transistor downto device level) Microelectronics Analog & Mixed Circuit Design VLSI Fabrication VLSI Fabrication Nanoelectronics Micro- and Nanoelectronics Semiconductor Devices MS EE Microelectronics Introduction to VHDL Computer Arithmetic VLSI Design VLSI Test Automation Concepts algorithmic register-transfer gate transistor layout devices ECE 645 ECE 545 CpE core ECE 586 Digital Integrated Circuits ECE 699 Mixed Signals VLSI ECE 584 EE core Semiconductor Device Fundamentals ECE684 ECE 681 ECE 682 ECE 587 ECE 745 Analog Integrated Circuits ECE 699 MOS Device ULSI NanoElectronics Microelectronics electronics MS CpE: DIGITAL SYSTEMS DESIGN Concentration advisors: Kris Gaj, Ken Hintz, David Hwang 1. ECE 545 Introduction to VHDL – K. Gaj, D. Hwang, project, VHDL, Aldec/Synplicity/Xilinx and Synopsys Design Analyzer/PrimeTime 2. ECE 645 Computer Arithmetic: HW and SW Implementation – K. Gaj, project, VHDL, Aldec/Synplicity/Xilinx and Synopsys Design Analyzer/PrimeTime 3. ECE 681 VLSI Design Automation – T. Storey, project/lab, back-end design with Synopsys tools 4. ECE 586 Digital Integrated Circuits – D. Ioannou Prerequisites ECE 545 Introduction to VHDL or Permission of the instructor, granted assuming that you know VHDL or Verilog, High level programming language (preferably C) Course web page ECE web page Courses Course web pages ECE 645 http://teal.gmu.edu/courses/ECE645/index.htm Computer Arithmetic Lecture Homework 10 % Midterm exam 1 (in class) 20 % Midterm exam 2 (take-home) 20 % Project Project 1 20 % Project 2 30 % Advanced digital circuit design course covering Efficient • addition and subtraction • multiplication • division and modular reduction • exponentiation Integers unsigned and signed Real numbers Elements of the Galois field GF(2n) • fixed point • single and double precision floating point • polynomial base Lecture topics (1) INTRODUCTION 1. Applications of computer arithmetic algorithms 2. Number representation • Unsigned Integers • Signed Integers • Fixed-point real numbers • Floating-point real numbers • Elements of the Galois Field GF(2n) ADDITION AND SUBTRACTION 1. Basic addition, subtraction, and counting 2. Carry-lookahead, carry-select, and hybrid adders 3. Adders based on Parallel Prefix Networks MULTIOPERAND ADDITION 1. Carry-save adders 2. Wallace and Dadda Trees 3. Adding multiple signed numbers MULTIPLICATION 1. Tree and array multipliers 2. Sequential multipliers 3. Multiplication of signed numbers and squaring DIVISION 1. Basic restoring and non-restoring sequential dividers 2. SRT and high-radix dividers 3. Array dividers LONG INTEGER ARITHMETIC 1. Modular Exponentiation 2. Multi-Precision Arithmetic in Software FLOATING POINT AND GALOIS FIELD ARITHMETIC 1. Floating-point units 2. Galois Field GF(2n) units Similar courses at other universities • University of California, Santa Barbara, Behrooz Parhami, ECE252B: Computer Arithmetic. • University of Massachusetts, Amherst, Israel Koren, ECE666: Digital Computer Arithmetic • Lehigh University, Michael Schulte, ECE496: High-Speed Computer Arithmetic. • Worcester Polytechnic Institute, Berk Sunar, EE-579 V Computer Arithmetic Circuits. • Stanford University, Michael Flynn, EE486: Advanced Computer Arithmetic. • University of California, Davies, Vojin Oklobdzija, ECE278: Computer Arithmetic for Digital Implementation. New in this course • real-life project based on VHDL or Verilog HDL • operations in the Galois Field (with application in cryptography and communications) Possible topics for a Scholarly Paper or Research Project for the CpE & EE students Advanced Computer Arithmetic Square root Exponential and logarithmic functions Trigonometric functions Hyperbolic functions Fault-Tolerant Arithmetic Low-Power Arithmetic High-Throughput Arithmetic Literature (1) Required textbook: Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000. Recommended textbooks: Milos D. Ercegovac and Tomas Lang Digital Arithmetic, Morgan Kaufmann Publishers, 2004. Isreal Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002. Literature (2) VHDL books (used in ECE 545 in Fall 2005) 1. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998. 2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004. Literature (3) Supplementary books: 1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990. 2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptology, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998. 3. Christof Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, VDI Verlag, 1994. Literature (3) Proceedings of conferences ARITH - International Symposium on Computer Arithmetic ASIL - Asilomar Conference on Signals, Systems, and Computers ICCD - International Conference on Computer Design CHES - Workshop on Cryptographic Hardware and Embedded Systems Journals and periodicals IEEE Transactions on Computers, in particular special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05. IEEE Transactions on Circuits and Systems IEEE Transactions on Very Large Scale Integration IEE Proceedings: Computer and Digital Techniques Journal of VLSI Signal Processing Homework • reading assignments (main textbook + articles) • analysis of hardware and software algorithms and implementations • design of small hardware units using VHDL or Verilog Optional assignments Possibility of trading analysis vs. design vs. coding Midterm exams Exam 1 - 2 hrs 30 minutes, in class multiple choice + short problems Exam 2 – 48 hrs, take-home conceptual questions, analysis and design of arithmetic units using VHDL or Verilog HDL Practice exams on the web Tentative days of exams: Exam 1 Exam 2 - Monday, March 26 - Saturday-Sunday, May 5-6 Project (1) Project I (20% of grade) Design and comparative analysis of fast adders (several hundred bits long) Optimization criteria: • minimum latency • maximum throughput • minimum area • minimum product latency · area • maximum ratio throughput/area • scalability Similar for all students Done individually Final report due Monday, March 19 Project (2) Project II (30% of grade) Long unsigned or signed integers or Fast • • • • • multiplication squaring division modular reduction, or modular exponentiation Floating-point numbers Fast • addition or • multiplication Project II (rules) • Real life application • Requirements derived from the analysis of the application • Typically both hardware and software design • Several project topics proposed on the web • You can choose project topic by yourself • Can be done in a group of 1-3 students Written report & oral presentation Monday, May 14 Project II (rules) • Every team works on a slightly different problem • Project topics should be more complex for larger teams • Cooperation (but not exchange of codes) between teams is encouraged Project Hardware Software VHDL (or Verilog) code High level language (C preferred) Latency and/or throughput Execution time Area Memory requirements Scalability Scalability Degrees of freedom and possible trade-offs speed area ECE 645 power ECE 586, 681 testability ECE 682 Degrees of freedom and possible trade-offs speed latency area throughput Timing parameters definition units delay time pointpoint ns latency time inputoutput ns throughput #output bits/time unit clock period clock frequency Mbits/s pipelining bad good rising edge rising edge of clock ns good 1 clock period MHz good Project technologies semi-custom Application Specific Integrated Circuits and Field Programmable Gate Arrays Levels of design description Algorithmic level Register Transfer Level Logic (gate) level Circuit (transistor) level Physical (layout) level Level of description most suitable for synthesis Register Transfer Logic (RTL) Design Description Registers Combinational Logic Clock Combinational Logic … CAD software available at GMU (1) VHDL simulators • Aldec Active-HDL (under Windows) • available in the FPGA Lab, S&T II, room 203 • student edition can be purchased on an individual basis ($59.95 + S&H) • ModelSim Xilinx Edition III (under Windows) • available in the FPGA Lab, S&T II, room 203 • limited version available for free for individual use at home as a part of Xilinx WebPACK CAD software available at GMU (2) Tools used for logic synthesis FPGA synthesis • Synplicity Synplify Pro (under Windows) • Xilinx XST (under Windows) • available in the FPGA Lab, S&T II, room 203 • available for free as a part of WebPACK ASIC synthesis • Synopsys Design Compiler and PrimeTime (under Unix) • available from all PCs in the ECE educational labs using an X-terminal emulator • available remotely from home using a fast Internet connection CAD software available at GMU (3) Tools used for implementation (mapping, placing & routing) in the FPGA technology • Xilinx ISE (under Windows) • available in the FPGA Lab, S&T II, room 203 • Xilinx WebPACK (under Windows) • limited version available for free for individual use at home as a part of Xilinx WebPACK How to learn VHDL for synthesis by yourself? • Lecture slides for ECE 545 from Fall 2005 • Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998. • Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004. • Individual or small-group hands-on sessions with the TA • Practice, Practice, Practice!!! Testbench Non-synthesizable testbench Synthesizable design entity Architecture 1 Architecture 2 .... Architecture N Hardware Design Verification Testbench HDL Design (VHDL or Verilog) actual results = ? Representative Inputs Reference Model (C or MAGMA ) expected results Primary applications (1) Execution units of general purpose microprocessors Integer units Floating point units Integers (8, 16, 32, 64 bits) Real numbers (32, 64 bits) Primary applications (2) Digital signal and digital image processing e.g., digital filters Discrete Fourier Transform Discrete Hilbert Transform General purpose DSP processors Specialized circuits Real or complex numbers (fixed-point or floating point) Primary applications (3) Coding Error detection codes Error correcting codes Elements of the Galois fields GF(2n) (4-64 bits) Secret-key (Symmetric) Cryptosystems key of Alice and Bob - KAB key of Alice and Bob - KAB Network Encryption Alice Decryption Bob Primary applications (4) Cryptography Secret key cryptography IDEA, RC6, Mars Twofish, Rijndael Integers (16, 32 bits) Elements of the Galois field GF(2n) (4, 8 bits) Main operations RC6 2 x SQR32, 2 x ROL32 MARS MUL32, 2 x ROL32, S-box 9x32 Twofish 96 S-box 4x4, 24 MUL GF(28) Auxiliary operations XOR, ADD/SUB32 XOR, ADD/SUB32 XOR ADD32 Rijndael 16 S-box 8x8 24 MUL GF(28) XOR Serpent 8 x 32 S-box 4x4 XOR Public Key (Asymmetric) Cryptosystems Private key of Bob - kB Public key of Bob - KB Network Encryption Alice Decryption Bob RSA as a trap-door one-way function PUBLIC KEY M C = f(M) = Me mod N C M = f-1(C) = Cd mod N PRIVATE KEY N=PQ P, Q - large prime numbers e d 1 mod ((P-1)(Q-1)) RSA keys PUBLIC KEY PRIVATE KEY { e, N } { d, P, Q } N=PQ P, Q - large prime numbers e d 1 mod ((P-1)(Q-1)) Primary applications (5) Cryptography Public key cryptography RSA, DSA, Diffie-Hellman Long integers (1000-16,000 bits) Elliptic Curve Cryptosystems Elements of the Galois field GF(2n) (150-500 bits) Primary applications (5) Cipher Breaking Public key cryptography RSA PUBLIC KEY RSA PRIVATE KEY { e, N } { d, P, Q } N=PQ P, Q e d 1 mod ((P-1)(Q-1)) Estimation of RSA Security Inc. regarding the number and memory of PCs necessary to break RSA-1024 Attack time: 1 year Single machine: PC, 500 MHz, 170 GB RAM Number of machines: 342,000,000 Factoring 1024-bit RSA keys using Number Field Sieve (NFS) Polynomial Selection Relation Collection Sieving 200 bit & 350 bit numbers Linear Algebra Square Root Norm Factoring/Cofactoring ECM p-1 method Pollard rho Sashisu Bajracharya Ramakrishna Bachimanchi Comparison among technologies Microprocessors SRC FPGAs COPACOBANA ASICs FPGAs vs. Microprocessors Spartan3s5000 Virtex2v6000 Pentium4 2.8GHz 10.8x 11.3x 857 869 7.9x 8.4x 635 637 7.8x 315 80 rho 10.8x 435 76 p-1 40 ECM Rho in an ASIC 130 nm Global Memory Local Memory ASIC 130 nm vs. Virtex II 6000 – rho (24 units) 19.68 mm 19.80 mm 51x Area of Virtex II 6000 (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004) 2.7 mm 2.82 mm Area of an ASIC with equivalent functionality Number of computations per second using the same chip area 130 nm ASIC library 101x Virtex2v6000 FPGA 88,405 50x 21,739 869 rho 435 ECM Cofactorization Unit interesting Computer Arithmetic project Famous computer arithmetic bugs and flaws Learn to deal with approximations • In digital arithmetic one has to come to grips with approximation and questions like: – When is approximation good enough – What margin of error is acceptable • Be aware of the applications you are designing the arithmetic circuit or program for • Analyze the implications of your approximation Calculators u= ..... 2 = 1.000 677 131 v = 21/1024 = 1.000 677 131 10 times x = (((u2)2)…)2 = 1.999 999 963 y = (((v2)2)…)2 = 1.999 999 983 10 times 10 times x’ = u1024 = 1.999 999 973 y’ = v1024 = 1.999 999 994 Hidden digits in the internal representation of numbers Different algorithms give slightly different results Very good accuracy Consequences of bad approximations Example: Failure of Patriot Missile (1991 Feb. 25) Source http://www.math.psu.edu/dna/455.f96/disasters.html American Patriot Missile battery in Dharan, Saudi Arabia, failed to intercept incoming Iraqi Scud missile The Scud struck an American Army barracks, killing 28 Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot) Specifics of the problem: time in tenths of second as measured by the system’s internal clock was multiplied by 1/10 to get the time in seconds. Internal registers were 24 bits wide 1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b) Error @ 0.1100 1100 2–23 @ 9.5 10–8 Error in 100-hr operation period @ 9.5 10–8 100 60 60 10 = 0.34 s Distance traveled by Scud = (0.34 s) (1676 m/s) @ 570 m This put the Scud outside the Patriot’s “range gate” Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out Consequences of bad approximations Example: Explosion of Ariane Rocket (1996 June 4) Source http://www.math.psu.edu/dna/455.f96/disasters.html Unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m) The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion Cause: “software error in the inertial reference system” Specifics of the problem: a 64 bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16 bit signed integer An SRI* software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767) Pentium bug (1) October 1994 Thomas Nicely, Lynchburg Collage, Virginia finds an error in his computer calculations, and traces it back to the Pentium processor November 7, 1994 First press announcement, Electronic Engineering Times Late 1994 Tim Coe, Vitesse Semiconductor presents an example with the worst-case error c = 4 195 835/3 145 727 Pentium = 1.333 739 06... Correct result = 1.333 820 44... Pentium bug (2) Intel admits “subtle flaw” November 30, 1994 Intel’s white paper about the bug and its possible consequences Intel - average spreadsheet user affected once in 27,000 years IBM - average spreadsheet user affected once every 24 days Replacements based on customer needs December 20, 1994 Announcement of no-question-asked replacements Pentium bug (3) Error traced back to the look-up table used by the radix-4 SRT division algorithm 2048 cells, 1066 non-zero values {-2, -1, 1, 2} 5 non-zero values not downloaded correctly to the lookup table due to an error in the C script

Descargar
# No Slide Title