Automation Techniques for Fast Implementation of High Performance DSP Algorithms in FPGAs

NASA 2005 Military and Aerospace Programmable Logic Devices (MAPLD) International Conference
John Porcello, L-3 Communications, Inc.
Cleared by DOD/OFOISR for Public Release under 05-S-2094 on 24 August 2005
J. Porcello, MAPLD 2005 #167

Outline
• Background
• Automation Techniques
• DSP Algorithm Design
• HDL Coding and Synthesis
• Timing & Placement
• Hardware-In-The-Loop (HITL) Test and Verification
• Case Study: Direct Digital Synthesizer (DDS) using Xilinx Virtex-4 XtremeDSP
• Summary

Background
Field Programmable Gate Arrays (FPGAs) are the leading implementation path for Reprogrammable, High Performance Digital Signal Processing (DSP) Applications.
The performance advantage of FPGAs over Programmable DSPs is a driving factor for implementing DSP designs in an FPGA.
Implementing a DSP design in an FPGA using the VHDL and Verilog Hardware Description Languages (HDL) is often a lengthy development path.
FPGA development tools use HDL and non-HDL DSP Intellectual Property (IP) to reduce the design and implementation time.
This approach succeeds in shortening the design and implementation cycle and increasing productivity in many applications.
However, implementations using dedicated HDL still provide the greatest flexibility for High Performance DSP Algorithms. Why?

Three (3) Reasons to use a dedicated HDL Implementation Path for a High Performance DSP Application
1) Control: Available IP can’t achieve required performance and functionality.
2) Complexity: Increasing DSP Algorithm Complexity requires unique tailoring for the application.
3) Components: FPGA architectures are increasing the number of dedicated components other than FPGA fabric (embedded multipliers, hard microprocessors, dedicated transceivers, application specific devices, etc.). Low level control is required to get the most out of these components in a high performance design.

Major Advantages and Disadvantages using the HDL Implementation Path for High Performance DSP Applications
• Low Level Control and flexibility to achieve required or specific performance (+)
• Design, development and integration of various IP cores (+)
• Source level control of DSP design (+)
• Considerable design and implementation path relative to non-HDL implementation path (-)
• Extensive Debug, Test and Verification Path (-)
Can we reduce or eliminate any of these disadvantages to improve productivity? Yes.

The Objectives of Automation Techniques
Identify and apply methods useful for faster implementation of High Performance DSP Designs.
• Reduce Design and Implementation Time
• Perform Error Checking
• Develop greater insight into successful high performance DSP Implementations by automating techniques
Specific focus areas to achieve objectives:
• DSP Algorithm Design
• HDL Coding and Synthesis
• Timing & Placement
• Hardware-In-The-Loop (HITL) Test and Verification
If one of these processes cannot meet required performance, it is often necessary to back up and apply techniques to collect data to study the problem.

Automation Techniques - Not a new concept.
No single direct formula for applying them.
Automation Techniques are a function of DSP design and FPGA implementation processes.
Automation Techniques are a means to improve and refine these processes.
A look at the overall design through to implementation is required. Automation Techniques are then developed to improve processes.
Consider the following processes and goals:

Process                                         Goal
DSP Algorithm Design                            Produce a DSP Algorithm structured for an FPGA (function).
HDL Coding and Synthesis                        Synthesizable DSP functions and performance (implementation).
Timing & Placement                              DSP timing and interface performance (speed).
H/W-In-The-Loop (HITL) Test and Verification    DSP numerical and interface performance (accuracy, speed).

Automation Techniques can be applied to improve these processes.

Considerations for developing Automation Techniques
1) Technical: Automation Techniques are often required to go beyond the basics and increase technical capabilities:
• A substantial amount of data will be generated, tested or analyzed to quantify performance. This includes the DSP design (truth vectors) and FPGA testing (DUT).
• Develop greater insight into DSP Design and FPGA Implementation.
• Solve a specific problem where current processes are not effective.
• Improve DSP Design and FPGA Implementation processes in terms of efficiency and productivity.
2) Cost: Developing Automation Techniques easily provides a cost benefit when processing large amounts of data. Other techniques may require substantial Non-Recurring Engineering (NRE) to design, develop and implement. In these cases, Automation Techniques must provide substantial benefit to justify the NRE.
Substantial effort to develop Automation Techniques for High Performance DSP Algorithms can often be justified when there is significant near-term benefit (current project) or long-term benefit (marketing new DSP algorithms with increased functionality and/or improved performance).

DSP Algorithm Design
The DSP Algorithm has the greatest impact on the implementation and performance.
Best practice matches the DSP Algorithm to the FPGA Architecture. Knowledge of the target hardware architecture is important to reduce a DSP Algorithm to equivalent high performance functions within an FPGA.
The class of DSP Algorithm is significant (wide variation):
• Filter, FFT, Multiply and Accumulate (MAC), Up/Down Converters
• Carrier Recovery, Timing and Synchronization
• Direct Digital Synthesizers (DDS), Waveform Generators
• Systolic Arrays, Matrix Methods, Statistical DSP
• Beam Forming, Image Processing
• Wideband, High Speed Spectral Processing
Full parallel (unrolled, unfolded) implementations of iterative DSP Algorithms yield a significant increase in performance at the expense of FPGA resources.

DSP Algorithm Design - Systolic Array Design using the Xilinx Virtex-4 XtremeDSP Tile
Systolic Arrays are small, interconnected arrays of DSP Processing Elements (PEs).
They are very useful for many high performance DSP applications such as Digital Filters and Matrix Processing.
Systolic arrays are typically full parallel structures processing one data sample per clock. Used in many VLSI designs, they can be 1-Dimensional or Multidimensional.
Systolic arrays can be mapped from DSP equations consisting of iterative algorithms that can be “unrolled” (Filters, FFTs, etc.).
Latency is higher since data flows through each element.
However, structures of this type may be implemented using FPGA fabric and/or dedicated FPGA components over high speed interconnects.
[Figure: 1D Systolic Array. Input, Processing Elements (PE), Output]

Processing Element (PE) FPGA Embedded Component: the Xilinx Virtex-4 XtremeDSP Tile consists of two (2) DSP48 slices:
• Dedicated, pipelined MULT, Add/Subtract, ACC, MACC, Shift, Divide, Square Root, etc.
• High speed, dedicated interconnects between DSP48 slices and to other XtremeDSP tiles
• Dynamically configurable functions (via OPMODE)
• Highest performance achieved without FPGA fabric
Ref. Xilinx XtremeDSP Design Considerations User Guide, Courtesy of Xilinx, Inc.

1 Dimensional Systolic Array: FIR filter with constant coefficients, relatively easy to manage design and implementation.
1D Systolic Array FIR Filter:
$$ y[n] = \sum_{k=0}^{N} h[k]\, x[n-k] = \sum_{k=0}^{N} PE_k, \quad \text{with } PE_k = h[k]\, x[n-k] $$
[Figure: 1D Systolic Array, FIR Filter. Input, PE 0, PE 1, PE 2, ..., PE N, Output. Routing over dedicated, high speed interconnect]

2 Dimensional Systolic Array: Increasing capabilities in DSP applications at the expense of increasing algorithm complexity.
2D Systolic Array N-Point FFT:
$$ X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad \text{with } W_N^{kn} = e^{-j 2\pi k n / N} $$
Reduce to Even and Odd PEs: PE_N_Even and PE_N_Odd.
[Figure: 2D Systolic Array, FFT. Input, Output. Routing over FPGA fabric]
Apply DSP Algorithm Automation Techniques to manage complex DSP design, debugging, test and validation.

DSP Algorithm Design Automation Techniques
• DSP Design Validation, Quantifying Required Algorithm Performance and Limitations: Automating tools and simulations to perform extensive end-to-end test, data reduction and analysis, and algorithm validation. Automated techniques are useful in DSP designs where algorithm confidence level over a broad performance range requires a substantial baseline of test data. Techniques may utilize scripts or custom programs (MATLAB, C/C++, etc.) to verify algorithm numerical accuracy or maximum error, using simulated or actual test data. Methods used to validate a DSP algorithm are very important.
• Testing and Debugging DSP Modular Functions: Automating generation of “truth” data or vectors for test and analysis of synthesizable DSP functional building blocks.
• Algorithm Strength Reduction: Testing and evaluating alternate, equivalent DSP Algorithms and mathematically equivalent functions (symmetry, periodicity, transform reduction, etc.) that will have higher performance and/or consume fewer FPGA resources.

HDL Coding and Synthesis
HDL Coding style directly impacts FPGA Implementation.
Good Coding techniques use HDL Coding Styles that support Scalable and Modular DSP designs (use of generics, VHDL generate, etc.).
It is important to tailor HDL coding to get the most out of the Synthesis Tool.
Full Parallel implementations often require dividing up the DSP processing into small operations that can be performed during very short clock periods. This amounts to isolating functions or breaking up processing over several clock cycles, at increased latency (and additional FPGA resources), to maintain throughput.
Whenever possible, place DSP processing on dedicated DSP components and their high-speed interconnects, such as the XtremeDSP tile.

HDL Coding and Synthesis Automation Techniques
• Autocoding Functions: Autocoding routines can be used to automatically implement (or change) HDL code:
    • Custom DSP Functions that must be divided up across several clock cycles to operate at maximum speed
    • Clocking Techniques, Positive and Negative Edge HDL implementations
    • Built-In-Test (BIT) Vector Generators / Vector Receivers: support debug, test and verification up to the system level. Place multiple BIT blocks at full throughput. Useful for debugging, analysis and insight into successful High Performance DSP Designs. Can be combined with HITL testing for performance verification.
• HDL Converters: convert (interpret) code from another language to Synthesizable HDL. Effective converter tools may be implemented for porting algorithms to FPGA platforms.
• Synthesis Profiling: Batch processing multiple Synthesis runs to obtain insight into the synthesis of a design:
    • Establish desired variations in an HDL design for analysis.
    • Generate multiple versions or incrementally modify HDL parameters in the design via C/C++, script or equivalent code.
    • Batch process the Synthesis Tool with synthesis constraints and obtain the synthesis report.
    • Batch processing is done via script or command line; refer to the synthesis tool manual, such as the Xilinx Synthesis Technology (XST) User Guide for an XST design flow.
    • Extract desired performance parameters from the Synthesis Report via C/C++, script or equivalent code.
    • Repeat the process until sufficient information from multiple synthesis runs is collected.
    • Analyze the results of the multiple Synthesis runs. Profile the performance impact of parameters on the synthesis of the design.
Synthesis Profiling is useful for profiling the effect of DSP Design and HDL coding parameters on Synthesis, performing design tradeoffs, best-match analysis between DSP design and FPGA Implementation, and obtaining insight into successful High Performance DSP Designs.
Combine with Timing and Placement Profiling for analyzing the entire FPGA implementation flow. FPGA Implementation Tools are usually well suited for command line processing of the entire implementation flow (example: Xilinx XFLOW).

Timing and Placement
Timing and Placement constraints direct the FPGA implementation tools and control the maximum speed and placement of the design. These constraints directly impact many important performance criteria such as design margin, DSP throughput, pin placement, and data I/O.
Effective methods exist, such as the use of Relationally Placed Macros (RPMs), to create instances of specific DSP functions and direct their placement within the FPGA.
Timing Analysis reveals details of the speed of a given implementation and design margin against performance requirements.
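As a worked illustration of design margin against a performance requirement, the sketch below converts a required clock rate and the maximum clock rate achieved by each profiled run into a margin in nanoseconds. It is a minimal sketch, not the author's tooling: the function names, the run names and the clock figures are all hypothetical, and the 200 MHz requirement is chosen only for illustration.

```python
def timing_margin_ns(required_mhz, achieved_mhz):
    """Design margin in ns: required clock period minus achieved minimum period.

    A positive value means the implementation is faster than required.
    """
    return 1e3 / required_mhz - 1e3 / achieved_mhz


def summarize(runs, required_mhz):
    """Return (run name, margin in ns, meets timing?) for each profiled run."""
    rows = []
    for name, achieved_mhz in runs.items():
        margin = timing_margin_ns(required_mhz, achieved_mhz)
        rows.append((name, round(margin, 3), margin >= 0.0))
    return rows


# Hypothetical numbers for illustration only; these are not measured results.
example_runs = {"variant_a": 160.0, "variant_b": 200.0, "variant_c": 250.0}

for name, margin, ok in summarize(example_runs, required_mhz=200.0):
    print(f"{name}: margin {margin:+.3f} ns, meets timing: {ok}")
```

In a real flow the achieved clock rates would be extracted from the tool reports rather than typed in by hand.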
The Timing Analysis must be carefully interpreted to draw conclusions and identify where recoding and/or changes to synthesis, timing and placement constraints are necessary.
Timing Analysis also reveals which functions within the DSP algorithm are the issue and may not be achievable given fixed resources (FPGA type) and performance requirements. This indicates that a fundamental change in the DSP function or HDL coding is required.

Timing and Placement (cont.)
High Performance DSP designs often require considerable attention to the data I/O for signal processing, in addition to the internal functionality of the algorithm.
Successful High Performance DSP designs carefully match DSP functionality to high speed I/O lines. Interfacing the FPGA to other high performance components has to remain a consideration through design and implementation.
Timing and Placement will take a substantial amount of time for large DSP implementations. Most tools are capable of running at the command line, which supports batch processing.
Many timing and placement constraints are available for FPGA implementation. Careful interpretation and selection of timing and placement constraints is required.

Timing and Placement Automation Techniques
• Timing and Placement Profiling: Same as Synthesis Profiling with a few additional notes:
    • Establish desired variations to constraints and/or pin placement of the design.
    • Profile timing and placement constraints against a single synthesized design. Profiling a single set of constraints against multiple designs amounts to processing the entire flow for different designs.
    • Batch process the Translation, Mapping and Place & Route Tools with timing and placement constraints and obtain performance parameters.
    • Use C/C++, script or equivalent code to extract desired performance parameters from these reports.
    • Repeat the process until sufficient information from multiple runs is collected.
    • Analyze and interpret the results of multiple runs. Timing reports are available after the MAP and PAR processes.
    • Profile timing and placement constraints only when multiple runs will provide insight into performance, such as when combined with synthesis profiling over the entire implementation flow.
Using timing analysis tools is a better approach than timing and placement profiling for debugging a single implementation that does not meet timing.

Hardware-In-The-Loop (HITL) Test and Verification
HITL: Direct Input and/or Output through one or more interfaces with the FPGA:
• Analog-To-Digital Converter (ADC)
• Digital-To-Analog Converter (DAC)
• On Chip Debugger (Dedicated IP cores for data capture and transfer via JTAG, local bus, I/O pins)
• Logic Analyzer interface to pins
HITL is a real-time test configuration. HITL provides a significant advantage in terms of incremental design, test and verification:
• Real-Time “Divide-and-Conquer” Debugging and Test of modules and subsystems
• Inject and/or transmit real-time signals (interface testing)
• Event and anomaly capture
• Practical Performance Benchmarking, HITL used as a True Measure-Of-Performance

Hardware-In-The-Loop (HITL) Test and Verification Automation Techniques
HITL Automation Techniques are used to automate generating, collecting and processing large amounts of test data.
This supports design validation, on-chip debugging, test and verification.
• Test Equipment: Utilize COTS and/or custom automation software to control test instruments and inject input or store/analyze output. Supports interface and end-to-end testing.
• HITL Data Reduction and Analysis: Collection and batch processing of large amounts of HITL data.
• HITL Generated Performance Curves: Useful for quantifying actual performance data (Threshold Sensitivity, Frequency Stability, Error, etc.) and comparing it to theoretical performance for design insight.

Case Study - Direct Digital Synthesizer (DDS) using Xilinx Virtex-4 XtremeDSP
Objective: High Speed, High Resolution Multimode DDS for:
• Communications, Radar, Navigation, Tracking
• SIGINT, ELINT
• High Speed Spectral Processing
• Software Defined Radio (SDR)
• EW, ECM, Self-Protection Jamming
Performance (Algorithm & FPGA Only, Pre DAC):
Frequency Resolution      < 1 Hz
Frequency Tuning Speed    < 1 uSec
Spurious                  < -100 dBc
Harmonics                 < -100 dBc
Maximum Clock Speed       > 200 MHz

Case Study - DDS using Xilinx Virtex-4 XtremeDSP
[Figure: DDS Block Diagram. Phase Per CLK, Phase Accumulate, DDS Transform, (I) Inphase, (Q) Quadrature, FM Mod, PM Mod, AM Mod]
Automation Technique: DSP Algorithm Verification and Analysis Tools.
Automation Technique: HDL Coding and Synthesis. One time handcrafting required to meet performance.
Now that a solution is verified, a scalable Autocoding function will be developed to implement this solution in the next High Performance DSP design.
Note: Although Timing & Placement was important and required adjustment, no Automation Techniques were necessary to meet Performance Requirements.

Case Study - DDS using Xilinx Virtex-4 XtremeDSP
DDS Output Power Spectrum
Automation Technique: HITL Debugging, Testing, Performance Analysis and Verification.
[Figure: DDS Output Power Spectrum. Relative Power (dBc), 0 to -120, versus Frequency, 0 to 5 x 10^7. f_CLK = 100 MHz]

DDS Spectrogram
[Figure: DDS Spectrogram. Frequency, 0 to 5 x 10^7, versus Time. f_CLK = 100 MHz]
Automation Technique: DSP Analysis and HITL Performance Analysis provide insight into this design.
This class of DDS is capable of faster frequency tuning speed, higher frequency resolution, and clock speed greater than 300 MHz using Register Balancing and Double Data Rate (DDR) techniques.

Summary
Automation Techniques can be applied to improve DSP design and FPGA implementation processes.
Automation Techniques are a means to shorten development time, improve efficiency and manage substantial design, debugging, test and verification efforts.
There is no direct formula for applying them. Examine the DSP design techniques, FPGA implementation flow and tools used for a project.
Do not blindly apply automation techniques.
Look for processes where a benefit can be realized by applying Automation Techniques. Refer to the Summary of Automation Techniques matrix, or create new techniques to meet requirements.
Objectives of Automation Techniques:
• Reduce Design and Implementation Time
• Perform Error Checking
• Develop greater insight into successful high performance DSP Implementations by automating techniques
Specific focus areas to achieve objectives:
• DSP Algorithm Design
• HDL Coding and Synthesis
• Timing & Placement
• Hardware-In-The-Loop (HITL) Test and Verification
Automation Techniques may be required to go beyond basic DSP design and FPGA implementation.
Developing Automation Techniques easily provides a cost benefit when processing large amounts of data. Other Automation Techniques may require substantial NRE. For these cases, techniques must provide substantial benefit to the design and implementation process.
Designing effective Automation Techniques for High Performance DSP Implementations requires understanding of DSP Design and FPGA Implementation Tools.
Automation Techniques can be used to profile Synthesis, Timing and Placement of FPGA Implementations. Careful interpretation of this data is required.
Automation Techniques can be used for High Performance DSP Designs that require substantial amounts of data, test, analysis and verification.
Summary of Automation Techniques

DSP Algorithm Design
• DSP Design Validation
• “Truth” Vector Generator/Vector Receiver
• Data Reduction and Analysis
• Algorithm Strength Reduction

HDL Coding & Synthesis
• Autocoding
• Batch Synthesis Profiling & Analysis
• BIT Vector Generator/Vector Receiver
• HDL Converters

Timing & Placement
• Batch Timing and Placement Profiling & Analysis

HITL Test & Verification
• End-to-End Design Verification
• Test Equipment Configuration
• Vector Generator/Vector Receiver
• Data Reduction and Analysis
• On-Chip Debugging Individual DSP Functions
• Real-Time Interface Testing
• Generating Actual Performance Curves
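To make the “Truth” Vector Generator entry concrete for the DDS case study, the following is a minimal sketch of a reference model. It assumes a generic textbook phase-accumulator DDS, which is not necessarily the design used in the case study. The function names and the 16-bit output amplitude are hypothetical choices; the 200 MHz clock and 1 Hz resolution are taken from the case study performance targets. Frequency resolution of a phase accumulator with N bits is f_clk / 2^N, so the first function finds the smallest N that reaches a given resolution.

```python
import math


def accumulator_bits(f_clk_hz, f_res_hz):
    """Smallest accumulator width N such that f_clk / 2**N <= f_res."""
    return math.ceil(math.log2(f_clk_hz / f_res_hz))


def tuning_word(f_out_hz, f_clk_hz, n_bits):
    """Phase increment per clock for the requested output frequency."""
    return round(f_out_hz * (1 << n_bits) / f_clk_hz) % (1 << n_bits)


def dds_truth_vectors(f_out_hz, f_clk_hz, n_bits, n_samples, amp_bits=16):
    """Return a list of (phase, i, q) integer tuples, one per clock.

    amp_bits is an assumed signed output width, chosen for illustration.
    """
    fcw = tuning_word(f_out_hz, f_clk_hz, n_bits)
    mask = (1 << n_bits) - 1
    scale = (1 << (amp_bits - 1)) - 1
    phase = 0
    vectors = []
    for _ in range(n_samples):
        theta = 2.0 * math.pi * phase / (1 << n_bits)
        i_val = round(scale * math.cos(theta))  # (I) Inphase
        q_val = round(scale * math.sin(theta))  # (Q) Quadrature
        vectors.append((phase, i_val, q_val))
        phase = (phase + fcw) & mask  # phase accumulate, wraps modulo 2**N
    return vectors


n_bits = accumulator_bits(200e6, 1.0)
vectors = dds_truth_vectors(50e6, 200e6, n_bits, n_samples=4)
print(n_bits, 200e6 / (1 << n_bits))
for row in vectors:
    print(row)
```

Vectors produced this way could be written to a file and compared sample by sample against HDL simulation or HITL captures, with the comparison tolerance set by the quantization of the hardware transform.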
