Research on Key Issues in FPGA Design-EEWORLD

As the capacity, functionality and reliability of FPGAs (Field Programmable Gate Arrays) have improved, they have come into increasingly wide use in modern digital communication systems, and FPGA-based design has become one of the main methods for building digital circuit systems. In signal processing and system-level control, an FPGA can greatly reduce circuit size and improve circuit stability, and its advanced development tools can greatly shorten the design and debugging cycle of the whole system.
Drawing on the author's own design experience, this article identifies several difficult problems in FPGA design, analyzes their causes and offers solutions, to help FPGA designers avoid detours and master FPGA design techniques more quickly.
1 FPGA design process
FPGA design mostly follows a top-down flow, generally divided into seven steps: design specification, design entry, synthesis, functional simulation (pre-simulation), logic implementation, timing simulation (post-simulation), and configuration download.
2 Core issues of FPGA design
2.1 Clock design
In any digital circuit design, a reliable clock is critical. Clocks can generally be divided into global clocks, gated clocks and multi-level logic clocks.
2.1.1 Global Clock
A global clock, or synchronous clock, is the simplest and most reliable kind of clock.
The best clocking scheme in an FPGA design is a single master clock, driven from a dedicated global clock input pin, that clocks every sequential element in the design. Use a global clock whenever possible. FPGAs have dedicated global clock pins that connect directly to every register in the device, and this global clock provides the shortest clock-to-output delay in the device (the time from the clock edge to valid data at the output).
2.1.2 Gated Clock
In many applications it is not practical to drive the whole design from an external global clock. Instead, a clock generated by logic inside the array (an array clock) is often used, forming a gated clock. Gated clocks are common in microprocessor interfaces, and one is usually present whenever a flip-flop is clocked by a combinational function. A gated clock can work as reliably as a global clock if the following conditions are met:
(1) The logic driving the clock contains only a single AND gate or OR gate;
(2) One input of that gate is the actual clock, and all other inputs are address or control lines that meet setup and hold times relative to the clock.
Of course, a gated clock can also be converted to a global clock to improve the reliability of the design.
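As a rough illustration of converting a gated clock to a global clock, the Python sketch below compares a register clocked by (clk AND gate) with a register clocked directly by clk and qualified by an enable. It is a behavioral model under stated assumptions, not HDL: the function names, the half-period time step and the stimulus are all hypothetical, and the gate input changes only while the clock is low, as condition (2) requires.

```python
def run_gated_clock(clk, gate, d):
    """Register clocked by (clk AND gate); captures d on rising edges of the gated clock."""
    q, prev, trace = 0, 0, []
    for c, g, data in zip(clk, gate, d):
        gclk = c & g
        if prev == 0 and gclk == 1:   # rising edge of the gated clock
            q = data
        prev = gclk
        trace.append(q)
    return trace


def run_clock_enable(clk, enable, d):
    """Register clocked by the global clk; captures d only when enable is high."""
    q, prev, trace = 0, 0, []
    for c, en, data in zip(clk, enable, d):
        if prev == 0 and c == 1 and en:   # rising edge of clk, qualified by enable
            q = data
        prev = c
        trace.append(q)
    return trace


# Hypothetical stimulus: one list entry per half clock period.
clk  = [0, 1] * 6
gate = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]   # changes only while clk is low
d    = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

gated_trace = run_gated_clock(clk, gate, d)
enable_trace = run_clock_enable(clk, gate, d)
print(gated_trace == enable_trace)   # True for this stimulus
```

With a well-behaved gate signal the two registers hold the same values, but only the enable version keeps every flip-flop on the global clock network.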
2.1.3 Multi-clock system
Many applications require multiple clocks in the same FPGA, for example the interface between two asynchronous microprocessors, or between a microprocessor and an asynchronous communication channel. Because setup and hold times must be respected between signals in the two clock domains, additional timing constraints arise and certain asynchronous signals have to be synchronized. In many systems, synchronizing the asynchronous signals alone is not enough: when two or more non-coherent clocks are present, data setup and hold times are difficult to guarantee. The best solution is to synchronize all non-coherent clocks, and the phase-locked loop (PLL) module inside the FPGA is a good way to do so. Without a PLL, synchronization is relatively simple when the frequency ratio of the two clocks is an integer and much more complicated when it is not. In the latter case, a high-frequency clock must be introduced together with D flip-flops that have an enable input.
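One possible reading of the last point, sketched in Python: the slower clock is treated as data, sampled by the high-frequency clock through a chain of registers, and each rising edge is turned into a one-cycle enable pulse that can drive the enable input of D flip-flops in the fast domain. The function name, the three-register arrangement and the sample values are assumptions made for illustration; the article does not spell out the circuit.

```python
def enable_from_slow_clock(slow_samples):
    """slow_samples[i] is the slow-clock level seen at fast-clock edge i.

    Returns one enable value per fast-clock cycle. The enable is high for
    exactly one fast cycle after each rising edge of the slow clock.
    """
    s1 = s2 = s3 = 0            # three registers, all clocked by the fast clock
    enables = []
    for level in slow_samples:
        # All registers update together on the same fast-clock edge.
        s1, s2, s3 = level, s1, s2
        # Rising edge seen between the second and third register.
        enables.append(1 if (s2 == 1 and s3 == 0) else 0)
    return enables


# Hypothetical samples of a slow clock whose period is not a multiple
# of the fast-clock period.
samples = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(enable_from_slow_clock(samples))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Every flip-flop in this arrangement is clocked by the fast clock only, so the slow clock never drives a clock input directly.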
2.1.4 Clock skew
Clock skew is one of the most serious problems in FPGA design. The distance from the clock source to each of the components it synchronizes varies considerably, and clock skew is the difference in time at which those components see a valid clock edge. To guarantee the setup and hold times of every component, the skew must be small enough. If the skew exceeds the delay from the output of one edge-triggered storage element to the input of the next stage, a shift register can lose data and a synchronous counter can produce wrong outputs. Clock skew therefore has to be minimized. There are several ways to reduce it:
(1) Use an appropriate clock buffer, or add some delay between the output of an edge-triggered device and the input of any edge-triggered device it feeds.
(2) Severe clock skew is often caused by excessive load on the clock and other global control lines (such as reset lines) in the FPGA. Inserting a series of buffers on the signal line increases the drive strength step by step and reduces the skew.
(3) Connect buffers after the components controlled by the clock, and connect a balancing network between the outputs of the two buffers.
(4) Use the PLL module in the FPGA to divide and multiply the input clock, which helps minimize clock skew.
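The failure condition described above (skew larger than the delay between two edge-triggered stages) can be written as a small slack calculation. The sketch below uses the usual hold-time relation, which adds a hold-time term the article mentions only in words; the function name and all numbers are hypothetical.

```python
def hold_slack_ns(t_co_ns, t_path_ns, t_hold_ns, skew_ns):
    """Hold slack for a register-to-register path.

    skew_ns is how much later the clock edge reaches the receiving register
    than the sending register. Negative slack means new data can overrun
    the old data before it has been captured.
    """
    return t_co_ns + t_path_ns - t_hold_ns - skew_ns


# Hypothetical shift-register stage: no logic between the two flip-flops.
small_skew = hold_slack_ns(t_co_ns=0.5, t_path_ns=0.2, t_hold_ns=0.1, skew_ns=0.3)
large_skew = hold_slack_ns(t_co_ns=0.5, t_path_ns=0.2, t_hold_ns=0.1, skew_ns=1.0)

print(small_skew > 0)   # True: the stage is safe
print(large_skew > 0)   # False: the shift register can lose data
```

This also shows why method (1) works: adding delay between the two devices increases t_path_ns and therefore the slack.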

2.2 Glitch signals and their elimination
In combinational logic, a signal passes through a series of gates and transformations. Because of propagation delay, the output cannot follow a change at the input immediately and reaches the expected state only after a transition period. During that period, short parasitic glitches can appear, producing momentarily wrong outputs and momentary disorder in the logic function. Since there is no distributed inductance or capacitance inside the FPGA, unforeseen glitches can travel through the designed circuit and cause erroneous logic outputs.
Any combinational circuit, feedback circuit or counter is a potential glitch generator. Glitches are not harmful at every input. For example, a glitch at the D input of a flip-flop does no harm as long as it does not coincide with the rising clock edge and the data setup and hold times are met. However, when a glitch reaches a system start signal, control signal, handshake signal, a flip-flop's clear (CLEAR), preset (PRESET) or clock (CLK) input, or a latch input, logic errors result. Such a glitch can cause the system to fail, so eliminating glitches is an important issue in FPGA design. Glitches are not a wiring fault, so they can only be dealt with in the logic design. The general methods for eliminating them are as follows:
(1) Eliminating glitches with redundant terms
Function expressions and truth tables describe static logic, whereas a race is the dynamic process of moving from one steady state to another when an input variable changes. Modifying the Karnaugh map to add a redundant term, that is, adding a loop that covers the point where two existing loops touch, eliminates the logic hazard. However, this method cannot eliminate glitches of the kind produced by counters.
(2) Sampling method
Since hazards occur only while a variable is changing, applying the sampling pulse after the signal has settled ensures that the output taken during the sampling pulse is valid. This prevents glitches from affecting the output waveform.
(3) Absorption method
Adding an output filter by connecting a small capacitor C at the output can filter out glitches, as shown in Figure 3. However, the rising and falling edges of the output waveform are degraded, so a shaping circuit should be added when the waveform requirements are strict. This method is not suitable for intermediate stages.

2009-04-19 17:49:33

(4) Delay method
Because glitches are ultimately caused by unequal delays, the branch responsible can be identified, and adding a delay equal to the glitch width to the branch with the smaller delay eliminates the glitch. However, the glitch may reappear as the load increases, so this method has its limits. Moreover, delays produced with delay lines vary with ambient temperature and make the system unreliable.
(5) Latch method
Glitches are generated when counter outputs are ANDed or ORed together, and as the counter width grows the number and variety of glitches increase.
When an FPGA output drives edge-sensitive or level-sensitive inputs elsewhere in the system, combinational outputs that are prone to hazards should be registered at the output. For asynchronous inputs, adding input registers ensures the setup and hold times required by the state machine. Ordinary glitches can often be removed with a D flip-flop, but doing so sometimes affects timing and raises other issues that must be considered. It is therefore necessary to analyze the source and nature of each glitch carefully and to remove it by modifying the circuit or by other means.
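Method (1) above can be illustrated with the textbook function f = A·B + A'·C, whose redundant (consensus) term is B·C. The Python sketch below is a unit-delay gate simulation, with every gate given one time step of delay; the function name, the delay model and the waveform are assumptions chosen for illustration and do not come from the article.

```python
def simulate(a_wave, b=1, c=1, use_consensus=False):
    """Unit-delay simulation of f = A*B + (not A)*C, optionally with the term B*C.

    a_wave[t] is the value of input A at time step t. Returns f at each step.
    """
    a0 = a_wave[0]
    # Start from the steady state for the first input value.
    n_a = 1 - a0                      # inverter output
    p1, p2, p3 = a0 & b, n_a & c, b & c
    out = [p1 | p2 | (p3 if use_consensus else 0)]
    for a in a_wave[:-1]:
        # Each gate output at t+1 depends on its inputs at t.
        f_next = p1 | p2 | (p3 if use_consensus else 0)
        n_a, p1, p2 = 1 - a, a & b, n_a & c   # right-hand side uses the old n_a
        out.append(f_next)
    return out


wave = [1, 1, 1, 0, 0, 0, 0, 0]       # A falls from 1 to 0 while B = C = 1
print(simulate(wave))                      # [1, 1, 1, 1, 1, 0, 1, 1]
print(simulate(wave, use_consensus=True))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Without the redundant term the output dips to 0 for one step because A·B turns off before A'·C turns on. With B·C added, the output is held at 1 throughout the transition.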

2.3 Delay Design in FPGA
When a signal needs to be delayed for some time, a few NOT gates or other gates can be connected in series after it. In an FPGA, however, the development software removes these gates as redundant logic during synthesis, so no delay results. When developing with ALTERA's MAX+PLUS II, a delay can be produced by inserting LCELL primitives or by calling a delay-line module. A delay formed this way is not stable inside the FPGA and varies with external conditions such as temperature, which affects the performance of the FPGA. A better approach is to drive a shift register with a high-frequency clock and use the signal to be delayed as its data input. The number of shift-register stages is chosen according to the required delay, and the shift-register output is the delayed signal. The delay produced this way has an error whose size is determined by the period of the high-frequency clock. For a data signal, the error can be removed by resampling the delayed signal with the data clock at the output. Of course, this wastes resources when the required delay is long. In addition, when designing in VHDL, the "after" clause cannot be used to implement a delay: current synthesis tools cannot realize such a precise delay, so an "after" clause in the source is not synthesizable.
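The shift-register approach can be modeled in a few lines of Python. This is a behavioral sketch only: the function names, the choice of rounding to the nearest stage count, and the example numbers are assumptions, and one list entry stands for one period of the high-frequency clock.

```python
def stages_for_delay(delay_ns, clk_period_ns):
    """Number of shift-register stages closest to the requested delay (at least 1)."""
    return max(1, round(delay_ns / clk_period_ns))


def shift_register_delay(samples, stages):
    """Delay a sampled signal by `stages` periods of the high-frequency clock."""
    regs = [0] * stages
    out = []
    for s in samples:
        out.append(regs[-1])          # output of the last stage before this edge
        regs = [s] + regs[:-1]        # all stages shift on the same clock edge
    return out


# Hypothetical case: 93 ns requested, 10 ns high-frequency clock period.
n = stages_for_delay(93, 10)
print(n, n * 10)                      # 9 stages give 90 ns, a 3 ns error

print(shift_register_delay([1, 0, 1, 1, 0, 0, 1, 0], stages=3))
# [0, 0, 0, 1, 0, 1, 1, 0]
```

The realized delay is always a whole number of clock periods, which is the source of the error mentioned above; a faster clock reduces the error at the cost of more stages.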

2.4 Synchronous Circuit Design in FPGA
2.4.1 Synchronous Circuit and Asynchronous Circuit
Asynchronous circuits are mainly combinational logic, used to generate the read and write control pulses of address decoders, FIFOs or RAMs. Their outputs are not tied to any clock signal, and the glitches produced at the decoded outputs can usually be observed. A synchronous circuit consists of sequential elements (registers and flip-flops of various kinds) together with combinational logic, and all of its operations take place under strict clock control. The sequential elements share the same clock CLK, and every state change occurs on the rising (or falling) edge of that clock. For a D flip-flop, for example, the register transfers the level at the D input to the Q output when the rising edge arrives.
Setup and hold times are introduced next. The setup time (tsu) is the time for which data must be stable before the rising clock edge of the flip-flop arrives; if it is too short, the data is not captured at that edge. The hold time (th) is the time for which data must remain stable after the rising clock edge arrives; if it is too short, the data is not captured either. Reliable data transfer requires both to be met, otherwise the circuit produces logic errors.
For example, when the Q output of one D flip-flop feeds the D input of another directly, the first flip-flop may meet its setup and hold times while the delay to the second is too short to satisfy the second flip-flop's hold time. A logic error then occurs, and clock skew makes it worse. One solution is to add a buffer at the Q output of the first flip-flop, as shown in Figure 7, so that the timing requirement of the second flip-flop is met. Another is to use a low-drive D flip-flop with no buffer, since the relatively high fan-out helps improve the hold time.
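The setup and hold definitions above amount to a forbidden window around each active clock edge. A minimal Python sketch of that check follows; the function name and the timing numbers are hypothetical.

```python
def window_violations(data_edges_ns, clk_edge_ns, t_su_ns, t_h_ns):
    """Return the data transitions that fall inside the setup/hold window.

    Data must be stable from (clock edge - tsu) to (clock edge + th).
    """
    window_start = clk_edge_ns - t_su_ns
    window_end = clk_edge_ns + t_h_ns
    return [t for t in data_edges_ns if window_start < t < window_end]


# Hypothetical flip-flop: tsu = 0.5 ns, th = 0.3 ns, clock edge at 10 ns.
print(window_violations([4.0, 15.0], 10.0, 0.5, 0.3))             # []
print(window_violations([4.0, 9.8, 10.2, 15.0], 10.0, 0.5, 0.3))  # [9.8, 10.2]
```

In the second call, the transition at 9.8 ns breaks the setup time and the one at 10.2 ns breaks the hold time.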
Synchronous digital circuits are overwhelmingly dominant today. Engineers use them to design almost every kind of digital circuit, at frequencies from DC to several GHz. Compared with asynchronous circuits, synchronous circuits have the following advantages:
(1) Synchronous circuits keep working correctly across variations in temperature, voltage, process and other parameters, whereas the behavior of asynchronous circuits usually depends on ambient temperature, operating voltage and manufacturing process.
(2) Synchronous circuits are portable and easy to move to new or more advanced technologies, whereas asynchronous circuits are difficult to reuse and maintain.
(3) Synchronous circuits simplify the interface between two modules, whereas asynchronous circuits need handshake signals or tokens to ensure signal integrity.
(4) Designing synchronous circuits with D flip-flops or registers removes glitches and resynchronizes internally skewed data. Asynchronous circuits lack this advantage, are difficult to simulate and debug, and do not synthesize well.
Synchronous circuits also have a drawback: they need sequential elements and therefore consume more logic resources than asynchronous circuits. Although asynchronous circuits can be faster and consume less power, current FPGAs offer millions of gates, so the extra resources are not much of a concern. The author recommends avoiding asynchronous circuits as far as possible and designing with synchronous circuits.

2.4.2 Using pipeline technology to improve the speed of synchronous circuits
The speed of a synchronous circuit means the frequency of its system clock. The faster the clock, the shorter the interval in which the circuit processes data, and the more data it processes per unit time.

Tco is the delay from the clock edge that captures data into a flip-flop until that data appears at the flip-flop's output; Tdelay is the delay of the combinational logic; Tsetup is the setup time of the D flip-flop. Suppose data has just been clocked into the first D flip-flop. It reaches the Q output of that flip-flop after Tco, passes through the combinational logic in Tdelay, and then arrives at the D input of the second flip-flop. For the data to be captured reliably by the second flip-flop at the next clock edge, the clock period must be at least Tco+Tdelay+Tsetup. The minimum clock period is therefore Tmin=Tco+Tdelay+Tsetup, and the maximum clock frequency is Fmax=1/Tmin. FPGA development software calculates the maximum operating speed Fmax of a system in the same way. Because Tco and Tsetup are fixed by the device process, only the combinational delay Tdelay can be changed in the design, so shortening the combinational delay between flip-flops is the key to speeding up a synchronous circuit. A typical synchronous circuit has more than one register stage, and the clock period must accommodate the largest stage delay for the circuit to work reliably, so the operating frequency can be raised only by shortening the longest delay path. A large block of combinational logic can be split into N smaller blocks with the delay distributed as evenly as possible, and flip-flops clocked by the same clock as the original ones inserted between the blocks. This avoids an excessive delay between any two flip-flops and removes the speed bottleneck, so that the operating frequency can be increased.
This is the basic idea of so-called "pipeline" technology: the speed-limiting part of the original design, which was completed in one clock cycle, is completed in N clock cycles once flip-flops have been inserted, so the system clock can run faster and throughput increases. Note that pipelining adds latency to the original data path and slightly increases the hardware area.
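The relation Fmax=1/(Tco+Tdelay+Tsetup) and the effect of pipelining can be tried out with a short Python sketch. The delay values, the even two-way split and the two stage functions are hypothetical, and the pipeline model is cycle-level only, with None standing for "no valid data yet".

```python
def fmax_mhz(t_co_ns, t_delay_ns, t_setup_ns):
    """Fmax = 1 / (Tco + Tdelay + Tsetup), in MHz for delays given in ns."""
    t_min_ns = t_co_ns + t_delay_ns + t_setup_ns
    return 1000.0 / t_min_ns


def run_pipeline(inputs, stage_funcs):
    """Cycle-level model: each stage applies its function, then registers the result."""
    n = len(stage_funcs)
    regs = [None] * n
    outputs = []
    for x in list(inputs) + [None] * n:     # extra cycles flush the pipeline
        new_regs = []
        stage_in = x
        for i, func in enumerate(stage_funcs):
            new_regs.append(None if stage_in is None else func(stage_in))
            stage_in = regs[i]              # next stage sees the old register value
        regs = new_regs                     # all registers update on the same edge
        outputs.append(regs[-1])
    return outputs


# Hypothetical device: Tco = 1 ns, Tsetup = 1 ns, 8 ns of combinational logic.
before = fmax_mhz(1.0, 8.0, 1.0)    # 10 ns period, 100 MHz
after = fmax_mhz(1.0, 4.0, 1.0)     # logic split into two 4 ns blocks, about 166.7 MHz
print(before, round(after, 1))

# The same computation (x + 1) * 2 split into two pipeline stages.
print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
# [None, 4, 6, 8, None]
```

The first result appears one clock edge later than it would from a single register stage, which is the added latency, but after that one result emerges per clock cycle at the higher clock rate.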

3 Other issues that should be noted in FPGA design
(1) All state machine inputs, including reset and set signals, must be synchronous signals. All state machine outputs must be registered. Take care that the state machine has no deadlock states.
(2) Design with registers and flip-flops and avoid latches where possible, because latches are too sensitive to glitches on their inputs. If latches must be used, the input signal must be absolutely glitch-free and meet the hold time.
(3) Take great care when designing decoding logic, because decoders and comparators themselves generate spikes and are prone to glitches. Connecting the output of a decoder or comparator directly to a clock input or an asynchronous clear input has serious consequences.
(4) Avoid implicit RS latches. They typically arise when a control output is fed straight back to an input: such a feedback loop forms an implicit RS latch that is very sensitive to spikes and spurious signals at its inputs. Any input change may change the output immediately, which readily produces glitches and serious timing confusion. Once an implicit RS latch exists, adding a latch to remove glitches does not help; the only fundamental solution is to rework the circuit thoroughly.
(5) Use only one clock in each module and avoid multi-clock designs. Likewise, avoid using a secondary clock divided down from the main clock as the clock input of sequential elements, because the secondary clock may have too much skew relative to the main clock. Synchronize the input clock, input signals and output signals of every module with D flip-flops or registers, so that each output signal comes directly from a flip-flop or register output. This eliminates spikes and glitches. Whether a signal is a control signal, an address bus signal or a data bus signal, an additional register must be used to turn internally skewed data into synchronized data. These seemingly redundant steps greatly improve the performance of the circuit.
(6) Avoid delay lines as far as possible. They are extremely sensitive to process variation, greatly reduce the stability and reliability of the circuit, and complicate testing.
(7) Most FPGA devices provide dedicated global routing resources for clock, reset, preset and similar signals. Make full use of them: doing so reduces glitches and greatly improves circuit performance.
(8) Do not try to synthesize memory blocks such as RAM, ROM or FIFO from HDL. Current synthesis tools are mainly intended for generating logic circuits. If such blocks are needed, call or instantiate the corresponding macro cells directly.
(9) Be aware of discrepancies between simulation results and the actual synthesized circuit. For sequential and asynchronous logic alike, real behavior is not exactly what the simulator shows. For asynchronous logic in particular, simulation hides race hazards and glitches and can differ greatly from real behavior. In FPGA design, therefore, every logic gate and every line of VHDL (Verilog) must be fully understood. Do not expect the simulator to find errors for you. A good design engineer knows how to improve circuit performance by modifying the design rather than blaming the software.
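Item (1) in the list above warns against deadlock states. One common way to avoid them, sketched here in Python, is to give every unused state encoding a transition back to a known state. The three-state machine, its two-bit encoding and its input names are invented for illustration and are not taken from the article.

```python
IDLE, LOAD, RUN = 0, 1, 2     # hypothetical 2-bit encoding; the value 3 is unused


def next_state(state, start, done, reset):
    """Next-state function, evaluated once per clock edge (synchronous reset)."""
    if reset:
        return IDLE
    if state == IDLE:
        return LOAD if start else IDLE
    if state == LOAD:
        return RUN
    if state == RUN:
        return IDLE if done else RUN
    return IDLE               # unused encoding recovers instead of locking up


# Every possible 2-bit value returns to IDLE when reset is asserted,
# and the unused value 3 returns to IDLE on its own.
print([next_state(s, start=0, done=0, reset=1) for s in range(4)])   # [0, 0, 0, 0]
print(next_state(3, start=0, done=0, reset=0))                       # 0
```

If a glitch or a start-up condition ever leaves the state register at the unused value, the machine returns to IDLE on the next clock edge rather than staying stuck.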

Developing digital circuits with FPGAs can greatly shorten design time, reduce PCB area and improve system reliability. These advantages have driven the rapid development of FPGA technology, which is now widely used in communications, electronics, signal processing, industrial control and other fields. With growing FPGA capacity, the era of SOPC (System on a Programmable Chip) applications is arriving. An SOPC combines embedded processors, I/O circuits and large embedded memories with CPLD/FPGA logic that users can configure, and designers can also select FPGA IP cores provided by PLD vendors. Using IP cores safeguards the development efficiency and quality of system-level chips and greatly shortens product development time. FPGAs have therefore become an important option for system-level design. This article has examined key issues in FPGA design and presented the main problems that affect system reliability together with their solutions, in the hope of providing a useful reference for FPGA designers.