Optimizing the feedback loop when input data has gaps

fish001 · Published on 2018-10-23 18:52

In digital signal processing for communications, designs with feedback loops come up often, for example digital clock recovery or adaptive equalization filters. The basis of such a design is to filter or equalize the input data, with the coefficients for that calculation derived from the output of the filter or equalizer itself. In other words, the current output depends on earlier outputs.

The coefficients produced by the loop calculation/coefficient calculation may need to be transformed before they are loaded into the filter or equalizer. In that case there is an additional coefficient transformation module at the output of the loop calculation/coefficient calculation; it can be an FFT or some other transform.

Now assume the input data has gaps, that is, not every clock cycle carries valid data. A design with a feedback loop has to handle this case carefully. Because the loop calculation/coefficient calculation and the coefficient transformation both have latency, how should the coefficients passed to the filter be handled when a gap appears in the data? A common solution is to hold the coefficients in a buffer and read them back out only when new valid data arrives.

Note that the latency of the feedback loop has a large impact on system performance, so the buffer cannot simply be a traditional FIFO: the status signals of a traditional FIFO (empty, almost empty) lag by at least one cycle. What is needed is a buffer whose status is visible immediately: as soon as it is written, the non-empty status is available in the same cycle rather than the next one.
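The difference between a registered status flag and an immediately visible one can be sketched with a toy cycle-level model. This is only an illustration: the class name `CoefficientBuffer`, the `bypass` switch, and the `step` interface are assumptions made for this sketch, not part of the original design.

```python
from collections import deque


class CoefficientBuffer:
    """Toy cycle-level buffer model (hypothetical, for illustration only)."""

    def __init__(self, bypass):
        self.bypass = bypass   # True: status and data are visible in the write cycle
        self.store = deque()   # words latched at earlier clock edges

    def step(self, write_en, wdata, read_en):
        """Simulate one clock cycle; return (not_empty, rdata) as seen in that cycle."""
        not_empty = bool(self.store) or (self.bypass and write_en)
        rdata = None
        if read_en and not_empty:
            if self.store:
                rdata = self.store.popleft()
                if write_en:
                    self.store.append(wdata)
            else:
                rdata = wdata  # bypass: the word being written is read at once
        elif write_en:
            self.store.append(wdata)
        return not_empty, rdata


fifo = CoefficientBuffer(bypass=False)
fast = CoefficientBuffer(bypass=True)
print(fifo.step(True, "c0", True))   # (False, None): the write is not visible yet
print(fifo.step(False, None, True))  # (True, 'c0'): visible one cycle later
print(fast.step(True, "c0", True))   # (True, 'c0'): visible in the write cycle
```

With `bypass=False` the written coefficient only becomes readable one cycle after the write, which is the extra loop latency the text warns about. With `bypass=True` it can be consumed in the same cycle.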
In a large-scale design, where the buffer sits matters as well. Suppose the coefficient transformation module in Figure 2 takes only 10 parallel values at its input but produces 128 parallel values at its output. The location of the buffer then directly affects logic resources: buffering at the output of the coefficient transformation takes 128 buffers, while buffering at the input takes only 10. So we will clearly prefer to place the buffer at the input of the coefficient transformation module.

This raises a problem. If the coefficient transformation is a pure pipeline, coefficients are not stored in the buffer until the gap in the input data Din appears. By then valid coefficients are already sitting in the registers of the coefficient transformation module; they keep moving forward on the following clock cycles and end up being applied to invalid data.

One method is to use the valid signal of the input data to control the registers of the coefficient transformation module: while the input data is invalid, every register in the module's pipeline holds its value and stops updating. This method has a serious design flaw, though. If the coefficient transformation module is fully parallel, with a high degree of parallelism and many pipeline stages, a single data valid signal drives a very large number of registers and its fan-out becomes enormous. Even if the fan-out is reduced by duplicating the signal or inserting buffers, routing may still not give a good result.

So is there a better way? Of course there is! Take the data status indication signal of the data Din that is about to enter the filter/equalizer ahead of time. How many clock cycles ahead depends on the latency of the coefficient transformation module.
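To put rough numbers on the two resource arguments above, here is a back-of-the-envelope sketch. Only the 10-input and 128-output figures come from the text; the 16-bit word width, the buffer depth of 4, and the per-stage register counts are assumptions chosen purely for illustration.

```python
def buffer_bits(parallel_values, bits_per_value, depth):
    """Storage bits needed to buffer `depth` entries of a parallel bus."""
    return parallel_values * bits_per_value * depth


def enable_fanout(values_per_stage, bits_per_value):
    """Register bits driven if one valid signal is the clock enable of every stage."""
    return sum(values_per_stage) * bits_per_value


BITS = 16               # assumed word width
DEPTH = 4               # assumed buffer depth
STAGES = [32, 64, 128]  # assumed values registered in each pipeline stage

print(buffer_bits(10, BITS, DEPTH))   # 640 bits when buffering at the input
print(buffer_bits(128, BITS, DEPTH))  # 8192 bits when buffering at the output
print(enable_fanout(STAGES, BITS))    # 3584 register bits on one enable net
```

Under these assumptions the output-side buffer is 12.8 times larger than the input-side one, and a single valid signal used as a global clock enable would have to reach several thousand flip-flops.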
Assume the latency of the coefficient transformation module is 3 cycles. Then, starting 3 cycles before the data Din entering the filter/equalizer becomes invalid, the coefficients are written to the buffer, with the invalid indication of Din (taken in advance) used as the buffer's write enable. The data at the input of the coefficient transformation module is held unchanged during this time, so 3 clock cycles later all the valid data in the module's pipeline has drained out, exactly when the data gap arrives. The data indication signal taken 7 clock cycles in advance is used as the buffer's read enable. When the gap ends and the data becomes valid again, valid data has just refilled the whole pipeline and re-enters the filter/equalizer calculation.

This method greatly reduces the fan-out of the data valid signal and genuinely optimizes the logic. It is a very good idea.
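The look-ahead idea can be checked with a small cycle-level simulation. This is a simplified sketch under several assumptions of mine, not the author's implementation: the function name `simulate` is hypothetical, the loop calculation is given a 1-cycle delay, each coefficient is just a tag naming the valid sample it was computed from, and a single look-ahead equal to the pipeline latency (3 cycles) controls both sides of the buffer. The buffer is modeled as accepting every valid coefficient and allowing a read in the same cycle as the write, which stands in for the immediate-status buffer and acts as a direct path when there is no gap.

```python
from collections import deque


def simulate(valid, latency=3, loop_delay=1, use_buffer=True):
    """Return (sample_index, coefficient_tag) for every valid Din sample.

    valid[t] is the Din valid flag at the filter in cycle t. A coefficient
    tag is the index of the valid sample it was computed from.
    """
    pipe = deque([None] * latency)     # free-running transform pipeline, no clock enables
    loop = deque([None] * loop_delay)  # loop / coefficient calculation delay
    buf = deque()                      # coefficient buffer at the transform input
    held = None                        # value held at the transform input
    pairs, k = [], 0
    for t in range(len(valid)):
        pipe_out = pipe.popleft()      # entered the transform `latency` cycles ago
        loop_out = loop.popleft()      # computed from the cycle `loop_delay` ago
        if valid[t]:
            pairs.append((k, pipe_out))
            loop.append(k)
            k += 1
        else:
            loop.append(None)          # a gap cycle produces no coefficient
        if use_buffer:
            # Din valid flag taken `latency` cycles in advance
            early_valid = t + latency < len(valid) and valid[t + latency]
            if loop_out is not None:
                buf.append(loop_out)   # readable in this same cycle
            if early_valid and buf:
                held = buf.popleft()   # otherwise the transform input is held
        else:
            held = loop_out            # naive: feed the pipeline directly
        pipe.append(held)
    return pairs


valid = [True] * 8 + [False] * 5 + [True] * 6
print(simulate(valid)[8:])                    # pairs consumed after the gap
print(simulate(valid, use_buffer=False)[8:])  # same, without the buffer
```

With the buffer, the first sample after the gap (index 8) receives coefficient 4, so the sequence continues as if the gap had never happened. Without it, coefficients 4 to 7 leave the pipeline during the gap and are lost. Only the small input-side buffer and the held transform input see the valid signal; the pipeline registers themselves need no clock enable.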