Design of Real Number FFT Algorithm and Its Implementation in C Language[Copy link]
At present, when teaching digital signal processing in China, the focus is on complex FFT when explaining fast Fourier transform (FFT), and the real FFT algorithm is only briefly mentioned. The specific implementation programs given in the books are mostly BASIC or FORTRAN programs and most of them cannot be actually run. In view of the fact that FFT operations are currently used in many embedded systems, such as AC sampling systems with DSP as the core, spectrum analysis, correlation analysis, etc. Based on my actual development experience, I have studied the real FFT algorithm and given a specific C language function, which readers can directly apply to their own systems. First, the derivation process of the real FFT algorithm is analyzed, and then a C language program that specifically implements the FFT algorithm is given, which can be directly applied to embedded systems such as single-chip microcomputers or DSPs that require FFT operations. 1 Analysis of the reverse order algorithm The FFT algorithm with time extraction (DIT) usually stores the original data in reverse order, and finally outputs the results X(0), X(1),..., X(k),... in the normal order. Assume that at the beginning, the data is in the array float dataR[128]. We represent the subscript i as (b6b5b4b3b2b1b0)b. The reverse order storage is to store the original element at the i-th position at the (b0b1b2b3b4b5b6)b position. Since the C language has strong bit operation capabilities, we can extract b6, b5, b4, b3, b2, b1, b0 respectively, and then recombine them into b0, b1, b2, b3, b4, b5, b6, which are the reverse order positions. The program segment is as follows (assuming 128-point FFT): /* i is the original storage position, and the final invert_pos is the inverted storage position*/ int b0=b1=b2=b3=b4=b5=6=0; b0=i&0x01; b1=(i/2)&0x01; b2=(i/4)&0x01; b3=(i/8)&0x01; b4=(i/16)&0x01; b5=(i/32)&0x01; b6=(i/64)&0x01; /*The above statements extract the 0 and 1 values of each bit*/ invert_pos=x0*64+x1*32+x2*16+x3*8+x4*4+x5*2+x6; You can compare it with the inversion program in the textbook and find that this algorithm makes full use of the bit operation capability of C language. It is very easy to understand and the bit operation speed is very fast. 2 Derivation of real number butterfly operation algorithm Let us first look at the butterfly diagram shown in Figure 1.
Butterfly formula: X(K) = X'(K) + X'(K+B)W PN , X(K+B) = X'(K) - X'(K+B) W PN Where W PN = cos(2πP/N)- jsin(2πP/N). Assume X(K+B) = XR(K+B) + jXI(K+B), X(K) = XR(K) + jXI(K), then: XR(K)+jXI(K)= XR'(K)+jXI'(K)+[ XR'(K+B) + jXI'(K+B)]*[ cos(2πP/N)-jsin(2πP/N)]; Continue to decompose and get the following two equations: XR(K)= XR'(K)+ XR'(K+B) cos(2πP/N)+ XI'(K+B) sin (2πP/N) (1) XI(K)= XI'(K)-XR'(K+B) sin(2πP/N)+XI'(K+B)cos (2πP/N) (2) It should be noted that: XR(K) and XR'(K) have the same storage location, so after (1) and (2), the value at that location has changed. X'(K) is needed to calculate X(K+B) below, so when programming, be sure to save XR'(K) and XI'(K) into two temporary variables, TR and TI. Similarly: XR(K+B)+jXI(K+B)= XR'(K)+jXI'(K)- [ XR'(K+B)+jXI'(K+B)] *[ cos(2πP/N)-jsin(2πP/N)]Continue to decompose and get the following two equations: XR(K+B)= XR'(K)-XR'(K+B) cos(2πP/N)- XI'(K+B) sin (2πP/N) (3) XI(K+B)= XI'(K)+ XR'(K+B) sin(2πP/N)- XI'(K+B) cos (2πP/N) (4) Note: ① When programming, XR'(K) and XI'(K) in equations (3) and (4) are replaced by TR and TI respectively. ② After formula (3), the value of XR(K+B) has changed, and the value of the previous level at this position is used in formula (4), so the value of the previous level XR'(K+B) must be saved before executing formula (3). ③ When programming, XR(K) and XR'(K), XI(K) and XI'(K) use the same variable. Through the above analysis, we only need to convert formulas (1), (2), (3), and (4) into C language statements. Pay attention to the intermediate storage of variables, as shown in the following program segment. /* Butterfly operation program segment, dataR[] stores the real part, dataI[] stores the imaginary part*/ /* Make a table of cos and sin functions, and directly look up the table to speed up the operation*/ TR=dataR[k]; TI=dataI[k]; temp=dataR[k+b];/*Save variables for use in the following statements*/ dataR[k]=dataR[k]+dataR[k+b]*cos_tab[p]+dataI[k+b]*sin_tab[p]; dataI[k]=dataI[k]-dataR[k+b]*sin_tab[p]+dataI[k+b]*cos_tab[p]; dataR[k+b]=TR-dataR[k+b]*cos_tab[p]-dataI[k+b]*sin_tab[p]; dataI[k+b]=TI+temp*sin_tab[p]-dataI[k+b]*cos_tab[p]; 3 Analysis of the basic idea of DIT FFT algorithm We know that the N-point FFT operation can be divided into LOGN2 levels, and each level has N/2 discs. The basic idea of DIT FFT is to use 3 layers of loops to complete all operations (N-point FFT). The first layer of loop: Since N=2m requires m-level calculations, the first layer of loop controls the number of levels of operations. The second layer of loop: Since the L-th level has 2L-1 butterfly factors (multipliers), the second layer of loop is controlled according to the multipliers to ensure that the third layer of loop is executed once for each butterfly factor. In this way, the third layer of loop is controlled by the second layer of loop, and each level must perform 2L-1 loop calculations. The third loop: Since there are N/2L groups in the Lth level, and the multipliers of different groups in the same level are distributed in the same way, when the second loop determines a multiplier, the third loop will calculate the butterfly with this multiplier in each group in this level once, that is, N/2L butterfly calculations will be performed each time the third loop is executed. It can be concluded that in each level, the third loop completes N/2L butterfly calculations; the second loop makes the third loop perform 2L-1 times, so when the second loop is completed, a total of 2L-1 *N/2L=N/2 butterfly calculations are performed. In essence, the second and third loops complete the calculation of the Lth level. Several data to note: ① In the Lth level, the two input ends of each butterfly are b=2L-1 points apart. ② The same multiplier corresponds to N/2L butterfly with adjacent intervals of 2L points. ③ The P in the 2L-1 dishing factors WPN of the Lth level can be expressed as p = j*2m-L, where j = 0, 1, 2, ..., (2L-1-1). The above is an analysis and study of the FFT algorithm in embedded systems. Readers can directly apply the algorithm to their own systems. Welcome to write to discuss. (Email: xiaowanang@163.net) Attached is a 128-point DIT FFT function: /* The sampled data is placed in the dataR[ ] array. The dataI[ ] array is initialized to 0 before operation */ void FFT(float dataR[],float dataI[]) {int x0,x1,x2,x3,x4,x5,x6; int L,j,k,b,p; float TR,TI,temp; /********** following code invert sequence ************/ for(i=0;i<128;i++) { x0=x1=x2=x3=x4=x5=x6=0; x0=i&0x01; x1=(i/2)&0x01; x2=(i/4)&0x01; x3=(i/8)&0x01;x4=(i/16)&0x01; x5=(i/32)&0x01; x6=(i/64)&0x01; xx=x0*64+x1*32+x2*16+x3*8+x4*4+x5*2+x6; dataI[xx]=dataR; } for(i=0;i<128;i++) { dataR=dataI; dataI=0; } /************** following code FFT *******************/ for(L=1;L<=7;L++) { /* for(1) */ b=1; i=L-1; while(i>0) {b=b*2; i--;} /* b= 2^(L-1) */ for(j=0;j<=b-1;j++) /* for (2) */ { p=1; i=7-L; while(i>0) /* p=pow(2,7-L)*j; */ {p=p*2; i--;} p=p*j; for(k=j;k<128;k=k+2*b) /* for (3) */ { TR=dataR[k]; TI=dataI[k]; temp=dataR[k+b]; dataR[k]=dataR[k]+dataR[k+b]*cos_tab[p]+dataI[k+b]*sin_tab[p]; dataI[k]=dataI[k]-dataR[k+b]*sin_tab[p]+dataI[k+b]*cos_tab[p]; dataR[k+b]=TR-dataR[k+b]*cos_tab[p]-dataI[k+b]*sin_tab[p]; dataI[k+b]=TI+temp*sin_tab[p]-dataI[k+b]*cos_tab[p]; } /* END for (3) */ } /* END for (2) */ } /* END for (1) */ for(i=0;i<32;i++){ /* Only harmonics below the 32th order are needed for analysis*/ w=sqrt(dataR*dataR+dataI*dataI); w=w/64;} w[0]=w[0]/2; } /* END FFT */