Introduce DSP data structure program optimization method

Jacktang

Introduce DSP data structure program optimization method [Copy link]

Choosing the right data structure is important. If you use a lot of insert and delete instructions on a bunch of randomly stored numbers, it will be much faster to use a linked list. Arrays and pointer statements are closely related. Generally speaking, pointers are more flexible and concise, while arrays are more intuitive and easy to understand. For most compilers, pointers generate shorter code and more efficient execution than arrays. In many cases, pointer operations can be used instead of array indexing, which often produces fast and short code. Compared with array indexing, pointers generally make the code faster and take up less space. The difference is more obvious when using multidimensional arrays. The following code has the same effect, but the efficiency is different. For(;;) { p=array A=array[t++]; for(;;){ } } Copy code The advantage of the pointer method is that after the address of array is loaded into address p, only p needs to be incremented in each loop. In the array indexing method, the complex operation of finding the array subscript based on the value of t must be performed in each loop. If you can use character variables (char), do not use integer variables to define them; if you can use integer variables, do not use long integers (long int) to define variables. , don't use floating-point variables if you can avoid it. Of course, don't exceed the scope of the variable after defining it. If you assign a value that exceeds the scope of the variable, the C compiler will not report an error, but the program will run with an error, and such errors are difficult to find. In ICCAVR, you can set the use of printf parameters in Options, try to use basic parameters (%c %d %x %X %u and %s format specifiers), and use less long integer parameters (%ld %lu %lx and %lX format specifiers). As for floating-point parameters (%f), try not to use them. The same is true for other C compilers. Under the condition that other conditions remain unchanged, using the %f parameter will increase the amount of generated code a lot and reduce the execution speed. A smart game prawn basically does not do any calculations in his main loop. He will definitely calculate first and then look up the table in the loop. See the following example: return i * factorial(i - 1); static long factorial_table[] = , 1 , 2 , 6 , 24, 120, return factorial_table ;
Copy code If the table is very large and difficult to write, write an init function to temporarily generate the table outside the loop. Note: Bit operations only require one instruction cycle to complete, while most C compilers use subroutines to complete % operations, which makes the code long and the execution speed slow. Usually, as long as the remainder of 2n is required, the bit operation method can be used instead. Note: In microcontrollers with built-in hardware multipliers (such as the 51 series), multiplication is much faster than squaring, because the squaring of floating-point numbers is implemented by calling subroutines. In AVR microcontrollers with built-in hardware multipliers, such as ATMega163, multiplication only requires 2 clock cycles to complete. Even in AVR microcontrollers without built-in hardware multipliers, the subroutine for multiplication is shorter than the subroutine for squaring, and the execution speed is fast. Usually, if you need to multiply or divide by 2n, you can use the shift method instead. In ICCAVR, if you multiply by 2n , can generate left shift code, and multiplication by other integers or division by any number, all call the multiplication and division subroutines. The code obtained by the shift method is more efficient than the code generated by calling the multiplication and division subroutines. In fact, as long as it is multiplication or division by an integer, the result can be obtained by the shift method, such as: Replace the original expression with an expression with smaller computational complexity. The following is a classic example: for (i = 0;i < MAX;i++) Bit operations are faster than remainder operations*/ y = x * x; /* Multiplication is faster than square operations*/ z = (y << 5) + y; /* Shift multiplication is faster than multiplication*/ for (i = h = 0; i < MAX; i++) h += 14; /* Addition is faster than multiplication*/ } Copy code Integer division is the slowest of integer operations, so it should be avoided as much as possible. One possible way to reduce integer division is continuous division, where division can be replaced by multiplication. The side effect of this replacement is that there may be overflow when calculating the product, so it can only be used in a certain range of division. When using addition and subtraction operations, try to use increment and decrement operators, because increment statements are faster than assignment statements. The reason is that for most CPUs, the increase and decrease operations on memory words do not need to explicitly use memory fetch and memory write instructions. For example, the following statement: x=x+1;, fetch x from memory and store it into accumulator A, and accumulator A increases by 1. Copy code Obviously, without fetching and storing instructions, the execution speed of increment and decrement operations is faster, and the length is also shortened. Compound assignment expressions (such as a-=1 and a+=1, etc.) can generate high-quality program code. In some cases, C++ The compiler cannot extract common subexpressions from floating-point expressions because this would be equivalent to reordering the expressions. It should be noted that the compiler cannot rearrange the expressions according to algebraic equivalence before extracting common subexpressions. At this time, the programmer has to manually extract common subexpressions (there is a global optimization option in VC.NET that can do this, but the effect is unknown) float a, b, c, d, e, f; float a, b, c, d, e, f; Copy code Many compilers have options to align structures to words, double words, or quad words. However, it is still necessary to improve the alignment of structure members. Some compilers may allocate space for structure members in a different order than they are declared. However, some compilers do not provide these features or the effect is not good. Therefore, in order to achieve the best alignment of structures and structure members at the lowest cost, the following methods are recommended: Sort the members of the structure according to their type length, and put the long type before the short type when declaring the members. The compiler requires that long data types be stored on even address boundaries. When declaring a complex data type (的变量 (string ) When there are both multi-byte and single-byte data, the multi-byte data should be stored first, followed by the single-byte data, to avoid memory holes The compiler automatically aligns instances of structures on even-numbered memory boundaries Pad the structure to an integer multiple of the longest type length In this way, if the first member of the structure is aligned, the entire structure is naturally aligned The following example demonstrates how to reorder structure members: struct { char a[5] ; struct { double x ; long k ; char a[5] ; Copy code When the compiler allocates space for local variables, their order is the same as the order in which they are declared in the source code. As with the previous rule, long variables should be placed before short variables If the first variable is aligned, the other variables will be stored continuously and will be aligned naturally without padding bytes Some compilers do not automatically change the order of variables when allocating them, and some compilers cannot generate 4-byte aligned stacks, so 4 bytes may not be aligned The following example demonstrates the reordering of local variable declarations: Avoid frequently using the values pointed to by pointer parameters in functions. Because the compiler does not know whether there are conflicts between pointers, pointer parameters are often not optimized by the compiler. In this way, data cannot be stored in registers, and memory bandwidth is obviously occupied. Note that many compilers have an assume-no-conflict optimization switch (in VC, you must manually add the compiler command line /Oa or /Ow), which allows the compiler to assume that two different pointers always have different contents, so there is no need to save pointer parameters to local variables. Otherwise, please save the data pointed to by the pointer to a local variable at the beginning of the function. If necessary, copy it back before the end of the function. void isqrt(unsigned long a, unsigned long* q, unsigned long* r) { while (*q > (*r = a / *q)) *q = (*q + *r) >> 1; } void isqrt(unsigned long a, unsigned long* q, unsigned long* r) { while (qq > (rr = a / qq)) qq = (qq + rr) >> 1; } Breaking up loops can improve performance, especially when the loop body itself is small. Note:Many compilers cannot automatically factor out bad loops: // 3D transformation: multiply vector V by 4x4 matrix M for (i = 0; i < 4; i ++) { for (j = 0; j < 4; j ++) } r[0] = M[0][0]*V[0] + M[1][0]*V[1] + M[2][0]*V[2] + M[3][0]*V[3]; r[1] = M[0][1]*V[0] + M[1][1]*V[1] + M[2][1]*V[2] + M[3][1]*V[3]; r[2] = M[0][2]*V[0] + M[1][2]*V[1] + M[2][2]*V[2] + M[3][2]*V[3]; r[3] = M[0][3]*V[0] + M[1][3]*V[1] + M[2][3]*V[2] + M[3][3]*v[3] ; Copy code For some tasks that do not require the loop variable to participate in the calculation, they can be placed outside the loop. The tasks here include calling expression functions, pointer operations, array access, etc. All operations that do not need to be performed multiple times should be grouped together and put into an init initialization program. unsigned int i; for (i=0;i<1000;i++) ; unsigned int i; for (i=1000;i>0;i--) ; Copy code The delay effect of the two functions is similar, but almost all C compilers generate 1~3 bytes less code for the latter function than for the former, because almost all MCUs have instructions to transfer to 0, and the latter method can generate such instructions. The same is true when using a while loop. Using a decrement instruction to control the loop will generate 1~3 fewer letters of code than using an increment instruction to control the loop. However, there is a loop variable i in the loop. When reading and writing array instructions, using a pre-decrement loop may cause the array to exceed the bounds, so be careful unsigned int i; unsigned int i; do { } while (i>0); Copy code Of these two loops, the length of the code generated after compiling using the do while loop is shorter than that of the while loop