MCU Programming Optimization-EEWORLD

I saw this in a book and thought it was great, so I shared it with you.

A microcontroller differs vastly from a PC in storage space, memory, and operating frequency; the two are simply not comparable. When programming for a PC, you rarely have to think about space or memory usage, and the main goal is to get the function working.
On a microcontroller the situation is completely different. The Flash and RAM of a typical microcontroller are measured in KB, so its resources are pitifully scarce. For this reason, we must squeeze out every resource and get the most performance from the chip. When designing a program, follow the points below for optimization:

1. Use the smallest data type possible
If a variable can be defined as a character type (char), do not define it as an integer (int); if it can be an int, do not use a long integer (long int); and avoid floating-point variables wherever you can. Of course, once a variable is defined, do not exceed its range. If you assign a value beyond the range of the variable, the C compiler will usually not report an error, but the program will run incorrectly, and such errors are difficult to detect.
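
As an illustration of the range problem described above, the following minimal sketch shows an 8-bit addition wrapping around silently. It assumes the C99 header stdint.h is available (uint8_t stands in for this article's UINT8); the helper names add_wrapping_u8 and add_saturating_u8 are hypothetical and not taken from the book.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 8-bit addition: the result is reduced modulo 256 without any
   compiler diagnostic. */
static uint8_t add_wrapping_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Hypothetical alternative: compute in a wider type and clamp at 255
   instead of wrapping. */
static uint8_t add_saturating_u8(uint8_t a, uint8_t b)
{
    uint16_t wide = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (wide > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)wide;
}
```

Here add_wrapping_u8(200, 100) returns 44 rather than 300, while add_saturating_u8(200, 100) returns 255. Whether clamping is the right answer depends on the application; the point is only that a small type has to be chosen with its range in mind.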

2. Use increment and decrement operators
Increment and decrement operators and compound assignment expressions (such as a-=1 and a+=1) usually produce high-quality program code, because the compiler can generally emit instructions such as inc and dec. With statements such as a=a+1 or a=a-1, many C compilers generate two to three bytes of instructions.

3. Reduce the strength of operations
Replace a complex expression with a cheaper one that produces the same result.
(1) Remainder operation
N=N%8 can be changed to N=N&7.
Note: A bit operation needs only one instruction cycle, while most C compilers call a subroutine for the "%" operation, which means long code and slow execution. If you only need the remainder of a division by 2^n, you can use a bit operation instead. This holds for unsigned or non-negative values.
(2) Square operation
N=pow(3,2) can be changed to N=3*3.
Note: On a microcontroller with a built-in hardware multiplier (such as the 51 series), multiplication is much faster than the power function, because pow() works on floating-point numbers and is implemented by calling a subroutine. Even where multiplication itself needs a subroutine, that subroutine is shorter and faster than the one for the power function.
(3) Use shifts to replace multiplication and division
N=M*8 can be changed to N=M<<3.
N=M/8 can be changed to N=M>>3.
Note: Multiplication or division by 2^n can usually be replaced by a shift. Multiplying by 2^n can generate left-shift code, while multiplying by other integers or dividing by any number calls the multiplication and division subroutine. The shift code is more efficient than a call to those subroutines. In fact, multiplication by any integer constant can be built from shifts and additions. For example, N=M*9 can be changed to N=(M<<3)+M. For division, M>>n matches M/2^n only when M is unsigned or non-negative.
(4) The difference between incrementing and decrementing
For example, the delay functions we usually write count upward.
void DelayNms(UINT16 t)
{
UINT16 i,j;
for(i=0;i<t;i++)
for(j=0;j<1000;j++);
}
can be changed to
void DelayNms(UINT16 t)
{
UINT16 i,j;
for(i=t;i>0;i--)       /* i and j are unsigned, so test >0; >=0 would never end */
for(j=1000;j>0;j--);
}
Note: The two functions give similar delays, but almost all C compilers generate 1 to 3 bytes less code for the second one, because almost all MCUs have instructions that branch on zero, and the count-down form lets the compiler use them.
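
The substitutions in (1) to (3) can be checked on a PC before they go into firmware. The sketch below assumes unsigned 16-bit operands (uint16_t from stdint.h in place of UINT16); the function names are made up for this example. Results are truncated to 16 bits, just as the original multiplications would be when stored in a UINT16.

```c
#include <assert.h>
#include <stdint.h>

/* n % 8 for unsigned n: keep the three lowest bits. */
static uint16_t mod8(uint16_t n) { return (uint16_t)(n & 7u); }

/* m * 8 for unsigned m: shift left by three. */
static uint16_t mul8(uint16_t m) { return (uint16_t)(m << 3); }

/* m / 8 for unsigned m: shift right by three. */
static uint16_t div8(uint16_t m) { return (uint16_t)(m >> 3); }

/* m * 9 written as m * 8 + m. */
static uint16_t mul9(uint16_t m) { return (uint16_t)((m << 3) + m); }
```

For example, mod8(29) gives 5, div8(100) gives 12, and mul9(13) gives 117, the same as the "%", "/" and "*" forms. Many current compilers apply these rewrites themselves when the constant is known, so it is worth comparing the generated code before giving up readability.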

4. The difference between while and do...while
void DelayNus(UINT16 t)
{
while(t--)
{
NOP();
}
}
can be changed to
void DelayNus(UINT16 t)
{
do
{
NOP();
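}while(--t);
}
Note: The code generated for the do...while loop is shorter than that for the while loop. The two versions are equivalent only for t>0: with t=0 the do...while version still executes the body and then keeps looping until t wraps back to 0.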
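
To make the behavior of the two loop forms concrete, the sketch below counts how often each loop body runs. It assumes a 16-bit counter (uint16_t from stdint.h, matching UINT16); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of body executions of: while (t--) { ... } */
static uint32_t count_while(uint16_t t)
{
    uint32_t n = 0;
    while (t--) {
        n++;
    }
    return n;
}

/* Number of body executions of: do { ... } while (--t); */
static uint32_t count_do_while(uint16_t t)
{
    uint32_t n = 0;
    do {
        n++;
    } while (--t);
    return n;
}
```

For t=5 both functions return 5. For t=0 the while form returns 0, but the do...while form returns 65536, because the counter is decremented from 0 to 65535 before the first test. A caller of the do...while version therefore has to guarantee t>0.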

5. The register keyword
void UARTPrintfString(INT8 *str)
{
while(str && *str)   /* check the pointer before dereferencing it */
{
UARTSendByte(*str++);
}
}
can be changed to
void UARTPrintfString(INT8 *str)
{
register INT8 *pstr=str;
while(pstr && *pstr)
{
UARTSendByte(*pstr++);
}
}
Note: The register keyword can be used when declaring local variables. It asks the compiler to keep the variable in a general-purpose register instead of on the stack. Used appropriately, this can increase execution speed, and the more frequently the function is called, the more likely the gain. Note that the register keyword is only a suggestion to the compiler.

6. Volatile keyword
volatile is always tied to optimization. Compilers use a technique called data-flow analysis, which works out where the variables in a program are assigned, used, and invalidated. The results feed optimizations such as constant folding and constant propagation, and allow dead code to be eliminated. Generally speaking, the volatile keyword is needed in only three situations:
a) Variables that are modified in an interrupt service routine and checked by other code must be declared volatile (refer to the advanced experimental program in this book).
b) Flags shared between tasks in a multitasking environment should be declared volatile.
c) Memory-mapped hardware registers usually also need to be declared volatile, because each read or write may have a different meaning.
In short, volatile is a type qualifier. A variable declared with it may be changed by factors unknown to the compiler, such as the operating system, the hardware, or other threads. When the compiler meets a variable declared with this keyword, it no longer optimizes the code that accesses the variable, which provides stable access to special addresses.

7. Trade space for time
For data verification, the CRC16 cyclic redundancy check can also be computed with the table lookup method. The table lookup method obtains the check value faster and more efficiently. The more data there is to verify, the clearer its advantage; its only disadvantage is that the table takes up a lot of space.
//Table lookup method:
code UINT16 szCRC16Tbl[256] = {
0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50a5, 0x60c6, 0x70e7,
0x8108, 0x9129, 0xa14a, 0xb16b, 0xc18c, 0xd1ad, 0xe1ce, 0xf1ef,
0x1231, 0x0210, 0x3273, 0x2252, 0x52b5, 0x4294, 0x72f7, 0x62d6,
0x9339, 0x8318, 0xb37b, 0xa35a, 0xd3bd, 0xc39c, 0xf3ff, 0xe3de,
0x2462, 0x3443, 0x0420, 0x1401, 0x64e6, 0x74c7, 0x44a4, 0x5485,
0xa56a, 0xb54b, 0x8528, 0x9509, 0xe5ee, 0xf5cf, 0xc5ac, 0xd58d,
0x3653, 0x2672, 0x1611, 0x0630, 0x76d7, 0x66f6, 0x5695, 0x46b4,
0xb75b, 0xa77a, 0x9719, 0x8738, 0xf7df, 0xe7fe, 0xd79d, 0xc7bc,
0x48c4, 0x58e5, 0x6886, 0x78a7, 0x0840, 0x1861, 0x2802, 0x3823,
0xc9cc, 0xd9ed, 0xe98e, 0xf9af, 0x8948, 0x9969, 0xa90a, 0xb92b,
0x5af5, 0x4ad4, 0x7ab7, 0x6a96, 0x1a71, 0x0a50, 0x3a33, 0x2a12,
0xdbfd, 0xcbdc, 0xfbbf, 0xeb9e, 0x9b79, 0x8b58, 0xbb3b, 0xab1a,
0x6ca6, 0x7c87, 0x4ce4, 0x5cc5, 0x2c22, 0x3c03, 0x0c60, 0x1c41,
0xedae, 0xfd8f, 0xcdec, 0xddcd, 0xad2a, 0xbd0b, 0x8d68, 0x9d49,
0x7e97, 0x6eb6, 0x5ed5, 0x4ef4, 0x3e13, 0x2e32, 0x1e51, 0x0e70,
0xff9f, 0xefbe, 0xdfdd, 0xcffc, 0xbf1b, 0xaf3a, 0x9f59, 0x8f78,
0x9188, 0x81a9, 0xb1ca, 0xa1eb, 0xd10c, 0xc12d, 0xf14e, 0xe16f,
0x1080, 0x00a1, 0x30c2, 0x20e3, 0x5004, 0x4025, 0x7046, 0x6067,
0x83b9, 0x9398, 0xa3fb, 0xb3da, 0xc33d, 0xd31c, 0xe37f, 0xf35e,
0x02b1, 0x1290, 0x22f3, 0x32d2, 0x4235, 0x5214, 0x6277, 0x7256,
0xb5ea, 0xa5cb, 0x95a8, 0x8589, 0xf56e, 0xe54f, 0xd52c, 0xc50d,
0x34e2, 0x24c3, 0x14a0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
0xa7db, 0xb7fa, 0x8799, 0x97b8, 0xe75f, 0xf77e, 0xc71d, 0xd73c,
0x26d3, 0x36f2, 0x0691, 0x16b0, 0x6657, 0x7676, 0x4615, 0x5634,
0xd94c, 0xc96d, 0xf90e, 0xe92f, 0x99c8, 0x89e9, 0xb98a, 0xa9ab,
0x5844, 0x4865, 0x7806, 0x6827, 0x18c0, 0x08e1, 0x3882, 0x28a3,
0xcb7d, 0xdb5c, 0xeb3f, 0xfb1e, 0x8bf9, 0x9bd8, 0xabbb, 0xbb9a,
0x4a75, 0x5a54, 0x6a37, 0x7a16, 0x0af1, 0x1ad0, 0x2ab3, 0x3a92,
0xfd2e, 0xed0f, 0xdd6c, 0xcd4d, 0xbdaa, 0xad8b, 0x9de8, 0x8dc9,
0x7c26, 0x6c07, 0x5c64, 0x4c45, 0x3ca2, 0x2c83, 0x1ce0, 0x0cc1,
0xef1f, 0xff3e, 0xcf5d, 0xdf7c, 0xaf9b, 0xbfba, 0x8fd9, 0x9ff8,
0x6e17, 0x7e36, 0x4e55, 0x5e74, 0x2e93, 0x3eb2, 0x0ed1, 0x1ef0
};
UINT16 CRC16CheckFromTbl(UINT8 *buf,UINT8 len)
{
UINT16 i;
UINT16 uncrcReg = 0, uncrcConst = 0xffff;
for(i = 0;i < len;i ++)
{
uncrcReg = (uncrcReg << 8) ^ szCRC16Tbl[(((uncrcConst ^ uncrcReg) >> 8)
^ *buf++) & 0xFF];
uncrcConst <<= 8;
}
return uncrcReg ;
}
If the system requires strong real-time performance, it is recommended to use the table lookup method in the CRC16 cyclic redundancy check to exchange space for time.
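
For comparison, here is a sketch of the opposite trade, time for space: a bit-by-bit CRC16 that needs no table. It assumes the polynomial 0x1021, which is what the table entries above suggest (szCRC16Tbl[1] is 0x1021); the function names and the init parameter are assumptions made for this example and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC16 with polynomial 0x1021, most significant bit first.
   'init' is the starting value of the CRC register. */
static uint16_t crc16_bitwise(const uint8_t *buf, size_t len, uint16_t init)
{
    uint16_t crc = init;
    while (len--) {
        crc ^= (uint16_t)((uint16_t)*buf++ << 8);   /* bring in next byte */
        for (uint8_t bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* One table entry is the CRC of the single byte 'index', starting from 0. */
static uint16_t crc16_table_entry(uint8_t index)
{
    return crc16_bitwise(&index, 1, 0x0000u);
}
```

With an initial value of 0xFFFF the string "123456789" gives 0x29B1, the usual check value for this CRC variant, and crc16_table_entry(255) reproduces the last table entry, 0x1ef0. The bitwise version executes eight shift-and-test steps per byte where the table version does a single lookup; that is the time the 512-byte table buys back.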

8. Replace functions with macros
First of all, it is not advisable to turn every function into a macro. However, it is worthwhile to replace some small, basic functions with macros.
UINT8 Max(UINT8 A,UINT8 B)
{
return (A>B?A:B);
}
can be changed to
#define MAX(A, B) ((A)>(B)?(A):(B))
Note: The difference between a function and a macro is that a macro costs space, while a function costs time. A function call uses the system stack to save data. If the compiler has a stack-check option enabled, it also inserts some assembly statements to check the current stack. In addition, the CPU has to save and restore the current context around the call by pushing to and popping from the stack, so a function call takes some CPU time. A macro has none of this overhead: it is simply pre-written code pasted into the program at each point of use, so no function call is generated and the only cost is space. That cost is particularly noticeable when the same macro is used many times.
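
One caveat is worth showing in code: a macro pastes its arguments textually, so an argument with side effects may be evaluated more than once. The sketch below uses a fully parenthesized MAX macro and a hypothetical next_value() function that counts its own calls; all names other than MAX are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX(A, B) ((A) > (B) ? (A) : (B))

/* Function version, for comparison. */
static uint8_t max_u8(uint8_t a, uint8_t b)
{
    return (a > b) ? a : b;
}

static uint8_t calls = 0;               /* how often next_value() has run */

static uint8_t next_value(void)
{
    calls++;
    return 5;
}

/* Returns the number of next_value() calls made by the macro form. */
static uint8_t calls_with_macro(void)
{
    calls = 0;
    (void)MAX(next_value(), 3);         /* expands to two next_value() calls */
    return calls;
}

/* Returns the number of next_value() calls made by the function form. */
static uint8_t calls_with_function(void)
{
    calls = 0;
    (void)max_u8(next_value(), 3);      /* argument evaluated exactly once */
    return calls;
}
```

calls_with_macro() returns 2 and calls_with_function() returns 1. When the argument is a register read or a function with side effects, store it in a local variable first and pass the variable to the macro.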

9. Use algorithms appropriately
Suppose we need the sum of the integers from 1 to 100.
As programmers, we can type out the following calculation without hesitation:
UINT16 Sum(void)
{
UINT8 i;
UINT16 s=0;   /* 5050 does not fit in a UINT8, and s must start at 0 */
for(i=1;i<=100;i++)
{
s+=i;
}
return s;
}
Everyone will think of this method, but its efficiency is not satisfactory. We need to use our brains and solve the problem with a mathematical formula, which raises the computing efficiency to a higher level.
UINT16 Sum(void)
{
UINT16 s;
s=(100*(100+1))>>1;
return s;
}
The outcome is clear. The same result obtained by different calculation methods can differ greatly in running efficiency, so we should use mathematical methods to maximize the efficiency of program execution.
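
The same idea generalizes from 100 to any upper limit n. The following sketch assumes stdint.h types and uses the closed form n(n+1)/2; the names sum_loop and sum_formula are hypothetical. A 32-bit result is used so that every 16-bit n is covered without overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward loop: n additions. */
static uint32_t sum_loop(uint16_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 1; i <= n; i++) {
        s += i;
    }
    return s;
}

/* Closed form: one addition, one multiplication, one shift. */
static uint32_t sum_formula(uint16_t n)
{
    return ((uint32_t)n * ((uint32_t)n + 1u)) >> 1;
}
```

Both return 5050 for n=100, but the loop costs n iterations while the formula costs a constant number of operations regardless of n.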

10. Use pointers instead of arrays
In many cases pointer operations can replace array indexing, which often produces faster and shorter code. The difference is more pronounced with multidimensional arrays. The two loops below do the same thing, but with different efficiency.
UINT8 szArrayA[64];
UINT8 szArrayB[64];
UINT8 i;
UINT8 *p=szArrayA;
for(i=0;i<64;i++)szArrayB[i]=szArrayA[i];   /* array index method */
for(i=0;i<64;i++)szArrayB[i]=*p++;          /* pointer method */
The advantage of the pointer method is that once the address of szArrayA has been loaded into the pointer p, each pass of the loop only needs to increment p. With the array index method, the element address must be computed from the value of i on every pass.
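
The two access styles can be wrapped in small functions and checked against each other. The sketch below assumes stdint.h and string.h; copy_indexed, copy_pointer and copies_agree are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Array index method: the element address is derived from i on each pass. */
static void copy_indexed(uint8_t *dst, const uint8_t *src, uint8_t n)
{
    for (uint8_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Pointer method: both pointers are simply advanced on each pass. */
static void copy_pointer(uint8_t *dst, const uint8_t *src, uint8_t n)
{
    while (n--) {
        *dst++ = *src++;
    }
}

/* Returns 1 if both methods produce an identical copy of a test pattern. */
static int copies_agree(void)
{
    const uint8_t src[4] = { 1, 2, 3, 4 };
    uint8_t a[4] = { 0 };
    uint8_t b[4] = { 0 };
    copy_indexed(a, src, 4);
    copy_pointer(b, src, 4);
    return memcmp(a, src, 4) == 0 && memcmp(b, src, 4) == 0;
}
```

How much the pointer form saves depends on the compiler and the CPU: an optimizing compiler may generate the same code for both, while simple 8-bit compilers tend to benefit more.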

11. Forced conversion
The first essence of the C language is the use of pointers, and the second is the use of forced conversion (type casts). Proper use of pointers and forced conversion not only improves program efficiency but also makes the program more concise. Forced conversion plays an important role in C programming. Five typical examples are given below.
Example 1: Convert a signed byte integer to an unsigned byte integer
UINT8 a=0;
INT8 b=-3;
a=(UINT8)b;
Example 2: In big-endian mode (8051 series microcontrollers are big-endian), combine a two-byte array into an unsigned 16-bit integer
Method 1: Use the bit shift method.
UINT8 a[2]={0x12,0x34};
UINT16 b=0;
b=(a[0]<<8)|a[1];
Result: b=0x1234
Method 2: Forced type conversion.
UINT8 a[2]={0x12,0x34};
UINT16 b=0;
b=*(UINT16 *)a; //Forced conversion
Result: b=0x1234
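
The pointer cast in Method 2 gives 0x1234 only on a big-endian target; on a little-endian CPU the same bytes read as 0x3412, and on some cores a misaligned 16-bit access can fault. The sketch below, which assumes stdint.h and string.h and uses made-up function names, contrasts the two behaviors on whatever machine compiles it. memcpy is used in place of the cast so that the example is well defined everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint8_t sample[2] = { 0x12, 0x34 };

/* Shift method: always treats p[0] as the high byte. */
static uint16_t be16_from_bytes(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Reinterprets the two bytes in the CPU's own byte order. */
static uint16_t native16_from_bytes(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 if the machine stores the high byte at the lower address. */
static int cpu_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

be16_from_bytes(sample) is 0x1234 on every machine, whereas native16_from_bytes(sample) is 0x1234 only where cpu_is_big_endian() returns 1. If the code may ever move to another CPU, the shift method is the safer default.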
Example 3: Save the contents of a structure.
Method 1: Save the members one by one.
typedef struct _ST
{
UINT8 a;
UINT8 b;
UINT8 c;
UINT8 d;
UINT8 e;
}ST;
ST s;
UINT8 a[5]={0};
s.a=1;
s.b=2;
s.c=3;
s.d=4;
s.e=5;
a[0]=s.a;
a[1]=s.b;
a[2]=s.c;
a[3]=s.d;
a[4]=s.e;
Result: The contents stored in array a are 1, 2, 3, 4, 5.
Method 2: Forced type conversion.
typedef struct _ST
{
UINT8 a;
UINT8 b;
UINT8 c;
UINT8 d;
UINT8 e;
}ST;
ST s;
UINT8 a[5]={0};
UINT8 *p=(UINT8 *)&s; //Forced conversion to a byte pointer
UINT8 i=0;
s.a=1;
s.b=2;
s.c=3;
s.d=4;
s.e=5;
for(i=0;i<sizeof(s);i++)
{
a[i]=*p++;
}
Result: The contents stored in array a are 1, 2, 3, 4, 5.
Example 4: In big-endian mode (8051 series microcontrollers are big-endian), assign a structure containing bit fields to an unsigned byte integer
Method 1: Assign values bit by bit.
typedef struct __BYTE2BITS
{
UINT8 _bit7:1;
UINT8 _bit6:1;
UINT8 _bit5:1;
UINT8 _bit4:1;
UINT8 _bit3:1;
UINT8 _bit2:1;
UINT8 _bit1:1;
UINT8 _bit0:1;
}BYTE2BITS;
BYTE2BITS Byte2Bits;
UINT8 a=0;
Byte2Bits._bit7=0;
Byte2Bits._bit6=0;
Byte2Bits._bit5=1;
Byte2Bits._bit4=1;
Byte2Bits._bit3=1;
Byte2Bits._bit2=1;
Byte2Bits._bit1=0;
Byte2Bits._bit0=0;
a|= Byte2Bits._bit7<<7;
a|= Byte2Bits._bit6<<6;
a|= Byte2Bits._bit5<<5;
a|= Byte2Bits._bit4<<4;
a|= Byte2Bits._bit3<<3;
a|= Byte2Bits._bit2<<2;
a|= Byte2Bits._bit1<<1;
a|= Byte2Bits._bit0<<0;
Result: a=0x3C
Method 2: Forced conversion.
typedef struct __BYTE2BITS
{
UINT8 _bit7:1;
UINT8 _bit6:1;
UINT8 _bit5:1;
UINT8 _bit4:1;
UINT8 _bit3:1;
UINT8 _bit2:1;
UINT8 _bit1:1;
UINT8 _bit0:1;
}BYTE2BITS;
BYTE2BITS Byte2Bits;
UINT8 a=0;
Byte2Bits._bit7=0;
Byte2Bits._bit6=0;
Byte2Bits._bit5=1;
Byte2Bits._bit4=1;
Byte2Bits._bit3=1;
Byte2Bits._bit2=1;
Byte2Bits._bit1=0;
Byte2Bits._bit0=0;
a=*(UINT8 *)&Byte2Bits;
Result: a=0x3C
Example 5: In big-endian mode (8051 series microcontrollers are big-endian), assign an unsigned byte integer to a structure containing bit fields.
Method 1: Assign values bit by bit.
typedef struct __BYTE2BITS
{
UINT8 _bit7:1;
UINT8 _bit6:1;
UINT8 _bit5:1;
UINT8 _bit4:1;
UINT8 _bit3:1;
UINT8 _bit2:1;
UINT8 _bit1:1;
UINT8 _bit0:1;
}BYTE2BITS;
BYTE2BITS Byte2Bits;
UINT8 a=0x3C;
Byte2Bits._bit7=(a>>7)&0x01;
Byte2Bits._bit6=(a>>6)&0x01;
Byte2Bits._bit5=(a>>5)&0x01;
Byte2Bits._bit4=(a>>4)&0x01;
Byte2Bits._bit3=(a>>3)&0x01;
Byte2Bits._bit2=(a>>2)&0x01;
Byte2Bits._bit1=(a>>1)&0x01;
Byte2Bits._bit0=a&0x01;
Method 2: Forced conversion.
typedef struct __BYTE2BITS
{
UINT8 _bit7:1;
UINT8 _bit6:1;
UINT8 _bit5:1;
UINT8 _bit4:1;
UINT8 _bit3:1;
UINT8 _bit2:1;
UINT8 _bit1:1;
UINT8 _bit0:1;
}BYTE2BITS;
BYTE2BITS Byte2Bits;
UINT8 a=0x3C;
Byte2Bits=*(BYTE2BITS *)&a;

12. Reduce function call parameters
Using global variables is more efficient than passing parameters to functions. It eliminates the time needed to push the call parameters onto the stack and pop them off again when the function returns. However, global variables affect the modularity and reentrancy of the program, so use them with caution.

13. Order the cases in a switch statement by frequency of occurrence
The switch statement is a common programming construct. The compiler often translates it into nested if-else-if code and compares the cases in order; when a match is found, it jumps to the statements for that case. Use it with care: every test and jump in machine code spends precious processor time just deciding what to do next. To increase speed, order the cases by their relative frequency of occurrence. In other words, put the most likely case first and the least likely case last.

14. Convert large switch statements into nested switch statements
When a switch statement has many case labels, it is wise to convert it into nested switch statements to reduce the number of comparisons. Put the frequently occurring case labels in one switch statement and make it the outermost layer, and put the less frequent case labels in another switch statement nested inside it. For example, the following program segment puts the relatively low-frequency cases under the default label.
UINT8 ucCurTask=1;
void Task1(void);
void Task2(void);
void Task3(void);
void Task4(void);
…………
void Task16(void);
switch(ucCurTask)
{
case 1: Task1();break;
case 2: Task2();break;
case 3: Task3();break;
case 4: Task4();break;
………………………
case 16: Task16();break;
default:break;
}
can be changed to
UINT8 ucCurTask=1;
void Task1(void);
void Task2(void);
void Task3(void);
void Task4(void);
……………
void Task16(void);
switch(ucCurTask)
{
case 1: Task1();break;
case 2: Task2();break;
default:
switch(ucCurTask)
{
case 3: Task3();break;
case 4: Task4();break;
……………………
case 16: Task16();break;
default:break;
}
break;
}
Since the switch statement is equivalent to nested if-else-if code, a large if statement should likewise be converted into nested if statements.
UINT8 ucCurTask=1;
void Task1(void);
void Task2(void);
void Task3(void);
void Task4(void);
……………
void Task16(void);
if (ucCurTask==1) Task1();
else if(ucCurTask==2) Task2();
else
{
if (ucCurTask==3) Task3();
else if(ucCurTask==4) Task4();
………………
else Task16();
}

15. Use of function pointers
When a switch statement has many case labels, or an if statement makes too many comparisons, you can use function pointers in place of the switch or if statement to improve execution speed. For examples of this usage, see the electronic menu experiment code, the USB experiment code, and the network experiment code.
UINT8 ucCurTask=1;
void Task1(void);
void Task2(void);
void Task3(void);
void Task4(void);
…………
void Task16(void);
switch(ucCurTask)
{
case 1: Task1();break;
case 2: Task2();break;
case 3: Task3();break;
case 4: Task4();break;
………………………
case 16: Task16();break ;
default:break;
}
can be changed to
UINT8 ucCurTask=1;
void Task1(void);
void Task2(void);
void Task3(void);
void Task4(void);
……………
void Task16(void);
void (*szTaskTbl[16])(void)={Task1,Task2,Task3,Task4,…,Task16};
/* Task numbers start at 1 but array indexes start at 0, hence the -1. */
Calling method 1: (*szTaskTbl[ucCurTask-1])();
Calling method 2: szTaskTbl[ucCurTask-1]();
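
A table lookup has no default branch, so an out-of-range task number would call through a stray address. The following sketch adds a range check around the call. It uses three stand-in tasks instead of sixteen; the names TaskFn, task_table, last_task and run_task are assumptions made for this example, and the task bodies only record which task ran.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*TaskFn)(void);

static uint8_t last_task = 0;           /* set by the stand-in tasks below */

static void Task1(void) { last_task = 1; }
static void Task2(void) { last_task = 2; }
static void Task3(void) { last_task = 3; }

static const TaskFn task_table[] = { Task1, Task2, Task3 };

#define TASK_COUNT (sizeof task_table / sizeof task_table[0])

/* Runs task number 'id' (1-based, like the case labels).
   Returns 1 if a task ran, 0 if 'id' was out of range. */
static uint8_t run_task(uint8_t id)
{
    if (id == 0 || id > TASK_COUNT) {
        return 0;
    }
    task_table[id - 1]();
    return 1;
}
```

run_task(2) calls Task2 and returns 1, while run_task(0) and run_task(4) return 0 without calling anything. The dispatch cost is one comparison pair and one indexed call, no matter how many tasks the table holds.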

16. Loop nesting
Loops are often used in programming, and loop nesting often occurs. Let's take the for loop as an example.
UINT8 i,j;
for(i=0;i<255;i++)
{
for(j=0;j<25;j++)
{
………………
}
}
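Placing the loop with more iterations on the outside wastes time, because the inner loop has to be set up and exited on every outer pass. The recommended approach is therefore to put the loop with fewer iterations on the outside and the longer loop inside.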
UINT8 i,j;
for(j=0;j<25;j++)
{
for(i=0;i<255;i++)
{
………………
}
}
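
A rough way to see the effect of loop order is to count how often the inner loop is started. The sketch below assumes stdint.h; LoopCost and nest_cost are hypothetical names, and the counters only model loop overhead, not real cycle counts.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t inner_setups;   /* how many times the inner loop was started */
    uint16_t body_runs;      /* how many times the innermost body ran */
} LoopCost;

/* Runs an outer loop of 'outer' passes around an inner loop of 'inner'
   passes and reports both counters. */
static LoopCost nest_cost(uint8_t outer, uint8_t inner)
{
    LoopCost cost = { 0, 0 };
    for (uint8_t i = 0; i < outer; i++) {
        cost.inner_setups++;
        for (uint8_t j = 0; j < inner; j++) {
            cost.body_runs++;
        }
    }
    return cost;
}
```

nest_cost(255, 25) and nest_cost(25, 255) both run the body 6375 times, but the first starts the inner loop 255 times and the second only 25 times. Whether that difference is measurable depends on the compiler and the CPU, and reordering is only valid when the loop body does not depend on the order of iteration.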

17. Inline function
In C++, the keyword inline can be added to any function declaration. It asks the compiler to replace every call to the function with the code of the function body. This is faster than a function call in two respects: first, it saves the execution time of the call instruction; second, it saves the time needed to pass arguments. However, the speed gain comes at the cost of a longer program, so more ROM is required. The optimization is most effective when the inline function is called frequently and contains only a few lines of code.
If your compiler supports the inline keyword in C (note: C, not C++) and the microcontroller has enough ROM, you can consider adding the inline keyword. Compilers that support the inline keyword include ADS1.2 and RealView MDK.
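
As a small sketch of what this looks like in C, the example below assumes a C99 compiler and stdint.h; clamp_u8 and clamp_all are names invented here. The helper is declared static inline so that the definition stays local to the file and the compiler is free to expand it at each call site.

```c
#include <assert.h>
#include <stdint.h>

/* Short helper that is a good candidate for inlining: a few lines,
   called once per sample. */
static inline uint8_t clamp_u8(uint8_t value, uint8_t lo, uint8_t hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Example caller: limits every sample in the buffer in place. */
static void clamp_all(uint8_t *samples, uint8_t count, uint8_t lo, uint8_t hi)
{
    while (count--) {
        *samples = clamp_u8(*samples, lo, hi);
        samples++;
    }
}
```

Unlike a macro, clamp_u8 evaluates each argument exactly once and is type-checked. As with register, inline is a request: the compiler may still decide to emit an ordinary call.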

18. Start with the compiler
Many compilers offer one optimization mode that favors execution speed and another that favors small code size. For example, when compiling in the Keil development environment, you can choose to favor execution speed (Favor Speed) or to favor small code size (Favor Size). GCC-based development environments generally provide optimization options such as -O0, -O1, -O2, -O3, and -Os. Code optimized with -O2 is generally a good choice for execution speed, and code optimized with -Os takes up the least space.

19. Embedded assembly: the killer weapon
Assembly language is the most efficient computer language. General project development is usually done in C, because embedded assembly hurts the portability and readability of the code, and the assembly instructions of different platforms are incompatible. Still, programmers who insist on the ultimate running efficiency embed assembly in their C code, which is known as "mixed programming".
Note: If you want to embed assembly, you must have a deep understanding of assembly. Do not use embedded assembly unless it is absolutely necessary.

Study these techniques and verify them through practice in your own development work.
