The ready table setting, clearing, and searching algorithm of μc/OS-II is an efficient, cross-platform program. It uses two search arrays, OSMapTbl[8] and OSUnMapTbl[256], to increase the speed of searching the ready table and obtain the highest priority ready task as soon as possible.
Cortex-M3 is a newer architecture version of ARM, mainly used in the field of single-chip microcomputers. More and more 32-bit chips are produced based on it; Cortex-M3 only supports Thumb-2 instruction set, which can achieve better performance between performance and code density.
1 Reasons and pros and cons of changing the algorithm on ARM
Since the ready list operation is run in the interrupt-disabled state, its execution affects the interrupt response time of the system. Therefore, the efficiency of the ready list operation algorithm is one of the benchmarks for measuring the quality of the real-time operating system.
In the instruction set used by Cortex-M3, some instruction functions cannot be underestimated, such as leading zero count clz, word bit inversion rbit, bit clear bic. Among them, clz and bic point out another way for the high priority acquisition algorithm of the μC/OS ready table.
(1) Advantages after the change
① Save storage space. The search arrays OSMapTbl[8] and OSUnMapTbl[256] are no longer used. The purpose of setting up these two arrays is to improve the efficiency of searching the ready table.
② Improve search efficiency. clz is a single-cycle instruction that uses addition instructions with shifts, which greatly shortens the operation time.
③ The number of tasks supported by μC/Os-Ⅱ has been increased from 64 to 1,024 (the number of tasks supported by version 2.84 has reached 256, but the efficiency has decreased).
(2) Existing deficiencies
① Realview MDK (version 3.20 and its instruction simulator are used here) does not yet support the use of Thumb-2 instruction inline assembly in C language programs. When using inline assembly functions, function calls (jump returns) reduce execution efficiency.
② The C language does not support the clz instruction enough, so the new algorithm is not cross-platform. However, given that ARM chips are widely used and the instruction is widely compatible with chips after ARM9, the application space is still broad.
2 μc/OS-Ⅱ Ready Table Algorithm Introduction and Specific Changes
The μC/OS-II ready table is an array. The value of a bit in the array element (1 or 0) corresponds to whether a task is ready or not. The position of the bit in the array indicates the priority of the task. When the highest priority task that is ready needs to be scheduled to run, the task is searched in the ready table.
2.1 Introduction to μc/OS-II Ready List Algorithm
One solution is to determine whether each item in the array is 0 in turn: if > 0, enter the item and search for the 1-bit position with the smallest weight; if = 0, add a base to the priority and check the next item until the priority is found.
μc/Os-Ⅱ is one step ahead and sets a variable to determine whether each item in the ready list is 0, called the ready list group. A bit in the ready list group is 0 or 1, corresponding to whether the value of an item in the ready list is 0. By finding the position of the least weighted bit in the ready list group that is set to 1, the index corresponding to the first ready list item > 0 is determined, thereby avoiding loops and greatly improving efficiency.
2.2 Modification method and source code
The clz algorithm adopts the idea of μc/Os-Ⅱ and optimizes it by using the clz instruction. The difference is that clz searches from right to left, and the high-weight bit of the binary corresponds to the high priority, while the priority of μc/Os-Ⅱ is small.
Considering that sometimes not many tasks are used, it is a waste to use an array as a ready list. Therefore, when the total number of tasks is less than 32, a 32-bit unsigned integer variable is used as the ready list. Note that the ready list group variable OSRdyGrp is used as the ready list at this time.
The constant OS_LES_TSK indicates whether to use a smaller number of tasks. 0 indicates using a maximum of 32 tasks, and 1 indicates using a maximum of 1,024 tasks.
The constant RdySt sets the most significant bit of a 32-bit integer to 1 for shifting.
2.3 C language implementation
The following algorithm is written using a function with an embedded clz instruction to implement the setting and clearing of tasks with a specified priority in the ready list, and to search for the highest priority of the ready tasks in the ready list.
Some documents require that bx r14 in the program must be written, but after checking the disassembled code, the compiler has already added it. It seems that the compiler has been upgraded. Whether there will be errors depends on the compiler used. It is recommended to write it according to the specification. Since the return of the embedded function call is time-consuming, the search algorithm cannot be fully utilized. It is necessary to improve the compiled assembly code to achieve higher efficiency, or rewrite this part of the program using assembly code. [page]
2.4 THUMB-2 assembly instruction implementation
Tips for writing programs in assembly language: In the setup and clearing functions of the highest priority task, the C language operator "︱=" is equivalent to the assembly instruction "orr", and "&=~" is equivalent to the assembly instruction "bic. Both instructions can perform pre-shift operations, greatly improving execution efficiency. You can check the disassembled source code to see if the C compiler takes advantage of this convenience.
In the search function, the inline assembly call in the C language program can be omitted to reduce redundant instructions. The pseudo code is as follows:
ldr rO, =OsRdyGrp; load ready table group variable OSRdyGrp address
It can be seen that, in addition to the data loading instructions, the core algorithm of the search has only 3 instructions (only 1 instruction when using <32 tasks). However, when actually designing the algorithm, it is also necessary to consider the instruction pipeline stall to achieve the best effect.
2.5 Introduction to the source code of μC/OS-Ⅱ2.84 version
The following is the translated and sorted μC/OS-Ⅱ priority search algorithm source code (version 2.84). The longer comments are added to explain the algorithm.
The clz highest priority search algorithm is different from the new algorithm of μC/OS-Ⅱ: the returned results are 8-bit and 16-bit integers, respectively. This is because 8 bits can no longer represent values > 255; the clz algorithm uses more 16- or 32-bit integers in the process to fully utilize the chip performance.
3 Scope of application
The waiting task list uses a similar process to the ready list operation. Note that the data type and algorithm must be changed at the same time. Although the algorithm is executed on Cortex-M3, it is applicable to ARM9 and later chips. Chips that support the ARM instruction set can use inline assembly in C language, and there is no need to write assembly search functions.
The algorithm described in this article is applicable to the following two situations.
①Using μC/OS-Ⅱ system:
◆Require more task priorities;
◆Requires superior product performance or time-critical applications, want to further
Steps to improve efficiency;
◆Study, research or hope to optimize μC/OS-Ⅱ to expand its application range.
② Not using μC/OS-Ⅱ system:
◆Transplant and modify the ready list algorithms of other operating systems;
◆Write a new operating system or implement a scheduler;
◆Programming enthusiasts learn from and improve programming methods.
Conclusion
When Cortex-M3 was launched, I decided that it was a powerful tool for transitioning from single-chip microcomputers to ARM. Its small storage capacity makes it more suitable for small real-time systems. In the process of learning μC/OS-Ⅱ, I found that its ready list operation algorithm might be better after modification, so I did the experiment described in this article.