Advanced TI C6000 Optimization: How the Compiler Optimizes Loops
The performance of a loop depends mainly on whether the compiler can build an efficient software pipeline for it. The compiler optimizes a loop in three steps:
Gather loop trip count information. This information helps the compiler decide whether to unroll the loop automatically. When the compiler cannot derive complete trip count information from the code, it falls back to a conservative optimization strategy for the loop. For the best performance, the programmer should therefore supply as much of this information as possible, through pragmas such as MUST_ITERATE and UNROLL.
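As a minimal sketch of how these pragmas might be used, the function below promises the compiler a trip count between 8 and 1024 that is a multiple of 4, and requests unrolling by 4. The function name add_one and the chosen bounds are assumptions made for illustration; the pragmas are guarded so that compilers other than TI's skip them.

```c
#include <assert.h>

/* Adds 1 to each of n input samples.
   Assumption for this sketch: callers always pass 8 <= n <= 1024,
   with n a multiple of 4. */
void add_one(short *out, const short *in, int n)
{
    int i;

    /* MUST_ITERATE is a promise to the compiler, not a runtime check,
       so verify the promise in debug builds. */
    assert(n >= 8 && n <= 1024 && n % 4 == 0);

#ifdef __TI_COMPILER_VERSION__  /* TI-specific pragmas; skipped elsewhere */
    /* UNROLL(4): ask the compiler to unroll the loop by a factor of 4.
       MUST_ITERATE(min, max, multiple): trip count facts for the loop. */
    #pragma UNROLL(4)
    #pragma MUST_ITERATE(8, 1024, 4)
#endif
    for (i = 0; i < n; i++)
    {
        out[i] = in[i] + 1;
    }
}
```

If a caller ever breaks the promised trip count, the generated code may be wrong, which is why the sketch pairs the pragma with an assert.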
Several key parameters are as follows:
Collect loop resource and dependency graph information. The number of cycles between the start of one loop iteration and the start of the next is called the iteration interval (ii), and the compiler's optimization goal is to minimize ii.
Several key parameters are as follows:
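- Loop Carried Dependency Bound: the minimum ii imposed by the longest dependency path that crosses loop iterations. Take the following code as an example: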
void simple_sum(short *sum, short *in1, unsigned int N)
{
    int i;
    for (i = 0; i < N; i++)
    {
        /* Load one input element, add 1, store one output element. */
        sum[i] = in1[i] + 1;
    }
}
The longest dependency path is shown in the figure below: the load in the next iteration must wait for the store from the previous iteration to complete.
Figure: Loop dependency path
In many cases, loop-carried dependencies arise because the compiler lacks information about pointer variables. When it cannot determine what the pointers refer to, the compiler must assume that any two pointers may point to the same location. A load through one pointer is therefore treated as dependent on a store through the other, and vice versa. Such dependencies are usually unnecessary, and the programmer can eliminate them by adding the "restrict" keyword.
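A minimal sketch of the same loop with "restrict" applied is shown here. The function name simple_sum_restrict is an assumption for illustration. The qualifier is a promise from the programmer that sum and in1 never overlap; if they do overlap, the behavior is undefined.

```c
/* Same computation as simple_sum, but the restrict qualifiers tell the
   compiler that sum and in1 refer to non-overlapping memory, so a load
   from in1 need not wait for an earlier store to sum. */
void simple_sum_restrict(short * restrict sum,
                         const short * restrict in1,
                         unsigned int N)
{
    unsigned int i;
    for (i = 0; i < N; i++)
    {
        sum[i] = in1[i] + 1;
    }
}
```

The caller is responsible for keeping the promise, for example by passing two separate arrays.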
- Unpartitioned Resource Bound: the minimum ii imposed by resource limits before the compiler assigns each instruction to the A-side or B-side functional units.
- Partitioned Resource Bound: the minimum ii imposed by resource limits after the compiler assigns each instruction to the A-side or B-side functional units.
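The difference between the two resource bounds can be sketched with a simplified model. The code below is an illustration only, not the compiler's actual algorithm: it assumes a single unit type with one unit on the A side and one on the B side, and all function names are hypothetical.

```c
/* Simplified model (an assumption for illustration): one unit type,
   one such unit on side A and one on side B, and every instruction
   occupies its unit for exactly one cycle per iteration. */

/* Before partitioning: the instructions can be spread over both units,
   so the bound is ceil(total uses / 2). */
int unpartitioned_bound(int uses_a, int uses_b)
{
    int total = uses_a + uses_b;
    return (total + 1) / 2;
}

/* After partitioning: each side must serve its own instructions,
   so the busier side sets the bound. */
int partitioned_bound(int uses_a, int uses_b)
{
    return uses_a > uses_b ? uses_a : uses_b;
}
```

In this model, three instructions split 2 and 1 between the sides give a bound of 2 both before and after partitioning, while a 3 and 0 split raises the partitioned bound to 3. An unbalanced partition can therefore only make the bound worse, never better.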
Find a software pipeline schedule. The compiler first sets ii to the larger of the two bounds, the Loop Carried Dependency Bound and the Partitioned Resource Bound, and then tries to find a schedule at that ii. If the attempt fails, it increases ii by 1 and searches again, and so on. During this process, the compiler reports the reason each scheduling attempt failed, along with some related information.
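The search described above can be sketched as a simple loop. This is a sketch under stated assumptions, not the compiler's implementation: try_schedule is a hypothetical stand-in for the real scheduler, ii_limit is an assumed cutoff, and toy_schedule is a made-up predicate that succeeds only at ii of 9 or more.

```c
/* Hypothetical stand-in for the compiler's scheduler:
   returns nonzero if a schedule exists at the given ii. */
typedef int (*schedule_fn)(int ii);

/* Toy predicate for demonstration only: pretend scheduling
   succeeds once ii reaches 9. */
int toy_schedule(int ii)
{
    return ii >= 9;
}

/* Start at max(dependency bound, partitioned resource bound) and
   raise ii by 1 after each failed attempt. Returns the first ii that
   schedules, or -1 if none is found up to ii_limit (assumed cutoff). */
int find_ii(int dep_bound, int part_res_bound, int ii_limit,
            schedule_fn try_schedule)
{
    int ii = dep_bound > part_res_bound ? dep_bound : part_res_bound;

    for (; ii <= ii_limit; ii++)
    {
        if (try_schedule(ii))
        {
            return ii;
        }
    }
    return -1;
}
```

With bounds of 7 and 8, the search starts at ii = 8, fails there under the toy predicate, and succeeds at ii = 9.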