TI C6000 Advanced Optimization: How the Compiler Optimizes Loops

The performance of a loop depends mainly on whether the compiler can build an effective software pipeline for it. The compiler optimizes a loop in three steps:

Get loop trip count information. This information helps the compiler decide whether to unroll the loop automatically. When the compiler cannot derive complete trip count information from the code, it falls back to a conservative optimization strategy for the loop. For best performance, the programmer should therefore give the compiler as much of this information as possible, using pragmas such as MUST_ITERATE and UNROLL.

Several key parameters are as follows:

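  • Minimum Trip Count: the smallest number of times the loop may execute

  • Maximum Trip Count: the largest number of times the loop may execute

  • Max Trip Count Factor: the largest integer known to divide the trip count evenly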
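As a rough illustration of how these three parameters reach the compiler, the sketch below annotates a loop with MUST_ITERATE(min, max, multiple) and UNROLL. The function add_offset, its offset parameter, and the specific values 8, 1024, and 4 are assumptions chosen for the example and do not come from the original post. The pragmas are guarded so that the same file still builds with a host compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the post): adds a constant to every sample. */
void add_offset(short *out, const short *in, size_t n, short offset)
{
    size_t i;

    /* The pragma below is a promise to the compiler, so check it in debug builds. */
    assert(n >= 8 && n <= 1024 && n % 8 == 0);

#if defined(__TI_COMPILER_VERSION__)
    /* Minimum trip count 8, maximum trip count 1024, trip count is a multiple of 8. */
    #pragma MUST_ITERATE(8, 1024, 8)
    /* Ask for unrolling by 4; the compiler is free to decline. */
    #pragma UNROLL(4)
#endif
    for (i = 0; i < n; i++) {
        out[i] = (short)(in[i] + offset);
    }
}
```

If a caller ever passes a trip count that breaks the stated promise, the generated code may be wrong, which is why the sketch asserts the same conditions it declares.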

Collect loop resource and dependency graph information. The iteration interval (ii) is the number of cycles between the start of one loop iteration and the start of the next in the software pipeline, and the compiler's optimization goal is to minimize ii.

Several key parameters are as follows:

  • Loop Carried Dependency Bound: the length of the longest dependency path in the loop body that carries over from one iteration to the next. A dependency here means that an instruction cannot start until an earlier instruction has finished.

Take the following code as an example:

void simple_sum(short *sum, short *in1, unsigned int N)
{
    int i;

    for (i = 0; i < N; i++)
    {
        /* load in1[i], add 1, store to sum[i] */
        sum[i] = in1[i] + 1;
    }
}

The longest dependency path is shown in the figure below: the load for the next iteration has to wait until the store from the previous iteration has completed.

Figure: Loop-carried dependency path

Loop-carried dependencies are often caused by the compiler's lack of information about pointer variables. When the exact value of a pointer is unknown, the compiler must assume that any two pointers may point to the same location. A load through one pointer therefore implies a dependency on a store through the other pointer, and vice versa. Such dependencies are usually unnecessary, and the programmer can eliminate them by adding the "restrict" keyword to the pointer declarations.
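A minimal sketch of that fix, applied to the simple_sum example, might look like the following. The name simple_sum_restrict, the const qualifier, and the null check are additions for illustration. The restrict qualifier is standard C99; it is only valid here if the caller really does pass buffers that do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* restrict tells the compiler that sum and in1 never refer to the same memory,
 * so a load from in1[] does not have to wait for an earlier store to sum[]. */
void simple_sum_restrict(short *restrict sum, const short *restrict in1, size_t n)
{
    size_t i;

    assert(sum != NULL && in1 != NULL);

    for (i = 0; i < n; i++) {
        sum[i] = (short)(in1[i] + 1);
    }
}
```

Calling this with overlapping buffers is undefined behavior, so the qualifier should only be added when the no-overlap guarantee holds for every call site.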

  • Unpartitioned Resource Bound: the minimum ii imposed by resource limits before the compiler assigns each instruction to the A-side or B-side functional units.

  • Partitioned Resource Bound: the minimum ii imposed by resource limits after the compiler has assigned each instruction to the A-side or B-side functional units.
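To make the difference between the two resource bounds concrete, here is a deliberately simplified sketch. It assumes each instruction needs exactly one unit kind (.L, .S, .M, or .D) and that each kind exists once per side, which matches the C6000's eight functional units. A real compiler also has to account for cross paths, address paths, and instructions that can run on more than one unit kind. All function and type names are hypothetical.

```c
#include <assert.h>

/* The four functional-unit kinds; each exists once on side A and once on side B. */
enum { UNIT_L, UNIT_S, UNIT_M, UNIT_D, UNIT_KINDS };

/* ceil(a / b) for a >= 0 and b > 0 */
static int ceil_div(int a, int b)
{
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

/* Before partitioning: instructions of one kind can be spread over both sides. */
int unpartitioned_bound(const int total[UNIT_KINDS])
{
    int k, bound = 1;
    for (k = 0; k < UNIT_KINDS; k++) {
        int need = ceil_div(total[k], 2);
        if (need > bound) bound = need;
    }
    return bound;
}

/* After partitioning: each side has a single unit of each kind. */
int partitioned_bound(const int side_a[UNIT_KINDS], const int side_b[UNIT_KINDS])
{
    int k, bound = 1;
    for (k = 0; k < UNIT_KINDS; k++) {
        if (side_a[k] > bound) bound = side_a[k];
        if (side_b[k] > bound) bound = side_b[k];
    }
    return bound;
}
```

With four .D instructions in total, the unpartitioned bound is 2, but if the partitioning places three of them on side A, the partitioned bound rises to 3. Under this model the partitioned bound can never be lower than the unpartitioned one.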

Find a software pipeline schedule. The compiler first sets ii to the larger of the two bounds, Loop Carried Dependency Bound and Partitioned Resource Bound, and tries to find a schedule at that ii. If the attempt fails, it increases ii by 1 and searches again, and so on. Along the way, the compiler reports why each scheduling attempt failed, together with some related information.
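The search described above can be sketched as a simple loop. This is only an illustration of the strategy, not the compiler's actual implementation: the callback type try_schedule_fn, the max_ii cutoff, and the toy predicate fits_at_or_above are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns nonzero if a schedule exists at the given ii. */
typedef int (*try_schedule_fn)(int ii, void *ctx);

/* Start at max(dependency bound, partitioned resource bound) and raise ii by 1
 * after every failed attempt. Returns the first ii that schedules, or -1 if
 * none is found up to max_ii (the cutoff is an assumption of this sketch). */
int find_ii(int dep_bound, int res_bound, int max_ii,
            try_schedule_fn try_schedule, void *ctx)
{
    int ii = dep_bound > res_bound ? dep_bound : res_bound;

    assert(try_schedule != NULL);

    for (; ii <= max_ii; ii++) {
        if (try_schedule(ii, ctx)) {
            return ii;
        }
    }
    return -1;
}

/* Toy stand-in for a real scheduler: succeeds once ii reaches a threshold. */
int fits_at_or_above(int ii, void *ctx)
{
    return ii >= *(const int *)ctx;
}
```

For example, with bounds 2 and 3 and a toy scheduler that first succeeds at ii = 5, the search tries 3 and 4, fails, and settles on 5.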

This post is from DSP and ARM Processors
 
