C6XX Optimization Experience Summary

1. Original program

    for (i = LO_CHAN; i <= HI_CHAN; i++)
    {
        norm_shift = norm_l(st->ch_noise[i]);
        Ltmp = L_shl(st->ch_noise[i], norm_shift);

        norm_shift1 = norm_l(st->ch_enrg[i]);
        Ltmp3 = L_shl1(st->ch_enrg[i], norm_shift1 - 1);

        Ltmp2 = L_divide(Ltmp3, Ltmp);
        Ltmp2 = L_shr(Ltmp2, 27 - 1 + norm_shift1 - norm_shift);  /* scaled as 27,4 */

        if (Ltmp2 == 0)
            Ltmp2 = 1;

        Ltmp1 = fnLog10(Ltmp2);
        Ltmp3 = L_add(Ltmp1, LOG_OFFSET - 80807124);  /* -round(log10(2^4) * 2^26) */
        Ltmp2 = L_mult(TEN_S5_10, extract_h(Ltmp3));
        if (Ltmp2 < 0)
            Ltmp2 = 0;

        /* 0.1875 scaled as 10,21 */
        Ltmp1 = L_add(Ltmp2, CONST_0_1875_S10_21);

        /* tmp / 0.375; 2.667 scaled as 5,10; Ltmp is scaled 15,16 */
        Ltmp = L_mult(extract_h(Ltmp1), CONST_2_667_S5_10);
        ch_snr[i] = extract_h(Ltmp);
    }

2. Optimized program

    // Because the loop body is too large, split it into two loops and inline
    // the called functions so that the loops can be software-pipelined.
    // L_div_tmp[] saves the intermediate values produced by the split.
    for (i = LO_CHAN; i <= HI_CHAN; i++)
    {
        //norm_shift = norm_l(st->ch_noise[i]);
        norm_shift = _norm(st->ch_noise[i]);
        Ltmp = _sshl(st->ch_noise[i], norm_shift);

        //norm_shift1 = norm_l(st->ch_enrg[i]);
        norm_shift1 = _norm(st->ch_enrg[i]);

        //Ltmp3 = L_shl1(st->ch_enrg[i], norm_shift1 - 1);
        LLtmp1 = st->ch_enrg[i];
        LLtmp1 = LLtmp1 << (norm_shift1 + 7);
        Ltmp3 = (Word32)(LLtmp1 >> 8);

        Ltmp2 = IL_divide(Ltmp3, Ltmp);

        //Ltmp2 = L_shr(Ltmp2, 27 - 1 + norm_shift1 - norm_shift);
        Ltmp2 = (Ltmp2 >> (27 - 1 + norm_shift1 - norm_shift));
        if (Ltmp2 == 0)
            Ltmp2 = 1;
        L_div_tmp[i] = Ltmp2;
    }

    for (i = LO_CHAN; i <= HI_CHAN; i++)
    {
        Ltmp2 = L_div_tmp[i];
        Ltmp1 = IfnLog10(Ltmp2);

        //Ltmp3 = L_add(Ltmp1, LOG_OFFSET - 80807124);
        Ltmp3 = _sadd(Ltmp1, LOG_OFFSET - 80807124);

        //Ltmp2 = L_mult(TEN_S5_10, extract_h(Ltmp3));
        Ltmp2 = _smpy(TEN_S5_10, (Ltmp3 >> 16));
        if (Ltmp2 < 0)
            Ltmp2 = 0;
        Ltmp1 = _sadd(Ltmp2, CONST_0_1875_S10_21);

        //Ltmp = L_mult(extract_h(Ltmp1), CONST_2_667_S5_10);
        Ltmp = _smpy((Ltmp1 >> 16), CONST_2_667_S5_10);

        //ch_snr[i] = extract_h(Ltmp);
        ch_snr[i] = (Ltmp >> 16);
    }

3. Optimization description

The loop body above is fairly large and calls two functions, L_divide() and fnLog10(). The C62 has only 32 registers, and some of them, such as B14 and B15, are reserved by the system. If the loop body is too large, the compiler runs out of registers and cannot software-pipeline the loop. To make pipelining possible, the loop body has to be split. When splitting, consider the following points:

(1) How many loops should it be split into? Provided that each loop can be pipelined, the fewer loops the better. This means the amount of computation in each loop should be as balanced as possible.

(2) Where should the split go? A loop body usually has more than one data flow. At the split point, intermediate variables are needed to carry the results of the earlier loop into the later one. Choose the split point so that as few intermediate variables as possible are needed.

(3) Function calls in the loop body must be inlined, because a loop that contains a function call cannot be pipelined. Each loop body should also not contain too many conditional branches, otherwise the compiler cannot pipeline it. Resolve every branch whose outcome is known in advance, and use intrinsics wherever possible.

For the example above:

(1) To keep the computation in each loop roughly equal, L_divide() and fnLog10() should go into different loops. Given the size of the loop body, two loops is estimated to be the right number.

(2) The split is placed after if (Ltmp2 == 0) Ltmp2 = 1;. Since only Ltmp2 is used after that point, a single array is enough to hold the Ltmp2 value of each iteration.

(3) The two functions called in the loop body, L_divide() and fnLog10(), are given inline forms, IL_divide() and IfnLog10(). Once the resolvable branches have been resolved and intrinsics used wherever possible, very few branches remain in the loop bodies, and both loops can be pipelined.

The program took 2676 cycles before optimization and 400 cycles after. The MII of the two sub-loops after optimization is 14 and 6 cycles respectively.

Memory address format

The Pentium and the C6000 are both 32-bit machines with a 32-bit word, but their memory is byte-addressed, with 4 bytes per word. When inspecting memory, two consecutive words sit at, for example, 0x1000 and 0x1004. In assembly, moving to the next word means adding 4 to the address, whereas in C adding 1 to an int pointer advances the address by 4. The TigerSHARC is also a 32-bit machine, but its memory is word-addressed: when inspecting memory, consecutive word addresses differ by 1.
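The byte-addressing point above can be checked with a short C sketch. It is only an illustration, assuming a byte-addressed target (8-bit char) where int32_t is available; the helper name word_stride_bytes is made up here and does not come from the original post.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): returns how many
   addressable units the address advances when a pointer to a 32-bit
   word is incremented by 1. */
size_t word_stride_bytes(void)
{
    int32_t words[2] = { 0, 0 };
    int32_t *p = &words[0];
    int32_t *q = p + 1;            /* "+1" in C means "next word" */

    /* Measure the distance between the two words in char units. */
    return (size_t)((char *)q - (char *)p);
}
```

On a byte-addressed machine this returns 4, which matches the 0x1000 / 0x1004 pattern seen in the memory view.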
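#define INTRINSIC   /* comment this out to run the plain C version instead */

/* 16-bit saturating add, plain C version */
short add(short var1, short var2)
{
    short var_out;
    int L_somme;

    L_somme = (int) var1 + var2;

    /* saturate the 32-bit sum to the 16-bit range */
    if (L_somme > 32767)
        var_out = 32767;
    else if (L_somme < -32768)
        var_out = -32768;
    else
        var_out = (short) L_somme;

    return var_out;
}

int main()
{
    int i, result;
#ifdef INTRINSIC
    for (i = 0; i < 1000; i++)
    {
        /* _sadd is the C6000 32-bit saturating-add intrinsic */
        result = _sadd(100000, 20);
        /* clamp the 32-bit result to the 16-bit range */
        if (result > 32767)
            result = 32767;
        else if (result < -32768)
            result = -32768;
    }
#else
    for (i = 0; i < 1000; i++)
        result = add(10, 20);
#endif

    return 0;
}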
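For readers without the TI compiler, the arithmetic in the intrinsic branch of the listing above can be sketched in portable C. The functions sat_add32 and sat16 are illustrative stand-ins and not TI APIs; sat_add32 assumes that _sadd performs a 32-bit saturating add of its two operands.

```c
#include <stdint.h>

/* Illustrative stand-in for a 32-bit saturating add (assumed to match
   what the _sadd intrinsic computes); not a TI API. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* widen so the sum cannot overflow */

    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Illustrative helper: clamp a 32-bit value to the signed 16-bit range. */
int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}
```

With these helpers, sat16(sat_add32(100000, 20)) gives 32767: the 32-bit sum 100020 does not saturate, and the clamp then brings it into the 16-bit range.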


This post is from DSP and ARM Processors
 
