TI C6000 Optimization Advanced: Loops are the most important!

Software Pipelining

1. C6000 Pipeline

An instruction is not processed in a single step. Its processing is divided into three stages: fetch, decode, and execute. Giving each stage its own processing unit, like stations on a production line, forms a pipeline in which successive instructions overlap, and this greatly increases instruction throughput.

As shown in Figure 1, three pipelined instructions need only five cycles, compared with nine cycles for sequential execution. The more instructions there are, the greater the pipeline's advantage.

Figure 1: Simple pipeline arrangement
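
The cycle counts quoted for Figure 1 follow from a simple formula. Below is a minimal sketch in C, assuming an ideal pipeline in which every stage takes one cycle and nothing stalls. The function names are illustrative and do not come from any TI library.

```c
#include <assert.h>

/* Cycles needed when every instruction finishes all of its stages
   before the next instruction starts. */
unsigned sequential_cycles(unsigned instructions, unsigned stages)
{
    return instructions * stages;
}

/* Cycles needed in an ideal pipeline (assumption: one cycle per stage,
   no stalls). The first instruction takes `stages` cycles, and each
   later instruction completes one cycle after the previous one. */
unsigned pipelined_cycles(unsigned instructions, unsigned stages)
{
    assert(stages >= 1);
    if (instructions == 0)
        return 0;
    return stages + (instructions - 1);
}
```

With 3 instructions and 3 stages these give 9 and 5 cycles, matching the figure. For large instruction counts the pipelined cost approaches one cycle per instruction.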

In fact, the C6000 architecture further divides each stage into multiple sub-stages, each of which consumes 1 CPU cycle.

Instruction fetch (4 sub-stages):

  • PG: Program address generate (update program counter register)

  • PS: Program address send (to memory)

  • PW: Program (memory) access ready wait

  • PR: Program fetch packet receive (fetch packet = eight 32-bit instructions)

Decoding (2 sub-stages):

  • DP: Instruction dispatch (assignment to the functional units)

  • DC: Instruction decode

Execution (1 to 10 sub-stages, depending on the instruction):

  • E1 – E10, where E1 is the first sub-stage of the execute stage

Figure 2: High-performance C6000 pipeline

2. Pipeline blocking

The pipeline is blocked (stalled) in the following two situations:

  • When the current instruction is a load, a complex multiply, or another instruction with multiple delay slots, its result is not available for several cycles, so a following instruction that needs the result cannot proceed until the result is returned.

  • When a branch (jump) instruction appears, the CPU cannot predict which instruction will execute next, so the instruction at the branch target cannot enter the pipeline until the branch instruction has reached its E1 stage.

To make full use of pipeline resources and avoid the blocking caused by delay slots, the C6000 architecture provides mechanisms in both software and hardware:

  • Software: software-pipelined instruction scheduling

  • Hardware: the SPLOOP buffer (software pipelined loop buffer)

3. Software Pipelining

Software pipelining ≠ the hardware pipeline!

Software pipelining means that the compiler reorders instructions so that pipeline slots that would otherwise sit idle are put to use. The emphasis is on the word "software".

For example, consider the following loop:

for (i = 0; i < 15; i++)
{
    sum += tab[i];
}

The conventional instruction schedule (Solution 1) and the software-pipelined schedule (Solution 2) are shown in Figure 3:

Figure 3: Conventional scheduling vs. software-pipelined scheduling

As the figure shows, instructions rescheduled by software pipelining no longer block the pipeline, which improves efficiency without changing what the code computes.

Three terms related to software pipelining:

  • Kernel: the steady-state code in which the pipeline is fully utilized

  • Prolog (pipeline fill): the code before the kernel that fills the pipeline

  • Epilog (pipeline drain): the code after the kernel that drains the pipeline
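
The three terms can be illustrated at the C level. The sketch below hand-pipelines a summation loop into two stages (load, then add). It only shows the prolog/kernel/epilog shape. The real compiler does this at the instruction level with more stages, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: each iteration loads, then adds. */
int sum_plain(const int *tab, size_t n)
{
    int sum = 0;
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += tab[i];
    return sum;
}

/* The same loop, hand-pipelined into two stages. */
int sum_pipelined(const int *tab, size_t n)
{
    int sum = 0;
    assert(tab != NULL || n == 0);
    if (n == 0)
        return 0;

    /* Prolog: start iteration 0 (load only). */
    int loaded = tab[0];

    /* Kernel: each pass starts iteration i (load) and
       finishes iteration i-1 (add). */
    for (size_t i = 1; i < n; i++) {
        int next = tab[i];  /* stage 1 of iteration i   */
        sum += loaded;      /* stage 2 of iteration i-1 */
        loaded = next;
    }

    /* Epilog: finish the last iteration (add only). */
    sum += loaded;
    return sum;
}
```

Both functions return the same result. In the pipelined version the load in each kernel pass does not depend on the add in the same pass, which is what lets the two overlap on real hardware.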

4. SPLOOP Buffer

Software pipelining has two major drawbacks: it increases the size of the generated assembly code, and it affects the interruptibility of the code.

The SPLOOP buffer is designed to solve these problems. It is a storage area inside the C6000 that holds the instructions of an SPLOOP. The first time an SPLOOP executes, the loop's instructions are copied into the SPLOOP buffer, and for the rest of the loop the instructions are fetched from there until the loop ends.

The C6000 also provides dedicated registers and instructions for using the SPLOOP buffer. Programmers who write assembly or linear assembly need to be familiar with these instructions and registers and understand the SPLOOP buffer's execution mechanism (reference [1]). With C/C++, the compiler generates the corresponding instructions automatically.

Taking a memory block copy function as an example, the generated code before and after using the SPLOOP buffer is shown below:

Figure 4: Memory copy before using SPLOOP buffer

Figure 5: Memory copy after using SPLOOP buffer

*Note: The SPLOOP buffer can hold at most 14 execute packets (each execute packet can contain up to 8 instructions), so a loop whose body is too complex cannot use the SPLOOP buffer.
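
For reference, here is a C-level sketch of a memory block copy with a small loop body, which is the kind of loop that can stay within the buffer limit. This is not the code behind Figures 4 and 5: the function name is hypothetical, and whether the compiler actually emits SPLOOP for it depends on the target device and the compiler options.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n 32-bit words from src to dst.
   `restrict` (C99) tells the compiler that dst and src do not overlap,
   so loads and stores from different iterations may be reordered. */
void copy_words(int *restrict dst, const int *restrict src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The loop body is a single load and a single store, with no calls and no extra control flow.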

5. Factors that lead to software pipelining failure

In the CCS development environment, enabling the -O2 or -O3 optimization option makes the compiler software-pipeline suitable loops automatically. Programmers therefore need to write loop bodies that meet the conditions for software pipelining.

The following factors may cause software pipelining to fail:

  • Assembly statements are embedded in the C/C++ code

  • Complex flow-control statements such as goto or break appear in the loop

  • The loop contains a function call (other than a call to an inlined function)

  • The loop body has too many instructions to be software pipelined

  • The loop counter is not initialized

  • The loop counter is modified inside the loop body

  • Software pipelining is disabled: -O2 or -O3 is not used; -ms2 or -ms3 is used; or -mu is used to turn software pipelining off
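
Some of the factors above can be seen in source form. The sketch below contrasts a loop that calls a function the compiler cannot inline with an equivalent loop that uses an inline helper, an initialized counter that the body never modifies, and no break or goto. All names are hypothetical, and whether either loop is actually pipelined depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>

/* Function pointer type: a call through it cannot be inlined
   when the target is unknown at compile time. */
typedef int (*scale_fn)(int);

int scale_by_3(int x) { return 3 * x; }

/* Harder to pipeline: the loop body contains a real call. */
int sum_scaled_call(const int *tab, int n, scale_fn f)
{
    int sum = 0;
    assert(n >= 0 && f != NULL);
    for (int i = 0; i < n; i++)
        sum += f(tab[i]);
    return sum;
}

/* Inline helper: its body can be merged into the loop. */
static inline int scale_by_3_inline(int x) { return 3 * x; }

/* Easier to pipeline: no call left after inlining, a simple counted
   loop, and the counter is only changed by the loop header. */
int sum_scaled_inline(const int *tab, int n)
{
    int sum = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum += scale_by_3_inline(tab[i]);
    return sum;
}
```

Both versions compute the same value, so the second form can replace the first without changing results.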

This post is from DSP and ARM Processors

Latest reply

Very good sharing. Published on 2020-1-3 13:34