Label: parallel bars (||) [condition register] instruction mnemonic functional unit operands ; comment

For example:

    LDW .D2 *B4,B2
    || [A1] SHL .S2X A4,B4 ; cross path used

The TMS320C62x chip has eight parallel functional units, divided into two identical groups of four. Its architecture is a very long instruction word (VLIW) design: the eight parallel instructions in an instruction packet can be dispatched simultaneously to the eight functional units and executed in parallel. Executing up to eight instructions at once also imposes a number of constraints on hand-written parallel assembly code:

(1) The timing of TMS320C62x instructions is described in terms of delay slots. The number of delay slots is the number of additional instruction cycles that must elapse after the source operands are read before the result can be accessed. For example, the multiply instruction (MPY) reads its source operands in cycle i, and its result is not available until cycle i+2.

(2) Two instructions that use the same functional unit cannot be placed in parallel.

(3) Two instructions that use the same cross path cannot be placed in the same execute packet, because there is only one cross path from register file A to B and one from B to A.

(4) Two load instructions that load data into the same register file, or two store instructions that store data from the same register file, cannot be placed in the same execute packet.

(5) Each register file can handle only one long fixed-point data item per execute packet.

(6) The same register cannot be read more than four times in one instruction cycle; condition registers are not counted toward this limit. Two instructions cannot write to the same register in the same instruction cycle. Two instructions with the same destination register can be placed in parallel only if their writes do not occur in the same instruction cycle.
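The restrictions above lend themselves to a mechanical check. The following Python sketch is a simplified illustration only, not a TI tool: the `Instr` record and the `check_packet` function are hypothetical names introduced here, and the model assumes that every instruction in the packet reads its operands in the issue cycle and writes its result `delay_slots` cycles later. It covers rules (2), (3) and (6); rules (4) and (5) are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction of a packet (hypothetical model, not a TI format)."""
    mnemonic: str
    unit: str              # e.g. ".D2" or ".S2X"; a trailing X marks cross-path use
    sources: tuple = ()    # registers read as operands (condition register excluded)
    dest: str = ""         # destination register, "" if none
    delay_slots: int = 0   # assumed number of delay slots before the result is written


def check_packet(packet):
    """Return a list of rule violations found in one group of parallel instructions."""
    problems = []

    # Rule (2): each functional unit may be used by at most one instruction.
    unit_use = Counter(i.unit[:3].upper() for i in packet)
    for unit, n in sorted(unit_use.items()):
        if n > 1:
            problems.append(f"unit {unit} used {n} times")

    # Rule (3): each cross path (1X or 2X) may be used by at most one instruction.
    cross_use = Counter(
        i.unit[2] + "X" for i in packet if i.unit.upper().endswith("X")
    )
    for path, n in sorted(cross_use.items()):
        if n > 1:
            problems.append(f"cross path {path} used {n} times")

    # Rule (6), reads: a register may be read at most four times per cycle.
    reads = Counter(r for i in packet for r in i.sources)
    for reg, n in sorted(reads.items()):
        if n > 4:
            problems.append(f"register {reg} read {n} times")

    # Rule (6), writes: a shared destination only conflicts when both writes
    # land in the same cycle, i.e. when the delay slots are equal.
    writes = Counter((i.dest, i.delay_slots) for i in packet if i.dest)
    for (reg, _slots), n in sorted(writes.items()):
        if n > 1:
            problems.append(f"register {reg} written {n} times in the same cycle")

    return problems


# The example from the text: LDW on .D2 in parallel with SHL on .S2 via the cross path.
ok_packet = [
    Instr("LDW", ".D2", sources=("B4",), dest="B2", delay_slots=4),
    Instr("SHL", ".S2X", sources=("A4",), dest="B4"),
]

# A deliberately invalid pair: same unit, same cross path, same destination and cycle.
bad_packet = [
    Instr("ADD", ".S2X", sources=("A4", "B1"), dest="B5"),
    Instr("SHL", ".S2X", sources=("A6",), dest="B5"),
]

if __name__ == "__main__":
    print(check_packet(ok_packet))
    print(check_packet(bad_packet))
```

Under these assumptions the example pair from the text passes all three checks, while the second pair is rejected on each of them. A real assembler performs these checks with full knowledge of each instruction's pipeline behavior, so the sketch should be read as a way of making the rules concrete rather than as a replacement for the tool.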