Introduction to TMS320C25's memory allocation and other hardware
The following describes the memory allocation, central arithmetic logic unit (CALU), hardware multiplier, control operations, serial ports, and I/O interfaces of the TMS320C25.
1. Memory allocation
The TMS320C25 has 4K words of on-chip program ROM and 544 words of on-chip RAM. The RAM is divided into three blocks: B0, B1, and B2. The B0 block (256 words) can be configured either as data memory (with the CNFD instruction) or as program memory (with the CNFP instruction). The remaining 288 words (blocks B1 and B2) are always data memory. The 544 words of on-chip RAM let the C25 process a 512-word data array, such as a 256-point complex FFT, and still leave 32 words for temporary storage of intermediate results. Off-chip, the TMS320C25 can directly address 64K words each of program space and data space.
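The RAM arithmetic above can be checked with a few lines of Python. This is only a sketch of the bookkeeping; the constant and function names are made up for illustration and do not correspond to anything on the chip.

```python
# Word counts taken from the paragraph above; names are illustrative only.
ON_CHIP_RAM_WORDS = 544
B0_WORDS = 256


def fft_ram_budget(points):
    """Return (words used by a complex FFT buffer, words left over)."""
    used = 2 * points  # one real word and one imaginary word per point
    return used, ON_CHIP_RAM_WORDS - used


used, spare = fft_ram_budget(256)
b1_b2_words = ON_CHIP_RAM_WORDS - B0_WORDS  # the part that is always data memory
print(used, spare, b1_b2_words)
```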
The register file contains 8 auxiliary registers (AR0~AR7), which can be used for indirect addressing of data memory and for temporary storage, increasing the flexibility and efficiency of the chip. These registers can be addressed directly by instructions or selected indirectly through the 3-bit auxiliary register pointer (ARP). The auxiliary registers and the ARP can be loaded from data memory or with immediate data, and the register contents can also be stored to data memory. The auxiliary register file is connected to the auxiliary register arithmetic unit (ARAU). Because the ARAU performs the address arithmetic when stepping through a table of data, the CALU is not involved in address calculation and remains free for other operations.
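As a rough illustration of how the ARP selects one of the eight auxiliary registers and how the address can advance without involving the CALU, here is a toy Python model. The class and method names are hypothetical, and only a post-increment access is modeled; it is not a description of the actual instruction encoding.

```python
class AuxRegisters:
    """Toy model of AR0~AR7 plus the 3-bit ARP (hypothetical names)."""

    def __init__(self):
        self.ar = [0] * 8  # eight 16-bit auxiliary registers
        self.arp = 0       # 3-bit pointer selecting the current register

    def load_immediate(self, n, value):
        self.ar[n] = value & 0xFFFF

    def read_post_increment(self, memory, next_arp=None):
        """Read memory at the current AR, then step the AR by one."""
        address = self.ar[self.arp]
        self.ar[self.arp] = (address + 1) & 0xFFFF  # address update, no CALU
        if next_arp is not None:
            self.arp = next_arp & 0x7  # optionally select a new register
        return memory[address]


memory = list(range(100, 110))  # stand-in for a table in data memory
regs = AuxRegisters()
regs.load_immediate(2, 3)       # AR2 points at table entry 3
regs.arp = 2
values = [regs.read_post_increment(memory) for _ in range(3)]
print(values, regs.ar[2])
```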
2. Central Arithmetic Logic Unit
The CALU contains a 16-bit scaling shifter, a 16×16-bit parallel multiplier, a 32-bit accumulator and a 32-bit arithmetic logic unit (ALU). The scaling shifter left-shifts the input data by 0 to 16 bits, as specified by the instruction. Additional shifters at the outputs of the accumulator and the multiplier are useful for normalization, bit extraction, extended-precision arithmetic and overflow protection.
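The effect of the scaling shifter on a 16-bit operand can be sketched as follows. The sketch assumes the operand is always sign-extended to 32 bits before shifting; on the real device sign extension is a selectable mode, so treat that as a simplification. The function names are illustrative.

```python
def sign_extend_16(word):
    """Interpret a 16-bit pattern as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def scale_shift(word, shift):
    """Left-shift a 16-bit operand by 0..16 bits into a 32-bit result.

    Assumes sign extension is enabled; returns the 32-bit bit pattern.
    """
    if not 0 <= shift <= 16:
        raise ValueError("shift must be in the range 0..16")
    return (sign_extend_16(word) << shift) & 0xFFFFFFFF


print(hex(scale_shift(0x0001, 16)))  # operand placed in the upper half
print(hex(scale_shift(0xFFFF, 4)))   # -1 shifted left by 4, sign preserved
```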
A typical ALU instruction implementation includes the following three steps:
(1) Data is retrieved from RAM on the data bus;
(2) The data passes through the scaling shifter to the ALU, which performs the arithmetic operation;
(3) The result is sent back to the accumulator.
For data storage, the 32-bit accumulator is split into two 16-bit halves: ACCH (upper 16 bits) and ACCL (lower 16 bits). The accumulator has a carry bit that simplifies multi-precision addition and subtraction.
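The sketch below shows, in Python, what splitting the accumulator into its upper and lower 16-bit halves looks like, and how a carry bit lets two 32-bit additions be chained into a 64-bit addition. The helper names are hypothetical and the code models the arithmetic only, not the instruction sequence.

```python
MASK32 = 0xFFFFFFFF


def split_acc(acc):
    """Return the (upper, lower) 16-bit halves of a 32-bit accumulator value."""
    acc &= MASK32
    return acc >> 16, acc & 0xFFFF


def add_with_carry(a, b, carry_in=0):
    """32-bit addition returning (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit addition built from two 32-bit additions linked by the carry."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, carry = add_with_carry(a_hi, b_hi, carry)
    return hi, lo, carry


print(split_acc(0x12345678))
print(add64(0, 0xFFFFFFFF, 0, 1))  # the carry ripples into the upper word
```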
3. Hardware multiplier
The TMS320C25 has a 16×16-bit hardware multiplier that can calculate a 32-bit product in one instruction cycle. There are two registers associated with the multiplier: ① a 16-bit temporary register TR, which is used to store an operand of the multiplier; and ② a 32-bit product register PR, which is used to store the product.
The output of the product register can be left-shifted by 1 or 4 bits, which is useful for fractional arithmetic and for aligning fractional products. The output of the PR can also be right-shifted by 6 bits, which allows up to 128 consecutive multiply/accumulate operations to be performed without overflow. The Multiply Unsigned (MPYU) instruction facilitates extended-precision multiplication.
4. I/O interface
The I/O space consists of 16 input ports and 16 output ports, which together provide a full 16-bit parallel I/O interface. Input (IN) and output (OUT) operations normally take 2 cycles, but become single-cycle operations when executed under the repeat instruction. I/O devices are mapped into the I/O address space in the same way that memory is mapped. Memory or I/O devices of different speeds are accommodated by means of the READY line.
The TMS320C25 also supports DMA to external program/data memory. Another processor can take full control of the TMS320C25's external memory by driving the active-low HOLD line low, which causes the C25 to place its address, data and control lines in the high-impedance state. Communication between the external processor and the C25 can be handled through interrupts. The TMS320C25 provides two DMA modes: in one, the C25 stops executing once HOLD is asserted; in the other, the C25 continues to execute, but only from on-chip ROM and RAM, which can greatly improve performance.
2.3.2.3 TMS320C25 Software
The TMS320C25 has a total of 133 instructions, of which 97 are single-cycle instructions. Of the other 36 instructions, 21 are branches, calls, returns and the like. These instructions reload the program counter and therefore break the execution pipeline. Another 7 instructions are two-word, long-immediate instructions. The remaining 8 instructions (IN, OUT, BLKD, BLKP, TBLR, TBLW, MAC, MACD) support I/O operations, move data between memories, or provide additional parallel operations within the processor; these 8 instructions become single-cycle instructions when used with the repeat counter. This exploits the parallelism of the processor, so that complex calculations can be completed with very few instructions.
Because most instructions are encoded in a single 16-bit word, most of them execute in one cycle. There are three memory addressing modes: direct addressing, indirect addressing, and immediate addressing. Direct and indirect addressing are both used to access data memory, while immediate addressing takes its operand from the program memory location addressed by the program counter.
In direct addressing, the low 7 bits of the instruction word are combined with the 9-bit data memory page pointer (DP) to form a 16-bit data memory address. Each page is 128 words long and there are 512 pages, so 64K words of data space can be addressed. Indirect addressing uses the 8 auxiliary registers (AR0~AR7). Table 2.2 lists the 7 indirect addressing modes. Bit-reversed addressing can greatly improve the I/O efficiency of FFT operations. In the table, OP stands for an operation and NARP stands for the new ARP value.
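The address formation for direct addressing, and the index order that bit-reversed addressing produces, can both be sketched in Python. The function names are hypothetical. The bit-reversal helper simply computes the permuted index; it does not model how the address hardware arrives at that order.

```python
def direct_address(dp, offset):
    """Combine a 9-bit page pointer and a 7-bit offset into a 16-bit address."""
    if not 0 <= dp < 512:
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset < 128:
        raise ValueError("offset must fit in 7 bits")
    return (dp << 7) | offset


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


print(hex(direct_address(4, 0x20)))         # page 4, word 0x20 within the page
print(hex(direct_address(511, 127)))        # last word of the last page
print([bit_reverse(i, 3) for i in range(8)])  # access order for an 8-point FFT
```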
Addressing mode of TMS320C25
|