MSP430 library hardware multiplier usage

Aguilera

This post was last edited by Aguilera on 2020-3-16 21:40

Hardware Introduction:
In the MSP430 series of microcontrollers, the hardware multiplier is a peripheral module, not part of the CPU core; so its activity is independent of the CPU activity, and its registers are read and written by CPU instructions like other peripheral registers.

The hardware multiplier module supports unsigned multiplication, signed multiplication, unsigned multiply-accumulate, and signed multiply-accumulate, with 16×16, 16×8, 8×16, and 8×8 bit operands.

The module block diagram of the hardware multiplier is as follows:

[Figure: hardware multiplier block diagram (image.png, uploaded 2020-3-16 21:33)]

The operation type (unsigned multiply, signed multiply, unsigned multiply-accumulate, or signed multiply-accumulate) is selected by the address to which the first operand is written. The module has two operand registers, OP1 and OP2, and three result registers: RESLO, RESHI, and SUMEXT. RESLO holds the low word (lower 16 bits) of the result, RESHI holds the high word (upper 16 bits), and SUMEXT holds information about the result (carry or sign, depending on the mode). The result is ready after 3 clock cycles, so the instruction immediately following the write to OP2 can read it. The one exception is indirect addressing: when the result is read through an indirect addressing mode, a NOP is required before the read.

Operand OP1 has four addresses (MPY: 0130h, MPYS: 0132h, MAC: 0134h, MACS: 0136h), and the address used selects the operation mode. Writing the first operand to one of these registers selects the operation (unsigned or signed, with or without accumulation) but does not start it; writing the second operand to OP2 starts the multiplication. When the calculation completes, the result is in RESLO, RESHI, and SUMEXT.

The operations corresponding to the four addresses of operand 1 are:
```
OP1 Address   Register Name   Operation
0130h         MPY             Unsigned multiply
0132h         MPYS            Signed multiply
0134h         MAC             Unsigned multiply accumulate
0136h         MACS            Signed multiply accumulate
```
The contents of the high-order result register in the four operation modes are as follows:
```
Mode        RESHI Contents
MPY         Upper 16-bits of the result
MPYS        The MSB is the sign of the result. The remaining bits are the upper
            15-bits of the result. Two’s complement notation is used for the result.
MAC         Upper 16-bits of the result
MACS        Upper 16-bits of the result. Two’s complement notation is used for the result.
```
The contents of the SUMEXT register in four operation modes:
```
Mode        SUMEXT
MPY         SUMEXT is always 0000h
MPYS        SUMEXT contains the extended sign of the result
            0000h Result was positive or zero
            0FFFFh Result was negative
MAC         SUMEXT contains the carry of the result
            0000h No carry for result
            0001h Result has a carry
MACS        SUMEXT contains the extended sign of the result
            0000h Result was positive or zero
            0FFFFh Result was negative
```
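To make the three tables above concrete, here is a small software model in portable C. It is only a sketch of the documented behaviour, not the real peripheral: the names `hwm_model`, `hwm_mode`, `hwm_write_op1`, and `hwm_write_op2` are made up for illustration, and the 3-cycle latency is not modelled.

```c
#include <stdint.h>

/* Hypothetical software model of the multiplier, for illustration only. */
typedef enum { MODE_MPY, MODE_MPYS, MODE_MAC, MODE_MACS } hwm_mode;

typedef struct {
    hwm_mode mode;                 /* selected by which OP1 address was written */
    uint16_t op1;                  /* latched first operand */
    uint16_t reslo, reshi, sumext; /* result registers */
} hwm_model;

/* Writing OP1 only latches the operand and the mode; nothing is computed. */
void hwm_write_op1(hwm_model *m, hwm_mode mode, uint16_t value)
{
    m->mode = mode;
    m->op1 = value;
}

/* Writing OP2 starts the operation and updates the result registers. */
void hwm_write_op2(hwm_model *m, uint16_t op2)
{
    int is_signed  = (m->mode == MODE_MPYS || m->mode == MODE_MACS);
    int accumulate = (m->mode == MODE_MAC  || m->mode == MODE_MACS);

    /* Signed modes reinterpret both operands as two's complement values
       (assumes the usual wrapping conversion to int16_t). */
    uint32_t product = is_signed
        ? (uint32_t)((int32_t)(int16_t)m->op1 * (int32_t)(int16_t)op2)
        : (uint32_t)m->op1 * (uint32_t)op2;

    /* MAC and MACS add the product to the current RESHI:RESLO contents. */
    uint32_t old    = ((uint32_t)m->reshi << 16) | m->reslo;
    uint32_t result = accumulate ? old + product : product;

    m->reslo = (uint16_t)(result & 0xFFFFu);
    m->reshi = (uint16_t)(result >> 16);

    if (is_signed) {
        /* MPYS, MACS: extended sign of the 32-bit result */
        m->sumext = (result & 0x80000000u) ? 0xFFFFu : 0x0000u;
    } else if (accumulate) {
        /* MAC: carry out of the 32-bit addition */
        m->sumext = (result < old) ? 0x0001u : 0x0000u;
    } else {
        /* MPY: always 0000h */
        m->sumext = 0x0000u;
    }
}
```

With operand 65535 written to OP1 and 2 written to OP2, this model gives RESHI:RESLO = 0001h:FFFEh in MPY mode, and FFFFh:FFFEh with SUMEXT = 0FFFFh in MPYS mode, because 65535 is read as -1 there.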
During consecutive multiplications, if OP1 does not change, it does not need to be rewritten; OP2, however, must be written each time to start the multiplication.

MACS underflow and overflow: The hardware multiplier does not detect overflow or underflow of a signed multiply-accumulate result. The positive range of the result is 0 to 7FFF FFFFh; the negative range is 0FFFF FFFFh to 8000 0000h. Underflow occurs when the sum of two negative numbers lands in the positive range; overflow occurs when the sum of two positive numbers lands in the negative range. In both cases SUMEXT holds the sign of the wrapped result (0FFFFh after an overflow, because the result reads as negative; 0000h after an underflow, because it reads as positive), so it can help in detecting the condition. When using MACS, the program itself must detect and handle overflow and underflow.
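One possible way to do that check in software is sketched below in portable C. The helper `macs_accumulate` and the `macs_status` type are hypothetical names, and the accumulator is kept in an ordinary variable rather than read back from RESHI:RESLO; the sign comparison is the same rule described above.

```c
#include <stdint.h>

typedef enum { MACS_OK, MACS_OVERFLOW, MACS_UNDERFLOW } macs_status;

/* Hypothetical helper: add a*b to a signed 32-bit accumulator with the same
 * wrap-around behaviour as MACS, and report overflow or underflow. */
macs_status macs_accumulate(int32_t *acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;

    /* Do the wrapping add in unsigned arithmetic to avoid signed-overflow
       undefined behaviour; the conversion back assumes two's complement. */
    uint32_t wrapped = (uint32_t)*acc + (uint32_t)product;
    int32_t sum = (int32_t)wrapped;

    macs_status status = MACS_OK;
    if (*acc >= 0 && product >= 0 && sum < 0) {
        status = MACS_OVERFLOW;   /* two positives gave a negative result */
    } else if (*acc < 0 && product < 0 && sum >= 0) {
        status = MACS_UNDERFLOW;  /* two negatives gave a positive result */
    }

    *acc = sum;
    return status;
}
```

The check needs the signs of both inputs as well as the sign of the result, which is why SUMEXT alone is not enough to tell a legitimate negative sum from an overflow.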

Program example (assembly example given in the user guide):

Examples for all multiplier modes follow. All 8x8 modes use absolute addresses for the registers because the assembler does not allow .B access to word registers when the labels from the standard definitions file are used.
```
; 16x16 Unsigned Multiply
MOV #01234h,&MPY ; Load first operand
MOV #05678h,&OP2 ; Load second operand
; ... ; Process results
; 8x8 Unsigned Multiply. Absolute addressing.
MOV.B #012h,&0130h ; Load first operand
MOV.B #034h,&0138h ; Load 2nd operand
; ... ; Process results
; 16x16 Signed Multiply
MOV #01234h,&MPYS ; Load first operand
MOV #05678h,&OP2 ; Load 2nd operand
; ... ; Process results
; 8x8 Signed Multiply. Absolute addressing.
MOV.B #012h,&0132h ; Load first operand
SXT &MPYS ; Sign extend first operand
MOV.B #034h,&0138h ; Load 2nd operand
SXT &OP2 ; Sign extend 2nd operand
; (triggers 2nd multiplication)
; ... ; Process results
; 16x16 Unsigned Multiply Accumulate
MOV #01234h,&MAC ; Load first operand
MOV #05678h,&OP2 ; Load 2nd operand
; ... ; Process results
; 8x8 Unsigned Multiply Accumulate. Absolute addressing
MOV.B #012h,&0134h ; Load first operand
MOV.B #034h,&0138h ; Load 2nd operand
; ... ; Process results
; 16x16 Signed Multiply Accumulate
MOV #01234h,&MACS ; Load first operand
MOV #05678h,&OP2 ; Load 2nd operand
; ... ; Process results
; 8x8 Signed Multiply Accumulate. Absolute addressing
MOV.B #012h,&0136h ; Load first operand
SXT &MACS ; Sign extend first operand
MOV.B #034h,R5 ; Temp. location for 2nd operand
SXT R5 ; Sign extend 2nd operand
MOV R5,&OP2 ; Load 2nd operand
; ... ; Process results
```
Although these examples differ somewhat from textbook assembly, anyone with a basic understanding of assembly language should be able to follow them. They show several ways of writing to the operand registers.
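The SXT instructions in the signed 8x8 examples are there because a byte-wide write does not carry the sign into the upper byte of the 16-bit operand. What the sign extension does can be sketched in portable C; `sign_extend8` and `mul8_signed` are hypothetical helper names, not part of any MSP430 header.

```c
#include <stdint.h>

/* Equivalent of SXT: extend an 8-bit two's complement value to 16 bits. */
uint16_t sign_extend8(uint8_t byte)
{
    return (byte & 0x80u) ? (uint16_t)(0xFF00u | byte) : (uint16_t)byte;
}

/* Hypothetical 8x8 signed multiply: both operands are sign-extended first,
 * as in the MPYS and MACS assembly examples. The product of two signed
 * 8-bit values always fits in 16 bits. */
int16_t mul8_signed(uint8_t a, uint8_t b)
{
    int16_t sa = (int16_t)sign_extend8(a);
    int16_t sb = (int16_t)sign_extend8(b);
    return (int16_t)(sa * sb);
}
```

For example, the byte 0FEh means -2 in signed mode; without the extension the multiplier would see 00FEh, which is 254.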

When addressing the result register indirectly, after writing the OP2 operand to start the multiplication, at least one instruction delay is required before accessing the result register RESLO, etc.; when addressing directly, after writing OP2, the next instruction can read the result. Sample program (assembly):
```
; Access multiplier results with indirect addressing
MOV #RESLO,R5 ; RESLO address in R5 for indirect
MOV &OPER1,&MPY ; Load 1st operand
MOV &OPER2,&OP2 ; Load 2nd operand
NOP ; Need one cycle: after both operands are written, a NOP is required before the indirect read
MOV @R5+,&xxx ; Move RESLO
MOV @R5,&xxx ; Move RESHI
```
If an interrupt occurs between the write to OP1 and the write to OP2, and the interrupt service routine itself uses the multiplier, the originally selected mode is lost and the result is unpredictable. To avoid this, either disable interrupts while writing the operands or do not use the hardware multiplier in interrupt service routines. For example:
```
; Disable interrupts before using the hardware multiplier
DINT ; Disable interrupts
NOP ; Required for DINT
MOV #xxh,&MPY ; Load 1st operand
MOV #xxh,&OP2 ; Load 2nd operand
EINT ; Interrupts may be re-enabled before processing the results
; ... ; Process results
```
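The same idea can be wrapped in a C function. The following is only a sketch: the name `mul16_atomic` is hypothetical, and the target branch assumes the IAR toolchain (the `__ICC430__` macro and the `__get_interrupt_state`, `__disable_interrupt`, and `__set_interrupt_state` intrinsics from `intrinsics.h`). On any other compiler it falls back to a plain multiplication so that it can be tried on a PC.

```c
#include <stdint.h>

#if defined(__ICC430__)          /* assumption: building with IAR for MSP430 */
#include <msp430x16x.h>
#include <intrinsics.h>
#endif

/* Hypothetical wrapper: 16x16 unsigned multiply that cannot be interrupted
 * between the OP1 write and the result read. */
uint32_t mul16_atomic(uint16_t a, uint16_t b)
{
#if defined(__ICC430__)
    uint16_t lo, hi;
    __istate_t state = __get_interrupt_state(); /* remember current GIE */
    __disable_interrupt();
    MPY = a;                       /* selects unsigned multiply */
    OP2 = b;                       /* starts the multiplication */
    lo = RESLO;
    hi = RESHI;
    __set_interrupt_state(state);  /* restore GIE instead of forcing EINT */
    return ((uint32_t)hi << 16) | lo;
#else
    /* Host fallback: same arithmetic without the peripheral. */
    return (uint32_t)a * (uint32_t)b;
#endif
}
```

Unlike the assembly example, this version reads the results before interrupts are restored, which also protects them if an interrupt service routine uses the multiplier. It restores the previous interrupt state rather than enabling interrupts unconditionally, so it is safe to call from code that already runs with interrupts disabled.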
That covers the hardware. For anything that remains unclear, refer to the user guide.
Example of use:
My program simply demonstrates using the hardware multiplier from C. Its main contents are as follows:
```
#include <msp430x16x.h>
/****************************************************************************
* Name: main program
* Function: Demonstration of using hardware multiplier library
* Entry parameters: None
* Export parameters: None
****************************************************************************/
void main( void )
{
    // Stop watchdog timer to prevent time out reset
    WDTCTL = WDTPW + WDTHOLD;
    ClkInit(); // Clock setup; defined elsewhere in the project (not shown here)
    
    /*Put the register of the hardware multiplier in the watch window to observe whether it changes
    int a = 0;
    a=  5*6;
    */
    //Test unsigned multiplication
    MPY = 65535;
    OP2 = 2;
    //Signed multiplication
    MPYS = 65535;
    OP2 = 2;
    //Unsigned multiplication and addition
    MAC = 65535;
    OP2 = 2;
    //Signed multiplication and addition
    MACS = 65535;
    OP2 = 2;
    LPM0;
}
```
The program demonstrates the four multiplication modes. To try it, single-step through it in the debugger and watch the hardware multiplier registers.

The hardware multiplier is very fast, needing only 3 clock cycles, so when single-stepping in IAR the result appears in the watch window as soon as the assignment to OP2 completes. The other three modes behave similarly.

The commented-out part is what I used to test whether the IAR compiler uses the hardware multiplier. By default, multiplication should be performed with the hardware multiplier. The default settings are as follows:

[Figure: IAR default settings with the hardware multiplier selected (image.png, uploaded 2020-3-16 21:39)]

The hardware multiplier option is selected, so it should be used, but my debugging results suggest that it is not. The screenshot is as follows:

[Figure: debugging screenshot of the multiplier registers (image.png, uploaded 2020-3-16 21:40)]

After running, the multiplier registers show no corresponding change. If the multiplier were used, they should change.

With the hardware multiplier option deselected, the registers likewise do not change. This suggests that IAR did not use the hardware multiplier in this test, perhaps because of the optimization settings or because the debug build does not use it.
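One caveat about this test: `5*6` is a constant expression, so a compiler is free to evaluate it at build time and store 30 directly, in which case no multiplication runs on the chip at all. A slightly more conclusive test, sketched here with hypothetical names, is to multiply `volatile` variables so that the operands must be loaded at run time.

```c
/* volatile forces both operands to be read at run time, so the compiler
 * cannot fold the multiplication into a constant. */
volatile int factor_a = 5;
volatile int factor_b = 6;

/* Hypothetical test function: set a breakpoint here and watch the
 * multiplier registers while stepping over the multiplication. */
int multiply_at_runtime(void)
{
    return factor_a * factor_b;
}
```

Whether the generated code then touches MPY and OP2 still depends on the project settings and the runtime library, so this only removes one possible reason for the registers staying unchanged.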

If you access the hardware multiplier registers directly, deselect the compiler's hardware multiplier option if necessary to prevent conflicts.

Here is an example using the hardware multiplier directly:
```
#include "msp430x16x.h"
unsigned int Result[7];
unsigned char Data1[7];
unsigned char Data2[7];
void main(void)
{
    unsigned char i;
    WDTCTL = WDTPW + WDTHOLD; // Turn off the watchdog
    for(i=0; i<7; i++)
    {
        Data1[i] = 10 * i; // Assign values to two arrays
        Data2[i] = 25 * i;
    }
    for(i=0; i<7; i++)
    {
        MPY = Data1[i];
        OP2 = Data2[i];
        _NOP(); // Delay
        _NOP();
        _NOP();
        Result[i] = RESLO; // Save the result. The 8x8 product fits in 16 bits, so RESHI is not used.
    }
}
```
This program uses unsigned multiplication and stores each product in the Result array. Note the three NOPs: they are not needed here. Judging by the register definitions in the header file, the IAR compiler should access the result registers with direct addressing, so the NOPs can be omitted. If in doubt, one NOP is enough; even with indirect addressing, a single NOP provides sufficient delay.

The hardware multiplier is not normally driven by hand as in the program above, which gains little; it is easier to use the * operator and let the compiler handle it. Direct use mainly pays off where timing is critical, such as digital filtering or fast Fourier transforms on the MSP430.
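As an illustration of the digital filtering case, the inner loop of an FIR filter is a chain of signed multiply-accumulate steps, which is the operation the MACS mode provides. The sketch below writes it with the * operator in portable C and leaves the choice of instructions to the compiler; `fir_dot` is a hypothetical name, and overflow of the 32-bit accumulator is not handled.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical FIR tap sum: multiply each sample by its coefficient and
 * accumulate the products in a 32-bit result. */
int32_t fir_dot(const int16_t *samples, const int16_t *coeffs, size_t taps)
{
    int32_t acc = 0;
    size_t i;

    for (i = 0; i < taps; i++) {
        /* Widen before multiplying so the 16x16 product keeps all 32 bits. */
        acc += (int32_t)samples[i] * (int32_t)coeffs[i];
    }
    return acc;
}
```

A hand-written version would write each sample to MACS and each coefficient to OP2 and read RESHI:RESLO once at the end, which is where direct register access can save time.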

That concludes this post on the hardware multiplier. I hope it helps. If you find any shortcomings, please feel free to comment.

兰博

This post is really good, it is very clear, thank you
