SPRUII0 User guide

SPRUII0F May 2019 – June 2024 TMS320F28384D , TMS320F28384D-Q1 , TMS320F28384S , TMS320F28384S-Q1 , TMS320F28386D , TMS320F28386D-Q1 , TMS320F28386S , TMS320F28386S-Q1 , TMS320F28388D , TMS320F28388S

MMPYF32 MRa, MRb, MRc || MADDF32 MRd, MRe, MRf

32-Bit Floating-Point Multiply with Parallel Add

Operands

MRa	CLA floating-point destination register for MMPYF32 (MR0 to MR3). MRa cannot be the same register as MRd.
MRb	CLA floating-point source register for MMPYF32 (MR0 to MR3)
MRc	CLA floating-point source register for MMPYF32 (MR0 to MR3)
MRd	CLA floating-point destination register for MADDF32 (MR0 to MR3). MRd cannot be the same register as MRa.
MRe	CLA floating-point source register for MADDF32 (MR0 to MR3)
MRf	CLA floating-point source register for MADDF32 (MR0 to MR3)

Opcode

LSW: 0000 ffee ddcc bbaa 
MSW: 0111 1010 0000 0000
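
The bit layout above can be sketched as a small host-side C helper that packs six register numbers into the 32-bit opcode. This is an illustrative sketch only: the function name encode_mmpyf32_maddf32 is hypothetical, and it assumes that each two-bit field aa to ff holds the register number (0 to 3) of the operand with the matching letter, which the listing implies but does not state.

```c
#include <stdint.h>

/* Fixed upper 16 bits (MSW) of the opcode: 0111 1010 0000 0000. */
#define MMPYF32_MADDF32_MSW 0x7A00u

/*
 * Pack six register numbers (0..3 for MR0..MR3) into the 32-bit opcode.
 * Assumption: field aa carries MRa, bb carries MRb, and so on up to ff.
 * Returns 0 (never a valid encoding, since the MSW is nonzero) if any
 * register number is out of range or if MRa and MRd are the same register.
 */
uint32_t encode_mmpyf32_maddf32(unsigned a, unsigned b, unsigned c,
                                unsigned d, unsigned e, unsigned f)
{
    if (a > 3 || b > 3 || c > 3 || d > 3 || e > 3 || f > 3)
        return 0;
    if (a == d)
        return 0;

    /* LSW layout: 0000 ffee ddcc bbaa */
    uint32_t lsw = (f << 10) | (e << 8) | (d << 6) | (c << 4) | (b << 2) | a;
    return ((uint32_t)MMPYF32_MADDF32_MSW << 16) | lsw;
}
```

Under these assumptions, MMPYF32 MR2, MR0, MR1 || MADDF32 MR3, MR3, MR2 would pack to 0x7A000BD2.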

Description

Multiply the contents of two floating-point registers and, in parallel, add the contents of two floating-point registers.

MRa = MRb * MRc;
MRd = MRe + MRf;

Restrictions

The destination register for the MMPYF32 and the MADDF32 must be unique. That is, MRa cannot be the same register as MRd.
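
The parallel behavior and the destination restriction can be modeled on a host in C. The sketch below is a behavioral illustration, not a cycle-accurate or bit-exact model of the CLA. The ClaRegs type and the function name are hypothetical. It assumes that both operations read their source registers before either destination is written, which is how the Example in this entry uses the instruction (MADDF32 reads the old value of MR2 while MMPYF32 overwrites it).

```c
#include <stdbool.h>

/* Hypothetical host-side model of the four CLA result registers MR0..MR3. */
typedef struct {
    float mr[4];
} ClaRegs;

/*
 * Model of: MMPYF32 MRa, MRb, MRc || MADDF32 MRd, MRe, MRf
 * Arguments a..f are register numbers 0..3.
 * Both results are computed from the register values seen on entry,
 * then both destinations are written.
 * Returns false and leaves the registers unchanged if a register number
 * is out of range or if MRa and MRd name the same register.
 */
bool mmpyf32_maddf32(ClaRegs *r, int a, int b, int c, int d, int e, int f)
{
    const int idx[6] = { a, b, c, d, e, f };
    for (int i = 0; i < 6; ++i) {
        if (idx[i] < 0 || idx[i] > 3)
            return false;
    }
    if (a == d)
        return false;                      /* destinations must differ */

    float product = r->mr[b] * r->mr[c];   /* MRb * MRc */
    float sum     = r->mr[e] + r->mr[f];   /* MRe + MRf */

    r->mr[a] = product;
    r->mr[d] = sum;
    return true;
}
```

With MR0 = 2.0, MR1 = 3.0, MR2 = 10.0 and MR3 = 1.0, modeling MMPYF32 MR2, MR0, MR1 || MADDF32 MR3, MR3, MR2 leaves MR2 = 6.0 and MR3 = 11.0, because the add uses the value MR2 held before the multiply result was written.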

Flags

This instruction modifies the following flags in the MSTF register:

Flag	TF	ZF	NF	LUF	LVF
Modified	No	No	No	Yes	Yes

The MSTF register flags are modified as follows:

LUF = 1 if MMPYF32 or MADDF32 generates an underflow condition.
LVF = 1 if MMPYF32 or MADDF32 generates an overflow condition.

Pipeline

Both MMPYF32 and MADDF32 complete in a single cycle.

Example

; Perform 5 multiply and accumulate operations: 
; 
; X and Y are 32-bit floating-point arrays 
; 
; 1st multiply: A = X0 * Y0 
; 2nd multiply: B = X1 * Y1 
; 3rd multiply: C = X2 * Y2 
; 4th multiply: D = X3 * Y3 
; 5th multiply: E = X4 * Y4
; 
; Result = A + B + C + D + E 
; 
_Cla1Task1: 
    MMOVI16    MAR0, #_X               ; MAR0 points to X array 
    MMOVI16    MAR1, #_Y               ; MAR1 points to Y array 
    MNOP                               ; Delay for MAR0, MAR1 load 
    MNOP                               ; Delay for MAR0, MAR1 load 
                                       ; <-- MAR0 valid 
    MMOV32     MR0, *MAR0[2]++         ; MR0 = X0, MAR0 += 2 
                                       ; <-- MAR1 valid 
    MMOV32     MR1, *MAR1[2]++         ; MR1 = Y0, MAR1 += 2 
    MMPYF32    MR2, MR0, MR1           ; MR2 = A = X0 * Y0 
||  MMOV32     MR0, *MAR0[2]++         ; In parallel MR0 = X1, MAR0 += 2 
    MMOV32     MR1, *MAR1[2]++         ; MR1 = Y1, MAR1 += 2 
    MMPYF32    MR3, MR0, MR1           ; MR3 = B = X1 * Y1 
||  MMOV32     MR0, *MAR0[2]++         ; In parallel MR0 = X2, MAR0 += 2 
    MMOV32     MR1, *MAR1[2]++         ; MR1 = Y2, MAR1 += 2
    MMACF32    MR3, MR2, MR2, MR0, MR1 ; MR3 = A + B, MR2 = C = X2 * Y2 
||  MMOV32     MR0, *MAR0[2]++         ; In parallel MR0 = X3 
    MMOV32     MR1, *MAR1[2]++         ; MR1 = Y3 
    MMACF32    MR3, MR2, MR2, MR0, MR1 ; MR3 = (A + B) + C, MR2 = D = X3 * Y3 
||  MMOV32     MR0, *MAR0              ; In parallel MR0 = X4 
    MMOV32     MR1, *MAR1              ; MR1 = Y4 
    MMPYF32    MR2, MR0, MR1           ; MR2 = E = X4 * Y4 
||  MADDF32    MR3, MR3, MR2           ; In parallel MR3 = (A + B + C) + D
    
    MADDF32    MR3, MR3, MR2           ; MR3 = (A + B + C + D) + E 
    MMOV32     @_Result, MR3           ; Store the result 
    MSTOP                              ; end of task
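
For checking the task on a host, the same computation can be written as a short C reference. This is a sketch only: the function name mac5 is hypothetical, and the result may differ from the CLA in the least significant bits, depending on the rounding mode and on how the host compiler evaluates floating-point expressions. It accumulates in the same order as the assembly, (((A + B) + C) + D) + E.

```c
/*
 * Host-side reference for the task above: the sum of five products.
 * x and y correspond to the X and Y arrays of 32-bit floating-point values.
 */
float mac5(const float x[5], const float y[5])
{
    float result = x[0] * y[0];        /* A */
    for (int i = 1; i < 5; ++i)
        result += x[i] * y[i];         /* add B, C, D, E in turn */
    return result;
}
```

For example, with X = {1, 2, 3, 4, 5} and every element of Y equal to 1, the reference returns 15.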