SPRACN4 Application note

SPRACN4 August 2019 66AK2G12 , 66AK2H06 , 66AK2H12 , 66AK2H14 , OMAP-L132 , OMAP-L138 , TMS320C6452 , TMS320C6454 , TMS320C6455 , TMS320C6457 , TMS320C6652 , TMS320C6654 , TMS320C6655 , TMS320C6657 , TMS320C6672 , TMS320C6674 , TMS320C6678 , TMS320C6742 , TMS320C6743 , TMS320C6745 , TMS320C6746 , TMS320C6747 , TMS320C6748

Using DSPLIB FFT Implementation for Real Input and Without Data Scaling

2.1 Suggested Change

The change to both routines (DSP_fft16x16 and DSP_ifft16x16) is similar. The description below modifies the serial assembly (SA) implementation of the kernels, which are located at:

[DSPLIB_INSTALLATION_DIR]\packages\ti\dsplib\src\DSP_fft16x16\c64P\DSP_fft16x16_sa.sa
[DSPLIB_INSTALLATION_DIR]\packages\ti\dsplib\src\DSP_ifft16x16\c64P\DSP_ifft16x16_sa.sa

Change 1: Locate the following code in the SA files:

;----------------------------------------------------------;
; Compute first set of outputs: ;
; ;
; x0[0]= xh0_0 + xh20_0 + 1 >> 1 ;
; x0[1]= xh1_0 + xh21_0 + 1 >> 1 ;
; x0[2]= xh0_1 + xh20_1 + 1 >> 1 ;
; x0[3]= xh1_1 + xh21_1 + 1 >> 1 ;
;----------------------------------------------------------;
AVG2 .2 B_xh1_0_xh0_0, A_xh21_0_xh20_0, B_x_1o_x_0o
AVG2 .2 B_xh1_1_xh0_1, A_xh21_1_xh20_1, B_x_3o_x_2o

Update the code to:

ADD2 .2 B_xh1_0_xh0_0, A_xh21_0_xh20_0, B_x_1o_x_0o
ADD2 .2 B_xh1_1_xh0_1, A_xh21_1_xh20_1, B_x_3o_x_2o

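Note the replacement of the AVG2 instruction with ADD2. AVG2 halves each sum, as the "+ 1 >> 1" in the comment above shows; ADD2 is a plain packed add, so these outputs are no longer scaled by 1/2.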
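
The difference between the two instructions can be modeled in portable C. The sketch below is illustrative only: the helper names (pack2, hi16, lo16, avg2_model, add2_model) are hypothetical and not part of DSPLIB, and it assumes that AVG2 computes (a + b + 1) >> 1 on each signed 16-bit half, while ADD2 adds each half with 16-bit wraparound and no saturation.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these names exist in DSPLIB. */

/* Interpret the low 16 bits of v as a signed 16-bit value. */
static int16_t to_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

/* Pack two signed 16-bit values into one 32-bit word (hi in bits 31..16). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

static int16_t hi16(uint32_t w) { return to_s16(w >> 16); }
static int16_t lo16(uint32_t w) { return to_s16(w); }

/* Floor division by 2 that does not rely on the sign behavior of >>. */
static int32_t asr1(int32_t x)
{
    return (x >= 0) ? (x >> 1) : -((-x + 1) >> 1);
}

/* Assumed AVG2 behavior: rounded average of each signed 16-bit half. */
static uint32_t avg2_model(uint32_t a, uint32_t b)
{
    int32_t h = asr1((int32_t)hi16(a) + hi16(b) + 1);
    int32_t l = asr1((int32_t)lo16(a) + lo16(b) + 1);
    return pack2((int16_t)h, (int16_t)l);
}

/* Assumed ADD2 behavior: each half added modulo 2^16, no saturation. */
static uint32_t add2_model(uint32_t a, uint32_t b)
{
    uint32_t h = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t l = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (h << 16) | l;
}
```

For the halves 100 and 50, avg2_model yields 75 and add2_model yields 150. Under the no-saturation assumption, a sum outside the 16-bit range wraps around in add2_model, so the unscaled kernel relies on the input leaving enough headroom.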

Change 2: Locate the following code in the SA files:

;---------------------------------------------------------;
; The following code computes intermediate results for: ;
; ;
; si10' = -si10 twiddle table has -sin factors ;
; 
; x2[h2 ] = (co10 * xt0_0 + si10'* yt0_0 + 0x8000) >> 16 ;
; x2[h2+1] = (co10 * yt0_0 - si10'* xt0_0 + 0x8000) >> 16 ;
; x2[h2+2] = (co11 * xt0_1 + si11'* yt0_1 + 0x8000) >> 16 ;
; x2[h2+3] = (co11 * yt0_1 - si11'* xt0_1 + 0x8000) >> 16 ;
;---------------------------------------------------------;
CMPYR .M1 A_co10_si10, B_yt1_0_xt1_0, A_xh2_1_0;
CMPYR .M1 A_co11_si11, B_yt1_1_xt1_1, A_xh2_3_2;
;---------------------------------------------------------;
; 
; x2[l1 ] = (co20 * xt1_0 + si20'* yt1_0 + 0x8000) >> 16 ;
; x2[l1+1] = (co20 * yt1_0 - si20'* xt1_0 + 0x8000) >> 16 ;
; x2[l1+2] = (co21 * xt1_1 + si21'* yt1_1 + 0x8000) >> 16 ;
; x2[l1+3] = (co21 * yt1_1 - si21'* xt1_1 + 0x8000) >> 16 ;
; 
; These four results are retained in registers and a ;
; double word is formed so that it can be stored with ;
; one STDW. ;
;---------------------------------------------------------;
; This equation ONLY has minus sign for x, y components
CMPYR .M1 A_co20_si20, A_myt0_0_mxt0_0, A_xl1_1_0;
CMPYR .M1 A_co21_si21, A_myt0_1_mxt0_1, A_xl1_3_2;
;---------------------------------------------------------;
; The following code computes intermediate results for: ;
; 
; x2[l2 ] = (co30 * xt2_0 + si30'* yt2_0 + 0x8000) >> 16 ;
; x2[l2+1] = (co30 * yt2_0 - si30'* xt2_0 + 0x8000) >> 16 ;
; x2[l2+2] = (co31 * xt2_1 + si31'* yt2_1 + 0x8000) >> 16 ;
; x2[l2+3] = (co31 * yt2_1 - si31'* xt2_1 + 0x8000) >> 16 ;
;---------------------------------------------------------;
CMPYR .M2 B_co30_si30, B_yt2_0_xt2_0, B_xl2_1_0
CMPYR .M2 B_co31_si31, B_yt2_1_xt2_1, B_xl2_3_2

Update the code to:

;---------------------------------------------------------;
; The following code computes intermediate results for: ;
; 
; si10' = -si10 twiddle table has -sin factors ;
; ;
; x2[h2 ] = (co10 * xt0_0 + si10'* yt0_0 + 0x8000) >> 16 ;
; x2[h2+1] = (co10 * yt0_0 - si10'* xt0_0 + 0x8000) >> 16 ;
; x2[h2+2] = (co11 * xt0_1 + si11'* yt0_1 + 0x8000) >> 16 ;
; x2[h2+3] = (co11 * yt0_1 - si11'* xt0_1 + 0x8000) >> 16 ;
;---------------------------------------------------------;
CMPYR1 .M1 A_co10_si10, B_yt1_0_xt1_0, A_xh2_1_0;
CMPYR1 .M1 A_co11_si11, B_yt1_1_xt1_1, A_xh2_3_2;
;---------------------------------------------------------;
; 
; x2[l1 ] = (co20 * xt1_0 + si20'* yt1_0 + 0x8000) >> 16 ;
; x2[l1+1] = (co20 * yt1_0 - si20'* xt1_0 + 0x8000) >> 16 ;
; x2[l1+2] = (co21 * xt1_1 + si21'* yt1_1 + 0x8000) >> 16 ;
; x2[l1+3] = (co21 * yt1_1 - si21'* xt1_1 + 0x8000) >> 16 ;
; ;
; These four results are retained in registers and a ;
; double word is formed so that it can be stored with ;
; one STDW. ;
;---------------------------------------------------------;
; This equation ONLY has minus sign for x, y components
CMPYR1 .M1 A_co20_si20, A_myt0_0_mxt0_0, A_xl1_1_0;
CMPYR1 .M1 A_co21_si21, A_myt0_1_mxt0_1, A_xl1_3_2;
;---------------------------------------------------------;
; The following code computes intermediate results for: ;
; 
; x2[l2 ] = (co30 * xt2_0 + si30'* yt2_0 + 0x8000) >> 16 ;
; x2[l2+1] = (co30 * yt2_0 - si30'* xt2_0 + 0x8000) >> 16 ;
; x2[l2+2] = (co31 * xt2_1 + si31'* yt2_1 + 0x8000) >> 16 ;
; x2[l2+3] = (co31 * yt2_1 - si31'* xt2_1 + 0x8000) >> 16 ;
;---------------------------------------------------------;
CMPYR1 .M2 B_co30_si30, B_yt2_0_xt2_0, B_xl2_1_0
CMPYR1 .M2 B_co31_si31, B_yt2_1_xt2_1, B_xl2_3_2
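
The effect of replacing CMPYR with CMPYR1 can be sketched in plain C. This is a simplified model, not TI code: the type cplx16 and the functions cmpyr_model and cmpyr1_model are hypothetical names, the packed register layout is deliberately left out, and the rounding rules are assumptions, namely that CMPYR keeps the upper 16 bits of sat(sum + 0x8000) while CMPYR1 keeps the upper 16 bits of sat((sum + 0x4000) << 1).

```c
#include <stdint.h>

/* Hypothetical type and helpers for illustration; not part of DSPLIB. */
typedef struct { int16_t re, im; } cplx16;

/* Saturate a 64-bit value to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Upper 16 bits of a 32-bit value (floor division by 65536). */
static int16_t msb16(int32_t v)
{
    int64_t w = v;
    return (int16_t)((w >= 0) ? (w / 65536) : -((-w + 65535) / 65536));
}

/* Assumed CMPYR rounding: a Q15 x Q15 product comes out halved. */
static int16_t round_cmpyr(int64_t sum)
{
    return msb16(sat32(sum + 0x8000));
}

/* Assumed CMPYR1 rounding: one extra left shift, so no halving. */
static int16_t round_cmpyr1(int64_t sum)
{
    return msb16(sat32((sum + 0x4000) * 2));
}

/* Complex multiply a * b with CMPYR-style rounding. */
static cplx16 cmpyr_model(cplx16 a, cplx16 b)
{
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    cplx16 r = { round_cmpyr(re), round_cmpyr(im) };
    return r;
}

/* Complex multiply a * b with CMPYR1-style rounding. */
static cplx16 cmpyr1_model(cplx16 a, cplx16 b)
{
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    cplx16 r = { round_cmpyr1(re), round_cmpyr1(im) };
    return r;
}
```

With a twiddle factor of 0.5 in Q15 (16384 + 0j) and a data value of 1000 - 2000j, cmpyr_model returns 250 - 500j and cmpyr1_model returns 500 - 1000j. Under the stated assumptions, the CMPYR1 result matches the unscaled product.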

Note that the CMPYR instruction has been replaced with the CMPYR1 instruction.

The updates to the FFT routines can be incorporated in the application in two ways:

The DSPLIB software can be recompiled so that the generated library includes the updated kernels. To do this, recompile the library project [DSPLIB_INSTALLATION_DIR]\dsplib_v210\dsplib64plus.pjt. The updated library is generated at [DSPLIB_INSTALLATION_DIR]\Release\dsplib64plus_rebuild.lib.
The updated kernels can be included directly in the application project. Doing so overrides the kernels that are included in the DSPLIB library.