SPRACN0F October 2021 – March 2023 F29H850TU , F29H859TU-Q1 , TMS320F280021 , TMS320F280021-Q1 , TMS320F280023 , TMS320F280023-Q1 , TMS320F280023C , TMS320F280025 , TMS320F280025-Q1 , TMS320F280025C , TMS320F280025C-Q1 , TMS320F280033 , TMS320F280034 , TMS320F280034-Q1 , TMS320F280036-Q1 , TMS320F280036C-Q1 , TMS320F280037 , TMS320F280037-Q1 , TMS320F280037C , TMS320F280037C-Q1 , TMS320F280038-Q1 , TMS320F280038C-Q1 , TMS320F280039 , TMS320F280039-Q1 , TMS320F280039C , TMS320F280039C-Q1 , TMS320F280040-Q1 , TMS320F280040C-Q1 , TMS320F280041 , TMS320F280041-Q1 , TMS320F280041C , TMS320F280041C-Q1 , TMS320F280045 , TMS320F280048-Q1 , TMS320F280048C-Q1 , TMS320F280049 , TMS320F280049-Q1 , TMS320F280049C , TMS320F280049C-Q1 , TMS320F28374D , TMS320F28374S , TMS320F28375D , TMS320F28375S , TMS320F28375S-Q1 , TMS320F28376D , TMS320F28376S , TMS320F28377S , TMS320F28377S-Q1 , TMS320F28378D , TMS320F28378S , TMS320F28379D , TMS320F28379D-Q1 , TMS320F28379S , TMS320F28384D , TMS320F28384S , TMS320F28386D , TMS320F28386S , TMS320F28388D , TMS320F28388S , TMS320F28P650DH , TMS320F28P650DK , TMS320F28P650SH , TMS320F28P650SK , TMS320F28P659DH-Q1 , TMS320F28P659DK-Q1 , TMS320F28P659SH-Q1
The FPU64 module is useful when 32-bit precision (single-precision floating point) is not sufficient for an application. This is the case in many real-time control applications, where 64-bit precision (double-precision floating point) is needed. Double precision normally comes at a price: when the native hardware on the device does not support double-precision floating-point operations, users see a significant increase in CPU cycles when running double-precision floating-point algorithms. Incorporating hardware support avoids this increase.
The table below compares the performance of double-precision operations on the FPU64 versus the FPU (single-precision floating-point hardware).
Floating-Point Operation | FPU64 (cycles) --fp_mode=relaxed | FPU64 (cycles) --fp_mode=strict | FPU (cycles, FPU64 disabled) --fp_mode=relaxed | FPU (cycles, FPU64 disabled) --fp_mode=strict |
---|---|---|---|---|
32-bit Division | 8 | 234 | 8 | 234 |
64-bit Division | 27 | 27 | 2222 | 2222 |
The results in the table above (profiled with optimization off, code running from RAM) show that double-precision floating-point division on the FPU64 (27 cycles) is only modestly more expensive than single-precision floating-point division on the FPU (8 cycles). In the single-precision case, when the floating-point mode (--fp_mode) is set to relaxed, the compiler generates hardware instructions to perform single-precision division, at a slight expense of accuracy. When the floating-point mode is set to strict, the compiler does not generate these hardware instructions and instead calls into the RTS library to maintain accuracy, at the cost of cycles (234).
In the double-precision case, the floating-point mode does not matter, because the FPU64 implementation is accurate: as long as the FPU64 is enabled, the compiler generates FPU64 hardware instructions to perform double-precision division. Only when the FPU64 is disabled does the compiler fall back to RTS library calls. 64-bit floating-point division cannot take advantage of the 32-bit floating-point hardware, even in relaxed mode, which is why the cycle count remains 2222 in both modes.
Devices with the FPU64 use the same registers as the FPU, with the addition of eight floating-point result extension registers for double-precision floating-point operations. The FPU64 enhancements support all existing FPU single-precision floating-point instructions in addition to the 64-bit double-precision floating-point instructions. FPU64 64-bit instructions execute in one to three pipeline cycles, with some instructions also supporting a parallel move operation.
Using the FPU64 is straightforward: write C code with double-precision floating-point variables and operations, compile it with TI C28x C/C++ Compiler v18.9.0.STS (or later), and pass the compiler switch --float_support=fpu64. The compiler then generates native C28x double-precision floating-point instructions. For those unfamiliar with the sizes of the standard C data types on a C28x CPU, the table below may be helpful. Users who want to write hand-optimized assembly for the FPU64 can do so easily as well; there is roughly a one-to-one correspondence between single-precision and double-precision floating-point assembly instructions. For the complete instruction set and details, see the document referenced in the documentation section. The software examples section below points to hand-optimized double-precision DSP and Math assembly routines that users can employ in application development.
Data Type | C28x Bit Length (EABI) | Arm Bit Length |
---|---|---|
char | 16 | 8 |
short | 16 | 16 |
int | 16 | 32 |
long | 32 | 32 |
long long | 64 | 64 |
float | 32 | 32 |
double | 64 | 64 |
long double | 64 | 64 |
pointer | 32 | 32 |
With the advent of automatic code generation tools such as MATLAB's Embedded Coder, many customers are migrating to auto-generated code. Because MATLAB uses double-precision floating point by default, the FPU64 makes it easier to port code from a simulation environment to an embedded environment without having to revalidate the operating performance of the system due to a reduction in precision. This is another key benefit of the FPU64.