SPRACW5 Application note

SPRACW5A April 2021 – December 2021 F29H850TU , F29H859TU-Q1 , TMS320F2800132 , TMS320F2800133 , TMS320F2800135 , TMS320F2800137 , TMS320F280021 , TMS320F280021-Q1 , TMS320F280023 , TMS320F280023-Q1 , TMS320F280023C , TMS320F280025 , TMS320F280025-Q1 , TMS320F280025C , TMS320F280025C-Q1 , TMS320F280033 , TMS320F280034 , TMS320F280034-Q1 , TMS320F280036-Q1 , TMS320F280036C-Q1 , TMS320F280037 , TMS320F280037-Q1 , TMS320F280037C , TMS320F280037C-Q1 , TMS320F280038-Q1 , TMS320F280038C-Q1 , TMS320F280039 , TMS320F280039-Q1 , TMS320F280039C , TMS320F280039C-Q1 , TMS320F280040-Q1 , TMS320F280040C-Q1 , TMS320F280041 , TMS320F280041-Q1 , TMS320F280041C , TMS320F280041C-Q1 , TMS320F280045 , TMS320F280048-Q1 , TMS320F280048C-Q1 , TMS320F280049 , TMS320F280049-Q1 , TMS320F280049C , TMS320F280049C-Q1 , TMS320F28075 , TMS320F28075-Q1 , TMS320F28076 , TMS320F28374D , TMS320F28374S , TMS320F28375D , TMS320F28375S , TMS320F28375S-Q1 , TMS320F28376D , TMS320F28376S , TMS320F28377D , TMS320F28377D-EP , TMS320F28377D-Q1 , TMS320F28377S , TMS320F28377S-Q1 , TMS320F28378D , TMS320F28378S , TMS320F28379D , TMS320F28379D-Q1 , TMS320F28379S , TMS320F28384D , TMS320F28384D-Q1 , TMS320F28384S , TMS320F28384S-Q1 , TMS320F28386D , TMS320F28386D-Q1 , TMS320F28386S , TMS320F28386S-Q1 , TMS320F28388D , TMS320F28388S , TMS320F28P650DH , TMS320F28P650DK , TMS320F28P650SH , TMS320F28P650SK , TMS320F28P659DH-Q1 , TMS320F28P659DK-Q1 , TMS320F28P659SH-Q1

3.5.2 Offloading Compute to CLA

The C28x CPU can trigger a CLA task in software using the IACK instruction. This lets the C28x offload computations to the CLA at specific points in its own code. The ACI benchmark example is largely sequential, in that each control algorithm block requires the output of a previous control algorithm block. However, there are two instances where parallelism can be introduced:

PID control block:
There are three instances of the PID control algorithm. One of them, the PID Id instance, has no dependencies on the others and can be run in parallel. In the implementation, the C28x offloads the PID Id execution to CLA Task 2 while the C28x itself executes the PIDs for Speed and Iq.
SVGen control block:
The SVGen control block depends on the output of the Inverse Park transform. Because this is a sensorless ACI implementation, additional control blocks such as the Flux and Speed Estimators are also present. The SVGen does not depend on these and can run in parallel with the Estimators. In the implementation, the C28x offloads the SVGen calculation to CLA Task 1.
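The trigger-then-join ordering described above can be sketched as a small host-side model in C. This is an illustration only, not the benchmark source: every name here (force_task2, wait_task2, control_step, p_ctrl) is hypothetical, the controllers are reduced to a placeholder proportional gain, and the assumption that the Speed controller output feeds the Iq reference is typical of FOC loops but is not stated in this section. On a real device the trigger would be the IACK instruction and the task body would run on the CLA concurrently with the C28x.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side model of one CLA task. On a real device the trigger
 * is the IACK instruction and the task body runs on the CLA concurrently. */
static bool  task2_running;           /* models the task "running" status  */
static float id_ref, id_fbk, id_out;  /* models data shared with the CLA   */

/* Placeholder proportional controller (assumed gain, not a full PID). */
static float p_ctrl(float err) { return 0.5f * err; }

/* Body of the offloaded work: the Id controller. */
static void task2_body(void) { id_out = p_ctrl(id_ref - id_fbk); }

/* Fork: model of the software trigger. It only marks the task as started. */
static void force_task2(void)
{
    assert(!task2_running);  /* do not retrigger a task that is in flight */
    task2_running = true;
}

/* Join: device code would poll until the task finishes. This model runs the
 * body here, so id_out becomes valid only after the join. */
static void wait_task2(void)
{
    if (task2_running) {
        task2_body();
        task2_running = false;
    }
}

/* One control step: returns the q-axis output, writes the d-axis output. */
float control_step(float spd_ref, float spd_fbk, float iq_fbk,
                   float id_ref_in, float id_fbk_in, float *vd)
{
    /* 1. Publish the inputs and start the Id controller on the "CLA". */
    id_ref = id_ref_in;
    id_fbk = id_fbk_in;
    force_task2();

    /* 2. Meanwhile the "C28x" runs the Speed and Iq controllers. */
    float iq_ref = p_ctrl(spd_ref - spd_fbk);
    float vq     = p_ctrl(iq_ref - iq_fbk);

    /* 3. Join before the offloaded result is consumed. */
    wait_task2();
    *vd = id_out;
    return vq;
}
```

In this sketch control_step would be called once per control interrupt. The point of the pattern is that the join sits as late as possible, immediately before the first use of the offloaded result, so that the C28x work in step 2 can overlap with the CLA task.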

For the implementation, see the 'SignalChain_RAM_TMU_CLA_OFFLOAD' build configuration. The implementation demonstrates how little code has to change to offload compute to the CLA. The same header file containing the inline C PID function is included in both the C28x source file and the CLA source file, and is compiled into C28x code on the C28x side and CLA code on the CLA side.
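As a rough illustration of the shared-header idea, the sketch below shows an inline PID function of the kind that could live in one header and be included by both source files. It is a textbook PID with output clamping and no anti-windup, not the PID used in the benchmark, and the names pid_ctrl_t, pid_run, c28x_speed_pid and cla_id_pid are hypothetical. Everything is collapsed into one file here so that it compiles on a host; comments mark where the file boundaries would be.

```c
/* ---- would be the shared header, for example a hypothetical pid_inline.h ---- */
#ifndef PID_INLINE_H
#define PID_INLINE_H

typedef struct {
    float kp, ki, kd;        /* controller gains                */
    float integ;             /* integrator state                */
    float prev_err;          /* previous error, for the D term  */
    float out_min, out_max;  /* output clamp limits             */
} pid_ctrl_t;

/* Because the function is 'static inline' in a header, each toolchain that
 * includes it compiles its own copy for its own target. */
static inline float pid_run(pid_ctrl_t *p, float ref, float fbk)
{
    float err = ref - fbk;
    p->integ += p->ki * err;
    float out = p->kp * err + p->integ + p->kd * (err - p->prev_err);
    p->prev_err = err;
    if (out > p->out_max) out = p->out_max;
    if (out < p->out_min) out = p->out_min;
    return out;
}

#endif /* PID_INLINE_H */

/* ---- would be the C28x source file: runs the Speed PID ---- */
static pid_ctrl_t pid_spd = { 2.0f, 0.5f, 0.0f, 0.0f, 0.0f, -1.0f, 1.0f };

float c28x_speed_pid(float ref, float fbk)
{
    return pid_run(&pid_spd, ref, fbk);
}

/* ---- would be the CLA source file: task body runs the Id PID ---- */
static pid_ctrl_t pid_id = { 2.0f, 0.5f, 0.0f, 0.0f, 0.0f, -1.0f, 1.0f };

float cla_id_pid(float ref, float fbk)
{
    return pid_run(&pid_id, ref, fbk);
}
```

The two callers differ only in which controller instance they pass in, which is why moving a PID instance from one core to the other requires no change to the algorithm code itself.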

Figure 3-9 ACI Motor Benchmark Output for C28x With CLA Offloading

Offloading compute to the CLA reduced the execution cycle count, boosting performance by 12%.