SPRACS0A May 2020 – November 2022 TMS320F280048-Q1 , TMS320F280048C-Q1 , TMS320F280049 , TMS320F280049-Q1 , TMS320F280049C , TMS320F280049C-Q1 , TMS320F28033 , TMS320F28033-Q1 , TMS320F28035 , TMS320F28035-EP , TMS320F28035-Q1 , TMS320F28053 , TMS320F28055 , TMS320F2806-Q1 , TMS320F28065 , TMS320F28069 , TMS320F28069-Q1 , TMS320F28069F , TMS320F28069F-Q1 , TMS320F28069M , TMS320F28069M-Q1 , TMS320F28075 , TMS320F28075-Q1 , TMS320F28076 , TMS320F28374D , TMS320F28374S , TMS320F28375D , TMS320F28375S , TMS320F28375S-Q1 , TMS320F28376D , TMS320F28376S , TMS320F28377D , TMS320F28377D-EP , TMS320F28377D-Q1 , TMS320F28377S , TMS320F28377S-Q1 , TMS320F28378D , TMS320F28378S , TMS320F28379D , TMS320F28379D-Q1 , TMS320F28379S , TMS320F28384D , TMS320F28384D-Q1 , TMS320F28384S , TMS320F28384S-Q1 , TMS320F28386D , TMS320F28386D-Q1 , TMS320F28386S , TMS320F28386S-Q1 , TMS320F28388D , TMS320F28388S , TMS320F28P650DH , TMS320F28P650DK , TMS320F28P650SH , TMS320F28P650SK , TMS320F28P659DH-Q1 , TMS320F28P659DK-Q1 , TMS320F28P659SH-Q1
Many real-time control applications implement multiple control loops on a single device. However, integrating multiple control systems on a single controller remains challenging from a processor bandwidth point of view while keeping system costs down. The CLA is a fully parallel processor to the main C28x core that brings concurrent control-loop execution to the C28x family. The CLA has its own program and data buses and executes independently of the main core on the MCU. As described in Section 2 and Section 3, the CLA provides a unique combination of minimal latency and ease of access to the key control peripherals, which enables the CLA to offload the fast control algorithm task entirely from the C28x. Offloading the control task to the CLA also offers additional benefits such as reduced jitter in execution and deterministic operation of control loops. This is possible because the CLA is task driven rather than interrupt-service driven, and tasks on the CLA cannot be interrupted, which guarantees the deterministic nature of the control loops. In a pipelined CPU, an ISR can be delayed by “n” cycles if the CPU is executing branch-type instructions when the interrupt is received. This is not a problem for the CLA because, being task driven, it waits in an idle state until the periodic task trigger arrives before beginning any execution. Therefore, offloading the fast control task to the CLA and running the remaining tasks on the C28x improves overall system performance and reduces execution jitter.
The example “cla_ex6_cpu_offloading” illustrates how to optimally offload a control loop from the C28x to the CLA when the control tasks and background tasks together require more bandwidth than a single CPU (C28x) can provide. Figure 5-1 shows the two control loops simulated in this example. The faster one (loop1) runs at 200 kHz while the slower one (loop2) runs at 20 kHz. Both loops use a PI controller to control the duty cycle of a single PWM output with different weights: the faster loop contributes 80% and the slower loop contributes 20% of the PWM output. The inputs for both loops are sampled using ADCA and ADCB, with multiple SOCs for each to filter out noise in the inputs. There is also a background task continuously running in the main loop that disables or enables the entire system, including the PWM output and the control loops, based on the user-configured switch "system_OFF". Note that the CCS debugger clock cannot be used for profiling CLA routines, so a GPIO-based profiling technique is used in this example to profile both tasks. GPIO2 and GPIO3 are used for this purpose.
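The per-sample arithmetic described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code from cla_ex6_cpu_offloading: the type pi_ctrl_t and the functions pi_step, adc_average, and blend_duty are hypothetical names, the noise filtering is assumed to be a simple average of the SOC results, and the duty cycle is assumed to be normalized to the range 0 to 1.

```c
#include <assert.h>

/* Hypothetical PI controller state; names are illustrative only. */
typedef struct {
    float kp;       /* proportional gain */
    float ki;       /* integral gain, already scaled by the sample period */
    float integ;    /* integrator state */
    float out_min;  /* lower clamp for integrator and output */
    float out_max;  /* upper clamp for integrator and output */
} pi_ctrl_t;

float clampf(float x, float lo, float hi)
{
    return (x < lo) ? lo : (x > hi) ? hi : x;
}

/* One PI step: accumulate the error, clamp the integrator (basic
 * anti-windup), then clamp the sum of both terms. */
float pi_step(pi_ctrl_t *c, float ref, float fbk)
{
    float err = ref - fbk;
    c->integ = clampf(c->integ + c->ki * err, c->out_min, c->out_max);
    return clampf(c->kp * err + c->integ, c->out_min, c->out_max);
}

/* Assumed noise filter: average the results of several SOCs that
 * sampled the same input. */
float adc_average(const unsigned short *results, unsigned n)
{
    unsigned long sum = 0;
    unsigned i;
    assert(n > 0u);
    for (i = 0; i < n; i++) {
        sum += results[i];
    }
    return (float)sum / (float)n;
}

/* Combine both loop outputs with the 80% / 20% weights from the text. */
float blend_duty(float duty_loop1, float duty_loop2)
{
    return clampf(0.8f * duty_loop1 + 0.2f * duty_loop2, 0.0f, 1.0f);
}
```

In such a sketch, loop1 would call adc_average and pi_step every 5 µs (200 kHz) and loop2 every 50 µs (20 kHz), with blend_duty producing the value written to the PWM compare register.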
Figure 5-2 depicts the flow diagram of both control tasks when everything runs on the C28x without the use of the CLA. In this case, the total CPU (C28x) utilization exceeds the schedulable utilization bound (UB), and hence the system is non-schedulable in this scenario. This is further substantiated by the profiling waveforms shown in Figure 5-3. Note that no toggling is observed on GPIO3, which indicates that the lower-priority Loop 2 task never gets a chance to complete, and neither does the background task.
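The utilization check can be illustrated with a short calculation. The sketch below assumes that the bound in question is the classic rate-monotonic bound U ≤ n(2^(1/n) − 1); the text does not state which bound is used. The names task_t, total_utilization, and passes_utilization_bound are hypothetical, and the inequality is rewritten as (1 + U/n)^n ≤ 2 so that no math library is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one periodic control task. */
typedef struct {
    double exec_us;    /* worst-case execution time per activation, in us */
    double period_us;  /* activation period, in us */
} task_t;

/* Total utilization U = sum of exec_us / period_us over all tasks. */
double total_utilization(const task_t *tasks, size_t n)
{
    double u = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(tasks[i].period_us > 0.0);
        u += tasks[i].exec_us / tasks[i].period_us;
    }
    return u;
}

/* Assumed rate-monotonic test: U <= n * (2^(1/n) - 1), evaluated in the
 * equivalent form (1 + U/n)^n <= 2. Returns 1 if the bound is met. */
int passes_utilization_bound(const task_t *tasks, size_t n)
{
    double base;
    double prod = 1.0;
    size_t i;
    assert(n > 0);
    base = 1.0 + total_utilization(tasks, n) / (double)n;
    for (i = 0; i < n; i++) {
        prod *= base;
    }
    return prod <= 2.0;
}
```

As a usage example, the periods follow from the loop rates (5 µs for 200 kHz, 50 µs for 20 kHz), while the execution times must come from measurement. With purely hypothetical execution times of 4.5 µs and 3.85 µs, U = 0.9 + 0.077 = 0.977, which is above the two-task bound of about 0.83, so the check fails. This bound is a sufficient test only: a task set that fails it is not proven unschedulable unless U exceeds 1, which is why the profiling waveforms are the stronger evidence.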
Since the system is non-schedulable on the C28x alone, one of the control tasks can be offloaded to the CLA to meet the system requirements. Because the CLA offers very low interrupt latency, it is better to offload the fast control task to the CLA. This also frees up the most bandwidth on the C28x, which can then be used for executing background and other system tasks. Figure 5-4 depicts the flow diagrams of both tasks when the higher-frequency Loop 1 task is offloaded to the CLA. With the CLA used for concurrent loop execution, the C28x utilization for control tasks drops to approximately 7.7%, allowing the background task to execute correctly. Offloading the task to the CLA makes the system schedulable in this case, which is also evident from the profiling waveforms shown in Figure 5-5. The example lets the user offload the loop1 task from the C28x to the CLA by setting the pre-defined symbol "run_loop1_cla" to 1 in the project build options.
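A build-time switch of this kind is typically consumed with the preprocessor. The following is a minimal sketch of that pattern, not the example's actual source: only the symbol name run_loop1_cla comes from the text, while loop1_target_t and loop1_target are hypothetical, and the default of 0 when the symbol is undefined is an assumption.

```c
/* Assumed default: loop1 stays on the C28x unless the build options
 * define run_loop1_cla as 1 (for example with -Drun_loop1_cla=1 or
 * --define=run_loop1_cla=1, depending on the compiler). */
#ifndef run_loop1_cla
#define run_loop1_cla 0
#endif

/* Hypothetical enum naming the processor that executes loop1. */
typedef enum {
    LOOP1_ON_C28X = 0,
    LOOP1_ON_CLA  = 1
} loop1_target_t;

/* Report which processor this build selected for loop1. In a real
 * project the same #if would instead choose between registering a
 * C28x ISR and configuring a CLA task trigger. */
loop1_target_t loop1_target(void)
{
#if run_loop1_cla == 1
    return LOOP1_ON_CLA;
#else
    return LOOP1_ON_C28X;
#endif
}
```

Because the choice is made at compile time, the path that is not selected adds no code or run-time branching to the build.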