SPRUIV4 User guide

SPRUIV4D May 2020 – May 2024

5.4.3 Performance Issues

You can find the following issues by examining the assembly source and the Software Pipeline Information comment block. Potential solutions are given for each condition.

Two Loops are Generated, One Not Software Pipelined / Duplicate Loop Generated. If you see the message "Duplicate Loop Generated" in the Software Pipeline Information comment block, or you notice that there is a second version of the loop that isn't software pipelined, it may mean that when the iteration count (iteration count) of the loop is too low, it is illegal to execute the software pipelined version of the loop that the compiler has created. In order to generate only the software pipelined version of the loop, the compiler needs to prove that the minimum iteration count of the loop would be high enough to always safe execute the pipelined version. If the minimum number of iterations of the loop is known, using the MUST_ITERATE pragma to tell the compiler this information may help eliminate the duplicate loop.
Loop Carried Dependency Bound is Larger than the Partitioned Resource Bound. If you see a loop carried dependency bound that is higher than the partitioned resource bound, you likely have one of two problems. First, the compiler may think there is a memory dependence from a store to a subsequent load. See the "Memory Dependencies" section of the TMS320C6000 Programmer’s Guide (SPRU198) for more information. Second, a computation in one iteration of the loop may be used in the next iteration of the loop. In this case, the only option is to try to eliminate the flow of information from one iteration to the next, thereby making the iterations more independent of each other.
Large Outer Loop Overhead in Nested Loop. If the inner loop count of a nested loop is relatively small, the time to execute the outer loop can become a large percentage of the total execution time. For cases where this seems to degrade the overall loop nest performance, two approaches can be tried. First, if there are not too many instructions in the outer loop, you may want to give a hint to the compiler that it should coalesce the loop nest. Try using the COALESCE_LOOP pragma and check the relative performance of the entire loop nest. If the COALESCE_LOOP pragma does not work, and the number of iterations of the inner loop is small and do not vary, fully unrolling the inner loop by hand may improve performance of the nested loop because the outer loop may be able to be software pipelined.
There are Memory Bank Conflicts If the compiler generates two memory accesses in one cycle and those accesses reside within the same memory block in the cache hierarchy, a memory bank stall can occur. To avoid this degradation, memory bank conflicts can be avoided by placing the two accesses in different memory blocks through use of the DATA_ALIGN pragma.

See the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8) for information about pragmas.