You can find the following issues by
examining the assembly source and the Software Pipeline Information comment block.
Potential solutions are given for each condition.
- Two Loops are Generated, One
Not Software Pipelined / Duplicate Loop Generated. If you see the
message "Duplicate Loop Generated" in the Software Pipeline Information comment
block, or you notice that there is a second version of the loop that isn't
software pipelined, it may mean that when the iteration count of the loop is
too low, it is illegal to execute the software pipelined version of the loop
that the compiler has created. In order to generate only the software pipelined
version of the loop, the compiler needs to prove that the minimum iteration
count of the loop is high enough to always safely execute the pipelined
version. If the minimum number of iterations of the loop is known,
using the MUST_ITERATE pragma to tell the compiler this information may help
eliminate the duplicate loop.
- Loop Carried Dependency Bound
is Larger than the Partitioned Resource Bound. If you see a loop carried
dependency bound that is higher than the partitioned resource bound, you likely
have one of two problems. First, the compiler may think there is a memory
dependence from a store to a subsequent load. See the "Memory Dependencies"
section of the TMS320C6000 Programmer’s Guide (SPRU198) for more information. Second, a
computation in one iteration of the loop may be used in the next iteration of
the loop. In this case, the only option is to try to eliminate the flow of
information from one iteration to the next, thereby making the iterations more
independent of each other.
- Large Outer Loop Overhead in
Nested Loop. If the inner loop count of a nested loop is relatively
small, the time to execute the outer loop can become a large percentage of the
total execution time. For cases where this seems to degrade the overall loop
nest performance, two approaches can be tried. First, if there are not too many
instructions in the outer loop, you may want to give a hint to the compiler that
it should coalesce the loop nest. Try using the COALESCE_LOOP pragma and check
the relative performance of the entire loop nest. If the COALESCE_LOOP pragma
does not help, and the number of iterations of the inner loop is small and does
not vary, fully unrolling the inner loop by hand may improve performance of the
loop nest, because the compiler may then be able to software pipeline the outer
loop.
- There are Memory Bank Conflicts. If the compiler generates two memory
accesses in one cycle and those accesses reside within the same memory block in
the cache hierarchy, a memory bank stall can occur. To avoid this degradation,
place the two accesses in different memory blocks by using the DATA_ALIGN
pragma.
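
As an illustration of the first two conditions, the following is a minimal sketch rather than code from the guide. The function name scale_buffer and the values 8 and 8 are assumptions chosen for the example. MUST_ITERATE gives the compiler a minimum iteration count (and a multiple), which may remove the need for the duplicate loop. The restrict qualifiers are one common way to tell the compiler that the stores to out cannot alias the loads from in, which may remove an assumed store-to-load memory dependence. Both are promises made by the programmer, so the assertion documents the condition the pragma relies on.

```c
#include <assert.h>

/* Hypothetical kernel: multiplies n values by gain.
   Assumption for this sketch: callers always pass n >= 8 and n % 8 == 0. */
void scale_buffer(const int *restrict in, int *restrict out, int n, int gain)
{
    /* The pragma below is only valid if this condition really holds. */
    assert(n >= 8 && n % 8 == 0);

    /* TI pragma: minimum count 8, maximum left blank, multiple of 8.
       Compilers that do not recognize the pragma ignore it. */
    #pragma MUST_ITERATE(8, , 8)
    for (int i = 0; i < n; i++) {
        /* With restrict, this store is independent of later loads from in. */
        out[i] = in[i] * gain;
    }
}
```

After rebuilding, check the Software Pipeline Information comment block again to see whether the "Duplicate Loop Generated" message is gone and whether the loop carried dependency bound has dropped.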
See the C7000 Optimizing C/C++
Compiler User's Guide (SPRUIG8) for information about pragmas.
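
The hand-unrolling approach described for nested loops can be sketched as follows. This is an illustrative example under stated assumptions: the function names, the data layout, and the fixed inner count TAPS of 4 are hypothetical and not taken from the guide. The first function keeps the short inner loop; the second fully unrolls it so that only one loop remains, which the compiler may then be able to software pipeline.

```c
#define TAPS 4  /* assumed small, fixed inner-loop iteration count */

/* Nested form: the short inner loop runs once per row, so the outer-loop
   overhead is paid on every row. */
void dot_rows_nested(const int *restrict rows, const int *restrict coeff,
                     int *restrict out, int n_rows)
{
    for (int r = 0; r < n_rows; r++) {
        int acc = 0;
        for (int t = 0; t < TAPS; t++) {
            acc += rows[r * TAPS + t] * coeff[t];
        }
        out[r] = acc;
    }
}

/* Hand-unrolled form: the inner loop is written out in full, leaving a
   single loop over the rows. */
void dot_rows_unrolled(const int *restrict rows, const int *restrict coeff,
                       int *restrict out, int n_rows)
{
    /* Coefficients do not change per row, so load them once. */
    const int c0 = coeff[0], c1 = coeff[1], c2 = coeff[2], c3 = coeff[3];

    for (int r = 0; r < n_rows; r++) {
        const int *row = &rows[r * TAPS];
        out[r] = row[0] * c0 + row[1] * c1 + row[2] * c2 + row[3] * c3;
    }
}
```

Both functions compute the same result, so the comparison to make is between the Software Pipeline Information reported for each version and the measured cycle counts of the whole loop nest.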