SPRUIV4D May 2020 – May 2024
The compiler attempts to collapse or coalesce nested loops when doing so is legal and can improve performance. A nested loop is a pair of loops in which one loop resides inside the other, enclosing loop. Both collapsing and coalescing transform a nested loop into a single loop. Collapsing takes place when there is no code in the outer loop. Coalescing takes place when there is code in the outer loop.
After the two nested loops are combined into one loop, the code that was in the body of the outer loop must be transformed so that it executes conditionally, only when necessary. Collapsing and coalescing can improve performance because the pipe-up and pipe-down are executed only once for the entire loop nest. Without the transformation, the inner loop incurs a pipe-down and a pipe-up every time the outer loop iterates.
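The following C sketch illustrates the two transformations by writing them out by hand. It is not compiler output, and the function names (scale_nested, scale_collapsed, row_sums_nested, row_sums_coalesced) are hypothetical names chosen for this illustration. The sketch assumes rows and cols are both positive and that rows * cols fits in an int.

```c
/* Nested form with no code in the outer loop other than the inner loop:
   a candidate for collapsing. */
void scale_nested(float *dst, const float *src, int rows, int cols, float k)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            dst[i * cols + j] = k * src[i * cols + j];
}

/* Hand-written equivalent of collapsing: one loop over rows * cols. */
void scale_collapsed(float *dst, const float *src, int rows, int cols, float k)
{
    int total = rows * cols;
    for (int n = 0; n < total; n++)
        dst[n] = k * src[n];
}

/* Nested form with code in the outer loop: a candidate for coalescing. */
void row_sums_nested(int *sums, const int *src, int rows, int cols)
{
    for (int i = 0; i < rows; i++) {
        int acc = 0;                    /* outer-loop code before the inner loop */
        for (int j = 0; j < cols; j++)
            acc += src[i * cols + j];
        sums[i] = acc;                  /* outer-loop code after the inner loop */
    }
}

/* Hand-written equivalent of coalescing: one loop, with the former
   outer-loop code guarded so it runs only at row boundaries. */
void row_sums_coalesced(int *sums, const int *src, int rows, int cols)
{
    int total = rows * cols;
    int i = 0, j = 0, acc = 0;
    for (int n = 0; n < total; n++) {
        if (j == 0)
            acc = 0;                    /* runs once per original outer iteration */
        acc += src[n];
        if (++j == cols) {              /* last inner iteration of this row */
            sums[i] = acc;
            j = 0;
            i++;
        }
    }
}
```

In the coalesced version, the single loop is pipelined once for the whole nest, at the cost of the extra guards around the former outer-loop code.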
To perform loop collapsing or loop coalescing, the compiler must be able to software pipeline the combined loop. This means that the loop nest must not contain function calls. Each loop must have a signed counting iterator and must iterate a fixed number of times each time it executes. That is, the number of inner loop iterations must not depend on which outer loop iteration is executing. Also, the outer loop must not contain too much code; otherwise, the transformation will not improve performance. If the outer loop carries a memory dependence, loop coalescing and loop collapsing likely will not be performed.
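As a rough illustration of the trip-count requirement, the sketch below contrasts a rectangular loop nest with a triangular one. The function names (sum_rect, sum_lower_tri) are hypothetical, and whether the compiler actually transforms a given nest depends on its own analysis; the comments only describe which nest meets the conditions stated above.

```c
/* Rectangular nest: signed counters, no function calls, and the inner
   loop always runs cols times. This shape meets the stated conditions. */
int sum_rect(const int *a, int rows, int cols)
{
    int total = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            total += a[i * cols + j];
    return total;
}

/* Triangular nest: the inner loop runs i + 1 times, so its trip count
   varies with the outer iteration. This shape does not meet the
   fixed-trip-count condition. */
int sum_lower_tri(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++)
            total += a[i * n + j];
    return total;
}
```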
When loop collapsing or loop coalescing takes place, the software pipelined loop indicates the beginning loop source line ("Loop source line") near the top of the software pipeline information comment block. If this source line number refers to an outer loop, either the inner loop has been fully unrolled or the compiler has performed loop coalescing or collapsing. In cases of loop coalescing, the compiler uses special instructions, such as NLCINIT, TICK, GETP, and BNL. A description of these hardware features, collectively known as the "NLC", is beyond the scope of this document. More details of the NLC may be found in the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0).