SPRUIG3C January 2018 – August 2019 TDA4VM , TDA4VM-Q1
As generated, each loop level requires computation to control the loop: initialization of the LCV outside the loop, and an increment, compare, and branch inside the loop. The compiler and ISA provide a handful of mechanisms to reduce or eliminate this overhead.
First, the compiler typically converts simple counting loops to count down rather than up, slightly reducing the cost of the compare and branch operation.
Second, if the body of a loop consists entirely of another loop, the compiler can collapse the two loops into a single loop whose trip count is the product of the original loops.
Finally, and most significantly, the compiler can leverage an ISA feature called the Nested Loop Controller (NLC), which provides hardware loop control similar to VCOP for up to two levels of loop nesting.
The NLC allows loops to be collapsed even if the outer loop has code in addition to the inner loop, by providing automatically-generated predicates for the outer loop code.
The combination of the compiler’s collapsing and the NLC reduce loop control overhead to an acceptable degree. Virtually all kernels should have at least the two inner levels collapsed and controlled by the NLC. Often there are at least two levels that the compiler can collapse, so most loops execute as either singly nested loops, or in rare cases, doubly or triply nested loops. This is important because the C7x relies on software pipelining for performance, and only the inner loop can be pipelined.