SPRUIV4D May 2020 – May 2024
To correct the problem in the previous example caused by a loop-carried dependency, we need to tell the compiler that these arrays do not overlap in memory, and thus there is no memory dependence from one iteration to the next.
Many common digital signal processing loops contain one or more load operations, some
computation, and then a store operation. Typically, the loads are reading from an
array and the stores are storing values to an array. If the compiler does not know
that the arrays are separate (or do not overlap), the compiler must be conservative
and assume that the stores of iteration i may be needed in the loads of
iteration i+1, or i+2, etc. Therefore, it
is important to tell the compiler if the load and store arrays inhabit entirely
different memory areas (that is, the objects/arrays pointed-to do not overlap).
We can do this with the use of the restrict keyword. This keyword
tells the compiler that throughout the scope of the variable (array name or pointer
name used to access the array), accesses to that object or array will only be made
through that array name or pointer name.
Use of the restrict keyword effectively allows you to tell the compiler that a store to memory will not write to a location that the loads of later iterations read from. Successive iterations can therefore be overlapped when the compiler performs software pipelining, allowing the generated code to run faster.
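To make the dependence concrete, the following sketch calls an unqualified version of the loop with an output pointer that overlaps the input by one element. It is an illustration only and is not taken from the original example: the names weighted_sum_may_alias and overlap_demo_last and the data values are hypothetical.

```c
/* Hypothetical variant without restrict: the compiler must assume that
 * out may overlap a or b. */
void weighted_sum_may_alias(int *a, int *b, int *out,
                            int weight_a, int weight_b, int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = a[i] * weight_a + b[i] * weight_b;
    }
}

/* Calls the loop with out == a + 1, so the store of iteration i writes
 * the element that iteration i+1 loads. Returns the last element written. */
int overlap_demo_last(void)
{
    int buf[5] = {1, 0, 0, 0, 0};
    int b[4]   = {0, 0, 0, 0};

    /* a = buf, out = buf + 1: legal here because nothing is restrict-qualified. */
    weighted_sum_may_alias(buf, b, buf + 1, 2, 1, 4);

    /* buf is now {1, 2, 4, 8, 16}: each result fed the next iteration. */
    return buf[4];
}
```

With non-overlapping arrays the same inputs would produce {2, 0, 0, 0} in the output. Because the unqualified function must also give the correct answer for the overlapping call, the compiler cannot start the loads of one iteration before the store of the previous iteration completes. Note that passing overlapping pointers such as these to restrict-qualified parameters would be undefined behavior.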
This C function example uses the restrict keyword. The resulting Software Pipeline Information comment block will show that when the restrict keyword is used, the loop-carried dependence bound is zero, while the partitioned resource bound is two. This leads to a much-improved initiation interval (ii) of two cycles.
void weighted_sum(int *restrict a, int *restrict b, int *restrict out,
                  int weight_a, int weight_b, int n)
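Only the function signature is shown above. A minimal sketch of a complete function follows, assuming the body is an element-wise weighted sum of the two input arrays. The loop body is an assumption made for illustration and does not appear in this excerpt.

```c
/* Sketch of a complete restrict-qualified function. The loop body is an
 * assumed element-wise weighted sum, not taken from the text above. */
void weighted_sum(int *restrict a, int *restrict b, int *restrict out,
                  int weight_a, int weight_b, int n)
{
    for (int i = 0; i < n; i++) {
        /* restrict promises that this store does not alias a[] or b[],
         * so loads for later iterations may be issued before it. */
        out[i] = a[i] * weight_a + b[i] * weight_b;
    }
}
```

The caller is responsible for keeping the promise: a, b, and out must refer to non-overlapping memory for the duration of the call.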
The Texas Instruments C7000 C/C++ Compiler allows the restrict keyword to be used in both C and C++ modes, despite the restrict keyword not being part of the C++14 or C++17 standards.