SPRUIV4D May 2020 – May 2024
The compiler sometimes takes functions defined in header files and places their code at the call site, a transformation known as inlining. Removing the call allows software pipelining of an enclosing loop, which improves performance. The compiler may also inline a function simply to eliminate the cost of calling and returning from it.
In the following example, the add_and_saturate_to_255() function sums two values and caps the result at 255. It is called from saturated_vector_sum() in inlining.cpp, which includes the inlining.h file via a preprocessor #include directive.
// inlining.cpp
// Compile with "cl7x -mv7100 --opt_level=3
// --debug_software_pipeline --src_interlist"
#include "inlining.h"
void saturated_vector_sum(int * restrict a, int * restrict b,
                          int * restrict out, int n)
{
    #pragma MUST_ITERATE(1024,,)
    #pragma UNROLL(1)
    for (int i = 0; i < n; i++)
    {
        out[i] = add_and_saturate_to_255(a[i], b[i]);
    }
}
// inlining.h
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;
    return sum;
}
In this case, the compiler will inline the call to add_and_saturate_to_255() so that software pipelining can be performed. You can determine that inlining has been performed by looking at the bottom of the generated assembly file. Here, the compiler places a comment that add_and_saturate_to_255() has been inlined. Note that the function's identifier has been modified due to C++ name mangling.
;; Inlined function references:
;; [0] _Z23add_and_saturate_to_255ii
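The mangled identifier above can be decoded by hand. The following is a minimal sketch, assuming the mangling follows the common Itanium-style pattern that the identifier appears to match: the prefix _Z, the length of the function name, the name itself, and one i per int parameter. The helper mangle_int_function() is hypothetical and is not part of the compiler or its tools. It covers only non-member, non-namespaced functions whose parameters are all plain int.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only. Builds the mangled name of a
// non-member, non-namespaced function whose parameters are all plain `int`.
// Assumes at least one parameter; other cases are mangled differently
// and are not handled here.
std::string mangle_int_function(const std::string& name,
                                std::size_t num_int_params)
{
    std::string mangled = "_Z";              // prefix of a mangled name
    mangled += std::to_string(name.size());  // length of the identifier
    mangled += name;                         // the identifier itself
    mangled.append(num_int_params, 'i');     // one 'i' per int parameter
    return mangled;
}
```

Under these assumptions, calling mangle_int_function("add_and_saturate_to_255", 2) reproduces the identifier in the comment: the name is 23 characters long, and the two int parameters contribute the trailing ii.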
The inlining can also be seen in the generated assembly code: the loop contains no CALL instruction. Because inlining eliminates the function call, the loop can be software pipelined; a loop that calls another function cannot be. Note that because of code size concerns, the compiler does not automatically inline every call that could be inlined. See the C7000 Optimizing Compiler User's Guide for more information on inlining.
;*----------------------------------------------------------------------------*
;*        SINGLE SCHEDULED ITERATION
;*
;*        ||$C$C44||:
;*   0              TICK                               ; [A_U]
;*   1              SLDW    .D1     *D1++(4),BL0       ; [A_D1] |5|
;*   2              SLDW    .D2     *D2++(4),BL1       ; [A_D2] |5|
;*   3              NOP     0x5                        ; [A_B]
;*   8              ADDW    .L2     BL1,BL0,BL1        ; [B_L2] |5|
;*   9              VMINW   .L2     BL2,BL1,B0         ; [B_L2] |5|
;*  10              STW     .D1X    B0,*D0++(4)        ; [A_D1] |5|
;*     ||           BNL     .B1     ||$C$C44||         ; [A_B] |11|
;*  11              ; BRANCHCC OCCURS {||$C$C44||}     ; [] |11|
;*----------------------------------------------------------------------------*
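The scheduled iteration suggests what the loop looks like after inlining: two loads, one add (ADDW), one minimum (VMINW), and one store, with no call. The following is a rough source-level sketch of that result, not the compiler's actual intermediate form. The function name saturated_vector_sum_inlined() is hypothetical, and the mapping of the comparison onto a minimum against 255 is inferred from the listing (BL2 presumably holds 255). The restrict qualifiers and TI pragmas are omitted so that the sketch builds with any C++ compiler.

```cpp
#include <algorithm>

// Hypothetical hand-inlined equivalent of saturated_vector_sum().
// The body of add_and_saturate_to_255() is written directly in the loop,
// so the loop contains no function call.
void saturated_vector_sum_inlined(const int* a, const int* b,
                                  int* out, int n)
{
    for (int i = 0; i < n; i++)
    {
        int sum = a[i] + b[i];        // appears as ADDW in the listing
        out[i] = std::min(sum, 255);  // "if (sum > 255) sum = 255" as a minimum
    }
}
```

Writing the loop this way by hand is normally unnecessary; the sketch only shows why the inlined loop has a simple, call-free body.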