SPRUIG8J January 2018 – March 2024
The compiler can be directed to use the Streaming Engines (SE) and/or the Streaming Address Generators (SA) automatically with the --auto_stream option. The option accepts three settings: "off", "saving", and "no_saving".
C7100 and C7120 devices support only the "off" and "no_saving" modes. Because these devices provide no support for SE or SA context switching, the default is --auto_stream=off. To enable the optimization on these devices, specify --auto_stream=no_saving explicitly.
Newer devices, such as C7504, support all three modes and --auto_stream=saving is the default.
With --auto_stream enabled, the compiler converts memory accesses in nested loops whose addressing patterns fit an SE or SA configuration template. For example, suppose you have the following code:
void example1(char *in, char *restrict out, int len1, int len2)
{
    for (int i = 0; i < len1; i++)
        for (int j = 0; j < len2; j++)
            out[i*len1 + j] = in[i*len1 + j];
}
With --auto_stream enabled on C7504 devices, this code is vectorized and transformed so that it is equivalent to code that uses the following SE configuration:
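__SE_TEMPLATE_v1 tmplt = __gen_SE_TEMPLATE_v1();
tmplt.ICNT0  = 32;                            /* innermost dimension: 32 elements */
tmplt.ICNT1  = (len2>>5)+((len2&0x1f) != 0);  /* vectors per row: len2/32, rounded up */
tmplt.DIM1   = 32;                            /* stride between vectors in a row */
tmplt.ICNT2  = len1;                          /* outer loop trip count (i) */
tmplt.DIM2   = len1;                          /* stride between rows, from i*len1 */
tmplt.VECLEN = __SE_VECLEN_32ELEMS;           /* 32 elements per vector */
tmplt.DIMFMT = __SE_DIMFMT_3D;                /* three-dimensional stream */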
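The template's loop counts and strides can be checked on a host machine with a small portable C model. The sketch below is illustrative only and does not use the C7000 streaming intrinsics. The helper names icnt1_for and vector_start_offset are hypothetical, and the model assumes that DIM1 and DIM2 are expressed in elements, as in the template above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 32-element vectors needed to cover len2 elements, rounded up.
   This is the same expression assigned to ICNT1 in the template. */
static int icnt1_for(int len2)
{
    assert(len2 >= 0);
    return (len2 >> 5) + ((len2 & 0x1f) != 0);
}

/* Hypothetical model of the stream's address pattern: element offset of
   the first lane of the vector at outer index i (dimension 2, stride
   DIM2 = len1) and vector index v (dimension 1, stride DIM1 = 32). */
static size_t vector_start_offset(int i, int v, int len1)
{
    assert(i >= 0 && v >= 0 && len1 >= 0);
    return (size_t)i * (size_t)len1 + (size_t)v * 32u;
}
```

For example, with len2 = 70 the model yields three vectors per row, the last of which covers only six elements of that row. With len1 = 64, the vector at i = 1, v = 2 starts at element offset 128, which matches in[i*len1 + j] for j = 64 in the scalar loop.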
For further information about how a stream is programmed, see Section 4.15.6.