SPRUIG8J January 2018 – March 2024
Intrinsics are also provided that map directly to a corresponding C7000 instruction. These intrinsics are not overloaded; they are the least abstracted from the hardware and are therefore considered low-level. Their primary purpose is to ensure that the compiler generates only the instructions the programmer intends. This is particularly useful for operations that require operand interleaving on input or operand deinterleaving on output.
All direct-mapped intrinsics are listed in c7x_direct.h, which is included by the top-level c7x.h file.
For example, the C7000 instruction VCMATMPYHW (vector complex matrix multiply) requires that its second source operand be interleaved along 64-bit boundaries. Its output must likewise be deinterleaved along 64-bit boundaries. As listed in c7x.h and c7x_direct.h, the C7000 compiler provides two interfaces to this instruction:
/*-----------------------------------------------------------------------------
* ID: __cmatmpy_ext
*----------------------------------------------------------------------------*/
/*
VCMATMPYHW
cint2 = __cmatmpy_ext(cshort2, cshort4);
cint4 = __cmatmpy_ext(cshort4, cshort8);
cint8 = __cmatmpy_ext(cshort8, cshort16);
*/
/*-----------------------------------------------------------------------------
* ID: __vcmatmpyhw_vww
*----------------------------------------------------------------------------*/
/*
VCMATMPYHW
__vcmatmpyhw_vww(cshort16, cshort16, cshort16, cint8&, cint8&);
*/
If you use the overloaded “__cmatmpy_ext(…)” intrinsic, the compiler assumes that neither the input nor the output data is interleaved and hides the interleaving from you. To do so, it inserts extra instructions to interleave the input before the VCMATMPYHW instruction executes and extra instructions to deinterleave the output afterward. This method favors programmer ease of use at the expense of instruction cycles.
More advanced programmers may instead opt to use the direct-mapped, low-level “__vcmatmpyhw_vww(…)” intrinsic and manage the interleaving and deinterleaving themselves. In this case, the interleaved input is shown as a pair of cshort16 vectors, and the output is given as a pair of cint8 vectors, each of which is determined by the maximum width and basic type supported by the VCMATMPYHW instruction.
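To make the idea of interleaving along 64-bit boundaries more concrete, the following host-side C sketch models a 512-bit vector as eight 64-bit blocks and shows one possible block-wise interleave and its inverse. It is an illustration only: the type vec512_model and the functions interleave64 and deinterleave64 are hypothetical names that are not part of c7x.h, and the block ordering shown is an assumption. The exact layout expected by VCMATMPYHW is not specified in this section and should be taken from the instruction set documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model (not a c7x.h type): one 512-bit vector
 * viewed as eight 64-bit blocks. A 64-bit block holds two cshort
 * elements, each a 16-bit real part plus a 16-bit imaginary part. */
#define BLOCKS_PER_VEC 8
#define HALF_BLOCKS    (BLOCKS_PER_VEC / 2)

typedef struct {
    uint64_t blk[BLOCKS_PER_VEC];
} vec512_model;

/* Interleave a and b one 64-bit block at a time (assumed ordering):
 *   lo = a0 b0 a1 b1 a2 b2 a3 b3
 *   hi = a4 b4 a5 b5 a6 b6 a7 b7
 * The outputs must not alias the inputs. */
static void interleave64(const vec512_model *a, const vec512_model *b,
                         vec512_model *lo, vec512_model *hi)
{
    assert(lo != a && lo != b && hi != a && hi != b && lo != hi);
    for (size_t i = 0; i < HALF_BLOCKS; ++i) {
        lo->blk[2 * i]     = a->blk[i];
        lo->blk[2 * i + 1] = b->blk[i];
        hi->blk[2 * i]     = a->blk[i + HALF_BLOCKS];
        hi->blk[2 * i + 1] = b->blk[i + HALF_BLOCKS];
    }
}

/* Inverse of interleave64: recover a and b from the lo/hi pair.
 * The outputs must not alias the inputs. */
static void deinterleave64(const vec512_model *lo, const vec512_model *hi,
                           vec512_model *a, vec512_model *b)
{
    assert(a != lo && a != hi && b != lo && b != hi && a != b);
    for (size_t i = 0; i < HALF_BLOCKS; ++i) {
        a->blk[i]               = lo->blk[2 * i];
        b->blk[i]               = lo->blk[2 * i + 1];
        a->blk[i + HALF_BLOCKS] = hi->blk[2 * i];
        b->blk[i + HALF_BLOCKS] = hi->blk[2 * i + 1];
    }
}
```

With the overloaded intrinsic, the compiler generates the equivalent of both steps around every VCMATMPYHW. With the direct-mapped intrinsic, the programmer may be able to arrange data once, for example outside an inner loop, and so avoid repeating that work.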