SPRUIG5E January 2018 – March 2023 TDA4VM, TDA4VM-Q1
Accessing a portion of a vector type may be “free” on C6000 devices, but requires an extra instruction on C7000 devices.
For example, accessing one element of an int4 is likely to be free on C6000, because an int4 on C6000 is composed of four 32-bit registers; the compiler can simply use the appropriate 32-bit register. On C7000 devices, however, an entire int4 is located in a single vector register. Therefore, accessing one element of an int4 requires the compiler to use an instruction (such as VGETW) to extract that data.
Similarly, packing elements into an int4 vector is likely free or almost free on C6000 devices, while on C7000 devices it may require a sequence of instructions (such as VPUTW instructions).
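The two operations discussed above can be sketched in portable C. This is only an illustration: int4_sketch, third_element(), and pack4() are hypothetical names introduced here, and the struct is a stand-in so that the code builds with any C compiler. On a TI toolchain the built-in int4 vector type would be used instead, and the instructions actually emitted depend on the compiler and optimization settings.

```c
#include <stdint.h>

/* Hypothetical stand-in for the int4 vector type (four 32-bit lanes).
 * It exists only so this sketch compiles on a host C compiler. */
typedef struct {
    int32_t s0, s1, s2, s3;
} int4_sketch;

/* Element access. On C6000 this can amount to reading one of four
 * 32-bit registers; on C7000 the compiler may need an extract
 * instruction (such as VGETW) to pull the lane out of the vector
 * register. */
int32_t third_element(int4_sketch v)
{
    return v.s2;
}

/* Packing four scalars into a vector. On C7000 this may require a
 * sequence of insert instructions (such as VPUTW). */
int4_sketch pack4(int32_t a, int32_t b, int32_t c, int32_t d)
{
    int4_sketch v = { a, b, c, d };
    return v;
}
```

When such accesses appear inside a hot loop on C7000, it may be worth checking the generated assembly to see whether the extract or insert instructions are on the critical path.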
If the C7000 compiler is able to vectorize the code further, the performance penalty may be mitigated in some cases. For example, extracting the low 32 bits of a 64-bit value with _loll() may be vectorized into VDEAL2W.
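The following is a minimal sketch of the kind of loop in which such vectorization could apply. The names lo32() and extract_low_words() are hypothetical; lo32() is a portable stand-in for the _loll() intrinsic so that the example builds on a host compiler. Whether a given compiler version actually vectorizes this loop is an assumption, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the _loll() intrinsic:
 * returns the low 32 bits of a 64-bit value. */
static uint32_t lo32(uint64_t x)
{
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Extract the low word of every 64-bit element. Because each
 * iteration is independent, a vectorizing compiler may be able to
 * process several elements per instruction instead of issuing one
 * extract per element. */
void extract_low_words(const uint64_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = lo32(src[i]);
    }
}
```

Keeping the extraction inside a simple counted loop over contiguous arrays, as here, generally gives the compiler the best chance of vectorizing it.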