SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
In general, there are two kinds of errors that result from narrowing lanes.
The first is when the upper bits indicate signedness. For example, on VCOP the value 0x00.FFFF.FFFF represents a large unsigned number (4294967295), whereas the value 0xFF.FFFF.FFFF represents a negative number (-1). When translated, both values narrow to the same 32-bit result, 0xFFFF.FFFF, which could be either value depending on whether it’s treated as signed or unsigned.
By default, the migration tool treats all 32-bit values as signed. This covers the majority of cases, since the 40-bit values in VCOP are always treated as signed. In some cases, however, this can lead to incorrect results. For example, when the value 0x00.FFFF.FFFF is right-shifted or compared, VCOP treats it as positive while the C7x translation treats it as negative.
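The difference can be reproduced on a host machine with ordinary C. The sketch below is not VCOP or C7x code. It models a 40-bit lane in an int64_t and a signed 32-bit lane in an int32_t, and every function name in it (vcop_lane40, c7x_lane32_signed, asr64, and the four wrappers) is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 40 bits of raw as a signed 40-bit value
 * (host-side model of a VCOP lane). */
static int64_t vcop_lane40(uint64_t raw)
{
    const uint64_t sign = 0x8000000000ULL;   /* bit 39 */
    raw &= 0xFFFFFFFFFFULL;                  /* keep 40 bits */
    return (int64_t)(raw ^ sign) - (int64_t)sign;
}

/* Keep only the low 32 bits and interpret them as signed
 * (host-side model of the default translation). */
static int32_t c7x_lane32_signed(uint64_t raw)
{
    uint32_t low = (uint32_t)raw;
    return (int32_t)((int64_t)(low ^ 0x80000000u) - 0x80000000LL);
}

/* Arithmetic right shift that avoids implementation-defined behavior. */
static int64_t asr64(int64_t v, unsigned n)
{
    assert(n < 64);
    return v >= 0 ? v >> n : ~(~v >> n);
}

/* Right shift as each model sees it. */
int64_t shift_as_40bit(uint64_t raw, unsigned n)
{
    return asr64(vcop_lane40(raw), n);
}

int64_t shift_as_signed32(uint64_t raw, unsigned n)
{
    return asr64(c7x_lane32_signed(raw), n);
}

/* The comparison "value > 0" as each model sees it. */
int positive_as_40bit(uint64_t raw)
{
    return vcop_lane40(raw) > 0;
}

int positive_as_signed32(uint64_t raw)
{
    return c7x_lane32_signed(raw) > 0;
}
```

For the input 0x00.FFFF.FFFF, the 40-bit model shifts right by 4 to 0x0FFF.FFFF and reports the value as positive, while the signed 32-bit model shifts to -1 and reports it as not positive. For 0xFF.FFFF.FFFF the two models agree.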
To help address this issue, the migration tool treats 32-bit values as unsigned in some cases. This can happen in two ways. First, if the vector is loaded from an unsigned base pointer (__vptr_uchar, __vptr_ushort, or __vptr_uint), its element type becomes unsigned. Second, you can force a vector to be unsigned by declaring it using the __vector_uint32 keyword (rather than the normal __vector).
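The effect of an unsigned element type can be illustrated with a small host-side model in ordinary C. This is a sketch only: it does not use the kernel declaration syntax, and the names sign_extend40, lane32_unsigned, and unsigned_matches_40bit are hypothetical helpers introduced here.

```c
#include <stdint.h>

/* Interpret the low 40 bits of raw as a signed 40-bit value
 * (host-side model of a VCOP lane). */
static int64_t sign_extend40(uint64_t raw)
{
    const uint64_t sign = 0x8000000000ULL;   /* bit 39 */
    raw &= 0xFFFFFFFFFFULL;                  /* keep 40 bits */
    return (int64_t)(raw ^ sign) - (int64_t)sign;
}

/* Low 32 bits interpreted as unsigned
 * (host-side model of an unsigned vector element). */
uint32_t lane32_unsigned(uint64_t raw)
{
    return (uint32_t)raw;
}

/* Returns 1 when the unsigned 32-bit lane holds the same numeric value
 * as the 40-bit lane, and 0 otherwise. */
int unsigned_matches_40bit(uint64_t raw)
{
    return (int64_t)lane32_unsigned(raw) == sign_extend40(raw);
}
```

In this model, 0x00.FFFF.FFFF keeps its VCOP value when the lane is unsigned, and a right shift by 4 gives 0x0FFF.FFFF as it would on VCOP. The value 0xFF.FFFF.FFFF (-1 on VCOP) does not match, which suggests that the unsigned treatment suits only vectors whose values are never negative.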
The following operations are affected by the signedness of vector elements.
Compatibility Warning: Unsigned vector elements |
---|
If a kernel relies on vector elements being treated as unsigned when bit 31 is set, the translated code may not work properly. Most such issues can be fixed by declaring the vector as __vector_uint32. |
The second error that can result from the reduced lane width is when values have significance in the upper 8 bits. On VCOP these bits are typically used as overflow (guard) bits for accumulation loops, or to hold the upper bits of extended multiply operations. Here is a partial list of VCOP operations that use the guard bits:
- (Vdst1,Vdst2) = Vsrc1 + hi(Vsrc2)
- Vdst = jus16(src)

The migration tool does not attempt to account for or detect these incompatibilities. The resultant code will likely fail at run time.
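The loss of guard bits in an accumulation loop can be sketched in ordinary C on a host machine. The model below is an illustration under stated assumptions, not VCOP or C7x code: it assumes the 32-bit lane simply wraps modulo 2^32 (a saturating operation would behave differently), and the names accumulate_40bit and accumulate_32bit are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n values in a model of a 40-bit lane.
 * The upper 8 bits act as guard bits for the accumulation. */
int64_t accumulate_40bit(const int32_t *x, size_t n)
{
    const uint64_t mask = 0xFFFFFFFFFFULL;   /* 40 bits */
    const uint64_t sign = 0x8000000000ULL;   /* bit 39 */
    uint64_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc = (acc + (uint64_t)(int64_t)x[i]) & mask;   /* wrap at 40 bits */

    return (int64_t)(acc ^ sign) - (int64_t)sign;       /* sign-extend bit 39 */
}

/* Sum the same values in a 32-bit lane.
 * Assumption: plain modulo-2^32 wraparound, no saturation. */
int32_t accumulate_32bit(const int32_t *x, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc += (uint32_t)x[i];                          /* wrap at 32 bits */

    return (int32_t)((int64_t)(acc ^ 0x80000000u) - 0x80000000LL);
}
```

Summing four copies of 0x4000.0000 gives 0x01.0000.0000 (4294967296) in the 40-bit model, with the carry held in the guard bits, and 0 in the 32-bit model.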
Compatibility Warning: Reliance on 40-bit elements |
---|
If a kernel depends on more than 32 bits of precision in vector elements, the translated code may not work properly. |