SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
VCOP has three distribution modes that store vector elements in a non-linear fashion. The OFFSET_NP1 mode stores each element 9 words apart, effecting a 9x8 transpose. (The 9-word offset causes each element to be stored into a different memory bank, so that the store can proceed in parallel.) The SDDA (`s_scatter` in Kernel-C) mode stores each element at a data-dependent offset; the offsets come from another vector. The PDDA (`p_scatter` in Kernel-C) mode is like SDDA, except that the offsets must place each element in a different bank so that the elements can be stored in parallel. The PDDA mode is also typically used for transpose.
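As a rough illustration of the OFFSET_NP1 addressing described above, the following plain C model (neither Kernel-C nor C7x code) stores lane i at word offset 9*i and checks that the lanes fall into different banks. The bank count of 8, the word-interleaved bank mapping, the one-word advance of the base address per stored row, and all function names are assumptions made for this sketch, not details taken from the hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

enum {
    LANES     = 8,         /* elements per vector in this model */
    STRIDE    = LANES + 1, /* the "N plus 1" word offset between lanes */
    NUM_BANKS = 8          /* ASSUMPTION: 8 word-interleaved banks */
};

/* Scalar model of an OFFSET_NP1-style store: lane i is written to
 * base[i * STRIDE]. Real hardware would write all lanes at once. */
static void store_offset_np1(int32_t *base, const int32_t *vec)
{
    for (int lane = 0; lane < LANES; ++lane)
        base[lane * STRIDE] = vec[lane];
}

/* Returns 1 if every lane maps to a different bank for the given stride,
 * under the assumed mapping bank = word_address % NUM_BANKS. */
static int lanes_hit_distinct_banks(int stride)
{
    unsigned seen = 0;
    for (int lane = 0; lane < LANES; ++lane) {
        unsigned bit = 1u << ((lane * stride) % NUM_BANKS);
        if (seen & bit)
            return 0;
        seen |= bit;
    }
    return 1;
}

/* Stores LANES rows of a row-major LANES x LANES input. ASSUMPTION: the
 * base advances by one word per row, so scratch[c * STRIDE + r] ends up
 * holding in[r][c]. scratch must hold LANES * STRIDE words. */
static void transpose_via_np1(int32_t *scratch, const int32_t *in)
{
    assert(lanes_hit_distinct_banks(STRIDE));
    for (int r = 0; r < LANES; ++r)
        store_offset_np1(scratch + r, in + r * LANES);
}
```

In this model, reading `scratch` with a row pitch of 9 words yields the transposed matrix. A stride of exactly 8 words would place every lane in the same assumed bank, which is why the extra word matters.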
There is no direct support for these operations on C7x. On C7x, transpose is supported by the Streaming Engine, but only for loads and not for stores.
For the specific case of OFFSET_NP1 used to store into a scratch buffer for a transpose operation, the migration tool may detect this and rearrange the layout of the scratch buffer so as to implement the transpose using the Streaming Engine. See Section 5.3.3.
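The idea of moving the transpose from the store side to the load side can be sketched in plain C. This is only a scalar model: it does not use the Streaming Engine API, and the 8x8 shape, the linear scratch layout, and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { ROWS = 8, COLS = 8 };

/* Producer side: each row is written contiguously, so no scatter-style
 * store is needed. */
static void store_rows_linear(int32_t *scratch, const int32_t *in)
{
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            scratch[r * COLS + c] = in[r * COLS + c];
}

/* Consumer side: a strided read gathers one column of the scratch buffer
 * into consecutive elements of out. This models a transposing load. */
static void load_column(int32_t *out, const int32_t *scratch, int col)
{
    assert(col >= 0 && col < COLS);
    for (int r = 0; r < ROWS; ++r)
        out[r] = scratch[r * COLS + col];
}
```

The net effect matches the store-side transpose, but the non-linear access now happens on the read, which is the direction in which C7x supports transpose.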
Otherwise, the migration tool generates naive sequences for these modes, storing elements one at a time rather than in parallel.
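To give a sense of what an element-at-a-time sequence looks like, here is a minimal C model of a data-dependent scatter. It is not the code the migration tool emits. The function name, the use of element (rather than byte) offsets, and the bounds check are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };

/* Scalar model of a data-dependent scatter: one store per lane, issued
 * sequentially. ASSUMPTION: offsets are element indices into base. If two
 * lanes share an offset, the later lane overwrites the earlier one in this
 * model; that says nothing about hardware behavior. */
static void scatter_naive(int32_t *base, size_t base_len,
                          const int32_t *vec, const uint32_t *offsets)
{
    for (int lane = 0; lane < LANES; ++lane) {
        assert(offsets[lane] < base_len);
        base[offsets[lane]] = vec[lane];
    }
}
```

Each vector costs eight separate stores here instead of one parallel store.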
Efficiency Warning: Parallel Stores |
---|
Unless the above optimizations are enabled, kernels that use these distribution modes will have inefficient translations. |