SPRUI30H November 2015 – May 2024 DRA745, DRA746, DRA750, DRA756
The VLD instruction specifies the data distribution, data type, base address, agen, and the vector register to load the data into. The format of the VLD instruction in assembly is:
VLDtype_distribution base [agen], vreg
The base field points to a pair of parameter registers (even/odd pair only), for example base = P8 to refer to the P8:P9 pair. The vector data memory space is 20-bit, with the lower 16 bits encoded in the even parameter register (P8 in this case) and the upper 4 bits in the odd parameter register (P9 in this case).
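As an illustrative sketch of how the 20-bit address could be assembled from the register pair, the Python below combines the two values. It assumes that the upper 4 bits occupy bits 3:0 of the odd register, which the text above does not state. `base_address` is a hypothetical helper name used only for this example.

```python
def base_address(p_even: int, p_odd: int) -> int:
    """Combine an even/odd parameter register pair into a 20-bit address.

    Assumption: the upper 4 address bits sit in bits 3:0 of the odd register.
    """
    low16 = p_even & 0xFFFF   # lower 16 bits from the even register (e.g. P8)
    high4 = p_odd & 0xF       # upper 4 bits from the odd register (e.g. P9)
    return (high4 << 16) | low16


print(hex(base_address(0x1234, 0x5)))  # 0x51234
```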
The destination vector register specified in VLD must be an even register. VLD_DINTRLV loads into an even/odd pair of registers. VLD with any other distribution option loads into one register. A loop can contain up to eight VLD instructions, which can potentially load all 16 registers. No register can be the destination of more than one VLD instruction in the same loop.
As EVE is an N-way SIMD machine, loading of N consecutive data points from local memory is supported, with each data point being a byte, short, or long (32-bit) word, either signed or unsigned.
In addition to loading N data points, EVE supports a few other distribution options. EVE supports the following:
For CUST_Pi options, the number of bits required to specify the address offset for each way of SIMD depends on N, the number of SIMD ways. Arbitrary distribution of N data items (each a byte, halfword, or word) is supported. For N = {2 | 4 | 8 | 16}, 4 bits are used to encode each destination. For N = 32, 5 bits are used, and each 16-bit parameter register supplies three fields (leaving 1 bit unused).
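The field widths described above can be sketched as a small unpacking routine. This is only a model: it assumes fields are packed starting at the least-significant bit of each parameter register, which this section does not specify, and `unpack_cust_fields` is a hypothetical name.

```python
def unpack_cust_fields(param_regs, n_ways):
    """Extract per-way source indices from 16-bit parameter registers.

    Assumption: fields are packed starting at the least-significant bit.
    """
    bits = 5 if n_ways == 32 else 4   # field width: 5 bits for N = 32, else 4
    per_reg = 16 // bits              # 4 fields (4-bit) or 3 fields (5-bit)
    mask = (1 << bits) - 1
    fields = []
    for reg in param_regs:
        for k in range(per_reg):
            if len(fields) < n_ways:
                fields.append((reg >> (k * bits)) & mask)
    if len(fields) < n_ways:
        raise ValueError("not enough parameter registers for n_ways")
    return fields


print(unpack_cust_fields([0x3210, 0x7654], 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this packing assumption, N = 8 needs two parameter registers and N = 32 needs eleven (three 5-bit fields per register).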
The DINTRLV option is supported so software can fully utilize the two function units for simple operation sequences, by processing 2N data points in parallel.
A VLD instruction normally loads into one destination vector register. VLD_DINTRLV is the exception in that it loads into a pair of vector registers. Only even registers (V0, V2, ..., V14) can be referenced in the destination register field. For VLD_DINTRLV, the referenced register and the next register are the destinations, for example V0 and V1.
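The destination register rules for a loop can be expressed as a simple check. The sketch below is a model of those rules only, not a TI tool; `check_vld_destinations` and the tuple representation of a VLD are assumptions made for illustration.

```python
def check_vld_destinations(loads):
    """Validate VLD destinations for one loop.

    loads: list of (distribution, dest_reg_index) tuples, e.g. ("NPT", 0).
    Returns the set of vector registers written; raises ValueError on a violation.
    """
    if len(loads) > 8:
        raise ValueError("at most eight VLD instructions per loop")
    written = set()
    for dist, reg in loads:
        if reg % 2 != 0 or not 0 <= reg <= 14:
            raise ValueError(f"V{reg}: destination must be an even register V0..V14")
        # DINTRLV writes the referenced register and the next one
        targets = [reg, reg + 1] if dist == "DINTRLV" else [reg]
        for t in targets:
            if t in written:
                raise ValueError(f"V{t} is the destination of more than one VLD")
            written.add(t)
    return written


print(sorted(check_vld_destinations([("NPT", 0), ("DINTRLV", 2)])))  # [0, 2, 3]
```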
The following data types are supported:
Address offsets for each way of SIMD, as a function of distribution option and data type, are shown in Table 8-353 and in Figure 8-63, Figure 8-64, and Figure 8-65.
Distribution(1)(2) | vreg[r][0] gets | vreg[r][1] gets | vreg[r][2] gets | vreg[r][3] gets | vreg[r][4] gets | vreg[r][5] gets | vreg[r][6] gets | vreg[r][7] gets |
---|---|---|---|---|---|---|---|---|
NPT | data[0] | data[1] | data[2] | data[3] | data[4] | data[5] | data[6] | data[7] |
1PT | data[0] | data[0] | data[0] | data[0] | data[0] | data[0] | data[0] | data[0] |
CIRC2 | data[0] | data[1] | data[0] | data[1] | data[0] | data[1] | data[0] | data[1] |
DS2 | data[0] | data[2] | data[4] | data[6] | data[8] | data[10] | data[12] | data[14] |
US2 | data[0] | data[0] | data[1] | data[1] | data[2] | data[2] | data[3] | data[3] |
DINTRLV | vreg[r][0] = data[0], vreg[r+1][0] = data[1] | vreg[r][1] = data[2], vreg[r+1][1] = data[3] | vreg[r][2] = data[4], vreg[r+1][2] = data[5] | vreg[r][3] = data[6], vreg[r+1][3] = data[7] | vreg[r][4] = data[8], vreg[r+1][4] = data[9] | vreg[r][5] = data[10], vreg[r+1][5] = data[11] | vreg[r][6] = data[12], vreg[r+1][6] = data[13] | vreg[r][7] = data[14], vreg[r+1][7] = data[15] |
CUST_Pi(3) | data[pf[0]] | data[pf[1]] | data[pf[2]] | data[pf[3]] | data[pf[4]] | data[pf[5]] | data[pf[6]] | data[pf[7]] |
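The rows of the table can be mirrored in a short behavioral model for N = 8. This is a sketch for checking element indices only; the function name `vld_distribute`, the string keys, and the use of Python lists for memory and registers are illustrative assumptions.

```python
def vld_distribute(data, dist, n=8, pf=None):
    """Model which memory elements each SIMD way receives.

    data: element list starting at the load address; pf: CUST index fields.
    Returns one list, or a (vreg[r], vreg[r+1]) pair for DINTRLV.
    """
    if dist == "NPT":
        return [data[i] for i in range(n)]
    if dist == "1PT":
        return [data[0]] * n
    if dist == "CIRC2":
        return [data[i % 2] for i in range(n)]
    if dist == "DS2":
        return [data[2 * i] for i in range(n)]
    if dist == "US2":
        return [data[i // 2] for i in range(n)]
    if dist == "DINTRLV":
        return ([data[2 * i] for i in range(n)],
                [data[2 * i + 1] for i in range(n)])
    if dist == "CUST":
        return [data[pf[i]] for i in range(n)]
    raise ValueError(f"unknown distribution: {dist}")


data = list(range(16))
print(vld_distribute(data, "US2"))      # [0, 0, 1, 1, 2, 2, 3, 3]
print(vld_distribute(data, "DINTRLV"))  # ([0, 2, ..., 14], [1, 3, ..., 15])
```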
Loads are automatically predicated. A load is always performed in the very first iteration, and subsequently whenever the associated agen address pointer changes.
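The predication rule can be illustrated with a small model that marks, per iteration, whether the load is actually performed. It is a sketch of the stated rule only; `load_issue_mask` and the list of per-iteration agen addresses are hypothetical.

```python
def load_issue_mask(agen_addresses):
    """Return, per iteration, whether the load is actually performed."""
    performed = []
    previous = None
    for i, addr in enumerate(agen_addresses):
        # always load on the first iteration, then only when the address changes
        performed.append(i == 0 or addr != previous)
        previous = addr
    return performed


print(load_issue_mask([0, 0, 4, 4, 4, 8]))
# [True, False, True, False, False, True]
```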
See Section 8.3.5.5.10 for the address alignment requirements for the various data types and distribution options.
LD_EXP is a load with expansion: it reads a compacted (collated) array and expands the elements to their original locations. It is the opposite of ST_COLLAT (collating store). The per-item address increment is implied by the load data type (1, 2, or 4 bytes for the byte, halfword, or word type). On each iteration, the load pointer (VCOP_LD_PTR_i/VCOP_ST_PTR_j) is incremented by the number of nonzero entries in predicate register V2 times the data size. The agen is not used, so the agen field is a don't-care and is not required in the assembly code:
VLDtype_EXP base, vreg
For example, consider this instruction with V2 = {0, 0, 1, 0, 1, 1, 0, 0} and load_ptr = 0x100:
LDBU_EXP Pbase[A0], V1
After this instruction executes, the result is:
V1 = {0, 0, mem[0x100], 0, mem[0x101], mem[0x102], 0, 0}, and load_ptr = 0x103
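The example above can be reproduced with a short behavioral model. This is a sketch under stated assumptions: memory is modeled as a Python dict keyed by address, ways with a zero predicate receive 0 as in the example, and `ld_exp` is a hypothetical name.

```python
def ld_exp(mem, load_ptr, predicate, elem_size=1):
    """Model a load with expansion.

    mem: mapping from address to element value; predicate: per-way flags (V2).
    Returns (destination vector, updated load pointer).
    """
    dest = []
    ptr = load_ptr
    for flag in predicate:
        if flag != 0:
            dest.append(mem[ptr])  # next compacted element goes to this way
            ptr += elem_size       # advance by the data size
        else:
            dest.append(0)         # ways with a zero predicate receive 0
    return dest, ptr


mem = {0x100: 11, 0x101: 22, 0x102: 33}
v1, ptr = ld_exp(mem, 0x100, [0, 0, 1, 0, 1, 1, 0, 0])
print(v1, hex(ptr))  # [0, 0, 11, 0, 22, 33, 0, 0] 0x103
```

The pointer advances by three elements because V2 has three nonzero entries and the byte type has a data size of 1.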
Load with expansion can be used only in a table lookup VLOOP. This is because the dependency (where the expanding load depends on predicate register data from another load) resembles that of a table lookup, which makes it feasible to achieve N data points per clock cycle inside the table lookup pipeline. As in a normal table lookup, no operation instructions are allowed in a table lookup VLOOP.