SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
VCOP has dedicated support for Lookup and Histogram Table (LHT) operations, which access memory in a non-linear fashion. Both operate by indexing into a table using index values that contain offsets from the table base. For a lookup table, the lookup operation loads from memory at the indexed locations and returns the values. For a histogram, the update operation increments the memory at the indexed locations.
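To make the two operations concrete, here is a scalar reference model of the single-table case. It sketches the semantics described above and does not model the VCOP or C7x instructions themselves. The function names `lookup_ref` and `hist_ref` and the 16-bit index type are hypothetical choices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a single-table lookup: each index is an
// element offset from the table base, and the stored value is returned
// in the corresponding output lane.
template <typename T, std::size_t Lanes>
std::array<T, Lanes> lookup_ref(const std::vector<T>& table,
                                const std::array<std::uint16_t, Lanes>& idx) {
    std::array<T, Lanes> out{};
    for (std::size_t i = 0; i < Lanes; ++i) {
        assert(idx[i] < table.size());  // index must stay inside the table
        out[i] = table[idx[i]];
    }
    return out;
}

// Hypothetical scalar model of a single-table histogram update: the element
// at each indexed location is incremented by one. In this model a repeated
// index is counted once per occurrence.
template <typename T, std::size_t Lanes>
void hist_ref(std::vector<T>& table,
              const std::array<std::uint16_t, Lanes>& idx) {
    for (std::size_t i = 0; i < Lanes; ++i) {
        assert(idx[i] < table.size());
        table[idx[i]] += 1;
    }
}
```

For example, looking up indices {3, 0, 3, 1} in the table {10, 20, 30, 40} yields {40, 10, 40, 20}. Applying the same indices to a zeroed 4-bin histogram yields {1, 1, 0, 2}.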
Both lookup table and histogram support the notion of parallel tables, in which up to 8 distinct tables are referenced using index values contained in the lanes of a vector register (and in the case of lookup, values read from the table are returned in corresponding lanes). To enable parallel access to the tables, the tables are interleaved in memory such that each distinct table occupies a different memory bank or banks than the others.
C7x supports LHT operations in a very similar way. However, there are two important differences that fundamentally affect the translation of these operations to C7x.
First, VCOP supports LHT operations directly on its local memory, consisting of the IBUFA, IBUFB, and WBUF buffers. Each is 32KB, for a total of 96KB. Typically, these buffers will be mapped to L2 on C7x. However, C7x supports LHT operations only on a 32KB area of L1D. Therefore, LHT operations require table data to be copied from L2 into L1D before the operation and, in the case of histogram, copied back out afterward.
Second, VCOP’s memory is laid out as 8 banks, each 32 bits wide. Each full-width line from the memory interface contains 8x32 or 256 bits. C7x’s memory is laid out as 16 banks, each 64 bits wide. Each line contains 16x64 or 1024 bits. For LHT operations that use parallel tables, the bank geometry affects the table layout in memory.
For example, consider an LHT operation with 4 parallel tables and 32-bit elements. We’ll call them T0, T1, T2, and T3. (In the following description, for simplicity, “locations” refer to word offsets within the linear address space of table memory. In reality, both machines are byte addressable.) On VCOP, each table occupies 2 of the 8 banks. In each table line, the two banks corresponding to a given table contain 2 elements: locations 0 and 1 are part of table T0, locations 2 and 3 are part of T1, and so on. On C7x, each table occupies 4 of the 16 banks; in each line, the four 64-bit banks contain 8 table elements. Locations 0 to 7 are part of T0, locations 8 to 15 are part of T1, and so on. Figure 5-1 illustrates the two layouts for the table in this example. In this figure, TN[M] represents the element at index M of table N. Shading indicates memory banks. Heavy borders indicate table boundaries.
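Both layouts in this example follow one rule: each memory line is split evenly among the parallel tables, and each table's slice of a line holds consecutive elements. The sketch below encodes that rule for 32-bit elements. It takes the line widths from the bank geometry described above (8 x 32 bits for VCOP, 16 x 64 bits for C7x). The function name `table_location` is hypothetical. Applying the formula to other element sizes or table counts is an assumption beyond the worked example.

```cpp
#include <cassert>
#include <cstddef>

// Line widths in 32-bit words, derived from the bank geometry.
const std::size_t kVcopLineWords = 8 * 32 / 32;   // 8 banks x 32 bits = 256 bits
const std::size_t kC7xLineWords  = 16 * 64 / 32;  // 16 banks x 64 bits = 1024 bits

// Hypothetical helper: word location of element m of parallel table n,
// assuming 32-bit elements and a line split evenly among num_tables tables.
std::size_t table_location(std::size_t line_words, std::size_t num_tables,
                           std::size_t n, std::size_t m) {
    assert(num_tables > 0 && line_words % num_tables == 0 && n < num_tables);
    const std::size_t per_line = line_words / num_tables;  // elements per table per line
    return (m / per_line) * line_words  // first word of the line holding TN[M]
         + n * per_line                 // start of table n's slice in that line
         + m % per_line;                // position inside the slice
}
```

With 4 tables, this gives T1[0] at location 2 on VCOP but at location 8 on C7x. T0[2] wraps to the next VCOP line at location 8, while T0[8] is the first element to reach the second C7x line, at location 32.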
VCOP kernels assume tables are laid out according to VCOP’s geometry; therefore, this layout must be preserved at kernel boundaries. To operate on such tables with C7x LHT operations, the tables must be rearranged to conform to C7x’s geometry.
In general, a kernel that uses LHT operations must go through the following steps when translated to C7x:
1. (In the init() function) Compute the LTCR control register flags to configure the table for the number of parallel tables and element type, and store the configuration in the tvals structure.
In practice, steps 2a and 2b (copy and rearrange) are combined: the table is rearranged on-the-fly as it is copied. Similarly, steps 5a and 5b are combined.
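The combined copy-and-rearrange steps amount to a permutation between the two layouts, so they can be sketched as a single pass in each direction. The model below makes several assumptions: 32-bit elements, a table count that divides the VCOP line (1, 2, 4, or 8 tables), and the per-line split shown in the 4-table example. The names `loc`, `layout_words`, `copy_in`, and `copy_out` are hypothetical. It moves one element at a time for clarity; an efficient translation would presumably move whole lines with vector operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kVcopLineWords = 8;   // 8 banks x 32 bits = 8 words per line
const std::size_t kC7xLineWords  = 32;  // 16 banks x 64 bits = 32 words per line

// Word location of element m of table n when each line is split evenly
// among num_tables tables (32-bit elements assumed).
std::size_t loc(std::size_t line_words, std::size_t num_tables,
                std::size_t n, std::size_t m) {
    const std::size_t per_line = line_words / num_tables;
    return (m / per_line) * line_words + n * per_line + m % per_line;
}

// Words needed to hold num_tables tables of table_len elements each.
std::size_t layout_words(std::size_t line_words, std::size_t num_tables,
                         std::size_t table_len) {
    const std::size_t per_line = line_words / num_tables;
    return (table_len + per_line - 1) / per_line * line_words;
}

// Steps 2a + 2b: copy a VCOP-layout table into a C7x-layout buffer,
// rearranging on the fly.
void copy_in(const std::vector<std::uint32_t>& vcop,
             std::vector<std::uint32_t>& c7x,
             std::size_t num_tables, std::size_t table_len) {
    assert(num_tables > 0 && kVcopLineWords % num_tables == 0);
    for (std::size_t n = 0; n < num_tables; ++n)
        for (std::size_t m = 0; m < table_len; ++m)
            c7x.at(loc(kC7xLineWords, num_tables, n, m)) =
                vcop.at(loc(kVcopLineWords, num_tables, n, m));
}

// Steps 5a + 5b (histogram): copy the updated table back out, restoring
// the VCOP layout expected at the kernel boundary.
void copy_out(const std::vector<std::uint32_t>& c7x,
              std::vector<std::uint32_t>& vcop,
              std::size_t num_tables, std::size_t table_len) {
    assert(num_tables > 0 && kVcopLineWords % num_tables == 0);
    for (std::size_t n = 0; n < num_tables; ++n)
        for (std::size_t m = 0; m < table_len; ++m)
            vcop.at(loc(kVcopLineWords, num_tables, n, m)) =
                c7x.at(loc(kC7xLineWords, num_tables, n, m));
}
```

Because `copy_out` applies the inverse permutation of `copy_in`, a round trip through both restores the original VCOP-layout buffer.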
Efficiency Warning: LHT Operations
The translation of LHT operations suffers from the overhead introduced by copying and rearranging the table data. This step takes approximately one cycle per 512 bits (64 bytes) of data. The exact penalty depends on the size of the table.
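To show how the penalty scales, the sketch below applies the figure of approximately one cycle per 64 bytes to a table size. It rests on two assumptions: a histogram pays the same rate again to copy the table back out of L1D, and the table must fit in the 32KB LHT-capable L1D area. The function name `estimate_lht_overhead` is hypothetical, and the result is a rough estimate, not a measured cost.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kBytesPerCycle = 64;         // ~1 cycle per 512 bits, per the text
const std::size_t kL1dLhtBytes   = 32 * 1024;  // LHT-capable L1D area on C7x

// Hypothetical helper: rough copy/rearrange overhead in cycles.
// Assumption: a histogram's copy-out costs the same rate as its copy-in.
std::size_t estimate_lht_overhead(std::size_t table_bytes, bool is_histogram) {
    assert(table_bytes <= kL1dLhtBytes);  // table must fit in the L1D LHT area
    const std::size_t one_way = (table_bytes + kBytesPerCycle - 1) / kBytesPerCycle;
    return is_histogram ? 2 * one_way : one_way;
}
```

Under these assumptions, a 4KB lookup table costs about 64 cycles of overhead, and a 4KB histogram about 128.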
The virtual machine contains two template classes, vcop_lookup and vcop_hist. These implement the translation steps in the previous outline. The template parameters specify the element data type, the number of parallel tables, and, in the case of vcop_lookup, the interpolation degree. Both classes are based on another template called LHT_base, which provides functionality common to both classes.
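For orientation, the sketch below shows one way such a template hierarchy could be organized: a shared base template and two derived templates carrying the parameters named above. The class and member names (`LhtBaseSketch`, `LookupSketch`, `HistSketch`, `needs_copy_out`, and so on) are invented for illustration. They are not the virtual machine's actual interface, and restricting the table count to 1, 2, 4, or 8 is an assumption.

```cpp
#include <cstddef>

// Hypothetical outline only; names and members are illustrative and do not
// reflect the real vcop_lookup / vcop_hist / LHT_base interface.
template <typename T, int NumTables>
struct LhtBaseSketch {
    // Assumption: table counts are powers of two up to 8.
    static_assert(NumTables == 1 || NumTables == 2 || NumTables == 4 ||
                      NumTables == 8,
                  "assumed: 1, 2, 4, or 8 parallel tables");
    static constexpr int num_tables = NumTables;
    static constexpr std::size_t elem_bytes = sizeof(T);
    // Shared work (LTCR setup, layout conversion) would live here.
};

// Lookup adds the interpolation degree as a third template parameter.
template <typename T, int NumTables, int InterpDegree>
struct LookupSketch : LhtBaseSketch<T, NumTables> {
    static constexpr int interp_degree = InterpDegree;
};

// Histogram updates the table, so its results must be copied back out.
template <typename T, int NumTables>
struct HistSketch : LhtBaseSketch<T, NumTables> {
    static constexpr bool needs_copy_out = true;
};
```

Placing the parameters common to both operations in the base keeps the derived templates small. Only the lookup-specific interpolation degree and the histogram-specific copy-out differ between them.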