SPRUI30H November 2015 – May 2024 DRA745, DRA746, DRA750, DRA756
Histogram functionality includes normal histogram, in which the addressed bin entry is incremented by 1, and weighted histogram, in which the addressed bin entry is incremented by an element in the weight array input.
Histogram operation is indicated by starting the vector command with:
VLOOP HIST, CL#: cmd_len, PL#: param_len
The histogram operation uses a subset of the available resources and instructions:
For normal histogram, the increment value is initialized via a VINIT instruction on V4:
VINITHU_ONCE P1, V4
Data is loaded into V2 with a normal load, for example:
VLDBU_NPT data_base[A0], V2
Distribution options of VLD are {1PT, NPT, DS2, US2}. 1PT works for a single-histogram configuration; NPT works for 2, 4, or 8 parallel histograms.
Optional weights (for weighted histogram) are loaded into V4 with another normal load, for example:
VLDBU_NPT weight_base[A2], V4
A special histogram load, VHLD, is used to read in the bins:
VHLDtype_mHIST hist_base[hist_agen][V2], V0, RND_SAT: rnd_sat
The bins are then incremented with a normal VADD:
VADD V0, V4, V0
Results are stored back to the histogram memory with a special histogram store, VHST:
VHSTtype_mHIST V0, hist_base[hist_agen][V2]
The base and address generator must be the same in VHLD and VHST; otherwise, the outcome is undefined.
Both signed and unsigned types are supported for the data load, weight load, and histogram load/store. This allows unsigned histograms as well as weighted histograms with positive and negative weights.
Parameters for histogram are extracted from the instructions:
The histogram command takes the data array and processes num_par_hist data points at a time, assuming that many sets of parallel histogram bin storage. Each data value is rounded and saturated to generate a bin index. The addressed bin storage element (which can be a byte, halfword, or word) is then incremented by 1 or by the weight, and saturated to the minimum/maximum values of its type (signed or unsigned byte, halfword, or word).
There are restrictions on the weight data type (as indicated on the VLD into V4):
C representation of histogram processing for a single histogram is as follows.
for (i = 0; i < num_data; i++)
{
    data = data_base[i];
    bin = saturate(round(data));
    hist[bin]++;
}
Weighted histogram has the following C representation:
for (i = 0; i < num_data; i++)
{
    data = data_base[i];
    bin = saturate(round(data));
    hist[bin] += weight[i];
}
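The loops above do not show the saturation of the bin entry itself. The following is a minimal sketch of that step, assuming a signed halfword histogram with signed byte weights and bin indices that have already been rounded and saturated. The helper names sat_add_s16 and weighted_hist_s16 are illustrative only and are not part of the instruction set; the type combination is chosen for illustration, and the hardware restrictions on weight types are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add an increment to a signed halfword bin, clamping the result to
 * [INT16_MIN, INT16_MAX] instead of letting it wrap around. */
static int16_t sat_add_s16(int16_t bin_value, int32_t increment)
{
    int32_t sum = (int32_t)bin_value + increment;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Weighted histogram update over precomputed bin indices
 * (hypothetical helper; rounding/saturation of the index is assumed done). */
static void weighted_hist_s16(const uint8_t *bin_index, const int8_t *weight,
                              size_t num_data, int16_t *hist, size_t num_bins)
{
    for (size_t i = 0; i < num_data; i++) {
        assert(bin_index[i] < num_bins);
        hist[bin_index[i]] = sat_add_s16(hist[bin_index[i]], weight[i]);
    }
}
```

For a normal (unweighted) histogram, the same update applies with an increment of 1 for every data point.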
Histogram organization in memory depends on the histogram type and num_par_hist. The N physical banks of 32-bit memory are partitioned into num_par_hist logical banks, and each parallel histogram is organized inside its own logical bank, as with parallel lookup tables (see Figure 8-67).
The histogram command does not pre-initialize the histogram bin storage. The histogram command also does not sum up the parallel sets of storage into a single set; this is performed as a post-processing step.
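The two steps left to software, clearing the bins beforehand and summing the parallel sets afterwards, can be sketched as follows. This is a simplified model: it assumes unsigned halfword bins and that partial histogram p occupies a contiguous block starting at index p * num_bins. The actual bank-interleaved layout is not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero all bin storage before the histogram command runs. */
static void clear_parallel_hist(uint16_t *par_hist, size_t num_par_hist,
                                size_t num_bins)
{
    memset(par_hist, 0, num_par_hist * num_bins * sizeof par_hist[0]);
}

/* Post-processing: sum num_par_hist partial histograms into one total.
 * Assumed layout: partial histogram p starts at par_hist[p * num_bins]. */
static void merge_parallel_hist(const uint16_t *par_hist, size_t num_par_hist,
                                size_t num_bins, uint32_t *total)
{
    assert(num_par_hist == 1 || num_par_hist == 2 ||
           num_par_hist == 4 || num_par_hist == 8);
    for (size_t b = 0; b < num_bins; b++) {
        uint32_t sum = 0;
        for (size_t p = 0; p < num_par_hist; p++)
            sum += par_hist[p * num_bins + b];
        total[b] = sum;
    }
}
```

The total is accumulated in 32 bits here so that the sum of up to eight saturated halfword bins cannot overflow; that width is a choice made for this sketch, not a requirement stated by the source.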
Histogram operation takes 2M/P cycles, plus some per-command overhead, to process M data items with P parallel histograms (P = 1, 2, 4, or 8). Because P parallel histograms take P times the storage of a single histogram, the choice of P trades memory size against performance.
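The tradeoff can be estimated with a few lines of arithmetic. The sketch below applies the 2M/P figure directly; the per-command overhead is not specified here and is left out, rounding the cycle count up is an assumption, and the storage figure ignores any padding imposed by the bank organization. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Loop cycles for M data items and P parallel histograms: 2M/P,
 * rounded up (assumption), excluding per-command overhead. */
static uint64_t hist_loop_cycles(uint64_t num_data, unsigned num_par_hist)
{
    assert(num_par_hist == 1 || num_par_hist == 2 ||
           num_par_hist == 4 || num_par_hist == 8);
    return (2 * num_data + num_par_hist - 1) / num_par_hist;
}

/* Bin storage in bytes: P copies of num_bins entries (padding ignored). */
static uint64_t hist_storage_bytes(uint64_t num_bins, unsigned bytes_per_bin,
                                   unsigned num_par_hist)
{
    return num_bins * bytes_per_bin * num_par_hist;
}
```

For example, with M = 4096 data items, P = 1 gives 8192 cycles and P = 8 gives 1024 cycles, while a 33-bin halfword histogram grows from 66 bytes to 528 bytes of bin storage.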
An example of histogram operation is:
; single histogram, byte data, short histogram, round down 2 bits, 33 bins
; rounding mode, shift, saturation bound coded in P7 using scalar instructions
VLOOP HIST, CL#: cmd_len, PL#: param_len
VINITHU_ONCE P1, V4 ; initialize V4 = 1
VLDBU_NPT data_base[A0], V2 ; load data to V2
VHLDHU_1HIST hist_base[A1][V2], V0, RND_SAT: P7 ; load hist entry into V0
VADD V0, V4, V0 ; increment entry
VHSTH_1HIST V0, hist_base[A1][V2] ; store histogram entry
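As a rough C model of what this example computes, the sketch below assumes that "round down 2 bits" means a truncating right shift by 2 and that "33 bins" clamps the index to the range 0 to 32. The exact encoding of these parameters in P7 is not modeled, and the names example_bin_index and example_hist are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

enum { SHIFT_BITS = 2, NUM_BINS = 33 };

/* Map one unsigned byte of data to a bin index: shift right by 2
 * (assumed truncation), then saturate to the last bin. */
static unsigned example_bin_index(uint8_t data)
{
    unsigned bin = (unsigned)data >> SHIFT_BITS;
    return bin < NUM_BINS ? bin : NUM_BINS - 1;
}

/* Single unsigned halfword histogram with a saturating increment of 1. */
static void example_hist(const uint8_t *data, size_t num_data,
                         uint16_t hist[NUM_BINS])
{
    for (size_t i = 0; i < num_data; i++) {
        unsigned bin = example_bin_index(data[i]);
        if (hist[bin] < UINT16_MAX)
            hist[bin]++;
    }
}
```

Under these assumptions, data values 0 to 127 spread across bins 0 to 31, and all values from 128 to 255 fall into bin 32.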
An example of weighted histogram operation is:
; single weighted histogram, byte data, byte weight, short histogram,
; round down 2 bits, 33 bins
; rounding mode, shift, saturation bound coded in P7 using scalar instructions
VLOOP HIST, CL#: cmd_len, PL#: param_len
VLDBU_NPT data_base[A0], V2 ; load data to V2
VLDBU_NPT weight_base[A2], V4 ; load weights to V4
VHLDHU_1HIST hist_base[A1][V2], V0, RND_SAT: P7 ; load hist entry into V0
VADD V0, V4, V0 ; increment entry by weights
VHSTH_1HIST V0, hist_base[A1][V2] ; store histogram entry
Table lookup and histogram loops are restricted to the designated vector register and address generator resources shown in Table 8-358.
Loop type | Input/output stream | Address generator | Vector register |
---|---|---|---|
table lookup | data | A0 | V2 |
table lookup | table | A1 | V0 |
table lookup | output | A2 | V0 |
load with expansion | flag | A0 | V2 |
load with expansion | data | n/a | V0 |
load with expansion | output | A2 | V0 |
histogram | data | A0 | V2 |
histogram | histogram | A1 | V0 |
histogram | weight | A2 | V4 |