The vector core is a SIMD machine with built-in loop control and address generation. It is programmed at the array or 2D-block processing level. The vector core has the following resources:
- 4 nested for loops, with loop variables i1, i2, i3, and i4, plus an optional outer loop i0
- 8 address generators, each capable of 4-dimensional addressing; that is, the address pattern is base + i1 × const1 + i2 × const2 + i3 × const3 + i4 × const4
- 16-entry vector register file; each entry is N-way SIMD × 40-bit signed (sign-extended or zero-padded from 8/16/32-bit signed/unsigned memory data, or produced by an operation on register data)
- Two general-purpose functional units, each N-way SIMD. The functional specification supports N = {2 | 4 | 8 | 16 | 32}; the first instance uses N = 8.
- Table lookup unit supporting up to N parallel lookups.
- Histogram unit supporting up to N parallel histogram operations.
- 8 load units
- 8 store units
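
As an illustration of how the loop nest and an address generator work together, the following is a minimal software model in Python, not the hardware's actual programming interface. The function names, the byte-address units, and the choice of i4 as the innermost loop are assumptions made for this sketch; only the address formula itself comes from the list above.

```python
from itertools import product

def agen_address(base, consts, idx):
    """Address formula from the text:
    base + i1*const1 + i2*const2 + i3*const3 + i4*const4."""
    return base + sum(c * i for c, i in zip(consts, idx))

def address_stream(base, consts, trip_counts):
    """Yield one address per iteration of the 4-deep loop nest.

    trip_counts = (n1, n2, n3, n4). Assumption: i1 is the outermost
    loop and i4 the innermost; the text does not state the nesting order.
    """
    for idx in product(*(range(n) for n in trip_counts)):
        yield agen_address(base, consts, idx)

# Hypothetical example: walk a 2-row by 3-column block of 16-bit elements
# stored in rows that are 64 bytes apart, starting at byte address 0x100.
addrs = list(address_stream(base=0x100,
                            consts=(0, 0, 64, 2),
                            trip_counts=(1, 1, 2, 3)))
print(addrs)  # [256, 258, 260, 320, 322, 324]
```

Setting a loop's trip count to 1 (or its constant to 0) removes that dimension from the pattern, which is how a 2D block access fits into the 4-dimensional scheme in this model.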
The vector core supports the following functions:
- Generic compute
- Table lookup
- Histogram and weighted histogram
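
The table lookup and histogram functions can be sketched in the same way. This is a behavioral model only: the function names and the sequential lane order are assumptions, and the model says nothing about how the hardware resolves several lanes hitting the same bin in one cycle. N = 8 matches the first instance mentioned above.

```python
N = 8  # SIMD width of the first instance, per the text

def table_lookup(table, indices):
    """Model of up to N parallel lookups: one table read per lane."""
    assert len(indices) <= N
    return [table[i] for i in indices]

def histogram(bins, indices, weights=None):
    """Model of up to N parallel histogram updates.

    With weights=None each lane adds 1 to its bin (plain histogram);
    otherwise each lane adds its own weight (weighted histogram).
    Lanes are applied one after another here, so two lanes that select
    the same bin both contribute.
    """
    assert len(indices) <= N
    if weights is None:
        weights = [1] * len(indices)
    for i, w in zip(indices, weights):
        bins[i] += w
    return bins

lanes = [0, 1, 1, 3, 3, 3, 0, 2]           # one bin index per SIMD lane
print(table_lookup([10, 20, 30, 40], lanes))
# [10, 20, 20, 40, 40, 40, 10, 30]
print(histogram([0] * 4, lanes))
# [2, 2, 1, 3]
print(histogram([0] * 4, lanes, weights=[1, 2, 3, 4, 5, 6, 7, 8]))
# [8, 5, 8, 15]
```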