SPRUIV4D May 2020 – May 2024
The following block diagram shows the datapath split on the C7100 DSP CPU. There is an A-side datapath and a B-side datapath. The diagram shows the functional units and multiple, heterogeneous register files. The A-side datapath is responsible for scalar computation, loading and storing scalars and vectors to and from memory, and control-flow (branches, calls). The B-side datapath handles vector math operations, permutations of data, and vector predication operations.
To keep the diagram simple, some data movement capabilities and data paths are not shown.
Each register is either 64 bits wide (a "scalar" register) or as wide as the device's vector width (a "vector" register). C7100 and C7120 devices have a 512-bit vector width and therefore 512-bit vector registers. C7504 and C7524 devices have a 256-bit vector width and therefore 256-bit vector registers.
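The practical consequence of the vector width is the number of elements that one vector register can hold. The following portable C sketch illustrates the arithmetic only. The enum names and the function `lanes_per_register` are hypothetical helpers introduced for this example and are not part of the C7000 compiler or runtime.

```c
#include <assert.h>

/* Register widths in bits, as stated in the text above. */
enum {
    SCALAR_REGISTER_BITS = 64,
    C7100_VECTOR_BITS    = 512,  /* also applies to C7120 */
    C7504_VECTOR_BITS    = 256   /* also applies to C7524 */
};

/*
 * Hypothetical helper: number of elements of elem_bits each that fit
 * in a register of reg_bits. Returns 0 if the element size does not
 * divide the register width evenly.
 * Precondition: elem_bits must be nonzero.
 */
unsigned lanes_per_register(unsigned reg_bits, unsigned elem_bits)
{
    assert(elem_bits != 0);
    if (reg_bits % elem_bits != 0) {
        return 0;
    }
    return reg_bits / elem_bits;
}
```

For example, `lanes_per_register(C7100_VECTOR_BITS, 32)` evaluates to 16 and `lanes_per_register(C7504_VECTOR_BITS, 32)` evaluates to 8, so a loop over 32-bit data processes twice as many elements per vector operation on a 512-bit device as on a 256-bit device.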
Each datapath contains several different kinds of register files. Every functional unit on a datapath can write to that datapath's global register file and to most of its “local” register files. However, only some functional units can read from a given “local” register file.
In addition to the D1 and D2 units providing CPU access to the memory hierarchy, the C7100 DSP has two “streaming engines” that provide a fast path for obtaining data from memory. A streaming engine is a hardware feature that allows you (or the compiler) to specify a pattern of memory addresses to fetch. The streaming engine will do its best to pre-fetch that data from the memory hierarchy into a scratchpad memory close to the CPU, to minimize CPU stalls due to cold cache misses.
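To make the idea of a "pattern of memory addresses" concrete, the following portable C sketch models a simple two-level pattern in software. It is an illustration only and does not use the actual streaming engine programming interface. The `access_pattern` type, its fields, and the `pattern_offsets` function are hypothetical names assumed for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical two-level access pattern, in units of elements:
 * read inner_count contiguous elements, then repeat outer_count
 * times, starting each repetition outer_stride elements after the
 * start of the previous one.
 */
typedef struct {
    size_t inner_count;
    size_t outer_count;
    size_t outer_stride;
} access_pattern;

/*
 * Writes the element offsets the pattern visits, in order, into out.
 * Returns the number of offsets written, or 0 if out (with room for
 * cap entries) is too small to hold the whole pattern.
 */
size_t pattern_offsets(const access_pattern *p, size_t *out, size_t cap)
{
    assert(p != NULL && out != NULL);

    size_t total = p->inner_count * p->outer_count;
    if (total > cap) {
        return 0;
    }

    size_t n = 0;
    for (size_t j = 0; j < p->outer_count; ++j) {
        for (size_t i = 0; i < p->inner_count; ++i) {
            out[n++] = j * p->outer_stride + i;
        }
    }
    return n;
}
```

With `inner_count = 4`, `outer_count = 3`, and `outer_stride = 16`, the pattern describes a tile of 4 columns by 3 rows in a row-major matrix that is 16 elements wide: offsets 0 to 3, then 16 to 19, then 32 to 35. Because the whole sequence is known in advance from a compact description like this, hardware can fetch the data ahead of the point where the CPU consumes it.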