For each vector command specifying a loop, several overhead components are present. In general, developers are encouraged to place as much processing as possible into each loop and to run as many iterations as possible, to reduce the percentage of time spent on overhead.
Overhead components are:
- Command decode time: it takes 2 cycles per instruction to decode, since the scalar core provides instructions at this rate. Command decode time can be hidden if ARP32 feeds these instructions while VCOP is executing a previous command.
- Parameter fetch time: it takes 9 + ceiling(num_param/16) cycles to fetch parameters from data memory (WBUF, IBUFL, or IBUFH).
- Execution pipeline ramp up/down time: it takes time to ramp up and down the long pipeline of VCOP. The minimum time for any loop execution is about 17 cycles, due to the pipeline ramp up/down time.
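The first two overhead components can be written down directly from the figures quoted above. The following is a minimal sketch, not vendor code: the function names and the `hidden` flag are hypothetical, introduced only for illustration, and the sketch assumes that a fully hidden decode contributes zero cycles.

```python
def decode_cycles(num_instructions: int, hidden: bool = False) -> int:
    """Command decode time at 2 cycles per instruction.

    hidden=True is an assumption-level model of the case where ARP32
    feeds the instructions while VCOP is still executing a previous
    command, so that the decode time is not visible.
    """
    return 0 if hidden else 2 * num_instructions


def param_fetch_cycles(num_param: int) -> int:
    """Parameter fetch time: 9 + ceiling(num_param / 16) cycles.

    The ceiling is computed with integer arithmetic to avoid
    floating-point rounding.
    """
    return 9 + (num_param + 15) // 16
```

For example, `param_fetch_cycles(16)` evaluates to 10 and `param_fetch_cycles(17)` evaluates to 11, because the seventeenth parameter requires one more 16-parameter fetch cycle.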
Figure 8-68 shows a decode-parameter-fetch-execution time line.
Figure 8-68 Example of Operation Delay Slots
- t = 0, ARP32 sends VLOOP, followed by VCOP instructions in the loop, instructions go into FIFO.
- t = 2, VCOP pulls instruction out of FIFO, recognizes VLOOP.
- t = 10 (depends on command length), VCOP pulls last instruction of loop from FIFO.
- t = 10, ARP32 sends next command, starting from VLOOP.
- t = 14, VCOP starts access to parameter memory, 16 parameters/cycle.
- t = 18, ARP32 keeps sending vector commands, until program stream switches to scalar code, or VCOP pre-decode instruction FIFO becomes full (capacity is 64 words).
- t = 23 (14 + 9), VCOP starts to copy parameters into decoded command buffer.
- t = 25 (after parameter copy), VCOP starts executing the loop.
- t = 26 (sync to 250-MHz clock), VCOP pulls instruction out of FIFO, recognizes VLOOP.
- t = 32 (depends on command length), VCOP pulls last instruction of loop from FIFO.
- t = 36 (after prev loop finishes executing), VCOP starts to access parameter memory.
This time line only illustrates the dependencies among activities; the execution time shown is not realistic. Executing any loop takes at least 17 cycles, due to the pipeline depth.
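To see why more iterations per loop reduce the share of time spent on overhead, the per-command costs can be combined into a rough estimate. This is a simplified sketch under several assumptions that do not come from the text: the overhead components are summed serially, the 17-cycle minimum is treated as a fixed ramp up/down cost, and the steady-state cost per iteration (`cycles_per_iteration`) is a hypothetical parameter. All names are illustrative.

```python
PIPELINE_RAMP_CYCLES = 17  # approximate minimum loop time quoted in the text


def overhead_fraction(num_instructions: int,
                      num_param: int,
                      iterations: int,
                      cycles_per_iteration: int = 1,   # hypothetical steady-state cost
                      decode_hidden: bool = False) -> float:
    """Estimate the fraction of total cycles spent on per-command overhead.

    Assumption: decode, parameter fetch, and pipeline ramp add up serially,
    and the useful work is iterations * cycles_per_iteration.
    """
    decode = 0 if decode_hidden else 2 * num_instructions  # 2 cycles per instruction
    fetch = 9 + (num_param + 15) // 16                     # 9 + ceiling(num_param / 16)
    overhead = decode + fetch + PIPELINE_RAMP_CYCLES
    useful = iterations * cycles_per_iteration
    return overhead / (overhead + useful)
```

Under these assumptions, a loop of 4 instructions and 16 parameters carries 35 cycles of overhead. At 100 iterations that is 35 of 135 cycles (about 26%), and at 1000 iterations it is 35 of 1035 cycles (about 3%).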