SPRUI30H November 2015 – May 2024 DRA745 , DRA746 , DRA750 , DRA756
The pipeline is fully interlocked: the CPU stalls when a source operand register has a pending load. Data bypassing for read-after-write dependencies is implemented at the end of the EXE and WB stages to increase instructions per cycle (IPC).
Load data incurs a single-cycle load-use penalty because it is written back to the register file at the WB stage. In the following example, the ADD instruction stalls for one cycle to allow the load to complete:
LDW *+R0(0), R0 ; Load a word into R0
ADD 4, R0, R0 ; Increment R0 to the next word address
MVK 100, R1 ; Move a value 100 into R1
Because the CPU allows a nondependent instruction to continue executing, the stall is avoided if the MVK instruction, which has no dependency on the load data, is placed in the load delay slot. The CPU then executes all three instructions without a stall:
LDW *+R0(0), R0 ; Load a word into R0
MVK 100, R1 ; Move a value 100 into R1
ADD 4, R0, R0 ; Increment R0 to the next word address
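The effect of the reordering can be checked with a small model. The Python sketch below is an illustration only, not a description of the actual pipeline hardware: it assumes a stall of exactly one cycle whenever an instruction reads the destination register of the load issued immediately before it, and it ignores every other hazard. The names Instr and count_load_use_stalls, and the operand encoding, are hypothetical and chosen for this example.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified view of one instruction."""
    op: str                 # mnemonic, for example "LDW"
    srcs: Tuple[str, ...]   # registers read by the instruction
    dst: str                # register written by the instruction


def count_load_use_stalls(program: List[Instr],
                          load_ops: Tuple[str, ...] = ("LDW",)) -> int:
    """Count stall cycles under an assumed one-cycle load-use penalty.

    An instruction stalls for one cycle if it reads the destination
    register of a load that was issued immediately before it.
    """
    stalls = 0
    prev = None  # instruction issued in the preceding slot
    for instr in program:
        if prev is not None and prev.op in load_ops and prev.dst in instr.srcs:
            stalls += 1
        prev = instr
    return stalls


# First listing: ADD directly follows the load it depends on.
stalled = [
    Instr("LDW", ("R0",), "R0"),
    Instr("ADD", ("R0",), "R0"),
    Instr("MVK", (), "R1"),
]

# Second listing: MVK fills the load delay slot.
reordered = [
    Instr("LDW", ("R0",), "R0"),
    Instr("MVK", (), "R1"),
    Instr("ADD", ("R0",), "R0"),
]

print(count_load_use_stalls(stalled))    # stall count for the first listing
print(count_load_use_stalls(reordered))  # stall count for the second listing
```

Under this assumed model, the first ordering reports one stall cycle and the second reports none, which matches the behavior described above.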