SPRUI04F July 2015 – April 2023
The C code in Example 5-7 implements a dot product function. The inner loop is unrolled once to take advantage of the C6000's ability to operate on two 16-bit data items in a single 32-bit register. LDW instructions load two consecutive short values at a time. The linear assembly instructions in Example 5-8 implement the dotp loop kernel. Example 5-9 shows the loop kernel determined by the assembly optimizer.
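For orientation, the following is a minimal sketch of a dot product with the inner loop unrolled once. It is not a reproduction of the guide's own example: the element-count parameter n, the const qualifiers, and the assert on an even count are assumptions added for illustration. Each iteration consumes two consecutive short values from each array, which is the access pattern that lets a compiler use one word load per array per iteration.

```c
#include <assert.h>

/* Illustrative sketch only; the parameter n and the even-count check
 * are assumptions, not taken from the guide's example. */
int dotp(const short a[], const short b[], int n)
{
    int sum0 = 0;   /* accumulates products of even-indexed elements */
    int sum1 = 0;   /* accumulates products of odd-indexed elements  */
    int i;

    assert(n % 2 == 0);   /* unrolling once requires an even element count */

    for (i = 0; i < n; i += 2) {
        sum0 += a[i] * b[i];           /* first 16-bit pair  */
        sum1 += a[i + 1] * b[i + 1];   /* second 16-bit pair */
    }
    return sum0 + sum1;
}
```

Keeping two separate accumulators avoids a dependency between the two multiply-accumulate chains, so they can proceed in parallel; the partial sums are combined only once, after the loop.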
For this loop kernel, there are two restrictions associated with the arrays a[ ] and b[ ]:
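Bank conflict:
    MVK 0, A0          ; A0 = byte address 0
 || MVK 8, B0          ; B0 = byte address 8
    LDW *A0, A1        ; both loads issue in the same cycle
 || LDW *B0, B1        ; and access the same memory bank
No bank conflict:
    MVK 0, A0          ; A0 = byte address 0
 || MVK 4, B0          ; B0 = byte address 4
    LDW *A0, A1        ; the two loads access different banks,
 || LDW *B0, B1        ; so neither one stalls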
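The two cases above can be checked with a small host-side model. The sketch below assumes a memory organized as four interleaved banks, each 2 bytes wide; that layout is an assumption chosen because it is consistent with the two examples (addresses 0 and 8 conflict, addresses 0 and 4 do not), and the actual bank structure depends on the device. The helper names ldw_bank_mask and ldw_bank_conflict are hypothetical and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: four interleaved banks, each 2 bytes wide.
 * Real devices may differ; consult the device documentation. */
enum { BANK_WIDTH_BYTES = 2, NUM_BANKS = 4 };

/* Returns a bit mask of the banks touched by a 4-byte word load
 * at the given word-aligned byte address (bit k set = bank k). */
unsigned ldw_bank_mask(uint32_t addr)
{
    unsigned mask = 0;
    uint32_t off;

    assert(addr % 4 == 0);   /* a word load needs a word-aligned address */

    for (off = 0; off < 4; off += BANK_WIDTH_BYTES)
        mask |= 1u << (((addr + off) / BANK_WIDTH_BYTES) % NUM_BANKS);
    return mask;
}

/* Returns 1 if two word loads issued in the same cycle would touch
 * at least one common bank under the assumed layout, else 0. */
int ldw_bank_conflict(uint32_t addr_a, uint32_t addr_b)
{
    return (ldw_bank_mask(addr_a) & ldw_bank_mask(addr_b)) != 0;
}
```

Under this assumed layout, a word load at address 0 touches banks 0 and 1, a load at address 8 touches the same two banks, and a load at address 4 touches banks 2 and 3. This matches the conflict and no-conflict cases shown in the assembly examples.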