SPRU514Z July 2001 – October 2023 SM320F28335-EP
You can force the compiler to generate a DMAC instruction by using the __dmac intrinsic. In this case, you are fully responsible for ensuring that the data addresses are 32-bit aligned.
void __dmac(long *src1, long *src2, long &accum1, long &accum2, int shift);
See Table 7-6 for details about the __dmac intrinsic.
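One way to meet the alignment requirement without toolchain-specific directives is to overlay each int array on a long inside a union. The following is a minimal sketch, assuming (as on C28x) that long is a 32-bit type with 32-bit alignment. The names aligned_int_buf, src1_buf, src2_buf, and is_long_aligned are hypothetical and used only for illustration, and N is fixed at an arbitrary value here.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8  /* hypothetical pair count; each array holds 2*N ints */

/* Hypothetical wrapper type: the long member forces the union, and
   therefore the int array that shares its storage, to start on an
   address that satisfies the alignment requirement of long. */
typedef union {
    long align_as_long;   /* never read; present only for its alignment */
    int  data[2 * N];
} aligned_int_buf;

static aligned_int_buf src1_buf;
static aligned_int_buf src2_buf;

/* Sanity check (assumes a C11 compiler for _Alignof):
   returns 1 if p satisfies the alignment requirement of long, else 0. */
static int is_long_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(long)) == 0;
}
```

With this layout, src1_buf.data and src2_buf.data can stand in for src1 and src2 in the examples in this section.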
Example 1:
int src1[2*N], src2[2*N]; // src1 and src2 are int arrays of at least size 2*N.
                          // You must ensure that both start on 32-bit
                          // aligned boundaries.
{...}
int i;
long res = 0;
long temp = 0;
for (i=0; i < N; i++) // N does not have to be a known constant
    __dmac(((long *)src1)[i], ((long *)src2)[i], res, temp, 0);
res += temp;
Example 2:
int *src1, *src2; // src1 and src2 are pointers to int arrays of at
                  // least size 2*N. You must ensure that both are
                  // 32-bit aligned addresses.
{...}
int i;
long res = 0;
long temp = 0;
long *ls1 = (long *)src1;
long *ls2 = (long *)src2;
for (i=0; i < N; i++) // N does not have to be a known constant
    __dmac(*ls1++, *ls2++, res, temp, 0);
res += temp;
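In these examples, res holds the final sum of a multiply-accumulate operation on int arrays of length 2*N. Each loop iteration performs two multiply-accumulate computations, one into res and one into temp, and the two partial sums are added after the loop.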
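The arithmetic performed by these loops can be modeled in portable C, which may be useful for checking results on a host. The following is a sketch only, not the intrinsic itself: it assumes a shift of 0 and 16-bit array elements, and the function names dual_mac_model and mac_reference are hypothetical. Which 16-bit half of each 32-bit word feeds which accumulator is also an assumption here; it does not affect the final sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of the loops in the examples (shift = 0).
   a and b each hold 2*n_pairs 16-bit elements. */
static int32_t dual_mac_model(const int16_t *a, const int16_t *b, size_t n_pairs)
{
    int32_t res = 0;   /* plays the role of accum1 */
    int32_t temp = 0;  /* plays the role of accum2 */
    size_t i;

    for (i = 0; i < n_pairs; i++) {
        /* One iteration consumes one 32-bit word from each source,
           that is, two adjacent 16-bit elements, and updates both sums. */
        res  += (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        temp += (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
    }
    return res + temp;  /* corresponds to the final "res += temp;" */
}

/* Plain loop with one product per iteration, for comparison. */
static int32_t mac_reference(const int16_t *a, const int16_t *b, size_t len)
{
    int32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

For the same inputs, dual_mac_model(a, b, N) and mac_reference(a, b, 2*N) return the same value as long as no intermediate sum overflows 32 bits. The model takes N iterations where the plain loop takes 2*N.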
An optimization level of -O2 or higher must be used to generate efficient code. If the loop body contains nothing else, as in these examples, the compiler generates a RPT || DMAC instruction, which further improves performance.