SPRAC21A June 2016 – June 2019 OMAP-L132 , OMAP-L138 , TDA2E , TDA2EG-17 , TDA2HF , TDA2HG , TDA2HV , TDA2LF , TDA2P-ABZ , TDA2P-ACD , TDA2SA , TDA2SG , TDA2SX , TDA3LA , TDA3LX , TDA3MA , TDA3MD , TDA3MV
The source and destination buffers used to measure DSP CPU read and write throughput are placed in system memory (DDR or OCMC RAM). The rest of the code, the stack, global variables, constants, and so on are placed in L2 RAM, so that the only traffic in the system is the access to the system-memory buffers.
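One way to express this kind of placement in C is sketched below. It is an illustration only, not the code used for the measurements: the section names `.sysbuf_src` and `.sysbuf_dst`, the buffer size `BUF_WORDS`, and the helper `init_buffers` are all hypothetical, and a linker command file (not shown) would still have to map those sections to DDR or OCMC RAM and everything else to L2 RAM. The TI-specific pragmas are guarded so that the fragment also builds with other compilers.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 1024u /* hypothetical size, in 64-bit words */

#ifdef __TI_COMPILER_VERSION__
/* TI compiler only: put each buffer in its own named section and align it
 * to 8 bytes. The section names are assumptions made for this sketch. */
#pragma DATA_SECTION(src_buf, ".sysbuf_src")
#pragma DATA_ALIGN(src_buf, 8)
#pragma DATA_SECTION(dst_buf, ".sysbuf_dst")
#pragma DATA_ALIGN(dst_buf, 8)
#endif
uint64_t src_buf[BUF_WORDS];
uint64_t dst_buf[BUF_WORDS];

/* Fill the source buffer with a known pattern and clear the destination. */
void init_buffers(void)
{
    size_t i;
    for (i = 0; i < BUF_WORDS; i++) {
        src_buf[i] = (uint64_t)i;
        dst_buf[i] = 0;
    }
}
```

Keeping the buffers in dedicated sections means the memory placement can be changed in the linker command file without touching the C source.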
The compiler used for the measurements is TI Code Gen Tool v7.4.4. The target processor version is set to 6600 (-mv6600).
The pipeline copy, read, and write functions are compiled with optimization level 3 (-O3) and space optimization level 3 (-ms3). Additionally, all debug symbols are suppressed by setting --symdebug:none.
In order to statically debug the loops for the iterative copy/read/write, the following compiler option was used:
The aim of the optimized code is to ensure that both load/store engines in the C66x CPU are occupied on every cycle of the loop, each performing a 64-bit load or store.
With this in mind, the following sections provide sample code and its disassembly to help explain the pipeline copy, read, and write functions used in the throughput measurements.
The following sections analyze the copy, read, and write loops as scheduled iterations and pipelined loops. For the basics of software-pipelined loops on the C6000 architecture, see C6000 Compiler: Tuning Software Pipelined Loops.
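As a rough, portable illustration of the goal described above, the sketch below shows copy, read, and write loops that touch two 64-bit words per iteration, so that a software-pipelining compiler has two independent 64-bit accesses available for the two load/store units. This is not the code used in the measurements: the function names `copy64`, `read64`, and `write64` are hypothetical, the code is written in plain C99 rather than with compiler intrinsics, and whether both units are actually busy every cycle depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 64-bit words from src to dst.
 * Assumes `count` is even and the buffers do not overlap. */
void copy64(const uint64_t *restrict src, uint64_t *restrict dst, size_t count)
{
    size_t i;
    assert(count % 2 == 0);
    for (i = 0; i < count; i += 2) {
        /* Two independent 64-bit loads followed by two 64-bit stores. */
        uint64_t a = src[i];
        uint64_t b = src[i + 1];
        dst[i] = a;
        dst[i + 1] = b;
    }
}

/* Read `count` 64-bit words. The XOR accumulation only exists so that
 * the compiler cannot discard the loads. Assumes `count` is even. */
uint64_t read64(const uint64_t *restrict src, size_t count)
{
    uint64_t acc0 = 0, acc1 = 0;
    size_t i;
    assert(count % 2 == 0);
    for (i = 0; i < count; i += 2) {
        acc0 ^= src[i];
        acc1 ^= src[i + 1];
    }
    return acc0 ^ acc1;
}

/* Write `value` to `count` 64-bit words. Assumes `count` is even. */
void write64(uint64_t *restrict dst, uint64_t value, size_t count)
{
    size_t i;
    assert(count % 2 == 0);
    for (i = 0; i < count; i += 2) {
        dst[i] = value;
        dst[i + 1] = value;
    }
}
```

The `restrict` qualifiers tell the compiler that the source and destination do not alias, which is what allows loads from later iterations to be scheduled ahead of stores from earlier ones.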