SPRAC21A June 2016 – June 2019 OMAP-L132 , OMAP-L138 , TDA2E , TDA2EG-17 , TDA2HF , TDA2HG , TDA2HV , TDA2LF , TDA2P-ABZ , TDA2P-ACD , TDA2SA , TDA2SG , TDA2SX , TDA3LA , TDA3LX , TDA3MA , TDA3MD , TDA3MV
In order to understand the impact of executing code from QSPI flash in XIP mode, the following test setup was used on the TDA2xx device:
Table 63 provides a comparative analysis of QSPI XIP code execution versus DDR code execution for the Cortex-M4-based Capture Display Vision SDK use case. The Cortex-M4 runs at 212 MHz. The frame rate (30 FPS) was found to match between DDR and QSPI XIP code execution.
| Scenario | M4_0 CPU Task Load (%) (load can vary by 2% across runs) |
|---|---|
| M4_0 code execution from DDR3 at 532 MHz | Approximately 6.2% |
| QSPI XIP (64 MHz clock frequency, Mode 0) | Approximately 10.91% |
In order to understand the impact of QSPI code execution on a fully loaded M4 CPU, the networking use case was run concurrently with the capture display use case. To drive the networking threads on the M4_0 core, the Network Development Kit (NDK) (http://www.ti.com/lit/ug/spru524j/spru524j.pdf) Windows test application was run as shown below:
ndk_2_24_02_31\packages\ti\ndk\winapps>send <IPAddress> 2000
This tool prints the number of megabytes of data sent by the tool and serviced by the TDA2xx device (M4_0) running the network stack. The M4_0 is 100% loaded in the following experiments. Table 64 compares the network throughput achieved under different device conditions.
Networking Bandwidth Achieved (all numbers in megabytes per second)

| | QSPI4 (64 MHz) | DDR3 532 MHz |
|---|---|---|
| M4 (212.8 MHz) | 3.05 | 5.26 |
When there is a concurrent EDMA transfer from QSPI to DDR (for example, an application image copy from QSPI) while the M4_0 is executing code out of QSPI, there is a significant impact on the M4_0 code execution time. The impact on M4 code execution for varying EDMA ACNT values, for both AB_SYNC and A_SYNC transfers, is shown in Table 65.
Networking Bandwidth Achieved With Concurrent EDMA Traffic (M4 @ 212.8 MHz, QSPI @ 64 MHz)

| | A_SYNC (Bandwidth in MBps) | AB_SYNC, BCNT = 512 (Bandwidth in MBps) |
|---|---|---|
| ACNT = 65535 | 0.0118 | 0.0046 |
| ACNT = 16384 | 0.0379 | 0.0046 |
| ACNT = 4096 | 0.233 | 0.0050 |
| ACNT = 512 | 1.638 | 0.0088 |
| ACNT = 256 | 2.048 | 0.0121 |
| ACNT = 128 | 2.559 | 0.0122 |
| ACNT = 64 | 2.730 | 0.0186 |
| Without EDMA | 2.935 | 2.935 |
The impact on M4 traffic can be controlled by applying a bandwidth limiter to the EDMA read from QSPI. Table 66 shows the resulting performance for M4_0 code running (1) the networking + capture display use case and (2) the capture display use case only, measured in two independent runs.
| BW-Limited EDMA Throughput (ACNT = 65535, A_SYNC) | M4 Networking Performance (MBps), QSPI4 (64 MHz) | M4 Capture Display Total CPU Load (%), QSPI4 (64 MHz) |
|---|---|---|
| 22.46 MBps | 0.0118 | 99.9 |
| 17.86 MBps | 0.621 | 40.1 |
| 8.98 MBps | 1.772 | 19.2 |
| Without EDMA | 2.935 | 13.2 |
The impact on the performance of the IPU code can also be understood by examining the traffic profile using the L3 statistic collectors. Figure 41 through Figure 45 show how the IPU traffic is affected by concurrent EDMA traffic and how the IPU performance can be recovered using BW limiters on the EDMA read from QSPI. QSPI operates at 64 MHz in all of the following BW plots.