SPRAC21A June 2016 – June 2019 OMAP-L132 , OMAP-L138 , TDA2E , TDA2EG-17 , TDA2HF , TDA2HG , TDA2HV , TDA2LF , TDA2P-ABZ , TDA2P-ACD , TDA2SA , TDA2SG , TDA2SX , TDA3LA , TDA3LX , TDA3MA , TDA3MD , TDA3MV
In order to understand the impact of executing code from QSPI flash in XIP mode, the following test setup was used on the TDA2xx device:
Table 63 provides a comparative analysis of QSPI XIP code execution versus DDR code execution for the Cortex-M4-based Capture Display Vision SDK use case. The Cortex-M4 runs at 212 MHz. The frame rate (30 FPS) was found to match between DDR and QSPI XIP code execution.
| Scenario | M4_0 CPU Task Load (%) (load can vary by 2% across runs) |
|---|---|
| M4_0 code execution from DDR3 at 532 MHz | Approximately 6.2% |
| QSPI XIP (64 MHz clock frequency, Mode 0) | Approximately 10.91% |
In order to understand the impact of QSPI code execution on a fully loaded M4 CPU, the networking use case was run concurrently with the capture display use case. To drive the networking threads on the M4_0 core, the Network Development Kit (NDK) (http://www.ti.com/lit/ug/spru524j/spru524j.pdf) Windows test application was run as shown below:
ndk_2_24_02_31\packages\ti\ndk\winapps>send <IPAddress> 2000
This tool prints the number of megabytes of data sent by the tool and serviced by the TDA2xx device (M4_0) running the network stack. The M4_0 is 100% loaded in the following experiments. Table 64 compares the network throughput achieved under different device conditions.
Networking Bandwidth Achieved (all numbers in megabytes per second)

| | QSPI4 (64 MHz) | DDR3 532 MHz |
|---|---|---|
| M4 (212.8 MHz) | 3.05 | 5.26 |
When there is a concurrent EDMA transfer from QSPI to DDR (for example, an application image copy from QSPI) while the M4_0 is executing code out of QSPI, there is a significant impact on the M4_0 code execution time. The impact on M4 code execution for varying EDMA ACNT values, for both AB_SYNC and A_SYNC transfers, is shown in Table 65.
Networking Bandwidth Achieved With Concurrent EDMA Traffic (M4 @ 212.8 MHz, QSPI @ 64 MHz)

| | A_SYNC (Bandwidth in MBps) | AB_SYNC, BCNT = 512 (Bandwidth in MBps) |
|---|---|---|
| ACNT = 65535 | 0.0118 | 0.0046 |
| ACNT = 16384 | 0.0379 | 0.0046 |
| ACNT = 4096 | 0.233 | 0.0050 |
| ACNT = 512 | 1.638 | 0.0088 |
| ACNT = 256 | 2.048 | 0.0121 |
| ACNT = 128 | 2.559 | 0.0122 |
| ACNT = 64 | 2.730 | 0.0186 |
| Without EDMA | 2.935 | 2.935 |
The impact on M4 traffic can be controlled by applying a bandwidth limiter to the EDMA read from QSPI. Table 66 shows the resulting performance for M4_0 code running (1) the networking + capture display use case and (2) the capture display use case only, measured in two independent runs.
| BW-Limited EDMA Throughput (ACNT = 65535, A_SYNC) | M4 Networking Performance (MBps), QSPI4 (64 MHz) | M4 Capture Display Total CPU Load (%), QSPI4 (64 MHz) |
|---|---|---|
| 22.46 MBps | 0.0118 | 99.9 |
| 17.86 MBps | 0.621 | 40.1 |
| 8.98 MBps | 1.772 | 19.2 |
| Without EDMA | 2.935 | 13.2 |
The impact on the performance of the IPU code can also be understood by examining the traffic profile using the L3 statistic collectors. Figure 41 through Figure 45 show how the IPU traffic is affected by concurrent EDMA traffic and how the IPU performance can be recovered using BW limiters on the EDMA read from QSPI. QSPI operates at 64 MHz in all of the following BW plots.