The setup with four cameras using the V3Link board and the AM62A SK was tested in various application scenarios, including displaying directly on a screen, streaming over Ethernet (four UDP channels), recording to four separate files, and running deep learning inference. In each experiment, we monitored the frame rate and the utilization of the CPU cores to characterize the full capability of the system.
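As an illustration of the Ethernet scenario, the sketch below shows a single-camera UDP streaming pipeline built from standard GStreamer elements. The device node, caps, host address, and port are assumptions; the actual four-camera pipelines in this document additionally route the raw sensor frames through the TI ISP elements (tiovxisp, tiovxmultiscaler) before encoding.

```
# Hypothetical single-camera UDP stream: capture at 1080p30, encode with the
# hardware H.264 encoder, packetize as RTP, and send to a remote host.
# /dev/video2, 192.168.0.100, and port 5000 are placeholders.
gst-launch-1.0 v4l2src device=/dev/video2 ! \
  'video/x-raw,width=1920,height=1080,framerate=30/1' ! \
  v4l2h264enc ! h264parse ! rtph264pay ! \
  udpsink host=192.168.0.100 port=5000
```

Replicating this pipeline four times, one per camera with a distinct port, gives the four UDP channels used in the Ethernet test case.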
As previously shown in Figure 4-4, the deep learning pipeline uses the tiperfoverlay GStreamer plugin to show the CPU core loads as a bar graph at the bottom of the screen. By default, the graph is updated every two seconds and reports each load as a utilization percentage. In addition to the tiperfoverlay GStreamer plugin, the perf_stats tool is a second option that prints core utilization directly to the terminal, with an option to save the results to a file. This tool is more accurate than tiperfoverlay, as the latter adds extra load on the Arm cores and the DDR to draw the graph and overlay it on the screen. The perf_stats tool was used to collect the hardware utilization results in all of the test cases shown in this document. The processing cores and accelerators studied in these tests include the main processors (four A53 Arm cores at 1.25 GHz), the deep learning accelerator (C7x-MMA at 850 MHz), the VPAC (ISP) with the VISS and the multiscalers (MSC0 and MSC1), and DDR traffic.
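For reference, enabling the on-screen load graph amounts to inserting tiperfoverlay immediately before the display sink. The following is a minimal sketch, not the exact pipeline used in these tests; the capture device and the kmssink driver name are assumptions for the AM62A SK.

```
# Minimal sketch: draw the CPU-load bar graph over a live camera preview.
# tiperfoverlay is part of the TI edgeai GStreamer plugins; /dev/video2 and
# the kmssink driver name are placeholders for the AM62A SK display path.
gst-launch-1.0 v4l2src device=/dev/video2 ! \
  'video/x-raw,width=1920,height=1080,framerate=30/1' ! \
  tiperfoverlay ! kmssink driver-name=tidss
```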
Table 5-1 shows the performance and resource utilization when using the AM62A with four cameras in three use cases: streaming four cameras to a display, streaming over Ethernet, and recording to four separate files. Each use case was tested twice: with the cameras only and with deep learning inference added. In addition, the first row of Table 5-1 shows the hardware utilization when only the operating system was running on the AM62A with no user application; this serves as a baseline against which the other test cases are evaluated. As the table shows, the four cameras with deep learning and screen display each ran at 30 FPS, for a total of 120 FPS across the four cameras. This high frame rate was achieved with only 86% of the deep learning accelerator (C7x-MMA) capacity. It is also important to note that the deep learning accelerator was clocked at 850 MHz instead of 1000 MHz in these experiments, which is only about 85% of the maximum performance.
Table 5-1. Performance and Resource Utilization of AM62A With Four Cameras

| Application | Pipeline (operation) | Output | FPS avg per pipeline | FPS total | MPU (A53s @ 1.25 GHz) [%] | MCU (R5) [%] | DLA (C7x-MMA @ 850 MHz) [%] | VISS [%] | MSC0 [%] | MSC1 [%] | DDR Rd [MB/s] | DDR Wr [MB/s] | DDR Total [MB/s] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No app | Baseline (no operation) | NA | NA | NA | 1.87 | 1 | 0 | 0 | 0 | 0 | 560 | 19 | 579 |
| Camera only | Stream to screen | Screen | 30 | 120 | 12 | 12 | 0 | 70 | 61 | 60 | 1015 | 757 | 1782 |
| Camera only | Stream over Ethernet | UDP: 4 ports, 1920x1080 | 30 | 120 | 23 | 6 | 0 | 70 | 0 | 0 | 2071 | 1390 | 3461 |
| Camera only | Record to files | 4 files, 1920x1080 | 30 | 120 | 25 | 3 | 0 | 70 | 0 | 0 | 2100 | 1403 | 3503 |
| Camera with deep learning | Object detection (MobV1-coco), display | Screen | 30 | 120 | 38 | 25 | 86 | 71 | 85 | 82 | 2926 | 1676 | 4602 |
| Camera with deep learning | Object detection (MobV1-coco), stream over Ethernet | UDP: 4 ports, 1920x1080 | 28 | 112 | 84 | 20 | 99 | 66 | 65 | 72 | 4157 | 2563 | 6720 |
| Camera with deep learning | Object detection (MobV1-coco), record to files | 4 files, 1920x1080 | 28 | 112 | 87 | 22 | 98 | 75 | 82 | 61 | 2024 | 2458 | 6482 |
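As a rough upper bound on the remaining deep learning headroom, and assuming C7x-MMA utilization scales linearly with both frame rate and clock (an assumption, not a measured result), the display use case suggests:

max total FPS ≈ 120 × (100 / 86) × (1000 / 850) ≈ 164 FPS

so the accelerator itself is not the limiting factor at 4 × 30 FPS.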