The setup with four cameras using the V3Link board and the AM62A SK was tested in various application scenarios, including displaying directly on a screen, streaming over Ethernet (four UDP channels), recording to four separate files, and running deep learning inference. In each experiment, we monitored the frame rate and the utilization of the CPU cores to characterize the full capability of the system.
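As an illustration of the Ethernet scenario, the sketch below shows a single-camera UDP streaming pipeline built from standard GStreamer elements. The device node, caps, host address, and port are assumptions; the actual four-camera pipelines in this document additionally route the raw sensor frames through the TI ISP elements (tiovxisp, tiovxmultiscaler) before encoding.

```
# Hypothetical single-camera UDP stream: capture at 1080p30, encode with the
# hardware H.264 encoder, packetize as RTP, and send to a remote host.
# /dev/video2, 192.168.0.100, and port 5000 are placeholders.
gst-launch-1.0 v4l2src device=/dev/video2 ! \
  'video/x-raw,width=1920,height=1080,framerate=30/1' ! \
  v4l2h264enc ! h264parse ! rtph264pay ! \
  udpsink host=192.168.0.100 port=5000
```

Replicating this pipeline four times, one per camera with a distinct port, gives the four UDP channels used in the Ethernet test case.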
As previously shown in Figure 4-4, the deep learning pipeline uses the tiperfoverlay GStreamer plugin to show the CPU core loads as a bar graph at the bottom of the screen. By default, the graph is updated every two seconds and reports each load as a utilization percentage. In addition to the tiperfoverlay GStreamer plugin, the perf_stats tool is a second option that prints core utilization directly to the terminal, with an option to save the results to a file. This tool is more accurate than tiperfoverlay, as the latter adds extra load on the Arm cores and the DDR to draw the graph and overlay it on the screen. The perf_stats tool was used to collect the hardware utilization results in all of the test cases shown in this document. The processing cores and accelerators studied in these tests include the main processors (four A53 Arm cores at 1.25 GHz), the deep learning accelerator (C7x-MMA at 850 MHz), the VPAC (ISP) with the VISS and the multiscalers (MSC0 and MSC1), and DDR traffic.
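For reference, enabling the on-screen load graph amounts to inserting tiperfoverlay immediately before the display sink. The following is a minimal sketch, not the exact pipeline used in these tests; the capture device and the kmssink driver name are assumptions for the AM62A SK.

```
# Minimal sketch: draw the CPU-load bar graph over a live camera preview.
# tiperfoverlay is part of the TI edgeai GStreamer plugins; /dev/video2 and
# the kmssink driver name are placeholders for the AM62A SK display path.
gst-launch-1.0 v4l2src device=/dev/video2 ! \
  'video/x-raw,width=1920,height=1080,framerate=30/1' ! \
  tiperfoverlay ! kmssink driver-name=tidss
```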
Table 5-1 shows the performance and resource utilization when using the AM62A with four cameras in three use cases: streaming four cameras to a display, streaming over Ethernet, and recording to four separate files. Each use case was tested twice: with the cameras only and with deep learning inference added. In addition, the first row of Table 5-1 shows the hardware utilization when only the operating system was running on the AM62A with no user application; this serves as a baseline against which the other test cases are evaluated. As the table shows, the four cameras with deep learning and screen display each ran at 30 FPS, for a total of 120 FPS across the four cameras. This high frame rate was achieved with only 86% of the deep learning accelerator (C7x-MMA) capacity. It is also important to note that the deep learning accelerator was clocked at 850 MHz instead of 1000 MHz in these experiments, which is only about 85% of the maximum performance.
Table 5-1. Performance and Resource Utilization of AM62A With Four Cameras

| Application | Pipeline (operation) | Output | FPS avg per pipeline | FPS total | MPU (A53s @ 1.25 GHz) [%] | MCU (R5) [%] | DLA (C7x-MMA @ 850 MHz) [%] | VISS [%] | MSC0 [%] | MSC1 [%] | DDR Rd [MB/s] | DDR Wr [MB/s] | DDR Total [MB/s] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No app | Baseline (no operation) | NA | NA | NA | 1.87 | 1 | 0 | 0 | 0 | 0 | 560 | 19 | 579 |
| Camera only | Stream to screen | Screen | 30 | 120 | 12 | 12 | 0 | 70 | 61 | 60 | 1015 | 757 | 1782 |
| Camera only | Stream over Ethernet | UDP: 4 ports, 1920x1080 | 30 | 120 | 23 | 6 | 0 | 70 | 0 | 0 | 2071 | 1390 | 3461 |
| Camera only | Record to files | 4 files, 1920x1080 | 30 | 120 | 25 | 3 | 0 | 70 | 0 | 0 | 2100 | 1403 | 3503 |
| Camera with deep learning | Object detection (MobV1-coco), display | Screen | 30 | 120 | 38 | 25 | 86 | 71 | 85 | 82 | 2926 | 1676 | 4602 |
| Camera with deep learning | Object detection (MobV1-coco), stream over Ethernet | UDP: 4 ports, 1920x1080 | 28 | 112 | 84 | 20 | 99 | 66 | 65 | 72 | 4157 | 2563 | 6720 |
| Camera with deep learning | Object detection (MobV1-coco), record to files | 4 files, 1920x1080 | 28 | 112 | 87 | 22 | 98 | 75 | 82 | 61 | 2024 | 2458 | 6482 |
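As a rough upper bound on the remaining deep learning headroom, and assuming C7x-MMA utilization scales linearly with both frame rate and clock (an assumption, not a measured result), the display use case suggests:

max total FPS ≈ 120 × (100 / 86) × (1000 / 850) ≈ 164 FPS

so the accelerator itself is not the limiting factor at 4 × 30 FPS.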