
 

  Abstract
  Trademarks
  1 Introduction
  2 Connecting Multiple CSI-2 Cameras to the SoC
    2.1 CSI-2 Aggregator Using SerDes
    2.2 CSI-2 Aggregator without Using SerDes
    2.3 Supported Camera Data Throughput
  3 Enabling Multiple Cameras in Software
    3.1 Camera Subsystem Software Architecture
    3.2 Image Pipeline Software Architecture
  4 Reference Design
    4.1 Supported Cameras
    4.2 Setting up Four IMX219 Cameras
    4.3 Configuring Cameras and CSI-2 RX Interface
    4.4 Streaming from Four Cameras
      4.4.1 Streaming Camera Data to Display
      4.4.2 Streaming Camera Data through Ethernet
      4.4.3 Storing Camera Data to Files
    4.5 Multicamera Deep Learning Inference
      4.5.1 Model Selection
      4.5.2 Pipeline Setup
  5 Performance Analysis
  6 Summary
  7 References
  8 Revision History

5 Performance Analysis

The four-camera setup using the V3Link board and the AM62A SK was tested in several application scenarios: displaying directly on a screen, streaming over Ethernet (four UDP channels), recording to four separate files, and running deep learning inference. In each experiment, the frame rate and the utilization of the processing cores were monitored to explore the full capabilities of the system.
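The GStreamer pipelines for these scenarios follow the patterns described in Section 4.4 and Section 4.5. As a quick way to spot-check the frame rate of a single camera branch, the standard GStreamer fpsdisplaysink element can wrap a null sink. The sketch below is only a generic check, not the exact pipeline used for the results in Table 5-1, and the /dev/video2 device node is a placeholder that depends on how the V3Link cameras enumerate on the board.

    # Hypothetical spot check: print the measured frame rate of one camera branch on the console.
    # The device node is a placeholder; the raw (Bayer) frames are simply discarded by fakesink.
    gst-launch-1.0 -v v4l2src device=/dev/video2 ! \
        fpsdisplaysink text-overlay=false video-sink=fakesink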

As previously shown in Figure 4-4, the deep learning pipeline uses the tiperfoverlay GStreamer plugin to show the CPU core loads as a bar graph at the bottom of the screen. By default, the graph is updated every two seconds and reports the load of each core as a utilization percentage. In addition to the tiperfoverlay GStreamer plugin, the perf_stats tool is a second option that prints core utilization directly to the terminal, with an option to save the results to a file. This tool is more accurate than tiperfoverlay, as the latter adds extra load on the Arm cores and the DDR to draw the graph and overlay it on the screen. The perf_stats tool was used to collect the hardware utilization results for all of the test cases shown in this document. The most important processing cores and accelerators studied in these tests are the main processors (four Arm Cortex-A53 cores at 1.25 GHz), the deep learning accelerator (C7x-MMA at 850 MHz), the VPAC (ISP) with its VISS and multiscalers (MSC0 and MSC1), and the DDR traffic.
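As a minimal sketch of how the overlay is attached (assuming the TI edge AI GStreamer plugins from the Processor SDK are installed; videotestsrc stands in here for the camera and mosaic branches of the real pipelines), tiperfoverlay is simply inserted before the display sink:

    # Hypothetical example: draw the core-load bar graph on top of the video before display.
    # kmssink with the tidss driver is the typical display sink on AM62A boards.
    gst-launch-1.0 videotestsrc is-live=true ! video/x-raw,width=1920,height=1080 ! \
        tiperfoverlay ! kmssink driver-name=tidss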

Table 5-1 shows the performance and resource utilization when the AM62A is used with four cameras in three use cases: streaming four cameras to a display, streaming over Ethernet, and recording to four separate files. Two tests are performed for each use case: with the cameras only, and with deep learning inference added. In addition, the first row of Table 5-1 shows the hardware utilization when only the operating system is running on the AM62A without any user application; this serves as a baseline against which the other test cases are evaluated. As shown in the table, the four cameras with deep learning inference and screen display each operated at 30 FPS, for a total of 120 FPS across the four cameras. This frame rate was achieved while using only 86% of the deep learning accelerator (C7x-MMA) capacity. It is also important to note that the deep learning accelerator was clocked at 850 MHz instead of 1000 MHz in these experiments, which is only about 85% of its maximum clock rate; a first-order estimate of the additional headroom this leaves is sketched after Table 5-1.

Table 5-1. Performance (FPS) and Resource Utilization of the AM62A With Four IMX219 Cameras for Screen Display, Ethernet Streaming, Recording to Files, and Deep Learning Inference
Application | Pipeline (operation) | Output | FPS (avg per pipeline) | FPS (total) | MPU (4x A53 at 1.25 GHz) [%] | MCU R5 [%] | DLA (C7x-MMA at 850 MHz) [%] | VISS [%] | MSC0 [%] | MSC1 [%] | DDR Rd [MB/s] | DDR Wr [MB/s] | DDR Total [MB/s]
No app (baseline) | No operation | NA | NA | NA | 1.87 | 1 | 0 | 0 | 0 | 0 | 560 | 19 | 579
Camera only | Stream to screen | Screen | 30 | 120 | 12 | 12 | 0 | 70 | 61 | 60 | 1015 | 757 | 1782
Camera only | Stream over Ethernet | UDP: 4 ports, 1920x1080 | 30 | 120 | 23 | 6 | 0 | 70 | 0 | 0 | 2071 | 1390 | 3461
Camera only | Record to files | 4 files, 1920x1080 | 30 | 120 | 25 | 3 | 0 | 70 | 0 | 0 | 2100 | 1403 | 3503
Camera with deep learning | Deep learning: object detection (MobV1-coco) | Screen | 30 | 120 | 38 | 25 | 86 | 71 | 85 | 82 | 2926 | 1676 | 4602
Camera with deep learning | Deep learning: object detection (MobV1-coco) and stream over Ethernet | UDP: 4 ports, 1920x1080 | 28 | 112 | 84 | 20 | 99 | 66 | 65 | 72 | 4157 | 2563 | 6720
Camera with deep learning | Deep learning: object detection (MobV1-coco) and record to files | 4 files, 1920x1080 | 28 | 112 | 87 | 22 | 98 | 75 | 82 | 61 | 2024 | 2458 | 6482
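As referenced above, the effect of the reduced C7x-MMA clock can be bounded with a simple first-order estimate. This assumes utilization scales linearly with clock frequency and is not a measured result: the same inference load that occupies 86% of the accelerator at 850 MHz would occupy roughly 73% at the full 1000 MHz clock, leaving additional headroom for heavier models or more streams.

    # First-order estimate only (assumes C7x-MMA utilization scales linearly with clock frequency):
    # 86% busy at 850 MHz  ->  approximately 86 * 850 / 1000 = 73.1% busy at 1000 MHz
    echo "scale=1; 86 * 850 / 1000" | bc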