SPRADC9 July 2023 AM62A3, AM62A7
The AM62A SoC consists of various processing cores and hardware accelerators. Monitoring the load on these components is important for understanding the capabilities of the whole system and its room for expansion. The defect detection demo uses the tiperfoverlay GStreamer plugin to show core loads as a bar graph at the bottom of the screen. Figure 5-2 shows a screenshot of the core loads graph of the AM62A while running the defect detection demo. By default, the graph is updated every two seconds and shows each load as a utilization percentage. In addition to the tiperfoverlay GStreamer plugin, the perf_stats tool is a second option that prints core performance directly on the terminal, with an option to save the output to a file. This option is more accurate than tiperfoverlay, because the latter adds extra load on the Arm cores and the DDR to draw the graph and overlay it on the screen.
Figure 5-2 shows that the defect detection demo, together with all of the supporting Linux processes, utilizes only about 39% of the Arm core capacity (averaged across the four A53 cores). At the same time, the yolox-nano-lite model used in the application utilizes about 22% of the C7xMMA deep learning accelerator. It is important to note that in this experiment the C7xMMA is clocked at 850 MHz instead of 1000 MHz. In other words, if the C7xMMA accelerator were clocked at 1000 MHz, its utilization would be less than the reported 22%. The DDR traffic is 1706 MB/s for read operations and 1118 MB/s for write operations, for a total of 2824 MB/s. The AM62A supports a total DDR bandwidth of 12.8 GB/s when using 32-bit DDR4 at 3200 MT/s. The total of 2824 MB/s utilizes about 22% of the total DDR bandwidth.
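The bandwidth and clock arithmetic in the preceding paragraph can be reproduced with a short Python sketch. The measured figures are taken from the text. The function names are hypothetical, and the assumption that accelerator load scales inversely with clock frequency is a simplification made for this illustration, not a measured result.

```python
def ddr_peak_mb_per_s(bus_width_bits: int, transfer_rate_mt_per_s: int) -> float:
    """Theoretical peak DDR bandwidth in MB/s: bytes per transfer times MT/s."""
    return bus_width_bits / 8 * transfer_rate_mt_per_s


def rescale_load(load_pct: float, measured_mhz: float, target_mhz: float) -> float:
    """Estimate load at another clock, ASSUMING load scales inversely with frequency."""
    return load_pct * measured_mhz / target_mhz


# Measured DDR traffic reported for the defect detection demo (MB/s).
read_mb_s = 1706
write_mb_s = 1118
total_mb_s = read_mb_s + write_mb_s

# 32-bit DDR4 at 3200 MT/s, as stated in the text.
peak_mb_s = ddr_peak_mb_per_s(32, 3200)
ddr_utilization_pct = 100 * total_mb_s / peak_mb_s

# C7xMMA load measured at 850 MHz, projected to 1000 MHz (linear assumption).
c7x_load_at_1000_mhz = rescale_load(22, 850, 1000)

print(f"Total DDR traffic: {total_mb_s} MB/s")
print(f"Peak DDR bandwidth: {peak_mb_s:.0f} MB/s")
print(f"DDR utilization: {ddr_utilization_pct:.1f}%")
print(f"Projected C7xMMA load at 1000 MHz: {c7x_load_at_1000_mhz:.1f}%")
```

Under the linear-scaling assumption, the projected accelerator load at 1000 MHz falls below the reported 22%, which is consistent with the statement in the text.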
These low utilization values of the Arm cores, accelerators, and DDR bandwidth indicate that there is considerable room for expansion on the AM62A, either to run additional applications or to expand the defect detection application itself, for example by increasing the frame rate with a faster camera. In addition, the low core utilization provides flexibility in selecting the right SoC variant of the AM62A family. The core loads shown in Figure 5-2 are for the AM62A74 variant of the AM62A SoC family. This variant is equipped with four A53 Arm cores and a C7xMMA deep learning accelerator capable of executing two TOPS. The low utilization values suggest that the defect detection demo in its current form can be implemented on lower-end variants of the AM62A such as the AM62A3, which includes two Arm cores and a one-TOPS deep learning accelerator.
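As a rough sanity check on the variant-selection argument, the measured loads can be projected onto a smaller device. The sketch below assumes that load scales linearly with the number of Arm cores and with the accelerator TOPS rating. That is a first-order approximation only, and the helper name is hypothetical. Real behavior depends on clocks, memory contention, and scheduling, so the result is no substitute for measuring on the target device.

```python
def project_load(load_pct: float, measured_capacity: float, target_capacity: float) -> float:
    """Project a utilization percentage onto a device with different capacity,
    ASSUMING the workload scales linearly with capacity."""
    return load_pct * measured_capacity / target_capacity


# Loads measured on the AM62A74 (four A53 cores, two-TOPS accelerator).
arm_load_pct = 39  # averaged across four cores
dl_load_pct = 22   # C7xMMA, measured at 850 MHz

# Projection onto the AM62A3 (two Arm cores, one-TOPS accelerator).
arm_projected = project_load(arm_load_pct, measured_capacity=4, target_capacity=2)
dl_projected = project_load(dl_load_pct, measured_capacity=2, target_capacity=1)

# The demo plausibly fits if both projected loads stay below 100%.
fits = arm_projected < 100 and dl_projected < 100

print(f"Projected Arm load on two cores: {arm_projected:.0f}%")
print(f"Projected accelerator load at one TOPS: {dl_projected:.0f}%")
print(f"Fits under linear-scaling assumption: {fits}")
```

Both projected values remain below 100% under this assumption, although the Arm cores would have much less headroom than on the AM62A74.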