Convolutional Neural Networks (CNNs) are well suited to several of the computer vision tasks required in a DMS/OMS system. For example, the first step in a DMS is to accurately identify the key points around the driver's eyes and mouth for gaze tracking and drowsiness detection. Driver engagement may involve detecting the driver's activity, for example, whether the driver is holding food or talking on a cellphone held to the ear, both of which are object detection tasks. In an occupancy monitoring system, the "child in the back seat" check can be framed as an object detection task, while the seat-belt check for the driver and passengers can be framed as a semantic segmentation problem. CNNs can run on both the Arm Cortex-A53 cores and the C7/MMA deep learning accelerator.
AM62A's deep learning accelerator is the C7/MMA DSP engine, which is capable of up to 2 TOPS of performance. It consists of a floating-point C7x DSP coupled with a matrix multiply accelerator (MMA) that multiplies a 32-element 8-bit vector by a 32x32 8-bit matrix in one cycle, that is, 32 x 32 = 1,024 MACs per cycle. When clocked at 1 GHz, and counting a MAC as two operations, it is capable of 2 x 32 x 32 x 1 GHz ≈ 2 tera-operations per second. The MMA is a general-purpose accelerator and enables a roughly 50x speedup when running typical convolutional neural networks compared to an Arm Cortex-A53 core.
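Written out with units, the peak-throughput arithmetic is:

```latex
% Peak throughput of the C7/MMA at 1 GHz, counting each MAC as two operations
2 \,\tfrac{\text{ops}}{\text{MAC}}
\times (32 \times 32) \,\tfrac{\text{MACs}}{\text{cycle}}
\times 10^{9} \,\tfrac{\text{cycles}}{\text{s}}
= 2.048 \times 10^{12} \,\tfrac{\text{ops}}{\text{s}}
\approx 2\ \text{TOPS}
```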
AM62A's SDK supports three popular runtime frameworks for deploying and executing machine learning models: a) tflite-runtime, b) ONNX Runtime, and c) TVM. This enables you to train a model anywhere and deploy it onto the hardware with only a few lines of code, using the industry-standard Python or C++ application programming interface (API) of your preferred framework. The TIDL compilation tools take care of all the memory optimizations required to map a network optimally onto AM62A, letting you focus on network design and selection.
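To make the "few lines of code" claim concrete, below is a minimal Python sketch of inference through tflite-runtime with supported layers offloaded to the C7/MMA via the TIDL delegate. The delegate library name (libtidl_tfl_delegate.so), the artifacts_folder option, and the file paths follow the conventions of TI's edgeai-tidl-tools examples and should be treated as assumptions; adjust them to match your SDK installation.

```python
# Minimal sketch: TFLite inference on AM62A with TIDL offload.
# Assumptions: libtidl_tfl_delegate.so is on the library path and
# "model-artifacts/" holds the output of the TIDL compilation step.
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the TIDL delegate so supported layers run on the C7/MMA;
# unsupported layers fall back to the Arm Cortex-A53 cores.
tidl_delegate = tflite.load_delegate(
    "libtidl_tfl_delegate.so",
    {"artifacts_folder": "model-artifacts/"},  # hypothetical path
)

interpreter = tflite.Interpreter(
    model_path="model.tflite",  # hypothetical path
    experimental_delegates=[tidl_delegate],
)
interpreter.allocate_tensors()

# Run one dummy input frame through the network and fetch the output.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]).shape)
```

ONNX Runtime and TVM follow the same pattern: the model is compiled once with the TIDL tools, and the standard runtime API is then pointed at the resulting artifacts.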
TI also provides a model analyzer and model selection tool [2] that enables third-party (3P) perception stack providers to choose the deep learning model that provides the maximum entitlement in terms of frames per second and accuracy. As an example, Table 5-1 illustrates the performance entitlement of the SSDLite-MobDet-EdgeTPU-Coco model when running at 30 FPS.
Table 5-1. SSDLite-MobDet-EdgeTPU-Coco Performance Entitlement

| Model | Resolution | Target FPS | mAP Accuracy on COCO | Latency (ms) | Deep Learning Utilization | DDR BW Utilization |
| --- | --- | --- | --- | --- | --- | --- |
| SSDLite-MobDet-EdgeTPU-Coco | 320x320 | 30 | 29.7 | 8.35 | 25% | 504 MB/s |