With the matrix multiplication accelerator (MMA) providing acceleration for AI functions, the overall TDA4x block diagram is shown in Figure 2-3 below. Based on a heterogeneous architecture, the TDA4x System on Chip (SoC) optimizes the entire platform around easy programming on the multi-core Cortex-A72 microprocessor units (MPUs), while offloading compute-intensive tasks such as deep learning, imaging, vision, video, and graphics processing to specialized hardware accelerators and programmable cores. High throughput and high energy efficiency are enabled by holistic system-level integration of these cores using a high-bandwidth interconnect and a smart memory architecture. An optimized system BOM is achieved by advanced integration of the system components.
As discussed in the previous section, TOPS (tera operations per second) is commonly used to compare deep learning performance. However, actual inference time depends on how efficiently the system architecture uses the optimum data flow in the system. A better performance benchmark is therefore the inference time for a given model at a given input image resolution. The lower the inference time, the more images can be processed, resulting in higher frames per second (FPS). Thus, FPS divided by TOPS (FPS/TOPS) indicates the efficiency of the deep learning architecture. Similarly, FPS divided by watts (FPS/Watt) is a good benchmark for the energy efficiency of an embedded processor.
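As a minimal sketch of how these metrics relate, the short Python snippet below computes FPS, FPS/TOPS, and FPS/Watt from an inference time, a TOPS rating, and a power figure. The function name and the numeric values are illustrative placeholders for this example only, not measured TDA4VM results.

# Hypothetical example: deriving the benchmark metrics described above.
# Inference time, TOPS rating, and power values are placeholders, not
# measured TDA4VM figures.

def benchmark_metrics(inference_time_s: float, tops: float, power_w: float):
    """Return FPS, FPS/TOPS (architecture efficiency) and FPS/Watt (energy efficiency)."""
    fps = 1.0 / inference_time_s   # frames per second for a single inference stream
    fps_per_tops = fps / tops      # how effectively the rated TOPS translate into throughput
    fps_per_watt = fps / power_w   # throughput obtained per watt of processor power
    return fps, fps_per_tops, fps_per_watt

# Example: a model that takes 10 ms per frame on an 8-TOPS accelerator drawing 5 W.
fps, fps_per_tops, fps_per_watt = benchmark_metrics(0.010, 8.0, 5.0)
print(f"FPS = {fps:.1f}, FPS/TOPS = {fps_per_tops:.2f}, FPS/Watt = {fps_per_watt:.1f}")

For the placeholder numbers above, this prints FPS = 100.0, FPS/TOPS = 12.50, and FPS/Watt = 20.0, illustrating how two processors with the same TOPS rating can differ substantially in the FPS/TOPS and FPS/Watt figures that matter for an embedded deployment.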