Several architectures are available for embedded edge AI applications. Figure 2-2 below shows some of the SoC devices based on different architectures – GPU (graphics processing unit), FPGA (field-programmable gate array), and other embedded processors [6] [7] [8]. Many other technologies offer more than 32 TOPS of deep learning performance, but such high performance is typically needed for data center and infrastructure applications. This application note focuses on the edge AI segment, so primarily SoC architectures with performance up to 32 TOPS are considered.
To measure AI inference performance, TOPS (tera operations per second) can be used as a first indicator. However, actual system performance depends on many other factors influenced by the processor interconnect and data-flow architecture. A typical deep learning model moves large volumes of data, so inference time is determined not only by the computations but also by how efficiently that data is handled. Data movement, in turn, also drives the power consumption of the processor and the overall system. So, in this application note, we look at additional metrics to design an efficient edge AI system.
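As a rough illustration of why peak TOPS alone does not predict inference time, the short Python sketch below compares the compute-bound floor (set by the advertised TOPS) against the memory-bound floor (set by data movement over DDR), in the spirit of a roofline estimate. All numbers are hypothetical placeholders, standing in for values that would come from a datasheet or a profiler, not measurements of any specific device.

    # Back-of-envelope estimate (hypothetical numbers): lower bounds on
    # inference time from compute throughput vs. data movement.

    model_macs = 4.1e9           # MACs per inference (ResNet-class model, assumed)
    model_ops = 2 * model_macs   # 1 MAC = 2 ops (multiply + accumulate)
    weights_bytes = 25e6         # weight traffic per inference (8-bit, assumed)
    activations_bytes = 50e6     # feature-map traffic per inference (assumed)

    peak_tops = 8.0              # advertised peak, tera-ops per second (assumed)
    ddr_bw_bytes = 15e9          # usable DDR bandwidth, bytes/second (assumed)

    # Time floor if the device were purely compute-limited
    t_compute = model_ops / (peak_tops * 1e12)
    # Time floor if the device were purely memory-limited
    t_memory = (weights_bytes + activations_bytes) / ddr_bw_bytes

    print(f"compute-bound floor: {t_compute * 1e3:.2f} ms")
    print(f"memory-bound floor : {t_memory * 1e3:.2f} ms")
    print(f"achievable FPS <=  : {1 / max(t_compute, t_memory):.0f}")

With these assumed numbers, the memory-bound floor (about 5 ms) dominates the compute-bound floor (about 1 ms), so the achievable frame rate is set by data movement rather than by peak TOPS. This is why the interconnect and data-flow architecture, and the additional metrics discussed in this application note, matter for an efficient edge AI system.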