AI is revolutionizing our lifestyles, with constant innovation in deep learning and machine learning driving new use cases across homes, retail, and factories. AI at the edge is instrumental to the continued success of AI, delivering low latency, privacy, and a better user experience. The key AI function that happens in an embedded edge device is inference. This is where Texas Instruments (TI) is innovating with the TDA4x processor family, specially designed to make greener, smarter, and safer edgeAI devices possible.
With industry-leading vision and AI accelerators, TDA4x processors achieve more than 60% higher deep learning performance and energy efficiency compared to leading GPU-based architectures. With process and technology leadership, developers can achieve more than six times better deep learning performance compared to leading FPGA-based architectures that exist today.
This application note uses industry-standard performance and power benchmarks to compare the TDA4x system-on-chip (SoC) with other architectures. The TDA4x processor family also comes with easy-to-use, no-cost to low-cost development platforms, making it easier for developers to innovate with AI even without any prior experience.
The world population today is 7.8 billion and is expected to reach 10 billion by 2050 [1]. The growing population needs necessities such as food and clothing, along with ever-increasing comforts and tools, delivered safely and securely. There is constant technology innovation across all markets - consumer, industrial and automotive - to meet these needs. The new technologies we have all grown accustomed to have made data generation more affordable and more engaging. Think about the number of pictures taken with smartphones and the amount of data being generated by various sensors and edge devices in buildings and factories. All of this data is propelling end-to-end automation in factories and buildings to drive productivity and produce more goods and services. This exponentially increases the data that has to be managed - processed and analyzed to take corrective action. For example, a smart factory can have more than 50,000 sensors and generate several petabytes of data per day, and even a standard office building generates hundreds of gigabytes of data. Most of this data will be stored, managed, analyzed and kept right where it was produced - at the edge - driven by security, real-time performance and reliability.
There are three types of data produced in edge devices - video, audio and other sensor data. Video-based analytics tend to be more complex because a video is a sequence of images per second, and each image itself has multiple channels - Red, Green and Blue. With advances in cameras, vision-based analytics are gaining momentum across many applications - smart video doorbells, video surveillance, drones, robots, autonomous vehicles and last-mile delivery. Fundamentally, there are three functions one can implement with vision-based analytics, as shown in Figure 1-1: Classification, Detection and Segmentation. From the same image, three different functions can be implemented in an edgeAI system - from classifying the entire image down to pixel-level analysis of the whole scene.
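The three functions differ mainly in the granularity of their output. As a rough illustration (the label set, box format and sizes below are hypothetical, not tied to any particular model or TI API), the three outputs might be structured as follows:

```python
# Illustrative sketch of the output each vision function produces for a
# single input image. All labels and sizes here are hypothetical; the
# point is how output granularity grows from image level to pixel level.

NUM_CLASSES = 3        # hypothetical labels: 0=person, 1=car, 2=dog
HEIGHT, WIDTH = 4, 6   # tiny image size, for illustration only

# Classification: one score per class for the whole image
classification_scores = [0.1, 0.7, 0.2]   # length == NUM_CLASSES

# Detection: a list of boxes, each (class_id, confidence, x, y, w, h)
detections = [(1, 0.90, 12, 20, 50, 80),
              (0, 0.65, 96, 40, 30, 60)]

# Segmentation: a class id for every pixel in the image
segmentation_map = [[0] * WIDTH for _ in range(HEIGHT)]
segmentation_map[1][2] = 1   # e.g. this pixel belongs to class "car"
```

Classification answers "what is in this image," detection adds "where is it," and segmentation assigns a label to every single pixel.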
The AI-enabled vision market is an exciting area with rapid growth expected in the next few years. Vision-based analytics have broad use cases across many different markets and end equipment, as shown in Figure 1-2.
For example, a robot in a factory or warehouse setting, or a last-mile delivery robot, can use vision-based analytics to perform the following functions.
A surveillance camera can be made smarter with edgeAI functionality by analyzing objects to perform extended functions such as:
A smart shopping cart can use vision-based analytics to completely automate the shopping experience, improving efficiency and lowering costs for consumers, by adding functions such as:
Deep learning model development is a hot research area in the deep learning and AI community. Driven by smartphones and the amount of image data being generated, special focus has been given to image or vision deep learning functions that can identify faces, scenes, moods and other information in pictures. A specific type of neural network, the Convolutional Neural Network (CNN), is an enabler for the latest advancements in computer vision. Convolution is a technique for detecting different features in an input image. The convolution process uses a kernel, also called a filter, that sweeps across an image to detect patterns. A kernel is a very small matrix (usually 3x3 or 5x5) with a set of weights corresponding to its size, and typically detects one feature in the image such as eyes, a nose or a specific expression.
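The sweep-and-multiply operation described above can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration (single kernel, no padding or stride, "valid" output only); real CNN layers apply many kernels at once on hardware accelerators. The kernel values below are a common vertical-edge detector, chosen purely as an example:

```python
# Minimal sketch of 2D convolution with a 3x3 kernel (filter).
# Illustrative only: no padding, no stride, one kernel, one channel.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` and return the feature map (valid mode)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply-accumulate the kernel weights over the image patch
            acc = sum(kernel[i][j] * image[r + i][c + j]
                      for i in range(kh) for j in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge-detecting kernel (Sobel-like, a textbook example)
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

# 4x4 grayscale image with a bright right half: a vertical edge
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

feature_map = convolve2d(image, kernel)
print(feature_map)  # every output position straddles the edge: [[36, 36], [36, 36]]
```

The large values in the feature map indicate that the kernel's pattern (a left-to-right brightness change) was found at those positions; a CNN learns the kernel weights instead of hand-coding them.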
The seminal AlexNet paper [3] in 2012 showed researchers and industry that deep learning is an extremely effective algorithmic processing technique for solving computer vision tasks like classification, object detection and semantic segmentation. This triggered a series of innovations that continue to improve inference performance and efficiency, targeting a myriad of applications from robots and smart retail carts to autonomous last-mile delivery systems.
Figure 1-3 below shows popular models used in the industry today, starting with AlexNet [4]. In general, there is a clear trade-off between the accuracy of a model and the number of operations the model uses, measured in giga-operations (G-ops) [4].
This trade-off highlights the need for efficient SoC architectures that can run large computations efficiently. The deep learning community is vibrant, and constant innovation is improving model performance and efficiency. TI is continually investigating new SoC technology innovations to offer this community best-in-class AI capabilities.
It is common practice to use architectures of deep learning networks published in the literature, and open-source implementations of several models exist. TI is making the process even easier with its own Model Zoo, a large collection of models optimized for inference speed and low power consumption. The models used in this benchmarking application note are examples of such open-source models.