SPRACZ2 August 2022 TDA4VM, TDA4VM-Q1
ADVANCE INFORMATION
Image classification. As noted in the introduction, image classification is a commonly used deep learning function for applications that include photo searches, text extraction, and industrial automation, such as object sorting and defect detection. MLPerf uses the ImageNet 2012 data set [10], crops the images to 224x224 in preprocessing, and measures Top-1 accuracy. MLPerf suggests two models: a computationally heavyweight model that is more accurate and a computationally lightweight model that is faster but less accurate. The heavyweight model, ResNet-50 v1.5 [16], is used in this benchmarking and comparison.
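The crop and the Top-1 metric described above can be illustrated with a minimal sketch. It assumes a center crop (the text only states that images are cropped to 224x224) and per-class scores held in plain Python lists; the function names `center_crop_box` and `top1_accuracy` are hypothetical and are not taken from MLPerf or any TI tool.

```python
def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of a centered size x size crop.

    Assumption: the crop is centered; the source does not say where it is taken.
    """
    if width < size or height < size:
        raise ValueError("image is smaller than the crop size")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)
```

In practice the score rows would come from the classifier output for each validation image, and the labels from the data set's ground-truth annotations.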
Object detection. Object detection is a vision task that determines the coordinates of bounding boxes around objects in an image and then classifies those boxes. Implementations typically use a pretrained image-classifier network as a backbone or feature extractor, then perform regression for localization and bounding-box selection. Object detection is crucial for many tasks in automotive and robotics, such as detecting hazards and analyzing traffic, and for mobile-retail tasks, such as identifying items in a picture. MLPerf suggests two models: a lightweight model that uses 300x300 images and a heavyweight model that uses 1200x1200 images, both with the COCO data set [11].
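The bounding-box selection step mentioned above is often implemented as greedy non-maximum suppression based on intersection over union (IoU). The sketch below shows that common technique as an assumption; the text does not state which selection method the benchmarked models use, and the names `iou` and `select_boxes` as well as the 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold.

    Returns the indices of the kept boxes, best score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept
```

A real detector would normally apply this per class and after discarding low-confidence boxes; both steps are omitted here to keep the sketch short.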
Based on these MLPerf suggestions, the two models used in this application note are shown in Table 3-1 below.
DL Model | Function | Image size | Data set | Compute requirements per input |
---|---|---|---|---|
ResNet-50 | Image classification | 224x224 | ImageNet | 8.2 GOPS, 25.6 million parameters |
SSD MobileNet-V1 | Object detection | 300x300 | COCO | 2.47 GOPS, 6.91 million parameters |
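To make the per-input figures in the table concrete, the following sketch scales them by a frame rate. It assumes the GOPS column is the number of giga-operations needed for one input; the frame rate, the compute budget, and the utilization factor are hypothetical inputs chosen by the reader, not values from this application note.

```python
# Per-input compute taken from the table above, in giga-operations per image.
GOPS_PER_INPUT = {"ResNet-50": 8.2, "SSD MobileNet-V1": 2.47}


def sustained_gops_per_second(model, frames_per_second):
    """Giga-operations per second needed to process frames_per_second inputs."""
    return GOPS_PER_INPUT[model] * frames_per_second


def ideal_max_fps(model, available_tops, utilization=1.0):
    """Upper bound on frame rate for a given compute budget.

    available_tops and utilization are hypothetical; real throughput also
    depends on memory bandwidth, data types, and how well layers map to hardware.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return available_tops * 1000.0 * utilization / GOPS_PER_INPUT[model]
```

For example, running ResNet-50 at an assumed 30 frames per second needs roughly 246 giga-operations per second, while SSD MobileNet-V1 at the same rate needs roughly 74.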
The MLPerf inference standard also defines four benchmarking scenarios: single-stream, multi-stream, server, and offline. For real-time embedded edge AI systems such as smart cameras, machine vision, and robotics, the most relevant scenarios are single-stream and multi-stream, which cover image and video processing from a single camera or from multiple cameras simultaneously. The benchmarking in this application note uses the single-stream scenario.
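The single-stream scenario can be pictured as a loop that issues one query at a time and records how long each takes. The sketch below is not the MLPerf load generator; `infer` is a hypothetical callable standing in for the model, and the 90th-percentile summary is an assumption based on the latency metric that MLPerf Inference reports for single-stream.

```python
import math
import time


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def run_single_stream(infer, samples):
    """Send one sample at a time; the next query starts only after the
    previous one has returned. Returns per-query latencies in seconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        infer(sample)  # hypothetical model call
        latencies.append(time.perf_counter() - start)
    return latencies
```

A typical use would be `percentile(run_single_stream(model_fn, images), 90)`, where `model_fn` and `images` are supplied by the reader.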