SPRADB0 May 2023 AM62A3, AM62A3-Q1, AM62A7, AM62A7-Q1, AM67A, AM68A, AM69A

 

  1. Abstract
  2. Trademarks
  3. 1 Introduction
    1. 1.1 Intended Audience
    2. 1.2 Host Machine Information
  4. 2 Creating the Dataset
    1. 2.1 Collecting Images
    2. 2.2 Labelling Images
    3. 2.3 Augmenting the Dataset (Optional)
  5. 3 Selecting a Model
  6. 4 Training the Model
    1. 4.1 Input Optimization (Optional)
  7. 5 Compiling the Model
  8. 6 Using the Model
  9. 7 Building the End Application
    1. 7.1 Optimizing the Application With TI’s GStreamer Plugins
    2. 7.2 Using a Raw MIPI-CSI2 Camera
  10. 8 Summary
  11. 9 References

7.1 Optimizing the Application With TI’s GStreamer Plugins

Streaming applications benefit strongly from pipelining, in which each component of the application runs in a separate process so that different frames can be worked on at the same time. This is effective when hardware accelerators and multiple CPU cores are available to allow concurrency. TI’s GStreamer plugins are optimized to leverage the heterogeneous architecture of AM6xA processors and pass buffers between elements without copying them (zero-copy), which reduces memory overhead.
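The pipelining idea described above can be illustrated with a minimal Python sketch. It is not TI's implementation: it uses Python threads joined by bounded queues rather than separate processes or GStreamer elements, and the three stage functions (`preprocess`, `infer`, `postprocess`) are hypothetical placeholders standing in for the real work. The point is only the structure: each stage works on a different frame at the same time, and bounded queues keep a slow stage from being flooded.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(frames, funcs, depth=2):
    """Run funcs as concurrent stages connected by bounded FIFO queues."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    results = []

    def collect():
        # Drain the last queue so the final stage never blocks on a full queue.
        while True:
            item = queues[-1].get()
            if item is _STOP:
                return
            results.append(item)

    collector = threading.Thread(target=collect)
    for thread in workers + [collector]:
        thread.start()

    for frame in frames:
        queues[0].put(frame)  # blocks when the first stage falls behind
    queues[0].put(_STOP)

    for thread in workers + [collector]:
        thread.join()
    return results


# Hypothetical placeholder stages; real stages would run on accelerators.
def preprocess(frame):
    return frame * 2


def infer(tensor):
    return tensor + 1


def postprocess(result):
    return f"label-{result}"


if __name__ == "__main__":
    print(run_pipeline(range(5), [preprocess, infer, postprocess]))
```

Because each stage is a single worker reading from a FIFO queue, frame order is preserved. In pure Python the global interpreter lock limits the real speedup for CPU-bound stages; the benefit described in this section comes from stages that execute in native code or on hardware accelerators.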

The OpTIFlow plugins (i.e., tiovxdlpreproc, tidlinferer, and tidlpostproc) are instrumental in pipelining an AI application effectively, since they allow preprocessing, inference, and postprocessing to happen in parallel. With the OpTIFlow plugins, the only task left for dedicated application code is consuming the deep learning output. In the retail-scanner application, this means creating a small image that lists the customer’s order. Figure 7-1 illustrates this pipeline. For the exact GStreamer pipeline and a detailed explanation, see the HOW_ITS_MADE.md readme.

With this pipeline on AM62A (Processor SDK 8.6, E2 revision of the starter-kit EVM), the frame rate is 15-18 fps. The main bottleneck is the Python application code, which adds memory copies and makes multiple text-drawing function calls with OpenCV. The deep learning accelerator is loaded at 25-30%, the quad-core A53 CPU at 30-40% (averaged across all cores), and the 32-bit 3200 MT/s LPDDR4 at 20%.

Optimizing the pipeline to offload more of the application’s tasks yielded a 200% increase in FPS and an order-of-magnitude reduction in latency. Much more can be done to optimize, such as developing in C/C++, creating a more efficient function that writes all text to the frame at once (or removing that portion of the application altogether), or even converting the application code into a GStreamer plugin.
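One further way to reduce the text-drawing cost, related to but not the same as the suggestions above, is to re-render the order image only when the detected order actually changes and to reuse the cached image on every other frame. The sketch below is an assumption-laden illustration, not part of the retail-scanner code: the class name, the `render` callable, and the "count x name" line format are all hypothetical.

```python
class OrderOverlayCache:
    """Re-render the order overlay only when the detected order changes."""

    def __init__(self, render):
        # 'render' is a hypothetical callable: tuple of text lines -> image.
        self._render = render
        self._key = None
        self._overlay = None
        self.render_calls = 0  # exposed so the saving can be measured

    def get(self, order):
        """Return the overlay for an order given as {item name: count}."""
        key = tuple(sorted(order.items()))
        if key != self._key:
            # Order changed: draw all lines in a single render pass.
            lines = tuple(f"{count} x {name}" for name, count in key)
            self._overlay = self._render(lines)
            self._key = key
            self.render_calls += 1
        return self._overlay


if __name__ == "__main__":
    # A string join stands in for a real drawing routine such as OpenCV text calls.
    cache = OrderOverlayCache(render=lambda lines: "\n".join(lines))
    for _ in range(100):
        cache.get({"apple": 2, "banana": 1})
    print(cache.render_calls)
```

With a steady order, the expensive drawing path runs once rather than once per frame, so the per-frame cost drops to a dictionary sort and a tuple comparison.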

Figure 7-1 Block Diagram of GStreamer Pipeline for the Retail-Scanner Application