SPRADB4 June 2023 AM69A, TDA4VH-Q1

 

Table of Contents

  Abstract
  Trademarks
  1 Introduction
  2 AM69 Processor
  3 Edge AI Use Cases on AM69A
    3.1 AI Box
    3.2 Machine Vision
    3.3 Multi-Camera AI
    3.4 Other Use Cases
  4 Software Tools and Support
  5 Conclusion
  6 References

AI Box

AI Box is a cost-effective way to add intelligence to existing non-analytics cameras in retail stores, roadways, factories, and buildings. It is often preferred over an AI camera because adding an AI Box costs less than replacing legacy cameras with smart cameras that have built-in AI capabilities. Such a system receives live video streams from multiple cameras, decodes them, and performs intelligent video analytics at the edge, relieving the burden of transferring large video streams to the cloud for analysis. The video analytics outputs are encoded before being streamed out and saved to storage. Example applications of AI Box include:

  • Security surveillance system, which detects events and anomalous activities in the areas monitored by the remote cameras.
  • Smart traffic management system, where the AI Box runs deep learning networks for vehicle counting, vehicle type classification, and moving-direction prediction for traffic flow measurement.
  • Workplace safety system, which monitors the workplace to verify compliance with all imposed safety standards, for example, that workers wear personal protective equipment (PPE).
Figure 3-1. AI Box Block Diagram With Data Flow on AM69A

Figure 3-1 shows the data flow for AI Box on AM69A, where 12 channels of 2MP bitstreams arrive over Ethernet at 30 fps. The hardware-accelerated H.264 or H.265 decoder decodes the bitstreams, and the decoded frames are scaled down to a smaller resolution by the MSC. The DL networks run on these smaller-resolution frames at a lower frame rate, for example, 12 fps. In DL pre-processing, the smaller-resolution frames in YUV are converted to RGB, the input format of the DL network. The MMA accelerates DL inference. In DL post-processing, the network outputs are overlaid on the input frame. The output frames from the 12 channels are then stitched together into a single 2MP composite frame, and 13 channels (12 channels plus 1 composite channel) are encoded by the hardware-accelerated H.264 or H.265 encoder at a lower frame rate, for example, 6 fps, and streamed out or saved to storage.
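The pixel-throughput arithmetic behind these numbers can be sketched in a few lines. This is only an illustrative calculation of the figures stated above; the helper name `mp_per_s` is made up for this sketch and is not part of any TI software.

```python
# Aggregate pixel throughput for the AI Box pipeline described above.
# Channel counts, resolutions, and frame rates are taken from the text.

def mp_per_s(channels, megapixels, fps):
    """Aggregate throughput in megapixels per second."""
    return channels * megapixels * fps

decode_load = mp_per_s(12, 2, 30)       # 12 x 2MP at 30 fps
encode_load = mp_per_s(12 + 1, 2, 6)    # 12 channels + 1 composite at 6 fps

print(decode_load)  # 720 MP/s into the decoder
print(encode_load)  # 156 MP/s into the encoder
```

These totals (720 MP/s decode, 156 MP/s encode) match the per-IP loads listed in Table 3-1 for the 12-channel configuration.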

Table 3-1 summarizes the resource utilization and estimated power consumption for 12 and 16 channels of bitstreams. Due to the limited maximum throughput of the video codec, however, the input and output frame rates must be reduced to 24 fps and 4 fps, respectively, for 16 channels of bitstreams. The estimate assumes that each channel needs 1 TOPS for inference. Two C7x cores remain available for additional vision processing and JPEG image encoding to create snapshots. While both DL pre- and post-processing run on the A72 cores in this example, they can also run on the available C7x cores, in which case power consumption may differ slightly.

Table 3-1. AM69A Resource Utilization and Power Consumption Estimate for the AI Box Use Case

Main IP                  | Utilization (12 × 2MP at 30 fps)                                               | Utilization (16 × 2MP at 24 fps)
Decoder                  | 12 × 2MP at 30 fps = 720 MP/s (75%)                                            | 16 × 2MP at 24 fps = 768 MP/s (80%)
Encoder                  | 12 × 2MP at 6 fps + 1 composite × 2MP at 6 fps = 156 MP/s (17%)                | 16 × 2MP at 4 fps + 1 composite × 2MP at 4 fps = 136 MP/s (15%)
Encoder + Decoder        | 720 MP/s + 156 MP/s = 876 MP/s (92%)                                           | 768 MP/s + 136 MP/s = 908 MP/s (95%)
GPU                      | 20%                                                                            | 20%
VPAC (MSC)               | 12 × 2MP at 30 fps = 720 MP/s (60%)                                            | 16 × 2MP at 24 fps = 768 MP/s (64%)
MMA                      | 12 × 1 TOPS per ch = 12 TOPS (38%)                                             | 16 × 1 TOPS per ch = 16 TOPS (50%)
8 × A72                  | DL pre- and post-processing, depacketization, JPEG encode, and so forth (50%)  | DL pre- and post-processing, depacketization, JPEG encode, and so forth (40%)
DDR Bandwidth            | 9.49 GB/s (14%)                                                                | 11.95 GB/s (18%)
Power Consumption (85°C) | 18 W                                                                           | 18 W
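The utilization percentages in Table 3-1 can be cross-checked against the stated loads. The capacity constants below are inferred from the table's own ratios (for example, 720 MP/s at 75% implies roughly 960 MP/s of codec throughput); they are assumptions for this sketch, not official AM69A specifications.

```python
# Cross-check of the Table 3-1 utilization figures.
# Capacity values are inferred from the table's percentages, not from
# the AM69A datasheet; treat them as illustrative assumptions.

CODEC_CAPACITY_MPS = 960    # implied by 720 MP/s at 75%
MSC_CAPACITY_MPS = 1200     # implied by 720 MP/s at 60%
MMA_CAPACITY_TOPS = 32      # implied by 12 TOPS at 38% and 16 TOPS at 50%

def utilization(load, capacity):
    """Load as a rounded percentage of capacity."""
    return round(100 * load / capacity)

print(utilization(720, CODEC_CAPACITY_MPS))   # 75, matches the decoder row
print(utilization(876, CODEC_CAPACITY_MPS))   # 91, the table rounds up to 92
print(utilization(720, MSC_CAPACITY_MPS))     # 60, matches the VPAC (MSC) row
print(utilization(12, MMA_CAPACITY_TOPS))     # 38, matches the MMA row
print(utilization(16, MMA_CAPACITY_TOPS))     # 50, matches the 16-channel column
```

The same check applied to the 16-channel column (768 MP/s decode, 908 MP/s combined) reproduces the 80% and ~95% figures, confirming that the two configurations share the same implied codec budget.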