SPRADB4 June 2023 AM69A, TDA4VH-Q1

 

Table of Contents

  Abstract
  Trademarks
  1 Introduction
  2 AM69 Processor
  3 Edge AI Use Cases on AM69A
    3.1 AI Box
    3.2 Machine Vision
    3.3 Multi-Camera AI
    3.4 Other Use Cases
  4 Software Tools and Support
  5 Conclusion
  6 References

AI Box

AI Box is a cost-effective way to add intelligence to existing non-analytics cameras in retail stores, roadways, factories, and buildings. It is often preferred over an AI camera because adding an AI Box costs less than replacing legacy cameras with smart cameras that have built-in AI capabilities. Such a system receives live video streams from multiple cameras, decodes them, and performs intelligent video analytics at the edge, relieving the burden of transferring large video streams to the cloud for analysis. The video analytics outputs are encoded before being streamed out and saved to storage. Example applications of AI Box include:

  • Security surveillance system, which detects events and anomalous activities in the areas monitored by the remote cameras.
  • Smart traffic management system, where the AI Box runs deep learning networks for vehicle counting, vehicle type classification, and moving-direction prediction for traffic flow measurement.
  • Workplace safety system, which monitors the workplace to verify compliance with all imposed safety standards, for example, that workers wear personal protective equipment (PPE).
Figure 3-1. AI Box Block Diagram With Data Flow on AM69A

Figure 3-1 shows the data flow for AI Box on AM69A, where 12 channels of 2MP bitstreams arrive over Ethernet at 30 fps. The hardware-accelerated H.264 or H.265 decoder decodes the bitstreams, and the decoded frames are scaled down to a smaller resolution by the MSC. The DL networks run on these smaller-resolution frames at a lower frame rate, for example, 12 fps. In DL pre-processing, the smaller-resolution frames in YUV are converted to RGB, the input format of the DL network. The MMA accelerates DL inference. In DL post-processing, the network outputs are overlaid on the input frame. The output frames from the 12 channels are then stitched together into a single 2MP composite frame, and 13 channels (12 channels plus 1 composite channel) are encoded by the hardware-accelerated H.264 or H.265 encoder at a lower frame rate, for example, 6 fps, and streamed out or saved to storage.
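The pixel-throughput arithmetic behind these numbers can be sketched in a few lines. This is only an illustrative calculation of the figures stated above; the helper name `mp_per_s` is made up for this sketch and is not part of any TI software.

```python
# Aggregate pixel throughput for the AI Box pipeline described above.
# Channel counts, resolutions, and frame rates are taken from the text.

def mp_per_s(channels, megapixels, fps):
    """Aggregate throughput in megapixels per second."""
    return channels * megapixels * fps

decode_load = mp_per_s(12, 2, 30)       # 12 x 2MP at 30 fps
encode_load = mp_per_s(12 + 1, 2, 6)    # 12 channels + 1 composite at 6 fps

print(decode_load)  # 720 MP/s into the decoder
print(encode_load)  # 156 MP/s into the encoder
```

These totals (720 MP/s decode, 156 MP/s encode) match the per-IP loads listed in Table 3-1 for the 12-channel configuration.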

Table 3-1 summarizes the resource utilization and estimated power consumption for 12 and 16 channels of bitstreams. Due to the limited maximum throughput of the video codec, however, the input and output frame rates must be reduced to 24 fps and 4 fps, respectively, for 16 channels of bitstreams. The estimate assumes that each channel needs 1 TOPS for inference. Two C7x cores remain available for additional vision processing and JPEG image encoding to create snapshots. While both DL pre- and post-processing run on the A72 cores in this example, they can also run on the available C7x cores, in which case power consumption may differ slightly.

Table 3-1. AM69A Resource Utilization and Power Consumption Estimate for the AI Box Use Case

Main IP                  | Utilization (12 × 2MP at 30 fps)                                               | Utilization (16 × 2MP at 24 fps)
Decoder                  | 12 × 2MP at 30 fps = 720 MP/s (75%)                                            | 16 × 2MP at 24 fps = 768 MP/s (80%)
Encoder                  | 12 × 2MP at 6 fps + 1 composite × 2MP at 6 fps = 156 MP/s (17%)                | 16 × 2MP at 4 fps + 1 composite × 2MP at 4 fps = 136 MP/s (15%)
Encoder + Decoder        | 720 MP/s + 156 MP/s = 876 MP/s (92%)                                           | 768 MP/s + 136 MP/s = 908 MP/s (95%)
GPU                      | 20%                                                                            | 20%
VPAC (MSC)               | 12 × 2MP at 30 fps = 720 MP/s (60%)                                            | 16 × 2MP at 24 fps = 768 MP/s (64%)
MMA                      | 12 × 1 TOPS per ch = 12 TOPS (38%)                                             | 16 × 1 TOPS per ch = 16 TOPS (50%)
8 × A72                  | DL pre- and post-processing, depacketization, JPEG encode, and so forth (50%)  | DL pre- and post-processing, depacketization, JPEG encode, and so forth (40%)
DDR Bandwidth            | 9.49 GB/s (14%)                                                                | 11.95 GB/s (18%)
Power Consumption (85°C) | 18 W                                                                           | 18 W
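The utilization percentages in Table 3-1 can be cross-checked against the stated loads. The capacity constants below are inferred from the table's own ratios (for example, 720 MP/s at 75% implies roughly 960 MP/s of codec throughput); they are assumptions for this sketch, not official AM69A specifications.

```python
# Cross-check of the Table 3-1 utilization figures.
# Capacity values are inferred from the table's percentages, not from
# the AM69A datasheet; treat them as illustrative assumptions.

CODEC_CAPACITY_MPS = 960    # implied by 720 MP/s at 75%
MSC_CAPACITY_MPS = 1200     # implied by 720 MP/s at 60%
MMA_CAPACITY_TOPS = 32      # implied by 12 TOPS at 38% and 16 TOPS at 50%

def utilization(load, capacity):
    """Load as a rounded percentage of capacity."""
    return round(100 * load / capacity)

print(utilization(720, CODEC_CAPACITY_MPS))   # 75, matches the decoder row
print(utilization(876, CODEC_CAPACITY_MPS))   # 91, the table rounds up to 92
print(utilization(720, MSC_CAPACITY_MPS))     # 60, matches the VPAC (MSC) row
print(utilization(12, MMA_CAPACITY_TOPS))     # 38, matches the MMA row
print(utilization(16, MMA_CAPACITY_TOPS))     # 50, matches the 16-channel column
```

The same check applied to the 16-channel column (768 MP/s decode, 908 MP/s combined) reproduces the 80% and ~95% figures, confirming that the two configurations share the same implied codec budget.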