SPRADB4 June 2023 AM69A, TDA4VH-Q1
AI Box is a cost-effective way of adding intelligence to existing non-analytics cameras in retail stores, on roads, and in factories and buildings. AI Box is preferred over AI cameras because adding one costs less than replacing legacy cameras with smart cameras that have built-in AI capabilities. Such a system receives live video streams from multiple cameras, decodes them, and performs intelligent video analytics at the edge, which removes the burden of transferring large video streams back to the cloud for analysis. The video analytics outputs are encoded before being streamed out and saved in storage. Example applications of AI Box include
Figure 3-1 shows the data flow for AI Box on AM69A, where 12 channels of 2MP bitstreams arrive over Ethernet at 30 fps. The hardware-accelerated H.264 or H.265 decoder decodes the bitstreams, and the MSC scales the decoded frames to a smaller resolution. DL networks run on these smaller-resolution frames at a lower frame rate, for example, 12 fps. In DL pre-processing, the smaller-resolution frames are converted from YUV to RGB, which is the input format of the DL network. The MMA accelerates DL inference. In DL post-processing, the network outputs are overlaid on the input frame. The output frames from the 12 channels are then stitched together into a single 2MP frame, and 13 channels, that is, 12 channels plus 1 composite channel, are encoded by the hardware-accelerated H.264 or H.265 encoder at a lower frame rate, for example, 6 fps, and streamed out or saved in storage.
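The rates implied by this data flow can be worked out with a few lines of arithmetic. The following Python sketch is illustrative only: the `AiBoxFlow` class and the `stage_rates` and `frames_to_keep` helpers are hypothetical names, not part of any TI SDK. The evenly spaced frame-selection rule is an assumption about how 30 fps input might be reduced to 12 fps for inference and 6 fps for encoding; the text does not say how frames are actually selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiBoxFlow:
    """Hypothetical container for the example figures quoted in the text."""
    channels: int = 12    # camera bitstreams arriving over Ethernet
    megapixels: int = 2   # 2MP per frame
    input_fps: int = 30   # decode and scaling rate
    dl_fps: int = 12      # example DL inference rate
    output_fps: int = 6   # example encode rate


def stage_rates(flow: AiBoxFlow) -> dict:
    """Return the per-stage load implied by the data flow."""
    decode_mpps = flow.channels * flow.megapixels * flow.input_fps
    return {
        "decode_mpps": decode_mpps,
        # The scaler sees every decoded frame, so its input rate matches decode.
        "scale_mpps": decode_mpps,
        "inferences_per_s": flow.channels * flow.dl_fps,
        # All channels plus one stitched composite channel are encoded.
        "encode_mpps": (flow.channels + 1) * flow.megapixels * flow.output_fps,
    }


def frames_to_keep(in_fps: int, out_fps: int) -> list:
    """Assumed decimation rule: pick out_fps evenly spaced frames per second."""
    return [i for i in range(in_fps) if (i * out_fps) % in_fps < out_fps]


if __name__ == "__main__":
    print(stage_rates(AiBoxFlow()))
    print(frames_to_keep(30, 12))  # frame indices sent to DL inference
    print(frames_to_keep(30, 6))   # frame indices sent to the encoder
```

With the default values, the decoder and scaler each handle 720 MP/s, the DL stage runs 144 inferences per second across all channels, and the encoder handles 156 MP/s.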
Table 3-1 summarizes the resource utilization and estimated power consumption with 12 and 16 channels of bitstreams. For 16 channels, the limited maximum throughput of the video codec requires reducing the input and output frame rates to 24 fps and 4 fps, respectively. The table assumes that each channel needs 1 TOPS for inference. Two C7x cores remain available for additional vision processing and for JPEG image encoding to create snapshots. While both DL pre- and post-processing run on the A72 cores in this example, they can also run on the available C7x cores, in which case power consumption may differ slightly.
Main IP | Utilization (12 × 2MP at 30 fps) | Utilization (16 × 2MP at 24 fps) |
---|---|---|
Decoder | 12 × 2MP at 30 fps = 720 MP/s (75%) | 16 × 2MP at 24 fps = 768 MP/s (80%) |
Encoder | 12 × 2MP at 6 fps + 1 composite × 2MP at 6 fps = 156 MP/s (17%) | 16 × 2MP at 4 fps + 1 composite × 2MP at 4 fps = 136 MP/s (15%) |
Encoder + Decoder | 720 MP/s + 156 MP/s = 876 MP/s (92%) | 768 MP/s + 136 MP/s = 908 MP/s (95%) |
GPU | 20% | 20% |
VPAC (MSC) | 12 × 2MP at 30 fps = 720 MP/s (60%) | 16 × 2MP at 24 fps = 768 MP/s (64%) |
MMA | 12 × 1 TOPS per ch = 12 TOPS (38%) | 16 × 1 TOPS per ch = 16 TOPS (50%) |
8 × A72 | DL pre- and post-processing, depacketization, JPEG encode, and so forth (50%) | DL pre- and post-processing, depacketization, JPEG encode, and so forth (40%) |
DDR Bandwidth | 9.49GBps (14%) | 11.95GBps (18%) |
Power Consumption (85°C) | 18 W | 18 W |