SPRUJ28E November 2021 – September 2024 AM68, AM68A, TDA4AL-Q1, TDA4VE-Q1, TDA4VL-Q1
Figure 6-46 shows the hierarchical architecture of the Video Accelerator and external bus interfaces.
The Video Accelerator is built on a layered structure for efficient video processing. It is composed of two main parts, V-CPU and V-CORE. V-CPU is a 32-bit processor that runs the Video Accelerator L1 driver software. It mainly encodes and decodes the high-level syntax of the bitstream (SPS/PPS/SH) and communicates with the host processor through the 32-bit AMBA3 APB bus, for example by receiving a picture-level command from the host processor or sending back the result of a video task. V-CPU also parses the necessary parameters and passes the required slice/tile-level information to the underlying V-CORE hardware.
For HEVC decoding in particular, V-CPU uses a virtual wavefront parallel processing (VWPP) scheme to balance the processing load among V-COREs and to simplify its operation. During VWPP, V-CPU parses the bitstream and extracts data such as the CABAC context, QP, and start address for each row and tile. V-CPU then reorders the data to match the WPP processing sequence and dispatches it to the appropriate V-COREs.
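The wavefront ordering that the dispatch step has to respect can be sketched in a few lines of Python. This is only an illustrative model of generic HEVC WPP ordering, not the actual VWPP firmware behavior, which this document does not specify. The function name `wpp_schedule`, the round-robin assignment of CTU rows to cores, and the one-time-step-per-CTU cost are all assumptions made for the example. The two-CTU lag reflects the standard WPP dependency on the above-right CTU.

```python
def wpp_schedule(rows, cols, num_cores, lag=2):
    """Simulate a wavefront schedule over a rows x cols grid of CTUs.

    Assumptions (illustrative, not taken from the device documentation):
      * CTU rows are assigned to cores round-robin,
      * every CTU takes exactly one time step,
      * CTU (r, c) may start only once row r-1 has finished `lag` CTUs
        more than row r (or has finished entirely).

    Returns a list where entry t holds the (core, row, col) tuples
    processed during time step t.
    """
    done = [0] * rows  # number of CTUs finished in each row
    core_rows = [list(range(core, rows, num_cores)) for core in range(num_cores)]
    schedule = []
    while any(d < cols for d in done):
        snapshot = list(done)  # progress visible at the start of this step
        step = []
        for core, queue in enumerate(core_rows):
            # Drop rows this core has already completed.
            while queue and snapshot[queue[0]] == cols:
                queue.pop(0)
            if not queue:
                continue
            r = queue[0]
            c = snapshot[r]
            need = min(c + lag, cols)  # progress required from the row above
            if r == 0 or snapshot[r - 1] >= need:
                step.append((core, r, c))
                done[r] += 1
        schedule.append(step)
    return schedule


if __name__ == "__main__":
    for t, step in enumerate(wpp_schedule(rows=3, cols=4, num_cores=3)):
        print(t, step)
```

Under these assumptions, a 3 x 4 CTU grid finishes in 8 time steps on three cores and in 12 on a single core. Each extra row costs only the two-step wavefront lag rather than a full row of work, which is the load-balancing benefit the reordering is meant to expose.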
V-CORE can encode and decode a slice unit of data. It consists of a 16-bit DSP called a bit processor unit (BPU) and a group of video codec hardware blocks called a video codec engine (VCE).
The BPU and VCE perform these tasks using CTU-based pipelines and FIFO queues to ensure real-time speed and performance. For system connection, as shown in Figure 6-46, there is one 32-bit AMBA3 APB interface connecting to the host processor, and there are two 128-bit AMBA3 AXI bus interfaces connecting to the external memory controller for reading and writing picture data and temporal data.
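As a rough illustration of why FIFO queues between CTU pipeline stages help throughput, the sketch below models a two-stage pipeline in Python. It is a simplified timing model under stated assumptions, not a description of the actual hardware: the split into one "BPU" stage feeding one "VCE" stage, the per-CTU cycle costs, the FIFO depths, and the name `pipeline_finish_time` are all hypothetical.

```python
def pipeline_finish_time(bpu_cost, vce_cost, fifo_depth):
    """Total time for a two-stage CTU pipeline joined by a bounded FIFO.

    Assumptions (illustrative only):
      * stage 1 ("BPU") handles CTUs in order, taking bpu_cost[i] for CTU i,
      * stage 2 ("VCE") handles CTUs in order, taking vce_cost[i] for CTU i,
      * stage 1 may start CTU i only when a FIFO slot is guaranteed, that is,
        once stage 2 has started CTU i - fifo_depth.
    """
    n = len(bpu_cost)
    bpu_done = [0] * n
    vce_start = [0] * n
    vce_done = [0] * n
    for i in range(n):
        start = bpu_done[i - 1] if i > 0 else 0
        if i >= fifo_depth:
            # Wait until the consumer has freed a FIFO slot.
            start = max(start, vce_start[i - fifo_depth])
        bpu_done[i] = start + bpu_cost[i]
        # Stage 2 needs both its input and its own previous CTU finished.
        prev_done = vce_done[i - 1] if i > 0 else 0
        vce_start[i] = max(bpu_done[i], prev_done)
        vce_done[i] = vce_start[i] + vce_cost[i]
    return vce_done[-1]


if __name__ == "__main__":
    bpu = [1, 1, 1, 5]  # hypothetical per-CTU costs, last CTU is expensive
    vce = [3, 3, 3, 1]
    for depth in (1, 2, 4):
        print(depth, pipeline_finish_time(bpu, vce, depth))
```

With these made-up costs, a FIFO depth of 1 finishes in 13 time units and a depth of 2 in 11. The deeper queue lets the first stage run ahead on cheap CTUs, so the second stage is not left idle when an expensive CTU arrives.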