The VPAC subsystem includes an imaging pipe that performs image processing on raw pixels (Bayer, RCCC, etc.). The VPAC subsystem loads pixel data either from memory (for example, DDR or on-chip memory) or from a camera sensor (captured via an external module at SoC level), and applies many vision primitive functions such as image processing, scaling to generate image pyramids, noise filtering, lens distortion correction, and perspective transformation for stereo rectification. Optionally, the VPAC subsystem can also perform dynamic range management on received image data. The processed output image data generated by the VPAC subsystem is written into memory (for example, DDR or on-chip memory). The VPAC subsystem supports software-managed and HTS-controlled flexible load/store of data using the K3 DMA infrastructure sub-modules.
The VPAC subsystem includes the following image processing hardware accelerators (HWAs) and infrastructure blocks:
- Vision Imaging Subsystem (VISS). The VISS does on-the-fly (OTF) processing on raw pixels captured from a camera sensor. It also does memory-to-memory processing for data captured from other sources on its parallel port or video port. The VISS uses VPAC infrastructure modules (HTS, DRU/UTC, interrupt aggregator, etc.) for flow and data management. For more information on the VISS operation and the sub-modules integrated inside, see Section VPAC Vision Imaging Subsystem (VISS). The VISS consists of the following hardware accelerators (HWAs):
- RAW Front End (RFE) block: The RFE block does RAW
pixel (that is, Bayer, RCCC, RGBW, etc.) processing on image data
captured from the sensor. The processed data is then passed either to the
NSF4v block or directly to the Flexible Color Processing (FCP) block
for demosaicing and color conversion. The RFE also includes
the H3A block, which supports the control loops for auto focus (AF),
auto white balance (AWB) and auto exposure (AE) by computing image
statistics.
- Noise Filter Engine (NSF4v): The NSF4v is a spatial, 3-level wavelet-based noise filter engine that supports a generic 2x2 sensor. It receives data from the RFE block and sends it to the GLBCE module.
- Global Local Brightness Contrast Enhancement (GLBCE) module: The GLBCE module is used for dynamic range control within an image to improve visual quality. If contrast enhancement of the input image is required, the RFE output is processed by the NSF4v and GLBCE blocks and then passed on to the FCP block. Otherwise, the NSF4v and GLBCE blocks can be bypassed, and the RFE output is provided directly to the FCP block.
- Flexible Color Processing (FCP) block: The FCP block receives data from GLBCE and does demosaicing and color conversion. If the internal Edge Enhancer (EE) block is enabled, the output of the color processing is sent to it before being output to VPAC shared memory. Otherwise, the FCP output data is sent directly to VPAC shared memory to be written into external memory for further vision processing by programmable processors (that is, DSP or Arm), or other vision hardware accelerators at SoC level.
- Load/Store Engine (LSE): The LSE is an infrastructure block, which provides the following functions:
- Receives data captured through a MIPI CSI-2 image sensor (by an external module at SoC level) for on-the-fly image processing, which reduces DDR bandwidth and latency. The LSE also provides horizontal and vertical blanking cycles to allow the core data path to settle at the proper line and frame boundaries.
- Provides access to VPAC SL2 memory for loading/storing data. Data loaded from SL2 is passed on to the unpacker function of the LSE for the RFE. Packed data after FCP/H3A processing is written into SL2.
- Pixel packing/unpacking. Source data (512-bit) for RFE
processing is loaded from SL2 and passed on to the unpacker function
for pixel extraction. Extracted pixels are driven on the video
port of the RFE. Similarly, FCP-produced pixels are driven to the packer
function for eventual write into SL2. H3A-generated data is
pseudo-mapped as 32-bit pixels for packing purposes and
directly driven by the RFE.
- Event control. HTS events are generated at line
level. These events are routed to the RFE to start the processing.
For each consumed line, the HTS needs a task done indication.
For some of the initial lines consumed, there is no valid
data output. Similarly, H3A-generated data is produced only every
paxel/window height number of lines. For consumed lines that do not produce any valid data
to be written into DDR, this must be indicated to the HTS with
separate mask bits for each output stream. For the initial
lines, when there is no valid output due to delay lines, the LSE
still generates a mask output to the HTS indicating the lack of
proper output data.
- Lens Distortion Correction (LDC) module. The LDC module deals with lens geometric distortion issues in the camera system. The distortion can be a common optical distortion, such as barrel distortion, pin-cushion distortion, or fisheye distortion, but correction is not limited to these types. The LDC module consists of a Back Mapping block (which gives coordinates of the distorted image as a function of coordinates of the undistorted output image), a delta_x/delta_y offset table, a frame buffer interface, a buffer, an interpolation block, and an SL2 interface. The frame buffer is external to the LDC module, and is usually in an off-chip SDRAM. The LDC module uses the common VPAC infrastructure modules (HTS, DRU/UTC, interrupt aggregator, etc.) for flow and data management, except for loading input data, for which it has its own DMA engine. For more details on the LDC operation, see Section VPAC Lens Distortion Correction (LDC).
- Multi-Scaler (MSC) module. The MSC consists of 10
programmable resizers performing multi-thread/multi-scaling operations (2-input
to N-outputs, and 2-input to M-outputs, where N+M is 10 or less). Each
processing thread of MSC reads its input plane data from the SL2 circular line
buffer, performs multi-scaling operations (ratios between X and 0.25X) on the
same input, and writes out results to SL2 circular line buffers. In case of
on-the-fly operation, the source data is generated from another VPAC HWA. In
case of memory-to-memory operation, the source data is read from the external
memory. Data transfers from/to SL2 to/from external memory (or another HWA) are
handled by the VPAC DMA controller (DRU/UTC), with transfer request events
coming from the VPAC HTS module. The MSC module uses the VPAC infrastructure
modules (HTS, DRU/UTC, interrupt aggregator, etc.) for flow and data management.
For more details on the MSC operation, see Section VPAC Subsystem
Multi-Scaler (MSC).
- Noise Filter (NF). The NF module reads data from memory (that is, DDR or on-chip) into the shared memory (SL2) and does bilateral filtering to remove noise. The output of the NF block can be sent from the shared memory (SL2) to external memory (that is, DDR), or can be further resized using the MSC module. The NF module supports two modes of filtering: bilateral filtering mode (where the overall weights are 8-bit unsigned weights computed based on center pixels), and generic filtering mode (where all weights are 9-bit signed values read from a LUT). The NF module uses the VPAC infrastructure modules (HTS, DRU/UTC, interrupt aggregator, etc.) for flow and data management. For more details on the NF operation, see Section VPAC Noise Filter (NF).
- Hardware Thread Scheduler (HTS). The HTS module is a messaging layer for low-overhead synchronization of the parallel computing tasks and DMA transfers, and is independent from the host processor. It allows autonomous frame-level processing for the VPAC subsystem. The HTS module defines various aspects of synchronization and data sharing between the VPAC HWAs. With regard to producer and consumer dependencies, the HTS module ensures that a task starts only when input data and adequate space to write out data are available. In addition, it takes care of the pipe-up, debug, abort and interrupt management aspects related to the VPAC HWAs. For more details on the HTS operation, see Section VPAC Hardware Thread Scheduler (HTS).
- Common shared level 2 (SL2) memory. The SL2 memory subsystem implements a full crossbar interconnect, and serves as input/output scratch memory for the VPAC HWAs.
- Data Router Unit (DRU). The DRU acts like (and is also referred to as) a
Universal Transfer Controller (UTC) for managing real-time and non-real-time
DMA transfers.
- Switched Central Resource (SCR) block. The SCR acts
like a 256-bit data routing interconnect within the VPAC subsystem, and is used
to route UTC traffic onto SL2 memory or for external (system level) access.
- Counter, Timer and System Event Trace (CTSET) module. The CTSET module provides event tracing capabilities for debug purposes.
- ECC Aggregator. The ECC Aggregator modules provide a mechanism to control and monitor certain internal ECC RAMs via Single Error Correction (SEC) and Double Error Detection (DED) functions.
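
The producer/consumer gating rule stated for the HTS above (a task starts only when its input data is available and there is adequate space to write its output) can be illustrated with a small software model. The following Python sketch is not the HTS programming interface; the names `LineBuffer`, `can_start` and `run_pipeline` are hypothetical and introduced only for illustration. It assumes one producer task and one consumer task that exchange one line at a time through a single circular line buffer, as the HWAs do through SL2.

```python
from dataclasses import dataclass


@dataclass
class LineBuffer:
    """Hypothetical model of a circular line buffer in SL2."""
    depth: int       # capacity of the buffer, in lines (must be >= 1)
    filled: int = 0  # lines currently holding unconsumed data

    def has_data(self, lines: int = 1) -> bool:
        return self.filled >= lines

    def has_space(self, lines: int = 1) -> bool:
        return self.depth - self.filled >= lines


def can_start(inputs, outputs) -> bool:
    """Gating rule: every input has data and every output has free space."""
    return (all(buf.has_data() for buf in inputs)
            and all(buf.has_space() for buf in outputs))


def run_pipeline(total_lines: int, depth: int):
    """Simulate a producer ("P") and a consumer ("C") sharing one buffer.

    Returns the order in which the two tasks were started.
    """
    buf = LineBuffer(depth)
    produced = consumed = 0
    trace = []
    while consumed < total_lines:
        # The producer's input is not modeled; it only needs output space.
        if produced < total_lines and can_start([], [buf]):
            buf.filled += 1
            produced += 1
            trace.append("P")
        # The consumer needs one input line; its output is not modeled.
        elif can_start([buf], []):
            buf.filled -= 1
            consumed += 1
            trace.append("C")
    return trace


if __name__ == "__main__":
    print("".join(run_pipeline(total_lines=4, depth=2)))
```

With a four-line frame and a two-line buffer, the model starts the tasks in the order `PPCPCPCC`: the producer stalls as soon as the buffer is full, and resumes only after the consumer has freed a line. The model always favors the producer when both tasks are eligible, which is a simplification chosen for this sketch and not a statement about HTS arbitration.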