The AM69A processor is the
highest-performance device in the AM6xA scalable embedded processor family. Along with
the octal-core Arm®
Cortex®-A72 microprocessor,
the AM69A provides the family's highest levels of processing power, image and video
processing, and graphics capability. Compared with the AM62A(1) and
the AM68A(2),
which are well suited to applications with 1 to 2 cameras and 4 to 8
cameras, respectively, the AM69A enables real-time processing of 12 cameras with
improved AI performance. As shown in Figure 2-1, the AM69A
processor features the following sub-systems, built on a heterogeneous
architecture:
- An octal-core Arm Cortex-A72
microprocessor at 2 GHz provides up to 100K Dhrystone Million
Instructions Per Second (DMIPS).
- Vision Processing Accelerator
V3 (VPAC3) performs image processing in its Vision Image Sub-System (VISS),
which supports raw image sensors through demosaicing, defective pixel correction, auto
exposure, auto white balance, chromatic aberration correction (CAC), and so
forth. In addition, VPAC3 includes Lens Distortion Correction (LDC),
Multi-Scaler (MSC), and Bilateral Noise Filter (BNF) hardware accelerators
(HWAs) to accelerate correction of distorted images, downscaling of images into
multiple resolutions, and noise filtering, respectively. The AM69A has two
instances of VPAC3, which can process 1,200 megapixels per second (MP/s), assuming
20% system overhead.
- Digital Signal Processing
(DSP) and Matrix Multiplication Accelerator (MMA) are integrated
together for deep learning (DL) acceleration as well as traditional computer vision
tasks. The AM69A processor has four 512-bit C7x DSPs running at 1 GHz, each
tightly coupled with one of four MMAs capable of 4K (64 × 64) 8-bit fixed-point
multiply-accumulates per cycle. Running at 1 GHz, the four MMAs provide a maximum
compute speed of 32 Trillion Operations per Second (TOPS).
- H.264 and H.265 video codecs can encode and decode multiple channels
simultaneously. The codecs support H.264 Baseline, Main, and High Profiles
at L5.2, and H.265 Main Profile at L5.1. There are two instances of the video codec,
so the H.264 and H.265 encoder and decoder can process 960 MP/s, for example,
16 channels of 2MP at 30 frames per second (fps).
- 3x 4-lane Mobile Industry
Processor Interface (MIPI) CSI-2 RX ports are available on the AM69A. Three
high-resolution (for example, 12MP) cameras can be directly connected to the CSI-2
RX ports, then captured and pre-processed by the two VPAC3 instances. Capturing twelve
2MP cameras is possible via MIPI CSI-2 4-to-1 aggregators.
- BXS-4-64 GPU offers up to
50 Giga Floating-point Operations Per Second (GFLOPS) to enable dynamic 2D and
3D rendering for enhanced viewing applications.
- Display Sub-System (DSS)
supports multiple displays with the flexibility to interface with different
panel types such as eDP, DSI, and DPI.
- Improved memory architecture
and high-speed interfaces raise system throughput and energy
efficiency by enabling high utilization of cores and HWAs. The AM69A supports up to
64 gigabytes per second (GBps) of DDR memory bandwidth.
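
The throughput figures quoted in the list above can be cross-checked with simple arithmetic. The following Python sketch is an illustration only, not TI code: the function names are hypothetical, counting each multiply-accumulate as two operations is a common convention assumed here, the 30 fps rate for the twelve 2MP cameras is an assumption (the text gives no frame rate for that case), and the 1,200 MP/s VPAC3 figure is read as the combined budget of both instances.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# All names are hypothetical; values come from the text unless marked ASSUMPTION.

MMA_COUNT = 4                 # four MMAs, one per C7x DSP
MACS_PER_CYCLE = 64 * 64      # 4K 8-bit multiply-accumulates per cycle per MMA
OPS_PER_MAC = 2               # ASSUMPTION: one multiply plus one add per MAC
CLOCK_HZ = 1_000_000_000      # 1 GHz
VPAC3_BUDGET_MPS = 1200       # ASSUMPTION: combined figure for both VPAC3 instances


def mma_peak_tops(mma_count=MMA_COUNT, macs_per_cycle=MACS_PER_CYCLE,
                  clock_hz=CLOCK_HZ, ops_per_mac=OPS_PER_MAC):
    """Peak MMA compute in tera-operations per second."""
    return mma_count * macs_per_cycle * ops_per_mac * clock_hz / 1e12


def pixel_rate_mps(channels, megapixels, fps):
    """Aggregate pixel rate in megapixels per second (MP/s)."""
    return channels * megapixels * fps


peak_tops = mma_peak_tops()               # about 32.8, quoted as 32 TOPS
codec_load = pixel_rate_mps(16, 2, 30)    # 16 channels of 2MP at 30 fps
camera_load = pixel_rate_mps(12, 2, 30)   # ASSUMPTION: twelve 2MP cameras at 30 fps
vpac_headroom = VPAC3_BUDGET_MPS - camera_load

print(f"MMA peak: {peak_tops:.3f} TOPS")
print(f"Codec example load: {codec_load} MP/s")
print(f"12-camera load: {camera_load} MP/s, VPAC3 headroom: {vpac_headroom} MP/s")
```

Under these assumptions, the codec example works out to exactly the quoted 960 MP/s, and a twelve-camera 2MP capture stays within the quoted VPAC3 throughput.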
Deep learning inference efficiency is
crucial for the performance of an edge AI system. As the "Performance and
efficiency benchmarking with TDA4 Edge AI processors" application
note shows, MMA-based deep learning inference is 60% more efficient than a GPU-based
one in terms of FPS and TOPS. Optimized network models are also
provided by the TI Model Zoo(3),
a large collection of DNN models optimized for C7xMMA across various computer
vision tasks. The models include popular image classification, 2D and 3D object
detection, semantic segmentation, and 6D pose estimation models. For several
models in the TI Model Zoo, 8-bit fixed-point inference performance on TI
embedded processors, including the AM69A, can be evaluated via TI's Edge AI
Studio.
The multicore heterogeneous
architecture of AM6xA provides flexibility to optimize the performance of an edge AI
system for various applications by utilizing suitable programmable cores or HWAs for
particular tasks. For example, on the AM69A, computationally intensive deep learning (DL)
inference can run on the four MMA instances with optimized DL models, while vision
processing and video encoding and decoding can be offloaded to the two VPAC3 instances
and the hardware-accelerated video codec for best performance. Other functional
blocks can be programmed on the eight A72 cores or available C7x cores. Section 3 describes how
edge AI systems can be built on the AM69A for several industrial use cases.