NOTE
This section is not applicable to TDA2ex.
The Embedded Vision Engine (EVE) module, Figure 16, is a programmable imaging and vision processing engine, intended to be used in devices that serve consumer electronics imaging and vision applications. The EVE module consists of an ARP32 scalar core, a VCOP vector core, and a DMA controller.
The ARP32 scalar core plays the role of the subsystem controller, coordinating internal EVE interaction (VCOP, EDMA) as well as interaction with the host processor (typically an ARM) and the DSP (typically a CGEM). The ARP32 program memory is serviced via a dedicated direct-mapped program cache. Data memory accesses are typically serviced by the tightly coupled DMEM block, though the ARP32 is able to access the other memory blocks as well as both internal and external MMRs. The Vector Coprocessor (VCOP) is a SIMD engine with built-in loop control and address generation. The EDMA block is the local DMA, used to transfer data between system memories (typically SDRAM and/or L3 SRAM) and internal EVE memories.
The OCP interconnect is conceptually broken into two categories: the OCP high-performance interconnect and the OCP configuration interconnect. The OCP high-performance interconnect serves as the primary high-bandwidth (partial) crossbar connection between the EDMA, ARP32, and DMA OCP Init/Target busses and the EVE memories. The OCP CFG interconnect provides connectivity to the various MMRs located within EVE.
This section provides a throughput analysis of the EVE EDMA. The enhanced direct memory access module, also called EDMA, performs high-performance data transfers between two slave endpoints (memories and peripheral devices) without ARP32 involvement during the transfer. An EDMA transfer is programmed through a logical EDMA channel, which allows the transfer to be optimally tailored to the requirements of the application. The EDMA can also perform transfers between external memories and between device subsystems' internal memories, with some performance loss caused by resource sharing between the read and write ports.
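As an illustration, the sketch below programs one logical channel by filling its PaRAM set using the common EDMA3-style layout. The base address, channel number, and helper name are placeholders for this example, not device data; the actual PaRAM address and field encodings must be taken from the device TRM.

    /*
     * Minimal sketch: configure one logical EDMA channel through its PaRAM
     * set (EDMA3-style layout). EDMA_PARAM_BASE is a placeholder, not the
     * real EVE EDMA address.
     */
    #include <stdint.h>

    typedef struct {
        uint32_t opt;               /* transfer options (TCC, SYNCDIM, ...)   */
        uint32_t src;               /* source address                         */
        uint32_t acnt_bcnt;         /* ACNT (bits 15:0), BCNT (bits 31:16)    */
        uint32_t dst;               /* destination address                    */
        uint32_t srcbidx_dstbidx;   /* B-dimension indexes                    */
        uint32_t link_bcntrld;      /* link address (15:0), BCNT reload       */
        uint32_t srccidx_dstcidx;   /* C-dimension indexes                    */
        uint32_t ccnt;              /* CCNT (bits 15:0)                       */
    } edma_param_t;

    #define EDMA_PARAM_BASE  0x40084000u   /* placeholder: PaRAM base from TRM */

    static void edma_setup_copy(unsigned ch, uint32_t src, uint32_t dst,
                                uint16_t acnt, uint16_t bcnt)
    {
        volatile edma_param_t *p = (volatile edma_param_t *)
            (EDMA_PARAM_BASE + ch * sizeof(edma_param_t));

        p->src             = src;
        p->dst             = dst;
        p->acnt_bcnt       = ((uint32_t)bcnt << 16) | acnt;
        p->srcbidx_dstbidx = ((uint32_t)acnt << 16) | acnt; /* contiguous 2-D */
        p->link_bcntrld    = 0xFFFFu;      /* NULL link terminates the set    */
        p->srccidx_dstcidx = 0;
        p->ccnt            = 1;
        p->opt             = ((uint32_t)ch << 12)  /* TCC = channel number    */
                           | (1u << 2)             /* SYNCDIM: AB-synchronized */
                           | (1u << 20);           /* TCINTEN: completion IRQ */
    }

One trigger of an AB-synchronized channel set up this way moves the whole ACNT x BCNT block in a single transfer request.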
The EDMA controller block diagram is shown in Figure 17. The EDMA controller is based on two principal blocks: the EDMA channel controller (EDMA CC) and the EDMA transfer controllers (EDMA TC).
The EDMA controller’s primary purpose is to service user-programmed data transfers between internal or external memory-mapped slave endpoints. It can also be configured to service event-driven peripherals (such as serial ports). There are 64 direct memory access (DMA) channels and 8 QDMA channels, serviced by two concurrent physical channels.
DMA channels are triggered by an external event, a manual write to the event set register (ESR), or a chained event. QDMA channels are auto-triggered when a write is performed to the user-programmable trigger word. Once a trigger event is recognized, the event is queued in the programmed event queue. If two events are detected simultaneously, the lowest-numbered channel has the highest priority. Each event in the event queue is processed in the order it was queued. On reaching the head of the queue, the PaRAM set associated with that event is read to determine the transfer details. The transfer request (TR) submission logic evaluates the validity of the TR and submits a valid transfer request to the appropriate transfer controller.
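For example, a manual trigger is simply a write that sets the channel's event bit. The sketch below assumes the common EDMA3 channel-controller register map (EESR at offset 0x1030, ESR at 0x1010, for channels 0-31) and a placeholder base address; verify both against the device TRM.

    /* Hedged sketch: manually trigger DMA channel `ch` (ch < 32) by writing
     * the event set register. EDMA_CC_BASE is a placeholder address. */
    #include <stdint.h>

    #define EDMA_CC_BASE  0x40080000u
    #define EDMA_EESR  (*(volatile uint32_t *)(EDMA_CC_BASE + 0x1030u))
    #define EDMA_ESR   (*(volatile uint32_t *)(EDMA_CC_BASE + 0x1010u))

    static void edma_manual_trigger(unsigned ch)
    {
        EDMA_EESR = 1u << ch;   /* enable the event for this channel          */
        EDMA_ESR  = 1u << ch;   /* set the event: queues a TR for the channel */
    }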
The maximum theoretical bandwidth for a given transfer can be found by multiplying the width of the interface by the frequency at which it transfers data. The maximum speed the transfer can achieve is equal to the bandwidth of the limiting port. In general, a given transfer scenario never achieves the maximum theoretical bandwidth due to several factors: transfer overheads, the access latency of the source/destination memories, and the finite number of cycles taken by the EDMA CC and EDMA TC between the time the transfer event is registered and the time the first read command is issued by the EDMA TC. These overheads can be calibrated by measuring the time taken for a 1-byte transfer. These factors are included in the throughput measurements presented here.
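The snippet below illustrates the arithmetic only: peak bandwidth as width times frequency, and the 1-byte calibration idea, where a 1-byte transfer moves almost no data so its measured duration approximates the fixed CC/TC setup cost. The port width, clock frequency, and measured time are made-up example numbers, not data from this document.

    /* Illustrative calculation: theoretical bandwidth and overhead model. */
    #include <stdio.h>

    int main(void)
    {
        const double width_bytes = 128.0 / 8.0;  /* example: 128-bit port     */
        const double freq_hz     = 266e6;        /* example: 266 MHz clock    */
        const double peak_bps    = width_bytes * freq_hz;

        const double t_1byte_s   = 0.5e-6;       /* example calibrated overhead */
        const double t_total_s   = t_1byte_s + (1 << 20) / peak_bps; /* 1 MiB */

        printf("peak bandwidth : %.2f MB/s\n", peak_bps / 1e6);
        printf("1 MiB estimate : %.2f us (incl. %.2f us overhead)\n",
               t_total_s * 1e6, t_1byte_s * 1e6);
        return 0;
    }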