The L1D includes a non-blocking cache controller with a main cache and a small, fully associative victim cache. The main cache is read-allocate and supports both write-back and write-through modes. The victim cache services read and write hits directly, with no performance penalty compared with a hit in the main cache.
The purpose of the L1D memory/cache is to maximize data processing performance. The cache allows data to be read and written at the full CPU clock rate while the system still has a large memory. It is the cache's responsibility to hide much of the latency associated with reading from and writing to the slower system memory. The L1D memory includes two logical sections:
- L1D partition 0: Supports 8KB (min) to 32KB (max) of cache. Cache can reside only in partition 0. A cache size of 0KB is not supported, so L1D cache functionality cannot be disabled.
- L1D partition 1: Supports 16KB (min) to 40KB (max) of SRAM (may include LUT and histogram).
Note: The minimum cache size (8KB) can only be achieved via L1DMODE register programming.
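The relationship between the two partitions can be sketched as follows. This is a minimal illustration, assuming that the 48KB of L1D memory is always split fully between cache and SRAM, which is consistent with the 32KB + 16KB and 8KB + 40KB extremes listed above. The function name `l1d_split` is hypothetical. Which intermediate cache sizes are selectable is not stated here, so the sketch only checks the documented range and does not model the L1DMODE register encoding.

```python
L1D_TOTAL_KB = 48   # total L1D memory shared by cache and SRAM
CACHE_MIN_KB = 8    # minimum cache size (0KB is not supported)
CACHE_MAX_KB = 32   # maximum cache size


def l1d_split(cache_kb: int) -> tuple[int, int]:
    """Return (cache_kb, sram_kb) for a requested cache size in KB.

    Assumption: whatever is not used as cache is available as SRAM.
    """
    if not CACHE_MIN_KB <= cache_kb <= CACHE_MAX_KB:
        raise ValueError(
            f"cache size must be {CACHE_MIN_KB}KB to {CACHE_MAX_KB}KB, got {cache_kb}KB"
        )
    return cache_kb, L1D_TOTAL_KB - cache_kb
```

For example, `l1d_split(32)` gives the maximum-cache configuration with 16KB of SRAM, and `l1d_split(8)` gives the minimum-cache configuration with 40KB of SRAM. A request for 0KB of cache is rejected, mirroring the rule that the cache cannot be disabled.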
The L1D memory system provides the following key features:
- L1 data memory controller (DMC) with 48KB L1D memory, configurable as cache and/or SRAM
  - L1D partition 0 (cache/SRAM) + L1D partition 1 (SRAM only)
- L1D cache
  - 32KB cache (max), configurable down to 8KB cache (min)
  - Dual datapath (DP0 + DP1) support
  - Direct-mapped (1-way set-associative) main cache
  - 128-byte cache line size
  - Read-allocate cache
  - Support for both write-back and write-through modes
  - Support for victim cache
    - 16 cache lines, fully associative
  - Physically indexed, physically tagged (40-bit physical address)
  - Support for speculative loads
  - Hit under miss
  - Posted write miss support
  - Write merging on all outstanding write transactions inside L1D
- L1D SRAM
  - 16KB (min), configurable up to 40KB (max)
  - Accessible from CPU or DMA
- Lookup table (LUT) and histogram
  - LUT
    - Up to 4 sets of look up tables can be specified simultaneously
    - Supports interpolation mode
    - Indices are unsigned values at 32-bit lanes of the source register
    - Look up table elements can be signed or unsigned, bytes, half-words or words
  - Histogram
    - Up to 4 sets of histograms can be specified simultaneously
    - Supports regular and weighted histogram operations
    - Indices are unsigned values at 32-bit lanes of the source register
    - Histogram weights are at 32-bit lanes of the source register
    - Histogram weights can only be signed bytes or signed half-words
    - Histogram bin data saturates to the minimum or maximum values of its data bin type
- Bandwidth
  - 1024-bit data throughput
  - 16×64-bit banks
- ECC SECDED support
- Coherence
  - Full MESI support
  - Support for global cache coherence operations
  - Snoops and cache maintenance operation support from L2
  - Snoops for L2 SRAM, MSMC SRAM and external (DDR) addresses
- Virtual memory support
  - Support for wider (40-bit) physical address
- ECR access
  - L1D ECR registers are not memory mapped and instead are mapped to a MOVC CPU instruction
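
To illustrate how the cache parameters listed above interact, the sketch below splits a 40-bit physical address into tag, set index, and byte offset for a direct-mapped cache with 128-byte lines. It is textbook direct-mapped arithmetic under stated assumptions, not a description of the actual L1D tag layout: the function name `split_address` is hypothetical, and the cache size is assumed to be a power of two between 8KB and 32KB.

```python
LINE_BYTES = 128   # cache line size from the feature list
PA_BITS = 40       # physical address width from the feature list


def split_address(paddr: int, cache_kb: int = 32) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a direct-mapped cache.

    Assumption: the set index is taken from the address bits directly
    above the line offset, as in a conventional direct-mapped cache.
    """
    if not 0 <= paddr < (1 << PA_BITS):
        raise ValueError("address does not fit in 40 bits")
    num_lines = (cache_kb * 1024) // LINE_BYTES
    if not 8 <= cache_kb <= 32 or num_lines & (num_lines - 1):
        raise ValueError("cache size must be a power of two from 8KB to 32KB")

    offset_bits = LINE_BYTES.bit_length() - 1   # 7 bits for 128-byte lines
    index_bits = num_lines.bit_length() - 1     # 8 bits for a 32KB cache

    offset = paddr & (LINE_BYTES - 1)                  # byte within the line
    index = (paddr >> offset_bits) & (num_lines - 1)   # which cache line
    tag = paddr >> (offset_bits + index_bits)          # remaining upper bits
    return tag, index, offset
```

Under these assumptions, a 32KB cache holds 256 lines, so an address uses 7 offset bits, 8 index bits, and the remaining 25 bits as the tag. Two addresses that differ by a multiple of the cache size produce the same index with different tags, so they compete for the same line in a direct-mapped cache.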