SPRAC21 Application note

SPRAC21A June 2016 – June 2019 OMAP-L132 , OMAP-L138 , TDA2E , TDA2EG-17 , TDA2HF , TDA2HG , TDA2HV , TDA2LF , TDA2P-ABZ , TDA2P-ACD , TDA2SA , TDA2SG , TDA2SX , TDA3LA , TDA3LX , TDA3MA , TDA3MD , TDA3MV

16.2.2 LMbench Latency

lat_mem_rd is the memory read latency micro benchmark of LMbench.

lat_mem_rd measures memory read latency for varying memory sizes and strides. The results are reported in nanoseconds per load.
The entire memory hierarchy is measured, including on-board cache latency and size, external cache latency and size, main memory latency, and TLB miss latency.
Only data accesses are measured; the instruction cache is not measured.
The benchmark runs as two nested loops. The outer loop iterates over the stride size and the inner loop iterates over the array size. For each array size, the benchmark creates a ring of pointers in which each pointer points backward one stride. The array is traversed by p = (char **)*p; in a for loop (the overhead of the for loop is not significant because the loop body is unrolled to 100 loads).
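The pointer ring and its traversal can be sketched in C as follows. This is a minimal illustration, not the LMbench source: the function names build_ring and chase are hypothetical, the buffer is assumed to come from malloc on a hosted system, and the traversal loop is left rolled rather than unrolled to 100 loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: link `buf` into a ring in which every slot holds the
 * address of the slot one stride below it; the lowest slot wraps around to
 * the highest one. Returns the highest slot as the starting point. */
static char **build_ring(char *buf, size_t size, size_t stride)
{
    /* Slots must be able to hold an aligned pointer. */
    assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);
    assert(size >= stride);

    size_t last = (size / stride - 1) * stride;   /* offset of highest slot */

    for (size_t off = last; off >= stride; off -= stride)
        *(char **)(buf + off) = buf + off - stride;   /* point back one stride */
    *(char **)buf = buf + last;                       /* close the ring */

    return (char **)(buf + last);
}

/* Hypothetical helper: perform `loads` dependent loads. Each load address
 * comes from the previous load, so the loads cannot overlap. The final
 * pointer is returned so the compiler cannot discard the loop. */
static char **chase(char **p, size_t loads)
{
    while (loads-- > 0)
        p = (char **)*p;
    return p;
}
```

With a 1024-byte buffer and a stride of 128, the ring has eight slots, so eight loads starting from the returned pointer arrive back at the starting slot.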
The size of the array varies from 512 bytes to (typically) 8 megabytes. For the small sizes, the array fits in cache and the loads are much faster. This becomes much more apparent when the data is plotted.
The default stride length is 128 bytes.
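As a rough illustration of how a nanoseconds-per-load figure can be obtained for each array size, the sketch below times a pointer chase and sweeps the array size upward from 512 bytes. It is not the LMbench implementation and rests on several assumptions: the names ns_per_load and sweep are hypothetical, the size is simply doubled at each step, the load count is arbitrary, and timing uses the standard C clock() function on a hosted system, which is coarse and is not what a bare-metal port would use.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: average nanoseconds per dependent load for one
 * (array size, stride) pair. Returns a negative value on invalid
 * arguments or allocation failure. */
static double ns_per_load(size_t size, size_t stride, size_t loads)
{
    char *volatile sink;   /* volatile store keeps the timed loop alive */
    char *buf;
    char **p;
    clock_t t0, t1;

    if (stride < sizeof(char *) || stride % sizeof(char *) != 0 ||
        size < stride || loads == 0)
        return -1.0;

    size_t slots = size / stride;
    buf = malloc(size);
    if (buf == NULL)
        return -1.0;

    /* Each slot points back one stride; slot 0 wraps to the last slot. */
    for (size_t i = 0; i < slots; i++) {
        size_t prev = (i == 0) ? slots - 1 : i - 1;
        *(char **)(buf + i * stride) = buf + prev * stride;
    }

    p = (char **)buf;
    t0 = clock();
    for (size_t n = 0; n < loads; n++)
        p = (char **)*p;
    sink = (char *)p;
    t1 = clock();
    (void)sink;

    free(buf);
    return (double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC / (double)loads;
}

/* Hypothetical sweep: doubling the size is an assumption made for brevity. */
static void sweep(size_t max_size, size_t stride)
{
    for (size_t size = 512; size <= max_size; size *= 2)
        printf("size %zu bytes, stride %zu: %.2f ns/load\n",
               size, stride, ns_per_load(size, stride, 1000000));
}
```

Calling sweep(8u * 1024 * 1024, 128) covers the typical range described above with the default stride. On a cached system the reported value is expected to rise in steps as the array outgrows each cache level.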

The plots of the latency micro benchmark results are shown in the next sections. Each index entry has the format Bare metal (BM) – device name (CPU frequency : DDR frequency). "Bare metal" indicates that the benchmark was run in an environment with no operating system.