16.2.2 LMbench Latency
lat_mem_rd is the memory read latency micro benchmark of LMbench.
- lat_mem_rd measures memory read latency for varying memory sizes and strides. The results are reported in nanoseconds per load.
- The entire memory hierarchy is measured, including on-board cache latency and size, external cache latency and size, main memory latency, and TLB miss latency.
- Only data accesses are measured; the instruction cache is not measured.
- The benchmark runs as two nested loops: the outer loop iterates over stride sizes, and the inner loop iterates over array sizes. For each array size, the benchmark builds a ring of pointers in which each element points backward one stride. The array is traversed by repeatedly executing p = (char **)*p; inside a for loop. The loop overhead is insignificant because the loop body is unrolled to 100 loads.
- The array size varies from 512 bytes to (typically) 8 megabytes. For small sizes the array fits in cache, so the loads are much faster. This effect is much more apparent when the data is plotted.
- The default stride is 128 bytes.
The latency results are plotted in the following sections. The plot legend uses the format BM – device name (CPU frequency : DDR frequency). BM stands for bare metal, because the benchmark runs without an operating system.