SPRUIV4D May 2020 – May 2024
The C7100 CPU has two streaming engines. A streaming engine is a feature of the C7000 CPU cores that aids in loading data from memory into the CPU. By prefetching data from memory to a location near the CPU, the streaming engines can significantly improve the performance of the memory hierarchy and reduce the time needed to bring data into the CPU. They may also reduce the number of L1 data cache capacity misses, because data accessed through a streaming engine bypasses the L1 cache.
The streaming engine supports address access patterns of up to six dimensions. When the performance bottleneck involves reads from memory (for example, when D-unit resources are the limiting factor or cache misses dominate), consider using one or both of the streaming engines if the pattern in which objects in memory are accessed is known in advance. Streaming engines have the greatest effect when used in loops that are vectorized by hand. For more information on the streaming engine, including code examples, see the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0), the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8), and the c7x_strm.h file in the include directory of the compiler installation.
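To make the idea of a multi-dimensional access pattern more concrete, the following plain-C sketch enumerates element offsets the way six nested loops, each with its own iteration count and stride, would visit them. It is a model for illustration only, assuming strides measured in elements and a contiguous innermost dimension. The type access_pattern6 and the function gen_offsets are hypothetical; they are not part of c7x_strm.h or any streaming engine API, and the sketch models only the order of addresses, not how the hardware delivers data to the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 6-D access pattern description (NOT the c7x_strm.h API).
   icnt[0] is the innermost dimension, whose elements are contiguous;
   dim[1]..dim[5] are strides, in elements, for the outer dimensions. */
typedef struct {
    size_t icnt[6]; /* iteration count for each dimension */
    long dim[6];    /* stride in elements; dim[0] is unused */
} access_pattern6;

/* Enumerate element offsets (relative to the base address) in the order
   the pattern visits them. Stores at most max offsets in out and returns
   the total number of elements the pattern describes. */
static size_t gen_offsets(const access_pattern6 *p, long *out, size_t max)
{
    assert(p != NULL && (out != NULL || max == 0));
    size_t n = 0;
    for (size_t i5 = 0; i5 < p->icnt[5]; i5++)
     for (size_t i4 = 0; i4 < p->icnt[4]; i4++)
      for (size_t i3 = 0; i3 < p->icnt[3]; i3++)
       for (size_t i2 = 0; i2 < p->icnt[2]; i2++)
        for (size_t i1 = 0; i1 < p->icnt[1]; i1++)
         for (size_t i0 = 0; i0 < p->icnt[0]; i0++) {
            /* Innermost index steps by one element; outer indices
               step by their own stride. */
            long off = (long)i0
                     + (long)i1 * p->dim[1] + (long)i2 * p->dim[2]
                     + (long)i3 * p->dim[3] + (long)i4 * p->dim[4]
                     + (long)i5 * p->dim[5];
            if (n < max)
                out[n] = off;
            n++;
         }
    return n;
}
```

For example, icnt = {4, 3, 1, 1, 1, 1} with dim = {0, 8, 0, 0, 0, 0} describes a block of 3 rows by 4 columns taken from a row-major matrix with 8 columns, producing offsets 0 to 3, 8 to 11, and 16 to 19. The six loops are written out explicitly because they mirror the shape of the pattern most directly.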
The C7000 compiler does not yet make automatic use of the streaming engine feature.