SPRUIG8J January 2018 – March 2024
A set of utilities is provided in the compiler library for writing vector-width-independent code for C7000. To use these utilities, add #include <c7x_scalable.h> to your source code.
These utilities are available in C++ code only, because their implementation relies on C++ language features.
These utilities are available when using the TI C7000 compiler or when compiling with TI C7000 Host Emulation.
The following APIs are available, all of which are described in further detail in the c7x_scalable.h file:
c7x::max_simd<T>::value
c7x::element_count_of<T>::value
c7x::element_type_of<T>::type
c7x::component_type_of<T>::type
c7x::make_vector<T,N>::type
c7x::make_full_vector<T>::type
c7x::is_target_vector<T>::value
c7x::char_vec, c7x::short_vec, etc.
c7x::char_hvec, c7x::short_hvec, etc.
c7x::char_qvec, c7x::short_qvec, etc.
c7x::char_vec_ptr, c7x::const_short_vec_ptr, etc.
c7x::reinterpret<T>(v)
c7x::convert<T>(v)
c7x::as_char_vec(v), c7x::convert_short_vec(v), etc.
c7x::se_veclen<T>::value
c7x::se_eletype<T>::value
c7x::sa_veclen<T>::value
c7x::strm_eng<I,T>::get()
c7x::strm_eng<I,T>::get_adv()
c7x::strm_agen<I,T>::get(p)
c7x::strm_agen<I,T>::get_adv(p)
c7x::strm_agen<I,T>::get_vpred()
The following macros are defined by c7x_mma.h and can be used to determine information about the MMA for use with the scalable vector programming model:
Macro Syntax | Description |
---|---|
__MMA_A_MAT_BYTES__ | The size of an A matrix in bytes. Currently, each A matrix contains one row. |
__MMA_A_ROW_WIDTH_BYTES__ | The size of a row in an A matrix in bytes. |
__MMA_A_ROWS__ | The number of rows in an A matrix. |
__MMA_A_COLS(ebytes) | The number of columns in an A matrix given the number of bytes in each element of A. Often useful with sizeof(). For example, __MMA_A_COLS(sizeof(short)). |
__MMA_A_ENTRIES__ | The number of A entries that can be contained in the A storage. |
__MMA_B_MAT_BYTES__ | The size of a B matrix in bytes. |
__MMA_B_ROW_WIDTH_BYTES__ | The size of a row in a B matrix in bytes. |
__MMA_B_ROWS(ebytes) | The number of rows in a B matrix given the number of bytes in each element of B. Often useful with sizeof(). For example, __MMA_B_ROWS(sizeof(short)). |
__MMA_B_COLS(ebytes) | The number of columns in a B matrix given the number of bytes in each element of B. Often useful with sizeof(). For example, __MMA_B_COLS(sizeof(short)). |
__MMA_C_MAT_BYTES__ | The size of a C matrix in bytes. Currently, each C matrix contains one row and is 4 times wider than the A matrix for larger accumulators. |
__MMA_C_ROW_WIDTH_BYTES__ | The size of a row in a C matrix in bytes. |
__MMA_C_ROWS__ | The number of rows in a C matrix. |
__MMA_C_COLS(ebytes) | The number of columns in a C matrix given the number of bytes in each element of C. Often useful with sizeof(). For example, __MMA_C_COLS(sizeof(short)). |
__MMA_C_ENTRIES__ | The number of C entries that can be contained in C storage. |
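The function-like macros in the table take an element size in bytes and yield an element count. The following portable sketch illustrates that pattern only. It does not include c7x_mma.h: the names kARowWidthBytes, kCRowWidthBytes, a_cols, and c_cols are hypothetical stand-ins, the 64-byte row width is an illustrative value rather than a value taken from any device, and the relationship "columns = row width in bytes / bytes per element" is an assumption inferred from the descriptions above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only. Real code should use the
// macros from c7x_mma.h; these values are NOT taken from that header.
constexpr std::size_t kARowWidthBytes = 64;                   // models __MMA_A_ROW_WIDTH_BYTES__
constexpr std::size_t kCRowWidthBytes = 4 * kARowWidthBytes;  // models the "4 times wider" C row

// Assumed relationship: columns = row width in bytes / bytes per element.
constexpr std::size_t a_cols(std::size_t ebytes) { return kARowWidthBytes / ebytes; }  // models __MMA_A_COLS(ebytes)
constexpr std::size_t c_cols(std::size_t ebytes) { return kCRowWidthBytes / ebytes; }  // models __MMA_C_COLS(ebytes)

// Passing sizeof() of the element type keeps the code independent of the
// element width, which is the usage the table recommends.
static_assert(a_cols(sizeof(std::int8_t)) == 64, "one column per byte for 8-bit elements");
static_assert(a_cols(sizeof(std::int16_t)) == 32, "half as many columns for 16-bit elements");
```

In real code, the same expressions would be written with the macros directly, for example __MMA_A_COLS(sizeof(short)), so that buffer sizes and loop bounds follow the MMA configuration instead of hard-coded constants.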
As a moderately complex example, the following C++ function template implements memcpy and takes the element type as a template parameter. This example uses a Streaming Engine and a Streaming Address Generator (see Section 4.15).
#include <c7x_scalable.h>
using namespace c7x;

/* memcpy_scalable_strm<typename S>(const S *in, S *out, int len)
 *
 * S   - A basic data type such as short or float.
 * in  - The input buffer.
 * out - The output buffer.
 * len - The number of elements to copy.
 *
 * Defaulted template arguments:
 * V - A full vector type of S
 */
template<typename S,
         typename V = typename make_full_vector<S>::type>
void memcpy_scalable_strm(const S *restrict in, S *restrict out, int len)
{
    /*
     * Find the maximum number of vector loads/stores needed to copy the buffer,
     * including any remainder.
     */
    int cnt = len / element_count_of<V>::value;
    cnt += (len % element_count_of<V>::value > 0);

    /* Initialize the SE for a linear read in and the SA for a linear write out. */
    __SE_TEMPLATE_v1 in_tmplt = __gen_SE_TEMPLATE_v1();
    __SA_TEMPLATE_v1 out_tmplt = __gen_SA_TEMPLATE_v1();

    in_tmplt.VECLEN = se_veclen<V>::value;
    in_tmplt.ELETYPE = se_eletype<V>::value;
    in_tmplt.ICNT0 = len;

    out_tmplt.VECLEN = sa_veclen<V>::value;
    out_tmplt.ICNT0 = len;

    __SE0_OPEN(in, in_tmplt);
    __SA0_OPEN(out_tmplt);

    /* Perform the copy. If there is remainder, the last store will be predicated. */
    int i;
    for (i = 0; i < cnt; i++)
    {
        V tmp = strm_eng<0, V>::get_adv();
        __vpred pred = strm_agen<0, V>::get_vpred();
        V *addr = strm_agen<0, V>::get_adv(out);
        __vstore_pred(pred, addr, tmp);
    }

    __SE0_CLOSE();
    __SA0_CLOSE();
}
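The iteration count and the predicated final store in the example above can be reasoned about without C7000 hardware. The following is a portable scalar model, not TI code: lanes_of, vector_iterations, and memcpy_model are hypothetical names introduced here, the vector width is passed as a template argument instead of being derived from make_full_vector, and the inner loop with an "active" flag only approximates what the predicate returned by get_vpred() does for the last store.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for c7x::element_count_of<V>::value: the number of
// elements of type S that fit in a vector of VecBytes bytes.
template <typename S, std::size_t VecBytes>
struct lanes_of {
    static_assert(VecBytes % sizeof(S) == 0, "vector must hold whole elements");
    static constexpr std::size_t value = VecBytes / sizeof(S);
};

// Number of vector iterations needed to cover len elements, including a
// partial final vector. Same arithmetic as cnt in memcpy_scalable_strm.
template <typename S, std::size_t VecBytes>
int vector_iterations(int len)
{
    assert(len >= 0);
    const int lanes = static_cast<int>(lanes_of<S, VecBytes>::value);
    return len / lanes + (len % lanes > 0);
}

// Scalar model of the copy loop. Each outer iteration stands for one vector
// load/store pair; lanes past the end of the buffer are masked off, which
// models the predicate applied to the final store. Returns the number of
// elements actually written.
template <typename S, std::size_t VecBytes>
int memcpy_model(const S *in, S *out, int len)
{
    const int lanes = static_cast<int>(lanes_of<S, VecBytes>::value);
    const int cnt = vector_iterations<S, VecBytes>(len);
    int stored = 0;
    for (int i = 0; i < cnt; i++) {
        for (int lane = 0; lane < lanes; lane++) {
            const int idx = i * lanes + lane;
            const bool active = idx < len;  // models one predicate bit
            if (active) {
                out[idx] = in[idx];
                stored++;
            }
        }
    }
    return stored;
}
```

With an illustrative 64-byte vector, vector_iterations<short, 64>(100) gives 4 iterations (32 lanes each, the last one with only 4 active lanes). With a 32-byte vector the same call site gives 7 iterations. The calling code does not change between the two widths, which is the property the scalable utilities are meant to provide.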