SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1
The SEs provide the most efficient addressing, but are restricted as follows:
The migration tool allocates the two SE resources to what it considers to be the two highest priority loads. Loads in the innermost loop are considered to have the highest priority. The heuristic simply picks the first two loads in the innermost loop; if there are fewer than two, it moves to the next outer loop.
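The selection heuristic described above can be sketched on the host as follows. This is only an illustrative model, not the migration tool's code: the `Load` record, the `pick_se_loads` function, and the convention that depth 0 is the outermost loop are all hypothetical names and assumptions introduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one load: a name and the nesting depth of the loop
// that contains it (assumption: 0 = outermost, larger = more deeply nested).
struct Load {
    std::string name;
    int loop_depth;
};

// Pick up to num_se loads. Loops are visited from innermost to outermost;
// within one loop, loads are taken in the order they appear.
std::vector<std::string> pick_se_loads(const std::vector<Load>& loads,
                                       std::size_t num_se = 2) {
    // Find the innermost loop depth present in the list.
    int innermost = -1;
    for (const Load& ld : loads) innermost = std::max(innermost, ld.loop_depth);

    std::vector<std::string> chosen;
    for (int depth = innermost; depth >= 0 && chosen.size() < num_se; --depth) {
        for (const Load& ld : loads) {
            if (ld.loop_depth != depth) continue;
            chosen.push_back(ld.name);
            if (chosen.size() == num_se) break;  // both SEs are allocated
        }
    }
    return chosen;
}
```

With one load in the innermost loop and two in the enclosing loop, this sketch picks the innermost load first and then the first load of the enclosing loop.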
For SE-based loads, the migration tool generates the following sequence of steps:
1. In the init() function, the migration tool generates a call to the SE_init() template function in the virtual machine, which returns an SE setup vector. (The ISA spec refers to the setup vector as an SE template; here we use the term setup vector to avoid confusion with the virtual machine's C++ templates.) The setup vector is saved in the tvals structure for later access by the vloops() function. The setup vector consists of static (compile-time) and dynamic (run-time) values. The static values correspond to flags in the SE setup vector and are determined from the distribution mode and data type. These are passed as template parameters to SE_init(). The dynamic values correspond to stride and trip count values that are determined from the terms in the Agen expression and loop trip counts. These are passed as runtime arguments to SE_init().
2. In the init() function, the migration tool generates the expression that represents the base address and saves that in another field of the tvals structure.
3. In the vloops() function, outside the outermost loop, the migration tool generates a call to the SE_OPEN() intrinsic, passing it both the setup vector and the base address from the tvals structure.
4. The migration tool generates the __se_ac_<type> intrinsic for the access, which turns into a quasi-register operand SEn++ containing the loaded value.
5. The compiler folds the SEn++ operand into the instruction where the value is used, thereby eliminating the load instruction altogether.
6. The migration tool generates a call to the SE_CLOSE() intrinsic after the loop nest.
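The lifecycle in these steps (build a setup vector, open the stream outside the loop nest, consume one value per access, close afterwards) can be mimicked on the host with a small model. The sketch below does not use the virtual machine's SE_init(), SE_OPEN(), __se_ac_<type>, or SE_CLOSE(), whose signatures are not given in this section. `SetupVector`, `model_se_init`, `StreamModel`, and `sum_window` are hypothetical names, and the two-level stride and trip-count layout is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SE setup vector with two loop levels.
struct SetupVector {
    std::array<std::size_t, 2> trip_count;  // [0] = inner loop, [1] = outer loop
    std::array<std::ptrdiff_t, 2> stride;   // element stride per loop level
    std::size_t elem_bytes;                 // static value derived from the type
};

// Static values (here only the element type) are template parameters;
// dynamic values (trip counts and strides) are runtime arguments.
template <typename T>
SetupVector model_se_init(std::size_t inner_trips, std::ptrdiff_t inner_stride,
                          std::size_t outer_trips, std::ptrdiff_t outer_stride) {
    return SetupVector{{inner_trips, outer_trips},
                       {inner_stride, outer_stride},
                       sizeof(T)};
}

// Hypothetical host-side stream: open() binds a setup vector to a base
// address, next() returns the current element and advances the stream.
template <typename T>
class StreamModel {
public:
    void open(const SetupVector& sv, const T* base) {
        sv_ = sv; base_ = base; i_ = 0; j_ = 0; open_ = true;
    }
    T next() {
        assert(open_ && j_ < sv_.trip_count[1]);
        std::ptrdiff_t idx = static_cast<std::ptrdiff_t>(j_) * sv_.stride[1] +
                             static_cast<std::ptrdiff_t>(i_) * sv_.stride[0];
        if (++i_ == sv_.trip_count[0]) { i_ = 0; ++j_; }  // wrap inner level
        return base_[idx];
    }
    void close() { open_ = false; }
private:
    SetupVector sv_{};
    const T* base_ = nullptr;
    std::size_t i_ = 0, j_ = 0;
    bool open_ = false;
};

// Sum a rows x cols window of a row-major matrix whose rows are pitch apart.
int sum_window(const std::vector<int>& m, std::size_t pitch,
               std::size_t rows, std::size_t cols) {
    SetupVector sv = model_se_init<int>(cols, 1, rows,
                                        static_cast<std::ptrdiff_t>(pitch));
    StreamModel<int> se;
    se.open(sv, m.data());        // outside the outermost loop
    int sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += se.next();     // the streamed value feeds the add directly
    se.close();                   // after the loop nest
    return sum;
}
```

The loop body in `sum_window` contains no address arithmetic: the stream carries all of it, which is the property that lets the real SEn++ operand replace the load instruction.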